Correlation as an Asset Class and the Smile

Robert L. Kosowski, Salih N. Neftci, in Principles of Financial Engineering (Third Edition), 2015

16.5 Application to Option Payoffs

The major advantage of Dirac delta functions, interpreted as the limits of distributions, is in differentiating functions that have points that cannot be differentiated in the usual sense. There are many such points in option trading. The payoff at the strike K is one example. Knock-in and knock-out barriers are another. The Dirac delta will be useful for discussing derivatives at those points.

Before we proceed, for simplicity we will assume in this section that interest rates are equal to zero:

(16.7) $r_t = 0$

We also assume that the underlying $S_t$ follows the risk-neutral SDE, which in this case will be given by

(16.8) $dS_t = \sigma(S_t)\, S_t\, dW_t$

Note that with interest rates being zero, the drift is eliminated, and that the volatility is not of the Black–Scholes form: it depends on the random variable $S_t$. Let

(16.9) $f(S_T) = \max[S_T - K, 0] = (S_T - K)^+$

be the vanilla call option payoff shown in Figure 16.4. The function is not differentiable at $S_T = K$, yet its first-order derivative behaves like a step function. More interestingly, the second-order derivative can be interpreted as a Dirac delta function. These derivatives are shown in Figures 16.4 and 16.5.

Figure 16.4. Call option payoff and first derivative.

Figure 16.5. Second derivative and dirac delta function.

Now write the equivalent of Ito's Lemma in a setting where functions have kinks, as in the option payoff case. This is called Tanaka's formula, and it essentially extends Ito's Lemma to functions that cannot be differentiated at all points. We can write

(16.10) $d(S_t - K)^+ = \frac{\partial (S_t - K)^+}{\partial S_t}\, dS_t + \frac{1}{2} \frac{\partial^2 (S_t - K)^+}{\partial S_t^2}\, \sigma(S_t)^2\, dt$

where we define

(16.11) $\frac{\partial (S_t - K)^+}{\partial S_t} = 1_{S_t > K}$

(16.12) $\frac{\partial^2 (S_t - K)^+}{\partial S_t^2} = \delta_K(S_t)$

Taking integrals from $t_0$ to $T$ we get:

(16.13) $(S_T - K)^+ = (S_{t_0} - K)^+ + \int_{t_0}^{T} 1_{S_t > K}\, dS_t + \frac{1}{2} \int_{t_0}^{T} \frac{\partial^2 (S_t - K)^+}{\partial S_t^2}\, \sigma(S_t)^2\, dt$

where the first term on the right-hand side is the intrinsic value of the option at time $t_0$ and is known with certainty. We also know that with zero interest rates, the option price $C(S_{t_0})$ will be given by

(16.14) $C(S_{t_0}) = E_{t_0}^{\tilde{P}}\left[(S_T - K)^+\right]$

Now, using the risk-adjusted probability $\tilde{P}$, (i) apply the expectation operator to both sides of Eq. (16.13), noting that the stochastic integral has zero expectation because $S_t$ is driftless under $\tilde{P}$, (ii) change the order of integration and expectation, and (iii) use the property of Dirac delta functions to eliminate the terms valued at points other than $S_t = K$. We obtain the characterization of the option price as:

(16.15) $E_{t_0}^{\tilde{P}}\left[(S_T - K)^+\right] = (S_{t_0} - K)^+ + \frac{1}{2} \int_{t_0}^{T} \sigma(K)^2\, \phi_t(K)\, dt = C(S_{t_0})$

where $\phi_t(\cdot)$ is the continuous density function that corresponds to the risk-adjusted probability of $S_t$. This means that the value of the option depends (i) on the intrinsic value of the option, (ii) on the time spent around K during the life of the option, and (iii) on the volatility at that strike, $\sigma(K)$.

The main point for us is that this expression shows that the option price depends not on the overall volatility, but on the volatility of $S_t$ around K. This is exactly what the notion of volatility smile captures.
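The decomposition into intrinsic value plus a "time spent at the strike" integral can be checked numerically in a simplified setting. The sketch below is only an illustration under a loudly stated assumption: the diffusion coefficient is a constant absolute volatility (a Bachelier-type model, $dS_t = \sigma\, dW_t$), so that $S_t$ is normally distributed and $\phi_t(K)$ is known in closed form. This is not the state-dependent volatility of (16.8); the function names `bachelier_call` and `tanaka_call` are hypothetical.

```python
import math

def bachelier_call(s0, strike, sigma_abs, maturity):
    """Closed-form call price when S_T is normal with mean s0 (zero rates)."""
    d = (s0 - strike) / (sigma_abs * math.sqrt(maturity))
    pdf = math.exp(-0.5 * d * d) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))
    return (s0 - strike) * cdf + sigma_abs * math.sqrt(maturity) * pdf

def tanaka_call(s0, strike, sigma_abs, maturity, steps=20000):
    """Intrinsic value plus 0.5 * sigma^2 * integral of phi_t(K) dt.

    phi_t is the N(s0, sigma_abs^2 * t) density (an assumption of this sketch).
    The substitution t = u^2 removes the 1/sqrt(t) singularity at t = 0
    before the midpoint rule is applied.
    """
    intrinsic = max(s0 - strike, 0.0)
    h = math.sqrt(maturity) / steps
    norm = sigma_abs * math.sqrt(2.0 * math.pi)
    integral = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        z = (strike - s0) / (sigma_abs * u)
        # phi_{u^2}(K) * 2u, with the factor u cancelled analytically
        integral += 2.0 * math.exp(-0.5 * z * z) / norm * h
    return intrinsic + 0.5 * sigma_abs ** 2 * integral

if __name__ == "__main__":
    for strike in (95.0, 100.0, 105.0):
        print(strike,
              bachelier_call(100.0, strike, 10.0, 1.0),
              tanaka_call(100.0, strike, 10.0, 1.0))
```

Under these assumptions the two functions should agree for in-, at-, and out-of-the-money strikes, which illustrates that only the density and volatility at K enter the time value.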

16.5.1 An Interpretation of Dynamic Hedging

There are many dynamic strategies that replicate an option's final payoff. The best known is delta hedging. In delta hedging, the financial engineer will buy or sell delta, $D_t$, units of the underlying, borrow any necessary funds, and adjust $D_t$ as the underlying $S_t$ moves over time. As $t \to T$, the expiration date, this will duplicate the option's payoff. This is the case because, as the time value goes to zero, the option price merges with $(S_T - K)^+$.

However, there is an alternative dynamic hedging procedure that is similar to the approach adopted in the previous section. The dynamic hedging technique, called the stop-loss strategy, is as follows.

In order to replicate the payoff of the long call, hold one unit of $S_t$ if $K < S_t$. Otherwise hold no $S_t$. This strategy requires that, as $S_t$ crosses the level K, we adjust the position as soon as possible: buy one unit of $S_t$ immediately as $S_t$ crosses K from left to right, and sell it immediately as $S_t$ crosses K from right to left. The P/L of this position is given by the term

(16.16) $\frac{1}{2} \int_{t_0}^{T} \frac{\partial^2 (S_t - K)^+}{\partial S_t^2}\, \sigma(S_t)^2\, dt$

Clearly the switches at $S_t = K$ cannot be done instantaneously at zero cost. The trader is moving with time $\Delta$ while the underlying Wiener process is moving at a faster rate $\sqrt{\Delta}$. These adjustments are shown in Figures 16.6 and 16.7. The resulting hedging cost is the option's value.

Figure 16.6. Hedging strategy adjustment—call option.

Figure 16.7. Hedging strategy adjustment and positions.
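The cost of the stop-loss strategy can be illustrated with a small simulation. This is a rough sketch, not the book's procedure: it assumes a constant absolute volatility ($dS_t = \sigma\, dW_t$, zero rates), rebalances only on a discrete time grid, and uses the hypothetical function name `stop_loss_slippage`. The quantity averaged is the option payoff minus the initial intrinsic value minus the trading gains of the strategy, that is, what the stop-loss hedger fails to capture.

```python
import math
import random

def stop_loss_slippage(s0, strike, sigma_abs, maturity, n_steps, n_paths, seed=0):
    """Average shortfall of a discretely rebalanced stop-loss hedge.

    Holding rule: one unit of the underlying while S > strike, otherwise none.
    Dynamics (assumed for this sketch): S moves by sigma_abs * sqrt(dt) * N(0, 1).
    """
    rng = random.Random(seed)
    step_sd = sigma_abs * math.sqrt(maturity / n_steps)
    total = 0.0
    for _ in range(n_paths):
        s = s0
        gains = 0.0
        for _ in range(n_steps):
            holding = 1.0 if s > strike else 0.0  # decided before the move
            move = step_sd * rng.gauss(0.0, 1.0)
            gains += holding * move
            s += move
        payoff = max(s - strike, 0.0)
        total += payoff - max(s0 - strike, 0.0) - gains
    return total / n_paths

if __name__ == "__main__":
    estimate = stop_loss_slippage(100.0, 100.0, 10.0, 1.0, 100, 5000)
    print("average hedging shortfall:", estimate)
    print("time value in this model :", 10.0 / math.sqrt(2.0 * math.pi))
```

Because the holding is chosen before each move and the underlying is driftless, the trading gains average to zero, so the mean shortfall estimates the option's time value in this simplified model.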

URL:

https://www.sciencedirect.com/science/article/pii/B9780123869685000169

Physical Basis of Acoustics

J.P. Lefebvre, in Acoustics, 1999

1.1.1 Conservation equations

Conservation equations describe conservation of mass (continuity equation), momentum (motion equation) and energy (from the first law of thermodynamics).

Let us consider a portion of material volume Ω filled with a (piecewise) continuous medium (for greater generality we suppose the existence of a discontinuity surface ∑ – a shock wave or an interface – moving at velocity V ); the equations of conservation of mass, momentum, and energy are as follows.

Mass conservation equation (or continuity equation). The hypothesis of a continuous medium allows us to introduce the notion of a (piecewise continuous) density function ρ, so that the total mass M of the material volume Ω is $M = \int_\Omega \rho\, d\Omega$; the mass conservation (or continuity) equation is written

(1.1) $\frac{d}{dt} \int_\Omega \rho\, d\Omega = 0$

where d/dt is the material time derivative of the volume integral.

Momentum conservation equation (or motion equation). Let υ be the local velocity; the momentum of the material volume Ω is defined as $\int_\Omega \rho \upsilon\, d\Omega$; and, if σ is the stress tensor and F the supply of body forces per unit volume (or volumic force source), the momentum balance equation for a volume Ω of boundary S with outward normal n is written

(1.2) $\frac{d}{dt} \int_\Omega \rho \upsilon\, d\Omega = \int_S \sigma \cdot n\, dS + \int_\Omega F\, d\Omega$

Energy conservation equation (first law of thermodynamics). If e is the specific internal energy, the total energy of the material volume Ω is $\int_\Omega \rho \left( e + \frac{1}{2} \upsilon^2 \right) d\Omega$; and, if q is the heat flux vector and r the heat supply per unit volume and unit time (or volumic heat source), the energy balance equation for a volume Ω of boundary S with outer normal n is written

(1.3) $\frac{d}{dt} \int_\Omega \rho \left( e + \frac{1}{2} \upsilon^2 \right) d\Omega = \int_S (\sigma \cdot \upsilon - q) \cdot n\, dS + \int_\Omega (F \cdot \upsilon + r)\, d\Omega$

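Using the lemma on derivatives of integrals over a material volume Ω crossed by a discontinuity ∑ of velocity V:

$\forall \phi: \quad \frac{d}{dt} \int_\Omega \phi\, d\Omega = \int_\Omega \left( \frac{\partial \phi}{\partial t} + \nabla \cdot (\phi \upsilon) \right) d\Omega + \int_\Sigma [\phi (\upsilon - V) \cdot n]_\Sigma\, d\sigma$

where $[\phi]$ designates the jump $\phi^{(2)} - \phi^{(1)}$ of the quantity $\phi$ at the crossing of the discontinuity surface ∑; or

$\frac{d}{dt} \int_\Omega \phi\, d\Omega = \int_\Omega \left( \frac{d\phi}{dt} + \phi\, \nabla \cdot \upsilon \right) d\Omega + \int_\Sigma [\phi (\upsilon - V) \cdot n]_\Sigma\, d\sigma$

with

$\dot{\phi} \equiv \frac{d\phi}{dt} = \frac{\partial \phi}{\partial t} + \upsilon \cdot \nabla \phi$

the material time derivative of the function $\phi$. Using the formula

$\forall U: \quad \int_S U \cdot n\, dS = \int_\Omega \nabla \cdot U\, d\Omega + \int_\Sigma [U \cdot n]_\Sigma\, d\sigma$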

one obtains

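$$\begin{aligned}
&\int_\Omega \left( \frac{d\rho}{dt} + \rho\, \nabla \cdot \upsilon \right) d\Omega + \int_\Sigma [\rho (\upsilon - V) \cdot n]_\Sigma\, d\sigma = 0 \\
&\int_\Omega \left( \frac{d}{dt}(\rho \upsilon) + \rho \upsilon\, \nabla \cdot \upsilon \right) d\Omega + \int_\Sigma [(\rho \upsilon \otimes (\upsilon - V)) \cdot n]_\Sigma\, d\sigma = \int_\Omega \nabla \cdot \sigma\, d\Omega + \int_\Sigma [\sigma \cdot n]_\Sigma\, d\sigma + \int_\Omega F\, d\Omega \\
&\int_\Omega \left( \frac{d}{dt}\left( \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) \right) + \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) \nabla \cdot \upsilon \right) d\Omega + \int_\Sigma \left[ \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) (\upsilon - V) \cdot n \right]_\Sigma d\sigma \\
&\qquad = \int_\Omega \nabla \cdot (\sigma \cdot \upsilon - q)\, d\Omega + \int_\Sigma [(\sigma \cdot \upsilon - q) \cdot n]_\Sigma\, d\sigma + \int_\Omega (F \cdot \upsilon + r)\, d\Omega
\end{aligned}$$

or

$$\begin{aligned}
&\int_\Omega \left( \frac{d\rho}{dt} + \rho\, \nabla \cdot \upsilon \right) d\Omega + \int_\Sigma [\rho (\upsilon - V) \cdot n]_\Sigma\, d\sigma = 0 \\
&\int_\Omega \left( \frac{d}{dt}(\rho \upsilon) + \rho \upsilon\, \nabla \cdot \upsilon - \nabla \cdot \sigma - F \right) d\Omega + \int_\Sigma [(\rho \upsilon \otimes (\upsilon - V) - \sigma) \cdot n]_\Sigma\, d\sigma = 0 \\
&\int_\Omega \left( \frac{d}{dt}\left( \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) \right) + \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) \nabla \cdot \upsilon - \nabla \cdot (\sigma \cdot \upsilon - q) - (F \cdot \upsilon + r) \right) d\Omega \\
&\qquad + \int_\Sigma \left[ \left( \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) (\upsilon - V) - (\sigma \cdot \upsilon - q) \right) \cdot n \right]_\Sigma d\sigma = 0
\end{aligned}$$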

The continuity hypothesis states that all equations are true for any material volume Ω. So one finds the local forms of the conservation equations:

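$$\begin{aligned}
&\frac{d\rho}{dt} + \rho\, \nabla \cdot \upsilon = 0, \qquad [\rho (\upsilon - V) \cdot n]_\Sigma = 0 \\
&\frac{d}{dt}(\rho \upsilon) + \rho \upsilon\, \nabla \cdot \upsilon - \nabla \cdot \sigma - F = 0, \qquad [(\rho \upsilon \otimes (\upsilon - V) - \sigma) \cdot n]_\Sigma = 0 \\
&\frac{d}{dt}\left( \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) \right) + \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) \nabla \cdot \upsilon - \nabla \cdot (\sigma \cdot \upsilon - q) - (F \cdot \upsilon + r) = 0, \\
&\qquad \left[ \left( \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) (\upsilon - V) - (\sigma \cdot \upsilon - q) \right) \cdot n \right]_\Sigma = 0
\end{aligned}$$

So at the discontinuities, one obtains

(1.4) $\begin{cases} [\rho (\upsilon - V) \cdot n]_\Sigma = 0 \\ [(\rho \upsilon \otimes (\upsilon - V) - \sigma) \cdot n]_\Sigma = 0 \\ \left[ \left( \rho \left( e + \tfrac{1}{2} \upsilon^2 \right) (\upsilon - V) - (\sigma \cdot \upsilon - q) \right) \cdot n \right]_\Sigma = 0 \end{cases}$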
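As a small illustration of the first jump condition, consider a one-dimensional flow in which the discontinuity moves along the normal direction. The mass condition then reads $\rho_1 (\upsilon_1 - V) = \rho_2 (\upsilon_2 - V)$, which can be solved for the speed V of the discontinuity. The sketch below assumes this one-dimensional reduction; the function names and the sample numbers are hypothetical.

```python
def mass_flux(rho, velocity, front_speed):
    """Mass flux rho * (v - V) through a front moving at speed V (1D)."""
    return rho * (velocity - front_speed)

def front_speed_from_mass_jump(rho1, v1, rho2, v2):
    """Solve rho1 * (v1 - V) = rho2 * (v2 - V) for V.

    Requires a genuine density jump (rho1 != rho2); otherwise the
    condition does not determine V.
    """
    if rho1 == rho2:
        raise ValueError("no density jump: front speed is not determined")
    return (rho2 * v2 - rho1 * v1) / (rho2 - rho1)

if __name__ == "__main__":
    V = front_speed_from_mass_jump(1.0, 0.0, 2.0, 100.0)
    # The flux is the same on both sides, so its jump across the front is zero.
    print(V, mass_flux(1.0, 0.0, V), mass_flux(2.0, 100.0, V))
```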

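Away from the discontinuity surfaces, combining the first three local forms of the conservation equations above, one obtains

(1.5) $\begin{cases} \dot{\rho} + \rho\, \nabla \cdot \upsilon = 0 \\ \rho \dot{\upsilon} - \nabla \cdot \sigma = F \\ \rho \dot{e} + \nabla \cdot q = \sigma : D + r \end{cases}$

or

(1.6) $\begin{cases} \frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \upsilon) = 0 \\ \rho \left( \frac{\partial \upsilon}{\partial t} + \upsilon \cdot \nabla \upsilon \right) - \nabla \cdot \sigma = F \\ \rho \left( \frac{\partial e}{\partial t} + \upsilon \cdot \nabla e \right) + \nabla \cdot q = \sigma : D + r \end{cases}$

with $D = \frac{1}{2} \left( \nabla \upsilon + {}^{T}\nabla \upsilon \right)$ the strain rate tensor.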
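The step from the conservative local forms to (1.5) is not spelled out in the excerpt. One way to fill it in is sketched below; it assumes a symmetric stress tensor, which is what allows $\sigma : \nabla \upsilon$ to be written as $\sigma : D$.

```latex
\begin{align*}
% Momentum: expand the product and use mass conservation.
\frac{d}{dt}(\rho\upsilon) + \rho\upsilon\,\nabla\cdot\upsilon
  &= \rho\dot{\upsilon} + \upsilon\,\bigl(\dot{\rho} + \rho\,\nabla\cdot\upsilon\bigr)
   = \rho\dot{\upsilon}, \\
% Energy: the same expansion applied to rho (e + v^2 / 2).
\frac{d}{dt}\Bigl(\rho\bigl(e + \tfrac{1}{2}\upsilon^{2}\bigr)\Bigr)
  + \rho\bigl(e + \tfrac{1}{2}\upsilon^{2}\bigr)\nabla\cdot\upsilon
  &= \rho\dot{e} + \rho\,\upsilon\cdot\dot{\upsilon}, \\
% Work of the stresses, assuming sigma is symmetric.
\nabla\cdot(\sigma\cdot\upsilon)
  &= (\nabla\cdot\sigma)\cdot\upsilon + \sigma : D, \\
% Subtract the momentum equation dotted with the velocity.
\rho\dot{e} + \nabla\cdot q
  &= \sigma : D + r
   + \bigl(\nabla\cdot\sigma + F - \rho\dot{\upsilon}\bigr)\cdot\upsilon
   = \sigma : D + r .
\end{align*}
```

The last bracket vanishes by the momentum equation, which leaves the internal-energy balance in the form given in (1.5).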

URL:

https://www.sciencedirect.com/science/article/pii/B9780122561900500026

COMPENDIUM OF THE FOUNDATIONS OF CLASSICAL STATISTICAL PHYSICS

Jos Uffink, in Philosophy of Physics, 2007

5.5 Comments

Gibbs' statistical mechanics has produced a formalism with clearly delineated concepts and methods, using only Hamiltonian mechanics and probability theory. It can be, and routinely is, used to calculate equilibrium properties of gases and other systems by introducing a specific form of the Hamiltonian. The main problems that Gibbs has left open are, first, the motivation for the special choice of the equilibrium ensembles and, second, that the quantities serving as thermodynamic analogies are not uniquely defined. However, much careful work has been devoted to showing that, under certain assumptions about tempered interaction of molecules, unique thermodynamic state functions with the desired properties are obtained in the 'thermodynamic limit' (cf. §6.3).

1. Motivating the choice of ensemble. While Gibbs had not much more to offer in recommendation of these three ensembles than their simplicity as candidates for representing equilibrium, modern views often provide an additional story. First, the microcanonical ensemble is particularly singled out for describing an ensemble of systems in thermal isolation with a fixed energy E.

Arguments for this purpose come in different kinds. As argued by Boltzmann (1868), and shown more clearly by Einstein (1902), the microcanonical ensemble is the unique stationary density for an isolated ensemble of systems with fixed energy, if one assumes the ergodic hypothesis. Unfortunately, for this argument, the ergodic hypothesis is false for any system that has a phase space of dimension 2 or higher (cf. paragraph 6.1).

A related but more promising argument relies on the theorem that the measure $P_{mc}$ associated with the microcanonical ensemble via $P_{mc}(A) = \int_A \rho_{mc}(x)\, dx$ is the unique stationary measure among all measures that are absolutely continuous with respect to $P_{mc}$, if one assumes that the system is metrically transitive (again, see paragraph 6.1).

This argument is applicable for more general systems, but its conclusion is weaker. In particular, one would now have to argue that physically interesting systems are indeed metrically transitive, and why measures that are not absolutely continuous with respect to the microcanonical one are somehow to be disregarded. The first problem is still an open question, even for the hard-spheres model (as we shall see in paragraph 6.1). The second question can be answered in a variety of ways.

For example, [Penrose, 1979, p. 1941] adopts a principle that every ensemble should be representable by a (piecewise) continuous density function, in order to rule out "physically unreasonable cases". (This postulate implies absolute continuity of the ensemble measure with respect to the microcanonical measure by virtue of the Radon-Nikodym theorem.) See [Kurth, 1960, p. 78] for a similar postulate. Another argument, proposed by [Malament and Zabell, 1980], assumes that the measure P associated with a physically meaningful ensemble should have a property called 'translation continuity'. Roughly, this notion means that the probability assigned to any measurable set should be a continuous function under small displacements of that set within the energy hypersurface. Malament & Zabell show that this property is equivalent to absolute continuity of P with respect to $\mu_{mc}$, and thus singles out the microcanonical measure uniquely if the system is metrically transitive (see [van Lith, 2001b] for a more extensive discussion).

A third approach, due to Tolman and Jaynes, more or less postulates the microcanonical density as an appropriate description of our knowledge about the microstate of a system with given energy (regardless of whether the system is metrically transitive or not).

Once the microcanonical ensemble is in place as a privileged description of an isolated system with a fixed energy, one can motivate the corresponding status for the other ensembles with relatively less effort. The canonical distribution is shown to provide the description of a small system $S_1$ in weak energetic contact with a larger system $S_2$, acting as a 'heat bath' (see [Gibbs, 1902, p. 180–183]). Here, it is assumed that the total system is isolated and described by a microcanonical ensemble, where the total system has a Hamiltonian $H_{tot} = H_1 + H_2 + H_{int}$ with $H_2 \gg H_1 \gg H_{int}$. More elaborate versions of such an argument are given by Einstein (1902) and Martin-Löf (1979). Similarly, the grand-canonical ensemble can be derived for a small system that can exchange both energy and particles with a large system (see [van Kampen, 1984]).

2. The 'equivalence' of ensembles. It is often argued in physics textbooks that the choice between these different ensembles (say the canonical and microcanonical) is deprived of practical relevance by a claim that they are all "equivalent". (See [Lorentz, 1916, p. 32] for perhaps the earliest version of this argument, or [Thompson, 1972, p. 72; Huang, 1987, p. 161–2] for recent statements.) What is meant by this claim is that if the number of constituents increases, N → ∞, and the total Hamiltonian is proportional to N, the thermodynamic relations derived from each of them will coincide in this limit.

However, these arguments should not be mistaken as settling the empirical equivalence of the various ensembles, even in this limit. For example, it can be shown that the microcanonical ensemble admits the description of certain metastable thermodynamic states (e.g. with negative heat capacity) that are excluded in the canonical ensemble (see [Touchette, 2003; Touchette et al., 2004, and literature cited therein]).

3. The coarse-grained entropy. The coarse-graining approach is reminiscent of Boltzmann's construction of cells in his (1877b; cf. the discussion in paragraph 4.4). The main difference is that here one assumes a partition on phase-space Γ, where Boltzmann adopted it in the μ-space. Nevertheless, the same issues about the origin or status of a privileged partition can be debated (cf. p. 977). If one assumes that the partition is intended to represent what we know about the system, i.e. if one argues that all we know is whether its state falls in a particular cell $\omega_i$, it can be argued that its status is subjective. If one argues that the partition is meant to represent limitations in the precision of human observational possibilities, perhaps enriched by instruments, i.e. that we cannot observe more about the system than that its state is in some cell $\omega_i$, one might argue that its choice is objective, in the sense that there are objective facts about what a given epistemic community can observe or not. Of course, one can then still maintain that the status of the coarse-graining would then be anthropocentric (see also the discussion in §7.5). However, note that Gibbs himself did not argue for a preferential size of the cells in phase space, but for taking the limit in which their size goes to zero in a different order.

4. Statistical equilibrium. Finally, a remark about Gibbs' notion of equilibrium. This is fundamentally different from Boltzmann's 1877 notion of equilibrium as the macrostate corresponding to the region occupying the largest volume in phase space (cf. section 4.4). For Gibbs, statistical equilibrium can only apply to an ensemble. And since any given system can be regarded as belonging to an infinity of different ensembles, it makes no sense to say whether an individual system is in statistical equilibrium or not. In contrast, in Boltzmann's case, equilibrium can be attributed to a single system (namely if the microstate of that system is an element of the set $\Gamma_{eq} \subset \Gamma$). But it is not guaranteed to remain there for all times.

Thus, one might say that in comparison with the orthodox thermodynamical notion of equilibrium (which is both stationary and a property of an individual system) Boltzmann (1877b) and Gibbs each made an opposite choice about which aspect to preserve and which aspect to sacrifice. See [Uffink, 1996b; Callender, 1999; Lavis, 2005] for further discussions.

URL:

https://www.sciencedirect.com/science/article/pii/B9780444515605500129

Probability Distributions I

B.R. Martin, in Statistics for Physical Science, 2012

3.2 Single Variates

In this section we will examine the case of a single random variable. The ideas discussed here will be extended to the multivariate case in Section 3.3.

3.2.1 Probability Distributions

First we will need some definitions that extend those given in Chapter 2 in the discussion of the axioms of probability, starting with the case of a single discrete random variable. If x is a discrete random variable that can take the values $x_k$ $(k = 1, 2, \ldots)$ with probabilities $P[x_k]$, then we can define a probability distribution $f(x)$ by

(3.1a) $P[x] \equiv f(x)$.

Thus,

(3.1b) $P[x_k] = f(x_k)$ for $x = x_k$, otherwise $f(x) = 0$.

To distinguish between the cases of discrete and continuous variables, the probability distribution for the former is often called the probability mass function (or simply a mass function) sometimes abbreviated to pmf. A pmf satisfies the following two conditions:

1. $f(x)$ is a single-valued non-negative real number for all real values of x, i.e., $f(x) \ge 0$;

2. $f(x)$ summed over all values of x is unity:

(3.1c) $\sum_x f(x) = 1$.

We saw in Chapter 1 that we are also interested in the probability that x is less than or equal to a given value. This was called the cumulative distribution function (or simply the distribution function), sometimes abbreviated to cdf, and is given by

(3.2a) $F(x) = \sum_{x_k \le x} f(x_k)$.

So, if x takes on the values $x_k$ $(k = 1, 2, \ldots, n)$, the cumulative distribution function is

(3.2b) $F(x) = \begin{cases} 0 & -\infty < x < x_1 \\ f(x_1) & x_1 \le x < x_2 \\ f(x_1) + f(x_2) & x_2 \le x < x_3 \\ \;\vdots & \;\vdots \\ f(x_1) + \cdots + f(x_n) & x_n \le x < \infty \end{cases}$

$F(x)$ is a nondecreasing function with limits 0 and 1 as $x \to -\infty$ and $x \to +\infty$, respectively. The quantile $x_\alpha$ of order $\alpha$, defined in Chapter 1, is thus the value of x such that $F(x_\alpha) = \alpha$, with $0 \le \alpha \le 1$, and so $x_\alpha = F^{-1}(\alpha)$, where $F^{-1}$ is the inverse function of F. For example, the median is $x_{0.5}$.

As sample sizes become larger, frequency plots tend to approximate smooth curves and if the area of the histogram is normalized to unity, as in Fig. 1.5, the resulting function $f(x)$ is a continuous probability density function (or simply a density function), abbreviated to pdf, introduced in Chapter 1. The definitions above may be extended to continuous random variables with the appropriate changes. Thus, for a continuous random variable x, with a pdf $f(x)$, (3.2a) becomes

(3.3) $F(x) = \int_{-\infty}^{x} f(x')\, dx', \quad (-\infty < x < \infty)$.

It follows from (3.3) that if a member of a population is chosen at random, that is, by a method that makes it equally likely that each member will be chosen, then $F(x)$ is the probability that the member will have a value $\le x$. While all this is clearly consistent with earlier definitions, once again we should note the element of circularity in the concept of randomness defined in terms of probability. In mathematical statistics it is usual to start from the cumulative distribution and define the density function as its derivative. For the mathematically well-behaved distributions usually met in physical science the two approaches are equivalent.

The density function $f(x)$ has the following properties analogous to those for discrete variables.

1. $f(x)$ is a single-valued non-negative real number for all real values of x.

In the frequency interpretation of probability, $f(x)\, dx$ is the probability of observing the quantity x in the range $(x, x + dx)$. Thus, the second condition is:

2. $f(x)$ is normalized to unity:

$\int_{-\infty}^{+\infty} f(x)\, dx = 1$.

It follows from property 2 that the probability of x lying between any two real values a and b for which $a < b$ is given by

(3.4) $P[a \le x \le b] = \int_a^b f(x)\, dx$,

and so, unlike a discrete random variable, the probability of a continuous random variable assuming exactly any of its values is zero. This result may seem rather paradoxical at first until you consider that between any two values a and b there is an infinite number of other values and so the probability of selecting an exact value from this infinitude of possibilities must be zero. The density function cannot therefore be given in a tabular form like that of a discrete random variable.

EXAMPLE 3.1

A family has 5 children. Assuming that the birth of a girl or boy is equally likely, construct a frequency table of possible outcomes and plot the resulting probability mass function $f(g)$ and the associated cumulative distribution function $F(g)$.

The probability of a sequence of births containing g girls (and hence $b = 5 - g$ boys) is $\left(\frac{1}{2}\right)^g \left(\frac{1}{2}\right)^b = \left(\frac{1}{2}\right)^5$. However there are ${}_5C_g$ such sequences, and so the probability of having g girls is $P[g] = {}_5C_g / 32$. The probability mass function $f(g)$ is thus as given in the table below.

g      0     1     2      3      4     5
f(g)   1/32  5/32  10/32  10/32  5/32  1/32

From this table we can find the cumulative distribution function using (3.2b), and $f(g)$ and $F(g)$ are plotted in Figs 3.1(a) and (b), respectively, below.

FIGURE 3.1. Plots of the probability mass function $f(g)$ and the cumulative distribution function $F(g)$.
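The table of Example 3.1 and its cumulative distribution can be reproduced with exact rational arithmetic. The following is a minimal sketch; the function names `girls_pmf` and `cdf_from_pmf` are hypothetical, and `Fraction` reduces values automatically (10/32 is stored as 5/16).

```python
from fractions import Fraction
from math import comb

def girls_pmf(n_children=5):
    """f(g) = C(n, g) / 2^n for g girls among n equally likely births."""
    return {g: Fraction(comb(n_children, g), 2 ** n_children)
            for g in range(n_children + 1)}

def cdf_from_pmf(pmf):
    """F(x_k) as the running sum of f over values <= x_k, as in (3.2a)."""
    running = Fraction(0)
    cdf = {}
    for value in sorted(pmf):
        running += pmf[value]
        cdf[value] = running
    return cdf

if __name__ == "__main__":
    pmf = girls_pmf()
    cdf = cdf_from_pmf(pmf)
    for g in sorted(pmf):
        print(g, pmf[g], cdf[g])
```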

EXAMPLE 3.2

Find the value of N in the continuous density function:

$f(x) = \begin{cases} N e^{-x} x^2 / 2 & x \ge 0 \\ 0 & x < 0 \end{cases}$,

and find its associated distribution function $F(x)$. Plot $f(x)$ and $F(x)$.

Because $f(x)$ has to be correctly normalized, to find N we evaluate the integral:

$\frac{N}{2} \int_0^\infty e^{-x} x^2\, dx = 1.$

Integrating by parts gives

$\frac{1}{N} = \frac{1}{2} \int_0^\infty e^{-x} x^2\, dx = -\frac{1}{2} \left[ e^{-x} (x^2 + 2x + 2) \right]_0^\infty = 1,$

so that $N = 1$. The resulting density function is plotted in Fig. 3.2(a). The associated distribution function is

$F(x) = \frac{1}{2} \int_0^x e^{-u} u^2\, du = -\frac{1}{2} \left[ e^{-u} (u^2 + 2u + 2) \right]_0^x = -\frac{1}{2} e^{-x} (x^2 + 2x + 2) + 1,$

and is shown in Fig. 3.2(b).

FIGURE 3.2. Probability density function $f(x) = e^{-x} x^2 / 2$ $(x \ge 0)$ and the corresponding cumulative distribution function $F(x)$.
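The result of Example 3.2 can be checked numerically by integrating the density with N = 1 and comparing with the closed-form distribution function. This is a minimal sketch using a plain midpoint rule; the helper names and the cut-off at x = 60 for the normalization check are choices made here for illustration.

```python
import math

def density(x):
    """f(x) = exp(-x) * x^2 / 2 for x >= 0, zero otherwise (N = 1)."""
    return 0.5 * x * x * math.exp(-x) if x >= 0.0 else 0.0

def distribution(x):
    """F(x) = 1 - exp(-x) * (x^2 + 2x + 2) / 2 for x >= 0, zero otherwise."""
    if x < 0.0:
        return 0.0
    return 1.0 - 0.5 * math.exp(-x) * (x * x + 2.0 * x + 2.0)

def midpoint_integral(func, lower, upper, steps=200000):
    """Composite midpoint rule for the integral of func over [lower, upper]."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))

if __name__ == "__main__":
    print(midpoint_integral(density, 0.0, 60.0))   # close to 1 (normalization)
    print(midpoint_integral(density, 0.0, 3.0), distribution(3.0))
```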

Some of the earlier definitions of Chapter 1 may now be rewritten in terms of these formal definitions. Thus, the general moments about an arbitrary point $\lambda$ are, for a continuous variate,

(3.5) $\mu'_n = \int_{-\infty}^{+\infty} f(x)\, (x - \lambda)^n\, dx$,

so that the mean and variance, also with respect to the point $\lambda$, are

(3.6) $\mu_\lambda = \int_{-\infty}^{+\infty} f(x)\, (x - \lambda)\, dx$ and $\sigma^2 = \int_{-\infty}^{+\infty} f(x)\, (x - \mu_\lambda)^2\, dx$,

respectively. The integrals in (3.5) may not converge for all n, and some distributions possess only the trivial zero-order moment. For convenience, usually $\lambda = 0$ will be used in what follows.

3.2.2 Expectation Values

The expectation value, also called the expected value, of a random variable is obtained by finding the average value of the variate over all its possible values weighted by the probability of their occurrence. Thus, if x is a discrete random variable with the possible values $x_1, x_2, \ldots, x_n$, then the expectation value of x is defined as

(3.7) $E[x] \equiv \sum_{i=1}^{n} x_i P[x_i] = \sum_x x f(x)$,

where the second sum is over all relevant values of x and $f(x)$ is their probability mass distribution. The analogous quantity for a continuous variate with density function $f(x)$ is

(3.8a) $E[x] = \int_{-\infty}^{+\infty} x f(x)\, dx$.

We can see from this definition that the nth moment of a distribution about any point $\lambda$ is

(3.8b) $\mu'_n = E[(x - \lambda)^n]$.

In particular, the nth central moment is

(3.8c) $\mu_n = E[(x - E[x])^n] = \int_{-\infty}^{+\infty} (x - \mu)^n f(x)\, dx$,

and for $\lambda = 0$ the nth algebraic moment is

(3.8d) $\mu'_n = E[x^n] = \int_{-\infty}^{+\infty} x^n f(x)\, dx$.

Thus, the mean is the first algebraic moment and the variance is the second central moment. It follows from (3.8) that if c is a constant, then

(3.9a) $E[cx] = c\, E[x]$,

and for a set of random variables A, B, C, etc.:

(3.9b) $E[A + B + C + \cdots] = E[A] + E[B] + E[C] + \cdots$

In addition, if the random variables A, B, C, etc. are independent, then

(3.9c) $E[ABC \cdots] = E[A]\, E[B]\, E[C] \cdots$

EXAMPLE 3.3

Three 'fair' dice are thrown and yield face values a, b, and c. What is the expectation value for the sum of their face values?

From (3.7),

$E[a] = \sum_{i=1}^{6} i\, (1/6) = 7/2$,

and since $E[a] = E[b] = E[c]$, then from (3.9b) $E[a + b + c] = 21/2$.
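Example 3.3 can be verified in two ways: through the linearity property (3.9b), and by brute-force enumeration of all 216 equally likely outcomes. The sketch below uses exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def expectation(pmf):
    """E[x] = sum of x * f(x) over a {value: probability} mapping, as in (3.7)."""
    return sum(value * prob for value, prob in pmf.items())

# Probability mass function of one fair die.
die = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_sum_by_enumeration(n_dice):
    """Average the face-value sum over every equally likely outcome."""
    outcomes = list(product(range(1, 7), repeat=n_dice))
    return Fraction(sum(sum(outcome) for outcome in outcomes), len(outcomes))

if __name__ == "__main__":
    print(expectation(die))                 # single die
    print(3 * expectation(die))             # linearity, (3.9b)
    print(expected_sum_by_enumeration(3))   # direct enumeration
```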

EXAMPLE 3.4

Find the mean of the continuous distribution of Example 3.2.

Using (3.8d), the mean is

$\mu = \frac{1}{2} \int_0^\infty x^3 e^{-x}\, dx$.

Integrating by parts gives

$\mu = -\frac{e^{-x}}{2} \left[ x^3 + 3x^2 + 6x + 6 \right]_0^\infty = 3$.

3.2.3 Moment Generating, and Characteristic Functions

The usefulness of moments partly stems from the fact that knowledge of them determines the form of the density function. Formally, if the moments $\mu'_n$ of a random variable x exist and the series

(3.10) $\sum_{n=1}^{\infty} \frac{\mu'_n}{n!}\, r^n$

converges absolutely for some $r > 0$, then the set of moments $\mu'_n$ uniquely determines the density function. There are exceptions to this statement, but fortunately it is true for all the distributions commonly met in physical science. In practice, knowledge of the first few moments essentially determines the general characteristics of the distribution and so it is worthwhile to construct a method that gives a representation of all the moments. Such a function is called a moment generating function (mgf) and is defined by

(3.11) $M_x(t) \equiv E[e^{xt}]$.

For a discrete random variable x, this is

(3.12a) $M_x(t) = \sum e^{xt} f(x)$,

and for a continuous variable,

(3.12b) $M_x(t) = \int_{-\infty}^{+\infty} e^{xt} f(x)\, dx$.

The moments may be generated from (3.11) by first expanding the exponential,

$M_x(t) = E\left[ 1 + xt + \frac{1}{2!} (xt)^2 + \cdots \right] = \sum_{n=0}^{\infty} \frac{1}{n!}\, \mu'_n t^n$,

then differentiating n times and setting $t = 0$, that is:

(3.13) $\mu'_n = \left. \frac{\partial^n M_x(t)}{\partial t^n} \right|_{t=0}$.

For example, setting $n = 0$ and $n = 1$ gives $\mu'_0 = 1$ and $\mu'_1 = \mu$. Also, since the mgf about any point $\lambda$ is

$M_\lambda(t) = E[\exp\{(x - \lambda) t\}]$,

then if $\lambda = \mu$,

(3.14) $M_\mu(t) = e^{-\mu t} M_x(t)$.

An important use of the mgf is to compare two density functions $f(x)$ and $g(x)$. If two random variables possess mgfs that are equal for some interval symmetric about the origin, then $f(x)$ and $g(x)$ are identical density functions. It is also straightforward to show that the mgf of a sum of independent random variables is equal to the product of their individual mgfs.

It is sometimes convenient to consider, instead of the mgf, its logarithm. The Taylor expansion for this quantity is

$\ln M_x(t) = \kappa_1 t + \kappa_2 \frac{t^2}{2} + \cdots$,

where $\kappa_n$ is the cumulant of order n, and

$\kappa_n = \left. \frac{\partial^n \ln M_x(t)}{\partial t^n} \right|_{t=0}$.

Cumulants are simply related to the central moments of the distribution, the first few relations being

$\kappa_1 = \mu$ (the mean), $\kappa_i = \mu_i$ $(i = 2, 3)$, $\kappa_4 = \mu_4 - 3\mu_2^2$.
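The relations between cumulants and moments can be made concrete with a short calculation. The sketch below converts the first four algebraic moments $E[x^n]$ into central moments and then into cumulants, with the first cumulant taken as the mean. As test data it uses the density $e^{-x} x^2 / 2$ of Example 3.2, whose algebraic moments are $E[x^n] = (n + 2)!/2$ by the gamma integral; the function names are hypothetical.

```python
from math import factorial

def central_from_algebraic(m1, m2, m3, m4):
    """Second to fourth central moments from the algebraic moments E[x^n]."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu2, mu3, mu4

def first_four_cumulants(m1, m2, m3, m4):
    """kappa_1 = mean, kappa_2 = mu_2, kappa_3 = mu_3, kappa_4 = mu_4 - 3 mu_2^2."""
    mu2, mu3, mu4 = central_from_algebraic(m1, m2, m3, m4)
    return m1, mu2, mu3, mu4 - 3 * mu2 ** 2

if __name__ == "__main__":
    # Algebraic moments of exp(-x) x^2 / 2 on x >= 0: (n + 2)! / 2.
    moments = [factorial(n + 2) // 2 for n in range(1, 5)]
    print(moments)
    print(first_four_cumulants(*moments))
```

For this density the logarithm of the mgf is $-3 \ln(1 - t)$, so the cumulants should come out as $3\,(n - 1)!$, which gives a quick consistency check on the output.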

For some distributions the integral defining the mgf may not exist and in these circumstances the Fourier transform of the density function, defined as

(3.15) $\phi_x(t) \equiv E[e^{itx}] = \int_{-\infty}^{+\infty} e^{itx} f(x)\, dx = M_x(it)$,

may be used. In statistics, $\phi_x(t)$ is called the characteristic function (cf). The density function is then obtainable by the Fourier transform theorem (known in this context as the inversion theorem):

(3.16) $f(x) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itx} \phi_x(t)\, dt$.

The cf obeys theorems analogous to those obeyed by the mgf, that is: (a) if two random variables possess cfs that are equal for some interval symmetric about the origin then they have identical density functions; and (b) the cf of a sum of independent random variables is equal to the product of their individual cfs. The converse of (b) is however untrue.

EXAMPLE 3.5

Find the moment generating function of the density function used in Example 3.2 and calculate the three moments $\mu'_1$, $\mu'_2$, and $\mu'_3$.

Using definition (3.12b),

$M_x(t) = \int_0^\infty e^{xt} f(x)\, dx = \frac{1}{2} \int_0^\infty e^{xt} x^2 e^{-x}\, dx = \frac{1}{2} \int_0^\infty e^{-x(1-t)} x^2\, dx$,

which integrating by parts gives:

$M_x(t) = \left\{ \frac{-e^{-x(1-t)}}{2 (1-t)^3} \left[ (1-t)^2 x^2 + 2(1-t) x + 2 \right] \right\}_0^\infty = \frac{1}{(1-t)^3}$.

Then, using (3.13), the first three moments of the distribution are found to be

$\mu'_1 = 3, \quad \mu'_2 = 12, \quad \mu'_3 = 60.$
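The three moments can be cross-checked without symbolic differentiation. Differentiating $(1 - t)^{-3}$ n times and setting $t = 0$ gives $3 \cdot 4 \cdots (n + 2) = (n + 2)!/2$, and the same numbers should come out of a direct numerical integration of $x^n f(x)$. The sketch below compares the two; the helper names and the integration cut-off are illustrative choices.

```python
import math

def moment_from_mgf(n):
    """n-th derivative of (1 - t)^(-3) at t = 0, i.e. (n + 2)! / 2."""
    return math.factorial(n + 2) // 2

def moment_by_integration(n, upper=80.0, steps=200000):
    """Midpoint-rule estimate of the integral of x^n * exp(-x) * x^2 / 2."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += 0.5 * x ** (n + 2) * math.exp(-x)
    return total * h

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, moment_from_mgf(n), moment_by_integration(n))
```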

EXAMPLE 3.6

(a) Find the characteristic function of the density function:

$f(x) = \begin{cases} 2x / a^2 & 0 \le x < a \\ 0 & \text{otherwise} \end{cases}$,

and (b) the density function corresponding to a characteristic function $e^{-|t|}$.

(a) From (3.15),

$\phi_x(t) = E[e^{itx}] = \frac{2}{a^2} \int_0^a e^{itx} x\, dx$.

Again, integration by parts gives

$\phi_x(t) = \frac{2}{a^2} \left[ \frac{e^{itx}}{(it)^2} (itx - 1) \right]_0^a = -\frac{2}{a^2 t^2} \left[ e^{ita} (ita - 1) + 1 \right]$.

(b) From the inversion theorem,

$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-|t|} e^{-itx}\, dt = \frac{1}{\pi} \int_0^\infty e^{-t} \cos(tx)\, dt$,

where the symmetry of the circular functions has been used. The second integral may be evaluated by parts to give

$\pi f(x) = \left[ -e^{-t} \cos(tx) \right]_0^\infty - x \int_0^\infty e^{-t} \sin(tx)\, dt = 1 - x \left\{ \left[ -e^{-t} \sin(tx) \right]_0^\infty + x \int_0^\infty e^{-t} \cos(tx)\, dt \right\} = 1 - \pi x^2 f(x)$.

Thus,

$f(x) = \frac{1}{\pi (1 + x^2)}, \quad -\infty < x < \infty$.

This is the density of the Cauchy distribution that we will meet again in Section 4.5.
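Part (b) of Example 3.6 can be confirmed numerically by evaluating the inversion integral $\frac{1}{\pi} \int_0^\infty e^{-t} \cos(tx)\, dt$ directly. The sketch below truncates the integral at a finite upper limit, where the integrand is negligible, and applies a midpoint rule; the truncation point and step count are illustrative choices.

```python
import math

def density_from_cf(x, t_max=40.0, steps=200000):
    """(1/pi) * integral over [0, t_max] of exp(-t) * cos(t x) dt."""
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(-t) * math.cos(t * x)
    return total * h / math.pi

def cauchy_density(x):
    """Closed form 1 / (pi * (1 + x^2)) obtained in the example."""
    return 1.0 / (math.pi * (1.0 + x * x))

if __name__ == "__main__":
    for x in (0.0, 0.5, 2.0):
        print(x, density_from_cf(x), cauchy_density(x))
```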

URL:

https://www.sciencedirect.com/science/article/pii/B9780123877604000032

k-Sample test based on the common area of kernel density estimators

P. Martínez-Camblor, ... N. Corral, in Journal of Statistical Planning and Inference, 2008

In this section, a Cramér–Chernoff type theorem for the statistic defined in (2) will be proved. This large deviation result will enable us to investigate the Bahadur efficiency of the AC test (see Section 5). Similar results have been given by Louani (2000), for the $L_1$ distance between a kernel density estimator and its target, and by Martínez-Camblor et al. (2006), for the $L_1$ measure between two kernel density estimators based on independent samples. In the present case, we will need the following conditions:

(C1) The k samples $\{x_{ij}\}_{j=1}^{n_i}$, $1 \le i \le k$, are drawn from equal, absolutely continuous populations, with common density f;

(C2) $\lim_{n \to \infty} n_i / n_j = 1$ for each $i, j \in \{1, \ldots, k\}$, where $n \to \infty$ means $n_i \to \infty$ for each $i = 1, \ldots, k$;

(C3) the kernel function K is a continuous density function symmetrical about zero;

(C4) $\lim_{n_i \to \infty} h_{n_i} = 0$, and $\lim_{n_i \to \infty} n_i h_{n_i} = \infty$ for each $i = 1, \ldots, k$.

There is a limitation imposed by condition (C2), which states that the sample sizes should grow at the same rate. In practice, the available sample sizes may be very different from each other. In the simulations reported in Section 3, we have seen that this issue may influence the power of the proposed test statistic when the differences among sample sizes get larger.
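For orientation, the common-area statistic that appears in the proof below, $AC = \int \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\}$, can be computed for small samples as follows. This is a minimal sketch under stated assumptions: a Gaussian kernel (a continuous density symmetric about zero, so it meets (C3)), bandwidths supplied by the caller rather than chosen by any rule from the article, and a trapezoidal rule on a finite grid. All function names are hypothetical.

```python
import math

def gaussian_kde(sample, bandwidth):
    """Return the kernel density estimator t -> f_hat(t) with a Gaussian kernel."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    def f_hat(t):
        return norm * sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for x in sample)
    return f_hat

def common_area(samples, bandwidths, grid_points=2001):
    """Trapezoidal estimate of the integral of min(f_hat_1, ..., f_hat_k)."""
    lower = min(min(s) - 5.0 * h for s, h in zip(samples, bandwidths))
    upper = max(max(s) + 5.0 * h for s, h in zip(samples, bandwidths))
    estimators = [gaussian_kde(s, h) for s, h in zip(samples, bandwidths)]
    step = (upper - lower) / (grid_points - 1)
    total = 0.0
    for i in range(grid_points):
        t = lower + i * step
        weight = 0.5 if i in (0, grid_points - 1) else 1.0
        total += weight * min(f(t) for f in estimators)
    return total * step

if __name__ == "__main__":
    a = [0.1, 0.5, 0.9, 1.3, 2.0]
    b = [x + 100.0 for x in a]
    print(common_area([a, a], [0.5, 0.5]))   # identical samples: close to 1
    print(common_area([a, b], [0.5, 0.5]))   # well-separated samples: close to 0
```

Values of AC near 1 indicate that the estimated densities largely overlap, while values near 0 indicate almost disjoint estimates, which is why the theorem is stated for the event $1 - AC > \lambda$.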

Theorem 3

Under (C1)–(C4), we have as $\lambda \to 0$

$\lim_{n \to \infty} \left( \sum_{i=1}^{k} \frac{1}{k n_i} \right) \log P\{1 - AC > \lambda\} = -\frac{k}{2(k-1)}\, \lambda^2 (1 + o(1))$

Proof

We first consider the case $n_1 = n_2 = \cdots = n_k = n$. We will prove the inequality

(4) $\liminf_{n \to \infty} \frac{1}{n} \log P\{1 - AC > \lambda\} \ge -\frac{k}{2(k-1)}\, \lambda^2 (1 + o(1))$

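Let $A = \{A_1, \ldots, A_k\}$ be a family of open intervals such that $A_i \cap A_j = \emptyset$ if $i \ne j$, $\bigcup_{i=1}^{k} A_i = (-\infty, \infty)$ and $\int_{A_i} f(t)\, dt = 1/k$, $i = 1, \ldots, k$. We have

$$\begin{aligned}
AC &= \int \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} = \int_{\bigcup_{i=1}^{k} A_i} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} = \sum_{i=1}^{k} \int_{A_i} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \\
&\le \sum_{i=1}^{k} \int_{A_i} \hat{f}_{n_i}(t)\, dt = \frac{1}{n} \sum_{j=1}^{n} \sum_{i=1}^{k} \int_{A_i} \frac{1}{h_{n_i}} K\!\left( \frac{x_{ij} - t}{h_{n_i}} \right) dt \equiv \frac{1}{n} \sum_{j=1}^{n} Z_{k,j}(n)
\end{aligned}$$

and hence

$\frac{1}{n} \log P\{1 - AC > \lambda\} \ge \frac{1}{n} \log P\left\{ \frac{1}{n} \sum_{j=1}^{n} (1 - Z_{k,j}(n)) > \lambda \right\}$.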

By the Cramér–Chernoff theorem (Van der Vaart, 1998, Proposition 14.23), the equality

$\lim_{n \to \infty} \frac{1}{n} \log P\left\{ \frac{1}{n} \sum_{j=1}^{n} (1 - Z_{k,j}(n)) > \lambda \right\} = \inf_{t > 0} \left\{ -\lambda t + \log\left( E[e^{t(1 - Z_k)}] \right) \right\}$

follows, where $Z_k = \lim_{n \to \infty} Z_{k,j}(n)$. Now, use (C3) and the dominated convergence theorem to get for $i = 1, \ldots, k$

$\lim_{n \to \infty} \int_{A_i} \frac{1}{h_{n_i}} K\!\left( \frac{x - t}{h_{n_i}} \right) dt = \begin{cases} 1 & \text{if } x \in A_i^{i} \\ \frac{1}{2} & \text{if } x \in A_i^{c} \cap \bar{A}_i^{c} \\ 0 & \text{if } x \in \bar{A}_i^{i} \end{cases}$

where for each Borel set A the notations $\bar{A}$, $A^{i}$ and $A^{c}$ stand for the complement, the interior and the closure, respectively. Since by definition $P\{x_{ij} \in A_i^{c} \cap \bar{A}_i^{c}\} = 0$, the variable $Z_k$ follows a binomial distribution $\mathrm{Bin}(k, a)$, where $a = P\{x_{ij} \in A_i\} = 1/k$. Then,

$E[e^{t(1 - Z_k)}] = e^{t} (a e^{-t} + 1 - a)^k$

and

$\inf_{t > 0} \left\{ -\lambda t + \log\left( E[e^{t(1 - Z_k)}] \right) \right\} = \inf_{t > 0} \left\{ (1 - \lambda) t + k \log\left( a e^{-t} + (1 - a) \right) \right\} = (\lambda - 1) \log \frac{(1 - \lambda)(1 - a)}{a (k + \lambda - 1)} + k \log \frac{(1 - a) k}{k + \lambda - 1}$
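The closed form of the infimum, and its behaviour for small $\lambda$, can be checked numerically. The sketch below evaluates the objective $(1 - \lambda) t + k \log(a e^{-t} + 1 - a)$ with $a = 1/k$ on a grid of positive t, compares the smallest value with the closed-form expression, and compares both with the quadratic $-\frac{k}{2(k-1)} \lambda^2$. The grid search is a crude stand-in for a proper minimizer, and the function names are hypothetical.

```python
import math

def rate_objective(t, lam, k):
    """(1 - lam) * t + k * log(a * exp(-t) + 1 - a) with a = 1/k."""
    a = 1.0 / k
    return (1.0 - lam) * t + k * math.log(a * math.exp(-t) + 1.0 - a)

def rate_closed_form(lam, k):
    """Closed-form value of the infimum over t > 0, with a = 1/k."""
    a = 1.0 / k
    first = (lam - 1.0) * math.log((1.0 - lam) * (1.0 - a) / (a * (k + lam - 1.0)))
    second = k * math.log((1.0 - a) * k / (k + lam - 1.0))
    return first + second

def rate_by_grid_search(lam, k, t_max=5.0, steps=50000):
    """Smallest objective value on an evenly spaced grid of t in (0, t_max]."""
    return min(rate_objective(i * t_max / steps, lam, k) for i in range(1, steps + 1))

def quadratic_approximation(lam, k):
    """Leading small-lambda term -k * lam^2 / (2 * (k - 1))."""
    return -k * lam ** 2 / (2.0 * (k - 1.0))

if __name__ == "__main__":
    lam, k = 0.1, 3
    print(rate_closed_form(lam, k), rate_by_grid_search(lam, k),
          quadratic_approximation(lam, k))
```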

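Using a Taylor expansion in a neighborhood of $\lambda = 0$ and the fact that $a = 1/k$, we get (4). The next step is to establish

(5) $\limsup_{n \to \infty} \frac{1}{n} \log P\{1 - AC > \lambda\} \le -\frac{k}{2(k-1)}\, \lambda^2 (1 + o(1))$

in the case of equal sample sizes. Let $\pi(A)$ be the set which contains the $k^k$ possible combinations (repetitions allowed) of elements of $A = \{A_1, \ldots, A_k\}$. We have

$1 - AC \le \sup_{B \in \pi(A)} \sum_{j=1}^{k} \left( \int_{B_j} \hat{f}_{n_j} - \int_{A_j} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right)$

where we use the notation $B = \{B_1, \ldots, B_k\}$. Then,

$$\begin{aligned}
P\{1 - AC > \lambda\} &\le P\left\{ \sup_{B \in \pi(A)} \sum_{j=1}^{k} \left( \int_{B_j} \hat{f}_{n_j} - \int_{A_j} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right) > \lambda \right\} \\
&\le \sum_{B \in \pi(A)} P\left\{ \sum_{j=1}^{k} \left( \int_{B_j} \hat{f}_{n_j} - \int_{A_j} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right) > \lambda \right\}
\end{aligned}$$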

Now,

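$\sum_{j=1}^{k} \left( \int_{B_j} \hat{f}_{n_j} - \int_{A_j} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{k} \left( \int_{B_j} \frac{1}{h_{n_j}} K\!\left( \frac{x_{ji} - t}{h_{n_j}} \right) dt - \int_{A_j} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right)$

Use (C3) and the dominated convergence theorem to get (in the same notation as above) for $i = 1, \ldots, k$

$\lim_{n \to \infty} \left( \int_{B_i} \frac{1}{h_{n_i}} K\!\left( \frac{x - t}{h_{n_i}} \right) dt - \int_{A_i} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right) = \begin{cases} 1 - a & \text{if } x \in B_i^{i} \\ \frac{1}{2} - a & \text{if } x \in B_i^{c} \cap \bar{B}_i^{c} \\ -a & \text{if } x \in \bar{B}_i^{i} \end{cases}$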

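By using arguments similar to those needed for proving (4), we obtain

$\limsup_{n \to \infty} \frac{1}{n} \log P\left\{ \sum_{j=1}^{k} \left( \int_{B_j} \hat{f}_{n_j} - \int_{A_j} \min\{\hat{f}_{n_1}, \ldots, \hat{f}_{n_k}\} \right) > \lambda \right\} = \inf_{t > 0} \left\{ -\lambda t + k \log\left( a + (1 - a) e^{-t} \right) \right\} = (\lambda + 1 - k) \log \frac{k - \lambda - 1}{k - 1} - (\lambda + 1) \log(\lambda + 1)$

Using a Taylor expansion in a neighborhood of $\lambda = 0$ we get (5) as $\lambda \to 0$.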

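It remains to prove the result for arbitrary sample sizes. Introduce $m = \min\{n_1, \ldots, n_k\}$, $M = \max\{n_1, \ldots, n_k\}$ and

$AC_m = \int \min\left\{ \frac{1}{m} \sum_{i=1}^{m} \frac{1}{h_{n_1}} K\!\left( \frac{x_{i1} - t}{h_{n_1}} \right), \ldots, \frac{1}{m} \sum_{i=1}^{m} \frac{1}{h_{n_k}} K\!\left( \frac{x_{ik} - t}{h_{n_k}} \right) \right\} dt$.

It has been proven that

(6) $\lim_{m \to \infty} \frac{1}{m} \log P\{1 - AC_m > \lambda\} = -\frac{k}{2(k-1)}\, \lambda^2 (1 + o(1))$

On the other hand, for each $j = 1, \ldots, k$,

$\frac{1}{n_j} \sum_{i=1}^{n_j} \frac{1}{h_{n_j}} K\!\left( \frac{x_{ij} - t}{h_{n_j}} \right) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{h_{n_j}} K\!\left( \frac{x_{ij} - t}{h_{n_j}} \right) + \frac{m - n_j}{m n_j} \sum_{i=1}^{m} \frac{1}{h_{n_j}} K\!\left( \frac{x_{ij} - t}{h_{n_j}} \right) + \frac{1}{n_j} \sum_{i=m+1}^{n_j} \frac{1}{h_{n_j}} K\!\left( \frac{x_{ij} - t}{h_{n_j}} \right)$

From this we obtain

$AC_m + \frac{2(k-1)(m - M)}{m} \le AC \le AC_m + \frac{2(k-1)(M - m)}{m}$

Use (6), the continuity of the probability, and condition (C2) to conclude.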

URL:

https://www.sciencedirect.com/science/article/pii/S0378375808000499