Unit 4 - Hypothesis testing


This unit presents more tests based on the CLT and sometimes Slutsky's theorem: the T-test, used when the data are Gaussian, $\sigma^2$ is unknown, and Slutsky does not apply; Wald's test, which uses the asymptotic normality of the MLE; implicit hypotheses, for testing claims about multivariate parameters; and goodness-of-fit tests, for answering questions like "does my data follow a Gaussian distribution?".

Asymptotic test - Clinical trials example

Let $X_1,\dots,X_n$ be i.i.d. test group samples distributed according to $\mathcal{N}(\Delta_d,\sigma_d^2)$ and let $Y_1,\dots,Y_m$ be i.i.d. control group samples distributed according to $\mathcal{N}(\Delta_c,\sigma_c^2)$. Assume that $X_1,\dots,X_n,Y_1,\dots,Y_m$ are independent.

Hypotheses:

$$H_0: \Delta_d=\Delta_c \quad \text{vs.} \quad H_1:\Delta_d>\Delta_c$$

Since the data are Gaussian, we have exactly (no CLT needed):

$$\bar X_n\sim \mathcal{N}\left(\Delta_d,\frac{\sigma_d^2}{n}\right) \quad \text{and} \quad \bar Y_m\sim \mathcal{N}\left(\Delta_c,\frac{\sigma_c^2}{m}\right)$$

From this we get:

$$\frac{\bar X_n-\bar Y_m-(\Delta_d-\Delta_c)}{\sqrt{\frac{\sigma_d^2}{n}+\frac{\sigma_c^2}{m}}}\sim \mathcal N(0,1)$$

Assume that $m = cn$ and $n\to\infty$. Using Slutsky's lemma, we can replace the variances $\sigma_d^2, \sigma_c^2$ by the sample variances $\widehat{\sigma_d^2}, \widehat{\sigma_c^2}$:

$$\frac{\bar X_n-\bar Y_m-(\Delta_d-\Delta_c)}{\sqrt{\frac{\widehat{\sigma_d^2}}{n}+\frac{\widehat{\sigma_c^2}}{m}}}\xrightarrow[n\to\infty]{(d)} \mathcal N(0,1)$$

where:

$$\widehat{\sigma_d^2}=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X_n)^2 \quad \text{and} \quad \widehat{\sigma_c^2}=\frac{1}{m-1}\sum_{i=1}^m (Y_i-\bar Y_m)^2$$

This is a one-sided, two-sample test. We divide by $(n-1)$ instead of $n$ because this makes the variance estimator unbiased.

However, when the sample size is small, we cannot realistically apply Slutsky's lemma, so we cannot replace the variance $\sigma^2$ by the sample variance $\widehat{\sigma^2}$: Slutsky's theorem only gives a good approximation when the sample size is very large.
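As a sketch, the large-sample one-sided test above can be computed with numpy and scipy; the group sizes, means, and variances below are made-up illustration values, not data from the course.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Hypothetical trial data: these group sizes, means, and variances are made up
x = rng.normal(1.2, 2.0, size=500)   # test group
y = rng.normal(1.0, 2.0, size=400)   # control group
n, m = len(x), len(y)

# Unbiased sample variances (ddof=1 divides by n-1)
var_x, var_y = x.var(ddof=1), y.var(ddof=1)

# Asymptotic test statistic under H0: Delta_d = Delta_c
z = (x.mean() - y.mean()) / np.sqrt(var_x / n + var_y / m)

alpha = 0.05
q = norm.ppf(1 - alpha)   # (1 - alpha)-quantile of N(0,1)
reject = z > q            # one-sided: reject when z exceeds q
```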

The $\chi^2$ distribution

For a positive integer $d$, the $\chi^2$ (pronounced "kai-squared") distribution with $d$ degrees of freedom is the law of the random variable $Z_1^2 + Z_2^2 + \cdots + Z_d^2$, where $Z_1,\ldots,Z_d \stackrel{iid}{\sim} \mathcal{N}(0,1)$.

If $\mathbf{Z} \sim \mathcal{N}_d(0, I_d)$, then $\Vert \mathbf{Z} \Vert_2^2 \sim \chi^2_d$.

Also, $\chi^2_2=\textsf{Exp}(\frac12)$: the $\chi^2$ distribution with 2 degrees of freedom is the exponential distribution with parameter $1/2$.

If $X \sim \chi^2_d$, then

  • $\mathbb E[X]=d$
  • $\textsf{Var}[X]=2d$
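These two moments can be checked numerically with scipy's `chi2` distribution plus a small Monte Carlo simulation; the degrees of freedom, seed, and sample size below are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2

d = 7
# E[X] = d and Var[X] = 2d for X ~ chi^2_d
assert chi2(d).mean() == d
assert chi2(d).var() == 2 * d

# Monte Carlo check: a sum of d squared standard normals is chi^2_d
rng = np.random.default_rng(1)
samples = (rng.standard_normal((100_000, d)) ** 2).sum(axis=1)
assert abs(samples.mean() - d) < 0.1
```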

The sample variance

Cochran's theorem states that if $X_1,\ldots,X_n \stackrel{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$, then the sample variance:

$$S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{1}{n}\left(\sum_{i=1}^n X_i^2\right) - (\bar X_n)^2$$

satisfies:

  • $\bar X_n$ is independent of $S_n$
  • $\frac{n S_n}{\sigma^2} \sim \chi^2_{n-1}$

The distribution is $\chi^2_{n-1}$ because there are only $(n-1)$ degrees of freedom: the variables $(X_i - \bar X_n)$ satisfy one linear constraint, $\sum_{i=1}^n (X_i - \bar X_n) = 0$.

We often prefer the unbiased estimator of $\sigma^2$:

$$\widetilde{S}_n = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X_n)^2 = \frac{n}{n-1} S_n$$

Its expectation is:

$$\begin{aligned} \mathbb E[\widetilde{S}_n] &= \frac{n}{n-1}\mathbb E[S_n] = \frac{n}{n-1}\mathbb E\left[\frac{\sigma^2\chi^2_{n-1}}{n}\right] \\ &= \frac{\sigma^2}{n-1}\mathbb E\left[\chi^2_{n-1}\right] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2 \end{aligned}$$
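A small numerical illustration of the biased and unbiased estimators (the sample values are arbitrary); numpy's `ddof` argument selects the divisor $n - \text{ddof}$.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # arbitrary sample
n = len(x)

s_n = ((x - x.mean()) ** 2).sum() / n            # biased S_n (divides by n)
s_tilde = ((x - x.mean()) ** 2).sum() / (n - 1)  # unbiased (divides by n-1)

# numpy's ddof argument selects the divisor n - ddof
assert np.isclose(x.var(ddof=0), s_n)
assert np.isclose(x.var(ddof=1), s_tilde)
assert np.isclose(s_tilde, n / (n - 1) * s_n)
```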

Student’s T distribution

For a positive integer $d$, the Student's T distribution with $d$ degrees of freedom (denoted by $t_d$) is the law of the random variable $\frac{Z}{\sqrt{V/d}}$, where $Z\sim \mathcal N(0,1)$, $V \sim \chi^2_d$, and $Z \perp\!\!\!\perp V$ ($Z$ is independent of $V$).

Student’s T test

One-Sample, Two-Sided

The test statistic:

$$T_n = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sqrt{\widetilde{S}_n}} = \sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2}}\right)$$

where $\bar X_n$ is the sample mean of i.i.d. Gaussian observations with mean $\mu$ and variance $\sigma^2$, and $\widetilde{S}_n$ is the unbiased sample variance.

Since $\sqrt{n}(\bar{X}_n - \mu)/\sigma \sim \mathcal N(0,1)$ and $\widetilde{S}_n/\sigma^2 \sim \chi^2_{n-1}/(n-1)$, with the two independent by Cochran's theorem, we get $T_n \sim t_{n-1}$, Student's T distribution with $(n-1)$ degrees of freedom. So the distribution of $T_n$ is pivotal, and its quantiles can be found in tables.

Student's T test of level $\alpha$ is specified by:

$$\psi_{\alpha} = \mathbf{1}(\vert T_n\vert > q_{\alpha/2})$$

where $q_{\alpha/2}$ is the $(1-\alpha/2)$-quantile of $t_{n-1}$.

Be careful: Student's T test requires the data $X_1,\ldots,X_n$ to be Gaussian. In return, the test is non-asymptotic: for any fixed $n$, we can compute the exact level of the test rather than only its asymptotic level.
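A sketch of this one-sample, two-sided T test on synthetic Gaussian data, cross-checked against scipy's `ttest_1samp` (the sample, seed, and $\mu_0$ below are illustrative):

```python
import numpy as np
from scipy.stats import t, ttest_1samp

rng = np.random.default_rng(2)
x = rng.normal(0.3, 1.0, size=20)   # small Gaussian sample (illustrative)
mu0 = 0.0                            # H0: mu = mu0
n = len(x)

# T_n with the unbiased sample variance (ddof=1)
t_n = np.sqrt(n) * (x.mean() - mu0) / np.sqrt(x.var(ddof=1))

alpha = 0.05
q = t.ppf(1 - alpha / 2, df=n - 1)   # (1 - alpha/2)-quantile of t_{n-1}
reject = abs(t_n) > q

# scipy computes the same statistic
stat, pvalue = ttest_1samp(x, popmean=mu0)
```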

One-Sample, One-Sided

Student's T test of level $\alpha$ is specified by:

$$\psi_{\alpha} = \mathbf{1}(T_n > q_{\alpha})$$

where $q_{\alpha}$ is the $(1-\alpha)$-quantile of $t_{n-1}$.

Two-Sample

Returning to the clinical trials example, we have:

$$\frac{\bar X_n-\bar Y_m-(\Delta_d-\Delta_c)}{\sqrt{\frac{\sigma_d^2}{n}+\frac{\sigma_c^2}{m}}}\sim \mathcal N(0,1)$$

When the sample size is small, we can no longer use Slutsky's lemma, which means we cannot replace the variance $\sigma^2$ by the sample variance $\widehat{\sigma^2}$.

But we have approximately:

$$\frac{\bar X_n-\bar Y_m-(\Delta_d-\Delta_c)}{\sqrt{\frac{\widehat{\sigma_d^2}}{n}+\frac{\widehat{\sigma_c^2}}{m}}} \stackrel{\text{approx.}}{\sim} t_N$$

where the number of degrees of freedom $N$ is given by the Welch-Satterthwaite formula:

$$\min(n,m) \leq N = \frac{\left(\widehat{\sigma}_X^2/n + \widehat{\sigma}_Y^2/m\right)^2}{\frac{\widehat{\sigma}_X^4}{n^2(n-1)}+\frac{\widehat{\sigma}_Y^4}{m^2(m-1)}} \leq n+m$$
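In practice this is Welch's T test, available in scipy as `ttest_ind` with `equal_var=False` (the `alternative` keyword requires a reasonably recent scipy; the two small groups below are synthetic illustration data):

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
x = rng.normal(1.5, 2.0, size=12)   # small test group (synthetic)
y = rng.normal(1.0, 1.0, size=9)    # small control group (synthetic)
n, m = len(x), len(y)

# Welch's T test: equal_var=False uses Welch-Satterthwaite degrees of freedom
stat, pvalue = ttest_ind(x, y, equal_var=False, alternative="greater")

# The Welch-Satterthwaite N by hand; it always lies between min(n,m)-1 and n+m
vx, vy = x.var(ddof=1), y.var(ddof=1)
N = (vx / n + vy / m) ** 2 / (vx**2 / (n**2 * (n - 1)) + vy**2 / (m**2 * (m - 1)))
```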

Wald’s test

According to the asymptotic normality of the MLE:

$$\sqrt{n}(\widehat{\theta}_n^{\textsf{MLE}} - \theta^*) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, \mathcal{I}(\theta^*)^{-1})$$

where $\theta^* \in \mathbb R^d$, and $\mathcal I(\theta^*)$ denotes the Fisher information.

Standardizing the statement of asymptotic normality above gives:

$$\sqrt{n}\, \mathcal{I}(\theta^*)^{1/2}(\widehat{\theta}_n^{\textsf{MLE}} - \theta^*) \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0, I_d)$$

Wald's test is based on the squared norm of this quantity:

$$\left\Vert \sqrt{n}\, \mathcal{I}(\theta^*)^{1/2}(\widehat{\theta}_n^{\textsf{MLE}} - \theta^*) \right\Vert^2 \xrightarrow[n\to\infty]{(d)} \chi^2_d$$

which can also be written (replacing $\mathcal I(\theta^*)$ by $\mathcal I(\widehat{\theta}_n^{\textsf{MLE}})$, justified by Slutsky's theorem):

$$n(\widehat{\theta}_n^{\textsf{MLE}} - \theta^*)^{\textsf T}\, \mathcal{I}(\widehat{\theta}_n^{\textsf{MLE}})\, (\widehat{\theta}_n^{\textsf{MLE}} - \theta^*) \xrightarrow[n \to \infty]{(d)} \chi^2_d$$

Wald’s Test in 1 Dimension

In 1 dimension, Wald's test coincides with the two-sided test based on the asymptotic normality of the MLE.

Given the hypotheses:

$$\begin{aligned} H_0&: \theta^*= \theta_0 \\ H_1&: \theta^*\ne \theta_0 \end{aligned}$$

a two-sided test of level $\alpha$, based on the asymptotic normality of the MLE, is

$$\psi_\alpha = \mathbf{1}\left(\sqrt{n\mathcal I(\theta_0)}\,\left\vert \widehat{\theta}^{\textsf{MLE}} - \theta_0 \right\vert > q_{\alpha/2}(\mathcal{N}(0,1))\right)$$

where the inverse Fisher information $\mathcal I(\theta_0)^{-1}$ is the asymptotic variance of $\widehat{\theta}^{\textsf{MLE}}$ under the null hypothesis.

On the other hand, Wald's test of level $\alpha$ is

$$\begin{aligned} \psi^{\textsf{Wald}}_\alpha &= \mathbf{1}\left(n\mathcal I(\theta_0)\left(\widehat{\theta}^{\textsf{MLE}} - \theta_0\right)^2 > q_{\alpha}(\chi^2_1)\right) \\ &= \mathbf{1}\left(\sqrt{n\mathcal I(\theta_0)}\,\left\vert \widehat{\theta}^{\textsf{MLE}} - \theta_0 \right\vert > \sqrt{q_{\alpha}(\chi^2_1)}\right) \end{aligned}$$

Since $q_{\alpha/2}(\mathcal N(0,1)) = \sqrt{q_\alpha(\chi^2_1)}$, the two-sided test of level $\alpha$ is the same as Wald's test at level $\alpha$.
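The quantile identity behind this equivalence can be checked numerically: if $Z \sim \mathcal N(0,1)$ then $Z^2 \sim \chi^2_1$, so the two rejection thresholds must agree.

```python
import numpy as np
from scipy.stats import norm, chi2

alpha = 0.05
z_threshold = norm.ppf(1 - alpha / 2)              # q_{alpha/2}(N(0,1))
wald_threshold = np.sqrt(chi2(1).ppf(1 - alpha))   # sqrt(q_alpha(chi^2_1))

# If Z ~ N(0,1) then Z^2 ~ chi^2_1, so P(|Z| > c) = P(Z^2 > c^2)
assert np.isclose(z_threshold, wald_threshold)
```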

Example: Performing Wald’s Test on a Gaussian Data Set

Suppose $X_1,\ldots,X_n \stackrel{iid}{\sim} \mathcal N(\mu, \sigma^2)$. The goal is to test between:

$$\begin{aligned} H_0&: (\mu, \sigma^2) = (0,1) \\ H_1&: (\mu, \sigma^2) \ne (0,1) \end{aligned}$$

Wald's test of level $\alpha$ is:

$$\begin{aligned} \psi^{\textsf{Wald}}_\alpha &= \mathbf{1}\left(W_n > q_\alpha(\chi^2_2)\right) \\ &= \mathbf{1}\left(n\left(\widehat{\theta}_n - \begin{pmatrix} 0\\ 1\end{pmatrix}\right)^{\!\textsf T}\mathcal{I}((0,1))\left(\widehat{\theta}_n - \begin{pmatrix} 0\\ 1\end{pmatrix}\right) > q_\alpha(\chi^2_2)\right) \end{aligned}$$

where:

$$\widehat{\theta}_n = \begin{pmatrix} \widehat{\mu}_n^{\textsf{MLE}}\\ \widehat{\sigma^2}_n^{\textsf{MLE}}\end{pmatrix} = \begin{pmatrix} \bar{X}_n\\ \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 \end{pmatrix}$$

and

$$\mathcal{I}(\mu, \sigma^2) = \begin{pmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}$$
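A sketch of this Wald test on synthetic data drawn under $H_0$ (the sample size and seed are arbitrary; the Fisher information is evaluated at $\theta_0 = (0,1)$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=200)   # synthetic data drawn under H0

n = len(x)
theta_hat = np.array([x.mean(), x.var(ddof=0)])   # (mu, sigma^2) MLE
theta0 = np.array([0.0, 1.0])

# Fisher information of N(mu, sigma^2) evaluated at theta0 = (0, 1)
fisher = np.array([[1.0, 0.0],
                   [0.0, 0.5]])      # [[1/sigma^2, 0], [0, 1/(2 sigma^4)]]

w_n = n * (theta_hat - theta0) @ fisher @ (theta_hat - theta0)
reject = w_n > chi2(2).ppf(0.95)     # Wald test at level 5%
```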

Likelihood Ratio Test

Basic Form

Given the hypotheses:

$$\begin{aligned} H_0&: \theta^*= \theta_0 \\ H_1&: \theta^*= \theta_1 \end{aligned}$$

The likelihood ratio test in this setup is of the form:

$$\psi_C = \mathbf{1}\left(\frac{L_n(x_1,\ldots,x_n; \theta_1)}{L_n(x_1,\ldots,x_n; \theta_0)} > C\right)$$

where $C$ is a threshold to be specified.

Likelihood Ratio Test (based on log-likelihood)

Consider an i.i.d. sample $X_1,\ldots,X_n$ with statistical model $\left(E, (\mathbf P_\theta)_{\theta \in \Theta}\right)$, where $\theta \in \mathbb R^d$.

Suppose the null hypothesis has the form:

$$H_0: (\theta^*_{r+1}, \ldots, \theta^*_d) = (\theta^{(0)}_{r+1}, \ldots, \theta^{(0)}_d)$$

for some fixed and given numbers $\theta^{(0)}_{r+1}, \ldots, \theta^{(0)}_d$.

Thus $\Theta_0$, the region defined by the null hypothesis, is

$$\Theta_0 := \{\mathbf{v} \in \mathbb{R}^d : (v_{r+1}, \ldots, v_d) = (\theta^{(0)}_{r+1}, \ldots, \theta^{(0)}_d)\}$$

where $(\theta^{(0)}_{r+1}, \ldots, \theta^{(0)}_d)$ consists of known values.

The likelihood ratio test involves the test-statistic:

$$T_n = 2\left(\ell_n(\widehat{\theta}_n^{\textsf{MLE}}) - \ell_n(\widehat{\theta}_n^{c})\right)$$

where $\ell_n$ is the log-likelihood.

The estimator $\widehat{\theta}_n^{c}$ is the constrained MLE, defined as:

$$\widehat{\theta}_n^{c} = \mathop{\textsf{argmax}}_{\theta \in \Theta_0} \ell_n(X_1, \ldots, X_n; \theta)$$

Wilks’ Theorem

Assume $H_0$ is true and the MLE technical conditions are satisfied. Then $T_n$ is a pivotal statistic: it converges to a pivotal distribution,

$$T_n \xrightarrow[n \to \infty]{(d)} \chi^2_{d-r}$$
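As an illustrative sketch, consider testing $H_0: \sigma^2 = 1$ with $\mu$ free for Gaussian data, so $d = 2$, $r = 1$, and $T_n$ is asymptotically $\chi^2_1$; the data below are synthetic and drawn under $H_0$.

```python
import numpy as np
from scipy.stats import norm, chi2

rng = np.random.default_rng(8)
x = rng.normal(0.5, 1.0, size=300)   # sigma^2 = 1, so H0 holds here

def loglik(mu, sigma2):
    """Gaussian log-likelihood of the sample x."""
    return norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)).sum()

# Unconstrained MLE: (X_bar, S_n); constrained MLE under H0: (X_bar, 1)
mu_hat, sigma2_hat = x.mean(), x.var(ddof=0)
t_n = 2 * (loglik(mu_hat, sigma2_hat) - loglik(mu_hat, 1.0))

# d = 2 parameters, r = 1 free under H0, so T_n -> chi^2_{d-r} = chi^2_1
reject = t_n > chi2(1).ppf(0.95)
```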

Goodness of Fit Tests

Goodness of fit (GoF) tests ask whether a hypothesized distribution is a good fit for the data, in order to answer questions like:

  • Does $X$ have distribution $\mathcal N(0,1)$?
  • Does $X$ have a Gaussian distribution?
  • Does $X$ have distribution $\mathcal U([0,1])$?

Key characteristic of GoF tests: no parametric modeling.

Suppose you observe i.i.d. samples $X_1,\ldots,X_n \sim \mathbf P$ from some unknown distribution $\mathbf P$. Let $\mathcal F$ denote a parametric family of probability distributions (for example, $\mathcal F$ could be the family of normal distributions $\{\mathcal{N}(\mu, \sigma^2)\}_{\mu \in \mathbb{R}, \sigma^2 > 0}$).

In goodness of fit testing, our goal is to answer the question "does $\mathbf P$ belong to the family $\mathcal F$, or is $\mathbf P$ some distribution outside of $\mathcal F$?"

Parametric hypothesis testing is a particular case of goodness of fit testing. However, in the context of parametric hypothesis testing, we assume that the data distribution $\mathbf P$ comes from some parametric statistical model $\{\mathbf{P}_\theta\}_{\theta \in \Theta}$, and we ask whether $\mathbf P$ belongs to a submodel $\{\mathbf{P}_\theta\}_{\theta \in \Theta_0}$ or its complement $\{\mathbf{P}_\theta\}_{\theta \in \Theta_1}$. In parametric hypothesis testing, we allow only a small set of alternatives $\{\mathbf{P}_\theta\}_{\theta \in \Theta_1}$, whereas in goodness of fit testing, we allow the alternative to be anything.

GoF for Discrete Distributions

The probability simplex in $\mathbb R^K$, denoted by $\Delta_K$, is the set of all vectors $\mathbf{p} = [p_1, \dots, p_K]^T$ such that:

$$\mathbf{p}\cdot \mathbf{1} = \mathbf{p}^T \mathbf{1} = 1, \quad p_i \ge 0 \ \text{for all } i = 1,\dots,K$$

where $\mathbf 1$ denotes the all-ones vector $\mathbf{1} = (1\ 1\ \ldots\ 1)^T$. Equivalently, in more familiar notation,

$$\Delta_K = \left\{\mathbf{p} = (p_1,\ldots,p_K) \in [0,1]^K : \sum_{i=1}^K p_i = 1\right\}$$

We want to test:

$$H_0: \mathbf{p} = \mathbf{p}^0, \quad H_1: \mathbf{p} \ne \mathbf{p}^0$$

where $\mathbf{p}^0$ is a fixed PMF.

The categorical likelihood of observing a sequence of $n$ i.i.d. outcomes $X_1,\dots,X_n \sim X$ can be written using the numbers of occurrences $N_i$, $i=1,\dots,K$, of the $K$ outcomes as:

$$L_n(X_1,\dots,X_n, p_1,\dots,p_K) = p_1^{N_1} p_2^{N_2} \cdots p_K^{N_K}$$

The categorical likelihood of the random variable $X$, when written as a random function, is

$$L(X, p_1,\dots,p_K) = \prod_{i=1}^K p_i^{\mathbf{1}(X = a_i)}$$

(the sample space of a categorical random variable $X$ is $E = \{a_1, \ldots, a_K\}$).

Let $\widehat{\mathbf p}$ be the MLE:

$$\widehat{\mathbf{p}}^{\textsf{MLE}}_n = \mathop{\textsf{argmax}}_{\mathbf{p} \in \Delta_K} \log L_n(X_1, \ldots, X_n, \mathbf{p})$$

then:

$$\widehat{p}_j = \frac{N_j}{n}, \quad j=1,\ldots,K$$

$\chi^2$ test: if $H_0$ is true, then $\sqrt{n}(\widehat{\mathbf{p}} - \mathbf{p}^0)$ is asymptotically normal, and:

$$n\sum_{i=1}^K \frac{(\widehat{p}_i - p_i^0)^2}{p_i^0} \xrightarrow[n \to \infty]{(d)} \chi^2_{K-1}$$
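A sketch of the $\chi^2$ goodness-of-fit test on hypothetical die-roll counts (the counts are invented for illustration), cross-checked against scipy's `chisquare`:

```python
import numpy as np
from scipy.stats import chisquare, chi2

# Hypothetical die-roll counts over n = 120 throws (K = 6 categories)
counts = np.array([18, 21, 22, 19, 24, 16])
n = counts.sum()
p0 = np.full(6, 1 / 6)            # H0: fair die

p_hat = counts / n                # MLE of the category probabilities
t_n = n * ((p_hat - p0) ** 2 / p0).sum()

# scipy's chisquare computes the same statistic from raw counts
stat, pvalue = chisquare(counts, f_exp=n * p0)

reject = t_n > chi2(6 - 1).ppf(0.95)   # compare to the chi^2_{K-1} quantile
```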

GoF for Continuous Distributions

Let $X_1,\ldots,X_n$ be i.i.d. real random variables. The cdf of $X_1$ is defined as:

$$F(t) = \mathbf P[X_1 \le t] = \mathbb E[\mathbf{1}(X_1 \le t)], \quad \forall t \in \mathbb R$$

which completely characterizes the distribution of X1X_1.

The empirical cdf (a.k.a. sample cdf) of the sample $X_1,\ldots,X_n$ is defined as:

$$F_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}(X_i \le t) = \frac{\#\{i=1,\ldots,n : X_i \le t\}}{n}, \quad \forall t \in \mathbb R$$

By the LLN, for all $t \in \mathbb R$,

$$F_n(t) \xrightarrow[n \to \infty]{a.s.} F(t)$$

By the Glivenko-Cantelli theorem (the fundamental theorem of statistics), the convergence is in fact uniform:

$$\sup_{t \in \mathbb{R}} \left\vert F_n(t) - F(t) \right\vert \xrightarrow[n \to \infty]{a.s.} 0$$
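A minimal sketch of the empirical cdf and the Glivenko-Cantelli phenomenon on simulated standard Gaussian data (the sample sizes, seed, and evaluation grid are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

def ecdf(sample, t):
    """Empirical cdf F_n(t): fraction of sample points <= t."""
    sample = np.asarray(sample)
    return np.mean(sample[:, None] <= np.atleast_1d(t), axis=0)

rng = np.random.default_rng(5)
grid = np.linspace(-3.0, 3.0, 601)

# Glivenko-Cantelli: sup_t |F_n(t) - F(t)| shrinks as n grows
sup_small = np.max(np.abs(ecdf(rng.standard_normal(50), grid) - norm.cdf(grid)))
sup_large = np.max(np.abs(ecdf(rng.standard_normal(50_000), grid) - norm.cdf(grid)))
```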

By the CLT, for all $t \in \mathbb R$,

$$\sqrt{n}(F_n(t) - F(t)) \xrightarrow[n \to \infty]{(d)} \mathcal N(0, F(t)(1-F(t)))$$

(Indeed, $\mathbf{1}(X_i \le t)$ is a Bernoulli variable with parameter $p = F(t)$, and the variance of a Bernoulli distribution is $p(1-p)$.)

Donsker's theorem states that if $F$ is continuous, then

$$\sqrt{n}\sup_{t \in \mathbb{R}} \vert F_n(t) - F(t) \vert \xrightarrow[n \to \infty]{(d)} \sup_{0 \le x \le 1} \vert\mathbb{B}(x)\vert$$

where $\mathbb B$ is a random curve called a Brownian bridge.

We want to test:

$$H_0: F = F^0, \quad H_1: F \ne F^0$$

where $F^0$ is a continuous cdf. Let $F_n$ be the empirical cdf of the sample $X_1,\ldots,X_n$. If $H_0$ is true ($F = F^0$), then $F_n(t) \approx F^0(t)$ for all $t \in \mathbb R$.

Kolmogorov-Smirnov test

The Kolmogorov-Smirnov test statistic is defined as:

$$T_n = \sup_{t \in \mathbb{R}} \sqrt{n}\,\vert F_n(t) - F^0(t) \vert$$

and the Kolmogorov-Smirnov test is

$$\mathbf{1}(T_n > q_\alpha) \quad \text{where } q_\alpha = q_\alpha\left(\sup_{t \in [0,1]}\vert \mathbb{B}(t) \vert\right)$$

Here, $q_\alpha\left(\sup_{t \in [0,1]}\vert \mathbb{B}(t) \vert\right)$ is the $(1-\alpha)$-quantile of the supremum $\sup_{t \in [0,1]}\vert \mathbb{B}(t) \vert$ of the Brownian bridge, as in Donsker's theorem.

$T_n$ is a pivotal statistic: if $H_0$ is true, the distribution of $T_n$ does not depend on the distribution of the $X_i$'s, and it is easy to reproduce in simulations. In practice, the quantile values can be found in K-S tables.

Even though the K-S test statistic $T_n$ is defined as a supremum over the entire real line, it can be computed explicitly as follows:

$$\begin{aligned} T_n &= \sqrt{n}\sup_{t \in \mathbb{R}} \vert F_n(t) - F^0(t) \vert \\ &= \sqrt{n}\max_{i=1,\ldots,n}\left\{\max\left(\left\vert \frac{i-1}{n} - F^0(X_{(i)}) \right\vert, \left\vert \frac{i}{n} - F^0(X_{(i)}) \right\vert\right)\right\} \end{aligned}$$

where $X_{(i)}$ is the $i$-th order statistic, i.e., the $i$-th smallest value of the sample. For example, $X_{(1)}$ is the smallest and $X_{(n)}$ the greatest value of a sample of size $n$.
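This explicit formula can be checked against scipy's `kstest`, which reports the unscaled statistic $\sup_t \vert F_n(t) - F^0(t) \vert$; the data below are synthetic, with $F^0 = \mathcal N(0,1)$.

```python
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(6)
x = np.sort(rng.standard_normal(30))   # order statistics X_(1) <= ... <= X_(n)
n = len(x)

# Explicit formula for sup_t |F_n(t) - F^0(t)| with F^0 = N(0,1)
i = np.arange(1, n + 1)
f0 = norm.cdf(x)
d_n = np.max(np.maximum(np.abs((i - 1) / n - f0), np.abs(i / n - f0)))
t_n = np.sqrt(n) * d_n                 # T_n as defined above

# scipy's kstest reports the unscaled supremum
stat, pvalue = kstest(x, norm.cdf)
```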

Kolmogorov-Lilliefors Test

What if we want to test "does $X$ have a Gaussian distribution?" without knowing the parameters? A simple idea is to use a plug-in statistic:

$$\sup_{t \in \mathbb{R}} \vert F_n(t) - \Phi_{\widehat\mu,\widehat\sigma^2}(t) \vert$$

where $\widehat\mu = \bar X_n$, $\widehat\sigma^2 = S_n^2$, and $\Phi_{\widehat\mu,\widehat\sigma^2}(t)$ is the cdf of $\mathcal N(\widehat\mu, \widehat\sigma^2)$.

In this case Donsker’s theorem is no longer valid.

Instead, we compute the quantiles for the test statistic:

$$\widetilde{T}_n = \sup_{t \in \mathbb{R}} \vert F_n(t) - \Phi_{\widehat\mu,\widehat\sigma^2}(t) \vert$$

Under $H_0$, these quantiles do not depend on the unknown parameters, so they can be tabulated. This is the Kolmogorov-Lilliefors test.
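A sketch of the Kolmogorov-Lilliefors idea: compute the plug-in statistic and simulate its null quantile, which does not depend on the unknown $\mu, \sigma^2$ (the sample, seed, simulation count, and level are illustrative choices):

```python
import numpy as np
from scipy.stats import norm

def lilliefors_stat(sample):
    """Plug-in K-S statistic with estimated mean and variance."""
    s = np.sort(sample)
    n = len(s)
    f_hat = norm.cdf(s, loc=s.mean(), scale=s.std(ddof=1))
    i = np.arange(1, n + 1)
    return np.max(np.maximum(np.abs((i - 1) / n - f_hat),
                             np.abs(i / n - f_hat)))

rng = np.random.default_rng(7)
x = rng.normal(2.0, 3.0, size=40)   # synthetic data; mu, sigma unknown to the test
t_tilde = lilliefors_stat(x)

# Monte Carlo quantile under H0: pivotal, so N(0,1) samples suffice
sims = [lilliefors_stat(rng.standard_normal(40)) for _ in range(2000)]
q95 = np.quantile(sims, 0.95)
reject = t_tilde > q95
```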

Example: Testing the Mean for a Sample with Unknown Distribution

Suppose that you observe a sample $X_1,\ldots,X_n \stackrel{iid}{\sim} \mathbf{P}$ for some distribution $\mathbf{P}$ with continuous cdf. Your goal is to decide between the null and alternative hypotheses:

$$H_0: \mu = 0, \quad H_1: \mu \ne 0$$

Looking at a histogram, you suspect that $X_1,\ldots,X_n$ have a Gaussian distribution. We would like to first test this suspicion. Formally, we would like to decide between the following null and alternative hypotheses:

$$\begin{aligned} H_0'&: \mathbf P \in \{\mathcal{N}(\mu, \sigma^2)\}_{\mu \in \mathbb{R}, \sigma^2 > 0} \\ H_1'&: \mathbf P \notin \{\mathcal{N}(\mu, \sigma^2)\}_{\mu \in \mathbb{R}, \sigma^2 > 0} \end{aligned}$$

We can use the Kolmogorov-Lilliefors test to decide between $H_0'$ and $H_1'$.

Suppose that the test we used for $H_0'$ and $H_1'$ fails to reject.

Then we can use Student's T test to decide between the original hypotheses $H_0$ and $H_1$.

In practice, many methods for statistical inference, such as Student's T test, rely on the assumption that the data are Gaussian. Hence, before performing such a test, we need to evaluate whether the data are Gaussian. This problem gives an example of such a procedure: first we tested for the Gaussianity of our data, and since the Kolmogorov-Lilliefors test failed to reject, assuming there was no error, we could apply Student's T test to answer our original hypothesis testing question.