This unit presents more tests based on the CLT and, sometimes, Slutsky's theorem:

- the Student's T test, for Gaussian data when $\sigma^2$ is unknown and Slutsky does not apply;
- Wald's test, which uses the asymptotic normality of the MLE;
- tests of implicit hypotheses, for testing about multivariate parameters;
- goodness of fit tests, for answering questions like "does my data follow a Gaussian distribution?".
Asymptotic test - Clinical trials example
Let $X_1,\dots,X_n$ be i.i.d. test group samples distributed according to $\mathcal{N}(\Delta_d, \sigma_d^2)$ and let $Y_1,\dots,Y_m$ be i.i.d. control group samples distributed according to $\mathcal{N}(\Delta_c, \sigma_c^2)$. Assume that $X_1,\dots,X_n,Y_1,\dots,Y_m$ are independent.
Hypotheses:
$$H_0: \Delta_d = \Delta_c \quad \text{vs.} \quad H_1: \Delta_d > \Delta_c$$
Since the samples are Gaussian, we have exactly (we do not even need the CLT here):

$$\bar{X}_n \sim \mathcal{N}\left(\Delta_d, \frac{\sigma_d^2}{n}\right) \quad \text{and} \quad \bar{Y}_m \sim \mathcal{N}\left(\Delta_c, \frac{\sigma_c^2}{m}\right)$$

We can get:

$$\frac{\bar{X}_n - \bar{Y}_m - (\Delta_d - \Delta_c)}{\sqrt{\frac{\sigma_d^2}{n} + \frac{\sigma_c^2}{m}}} \sim \mathcal{N}(0,1)$$
Assume that $m = cn$ for some constant $c > 0$ and let $n \to \infty$. Using Slutsky's lemma, we can replace the true variances $\sigma_d^2, \sigma_c^2$ by the sample variances $\hat{\sigma}_d^2, \hat{\sigma}_c^2$:

$$\frac{\bar{X}_n - \bar{Y}_m - (\Delta_d - \Delta_c)}{\sqrt{\frac{\hat{\sigma}_d^2}{n} + \frac{\hat{\sigma}_c^2}{m}}} \xrightarrow[n \to \infty]{(d)} \mathcal{N}(0,1)$$
This is a one-sided, two-sample test. Here we divide by $(n-1)$ instead of $n$ in the sample variance, because that makes it an unbiased variance estimator.
However, when the sample size is small, we cannot realistically apply Slutsky's lemma, so we cannot replace the true variances by the sample variances. Slutsky's theorem only gives a good approximation when the sample size is large.
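As a concrete sketch of the large-sample version of this test (the sample sizes, group means, and variances below are made up for illustration, not taken from these notes):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical illustration: asymptotic one-sided two-sample test of
# H0: Delta_d = Delta_c vs H1: Delta_d > Delta_c, with the true variances
# replaced by sample variances (justified by Slutsky when n, m are large).
rng = np.random.default_rng(0)
n, m = 500, 400                      # made-up sample sizes
x = rng.normal(1.2, 2.0, size=n)     # test group (assumed Delta_d = 1.2)
y = rng.normal(1.0, 1.5, size=m)     # control group (assumed Delta_c = 1.0)

sx2 = x.var(ddof=1)                  # unbiased sample variances (divide by n-1)
sy2 = y.var(ddof=1)
t_stat = (x.mean() - y.mean()) / np.sqrt(sx2 / n + sy2 / m)

alpha = 0.05
q_alpha = norm.ppf(1 - alpha)        # (1-alpha)-quantile of N(0,1)
reject = t_stat > q_alpha            # reject H0 at asymptotic level alpha
print(t_stat, reject)
```

For small samples, the Student's T approach below replaces the normal quantile with a quantile of the T distribution.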
The $\chi^2$ distribution

For a positive integer $d$, the $\chi^2$ (pronounced "kai-squared") distribution with $d$ degrees of freedom is the law of the random variable $Z_1^2 + Z_2^2 + \cdots + Z_d^2$, where $Z_1,\dots,Z_d \overset{iid}{\sim} \mathcal{N}(0,1)$.

If $Z \sim \mathcal{N}_d(0, I_d)$, then $\|Z\|_2^2 \sim \chi_d^2$.

Moreover, $\chi_2^2 = \mathrm{Exp}(1/2)$, the exponential distribution with parameter $1/2$.
If $X \sim \chi_d^2$, then:

- $\mathbb{E}[X] = d$
- $\mathrm{Var}[X] = 2d$
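These two facts are easy to check by simulation (a minimal sketch; $d = 5$ and the number of draws are chosen arbitrarily):

```python
import numpy as np

# Monte Carlo check of E[X] = d and Var[X] = 2d for X ~ chi^2_d,
# simulating X = Z_1^2 + ... + Z_d^2 with Z_i iid N(0,1).
rng = np.random.default_rng(1)
d = 5
z = rng.normal(size=(200_000, d))
x = (z ** 2).sum(axis=1)             # 200,000 draws from chi^2_5

mean_hat, var_hat = x.mean(), x.var()
print(mean_hat, var_hat)             # close to d = 5 and 2d = 10
```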
The sample variance
Cochran's theorem states that if $X_1,\dots,X_n \overset{iid}{\sim} \mathcal{N}(\mu, \sigma^2)$, then the sample variance

$$S_n = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X}_n)^2 = \frac{1}{n}\sum_{i=1}^n X_i^2 - (\bar{X}_n)^2$$

satisfies:

- $\bar{X}_n$ is independent of $S_n$;
- $\dfrac{n S_n}{\sigma^2} \sim \chi_{n-1}^2$.

Here the distribution is $\chi_{n-1}^2$ because there are only $(n-1)$ degrees of freedom: the variables $(X_i - \bar{X}_n)$ satisfy one linear constraint, $\sum_{i=1}^n (X_i - \bar{X}_n) = 0$.
For a positive integer $d$, the Student's T distribution with $d$ degrees of freedom (denoted by $t_d$) is the law of the random variable $\dfrac{Z}{\sqrt{V/d}}$, where $Z \sim \mathcal{N}(0,1)$, $V \sim \chi_d^2$, and $Z \perp\!\!\!\perp V$ ($Z$ is independent of $V$).
One-Sample, Two-Sided

Consider the test statistic

$$T_n = \sqrt{n}\,\frac{\bar{X}_n - \mu}{\sqrt{\tilde{S}_n}}$$

where $\bar{X}_n$ is the sample mean of i.i.d. Gaussian observations with mean $\mu$ and variance $\sigma^2$, and $\tilde{S}_n = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$ is the unbiased sample variance.

Since $\sqrt{n}(\bar{X}_n - \mu)/\sigma \sim \mathcal{N}(0,1)$ and $\tilde{S}_n/\sigma^2 \sim \chi_{n-1}^2/(n-1)$, and the two are independent by Cochran's theorem, we get $T_n \sim t_{n-1}$, the Student's T distribution with $(n-1)$ degrees of freedom. So the distribution of $T_n$ is pivotal, and its quantiles can be found in tables.
The Student's T test of level $\alpha$ is specified by:

$$\psi_\alpha = \mathbf{1}\{|T_n| > q_{\alpha/2}\}$$

where $q_{\alpha/2}$ is the $(1-\alpha/2)$-quantile of $t_{n-1}$.
Be careful: the Student's T test requires the data $X_1,\dots,X_n$ to be Gaussian. On the other hand, this test is non-asymptotic: for any fixed $n$, we can compute the exact level of the test, rather than only an asymptotic level.
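A minimal sketch of the two-sided one-sample T test (the data below is simulated under $H_0: \mu = 0$; in practice `x` would be your Gaussian sample, and `scipy` is used only as a cross-check):

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Sketch of the two-sided one-sample Student's T test of H0: mu = mu0.
rng = np.random.default_rng(2)
n, mu0, alpha = 20, 0.0, 0.05
x = rng.normal(0.0, 1.0, size=n)     # simulated Gaussian sample

# T_n = sqrt(n) (Xbar - mu0) / sqrt(unbiased sample variance)
t_n = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
q = t.ppf(1 - alpha / 2, df=n - 1)   # (1-alpha/2)-quantile of t_{n-1}
reject = abs(t_n) > q

# Cross-check against scipy's built-in implementation
stat, pvalue = ttest_1samp(x, mu0)
print(t_n, stat, reject)
```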
One-Sample, One-Sided
The Student's T test of level $\alpha$, for the one-sided alternative $H_1: \mu > \mu_0$, is specified by:

$$\psi_\alpha = \mathbf{1}\{T_n > q_\alpha\}$$

where $q_\alpha$ is the $(1-\alpha)$-quantile of $t_{n-1}$. (Note there is no absolute value here: we reject only for large positive values of $T_n$.)
Two-Sample
Back to the clinical trials example, we have:

$$\frac{\bar{X}_n - \bar{Y}_m - (\Delta_d - \Delta_c)}{\sqrt{\frac{\sigma_d^2}{n} + \frac{\sigma_c^2}{m}}} \sim \mathcal{N}(0,1)$$
When the sample sizes are small, we can no longer use Slutsky's lemma, which means that we cannot simply replace the true variances $\sigma_d^2, \sigma_c^2$ by the sample variances $\hat{\sigma}_d^2, \hat{\sigma}_c^2$.
But we have approximately:
$$\frac{\bar{X}_n - \bar{Y}_m - (\Delta_d - \Delta_c)}{\sqrt{\frac{\hat{\sigma}_d^2}{n} + \frac{\hat{\sigma}_c^2}{m}}} \overset{\text{approx.}}{\sim} t_N$$

where the degrees of freedom $N$ is given by the Welch-Satterthwaite formula:

$$N = \frac{\left(\frac{\hat{\sigma}_d^2}{n} + \frac{\hat{\sigma}_c^2}{m}\right)^2}{\frac{\hat{\sigma}_d^4}{n^2(n-1)} + \frac{\hat{\sigma}_c^4}{m^2(m-1)}}$$
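A sketch of this Welch two-sample test with SciPy (the small group sizes and distributions below are invented for illustration; `equal_var=False` makes `ttest_ind` use the Welch-Satterthwaite degrees of freedom):

```python
import numpy as np
from scipy.stats import ttest_ind

# Sketch: Welch's two-sample T test for small samples with unknown,
# possibly unequal variances. Data is simulated for illustration.
rng = np.random.default_rng(3)
x = rng.normal(1.0, 2.0, size=12)    # small "test group"
y = rng.normal(1.0, 0.5, size=9)     # small "control group"

stat, pvalue = ttest_ind(x, y, equal_var=False)   # Welch's test

# The Welch-Satterthwaite degrees of freedom, computed by hand
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)
n, m = len(x), len(y)
N = (sx2 / n + sy2 / m) ** 2 / (
    sx2**2 / (n**2 * (n - 1)) + sy2**2 / (m**2 * (m - 1))
)
print(stat, pvalue, N)
```

A known sanity check on the formula: $N$ always lies between $\min(n, m) - 1$ and $n + m - 2$.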
Likelihood Ratio Test

Consider testing a simple null hypothesis $H_0: \theta = \theta_0$ against a simple alternative $H_1: \theta = \theta_1$. The likelihood ratio test in this set-up is of the form:
$$\psi_C = \mathbf{1}\left\{\frac{L_n(x_1,\dots,x_n;\theta_1)}{L_n(x_1,\dots,x_n;\theta_0)} > C\right\}$$

where $C$ is a threshold to be specified.
Likelihood Ratio Test (based on log-likelihood)
Consider an i.i.d. sample $X_1,\dots,X_n$ with statistical model $(E, (\mathbb{P}_\theta)_{\theta\in\Theta})$, where $\Theta \subseteq \mathbb{R}^d$.

Suppose the null hypothesis has the form:

$$H_0: (\theta_{r+1}^*, \dots, \theta_d^*) = (\theta_{r+1}^{(0)}, \dots, \theta_d^{(0)})$$

for some fixed and given numbers $\theta_{r+1}^{(0)}, \dots, \theta_d^{(0)}$.

Thus $\Theta_0$, the region defined by the null hypothesis, is

$$\Theta_0 := \left\{v \in \mathbb{R}^d : (v_{r+1},\dots,v_d) = (\theta_{r+1}^{(0)},\dots,\theta_d^{(0)})\right\}$$

where $(\theta_{r+1}^{(0)},\dots,\theta_d^{(0)})$ consists of known values.
The likelihood ratio test involves the test statistic:

$$T_n = 2\left(\ell_n(\hat{\theta}_n^{MLE}) - \ell_n(\hat{\theta}_n^c)\right)$$

where $\ell_n$ is the log-likelihood.

The estimator $\hat{\theta}_n^c$ is the constrained MLE, defined as:

$$\hat{\theta}_n^c = \underset{\theta \in \Theta_0}{\operatorname{argmax}}\ \ell_n(X_1,\dots,X_n;\theta)$$
Wilks’ Theorem
Assume $H_0$ is true and the MLE technical conditions are satisfied. Then $T_n$ is asymptotically pivotal; i.e., it converges to a pivotal distribution:

$$T_n \xrightarrow[n\to\infty]{(d)} \chi_{d-r}^2$$
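Wilks' theorem can be illustrated in the simplest possible case, $X_i \overset{iid}{\sim} \mathcal{N}(\mu, 1)$ with $H_0: \mu = 0$ (so $d = 1$, $r = 0$). A short calculation with the Gaussian log-likelihood shows $T_n = 2(\ell_n(\hat\mu^{MLE}) - \ell_n(0)) = n\,\bar{X}_n^2$, which under $H_0$ is $\chi_1^2$. A simulation sketch (all numbers here are arbitrary choices):

```python
import numpy as np

# Sketch of Wilks' theorem: X_i iid N(mu, 1), H0: mu = 0, d = 1, r = 0.
# The LRT statistic T_n = 2 (l_n(mu_MLE) - l_n(0)) simplifies to n * Xbar^2.
rng = np.random.default_rng(4)
n, reps = 100, 20_000
x = rng.normal(0.0, 1.0, size=(reps, n))   # many datasets generated under H0
t_n = n * x.mean(axis=1) ** 2              # one LRT statistic per dataset

print(t_n.mean())   # should be close to E[chi^2_1] = 1
```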
Goodness of Fit Tests
Goodness of fit (GoF) tests ask whether a hypothesized distribution is a good fit for the data, in order to answer questions like:

- Does $X$ have distribution $\mathcal{N}(0,1)$?
- Does $X$ have a Gaussian distribution?
- Does $X$ have distribution $U([0,1])$?
Key characteristic of GoF tests: no parametric modeling.
Suppose you observe i.i.d. samples $X_1,\dots,X_n \sim \mathbb{P}$ from some unknown distribution $\mathbb{P}$. Let $\mathcal{F}$ denote a parametric family of probability distributions (for example, $\mathcal{F}$ could be the family of normal distributions $\{\mathcal{N}(\mu,\sigma^2)\}_{\mu\in\mathbb{R},\,\sigma^2>0}$).

In goodness of fit testing, our goal is to answer the question "Does $\mathbb{P}$ belong to the family $\mathcal{F}$, or is $\mathbb{P}$ any distribution outside of $\mathcal{F}$?"

Parametric hypothesis testing is a particular case of goodness of fit testing. However, in the context of parametric hypothesis testing, we assume that the data distribution $\mathbb{P}$ comes from some parametric statistical model $\{\mathbb{P}_\theta\}_{\theta\in\Theta}$, and we ask whether $\mathbb{P}$ belongs to a submodel $\{\mathbb{P}_\theta\}_{\theta\in\Theta_0}$ or its complement $\{\mathbb{P}_\theta\}_{\theta\in\Theta_1}$. In parametric hypothesis testing, we allow only a small set of alternatives $\{\mathbb{P}_\theta\}_{\theta\in\Theta_1}$, whereas in goodness of fit testing, we allow the alternative to be anything.
GoF for Discrete Distributions
The probability simplex in $\mathbb{R}^K$, denoted by $\Delta_K$, is the set of all vectors $p = [p_1,\dots,p_K]^T$ such that:

$$p \cdot \mathbf{1} = p^T \mathbf{1} = 1, \quad p_i \ge 0 \ \text{for all}\ i = 1,\dots,K$$

where $\mathbf{1}$ denotes the all-ones vector $(1\ 1\ \cdots\ 1)^T$. Equivalently, in more familiar notation,

$$\Delta_K = \left\{p = (p_1,\dots,p_K) \in [0,1]^K : \sum_{i=1}^K p_i = 1\right\}$$
We want to test:

$$H_0: p = p^0 \quad \text{vs.} \quad H_1: p \neq p^0$$

where $p^0$ is a fixed PMF.
The categorical likelihood of observing a sequence of $n$ i.i.d. outcomes $X_1,\dots,X_n$ can be written using the numbers of occurrences $N_i$, $i=1,\dots,K$, of the $K$ outcomes as:

$$L_n(X_1,\dots,X_n; p_1,\dots,p_K) = p_1^{N_1} p_2^{N_2} \cdots p_K^{N_K}$$

The categorical likelihood of the random variable $X$, written as a random function, is

$$L(X; p_1,\dots,p_K) = \prod_{i=1}^K p_i^{\mathbf{1}(X = a_i)}$$

(the sample space of a categorical random variable $X$ is $E = \{a_1,\dots,a_K\}$).
Let $\hat{p}$ be the MLE:

$$\hat{p}_n^{MLE} = \underset{p \in \Delta_K}{\operatorname{argmax}}\ \log L_n(X_1,\dots,X_n; p)$$

Then:

$$\hat{p}_j = \frac{N_j}{n}, \quad j = 1,\dots,K$$
$\chi^2$ test: if $H_0$ is true, then $\sqrt{n}(\hat{p} - p^0)$ is asymptotically normal, and:

$$n \sum_{i=1}^K \frac{(\hat{p}_i - p_i^0)^2}{p_i^0} \xrightarrow[n\to\infty]{(d)} \chi_{K-1}^2$$
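A sketch of the $\chi^2$ goodness-of-fit test in SciPy (the PMF $p^0$ and sample size below are made up; `scipy.stats.chisquare` takes observed and expected counts and computes the same statistic, with $K-1$ degrees of freedom by default):

```python
import numpy as np
from scipy.stats import chisquare

# Sketch: chi^2 goodness-of-fit test of H0: p = p0 with K = 3 categories.
rng = np.random.default_rng(5)
p0 = np.array([0.5, 0.3, 0.2])       # hypothesized PMF (made up)
n = 1000
counts = rng.multinomial(n, p0)      # N_1, ..., N_K simulated under H0

stat, pvalue = chisquare(counts, f_exp=n * p0)

# The same statistic by hand: n * sum_i (p_hat_i - p0_i)^2 / p0_i
p_hat = counts / n
stat_by_hand = n * np.sum((p_hat - p0) ** 2 / p0)
print(stat, stat_by_hand, pvalue)
```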
GoF for Continuous Distributions
Let $X_1,\dots,X_n$ be i.i.d. real random variables. The cdf of $X_1$ is defined as:

$$F(t) = \mathbb{P}[X_1 \le t] = \mathbb{E}[\mathbf{1}(X_1 \le t)], \quad \forall t \in \mathbb{R}$$

which completely characterizes the distribution of $X_1$.
The empirical cdf (a.k.a. sample cdf) of the sample $X_1,\dots,X_n$ is defined as:

$$F_n(t) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}(X_i \le t), \quad \forall t \in \mathbb{R}$$
By the Glivenko-Cantelli theorem (the fundamental theorem of statistics):

$$\sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \xrightarrow[n\to\infty]{a.s.} 0$$
By the CLT, for all $t \in \mathbb{R}$,

$$\sqrt{n}\,(F_n(t) - F(t)) \xrightarrow[n\to\infty]{(d)} \mathcal{N}\big(0,\ F(t)(1-F(t))\big)$$

(the variance of a Bernoulli distribution with parameter $p$ is $p(1-p)$, and here $\mathbf{1}(X_i \le t) \sim \mathrm{Ber}(F(t))$).
Donsker's theorem states that if $F$ is continuous, then

$$\sqrt{n}\,\sup_{t\in\mathbb{R}} |F_n(t) - F(t)| \xrightarrow[n\to\infty]{(d)} \sup_{0\le x\le 1} |\mathbb{B}(x)|$$

where $\mathbb{B}$ is a random curve called a Brownian bridge.
We want to test:

$$H_0: F = F^0 \quad \text{vs.} \quad H_1: F \neq F^0$$

where $F^0$ is a continuous cdf. Let $F_n$ be the empirical cdf of the sample $X_1,\dots,X_n$. If $H_0$ is true ($F = F^0$), then $F_n(t) \approx F^0(t)$ for all $t \in \mathbb{R}$.
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test statistic is defined as:

$$T_n = \sqrt{n}\,\sup_{t\in\mathbb{R}} |F_n(t) - F^0(t)|$$

and the Kolmogorov-Smirnov test is

$$\mathbf{1}\{T_n > q_\alpha\}, \quad \text{where } q_\alpha = q_\alpha\!\left(\sup_{t\in[0,1]}|\mathbb{B}(t)|\right)$$

Here, $q_\alpha\!\left(\sup_{t\in[0,1]}|\mathbb{B}(t)|\right)$ is the $(1-\alpha)$-quantile of the supremum $\sup_{t\in[0,1]}|\mathbb{B}(t)|$ of the Brownian bridge, as in Donsker's theorem.
$T_n$ is called a pivotal statistic: if $H_0$ is true, the distribution of $T_n$ does not depend on the distribution of the $X_i$'s, and it is easy to reproduce in simulations. In practice, the quantile values can be found in K-S tables.
Even though the K-S test statistic $T_n$ is defined as a supremum over the entire real line, it can be computed explicitly as follows:

$$T_n = \sqrt{n}\,\max_{1\le i\le n}\ \max\left(\left|\frac{i-1}{n} - F^0(X_{(i)})\right|,\ \left|\frac{i}{n} - F^0(X_{(i)})\right|\right)$$

where $X_{(i)}$ is the $i$th order statistic, i.e., the $i$th smallest value of the sample. For example, $X_{(1)}$ is the smallest and $X_{(n)}$ is the greatest value in a sample of size $n$.
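A sketch of the K-S test with SciPy (the data below is simulated from the null $F^0 = \mathcal{N}(0,1)$; note that `scipy.stats.kstest` returns the *unscaled* supremum $\sup_t |F_n(t) - F^0(t)|$, without the $\sqrt{n}$ factor, together with a p-value):

```python
import numpy as np
from scipy.stats import kstest, norm

# Sketch: K-S test of H0: F = F0 with F0 the N(0,1) cdf.
rng = np.random.default_rng(6)
x = rng.normal(size=200)                  # simulated sample under H0

stat, pvalue = kstest(x, "norm")          # F0 = N(0,1)

# The same supremum from the order statistics X_(1) <= ... <= X_(n)
xs = np.sort(x)
n = len(x)
i = np.arange(1, n + 1)
d_plus = np.max(i / n - norm.cdf(xs))     # F_n above F0
d_minus = np.max(norm.cdf(xs) - (i - 1) / n)  # F_n below F0
stat_by_hand = max(d_plus, d_minus)
print(stat, stat_by_hand, pvalue)
```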
Kolmogorov-Lilliefors Test
What if we want to test "Does $X$ have a Gaussian distribution?" but do not know the parameters? A simple idea is to use plug-in:

$$\sup_{t\in\mathbb{R}} |F_n(t) - \Phi_{\hat{\mu},\hat{\sigma}^2}(t)|$$

where $\hat{\mu} = \bar{X}_n$, $\hat{\sigma}^2 = S_n^2$, and $\Phi_{\hat{\mu},\hat{\sigma}^2}(t)$ is the cdf of $\mathcal{N}(\hat{\mu},\hat{\sigma}^2)$.
In this case Donsker’s theorem is no longer valid.
Instead, we compute the quantiles of the test statistic

$$T_n = \sup_{t\in\mathbb{R}} |F_n(t) - \Phi_{\hat{\mu},\hat{\sigma}^2}(t)|$$

directly. Under the null hypothesis, these quantiles do not depend on the unknown parameters! This is the Kolmogorov-Lilliefors test.
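A sketch of this idea in code: since the null distribution of the plug-in statistic is free of $(\mu, \sigma^2)$, its quantiles can be obtained once by Monte Carlo under $\mathcal{N}(0,1)$. The sample size, number of replications, and the "unknown" parameters below are all arbitrary choices:

```python
import numpy as np
from scipy.stats import norm

# Sketch of the Kolmogorov-Lilliefors test via simulated quantiles.
def kl_statistic(x):
    """sup_t |F_n(t) - Phi_{mu_hat, sigma_hat^2}(t)|, via order statistics."""
    n = len(x)
    z = (np.sort(x) - x.mean()) / x.std(ddof=1)  # standardize with estimates
    u = norm.cdf(z)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))

rng = np.random.default_rng(7)
n, reps, alpha = 30, 2000, 0.05
# Null distribution does not depend on (mu, sigma): simulate under N(0,1)
sims = np.array([kl_statistic(rng.normal(size=n)) for _ in range(reps)])
q_alpha = np.quantile(sims, 1 - alpha)   # simulated (1-alpha)-quantile

x = rng.normal(3.0, 2.0, size=n)         # Gaussian data, parameters "unknown"
reject = kl_statistic(x) > q_alpha
print(q_alpha, reject)
```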
Example: Testing the Mean for a Sample with Unknown Distribution
Suppose that you observe a sample $X_1,\dots,X_n \overset{iid}{\sim} \mathbb{P}$ for some distribution $\mathbb{P}$ with continuous cdf. Your goal is to decide between the null and alternative hypotheses:

$$H_0: \mu = 0 \quad \text{vs.} \quad H_1: \mu \neq 0$$

Looking at a histogram, you suspect that $X_1,\dots,X_n$ have a Gaussian distribution. We would like to first test this suspicion. Formally, we would like to decide between the following null and alternative hypotheses:

$$H_0': \mathbb{P} \text{ is a Gaussian distribution} \quad \text{vs.} \quad H_1': \mathbb{P} \text{ is not a Gaussian distribution}$$
We can use the Kolmogorov-Lilliefors test to decide between $H_0'$ and $H_1'$.

Suppose that the test we used in the previous part for $H_0'$ and $H_1'$ fails to reject.

Then we can use Student's T test to decide between the original hypotheses $H_0$ and $H_1$.
In practice, many methods for statistical inference, such as the Student's T test, rely on the assumption that the data is Gaussian. Hence, before performing such a test, we need to evaluate whether or not the data is Gaussian. This problem gives an example of such a procedure: first we tested for the Gaussianity of our data, and since the Kolmogorov-Lilliefors test failed to reject (assuming no error was made), we could apply the Student's T test to answer our original hypothesis testing question.