December 14, 2019
Unit 4 - Hypothesis testing
Notes on T-tests, Wald's test, likelihood ratio tests, and goodness of fit tests from MITx 18.6501x Fundamentals of Statistics
This unit presents more tests: asymptotic tests based on the CLT (and sometimes Slutsky’s lemma), the T-test for Gaussian data, Wald’s test, likelihood ratio tests, and goodness of fit tests.
Asymptotic test - Clinical trials example
Let
Hypotheses:
From (even don’t need CLT to get this):
We can get:
Assume that
where:
This is a one-sided, two-sample test. Here we use
However, when the sample size is small, we cannot realistically apply Slutsky’s lemma, so we cannot replace the variance
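As a rough illustration of the large-sample case, the sketch below runs a one-sided, two-sample asymptotic test. The function name, the synthetic data, and the level 0.05 are assumptions made for this example; the test standardizes the difference of sample means with plug-in variances and compares it to a standard normal quantile.

```python
import math
import random
from statistics import NormalDist, mean, variance


def asymptotic_two_sample_test(treated, control, alpha=0.05):
    """One-sided test of H0: equal means vs H1: mean(treated) > mean(control).

    Assumes both samples are large enough for the CLT and Slutsky's lemma
    to justify a standard normal approximation.
    """
    n, m = len(treated), len(control)
    # Plug-in standard error: sample variances replace the unknown variances.
    std_err = math.sqrt(variance(treated) / n + variance(control) / m)
    statistic = (mean(treated) - mean(control)) / std_err
    threshold = NormalDist().inv_cdf(1 - alpha)  # (1 - alpha)-quantile of N(0, 1)
    p_value = 1 - NormalDist().cdf(statistic)
    return statistic, threshold, p_value, statistic > threshold


# Synthetic data, for illustration only.
rng = random.Random(0)
treated = [rng.gauss(1.0, 2.0) for _ in range(500)]
control = [rng.gauss(0.0, 2.0) for _ in range(500)]
statistic, threshold, p_value, reject = asymptotic_two_sample_test(treated, control)
print(f"statistic={statistic:.3f}, threshold={threshold:.3f}, reject={reject}")
```

With only a handful of observations per group the normal approximation is no longer trustworthy, which is what motivates the exact small-sample tools that follow.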
The χ² distribution
For a positive integer
If
And
If
The sample variance
Cochran’s theorem states that if
satisfies:
is independent of
Here it is
We often prefer the unbiased estimator of
Then its expectation:
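A small simulation can make the bias of the two variance estimators concrete. This is only a sketch: the sample size, the true variance, and the number of trials are arbitrary choices, and the data is simulated Gaussian noise.

```python
import random
from statistics import mean, pvariance, variance

rng = random.Random(1)
sigma2 = 4.0      # true variance (known here only because the data is simulated)
n = 5             # deliberately small sample size
trials = 10000

biased, unbiased = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    biased.append(pvariance(sample))    # divides by n
    unbiased.append(variance(sample))   # divides by n - 1

avg_biased = mean(biased)       # should be close to (n - 1) / n * sigma2
avg_unbiased = mean(unbiased)   # should be close to sigma2
print(avg_biased, avg_unbiased)
```

The estimator that divides by n underestimates the variance by the factor (n - 1) / n on average, while the rescaled one is centered on the true value.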
Student’s T distribution
For a positive integer
Student’s T test
One-Sample, Two-Sided
The test statistic:
where
Since
Student’s T test of level
where
Be careful: Student’s T test requires the data
One-Sample, One-Sided
Student’s T test of level
where
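The one-sample tests above can be sketched with the standard library alone. The measurements and the null value 5.0 are hypothetical, and the two quantiles are tabulated values for Student's T distribution with 9 degrees of freedom (n = 10), hard-coded here instead of computed.

```python
import math
from statistics import mean, stdev

# Tabulated quantiles of Student's T distribution with 9 degrees of freedom.
T9_QUANTILE_0975 = 2.262   # two-sided test at level 0.05
T9_QUANTILE_095 = 1.833    # one-sided test at level 0.05


def t_statistic(sample, mu0):
    """T_n = sqrt(n) * (sample mean - mu0) / sqrt(unbiased sample variance)."""
    n = len(sample)
    return math.sqrt(n) * (mean(sample) - mu0) / stdev(sample)


# Hypothetical measurements (n = 10), assumed to be i.i.d. Gaussian.
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7, 5.3, 5.5]
t_n = t_statistic(sample, mu0=5.0)

reject_two_sided = abs(t_n) > T9_QUANTILE_0975   # H1: mu != 5
reject_one_sided = t_n > T9_QUANTILE_095         # H1: mu > 5
print(t_n, reject_two_sided, reject_one_sided)
```

The same statistic serves both tests; only the rejection region and the quantile change.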
Two-Sample
Back to the Clinical trials example, we have:
When the sample size is small, we cannot use Slutsky’s lemma anymore, which means that we cannot replace the variance
But we have approximately:
where the degrees of freedom
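The sketch below assumes that the approximation referred to here is the usual Welch-Satterthwaite one. It computes the two-sample statistic together with the approximate degrees of freedom. The data and the function name are hypothetical.

```python
import math
from statistics import mean, variance


def welch_statistic_and_df(x, y):
    """Return the two-sample statistic and the Welch-Satterthwaite degrees of freedom."""
    n, m = len(x), len(y)
    # Estimated variances of the two sample means.
    vx, vy = variance(x) / n, variance(y) / m
    statistic = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return statistic, df


x = [1.2, 0.8, 1.5, 1.1, 0.9, 1.4]   # hypothetical treatment group
y = [0.7, 1.0, 0.6, 0.9, 0.8]        # hypothetical control group
statistic, df = welch_statistic_and_df(x, y)
print(f"statistic={statistic:.3f}, df={df:.2f}")
```

The resulting degrees of freedom are generally not an integer and always lie between min(n, m) - 1 and n + m - 2; the statistic is then compared to a quantile of the T distribution with (approximately) that many degrees of freedom.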
Wald’s test
According to Asymptotic Normality of the MLE:
where
Standardize the statement of asymptotic normality above:
The Wald’s test:
which is also:
Wald’s Test in 1 Dimension
In 1 dimension, Wald’s Test coincides with the two-sided test based on the asymptotic normality of the MLE.
Given the hypotheses:
a two-sided test of level
where the Fisher information
On the other hand, a Wald’s test of level
Using the result from the problem above, we see that the two-sided test of level
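To see the one-dimensional equivalence numerically, here is a minimal sketch for a Bernoulli model, chosen only because its Fisher information, 1 / (p (1 - p)), is simple. The function name and the counts are assumptions for the example. It relies on the fact that the χ² distribution with 1 degree of freedom is the law of the square of a standard normal variable, so its quantile is the square of a normal quantile.

```python
import math
from statistics import NormalDist


def wald_bernoulli(successes, n, p0, alpha=0.05):
    """Wald's test of H0: p = p0 vs H1: p != p0 for i.i.d. Bernoulli(p) data.

    Requires 0 < successes < n so that the plug-in Fisher information is finite.
    """
    p_hat = successes / n                      # MLE of p
    fisher = 1.0 / (p_hat * (1.0 - p_hat))     # Fisher information evaluated at the MLE
    wald = n * fisher * (p_hat - p0) ** 2      # asymptotically chi^2_1 under H0
    z = math.sqrt(n * fisher) * (p_hat - p0)   # asymptotically N(0, 1) under H0

    q_normal = NormalDist().inv_cdf(1 - alpha / 2)
    q_chi2 = q_normal ** 2   # (1 - alpha)-quantile of chi^2_1
    return wald, z, wald > q_chi2, abs(z) > q_normal


wald, z, reject_wald, reject_two_sided = wald_bernoulli(successes=60, n=100, p0=0.5)
print(wald, z, reject_wald, reject_two_sided)
```

Since the Wald statistic is exactly the square of the standardized MLE, the two rejection decisions always agree.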
Example: Performing Wald’s Test on a Gaussian Data Set
Suppose
The Wald’s test of level
where:
and
Likelihood Ratio Test
Basic Form
Given the hypotheses:
The likelihood ratio in this set-up is of the form:
where
Likelihood Ratio Test (based on log-likelihood)
Consider an i.i.d. sample
Suppose the null hypothesis has the form:
for some fixed and given numbers
Thus
where
The likelihood ratio test involves the test-statistic:
where
The estimator
Wilks’ Theorem
Assume
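As an illustration of the log-likelihood form of the test, the sketch below uses a one-parameter Poisson model, which is an assumption made for the example and not a model taken from these notes. The null hypothesis fixes the single parameter, so the constrained MLE is the null value itself and the limiting distribution from Wilks' theorem is χ² with 1 degree of freedom.

```python
import math
from statistics import NormalDist, mean


def poisson_log_likelihood(lam, data):
    """Log-likelihood of i.i.d. Poisson(lam) observations."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def likelihood_ratio_test(data, lam0, alpha=0.05):
    """Test H0: lam = lam0 with T_n = 2 * (l(MLE) - l(constrained MLE))."""
    lam_hat = mean(data)   # unconstrained MLE
    # Under H0 the parameter is fully specified, so the constrained MLE is lam0.
    t_n = 2 * (poisson_log_likelihood(lam_hat, data) - poisson_log_likelihood(lam0, data))
    t_n = max(t_n, 0.0)    # guard against tiny negative values from rounding
    # P(chi^2_1 > t) = 2 * (1 - Phi(sqrt(t))), since chi^2_1 is the law of Z^2.
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(t_n)))
    return t_n, p_value, p_value < alpha


counts = [3, 5, 2, 4, 6, 3, 4, 5, 2, 6, 4, 3]   # hypothetical count data
t_n, p_value, reject = likelihood_ratio_test(counts, lam0=2.0)
print(t_n, p_value, reject)
```

Rejecting when the p-value falls below the level is the same as rejecting when the statistic exceeds the corresponding χ² quantile.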
Goodness of Fit Tests
Goodness of fit (GoF) tests: we want to know whether a hypothesized distribution is a good fit for the data, in order to answer questions like:
- Does X have a given, fully specified distribution?
- Does X have a Gaussian distribution?
Key characteristic of GoF tests: no parametric modeling.
Suppose you observe i.i.d. samples
In the topic of goodness of fit testing, our goal is to answer the question “Does
Parametric hypothesis testing is a particular case of goodness of fit testing. However, in the context of parametric hypothesis testing, we assume that the data distribution
GoF for Discrete Distributions
The probability simplex in
where
We want to test:
where
The categorical likelihood of observing a sequence of
The categorical likelihood of the random variable
(the sample space of a categorical random variable
Let
then:
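Assuming the statistic in question is the usual χ² goodness of fit statistic, n times the sum over categories of (estimated probability - hypothesized probability)² / hypothesized probability, a minimal sketch looks as follows. The data is hypothetical, and K = 3 categories are used so that the limiting χ² distribution with K - 1 = 2 degrees of freedom has a closed-form tail.

```python
import math
from collections import Counter


def chi_squared_gof(observations, p0):
    """Chi-squared statistic n * sum_j (p_hat_j - p0_j)^2 / p0_j for categorical data.

    `p0` maps each category to its hypothesized probability.
    """
    n = len(observations)
    counts = Counter(observations)
    return n * sum((counts[c] / n - p) ** 2 / p for c, p in p0.items())


# Hypothetical data: 100 draws over K = 3 categories.
observations = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
p0 = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

t_n = chi_squared_gof(observations, p0)
# For 2 degrees of freedom the chi-squared survival function is exp(-t / 2).
p_value = math.exp(-t_n / 2)
reject = p_value < 0.05
print(t_n, p_value, reject)
```

For other values of K the tail probability has to come from a χ² table or a statistics library.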
GoF for Continuous Distributions
Let
which completely characterizes the distribution of
The empirical cdf (a.k.a. sample cdf) of the sample
By the LLN, for all
By the Glivenko-Cantelli theorem (the fundamental theorem of statistics):
By the CLT, for all
(The variance of Bernoulli distribution is
Donsker’s Theorem states that if
where
We want to test:
where
Kolmogorov-Smirnov test
The Kolmogorov-Smirnov test statistic is defined as:
and the Kolmogorov-Smirnov test is
Here,
Even though the K-S test statistic
where
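The supremum in the K-S statistic only needs to be checked at the order statistics, just before and at each jump of the empirical cdf. The sketch below assumes the statistic is scaled by the square root of n and uses the classical series for the limiting Kolmogorov distribution to approximate a p-value. The function names and the simulated sample are assumptions for the example.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(sample, cdf0):
    """sqrt(n) * sup_t |F_n(t) - F0(t)|, computed from the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    sup = max(
        max(abs((i - 1) / n - cdf0(x)), abs(i / n - cdf0(x)))
        for i, x in enumerate(xs, start=1)
    )
    return math.sqrt(n) * sup


def kolmogorov_survival(t, terms=100):
    """Asymptotic P(T_n > t) under H0; the truncated series is crude for t near 0."""
    if t <= 0:
        return 1.0
    value = 2 * sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * t * t) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, value))


# Simulated standard Gaussian sample, tested against the N(0, 1) cdf.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
t_n = ks_statistic(sample, NormalDist().cdf)
p_value = kolmogorov_survival(t_n)
print(t_n, p_value)
```

Because the statistic is pivotal under the null hypothesis, the same quantiles apply whatever continuous cdf is being tested, and for small n they can also be obtained by simulation.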
Kolmogorov-Lilliefors Test
What if we want to test “Does X have a Gaussian distribution?” without knowing the parameters? A simple idea is to plug in estimates:
where:
In this case Donsker’s theorem is no longer valid.
Instead, we compute the quantiles for the test statistic:
They do not depend on unknown parameters! This is the Kolmogorov-Lilliefors test.
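Since the plug-in statistic has the same distribution under any Gaussian null, its quantiles can be estimated by simulating standard Gaussian samples. The sketch below is one way to do that; the use of the unbiased standard deviation as the plug-in estimate, the number of trials, and the function names are assumptions.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def lilliefors_statistic(sample):
    """K-S type statistic with the Gaussian mean and variance replaced by estimates."""
    fitted = NormalDist(mean(sample), stdev(sample))
    xs = sorted(sample)
    n = len(xs)
    sup = max(
        max(abs((i - 1) / n - fitted.cdf(x)), abs(i / n - fitted.cdf(x)))
        for i, x in enumerate(xs, start=1)
    )
    return math.sqrt(n) * sup


def simulated_quantile(n, alpha=0.05, trials=2000, seed=0):
    """Monte Carlo estimate of the (1 - alpha)-quantile under H0, using N(0, 1) draws."""
    rng = random.Random(seed)
    stats = sorted(
        lilliefors_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return stats[math.ceil((1 - alpha) * trials) - 1]


q = simulated_quantile(n=20)
print(q)
```

Shifting and rescaling a sample leaves the statistic unchanged, which is why simulating from N(0, 1) is enough. The test rejects Gaussianity when the statistic computed on the data exceeds the simulated quantile.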
Example: Testing the Mean for a Sample with Unknown Distribution
Suppose that you observe a sample
Looking at a histogram, you suspect that
We can use the Kolmogorov-Lilliefors test to decide between
Suppose that the test we used in the previous part for
Then we can use Student’s T test to decide between the original hypotheses
In practice, many methods for statistical inference, such as Student’s T test, rely on the assumption that the data is Gaussian. Hence, before performing such a test, we need to evaluate whether the data is plausibly Gaussian. This problem gives an example of such a procedure: we first tested the Gaussianity of our data and, since the Kolmogorov-Lilliefors test failed to reject (and assuming that it made no error), we could apply Student’s T test to answer our original hypothesis testing question.