The trinity of statistical inference: estimation, confidence intervals and testing.
Estimator: a single value whose performance can be measured by consistency, asymptotic normality, bias, variance and quadratic risk.
Confidence intervals provide “error bars” around estimators. Their size depends on the confidence level.
Hypothesis testing: we want to answer a yes/no question about an unknown parameter. Tests are characterized by hypotheses, level, power, a test statistic and a rejection region. Under the null hypothesis, the value of the unknown parameter becomes known (no need for plug-in).
Statistical model
Formal definition:
Let the observed outcome of a statistical experiment be a sample X1, X2, …, Xn of n i.i.d. random variables in some measurable space E (usually E⊆R) and denote by P their common distribution. A statistical model associated to that statistical experiment is a pair:
(E,(Pθ)θ∈Θ)
where:
E is called sample space
(Pθ)θ∈Θ is a family of probability measures on E
Θ is any set, called the parameter set
For example: the statistical model of Bernoulli distribution: ({0,1},(Ber(p))p∈(0,1))
Parametric, nonparametric and semiparametric models
Usually, we will assume that the statistical model is well specified, i.e., defined such that P=Pθ for some θ∈Θ. This particular θ is called the true parameter, and is unknown: the aim of the statistical experiment is to estimate θ, or to check its properties when they have a special meaning.
if Θ⊆Rd for some d≥1, the model is called parametric
if Θ is infinite dimensional, the model is called nonparametric
if Θ=Θ1×Θ2, where Θ1 is finite dimensional, and Θ2 is infinite dimensional, then the model is called semiparametric.
Identifiability
The parameter θ is called identifiable iff the map θ∈Θ↦Pθ is injective, i.e.:
$\theta \ne \theta' \Rightarrow \mathbb{P}_\theta \ne \mathbb{P}_{\theta'}$
or equivalently:
$\mathbb{P}_\theta = \mathbb{P}_{\theta'} \Rightarrow \theta = \theta'$
Estimation
A statistic is any measurable function of the sample.
An estimator of θ is a statistic θ^n=θ^n(X1,…,Xn) whose expression does not depend on θ.
An estimator $\hat\theta_n$ of θ is weakly (resp. strongly) consistent if $\hat\theta_n \xrightarrow[n\to\infty]{\mathbb{P}\ (\text{resp. a.s.})} \theta$
An estimator $\hat\theta_n$ of θ is asymptotically normal if $\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow[n\to\infty]{(d)} \mathcal{N}(0, \sigma^2)$. The quantity σ² is then called the asymptotic variance of $\hat\theta_n$.
Bias of an estimator $\hat\theta_n$ of θ:
$\mathrm{bias}(\hat\theta_n) = \mathbb{E}[\hat\theta_n] - \theta$
If $\mathrm{bias}(\hat\theta_n) = 0$, we say that $\hat\theta_n$ is unbiased.
We want estimators to have low bias and low variance at the same time.
The risk (or quadratic risk) of an estimator $\hat\theta_n \in \mathbb{R}$ is
$R(\hat\theta_n) = \mathbb{E}\big[|\hat\theta_n - \theta|^2\big]$
Expanding $|\hat\theta_n - \theta|^2 = |(\hat\theta_n - \mathbb{E}[\hat\theta_n]) + (\mathbb{E}[\hat\theta_n] - \theta)|^2$ and noting that the cross term has expectation zero gives the decomposition: quadratic risk = variance + bias².
For example: for the Bernoulli model ({0,1},(Ber(p))p∈(0,1)), using $\hat p_n = \bar X_n$ as an estimator for p, this estimator is unbiased, consistent, and its quadratic risk $\frac{p(1-p)}{n}$ tends to 0 as the sample size n→∞.
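These claims can be checked with a quick Monte Carlo sketch (the true p = 0.3, the repetition count, and the seed below are illustrative assumptions, not from the source): the simulated quadratic risk of $\hat p_n = \bar X_n$ tracks p(1−p)/n and shrinks as n grows.

```python
import random

# Monte Carlo sketch: p_hat = sample mean of Ber(p) draws.
# Theory: bias = 0 and R(p_hat) = Var(p_hat) = p(1-p)/n.
random.seed(0)
p = 0.3      # assumed "true" parameter for the simulation
reps = 2000  # number of simulated experiments per sample size

def risk(n):
    """Average |p_hat - p|^2 over `reps` simulated samples of size n."""
    total = 0.0
    for _ in range(reps):
        p_hat = sum(1 if random.random() < p else 0 for _ in range(n)) / n
        total += (p_hat - p) ** 2
    return total / reps

r10, r100, r1000 = risk(10), risk(100), risk(1000)
for n, r in ((10, r10), (100, r100), (1000, r1000)):
    print(n, r, p * (1 - p) / n)  # simulated risk vs theoretical p(1-p)/n
```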
Confidence Intervals
Let (E,(Pθ)θ∈Θ) be a statistical model based on observations X1,X2,…,Xn and assume Θ⊆R. Let α∈(0,1).
Confidence interval (C.I.) of level 1−α for θ: any random (depending on X1,X2,…,Xn) interval I whose boundaries do not depend on θ and such that:
$\mathbb{P}_\theta[I \ni \theta] \ge 1 - \alpha, \quad \forall \theta \in \Theta$
C.I. of asymptotic level 1−α for θ: any random interval I whose boundaries do not depend on θ and such that:
$\lim_{n\to\infty} \mathbb{P}_\theta[I \ni \theta] \ge 1 - \alpha, \quad \forall \theta \in \Theta$
Be aware that the requirement is $\mathbb{P} \ge 1-\alpha$, not $\mathbb{P} = 1-\alpha$.
For example: for the Bernoulli model ({0,1},(Ber(p))p∈(0,1)), using $\hat p_n = \bar X_n$ as an estimator for p, the CLT gives:
$\sqrt{n}\,\dfrac{\bar X_n - p}{\sqrt{p(1-p)}} \xrightarrow[n\to\infty]{(d)} \mathcal{N}(0, 1)$
For a fixed α∈(0,1), if $q_{\alpha/2}$ is the (1−α/2)-quantile of $\mathcal{N}(0,1)$, then with probability ≃ 1−α (if n is large enough):
$\bar X_n - \dfrac{q_{\alpha/2}\sqrt{p(1-p)}}{\sqrt{n}} \le p \le \bar X_n + \dfrac{q_{\alpha/2}\sqrt{p(1-p)}}{\sqrt{n}}$
Since these bounds still involve the unknown p, a usable interval is obtained by the conservative bound p(1−p) ≤ 1/4, by solving the quadratic inequality in p ("solve"), or by substituting $\hat p_n$ for p ("plug-in").
A 95% C.I. means: if we were to repeat the experiment many times, the true parameter θ would be inside the resulting confidence interval about 95% of the time.
It is wrong to say that
there is a 95% chance that the true parameter θ is in the resulting confidence interval,
because from the frequentist point of view, the true parameter θ is deterministic (fixed, even though unknown). Once the confidence interval is computed, the true parameter θ either is in the C.I. or is not; like a Bernoulli trial, the outcome is only 1 or 0. But I suppose we can say that:
The expectation of that Bernoulli distribution is 95%.
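This frequentist reading can be illustrated by simulation; a minimal sketch assuming p = 0.4, n = 500, the plug-in CLT interval, and an arbitrary seed:

```python
import random

# Repeat the Bernoulli experiment many times and count how often the random
# interval [x_bar +/- 1.96 * sqrt(x_bar(1-x_bar)/n)] contains the true p.
random.seed(1)
p, n, reps = 0.4, 500, 2000
q = 1.96  # the (1 - 0.05/2)-quantile of N(0,1)

covered = 0
for _ in range(reps):
    x_bar = sum(1 if random.random() < p else 0 for _ in range(n)) / n
    half = q * (x_bar * (1 - x_bar) / n) ** 0.5  # plug-in half-width
    if x_bar - half <= p <= x_bar + half:
        covered += 1

print(covered / reps)  # close to 0.95: each interval either contains p or not
```

Each simulated interval is random; the fixed p lands inside roughly 95% of them, which is exactly the coverage statement above.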
Steps to find a confidence interval
Find an estimator $\hat\theta$ for θ
Determine the (asymptotic) distribution of $\hat\theta$
Compute a confidence interval for θ based on $\hat\theta$ with level 1−α
Delta method
Exponential distribution example (1/2)
Take the Exponential distribution as an example, with PDF $f(t) = \lambda e^{-\lambda t}$, $\forall t \ge 0$.
Let $X_1, X_2, \ldots, X_n \overset{iid}{\sim} \mathrm{Exp}(\lambda)$, and let $\bar X_n := \frac{1}{n}\sum_{i=1}^n X_i$ be the sample mean. By the LLN, $\bar X_n \xrightarrow[n\to\infty]{a.s./\mathbb{P}} \frac{1}{\lambda}$, because $\mathbb{E}[X_1] = \frac{1}{\lambda}$.
So a natural estimator of λ is:
$\hat\lambda := \dfrac{1}{\bar X_n}$
Hence: $\hat\lambda \xrightarrow[n\to\infty]{a.s./\mathbb{P}} \lambda$.
Be careful that, by Jensen's inequality, $\mathbb{E}\big[\frac{1}{\bar X_n}\big] > \frac{1}{\mathbb{E}[\bar X_n]} = \lambda$, so $\hat\lambda$ is biased upward.
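The upward bias is easy to see numerically; a sketch assuming λ = 2 and a deliberately small sample size n = 5, where the effect is pronounced (for Exp(λ) one can show $\mathbb{E}[1/\bar X_n] = \frac{n\lambda}{n-1}$, i.e. 2.5 here):

```python
import random

# Jensen check: the average of lambda_hat = 1 / x_bar over many simulated
# samples exceeds lambda. (lambda = 2.0, n = 5, seed assumed for illustration.)
random.seed(2)
lam, n, reps = 2.0, 5, 20000

total = 0.0
for _ in range(reps):
    x_bar = sum(random.expovariate(lam) for _ in range(n)) / n
    total += 1.0 / x_bar

print(total / reps, lam)  # the first number is noticeably larger than lambda
```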
By CLT:
$\sqrt{n}\left(\bar X_n - \dfrac{1}{\lambda}\right) \xrightarrow[n\to\infty]{(d)} \mathcal{N}\left(0, \dfrac{1}{\lambda^2}\right)$
How does the CLT transfer to λ^? How to find an asymptotic confidence interval for λ? Here we need to use the Delta method.
The Delta method
Let $(Z_n)_{n \ge 1}$ be a sequence of r.v. that satisfies
$\sqrt{n}\,(Z_n - \theta) \xrightarrow[n\to\infty]{(d)} \mathcal{N}(0, \sigma^2)$
for some θ∈R and σ²>0 (the sequence $(Z_n)_{n \ge 1}$ is said to be asymptotically normal around θ).
Let g:R→R be continuously differentiable at the point θ. Then $(g(Z_n))_{n \ge 1}$ is also asymptotically normal around g(θ); more precisely:
$\sqrt{n}\,\big(g(Z_n) - g(\theta)\big) \xrightarrow[n\to\infty]{(d)} \mathcal{N}\big(0, (g'(\theta))^2 \sigma^2\big)$
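A numeric sanity check of this statement (not from the source; g(x) = x², uniform draws, and all constants are illustrative assumptions): with Z_n the mean of n Uniform(0,1) variables, θ = 1/2 and σ² = 1/12, so the limit variance should be (g′(θ))²σ² = (2·0.5)²/12 = 1/12.

```python
import random

# Delta-method sketch: Z_n = mean of n Uniform(0,1) draws, g(x) = x**2.
# sqrt(n)(g(Z_n) - g(theta)) should have variance near (2*theta)**2 * sigma2.
random.seed(3)
n, reps = 500, 5000
theta, sigma2 = 0.5, 1.0 / 12.0

vals = []
for _ in range(reps):
    z = sum(random.random() for _ in range(n)) / n
    vals.append(n ** 0.5 * (z * z - theta * theta))

mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
print(var, (2 * theta) ** 2 * sigma2)  # both close to 1/12
```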
Exponential distribution example (2/2)
By the delta method with $g(x) = \frac{1}{x}$, so that $g'(1/\lambda) = -\lambda^2$ and $(g'(1/\lambda))^2 \cdot \frac{1}{\lambda^2} = \lambda^2$:
$\sqrt{n}\,(\hat\lambda - \lambda) \xrightarrow[n\to\infty]{(d)} \mathcal{N}(0, \lambda^2)$
This gives an asymptotic confidence interval for λ:
$\left[\hat\lambda - \dfrac{q_{\alpha/2}\,\lambda}{\sqrt{n}},\ \hat\lambda + \dfrac{q_{\alpha/2}\,\lambda}{\sqrt{n}}\right]$
Since λ is unknown, we then use the "solve" or "plug-in" method to get a computable confidence interval for λ.
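A sketch of the plug-in route (λ = 1.5, n = 1000, and the seed are illustrative assumptions): replacing the unknown λ in the width by $\hat\lambda$ gives the interval $\hat\lambda\,(1 \pm q_{\alpha/2}/\sqrt{n})$.

```python
import random

# Plug-in asymptotic 95% C.I. for lambda from one simulated Exp(lambda) sample.
random.seed(4)
lam, n = 1.5, 1000  # assumed true parameter and sample size
q = 1.96            # the (1 - 0.05/2)-quantile of N(0,1)

x_bar = sum(random.expovariate(lam) for _ in range(n)) / n
lam_hat = 1.0 / x_bar
lo, hi = lam_hat * (1 - q / n ** 0.5), lam_hat * (1 + q / n ** 0.5)
print(lo, lam_hat, hi)  # the interval is centered at lam_hat
```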
Hypothesis testing
Statistical formulation
Consider a sample X1,X2,…,Xn of i.i.d. random variables and a statistical model (E,(Pθ)θ∈Θ)
Let Θ0 and Θ1 be disjoint subsets of Θ
Consider the two hypotheses:
H0:θ∈Θ0
H1:θ∈Θ1
H0 is the null hypothesis, H1 is the alternative hypothesis
If we believe that the true θ is either in H0 or in H1, we may want to test H0 against H1
We want to decide whether to reject H0 (look for evidence against H0 in the data)
Asymmetry in the hypotheses
H0 and H1 do not play a symmetric role: the data is only used to try to disprove H0
In particular, lack of evidence does not mean that H0 is true ("innocent until proven guilty")
A test is a statistic ψ∈{0,1} such that:
If ψ=0, H0 is not rejected
If ψ=1, H0 is rejected
Errors
Rejection region of a test ψ:
Rψ={x∈En:ψ(x)=1}
Type I error of a test ψ (rejecting H0 when it is actually true): $\alpha_\psi(\theta) = \mathbb{P}_\theta[\psi = 1]$, for $\theta \in \Theta_0$
Type II error of a test ψ (not rejecting H0 although H1 is actually true): $\beta_\psi(\theta) = \mathbb{P}_\theta[\psi = 0]$, for $\theta \in \Theta_1$
Power of a test ψ:
$\pi_\psi = \inf_{\theta \in \Theta_1} \big(1 - \beta_\psi(\theta)\big)$
Level, test statistic and rejection region
A test has level α if:
$\alpha_\psi(\theta) \le \alpha, \quad \forall \theta \in \Theta_0$
A test has asymptotic level α if:
$\lim_{n\to\infty} \alpha_{\psi_n}(\theta) \le \alpha, \quad \forall \theta \in \Theta_0$
In general, a test has the form:
$\psi = \mathbf{1}\{T_n > c\}$
for some statistic Tn and threshold c∈R
Tn is called the test statistic. The rejection region is Rψ={Tn>c}
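As a concrete sketch (the hypotheses, n, and all constants are illustrative assumptions): for Ber(p) data, test H0: p = 0.5 against H1: p > 0.5 with $T_n = \sqrt{n}\,(\bar X_n - 0.5)/0.5$ and rejection region {T_n > 1.645}, which has asymptotic level 5%.

```python
import random

# Simulate the test's rejection frequency: at p = 0.5 this estimates the
# type I error (should be near the 5% level); at p = 0.6 it estimates power.
random.seed(5)
n, reps = 400, 4000
c = 1.645  # the (1 - 0.05)-quantile of N(0,1)

def reject_rate(p):
    """Fraction of simulated Ber(p) samples on which T_n > c."""
    count = 0
    for _ in range(reps):
        x_bar = sum(1 if random.random() < p else 0 for _ in range(n)) / n
        t = n ** 0.5 * (x_bar - 0.5) / 0.5
        if t > c:
            count += 1
    return count / reps

alpha_hat = reject_rate(0.5)  # type I error estimate
power_hat = reject_rate(0.6)  # power estimate at p = 0.6
print(alpha_hat, power_hat)
```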
One-sided vs two-sided tests
We can refine the terminology when θ∈Θ⊂R and H0 is of the form:
H0:θ=θ0⟺Θ0={θ0}
If $H_1: \theta \ne \theta_0$: two-sided test
If $H_1: \theta > \theta_0$ or $H_1: \theta < \theta_0$: one-sided test
p-value
The (asymptotic) p-value of a test ψ is the smallest (asymptotic) level α at which ψ rejects H0. It is random: it depends on the sample.
p-value ≤ α ⟺ H0 is rejected by ψ at the (asymptotic) level α
The smaller the p-value, the more confidently one can reject H0.
Steps of hypothesis testing
Find estimators
Find a pivot and determine its distribution. Write a test statistic Tn and let ψ = 1{Tn > c}
Tn is a pivot if we can write it in such a way that its distribution under the null hypothesis is known and does not depend on any unknown parameters.
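The steps above can be sketched end to end for the Bernoulli model (H0: p = 0.5 vs H1: p ≠ 0.5; the true p, n, and seed below are illustrative assumptions). Under H0, $T_n = \sqrt{n}\,(\bar X_n - 0.5)/\sqrt{0.5 \cdot 0.5}$ is asymptotically N(0,1): it is a pivot, since its null distribution involves no unknown parameter.

```python
import math
import random

# Step 1: estimate p by the sample mean; step 2: form the pivot under H0;
# step 3: convert the observed statistic into a two-sided p-value.
random.seed(6)
n, p_true = 2500, 0.6  # p_true is only used to generate the data

x_bar = sum(1 if random.random() < p_true else 0 for _ in range(n)) / n
t = n ** 0.5 * (x_bar - 0.5) / 0.5  # pivot: ~N(0,1) under H0: p = 0.5

def std_normal_cdf(x):
    """CDF of N(0,1) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p_value = 2.0 * (1.0 - std_normal_cdf(abs(t)))
print(x_bar, t, p_value)  # tiny p-value: H0 is rejected at any usual level
```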