You're probably thinking of the two sample t test because that's often the first place the t distribution comes up. But really all a t test means is that the reference distribution for the test statistic is a t distribution. If $Z \sim N(0,1)$ and $S^2 \sim \chi^2_d$ with $Z$ and $S^2$ independent, then
$$\frac{Z}{\sqrt{S^2/d}} \sim t_d$$
by definition. I'm writing this out to emphasize that the $t$ distribution is just a name that was given to the distribution of this ratio because it comes up a lot, and anything of this form will have a $t$ distribution. For the two sample t test, this ratio appears because under the null the difference in means is a zero-mean Gaussian and the variance estimate for independent Gaussians is an independent $\chi^2$ (the independence can be shown via Basu's theorem, which uses the fact that the standard variance estimate in a Gaussian sample is ancillary to the population mean, while the sample mean is complete and sufficient for that same quantity).
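Here's a minimal simulation sketch of that defining ratio (assuming numpy and scipy are available; the degrees of freedom and number of draws are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 7                                  # chi-squared degrees of freedom
n_sims = 100_000

# Z ~ N(0, 1) and S^2 ~ chi^2_d, drawn independently
Z = rng.standard_normal(n_sims)
S2 = rng.chisquare(d, n_sims)
ratio = Z / np.sqrt(S2 / d)

# The ratio should be indistinguishable from a t distribution with d degrees of freedom
print(stats.kstest(ratio, stats.t(df=d).cdf))
```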
With linear regression we basically get the same thing. In vector form, $\hat\beta \sim N(\beta, \sigma^2 (X^TX)^{-1})$. Let $S_j^2 = (X^TX)^{-1}_{jj}$ and assume the predictors $X$ are non-random. If we knew $\sigma^2$ we'd have
$$\frac{\hat\beta_j - 0}{\sigma S_j} \sim N(0,1)$$
under the null $H_0: \beta_j = 0$, so we'd actually have a Z test. But once we estimate $\sigma^2$ we end up with a $\chi^2$ random variable that, under our normality assumptions, turns out to be independent of our statistic $\hat\beta_j$, and then we get a $t$ distribution.
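Here's a quick sketch of that known-$\sigma$ Z statistic under the null (assuming numpy and scipy; the design, coefficients, and $\sigma$ below are arbitrary, with the last coefficient set to zero so $H_0$ holds for it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma = 50, 3, 1.5
beta = np.array([1.0, -2.0, 0.0])            # last coefficient is truly zero
X = rng.normal(size=(n, p))                  # fixed (non-random) design, reused below
S_j = np.sqrt(np.linalg.inv(X.T @ X)[2, 2])  # S_j for the last coefficient

z_stats = []
for _ in range(20_000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    z_stats.append(beta_hat[2] / (sigma * S_j))  # known sigma => Z statistic

print(stats.kstest(z_stats, stats.norm.cdf))      # should look standard normal
```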
Here are the details of that. Assume $y \sim N(X\beta, \sigma^2 I)$. Letting $H = X(X^TX)^{-1}X^T$ be the hat matrix, we have
$$\|e\|^2 = \|(I-H)y\|^2 = y^T(I-H)y.$$
$H$ is idempotent (and therefore so is $I-H$), so we have the really nice result that
$$y^T(I-H)y/\sigma^2 \sim \chi^2_{n-p}(\delta)$$
with non-centrality parameter $\delta = \beta^TX^T(I-H)X\beta = \beta^T(X^TX - X^TX)\beta = 0$, so actually this is a central $\chi^2$ with $n-p$ degrees of freedom (this is a special case of Cochran's theorem). I'm using $p$ to denote the number of columns of $X$, so if one column of $X$ gives the intercept then we'd have $p-1$ non-intercept predictors. Some authors use $p$ for the number of non-intercept predictors, so sometimes you might see something like $n-p-1$ in the degrees of freedom there, but it's all the same thing.
The result of this is that $E(e^Te/\sigma^2) = n-p$, so $\hat\sigma^2 := \frac{1}{n-p}e^Te$ works great as an estimator of $\sigma^2$.
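Here's a simulation sketch of both facts (assuming numpy and scipy; the dimensions and $\sigma$ are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, sigma = 40, 4, 2.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

scaled_rss = []
for _ in range(20_000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    scaled_rss.append(e @ e / sigma**2)
scaled_rss = np.array(scaled_rss)

print(stats.kstest(scaled_rss, stats.chi2(df=n - p).cdf))  # e'e / sigma^2 ~ chi^2_{n-p}
print(scaled_rss.mean() / (n - p))                          # E(e'e / sigma^2) / (n-p) ≈ 1
```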
This means that
$$\frac{\hat\beta_j}{\hat\sigma S_j} = \frac{\hat\beta_j}{S_j\sqrt{e^Te/(n-p)}} = \frac{\hat\beta_j}{\sigma S_j \sqrt{\frac{e^Te}{\sigma^2(n-p)}}}$$
is the ratio of a standard Gaussian to the square root of a chi-squared divided by its degrees of freedom. To finish this we need to show independence, and we can use the following result:
Result: for $Z \sim N_k(\mu, \Sigma)$ and matrices $A$ and $B$ in $\mathbb R^{l \times k}$ and $\mathbb R^{m \times k}$ respectively, $AZ$ and $BZ$ are independent if and only if $A\Sigma B^T = 0$ (this is Exercise 58(b) in Chapter 1 of Jun Shao's Mathematical Statistics).
We have $\hat\beta = (X^TX)^{-1}X^Ty$ and $e = (I-H)y$ where $y \sim N(X\beta, \sigma^2 I)$. This means
$$(X^TX)^{-1}X^T \cdot \sigma^2 I \cdot (I-H)^T = \sigma^2\left((X^TX)^{-1}X^T - (X^TX)^{-1}X^TX(X^TX)^{-1}X^T\right) = 0$$
so $\hat\beta \perp e$, and therefore $\hat\beta \perp e^Te$.
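That matrix identity is easy to confirm numerically if you want a sanity check (assuming numpy; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 4
X = rng.normal(size=(n, p))

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
A = np.linalg.inv(X.T @ X) @ X.T       # beta_hat = A y
B = np.eye(n) - H                      # e = B y

# The cross-covariance of A y and B y is sigma^2 * A @ B.T, which should vanish
print(np.max(np.abs(A @ B.T)))         # ~ 1e-15, i.e. zero up to floating point error
```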
The upshot is we now know
$$\frac{\hat\beta_j}{\hat\sigma S_j} \sim t_{n-p}$$
as desired (under all of the above assumptions).
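And here's one last simulation sketch tying it all together, checking that the studentized coefficient matches $t_{n-p}$ under the null (again assuming numpy and scipy, with arbitrary dimensions and the second coefficient set to zero):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, sigma = 30, 3, 1.0
beta = np.array([0.5, 0.0, 2.0])             # beta_1 = 0, so H0 holds for j = 1
X = rng.normal(size=(n, p))
S_1 = np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

t_stats = []
for _ in range(20_000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    sigma_hat = np.sqrt(e @ e / (n - p))     # the estimator from above
    t_stats.append(beta_hat[1] / (sigma_hat * S_1))

print(stats.kstest(t_stats, stats.t(df=n - p).cdf))  # should match t_{n-p}
```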
Here's the proof of that result. Let $C = \begin{pmatrix} A \\ B \end{pmatrix}$ be the $(l+m) \times k$ matrix formed by stacking $A$ on top of $B$. Then
$$CZ = \begin{pmatrix} AZ \\ BZ \end{pmatrix} \sim N\left(\begin{pmatrix} A\mu \\ B\mu \end{pmatrix}, C\Sigma C^T\right)$$
where
$$C\Sigma C^T = \begin{pmatrix} A \\ B \end{pmatrix}\Sigma\begin{pmatrix} A^T & B^T \end{pmatrix} = \begin{pmatrix} A\Sigma A^T & A\Sigma B^T \\ B\Sigma A^T & B\Sigma B^T \end{pmatrix}.$$
$CZ$ is a multivariate Gaussian, and it is a well-known result that two components of a multivariate Gaussian are independent if and only if they are uncorrelated, so the condition $A\Sigma B^T = 0$ turns out to be exactly equivalent to the components $AZ$ and $BZ$ in $CZ$ being uncorrelated. $\square$