You're probably thinking of the two sample t test because that's often the first place the t distribution comes up. But really all a t test means is that the reference distribution for the test statistic is a t distribution. If $Z \sim N(0,1)$ and $S^2 \sim \chi^2_d$ with $Z$ and $S^2$ independent, then
$$\frac{Z}{\sqrt{S^2/d}} \sim t_d$$
by definition. I'm writing this out to emphasize that the $t$ distribution is just a name that was given to the distribution of this ratio because it comes up a lot, and anything of this form will have a $t$ distribution. For the two sample t test, this ratio appears because under the null the difference in means is a zero-mean Gaussian and the variance estimate for independent Gaussians is an independent $\chi^2$ (the independence can be shown via Basu's theorem, which uses the fact that the standard variance estimate in a Gaussian sample is ancillary to the population mean, while the sample mean is complete and sufficient for that same quantity).
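Here's a minimal simulation sketch of that defining ratio (assuming numpy and scipy are available; the degrees of freedom and number of draws are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
d = 7                                  # chi-squared degrees of freedom
n_sims = 100_000

# Z ~ N(0, 1) and S^2 ~ chi^2_d, drawn independently
Z = rng.standard_normal(n_sims)
S2 = rng.chisquare(d, n_sims)
ratio = Z / np.sqrt(S2 / d)

# The ratio should be indistinguishable from a t distribution with d degrees of freedom
print(stats.kstest(ratio, stats.t(df=d).cdf))
```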
With linear regression we basically get the same thing. In vector form, $\hat\beta \sim N(\beta, \sigma^2 (X^TX)^{-1})$. Let $S_j^2 = (X^TX)^{-1}_{jj}$ and assume the predictors $X$ are non-random. If we knew $\sigma^2$ we'd have
$$\frac{\hat\beta_j - 0}{\sigma S_j} \sim N(0,1)$$
under the null $H_0: \beta_j = 0$, so we'd actually have a Z test. But once we estimate $\sigma^2$ we end up with a $\chi^2$ random variable that, under our normality assumptions, turns out to be independent of our statistic $\hat\beta_j$, and then we get a $t$ distribution.
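Here's a quick sketch of that known-$\sigma$ Z statistic under the null (assuming numpy and scipy; the design, coefficients, and $\sigma$ below are arbitrary, with the last coefficient set to zero so $H_0$ holds for it):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, sigma = 50, 3, 1.5
beta = np.array([1.0, -2.0, 0.0])            # last coefficient is truly zero
X = rng.normal(size=(n, p))                  # fixed (non-random) design, reused below
S_j = np.sqrt(np.linalg.inv(X.T @ X)[2, 2])  # S_j for the last coefficient

z_stats = []
for _ in range(20_000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    z_stats.append(beta_hat[2] / (sigma * S_j))  # known sigma => Z statistic

print(stats.kstest(z_stats, stats.norm.cdf))      # should look standard normal
```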
Here are the details of that. Assume $y \sim N(X\beta, \sigma^2 I)$. Letting $H = X(X^TX)^{-1}X^T$ be the hat matrix, we have
$$\|e\|^2 = \|(I-H)y\|^2 = y^T(I-H)y.$$
$H$ is idempotent (and therefore so is $I-H$), so we have the really nice result that
$$y^T(I-H)y/\sigma^2 \sim \chi^2_{n-p}(\delta)$$
with non-centrality parameter $\delta = \beta^TX^T(I-H)X\beta = \beta^T(X^TX - X^TX)\beta = 0$, so actually this is a central $\chi^2$ with $n-p$ degrees of freedom (this is a special case of Cochran's theorem). I'm using $p$ to denote the number of columns of $X$, so if one column of $X$ gives the intercept then we'd have $p-1$ non-intercept predictors. Some authors use $p$ for the number of non-intercept predictors, so sometimes you might see something like $n-p-1$ in the degrees of freedom there, but it's all the same thing.
The result of this is that $E(e^Te/\sigma^2) = n-p$, so $\hat\sigma^2 := \frac{1}{n-p}e^Te$ works great as an estimator of $\sigma^2$.
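Here's a simulation sketch of both facts (assuming numpy and scipy; the dimensions and $\sigma$ are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p, sigma = 40, 4, 2.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)

scaled_rss = []
for _ in range(20_000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    scaled_rss.append(e @ e / sigma**2)
scaled_rss = np.array(scaled_rss)

print(stats.kstest(scaled_rss, stats.chi2(df=n - p).cdf))  # e'e / sigma^2 ~ chi^2_{n-p}
print(scaled_rss.mean() / (n - p))                          # E(e'e / sigma^2) / (n-p) ≈ 1
```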
This means that
$$\frac{\hat\beta_j}{\hat\sigma S_j} = \frac{\hat\beta_j}{S_j\sqrt{e^Te/(n-p)}} = \frac{\hat\beta_j}{\sigma S_j \sqrt{\frac{e^Te}{\sigma^2(n-p)}}}$$
is the ratio of a standard Gaussian to the square root of a chi-squared divided by its degrees of freedom. To finish this we need to show independence, and we can use the following result:
Result: for $Z \sim N_k(\mu, \Sigma)$ and matrices $A$ and $B$ in $\mathbb R^{l \times k}$ and $\mathbb R^{m \times k}$ respectively, $AZ$ and $BZ$ are independent if and only if $A\Sigma B^T = 0$ (this is Exercise 58(b) in Chapter 1 of Jun Shao's Mathematical Statistics).
We have $\hat\beta = (X^TX)^{-1}X^Ty$ and $e = (I-H)y$ where $y \sim N(X\beta, \sigma^2 I)$. This means
$$(X^TX)^{-1}X^T \cdot \sigma^2 I \cdot (I-H)^T = \sigma^2\left((X^TX)^{-1}X^T - (X^TX)^{-1}X^TX(X^TX)^{-1}X^T\right) = 0$$
so $\hat\beta \perp e$, and therefore $\hat\beta \perp e^Te$.
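That matrix identity is easy to confirm numerically if you want a sanity check (assuming numpy; the dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 25, 4
X = rng.normal(size=(n, p))

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
A = np.linalg.inv(X.T @ X) @ X.T       # beta_hat = A y
B = np.eye(n) - H                      # e = B y

# The cross-covariance of A y and B y is sigma^2 * A @ B.T, which should vanish
print(np.max(np.abs(A @ B.T)))         # ~ 1e-15, i.e. zero up to floating point error
```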
The upshot is we now know
$$\frac{\hat\beta_j}{\hat\sigma S_j} \sim t_{n-p}$$
as desired (under all of the above assumptions).
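And here's one last simulation sketch tying it all together, checking that the studentized coefficient matches $t_{n-p}$ under the null (again assuming numpy and scipy, with arbitrary dimensions and the second coefficient set to zero):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, sigma = 30, 3, 1.0
beta = np.array([0.5, 0.0, 2.0])             # beta_1 = 0, so H0 holds for j = 1
X = rng.normal(size=(n, p))
S_1 = np.sqrt(np.linalg.inv(X.T @ X)[1, 1])

t_stats = []
for _ in range(20_000):
    y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    sigma_hat = np.sqrt(e @ e / (n - p))     # the estimator from above
    t_stats.append(beta_hat[1] / (sigma_hat * S_1))

print(stats.kstest(t_stats, stats.t(df=n - p).cdf))  # should match t_{n-p}
```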
Here's the proof of that result. Let $C = \begin{pmatrix} A \\ B \end{pmatrix}$ be the $(l+m) \times k$ matrix formed by stacking $A$ on top of $B$. Then
$$CZ = \begin{pmatrix} AZ \\ BZ \end{pmatrix} \sim N\left(\begin{pmatrix} A\mu \\ B\mu \end{pmatrix}, C\Sigma C^T\right)$$
where
$$C\Sigma C^T = \begin{pmatrix} A \\ B \end{pmatrix}\Sigma\begin{pmatrix} A^T & B^T \end{pmatrix} = \begin{pmatrix} A\Sigma A^T & A\Sigma B^T \\ B\Sigma A^T & B\Sigma B^T \end{pmatrix}.$$
$CZ$ is a multivariate Gaussian, and it is a well-known result that two components of a multivariate Gaussian are independent if and only if they are uncorrelated, so the condition $A\Sigma B^T = 0$ turns out to be exactly equivalent to the components $AZ$ and $BZ$ in $CZ$ being uncorrelated. $\square$