I would like to suggest that this phenomenon (a non-significant overall F-test despite a significant individual variable) can be understood as a kind of aggregate "masking effect," and that although it conceivably could arise from multicollinear explanatory variables, it need not arise that way at all. Nor does it turn out to be due to multiple comparison adjustments. Thus this answer adds some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.
To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal explanatory variables--as non-collinear as possible--and a dependent variable that is, by construction, determined solely by the first of them (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as follows:
set.seed(17)
p <- 5   # Number of explanatory variables
# All 2^p combinations of -1/+1: a full factorial design, so the columns are mutually orthogonal
x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
# y is determined solely by the first column, plus plenty of independent random error (sd = 2)
y <- x[,1] + rnorm(2^p, mean=0, sd=2)
It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can verify (to make sure the code is working as expected) by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients in the last column suggest y has little to do with any of the variables except the first (which is by design), and the off-diagonal zeros among the explanatory variables confirm their orthogonality:
> cor(cbind(x,y))
      Var1  Var2  Var3   Var4  Var5      y
Var1  1.00 0.000 0.000  0.000  0.00  0.486
Var2  0.00 1.000 0.000  0.000  0.00  0.088
Var3  0.00 0.000 1.000  0.000  0.00  0.044
Var4  0.00 0.000 0.000  1.000  0.00 -0.014
Var5  0.00 0.000 0.000  0.000  1.00 -0.167
y     0.49 0.088 0.044 -0.014 -0.17  1.000
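As a supplementary exact check (beyond correlations rounded to a few decimals, and not part of the original output), the cross-product of the design matrix confirms that the columns are orthogonal by construction:

crossprod(x)   # t(x) %*% x; for this full 2^p factorial it equals 2^p times the identity matrix,
               # so every pair of distinct columns has dot product exactly zero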
Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:
> temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))
#              Estimate Std. Error t value Pr(>|t|)
1 x[, 1:i]       0.898      0.294    3.05   0.0048 **
  F-statistic:  9.29 on 1 and 30 DF,  p-value: 0.00478
2 x[, 1:i]Var1   0.898      0.298    3.01   0.0053 **
  F-statistic:  4.68 on 2 and 29 DF,  p-value: 0.0173
3 x[, 1:i]Var1  0.8975     0.3029    2.96   0.0062 **
  F-statistic:  3.05 on 3 and 28 DF,  p-value: 0.0451
4 x[, 1:i]Var1  0.8975     0.3084    2.91   0.0072 **
  F-statistic:  2.21 on 4 and 27 DF,  p-value: 0.095
5 x[, 1:i]Var1  0.8975     0.3084    2.91   0.0073 **
  F-statistic:  1.96 on 5 and 26 DF,  p-value: 0.118
Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., applying Bonferroni by multiplying the nominal p-value by the number of explanatory variables), and (b) the coefficient of the first variable barely changes, yet (c) the overall p-value grows rapidly, soon exceeding 0.05 and rendering the full model non-significant.
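If you would rather collect these numbers programmatically than read them off the printed summaries, a sketch along the following lines (reusing the x, y, and p created above) gathers, for each nested model, the first coefficient's p-value, a Bonferroni-adjusted version of it, and the overall F-test p-value:

# For each nested model: p-value of the first coefficient, its Bonferroni
# adjustment (multiplied by the number of explanatory variables in that model),
# and the p-value of the overall F-test.
pvals <- t(sapply(1:p, function(i) {
  s <- summary(lm(y ~ x[, 1:i]))
  f <- s$fstatistic
  c(first      = s$coefficients[2, "Pr(>|t|)"],
    Bonferroni = min(1, i * s$coefficients[2, "Pr(>|t|)"]),
    overall    = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}))
round(pvals, 4)

The overall column reproduces the F-test p-values printed above, while the first and Bonferroni columns stay below 0.05 throughout.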
I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall significance of the regression. When the new variables are orthogonal to the existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here arise because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.
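To see the parsimony point concretely, one option (again just a sketch on the same simulated data) is a partial F-test comparing the one-variable model with the full five-variable model:

fit1 <- lm(y ~ x[, 1])   # parsimonious model: only the variable that actually matters
fit5 <- lm(y ~ x)        # full model with all five explanatory variables
anova(fit1, fit5)        # partial F-test: do Var2..Var5 jointly improve the fit?
                         # In this simulation they add essentially nothing.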
I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.