I would like to suggest that this phenomenon (a non-significant overall F-test despite a significant individual variable) can be understood as a kind of aggregate "masking effect," and that although it conceivably could arise from multicollinear explanatory variables, it need not arise that way at all. Nor does it turn out to be due to multiple comparison adjustments. Thus this answer adds some qualifications to the answers that have already appeared, which on the contrary suggest that either multicollinearity or multiple comparisons should be looked at as the culprits.
To establish the plausibility of these assertions, let's generate a collection of perfectly orthogonal explanatory variables--as non-collinear as possible--and a dependent variable that is, by construction, determined solely by the first of them (plus a good amount of random error independent of everything else). In R this can be done (reproducibly, if you wish to experiment) as follows:
set.seed(17)
p <- 5   # Number of explanatory variables
# All 2^p combinations of -1/+1: a full factorial design, so the columns are mutually orthogonal
x <- as.matrix(do.call(expand.grid, lapply(as.list(1:p), function(i) c(-1,1))))
# y is determined solely by the first column, plus plenty of independent random error (sd = 2)
y <- x[,1] + rnorm(2^p, mean=0, sd=2)
It's unimportant that the explanatory variables are binary; what matters is their orthogonality, which we can verify (to make sure the code is working as expected) by inspecting their correlations. Indeed, the correlation matrix is interesting: the small coefficients in the last column suggest y has little to do with any of the variables except the first (which is by design), and the off-diagonal zeros among the explanatory variables confirm their orthogonality:
> cor(cbind(x,y))
      Var1  Var2  Var3   Var4  Var5      y
Var1  1.00 0.000 0.000  0.000  0.00  0.486
Var2  0.00 1.000 0.000  0.000  0.00  0.088
Var3  0.00 0.000 1.000  0.000  0.00  0.044
Var4  0.00 0.000 0.000  1.000  0.00 -0.014
Var5  0.00 0.000 0.000  0.000  1.00 -0.167
y     0.49 0.088 0.044 -0.014 -0.17  1.000
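As a supplementary exact check (beyond correlations rounded to a few decimals, and not part of the original output), the cross-product of the design matrix confirms that the columns are orthogonal by construction:

crossprod(x)   # t(x) %*% x; for this full 2^p factorial it equals 2^p times the identity matrix,
               # so every pair of distinct columns has dot product exactly zero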
Let's run a series of regressions, using only the first variable, then the first two, and so on. For brevity and easy comparison, in each one I show only the line for the first variable and the overall F-test:
> temp <- sapply(1:p, function(i) print(summary(lm(y ~ x[, 1:i]))))
#              Estimate Std. Error t value Pr(>|t|)
1 x[, 1:i]       0.898      0.294    3.05   0.0048 **
  F-statistic:  9.29 on 1 and 30 DF,  p-value: 0.00478
2 x[, 1:i]Var1   0.898      0.298    3.01   0.0053 **
  F-statistic:  4.68 on 2 and 29 DF,  p-value: 0.0173
3 x[, 1:i]Var1  0.8975     0.3029    2.96   0.0062 **
  F-statistic:  3.05 on 3 and 28 DF,  p-value: 0.0451
4 x[, 1:i]Var1  0.8975     0.3084    2.91   0.0072 **
  F-statistic:  2.21 on 4 and 27 DF,  p-value: 0.095
5 x[, 1:i]Var1  0.8975     0.3084    2.91   0.0073 **
  F-statistic:  1.96 on 5 and 26 DF,  p-value: 0.118
Look at how (a) the significance of the first variable barely changes, (a') the first variable remains significant (p < .05) even when adjusting for multiple comparisons (e.g., applying Bonferroni by multiplying the nominal p-value by the number of explanatory variables), and (b) the coefficient of the first variable barely changes, yet (c) the overall p-value grows rapidly, soon exceeding 0.05 and rendering the full model non-significant.
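If you would rather collect these numbers programmatically than read them off the printed summaries, a sketch along the following lines (reusing the x, y, and p created above) gathers, for each nested model, the first coefficient's p-value, a Bonferroni-adjusted version of it, and the overall F-test p-value:

# For each nested model: p-value of the first coefficient, its Bonferroni
# adjustment (multiplied by the number of explanatory variables in that model),
# and the p-value of the overall F-test.
pvals <- t(sapply(1:p, function(i) {
  s <- summary(lm(y ~ x[, 1:i]))
  f <- s$fstatistic
  c(first      = s$coefficients[2, "Pr(>|t|)"],
    Bonferroni = min(1, i * s$coefficients[2, "Pr(>|t|)"]),
    overall    = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)))
}))
round(pvals, 4)

The overall column reproduces the F-test p-values printed above, while the first and Bonferroni columns stay below 0.05 throughout.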
I interpret this as demonstrating that including explanatory variables that are largely independent of the dependent variable can "mask" the overall significance of the regression. When the new variables are orthogonal to the existing ones and to the dependent variable, they will not change the individual p-values. (The small changes seen here arise because the random error added to y is, by accident, slightly correlated with all the other variables.) One lesson to draw from this is that parsimony is valuable: using as few variables as needed can strengthen the significance of the results.
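To see the parsimony point concretely, one option (again just a sketch on the same simulated data) is a partial F-test comparing the one-variable model with the full five-variable model:

fit1 <- lm(y ~ x[, 1])   # parsimonious model: only the variable that actually matters
fit5 <- lm(y ~ x)        # full model with all five explanatory variables
anova(fit1, fit5)        # partial F-test: do Var2..Var5 jointly improve the fit?
                         # In this simulation they add essentially nothing.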
I am not saying that this is necessarily happening for the dataset in the question, about which little has been disclosed. But knowledge that this masking effect can happen should inform our interpretation of the results as well as our strategies for variable selection and model building.