Quick answer
The reason is that, assuming the data are i.i.d. with $X_i \sim N(\mu,\sigma^2)$, and defining
$$\bar{X} = \frac{\sum_{i=1}^N X_i}{N}, \qquad S^2 = \frac{\sum_{i=1}^N (X_i - \bar{X})^2}{N-1},$$
when forming confidence intervals, the sampling distribution associated with the sample variance ($S^2$; remember, a random variable!) is a chi-square distribution ($S^2(N-1)/\sigma^2 \sim \chi^2_{N-1}$), just as the sampling distribution associated with the sample mean is a standard normal distribution ($(\bar{X}-\mu)\sqrt{N}/\sigma \sim N(0,1)$) when you know the variance, and a Student's $t$ distribution ($(\bar{X}-\mu)\sqrt{N}/S \sim T_{N-1}$) when you don't.
Long answer
First of all, we'll prove that S2(N−1)/σ2 follows a chi-square distribution with N−1 degrees of freedom. After that, we'll see how this proof is useful when deriving the confidence intervals for the variance, and how the chi-square distribution appears (and why it is so useful!). Let's begin.
The proof
Recall that the moment generating function of a chi-square random variable with $\nu$ degrees of freedom is
$$m_{\chi^2_\nu}(t) = (1-2t)^{-\nu/2}.$$
If we can show that $S^2(N-1)/\sigma^2$ has a moment generating function of this form with $\nu = N-1$, then we have shown that $S^2(N-1)/\sigma^2$ follows a chi-square distribution with $N-1$ degrees of freedom. In order to show this, note two facts:
If we define
$$Y = \frac{\sum (X_i-\mu)^2}{\sigma^2} = \sum Z_i^2,$$
where the $Z_i \sim N(0,1)$ are independent standard normal random variables, the moment generating function of $Y$ is given by
$$m_Y(t) = E\left[e^{tY}\right] = E\left[e^{tZ_1^2}\right] \times E\left[e^{tZ_2^2}\right] \times \cdots \times E\left[e^{tZ_N^2}\right] = m_{Z_1^2}(t) \times m_{Z_2^2}(t) \times \cdots \times m_{Z_N^2}(t).$$
The MGF of $Z^2$ is given by
$$m_{Z^2}(t) = \int_{-\infty}^{\infty} f(z)\exp(tz^2)\,dz = (1-2t)^{-1/2},$$
where I have used the PDF of the standard normal, $f(z) = e^{-z^2/2}/\sqrt{2\pi}$, and, hence,
$$m_Y(t) = (1-2t)^{-N/2},$$
which implies that Y follows a chi-square distribution with N degrees of freedom.
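This first fact can be sanity-checked numerically; here is a minimal Monte Carlo sketch assuming NumPy is available (the sample size $N = 5$ and the point $t = 0.1$ are arbitrary choices for illustration): the empirical estimate of $E[e^{tY}]$ should agree with $(1-2t)^{-N/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5          # arbitrary sample size for the check
reps = 200_000
t = 0.1        # any t < 1/2 is valid for the chi-square MGF

# Y = sum of N squared standard normals, simulated `reps` times
Y = (rng.standard_normal((reps, N)) ** 2).sum(axis=1)

empirical_mgf = np.exp(t * Y).mean()        # Monte Carlo estimate of E[e^{tY}]
theoretical_mgf = (1 - 2 * t) ** (-N / 2)   # (1-2t)^{-N/2}
print(empirical_mgf, theoretical_mgf)       # the two values should agree closely
```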
If $Y_1$ and $Y_2$ are independent and each is distributed as chi-square with $\nu_1$ and $\nu_2$ degrees of freedom, respectively, then $W = Y_1 + Y_2$ is distributed as chi-square with $\nu_1 + \nu_2$ degrees of freedom (this follows from taking the MGF of $W$; do this!).
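The additivity in the second fact can also be seen by simulation; a minimal sketch with NumPy (the degrees of freedom $\nu_1 = 3$, $\nu_2 = 4$ are arbitrary): a chi-square with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, so the sum should match $\nu_1+\nu_2 = 7$.

```python
import numpy as np

rng = np.random.default_rng(1)
nu1, nu2 = 3, 4     # arbitrary degrees of freedom
reps = 200_000

# W = sum of independent chi-square draws
W = rng.chisquare(nu1, reps) + rng.chisquare(nu2, reps)

# chi-square with nu1 + nu2 = 7 dof has mean 7 and variance 14
print(W.mean(), W.var())
```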
With the above facts, note that if you multiply the sample variance by $N-1$, you obtain (after some algebra)
$$(N-1)S^2 = -N(\bar{X}-\mu)^2 + \sum (X_i-\mu)^2,$$
and, hence, dividing by $\sigma^2$,
$$\frac{(N-1)S^2}{\sigma^2} + \frac{(\bar{X}-\mu)^2}{\sigma^2/N} = \frac{\sum (X_i-\mu)^2}{\sigma^2}.$$
Note that the second term on the left-hand side of this sum is distributed as chi-square with 1 degree of freedom (since $\bar{X} \sim N(\mu, \sigma^2/N)$), and the right-hand side is distributed as chi-square with $N$ degrees of freedom. Therefore, by the additivity of independent chi-squares (this step also uses the fact that, for normal samples, $\bar{X}$ and $S^2$ are independent), $S^2(N-1)/\sigma^2$ is distributed as chi-square with $N-1$ degrees of freedom.
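This conclusion can be checked by simulating the pivot directly; a minimal sketch with NumPy (sample size $N = 10$ and the parameters $\mu$, $\sigma$ are arbitrary): the mean of $(N-1)S^2/\sigma^2$ should come out near $N-1$ (not $N$), showing the lost degree of freedom.

```python
import numpy as np

rng = np.random.default_rng(2)
N, mu, sigma = 10, 2.0, 3.0   # arbitrary sample size and parameters
reps = 100_000

# Draw many samples of size N and form the pivot (N-1)S^2/sigma^2
X = rng.normal(mu, sigma, size=(reps, N))
S2 = X.var(axis=1, ddof=1)    # sample variance, N-1 in the denominator
pivot = (N - 1) * S2 / sigma**2

# chi-square with N-1 = 9 dof: mean 9 (not N = 10) and variance 18
print(pivot.mean(), pivot.var())
```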
Calculating the confidence interval for the variance
When looking for a confidence interval for the variance, you want to know the limits $L_1$ and $L_2$ in
$$P(L_1 \le \sigma^2 \le L_2) = 1-\alpha.$$
Let's play with the inequality inside the parentheses. First, divide by $S^2(N-1)$:
$$\frac{L_1}{S^2(N-1)} \le \frac{\sigma^2}{S^2(N-1)} \le \frac{L_2}{S^2(N-1)}.$$
And then remember two things: (1) the statistic $S^2(N-1)/\sigma^2$ has a chi-square distribution with $N-1$ degrees of freedom, and (2) the variance is always greater than zero, which implies that you can invert the inequalities, because
$$\frac{L_1}{S^2(N-1)} \le \frac{\sigma^2}{S^2(N-1)} \;\Rightarrow\; \frac{S^2(N-1)}{\sigma^2} \le \frac{S^2(N-1)}{L_1},$$
$$\frac{\sigma^2}{S^2(N-1)} \le \frac{L_2}{S^2(N-1)} \;\Rightarrow\; \frac{S^2(N-1)}{L_2} \le \frac{S^2(N-1)}{\sigma^2};$$
hence, the probability we are looking for is
$$P\left(\frac{S^2(N-1)}{L_2} \le \frac{S^2(N-1)}{\sigma^2} \le \frac{S^2(N-1)}{L_1}\right) = 1-\alpha.$$
Note that $S^2(N-1)/\sigma^2 \sim \chi^2_{N-1}$. We want then
$$\int_{S^2(N-1)/L_2}^{N-1} p_{\chi^2}(x)\,dx = (1-\alpha)/2, \qquad \int_{N-1}^{S^2(N-1)/L_1} p_{\chi^2}(x)\,dx = (1-\alpha)/2$$
(we integrate up to $N-1$ because the expected value of a chi-square random variable with $N-1$ degrees of freedom is $N-1$) or, equivalently,
$$\int_0^{S^2(N-1)/L_2} p_{\chi^2}(x)\,dx = \alpha/2, \qquad \int_{S^2(N-1)/L_1}^{\infty} p_{\chi^2}(x)\,dx = \alpha/2.$$
Calling $\chi^2_{\alpha/2} = S^2(N-1)/L_2$ and $\chi^2_{1-\alpha/2} = S^2(N-1)/L_1$, where the values $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ can be found in chi-square tables (or, these days, computed by software!), and solving for $L_1$ and $L_2$,
$$L_1 = \frac{S^2(N-1)}{\chi^2_{1-\alpha/2}}, \qquad L_2 = \frac{S^2(N-1)}{\chi^2_{\alpha/2}}.$$
Hence, your confidence interval for the variance is
$$\text{C.I.} = \left(\frac{S^2(N-1)}{\chi^2_{1-\alpha/2}}, \frac{S^2(N-1)}{\chi^2_{\alpha/2}}\right).$$
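Putting the derivation to work, here is a minimal sketch with NumPy and SciPy (the data, $N = 30$, $\alpha = 0.05$, and the true variance are made up for illustration): `scipy.stats.chi2.ppf` supplies the quantiles $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$, playing the role of the chi-square tables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha = 0.05
sigma2_true = 4.0   # made-up true variance
x = rng.normal(0.0, np.sqrt(sigma2_true), size=30)   # one sample, N = 30

N = x.size
S2 = x.var(ddof=1)   # sample variance with N-1 in the denominator

# chi2.ppf plays the role of the chi-square tables
chi2_lo = stats.chi2.ppf(alpha / 2, df=N - 1)        # chi^2_{alpha/2}
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=N - 1)    # chi^2_{1-alpha/2}

L1 = (N - 1) * S2 / chi2_hi
L2 = (N - 1) * S2 / chi2_lo
print(f"95% CI for sigma^2: ({L1:.3f}, {L2:.3f})")
```

Repeating this over many independent samples, roughly a fraction $1-\alpha$ of the resulting intervals should contain the true variance.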