Why is the likelihood-ratio test statistic chi-squared distributed?


34

Why is the test statistic of a likelihood-ratio test chi-squared distributed?

$$2\,(\ln L_{\text{alt model}} - \ln L_{\text{null model}}) \sim \chi^2_{\,df_{\text{alt}} - df_{\text{null}}}$$
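For concreteness, here is a minimal sketch of computing this statistic, assuming for illustration a toy Gaussian model with known variance (all names are illustrative, not from the thread): the null model fixes the mean at 0, the alternative estimates it, and the statistic has 1 − 0 = 1 degree of freedom.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=0.0, scale=1.0, size=200)  # toy data generated under H0

def loglik(mu):
    # Log-likelihood of i.i.d. N(mu, 1) data (sigma = 1 assumed known).
    return -0.5 * len(y) * math.log(2 * math.pi) - 0.5 * np.sum((y - mu) ** 2)

ll_null = loglik(0.0)        # null model: mu fixed at 0, no free parameters
ll_alt = loglik(y.mean())    # alternative: mu free, MLE is the sample mean

stat = 2 * (ll_alt - ll_null)              # 2(ln L_alt - ln L_null)
p_value = math.erfc(math.sqrt(stat / 2))   # chi2(1) survival function
print(stat, p_value)
```

In this toy model the statistic simplifies algebraically to n·ȳ², which makes it easy to sanity-check the computation.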



14
Thanks for the reference. Here is one from me: stats.stackexchange.com/faq#etiquette
Dr.

5
Note the "bring your sense of humour" over there. I did not mean to be rude, but the answer to this question is relatively tedious and would basically consist of the contents of that article (or of some of the better statistics textbooks). If you describe your specific problem with the explanation in one of those, I'll be happy to help you.
নিক সাব্বে

2
Direct link to Wilks' original paper, without a paywall
ayorgo

Answers:


23

As mentioned by @Nick, this is a consequence of Wilks' theorem. But note that the test statistic is only asymptotically χ²-distributed, not exactly χ²-distributed.

I am highly impressed by this theorem because it holds in a very broad context. Consider a statistical model with likelihood $l(\theta \mid y)$, where $y$ is the vector of observations of $n$ independent replications from a distribution with parameter $\theta$ belonging to a submanifold $B_1$ of $\mathbb{R}^d$ with dimension $\dim(B_1) = s$. Let $B_0 \subset B_1$ be a submanifold with dimension $\dim(B_0) = m$. Imagine you are interested in testing $H_0\colon \{\theta \in B_0\}$.

The likelihood ratio is
$$lr(y) = \frac{\sup_{\theta \in B_1} l(\theta \mid y)}{\sup_{\theta \in B_0} l(\theta \mid y)}.$$
Define the deviance $d(y) = 2 \log\bigl(lr(y)\bigr)$. Then Wilks' theorem says that, under the usual regularity assumptions, $d(y)$ is asymptotically χ²-distributed with $s - m$ degrees of freedom when $H_0$ holds true.
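The theorem is easy to check with a quick Monte Carlo sketch (illustrative, not from the answer): take i.i.d. N(μ, 1) data with known variance and test H₀: μ = 0, so s − m = 1. In this Gaussian toy model the deviance reduces to n·ȳ² and is in fact exactly χ²₁, so rejecting when it exceeds the χ²₁ 95% quantile should happen about 5% of the time under H₀.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 5000
crit = 3.841  # 95th percentile of chi2 with s - m = 1 degree of freedom

# Simulate `reps` datasets under H0: mu = 0; the deviance is n * ybar^2.
ybar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
deviance = n * ybar ** 2
rejection_rate = (deviance > crit).mean()
print(rejection_rate)  # close to the nominal 0.05
```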

It is proven in Wilks' original paper mentioned by @Nick. I think this paper is not easy to read. Wilks later published a book, perhaps with an easier presentation of his theorem. A short heuristic proof is given in Williams' excellent book.


3
Sad that this theorem is not mentioned in the Wikipedia page devoted to Samuel S. Wilks
Stéphane Laurent

5
Oh come on Stephane. This is Wikipedia, you can edit it and improve it!
StasK

1
@StasK I know, but I have never tried. And I already spend too much of my life on statistics & mathematics ;)
Stéphane Laurent

Is there an intuition for why the 2 is in front of the log in the definition of the deviance?
user56834

@Programmer2134 It is derived from a second-order Taylor expansion.
Frank Vel
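To sketch that expansion in the one-parameter case (with ℓ = log L and the usual regularity conditions, ℓ′(θ̂) = 0 at the MLE):

```latex
% Second-order Taylor expansion of \ell = \log L around the MLE \hat\theta:
\ell(\theta_0) \approx \ell(\hat\theta) + \tfrac{1}{2}\,\ell''(\hat\theta)\,(\theta_0 - \hat\theta)^2
% Hence the deviance -- the factor 2 exactly cancels the 1/2 of the quadratic term:
d(y) = 2\bigl(\ell(\hat\theta) - \ell(\theta_0)\bigr) \approx \bigl(-\ell''(\hat\theta)\bigr)(\hat\theta - \theta_0)^2
% Since -\ell''(\hat\theta)/n \to I(\theta_0) and \sqrt{n}\,(\hat\theta - \theta_0)
% \to N\bigl(0, I(\theta_0)^{-1}\bigr), the right-hand side converges in
% distribution to \chi^2_1.
```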

25

I second Nick Sabbe's harsh comment, and my short answer is: it is not. I mean, it only is in the normal linear model. For absolutely any other sort of circumstances, the exact distribution is not a χ². In many situations, you can hope that the conditions of Wilks' theorem are satisfied, and then asymptotically the log-likelihood ratio test statistic converges in distribution to χ². Limitations and violations of the conditions of Wilks' theorem are too numerous to disregard.

  1. The theorem assumes i.i.d. data, so expect issues with dependent data, such as time series or unequal-probability survey samples (for which the likelihoods are poorly defined, anyway); the "regular" χ² tests, such as independence tests in contingency tables, start behaving as a sum $\sum_k a_k v_k$, with $v_k$ i.i.d. $\chi^2_1$ (Rao & Scott). For i.i.d. data, $a_k = 1$, and the sum becomes the χ². But for non-independent data, this is no longer the case.
  2. The theorem assumes the true parameter to be in the interior of the parameter space. If you have a Euclidean space to work with, that's not an issue. However, in some problems, natural restrictions may arise, such as a variance ≥ 0 or a correlation between −1 and 1. If the true parameter is on the boundary, then the asymptotic distribution is a mixture of χ² distributions with different degrees of freedom, in the sense that the cdf of the test is the sum of such cdfs (Andrews 2001, plus two or three more of his papers from the same period, with history going back to Chernoff 1954).
  3. The theorem assumes that all the relevant derivatives are non-zero. This can be challenged by some nonlinear problems and/or parameterizations, and/or situations where a parameter is not identified under the null. Suppose you have a Gaussian mixture model, and your null is one component $N(\mu_0, \sigma^2_0)$ vs. the alternative of two distinct components $f\,N(\mu_1, \sigma^2_1) + (1-f)\,N(\mu_2, \sigma^2_2)$ with a mixing fraction $f$. The null is apparently nested in the alternative, but this can be expressed in a variety of ways: as $f = 0$ (in which case the parameters $\mu_1, \sigma^2_1$ are not identified), $f = 1$ (in which case $\mu_2, \sigma^2_2$ are not identified), or $\mu_1 = \mu_2, \sigma_1 = \sigma_2$ (in which case $f$ is not identified). Here, you can't even say how many degrees of freedom your test should have, as you have a different number of restrictions depending on how you parameterize the nesting. See the work of Jiahua Chen on this, e.g. CJS 2001.
  4. The χ² may work OK if the distribution has been correctly specified. But if it was not, the test will break down again. In the (largely neglected by statisticians) subarea of multivariate analysis known as structural equation covariance modeling, a multivariate normal distribution is often assumed, but even if the structure is correct, the test will misbehave if the distribution is different. Satorra and Bentler 1995 show that the distribution will become $\sum_k a_k v_k$, with $v_k$ i.i.d. $\chi^2_1$, the same story as with non-independent data in my point 1, but they've also demonstrated how the $a_k$s depend on the structure of the model and on the fourth moments of the distribution.
  5. For finite samples, in a large class of situations the likelihood ratio is Bartlett-correctible: while $\mathrm{Prob}[d(y) \le x] = F(x; \chi^2_d)\,[1 + O(n^{-1})]$ for a sample of size $n$, with $F(x; \chi^2_d)$ being the distribution function of the $\chi^2_d$ distribution, for regular likelihood problems you can find a constant $b$ such that $\mathrm{Prob}[d(y)/(1 + b/n) \le x] = F(x; \chi^2_d)\,[1 + O(n^{-2})]$, i.e., to a higher order of accuracy. So the χ² approximation for finite samples can be improved (and arguably should be improved, if you know how). The constant $b$ depends on the structure of the model, and sometimes on the auxiliary parameters, but if it can be consistently estimated, that works, too, in improving the order of coverage.
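The boundary issue in point 2 is easy to see in a small simulation (an illustrative sketch, not from the answer): for i.i.d. N(θ, 1) data with the constraint θ ≥ 0 and H₀: θ = 0, the constrained MLE is max(ȳ, 0), so the deviance is n·max(ȳ, 0)², a 50:50 mixture of a point mass at zero and χ²₁. The naive χ²₁ critical value then rejects at rate 0.025, not the nominal 0.05.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 20000
crit = 3.841  # naive chi2(1) 95% critical value

# Simulate under H0: theta = 0; the boundary-constrained MLE is max(ybar, 0),
# so the deviance n * max(ybar, 0)^2 is exactly zero whenever ybar < 0.
ybar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)
deviance = n * np.maximum(ybar, 0.0) ** 2

share_at_zero = (deviance == 0).mean()   # about 0.5: half the mass sits at 0
reject_rate = (deviance > crit).mean()   # about 0.025, not the nominal 0.05
print(share_at_zero, reject_rate)
```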

For a review of these and similar esoteric issues in likelihood inference, see Smith 1989.


1
Thanks! Very instructive. What do you mean by "it only is in the normal linear model"? For a Fisher test, when $B_0$ and $B_1$ are linear subspaces, the deviance is a monotone function of the Fisher statistic, and it is only asymptotically χ².
Stéphane Laurent

With known variance, I should add.
StasK
Licensed under cc by-sa 3.0 with attribution required.