Why are beta/Dirichlet regression not considered generalized linear models?

The premise is this quote from the vignette of the R package betareg [1].

Further, the model shares some properties (such as linear predictor, link function, dispersion parameter) with generalized linear models (GLMs; McCullagh and Nelder 1989), but it is not a special case of this framework (not even for fixed dispersion).

This answer also alludes to the fact:

[...] This is a type of regression model that is appropriate when the response variable is distributed as Beta. You can think of it as analogous to a generalized linear model. It's exactly what you are looking for [...] (emphasis mine)

The question title says it all: why are beta/Dirichlet regression not considered generalized linear models (or are they)?

As far as I know, the generalized linear model defines models built on the expectation of the dependent variables conditional on the independent ones.

$f$ is the link function that maps the expectation, $g$ is a probability distribution, $Y$ the outcomes and $X$ the predictors, $\beta$ are linear parameters and $\sigma^2$ the variance.

$f\left(\mathbb E\left(Y\mid X\right)\right) \sim g(\beta X, I\sigma^2)$

Different GLMs impose (or relax) the relationship between the mean and the variance, but $g$ must be a probability distribution in the exponential family, a desirable property which should improve robustness of the estimation if I recall correctly. The Beta and Dirichlet distributions are part of the exponential family, though, so I'm out of ideas.

[1] Cribari-Neto, F., & Zeileis, A. (2009). Beta regression in R.

generalized-linear-model beta-regression dirichlet-regression

— Firebug

(+1) Related: stats.stackexchange.com/a/189196.

— amoeba says Reinstate Monica

@amoeba Thanks for the link, hadn't seen that question before.

— Firebug

I think the issue is that if you write the beta distribution with the standard $a$, $b$ parameters (i.e. $a = b = 1$ implies uniform(0,1)), then the beta distribution is in the exponential family, whereas if you write it in terms of $\mu$ (mean) and $\phi$ (dispersion), it is not. But I've never cared that much for whether a distribution is in the exponential family.

— Cliff AB

@CliffAB After reading the comments under Tim's answer below it seems the parametrization of the Beta leads to non-orthogonality of the parameters, which seems to be a requirement for McCullagh-Nelder GLMs.

— Firebug

I think this short answer: stats.stackexchange.com/a/18812/28666 is relevant and adds to the answers here (hinting to why GLMs were originally defined with exponential dispersion family).

— amoeba says Reinstate Monica

Answers:

Check the original reference:

Ferrari, S., & Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics, 31(7), 799-815.

as the authors note, the parameters of re-parametrized beta distribution are correlated, so

Note that the parameters $\beta$ and $\phi$ are not orthogonal, in contrast to what is verified in the class of generalized linear regression models (McCullagh and Nelder, 1989).

So while the model looks like a GLM and quacks like a GLM, it does not perfectly fit the framework.
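The non-orthogonality in the quote can be illustrated numerically. The sketch below assumes the mean-precision parametrization $a = \mu\phi$, $b = (1-\mu)\phi$ and evaluates the off-diagonal Fisher information entry for $(\mu, \phi)$ of a single observation, which works out to $\phi\left[\mu\,\psi'(\mu\phi) - (1-\mu)\,\psi'((1-\mu)\phi)\right]$ with $\psi'$ the trigamma function. The helper names `trigamma` and `beta_cross_information` are hypothetical and written only for this illustration; they come from neither the paper nor betareg.

```python
import math


def trigamma(x: float) -> float:
    """Trigamma function psi'(x) for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    # Shift the argument upward using psi'(x) = psi'(x + 1) + 1 / x**2.
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic expansion: 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    total += inv + inv2 / 2 + inv2 * inv * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return total


def beta_cross_information(mu: float, phi: float) -> float:
    """Off-diagonal Fisher information entry for (mu, phi), one beta observation."""
    a = mu * phi            # first shape parameter
    b = (1.0 - mu) * phi    # second shape parameter
    return phi * (mu * trigamma(a) - (1.0 - mu) * trigamma(b))


# Assumed example values: precision 5 and a few means.
for mu in (0.1, 0.3, 0.5, 0.7):
    print(mu, beta_cross_information(mu, phi=5.0))
```

The entry vanishes only at $\mu = 0.5$, where the two shape parameters coincide; elsewhere it is nonzero, so the mean and precision parameters are not information-orthogonal in this parametrization.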

— Tim

+1 but it would be great to have a more detailed answer. I, personally, don't understand the quote (even after opening the linked paper). Why are these parameters not orthogonal in beta regression?.. Why is this required for GLMs?.. Etc.

— amoeba says Reinstate Monica

@amoeba honestly, I'm not the kind of person who can give you a detailed answer on that. I was never interested enough in the theory behind GLMs to have a deep understanding of such subtleties. McCullagh and Nelder mention this requirement, but I'd need to check their book to see why exactly it is important. If someone gave a detailed explanation of why this is an issue, I'd consider issuing a bounty for such an answer.

— Tim

The orthogonality requirement in GLMs is important: It means that you can estimate the equation $g(\mu) = x^\top \beta$ without worrying about misspecifying the rest of the likelihood. Parameter estimates are consistent if the mean equation above is correctly specified. Inference is valid if additionally the variance is correctly specified. However, in beta regression you cannot separate the two model equations in this way, even if $\phi$ is just a constant. For consistent results everything has to be specified correctly.

— Achim Zeileis

@AchimZeileis I remembered that I saw your name on CV. What you say makes perfect sense. Maybe you'd like to transform your comment to answer by adding some more rationale? As I said, I'd be happy to award bounty for someone giving detailed enough answer for the question.

— Tim

@Tim Will try to do so when I have more time. That's why I thought a quick comment is better than nothing...

— Achim Zeileis

The answer by @probabilityislogic is on the right track.

The beta distribution is in the two parameter exponential family. The simple GLM models described by Nelder and Wedderburn (1972) do not include all of the distributions in the two parameter exponential family.

In terms of the article by N&W, the GLM applies to the density functions of the following type (this was later named exponential dispersion family in Jørgensen 1987):

$\pi(z;\theta,\phi) = \exp \left[ \alpha(\phi) \lbrace z\theta - g(\theta) +h(z)\rbrace +\beta(\phi,z) \right]$

with an additional link function $f()$ and linear model for the natural parameter $\theta = f(\mu) = f(X\beta)$ .

So we could rewrite the above distribution also:

$\pi(z;\mu,\phi) = \exp \left[z(f(\mu)\alpha(\phi)) +h(z)\alpha(\phi) - g(f(\mu))\alpha(\phi) +\beta(\phi,z) \right]$

The two parameter exponential family is:

$f(z;\theta_1,\theta_2) = \exp \left[T_1(z)\eta_1(\theta_1,\theta_2) + T_2(z)\eta_2(\theta_1,\theta_2) - g(\theta_1,\theta_2) +h(z) \right]$

which looks similar but more general (also if one of the $\theta$ is constant).

The difference is clear, and it is not possible to put the beta distribution into the GLM form above.

However, I lack sufficient understanding to create a more intuitive and well informed answer (I have a feeling that there can be much deeper and more elegant relationships to a variety of fundamental principles). The GLM generalizes the distribution of the error by using a single variate exponential dispersion model in place of a least squares model and generalizes the linear relationship in the mean, by using a link function.

The best and most simple intuition seems to be the dispersion- $\alpha(\phi)$ -term in the exponential, which gets multiplied with everything and thus the dispersion does not vary with $\theta$ . Whereas several two parameter exponential families, and quasi-likelihood methods, allow the dispersion parameter to be a function of $\theta$ as well.
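As a worked illustration (a sketch, assuming the mean-precision parametrization $a=\mu\phi$, $b=(1-\mu)\phi$ used in beta regression), the beta density can be matched term by term against the two-parameter exponential family form given earlier:

$f(z;\mu,\phi) = \exp \left[\mu\phi\,\log z + (1-\mu)\phi\,\log(1-z) - \log B(\mu\phi,(1-\mu)\phi) - \log\lbrace z(1-z)\rbrace \right]$

so that $T_1(z)=\log z$, $T_2(z)=\log(1-z)$, $\eta_1=\mu\phi$, $\eta_2=(1-\mu)\phi$, $g=\log B(\mu\phi,(1-\mu)\phi)$ and $h(z)=-\log\lbrace z(1-z)\rbrace$. Under this reading the sufficient statistics are $\log z$ and $\log(1-z)$ rather than $z$ itself, and $\phi$ does not factor out as a common multiplier $\alpha(\phi)$ of a term $z\theta - g(\theta)$, because $\log B$ mixes $\mu$ and $\phi$.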

— Sextus Empiricus

The second parameter $\phi$ is the dispersion in the distribution function defined by N&W. It extends the one-parameter natural exponential family $\pi(z;\theta)$.

— Sextus Empiricus

@amoeba beta is a bivariate exponential family distribution, e.g. www2.stat.duke.edu/courses/Spring11/sta114/lec/expofam.pdf

— Tim

I am not sure that it is entirely impossible, even with fixed dispersion. At least it is not possible according to the GLM as described by N&W (what I do know is that many people do far more difficult things to solve beta regression). I will edit the answer to show what happens, and where it goes wrong, if I try to follow the same path of iteratively reweighted least squares.

— Sextus Empiricus

I have edited the answer a bit. 1) My initial description of the families and dispersion models was wrong. The GLM includes all distributions of the one-parameter exponential family, because it is not just that density function but also a link function. 2) As for a better intuitive view, I could not get very far and do not expect to get there soon. GLM models relate to the classical model in various representations: adding weights to the matrix formulation of the fitting procedure, derivatives of the log-likelihood function with terms involving the link function and the variance, ...

— Sextus Empiricus

I took the liberty to edit your answer a little bit, hope you are fine with the edits. Also, it looks like this answer stats.stackexchange.com/a/18812/28666 hints to why N&W used this particular distribution family and not a broader one.

— amoeba says Reinstate Monica

I don't think the beta distribution is part of the exponential dispersion family. To get this, you need to have a density

$f (y;\theta,\tau)=\exp\left (\frac {y\theta - c (\theta)}{\tau} + d (y,\tau)\right)$

for specified functions $c ()$ and $d ()$ . The mean is given as $c'(\theta)$ and the variance is given as $\tau c''(\theta)$ . The parameter $\theta$ is called the canonical parameter.
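The mean and variance identities can be checked numerically on a familiar member of this family. The sketch below assumes the Bernoulli distribution, for which $c(\theta) = \log(1 + e^\theta)$ and $\tau = 1$, and compares finite-difference derivatives of $c$ with the known mean $p$ and variance $p(1-p)$. The helper names are hypothetical and chosen only for this example.

```python
import math


def c(theta: float) -> float:
    """Cumulant function of the Bernoulli distribution: log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))


def derivative(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def second_derivative(f, x: float, h: float = 1e-4) -> float:
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


theta = 0.8                          # assumed canonical parameter (log-odds)
tau = 1.0                            # Bernoulli has unit dispersion
p = 1.0 / (1.0 + math.exp(-theta))   # success probability implied by theta

mean_from_c = derivative(c, theta)                  # should approximate p
var_from_c = tau * second_derivative(c, theta)      # should approximate p * (1 - p)

print(mean_from_c, p)
print(var_from_c, p * (1.0 - p))
```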

The beta distribution cannot be written this way - one way to see this is by noting there is no $y$ term in the log likelihood - it has $\log [y]$ and $\log [1-y]$ instead

$f_{beta}(y;\mu,\phi)=\exp\left (\phi\mu\log\left[\frac {y}{1-y}\right] +\phi\log [1-y] - \log\left[B (\phi\mu,\phi (1-\mu))\right]-\log\left[y (1-y)\right]\right)$

Yet another way to see that beta is not exponential dispersion family is that it can be written as $y=\frac {x}{x+z}$ where $x$ and $z$ are independent and both follow gamma distributions with the same scale parameter (and gamma is exponential family).
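The gamma-ratio representation can be checked by simulation. This is a minimal sketch, assuming shape parameters $a = 2$ and $b = 3$ with a common scale of 1, which compares the sample mean and variance of $x/(x+z)$ with the Beta($a$, $b$) values $a/(a+b)$ and $ab/\lbrace(a+b)^2(a+b+1)\rbrace$. The function name `simulate_beta_from_gammas` is hypothetical.

```python
import random


def simulate_beta_from_gammas(a: float, b: float, n: int, seed: int = 0) -> list:
    """Draw n values of x / (x + z), with x ~ Gamma(a, 1) and z ~ Gamma(b, 1) independent."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.gammavariate(a, 1.0)   # shape a, scale 1
        z = rng.gammavariate(b, 1.0)   # shape b, same scale
        draws.append(x / (x + z))
    return draws


a, b = 2.0, 3.0                        # assumed shape parameters
draws = simulate_beta_from_gammas(a, b, n=20000)

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)

beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1.0))

print(sample_mean, beta_mean)
print(sample_var, beta_var)
```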

— probabilityislogic

This answer is not correct as written. One way to see this is that, according to the logic presented, the Bernoulli and binomial distributions, for example, would not be in the class of exponential families either.

— cardinal

Sorry, you are correct that the example I gave was erroneous. (Warning: mental arithmetic and mobile use of CrossValidated can be dangerous!) But my point still stands. The reason this answer is incorrect is that it chooses to "define" "exponential family" very narrowly, far more narrowly than any conventional source or practical usage.

— cardinal

Hmm. Wikipedia does list beta in the list of exponential family distributions.

— amoeba says Reinstate Monica

True - I was thinking of the natural exponential family - which is a special case

— probabilityislogic

The parameter $\theta$ in the function is additionally described through a link function, and then the narrowly defined distribution function becomes broader: it includes all distributions of the one-parameter exponential family, but only part of the two-parameter exponential family.

— Sextus Empiricus