Ridge, lasso and elastic net

33

How do ridge, LASSO and elastic net regularization methods compare? What are their respective advantages and disadvantages? Any good technical paper or lecture notes would be appreciated as well.

— user3269

39

In the book The Elements of Statistical Learning, Hastie et al. provide a very insightful and thorough comparison of these shrinkage techniques. The book is available online (pdf). The comparison is done in section 3.4.3, page 69.

The main difference between Lasso and Ridge is the penalty term they use. Ridge uses an $L_2$ penalty, which limits the size of the coefficient vector. Lasso uses an $L_1$ penalty, which imposes sparsity among the coefficients and thus makes the fitted model more interpretable. Elastic net was introduced as a compromise between these two techniques, and has a penalty which is a mix of the $L_1$ and $L_2$ norms.
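
For reference, the three penalties can be written out as penalized least-squares problems. The notation here is an assumption for illustration ($\lambda \ge 0$ for the penalty strength, $\alpha \in [0, 1]$ for the mixing weight); textbooks and software packages parameterize these objectives in slightly different ways.

$$
\hat\beta^{\text{ridge}} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
$$

$$
\hat\beta^{\text{lasso}} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1
$$

$$
\hat\beta^{\text{enet}} = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \lambda \left( \alpha \|\beta\|_1 + (1 - \alpha) \|\beta\|_2^2 \right)
$$

With $\alpha = 1$ the elastic net penalty reduces to the lasso penalty, and with $\alpha = 0$ it reduces to the ridge penalty.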

— MMM

3

That is a wonderful reference book.

— bdeonovic

4

also because the authors are the inventors of these techniques!

— Bakaburg

1

Thank you for giving us a reference of this beautiful book

— Christina

1

I highly recommend section 18.4 as well, pages 661-668. Provides more information on lasso vs. elastic net.

— Katya Handler

1

Link to the book is dead as of 14 Oct 2016

— Ashe

22

To summarize, here are some salient differences between Lasso, Ridge and Elastic-net:

Lasso does a sparse selection, while Ridge does not.
When you have highly-correlated variables, Ridge regression shrinks the two coefficients towards one another. Lasso is somewhat indifferent and generally picks one over the other. Depending on the context, one does not know which variable gets picked. Elastic-net is a compromise between the two that attempts to shrink and do a sparse selection simultaneously.
Ridge and Lasso react differently to multiplicative scaling of the response. If Y is multiplied by a constant $c$, the Ridge coefficients are simply multiplied by $c$ for a given $\lambda$ parameter, so the fit is otherwise unchanged. For Lasso, the fit is not independent of the scaling: the $\lambda$ parameter must be scaled up by the multiplier to get the same result. Rescaling the columns of X changes the effective penalty for both methods, which is why predictors are usually standardized first. It is more complex for elastic net.
Ridge penalizes the largest $\beta$'s more than it penalizes the smaller ones (as they are squared in the penalty term). Lasso penalizes them more uniformly. This may or may not be important. In a forecasting problem with a powerful predictor, the predictor's effectiveness is shrunk more by Ridge than by Lasso.
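
As a rough illustration of the correlated-predictor point, the sketch below uses an extreme, made-up case: two identical columns. The helper names (`ridge_closed_form`, `lasso_coordinate_descent`), the toy data and the penalty values are all assumptions for this example, and the lasso is solved with a plain cyclic coordinate descent rather than any particular library's solver.

```python
import numpy as np


def ridge_closed_form(X, y, k):
    """Solve (X^T X + k I) b = X^T y for the ridge coefficients."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)


def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam, returning 0 if |rho| <= lam."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_coordinate_descent(X, y, lam, order, n_sweeps=100):
    """Minimize (1/(2n)) ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    `order` is the sequence in which the columns are visited in each sweep.
    """
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_sweeps):
        for j in order:
            # Partial residual: remove the fit of every predictor except j.
            partial = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ partial / n
            z = X[:, j] @ X[:, j] / n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Toy data (an assumption): two perfectly correlated predictors.
x = np.linspace(-1.0, 1.0, 21)
X = np.column_stack([x, x])
y = 2.0 * x

ridge = ridge_closed_form(X, y, k=1.0)
lasso_first = lasso_coordinate_descent(X, y, lam=0.1, order=(0, 1))
lasso_second = lasso_coordinate_descent(X, y, lam=0.1, order=(1, 0))

print("ridge:", ridge)
print("lasso, column 0 visited first:", lasso_first)
print("lasso, column 1 visited first:", lasso_second)
```

In this degenerate case ridge gives both columns the same coefficient, whereas the coordinate-descent lasso puts essentially all of the weight on whichever column is visited first. With real, imperfectly correlated data the choice depends on small differences between the predictors, which is one way to read "one does not know which variable gets picked".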

— balaks

@balaks Regarding the second point that you made, what do you mean by 'one does not know which variable gets picked'? Did you mean that LASSO is indifferent, so it picks one more or less at random and we don't really know which one is the best?

— meTchaikovsky

4

I highly recommend having a look at the book An Introduction to Statistical Learning (James, Witten, Hastie and Tibshirani, 2013).

The reason is that The Elements of Statistical Learning is intended for individuals with advanced training in the mathematical sciences. In the foreword to ISL, the authors write:

An Introduction to Statistical Learning arose from the perceived need for a broader and less technical treatment of these topics. [...]

An Introduction to Statistical Learning is appropriate for advanced undergraduates or master’s students in statistics or related quantitative fields or for individuals in other disciplines who wish to use statistical learning tools to analyze their data.

— jeza

1

Can you elaborate on why you found this reference to be useful?

— J. M. is not a statistician

1

It's fine to quote a book, but please mark it as a quote and not as your own text. Otherwise it's plagiarism. I edited it for you now.

— amoeba says Reinstate Monica

1

The above answers are very clear and informative. I would like to add one minor point from the statistical perspective. Take ridge regression as an example. It is an extension of ordinary least squares regression that addresses multicollinearity problems when there are many correlated features. If the linear regression is

$Y = Xb + e$

The normal equation solution for the multiple linear regression is

$\hat{b} = (X^\top X)^{-1} X^\top Y$

The normal equation solution for the ridge regression is

$\hat{b}_{\text{ridge}} = (X^\top X + kI)^{-1} X^\top Y$

It is a biased estimator for b, and there always exists a penalty term k > 0 that makes the mean squared error of Ridge regression smaller than that of OLS regression.

For LASSO and Elastic-Net, there is no such analytic solution in general.
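
The two normal-equation solutions above can be sketched with NumPy as follows. The synthetic data, the coefficient vector `b_true` and the value of `k` are assumptions made up for the example; `np.linalg.solve` is used in place of an explicit matrix inverse, which is the usual numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)           # synthetic data, for illustration only
n, p = 50, 3
X = rng.normal(size=(n, p))
b_true = np.array([1.0, -2.0, 0.5])      # hypothetical "true" coefficients
Y = X @ b_true + 0.1 * rng.normal(size=n)

k = 5.0                                   # ridge penalty, chosen arbitrarily
I = np.eye(p)

# OLS: solve (X^T X) b = X^T Y
b_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Ridge: solve (X^T X + k I) b = X^T Y
b_ridge = np.linalg.solve(X.T @ X + k * I, X.T @ Y)

print("OLS:  ", b_ols)
print("ridge:", b_ridge)
print("norms:", np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

For any k > 0 the ridge coefficient vector has a smaller Euclidean norm than the OLS one, which is the shrinkage (and the source of the bias) mentioned above. Whether a given k actually lowers the mean squared error depends on the data, so in practice k is usually chosen by cross-validation.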

— Emma