# Ridge Regularization: An Essential Concept in Data Science

```bibtex
@article{Hastie2020RidgeRA,
  title   = {Ridge Regularization: An Essential Concept in Data Science},
  author  = {Trevor J. Hastie},
  journal = {Technometrics},
  year    = {2020},
  volume  = {62},
  pages   = {426--433}
}
```

#### Abstract

Ridge, or more formally ℓ2 regularization, shows up in many areas of statistics and machine learning. It is one of those essential devices that any good data scientist needs to master for their craft. In this brief ridge fest I have collected together some of the magic and beauty of ridge that my colleagues and I have encountered over the past 40 years in applied statistics.
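As a concrete reminder of the paper's central object, here is a minimal NumPy sketch of the closed-form ridge estimate; the data and penalty values are illustrative, not from the paper:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: beta = (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Illustrative data: shrinkage toward zero as the penalty grows.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(50)

b_small = ridge(X, y, 0.01)
b_large = ridge(X, y, 1000.0)
# The coefficient norm shrinks as lambda increases.
print(np.linalg.norm(b_small) > np.linalg.norm(b_large))  # True
```

The penalty trades a little bias for a reduction in variance, which is the recurring theme of the citations and references below.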


#### 15 Citations

Comment: Ridge Regression—Still Inspiring After 50 Years

- Computer Science, Mathematics; *Technometrics*
- 2020

Comments will focus on two new results related to ridge regularization: response guided principal component regression and leave-one-out analysis in kernel machines.

Comment: Ridge Regression and Regularization of Large Matrices

- Computer Science, Mathematics; *Technometrics*
- 2020

We view ridge regression through the lens of eigenvalue shrinkage, and consider its influence on two modern problems in high-dimensional statistical inference: covariance estimation and community d...

Can’t Ridge Regression Perform Variable Selection?

- Computer Science, Mathematics; *Technometrics*
- 2021

A new variable selection method is proposed, based on an individually penalized ridge regression (a slightly generalized version of ridge regression), and is shown to perform competitively in a simulation study and on a real-data example.
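The individually penalized ridge replaces the single λ with one penalty per coefficient. The following is only a sketch of that generalized ridge estimate, not the paper's full selection procedure; the data and penalty values are illustrative:

```python
import numpy as np

def generalized_ridge(X, y, lams):
    """Ridge with an individual penalty per coefficient:
    minimize ||y - X b||^2 + sum_j lams[j] * b[j]^2."""
    return np.linalg.solve(X.T @ X + np.diag(lams), X.T @ y)

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.1 * rng.standard_normal(40)

# A very large penalty on one coordinate drives that coefficient toward
# zero, which is the mechanism a ridge-based selector can exploit.
b = generalized_ridge(X, y, np.array([0.1, 1e6, 0.1]))
print(abs(b[1]) < 1e-3)  # True
```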

InfoGram and Admissible Machine Learning

- Computer Science, Mathematics; *arXiv*
- 2021

A new information-theoretic learning framework (admissible machine learning) and algorithmic risk-management tools (InfoGram, L-features, ALFA-testing) are introduced that can guide an analyst to redesign off-the-shelf ML methods to be regulatory compliant while maintaining good prediction accuracy.

DeepShadows: Separating Low Surface Brightness Galaxies from Artifacts using Deep Learning

- Computer Science, Physics; *Astron. Comput.*
- 2021

This work investigates the use of convolutional neural networks (CNNs) for the problem of separating LSBGs from artifacts in survey images and demonstrates that CNNs offer a very promising path in the quest to study the low-surface-brightness universe.

Anthropogenic influence on extreme precipitation over global land areas seen in multiple observational datasets

- Medicine; *Nature Communications*
- 2021

A physically interpretable anthropogenic signal that is detectable in all global observational datasets is found that is robustly projected by global climate models and capable of identifying the time evolution of the spatial patterns.

Using Machine Learning to Understand Veterans' Receipt of Loans in the Paycheck Protection Program

- 2020

This paper provides the first quantitative investigation of the receipt of funds from the Paycheck Protection Program (PPP) among Veterans between April and June. We find that Veterans received 3.5%…

Semiparametric Portfolios: Improving Portfolio Performance by Exploiting Non-Linearities in Firm Characteristics

- 2021

We present a semiparametric portfolio optimization method in which portfolio weights are parameterized as a non-linear function of firm characteristics. This approach generalizes the linear…

A tutorial on individualized treatment effect prediction from randomized trials with a binary endpoint.

- Medicine, Mathematics; *Statistics in Medicine*
- 2021

The causal structure of the individualized treatment effect is laid out in terms of potential outcomes, and the assumptions required for a causal interpretation of its prediction are described, including logistic-regression-based methods that are both well known and naturally provide the required probabilistic estimates.

#### References

Showing 1-10 of 39 references

Regularization and variable selection via the elastic net

- Mathematics
- 2005

Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a…

Statistical Learning with Sparsity: The Lasso and Generalizations

- Computer Science
- 2015

Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underlying signal in a set of data and extract useful and reproducible patterns from big datasets.

Group lasso with overlap and graph lasso

- Mathematics, Computer Science; *ICML '09*
- 2009

A new penalty function is proposed which, when used as regularization for empirical risk minimization procedures, leads to sparse estimators; theoretical properties of the estimator are studied and illustrated on simulated data and on breast cancer gene expression data.

Efficient quadratic regularization for expression arrays.

- Mathematics, Medicine; *Biostatistics*
- 2004

This article exposes a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks, and shows that dramatic computational savings are possible over naive implementations.

Reconciling modern machine learning practice and the bias-variance trade-off

- Computer Science
- 2018

This paper reconciles the classical understanding and the modern practice within a unified performance curve that subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance.

Computer Age Statistical Inference: Algorithms, Evidence, and Data Science

- Computer Science
- 2016

This book takes an exhilarating journey through the revolution in data analysis following the introduction of electronic computation in the 1950s, with speculation on the future direction of statistics and data science.

Regression Shrinkage and Selection via the Lasso

- Mathematics
- 1996

SUMMARY We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a…

Surprises in High-Dimensional Ridgeless Least Squares Interpolation

- Mathematics, Computer Science; *arXiv*
- 2019

This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
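The "ridgeless" estimator studied there is the minimum-norm interpolator, i.e. the limit of ridge as the penalty goes to zero in the overparameterized regime. A small sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 40))   # overparameterized: p > n
y = rng.standard_normal(10)

# Minimum-norm interpolator: the lambda -> 0 limit of ridge when p > n.
beta_min_norm = np.linalg.pinv(X) @ y

# Ridge with a tiny penalty approaches the same solution (dual form).
lam = 1e-8
beta_ridge = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(10), y)

print(np.allclose(X @ beta_min_norm, y))          # interpolates the training data
print(np.allclose(beta_min_norm, beta_ridge, atol=1e-5))
```

Among all interpolating solutions, this one has the smallest ℓ2 norm, which is what ties interpolation back to ridge shrinkage.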

Ridge Regression: Biased Estimation for Nonorthogonal Problems

- Computer Science; *Technometrics*
- 2000

The ridge trace, a method for showing in two dimensions the effects of nonorthogonality, is introduced, along with how to augment X′X to obtain biased estimates with smaller mean squared error.
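The augmentation of X′X has a well-known least-squares reading: ridge is ordinary least squares on a dataset augmented with √λ·I pseudo-rows and zero responses. A sketch with illustrative data:

```python
import numpy as np

def ridge_via_augmentation(X, y, lam):
    """Ridge as OLS on augmented data: append sqrt(lam)*I rows to X
    and matching zeros to y."""
    p = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
    y_aug = np.concatenate([y, np.zeros(p)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 4))
y = rng.standard_normal(30)
lam = 5.0

b_aug = ridge_via_augmentation(X, y, lam)
b_direct = np.linalg.solve(X.T @ X + lam * np.eye(4), X.T @ y)
print(np.allclose(b_aug, b_direct))  # True
```

The augmented normal equations are exactly X′X + λI, so any stable least-squares solver can be reused for ridge without forming X′X explicitly.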

Dropout Training as Adaptive Regularization

- Computer Science, Mathematics; *NIPS*
- 2013

By casting dropout as regularization, this work develops a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer and consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.