Regularization using sklearn

Lets Code More
Oct 13, 2022

In the previous article, we discussed the concept of and need for regularization in machine learning in detail.

In this article, we will implement regularization using sklearn for:

  • Linear Model
  • Polynomial Model

L1 Regularization (Linear model).

In data_reg.csv, you’ll find data for a bunch of points including six predictor variables and one outcome variable. Use sklearn’s Lasso class to fit a linear regression model to the data, while also using L1 regularization to control for model complexity.

# add import statements
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
import matplotlib.pyplot as plt

Load Data

Split the data so that the six predictor features (first six columns) are stored in X, and the outcome feature (last column) is stored in y.

# Load the data; the file has no header row.
train_data = pd.read_csv('data_reg.csv', header=None)
X = train_data.iloc[:, :-1]  # first six columns: the predictor features
y = train_data.iloc[:, -1]   # last column: the outcome feature
X

Fit the data using linear regression with Lasso regularization.

  • Create an instance of sklearn’s Lasso class and assign it to the variable lasso_reg. You don’t need to set any parameter values: use the default values for this code.
  • Use the Lasso object’s .fit() method to fit the regression model onto the data.

# Create the linear regression model with lasso regularization.
lasso_reg = Lasso()  # For L2 regularization, use the Ridge class instead of Lasso.
# Fit the model.
lasso_reg.fit(X, y)
Output:

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
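
The alpha=1.0 shown in the output above is the regularization strength, which is the main parameter you would tune in practice. As a quick sketch (weak_lasso here is just a hypothetical illustration, not used in the rest of the article), a smaller alpha typically zeroes fewer coefficients:

# alpha controls the strength of the L1 penalty (default 1.0).
# Smaller alpha means weaker regularization, so fewer coefficients are forced to zero.
weak_lasso = Lasso(alpha=0.1)
weak_lasso.fit(X, y)
print(weak_lasso.coef_)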

Inspect the coefficients of the regression model.

Retrieve and print out the coefficients from the regression model.

reg_coef = lasso_reg.coef_
print(reg_coef)

Output:

[ 0.   2.35793224  2.00441646 -0.05511954 -3.92808318  0.  ]

For which of the predictor features (X) has the lasso regularization step zeroed the corresponding coefficient?

As you can see, the answer is the first feature and the last one: their coefficients are exactly 0.
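
If you would rather check this programmatically than by eye, here is a minimal sketch using the reg_coef array from above:

# Indices of the features whose coefficients Lasso drove exactly to zero.
zeroed_features = np.where(reg_coef == 0)[0]
print(zeroed_features)  # for this data: [0 5], i.e. the first and last columns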

L1 Regularization (complex/polynomial model).

Here we first expand the six predictors into degree-4 polynomial features and then fit a Lasso model on the expanded data. Because the expanded feature set is much larger, the solver gets more iterations (max_iter=2000) and a looser tolerance (tol=1) so that coordinate descent stops without convergence warnings.

from sklearn.preprocessing import PolynomialFeatures

# Create a degree-4 polynomial features object.
poly_feat = PolynomialFeatures(degree=4)
# Transform X according to these polynomial features.
X_poly = poly_feat.fit_transform(X)
# pd.DataFrame(X_poly)

# Create the linear regression model with lasso regularization.
lasso_reg1 = Lasso(max_iter=2000, tol=1)  # For L2 regularization, use the Ridge class instead of Lasso.
# Fit the model.
lasso_reg1.fit(X_poly, y)

Output:

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=2000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=1, warm_start=False)

Retrieve and print the coefficients again.

reg_coef1 = lasso_reg1.coef_
print(reg_coef1)

Output:

[ 0.00000000e+00  0.00000000e+00  3.00406732e+00  1.78623934e+00
-2.30803479e-01 -3.71384816e+00 0.00000000e+00 -1.95309557e-02
-0.00000000e+00 -1.46557498e-03 4.36299806e-03 -0.00000000e+00
0.00000000e+00 -0.00000000e+00 -2.45727864e-02 -1.01845590e-02
0.00000000e+00 0.00000000e+00 2.16431684e-03 4.85729985e-04
-0.00000000e+00 0.00000000e+00 3.10117551e-03 -7.60551752e-03
-0.00000000e+00 0.00000000e+00 -0.00000000e+00 0.00000000e+00
-1.79015576e-04 -6.60026412e-03 2.79593429e-03 2.13528975e-03
-1.85709370e-03 1.02252022e-01 -8.94814737e-04 -0.00000000e+00
-2.71088626e-03 5.56251694e-03 0.00000000e+00 -7.99245716e-05
-8.05436897e-05 2.53141781e-03 0.00000000e+00 -2.98481699e-05
2.48767581e-03 0.00000000e+00 -1.77469974e-03 0.00000000e+00
-0.00000000e+00 -1.41137351e-03 -5.42600005e-03 1.96914071e-03
-9.85972197e-03 0.00000000e+00 -2.52634618e-04 2.47306094e-04
4.12408140e-04 0.00000000e+00 -2.05851988e-04 0.00000000e+00
-0.00000000e+00 1.79069871e-04 -0.00000000e+00 -0.00000000e+00
8.85595786e-04 6.16512582e-04 -2.45551578e-03 3.91079530e-02
2.06486701e-04 -4.28287203e-04 -0.00000000e+00 5.28995512e-03
-0.00000000e+00 -0.00000000e+00 1.97048809e-04 -2.17888673e-04
3.12484685e-02 4.50154136e-04 0.00000000e+00 0.00000000e+00
-4.83654915e-03 0.00000000e+00 -0.00000000e+00 0.00000000e+00
6.97679476e-05 -5.44836823e-04 1.17250213e-04 -3.13609307e-05
-9.25542836e-05 2.35137914e-03 1.58983545e-03 -3.57551378e-04
4.41112857e-05 8.74331344e-05 -0.00000000e+00 1.78243468e-05
1.09107883e-04 -2.20059626e-04 9.36258443e-06 -8.98379269e-05
9.51699641e-05 8.55839364e-05 9.64122705e-06 0.00000000e+00
0.00000000e+00 -6.79864680e-04 -0.00000000e+00 8.32102812e-05
4.67344448e-04 0.00000000e+00 9.44499432e-05 6.41790420e-05
1.40408169e-03 0.00000000e+00 7.99617040e-05 1.11875476e-03
-0.00000000e+00 -1.09987694e-03 -0.00000000e+00 0.00000000e+00
-9.74191826e-06 9.65759281e-06 -1.22422879e-04 -6.35556234e-04
-3.43086780e-05 -9.80202585e-05 -3.45295391e-03 2.86928653e-04
-0.00000000e+00 -0.00000000e+00 3.95157958e-06 2.36673880e-05
7.56733082e-04 8.13851204e-05 0.00000000e+00 0.00000000e+00
-4.52758043e-05 -0.00000000e+00 0.00000000e+00 0.00000000e+00
-1.42154252e-03 -1.79079826e-03 3.48940113e-04 -4.71194899e-05
0.00000000e+00 1.42377137e-04 1.98581612e-04 -3.81913229e-04
0.00000000e+00 3.36359350e-05 8.43926368e-06 -2.78835488e-03
2.24652045e-03 -0.00000000e+00 -0.00000000e+00 -1.06389254e-04
2.90640893e-04 6.65648014e-04 -0.00000000e+00 2.70061512e-05
-1.71133453e-04 -9.84070791e-03 5.36837135e-04 -0.00000000e+00
-0.00000000e+00 -2.23426539e-05 3.54557924e-05 1.58731004e-03
-2.01882444e-04 0.00000000e+00 0.00000000e+00 3.36604029e-04
0.00000000e+00 0.00000000e+00 0.00000000e+00 7.36115505e-06
-5.01957179e-05 7.50021355e-05 3.37826591e-05 -6.88448690e-06
2.56017761e-05 -9.80553159e-04 2.37078531e-04 -0.00000000e+00
-0.00000000e+00 4.35401818e-06 -3.06333003e-05 9.27049820e-04
-1.26314465e-04 5.76089776e-03 0.00000000e+00 -1.90154988e-04
1.87037014e-03 0.00000000e+00 0.00000000e+00 1.23148373e-06
1.69083049e-06 2.41186486e-04 -5.76670399e-06 -8.92376098e-04
-6.70392956e-03 -3.09427218e-05 0.00000000e+00 -0.00000000e+00
-0.00000000e+00 1.54791055e-04 -1.46379338e-03 0.00000000e+00
-0.00000000e+00 -0.00000000e+00]

And here you can see that many coefficients are 0.
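
To quantify how aggressively the L1 penalty has pruned the polynomial model, here is a quick follow-up check on the reg_coef1 array from above:

# Count how many of the polynomial coefficients survived the L1 penalty.
n_total = reg_coef1.size
n_nonzero = np.count_nonzero(reg_coef1)
print(f"{n_nonzero} of {n_total} coefficients are nonzero")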

WHICH ONE TO USE (L1 VS. L2)?

L1 Regularization:

  • Computationally inefficient (unless the data is sparse). It may look simpler because there are no squared terms, but absolute values are hard to differentiate.
  • Its most special benefit is FEATURE SELECTION. E.g., if you have 1000 features and only 10 are relevant, it will detect the irrelevant columns and set their coefficients to 0, exactly as regularization removed features from the model in the code above.

L2 Regularization:

  • Computationally efficient, because squared terms have very nice derivatives.
  • No feature selection: it treats all columns similarly.
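
To see the "no feature selection" behaviour of L2 on the same data, you can swap in sklearn's Ridge class. A minimal sketch, assuming the X and y loaded earlier:

from sklearn.linear_model import Ridge

# Fit the same linear model with L2 (ridge) regularization.
ridge_reg = Ridge()  # default alpha=1.0, like Lasso above
ridge_reg.fit(X, y)
print(ridge_reg.coef_)  # typically all six coefficients are nonzero: shrunk, not zeroed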

If you like this post, support me by buying me a coffee.

https://www.buymeacoffee.com/letscodemore

Note: This article is part of Data Science / Regression series.

This article is originally published at https://www.letscodemore.com/


Lets Code More

Lets Code More is a blog that publishes excellent articles about technology, coding, and programming. https://www.letscodemore.com/