Ridge Regression

Ridge regression is still a form of linear regression. The difference is that an extra regularization term is added when the regression equation is fitted, which helps mitigate overfitting.
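
To make this concrete, Ridge minimizes the ordinary least-squares cost plus an L2 penalty on the weights; in LaTeX notation (α is sklearn's alpha, the λ mentioned in the API list below):

    \min_{w,\,b} \; \lVert Xw + b - y \rVert_2^2 + \alpha \lVert w \rVert_2^2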

API

  • sklearn.linear_model.Ridge(alpha=1.0, fit_intercept=True, solver="auto", normalize=False)
    • Linear regression with L2 regularization
      • alpha: regularization strength, also written as λ
        • Typical λ values: 0~1 or 1~10
    • solver: automatically picks an optimization method based on the data
      • sag: if both the dataset and the number of features are large, this stochastic average gradient (SAG) solver is chosen
    • normalize: whether to normalize the data
      • normalize=False: standardize the data yourself with preprocessing.StandardScaler before calling fit
    • Ridge.coef_: regression weights
    • Ridge.intercept_: regression bias (intercept)

Ridge is essentially equivalent to SGDRegressor(penalty='l2', loss="squared_loss"), except that SGDRegressor implements plain stochastic gradient descent. Ridge is recommended because it implements SAG (see the sketch below).
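
A minimal sketch of this correspondence on synthetic, standardized data (the dataset and alpha values below are made up for illustration; the two estimators scale the penalty slightly differently, so the weights come out similar but not identical):

import numpy as np
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)                               # hypothetical synthetic data
X = StandardScaler().fit_transform(rng.rand(200, 3))         # standardized features
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=0.1, solver="sag", max_iter=10000).fit(X, y)   # SAG solver
sgd = SGDRegressor(penalty="l2", alpha=0.001, max_iter=10000,
                   random_state=0).fit(X, y)                       # plain SGD, squared loss by default
print("Ridge coef_:", ridge.coef_)
print("SGD coef_:  ", sgd.coef_)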

  • sklearn.linear_model.RidgeCV(_BaseRidgeCV, RegressorMixin)
    • Linear regression with L2 regularization plus built-in cross-validation (a usage sketch follows the signature below)
    • coef_: regression coefficients
class _BaseRidgeCV(LinearModel):
    def __init__(self, alphas=(0.1, 1.0, 10.0),
                 fit_intercept=True, normalize=False, scoring=None,
                 cv=None, gcv_mode=None,
                 store_cv_values=False):
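
A minimal RidgeCV sketch (synthetic data; the alphas grid is chosen arbitrarily for illustration). After fitting, alpha_ holds the regularization strength selected by cross-validation:

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(0)                 # hypothetical synthetic data
X = rng.standard_normal((150, 4))
y = X @ np.array([2.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.3, size=150)

reg = RidgeCV(alphas=(0.1, 1.0, 10.0), cv=5).fit(X, y)
print("selected alpha:", reg.alpha_)           # best alpha found by cross-validation
print("coef_:", reg.coef_)
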
  • The larger the regularization strength, the smaller the weight coefficients (see the sweep below)
  • The smaller the regularization strength, the larger the weight coefficients
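
A quick way to see this (synthetic data, made up for illustration): sweep alpha and watch the total weight magnitude shrink as the regularization strength grows.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)                 # hypothetical synthetic data
X = rng.standard_normal((100, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, -0.5]) + rng.normal(scale=0.5, size=100)

for alpha in (0.01, 1, 100, 10000):
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.abs(model.coef_).sum())    # total weight magnitude drops as alpha increases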

Code

from sklearn.linear_model import LinearRegression, SGDRegressor, Ridge, RidgeCV
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this example assumes an older version
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error  # mean squared error


def load_data():
    boston_data = load_boston()
    print("Feature shape (n_samples, n_features):", boston_data.data.shape)
    x_train, x_test, y_train, y_test = train_test_split(boston_data.data,
                                                        boston_data.target,
                                                        random_state=22)
    return x_train, x_test, y_train, y_test


def linear_Ridge():
    """
    Ridge: ridge regression
    :return:
    """
    x_train, x_test, y_train, y_test = load_data()
    transfer = StandardScaler()  # standardizing the data is recommended
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    estimator = Ridge(max_iter=10000, alpha=0.5)  # ridge regression
    # estimator = RidgeCV(alphas=[0.1, 0.2, 0.3, 0.5])  # ridge regression with cross-validation
    estimator.fit(x_train, y_train)

    print("Ridge weight coefficients:", estimator.coef_)
    print("Ridge intercept:", estimator.intercept_)

    y_predict = estimator.predict(x_test)
    error = mean_squared_error(y_test, y_predict)
    print("Ridge house-price predictions:", y_predict)
    print("Ridge mean squared error:", error)

    return None


if __name__ == '__main__':
    linear_Ridge()

Results

Feature shape (n_samples, n_features): (506, 13)
Ridge weight coefficients: [-0.64193209 1.13369189 -0.07675643 0.74427624 -1.93681163 2.71424838
-0.08171268 -3.27871121 2.45697934 -1.81200596 -1.74659067 0.87272606
-3.90544403]
Ridge intercept: 22.62137203166228
Ridge house-price predictions: [28.22536271 31.50554479 21.13191715 32.65799504 20.02127243 19.07245621
21.10832868 19.61646071 19.63294981 32.85629282 20.99521805 27.5039205
15.55295503 19.79534148 36.87534254 18.80312973 9.39151837 18.50769876
30.66823994 24.3042416 19.08011554 34.10075629 29.79356171 17.51074566
34.89376386 26.53739131 34.68266415 27.42811508 19.08866098 14.98888119
30.85920064 15.82430706 37.18223651 7.77072879 16.25978968 17.17327251
7.44393003 19.99708381 40.57013125 28.94670553 25.25487557 17.75476957
38.77349313 6.87948646 21.78603146 25.27475292 20.4507104 20.47911411
17.25121804 26.12109499 8.54773286 27.48936704 30.58050833 16.56570322
9.40627771 35.52573005 32.2505845 21.8734037 17.61137983 22.08222631
23.49713296 24.09419259 20.15174912 38.49803353 24.63926151 19.77214318
13.95001219 6.7578343 42.03931243 21.92262496 16.89673286 22.59476215
40.75560357 21.42352637 36.88420001 27.18201696 21.03801678 20.39349944
25.35646095 22.27374662 31.142768 20.39361408 23.99587493 31.54490413
26.76213545 20.8977756 29.0705695 21.99584672 26.30581808 20.10938421
25.47834262 24.08620166 19.90788343 16.41215513 15.26575844 18.40106165
24.82285704 16.61995784 20.87907604 26.70640134 20.75218143 17.88976552
24.27287641 23.36686439 21.57861455 36.78815164 15.88447635 21.47747831
32.80013402 33.71367379 20.61690009 26.83175792 22.69265611 17.38149366
21.67395385 21.67101719 27.6669245 25.06785897 23.73251233 14.65355067
15.19441045 3.81755887 29.1743764 20.68219692 22.33163756 28.01411044
28.55668351]
Ridge mean squared error: 20.641771606180907