3. Error measurement
For MSE, RMSE, and MAE, the closer to 0 the better; for R², the closer to 1 the better.
RMSE is the square root of MSE.
MAE is the mean absolute error.
R² is the coefficient of determination.
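As a quick sanity check before using sklearn's built-ins, all four metrics can be computed directly with NumPy (a minimal sketch; the toy `y_true`/`y_pred` arrays are made up for illustration):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)       # mean squared error
rmse = np.sqrt(mse)                         # root mean squared error
mae = np.mean(np.abs(y_true - y_pred))      # mean absolute error
# R² = 1 - (residual sum of squares) / (total sum of squares)
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

print(mse, rmse, mae, r2)  # 0.375 0.6123... 0.5 0.9486...
```

Note that R² can be negative when the model predicts worse than simply returning the mean of `y_true`.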
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
import numpy as np
iris = load_iris()
'''
In the iris dataset, the third column is petal length and the fourth is petal width.
x and y are the independent and dependent variables.
reshape(-1,1) turns the one-dimensional iris.data[:,2] into a two-dimensional array,
since sklearn estimators expect a 2-D feature matrix.
'''
x, y = iris.data[:, 2].reshape(-1, 1), iris.data[:, 3]
lr = LinearRegression()
'''
train_test_split splits the data into training and test sets, returning the
training x, test x, training y, and test y, assigned here to
x_train, x_test, y_train, y_test.
test_size: fraction of the data held out for testing
random_state: random seed
'''
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)
lr.fit(x_train, y_train)
y_hat = lr.predict(x_test)
print(y_train[:5])
print(y_hat[:5])
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
print('MSE:', mean_squared_error(y_test, y_hat))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_hat)))
print('MAE:', mean_absolute_error(y_test, y_hat))
print('R²:', r2_score(y_test, y_hat))
print('R²:', lr.score(x_test, y_test))
from matplotlib import pyplot as plt
plt.rcParams['font.family'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False
plt.rcParams['font.size'] = 15
plt.figure(figsize=(20, 8))
plt.scatter(x_train, y_train, color='green', marker='o', label='training set')
plt.scatter(x_test, y_test, color='orange', marker='o', label='test set')
plt.plot(x, lr.predict(x), 'r-')
plt.legend()
plt.xlabel('petal length')
plt.ylabel('petal width')

This produces a rather plain plot; drawing something more polished, or making other comparisons, is left for another time.
We have just run a linear regression on iris petal length and width, exploring the relationship between them and how petal width tends to change as length varies. Real-world data, however, is far more complex: phenomena driven by a single factor are relatively rare. To handle that complexity we introduce multiple linear regression, which assigns a weight to each of several factors. The boston dataset's description is readily available, so I won't go into it in detail.
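Concretely, multiple linear regression models the target as a weighted sum of all $p$ features plus an intercept, with the weights fit by least squares:

```latex
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_p x_p,
\qquad
\min_{w} \sum_{i} \left( y_i - \hat{y}_i \right)^2
```

The fitted weights $w_1, \dots, w_p$ are exactly what `lr.coef_` returns below, and $w_0$ is `lr.intercept_`.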
Now we examine how much each factor in the boston data affects house prices, a typical example of multi-factor influence.

import pandas as pd
import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

boston = load_boston()
lr = LinearRegression()
x, y = boston.data, boston.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.15, random_state=0)
lr.fit(x_train, y_train)
print(lr.coef_)
print(lr.intercept_)
y_hat = lr.predict(x_test)

Output:
[-1.24536078e-01  4.06088227e-02  5.56827689e-03  2.17301021e+00
 -1.72015611e+01  4.02315239e+00 -4.62527553e-03 -1.39681074e+00
  2.84078987e-01 -1.17305066e-02 -1.06970964e+00  1.02237522e-02
 -4.54390752e-01]
36.09267761760974

Source: https://blog.csdn.net/weixin_45891155/article/details/111109655
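Note that `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the snippet above only runs on older versions. The same multi-feature workflow can be reproduced on synthetic data (a minimal sketch using `make_regression`; the sample count, feature count, and noise level are arbitrary choices, with 13 features matching the width of the old boston data):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# 500 samples, 13 features, mild additive noise.
x, y = make_regression(n_samples=500, n_features=13, noise=5.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.15, random_state=0)

lr = LinearRegression()
lr.fit(x_train, y_train)
print(lr.coef_.shape)   # one weight per feature -> (13,)
print(lr.intercept_)
print(r2_score(y_test, lr.predict(x_test)))
```

On current scikit-learn, `fetch_california_housing` is the usual real-data replacement for the boston dataset.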