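I won't go over the algorithm again here; earlier posts cover it in detail:

[Algorithm Series] Decision Trees
Decision Tree: the ID3 Algorithm
Decision Tree: the C4.5 Algorithm
Decision Tree: the CART Algorithm
Differences Between the ID3, C4.5, and CART Decision Trees

Experiment:

Import the required Python libraries

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

Import the dataset

    dataset = pd.read_csv('Social_Network_Ads.csv')
    X = dataset.iloc[:, [2, 3]].values   # columns 2 and 3: Age, EstimatedSalary
    y = dataset.iloc[:, 4].values        # column 4: the class label

Split the dataset into a training set and a test set

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

Feature scaling

    # Assumption: the scaling lines were lost from the original post; only the
    # sklearn.preprocessing import survived. StandardScaler is assumed here,
    # which matches the 0.01 grid step used in the plots below.
    from sklearn.preprocessing import StandardScaler
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)   # fit on the training set only
    X_test = sc.transform(X_test)         # reuse the training-set statistics

Fit the decision tree classifier to the training set

    from sklearn.tree import DecisionTreeClassifier
    classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
    classifier.fit(X_train, y_train)

Predict the test set results

    y_pred = classifier.predict(X_test)

Make the confusion matrix

    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_test, y_pred)   # rows: true labels, columns: predictions

Visualize the training set results

    from matplotlib.colors import ListedColormap
    X_set, y_set = X_train, y_train
    # Dense grid covering the feature space, padded by 1 on each side
    X1, X2 = np.meshgrid(
        np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
        np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
    # Color each grid point by its predicted class to show the decision regions
    plt.contourf(X1, X2,
                 classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('red', 'green')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    # Overlay the actual samples, one color per class
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    color=ListedColormap(('red', 'green'))(i), label=j)
    plt.title('Decision Tree Classification (Training set)')
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

Visualize the test set results

    from matplotlib.colors import ListedColormap
    X_set, y_set = X_test, y_test
    X1, X2 = np.meshgrid(
        np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
        np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2,
                 classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('red', 'green')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    for i, j in enumerate(np.unique(y_set)):
        plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                    color=ListedColormap(('red', 'green'))(i), label=j)
    plt.title('Decision Tree Classification (Test set)')
    plt.xlabel('Age')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()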
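
The confusion matrix computed above is easier to read once it is reduced to a few summary numbers. The following is a minimal sketch, assuming the scikit-learn layout for a two-class problem (rows are true labels, columns are predicted labels, class 0 first). The helper name `summarize_confusion_matrix` and the counts in `example_cm` are made up for illustration; they are not results from this experiment.

```python
def summarize_confusion_matrix(cm):
    """Return (accuracy, precision, recall) for a 2x2 confusion matrix.

    Assumes rows are true labels and columns are predicted labels,
    with the negative class (0) first, as scikit-learn lays it out.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, for illustration only
example_cm = [[50, 10],
              [5, 35]]
accuracy, precision, recall = summarize_confusion_matrix(example_cm)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

The same function should also accept the NumPy array returned by `confusion_matrix`, since it only unpacks the two rows.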

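
The classifier above is built with `criterion='entropy'`, which means each candidate split is scored by how much it reduces the entropy of the class labels. The sketch below shows that calculation in plain Python for a single binary split. The function names and the toy labels are assumptions made for illustration; scikit-learn's internal implementation is more involved, and this is only meant to show the quantity being optimized.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n)
                for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_child_entropy = (len(left) / n) * entropy(left) \
        + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted_child_entropy


# Toy labels, for illustration only: a balanced parent node and two candidate splits
parent = [0, 0, 0, 0, 1, 1, 1, 1]
weak_gain = information_gain(parent, [0, 0, 0, 1], [0, 1, 1, 1])
perfect_gain = information_gain(parent, [0, 0, 0, 0], [1, 1, 1, 1])
print(f"weak split gain={weak_gain:.4f}, perfect split gain={perfect_gain:.4f}")
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves both children mixed recovers only a fraction of it, so the tree prefers the former.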