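WeChat official account: DSShuo
Author: xihuishaw

Customer Churn: Survival Analysis

Customer churn

Different industries, and different stages of the customer life cycle, define customer churn differently. In general, though, churn means that a customer stops using a company's products and services within a given period. For churn prediction, many machine learning models can predict whether a customer will leave. Predicting churn has several benefits: it lets you intervene early with customers you may lose and take retention measures ahead of time; it lets you analyze the data of likely churners and find the biggest differences between churned and retained customers; it supports a timely and effective early-warning mechanism based on the losses observed; and, once we know which customers are likely to churn and with what probability, it lets us design strategies to retain those on the verge of leaving.

Some problems remain, however. A churn prediction model tells us which customers are likely to leave and which features matter, but it still does not give us the "hook" for keeping them. The data analyst can only look at the churned customers and the most influential features and try to tease out some clues.

Survival analysis

The Cox proportional-hazards model, or Cox model for short, is a semi-parametric regression model proposed by the British statistician D. R. Cox (1972). It is commonly used in medical research to analyze the effect of one or more predictor variables on patient survival time. The most interesting aspect of this kind of survival model is that it lets us examine the relationship between survival time and the predictors. For example, if we are studying patient survival, the predictors might be age, blood pressure, sex, smoking habits, and so on. These predictors are usually called covariates.

<img src="https://p.pstatp.com/origin/pgc-image/229d9f5f7a2e43d5b94e8dbe5fc10b22" style="zoom: 67%;"/>

Model parameters:
Hazard function λ(t): the instantaneous risk of death at time t.
Covariates Z: the feature vector.
Baseline hazard function λ0(t): describes how the risk of the event changes over time; it is the underlying hazard when all covariates equal 0.

The commonly used Kaplan-Meier curve is a univariate analysis, whereas the Cox model is a multivariate method of survival analysis. A Cox model can include both categorical variables (such as sex) and numerical variables (such as age); a Kaplan-Meier curve can only handle categorical variables. Cox regression therefore extends survival analysis so that the effect of several risk factors on survival time can be assessed at the same time, which gives it a much wider range of application.

Applying the model

Taking the telecom churn dataset on Kaggle as an example, we build a hazard model with the lifelines package.

Read the data

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    df = pd.read_csv('Telecom_customer churn.csv')
    df = df.dropna()
    df.set_index('Customer_ID', inplace=True)

Drop categorical features with more than 2 categories

    df_str = df.loc[:, df.dtypes == object]
    for i in df_str.columns:
        if len(np.unique(df_str[i].values)) > 2:
            del df[i]

One-hot encode the features

    df_str = df.loc[:, df.dtypes == object]
    for i in df_str.columns:
        one_hot = pd.get_dummies(df[i])
        one_hot.columns = [i + '_' + j for j in one_hot.columns]
        df = df.drop(i, axis=1)
        df = df.join(one_hot)

    # Set the duration and event columns aside so they are not treated as features
    survival_time = df['months'].values
    del df['months']
    churn = df['churn'].values
    del df['churn']

Drop highly correlated features

    corr_matrix = df.corr().abs()
    # Keep only the upper triangle so each pair is checked once
    # (np.bool was removed from recent NumPy releases; use the built-in bool)
    upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    to_drop = [column for column in upper.columns if any(upper[column] > 0.98)]
    df.drop(to_drop, axis=1, inplace=True)

    df = df[list(df.columns[:69])]
    df['months'] = survival_time
    df['churn'] = churn
    # Keep only customers who churned; note that this discards censored observations
    df = df[df['churn'] == 1]

Select variables and build the Cox model

    df_sampled = df.sample(n=1000)

    from lifelines import CoxPHFitter
    cph = CoxPHFitter(penalizer=0.01)
    cph.fit(df_sampled, duration_col='months', event_col='churn')

    df_stats = cph.summary
    # Keep features whose hazard ratio is noticeably different from 1
    features_valuable = (list(df_stats[df_stats['exp(coef)'].values > 1.01].index)
                         + list(df_stats[df_stats['exp(coef)'].values < 0.98].index))
    df = df[features_valuable + ['churn', 'months']]

A basic assumption of the CPH model is that the features are not multicollinear, so multicollinearity has to be handled before modeling. It can be resolved before fitting the Cox model (as in the correlation filter above), or a penalty can be applied to the size of the coefficients during regression. Using a penalty improves the stability of the estimates and controls for high correlation between covariates.

In the model summary, coef is the weight of each feature. If a feature has exp(coef) = 1, it has no effect. If exp(coef) > 1, the feature raises the hazard and lowers survival; if exp(coef) < 1, it lowers the hazard and raises survival.

The best way to understand the effect of each feature or decision is to plot survival curves for that single feature or decision while holding all other features fixed. Here we call the plot_partial_effects_on_outcome() method and pass it the feature of interest and the values to display.

For the feature models = 9, the survival probability is still 80% beyond 42 months, while other values give lower survival. By plotting survival curves for several features in this way, we can work out which behaviors improve customer survival.

We can even plot a survival curve for each individual customer and look at that customer's features to analyze why survival is low.

You can also compare the effect of different strategies on the survival curve by applying them to a customer. Here strategy 1 (orange line) performs better than strategy 2 (green line) and gives higher survival. In this way, each customer can be analyzed and a forward-looking strategy designed to give the statistically highest survival.

Summary

The Cox proportional-hazards model not only identifies the factors that influence customer churn and the direction of each factor's effect; through feature analysis it also yields personalized strategies for reducing churn, and it even allows different strategies to be compared to find the one that best improves retention.</p>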
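To make the interpretation of exp(coef) more concrete, here is a minimal sketch in plain Python, independent of lifelines. It relies on the standard proportional-hazards identity S(t | Z) = S0(t) ** exp(β·Z). The function names, the baseline survival values and the coefficients are all hypothetical illustrations, not values from the telecom dataset or from the fitted model.

```python
import math


def hazard_ratio(coef, delta=1.0):
    """Multiplicative change in hazard when a covariate increases by `delta`."""
    return math.exp(coef * delta)


def survival_with_covariates(baseline_survival, coefs, covariates):
    """Apply S(t | Z) = S0(t) ** exp(sum(beta_i * z_i)) to a baseline curve."""
    linear_predictor = sum(b * z for b, z in zip(coefs, covariates))
    relative_risk = math.exp(linear_predictor)
    return [s0 ** relative_risk for s0 in baseline_survival]


# Hypothetical baseline survival probabilities at months 0, 12, 24, 36
baseline = [1.0, 0.9, 0.8, 0.7]

# Hypothetical single-feature models: a negative coefficient (exp(coef) < 1)
# lowers the hazard, a positive one (exp(coef) > 1) raises it
protective = survival_with_covariates(baseline, [-0.5], [1.0])
harmful = survival_with_covariates(baseline, [0.5], [1.0])

for month, s0, s_low, s_high in zip([0, 12, 24, 36], baseline, protective, harmful):
    print(f"month {month:2d}: baseline={s0:.3f} "
          f"exp(coef)<1 -> {s_low:.3f}  exp(coef)>1 -> {s_high:.3f}")
```

The curve for the negative coefficient stays at or above the baseline at every time point and the curve for the positive coefficient stays at or below it, which is the same pattern that partial-effect plots show for a single feature.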
