Data Analysis in Practice: Using pandas on Restaurant Rating Data

Date: 2023-03-25 19:51:32 Python

To get a better feel for how pandas is used in real data analysis, today we will walk through analyzing a restaurant rating dataset with pandas.

Introduction to the restaurant rating data

The data comes from the UCI ML Repository. It contains a little over a thousand records with 5 attributes:

userID: user ID
placeID: restaurant ID
rating: overall rating
food_rating: food rating
service_rating: service rating

We use pandas to read the data:

    import numpy as np
    import pandas as pd

    path = '../data/restaurant_final.csv'
    df = pd.read_csv(path)
    df

          userID  placeID  rating  food_rating  service_rating
    ...
    1156   U1043   132630       1            1               1
    1157   U1011   132715       1            1               0
    1158   U1068   132733       1            1               0
    1159   U1068   132594       1            1               1
    1160   U1068   132660       0            0               0

    1161 rows × 5 columns

Analyzing the rating data

If we care about the overall rating and the food rating of each restaurant, we can start with the per-restaurant averages. Here we use the pivot_table method:

    mean_ratings = df.pivot_table(values=['rating', 'food_rating'],
                                  index='placeID', aggfunc='mean')
    mean_ratings[:5]

             food_rating  rating
    placeID
    132560          1.00    0.50
    132561          1.00    0.75
    132564          1.25    1.25
    132572          1.00    1.00
    132583          1.00    1.00

Next, let's count how many users rated each placeID:

    ratings_by_place = df.groupby('placeID').size()
    ratings_by_place[:10]

    placeID
    132560     4
    132561     4
    132564     4
    132572    15
    132583     4
    132584     6
    132594     5
    132608     6
    132609     5
    132613     6
    dtype: int64

If a restaurant has too few voters, its averages are not very objective. Let's keep only the restaurants with at least 4 voters (the result below includes restaurants with exactly 4 votes, so the comparison is >=):

    active_place = ratings_by_place.index[ratings_by_place >= 4]
    active_place

    Int64Index([132560, 132561, 132564, 132572, 132583, 132584, 132594, 132608,
                132609, 132613,
                ...
                135080, 135081, 135082, 135085, 135086, 135088, 135104, 135106,
                135108, 135109],
               dtype='int64', name='placeID', length=124)

Select the mean ratings for these restaurants:

    mean_ratings = mean_ratings.loc[active_place]
    mean_ratings

             food_rating    rating
    placeID
    132560      1.000000  0.500000
    132561      1.000000  0.750000
    132564      1.250000  1.250000
    132572      1.000000  1.000000
    132583      1.000000  1.000000
    ...              ...       ...
    135088      1.166667  1.000000
    135104      1.428571  0.857143
    135106      1.200000  1.200000
    135108      1.181818  1.181818
    135109      1.250000  1.000000

    124 rows × 2 columns

Sort by rating and take the 10 highest-rated restaurants:

    top_ratings = mean_ratings.sort_values(by='rating', ascending=False)
    top_ratings[:10]

             food_rating    rating
    placeID
    132955      1.800000  2.000000
    135034      2.000000  2.000000
    134986      2.000000  2.000000
    132922      1.500000  1.833333
    132755      2.000000  1.800000
    135074      1.750000  1.750000
    135013      2.000000  1.750000
    134976      1.750000  1.750000
    135055      1.714286  1.714286
    135075      1.692308  1.692308

We can also compute the difference between the mean overall rating and the mean food rating, and store it in a column named diff:

    mean_ratings['diff'] = mean_ratings['rating'] - mean_ratings['food_rating']
    sorted_by_diff = mean_ratings.sort_values(by='diff')
    sorted_by_diff[:10]

             food_rating    rating      diff
    placeID
    132667      2.000000  1.250000 -0.750000
    132594      1.200000  0.600000 -0.600000
    132858      1.400000  0.800000 -0.600000
    135104      1.428571  0.857143 -0.571429
    132560      1.000000  0.500000 -0.500000
    135027      1.375000  0.875000 -0.500000
    132740      1.250000  0.750000 -0.500000
    134992      1.500000  1.000000 -0.500000
    132706      1.250000  0.750000 -0.500000
    132870      1.000000  0.600000 -0.400000

These are the restaurants whose food is rated well above their overall rating. Reversing the order gives the 10 restaurants whose overall rating exceeds their food rating by the largest margin:

    sorted_by_diff[::-1][:10]

             food_rating    rating      diff
    placeID
    134987      0.500000  1.000000  0.500000
    132937      1.000000  1.500000  0.500000
    135066      1.000000  1.500000  0.500000
    132851      1.000000  1.428571  0.428571
    135049      0.600000  1.000000  0.400000
    132922      1.500000  1.833333  0.333333
    135030      1.333333  1.583333  0.250000
    135063      1.000000  1.250000  0.250000
    132626      1.000000  1.250000  0.250000
    135000      1.000000  1.250000  0.250000

Finally, compute the standard deviation of rating for each restaurant and take the 10 largest, that is, the restaurants whose voters disagree the most:

    # Standard deviation of rating grouped by placeID
    rating_std_by_place = df.groupby('placeID')['rating'].std()
    # Filter down to active_place
    rating_std_by_place = rating_std_by_place.loc[active_place]
    # Order Series by value in descending order
    rating_std_by_place.sort_values(ascending=False)[:10]

    placeID
    134987    1.154701
    135049    1.000000
    134983    1.000000
    135053    0.991031
    135027    0.991031
    132847    0.983192
    132767    0.983192
    132884    0.983192
    135082    0.971825
    132706    0.957427
    Name: rating, dtype: float64

This article is also published at http://www.flydean.com/02-pandas-restaurant/

For more plain-spoken explanations, concise tutorials, and handy tips, follow my WeChat official account 《程序那些事儿》.
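As a recap, the steps of this walkthrough can be sketched end to end on a tiny made-up table, which makes each intermediate result easy to check by hand. The toy values and the name MIN_VOTES are assumptions for illustration only and are not taken from the UCI data; the pandas calls are the same ones used in the article.

```python
import pandas as pd

# Hypothetical toy data (made up for illustration, not from the UCI dataset).
df = pd.DataFrame({
    "userID": ["U1", "U2", "U3", "U4", "U1", "U2", "U3", "U4", "U1", "U2"],
    "placeID": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "rating": [2, 2, 1, 1, 0, 2, 0, 2, 1, 1],
    "food_rating": [2, 1, 1, 0, 2, 2, 1, 1, 2, 2],
    "service_rating": [1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
})

# Assumed threshold: keep restaurants with at least this many votes.
MIN_VOTES = 4

# Mean overall rating and mean food rating per restaurant.
mean_ratings = df.pivot_table(values=["rating", "food_rating"],
                              index="placeID", aggfunc="mean")

# Number of votes per restaurant, then the index of the "active" ones.
ratings_by_place = df.groupby("placeID").size()
active_place = ratings_by_place.index[ratings_by_place >= MIN_VOTES]

# Restrict to active restaurants; copy so that adding a column is safe.
mean_ratings = mean_ratings.loc[active_place].copy()

# Overall rating minus food rating, sorted from most negative to most positive.
mean_ratings["diff"] = mean_ratings["rating"] - mean_ratings["food_rating"]
sorted_by_diff = mean_ratings.sort_values(by="diff")

# Sample standard deviation of rating per active restaurant, largest first.
rating_std = (df.groupby("placeID")["rating"].std()
                .loc[active_place]
                .sort_values(ascending=False))

print(sorted_by_diff)
print(rating_std)
```

In this toy table, restaurant 3 has only two votes and is filtered out, while restaurant 2 (ratings 0, 2, 0, 2) ends up with the largest standard deviation because its voters disagree the most.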
