Advanced Pandas Tutorial: Plot in Detail

Date: 2023-03-25 23:35:39 Python

Introduction

Matplotlib is an important and convenient plotting library for Python, and it makes the results of data analysis easy to visualize. This article explains in detail how Matplotlib is used through pandas.

To use Matplotlib for basic plotting, import it first. The examples below also rely on NumPy and pandas:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

To plot 365 days of random data starting from January 1, 2020:

    ts = pd.Series(np.random.randn(365), index=pd.date_range("1/1/2020", periods=365))
    ts.plot()

Plotting with a DataFrame

A DataFrame plots several series at once:

    df = pd.DataFrame(np.random.randn(365, 4), index=ts.index, columns=list("ABCD"))
    df = df.cumsum()
    df.plot()

You can specify which columns supply the x and y data:

    df3 = pd.DataFrame(np.random.randn(365, 2), columns=["B", "C"]).cumsum()
    df3["A"] = pd.Series(list(range(len(df3))))
    df3.plot(x="A", y="B");

Other plot types

plot() supports many plot types, including bar, hist, box, density, area, scatter, hexbin, and pie. The examples below show how to use each of them.

bar

    # Plot a single row of the DataFrame as a bar chart
    df.iloc[5].plot(kind="bar");

A bar chart with multiple columns:

    df2 = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
    df2.plot.bar();

stacked bar

    df2.plot.bar(stacked=True);

barh

barh draws a horizontal bar chart:

    df2.plot.barh(stacked=True);

Histograms

    df2.plot.hist(alpha=0.5);

box

    df.plot.box();

The colors of a box plot can be customized:

    color = {
        "boxes": "DarkGreen",
        "whiskers": "DarkOrange",
        "medians": "DarkBlue",
        "caps": "Gray",
    }
    df.plot.box(color=color, sym="r+");

The plot can be made horizontal:

    df.plot.box(vert=False);

Besides plot.box(), you can also draw a box plot with DataFrame.boxplot:

    df = pd.DataFrame(np.random.rand(10, 5))
    bp = df.boxplot()

boxplot can group the data with the by argument:

    df = pd.DataFrame(np.random.rand(10, 2), columns=["Col1", "Col2"])
    df

    Out:
           Col1      Col2
    0  0.047633  0.150047
    1  0.296385  0.212826
    2  0.562141  0.136243
    3  0.997786  0.224560
    4  0.585457  0.178914
    5  0.551201  0.867102
    6  0.740142  0.003872
    7  0.959130  0.581506
    8  0.114489  0.534242
    9  0.042882  0.314845

    # Add a grouping column
    df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
    df

    Out:
           Col1      Col2  X
    0  0.047633  0.150047  A
    1  0.296385  0.212826  A
    2  0.562141  0.136243  A
    3  0.997786  0.224560  A
    4  0.585457  0.178914  A
    5  0.551201  0.867102  B
    6  0.740142  0.003872  B
    7  0.959130  0.581506  B
    8  0.114489  0.534242  B
    9  0.042882  0.314845  B

    bp = df.boxplot(by="X")

Area

Use Series.plot.area() or DataFrame.plot.area() to draw an area plot:

    df = pd.DataFrame(np.random.rand(10, 4), columns=["a", "b", "c", "d"])
    df.plot.area();

If you do not want the areas stacked, pass stacked=False:

    df.plot.area(stacked=False);

Scatter

DataFrame.plot.scatter() creates a scatter plot:

    df = pd.DataFrame(np.random.rand(50, 4), columns=["a", "b", "c", "d"])
    df.plot.scatter(x="a", y="b");

A scatter plot can show a third variable through the color of the points:

    df.plot.scatter(x="a", y="b", c="c", s=50);

The third variable can instead be mapped to the size of the points:

    df.plot.scatter(x="a", y="b", s=df["c"] * 200);

Hexagonal bin

A honeycomb (hexagonal bin) plot can be created with DataFrame.plot.hexbin():

    df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
    df["b"] = df["b"] + np.arange(1000)
    df.plot.hexbin(x="a", y="b", gridsize=25);

By default, the color depth of each bin represents the number of (x, y) points that fall into it. A different aggregation, such as mean, max, sum, or std, can be selected with reduce_C_function:

    df = pd.DataFrame(np.random.randn(1000, 2), columns=["a", "b"])
    df["b"] = df["b"] + np.arange(1000)
    df["z"] = np.random.uniform(0, 3, 1000)
    df.plot.hexbin(x="a", y="b", C="z", reduce_C_function=np.max, gridsize=25);

Pie

Use DataFrame.plot.pie() or Series.plot.pie() to build a pie chart:

    series = pd.Series(3 * np.random.rand(4), index=["a", "b", "c", "d"], name="series")
    series.plot.pie(figsize=(6, 6));

For a DataFrame, pass subplots=True to draw one pie per column:

    df = pd.DataFrame(
        3 * np.random.rand(4, 2), index=["a", "b", "c", "d"], columns=["x", "y"]
    )
    df.plot.pie(subplots=True, figsize=(8, 4));

More customization:

    series.plot.pie(
        labels=["AA", "BB", "CC", "DD"],
        colors=["r", "g", "b", "c"],
        autopct="%.2f",
        fontsize=20,
        figsize=(6, 6),
    );

If the values sum to less than 1, older Matplotlib versions draw a partial pie (a fan shape); recent versions rescale the values so that they sum to 1 by default:

    series = pd.Series([0.1] * 4, index=["a", "b", "c", "d"], name="series2")
    series.plot.pie(figsize=(6, 6));

Handling NaN data in plots

The default plotting methods handle NaN values as follows:

    Plot type        NaN handling
    Line             Leave gaps at NaNs
    Line (stacked)   Fill 0's
    Bar              Fill 0's
    Scatter          Drop NaNs
    Histogram        Drop NaNs (column-wise)
    Box              Drop NaNs (column-wise)
    Area             Fill 0's
    KDE              Drop NaNs (column-wise)
    Hexbin           Drop NaNs
    Pie              Fill 0's

Other plotting tools

Scatter matrix

You can use scatter_matrix from pandas.plotting to draw a scatter matrix:

    from pandas.plotting import scatter_matrix

    df = pd.DataFrame(np.random.randn(1000, 4), columns=["a", "b", "c", "d"])
    # diagonal="kde" requires SciPy
    scatter_matrix(df, alpha=0.2, figsize=(6, 6), diagonal="kde");

Density plot

Use Series.plot.kde() or DataFrame.plot.kde() to draw a density plot:

    ser = pd.Series(np.random.randn(1000))
    ser.plot.kde();  # requires SciPy

Andrews curves

Andrews curves plot multivariate data as a large number of curves, using the attributes of each sample as the coefficients of a Fourier series. Coloring the curves by class makes clusters in the data visible: curves of samples from the same class usually lie closer together and form larger structures.

    from pandas.plotting import andrews_curves

    data = pd.read_csv("data/iris.data")
    plt.figure();
    andrews_curves(data, "Name");

Parallel coordinates

Parallel coordinates is a technique for plotting multivariate data. It lets you see clusters in the data and estimate other statistics visually. Each point is drawn as a set of connected line segments; each vertical line represents one attribute, and one set of connected segments represents one data point. Points that tend to cluster appear closer together.

    from pandas.plotting import parallel_coordinates

    data = pd.read_csv("data/iris.data")
    plt.figure();
    parallel_coordinates(data, "Name");

Lag plot

A lag plot is a scatter plot of a time series against a lagged copy of itself. It can be used to look for autocorrelation.

    from pandas.plotting import lag_plot

    plt.figure();
    spacing = np.linspace(-99 * np.pi, 99 * np.pi, num=1000)
    data = pd.Series(0.1 * np.random.rand(1000) + 0.9 * np.sin(spacing))
    lag_plot(data);

Autocorrelation plot

Autocorrelation plots are commonly used to check for randomness in a time series. The plot is two-dimensional: the horizontal axis is the lag and the vertical axis is the autocorrelation coefficient.

    from pandas.plotting import autocorrelation_plot

    plt.figure();
    spacing = np.linspace(-9 * np.pi, 9 * np.pi, num=1000)
    data = pd.Series(0.7 * np.random.rand(1000) + 0.3 * np.sin(spacing))
    autocorrelation_plot(data);

Bootstrap plot

A bootstrap plot is used to assess visually the uncertainty of a statistic such as the mean, median, or midrange. A random subset of a specified size is drawn from the data set, the statistic is computed for that subset, and the process is repeated a specified number of times. The resulting plots and histograms make up the bootstrap plot.

    from pandas.plotting import bootstrap_plot

    data = pd.Series(np.random.rand(1000))
    bootstrap_plot(data, size=50, samples=500, color="grey");

RadViz

RadViz is based on a spring tension minimization algorithm. It maps each feature of the data set to a point on the unit circle in a two-dimensional target space. Each instance starts at the center of the circle, and every feature "pulls" the instance toward its own point on the circle with a strength that corresponds to the instance's normalized value for that feature.

    from pandas.plotting import radviz

    data = pd.read_csv("data/iris.data")
    plt.figure();
    radviz(data, "Name");

Plot formatting

Matplotlib 1.5 and later ship a number of predefined plot styles, which can be applied with matplotlib.style.use(my_plot_style).

You can list all available styles with matplotlib.style.available. The exact names depend on the installed Matplotlib version:

    import matplotlib.pyplot as plt

    plt.style.available

    Out:
    ['seaborn-dark', 'seaborn-darkgrid', 'seaborn-ticks', 'fivethirtyeight',
     'seaborn-whitegrid', 'classic', '_classic_test', 'fast', 'seaborn-talk',
     'seaborn-dark-palette', 'seaborn-bright', 'seaborn-pastel', 'grayscale',
     'seaborn-notebook', 'ggplot', 'seaborn-colorblind', 'seaborn-muted',
     'seaborn', 'Solarize_Light2', 'seaborn-paper', 'bmh', 'seaborn-white',
     'dark_background', 'seaborn-poster', 'seaborn-deep']

Removing the legend

By default, a plot includes a legend that identifies the columns. It can be disabled with legend=False:

    # 1000 rows need a 1000-entry index (the earlier ts has only 365 entries)
    df = pd.DataFrame(
        np.random.randn(1000, 4),
        index=pd.date_range("1/1/2000", periods=1000),
        columns=list("ABCD"),
    )
    df = df.cumsum()
    df.plot(legend=False);

Setting axis labels

    df.plot();
    df.plot(xlabel="new x", ylabel="new y");

Scaling

If the values on the X or Y axis differ by orders of magnitude, the plot becomes hard to read and the small values are barely visible. Pass logy=True to put the Y axis on a log scale:

    ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
    ts = np.exp(ts.cumsum())
    ts.plot(logy=True);

Multiple Y axes

Use secondary_y to plot selected columns against a second Y axis:

    plt.figure();
    ax = df.plot(secondary_y=["A", "B"])
    ax.set_ylabel("CD scale");
    ax.right_ax.set_ylabel("AB scale");

By default the legend marks these columns with "(right)". To remove the marker, set mark_right=False:

    plt.figure();
    df.plot(secondary_y=["A", "B"], mark_right=False);

Adjusting tick labels

When dates are used on the x-axis, the tick labels may not display well over a long time span. Passing x_compat=True turns off pandas' own date tick adjustment and leaves tick placement to Matplotlib:

    plt.figure();
    df["A"].plot(x_compat=True);

If several plots need the same adjustment, use a with block:

    plt.figure();
    with pd.plotting.plot_params.use("x_compat", True):
        df["A"].plot(color="r")
        df["B"].plot(color="g")
        df["C"].plot(color="b")

Subplots

When plotting a DataFrame, each Series can be drawn in its own subplot:

    df.plot(subplots=True, figsize=(6, 6));

The layout of the subplots can be changed:

    df.plot(subplots=True, layout=(2, 3), figsize=(6, 6), sharex=False);

You can also pass -1 for one dimension and let pandas compute it from the number of series and the other dimension:

    df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False);

A more complex example:

    fig, axes = plt.subplots(4, 4, figsize=(9, 9))
    plt.subplots_adjust(wspace=0.5, hspace=0.5)
    # Main diagonal and anti-diagonal of the 4x4 grid
    target1 = [axes[0][0], axes[1][1], axes[2][2], axes[3][3]]
    target2 = [axes[3][0], axes[2][1], axes[1][2], axes[0][3]]
    df.plot(subplots=True, ax=target1, legend=False, sharex=False, sharey=False);
    (-df).plot(subplots=True, ax=target2, legend=False, sharex=False, sharey=False);

Drawing tables

With table=True, the data is displayed as a table directly in the figure:

    fig, ax = plt.subplots(1, 1, figsize=(7, 6.5))
    df = pd.DataFrame(np.random.rand(5, 3), columns=["a", "b", "c"])
    ax.xaxis.tick_top()  # Display x-axis ticks on top.
    df.plot(table=True, ax=ax)

A table can also be placed on top of the plot with pandas.plotting.table:

    from pandas.plotting import table

    fig, ax = plt.subplots(1, 1)
    table(ax, np.round(df.describe(), 2), loc="upper right", colWidths=[0.2, 0.2, 0.2]);
    df.plot(ax=ax, ylim=(0, 2), legend=None);

Using colormaps

If there are many series to plot against the Y axis, the default line colors can be hard to tell apart. In that case you can pass a colormap:

    df = pd.DataFrame(np.random.randn(1000, 10), index=ts.index)
    df = df.cumsum()
    plt.figure();
    df.plot(colormap="cubehelix");

This article is also available at http://www.flydean.com/09-python-pandas-plot/
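As a recap of the formatting options covered in this tutorial, the following is a minimal, self-contained sketch that combines several of them in one script. The fixed random seed, the "Agg" backend (so that the script runs without a display), and the label texts "date" and "cumulative sum" are assumptions made for illustration and are not part of the original examples.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so no display is needed

import numpy as np
import pandas as pd

# Assumption: a seeded generator makes the random data reproducible
rng = np.random.default_rng(0)
index = pd.date_range("1/1/2000", periods=1000)
df = pd.DataFrame(
    rng.standard_normal((1000, 4)), index=index, columns=list("ABCD")
).cumsum()

# Custom axis labels, no legend, and a colormap in a single call
ax = df.plot(
    legend=False, xlabel="date", ylabel="cumulative sum", colormap="cubehelix"
)

# Columns A and B on a secondary Y axis, without the "(right)" legend marker
ax2 = df.plot(secondary_y=["A", "B"], mark_right=False)
ax2.set_ylabel("CD scale")
ax2.right_ax.set_ylabel("AB scale")

# One subplot per column; -1 lets pandas infer the number of grid columns
axes = df.plot(subplots=True, layout=(2, -1), figsize=(6, 6), sharex=False)
print(axes.shape)
```

With four columns and two rows requested, pandas should arrange the subplots in a 2 by 2 grid here, whereas layout=(2, 3) would reserve a 2 by 3 grid and leave the unused axes empty.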
