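How do you analyze or plot recorded data with Python? This article introduces numpy, matplotlib, pandas and scipy, a handful of packages for data analysis and plotting.

Preparing the environment

For the Python environment, the Anaconda release is recommended. Download links:

- Official: https://www.anaconda.com/prod...
- Tsinghua mirror: https://mirrors.tuna.tsinghua...

Anaconda is a Python distribution for scientific computing that already bundles many popular packages for scientific computing and data analysis.

You can list the installed packages with conda list. Several of the packages covered in this article are already there:

    $ conda list | grep numpy
    numpy                     1.17.2           py37h99e6662_0

    $ conda list | grep "matplot\|seaborn\|plotly"
    matplotlib                3.1.1            py37h54f8f79_0
    seaborn                   0.9.0            py37_0

    $ conda list | grep "pandas\|scipy"
    pandas                    0.25.1           py37h0a44026_0
    scipy                     1.3.1            py37h1410ff5_0

If you already have a Python environment, install them with pip:

    pip install numpy matplotlib scipy pandas

For the Tsinghua PyPI mirror, see: https://mirrors.tuna.tsinghua.edu.cn/help/pypi/

The environment used in this article is Python 3.7.4 (Anaconda3-2019.10).

Preparing the data

This article assumes that the data file data0.txt has the following format:

    id,data,timestamp
    0,55,1592207702.688805
    1,41,1592207702.783134
    2,57,1592207702.883619
    3,59,1592207702.980597
    4,58,1592207703.08313
    5,41,1592207703.183011
    6,52,1592207703.281802
    ...

It is in CSV format: the fields are comma-separated, which keeps reading simple.

With this data we will work through the following goals:

- CSV data: read it and compute statistics with numpy
- data column: plot it with matplotlib
- data column: interpolate it with scipy to form a smooth curve
- timestamp column: compute differences between consecutive values with numpy, and count samples per second with pandas

Reading data with numpy

numpy can read CSV data directly with loadtxt:

    import numpy as np

    # id, (data), timestamp
    datas = np.loadtxt(path, dtype=np.int32, delimiter=",",
                       skiprows=1, usecols=(1))

- dtype=np.int32: the data type is np.int32
- delimiter=",": the separator is ","
- skiprows=1: skip the first row (the header)
- usecols=(1): read column 1 (columns are zero-based, so this is the data column)

To read several columns:

    # id, (data, timestamp)
    dtype = {'names': ('data', 'timestamp'), 'formats': ('i4', 'f8')}
    datas = np.loadtxt(path, dtype=dtype, delimiter=",",
                       skiprows=1, usecols=(1, 2))

The dtype specification is documented at: https://numpy.org/devdocs/ref...

Analyzing data with numpy

Computing the mean and the sample standard deviation with numpy:

    # mean
    data_avg = np.mean(datas)
    # data_avg = np.average(datas)

    # population standard deviation
    # data_std = np.std(datas)
    # sample standard deviation
    data_std = np.std(datas, ddof=1)

    print("avg: {:.2f}, std: {:.2f}, sum: {}".format(
        data_avg, data_std, np.sum(datas)))

Plotting with matplotlib

The plotting itself takes only four lines:

    import sys

    import matplotlib.pyplot as plt
    import numpy as np


    def _plot(path):
        print("Load: {}".format(path))
        # id, (data), timestamp
        datas = np.loadtxt(path, dtype=np.int32, delimiter=",",
                           skiprows=1, usecols=(1))

        fig, ax = plt.subplots()
        # label the line with the file it came from
        ax.plot(range(len(datas)), datas, label=path)
        ax.legend()
        plt.show()


    if __name__ == "__main__":
        if len(sys.argv) < 2:
            sys.exit("python data_plot.py *.txt")
        _plot(sys.argv[1])

In ax.plot(x, y, ...), the x-axis uses the index range of the data, range(len(datas)).

For the full code, see data_plot.py in the Gist linked at the end of this article. A run looks like this:

    $ python data_plot.py data0.txt
    Args
      nonzero: False
    Load: data0.txt
      size: 20
      avg: 52.15, std: 8.57, sum: 1043

Several files can be read and displayed together:

    $ python data_plot.py data*.txt
    Args
      nonzero: False
    Load: data0.txt
      size: 20
      avg: 52.15, std: 8.57, sum: 1043
    Load: data1.txt
      size: 20
      avg: 53.35, std: 6.78, sum: 1067

Interpolation with scipy

Use scipy to interpolate the data and smooth it into a curve:

    from scipy import interpolate

    # xvalues, yvalues: the x and y arrays of the original samples
    xnew = np.arange(xvalues[0], xvalues[-1], 0.01)
    # interp1d returns a function; call it on xnew to get the curve
    f = interpolate.interp1d(xvalues, yvalues, kind='cubic')
    ynew = f(xnew)

For the full code, see data_interp.py in the Gist linked at the end of this article. It is run as follows:

    python data_interp.py data0.txt

For how to configure, delay and save figures when plotting with matplotlib, see the code and its comments.

Analyzing data with pandas

This part needs the timestamp column:

    # id, data, (timestamp)
    stamps = np.loadtxt(path, dtype=np.float64, delimiter=",",
                        skiprows=1, usecols=(2))

numpy computes the differences between consecutive timestamps:

    stamps_diff = np.diff(stamps)

pandas counts the number of samples in each second:

    stamps_int = np.array(stamps, dtype='int')
    stamps_int = stamps_int - stamps_int[0]

    import pandas as pd
    stamps_s = pd.Series(data=stamps_int)
    stamps_s = stamps_s.value_counts(sort=False)

The approach is to truncate each timestamp to whole seconds and then let pandas count how often each value occurs.

For the full code, see stamp_diff.py in the Gist linked at the end of this article. It is run as follows:

    python stamp_diff.py data0.txt

For how to show several charts in one matplotlib figure, see the code as well.

Closing

Gist with the code for this article: https://gist.github.com/ikuok...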
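
Appendix: a file-free walkthrough

The loading, statistics, interpolation and per-second counting steps described above can be tried without any input file. What follows is a minimal sketch, not the code from the Gist. It keeps the seven sample rows shown in the article in a string and feeds them to np.loadtxt through io.StringIO. The helper names load_columns, summarize, make_interpolant and counts_per_second are hypothetical, chosen only for illustration, and the sort_index() call is an addition that makes the order of the counts deterministic.

```python
import io

import numpy as np
import pandas as pd
from scipy import interpolate

# Sample rows copied from the article; kept in memory so no data0.txt is needed.
SAMPLE = """id,data,timestamp
0,55,1592207702.688805
1,41,1592207702.783134
2,57,1592207702.883619
3,59,1592207702.980597
4,58,1592207703.08313
5,41,1592207703.183011
6,52,1592207703.281802
"""


def load_columns(text):
    """Load the data and timestamp columns as a structured array."""
    dtype = {"names": ("data", "timestamp"), "formats": ("i4", "f8")}
    return np.loadtxt(io.StringIO(text), dtype=dtype, delimiter=",",
                      skiprows=1, usecols=(1, 2))


def summarize(values):
    """Return the mean, the sample standard deviation (ddof=1) and the sum."""
    return np.mean(values), np.std(values, ddof=1), np.sum(values)


def make_interpolant(values):
    """Build a cubic interpolant over the sample index 0..len(values)-1."""
    x = np.arange(len(values))
    return interpolate.interp1d(x, values, kind="cubic")


def counts_per_second(stamps):
    """Count samples per whole second, relative to the first sample."""
    seconds = stamps.astype(np.int64)
    seconds = seconds - seconds[0]
    # sort_index() is an addition here so the result order is deterministic.
    return pd.Series(seconds).value_counts(sort=False).sort_index()


if __name__ == "__main__":
    rows = load_columns(SAMPLE)
    avg, std, total = summarize(rows["data"])
    print("avg: {:.2f}, std: {:.2f}, sum: {}".format(avg, std, total))

    f = make_interpolant(rows["data"])
    x_new = np.arange(0, len(rows) - 1, 0.01)  # stays inside the sampled range
    y_new = f(x_new)
    print("interpolated points:", len(y_new))

    print(counts_per_second(rows["timestamp"]))
```

On these seven rows the first four timestamps fall in one second and the remaining three in the next, so the per-second count has two entries. Passing x_new and y_new to ax.plot would draw the smoothed curve alongside the original points.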
