
Processing Excel files with pandas (updated)

Date: 2023-03-25 20:10:00 Python

Reading files

```
import pandas as pd

df = pd.read_csv("")  # read a CSV file (the path is left empty in the original)
pd.read_clipboard()   # read the contents of the clipboard

# Stop pandas from truncating wide or long output
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Get the value of a single cell (config is the DataFrame loaded in the full script at the end)
datefirst = config.iloc[0, 1]
datename = config.iloc[0, 2]

# Add a column "two" holding the first two characters of the part-number column "料号"
sheet["two"] = sheet["料号"].apply(lambda x: x[:2])
```

Value processing

```
df["dog"] = df["dog"].replace(-1, 0)  # replace a value

# apply() takes a function as its argument: functions are objects, so they can be
# passed to other functions and returned from them
df["price_new"] = df["price"].apply(lambda pri: pri.lower())  # new column computed from an old one
df["pricee"] = df["price"] * 2  # new column
```

Getting data

```
data = df.head()  # head() returns the first 5 rows by default

# A list of sheet names reads several sheets at once and returns a dict of DataFrames
sheets = pd.read_excel("lemon.xlsx", sheet_name=["python", "student"])
df = pd.read_excel("lemon.xlsx", sheet_name=0)  # an integer selects a sheet by position
data = df.values  # all the data
print("All values:\n{0}".format(data))  # formatted output

df = pd.read_excel("lemon.xlsx")
data = df.iloc[0].values  # the first data row, header not included (.ix was removed in pandas 1.0)
print("Values of the first row:\n{0}".format(data))  # formatted output
```

loc and iloc in detail

loc[row, column]: row first, then column. A bare : selects all rows or all columns; several rows can be listed in square brackets, and a contiguous run can be written as a slice such as a:c.
iloc[index, columns]: row position, then column position, both counted from 0. Otherwise it is used the same way as loc.

Multiple rows

```
# Several rows: nested list
df = pd.read_excel("lemon.xlsx")
data = df.loc[[1, 2]].values  # to read several specific rows, nest a list of row labels inside the brackets
print("Values of the selected rows:\n{0}".format(data))  # formatted output

# A single row and column
df = pd.read_excel('lemon.xlsx')
data = df.iloc[1, 2]  # the value at row 1, column 2; no nested list is needed
print("Value of the selected cell:\n{0}".format(data))

# Several rows and several columns: nested lists
df = pd.read_excel('lemon.xlsx')
data = df.loc[[1, 2], ['title', 'data']].values  # the title and data columns of rows 1 and 2; nested lists are required
print("Values of the selected rows:\n{0}".format(data))

# All rows, specific columns
df = pd.read_excel('lemon.xlsx')
data = df.loc[:, ['title', 'data']].values  # the title and data columns of every row; a nested list is required
print("Values of the selected columns:\n{0}".format(data))
```

Printing row labels and column names

```
# Print the row labels
df = pd.read_excel('lemon.xlsx')
print("Row labels:", df.index.values)
# Output:
# Row labels: [0 1 2 3]

# Print the column names
df = pd.read_excel('lemon.xlsx')
print("Column names:", df.columns.values)
# Output:
# Column names: ['case_id' 'title' 'data']

# Get the values of a given number of rows
df = pd.read_excel('lemon.xlsx')
print("Values:", df.sample(3).values)  # sample() is called like head() but picks rows at random; .values returns the values
# Output:
# [[2 '密码输入错误' '{"手机":"18688773467","pwd":"12345678"}']
#  [3 '正常充值' '{"手机":"18688773467","amount":"1000"}']
#  [1 '正常登录' '{"手机":"18688773467","pwd":"123456"}']]
```

Getting specific values

```
# Get the values of one column
df = pd.read_excel('lemon.xlsx')
print("Values:\n", df['data'].values)

# Convert Excel data to dictionaries
df = pd.read_excel('lemon.xlsx')
test_data = []
for i in df.index.values:  # iterate over the row labels
    # For row i, pick the listed columns and convert them to a dict with to_dict()
    row_data = df.loc[i, ['case_id', '模块', 'title', 'http_method', 'url', 'data', 'expected']].to_dict()
    test_data.append(row_data)
print("Final data: {0}".format(test_data))
```

Basic cleaning

```
# Drop every row that contains a missing value
df.dropna()
# Fill missing values
df.fillna(value=0)
df["price"].fillna(df["price"].mean())
# Strip whitespace from both ends of each string
df["city"] = df["city"].map(str.strip)
# Change case
df["city"] = df["city"].map(str.lower)
# Change the data type
df["price"].fillna(0).astype("int")
# Rename a column
df.rename(columns={"category": "category_size"})
# Drop duplicates (keep the first occurrence by default, or the last)
df["city"].drop_duplicates()
df["city"].drop_duplicates(keep="last")
# Modify and replace values
df["city"].replace("sh", "shanghai")
# The last 3 rows (head(3) gives the first 3)
df.tail(3)
# Summary statistics (the number of rows and columns is data.shape)
data.describe()
# Print the row labelled 8
data.loc[8]
# Print the value in row 8, column column_1
data.loc[8, "column_1"]
# Subset of rows 4 and 5 (range is closed on the left and open on the right)
data.loc[range(4, 6)]
# Count how often each value occurs
data["column_1"].value_counts()
# map() applies a function to every element: len() is applied to each element of
# column_1, then each length is divided by 100; plot() draws the result
data["column_1"].map(len).map(lambda x: x / 100).plot()
# apply() applies a function to a column
# applymap() applies a function to every cell of the DataFrame (renamed DataFrame.map in pandas 2.1)
# Iterate over rows
for i, row in data.iterrows():
    print(i, row)

# Select the rows whose value is in a given set
important_dates = ['1/20/14', '1/30/14']
data_frame_value_in_set = data_frame.loc[data_frame['PurchaseDate'].isin(important_dates), :]

# Select columns 0 and 3 by position
import pandas as pd
input_file = r"supplier_data.csv"
output_file = r"output_files\6output.csv"
data_frame = pd.read_csv(input_file)
data_frame_column_by_index = data_frame.iloc[:, [0, 3]]
data_frame_column_by_index.to_csv(output_file, index=False)

# Add a header row
import pandas as pd
input_file = r"supplier_data_no_header_row.csv"
output_file = r"output_files\11output.csv"
header_list = ['SupplierName', 'InvoiceNumber', 'PartNumber', 'Cost', 'PurchaseDate']
data_frame = pd.read_csv(input_file, header=None, names=header_list)
data_frame.to_csv(output_file, index=False)
```

Merging data from multiple tables

Data merging

1. Merging tables with concat(). The parameters are:
objs (required): a list or dict of the pandas objects to concatenate
axis: the axis to concatenate along, 0 by default
join: inner or outer (the default); whether the indexes on the other axis are combined by intersection (inner) or union (outer)
join_axes: the indexes to use for the other N-1 axes instead of a union or intersection (removed in pandas 1.0; use reindex on the result instead)
keys: values associated with the objects being concatenated, used to build a hierarchical index along the concatenation axis
verify_integrity: check the new axis for duplicates and raise an error if any are found
ignore_index: whether to discard the index along the concatenation axis

For example:

```
frames = [df1, df2, df3]
result = pd.concat(frames)
result = pd.concat(frames, keys=["x", "y", "z"])  # label the rows that come from each table
```

```
# Add table df4 and join it horizontally to df1 on row labels 2, 3, 6, 7; empty cells are filled with NaN
# index gives the row labels of the new table; axis=1 means concatenating along columns
df4 = pd.DataFrame({"B": ["sf"] * 4, "D": ["sf"] * 4}, index=[2, 3, 6, 7])
result = pd.concat([df1, df4], axis=1)
```

```
# Horizontal merge keeping only the rows present in both df1 and df4
result = pd.concat([df1, df4], axis=1, join="inner")

# Add columns while keeping exactly the rows of df1; empty cells are NaN.
# The original used join_axes=[df1.index], which was removed in pandas 1.0.
pd.concat([df1, df4], axis=1).reindex(df1.index)

# Joining tables with append(): DataFrame.append was removed in pandas 2.0,
# and pd.concat gives the same result
result = pd.concat([df1, df2])
result = pd.concat([df1, df4], ignore_index=True)

# Add a Series s1 as one new column, merged horizontally with df1; empty cells are NaN
s1 = pd.Series(["1", "2", "3", "4"], name="x")
result = pd.concat([df1, s1], axis=1)
# name becomes the column name. A Series is one-dimensional; one without a name
# gets a numeric column label, starting from 0
pd.concat([df1, s1], axis=1, ignore_index=True)  # the original column names are not kept after merging

# merge() uses the key as the intermediary that links the two tables
result = pd.merge(left, right, on="key")
# With key1 and key2, rows match only where both values are equal;
# key1 acts as the major key and key2 as the minor one
result = pd.merge(right, left, on=["key1", "key2"])

# Join the right table through the index of the left table
right = pd.DataFrame({"key1": ["K0", "K2", "K1", "K2"],
                      "key2": ["K0", "K1", "K0", "K0"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]},
                     index=["k0", "k1", "k2", "k3"])
# Joined on the index: where right has no row for a label in left's index, the cells are filled with NaN
result = left.join(right)

result = pd.merge(left, right, on="K")
result = pd.merge(left, right, on="K", suffixes=("_l", "_r"))  # rename overlapping column names after the merge
```

```
import os
import pandas as pd

# Stop pandas from truncating wide or long output
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

config = pd.read_excel(r"C:\Users\Administrator\Desktop\数据文件名配置.xlsx", dtype=object)
datefirst = config.iloc[0, 1]
datename = config.iloc[0, 2]
# The path separator was lost in the original; os.path.join is used here instead
dateall = os.path.join(datefirst, datename)
textfile = config.iloc[1, 1]
textname = config.iloc[1, 2]
textall = os.path.join(textfile, textname)

sheet = pd.read_excel(dateall, sheet_name="Sheet2", dtype=object)
sheet["two"] = sheet["料号"].apply(lambda x: x[:2])

# Keep the rows that are NOT in the excluded sets
df = sheet[~sheet["two"].isin(["41", "48"])]
df1 = df[~df["检验结果"].isin(["未检验", "试生产验证合格"])]

# Drop the column that is no longer needed (the last one, "two")
result = df1.iloc[:, :len(df1.columns) - 1]

# Pick out the rows for each model
DTR561 = result[result["机型"].isin(["DTR561"])]
DTR562 = result[result["机型"].isin(["DTR562"])]
HPS322 = result[result["机型"].isin(["HPS322"])]
HPS829 = result[result["机型"].isin(["HPS829"])]

# ExcelWriter.save() was removed in pandas 2.0; the with block saves the file on exit
with pd.ExcelWriter(r"C:\Users\Administrator\Desktop\数据筛选.xlsx") as writer:
    result.to_excel(writer, sheet_name="全部机型", index=False)
    DTR561.to_excel(writer, sheet_name="DTR561", index=False)
    DTR562.to_excel(writer, sheet_name="DTR562", index=False)
    HPS322.to_excel(writer, sheet_name="HPS322", index=False)
    HPS829.to_excel(writer, sheet_name="HPS829", index=False)
print("Data filtering complete")
```
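
The row and column selections described in this post can be tried without an Excel file. The following is only a minimal sketch: the DataFrame below is a made-up stand-in for lemon.xlsx (its rows are invented for illustration), and it uses loc and iloc because .ix no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical stand-in for lemon.xlsx; the rows are invented for illustration.
df = pd.DataFrame(
    {
        "case_id": [1, 2, 3, 4],
        "title": ["login ok", "wrong password", "recharge ok", "recharge negative"],
        "data": ['{"pwd": "123456"}', '{"pwd": "12345678"}',
                 '{"amount": "1000"}', '{"amount": "-100"}'],
    }
)

first_row = df.iloc[0].values                         # first data row, header excluded
two_rows = df.loc[[1, 2]].values                      # several rows need a nested list
one_cell = df.iloc[1, 2]                              # row position 1, column position 2
sub_block = df.loc[[1, 2], ["title", "data"]].values  # rows 1-2, two named columns
two_columns = df.loc[:, ["title", "data"]].values     # every row, two named columns

# One dictionary per row, built without an explicit loop.
test_data = df[["case_id", "title"]].to_dict(orient="records")

print(one_cell)
print(test_data[0])
```

With the default integer index, loc (labels) and iloc (positions) pick the same rows; they diverge as soon as the index is filtered or reordered, so it may be worth choosing one deliberately.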
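
The concat and merge calls in this post can be compared side by side on small tables. This is a sketch under stated assumptions: df1, df4, left and right here are invented examples, not the author's data, and the reindex and pd.concat calls stand in for join_axes and DataFrame.append, which recent pandas versions no longer provide.

```python
import pandas as pd

# Hypothetical tables; names and values are invented for illustration.
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                    "B": ["B0", "B1", "B2", "B3"]})
df4 = pd.DataFrame({"D": ["D2", "D3", "D6", "D7"],
                    "F": ["F2", "F3", "F6", "F7"]},
                   index=[2, 3, 6, 7])

# Horizontal concat: union of the row labels, gaps filled with NaN.
outer = pd.concat([df1, df4], axis=1)
# Horizontal concat: only the row labels present in both tables.
inner = pd.concat([df1, df4], axis=1, join="inner")
# Keep exactly df1's rows (stands in for join_axes=[df1.index]).
left_rows = pd.concat([df1, df4], axis=1).reindex(df1.index)
# Stack rows and renumber them (stands in for df1.append(df4, ignore_index=True)).
stacked = pd.concat([df1, df4], ignore_index=True)

# merge() links two tables through a key column; suffixes rename the clashing column "v".
left = pd.DataFrame({"K": ["K0", "K1"], "v": [1, 2]})
right = pd.DataFrame({"K": ["K0", "K2"], "v": [10, 20]})
merged = pd.merge(left, right, on="K", suffixes=("_l", "_r"))

print(outer.shape, inner.shape, left_rows.shape, stacked.shape)
print(merged)
```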
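
The cleaning calls listed in this post return new objects rather than changing df in place, so their results have to be assigned back. A minimal sketch of a few of them chained together, assuming an invented city/price table (the column names follow the post, the values do not come from it):

```python
import pandas as pd

# Hypothetical data; one price is missing and the city names are inconsistent.
df = pd.DataFrame({"city": [" SH ", "bj", "sh", "BJ "],
                   "price": [10.0, None, 30.0, 20.0]})

# Fill the missing price with the mean of the others, then convert to integers.
df["price"] = df["price"].fillna(df["price"].mean()).astype("int")

# Strip whitespace, lower-case, then expand the abbreviation.
df["city"] = df["city"].map(str.strip).map(str.lower)
df["city"] = df["city"].replace("sh", "shanghai")

# Keep the last occurrence of each city and count occurrences.
unique_last = df["city"].drop_duplicates(keep="last")
counts = df["city"].value_counts()

print(df)
print(unique_last)
print(counts)
```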