A Simple Fix for Garbled Text When Reading and Writing CSV Files

Time: 2023-03-11 20:59:13 科技观察

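Today's post is a short write-up on fixing garbled text in CSV files. You may have run into this yourself: you open a CSV file in Excel and every Chinese character shows up as gibberish. You then open the file by hand in Notepad++, change the encoding to UTF-8, save it, and when you reopen it in Excel it displays correctly. With Python, a few lines of code can automate that whole process.

First, import three modules:

    # coding: utf-8
    # @author: zhenguo
    # @date: 2020-12-16
    # @describe: functions about automatic file processing

    import os

    import chardet
    import pandas as pd

The chardet module detects the file's encoding, pandas reads the file using that encoding, and the result is then saved in xlsx format.

Get the encoding of the file filename:

    def get_encoding(filename):
        """Return the detected encoding of the file."""
        with open(filename, 'rb') as f:
            return chardet.detect(f.read())['encoding']

Next, save the file as a readable xlsx file. The function handles csv, xls, and xlsx input. Note that when the input is a csv file, the output is saved in xlsx format:

    def to_utf8(filename):
        """Save the file as a readable xlsx file."""
        encoding = get_encoding(filename)
        ext = os.path.splitext(filename)
        if ext[1] == '.csv':
            # chardet can return None, so guard before the substring test
            if encoding and 'gb' in encoding.lower():
                df = pd.read_csv(filename, engine='python', encoding='GBK')
            else:
                df = pd.read_csv(filename, engine='python', encoding='utf-8')
            df.to_excel(ext[0] + '.xlsx')
        elif ext[1] == '.xls' or ext[1] == '.xlsx':
            # Excel workbooks are binary files, and recent pandas versions
            # do not accept an encoding argument in read_excel
            df = pd.read_excel(filename)
            # recent pandas versions can no longer write .xls, so always write .xlsx
            df.to_excel(ext[0] + '.xlsx')
        else:
            print('only support csv, xls, xlsx format')

The function above converts a single file. The batch_to_utf8 function below converts every file with the suffix ext_name in the directory path:

    def batch_to_utf8(path, ext_name='csv'):
        """Batch-convert garbled files with the ext_name suffix under path into readable files."""
        for file in os.listdir(path):
            if os.path.splitext(file)[1] == '.' + ext_name:
                to_utf8(os.path.join(path, file))

Call it like this:

    if __name__ == '__main__':
        batch_to_utf8('.')  # save every csv file in the current directory as xlsx

Garbled text when reading and writing encoded files comes up often, and I believe the to_utf8 and batch_to_utf8 functions from today's post will solve it. The next time you run into the problem, try these two functions directly.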
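The manual Notepad++ step described above (re-saving the file as UTF-8) can also be sketched with the standard library alone, for cases where an xlsx copy is not wanted. This is not the author's method, only a minimal illustration under stated assumptions: the names `decode_bytes` and `reencode_csv` are hypothetical, the candidate list stands in for chardet's detection and assumes the input is either UTF-8 or a GB-family encoding, and the output is written as UTF-8 with a BOM because Excel generally relies on the BOM to recognise UTF-8 in a CSV file.

```python
import os
import tempfile

# Hypothetical candidate list (an assumption, not from the article).
# Strict UTF-8 is tried first because GB18030 accepts many byte sequences
# and could otherwise mis-decode UTF-8 input without raising an error.
CANDIDATES = ("utf-8-sig", "gb18030")


def decode_bytes(raw, candidates=CANDIDATES):
    """Return (text, encoding) for the first candidate that decodes raw."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def reencode_csv(src, dst):
    """Rewrite src as UTF-8 with a BOM at dst; return the source encoding used."""
    with open(src, "rb") as f:
        text, enc = decode_bytes(f.read())
    # newline="" writes the line endings exactly as they appear in the text.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)
    return enc


if __name__ == "__main__":
    # Build a small GBK-encoded sample file and convert it.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "gbk.csv")
        dst = os.path.join(tmp, "utf8.csv")
        with open(src, "wb") as f:
            f.write("姓名,城市\r\n张三,北京\r\n".encode("gbk"))
        print(reencode_csv(src, dst))
```

Trying encodings in a fixed order is cruder than statistical detection with chardet, but it needs no third-party packages and fails loudly when none of the candidates fit.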
