
Handling Strings Correctly in Python (A Pitfall I Ran Into)

Time: 2023-03-26 19:34:08 Python

Anyone who has worked with user-submitted survey data knows how messy it can be. Getting a set of uniformly formatted strings that are ready for analysis takes several steps: stripping whitespace, removing punctuation, normalizing capitalization, and so on. One approach is to combine the built-in string methods with the re module for regular expressions:

The usual way

    states = ['Alabama', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
              'south carolina##', 'West virginia?']

    import re

    def clean_strings(strings):  # typical data-cleaning steps
        result = []
        for value in strings:
            value = value.strip()
            value = re.sub('[!#?]', '', value)
            value = value.title()
            result.append(value)
        return result

    In [173]: clean_strings(states)
    Out[173]: ['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South Carolina', 'West Virginia']

The recommended way

    def remove_punctuation(value):
        return re.sub('[!#?]', '', value)

    clean_ops = [str.strip, remove_punctuation, str.title]  # functions are objects too

    def clean_strings(strings, ops):
        result = []
        for value in strings:
            for function in ops:
                value = function(value)
            result.append(value)
        return result

    In [175]: clean_strings(states, clean_ops)
    Out[175]: ['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South Carolina', 'West Virginia']

    # or
    In [176]: for x in map(remove_punctuation, states):
       .....:     print(x)
    Alabama
    Georgia
    Georgia
    georgia
    FlOrIda
    south carolina
    West virginia
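Because the cleaning steps live in a plain list of functions, a step can be added or reordered without touching the loop that applies them. The following is a minimal sketch of that idea, not code from the original post: `collapse_whitespace` and `apply_ops` are hypothetical helper names, the sample data in `raw` is assumed, and `functools.reduce` is swapped in for the nested loop only as one possible way to chain the functions.

```python
import re
from functools import reduce

def remove_punctuation(value):
    # Drop the three punctuation characters targeted by the post's regex.
    return re.sub('[!#?]', '', value)

def collapse_whitespace(value):
    # Hypothetical extra step: squeeze any run of whitespace into one space.
    return re.sub(r'\s+', ' ', value)

def apply_ops(value, ops):
    # Feed the value through each function in order, left to right.
    return reduce(lambda acc, op: op(acc), ops, value)

# The new step is slotted in by editing the list only.
clean_ops = [str.strip, remove_punctuation, collapse_whitespace, str.title]

raw = ['  Alabama ', 'south   carolina##', 'West virginia?']  # assumed sample data
cleaned = [apply_ops(s, clean_ops) for s in raw]
print(cleaned)  # ['Alabama', 'South Carolina', 'West Virginia']
```

Order matters here: `str.title` runs last so that capitalization is applied to the already cleaned text, and `collapse_whitespace` runs after punctuation removal so that any gaps left behind are normalized too.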