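Reading data from a CSV file with PySpark. The CSV file has a header row with two columns, named bd and tt.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType


def run():
    spark = SparkSession \
        .builder \
        .appName("read_csv") \
        .getOrCreate()

    # Define the schema: both columns are nullable strings.
    schema = StructType([
        StructField('bd', StringType(), True),
        StructField('tt', StringType(), True),
    ])

    # header=True means the first row of the data holds the column names.
    # inferSchema would have Spark infer the schema automatically; it is not
    # used here because the schema is specified explicitly.
    df = spark.read.csv(r"map.csv", schema=schema, encoding='utf-8', header=True)
    df = df.select("bd", "tt")
    rows = df.collect()

    # Map each bd value to the list of ';'-separated items in its tt value.
    result = {}
    for row in rows:
        result[row['bd']] = row['tt'].split(";")

    # The original calls an analysis step (分析()) here; its definition is not shown.
    return result


if __name__ == '__main__':
    run()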
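To see what the row loop produces without starting a Spark session, here is a minimal sketch that uses only Python's standard csv module. It is an illustration rather than a replacement for the PySpark code: the sample text SAMPLE_CSV and the helper name build_mapping are assumptions, since the real contents of map.csv are not shown.

```python
import csv
import io

# Hypothetical sample standing in for map.csv (the real file is not shown).
SAMPLE_CSV = "bd,tt\na,x;y;z\nb,w\n"


def build_mapping(text):
    """Map each bd value to the list of ';'-separated items in its tt value."""
    # DictReader uses the first row as the header, like header=True in Spark.
    reader = csv.DictReader(io.StringIO(text))
    return {row["bd"]: row["tt"].split(";") for row in reader}


mapping = build_mapping(SAMPLE_CSV)
print(mapping)
```

Each key of the dictionary is a bd value and each value is a list of strings, which is the same shape the PySpark loop builds from the collected rows.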
