Converting an RDD to a DataFrame

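from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StructType, StringType, IntegerType

# Create (or reuse) a SparkSession; in the pyspark shell `spark` already exists
spark = SparkSession.builder.getOrCreate()

# Records may be a mix of tuples and lists, as long as each has the same fields
data = [('Alex', 'male', 3), ('Nancy', 'female', 6), ['Jack', 'male', 9]]
rdd_ = spark.sparkContext.parallelize(data)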

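# schema
schema = StructType([
        # The third argument (True) marks the field as nullable
        StructField("name", StringType(), True),
        StructField("gender", StringType(), True),
        # num holds Python ints, so it needs IntegerType; StringType fails schema verification
        StructField("num", IntegerType(), True)
    ])
# Works only when every record has the same structure as the schema
df = spark.createDataFrame(rdd_, schema=schema)
df.show()  # show() prints the table itself and returns None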
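
Because createDataFrame checks each record against the explicit schema, a malformed record only surfaces once Spark evaluates the data. The sketch below is one possible plain-Python pre-check that could run on the driver before parallelize. The helper name normalize_records and the expected_types tuple are hypothetical, introduced only for illustration; they are not part of PySpark.

```python
def normalize_records(records, expected_types):
    """Return records as tuples, raising ValueError on the first record
    whose length or field types differ from expected_types.

    Hypothetical helper for illustration; not a PySpark API.
    """
    normalized = []
    for index, record in enumerate(records):
        row = tuple(record)  # lists and tuples are treated alike
        if len(row) != len(expected_types):
            raise ValueError(
                f"record {index} has {len(row)} fields, "
                f"expected {len(expected_types)}"
            )
        for value, expected in zip(row, expected_types):
            # None is allowed because every field in the schema is nullable
            if value is not None and not isinstance(value, expected):
                raise ValueError(
                    f"record {index}: {value!r} is not {expected.__name__}"
                )
        normalized.append(row)
    return normalized


# Same sample data as above: two strings and an integer per record
data = [('Alex', 'male', 3), ('Nancy', 'female', 6), ['Jack', 'male', 9]]
clean = normalize_records(data, (str, str, int))
print(clean)
```

The returned list can then be passed to spark.sparkContext.parallelize in place of the raw data. This check is a convenience under the stated assumptions, not a substitute for Spark's own schema verification.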

Original source: https://www.cnblogs.com/muyue123/p/13260672.html