• Spark Scala: Rename All Column Names


    Construct a dataframe

    import org.apache.spark.sql._
    import org.apache.spark.sql.types._
    
    val data = Array(
      List("Category A", 100, "This is category A"),
      List("Category B", 120, "This is category B"),
      List("Category C", 150, "This is category C"))
    
    // Create a schema for the dataframe
    val schema =
      StructType(
        StructField("Category", StringType, true) ::
        StructField("Count", IntegerType, true) ::
        StructField("Description", StringType, true) :: Nil)
    
    // Convert the array of lists into a List of Row
    val rows = data.map(t=>Row(t(0),t(1),t(2))).toList
    
    // Create RDD
    val rdd = spark.sparkContext.parallelize(rows)
    
    // Create data frame
    val df = spark.createDataFrame(rdd,schema)
    print(df.schema)
    df.show()
    +----------+-----+------------------+
    |  Category|Count|       Description|
    +----------+-----+------------------+
    |Category A|  100|This is category A|
    |Category B|  120|This is category B|
    |Category C|  150|This is category C|
    +----------+-----+------------------+
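
    As a side note, the same DataFrame can also be built more concisely from a Seq of tuples. This is only a sketch, assuming a spark-shell session where spark and its implicits are available; dfAlt is an illustrative name, not part of the original post:

    // Hypothetical alternative construction: the schema is inferred from the tuple types
    import spark.implicits._

    val dfAlt = Seq(
      ("Category A", 100, "This is category A"),
      ("Category B", 120, "This is category B"),
      ("Category C", 150, "This is category C")
    ).toDF("Category", "Count", "Description")

    With the inferred schema, Count comes out as a non-nullable integer, whereas the explicit schema above declares it as nullable IntegerType.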

    Rename one column:

    val df2 = df.withColumnRenamed("Category", "category_new")
    df2.show()
    +------------+-----+------------------+
    |category_new|Count|       Description|
    +------------+-----+------------------+
    |  Category A|  100|This is category A|
    |  Category B|  120|This is category B|
    |  Category C|  150|This is category C|
    +------------+-----+------------------+
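
    withColumnRenamed can also be chained to rename several specific columns, and it simply returns the DataFrame unchanged if the old name does not exist. A minimal sketch (dfTwoRenamed is an illustrative name):

    // Rename two specific columns by chaining withColumnRenamed
    val dfTwoRenamed = df
      .withColumnRenamed("Category", "category_new")
      .withColumnRenamed("Count", "count_new")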

    Rename all columns (convert each column name to lowercase and append _new):

    // Rename columns
    val new_column_names=df.columns.map(c=>c.toLowerCase() + "_new")
    val df3 = df.toDF(new_column_names:_*)
    df3.show()
    +------------+---------+------------------+
    |category_new|count_new|   description_new|
    +------------+---------+------------------+
    |  Category A|      100|This is category A|
    |  Category B|      120|This is category B|
    |  Category C|      150|This is category C|
    +------------+---------+------------------+
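
    The same result can also be produced with select plus alias, which expresses the rename as a single projection over the original columns. A sketch (df4 is an illustrative name):

    import org.apache.spark.sql.functions.col

    // Build one Column expression per original column, aliased to the new name
    val df4 = df.select(df.columns.map(c => col(c).alias(c.toLowerCase + "_new")): _*)
    df4.show()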

    You can also replace dots or other characters in column names:

    c.toLowerCase().replaceAll("\\.", "_") + "_new"
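
    For example, the same toDF pattern can clean every column of a hypothetical DataFrame dfDotted whose names contain dots (such as user.name); renaming is useful here because dotted names otherwise have to be wrapped in backticks when referenced. This is only a sketch with illustrative names:

    // Replace dots, lowercase, and append _new to every column name
    val cleaned = dfDotted.toDF(
      dfDotted.columns.map(c => c.toLowerCase().replaceAll("\\.", "_") + "_new"): _*
    )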

    Source: https://kontext.tech/column/spark/527/scala-change-data-frame-column-names-in-spark


  • Original post: https://www.cnblogs.com/v5captain/p/14654996.html