• spark dataFrame withColumn


    说明:withColumn用于在原有DF新增一列

    1. 初始化sqlContext

    val sqlContext = new org.apache.spark.sql.SQLContext(sc) 

    2.导入sqlContext隐式转换

    import sqlContext.implicits._ 

    3.  创建DataFrames

    val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")

    4. 显示内容

    df.show()   

    | age|   name| 
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|

    5. 为原有df新加一列

    df.withColumn("id2", monotonically_increasing_id()+1) 

    6. 显示添加列后的内容

     res6.show() 

    +----+-------+---+
    | age|   name|id2|
    +----+-------+---+
    |null|Michael|  1|
    |  30|   Andy|  2|
    |  19| Justin|  3|
    +----+-------+---+

    完成的过程如下:

    scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc) 
    warning: there was one deprecation warning; re-run with -deprecation for details
    sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@2513155a
    scala> import sqlContext.implicits._
    import sqlContext.implicits._
    scala> val df = sqlContext.read.json("file:///usr/local/spark-2.3.0/examples/src/main/resources/people.json")
    2018-06-25 18:55:30 WARN  ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    2018-06-25 18:55:30 WARN  ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
    2018-06-25 18:55:32 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
    df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
    scala> df.show()
    +----+-------+
    | age|   name|
    +----+-------+
    |null|Michael|
    |  30|   Andy|
    |  19| Justin|
    +----+-------+
    scala> df.withColumn("id2", monotonically_increasing_id()+1)
    res6: org.apache.spark.sql.DataFrame = [age: bigint, name: string ... 1 more field]
    scala> res6.show()
    +----+-------+---+
    | age|   name|id2|
    +----+-------+---+
    |null|Michael|  1|
    |  30|   Andy|  2|
    |  19| Justin|  3|
    +----+-------+---+
  • 相关阅读:
    [adminitrative][archlinux][setfont] 设置console的字体大小
    [daily][archlinux][rsync] rsync
    [skill][msgpack] 初试msgpack库以及基本使用
    AWS之搭建深度学习主机
    AWS之SSH登录:使用 PuTTY 从 Windows 连接到 Linux 实例
    加拿大大学排名 by USNews
    Python多进程vs多线程
    Python之JSON使用
    Python之模块与包
    Android重打包+重新签名工具Apktool Box
  • 原文地址:https://www.cnblogs.com/abcdwxc/p/9225855.html
Copyright © 2020-2023  润新知