• Spark: casting all columns of a DataFrame to double



    1. Casting a single column

    import org.apache.spark.sql.types._
    val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
    val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
    
    import org.apache.spark.sql.functions._
    df.select(col("col1").cast(DoubleType)).show()
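    At the value level, `cast(DoubleType)` parses each string into a double, much like Scala's own `toDouble`. A minimal plain-Scala sketch of that conversion (no Spark required):

    ```scala
    // Each string column value is parsed into a Double, the same
    // conversion cast(DoubleType) applies row by row.
    val raw = Array("1", "2", "3", "4", "5")
    val doubles = raw.map(_.toDouble)
    println(doubles.mkString(","))  // 1.0,2.0,3.0,4.0,5.0
    ```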
    

    2. Casting every column in a loop

    val colNames = df.columns
    
    var df1 = df
    for (colName <- colNames) {
      df1 = df1.withColumn(colName, col(colName).cast(DoubleType))
    }
    df1.show()
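    The `var df1` loop is the classic accumulate-and-replace pattern, which idiomatic Scala expresses as a `foldLeft` (on a real DataFrame that would be `df.columns.foldLeft(df)((acc, c) => acc.withColumn(c, col(c).cast(DoubleType)))`). A plain-Scala sketch of the same pattern, using a hypothetical `row` Map of column name to string value so it runs without Spark:

    ```scala
    // Stand-in for one row: column name -> string value.
    val row: Map[String, Any] = Map("col1" -> "1", "col2" -> "2")

    // foldLeft threads the accumulator exactly like the var df1 loop:
    // start from the original value and replace one entry per step.
    val casted = row.keys.foldLeft(row) { (acc, k) =>
      acc.updated(k, acc(k).toString.toDouble)
    }
    println(casted("col1"))  // 1.0
    ```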
    

    3. Using `: _*`

    val cols = colNames.map(f => col(f).cast(DoubleType))
    df.select(cols: _*).show()
    
    +----+----+----+----+----+
    |col1|col2|col3|col4|col5|
    +----+----+----+----+----+
    | 1.0| 2.0| 3.0| 4.0| 5.0|
    | 6.0| 7.0| 8.0| 9.0|10.0|
    +----+----+----+----+----+
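    The `: _*` syntax itself is plain Scala: it tells the compiler to expand a sequence into a varargs parameter list, which is exactly what `df.select(cols: _*)` relies on. A minimal sketch with a hypothetical varargs function (no Spark required):

    ```scala
    // A varargs function: accepts any number of String arguments.
    def joinAll(parts: String*): String = parts.mkString("-")

    val names = Array("col1", "col2", "col3")
    // `: _*` expands the array into individual arguments.
    val joined = joinAll(names: _*)
    println(joined)  // col1-col2-col3
    ```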
    
    

    Selecting a specific set of columns and casting only those columns:

    val name = "col1,col3,col5"
    df.select(name.split(",").map(c => col(c)): _*).show()
    df.select(name.split(",").map(c => col(c).cast(DoubleType)): _*).show()
    
    +----+----+----+
    |col1|col3|col5|
    +----+----+----+
    |   1|   3|   5|
    |   6|   8|  10|
    +----+----+----+
    
    +----+----+----+
    |col1|col3|col5|
    +----+----+----+
    | 1.0| 3.0| 5.0|
    | 6.0| 8.0|10.0|
    +----+----+----+
    
    

    Complete code for the sections above:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.DataFrame
    
    object ChangeAllColDatatypes {
    
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ChangeAllColDatatypes").master("local").getOrCreate()
        val data = Array(("1", "2", "3", "4", "5"), ("6", "7", "8", "9", "10"))
        val df = spark.createDataFrame(data).toDF("col1", "col2", "col3", "col4", "col5")
    
        import org.apache.spark.sql.functions._
        df.select(col("col1").cast(DoubleType)).show()
    
        val colNames = df.columns
    
        var df1 = df
        for (colName <- colNames) {
          df1 = df1.withColumn(colName, col(colName).cast(DoubleType))
        }
        df1.show()
    
        val cols = colNames.map(f => col(f).cast(DoubleType))
        df.select(cols: _*).show()
        val name = "col1,col3,col5"
        df.select(name.split(",").map(c => col(c)): _*).show()
        df.select(name.split(",").map(c => col(c).cast(DoubleType)): _*).show()
    
      }
    }

    Original source of the above: 董可伦

  • Original article: https://www.cnblogs.com/aixing/p/13327350.html