• Big Data Learning: Spark SQL


    Official documentation: http://spark.apache.org/docs/1.6.2/sql-programming-guide.html

    val sc: SparkContext // An existing SparkContext.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val df = sqlContext.read.json("hdfs://mini1:9000/person.json")
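    1. Create a local file with three space-separated columns (id, name, age), then upload it to HDFS. The file is named person.json here, but its content is plain space-separated text, not JSON: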
    hdfs dfs -put person.json /
    
    2. In the spark shell, run the following command to read the data and split each line on the column separator:
    val lineRDD = sc.textFile("hdfs://mini1:9000/person.json").map(_.split(" ")) 

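    3. Define a case class (it plays the role of the table schema):
    case class Person(id: Int, name: String, age: Int)

    4. Map each split row of the RDD onto the case class:
    val personRDD = lineRDD.map(x => Person(x(0).toInt, x(1), x(2).toInt))

    5. Convert the RDD to a DataFrame:
    val personDF = personRDD.toDF

    6. Work with the DataFrame, for example print its rows:
    personDF.show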
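The mapping in steps 2 to 4 assumes every line has exactly three well-formed fields; a line with a missing column or a non-numeric id or age would make `.toInt` or the array access throw at runtime. As a minimal sketch of a more defensive variant (the `PersonParser` object and its `parse` method are hypothetical names introduced here, not part of the original post), the parsing can be isolated in plain Scala:

```scala
import scala.util.Try

// Same schema as the Person case class used in this post.
case class Person(id: Int, name: String, age: Int)

// Hypothetical helper, not from the original post.
object PersonParser {
  // Returns None for lines that do not have exactly three
  // whitespace-separated fields, or whose id/age are not integers.
  def parse(line: String): Option[Person] =
    line.trim.split("\\s+") match {
      case Array(id, name, age) =>
        Try(Person(id.toInt, name, age.toInt)).toOption
      case _ => None
    }
}
```

With a helper like this, the RDD could presumably be built as `sc.textFile("hdfs://mini1:9000/person.json").flatMap(line => PersonParser.parse(line))`, which silently drops malformed lines instead of failing the job. Whether dropping is the right policy depends on the data; counting or logging the rejected lines may be preferable.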

     DSL-style syntax
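The examples for this section did not survive in the text, so the following is only a sketch of what DSL-style calls look like against the Spark 1.6 DataFrame API. It assumes a local master and three in-memory rows standing in for the HDFS file; the object name `DslStyleSketch` is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Same schema as the Person case class used in this post.
case class Person(id: Int, name: String, age: Int)

// Hypothetical object name, used only for this illustration.
object DslStyleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dsl-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumption: in-memory rows instead of reading from HDFS.
    val personDF = sc.parallelize(Seq(
      Person(1, "zhangsan", 23),
      Person(2, "wangwu", 34),
      Person(3, "lisi", 43)
    )).toDF()

    // Print the schema inferred from the case class.
    personDF.printSchema()
    // Select columns by name.
    personDF.select("name", "age").show()
    // Select with a column expression: every age incremented by 1.
    personDF.select(personDF("name"), personDF("age") + 1).show()
    // Keep only rows whose age is greater than 30.
    personDF.filter(personDF("age") > 30).show()
    // Count the rows for each distinct age.
    personDF.groupBy("age").count().show()
    // Sort by age, descending.
    personDF.orderBy(personDF("age").desc).show()

    sc.stop()
  }
}
```

In the spark shell, `sc` and `sqlContext` already exist, so only the `personDF` calls would be needed there.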

     

     SQL-style syntax

    scala> val dataRDD=sc.textFile("hdfs://mini1:9000/person.json")
    dataRDD: org.apache.spark.rdd.RDD[String] = hdfs://mini1:9000/person.json MapPartitionsRDD[120] at textFile at <console>:27
    
    scala> case class Person(id:Int ,name: String, age: Int)
    defined class Person
    
    scala> val personDF=dataRDD.map(_.split(" ")).map(x=> Person(x(0).toInt,x(1),x(2).toInt)).toDF()
    scala>  personDF.registerTempTable("t_person")
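Once the DataFrame is registered as a temporary table, it can be queried with `sqlContext.sql`. The session above stops at the registration step, so the queries below are illustrative assumptions rather than the author's own; the sketch again uses a local master, in-memory rows, and a hypothetical object name `SqlStyleSketch`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Same schema as the Person case class used in this post.
case class Person(id: Int, name: String, age: Int)

// Hypothetical object name, used only for this illustration.
object SqlStyleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("sql-sketch").setMaster("local[1]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumption: in-memory rows instead of reading from HDFS.
    val personDF = sc.parallelize(Seq(
      Person(1, "zhangsan", 23),
      Person(2, "wangwu", 34),
      Person(3, "lisi", 43)
    )).toDF()

    // Expose the DataFrame to SQL under the name used in this post.
    personDF.registerTempTable("t_person")

    // The two oldest people.
    sqlContext.sql("select * from t_person order by age desc limit 2").show()
    // Projection combined with a filter.
    sqlContext.sql("select name, age from t_person where age > 30").show()
    // Simple aggregates over the whole table.
    sqlContext.sql("select count(*) as total, avg(age) as avg_age from t_person").show()

    sc.stop()
  }
}
```

In the spark shell session above, only the `sqlContext.sql(...)` lines would be needed, since `t_person` is already registered.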

    The same steps as a standalone application, SparkSqlTest:
    package org.apache.spark
    
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.{DataFrame, SQLContext}
    
    /**
      * Created by Administrator on 2019/6/12.
      */
    object SparkSqlTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("sparksql").setMaster("local[1]")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        val file: RDD[String] = sc.textFile("hdfs://mini1:9000/person.json")
        val personRDD = file.map(_.split(" ")).map(x => Person(x(0).toInt, x(1), x(2).toInt))
        import sqlContext.implicits._
        val personDF: DataFrame = personRDD.toDF()
        personDF.registerTempTable("t_person")
        sqlContext.sql("select * from t_person").show
    
      }
    }
    
    case class Person(id: Int, name: String, age: Int)

    Output:
    +---+--------+---+
    | id|    name|age|
    +---+--------+---+
    |  1|zhangsan| 23|
    |  2|  wangwu| 34|
    |  3|    lisi| 43|
    +---+--------+---+
  • Original article: https://www.cnblogs.com/feifeicui/p/11011787.html