• 2020年寒假假期总结0206


      spark-shell 交互式编程

      数据格式如下所示:
      Tom,DataBase,80
      Tom,Algorithm,50
      Tom,DataStructure,60
      Jim,DataBase,90
      Jim,Algorithm,60
      Jim,DataStructure,80

      请根据给定的实验数据,在 spark-shell 中通过编程来计算以下内容:
      (1) 该系总共有多少学生;
      (2) 该系共开设来多少门课程;
      (3) Tom 同学的总成绩平均分是多少;
      (4) 求每名同学的选修的课程门数;
      (5) 该系 DataBase 课程共有多少人选修;
      (6) 各门课程的平均分是多少;
      (7) 使用累加器计算共有多少人选了 DataBase 这门课。

    1var students=sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    var name=students.map(row=>row.split(",")(0))
    var namenumber=name.distinct()
    println(namenumber.count)

     

    2var lessons=sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    var lessonname=lessons.map(row=>row.split(",")(1))
    var lessonnumber=lessonname.distinct()
    println(lessonnumber.count)

    3)
    val students=sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    val Tom=students.filter(row=>row.split(",")(0)=="Tom")
    Tom.map(row=>(row.split(",")(0),row.split(",")(2).toInt)).mapValues(x=>(x,1)).reduceByKey((x,y)=>(x._1+y._1,x._2+y._2)).mapValues(x=>(x._1/x._2)).collect()

    4var temp = sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    var student = temp.map(row=>(row.split(",")(0),row.split(",")(1)))
    student.mapValues(x => (x,1)).reduceByKey((x,y) => (" ",x._2 + y._2)).mapValues(x =>x._2).foreach(println)

    5var temp = sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    var lesson = temp.filter(row=>row.split(",")(1)=="DataBase")
    println(lesson.count)

    6var temp = sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    var lesson = temp.map(row=>(row.split(",")(1),row.split(",")(2).toInt))
    lesson.mapValues(x=>(x,1)).reduceByKey((x,y)=> (x._1+y._1,x._2+ y._2)).mapValues(x=>(x._1/x._2)).collect()

    7var temp = sc.textFile("file:///home/hadoop/test/chapter5-data1.txt")
    var dateBase = temp.filter(row=>row.split(",")(1)=="DataBase").map(row=>(row.split(",")(1),1))
    var accum = sc.longAccumulator("longAccumulator")
    dateBase.values.foreach(x => accum.add(x))
    println(accum.value)

  • 相关阅读:
    centos7配置本地源给其他机器使用
    python 评委打分程序 (有输入纠错功能)
    【计网】MTU(Maximum Transmission Unit,MTU)
    【计网】集线器、交换机和路由器
    【Network】3.4.1 可靠传输的基本概念
    【Network】5.4 TCP的流量控制
    Live2DViewerEX 创意工坊文件解密
    本地分支reset至与任意远程分支同步
    Logstash日志解析的Grok模式详解
    ELK学习笔记
  • 原文地址:https://www.cnblogs.com/heiyang/p/12271573.html
Copyright © 2020-2023  润新知