• Spark learning, day 1: word count demo


    Dependencies:

    <properties>
      <scala.version>2.11.12</scala.version>
      <spark.version>2.3.0</spark.version>
    </properties>
    <dependencies>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
      </dependency>
    </dependencies>

    Code:

    package com.cslc
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext
    /**
      * Created by liuzhimin on 2019/5/28.
      */
    object word_count {
      def main(args: Array[String]): Unit = {
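        // No master URL is set here; it is expected to be supplied at launch (e.g. spark-submit --master)
        val conf = new SparkConf().setAppName("word count first scala")
        val sc = new SparkContext(conf)
        // Read the input file from HDFS as an RDD with one element per line
        val lines = sc.textFile("hdfs://cslcdip/user/dip/word.txt")
        // Split lines into words, pair each word with 1, sum the counts per word,
        // then bring the results back to the driver and print them
        lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_+_).collect().foreach(println)
        // Release the cluster resources held by this application
        sc.stop()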
      }
    } 

    Main functions:

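    In the snippets below, a is assumed to be the RDD of text lines (lines in the code above).

    Method 1: group by word, then sum the counts in each group

    a.flatMap(x=>x.split(" ")).map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).sum))

    Method 2: group by word, then combine the counts in each group with reduce

    a.flatMap(x=>x.split(" ")).map(x=>(x,1)).groupBy(x=>x._1).map(x=>(x._1,x._2.map(x=>x._2).reduce(_+_)))

    Method 3: reduceByKey (available on RDDs of key-value pairs only)

    a.flatMap(x=>x.split(" ")).map(x=>(x, 1)).reduceByKey(_+_).collect()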
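
The groupBy-based variants can also be tried without a cluster, since plain Scala collections offer the same flatMap, map and groupBy calls. The following is only a rough sketch under that assumption: the object name WordCountSketch, its helper functions and the sample input are hypothetical and not part of the original demo, and reduceByKey is left out because it exists only on Spark pair RDDs.

```scala
// Plain Scala collections only; no Spark dependency is needed to run this.
object WordCountSketch {
  // Split each line on single spaces and pair every word with a count of 1.
  def toPairs(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split(" ")).map(word => (word, 1))

  // Method 1 style: group the pairs by word, then sum the 1s in each group.
  def countWithSum(lines: Seq[String]): Map[String, Int] =
    toPairs(lines).groupBy(_._1).map { case (word, pairs) =>
      (word, pairs.map(_._2).sum)
    }

  // Method 2 style: same grouping, but combine each group's counts with reduce.
  def countWithReduce(lines: Seq[String]): Map[String, Int] =
    toPairs(lines).groupBy(_._1).map { case (word, pairs) =>
      (word, pairs.map(_._2).reduce(_ + _))
    }

  def main(args: Array[String]): Unit = {
    val sample = Seq("hello spark", "hello scala") // hypothetical input
    println(countWithSum(sample))
    println(countWithReduce(sample))
  }
}
```

Both functions should return the same map for the same input. On an actual RDD, reduceByKey is usually preferred over groupBy for this job, because it combines counts within each partition before shuffling data across the cluster.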

  • Original article: https://www.cnblogs.com/students/p/10956404.html