• Spark word count from a file (local mode)


    Read the words from a file and count how many times each word occurs.

    pom.xml:

    <properties>
            <scala.version>2.11.8</scala.version>
            <spark.version>2.2.0</spark.version>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>${spark.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.6.0</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.6.0</version>
            </dependency>
        </dependencies>
    
        <build>
            <sourceDirectory>src/main/scala</sourceDirectory>
            <testSourceDirectory>src/test/scala</testSourceDirectory>
            <plugins>
                <!-- Set the Java version -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.0</version>
                    <configuration>
                        <source>1.8</source>
                        <target>1.8</target>
                        <encoding>UTF-8</encoding>
                    </configuration>
                </plugin>
    
                <!-- Scala plugin: adds Scala compilation support to the build -->
                <plugin>
                    <groupId>net.alchim31.maven</groupId>
                    <artifactId>scala-maven-plugin</artifactId>
                    <version>3.2.0</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                            <configuration>
                                <args>
                                    <arg>-dependencyfile</arg>
                                    <arg>${project.build.directory}/.scala_dependencies</arg>
                                </args>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
    
                <!-- Bundle all dependency jars into a single jar -->
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-shade-plugin</artifactId>
                    <version>3.1.1</version>
                    <executions>
                        <execution>
                            <phase>package</phase>
                            <goals>
                                <goal>shade</goal>
                            </goals>
                            <configuration>
                                <!-- Exclude signature files under META-INF that may become invalid in the merged jar -->
                                <filters>
                                    <filter>
                                        <artifact>*:*</artifact>
                                        <excludes>
                                            <exclude>META-INF/*.SF</exclude>
                                            <exclude>META-INF/*.DSA</exclude>
                                            <exclude>META-INF/*.RSA</exclude>
                                        </excludes>
                                    </filter>
                                </filters>
                                <transformers>
                                    <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                        <mainClass>com.test.WordCount</mainClass>
                                    </transformer>
                                </transformers>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>

    WordCount:

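    // Package matches the mainClass (com.test.WordCount) declared in pom.xml
    package com.test

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Run locally with 2 worker threads
        val conf = new SparkConf().setMaster("local[2]").setAppName("wordcount")

        val sc = new SparkContext(conf)

        // Read the input; each element of rdd1 is one line of text
        val rdd1 = sc.textFile("data/wordcount")

        // Split every line into words on single spaces
        val rdd2 = rdd1.flatMap(item => item.split(" "))

        // Pair each word with an initial count of 1
        val rdd3 = rdd2.map(item => (item, 1))

        // Sum the counts of identical words
        val rdd4 = rdd3.reduceByKey((curr, agg) => curr + agg)

        // Bring the results back to the driver
        val result = rdd4.collect()

        // Print one (word, count) pair per line
        result.foreach(item => println(item))

        sc.stop()
      }
    }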
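To see what the flatMap, map, and reduceByKey steps compute without starting Spark, the same pipeline can be sketched on plain Scala collections. This is only an illustration: the object name `WordCountLocalSketch`, the helper `countWords`, and the sample lines are hypothetical and not part of the original post, and `groupBy` followed by a sum stands in for `reduceByKey`, which plain collections do not have.

```scala
object WordCountLocalSketch {
  // Hypothetical helper: counts words with plain Scala collections,
  // mirroring the RDD steps of the Spark job.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // same role as rdd1.flatMap
      .map(word => (word, 1))             // same role as rdd2.map
      .groupBy(pair => pair._1)           // group the pairs by word
      .map { case (word, pairs) =>        // sum per word, standing in for reduceByKey
        word -> pairs.map(_._2).sum
      }

  def main(args: Array[String]): Unit = {
    // Sample input invented for this sketch
    val lines = Seq("hello spark", "hello scala")
    // Sort by word so the printed order is stable
    countWords(lines).toSeq.sortBy(_._1).foreach(println)
    // prints:
    // (hello,2)
    // (scala,1)
    // (spark,1)
  }
}
```

The Spark version distributes the same logic across partitions: `reduceByKey` combines counts within each partition before shuffling, which a local `groupBy` does not need to do.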
    

     

  • Original article: https://www.cnblogs.com/chong-zuo3322/p/12910293.html