• Spark: Resolving IP Addresses with a Third-Party Dependency


    This uses an existing open-source project on GitHub.
    1) git clone https://github.com/wzhe06/ipdatabase.git

    2) Build the downloaded project: mvn clean package -DskipTests

    3) Install the jar into your local Maven repository

      mvn install:install-file -Dfile=${path_to_built_project}/target/ipdatabase-1.0-SNAPSHOT.jar -DgroupId=com.ggstar -DartifactId=ipdatabase -Dversion=1.0 -Dpackaging=jar

    4) Add the dependencies to the pom

    <dependency>
      <groupId>com.ggstar</groupId>
      <artifactId>ipdatabase</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi-ooxml</artifactId>
      <version>3.14</version>
    </dependency>
    <dependency>
      <groupId>org.apache.poi</groupId>
      <artifactId>poi</artifactId>
      <version>3.14</version>
    </dependency>

    5) Copy ipDatabase.csv and ipRegion.xlsx from main/resources in the ipdatabase source tree into the resources directory of the current project

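    6) IP resolution utility class

    // Package path as used by the ipdatabase project
    import com.ggstar.util.ip.IpHelper

    /**
      * IP resolution utility class
      */
    object IpUtils {

      // Return the looked-up region; a Unit return type would discard the result
      def getCity(ip: String): String = {
        IpHelper.findRegionByIp(ip)
      }
    }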
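
    Callers may want a fallback in case the lookup throws or returns an empty value for a malformed or unknown address (whether ipdatabase actually does either is not confirmed here). A minimal sketch under that assumption: `IpLookupSketch`, `safeCity`, and the stub are hypothetical names, and the `lookup` parameter stands in for the library call so the snippet runs without the ipdatabase jar.

    ```scala
    import scala.util.Try

    object IpLookupSketch {

      // Wrap any ip => region function; fall back to "unknown" when the
      // lookup throws, returns null, or returns an empty string.
      def safeCity(ip: String, lookup: String => String): String =
        Try(lookup(ip)).toOption
          .filter(region => region != null && region.nonEmpty)
          .getOrElse("unknown")

      def main(args: Array[String]): Unit = {
        // Hypothetical stub standing in for the real library lookup
        val stub: String => String = {
          case "58.30.15.255" => "Beijing"
          case _              => throw new IllegalArgumentException("no match")
        }
        println(safeCity("58.30.15.255", stub)) // prints "Beijing"
        println(safeCity("not-an-ip", stub))    // prints "unknown"
      }
    }
    ```

    In the real job, the library's lookup method would be passed as `lookup` in place of the stub.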

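    7) Package and run on YARN

    Mark the Scala and Spark dependencies as provided in the pom so they are left out of the packaged jar, since the cluster environment already supplies them.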

    <!-- Scala dependency -->
        <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-library</artifactId>
          <version>${scala.version}</version>
          <scope>provided</scope>
        </dependency>
        <!--SparkSQL-->
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-sql_2.11</artifactId>
          <version>${spark.version}</version>
          <scope>provided</scope>
        </dependency>
    
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-hive_2.11</artifactId>
          <version>${spark.version}</version>
          <scope>provided</scope>
        </dependency>

    When packaging, note that the following plugin must be added to pom.xml

    <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
              <archive>
                <manifest>
                  <mainClass></mainClass>
                </manifest>
              </archive>
              <descriptorRefs>
                <descriptorRef>jar-with-dependencies</descriptorRef>
              </descriptorRefs>
            </configuration>
          </plugin>

    Submit the job

    ./bin/spark-submit \
    --class com.rz.log.SparkstatcleanJobYARN \
    --name SparkstatcleanJobYARN \
    --master yarn \
    --executor-memory 1G \
    --num-executors 1 \
    --files /home/hadoop/lib/ipDatabase.csv,/home/hadoop/lib/ipRegion.xlsx \
    /home/hadoop/lib/sql-1.0-jar-with-dependencies.jar \
    hdfs://hadoop001:8020/imooc/input/* hdfs://hadoop001:8020/imooc/clean
  • Original article: https://www.cnblogs.com/RzCong/p/10612380.html