• Using third-party dependency jars in a Hadoop job



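    Suppose we have written a Hadoop MapReduce job that depends on many external jar files. When it runs on a Hadoop cluster, it sometimes fails with an exception saying that a particular class cannot be found. This almost always means that, while the job was executing, the corresponding jar file (in practice, its unjarred directory containing the class files) could not be found in the execution context. The natural fix, then, is to configure the classpath correctly so that the MapReduce job can find these classes at runtime.
    There are two good ways to do this. The first is to set HADOOP_CLASSPATH and pass the job's dependency jars with the -libjars option. When HADOOP_CLASSPATH is set as a prefix of the launch command, it applies only to that one command and does not linger in the shell after the job finishes. Keep in mind that HADOOP_CLASSPATH set this way only affects the JVM that the hadoop script starts on the client machine (where the driver's main method runs); it is -libjars that ships the jars to the map and reduce tasks on the cluster. The second way is to package the dependency jars together with the job code into a single jar at build time. This can make the final jar fairly large, but combined with a build tool such as Maven it lets each job keep its own dependency configuration, which is easy to manage. Both approaches are described below.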
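To make the class-not-found failure described above concrete, here is a minimal sketch using only the JDK. ClasspathProbe is a hypothetical helper, not part of Hadoop: it tries to resolve a class through a URLClassLoader that sees only the jars you pass in (plus JDK core classes). If the jar holding the class is absent, Class.forName throws ClassNotFoundException, the same kind of error a task reports when a dependency jar never reached its classpath.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: can className be resolved from exactly these jars (plus JDK core classes)? */
public class ClasspathProbe {

    public static boolean isLoadable(String className, List<File> jars) {
        URL[] urls = new URL[jars.size()];
        for (int i = 0; i < jars.size(); i++) {
            try {
                urls[i] = jars.get(i).toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad jar path: " + jars.get(i), e);
            }
        }
        // Parent = null means the bootstrap loader: only JDK core classes and the given jars are
        // visible, roughly like a JVM whose classpath is missing the job's third-party jars.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            // initialize = false: locate and define the class without running static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the class itself is not in any of the jars
        } catch (LinkageError e) {
            return false; // the class was found, but something it needs (e.g. its superclass) was not
        } catch (IOException e) {
            throw new UncheckedIOException(e); // closing the loader failed
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java ClasspathProbe <className> [jar ...]");
            System.exit(1);
        }
        List<File> jars = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            jars.add(new File(args[i]));
        }
        System.out.println(isLoadable(args[0], jars));
    }
}
```

For example, java ClasspathProbe org.apache.zookeeper.ZooKeeper /opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar should print true if the jar exists at that path (the path is taken from the environment used later in this article), and false when the jar argument is omitted.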

    Setting HADOOP_CLASSPATH

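    For example, suppose we have an application that operates on tables in an HBase database. The HBase client also needs ZooKeeper, so the locations of both sets of jar files must be configured correctly so that the job can find and load them at runtime.
    Hadoop provides a helper class, org.apache.hadoop.util.GenericOptionsParser, which parses generic command-line options such as -libjars and makes it easy to add the specified files to the job's classpath.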
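To illustrate what the driver relies on GenericOptionsParser for, here is a toy stand-in. LibjarsArgs is hypothetical and is not how GenericOptionsParser is implemented (the real class also handles -conf, -D, -files and more, and records the jars in the job configuration); it only shows that a leading "-libjars a.jar,b.jar" is consumed as a generic option and that the job sees only its own arguments, much like getRemainingArgs(). As an assumption of this sketch, generic options are accepted only before the job's own arguments, which matches the order used in the launch command later in this article.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; not GenericOptionsParser's real implementation. */
public class LibjarsArgs {

    public final List<String> libjars = new ArrayList<>();
    public final List<String> remaining = new ArrayList<>();

    public static LibjarsArgs parse(String[] args) {
        LibjarsArgs result = new LibjarsArgs();
        int i = 0;
        // Generic options come first; stop at the first argument that is not -libjars
        while (i < args.length && "-libjars".equals(args[i])) {
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("-libjars requires a comma-separated list of jars");
            }
            for (String jar : args[i + 1].split(",")) {
                if (!jar.trim().isEmpty()) {
                    result.libjars.add(jar.trim());
                }
            }
            i += 2;
        }
        // Everything after the generic options is passed through to the job unchanged
        for (; i < args.length; i++) {
            result.remaining.add(args[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        LibjarsArgs parsed = parse(new String[] {
            "-libjars", "hbase-0.94.1.jar,zookeeper-3.4.3.jar",
            "t_sub_domains", "/user/xiaoxiang/datasets/domains/"});
        System.out.println("libjars   = " + parsed.libjars);
        System.out.println("remaining = " + parsed.remaining);
    }
}
```

Running main prints libjars = [hbase-0.94.1.jar, zookeeper-3.4.3.jar] and remaining = [t_sub_domains, /user/xiaoxiang/datasets/domains/], i.e. exactly the two arguments the driver below expects.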
    Below is the entry-point class of our example:

    package org.shirdrn.kodz.inaction.hbase.job.importing;
     
    import java.io.IOException;
    import java.net.URISyntaxException;
     
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;
     
    /**
    * Table DDL: create 't_sub_domains', 'cf_basic', 'cf_status'
    * <pre>
    * cf_basic:domain cf_basic:len
    * cf_status:status cf_status:live
    * </pre>
    *
    * @author shirdrn
    */
    public class DataImporter {
     
         public static void main(String[] args)
                   throws IOException, InterruptedException, ClassNotFoundException, URISyntaxException {
              
              Configuration conf = HBaseConfiguration.create();
              String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
              
              // Generic options such as -libjars were consumed by GenericOptionsParser above;
              // exactly <tableName> and <input> must remain
              if (otherArgs.length != 2) {
                   System.err.println("Usage: DataImporter -libjars <jar1>[,<jar2>...[,<jarN>]] <tableName> <input>");
                   System.exit(1);
              }
              String tableName = otherArgs[0].trim();
              String input = otherArgs[1].trim();
              
              // set table columns
              conf.set("table.cf.family", "cf_basic");
              conf.set("table.cf.qualifier.fqdn", "domain");
              conf.set("table.cf.qualifier.timestamp", "create_at");
                        
              Job job = new Job(conf, "Import into HBase table");
              job.setJarByClass(DataImporter.class);
              job.setMapperClass(ImportFileLinesMapper.class);
              job.setOutputFormatClass(TableOutputFormat.class);
              
              job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
              job.setOutputKeyClass(ImmutableBytesWritable.class);
              job.setOutputValueClass(Put.class);
              
              job.setNumReduceTasks(0);
              
              FileInputFormat.addInputPath(job, new Path(input));
              
              int exitCode = job.waitForCompletion(true) ? 0 : 1;
              System.exit(exitCode);
         }
     
    }

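    Because main() passes its arguments through GenericOptionsParser, the job accepts a -libjars option listing the third-party jars it depends on. Usage is as follows: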

    • Step 1: set environment variables

    Add the following to ~/.bashrc:

    export HADOOP_HOME=/opt/stone/cloud/hadoop-1.0.3
    export PATH=$PATH:$HADOOP_HOME/bin
    export HBASE_HOME=/opt/stone/cloud/hbase-0.94.1
    export PATH=$PATH:$HBASE_HOME/bin
    export ZK_HOME=/opt/stone/cloud/zookeeper-3.4.3

    Don't forget to make the new settings take effect in the current shell (". ~/.bashrc" is equivalent):

    source ~/.bashrc

    These variables make it easy to refer to the external jar files.

    • Step 2: determine the list of jars the job depends on

    As mentioned above, using HBase requires the HBase and ZooKeeper jars. They are put on HADOOP_CLASSPATH as follows:

    HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.1.jar:$ZK_HOME/zookeeper-3.4.3.jar ./bin/hadoop jar import-into-hbase.jar

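    Prefixing the command with HADOOP_CLASSPATH=... sets the variable only for that single hadoop invocation, so there is no need to put it in .bashrc. It is needed here because the driver class itself references HBase classes such as HBaseConfiguration and TableOutputFormat, so the client JVM must be able to load them before the job is even submitted.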
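The same per-command scoping can be seen from Java. The sketch below, using only java.lang.ProcessBuilder, launches the hadoop script with HADOOP_CLASSPATH set in the child's environment only, like the "HADOOP_CLASSPATH=... ./bin/hadoop jar ..." prefix form. HadoopLauncher is a hypothetical name; the ./bin/hadoop path and jar names are taken from the commands in this article and are assumptions to adjust for your installation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher: sets HADOOP_CLASSPATH for one child process only. */
public class HadoopLauncher {

    public static ProcessBuilder build(String hadoopBin, String hadoopClasspath, String... jarArgs) {
        List<String> command = new ArrayList<>();
        command.add(hadoopBin);
        command.add("jar");
        command.addAll(Arrays.asList(jarArgs));
        ProcessBuilder pb = new ProcessBuilder(command);
        // Modifies the child's copy of the environment; this JVM's own environment is unchanged
        pb.environment().put("HADOOP_CLASSPATH", hadoopClasspath);
        pb.inheritIO(); // show the job's log output in this console
        return pb;
    }

    public static void main(String[] args) throws Exception {
        String hbaseHome = System.getenv("HBASE_HOME");
        String zkHome = System.getenv("ZK_HOME");
        if (hbaseHome == null || zkHome == null) {
            System.err.println("HBASE_HOME and ZK_HOME must be set (see Step 1)");
            System.exit(1);
        }
        // args, e.g.: import-into-hbase.jar <mainClass> -libjars <jars> <tableName> <input>
        ProcessBuilder pb = build("./bin/hadoop",
                hbaseHome + "/hbase-0.94.1.jar:" + zkHome + "/zookeeper-3.4.3.jar",
                args);
        System.exit(pb.start().waitFor());
    }
}
```

The design point is that ProcessBuilder.environment() returns a per-builder copy, so nothing leaks into the parent process, which is the same reason the shell prefix form never needs cleaning up.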

    • Step 3: run the job

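    To run the job, set HADOOP_CLASSPATH on the command line and use the -libjars option to specify the third-party jars this job depends on. HADOOP_CLASSPATH makes the jars visible to the driver on the client, while -libjars has them distributed to the tasks running on the cluster. The launch command is: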

    xiaoxiang@ubuntu3:~/hadoop$ HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.94.1.jar:$ZK_HOME/zookeeper-3.4.3.jar ./bin/hadoop jar import-into-hbase.jar org.shirdrn.kodz.inaction.hbase.job.importing.ImportDataDriver -libjars $HBASE_HOME/hbase-0.94.1.jar,$HBASE_HOME/lib/protobuf-java-2.4.0a.jar,$ZK_HOME/zookeeper-3.4.3.jar t_sub_domains /user/xiaoxiang/datasets/domains/

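    Note that entries in the environment variable are separated by colons, while entries in the -libjars option are separated by commas. Also note that the main class in this command (ImportDataDriver) does not match the DataImporter listing above; pass the fully qualified name of the driver class that is actually in your jar.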
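Since only the separator differs, a small helper can derive a -libjars value from a HADOOP_CLASSPATH-style string. This is a hypothetical convenience sketch that assumes every entry is a plain jar path (directory or wildcard entries would need different handling). The two lists need not be identical: in the command above, -libjars also carries protobuf-java-2.4.0a.jar, which is not on HADOOP_CLASSPATH.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: colon-separated classpath -> comma-separated -libjars value. */
public class LibjarsFormat {

    public static String toLibjars(String hadoopClasspath) {
        List<String> jars = new ArrayList<>();
        for (String entry : hadoopClasspath.split(":")) {
            // Skip empty entries produced by leading or doubled colons
            if (!entry.isEmpty()) {
                jars.add(entry);
            }
        }
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        System.out.println(toLibjars(
            "/opt/stone/cloud/hbase-0.94.1/hbase-0.94.1.jar:/opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar"));
    }
}
```

Running main prints /opt/stone/cloud/hbase-0.94.1/hbase-0.94.1.jar,/opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar.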

    With this, the job runs correctly.
    Here is the output of the job run:

    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:host.name=ubuntu3
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.version=1.6.0_30
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Sun Microsystems Inc.
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.6.0_30/jre
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/stone/cloud/hadoop-1.0.3/libexec/../conf:/usr/java/jdk1.6.0_30/lib/tools.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/..:/opt/stone/cloud/hadoop-1.0.3/libexec/../hadoop-core-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/asm-3.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/aspectjrt-1.6.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/aspectjtools-1.6.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-beanutils-1.7.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-beanutils-core-1.8.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-cli-1.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-codec-1.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-collections-3.2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-configuration-1.6.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-daemon-1.0.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-digester-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-el-1.0.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-httpclient-3.0.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-io-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-lang-2.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-logging-1.1.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-logging-api-1.0.4.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-math-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/commons-net-1.4.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/core-3.1.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-capacity-scheduler-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-datajoin-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-fairscheduler-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hadoop-thriftfs-1.0.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/hsqldb-1.8.0.10.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jackson-core-asl-1.8.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jackson-mapper-asl-1.8.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jasper-compiler-5.5.12.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jasper-runtime-5.5.12.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jdeb-0.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-core-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-json-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jersey-server-1.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jets3t-0.6.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jetty-6.1.26.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jetty-util-6.1.26.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsch-0.1.42.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/junit-4.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/kfs-0.2.2.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/log4j-1.2.15.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/mockito-all-1.8.5.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/oro-2.0.8.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/protobuf-java-2.4.0a.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/servlet-api-2.5-20081211.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/slf4j-api-1.4.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/slf4j-log4j12-1.4.3.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/xmlenc-0.52.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsp-2.1/jsp-2.1.jar:/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/jsp-2.1/jsp-api-2.1.jar:/opt/stone/cloud/hbase-0.94.1/hbase-0.94.1.jar:/opt/stone/cloud/zookeeper-3.4.3/zookeeper-3.4.3.jar
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/stone/cloud/hadoop-1.0.3/libexec/../lib/native/Linux-amd64-64
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:os.version=3.0.0-12-server
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.name=xiaoxiang
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/xiaoxiang
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/stone/cloud/hadoop-1.0.3
    13/04/10 22:03:32 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ubuntu3:2222 sessionTimeout=180000 watcher=hconnection
    13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Opening socket connection to server /172.0.8.252:2222
    13/04/10 22:03:32 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 17561@ubuntu3
    13/04/10 22:03:32 WARN client.ZooKeeperSaslClient: SecurityException: java.lang.SecurityException: Unable to locate a login configuration occurred when trying to find JAAS configuration.
    13/04/10 22:03:32 INFO client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client' could not be found. If you are not using SASL, you may ignore this. On the other hand, if you expected SASL to work, please fix your JAAS configuration.
    13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Socket connection established to ubuntu3/172.0.8.252:2222, initiating session
    13/04/10 22:03:32 INFO zookeeper.ClientCnxn: Session establishment complete on server ubuntu3/172.0.8.252:2222, sessionid = 0x13decd0f3960042, negotiated timeout = 180000
    13/04/10 22:03:32 INFO mapreduce.TableOutputFormat: Created table instance for t_sub_domains
    13/04/10 22:03:32 INFO input.FileInputFormat: Total input paths to process : 1
    13/04/10 22:03:32 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/04/10 22:03:32 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/04/10 22:03:32 INFO mapred.JobClient: Running job: job_201303302227_0034
    13/04/10 22:03:33 INFO mapred.JobClient:  map 0% reduce 0%
    13/04/10 22:03:50 INFO mapred.JobClient:  map 2% reduce 0%
    13/04/10 22:03:53 INFO mapred.JobClient:  map 3% reduce 0%
    13/04/10 22:03:56 INFO mapred.JobClient:  map 4% reduce 0%
    13/04/10 22:03:59 INFO mapred.JobClient:  map 6% reduce 0%
    13/04/10 22:04:03 INFO mapred.JobClient:  map 7% reduce 0%
    13/04/10 22:04:06 INFO mapred.JobClient:  map 8% reduce 0%
    13/04/10 22:04:09 INFO mapred.JobClient:  map 10% reduce 0%
    13/04/10 22:04:15 INFO mapred.JobClient:  map 12% reduce 0%
    13/04/10 22:04:18 INFO mapred.JobClient:  map 13% reduce 0%
    13/04/10 22:04:21 INFO mapred.JobClient:  map 14% reduce 0%
    13/04/10 22:04:24 INFO mapred.JobClient:  map 15% reduce 0%
    13/04/10 22:04:27 INFO mapred.JobClient:  map 17% reduce 0%
    13/04/10 22:04:33 INFO mapred.JobClient:  map 18% reduce 0%
    13/04/10 22:04:36 INFO mapred.JobClient:  map 19% reduce 0%
    13/04/10 22:04:39 INFO mapred.JobClient:  map 20% reduce 0%
    13/04/10 22:04:42 INFO mapred.JobClient:  map 21% reduce 0%
    13/04/10 22:04:45 INFO mapred.JobClient:  map 23% reduce 0%
    13/04/10 22:04:48 INFO mapred.JobClient:  map 24% reduce 0%
    13/04/10 22:04:51 INFO mapred.JobClient:  map 25% reduce 0%
    13/04/10 22:04:54 INFO mapred.JobClient:  map 27% reduce 0%
    13/04/10 22:04:57 INFO mapred.JobClient:  map 28% reduce 0%
    13/04/10 22:05:00 INFO mapred.JobClient:  map 29% reduce 0%
    13/04/10 22:05:03 INFO mapred.JobClient:  map 31% reduce 0%
    13/04/10 22:05:06 INFO mapred.JobClient:  map 32% reduce 0%
    13/04/10 22:05:09 INFO mapred.JobClient:  map 33% reduce 0%
    13/04/10 22:05:12 INFO mapred.JobClient:  map 34% reduce 0%
    13/04/10 22:05:15 INFO mapred.JobClient:  map 35% reduce 0%
    13/04/10 22:05:18 INFO mapred.JobClient:  map 37% reduce 0%
    13/04/10 22:05:21 INFO mapred.JobClient:  map 38% reduce 0%
    13/04/10 22:05:24 INFO mapred.JobClient:  map 39% reduce 0%
    13/04/10 22:05:27 INFO mapred.JobClient:  map 41% reduce 0%
    13/04/10 22:05:30 INFO mapred.JobClient:  map 42% reduce 0%
    13/04/10 22:05:33 INFO mapred.JobClient:  map 43% reduce 0%
    13/04/10 22:05:36 INFO mapred.JobClient:  map 44% reduce 0%
    13/04/10 22:05:39 INFO mapred.JobClient:  map 46% reduce 0%
    13/04/10 22:05:42 INFO mapred.JobClient:  map 47% reduce 0%
    13/04/10 22:05:45 INFO mapred.JobClient:  map 48% reduce 0%
    13/04/10 22:05:48 INFO mapred.JobClient:  map 50% reduce 0%
    13/04/10 22:05:54 INFO mapred.JobClient:  map 52% reduce 0%
    13/04/10 22:05:57 INFO mapred.JobClient:  map 53% reduce 0%
    13/04/10 22:06:00 INFO mapred.JobClient:  map 54% reduce 0%
    13/04/10 22:06:03 INFO mapred.JobClient:  map 55% reduce 0%
    13/04/10 22:06:06 INFO mapred.JobClient:  map 57% reduce 0%
    13/04/10 22:06:12 INFO mapred.JobClient:  map 59% reduce 0%
    13/04/10 22:06:15 INFO mapred.JobClient:  map 60% reduce 0%
    13/04/10 22:06:18 INFO mapred.JobClient:  map 61% reduce 0%
    13/04/10 22:06:21 INFO mapred.JobClient:  map 62% reduce 0%
    13/04/10 22:06:24 INFO mapred.JobClient:  map 63% reduce 0%
    13/04/10 22:06:27 INFO mapred.JobClient:  map 64% reduce 0%
    13/04/10 22:06:30 INFO mapred.JobClient:  map 66% reduce 0%
    13/04/10 22:06:33 INFO mapred.JobClient:  map 67% reduce 0%
    13/04/10 22:06:36 INFO mapred.JobClient:  map 68% reduce 0%
    13/04/10 22:06:42 INFO mapred.JobClient:  map 69% reduce 0%
    13/04/10 22:06:45 INFO mapred.JobClient:  map 70% reduce 0%
    13/04/10 22:06:48 INFO mapred.JobClient:  map 71% reduce 0%
    13/04/10 22:06:51 INFO mapred.JobClient:  map 73% reduce 0%
    13/04/10 22:06:54 INFO mapred.JobClient:  map 74% reduce 0%
    13/04/10 22:06:57 INFO mapred.JobClient:  map 75% reduce 0%
    13/04/10 22:07:00 INFO mapred.JobClient:  map 77% reduce 0%
    13/04/10 22:07:03 INFO mapred.JobClient:  map 78% reduce 0%
    13/04/10 22:07:12 INFO mapred.JobClient:  map 79% reduce 0%
    13/04/10 22:07:18 INFO mapred.JobClient:  map 80% reduce 0%
    13/04/10 22:07:24 INFO mapred.JobClient:  map 81% reduce 0%
    13/04/10 22:07:30 INFO mapred.JobClient:  map 82% reduce 0%
    13/04/10 22:07:36 INFO mapred.JobClient:  map 83% reduce 0%
    13/04/10 22:07:48 INFO mapred.JobClient:  map 84% reduce 0%
    13/04/10 22:07:51 INFO mapred.JobClient:  map 85% reduce 0%
    13/04/10 22:07:59 INFO mapred.JobClient:  map 86% reduce 0%
    13/04/10 22:08:05 INFO mapred.JobClient:  map 87% reduce 0%
    13/04/10 22:08:11 INFO mapred.JobClient:  map 88% reduce 0%
    13/04/10 22:08:17 INFO mapred.JobClient:  map 89% reduce 0%
    13/04/10 22:08:23 INFO mapred.JobClient:  map 90% reduce 0%
    13/04/10 22:08:29 INFO mapred.JobClient:  map 91% reduce 0%
    13/04/10 22:08:35 INFO mapred.JobClient:  map 92% reduce 0%
    13/04/10 22:08:41 INFO mapred.JobClient:  map 93% reduce 0%
    13/04/10 22:08:47 INFO mapred.JobClient:  map 94% reduce 0%
    13/04/10 22:08:53 INFO mapred.JobClient:  map 95% reduce 0%
    13/04/10 22:08:59 INFO mapred.JobClient:  map 96% reduce 0%
    13/04/10 22:09:05 INFO mapred.JobClient:  map 97% reduce 0%
    13/04/10 22:09:11 INFO mapred.JobClient:  map 98% reduce 0%
    13/04/10 22:09:17 INFO mapred.JobClient:  map 99% reduce 0%
    13/04/10 22:09:23 INFO mapred.JobClient:  map 100% reduce 0%
    13/04/10 22:09:31 INFO mapred.JobClient: Job complete: job_201303302227_0034
    13/04/10 22:09:31 INFO mapred.JobClient: Counters: 18
    13/04/10 22:09:31 INFO mapred.JobClient:   Job Counters
    13/04/10 22:09:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=550605
    13/04/10 22:09:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/04/10 22:09:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/04/10 22:09:31 INFO mapred.JobClient:     Launched map tasks=2
    13/04/10 22:09:31 INFO mapred.JobClient:     Data-local map tasks=2
    13/04/10 22:09:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    13/04/10 22:09:31 INFO mapred.JobClient:   File Output Format Counters
    13/04/10 22:09:31 INFO mapred.JobClient:     Bytes Written=0
    13/04/10 22:09:31 INFO mapred.JobClient:   FileSystemCounters
    13/04/10 22:09:31 INFO mapred.JobClient:     HDFS_BYTES_READ=104394990
    13/04/10 22:09:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64078
    13/04/10 22:09:31 INFO mapred.JobClient:   File Input Format Counters
    13/04/10 22:09:31 INFO mapred.JobClient:     Bytes Read=104394710
    13/04/10 22:09:31 INFO mapred.JobClient:   Map-Reduce Framework
    13/04/10 22:09:31 INFO mapred.JobClient:     Map input records=4995670
    13/04/10 22:09:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=279134208
    13/04/10 22:09:31 INFO mapred.JobClient:     Spilled Records=0
    13/04/10 22:09:31 INFO mapred.JobClient:     CPU time spent (ms)=129130
    13/04/10 22:09:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=202833920
    13/04/10 22:09:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1170251776
    13/04/10 22:09:31 INFO mapred.JobClient:     Map output records=4995670
    13/04/10 22:09:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=280

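    The java.class.path entry in the log shows that, besides the jars under the directory given by HADOOP_HOME (including its lib directory), the client JVM's classpath ends with hbase-0.94.1.jar and zookeeper-3.4.3.jar, which came from HADOOP_CLASSPATH. The jars listed with -libjars are shipped to the task nodes rather than added to this client-side classpath; the map tasks completing successfully (Map output records=4995670) indicates that the tasks were able to load them.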
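To check this yourself, here is a sketch that splits a java.class.path value on colons and keeps only the entries that do not start with any of the given directory prefixes. ClasspathDiff is a hypothetical helper; applied to the full classpath in the log above with the prefixes /opt/stone/cloud/hadoop-1.0.3 and /usr/java, it leaves just the HBase and ZooKeeper jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: classpath entries that are not under any of the given directory prefixes. */
public class ClasspathDiff {

    public static List<String> entriesOutside(String classPath, String... prefixes) {
        List<String> result = new ArrayList<>();
        for (String entry : classPath.split(":")) { // Unix-style separator, as in the log
            if (entry.isEmpty()) {
                continue;
            }
            boolean covered = false;
            for (String prefix : prefixes) {
                // Plain string prefix match; a sibling directory sharing the prefix would also match
                if (entry.startsWith(prefix)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java ClasspathDiff <classPath> [prefix ...]");
            System.exit(1);
        }
        // args[0]: a classpath string, e.g. copied from the log's java.class.path= value
        String[] prefixes = Arrays.copyOfRange(args, 1, args.length);
        for (String entry : entriesOutside(args[0], prefixes)) {
            System.out.println(entry);
        }
    }
}
```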

    Packaging the job code and its dependency jars together

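    I prefer this approach. First, it takes advantage of Maven's strengths, such as dependency management and automated builds. Second, other developers or operators who want to use the job don't need to worry about extra configuration: building according to the Maven build rules produces the final deployable file, which also reduces common problems when running the job (such as an incorrect CLASSPATH).
    With the following Maven plugin configuration, running mvn package does all of this: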

    <build>
         <plugins>
              <plugin>
                   <artifactId>maven-assembly-plugin</artifactId>
                   <configuration>
                        <archive>
                             <manifest>
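                                  <!-- Driver class from a different (Solr indexing) project; replace with your own job's main class -->
                                  <mainClass>org.shirdrn.solr.cloud.index.hadoop.SolrCloudIndexer</mainClass>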
                             </manifest>
                        </archive>
                        <descriptorRefs>
                             <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                   </configuration>
                   <executions>
                        <execution>
                             <id>make-assembly</id>
                             <phase>package</phase>
                             <goals>
                                  <goal>single</goal>
                             </goals>
                        </execution>
                   </executions>
              </plugin>
         </plugins>
    </build>

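    The resulting jar is placed in the target directory, with a name like solr-platform-2.0-jar-with-dependencies.jar. You can then copy this file to the desired directory and submit it to the Hadoop cluster to run.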
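Before submitting, it can be worth confirming that the assembled jar really contains the dependency classes. Below is a minimal sketch using only java.util.jar.JarFile. FatJarCheck is a hypothetical helper, and it relies on the fact that the jar-with-dependencies descriptor unpacks dependency classes into the jar rather than nesting the dependency jars inside it.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: does a jar contain the .class entry for a given binary class name? */
public class FatJarCheck {

    public static boolean containsClass(String jarPath, String className) throws IOException {
        // org.apache.zookeeper.ZooKeeper -> org/apache/zookeeper/ZooKeeper.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: java FatJarCheck <jar> <className> [className ...]");
            System.exit(1);
        }
        for (int i = 1; i < args.length; i++) {
            boolean found = containsClass(args[0], args[i]);
            System.out.println(args[i] + " -> " + (found ? "bundled" : "MISSING"));
        }
    }
}
```

For example, java FatJarCheck target/solr-platform-2.0-jar-with-dependencies.jar org.apache.zookeeper.ZooKeeper reports whether the ZooKeeper client class made it into the assembled jar.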

  • Original article: https://www.cnblogs.com/zzwx/p/8820206.html
Copyright © 2020-2023  润新知