• Getting started with Hadoop installation and configuration


    Notice:
          Author: Gong Xijun (龚细军)
          Date: 17-08-01
          Type: notes
          When reposting, please credit the source and include the corresponding link.
          Link: http://www.cnblogs.com/gongxijun/p/5726024.html

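    Everything recorded in these notes comes from hands-on operation. The Hadoop version used is hadoop-2.7.2 and the operating system is kylin-linux.

    Prerequisites: a JDK is already installed, and Hadoop has been downloaded and extracted.

    1. After downloading and extracting Hadoop

    Change into the installation directory; it contains the following folders and files: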

    /hadoop-2.7.2$ ls
    bin  include  lib      LICENSE.txt  NOTICE.txt  README.txt  share
    etc  input    libexec  logs         output      sbin        wc-in

    A brief overview:

    bin directory: holds Hadoop's command-line programs, such as hadoop, hdfs, yarn and mapred. This directory is important.

    They can be used like this:

    /hadoop-2.7.2$ bin/hadoop dfs -cat output/* |more

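    include directory: header files for C/C++ development

    lib directory: libraries, including the C/C++ development libraries

    etc directory: the configuration files; some other versions use a conf directory instead. Inside etc/hadoop you will see: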

    /hadoop-2.7.2/etc/hadoop$ ls | grep .xml
    capacity-scheduler.xml
    core-site.xml
    hadoop-policy.xml
    hdfs-site.xml
    hdfs-site.xml~
    httpfs-site.xml
    kms-acls.xml
    kms-site.xml
    mapred-queues.xml.template
    mapred-site.xml.template
    ssl-client.xml.example
    ssl-server.xml.example
    yarn-site.xml

    Pseudo-distributed configuration

    1. Configure core-site.xml

        <configuration>
            <property>
                <name>fs.default.name</name>
                <value>hdfs://localhost:9000</value>
            </property>
        </configuration>
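    The value hdfs://localhost:9000 is an ordinary URI: the scheme selects the file system and the rest gives the host and port of the NameNode. In Hadoop 2.x the current name of this property is fs.defaultFS; fs.default.name is the older name. As a rough illustration only, the following plain-Java sketch splits such a value with java.net.URI. The class name DefaultFsUri and the method describe are hypothetical and are not part of Hadoop.

```java
import java.net.URI;

// Hypothetical helper (not part of Hadoop): splits a default-filesystem URI into its parts.
class DefaultFsUri {

    static String describe(String value) {
        URI uri = URI.create(value);
        return "scheme=" + uri.getScheme() + " host=" + uri.getHost() + " port=" + uri.getPort();
    }

    public static void main(String[] args) {
        // Prints: scheme=hdfs host=localhost port=9000
        System.out.println(describe("hdfs://localhost:9000"));
    }
}
```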

    2. Configure hdfs-site.xml

        <configuration>
            <property>
                <name>dfs.replication</name>
                <value>1</value>
            </property>
            <property>
                <name>dfs.name.dir</name>
                <value>/home/gongxijun/HDFS/fileinput</value>
            </property>
            <property>
                <name>dfs.data.dir</name>
                <value>/home/gongxijun/HDFS/fileoutput</value>
            </property>
            <property>
                <name>dfs.permissions</name>
                <value>false</value>
                <description>
                    If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
                </description>
            </property>
        </configuration>
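    All of these files share one shape: a configuration element holding property elements, each with a name and a value. To make that structure concrete, here is a minimal sketch that reads such a document into a map using only the JDK's XML parser. It is an illustration under assumptions, not Hadoop's own Configuration class; the names SiteXmlReader and read are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not Hadoop's Configuration class): reads the
// <property><name>/<value> pairs of a *-site.xml document into a map.
class SiteXmlReader {

    static Map<String, String> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // LinkedHashMap keeps the properties in document order.
            Map<String, String> properties = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                properties.put(name, value);
            }
            return properties;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable configuration document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>dfs.replication</name><value>1</value></property>"
                + "<property><name>dfs.permissions</name><value>false</value></property>"
                + "</configuration>";
        // Prints: {dfs.replication=1, dfs.permissions=false}
        System.out.println(read(xml));
    }
}
```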

    3. Configure mapred-site.xml: copy mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml as follows

        <configuration>
            <property>
                <name>mapred.job.tracker</name>
                <value>localhost:9001</value>
            </property>
        </configuration>
        

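    After that, start Hadoop by running ./start-all.sh (the script lives in the sbin directory). On a fresh installation the NameNode has to be formatted once beforehand with bin/hdfs namenode -format.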
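    One simple way to see whether the daemons came up is to test whether the configured port accepts TCP connections. The sketch below assumes the NameNode address localhost:9000 from core-site.xml above; the class PortCheck and the method isListening are hypothetical names, and a successful connection only shows that something is listening on the port, not that HDFS is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP port accepts connections.
class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timeout, or unknown host: treat as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: 9000 is the port configured in core-site.xml.
        System.out.println("localhost:9000 listening: " + isListening("localhost", 9000, 1000));
    }
}
```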

    pom.xml configuration for the program

            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>${hadoop.version}</version>
                <scope>compile</scope>
                <exclusions>
                    <exclusion>
                        <artifactId>zookeeper</artifactId>
                        <groupId>org.apache.zookeeper</groupId>
                    </exclusion>
                    <exclusion>
                        <artifactId>slf4j-log4j12</artifactId>
                        <groupId>org.slf4j</groupId>
                    </exclusion>
                    <exclusion>
                        <artifactId>jsp-api</artifactId>
                        <groupId>javax.servlet.jsp</groupId>
                    </exclusion>
                    <exclusion>
                        <artifactId>jasper-runtime</artifactId>
                        <groupId>tomcat</groupId>
                    </exclusion>
                    <exclusion>
                        <artifactId>jasper-compiler</artifactId>
                        <groupId>tomcat</groupId>
                    </exclusion>
                    <exclusion>
                        <artifactId>jersey-server</artifactId>
                        <groupId>com.sun.jersey</groupId>
                    </exclusion>
                    <exclusion>
                        <artifactId>asm</artifactId>
                        <groupId>asm</groupId>
                    </exclusion>
                </exclusions>
            </dependency>

    The program is as follows:

    package com.qunar.mapReduce;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    import java.io.IOException;
    import java.util.Scanner;
    import java.util.StringTokenizer;

    /**
     * *********************************************************
     * <p/>
     * Author:     XiJun.Gong
     * Date:       2016-07-29 14:59
     * Version:    default 1.0.0
     * Class description: word count with MapReduce
     * <p/>
     * *********************************************************
     */
    public class MapReduceDemo {

        /** Mapper: emits (word, 1) for every whitespace-separated token of a line. */
        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        /** Reducer: sums the counts collected for each word. */
        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration configuration = new Configuration();
            Scanner reader = new Scanner(System.in);
            // Standard input supplies pairs of tokens: an input path, then an output path.
            while (reader.hasNext()) {
                String inputPath = reader.next();
                if (!reader.hasNext()) {
                    break;
                }
                String outputPath = reader.next();
                // A Job can be submitted only once, so build a fresh one for each pair.
                Job job = Job.getInstance(configuration, "wordCount");
                job.setJarByClass(MapReduceDemo.class);
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(IntWritable.class);
                job.setMapperClass(Map.class);
                job.setReducerClass(Reduce.class);
                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);
                FileInputFormat.addInputPath(job, new Path(inputPath));
                FileOutputFormat.setOutputPath(job, new Path(outputPath));
                job.waitForCompletion(true);
            }
        }
    }
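    To see what the mapper and reducer compute without a cluster, the same logic can be imitated in plain Java. This is only a sketch of the idea: it uses no Hadoop classes, runs in a single thread, and the names LocalWordCount and count are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of the word-count logic; it does not use Hadoop at all.
class LocalWordCount {

    static Map<String, Integer> count(List<String> lines) {
        // A TreeMap keeps keys sorted, loosely mirroring how reducer input arrives sorted by key.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: split the line on whitespace, as the Mapper does.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                // "Reduce" step: add 1 to the running total for this word.
                Integer current = counts.get(word);
                counts.put(word, current == null ? 1 : current + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("redis Cache redis", "Java Cache");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            // One "word<TAB>count" line per distinct word.
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

    In the real job the per-word additions happen in the reducer after the framework has grouped all (word, 1) pairs by key; here the grouping and the summing are folded into one loop for brevity.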

    Running the program:

    Connected to the target VM, address: '127.0.0.1:51980', transport: 'socket'
    12:41:05.404 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of successful kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
    12:41:05.441 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[Rate of failed kerberos logins and latency (milliseconds)], always=false, type=DEFAULT, sampleName=Ops)
    12:41:05.442 [main] DEBUG o.a.h.m.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, about=, value=[GetGroups], always=false, type=DEFAULT, sampleName=Ops)
    12:41:05.444 [main] DEBUG o.a.h.m.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
    12:41:05.871 [main] DEBUG o.a.h.s.a.util.KerberosName - Kerberos krb5 configuration not found, setting default realm to empty
    12:41:05.883 [main] DEBUG org.apache.hadoop.security.Groups -  Creating new Groups object
    12:41:05.895 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
    12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
    12:41:05.896 [main] DEBUG o.a.hadoop.util.NativeCodeLoader - java.library.path=/home/gongxijun/Qunar/idea-IU-139.1117.1/bin::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
    12:41:05.897 [main] WARN  o.a.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    12:41:05.900 [main] DEBUG o.a.hadoop.util.PerformanceAdvisory - Falling back to shell based
    12:41:05.905 [main] DEBUG o.a.h.s.JniBasedUnixGroupsMappingWithFallback - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
    12:41:05.957 [main] DEBUG org.apache.hadoop.util.Shell - setsid exited with exit code 0
    12:41:05.957 [main] DEBUG org.apache.hadoop.security.Groups - Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
    12:41:05.961 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login
    12:41:05.962 [main] DEBUG o.a.h.security.UserGroupInformation - hadoop login commit
    12:41:05.968 [main] DEBUG o.a.h.security.UserGroupInformation - using local user:UnixPrincipal: gongxijun
    12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - Using user: "UnixPrincipal: gongxijun" with name gongxijun
    12:41:05.969 [main] DEBUG o.a.h.security.UserGroupInformation - User entry: "gongxijun"
    12:41:05.970 [main] DEBUG o.a.h.security.UserGroupInformation - UGI loginUser:gongxijun (auth:SIMPLE)

    Input entered (the input path, then the output path):

    /home/gongxijun/web进阶.txt
    /home/gongxijun/a.txt

    Output:

    12:44:36.992 [main] INFO  org.apache.hadoop.mapreduce.Job - Counters: 33
        File System Counters
            FILE: Number of bytes read=6316
            FILE: Number of bytes written=518809
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
        Map-Reduce Framework
            Map input records=84
            Map output records=85
            Map output bytes=1476
            Map output materialized bytes=1652
            Input split bytes=99
            Combine input records=0
            Combine output records=0
            Reduce input groups=82
            Reduce shuffle bytes=1652
            Reduce input records=85
            Reduce output records=82
            Spilled Records=170
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=9
            CPU time spent (ms)=0
            Physical memory (bytes) snapshot=0
            Virtual memory (bytes) snapshot=0
            Total committed heap usage (bytes)=459276288
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=1335
        File Output Format Counters 
            Bytes Written=1311

    The results are in the a.txt directory:

    (kafuka卡夫卡)    1
    (缺陷:    1
    (需要重点学习)    1
    ---去查看QMQ--message---->broker    1
    /    1
    1.    2
    1.判断线程安全的两个机准:    1
    2.    3
    3.    1
    Apache    1
    Cache    1
    Client:    1
    ConCurrentHashMap    1
    Dubbo    1
    Executor    1
    Futrue/CountDownLatch    1
    Guava    1
    HTTP:    1
    HashMap    1
    Hession    1
    HttpComponents    1
    Java    1
    Json    1
    Key-Value    1
    Kryo(重点)    1
    LRU    1
    Protobuf    1
    QMQ/AMQ/rabbitimq    1
    ReadWriterLock    1
    ReentrantLock    1
    async-http-client    1
    c3p0    1
    client实现    1
    dbpc    1
    redis    1
    seriialization    1
    servlet    1
    snchronized    1
    spymemcached    1
    tomcat-jdbc    1
    xmemcached    1
    一致性Hash    1
    一:    1
    三:    1
    乐观锁:    1
    二:    1
    互斥    1
    共享数据    1
    分布式锁?    1
    分布式:    1
    前端轮询,后端异步:    1
    单例的    1
    参数回调    1
    可复用资源,创建代价大    1
    可扩展性,服务降级,负载均衡,灰度    1
    可重入锁    1
    可靠性    1
    回顾    1
    场景:    1
    对象池:    1
    将对象的状态信息转换为可以存储或传输形式的过程.    1
    尽量不要使用本地缓存    1
    并发修改    1
    序列化:    1
    建议:    1
    异步调用    1
    异步:    1
    形成环)    1
    性能    1
    方式:    1
    本地缓存太大,可以使用对象池    1
    概念:    1
    池化技术    1
    消息队列:    1
    类型:    1
    线程池    1
    缓存--本地    1
    读写锁:    1
    连接池:    1
    (分段锁)    1
    (推荐使用)    1
    ,    1
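    The listing can be checked against the counters shown earlier. It has 82 lines, and only two words occur more than once ("1." twice and "2." three times), so the counts add up to 80 + 2 + 3 = 85. That agrees with Reduce output records=82 and Map output records=85. The sketch below does the same bookkeeping in code. It assumes each result line has the form word, TAB, count, which is the default separator of TextOutputFormat; the class OutputTotals and the method totals are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical checker: totals the lines of a word-count result ("word<TAB>count").
class OutputTotals {

    // Returns {number of distinct words, sum of all counts}.
    static long[] totals(List<String> lines) {
        long distinct = 0;
        long sum = 0;
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that are not key<TAB>value
            }
            distinct++;
            sum += Long.parseLong(line.substring(tab + 1).trim());
        }
        return new long[] {distinct, sum};
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1.\t2", "2.\t3", "Apache\t1");
        long[] t = totals(sample);
        // Prints: distinct=3 total=6
        System.out.println("distinct=" + t[0] + " total=" + t[1]);
    }
}
```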