• Hadoop distributed environment installation


    1. Download the Hadoop and JDK packages to a target directory, then install the Java environment.

     

    2. Extract Hadoop to the target directory and configure the environment variables: vim /etc/profile

    export JAVA_HOME=/home/xiaozw/soft/java/jdk1.8.0_201
    export HADOOP_HOME=/home/xiaozw/soft/java/hadoop-3.1.2
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
    export JAVA_PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin
    export PATH=$PATH:${JAVA_PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
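
    After editing /etc/profile, reload it so the exports take effect, then verify. A minimal self-contained sketch using this guide's install paths (adjust them to your own layout):

```shell
# Reload the profile in a real session:
#   source /etc/profile

# The same exports, inlined here for illustration:
export JAVA_HOME=/home/xiaozw/soft/java/jdk1.8.0_201
export HADOOP_HOME=/home/xiaozw/soft/java/hadoop-3.1.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Once the JDK and Hadoop are actually installed, these print their versions:
#   java -version
#   hadoop version
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```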

    3. Copy the configuration directory; the copy will hold the cluster configuration:

    cp -r hadoop hadoop_cluster

    Rename the original directory as a backup:

    mv hadoop hadoop_bak

    Create a symlink so that `hadoop` points at the cluster config (note the argument order: target first, link name second):
    ln -s hadoop_cluster hadoop
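
    The backup-and-symlink pattern can be dry-run on a scratch directory to see the resulting layout (the stand-in file below is hypothetical):

```shell
# Exercise the backup + symlink pattern in a throwaway directory.
cd "$(mktemp -d)"
mkdir hadoop && touch hadoop/core-site.xml   # stand-in for the real config dir

cp -r hadoop hadoop_cluster   # working copy that will hold the cluster config
mv hadoop hadoop_bak          # keep the pristine originals as a backup
ln -s hadoop_cluster hadoop   # "hadoop" now resolves to the cluster config

ls -l hadoop                  # shows: hadoop -> hadoop_cluster
```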

     

    Edit the configuration files under soft/java/hadoop-3.1.2/etc/hadoop_cluster/, one by one:

     core-site.xml

    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://c0:9000/</value>
    </property>
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/xiaozw/soft/tmp/hadoop-${user.name}</value>
    </property>

    hdfs-site.xml

    <property>
    <name>dfs.replication</name>
    <value>2</value>
    </property>
    <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>c3:9868</value>
    </property>

    mapred-site.xml

    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>

    yarn-site.xml

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>c3</value>
    </property>
    <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    </property>
    <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
    </property>

    Edit hadoop_cluster/hadoop-env.sh and set JAVA_HOME explicitly:

    export JAVA_HOME=/home/xiaozw/soft/java/jdk1.8.0_201

    4. Clone the machine three more times and change each hostname. On each machine:

    vim /etc/hostname
    c0

    Configure /etc/hosts identically on every machine:

    vim /etc/hosts

    192.168.132.143 c0
    192.168.132.144 c1
    192.168.132.145 c2
    192.168.132.146 c3

    All four servers need passwordless SSH login (the start scripts use SSH to launch daemons on the workers).
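
    A minimal sketch of the key setup, run on c0 (assumes OpenSSH; repeat ssh-copy-id for every host, including c0 itself):

```shell
# Generate a key pair without a passphrase. A scratch dir is used here for
# illustration; on a real node you would accept the default ~/.ssh/id_rsa.
D=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$D/id_rsa"

# Push the public key to each node (prompts for the password once per host).
# Echoed here rather than executed; run the printed commands on c0 for real:
for h in c0 c1 c2 c3; do
    echo "ssh-copy-id -i $D/id_rsa.pub xiaozw@$h"
done
```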

    Register two of the servers as DataNodes. In Hadoop 3.x the worker list lives in the `workers` file (it was called `slaves` in 2.x). Enter the config directory:

    cd soft/java/hadoop-3.1.2/etc/hadoop_cluster/

    sudo vim workers
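
    The file lists one worker hostname per line. Assuming c1 and c2 are the two DataNodes (the guide does not name them explicitly, so this is an assumption):

```shell
# Write a sample workers file in a scratch dir; on the cluster this file is
# etc/hadoop_cluster/workers. Using c1/c2 as the DataNodes is an assumption.
cd "$(mktemp -d)"
cat > workers <<'EOF'
c1
c2
EOF
cat workers
```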

    Create a helper script to copy files to the other servers:

    bat.sh

    #!/bin/bash
    # Push cluster config files to c1..c3 and clear the old tmp dir.
    CONF=/home/xiaozw/soft/java/hadoop-3.1.2/etc/hadoop_cluster
    for ((i = 1; i <= 3; i++)); do
        # Uncomment whichever files need to be synced:
        #scp $CONF/hadoop-env.sh   xiaozw@c$i:$CONF/hadoop-env.sh
        #scp $CONF/hdfs-site.xml   xiaozw@c$i:$CONF/hdfs-site.xml
        #scp $CONF/core-site.xml   xiaozw@c$i:$CONF/core-site.xml
        #scp $CONF/mapred-site.xml xiaozw@c$i:$CONF/mapred-site.xml
        #scp $CONF/yarn-site.xml   xiaozw@c$i:$CONF/yarn-site.xml
        scp $CONF/workers xiaozw@c$i:$CONF/workers
        ssh xiaozw@c$i rm -rf /home/xiaozw/soft/tmp/
        #scp /etc/hosts xiaozw@c$i:/etc/hosts
    done

    Make it executable:
    chmod a+x bat.sh

    5. Start Hadoop.

    On the very first start only, format the NameNode on c0 before launching anything:

    hdfs namenode -format

    Then start all daemons from c0:

    start-all.sh

    NameNode web UI (port 9870):

    http://192.168.132.143:9870/dfshealth.html#tab-overview

    YARN ResourceManager UI (port 8088; note that with yarn.resourcemanager.hostname set to c3, this UI is actually served from c3, i.e. 192.168.132.146):

    http://192.168.132.143:8088/cluster

    Demo: computing the maximum temperature per year with MapReduce.

    public static void main(String[] args) {
                try {
                    log.info("starting...");
                    Configuration conf = new Configuration();
                    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    //            if (otherArgs.length != 2) {
    //                System.err.println("Usage: wordcount <in> <out>");
    //                System.exit(2);
    //            }
                    // new Job(conf, name) is deprecated; use the factory method.
                    Job job = Job.getInstance(conf, "max temperature");
    
                    // The jar containing the mapper/reducer classes.
                    //job.setJarByClass(MaxTemperature2.class);
                    job.setJar("/home/xiaozw/soft/download/demo-0.0.1-SNAPSHOT.jar");
    
                    FileSystem fs = FileSystem.get(conf);
                    // Delete the output path if it exists; the job refuses to overwrite it.
                    Path outDir = new Path("/home/xiaozw/soft/hadoop-data/out");
                    if (fs.exists(outDir)) {
                        fs.delete(outDir, true);
                    }
    
                    Path tmpDir = new Path("/home/xiaozw/soft/tmp");
                    if (fs.exists(tmpDir)) {
                        fs.delete(tmpDir, true);
                    }
    
                    // Input and output paths for the job.
                    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
                    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    
                    // Custom Mapper and Reducer for the two phases.
                    job.setMapperClass(TempMapper.class);
                    job.setReducerClass(TempReduce.class);
    
                    // Types of the final output key and value.
                    job.setOutputKeyClass(Text.class);
                    job.setOutputValueClass(IntWritable.class);
    
                    // Submit the job and wait for completion.
                    System.exit(job.waitForCompletion(true) ? 0 : 1);
                } catch (Exception e) {
                    e.printStackTrace();
                }
                // Only reached on exception: success and failure both call System.exit above.
                log.info("finished...");
        }
    package com.example.demo;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    
    public class TempMapper extends Mapper<Object, Text, Text, IntWritable> {
        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            // Each record: the first 4 chars are the year, the temperature starts at offset 8.
            String lineValue = value.toString();
            String year = lineValue.substring(0, 4);
            int temperature = Integer.parseInt(lineValue.substring(8));
            context.write(new Text(year), new IntWritable(temperature));
        }
    }
    package com.example.demo;
    
    import java.io.IOException;
    
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    
    public class TempReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            // Emit the maximum temperature observed for each year.
            int maxTemp = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxTemp = Math.max(maxTemp, value.get());
            }
            context.write(key, new IntWritable(maxTemp));
        }
    }
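
    The mapper assumes fixed-offset records: a 4-character year, then the temperature from offset 8 onward. That record layout is inferred from the substring calls above, so treat it as an assumption. The same map/reduce logic can be dry-run locally with awk on sample data:

```shell
# Sample records: 4-char year, 4 filler chars, then the temperature.
cd "$(mktemp -d)"
cat > temps.txt <<'EOF'
2014xxxx0041
2014xxxx0029
2015xxxx0033
EOF

# Same logic as TempMapper/TempReduce: key = year, value = max temperature.
awk '{
    y = substr($0, 1, 4)        # Java substring(0, 4)
    t = substr($0, 9) + 0       # Java substring(8), parsed as an int
    if (!(y in m) || t > m[y]) m[y] = t
}
END { for (y in m) print y, m[y] }' temps.txt | sort
# prints:
# 2014 41
# 2015 33
```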

    Netdisk download for the files and code:

    Link: https://pan.baidu.com/s/14wdv5CTXzw_0pmDisCa0uA
    Extraction code: auao

  • Original post: https://www.cnblogs.com/xiaozw/p/11101407.html