Hadoop official documentation:
https://hadoop.apache.org/docs/
Installing a Hadoop cluster
Configure DNS resolution or the hosts file:
cat > /etc/hosts <<EOF
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.3.149.20 hadoop-master
10.3.149.21 hadoop-node1
10.3.149.22 hadoop-node2
EOF
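To quickly confirm that all three names resolve, a small sketch (not part of the original steps):
for h in hadoop-master hadoop-node1 hadoop-node2; do
    ping -c1 "$h" >/dev/null && echo "$h ok" || echo "$h failed"
done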
Set up passwordless SSH for root:
ssh-keygen
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-master
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-node1
ssh-copy-id -i .ssh/id_rsa.pub root@hadoop-node2
ssh root@hadoop-master 'date'
ssh root@hadoop-node1 'date'
ssh root@hadoop-node2 'date'
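The same hosts file is needed on every node. With the root key trust now in place, one way to push it out (an illustrative sketch):
for h in hadoop-node1 hadoop-node2; do
    scp /etc/hosts root@"$h":/etc/hosts
done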
Set up passwordless SSH for the hadoop user:
useradd hadoop
echo '123456' | passwd --stdin hadoop
su hadoop
ssh-keygen
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-master
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-node1
ssh-copy-id -i .ssh/id_rsa.pub hadoop@hadoop-node2
ssh hadoop@hadoop-master 'date'
ssh hadoop@hadoop-node1 'date'
ssh hadoop@hadoop-node2 'date'
exit
Install Java:
tar -xf jdk-8u231-linux-x64.tar.gz -C /usr/local/
Create a symlink:
cd /usr/local/
ln -sv jdk1.8.0_231/ jdk
Add environment variables (the heredoc delimiter is quoted so the variables expand when the profile is sourced, not when the file is written):
cat > /etc/profile.d/java.sh <<'EOF'
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
EOF
. /etc/profile.d/java.sh
Verify the installation:
java -version
javac -version
Install Hadoop:
Hadoop download mirrors:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/
http://archive.apache.org/dist/hadoop/common/
For the Hadoop 2.7 release:
http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
Download the package:
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.10.0/hadoop-2.10.0.tar.gz
Extract it:
tar -xf hadoop-2.10.0.tar.gz -C /usr/local/
cd /usr/local/
ln -sv hadoop-2.10.0/ hadoop
Configure environment variables (again with a quoted heredoc delimiter):
cat > /etc/profile.d/hadoop.sh <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
Apply the environment variables:
. /etc/profile.d/hadoop.sh
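To confirm the Hadoop binaries are now on the PATH, hadoop version prints the release and build information:
hadoop version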
Create the data directories:
# master
mkdir -pv /data/hadoop/hdfs/{nn,snn}
# node
mkdir -pv /data/hadoop/hdfs/dn
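The node-side directory can also be created from the master over the root SSH trust set up earlier (a sketch):
for h in hadoop-node1 hadoop-node2; do
    ssh root@"$h" 'mkdir -pv /data/hadoop/hdfs/dn'
done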
Configuration on the master node:
Enter the configuration directory:
cd /usr/local/hadoop/etc/hadoop
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:8020</value>
    <final>true</final>
  </property>
</configuration>
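A typo in the XML silently falls back to defaults, so it is worth reading the value back with hdfs getconf (it parses the client-side configuration):
hdfs getconf -confKey fs.defaultFS
# should print: hdfs://hadoop-master:8020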
yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/hdfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/hdfs/dn</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:///data/hadoop/hdfs/snn</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
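Note: Hadoop 2.x tarballs ship only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first:
cp mapred-site.xml.template mapred-site.xml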
Create the master file:
cat > master <<EOF
hadoop-master
EOF
Create the slaves file:
cat > slaves <<EOF
hadoop-node1
hadoop-node2
EOF
Annotated notes on common configuration options:
http://blog.51yip.com/hadoop/2020.html
On the worker nodes:
Copy the configuration over from the master node:
scp ./* root@hadoop-node1:/usr/local/hadoop/etc/hadoop/
scp ./* root@hadoop-node2:/usr/local/hadoop/etc/hadoop/
Delete the slaves file; the rest of the configuration is the same as on the master.
rm -rf /usr/local/hadoop/etc/hadoop/slaves
Create the log directory:
mkdir /usr/local/hadoop/logs
chmod g+w /usr/local/hadoop/logs/
Change the owner and group:
chown -R hadoop.hadoop /data/hadoop/
cd /usr/local/
chown -R hadoop.hadoop hadoop hadoop/   # both the symlink and, via the trailing slash, the directory tree it points to
Starting and stopping the cluster
Format HDFS (once formatted, the cluster can be started):
su hadoop
[hadoop@hadoop-master ~]$ hadoop namenode -format
Start HDFS first; the output below shows each node and what it runs.
[hadoop@hadoop-master ~]$ start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: starting namenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-node2: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-datanode-hadoop-node2.out
hadoop-node1: starting datanode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-datanode-hadoop-node1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.10.0/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out
Check the processes running on the local node; this command can be used on any node.
~]$ jps
1174 Jps
32632 ResourceManager
32012 NameNode
32220 SecondaryNameNode
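The worker nodes can be checked from the master without logging in to each one; a sketch using the hadoop user's key trust (the full jps path is used because non-interactive SSH sessions do not source /etc/profile.d):
for h in hadoop-node1 hadoop-node2; do
    echo "== $h =="
    ssh hadoop@"$h" /usr/local/jdk/bin/jps
done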
Then start YARN; the output shows which process starts on each node.
[hadoop@hadoop-master ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-resourcemanager-hadoop-master.out
hadoop-node2: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-nodemanager-hadoop-node2.out
hadoop-node1: starting nodemanager, logging to /usr/local/hadoop-2.10.0/logs/yarn-hadoop-nodemanager-hadoop-node1.out
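A quick sanity check that the ResourceManager is listening on the ports configured in yarn-site.xml (a sketch, run on the master):
ss -tnlp | grep -E ':(8030|8031|8032|8033|8088)'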
Or start everything at once:
[hadoop@hadoop-master ~]$ start-all.sh
Check the running state of the Hadoop cluster:
hadoop dfsadmin -report
Overview web UI:
http://10.3.149.20:50070/
Cluster information web UI:
http://10.3.149.20:8088/cluster
Stop the cluster:
stop-dfs.sh
stop-yarn.sh
Or:
stop-all.sh
Using the HDFS filesystem
List a directory:
~]$ hdfs dfs -ls /
Create a directory:
~]$ hdfs dfs -mkdir /test
Upload a file:
~]$ hdfs dfs -put /etc/fstab /test/fstab
Find where the file is stored: the file's block can be seen under the data directory on one of the datanodes. The default block size is 128 MB; a file larger than that is split into multiple blocks, but a file smaller than 128 MB does not actually occupy 128 MB.
]$ cat /data/hadoop/hdfs/dn/current/BP-1469813358-10.3.149.20-1595493741225/current/finalized/subdir0/subdir0/blk_1073741825
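Rather than hunting for the block file by hand, fsck reports each block and the datanodes holding it for any path:
hdfs fsck /test/fstab -files -blocks -locations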
Recursive listing:
~]$ hdfs dfs -ls -R /
View a file:
~]$ hdfs dfs -cat /test/fstab
More command help:
https://hadoop.apache.org/docs/r2.10.0/hadoop-project-dist/hadoop-common/FileSystemShell.html
Word count example:
The /usr/local/hadoop/share/hadoop/mapreduce directory contains many computation examples that can be used for testing.
First upload a file to test with:
hdfs dfs -mkdir /test
hdfs dfs -put /etc/fstab /test/fstab
View the help: running the jar with no arguments prints usage information.
yarn jar hadoop-mapreduce-examples-2.10.0.jar
Test: here the word count example is used.
cd /usr/local/hadoop/share/hadoop/mapreduce
]$ yarn jar hadoop-mapreduce-examples-2.10.0.jar wordcount /test/fstab /test/count
Running jobs can be watched on this page:
http://10.3.149.20:8088/cluster/apps
View the result of the computation:
]$ hdfs dfs -cat /test/count/part-r-00000
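The output is tab-separated word/count pairs, so ordinary shell tools can post-process it, e.g. the ten most frequent words:
hdfs dfs -cat /test/count/part-r-00000 | sort -t$'\t' -k2 -nr | head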
Common YARN commands:
List running applications:
~]$ yarn application -list
List applications that have already run:
~]$ yarn application -list -appStates ALL
Check the status of an application:
~]$ yarn application -status application_1595496103452_0001
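Container logs for a finished application can be fetched with yarn logs, provided log aggregation is enabled (yarn.log-aggregation-enable, off by default):
yarn logs -applicationId application_1595496103452_0001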