1. Environment
Machines:
18 physical machines
OS:
CentOS release 6.6 (Final)
Java version:
14:27 [root@hostname]$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (rhel-2.5.1.2.el6_5-x86_64 u65-b17)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Hadoop version:
hadoop-2.6.0, downloaded from http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Spark version:
downloaded from the Apache mirrors under http://www.apache.org/dyn/closer.cgi/spark/ (pick the release built against Hadoop 2.6)
2. SSH Configuration
Configure hosts
Edit /etc/hosts, mapping every hostname to its IP.
Do this on all 18 machines.
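A minimal /etc/hosts sketch; the 10.0.0.x addresses are placeholders, substitute each machine's real IP (the cluster's actual addresses are redacted in the transcripts below):

```
10.0.0.1    a01
10.0.0.2    a02
10.0.0.3    a03
...
10.0.0.18   a18
```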
Create the user and group
Hadoop will run as hadoop:hadoop, so first create the hadoop user and group:
14:56 [root@a03]$ groupadd hadoop;adduser -g hadoop hadoop;passwd hadoop
Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Create the user on all 18 machines.
Generate the master's public key
Note: switch to the hadoop user first.
15:51 [hadoop@a01]$ cd .ssh/ # if this directory does not exist, run ssh localhost once first
tty:[1] jobs:[0] cwd:[~/.ssh]
15:52 [hadoop@a01]$ ssh-keygen -t rsa # press Enter at every prompt; the key is saved as .ssh/id_rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
1b:d8:7b:66:a6:8f:0d:04:82:26:ac:aa:c7:35:84:c6 hadoop@a01
The key's randomart image is:
+--[ RSA 2048]----+
| |
|. . |
|.oo.. . |
|.oE .. + |
|.. . . S |
|. o . + |
|.. . . + = |
|. o X |
|.. o.o |
+-----------------+
Copy the master's public key to the datanodes
16:54 [hadoop@a01]$ pwd
/home/hadoop/.ssh
16:51 [hadoop@a01]$ ssh-copy-id hadoop@a02
The authenticity of host '[a02]:22022 ([10.xxx.x.xx]:22022)' can't be established.
RSA key fingerprint is ad:xx:xx:xx:xx:xx:2b:xx:xx:4a:46:xx:xx:xx:d3:3b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '[a02]:22022,[10.xxx.x.xx]:22022' (RSA) to the list of known hosts.
hadoop@a02's password:
Now try logging into the machine, with "ssh 'hadoop@a02'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
Copy it to all 17 datanodes, then verify:
17:00 [hadoop@a01]$ ssh a02
Last login: Wed Sep 9 16:49:27 2015 from 10.107.7.49
tty:[1] jobs:[0] cwd:[~]
17:00 [hadoop@a02]$
No password is required anymore.
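Running ssh-copy-id 17 times by hand is tedious; a small loop can generate the hostnames instead. This is a sketch assuming GNU seq: `seq -w` zero-pads, so a02..a09 come out correctly (the same padding quirk that forces the split {2..9}/{10..18} loops later on). The ssh-copy-id call is commented out here because it prompts for a password per node:

```shell
# Generate the datanode hostnames a02..a18 (seq -w zero-pads to two digits).
nodes=$(seq -w 2 18 | sed 's/^/a/')
echo "$nodes"

for h in $nodes; do
  : # ssh-copy-id "hadoop@$h"   # uncomment to push the key to each datanode
done
```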
3. Hadoop Installation
Unpack
17:20 [root@a01]$ tar -zxvf hadoop-2.6.0.tar.gz -C /usr/local/
Assign the hadoop directory to the hadoop user and group:
17:21 [root@a01]$ cd /usr/local/
tty:[1] jobs:[0] cwd:[/usr/local]
17:22 [root@a01]$ chown -R hadoop:hadoop hadoop-2.6.0/
Configure the cluster
Cluster/distributed mode requires editing 5 files under etc/hadoop (the official documentation lists the default values for the last four); only the settings required for a normal startup are made here: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
Configure slaves
17:25 [hadoop@a01]$ vim slaves
a02
a03
a04
a05
a06
a07
a08
a09
a10
a11
a12
a13
a14
a15
a16
a17
a18
Configure core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://a01:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/local/</value>
    <description>A base for other temporary directories.</description>
  </property>
  <!-- file system properties -->
  <property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
    <description>The FileSystem for file: uris.</description>
  </property>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
    <description>The FileSystem for hdfs: uris.</description>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>4320</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>432</value>
  </property>
</configuration>
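The two trash settings are in minutes: the values above keep deleted files for 3 days and checkpoint the trash roughly every 7 hours (432 minutes is 7.2 hours):

```shell
# fs.trash.interval: 4320 minutes expressed in days
echo $((4320 / 60 / 24))   # 3 days of trash retention

# fs.trash.checkpoint.interval: 432 minutes, whole hours
echo $((432 / 60))         # 7 (i.e. 7.2 hours)
```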
Configure hdfs-site.xml
<configuration>
  <!-- base config -->
  <property>
    <name>dfs.nameservices</name>
    <value>a01</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>268435456</value>
    <description>The default block size for new files.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>770</value>
    <description>Permissions for the directories on the local filesystem where the DFS data node stores its blocks. The permissions can either be octal or symbolic.</description>
  </property>
</configuration>
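dfs.block.size is given in bytes; 268435456 bytes is a 256 MB block, twice the Hadoop 2.x default of 128 MB:

```shell
# dfs.block.size in MB: 268435456 bytes / 1024 / 1024
echo $((268435456 / 1024 / 1024))   # 256
```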
Configure mapred-site.xml
This file does not exist by default; first copy it from the template:
$ cp mapred-site.xml.template mapred-site.xml
Then edit it as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/opt/mapred/local</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>20</value>
  </property>
  <!-- i/o properties config -->
  <property>
    <name>mapreduce.output.fileoutputformat.compress.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress.codec</name>
    <value>org.apache.hadoop.io.compress.GzipCodec</value>
  </property>
  <property>
    <name>mapreduce.output.fileoutputformat.compress</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>
  <property>
    <name>mapred.map.output.compression.codec</name>
    <value>com.hadoop.compression.lzo.LzopCodec</value>
  </property>
  <property>
    <name>mapreduce.task.userlog.limit.kb</name>
    <value>1024</value>
  </property>
  <property>
    <name>io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>100</value>
  </property>
  <property>
    <name>io.sort.record.percent</name>
    <value>0.05</value>
    <description>The percentage of io.sort.mb dedicated to tracking record boundaries. Let this value be r, io.sort.mb be x. The maximum number of records collected before the collection thread must block is equal to (r * x) / 4</description>
  </property>
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.80</value>
  </property>
  <property>
    <name>mapred.job.shuffle.input.buffer.percent</name>
    <value>0.5</value>
  </property>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapreduce.task.timeout</name>
    <value>900000</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
</configuration>
Note the mapred.local.dir setting: the configured path (here /opt/mapred/local) must be created in advance on all machines and assigned to the hadoop user:
$ mkdir -p /opt/mapred/local; chown -R hadoop:hadoop /opt/mapred/
This directory holds MapReduce's local intermediate data; multiple disks can be listed, comma-separated.
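For example, on a machine with several data disks, mapred.local.dir could look like this (the /data* mount points are hypothetical; list one directory per physical disk so local I/O is spread across spindles):

```xml
<property>
  <name>mapred.local.dir</name>
  <!-- hypothetical mount points, one per physical disk -->
  <value>/data1/mapred/local,/data2/mapred/local,/data3/mapred/local</value>
</property>
```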
Configure yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>a01</value>
  </property>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/yarn/local</value>
    <description>the local directories used by the nodemanager</description>
  </property>
</configuration>
As with mapred.local.dir, the yarn.nodemanager.local-dirs path (here /opt/yarn/local) must be created in advance on all machines and assigned to the hadoop user:
mkdir -p /opt/yarn/local; chown -R hadoop:hadoop /opt/yarn
Configure hadoop-env.sh
Set JAVA_HOME in hadoop-env.sh to the Java installation directory:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/
Configuration is done. Pack up the hadoop directory and copy it to every node:
$ cd /usr/local
$ tar -zcf ~/hadoop.tar.gz ./hadoop-2.6.0
$ for i in {2..9}; do scp ~/hadoop.tar.gz a0$i:~/; done
$ for i in {10..18}; do scp ~/hadoop.tar.gz a$i:~/; done
Then, on each node, unpack it:
$ cd /usr/local/
$ tar -zxvf /home/hadoop/hadoop.tar.gz
$ chown -R hadoop:hadoop hadoop-2.6.0/
Set the Hadoop environment variables on all nodes
In /etc/profile:
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
After editing, source it:
source /etc/profile
Start Hadoop
$ cd /usr/local/hadoop-2.6.0/
$ bin/hdfs namenode -format # initialization, needed on the first run only
$ start-dfs.sh
$ start-yarn.sh
jps shows the NameNode and ResourceManager processes running on the namenode machine:
20:04 [hadoop@a01]$ jps
6298 NameNode
6719 ResourceManager
7040 Jps
On the datanode machines, the NodeManager and DataNode processes are running:
19:55 [root@a02]$ jps
4296 NodeManager
4177 DataNode
4466 Jps
The cluster status is visible at a01:50070.
All 17 datanodes are up and running.
Troubleshooting
1. Wrong total cluster capacity
The capacity figure is wrong: the 17 datanodes report a total of only 243 GB, yet each machine has a 1 TB disk. Check the disk partitioning:
10:12 [hadoop@a01]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        15G  3.2G   11G  24% /
tmpfs            32G     0   32G   0% /dev/shm
/dev/sda1       190M   27M  153M  16% /boot
/dev/sda5       898G   75M  853G   1% /opt
The disk is split into four partitions, and the root partition is only 15 GB. Since 15 * 17 = 255, HDFS must be using only the root partition on each datanode.
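A quick check of that arithmetic: 17 datanodes contributing only their 15 GB root partition gives roughly the observed total (the reported 243 GB is slightly lower because HDFS subtracts used and reserved space):

```shell
# 17 datanodes x 15 GB root partition each
echo $((17 * 15))   # 255 GB, close to the 243 GB shown on the web UI
```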
The fix is to set hadoop.tmp.dir in core-site.xml to /opt/hadoop/local, on the large /opt partition; by default it is /tmp/hadoop-${user}/.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/local/</value>
  <description>A base for other temporary directories.</description>
</property>
Now create /opt/hadoop/local/ and assign it to the hadoop user and group, copy the updated config to all nodes, reformat the namenode, and restart Hadoop:
$ mkdir -p /opt/hadoop/local; chown -R hadoop:hadoop /opt/hadoop
$ stop-all.sh
$ for i in {50..66}; do scp core-site.xml xx.xxx.x.$i:/usr/local/hadoop-2.6.0/etc/hadoop/; done
$ cd $HADOOP_HOME
$ bin/hdfs namenode -format
$ start-all.sh
After a successful start, check namenode:50070 again.
Total cluster storage is now 14.9 TB, and the namenode storage directory has moved to /opt/hadoop/local/dfs/namenode. Problem solved.
2. Wrong total memory and CPU count
The cluster reports 136 GB of total memory and 136 total cores, i.e. 8 GB and 8 cores per machine, but each machine actually has 64 GB and 24 cores.
It turns out YARN defaults both the per-node memory and the vcore count to 8.
These values should be tuned to the actual cluster; for a tuning guide see
http://blog.javachen.com/2015/06/05/yarn-memory-and-cpu-configuration.html
The adjusted yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>a01</value>
  </property>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/opt/yarn/local</value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>56832</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>56832</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx3276m</value>
  </property>
</configuration>
Check the ResourceManager page again.
Each machine now shows 55 GB of memory. Problem solved.
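The numbers line up: 56832 MB is 55.5 GB, matching the 55 GB per node shown on the ResourceManager page. The remaining 8704 MB (about 8.5 GB) of the 64 GB is left for the OS and the Hadoop daemons, a common rule of thumb assumed here rather than stated by the original tuning guide:

```shell
node_mb=56832                        # yarn.nodemanager.resource.memory-mb
echo $((node_mb / 1024))             # 55 -> GB per node (integer part of 55.5)
echo $((64 * 1024 - node_mb))        # 8704 -> MB left for OS and daemons
```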