Hadoop完全分布式集群搭建:
1)配置文件
1.hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
2.hdfs-core.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9820</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.1/tmp</value>
</property>
</configuration>
3.hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:9868</value>
</property>
</configuration>
4.workers
slave1
slave2
slave3
2)格式化文件系统
$ bin/hdfs namenode -format
3)启动集群
$ sbin/start-dfs.sh
4)可视化查看Hadoop集群:
master:9870
Hadoop-HA搭建
1)配置文件
1.hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_171
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_ZKFC_USER=root
export HDFS_JOURNALNODE_USER=root
2.hdfs-core.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.1/tmp</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>
</configuration>
3.hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>slave1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>master:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>slave1:9870</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
4.workers
slave1
slave2
slave3
2)zookeeper集群搭建
zoo.cfg
tickTime=2000
dataDir=/opt/module/zookeeper-3.4.12/data
clientPort=2181
initLimit=5
syncLimit=2
server.1=slave1:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888
项目主目录/data/myid 内容分别是[1,2,3]
3)每个zk节点上都执行:zkServer.sh start
看是否启动成功:zkServer.sh status
4)启动journalnode(每个journalnode节点都启动)
hdfs --daemon start journalnode
5)同步编辑日志
如果已有集群并且是单namenode
hdfs namenode -initializeSharedEdits(在已经format的namenode上执行)
hdfs --daemon start namenode
hdfs namenode -bootstrapStandby(没有format的namenode上执行)
如果是新建集群
hdfs namenode -format
hdfs --daemon start namenode
hdfs namenode -bootstrapStandby(没有format的namenode上执行)
6)格式化zookeeper并启动
$HADOOP_HOME/bin/hdfs zkfc -formatZK (在其中一台namenode节点上格式化即可)
$HADOOP_HOME/bin/hdfs --daemon start zkfc (两台zkfc(也就是namenode)节点都启动)
7)yarn搭建
yarn-env.sh
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>cluster1</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>slave2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>slave3</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>slave2:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>slave3:8088</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>slave1:2181,slave2:2181,slave3:2181</value>
</property>