Brief notes on setting up a local big data environment on macOS: key configuration and startup-related commands.
Related tools will be recorded here on an ongoing basis...
1. Installing software with brew
- brew info shows a package's install path, e.g. brew info hadoop; brew -h shows command help
- brew install hadoop
- brew install hive
- brew install apache-flink
- brew install kafka
- brew install zookeeper
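After installing, the versions and install prefixes of the packages above can be checked in one pass; a minimal sketch using standard brew subcommands:

```shell
# List the installed versions of the packages above
brew list --versions hadoop hive apache-flink kafka zookeeper

# Print each package's install prefix (useful for locating config files later)
for pkg in hadoop hive apache-flink kafka zookeeper; do
  echo "$pkg -> $(brew --prefix "$pkg")"
done
```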
2. hadoop
- Edit vim ./libexec/etc/hadoop/hadoop-env.sh and set JAVA_HOME:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_241.jdk/Contents/Home
- Edit vim ./libexec/etc/hadoop/core-site.xml to configure the NameNode host and port:
<configuration>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
<!-- HDFS address -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
- Edit vim ./libexec/etc/hadoop/hdfs-site.xml. The dfs.replication property sets how many times each HDFS block is replicated. It is normally 3; since we have only one host and a single DataNode in pseudo-distributed mode, change it to 1:
<configuration>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- 新加 -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
- Run hdfs namenode -format for the initial format. Format HDFS only on first setup; running it again will wipe all data.
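To avoid accidentally reformatting an existing filesystem, a small guard can be used; a sketch assuming hadoop.tmp.dir=/usr/local/Cellar/hadoop/hdfs/tmp as set in core-site.xml above (the dfs/name subdirectory is Hadoop's default NameNode metadata location under that dir):

```shell
# Format only if the NameNode metadata directory does not exist yet.
# NAME_DIR assumes hadoop.tmp.dir=/usr/local/Cellar/hadoop/hdfs/tmp (see core-site.xml).
NAME_DIR=/usr/local/Cellar/hadoop/hdfs/tmp/dfs/name
if [ -d "$NAME_DIR/current" ]; then
  echo "HDFS already formatted at $NAME_DIR, skipping"
else
  hdfs namenode -format
fi
```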
- Run ./sbin/start-dfs.sh to start the NameNode and DataNode: [http://localhost:9870]
- Run ./sbin/start-yarn.sh to start the YARN services: [http://localhost:8088/cluster]
The two can also be combined:
./sbin/start-all.sh
./sbin/stop-all.sh
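Once started, the daemons can be verified with jps (ships with the JDK):

```shell
# Expect NameNode, DataNode, SecondaryNameNode after start-dfs.sh,
# plus ResourceManager and NodeManager after start-yarn.sh
jps
```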
3. hive
- Edit vim libexec/conf/hive-site.xml to store the metastore in MySQL:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
</configuration>
- Copy the MySQL driver jar into $HIVE_HOME/lib
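A sketch of the copy step; the connector filename and download location are assumptions, so adjust them to the jar you actually downloaded. On recent Hive versions the metastore schema must also be initialized once with schematool:

```shell
# Copy the MySQL JDBC driver into Hive's classpath
# (the filename/version below is an assumed example)
cp ~/Downloads/mysql-connector-java-5.1.49.jar "$HIVE_HOME/lib/"

# One-time initialization of the metastore schema in MySQL
"$HIVE_HOME/bin/schematool" -dbType mysql -initSchema
```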
- Type hive in a terminal to start it
4. flink
- Run ./bin/start-cluster.sh to start Flink; local web UI: http://localhost:8081/#/overview
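To confirm the cluster works, the WordCount example bundled with the Flink distribution can be submitted (the jar path and log filename pattern may vary by Flink version):

```shell
# Submit the streaming WordCount example shipped with Flink
./bin/flink run examples/streaming/WordCount.jar

# With no sink configured, the word counts land in the TaskManager log
tail log/flink-*-taskexecutor-*.out
```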
5. kafka
- Configure zookeeper; the config files live under /usr/local/etc/zookeeper/
- Start the zookeeper service:
nohup zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties &
- Edit vim /usr/local/etc/kafka/server.properties and uncomment: listeners=PLAINTEXT://localhost:9092
- Start the Kafka service:
nohup kafka-server-start /usr/local/etc/kafka/server.properties &
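A quick smoke test once both services are up; the topic name test is arbitrary, and the flags assume a reasonably recent Kafka (2.2+, where the CLI tools take --bootstrap-server). Homebrew's wrappers drop the .sh suffix from the script names:

```shell
# Create a topic and confirm it exists
kafka-topics --create --topic test --bootstrap-server localhost:9092 \
  --partitions 1 --replication-factor 1
kafka-topics --list --bootstrap-server localhost:9092

# Produce one message, then read it back
echo "hello kafka" | kafka-console-producer --bootstrap-server localhost:9092 --topic test
kafka-console-consumer --bootstrap-server localhost:9092 --topic test \
  --from-beginning --max-messages 1
```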