• CDH4 Impala installation and configuration


    Impala is built on CDH and provides real-time (low-latency) queries over data in HDFS and HBase, with a query language similar to Hive's.
    It consists of several components:
    Clients: Hue, ODBC clients, JDBC clients, and the Impala Shell, all of which can submit interactive queries to Impala
    Hive Metastore: stores metadata about the data, so Impala knows the structure of the tables it queries
    Cloudera Impala: runs on every datanode, coordinates queries, distributes the parallel query tasks, and returns results to the client
    HBase and HDFS: store the data


    Environment
    hadoop-2.0.0-cdh4.1.2
    hive-0.9.0-cdh4.1.2
    Impala is installed via yum.
    Add a yum repository:
    [cloudera-impala]
    name=Impala
    baseurl=http://archive.cloudera.com/impala/redhat/5/x86_64/impala/1/
    gpgkey = http://archive.cloudera.com/impala/redhat/5/x86_64/impala/RPM-GPG-KEY-cloudera
    gpgcheck = 1
    Save it under /etc/yum.repos.d.
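
    For example, a minimal sketch that writes the repository file (the file name cloudera-impala.repo is an assumption; any *.repo name works):

    # run as root: create the Impala yum repository definition
    cat > /etc/yum.repos.d/cloudera-impala.repo <<'EOF'
[cloudera-impala]
name=Impala
baseurl=http://archive.cloudera.com/impala/redhat/5/x86_64/impala/1/
gpgkey = http://archive.cloudera.com/impala/redhat/5/x86_64/impala/RPM-GPG-KEY-cloudera
gpgcheck = 1
EOF
    yum clean metadata    # make yum pick up the new repository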


    Note that the CDH, Hive, and Impala versions must match; check the compatibility matrix on the Impala site.
    Impala needs a fairly large amount of memory and a 64-bit machine is recommended (I don't quite remember whether 32-bit is supported), and only certain Linux distributions are supported.
    http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/PDF/Installing-and-Using-Impala.pdf
    Install CDH4:
    http://archive.cloudera.com/cdh4/cdh/4/
    Both CDH and Hive can be downloaded from there.
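
    A minimal sketch of downloading and unpacking the tarballs (the exact file names and the /home/hadoop/cloudera target directory are assumptions chosen to match the paths used later in this article):

    # run as the hadoop user on master
    mkdir -p /home/hadoop/cloudera && cd /home/hadoop/cloudera
    wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.1.2.tar.gz
    wget http://archive.cloudera.com/cdh4/cdh/4/hive-0.9.0-cdh4.1.2.tar.gz
    tar -xzf hadoop-2.0.0-cdh4.1.2.tar.gz    # becomes $HADOOP_HOME
    tar -xzf hive-0.9.0-cdh4.1.2.tar.gz      # becomes $HIVE_HOME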


    Three machines:
    master runs namenode, secondarynamenode, ResourceManager, impala-state-store, impala-shell, hive
    slave1 runs datanode, nodemanager, impala-server, impala-shell
    slave2 runs datanode, nodemanager, impala-server, impala-shell
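
    All three machines need to resolve each other's hostnames. A sketch of /etc/hosts for every node (192.168.200.114 is the master address used later in /etc/default/impala; the slave addresses are placeholders):

    192.168.200.114   master
    192.168.200.115   slave1    # placeholder IP
    192.168.200.116   slave2    # placeholder IP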


    Hadoop configuration
    Configure on the master machine (the hostname fca-vm-arch-proxy1 that appears in some values below refers to the master node in this cluster).
    Add to core-site.xml under $HADOOP_HOME/etc/hadoop:


    <property>
     <name>io.native.lib.available</name>
     <value>true</value>
    </property>
    <property>
     <name>fs.default.name</name>
     <value>hdfs://master:9000</value>
     <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.</description>
     <final>true</final>
    </property>


    Add to hdfs-site.xml under $HADOOP_HOME/etc/hadoop:


    <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/home/hadoop/cloudera/hadoop/dfs/name</value>
     <description>Determines where on the local filesystem the DFS namenode should store the name table. If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.</description>
     <final>true</final>
    </property>
    <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/home/hadoop/cloudera/hadoop/dfs/data</value>
     <description>Determines where on the local filesystem a DFS datanode should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.</description>
     <final>true</final>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>fca-vm-arch-proxy1:50070</value>
    </property>
    <property>
     <name>dfs.replication</name>
     <value>2</value>
    </property>
    <property>
       <name>dfs.secondary.http.address</name>
       <value>fca-vm-arch-proxy1:50090</value>
     </property>
    <property>
     <name>dfs.permissions</name>
     <value>false</value>
    </property>


    Add to mapred-site.xml under $HADOOP_HOME/etc/hadoop:


    <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
    </property>
    <property>
     <name>mapreduce.job.tracker</name>
     <value>hdfs://fca-vm-arch-proxy1:9001</value>
     <final>true</final>
    </property>
    <property>
     <name>mapreduce.map.memory.mb</name>
     <value>1536</value>
    </property>
    <property>
     <name>mapreduce.map.java.opts</name>
     <value>-Xmx1024M</value>
    </property>
    <property>
     <name>mapreduce.reduce.memory.mb</name>
     <value>3072</value>
    </property>
    <property>
     <name>mapreduce.reduce.java.opts</name>
     <value>-Xmx2560M</value>
    </property>
    <property>
     <name>mapreduce.task.io.sort.mb</name>
     <value>512</value>
    </property>
    <property>
     <name>mapreduce.task.io.sort.factor</name>
     <value>100</value>
    </property>
    <property>
     <name>mapreduce.reduce.shuffle.parallelcopies</name>
     <value>50</value>
    </property>


    Add to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
    export JAVA_HOME=/jdk1.6.0_22


    System environment variables
    Add to $HOME/.bash_profile:
    export JAVA_HOME=/jdk1.6.0_22
    export JAVA_BIN=${JAVA_HOME}/bin
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export HADOOP_HOME=/home/hadoop/cloudera/hadoop-2.0.0-cdh4.1.2
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_YARN_HOME=${HADOOP_HOME}
    export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin
    export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
    export HADOOP_LIB=${HADOOP_HOME}/lib
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop


    Run source $HOME/.bash_profile to make the variables take effect.


    YARN configuration


    Add to $HADOOP_HOME/etc/hadoop/yarn-site.xml:


    <property>
     <name>yarn.resourcemanager.address</name>
     <value>fca-vm-arch-proxy1:9002</value>
    </property>
    <property>
     <name>yarn.resourcemanager.scheduler.address</name>
     <value>fca-vm-arch-proxy1:9003</value>
    </property>
    <property>
     <name>yarn.resourcemanager.resource-tracker.address</name>
     <value>fca-vm-arch-proxy1:9004</value>
    </property>
    <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce.shuffle</value>
    </property>
    <property>
     <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>


    Add to $HADOOP_HOME/etc/hadoop/slaves:
    slave1
    slave2


    Copy the CDH directory and .bash_profile from master to slave1 and slave2, set up the environment variables there, and configure passwordless SSH login (this is widely documented elsewhere, so only a sketch follows below).
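
    A minimal sketch of the passwordless SSH setup and the copy, run as the hadoop user on master (assumes no existing key pair):

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a key pair once
    for host in slave1 slave2; do
        ssh-copy-id hadoop@$host                                    # enable passwordless login
        scp -r /home/hadoop/cloudera hadoop@$host:/home/hadoop/     # copy the CDH directory
        scp ~/.bash_profile hadoop@$host:~/                         # copy the environment variables
    done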


    Start HDFS and YARN


    After all of the steps above are done, log in to master as the hadoop user and run, in order:
    hdfs namenode -format
    start-dfs.sh
    start-yarn.sh
    Check with the jps command:
    master should now be running the NameNode, ResourceManager, and SecondaryNameNode processes;
    slave1 and slave2 should be running the DataNode and NodeManager processes.
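
    A sketch of the check (the PIDs are placeholders; only the process names matter):

    # on master
    jps
    # 12001 NameNode
    # 12002 SecondaryNameNode
    # 12003 ResourceManager

    # on slave1 / slave2
    jps
    # 13001 DataNode
    # 13002 NodeManager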


    Hive installation
    Hive only needs to be installed on master, because impala-state-store needs Hive in order to read metadata, and Hive in turn depends on a relational database (MySQL), so MySQL must be installed as well.
    Download Hive:
    http://archive.cloudera.com/cdh4/cdh/4/


    Extract Hive.


    Add to $HOME/.bash_profile:
    export HIVE_HOME=/home/hadoop/hive-0.9.0-cdh4.1.2
    export PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${HIVE_HOME}/bin
    export HIVE_CONF_DIR=$HIVE_HOME/conf
    export HIVE_LIB=$HIVE_HOME/lib


    Run source $HOME/.bash_profile to make the environment variables take effect.
    Put mysql-connector-java-5.1.8.jar into the hive/lib directory (see the sketch below).
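
    A minimal sketch of preparing MySQL for the metastore, matching the connection settings in hive-site.xml below (installing MySQL itself is assumed to be done already; the hive database is created automatically thanks to createDatabaseIfNotExist=true):

    # put the JDBC driver where Hive can load it
    cp mysql-connector-java-5.1.8.jar $HIVE_HOME/lib/

    # allow the root account used in hive-site.xml to connect from the master host
    mysql -u root -p -e "GRANT ALL PRIVILEGES ON hive.* TO 'root'@'master' IDENTIFIED BY 'password'; FLUSH PRIVILEGES;"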


    Add to $HIVE_HOME/conf/hive-site.xml:
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://master:9083</value>
      <description>Thrift uri for the remote metastore. Used by metastore client to connect to remote metastore.</description>
    </property>
    <property>
    <name>hive.metastore.local</name>
    <value>false</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
      <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
      <description>username to use against metastore database</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>password</value>
      <description>password to use against metastore database</description>
    </property>
    <property>
      <name>hive.security.authorization.enabled</name>
      <value>false</value>
      <description>enable or disable the hive client authorization</description>
    </property>
    <property>
      <name>hive.security.authorization.createtable.owner.grants</name>
      <value>ALL</value>
      <description>the privileges automatically granted to the owner whenever a table gets created.
       An example like "select,drop" will grant select and drop privilege to the owner of the table</description>
    </property>
    <property>
    <name>hive.querylog.location</name>
    <value>${user.home}/hive-logs/querylog</value>
    </property>
    Because the Hive metastore runs as a remote (Thrift) service rather than embedded in the client, hive.metastore.local is set to false,
    and hive.metastore.uris tells clients how to reach the remote metastore.


    Verify the installation
    After completing the steps above, check whether Hive was installed successfully.


    Run hive on the master command line and enter "show databases;". Output like the following means Hive is installed correctly:
    >hive
    hive> show databases;
    OK
    Time taken: 18.952 seconds


    Impala installation


    Install impala-state-store on master:
    sudo yum install impala-state-store
    Install impala-shell on master:
    sudo yum install impala-shell


    Configure Impala
    Edit /etc/default/impala:


    IMPALA_STATE_STORE_HOST=192.168.200.114
    IMPALA_STATE_STORE_PORT=24000
    IMPALA_BACKEND_PORT=22000
    IMPALA_LOG_DIR=/var/log/impala


    IMPALA_STATE_STORE_ARGS=" -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}"
    IMPALA_SERVER_ARGS="
        -log_dir=${IMPALA_LOG_DIR}
        -state_store_port=${IMPALA_STATE_STORE_PORT}
        -use_statestore
        -state_store_host=${IMPALA_STATE_STORE_HOST}
        -be_port=${IMPALA_BACKEND_PORT}"


    ENABLE_CORE_DUMPS=false


     LIBHDFS_OPTS=-Djava.library.path=/usr/lib/impala/lib
     MYSQL_CONNECTOR_JAR=/home/hadoop/cloudera/hive/hive-0.9.0-cdh4.1.2/lib/mysql-connector-java-5.1.8.jar
     IMPALA_BIN=/usr/lib/impala/sbin
     IMPALA_HOME=/usr/lib/impala
     HIVE_HOME=/home/hadoop/cloudera/hive/hive-0.9.0-cdh4.1.2
    # HBASE_HOME=/usr/lib/hbase
     IMPALA_CONF_DIR=/usr/lib/impala/conf
     HADOOP_CONF_DIR=/usr/lib/impala/conf
     HIVE_CONF_DIR=/usr/lib/impala/conf
    # HBASE_CONF_DIR=/etc/impala/conf


    Copy Hadoop's core-site.xml and hdfs-site.xml and Hive's hive-site.xml into /usr/lib/impala/conf (see the sketch below).
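
    A sketch of the copy, using the paths defined earlier in this article:

    sudo mkdir -p /usr/lib/impala/conf
    sudo cp $HADOOP_HOME/etc/hadoop/core-site.xml /usr/lib/impala/conf/
    sudo cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml /usr/lib/impala/conf/
    sudo cp $HIVE_HOME/conf/hive-site.xml /usr/lib/impala/conf/
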
    Add to core-site.xml:


    <property>
       <name>dfs.client.read.shortcircuit</name>
       <value>true</value>
    </property>
    <property>
    <name>dfs.client.read.shortcircuit.skip.checksum</name>
    <value>false</value>
    </property>


    Add the following to hdfs-site.xml (and add it to Hadoop's hdfs-site.xml as well):


    <property>
       <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
       <value>true</value>
    </property>
    <property>
       <name>dfs.datanode.data.dir.perm</name>
       <value>750</value>
    </property>
    <property>
       <name>dfs.block.local-path-access.user</name>
       <value>hadoop</value>
    </property>
    <property>
      <name>dfs.client.read.shortcircuit</name>
      <value>true</value>
    </property>
    <property>
        <name>dfs.client.file-block-storage-locations.timeout</name>
        <value>3000</value>
    </property>
    <property>
    <name>dfs.client.use.legacy.blockreader.local</name>
    <value>true</value>
    </property>


    Copy mysql-connector-java-5.1.8.jar to /usr/lib/impala/lib.
    Copy mysql-connector-java-5.1.8.jar to /var/lib/impala.
    Copy /usr/lib/impala/lib/*.so* to $HADOOP_HOME/lib/native/.
    Install on slave1 and slave2:
    sudo yum install impala
    sudo yum install impala-server
    sudo yum install impala-shell
    Copy hive-site.xml, core-site.xml, and hdfs-site.xml from master to slave1 and slave2; the jar copies are the same as on master (a sketch follows below).
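
    A minimal sketch of those copies, run on master (using root@ for scp is an assumption, made so the files can land directly in /usr/lib/impala/conf on the slaves; the connector jar path follows the MYSQL_CONNECTOR_JAR setting above):

    sudo cp $HIVE_HOME/lib/mysql-connector-java-5.1.8.jar /usr/lib/impala/lib/
    sudo cp $HIVE_HOME/lib/mysql-connector-java-5.1.8.jar /var/lib/impala/
    sudo cp /usr/lib/impala/lib/*.so* $HADOOP_HOME/lib/native/

    for host in slave1 slave2; do
        scp /usr/lib/impala/conf/hive-site.xml \
            /usr/lib/impala/conf/core-site.xml \
            /usr/lib/impala/conf/hdfs-site.xml \
            root@$host:/usr/lib/impala/conf/
    done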


    Start the Hive metastore
    On master run: hive --service metastore


    Start the Impala state store
    On master run: statestored -log_dir=/var/log/impala -state_store_port=24000


    Start impalad on slave1 and slave2:
    sudo /etc/init.d/impala-server start


    Check /var/log/impala/statestored.INFO to confirm the state store started successfully; statestored.ERROR contains the errors if it did not (see the sketch below).
    Note: start the Hive metastore and the Impala state store on master first, and only then start impala-server on slave1 and slave2.
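
    A quick way to inspect the logs (paths as configured in /etc/default/impala above):

    tail -n 50 /var/log/impala/statestored.INFO     # state store startup messages on master
    tail -n 50 /var/log/impala/statestored.ERROR    # only present if something went wrong
    tail -n 50 /var/log/impala/impalad.INFO         # impalad log on slave1 / slave2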


    Test the setup


    On master run:
    impala-shell
    [Not connected] > connect slave1;
    [slave1:21000] > use hive;
    Query: use hive
    [slave1:21000] > show tables;
    ok
    If no errors appear, the setup works.
    If you insert data on slave1, you must run "refresh <table name>" on slave2 before the new data becomes visible there; a bare "refresh" (as some online posts suggest) is not enough, the table name has to follow it (see the sketch below).
    If the data is not written through the shell it should stay in sync without this, but I have not tested that.
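
    A sketch of the refresh flow, assuming a hypothetical table named test that was just written to through slave1:

    impala-shell -i slave2 -q "refresh test"               # tell the impalad on slave2 to reload metadata for test
    impala-shell -i slave2 -q "select count(*) from test"  # the new data is now visible on slave2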


    Notes

    Impala may report an error when inserting data:

    hdfsOpenFile(hdfs://fmaster:9000/user/hive/warehouse/test/.2038125373027453036......

    This is a permissions problem: impalad was started with sudo (as root), and while the hadoop user has full read/write permissions on the test table's directory, root does not.

    Solution:

    hdfs dfs -chmod -R 777 /user/hive/warehouse/test
