• Installing a Kylin 2.6.0 cluster on CentOS 7


    1. Environment preparation

    Component versions used in this guide:

    ZooKeeper 3.4.12

    MySQL 5.7

    Hive 2.3.4

    Hadoop 2.7.3

    JDK 1.8

    HBase 1.3.3
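    A quick way to confirm that each node really runs the versions listed above is to compare every tool's version output with the expected number. The helpers below are a hypothetical sketch (check_component and version_matches are names made up for illustration). They assume the usual command-line tools are on PATH and only test whether the expected version string appears in the output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the version output ($2) contains
# the expected version string ($1).
version_matches() {
    case "$2" in
        *"$1"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Hypothetical helper: run a version command and print OK, MISMATCH or MISSING.
# Usage: check_component <label> <expected-version> <command> [args...]
check_component() {
    local name="$1" expected="$2" out
    shift 2
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$name: MISSING ($1 not on PATH)"
        return 1
    fi
    # capture stderr too: java -version, for example, writes there
    out=$("$@" 2>&1)
    if version_matches "$expected" "$out"; then
        echo "$name: OK"
    else
        echo "$name: MISMATCH (expected $expected)"
        return 1
    fi
}
```

    For example: check_component Hadoop 2.7.3 hadoop version, or check_component JDK 1.8 java -version.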

    2. Cluster plan

    IP address      Hostname  Roles
    192.168.1.101   palo101   Hadoop NameNode, Hadoop DataNode, YARN NodeManager, ZooKeeper, Hive, HBase Master, HBase RegionServer
    192.168.1.102   palo102   Hadoop SecondaryNameNode, Hadoop DataNode, YARN NodeManager, YARN ResourceManager, ZooKeeper, Hive, HBase Master, HBase RegionServer
    192.168.1.103   palo103   Hadoop DataNode, YARN NodeManager, ZooKeeper, Hive, HBase RegionServer, MySQL
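    All nodes must be able to resolve each other's hostnames. Assuming this is done through /etc/hosts rather than DNS, the entries follow directly from the table above. hosts_entries is a hypothetical helper that prints them from ip:hostname pairs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print /etc/hosts lines for the cluster plan.
# Each argument has the form "ip:hostname".
hosts_entries() {
    local pair
    for pair in "$@"; do
        # text before the first colon is the IP, the rest is the hostname
        printf '%s\t%s\n' "${pair%%:*}" "${pair#*:}"
    done
}
```

    For example: hosts_entries 192.168.1.101:palo101 192.168.1.102:palo102 192.168.1.103:palo103, then append the output to /etc/hosts on every node where the entries are not present yet.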

    3. Download Kylin 2.6.0

    wget http://mirrors.tuna.tsinghua.edu.cn/apache/kylin/apache-kylin-2.6.0/apache-kylin-2.6.0-bin-hbase1x.tar.gz   # download the Kylin 2.6.0 binary package
    tar -xzvf apache-kylin-2.6.0-bin-hbase1x.tar.gz          # extract the package
    mv apache-kylin-2.6.0-bin apache-kylin-2.6.0             # rename the extracted directory (drop the trailing -bin)
    mkdir -p /usr/local/kylin/                               # create the target directory
    mv apache-kylin-2.6.0 /usr/local/kylin/                  # move the Kylin 2.6.0 directory into /usr/local/kylin

    4. Add system environment variables

    vim /etc/profile

    Append the following to the end of the file:

    #kylin
    export KYLIN_HOME=/usr/local/kylin/apache-kylin-2.6.0
    export KYLIN_CONF_HOME=$KYLIN_HOME/conf
    export PATH=$PATH:$KYLIN_HOME/bin
    export tomcat_root=$KYLIN_HOME/tomcat   # lowercase name, referenced in kylin.sh below
    # assumes HIVE_HOME and HCAT_HOME (usually $HIVE_HOME/hcatalog) are already exported
    export hive_dependency=$HIVE_HOME/conf:$HIVE_HOME/lib/*:$HCAT_HOME/share/hcatalog/hive-hcatalog-core-2.3.4.jar   # lowercase name, referenced in kylin.sh below

    Save and exit with :wq, then run source /etc/profile to make the variables take effect.
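    Before moving on, it is worth checking that the directory variables really point somewhere. check_dirs is a hypothetical helper: for each variable name it reports an empty value or a missing directory, and returns non-zero if anything is wrong.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each variable NAME given, report if it is empty
# or does not name an existing directory. Returns non-zero on any problem.
check_dirs() {
    local name value rc=0
    for name in "$@"; do
        # indirect lookup of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            rc=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value does not exist"
            rc=1
        fi
    done
    return $rc
}
```

    For example, after source /etc/profile: check_dirs KYLIN_HOME KYLIN_CONF_HOME tomcat_root HIVE_HOME HCAT_HOME. HCAT_HOME is worth including because hive_dependency silently points to a wrong path when it is unset.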

    5. Configure Kylin

    5.1 Configure $KYLIN_HOME/bin/kylin.sh

    vim $KYLIN_HOME/bin/kylin.sh

    Add the following at the top of the file, just below the #!/bin/bash line:

    export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX

    This puts $hive_dependency on the classpath. Without the Hive dependencies, two problems show up later:
    a) loading Hive tables from the Kylin web UI fails;
    b) step 2 of a cube build fails with an org/apache/hadoop/hive/conf/HiveConf error.
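    Because the export belongs right below the shebang line, a small helper can make the edit repeatable across nodes. add_classpath_prefix is a hypothetical sketch: it keeps a .bak copy, inserts the line after a leading #! line (or at the very top if there is none), and does nothing if the line is already present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: insert the HBASE_CLASSPATH_PREFIX export into kylin.sh
# once, directly below the shebang line.
add_classpath_prefix() {
    local script="$1"
    # single quotes: the variables must stay unexpanded in the file
    local line='export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX'
    # -x: whole-line match, -F: fixed string (no regex)
    if grep -qxF "$line" "$script"; then
        return 0
    fi
    cp "$script" "$script.bak"
    if head -n 1 "$script.bak" | grep -q '^#!'; then
        # shebang first, then the export, then the rest of the script
        { head -n 1 "$script.bak"; printf '%s\n' "$line"; tail -n +2 "$script.bak"; } > "$script"
    else
        { printf '%s\n' "$line"; cat "$script.bak"; } > "$script"
    fi
}
```

    Run it once per node: add_classpath_prefix $KYLIN_HOME/bin/kylin.sh. Writing through a redirect keeps the script's existing execute permission.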

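    5.2 Hadoop compression settings

    Snappy can only be used if Hadoop's native library was built with Snappy support, which may mean recompiling Hadoop from source beforehand. Snappy gives a reasonable compression ratio, so both the intermediate and the final results of cube computation take less storage.
    The Hadoop used in this example does not support Snappy, and leaving Snappy compression enabled would make later cube builds fail, so compression is turned off as follows.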
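    Whether the installed native library supports Snappy can be checked with hadoop checknative -a, which prints one line per library, for example "snappy:  false". snappy_supported is a hypothetical helper that reads that output on stdin; the exact line format it expects is an assumption based on Hadoop 2.x output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hadoop checknative -a` output on stdin and
# succeed only if the snappy line reports "true".
snappy_supported() {
    awk '
        $1 == "snappy:" { found = ($2 == "true") }
        END { exit (found ? 0 : 1) }
    '
}
```

    For example: hadoop checknative -a 2>&1 | snappy_supported && echo "snappy available". If it fails, keep compression disabled as described here.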

    vim  $KYLIN_HOME/conf/kylin_job_conf.xml

    In this file, set mapreduce.map.output.compress and mapreduce.output.fileoutputformat.compress to false:

        <property>
            <name>mapreduce.map.output.compress</name>
            <value>false</value>
            <description>Compress map outputs</description>
        </property>
        <property>
            <name>mapreduce.output.fileoutputformat.compress</name>
            <value>false</value>
            <description>Compress the output of a MapReduce job</description>
        </property>
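    After editing, the effective values can be listed to make sure nothing still enables compression. list_properties is a hypothetical, line-based sketch (not a real XML parser) that assumes each <name> and <value> sits on its own line, as in the snippet above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "name=value" for every <property> in a
# Hadoop-style XML file. Line-based: <name> and <value> must each be
# on their own line.
list_properties() {
    awk '
        /<name>/ {
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
        }
        /<value>/ {
            value = $0
            sub(/.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*/, "", value)
            print name "=" value
        }
    ' "$1"
}
```

    For example: list_properties $KYLIN_HOME/conf/kylin_job_conf.xml | grep compress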

    One more compression setting needs to be changed:

    vim   $KYLIN_HOME/conf/kylin.properties

    Set kylin.storage.hbase.compression-codec to none, or comment it out:

    #kylin.storage.hbase.compression-codec=none

    5.3 Main configuration: $KYLIN_HOME/conf/kylin.properties

    vim   $KYLIN_HOME/conf/kylin.properties

    Change it as follows:

    ## The metadata store in hbase
    kylin.metadata.url=kylin_metadata@hbase
    
    ## metadata cache sync retry times
    kylin.metadata.sync-retries=3
    
    ## Working folder in HDFS, better be qualified absolute path, make sure user has the right permission to this directory
    kylin.env.hdfs-working-dir=/kylin
    
    ## kylin zk base path
    kylin.env.zookeeper-base-path=/kylin
    
    ## DEV|QA|PROD. DEV will turn on some dev features, QA and PROD has no difference in terms of functions.
    #kylin.env=DEV
    
    ## Kylin server mode, valid value [all, query, job]
    ## The master node uses "all" and the other nodes use "query"; this is the only setting that differs between nodes
    kylin.server.mode=all
    
    ## List of web servers in use, this enables one web server instance to sync up with other servers.
    kylin.server.cluster-servers=192.168.1.101:7070,192.168.1.102:7070,192.168.1.103:7070
    
    ## Display timezone on UI,format like[GMT+N or GMT-N]
    ## GMT+8 = China Standard Time
    kylin.web.timezone=GMT+8
    
    ## Timeout value for the queries submitted through the Web UI, in milliseconds
    kylin.web.query-timeout=300000
    
    ## Max count of concurrent jobs running
    kylin.job.max-concurrent-jobs=10
    
    #### ENGINE ###
    ## Time interval to check hadoop job status, in seconds
    kylin.engine.mr.yarn-check-interval-seconds=10
    
    ## Hive database name for putting the intermediate flat tables
    kylin.source.hive.database-for-flat-table=kylin_flat_db
    
    ## The percentage of the sampling, default 100%
    kylin.job.cubing.inmem.sampling.percent=100
    
    ## Max job retry on error, default 0: no retry
    kylin.job.retry=0
    
    ## Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
    ## No compression
    kylin.storage.hbase.compression-codec=none
    
    ## The cut size for hbase region, in GB.
    kylin.storage.hbase.region-cut-gb=5
    
    ## The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster.
    ## Set 0 to disable this optimization.
    kylin.storage.hbase.hfile-size-gb=2
    
    ## The storage for final cube file in hbase
    kylin.storage.url=hbase
    
    ## The prefix of hbase table
    kylin.storage.hbase.table-name-prefix=KYLIN_
    
    ## The namespace for hbase storage
    kylin.storage.hbase.namespace=default
    
    ### (added) job jar used by Kylin's MR jobs and the HBase coprocessor jar, to improve performance
    kylin.job.jar=/usr/local/kylin/apache-kylin-2.6.0/lib/kylin-job-2.6.0.jar
    kylin.coprocessor.local.jar=/usr/local/kylin/apache-kylin-2.6.0/lib/kylin-coprocessor-2.6.0.jar
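    java.util.Properties keeps trailing spaces as part of a value, so a line such as kylin.web.timezone=GMT+8 followed by spaces does not have the value GMT+8 unless the reader trims it. The two hypothetical helpers below lint a properties file for trailing whitespace and read back a single key; they only handle plain key=value lines (no line continuations, no ':' separators).

```shell
#!/usr/bin/env bash
# Hypothetical lint: print "file:line: key" for every non-comment
# key=value line whose value ends in a space or tab.
lint_trailing_ws() {
    awk -F= '
        /^[ \t]*[#!]/ { next }                      # skip comment lines
        NF > 1 && /[ \t]+$/ { print FILENAME ":" FNR ": " $1 }
    ' "$@"
}

# Hypothetical reader: print the value of key $2 in properties file $1.
# Only lines starting exactly with "key=" count; the last one wins,
# as with java.util.Properties.
get_prop() {
    awk -v key="$2" '
        index($0, key "=") == 1 { value = substr($0, length(key) + 2) }
        END { print value }
    ' "$1"
}
```

    For example: lint_trailing_ws $KYLIN_HOME/conf/kylin.properties, and get_prop $KYLIN_HOME/conf/kylin.properties kylin.server.mode.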

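    5.4 Copy the configured Kylin to the other two machines

    The environment variables from step 4 must be added to /etc/profile on those machines as well.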

    scp -r /usr/local/kylin/  192.168.1.102:/usr/local
    scp -r /usr/local/kylin/  192.168.1.103:/usr/local

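    5.5 On 192.168.1.102 and 192.168.1.103, change kylin.server.mode to query

    vim   $KYLIN_HOME/conf/kylin.properties

    Change the entry as follows. A .properties file has no trailing comments (anything after the value becomes part of it), so the comment goes on its own line:

    ## Query node: the only setting that differs from the master node, which uses "all"
    kylin.server.mode=query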
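    Editing the same line by hand on two nodes is easy to get wrong. set_server_mode is a hypothetical helper that rewrites the kylin.server.mode line (or appends one if there is no uncommented entry) and rejects values other than all, query and job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set kylin.server.mode in a kylin.properties file.
set_server_mode() {
    local file="$1" mode="$2" tmp rc
    case "$mode" in
        all|query|job) ;;
        *) echo "invalid mode: $mode" >&2; return 1 ;;
    esac
    if grep -q '^kylin\.server\.mode=' "$file"; then
        tmp=$(mktemp) || return 1
        # replace the whole line, so nothing after the value survives
        sed "s/^kylin\.server\.mode=.*/kylin.server.mode=$mode/" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        rc=$?
        rm -f "$tmp"
        return $rc
    fi
    # no uncommented entry yet: append one
    printf 'kylin.server.mode=%s\n' "$mode" >> "$file"
}
```

    On palo102 and palo103: set_server_mode $KYLIN_HOME/conf/kylin.properties query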

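    6. Start Kylin

    6.1 Prerequisite: start the dependent services first

    a) Start ZooKeeper (on every node)

    $ZOO_KEEPER_HOME/bin/zkServer.sh start

    b) Start Hadoop (on the master node). In Hadoop 2.x the start scripts are in sbin/, not bin/. start-all.sh is deprecated in favor of start-dfs.sh plus start-yarn.sh, and start-yarn.sh launches the ResourceManager on the host where it runs, so with this cluster plan YARN should be started on palo102.

    $HADOOP_HOME/sbin/start-all.sh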

    c) Start the JobHistoryServer (on the master node)

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

    d) Start the Hive metastore service

    nohup $HIVE_HOME/bin/hive --service metastore > /dev/null 2>&1 &

    e) Start the HBase cluster (on the master node)

    $HBASE_HOME/bin/start-hbase.sh

    After startup, the running Java processes are:

    192.168.1.101

    [root@palo101 apache-kylin-2.6.0]# jps
    62403 NameNode          #hdfs NameNode
    31013 NodeManager       #yarn NodeManager
    22325 Kafka      
    54217 QuorumPeerMain    #zookeeper
    7274 Jps
    62589 DataNode          #hadoop datanode
    28895 HRegionServer     #hbase region server
    8440 HMaster            #hbase master

    192.168.1.102

    [root@palo102 ~]# jps
    47474 QuorumPeerMain    #zookeeper
    15203 NodeManager       #yarn NodeManager
    15061 ResourceManager   #yarn ResourceManager
    49877 Jps
    6694 HRegionServer      #hbase region server
    7673 Kafka
    37517 SecondaryNameNode #hdfs SecondaryNameNode
    37359 DataNode          #hadoop datanode

    192.168.1.103

    [root@palo103 ~]# jps
    1185 RunJar             #hive metastore
    62404 NodeManager       #yarn NodeManager
    47365 HRegionServer     #hbase region server
    62342 QuorumPeerMain    #zookeeper
    20952 ManagerBootStrap  
    52440 Kafka
    31801 RunJar            #hive thrift server
    47901 DataNode          #hadoop datanode
    36494 Jps
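    Comparing jps output with the cluster plan by eye is error-prone. missing_daemons is a hypothetical helper that reads jps output on stdin and prints each expected process name that is not running; the names to pass are the jps class names shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps` output ("<pid> <MainClass>") on stdin and
# print every expected daemon (arguments) that is not in it.
# Returns non-zero if anything is missing.
missing_daemons() {
    local running daemon rc=0
    running=$(awk '{ print $2 }')
    for daemon in "$@"; do
        if ! printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "$daemon"
            rc=1
        fi
    done
    return $rc
}
```

    For example, on palo101: jps | missing_daemons NameNode DataNode NodeManager QuorumPeerMain HMaster HRegionServer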

    6.2 Verify the configuration

     $KYLIN_HOME/bin/check-env.sh
    [root@palo101 bin]#  $KYLIN_HOME/bin/check-env.sh
    Retrieving hadoop conf dir...
    KYLIN_HOME is set to /usr/local/kylin/apache-kylin-2.6.0
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

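    The SLF4J "multiple bindings" lines above are warnings, not errors.
    Hive dependency check: find-hive-dependency.sh
    HBase dependency check: find-hbase-dependency.sh
    All dependency checks: check-env.sh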

    6.3 Start Kylin by running the following command on every node

     $KYLIN_HOME/bin/kylin.sh start

    If startup fails with the following error:

    Failed to find metadata store by url: kylin_metadata@hbase

    Solution:

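    1) Make the hbase.rootdir property in $HBASE_HOME/conf/hbase-site.xml point to the same HDFS as the fs.defaultFS property in $HADOOP_HOME/etc/hadoop/core-site.xml (same scheme, host and port).

    2) Open zkCli.sh from ZooKeeper's bin directory, delete the /hbase znode, then restart HBase.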
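    For fix 1), hbase.rootdir must live on the file system named by fs.defaultFS, i.e. share its scheme, host and port. same_filesystem is a hypothetical helper comparing the two values once they have been copied out of the two XML files; the host name and port 9000 in the example are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "scheme://authority" of a URI such as
# hdfs://palo101:9000/hbase (everything before the first "/" after "//").
fs_authority() {
    local scheme="${1%%://*}" rest="${1#*://}"
    printf '%s://%s\n' "$scheme" "${rest%%/*}"
}

# Succeed if hbase.rootdir ($2) is on the file system given by fs.defaultFS ($1).
same_filesystem() {
    [ "$(fs_authority "$1")" = "$(fs_authority "$2")" ]
}
```

    For example, same_filesystem hdfs://palo101:9000 hdfs://palo101:9000/hbase succeeds. For fix 2), stop HBase first; in the zkCli.sh of ZooKeeper 3.4.x the recursive delete command is rmr /hbase (deleteall only exists from 3.5 on).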

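    6.4 Log in to Kylin

    Open http://192.168.1.101:7070/kylin. The other nodes serve the web UI too; just use the corresponding IP.

    The default username/password is ADMIN/KYLIN.

    [Screenshot: Kylin main page after login]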

    7. FAQ

    7.1 If you see an error like the following

    WARNING: Failed to process JAR
    [jar:file:/home/hadoop-2.7.3/contrib/capacity-scheduler/.jar!/] for

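    This is a minor, harmless bug. Most likely contrib/capacity-scheduler contains no jars, so the pattern *.jar in the loop is not expanded and a path that does not exist ends up on HADOOP_CLASSPATH. Comment out the following loop in ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh: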

     vim ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
    #for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
    #  if [ "$HADOOP_CLASSPATH" ]; then
    #    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
    #  else
    #    export HADOOP_CLASSPATH=$f
    #  fi
    #done
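    Instead of commenting the loop out, one can skip entries that do not exist: when a glob matches nothing, the shell leaves the pattern itself in the loop variable, and a simple existence test filters it out. append_jars is a hypothetical function version of that variant, not Hadoop's stock script.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the hadoop-env.sh loop: append every jar in
# directory $1 to the classpath $2 and print the result, skipping the
# literal "*.jar" left behind when the directory has no jars.
append_jars() {
    local dir="$1" classpath="$2" f
    for f in "$dir"/*.jar; do
        [ -e "$f" ] || continue     # unmatched glob: nothing to add
        if [ -n "$classpath" ]; then
            classpath="$classpath:$f"
        else
            classpath="$f"
        fi
    done
    printf '%s\n' "$classpath"
}
```

    In hadoop-env.sh itself the same effect comes from adding [ -e "$f" ] || continue as the first line of the original loop body.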

    7.2 If you see Caused by: java.lang.ClassCastException: com.fasterxml.jackson.datatype.joda.JodaModule cannot be cast to com.fasterxml.jackson.databind.Module

    The cause is a jar version conflict: Hive ships jackson-datatype-joda-2.4.6.jar, while Kylin uses jackson-databind-2.9.5.jar.

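    Solution:

    mv $HIVE_HOME/lib/jackson-datatype-joda-2.4.6.jar $HIVE_HOME/lib/jackson-datatype-joda-2.4.6.jarback

    This keeps Hive's copy of the jar off the classpath: a Java classpath wildcard such as $HIVE_HOME/lib/* only picks up files ending in .jar, so the renamed file is ignored. For details see https://issues.apache.org/jira/browse/KYLIN-3129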
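    To see which jackson jars each side actually ships, list_jackson_jars prints the jackson-*.jar files found in each given lib directory. It is a hypothetical helper, and where Kylin's own jars live on disk (for example an unpacked web application under $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib) is an assumption to verify on your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<dir><TAB><jar>" for every jackson-*.jar
# directly inside each directory given as an argument.
list_jackson_jars() {
    local dir f
    for dir in "$@"; do
        for f in "$dir"/jackson-*.jar; do
            [ -e "$f" ] || continue     # no match: skip the literal pattern
            printf '%s\t%s\n' "$dir" "${f##*/}"
        done
    done
}
```

    For example: list_jackson_jars $HIVE_HOME/lib. A renamed file such as jackson-datatype-joda-2.4.6.jarback no longer matches jackson-*.jar, just as it no longer matches the classpath wildcard.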

    7.3 If you see Failed to load keystore type JKS with path conf/.keystore due to (No such file or directory)

    Solution:

    Open apache-kylin-2.6.0/tomcat/conf/server.xml and delete (or comment out) the HTTPS connector:

            <!--
            <Connector port="7443" protocol="org.apache.coyote.http11.Http11Protocol"
                       maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
                       keystoreFile="conf/.keystore" keystorePass="changeit"
                       clientAuth="false" sslProtocol="TLS" />
             -->

    8. Getting started

    8.1 Load the official sample data

     $KYLIN_HOME/bin/sample.sh

    If the output ends with "Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect", the sample cube was created successfully.

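    8.2 Restart Kylin or reload the metadata so that the sample takes effect

    This example reloads the metadata (Web UI => System => Reload Metadata).

    8.3 Open Hive and check the Kylin sample tables

    $HIVE_HOME/bin/hive        # open the Hive shell client
    hive>show databases;       # list the Hive databases
    hive>use kylin_flat_db;    # switch to Kylin's Hive database
    hive>show tables;          # list all tables in that database

    Output: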

    [druid@palo101 kafka_2.12-2.1.0]$ $HIVE_HOME/bin/hive
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/workspace/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/workspace/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in file:/home/workspace/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    OK
    default
    dw_sales
    kylin_flat_db
    ods_sales
    Time taken: 1.609 seconds, Fetched: 4 row(s)
    hive> use kylin_flat_db;
    OK
    Time taken: 0.036 seconds
    hive> show tables;
    OK
    kylin_account
    kylin_cal_dt
    kylin_category_groupings
    kylin_country
    kylin_sales
    Time taken: 0.321 seconds, Fetched: 5 row(s)
    hive> 

    Now check HBase:

    [druid@palo101 kafka_2.12-2.1.0]$ hbase shell
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
    HBase Shell; enter 'help<RETURN>' for list of supported commands.
    Type "exit<RETURN>" to leave the HBase Shell
    Version 1.3.3, rfd0d55b1e5ef54eb9bf60cce1f0a8e4c1da073ef, Sat Nov 17 21:43:34 CST 2018
    
    hbase(main):001:0> list
    TABLE
    dev
    kylin_metadata
    test
    3 row(s) in 0.3180 seconds
    
    => ["dev", "kylin_metadata", "test"]

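    HBase now has a table named kylin_metadata, Kylin's metadata store (kylin.metadata.url=kylin_metadata@hbase), which holds the definitions of the sample project and cube.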

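    8.4 Build the cube

    Refresh http://192.168.1.101:7070/kylin: a new project, learn_kylin, has appeared.

    Select the sample cube kylin_sales_cube (based on kylin_sales_model) and build it.

    Build progress can be followed on the Monitor tab.

    After a successful build, the cube shows storage information (empty before the build) and the corresponding table can be found in HBase. The cube status also changes to READY, meaning it can be queried.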

    8.5 Query in Kylin
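    Besides the Insight tab of the web UI, queries can be sent to Kylin's REST endpoint POST /kylin/api/query with HTTP basic authentication. A minimal sketch, assuming the default ADMIN/KYLIN account, the learn_kylin sample project and curl on the client; query_payload and kylin_query are hypothetical helper names, and only single-line SQL is handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the JSON body for Kylin's query API,
# escaping backslashes and double quotes in the (single-line) SQL text.
query_payload() {
    local sql="$1" project="$2"
    sql=$(printf '%s' "$sql" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"sql": "%s", "project": "%s", "offset": 0, "limit": 50, "acceptPartial": false}\n' \
        "$sql" "$project"
}

# Hypothetical helper: send the query to a Kylin node and print the JSON reply.
# Assumes the default ADMIN/KYLIN account.
kylin_query() {
    local host="$1" sql="$2" project="${3:-learn_kylin}"
    curl -s -X POST --user ADMIN:KYLIN \
        -H "Content-Type: application/json;charset=utf-8" \
        -d "$(query_payload "$sql" "$project")" \
        "http://$host:7070/kylin/api/query"
}
```

    For example: kylin_query 192.168.1.101 "select part_dt, sum(price) as total_sold from kylin_sales group by part_dt order by part_dt". Any node can answer, since the query nodes serve queries as well.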

    This completes the Kylin cluster deployment.

  • Original article: https://www.cnblogs.com/lenmom/p/10329956.html