• Offline E-commerce Data Warehouse (67): Data Quality Monitoring (3) Griffin (4) Installation and Usage (2)


    2.3 Using the Pre-built Griffin Package Directly (Optional)

    2.3.1 Modifying the Configuration Files Inside the jar
    After Griffin has been compiled, you will find two jar packages, service-0.6.0.jar and measure-0.6.0.jar, in the target directories of the Service and Measure modules respectively. Since we are using the pre-built jar packages directly, the configuration files inside service-0.6.0.jar need to be changed to match our environment.
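    Before editing, you can confirm that both jars are in place; a quick check, assuming the Griffin source tree sits under /opt/module/griffin-master as in the later steps:
    [atguigu@hadoop102 ~]$ ls /opt/module/griffin-master/service/target/service-0.6.0*.jar
    [atguigu@hadoop102 ~]$ ls /opt/module/griffin-master/measure/target/measure-0.6.0*.jar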
    1) Open service-0.6.0.jar with an archive tool such as WinRAR (note: open it, do not extract it)
    2) Modify BOOT-INF/classes/application.properties
    # Apache Griffin application name
    spring.application.name=griffin_service
    # MySQL database configuration
    spring.datasource.url=jdbc:mysql://hadoop102:3306/quartz?autoReconnect=true&useSSL=false
    spring.datasource.username=root
    spring.datasource.password=123456
    spring.jpa.generate-ddl=true
    spring.datasource.driver-class-name=com.mysql.jdbc.Driver
    spring.jpa.show-sql=true
    # Hive metastore configuration
    hive.metastore.uris=thrift://hadoop102:9083
    hive.metastore.dbname=default
    hive.hmshandler.retry.attempts=15
    hive.hmshandler.retry.interval=2000ms
    # Hive cache time
    cache.evict.hive.fixedRate.in.milliseconds=900000
    # Kafka schema registry (configure if needed)
    kafka.schema.registry.url=http://hadoop102:8081
    # Update job instance state at regular intervals
    jobInstance.fixedDelay.in.milliseconds=60000
    # Expired time of job instance: 7 days, i.e. 604800000 milliseconds. The time unit only supports milliseconds
    jobInstance.expired.milliseconds=604800000
    # Schedule the predicate job every 5 minutes and repeat 12 times at most
    # Interval time unit s:second m:minute h:hour d:day, only these four units are supported
    predicate.job.interval=5m
    predicate.job.repeat.count=12
    # External properties directory location
    external.config.location=
    # External BATCH or STREAMING env
    external.env.location=
    # Login strategy ("default" or "ldap")
    login.strategy=default
    # LDAP
    ldap.url=ldap://hostname:port
    ldap.email=@example.com
    ldap.searchBase=DC=org,DC=example
    ldap.searchPattern=(sAMAccountName={0})
    # HDFS default name
    fs.defaultFS=
    # Elasticsearch
    elasticsearch.host=hadoop102
    elasticsearch.port=9200
    elasticsearch.scheme=http
    # elasticsearch.user = user
    # elasticsearch.password = password
    # Livy
    livy.uri=http://hadoop102:8998/batches
    # YARN url
    yarn.uri=http://hadoop103:8088
    # Griffin event listener
    internal.event.listeners=GriffinJobEventHook
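    The datasource above assumes a MySQL database named quartz already exists on hadoop102 and holds the Quartz scheduler tables. A minimal sketch of preparing it (the table script ships with the Griffin source under service/src/main/resources; the exact path and file name may differ in your copy):
    [atguigu@hadoop102 ~]$ mysql -uroot -p123456 -e "create database if not exists quartz default charset utf8;"
    [atguigu@hadoop102 ~]$ mysql -uroot -p123456 quartz < /opt/module/griffin-master/service/src/main/resources/Init_quartz_mysql_innodb.sql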
    3) Modify BOOT-INF/classes/sparkProperties.json
    {
        "file": "hdfs://hadoop102:9000/griffin/griffin-measure.jar",
        "className": "org.apache.griffin.measure.Application",
        "name": "griffin",
        "queue": "default",
        "numExecutors": 2,
        "executorCores": 1,
        "driverMemory": "1g",
        "executorMemory": "1g",
        "conf": {
            "spark.yarn.dist.files": "hdfs://hadoop102:9000/home/spark_conf/hive-site.xml"
        },
        "files": []
    }
    4) Modify BOOT-INF/classes/hive-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
            <description>JDBC connect string for a JDBC metastore</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value>
            <description>Driver class name for a JDBC metastore</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>root</value>
            <description>username to use against metastore database</description>
        </property>
        <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>123456</value>
            <description>password to use against metastore database</description>
        </property>
        <property>
            <name>hive.metastore.warehouse.dir</name>
            <value>/user/hive/warehouse</value>
            <description>location of default database for the warehouse</description>
        </property>
        <property>
            <name>hive.cli.print.header</name>
            <value>true</value>
        </property>
        <property>
            <name>hive.cli.print.current.db</name>
            <value>true</value>
        </property>
        <property>
            <name>hive.metastore.schema.verification</name>
            <value>false</value>
        </property>
        <property>
            <name>datanucleus.schema.autoCreateAll</name>
            <value>true</value>
        </property>
        <!--
        <property>
            <name>hive.execution.engine</name>
            <value>tez</value>
        </property>
        -->
        <property>
            <name>hive.metastore.uris</name>
            <value>thrift://hadoop102:9083</value>
        </property>
    </configuration>
    5) Modify BOOT-INF/classes/application-mysql.properties
    # Data Access Properties
    spring.datasource.url=jdbc:mysql://192.168.1.102:3306/quartz?autoReconnect=true&useSSL=false
    spring.datasource.username=root 
    spring.datasource.password=123456 
    spring.jpa.generate-ddl=true 
    spring.datasource.driver-class-name=com.mysql.jdbc.Driver
    spring.jpa.show-sql=true 
    spring.jpa.hibernate.ddl-auto=update
    6) Modify BOOT-INF/classes/env/env_batch.json
    {
        "spark": {
            "log.level": "INFO"
        },
        "sinks": [
            {
                "type": "CONSOLE",
                "config": {
                    "max.log.lines": 10
                }
            },
            {
                "type": "HDFS",
                "config": {
                    "path": "hdfs://hadoop102:9000/griffin/persist",
                    "max.persist.lines": 10000,
                    "max.lines.per.file": 10000
                }
            },
            {
                "type": "ELASTICSEARCH",
                "config": {
                    "method": "post",
                    "api": "http://hadoop102:9200/griffin/accuracy",
                    "connection.timeout": "1m",
                    "retry": 10
                }
            }
        ],
        "griffin.checkpoint": []
    }
    7) Modify BOOT-INF/classes/env/env_streaming.json (only needed if you run streaming measures; note that "hosts": "zk:2181" below is a placeholder and should point to your actual ZooKeeper address, for example hadoop102:2181)
    {
        "spark": {
            "log.level": "WARN",
            "checkpoint.dir": "hdfs:///griffin/checkpoint/${JOB_NAME}",
            "init.clear": true,
            "batch.interval": "1m",
            "process.interval": "5m",
            "config": {
                "spark.default.parallelism": 4,
                "spark.task.maxFailures": 5,
                "spark.streaming.kafkaMaxRatePerPartition": 1000,
                "spark.streaming.concurrentJobs": 4,
                "spark.yarn.maxAppAttempts": 5,
                "spark.yarn.am.attemptFailuresValidityInterval": "1h",
                "spark.yarn.max.executor.failures": 120,
                "spark.yarn.executor.failuresValidityInterval": "1h",
                "spark.hadoop.fs.hdfs.impl.disable.cache": true
            }
        },
        "sinks": [
            {
                "type": "CONSOLE",
                "config": {
                    "max.log.lines": 100
                }
            },
            {
                "type": "HDFS",
                "config": {
                    "path": "hdfs://hadoop102:9000/griffin/persist",
                    "max.persist.lines": 10000,
                    "max.lines.per.file": 10000
                }
            },
            {
                "type": "ELASTICSEARCH",
                "config": {
                    "method": "post",
                    "api": "http://hadoop102:9200/griffin/accuracy"
                }
            }
        ],
        "griffin.checkpoint": [
            {
                "type": "zk",
                "config": {
                    "hosts": "zk:2181",
                    "namespace": "griffin/infocache",
                    "lock.path": "lock",
                    "mode": "persist",
                    "init.clear": true,
                    "close.clear": false
                }
            }
        ]
    }
    2.4 Uploading and Running Griffin
    2.4.1 Renaming and Uploading to HDFS
    After the build command finishes, you will see the two jar packages service-0.6.0.jar and measure-0.6.0.jar in the target directories of the Service and Measure modules respectively.
    1) Rename /opt/module/griffin-master/measure/target/measure-0.6.0-SNAPSHOT.jar
    [atguigu@hadoop102 measure]$ mv measure-0.6.0-SNAPSHOT.jar griffin-measure.jar
    2) Upload griffin-measure.jar to an HDFS directory
     
    [atguigu@hadoop102 measure]$ hadoop fs -mkdir /griffin/
    [atguigu@hadoop102 measure]$ hadoop fs -put griffin-measure.jar /griffin/
    Note: this step is needed because when Spark runs the job on the YARN cluster, it loads griffin-measure.jar from the /griffin directory on HDFS; without it, a "class org.apache.griffin.measure.Application not found" error will occur.
    3) Upload hive-site.xml to the /home/spark_conf/ path on HDFS
    [atguigu@hadoop102 ~]$ hadoop fs -mkdir -p /home/spark_conf/ 
    [atguigu@hadoop102 ~]$ hadoop fs -put /opt/module/hive/conf/hive-site.xml /home/spark_conf/
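    To confirm that both uploads landed where sparkProperties.json expects them, a quick check:
    [atguigu@hadoop102 ~]$ hadoop fs -ls /griffin/ /home/spark_conf/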
    2.4.2 Running Griffin
    1) Make sure the other services have been started (a quick connectivity check is sketched after this list)
    ① Start HDFS & YARN:
    [atguigu@hadoop102 module]$ /opt/module/hadoop-2.7.2/sbin/start-dfs.sh 
    [atguigu@hadoop103 module]$ /opt/module/hadoop-2.7.2/sbin/start-yarn.sh
    ② Start the Elasticsearch service:
    [atguigu@hadoop102 module]$ nohup /opt/module/elasticsearch-5.2.2/bin/elasticsearch &
    ③ Start the Hive services:
    [atguigu@hadoop102 hive]$ nohup /opt/module/hive/bin/hive --service metastore & 
    [atguigu@hadoop102 hive]$ nohup /opt/module/hive/bin/hive --service hiveserver2 &
    ④ Start the Livy service:
    [atguigu@hadoop102 livy]$ /opt/module/livy/bin/livy-server start
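    Optionally, confirm that the dependent services answer at the addresses configured in application.properties; a minimal check, assuming the default ports used above:
    [atguigu@hadoop102 ~]$ curl http://hadoop102:9200            # Elasticsearch
    [atguigu@hadoop102 ~]$ curl http://hadoop102:8998/batches    # Livy batch API
    [atguigu@hadoop102 ~]$ curl http://hadoop103:8088/cluster    # YARN ResourceManager web UI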
    2) Go to the /opt/module/griffin-master/service/target/ directory and run service-0.6.0-SNAPSHOT.jar
    Foreground start (log output is printed to the console):
    [atguigu@hadoop102 target]$ java -jar /opt/module/griffin/service-0.6.0-SNAPSHOT.jar
    Background start (run in the background and write the log to service.out):
    [atguigu@hadoop102 ~]$ nohup java -jar service-0.6.0-SNAPSHOT.jar>service.out 2>&1 &
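    To verify that the service came up, you can follow the log and then probe port 8080; a simple check, assuming the defaults above:
    [atguigu@hadoop102 ~]$ tail -f service.out
    [atguigu@hadoop102 ~]$ curl -s -o /dev/null -w "%{http_code}\n" http://hadoop102:8080/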
    2.4.3 Accessing in the Browser
    Open http://hadoop102:8080 in a browser. There is no default username or password (both are left empty).

    This article is from cnblogs (博客园), by 秋华. Please credit the original link when reposting: https://www.cnblogs.com/qiu-hua/p/13947941.html
