• Importing Hive data into Elasticsearch


    Preparing the Elasticsearch JAR

    Copy elasticsearch-hadoop-5.5.1.jar to every node, for example into Hive's lib directory:

    /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hive/lib/elasticsearch-hadoop-5.5.1.jar
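
    Copying the JAR to every node by hand is easy to get wrong, so a small loop can help. The following is only a sketch: the function name print_copy_commands and the hostnames in the usage note are hypothetical, and it assumes passwordless scp between the nodes. It prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print one scp command per host for copying the JAR
# into a remote directory. Pipe the output to `sh` to actually run it.
# $1 = local JAR path, $2 = remote directory, remaining args = hostnames
print_copy_commands() {
    jar=$1
    dest=$2
    shift 2
    for host in "$@"; do
        printf 'scp %s %s:%s/\n' "$jar" "$host" "$dest"
    done
}
```

    For example, print_copy_commands elasticsearch-hadoop-5.5.1.jar /opt/cloudera/parcels/CDH-5.12.0-1.cdh5.12.0.p0.29/lib/hive/lib node1 node2 prints two scp lines, where node1 and node2 stand in for the real cluster hostnames.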

     

    Preparing the input data on HDFS (check that the MapReduce output file exists):

    hdfs dfs -ls /user/logb/464/part-r-00000

     

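    Run the following from the Hive shell

    Start the Hive CLI with the Elasticsearch JAR on its auxiliary path (point hive.aux.jars.path at wherever the JAR sits on the local node):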

    hive -hiveconf hive.aux.jars.path=file:///usr/local/elasticsearch/elasticsearch-hadoop-5.5.1.jar

    Create the external table log_apache_seo_d1, which is backed by Elasticsearch:

    create external table log_apache_seo_d1 (ipaddress string,uniqueid string,url string, sessionid string ,sessiontimes string, areaaddress string ,localaddress string , browsertype string,operationsys string,refeurl string , receivetime string ,userid string ) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES('es.resource' = 'radiott/artiststt','es.index.auto.create' = 'true','es.nodes' = 'node4','es.port' = '9200');

     

    Create the source table log_apache_seo_source_d1:

    CREATE TABLE log_apache_seo_source_d1 (ipaddress string,uniqueid string,url string, sessionid string ,sessiontimes string, areaaddress string ,localaddress string , browsertype string,operationsys string,refeurl string , receivetime string ,userid string )  row format delimited fields terminated by ' ' stored as textfile;
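
    Because the source table splits every line on a single space, a value that itself contains a space is split in two and the values after it land in the wrong columns. A minimal sketch of a pre-load check, assuming a local copy of the MapReduce output; the function name count_bad_lines is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print how many lines in a file do not have exactly
# the expected number of single-space-separated fields.
# $1 = file to check, $2 = expected field count (12 for log_apache_seo_source_d1)
count_bad_lines() {
    # -F '[ ]' splits on each individual space. A plain -F ' ' would make awk
    # collapse runs of whitespace, which is not how a ' ' field delimiter
    # behaves in the Hive table definition.
    awk -F '[ ]' -v want="$2" 'NF != want { bad++ } END { print bad + 0 }' "$1"
}
```

    One possible use: fetch the file with hdfs dfs -cat /user/logb/464/part-r-00000 > part-r-00000.txt, then run count_bad_lines part-r-00000.txt 12 and inspect the data if the result is not 0.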

     

    Load the MapReduce output into Hive:

    load data inpath '/user/logb/464/part-r-00000' into table log_apache_seo_source_d1 ;

     

    Write the Hive data into the Elasticsearch-backed table:

    insert overwrite table log_apache_seo_d1 select s.ipaddress,s.uniqueid,s.url,s.sessionid,s.sessiontimes,s.areaaddress,s.localaddress,s.browsertype,s.operationsys,s.refeurl,s.receivetime,s.userid from  log_apache_seo_source_d1 s;
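
    One way to check the insert is to compare the Hive row count with the document count that Elasticsearch reports. The sketch below assumes the host, port, and resource from the table properties above (node4, 9200, radiott/artiststt), that curl is installed, and that the _count endpoint is reachable from the machine running it; all three function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the _count URL for an index/type resource.
# $1 = host, $2 = port, $3 = es.resource value (index/type)
es_count_url() {
    printf 'http://%s:%s/%s/_count\n' "$1" "$2" "$3"
}

# Hypothetical helper: read a _count response on stdin and print the number.
# The response is JSON shaped like {"count":123,"_shards":{...}}.
extract_count() {
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Fetch the document count; needs network access to the Elasticsearch node.
es_doc_count() {
    curl -s "$(es_count_url "$1" "$2" "$3")" | extract_count
}
```

    Calling es_doc_count node4 9200 radiott/artiststt prints the document count, which can be compared with the result of hive -e "select count(*) from log_apache_seo_source_d1;". On a first load into an empty index the two numbers would be expected to match.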

     

    Write a shell script

    #!/bin/sh

    # load the MapReduce output on HDFS into the Hive source table

    hive -e "

    set hive.enforce.bucketing=true;

    set hive.exec.compress.output=true;

    set mapred.output.compress=true;

    set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

    set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec;

    load data inpath '/user/logb/464/part-r-00000' into table log_apache_seo_source_d1 ;

    "
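
    Note that load data inpath (without local) moves the HDFS file into the table's directory, so a later run against the same path finds nothing to load and the statement fails. A minimal sketch of a guard around the load step, assuming hdfs and hive are on the PATH; the function name load_if_present is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: load an HDFS file into a Hive table only if it exists.
# $1 = HDFS path, $2 = target Hive table
load_if_present() {
    # `hdfs dfs -test -e` exits 0 when the path exists
    if hdfs dfs -test -e "$1"; then
        hive -e "load data inpath '$1' into table $2;"
    else
        # Nothing to do; report it and return success so a scheduled run
        # is not treated as a failure.
        echo "skip: $1 not found"
        return 0
    fi
}
```

    In the script this would be called as load_if_present /user/logb/464/part-r-00000 log_apache_seo_source_d1.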

    Schedule the script with cron (this entry runs it every two hours, on the hour):

    0 */2 * * * /opt/bin/hive_opt/crontab_import.sh  

  • Original article: https://www.cnblogs.com/lingluo2017/p/8710961.html