• ETL from hadoop to vertica


    根据项目需要,我做了一个POC(proof of concept),XML TXT的数据从HADOOP

    引入到VERTICA.

    我采用的方案是pig,具体信息可以参加vertica官方的文档.

    Access hp vertica from pig:

    https://my.vertica.com/docs/6.1.x/HTML/index.htm#18525.htm

    我用virtualbox 在本机搭建了一个hadoop集群(apache version) ,从vertica 官网下载了一个

    虚拟机(开放文件格式的那版),然后导入到virtualbox,并且使这四台机器都在一个网段.

    我测试了把txt 和xml文件类型导入到vertica.

    还有就是从vertica load数据到hdfs.

    从vertica获取数据:

    A =LOAD 'sql://{select * from customer_dimension limit 100;}'
    
    USING com.vertica.pig.VerticaLoader('vertica','VMart','5433','dbadmin','password')

    把XML文件从HDFS ETL 到 vertica的pig latin 代码

    Register ' /home/hadoop/pig-0.12.0/contrib/piggybank/java/piggybank.jar;'
    
    REGISTER '/home/hadoop/pig-0.12.0/lib/vertica-jdk5-6.1.3-0.jar';
    
    REGISTER '/home/hadoop/pig-0.12.0/lib/pig-vertica.jar';
    
     
    
    xml = load '/tmp/xmldemo.xml' using
    
    org.apache.pig.piggybank.storage.XMLLoader('Step') as(step:chararray);
    
     
    
    rows = foreach xml GENERATE
    
    REGEX_EXTRACT(step,'StepName="(.*?)"',1) as stepname:chararray,
    
    REGEX_EXTRACT(step,'StepDescription="(.*?)"',1) as stepdesc:chararray,
    
    REGEX_EXTRACT(step,'TestDateTimeStarted="(.*?)"',1) as timestart:chararray,
    
    REGEX_EXTRACT(step,'GUIResponseTime="(.*?)"',1) as guitime:chararray,
    
    REGEX_EXTRACT(step,'StepNumber="(.*?)"',1) as stepnumber:chararray,
    
    REGEX_EXTRACT(step,'TestResult="(.*?)"',1) as result:chararray,
    
    REGEX_EXTRACT(step,'TestElapsedTimeSec="(.*?)"',1) as testtimetotal:chararray
    
     
    
    STORE rows INTO '{xmltable(stepname varchar,stepdesc varchar,timestart varchar,guiresponsetime varchar,stepnumber int,tsetresult varchar,totalseconds int)}'
    
    USING com.vertica.pig.VerticaStorer('vertica','VMart','5433', 'dbadmin','password');
    Looking for a job working at Home about MSBI
  • 相关阅读:
    Prometheus 基于文件的服务发现
    Prometheus 标签使用示例整合
    Prometheus 重新标签
    Prometheus 配置采集目标
    Prometheus 配置文件详解
    Prometheus 安装部署
    Prometheus 介绍详解
    Python urlib 模块
    ArcGIS Python编程案例-电子资料链接
    如何查看Python对象的属性
  • 原文地址:https://www.cnblogs.com/huaxiaoyao/p/3427495.html
Copyright © 2020-2023  润新知