• hive多分隔符支持


    1.问题描述

    如何将多个字符作为字段分割符的数据文件加载到Hive表中,事例数据如下:

      字段分隔符为“@#$”

    test1@#$test1name@#$test2value
    test2@#$test2name@#$test2value
    test3@#$test3name@#$test4value

    如何将上述事例数据加载到Hive表(multi_delimiter_test)中,表结构如下:

    字段名

    字段类型

    s1 string
    s2 string
    s3 string

    2.Hive多分隔符支持

    Hive在0.14及以后版本支持字段的多分隔符,参考https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe

    3.实现步骤

    1.准备多分隔符文件并装载到HDFS对应目录

    [ec2-user@ip-172-31-8-141  ~]$ cat multi_delimiter_test.dat
    
    test1@#$test1name@#$test2value
    
    test2@#$test2name@#$test2value
    
    test3@#$test3name@#$test4value  
    
    [ec2-user@ip-172-31-8-141  ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test
    
    [ec2-user@ip-172-31-8-141  ~]$ hadoop dfs -ls /fayson/multi_delimiter_test
    
    DEPRECATED: Use of this  script to execute hdfs command is deprecated.
    
    Instead use the hdfs  command for it.
    
    
    Found 1 items
    
    -rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24  /fayson/multi_delimiter_test/multi_delimiter_test.dat
    
    [ec2-user@ip-172-31-8-141  ~]$

    2.基于准备好的多分隔符文件建表

    create  external table multi_delimiter_test(
    
    s1 string,
    
    s2 string,
    
    s3 string)
    
    ROW FORMAT  SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH  SERDEPROPERTIES ("field.delim"="@#$")
    
    stored as  textfile location '/fayson/multi_delimiter_test';

    3.测试

    >  select * from multi_delimiter_test;
    +--------------------------+--------------------------+--------------------------+--+
    |  multi_delimiter_test.s1  |  multi_delimiter_test.s2  |  multi_delimiter_test.s3  |
    +--------------------------+--------------------------+--------------------------+--+
    | test1                    | test1name                | test2value               |
    | test2                    | test2name                | test2value               |
    | test3                    | test3name                | test4value               |
    +--------------------------+--------------------------+--------------------------+--+

    字段名

    字段类型

    s1

    String

    s2

    String

  • 相关阅读:
    Cesium加载Geowebcache切片
    Vue开发--脚手架的搭建
    OpenLayers动态测量距离和面积,并可自定义测量的线样式
    OpenLayers要素拖拽
    改造SuperMap的DrawHandler接口,自定义绘制的图形样式
    Cesium动态绘制实体(点、标注、面、线、圆、矩形)
    ArcMap制图遇到的小问题
    GeoServer 2.15.2版本跨域问题解决方法
    MySQL 8.0 主从同步
    Service__cmd--MySQL安装并连接SQLyog
  • 原文地址:https://www.cnblogs.com/LIAOBO/p/13752039.html
Copyright © 2020-2023  润新知