• Hive multi-character delimiter support


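    1. Problem description

    How do you load a data file whose field delimiter consists of several characters into a Hive table? Sample data:

      The field delimiter is "@#$"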

    test1@#$test1name@#$test2value
    test2@#$test2name@#$test2value
    test3@#$test3name@#$test4value

    How can the sample data above be loaded into the Hive table multi_delimiter_test? The table structure is:

    Field name    Field type
    s1            string
    s2            string
    s3            string
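Before involving Hive, the expected split can be checked locally. The following is a minimal sketch, assuming a POSIX awk is available; the function name split_fields is hypothetical and not part of the original article. It treats "@#$" as a literal three-character separator and prints the three resulting fields tab-separated.

```shell
#!/bin/sh
# Split each input line on the literal string "@#$" and print the fields
# separated by tabs. awk treats a multi-character -F value as a regular
# expression, so [$] is used to match a literal dollar sign instead of the
# end-of-line anchor.
split_fields() {
  awk -F'@#[$]' -v OFS='\t' '{ $1 = $1; print }'
}

# Single quotes keep the shell from expanding "$test1name" and similar.
printf '%s\n' \
  'test1@#$test1name@#$test2value' \
  'test2@#$test2name@#$test2value' \
  'test3@#$test3name@#$test4value' | split_fields
```

Each output line should carry exactly three fields, matching the columns s1, s2 and s3.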

    2. Hive multi-character delimiter support

    Hive 0.14 and later support multi-character field delimiters through MultiDelimitSerDe; see https://cwiki.apache.org/confluence/display/Hive/MultiDelimitSerDe

    3. Implementation steps

    1. Prepare the multi-delimiter file and upload it to the corresponding HDFS directory

    [ec2-user@ip-172-31-8-141 ~]$ cat multi_delimiter_test.dat
    test1@#$test1name@#$test2value
    test2@#$test2name@#$test2value
    test3@#$test3name@#$test4value
    [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -put multi_delimiter_test.dat /fayson/multi_delimiter_test
    [ec2-user@ip-172-31-8-141 ~]$ hadoop dfs -ls /fayson/multi_delimiter_test
    DEPRECATED: Use of this script to execute hdfs command is deprecated.
    Instead use the hdfs command for it.

    Found 1 items
    -rw-r--r--   3 user_r supergroup         93 2017-08-23 03:24 /fayson/multi_delimiter_test/multi_delimiter_test.dat
    [ec2-user@ip-172-31-8-141 ~]$
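The deprecation warning in the listing above suggests the hdfs command in place of hadoop dfs. The following sketch recreates the sample file and uploads it with hdfs dfs. It rests on assumptions not stated in the original: the target directory may not exist yet, so it is created first, and -put -f is used so the script can be rerun. The upload is skipped on machines without an hdfs client.

```shell
#!/bin/sh
# Recreate the three-line sample file. Single quotes keep the shell from
# treating "$test1name" and similar as variable expansions.
data_file=multi_delimiter_test.dat
printf '%s\n' \
  'test1@#$test1name@#$test2value' \
  'test2@#$test2name@#$test2value' \
  'test3@#$test3name@#$test4value' > "$data_file"

# Three lines of 30 characters plus a newline give 93 bytes, the size
# reported by the directory listing.
wc -c < "$data_file"

# Upload only where an hdfs client is installed.
if command -v hdfs >/dev/null 2>&1; then
  # Assumption: create the directory first so the file lands inside it.
  hdfs dfs -mkdir -p /fayson/multi_delimiter_test
  # -f overwrites an existing copy, which makes the script rerunnable.
  hdfs dfs -put -f "$data_file" /fayson/multi_delimiter_test
  hdfs dfs -ls /fayson/multi_delimiter_test
fi
```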

    2. Create the table on top of the prepared multi-delimiter file

    create external table multi_delimiter_test(
      s1 string,
      s2 string,
      s3 string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
    WITH SERDEPROPERTIES ("field.delim"="@#$")
    stored as textfile location '/fayson/multi_delimiter_test';
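When this statement is run from a shell script instead of an interactive session, the "$" in the delimiter is exposed to shell expansion. One way to avoid that, sketched here under assumptions, is to write the DDL to a file through a quoted heredoc and pass the file to hive -f. The file name create_multi_delimiter_test.hql is hypothetical, and IF NOT EXISTS is added only so the script can be rerun.

```shell
#!/bin/sh
# Hypothetical file name for the DDL script.
ddl_file=create_multi_delimiter_test.hql

# Quoting the heredoc marker ('EOF') disables shell expansion inside the
# body, so "@#$" reaches Hive unchanged.
cat > "$ddl_file" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS multi_delimiter_test (
  s1 STRING,
  s2 STRING,
  s3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe'
WITH SERDEPROPERTIES ("field.delim" = "@#$")
STORED AS TEXTFILE
LOCATION '/fayson/multi_delimiter_test';
EOF

# Run the script only where the hive CLI is installed.
if command -v hive >/dev/null 2>&1; then
  hive -f "$ddl_file"
fi
```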

    3. Test

    > select * from multi_delimiter_test;
    +--------------------------+--------------------------+--------------------------+--+
    |  multi_delimiter_test.s1  |  multi_delimiter_test.s2  |  multi_delimiter_test.s3  |
    +--------------------------+--------------------------+--------------------------+--+
    | test1                    | test1name                | test2value               |
    | test2                    | test2name                | test2value               |
    | test3                    | test3name                | test4value               |
    +--------------------------+--------------------------+--------------------------+--+

  • Original article: https://www.cnblogs.com/LIAOBO/p/13752039.html