• 把HDFS上的数据导入到Hive中


    1. 首先下载测试数据,数据也可以创建

    http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

    2. 数据类型与字段名称

    movies.csv(电影元数据)
    movieId,title,genres
    
    ratings.csv(用户打分数据)
    userId,movieId,rating,timestamp

    3. 先把数据存放到HDFS上

    hdfs dfs -mkdir /hive_operate
    hdfs dfs -mkdir /hive_operate/movie_table
    hdfs dfs -mkdir /hive_operate/rating_table
    
    hdfs dfs -put movies.csv /hive_operate/movie_table
    hdfs dfs -put ratings.csv /hive_operate/rating_table

    4. 创建movie_table和rating_table

    ]$ cat create_movie_table.sql 
    create external table movie_table
    (
    movieId STRING,
    title STRING,
    genres STRING
    )
    row format delimited fields terminated by ','
    stored as textfile
    location '/hive_operate/movie_table';
    
    ]$ cat create_rating_table.sql
    create external table rating_table
    (userId STRING,
    movieId STRING,
    rating STRING,
    ts STRING
    )
    row format delimited fields terminated by ','
    stored as textfile
    location '/hive_operate/rating_table';
    其中字段名为timestamp为hive的保留字段,执行的时候会报错,需用反引号或者修改字段名,我这边修改的字段名

    5. 执行

    可以通过复制命令到终端执行,也可以通过hive -f movie_table_e来创建表

    6. 查看

    hive> show tables;
    OK
    movie_table
    rating_table
    
    hive> select * from rating_table limit 10;
    OK
    1    31    2.5    1260759144
    1    1029    3.0    1260759179
    1    1061    3.0    1260759182
    1    1129    2.0    1260759185
    1    1172    4.0    1260759205
    1    1263    2.0    1260759151
    1    1287    2.0    1260759187
    1    1293    2.0    1260759148
    1    1339    3.5    1260759125
    1    1343    2.0    1260759131

    7. 生成新表(行为表)

    create table behavior_table as
    select B.userid, A.movieid, B.rating, A.title
    from movie_table A
    join rating_table B
    on A.movieid == B.movieid;

    8. 把Hive表数据导入到本地

    table->local file
    insert overwrite local directory '/root/hive_test/1.txt' select * from behavior_table;

    9. 把Hive表数据导入到HDFS上

    table->hdfs file
    insert overwrite directory '/root/hive_test/1.txt' select * from behavior_table;

    10. 把本地数据导入到Hive表中

    local file -> table
    LOAD DATA LOCAL INPATH '/root/hive_test/a.txt' OVERWRITE INTO TABLE behavior_table;

    11. 把HDFS上的数导入到HIve表中

    hdfs file -> table
    LOAD DATA INPATH '/a.txt' OVERWRITE INTO TABLE behavior_table;
  • 相关阅读:
    HDFS命令操作和高可用
    nginx-nginx和反向代理概念
    day01 Hadoop 简单介绍及架构设计
    常用正则表达式
    【JavaWeb笔记】第二章 JDBC
    【JavaSE笔记】第二章 进制数据类型和运算符
    LeetCode-94 Binary Tree Inorder Traversal Solution (with Java)
    LeetCode-1019 Next Greater Node In Linked List Solution (with Java)
    LeetCode-946 Validate Stack Sequences Solution (with Java)
    LeetCode-739 Daily Temperatures Solution (with Java)
  • 原文地址:https://www.cnblogs.com/654wangzai321/p/9970321.html
Copyright © 2020-2023  润新知