• Hive: Transferring table data between databases


    Goal: import data from table stocks in database economy into table emp_stocks in database human_resources.
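    
    Before loading, the target table must already exist in human_resources with columns matching the export. A minimal sketch of the DDL, assuming a schema that mirrors the six columns selected below (the real table definition is not shown in this post):
    hive (human_resources)> create table emp_stocks (
                          >     exchange string, symbol string, ymd string,
                          >     price_open float, price_close float, price_adj_close float);
    With no ROW FORMAT clause, Hive defaults to ^A as the field delimiter, which is exactly what the exported files below use, so they load without any conversion.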
    
    
    Problem 1: could not view the data files 000000_0 and 000001_0 in the local temporary directory emp_stocks.
    Cause: I was running dfs -cat file inside the Hive shell, which reads from HDFS, against files that live on the local filesystem (a careless mistake). The fix is to leave Hive first (hive> quit;) and then read the file from the local filesystem:
    cat /emp_stocks/000000_0;
    If instead the queried data from table stocks in database economy had been exported to a temporary directory on HDFS, dfs -cat file would be the right command.
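    
    For reference, the two viewing commands side by side (paths are illustrative; /tmp/emp_stocks stands in for a hypothetical HDFS export directory):
    $ cat /home/landen/UntarFile/hive-0.10.0/emp_stocks/000000_0    # local export: read from the OS shell
    hive> dfs -cat /tmp/emp_stocks/000000_0;                        # HDFS export: read from inside the Hive shell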
    
    Problem 2: when the queried data from table stocks in database economy was exported to a temporary directory, the column separators in the resulting files appeared garbled.
    Cause: this is Hive's documented behavior. Data written to the filesystem is serialized as text with columns separated by ^A (Ctrl-A, \001) and rows separated by newlines. If any of the columns are not of primitive type, those columns are serialized to JSON format.
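    
    A quick way to confirm the separator and make the files readable, using standard Unix tools (the filename assumes the export performed below):
    $ head -1 000000_0 | cat -A              # each Ctrl-A renders as ^A
    $ tr '\001' '\t' < 000000_0 | head -3    # rewrite Ctrl-A as tabs for inspection
    Later Hive releases (0.11 and up) also accept a ROW FORMAT DELIMITED clause on INSERT OVERWRITE DIRECTORY to choose a friendlier delimiter; the Hive 0.10.0 used here does not.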
    
    First, export the data to be queried from table stocks in database economy into a temporary directory (it can be local, or on HDFS; for HDFS simply omit the LOCAL keyword; an HDFS example follows the job output below):
    hive> set hive.cli.print.current.db=true;
    hive(economy)> insert overwrite local directory '/home/landen/UntarFile/hive-0.10.0/emp_stocks' 
        > select exchange,symbol,ymd,price_open,price_close,price_adj_close from stocks;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201303271617_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201303271617_0008
    Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201303271617_0008
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
    2013-03-28 22:25:06,557 Stage-1 map = 0%,  reduce = 0%
    2013-03-28 22:25:19,031 Stage-1 map = 11%,  reduce = 0%
    2013-03-28 22:25:23,603 Stage-1 map = 24%,  reduce = 0%
    2013-03-28 22:25:24,607 Stage-1 map = 36%,  reduce = 0%
    2013-03-28 22:25:28,307 Stage-1 map = 41%,  reduce = 0%
    2013-03-28 22:25:34,324 Stage-1 map = 53%,  reduce = 0%
    2013-03-28 22:25:37,345 Stage-1 map = 72%,  reduce = 0%
    2013-03-28 22:25:40,352 Stage-1 map = 83%,  reduce = 0%
    2013-03-28 22:25:43,427 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 22.03 sec
    2013-03-28 22:25:44,433 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 22.03 sec
    2013-03-28 22:25:45,437 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 22.03 sec
    2013-03-28 22:25:46,441 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 22.03 sec
    2013-03-28 22:25:47,445 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 22.03 sec
    2013-03-28 22:25:48,453 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 22.03 sec
    2013-03-28 22:25:49,456 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:50,460 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:51,464 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:52,467 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:53,473 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:54,478 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:55,482 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:56,486 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:57,490 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 47.59 sec
    2013-03-28 22:25:58,494 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 47.59 sec
    MapReduce Total cumulative CPU time: 47 seconds 590 msec
    Ended Job = job_201303271617_0008
    Copying data to local directory /home/landen/UntarFile/hive-0.10.0/emp_stocks
    Copying data to local directory /home/landen/UntarFile/hive-0.10.0/emp_stocks
    8487547 Rows loaded to /home/landen/UntarFile/hive-0.10.0/emp_stocks
    MapReduce Jobs Launched: 
    Job 0: Map: 2   Cumulative CPU: 47.59 sec   HDFS Read: 481098497 HDFS Write: 330347902 SUCCESS
    Total MapReduce CPU Time Spent: 47 seconds 590 msec
    OK
    Time taken: 71.212 seconds
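    
    Had the target been HDFS instead, the same statement without the LOCAL keyword would do it, and dfs -cat could then inspect the result (the path /tmp/emp_stocks is illustrative):
    hive (economy)> insert overwrite directory '/tmp/emp_stocks'
        > select exchange,symbol,ymd,price_open,price_close,price_adj_close from stocks;
    hive (economy)> dfs -cat /tmp/emp_stocks/000000_0;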
    
    Next, load the data from the temporary directory '/home/landen/UntarFile/hive-0.10.0/emp_stocks' into table emp_stocks of database human_resources:
    hive (human_resources)> load data local inpath '/home/landen/UntarFile/hive-0.10.0/emp_stocks'
                          > overwrite into table emp_stocks;
    Copying data from file:/home/landen/UntarFile/hive-0.10.0/emp_stocks
    Copying file: file:/home/landen/UntarFile/hive-0.10.0/emp_stocks/000001_0
    Copying file: file:/home/landen/UntarFile/hive-0.10.0/emp_stocks/000000_0
    Loading data to table human_resources.emp_stocks
    Deleted hdfs://localhost:9000/home/landen/UntarFile/hive-0.10.0/user/hive/warehouse/emp_stocks
    Table human_resources.emp_stocks stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 330347902, raw_data_size: 0]
    OK
    Time taken: 10.752 seconds
    hive (human_resources)> select count(1) from emp_stocks;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_201303271617_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201303271617_0009
    Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201303271617_0009
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
    2013-03-29 12:07:30,122 Stage-1 map = 0%,  reduce = 0%
    2013-03-29 12:08:30,461 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 6.14 sec
    2013-03-29 12:08:32,668 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 6.14 sec
    2013-03-29 12:08:33,674 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:34,678 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:35,681 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:36,684 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:37,687 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:38,690 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:39,693 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:40,696 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:43,888 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:44,892 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:45,895 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:46,898 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:47,901 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:48,904 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:49,906 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:50,909 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:51,913 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:52,916 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:53,919 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:54,922 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 6.6 sec
    2013-03-29 12:08:55,926 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.8 sec
    2013-03-29 12:08:56,930 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.8 sec
    2013-03-29 12:08:57,934 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.8 sec
    2013-03-29 12:08:58,947 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.8 sec
    2013-03-29 12:08:59,950 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.8 sec
    2013-03-29 12:09:00,955 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 15.8 sec
    2013-03-29 12:09:01,960 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    2013-03-29 12:09:02,964 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    2013-03-29 12:09:03,967 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    2013-03-29 12:09:04,970 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    2013-03-29 12:09:05,974 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    2013-03-29 12:09:06,978 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    2013-03-29 12:09:08,042 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 18.76 sec
    MapReduce Total cumulative CPU time: 18 seconds 760 msec
    Ended Job = job_201303271617_0009
    MapReduce Jobs Launched: 
    Job 0: Map: 2  Reduce: 1   Cumulative CPU: 18.76 sec   HDFS Read: 330365236 HDFS Write: 8 SUCCESS
    Total MapReduce CPU Time Spent: 18 seconds 760 msec
    OK
    8487547
    Time taken: 111.511 seconds
    hive (human_resources)> 
    
    key 1: Loading files into tables
    
    Synopsis
    
        1. Load operations are currently pure copy/move operations that move datafiles into locations corresponding to Hive tables. filepath can be a relative path, e.g. project/data1; an absolute path, e.g. /user/hive/project/data1; or a full URI with scheme and (optionally) an authority, e.g. hdfs://namenode:9000/user/hive/project/data1 (a sketch of these forms follows the notes below);
        2. The target being loaded to can be a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns;
        3. filepath can refer to a file (in which case hive will move the file into the table) or it can be a directory (in which case hive will move all the files within that directory into the table). In either case filepath addresses a set of files;
        4. If the keyword LOCAL is specified, then: the load command will look for filepath in the local file system. If a relative path is specified - it will be interpreted relative to the current directory of the user. User can specify a full URI for local files as well - for example: file:///user/hive/project/data1;
        5. the load command will try to copy all the files addressed by filepath to the target filesystem. The target file system is inferred by looking at the location attribute of the table. The copied data files will then be moved to the table;
        6. If the keyword LOCAL is not specified, Hive will use the full URI of filepath if one is given. Otherwise the following rules apply: if scheme or authority are not specified, Hive uses the scheme and authority from the hadoop configuration variable fs.default.name, which specifies the Namenode URI; if the path is not absolute, Hive interprets it relative to /user/<username>. Hive will move the files addressed by filepath into the table (or partition);
        7. If the OVERWRITE keyword is used, the contents of the target table (or partition) will be deleted and replaced with the files referred to by filepath; otherwise the files referred to by filepath will be added to the table.
        
        Note that if the target table (or partition) already has a file whose name collides with any of the filenames contained in filepath - then the existing file will be replaced with the new file.
    
    Notes
    
        1. filepath cannot contain subdirectories;
        2. If we are not using the keyword LOCAL, filepath must refer to files within the same filesystem as the table's (or partition's) location;
        3. Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in sequencefile format, the files being loaded are also sequencefiles, and vice versa;
        4. Please read CompressedStorage if your datafile is compressed.
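    
    A brief sketch of the forms described above (table and partition names are hypothetical):
    hive> load data local inpath 'project/data1' into table page_view;    -- relative local path; files are appended
    hive> load data inpath 'hdfs://namenode:9000/user/hive/project/data1'
        > overwrite into table page_view partition (ds='2013-03-28');     -- full URI; the partition's contents are replaced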
    
    key 2: Inserting data into Hive Tables from queries
    
    Synopsis
    
        1. INSERT OVERWRITE will overwrite any existing data in the table or partition unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0);
        2. INSERT INTO will append to the table or partition, keeping the existing data intact. (Note: INSERT INTO syntax is only available starting in version 0.8);
        3. Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns;
        4. Multiple insert clauses (also known as Multi Table Insert) can be specified in the same query;
        5. The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table or partition are replaced with the output of the corresponding select statement;
        6. The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table).
    In dynamic partition inserts, users can give a partial partition specification, which means you just specify the list of partition column names in the PARTITION clause; the column values are optional. If a partition column value is given, we call it a static partition; otherwise it is a dynamic partition. Each dynamic partition column has a corresponding input column from the select statement, so dynamic partition creation is determined by the value of that input column. The dynamic partition columns must be specified last among the columns in the SELECT statement, and in the same order in which they appear in the PARTITION() clause (see the sketch below).
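    
    A minimal sketch of a mixed static/dynamic partition insert (table and column names are hypothetical; the two set commands enable dynamic partitioning, and nonstrict mode is only required when every partition column is dynamic):
    hive> set hive.exec.dynamic.partition=true;
    hive> set hive.exec.dynamic.partition.mode=nonstrict;
    hive> insert overwrite table sales partition (country='US', state)
        > select id, amount, state from staging_sales;    -- country is static; state is dynamic and comes last in the SELECT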
    
    Notes
    
        1. Multi Table Inserts minimize the number of data scans required. Hive can insert data into multiple tables by scanning the input data just once and applying different query operators to it (see the sketch below).
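    
    A sketch of a Multi Table Insert against the stocks table used earlier (the target tables nasdaq_stocks and nyse_stocks, assumed to share stocks' schema, are hypothetical):
    hive> from stocks
        > insert overwrite table nasdaq_stocks select * where exchange = 'NASDAQ'
        > insert overwrite table nyse_stocks select * where exchange = 'NYSE';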
    
    key 3: Writing data into filesystem from queries
    
    Synopsis
    
        1. directory can be full URI. If scheme or authority are not specified, Hive will use the scheme and authority from hadoop configuration variable fs.default.name that specifies the Namenode URI;
        2. if LOCAL keyword is used - then Hive will write data to the directory on the local file system;
        3. Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type - then those columns are serialized to JSON format.
    
    Notes
    
        1. INSERT OVERWRITE statements to directories, local directories, and tables (or partitions) can all be used together within the same query;
        2. INSERT OVERWRITE statements to HDFS filesystem directories are the best way to extract large amounts of data from Hive, since Hive can write to HDFS directories in parallel from within a map-reduce job;
        3. The directory is, as the keyword suggests, overwritten; in other words, if the specified path exists, it is clobbered and replaced with the output. A combined sketch of notes 1 and 2 follows.
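    
    A sketch of a single scan of stocks writing to a local directory and an HDFS directory in the same query (both paths are illustrative):
    hive> from stocks
        > insert overwrite local directory '/tmp/stocks_nasdaq' select * where exchange = 'NASDAQ'
        > insert overwrite directory '/tmp/stocks_nyse' select * where exchange = 'NYSE';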
    
  • Original post: https://www.cnblogs.com/likai198981/p/2988663.html