Background
1. When two processes communicate over a network, they can exchange data of many different types, but whatever the type, the data always travels as a sequence of bytes. The sender must turn an object into a byte sequence before it can be transmitted; this is called object serialization. The receiver must turn the byte sequence back into an object; this is called deserialization (a minimal Java sketch follows this list).
2. In Hive, deserialization maps a key/value record onto the values of each column of a Hive table.
3. Hive can load data into a table without transforming it first, which saves a great deal of time when working with massive datasets.
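To make point 1 concrete, here is a minimal, self-contained Java sketch of serialization and deserialization. It is illustrative only: the Tweet class and its fields are invented for this example and are not part of Hive or Hadoop.

import java.io.*;

public class SerializationDemo {
    // A simple serializable record; class and field names are illustrative only.
    static class Tweet implements Serializable {
        String text;
        int userId;
        Tweet(String text, int userId) { this.text = text; this.userId = userId; }
    }

    public static void main(String[] args) throws Exception {
        Tweet original = new Tweet("hello", 1);

        // Serialization: object -> byte sequence (what the sender does)
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(original);
        }
        byte[] bytes = buf.toByteArray();

        // Deserialization: byte sequence -> object (what the receiver does)
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Tweet restored = (Tweet) in.readObject();
            System.out.println(restored.text + " / " + restored.userId);
        }
    }
}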
Solution 1: Import the JSON data into MongoDB, have MongoDB export it as CSV, and then import the CSV into MySQL;
CSSer.com runs on WordPress with a MySQL database. To migrate it to MongoDB, the data has to be converted. There are several ways to move the data; essentially, the MySQL data just needs to be converted into a format that MongoDB can import directly. MongoDB ships with the mongoimport tool, which supports importing JSON and CSV.

First, the parameters mongoimport supports:

$ mongoimport --help
options:
  --help                  produce help message
  -v [ --verbose ]        be more verbose (include multiple times for more verbosity e.g. -vvvvv)
  -h [ --host ] arg       mongo host to connect to ( <set name>/s1,s2 for sets)
  --port arg              server port. Can also use --host hostname:port
  --ipv6                  enable IPv6 support (disabled by default)
  -u [ --username ] arg   username
  -p [ --password ] arg   password
  --dbpath arg            directly access mongod database files in the given path, instead of connecting to a mongod server - needs to lock the data directory, so cannot be used if a mongod is currently accessing the same path
  --directoryperdb        if dbpath specified, each db is in a separate directory
  -d [ --db ] arg         database to use
  -c [ --collection ] arg collection to use (some commands)
  -f [ --fields ] arg     comma separated list of field names e.g. -f name,age
  --fieldFile arg         file with fields names - 1 per line
  --ignoreBlanks          if given, empty fields in csv and tsv will be ignored
  --type arg              type of file to import. default: json (json,csv,tsv)
  --file arg              file to import from; if not specified stdin is used
  --drop                  drop collection first
  --headerline            CSV,TSV only - use first line as headers
  --upsert                insert or update objects that already exist
  --upsertFields arg      comma-separated fields for the query part of the upsert. You should make sure this is indexed
  --stopOnError           stop importing at first error rather than continuing
  --jsonArray             load a json array, not one item per line. Currently limited to 4MB.

As the help text shows, using CSV as the intermediate format is about the lowest-cost option for both the MySQL export and the MongoDB import, so I gave it a try.

First, export the wp-posts table from MySQL. I took the lazy route and used phpMyAdmin's export feature, chose CSV as the format, checked the options to remove line breaks inside fields and to put the field names in the first row, and saved the file as csser.csv.

Next, on the MongoDB server, connect to MongoDB from the shell and import the data:

$ mongoimport -d csser -c posts -type csv -file csser.csv --headerline
connected to: 127.0.0.1
imported 548 objects
$ mongo
MongoDB shell version: 1.8.1
connecting to: test
> use csser
switched to db csser
> db.posts.count()
547
> db.posts.find({}, {"post_title":1}).sort({"ID":-1}).limit(1)
{ "_id" : ObjectId("4df4641d31b0642fe609426d"), "post_title" : "CSS Sprites在线应用推荐-CSS-sprit" }
Solution 2: Use a SerDe in Hive to convert the JSON data into a format Hive understands, for the following reasons:
1. When creating a Hive table that uses custom serialization, you write your own class implementing the Deserializer interface and select it with the ROW FORMAT clause of the CREATE command;
2. When processing massive datasets, if the data format matches the table structure, Hive's deserialization can be used directly, with no separate conversion step, which saves a great deal of time.
How-to: Use a SerDe in Apache Hive
- by Jon Natkins
- December 21, 2012
Apache Hive is a fantastic tool for performing SQL-style queries across data that is often not appropriate for a relational database. For example, semistructured and unstructured data can be queried gracefully via Hive, due to two core features: The first is Hive’s support of complex data types, such as structs, arrays, and unions, in addition to many of the common data types found in most relational databases. The second feature is the SerDe.
What is a SerDe?
The SerDe interface allows you to instruct Hive as to how a record should be processed. A SerDe is a combination of a Serializer and a Deserializer (hence, Ser-De). The Deserializer interface takes a string or binary representation of a record, and translates it into a Java object that Hive can manipulate. The Serializer, however, will take a Java object that Hive has been working with, and turn it into something that Hive can write to HDFS or another supported system. Commonly, Deserializers are used at query time to execute SELECT statements, and Serializers are used when writing data, such as through an INSERT-SELECT statement.
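For orientation, the sketch below condenses the shape of that contract. The interface name is illustrative; the real interface is org.apache.hadoop.hive.serde2.SerDe (Hive 0.10-era API), its methods throw SerDeException rather than Exception, and its signatures match those of the implementation shown later in this article.

import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.io.Writable;

public interface SerDeContractSketch {
    // Deserializer half: raw record (e.g. one line of JSON wrapped in a Text) -> Java object,
    // used when executing SELECT statements.
    Object deserialize(Writable blob) throws Exception;

    // Describes how Hive should navigate the object returned by deserialize().
    ObjectInspector getObjectInspector() throws Exception;

    // Serializer half: Hive row + its ObjectInspector -> a Writable that can be stored,
    // used when writing data, e.g. via INSERT ... SELECT.
    Writable serialize(Object obj, ObjectInspector oi) throws Exception;

    // The Writable subclass produced by serialize(), e.g. Text for a textual format like JSON.
    Class<? extends Writable> getSerializedClass();
}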
In this article, we will examine a SerDe for processing JSON data, which can be used to transform a JSON record into something that Hive can process.
Developing a SerDe
The code for JSONSerDe.java is as follows:
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.cloudera.hive.serde;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.hive.serde2.SerDe;
import org.apache.hadoop.hive.serde2.SerDeException;
import org.apache.hadoop.hive.serde2.SerDeStats;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructField;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.codehaus.jackson.map.ObjectMapper;

/**
 * This SerDe can be used for processing JSON data in Hive. It supports
 * arbitrary JSON data, and can handle all Hive types except for UNION.
 * However, the JSON data is expected to be a series of discrete records,
 * rather than a JSON array of objects.
 *
 * The Hive table is expected to contain columns with names corresponding to
 * fields in the JSON data, but it is not necessary for every JSON field to
 * have a corresponding Hive column. Those JSON fields will be ignored during
 * queries.
 *
 * Example:
 *
 * { "a": 1, "b": [ "str1", "str2" ], "c": { "field1": "val1" } }
 *
 * Could correspond to a table:
 *
 * CREATE TABLE foo (a INT, b ARRAY<STRING>, c STRUCT<field1:STRING>);
 *
 * JSON objects can also interpreted as a Hive MAP type, so long as the keys
 * and values in the JSON object are all of the appropriate types. For example,
 * in the JSON above, another valid table declaraction would be:
 *
 * CREATE TABLE foo (a INT, b ARRAY<STRING>, c MAP<STRING,STRING>);
 *
 * Only STRING keys are supported for Hive MAPs.
 */
public class JSONSerDe implements SerDe {

  private StructTypeInfo rowTypeInfo;
  private ObjectInspector rowOI;
  private List<String> colNames;
  private List<Object> row = new ArrayList<Object>();

  /**
   * An initialization function used to gather information about the table.
   * Typically, a SerDe implementation will be interested in the list of
   * column names and their types. That information will be used to help
   * perform actual serialization and deserialization of data.
   */
  @Override
  public void initialize(Configuration conf, Properties tbl) throws SerDeException {
    // Get a list of the table's column names.
    String colNamesStr = tbl.getProperty(serdeConstants.LIST_COLUMNS);
    colNames = Arrays.asList(colNamesStr.split(","));

    // Get a list of TypeInfos for the columns. This list lines up with
    // the list of column names.
    String colTypesStr = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
    List<TypeInfo> colTypes = TypeInfoUtils.getTypeInfosFromTypeString(colTypesStr);

    rowTypeInfo = (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(colNames, colTypes);
    rowOI = TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(rowTypeInfo);
  }

  /**
   * This method does the work of deserializing a record into Java objects that
   * Hive can work with via the ObjectInspector interface. For this SerDe, the
   * blob that is passed in is a JSON string, and the Jackson JSON parser is
   * being used to translate the string into Java objects.
   *
   * The JSON deserialization works by taking the column names in the Hive
   * table, and looking up those fields in the parsed JSON object. If the value
   * of the field is not a primitive, the object is parsed further.
   */
  @Override
  public Object deserialize(Writable blob) throws SerDeException {
    Map<?,?> root = null;
    row.clear();
    try {
      ObjectMapper mapper = new ObjectMapper();
      // This is really a Map<String, Object>. For more information about how
      // Jackson parses JSON in this example, see
      // http://wiki.fasterxml.com/JacksonDataBinding
      root = mapper.readValue(blob.toString(), Map.class);
    } catch (Exception e) {
      throw new SerDeException(e);
    }

    // Lowercase the keys as expected by hive
    Map<String, Object> lowerRoot = new HashMap();
    for(Map.Entry entry: root.entrySet()) {
      lowerRoot.put(((String)entry.getKey()).toLowerCase(), entry.getValue());
    }
    root = lowerRoot;

    Object value = null;
    for (String fieldName : rowTypeInfo.getAllStructFieldNames()) {
      try {
        TypeInfo fieldTypeInfo = rowTypeInfo.getStructFieldTypeInfo(fieldName);
        value = parseField(root.get(fieldName), fieldTypeInfo);
      } catch (Exception e) {
        value = null;
      }
      row.add(value);
    }
    return row;
  }

  /**
   * Parses a JSON object according to the Hive column's type.
   *
   * @param field - The JSON object to parse
   * @param fieldTypeInfo - Metadata about the Hive column
   * @return - The parsed value of the field
   */
  private Object parseField(Object field, TypeInfo fieldTypeInfo) {
    switch (fieldTypeInfo.getCategory()) {
    case PRIMITIVE:
      // Jackson will return the right thing in this case, so just return
      // the object
      if (field instanceof String) {
        field = field.toString().replaceAll("\n", "\\\\n");
      }
      return field;
    case LIST:
      return parseList(field, (ListTypeInfo) fieldTypeInfo);
    case MAP:
      return parseMap(field, (MapTypeInfo) fieldTypeInfo);
    case STRUCT:
      return parseStruct(field, (StructTypeInfo) fieldTypeInfo);
    case UNION:
      // Unsupported by JSON
    default:
      return null;
    }
  }

  /**
   * Parses a JSON object and its fields. The Hive metadata is used to
   * determine how to parse the object fields.
   *
   * @param field - The JSON object to parse
   * @param fieldTypeInfo - Metadata about the Hive column
   * @return - A map representing the object and its fields
   */
  private Object parseStruct(Object field, StructTypeInfo fieldTypeInfo) {
    Map<Object,Object> map = (Map<Object,Object>)field;
    ArrayList<TypeInfo> structTypes = fieldTypeInfo.getAllStructFieldTypeInfos();
    ArrayList<String> structNames = fieldTypeInfo.getAllStructFieldNames();

    List<Object> structRow = new ArrayList<Object>(structTypes.size());
    for (int i = 0; i < structNames.size(); i++) {
      structRow.add(parseField(map.get(structNames.get(i)), structTypes.get(i)));
    }
    return structRow;
  }

  /**
   * Parse a JSON list and its elements. This uses the Hive metadata for the
   * list elements to determine how to parse the elements.
   *
   * @param field - The JSON list to parse
   * @param fieldTypeInfo - Metadata about the Hive column
   * @return - A list of the parsed elements
   */
  private Object parseList(Object field, ListTypeInfo fieldTypeInfo) {
    ArrayList<Object> list = (ArrayList<Object>) field;
    TypeInfo elemTypeInfo = fieldTypeInfo.getListElementTypeInfo();

    for (int i = 0; i < list.size(); i++) {
      list.set(i, parseField(list.get(i), elemTypeInfo));
    }

    return list.toArray();
  }

  /**
   * Parse a JSON object as a map. This uses the Hive metadata for the map
   * values to determine how to parse the values. The map is assumed to have
   * a string for a key.
   *
   * @param field - The JSON list to parse
   * @param fieldTypeInfo - Metadata about the Hive column
   * @return
   */
  private Object parseMap(Object field, MapTypeInfo fieldTypeInfo) {
    Map<Object,Object> map = (Map<Object,Object>) field;
    TypeInfo valueTypeInfo = fieldTypeInfo.getMapValueTypeInfo();

    for (Map.Entry<Object,Object> entry : map.entrySet()) {
      map.put(entry.getKey(), parseField(entry.getValue(), valueTypeInfo));
    }
    return map;
  }

  /**
   * Return an ObjectInspector for the row of data
   */
  @Override
  public ObjectInspector getObjectInspector() throws SerDeException {
    return rowOI;
  }

  /**
   * Unimplemented
   */
  @Override
  public SerDeStats getSerDeStats() {
    return null;
  }

  /**
   * JSON is just a textual representation, so our serialized class
   * is just Text.
   */
  @Override
  public Class<? extends Writable> getSerializedClass() {
    return Text.class;
  }

  /**
   * This method takes an object representing a row of data from Hive, and uses
   * the ObjectInspector to get the data for each column and serialize it. This
   * implementation deparses the row into an object that Jackson can easily
   * serialize into a JSON blob.
   */
  @Override
  public Writable serialize(Object obj, ObjectInspector oi) throws SerDeException {
    Object deparsedObj = deparseRow(obj, oi);
    ObjectMapper mapper = new ObjectMapper();
    try {
      // Let Jackson do the work of serializing the object
      return new Text(mapper.writeValueAsString(deparsedObj));
    } catch (Exception e) {
      throw new SerDeException(e);
    }
  }

  /**
   * Deparse a Hive object into a Jackson-serializable object. This uses
   * the ObjectInspector to extract the column data.
   *
   * @param obj - Hive object to deparse
   * @param oi - ObjectInspector for the object
   * @return - A deparsed object
   */
  private Object deparseObject(Object obj, ObjectInspector oi) {
    switch (oi.getCategory()) {
    case LIST:
      return deparseList(obj, (ListObjectInspector)oi);
    case MAP:
      return deparseMap(obj, (MapObjectInspector)oi);
    case PRIMITIVE:
      return deparsePrimitive(obj, (PrimitiveObjectInspector)oi);
    case STRUCT:
      return deparseStruct(obj, (StructObjectInspector)oi, false);
    case UNION:
      // Unsupported by JSON
    default:
      return null;
    }
  }

  /**
   * Deparses a row of data. We have to treat this one differently from
   * other structs, because the field names for the root object do not match
   * the column names for the Hive table.
   *
   * @param obj - Object representing the top-level row
   * @param structOI - ObjectInspector for the row
   * @return - A deparsed row of data
   */
  private Object deparseRow(Object obj, ObjectInspector structOI) {
    return deparseStruct(obj, (StructObjectInspector)structOI, true);
  }

  /**
   * Deparses struct data into a serializable JSON object.
   *
   * @param obj - Hive struct data
   * @param structOI - ObjectInspector for the struct
   * @param isRow - Whether or not this struct represents a top-level row
   * @return - A deparsed struct
   */
  private Object deparseStruct(Object obj, StructObjectInspector structOI, boolean isRow) {
    Map<Object,Object> struct = new HashMap<Object,Object>();
    List<? extends StructField> fields = structOI.getAllStructFieldRefs();
    for (int i = 0; i < fields.size(); i++) {
      StructField field = fields.get(i);
      // The top-level row object is treated slightly differently from other
      // structs, because the field names for the row do not correctly reflect
      // the Hive column names. For lower-level structs, we can get the field
      // name from the associated StructField object.
      String fieldName = isRow ? colNames.get(i) : field.getFieldName();
      ObjectInspector fieldOI = field.getFieldObjectInspector();
      Object fieldObj = structOI.getStructFieldData(obj, field);
      struct.put(fieldName, deparseObject(fieldObj, fieldOI));
    }
    return struct;
  }

  /**
   * Deparses a primitive type.
   *
   * @param obj - Hive object to deparse
   * @param oi - ObjectInspector for the object
   * @return - A deparsed object
   */
  private Object deparsePrimitive(Object obj, PrimitiveObjectInspector primOI) {
    return primOI.getPrimitiveJavaObject(obj);
  }

  private Object deparseMap(Object obj, MapObjectInspector mapOI) {
    Map<Object,Object> map = new HashMap<Object,Object>();
    ObjectInspector mapValOI = mapOI.getMapValueObjectInspector();
    Map<?,?> fields = mapOI.getMap(obj);
    for (Map.Entry<?,?> field : fields.entrySet()) {
      Object fieldName = field.getKey();
      Object fieldObj = field.getValue();
      map.put(fieldName, deparseObject(fieldObj, mapValOI));
    }
    return map;
  }

  /**
   * Deparses a list and its elements.
   *
   * @param obj - Hive object to deparse
   * @param oi - ObjectInspector for the object
   * @return - A deparsed object
   */
  private Object deparseList(Object obj, ListObjectInspector listOI) {
    List<Object> list = new ArrayList<Object>();
    List<?> field = listOI.getList(obj);
    ObjectInspector elemOI = listOI.getListElementObjectInspector();
    for (Object elem : field) {
      list.add(deparseObject(elem, elemOI));
    }
    return list;
  }
}
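Before wiring the SerDe into a table, it can help to exercise deserialize() directly. The following smoke test is a hypothetical sketch of my own (the class name and the sample record are made up here, and it assumes the Hive and Hadoop jars listed in the imports above are on the classpath); it uses the same three columns as the tweets table created later.

import java.util.List;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.serde.serdeConstants;
import org.apache.hadoop.io.Text;

import com.cloudera.hive.serde.JSONSerDe;

public class JSONSerDeSmokeTest {
    public static void main(String[] args) throws Exception {
        JSONSerDe serde = new JSONSerDe();
        Properties tbl = new Properties();
        // Column names and types must match the table definition.
        tbl.setProperty(serdeConstants.LIST_COLUMNS, "text,created_at,user_id");
        tbl.setProperty(serdeConstants.LIST_COLUMN_TYPES, "string:int:int");
        serde.initialize(new Configuration(), tbl);

        // One JSON record, as it would appear as a single line of the input file.
        Text line = new Text("{\"text\": \"hello hive\", \"created_at\": 1177292576, \"user_id\": 1}");
        List<?> row = (List<?>) serde.deserialize(line);
        System.out.println(row);  // expected: [hello hive, 1177292576, 1]
    }
}

If the sketch prints the three values in column order, the SerDe's field-to-column mapping behaves as expected.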
Using the SerDe
Then package JSONSerDe.java into a *.jar file with Eclipse and add the corresponding property to hive-site.xml; otherwise, when the Hive client runs a MapReduce operation it will fail with a Hive ClassNotFoundException saying that com.cloudera.hive.serde.JSONSerDe cannot be found. The solution is as follows:
You need to tell Hive about the JAR. This is how I do it in hive-site.xml: add the following configuration to $HIVE_INSTALL/conf/hive-site.xml, where the *.jar path in the value is wherever the jar actually lives on your machine (look under $HIVE_INSTALL/lib).

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///home/landen/UntarFile/hive-0.10.0/lib/*.jar</value>
  <description>These JAR file are available to all users for all jobs</description>
</property>
Notice: running only ADD JAR home/landen/UntarFile/hive-0.10.0/lib/*.jar is not enough; the *.jar path must also be made known to Hive before it starts, otherwise the exception described above will still occur.
Tables can be configured to process data using a SerDe by specifying the SerDe to use at table creation time, or through the use of an ALTER TABLE statement. For example:
create table if not exists tweets(
  text string comment 'tweet content',
  created_at int comment 'the time the tweet issued',
  user_id int comment 'user id')
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/home/landen/UntarFile/hive-0.10.0/StorageTable';
Related background:
1. SerDe is short for Serializer/Deserializer; its purpose is serialization and deserialization.
2. When creating a table, you can use your own SerDe or one of the SerDes that ship with Hive. A SerDe defines the table's columns and maps data onto those columns.

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...)
    [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [ROW FORMAT row_format]
  [STORED AS file_format]
  [LOCATION hdfs_path]

To create a table with a specific SerDe, use the ROW FORMAT row_format clause. For example:
a. Add the jar. In the Hive client: hive> add jar /run/serde_test.jar;
   or from the Linux shell: ${HIVE_HOME}/bin/hive -auxpath /run/serde_test.jar
b. Create the table: create table serde_table row format serde 'hive.connect.TestDeserializer';
3. Write the serialization class TestDeserializer, implementing the three methods of the Deserializer interface:
a) Initialization: initialize(Configuration conf, Properties tbl).
b) Deserialize a Writable into an Object: deserialize(Writable blob).
c) Get an inspector for the Object returned by deserialize(Writable blob): getObjectInspector().

public interface Deserializer {

  /**
   * Initialize the HiveDeserializer.
   * @param conf System properties
   * @param tbl table properties
   * @throws SerDeException
   */
  public void initialize(Configuration conf, Properties tbl) throws SerDeException;

  /**
   * Deserialize an object out of a Writable blob.
   * In most cases, the return value of this function will be constant since the function
   * will reuse the returned object.
   * If the client wants to keep a copy of the object, the client needs to clone the
   * returned value by calling ObjectInspectorUtils.getStandardObject().
   * @param blob The Writable object containing a serialized object
   * @return A Java object representing the contents in the blob.
   */
  public Object deserialize(Writable blob) throws SerDeException;

  /**
   * Get the object inspector that can be used to navigate through the internal
   * structure of the Object returned from deserialize(...).
   */
  public ObjectInspector getObjectInspector() throws SerDeException;

}
The Hive client session is as follows:
landen@landen-Lenovo:~/UntarFile/hive-0.10.0$ bin/hive
WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
Logging initialized using configuration in jar:file:/home/landen/UntarFile/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
Hive history file=/tmp/landen/hive_job_log_landen_201305051116_1786137407.txt
hive (default)> show databases;
OK
database_name
dataprocess
default
economy
financials
human_resources
login
student
Time taken: 4.411 seconds
hive (default)> use dataprocess;
OK
Time taken: 0.032 seconds
hive (dataprocess)> load data local inpath '/home/landen/文档/语料库/NLPIR——tweets.txt' overwrite into table tweets;
Copying data..........
hive (dataprocess)> describe tweets;
OK
col_name        data_type       comment
text            string          from deserializer
created_at      int             from deserializer
user_id         int             from deserializer
Time taken: 0.427 seconds
As you can see, the imported JSON data has been deserialized by the JSONSerDe into a data format that Hive can understand.

hive (dataprocess)> select * from tweets limit 20; //no MapReduce job is launched at this point
OK
text    created_at      user_id
@shizhao,我此前一直用dh的,你问问谁用bluehost借用一下就可以了。一般的小站流量根本没多大的..  1177292576  1
可以看了  1177248274  0
你给的链接无法查看  1177248218  0
转移备份,在看iyee关于blognetwork的文章...  1177174402  0
当帮主也不错啊  1177172873  0
没人告知  1177172446  0
twitter支持中文了? 原来头字符不能是中文的....  1177172440  0
我也要  1177172414  0
@geegi 你在skype上吗?  1177083182  0
... 可怜的AMD,但我相信它们比Intel更有钱途  1177082821  0
..... 并购ATi似乎不在这时候体现吧  1177082690  0
... 不过就是粘了点改革开放的春风,更多有钱的人不是踢足球的 :(  1177081404  0
@QeeGi 很有理  1177081154  0
... 不涨工资,还要存款,计划买房,压力不小,生活如此辛苦  1177080852  0
........ 偶要去吃kfc  1176980497  0
@hung 虽然显示面积大了,但感觉不太方便啊  1176961521  0
@hung 你不用书签栏  1176961395  0
$40-45 million ebay买下StumbleUpon  1176954286  0
... 加班ing  1176890179  0
... wjs就是典型的小资,鄙视  1176884977  0
Time taken: 12.161 seconds
hive (dataprocess)> select count(*) from tweets; //the MapReduce job starts here
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.pe
r.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapred.reduce.tasks=<number> Starting Job = job_201305041640_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305041640_0008 Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201305041640_0008 Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1 2013-05-05 11:20:50,690 Stage-1 map = 0%, reduce = 0% 2013-05-05 11:21:36,395 Stage-1 map = 6%, reduce = 0% 2013-05-05 11:22:02,540 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:03,545 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:04,549 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:05,552 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:06,556 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:07,559 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:08,564 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:09,569 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:10,572 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:11,593 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:13,348 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:14,351 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:15,355 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:16,358 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:17,361 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:18,365 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:19,369 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:20,373 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:21,376 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:22,380 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:23,384 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:24,389 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:26,460 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:27,464 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:28,468 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:29,471 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:30,793 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:32,357 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:33,706 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:34,709 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:36,622 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:37,626 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:38,631 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:39,635 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:40,639 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:41,643 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:42,648 Stage-1 map = 
25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:43,651 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:44,655 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:45,659 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:46,662 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 39.64 sec 2013-05-05 11:22:47,669 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 80.39 sec 2013-05-05 11:22:48,683 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 80.39 sec 2013-05-05 11:22:49,686 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 80.39 sec 2013-05-05 11:22:50,693 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 80.39 sec 2013-05-05 11:22:51,696 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 80.39 sec 2013-05-05 11:22:52,699 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 80.39 sec 2013-05-05 11:22:53,705 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:22:54,987 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:22:55,994 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:22:56,998 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:22:58,003 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:22:59,010 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:23:00,017 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:23:01,021 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 111.41 sec 2013-05-05 11:23:02,655 Stage-1 map = 38%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:04,766 Stage-1 map = 38%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:06,201 Stage-1 map = 38%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:07,945 Stage-1 map = 38%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:09,201 Stage-1 map = 38%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:10,624 Stage-1 map = 38%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:11,628 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:13,317 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:14,323 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:15,327 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:16,331 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:17,334 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:18,405 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:19,409 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:20,412 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:21,417 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:22,420 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:27,402 Stage-1 map = 44%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:30,861 Stage-1 map = 50%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:31,865 Stage-1 map = 50%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:33,569 Stage-1 map = 50%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:34,573 Stage-1 map = 50%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:35,576 Stage-1 map = 50%, reduce = 8%, Cumulative CPU 111.41 sec 2013-05-05 11:23:36,630 Stage-1 map = 56%, reduce = 8%, Cumulative CPU 131.8 sec 2013-05-05 11:23:37,635 Stage-1 map = 56%, reduce = 8%, Cumulative CPU 131.8 
sec 2013-05-05 11:23:38,671 Stage-1 map = 56%, reduce = 8%, Cumulative CPU 131.8 sec 2013-05-05 11:23:39,676 Stage-1 map = 56%, reduce = 8%, Cumulative CPU 131.8 sec 2013-05-05 11:23:40,683 Stage-1 map = 56%, reduce = 8%, Cumulative CPU 131.8 sec 2013-05-05 11:23:41,691 Stage-1 map = 56%, reduce = 8%, Cumulative CPU 131.8 sec 2013-05-05 11:23:42,701 Stage-1 map = 56%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:43,705 Stage-1 map = 56%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:44,752 Stage-1 map = 56%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:45,755 Stage-1 map = 56%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:46,758 Stage-1 map = 56%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:47,769 Stage-1 map = 56%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:48,773 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:49,776 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:50,779 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:51,784 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:52,788 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:53,793 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 131.8 sec 2013-05-05 11:23:54,812 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:23:55,831 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:23:56,834 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:23:57,838 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:23:58,843 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:23:59,918 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:00,921 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:01,924 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:02,927 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:03,931 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:04,934 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:05,938 Stage-1 map = 63%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:06,941 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:07,944 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:08,948 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:09,952 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:10,956 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:11,960 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:12,964 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:13,968 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:14,973 Stage-1 map = 69%, reduce = 17%, Cumulative CPU 182.57 sec 2013-05-05 11:24:15,977 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:16,981 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:17,985 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:18,988 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:19,992 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:20,995 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 
2013-05-05 11:24:21,998 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:23,001 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:24,008 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:25,012 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:26,016 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:27,024 Stage-1 map = 94%, reduce = 17%, Cumulative CPU 198.58 sec 2013-05-05 11:24:28,028 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 225.88 sec 2013-05-05 11:24:29,034 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 225.88 sec 2013-05-05 11:24:30,037 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 225.88 sec 2013-05-05 11:24:31,043 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 225.88 sec 2013-05-05 11:24:32,046 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 225.88 sec 2013-05-05 11:24:33,049 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 225.88 sec 2013-05-05 11:24:34,055 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec 2013-05-05 11:24:35,058 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec 2013-05-05 11:24:36,061 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec 2013-05-05 11:24:37,065 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec 2013-05-05 11:24:38,068 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec 2013-05-05 11:24:39,072 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec 2013-05-05 11:24:40,076 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 227.04 sec MapReduce Total cumulative CPU time: 3 minutes 47 seconds 40 msec Ended Job = job_201305041640_0008 MapReduce Jobs Launched: Job 0: Map: 4 Reduce: 1 Cumulative CPU: 227.04 sec HDFS Read: 845494724 HDFS Write: 8 SUCCESS Total MapReduce CPU Time Spent: 3 minutes 47 seconds 40 msec OK _c0 4999999 (5 million tweets in the filtered Twitter corpus) Time taken: 266.063 seconds hive (dataprocess)> select text,created_at from tweets where user_id = 1; Total MapReduce jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_201305041640_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305041640_0009 Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job -kill job_201305041640_0009 Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0 2013-05-05 20:45:19,007 Stage-1 map = 0%, reduce = 0% 2013-05-05 20:45:48,825 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:49,836 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:50,838 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:51,841 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:52,844 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:56,152 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:57,158 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:58,161 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:45:59,163 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:00,166 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:01,169 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:02,200 Stage-1 map = 13%, reduce = 0%,
Cumulative CPU 13.11 sec 2013-05-05 20:46:03,203 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:04,206 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:05,208 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:06,212 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:07,215 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:08,219 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:09,225 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:10,227 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:11,231 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:12,234 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:13,237 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:14,239 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:15,242 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:16,244 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:17,247 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:18,250 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:19,256 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:20,260 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:21,263 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:22,266 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:23,277 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:24,279 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:25,282 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:26,286 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:27,290 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:28,292 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:29,296 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:30,298 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:31,301 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:32,303 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:33,306 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 13.11 sec 2013-05-05 20:46:34,610 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:35,688 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:36,693 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:37,696 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:38,698 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:39,701 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:40,703 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:41,707 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:42,710 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:43,713 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:44,715 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:45,718 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:46,721 Stage-1 
map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:47,723 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:48,728 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:49,732 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:50,764 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:51,820 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:52,823 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:53,879 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:54,998 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:56,161 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:57,164 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:58,167 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:46:59,262 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:47:00,811 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 100.22 sec 2013-05-05 20:47:02,161 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:03,164 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:04,166 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:05,169 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:08,703 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:09,710 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:10,713 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:11,715 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:12,718 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:13,723 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:14,726 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:15,729 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:16,732 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:17,737 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:18,739 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:19,745 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:20,749 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:21,754 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:22,757 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:23,760 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:24,763 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:25,766 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:26,770 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:27,778 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:28,781 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:29,784 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:30,788 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:31,791 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:32,795 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 111.27 sec 2013-05-05 20:47:33,798 Stage-1 map = 94%, reduce = 0%, 
Cumulative CPU 111.27 sec 2013-05-05 20:47:34,964 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 155.37 sec 2013-05-05 20:47:35,967 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 155.37 sec 2013-05-05 20:47:37,161 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 155.37 sec 2013-05-05 20:47:38,173 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 155.37 sec 2013-05-05 20:47:39,176 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 155.37 sec 2013-05-05 20:47:40,244 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:41,247 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:42,249 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:43,319 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:44,322 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:45,325 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:46,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:47,330 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:48,333 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 158.69 sec 2013-05-05 20:47:49,644 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 158.69 sec MapReduce Total cumulative CPU time: 2 minutes 38 seconds 690 msec Ended Job = job_201305041640_0009 MapReduce Jobs Launched: Job 0: Map: 4 Cumulative CPU: 158.69 sec HDFS Read: 845494724 HDFS Write: 138 SUCCESS Total MapReduce CPU Time Spent: 2 minutes 38 seconds 690 msec OK text created_at @shizhao,我此前一直用dh的,你问问谁用bluehost借用一下就可以了。一般的小站流量根本没多大的.. 1177292576 Time taken: 172.857 seconds hive (dataprocess)>
Conclusion
The SerDe interface is extremely powerful for dealing with data with a complex schema. By utilizing SerDes, any dataset can be made queryable through Hive.
References:
(How-to: Use a SerDe in Apache Hive) http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
(Overview of SerDe in Hive) http://blog.csdn.net/dajuezhao/article/details/5753791
(Big Data Solution Design) http://www.infoq.com/cn/articles/BigDataBlueprint
(Storing JSON-formatted data in MongoDB) http://www.myexception.cn/database/502613.html