How to Import JSON Data into Hive


    Background

    1. When processes communicate remotely they can send each other data of any type, but whatever the type, it travels over the network as a binary sequence. The sender must convert an object into a byte sequence before it can be transmitted; this is called object serialization. The receiver must restore the byte sequence back into an object; this is called object deserialization.

    2. Hive's deserialization turns a key/value pair into the values of the columns of a Hive table row.

    3. Hive can load data into a table without transforming it first, which saves a great deal of time when processing massive data sets (a minimal sketch follows).
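
    For instance, loading a tab-delimited text file into a plain-text table simply moves the file into the table's storage directory, with no conversion step. A minimal HiveQL sketch (the table name and path are hypothetical, not from this article):

    CREATE TABLE access_log (ip STRING, url STRING, ts INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
    LOAD DATA LOCAL INPATH '/tmp/access_log.txt' INTO TABLE access_log;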

    Solution 1: import the JSON data into MongoDB; MongoDB can then convert the data to CSV, which can in turn be imported into MySQL.

    CSSer.com runs WordPress with a MySQL database, so migrating to MongoDB requires converting the data.
    
    There are several ways to migrate the data; essentially, the MySQL data just has to be converted into a format MongoDB can import directly. MongoDB ships with the mongoimport tool, which supports importing JSON and CSV.
    
    First, a look at the options mongoimport supports:
    
    $ mongoimport --help
    options:
      --help                  produce help message
      -v [ --verbose ]        be more verbose (include multiple times for more
                              verbosity e.g. -vvvvv)
      -h [ --host ] arg       mongo host to connect to ( <set name>/s1,s2 for sets)
      --port arg              server port. Can also use --host hostname:port
      --ipv6                  enable IPv6 support (disabled by default)
      -u [ --username ] arg   username
      -p [ --password ] arg   password
      --dbpath arg            directly access mongod database files in the given
                              path, instead of connecting to a mongod  server -
                              needs to lock the data directory, so cannot be used
                              if a mongod is currently accessing the same path
      --directoryperdb        if dbpath specified, each db is in a separate
                              directory
      -d [ --db ] arg         database to use
      -c [ --collection ] arg collection to use (some commands)
      -f [ --fields ] arg     comma separated list of field names e.g. -f name,age
      --fieldFile arg         file with fields names - 1 per line
      --ignoreBlanks          if given, empty fields in csv and tsv will be ignored
      --type arg              type of file to import.  default: json (json,csv,tsv)
      --file arg              file to import from; if not specified stdin is used
      --drop                  drop collection first
      --headerline            CSV,TSV only - use first line as headers
      --upsert                insert or update objects that already exist
      --upsertFields arg      comma-separated fields for the query part of the
                              upsert. You should make sure this is indexed
      --stopOnError           stop importing at first error rather than continuing
      --jsonArray             load a json array, not one item per line. Currently
                              limited to 4MB.
    
    As the help text above shows, using CSV as the intermediate format is about the lowest-cost option, both for the MySQL export and for the MongoDB import, so I gave it a try:
    
    First, export the wp-posts table from the MySQL database. Taking a shortcut, I used phpMyAdmin's export feature, chose the CSV format, checked "remove line breaks within fields" and "put field names in the first row", and saved the file as csser.csv.
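
    For reference, roughly the same export can be done from the MySQL command line instead of phpMyAdmin. A sketch (the WordPress posts table is assumed to be wp_posts, the output path is hypothetical, the MySQL server needs the FILE privilege, and unlike the phpMyAdmin export this writes no header row, which --headerline later relies on):

    SELECT * FROM wp_posts
    INTO OUTFILE '/tmp/csser.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';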
    
    Next, on the MongoDB server, connect to the database from the shell and import the data:
    
    $ mongoimport -d csser -c posts --type csv --file csser.csv --headerline
    connected to: 127.0.0.1
    imported 548 objects
    
    $ mongo
    MongoDB shell version: 1.8.1
    connecting to: test
    > use csser
    switched to db csser
    > db.posts.count()
    547
    
    > db.posts.find({}, {"post_title":1}).sort({"ID":-1}).limit(1)
    { "_id" : ObjectId("4df4641d31b0642fe609426d"), "post_title" : "CSS Sprites在线应用推荐-CSS-sprit" }

    Solution 2: use Hive's SerDe mechanism to convert the JSON data into a format Hive understands. The reasons:

           1. To create a Hive table that uses custom (de)serialization, you write your own class implementing Deserializer and select it via the ROW FORMAT clause of the CREATE command;

           2. When processing massive data sets, if the data format matches the table structure, Hive's deserialization can be applied directly without transforming the data, saving a great deal of time.

    How-to: Use a SerDe in Apache Hive

    Apache Hive is a fantastic tool for performing SQL-style queries across data that is often not appropriate for a relational database. For example, semistructured and unstructured data can be queried gracefully via Hive, due to two core features: The first is Hive’s support of complex data types, such as structs, arrays, and unions, in addition to many of the common data types found in most relational databases. The second feature is the SerDe.

    What is a SerDe?

    The SerDe interface allows you to instruct Hive as to how a record should be processed. A SerDe is a combination of a Serializer and a Deserializer (hence, Ser-De). The Deserializer interface takes a string or binary representation of a record, and translates it into a Java object that Hive can manipulate. The Serializer, however, will take a Java object that Hive has been working with, and turn it into something that Hive can write to HDFS or another supported system. Commonly, Deserializers are used at query time to execute SELECT statements, and Serializers are used when writing data, such as through an INSERT-SELECT statement.
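
    To make that split concrete, here is a small HiveQL sketch (the table names are hypothetical and are assumed to use a JSON SerDe such as the one developed below):

    -- Reading: the Deserializer turns each stored JSON record into a row at query time.
    SELECT user_id, text FROM tweets_json WHERE user_id = 1;

    -- Writing: the Serializer turns each result row back into a JSON record on disk.
    INSERT OVERWRITE TABLE tweets_json_archive
    SELECT * FROM tweets_json WHERE created_at < 1177000000;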

    In this article, we will examine a SerDe for processing JSON data, which can be used to transform a JSON record into something that Hive can process.

    Developing a SerDe

    The code of JSONSerDe.java is as follows:

    /**
    * Licensed to the Apache Software Foundation (ASF) under one
    * or more contributor license agreements. See the NOTICE file
    * distributed with this work for additional information
    * regarding copyright ownership. The ASF licenses this file
    * to you under the Apache License, Version 2.0 (the
    * "License"); you may not use this file except in compliance
    * with the License. You may obtain a copy of the License at
    *
    * http://www.apache.org/licenses/LICENSE-2.0
    *
    * Unless required by applicable law or agreed to in writing, software
    * distributed under the License is distributed on an "AS IS" BASIS,
    * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    * See the License for the specific language governing permissions and
    * limitations under the License.
    */
    package com.cloudera.hive.serde;
    
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.serde.serdeConstants;
    import org.apache.hadoop.hive.serde2.SerDe;
    import org.apache.hadoop.hive.serde2.SerDeException;
    import org.apache.hadoop.hive.serde2.SerDeStats;
    import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.MapObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.StructField;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.typeinfo.ListTypeInfo;
    import org.apache.hadoop.hive.serde2.typeinfo.MapTypeInfo;
    import org.apache.hadoop.hive.serde2.typeinfo.StructTypeInfo;
    import org.apache.hadoop.hive.serde2.typeinfo.TypeInfo;
    import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory;
    import org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.Writable;
    import org.codehaus.jackson.map.ObjectMapper;
    
    /**
    * This SerDe can be used for processing JSON data in Hive. It supports
    * arbitrary JSON data, and can handle all Hive types except for UNION.
    * However, the JSON data is expected to be a series of discrete records,
    * rather than a JSON array of objects.
    *
    * The Hive table is expected to contain columns with names corresponding to
    * fields in the JSON data, but it is not necessary for every JSON field to
    * have a corresponding Hive column. Those JSON fields will be ignored during
    * queries.
    *
    * Example:
    *
    * { "a": 1, "b": [ "str1", "str2" ], "c": { "field1": "val1" } }
    *
    * Could correspond to a table:
    *
    * CREATE TABLE foo (a INT, b ARRAY<STRING>, c STRUCT<field1:STRING>);
    *
    * JSON objects can also interpreted as a Hive MAP type, so long as the keys
    * and values in the JSON object are all of the appropriate types. For example,
    * in the JSON above, another valid table declaraction would be:
    *
    * CREATE TABLE foo (a INT, b ARRAY<STRING>, c MAP<STRING,STRING>);
    *
    * Only STRING keys are supported for Hive MAPs.
    */
    public class JSONSerDe implements SerDe {
      
      private StructTypeInfo rowTypeInfo;
      private ObjectInspector rowOI;
      private List<String> colNames;
      private List<Object> row = new ArrayList<Object>();
      
      /**
       * An initialization function used to gather information about the table.
       * Typically, a SerDe implementation will be interested in the list of
       * column names and their types. That information will be used to help perform
       * actual serialization and deserialization of data.
       */
      @Override
      public void initialize(Configuration conf, Properties tbl)
          throws SerDeException {
        // Get a list of the table's column names.
        String colNamesStr = tbl.getProperty(serdeConstants.LIST_COLUMNS);
        colNames = Arrays.asList(colNamesStr.split(","));
        
        // Get a list of TypeInfos for the columns. This list lines up with
        // the list of column names.
        String colTypesStr = tbl.getProperty(serdeConstants.LIST_COLUMN_TYPES);
        List<TypeInfo> colTypes =
            TypeInfoUtils.getTypeInfosFromTypeString(colTypesStr);
        
        rowTypeInfo =
            (StructTypeInfo) TypeInfoFactory.getStructTypeInfo(colNames, colTypes);
        rowOI =
            TypeInfoUtils.getStandardJavaObjectInspectorFromTypeInfo(rowTypeInfo);
      }
    
      /**
       * This method does the work of deserializing a record into Java objects that
       * Hive can work with via the ObjectInspector interface. For this SerDe, the
       * blob that is passed in is a JSON string, and the Jackson JSON parser is
       * being used to translate the string into Java objects.
       *
       * The JSON deserialization works by taking the column names in the Hive
       * table, and looking up those fields in the parsed JSON object. If the value
       * of the field is not a primitive, the object is parsed further.
       */
      @Override
      public Object deserialize(Writable blob) throws SerDeException {
        Map<?,?> root = null;
        row.clear();
        try {
          ObjectMapper mapper = new ObjectMapper();
          // This is really a Map<String, Object>. For more information about how
          // Jackson parses JSON in this example, see
          // http://wiki.fasterxml.com/JacksonDataBinding
          root = mapper.readValue(blob.toString(), Map.class);
        } catch (Exception e) {
          throw new SerDeException(e);
        }
    
        // Lowercase the keys as expected by hive
        Map<String, Object> lowerRoot = new HashMap<String, Object>();
        for (Map.Entry<?, ?> entry : root.entrySet()) {
          lowerRoot.put(((String) entry.getKey()).toLowerCase(), entry.getValue());
        }
        root = lowerRoot;
        
        Object value= null;
        for (String fieldName : rowTypeInfo.getAllStructFieldNames()) {
          try {
            TypeInfo fieldTypeInfo = rowTypeInfo.getStructFieldTypeInfo(fieldName);
            value = parseField(root.get(fieldName), fieldTypeInfo);
          } catch (Exception e) {
            value = null;
          }
          row.add(value);
        }
        return row;
      }
      
      /**
       * Parses a JSON object according to the Hive column's type.
       *
       * @param field - The JSON object to parse
       * @param fieldTypeInfo - Metadata about the Hive column
       * @return - The parsed value of the field
       */
      private Object parseField(Object field, TypeInfo fieldTypeInfo) {
        switch (fieldTypeInfo.getCategory()) {
        case PRIMITIVE:
          // Jackson will return the right thing in this case, so just return
          // the object
          if (field instanceof String) {
            field = field.toString().replaceAll("\n", "\\\\n");
          }
          return field;
        case LIST:
          return parseList(field, (ListTypeInfo) fieldTypeInfo);
        case MAP:
          return parseMap(field, (MapTypeInfo) fieldTypeInfo);
        case STRUCT:
          return parseStruct(field, (StructTypeInfo) fieldTypeInfo);
        case UNION:
          // Unsupported by JSON
        default:
          return null;
        }
      }
      
      /**
       * Parses a JSON object and its fields. The Hive metadata is used to
       * determine how to parse the object fields.
       *
       * @param field - The JSON object to parse
       * @param fieldTypeInfo - Metadata about the Hive column
       * @return - A map representing the object and its fields
       */
      private Object parseStruct(Object field, StructTypeInfo fieldTypeInfo) {
        Map<Object,Object> map = (Map<Object,Object>)field;
        ArrayList<TypeInfo> structTypes = fieldTypeInfo.getAllStructFieldTypeInfos();
        ArrayList<String> structNames = fieldTypeInfo.getAllStructFieldNames();
        
        List<Object> structRow = new ArrayList<Object>(structTypes.size());
        for (int i = 0; i < structNames.size(); i++) {
          structRow.add(parseField(map.get(structNames.get(i)), structTypes.get(i)));
        }
        return structRow;
      }
    
      /**
       * Parse a JSON list and its elements. This uses the Hive metadata for the
       * list elements to determine how to parse the elements.
       *
       * @param field - The JSON list to parse
       * @param fieldTypeInfo - Metadata about the Hive column
       * @return - A list of the parsed elements
       */
      private Object parseList(Object field, ListTypeInfo fieldTypeInfo) {
        ArrayList<Object> list = (ArrayList<Object>) field;
        TypeInfo elemTypeInfo = fieldTypeInfo.getListElementTypeInfo();
        
        for (int i = 0; i < list.size(); i++) {
          list.set(i, parseField(list.get(i), elemTypeInfo));
        }
        
        return list.toArray();
      }
    
      /**
       * Parse a JSON object as a map. This uses the Hive metadata for the map
       * values to determine how to parse the values. The map is assumed to have
       * a string for a key.
       *
       * @param field - The JSON list to parse
       * @param fieldTypeInfo - Metadata about the Hive column
       * @return
       */
      private Object parseMap(Object field, MapTypeInfo fieldTypeInfo) {
        Map<Object,Object> map = (Map<Object,Object>) field;
        TypeInfo valueTypeInfo = fieldTypeInfo.getMapValueTypeInfo();
        
        for (Map.Entry<Object,Object> entry : map.entrySet()) {
          map.put(entry.getKey(), parseField(entry.getValue(), valueTypeInfo));
        }
        return map;
      }
    
      /**
       * Return an ObjectInspector for the row of data
       */
      @Override
      public ObjectInspector getObjectInspector() throws SerDeException {
        return rowOI;
      }
    
      /**
       * Unimplemented
       */
      @Override
      public SerDeStats getSerDeStats() {
        return null;
      }
    
      /**
       * JSON is just a textual representation, so our serialized class
       * is just Text.
       */
      @Override
      public Class<? extends Writable> getSerializedClass() {
        return Text.class;
      }
    
      /**
       * This method takes an object representing a row of data from Hive, and uses
       * the ObjectInspector to get the data for each column and serialize it. This
       * implementation deparses the row into an object that Jackson can easily
       * serialize into a JSON blob.
       */
      @Override
      public Writable serialize(Object obj, ObjectInspector oi)
          throws SerDeException {
        Object deparsedObj = deparseRow(obj, oi);
        ObjectMapper mapper = new ObjectMapper();
        try {
          // Let Jackson do the work of serializing the object
          return new Text(mapper.writeValueAsString(deparsedObj));
        } catch (Exception e) {
          throw new SerDeException(e);
        }
      }
    
      /**
       * Deparse a Hive object into a Jackson-serializable object. This uses
       * the ObjectInspector to extract the column data.
       *
       * @param obj - Hive object to deparse
       * @param oi - ObjectInspector for the object
       * @return - A deparsed object
       */
      private Object deparseObject(Object obj, ObjectInspector oi) {
        switch (oi.getCategory()) {
        case LIST:
          return deparseList(obj, (ListObjectInspector)oi);
        case MAP:
          return deparseMap(obj, (MapObjectInspector)oi);
        case PRIMITIVE:
          return deparsePrimitive(obj, (PrimitiveObjectInspector)oi);
        case STRUCT:
          return deparseStruct(obj, (StructObjectInspector)oi, false);
        case UNION:
          // Unsupported by JSON
        default:
          return null;
        }
      }
      
      /**
       * Deparses a row of data. We have to treat this one differently from
       * other structs, because the field names for the root object do not match
       * the column names for the Hive table.
       *
       * @param obj - Object representing the top-level row
       * @param structOI - ObjectInspector for the row
       * @return - A deparsed row of data
       */
      private Object deparseRow(Object obj, ObjectInspector structOI) {
        return deparseStruct(obj, (StructObjectInspector)structOI, true);
      }
    
      /**
       * Deparses struct data into a serializable JSON object.
       *
       * @param obj - Hive struct data
       * @param structOI - ObjectInspector for the struct
       * @param isRow - Whether or not this struct represents a top-level row
       * @return - A deparsed struct
       */
      private Object deparseStruct(Object obj,
                                   StructObjectInspector structOI,
                                   boolean isRow) {
        Map<Object,Object> struct = new HashMap<Object,Object>();
        List<? extends StructField> fields = structOI.getAllStructFieldRefs();
        for (int i = 0; i < fields.size(); i++) {
          StructField field = fields.get(i);
          // The top-level row object is treated slightly differently from other
          // structs, because the field names for the row do not correctly reflect
          // the Hive column names. For lower-level structs, we can get the field
          // name from the associated StructField object.
          String fieldName = isRow ? colNames.get(i) : field.getFieldName();
          ObjectInspector fieldOI = field.getFieldObjectInspector();
          Object fieldObj = structOI.getStructFieldData(obj, field);
          struct.put(fieldName, deparseObject(fieldObj, fieldOI));
        }
        return struct;
      }
    
      /**
       * Deparses a primitive type.
       *
       * @param obj - Hive object to deparse
       * @param oi - ObjectInspector for the object
       * @return - A deparsed object
       */
      private Object deparsePrimitive(Object obj, PrimitiveObjectInspector primOI) {
        return primOI.getPrimitiveJavaObject(obj);
      }
    
      private Object deparseMap(Object obj, MapObjectInspector mapOI) {
        Map<Object,Object> map = new HashMap<Object,Object>();
        ObjectInspector mapValOI = mapOI.getMapValueObjectInspector();
        Map<?,?> fields = mapOI.getMap(obj);
        for (Map.Entry<?,?> field : fields.entrySet()) {
          Object fieldName = field.getKey();
          Object fieldObj = field.getValue();
          map.put(fieldName, deparseObject(fieldObj, mapValOI));
        }
        return map;
      }
    
      /**
       * Deparses a list and its elements.
       *
       * @param obj - Hive object to deparse
       * @param oi - ObjectInspector for the object
       * @return - A deparsed object
       */
      private Object deparseList(Object obj, ListObjectInspector listOI) {
        List<Object> list = new ArrayList<Object>();
        List<?> field = listOI.getList(obj);
        ObjectInspector elemOI = listOI.getListElementObjectInspector();
        for (Object elem : field) {
          list.add(deparseObject(elem, elemOI));
        }
        return list;
      }
    }

    Using the SerDe

    Then package JSONSerDe.java into a *.jar with Eclipse and add the corresponding property to hive-site.xml; otherwise, when the Hive client runs MapReduce jobs it throws a ClassNotFoundException saying com.cloudera.hive.serde.JSONSerDe cannot be found. The solution is as follows:

    You need to tell Hive about the JAR. This is how I do it in hive-site.xml:
    Add the following property to $HIVE_INSTALL/conf/hive-site.xml; the *.jar path in the value should point to where the jars actually live on your machine, under the $HIVE_INSTALL/lib directory.
    <property>
      <name>hive.aux.jars.path</name>
      <value>file:///home/landen/UntarFile/hive-0.10.0/lib/*.jar</value>
      <description>These JAR files are available to all users for all jobs</description>
    </property>

    Notice: running ADD JAR /home/landen/UntarFile/hive-0.10.0/lib/*.jar by itself is not enough; Hive must be told the *.jar path before it starts, otherwise the ClassNotFoundException above still occurs.

    Tables can be configured to process data using a SerDe by specifying the SerDe to use at table creation time, or through the use of an ALTER TABLE statement. For example:

    create table if not exists tweets(
           text string comment 'tweet content',
           created_at int comment 'the time the tweet was issued',
           user_id int comment 'user id')
    ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
    LOCATION '/home/landen/UntarFile/hive-0.10.0/StorageTable' ;       
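
    Once the table exists and the SerDe jar is on Hive's classpath, the JSON file can be loaded and queried like any other table. A sketch of the flow (the input path and query are illustrative; each input line must be a single JSON object, as this SerDe expects):

    -- Each line of the input file is one record, e.g.
    -- {"text": "hello hive", "created_at": 1177292576, "user_id": 1}
    LOAD DATA LOCAL INPATH '/tmp/tweets.json' OVERWRITE INTO TABLE tweets;
    SELECT user_id, count(*) AS tweet_cnt FROM tweets GROUP BY user_id;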

    Related background:

    1. SerDe is short for Serializer/Deserializer; it is the mechanism for serializing and deserializing records.
    
    2. When creating a table the user may specify a custom SerDe or use one of Hive's built-in ones. The SerDe defines the table's columns and maps the data onto those columns.
    
    CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
      [(col_name data_type [COMMENT col_comment], ...)]
      [COMMENT table_comment]
      [PARTITIONED BY (col_name data_type
        [COMMENT col_comment], ...)]
      [CLUSTERED BY (col_name, col_name, ...)
      [SORTED BY (col_name [ASC|DESC], ...)]
      INTO num_buckets BUCKETS]
      [ROW FORMAT row_format]
      [STORED AS file_format]
      [LOCATION hdfs_path]
    
    To create a table that uses a specific SerDe, supply the ROW FORMAT row_format clause, for example:
    
    a. Add the jar. In the Hive client, run: hive> add jar /run/serde_test.jar;
       or from the Linux shell: ${HIVE_HOME}/bin/hive -auxpath /run/serde_test.jar
    b. Create the table: create table serde_table row format serde 'hive.connect.TestDeserializer';
    
    3. Write the deserialization class TestDeserializer by implementing the three methods of the Deserializer interface:
    
    a) Initialization: initialize(Configuration conf, Properties tbl).
    
    b) Deserialize a Writable and return an Object: deserialize(Writable blob).
    
    c) Get the ObjectInspector for the Object returned by deserialize(Writable blob): getObjectInspector().
    
    public interface Deserializer {
    
      /**
       * Initialize the HiveDeserializer.
       * @param conf System properties
       * @param tbl  table properties
       * @throws SerDeException
       */
      public void initialize(Configuration conf, Properties tbl) throws SerDeException;
      
      /**
       * Deserialize an object out of a Writable blob.
       * In most cases, the return value of this function will be constant since the function
       * will reuse the returned object.
       * If the client wants to keep a copy of the object, the client needs to clone the
       * returned value by calling ObjectInspectorUtils.getStandardObject().
       * @param blob The Writable object containing a serialized object
       * @return A Java object representing the contents in the blob.
       */
      public Object deserialize(Writable blob) throws SerDeException;
    
      /**
       * Get the object inspector that can be used to navigate through the internal
       * structure of the Object returned from deserialize(...).
       */
      public ObjectInspector getObjectInspector() throws SerDeException;
    
    }

    The Hive client session runs as follows:

    landen@landen-Lenovo:~/UntarFile/hive-0.10.0$ bin/hive
    WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
    Logging initialized using configuration in jar:file:/home/landen/UntarFile/hive-0.10.0/lib/hive-common-0.10.0.jar!/hive-log4j.properties
    Hive history file=/tmp/landen/hive_job_log_landen_201305051116_1786137407.txt
    hive (default)> show databases;
    OK
    database_name
    dataprocess
    default
    economy
    financials
    human_resources
    login
    student
    Time taken: 4.411 seconds
    hive (default)> use dataprocess;
    OK
    Time taken: 0.032 seconds
    hive (dataprocess)> load data local inpath '/home/landen/文档/语料库/NLPIR——tweets.txt' overwrite into table tweets;
    Copying data..........
    OK
    hive (dataprocess)> describe tweets;
    OK
    col_name      data_type    comment
    text          string       from deserializer
    created_at    int          from deserializer
    user_id       int          from deserializer
    Time taken: 0.427 seconds
    As the describe output shows, the imported JSON data has been deserialized by JSONSerDe into a format Hive can understand.
    hive (dataprocess)> select * from tweets limit 20;  // no MapReduce job is launched for this query
    OK
    text    created_at    user_id
    @shizhao,我此前一直用dh的,你问问谁用bluehost借用一下就可以了。一般的小站流量根本没多大的..    1177292576    1
    可以看了    1177248274    0
    你给的链接无法查看    1177248218    0
    转移备份,在看iyee关于blognetwork的文章...    1177174402    0
    当帮主也不错啊    1177172873    0
    没人告知    1177172446    0
    twitter支持中文了? 原来头字符不能是中文的....    1177172440    0
    我也要    1177172414    0
    @geegi 你在skype上吗?    1177083182    0
    ... 可怜的AMD,但我相信它们比Intel更有钱途    1177082821    0
    ..... 并购ATi似乎不在这时候体现吧    1177082690    0
    ... 不过就是粘了点改革开放的春风,更多有钱的人不是踢足球的 :(    1177081404    0
    @QeeGi 很有理    1177081154    0
    ... 不涨工资,还要存款,计划买房,压力不小,生活如此辛苦    1177080852    0
    ........ 偶要去吃kfc    1176980497    0
    @hung 虽然显示面积大了,但感觉不太方便啊    1176961521    0
    @hung 你不用书签栏    1176961395    0
    $40-45 million ebay买下StumbleUpon    1176954286    0
    ... 加班ing    1176890179    0
    ... wjs就是典型的小资,鄙视    1176884977    0
    Time taken: 12.161 seconds
    hive (dataprocess)> select count(*) from tweets;  // this query does launch a MapReduce job
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_201305041640_0008, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305041640_0008
    Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305041640_0008
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 1
    2013-05-05 11:20:50,690 Stage-1 map = 0%,  reduce = 0%
    2013-05-05 11:21:36,395 Stage-1 map = 6%,  reduce = 0%
    2013-05-05 11:22:02,540 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:03,545 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:04,549 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:05,552 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:06,556 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:07,559 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:08,564 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:09,569 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:10,572 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:11,593 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:13,348 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:14,351 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:15,355 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:16,358 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:17,361 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:18,365 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:19,369 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:20,373 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:21,376 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:22,380 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:23,384 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:24,389 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:26,460 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:27,464 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:28,468 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:29,471 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:30,793 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:32,357 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:33,706 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:34,709 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:36,622 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:37,626 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:38,631 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:39,635 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:40,639 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:41,643 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:42,648 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:43,651 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:44,655 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:45,659 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:46,662 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 39.64 sec
    2013-05-05 11:22:47,669 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.39 sec
    2013-05-05 11:22:48,683 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.39 sec
    2013-05-05 11:22:49,686 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.39 sec
    2013-05-05 11:22:50,693 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 80.39 sec
    2013-05-05 11:22:51,696 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 80.39 sec
    2013-05-05 11:22:52,699 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 80.39 sec
    2013-05-05 11:22:53,705 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:22:54,987 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:22:55,994 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:22:56,998 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:22:58,003 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:22:59,010 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:00,017 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:01,021 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:02,655 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:04,766 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:06,201 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:07,945 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:09,201 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:10,624 Stage-1 map = 38%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:11,628 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:13,317 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:14,323 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:15,327 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:16,331 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:17,334 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:18,405 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:19,409 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:20,412 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:21,417 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:22,420 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:27,402 Stage-1 map = 44%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:30,861 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:31,865 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:33,569 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:34,573 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:35,576 Stage-1 map = 50%,  reduce = 8%, Cumulative CPU 111.41 sec
    2013-05-05 11:23:36,630 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:37,635 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:38,671 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:39,676 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:40,683 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:41,691 Stage-1 map = 56%,  reduce = 8%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:42,701 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:43,705 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:44,752 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:45,755 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:46,758 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:47,769 Stage-1 map = 56%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:48,773 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:49,776 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:50,779 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:51,784 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:52,788 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:53,793 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 131.8 sec
    2013-05-05 11:23:54,812 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:23:55,831 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:23:56,834 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:23:57,838 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:23:58,843 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:23:59,918 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:00,921 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:01,924 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:02,927 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:03,931 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:04,934 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:05,938 Stage-1 map = 63%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:06,941 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:07,944 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:08,948 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:09,952 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:10,956 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:11,960 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:12,964 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:13,968 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:14,973 Stage-1 map = 69%,  reduce = 17%, Cumulative CPU 182.57 sec
    2013-05-05 11:24:15,977 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:16,981 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:17,985 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:18,988 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:19,992 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:20,995 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:21,998 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:23,001 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:24,008 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:25,012 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:26,016 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:27,024 Stage-1 map = 94%,  reduce = 17%, Cumulative CPU 198.58 sec
    2013-05-05 11:24:28,028 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
    2013-05-05 11:24:29,034 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
    2013-05-05 11:24:30,037 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
    2013-05-05 11:24:31,043 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
    2013-05-05 11:24:32,046 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
    2013-05-05 11:24:33,049 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 225.88 sec
    2013-05-05 11:24:34,055 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    2013-05-05 11:24:35,058 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    2013-05-05 11:24:36,061 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    2013-05-05 11:24:37,065 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    2013-05-05 11:24:38,068 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    2013-05-05 11:24:39,072 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    2013-05-05 11:24:40,076 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 227.04 sec
    MapReduce Total cumulative CPU time: 3 minutes 47 seconds 40 msec
    Ended Job = job_201305041640_0008
    MapReduce Jobs Launched: 
    Job 0: Map: 4  Reduce: 1   Cumulative CPU: 227.04 sec   HDFS Read: 845494724 HDFS Write: 8 SUCCESS
    Total MapReduce CPU Time Spent: 3 minutes 47 seconds 40 msec
    OK
    _c0
    4999999 (a filtered Twitter corpus of 5 million tweets)
    Time taken: 266.063 seconds
    hive (dataprocess)> select text,created_at from tweets where user_id = 1;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_201305041640_0009, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305041640_0009
    Kill Command = /home/landen/UntarFile/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305041640_0009
    Hadoop job information for Stage-1: number of mappers: 4; number of reducers: 0
    2013-05-05 20:45:19,007 Stage-1 map = 0%,  reduce = 0%
    2013-05-05 20:45:48,825 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:49,836 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:50,838 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:51,841 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:52,844 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:56,152 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:57,158 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:58,161 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:45:59,163 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:00,166 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:01,169 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:02,200 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:03,203 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:04,206 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:05,208 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:06,212 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:07,215 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:08,219 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:09,225 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:10,227 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:11,231 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:12,234 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:13,237 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:14,239 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:15,242 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:16,244 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:17,247 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:18,250 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:19,256 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:20,260 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:21,263 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:22,266 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:23,277 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:24,279 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:25,282 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:26,286 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:27,290 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:28,292 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:29,296 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:30,298 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:31,301 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:32,303 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:33,306 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 13.11 sec
    2013-05-05 20:46:34,610 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:35,688 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:36,693 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:37,696 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:38,698 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:39,701 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:40,703 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:41,707 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:42,710 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:43,713 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:44,715 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:45,718 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:46,721 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:47,723 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:48,728 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:49,732 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:50,764 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:51,820 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:52,823 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:53,879 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:54,998 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:56,161 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:57,164 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:58,167 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:46:59,262 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:47:00,811 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 100.22 sec
    2013-05-05 20:47:02,161 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:03,164 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:04,166 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:05,169 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:08,703 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:09,710 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:10,713 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:11,715 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:12,718 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:13,723 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:14,726 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:15,729 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:16,732 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:17,737 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:18,739 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:19,745 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:20,749 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:21,754 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:22,757 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:23,760 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:24,763 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:25,766 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:26,770 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:27,778 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:28,781 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:29,784 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:30,788 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:31,791 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:32,795 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:33,798 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 111.27 sec
    2013-05-05 20:47:34,964 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
    2013-05-05 20:47:35,967 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
    2013-05-05 20:47:37,161 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
    2013-05-05 20:47:38,173 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
    2013-05-05 20:47:39,176 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 155.37 sec
    2013-05-05 20:47:40,244 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:41,247 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:42,249 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:43,319 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:44,322 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:45,325 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:46,327 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:47,330 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:48,333 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 158.69 sec
    2013-05-05 20:47:49,644 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 158.69 sec
    MapReduce Total cumulative CPU time: 2 minutes 38 seconds 690 msec
    Ended Job = job_201305041640_0009
    MapReduce Jobs Launched: 
    Job 0: Map: 4   Cumulative CPU: 158.69 sec   HDFS Read: 845494724 HDFS Write: 138 SUCCESS
    Total MapReduce CPU Time Spent: 2 minutes 38 seconds 690 msec
    OK
    text    created_at
    @shizhao,我此前一直用dh的,你问问谁用bluehost借用一下就可以了。一般的小站流量根本没多大的..    1177292576
    Time taken: 172.857 seconds
    hive (dataprocess)> 

    Conclusion

    The SerDe interface is extremely powerful for dealing with data with a complex schema. By utilizing SerDes, any dataset can be made queryable through Hive.

    References:

    (How-to: Use a SerDe in Apache Hive) http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

    (Overview of SerDe in Hive) http://blog.csdn.net/dajuezhao/article/details/5753791

    (Big Data Solution Design) http://www.infoq.com/cn/articles/BigDataBlueprint

    (Storing JSON data in MongoDB) http://www.myexception.cn/database/502613.html
