• hive数据文件简单合并


    MR代码:

    package merge;
    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.TextOutputFormat;
    
    public class merge
    {
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text>
        {
            private Text word=new Text("");
            public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                    throws IOException
            {
                output.collect(value,word);
            }
        }
    
    
        public static void main(String[] args) throws Exception
        {
            JobConf conf = new JobConf(merge.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            conf.setMapperClass(Map.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

    Eclipse自动生成.class文件,打包命令:

    jar打包:在项目的bin目录下
    Dev-Fac:bin ce-pc$ jar -cvf hive-merge.jar -C  ../ .

    合并命令:

    hadoop jar /tmp/hive-merge.jar merge.merge /user/hive/warehouse/table1 /user/hive/warehouse/table1/out
    
    #merge.merge 表示merge包下的merge
  • 相关阅读:
    C# DataConstruct 数据结构关于 Array,ArrayList,List,HashTable,Dictionnary的学习记录
    Moq 在.net Core 单元测试中的使用
    记录一些 APM 仓储
    .net core Swagger 过滤部分Api
    C# Conversion Keywords
    (转载)C# 枚举 FlagsAttribute用法
    [慢更]Sublime Text 快捷键及使用过的插件
    Docker发布程序那些事
    RabbitMQ 学习日记
    Linux Tomcat7.0安装配置实践总结
  • 原文地址:https://www.cnblogs.com/ggzone/p/10121217.html
Copyright © 2020-2023  润新知