1. Problem description
A reduce-side sort/join in Hadoop: MultipleInputs reads differently formatted files from two input directories, each parsed by its own mapper. The Hadoop version is 2.8.3; Hadoop 3.2.1 reports the same error.
java.lang.Exception: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.setup(DelegatingMapper.java:45)
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:54)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2. Problem analysis
Many posts online claim this is a Hadoop bug that requires modifying the source of TaggedInputSplit/DelegatingMapper (the class lives at D:\hadoop\hadoop-2.8.3\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.8.3.jar!\org\apache\hadoop\mapreduce\lib\input\DelegatingMapper.class, i.e. hadoop-mapreduce-client-core), recompiling it, and replacing the hadoop-mapreduce-client-core-2.8.3.jar in the local repository with the rebuilt jar. As Section 3 shows, that was not actually necessary here.
(1) The addInputPath method of MultipleInputs is shown below. job.setMapperClass(DelegatingMapper.class) makes DelegatingMapper the job-level mapper; the mapper that actually processes the data is the mapperClass argument. The mapper class name and the input path are joined into a string and stored in the configuration under mapreduce.input.multipleinputs.dir.mappers, so that DelegatingMapper can later read the configuration, recover the actual mapper class for each input path, and hand the records in that path's files to it.
public static void addInputPath(Job job, Path path,
        Class<? extends InputFormat> inputFormatClass,
        Class<? extends Mapper> mapperClass) {
    addInputPath(job, path, inputFormatClass);
    Configuration conf = job.getConfiguration();
    String mapperMapping = path.toString() + ";" + mapperClass.getName();
    String mappers = conf.get("mapreduce.input.multipleinputs.dir.mappers");
    conf.set("mapreduce.input.multipleinputs.dir.mappers",
            mappers == null ? mapperMapping : mappers + "," + mapperMapping);
    job.setMapperClass(DelegatingMapper.class);
}
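For illustration, here is a minimal, self-contained sketch (not from the original post; it uses the identity Mapper for both paths, and the paths are placeholders) of what two such calls leave in the job configuration:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultipleInputsConfDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "demo");
        // The identity Mapper stands in for the post's real mappers.
        MultipleInputs.addInputPath(job, new Path("/input/station"),
                TextInputFormat.class, Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/input/records"),
                TextInputFormat.class, Mapper.class);
        // Prints something like:
        //   /input/station;org.apache.hadoop.mapreduce.Mapper,/input/records;org.apache.hadoop.mapreduce.Mapper
        System.out.println(job.getConfiguration()
                .get("mapreduce.input.multipleinputs.dir.mappers"));
        // The job-level mapper is now DelegatingMapper:
        System.out.println(job.getMapperClass().getName());
    }
}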
(2) In IDEA, clicking the failing frame at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.setup(DelegatingMapper.java:45) jumps to where the cast exception is thrown:
protected void setup(Mapper<K1, V1, K2, V2>.Context context) throws IOException, InterruptedException {
    // Assumes the input split is a TaggedInputSplit; with a plain FileSplit this cast throws.
    TaggedInputSplit inputSplit = (TaggedInputSplit) context.getInputSplit();
    // Instantiates the actual mapper recorded in the split and delegates to it.
    this.mapper = (Mapper) ReflectionUtils.newInstance(inputSplit.getMapperClass(), context.getConfiguration());
}
Here context.getInputSplit() returns a FileSplit. The two class definitions are below: both extend InputSplit, so each can be cast to the parent, but a cast between the two sibling classes cannot succeed, which is why the ClassCastException is thrown; a minimal demo below the definitions makes this concrete.
FileSplit class definition:
public class FileSplit extends InputSplit implements Writable
TaggedInputSplit class definition:
class TaggedInputSplit extends InputSplit implements Configurable, Writable
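A minimal, self-contained sketch of the same rule (the class names are invented for illustration): an upcast to the shared parent is always legal, but a downcast to a sibling of the object's runtime type fails at runtime.
public class SiblingCastDemo {
    static class Parent {}
    static class FileSplitLike extends Parent {}    // stands in for FileSplit
    static class TaggedSplitLike extends Parent {}  // stands in for TaggedInputSplit

    public static void main(String[] args) {
        Parent split = new FileSplitLike();  // runtime type is FileSplitLike
        Parent ok = (Parent) split;          // upcast to the parent: always safe
        TaggedSplitLike bad = (TaggedSplitLike) split; // throws ClassCastException
    }
}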
3. Solution
The actual cause of the exception was the three lines I had added to my driver code (now commented out below). Once MultipleInputs.addInputPath has registered the input paths, you must not add the paths again with FileInputFormat.addInputPath, and you must not call setInputFormatClass. Doing so makes the framework treat the input as ordinary FileSplits, so context.getInputSplit() in DelegatingMapper.setup returns a FileSplit rather than a TaggedInputSplit, and the cast to TaggedInputSplit fails. It is the TaggedInputSplit that carries the actual mapper class that should process each split.
public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    if (args.length != 3) {
        return -1;
    }
    Job job = Job.getInstance(getConf(), "joinStationTemperatueRecord");
    job.setJarByClass(this.getClass());
    // Two input paths, one output path
    Path stationPath = new Path(args[0]);
    Path temperatureRecordPath = new Path(args[1]);
    Path outputPath = new Path(args[2]);
    MultipleInputs.addInputPath(job, stationPath, TextInputFormat.class, StationMapper.class);
    MultipleInputs.addInputPath(job, temperatureRecordPath, TextInputFormat.class, TemperatureRecordMapper.class);
    FileOutputFormat.setOutputPath(job, outputPath);
    // Partitioner, grouping comparator, and reducer
    job.setPartitionerClass(FirstPartitioner.class);
    job.setGroupingComparatorClass(GroupingComparator.class);
    job.setReducerClass(JoinReducer.class);
    job.setNumReduceTasks(2);
    // Do not add the following three lines, or the job fails with
    // java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.FileSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit
    // job.setInputFormatClass(TextInputFormat.class);
    // FileInputFormat.addInputPath(job, stationPath);
    // FileInputFormat.addInputPath(job, temperatureRecordPath);
    // Output types
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setMapOutputKeyClass(TextPair.class);
    job.setMapOutputValueClass(Text.class);
    // Delete the old results directory so the job can regenerate it
    FileUtil.fullyDelete(new File(args[2]));
    return job.waitForCompletion(true) ? 0 : 1;
}
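One caveat about the cleanup step above: FileUtil.fullyDelete(new File(args[2])) only removes a local directory, which is fine for LocalJobRunner but not for output on HDFS. A sketch of the usual alternative, assuming job and outputPath are in scope as in the driver above:
import org.apache.hadoop.fs.FileSystem;

// Delete the old output through the configured FileSystem
// (works for both local and HDFS paths), instead of java.io.File:
FileSystem fs = FileSystem.get(job.getConfiguration());
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true); // true = recursive
}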