With compression done, we naturally need decompression as well.
Straight to the code:
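import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// HDFSConf and logger are defined elsewhere in the project.
private static void getFile(String filePath) throws IOException, ClassNotFoundException {
    FileSystem fs = FileSystem.get(URI.create(filePath), HDFSConf.getConf());
    Path path = new Path(filePath);
    if (fs.exists(path)) {
        // Pick the codec from the file extension (.gz, .bz2, ...); null if none matches
        CompressionCodecFactory factory = new CompressionCodecFactory(HDFSConf.getConf());
        CompressionCodec codec = factory.getCodec(path);
        if (codec == null) {
            logger.error("No codec found for : " + filePath);
            return;
        }
        // Output path: drop everything from the first "." onward and append ".new"
        Path outPath = new Path(filePath.substring(0, filePath.indexOf(".")) + ".new");
        logger.info("out put path is : " + outPath.toString());
        if (fs.createNewFile(outPath)) {
            logger.info("create file : " + outPath.toString());
            try (FSDataInputStream in = fs.open(path);
                 InputStream cin = codec.createInputStream(in);
                 FSDataOutputStream out = fs.append(outPath)) {
                // 5 MB buffer; false = let try-with-resources close the streams
                IOUtils.copyBytes(cin, out, 1024 * 1024 * 5, false);
                out.flush();
            }
            logger.info("Decompress file successful");
        } else {
            logger.error("File exists");
        }
    } else {
        logger.info("There is no file :" + filePath);
    }
}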
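The output path in the method above is derived by cutting the input string at its first "." and appending ".new". That works for a URI such as hdfs://venn06:8020/..., but it would cut too early if the host name or a directory contained a dot. A minimal sketch of the difference, in plain Java with no Hadoop dependency; the class name, the method names, and the host venn06.example.com are hypothetical and only serve the illustration:

```java
public class OutputPathSketch {

    // Same rule as the method above: cut at the first '.' anywhere in the string.
    public static String firstDotPath(String filePath) {
        return filePath.substring(0, filePath.indexOf(".")) + ".new";
    }

    // Variant (an assumption, not the project's code): only look for the
    // first '.' inside the last path component, i.e. the file name.
    public static String fileNameDotPath(String filePath) {
        int slash = filePath.lastIndexOf('/');
        int dot = filePath.indexOf('.', slash + 1);
        String base = dot < 0 ? filePath : filePath.substring(0, dot);
        return base + ".new";
    }

    public static void main(String[] args) {
        String simple = "hdfs://venn06:8020/aaa/test/viewlog_20180402.log.gz";
        // Hypothetical fully qualified host name, used only to show the failure mode.
        String dotted = "hdfs://venn06.example.com:8020/aaa/test/viewlog_20180402.log.gz";

        System.out.println(firstDotPath(simple));
        System.out.println(firstDotPath(dotted));
        System.out.println(fileNameDotPath(dotted));
    }
}
```

With the dotted host, the first rule truncates the string inside the host name, while the variant still produces a path ending in viewlog_20180402.new. Whether that matters depends on how hdfs.uri is configured in a given cluster.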
Package and run:
[hadoop@venn05 venn]$ java -cp compressHdfsFile-1.0-SNAPSHOT.jar com.utstarcom.hdfs.DeCompressFile /aaa/test/viewlog_20180402.log.gz
2018-06-10 04:21:44.562 [Common.java] [main] INFO : start init :
2018-06-10 04:21:44.566 [Common.java] [main] INFO : properties path : /opt/hadoop/tmp/venn/
/opt/hadoop/tmp/venn/hdfs.properties
default.compress.format
hdfs.uri
2018-06-10 04:21:44.568 [Common.java] [main] INFO : get System enviroment : 46
2018-06-10 04:21:44.569 [Common.java] [main] INFO : properties path : {hdfs.uri=hdfs://venn06:8020, default.compress.format=bz2}
hdfs://venn06:8020/aaa/test/viewlog_20180402.log.gz
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
2018-06-10 04:21:46.409 [DeCompressFile.java] [main] INFO : out put path is : hdfs://venn06:8020/aaa/test/viewlog_20180402.new
2018-06-10 04:21:46.623 [DeCompressFile.java] [main] INFO : create file : hdfs://venn06:8020/aaa/test/viewlog_20180402.new
2018-06-10 04:22:24.566 [DeCompressFile.java] [main] INFO : Decompress file successful
cost : 39 s
Compressed file size: 249.4 M; size after decompression: 1.4 G; run time: 39 s. Not bad at all.
File sizes:
[hadoop@ut01 venn]$ hadoop fs -ls /aaa/test/
18/06/10 04:26:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   3 hadoop supergroup 1515343101 2018-06-03 17:07 /aaa/test/viewlog_20180402.log
-rw-r--r--   3 hadoop supergroup  261506977 2018-06-09 15:46 /aaa/test/viewlog_20180402.log.gz
-rw-r--r--   3 hadoop supergroup 1515343101 2018-06-09 15:43 /aaa/test/viewlog_20180402.new
[hadoop@ut01 venn]$ hadoop fs -ls -h /aaa/test/
18/06/10 04:26:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
-rw-r--r--   3 hadoop supergroup      1.4 G 2018-06-03 17:07 /aaa/test/viewlog_20180402.log
-rw-r--r--   3 hadoop supergroup    249.4 M 2018-06-09 15:46 /aaa/test/viewlog_20180402.log.gz
-rw-r--r--   3 hadoop supergroup      1.4 G 2018-06-09 15:43 /aaa/test/viewlog_20180402.new
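The byte counts in the listing and the 39 s run time are enough to put rough numbers on the result. The sketch below is plain Java with hypothetical class and method names; it only does the arithmetic on the figures shown above and assumes the 39 s covers the whole decompression:

```java
public class DecompressStats {

    // Fraction of the original size that the compressed file occupies.
    public static double compressedFraction(long compressedBytes, long rawBytes) {
        return (double) compressedBytes / rawBytes;
    }

    // Decompressed output rate in MiB per second.
    public static double mibPerSecond(long bytes, long seconds) {
        return bytes / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) {
        long compressed = 261506977L;  // viewlog_20180402.log.gz, from the listing
        long raw = 1515343101L;        // viewlog_20180402.new, from the listing
        long seconds = 39L;            // "cost : 39 s" from the run log

        System.out.printf("compressed/raw = %.1f%%%n", 100 * compressedFraction(compressed, raw));
        System.out.printf("output rate    = %.1f MiB/s%n", mibPerSecond(raw, seconds));
    }
}
```

On these figures the gzip file is roughly 17% of the original size, and the decompressed data is written at roughly 37 MiB/s. The decompressed file also has exactly the same byte count as the original .log, which is a quick sanity check that nothing was lost.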
Project repository: Gitee (码云)