• Hadoop HDFS copyMergeFromLocal


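    HDFS is built to handle large files well. Small files are a problem because each one, however small, takes up metadata in NameNode memory, so the usual optimizations for them are compression and merging. Below is a small-file merge utility for reference.
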
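    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    // Utility class wrapper; the post shows only the methods, so this class name is illustrative.
    public class SmallFileMergeUtil {

        /**
         * Finds all local directories that match the source file pattern and
         * merges the files in each one, in sorted order, into a single file on HDFS.
         *
         * Optionally adds a string after each file (useful for adding "\n"
         * to a text file).
         *
         * @param srcf    a local file pattern specifying the source directories
         * @param dst     the destination file on HDFS; if dst is an existing
         *                directory, each source directory is merged into a file
         *                of the same name inside it
         * @param endline whether to add an end-of-line character after each file
         * @throws IOException if a source cannot be read, or the destination
         *                     already exists or cannot be written
         */
        public static void copyMergeFromLocal(String srcf, Path dst, boolean endline)
                throws IOException {
            Configuration conf = new Configuration();
            Path srcPath = new Path(srcf);
            // Sources are read from the local file system.
            FileSystem srcFs = FileSystem.getLocal(conf);
            // Resolve the destination file system from dst itself (HDFS for an
            // hdfs:// path, or when fs.defaultFS points to HDFS).
            FileSystem dstFs = dst.getFileSystem(conf);
            Path[] srcs = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
            for (Path src : srcs) {
                // Concatenate the files directly under src into dst;
                // false = do not delete the source files afterwards.
                FileUtil.copyMerge(srcFs, src,
                        dstFs, dst, false, conf,
                        endline ? "\n" : null);
            }
        }

        public static void copyMergeFromLocal(String srcf, Path dst) throws IOException {
            copyMergeFromLocal(srcf, dst, false);
        }
    }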
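The merge step itself is simple: FileUtil.copyMerge lists the files directly under a source directory, sorts them, and writes them one after another into a single output, appending the optional string after each file. The sketch below mimics that loop with java.nio on the local file system only, so it runs without a cluster; the class name `LocalCopyMerge` and method `merge` are hypothetical. FileUtil.copyMerge was removed in Hadoop 3, so on newer versions a hand-written loop of this shape (using `FileSystem.open`, `FileSystem.create` and `IOUtils.copyBytes` instead of java.nio) is one way to port the utility above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local-only illustration of the copyMerge loop; not part of Hadoop.
final class LocalCopyMerge {

    /**
     * Concatenates every regular file directly under srcDir into dstFile,
     * in sorted path order, writing addString (if non-null) after each file.
     * Creates dstFile, or truncates it if it already exists.
     *
     * @return the number of files merged
     */
    static int merge(Path srcDir, Path dstFile, String addString) throws IOException {
        // Collect plain files only; subdirectories are skipped, as in copyMerge.
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        // Sort so the merged output has a deterministic order.
        Collections.sort(files);

        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path p : files) {
                Files.copy(p, out); // append this file's bytes
                if (addString != null) {
                    out.write(addString.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        return files.size();
    }
}
```

For example, `LocalCopyMerge.merge(dir, merged, "\n")` with a directory holding `a.txt` ("hello") and `b.txt` ("world") writes `hello\nworld\n` to `merged`.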

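    When uploading files to HDFS, you can also apply a filter that selects which small files to merge, so that they are combined automatically as part of the upload.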
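The post does not say how the filter is configured. One way to picture it: before uploading, keep only the files whose names match a pattern and whose size falls below a threshold, then merge just those. The sketch below shows that selection step on the local file system with java.nio; the class name `SmallFileFilter`, the glob, and the size threshold are illustrative assumptions. In Hadoop code the same check would typically live in a `PathFilter` passed to `FileSystem.listStatus` or `FileSystem.globStatus`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical pre-upload filter: decides which local files count as "small" and get merged.
final class SmallFileFilter {

    /**
     * Returns the regular files directly under dir whose names match the glob
     * (for example "*.log") and whose size is below maxBytes, sorted by path.
     */
    static List<Path> selectSmallFiles(Path dir, String glob, long maxBytes) throws IOException {
        List<Path> selected = new ArrayList<>();
        // newDirectoryStream(dir, glob) matches the glob against each entry's file name.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, glob)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p) && Files.size(p) < maxBytes) {
                    selected.add(p);
                }
            }
        }
        // Sorted order keeps the merged output deterministic.
        Collections.sort(selected);
        return selected;
    }
}
```

For instance, `selectSmallFiles(dir, "*.log", 16L * 1024 * 1024)` would pick log files under 16 MB (a placeholder threshold); the returned list could then be concatenated and uploaded as one HDFS file.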

  • Original article: https://www.cnblogs.com/cunchen/p/9464193.html
Copyright © 2020-2023  润新知