• Determining a File's Type from Its File Header



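    In a recent project I needed to determine file types. Relying on the file extension alone is not strict enough, because an extension can be changed by hand, so that check can give wrong answers. I remembered a technique I had seen online: read the first few bytes of the file (its header) and compare them with the known signature of each format. This identifies the type far more reliably than the extension does. You can read either the first 3 bytes or the first 10 bytes.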
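    As a quick illustration of the idea, here is a minimal sketch that reads the leading bytes of a file, converts them to hex, and compares the result with one well-known signature (the 8-byte PNG signature). The class name `HeaderPeek` and its method names are hypothetical and chosen only for this example; they are not part of the original code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class HeaderPeek {
    // The 8-byte PNG signature, written as lowercase hex.
    static final String PNG_SIGNATURE = "89504e470d0a1a0a";

    /** Converts bytes [0, len) to a lowercase hex string, two digits per byte. */
    static String toHex(byte[] bytes, int len) {
        StringBuilder sb = new StringBuilder(len * 2);
        for (int i = 0; i < len; i++) {
            sb.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    /** Reads up to n leading bytes of the file and returns them as hex. */
    static String readHeaderHex(String path, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        try (InputStream in = new FileInputStream(path)) {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) {
                    break;
                }
                total += r;
            }
        }
        return toHex(buf, total);
    }

    /** True when the header hex begins with the PNG signature. */
    static boolean looksLikePng(String headerHex) {
        return headerHex.startsWith(PNG_SIGNATURE);
    }
}
```

    Calling `HeaderPeek.looksLikePng(HeaderPeek.readHeaderHex(path, 10))` would then answer the question for a single format; a table of signatures generalizes this to many formats.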

    The implementation is as follows:

    Detection using the first 10 bytes:

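    // Needs: java.io.File, java.io.FileInputStream, java.io.IOException, java.util.HashMap
    private static final HashMap<String, String> mFileTypes = new HashMap<String, String>();
            // Determine the file type from the file header content.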
            static {
                // Keys are lowercase because bytesToHexString produces lowercase hex.
                mFileTypes.put("ffd8ffe000104a464946", "jpg"); // JPEG (jpg)
                mFileTypes.put("89504e470d0a1a0a0000", "png"); // PNG (png)
                mFileTypes.put("47494638396126026f01", "gif"); // GIF (gif)
                mFileTypes.put("49492a00227105008037", "tif"); // TIFF (tif)
                mFileTypes.put("424d228c010000000000", "bmp"); // 16-color bitmap (bmp)
                mFileTypes.put("424d8240090000000000", "bmp"); // 24-bit bitmap (bmp)
                mFileTypes.put("424d8e1b030000000000", "bmp"); // 256-color bitmap (bmp)
                mFileTypes.put("41433130313500000000", "dwg"); // CAD (dwg)
                mFileTypes.put("3c21444f435459504520", "html"); // HTML (html)
                mFileTypes.put("3c21646f637479706520", "htm"); // HTM (htm)
                mFileTypes.put("48544d4c207b0d0a0942", "css"); // css
                mFileTypes.put("696b2e71623d696b2e71", "js"); // js
                mFileTypes.put("7b5c727466315c616e73", "rtf"); // Rich Text Format (rtf)
                mFileTypes.put("38425053000100000000", "psd"); // Photoshop (psd)
                mFileTypes.put("46726f6d3a203d3f6762", "eml"); // Email [Outlook Express 6] (eml)
                mFileTypes.put("d0cf11e0a1b11ae10000", "doc"); // MS Office; note: Word, Excel and MSI files share this header
                // Visio (vsd) and WPS Office (wps, et, dps) also start with d0cf11e0a1b11ae10000.
                // A HashMap keeps one value per key, so putting "vsd" or "wps" under the
                // same key would silently replace "doc".
                mFileTypes.put("5374616e64617264204a", "mdb"); // MS Access (mdb)
                mFileTypes.put("252150532d41646f6265", "ps"); // PostScript (ps)
                mFileTypes.put("255044462d312e350d0a", "pdf"); // Adobe Acrobat (pdf); this value is "%PDF-1.5" plus CRLF
                mFileTypes.put("2e524d46000000120001", "rmvb"); // rmvb and rm share this header
                mFileTypes.put("464c5601050000000900", "flv"); // flv and f4v share this header
                mFileTypes.put("00000020667479706d70", "mp4");
                mFileTypes.put("49443303000000002176", "mp3");
                mFileTypes.put("000001ba210001000180", "mpg"); // MPEG (mpg)
                mFileTypes.put("3026b2758e66cf11a6d9", "wmv"); // wmv and asf share this header
                mFileTypes.put("52494646e27807005741", "wav"); // Wave (wav)
                mFileTypes.put("52494646d07d60074156", "avi");
                mFileTypes.put("4d546864000000060001", "mid"); // MIDI (mid)
                mFileTypes.put("504b0304140000000800", "zip");
                mFileTypes.put("526172211a0700cf9073", "rar");
                mFileTypes.put("235468697320636f6e66", "ini");
                mFileTypes.put("504b03040a0000000000", "jar");
                mFileTypes.put("4d5a9000030000000400", "exe"); // executable file
                mFileTypes.put("3c25402070616765206c", "jsp"); // JSP file
                mFileTypes.put("4d616e69666573742d56", "mf"); // MF (manifest) file
                mFileTypes.put("3c3f786d6c2076657273", "xml"); // XML file
                mFileTypes.put("494e5345525420494e54", "sql"); // SQL file
                mFileTypes.put("7061636b616765207765", "java"); // Java source file
                mFileTypes.put("406563686f206f66660d", "bat"); // bat file
                mFileTypes.put("1f8b0800000000000000", "gz"); // gz file
                mFileTypes.put("6c6f67346a2e726f6f74", "properties"); // properties file
                mFileTypes.put("cafebabe0000002e0041", "class"); // Java class file
                mFileTypes.put("49545346030000006000", "chm"); // CHM help file
                mFileTypes.put("04000000010000001300", "mxp"); // mxp file
                mFileTypes.put("504b0304140006000800", "docx"); // docx file
                mFileTypes.put("6431303a637265617465", "torrent");
    
    
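                // The signatures below are shorter than 10 bytes, so the exact 20-character
                // lookup in getFileType cannot match them; they only help with prefix matching.
                mFileTypes.put("6d6f6f76", "mov"); // QuickTime (mov)
                mFileTypes.put("ff575043", "wpd"); // WordPerfect (wpd)
                mFileTypes.put("cfad12fec5fd746f", "dbx"); // Outlook Express (dbx)
                mFileTypes.put("2142444e", "pst"); // Outlook (pst)
                mFileTypes.put("ac9ebd8f", "qdf"); // Quicken (qdf)
                mFileTypes.put("e3828596", "pwl"); // Windows Password (pwl)
                mFileTypes.put("2e7261fd", "ram"); // Real Audio (ram)
                mFileTypes.put("null", null); // sentinel returned by getFileHeader for a missing or too-short file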
            }
    
            public static String getFileType(String filePath) {
                return mFileTypes.get(getFileHeader(filePath));
            }
    
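            private static String getFileHeader(String filePath) {
                File file = new File(filePath);
                // Exactly 10 bytes are needed, so only reject files shorter than that.
                if (!file.exists() || file.length() < 10) {
                    return "null";
                }
                byte[] b = new byte[10];
                // try-with-resources closes the stream even if reading fails
                try (FileInputStream is = new FileInputStream(file)) {
                    int total = 0;
                    // read() may return fewer bytes than requested, so loop until the buffer is full
                    while (total < b.length) {
                        int n = is.read(b, total, b.length - total);
                        if (n < 0) {
                            return "null";
                        }
                        total += n;
                    }
                } catch (IOException e) {
                    return "null"; // treat an unreadable file as unknown
                }
                return bytesToHexString(b);
            }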
    
            private static String bytesToHexString(byte[] src){
                StringBuilder stringBuilder = new StringBuilder();
                if (src == null || src.length <= 0) {
                    return null;
                }
                for (int i = 0; i < src.length; i++) {
                    int v = src[i] & 0xFF;
                    String hv = Integer.toHexString(v);
                    if (hv.length() < 2) {
                        stringBuilder.append(0);
                    }
                    stringBuilder.append(hv);
                }
                return stringBuilder.toString();
            }
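    The comments in the table point out that several formats share one header: zip, jar and docx all begin with the ZIP local-file signature `504b0304`, so the header alone cannot tell them apart. One possible second step, sketched below, is to look at the entry names inside the archive. The class `ZipKind` is hypothetical, and the marker entries are assumptions based on common practice: a docx usually contains `word/document.xml` and a jar usually contains `META-INF/MANIFEST.MF`, but neither is guaranteed.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, for illustration only.
final class ZipKind {
    /** Guesses the container type from the names of the entries it holds. */
    static String classify(Collection<String> entryNames) {
        // Assumption: a Word document stores its main part under this name.
        if (entryNames.contains("word/document.xml")) {
            return "docx";
        }
        // Assumption: a jar carries a manifest (it is optional in practice).
        if (entryNames.contains("META-INF/MANIFEST.MF")) {
            return "jar";
        }
        return "zip";
    }

    /** Lists the entry names of a ZIP-based file. */
    static List<String> entryNames(File file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

    A caller would run this only after the header lookup has already reported a ZIP-based type, for example `ZipKind.classify(ZipKind.entryNames(new File(path)))`.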

    Detection using the first 3 bytes:

     private static final HashMap<String, String> mFileTypes = new HashMap<String, String>();
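            // Determine the file type from the file header content.
            // Only 3 bytes (6 hex characters) are read, so the longer keys below
            // can match only through a prefix comparison, never through an exact lookup.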
            static {
                //images
                mFileTypes.put("FFD8FF", "jpg");
                mFileTypes.put("89504E47", "png");
                mFileTypes.put("47494638", "gif");
                mFileTypes.put("49492A00", "tif");
                mFileTypes.put("424D", "bmp");
                // documents and other formats
                mFileTypes.put("41433130", "dwg"); // CAD
                mFileTypes.put("38425053", "psd");
                mFileTypes.put("7B5C727466", "rtf"); // Rich Text Format
                mFileTypes.put("3C3F786D6C", "xml");
                mFileTypes.put("68746D6C3E", "html");
                mFileTypes.put("44656C69766572792D646174653A", "eml"); // email
                mFileTypes.put("D0CF11E0", "doc");
                mFileTypes.put("5374616E64617264204A", "mdb");
                mFileTypes.put("252150532D41646F6265", "ps");
                mFileTypes.put("255044462D312E", "pdf");
                mFileTypes.put("504B0304", "zip");
                mFileTypes.put("52617221", "rar");
                mFileTypes.put("57415645", "wav");
                mFileTypes.put("41564920", "avi");
                mFileTypes.put("2E524D46", "rm");
                mFileTypes.put("000001BA", "mpg");
                mFileTypes.put("000001B3", "mpg");
                mFileTypes.put("6D6F6F76", "mov");
                mFileTypes.put("3026B2758E66CF11", "asf");
                mFileTypes.put("4D546864", "mid");
                mFileTypes.put("1F8B08", "gz");
            }
    
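            public static String getFileType(String filePath) {
                String header = getFileHeader(filePath);
                // Only 3 bytes are read, so compare by prefix in either direction:
                // "89504E" is a prefix of the key "89504E47", and the key "424D"
                // is a prefix of a BMP header such as "424D36".
                for (String key : mFileTypes.keySet()) {
                    if (key.isEmpty()) {
                        continue; // an empty key would match every header
                    }
                    if (key.startsWith(header) || header.startsWith(key)) {
                        return mFileTypes.get(key);
                    }
                }
                return null;
            }
            // Read the file header
            public static String getFileHeader(String filePath) {
                File file = new File(filePath);
                if (!file.exists() || file.length() < 3) {
                    return "null";
                }
                byte[] b = new byte[3];
                // try-with-resources closes the stream even if reading fails
                try (FileInputStream is = new FileInputStream(file)) {
                    int total = 0;
                    // read() may return fewer bytes than requested, so loop until the buffer is full
                    while (total < b.length) {
                        int n = is.read(b, total, b.length - total);
                        if (n < 0) {
                            return "null";
                        }
                        total += n;
                    }
                } catch (IOException e) {
                    return "null"; // treat an unreadable file as unknown
                }
                return bytesToHexString(b);
            }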
    
            private static String bytesToHexString(byte[] src){
                StringBuilder builder = new StringBuilder();
                if (src == null || src.length <= 0) {
                    return null;
                }
                String hv;
                for (int i = 0; i < src.length; i++) {
                    hv = Integer.toHexString(src[i] & 0xFF).toUpperCase();
                    if (hv.length() < 2) {
                        builder.append(0);
                    }
                    builder.append(hv);
                }
                return builder.toString();
            }
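    Some signatures in the table above do not sit at the start of the file. In a RIFF container the file begins with the ASCII text `RIFF`, followed by a 4-byte size, and only then the form type `WAVE` (key `57415645`) or `AVI ` (key `41564920`) at offset 8. Similarly, `moov` (key `6D6F6F76`) is an atom type that follows a 4-byte size, so it appears at offset 4 when that atom comes first. The sketch below checks signatures at explicit offsets; the class `OffsetSignatures` and its methods are hypothetical, and it assumes the caller has read at least 12 header bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class OffsetSignatures {
    /** True when data contains the ASCII text magic starting at offset. */
    static boolean hasAsciiAt(byte[] data, int offset, String magic) {
        byte[] m = magic.getBytes(StandardCharsets.US_ASCII);
        if (offset < 0 || data.length < offset + m.length) {
            return false;
        }
        for (int i = 0; i < m.length; i++) {
            if (data[offset + i] != m[i]) {
                return false;
            }
        }
        return true;
    }

    /** Classifies a header; returns null when nothing matches. */
    static String detect(byte[] header) {
        if (hasAsciiAt(header, 0, "RIFF")) {
            // Bytes 4..7 hold the chunk size; the form type follows at offset 8.
            if (hasAsciiAt(header, 8, "WAVE")) {
                return "wav";
            }
            if (hasAsciiAt(header, 8, "AVI ")) {
                return "avi";
            }
            return null;
        }
        // Assumption: the moov atom is the first atom in the file.
        if (hasAsciiAt(header, 4, "moov")) {
            return "mov";
        }
        return null;
    }
}
```

    This keeps the table lookup for formats whose signature starts at byte 0 and adds a small special case for container formats.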

    
    

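    Improvement: files of the same type produced on different devices do not always have identical leading bytes; only the first few may agree, and the rest can differ. For example, jpg files from a Lenovo K900 match the table entry on all 10 bytes, while jpg files from a MI3 match only on the first 5 characters of the hex string. Such cases need special handling: when the lookup on the full header fails, compare only the first 5 characters instead. The improved method is as follows: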

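    public static String getFileType(String filePath) {
                String keySearch = getFileHeader(filePath);
                String fileSuffix = mFileTypes.get(keySearch);
                // Not every file of a given format (jpg, for example) shares all 10 header
                // bytes, so fall back to comparing only the first five hex characters.
                // TextUtils is android.text.TextUtils. The length check also skips the
                // "null" sentinel, on which substring(0, 5) would throw.
                if (TextUtils.isEmpty(fileSuffix) && keySearch != null && keySearch.length() >= 5) {
                    String keySearchPrefix = keySearch.substring(0, 5).toLowerCase();
                    for (String key : mFileTypes.keySet()) {
                        // startsWith, not contains: the prefix must match at the start of the signature.
                        // Several keys can share a prefix (zip, jar, docx); HashMap order decides which wins.
                        if (key.toLowerCase().startsWith(keySearchPrefix)) {
                            fileSuffix = mFileTypes.get(key);
                            break;
                        }
                    }
                }
                return fileSuffix;
            }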
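    A fixed 5-character prefix can match several signatures at once (zip, jar and docx all begin with `504b0`), and a `HashMap` gives no guarantee about which one is found first. One alternative, sketched here under my own assumptions rather than taken from the original code, is to pick the signature that shares the longest common prefix with the header and to require a minimum number of matching characters. The class `LongestPrefixMatch` and the `minChars` parameter are hypothetical.

```java
import java.util.Map;

// Hypothetical helper, for illustration only.
final class LongestPrefixMatch {
    /** Number of leading characters the two strings have in common. */
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /**
     * Returns the type whose signature shares the longest prefix with headerHex,
     * or null when no signature shares at least minChars characters.
     * On a tie, the entry seen first in the map's iteration order is kept.
     */
    static String bestMatch(Map<String, String> signatures, String headerHex, int minChars) {
        String bestType = null;
        int bestLen = minChars - 1;
        for (Map.Entry<String, String> e : signatures.entrySet()) {
            int len = commonPrefixLength(e.getKey(), headerHex);
            if (len > bestLen) {
                bestLen = len;
                bestType = e.getValue();
            }
        }
        return bestType;
    }
}
```

    With the 10-byte table, `LongestPrefixMatch.bestMatch(mFileTypes, getFileHeader(path), 6)` would serve as the fallback when the exact lookup returns nothing, provided keys and header use the same letter case.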



  • Original article: https://www.cnblogs.com/happyxiaoyu02/p/6150667.html