• Filtering out UTF-8 characters that are not Chinese characters


    Original: http://www.oschina.net/code/snippet_564772_13507

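    UTF-8 is a variable-length encoding in which a single character takes 1 to 4 bytes. MySQL's utf8 character set can store only UTF-8 characters of up to 3 bytes; to store arbitrary UTF-8 characters, the data must use the utf8mb4 character set. Sometimes the character set that has already been chosen cannot be changed, and the only remaining option is to filter every 4-byte UTF-8 character out of the input. Fortunately, commonly used Chinese characters (those in the Basic Multilingual Plane) take 3 bytes in UTF-8, so they survive the filter; only rare characters in the supplementary CJK extension blocks take 4 bytes and are dropped along with emoji and the like.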
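
    To make the byte lengths above concrete, here is a minimal sketch that prints the encoded size of one character from each length class. The class name Utf8LengthDemo and the helper utf8Length are illustrative names, not part of the original snippet.

```java
import java.nio.charset.StandardCharsets;

// Illustrative demo (names are hypothetical, not from the original post).
public class Utf8LengthDemo {
    /** Returns how many bytes the given text occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // ASCII letter: prints 1
        System.out.println(utf8Length("\u00E9"));       // U+00E9 (e with acute): prints 2
        System.out.println(utf8Length("\u6C49"));       // U+6C49 (a common Chinese character): prints 3
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600 (an emoji, surrogate pair): prints 4
    }
}
```

    Only the last case exceeds what MySQL's utf8 character set can store, which is why the filters below keep 1- to 3-byte sequences and drop the rest.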

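    // Needs: import java.io.UnsupportedEncodingException; import java.nio.ByteBuffer;
    public static String filterOffUtf8Mb4_2(String text) throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes("utf-8");
        ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
        int i = 0;
        while (i < bytes.length) {
            short b = bytes[i];
            if (b >= 0) { // 0xxxxxxx: single-byte (ASCII) character, including NUL
                buffer.put(bytes[i++]);
                continue;
            }

            b += 256; // undo the sign extension to get the unsigned value 128..255

            if (((b >> 5) ^ 0x06) == 0) {        // 110xxxxx: 2-byte sequence, keep
                buffer.put(bytes, i, 2);
                i += 2;
            } else if (((b >> 4) ^ 0x0E) == 0) { // 1110xxxx: 3-byte sequence, keep
                buffer.put(bytes, i, 3);
                i += 3;
            } else if (((b >> 3) ^ 0x1E) == 0) { // 11110xxx: 4-byte sequence, drop
                i += 4;
            } else if (((b >> 2) ^ 0x3E) == 0) { // 111110xx: obsolete 5-byte form, drop
                i += 5;
            } else {                             // obsolete 6-byte form; unreachable for the output of getBytes
                i += 6;
            }
        }
        buffer.flip();
        // Decode only the bytes actually written; buffer.array() alone would also
        // turn the unused tail of the backing array into trailing NUL characters.
        return new String(buffer.array(), 0, buffer.limit(), "utf-8");
    }
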
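    // A shorter variant that only handles the lengths valid UTF-8 can contain.
    static public String filterOffUtf8Mb4(String text) throws UnsupportedEncodingException {
        byte[] bytes = text.getBytes("utf-8");
        ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
        int i = 0;
        while (i < bytes.length) {
            short b = bytes[i];
            if (b >= 0) { // single-byte (ASCII) character, including NUL
                buffer.put(bytes[i++]);
                continue;
            }
            b += 256; // unsigned value of the lead byte
            if ((b ^ 0xC0) >> 5 == 0) {        // 0xC0..0xDF: 2-byte sequence, keep
                buffer.put(bytes, i, 2);
                i += 2;
            }
            else if ((b ^ 0xE0) >> 4 == 0) {   // 0xE0..0xEF: 3-byte sequence, keep
                buffer.put(bytes, i, 3);
                i += 3;
            }
            else if ((b ^ 0xF0) >> 3 == 0) {   // 0xF0..0xF7: 4-byte sequence, drop
                i += 4;
            }
            else {                             // unexpected byte: skip it so the loop always advances
                i++;
            }
        }
        buffer.flip();
        // Decode only the bytes written, not the whole backing array.
        return new String(buffer.array(), 0, buffer.limit(), "utf-8");
    }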
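
    As an alternative to walking the encoded bytes, the same filtering can be sketched at the code-point level: every character that needs 4 bytes in UTF-8 is a supplementary code point (above U+FFFF), so dropping those gives the same result for well-formed input. This is not the original author's method, and the names SupplementaryFilter and stripSupplementary are hypothetical; it assumes Java 8 or later.

```java
// Hypothetical alternative to the byte-level filters (assumes Java 8+).
public class SupplementaryFilter {
    /** Drops every code point above U+FFFF, i.e. every character that needs 4 bytes in UTF-8. */
    static String stripSupplementary(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !Character.isSupplementaryCodePoint(cp)) // keep only BMP code points
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a", an emoji (U+1F600), then a common Chinese character (U+6C49)
        String input = "a\uD83D\uDE00\u6C49";
        System.out.println(stripSupplementary(input).equals("a\u6C49")); // prints true
    }
}
```

    Working on code points avoids the encode/decode round trip and the manual lead-byte arithmetic, at the cost of no longer mirroring the byte layout that MySQL actually checks.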
  • Original URL: https://www.cnblogs.com/mlj007/p/4325798.html