• Decoding \x-prefixed encoded data into Chinese


    In Python 2, calling decode('utf-8') on the byte string is enough:

    >>> "\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C".decode('utf-8')
    u'\u5144\u5f1f\u96be\u5f53 \u675c\u6b4c'
    >>> print "\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C".decode('utf-8')
    兄弟难当 杜歌
    >>>
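
    The session above is Python 2, where a str holds bytes. In Python 3, str has no decode method, and the input (a log line, for example) often contains the escapes as literal text: a backslash, an x, and two hex digits. A minimal sketch for that case, assuming Python 3; the helper name decode_x_escapes is made up for illustration:

```python
import re

# One or more consecutive \xHH escapes written out as literal text.
X_ESCAPE_RUN = re.compile(r'(?:\\x[0-9A-Fa-f]{2})+')


def decode_x_escapes(text):
    """Replace runs of literal \\xHH escapes with the UTF-8 text they encode."""

    def _decode(match):
        # Drop every "\x" prefix, leaving only the hex digits of the run.
        hex_digits = match.group(0).replace('\\x', '')
        # errors='replace' turns malformed sequences into U+FFFD instead of raising.
        return bytes.fromhex(hex_digits).decode('utf-8', errors='replace')

    return X_ESCAPE_RUN.sub(_decode, text)


raw = r'\xE5\x85\x84\xE5\xBC\x9F\xE9\x9A\xBE\xE5\xBD\x93 \xE6\x9D\x9C\xE6\xAD\x8C'
assert decode_x_escapes(raw) == '兄弟难当 杜歌'
```

    Text outside the escape runs, such as the space between the two words, passes through unchanged.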

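    In Java I did not find a function that decodes this directly, but once you understand how the data is encoded, writing the decoder is quick. Recommended reading: http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html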

    UTF-8 is one concrete way of encoding Unicode code points as bytes:

    Unicode range       | UTF-8 encoding
    (hexadecimal)       | (binary)
    --------------------+---------------------------------------------
    0000 0000-0000 007F | 0xxxxxxx
    0000 0080-0000 07FF | 110xxxxx 10xxxxxx
    0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
    0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
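
    To see the table in action, the sketch below encodes a single code point by picking the matching row and filling the x positions with the code point's bits. It assumes Python 3; encode_code_point is a hypothetical helper, and unlike the built-in encoder it does not reject surrogates (D800-DFFF):

```python
def encode_code_point(cp):
    """Encode one Unicode code point as UTF-8, following the table row by row."""
    if cp <= 0x7F:
        return bytes([cp])                        # 0xxxxxxx
    if cp <= 0x7FF:
        return bytes([0xC0 | (cp >> 6),           # 110xxxxx
                      0x80 | (cp & 0x3F)])        # 10xxxxxx
    if cp <= 0xFFFF:
        return bytes([0xE0 | (cp >> 12),          # 1110xxxx
                      0x80 | ((cp >> 6) & 0x3F),  # 10xxxxxx
                      0x80 | (cp & 0x3F)])        # 10xxxxxx
    if cp <= 0x10FFFF:
        return bytes([0xF0 | (cp >> 18),          # 11110xxx
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError('code point out of range: %#x' % cp)


# U+5144 falls in the 0800-FFFF row, so it takes three bytes.
assert encode_code_point(0x5144) == b'\xe5\x85\x84'
# Cross-check against the built-in encoder.
assert encode_code_point(ord('A')) == 'A'.encode('utf-8')
```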

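    The \x escapes are the UTF-8 encoded bytes. Applying the rules in the table converts them back to Unicode code points, which give the corresponding Chinese characters. The conversion is simple: strip the \x prefix, parse the remainder as a hexadecimal number, then apply the appropriate bit shifts. Note that you must first determine how many bytes the UTF-8 sequence uses: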

      val pattern = """(\d+.\d+.\d+.\d+) - (\S+) (\S+) \[([^\]]+)\] "(\w+) (\S+) \S+" (\S+) (\S+) "([^"]+)" "([^"]+)" "([^"]+)" "([^"]+)""".r
      // Matches a run of \xHH escapes: a literal backslash, 'x', then two hex digits.
      val decodeDataPattern = """(\\x[0-9A-Fa-f]{2})+""".r

      def decodeUtf8(utf8Str: String): String =
        decodeDataPattern.replaceAllIn(utf8Str, m =>
          // quoteReplacement stops '$' and '\' in the decoded text from being read as group references
          scala.util.matching.Regex.quoteReplacement(decodeXdata(m.matched)))

      // Decodes one run of \xHH escapes. Continuation bytes are not validated,
      // and an incomplete sequence at the end of the run is dropped.
      def decodeXdata(utf8Str: String): String = {
        val result = new StringBuilder()
        var remaining = 0 // continuation bytes still expected for the current character
        var current = 0   // code point assembled so far

        for (item <- utf8Str.split("\\\\x") if item.trim.nonEmpty) {
          val currentCode = Integer.parseInt(item.trim, 16)
          if (remaining == 0) {
            // Lead byte: its high bits give the length of the sequence.
            if ((currentCode & 0x80) == 0x00) {
              current = currentCode                        // 1-byte: 0xxxxxxx
            } else if ((currentCode & 0xe0) == 0xc0) {
              remaining = 1; current = currentCode & 0x1f  // 2-byte: 110xxxxx
            } else if ((currentCode & 0xf0) == 0xe0) {
              remaining = 2; current = currentCode & 0x0f  // 3-byte: 1110xxxx
            } else if ((currentCode & 0xf8) == 0xf0) {
              remaining = 3; current = currentCode & 0x07  // 4-byte: 11110xxx
            } else {
              current = 0xfffd                             // not a valid lead byte
            }
          } else {
            // Continuation byte (10xxxxxx): shift in its low 6 bits.
            current = (current << 6) | (currentCode & 0x3f)
            remaining -= 1
          }
          if (remaining == 0) {
            // The code point is complete; out-of-range values become U+FFFD.
            val cp = if (Character.isValidCodePoint(current)) current else 0xfffd
            result.append(new String(Character.toChars(cp)))
            current = 0
          }
        }
        result.toString()
      }
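
    The lead-byte check and the shifts can be followed step by step on a single character. A minimal Python 3 sketch, separate from the Scala code above; the names sequence_length and decode_one are hypothetical, and continuation bytes are not validated:

```python
def sequence_length(lead):
    """Return how many bytes a UTF-8 sequence has, judged from its lead byte."""
    if (lead & 0x80) == 0x00:
        return 1  # 0xxxxxxx
    if (lead & 0xE0) == 0xC0:
        return 2  # 110xxxxx
    if (lead & 0xF0) == 0xE0:
        return 3  # 1110xxxx
    if (lead & 0xF8) == 0xF0:
        return 4  # 11110xxx
    raise ValueError('not a UTF-8 lead byte: %#04x' % lead)


def decode_one(seq):
    """Decode exactly one UTF-8 sequence (a bytes object) into its code point."""
    length = sequence_length(seq[0])
    if len(seq) != length:
        raise ValueError('expected %d bytes, got %d' % (length, len(seq)))
    # Payload bits carried by the lead byte, per sequence length.
    lead_mask = {1: 0x7F, 2: 0x1F, 3: 0x0F, 4: 0x07}[length]
    code_point = seq[0] & lead_mask
    for byte in seq[1:]:
        # Each continuation byte contributes its low 6 bits.
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


# \xE5\x85\x84: 0xE5 = 1110 0101, so the sequence has three bytes.
# (0x05 << 12) | (0x05 << 6) | 0x04 = 0x5144
assert decode_one(bytes([0xE5, 0x85, 0x84])) == 0x5144
assert chr(0x5144) == '兄'
```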
    

      

    JavaScript: encoding and decoding \x (backslash-x) hexadecimal escapes

  • Original article: https://www.cnblogs.com/xiaoqi/p/5101795.html
Copyright © 2020-2023  润新知