• Unicode, UTF, ASCII, ANSI format differences


    问题:

    What is the difference between the Unicode, UTF8, UTF7, UTF16, UTF32, ASCII, and ANSI encodings?

    In what way are these helpful for programmers?

     解答:

    Going down your list:

    • "Unicode" isn't an encoding, although unfortunately, a lot of documentation imprecisely不严密地 uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to any particular encoding.
    • UTF-16: 2 bytes per "code unit". This is the native format of strings in .NET, and generally in Windows and Java. Values outside the Basic Multilingual Plane (BMP) are encoded as surrogate pairs. (These are relatively rarely used - which is a good job, as very few developers get them right, I suspect. I very much doubt that I do.)
    • UTF-8: Variable length encoding, 1-4 bytes per code point. ASCII values are encoded as ASCII using 1 byte.
    • UTF-7: Usually used for mail encoding. Chances are if you think you need it and you're not doing mail, you're wrong. (That's just my experience of people posting in newsgroups etc - outside mail, it's really not widely used at all.)
    • UTF-32: Fixed width encoding using 4 bytes per code point. This isn't very efficient, but makes life easier outside the BMP. I have a .NET Utf32String class as part of my MiscUtil library, should you ever want it. (It's not been very thoroughly tested, mind you.)
    • ASCII: Single byte encoding only using the bottom 7 bits. (Unicode code points 0-127.) No accents etc.
    • ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default locale/codepage for my system" which is obtained via Encoding.Default, and is often Windows-1252 but can be other locales.

    There's more on my Unicode page and tips for debugging Unicode problems.

    The other big resource of code is unicode.org which contains more information than you'll ever be able to work your way through - possibly the most useful bit is the code charts.

    Unicode

    [Test]
    public void GetDefaultEncodings()
    {
    var defaultEncoding = Encoding.Default;
    Console.WriteLine(
    $"{defaultEncoding.EncodingName},{defaultEncoding.CodePage},{defaultEncoding.WindowsCodePage}");
    }

    Chinese Simplified (GB2312),936,936

  • 相关阅读:
    javascript 中的暗物质 闭包
    关于使用 jBox 对话框的提交问题
    Orchard 的项目结构解决方案文件夹的原理与使用
    翻译:创建 Windows8 应用 Part I: Hello, world!
    翻译:FireBug 1.10 新特性
    SQL数据库设计经验
    我的女孩
    在WINDOWS下安装MRTG全攻略网络流量监控
    ASP实用函数库
    DIV CSS设计时IE6、IE7、FF 与兼容性有关的特性
  • 原文地址:https://www.cnblogs.com/chucklu/p/10755968.html
Copyright © 2020-2023  润新知