

    LanguageTag


    This is a memo on RFC 5646, which together with RFC 4647 forms BCP 47.

    1 The Language Tag

    Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages but excludes languages not intended primarily for human communication, such as programming languages.

    1.1 Syntax

    • A tag is composed of a sequence of one or more subtags.
    • Each subtag is a sequence of alphanumeric characters that narrows the range of languages identified by the tag.
    • Subtags are joined with a hyphen ("-").
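    The three points above can be sketched in Python. This is a minimal illustration, not a full validator: the function name split_subtags is made up for this memo, and the 1 to 8 character limit is taken from the ABNF productions shown in the next figure.

```python
import re
from typing import List

# Assumption for this sketch: a subtag is 1 to 8 ASCII letters or digits,
# which is the widest range any production in the ABNF allows.
_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def split_subtags(tag: str) -> List[str]:
    """Split a tag on "-" and check that every piece looks like a subtag."""
    subtags = tag.split("-")
    for subtag in subtags:
        if not _SUBTAG.fullmatch(subtag):
            raise ValueError(f"malformed subtag {subtag!r} in {tag!r}")
    return subtags
```

    For example, split_subtags("zh-Hant-TW") yields the three subtags "zh", "Hant" and "TW", while "en--US" is rejected because of the empty piece between the two hyphens.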

    The syntax of the language tag in ABNF [RFC5234] is:

    Language-Tag  = langtag             ; normal language tags
                  / privateuse          ; private use tag
                  / grandfathered       ; grandfathered tags
    
    langtag       = language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse]
    
    language      = 2*3ALPHA            ; shortest ISO 639 code
                    ["-" extlang]       ; sometimes followed by
                                        ; extended language subtags
                  / 4ALPHA              ; or reserved for future use
                  / 5*8ALPHA            ; or registered language subtag
    
    extlang       = 3ALPHA              ; selected ISO 639 codes
                    *2("-" 3ALPHA)      ; permanently reserved
    
    script        = 4ALPHA              ; ISO 15924 code
    
    region        = 2ALPHA              ; ISO 3166-1 code
                  / 3DIGIT              ; UN M.49 code
    
    variant       = 5*8alphanum         ; registered variants
                  / (DIGIT 3alphanum)
    
    extension     = singleton 1*("-" (2*8alphanum))
    
                                        ; Single alphanumerics
                                        ; "x" reserved for private use
    singleton     = DIGIT               ; 0 - 9
                  / %x41-57             ; A - W
                  / %x59-5A             ; Y - Z
                  / %x61-77             ; a - w
                  / %x79-7A             ; y - z
    
    privateuse    = "x" 1*("-" (1*8alphanum))
    
    grandfathered = irregular           ; non-redundant tags registered
                  / regular             ; during the RFC 3066 era
    
    irregular     = "en-GB-oed"         ; irregular tags do not match
                  / "i-ami"             ; the 'langtag' production and
                  / "i-bnn"             ; would not otherwise be
                  / "i-default"         ; considered 'well-formed'
                  / "i-enochian"        ; These tags are all valid,
                  / "i-hak"             ; but most are deprecated
                  / "i-klingon"         ; in favor of more modern
                  / "i-lux"             ; subtags or subtag
                  / "i-mingo"           ; combination
                  / "i-navajo"
                  / "i-pwn"
                  / "i-tao"
                  / "i-tay"
                  / "i-tsu"
                  / "sgn-BE-FR"
                  / "sgn-BE-NL"
                  / "sgn-CH-DE"
    
    regular       = "art-lojban"        ; these tags match the 'langtag'
                  / "cel-gaulish"       ; production, but their subtags
                  / "no-bok"            ; are not extended language
                  / "no-nyn"            ; or variant subtags: their meaning
                  / "zh-guoyu"          ; is defined by their registration
                  / "zh-hakka"          ; and all of these are deprecated
                  / "zh-min"            ; in favor of a more modern
                  / "zh-min-nan"        ; subtag or sequence of subtags
                  / "zh-xiang"
    
    alphanum      = (ALPHA / DIGIT)     ; letters and numbers
    

    Figure 1: Language Tag ABNF
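    As a rough illustration, the 'langtag' production can be transcribed into a regular expression. The sketch below only checks well-formedness. It does not consult the IANA registry, and it deliberately leaves out private-use-only tags and the irregular grandfathered tags. The names LANGTAG and parse_langtag are made up for this memo.

```python
import re

# Transcription of the 'langtag' production only (an illustrative sketch).
# re.VERBOSE ignores the whitespace used for layout; re.IGNORECASE reflects
# that tags are case-insensitive.
LANGTAG = re.compile(
    r"""
    (?P<language>
        [a-z]{2,3} (?: - (?P<extlang> [a-z]{3} (?: -[a-z]{3} ){0,2} ) )?
      | [a-z]{4}
      | [a-z]{5,8}
    )
    (?: - (?P<script> [a-z]{4} ) )?
    (?: - (?P<region> [a-z]{2} | [0-9]{3} ) )?
    (?P<variants>   (?: - (?: [a-z0-9]{5,8} | [0-9][a-z0-9]{3} ) )* )
    (?P<extensions> (?: - [0-9a-wy-z] (?: - [a-z0-9]{2,8} )+ )* )
    (?: - (?P<privateuse> x (?: - [a-z0-9]{1,8} )+ ) )?
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_langtag(tag):
    """Return the parts of a well-formed 'langtag', or None if it does not match."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    return {
        # The 'language' group also captures any extlang, so keep the first piece.
        "language": match.group("language").split("-")[0],
        "extlang": match.group("extlang"),
        "script": match.group("script"),
        "region": match.group("region"),
        "variants": [v for v in match.group("variants").split("-") if v],
        # Extensions are returned as one raw string without the leading hyphen.
        "extensions": match.group("extensions").lstrip("-"),
        "privateuse": match.group("privateuse"),
    }
```

    With this sketch, parse_langtag("zh-Hant-TW") reports language "zh", script "Hant" and region "TW", while parse_langtag("i-klingon") returns None because that tag is only covered by the 'irregular' production.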

    Note:

    1.1.1 Formatting of Language Tags

    Although tags are to be treated as case-insensitive, there are formatting conventions for the individual subtags:

    • [ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
    • [ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
    • [ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).
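    These conventions can be applied without looking at the registry. The sketch below follows the rule given in Section 2.1.1 of RFC 5646: every subtag is lowercase, except that two-letter subtags become uppercase and four-letter subtags become titlecase when they are neither the first subtag nor located after a singleton. The function name format_tag is made up for this memo, and the input is assumed to be well-formed already.

```python
def format_tag(tag: str) -> str:
    """Apply the conventional casing to an already well-formed language tag."""
    formatted = []
    seen_singleton = False  # becomes True once a one-character subtag is passed
    for index, subtag in enumerate(tag.split("-")):
        special = index > 0 and not seen_singleton
        if special and len(subtag) == 2:
            formatted.append(subtag.upper())        # region such as 'MN'
        elif special and len(subtag) == 4:
            formatted.append(subtag.capitalize())   # script such as 'Cyrl'
        else:
            formatted.append(subtag.lower())
        if len(subtag) == 1:
            seen_singleton = True
    return "-".join(formatted)
```

    For example, format_tag("MN-cYRL-mn") gives "mn-Cyrl-MN", and format_tag("EN-ca-X-CA") gives "en-CA-x-ca", because the final "ca" follows the singleton "x" and therefore stays lowercase.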

    1.2 Language Subtag Sources and Interpretation

    The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA) according to the rules in Section 5 of RFC 5646. The Language Subtag Registry maintained by IANA is the source for valid subtags; the other standards referenced in the RFC (such as ISO 639, ISO 15924, ISO 3166-1, and UN M.49) provide the source material for that registry.

    1.2.1 Primary Language Subtag

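    The primary language subtag is the first subtag in a language tag. It cannot be omitted, except in private-use tags (which start with "x") and in grandfathered tags. In practice it is two or three letters long.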

  • Original article: https://www.cnblogs.com/yangyingchao/p/3794436.html