    LanguageTag

    This is a memo on RFC 5646, which is part of BCP 47.

    1 The Language Tag

    Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. This includes constructed and artificial languages but excludes languages not intended primarily for human communication, such as programming languages.

    1.1 Syntax

    • A tag is composed of a sequence of one or more subtags.
    • Each subtag is a sequence of alphanumeric characters that narrows the range of languages identified by the tag.
    • Subtags are joined with a hyphen ("-").
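
    As a minimal sketch of the points above, the following Python splits a tag on hyphens and checks that every subtag is made of 1 to 8 ASCII letters or digits (the upper bound is taken from the ABNF in this section). The function name split_subtags is hypothetical and used only for illustration; the check is purely syntactic and does not consult the IANA registry.

```python
from typing import List


def split_subtags(tag: str) -> List[str]:
    """Split a language tag into its hyphen-separated subtags."""
    subtags = tag.split("-")
    for subtag in subtags:
        # Each subtag must be 1 to 8 ASCII letters or digits.
        if not (1 <= len(subtag) <= 8 and subtag.isascii() and subtag.isalnum()):
            raise ValueError(f"malformed subtag {subtag!r} in {tag!r}")
    return subtags


print(split_subtags("zh-Hant-TW"))  # ['zh', 'Hant', 'TW']
```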

    The syntax of the language tag in ABNF [RFC5234] is:

    Language-Tag  = langtag             ; normal language tags
                  / privateuse          ; private use tag
                  / grandfathered       ; grandfathered tags
    
    langtag       = language
                    ["-" script]
                    ["-" region]
                    *("-" variant)
                    *("-" extension)
                    ["-" privateuse]
    
    language      = 2*3ALPHA            ; shortest ISO 639 code
                    ["-" extlang]       ; sometimes followed by
                                        ; extended language subtags
                  / 4ALPHA              ; or reserved for future use
                  / 5*8ALPHA            ; or registered language subtag
    
    extlang       = 3ALPHA              ; selected ISO 639 codes
                    *2("-" 3ALPHA)      ; permanently reserved
    
    script        = 4ALPHA              ; ISO 15924 code
    
    region        = 2ALPHA              ; ISO 3166-1 code
                  / 3DIGIT              ; UN M.49 code
    
    variant       = 5*8alphanum         ; registered variants
                  / (DIGIT 3alphanum)
    
    extension     = singleton 1*("-" (2*8alphanum))
    
                                        ; Single alphanumerics
                                        ; "x" reserved for private use
    singleton     = DIGIT               ; 0 - 9
                  / %x41-57             ; A - W
                  / %x59-5A             ; Y - Z
                  / %x61-77             ; a - w
                  / %x79-7A             ; y - z
    
    privateuse    = "x" 1*("-" (1*8alphanum))
    
    grandfathered = irregular           ; non-redundant tags registered
                  / regular             ; during the RFC 3066 era
    
    irregular     = "en-GB-oed"         ; irregular tags do not match
                  / "i-ami"             ; the 'langtag' production and
                  / "i-bnn"             ; would not otherwise be
                  / "i-default"         ; considered 'well-formed'
                  / "i-enochian"        ; These tags are all valid,
                  / "i-hak"             ; but most are deprecated
                  / "i-klingon"         ; in favor of more modern
                  / "i-lux"             ; subtags or subtag
                  / "i-mingo"           ; combination
                  / "i-navajo"
                  / "i-pwn"
                  / "i-tao"
                  / "i-tay"
                  / "i-tsu"
                  / "sgn-BE-FR"
                  / "sgn-BE-NL"
                  / "sgn-CH-DE"
    
    regular       = "art-lojban"        ; these tags match the 'langtag'
                  / "cel-gaulish"       ; production, but their subtags
                  / "no-bok"            ; are not extended language
                  / "no-nyn"            ; or variant subtags: their meaning
                  / "zh-guoyu"          ; is defined by their registration
                  / "zh-hakka"          ; and all of these are deprecated
                  / "zh-min"            ; in favor of a more modern
                  / "zh-min-nan"        ; subtag or sequence of subtags
                  / "zh-xiang"
    
    alphanum      = (ALPHA / DIGIT)     ; letters and numbers
    

    Figure 1: Language Tag ABNF
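
    The 'langtag' production in Figure 1 can be transcribed into a regular expression. The sketch below is one possible transcription, not a reference implementation: the names LANGTAG and parse_langtag are hypothetical, it covers only the 'langtag' production (not private-use-only or grandfathered tags), and it tests well-formedness only, not whether the subtags are actually registered.

```python
import re

# One named group per part of the 'langtag' production in Figure 1.
LANGTAG = re.compile(r"""
    ^
    (?P<language>[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})
    (?:-(?P<script>[A-Za-z]{4}))?
    (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}))*)
    (?P<extensions>(?:-[0-9A-WY-Za-wy-z](?:-[A-Za-z0-9]{2,8})+)*)
    (?:-(?P<privateuse>[xX](?:-[A-Za-z0-9]{1,8})+))?
    $
""", re.VERBOSE)


def parse_langtag(tag):
    """Return the non-empty parts of a well-formed langtag, or None."""
    match = LANGTAG.match(tag)
    if match is None:
        return None
    # Drop parts that are absent (None) or empty ('').
    return {name: value for name, value in match.groupdict().items() if value}


print(parse_langtag("zh-Hant-TW"))
print(parse_langtag("en-GB-oed"))  # irregular grandfathered tag, so no match
```

    Because the pattern is anchored at both ends, a tag such as "en-GB-oed" is rejected here even though it is a valid grandfathered tag; a fuller checker would also test the 'privateuse' and 'grandfathered' alternatives.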

    Note:

    1.1.1 Formatting of Language Tags

    Although tags are case-insensitive, there are formatting conventions:

    • [ISO639-1] recommends that language codes be written in lowercase ('mn' Mongolian).
    • [ISO15924] recommends that script codes use lowercase with the initial letter capitalized ('Cyrl' Cyrillic).
    • [ISO3166-1] recommends that country codes be capitalized ('MN' Mongolia).
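
    RFC 5646 also describes a way to reproduce these conventions without consulting the registry: lowercase everything, then uppercase two-letter subtags and titlecase four-letter subtags, unless they appear at the start of the tag or after a singleton. The sketch below is one reading of that rule; the function name format_tag is hypothetical, and treating every subtag after the first singleton as lowercase is an assumption of this sketch.

```python
def format_tag(tag: str) -> str:
    """Apply the conventional letter case to a language tag."""
    parts = []
    after_singleton = False
    for index, subtag in enumerate(tag.lower().split("-")):
        if len(subtag) == 1:
            # Extension or private-use singleton: the rest stays lowercase.
            after_singleton = True
        elif index > 0 and not after_singleton:
            if len(subtag) == 2:
                subtag = subtag.upper()        # region, e.g. 'MN'
            elif len(subtag) == 4:
                subtag = subtag.capitalize()   # script, e.g. 'Cyrl'
        parts.append(subtag)
    return "-".join(parts)


print(format_tag("MN-cYRL-mn"))  # mn-Cyrl-MN
print(format_tag("EN-ca-X-CA"))  # en-CA-x-ca
```

    Since tags are case-insensitive, this formatting is cosmetic: comparisons between tags should still ignore case.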

    1.2 Language Subtag Sources and Interpretation

    The namespace of language tags and their subtags is administered by the Internet Assigned Numbers Authority (IANA) according to the rules in Section 5 of RFC 5646. The Language Subtag Registry maintained by IANA is the source for valid subtags; the other standards referenced in this section provide the source material for that registry.

    1.2.1 Primary Language Subtag

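    The primary language subtag is the first subtag in a language tag. It cannot be omitted, except in private-use tags (which start with "x") and grandfathered tags. It is normally a two- or three-letter code.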

  • Original article: https://www.cnblogs.com/yangyingchao/p/3794436.html