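RFC 1738 specifies which characters may be used in a URL:
“
Only alphanumerics (0-9, a-z, A-Z), the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL
”
HTML 4 later widened this so that characters from the entire Unicode character set can be used in URLs.
So which characters actually need to be encoded?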
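1. ASCII control characters
Reason: they are not printable.
Range: ISO-8859-1 code points 00-1F and 7F (hex).
2. Non-ASCII characters
Reason: these characters are outside the ASCII set, so they are not considered legal in a URL.
Range: ISO-Latin code points 80-FF (hex).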
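The two ranges above can be sketched as a small check. `classifyCodePoint` is a hypothetical helper introduced only for illustration, and it assumes its input is a single code point between 0 and 255. Note that JavaScript's built-in `encodeURIComponent` encodes non-ASCII characters through their UTF-8 bytes, so a character such as é (E9 in ISO-Latin) comes out as two escaped bytes rather than `%E9`.

```javascript
// Hypothetical helper: classifies a code point (0-255) using the ranges listed above.
function classifyCodePoint(code) {
  if (code <= 0x1f || code === 0x7f) return "control";      // 00-1F and 7F
  if (code >= 0x80 && code <= 0xff) return "non-ascii";     // 80-FF
  return "printable-ascii";
}

console.log(classifyCodePoint(0x0a)); // "control" (line feed)
console.log(classifyCodePoint(0xe9)); // "non-ascii" (é in ISO-Latin)
console.log(classifyCodePoint(0x41)); // "printable-ascii" (A)

// The built-in encoder escapes both kinds of characters.
console.log(encodeURIComponent("\n"));     // "%0A"
console.log(encodeURIComponent("\u00e9")); // "%C3%A9" (UTF-8 bytes of é)
```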
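3. Reserved characters
Reason: URLs use some reserved characters to define their syntax. When one of these characters appears in a URL without its special role, it must be encoded.
Characters: $ & + , / : ; = ? @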
Character | Code Points (Hex) | Code Points (Dec) |
---|---|---|
Dollar ("$") | 24 | 36 |
Ampersand ("&") | 26 | 38 |
Plus ("+") | 2B | 43 |
Comma (",") | 2C | 44 |
Forward slash/Virgule ("/") | 2F | 47 |
Colon (":") | 3A | 58 |
Semi-colon (";") | 3B | 59 |
Equals ("=") | 3D | 61 |
Question mark ("?") | 3F | 63 |
'At' symbol ("@") | 40 | 64 |
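As a rough illustration of the reserved characters, the sketch below runs each one through `encodeURIComponent`, which escapes all ten, and contrasts it with `encodeURI`, which leaves them alone because it assumes they are playing their syntactic role. The `example.com` URL and the query value are made up for the example.

```javascript
// The ten reserved characters listed above.
const reserved = ["$", "&", "+", ",", "/", ":", ";", "=", "?", "@"];

// encodeURIComponent treats its input as data, so every reserved character is escaped.
const encodedReserved = reserved.map((ch) => encodeURIComponent(ch));
console.log(encodedReserved.join(" "));
// "%24 %26 %2B %2C %2F %3A %3B %3D %3F %40"

// A query value that contains "&" and "=" as plain data (hypothetical value).
const value = "a&b=c";
const url = "https://example.com/search?q=" + encodeURIComponent(value);
console.log(url); // "https://example.com/search?q=a%26b%3Dc"

// encodeURI assumes a whole URL and keeps reserved characters as they are.
console.log(encodeURI(value)); // "a&b=c"
```

The choice between the two functions follows the rule stated above: a reserved character is only escaped when it is not acting in its special role.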
4. Unsafe characters
Reason: some characters can cause ambiguity when they appear in a URL. These characters must also be encoded:
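Character | Code Points (Hex) | Code Points (Dec) | Why encode? |
---|---|---|---|
Space | 20 | 32 | Significant sequences of spaces may be lost in some uses (especially multiple spaces). |
Quotation marks | 22 | 34 | Often used to delimit URLs in plain text. |
'Less Than' symbol ("<") | 3C | 60 | Often used to delimit URLs in plain text. |
'Greater Than' symbol (">") | 3E | 62 | Often used to delimit URLs in plain text. |
'Pound' character ("#") | 23 | 35 | Used in URLs to indicate where a fragment identifier (bookmarks/anchors in HTML) begins. |
Percent character ("%") | 25 | 37 | Used to URL encode/escape other characters, so it should itself also be encoded. |
Left Curly Brace ("{") | 7B | 123 | Some systems can possibly modify these characters. |
Right Curly Brace ("}") | 7D | 125 | Some systems can possibly modify these characters. |
Vertical Bar/Pipe ("\|") | 7C | 124 | Some systems can possibly modify these characters. |
Backslash ("\\") | 5C | 92 | Some systems can possibly modify these characters. |
Caret ("^") | 5E | 94 | Some systems can possibly modify these characters. |
Tilde ("~") | 7E | 126 | Some systems can possibly modify these characters. |
Left Square Bracket ("[") | 5B | 91 | Some systems can possibly modify these characters. |
Right Square Bracket ("]") | 5D | 93 | Some systems can possibly modify these characters. |
Grave Accent ("\`") | 60 | 96 | Some systems can possibly modify these characters. |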
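To see how a real encoder treats the unsafe characters in the table, the sketch below passes each one through `encodeURIComponent` and collects the ones it leaves untouched. One difference is worth noting: `encodeURIComponent` does not escape the tilde, which the later RFC 3986 classifies as an unreserved character, so the table above is stricter than the built-in function on that point.

```javascript
// The unsafe characters from the table above.
const unsafe = [" ", '"', "<", ">", "#", "%", "{", "}", "|", "\\", "^", "~", "[", "]", "`"];

// Map each character to its encoded form.
const unsafeEncoded = Object.fromEntries(
  unsafe.map((ch) => [ch, encodeURIComponent(ch)])
);
console.log(unsafeEncoded[" "]);  // "%20"
console.log(unsafeEncoded["#"]);  // "%23"
console.log(unsafeEncoded["\\"]); // "%5C"

// Characters that encodeURIComponent returns unchanged.
const leftAsIs = unsafe.filter((ch) => encodeURIComponent(ch) === ch);
console.log(leftAsIs); // [ '~' ]
```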
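How is URL encoding done?
The URL encoding of a character consists of a % sign followed by the character's two-digit hexadecimal ISO-Latin code point.
For example:
space = %20
In JavaScript, the encodeURIComponent function does this.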
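The rule above (a % sign plus two hex digits) can be sketched by hand for a single byte and compared with the built-in function. `percentEncodeByte` is a hypothetical helper written for this illustration and assumes its argument is an integer from 0 to 255.

```javascript
// Hypothetical helper: "%" followed by the byte value as two uppercase hex digits.
function percentEncodeByte(byte) {
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeByte(0x20)); // "%20" (space)
console.log(percentEncodeByte(0x0a)); // "%0A" (line feed, padded to two digits)

// The built-in functions apply the same rule to whole strings.
const encoded = encodeURIComponent("hello world");
console.log(encoded);                     // "hello%20world"
console.log(decodeURIComponent(encoded)); // "hello world"
```

In practice the built-in `encodeURIComponent` and `decodeURIComponent` pair is the safer choice; the manual helper only shows what the escape sequence is made of.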