[oracle@oadb utf-8]$ cat a1.pl
my $str="中国";
print length($str);
print "
";
print $str."
";
[oracle@oadb utf-8]$ perl a1.pl
6
中国
此时长度是6
[oracle@oadb utf-8]$ cat a1.pl
use utf8;
my $str="中国";
print length($str);
print "
";
print $str."
";
[oracle@oadb utf-8]$ perl a1.pl
2
Wide character in print at a1.pl line 5.
中国
此时启用utf8 长度为2
[oracle@oadb utf-8]$ cat a1.pl
use utf8;
use Encode;
my $str="中国";
$str=encode_utf8($str);
print length($str);
print "
";
print $str."
";
[oracle@oadb utf-8]$ perl a1.pl
6
中国
此时长度为6
常用中文字符用utf-8编码占用3个字节(大约2万多字),但超大字符集中的更大多数汉字要占4个字节(在unicode编码体系中,U+20000开始有5万多汉字)
字节 -> decode ->字符串 ->encode ->字节
字符串长度短,字节长度长