Wrong codepoints for non-ASCII characters inserted in UTF-8 database using CLP

Technote (troubleshooting)
Problem(Abstract)

During an insert from the CLP, no codepage conversion takes place if the operating system codepage and the database codepage are both UTF-8. In this case the data to be inserted must also be UTF-8 encoded.
If the data has a different encoding than the database codepage (this can be verified with any hex editor), change the operating system codepage to match the data's encoding so that the data is converted to the database codepage.

Symptom

Error executing Select SQL statement. Caught by java.io.CharConversionException. ERRORCODE=-4220

Caused by: java.nio.charset.MalformedInputException: Input length = 4759 at com.ibm.db2.jcc.b.u.a(u.java:19) at com.ibm.db2.jcc.b.bc.a(bc.java:1762)

Cause
During an insert using the CLP, characters do not go through codepage conversion when the operating system codepage and the database codepage are the same. If both are UTF-8 but the data to be inserted is not Unicode, the data stored in the database might have incorrect (non-Unicode) codepoints, and the above error results when the data is retrieved.

To verify the encoding of the data to be inserted, use any editor that shows the hex representation of characters and check the codepoints of the non-ASCII characters you are trying to insert. If you see only 1 byte per non-ASCII character, you need to force codepage conversion during the insert from the CLP into the UTF-8 database.
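
If no hex editor is at hand, the same check can be sketched in a few lines of Python. This is only an illustration: the helper names `hex_view` and `non_ascii_bytes` and the sample strings are assumptions made for this example, not part of DB2 or the CLP.

```python
def hex_view(data: bytes) -> str:
    """Render bytes as space-separated two-digit hex, like one row of a hex editor."""
    return " ".join(f"{b:02X}" for b in data)


def non_ascii_bytes(data: bytes) -> list:
    """Return the hex value of every byte outside the ASCII range (0x80 and above)."""
    return [f"{b:02X}" for b in data if b >= 0x80]


# Hypothetical sample: the copyright sign stored as one byte (codepage 819).
single_byte_sample = b"Copyright \xa9"
# The same text encoded as UTF-8: the copyright sign takes two bytes.
utf8_sample = "Copyright \u00a9".encode("utf-8")

for sample in (single_byte_sample, utf8_sample):
    print(hex_view(sample), "| non-ASCII bytes:", non_ascii_bytes(sample))
```

A single byte such as A9 for the copyright sign indicates single-byte data that needs conversion. The pair C2 A9 indicates that the data is already UTF-8.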

To force codepage conversion during an insert from the CLP, make sure that the operating system codepage is non-Unicode and matches the codepage of the data when you insert data from a non-Unicode source into a Unicode database.

Problem Details

An example problem scenario is as follows:

Create a database of type UTF-8:
CREATE DATABASE <db> USING CODESET utf-8 TERRITORY US

Create a table that holds character data:
CREATE TABLE test (col char(20))

Check operating system locale:
locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
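
To read the codeset off a locale name programmatically, the following Python sketch splits a POSIX-style name of the form language_territory.codeset@modifier. The function names `codeset_of` and `is_unicode_locale` are hypothetical helpers for this example. It only parses the string and does not query the operating system.

```python
from typing import Optional


def codeset_of(locale_name: str) -> Optional[str]:
    """Return the codeset part of a POSIX-style locale name, or None if absent."""
    name = locale_name.split("@", 1)[0]  # drop an optional @modifier
    if "." not in name:
        return None  # no explicit codeset; the platform default applies
    return name.split(".", 1)[1]


def is_unicode_locale(locale_name: str) -> bool:
    """True if the locale name explicitly names UTF-8 (spelled UTF-8 or utf8)."""
    codeset = codeset_of(locale_name)
    return codeset is not None and codeset.upper().replace("-", "") == "UTF8"


for name in ("en_US.UTF-8", "en_US.ISO8859-1", "en_US"):
    print(name, "->", codeset_of(name), "| Unicode:", is_unicode_locale(name))
```

In the scenario above, LANG=en_US.UTF-8 is reported as a Unicode locale, which is the condition under which the CLP passes the data through unconverted.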

Insert the non-ASCII characters 'Ã', '³', and '©', which have codepoints 0x'C3', 0x'B3', and 0x'A9' in codepage 819, into the table:
INSERT INTO test VALUES ('Ã')
INSERT INTO test VALUES ('³')
INSERT INTO test VALUES ('©')

By running the following statement, you can see that all INSERT statements caused only one byte to be inserted into the table:
SELECT col, HEX(col) FROM test
Ã C3
³ B3
© A9
However, the UTF-8 representations of those characters are 0x'C383' for 'Ã', 0x'C2B3' for '³', and 0x'C2A9' for '©'. The three rows in the table therefore contain bytes that are not valid UTF-8.
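
The byte values quoted above can be reproduced with a short Python sketch. It assumes that Python's `iso-8859-1` codec is an adequate stand-in for codepage 819. The helper name `encoded_hex` is made up for this example.

```python
# Characters from the scenario, written as escapes so the source stays ASCII:
# U+00C3 (A with tilde), U+00B3 (superscript three), U+00A9 (copyright sign).
CHARS = ["\u00c3", "\u00b3", "\u00a9"]


def encoded_hex(ch: str, encoding: str) -> str:
    """Return the bytes of one character in the given encoding as upper-case hex."""
    return ch.encode(encoding).hex().upper()


for ch in CHARS:
    print(
        f"U+{ord(ch):04X}: codepage 819 = {encoded_hex(ch, 'iso-8859-1')}, "
        f"UTF-8 = {encoded_hex(ch, 'utf-8')}"
    )
```

The codepage 819 column matches what HEX(col) returned after the unconverted insert. The UTF-8 column shows what a correctly converted insert should have stored.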

When a JDBC application selects from the column, the following error occurs. This is expected because the table contains invalid UTF-8 data:
Error executing Select SQL statement. Caught by java.io.CharConversionException. ERRORCODE=-4220
Caused by: java.nio.charset.MalformedInputException: Input length = 4759 at com.ibm.db2.jcc.b.u.a(u.java:19) at com.ibm.db2.jcc.b.bc.a(bc.java:1762)

Delete all rows with incorrect Unicode codepoints from the test table:
DELETE FROM test

Change the locale to one that matches the codepage of the data to be inserted, for example:
export LANG=en_US
The exact locale name is platform dependent; locale -a lists the locales installed on your system. One way to determine the codepage of your data is described here: http://www.codeproject.com/Articles/17201/Detect-Encoding-for-In-and-Outgoing-Text. If you prepare the data yourself in an editor, check the editor's documentation to find out how to set the codepage it uses when saving data.

Insert the data into the table:
INSERT INTO test VALUES ('Ã')
INSERT INTO test VALUES ('³')
INSERT INTO test VALUES ('©')

Verify that the inserted data was converted to UTF-8 during the insert:
SELECT col, HEX(col) FROM test
Ã C383
³ C2B3
© C2A9

Run your Java application that selects the Unicode data. No exception should be reported.
Environment

UNIX, Linux, Unicode database

Diagnosing the problem

Verify that non-ASCII data has proper Unicode codepoints in the Unicode database.
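
One way to automate this check is to take the values returned by HEX(col) and test whether they decode as UTF-8. The Python sketch below does this. The function name `hex_is_valid_utf8` is a hypothetical helper, and the input values are the ones from the example scenario rather than output captured from a live database.

```python
def hex_is_valid_utf8(hex_value: str) -> bool:
    """Return True if the bytes described by a HEX() string form valid UTF-8."""
    raw = bytes.fromhex(hex_value)
    try:
        raw.decode("utf-8")  # strict decoding raises on malformed sequences
    except UnicodeDecodeError:
        return False
    return True


# Values from the scenario: the first three were inserted without conversion,
# the last three after the locale was changed to match the data.
for hex_value in ("C3", "B3", "A9", "C383", "C2B3", "C2A9"):
    verdict = "valid UTF-8" if hex_is_valid_utf8(hex_value) else "NOT valid UTF-8"
    print(f"{hex_value}: {verdict}")
```

Blank padding of a CHAR column (trailing 0x20 bytes in the HEX output) does not change the verdict, because a space is valid UTF-8 but cannot complete a truncated multi-byte sequence.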

Resolving the problem

Reinsert the data with codepage conversion enforced by setting the operating system codepage to match the codepage of the data to be inserted.
Related information

Export data:

Original article: https://www.cnblogs.com/xuxiuxiu/p/7271953.html
