Solution for ERROR: extra data after last expected column when loading data into GP through an external table

--When exporting text with hive -e, I routinely run the error-prone String columns through regexp_replace(), because a field that contains a tab character is a real headache.
```
hive -e "select regexp_replace(String_Col1,'\t',''),Date_Col2,Integer_Col3 ... from hivedb.export_table" | sed 's/\t/\x01/g;s/\\/\\\\/g;s/\x00//g' > .../export_table.txt
```
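--To see what the sed chain above does to a single row, here is a minimal sketch. It assumes GNU sed (the \t, \x01 and \x00 escapes are GNU extensions, not POSIX sed), and sanitize_for_gp is a hypothetical function name used only for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: assumes GNU sed. sanitize_for_gp is a hypothetical helper name.
sanitize_for_gp() {
  # s/\t/\x01/g : turn the tab delimiter of the hive -e output into the \x01 byte
  # s/\\/\\\\/g : double every backslash, because GP's TEXT format treats \ as an escape character
  # s/\x00//g   : drop NUL bytes
  sed 's/\t/\x01/g;s/\\/\\\\/g;s/\x00//g'
}

# Demo: one tab-separated row whose first field contains a backslash.
printf 'C:\\data\t2017-10-17\t42\n' | sanitize_for_gp | od -c
```

--In the od -c output the two tab positions show up as 001, and the single backslash in C:\data appears doubled.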
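--In GP, create an external table that splits fields on \x01, the delimiter substituted in above. This has worked every time I've tried it, with a success rate of 99.99%.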
```
drop external table if exists product_ext.export_table_ext;
create external table product_ext.export_table_ext(
String_Col1 varchar(1000),
Date_Col2 date,
Integer_Col3 integer,
...
)
LOCATION (
'gpfdist://xxx.xxx.xxx.xxx:port/.../export_table.txt'
) FORMAT 'TEXT' (DELIMITER E'\x01'); --irview_vt
```
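--Splitting on \x01 only works if that byte never occurs inside the data itself; a String value that already contains it gives its row one field too many. A minimal pre-check, assuming the raw hive -e output is saved to a file before the sed step (count_delim_lines and any file name passed to it are hypothetical):

```bash
#!/usr/bin/env bash
# Sketch only: count_delim_lines is a hypothetical helper, not part of the original workflow.
# Prints how many lines of file $1 already contain the \x01 byte.
count_delim_lines() {
  # LC_ALL=C makes grep compare raw bytes regardless of locale.
  # grep -c prints 0 and exits with status 1 when nothing matches; "|| true" keeps that from failing the caller.
  LC_ALL=C grep -c "$(printf '\001')" "$1" || true
}
```

--A result other than 0 means \x01 is not a safe delimiter for that export as-is.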
--Even though the file had gone through sed 's/\t/\x01/g;s/\\/\\\\/g;s/\x00//g' and was split on \x01, today I still ran into that 0.01% failure. Here is how I fixed it:

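--Using the error message, locate the line that contains the offending string and back it up to a temporary file. In fact only one line turned out to be bad, and there was no way around it but to find it and delete it.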
```
# back up the offending row(s) before deleting them
grep 'jQTIJWkiyytg97PCjh5U' rid_mac_201735to38w.txt > rid_mac_falsedata.txt
```
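--If the error message does not point to a searchable string, the bad rows can also be found by counting fields. A minimal sketch (find_bad_rows is a hypothetical helper; pass the external table's column count as the second argument) that lists every line whose \x01-separated field count differs from the expected one. More fields than columns is what produces "extra data after last expected column".

```bash
#!/usr/bin/env bash
# Sketch only: find_bad_rows is a hypothetical helper name.
# Usage: find_bad_rows FILE EXPECTED_COLUMNS
# Prints "line_number: field_count" for every line whose \x01-separated
# field count differs from EXPECTED_COLUMNS.
find_bad_rows() {
  awk -v n="$2" 'BEGIN { FS = "\001" } NF != n { printf "%d: %d fields\n", NR, NF }' "$1"
}
```

--For example, find_bad_rows rid_mac_201735to38w.txt followed by the table's column count prints the line numbers to inspect.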
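--Delete the line that contains the error (that is, the row that fails when mapping from the external table into the internal table):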
```
sed -i '/jQTIJWkiyytg97PCjh5U/d' rid_mac_201735to38w.txt
```
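--As an alternative to grep plus sed -i, the backup and the delete can be done in one pass by routing rows on their field count. A minimal sketch under the same \x01 assumption (split_bad_rows and the output file names are hypothetical); it leaves the input file untouched:

```bash
#!/usr/bin/env bash
# Sketch only: split_bad_rows is a hypothetical helper name.
# Usage: split_bad_rows IN EXPECTED_COLUMNS GOOD_OUT BAD_OUT
# Rows with exactly EXPECTED_COLUMNS \x01-separated fields go to GOOD_OUT,
# every other row goes to BAD_OUT (created only if such a row exists).
# Note: awk -v processes escape sequences, so paths containing backslashes would be altered.
split_bad_rows() {
  awk -v n="$2" -v good="$3" -v bad="$4" '
    BEGIN { FS = "\001" }
    NF == n { print > good; next }
    { print > bad }
  ' "$1"
}
```

--GOOD_OUT is then what gpfdist should serve, and BAD_OUT plays the role of the backup file made with grep.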
--After that, the insert goes through normally:
```
Query returned successfully: 302060132 rows affected, 26.6 secs execution time.
```
Appendix:

sed command: a detailed guide to Linux sed usage
Original article: https://www.cnblogs.com/binguo2008/p/7682783.html