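On CDH3, Pig allowed an implicit conversion from int to chararray (here, in a UNION of an int column with a chararray column), but on CDH5 it does not.
The Pig code is as follows: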
%default Cleaned_Log /user/usergroup_mdmp/test/cleaned/2015-01-05/5/part-r-00000
%default Industry_Path /user/usergroup_mdmp/test/report/historical/appcategory/2015/industry
origin_cleaned_data = LOAD '$Cleaned_Log' USING PigStorage(',')
AS (ad_network_id:chararray,
app_id:chararray,
app_category_id:chararray,
quadkey:chararray);
category_data = foreach origin_cleaned_data generate (int)app_category_id;
industry_existed_Data = LOAD '$Industry_Path' USING PigStorage(',') AS (appcategory_id:chararray);
result = UNION category_data, industry_existed_Data;
dump result;
--STORE result INTO '/user/usergroup_mdmp/test/report/historical/hour/2015/test' USING PigStorage(',');
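This ran fine on the old CDH3 cluster,
but on CDH5 it fails with the error: can't cast to bytearray. It turns out that int can no longer be implicitly converted to chararray.
So before the UNION, both relations have to be converted to the same type, int.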
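A minimal sketch of that fix, reusing the paths and field names from the script above. The intermediate alias industry_raw is a name introduced here for illustration, and the AS clauses are added only so that both inputs end up with an identical schema.

```pig
-- Sketch: make both UNION inputs int before merging (industry_raw is a hypothetical alias).
origin_cleaned_data = LOAD '$Cleaned_Log' USING PigStorage(',')
    AS (ad_network_id:chararray,
        app_id:chararray,
        app_category_id:chararray,
        quadkey:chararray);
-- Explicit cast chararray -> int, named to match the other relation.
category_data = FOREACH origin_cleaned_data GENERATE (int)app_category_id AS appcategory_id;

industry_raw = LOAD '$Industry_Path' USING PigStorage(',') AS (appcategory_id:chararray);
-- Cast the second relation to int as well, so both schemas are (appcategory_id:int).
industry_existed_Data = FOREACH industry_raw GENERATE (int)appcategory_id AS appcategory_id;

result = UNION category_data, industry_existed_Data;
DUMP result;
```

Declaring appcategory_id:int directly in the second LOAD should have the same effect; the explicit FOREACH is used here only to make the cast visible.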
Other notes:
int to bytearray does not work either.
chararray to bytearray does not work either:
category_data = foreach origin_cleaned_data generate (bytearray)app_category_id;
industry_existed_Data = LOAD '$Industry_Path' USING PigStorage(',') AS (appcategory_id:chararray);
result = UNION category_data, industry_existed_Data;
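If the merged column should stay textual, one alternative is to converge on chararray rather than bytearray, since Pig does support an explicit int to chararray cast. This is only a sketch under that assumption, reusing the relations from the script above; the field name appcategory_id in the first FOREACH is chosen here to match the second relation.

```pig
-- Sketch: cast to int (as in the original script), then back to chararray explicitly.
category_data = FOREACH origin_cleaned_data
    GENERATE (chararray)((int)app_category_id) AS appcategory_id;
industry_existed_Data = LOAD '$Industry_Path' USING PigStorage(',') AS (appcategory_id:chararray);
-- Both inputs are now (appcategory_id:chararray), so no implicit conversion is needed.
result = UNION category_data, industry_existed_Data;
```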
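In addition:
The hadoop commands have also changed somewhat on CDH5.
1. hadoop fs -mkdir /tt/xx: this no longer creates a path whose parent directories do not exist; it can only create a single new level under an existing directory. To create a path with several missing levels, use mkdir -p.
2. The delete command is now rm -r; the old rmr is deprecated, although it still works.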