Pig FLATTEN

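Today, after a lot of trial and error, I finally figured out how FLATTEN works. Sometimes you really have to test things yourself before an explanation fully makes sense. The explanation a colleague gave me was slightly off and misled me, which is why I couldn't work out what was going on for so long.

This is the official explanation: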

The FLATTEN operator looks like a UDF syntactically, but it is actually an operator that changes the structure of tuples and bags in a way that a UDF cannot. Flatten un-nests tuples as well as bags. The idea is the same, but the operation and result is different for each type of structure.

For tuples, flatten substitutes the fields of a tuple in place of the tuple. For example, consider a relation that has a tuple of the form (a, (b, c)). The expression GENERATE $0, flatten($1), will cause that tuple to become (a, b, c).

For bags, the situation becomes more complicated. When we un-nest a bag, we create new tuples. If we have a relation that is made up of tuples of the form ({(b,c),(d,e)}) and we apply GENERATE flatten($0), we end up with two tuples (b,c) and (d,e). When we remove a level of nesting in a bag, sometimes we cause a cross product to happen. For example, consider a relation that has a tuple of the form (a, {(b,c), (d,e)}), commonly produced by the GROUP operator. If we apply the expression GENERATE $0, flatten($1) to this tuple, we will create new tuples: (a, b, c) and (a, d, e).
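
To make the quoted rules concrete, here is a minimal sketch of both cases. The file names 'tuples.txt' and 'students.txt', the aliases, and the field names are my own assumptions for illustration; they do not come from the documentation.

```pig
-- Tuple case. Assumed input 'tuples.txt', one tab-separated line:  a<TAB>(b,c)
t = LOAD 'tuples.txt' AS (f0:chararray, p:tuple(f1:chararray, f2:chararray));
-- FLATTEN on a tuple splices its fields into the parent tuple: (a,(b,c)) becomes (a,b,c)
u = FOREACH t GENERATE f0, FLATTEN(p);
DUMP u;

-- Bag case. Assumed input 'students.txt', tab-separated: name, age, gpa
students = LOAD 'students.txt' AS (name:chararray, age:int, gpa:float);
-- GROUP produces tuples of the form (group, {bag of the matching student tuples})
grouped = GROUP students BY name;
-- FLATTEN on a bag emits one output tuple per bag element, repeating the group key
flat = FOREACH grouped GENERATE group, FLATTEN(students);
DESCRIBE flat;
DUMP flat;
```

One thing worth checking with DUMP: if a bag is empty, FLATTEN produces no output tuple for that record.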

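My experiments matched this. Today I tried both the first case (tuples) and the second case (bags), and they confirmed that even in the second case a single FLATTEN is enough to get the flat schema. Take data like this: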

Joe {(Joe,18,3.8)}
Bill {(Bill,20,3.9)}
John {(John,18,4.0)}
Mary {(Mary,19,3.8),(Mary,19,5.0)}

a = load 'result' as (f1:chararray,B: bag {T: tuple(t1:chararray, t2:int, t3:float)});

b = foreach a GENERATE FLATTEN(B) as (t1:chararray,t2:int,t3:float);

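This can be flattened in a single pass. I haven't tested more deeply nested data, but it would presumably need two such operations. I really learned a lot about bags and tuples. Tomorrow I'll see whether I can pass the data into a UDF.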
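
For the deeper nesting that I have not tested, the sketch below shows what two passes might look like. It assumes a hypothetical file 'nested' whose second field is a bag of tuples that each contain another bag; the schema and all names are made up for illustration. Since each FLATTEN removes one level of nesting, the inner bag should need its own FLATTEN.

```pig
-- Assumed input line (tab-separated): Mary<TAB>{(2014,{(3.8),(5.0)}),(2015,{(4.0)})}
a = LOAD 'nested' AS (f1:chararray,
                      O: bag {T: tuple(year:int, I: bag {U: tuple(score:float)})});

-- First pass: un-nest the outer bag; the inner bag is still a bag at position $2
b = FOREACH a GENERATE f1, FLATTEN(O);
DESCRIBE b;

-- Second pass: un-nest the inner bag, referring to fields by position
c = FOREACH b GENERATE $0, $1, FLATTEN($2);
DESCRIBE c;
DUMP c;
```

Comparing the two DESCRIBE outputs is the quickest way to confirm whether the second pass is really needed for a given schema.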

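To sum up: when in doubt, read the official documentation first, then test on a small data set. Use DESCRIBE to check what structure each step produces, and STORE the result to see whether the output is what you expected. Overall, it is all quite clear.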
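
As a rough sketch of that workflow on the small 'result' file from this post (the output directory name 'flatten_out' is an assumption):

```pig
-- Same load as in the post; 'result' is the small test file shown earlier
a = LOAD 'result' AS (f1:chararray, B: bag {T: tuple(t1:chararray, t2:int, t3:float)});
DESCRIBE a;   -- print the schema before flattening

b = FOREACH a GENERATE FLATTEN(B) AS (t1:chararray, t2:int, t3:float);
DESCRIBE b;   -- print the schema after flattening

DUMP b;                      -- print the tuples to the console
STORE b INTO 'flatten_out';  -- assumed output directory; it must not exist yet
```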

Original post: https://www.cnblogs.com/jamesf/p/4751605.html