• Pig trial: GROUP and FOREACH


    A = load '/user/cloudera/lab/mydata' using PigStorage() as (a,b,c);

    Writing A=load (no spaces around the =) fails with: [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " "A=load "" at line 1, column 1.

    (1,2,3)

    (4,2,1)

    (8,3,4)

    (4,3,3)

    (7,2,5)

    (8,4,3)

    B = group A by a;

    (1,{(1,2,3)})

    (4,{(4,3,3),(4,2,1)})

    (7,{(7,2,5)})

    (8,{(8,4,3),(8,3,4)})

    C = foreach B { D = distinct A.b; generate flatten(group), COUNT(D); };

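    Typing a full-width (Chinese) parenthesis "(" instead of the ASCII "(" fails with an "Unexpected character" error.

    The first field of B always has the fixed name group, because it is produced by the GROUP operator.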

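    In the statement above, D = distinct A.b; the name A refers to the second field of B, which keeps the name of the relation that B was built from. For each record of B, that field holds the following bag: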

    (1,2,3)

    (4,3,3), (4,2,1)

    (7,2,5)

    (8,4,3), (8,3,4)

    So for each record of B, D is:

    2

    3,2

    2

    4,3

    >> generate flatten(group), COUNT(D);

    (1,1)

    (4,2)

    (7,1)

    (8,2)
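
    As a cross-check on the hand-computed values above, the same group, distinct, and count steps can be modelled in plain Python. This is only a sketch of the semantics, not Pig itself: the rows are copied from the sample data, the variable names mirror the Pig aliases, and the result is sorted by key for readability, whereas Pig does not guarantee output order.

```python
from collections import defaultdict

# Rows of relation A as (a, b, c), copied from the sample data.
A = [(1, 2, 3), (4, 2, 1), (8, 3, 4), (4, 3, 3), (7, 2, 5), (8, 4, 3)]

# Models: B = group A by a;
# One (group, bag) pair per distinct value of a.
bags = defaultdict(list)
for row in A:
    bags[row[0]].append(row)
B = sorted(bags.items())  # sorted only to make the printout stable

# Models: C = foreach B { D = distinct A.b; generate flatten(group), COUNT(D); };
C = []
for group, bag in B:
    D = {row[1] for row in bag}  # distinct values of b inside this bag
    C.append((group, len(D)))    # COUNT(D); the sample data has no nulls

for record in C:
    print(record)
```

    The key 4 maps to a bag whose b values are 2 and 3, so its distinct count is 2; the key 1 has a single row and therefore a count of 1.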

    =========================

    GROUP creates a nested set of output tuples, while JOIN creates a flat set of output tuples.

    The first field is named "group" (do not confuse this with the GROUP operator) and is the same type as the group key.

    The second field takes the name of the original relation and is type bag.

    # So "group" is the name of the key field, and the original alias (A here) is the name of the nested bag.
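
    The difference between nested and flat output can be illustrated with a small Python model. It is a sketch under assumptions: the second relation X and the helper names group_by_first and join_on_first are hypothetical and do not appear in the original notes, and the code imitates the shape of Pig's output rather than calling Pig.

```python
from collections import defaultdict

# A subset of the sample rows, as (a, b, c).
A = [(1, 2, 3), (4, 2, 1), (4, 3, 3)]
# Hypothetical second relation (a, label), invented for this illustration.
X = [(1, "x"), (4, "y")]

def group_by_first(rows):
    """Model of GROUP: one (key, bag) pair per key, bag holds whole rows."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[0]].append(row)
    return sorted(bags.items())

def join_on_first(left, right):
    """Model of an inner JOIN on the first field: one flat tuple per match."""
    return [l + r for l in left for r in right if l[0] == r[0]]

grouped = group_by_first(A)   # nested: the key plus a bag of tuples
joined = join_on_first(A, X)  # flat: fields of both sides side by side

print(grouped)
print(joined)
```

    In the grouped result each record has exactly two fields, the key and the bag, while each joined record is a single flat tuple that carries the key column from both sides.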

  • Original article: https://www.cnblogs.com/bob-dong/p/14248211.html