• Pig notes


    1. Renaming a Pig job:

    Add this line at the very top of the Pig script:

    set job.name 'This is my job';

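    2. Passing parameters to Pig:

    When running Pig, pass the parameter on the command line with -p and reference it in the script as $JOBNAME: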

    pig -p JOBNAME="MyJob" test.pig
    --test.pig
    set job.name '$JOBNAME';
    ......
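
    Parameter substitution is purely textual: before the script runs, each $JOBNAME in test.pig is replaced by the value given with -p. The Python sketch below only mimics that replacement with string.Template, which happens to use the same $NAME syntax. It illustrates the idea and is not Pig's preprocessor; the function name render_script is hypothetical.

```python
from string import Template

# First line of test.pig, exactly as written in the script.
SCRIPT = "set job.name '$JOBNAME';"


def render_script(script: str, params: dict) -> str:
    """Replace each $NAME in the script text with its value from params.

    Raises KeyError if the script references a parameter that was not
    supplied (a choice made for this sketch).
    """
    return Template(script).substitute(params)


# Equivalent in spirit to: pig -p JOBNAME="MyJob" test.pig
rendered = render_script(SCRIPT, {"JOBNAME": "MyJob"})
print(rendered)  # set job.name 'MyJob';
```

    To inspect the real substituted script, Pig's -dryrun option writes it out without executing it.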

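    3. Defining the field delimiter:

    Pig's default field delimiter is the tab character (\t). To use a different one, pass it to the loader with using PigStorage(','):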

    prices = load 'NYSE_daily' using PigStorage(',') as (exchange, symbol, date, open,high, low, close, volume, adj_close);
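
    The effect of the delimiter can be sketched in Python: each input line is split on the delimiter and the pieces are matched to the declared field names. The sample row is invented for this example, parse_line is a hypothetical helper, and padding missing fields with None is an assumption modeled on Pig filling missing fields with null.

```python
# Field names taken from the load statement above.
FIELDS = ["exchange", "symbol", "date", "open", "high",
          "low", "close", "volume", "adj_close"]


def parse_line(line: str, delimiter: str = "\t") -> dict:
    """Split one input line on the delimiter and name the fields.

    Missing trailing fields are padded with None.
    """
    values = line.rstrip("\n").split(delimiter)
    values += [None] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


# Hypothetical comma-separated row, invented for illustration.
row = "NYSE,CPO,2009-12-30,27.96,28.00,27.65,27.75,242600,27.75"

with_comma = parse_line(row, ",")  # like PigStorage(',')
with_tab = parse_line(row)         # default tab: the line contains no tab

print(with_comma["symbol"], with_comma["close"])  # CPO 27.75
print(with_tab["exchange"] == row)                # True
print(with_tab["symbol"])                         # None
```

    With the wrong delimiter the whole line ends up in the first field, which is the usual symptom of a missing PigStorage(',').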

    4. Setting the number of reducers:

    Parallel

    The parallel clause sets the number of reduce tasks:

    --parallel.pig
    daily   = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
                volume, adj_close);
    bysymbl = group daily by symbol parallel 10;
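
    What the reducer count changes is how many reduce tasks the group keys are spread over. The Python sketch below assumes plain hash partitioning, which is the usual MapReduce default for a group; zlib.crc32 stands in for the real hash function, and the records and the helper reducer_for are hypothetical.

```python
import zlib
from collections import defaultdict


def reducer_for(key: str, num_reducers: int) -> int:
    """Map a group key to a reducer index in [0, num_reducers)."""
    # crc32 is a deterministic stand-in for the real key hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# Hypothetical (symbol, close) records, invented for illustration.
records = [("CPO", 27.75), ("CCS", 16.5), ("CPO", 28.0),
           ("CIF", 5.1), ("CCS", 16.25)]

# Equivalent in spirit to: group daily by symbol parallel 10
partitions = defaultdict(list)
for symbol, close in records:
    partitions[reducer_for(symbol, 10)].append((symbol, close))

for index in sorted(partitions):
    print(index, partitions[index])
```

    Every record with the same symbol lands on the same reducer, so a higher reducer count only helps when there are enough distinct keys to spread out.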

    parallel applies only to the statement it is attached to. To run every reduce-side statement in the script with 10 reduce tasks, use set default_parallel 10:

    --defaultparallel.pig
    set default_parallel 10;
    daily   = load 'NYSE_daily' as (exchange, symbol, date, open, high, low, close,
                volume, adj_close);
    bysymbl = group daily by symbol;
    average = foreach bysymbl generate group, AVG(daily.close) as avg;
    sorted  = order average by avg desc;
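
    For readers new to Pig Latin, the three statements after the load can be traced in plain Python. This is only a sketch of what the script computes, on invented data; it says nothing about how Pig executes it. In the Pig script both group and order run a reduce phase, so default_parallel applies to each of them.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (symbol, close) pairs, invented for illustration.
daily = [("CPO", 27.0), ("CCS", 16.0), ("CPO", 29.0),
         ("CIF", 5.0), ("CCS", 18.0)]

# bysymbl = group daily by symbol;
bysymbl = defaultdict(list)
for symbol, close in daily:
    bysymbl[symbol].append(close)

# average = foreach bysymbl generate group, AVG(daily.close) as avg;
average = [(symbol, mean(closes)) for symbol, closes in bysymbl.items()]

# sorted = order average by avg desc;
sorted_avg = sorted(average, key=lambda pair: pair[1], reverse=True)

print(sorted_avg)  # [('CPO', 28.0), ('CCS', 17.0), ('CIF', 5.0)]
```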

    For more, see:

    http://www.cnblogs.com/siwei1988/archive/2012/08/06/2624912.html

  • Original article: https://www.cnblogs.com/dorothychai/p/4606406.html