16 June 2013

author : xiajun

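I. References:

http://mfido-sina-cn.iteye.com/blog/1454873 (installation guide)

http://pig.apache.org/docs/r0.11.1/func.html (built-in functions)

II. Using the built-in functions:

Sample data (user.txt; the fields are separated by commas in the file):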

  id    name   sex  age address 

  1      zs     F   25     bj

  2      ls     M   22     hb

  3      ww     F   23     hn

  4      zl     M   28     bj

  5      kk     M   25     hn

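Commands:

u = load 'user.txt' using PigStorage(',') as (id:int,name:chararray,sex:chararray,age:int,address:chararray); -- load reads the data; PigStorage(',') sets the field delimiter to a comma

g = group u by address; -- group the rows by address

a = foreach g generate group, AVG(u.age); -- AVG only works on grouped data; the grouping column (group) comes first after generate

dump a; -- print the contents of a to the screen

store a into '/home/xxx/xxx'; -- write a to the given path

describe u; -- show the schema of u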
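To see why the foreach refers to `group` and `u.age`, it helps to describe the grouped relation as well. The sketch below assumes the same `user.txt` and schema as above; the output shown in the comments is what Pig 0.11 prints approximately, not a captured log.

```pig
-- Load the sample data (assumes user.txt is comma-separated).
u = LOAD 'user.txt' USING PigStorage(',')
    AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

g = GROUP u BY address;

DESCRIBE u;
-- roughly: u: {id: int,name: chararray,sex: chararray,age: int,address: chararray}

DESCRIBE g;
-- roughly: g: {group: chararray,u: {(id: int,name: chararray,sex: chararray,age: int,address: chararray)}}
```

Each row of `g` holds the grouping key in a field named `group` and a bag named `u` with all the rows for that key, which is why aggregates such as AVG take `u.age` as their argument.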

CONCAT (expression, expression)

X = foreach u generate CONCAT(name,address); -- concatenate name and address, e.g. zsbj. Built-in function names are case sensitive, so CONCAT must be written in uppercase

COUNT(expression)

b = group u by address;

x = foreach b generate COUNT(u); -- u must be grouped first

COUNT_STAR(expression)

X = FOREACH b GENERATE COUNT_STAR(u);
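The difference between the two functions is how they treat nulls: COUNT skips tuples whose first field is null, while COUNT_STAR counts every tuple. A minimal sketch, assuming the same `user.txt` as above and using GROUP ... ALL to count over the whole relation (the alias names are only illustrative):

```pig
u = LOAD 'user.txt' USING PigStorage(',')
    AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

-- GROUP ... ALL puts every row into a single group.
all_u = GROUP u ALL;

totals = FOREACH all_u GENERATE
    COUNT(u.age)  AS rows_with_age,  -- ignores rows where age is null
    COUNT_STAR(u) AS all_rows;       -- counts every row, nulls included

DUMP totals;
```

With the five sample rows, which contain no nulls, both counts should come out the same.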

MAX(expression)

X = FOREACH b GENERATE group, MAX(u.age); -- aliases are case sensitive, so the relation is b, not B
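Several aggregates can be computed in one foreach over the same group. The following sketch assumes the sample data above; the alias and field names (`stats`, `n`, `youngest`, and so on) are made up for the example, and the order of the output rows is not guaranteed.

```pig
u = LOAD 'user.txt' USING PigStorage(',')
    AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

b = GROUP u BY address;

-- One output row per address with count, min, max and average age.
stats = FOREACH b GENERATE
    group      AS address,
    COUNT(u)   AS n,
    MIN(u.age) AS youngest,
    MAX(u.age) AS oldest,
    AVG(u.age) AS avg_age;

DUMP stats;
-- For the five sample rows:
-- (bj,2,25,28,26.5)
-- (hb,1,22,22,22.0)
-- (hn,2,23,25,24.0)
```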

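first2 = LIMIT u 2; -- return 2 rows of u (which 2 is not guaranteed unless u is ordered first); limit is a reserved keyword, so it cannot be used as an alias

x = order u by age; -- sort by age, ascending by default

x = filter u by age>30; -- keep only the rows with age greater than 30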
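These three operators are usually chained. As a sketch, assuming the sample data above (the aliases `adults`, `by_age` and `oldest2` are illustrative), the two oldest users aged 25 or more can be selected like this:

```pig
u = LOAD 'user.txt' USING PigStorage(',')
    AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

adults  = FILTER u BY age >= 25;      -- keeps zs, zl and kk
by_age  = ORDER adults BY age DESC;   -- oldest first
oldest2 = LIMIT by_age 2;             -- LIMIT after ORDER keeps the top rows

DUMP oldest2;
```

zl (28) comes first; zs and kk are both 25, so which of the two fills the second row is not determined.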

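Joins (JOIN)

tmp_table_left_join = JOIN u BY age LEFT OUTER,u1 BY age; -- left outer join; u1 is a second relation that must be loaded beforehand. Keywords such as JOIN are not case sensitive

tmp_table_right_join = JOIN u BY age RIGHT OUTER,u1 BY age; -- right outer join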
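For comparison, a plain JOIN without LEFT, RIGHT or FULL is an inner join. In the sketch below `u1` is loaded from a hypothetical file `user1.txt` with the same schema as `u`; the file name is an assumption made for the example.

```pig
u  = LOAD 'user.txt' USING PigStorage(',')
     AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

-- Hypothetical second input with the same layout.
u1 = LOAD 'user1.txt' USING PigStorage(',')
     AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

-- Inner join: only rows whose age appears in both relations survive.
tmp_table_inner_join = JOIN u BY age, u1 BY age;

-- After a join, fields are disambiguated with the relation name and ::
pairs = FOREACH tmp_table_inner_join GENERATE u::name, u1::name, u::age;

DUMP pairs;
```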

Removing duplicates

tmp_table_distinct = DISTINCT u; -- DISTINCT removes duplicate rows from an existing relation (here u)
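DISTINCT compares whole tuples, so to get the distinct values of a single column the column has to be projected first. A small sketch, assuming the sample data above (the aliases are illustrative):

```pig
u = LOAD 'user.txt' USING PigStorage(',')
    AS (id:int, name:chararray, sex:chararray, age:int, address:chararray);

addresses        = FOREACH u GENERATE address;  -- keep only the address column
unique_addresses = DISTINCT addresses;          -- drop repeated addresses

DUMP unique_addresses;
-- For the sample rows this leaves bj, hb and hn (row order may vary).
```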

