Methods & Scripts - Pig Grunt: Simple Commands and Examples - by 小雨
Views: 6,463
Published: 2019-06-23

This article is about 4,460 characters long and takes roughly 14 minutes to read.

A disclaimer first: I am a novice, and I take no responsibility for any technical inaccuracies in the article below.

    Ways to run Pig:

    1. Script

    2. Grunt

    3. Embedded

    

    Grunt

    1. Auto-completion (commands only; file-name completion is not supported)

    2. The autocomplete file

    3. The PigPen Eclipse plugin

    

    The command to enter the Grunt shell:

    [hadoop@master pig]$ ./bin/pig

2013-04-13 23:00:19,909 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
2013-04-13 23:00:19,909 [main] INFO  org.apache.pig.Main - Logging error messages to: /opt/pig/pig_1365865219902.log
2013-04-13 23:00:20,237 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://192.168.154.100:9000
2013-04-13 23:00:20,536 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: 192.168.154.100:9001

    

    Help (help)

    grunt> help

Commands:
<pig latin statement>; - See the PigLatin manual for details: http://hadoop.apache.org/pig
File system commands:
    fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
    describe <alias>[::<alias] - Show the schema for the alias. Inner aliases can be described as A::B.
    explain [-script <pigscript>] [-out <path>] [-brief] [-dot] [-param <param_name>=<param_value>]
        [-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
        -script - Explain the entire script.
        -out - Store the output into directory rather than print to stdout.
        -brief - Don't expand nested plans (presenting a smaller graph for overview).
        -dot - Generate the output in .dot format. Default is text format.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        alias - Alias to explain.
    dump <alias> - Compute the alias and writes the results to stdout.
Utility Commands:
    exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment including aliases.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
        Execute the script with access to grunt environment.
        -param <param_name - See parameter substitution for details.
        -param_file <file_name> - See parameter substitution for details.
        script - Script to be executed.
    sh  <shell command> - Invoke a shell command.
    kill <job_id> - Kill the hadoop job specified by the hadoop job id.
    set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
        The following keys are supported:
        default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
        debug - Set debug on or off. Default is off.
        job.name - Single-quoted name for jobs. Default is PigLatin:<script name>
        job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
        stream.skippath - String that contains the path. This is used by streaming.
        any hadoop property.
    help - Display this message.
    quit - Quit the grunt shell.
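A couple of the utility commands above in action. The keys default_parallel and job.name come straight from the help text; the values and the script name myscript.pig are made-up examples:

```pig
grunt> set default_parallel 10
grunt> set job.name 'max-temp-job'
grunt> exec myscript.pig
```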

    

    Browsing files (ls, cd, cat)

    grunt> ls

hdfs://192.168.154.100:9000/user/hadoop/in    <dir>
hdfs://192.168.154.100:9000/user/hadoop/out    <dir>

    grunt> cd in

grunt> ls
hdfs://192.168.154.100:9000/user/hadoop/in/test1.txt<r 1>    12
hdfs://192.168.154.100:9000/user/hadoop/in/test2.txt<r 1>    13
hdfs://192.168.154.100:9000/user/hadoop/in/test_1.txt<r 1>    328
hdfs://192.168.154.100:9000/user/hadoop/in/test_2.txt<r 1>    139
grunt> cat test1.txt
hello world
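The same browsing can also be done through the fs command listed in the help output, which forwards its arguments to the Hadoop dfs shell (a sketch, assuming the same HDFS layout as above):

```pig
grunt> fs -ls
grunt> fs -cat test1.txt
```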

    

    Copying to the local file system (copyToLocal)

    grunt> ls

hdfs://192.168.154.100:9000/user/hadoop/in/test1.txt<r 1>    12
hdfs://192.168.154.100:9000/user/hadoop/in/test2.txt<r 1>    13
hdfs://192.168.154.100:9000/user/hadoop/in/test_1.txt<r 1>    328
hdfs://192.168.154.100:9000/user/hadoop/in/test_2.txt<r 1>    139
grunt> copyToLocal test1.txt ttt

    

    [root@master pig]# ls -l ttt

-rwxrwxrwx. 1 hadoop hadoop 12  4月 13 23:06 ttt
[root@master pig]#

    Running operating-system commands: sh

    grunt> sh jps          

2098 DataNode
1986 NameNode
2700 Jps
2539 RunJar
2297 JobTracker
2211 SecondaryNameNode
2411 TaskTracker
grunt> 

    

    The Pig data model

    Bag: table

    Tuple: row (record)

    Field: attribute (column)

    Pig does not require the tuples in a bag to have the same number of fields or fields of the same type.
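In Pig's notation a bag is written with braces and a tuple with parentheses; the bag below holds two tuples with different field counts, which is legal (illustrative values only):

```pig
{ (1949, 111, 1), (1950, 22) }
```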

    

    Commonly used Pig Latin statements

    LOAD: specifies how data is loaded

    FOREACH: scans row by row and applies some processing

    FILTER: filters out rows

    DUMP: prints the result to the screen

    STORE: saves the result to a file

    Pig script example:

    

    grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'

    >> AS (year:chararray, temperature:int, quality:int);

    (1949,111)

    (1950,22)

    This successfully computes the highest temperature for each year.
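The intermediate grouping step does not survive in the post above. In the well-known NCDC max-temperature example, the output shown is typically produced by grouping on year and taking the maximum, roughly as follows (a hedged reconstruction assuming the schema field is named temperature, not the author's exact code):

```pig
grunt> grouped_records = GROUP records BY year;
grunt> max_temp = FOREACH grouped_records GENERATE group,
>> MAX(records.temperature);
grunt> DUMP max_temp;
```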

    

    

    Example 2:
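The body of this second example is missing from the original. Purely as a hypothetical sketch combining the FILTER and STORE statements listed earlier (the quality test and output path are invented):

```pig
grunt> records = LOAD 'input/ncdc/micro-tab/sample.txt'
>> AS (year:chararray, temperature:int, quality:int);
grunt> good_records = FILTER records BY temperature != 9999 AND quality == 0;
grunt> STORE good_records INTO 'output/good-records';
```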

    

    


Reprinted from: http://twhzo.baihongyu.com/
