Awk Basics [2]: Awk Builtin Variables

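    1. FS - Input Field Separator

    FS tells awk how to split each input record into fields. By default it is a single space, which means fields are separated by runs of spaces and tabs. You can set it either with the -F command-line option or by assigning FS in the BEGIN block. Both of the following commands print the 2nd and 3rd comma-separated fields of employee.txt:

    $ awk -F ',' '{print $2, $3}' employee.txt

    $ awk 'BEGIN {FS=","} {print $2, $3}' employee.txt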
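A minimal sketch that checks the two forms really are equivalent. The article never shows employee.txt, so the sample rows below (id,name,title,salary) are an assumption reconstructed from the outputs elsewhere on this page, and the function names are hypothetical. It assumes any POSIX awk (gawk, mawk or BusyBox awk).

```shell
# Assumed sample data: a hypothetical reconstruction of employee.txt.
sample_employees() {
    printf '%s\n' \
        '101,John Doe,CEO,10000' \
        '102,Jason Smith,IT Manager,5000' \
        '103,Raj Reddy,Sysadmin,4500' \
        '104,Anand Ram,Developer,4500' \
        '105,Jane Miller,Sales Manager,3000'
}

# Field separator given on the command line with -F.
via_option() {
    sample_employees | awk -F ',' '{print $2, $3}'
}

# Field separator assigned in BEGIN, before the first record is read.
via_begin() {
    sample_employees | awk 'BEGIN {FS=","} {print $2, $3}'
}

via_option
via_begin
```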


    $ vi employee-multiple-fs.txt
    101,John Doe:CEO%10000
    102,Jason Smith:IT Manager%5000
    103,Raj Reddy:Sysadmin%4500
    104,Anand Ram:Developer%4500
    105,Jane Miller:Sales Manager%3000

    FS can also be a regular expression, which lets you specify MULTIPLE field separators. For example, FS="[,:%]" means that a field separator can be any one of ",", ":" or "%".

    So the following example prints the name and the title from employee-multiple-fs.txt, which uses several different field separators:

    $ awk 'BEGIN {FS="[,:%]"} {print $2, $3}' \
    employee-multiple-fs.txt
    John Doe CEO
    Jason Smith IT Manager
    Raj Reddy Sysadmin
    Anand Ram Developer
    Jane Miller Sales Manager
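The sketch below applies the same regular-expression FS to two lines copied from employee-multiple-fs.txt and also prints NF, showing that each line is split into four fields. The function names are illustrative only.

```shell
# Two lines taken from employee-multiple-fs.txt.
sample_mixed() {
    printf '%s\n' \
        '101,John Doe:CEO%10000' \
        '102,Jason Smith:IT Manager%5000'
}

# FS is a regular expression: a single ",", ":" or "%" ends a field.
name_and_title() {
    awk 'BEGIN {FS="[,:%]"} {print $2, $3}'
}

# Same FS, but report how many fields each record was split into.
field_count() {
    awk 'BEGIN {FS="[,:%]"} {print NF}'
}

sample_mixed | name_and_title
sample_mixed | field_count
```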



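    2. FIELDWIDTHS and FPAT

    FIELDWIDTHS (a gawk extension) splits a record into fixed-width fields instead of using a separator: "1 2" means the first field is 1 character wide and the second is 2 characters wide. The assignment $1=$1 forces awk to rebuild $0 from its fields, joined by OFS, so only the characters covered by FIELDWIDTHS are printed:

    $ echo abcdefghigk | awk 'BEGIN{FIELDWIDTHS="1 2"} {$1=$1;print $0}'
    a bc
    $ echo abcdefghigk | awk 'BEGIN{FIELDWIDTHS="1 2 3"} {$1=$1;print $0}'
    a bc def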
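FIELDWIDTHS only works in gawk; other awks treat it as an ordinary variable and keep splitting on FS. As a portable alternative (a different technique, not FIELDWIDTHS itself), the sketch below cuts fixed-width fields with substr(). The `fixed_width` function and its `widths` variable are hypothetical names.

```shell
# Portable stand-in for gawk FIELDWIDTHS: cut each record into fixed-width
# fields with substr(). "widths" is a hypothetical variable passed with -v.
fixed_width() {
    awk -v widths="$1" '
    {
        n = split(widths, w, " ")   # "1 2 3" gives w[1]=1, w[2]=2, w[3]=3
        pos = 1
        out = ""
        for (i = 1; i <= n; i++) {
            field = substr($0, pos, w[i])
            pos += w[i]
            if (i > 1) out = out OFS    # join fields with OFS, like $1=$1
            out = out field
        }
        print out                       # characters past the last width are dropped
    }'
}

echo abcdefghigk | fixed_width "1 2"
echo abcdefghigk | fixed_width "1 2 3"
```

Like the gawk examples above, this prints `a bc` and then `a bc def`.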




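    FPAT (also gawk-only) works the other way around: instead of describing what separates the fields, it describes what a field looks like. This helps with CSV data such as:

    $ cat addresses.csv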
    Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA

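    Note that the address field ("1234 A Pretty Street, NE") contains a comma. If the input were split with FS=",", the address would be broken into two fields:

    "1234 A Pretty Street
     NE"

    With FPAT, a field is defined as either a run of non-comma characters or a double-quoted string, so the quoted address stays in one piece:

    $ cat simple-csv.awk
    BEGIN {
        FPAT = "([^,]+)|(\"[^\"]+\")"
    }

    {
        print "NF = ", NF
        for (i = 1; i <= NF; i++) {
            printf("$%d = <%s>\n", i, $i)
        }
    }
    $ gawk -f simple-csv.awk addresses.csv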
    NF =  7
    $1 = <Robbins>
    $2 = <Arnold>
    $3 = <"1234 A Pretty Street, NE">
    $4 = <MyTown>
    $5 = <MyState>
    $6 = <12345-6789>
    $7 = <USA>
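Outside gawk there is no FPAT. A rough portable sketch of the same idea walks the record with match() and substr(), taking either a double-quoted string or a run of non-comma characters as the next field. It assumes there are no escaped quotes ("") inside quoted fields, so it is not a full CSV parser; `csv_fields` is a hypothetical name.

```shell
# Portable sketch of the FPAT idea: a field is either a double-quoted string
# or a run of non-comma characters. Uses only POSIX match() and substr().
csv_fields() {
    awk '
    {
        n = 0
        s = $0
        while (1) {
            if (substr(s, 1, 1) == "\"" && match(s, /^"[^"]*"/)) {
                len = RLENGTH           # quoted field, quotes included
            } else {
                match(s, /^[^,]*/)      # plain field up to the next comma
                len = RLENGTH
            }
            f[++n] = substr(s, 1, len)
            s = substr(s, len + 1)      # drop the field itself
            if (s == "") break
            s = substr(s, 2)            # drop the comma that follows it
        }
        print "NF = ", n
        for (i = 1; i <= n; i++)
            printf("$%d = <%s>\n", i, f[i])
    }'
}

echo 'Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA' | csv_fields
```

For the sample line this produces the same seven fields as the gawk FPAT version.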



    3. OFS - Output Field Separator

    When a single print statement prints two variables separated by a comma (as shown below), awk outputs their values separated by a space, which is the default value of OFS.

    $ awk -F ',' '{print $2, $3}' employee.txt
    John Doe CEO
    Jason Smith IT Manager
    Raj Reddy Sysadmin
    Anand Ram Developer
    Jane Miller Sales Manager

    The following print statement prints two variables ($2 and $3) separated by a comma; however, the output has a colon between them instead of a space, because OFS is set to a colon.

    $ awk -F ',' 'BEGIN { OFS=":" } \
    { print $2, $3 }' employee.txt
    John Doe:CEO
    Jason Smith:IT Manager
    Raj Reddy:Sysadmin
    Anand Ram:Developer
    Jane Miller:Sales Manager

    When you specify a comma in the print statement between different print values, awk will use the OFS. In the following example, the default OFS is used, so you'll see a space between the values in the output.

    $ awk 'BEGIN { print "test1","test2" }'
    test1 test2

    When you don't separate values with a comma in the print statement, awk does not use OFS; it concatenates the values with nothing in between.

    $ awk 'BEGIN { print "test1" "test2" }'
    test1test2
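A small sketch contrasting the two print forms with OFS set to a colon. It also shows a related standard trick: assigning a field (`$1 = $1`) makes awk rebuild $0 with the current OFS, which converts the separators of a whole record. The sample row and the function names are assumptions for illustration.

```shell
# OFS is inserted only where print has a comma between its arguments.
ofs_demo() {
    awk 'BEGIN {
        OFS = ":"
        print "test1", "test2"     # comma: joined with OFS
        print "test1" "test2"      # concatenation: no separator
    }'
}

# Assigning any field (here $1 = $1) makes awk rebuild $0 using OFS.
csv_to_colon() {
    awk -F ',' 'BEGIN { OFS=":" } { $1 = $1; print }'
}

ofs_demo
echo '101,John Doe,CEO,10000' | csv_to_colon
```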

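    4. RS - Record Separator

    By default, awk reads one line at a time as a record, because RS is a newline. The following file stores all five employees on a single line, with the records separated by colons:

    $ vi employee-one-line.txt
    101,John Doe:102,Jason Smith:103,Raj Reddy:104,Anand Ram:105,Jane Miller

    Reading it with only a comma as the field separator prints a single value:

    $ awk -F, '{print $2}' employee-one-line.txt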
    John Doe:102

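    This is because awk treats the whole line as one record, and with a comma as the field separator, the second field is "John Doe:102". To process the line as 5 separate records, you need to specify the record separator explicitly: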

    $ awk -F, 'BEGIN { RS=":" } \
    { print $2 }' employee-one-line.txt
    John Doe
    Jason Smith
    Raj Reddy
    Anand Ram
    Jane Miller
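A minimal sketch of the same RS technique that prints NR next to each name to show that awk now sees five records. It feeds the data with printf and no trailing newline because, with RS=":", a final newline would stay inside the last record (so its $2 would end in a newline). The function name is hypothetical.

```shell
# Records are separated by ":" instead of newlines; fields by ",".
one_line_records() {
    awk -F ',' 'BEGIN { RS=":" } { print NR, $2 }'
}

# printf (not echo) so that no trailing newline ends up in the last record.
printf '%s' '101,John Doe:102,Jason Smith:103,Raj Reddy:104,Anand Ram:105,Jane Miller' |
    one_line_records
```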

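    5. ORS - Output Record Separator

    ORS is written after the output of every print statement; its default is a newline. In the following example, each record is followed by a line containing "---":

    $ awk 'BEGIN { FS=","; ORS="\n---\n" } \
    {print $2, $3}' employee.txt
    John Doe CEO
    ---
    Jason Smith IT Manager
    ---
    Raj Reddy Sysadmin
    ---
    Anand Ram Developer
    ---
    Jane Miller Sales Manager
    ---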
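To show that ORS replaces the newline rather than adding to it, the sketch below sets ORS to "|", so all names end up on one line. The sample rows and the function name are assumptions.

```shell
# ORS is written after every print; here each record ends with "|" only.
join_names() {
    awk -F ',' 'BEGIN { ORS="|" } { print $2 }'
}

printf '%s\n' '101,John Doe,CEO,10000' '102,Jason Smith,IT Manager,5000' | join_names
echo   # finish the line, since ORS added no newline
```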

    6. NR - Number of Records

    NR is very helpful. Inside the body block, it gives the number of the current record (the line number, when RS is the default newline). In the END block, it gives the total number of records read.

    The following example shows how NR works in the body block and in the END block:

    $ awk 'BEGIN {FS=","} \
    {print "Emp Id of record number",NR,"is",$1;} \
    END {print "Total number of records:",NR}' employee.txt
    Emp Id of record number 1 is 101
    Emp Id of record number 2 is 102
    Emp Id of record number 3 is 103
    Emp Id of record number 4 is 104
    Emp Id of record number 5 is 105
    Total number of records: 5
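A short sketch of both uses of NR: numbering records in the body block and reporting the total in END. The sample rows are an assumed subset of employee.txt, and the function names are illustrative.

```shell
# Assumed subset of employee.txt.
sample_rows() {
    printf '%s\n' '101,John Doe,CEO,10000' '102,Jason Smith,IT Manager,5000' '103,Raj Reddy,Sysadmin,4500'
}

# NR in the body = number of the current record; in END = records read in total.
number_records() {
    awk -F ',' '{ print NR ": " $2 } END { print "Total:", NR }'
}

sample_rows | number_records
```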

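    7. FILENAME - Current File Name

    FILENAME is helpful when you pass multiple input files to an awk program: it holds the name of the file awk is currently processing.

    $ awk '{ print FILENAME }' \
    employee.txt employee-multiple-fs.txt
    employee.txt
    employee.txt
    employee.txt
    employee.txt
    employee.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt
    employee-multiple-fs.txt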
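The sketch below creates two throwaway files with mktemp and tags each record with FILENAME, using sub() to strip the directory part. The file names and the `show_sources` function are hypothetical.

```shell
# Two throwaway input files in a temporary directory (hypothetical names).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '101,John Doe,CEO,10000' '102,Jason Smith,IT Manager,5000' > "$dir/a.txt"
printf '%s\n' '103,Raj Reddy:Sysadmin%4500' > "$dir/b.txt"

# Tag every record with the file it came from, without the directory part.
show_sources() {
    awk '{ name = FILENAME; sub(/.*\//, "", name); print name ": " $0 }' "$@"
}

show_sources "$dir/a.txt" "$dir/b.txt"
```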

    8. FNR - File "Number of Record"

    NR keeps growing across multiple files. When the body block starts processing the 2nd file, NR is not reset to 1; it continues from the last NR value of the previous file.

    FNR gives the record number within the current file. So when awk finishes executing the body block for the 1st file and starts it for the next file, FNR starts from 1 again.

    The following example shows both NR and FNR:

    $ vi fnr.awk
    {
        printf "FILENAME=%s NR=%s FNR=%s\n", FILENAME, NR, FNR
    }
    END {
        printf "END Block: NR=%s FNR=%s\n", NR, FNR
    }
    $ awk -f fnr.awk employee.txt employee-multiple-fs.txt
    FILENAME=employee.txt NR=1 FNR=1
    FILENAME=employee.txt NR=2 FNR=2
    FILENAME=employee.txt NR=3 FNR=3
    FILENAME=employee.txt NR=4 FNR=4
    FILENAME=employee.txt NR=5 FNR=5
    FILENAME=employee-multiple-fs.txt NR=6 FNR=1
    FILENAME=employee-multiple-fs.txt NR=7 FNR=2
    FILENAME=employee-multiple-fs.txt NR=8 FNR=3
    FILENAME=employee-multiple-fs.txt NR=9 FNR=4
    FILENAME=employee-multiple-fs.txt NR=10 FNR=5
    END Block: NR=10 FNR=5
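A common idiom built on this difference is NR == FNR, which is true only while awk reads the first file. The sketch below uses it to load an id-to-title map from one hypothetical file and look titles up while reading a second one. It assumes the first file is not empty; if it were, NR == FNR would also hold while reading the second file.

```shell
# Two hypothetical input files: id,title and id,name.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '101,CEO' '102,IT Manager' > "$dir/titles.txt"
printf '%s\n' '101,John Doe' '102,Jason Smith' > "$dir/names.txt"

# NR == FNR only while reading the first file: remember id -> title there.
# For the second file, look the title up by id.
join_titles() {
    awk -F ',' 'NR == FNR { title[$1] = $2; next } { print $2, "-", title[$1] }' "$@"
}

join_titles "$dir/titles.txt" "$dir/names.txt"
```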
Original article: https://www.cnblogs.com/yangfengtao/p/3124100.html