Awk Basics [6]: Additional Awk Commands 2


    4、Generic String Functions


    Index Function

    The index function returns the location (index) of the first occurrence of a given string (or character) within an input string. Locations start at 1.

    You can also use index to check whether a given string (or character) is present in an input string: if the string is not found, index returns 0, as shown below.

    $ cat index.awk
    BEGIN {
        state="CA is California"
        print "String CA starts at location",index(state,"CA");
        print "String Cali starts at location",index(state,"Cali");
        if (index(state,"NY")==0)
            print "String NY is not found in:", state
    }
    
    $ awk -f index.awk
    String CA starts at location 1
    String Cali starts at location 7
    String NY is not found in: CA is California
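    As a small sketch of the membership check described above, the shell function below wraps index in a BEGIN-only awk program and turns its return value into an exit status, so it can be used directly in an if statement. The function name contains is hypothetical (it is not part of awk), and the sketch assumes a POSIX shell with an awk that supports the -v option.

```shell
# Hypothetical helper, not part of awk: succeeds (exit status 0) when the
# second argument occurs in the first, using the return value of index().
contains() {
  # index() returns 0 when the substring is absent, so "exit !index(...)"
  # maps any non-zero location to status 0 and "not found" to status 1.
  awk -v str="$1" -v needle="$2" 'BEGIN { exit !index(str, needle) }'
}

contains "CA is California" "Cali" && echo "Cali found"
contains "CA is California" "NY" || echo "NY not found"
```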

    Length Function

    The length function returns the length of a string. In the following example, we print the total number of characters in each record of the items.txt file.

    $ awk '{print length($0)}' items.txt
    29
    32
    27
    31
    30
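    The value returned by length is often used as a pattern on its own. The sketch below prints only the records longer than a given number of characters; the helper name longer_than is hypothetical, and the sketch assumes a POSIX shell.

```shell
# Hypothetical helper: print only the records longer than LIMIT characters.
# Usage: longer_than LIMIT [FILE...]   (reads stdin when no FILE is given)
longer_than() {
  limit=$1
  shift
  # length($0) is the number of characters in the current record; a pattern
  # with no action prints every record for which it is true.
  awk -v max="$limit" 'length($0) > max' "$@"
}

printf '%s\n' "short" "a much longer line" | longer_than 10
```

    With the items.txt file used above, longer_than 30 items.txt should print the 102 and 104 records, whose lengths are 32 and 31.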

    Split Function

    Syntax:

    split(input-string,output-array,separator)

    The split function splits a string into individual array elements and returns the number of elements it created. It takes the following three arguments:
    • input-string: This is the input string that needs to be split into multiple strings.
    • output-array: This array will contain the split strings as individual elements.
    • separator: The separator used to split the input-string. It can also be a regular expression; if it is omitted, the current value of FS is used.

    For this example, the original items-sold.txt file is slightly modified (and saved as items-sold1.txt) to use different delimiters: a colon separates the item number from the quantities sold, and the individual quantities are separated by commas.

    To calculate the total number of items sold for each item, we take the 2nd field (all the quantities, delimited by commas), split it on the comma separator into an array, and then loop through the array to add up the quantities.

    $ cat items-sold1.txt
    101:2,10,5,8,10,12
    102:0,1,4,3,0,2
    103:10,6,11,20,5,13
    104:2,3,4,0,6,5
    105:10,2,5,7,12,6
    
    
    $ cat split.awk
    BEGIN {
        FS=":"
    }
    {
        split($2,quantity,",");
        total=0;
        for (x in quantity)
            total=total+quantity[x];
        print "Item", $1, ":", total, "quantities sold";
    }

    $ awk -f split.awk items-sold1.txt
    Item 101 : 47 quantities sold
    Item 102 : 10 quantities sold
    Item 103 : 65 quantities sold
    Item 104 : 20 quantities sold
    Item 105 : 42 quantities sold
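    The loop above uses for (x in quantity), which visits array elements in an unspecified order; that is fine for a sum. Because split also returns the number of elements it created, a numeric loop can visit them in their original order instead. A minimal sketch, assuming input lines shaped like items-sold1.txt; the helper name sum_quantities is hypothetical.

```shell
# Hypothetical helper: for lines like "item:q1,q2,...", print the item,
# how many quantities it has, and their total.
sum_quantities() {
  awk -F':' '{
    n = split($2, qty, ",")     # split() returns the number of elements
    total = 0
    for (i = 1; i <= n; i++)    # elements are stored at indexes 1..n
      total += qty[i]
    print $1, n, total
  }' "$@"
}

printf '%s\n' "101:2,10,5,8,10,12" "102:0,1,4,3,0,2" | sum_quantities
# 101 6 47
# 102 6 10
```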

    Substr Function

    Syntax:

    substr(input-string, location, length)

    The substr function extracts a portion of a given string. In the above syntax:
    • input-string: The input string containing the substring.
    • location: The starting location of the substring.
    • length: The total number of characters to extract from the starting location. This parameter is optional; when it is omitted, substr extracts the rest of the string from the starting location.

    The following example starts at the 1st character of the 2nd field and prints 5 characters:

    $ awk -F"," '{print substr($2,1,5)}' items.txt
    HD Ca
    Refri
    MP3 P
    Tenni
    Laser
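    When the length argument is omitted, substr returns everything from the starting location to the end of the string. The sketch below combines that with index and length to print the text that follows a marker; the helper name after_marker is hypothetical, and the sketch assumes a POSIX shell.

```shell
# Hypothetical helper: print the part of a string that follows the first
# occurrence of a marker, combining index(), length() and substr().
after_marker() {
  awk -v str="$1" -v mark="$2" 'BEGIN {
    pos = index(str, mark)
    # No third argument: substr() returns everything to the end of str.
    if (pos > 0)
      print substr(str, pos + length(mark))
  }'
}

after_marker "CA is California" "is "    # California
```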

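    5、GAWK/NAWK String Functions


    These string functions are not available in the original (old) awk; they are available in GAWK and NAWK. They are also part of the POSIX awk specification, so most modern awk implementations provide them.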

    Sub Function

    Syntax:

    sub(original-string,replacement-string,string-variable)

    • sub stands for substitution.
    • original-string: The text that needs to be replaced. It is treated as a regular expression, so it can be either a literal string or a pattern.
    • replacement-string: This is the replacement string.
    • string-variable: This acts as both the input and the output variable. Be careful with it: after a successful substitution, the original value of string-variable is lost.

    In the following example:

    • original-string: This is the regular expression C[Aa], which matches either "CA" or "Ca"
    • replacement-string: When the original-string is found, replace it with "KA"
    • string-variable: Before executing the sub, the variable contains the input string. Once the replacement is done, the variable contains the output string.

    Please note that sub replaces only the 1st occurrence of the match.

    $ cat sub.awk
    BEGIN {
        state="CA is California"
        sub("C[Aa]","KA",state);
        print state;
    }
    
    
    $ awk -f sub.awk
    KA is California
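    Because sub overwrites its third argument, a common pattern is to copy the value first and substitute on the copy, so the original stays available. A minimal sketch of that idea; the helper name show_sub is hypothetical, and the sketch assumes a POSIX shell.

```shell
# Hypothetical helper: show a string before and after sub(), working on a
# copy because sub() overwrites its third argument in place.
show_sub() {
  awk -v str="$1" -v re="$2" -v rep="$3" 'BEGIN {
    result = str               # copy, so str keeps its original value
    sub(re, rep, result)       # replaces only the first match
    print str " -> " result
  }'
}

show_sub "CA is California" "C[Aa]" "KA"
# CA is California -> KA is California
```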

    The 3rd parameter, string-variable, is optional. When it is not specified, awk uses $0 (the current record), as shown below. This example replaces the first occurrence of "10" in each record with "20". Because every record starts with "10", the item number 101 becomes 201, 102 becomes 202, and so on.

    $ awk '{ sub("10","20"); print $0; }' items.txt
    201,HD Camcorder,Video,210,10
    202,Refrigerator,Appliance,850,2
    203,MP3 Player,Audio,270,15
    204,Tennis Racket,Sports,190,20
    205,Laser Printer,Office,475,5

    When a successful substitution happens, the sub function returns 1, otherwise it returns 0.

    Print the record only when a successful substitution occurs:

    $ awk '{ if (sub("HD","High-Def")) print $0; }'  items.txt
    101,High-Def Camcorder,Video,210,10
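    Since the return value is 1 or 0, sub can also be used directly as a pattern: the record is changed first, and the default action then prints the modified record. The helper replace_and_keep below is a hypothetical sketch of that idea, assuming a POSIX shell.

```shell
# Hypothetical helper: apply sub() to every record and print only the
# records that were actually changed.
# Usage: replace_and_keep REGEX REPLACEMENT [FILE...]
replace_and_keep() {
  re=$1
  rep=$2
  shift 2
  # sub() returns 1 after a successful substitution and 0 otherwise, so it
  # can serve as the pattern; only modified records are printed.
  awk -v re="$re" -v rep="$rep" 'sub(re, rep)' "$@"
}

printf '%s\n' "101,HD Camcorder" "102,Refrigerator" | replace_and_keep "HD" "High-Def"
```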

    Gsub Function

    gsub stands for global substitution. gsub is exactly the same as sub, except that all occurrences of original-string are changed to replacement-string.

    In the following example, both "CA" and "Ca" are changed to "KA":

    $ cat gsub.awk
    BEGIN {
        state="CA is California"
        gsub("C[Aa]","KA",state);
        print state;
    }
    
    $ awk -f gsub.awk
    KA is KAlifornia

    As with sub, the 3rd parameter is optional; when it is not specified, gsub operates on $0.
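    Where sub can only return 0 or 1, gsub returns the number of substitutions it made, which makes it handy for counting matches. A minimal sketch; the helper name count_matches is hypothetical, and since the string is a copy passed in with -v, replacing the matches with "" does not affect anything outside awk.

```shell
# Hypothetical helper: count how many times a regular expression matches
# in a string, using the return value of gsub().
count_matches() {
  awk -v str="$1" -v re="$2" 'BEGIN { print gsub(re, "", str) }'
}

count_matches "CA is California" "C[Aa]"    # 2
count_matches "banana" "a"                  # 3
```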

    Match Function and the RSTART, RLENGTH Variables

    The match function searches the input-string for a given string (or regular expression). When a match occurs, it returns the location where the match starts (a positive value); otherwise it returns 0.

    Syntax:

    match(input-string,search-string)

    • input-string: This is the input-string that needs to be searched.
    • search-string: The string to search for in the input-string. This can also be a regular expression.

    The following example searches for the string "Cali" in the state variable. If it is present, the example prints a success message.

    $ cat match.awk
    BEGIN {
        state="CA is California"
        if (match(state,"Cali")) {
            print substr(state,RSTART,RLENGTH),"is present in:", state;
        }
    }
    
    
    $ awk -f match.awk
    Cali is present in: CA is California

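    The match function sets the following two special variables. The example above uses them in the substr function call to print the matched text in the success message.
    • RSTART - The starting location of the matched text (0 if there is no match).
    • RLENGTH - The length of the matched text (-1 if there is no match).

    When the search-string contains no regular-expression metacharacters, match(string1, subStr) returns the same value as index(string1, subStr).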
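    With a regular expression, RLENGTH is the length of the text that actually matched, which can differ from the length of the pattern. The sketch below uses that to extract the first run of digits from a string; the helper name first_number is hypothetical, and the sketch assumes a POSIX shell.

```shell
# Hypothetical helper: print the first run of digits in a string, using
# RSTART and RLENGTH as set by match().
first_number() {
  awk -v str="$1" 'BEGIN {
    if (match(str, /[0-9]+/))
      print substr(str, RSTART, RLENGTH)
  }'
}

first_number "Item 103 : 65 quantities sold"    # 103
```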

    6、GAWK String Functions


    tolower and toupper are available in GAWK; they are also defined by POSIX, so most other modern awk implementations support them too. As the names suggest, these functions convert the given string to lower case or upper case, as shown below.

    $ awk '{print tolower($0)}' items.txt
    101,hd camcorder,video,210,10
    102,refrigerator,appliance,850,2
    103,mp3 player,audio,270,15
    104,tennis racket,sports,190,20
    105,laser printer,office,475,5
    
    
    $ awk '{print toupper($0)}' items.txt
    101,HD CAMCORDER,VIDEO,210,10
    102,REFRIGERATOR,APPLIANCE,850,2
    103,MP3 PLAYER,AUDIO,270,15
    104,TENNIS RACKET,SPORTS,190,20
    105,LASER PRINTER,OFFICE,475,5
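    tolower and toupper combine naturally with substr. As a small sketch, the hypothetical helper capitalize below upper-cases the first character of a word and lower-cases the rest; it assumes a POSIX shell and an awk that provides both functions.

```shell
# Hypothetical helper: capitalize the first letter of a word and lowercase
# the rest, combining toupper(), tolower() and substr().
capitalize() {
  awk -v word="$1" 'BEGIN {
    print toupper(substr(word, 1, 1)) tolower(substr(word, 2))
  }'
}

capitalize "cALIFORNIA"    # California
```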

     

     

     

    Original article: https://www.cnblogs.com/yangfengtao/p/3305638.html
Copyright © 2020-2023  润新知