• Hive内置字符串函数及其示例


    来源:https://sparkbyexamples.com/apache-hive/hive-built-in-string-functions-with-examples/

    Hive supports several built-in string functions similar to SQL functions to manipulate the strings. These Hive string functions come in handy when you are doing transformations without bringing data into Spark and using String functions or any equivalent frameworks.

     

    In this article let’s learn the most used String Functions syntax, usage, description along with examples.

    Hive String Functions List

    With every new version, Hive has been releasing new String functions to work with Query Language (HiveQL), you can use these built-in functions on Hive Beeline CLI Interface or on HQL queries using different languages and frameworks.

    When you need to perform any string manipulations, we often think to write our own code (natural tendency of developers)/UDF however, I would recommend do some research to find out existing functions before you write to UDF as UDF leads to lot of performance issues.

    Most of these function returns String data type, functions that return other types I will call it explicitly.

     
    STRING FUNCTIONUSAGE & DESCRIPTION
    ascii(string str) Returns ASCII numeric value of the first character of the input argument.
    Return: int
    base64(binary bin) Used to convert the binary input argument to base64 string.
    character_length(string str) Returns the number of UTF-8 characters contained in an input string, You can also use alternative char_length(string str).
    Return: int

    concat(string|binary A, string|binary B…)
    Returns the string or bytes after concatenating all strings or bytes passed in as input.
    concat_ws(string SEP, string A, string B…) Similar to concat() additionally takes separator SEP. You should use this if you wanted to concatenate the input strings with a delimiter.
    decode(binary bin, string charset) Decodes the binary into a string using the charset provided in the second argument.
    encode(string src, string charset) Encodes the string into binary type using the character set provided in the second argument.
    Return: binary
    format_number(number x, int d) Returns the formatted number like ‘#,###,###.##’
    instr(string str, string substr) It looks for the first occurrence of substr in str and returns the position. It is based on Index 1 no zero.
    Return: int or null when one of the input is null
    length(string A) Returns the length of the string.
    Return: int
    locate(string substr, string str[, int pos]) Returns the position of the first occurrence of substr in str after position pos.
    Return: int
    lower(string A) lcase(string A) Converts the input string into lower case.
    lpad(string str, int len, string pad) Returns string after appending string at the left by the length specified. In case of empty pad string, the return value is null.
    ltrim(string A) Returns a string after trimming all spaces at the left side of the input string
    printf(String format, Obj… args) Returns a formatted input string.
    quote(String text) Returns the quoted string (Includes escape character for any single quotes HIVE-4.0.0)
    regregexp_extract(string subject, string pattern, int index) It extracts the string matching with regular expression.
    regexp_replace() regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)
    repeat(string str, int n) returns a String after repeating n number of times.
    replace(string A, string OLD, string NEW) Replace all occurrences of the old string with the new string on input string A.
    reverse(string A) Reverses the input string.
    rpad(string str, int len, string pad) Right padding the characters.
    rtrim(string A) Returns a string after trimming all spaces at the right side of the input string.
    space(int n) Returns a string with n spaces.
    split(string str, string pat) Split the input string str by the regular pattern specified.
    substr(string|binary A, int start) substring(string|binary A, int start) Extract the substring from the start position.
    trim(string A) trimming spaces from left and right.
    upper(string A) ucase(string A) Converts the input string into upper case.
    initcap(string A) Capitalize the first letter of each word of an input string sentence.
    soundex() Returns Soundex code of the input string.
    levenshtein(string A, string B) Returns the Levenshtein distance between two strings

    Hive String Functions Examples

    Below are some of the Hive most using String functions with examples. Hope these are useful.

    ascii() function converts char to ascii value

    
    0: jdbc:hive2://> select ascii("ABC"); ==> Returns 65
    

    character_length() returns length of the string

    
    jdbc:hive2://> select character_length("ABC"); ==> Returns 3
    
    

    concat() function concatenates the multiple strings

    
    jdbc:hive2://> select concat("ABC","DEF"); ==> Returns ABCDEF
    

    concat_ws() function concatenates string with delimiter separator

    
    jdbc:hive2://> select concat_ws("|","1","2","3"); ==> Returns 1|2|3
    

    format_number() function formats integer with comma separator

    
    jdbc:hive2://> select format_number(1234567,3); ==> Returns 1,234,567.000
    jdbc:hive2://> select format_number(1234567,0); ==> Returns 1,234,567
    jdbc:hive2://> select format_number(1234567.23456,3); ==>  1,234,567.235
    

    instr() searches for a string in a input string and returns the index

    
    jdbc:hive2://> select instr("usa is a land of opportunity","is"); ==> Returns 5
    

    length() function returns the length of the string

    jdbc:hive2://> select length("123456789"); ==> Returns 9
    

    locate() Example

    similar to instr function but returns the string position found after the specified index

    
    jdbc:hive2://> select locate("is","usa is a usa is a"); ==> Returns 5
    jdbc:hive2://> select locate("is","usa is a usa is a",6); ==> Returns 14
    

    lower() returns the string in lower case

    lcase() similar to lower() that returns the string in lower case

    ltrim() left trim, usually used to trim the spaces on left side of the string.

    
    jdbc:hive2://> select lower("UNITEDSTATES"); ==> unitedstates
    jdbc:hive2://> select lcase("UNITEDSTATES"); ==> unitedstates
    jdbc:hive2://> select ltrim("    UNITEDSTATES"); ==> UNITEDSTATES
    

    printf() returns the formatted string

    To format the output string, sse %s for string, %d for integer number and %f for float.

    
    jdbc:hive2://> select printf('%d UNITED',45); ==> Returns 45 UNITED
    jdbc:hive2://> select printf('%S UNITED %d','USA',45); ==> USA UNITED 45 
    

    regregexp_extract() Example

    
    

    regexp_replace() Example

    
    

    repeat() the input string n number of times

    
    jdbc:hive2://> select repeat("USA",5); ==> Returns USAUSAUSAUSAUSA
    

    replace() all occurances of a string with a new string

    
    jdbc:hive2://> select replace("ABC CDF ABC CDF","CDF",'XYZ'); Returns ABC XYZ ABC XYZ
    

    reverse() function reverse the string

    
    jdbc:hive2://> select reverse("ABCDEF"); ==> Returns FEDCBA
    

    rpad() right pad is used to add the spaces or characters to input string

    If input string is longer than length, the return value is shortened to specified len characters

    
    jdbc:hive2://> select rpad("UNITED",10,'0'); ==> Returns UNITED0000
    jdbc:hive2://> select rpad("UNITED",10,' '); ==> Returns 'UNITED    ' // with out quotes
    jdbc:hive2://> select rpad("UNITEDSTATES",10,'0'); ==> Returns UNITEDSTAT
    jdbc:hive2://> select rpad("UNITEDSTATES",10,null); ==> Returns  NULL
    

    rtrim() eliminates the spaces from the right (right trim ) side of the string. For example, trim(‘ bar ‘) results in ‘ bar’

    space() returns a string with the specified number of spaces.

    
    0: jdbc:hive2://> select space(10); ==> Returns '          '
    

    split() the string into an array based on delimiter separator (comma, pipe, space e.t.c)

    Below example splits using space delimiter.

    
    jdbc:hive2://> select split("USA IS A PLACE"," ");
    Returns: ["USA","IS","A","PLACE"]
    

    substr() | substring() is used to get the part of the string from the starting index and n number of characters.

    
    jdbc:hive2://> select substr("USA IS A PLACE",5,2); ==> Returns IS
    jdbc:hive2://> select substr("USA IS A PLACE",5,100); ==> Returns IS A PLACE
    

    trim() – trimming spaces from both ends of a string. For example, trim(‘ bar ‘) results in ‘bar’

    This is a result of using ltrim and rtrim functions.

    upper() | ucase() changes the string to upper case or all caps.

    
    jdbc:hive2://> select upper("unitedstates"); ==> Returns UNITEDSTATES
    jdbc:hive2://> select ucase("unitedstates"); ==> Returns UNITEDSTATES
    

    initcap() returns input string after converting first letter of a word to upper/capital case

    
    jdbc:hive2://> select initcap("USA IS A PLACE");
    Returns: Usa Is A Place
    

    soundex() function returns soundex of the string

    
    jdbc:hive2://> select soundex("PLACE"); ==> Returns P420
    jdbc:hive2://> select soundex("UNITED"); ==> Returns U533
    

    You May Like Related Articles

  • 相关阅读:
    mssql锁
    gridview 分页兼容BOOTSTRAP
    BOOTSTRAP前端模板
    bootstrap 简单模板
    ajax 跨域访问的解决方案
    webapi之权限验证
    webapi权限常见错误
    ajax跨域解决方案
    iis 部署webapi常见错误及解决方案
    OOM AutoMapper的简单实用
  • 原文地址:https://www.cnblogs.com/144823836yj/p/14755419.html
Copyright © 2020-2023  润新知