来源:https://sparkbyexamples.com/apache-hive/hive-built-in-string-functions-with-examples/
Hive supports several built-in string functions similar to SQL functions to manipulate the strings. These Hive string functions come in handy when you are doing transformations without bringing data into Spark and using String functions or any equivalent frameworks.
In this article let’s learn the most used String Functions syntax, usage, description along with examples.
Hive String Functions List
With every new version, Hive has been releasing new String functions to work with Query Language (HiveQL), you can use these built-in functions on Hive Beeline CLI Interface or on HQL queries using different languages and frameworks.
When you need to perform any string manipulations, we often think to write our own code (natural tendency of developers)/UDF however, I would recommend do some research to find out existing functions before you write to UDF as UDF leads to lot of performance issues.
Most of these function returns String data type, functions that return other types I will call it explicitly.
STRING FUNCTION | USAGE & DESCRIPTION |
---|---|
ascii(string str) | Returns ASCII numeric value of the first character of the input argument. Return: int |
base64(binary bin) | Used to convert the binary input argument to base64 string. |
character_length(string str) | Returns the number of UTF-8 characters contained in an input string, You can also use alternative char_length(string str). Return: int |
concat(string|binary A, string|binary B…) |
Returns the string or bytes after concatenating all strings or bytes passed in as input. |
concat_ws(string SEP, string A, string B…) | Similar to concat() additionally takes separator SEP. You should use this if you wanted to concatenate the input strings with a delimiter. |
decode(binary bin, string charset) | Decodes the binary into a string using the charset provided in the second argument. |
encode(string src, string charset) | Encodes the string into binary type using the character set provided in the second argument. Return: binary |
format_number(number x, int d) | Returns the formatted number like ‘#,###,###.##’ |
instr(string str, string substr) | It looks for the first occurrence of substr in str and returns the position. It is based on Index 1 no zero.Return: int or null when one of the input is null |
length(string A) | Returns the length of the string. Return: int |
locate(string substr, string str[, int pos]) | Returns the position of the first occurrence of substr in str after position pos. Return: int |
lower(string A) lcase(string A) | Converts the input string into lower case. |
lpad(string str, int len, string pad) | Returns string after appending string at the left by the length specified. In case of empty pad string, the return value is null. |
ltrim(string A) | Returns a string after trimming all spaces at the left side of the input string |
printf(String format, Obj… args) | Returns a formatted input string. |
quote(String text) | Returns the quoted string (Includes escape character for any single quotes HIVE-4.0.0) |
regregexp_extract(string subject, string pattern, int index) | It extracts the string matching with regular expression. |
regexp_replace() | regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT) |
repeat(string str, int n) | returns a String after repeating n number of times. |
replace(string A, string OLD, string NEW) | Replace all occurrences of the old string with the new string on input string A. |
reverse(string A) | Reverses the input string. |
rpad(string str, int len, string pad) | Right padding the characters. |
rtrim(string A) | Returns a string after trimming all spaces at the right side of the input string. |
space(int n) | Returns a string with n spaces. |
split(string str, string pat) | Split the input string str by the regular pattern specified. |
substr(string|binary A, int start) substring(string|binary A, int start) | Extract the substring from the start position. |
trim(string A) | trimming spaces from left and right. |
upper(string A) ucase(string A) | Converts the input string into upper case. |
initcap(string A) | Capitalize the first letter of each word of an input string sentence. |
soundex() | Returns Soundex code of the input string. |
levenshtein(string A, string B) | Returns the Levenshtein distance between two strings |
Hive String Functions Examples
Below are some of the Hive most using String functions with examples. Hope these are useful.
ascii() function converts char to ascii value
0: jdbc:hive2://> select ascii("ABC"); ==> Returns 65
character_length() returns length of the string
jdbc:hive2://> select character_length("ABC"); ==> Returns 3
concat() function concatenates the multiple strings
jdbc:hive2://> select concat("ABC","DEF"); ==> Returns ABCDEF
concat_ws() function concatenates string with delimiter separator
jdbc:hive2://> select concat_ws("|","1","2","3"); ==> Returns 1|2|3
format_number() function formats integer with comma separator
jdbc:hive2://> select format_number(1234567,3); ==> Returns 1,234,567.000
jdbc:hive2://> select format_number(1234567,0); ==> Returns 1,234,567
jdbc:hive2://> select format_number(1234567.23456,3); ==> 1,234,567.235
instr() searches for a string in a input string and returns the index
jdbc:hive2://> select instr("usa is a land of opportunity","is"); ==> Returns 5
length() function returns the length of the string
jdbc:hive2://> select length("123456789"); ==> Returns 9
locate() Example
similar to instr function but returns the string position found after the specified index
jdbc:hive2://> select locate("is","usa is a usa is a"); ==> Returns 5
jdbc:hive2://> select locate("is","usa is a usa is a",6); ==> Returns 14
lower() returns the string in lower case
lcase() similar to lower() that returns the string in lower case
ltrim() left trim, usually used to trim the spaces on left side of the string.
jdbc:hive2://> select lower("UNITEDSTATES"); ==> unitedstates
jdbc:hive2://> select lcase("UNITEDSTATES"); ==> unitedstates
jdbc:hive2://> select ltrim(" UNITEDSTATES"); ==> UNITEDSTATES
printf() returns the formatted string
To format the output string, sse %s for string, %d for integer number and %f for float.
jdbc:hive2://> select printf('%d UNITED',45); ==> Returns 45 UNITED
jdbc:hive2://> select printf('%S UNITED %d','USA',45); ==> USA UNITED 45
regregexp_extract() Example
regexp_replace() Example
repeat() the input string n number of times
jdbc:hive2://> select repeat("USA",5); ==> Returns USAUSAUSAUSAUSA
replace() all occurances of a string with a new string
jdbc:hive2://> select replace("ABC CDF ABC CDF","CDF",'XYZ'); Returns ABC XYZ ABC XYZ
reverse() function reverse the string
jdbc:hive2://> select reverse("ABCDEF"); ==> Returns FEDCBA
rpad() right pad is used to add the spaces or characters to input string
If input string is longer than length, the return value is shortened to specified len characters
jdbc:hive2://> select rpad("UNITED",10,'0'); ==> Returns UNITED0000
jdbc:hive2://> select rpad("UNITED",10,' '); ==> Returns 'UNITED ' // with out quotes
jdbc:hive2://> select rpad("UNITEDSTATES",10,'0'); ==> Returns UNITEDSTAT
jdbc:hive2://> select rpad("UNITEDSTATES",10,null); ==> Returns NULL
rtrim() eliminates the spaces from the right (right trim ) side of the string. For example, trim(‘ bar ‘) results in ‘ bar’
space() returns a string with the specified number of spaces.
0: jdbc:hive2://> select space(10); ==> Returns ' '
split() the string into an array based on delimiter separator (comma, pipe, space e.t.c)
Below example splits using space delimiter.
jdbc:hive2://> select split("USA IS A PLACE"," ");
Returns: ["USA","IS","A","PLACE"]
substr() | substring() is used to get the part of the string from the starting index and n number of characters.
jdbc:hive2://> select substr("USA IS A PLACE",5,2); ==> Returns IS
jdbc:hive2://> select substr("USA IS A PLACE",5,100); ==> Returns IS A PLACE
trim() – trimming spaces from both ends of a string. For example, trim(‘ bar ‘) results in ‘bar’
This is a result of using ltrim and rtrim functions.
upper() | ucase() changes the string to upper case or all caps.
jdbc:hive2://> select upper("unitedstates"); ==> Returns UNITEDSTATES
jdbc:hive2://> select ucase("unitedstates"); ==> Returns UNITEDSTATES
initcap() returns input string after converting first letter of a word to upper/capital case
jdbc:hive2://> select initcap("USA IS A PLACE");
Returns: Usa Is A Place
soundex() function returns soundex of the string
jdbc:hive2://> select soundex("PLACE"); ==> Returns P420
jdbc:hive2://> select soundex("UNITED"); ==> Returns U533