<Impala><Overview><UDF>

Overview
- Apache Impala (incubating) is the open source, native analytic database for Apache Hadoop.
Features
- Do BI-style queries on Hadoop:
  - Low latency and high concurrency for BI/analytic queries on Hadoop (not delivered by batch frameworks such as Apache Hive).
  - Scales linearly, even in multitenant environments.
- Unify your infrastructure: use the same file and data formats, metadata, security, and resource management frameworks as your Hadoop deployment, with no redundant infrastructure or data conversion/duplication.
- Implement quickly: supports SQL.
- Count on enterprise-class security.
- Retain freedom from lock-in: open source.
- Expand the Hadoop user-verse.
Architecture
- Impala circumvents MapReduce to avoid latency and accesses the data directly through a specialized distributed query engine that is very similar to those found in commercial parallel RDBMSs.
- Some advantages:
  - Thanks to local processing on data nodes, network bottlenecks are avoided.
  - A single, open, and unified metadata store can be utilized.
  - Costly data format conversion is unnecessary and thus no overhead is incurred.
  - All data is immediately query-able, with no delays for ETL.
  - All hardware is utilized for Impala queries as well as for MapReduce.
  - Only a single machine pool is needed to scale.
Documentation

... skip

Impala User-Defined Functions (UDFs)
- UDFs let you code your own application logic for processing column values during an Impala query.
UDF Concepts
- You can code either scalar functions for producing results one row at a time, or more complex aggregate functions for doing analysis across sets of rows.
UDFs and UDAFs
- The most general kind of UDF takes a single input value and produces a single output value. When used in a query, it is called once for each row in the result set. For example:
  select customer_name, is_frequent_customer(customer_id) from customers;
  select obfuscate(sensitive_column) from sensitive_data;
- A user-defined aggregate function (UDAF) accepts a group of values and returns a single value. You can use UDAFs to summarize and condense sets of rows, in the same style as the built-in COUNT(), MAX(), SUM(), and AVG() functions. When called in a query that uses the GROUP BY clause, the function is called once for each combination of GROUP BY values. For example:
  -- Evaluates multiple rows but returns a single value.
  select closest_restaurant(latitude, longitude) from places;
  -- Evaluates batches of rows and returns a separate value for each batch.
  select most_profitable_location(store_id, sales, expenses, tax_rate, depreciation) from franchise_data group by year;
- Currently, Impala does not support other categories of UDFs, such as user-defined table functions (UDTFs) or window functions.
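The difference between a scalar UDF and a UDAF can be modeled in a few lines of plain Python. This is only a sketch of the calling semantics described above, not Impala code: Impala UDFs are written in C++ or Java, and the sample rows, the `profit` function, and the `MostProfitableStore` class are all hypothetical names made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table (not real data).
ROWS = [
    {"year": 2015, "store_id": 1, "sales": 100.0, "expenses": 60.0},
    {"year": 2015, "store_id": 2, "sales": 90.0, "expenses": 30.0},
    {"year": 2016, "store_id": 1, "sales": 120.0, "expenses": 70.0},
    {"year": 2016, "store_id": 2, "sales": 80.0, "expenses": 50.0},
]


def profit(sales, expenses):
    """Scalar function: values from one row in, one value out."""
    return sales - expenses


def apply_scalar(rows):
    """Models `select profit(sales, expenses) from t`: one call per row."""
    return [profit(r["sales"], r["expenses"]) for r in rows]


class MostProfitableStore:
    """Aggregate function: sees many rows, returns a single value."""

    def __init__(self):
        self.best_store = None
        self.best_profit = None

    def update(self, store_id, sales, expenses):
        # Called once per input row; keeps only the running best.
        p = sales - expenses
        if self.best_profit is None or p > self.best_profit:
            self.best_store, self.best_profit = store_id, p

    def finalize(self):
        # Called once per group to produce the result value.
        return self.best_store


def apply_aggregate(rows, group_key=None):
    """Models the aggregate with or without `group by <group_key>`."""
    states = defaultdict(MostProfitableStore)
    for r in rows:
        key = r[group_key] if group_key else None
        states[key].update(r["store_id"], r["sales"], r["expenses"])
    return {k: s.finalize() for k, s in states.items()}
```

Calling `apply_scalar(ROWS)` yields one value per row, `apply_aggregate(ROWS)` yields a single value for the whole input, and `apply_aggregate(ROWS, "year")` yields one value per distinct year, which mirrors the "once for each combination of GROUP BY values" behavior.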
Native Impala UDFs
- Impala supports UDFs written in C++, in addition to supporting existing Hive UDFs written in Java.
- Where practical, use C++ UDFs because the compiled native code can yield higher performance, with UDF execution time often 10x faster for a C++ UDF than the equivalent Java UDF.
Using Hive UDFs with Impala
- Impala can run Java-based user-defined functions (UDFs), originally written for Hive, with no changes, subject to the following conditions:
  - The parameters and return value must all use scalar data types supported by Impala; that is, complex or nested types are not supported.
  - Currently, Hive UDFs that accept or return the TIMESTAMP type are not supported.
  - Hive UDAFs and UDTFs are not supported.
  - Typically, a Java UDF will execute several times slower in Impala than the equivalent native UDF written in C++.
- What to do next?
  - Write your UDF.
  - Upload the JAR to an HDFS path that Impala can read.
  - For each Java-based UDF that you want to call through Impala, issue a CREATE FUNCTION statement, with a LOCATION clause containing the full HDFS path of the JAR file, and a SYMBOL clause with the fully qualified name of the class, using dots as separators and without the .class extension. For example:
    create function my_neg(bigint) returns bigint location '/user/hive/udfs/hive.jar' symbol = 'org.apache.hadoop.hive.ql.udf.UDFOPNegative';
  - Call the function from your queries, passing arguments of the correct type to match the function signature.
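When registering many Hive UDFs, the CREATE FUNCTION text can be generated rather than typed by hand. The sketch below is one way to do that in Python; the helper names `symbol_from_class_file` and `create_function_sql` are hypothetical, and the code only builds a string following the statement shape shown above. It does no quoting or escaping and does not connect to Impala.

```python
def symbol_from_class_file(class_file):
    """Turn a JAR entry such as 'org/apache/Foo.class' into 'org.apache.Foo'.

    A name that is already dotted and has no extension is returned unchanged.
    """
    name = class_file
    if name.endswith(".class"):
        name = name[: -len(".class")]
    return name.replace("/", ".")


def create_function_sql(func_name, arg_types, return_type, jar_path, class_name):
    """Build the CREATE FUNCTION text for one Java-based UDF.

    Assumes all inputs are trusted; nothing is escaped.
    """
    args = ", ".join(arg_types)
    symbol = symbol_from_class_file(class_name)
    return (
        f"create function {func_name}({args}) returns {return_type} "
        f"location '{jar_path}' symbol = '{symbol}';"
    )


# Rebuilds the my_neg statement from the notes above.
statement = create_function_sql(
    "my_neg",
    ["bigint"],
    "bigint",
    "/user/hive/udfs/hive.jar",
    "org/apache/hadoop/hive/ql/udf/UDFOPNegative.class",
)
```

The resulting `statement` string can then be pasted into impala-shell or sent through whatever client is already in use.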
FYI
Original article: https://www.cnblogs.com/wttttt/p/7236469.html
