• Flink: Custom HTTP Table Source


    I wrote a custom HTTP table source for Flink.

    Reference, from the official documentation: [User-defined Sources & Sinks](https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/docs/dev/table/sourcessinks/)

    Structure of a Flink Table connector:

     

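    A custom connector needs the following:

    • 1. A SourceFunction for the runtime
    • 2. A TableSourceFactory and a TableSource for the planner (here, the DynamicTableSourceFactory and ScanTableSource interfaces)

    First, the DDL of the table that the finished connector supports: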

    create table cust_http_source(
        id string
        ,name string
        ,sex string
    )WITH(
     'connector' = 'http'
     ,'http.url' = 'http://localhost:8888'
     ,'http.interval' = '1000'
     ,'format' = 'csv'
    )

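    ## 1. Define the SourceFunction

    I found a demo online for sending HTTP requests and changed it slightly: the URL is now a parameter, and the method returns the data sent back by the HTTP server.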

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    
    public class HttpClientUtil {
    
        /**
         * Sends a GET request and returns the response body, with "\n" after each line.
         * Returns null when the response code is not 200; I/O errors propagate to the caller.
         */
        public static String doGet(String httpUrl) throws IOException {
            HttpURLConnection connection = null;
            try {
                // open a connection to the remote URL
                URL url = new URL(httpUrl);
                connection = (HttpURLConnection) url.openConnection();
                connection.setRequestMethod("GET");
                // connect timeout: 15000 ms
                connection.setConnectTimeout(15000);
                // read timeout: 60000 ms
                connection.setReadTimeout(60000);
                // send the request
                connection.connect();
                if (connection.getResponseCode() != 200) {
                    return null;
                }
                // read the body as UTF-8; closing the reader also closes the input stream
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
                    StringBuilder sb = new StringBuilder();
                    String line;
                    while ((line = br.readLine()) != null) {
                        sb.append(line).append("\n");
                    }
                    return sb.toString();
                }
            } finally {
                // connection is still null if the URL was malformed
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }

    * Apologies: I copied this from someone's blog a long time ago and can no longer find the original source.

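    The SourceFunction itself is simple: extend RichSourceFunction and implement its methods. It receives the values of the table properties. Because the format reuses one of Flink's existing formats, it also takes a deserializer.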

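    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.metrics.Counter;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
    import org.apache.flink.table.data.RowData;
    
    import java.nio.charset.StandardCharsets;
    
    public class HttpSource extends RichSourceFunction<RowData> {
    
        private volatile boolean isRunning = true;
        private final String url;
        private final long requestInterval;
        private final DeserializationSchema<RowData> deserializer;
        // counts emitted records
        private transient Counter counter;
    
        public HttpSource(String url, long requestInterval, DeserializationSchema<RowData> deserializer) {
            this.url = url;
            this.requestInterval = requestInterval;
            this.deserializer = deserializer;
        }
    
        @Override
        public void open(Configuration parameters) throws Exception {
            // register the counter with the operator's metric group
            this.counter = getRuntimeContext()
                    .getMetricGroup()
                    .counter("myCounter");
        }
    
        @Override
        public void run(SourceContext<RowData> ctx) throws Exception {
            while (isRunning) {
                try {
                    // fetch one HTTP response; the body must be in the configured format (csv here)
                    String message = HttpClientUtil.doGet(url);
                    if (message != null) {
                        // deserialize the message into RowData and emit it
                        RowData row = deserializer.deserialize(message.getBytes(StandardCharsets.UTF_8));
                        if (row != null) {
                            ctx.collect(row);
                            this.counter.inc();
                        }
                    }
                } catch (Exception e) {
                    // a failed request or an undecodable body skips this round
                    e.printStackTrace();
                }
                // sleep outside the try block so a failing server is not polled in a tight loop
                Thread.sleep(requestInterval);
            }
        }
    
        @Override
        public void cancel() {
            isRunning = false;
        }
    }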

    The source receives data in the format named by the `format` table property, deserializes it into RowData, and emits it from the SourceFunction.
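
    The run/cancel contract used by HttpSource can be tried outside Flink. The following is a minimal sketch, not the author's code: PollingLoop, its fetch supplier (standing in for HttpClientUtil.doGet) and its emit consumer (standing in for ctx.collect) are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical stand-alone sketch of a poll loop with a cancel flag; not a Flink class. */
class PollingLoop {

    // volatile so that a cancel() from another thread is seen by the loop
    private volatile boolean running = true;
    private final Supplier<String> fetch;  // assumption: stands in for HttpClientUtil.doGet(url)
    private final Consumer<String> emit;   // assumption: stands in for ctx.collect(...)
    private final long intervalMillis;

    PollingLoop(Supplier<String> fetch, Consumer<String> emit, long intervalMillis) {
        this.fetch = fetch;
        this.emit = emit;
        this.intervalMillis = intervalMillis;
    }

    void run() throws InterruptedException {
        while (running) {
            try {
                String message = fetch.get();
                if (message != null) {
                    emit.accept(message);
                }
            } catch (RuntimeException e) {
                // a failed poll skips this round instead of ending the loop
                System.err.println("poll failed: " + e.getMessage());
            }
            // always wait between polls, even after a failure
            Thread.sleep(intervalMillis);
        }
    }

    void cancel() {
        running = false;
    }

    /** Polls a fake source until `count` messages were emitted, then cancels the loop. */
    static List<String> demo(int count) {
        List<String> out = new ArrayList<>();
        AtomicInteger seq = new AtomicInteger();
        AtomicReference<PollingLoop> ref = new AtomicReference<>();
        PollingLoop loop = new PollingLoop(
                () -> seq.incrementAndGet() + ",name,x",
                msg -> {
                    out.add(msg);
                    if (out.size() >= count) {
                        ref.get().cancel();
                    }
                },
                1L);
        ref.set(loop);
        try {
            loop.run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```

    In this sketch the sleep sits outside the try block, so a source that keeps failing is still polled only once per interval.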

    ## 2. Define the TableSource

    HttpDynamicTableSource implements ScanTableSource. It receives the values of the table properties, creates the matching deserializer from the format, and creates the HttpSource.

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.streaming.api.functions.source.SourceFunction;
    import org.apache.flink.table.connector.ChangelogMode;
    import org.apache.flink.table.connector.format.DecodingFormat;
    import org.apache.flink.table.connector.source.DynamicTableSource;
    import org.apache.flink.table.connector.source.ScanTableSource;
    import org.apache.flink.table.connector.source.SourceFunctionProvider;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.types.DataType;
    
    public class HttpDynamicTableSource implements ScanTableSource {
    
        private final String url;
        private final long interval;
        private final DecodingFormat<DeserializationSchema<RowData>> decodingFormat;
        private final DataType producedDataType;
    
        public HttpDynamicTableSource(
                String url,
                long interval,
                DecodingFormat<DeserializationSchema<RowData>> decodingFormat,
                DataType producedDataType) {
            this.url = url;
            this.interval = interval;
            this.decodingFormat = decodingFormat;
            this.producedDataType = producedDataType;
        }
    
        @Override
        public ChangelogMode getChangelogMode() {
            // in our example the format decides about the changelog mode
            // but it could also be the source itself
            return decodingFormat.getChangelogMode();
        }
    
        @Override
        public ScanRuntimeProvider getScanRuntimeProvider(ScanContext runtimeProviderContext) {
    
            // create runtime classes that are shipped to the cluster
            final DeserializationSchema<RowData> deserializer = decodingFormat.createRuntimeDecoder(
                    runtimeProviderContext,
                    producedDataType);
    
            final SourceFunction<RowData> sourceFunction = new HttpSource(url, interval, deserializer);
    
            // false: the source is unbounded, it polls until the job is cancelled
            return SourceFunctionProvider.of(sourceFunction, false);
        }
    
        @Override
        public DynamicTableSource copy() {
            return new HttpDynamicTableSource(url, interval, decodingFormat, producedDataType);
        }
    
        @Override
        public String asSummaryString() {
            return "Http Table Source";
        }
    }

    ## 3. Define the TableSourceFactory

    Implement the DynamicTableSourceFactory interface, declare ConfigOptions for the required properties http.url and http.interval, and create the HttpDynamicTableSource.

    import org.apache.flink.api.common.serialization.DeserializationSchema;
    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;
    import org.apache.flink.configuration.ReadableConfig;
    import org.apache.flink.table.connector.format.DecodingFormat;
    import org.apache.flink.table.connector.source.DynamicTableSource;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.factories.DeserializationFormatFactory;
    import org.apache.flink.table.factories.DynamicTableSourceFactory;
    import org.apache.flink.table.factories.FactoryUtil;
    import org.apache.flink.table.types.DataType;
    
    import java.util.HashSet;
    import java.util.Set;
    
    public class HttpDynamicTableFactory implements DynamicTableSourceFactory {
    
        // define all options statically
        public static final ConfigOption<String> URL = ConfigOptions.key("http.url")
                .stringType()
                .noDefaultValue();
    
        public static final ConfigOption<Long> INTERVAL = ConfigOptions.key("http.interval")
                .longType()
                .noDefaultValue();
    
        @Override
        public String factoryIdentifier() {
            return "http"; // used for matching to `connector = '...'`
        }
    
        @Override
        public Set<ConfigOption<?>> requiredOptions() {
            final Set<ConfigOption<?>> options = new HashSet<>();
            options.add(URL);
            options.add(INTERVAL);
            options.add(FactoryUtil.FORMAT); // use pre-defined option for format
            return options;
        }
    
        @Override
        public Set<ConfigOption<?>> optionalOptions() {
            // this connector has no optional options
            return new HashSet<>();
        }
    
        @Override
        public DynamicTableSource createDynamicTableSource(Context context) {
            // either implement your custom validation logic here ...
            // or use the provided helper utility
            final FactoryUtil.TableFactoryHelper helper = FactoryUtil.createTableFactoryHelper(this, context);
    
            // discover a suitable decoding format
            final DecodingFormat<DeserializationSchema<RowData>> decodingFormat = helper.discoverDecodingFormat(
                    DeserializationFormatFactory.class,
                    FactoryUtil.FORMAT);
    
            // validate all options
            helper.validate();
    
            // get the validated options
            final ReadableConfig options = helper.getOptions();
            final String url = options.get(URL);
            final long interval = options.get(INTERVAL);
    
            // derive the produced data type (excluding computed columns) from the catalog table
            final DataType producedDataType =
                    context.getCatalogTable().getResolvedSchema().toPhysicalRowDataType();
    
            // create and return dynamic table source
            return new HttpDynamicTableSource(url, interval, decodingFormat, producedDataType);
        }
    }
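
    To make the role of requiredOptions() and optionalOptions() more concrete, here is a rough stand-alone sketch of the kind of check that option validation performs on the WITH clause. It is not Flink's implementation: OptionCheck and its methods are hypothetical, it treats 'connector' and 'format' as plain required keys, and it ignores details such as format-prefixed options.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of required/optional option checking; not part of Flink. */
class OptionCheck {

    static final Set<String> REQUIRED =
            new HashSet<>(Arrays.asList("connector", "http.url", "http.interval", "format"));
    static final Set<String> OPTIONAL = Collections.emptySet();

    /** Throws if a required key is missing or a key is neither required nor optional. */
    static void validate(Map<String, String> options) {
        for (String key : REQUIRED) {
            if (!options.containsKey(key)) {
                throw new IllegalArgumentException("Missing required option: " + key);
            }
        }
        for (String key : options.keySet()) {
            if (!REQUIRED.contains(key) && !OPTIONAL.contains(key)) {
                throw new IllegalArgumentException("Unsupported option: " + key);
            }
        }
        // the interval must parse as a long; NumberFormatException is an IllegalArgumentException
        Long.parseLong(options.get("http.interval"));
    }

    static boolean isValid(Map<String, String> options) {
        try {
            validate(options);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    /** The options from the WITH clause of cust_http_source. */
    static Map<String, String> sampleOptions() {
        Map<String, String> options = new HashMap<>();
        options.put("connector", "http");
        options.put("http.url", "http://localhost:8888");
        options.put("http.interval", "1000");
        options.put("format", "csv");
        return options;
    }

    public static void main(String[] args) {
        System.out.println(isValid(sampleOptions()));
    }
}
```

    With this sketch, dropping 'http.url' or adding an unknown key makes isValid return false, which mirrors why both properties are declared as required in the factory.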

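    By default, Flink discovers Factory implementations through Java's Service Provider Interface (SPI), so the fully qualified class name of HttpDynamicTableFactory must be added to META-INF/services/org.apache.flink.table.factories.Factory. The line below is the socket example's entry as it appeared in the original post; replace it with the fully qualified name of your HttpDynamicTableFactory:

    com.rookie.submit.cust.source.socket.SocketDynamicTableFactory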
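
    The SPI lookup itself is plain Java and can be tried without Flink. The sketch below is an illustration under stated assumptions: the nested Factory interface and HttpFactory class are hypothetical stand-ins for Flink's Factory and for HttpDynamicTableFactory, and the services file is written to a temporary directory instead of being packaged in a jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class SpiDemo {

    /** Hypothetical stand-in for org.apache.flink.table.factories.Factory. */
    public interface Factory {
        String factoryIdentifier();
    }

    /** Hypothetical stand-in for HttpDynamicTableFactory. */
    public static class HttpFactory implements Factory {
        @Override
        public String factoryIdentifier() {
            return "http";
        }
    }

    /** Writes a META-INF/services entry to a temp directory and discovers it via ServiceLoader. */
    static List<String> discover() {
        try {
            Path root = Files.createTempDirectory("spi-demo");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            // file name = service interface name, file content = provider class name
            Files.write(
                    services.resolve(Factory.class.getName()),
                    HttpFactory.class.getName().getBytes(StandardCharsets.UTF_8));

            List<String> identifiers = new ArrayList<>();
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] {root.toUri().toURL()}, SpiDemo.class.getClassLoader())) {
                for (Factory factory : ServiceLoader.load(Factory.class, loader)) {
                    identifiers.add(factory.factoryIdentifier());
                }
            }
            return identifiers;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discover());
    }
}
```

    If the services file names the wrong class, the loop simply never sees the HTTP factory, and a `'connector' = 'http'` lookup of this kind finds no match.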

    ## 4. Test

    The complete SQL:

    create table cust_http_source(
        id string
        ,name string
        ,sex string
    )WITH(
     'connector' = 'http'
     ,'http.url' = 'http://localhost:8888'
     ,'http.interval' = '1000'
     ,'format' = 'csv'
    )
    ;
    
    create table cust_http_sink(
    id string
    ,name string
    ,sex string
    )WITH(
        'connector' = 'print'
    )
    ;
    
    insert into cust_http_sink
    select id,name,sex
    from cust_http_source;

    The HTTP server accepts HTTP requests and returns a concatenated string:

    import com.sun.net.httpserver.HttpExchange;
    import com.sun.net.httpserver.HttpHandler;
    
    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;
    
    /**
     * Starts an HTTP server that listens for requests on port 8888.
     */
    public class HttpServer {
    
        public static void main(String[] arg) throws Exception {
            // fully qualified because this class is also named HttpServer
            com.sun.net.httpserver.HttpServer server =
                    com.sun.net.httpserver.HttpServer.create(new InetSocketAddress(8888), 10);
            server.createContext("/", new TestHandler());
            server.start();
        }
    
        static class TestHandler implements HttpHandler {
            @Override
            public void handle(HttpExchange exchange) throws IOException {
                // one csv line: <current millis>,name,<random uuid>
                String result = System.currentTimeMillis() + ",name," + UUID.randomUUID();
                byte[] body = result.getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            }
        }
    
    }

    Start the job:

     

    Data received:

    +I[1633921534798, name, ce1738aa-42e4-4cad-b29a-a011db7cd91a]
    +I[1633921535813, name, e3b9e51a-f6f4-410e-b2eb-5353b2c1b294]
    +I[1633921536816, name, f0dd1f7d-d7c5-4520-a147-3db8c8d5d153]
    +I[1633921537818, name, 4b5461be-b979-48cb-ae3e-375568bfbf06]
    +I[1633921538820, name, 8c2a80e0-39f8-4f6b-b573-885d1109ac3a]
    +I[1633921539823, name, 3b324fa9-d6a6-4156-ab0a-888ee3fe02ce]
    +I[1633921540826, name, e6247826-8e54-40a4-8571-1d3b43419211]
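
    Each printed row corresponds to one response line from the test server, split into the columns id, name and sex. The sketch below shows that mapping with a naive comma split. It is only an illustration: CsvLineDemo is a hypothetical helper, and Flink's csv format handles quoting and types, which this does not.

```java
/** Hypothetical helper that mimics how one csv response line becomes a three-column row. */
class CsvLineDemo {

    /** Splits a response line into the columns id, name, sex (no quoting support). */
    static String[] toRow(String line) {
        // trim removes the trailing newline that doGet appends to each line
        String[] fields = line.trim().split(",", -1);
        if (fields.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields but got " + fields.length);
        }
        return fields;
    }

    /** Renders the row in the same style as the print sink output above. */
    static String format(String[] row) {
        return "+I[" + String.join(", ", row) + "]";
    }

    public static void main(String[] args) {
        System.out.println(format(toRow("1633921534798,name,ce1738aa-42e4-4cad-b29a-a011db7cd91a\n")));
    }
}
```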

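    Done.

    * Note: the HTTP table source is modeled on the official [socket table source](https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/docs/dev/table/sourcessinks/#full-stack-example) example.

    * Note: the HTTP server must stay up while the job is running; a failed request yields no record.

    For the complete example, see GitHub:  https://github.com/springMoon/sqlSubmit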


  • Original article: https://www.cnblogs.com/Springmoon-venn/p/15392511.html