For the Kafka source part, see the previous post: https://www.cnblogs.com/liufei1983/p/15801848.html
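1: First, download the two jar packages for JDBC and MySQL. Pay attention to the versions: my Flink is 1.13.1, so flink-connector-jdbc_2.11 must also be version 1.13.1, otherwise errors will be reported.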
2: Create a table in MySQL:
-- `sql-demo`.cdn_access_statistic definition (run this in MySQL)
CREATE TABLE `cdn_access_statistic` (
`province` varchar(100) DEFAULT NULL,
`access_count` bigint DEFAULT NULL,
`total_download` bigint DEFAULT NULL,
`download_speed` bigint DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Create the sink job in Zeppelin. Because Zeppelin runs in Docker, the address in the MySQL URL cannot be localhost; use the IP of the host machine instead.
%flink.ssql
DROP table if exists cdn_access_statistic;
-- Please create this mysql table first in your mysql instance. Flink won't create mysql table for you.
CREATE TABLE cdn_access_statistic (
province VARCHAR,
access_count BIGINT,
total_download BIGINT,
download_speed DOUBLE
) WITH (
'connector.type' = 'jdbc',
'connector.url' = 'jdbc:mysql://192.168.3.XXX:3306/sql-demo',
'connector.table' = 'cdn_access_statistic',
'connector.username' = 'sql-demo',
'connector.password' = 'demo-sql',
'connector.write.flush.interval' = '1s'
)
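The definition above uses the legacy `connector.*` option keys. As a rough sketch, the same sink table could probably also be declared with the newer JDBC connector option keys documented for Flink 1.13. The host, database name, and credentials below are simply copied from the example above; whether this form works in a particular Zeppelin setup is an assumption and has not been verified here.

```sql
-- Alternative to the definition above, using the newer JDBC connector option keys.
-- Use one definition or the other, not both.
DROP TABLE IF EXISTS cdn_access_statistic;
CREATE TABLE cdn_access_statistic (
  province VARCHAR,
  access_count BIGINT,
  total_download BIGINT,
  download_speed DOUBLE
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://192.168.3.XXX:3306/sql-demo',  -- host machine IP, not localhost
  'table-name' = 'cdn_access_statistic',
  'username' = 'sql-demo',
  'password' = 'demo-sql',
  'sink.buffer-flush.interval' = '1s'                  -- flush buffered rows every second
);
```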
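3: Make sure that both the Kafka source table and the MySQL sink table have been created.
4: Consume data from Kafka and store it in MySQL. You can see the data in the MySQL database changing.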
%flink.ssql
insert into cdn_access_statistic
select client_ip, request_time, request_time, request_time
from cdn_access_log
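One simple way to watch the data change is to query the table from a MySQL client while the job is running. This is only a minimal sketch: it assumes you are connected to the `sql-demo` database, and the row limit of 10 is an arbitrary choice.

```sql
-- Run in a MySQL client (not in Zeppelin) while the insert job is running.
-- Re-running the count should show a growing number of rows.
SELECT COUNT(*) AS row_count FROM `cdn_access_statistic`;

-- Peek at a few of the rows written by the Flink job.
SELECT * FROM `cdn_access_statistic` LIMIT 10;
```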