Adding custom monitoring metrics: using the Kafka source and sink in Flink 1.5 as an example, we add metrics such as RPS and dirtyData. For both reading from and writing to Kafka, the key is to first obtain the RuntimeContext, use it to initialize the metrics, and hand them to the serialization schema in use; the metrics are then updated by overriding the schema's serialize and deserialize methods.
First, a demo that reads from and writes to Kafka without any metrics.
```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FlinkEtlTest {

    private static final Logger logger = LoggerFactory.getLogger(FlinkEtlTest.class);

    public static void main(String[] args) throws Exception {
        final ParameterTool params = ParameterTool.fromArgs(args);
        String jobName = params.get("jobName");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        /** Kafka settings */
        String topic = "myTest01";
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.quorum", "localhost:2181/kafka");

        // Read from Kafka with FlinkKafkaConsumer09 and the SimpleStringSchema deserialization schema
        FlinkKafkaConsumer09<String> consumer09 = new FlinkKafkaConsumer09<>(topic, new SimpleStringSchema(), props);
        consumer09.setStartFromEarliest();

        // Write to Kafka with FlinkKafkaProducer09 and the SimpleStringSchema serialization schema
        String sinkBrokers = "localhost:9092";
        FlinkKafkaProducer09<String> myProducer = new FlinkKafkaProducer09<>(sinkBrokers, "myTest01", new SimpleStringSchema());

        DataStream<String> kafkaDataStream = env.addSource(consumer09);
        kafkaDataStream = kafkaDataStream.map(str -> {
            logger.info("map receive {}", str);
            return str.toUpperCase();
        });

        kafkaDataStream.addSink(myProducer);

        env.execute(jobName);
    }
}
```
Next, we extend Flink's FlinkKafkaConsumer09 and FlinkKafkaProducer09 and add metrics monitoring to them.
Adding metrics to the Kafka source
- Extend FlinkKafkaConsumer09 to get hold of its RuntimeContext, and use the current MetricGroup to initialize the metrics.
```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;

public class CustomerFlinkKafkaConsumer09<T> extends FlinkKafkaConsumer09<T> {

    private final CustomerSimpleStringSchema customerSimpleStringSchema;

    // Only one of the several constructors is shown here
    public CustomerFlinkKafkaConsumer09(String topic, DeserializationSchema<T> valueDeserializer, Properties props) {
        super(topic, valueDeserializer, props);
        this.customerSimpleStringSchema = (CustomerSimpleStringSchema) valueDeserializer;
    }

    @Override
    public void run(SourceContext<T> sourceContext) throws Exception {
        // Pass the RuntimeContext on to customerSimpleStringSchema
        customerSimpleStringSchema.setRuntimeContext(getRuntimeContext());
        // Initialize the metrics
        customerSimpleStringSchema.initMetric();
        super.run(sourceContext);
    }
}
```
Override the deserialize method of SimpleStringSchema so that the metrics are updated as data flows in.
```java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomerSimpleStringSchema extends SimpleStringSchema {

    private static final Logger logger = LoggerFactory.getLogger(CustomerSimpleStringSchema.class);

    public static final String DT_NUM_RECORDS_RESOVED_IN_COUNTER = "dtNumRecordsInResolve";
    public static final String DT_NUM_RECORDS_RESOVED_IN_RATE = "dtNumRecordsInResolveRate";
    public static final String DT_DIRTY_DATA_COUNTER = "dtDirtyData";
    public static final String DT_NUM_BYTES_IN_COUNTER = "dtNumBytesIn";
    public static final String DT_NUM_RECORDS_IN_RATE = "dtNumRecordsInRate";
    public static final String DT_NUM_BYTES_IN_RATE = "dtNumBytesInRate";
    public static final String DT_NUM_RECORDS_IN_COUNTER = "dtNumRecordsIn";

    protected transient Counter numInResolveRecord;
    // source RPS
    protected transient Meter numInResolveRate;
    // source dirty data
    protected transient Counter dirtyDataCounter;

    // TPS
    protected transient Meter numInRate;
    protected transient Counter numInRecord;

    // BPS
    protected transient Counter numInBytes;
    protected transient Meter numInBytesRate;

    private transient RuntimeContext runtimeContext;

    public void initMetric() {
        numInResolveRecord = runtimeContext.getMetricGroup().counter(DT_NUM_RECORDS_RESOVED_IN_COUNTER);
        numInResolveRate = runtimeContext.getMetricGroup().meter(DT_NUM_RECORDS_RESOVED_IN_RATE, new MeterView(numInResolveRecord, 20));
        dirtyDataCounter = runtimeContext.getMetricGroup().counter(DT_DIRTY_DATA_COUNTER);

        numInBytes = runtimeContext.getMetricGroup().counter(DT_NUM_BYTES_IN_COUNTER);
        numInRecord = runtimeContext.getMetricGroup().counter(DT_NUM_RECORDS_IN_COUNTER);
        numInRate = runtimeContext.getMetricGroup().meter(DT_NUM_RECORDS_IN_RATE, new MeterView(numInRecord, 20));
        numInBytesRate = runtimeContext.getMetricGroup().meter(DT_NUM_BYTES_IN_RATE, new MeterView(numInBytes, 20));
    }

    // Override deserialize for the source: update the metrics as each record is read
    @Override
    public String deserialize(byte[] value) {
        numInBytes.inc(value.length);
        numInResolveRecord.inc();
        numInRecord.inc();
        try {
            return super.deserialize(value);
        } catch (Exception e) {
            dirtyDataCounter.inc();
        }
        return "";
    }

    public void setRuntimeContext(RuntimeContext runtimeContext) {
        this.runtimeContext = runtimeContext;
    }
}
```
The job then creates the customized consumer with the customized schema (the constructor casts the schema, so it must be a CustomerSimpleStringSchema):

```java
CustomerFlinkKafkaConsumer09<String> consumer09 =
        new CustomerFlinkKafkaConsumer09<>(topic, new CustomerSimpleStringSchema(), props);
```
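Passing the RuntimeContext into the schema is a workaround needed because the schema has no lifecycle hook of its own in these connector classes. For comparison, when you control the operator, the standard place to register user-defined metrics is a RichFunction's open() method. Below is a minimal sketch of that pattern; the class and metric names are illustrative, not part of the code above.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Meter;
import org.apache.flink.metrics.MeterView;

// Illustrative only: counts records and exposes a rate, like the schema above
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter numRecords;
    private transient Meter numRecordsRate;

    @Override
    public void open(Configuration parameters) {
        numRecords = getRuntimeContext().getMetricGroup().counter("myNumRecords");
        // MeterView derives a per-second rate from the counter, averaged over 20 seconds
        numRecordsRate = getRuntimeContext().getMetricGroup().meter("myNumRecordsRate", new MeterView(numRecords, 20));
    }

    @Override
    public String map(String value) {
        numRecords.inc();
        return value;
    }
}
```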
Adding metrics to the Kafka sink
- Extend FlinkKafkaProducer09 and override its open method: there we get the RuntimeContext, initialize the metrics, and pass them on to CustomerSinkStringSchema.
```java
import org.apache.flink.api.common.functions.RuntimeContext;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.MeterView;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer09;

public class CustomerFlinkKafkaProducer09<T> extends FlinkKafkaProducer09<T> {

    public static final String DT_NUM_RECORDS_OUT = "dtNumRecordsOut";
    public static final String DT_NUM_RECORDS_OUT_RATE = "dtNumRecordsOutRate";

    private final CustomerSinkStringSchema schema;

    public CustomerFlinkKafkaProducer09(String brokerList, String topicId, SerializationSchema<T> serializationSchema) {
        super(brokerList, topicId, serializationSchema);
        this.schema = (CustomerSinkStringSchema) serializationSchema;
    }

    @Override
    public void open(Configuration configuration) {
        producer = getKafkaProducer(this.producerConfig);

        RuntimeContext ctx = getRuntimeContext();
        Counter counter = ctx.getMetricGroup().counter(DT_NUM_RECORDS_OUT);
        // RPS of the sink
        MeterView meter = ctx.getMetricGroup().meter(DT_NUM_RECORDS_OUT_RATE, new MeterView(counter, 20));
        // Pass the counter on to CustomerSinkStringSchema
        schema.setCounter(counter);

        super.open(configuration);
    }
}
```
Override the serialize method of SimpleStringSchema so the counter is updated as data is written out.
```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.metrics.Counter;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CustomerSinkStringSchema extends SimpleStringSchema {

    private static final Logger logger = LoggerFactory.getLogger(CustomerSinkStringSchema.class);

    private Counter sinkCounter;

    @Override
    public byte[] serialize(String element) {
        logger.info("sink data {}", element);
        sinkCounter.inc();
        // The actual serialization is still delegated to the parent class
        return super.serialize(element);
    }

    public void setCounter(Counter counter) {
        this.sinkCounter = counter;
    }
}
```
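The customized producer and schema are then wired into the demo job the same way the source was; a sketch based on the job above:

```java
// Replaces the plain FlinkKafkaProducer09 from the first demo
CustomerFlinkKafkaProducer09<String> myProducer =
        new CustomerFlinkKafkaProducer09<>(sinkBrokers, "myTest01", new CustomerSinkStringSchema());

kafkaDataStream.addSink(myProducer);
```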
Retrieving Metrics
With this in place, the collected metrics become visible in the monitoring framework, for example as flink_taskmanager_job_task_operator_dtDirtyData: dtDirtyData is the metric we added ourselves, and the leading string is the default scope of the operator's MetricGroup.
There are three ways to retrieve metrics. First, they can be viewed in the WebUI. Second, they can be fetched through the RESTful API, which is the most program-friendly option: for automation scripts, automated operations, and testing, the JSON it returns is easy to parse. Finally, they can be collected through a Metric Reporter, which is the mechanism production monitoring mainly relies on.
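As a quick illustration of the RESTful API path, here is a minimal sketch that lists the metrics of one task vertex and then fetches a single value. It assumes the JobManager REST endpoint is on localhost:8081; the job and vertex IDs are placeholders you would first look up via GET /jobs and GET /jobs/&lt;job-id&gt;.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MetricsRestDemo {

    // Small helper: GET a URL and return the response body as a string
    static String get(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws Exception {
        // Placeholders: look these up via GET /jobs and GET /jobs/{job-id}
        String base = "http://localhost:8081";
        String jobId = "your-job-id";
        String vertexId = "your-vertex-id";

        // Lists the available metric names for the vertex, e.g. [{"id":"dtDirtyData"}, ...]
        System.out.println(get(base + "/jobs/" + jobId + "/vertices/" + vertexId + "/metrics"));

        // Fetches one value, e.g. [{"id":"dtDirtyData","value":"0"}]
        System.out.println(get(base + "/jobs/" + jobId + "/vertices/" + vertexId + "/metrics?get=dtDirtyData"));
    }
}
```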
Data analysis: why is a task sometimes particularly slow?
Once a particular Task has been located as the slow one, the causes need to be analyzed. The candidate causes have a priority order and should be checked top-down, from the business level toward the underlying system, because most problems occur at the business level. At the business level, check for example whether the parallelism is reasonable, data peaks and troughs, and data skew. Next, analyze Garbage Collection, Checkpoint Alignment, and State Backend performance in turn. Finally, analyze system performance: CPU, memory, swap, disk I/O, throughput, capacity, network I/O, bandwidth, and so on.