The problem
- Add the dependency as described in the Flink docs:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-pulsar_2.11</artifactId>
<version>1.14.4</version>
</dependency>
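- The job runs directly in the IDE, but only with this run-configuration option checked:
Add dependencies with "provided" scope to classpath
- After packaging the job as a jar and submitting it, the submission succeeds but the job fails to start, with the following exception: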
2022-06-15 10:32:13
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: pulsar-source-test -> (Sink: Print to Std. Out, Map -> Filter -> Map -> Filter -> Map)' (operator 06f6b9841ee245744a878eedfd102524).
at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:223)
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:285)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:132)
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:381)
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.lambda$resetToCheckpoint$6(RecreateOnResetOperatorCoordinator.java:136)
at java.base/java.util.concurrent.CompletableFuture$UniRun.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
at org.apache.flink.runtime.operators.coordination.ComponentClosingUtils.lambda$closeAsyncWithTimeout$0(ComponentClosingUtils.java:71)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.pulsar.client.admin.internal.PulsarAdminImpl
at org.apache.pulsar.client.admin.internal.PulsarAdminBuilderImpl.build(PulsarAdminBuilderImpl.java:47)
at org.apache.flink.connector.pulsar.common.utils.PulsarExceptionUtils.sneaky(PulsarExceptionUtils.java:69)
at org.apache.flink.connector.pulsar.common.utils.PulsarExceptionUtils.sneakyClient(PulsarExceptionUtils.java:46)
at org.apache.flink.connector.pulsar.common.config.PulsarConfigUtils.createAdmin(PulsarConfigUtils.java:213)
at org.apache.flink.connector.pulsar.source.enumerator.PulsarSourceEnumerator.<init>(PulsarSourceEnumerator.java:86)
at org.apache.flink.connector.pulsar.source.PulsarSource.createEnumerator(PulsarSource.java:149)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:128)
... 7 more
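As background for the IDE checkbox mentioned above, here is a sketch of how a Flink runtime dependency is typically declared. It assumes the job's pom follows the layout of the Flink quickstart archetype; the actual pom is not shown in this post, so the artifact below is an assumption. With provided scope, Maven keeps the artifact on the compile classpath but leaves it out of the packaged jar because the cluster supplies it, and the IDE likewise leaves it off the run classpath unless the option is checked.

```xml
<!-- Assumed example: the job's actual pom is not shown in this post. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.14.4</version>
  <!-- provided: available at compile time, not packaged into the job jar -->
  <scope>provided</scope>
</dependency>
```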
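Attempted fixes
- Opening the jar in IDEA shows that
org.apache.pulsar.client.admin.internal.PulsarAdminImpl
and the related class files are present.
- Suspecting a version incompatibility, I tried every connector version from 1.14.0 to 1.14.4 and got the same error. Switching to 1.15.0 failed differently, with the error "graph is cyclic", which hints at the migration risk of jumping to a newer release line.
- I tried to analyze the jar's dependencies with jdeps:
jdeps target/vdf-1.0.0.jar
which failed with: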
Exception in thread "main" java.lang.module.FindException: Module java.xml.bind not found, required by java.ws.rs
at java.base/java.lang.module.Resolver.findFail(Resolver.java:877)
at java.base/java.lang.module.Resolver.resolve(Resolver.java:191)
at java.base/java.lang.module.Resolver.resolve(Resolver.java:140)
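- After removing flink-connector-pulsar from the pom and rebuilding the jar, the exception above disappears and jdeps can analyze the dependencies normally.
- Suspecting that a transitive dependency was missing, I looked up the connector's original repository, found flink-connector-pulsar/pom.xml, and added the following dependency to the Flink job's pom: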
<dependency>
<groupId>org.apache.pulsar</groupId>
<artifactId>pulsar-client-all</artifactId>
<version>${pulsar.version}</version>
<exclusions>
<exclusion>
<groupId>com.sun.activation</groupId>
<artifactId>javax.activation</artifactId>
</exclusion>
<exclusion>
<groupId>jakarta.activation</groupId>
<artifactId>jakarta.activation-api</artifactId>
</exclusion>
<exclusion>
<groupId>jakarta.ws.rs</groupId>
<artifactId>jakarta.ws.rs-api</artifactId>
</exclusion>
<exclusion>
<groupId>jakarta.xml.bind</groupId>
<artifactId>jakarta.xml.bind-api</artifactId>
</exclusion>
<exclusion>
<groupId>javax.validation</groupId>
<artifactId>validation-api</artifactId>
</exclusion>
<exclusion>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
</exclusion>
<exclusion>
<groupId>net.jcip</groupId>
<artifactId>jcip-annotations</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.pulsar</groupId>
<artifactId>pulsar-package-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.beust</groupId>
<artifactId>jcommander</artifactId>
</exclusion>
</exclusions>
</dependency>
- After rebuilding and resubmitting, the previous exception is gone and jdeps can analyze the dependencies, but a new exception appears:
2022-06-15 16:32:33
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Source: pulsar-source-test -> (Sink: Print to Std. Out, Map -> Filter -> Map -> Filter -> Map)' (operator 06f6b9841ee245744a878eedfd102524).
at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:545)
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$QuiesceableContext.failJob(RecreateOnResetOperatorCoordinator.java:231)
at org.apache.flink.runtime.source.coordinator.SourceCoordinatorContext.failJob(SourceCoordinatorContext.java:287)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:128)
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator$DeferrableCoordinator.resetAndStart(RecreateOnResetOperatorCoordinator.java:389)
at org.apache.flink.runtime.operators.coordination.RecreateOnResetOperatorCoordinator.lambda$resetToCheckpoint$6(RecreateOnResetOperatorCoordinator.java:144)
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source)
at org.apache.flink.runtime.operators.coordination.ComponentClosingUtils.lambda$closeAsyncWithTimeout$0(ComponentClosingUtils.java:77)
at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.IncompatibleClassChangeError: Method 'org.apache.pulsar.client.admin.PulsarAdminBuilder org.apache.pulsar.client.admin.PulsarAdmin.builder()' must be Methodref constant
at org.apache.flink.connector.pulsar.common.config.PulsarConfigUtils.createAdmin(PulsarConfigUtils.java:176)
at org.apache.flink.connector.pulsar.source.enumerator.PulsarSourceEnumerator.<init>(PulsarSourceEnumerator.java:86)
at org.apache.flink.connector.pulsar.source.PulsarSource.createEnumerator(PulsarSource.java:149)
at org.apache.flink.runtime.source.coordinator.SourceCoordinator.start(SourceCoordinator.java:124)
... 8 more
- Further checking showed that this exception was caused by a version mismatch: pulsar.version should be 2.8.0, not 2.7.0:
<groupId>org.apache.pulsar</groupId>
<artifactId>pulsar-client-all</artifactId>
<version>${pulsar.version}</version>
You can confirm this in the Maven repository: every connector release from 1.14.0 to 1.14.4 depends on Pulsar 2.8.0.
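The ${pulsar.version} placeholder used above is a Maven property. A minimal sketch of pinning it follows, assuming the property is defined in the job's own pom; where the author actually defined it is not shown in this post.

```xml
<properties>
  <!-- Assumed location: the job's own pom. 2.8.0 is the version named in the text above. -->
  <pulsar.version>2.8.0</pulsar.version>
</properties>
```

To see which Pulsar version Maven actually resolves, running mvn dependency:tree against the project is one way to check.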
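Final solution
- Remove the line
<exclude>org.slf4j:*</exclude>
from the maven-shade-plugin configuration. Nothing else is needed: no dependencies beyond flink-connector-pulsar have to be added.
- The jdeps exception occurs because the Pulsar connector depends on pulsar-client, which uses java.ws.rs, which in turn requires java.xml.bind, and Java 11 removed java.xml.bind. Adding this exclusion resolves it:
<exclusion>
<groupId>jakarta.ws.rs</groupId>
<artifactId>jakarta.ws.rs-api</artifactId>
</exclusion>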
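For orientation, here is a sketch of where that exclude usually lives. It assumes the job's pom was generated from the Flink quickstart archetype; the author's actual plugin configuration is not shown, so the neighbouring entries are assumptions, and the plugin version is omitted on purpose (keep whatever the project already pins). Only the org.slf4j line is removed.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <excludes>
            <!-- Assumed neighbouring entries, as in the Flink quickstart pom -->
            <exclude>com.google.code.findbugs:jsr305</exclude>
            <exclude>org.apache.logging.log4j:*</exclude>
            <!-- Removed: <exclude>org.slf4j:*</exclude> -->
          </excludes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With the exclude gone, the shade plugin no longer filters org.slf4j artifacts out of the job jar.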
Summary
- Checking the Pulsar connector's dependencies one by one shows that it needs org.slf4j:
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.7.15</version>
<scope>provided</scope>
</dependency>
Checking my own project's pom then turned up the exclude for it; removing that exclude fixed the problem.