1. Download the spark-streaming-kafka integration package
My Linux cluster runs spark-2.1.1-bin-hadoop2.7 with kafka_2.11-0.8.2.1, so the matching artifact is spark-streaming-kafka-0-8_2.11-2.1.1.jar.
Official download (Maven repository): http://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8_2.11/2.1.1
Baidu Cloud mirror: http://pan.baidu.com/s/1o83DOHO (password: 2dgx)
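Alternatively, if the project is built with sbt, the same artifact can be declared as a dependency instead of downloaded by hand; this is a sketch using the coordinates from the Maven page above:

// in build.sbt; resolves to spark-streaming-kafka-0-8_2.11-2.1.1.jar
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.1"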
2. Gather the Spark and Kafka jars
2.1 Add the spark-streaming-kafka jar
Create a new lib directory and first copy the spark-streaming-kafka-0-8_2.11-2.1.1.jar downloaded in step 1 into it.
2.2 Add the Spark dependency jars
Copy all of the jars under spark-2.1.1-bin-hadoop2.7/jars into the lib directory created above.
2.3 Add the Kafka dependency jars
Copy all of the jars under kafka_2.11-0.8.2.1/libs into the same lib directory; the commands after this section summarize all three copy steps.
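Steps 2.1 through 2.3 boil down to the following shell commands (a sketch; adjust the paths to wherever your Spark and Kafka distributions are unpacked):

mkdir lib
# the integration jar from step 1
cp spark-streaming-kafka-0-8_2.11-2.1.1.jar lib/
# all Spark runtime jars
cp spark-2.1.1-bin-hadoop2.7/jars/*.jar lib/
# all Kafka client-side jars
cp kafka_2.11-0.8.2.1/libs/*.jar lib/

Copying every jar wholesale can pull in duplicate libraries (the SLF4J multiple-bindings warning in the output below is one consequence); it is a quick way to get a classpath for local testing, not a packaging strategy.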
3. Create a test project
Create a new Scala project that references all the jars in the lib directory above, then add a KafkaWordCount.scala for testing:
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("KafkaWordCount").setMaster("local[2]")
    sparkConf.set("spark.port.maxRetries", "128")

    // 2-second batch interval; the checkpoint directory is required by the
    // stateful reduceByKeyAndWindow below.
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("hdfs://192.168.168.200:9000/checkpoint")

    val zkQuorum = "192.168.168.200:2181" // ZooKeeper quorum used by Kafka
    val group = "test-group"              // Kafka consumer group
    val topics = "test"                   // comma-separated topic list
    val numThreads = 1                    // receiver threads per topic
    val topicMap = topics.split(",").map((_, numThreads)).toMap

    // Receiver-based stream; each record is a (key, message) pair,
    // so keep only the message body.
    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)
    val words = lines.flatMap(_.split(" "))

    // Word counts over a sliding 10-minute window, updated every 2 seconds.
    val wordCounts = words.map(x => (x, 1L))
      .reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
Start the Spark and Kafka clusters (assumed to be running already) and make sure Kafka has a topic named test; these prerequisites are assumed and not covered in detail here.
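If the test topic does not exist yet, it can be created with the tooling that ships with Kafka 0.8 (a sketch, assuming the ZooKeeper address used in the code above):

# create a single-partition, unreplicated topic named "test"
kafka-topics.sh --create --zookeeper 192.168.168.200:2181 \
  --replication-factor 1 --partitions 1 --topic test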
Run KafkaWordCount; a successful start produces output like this:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/I:/001sourceCode/020SparkStreaming/%e5%a4%a7%e6%95%b0%e6%8d%ae%e5%bc%80%e5%8f%91%e6%96%b9%e6%a1%88%e8%b5%84%e6%96%99%ef%bc%88%e5%a4%a9%e7%bb%b4%e5%b0%94%ef%bc%89/%e5%bc%80%e5%8f%91%e6%89%80%e9%9c%80jar%e5%8c%85/lib/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/I:/001sourceCode/020SparkStreaming/%e5%a4%a7%e6%95%b0%e6%8d%ae%e5%bc%80%e5%8f%91%e6%96%b9%e6%a1%88%e8%b5%84%e6%96%99%ef%bc%88%e5%a4%a9%e7%bb%b4%e5%b0%94%ef%bc%89/%e5%bc%80%e5%8f%91%e6%89%80%e9%9c%80jar%e5%8c%85/lib/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
-------------------------------------------
Time: 1499667652000 ms
-------------------------------------------
-------------------------------------------
Time: 1499667654000 ms
-------------------------------------------
-------------------------------------------
Time: 1499667656000 ms
-------------------------------------------
4. Receive messages from the Kafka topic
Start a Kafka console producer and type a few messages:
[root@master ~]# kafka-console-producer.sh --broker-list 192.168.168.200:9092 --topic test
test success
spark
kafka
The streaming job's log then shows the windowed counts:
-------------------------------------------
Time: 1499667830000 ms
-------------------------------------------
-------------------------------------------
Time: 1499667832000 ms
-------------------------------------------
(test,1)
(success,1)
-------------------------------------------
Time: 1499667834000 ms
-------------------------------------------
(test,1)
(success,1)
-------------------------------------------
Time: 1499667836000 ms
-------------------------------------------
(test,1)
(spark,1)
(success,1)
-------------------------------------------
Time: 1499667838000 ms
-------------------------------------------
(kafka,1)
(test,1)
(spark,1)
(success,1)
5. Spark Streaming receives no messages from the Kafka topic
If the Kafka console consumer can receive messages but the Spark consumer cannot, and nothing is reported in the logs, check carefully whether server.properties under kafka_home/config contains the following settings. Without an explicit host.name, the broker may register a hostname in ZooKeeper that the machine running Spark cannot resolve, so the receiver silently gets no data:
############################# Socket Server Settings #############################
# The port the socket server listens on
port=9092
# Hostname the broker will bind to. If not set, the server will bind to all interfaces
host.name=192.168.168.200
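After editing server.properties, restart the broker. A quick end-to-end check is the stock console consumer (assuming the same addresses as above):

# replay the topic from the beginning to verify messages are stored
kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic test --from-beginning

Note that a console consumer run on the broker host itself can succeed even when remote clients cannot resolve the registered hostname, which is why this problem is easy to miss.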
6. Package the code with sbt-assembly and run it on Spark
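For reference, a minimal build.sbt for this project might look as follows; the project name and Scala version are assumptions, and the core Spark dependencies are marked provided because the cluster supplies them at runtime:

name := "kafka-word-count"
version := "0.1"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.1.1" % "provided",
  // bundled into the assembly jar, since the cluster does not ship it
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.1.1"
)

The jar built by sbt assembly (the output path below assumes sbt-assembly's defaults) would then be submitted roughly like this; note that KafkaWordCount hard-codes setMaster("local[2]"), which should be removed so that --master takes effect:

spark-submit --class KafkaWordCount --master spark://192.168.168.200:7077 \
  target/scala-2.11/kafka-word-count-assembly-0.1.jar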
For details, see the following resources: