• WordCount: based on Kafka + Storm + HBase


    Description

    A word-count program that integrates Kafka, Storm, and HBase.
    Data source: Kafka, topic "test"
    Word counting: Storm
    Storage: the resulting counts are stored in HBase
    

    1. Analysis

    1.1 Storm topology

    In the topology, a KafkaSpout receives data from Kafka; each message it receives is a sentence (one line of text).
    A SentenceSplitBolt splits each sentence into individual words, a CountBolt counts how many times each word has occurred, and finally an HBase bolt stores the results in HBase.

    Kafka -> KafkaSpout -> SentenceSplitBolt -> CountBolt -> HBase bolt
    

    2. Implementation

    Experimental environment

    Two servers: hadoop1 and hadoop2

    CentOS-6.4                  hadoop1, hadoop2
    Hadoop-2.5-cdh-5.3.6        hadoop1
    kafka-2.10-0.8.1.1          hadoop2
    hbase-0.98.6-cdh-5.3.6      hadoop2 (HMaster), hadoop1 (RegionServer)
    storm-0.9.6                 hadoop2
    zookeeper-3.4.5-cdh5.3.6    hadoop2

    SentenceSplitBolt

    import java.util.Map;
    
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;
    
    public class SentenceSplitBolt extends BaseRichBolt {
    	static final Logger LOGGER = LoggerFactory.getLogger(SentenceSplitBolt.class);
    	
    	private OutputCollector collector;
    	
    	@Override
    	public void prepare(Map stormConf, TopologyContext context,
    			OutputCollector collector) {
    		this.collector = collector;		// store the collector for use in execute()
    	}
    
    	@Override
    	public void execute(Tuple input) {
    		// KafkaSpout with StringScheme declares "str" as its output field name
    		String sentence = input.getStringByField("str");
    		String[] words = sentence.split(" ");
    		
    		for (String word : words) {
    			collector.emit(new Values(word));	// emit each word as its own tuple
    		}
    		
    		// ack: the tuple has been processed successfully
    		collector.ack(input);
    	}
    
    	@Override
    	public void declareOutputFields(OutputFieldsDeclarer declarer) {
    		declarer.declare(new Fields("word"));
    	}
    }
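
    One detail worth noting: the emit in execute() is unanchored, so if a word tuple fails downstream, the original Kafka message will not be replayed. For at-least-once processing, the emit can anchor the word tuple to the input tuple, a one-line change sketched here:

    // anchored emit: a downstream failure propagates back to the KafkaSpout,
    // which then replays the original message
    collector.emit(input, new Values(word));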
    

    CountBolt

    import java.util.Hashtable;
    import java.util.Map;
    
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;
    
    public class CountBolt extends BaseRichBolt {
    	static final Logger LOGGER = LoggerFactory.getLogger(CountBolt.class);
    	private OutputCollector collector;
    	private Map<String, Integer> wordMap = new Hashtable<String, Integer>();
    	
    	@Override
    	public void prepare(Map stormConf, TopologyContext context,
    			OutputCollector collector) {
    		this.collector = collector;
    	}
    
    	@Override
    	public void execute(Tuple input) {
    		String word = input.getStringByField("word");
    		if (!wordMap.containsKey(word)) {
    			wordMap.put(word, 0);
    		}
    		
    		int count = wordMap.get(word);
    		count++;
    		wordMap.put(word, count);
    
    		// For easier testing, emit the count as a String so the value is
    		// human-readable when browsing the HBase table in Hue
    		collector.emit(new Values(word, String.valueOf(count)));
    		
    		// ack: the tuple has been processed successfully
    		collector.ack(input);
    	}
    
    	@Override
    	public void declareOutputFields(OutputFieldsDeclarer declarer) {
    		declarer.declare(new Fields("word", "count"));
    	}
    }
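
    Note that the counts live only in each CountBolt task's memory: fieldsGrouping on "word" (see WCTopology below) routes every occurrence of a given word to the same task, which is what makes the per-task map correct, but the counts reset if a worker restarts. The counting logic itself can be checked in isolation; a standalone sketch (CountLogicDemo is illustrative, not part of the topology):

    import java.util.HashMap;
    import java.util.Map;
    
    public class CountLogicDemo {
    	public static void main(String[] args) {
    		Map<String, Integer> wordMap = new HashMap<String, Integer>();
    		// simulate the word tuples CountBolt would receive
    		for (String word : new String[] { "storm", "kafka", "storm", "hbase", "storm" }) {
    			if (!wordMap.containsKey(word)) {
    				wordMap.put(word, 0);
    			}
    			wordMap.put(word, wordMap.get(word) + 1);
    		}
    		System.out.println(wordMap);	// e.g. {storm=3, hbase=1, kafka=1}
    	}
    }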
    

    WCTopology

    import java.util.Map;
    import java.util.UUID;
    
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    
    import com.google.common.collect.Maps;
    
    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.StormSubmitter;
    import backtype.storm.generated.AlreadyAliveException;
    import backtype.storm.generated.InvalidTopologyException;
    import backtype.storm.spout.SchemeAsMultiScheme;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;
    
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;
    
    import org.apache.storm.hbase.bolt.HBaseBolt;
    import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;
    
    public class WCTopology {
    	static Logger logger = LoggerFactory.getLogger(WCTopology.class);
    	
    	public static void main(String[] args) throws AlreadyAliveException, InvalidTopologyException, InterruptedException {
    		TopologyBuilder builder = new TopologyBuilder();
    
    		// Kafka spout: read topic "test" through the ZooKeeper on hadoop2,
    		// keeping consumer offsets under the zkRoot "/test"
    		SpoutConfig spoutConf = new SpoutConfig(new ZkHosts("hadoop2"), "test", "/test", UUID.randomUUID().toString());
    		spoutConf.forceFromStart = true;
    		spoutConf.scheme = new SchemeAsMultiScheme(new StringScheme());
    		
    		KafkaSpout kafkaSpout = new KafkaSpout(spoutConf);
    		
    		builder.setSpout("spout", kafkaSpout, 5);
    
    		builder.setBolt("split", new SentenceSplitBolt(), 8).shuffleGrouping("spout");
    		builder.setBolt("count", new CountBolt(), 12).fieldsGrouping("split", new Fields("word"));
    		
    		// map each tuple to HBase: row key = "word", column family "result", column "count"
    		SimpleHBaseMapper mapper = new SimpleHBaseMapper();
    		mapper.withColumnFamily("result");
    		mapper.withColumnFields(new Fields("count"));
    		mapper.withRowKeyField("word");
    		
    		Map<String, Object> map = Maps.newTreeMap();
    		map.put("hbase.rootdir", "hdfs://hadoop1:9000/hbase");
    		map.put("hbase.zookeeper.quorum", "hadoop2:2181");
    		
    		// hbase-bolt: reads its HBase settings from the topology config under the key "hbase.conf"
    		HBaseBolt hBaseBolt = new HBaseBolt("wordcount", mapper).withConfigKey("hbase.conf");
    		builder.setBolt("hbase", hBaseBolt, 6).shuffleGrouping("count");
    		
    		Config conf = new Config();
    		conf.setDebug(true);
    		conf.put("hbase.conf", map);
    
    		// set the remote nimbus host
    		// conf.put(Config.NIMBUS_HOST, "hadoop2");
    		// conf.put(Config.NIMBUS_THRIFT_PORT, 6627);
    		
    		// cluster mode
    		if (args != null && args.length > 0) {
    			conf.setNumWorkers(3);
    			StormSubmitter.submitTopologyWithProgressBar(args[0], conf, builder.createTopology());
    		}
    		else {		// local mode
    			conf.setMaxTaskParallelism(3);
    			LocalCluster cluster = new LocalCluster();
    			cluster.submitTopology("word-count", conf, builder.createTopology());
    		}
    	}
    }
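
    A small style note: the three mapper calls above work because SimpleHBaseMapper's with* methods mutate the mapper and return it, so they can equivalently be written as one chained expression:

    SimpleHBaseMapper mapper = new SimpleHBaseMapper()
    		.withRowKeyField("word")
    		.withColumnFields(new Fields("count"))
    		.withColumnFamily("result");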
    

    PrepareHbase: used to create the wordcount table in HBase

    import java.io.IOException;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.MasterNotRunningException;
    import org.apache.hadoop.hbase.ZooKeeperConnectionException;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    
    public class PrepareHbase {
    	public static void main(String[] args) throws MasterNotRunningException, ZooKeeperConnectionException, IOException {
    		Configuration conf = HBaseConfiguration.create();
    		conf.set("hbase.rootdir", "hdfs://hadoop1:9000/hbase");
    		conf.set("hbase.zookeeper.quorum", "hadoop2:2181");
    		
    		// create table "wordcount" with one column family, "result"
    		HBaseAdmin admin = new HBaseAdmin(conf);
    		HTableDescriptor tableDescriptor = new HTableDescriptor("wordcount");
    		tableDescriptor.addFamily(new HColumnDescriptor("result"));
    		admin.createTable(tableDescriptor);
    		admin.close();
    	}
    }
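
    One caveat: if PrepareHbase is run twice, createTable throws TableExistsException. A guard like the following avoids that (a sketch using the same HBaseAdmin instance as above):

    if (!admin.tableExists("wordcount")) {
    	admin.createTable(tableDescriptor);
    }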
    

    3. Testing

    1. Run PrepareHbase to create the wordcount table
    2. Run WCTopology

    Start a kafka-console-producer (a producer, not a consumer) and type in sentences to test
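
    Sentences can also be pushed from code rather than the console producer. A minimal sketch using the Kafka 0.8 producer API (the TestProducer class name, the broker address hadoop2:9092, and the sample sentence are assumptions, not from the original setup):

    import java.util.Properties;
    
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;
    
    public class TestProducer {
    	public static void main(String[] args) {
    		Properties props = new Properties();
    		props.put("metadata.broker.list", "hadoop2:9092");	// assumed broker address
    		props.put("serializer.class", "kafka.serializer.StringEncoder");
    		
    		Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
    		// send one test sentence to the topic the KafkaSpout reads from
    		producer.send(new KeyedMessage<String, String>("test", "storm kafka hbase storm"));
    		producer.close();
    	}
    }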

    In Hue, watch the count recorded for the word "storm"

    Type "storm" into the producer again and check that its count increases
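
    Instead of Hue, the stored value can also be read back through the HBase client API. A sketch (the CheckCount class is illustrative; the connection settings mirror PrepareHbase above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;
    
    public class CheckCount {
    	public static void main(String[] args) throws Exception {
    		Configuration conf = HBaseConfiguration.create();
    		conf.set("hbase.zookeeper.quorum", "hadoop2:2181");	// only the quorum is needed for a client read
    		
    		HTable table = new HTable(conf, "wordcount");
    		// row key = word, family "result", qualifier "count" (as written by SimpleHBaseMapper)
    		Result result = table.get(new Get(Bytes.toBytes("storm")));
    		byte[] value = result.getValue(Bytes.toBytes("result"), Bytes.toBytes("count"));
    		System.out.println("storm -> " + (value == null ? "not found" : Bytes.toString(value)));
    		table.close();
    	}
    }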

    4. Summary

    Storm is a real-time stream processor. In this experiment, Storm processes the messages coming from Kafka and saves the processed results to HBase.
