Storm — Grouping Strategies and Parallelism


5 Grouping Strategies and Parallelism

5.1 Thoughts on the File-Reading Example

1) Spout data sources: databases, files, MQ (e.g., Kafka)

2) Database as the source: only suitable for reading configuration-style data

3) File as the source: only suitable for testing and teaching (a real cluster is distributed, so a single local file cannot be shared)

4) Steps for processing log files produced in production (see the hedged sketch below):

       (1) Read the log content and write it into an MQ

       (2) Let Storm consume and process it from the MQ
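A hedged sketch of step (1), assuming Kafka as the MQ. The broker address localhost:9092, the topic name weblog, and the log path are illustrative placeholders, not values from the original:

package com.atguigu.storm.mq;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LogToKafka {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Step (1): read the log file and write each line to the MQ.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
             BufferedReader reader = new BufferedReader(new FileReader("e:/website.log"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                producer.send(new ProducerRecord<>("weblog", line));
            }
        }
        // Step (2) is then a Storm topology whose spout consumes the "weblog" topic.
    }
}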

5.2 Grouping Strategies (Stream Grouping)

A stream grouping defines how a stream's tuples should be distributed among the executors (threads, i.e., the parallel instances) of the bolts that consume it.

Storm has seven built-in types of stream grouping:

1) Shuffle Grouping: random grouping. Tuples are distributed round-robin across the bolt's tasks, guaranteeing that each task receives roughly the same number of tuples.

2) Fields Grouping: grouping by field. For example, when grouping by userid, tuples with the same userid always go to the same task of the bolt, while tuples with different userids may go to different tasks.

3) All Grouping: broadcast. Every tuple is replicated to all tasks of the bolt.

4) Global Grouping: the entire stream goes to exactly one task of the bolt, specifically the task with the lowest id.

5) None Grouping: this grouping says the stream does not care which task receives its tuples. It currently behaves the same as shuffle grouping, though the distribution across threads is not guaranteed to be even.

6) Direct Grouping: a special grouping in which the sender of a tuple decides which task of the receiving component processes it. It can only be used on streams declared as direct streams, and such tuples must be emitted with emitDirect. The receiver can obtain the ids of the tasks processing its messages via TopologyContext (OutputCollector.emit also returns the ids of the tasks a tuple was sent to). A hedged sketch follows this list.

7) Local or Shuffle Grouping: if the target bolt has one or more tasks in the same worker process, tuples are shuffled among just those in-process tasks; otherwise this behaves like an ordinary shuffle grouping.
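To make 6) concrete, here is a minimal, hedged sketch of a direct stream. The names DirectSpout and DirectBolt, the package, and the always-pick-the-first-task policy are our own illustration, not part of the original examples:

package com.atguigu.storm.direct;

import java.util.List;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class DirectSpout extends BaseRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;
    private List<Integer> targetTasks;

    @SuppressWarnings("rawtypes")
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        // Look up the task ids of the bolt we will address directly.
        this.targetTasks = context.getComponentTasks("DirectBolt");
    }

    @Override
    public void nextTuple() {
        // The sender picks the exact receiving task; here, always the first one.
        collector.emitDirect(targetTasks.get(0), new Values("hello"));
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // The boolean true marks the default stream as a direct stream.
        declarer.declare(true, new Fields("msg"));
    }
}

// Wiring in the driver -- the consumer must subscribe with directGrouping:
// builder.setBolt("DirectBolt", new DirectBolt(), 2).directGrouping("DirectSpout");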

8) Tests

         (1) Set the spout parallelism to 2 and the bolt parallelism to 1, using shuffleGrouping

builder.setSpout("WebLogSpout", new WebLogSpout(), 2);
builder.setBolt("WebLogBolt", new WebLogBolt(), 1).shuffleGrouping("WebLogSpout");

With two spout threads, the file is read twice and every record is printed twice. If the data source were a message queue, the data would not be read twice (within the same consumer group, each message goes to only one consumer).

    Thread-33-WebLogBolt-executor[1 1]lines:60  session_id:CYYH6Y2345GHI899OFG4V9U567

         (2) Set the spout parallelism to 1 and the bolt parallelism to 2, using noneGrouping

builder.setSpout("WebLogSpout", new WebLogSpout(), 1);
builder.setBolt("WebLogBolt", new WebLogBolt(), 2).noneGrouping("WebLogSpout");

Each bolt task receives a different subset of the records.

    Thread-33-WebLogBolt-executor[1 1]lines:14  session_id:VVVYH6Y4V4SFXZ56JIPDPB4V678

    Thread-34-WebLogBolt-executor[2 2]lines:16  session_id:VVVYH6Y4V4SFXZ56JIPDPB4V678

         (3) Set the spout parallelism to 1 and the bolt parallelism to 2, using fieldsGrouping

builder.setSpout("WebLogSpout", new WebLogSpout(), 1);
builder.setBolt("WebLogBolt", new WebLogBolt(), 2).fieldsGrouping("WebLogSpout", new Fields("log"));

The effect is hard to see in this web-log example; later examples show it much more clearly.

         (4) Set the spout parallelism to 1 and the bolt parallelism to 2, using allGrouping

builder.setSpout("WebLogSpout", new WebLogSpout(), 1);
builder.setBolt("WebLogBolt", new WebLogBolt(), 2).allGrouping("WebLogSpout");

Every bolt task receives an identical copy of the data.

    Thread-43-WebLogBolt-executor[1 1]lines:30  session_id:VVVYH6Y4V4SFXZ56JIPDPB4V678

    Thread-23-WebLogBolt-executor[2 2]lines:30  session_id:VVVYH6Y4V4SFXZ56JIPDPB4V678

         (5) Set the spout parallelism to 1 and the bolt parallelism to 2, using globalGrouping

builder.setSpout("WebLogSpout", new WebLogSpout(), 1);
builder.setBolt("WebLogBolt", new WebLogBolt(), 2).globalGrouping("WebLogSpout");

The bolt task with the lowest task id receives all of the data.

    Thread-28-WebLogBolt-executor[1 1]lines:30  session_id:VVVYH6Y4V4SFXZ56JIPDPB4V678

5.3 Parallelism

5.3.1 Scenario Analysis

1) Single-threaded: arithmetic, global aggregation

2) Multi-threaded: local (partial) arithmetic, persisting to a DB, etc.

(1) Question: how can we compute both the total word count and the distinct word count, and do so under high concurrency?

The former is a total record count; the latter is a deduplicated count of distinct words.

A similar business scenario: computing a website's PV and UV.

(2) The two most common website metrics:

PV (page views): count(session_id), i.e., the total number of page views.

UV (unique visitors): count(distinct session_id), i.e., the number of distinct visitors.

a) Measuring UV by IP address

This counts the number of distinct IP addresses that visit a site or click a page. Within a single day, UV records only the first visit from a given IP; further visits from the same IP on the same day are not counted again.

b) Measuring UV by cookie

When a client first visits a web server, the server sends the client a cookie, which is usually stored on the client's disk. The cookie is assigned a unique identifier and records information about the visit, such as the visit time and the pages viewed. On the next visit, the server finds the cookie it previously set and may update it, but the unique identifier never changes.

Real-time processing scenarios therefore fall into two main classes: aggregation (e.g., website PV, sales revenue, order count) and deduplication (e.g., website UV, distinct customers, distinct products sold). The sketch below contrasts the two counts on a tiny in-memory data set.
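A self-contained sketch contrasting the two counts; the session ids are invented sample data:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class PvUvDemo {

    public static void main(String[] args) {
        // One entry per page view, keyed by session id (invented sample data).
        List<String> views = Arrays.asList("s1", "s2", "s1", "s3", "s2", "s1");

        int pv = views.size();                  // PV: count(session_id)
        int uv = new HashSet<>(views).size();   // UV: count(distinct session_id)

        System.out.println("PV=" + pv + "  UV=" + uv);   // prints PV=6  UV=3
    }
}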

5.3.2 Parallelism

Parallelism: a component defined by the user can be executed by multiple threads; its parallelism equals its number of executors (threads).

A task is the concrete object holding the processing logic. One executor thread may run one or more tasks, but by default each executor runs exactly one task, which is why tasks are often equated with executing threads, even though they are not the same thing.

The task count is the maximum parallelism: a component's number of tasks never changes, while its number of executors can change at runtime (via the storm rebalance command). Always tasks >= executors, and the executor count is the actual parallelism. A short sketch follows.
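A hedged sketch of setting the two knobs independently; setNumTasks and the rebalance command are standard Storm APIs, while the names MyBolt and mytopology are placeholders:

// 2 executors now, but 4 tasks for the component's whole lifetime:
// the executor count can later be raised, at most up to the task count.
builder.setBolt("MyBolt", new MyBolt(), 2).setNumTasks(4);

// Change the executor count at runtime without resubmitting the topology:
//   storm rebalance mytopology -n 2 -e MyBolt=4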

5.4 Hands-On Examples

5.4.1 Real-Time Word Count Example

1) Requirement

       Count, in real time, the total occurrences of the words emitted into the Storm framework.

2) Analysis

Design a topology that counts the frequency of each word appearing in the input.

The topology has three parts:

(1) WordCountSpout: the data source; it emits one sentence at a time, drawn from a known set of English sentences.

(2) WordCountSplitBolt: splits each line of text (a sentence) into words.

(3) WordCountBolt: accumulates the frequency of each word.

(1) Create the spout

package com.atguigu.storm.wordcount;

import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class WordCountSpout extends BaseRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;

    @Override
    public void nextTuple() {
        // 1 Emit a line of mock data
        collector.emit(new Values("i am ximen love jinlian"));

        // 2 Sleep for 2 seconds
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    @SuppressWarnings("rawtypes")
    @Override
    public void open(Map arg0, TopologyContext arg1, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("love"));
    }
}

(2) Create the bolt that splits lines into words

package com.atguigu.storm.wordcount;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountSplitBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    private OutputCollector collector;

    @Override
    public void execute(Tuple input) {
        // 1 Get the incoming line
        String line = input.getString(0);

        // 2 Split it into words
        String[] arrWords = line.split(" ");

        // 3 Emit each word with a count of 1
        for (String word : arrWords) {
            collector.emit(new Values(word, 1));
        }
    }

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map arg0, TopologyContext arg1, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "num"));
    }
}

(3) Create the bolt that aggregates the word counts

package com.atguigu.storm.wordcount;

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class WordCountBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    private Map<String, Integer> map = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple input) {
        // 1 Get the incoming word and count
        String word = input.getString(0);
        Integer num = input.getInteger(1);

        // 2 Accumulate the count for this word
        if (map.containsKey(word)) {
            Integer count = map.get(word);
            map.put(word, count + num);
        } else {
            map.put(word, num);
        }
        System.err.println(Thread.currentThread().getId() + "  word:" + word + "  num:" + map.get(word));
    }

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map arg0, TopologyContext arg1, OutputCollector collector) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer arg0) {
        // Final bolt: emits nothing downstream
    }
}

(4) Create the topology driver (main)

package com.atguigu.storm.wordcount;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

public class WordCountMain {

    public static void main(String[] args) {
        // 1 Prepare a TopologyBuilder
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("WordCountSpout", new WordCountSpout(), 1);
        builder.setBolt("WordCountSplitBolt", new WordCountSplitBolt(), 2).shuffleGrouping("WordCountSpout");
        builder.setBolt("WordCountBolt", new WordCountBolt(), 4).fieldsGrouping("WordCountSplitBolt", new Fields("word"));

        // 2 Create a Config specifying how many workers the topology needs
        Config conf = new Config();
        conf.setNumWorkers(2);

        // 3 Submit the topology -- two modes: local and cluster
        if (args.length > 0) {
            try {
                // 4 Cluster (distributed) submission
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            // 5 Local-mode submission
            LocalCluster localCluster = new LocalCluster();
            localCluster.submitTopology("wordcounttopology", conf, builder.createTopology());
        }
    }
}

(5) Test

Thread 132 processes only the words am and love; thread 169 processes i, ximen, and jinlian. Each word always goes to the same thread: this is the benefit of fields grouping.

132  word:am  num:1
132  word:love  num:1
169  word:i  num:1
169  word:ximen  num:1
169  word:jinlian  num:1

5.4.2 Real-Time Website PV Example

0) Prerequisites

1) Requirement

       Count the website's PV.

2) Analysis

Scheme one:

Define a static long pv and guard the increment with synchronized. (Not feasible.)

Reason: synchronized and Lock work within a single JVM, but not across the multiple JVMs of a Storm cluster. A sketch of the pitfall follows.
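A hedged sketch of the pitfall, assuming the topology runs with conf.setNumWorkers(2); the class name NaivePVBolt is our own:

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class NaivePVBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    // One copy of this static field exists PER WORKER JVM. With two workers
    // the topology keeps two independent counters, so no JVM ever sees the
    // true global PV.
    private static long pv = 0;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    }

    @Override
    public void execute(Tuple input) {
        synchronized (NaivePVBolt.class) {
            pv++; // safe across the threads of THIS JVM only
        }
        System.err.println("local pv:" + pv);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}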

Scheme two:

Under shuffleGrouping, report pv * (number of executors).

Configured in the driver as follows:

builder.setSpout("PVSpout", new PVSpout(), 1);
builder.setBolt("PVBolt1", new PVBolt1(), 4).shuffleGrouping("PVSpout");

When printing inside PVBolt1:

System.err.println("threadid:" + Thread.currentThread().getId() + "  pv:" + pv * 4);

This works because shuffleGrouping distributes tuples round-robin, so each of the 4 executors sees roughly one quarter of the stream.

Pros: simple, minimal computation

Cons: slightly inaccurate, but acceptable in most scenarios

Scheme three:

PVBolt1 performs local aggregation across multiple executors; a single-threaded PVSumBolt performs the global aggregation.

Thread safety: the multi-threaded result must match the single-threaded result.

Pros: exactly accurate; with fieldsGrouping you can also obtain intermediate values, such as a single user's PV (visit depth, etc.)

Cons: slightly more computation, and one extra bolt

3) Implementation

(1) Create the data source, PVSpout

package com.atguigu.storm.pv;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class PVSpout implements IRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;
    private BufferedReader reader;

    @SuppressWarnings("rawtypes")
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream("e:/website.log"), "UTF-8"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void close() {
        try {
            if (reader != null) {
                reader.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void activate() {
    }

    @Override
    public void deactivate() {
    }

    private String str;

    @Override
    public void nextTuple() {
        // Emit one log line every 500 ms until the file is exhausted
        try {
            while ((str = reader.readLine()) != null) {
                collector.emit(new Values(str));
                Thread.sleep(500);
            }
        } catch (Exception e) {
            // swallow read/interrupt exceptions (demo code)
        }
    }

    @Override
    public void ack(Object msgId) {
    }

    @Override
    public void fail(Object msgId) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("log"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

(2) Create the processing bolt, PVBolt1

package com.atguigu.storm.pv;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class PVBolt1 implements IRichBolt {

    private static final long serialVersionUID = 1L;
    private OutputCollector collector;
    private long pv = 0;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // Get the incoming log line
        String logline = input.getString(0);
        // Extract the session id
        String session_id = logline.split(" ")[1];
        // Count the line toward this thread's local PV
        if (session_id != null) {
            pv++;
        }
        // Emit this thread's id together with its local PV
        collector.emit(new Values(Thread.currentThread().getId(), pv));
        System.err.println("threadid:" + Thread.currentThread().getId() + "  pv:" + pv);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("threadID", "pv"));
    }

    @Override
    public void cleanup() {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

(3) Create PVSumBolt

package com.atguigu.storm.pv;

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;

public class PVSumBolt implements IRichBolt {

    private static final long serialVersionUID = 1L;
    private Map<Long, Long> counts = new HashMap<>();

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    }

    @Override
    public void execute(Tuple input) {
        // Record the latest local PV reported by each upstream thread
        Long threadID = input.getLong(0);
        Long pv = input.getLong(1);
        counts.put(threadID, pv);

        // Global PV = sum of the latest local PVs
        long word_sum = 0;
        Iterator<Long> iterator = counts.values().iterator();
        while (iterator.hasNext()) {
            word_sum += iterator.next();
        }
        System.err.println("pv_all:" + word_sum);
    }

    @Override
    public void cleanup() {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

(4) Create the driver

package com.atguigu.storm.pv;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class PVMain {

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        builder.setSpout("PVSpout", new PVSpout(), 1);
        builder.setBolt("PVBolt1", new PVBolt1(), 4).shuffleGrouping("PVSpout");
        builder.setBolt("PVSumBolt", new PVSumBolt(), 1).shuffleGrouping("PVBolt1");

        Config conf = new Config();
        conf.setNumWorkers(2);

        if (args.length > 0) {
            try {
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("pvtopology", conf, builder.createTopology());
        }
    }
}

(5) Test: running the program produces the following output

threadid:161  pv:1
pv_all:1
threadid:164  pv:1
pv_all:2
threadid:161  pv:2
pv_all:3
threadid:172  pv:1
pv_all:4
threadid:164  pv:2
pv_all:5
threadid:164  pv:3
pv_all:6
threadid:162  pv:1
pv_all:7
threadid:161  pv:3
pv_all:8
threadid:172  pv:2
pv_all:9
threadid:164  pv:4
pv_all:10
threadid:162  pv:2
pv_all:11
threadid:172  pv:3
pv_all:12
threadid:164  pv:5
pv_all:13
threadid:164  pv:6
pv_all:14
threadid:161  pv:4
pv_all:15
threadid:161  pv:5
pv_all:16
threadid:164  pv:7
pv_all:17
threadid:172  pv:4
pv_all:18
threadid:172  pv:5
pv_all:19
threadid:161  pv:6
pv_all:20
threadid:162  pv:3
pv_all:21
threadid:164  pv:8
pv_all:22
threadid:172  pv:6
pv_all:23
threadid:164  pv:9
pv_all:24
threadid:161  pv:7
pv_all:25
threadid:162  pv:4
pv_all:26
threadid:162  pv:5
pv_all:27
threadid:162  pv:6
pv_all:28
threadid:164  pv:10
pv_all:29
threadid:161  pv:8
pv_all:30

5.4.3 Real-Time Website UV (Deduplication) Example

1) Requirement:

       Count the website's UV.

2) Analysis

Scheme one:

Put each ip into a Set, which deduplicates automatically, and take Set.size() as the UV. (Not feasible in a distributed application, where no single JVM sees all the data.)

Scheme two:

UVBolt1 extracts the ip in parallel across multiple threads; the downstream single-threaded UVSumBolt performs the global deduplicated aggregation, counting UV by ip address.

Since we deduplicate, the already-seen data must be kept somewhere:

(1) In memory: a map data structure (sketched below)

(2) A distributed NoSQL database, such as HBase
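As a hedged aside on option (1): in the single-threaded summary bolt, the in-memory structure can be as simple as a HashSet whose size is the UV. The full example below instead keeps a per-ip hit count in a HashMap, so there UV is the number of distinct keys. A minimal sketch (the class name UVSetBolt is our own; the field name "word" matches what UVBolt1 declares below):

import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class UVSetBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    // Runs with parallelism 1 (single thread), so a plain HashSet is safe.
    private Set<String> ips = new HashSet<>();

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    }

    @Override
    public void execute(Tuple input) {
        ips.add(input.getStringByField("word"));  // dedup by ip
        System.err.println("uv:" + ips.size());   // UV = distinct ip count
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}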

     

3) Implementation

(1) Create GenerateData, which generates log records with ip addresses

package com.atguigu.storm.uv;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Random;

public class GenerateData {

    public static void main(String[] args) {
        // Create the log file
        File logFile = new File("e:/website.log");
        // Create it if it does not exist yet
        if (!logFile.exists()) {
            try {
                logFile.createNewFile();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        Random random = new Random();
        // 1 Site name
        String[] hosts = { "www.atguigu.com" };
        // 2 Session ids
        String[] session_id = { "ABYH6Y4V4SCVXTG6DPB4VH9U123", "XXYH6YCGFJYERTT834R52FDXV9U34",
                "BBYH61456FGHHJ7JL89RG5VV9UYU7", "CYYH6Y2345GHI899OFG4V9U567", "VVVYH6Y4V4SFXZ56JIPDPB4V678" };
        // 3 Visit times
        String[] time = { "2017-08-07 08:40:50", "2017-08-07 08:40:51", "2017-08-07 08:40:52", "2017-08-07 08:40:53",
                "2017-08-07 09:40:49", "2017-08-07 10:40:49", "2017-08-07 11:40:49", "2017-08-07 12:40:49" };
        // 4 Visitor ip addresses
        String[] ip = { "192.168.1.101", "192.168.1.102", "192.168.1.103", "192.168.1.104", "192.168.1.105",
                "192.168.1.106", "192.168.1.107", "192.168.1.108" };

        // Assemble 30 random log records, one per line so the spout's
        // readLine() sees one record at a time
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < 30; i++) {
            sb.append(hosts[0] + " " + session_id[random.nextInt(5)] + " " + time[random.nextInt(8)] + " "
                    + ip[random.nextInt(8)] + "\n");
        }
        // Write the data out
        try {
            FileOutputStream fs = new FileOutputStream(logFile);
            fs.write(sb.toString().getBytes());
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

(2) Create UVSpout, which reads the data

package com.atguigu.storm.uv;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Map;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

public class UVSpout implements IRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;
    private BufferedReader reader;

    @SuppressWarnings("rawtypes")
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            reader = new BufferedReader(new InputStreamReader(new FileInputStream("e:/website.log"), "UTF-8"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void close() {
        try {
            if (reader != null) {
                reader.close();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    @Override
    public void activate() {
    }

    @Override
    public void deactivate() {
    }

    private String str;

    @Override
    public void nextTuple() {
        // Emit one log line every 500 ms until the file is exhausted
        try {
            while ((str = reader.readLine()) != null) {
                collector.emit(new Values(str));
                Thread.sleep(500);
            }
        } catch (Exception e) {
            // swallow read/interrupt exceptions (demo code)
        }
    }

    @Override
    public void ack(Object msgId) {
    }

    @Override
    public void fail(Object msgId) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("log"));
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

(3) Create UVBolt1

package com.atguigu.storm.uv;

import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class UVBolt1 extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    private OutputCollector collector;

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        // 1 Get the incoming log line
        String line = input.getString(0);

        // 2 Extract the ip address (field 3 of the record)
        String[] splits = line.split(" ");
        String ip = splits[3];

        // 3 Emit the ip with a count of 1
        collector.emit(new Values(ip, 1));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "num"));
    }
}

(4) Create UVSumBolt

package com.atguigu.storm.uv;

import java.util.HashMap;
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class UVSumBolt extends BaseRichBolt {

    private static final long serialVersionUID = 1L;
    private Map<String, Integer> map = new HashMap<String, Integer>();

    @SuppressWarnings("rawtypes")
    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
    }

    @Override
    public void execute(Tuple input) {
        // 1 Get the incoming ip and count
        String ip = input.getString(0);
        Integer num = input.getInteger(1);

        // 2 Accumulate hits per ip; the number of distinct keys is the UV
        if (map.containsKey(ip)) {
            Integer count = map.get(ip);
            map.put(ip, count + num);
        } else {
            map.put(ip, num);
        }
        System.err.println(Thread.currentThread().getId() + "  ip:" + ip + "  num:" + map.get(ip));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }
}

(5) Create the UVMain driver

package com.atguigu.storm.uv;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class UVMain {

    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();

        builder.setSpout("UVSpout", new UVSpout(), 1);
        builder.setBolt("UVBolt1", new UVBolt1(), 4).shuffleGrouping("UVSpout");
        builder.setBolt("UVSumBolt", new UVSumBolt(), 1).shuffleGrouping("UVBolt1");

        Config conf = new Config();
        conf.setNumWorkers(2);

        if (args.length > 0) {
            try {
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } catch (Exception e) {
                e.printStackTrace();
            }
        } else {
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("uvtopology", conf, builder.createTopology());
        }
    }
}

(6) Test

136  ip:192.168.1.101  num:1
136  ip:192.168.1.107  num:1
136  ip:192.168.1.108  num:1
136  ip:192.168.1.103  num:1
136  ip:192.168.1.105  num:1
136  ip:192.168.1.102  num:1
136  ip:192.168.1.103  num:2
136  ip:192.168.1.101  num:2
136  ip:192.168.1.107  num:2
136  ip:192.168.1.105  num:2
136  ip:192.168.1.101  num:3
136  ip:192.168.1.108  num:2
136  ip:192.168.1.108  num:3
136  ip:192.168.1.105  num:3
136  ip:192.168.1.101  num:4
136  ip:192.168.1.108  num:4
136  ip:192.168.1.102  num:2
136  ip:192.168.1.104  num:1
136  ip:192.168.1.103  num:3
136  ip:192.168.1.106  num:1
136  ip:192.168.1.105  num:4
136  ip:192.168.1.101  num:5
136  ip:192.168.1.102  num:3
136  ip:192.168.1.108  num:5
136  ip:192.168.1.103  num:4
136  ip:192.168.1.107  num:3
136  ip:192.168.1.102  num:4
136  ip:192.168.1.104  num:2
136  ip:192.168.1.105  num:5
136  ip:192.168.1.104  num:3

Test result: 8 distinct users in total (UV = 8). Visit counts: .101: 5; .102: 4; .103: 4; .104: 3; .105: 5; .106: 1; .107: 3; .108: 5.

     

     
