• Reading the oplog with MongoShake for synchronization, and fixing the garbled-data problem


    MongoShake is an open-source tool for MongoDB data migration and synchronization, and it supports writing to a variety of downstream targets.

    Details: https://github.com/alibaba/MongoShake

    Our use case is writing the oplog into Kafka. If you configure the Kafka endpoint directly in collector.conf, the data written to Kafka comes out garbled.

    The official explanation is that the oplog collected by the collector still carries control information, so anything written straight to Kafka must have that information stripped before use.

    The MongoShake release package ships a receiver component for stripping the control information.

    mongo --> collector --> kafka --> receiver --> business consumers

    mongo --> collector --> receiver --> kafka --> business consumers

    The second layout is preferable here.

    For the collector --> receiver hop I use the tcp tunnel.

    Configuration:

    collector.conf

    tunnel = tcp
    tunnel.address = 127.0.0.1:9300

    receiver.conf

    tunnel = tcp
    tunnel.address = 127.0.0.1:9300

    This looks odd at first, because there is still nowhere to configure Kafka; with this setup all the stripped oplog entries simply end up in the receiver's log.

    The official answer is that we are expected to modify and rebuild the source. It is written in Go, so the changes are straightforward to make.

      Download the official source.

    src/mongoshake/receiver/replayer.go

    Modify the handler() function:

    /*
     * Users should modify this function according to different demands.
     */
    func (er *ExampleReplayer) handler() {
    	// note: this version needs "encoding/json" and
    	// "github.com/Shopify/sarama" in the file's import block
    	config := sarama.NewConfig() // kafka producer configuration
    	config.Producer.RequiredAcks = sarama.WaitForAll
    	config.Producer.Return.Successes = true
    	kafkaClient, err := sarama.NewSyncProducer([]string{conf.Options.KafkaHost}, config)
    	if err != nil {
    		_ = LOG.Error("create kafka producer failed: %v", err)
    		return
    	}
    	defer kafkaClient.Close()
    
    	for msg := range er.pendingQueue {
    		if len(msg.message.RawLogs) == 0 {
    			// probe request
    			continue
    		}
    
    		// parse batched message
    		oplogs := make([]*oplog.PartialLog, len(msg.message.RawLogs))
    		for i, raw := range msg.message.RawLogs {
    			oplogs[i] = new(oplog.PartialLog)
    			if err := bson.Unmarshal(raw, oplogs[i]); err != nil {
    				// impossible switch, need panic and exit
    				LOG.Crashf("unmarshal oplog[%v] failed[%v]", raw, err)
    				return
    			}
    			oplogs[i].RawSize = len(raw)
    
    			// build a trimmed, JSON-friendly view of the oplog entry
    			kafkaOpLog := KafkaOpLog{}
    			kafkaOpLog.Namespace = oplogs[i].Namespace
    			kafkaOpLog.Query = oplogs[i].Query
    			kafkaOpLog.Object = oplogs[i].Object.Map()
    			kafkaOpLog.Operation = oplogs[i].Operation
    			kafkaOpLog.Timestamp = oplogs[i].Timestamp
    
    			encoded, err := json.Marshal(kafkaOpLog)
    			if err != nil {
    				_ = LOG.Error("marshal oplog to json failed: %v", err)
    				continue
    			}
    
    			pmsg := &sarama.ProducerMessage{
    				Topic: conf.Options.KafkaTopic,
    				Key:   sarama.StringEncoder(kafkaOpLog.Namespace),
    				Value: sarama.ByteEncoder(encoded),
    			}
    			if _, _, err = kafkaClient.SendMessage(pmsg); err != nil {
    				_ = LOG.Error("send message to kafka failed: %v", err)
    				return
    			}
    			// the stock implementation only printed the entry here:
    			//LOG.Info(oplogs[i]) // just print for test, users can modify to fulfill different needs
    		}
    
    		if callback := msg.completion; callback != nil {
    			callback() // exec callback
    		}
    
    		// get the newest timestamp
    		n := len(oplogs)
    		lastTs := utils.TimestampToInt64(oplogs[n-1].Timestamp)
    		er.Ack = lastTs
    
    		LOG.Debug("handle ack[%v]", er.Ack)
    
    		// add logical code below
    	}
    }
       
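
    Note that the handler reads conf.Options.KafkaHost and conf.Options.KafkaTopic, which are not stock receiver options; matching entries presumably also have to be added to the receiver's option struct and its conf file. The key names below are my assumption, not MongoShake's:

```ini
# hypothetical additions to receiver.conf; the key names must match
# whatever config tags are added to the receiver's option struct
kafka.host = 127.0.0.1:9092
kafka.topic = mongo_oplog
```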

    Then run go build and use the resulting receiver binary.

  • Original post: https://www.cnblogs.com/zhaosc-haha/p/12021444.html