• 【原创】大数据基础之Logstash(6)mongo input


    logstash input插件之mongodb是第三方的,配置如下:

    input {
      mongodb {
        uri => 'mongodb://mongo_server:27017/db'
        placeholder_db_dir => '/path/to/db_dir/'
        placeholder_db_name => 'table.db'
        collection => 'table'
        batch_size => 5000
      }
    }

    安装

    ./logstash-plugin install logstash-input-mongodb

    插件实现非常简单,就一个ruby文件,

    https://github.com/phutchins/logstash-input-mongodb/blob/master/lib/logstash/inputs/mongodb.rb

    使用sqlite来维护状态,db文件目录在 placeholder_db_dir,可以直接通过sqlite命令查看和修改

    # sqlite3 /path/to/db_dir/table.db

    db结构

    sqlite> .schema
    CREATE TABLE `since_table` (`table` varchar(255), `place` Int);
    sqlite> select * from since_table order by place desc limit 1;
    logstash_since_table|5d0b2c2682b7d74de069ce4d

    插件中取place代码

      public
      def get_placeholder(sqlitedb, since_table, mongodb, mongo_collection_name)
        since = sqlitedb[SINCE_TABLE]
        x = since.where(:table => "#{since_table}_#{mongo_collection_name}")
        if x[:place].nil? || x[:place] == 0
          first_entry_id = init_placeholder(sqlitedb, since_table, mongodb, mongo_collection_name)
          @logger.debug("FIRST ENTRY ID for #{mongo_collection_name} is #{first_entry_id}")
          return first_entry_id
        else
          @logger.debug("placeholder already exists, it is #{x[:place]}")
          return x[:place][:place]
        end
      end

    place取自mongo的_id

    > db.table.find().limit(1).pretty()
    {
            "_id" : ObjectId("5b48cd2382b7d752b802de31"),
    ...

    可以手工通过sqlite的update命令来操作进度;

    同步过程日志

    D, [2019-06-20T16:21:31.938302 #28968] DEBUG -- : MONGODB | 47.92.149.159:27017 | db.find | STARTED | {"find"=>"table", "filter"=>{"_id"=>{"$gt"=>BSON::ObjectId('5d0b420782b7d74de069db7b')}}, "limit"=>10000}
    D, [2019-06-20T16:21:31.941658 #28968] DEBUG -- : MONGODB | 47.92.149.159:27017 | db.find | SUCCEEDED | 0.002s

    读place,从place开始取10000条,然后写place,如此往复

    参考:https://github.com/phutchins/logstash-input-mongodb

  • 相关阅读:
    poj 2378 删点最大分支不超过一半
    poj 3107 删点最大分支最小
    hdu 3586 最小代价切断根与叶子联系
    hdu 4276 树形m时间1进n出
    hdu 4044 树形DP 炮台打怪 (好题)
    poj 3345 树形DP 附属关系+输入输出(好题)
    poj 2486 树形DP n选m连续路径
    hdu 1561 树形DP n个选m个价值最大
    poj 1947 树形背包
    hdu 1011 树形背包
  • 原文地址:https://www.cnblogs.com/barneywill/p/11058828.html
Copyright © 2020-2023  润新知