• Indexing mongodb data with elasticsearch


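    Reference page: setting up an elasticsearch river for mongodb on a single machine

    Three steps:

    1. Set up a single-node replica set
    2. Install the mongodb-river plugin
    3. Create the river meta and verify that it works

    Step 1: Set up a single-node mongodb replica set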

    1. Configure /etc/mongodb.conf
    Add two settings:

    replSet=rs0 # name of the replica set
    oplogSize=100 # size of the oplog in MB (values that are too large are not supported)

    Start mongodb: bin/mongod --fork --logpath /data/db/mongodb.log -f /etc/mongodb.conf

    2. Initialize the replica set

    root# bin/mongo
    >rs.initiate( {"_id" : "rs0", "version" : 1, "members" : [ { "_id" : 0, "host" : "127.0.0.1:27017" } ]}) 

    3. Once the replica set is initialized, exit the mongo shell and log in again. The prompt changes to:

    rs0:PRIMARY>

    Step 2: Install the mongodb-river plugin

    Plugin project: https://github.com/richardwilly98/elasticsearch-river-mongodb
    Install it with:

    bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/2.0.0

    Then start elasticsearch. On a normal start it prints messages like these:

    root# bin/elasticsearch
    
    ...
    [2014-03-14 19:28:34,179][INFO ][plugins] [Super Rabbit] loaded [mongodb-river], sites [river-mongodb]
    [2014-03-14 19:28:41,032][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] Starting river mongodb_test
    [2014-03-14 19:28:41,087][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB River Plugin - version[2.0.0] - hash[a0c23f1] - time[2014-02-23T20:40:05Z]
    [2014-03-14 19:28:41,087][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] starting mongodb stream. options: secondaryreadpreference [false], drop_collection [false], include_collection [], throttlesize [5000], gridfs [false], filter [null], db [test], collection [page], script [null], indexing to [test]/[page]
    [2014-03-14 19:28:41,303][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver] MongoDB version - 2.2.7

    Step 3: Create the river meta

    1. Create the mongodb connection

    root# curl -XPUT "localhost:9200/_river/mongodb_mytest/_meta" -d ' 
    > {
    > "type": "mongodb", 
    > "mongodb": { 
    > "host": "localhost", 
    > "port": "27017", 
    > "db": "testdb", 
    > "collection": "testcollection" 
    > }, 
    > "index": { 
    > "name": "testdbindex", 
    > "type": "testcollection"} }'
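    {"_index":"_river","_type":"mongodb_mytest","_id":"_meta","_version":1,"created":true}
    A response with created set to true means the river was created. You can also inspect it with curl "http://localhost:9200/_river/mongodb_mytest/_meta"

    The meta has three main parts:

    type: the river type, which is "mongodb" here
    mongodb: the mongodb connection settings
    index: the elasticsearch index and type that receive the mongodb data

    Here mongodb_mytest is ${es.river.name}. Each one needs a distinct name; inserting again under a name that already exists overwrites the existing entry.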
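
    The three-part meta above can also be assembled in code. The following is a minimal sketch that only builds the URL and the JSON body and does not send anything. The class and method names (RiverMetaSketch, metaUrl, metaBody) are made up for illustration and are not part of the river plugin, and the tiny JSON writer assumes that no value needs escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of the river plugin): it only assembles the
// URL and JSON body that the curl command above sends.
final class RiverMetaSketch {

    // _meta endpoint for a given river name, e.g. mongodb_mytest.
    static String metaUrl(String esBase, String riverName) {
        return esBase + "/_river/" + riverName + "/_meta";
    }

    // Renders a flat map as a JSON object of string values.
    // Assumption: keys and values need no JSON escaping.
    static String jsonObject(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        String sep = "";
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(sep).append('"').append(e.getKey()).append("\":\"")
              .append(e.getValue()).append('"');
            sep = ",";
        }
        return sb.append('}').toString();
    }

    // Body with the three parts described above: type, mongodb, index.
    static String metaBody(String host, String port, String db,
                           String collection, String indexName, String typeName) {
        Map<String, String> mongodb = new LinkedHashMap<>();
        mongodb.put("host", host);
        mongodb.put("port", port);
        mongodb.put("db", db);
        mongodb.put("collection", collection);
        Map<String, String> index = new LinkedHashMap<>();
        index.put("name", indexName);
        index.put("type", typeName);
        return "{\"type\":\"mongodb\",\"mongodb\":" + jsonObject(mongodb)
                + ",\"index\":" + jsonObject(index) + "}";
    }

    public static void main(String[] args) {
        System.out.println("PUT " + metaUrl("http://localhost:9200", "mongodb_mytest"));
        System.out.println(metaBody("localhost", "27017", "testdb",
                "testcollection", "testdbindex", "testcollection"));
    }
}
```

    Keeping the river name as a parameter makes it harder to reuse a name by accident, which is the overwrite case mentioned above.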

    2. Insert data into mongodb

    rs0:PRIMARY> db.testcollection.save({name:"stone"})

    3. Run a query

    root# curl -XGET 'http://localhost:9200/testdbindex/_search?q=name:stone'
    {"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"testdbindex","_type":"testcollection","_id":"5322eb23fdfc233ffcfa02bb","_score":0.30685282, "_source" : {"_id":"5322eb23fdfc233ffcfa02bb","name":"stone"}}]}}
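
    The q=name:stone form used above is a URI search, so the value has to be URL-safe once it contains spaces or other reserved characters. A small sketch of building such a URL follows. SearchUrlSketch and searchUrl are hypothetical names, and the sketch only builds the string without contacting elasticsearch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the URI-search URL used by the curl example.
final class SearchUrlSketch {

    // Encodes "field:value" so it is safe to place in the q parameter.
    static String searchUrl(String esBase, String index, String field, String value) {
        try {
            String q = URLEncoder.encode(field + ":" + value, "UTF-8");
            return esBase + "/" + index + "/_search?q=" + q;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("http://localhost:9200", "testdbindex", "name", "stone"));
    }
}
```

    The colon is percent-encoded as %3A, so the printed URL ends in q=name%3Astone. The server decodes it back to name:stone.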

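    One issue (I could not reproduce it in my test: data that already existed in mongodb before the meta was created was indexed too. I am still including the original author's workaround below.)

    "Changes made after the river is created show up in elasticsearch, but changes made before the river existed are not in the oplog and therefore cannot be synced. The workaround is to iterate over the collection you want to export, reinsert its documents into another collection, and point the river at the new collection, so that every change to the new collection is recorded in the oplog."

    You can iterate over a mongodb collection with a cursor: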

    var myCursor = db.oldcollection.find( { }, {html:0} ); 
    myCursor.forEach(function(myDoc) {db.newcollection.save(myDoc); });
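
    Note that the projection {html:0} in the find call excludes the html field, so the copies in newcollection do not carry it. The sketch below models the same loop in memory with plain Java maps standing in for the two collections. This is an assumption made purely for illustration: no mongodb driver is involved, and CopyWithoutFieldSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the cursor loop above; maps stand in for documents.
final class CopyWithoutFieldSketch {

    // Copies every document, dropping one field, like find({}, {field: 0}).
    static List<Map<String, Object>> copyExcluding(List<Map<String, Object>> source,
                                                   String excludedField) {
        List<Map<String, Object>> target = new ArrayList<>();
        for (Map<String, Object> doc : source) {
            Map<String, Object> copy = new LinkedHashMap<>(doc); // source stays untouched
            copy.remove(excludedField);
            target.add(copy);
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "532a45ad94af83f0122292cf");
        doc.put("name", "stone");
        doc.put("html", "<p>large page body</p>");
        List<Map<String, Object>> oldCollection = new ArrayList<>();
        oldCollection.add(doc);
        System.out.println(copyExcluding(oldCollection, "html"));
    }
}
```

    If the html field should be indexed as well, the projection would have to be left out of the find call.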

    Appendix: mongodb & mongodb-river (elasticsearch) deployment

    Some elasticsearch usage examples follow (an index corresponds to a database, a type to a table):

    1. Fetch a single indexed document
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/532a45ad94af83f0122292cf'
    {"_index":"testdbindex","_type":"testcollection","_id":"532a45ad94af83f0122292cf","_version":1,"found":true, "_source" : {"_id":"532a45ad94af83f0122292cf","name":"stone"}}
    
    2. Fetch several indexed documents
    curl 'localhost:9200/testdbindex/testcollection/_mget' -d '{  
        "ids":["532a40f51d82291684692d1d","532a45ad94af83f0122292cf"]  
    }'  
    
    3. Retrieve specific fields (similar to columns in a relational database)
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/532a40f51d82291684692d1d?fields=title'
    
    4. Search
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search' -d '{  
        "query":{  
            "term" : {"name":"penjin"}  
        }  
    }'
    
    5. Search for name=stone across all types
    curl -XGET 'http://localhost:9200/testdbindex/_search?q=name:stone'
    
    6. Search only within the type testcollection
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search?q=name:stone'
    
    7. Count matching documents
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_count?q=name:stone'
    curl -XGET 'http://localhost:9200/testdbindex/_count?q=name:stone'
    
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_count' -d '
    {
        "query" : {    
            "term" : { "name" : "stone" }
        }
    }'
    
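    8. More complex queries
    /**
    * 1. Set the starting offset and the number of results
    * 2. Set the sort order
    * 3. Return only the specified fields
    * 4. Query condition
    */
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search' -d '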
    {
        "from" : 0, "size" : 10,
        "sort" : [
            { "name" : "desc" }
        ],
        "fields" : ["name"],
        "query" : {    
            "term" : { "name" : "stone" }
        }
    }'
    /**
    * Relies on analysis (tokenization)
    */
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search' -d '
    {
        "query" : {    
            "match" : {
                "_all" : "stone"
            }
        }
    }'
    /**
    * Similar to a database LIKE statement
    */
    curl -XGET 'http://localhost:9200/testdbindex/testcollection/_search' -d '
    {
        "query" : {    
            "fuzzy_like_this" : {
                "fields" : ["name"],
                "like_text" : "ston",
                "max_query_terms" : 12
            }
        }
    }'
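
    The first query in item 8 combines paging (from, size), sorting, field selection and a term query. As a rough sketch of how the from value relates to a page number, the helper below builds the same request body for a given zero-based page. SearchBodySketch and its methods are hypothetical names, and the string concatenation assumes that the field and value need no JSON escaping.

```java
// Hypothetical helper: builds the paged term-query body from item 8.
final class SearchBodySketch {

    // Offset of the first hit on a zero-based page.
    static int fromForPage(int page, int size) {
        return page * size;
    }

    // Same shape as the curl body: from/size, sort, fields, term query.
    // Assumption: field, value and sortOrder contain no characters to escape.
    static String termSearchBody(int page, int size, String field,
                                 String value, String sortOrder) {
        return "{"
                + "\"from\":" + fromForPage(page, size) + ","
                + "\"size\":" + size + ","
                + "\"sort\":[{\"" + field + "\":\"" + sortOrder + "\"}],"
                + "\"fields\":[\"" + field + "\"],"
                + "\"query\":{\"term\":{\"" + field + "\":\"" + value + "\"}}"
                + "}";
    }

    public static void main(String[] args) {
        // Page 0 reproduces the body shown in the curl example.
        System.out.println(termSearchBody(0, 10, "name", "stone", "desc"));
        // Page 2 starts at offset 20.
        System.out.println(termSearchBody(2, 10, "name", "stone", "desc"));
    }
}
```

    For very deep paging, the scan/scroll approach shown in the Java example further down is usually the better fit.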
    
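    9. For more advanced queries, see the official elasticsearch documentation

    As more data is indexed, the elasticsearch data directory can grow very large. If you have to free disk space, deleting the index is enough; in most cases, though, the disk should be expanded instead.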

    root# curl -XDELETE 'http://localhost:9200/testdbindex'
    root# curl -XDELETE 'http://localhost:9200/_river' (this line is not needed)
    {"acknowledged":true}
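
    Deleting an index cannot be undone, and in elasticsearch of this generation a path such as _all or a wildcard can match many indices at once. If the delete is ever scripted, a guard like the following sketch may be worth having. It is only an illustration: DeleteIndexSketch and deleteIndexUrl are made-up names, and the code just validates the name and builds the URL without sending a request.

```java
// Hypothetical guard: refuses index names that could match more than one index.
final class DeleteIndexSketch {

    // Returns the URL to send DELETE to, or throws for risky names.
    static String deleteIndexUrl(String esBase, String index) {
        if (index == null || index.isEmpty() || index.equals("_all")
                || index.contains("*") || index.contains(",")) {
            throw new IllegalArgumentException(
                    "refusing to build a broad delete for: " + index);
        }
        return esBase + "/" + index;
    }

    public static void main(String[] args) {
        System.out.println("DELETE " + deleteIndexUrl("http://localhost:9200", "testdbindex"));
        try {
            deleteIndexUrl("http://localhost:9200", "_all");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```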

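    Querying and other operations from Java via the jar packages is also convenient (this depends on elasticsearch.jar and lucene-core.jar, both found in the lib directory of the unpacked es distribution)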

    package com.ciaos;
    
    import java.util.Iterator;
    import java.util.Map.Entry;
    
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.action.search.SearchType;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.index.query.QueryBuilder;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.SearchHit;
    
    public class EsDemo {
    
        private static TransportClient client = null;
    
        public static void GetConnection(){
            client = new TransportClient().addTransportAddress(new InetSocketTransportAddress(
                    "127.0.0.1", 9300));
        }
    
        public static void searchIndex() {
    
            QueryBuilder qb = QueryBuilders.termQuery("name", "stone");
    
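            // A SCAN search returns no hits on the first request, only a scroll id;
            // the documents arrive through the scroll calls in the loop below.
            SearchResponse scrollResp = client.prepareSearch("testdbindex")
                            .setSearchType(SearchType.SCAN)
                            .setScroll(new TimeValue(60000))
                            .setQuery(qb.buildAsBytes())
                            .setSize(100).execute().actionGet(); // for scan searches, size applies per shard
            while (true) {
                // Fetch the next batch and keep the scroll context alive.
                scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(600000)).execute().actionGet();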
                boolean hitsRead = false;
                for (SearchHit hit : scrollResp.getHits()) {
                    hitsRead = true;
                    Iterator<Entry<String, Object>> rpItor = hit.getSource().entrySet().iterator();
                    while (rpItor.hasNext()) {
                         Entry<String, Object> rpEnt = rpItor.next();
                         System.out.println(rpEnt.getKey() + " : " + rpEnt.getValue());
                    }
                }
                if (!hitsRead) {
                    break;
                }
            }
        }
    
        public static void main(String[] args) {
            // Connect, run the scan/scroll search, then close the client.
            GetConnection();
            searchIndex();
            
            client.close();
        }
    }

    Output:

    _id : 532a49e294af83f0122292d3
    name : stone
    _id : 532a45ad94af83f0122292cf
    name : stone
  • Original article: https://www.cnblogs.com/ciaos/p/3601209.html