• [Repost] Cross-type joins in Elasticsearch


    Cross-type joins in Elasticsearch

    http://rore.im/posts/elasticsearch-joins

    December 31, 2014

    When modeling data in Elasticsearch, a common question is how to design the data to capture relationships between entities, to allow at least some level of “joins”.

    Elasticsearch has a good guide about data modeling. One of the options provided for expressing relationships is the parent-child model.

    A parent-child relationship in Elasticsearch is a way to express a one-to-many relationship (a parent with many children). The parent and child are separate Elasticsearch types, bound together only by specifying the parent type in the child mapping and by giving the parent ID on every child index operation (this ID is used to route the child to the shard of its parent).
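
    To make the routing point concrete, here is a minimal sketch of how a single child could be indexed outside of a bulk request. It assumes the Elasticsearch 1.x-era `parent` URL parameter and the `es-joins` index and `children` type used later in this post. The `child_doc_url` helper is hypothetical and only builds the URL, so nothing is sent to a cluster.

```shell
#!/bin/bash

# Assumption: the same local endpoint as the rest of this post.
ELASTICSEARCH_ENDPOINT="${ELASTICSEARCH_ENDPOINT:-http://localhost:9200}"

# Hypothetical helper: build the URL for a child document.
# The "parent" parameter names the parent the child belongs to and is
# what routes the child to the parent's shard.
child_doc_url() {
    local child_type="$1" child_id="$2" parent_id="$3"
    echo "$ELASTICSEARCH_ENDPOINT/es-joins/$child_type/$child_id?parent=$parent_id"
}

# Print the URL that would index child 5 under parent 1.
child_doc_url children 5 1
```

    The URL can then be handed to curl, for example `curl -XPUT "$(child_doc_url children 5 1)" -d '{"name":"Cain","gender":"male"}'` (a made-up document that is not part of the data set below). The same `parent` value is needed again when fetching or deleting that child by ID, since it decides which shard is asked.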

    It’s a useful model when a parent has many children and when the child update pattern differs from that of the parent: since every child is a separate document, updating a child does not require re-indexing the parent.

    But this model also provides an interesting (if limited) way to capture relationships between sibling types.

    Let’s consider the following data:

    Bill has two children - Adam and Eve, and a dog (Apple).
    Bob has no children or pets (ah, freedom!).
    Mary has a little newborn child called Lamb.
    Jane has a boy named Xander, a cat (Buffy) and a dog (Willow).

    Let’s create this data in Elasticsearch.
    We will have a parent type - “person”, and two child types - “children” and “pets”.
    First we’ll create the mapping for the child types.

        #!/bin/bash
        
        export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
        
        # Create the index, declaring "person" as the parent of both child types
        
        curl -XPUT "$ELASTICSEARCH_ENDPOINT/es-joins" -d '{
            "mappings": {
                "children": {
                    "_parent": {
                        "type": "person"
                    }
                },
                "pets": {
                    "_parent": {
                        "type": "person"
                    }
                }
            }
        }' 
    

    Next, index all the documents - parents, children and pets.

        # Index documents
        curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
        {"index":{"_index":"es-joins","_type":"person","_id":1}}
        {"name":"Bill","gender":"male"}
        {"index":{"_index":"es-joins","_type":"person","_id":2}}
        {"name":"Bob","gender":"male"}
        {"index":{"_index":"es-joins","_type":"person","_id":3}}
        {"name":"Mary","gender":"female"}
        {"index":{"_index":"es-joins","_type":"person","_id":4}}
        {"name":"Jane","gender":"female"}
        {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":1}}
        {"name":"Adam","gender":"male"}
        {"index":{"_index":"es-joins","_type":"children","_parent":1,"_id":2}}
        {"name":"Eve","gender":"female"}
        {"index":{"_index":"es-joins","_type":"children","_parent":3,"_id":3}}
        {"name":"Lamb","gender":"male"}
        {"index":{"_index":"es-joins","_type":"children","_parent":4,"_id":4}}
        {"name":"Xander","gender":"male"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":1,"_id":1}}
        {"name":"Apple","type":"dog"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":2}}
        {"name":"Buffy","type":"cat"}
        {"index":{"_index":"es-joins","_type":"pets","_parent":4,"_id":3}}
        {"name":"Willow","type":"dog"}
        '
    

    Now we can do some searches on it.
    The usual example is searching for a parent by its children. Let’s find all the parents that have a girl. We expect to get back only Bill.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "term": {
                                            "gender": "female"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    We can also combine conditions on multiple child types.
    Let’s find parents that have a boy and a dog. This time we expect to get back both Bill and Jane.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/person/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "term": {
                                            "gender": "male"
                                        }
                                    }
                                }
                            },
                            {
                                "has_child": {
                                    "type": "pets",
                                    "query": {
                                        "term": {
                                            "type": "dog"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    Another commonly used option is finding children by their parents.
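
    As a minimal sketch of that simpler case, the request body below looks up all children whose parent is female, in the same 1.x-style filter syntax as the other examples in this post. The `children_of_parents_query` helper is hypothetical and only prints the body so it can be inspected before sending. With the data indexed above it should match Lamb and Xander.

```shell
#!/bin/bash

# Hypothetical helper: print a search body that finds children by a
# condition on their parent (here: the parent's gender).
children_of_parents_query() {
    local parent_gender="$1"
    cat <<EOF
{
    "query": {
        "filtered": {
            "filter": {
                "has_parent": {
                    "parent_type": "person",
                    "query": {
                        "term": {
                            "gender": "$parent_gender"
                        }
                    }
                }
            }
        }
    }
}
EOF
}

# Print the body for "children of female parents".
children_of_parents_query female
```

    To run it, pipe the body to the "children" type: `children_of_parents_query female | curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/children/_search?pretty" -d @-`.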
    But a more interesting possibility is finding children by their siblings.
    Let’s look up all boys that have a dog. To do that we search on the “children” type, using a has_parent filter that contains a has_child filter on the “pets” type.
    This time we expect to get back the children - Adam and Xander.

        curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/children/_search?pretty" -d '
        {
            "query": {
                "filtered": {
                    "filter": {
                        "and": [
                            {
                                "has_parent": {
                                    "parent_type": "person",
                                    "filter": {
                                        "has_child": {
                                            "type": "pets",
                                            "query": {
                                                "term": {
                                                    "type": "dog"
                                                }
                                            }
                                        }
                                    }
                                }
                            },
                            {
                                "term": {
                                    "gender": "male"
                                }
                            }
                        ]
                    }
                }
            }
        }
        '
    

    Of course, our data model here is a bit simplified as it allows only a single parent. If we were to extend it, we would create a “family” parent type, with child types - “parents”, “children” and “pets”.
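
    A rough sketch of what that extended mapping could look like, following the same `_parent` convention as the mapping earlier in the post. The `family_mapping` helper and the `es-families` index name in the usage line are hypothetical, and the helper only prints the index-creation body.

```shell
#!/bin/bash

# Hypothetical helper: print an index-creation body for the extended model,
# where "family" is the parent type and "parents", "children" and "pets"
# are all child types of it.
family_mapping() {
    cat <<'EOF'
{
    "mappings": {
        "parents":  { "_parent": { "type": "family" } },
        "children": { "_parent": { "type": "family" } },
        "pets":     { "_parent": { "type": "family" } }
    }
}
EOF
}

# Print the mapping body.
family_mapping
```

    It could be applied with `family_mapping | curl -XPUT "$ELASTICSEARCH_ENDPOINT/es-families" -d @-`. Every parent, child and pet would then be indexed with the ID of its family as the parent, and the sibling-style queries above would use “family” as the parent_type.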

    Currently, getting the details of the “joined” entity takes another query. For example, when searching for “all boys that have a dog”, if we also want the details of the dogs we need a second search for “all dogs with parents that have children with _id=…”, filled in with the _ids of the children returned by the first search.
    This will change with the upcoming inner hits feature, which will allow getting the data of the inner entities in a single query.
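
    Until then, the second search described above might look like the following sketch. It reuses the filter syntax from the earlier examples and assumes the standard `ids` query for matching the children. The `dogs_of_children_query` helper is hypothetical and only prints the request body. With the data indexed above, the IDs of Adam and Xander should lead to Apple and Willow.

```shell
#!/bin/bash

# Hypothetical helper: print a search body for "all dogs whose owner has a
# child with one of the given _ids".
# $1 is a comma-separated list of quoted child _ids, e.g. '"1","4"'.
dogs_of_children_query() {
    local child_ids="$1"
    cat <<EOF
{
    "query": {
        "filtered": {
            "filter": {
                "and": [
                    {
                        "has_parent": {
                            "parent_type": "person",
                            "filter": {
                                "has_child": {
                                    "type": "children",
                                    "query": {
                                        "ids": {
                                            "values": [$child_ids]
                                        }
                                    }
                                }
                            }
                        }
                    },
                    {
                        "term": {
                            "type": "dog"
                        }
                    }
                ]
            }
        }
    }
}
EOF
}

# Adam and Xander were indexed with _id 1 and 4 in the "children" type.
dogs_of_children_query '"1","4"'
```

    The body is meant for the “pets” type: `dogs_of_children_query '"1","4"' | curl -XPOST "$ELASTICSEARCH_ENDPOINT/es-joins/pets/_search?pretty" -d @-`.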

    One should note that this method is not exactly recommended by Elasticsearch. Because of the memory requirements and performance hit, the official recommendation is: “Avoid using multiple parent-child joins in a single query”. So as always, test, measure and choose your modeling wisely.

  • Original address: https://www.cnblogs.com/freebird92/p/6340043.html