• Vivek's blog Roll your own autocomplete solution using Tries.


    Vivek's blog - Roll your own autocomplete solution using Tries.

    Roll your own autocomplete solution using Tries.

    You might have come across many websites with autocomplete suggestions, most notably Google.

    Google autosuggest

    Adding such an option to your site or application might seem daunting but there is a very simple recursive data structure that solves the problem. There is a ton of literature on the net on how to do this using black box approaches like Lucene, Solr, Sphinx, Redis etc. But all these packages require a lot of configuration and you also lose flexibility. Tries can be implemented in a few lines of code in any language of your choice.

    Trie

    A trie is basically a tree, with each node representing a letter as illustrated in the figure above. Words are paths along this tree and the root node has no characters associated with it.

    1. The value of each node is the path or the character sequence leading upto it.
    2. The children at each node are ideally represented using a hash table mapping the next character to the child nodes.
    3. Its also useful to set a flag at every node to indicate whether a word/phrase ends there.

    Now we can define methods to insert a string and to search for one.

    class Trie(object):
        def __init__(self, value=None):
            self.children = {}
            self.value = value
            self.flag = False # Flag to indicate that a word ends at this node
        def add(self, char):
            val = self.value + char if self.value else char
            self.children[char] = Trie(val)
        def insert(self, word):
            node = self
            for char in word:
                if char not in node.children:
                    node.add(char)
                node = node.children[char]
            node.flag = True
        def find(self, word):
            node = self
            for char in word:
                if char not in node.children:
                    return None
                node = node.children[char]
            return node.value
    view raw trie.py This Gist brought to you by GitHub.
    The insertion cost has a linear relationship with the string length. Now lets define the methods for listing all the strings which start with a certain prefix. The idea is to traverse down to the node representing the prefix and from there do a breadth first search of all the nodes that are descendants of the prefix node. We check if the node is at a word boundary from the flag we defined earlier and append the node’s value to the results. A caveat of the recursive approach, is you might run into stack depth problems when the maximum length of a string reaches tens of thousands of characters.
    class Trie(object):
        ...
        def all_prefixes(self):
            results = set()
            if self.flag:
                results.add(self.value)
            if not self.children: return results
            return reduce(lambda a, b: a | b,
                         [node.all_prefixes() for
                          node in self.children.values()]) | results
        def autocomplete(self, prefix):
            node = self
            for char in prefix:
                if char not in node.children:
                    return set()
                node = node.children[char]
            return node.all_prefixes()
    view raw
    trie.py
    This Gist brought to you by GitHub.

    To delete an item,  first traverse down to the leaf of the keyword and then work backwards, checking for common paths.

    And there you have it, an efficient way of retrieving strings with a common prefix in less than 40 lines of python code. I also wrote a C++ version using the STL data structures in about 90 lines, the time complexity for this version however is O(n log n) as the STL uses a red-black tree implementation for an associative array. Colin Dean has wrote a Ruby version and Marcus McCurdy, a Java one. You can find them all over here - https://github.com/vivekn/autocomplete. Read more about tries at Wikipedia.

    Update: Thanks to bmahler on reddit for pointing out that the wordlist approach was unnecessary and space inefficient. It has been corrected.

    Vivek's blog - Roll your own autocomplete solution using Tries.

    Roll your own autocomplete solution using Tries.

    You might have come across many websites with autocomplete suggestions, most notably Google.

    Google autosuggest

    Adding such an option to your site or application might seem daunting but there is a very simple recursive data structure that solves the problem. There is a ton of literature on the net on how to do this using black box approaches like Lucene, Solr, Sphinx, Redis etc. But all these packages require a lot of configuration and you also lose flexibility. Tries can be implemented in a few lines of code in any language of your choice.

    Trie

    A trie is basically a tree, with each node representing a letter as illustrated in the figure above. Words are paths along this tree and the root node has no characters associated with it.

    1. The value of each node is the path or the character sequence leading upto it.
    2. The children at each node are ideally represented using a hash table mapping the next character to the child nodes.
    3. Its also useful to set a flag at every node to indicate whether a word/phrase ends there.

    Now we can define methods to insert a string and to search for one.

    class Trie(object):
        def __init__(self, value=None):
            self.children = {}
            self.value = value
            self.flag = False # Flag to indicate that a word ends at this node
        def add(self, char):
            val = self.value + char if self.value else char
            self.children[char] = Trie(val)
        def insert(self, word):
            node = self
            for char in word:
                if char not in node.children:
                    node.add(char)
                node = node.children[char]
            node.flag = True
        def find(self, word):
            node = self
            for char in word:
                if char not in node.children:
                    return None
                node = node.children[char]
            return node.value
    view raw trie.py This Gist brought to you by GitHub.
    The insertion cost has a linear relationship with the string length. Now lets define the methods for listing all the strings which start with a certain prefix. The idea is to traverse down to the node representing the prefix and from there do a breadth first search of all the nodes that are descendants of the prefix node. We check if the node is at a word boundary from the flag we defined earlier and append the node’s value to the results. A caveat of the recursive approach, is you might run into stack depth problems when the maximum length of a string reaches tens of thousands of characters.
    class Trie(object):
        ...
        def all_prefixes(self):
            results = set()
            if self.flag:
                results.add(self.value)
            if not self.children: return results
            return reduce(lambda a, b: a | b,
                         [node.all_prefixes() for
                          node in self.children.values()]) | results
        def autocomplete(self, prefix):
            node = self
            for char in prefix:
                if char not in node.children:
                    return set()
                node = node.children[char]
            return node.all_prefixes()
    view raw
    trie.py
    This Gist brought to you by GitHub.

    To delete an item,  first traverse down to the leaf of the keyword and then work backwards, checking for common paths.

    And there you have it, an efficient way of retrieving strings with a common prefix in less than 40 lines of python code. I also wrote a C++ version using the STL data structures in about 90 lines, the time complexity for this version however is O(n log n) as the STL uses a red-black tree implementation for an associative array. Colin Dean has wrote a Ruby version and Marcus McCurdy, a Java one. You can find them all over here - https://github.com/vivekn/autocomplete. Read more about tries at Wikipedia.

    Update: Thanks to bmahler on reddit for pointing out that the wordlist approach was unnecessary and space inefficient. It has been corrected.

  • 相关阅读:
    Go语言基础--1.1 变量的声明
    基本语法
    弹性盒子修改
    弹性盒子内容
    弹性盒子
    响应式列重置
    栅格系统
    布局容器
    额外按钮
    可消失的弹出框
  • 原文地址:https://www.cnblogs.com/lexus/p/2379830.html
Copyright © 2020-2023  润新知