• Dictionaries and Sets


    1. Handling missing keys with setdefault

    import sys
    import re
    
    WORD_RE = re.compile(r'\w+')    # raw string so that \w (word character) keeps its backslash
    
    index = {}
    
    print(sys.argv)
    
    # Example 3-2
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                # finditer yields match objects such as <_sre.SRE_Match object; span=(0, 4), match='User'>;
                # each holds both the matched text and its position: match.start() and match.end() give the start and end offsets
                word = match.group()
                # match.group() returns the matched text, e.g. User
                column_no = match.start() + 1
                location = (line_no, column_no)
    
                # The conventional approach, without setdefault:
                occurrences = index.get(word, [])
                occurrences.append(location)
                index[word] = occurrences
    
    for word in sorted(index, key=str.upper):   # sort the keys case-insensitively
        print(word, index[word])
    
    print("-----------------------")
    
    # Example 3-4:handling missing keys with setdefault
    index2 = {}
    
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                occurrences = (line_no, column_no)
                # Missing keys with setdefault
                index2.setdefault(word, []).append(occurrences)
                # setdefault: reuse the existing value if the key is present; otherwise insert the default and return it
                # Get the list of occurrences for word, or set it to [] if not found;
                # setdefault returns the value, so it can be updated without requiring a second search.
    
    for word in sorted(index2, key=str.upper):
        print(word, index2[word])
    
    # Sample output:
    # flasgger [(3, 6), (4, 6)]
    # flask [(2, 6)]
    # Flask [(2, 19)]
    # from [(2, 1), (3, 1), (4, 1)]
    # import [(1, 1), (2, 12), (3, 15), (4, 21)]
    # jsonify [(2, 26)]
    # random [(1, 8)]
    # request [(2, 35)]
    # Swagger [(3, 22)]
    # swag_from [(4, 28)]
    # utils [(4, 15)]
    
    """
    The result of this line ...
        my_dict.setdefault(key, []).append(new_value)
    ... is the same as running ...
        if key not in my_dict:
            my_dict[key] = []
        my_dict[key].append(new_value)
    ... except that the latter code performs at least two searches for key (three if it is not found), while setdefault
    does it all with a single lookup.
    """
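    Since the script above reads its input from sys.argv[1], here is a self-contained sketch of the same setdefault technique that indexes an inline string instead. The sample text SAMPLE and the helper build_index are hypothetical names introduced only for illustration.

```python
import re

# Same pattern as the script above
WORD_RE = re.compile(r'\w+')

# Hypothetical sample text standing in for the file read via sys.argv[1]
SAMPLE = """import sys
import re
from collections import defaultdict
"""


def build_index(text):
    """Map each word to a list of (line_no, column_no) pairs, using setdefault."""
    index = {}
    for line_no, line in enumerate(text.splitlines(), 1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            location = (line_no, match.start() + 1)
            # setdefault returns the list stored under word (inserting [] first if
            # the key is missing), so append mutates the dict's own list in place
            index.setdefault(word, []).append(location)
    return index


if __name__ == '__main__':
    index = build_index(SAMPLE)
    for word in sorted(index, key=str.upper):
        print(word, index[word])
```

    Because setdefault hands back the stored list itself, no second `index[word] = ...` assignment is needed.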

    2. Mapping with Flexible Key Lookup

    2.1 defaultdict: Another Take on Missing Keys

    Example code:

    import re
    import sys
    import collections
    
    WORD_RE = re.compile(r'\w+')    # raw string so that \w (word character) keeps its backslash
    
    index = collections.defaultdict(list)
    
    with open(sys.argv[1], encoding='utf-8') as fp:
        for line_no, line in enumerate(fp, 1):
            for match in WORD_RE.finditer(line):
                word = match.group()
                column_no = match.start() + 1
                occurrences = (line_no, column_no)
    
                # With defaultdict, a missing word gets a fresh list from list() on first access:
                index[word].append(occurrences)
    
    for word in sorted(index, key=str.upper):
        print(word, index[word])
    
    # Output:
    # flasgger [(3, 6), (4, 6)]
    # flask [(2, 6)]
    # Flask [(2, 19)]
    # from [(2, 1), (3, 1), (4, 1)]
    # import [(1, 1), (2, 12), (3, 15), (4, 21)]
    # jsonify [(2, 26)]
    # random [(1, 8)]
    # request [(2, 35)]
    # Swagger [(3, 22)]
    # swag_from [(4, 28)]
    # utils [(4, 15)]
    
    """
    defaultdict:
    How defaultdict works:
        When instantiating a defaultdict, you provide a callable that is used to produce a default value whenever
        __getitem__ is passed a nonexistent key argument.
        For example, given an empty defaultdict created as dd = defaultdict(list), if 'new_key' is not in dd, the 
        expression dd['new_key'] does the following steps:
            1. Calls list() to create a new list.
            2. Inserts the list into dd using 'new_key' as key.
            3. Returns a reference to that list.
            
    The callable that produces the default values is held in an instance attribute called default_factory.
    If no default_factory is provided, the usual KeyError is raised for missing keys.
    
    The default_factory of a defaultdict is only invoked to provide default values for __getitem__ calls, and not for the
    other methods. For example, if dd is a defaultdict, and k is a missing key, dd[k] will call the default_factory to 
    create a default value, but dd.get(k) still returns None.
    
    The mechanism that makes defaultdict work by calling default_factory is actually the __missing__ special method, a
    feature supported by all standard mapping types.
    """
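    A short sketch of the behaviors described above: default_factory, __getitem__ inserting the default, get() not using it, __missing__ as the underlying hook, and KeyError when no factory is set. The variable names dd and plain are only for illustration.

```python
from collections import defaultdict

dd = defaultdict(list)
print(dd.default_factory)          # <class 'list'>

# Missing key via __getitem__: list() is called, stored under 'new_key', and returned
dd['new_key'].append(1)
print(dd)                          # defaultdict(<class 'list'>, {'new_key': [1]})

# get() does not consult default_factory: it returns None and inserts nothing
print(dd.get('other'))             # None
print('other' in dd)               # False

# __getitem__ reaches default_factory through __missing__, which can also be called directly
print(dd.__missing__('x'), 'x' in dd)   # [] True

# With no default_factory, a missing key raises KeyError as in a plain dict
plain = defaultdict()
print(plain.default_factory)       # None
try:
    plain['missing']
except KeyError as exc:
    print('KeyError:', exc)        # KeyError: 'missing'
```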

    2.2 The __missing__ Method

    Example code:

    """ StrKeyDict0 converts nonstring keys to str on lookup """
    
    
    class StrKeyDict0(dict):
    
        def __missing__(self, key):
            # Without this check, a missing str key would make self[str(key)] call __missing__ again: infinite recursion
            if isinstance(key, str):
                raise KeyError(key)
            return self[str(key)]
    
        def get(self, key, default=None):
            """
            The get method delegates to __getitem__ by using the self[key] notation; that gives the opportunity for
            our __missing__ to act.
            :param key:
            :param default:
            :return:
            """
            try:
                return self[key]
            except KeyError:
                return default
    
        def __contains__(self, key):
            # We cannot write `key in self` here (self is the StrKeyDict0 instance, i.e. a dict),
            # because `k in self` would call this __contains__ again, causing infinite recursion
            return key in self.keys() or str(key) in self.keys()
    
    # A better way to create a user-defined mapping type is to subclass collections.UserDict instead of dict.
    
    
    """
    Underlying the way mappings deal with missing keys is the aptly named __missing__ method. This method is not defined in
    the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard 
    dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError.
    
    The __missing__ method is only called by __getitem__ (i.e., for the d[k] operator). The presence of a __missing__ method
    has no effect on the behavior of other methods that look up keys, such as get or __contains__.
    """
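    To see that __missing__ affects only d[k], here is a minimal sketch using a hypothetical dict subclass, MissingDemo, whose __missing__ returns a placeholder string without storing anything.

```python
class MissingDemo(dict):
    """Hypothetical dict subclass: __missing__ returns a placeholder and stores nothing."""

    def __missing__(self, key):
        return f'<no {key!r}>'


d = MissingDemo(a=1)
print(d['a'])        # 1         (key found, __missing__ not involved)
print(d['b'])        # <no 'b'>  (dict.__getitem__ falls back to __missing__)
print(d.get('b'))    # None      (get ignores __missing__)
print('b' in d)      # False     (__contains__ ignores __missing__; nothing was stored)
```

    This is why StrKeyDict0 has to implement get and __contains__ itself.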

    Summary: there are three ways to handle keys that are missing from a dict: 1. setdefault  2. collections.defaultdict  3. the __missing__ method
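    The three approaches can be compared on the same grouping task. This is a sketch with hypothetical sample data (PAIRS) and a hypothetical dict subclass (ListDict) whose __missing__ inserts and returns a new list, so that all three produce the same mapping.

```python
from collections import defaultdict

PAIRS = [('fruit', 'apple'), ('veg', 'leek'), ('fruit', 'pear')]  # hypothetical sample data

# 1. setdefault on a plain dict
by_setdefault = {}
for kind, name in PAIRS:
    by_setdefault.setdefault(kind, []).append(name)

# 2. defaultdict with list as its default_factory
by_defaultdict = defaultdict(list)
for kind, name in PAIRS:
    by_defaultdict[kind].append(name)


# 3. A dict subclass whose __missing__ stores a new list under the key and returns it
class ListDict(dict):
    def __missing__(self, key):
        value = self[key] = []
        return value


by_missing = ListDict()
for kind, name in PAIRS:
    by_missing[kind].append(name)

print(by_setdefault == by_defaultdict == by_missing)  # True
```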

    3. Variations of dict: UserDict

    UserDict is designed to be subclassed.

    Example code:

    """ convert non-string keys to str -- on insertion, update and lookup """
    import collections
    
    
    class StrKeyDict(collections.UserDict):
    
        def __missing__(self, key):
            if isinstance(key, str):
                raise KeyError(key)
            return self[str(key)]
    
        def __contains__(self, key):
            # self.data: UserDict does not inherit from dict; instead it holds an internal dict named data, which stores the actual items
            return str(key) in self.data
    
        def __setitem__(self, key, value):
            # The items of a UserDict instance are stored in its data attribute.
            # This method is easier to override when we can delegate to the self.data attribute.
            self.data[str(key)] = value
    
    
    """
    It's almost always easier to create a new mapping type by extending UserDict rather than dict. The main reason is that
    the built-in has some implementation shortcuts that end up forcing us to override methods that we can just inherit
    from UserDict with no problem.
    
    UserDict does not inherit from dict, but has an internal dict instance, called data, which holds the actual items. This
    avoids undesired recursion when coding special methods like __setitem__, and simplifies the coding of __contains__.
    """
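    One of the "implementation shortcuts" of the built-in dict can be seen in a small sketch: dict.__init__ and dict.update store items directly and skip an overridden __setitem__, whereas UserDict routes them through __setitem__. Both class names below are hypothetical.

```python
import collections


class StrKeySetDict(dict):
    """Hypothetical dict subclass that tries to coerce keys to str in __setitem__."""

    def __setitem__(self, key, value):
        super().__setitem__(str(key), value)


class StrKeySetUserDict(collections.UserDict):
    """Same idea on UserDict, writing to the internal self.data dict."""

    def __setitem__(self, key, value):
        self.data[str(key)] = value


d = StrKeySetDict({1: 'a'})       # dict.__init__ bypasses the override
d.update({2: 'b'})                # so does dict.update
d[3] = 'c'                        # only direct item assignment uses it
print(list(d))                    # [1, 2, '3']

u = StrKeySetUserDict({1: 'a'})   # UserDict.__init__ calls update, which calls __setitem__
u.update({2: 'b'})
u[3] = 'c'
print(list(u))                    # ['1', '2', '3']
```

    With dict, the subclass would also have to override __init__ and update to get consistent keys; with UserDict, overriding __setitem__ is enough.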

    4. Immutable Mappings

    Example code:

    >>> from types import MappingProxyType
    >>> 
    >>> d = {1: 'A'}
    >>> d_proxy = MappingProxyType(d)
    >>> d_proxy
    mappingproxy({1: 'A'})
    >>> d_proxy[1]              # Items in d can be seen through d_proxy
    'A'
    >>> d_proxy[2] = 'x'        # Changes cannot be made through d_proxy
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'mappingproxy' object does not support item assignment
    >>> d[2] = 'B'
    >>> d_proxy                 # d_proxy is dynamic: any changes in d are reflected.
    mappingproxy({1: 'A', 2: 'B'})
    >>> 
    
    
    """
    The mapping types provided by the standard library are all mutable, but you may need to guarantee that a user cannot 
    change a mapping by mistake.
    
    Since Python 3.3, the types module provides a wrapper class called MappingProxyType, which, given a mapping, returns
    a mappingproxy instance that is a read-only but dynamic view of the original mapping. So updates to the original
    mapping can be seen in the mappingproxy, but changes cannot be made through it.
    """
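    A common use of this read-only but dynamic view is to expose an object's internal dict to callers without letting them modify it. This is a sketch with a hypothetical Registry class; only its register method can change the underlying mapping.

```python
from types import MappingProxyType


class Registry:
    """Hypothetical class that lets callers read, but not modify, its entries."""

    def __init__(self):
        self._entries = {}
        # Created once; the proxy stays in sync with self._entries
        self.entries = MappingProxyType(self._entries)

    def register(self, name, obj):
        self._entries[name] = obj


reg = Registry()
reg.register('json', 'json-handler')
print(reg.entries['json'])        # json-handler

try:
    reg.entries['xml'] = 'xml-handler'
except TypeError as exc:
    print('TypeError:', exc)      # TypeError: 'mappingproxy' object does not support item assignment
```

    Note that the proxy only blocks item assignment through itself; the attribute reg.entries could still be rebound.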

    end

  • Original article: https://www.cnblogs.com/neozheng/p/12180100.html