• [Python Cookbook] Accessing substrings in Python


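    The simplest way to access a substring is slicing:

    afield = theline[3:8]

    But a slice can only extract one substring at a time.

    If you would rather think in terms of field lengths, struct.unpack may be a better fit: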

    import struct
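    # Get a 5-byte string, skip 3 bytes, get two 8-byte strings, then the rest

    baseformat = "5s 3x 8s 8s"
    # the number of leftover bytes in theline follows from this base format
    numremain = len(theline) - struct.calcsize(baseformat)
    # complete the format with the appropriate s or x field, then unpack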
    format = "%s %ds" % (baseformat,numremain)
    l,s1,s2,t = struct.unpack(format,theline)
    #test

    >>> theline = "numremain = len(theline) - struct.calcsize(baseformat)"
    >>> numremain = len(theline) - struct.calcsize(baseformat)
    >>> format = "%s %ds" % (baseformat,numremain)
    >>> format
    '5s 3x 8s 8s 30s'
    >>> l,s1,s2,t = struct.unpack(format,theline)
    >>> l
    'numre'
    >>> s1
    'n = len('
    >>> s2
    'theline)'
    >>> t
    ' - struct.calcsize(baseformat)'
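The session above is Python 2, where str is a byte string. As a minimal sketch, assuming Python 3, the same unpacking works once the line is encoded to bytes, because struct.unpack expects a bytes-like object there. The ASCII encoding and the name fmt are assumptions made for illustration:

```python
import struct

theline = "numremain = len(theline) - struct.calcsize(baseformat)"
data = theline.encode("ascii")  # assumption: the text is pure ASCII

baseformat = "5s 3x 8s 8s"
numremain = len(data) - struct.calcsize(baseformat)
fmt = "%s %ds" % (baseformat, numremain)

# Each unpacked field is a bytes object; decode to get str values back
l, s1, s2, t = (field.decode("ascii") for field in struct.unpack(fmt, data))
print(l, s1, s2, t, sep="|")
```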

    To get pieces of a fixed length, you can combine slicing with a list comprehension (LC):

    pieces = [theline[k:k+n] for k in xrange(0,len(theline),n)]
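As a quick illustration, here is a minimal sketch of the same comprehension on a made-up string. The sample value and n = 4 are assumptions, and range replaces xrange so that it runs on Python 3. Note that the last piece can be shorter than n:

```python
theline = "abcdefghij"  # hypothetical sample data
n = 4

# Step through the string n characters at a time and slice out each piece
pieces = [theline[k:k+n] for k in range(0, len(theline), n)]
print(pieces)  # ['abcd', 'efgh', 'ij']
```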

    To cut the data into columns at given positions, slicing with an LC is also easy:

    cuts = [8,14,20,26,30]
    pieces = [ theline[i:j] for i,j in zip([0]+cuts,cuts+[None])]

    The zip call inside the LC returns a list of pairs, each of the form (cuts[k], cuts[k+1]),

    except for the first and last items, which are (0, cuts[0]) and (cuts[len(cuts)-1], None).
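To make the pairing concrete, the sketch below prints the (start, end) pairs that zip produces for the cuts used above and then applies them to a made-up 40-character line. The sample line is an assumption; list() is only needed on Python 3, where zip returns an iterator:

```python
cuts = [8, 14, 20, 26, 30]

# Pair each start offset with the following end offset; None means "to the end"
pairs = list(zip([0] + cuts, cuts + [None]))
print(pairs)
# [(0, 8), (8, 14), (14, 20), (20, 26), (26, 30), (30, None)]

theline = "0123456789" * 4  # hypothetical 40-character line
pieces = [theline[i:j] for i, j in pairs]
print(pieces)
# ['01234567', '890123', '456789', '012345', '6789', '0123456789']
```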

     

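    Wrapping the snippets above into functions:

    def fields(baseformat,theline,lastfield=False):
        # the number of leftover bytes in theline follows from this base format
        # (struct.calcsize gives the size the base format covers)
        numremain = len(theline)-struct.calcsize(baseformat)

        # complete the format with the appropriate s or x field, then unpack
        format = "%s %d%s" % (baseformat,numremain,lastfield and "s" or "x")
        return struct.unpack(format,theline)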

    Below is a version that uses memoization:

    def fields(baseformat,theline,lastfield=False,_cache={ }):
        # build the key and try to fetch a cached format string
        key = baseformat,len(theline),lastfield
        format = _cache.get(key)
        if format is None:
            # no cached format string yet: build it and cache it
            numremain = len(theline) - struct.calcsize(baseformat)
            _cache[key] = format = "%s %d%s" % (
                baseformat,numremain,lastfield and "s" or "x")
        return struct.unpack(format,theline)

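    According to the Cookbook, this version is 30% to 40% faster than the unoptimized one. If this code is not a bottleneck, though, there is no need for the technique.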
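The 30% to 40% figure comes from the Cookbook and will vary by interpreter and workload. One way to measure it yourself is sketched below, assuming Python 3 (so the record is a bytes object). The names fields_plain and fields_cached are hypothetical labels for the two versions, and the 60-byte record and iteration count are arbitrary choices:

```python
import struct
import timeit

def fields_plain(baseformat, theline, lastfield=False):
    # Rebuild the format string on every call
    numremain = len(theline) - struct.calcsize(baseformat)
    fmt = "%s %d%s" % (baseformat, numremain, "s" if lastfield else "x")
    return struct.unpack(fmt, theline)

def fields_cached(baseformat, theline, lastfield=False, _cache={}):
    # The mutable default argument persists across calls and acts as the cache
    key = baseformat, len(theline), lastfield
    fmt = _cache.get(key)
    if fmt is None:
        numremain = len(theline) - struct.calcsize(baseformat)
        _cache[key] = fmt = "%s %d%s" % (
            baseformat, numremain, "s" if lastfield else "x")
    return struct.unpack(fmt, theline)

line = b"x" * 60  # hypothetical 60-byte record
for func in (fields_plain, fields_cached):
    seconds = timeit.timeit(
        lambda: func("5s 3x 8s 8s", line, True), number=100000)
    print(func.__name__, round(seconds, 3))
```

No expected timings are shown because they depend on the machine; only the relative difference between the two lines of output is meaningful.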

    Functions using LC slicing:

    def split_by(theline,n,lastfield=False):
        # cut out all the pieces
        pieces = [theline[k:k+n] for k in xrange(0,len(theline),n)]
        # if the last piece is too short and not wanted, discard it
        if not lastfield and len(pieces[-1]) < n:
            pieces.pop()
        return pieces
    def split_at(theline,cuts,lastfield=False):
        # cut out all the pieces
        pieces = [ theline[i:j] for i,j in zip([0]+cuts,cuts+[None])]
        # if the last piece is not wanted, discard it
        if not lastfield:
            pieces.pop()
        return pieces

    Generator versions:

    def split_at(the_line,cuts,lastfield=False):
        last = 0
        for cut in cuts:
            yield the_line[last:cut]
            last = cut
        if lastfield:
            yield the_line[last:]
    def split_by(the_line,n,lastfield=False):
        return split_at(the_line,xrange(n,len(the_line),n),lastfield)
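Because the generator versions yield pieces lazily, the caller has to iterate over the result or wrap it in list(). A minimal usage sketch, assuming Python 3 (range instead of xrange) and a made-up sample string:

```python
def split_at(the_line, cuts, lastfield=False):
    last = 0
    for cut in cuts:
        yield the_line[last:cut]
        last = cut
    if lastfield:
        yield the_line[last:]

def split_by(the_line, n, lastfield=False):
    # Cut positions are every n-th offset, starting at n
    return split_at(the_line, range(n, len(the_line), n), lastfield)

line = "abcdefghij"  # hypothetical sample data
print(list(split_at(line, [3, 5], lastfield=True)))  # ['abc', 'de', 'fghij']
print(list(split_by(line, 4)))                       # ['abcd', 'efgh']
print(list(split_by(line, 4, lastfield=True)))       # ['abcd', 'efgh', 'ij']
```

One behavior worth knowing: with lastfield left at False, this split_by drops the final piece even when it is exactly n characters long, since the last cut position always lies before the end of the line.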

    Usage of zip()

    zip([iterable...])

    This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip() is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.

    The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).
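The clustering idiom mentioned above can be sketched as follows. It works because the list holds n references to the same iterator, so each tuple draws n consecutive items. The sample data is an assumption, and list() is needed on Python 3, where zip returns an iterator; leftover items that do not fill a whole group are dropped:

```python
s = [1, 2, 3, 4, 5, 6, 7]  # hypothetical sample data
n = 3

# [iter(s)] * n is a list of n references to ONE shared iterator
groups = list(zip(*[iter(s)] * n))
print(groups)  # [(1, 2, 3), (4, 5, 6)]
```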

    zip() in conjunction with the * operator can be used to unzip a list:

    >>> x = [1, 2, 3]
    >>> y = [4, 5, 6]
    >>> zipped = zip(x, y)
    >>> zipped
    [(1, 4), (2, 5), (3, 6)]
    >>> x2, y2 = zip(*zipped)
    >>> x == list(x2) and y == list(y2)
    True

      >>> x2
      (1, 2, 3)
      >>> y2
      (4, 5, 6)

     

    For how generators work, see this blog post: http://www.cnblogs.com/cacique/archive/2012/02/24/2367183.html

  • Original post: https://www.cnblogs.com/cacique/p/2603640.html