• 写一手漂亮的代码,走向极致的编程 一、代码运行时间分析


    前言

    写一手漂亮的代码,何谓漂亮的代码?对我来说大概有这么几点:

    1. 写法符合规范(如:该空格的地方打上空格,该换行的地方换行,名命方式符合规范等等)
    2. 简洁且可读性高(能十行代码实现并且让人容易看懂的绝不写十一行,对经常重复出现的代码段落进行封装)
    3. 性能高(如:运行时间尽可能短,运行时所用内存尽可能少)

    要实现以上目标,自然就要对代码进行优化,说到代码的优化,自然而然就会想到对算法时间复杂度进行优化,比如我要实现一个在有序数组中查找一个数,最容易想到的就是遍历一遍 O(n) 的复杂度,优化一下自然是使用二分, O(logn) 的复杂度。如果这段代码在我们的程序中会经常被调用,那么,通过这算法上的优化,我们的程序性能自然而然的会有很高的提升。

    但是,有时候会发现,已经对算法进行优化了,程序的性能(如运行时间、内存占用等)仍然不能达到预期,那么,这时候该如何对我们的代码进行进一步的优化呢?

    这篇文章将以 Python 为例进行介绍

    先来段代码

    这里,我将通过使用 Julia 分形的代码来进行。

    Julia 集合,由式 (f_c(z) = z ^2 + c) 进行反复迭代到。

    对于固定的复数 c ,取某一 z 值,可以得到序列

    (z_0, f_c(z_0), f_c(f_c(z_0)), ...)

    这一序列可能发散于无穷大或处于某一范围之内并收敛于某一值,我们将使其不扩散的 z 值的集合称为朱利亚集合。

    import time
    import numpy as np
    import imageio
    import PIL
    import matplotlib.pyplot as plt
    import cv2 as cv
    
    x1, x2, y1, y2 = -1.8, 1.8, -1.8, 1.8
    c_real, c_imag = -0.62772, -0.42193
    
    def calculate_z_serial_purepython(maxiter, zs, cs):
        output = [0] * len(zs)
        for i in range(len(zs)):
            n = 0
            z = zs[i]
            c = cs[i]
            while abs(z) < 2 and n < maxiter:
                z = z * z + c
                n += 1
            output[i] = n
        return output
    
    def calc_pure_python(desired_width, max_itertions):
        x_step = (float(x2 - x1)) / float(desired_width)
        y_step = (float(y2 - y1)) / float(desired_width)
        x, y = [], []
        ycoord = y1
        while ycoord < y2:
            y.append(ycoord)
            ycoord += y_step
        xcoord = x1
        while xcoord < x2:
            x.append(xcoord)
            xcoord += x_step
        zs, cs = [], []
        for ycoord in y:
            for xcoord in x:
                zs.append(complex(xcoord, ycoord))
                cs.append(complex(c_real, c_imag))
        print(f"Length of x: {len(x)}")
        print(f"Total elements: {len(zs)}")
        start_time = time.time()
        output = calculate_z_serial_purepython(max_itertions, zs, cs)
        end_time = time.time()
        secs = end_time - start_time
        print("calculate_z_serial_purepython took", secs, "seconds")
    
        assert sum(output) == 33219980
        # # show img
        # output = np.array(output).reshape(desired_width, desired_width)
        # plt.imshow(output, cmap='gray')
        # plt.savefig("julia.png")
        
    
    if __name__ == "__main__":
        calc_pure_python(desired_width=1000, max_itertions=300)
    

    这段代码运行完,可以得到图片

    运行结果

    Length of x: 1000
    Total elements: 1000000
    calculate_z_serial_purepython took 25.053941249847412 seconds
    

    开始分析

    这里,将通过各种方法来对这段代码的运行时间来进行分析

    直接打印运行时间

    在前面的代码中,我们可以看到有 start_time 和 end_time 两个变量,通过 print 两个变量的差值即可得到运行时间,但是,每次想要打印运行时间都得加那么几行代码就会很麻烦,此时我们可以通过使用修饰器来进行

    from functools import wraps
    def timefn(fn):
        @wraps(fn)
        def measure_time(*args, **kwargs):
            start_time = time.time()
            result = fn(*args, **kwargs)
            end_time = time.time()
            print("@timefn:" + fn.__name__ + " took " + str(end_time - start_time), " seconds")
            return result
        return measure_time
    

    然后对 calculate_z_serial_purepython 函数进行测试

    @timefn
    def calculate_z_serial_purepython(maxiter, zs, cs):
    	...
    

    运行后输出结果

    Length of x: 1000
    Total elements: 1000000
    @timefn:calculate_z_serial_purepython took 26.64286208152771  seconds
    calculate_z_serial_purepython took 26.64286208152771 seconds
    

    另外,也可以在命令行中输入

    python -m timeit -n 5 -r 5 -s "import code" "code.calc_pure_python(desired_width=1000, max_itertions=300)"
    

    其中 -n 5 表示循环次数, -r 5 表示重复次数,timeit 会对语句循环执行 n 次,并计算平均值作为一个结果,重复 r 次选出最好的结果。

    5 loops, best of 5: 24.9 sec per loop
    

    UNIX tine 命令

    由于电脑上没有 Linux 环境,于是使用 WSL 来进行

    time -p python code.py
    如果是 Linux 中进行,可能命令需改成
    /usr/bin/time -p python code.py
    

    输出结果

    Length of x: 1000
    Total elements: 1000000
    @timefn:calculate_z_serial_purepython took 14.34933090209961  seconds
    calculate_z_serial_purepython took 14.350624322891235 seconds
    real 15.57
    user 15.06
    sys 0.40
    

    其中 real 记录整体耗时, user 记录了 CPU 花在任务上的时间,sys 记录了内核函数耗费的时间

    /usr/bin/time --verbose python code.py
    

    输出,WSL 的 time 命令里面没有 --verbose 这个参数,只能到服务器里面试了,突然觉得我的笔记本跑的好慢。。。

    Length of x: 1000
    Total elements: 1000000
    @timefn:calculate_z_serial_purepython took 7.899603605270386  seconds
    calculate_z_serial_purepython took 7.899857997894287 seconds
            Command being timed: "python code.py"
            User time (seconds): 8.33
            System time (seconds): 0.08
            Percent of CPU this job got: 98%
            Elapsed (wall clock) time (h:mm:ss or m:ss): 0:08.54
            Average shared text size (kbytes): 0
            Average unshared data size (kbytes): 0
            Average stack size (kbytes): 0
            Average total size (kbytes): 0
            Maximum resident set size (kbytes): 98996
            Average resident set size (kbytes): 0
            Major (requiring I/O) page faults: 0
            Minor (reclaiming a frame) page faults: 25474
            Voluntary context switches: 0
            Involuntary context switches: 2534
            Swaps: 0
            File system inputs: 0
            File system outputs: 0
            Socket messages sent: 0
            Socket messages received: 0
            Signals delivered: 0
            Page size (bytes): 4096
            Exit status: 0
    

    这里面需要关心的参数是 Major (requiring I/O) page faults ,表示操作系统是否由于 RAM 中的数据不存在而需要从磁盘上读取页面。

    cProfile 模块

    cProfile 模块是标准库内建三个的分析工具之一,另外两个是 hotshot 和 profile。

    python -m cProfile -s cumulative code.py
    

    -s cumulative 表示对每个函数累计花费的时间进行排序

    输出

    36222017 function calls in 30.381 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   30.381   30.381 {built-in method builtins.exec}
            1    0.064    0.064   30.381   30.381 code.py:1(<module>)
            1    1.365    1.365   30.317   30.317 code.py:35(calc_pure_python)
            1    0.000    0.000   28.599   28.599 code.py:13(measure_time)
            1   19.942   19.942   28.598   28.598 code.py:22(calculate_z_serial_purepython)
     34219980    8.655    0.000    8.655    0.000 {built-in method builtins.abs}
      2002000    0.339    0.000    0.339    0.000 {method 'append' of 'list' objects}
            1    0.012    0.012    0.012    0.012 {built-in method builtins.sum}
            4    0.003    0.001    0.003    0.001 {built-in method builtins.print}
            1    0.000    0.000    0.000    0.000 code.py:12(timefn)
            1    0.000    0.000    0.000    0.000 functools.py:44(update_wrapper)
            4    0.000    0.000    0.000    0.000 {built-in method time.time}
            1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:989(_handle_fromlist)
            4    0.000    0.000    0.000    0.000 {built-in method builtins.len}
            7    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
            1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
            5    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
            1    0.000    0.000    0.000    0.000 functools.py:74(wraps)
            1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    

    可以看到,在代码的入口处总共花费了 30.381 秒,ncalls 为 1,表示只执行了 1 次,然后 calculate_z_serial_purepython 花费了 28.598 秒,可以推断出调用该函数使用了近 2 秒。另外可以看到,abs 函数被调用了 34219980 次。对列表项的 append 操作进行了 2002000 次(1000 * 1000 * 2 +1000 * 2 )。

    接下来,我们进行更深入的分析。

    python -m cProfile -o profile.stats code.py
    

    先生成一个统计文件,然后在 python 中进行分析

    >>> import pstats
    >>> p = pstats.Stats("profile.stats")
    >>> p.sort_stats("cumulative")
    <pstats.Stats object at 0x000002AA0A6A8908>
    >>> p.print_stats()
    Sat Apr 25 16:38:07 2020    profile.stats
    
             36222017 function calls in 30.461 seconds
    
       Ordered by: cumulative time
    
       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    0.000    0.000   30.461   30.461 {built-in method builtins.exec}
            1    0.060    0.060   30.461   30.461 code.py:1(<module>)
            1    1.509    1.509   30.400   30.400 code.py:35(calc_pure_python)
            1    0.000    0.000   28.516   28.516 code.py:13(measure_time)
            1   20.032   20.032   28.515   28.515 code.py:22(calculate_z_serial_purepython)
     34219980    8.483    0.000    8.483    0.000 {built-in method builtins.abs}
      2002000    0.360    0.000    0.360    0.000 {method 'append' of 'list' objects}
            1    0.012    0.012    0.012    0.012 {built-in method builtins.sum}
            4    0.004    0.001    0.004    0.001 {built-in method builtins.print}
            1    0.000    0.000    0.000    0.000 code.py:12(timefn)
            1    0.000    0.000    0.000    0.000 C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)
            4    0.000    0.000    0.000    0.000 {built-in method time.time}
            1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:989(_handle_fromlist)
            7    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
            1    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
            4    0.000    0.000    0.000    0.000 {built-in method builtins.len}
            1    0.000    0.000    0.000    0.000 C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:74(wraps)
            1    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
            5    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    
    
    <pstats.Stats object at 0x000002AA0A6A8908>
    

    这里,就生成了与上面一致的信息

    >>> p.print_callers()
       Ordered by: cumulative time
    
    Function                                                                                              was called by...
                                                                                                              ncalls  tottime  cumtime
    {built-in method builtins.exec}                                                                       <-
    code.py:1(<module>)                                                                                   <-       1    0.060   30.461  {built-in method builtins.exec}
    code.py:35(calc_pure_python)                                                                          <-       1    1.509   30.400  code.py:1(<module>)
    code.py:13(measure_time)                                                                              <-       1    0.000   28.516  code.py:35(calc_pure_python)
    code.py:22(calculate_z_serial_purepython)                                                             <-       1   20.032   28.515  code.py:13(measure_time)
    {built-in method builtins.abs}                                                                        <- 34219980    8.483    8.483  code.py:22(calculate_z_serial_purepython)
    {method 'append' of 'list' objects}                                                                   <- 2002000    0.360    0.360  code.py:35(calc_pure_python)
    {built-in method builtins.sum}                                                                        <-       1    0.012    0.012  code.py:35(calc_pure_python)
    {built-in method builtins.print}                                                                      <-       1    0.000    0.000  code.py:13(measure_time)
                                                                                                                   3    0.003    0.003  code.py:35(calc_pure_python)
    code.py:12(timefn)                                                                                    <-       1    0.000    0.000  code.py:1(<module>)
    C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)  <-       1    0.000    0.000  code.py:12(timefn)
    {built-in method time.time}                                                                           <-       2    0.000    0.000  code.py:13(measure_time)
                                                                                                                   2    0.000    0.000  code.py:35(calc_pure_python)
    <frozen importlib._bootstrap>:989(_handle_fromlist)                                                   <-       1    0.000    0.000  code.py:1(<module>)
    {built-in method builtins.getattr}                                                                    <-       7    0.000    0.000  C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)
    {built-in method builtins.hasattr}                                                                    <-       1    0.000    0.000  <frozen importlib._bootstrap>:989(_handle_fromlist)
    {built-in method builtins.len}                                                                        <-       2    0.000    0.000  code.py:22(calculate_z_serial_purepython)
                                                                                                                   2    0.000    0.000  code.py:35(calc_pure_python)
    C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:74(wraps)           <-       1    0.000    0.000  code.py:12(timefn)
    {method 'update' of 'dict' objects}                                                                   <-       1    0.000    0.000  C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)
    {built-in method builtins.setattr}                                                                    <-       5    0.000    0.000  C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)
    {method 'disable' of '_lsprof.Profiler' objects}                                                      <-
    
    
    <pstats.Stats object at 0x000002AA0A6A8908>
    

    这里,我们可以看到,在每一行最后会有调用这部分的父函数名称,这样我们就可以定位到对某一操作最费时的那个函数。

    我们还可以显示那个函数调用了其它函数

    >>> p.print_callees()
       Ordered by: cumulative time
    
    Function                                                                                              called...
                                                                                                              ncalls  tottime  cumtime
    {built-in method builtins.exec}                                                                       ->       1    0.060   30.461  code.py:1(<module>)
    code.py:1(<module>)                                                                                   ->       1    0.000    0.000  <frozen importlib._bootstrap>:989(_handle_fromlist)
                                                                                                                   1    0.000    0.000  code.py:12(timefn)
                                                                                                                   1    1.509   30.400  code.py:35(calc_pure_python)
    code.py:35(calc_pure_python)                                                                          ->       1    0.000   28.516  code.py:13(measure_time)
                                                                                                                   2    0.000    0.000  {built-in method builtins.len}
                                                                                                                   3    0.003    0.003  {built-in method builtins.print}
                                                                                                                   1    0.012    0.012  {built-in method builtins.sum}
                                                                                                                   2    0.000    0.000  {built-in method time.time}
                                                                                                             2002000    0.360    0.360  {method 'append' of 'list' objects}
    code.py:13(measure_time)                                                                              ->       1   20.032   28.515  code.py:22(calculate_z_serial_purepython)
                                                                                                                   1    0.000    0.000  {built-in method builtins.print}
                                                                                                                   2    0.000    0.000  {built-in method time.time}
    code.py:22(calculate_z_serial_purepython)                                                             -> 34219980    8.483    8.483  {built-in method builtins.abs}
                                                                                                                   2    0.000    0.000  {built-in method builtins.len}
    {built-in method builtins.abs}                                                                        ->
    {method 'append' of 'list' objects}                                                                   ->
    {built-in method builtins.sum}                                                                        ->
    {built-in method builtins.print}                                                                      ->
    code.py:12(timefn)                                                                                    ->       1    0.000    0.000  C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)
                                                                                                                   1    0.000    0.000  C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:74(wraps)
    C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:44(update_wrapper)  ->       7    0.000    0.000  {built-in method builtins.getattr}
                                                                                                                   5    0.000    0.000  {built-in method builtins.setattr}
                                                                                                                   1    0.000    0.000  {method 'update' of 'dict' objects}
    {built-in method time.time}                                                                           ->
    <frozen importlib._bootstrap>:989(_handle_fromlist)                                                   ->       1    0.000    0.000  {built-in method builtins.hasattr}
    {built-in method builtins.getattr}                                                                    ->
    {built-in method builtins.hasattr}                                                                    ->
    {built-in method builtins.len}                                                                        ->
    C:UsersITryagainAppDataLocalcondacondaenvs	ensorflow-gpulibfunctools.py:74(wraps)           ->
    {method 'update' of 'dict' objects}                                                                   ->
    {built-in method builtins.setattr}                                                                    ->
    {method 'disable' of '_lsprof.Profiler' objects}                                                      ->
    
    
    <pstats.Stats object at 0x000002AA0A6A8908>
    

    line_profiler 逐行分析

    前面我们通过 cProfile 来对代码进行了整体的分析,当我们确定了耗时多的函数后,想对该函数进行进一步分析时,就可以使用 line_profiler 了。

    先安装

    pip install line_profiler
    或
    conda install line_profiler
    

    在需要测试的函数前面加上修饰器 @profile,然后命令函输入

    kernprof -l -v code.py
    

    输出

    Wrote profile results to code.py.lprof
    Timer unit: 1e-07 s
    
    Total time: 137.019 s
    File: code.py
    Function: calculate_z_serial_purepython at line 23
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        23                                           @profile
        24                                           def calculate_z_serial_purepython(maxiter, zs, cs):
        25         1      89776.0  89776.0      0.0      output = [0] * len(zs)
        26   1000001    9990393.0     10.0      0.7      for i in range(len(zs)):
        27   1000000    9244029.0      9.2      0.7          n = 0
        28   1000000   10851654.0     10.9      0.8          z = zs[i]
        29   1000000   10242762.0     10.2      0.7          c = cs[i]
        30  34219980  558122806.0     16.3     40.7          while abs(z) < 2 and n < maxiter:
        31  33219980  403539388.0     12.1     29.5              z = z * z + c
        32  33219980  356918574.0     10.7     26.0              n += 1
        33   1000000   11186107.0     11.2      0.8          output[i] = n
        34         1         12.0     12.0      0.0      return output
    

    运行时间比较长。。不过,这里可以发现,耗时的操作主要都在 while 循环中,做判断的耗时最长,但是这里我们并不知道是 abs(z) < 2 还是 n < maxiter 更花时间。z 与 n 的更新也比较花时间,这是因为在每次循环时, Python 的动态查询机制都在工作。

    那么,这里可以通过 timeit 来进行测试

    In [1]: z = 0 + 0j
    
    In [2]: %timeit abs(z) < 2
    357 ns ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
    
    In [3]: n = 1
    
    In [4]: maxiter = 300
    
    In [5]: %timeit n < maxiter
    119 ns ± 6.91 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
    

    可以看到,n < maxiter 所需时间更短,并且每301次会有一次 False,而 abs(z) < 2 为 False 的次数我们并不好估计,占比约为前面图片中白色部分所占比例。因此,我们可以假设交换两条语句的顺序可以使得程序运行速度更快。

    Total time: 132.816 s
    File: code.py
    Function: calculate_z_serial_purepython at line 23
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
        23                                           @profile
        24                                           def calculate_z_serial_purepython(maxiter, zs, cs):
        25         1      83002.0  83002.0      0.0      output = [0] * len(zs)
        26   1000001    9833163.0      9.8      0.7      for i in range(len(zs)):
        27   1000000    9241272.0      9.2      0.7          n = 0
        28   1000000   10667576.0     10.7      0.8          z = zs[i]
        29   1000000   10091308.0     10.1      0.8          c = cs[i]
        30  34219980  531157092.0     15.5     40.0          while n < maxiter and abs(z) < 2:
        31  33219980  393275303.0     11.8     29.6              z = z * z + c
        32  33219980  352964180.0     10.6     26.6              n += 1
        33   1000000   10851379.0     10.9      0.8          output[i] = n
        34         1         11.0     11.0      0.0      return output
    

    可以看到,确实是有所优化。

    小节

    从开始学习编程到现在差不多快 3 年了,之前可以说是从来没有利用这些工具来对代码性能进行过分析,最多也只是通过算法复杂度的分析来进行优化,接触了这些之后就感觉,需要学习的东西还有很多。在近期进行的华为软挑中,队友也曾对代码(C++)的运行时间进行过分析,如下图。

    下篇将介绍对运行时内存的分析。

    参考

    1. 《Python 高性能编程》
  • 相关阅读:
    [HihoCoder1259]A Math Problem
    [POJ1205]Water Treatment Plants
    [HDU5492]Find a path
    [USACO08JAN]Cell Phone Network
    [CodeForces-543D]Road Improvement
    [HAOI2012]外星人
    [CodeForces-869C]The Intriguing Obsession
    [CodeChef-CAPTCITI]Snakes capturing the Mongoose Cities
    CF741D Arpa’s letter-marked tree and Mehrdad’s Dokhtar-kosh paths
    Luogu P4095 [HEOI2013]Eden的新背包问题
  • 原文地址:https://www.cnblogs.com/csu-lmw/p/12774328.html
Copyright © 2020-2023  润新知