• Pitfalls to watch out for when using multiprocessing in Python


    When writing programs, we often write code like this:

    import MySQLdb
    import time
    from multiprocessing import Process
    
    # The connection is created once, in the parent process, when the module is imported
    conn = MySQLdb.connect("localhost", "vearne", "xx", "test")
    
    def f(name):
        for i in range(10):
            cursor = conn.cursor()
            sql = "insert into car(name) values(%s)"
            param = (name,)
            print(param)
            #time.sleep(1)
            n = cursor.execute(sql, param)
            cursor.close()
            conn.commit()
    
    if __name__ == "__main__":
        for i in range(10):
            p = Process(target=f, args=("bob",))
            p.start()

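    Is there anything wrong with this program?
    Run as a single process it would be fine, but with multiple processes it is wrong: conn is created in the parent process at import time, before any child exists, so all ten children end up sharing that one connection.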
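    A minimal way to see the inheritance without a database is sketched below. It assumes a POSIX system (the fork start method is requested explicitly), and RESOURCE, report and inherited_owner_pids are hypothetical names; RESOURCE stands in for conn. An object built at import time in the parent shows up in every child, because each child is a forked copy of the parent rather than a fresh interpreter.

```python
import multiprocessing as mp
import os

# Hypothetical stand-in for the module-level `conn`: built once, in the
# parent, when the module is imported.
RESOURCE = {"created_by_pid": os.getpid()}


def report(queue):
    # Runs in a child. RESOURCE is not rebuilt here; the child sees the
    # parent's object because fork() copied the parent's memory.
    queue.put((os.getpid(), RESOURCE["created_by_pid"]))


def inherited_owner_pids(n=3):
    """Start n forked children and collect (child_pid, creator_pid) pairs."""
    ctx = mp.get_context("fork")  # POSIX only
    queue = ctx.Queue()
    procs = [ctx.Process(target=report, args=(queue,)) for _ in range(n)]
    for p in procs:
        p.start()
    # Drain the queue before join() so no child blocks on a full pipe.
    results = [queue.get(timeout=10) for _ in procs]
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    for child_pid, creator_pid in inherited_owner_pids():
        print("child %d uses an object created by %d" % (child_pid, creator_pid))
```

    Every child reports the parent's PID as the creator: none of them built its own copy.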

    First, look at the relevant part of the multiprocessing source (Python 2, multiprocessing/process.py):

    class Process(object):
        '''
        Process objects represent activity that is run in a separate process
    
        The class is analagous to `threading.Thread`
        '''
        _Popen = None
    
        def __init__(self, group=None, target=None, name=None, args=(), kwargs={}):
            assert group is None, 'group argument must be None for now'
            count = _current_process._counter.next()
            self._identity = _current_process._identity + (count,)
            self._authkey = _current_process._authkey
            self._daemonic = _current_process._daemonic
            self._tempdir = _current_process._tempdir
            self._parent_pid = os.getpid()
            self._popen = None
            self._target = target
            self._args = tuple(args)
            self._kwargs = dict(kwargs)
            self._name = name or type(self).__name__ + '-' + \
                         ':'.join(str(i) for i in self._identity)
    
        def run(self):
            '''
            Method to be run in sub-process; can be overridden in sub-class
            '''
            if self._target:
                self._target(*self._args, **self._kwargs)
    
        def start(self):
            '''
            Start child process
            '''
            assert self._popen is None, 'cannot start a process twice'
            assert self._parent_pid == os.getpid(), \
                   'can only start a process object created by current process'
            assert not _current_process._daemonic, \
                   'daemonic processes are not allowed to have children'
            _cleanup()
            if self._Popen is not None:
                Popen = self._Popen
            else:
                from .forking import Popen
            self._popen = Popen(self)   # -- create the Popen object --
            _current_process._children.add(self)
            # ... some code omitted ...
    
        def _bootstrap(self):    # -- the _bootstrap function --
            from . import util
            global _current_process
    
            try:
                self._children = set()
                self._counter = itertools.count(1)
                try:
                    sys.stdin.close()
                    sys.stdin = open(os.devnull)
                except (OSError, ValueError):
                    pass
                _current_process = self
                util._finalizer_registry.clear()
                util._run_after_forkers()
                util.info('child process calling self.run()')
                try:
                    self.run()       # -- calls run() --
                    exitcode = 0
                finally:
                    util._exit_function()
            except SystemExit, e:
                if not e.args:
                    exitcode = 1
                elif isinstance(e.args[0], int):
                    exitcode = e.args[0]
                else:
                    sys.stderr.write(str(e.args[0]) + '\n')
                    sys.stderr.flush()
                    exitcode = 0 if isinstance(e.args[0], str) else 1
            except:
                exitcode = 1
                import traceback
                sys.stderr.write('Process %s:\n' % self.name)
                sys.stderr.flush()
                traceback.print_exc()
    
            util.info('process exiting with exitcode %d' % exitcode)
            return exitcode

    The Popen class imported there by from .forking import Popen (the POSIX version, from multiprocessing/forking.py):

    class Popen(object):
    
        def __init__(self, process_obj):
            sys.stdout.flush()
            sys.stderr.flush()
            self.returncode = None
    
            self.pid = os.fork()     # -- fork the child process --
            # fork() is called once but returns twice: in the parent it returns
            # the child's PID (> 0), in the child it returns 0
            if self.pid == 0:        # pid == 0: everything below runs in the child
                if 'random' in sys.modules:
                    import random
                    random.seed()
                code = process_obj._bootstrap() # -- calls _bootstrap() --
                sys.stdout.flush()
                sys.stderr.flush()
                os._exit(code)

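    The code shows that multiprocessing creates each child process with fork() (this is the POSIX code path; Windows has no fork) and then calls run() in the child via _bootstrap().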
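    Stripped of bookkeeping, Popen.__init__ and _bootstrap boil down to the pattern below. This is a simplified sketch, not the library code: fork_and_run is a hypothetical helper, it needs a POSIX os.fork(), and its exit-code mapping only approximates the _bootstrap logic quoted above.

```python
import os
import sys
import traceback


def fork_and_run(target, *args):
    """Hypothetical helper: a much-simplified Popen.__init__ + _bootstrap.

    Forks, runs target(*args) in the child, and returns the child's exit
    code to the parent. POSIX only.
    """
    # Flush first, or buffered output would be written twice (parent and child).
    sys.stdout.flush()
    sys.stderr.flush()
    pid = os.fork()
    if pid == 0:
        # fork() returned 0: we are in the child.
        exitcode = 1
        try:
            target(*args)
            exitcode = 0
        except SystemExit as e:
            # sys.exit(n) with an int n -> n; anything else -> 1 (simplified)
            exitcode = e.code if isinstance(e.code, int) else 1
        except BaseException:
            traceback.print_exc()
        finally:
            sys.stdout.flush()
            sys.stderr.flush()
            # Leave without running the parent's cleanup code (atexit etc.).
            os._exit(exitcode)
    # fork() returned the child's PID (> 0): we are in the parent.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

    For example, fork_and_run(sys.exit, 3) returns 3 in the parent, and a target that raises an ordinary exception yields 1 after the child prints the traceback.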

    The fork(2) man page (man fork) says:

    Fork() causes creation of a new process.  The new process (child process) is an exact copy of the calling process (parent process) except for the following:
               o   The child process has a unique process ID.
               o   The child process has a different parent process ID (i.e., the process ID of the parent process).
               o   The child process has its own copy of the parent's descriptors.  These descriptors reference the same underlying objects, so that, for instance, file pointers in file objects are shared between the child and the parent, so that an lseek(2) on a descriptor in the child process can affect a subsequent read or write by the parent.  This descriptor copying is also used by the shell to establish standard input and output for newly created processes as well as to set up pipes.
               o   The child processes resource utilizations are set to 0; see setrlimit(2).
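    The lseek(2) remark is easy to check. In the sketch below (POSIX only; offset_after_child_seek is a hypothetical helper), a forked child seeks on a file descriptor it inherited, and the parent's next read starts at the child's position, because both descriptors refer to the same open file description.

```python
import os
import tempfile


def offset_after_child_seek():
    """Return (offset seen by the parent, bytes the parent reads next)
    after a forked child seeks on an inherited descriptor."""
    with tempfile.TemporaryFile() as f:
        f.write(b"0123456789")
        f.flush()                      # push the bytes down to the descriptor
        fd = f.fileno()
        os.lseek(fd, 0, os.SEEK_SET)   # parent rewinds to the start
        pid = os.fork()
        if pid == 0:
            # Child: move the *shared* file offset, then exit immediately
            os.lseek(fd, 5, os.SEEK_SET)
            os._exit(0)
        os.waitpid(pid, 0)
        # Parent: it never seeked to 5 itself, yet that is where it now is
        offset = os.lseek(fd, 0, os.SEEK_CUR)
        return offset, os.read(fd, 100)
```

    The parent sees offset 5 and reads b"56789". A socket to a database server is shared between parent and child in exactly the same way.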

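    A child created by fork is a copy of its parent and holds copies of the same file descriptors. If the connection is opened in the parent, the parent and every child therefore share a single connection (one socket to the MySQL server), and unpredictable errors follow.
    (A connection has a single read stream and a single write stream. When several processes read from and write to it independently, their requests and replies get interleaved and the data becomes garbled.)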
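    Here is a database-free sketch of what sharing means for a connection. It is POSIX only; socket.socketpair() stands in for the client/server connection, and messages_on_shared_socket is a hypothetical helper. All children write through the one socket the parent created, so the "server" end receives everyone's bytes on a single connection. With a real MySQL client each child would also read replies from that same socket, so one process can consume a reply meant for another.

```python
import os
import socket


def messages_on_shared_socket(n_children=3):
    """Fork children that all write to ONE socket created by the parent,
    and return what arrives at the other end."""
    # `client` plays the DB connection opened in the parent;
    # `server` plays the database server's end of it.
    client, server = socket.socketpair()
    pids = []
    for i in range(n_children):
        pid = os.fork()
        if pid == 0:
            # Child: uses the inherited socket as if it owned it
            client.sendall(("child-%d;" % i).encode())
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    client.close()  # last open copy of the client end: the server now sees EOF
    chunks = []
    while True:
        chunk = server.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    server.close()
    # Every child's bytes arrived on the same connection
    return sorted(m for m in b"".join(chunks).decode().split(";") if m)
```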

    So the connection should be created in the child process instead; that avoids the problem.

    import MySQLdb
    import time
    from multiprocessing import Process
    
    class SLWorker(Process):
        def __init__(self):
            super(SLWorker, self).__init__()
            self.conn = None
    
        def run(self):
            # *** Note *** the connection is created lazily, i.e. inside the child process
            if self.conn is None:
                self.conn = MySQLdb.connect("localhost", "vearne", "xxx", "test")
            try:
                for i in range(10):
                    cursor = self.conn.cursor()
                    sql = "insert into car(name) values(%s)"
                    name = "bob"
                    param = (name,)
                    print(param)
                    #time.sleep(30)
                    n = cursor.execute(sql, param)
                    cursor.close()
                    self.conn.commit()
            finally:
                # Close in the child itself: a __del__ method would not run here,
                # because the child leaves via os._exit() (see Popen above)
                self.conn.close()
                self.conn = None
    
    if __name__ == "__main__":
        ll = []
        for i in range(10):
            p = SLWorker()
            p.start()
            ll.append(p)
        for p in ll:
            p.join()
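    The same lazy-connection pattern can be tried without a MySQL server. This sketch makes a few assumptions: sqlite3 from the standard library stands in for MySQLdb, the fork start method is requested explicitly (POSIX only), and LazyWorker and run_workers are hypothetical names. SQLite's own documentation likewise warns against carrying an open connection across fork(), so the parent closes its setup connection before any worker starts.

```python
import multiprocessing as mp
import os
import sqlite3
import tempfile

# Request fork explicitly to mirror the os.fork()-based Popen (POSIX only).
_ctx = mp.get_context("fork")


class LazyWorker(_ctx.Process):
    """Each worker opens its own connection inside run(), i.e. in the child."""

    def __init__(self, db_path, row_name, rows=10):
        super().__init__()
        self.db_path = db_path
        self.row_name = row_name
        self.rows = rows
        self.conn = None  # nothing to inherit when the parent forks

    def run(self):
        # Lazy creation: this runs in the child, after fork()
        if self.conn is None:
            self.conn = sqlite3.connect(self.db_path, timeout=30)
        try:
            for _ in range(self.rows):
                self.conn.execute("INSERT INTO car(name) VALUES (?)", (self.row_name,))
                self.conn.commit()
        finally:
            self.conn.close()
            self.conn = None


def run_workers(n_workers=10, rows=10):
    """Return (rows in the table, exit codes of the workers)."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "test.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE car (name TEXT)")
        setup.commit()
        setup.close()  # closed before any fork

        workers = [LazyWorker(db_path, "bob", rows) for _ in range(n_workers)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

        check = sqlite3.connect(db_path)
        (count,) = check.execute("SELECT COUNT(*) FROM car").fetchone()
        check.close()
        return count, [w.exitcode for w in workers]
```

    With four workers inserting five rows each, the table ends up with 20 rows and every worker exits with code 0.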

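    In short, the fix is to create the connection in the child process, or equivalently to create it lazily, on first use, once the code is already running in the child.
    Many connection pools already create their connections lazily; I have not looked at them closely, so if you have studied this, please share.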
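    One way a pool or wrapper can make lazy creation fork-safe is to remember which process created the connection and reconnect whenever the current PID differs. PidAwareConnection below is a hypothetical sketch of that idea, not any particular pool's implementation; connect is assumed to be a zero-argument factory such as lambda: MySQLdb.connect(...).

```python
import os


class PidAwareConnection:
    """Hypothetical holder: lazily creates a connection, and creates a new one
    whenever it is used from a different process than the one that created it."""

    def __init__(self, connect):
        self._connect = connect  # zero-argument factory returning a connection
        self._conn = None
        self._pid = None

    def get(self):
        pid = os.getpid()
        if self._conn is None or self._pid != pid:
            # Deliberately do NOT close an inherited connection here: the
            # parent may still be using the same socket, and closing it from
            # the child could disturb the parent's session.
            self._conn = self._connect()
            self._pid = pid
        return self._conn
```

    In the parent, repeated get() calls return the same connection; the first get() in a forked child creates a fresh one instead of reusing the parent's.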

    PS: Celery and RQ are affected by the same problem, so please take it seriously.

    Copyright notice: this is the blogger's original article; do not reproduce it without the blogger's permission.


    Tags: python, multiprocessing

    Original article: http://blog.csdn.net/woshiaotian/article/details/46892689

  • Source: https://www.cnblogs.com/ExMan/p/10143033.html