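When processing large-scale datasets such as ImageNet, a single process is slow and cannot make full use of a multi-core CPU. The usual remedy is to split the data, process the pieces in parallel in several processes, and then merge the results. Below is a demo of multi-process processing. It splits an image list into roughly equal chunks and starts one Process per chunk. The demo only distributes the work and does not collect return values. To use it in a real application, implement target_function yourself and pass it whatever it needs through args.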
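# coding=utf-8
from multiprocessing import Process


def target_function(index, sublist):
    # Placeholder worker: replace with the real per-image processing.
    print(index, sublist)


if __name__ == "__main__":
    TXT_FILE = "path/to/imagelist.txt"
    n_processes = 50  # number of processes, often set close to the CPU core count

    # Read the image list, one entry per line, without trailing newlines.
    with open(TXT_FILE, "r") as f:
        image_list = f.read().splitlines()

    n_total = len(image_list)
    length = float(n_total) / n_processes
    # n_processes + 1 boundaries, so indices[i + 1] exists for the last chunk
    # (with only n_processes boundaries the last slice raises IndexError).
    indices = [int(round(i * length)) for i in range(n_processes + 1)]
    sublists = [image_list[indices[i]:indices[i + 1]] for i in range(n_processes)]

    processes = [Process(target=target_function, args=(i, x))
                 for i, x in enumerate(sublists)]
    for p in processes:
        p.start()  # launch all workers
    for p in processes:
        p.join()   # wait for every worker to finish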
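The opening paragraph mentions merging results, which the Process-based demo does not do: each worker's return value is discarded. One common alternative, sketched here under stated assumptions, swaps in the standard-library multiprocessing.Pool, whose map method runs a function over a list in worker processes and returns the results in input order. The worker process_item and its "path length" computation are hypothetical stand-ins for real image processing.

```python
from multiprocessing import Pool


def process_item(path):
    # Hypothetical per-item work: return the path and its length.
    # Replace with real work such as decoding or resizing an image.
    return path, len(path)


def run_parallel(items, n_processes=4):
    # Pool.map splits `items` into chunks, runs process_item in worker
    # processes, and returns the results in the same order as `items`,
    # so "merging" is simply the returned list.
    with Pool(processes=n_processes) as pool:
        return pool.map(process_item, items)


if __name__ == "__main__":
    # The __main__ guard keeps child processes from re-running this block
    # when the "spawn" or "forkserver" start methods are used.
    paths = ["a.jpg", "bb.jpg", "ccc.jpg"]
    results = run_parallel(paths, n_processes=2)
    print(results)  # [('a.jpg', 5), ('bb.jpg', 6), ('ccc.jpg', 7)]
```

Because Pool handles chunking itself, the manual index arithmetic from the demo is not needed here. In a real job, process_item would return whatever per-image result should be aggregated afterwards.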