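Scenario: I have 10,000 images that need feature extraction, and running the job in a single process is far too slow.
To make full use of the machine, the work should be spread across multiple processes.
Here is a dummy example first. It creates a Pool with 10 worker processes, submits 4 tasks to it, and each task runs my dummy function.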
# -*- coding: utf-8 -*-
import sys
import os
import cv2
import dlib
from multiprocessing import Pool
import time


def exec_feature_work_dummy(index):
    # Stand-in for the real work: report the worker's PID, then sleep for one second.
    print('Run dummy subprocess %s (%s)...' % (index, os.getpid()))
    start = time.time()
    time.sleep(1)
    end = time.time()
    print('dummy subprocess %s runs %0.2f seconds.' % (index, (end - start)))


if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(10)  # pool of 10 worker processes
    # Submit 4 tasks; apply_async returns immediately without waiting for them.
    p.apply_async(exec_feature_work_dummy, args=(1,))
    p.apply_async(exec_feature_work_dummy, args=(1000,))
    p.apply_async(exec_feature_work_dummy, args=(2000,))
    p.apply_async(exec_feature_work_dummy, args=(3000,))
    print('Waiting for all subprocesses done...')
    p.close()  # no more tasks will be submitted
    p.join()   # block until every worker has finished
    print('All subprocesses done.')
The log output looks like this:
(zai01) zsd@zsd-virtual-machine:~/AIFace_Reg$ python py_batch.py
Parent process 6940.
Waiting for all subprocesses done...
Run dummy subprocess 1 (6956)...
Run dummy subprocess 1000 (6957)...
Run dummy subprocess 2000 (6958)...
Run dummy subprocess 3000 (6959)...
dummy subprocess 1 runs 1.00 seconds.
dummy subprocess 1000 runs 1.00 seconds.
dummy subprocess 2000 runs 1.00 seconds.
dummy subprocess 3000 runs 1.00 seconds.
All subprocesses done.
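One thing the dummy run does not show: apply_async returns an AsyncResult, and an exception raised inside a worker stays hidden unless get() is called on that result. The sketch below is a minimal illustration, not part of the original script. It keeps the results and collects them afterwards; `flaky_work` and `collect` are hypothetical names used only for this example.

```python
from multiprocessing import Pool


def flaky_work(index):
    # Hypothetical stand-in task: fails for negative input, otherwise doubles it.
    if index < 0:
        raise ValueError('bad index %s' % index)
    return index * 2


def collect(async_results):
    # get() returns the task's value or re-raises the exception it hit in the worker.
    values, errors = [], []
    for r in async_results:
        try:
            values.append(r.get())
        except Exception as exc:
            errors.append(exc)
    return values, errors


if __name__ == '__main__':
    with Pool(4) as p:
        results = [p.apply_async(flaky_work, args=(i,)) for i in (1, 1000, -1)]
        values, errors = collect(results)
    print('values:', values)
    print('errors:', errors)
```

Collecting results this way can make a failed task visible in the parent instead of letting it disappear silently.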
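Good. With the example above working, I can move on to the real task: starting from exec_feature_work_dummy, write the function that does the actual work, exec_feature_work. The full script is as follows: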
# -*- coding: utf-8 -*-
import sys
import os
import cv2
import dlib
from multiprocessing import Pool
import time

output_dir = './myface_feature_img_batch'
input_dir = './myface'
size = 64  # side length, in pixels, of each saved face crop

if not os.path.exists(output_dir):
    os.makedirs(output_dir)

# dlib's built-in frontal face detector
detector = dlib.get_frontal_face_detector()
print('test!!!')


def exec_feature_work_dummy(index):
    print('Run dummy subprocess %s (%s)...' % (index, os.getpid()))
    start = time.time()
    time.sleep(1)
    end = time.time()
    print('dummy subprocess %s runs %0.2f seconds.' % (index, (end - start)))


def exec_feature_work(index, boundary):
    # index: number used to name the next saved crop; the task stops once it reaches boundary.
    print('Run fetch feature subprocess %s (%s)...' % (index, os.getpid()))
    start = time.time()
    # Note: every task walks input_dir from the beginning; index and boundary only
    # control the output file names and how many crops the task saves.
    for (path, dirnames, filenames) in os.walk(input_dir):
        print('for 1')
        print(path)
        print(dirnames)
        for filename in filenames:
            if filename.endswith('.jpg') and index < boundary:
                # print('Being processed picture %s' % index)
                img_path = path+'/'+filename
                img = cv2.imread(img_path)
                if img is None:
                    continue  # cv2.imread returns None when the file cannot be read
                gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
                dets = detector(gray_img, 1)  # second argument: upsample the image once
                for i, d in enumerate(dets):
                    # Clamp the box to non-negative coordinates.
                    # x1:y1 is the row range (top to bottom), x2:y2 the column range (left to right).
                    x1 = d.top() if d.top() > 0 else 0
                    y1 = d.bottom() if d.bottom() > 0 else 0
                    x2 = d.left() if d.left() > 0 else 0
                    y2 = d.right() if d.right() > 0 else 0
                    face = img[x1:y1,x2:y2]
                    face = cv2.resize(face, (size,size))
                    # cv2.imshow('image',face)
                    cv2.imwrite(output_dir+'/'+str(index)+'.jpg', face)
                    index += 1
                # Exit on Esc; only meaningful when the imshow line above is enabled.
                key = cv2.waitKey(30) & 0xff
                if key == 27:
                    sys.exit(0)
    end = time.time()
    print('Run fetch feature subprocess %s runs %0.2f seconds.' % (index, (end - start)))


if __name__ == '__main__':
    print('Parent process %s.' % os.getpid())
    p = Pool(10)  # pool of 10 worker processes
    p.apply_async(exec_feature_work_dummy, args=(1,))
    # zip stops at the shorter range, so this submits 9 tasks:
    # (1, 1000), (1001, 2000), ..., (8001, 9000)
    for (x, y) in zip(range(1, 10000, 1000), range(1000, 10000, 1000)):
        p.apply_async(exec_feature_work, args=(x, y))
    print('Waiting for all subprocesses done...')
    p.close()  # no more tasks will be submitted
    p.join()   # block until every worker has finished
    print('All subprocesses done.')
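For splitting a directory of images across tasks, one possible approach is to list the files once in the parent process and hand each task its own disjoint slice of that list. The following is a sketch under assumptions, not the script above: `list_jpgs`, `split_evenly`, and `process_slice` are hypothetical helpers, and `process_slice` only counts files where the real detection and cropping code would go.

```python
import os
from multiprocessing import Pool


def list_jpgs(root):
    # Collect every .jpg path under root, sorted so the order is stable across runs.
    found = []
    for path, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith('.jpg'):
                found.append(os.path.join(path, filename))
    return sorted(found)


def split_evenly(items, parts):
    # Split items into `parts` contiguous slices whose lengths differ by at most one.
    base, extra = divmod(len(items), parts)
    slices, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        slices.append(items[start:stop])
        start = stop
    return slices


def process_slice(paths):
    # Placeholder for the per-image work (detect, crop, save); here it only counts.
    return len(paths)


if __name__ == '__main__':
    files = list_jpgs('./myface')
    with Pool(10) as p:
        results = [p.apply_async(process_slice, args=(chunk,))
                   for chunk in split_evenly(files, 10)]
        print(sum(r.get() for r in results), 'files handled')
```

Because the slices are built from a single sorted list, no file is assigned to two tasks and none is left out, whatever the total count happens to be.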
Now the server can be seen running at nearly full load.
The log output looks like this:
(zai01) zsd@zsd-virtual-machine:~/AIFace_Reg$ python py_batch_advance.py
test!!!
Parent process 7029.
Waiting for all subprocesses done...
Run dummy subprocess 1 (7045)...
Run fetch feature subprocess 1 (7046)...
Run fetch feature subprocess 1001 (7047)...
Run fetch feature subprocess 2001 (7048)...
Run fetch feature subprocess 3001 (7049)...
Run fetch feature subprocess 4001 (7050)...
Run fetch feature subprocess 5001 (7051)...
Run fetch feature subprocess 6001 (7052)...
Run fetch feature subprocess 7001 (7053)...
Run fetch feature subprocess 8001 (7054)...
------------------------------------------------ (time passes while the workers run) --------------------------------------
dummy subprocess 1 runs 1.00 seconds.
Run fetch feature subprocess 3000 runs 354.47 seconds.
Run fetch feature subprocess 2000 runs 356.76 seconds.
Run fetch feature subprocess 9000 runs 357.03 seconds.
Run fetch feature subprocess 1000 runs 358.78 seconds.
Run fetch feature subprocess 7000 runs 358.83 seconds.
Run fetch feature subprocess 6000 runs 361.09 seconds.
Run fetch feature subprocess 8000 runs 361.85 seconds.
Run fetch feature subprocess 4000 runs 363.03 seconds.
Run fetch feature subprocess 5000 runs 371.46 seconds.
All subprocesses done.
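As the results above show, the improvement is obvious: every worker finished in about six minutes (354 to 371 seconds), much faster than the nearly one hour my earlier run took.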
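To compare a serial run with a pooled run on the same machine, a small timing harness along these lines could be used. It is only a sketch: `fake_work` sleeps instead of processing images, and `time_serial` and `time_pool` are hypothetical names, so the numbers it prints say nothing about the real workload.

```python
import time
from multiprocessing import Pool


def fake_work(seconds):
    # Hypothetical stand-in for one task: sleep, then return the requested duration.
    time.sleep(seconds)
    return seconds


def time_serial(durations):
    # Run the tasks one after another in the current process.
    start = time.time()
    for d in durations:
        fake_work(d)
    return time.time() - start


def time_pool(durations, workers):
    # Run the same tasks through a Pool and wait for all of them to finish.
    start = time.time()
    with Pool(workers) as p:
        p.map(fake_work, durations)
    return time.time() - start


if __name__ == '__main__':
    durations = [0.2] * 4
    print('serial: %0.2f seconds' % time_serial(durations))
    print('pool:   %0.2f seconds' % time_pool(durations, 4))
```

Swapping `fake_work` for the real per-image function, on a small sample of images, would give a rough idea of the speedup before committing to a full run.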