A Tornado-based file upload demo

The web framework here is Tornado 4.0, and the file upload widget is bootstrap-fileinput.

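This small demo was written for a partner. It simulates an app that takes photos with the camera and uploads them to a backend service for image recognition. If the recognition result is OK, the client is told that no more images are needed; if it is not OK, the client keeps uploading. The idea is that each uploaded photo may differ in angle or lighting, which gives the backend recognition system more evidence for its decision.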

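One more point: HTTP request handling is usually session-based, so how do we mark several image uploads as belonging to the same business flow? It is much like a user who logs in to an HTTP backend and keeps a conversation alive through a session until logging out. In this scenario, though, one session may contain several business flows, so they still need to be told apart. The approach borrows the idea behind HTTP sessions, but in this demo I use a timestamp: every upload from the front end carries a timestamp (sent as the sequence form field) whose value is generated in JavaScript. When the recognition result is OK the timestamp is refreshed; otherwise it stays unchanged.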
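The token rule described above can be sketched on its own. This is a minimal Python illustration, not the demo's code: in the demo the logic runs in the browser's JavaScript, and the names `now_ms`, `next_sequence` and `fake_clock` are hypothetical.

```python
import time


def now_ms():
    """Current time in milliseconds, used as a business-flow token (hypothetical helper)."""
    return int(time.time() * 1000)


def next_sequence(current, result, clock=now_ms):
    """Return the token to send with the next upload.

    The token is kept while the backend answers "NOK", so further images
    join the same business flow. It is refreshed once the backend answers
    "OK", which starts a new flow.
    """
    if current is None or result == "OK":
        return clock()
    return current


# Usage with a fake clock so the behaviour is easy to follow.
ticks = iter([1000, 2000])
fake_clock = lambda: next(ticks)

seq = next_sequence(None, None, fake_clock)   # first upload: new token
same = next_sequence(seq, "NOK", fake_clock)  # not recognised yet: keep the token
fresh = next_sequence(seq, "OK", fake_clock)  # recognised: start a new flow
```

Keeping the rule in one small function makes it easy to move token generation to the backend later, should that be preferred.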

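The web backend uses multiple processes. Because of Python's GIL (Global Interpreter Lock), threads cannot run CPU-bound Python code in parallel, so multiprocessing is used instead. The worker processes do two things: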

1> Receive the image data uploaded by the client and write it to a file, so it can be kept as training material.

2> Run the image recognition logic and write the result to a shared data area.

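On top of that, in a Tornado web application the code that handles an HTTP request also runs in a process of its own. So this demo amounts to three kinds of processes cooperating, and cooperating processes have to deal with synchronization and resource sharing.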

Part 1: the web backend code, followed by some notes to help readers follow it:

 1 #!/usr/bin/env python
 2 #-*- coding:utf-8 -*-
 3 #__author__ "shihuc"
 4 
 5 import tornado.ioloop
 6 import tornado.web
 7 import os
 8 import json
 9 import multiprocessing
10 
11 import aibusiness
12 
13 procPool = multiprocessing.Pool()
14 
15 class MainHandler(tornado.web.RequestHandler):
16     def get(self):
17         self.render("uploadAI.html")
18 
19 class UploadHandler(tornado.web.RequestHandler):
20 
21     def post(self,*args,**kwargs):
22         file_metas=self.request.files['tkai_file']                         # file metadata for the form field whose name is 'tkai_file'
23         timestamp = self.get_argument("sequence")
24         xsrf = self.get_argument("_xsrf")
25 
26         res = {}
27         # Note: each HTTP request carries only one file
28         for meta in file_metas:
29             filename=meta['filename']
30             procPool.apply_async(aibusiness.doWriteImageJob, (filename, meta['body'],))
31             p = multiprocessing.Process(target=aibusiness.doRecJob, args=(timestamp, meta['body'],))
32             p.start()
33             p.join()
34         retVal = aibusiness.reportResult(timestamp)
35         print "timestamp: %s, xsrf: %s, res: %s, filename: %s\n" % (timestamp, xsrf, retVal, filename)
36         res['result'] = retVal
37         self.write(json.dumps(res))
38 
39 
40 
41 settings = {
42     'template_path': 'page',          # HTML templates
43     'static_path': 'resource',        # static files (css, js, img)
44     'static_url_prefix': '/resource/',# URL prefix for static files
45     'cookie_secret': 'shihuc',        # custom secret used to sign cookies
46     'xsrf_cookies': True              # enable XSRF (cross-site request forgery) protection
47 }
48 
49 def make_app():
50     return tornado.web.Application([
51         (r"/", MainHandler),(r"/upload", UploadHandler)
52     ], default_host='',transforms=None, **settings)
53 
54 if __name__ == "__main__":
55     app = make_app()
56     app.listen(9909)
57     tornado.ioloop.IOLoop.current().start()
58     procPool.close()

A few brief notes on the code above:

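a> In this demo, receiving an image and writing it to a file is handled by a process pool. Note line 13: the global procPool is created with multiprocessing.Pool() and no argument, so the number of worker processes defaults to the number of CPU cores on the host.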

b> Image recognition starts one new process per request. The recognition logic is simulated here by generating a random number; it lives in the aibusiness module, whose code follows below.

c> See line 41: settings gives the Tornado application its basic configuration: where the page templates live, where the resources used by the HTML files live, and the security-related options.

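For example, the HTML files are in the page directory, and the root directory for resources (css, js, images and so on) is resource.
On the security side, cookie_secret is a custom secret that Tornado uses when signing cookies, and xsrf_cookies switches protection against cross-site request forgery (CSRF) on or off. Tornado calls CSRF "XSRF"; in this example the protection is enabled.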

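d> Synchronization between processes: the main concern is between the process that receives the HTTP request and the image recognition process. The recognition result has to be returned to the client, so the receiving process must wait until the recognition process has finished. This is done by the join call on line 33.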

e> See lines 26, 36 and 37. Note that when the post handler finishes it must return a JSON result to the client, because bootstrap-fileinput expects one when it checks the response.
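The process pattern from notes a>, b> and d> can be tried without a web server. The sketch below is an illustration under assumptions rather than the demo's code: it is written for Python 3 on a Unix host (the listing above is Python 2), it asks for the "fork" start method explicitly, and the names `write_image`, `recognise`, `handle_upload` and `run_demo` are hypothetical stand-ins.

```python
import multiprocessing
import os
import tempfile

# Assumption: a Unix host, so the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def write_image(path, data):
    """Pool task: persist the uploaded bytes (stands in for doWriteImageJob)."""
    with open(path, "wb") as fh:
        fh.write(data)
    return path


def recognise(token, data, results):
    """Per-request task: store a fake recognition score under the flow token."""
    results[token] = len(data) % 10


def handle_upload(pool, results, token, filename, data, directory):
    # Hand the file write to the pool and carry on without waiting for it.
    pending = pool.apply_async(write_image, (os.path.join(directory, filename), data))
    # Recognition gets its own process; join() makes the caller wait for it.
    worker = ctx.Process(target=recognise, args=(token, data, results))
    worker.start()
    worker.join()
    return results[token], pending


def run_demo():
    directory = tempfile.mkdtemp()
    manager = ctx.Manager()
    pool = ctx.Pool()  # no argument: the worker count follows the host's CPU count
    try:
        score, pending = handle_upload(pool, manager.dict(), "1489022400000",
                                       "a.jpg", b"12345", directory)
        saved = pending.get(timeout=10)  # wait here only to inspect the result
        return score, os.path.exists(saved)
    finally:
        pool.close()
        pool.join()
        manager.shutdown()
```

One design consequence worth keeping in mind: join() blocks its caller, so inside a Tornado handler the IOLoop cannot serve other requests until the recognition process has finished.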

Part 2: the aibusiness module

 1 #!/usr/bin/env python
 2 #-*- coding:utf-8 -*-
 3 #__author__ "shihuc"
 4 
 5 import os
 6 import json
 7 import random
 8 import multiprocessing
 9 
10 
11 # Records the results for each business request: the key is the timestamp sent by the
12 # front end, the value is the list of image recognition results for that timestamp.
13 timestamp_filecount_map = multiprocessing.Manager().dict()
14 
15 procLock = multiprocessing.Lock()
16 procEvent = multiprocessing.Event()
17 
18 upload_path=os.path.join(os.path.dirname(__file__),'uploadfiles')  # where uploaded files are stored
19 
20 def doWriteImageJob(filename, imgData):
21        """ 1. Add your business logic here, write image data as file! 
22        """
23        #Below do result update
24        filepath=os.path.join(upload_path,filename)
25        with open(filepath,'wb') as up:                                # write in binary mode; adjust as needed in practice
26             up.write(imgData)
27 
28 def doRecJob(timestamp, imgData):
29        """ 1. Add your business logic here, for example, image recognition!
30            2. After image rec process, you must update the timestamp_filecount_map
31        to check the next final result in the next step.
32        """
33        # Simulate recognition with a random result
34        procLock.acquire()
35        result = random.randrange(0, 10, 1)
36        #Below do result update
37        res = []
38        if timestamp_filecount_map.get(str(timestamp)) is None:
39           res.append(result)
40        else:
41           res = timestamp_filecount_map.get(str(timestamp))
42           res.append(result)
43        timestamp_filecount_map[str(timestamp)] = res
44        print timestamp_filecount_map
45        procLock.release()
46 
47 
48 def reportResult(timestamp):
49        """ Add your business logic here, check whether the result is ok or not. 
50        Here, I simulate the logic that checks whether the existing results
51        are acceptable, e.g. the share of images with a good result is no less
52        than 80%, which is defined to be OK.
53        """
54        # Simulation: look at all the results; if at least 80% of the images have a
55        # result no less than 2, then the final result is OK.
56        procLock.acquire()
57        tempCnt = 0
58        try:
59            detail_info = timestamp_filecount_map.get(str(timestamp))
60            if detail_info is None:
61               return "OK"
62            else:
63               for elem in detail_info:
64                  if elem >= 2:
65                      tempCnt += 1
66               if tempCnt >= len(detail_info) * 0.8:
67                  del timestamp_filecount_map[str(timestamp)]
68                  return "OK"
69               else:
70                  return "NOK"
71        finally:
72            procLock.release()

A few points about the code above need explaining:

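1> Synchronization between processes uses the multiprocessing Lock, for example line 15: procLock = multiprocessing.Lock(). Each process takes the lock around the code that touches the shared timestamp_filecount_map structure. Locking keeps each read-modify-write intact and avoids dirty reads.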

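2> Data shared between processes must be created through the multiprocessing module's Manager, for example line 13: timestamp_filecount_map = multiprocessing.Manager().dict(). With an ordinary dictionary such as timestamp_filecount_map = {}, changes made in one process are not visible in the others. The typical symptom in testing is that every call to reportResult gets None for detail_info at line 59.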

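3> The code above simulates image recognition with random numbers: a number of 2 or greater means a single recognition result is OK. To decide whether a whole business flow is OK, it checks whether the count of numbers no less than 2 is at least 80% of all the numbers in the list; if so the result is OK, otherwise NOK.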
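The three points above can be checked with a small experiment. This is a sketch under assumptions, not the module's code: it targets Python 3 on a Unix host with the "fork" start method, and `record`, `is_flow_ok`, `collect` and `compare` are hypothetical names.

```python
import multiprocessing

# Assumption: a Unix host, so the "fork" start method is available.
ctx = multiprocessing.get_context("fork")


def record(store, lock, token, score):
    """Append one score to the flow's list as a single locked read-modify-write."""
    with lock:
        scores = store.get(token) or []
        scores.append(score)
        store[token] = scores  # reassign so a managed dict sees the change


def is_flow_ok(scores, min_score=2, ratio=0.8):
    """OK when at least `ratio` of the scores are no less than `min_score`."""
    good = sum(1 for s in scores if s >= min_score)
    return good >= len(scores) * ratio


def collect(store, scores, token="t1"):
    """Record each score from a separate process, then return what the parent sees."""
    lock = ctx.Lock()
    workers = [ctx.Process(target=record, args=(store, lock, token, s)) for s in scores]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return store.get(token)


def compare(scores):
    """Run the same experiment with a managed dict and with a plain dict."""
    manager = ctx.Manager()
    try:
        shared = collect(manager.dict(), scores)
    finally:
        manager.shutdown()
    plain = collect({}, scores)  # each child only changes its own copy
    return shared, plain
```

Calling `compare([5, 1, 7, 9, 3])` should return all five scores for the managed dict (in whatever order the processes ran) and None for the plain dict, which matches the symptom described in point 2.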

Part 3: the bootstrap-fileinput front end

 1 <!doctype html>
 2 <html>
 3 <head>
 4     <meta charset="UTF-8">
 5     <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"> 
 6     <meta name="viewport" content="width=device-width, initial-scale=1.0">
 7     <title>Python multiprocessing demo</title>
 8     <link href="{{static_url('css/bootstrap.min.css')}}" rel="stylesheet">
 9     <link rel="stylesheet" type="text/css" href="{{static_url('css/default.css')}}">
10     <link href="{{static_url('css/fileinput.css')}}" media="all" rel="stylesheet" type="text/css" />    
11     <script src="{{static_url('js/jquery-2.1.1.min.js')}}"></script>
12     <script src="{{static_url('js/fileinput.js')}}" type="text/javascript"></script>
13     <script src="{{static_url('js/bootstrap.min.js')}}" type="text/javascript"></script>
14     <script src="{{static_url('js/bootbox.js')}}" type="text/javascript"></script>
15 </head>
16 <body>
17     <div class="htmleaf-container">
18         <div class="container kv-main">
19            <div class="page-header">
20              <h2>Python concurrency demo</h2>
21            </div>
22            <form enctype="multipart/form-data" method="post">
23               <div class="form-group">
24                   {% module xsrf_form_html() %}
25                   <input type="file" name="tkai_file" id="tkai_input" multiple>
26               </div>
27               <hr>
28            </form>
29         </div>
30     </div>
31     <script>
32             $(document).ready(function() {
33                 if(sessionStorage.image_ai_sequence == null || sessionStorage.image_ai_sequence == undefined){
34                     sessionStorage.image_ai_sequence = Date.parse(new Date());
35                 }
36                 var fileInput= $("#tkai_input").fileinput({
37                         uploadUrl: "/upload",
38                         uploadAsync: true,
39                         maxFileCount: 15,
40                         allowedFileExtensions : ['jpg','jpeg','png','gif'],// allowed file types
41                         showUpload: false,                                 // whether to show the upload button
42                         showCaption: true,                                 // whether to show the caption
43                         showPreview: true,
44                         autoReplace: true,
45                         dropZoneEnabled: true,               
46                         uploadExtraData: function() { return {'sequence': sessionStorage.image_ai_sequence, '_xsrf': document.getElementsByName('_xsrf')[0].value}} 
47                     }).on('filepreajax', function(event, previewId, index) {
48                         console.log('previewId:' + previewId + ', index: ' + index + ', seq: ' + sessionStorage.image_ai_sequence);
49                     }).on('filepreupload', function(event, data, previewId, index, jqXHR){
50                         //console.log('filepreupload');
51                     }).on('fileuploaded',function(event, data) {     // callback after a single file uploads successfully
52                         //console.log('fileuploaded');
53                         var res=data.response;
54                         if(res.result == "NOK"){
55                             ;                                        // NOK from the backend: recognition fell short, so upload another image
56                         }else if (res.result == "OK"){
57                             sessionStorage.image_ai_sequence = Date.parse(new Date());       // recognition is good enough; no more files are needed
58                             bootbox.alert("Result is acceptable!");
59                         }
60                    }).on('filecustomerror', function(event, params, msg) {
61                         //console.log(params)
62                         //console.log(msg)
63                    }).on('fileclear', function(event,data) {         // callback for the remove button
64                         //console.log(data);
65                    }).on('filebatchuploadsuccess', function(event,data) { // callback for batch upload
66                         //console.log(data);
67                    });
68             });
69     </script>
70 </body>
71 </html>

Some notes on this code as well:

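1> On line 8, static_url is a Tornado template function. {{static_url('css/bootstrap.min.css')}} works together with the static resource settings described with the backend code in part 1. The resource prefix in this demo is /resource/, so after the template is rendered the path becomes /resource/css/bootstrap.min.css. The same holds everywhere else static_url appears in the code above. Resources are loaded with the {{ ... }} template syntax. One benefit is that Tornado appends a version argument (v=...) derived from the file's content to each resource URL, so when a file changes its URL changes too and the browser fetches the new copy instead of serving a stale one from its cache.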

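2> Line 24 also uses the Tornado template language. It loads a module that generates the XSRF logic: a hidden input element named _xsrf whose value is a string generated by Tornado. The value acts as a random token that guards against cross-site request forgery. If a form is submitted without this value, or with a value that does not match what the Tornado backend expects, the submission is rejected. The template syntax here is {% ... %}.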

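3> Lines 33-35 set up the timestamp that marks a business flow, as described earlier. The value could also be generated by the backend; since this is a demo, the front end generates it. sessionStorage is used to store it so that a page refresh does not leave the value inconsistent.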

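4> The fileinput plugin supports two modes for uploading multiple files: a synchronous batch upload, and an asynchronous upload that sends the files one at a time. Line 38 selects the asynchronous one-at-a-time mode. In this mode each HTTP request reaching the backend contains exactly one file. With batch upload the backend receives an array with several files. Our Python backend supports both single and batch upload.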

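5> Line 46 uses fileinput's ability to send extra data along with an upload, that is, user-defined data in addition to the form fields. uploadExtraData is set to a callback so that the data is read afresh before every upload, instead of always sending the value it had when the page loaded.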
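The versioned resource URLs mentioned in note 1> can be illustrated with a few lines of Python. This sketch shows the general idea only and is not Tornado's implementation: the function name `versioned_url` and the choice of hash are assumptions made for the example.

```python
import hashlib


def versioned_url(prefix, path, content):
    """Build a static URL whose `v` argument changes whenever the content changes."""
    digest = hashlib.md5(content).hexdigest()
    return "%s%s?v=%s" % (prefix, path, digest)


# The same path gets a different URL once the file's bytes differ,
# so a browser holding the old URL in its cache requests the new one.
old = versioned_url("/resource/", "css/default.css", b"body { color: red; }")
new = versioned_url("/resource/", "css/default.css", b"body { color: blue; }")
```

Because the version comes from the content rather than from the clock, an unchanged file keeps the same URL and can stay cached.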

Finally, here is the directory structure of the whole Tornado-based web project:

[root@localhost demo]# ll
total 20
-rw-r--r-- 1 root root 2686 03-09 10:36 aibusiness.py
drwxr-xr-x 2 root root 4096 03-10 14:12 page
drwxr-xr-x 6 root root 4096 03-03 15:07 resource
drwxr-xr-x 2 root root 4096 03-07 17:07 uploadfiles
-rw-r--r-- 1 root root 1858 03-07 17:05 web_server.py

Once the project is running, opening it in a browser shows the upload page.

All source files for this demo are also on GitHub at https://github.com/shihuc/fileupload for anyone interested.

Original post: https://www.cnblogs.com/shihuc/p/6530459.html