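For a timestamp in this format (a forum post header with a Chinese weekday and month name and a 12-hour clock): 發表於: 星期一 五月 28, 2012 6:59 am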
import re

INPUT = "發表於: 星期一 五月 28, 2012 6:59 am 文章主題: 對《大話新聞》改組的誠心思考/蔬菜麵"

# Chinese month names mapped to two-digit month numbers
monthdict = {"一月": "01", "二月": "02", "三月": "03", "四月": "04",
             "五月": "05", "六月": "06", "七月": "07", "八月": "08",
             "九月": "09", "十月": "10", "十一月": "11", "十二月": "12"}

b = re.findall(r'\d+', INPUT)  # ['28', '2012', '6', '59']
a = INPUT.split(' ')           # a[2] = month name, a[4] = year, a[6] = am/pm

year = a[4]
month = monthdict[a[2]]
day = b[0].zfill(2)
# Convert the 12-hour clock to 24-hour: 12 am -> 0, 12 pm -> 12
hour = int(b[2]) % 12
if a[6] == 'pm':
    hour += 12
minute = b[3]

OUTPUT = "%s-%s-%s %02d:%s:00" % (year, month, day, hour, minute)
print(OUTPUT)  # 2012-05-28 06:59:00
For an ordinary timestamp like this one, where the date and time are already numeric: http://www.cdnews.com.tw 2015-11-02 17:33:55
import re

INPUT = "http://www.cdnews.com.tw 2015-11-02 17:33:55"

# Collect every run of digits; this relies on the URL itself containing no digits
a = re.findall(r'\d+', INPUT)  # ['2015', '11', '02', '17', '33', '55']

year = a[0]
month = a[1]
day = a[2]
hour = a[3]
minute = a[4]
second = a[5]

OUTPUT = "%s-%s-%s %s:%s:%s" % (year, month, day, hour, minute, second)
print(OUTPUT)  # 2015-11-02 17:33:55
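The digit-run approach above breaks as soon as the URL contains a number. A minimal sketch of a stricter variant follows: it matches the whole `YYYY-MM-DD HH:MM:SS` shape and lets `datetime.strptime` reject impossible dates. The function name `extract_timestamp` and the `news24.example.com` URL are made up for illustration and are not part of the original snippet.

```python
import re
from datetime import datetime

# Matches "YYYY-MM-DD HH:MM:SS" anywhere in the text
TIMESTAMP_RE = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')


def extract_timestamp(text):
    """Return the first 'YYYY-MM-DD HH:MM:SS' found in text, or None."""
    match = TIMESTAMP_RE.search(text)
    if match is None:
        return None
    # strptime raises ValueError for impossible values such as month 13
    parsed = datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


print(extract_timestamp("http://www.cdnews.com.tw 2015-11-02 17:33:55"))
# Hypothetical URL with digits in it; the stray numbers are ignored
print(extract_timestamp("http://news24.example.com/2015 2015-11-02 17:33:55"))
```

Both calls should print the same normalized string, because the pattern only accepts the full date-time shape and skips isolated digit runs.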
For a timestamp in this format, this time with a pm hour: 發表於: 星期三 十二月 14, 2016 6:45 pm
import re

INPUT = "發表於: 星期三 十二月 14, 2016 6:45 pm"

# Chinese month names mapped to two-digit month numbers
monthdict = {"一月": "01", "二月": "02", "三月": "03", "四月": "04",
             "五月": "05", "六月": "06", "七月": "07", "八月": "08",
             "九月": "09", "十月": "10", "十一月": "11", "十二月": "12"}

b = re.findall(r'\d+', INPUT)  # ['14', '2016', '6', '45']
a = INPUT.split(' ')           # a[2] = month name, a[4] = year, a[6] = am/pm

year = a[4]
month = monthdict[a[2]]
day = b[0].zfill(2)
# Convert the 12-hour clock to 24-hour: 12 am -> 0, 12 pm -> 12
hour = int(b[2]) % 12
if a[6] == 'pm':
    hour += 12
minute = b[3]

# %02d zero-pads single-digit hours, so am and pm need no separate branches
OUTPUT = "%s-%s-%s %02d:%s:00" % (year, month, day, hour, minute)
print(OUTPUT)  # 2016-12-14 18:45:00
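The two forum-style snippets differ only in their input, so they can be folded into one helper. The following is a sketch under some assumptions: the header always has the shape "month name, day, comma, year, hour:minute, am/pm", and the names `normalize_forum_time`, `MONTHS` and `FORUM_RE` are invented here for illustration. It uses one regular expression instead of combining `split(' ')` with a digit search, so trailing text such as the post title does not shift any indexes.

```python
import re

# Chinese month names mapped to month numbers
MONTHS = {"一月": 1, "二月": 2, "三月": 3, "四月": 4, "五月": 5, "六月": 6,
          "七月": 7, "八月": 8, "九月": 9, "十月": 10, "十一月": 11, "十二月": 12}

# Groups: month name, day, year, hour, minute, am/pm
FORUM_RE = re.compile(
    r'(\S+月)\s+(\d{1,2}),\s+(\d{4})\s+(\d{1,2}):(\d{2})\s*([ap]m)')


def normalize_forum_time(text):
    """Return 'YYYY-MM-DD HH:MM:00' for a forum header, or None if absent."""
    m = FORUM_RE.search(text)
    if m is None or m.group(1) not in MONTHS:
        return None
    month_name, day, year, hour, minute, meridiem = m.groups()
    # 12-hour to 24-hour: 12 am -> 0, 12 pm -> 12
    hour = int(hour) % 12
    if meridiem == 'pm':
        hour += 12
    return "%s-%02d-%02d %02d:%s:00" % (
        year, MONTHS[month_name], int(day), hour, minute)


print(normalize_forum_time("發表於: 星期一 五月 28, 2012 6:59 am 文章主題: 對《大話新聞》改組的誠心思考/蔬菜麵"))
print(normalize_forum_time("發表於: 星期三 十二月 14, 2016 6:45 pm"))
```

Returning `None` for unrecognized input, instead of raising, is a design choice for this sketch; a scraper that should fail loudly on an unexpected header might prefer an exception.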