Sample of the raw data:
来电,2017/1/5 上午11:55,95599,【中国农业银行】您尾号9672的农行账户于01月05日11时54分完成一笔支付宝交易,金额为-18.00,余额3905.35。,
来电,2017/1/5 下午12:10,95599,【中国农业银行】您尾号9672的农行账户于01月05日12时10分完成一笔现支交易,金额为-200.00,余额3705.35。,
来电,2017/1/5 下午12:35,95599,【中国农业银行】您尾号9672的农行账户于01月05日12时35分完成一笔支付宝交易,金额为-50.00,余额3650.35。,
来电,2017/1/5 下午1:47,95599,【中国农业银行】您尾号9672的农行账户于01月05日13时47分完成一笔支付宝浙交易,金额为-199.00,余额3451.35。,
来电,2017/1/5 下午2:45,95599,【中国农业银行】您尾号9672的农行账户于01月05日14时45分完成一笔消费交易,金额为-199.00,余额3252.35。,
来电,2017/1/5 下午4:21,95599,【中国农业银行】您尾号9672的农行账户于01月05日16时21分完成一笔支付宝浙交易,金额为-329.00,余额2923.35。,
来电,2017/1/5 下午5:56,95599,【中国农业银行】您尾号9672的农行账户于01月05日17时56分完成一笔支付宝交易,金额为-20.00,余额2903.35。,
来电,2017/1/9 上午10:33,106906615500,【京东】还剩最后两天!PLUS会员新年特权,开通立送2000京豆,独享全品类神券,确定要错过? dc.jd.com/auVjQQ 回TD退订,
来电,2017/1/10 下午1:10,106980005618000055,【京东】我是京东配送员:韩富韩,您的订单正在配送途中,请准备收货,联系电话:15005125027。,
来电,2017/1/10 下午3:13,106906615500,【京东】等着放假,忘了您的PLUS账户中还有超过2000待返京豆?现在开通PLUS正式用户即可到账,还可享受高于普通用户10倍的购物回馈,随时京豆拿到手软。另有全年360元运费补贴、专享商品、专属客服等权益。戳 dc.jd.com/XhuKQQ 开通。回TD退订,
(Data source: SMS messages exported from a phone in CSV format)
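To make the row layout concrete, here is a minimal sketch that reads rows like the ones above and keeps only those sent by 95599. It is an illustration, not the stage-one code: the function name `bank_rows` is hypothetical, the second sample row is shortened, and rejoining the trailing fields is an assumption for exports where the message body contains unquoted commas.

```python
import csv
import io

# Row layout: direction, time, sender, body, trailing empty field.
# The first row is copied from the sample; the second is shortened.
SAMPLE = (
    "来电,2017/1/5 上午11:55,95599,【中国农业银行】您尾号9672的农行账户于"
    "01月05日11时54分完成一笔支付宝交易,金额为-18.00,余额3905.35。,\n"
    "来电,2017/1/9 上午10:33,106906615500,【京东】还剩最后两天!,\n"
)


def bank_rows(text, sender="95599"):
    """Yield (timestamp, body) for every row sent by `sender`."""
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 4 or fields[2] != sender:
            continue
        # An unquoted body that contains commas is split into several
        # fields by csv.reader, so join everything after the sender again.
        body = ",".join(fields[3:]).rstrip(",")
        yield fields[1], body


for timestamp, body in bank_rows(SAMPLE):
    print(timestamp, body)
```

Only the bank row is printed; the 京东 row is skipped because its sender differs.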
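Goals
Stage one: parse the Agricultural Bank of China SMS alerts and plot the account balance over time.
Stage two: find a way to show what each payment was for and how much was spent.
Stage three: make the text handling more flexible, replacing fixed-position slicing with natural language processing that understands what the message says.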
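As a possible starting point for stage two, the sketch below pulls the transaction type and amount out of each message with a regular expression and totals the spending per type. It is only an illustration under assumptions: the pattern matches just the message layout shown in the sample, it accepts both ASCII and full-width commas, and the names `PATTERN` and `spending_by_kind` are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "完成一笔支付宝交易,金额为-18.00,余额3905.35"
PATTERN = re.compile(
    r"完成一笔(?P<kind>.+?)交易[,,]"
    r"金额为(?P<amount>-?\d+(?:\.\d+)?)[,,]"
    r"余额(?P<balance>\d+(?:\.\d+)?)"
)


def spending_by_kind(bodies):
    """Return {transaction type: total amount spent} for the given bodies."""
    totals = defaultdict(float)
    for body in bodies:
        match = PATTERN.search(body)
        if match is None:
            continue  # not a message in the expected layout
        amount = float(match.group("amount"))
        if amount < 0:  # negative amounts are money going out
            totals[match.group("kind")] += -amount
    return dict(totals)


bodies = [
    "【中国农业银行】您尾号9672的农行账户于01月05日11时54分完成一笔支付宝交易,金额为-18.00,余额3905.35。",
    "【中国农业银行】您尾号9672的农行账户于01月05日12时10分完成一笔现支交易,金额为-200.00,余额3705.35。",
    "【中国农业银行】您尾号9672的农行账户于01月05日12时35分完成一笔支付宝交易,金额为-50.00,余额3650.35。",
]
print(spending_by_kind(bodies))
```

Named groups keep the extraction independent of character positions, so a change in the length of the date or card number would not break it the way a fixed slice would.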
The stage-one code follows. Questions and suggestions are welcome!
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 22 22:13:20 2018
@author: mrzhang
"""
import csv
import os

import matplotlib.pyplot as plt


class DealMessage:
    def __init__(self):
        self.home_path = os.getcwd()  # current working directory
        self.filename = os.path.join(self.home_path, "message.csv")

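    def get_cvs_list(self):
        """Read the CSV file and return its rows as a list of lists."""
        # Assumes the export is UTF-8 encoded; change the encoding if it is not.
        with open(self.filename, encoding="utf-8", newline="") as f:
            reader = csv.reader(f)
            list_read = list(reader)
        return list_read
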
    def get_yinghang_message_list(self):
        """Keep only the bank messages (sender 95599); drop name, phone, etc."""
        total_list = self.get_cvs_list()
        money_list = []
        for each_line in total_list:
            # Row layout: [direction, time, sender, body, ...]; skip short rows.
            if len(each_line) > 3 and each_line[2] == '95599':
                time_str, body = each_line[1], each_line[3]
                # In the sample rows the first 37 characters end with "完成一笔";
                # skip them and split the remainder on commas.
                each_line_list = body[37:].split(',')
                each_line_list.insert(0, time_str)
                money_list.append(each_line_list)  # add to the new list
        return money_list

    def get_type_by_parameter(self, num):
        """Return the messages with `num` fields; the two SMS formats differ in length."""
        money_list = self.get_yinghang_message_list()
        first_list = []
        for each in money_list:
            if len(each) == num:
                first_list.append(each)
        return first_list

    def deal_time_form(self, messages):
        """Prepend a 24-hour timestamp such as 2017/1/5/13/47 to each message."""
        for each in messages:
            correct_time = each[0].split()
            date = correct_time[0]
            time = correct_time[1]
            period, time = time[:2], time[2:]  # "上午"/"下午" and "h:mm"
            shi, feng = time.split(":")
            shi = int(shi) % 12  # 12:xx is hour 0 of its half-day
            if period == "下午":  # PM: convert to 24-hour form
                shi += 12
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)

    def choose_message_by_time(self, is_before_0223):
        """Smooth over the two message formats: add a 24-hour timestamp and the balance."""
        if is_before_0223:
            num = 4         # earlier format: 4 fields
            remove_num = 2  # the balance follows a 2-character prefix ("余额")
        else:
            num = 3         # later format: 3 fields
            remove_num = 5  # the balance follows a 5-character prefix
        messages = self.get_type_by_parameter(num)
        for each in messages:
            # Convert the time into a form like 2017/1/5/13/47.
            correct_time = each[0].split()
            date = correct_time[0]
            time = correct_time[1]
            period, time = time[:2], time[2:]  # "上午"/"下午" and "h:mm"
            shi, feng = time.split(":")
            shi = int(shi) % 12  # 12:xx is hour 0 of its half-day
            if period == "下午":  # PM: convert to 24-hour form
                shi += 12
            final_time = date + "/" + str(shi) + "/" + feng
            each.insert(0, final_time)
            # The last field holds the balance: drop the prefix and the trailing "。".
            money = each[-1][remove_num:][0:-1]
            each.insert(1, money)
        return messages

    def get_x_y(self):
        """Return the timestamps and the matching balances."""
        messages = self.choose_message_by_time(True) + self.choose_message_by_time(False)
        time_list = []
        money_list = []
        for each in messages:
            time_list.append(each[0])
            money_list.append(float(each[1]))
        # Return one timestamp per balance so the two lists stay aligned.
        return time_list, money_list

    def draw_picture(self):
        """Plot how the account balance changes from message to message."""
        x, y = self.get_x_y()
        plt.figure(figsize=(16, 4))  # create the figure
        plt.plot(y, 'r')  # balance against message index, drawn in red
        plt.xlabel("Time")
        plt.ylabel("Money")
        plt.title("money")
        plt.grid(True)
        plt.savefig("line.jpg")  # save first; saving after show() can write a blank image
        plt.show()  # display the chart


if __name__ == "__main__":
    m = DealMessage()  # create the message processor
    m.draw_picture()   # draw the balance chart
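
The timestamp conversion could also be written with the standard `datetime` module, which would allow real dates on the x-axis of the chart. The following is a minimal sketch, assuming timestamps look like those in the sample (`2017/1/5 下午1:47`); `parse_sms_time` is a hypothetical helper and not part of the code above.

```python
from datetime import datetime


def parse_sms_time(stamp):
    """Convert a string such as '2017/1/5 下午1:47' into a datetime."""
    date_part, clock_part = stamp.split()
    period, clock = clock_part[:2], clock_part[2:]  # "上午"/"下午", "h:mm"
    hour, minute = (int(part) for part in clock.split(":"))
    hour %= 12  # 12:xx is hour 0 of its half-day
    if period == "下午":  # PM: shift into the second half of the day
        hour += 12
    year, month, day = (int(part) for part in date_part.split("/"))
    return datetime(year, month, day, hour, minute)


print(parse_sms_time("2017/1/5 下午1:47"))
```

A list of such `datetime` objects can be passed to `plt.plot` as the x-values together with the balances, since matplotlib accepts `datetime` values on an axis.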
Running the program displays the balance chart and saves it as line.jpg.
Feel free to repost; feedback is welcome!