[Repost] Movie Recommendation with Python

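Collaborative filtering is a common technique in recommender systems. Depending on the dimension being analyzed, it can produce either user-based or item-based recommendations.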

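Below is a concrete way to implement movie recommendation in Python. The dataset comes from the book Programming Collective Intelligence; the implementation is entirely my own, because the one in the book is rather fragmented and hard to follow.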

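I use the item-based approach here. In most cases there are relatively few kinds of products and a very large number of users, so an item-based recommender, which compares products with each other rather than users with each other, keeps the amount of computation small.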

The basic idea is simple:

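First, read the data into a user-movie matrix, as shown in the figure: each entry is the rating that a user (horizontal axis) gave to a particular movie (vertical axis).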
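As a minimal sketch of this first step, the snippet below builds such a matrix with pandas from a nested dictionary. The users, movies, and ratings are made up for illustration, and storing unrated movies as 0 is an assumption that mirrors the convention used in the program further down.

```python
import pandas as pd

# Hypothetical toy ratings in the form {user: {movie: rating}}
ratings = {
    'alice': {'A': 3.0, 'B': 4.0},
    'bob': {'A': 2.0, 'C': 5.0},
}

# A dict of dicts puts the outer keys (users) in the columns and the
# inner keys (movies) in the index; missing ratings become NaN.
by_movie = pd.DataFrame(ratings)

# Mark unrated movies with 0, then transpose so that each row is a user
# and each column is a movie.
user_movie = by_movie.fillna(0).T
print(user_movie)
```

After the transpose, `user_movie.loc['alice', 'C']` is 0 because alice has not rated movie C.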

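Second, use the user-movie matrix to compute the correlation coefficient between every pair of movies (usually the Pearson correlation coefficient), which gives a movie-movie correlation matrix.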
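The sketch below shows one way to compute this matrix, assuming the ratings are held in a pandas DataFrame with one column per movie. The data and the variable names are illustrative only. Because unrated movies are stored as 0, they enter the correlation as if they were real ratings, which is a simplification.

```python
import numpy as np
import pandas as pd

# Hypothetical user-movie matrix: rows are users, columns are movies, 0 = unrated
user_movie = pd.DataFrame(
    {'A': [3.0, 2.0, 4.0], 'B': [4.0, 1.0, 5.0], 'C': [0.0, 5.0, 1.0]},
    index=['alice', 'bob', 'carol'],
)

# Pearson correlation between every pair of columns (movies)
movie_corr = user_movie.corr(method='pearson')

# np.corrcoef with rowvar=False also treats columns as variables,
# so it should produce the same values.
same = np.allclose(movie_corr.values, np.corrcoef(user_movie, rowvar=False))

# Map correlations from [-1, 1] to [0, 1] so they can act as non-negative weights
movie_sim = 0.5 + 0.5 * movie_corr
print(movie_sim.round(3))
```

Rescaling to [0, 1] keeps every weight non-negative, so negatively correlated movies simply contribute little instead of subtracting from the estimate.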

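Third, use the movie-movie correlation matrix together with the ratings the user has already given to estimate, as a weighted average, a score for each movie the user has not rated. For example, if a user rated movie A 3 and movie B 4 but has not rated movie C, and the correlations of C with A and B are 0.3 and 0.8 respectively, then the estimated score for C is (0.3*3+0.8*4)/(0.3+0.8), which is about 3.73.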
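The worked example above can be checked with a few lines of Python. The helper name `weighted_estimate` is hypothetical and exists only for this illustration.

```python
def weighted_estimate(ratings, sims):
    """Similarity-weighted mean of the ratings a user has already given."""
    total_sim = sum(sims)
    if total_sim == 0:
        # No usable similarity information: fall back to 0 (an assumption)
        return 0.0
    return sum(r * s for r, s in zip(ratings, sims)) / total_sim


# Ratings for movies A and B, and the correlations of C with A and B
estimate = weighted_estimate([3, 4], [0.3, 0.8])
print(round(estimate, 3))  # 3.727
```

Movie B dominates the estimate because its correlation with C is much higher than that of movie A.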

Finally, for each user, take the movies they have not rated, sort them by estimated score in descending order, and recommend the top n.
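A minimal sketch of this last step for a single user, assuming the actual ratings and the estimated scores are available as two pandas Series indexed by movie. The data and the helper name `top_n` are hypothetical.

```python
import pandas as pd

# Hypothetical data for one user: actual ratings (0 = unrated) and estimated scores
rated = pd.Series({'A': 3.0, 'B': 0.0, 'C': 0.0, 'D': 0.0})
estimated = pd.Series({'A': 3.0, 'B': 2.1, 'C': 4.4, 'D': 3.6})


def top_n(rated, estimated, n):
    """Return the n unrated movies with the highest estimated scores."""
    unrated = rated[rated == 0].index
    return estimated.loc[unrated].sort_values(ascending=False).head(n)


top2 = top_n(rated, estimated, 2)
print(list(top2.index))  # ['C', 'D']
```

Movie A is excluded even though its score is known, because the user has already rated it.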

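The source code follows. The block comments are fairly detailed and explain what each part does. The program uses the pandas and numpy libraries, which keeps it concise: operations such as computing correlation coefficients and sorting are already implemented there and can be used directly.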
```python
import numpy as np
import pandas as pd

# Read the data: {user: {movie: rating}}
data = {
    'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5,
                  'Just My Luck': 3.0, 'Superman Returns': 3.5,
                  'You, Me and Dupree': 2.5},
    'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5,
                     'Just My Luck': 1.5, 'The Night Listener': 3.0},
    'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0,
                         'Superman Returns': 3.5, 'The Night Listener': 4.0},
    'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0,
                     'The Night Listener': 4.5, 'You, Me and Dupree': 2.5},
    'Mick LaSalle': {'Just My Luck': 2.0, 'Lady in the Water': 3.0,
                     'Superman Returns': 3.0, 'The Night Listener': 3.0,
                     'You, Me and Dupree': 2.0},
    'Jack Matthews': {'Snakes on a Plane': 4.0, 'The Night Listener': 3.0,
                      'Superman Returns': 5.0, 'You, Me and Dupree': 3.5},
    'Toby': {'Snakes on a Plane': 4.5, 'You, Me and Dupree': 1.0,
             'Superman Returns': 4.0},
}

# Clean and transform the data (rows: movies, columns: users)
data = pd.DataFrame(data)
# 0 means "not rated"
data = data.fillna(0)
# Transpose so that each row is a user and each column is a movie
mdata = data.T

# Calculate the similarity between movies and rescale it from [-1, 1] to [0, 1]
np.set_printoptions(precision=3)
mcors = np.corrcoef(mdata, rowvar=False)
mcors = 0.5 + mcors * 0.5
mcors = pd.DataFrame(mcors, columns=mdata.columns, index=mdata.columns)


# Calculate the score of one movie for one user
# matrix: the user-movie matrix
# mcors: the movie-movie correlation matrix
# item: the movie id
# user: the user id
# returns: the score of the movie for that user
def cal_score(matrix, mcors, item, user):
    rating = matrix.loc[user, item]
    if not (pd.isnull(rating) or rating == 0):
        # Already rated: keep the user's own rating
        return rating
    totscore = 0.0
    totsims = 0.0
    for mitem in matrix.columns:
        if matrix.loc[user, mitem] == 0:
            continue  # skip movies this user has not rated
        totscore += matrix.loc[user, mitem] * mcors.loc[item, mitem]
        totsims += mcors.loc[item, mitem]
    # Avoid dividing by zero when there is nothing to weight by
    return totscore / totsims if totsims else 0.0


# Calculate the score matrix
# matrix: the user-movie matrix
# mcors: the movie-movie correlation matrix
# returns: the matrix of movie scores for all users
def cal_matscore(matrix, mcors):
    score_matrix = pd.DataFrame(0.0, columns=matrix.columns, index=matrix.index)
    for mitem in score_matrix.columns:
        for muser in score_matrix.index:
            score_matrix.loc[muser, mitem] = cal_score(matrix, mcors, mitem, muser)
    return score_matrix


# Give recommendations based on the score matrix
# matrix: the user-movie matrix
# score_matrix: the matrix of movie scores for all users
# user: the user id
# n: the number of recommendations
def recommend(matrix, score_matrix, user, n):
    user_ratings = matrix.loc[user]
    not_rated_item = user_ratings[user_ratings == 0]
    recom_items = score_matrix.loc[user, not_rated_item.index]
    return recom_items.sort_values(ascending=False).head(n)


# main
score_matrix = cal_matscore(mdata, mcors)
for i in range(10):
    user = input(str(i) + ' please input the name of user:')
    if user not in mdata.index:
        print('unknown user: ' + user)
        continue
    print(recommend(mdata, score_matrix, user, 2))
```
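The nested loops in `cal_matscore` can also be written as two matrix products. The sketch below is an alternative formulation rather than the original author's code: the function name `score_matrix_vectorized` and the toy data are assumptions, and it relies on the same convention that 0 means "not rated".

```python
import numpy as np
import pandas as pd


def score_matrix_vectorized(matrix, sims):
    """Estimate scores for unrated (0) entries; keep existing ratings unchanged."""
    rated = (matrix != 0).astype(float)
    numer = matrix.dot(sims)  # sum of rating * similarity over rated movies
    denom = rated.dot(sims)   # sum of similarity over rated movies
    # Where a user has no rated movies the denominator is 0; report 0 there
    estimates = (numer / denom.replace(0, np.nan)).fillna(0.0)
    return matrix.where(matrix != 0, estimates)


# Hypothetical user-movie matrix and movie-movie similarity matrix
user_movie = pd.DataFrame(
    {'A': [3.0, 2.0], 'B': [4.0, 0.0], 'C': [0.0, 5.0]},
    index=['alice', 'bob'],
)
sims = pd.DataFrame(
    [[1.0, 0.6, 0.3], [0.6, 1.0, 0.8], [0.3, 0.8, 1.0]],
    index=list('ABC'), columns=list('ABC'),
)

scores = score_matrix_vectorized(user_movie, sims)
print(scores.round(3))
```

Unrated movies have a rating of 0, so they drop out of the numerator on their own, and the 0/1 mask removes them from the denominator in the same way. For alice and movie C this reproduces the worked example, (0.3*3+0.8*4)/(0.3+0.8).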
Original post: https://www.cnblogs.com/onemorepoint/p/8167874.html