• Getting Started with the Collaborative Filtering (CF) Algorithm


    Data wrangling

    First, read the rating data from ratings.dat into a DataFrame:

    In [2]: import pandas as pd

    In [3]: df = pd.read_csv('2014-12-18.csv')

    In [4]: df.head()
    Out[4]:
    user_id item_id behavior_type user_geohash item_category hour
    0 100268421 284019855 1 95ridd7 1863 19
    1 109802727 56489946 1 NaN 8291 10
    2 109802727 56489946 1 NaN 8291 10
    3 109802727 266907147 1 99ctk96 9117
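    The session above loads a CSV file, while the text refers to ratings.dat. As a minimal sketch, assuming ratings.dat uses the "::"-delimited layout of the MovieLens dumps (an assumption, the post does not show the file), it could be read as follows. The sample rows and the load_ratings helper are made up for illustration.

```python
import io
import pandas as pd

# Made-up sample rows in an assumed "user::movie::rating::timestamp" layout.
SAMPLE = """1::10::5::1000
1::20::3::1001
2::10::4::1002
"""

def load_ratings(source):
    """Read '::'-separated ratings from a path or file-like object."""
    names = ['user_id', 'movie_id', 'rating', 'timestamp']
    # A multi-character separator is interpreted as a regular expression,
    # which is only supported by the slower Python parsing engine.
    return pd.read_csv(source, sep='::', header=None, names=names,
                       engine='python')

ratings = load_ratings(io.StringIO(SAMPLE))
print(ratings.head())
```

    For a real file, pass its path instead of the StringIO object, e.g. load_ratings('ratings.dat').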

     

    >>> data = ratings.pivot(index='user_id',columns='movie_id',values='rating')

    >>> data[:5]
    movie_id  1   2   3   4   5   6 
    user_id                                                                       
    1          5 NaN NaN NaN NaN NaN ...
    2        NaN NaN NaN NaN NaN NaN ...
    3        NaN NaN NaN NaN NaN NaN ...
    4        NaN NaN NaN NaN NaN NaN ...
    5        NaN NaN NaN NaN NaN   2 ...
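    To make the reshaping step concrete, here is a small sketch on a toy table (the values are invented). It shows how pivot turns one row per rating into a user-by-movie matrix, and one way to cope with repeated (user, movie) pairs, for which pivot raises a ValueError.

```python
import pandas as pd

# Toy long-format ratings: one row per (user, movie) pair. Values are invented.
toy = pd.DataFrame({
    'user_id':  [1, 1, 2, 3],
    'movie_id': [10, 20, 10, 30],
    'rating':   [5, 3, 4, 2],
})

# Wide format: users as rows, movies as columns, NaN where no rating exists.
matrix = toy.pivot(index='user_id', columns='movie_id', values='rating')
print(matrix)

# If the same (user, movie) pair appears more than once, pivot() raises
# ValueError. pivot_table() aggregates the duplicates instead (mean here).
toy_dup = pd.concat([toy, toy.iloc[[0]]], ignore_index=True)
dedup = toy_dup.pivot_table(index='user_id', columns='movie_id',
                            values='rating', aggfunc='mean')
print(dedup)
```

    Because missing ratings are stored as NaN, the rating columns become floating point even when every rating is an integer.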
     

    >>> import numpy as np

    >>> check_size = 1000

    >>> check = {}
    >>> check_data = data.copy()  # work on a copy so the original data is left untouched
    >>> check_data = check_data.loc[check_data.count(axis=1) > 200].copy()  # keep only users with more than 200 ratings
    >>> for user in np.random.permutation(check_data.index):
            # pick one of this user's rated movies at random, record the true rating, then blank it out
            movie = np.random.permutation(check_data.loc[user].dropna().index)[0]
            check[(user, movie)] = check_data.loc[user, movie]
            check_data.loc[user, movie] = np.nan
            check_size -= 1
            if not check_size:
                break
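    The hold-out procedure above can be sketched as a reusable function. This is one possible packaging rather than the post's own code: the name hold_out, its parameters, and the seeded random generator are assumptions added for reproducibility, and the toy matrix is invented.

```python
import numpy as np
import pandas as pd

def hold_out(data, n_samples, min_ratings, seed=0):
    """Hide one known rating for each of up to n_samples random users.

    Returns the masked copy and a Series of the hidden true ratings,
    indexed by (user, movie). `data` itself is not modified.
    """
    rng = np.random.default_rng(seed)
    # Keep only users with more than min_ratings ratings; copy before editing.
    train = data.loc[data.count(axis=1) > min_ratings].copy()
    held = {}
    for user in rng.permutation(train.index.to_numpy())[:n_samples]:
        rated = train.loc[user].dropna().index.to_numpy()
        movie = rng.choice(rated)
        held[(user, movie)] = train.loc[user, movie]
        train.loc[user, movie] = np.nan  # hide the rating from training
    return train, pd.Series(held, dtype=float)

# Toy matrix: 6 users x 8 movies; user 1 has only 2 ratings.
values = np.random.default_rng(1).integers(1, 6, size=(6, 8)).astype(float)
values[0, :6] = np.nan
data = pd.DataFrame(values, index=range(1, 7), columns=range(101, 109))

train, held = hold_out(data, n_samples=3, min_ratings=4)
print(held)
```

    Slicing the shuffled user list replaces the manual check_size countdown, and each user contributes at most one hidden rating, as in the loop above.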
     
    >>> corr = check_data.T.corr(min_periods=200)
    >>> corr_clean = corr.dropna(how='all')
    >>> corr_clean = corr_clean.dropna(axis=1, how='all')  # drop rows and columns that are entirely NaN
    >>> check_ser = pd.Series(check)  # the 1000 true ratings that were held out
    >>> check_ser[:5]
    (15593)     4
    (23555)     3
    (333363)    4
    (362355)    5
    (533605)    4
    dtype: float64
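    The effect of min_periods in the correlation step is easier to see on a tiny example. The sketch below uses an invented 3-user matrix and min_periods=3 instead of 200: a pair of users gets a correlation only if they have rated at least that many movies in common, and users left with no valid correlation are then dropped.

```python
import numpy as np
import pandas as pd

# Invented ratings: rows are users, columns are movies.
ratings = pd.DataFrame(
    {'m1': [5, 4, np.nan],
     'm2': [3, 2, 4],
     'm3': [4, 3, np.nan],
     'm4': [1, np.nan, 5]},
    index=['u1', 'u2', 'u3'])

# corr() correlates columns, so transpose to compare users with each other.
# Pairs sharing fewer than 3 rated movies get NaN instead of a coefficient.
corr = ratings.T.corr(min_periods=3)
print(corr)

# u3 rated only two movies, so every entry in its row and column is NaN;
# dropping all-NaN rows and columns removes it from the matrix.
corr_clean = corr.dropna(how='all').dropna(axis=1, how='all')
print(corr_clean)
```

    Here u1 and u2 share three movies and their ratings differ by a constant, so their Pearson correlation is 1. A high min_periods such as 200 trades coverage for more reliable similarity estimates.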
     


    A small step every day adds up to a giant leap in life! Good luck~
  • Original article: https://www.cnblogs.com/jkmiao/p/4443968.html