Python for Data Analysis | MovieLens


    Background

    The MovieLens 1M dataset contains 1 million ratings of 4,000 movies from 6,000 users.

    ratings.dat

    UserID::MovieID::Rating::Timestamp

    users.dat

    UserID::Gender::Age::Occupation::Zip-code

    movies.dat

    MovieID::Title::Genres

    Load each table into its own pandas DataFrame with pandas.read_table.

    * Note: the keyword is header=None (not head=None), and None is case-sensitive.

    In [1]: import pandas as pd
    
    In [2]: unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
    In [3]: users = pd.read_table('C:/Users/I******/Desktop/.../movielens/users.dat', sep='::', header=None, names=unames, engine='python')
    
    In [4]: rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
    In [5]: ratings = pd.read_table('C:/Users/I******/Desktop/.../movielens/ratings.dat', sep='::', header=None, names=rnames, engine='python')
    
    In [6]: mnames = ['movie_id', 'title', 'genres']
    In [7]: movies = pd.read_table('C:/Users/I******/Desktop/.../movielens/movies.dat', sep='::', header=None, names=mnames, engine='python')
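
    A separator longer than one character, such as '::', is treated by pandas as a regular expression, which only the Python parsing engine supports; if no engine is given, pandas falls back to it and emits a ParserWarning, so setting engine='python' simply makes that choice explicit. The minimal sketch below feeds two sample lines in the ratings.dat layout (taken from the ratings output shown further down) through io.StringIO, so it runs without the dataset files on disk.

```python
import io

import pandas as pd

# Two sample lines in the ratings.dat layout: UserID::MovieID::Rating::Timestamp
sample = "1::1193::5::978300760\n1::661::3::978302109\n"

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
# header=None: the file has no header row, so column names come from `names`.
# engine='python': needed for multi-character (regex) separators such as '::'.
ratings = pd.read_table(io.StringIO(sample), sep='::', header=None,
                        names=rnames, engine='python')
print(ratings)
```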

    Use Python slice syntax to look at the first few rows of each DataFrame and confirm that the data loaded correctly.

    In [8]: users[:5]
    Out[8]:
       user_id gender  age  occupation    zip
    0        1      F    1          10  48067
    1        2      M   56          16  70072
    2        3      M   25          15  55117
    3        4      M   45           7  02460
    4        5      M   25          20  55455
    
    In [9]: ratings[:5]
    Out[9]:
       user_id  movie_id  rating  timestamp
    0        1      1193       5  978300760
    1        1       661       3  978302109
    2        1       914       3  978301968
    3        1      3408       4  978300275
    4        1      2355       5  978824291
    
    In [10]: movies[:5]
    Out[10]:
       movie_id                               title                        genres
    0         1                    Toy Story (1995)   Animation|Children's|Comedy
    1         2                      Jumanji (1995)  Adventure|Children's|Fantasy
    2         3             Grumpier Old Men (1995)                Comedy|Romance
    3         4            Waiting to Exhale (1995)                  Comedy|Drama
    4         5  Father of the Bride Part II (1995)                        Comedy
    
    In [11]: ratings
    Out[11]:
             user_id  movie_id  rating  timestamp
    0              1      1193       5  978300760
    1              1       661       3  978302109
    2              1       914       3  978301968
    3              1      3408       4  978300275
    4              1      2355       5  978824291
    5              1      1197       3  978302268
    6              1      1287       5  978302039
    7              1      2804       5  978300719
    8              1       594       4  978302268
    9              1       919       4  978301368
    10             1       595       5  978824268
    11             1       938       4  978301752
    12             1      2398       4  978302281
    13             1      2918       4  978302124
    14             1      1035       5  978301753
    15             1      2791       4  978302188
    16             1      2687       3  978824268
    17             1      2018       4  978301777
    18             1      3105       5  978301713
    19             1      2797       4  978302039
    20             1      2321       3  978302205
    21             1       720       3  978300760
    22             1      1270       5  978300055
    23             1       527       5  978824195
    24             1      2340       3  978300103
    25             1        48       5  978824351
    26             1      1097       4  978301953
    27             1      1721       4  978300055
    28             1      1545       4  978824139
    29             1       745       3  978824268
    ...          ...       ...     ...        ...
    1000179     6040      2762       4  956704584
    1000180     6040      1036       3  956715455
    1000181     6040       508       4  956704972
    1000182     6040      1041       4  957717678
    1000183     6040      3735       4  960971654
    1000184     6040      2791       4  956715569
    1000185     6040      2794       1  956716438
    1000186     6040       527       5  956704219
    1000187     6040      2003       1  956716294
    1000188     6040       535       4  964828734
    1000189     6040      2010       5  957716795
    1000190     6040      2011       4  956716113
    1000191     6040      3751       4  964828782
    1000192     6040      2019       5  956703977
    1000193     6040       541       4  956715288
    1000194     6040      1077       5  964828799
    1000195     6040      1079       2  956715648
    1000196     6040       549       4  956704746
    1000197     6040      2020       3  956715288
    1000198     6040      2021       3  956716374
    1000199     6040      2022       5  956716207
    1000200     6040      2028       5  956704519
    1000201     6040      1080       4  957717322
    1000202     6040      1089       4  956704996
    1000203     6040      1090       3  956715518
    1000204     6040      1091       1  956716541
    1000205     6040      1094       5  956704887
    1000206     6040       562       5  956704746
    1000207     6040      1096       4  956715648
    1000208     6040      1097       4  956715569
    
    [1000209 rows x 4 columns]

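    First use pandas' merge function to merge ratings with users, then merge movies into the result. pandas infers which columns to use as the merge (join) keys from the column names the tables have in common.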

    In [12]: data = pd.merge(pd.merge(ratings, users), movies)
    
    In [13]: data
    Out[13]:
             user_id  movie_id  rating   timestamp gender  age  occupation    zip  
    0              1      1193       5   978300760      F    1          10  48067
    1              2      1193       5   978298413      M   56          16  70072
    2             12      1193       4   978220179      M   25          12  32793
    3             15      1193       4   978199279      M   25           7  22903
    4             17      1193       5   978158471      M   50           1  95350
    5             18      1193       4   978156168      F   18           3  95825
    6             19      1193       5   982730936      M    1          10  48073
    7             24      1193       5   978136709      F   25           7  10023
    8             28      1193       3   978125194      F   25           1  14607
    9             33      1193       5   978557765      M   45           3  55421
    10            39      1193       5   978043535      M   18           4  61820
    11            42      1193       3   978038981      M   25           8  24502
    12            44      1193       4   978018995      M   45          17  98052
    13            47      1193       4   977978345      M   18           4  94305
    14            48      1193       4   977975061      M   25           4  92107
    15            49      1193       4   978813972      M   18          12  77084
    16            53      1193       5   977946400      M   25           0  96931
    17            54      1193       5   977944039      M   50           1  56723
    18            58      1193       5   977933866      M   25           2  30303
    19            59      1193       4   977934292      F   50           1  55413
    20            62      1193       4   977968584      F   35           3  98105
    21            80      1193       4   977786172      M   56           1  49327
    22            81      1193       5   977785864      F   25           0  60640
    23            88      1193       5   977694161      F   45           1  02476
    24            89      1193       5   977683596      F   56           9  85749
    25            95      1193       5   977626632      M   45           0  98201
    26            96      1193       3   977621789      F   25          16  78028
    27            99      1193       2   982791053      F    1          10  19390
    28           102      1193       5  1040737607      M   35          19  20871
    29           104      1193       2   977546620      M   25          12  00926
    ...          ...       ...     ...         ...    ...  ...         ...    ...
    1000179     4933      3084       3   962757020      M   25          15  94040
    1000180     4802      2218       2  1014866656      M   56           1  40601
    1000181     4812      2308       2   962932391      M   18          14  25301
    1000182     4874       624       4   962781918      F   25           4  70808
    1000183     5059      1434       4   962484364      M   45          16  22652
    1000184     5947      1434       4   957190428      F   45          16  97215
    1000185     5077      1868       3   962417299      M   25           2  20037
    1000186     5944      1868       1   957197520      F   18          10  27606
    1000187     5105       404       3   962337582      M   50           7  18977
    1000188     5185       404       4   963402617      F   35           4  44485
    1000189     5532       404       5   959619841      M   25          17  27408
    1000190     5543       404       3   960127592      M   25          17  97401
    1000191     5220      2543       3   961546137      M   25           7  91436
    1000192     5754      2543       4   958272316      F   18           1  60640
    1000193     5227       591       3   961475931      M   18          10  64050
    1000194     5795       591       1   958145253      M   25           1  92688
    1000195     5313      3656       5   960920392      M   56           0  55406
    1000196     5328      2438       4   960838075      F   25           4  91740
    1000197     5334      3323       3   960796159      F   56          13  46140
    1000198     5334       127       1   960795494      F   56          13  46140
    1000199     5334      3382       5   960796159      F   56          13  46140
    1000200     5420      1843       3   960156505      F    1          19  14850
    1000201     5433       286       3   960240881      F   35          17  45014
    1000202     5494      3530       4   959816296      F   35          17  94306
    1000203     5556      2198       3   959445515      M   45           6  92103
    1000204     5949      2198       5   958846401      M   18          17  47901
    1000205     5675      2703       3   976029116      M   35          14  30030
    1000206     5780      2845       1   958153068      M   18          17  92886
    1000207     5851      3607       5   957756608      F   18          20  55410
    1000208     5938      2909       4   957273353      M   25           1  35401
    
                                                         title  
    0                   One Flew Over the Cuckoo's Nest (1975)
    1                   One Flew Over the Cuckoo's Nest (1975)
    2                   One Flew Over the Cuckoo's Nest (1975)
    3                   One Flew Over the Cuckoo's Nest (1975)
    4                   One Flew Over the Cuckoo's Nest (1975)
    5                   One Flew Over the Cuckoo's Nest (1975)
    6                   One Flew Over the Cuckoo's Nest (1975)
    7                   One Flew Over the Cuckoo's Nest (1975)
    8                   One Flew Over the Cuckoo's Nest (1975)
    9                   One Flew Over the Cuckoo's Nest (1975)
    10                  One Flew Over the Cuckoo's Nest (1975)
    11                  One Flew Over the Cuckoo's Nest (1975)
    12                  One Flew Over the Cuckoo's Nest (1975)
    13                  One Flew Over the Cuckoo's Nest (1975)
    14                  One Flew Over the Cuckoo's Nest (1975)
    15                  One Flew Over the Cuckoo's Nest (1975)
    16                  One Flew Over the Cuckoo's Nest (1975)
    17                  One Flew Over the Cuckoo's Nest (1975)
    18                  One Flew Over the Cuckoo's Nest (1975)
    19                  One Flew Over the Cuckoo's Nest (1975)
    20                  One Flew Over the Cuckoo's Nest (1975)
    21                  One Flew Over the Cuckoo's Nest (1975)
    22                  One Flew Over the Cuckoo's Nest (1975)
    23                  One Flew Over the Cuckoo's Nest (1975)
    24                  One Flew Over the Cuckoo's Nest (1975)
    25                  One Flew Over the Cuckoo's Nest (1975)
    26                  One Flew Over the Cuckoo's Nest (1975)
    27                  One Flew Over the Cuckoo's Nest (1975)
    28                  One Flew Over the Cuckoo's Nest (1975)
    29                  One Flew Over the Cuckoo's Nest (1975)
    ...                                                    ...
    1000179                                   Home Page (1999)
    1000180                            Juno and Paycock (1930)
    1000181                                Detroit 9000 (1973)
    1000182                               Condition Red (1995)
    1000183                               Stranger, The (1994)
    1000184                               Stranger, The (1994)
    1000185                                  Truce, The (1996)
    1000186                                  Truce, The (1996)
    1000187  Brother Minister: The Assassination of Malcolm...
    1000188  Brother Minister: The Assassination of Malcolm...
    1000189  Brother Minister: The Assassination of Malcolm...
    1000190  Brother Minister: The Assassination of Malcolm...
    1000191                          Six Ways to Sunday (1997)
    1000192                          Six Ways to Sunday (1997)
    1000193                            Tough and Deadly (1995)
    1000194                            Tough and Deadly (1995)
    1000195                                       Lured (1947)
    1000196                               Outside Ozona (1998)
    1000197                              Chain of Fools (2000)
    1000198  Silence of the Palace, The (Saimt el Qusur) (1...
    1000199                             Song of Freedom (1936)
    1000200                     Slappy and the Stinkers (1998)
    1000201                           Nemesis 2: Nebula (1995)
    1000202                          Smoking/No Smoking (1993)
    1000203                                 Modulations (1998)
    1000204                                 Modulations (1998)
    1000205                              Broken Vessels (1998)
    1000206                                  White Boys (1999)
    1000207                           One Little Indian (1973)
    1000208        Five Wives, Three Secretaries and Me (1998)
    
                             genres
    0                         Drama
    1                         Drama
    2                         Drama
    3                         Drama
    4                         Drama
    5                         Drama
    6                         Drama
    7                         Drama
    8                         Drama
    9                         Drama
    10                        Drama
    11                        Drama
    12                        Drama
    13                        Drama
    14                        Drama
    15                        Drama
    16                        Drama
    17                        Drama
    18                        Drama
    19                        Drama
    20                        Drama
    21                        Drama
    22                        Drama
    23                        Drama
    24                        Drama
    25                        Drama
    26                        Drama
    27                        Drama
    28                        Drama
    29                        Drama
    ...                         ...
    1000179             Documentary
    1000180                   Drama
    1000181            Action|Crime
    1000182   Action|Drama|Thriller
    1000183                  Action
    1000184                  Action
    1000185               Drama|War
    1000186               Drama|War
    1000187             Documentary
    1000188             Documentary
    1000189             Documentary
    1000190             Documentary
    1000191                  Comedy
    1000192                  Comedy
    1000193   Action|Drama|Thriller
    1000194   Action|Drama|Thriller
    1000195                   Crime
    1000196          Drama|Thriller
    1000197            Comedy|Crime
    1000198                   Drama
    1000199                   Drama
    1000200       Children's|Comedy
    1000201  Action|Sci-Fi|Thriller
    1000202                  Comedy
    1000203             Documentary
    1000204             Documentary
    1000205                   Drama
    1000206                   Drama
    1000207    Comedy|Drama|Western
    1000208             Documentary
    
    [1000209 rows x 10 columns]
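
    To see the key inference on something small, here is a sketch with three hypothetical miniature tables shaped like ratings, users and movies. With no on= argument, merge joins on every column name the two frames share, so the nested call gives the same result as naming user_id and movie_id explicitly.

```python
import pandas as pd

# Hypothetical miniature versions of the three MovieLens tables
ratings = pd.DataFrame({'user_id': [1, 2, 2], 'movie_id': [10, 10, 20],
                        'rating': [5, 3, 4]})
users = pd.DataFrame({'user_id': [1, 2], 'gender': ['F', 'M']})
movies = pd.DataFrame({'movie_id': [10, 20], 'title': ['A (2000)', 'B (2001)']})

# No `on=`: the keys are the shared column names
# (user_id for the inner merge, movie_id for the outer one).
data = pd.merge(pd.merge(ratings, users), movies)

# Spelling the keys out produces the same frame and guards against
# accidental overlaps if the tables ever gain another common column.
explicit = pd.merge(pd.merge(ratings, users, on='user_id'), movies, on='movie_id')
print(data.equals(explicit))  # True
```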

    View a specific record

    Error

    In [14]: data.ix[0]
    C:\Users\I******\AppData\Local\Enthought\Canopy\App\appdata\canopy-2.1.3.3542.win-x86_64\lib\site-packages\IPython\__main__.py:1: DeprecationWarning:
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing

    Solution

    In [15]: data.iloc[0]
    Out[15]:
    user_id                                            1
    movie_id                                        1193
    rating                                             5
    timestamp                                  978300760
    gender                                             F
    age                                                1
    occupation                                        10
    zip                                            48067
    title         One Flew Over the Cuckoo's Nest (1975)
    genres                                         Drama
    Name: 0, dtype: object
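
    .ix was deprecated because it mixed label-based and position-based lookups, and it was removed entirely in pandas 1.0. Because the merged data frame has the default 0 to n-1 index, data.loc[0] and data.iloc[0] return the same row here. The hypothetical frame below uses non-default labels to show where the two differ.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the row positions
df = pd.DataFrame({'rating': [5, 3, 4]}, index=[10, 20, 30])

first_row = df.iloc[0]   # positional: the first row, whatever its label
label_20 = df.loc[20]    # label-based: the row whose index label is 20
# df.loc[0] would raise KeyError here, because no row is labelled 0.

print(first_row['rating'], label_20['rating'])  # 5 3
```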

    Use pivot_table to compute each movie's mean rating, broken down by gender:

    pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')  (reference 01)

    In [16]: mean_ratings = data.pivot_table('rating', index='title', columns='gender', aggfunc='mean')
    
    In [17]: mean_ratings[:10]
    Out[17]:
    gender                                    F         M
    title
    $1,000,000 Duck (1971)             3.375000  2.761905
    'Night Mother (1986)               3.388889  3.352941
    'Til There Was You (1997)          2.675676  2.733333
    'burbs, The (1989)                 2.793478  2.962085
    ...And Justice for All (1979)      3.828571  3.689024
    1-900 (1994)                       2.000000  3.000000
    10 Things I Hate About You (1999)  3.646552  3.311966
    101 Dalmatians (1961)              3.791444  3.500000
    101 Dalmatians (1996)              3.240000  2.911215
    12 Angry Men (1957)                4.184397  4.328421
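
    A rough way to see what pivot_table computes: on a tiny hypothetical table, the same numbers come out of a groupby on both keys followed by unstack. A title with no ratings from one gender gets NaN in that column.

```python
import pandas as pd

# Hypothetical miniature version of the merged `data` table
data = pd.DataFrame({
    'title':  ['A', 'A', 'A', 'B', 'B'],
    'gender': ['F', 'M', 'M', 'F', 'F'],
    'rating': [4, 2, 3, 5, 3],
})

# values='rating': one row per title, one column per gender, cells = mean rating
mean_ratings = data.pivot_table('rating', index='title', columns='gender',
                                aggfunc='mean')

# The same table built by hand: group on both keys, average,
# then move the gender level of the index into the columns.
by_hand = data.groupby(['title', 'gender'])['rating'].mean().unstack()

print(mean_ratings)
```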

    Filter out movies with fewer than 250 ratings.

    First group the data by title, then call size() to get a Series holding the number of ratings for each movie:

    In [18]: ratings_by_title = data.groupby('title').size()
    
    In [19]: ratings_by_title[:10]
    Out[19]:
    title
    $1,000,000 Duck (1971)                37
    'Night Mother (1986)                  70
    'Til There Was You (1997)             52
    'burbs, The (1989)                   303
    ...And Justice for All (1979)        199
    1-900 (1994)                           2
    10 Things I Hate About You (1999)    700
    101 Dalmatians (1961)                565
    101 Dalmatians (1996)                364
    12 Angry Men (1957)                  616
    dtype: int64
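
    A small sketch of what size() returns, on hypothetical data: one count per group, indexed by the group key. Series.value_counts() yields the same counts, only ordered by frequency rather than by title.

```python
import pandas as pd

# Hypothetical column of titles, one entry per rating
data = pd.DataFrame({'title': ['A', 'B', 'A', 'C', 'A', 'B']})

# size() counts the rows in each group; the result is indexed by title (sorted)
ratings_by_title = data.groupby('title').size()
print(ratings_by_title.to_dict())  # {'A': 3, 'B': 2, 'C': 1}

# value_counts() gives the same numbers, ordered by count instead of by title
counts = data['title'].value_counts()
```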

    Keep the titles of movies that have at least 250 ratings.

    In [20]: active_titles = ratings_by_title.index[ratings_by_title >= 250]
    
    In [21]: active_titles
    Out[21]:
    Index([u''burbs, The (1989)', u'10 Things I Hate About You (1999)',
           u'101 Dalmatians (1961)', u'101 Dalmatians (1996)',
           u'12 Angry Men (1957)', u'13th Warrior, The (1999)',
           u'2 Days in the Valley (1996)', u'20,000 Leagues Under the Sea (1954)',
           u'2001: A Space Odyssey (1968)', u'2010 (1984)',
           ...
           u'X-Men (2000)', u'Year of Living Dangerously (1982)',
           u'Yellow Submarine (1968)', u'You've Got Mail (1998)',
           u'Young Frankenstein (1974)', u'Young Guns (1988)',
           u'Young Guns II (1990)', u'Young Sherlock Holmes (1985)',
           u'Zero Effect (1998)', u'eXistenZ (1999)'],
          dtype='object', name=u'title', length=1216)
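
    The filter is a boolean mask built from the counts. In the sketch below the counts are made up, and the selection is written as "filter the Series, then take its index", which yields the same titles; note that >= keeps a title with exactly 250 ratings.

```python
import pandas as pd

# Hypothetical per-title rating counts standing in for ratings_by_title
ratings_by_title = pd.Series({'A': 300, 'B': 120, 'C': 250})

threshold = 250
mask = ratings_by_title >= threshold          # boolean Series, True = keep
active_titles = ratings_by_title[mask].index  # labels of the titles that pass
print(list(active_titles))  # ['A', 'C']
```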

    Use these titles to select the matching rows from mean_ratings.

    Error

    In [22]: mean_ratings = mean_ratings.ix[active_titles]
    C:\Users\I******\AppData\Local\Enthought\Canopy\App\appdata\canopy-2.1.3.3542.win-x86_64\lib\site-packages\IPython\__main__.py:1: DeprecationWarning:
    .ix is deprecated. Please use
    .loc for label based indexing or
    .iloc for positional indexing

    Solution

    In [23]: mean_ratings = mean_ratings.loc[active_titles]
    
    In [24]: mean_ratings
    Out[24]:
    gender                                                     F         M
    title
    'burbs, The (1989)                                  2.793478  2.962085
    10 Things I Hate About You (1999)                   3.646552  3.311966
    101 Dalmatians (1961)                               3.791444  3.500000
    101 Dalmatians (1996)                               3.240000  2.911215
    12 Angry Men (1957)                                 4.184397  4.328421
    13th Warrior, The (1999)                            3.112000  3.168000
    2 Days in the Valley (1996)                         3.488889  3.244813
    20,000 Leagues Under the Sea (1954)                 3.670103  3.709205
    2001: A Space Odyssey (1968)                        3.825581  4.129738
    2010 (1984)                                         3.446809  3.413712
    28 Days (2000)                                      3.209424  2.977707
    39 Steps, The (1935)                                3.965517  4.107692
    54 (1998)                                           2.701754  2.782178
    7th Voyage of Sinbad, The (1958)                    3.409091  3.658879
    8MM (1999)                                          2.906250  2.850962
    About Last Night... (1986)                          3.188679  3.140909
    Absent Minded Professor, The (1961)                 3.469388  3.446809
    Absolute Power (1997)                               3.469136  3.327759
    Abyss, The (1989)                                   3.659236  3.689507
    Ace Ventura: Pet Detective (1994)                   3.000000  3.197917
    Ace Ventura: When Nature Calls (1995)               2.269663  2.543333
    Addams Family Values (1993)                         3.000000  2.878531
    Addams Family, The (1991)                           3.186170  3.163498
    Adventures in Babysitting (1987)                    3.455782  3.208122
    Adventures of Buckaroo Bonzai Across the 8th Di...  3.308511  3.402321
    Adventures of Priscilla, Queen of the Desert, T...  3.989071  3.688811
    Adventures of Robin Hood, The (1938)                4.166667  3.918367
    African Queen, The (1951)                           4.324232  4.223822
    Age of Innocence, The (1993)                        3.827068  3.339506
    Agnes of God (1985)                                 3.534884  3.244898
    ...                                                      ...       ...
    White Men Can't Jump (1992)                         3.028777  3.231061
    Who Framed Roger Rabbit? (1988)                     3.569378  3.713251
    Who's Afraid of Virginia Woolf? (1966)              4.029703  4.096939
    Whole Nine Yards, The (2000)                        3.296552  3.404814
    Wild Bunch, The (1969)                              3.636364  4.128099
    Wild Things (1998)                                  3.392000  3.459082
    Wild Wild West (1999)                               2.275449  2.131973
    William Shakespeare's Romeo and Juliet (1996)       3.532609  3.318644
    Willow (1988)                                       3.658683  3.453543
    Willy Wonka and the Chocolate Factory (1971)        4.063953  3.789474
    Witness (1985)                                      4.115854  3.941504
    Wizard of Oz, The (1939)                            4.355030  4.203138
    Wolf (1994)                                         3.074074  2.899083
    Women on the Verge of a Nervous Breakdown (1988)    3.934307  3.865741
    Wonder Boys (2000)                                  4.043796  3.913649
    Working Girl (1988)                                 3.606742  3.312500
    World Is Not Enough, The (1999)                     3.337500  3.388889
    Wrong Trousers, The (1993)                          4.588235  4.478261
    Wyatt Earp (1994)                                   3.147059  3.283898
    X-Files: Fight the Future, The (1998)               3.489474  3.493797
    X-Men (2000)                                        3.682310  3.851702
    Year of Living Dangerously (1982)                   3.951220  3.869403
    Yellow Submarine (1968)                             3.714286  3.689286
    You've Got Mail (1998)                              3.542424  3.275591
    Young Frankenstein (1974)                           4.289963  4.239177
    Young Guns (1988)                                   3.371795  3.425620
    Young Guns II (1990)                                2.934783  2.904025
    Young Sherlock Holmes (1985)                        3.514706  3.363344
    Zero Effect (1998)                                  3.864407  3.723140
    eXistenZ (1999)                                     3.098592  3.289086
    
    [1216 rows x 2 columns]
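
    Selecting with .loc and a list (or Index) of labels returns the rows in the order the labels are given. Since pandas 1.0, every label passed this way must exist or .loc raises KeyError; reindex is the lenient alternative that inserts NaN rows instead. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical per-gender means, indexed by title
mean_ratings = pd.DataFrame({'F': [3.0, 4.0, 2.5], 'M': [3.5, 3.8, 2.0]},
                            index=pd.Index(['A', 'B', 'C'], name='title'))
active_titles = pd.Index(['C', 'A'], name='title')

# .loc with an Index of labels returns those rows, in the order given
subset = mean_ratings.loc[active_titles]
print(list(subset.index))  # ['C', 'A']

# reindex does not require the labels to exist: 'Z' becomes an all-NaN row
padded = mean_ratings.reindex(['A', 'Z'])
```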

    To see which movies female viewers liked most, sort by the F column in descending order.

    Error

    In [25]: top_female_ratings = mean_ratings.sort_index(by='F', ascending=False)
    C:\Users\I******\AppData\Local\Enthought\Canopy\App\appdata\canopy-2.1.3.3542.win-x86_64\lib\site-packages\IPython\__main__.py:1: FutureWarning: by argument to sort_index is deprecated, pls use .sort_values(by=...)

    Solution

    In [26]: top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)
    
    In [27]: top_female_ratings[:10]
    Out[27]:
    gender                                                     F         M
    title
    Close Shave, A (1995)                               4.644444  4.473795
    Wrong Trousers, The (1993)                          4.588235  4.478261
    Sunset Blvd. (a.k.a. Sunset Boulevard) (1950)       4.572650  4.464589
    Wallace & Gromit: The Best of Aardman Animation...  4.563107  4.385075
    Schindler's List (1993)                             4.562602  4.491415
    Shawshank Redemption, The (1994)                    4.539075  4.560625
    Grand Day Out, A (1992)                             4.537879  4.293255
    To Kill a Mockingbird (1962)                        4.536667  4.372611
    Creature Comforts (1990)                            4.513889  4.272277
    Usual Suspects, The (1995)                          4.513317  4.518248
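
    For reference, a sketch on hypothetical numbers: sort_values orders rows by a column's values, and DataFrame.nlargest is a shortcut when only the top n rows are needed.

```python
import pandas as pd

# Hypothetical per-gender means, indexed by title
mean_ratings = pd.DataFrame({'F': [3.1, 4.6, 4.2], 'M': [3.3, 4.5, 4.4]},
                            index=pd.Index(['A', 'B', 'C'], name='title'))

# Sort rows by the values in column F, highest first
top_female_ratings = mean_ratings.sort_values(by='F', ascending=False)
print(list(top_female_ratings.index))  # ['B', 'C', 'A']

# nlargest picks the n rows with the largest F values directly
top_two = mean_ratings.nlargest(2, 'F')
```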

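    Measuring rating disagreement between the genders:

    Add a column to mean_ratings that holds the difference between the mean ratings (M minus F), then sort by it; the movies at the top are the ones women rated higher than men: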

    In [28]: mean_ratings['diff'] = mean_ratings['M'] - mean_ratings['F']
    
    In [29]: sorted_by_diff = mean_ratings.sort_values(by='diff')
    
    In [30]: sorted_by_diff[:15]
    Out[30]:
    gender                                        F         M      diff
    title
    Dirty Dancing (1987)                   3.790378  2.959596 -0.830782
    Jumpin' Jack Flash (1986)              3.254717  2.578358 -0.676359
    Grease (1978)                          3.975265  3.367041 -0.608224
    Little Women (1994)                    3.870588  3.321739 -0.548849
    Steel Magnolias (1989)                 3.901734  3.365957 -0.535777
    Anastasia (1997)                       3.800000  3.281609 -0.518391
    Rocky Horror Picture Show, The (1975)  3.673016  3.160131 -0.512885
    Color Purple, The (1985)               4.158192  3.659341 -0.498851
    Age of Innocence, The (1993)           3.827068  3.339506 -0.487561
    Free Willy (1993)                      2.921348  2.438776 -0.482573
    French Kiss (1995)                     3.535714  3.056962 -0.478752
    Little Shop of Horrors, The (1960)     3.650000  3.179688 -0.470312
    Guys and Dolls (1955)                  4.051724  3.583333 -0.468391
    Mary Poppins (1964)                    4.197740  3.730594 -0.467147
    Patch Adams (1998)                     3.473282  3.008746 -0.464536
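
    The sign convention on a hypothetical three-movie table: diff = M - F is negative where women gave the higher mean, so an ascending sort puts the most female-leaning titles first.

```python
import pandas as pd

# Hypothetical per-gender means, indexed by title
mean_ratings = pd.DataFrame({'F': [3.8, 3.0, 3.5], 'M': [3.0, 3.6, 3.4]},
                            index=pd.Index(['A', 'B', 'C'], name='title'))

# Positive diff: men rated the movie higher; negative diff: women did
mean_ratings['diff'] = mean_ratings['M'] - mean_ratings['F']
sorted_by_diff = mean_ratings.sort_values(by='diff')  # most female-leaning first
print(list(sorted_by_diff.index))  # ['A', 'C', 'B']
```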

    Reverse the sorted result and take the first 15 rows; these are the movies men rated higher than women:

    In [31]: sorted_by_diff[::-1][:15]
    Out[31]:
    gender                                         F         M      diff
    title
    Good, The Bad and The Ugly, The (1966)  3.494949  4.221300  0.726351
    Kentucky Fried Movie, The (1977)        2.878788  3.555147  0.676359
    Dumb & Dumber (1994)                    2.697987  3.336595  0.638608
    Longest Day, The (1962)                 3.411765  4.031447  0.619682
    Cable Guy, The (1996)                   2.250000  2.863787  0.613787
    Evil Dead II (Dead By Dawn) (1987)      3.297297  3.909283  0.611985
    Hidden, The (1987)                      3.137931  3.745098  0.607167
    Rocky III (1982)                        2.361702  2.943503  0.581801
    Caddyshack (1980)                       3.396135  3.969737  0.573602
    For a Few Dollars More (1965)           3.409091  3.953795  0.544704
    Porky's (1981)                          2.296875  2.836364  0.539489
    Animal House (1978)                     3.628906  4.167192  0.538286
    Exorcist, The (1973)                    3.537634  4.067239  0.529605
    Fright Night (1985)                     2.973684  3.500000  0.526316
    Barb Wire (1996)                        1.585366  2.100386  0.515020
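
    Reversing an ascending sort with [::-1] gives the same order as sorting in descending order as long as there are no tied values (tied rows may come out in a different relative order). A sketch with made-up diff values:

```python
import pandas as pd

# Hypothetical diff values, sorted ascending as in sorted_by_diff
sorted_by_diff = pd.DataFrame({'diff': [-0.8, -0.1, 0.2, 0.6]},
                              index=['A', 'C', 'D', 'B']).sort_values(by='diff')

# Reverse the ascending order, then keep the first two rows
reversed_head = sorted_by_diff[::-1][:2]
# With no ties this matches an explicit descending sort
descending_head = sorted_by_diff.sort_values(by='diff', ascending=False)[:2]
print(list(reversed_head.index), list(descending_head.index))  # ['B', 'D'] ['B', 'D']
```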

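    Ignoring gender, measure how much viewers disagree about each movie by computing the variance or standard deviation of its ratings, restricted to the titles with at least 250 ratings: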

    In [32]: rating_std_by_title = data.groupby('title')['rating'].std()
    
    In [33]: rating_std_by_title = rating_std_by_title.loc[active_titles]
    
    In [34]: rating_std_by_title.sort_values(ascending=False)[:10]
    Out[34]:
    title
    Dumb & Dumber (1994)                     1.321333
    Blair Witch Project, The (1999)          1.316368
    Natural Born Killers (1994)              1.307198
    Tank Girl (1995)                         1.277695
    Rocky Horror Picture Show, The (1975)    1.260177
    Eyes Wide Shut (1999)                    1.259624
    Evita (1996)                             1.253631
    Billy Madison (1995)                     1.249970
    Fear and Loathing in Las Vegas (1998)    1.246408
    Bicentennial Man (1999)                  1.245533
    Name: rating, dtype: float64
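
    On a tiny hypothetical table, the per-title standard deviation and variance look like this. pandas' GroupBy.std and GroupBy.var compute the sample statistics (ddof=1) by default, so a title whose ratings are all identical gets 0.

```python
import pandas as pd

# Hypothetical ratings: title A is divisive, title B is unanimous
data = pd.DataFrame({'title': ['A', 'A', 'B', 'B', 'B'],
                     'rating': [1, 5, 3, 3, 3]})

# Sample standard deviation (ddof=1) of each title's ratings
rating_std_by_title = data.groupby('title')['rating'].std()
# Sample variance: the square of the same quantity
rating_var_by_title = data.groupby('title')['rating'].var()

print(rating_std_by_title.sort_values(ascending=False))
```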

    Reference

    01 http://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html

    Original post: https://www.cnblogs.com/princemay/p/7233256.html