Fundamentals (Part 1)

Great, I have collected some more good material (2014-11-05)

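KDD Cup 2012

Track 1 task: personalized recommendation in a social network

Based on user attributes (User Profile) in Tencent Weibo, SNS social relationships, interaction records in the social network (retweet, comment, at), and the item recommendation history over the past 30 days, predict the list of recommended items the user is most likely to accept next.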

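Track 2 task: pCTR (predicted click-through rate) estimation for a search advertising system

Given the queries users issued on Tencent search, the ads displayed (ad title, description, URL, etc.), each ad's relative position (its rank among the ads shown), whether the user clicked, and attribute information for advertisers and users, predict users' clicks on ads in a subsequent period.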

Dataset: http://www.kddcup2012.org/c/kddcup2012-track1/data

Papers: http://www.kddcup2012.org/workshop

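KDD Cup 2011

Track 1 task: music rating prediction

Given users' historical rating records for items on Yahoo! Music, predict their ratings of other items (songs, albums, and so on). Predictions are scored by the RMSE (root mean squared error) between the predicted and actual ratings. The album, artist, and genre of each song are also provided.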
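
As a rough illustration of the RMSE metric mentioned above, here is a minimal Python sketch. The function name and the sample ratings are made up for illustration; they are not taken from the competition data or its official scoring code.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted and actual ratings (illustrative values only).
predicted = [80.0, 50.0, 90.0, 30.0]
actual = [70.0, 50.0, 100.0, 50.0]
print(rmse(predicted, actual))
```

A lower RMSE means the predicted ratings are closer to the actual ones. Because the errors are squared, large errors are penalized more heavily than small ones.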

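Track 2 task: identify whether a song was rated by the user

Each user is given 6 candidate songs: 3 that the user has rated, and 3 that the user has not rated but that are drawn from songs rated highly across users overall. Song attributes (album, artist, genre, etc.) are provided as well. Participants submit a binary classification (0/1) for each candidate, and the final ranking is computed from overall accuracy.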
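
The accuracy-based scoring described above can be sketched as follows. This is only an illustration under assumed conventions (label 1 = rated by the user, 0 = not rated); the labels below are invented, and the official evaluation script may differ in its details.

```python
def accuracy(predicted_labels, true_labels):
    """Fraction of 0/1 predictions that match the true labels."""
    if len(predicted_labels) != len(true_labels) or not true_labels:
        raise ValueError("inputs must be non-empty and of the same length")
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)


# Six candidate songs for one hypothetical user: 3 rated (1), 3 not rated (0).
true_labels = [1, 1, 1, 0, 0, 0]
predicted_labels = [1, 0, 1, 0, 0, 1]
print(accuracy(predicted_labels, true_labels))
```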

Dataset: http://kddcup.yahoo.com/datasets.php#

Papers: http://kddcup.yahoo.com/workshop.php

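KDD Cup 2009

The French telecom operator Orange has accumulated behavior records for a large number of customers in its large-scale data. Participants need to design a good customer relationship management (CRM) system that uses fast, stable methods to predict three customer attributes: 1. Loyalty: the likelihood that the customer switches operators (Churn); 2. Purchase intent: the likelihood of buying a new service (Appetency); 3. Added value: the likelihood that the customer upgrades or buys additional high-margin products (Up-selling). Results are evaluated by AUC (area under the ROC curve).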
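
For readers unfamiliar with AUC, here is a minimal Python sketch. It uses the pairwise interpretation of AUC: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as half. The scores and labels are hypothetical (for example, 1 = the customer churned), and this quadratic-time version is meant for clarity, not for data of the competition's size.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by comparing every positive/negative pair."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and true labels.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 0, 1, 1, 0]
print(auc(scores, labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which makes the metric suitable for imbalanced targets such as churn.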

Dataset: http://www.sigkdd.org/kddcup/index.php

Papers: http://jmlr.csail.mit.edu/proceedings/papers/v7/

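Below are the resource links I collected, formatted roughly as "URL + resource name + page number in the book". Some links may require getting over the "wall" (the firewall), and I have not re-posted links for resources that are cited more than once.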
　　
　　
　　http://en.wikipedia.org/wiki/Information_overload
　　 P1
　　
　　http://www.readwriteweb.com/archives/recommender_systems.php
　　(A Guide to Recommender System) P4
　　
　　http://en.wikipedia.org/wiki/Cross-selling
　　 (Cross Selling) P6
　　
　　http://blog.kiwitobes.com/?p=58 ， http://stanford2009.wikispaces.com/
　　(Course: Data Mining and E-Business: The Social Data Revolution) P7
　　
　　 http://thesearchstrategy.com/ebooks/an%20introduction%20to%20search%20engines%20and%20web%20navigation.pdf
　　(An Introduction to Search Engines and Web Navigation) P7
　　
　　http://www.netflixprize.com/
　　P8
　　
　　http://cdn-0.nflximg.com/us/pdf/Consumer_Press_Kit.pdf
　　P9
　　
　　 http://stuyresearch.googlecode.com/hg-history/c5aa9d65d48c787fd72dcd0ba3016938312102bd/blake/resources/p293-davidson.pdf
　　(The YouTube Video Recommendation System) P9
　　
　　 http://www.slideshare.net/plamere/music-recommendation-and-discovery
　　(PPT: Music Recommendation and Discovery) P12
　　
　　http://www.facebook.com/instantpersonalization/
　　P13
　　
　　 http://about.digg.com/blog/digg-recommendation-engine-updates
　　 (Digg Recommendation Engine Updates) P16
　　
　　 http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36955.pdf
　　(The Learning Behind Gmail Priority Inbox) P17
　　
　　http://www.grouplens.org/papers/pdf/mcnee-chi06-acc.pdf
　　(Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems) P20
　　
　　http://www-users.cs.umn.edu/~mcnee/mcnee-cscw2006.pdf
　　(Don’t Look Stupid: Avoiding Pitfalls when Recommending Research Papers) P23
　　
　　http://www.sigkdd.org/explorations/issues/9-2-2007-12/7-Netflix-2.pdf
　　(Major components of the gravity recommender system) P25
　　
　　http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext
　　(What is a Good Recommendation Algorithm?) P26
　　
　　http://research.microsoft.com/pubs/115396/evaluationmetrics.tr.pdf
　　(Evaluating Recommendation Systems) P27
　　
　　http://mtg.upf.edu/static/media/PhD_ocelma.pdf
　　(Music Recommendation and Discovery in the Long Tail) P29
　　
　　http://ir.ii.uam.es/divers2011/
　　(International Workshop on Novelty and Diversity in Recommender Systems) P29
　　
　　http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_11_21.pdf
　　(Auralist: Introducing Serendipity into Music Recommendation ) P30
　　
　　http://www.springerlink.com/content/978-3-540-78196-7/#section=239197&page=1&locus=21
　　(Metrics for evaluating the serendipity of recommendation lists) P30
　　
　　http://dare.uva.nl/document/131544
　　(The effects of transparency on trust in and acceptance of a content-based art recommender) P31
　　
　　http://brettb.net/project/papers/2007%20Trust-aware%20recommender%20systems.pdf
　　 (Trust-aware recommender systems) P31
　　
　　http://recsys.acm.org/2011/pdfs/RobustTutorial.pdf
　　(Tutorial on robustness of recommender system) P32
　　
　　http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html
　　 (Five Stars Dominate Ratings) P37
　　
　　http://www.informatik.uni-freiburg.de/~cziegler/BX/
　　(Book-Crossing Dataset) P38
　　
　　http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html
　　(Lastfm Dataset) P39
　　
　　http://mmdays.com/2008/11/22/power_law_1/
　　(A Brief Discussion of the Power Law Phenomenon in the Online World) P39
　　
　　http://www.grouplens.org/node/73/
　　(MovieLens Dataset) P42
　　
　　http://research.microsoft.com/pubs/69656/tr-98-12.pdf
　　(Empirical Analysis of Predictive Algorithms for Collaborative Filtering) P49
　　
　　http://vimeo.com/1242909
　　(Digg Video) P50
　　
　　http://glaros.dtc.umn.edu/gkhome/fetch/papers/itemrsCIKM01.pdf
　　 (Evaluation of Item-Based Top-N Recommendation Algorithms) P58
　　
　　http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
　　(Amazon.com Recommendations Item-to-Item Collaborative Filtering) P59
　　
　　http://glinden.blogspot.com/2006/03/early-amazon-similarities.html
　　 (Greg Linden Blog) P63
　　
　　http://www.hpl.hp.com/techreports/2008/HPL-2008-48R1.pdf
　　(One-Class Collaborative Filtering) P67
　　
　　http://en.wikipedia.org/wiki/Stochastic_gradient_descent
　　(Stochastic Gradient Descent) P68
　　
　　http://www.ideal.ece.utexas.edu/seminar/LatentFactorModels.pdf
　　 (Latent Factor Models for Web Recommender Systems) P70
　　
　　http://en.wikipedia.org/wiki/Bipartite_graph
　　(Bipartite Graph) P73
　　
　　http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4072747&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4072747
　　(Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation) P74
　　
　　http://www-cs-students.stanford.edu/~taherh/papers/topic-sensitive-pagerank.pdf
　　(Topic Sensitive Pagerank) P74
　　
　　http://www.stanford.edu/dept/ICME/docs/thesis/Li-2009.pdf
　　(FAST ALGORITHMS FOR SPARSE MATRIX INVERSE COMPUTATIONS) P77
　　
　　https://www.aaai.org/ojs/index.php/aimagazine/article/view/1292
　　 (LIFESTYLE FINDER: Intelligent User Profiling Using Large-Scale Demographic Data) P80
　　
　　http://research.yahoo.com/files/wsdm266m-golbandi.pdf
　　(Adaptive Bootstrapping of Recommender Systems Using Decision Trees) P87
　　
　　http://en.wikipedia.org/wiki/Vector_space_model
　　(Vector Space Model) P90
　　
　　http://tunedit.org/challenge/VLNetChallenge
　　(Competition on the cold-start problem) P92
　　
　　http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
　　 (Latent Dirichlet Allocation) P92
　　
　　http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
　　 (Kullback–Leibler divergence) P93
　　
　　http://www.pandora.com/about/mgp
　　(About The Music Genome Project) P94
　　
　　http://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes
　　(Pandora Music Genome Project Attributes) P94
　　
　　http://www.jinni.com/movie-genome.html
　　(Jinni Movie Genome) P94
　　
　　http://www.shilad.com/papers/tagsplanations_iui2009.pdf
　　 (Tagsplanations: Explaining Recommendations Using Tags) P96
　　
　　http://en.wikipedia.org/wiki/Tag_(metadata)
　　(Tag Wikipedia) P96
　　
　　http://www.shilad.com/shilads_thesis.pdf
　　(Nurturing Tagging Communities) P100
　　
　　http://www.stanford.edu/~morganya/research/chi2007-tagging.pdf
　　 (Why We Tag: Motivations for Annotation in Mobile and Online Media ) P100
　　
　　http://www.google.com/url?sa=t&rct=j&q=delicious%20dataset%20dai-larbor&source=web&cd=1&ved=0CFIQFjAA&url=http%3A%2F%2Fwww.dai-labor.de%2Fen%2Fcompetence_centers%2Firml%2Fdatasets%2F&ei=1R4JUKyFOKu0iQfKvazzCQ&;usg=AFQjCNGuVzzKIKi3K2YFybxrCNxbtKqS4A&cad=rjt
　　(Delicious Dataset) P101
　　
　　http://research.microsoft.com/pubs/73692/yihgoca-www06.pdf
　　 (Finding Advertising Keywords on Web Pages) P118
　　
　　http://www.kde.cs.uni-kassel.de/ws/rsdc08/
　　(Competition on tag-based recommender systems) P119
　　
　　http://delab.csd.auth.gr/papers/recsys.pdf
　　(Tag recommendations based on tensor dimensionality reduction) P119
　　
　　http://www.l3s.de/web/upload/documents/1/recSys09.pdf
　　(latent dirichlet allocation for tag recommendation) P119
　　
　　http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5271&rep=rep1&type=pdf
　　(Folkrank: A ranking algorithm for folksonomies) P119
　　
　　http://www.grouplens.org/system/files/tagommenders_numbered.pdf
　　 (Tagommenders: Connecting Users to Items through Tags) P119
　　
　　http://www.grouplens.org/system/files/group07-sen.pdf
　　(The Quest for Quality Tags) P120
　　
　　http://2011.camrachallenge.com/
　　(Challenge on Context-aware Movie Recommendation) P123
　　
　　http://bits.blogs.nytimes.com/2011/09/07/the-lifespan-of-a-link/
　　(The Lifespan of a link) P125
　　
　　http://www0.cs.ucl.ac.uk/staff/l.capra/publications/lathia_sigir10.pdf
　　 (Temporal Diversity in Recommender Systems) P129
　　
　　http://staff.science.uva.nl/~kamps/ireval/papers/paper_14.pdf
　　 (Evaluating Collaborative Filtering Over Time) P129
　　
　　http://www.google.com/places/
　　(Hotpot) P139
　　
　　http://www.readwriteweb.com/archives/google_launches_recommendation_engine_for_places.php
　　(Google Launches Hotpot, A Recommendation Engine for Places) P139
　　
　　http://xavier.amatriain.net/pubs/GeolocatedRecommendations.pdf
　　 (geolocated recommendations) P140
　　
　　http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html
　　(A Peek Into Netflix Queues) P141
　　
　　http://www.cs.umd.edu/users/meesh/420/neighbor.pdf
　　(Distance Browsing in Spatial Databases) P142
　　
　　http://www.eng.auburn.edu/~weishinn/papers/MDM2010.pdf
　　(Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks) P143
　　
　　
　　http://blog.nielsen.com/nielsenwire/consumer/global-advertising-consumers-trust-real-friends-and-virtual-strangers-the-most/
　　(Global Advertising: Consumers Trust Real Friends and Virtual Strangers the Most) P144
　　
　　http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36371.pdf
　　(Suggesting Friends Using the Implicit Social Graph) P145
　　
　　http://blog.nielsen.com/nielsenwire/online_mobile/friends-frenemies-why-we-add-and-remove-facebook-friends/
　　(Friends & Frenemies: Why We Add and Remove Facebook Friends) P147
　　
　　http://snap.stanford.edu/data/
　　(Stanford Large Network Dataset Collection) P149
　　
　　http://www.dai-labor.de/camra2010/
　　(Workshop on Context-awareness in Retrieval and Recommendation) P151
　　
　　http://www.comp.hkbu.edu.hk/~lichen/download/p245-yuan.pdf
　　(Factorization vs. Regularization: Fusing Heterogeneous Social Relationships in Top-N Recommendation) P153
　　
　　http://www.infoq.com/news/2009/06/Twitter-Architecture/
　　(Twitter, an Evolving Architecture) P154
　　
　　http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CGQQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.165.3679%26rep%3Drep1%26type%3Dpdf&ei=dIIJUMzEE8WviQf5tNjcCQ&usg=AFQjCNGw2bHXJ6MdYpksL66bhUE8krS41w&sig2=5EcEDhRe9S5SQNNojWk7_Q
　　(Recommendations in taste related domains) P155
　　
　　http://www.ercim.eu/publication/ws-proceedings/DelNoe02/RashmiSinha.pdf
　　(Comparing Recommendations Made by Online Systems and Friends) P155
　　
　　http://techcrunch.com/2010/04/22/facebook-edgerank/
　　(EdgeRank: The Secret Sauce That Makes Facebook's News Feed Tick) P157
　　
　　http://www.grouplens.org/system/files/p217-chen.pdf
　　(Speak Little and Well: Recommending Conversations in Online Social Streams) P158
　　
　　http://blog.linkedin.com/2008/04/11/learn-more-abou-2/
　　(Learn more about “People You May Know”) P160
　　
　　http://domino.watson.ibm.com/cambridge/research.nsf/58bac2a2a6b05a1285256b30005b3953/8186a48526821924852576b300537839/$FILE/TR%202009.09%20Make%20New%20Frends.pdf
　　(“Make New Friends, but Keep the Old” – Recommending People on Social Networking Sites) P164
　　
　　http://www.google.com.hk/url?sa=t&rct=j&q=social+recommendation+using+prob&source=web&cd=2&ved=0CFcQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.141.465%26rep%3Drep1%26type%3Dpdf&ei=LY0JUJ7OL9GPiAfe8ZzyCQ&usg=AFQjCNH-xTUWrs9hkxTA8si5fztAdDAEng
　　(SoRec: Social Recommendation Using Probabilistic Matrix Factorization) P165
　　
　　http://olivier.chapelle.cc/pub/DBN_www2009.pdf
　　(A Dynamic Bayesian Network Click Model for Web Search Ranking) P177
　　
　　http://www.google.com.hk/url?sa=t&rct=j&q=online+learning+from+click+data+spnsored+search&source=web&cd=1&ved=0CFkQFjAA&url=http%3A%2F%2Fwww.research.yahoo.net%2Ffiles%2Fp227-ciaramita.pdf&ei=HY8JUJW8CrGuiQfpx-XyCQ&usg=AFQjCNE_CYbEs8DVo84V-0VXs5FeqaJ5GQ&cad=rjt
　　(Online Learning from Click Data for Sponsored Search) P177
　　
　　http://www.cs.cmu.edu/~deepay/mywww/papers/www08-interaction.pdf
　　(Contextual Advertising by Combining Relevance with Click Feedback) P177
　　http://tech.hulu.com/blog/2011/09/19/recommendation-system/
　　(Hulu recommendation system architecture) P178
　　
　　http://mymediaproject.codeplex.com/
　　(MyMedia Project) P178
　　
　　http://www.grouplens.org/papers/pdf/www10_sarwar.pdf
　　(item-based collaborative filtering recommendation algorithms) P185
　　
　　http://www.stanford.edu/~koutrika/Readings/res/Default/billsus98learning.pdf
　　(Learning Collaborative Information Filters) P186
　　
　　http://sifter.org/~simon/journal/20061211.html
　　(Simon Funk Blog:Funk SVD) P187
　　
　　http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/a1-koren.pdf
　　(Factor in the Neighbors: Scalable and Accurate Collaborative Filtering) P190
　　
　　http://nlpr-web.ia.ac.cn/2009papers/gjhy/gh26.pdf
　　(Time-dependent Models in Collaborative Filtering based Recommender System) P193
　　
　　http://sydney.edu.au/engineering/it/~josiah/lemma/kdd-fp074-koren.pdf
　　(Collaborative filtering with temporal dynamics) P193
　　
　　http://en.wikipedia.org/wiki/Least_squares
　　(Least Squares Wikipedia) P195
　　
　　http://www.mimuw.edu.pl/~paterek/ap_kdd.pdf
　　(Improving regularized singular value decomposition for collaborative filtering) P195
　　
　　http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf
　　(Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model) P195

Where to Learn Deep Learning – Courses, Tutorials, Software

Deep Learning is a very hot Machine Learning technique that has been achieving remarkable results recently. We give a list of free resources for learning and using Deep Learning.


By Gregory Piatetsky, @kdnuggets, May 26, 2014.

Deep Learning is a very hot area of Machine Learning Research, with many remarkable recent successes, such as 97.5% accuracy on face recognition, nearly perfect German traffic sign recognition, or even Dogs vs Cats image recognition with 98.9% accuracy. Many winning entries in recent Kaggle Data Science competitions have used Deep Learning.

The term "deep learning" refers to the method of training multi-layered neural networks, and became popular after papers by Geoffrey Hinton and his co-workers which showed a fast way to train such networks.

Yann LeCun, who was a postdoctoral researcher in Geoff Hinton's lab, also developed a very effective algorithm for deep learning, called ConvNet, which was successfully used in the late 1980s and early 1990s for automatic reading of amounts on bank checks.

[Figure: Filters learned by ConvNet]

See more on ConvNet and the factors that enabled the recent success of Deep Learning in my exclusive interview with Yann LeCun.

In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head its new AI Lab in Silicon Valley, setting up an AI & Deep Learning race with Google (which hired Geoff Hinton) and Facebook (which hired Yann LeCun to head Facebook AI Lab).

Here are some useful and free (!) resources for learning and using Deep Learning:

DeepLearning.net, dedicated site for Deep Learning
DeepLearning.net tutorials
Deep Learning Wikipedia page
NYU Deep Learning course material by Yann LeCun
Yann LeCun overview of Deep Learning with Marc'Aurelio Ranzato
Geoff Hinton Coursera course on Neural Networks
Deep Learning: Methods and Applications book (134 pages) from the Microsoft Speech Group
CMU reading list, including student notes
Deep Learning Google+ page
Watch: Deep Learning Tutorial by John Kaufhold at Washington, DC Data Science Meetup, 2014
Where are the Deep Learning Courses?, blog by John Kaufhold, data scientist and managing partner of Deep Learning Analytics.
How Deep Learning will change our world, summary of Melbourne Data Science presentation by Jeremy Howard.

The packages which support Deep Learning include

Torch7, an extension of the LuaJIT language which includes an object-oriented package for deep learning and computer vision. The main advantage of Torch7 is that LuaJIT is extremely fast and very flexible.
Theano + Pylearn2, which has the advantage of using Python (widely used), and the disadvantage of using Python (slow for big data).
cuda-convnet, a high-performance C++/CUDA implementation of convolutional neural networks, based on Yann LeCun's work.


Original post: https://www.cnblogs.com/abc8023/p/4063756.html