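A friend on cnblogs (博客园), @疯狂的小风, asked about "the current state of employment in the field" (data mining).
If any of you happen to work on data mining at a company, please don't hesitate to share what you do, so we can all compare notes on what data mining work in industry looks like.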
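Let me get the ball rolling. At my previous company I mainly built recommender systems. That meant crawling and filtering various categories of user data, tuning algorithm parameters, and evaluating and visualizing the results with a set of metrics. I also worked on spam user detection. A few days ago I got a call from EMC2 and learned that they also do data mining, which is no surprise for a company that bills itself as a big data company. I didn't ask what the work involves exactly, and they said they would be in touch later. Baidu has countless data mining applications, so a data miner who joins will have no trouble finding a seat. Taobao's data mining department is the well-known 数据魔方 (Data Cube). It is roughly split into a research group and other groups responsible for design, and the research group does everything from ratings to recommendation. Douban's data miners seem to go to the Douban algorithm team... and the people there currently lean toward statistics. Sogou has a group working on recommendation and natural language processing, and NetEase Games also hires data mining analysts. Then there are investment banks and small quantitative futures teams. For example, our senior dd, who became famous this year for an undergraduate offer of over 1.2 million a year, went to Jane Street in Hong Kong to work as a quant.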
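So there is clearly strong demand for data mining at companies both in China and abroad, and the offers are at least good enough that we can live comfortably and concentrate on research. Still, my ideal data mining practitioner would cover four areas. In theory, they would know machine learning, statistics, neural networks, and the standard data mining process. In practice, they would know engineering, be able to use Hadoop, and be fluent in C/C++, Python, and R. In tooling, they would know a variety of data mining and analysis software and open-source packages. In problem analysis, they would follow the standard process closely while also having a sharp nose for what matters.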
I recently came across the Quora question "What do statisticians do at Google", which describes what data mining work looks like at Google.
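According to Michael Hochster, who works at Google as a Statistician, statisticians (also called Data Scientists, Quantitative Analysts, and so on) are concentrated mostly in Search and Ads.
Michael Hochster has worked in both areas.
In Search, statisticians focus on measuring search quality: Google engineers work on making search better, and statisticians work out whether it actually is better. He knows a couple of people with PhDs in statistics who work on improving search quality, but their title is Software Engineer.
Michael Hochster now works in a medium-sized group called Ads Metrics, which is made up mostly of Quantitative Analysts. Many, but not all, are trained in statistics.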
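Their work covers both building models to improve ads serving and measurement, that is, figuring out whether a change to ads serving is a good thing. The measurement projects are not just one-off analyses; they also involve developing tools and processes, for example for running and analyzing experiments at large scale. The author is deliberately vague here, probably because the projects are confidential.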
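Hochster gives no details on how the measurement is done. For readers new to the idea of judging whether a change is "better" from an experiment, here is a minimal sketch of one generic textbook technique: a pooled two-proportion z-test comparing the click-through rate (CTR) of a control arm with that of a treatment arm. This is an illustrative assumption, not Google's actual method. The function name `two_proportion_z_test` and the click and view counts are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Pooled two-proportion z-test comparing the CTR of arm A (control)
    with arm B (treatment). Returns (ctr_a, ctr_b, z, p_value), where
    p_value is two-sided under the normal approximation.
    Illustrative sketch only; not any specific company's method."""
    if views_a <= 0 or views_b <= 0:
        raise ValueError("each arm needs at least one view")
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Under the null hypothesis both arms share one CTR, estimated by pooling.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    variance = pooled * (1 - pooled) * (1 / views_a + 1 / views_b)
    if variance == 0:
        raise ValueError("pooled CTR is 0 or 1; the test is undefined")
    z = (ctr_b - ctr_a) / math.sqrt(variance)
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


if __name__ == "__main__":
    # Hypothetical experiment: 50,000 ad views per arm.
    ctr_a, ctr_b, z, p = two_proportion_z_test(1000, 50_000, 1100, 50_000)
    print(f"control CTR={ctr_a:.3%}, treatment CTR={ctr_b:.3%}, z={z:.2f}, p={p:.4f}")
```

With these made-up counts (2.0% vs 2.2% CTR), z comes out at about 2.2 and the p-value is below 0.05. Real large-scale experiment platforms have to deal with much more than this, such as multiple metrics, correlated traffic units, and many simultaneous experiments. That is presumably where the "tools and processes" come in.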
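Several other groups also employ Quantitative Analysts, such as Search Infrastructure (analysis related to Google's index), Economics (forecasting), and Quantitative Marketing.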
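In short, statisticians at Google don't just do analysis; they also write code to implement things, combining research with development. This is a reminder that besides a solid grounding in machine learning and statistics, we shouldn't neglect algorithms and ACM-style programming, and we should get plenty of engineering practice ^_^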
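The original answer is reposted below. I marked the parts I found useful in bold, and I hope it offers some guidance and inspiration.
Original link: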
http://www.quora.com/Google/What-do-statisticians-do-at-Google
Michael Hochster, Statistician at Google
A lot of different things. There is something called the Quantitative Analyst job ladder within Google, which includes many different titles (Quantitative Analyst, Statistician, Data Scientist, etc.) that as far as I can tell are all the same. Who gets which of these titles seems to be mostly a function of when the person was hired.
The largest concentrations of statisticians are in Search and Ads. I have worked in both these areas. In Search, statisticians concentrate on measuring search quality. Google engineers primarily work on making it better, statisticians work on figuring out whether it is better. I know a couple of people in search with PhDs in statistics who work on making search better, but they are called Software Engineers.
I now work in a medium-sized group called Ads Metrics, which is made up mostly of Quantitative Analysts. Many, but not all, are trained in statistics. We work both on building models to help improve ads serving, and on measurement (i.e. how to measure whether a change to ads serving is a good thing). The measurement projects are not just one-at-a-time analysis questions but also involve development of tools and processes, for example for carrying out and analyzing experiments on a large scale. I'm being deliberately extremely vague about the details here, sorry.
There are several other groups that employ Quantitative Analysts. I don't know too much about most of them: Search Infrastructure (analysis relating to Google's index); Economics (forecasting); Quantitative Marketing (I know nothing at all about this).
For me, the joy of statistics is in answering interesting questions with data. Google abounds with interesting (to me) questions, vast amounts of data, and powerful tools for working with it. It's a great place to be in my line of work.