http://www.donanza.com/jobs/p3315101-crawler_with_data_analysis_hadoop_mapreduce_hbase_phase_i
crawler with data analysis (Hadoop, MapReduce, HBase) - Phase I - Data Modeling
Goal for Phase I: given a topic in English (e.g. "skiing"), crawl the web (sites, blogs, social media) and collect 1 million relevant articles/pages/posts/documents. Then perform analysis on the collected documents and generate meaningful reports on the topic, potentially including top keywords, key concepts, and related topics (see the keyword-count sketch below).

Optional task (bonus): add "intelligence" to your analysis by determining rank/reputation, sentiment (negative vs. positive), and document type (opinion article vs. advertisement vs. for-sale ad vs. wanted ad). We are flexible and open to ideas.
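For the keyword report, a classic starting point on this stack is a MapReduce job that tokenizes the crawled documents and counts term frequencies; the highest counts then approximate the top keywords for the topic. Below is a minimal sketch, assuming the crawled pages have already been landed as plain text in HDFS; the class names, input/output paths, and naive tokenization are illustrative assumptions, not requirements from this posting.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Counts term frequencies across crawled documents; the highest
    // counts approximate the "top keywords" report for the topic.
    public class KeywordCount {

      public static class TokenizeMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Naive tokenization; a real job would strip HTML, lowercase,
          // and filter stop words before counting.
          StringTokenizer it = new StringTokenizer(value.toString());
          while (it.hasMoreTokens()) {
            word.set(it.nextToken().toLowerCase());
            context.write(word, ONE);
          }
        }
      }

      public static class SumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
          int sum = 0;
          for (IntWritable v : values) {
            sum += v.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "keyword-count");
        job.setJarByClass(KeywordCount.class);
        job.setMapperClass(TokenizeMapper.class);
        job.setCombinerClass(SumReducer.class); // cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /crawl/text
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /reports/keywords
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

A second small job, or a Pig script over this job's output, can then sort by count and keep the top N terms for the report.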
Development/staging environment: a 3-node cluster running CentOS 5.6 and Cloudera CDH3 (Hadoop, MapReduce, Hue, Pig, Flume, HBase), plus one management machine with CDH (an HBase storage sketch appears at the end of this posting).

If you bid on this job, please describe your prior experience with Big Data and tell us how you would approach this problem, with a high-level overview of the steps you would need to perform... It's important for us to see the way you approach problems. We speak English and Russian fluently. Depending on your approach, we will define milestones and a timeline together. This is Phase I of the project, so do your best!

Desired Skills: Data Modeling, Scripts & Utilities, CentOS, Hadoop, MapReduce
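Given the HBase component of the stack and the Data Modeling focus of Phase I, one common pattern for crawl storage is a single table keyed by a reversed hostname plus path (so pages from the same site sort together), with separate column families for raw content and extracted metadata. Here is a minimal sketch using the CDH3-era HBase client API; the "pages" table name, the "raw" and "meta" families, and the fields are illustrative assumptions, not requirements from this posting.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Stores one crawled page in a hypothetical "pages" table with two
    // column families: "raw" (fetched HTML) and "meta" (extracted fields).
    public class PageStore {

      // Reverse the host so pages from one site cluster in the key space,
      // e.g. "www.example.com" + "/ski" -> "com.example.www/ski".
      static String rowKey(String host, String path) {
        String[] parts = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = parts.length - 1; i >= 0; i--) {
          sb.append(parts[i]);
          if (i > 0) sb.append('.');
        }
        return sb.append(path).toString();
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "pages"); // assumed table name

        Put put = new Put(Bytes.toBytes(rowKey("www.example.com", "/ski")));
        put.add(Bytes.toBytes("raw"), Bytes.toBytes("html"),
                Bytes.toBytes("<html>...</html>"));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("topic"),
                Bytes.toBytes("skiing"));
        put.add(Bytes.toBytes("meta"), Bytes.toBytes("fetched_at"),
                Bytes.toBytes(System.currentTimeMillis()));
        table.put(put);
        table.close();
      }
    }

With the pages in HBase, later analysis jobs can read the table directly as MapReduce input instead of rescanning flat files, which keeps the crawl store and the reporting pipeline on the same cluster.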