Studying Hadoop or MapReduce can be a daunting task if you try to get your hands dirty right at the start.
I followed this schedule:
- Start with the very basics of MR at code.google.com/edu/parallel/dsd-tutorial.html and code.google.com/edu/parallel/mapreduce-tutorial.html
- Then watch the first two lectures at www.cs.washington.edu/education/courses/cse490h/08au/lectures.htm, a very good introductory course on MapReduce and Hadoop.
- Read the seminal paper at labs.google.com/papers/mapreduce.html and its improvements in the updated version at http://www.cs.washington.edu/education/courses/cse490h/08au/readings/communications200801-dl.pdf
- Then go through all the other videos at the U. Washington link given above.
- Try searching YouTube for the terms MapReduce and Hadoop to find videos by O'Reilly and the Google RoundTable for a good overview of the future of Hadoop and MapReduce.
- Then on to the most important videos:
  Cloudera Videos: www.cloudera.com/resources/?media=Video
  Google MiniLecture Series: code.google.com/edu/submissions/mapreduce-minilecture/listing.html
Along with all the multimedia above, we need good written material.
Documents:
- Architecture diagrams at hadooper.blogspot.com are good to have on your wall
- Hadoop: The Definitive Guide goes more into the nuts and bolts of the whole system, whereas Hadoop in Action is a good read with lots of teaching examples for learning the concepts of Hadoop. Pro Hadoop is not for beginners.
- PDFs of the documentation from the Apache Foundation at hadoop.apache.org/common/docs/current/ and hadoop.apache.org/common/docs/stable/ will help you learn how to model your problem as an MR solution in order to get the full advantage of Hadoop.
- The HDFS paper by Yahoo! Research is also a good read for gaining in-depth knowledge of Hadoop.
- Subscribe to the user mailing lists of Common, MapReduce and HDFS to keep up with problems, solutions and future solutions.
- Try the tutorial at http://developer.yahoo.com/hadoop/tutorial/module1.html for a beginner-to-expert path to Hadoop.
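To give a feel for what "modelling your problem as an MR solution" means before you open any of the material above, here is a minimal sketch of the classic word count, run on a single machine in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_words, shuffle, reduce_counts, word_count) are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence (hypothetical helper, not Hadoop)."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop map reduce", "map reduce map"]))
```

The point of the exercise is the shape of the solution: once a problem is expressed as independent map calls plus a per-key reduce, the framework can take care of distributing the work.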
For any queries ...
Contact Apache, Google, Bing, Yahoo!
Others:
Your question seems overly broad. To get a resource to use while looking at the source code, you should narrow the focus of what you want to study. This will make it easier for you (and anyone on SO) to find papers/topics covering that topic.
I've dug into the Hadoop source a few times, normally with a very specific class I needed to learn about. In these cases an external resource wasn't really needed; since I had the class name, I just googled it and found resources.
If I were to start trying to understand the Hadoop source at a higher level, I'd get the source code and my copy of Hadoop: The Definitive Guide and use the book as a reference to understand the higher-level connections in the source code.
I won't claim that this would be a perfect solution. H:TDG is at a more technical level than the other Hadoop books I have, and I find it very informative. H:TDG is what I'd start with, and as I found areas I wanted to dig into more, I would start searching for those specifically.