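The first is, of course, the famous "Hadoop: The Definitive Guide", essentially the bible of the field, and a second edition is now available. I read the first edition last year; its examples used the old API. For the new edition, here is Amazon's description: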
Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters.
This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book.
* Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce
* Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud
* Use Pig, a high-level query language for large-scale data processing
* Analyze datasets with Hive, Hadoop’s data warehousing system
* Take advantage of HBase, Hadoop’s database for structured and semi-structured data
* Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems
"Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk."
--Doug Cutting, Cloudera
About the Author
Tom White has been an Apache Hadoop committer since February 2007, and is a member of the Apache Software Foundation. He works for Cloudera, a company set up to offer Hadoop support and training. Previously he was an independent Hadoop consultant, working with companies to set up, use, and extend Hadoop. He has written numerous articles for O'Reilly, java.net and IBM's developerWorks, and has spoken at several conferences, including at ApacheCon 2008 on Hadoop. Tom has a Bachelor's degree in Mathematics from the University of Cambridge and a Master's in Philosophy of Science from the University of Leeds, UK.
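The second is "Hadoop in Action". It is not a thick book, and I am about halfway through it. It is very practical: if you want to get up to speed quickly and start practicing, this is the one I recommend. Amazon's description: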
Hadoop in Action teaches readers how to use Hadoop and write MapReduce programs. The intended readers are programmers, architects, and project managers who have to process large amounts of data offline. Hadoop in Action will lead the reader from obtaining a copy of Hadoop to setting it up in a cluster and writing data analytic programs.
The book begins by making the basic idea of Hadoop and MapReduce easier to grasp by applying the default Hadoop installation to a few easy-to-follow tasks, such as analyzing changes in word frequency across a body of documents. The book continues through the basic concepts of MapReduce applications developed using Hadoop, including a close look at framework components, use of Hadoop for a variety of data analysis tasks, and numerous examples of Hadoop in action.
Hadoop in Action will explain how to use Hadoop and present design patterns and practices of programming MapReduce. MapReduce is a complex idea both conceptually and in its implementation, and Hadoop users are challenged to learn all the knobs and levers for running Hadoop. This book takes you beyond the mechanics of running Hadoop, teaching you to write meaningful programs in a MapReduce framework.
This book assumes the reader will have a basic familiarity with Java, as most code examples will be written in Java. Familiarity with basic statistical concepts (e.g. histogram, correlation) will help the reader appreciate the more advanced data processing examples.
About the Author
Chuck Lam is a Senior Engineer at RockYou!. Chuck received his B.S. from San Jose State University and his Ph.D. in Electrical Engineering from Stanford University, where his thesis topic was computational data acquisition.
Next is "Pro Hadoop". As the name suggests, it is a more advanced book, and it is next on my reading list. Amazon's description:
You’ve heard the hype about Hadoop: it runs petabyte-scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it’s been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it’s completely open source (thus free). But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running?
From Apress, the name you’ve come to trust for hands-on technical knowledge, Pro Hadoop brings you up to speed on Hadoop. You learn the ins and outs of MapReduce; how to structure a cluster, design, and implement the Hadoop file system; and how to build your first cloud-computing tasks using Hadoop. Learn how to let Hadoop take care of distributing and parallelizing your software; you just focus on the code, and Hadoop takes care of the rest.
Best of all, you’ll learn from a tech professional who’s been in the Hadoop scene since day one. Written from the perspective of a principal engineer with down-in-the-trenches knowledge of what to do wrong with Hadoop, you learn how to avoid the common, expensive first errors that everyone makes with creating their own Hadoop system or inheriting someone else’s.
Skip the novice stage and the expensive, hard-to-fix mistakes...go straight to seasoned pro on the hottest cloud-computing framework with Pro Hadoop. Your productivity will blow your managers away.
What you’ll learn
* Set up a stand-alone Hadoop cluster the smart way, laid out simply and step by step so you can get up and running quickly to build your next data center, collaborative, data-intensive Internet services application, Software as a Service (SaaS), and more.
* Optimize your Hadoop production tasks like an experienced pro.
* Work with time-proven, bulletproof standard patterns that have been tested and debugged in high-volume production.
* Understand just enough theoretical knowledge to know why something works in Hadoop, without getting bogged down in abstruse walls of theory.
* Get detailed explanations of not only how to do something with Hadoop, but also why, from a front-line coder with years in the Hadoop game.
* Turn someone else’s expensive cluster-wide "wrong" into an orderly, productive "right" with professional-level debugging and testing.
Who this book is for
IT professionals interested in investigating Hadoop and implementing it in their organizations, and existing Hadoop users who want to deepen their professional toolkits.
Table of Contents
1. Getting Started with Hadoop Core
2. The Basics of a MapReduce Job
3. The Basics of Multimachine Clusters
4. HDFS Details for Multimachine Clusters
5. MapReduce Details for Multimachine Clusters
6. Tuning Your MapReduce Jobs
7. Unit Testing and Debugging
8. Advanced and Alternate MapReduce Techniques
9. Solving Problems with Hadoop
10. Projects Based On Hadoop and Future Directions
About the Author
Jason Venner has 20+ years of software engineering, managing, designing, and coding. He has been a VP, director, and consultant. Currently, his interests and expertise are in Java, Hadoop, cloud computing, and more. For more, visit www.prohadoopbook.com.
Finally, as further reading, there is "HBase: The Definitive Guide". It has not been released yet and is only available for pre-order.
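Note: Hadoop 0.20 introduced a completely new API (the org.apache.hadoop.mapreduce package, alongside the older org.apache.hadoop.mapred), so a lot of earlier code needs to be rewritten to use it. It is well worth understanding these changes. If you are starting directly with 0.20, that is no problem either.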
Download: the English editions can all be found in the CSDN download section.
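Purchase: china-pub and dangdang carry the Chinese edition, 《Hadoop权威指南(中文版)》 (Hadoop: The Definitive Guide, Chinese edition); the original English edition is too expensive.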