There’s a lot of excitement around Hadoop software these days, so here’s my definition of what “Hadoop” means:
Hadoop™ is the ASF’s trademark for our Apache Hadoop software product, which provides a service and a simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as the software that started the current “Big Data” processing model, which allows programmers to easily and effectively process huge data sets to get meaningful results.
The best place to learn about Hadoop is, of course, the Apache Hadoop project and community itself, which says this about the Hadoop software:
“(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”
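To make that “simple programming model” a little more concrete, here is a minimal sketch of the classic WordCount job written against the Hadoop MapReduce Java API (the org.apache.hadoop.mapreduce package). This is just an illustration, not part of the Hadoop distribution itself: the class names are placeholders, and the exact API details can vary between Hadoop releases.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each input line, emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this would typically be launched with something along the lines of `hadoop jar wordcount.jar WordCount /input /output`, with HDFS providing the distributed storage and the MapReduce framework handling scheduling and retrying failed tasks across the cluster.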
The Apache Hadoop project is related to, or has given rise to, a large number of notable modules, subprojects, and full projects at the ASF, including:
- Hadoop Common
- Hadoop HDFS
- Hadoop MapReduce
- Apache Avro
- Apache Cassandra
- Apache Chukwa (incubating)
- Apache HBase
- Apache Hive
- Apache Mahout
- Apache Pig
- Apache ZooKeeper