There’s a lot of excitement around Hadoop software these days, so here’s my definition of what “Hadoop” means:
Hadoop™ is the ASF’s trademark for our Apache Hadoop software product, which provides a service and a simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as the software that started the current “Big Data” processing model, which allows programmers to easily and effectively process huge data sets to get meaningful results.
The best place to learn about Hadoop is, of course, the Apache Hadoop project and community itself, which says this about the Hadoop software:
“(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”
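To make that “simple programming model” a little more concrete, here is a minimal sketch of the classic WordCount job written against the Hadoop MapReduce Java API (the org.apache.hadoop.mapreduce package). This is just an illustration, not part of the Hadoop distribution itself: the class names are placeholders, and the exact API details can vary between Hadoop releases.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each input line, emit (word, 1) for every token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sum the counts for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input dir in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output dir in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged into a jar, a job like this would typically be launched with something along the lines of `hadoop jar wordcount.jar WordCount /input /output`, with HDFS providing the distributed storage and the MapReduce framework handling scheduling and retrying failed tasks across the cluster.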
The Apache Hadoop project is related to, or has given rise to, a large number of notable modules, subprojects, and full projects at the ASF, including:
- Hadoop Common
- Hadoop HDFS
- Hadoop MapReduce
- Apache Avro
- Apache Cassandra
- Apache Chukwa (incubating)
- Apache HBase
- Apache Hive
- Apache Mahout
- Apache Pig
- Apache ZooKeeper