Facebook has open-sourced some interesting in-house code in the past, such as Flashcache for the Linux kernel, the Folly C++ library, and the HipHop Virtual Machine. The latest Linux-compatible open-source release from Facebook is Presto, their tool for querying petabytes of data.
Presto is a distributed SQL query engine developed in-house at Facebook, which the social network uses to query its 300+ petabytes of data. Facebook runs Hadoop clusters, but Hive and other existing open-source tools didn’t provide the low-latency results the company wanted, so a team set out to develop Presto.
Interestingly, this low-latency distributed query engine is implemented in Java, but it avoids typical issues of Java code by using carefully optimized code and generating some of its own bytecode. Presto supports multiple back-ends and has been in development for the past year. The open-source tool is already 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most of Facebook’s queries. The engine supports most of ANSI SQL.