Saturday, July 2, 2011

Hadoop: High-Performance Parallel Data Processing and Reliable Data Storage

"I keep saying the sexy job in the next ten years will be statisticians" - Google’s Chief Economist Hal Varian

If you want a framework for the distributed processing of large data sets across clusters of computers using a simple programming model, Hadoop is a strong option.

Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce.
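To make the MapReduce idea concrete, here is a minimal sketch of a word count in plain Python. It runs in a single process and does not use Hadoop or HDFS at all; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. It shows the three steps of the model: map each record to key/value pairs, group the pairs by key, then reduce each group to a result.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, which a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

On a real cluster, the map and reduce steps would run in parallel on many machines and the input would be read from HDFS; this sketch only illustrates the programming model.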

