Cloudera launched in October 2008, offering paid support for Apache Hadoop, the software that manages big data stored across up to thousands of servers. It has already sold training, consulting services, and support contracts to industries including biotech, finance, advertising, mobile, and research. Now Cloudera is offering a free distribution of Hadoop that is easier to deploy and configure, aimed at helping smaller businesses with smaller IT staffs run Hadoop as efficiently and effectively as larger companies do.
Cloudera offers its Hadoop distribution for free, either as an RPM package for Red Hat Linux distributions or as an image. The RPM can be downloaded from a newly launched website that provides a web-based configuration tool for the distribution. The configuration wizard asks a series of plain-language questions and, from the answers, generates a customized configuration file covering the 300 or so settings Hadoop requires, which makes the set-up process much easier. The RPM is available at the end of the six-step process, and the configuration files are saved on the website for future use and changes, if needed. Both the custom package creator and the download are free of charge, and everything is released under the same Apache Software License v2.0 as the core Hadoop project.
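To make the wizard idea concrete, here is a minimal sketch of how a handful of plain-language answers could be turned into a Hadoop-style configuration file. It is not Cloudera's tool: the function name, the answer keys, and the default ports are assumptions chosen for illustration, and only three of the roughly 300 settings are shown. The property names and the `<configuration>`/`<property>` layout follow the format Hadoop's own XML configuration files use.

```python
import xml.etree.ElementTree as ET


def build_hadoop_config(answers):
    """Turn wizard-style answers into a Hadoop-style XML configuration string.

    `answers` is a hypothetical dict of responses; the keys and the default
    ports below are assumptions made for this sketch, not Cloudera's schema.
    """
    namenode_port = answers.get("namenode_port", 8020)      # assumed default
    jobtracker_port = answers.get("jobtracker_port", 8021)  # assumed default

    # Map the answers onto a few Hadoop property names.
    settings = {
        "fs.default.name": f"hdfs://{answers['namenode_host']}:{namenode_port}",
        "mapred.job.tracker": f"{answers['jobtracker_host']}:{jobtracker_port}",
        "dfs.replication": str(answers.get("replication", 3)),
    }

    # Hadoop configuration files are a flat list of <property> elements.
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_hadoop_config({"namenode_host": "master",
                               "jobtracker_host": "master"}))
```

A real wizard would derive far more settings from each answer, but the shape is the same: a small set of human-friendly inputs expanded into many machine-readable properties.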
The company is also making free, preconfigured VMware images available to run on Linux, Windows, or Mac machines. Each image includes example code and all the components needed to use the Cloudera Distribution for Hadoop, including a master server and a single node.
According to Cloudera, Hadoop is generally deployed on single- or multi-CPU systems with at least two cores per CPU, at least 16 GB of RAM, and a couple of terabytes of hard drive space. However, the specifications for optimal performance depend more on the data being managed and the analyses to be run than on Hadoop itself. Hadoop also runs well on older hardware; users can build smaller clusters of more powerful machines or larger clusters of older, less powerful ones.
Though there are currently no projects to port Hadoop to other systems, Cloudera CEO Mike Olson says that could be a possibility in the future.
Olson explained that Cloudera intends to bring the data technologies of successful companies like Google and Facebook to the many smaller companies around the world:
Cloudera's leadership says it best in the video below: