Open source is at the heart of Big Data, driving the state of the art in both data storage and data processing. As recent research suggests, when IT professionals and data scientists get serious about building Big Data applications, they overwhelmingly turn to MongoDB and Hadoop. MongoDB is increasingly the industry’s default data store for Big Data applications, and it can handle data processing as well. Hadoop, for its part, is deployed for deep, computationally intensive data processing.
The two technologies are highly complementary.
Small wonder, then, that many companies use the two together. To meet this demand, 10gen built the MongoDB Hadoop Adapter, which Orbitz, Foursquare, and others deploy to store and crunch massive quantities of data. There are also integrations with Storm and other tools for real-time processing, but MongoDB plus Hadoop is easily the most widely used integration.
Where does this leave traditional RDBMS solutions? As 10gen president Max Schireson explains in a recent interview, building out a Big Data solution with, for example, Oracle technology is prohibitively expensive:
In the relational world, when you need real processing power you might go out and buy a big [Oracle] Exadata box for $10m. But in our world the way to get more power is just to buy more cheap commodity servers. One $10m server will typically have less processing power than a rack full of 50 cheap commodity servers that cost $5,000 each or $250,000 in total.
And while open-source SQL databases like MySQL aren’t likely to break an IT budget, they aren’t well suited to large quantities of unstructured, complex data.
Not everyone will need to turn to Hadoop to process the data stored in MongoDB. As the City of Chicago does for its Data Portal, organizations can use MongoDB alone as a highly efficient way to both store and process data in real time. Schireson explains:
If you're storing data in a relational database and you want to run it through Hadoop, you need to take the data out of the database, put it into HDFS [Hadoop Distributed File System], do the analytics in Hadoop, take the result of that and put it back into the database. With Mongo you can do those operations in real time while it's still in the operational database. You can also mix and match database-style queries with Hadoop-style MapReduce analytics.
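To make the idea of mixing the two styles concrete, here is a minimal sketch in plain Python. It does not use MongoDB’s or Hadoop’s actual APIs: the `query` and `map_reduce` helpers and the sample documents are hypothetical, invented purely for illustration. The sketch filters records with a database-style query and then aggregates the matches with a map step and a reduce step, operating on the data where it already lives rather than exporting it to a separate system first.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for records in an operational store.
documents = [
    {"region": "east", "category": "books", "amount": 12},
    {"region": "east", "category": "music", "amount": 3},
    {"region": "west", "category": "books", "amount": 9},
    {"region": "east", "category": "books", "amount": 8},
]


def query(docs, **criteria):
    """Database-style query (hypothetical helper): keep documents whose
    fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def map_reduce(docs, mapper, reducer):
    """MapReduce-style aggregation (hypothetical helper): the mapper emits
    (key, value) pairs, values are grouped by key, and the reducer
    collapses each group into a single result."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Mix the two styles: filter with a query, then aggregate the matches.
east = query(documents, region="east")
totals = map_reduce(east, lambda d: [(d["category"], d["amount"])], sum)
print(totals)
```

Here the query narrows the data to the three "east" records, and the map/reduce pass sums their amounts per category. In a real deployment the filtering and aggregation would run inside the database or a Hadoop cluster rather than over a Python list.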
If you’re interested in hearing how enterprises are embracing MongoDB for Big Data analysis, whether alone or in concert with Hadoop, 10gen’s chief evangelist Steve Francia and select MongoDB enterprise users will be presenting on this topic in the Strata Conference’s Bridge to Big Data track on October 23 in New York. We’d love to see you there.