For processing and querying data sets characterised by huge volumes and enormous variability (a.k.a. big data), running NoSQL on clusters of cheap servers for high-powered analytics is an increasingly attractive alternative to a large SQL appliance, especially when combined with MapReduce technology to crunch the data: most NoSQL databases either ship pre-integrated with Hadoop or at least offer built-in support for it.
“If you’re storing data in a relational database and you want to run it through Hadoop, you need to take the data out of the database, put it into HDFS [Hadoop Distributed File System], do the analytics in Hadoop, take the result of that and put it back into the database. With Mongo you can do those operations in real time while it’s still in the operational database. You can also mix and match database-style queries with Hadoop-style MapReduce analytics,” said Schireson.
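To make the MapReduce pattern Schireson refers to concrete, here is a minimal, self-contained Python sketch of the map-then-reduce flow (emit key/value pairs per document, then fold the values for each key). The sample orders data and function names are hypothetical illustrations, not MongoDB's actual API; in MongoDB the same shape of computation runs inside the database via its aggregation facilities.

```python
from collections import defaultdict

# Hypothetical sample documents, standing in for an operational collection.
orders = [
    {"customer": "alice", "total": 30},
    {"customer": "bob", "total": 20},
    {"customer": "alice", "total": 15},
]

def map_fn(doc):
    # Map phase: emit a (key, value) pair for each input document.
    yield doc["customer"], doc["total"]

def reduce_fn(key, values):
    # Reduce phase: fold all values emitted under one key into a single result.
    return sum(values)

# Shuffle/group step: collect emitted values by key.
grouped = defaultdict(list)
for doc in orders:
    for key, value in map_fn(doc):
        grouped[key].append(value)

# Reduce step: one aggregated result per key.
result = {key: reduce_fn(key, values) for key, values in grouped.items()}
print(result)  # → {'alice': 45, 'bob': 20}
```

The point of the quote is where this computation runs: with Hadoop the documents must first be exported to HDFS, whereas MongoDB executes the equivalent map and reduce steps against the live operational data.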