Reverb Technologies is the world’s largest English language resource – a “dictionary on steroids” that is six times bigger than the Oxford English Dictionary. Founded on the belief that people understand words best when they can see them in real-world context, Reverb Technologies goes beyond traditional dictionary definitions to bring words to life by revealing the conversations, pictures, and other discussions about them. Leveraging the power of the real-time web, Reverb Technologies uses MongoDB as the foundation for its “live” dictionary that keeps pace with and illuminates the rapid evolution of language today. MongoDB offers the high performance and reliability the company required, and today, Reverb Technologies stores its entire text corpus in MongoDB – 3.5TB of data in 20 billion records.
As a real-time dictionary, Reverb Technologies processes and analyzes a staggering amount of data. Its entire system is built on an almost continuous stream of high-quality text that is pulled from online sources, ranging from Twitter to newspaper articles. The system maps every word based on real data (e.g. “Google” used as a verb in a sentence), and users are able to contribute data via a free, public API. As a result, the data is constantly refreshed, and the engine, which depends on text as its input, gets smarter as more text arrives.
Reverb Technologies was initially launched entirely on MySQL but quickly hit performance roadblocks. Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts, freezing IT’s ability to get or add data to the system. Reverb Technologies began an extensive evaluation of non-relational database options. MongoDB offered the high performance and reliability that the company required, coupled with a long runway for scale-out and a fast, easy solution for storing, locating and retrieving data.
Compared to other non-relational databases, MongoDB proved to be faster, more reliable and easier to implement and administer with a small team. The fact that it was easy for developers was a bonus.
MongoDB now powers every web site request – a hefty order with 20 million API calls per day and millions of unique users per month. It stores 3.5TB of data in 20 billion records, including append-only corpus data, structured hierarchical data and user-generated data. All software that touches MongoDB is written in Java and Scala, and everything runs on Linux. Reverb Technologies also stores and accesses analytic data generated from Hadoop in MongoDB.
One guy, one month...zero downtime
In 2009, Reverb Technologies completed the first prototype, migrating 5 billion rows of data from MySQL to MongoDB in a single day. With only one devoted developer, the new system was up and running in a month and experienced zero downtime.
According to Reverb Technologies, developers treat MongoDB like “a raw high performance storage engine.” It’s the fastest storage engine they’ve ever used, serving an average of 500k requests per hour, and four times that during peak hours. MongoDB sustains an insert speed of 8k words per second, with frequent bursts of up to 50k per second. Queries are “blazingly fast” and Reverb Technologies is able to serve the data as quickly as the network can handle it.
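To put the hourly request figures on a per-second scale, the arithmetic can be sketched as below. The numbers are the ones quoted in the paragraph above; the function and variable names are illustrative only.

```python
# Convert the quoted hourly request rates to per-second rates.
# Figures are taken from the text; names are illustrative.
SECONDS_PER_HOUR = 3600


def per_second(requests_per_hour: float) -> float:
    """Return the average requests per second for an hourly total."""
    return requests_per_hour / SECONDS_PER_HOUR


average_per_hour = 500_000             # "500k requests per hour"
peak_per_hour = 4 * average_per_hour   # "four times that during peak hours"

average_rps = per_second(average_per_hour)
peak_rps = per_second(peak_per_hour)

print(f"average: {average_rps:.0f} requests/s")  # about 139
print(f"peak:    {peak_rps:.0f} requests/s")     # about 556
```

These are averages over an hour, so short bursts within a peak hour could be higher.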
With MySQL, Reverb Technologies relied on memcached to maintain reasonable speed. Once MongoDB was in place, their structured dictionary data system jumped from 50 requests per second to 1000 requests per second. Further optimization around turning MongoDB data into objects led to an astounding 35,000 requests per second. As a result, Reverb Technologies eliminated its memcached layer, creating a simplified system that required fewer resources and was less prone to error.
Reverb Technologies uses MongoDB replica sets, which keep copies of the data on multiple servers and enable high availability. The Reverb Technologies team sleeps soundly at night knowing that MongoDB will automatically fail over should one of their servers fail.
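An application typically reaches a replica set through a connection string that lists several members, so the driver can locate the current primary after a failover. A minimal sketch of building such a string follows. The hostnames and the set name `rs0` are hypothetical, and nothing here describes Reverb Technologies' actual topology; the `mongodb://` scheme and the `replicaSet` option are part of MongoDB's standard connection string format.

```python
def replica_set_uri(hosts, replica_set, port=27017):
    """Build a MongoDB connection string that lists every replica set member.

    hosts and replica_set are hypothetical example values supplied by the caller.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Seed list: comma-separated host:port pairs.
    seed_list = ",".join(f"{host}:{port}" for host in hosts)
    return f"mongodb://{seed_list}/?replicaSet={replica_set}"


# Hypothetical three-member set.
uri = replica_set_uri(
    ["db1.example.com", "db2.example.com", "db3.example.com"], "rs0"
)
print(uri)
```

A driver given a string like this can connect to any listed member and discover the rest of the set, which is what lets the application keep working when one server goes down.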
“The best thing from the get-go was 10gen’s engineers and developer-first approach,” said Reverb Technologies Vice President of Engineering and Technical Co-founder Tony Tam. “Even Eliot [Horowitz, 10gen CTO and co-founder] was very quick to help people, always giving helpful, professional advice.”
While Reverb Technologies migrated to MongoDB, they also moved to a physical data center – a winning combination that cut the cost per API request by a factor of 10. MongoDB reduced code by 75% and sped up data retrieval: example fetch time was cut from 400ms to 60ms, dictionary entries from 20ms to 1ms, and document metadata from 30ms to 0.1ms.
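Expressed as speedup factors, the retrieval times above work out to roughly 6.7x, 20x and 300x. The short sketch below reproduces that arithmetic from the quoted latencies; the dictionary keys and function name are illustrative only.

```python
# Latencies quoted in the text, in milliseconds: (before, after).
latencies_ms = {
    "example fetch": (400.0, 60.0),
    "dictionary entry": (20.0, 1.0),
    "document metadata": (30.0, 0.1),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the new latency is than the old one."""
    return before_ms / after_ms


speedups = {
    name: speedup(before, after)
    for name, (before, after) in latencies_ms.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```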
Said Tam: “Life with MongoDB has been good for Reverb Technologies. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.”