Traackr provides brand marketers and public relations professionals with a sophisticated, automated tool to identify industry “influencers,” such as bloggers, analysts and reporters. The company’s social media monitoring product pulls large amounts of online data to find people and content relevant to keyword topics. Traackr’s ability to produce high-quality influencer lists depends on establishing connections between influencers, the sites they post on and their content. Their existing NoSQL database locked them into a rigid data model, limiting Traackr’s ability to create strong connections between the various data sets. MongoDB proved to be the only NoSQL database able to scale while delivering the flexibility Traackr needed to model their data in new ways.
Traackr processes between half a million and a million posts per day, including articles, blogs and social media posts, to generate influencer “A-lists.” In 2010, they deployed HBase to store unstructured data from the web and to complement their relational back end.
Traackr’s original model included three distinct data buckets: 1) Influencers, 2) Channels (sites where content is published), and 3) Posts (tweets, blog posts, etc.). Traackr creates influencer listings by mining these data sets for keywords. If the data sets were only loosely coupled, or if an incorrect relationship persisted for too long, the resulting influencer rankings would be inaccurate and inconsistent. To provide higher-quality lists to clients, Traackr needed to build stronger associations within their data model, and HBase lacked the indexing and ad hoc querying capabilities to make this happen.
When Traackr re-evaluated NoSQL technology, MongoDB emerged as the perfect fit for how they wanted to model data.
“MongoDB has evolved with impressive speed, passion and community following,” said Traackr Co-founder and CTO David Chancogne. “As a developer and a CTO, I want technology that is not too complicated and that just works. MongoDB is easy to get started with, easy to put in production, it can scale to bigger production systems, and doesn’t take a brain surgeon to operate.”
Traackr stores up to a million posts per day and performs a large amount of post-processing on that data in MongoDB. After only three months of development, Traackr’s core application built on MongoDB went live in early December 2011. Traackr can now account for atypical connections between data, such as an influential blogger who is more closely associated with The Huffington Post than with his own WordPress page. The result: higher-quality, more reliable lists for customers.
EASY TO DEVELOP AND OPERATE
MongoDB is easy to use and deploy, both in day-to-day development and in production. “The approachability of MongoDB is really appealing,” said Chancogne. Even the product manager, who is technical but doesn’t code, can create his own reports.
POWERFUL INDEXING & AD HOC QUERYING
MongoDB’s secondary and compound indexes immediately drew Traackr’s attention. Indexing lets them improve query performance and maintain the relationships between influencer and channel data. Ad hoc querying gives them better access to their data, with more expressive queries and the ability to query non-indexed fields.
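The ordering property that makes a compound index useful for influencer/channel relationships can be illustrated without a database. The Java below is a toy model under stated assumptions, not MongoDB code and not Traackr’s schema: the class `CompoundIndexSketch`, the fields `influencerId` and `channelId`, and the post IDs are all hypothetical. It keeps entries sorted by the first field and then the second, which is the ordering that lets a compound index on (influencer, channel) answer both exact lookups and queries on the influencer alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/**
 * Toy model (not MongoDB code) of a compound index on the hypothetical
 * fields (influencerId, channelId). Keys are sorted by influencerId first,
 * then by channelId.
 */
class CompoundIndexSketch {

    // Composite key; field order matters, as it does in a compound index.
    // TreeMap uses ORDER for lookups, so no equals/hashCode is needed here.
    static final class Key {
        final String influencerId;
        final String channelId;

        Key(String influencerId, String channelId) {
            this.influencerId = influencerId;
            this.channelId = channelId;
        }
    }

    // Sort by the leading field, then by the second field.
    static final Comparator<Key> ORDER =
        Comparator.comparing((Key k) -> k.influencerId).thenComparing(k -> k.channelId);

    // Each entry maps an (influencerId, channelId) pair to hypothetical post IDs.
    private final TreeMap<Key, List<String>> entries = new TreeMap<>(ORDER);

    void add(String influencerId, String channelId, String postId) {
        entries.computeIfAbsent(new Key(influencerId, channelId), key -> new ArrayList<>())
               .add(postId);
    }

    // Exact match on both fields.
    List<String> postsFor(String influencerId, String channelId) {
        return entries.getOrDefault(new Key(influencerId, channelId), List.of());
    }

    // Query on the leading field only: seek to (influencerId, "") and scan
    // forward while the leading field still matches.
    List<String> channelsFor(String influencerId) {
        List<String> channels = new ArrayList<>();
        for (Key k : entries.tailMap(new Key(influencerId, ""), true).keySet()) {
            if (!k.influencerId.equals(influencerId)) {
                break;
            }
            channels.add(k.channelId);
        }
        return channels;
    }

    public static void main(String[] args) {
        CompoundIndexSketch idx = new CompoundIndexSketch();
        idx.add("inf-1", "wordpress", "p1");
        idx.add("inf-1", "huffpost", "p2");
        idx.add("inf-2", "huffpost", "p3");
        System.out.println(idx.channelsFor("inf-1")); // prints [huffpost, wordpress]
    }
}
```

Because entries are ordered by `influencerId` first, listing one influencer’s channels is a short forward scan rather than a pass over every entry. A query on `channelId` alone gets no such help from this ordering, which mirrors why field order matters when defining a compound index.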
REPLACED MAPREDUCE JOBS
While integration between MongoDB and Hadoop remained important for Traackr’s batch processing, the team was surprised to find that MongoDB could replace certain MapReduce jobs. Weekly influencer scoring, for example, previously required a MapReduce job that processed many URLs and metrics. With MongoDB, Traackr replaced that complicated approach with a multi-threaded Java application that calls MongoDB directly to perform similar computations. Best of all, the MongoDB-based approach was faster.
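The article does not describe Traackr’s Java application beyond its being multi-threaded and calling MongoDB directly, so the following is only a sketch of the general pattern, under stated assumptions: one scoring task per influencer, run on a fixed thread pool. The class `WeeklyScoring`, the metric names (`posts`, `shares`, `comments`) and the weights are hypothetical, and an in-memory map stands in for the per-influencer data a real job would read from MongoDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of a multi-threaded weekly scoring job.
 * A real job would query MongoDB for each influencer's recent activity;
 * here the per-influencer metrics are supplied as an in-memory map.
 */
class WeeklyScoring {

    // Hypothetical scoring formula: a weighted sum of a few engagement counts.
    static double score(Map<String, Integer> metrics) {
        return 1.0 * metrics.getOrDefault("posts", 0)
             + 2.0 * metrics.getOrDefault("shares", 0)
             + 0.5 * metrics.getOrDefault("comments", 0);
    }

    // Scores every influencer in parallel, one task per influencer.
    static Map<String, Double> scoreAll(Map<String, Map<String, Integer>> metricsByInfluencer,
                                        int threads) {
        Map<String, Double> scores = new ConcurrentHashMap<>();
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> entry : metricsByInfluencer.entrySet()) {
            String influencerId = entry.getKey();
            Map<String, Integer> metrics = entry.getValue();
            tasks.add(() -> {
                // In a real job, this is where MongoDB would be queried.
                scores.put(influencerId, score(metrics));
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until every task has finished.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // rethrows any exception raised inside a task
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Scoring was interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A scoring task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> metrics = Map.of(
            "alice", Map.of("posts", 3, "shares", 1, "comments", 4),
            "bob", Map.of("posts", 1));
        System.out.println(scoreAll(metrics, 2));
    }
}
```

Each task writes to a `ConcurrentHashMap`, so no extra locking is needed, and calling `get()` on every future surfaces a failed task instead of silently dropping that influencer’s score.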
HIGH AVAILABILITY
Before MongoDB, Traackr replicated data across multiple machines, but the master/slave topology still left single points of failure. Traackr needed high availability without those single points of failure, and MongoDB delivered it.
TECHNOLOGY AS RECRUITING TOOL
“At every turn, we are impressed with the thoughtfulness put into MongoDB. Every time we ask a question about capabilities, it delivers,” said George Stathis, vice president of engineering. “That has made development much more enjoyable, and everyone is really excited to work with it.” Traackr also hopes the switch to MongoDB will attract more developers and aid their recruiting efforts.