foursquare is a location-based social network that allows users to “check-in” to venues on their mobile phones to earn points and rewards. Growing rapidly since its inception in 2009, the company needed to efficiently scale their application with limited engineering resources. As their data grew, foursquare made the strategic decision to migrate storage of venues and check-ins from their original relational architecture to MongoDB.
The original foursquare application relied on a single relational database. With this architecture, foursquare could not easily scale out to the many nodes that a high-traffic application requires. As the company grew rapidly, they split the data across two nodes: one for check-ins (the biggest data set) and one for everything else. Yet it was clear that check-ins would grow beyond what a single machine could handle, and that a long-term, scalable solution to foursquare's growth was needed.
In MongoDB, the foursquare team found a solution with features that solved more than just their scaling problem.
foursquare migrated their data to MongoDB to take advantage of its built-in auto-sharding. MongoDB's auto-sharding partitions the database, enabling foursquare to scale writes across many nodes. Instead of writing their own sharding layer, foursquare can rely upon MongoDB's automated scaling infrastructure and spin up new nodes as their application grows. This enabled foursquare to focus engineering resources on building their application rather than the back-end. "Writing our own sharding layer in the app or in a middleware layer seemed like a lot of work. Big win to outsource this to the guys at 10gen," said Harry Heymann of foursquare.
In addition to auto-sharding, foursquare benefits from MongoDB’s support for geospatial indexing, allowing them to easily query for location-based data.
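As a rough illustration of what such a location query can look like, the sketch below builds a MongoDB `$near` filter in Python. The field name `location`, the collection name `venues`, and the coordinates are hypothetical, and the GeoJSON form shown is the one used by current MongoDB releases; the source does not say which index type or query shape foursquare actually used.

```python
def near_query(longitude, latitude, max_distance_m):
    """Build a MongoDB $near filter around a GeoJSON point (longitude first)."""
    return {
        "location": {  # hypothetical field holding each venue's coordinates
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
                "$maxDistance": max_distance_m,  # metres
            }
        }
    }


# Illustrative coordinates only.
query = near_query(-73.9857, 40.7484, 500)

# With PyMongo and a running server, this could be used roughly as follows
# (assumes a hypothetical `venues` collection):
#   venues.create_index([("location", "2dsphere")])
#   nearby = venues.find(query)
```

The filter is an ordinary dictionary, so the application can build and inspect it before anything is sent to the database.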
MongoDB’s replica sets provide high availability through automated failover of nodes. Because foursquare runs on Amazon EC2, where nodes could fail at any time, automated failover is a huge benefit to the foursquare operations team. With replica sets, an event that would have been a production crisis becomes a regular operational task.
The document model of MongoDB, with independent JSON-like objects, maps well to object-oriented programming, in contrast with the schema-enforced table structures of relational databases. The relational model is just “not the way programmers think these days, since most engineers are object-oriented programmers,” says Harry Heymann of foursquare.
MongoDB allows foursquare to dramatically simplify their data model. For instance, rather than storing tags ("has wifi", "great for dates", "hotspot", etc.) in a separate table and relying on mapping tables and costly JOINs, in MongoDB tags are embedded directly into the document representing a venue. This is both more efficient at run time and easier for engineers to understand and manipulate.
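The difference between the two shapes can be sketched with plain Python data structures. This is a minimal illustration, not foursquare's actual schema: the table names, field names, and sample values are all assumptions.

```python
# Relational shape: three "tables" linked by ids (illustrative data only).
venues_table = [{"id": 1, "name": "Example Cafe"}]
tags_table = [{"id": 10, "label": "has wifi"}, {"id": 11, "label": "hotspot"}]
venue_tags_table = [{"venue_id": 1, "tag_id": 10}, {"venue_id": 1, "tag_id": 11}]


def tags_via_join(venue_id):
    """Resolve a venue's tags by walking the mapping table, as a JOIN would."""
    tag_ids = {row["tag_id"] for row in venue_tags_table if row["venue_id"] == venue_id}
    return sorted(t["label"] for t in tags_table if t["id"] in tag_ids)


# Document shape: the tags live inside the venue document itself.
venue_doc = {"_id": 1, "name": "Example Cafe", "tags": ["has wifi", "hotspot"]}


def tags_embedded(doc):
    """Read the tags straight off the document; no second lookup is needed."""
    return sorted(doc["tags"])
```

Both functions return the same tags, but the embedded version reads a single record. In MongoDB, a filter such as `{"tags": "has wifi"}` matches every document whose `tags` array contains that value, so no mapping table is needed to search by tag.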
In foursquare's words
"MongoDB is a practical database for practical problems that engineers in the real world have. MongoDB is really designed by people who built large-scale web apps, and [they] want to build the perfect database for the web apps that [they] were building and solve practical problems that [their] customers actually have. That’s gone all the way to their original 1.0 design to the stuff they’re working on today. [The] stuff they’re working on today, they’re working on that because people like you guys - people in this room - went to them and said hey, here is this problem that we have. We think if you build feature x into this database, it would make it a lot simpler for us. And that’s the kind of stuff they love doing. I think over the next couple of years it’s only going to continue to evolve into a database that just makes our jobs easier as application developers, which is fantastic."
— Harry Heymann, Lead Server Engineer at foursquare, at MongoNYC 2011