15
JUL

Databases and the Cloud

In the early days of DoubleClick, when designing the DART technology, we had to decide on an efficient way to store massive amounts of information, such as number of exposures to a particular advertisement ID.

At first, we stored this information in Oracle, although it was quickly apparent that this was not an ideal long term solution. We eventually would need to store a trillion rows of information with random access to that information.

We built a proprietary, real-time data store for the problem. The system:

  • supported the operations get, put, increment
  • was very fast and optimized for the problem
  • was non-transactional, which was ok for the problem at hand
  • was very easy to partition (shard) across multiple machines

Now we are working on the 10gen database, named Mongo. As we work on the design, we keep these issues in mind. For a database for web 2.0 applications today, we want:

  • Scalability: a database that easily partitions over any number of commodity-class machines, and grows to any scale.
  • Hardware efficiency: it is one thing to scale linearly, another to have a good “constant factor”. We want a system that, in addition to scaling well, supports a high number of requests per second per cpu core, and is fast enough that intermediate caching layers are unnecessary.
  • Multimedia capability: we want to store large objects in the DB itself, instead of writing complex code to go find the right thing in an attached file system.
  • Ease of programming and depth of functionality.

What are lower priorities for us? We are not optimizing for decision support; rather, we need an operational data store. Also, we are not targeting hardened, banking class transactions and durability – especially if those features come at a large hardware cost.

As we look at the four design goals above, we conclude the ideal technology solution is an object oriented DBMS:

  • Scalability: object databases are easier to scale than relational databases; sharding is easier. In a relational database, distributed joins are a complex problem that must be solved if one desires true plug-and-play scalability without limits.
  • Hardware efficiency: when used appropriately, object databases can be very fast. All the data for a complex object can be clustered at the same disk location. The data representation on disk can be very close to the program’s in-memory representation, making reads that are file system cache hits fast.
  • Multimedia: an object DB can do well storing large, opaque blobs of information.
  • Ease of programming: object databases are easy to use when developing software in an object oriented programming language. To maintain portability, existing object persistence interfaces can be mapped to the database.

Our approach with Mongo is in some ways similar, and in some ways different, from that of Amazon SimpleDB and Google BigTable. It is similar in that all three are non-relational. It is different in that Mongo is a true object database, rather than a key/value data store.

We also want Mongo to have greater depth of functionality than existing cloud databases. To take a simple example, some of these products have no “order by” feature for queries. We want something that works in the cloud well yet also has the rich functionality developers expect.

An early alpha release of Mongo will be available next week, with a full release targeted for late in the year.

| Add a Comment | Back to main page

Categories: db

Comments

1

Richard L. Burton

Extremely interesting Dwight.

I understand your company is looking for Engineers to work on various parts of the Platform and I wanted to know if a person with Java experience would be a fit somewhere within 10Gen?

I’d love to read more on the Object Grid. Was it built from the ground up or on top of something like Sleepycat? What is the primary language used in the implementation? Did you incorporate various concepts like Google’s GFS or looked at Hadoop to barrow from the architecture?

Best Regards,
Richard L. Burton

2

dwight

Most of the system is Java, libraries Javascript, database C++. More information will be available next week on the internals.

3

Sadri

Several reactions:

1) ODB is fine, but the "share-nothing" assumption is questionable in the long term. Eventually objects in one partition will want to reference objects that happen to be in another partition. The complexity and cost of distributed joins in RDB is mostly about how to control the flow of tuples from one machine to another without starvation, deadlocks, or overwhelming memory. The techniques that handle these problems in RDB are equally applicable in ODB. Obviously, you can get started with share-nothing - but it tends to force people to use large objects to cut down on inter-object references that could get stored on different slices, or that cause skew (one machine gets vastly more objects than another in order to avoid intermachine references, but then that machine takes way longer to process 'queries').

2) Object model is fine, and ODB is fine. But vertical storage has some strong performance advantages over retrieving whole objects when an application may only need a handful of attributes. Paraccel blew everyone's numbers out of the water on TPCH, and Vertica wins a lot of business on its performance. If you go vertical, just be careful about compression / Sybase.

3) With hundreds to thousands of machines/disks, something fails all the time. Disk RAID only goes so far. Sites that can't afford downtime or lost data will want machine-level RAID, or mirroring, or geographical-object replication. Block/Page level mirroring may be the least performance-impacting choice. Sometimes failures are soft (disk speed suddenly slows, but doesn't fail). It's a good idea to keep track of which machines takes longest to react to a query. If there's regularity, your software will know there's a problem with skew or a soft failure.

4) Indexing is important at scale. The old btree/hash stuff doesn't cut it at large scale. Btrees are much slower than Tries/bucket-sorts and hashing sucks when it doesn't fit in memory.

5) Caching isn't so important at scale. Sure, some requests are more common than others, and caching is important for those objects. But at terabyte scales, the chances of wanting the same needle in the haystack is pretty small.

6) Solid state storage will be a game changer. Probably within 3-to-5 years? Good to keep it mind.


Good luck to you all. I'll give it a try when it's ready for a spin.

4

dwight

Great comments, thanks.

5

Markus

Can't subscribe to the rss feed, it says http://static.10gen.com/www.com/blog/rss?ctxt=wwwr2.1.3&lm=doesntexist which doesn't work.

thanks

add a comment

recent posts

SDK Updates - Stable for Windows, new snapshots Windows Binary SDK Available Sandboxing A Very Simple Mongo Insert Speed Test Template Technologies: Mix them in your app

most popular

You Don't Need a File System Databases and the Cloud 10gen Platform is Open Source Why JavaScript? Platform vs. Cloud

most commented

Why JavaScript? 10gen Platform is Open Source You Don't Need a File System Zeus Load Balancer is a Good Product Problems with JavaScript