MongoDB
Our portfolio company 10gen started out last year building an entire open source cloud computing stack. Based on feedback from the market, they recently opted to focus entirely on the data store component, called MongoDB.
My partner Albert, who sits on 10gen's board, has a blog post up today talking about MongoDB and who should be trying it out and why:
MongoDB is a much better fit for most web development than a
traditional relational database. Instead of requiring an ORM layer,
MongoDB simply stores objects as documents in the database. This is
very fast since it eliminates a lot of overhead and therefore scales
much better than a relational DB+ORM. Yet it retains all the
flexibility for super agile development. Need a new field in your
objects? Just start saving new objects with that field. Need a new
collection of objects? Just start saving to it!
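To make the "just start saving" idea concrete, here is a minimal Python sketch. It is not MongoDB's API: `ToyDocumentStore`, `save` and `find` are hypothetical names for a toy in-memory stand-in, used only to illustrate what it might look like when collections and fields need no up-front schema.

```python
from collections import defaultdict


class ToyDocumentStore:
    """Hypothetical in-memory stand-in for a document database (not MongoDB's API)."""

    def __init__(self):
        # Collections come into existence on first use; no schema is declared.
        self._collections = defaultdict(list)

    def save(self, collection, document):
        # Store a shallow copy so later changes to the caller's dict don't leak in.
        self._collections[collection].append(dict(document))

    def find(self, collection, **criteria):
        # Return the documents whose fields equal every given criterion.
        return [
            doc for doc in self._collections[collection]
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = ToyDocumentStore()
store.save("cars", {"model": "Civic", "color": "blue"})
# Need a new field? Just start saving objects that have it.
store.save("cars", {"model": "Golf", "color": "blue", "doors": 5})
# Need a new collection? Just start saving to it.
store.save("owners", {"name": "Ann"})

blue_cars = store.find("cars", color="blue")
five_door = store.find("cars", doors=5)
```

The objects are plain dictionaries stored as-is, so there is no mapping layer between the application's objects and rows in a table, which is the overhead the quote says goes away.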
If you are a web developer and are curious what MongoDB can do for you, you can download it here.
Comments (Archived):
Excellent! We are going to take a serious look at this. Smart.
The word mongo has a very bad connotation in Spanish.
Thanks for pointing that out
In Portuguese too. Bad names aside, it’s a fact that databases are the biggest pain in the ass for developers trying to scale today’s RIAs. I’ll take a good look at this.
I’m curious how using 10gen and their MongoDB compares with or works with Hadoop/HDFS/HBase. This is a big initiative with a lot of companies now that are being challenged by their business sides to get technology to manage an exponential amount of data, and the database structures even at major companies are a mess. Hadoop seems scalable and makes things possible from a storage perspective, but as far as random access and recall it’s not there yet. As a disclaimer… I’m from the business side, not the technology side, so I’m really an amateur when it comes to this stuff, but when starting from scratch it’s important to build something right to scale for the future.
I always thought Hadoop (based on Google’s map/reduce) is really designed for massive multi-TB datasets, so might be a little overkill. I would be interested in a benchmark comparison between MongoDB, CouchDB and HBase across different size and complexity datasets.
yes that is correct.. I don’t think you’ll find too many people complaining if they are using mysql or the standard stack on a small website. Larger websites seem to be fine with oracle, but when using adserving and data mining on cookie information, data sets get huge quickly. The mgmt team of 10gen are from DoubleClick so I have to assume they are aware of all the companies in the ad network and adserving business who are using hadoop to manage massive amounts of data. So I don’t think my question is overkill.
I tend to think of Hadoop/HDFS and MongoDB as tools for different problems. In my own experience, I used Hadoop to process large-scale datasets, doing mostly aggregation and transformation (e.g. ‘summarize activity by hour and write to this summary file’). The HDFS underneath was a nice addition to ensure multiple copies of the data persisted. So in this sense, I used Hadoop more as a warehouse and reporting system. MongoDB is intended for high-performance operational data stores – something that you’d access when processing web requests (for example), where low-latency and a more conventional query model (‘find all where color is blue’) are important. With MongoDB’s replication, you do get the benefit of multiple copies of the data as well. So I think of these tools as complementary. While we do plan to add map/reduce-like functionality in the future (so you can do Hadoop-style distributed aggregation operations), the current featureset is really tailored to high-performance operational needs.
geir
Thanks for the response. I guess that’s what I’m getting at. When managing large data sets with HDFS the data is not readily available. It must be rolled up to have quick access to it and to provide reporting on, say, adserving or data mining on every cookie status for billions of ad impressions. So are you saying that you plan to either connect MongoDB as an automated roll-up of hadoop, or a similar map-reduce system that is rolled up into a MongoDB on some sort of regular schedule? Would love to hear more about that. Thanks!
That’s a hard question to answer since I can interpret it in quite a few ways, and the roadmap we have for this kind of featureset isn’t detailed – we’re working now on sharding, concurrency, performance and the like. However, certainly being able to do MR-style operations across a MongoDB cluster is in the cards, and we’d probably be well served to ensure that it’s compatible with the code people are writing for Hadoop. The data will be distributed across multiple systems, and even our current, basic sharding work will distribute queries across a set of shards and aggregate the resulting fragmentary result sets into a unified one to make it easy to use – the caller has no idea that they are working across a sharded system. Now, clearly this isn’t MR, but you can see that at a high enough level and an appropriate defocusing of the eyes, we’re headed in that general direction. :)
geir
thanks for the detail and I look forward to experimenting with MongoDB
typo alert – ‘Could’ computing stack instead of cloud? although a ‘coulda-shoulda’ computing stack sounds promising…or maybe Vista has that space locked up
I must have peer produced proof reading asap! I’ve gotten a dozen comments in the past day about my inability to write correctly. Thanks. I’ll fix it.
one of the casualties of the new age of journalism…..need I remind of http://www.avc.com/a_vc/200…
everything is going to the audience side so proof reading should too!
wow, I just realized when I read the post completely that they opened it up to PHP… can’t wait to download this and try it out.
I’m very excited about all the forward motion in document/object databases. Between MongoDB, CouchDB, SimpleDB, etc, I wonder who will come out as the market leader. I hope a leader arrives soon, as I work at a company where the problems we solve correlate tightly to replacing real-world pieces of paper… a perfect opportunity for a document store.
In a world of ad-funded widget plays, it’s great to see USV backing a company doing open source infrastructure.
Hi Fred, this has been an interesting investment to watch, as the non-relational database area is particularly active and developers/architects have a lot more to think about when choosing a database. To date, relational has always been a given and the only decision being made is which vendor (usually Microsoft, Oracle, IBM or MySQL). Now we have a buffet of relational, key/value stores and document orientated. The vendors (or projects in some cases) now include Amazon, Google, 10gen, Project Voldemort (LinkedIn), CouchDB, Drizzle, SQL Data Services (Microsoft), Cassandra (Facebook) plus a myriad of other open source projects. Added to that mix, all of the traditional relational databases are now available in the cloud (most on EC2).
The RDB hasn’t maintained its dominance for the last 20-30 years by accident. It has provided what I call the best mix of performance, scalability, flexibility, integrity, recoverability and redundancy. Every “relational killer” so far has made an improvement on one aspect, but at a detrimental cost to another. This has resulted in the so-called RDB replacements being pigeonholed as niche products, being used only for niche solutions, and dying off.
The current wave of key/value stores and document (JSON/BSON) based databases also fits with niche needs. Often they make improvements on the scalability and redundancy aspects at a cost to flexibility, integrity and (sometimes) performance. So again they are not a global alternative to RDBs. But what makes this niche different is that it’s large and growing due to the demand for web services. Here scalability can be so important, more important than almost everything else.
The key issues I see for 10gen are:
1) Building a start up on a start up. This has to be a concern for any company looking to use 10gen as their database platform of choice. Making it open source helps this and is probably all you can do here.
2) Mindset. Currently, by default, if someone needs a database 99% of people will throw a relational database at the problem. Minds have to be expanded.
3) Standards. While each relational vendor has its own flavor of SQL, underneath them all is ANSI SQL. It would be great to start to see some commonality of standards being developed by those in this space.
4) Perfecting the positioning. We have relational databases, column store data warehouse databases (many now available in the cloud), key/value stores (such as SimpleDB offering scalability and cost effectiveness) and document orientated databases. There is a lot of noise at the moment and people aren’t sure; helping their potential customers realize why they are so will be of benefit. When visiting the Mongo site right now, I don’t get a good sense of where the sweet spot is.
So I think if 10gen can solve the scalability challenge, give less on other aspects than others, remove the object to relational mapping requirement (history has shown this on its own isn’t enough) and clearly articulate their value proposition, they will have a compelling product. Good luck.
Thanks Tony. This is super helpful. We are working on all those fronts and think we can deliver on them.
Some explicit positioning and expectations wrt alternatives would be nice. And no, I’m not just talking about features. For example, when might I choose MongoDB over AWS or GAE? Do you expect (encourage?) other folks to set up MongoDB hosting services? On what scale? (For example, you might expect/hope that folks set up free and small services, will be releasing initial versions of the deployment support for that scale, and say that you’re ready to host huge ones.) You might also want to talk about how one might move an application from AWS or GAE to MongoDB and the reverse. The more painless this process is, the less risk it is to choose MongoDB.
Great feedback and I’ll ask them for that
Google AppEngine Data Store is only available if you are building using Google App Engine, so this is the only time (currently) you can/will use this. SimpleDB has a pretty good cost/scale ratio, and is a pretty compelling choice when building apps in AWS. However, outside AWS the latency of SimpleDB would be too great for high performance online apps. I believe Mongo has/will have an AWS build, which is great, but they will have to have a pretty good argument for using it over SimpleDB. The ability to develop/start locally and migrate to AWS later is great, but I think a stronger technology based argument will be better.
> Google AppEngine Data Store is only available if you are building using Google App Engine, so this is the only time (currently) you can/will use this.
While Google’s version is only available on GAE, it’s possible to clone the API, making applications written to that API portable between GAE and the clones.
Note the word “clone”. A strict superset is okay, but other differences give Google lock-in. (“Better” usually isn’t and almost never overcomes the cost of “different”. You can think of this as Dave Winer says, aka “be humble”, or you can remember that potential customers don’t care about your problems or success.)
> However outside AWS the latency of SimpleDB would be too great for a high performance online apps.
The latency argument doesn’t apply to a SimpleDB clone in the same facility as the rest of the app hosting.
Google and Amazon are teaching a lot of programmers a new programming “language”. They’re the low-risk option. Also, there is a lot of infrastructure being built.
If you don’t clone, you have to do a lot of this work yourself.
> I think a stronger technology based argument will be better.
Stronger technology is never enough. In fact, it usually isn’t in the top 5. (Nope – google vs altavista isn’t a counter-example – altavista lost for non-technical reasons.)