The Hacker News community that contributed to the adoption of MongoDB is showing dissent, dismay and desertion of the quintessential rainbows-and-unicorns NoSQL database. The fire was set off last week by the anonymous post ‘Don’t use MongoDB’ and, during the same period, the ‘Failing with MongoDB’ post. These posts triggered all sorts of interesting discussions on the Hacker News threads – some trolling, but mostly from people experienced in MongoDB, scalability, databases, open source and startups. Taken together, they are a good sampling of opinion, both technical and otherwise, from people whose views on MongoDB are worth hearing.
The basis of the trouble is that MongoDB, under certain load conditions, has a tendency to fall over and, crucially, lose data. That raises questions about the quality of the code, the involvement of 10gen, and whether things will improve over time, or at all. Beyond the specific MongoDB concerns, this seems to have cast a broad shadow over NoSQL databases in general.
Below are the links to the relevant posts, with the Hacker News comment thread (in brackets). I urge you to scan through the comment threads as there are some useful nuggets in there.
I have little to add to the overall discussion (there are some detailed and insightful comments in the threads), but would make the following brief observations.
- MongoDB 1.8 and earlier was unashamedly fast, and the compromise was that the performance gain came from being memory based (commits happen in memory as opposed to on disk). It was more like a persistent cache than a database for primary data.
- If you absolutely must have solid, consistent, reliable, error-free, recoverable, transactional and similar attributes on your data, then MongoDB is probably not a good choice and it would be safer to go with one of the incumbent SQL RDBMSs.
- Not all data has to be so safe and MongoDB has clear and definite use cases.
- However, unexpected and unexplained data loss is a big deal for any data store, even if it is text files sitting in a directory. MongoDB could ‘throw away’ data in a managed fashion and get away with it (say, giving up during replication deadlocks), but for it to happen mysteriously is a big problem.
- Architects assessing and implementing MongoDB should take responsibility: test it to make sure that it works, and manage the (by now well known) issues around MongoDB.
- Discussions about NoSQL in general should not be lumped in with MongoDB at all. Amazon SimpleDB is also NoSQL, but doesn’t suffer from data loss issues. (It has others, such as latency, but there is no compromise on data loss.)
- The big problem that I have with using MongoDB properly is that it is beginning to require knowledge of ‘data modelling’ (whatever that means in a document database context), detailed configuration and an understanding of the internals of how it works. NoSQL is supposed to take a lot of that away from you, and if you need to worry about that detail, then going old school may be better. In other words, the benefits of using MongoDB over, say, MySQL have to significantly outweigh the risks of just using MySQL from the beginning.
- Arguably, building databases is hard work, and MongoDB is going to run up against problems that Oracle encountered, and solved, thirty years ago. It will be interesting to see where this ends up – a better-positioned lean database (in terms of its use case) or a bloated, high-quality one. MongoDB is where MySQL was ten years ago, and I’m keen to see what the community does with it.
I was asked via email to confirm my thoughts on running MongoDB on Windows Azure, specifically the implication that it is not good practice. Things have moved along and my thoughts have evolved, so I thought it was time to update and publish them.
Firstly, I am a big fan of SQL Azure, and think that the big decision to remove backwards compatibility with SQL Server was a good one that enabled SQL Azure to rid itself of some of the problems with RDBMSs in the cloud. But, as I discussed in Windows Azure has little to offer NoSQL, Microsoft is so big on SQL Azure (for many good reasons) that NoSQL is a second class citizen on Windows Azure. Even Azure Table Storage lacks features that have been asked for for years, and if it moves forward, it will do so grudgingly and slowly. That means that an Azure architecture that needs the goodness offered by NoSQL products has to roll an alternative product into some Azure role of sorts (worker or VM role). (VM Roles don’t fit in well with the Azure PaaS model, but for the purposes of this discussion the differences between a worker role and a VM role are irrelevant.)
Azure roles are not instances. They are application containers (that happen to have some sort of VM basis) suited to stateless application processing – Microsoft refers to them as Windows Azure Compute, which is a clue that they are primarily to be used for computing, not persistence. As application containers, Azure roles are far less stable than an AWS EC2 instance. This is both by design and a good thing (if what you want is compute resources). All of the good features of Windows Azure, such as automatic patching, failover and so on, are only possible if the fabric controller can terminate roles whenever it feels like it. (I’m not sure how this termination works, but I imagine that, at least with web roles, there is a process to gracefully terminate the application by stopping the handling of incoming requests and letting the running ones come to an end.) There is no SLA for a single Windows Azure compute instance as there is for an EC2 instance. The SLA clearly states that you need two or more roles to get the 99.95% uptime.
For compute, we guarantee that when you deploy two or more role instances in different fault and upgrade domains your Internet facing roles will have external connectivity at least 99.95% of the time.
On 4 February 2011, Steve Marx from Microsoft asked Roger Jennings to stop publishing his Windows Azure Uptime Report
Please stop posting these. They’re irrelevant and misleading.
To others who read this, in a scale-out platform like Windows Azure, the uptime of any given instance is meaningless. It’s like measuring the availability of a bank by watching one teller and when he takes his breaks.
Think, for a moment, what this means when you run MongoDB in Windows Azure – your MongoDB role is going to be running where the “uptime of any given instance is meaningless”. That makes using a role for persistence really hard. The only option then is to run multiple instances and make sure that the data is on more than one of them.
Before getting into how this would work on Windows Azure, consider for a moment that MongoDB is unashamedly fast, and that speed is gained by committing data to memory instead of disk as the default option. So committing to disk (using ‘safe mode’) or to a number of instances (and their disks) goes against some of what MongoDB stands for. The MongoDB API allows you to specify the ‘safe’ option (or “majority” in 2.0, but more about that later) for individual commands. This means that you can fine-tune when you are concerned about ensuring that data is saved. So, for important data you can be safe, and in other cases you may be able to put up with occasional data loss.
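As a rough sketch of what that per-command tuning looks like in the mongo shell (the collection names here are stand-ins, and the snippet assumes a running mongod with at least one replica):

```javascript
// Fire-and-forget insert: the default of this era. Fast, because the
// client does not wait for the server to acknowledge the write.
db.page_views.insert({ path: "/home" });

// 'Safe' write: follow the insert with getLastError so the client blocks
// until the write is confirmed. w: 2 additionally waits for one replica
// to have the data; wtimeout caps the wait in milliseconds.
db.orders.insert({ sku: "A42", qty: 1 });
db.runCommand({ getLastError: 1, w: 2, wtimeout: 5000 });
```

The drivers wrap this same pattern as a safe flag on individual operations, which is what lets you pay the durability cost only on the writes that matter.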
(Semi-)officially, MongoDB supports Windows Azure with the MongoDB Wrapper, currently an alpha release. In summary, per the documentation, it works as follows:
- It allows running a single MongoDB process (mongod) on an Azure worker role with journaling. It also optionally allows for having a second worker role instance as a warm standby for the case where the current instance either crashes or is being recycled for a normal upgrade.
- MongoDB data and log files are stored in an Azure Blob mounted as a cloud drive.
- MongoDB on Azure is delivered as a Visual Studio 2010 solution with associated source files.
There are also some additional screen shots and instructions in the Azure Configuration docs.
What is interesting about this solution is the idea of a ‘warm standby’. I’m not quite sure what that is or how it works, but since ‘warm standby’ generally refers to some sort of log shipping, and the role has journaling turned on, I assume that the journals are written from the primary to the secondary instance. How this works with safe mode (and ‘unsafe’ mode) will need to be looked at, and I invite anyone who has experience to comment. Also, I am sure that all of this journaling and warm standby carries some performance penalty.
It is unfortunate that there is only support for a standalone mode, as MongoDB really comes into its own when using replica sets (which is the recommended way of deploying it on AWS). One of the comments on the page suggests that they will have more time to work on supporting replica sets in Windows Azure sometime after the 2.0 release, which was released today.
MongoDB 2.0 has some features that would be useful when trying to get it to work on Windows Azure, particularly the Data Centre Awareness “majority” tagging. This means that a write can be tagged to be written across a majority of the servers in the replica set. With MongoDB 2.0 you should be able to run it in multiple worker roles as replicas (not just a warm standby) and ensure that if any of those roles were recycled, data would not be lost. There will still be issues around a recycled instance rejoining the replica set that need to be resolved, however – and this isn’t easy on AWS either.
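In the shell, against a 2.0 replica set, that majority acknowledgement would look something like this (again a sketch with stand-in names, assuming a running replica set):

```javascript
// Insert, then block until a majority of the replica set has the write.
// Once a majority has acknowledged, recycling any single role instance
// cannot lose the data, because the surviving members still hold it.
db.orders.insert({ sku: "A42", qty: 1 });
db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 });
```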
I don’t think that any Windows Azure application can get by with SQL Azure alone – there are a lot of scenarios where SQL Azure is not suitable. That leaves Azure Table Storage or some other database engine. Azure Table Storage, while well integrated into the Windows Azure platform, is short on features and could be more trouble than it is worth. In terms of other database engines, I am a fan of MongoDB, but there are other options (RavenDB, CouchDB) – although they all suffer from the same problem of recycling instances. I imagine that 10gen will continue to develop their Windows Azure wrapper and expect that a 2.0, replica-set-enabled wrapper would be a fairly good option. So at this stage MongoDB should be a safe enough technology bet, but make sure that you use ‘safe mode’ or “majority” sparingly in order to keep the benefits of MongoDB.