Archive for category Design
Too often we see cloud projects fail, not because of the platforms or a lack of enthusiasm, but because of a general lack of skills in cloud computing principles and architectures. At the beginning of last year I looked at how to address this problem and realised that some guidance was needed on what is different about cloud applications and how to address those differences.
The result was a book that I wrote and published, “CALM - Cloud ALM with Microsoft Windows Azure”, which takes the approach that implementation teams already know how to do development; they just don’t know how to do it for the cloud, and need to adopt cloud thinking into their existing development practices.
The “with Windows Azure” part means that the book is written with specific examples of how problems are solved on Windows Azure, but it is not really a book about Windows Azure; it applies just as well to AWS (except that you would have to work out the equivalent technologies yourself).
CALM works through a set of models and encourages teams to fill in the detail of each model in order to arrive at the design. The models include the lifecycle model, which looks at load and traffic over time, as well as the availability model, data model, test model and others. Because CALM covers the full breadth of ALM (not just development), some models apply to the earlier stages (qualify and prove), and others post-delivery, such as the deployment, health and operational models.
CALM is licensed as open source, which also means that it is free to download, read and use. It is available on GitHub at github.com/projectcalm/Azure-EN, with pdf, mobi (Kindle) and raw html versions available for download there. A print version of the book is also available for purchase on Lulu.
I encourage you to have a look at CALM, let others know about it, ask any questions, and give me some feedback on how it can be made better.
Update: This post was written back in March 2011 and cloud support for low-latency database storage has moved on since then. See What DynamoDB tells us about the future of cloud computing and AWS and high performance commodity for updates.
A general theme in the thousands of comments about the Reddit outage is the blame placed firmly at the door of Amazon Web Services’ EBS (Elastic Block Store), as eloquently put by ex-Reddit employee ketralnis:
Amazon’s EBSs are a barrel of laughs in terms of performance and reliability and are a constant (and the single largest) source of failure across reddit.
Without claiming to understand the real reasons behind the Reddit outage, or the congregation of pro-Reddit commenters on their blog, I would agree that the immediate cause of failure was the failure of AWS’s EBS. The primary cause, however, is having such a high dependency on EBS in the first place.
Databases are notoriously expensive to scale up, and the expensive item when scaling up a database is disk I/O performance – be that throughput, latency, availability or any other measure. Scaling out databases, in turn, comes at the expense of much of the goodness of an RDBMS, which is where some of the origins of the NoSQL movement lie. Postgres, as used by Reddit, is essentially a plain ol’ relational database with a high dependency on disk I/O in order to perform, and is not, architecturally, the best candidate data store to host on the cheap commodity storage medium that is EBS.
A discussion of whether or not RDS (Amazon Relational Database Service – MySQL on Amazon) is better than Postgres is moot, as RDS probably wasn’t available when Reddit made their data store decisions. However, Amazon S3 and Amazon SimpleDB were, and the former offers eleven nines of durability. The architecturally prudent approach would be to make the best use of specialised AWS services and not rely on EBS for such a key part of the architecture. Does anyone really think they will get enough database performance by running a traditional database product on top of a network storage service instead of expensive, hand-tuned dedicated storage?
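To make the point concrete, writing durable data straight to S3 takes only a couple of API calls. The sketch below is illustrative only: the function and bucket names are hypothetical, and the client is duck-typed so that in practice you would pass in a client from boto3, the modern Python SDK for AWS (which post-dates this post).

```python
# Sketch: persisting data straight to S3 (eleven nines of durability)
# rather than through a database on EBS-backed disks. Any object that
# exposes put_object/get_object with these keyword arguments will do.

def store_blob(s3_client, bucket, key, data):
    """Write bytes to S3 and return the new object's ETag."""
    resp = s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    return resp["ETag"]

def load_blob(s3_client, bucket, key):
    """Read an object's bytes back from S3."""
    resp = s3_client.get_object(Bucket=bucket, Key=key)
    return resp["Body"].read()

# Usage against real AWS (needs credentials and an existing bucket):
#   import boto3
#   s3 = boto3.client("s3")
#   store_blob(s3, "example-bucket", "comments/123", b"payload")
```

S3 trades query richness for durability and scale, so it suits blobs and write-once data rather than relational workloads – exactly the kind of architectural compromise being discussed here.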
There may be demand in future, particularly as the enterprise makes more use of AWS, for higher performance dedicated disk storage inside an AWS data centre, but until then systems should be architected to store data responsibly and be tuned for the particular platform. As architects we do make compromises – we pass over the platform-native stores out of fear of vendor lock-in, or because we think the alternative data stores are cool or interesting – but many of the NoSQL databases are resilient to failure and latency by design, running multiple instances, making heavy use of memory and so on.
When building sites on top of AWS, or Google, or Azure, use the data store that is optimised for and built into the platform as your first choice, in order to get the most responsive and reliable service. A lot of the blame for the Reddit failure rests with the original architects, who selected a technology (Postgres) with such a high reliance on an AWS technology (EBS) that will always underperform. At the time they may not have foreseen the load, and may have expected EBS to get faster, but their architecture, in the AWS context, is not very scalable.
This problem will not go away. Highly available and low latency storage will always be expensive, going against the commodity cloud computing ethos. Architects have tough decisions to make and the consequences of those decisions are far-reaching.
AWS have released a few whitepapers on strategies to adopt when migrating existing applications to the cloud, which can be found at Migrating your Existing Applications to the AWS Cloud.
I recently worked on a project migrating apps from one data centre to another, where we had the following phases: Assessment, Proof of Concept, Data Migration, Application Migration, Application/Data Modifications and Optimisation – note the similarity to the phases described by the AWS team. The key takeaway for those thinking about undertaking an exercise to migrate existing applications to the cloud is that the approach described really isn’t any different from what you would do to migrate from one data centre to another.
I am a keen proponent of seriously considering how important it is that your whole application can be moved to another cloud without unreasonable effort. Any way you slice it, even moving from one identical cloud to another will always involve some effort, whether it be the initial impact assessment, project management, switching over, data migration, application deployment, parallel running and so on. If the clouds aren’t identical then it will require more effort. Obviously.
One way to minimise the impact of change and enable the largest amount of choice (should the need to change become necessary) is to move to the lowest common denominator. A ubiquitous LAMP stack could be such an example – moving between on-premise, managed hosting and IaaS clouds. But this may create a mental attitude that all proprietary cloud services are bad and should be avoided – e.g. BigTable, SimpleDB, SQS, SNS etc. If you are not careful this can become a project mantra that can take on a life of its own and will never be contested ever again! Sometimes those cloud services can offer tremendous value, massively simplifying the architecture and operations and seriously shortening the time-to-value. It’d be daft to always declare them off limits…
Another approach to isolate the impact of change is through good architecture and framework abstractions. Obvious examples that are often cited are the use of ORMs, OLEDB, JDBC, JMS etc. The temptation here, for example, is to think “if I access SQS via a JMS abstraction then I can swap out the implementation – simple as that”.
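As a sketch of that temptation, here is what such an abstraction might look like in Python. The interface and class names are hypothetical (not a real library): a common queue interface with an in-memory backend, and an SQS backend using the boto3 SDK.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional

class MessageQueue(ABC):
    """Minimal queue abstraction; backends can be swapped out."""

    @abstractmethod
    def send(self, body: str) -> None: ...

    @abstractmethod
    def receive(self) -> Optional[str]: ...

class InMemoryQueue(MessageQueue):
    """Local stand-in, e.g. for tests or on-premise running."""

    def __init__(self):
        self._messages = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None

class SqsQueue(MessageQueue):
    """The same interface backed by Amazon SQS via boto3 (sketch)."""

    def __init__(self, queue_url: str):
        import boto3  # lazy import: the in-memory backend needs no AWS dependency
        self._sqs = boto3.client("sqs")
        self._url = queue_url

    def send(self, body: str) -> None:
        self._sqs.send_message(QueueUrl=self._url, MessageBody=body)

    def receive(self) -> Optional[str]:
        resp = self._sqs.receive_message(QueueUrl=self._url,
                                         MaxNumberOfMessages=1)
        messages = resp.get("Messages", [])
        if not messages:
            return None
        self._sqs.delete_message(QueueUrl=self._url,
                                 ReceiptHandle=messages[0]["ReceiptHandle"])
        return messages[0]["Body"]
```

Swapping SqsQueue for InMemoryQueue really is a one-line change at the call site – but notice that the interface says nothing about delivery guarantees, visibility timeouts, scalability or cost, which is exactly where the caveat lies.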
This is true to a point. However, it is important to remember the non-functional elements of the underlying technology. For example the scalability, availability, monitoring and security model surrounding your cloud queue service may not easily be replicated on-premise at a similar price point.
Never mind IaaS or PaaS – the first question I keep coming up against is: should I take advantage of proprietary features, or design my app so I can lift and shift it to an alternative platform? The public cloud services arena is still relatively new, and the reluctance to take the leap is understandable. Simon’s post Google App Engine Hello/Goodbye briefly lists a number of reasons why the leap can be dangerous. For apps that have been designed well, though, there really doesn’t need to be a fear of embracing proprietary features – after all, proprietary features are used every day in more traditional deployments, such as BizTalk or the proprietary features of a traditional relational solution (e.g. SQL Server Reporting Services). So lift and shift is becoming harder to defend as a first port of call. The initial emphasis should be on the SLAs provided, the stability of the provider (Amazon, Google and Microsoft are all pretty sound) and the projected costs over the life cycle of the project. It may be evident that hosting on the cloud is initially cost effective, but that during the lifetime of the project it will make sense to move it in house; in that case the portability aspect does become an issue, but it should not be the first question you ask!