Archive for category Architecture

Free eBook on Designing Cloud Applications

Too often we see cloud projects fail, not because of the platforms or a lack of enthusiasm, but because of a general lack of skills in cloud computing principles and architectures. At the beginning of last year I looked at how to address this problem and realised that some guidance was needed on what is different about cloud applications and how to address those differences.

The result was a book that I wrote and published, “CALM – Cloud ALM with Microsoft Windows Azure”, which takes the approach that implementation teams know how to do development; they just don’t know how to do it for the cloud, and they need to adopt cloud thinking into their existing development practices.

The “with Windows Azure” means that the book has been written with specific examples of how problems are solved with Windows Azure, but it is not necessarily a book about Windows Azure — it applies as much to AWS (except that you would have to figure out the applicable technologies yourself).

CALM takes the approach of examining certain models and encourages filling in the detail of those models in order to arrive at the design. The models include the lifecycle model, which looks at load and traffic over time, the availability model, the data model, the test model and others. In looking at the full breadth of ALM (not just development), some models apply to earlier stages (qualify and prove), as well as to post-delivery stages, such as the deployment, health and operational models.

CALM is licensed as open source, which also means that it is free to download, read and use. It is available on GitHub, with pdf, mobi (Kindle), and raw html versions available for download. A print version of the book is also available for purchase on Lulu.

I encourage you to have a look at CALM, let others know about it, ask any questions, and give me some feedback on how it can be made better.


Simon Munro




AWS and high performance commodity

One of the primary influences on cloud application architectures is the lack of high performance infrastructure — particularly infrastructure that satisfies the I/O demands of databases. Databases running on public cloud infrastructure have never had access to the custom-built high I/O infrastructure of their on-premise counterparts. This has led to the well-known idea that “SQL doesn’t scale”, and the rise of distributed databases has been on the back of the performance bottleneck of SQL. Ask any Oracle sales rep and they will tell you that SQL scales very well, and will point to an impressive list of references. The truth about SQL scalability is that it should rather be worded as ‘SQL doesn’t scale on commodity infrastructure’. There are enough stories of poor and unreliable performance from EBS-backed EC2 instances to lend credibility to that statement.

Given high performance infrastructure — dedicated network backbones, Fusion-io cards on the bus, silly amounts of RAM, and other tweaks — SQL databases will run very well for most needs. The desire to run databases on commodity hardware comes down largely to cost (with availability a secondary influence). Why run your database on hardware that costs a million dollars, licences that cost about the same, and support agreements that cost even more, when you can run it on commodity hardware, with open-source software, for a fraction of the cost?

That’s all very fine and well until high performance becomes commodity. When high performance becomes commodity, cloud architectures can, and should, adapt. High performance services such as DynamoDB do change things, but such proprietary APIs won’t be universally accepted. The AWS announcement of the new High I/O EC2 Instance Type, which deals specifically with I/O performance by offering 10 Gigabit Ethernet and SSD-backed storage, makes high(er) performance I/O commodity.

How this impacts cloud application architectures will depend on the markets that use it. AWS talks specifically about the instances being ‘an exceptionally good host for NoSQL databases such as Cassandra and MongoDB’. That may be true, but there are not many applications that need that kind of performance from their distributed NoSQL databases — most run fine (for now) on the existing definition of commodity. I’m more interested to see how this matches up with AWS’s enterprise play. When migrating to the cloud, enterprises need good I/O to run their SQL databases (and other legacy software), and these instances at least make it possible to get closer to what is possible in on-premise data centres, at commodity prices. That, in turn, makes them ripe for accepting more of the cloud into their architectures.

The immediate architectural significance is small; after all, good cloud architects have assumed that better stuff would become commodity (@swardley’s kittens keep shouting that out), so the idea of being able to do more with less is built into existing approaches. The medium-term market impact will be higher. IaaS competitors will be forced to bring their own high performance I/O plans forward as people start running benchmarks. Existing co-lo hosters are going to see one of their last competitive bastions (offering hand-assembled high performance infrastructure) broken, and will struggle to differentiate themselves from the competition.

Down with latency! Up with IOPS! Bring on commodity performance!

Simon Munro



Feature Shaping

One of the key concepts in scalability is the ability to allow for service degradation when an application is under load. But service degradation can be difficult to explain (and relate back to the term), and ‘degrade’ has negative connotations.

The networking people overcame the bad press of degradation by calling it ‘traffic shaping’ or ‘packet shaping’. Traffic shaping, as we see it at the edge of the network on our home broadband connections, allows some data packets (such as online gaming) to be given a lower priority than others (such as web browsing). The idea is that a saturated network can handle the load by changing the profile, or shape, of priority traffic. Key to traffic shaping is that most users don’t notice that it is happening.

So, in a similar vein, I am starting to talk about feature shaping, which is the ability of an application, when under load, to shape the profile of features that get priority, or to shape the result to be one that is less costly (in terms of resources) to produce. This is best explained by examples.

  • A popular post on High Scalability talked about how Farmville degraded services when under load by dropping some of the in-game features that required a lot of back-end processing — shaping the richness of in-game functionality.
  • Email confirmations can be delayed to reduce load. The deferred load can either be the generation of the email itself or the sending of the email.
  • Encoding of videos on Facebook is not immediate and is shaped by the capacity that is available for encoding. During peak usage, the feature will take longer.
  • A different search index that produces less accurate results, but for a lower cost, may be used during heavy load — shaping the search result.
  • Real-time analytics for personalised in-page advertising can be switched off when under load — shaping the adverts to those that are more general.

So my quick definition of feature shaping is:

  • Feature shaping allows some parts of an application to degrade their normal performance or accuracy service levels in response to load.
  • Feature shaping is not fault tolerance — it is not a mechanism to cope when all hell breaks loose.
  • Feature shaping is for exceptional behaviour; features should not be shaped under normal conditions.
  • Shaped features will generally be unnoticeable to most users. The application seems to behave as expected.
  • Feature shaping can be automated or manual.
  • Feature shaping can be applied differently to different sets of users at the same time (e.g. registered users don’t get features shaped).
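The definition above can be sketched as a thin wrapper around a feature: under load, swap in a cheaper variant. This is a minimal illustrative sketch, not from the post; the load signal and the two search functions are hypothetical stand-ins for a real metric and a real feature.

```python
import functools

def shaped(fallback, is_under_load):
    """Decorator: run the feature normally, but swap in a cheaper
    fallback implementation while the system is under load."""
    def decorate(feature):
        @functools.wraps(feature)
        def wrapper(*args, **kwargs):
            if is_under_load():
                return fallback(*args, **kwargs)   # shaped (degraded) path
            return feature(*args, **kwargs)        # normal path
        return wrapper
    return decorate

# Hypothetical load signal -- in practice this would read a real metric.
LOAD = {"cpu": 0.2}

def under_load():
    return LOAD["cpu"] > 0.8

def cheap_search(query):
    # Less accurate index, cheaper to serve (the search example above).
    return f"approximate results for {query!r}"

@shaped(fallback=cheap_search, is_under_load=under_load)
def search(query):
    return f"accurate results for {query!r}"

print(search("cloud"))   # normal conditions: the full-accuracy feature runs
LOAD["cpu"] = 0.95
print(search("cloud"))   # under load: the shaped result is served instead
```

Because the shaping decision is a single predicate, the same wrapper supports automated shaping (a metric) or manual shaping (an operator-set flag), and a per-user predicate would cover the registered-users case.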

So, does the terminology of feature shaping make sense to you?

Simon Munro


Qualifying and quantifying “spiky” and “bursty” workloads as candidates for Cloud

Enterprises are looking to migrate applications to the cloud. Enterprises with thousands of applications require a fast, consistent and repeatable process to identify which applications could stand to benefit. One of the benefits of cloud is how on-demand elasticity of seemingly infinite resources can be an advantage to “spiky” or “bursty” workloads. But as I mentioned in a recent post, people may have a very different view on what constitutes “spiky”. This could yield very unpredictable results when trying to identify those cloud candidates.

It would be useful to have a consistent and repeatable way to determine whether an application was spiky.

We normally translate spiky and bursty to “high variability” which is a good term to use as it indicates the statistical methods by which we can assess whether our utilization patterns match this description and hence benefit from the cloud.

Consider the utilization graphs below (the numbers are displayed beneath them). The metric could equally be transactions per second – just don’t mix and match.

  1. “Very High” line shows a single, short lived spike.
  2. “High” shows two slightly longer lived spikes.
  3. “Mild” shows the same two spikes, less exaggerated and decaying to a non-zero utilization.
  4. “None” shows small fluctuations but no real spikes – essentially constant utilization.

2, 3 and 4 actually have the same average utilization of c. 25%. This means that they consume the same amount of compute cycles during the day, irrespective of their pattern; however, we can see that to cater for “High” we actually need 3x the capacity of “None” to service the peak – leaving resources underutilized most of the time. The curse of the data centre.

Clearly the Average utilization isn’t the whole picture. We need to look at the Standard Deviation to see into the distribution of utilization:

  • High = 42.5
  • Mild = 21.2
  • None = 5.0

Excellent, so the high standard deviation is starting to show which ones are variable and which are not. But what is the standard deviation of the spikiest load of all, “Very High”? Only 20.0? Much the same as the “Mild” line! The final step is to look at the Coefficient of Variation, which is the ratio of the Standard Deviation to the Mean. The Coefficient of Variation is:

  • Very High = 4.00
  • High = 1.59
  • Mild = 0.82
  • None = 0.20
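These figures are easy to reproduce. A minimal sketch (not from the original post) using the hourly values from the table at the end of this post for the “Very High” and “None” lines; note that the population standard deviation is used, matching the numbers above:

```python
from statistics import mean, pstdev

def coefficient_of_variation(series):
    """CV = population standard deviation / mean."""
    return pstdev(series) / mean(series)

# Hourly utilization from the table below.
very_high = [0] * 13 + [10, 100, 10] + [0] * 8   # one short-lived spike
none = [30, 20] * 12                             # essentially constant

print(round(pstdev(very_high), 1))                    # 20.0
print(round(coefficient_of_variation(very_high), 2))  # 4.0
print(round(coefficient_of_variation(none), 2))       # 0.2
```

The same function applied to the “High” and “Mild” series reproduces 1.59 and 0.82, so the whole assessment reduces to one line of arithmetic per workload.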

@grapesfrog asked me to describe something like AWS Elastic MapReduce (EMR) in these terms. Think of EMR as utilisation of {0,0,0,0,5000,0,0,0,0,0,0,0,0,0,0,0…}, where 5000% utilisation is 50 machines at 100% utilisation. So if you used 50 machines @ 100% for one hour every day your CV would be 4.8. If you used 50 machines for one hour every month your CV would rise to 27.2.
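The EMR arithmetic can be checked the same way (the helper is repeated so the snippet stands alone). The monthly figure assumes a 31-day month of 744 hours, which lands close to the 27.2 quoted above:

```python
from statistics import mean, pstdev

def cv(series):
    # coefficient of variation = population standard deviation / mean
    return pstdev(series) / mean(series)

daily = [5000] + [0] * 23      # one 50-machine hour in a 24-hour day
monthly = [5000] + [0] * 743   # one 50-machine hour in a 744-hour month

print(round(cv(daily), 1))     # 4.8
print(round(cv(monthly), 1))   # 27.3 -- the post rounds this to 27.2
```

The sparser the spike, the larger the CV: exactly the property that flags batch-style workloads as strong cloud candidates.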


Comparing the mean utilization allows us to compare the relative amount of resource used over a period of time. This showed that they nearly all consumed the same amount of resources with the very noticeable exception of the spikiest of them all. It actually consumed very little resource in total.

Comparing the coefficient of variation reveals the spikiest workload. In this example, the spikiest would require 3x the resources of the least spiky to service the demand, BUT would only actually consume 20% of the resources consumed by the least! Sometimes this is the point: the spikiest loads require the largest amount of resources to be deployed but can actually consume the least.

Further work:

Our assessment could state that any workload showing a CV > 0.5 is a candidate for cloud – revealing applications with spiky intra-day behaviour as well as the month-end classics.

Workloads that oscillate with a high frequency between extremes may show a CV > 0.5, but there we begin to trespass on topics such as deadbeat control within control theory, and will start to challenge the time resolution of cloud monitoring and control. I’ll leave it there for the time being though…


Hour                      Very High   High   Mild   None
1                                 0      0     10     30
2                                 0      0     10     20
3                                 0      0     10     30
4                                 0      0     10     20
5                                 0      0     10     30
6                                 0      0     15     20
7                                 0     10     20     30
8                                 0    100     50     20
9                                 0    100     80     30
10                                0    100     50     20
11                                0     10     20     30
12                                0      0     15     20
13                                0      0     15     30
14                               10      0     15     20
15                              100      0     15     30
16                               10      0     15     20
17                                0      0     15     30
18                                0     10     20     20
19                                0    100     50     30
20                                0    100     80     20
21                                0    100     50     30
22                                0     10     20     20
23                                0      0     15     30
24                                0      0     10     20
Standard Deviation             20.0   42.5   21.2    5.0
Average                           5     27     26     25
Coefficient of Variation       4.00   1.59   0.82   0.20
Median                            0      0     15     25
Minimum                           0      0     10     20
Maximum                         100    100     80     30




Issues and benefits conflated with spiky, bursty applications moving to the cloud

Fellow poster Simon Munro made some excellent follow-up comments to my recent post about enterprise applications qualifying for the cloud. I’ve tried to mangle the offline conversation into a post:

Simon was remarking that there are some other qualities conflated with spikiness that are perhaps easily ignored:

Performance – you cannot assume that the required performance is fixed. During peak periods users may tolerate slower performance (and not even notice it). You would have to include something like Apdex in order to get a better sense of the impact of the spike.

I think this is a very good point and, ironically, I think it cuts both ways. Some apps just aren’t worth the extra resources to maintain peak performance. However, for some apps, the rise in demand might indicate that you should make the performance even better, as this is the critical moment; the moment that really counts. For example: customer loyalty, serving really stressed users who only care about this system RIGHT NOW, or even responding to a crisis.

Base Cost – particularly with commercial enterprise software, you have base situations where the cost and load relationship is non-linear. Think of the cost of a base Oracle setup, with backup licences, installation costs and DBAs – where the cost is fixed, regardless of whether you are putting n or 3n transactions through.

Another excellent point. Although, hopefully this will change over time in response to cloud. Even now we have Oracle on AWS RDS / AWS DevPay or SQL Server via SQL Azure.

Simon then introduces us to a new term:

Business Oriented Service Degradation. I did remark that this concept is sort of covered in some SLAs and SLOs, but this is way cooler 😉 Simon’s point is that when an accounting system does its end-of-month run (the spike), the ability to process other transactions is irrelevant, because the system is ‘closed’ for new transactions by the business anyway.

Sometimes I wonder if the constraints of the past are cast into the DNA of those operating models. This month is closed but plenty of people could be entering new transactions for the next quarter. Is it possible the resource constraints meant that this was historically a better solution?

The point remains, though: if the spike is huge and infrequent, there is a mismatch between the total resources deployed (cost) and the level of utilisation. That means waste.

Interestingly, if the utilisation sits at 100% for extended periods, there is also a case for giving the system access to more resources. Clearly, with more resources, it could complete the effort far sooner. Would there be benefit in that? Enabling “closing” the month even later than usual and capturing more transactions? A better compliance rating by never being late, despite failed runs and reruns?



Just because your application would never earn a place on doesn’t mean it won’t qualify for cloud.

Working with large enterprises with 1000s of applications it is useful to assess hundreds of applications at a time to determine which ones will benefit from moving to the cloud. The first iteration to reveal the “low hanging fruit” requires a fairly swift “big handfuls” approach. One of the most commonly stated economic benefits of the cloud is elastically coping with “spiky” or “bursty” demand. OK, so whilst we scan through the application portfolio one of the things we have to spot are applications expressing this characteristic.

I first want to tackle one of the, ever so slightly, frustrating consequences of the original cloud use cases and case studies. You know – the one where some image processing service goes bananas on Sunday night when everyone uploads their pictures, and they scale up with an extra 4000 nodes for a few hours to cope with the queue. The consequence is that people now perceive only such skewed scenarios as cloud candidates along the “spiky” dimension. Hence my parting note in my recent post on Evernote, Zynga and Netflix – even normal business systems express spiky behaviour when you consider the usage cycle over a day, month or year. This is one of the reasons, despite all the virtualisation magic, for the still-low average utilisation figures in the data centre.

Some classic mitigations for this (excluding cloud) are to distribute the load more evenly across time and resources e.g.:

  • run maintenance tasks such as index rebuilds, backups, consistency checks during the quiet periods
  • build decision support data structures overnight and prepare the most common or resource-intensive reports before people turn up to work
  • make the application more asynchronous and use queues to defer processing to quieter periods or on to other resources
  • use virtualisation to consolidate VMs on to fewer physical hosts during quiet periods so you can power off hosts (those hosts are now idle assets returning no business value, but at least they aren’t consuming power/cooling)
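The third mitigation – using queues to defer processing to quieter periods – can be sketched in a few lines. This is an illustrative toy with an in-process queue and made-up function names; in practice the queue would be something like SQS or Azure Storage queues, and the drain step a scheduled worker.

```python
import queue

work = queue.Queue()

def handle_request(order_id):
    """Peak-time path: record the minimum and return immediately."""
    work.put(order_id)   # defer the expensive part to a quieter period
    return "accepted"

def process(order_id):
    # Stand-in for the expensive deferred work (emails, reports, etc.)
    return f"processed {order_id}"

def drain(batch=10):
    """Quiet-period path: work through the deferred backlog."""
    done = []
    while not work.empty() and len(done) < batch:
        done.append(process(work.get()))
    return done

for i in range(3):
    handle_request(i)          # cheap during the spike
print(drain())                 # ['processed 0', 'processed 1', 'processed 2']
```

The peak-time path does the bare minimum, so the capacity needed to service the spike shrinks toward the cost of an enqueue rather than the cost of the full transaction.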

Despite all this, many applications continue to exhibit spiky behaviour – just not in the extreme headline sense that we see on

In a cloud assessment you just want to identify this behaviour as one reason (among many) to put that application on your list of candidates for further study and business justification. In a cloud migration some applications may be immune from the bulleted options above for a number of reasons. Anyway, those techniques will work nicely in the cloud too of course.



Cloud costs are an engineering problem

Head over to the new Google App Engine pricing, which will come into effect when App Engine comes out of preview later in the year, and you see a list of prices similar, in format at least, to pricing for AWS, Azure and other cloud providers. That seems fairly straightforward until you look at the FAQ that describes the pricing in more detail; while it answers a lot of questions, its explanations give rise to even more.

It seems that Google is switching over to an instance-based pricing model from a CPU-based one, but there are differences between frameworks – Java handles concurrent requests while Python and Go do not (yet). In addition, the FAQ makes observations about how the change in pricing will affect current apps that are memory-heavy because they were designed to optimise for the CPU-based pricing, and these may land up being more expensive under the new model. Then there are reserved instances, API charges, bandwidth, premier accounts and a whole lot of other considerations to add to the confusion. Even if you are not interested in App Engine, it is a worthwhile read.

I have done and seen a few spreadsheets that try to work out hosting costs for cloud computing, and they reach a point of complexity, with so many unknowns, that it becomes very difficult to go to the business with a definitive statement on how much it will cost to run an application. This is particularly difficult when development hasn’t even started yet, so there is no indication of the architectural choices (say memory over CPU) that affect the estimates. While AWS may be easier in some sense because the instance is a familiar unit (a machine yay big that we put stuff on), there are still many considerations that affect the cost of hosting. Grace and I struggled with a particular piece of SOLR availability and avoided using a load balancer for internal traffic, until we ran the numbers, worked out that it would cost pennies per day in bandwidth costs, and decided to use ELB after all – and that is one of the simpler pricing-driven architectural decisions. Trying to build a scalable architecture out of loosely coupled components that makes optimal use of the resources available is very difficult to do.
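The ELB decision above is the kind of arithmetic involved. A back-of-envelope sketch, in which the traffic volume and both rates are made-up placeholders rather than AWS's actual prices:

```python
# Hypothetical figures -- substitute real traffic and current pricing.
internal_traffic_gb_per_day = 2.0   # assumed internal SOLR traffic, GB/day
price_per_gb = 0.01                 # $/GB processed (placeholder rate)
hourly_rate = 0.025                 # $/hour per load balancer (placeholder)

daily_cost = internal_traffic_gb_per_day * price_per_gb + 24 * hourly_rate
print(f"${daily_cost:.2f} per day")   # well under a dollar a day
```

The arithmetic is trivial; the difficulty the post describes is knowing which of the dozens of such line items dominate for a given architecture, and how they shift as the design changes.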

We could ask vendors for better or more flexible pricing models. We could have estimating tools that allow us to estimate costs based on a choice of ‘similar’ application models. We could trade SLAs for cost, as S3 reduced redundancy storage does. We could hedge our costs using reserved instances. We could run simulations (given on-demand availability, this is relatively easy). We could have better tools to analyse our bills (as Quest has for Azure). We need all of this, but ultimately the pricing of cloud computing is going to remain complex, and will increase in complexity in future, leaving the big decisions up to the technical people doing the implementation.

Cloud expertise needs to extend beyond knowing your IaaS from your SaaS and experts need to have a handle on all aspects of cloud computing architectures, for a specific platform, in order to realise the benefits that cloud computing promises. In the context of developers being the new kingmakers, it is developers, software architects and DevOps that are the only ones close enough to the metal to make the decisions that ultimately affect the cost. Where currently developers optimise at the cost of development time (which is largely discouraged), we may want developers to optimise CPU against memory against latency against bandwidth against engineering effort, and even, at a push, against environmental friendliness in future. Let’s not even get into having to adapt to providers changing pricing models periodically. It is going to take some serious skill to pull that together – from the entire team.

So while the cloud computing marketers make it sound easy to put our apps onto the cloud there is a long road ahead in developing the necessary skills to ensure that it is done optimally and at a cost that is reasonable across the life of the application. There are business cases that could collapse under spiralling cloud costs if we pull one lever incorrectly.

Simon Munro



