One of the most popular posts on CloudComments is the year-old Amazon Web Services is not IaaS. It is popular mainly because people search for AWS IaaS and it comes up first. This does illustrate how pervasive the IaaS/PaaS/SaaS taxonomy is, despite its lack of a clear and agreed definition: people are, after all, searching for AWS in the context of IaaS.
Amazon is continually referred to by analysts and the media as IaaS, yet it has avoided classifying itself as ‘just’ IaaS and specifically avoids being placed in the PaaS box. This is understandable. Many platforms that identify themselves as PaaS, such as Heroku, run on AWS, and the implied competition with its own customers is best avoided. As covered by ZDNet earlier this year,
“We want 1,000 platforms to bloom,” said Vogels, echoing comments he made at Cloud Connect in March, before explaining Amazon has “no desire to go and really build a [PaaS].”
(which sort of avoids directly talking about AWS as PaaS).
I have no affiliation with the analysts, standards organisations and ‘leaders’ who spend their days putting various bits of the cloud in neat little boxes. I have no influence over the generally accepted definitions of IaaS or PaaS, and no desire to influence them. They are, after all, meaningless and tiresome. But the market is still led by these definitions, and people who are still trying to figure things out need to understand AWS’s position within them.
To avoid running foul of some or other specific definition of PaaS, I’ll go ahead and call AWS PaaS v.Next. This (hopefully) implies that AWS is the definition of what PaaS needs to be. Because of its rapid innovation, it is also the one to look at for what PaaS will become. Some of my observations:
- AWS is releasing services that are necessary for a good application platform, and that nobody else seems to have (or seems to be building). Amazon DynamoDB and Amazon CloudSearch are examples. They are definitely not traditional infrastructure, but they are fundamental building blocks of modern web applications.
- AWS CloudFormation is the closest thing to a traditional PaaS application stack and although it has some gaps, they continue to innovate and add to the product.
- Surely it is possible to build an application platform using another application platform? Amazon Web Services provides services (the clue is in the ‘Web Services’ part of the name) that are loosely coupled and REST-based. In the context of modern application architectures, they fit in perfectly well with whatever you want to build on them. That doesn’t make them infrastructure, because there is no abstraction from tin. It makes them platform services that are engineered into the rest of the application. Heroku, for example, is a type of PaaS running on the AWS application platform, and it will (or should) embrace services such as DynamoDB and CloudSearch. Architecturally, I see no problem with that.
- The recent alignment of Eucalyptus and CloudStack with the AWS API indicates that AWS all but owns the definition of cloud computing. For now, the API coverage of those cloud stacks supports mostly the infrastructure components. I would expect them to keep adopting the AWS API over time, for example as Eucalyptus adds a search engine. With the API, they would also adopt the AWS definition of what makes a platform.
What of the other major PaaS players (as put into neat little boxes), such as Windows Azure and Google App Engine? Well, it is obvious that they are lagging, and that for now they are happy (relieved?) that AWS is not trying to call itself PaaS. But AWS is adding services at such a rapid rate that those platforms look less and less attractive. Azure has distinct advantages as a purer PaaS, such as how it handles deployments and upgrades, and it has a far better RDBMS in SQL Azure. But how do application developers on Azure do something simple like search? You would think that the people who built Bing would be able to rustle up some sort of search service. It is embarrassing for them that AWS built a search application platform first. (The answer to the question, by the way, is ‘Not easily’: Azure developers have to mess around with running Solr on Java in Azure.) How many really useful platform services does AWS have to release before Microsoft realises that AWS has completely pwned their PaaS lunch?
I don’t know what the next platform service from AWS will be, but I do know three things about it. Firstly, it will be soon. Secondly, it will be really useful. Lastly, it won’t even be on its competitors’ product roadmaps. There is still a lot to be done on AWS, and its services to application developers have many shortcomings. Even so, to me it is clear that AWS is taking the lead as a provider of application platform services in the cloud. It is leading what PaaS is evolving into, which I’ll just call PaaS v.Next.
The news that AWS is partnering with Eucalyptus is interesting, but not for the reasons you would expect. The partnership is meant to provide some sort of API-compatible migration between private clouds and AWS. Yes, at some level it is interesting that AWS is acknowledging private-public cloud portability. It is also somewhat interesting that Eucalyptus providers now have an extra arrow in their quiver. But all of that will be minor in the bigger AWS scheme of things anyway. After all, such partnerships seldom amount to much. As @swardley asks, “Is ‘AWS partnering with Eucalyptus = MS partnering with Novell’ a sensible analogy or the argument of those hanging on by their fingernails?” Still, it is a good move by Eucalyptus.
What is interesting is the API compatibility. Eucalyptus is AWS API compatible and OpenStack is not. The OpenStack community has been arguing for months about whether it should make its API compatible with AWS. I haven’t followed the argument in detail (yawn), but I think they are still umming and ahing over AWS API compatibility. Feel free to correct me in the comments if necessary. For Mark Shuttleworth’s opinion on the matter (as of September 2011), have a read through his Innovation and OpenStack: Lessons from HTTP.
One of the questions about API compatibility is whether AWS would get upset. It seems that the Eucalyptus agreement has given Eucalyptus explicit rights to use the AWS API. The legal rights around using the same API may be grey, but surely only the original authors can grant the right to brag about it? This bragging right is going to give Eucalyptus a lot of credibility and a head start over OpenStack.
What about CloudFoundry, OpenShift and other cloud platforms? I have always avoided trying to define AWS using the IaaS/PaaS/SaaS taxonomy, or any other cloud taxonomy (see Amazon Web Services is not IaaS). The reason is quite simple. AWS is pretty much the definition of cloud computing, and all definitions have to bow down to AWS’s dominance. After all, what’s the point of drawing little definition boxes if the gorilla doesn’t fit comfortably into any of them?
So what is really interesting about the Eucalyptus announcement is that it lends credibility to AWS as the definition of cloud computing (not just the market leader or early adopter). Using AWS as the definition and getting rid of all of the IaaS/PaaS crap makes it pretty easy for AWS to talk to the enterprise – far more than talking on-prem does.
As a side note, Microsoft seriously needs API compatibility between Windows Azure and on-prem Windows. Otherwise AWS is going to be having interesting conversations with Microsoft’s enterprise customers. (Considering Microsoft’s enterprise heritage, I am at a complete loss to explain why this is still the case after more than two years.)
Getting Real About Distributed System Reliability, by Jay Kreps, is an interesting post about a common perception. Many people believe that distributed systems (and distributed databases) increase reliability because they are horizontally scalable. The flaw in that reasoning, he points out, ‘is the assumption that failures are independent’.
He observes that failures tend to occur because of bugs in the software (or in the homogeneous infrastructure). Adding redundant nodes does not decrease the likelihood of failure much. We see this continuously with cloud outages; the recent leap day bug that crashed Windows Azure is a good example.
I have been doing some work on availability recently. My first availability influencer is quality, followed by fault tolerance (resilience). Redundancy is relevant at the hardware level, and it matters more for scalability than for availability. So yes, to achieve availability you need quality first, then resilience, with redundancy near the bottom of the list.
I have also been doing work on cloud operations. I was intrigued to see that his post identifies operations, not architecture or design, as the core difficulty. I think he downplays architecture. Still, the ability to operate a complex (distributed) system is a big part of keeping it running. He singles out AWS’s DynamoDB:
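A small Python sketch makes the independence point concrete. It compares two cases. In the first, node failures are independent, so every replica must fail on its own. In the second, a shared failure (such as a bug in software that every node runs) can take all replicas down at once. The 1% per-node and 0.1% common-mode probabilities are made-up figures for illustration, not measurements.

```python
def p_outage_independent(p_node: float, n_nodes: int) -> float:
    """Chance that all replicas are down at once if node failures are independent."""
    return p_node ** n_nodes


def p_outage_common_mode(p_node: float, n_nodes: int, p_common: float) -> float:
    """Adds a shared (common-mode) failure, e.g. a bug in software every node runs,
    that takes all replicas down together with probability p_common."""
    # Outage if the shared failure happens, or if it doesn't and every node fails independently.
    return p_common + (1 - p_common) * p_node ** n_nodes


# Illustrative numbers only: 1% chance a node is down, 0.1% chance of a shared bug.
for n in (3, 10):
    print(n, p_outage_independent(0.01, n), p_outage_common_mode(0.01, n, 0.001))
```

Once the common-mode failure is in the model, going from 3 to 10 nodes barely changes the outage probability. The shared bug dominates, which is the point Kreps makes.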
This is why people should be excited about things like Amazon’s DynamoDB. When DynamoDB was released, the company DataStax that supports and leads development on Cassandra released a feature comparison checklist. The checklist was unfair in many ways (as these kinds of vendor comparisons usually are), but the biggest thing missing in the comparison is that you don’t run DynamoDB, Amazon does. That is a huge, huge difference. Amazon is good at this stuff, and has shown that they can (usually) support massively multi-tenant operations with reasonable SLAs, in practice.
I tend to agree with that. Rolling your own available platform is going to be hard. Providers of cloud services, such as Amazon or Microsoft, have more mature operational processes to keep things available. This also casts a shadow over self-operated cloud platforms (such as CloudFoundry). They have all of the bugs and none of the operational chops needed to keep availability high.
Go and read Jay’s post. It is required reading for people building cloud applications.
Often, in cloud computing, we talk about availability. After all, we use the cloud to build high availability applications, right? Yet when pressed to explain exactly what availability means, people seem to be stuck for an answer. “A system that is not ‘down’, er, I mean ‘up’” is a very common answer, but it is not good enough. So I had a crack at my own definition, starting off by describing availability outcomes and influencers. Have a look at Defining application availability and let me know what you think.
AWS doesn’t want cloud bursting, where customers come to the platform and retreat to their on-premise infrastructure when the work is done. AWS is keen to get you to commit to longer term reserved instances and is adjusting their pricing accordingly.
There is nothing like a price drop in Jeff Barr’s midnight AWS announcements to get everybody excited in the morning. What interests me is not the price drop per se, nor the inevitable demand for competitors to follow, nor the back and forth comparison with traditional managed hosting that will get underway (again). What is interesting is the increasing differential between the price drop for on demand versus reserved instances.
As James has pointed out, reserved instance pricing is important when developing a cost model for cloud applications over time. Gathering the data and doing the analysis can get a bit tricky, though. The EC2 reserved instance pricing page reckons that reserved instances will save between 50% and 70% of your EC2 costs, which is a compelling proposition. RDS offers similar savings for reserved databases.
Anecdotal evidence (read “I may or may not have heard it somewhere”) suggests that a lot of AWS business is for cloud bursting. In other words, AWS is used occasionally rather than as the primary platform. It would also seem that ‘occasionally’ refers to development and test capacity. It does not seem to refer to architectures engineered to use AWS as a true cloud bursting platform for production systems. By creating a huge differential between on demand and reserved instances, AWS may be offering a teaser to convert those ‘cloud burst’ uses into long-term deployments. After all, on demand must already be cheap enough to use, otherwise customers would build their own. Customers know that the service works (technically). Dropping its price by an additional 50% may push the decision makers in favour of AWS. But those savings only come from long-term commitments (3 years). That is enough time to ensure that the particular application is committed to AWS, with others to follow.
While reductions in cloud costs are welcome, the fundamental problem remains: figuring out costs, cost benefits and ‘in the box’ comparisons is still difficult. I have discussed this as an engineering problem, and it requires some tricky analysis. People always wade dangerously into the debate, as Jeff Barr did recently. Pricing is still far too complex, and the models too immature, to convince the CFO or whoever controls the budget. Private cloud advocates have been doing pricing for a while, and they can almost always bulldoze a public cloud pricing model. Rather than a few pennies of EC2 savings, I would like to see rich, capable pricing tools and models emerging.
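One rough way to reason about the on demand versus reserved differential is break-even utilisation. This is the share of the term that an instance must actually run before a reservation becomes cheaper than on demand. The Python sketch below assumes a heavy-utilisation style reservation, which is billed for every hour of the term whether or not the instance runs. The prices in the example are hypothetical placeholders, not current AWS rates.

```python
HOURS_PER_YEAR = 8760


def effective_reserved_hourly(upfront: float, reserved_hourly: float, term_years: int) -> float:
    """Upfront fee spread over every hour of the term, plus the reserved hourly rate.
    Assumes the reservation is billed for every hour, whether or not the instance runs."""
    return upfront / (term_years * HOURS_PER_YEAR) + reserved_hourly


def break_even_utilisation(on_demand_hourly: float, upfront: float,
                           reserved_hourly: float, term_years: int) -> float:
    """Fraction of the term the instance must run before reserving beats on demand."""
    return effective_reserved_hourly(upfront, reserved_hourly, term_years) / on_demand_hourly


def saving_vs_on_demand(utilisation: float, on_demand_hourly: float, upfront: float,
                        reserved_hourly: float, term_years: int) -> float:
    """Fractional saving of the reservation over paying on demand for the same usage."""
    reserved = effective_reserved_hourly(upfront, reserved_hourly, term_years)
    return 1 - reserved / (utilisation * on_demand_hourly)


# Hypothetical prices: $0.08/h on demand vs $300 upfront + $0.012/h for 3 years.
print(break_even_utilisation(0.08, 300.0, 0.012, 3))
print(saving_vs_on_demand(1.0, 0.08, 300.0, 0.012, 3))
```

With these made-up numbers, the saving at full utilisation is about 71%. Below roughly 29% utilisation, though, the reservation costs more than on demand. That is why the three-year commitment matters so much to the comparison.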
One of the key concepts in scalability is allowing service to degrade when an application is under load. But service degradation can be difficult to explain (and to relate back to the term), and ‘degrade’ has negative connotations.
The networking people overcame the bad press of degradation by calling it ‘traffic shaping’ or ‘packet shaping’. We see traffic shaping at the edge of the network on our home broadband connections. It gives some data packets (such as online gaming) a lower priority than others (such as web browsing). The idea is that a saturated network can handle the load by changing the profile, or shape, of priority traffic. Key to traffic shaping is that most users don’t notice it happening.
Along similar lines, I am starting to talk about feature shaping. This is the ability of an application under load to shape which features get priority. Alternatively, it can shape a result into one that costs fewer resources to produce. This is best explained by examples.
- A popular post on High Scalability talked about how Farmville degraded services when under load by dropping some of the in game features that required a lot of back end processing — shaping the richness of in-game functionality.
- Email confirmations can be delayed to reduce load. The deferred load can be either the generation of the email itself or the result of sending it.
- Encoding of videos on Facebook is not immediate and is shaped by the capacity that is available for encoding. During peak usage, the feature will take longer.
- A different search index that produces less accurate results, but for a lower cost, may be used during heavy load — shaping the search result.
- Real-time analytics for personalised in-page advertising can be switched off when under load — shaping the adverts to those that are more general.
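Here is a minimal sketch of the email example. It assumes a hypothetical load signal between 0 and 1, plus a `send` callable supplied by the application. Both are illustrative assumptions, not a real library API. While load is above a threshold, confirmations are queued instead of sent, and the queue is drained once capacity frees up.

```python
from collections import deque
from typing import Callable


class DeferrableConfirmations:
    """Sends confirmation emails immediately, or defers them while the system is under load."""

    def __init__(self, send: Callable[[str], None], load_threshold: float = 0.8):
        self.send = send                      # hypothetical: whatever actually sends the email
        self.load_threshold = load_threshold  # load (0..1) at which sending is deferred
        self.deferred = deque()               # addresses waiting for spare capacity

    def confirm(self, address: str, current_load: float) -> str:
        if current_load >= self.load_threshold:
            self.deferred.append(address)     # shaped: defer the work rather than drop it
            return "deferred"
        self.send(address)
        return "sent"

    def drain(self, current_load: float, max_batch: int = 100) -> int:
        """Send up to max_batch deferred emails if load has dropped below the threshold."""
        if current_load >= self.load_threshold:
            return 0
        sent = 0
        while self.deferred and sent < max_batch:
            self.send(self.deferred.popleft())
            sent += 1
        return sent


outbox = []
confirmations = DeferrableConfirmations(outbox.append)
confirmations.confirm("a@example.com", current_load=0.5)  # sent straight away
confirmations.confirm("b@example.com", current_load=0.9)  # deferred
confirmations.drain(current_load=0.3)                     # sends b@example.com
```

Nothing is lost here; the work is only pushed to a quieter moment. That is what separates shaping from failing.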
So, my quick definition of feature shaping is:
- Feature shaping allows some parts of an application to degrade their normal performance or accuracy service levels in response to load.
- Feature shaping is not fault tolerance — it is not a mechanism to cope when all hell breaks loose.
- Feature shaping is for exceptional behaviour, and features should not be shaped under normal conditions.
- Shaped features will generally be unnoticeable to most users. The application seems to behave as expected.
- Feature shaping can be automated or manual.
- Feature shaping can be applied differently to different sets of users at the same time (e.g. registered users don’t get features shaped).
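The definition above could be sketched as a small policy object. This is a hypothetical Python illustration, not an existing framework. Each feature has a load threshold at which it is shaped automatically. Manual overrides let operators shape (or unshape) a feature by hand. Some user groups, such as registered users, are exempt.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureShaper:
    """Hypothetical feature-shaping policy; names and thresholds are illustrative."""

    # feature name -> load (0..1) at or above which the feature is automatically shaped
    thresholds: dict[str, float]
    # manual control: True forces a feature to be shaped, False forces normal service
    manual_overrides: dict[str, bool] = field(default_factory=dict)
    # user groups whose features are never shaped
    exempt_groups: frozenset[str] = frozenset({"registered"})

    def is_shaped(self, feature: str, load: float, user_group: str) -> bool:
        if user_group in self.exempt_groups:
            return False                           # exempt users always get the full feature
        if feature in self.manual_overrides:
            return self.manual_overrides[feature]  # manual shaping wins over automatic
        return load >= self.thresholds[feature]    # automatic shaping only under load


shaper = FeatureShaper(thresholds={"realtime_ads": 0.7, "full_search_index": 0.85})
print(shaper.is_shaped("realtime_ads", load=0.75, user_group="anonymous"))
print(shaper.is_shaped("realtime_ads", load=0.75, user_group="registered"))
```

Staggering the thresholds means the features that are cheapest to lose (here, real-time ads) are shaped first, before anything most users would notice.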
So, does the terminology of feature shaping make sense to you?
The question: should you use S3 to host a high volume static web site? Or should you configure and operate a load balanced, auto-scaling EC2 Multi-AZ cluster? My assumptions:
- Ignore cost of storage. You get loads bundled on EC2 and a couple of GB is only a few cents on S3. This is not true for big media streaming of course.
- Bandwidth costs are the same for S3 and EC2 and can be excluded from the comparison.
- Management costs are 1:1 with VM costs. This in turn assumes existing management infrastructure and people are in place and this website is an incremental requirement.
- EC2 will require two instances running, ideally one in each Availability Zone, to achieve a vaguely similar availability target to S3.
- A home page could require circa 100 GET requests (perhaps overdoing it a little bit)
- A UK only web site may only be truly busy for 12 hours per day.
- The cost of a “Heavy Utilisation 1 Year Reserved Instance Small Linux Instance in EU” is $45.65 per month. Two instances: $91.30 per month. Total managed cost, doubling for management at 1:1: $182.60, call it $200 per month.
You would have to make 200,000,000 GET requests per month to reach $200 on S3. At 100 GETs per page, that is 2,000,000 page views per month. Over only the 12 busy hours per day, that works out at about 91 pages per minute, or roughly 1.5 pages (150 GET requests) per second. This is a small load to share between two web servers serving only static content. Surely it is within the reach of two small Linux instances. In fact, shouldn’t they serve 10x that volume for the same price?
S3 sites are incredibly simple to set up, with high availability, scalability and performance baked in. So you can’t possibly justify building EC2-based web servers at low page volumes. However, most of the cost of S3 comes down to GET requests, and there are no price/volume breaks in GET request pricing. The costs scale linearly with request volume, and much faster than they would if you built your own EC2 fleet.
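The arithmetic above can be checked with a few lines of Python. The post does not state the GET price. The $0.01 per 10,000 requests used here is the rate implied by $200 buying 200,000,000 GETs, so substitute current S3 pricing before relying on the numbers.

```python
GET_PRICE_PER_10K = 0.01      # assumption: the rate implied by the post's figures
GETS_PER_PAGE = 100           # from the post: a heavy home page
BUSY_HOURS_PER_DAY = 12       # from the post: a UK-only site
DAYS_PER_MONTH = 365 / 12


def ec2_managed_monthly(instance_monthly: float = 45.65, instances: int = 2,
                        management_ratio: float = 1.0) -> float:
    """Reserved instance cost plus management cost at the given ratio (1:1 in the post)."""
    return instance_monthly * instances * (1 + management_ratio)


def s3_break_even(monthly_budget: float) -> tuple[float, float, float]:
    """GETs, page views and busy-hour pages per second that the budget buys on S3."""
    gets = monthly_budget / GET_PRICE_PER_10K * 10_000
    page_views = gets / GETS_PER_PAGE
    busy_seconds = DAYS_PER_MONTH * BUSY_HOURS_PER_DAY * 3600
    return gets, page_views, page_views / busy_seconds


print(ec2_managed_monthly())            # before rounding up to $200
gets, pages, pages_per_sec = s3_break_even(200.0)
print(gets, pages, pages_per_sec * 60)  # GETs, page views, pages per minute
```

Using the unrounded $182.60 instead of $200 would lower the break-even to roughly 183 million GETs a month.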
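To illustrate how differently the two options scale, the sketch below compares a linear S3 cost with a stepped EC2 cost. On EC2, instances are only added when capacity runs out. The per-instance capacity of 150 pages per second is a hypothetical figure chosen for illustration. Sizing the fleet from the busy-hour average also ignores real peaks, so measure your own servers before drawing conclusions.

```python
import math

GET_PRICE_PER_10K = 0.01               # assumption: rate implied by the post's figures
GETS_PER_PAGE = 100
MANAGED_INSTANCE_MONTHLY = 45.65 * 2   # reserved small instance plus 1:1 management
PAGES_PER_SEC_PER_INSTANCE = 150.0     # hypothetical capacity of one small instance
BUSY_SECONDS_PER_MONTH = (365 / 12) * 12 * 3600


def s3_monthly_cost(page_views: float) -> float:
    """S3 cost grows linearly with requests: no volume breaks on GETs."""
    return page_views * GETS_PER_PAGE / 10_000 * GET_PRICE_PER_10K


def ec2_monthly_cost(page_views: float, min_instances: int = 2) -> float:
    """EC2 cost steps up only when another instance is needed (sized on the busy-hour average)."""
    pages_per_sec = page_views / BUSY_SECONDS_PER_MONTH
    instances = max(min_instances, math.ceil(pages_per_sec / PAGES_PER_SEC_PER_INSTANCE))
    return instances * MANAGED_INSTANCE_MONTHLY


for views in (200_000, 2_000_000, 20_000_000, 200_000_000):
    print(f"{views:>11,} page views: S3 ${s3_monthly_cost(views):>9,.2f}  "
          f"EC2 ${ec2_monthly_cost(views):>9,.2f}")
```

Under these assumptions, S3 is cheaper at the lowest volume. As traffic grows, the S3 cost keeps climbing with every request, while the EC2 fleet stays flat until it needs another instance.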
If you don’t already have a management function in place for EC2 then the level of scale needed to justify this expense would be considerably higher. The big benefit of S3 sites is in the “static web site as a service” element – i.e. a highly scalable, available, high performance, simple and managed environment.
The linear relationship between scale and costs is a disadvantage in one way, but it could also be seen as an advantage. An S3 site scales from zero to massive and back again instantly. It can lie dormant for days and then fend off a burst of heavy media attention.
However, I was surprised to see that this wasn’t as cut and dried in favour of S3 as I’d assumed or hoped.