Archive for category Amazon Web Services
One of the primary influencers on cloud application architectures is the lack of high performance infrastructure — particularly infrastructure that satisfies the I/O demands of databases. Databases running on public cloud infrastructure have never had access to the custom-build high I/O infrastructure of their on-premise counterparts. This had led to the well known idea that “SQL doesn’t scale” and the rise of distributed databases has been on the back of the performance bottleneck of SQL. Ask any Oracle sales rep and they will tell you that SQL scales very well and will point to an impressive list of references. The truth about SQL scalability is that it should rather be worded as ‘SQL doesn’t scale on commodity infrastructure’. There are enough stories on poor and unreliable performance of EBS backed EC2 instances to lend credibility to that statement.
Given high performance infrastructure, dedicated network backbones, Fusion-IO cards on the bus, silly amounts of RAM, and other tweaks, SQL databases will run very well for most needs. The desire for running databases on commodity hardware comes largely down to cost (with influence of availability). Why run your database on hardware that costs a million dollars, licences that cost about the same and support agreements that cost even more, when you can run it on commodity hardware, with open-source software for a fraction of the cost?
That’s all very fine and well until high performance becomes commodity. When high performance becomes commodity then cloud architectures can, and should, adapt. High performance services such as DynamoDB do change things, but such proprietary APIs won’t be universally accepted. The AWS announcement of the new High I/O EC2 Instance Type, which deals specifically with I/O performance by having 10Gb ethernet and SSD backed storage, makes high(er) performance I/O commodity.
How this impacts cloud application architectures will depend on the markets that use it. AWS talks specifically about the instances being ‘an exceptionally good host for NoSQL databases such as Cassandra and MongoDB’. That may be true, but there are not many applications that need that kind of performance on their distributed NoSQL databases — most run fine (for now) on the existing definition of commodity. I’m more interested to see how this matches up with AWSs enterprise play. When migrating to the cloud, enterprises need good I/O to run their SQL databases (and other legacy software) and these instances at least make it possible to get closer to what is possible in on-premise data centres for commodity prices. That, in turn, makes them ripe for accepting more of the cloud into their architectures.
The immediate architectural significance is small, after all, good cloud architects have assumed that better stuff would become commodity (@swardley’s kittens keep shouting that out), so the idea of being able to do more with less is built in to existing approaches. The medium term market impact will be higher. IaaS competitors will be forced to bring their own high performance I/O plans forward as people start running benchmarks. Existing co-lo hosters are going to see one of their last competitive bastions (offering hand assembled high performance infrastructure) broken and will struggle to differentiate themselves from the competition.
Down with latency! Up with IOPS! Bring on commodity performance!
The recent outage suffered at Amazon Web Services due to the failure of something-or-other caused by storms in Virginia has created yet another round of discussions about availability in the public cloud.
Update: The report from AWS on the cause and ramifications of the outage is here.
While there has been some of the usual commentary about how this outage reminds us of the risks of public cloud computing, there have been many articles and posts on how AWS customers are simply doing it wrong. The general consensus is that those applications that were down were architected incorrectly and should have been built with geographic redundancy in mind. I fully agree with that as a principle of cloud based architectures and posted as much last year when there was another outage (and also when it was less fashionable to blame the customers).
Yes, you should build for better geographic redundancy if you need higher availability, but the availability of AWS is, quite frankly not acceptable. The AWS SLA promises 99.95% uptime on EC2 and although they may technically be reaching that or giving measly 10% credits, anecdotally I don’t believe that AWS is getting near that in US-East. 99.95% translates to 4.38 hours a year or 22 minutes a month and I don’t believe that they are matching those targets. (If someone from AWS can provide a link with actual figures, I’ll gladly update this post to reflect as much). Using the x-nines measure of availability is all that we have, even if it is a bit meaningless, and by business measures of availability (application must be available when needed) AWS availability falls far short of expectations.
I am all for using geographic replication/redundancy/resilience when you want to build an architecture that pushes 100% on lower availability infrastructure, but it should not be required to overcome infrastructure that has outages for a couple of hours every few weeks or months. While individual AWS fans are defending AWS and pointing fingers at architectures that are not geographically distributed is going to happen, an article on ZDNet calling AWS customers ‘cheapskates’ is a bit unfair to customers. If AWS can’t keep a data centre running when there is a power failure in an area, and can’t remember to keep the generator filled with diesel (or whatever), blaming customers for single building single zone architectures isn’t the answer.
Yes, I know that there are availability zones and applications that spanned availability zones may not have been affected, but building an application where data is distributed across multiple AZs is not trivial either. Also, it seems that quite frequently an outage in one AZ has an impact on the other (overloads the EBS control plane, insufficient capacity on healthy AZ etc), so the multiple AZ approach is a little bit risky too.
Us application developers and architects get that running a highly available data centre is hard, but so is building a geographically distributed application. So we are expected to build these complicated architectures because the infrastructure is less stable than expected? Why should we (and the customers paying us) take on the extra effort and cost just because AWS is unreliable? How about this for an idea — fix AWS? Tear down US East and start again… or something. How is AWS making it easier to build geographically distributed applications? No, white papers aren’t good enough. If you want your customers to wallpaper over AWS cracks, make services available that make geographic distribution easier (data synchronisation services, cross-region health monitoring and autoscaling, pub-sub messaging services, lower data egress costs to AWS data centres).
Regardless of how customers may feel, if you Google ‘AWS outage’ you get way, way to many results in the search. This isn’t good for anybody. It isn’t good for people like me who are fans of the public cloud, it isn’t good for AWS obviously, and it isn’t even good for AWS competitors (who are seen as inferior to AWS). If I see another AWS outage in the next few months, in any region, for any reason I will be seriously fucking pissed off.
When Amazon announced RDS for SQL Server and .NET support for Elastic Beanstalk, the response over the next few hours and days was a gushy ‘AWS cosies up to .NET developers’ or something similar. My first thought upon reading the news was “Man, some people on the Azure team must be really, really pissed at the SQL Server team for letting SQL Server on to AWS”. It’s not that AWS is not a good place for .NET people to cosy up to, and some AWS people are very cosy indeed (except for one, who’s been avoiding me for a year), but .NET people getting friendly with AWS people is bad for Azure. While it is great for .NET developers, the problem for Microsoft is that SQL RDS erodes the primary competitive advantage of Windows Azure.
AWS has been a long time supporter of the Windows and .NET ecosystem but the missing part was the lack of a good story around SQL Server. Sure, you have always been able to roll your own SQL instance, but keeping it available and running is a pain. What was lacking, until this week, was a SQL Server database service that negated the need to muck around by yourself. What was needed was a service provided by AWS that you could just click on to enable. Not only does AWS now support SQL (although not 2012 yet) it seems to superficially offer a better SQL than Microsoft does on SQL Azure. I personally think that SQL Azure is a better product and has been developed, from the ground up, specifically for a cloud environment, but that process has left it somewhat incompatible with on-premise SQL Server. AWSs RDS SQL is plain ‘ol SQL Server that everyone is familiar with, with databases bigger than 150GB, backups, performance counters and other things that are lacking in SQL Azure. While the discerning engineer may understand the subtle edge that SQL Azure has over RDS SQL, it will be completely lost on the decision makers.
AWS has recently been making feints into the enterprise market, a stalwart of more established vendors, including Microsoft. And, if AWS want to present a serious proposition to enterprise customers, they have to present a good Windows/.NET story without gaps — and it seems that they are beginning to fill in those gaps. It is particularly interesting and compelling for larger enterprises where there is a mish-mash of varied platforms, as there inevitably are in large organisations, where one cloud provider is able to take care of them all.
Windows Azure has Windows/.NET customer support at the core of its value proposition and SQL Azure is a big part of that. If you have a need for SQL Server functionality, why go to anyone other than a big brand that offers it as part of their core services (and I mean ‘service’, not just ‘ability to host or run’)? Windows Azure was that big brand offering that service, where the customer would choose it by default because of SQL support. Well, now there is another big brand with a compelling offering.
Microsoft obviously can’t go around refusing licenses for their software, and for a business that for decades has had ‘sell as many licenses as possible’ as their most basic cheerleader chant, it is virtually impossible to not sell licenses. The models for the new world of cloud computing clash right here with the old business models that Microsoft is struggling to adapt. For an organisation that is ‘all in’ on the cloud, the only ‘all in’ part of the messages that I am getting is that Microsoft wants to sell as many licenses of their products to cloud providers as possible — putting Windows Azure in a very awkward position. If it was me in the big Microsoft chair, I would have fought SQL RDS as long as possible — but hey, I’m not a highly influential sweaty billionaire, so my opinion doesn’t count and won’t make me a sweaty billionaire either.
The competitor to Windows Azure is not AWS, or AppEngine or any other cloud provider — the competitor is Windows Server, SQL Server and all the on-premise technologies that their customers are familiar with. I’m sure that Microsoft desperately wanted to get SQL onto RDS and helped as much as they could because that is what their customers were asking for (Microsoft is apparently quite big on listening to customers). I can’t help thinking that every time Microsofties went over for a meeting at the Amazon office to hammer out the details, the Azure team was left clueless in Redmond and the Amazon staff were chuckling behind their backs.
How does Microsoft reconcile their support for Windows Azure and their support for their existing customers and business models? How do they work with AWS as one of their biggest partners and competitors? While Microsoft struggles with these sorts of questions and tries to decide where to point the ship, Amazon will take whatever money it can off the table, thank you very much.
One of the most popular posts on CloudComments is the year old Amazon Web Services is not IaaS, mainly because people search for AWS IaaS and it comes up first. It does illustrate the pervasiveness of the IaaS/PaaS/SaaS taxonomy despite it’s lack of clear and agreed definition — people are, after all, searching for AWS in the context of IaaS.
Amazon, despite being continually referred to by analysts and the media as IaaS has avoided classifying itself as ‘just’ IaaS and specifically avoids trying to be placed in the PaaS box. This is understandable as many platforms that identify themselves as PaaS, such as Heroku, run on AWS and the inferred competition with their own customers is best avoided. As covered by ZDnet earlier this year,
“We want 1,000 platforms to bloom,” said Vogels, echoing comments he made at Cloud Connect in March, before explaining Amazon has “no desire to go and really build a [PaaS].”
(which sort of avoids directly talking about AWS as PaaS).
As an individual with no affiliation with analysts, standards organisations and ‘leaders’ who spend their days putting various bits of the cloud in neat little boxes, I have no influence (or desire to influence) the generally accepted definition of IaaS or PaaS. It is, after all, meaningless and tiresome, but the market is still led by these definitions and understanding AWSs position within these definitions is necessary for people still trying to figure things out.
To avoid running foul of some or other specific definition of what a PaaS is, I’ll go ahead and call AWS PaaS v.Next. This (hopefully) implies that AWS is the definition for what PaaS needs to be and, due to their rapid innovation, are the ones to look at for what it will become. Some of my observations,
- AWS is releasing services that are not only necessary for a good application platform, but nobody else seems to have (or seem to be building). Look at Amazon DynamoDB and Amazon CloudSearch for examples of services that are definitely not traditional infrastructure but are fundamental building blocks of modern web applications.
- AWS CloudFormation is the closest thing to a traditional PaaS application stack and although it has some gaps, they continue to innovate and add to the product.
- Surely it is possible to build an application platform using another application platform? Amazon Web Services (the clue being in the ‘Web Services’ part of the name) provides services that, in the context of modern application architectures, are loosely coupled, REST based and fit in perfectly well with whatever you want to build on it. It doesn’t make it infrastructure (there is no abstraction from tin), it makes it platform services which are engineered into the rest of the application. Heroku, for example, is a type of PaaS running on the AWS application platform and will/should embrace services such as DynamoDB and CloudSearch — architecturally I see no problem with that.
- The recent alignment of Eucalyptus and CloudStack to the AWS API indicates that AWS all but owns the definition of cloud computing. The API coverage that those cloud stacks have supports more of the infrastructure component for now and I would expect that over time (as say Eucalyptus adds a search engine) that they would continue to adopt the AWS API and therefore the AWS definition of what makes a platform.
What of the other major PaaS players (as put into neat little boxes) such as Windows Azure and Google App Engine? Well it is obvious that they are lagging and are happy (relieved?), for now, that AWS is not trying to call itself PaaS. But the services that are being added at such a rapid rate to AWS make them look like less and less attractive platforms. Azure has distinct advantages as a purer PaaS platform, such as how it handles deployments and upgrades, and Azure has a far better RDBMS in SQL Azure. But how do application developers on Azure do something simple like search? You would think that the people who built Bing would be able to rustle up some sort of search service — it is embarrassing to them that AWS built a search application platform first. (The answer to the question, by the way, is ‘Not easily’ — Azure developers have to mess around with running SOLR on Java in Azure). How many really useful platform services does AWS have to release before Microsoft realises that AWS has completely pwned their PaaS lunch?
I don’t know what the next platform service is that AWS will release, but I do know two three things about it. Firsty, it will be soon. Secondly it will be really useful, and lastly, it won’t even be in their competitors’ product roadmap. While there is still a lot to be done on AWS and many shortcomings in their services to application developers, to me it is clear that AWS is taking the lead as a provider of application platform services in the cloud. They are the leaders in what PaaS is evolving into — I’ll just call it PaaS v.Next.
The news that AWS is partnering with Eucalyptus to provide some sort of API compatible migration between private clouds and AWS is interesting. But not for the reasons you would expect. Yes, at some level it is interesting that AWS is acknowledging private-public cloud portability. It is also somewhat interesting that Eucalyptus providers now have an extra arrow in their quiver. But all of that will be minor in the bigger AWS scheme of things anyway — after all, those partnerships seldom mount to much (as @swardley asks, “Is ”AWS partnering with Eucalyptus = MS partnering with Novell“ a sensible analogy or the argument of those hanging on by their fingernails?”). But still it is a good move by Eucalyptus nonetheless.
What is interesting is the API compatibility. Eucalyptus is AWS API compatible and OpenStack is not. The OpenStack community has been arguing for months on whether or not they should make their API compatible with AWS. I haven’t followed the argument in detail (yawn) and think that currently they are still um and ah-ing over AWS API compatibility. Feel free to correct me in the comments if necessary. Have a read through Innovation and OpenStack: Lessons from HTTP by Mark Shuttleworth for his opinion on the matter (as of September 2011).
One of the questions about API compatibility is whether or not AWS would get upset and it seems that the Eucalyptus agreement has given explicit rights to use the AWS API. The legal rights around using the same API may be grey, but the right to brag about it has to be given by the original authors, surely? This bragging right is going to give Eucalyptus a lot of credibility and a head start over Openstack.
What about CloudFoundry, OpenShift and other cloud platforms? I have always avoided trying to define AWS in the context of cloud taxonomies, using the IaaS/PaaS/SaaS or any other taxonomy (see Amazon Web Services is not IaaS) and the reason is quite simple. AWS is pretty much the definition of cloud computing and all definitions have to bow down to AWSs dominance. After all, what’s the point of drawing little definition boxes if the gorilla doesn’t fit comfortably into any of them?
So what is really interesting about the Eucalyptus announcement is that it lends credibility to AWS as the definition of cloud computing (not just the market leader or early adopter). Using AWS as the definition and getting rid of all of the IaaS/PaaS crap makes it pretty easy for AWS to talk to the enterprise – far more than talking on-prem does.
As a side note, Microsoft seriously needs to get API compatibility between Windows Azure and on-prem Windows or else AWS is going to be having interesting conversations with Microsoft’s enterprise customers. (Considering their enterprise heritage I am at a complete loss at explaining why, after more than two years this is still the case)
AWS doesn’t want cloud bursting, where customers come to the platform and retreat to their on-premise infrastructure when the work is done. AWS is keen to get you to commit to longer term reserved instances and is adjusting their pricing accordingly.
There is nothing like a price drop in Jeff Barr’s midnight AWS announcements to get everybody excited in the morning. What interests me is not the price drop per se, nor the inevitable demand for competitors to follow, nor the back and forth comparison with traditional managed hosting that will get underway (again). What is interesting is the increasing differential between the price drop for on demand versus reserved instances.
As James has pointed out, reserved instance pricing is important in developing a cost model for cloud applications and over time, but the gathering of data and analysis can get a bit tricky. EC2 reserved instance pricing page reckons that reserved instance pricing will save between 50% and 70% of your EC2 costs (RDS has similar savings for reserved databases) — which is a compelling proposition.
Anecdotal evidence (read “I may or may not have heard it somewhere”) suggests that a lot of AWS business is for cloud bursting — where AWS is used, not as the primary platform, but one to use occasionally. It would also seem that the ‘occasionally’ refers to development and test capacity, rather than an architecture engineered to use AWS as a true cloud bursting platform for a production system. By creating a huge differential between on demand and reserved instances, AWS may be presenting the teaser to convert those ‘cloud burst’ uses into long term deployments. After all, if it is cheap enough to use on demand (which it must be otherwise customers would build their own) then being able to drop the price of something that customers know works (technically) by an additional 50% may push the decision makers in favour of AWS. But those savings only come from long term commitments (3 years), which is enough time to ensure that the particular application is committed to AWS, with others to follow.
While reductions in cloud costs are welcome, the fundamental problem of figuring out the costs, cost benefits and ‘in the box’ comparisons continue to be difficult. I have discussed this as an engineering problem and people always wade dangerously into the debate, as Jeff Barr did recently, and there is some tricky analysis required. Pricing is still far to complex and the models too immature to be convincing in front of the CFO, or whoever controls the budget, and private cloud advocates, who have been doing pricing for a while, can almost always bulldoze a public cloud pricing model. Rather than only a few pennies saved on EC2 savings, I would like to see some rich, capable pricing tools and models emerging.
The question: should you use S3 to host a high volume static web site or should you configure and operate a load balanced and auto-scaling EC2 Multi A-Z cluster?
- Ignore cost of storage. You get loads bundled on EC2 and a couple of GB is only a few cents on S3. This is not true for big media streaming of course.
- Bandwidth costs are the same for S3 and EC2 and can be excluded from the comparison.
- Management costs are 1:1 with VM costs. This in turn assumes existing management infrastructure and people are in place and this website is an incremental requirement.
- EC2 will require two instances running. Ideally one in each Zone to achieve a vaguely similar sort of availability target as S3.
- A home page could require circa 100 GET requests (perhaps overdoing it a little bit)
- A UK only web site may only be truly busy for 12 hours per day.
- The cost of a “Heavy Utilisation 1 Year Reserved Instance Small Linux Instance in EU” is $45.65 per month. Two instances: $91.30 per month. Total managed cost: $200 per month.
You would have to make 200,000,000 GET requests per month to reach $200. That is 2,000,000 Page Views per month. Considering 12 hours per day: 91 pages per second. This is a small load shared between two web servers only serving static content. Surely within the reach of two small Linux instances – in fact shouldn’t they serve 10x that volume for the same price?
Because S3 sites are so incredibly simple to setup and have high availability, scalability and performance baked in you can’t possibly justify building up EC2 based web servers at low page volumes. However, the primary cost of S3 is down to GET requests and there are no price/volume breaks in the pricing of GET requests. The costs scale with the volume of requests in a linear way and much faster than they do if you were to build your own EC2 fleet.
If you don’t already have a management function in place for EC2 then the level of scale needed to justify this expense would be considerably higher. The big benefit of S3 sites is in the “static web site as a service” element – i.e. a highly scalable, available, high performance, simple and managed environment.
The linear relationship between scale and costs whilst a disadvantage in one way could be seen as an advantage. The S3 site scales from 0 to Massive and back again instantly. It can remain dormant for days and then fend off massive media focus.
However, I was surprised to see that this wasn’t as cut and dried in favour of S3 as I’d assumed or hoped.