Too often we see cloud projects fail, not because of the platforms or a lack of enthusiasm, but because of a general lack of skills in cloud computing principles and architectures. At the beginning of last year I looked at how to address this problem and realised that some guidance was needed on what is different about cloud applications and how to address those differences.
The result was a book that I wrote and published, “CALM - Cloud ALM with Microsoft Windows Azure”, which takes the approach that implementation teams know how to do development, they just don’t know how to do it for the cloud, and they need to adopt cloud thinking into their existing development practices.
The “with Windows Azure” means that the book has been written with specific examples of how problems are solved with Windows Azure, but is not necessarily a book about Windows Azure — it applies as much to AWS (except you would have to figure out the technologies that apply yourself).
CALM’s approach is to look at certain models and to encourage filling in the detail of those models in order to come up with the design. The models include the lifecycle model, which looks at load and traffic over time, the availability model, data model, test model and others. In looking at the full breadth of ALM (not just development), some models apply to earlier stages (qualify and prove), while others are post-delivery models, such as the deployment, health and operational models.
CALM is licensed as open source, which also means that it is free to download, read and use. It is available on GitHub at github.com/projectcalm/Azure-EN, with PDF, mobi (Kindle), and raw HTML available for download on this share. A print version of the book is also available for purchase on Lulu.
I encourage you to have a look at CALM, let others know about it, ask any questions, and give me some feedback on how it can be made better.
One of the primary influences on cloud application architectures is the lack of high performance infrastructure, particularly infrastructure that satisfies the I/O demands of databases. Databases running on public cloud infrastructure have never had access to the custom-built high I/O infrastructure of their on-premise counterparts. This has led to the well known idea that “SQL doesn’t scale”, and the rise of distributed databases has come on the back of the performance bottleneck of SQL. Ask any Oracle sales rep and they will tell you that SQL scales very well and will point to an impressive list of references. The truth about SQL scalability is that it should rather be worded as ‘SQL doesn’t scale on commodity infrastructure’. There are enough stories of poor and unreliable performance of EBS backed EC2 instances to lend credibility to that statement.
Given high performance infrastructure, dedicated network backbones, Fusion-IO cards on the bus, silly amounts of RAM, and other tweaks, SQL databases will run very well for most needs. The desire for running databases on commodity hardware comes largely down to cost (with influence of availability). Why run your database on hardware that costs a million dollars, licences that cost about the same and support agreements that cost even more, when you can run it on commodity hardware, with open-source software for a fraction of the cost?
That’s all very fine and well until high performance becomes commodity. When high performance becomes commodity then cloud architectures can, and should, adapt. High performance services such as DynamoDB do change things, but such proprietary APIs won’t be universally accepted. The AWS announcement of the new High I/O EC2 Instance Type, which deals specifically with I/O performance by having 10Gb ethernet and SSD backed storage, makes high(er) performance I/O commodity.
How this impacts cloud application architectures will depend on the markets that use it. AWS talks specifically about the instances being ‘an exceptionally good host for NoSQL databases such as Cassandra and MongoDB’. That may be true, but there are not many applications that need that kind of performance on their distributed NoSQL databases; most run fine (for now) on the existing definition of commodity. I’m more interested to see how this matches up with AWS’s enterprise play. When migrating to the cloud, enterprises need good I/O to run their SQL databases (and other legacy software), and these instances at least make it possible to get closer to what is possible in on-premise data centres, at commodity prices. That, in turn, makes enterprises ripe for accepting more of the cloud into their architectures.
The immediate architectural significance is small. After all, good cloud architects have assumed that better stuff would become commodity (@swardley’s kittens keep shouting that out), so the idea of being able to do more with less is built into existing approaches. The medium term market impact will be higher. IaaS competitors will be forced to bring their own high performance I/O plans forward as people start running benchmarks. Existing co-lo hosters are going to see one of their last competitive bastions (offering hand assembled high performance infrastructure) broken and will struggle to differentiate themselves from the competition.
Down with latency! Up with IOPS! Bring on commodity performance!
The recent outage suffered at Amazon Web Services due to the failure of something-or-other caused by storms in Virginia has created yet another round of discussions about availability in the public cloud.
Update: The report from AWS on the cause and ramifications of the outage is here.
While there has been some of the usual commentary about how this outage reminds us of the risks of public cloud computing, there have been many articles and posts on how AWS customers are simply doing it wrong. The general consensus is that those applications that were down were architected incorrectly and should have been built with geographic redundancy in mind. I fully agree with that as a principle of cloud based architectures and posted as much last year when there was another outage (and also when it was less fashionable to blame the customers).
Yes, you should build for better geographic redundancy if you need higher availability, but the availability of AWS is, quite frankly, not acceptable. The AWS SLA promises 99.95% uptime on EC2 and, although they may technically be reaching that or giving measly 10% credits, anecdotally I don’t believe that AWS is getting near that in US-East. 99.95% translates to 4.38 hours of downtime a year, or 22 minutes a month, and I don’t believe that they are meeting those targets. (If someone from AWS can provide a link with actual figures, I’ll gladly update this post to reflect as much.) Using the x-nines measure of availability is all that we have, even if it is a bit meaningless, and by business measures of availability (the application must be available when needed) AWS availability falls far short of expectations.
I am all for using geographic replication/redundancy/resilience when you want to build an architecture that pushes 100% on lower availability infrastructure, but it should not be required to overcome infrastructure that has outages for a couple of hours every few weeks or months. It is to be expected that individual AWS fans will defend AWS and point fingers at architectures that are not geographically distributed, but an article on ZDNet calling AWS customers ‘cheapskates’ is a bit unfair to customers. If AWS can’t keep a data centre running when there is a power failure in an area, and can’t remember to keep the generator filled with diesel (or whatever), blaming customers for single building, single zone architectures isn’t the answer.
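For anyone who wants to check the x-nines arithmetic, here is a minimal sketch of the calculation. The function name and the 365-day year are my own assumptions for illustration; nothing here comes from the AWS SLA beyond the 99.95% figure itself.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)


def downtime_budget(sla_percent: float) -> dict:
    """Return the downtime an uptime SLA allows, per year and per month."""
    unavailable_fraction = 1 - sla_percent / 100
    hours_per_year = unavailable_fraction * HOURS_PER_YEAR
    minutes_per_month = hours_per_year * 60 / 12
    return {
        "hours_per_year": hours_per_year,
        "minutes_per_month": minutes_per_month,
    }


budget = downtime_budget(99.95)
print(f"{budget['hours_per_year']:.2f} hours a year")
print(f"{budget['minutes_per_month']:.1f} minutes a month")
```

For 99.95% this gives 4.38 hours a year and just under 22 minutes a month, which is a very small budget when a single outage can run for a couple of hours.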
Yes, I know that there are availability zones and applications that spanned availability zones may not have been affected, but building an application where data is distributed across multiple AZs is not trivial either. Also, it seems that quite frequently an outage in one AZ has an impact on the other (overloads the EBS control plane, insufficient capacity on healthy AZ etc), so the multiple AZ approach is a little bit risky too.
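To put a rough number on what redundancy buys you, here is a small sketch of the textbook calculation for redundant deployments. It assumes that failures in each location are completely independent, which is exactly the assumption that the cross-AZ problems described above call into question, so treat the results as a best case. The function name and the 99.95% input are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # ignoring leap years (an assumption)


def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant deployments, assuming
    independent failures (optimistic if outages are correlated)."""
    return 1 - (1 - single) ** copies


for copies in (1, 2, 3):
    availability = combined_availability(0.9995, copies)
    downtime_minutes = (1 - availability) * HOURS_PER_YEAR * 60
    print(
        f"{copies} location(s): {availability:.8%} available, "
        f"about {downtime_minutes:.3f} minutes down per year"
    )
```

Any shared dependency between locations (a common control plane, or not enough spare capacity in the healthy zone) pulls the real figure back towards that of a single location.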
We application developers and architects get that running a highly available data centre is hard, but so is building a geographically distributed application. So we are expected to build these complicated architectures because the infrastructure is less stable than expected? Why should we (and the customers paying us) take on the extra effort and cost just because AWS is unreliable? How about this for an idea: fix AWS? Tear down US East and start again… or something. How is AWS making it easier to build geographically distributed applications? No, white papers aren’t good enough. If you want your customers to wallpaper over AWS cracks, make services available that make geographic distribution easier (data synchronisation services, cross-region health monitoring and autoscaling, pub-sub messaging services, lower data egress costs to AWS data centres).
Regardless of how customers may feel, if you Google ‘AWS outage’ you get way, way too many results in the search. This isn’t good for anybody. It isn’t good for people like me who are fans of the public cloud, it isn’t good for AWS obviously, and it isn’t even good for AWS competitors (who are seen as inferior to AWS). If I see another AWS outage in the next few months, in any region, for any reason, I will be seriously fucking pissed off.
Microsoft’s biggest strength has always been its partner network, and it seemed, at least for a couple of decades, that a strong channel was needed to get your product into the market. Few remember the days when buyers only saw products in computer magazines, at computer trade shows and from the salespeople walking through the door. The first two no longer exist and the last one may be on its last legs.
The overriding reaction to Microsoft’s Surface announcement was about the snubbing of their traditional OEM partners by building a device on their own. As a consumer I think it’s a great idea: the Dell that I use for work is the sorriest excuse for a premium laptop, and the more Microsoft can own the hardware and drivers that support their OS, the better. Surface, and Microsoft’s ownership of the design, manufacturing, distribution, sales and support of the device, is a response to the pressure that they are under from the iPad, and having OEM partners messing around with their own hardware, with Android shoehorned in, obviously doesn’t work for them.
There is more to this story than consumer tablet devices. What about the rest of their business? What about the channel for enterprise software? In terms of traditional enterprise accounts I don’t expect much movement yet — there is a well established channel and an organisational culture within Microsoft to push product through that channel. I don’t expect that enterprises are going to be buying Windows Server for the datacentre from a local Microsoft store any time soon. This is probably because the channel supplier is already camped out in the data centre and rolls everything up in services and support contracts that are attractive to the enterprise (or at least appear to be).
What is going to change the Microsoft to partner to customer channel is the cloud — and it is already happening. Focussing on SaaS for a moment, consider the ‘buy direct’ model of Office365. Customers can go direct to Microsoft and get thousands of seats without going through a partner. The most engagement that they will have with partners is for migration or configuration of AD. Increasingly we are seeing Microsoft cash cow products (Exchange and Office) being sold direct and this trend will only continue. Should anything else be expected of Microsoft? With Google going direct and promising customers all sorts of magic and unicorns, why would Microsoft do anything less? Should they rely on the partner channel that either cowers in the corner with cloud phobia or is still out having an expensive lunch funded by multi year MOLP agreements and software assurance plans? The answer, of course, is no. Microsoft should not hang around while partners are milling about and Google is eating their lunch. Just like Surface, where Microsoft has to take the fight to Apple, Microsoft has to go direct because relying on (OEM) partners hasn’t been working too well.
Not only can customers buy Office365 direct, but they can also buy Windows Azure direct. With the new IaaS oriented VMs, it is easier to buy direct than to go through the channel for on-prem Windows Servers. Microsoft has tried for years to get partners on board with Windows Azure but it has largely failed. That may be because partners don’t understand Windows Azure, but also because Microsoft is making the money (somehow) on the sales. Without the sale of the hardware, OS, networking, installation, configuration and support of a traditional server, there seems to be very little meat left on the bone for the channel, even if a few pennies are thrown their way by skimming a percentage of the monthly Windows Azure bill. To blame the channel and say that it has been lethargic is not entirely fair. Windows Azure effectively blows their business models out of the water, so it is unsurprising that they failed to embrace it with open arms. As with the SaaS offerings, what is Microsoft to do? Amazon Web Services goes direct and customers go direct to Amazon, without waiting for their incumbent IT suppliers to recommend AWS. In the face of that competition, Microsoft has little choice but to go direct and act like a huge multinational with huge investments in infrastructure, as Amazon does, rather than leaving it up to local partner minnows.
If the Microsoft channel continues to collapse (and I believe it will), when does Microsoft stop? Do they offer more and more professional services on top of their SaaS, PaaS and IaaS? I believe that they have little choice. If a customer buys Office365 direct from Microsoft, who would they choose to do the installation and support? The supplier of the service, or a partner? I believe most customers would choose to get the professional services from Microsoft — after all, they would reason, the Microsoft professional services have more direct access to the people looking after the physical infrastructure than partners would (and they would be right). Should Microsoft snub the channel and offer Windows Azure based software development services? Yes, if their existing channel is fixated on building apps on-prem (whether through ignorance or protecting their market).
Microsoft partners that continue to ignore cloud based offerings are going to fall by the wayside, either because Microsoft will ignore/undercut them by offering direct cloud services, or because their own customers will choose to go direct themselves (even to Google or Amazon). Partners need to work differently and re-invent themselves: find a way to add value and make money off the cloud. Microsoft in turn needs to protect those partners that are cloud-oriented (by, for example, allowing partners direct access to internal teams). After all, they need lots of partners in order to scale. There is no way that Microsoft can offer all of the professional services themselves, yet.
While many criticise the languishing 90s era Microsoft, it appears that the ship is beginning to turn. Building a closed ecosystem consumer computing platform seems to be the only way to satisfy the needs of the consumer market (and compete with the iPad). Building out cloud services and massive computing infrastructure seems to be the way to satisfy the needs of the emerging business software market (and compete with Google and Amazon). Nobody cares as much about Microsoft’s survival as Microsoft itself does, and they seem to be realising that. Perhaps Microsoft is shedding the 90s culture and moving with the times. It is a big bet for Microsoft, but Microsoft seems poised to assert their place in the new markets and I wouldn’t bet against their ability to deliver.
One of the most significant, highly anticipated, and worst kept secrets of the Windows Azure spring release is the inclusion of persistent VMs, with the notable addition of support for Linux on those VMs.
The significance of the feature is not that high architecturally. After all, Windows Azure applications that were specifically architected for Windows Azure run well already. The aspects that I find more significant are:
- Closing the gap to AWS: It has always been difficult to compare Windows Azure and AWS because of the IaaS bias of AWS versus the PaaS bias of Windows Azure. With the addition of persistent VMs, the two platforms can be better compared and better choices made.
- Base understanding: Windows Azure is widely misunderstood, largely due to its PaaS nature. In the face of this misunderstanding, choosing AWS as the de facto option, with its more commonly understood IaaS model, has been easy. The addition of persistent VMs allows decision makers to go with something that is more familiar before branching out into some of the specific Windows Azure features (as customers moving to AWS tend to do).
- Not just Windows: The inclusion of Linux is a big deal for Microsoft. Regardless of Microsoft’s own reasons, having first-class support of Linux breaks the perception that Windows Azure is Windows and .NET only. Support of Java, Node.js, Ruby and now Python under Windows Azure has more credibility with the addition of Linux to the stable.
- Architectural choices: I’ve never been a fan of running everything under the Windows Azure ‘role’ model. Running something like MongoDB or Solr in this way just seems wrong. The addition of persistent VMs now gives architects the chance to deploy technologies that work well under Linux, where there is better support and understanding of how they run. Building a solution with MongoDB running on Linux on Windows Azure is architecturally significant and very useful.
- Enterprise comfort: Enterprises with legacy applications have struggled to make the move to Windows Azure and they are probably the largest drivers of the inclusion of persistent VMs (the ‘listening to our customers’ part of Microsoft). Regardless of whether it is a good idea to run SSIS or old-school SharePoint on a cloud platform, it is something that lots of people want to do. Enterprise customers can now run whatever they like, including Linux-based parts of their solutions.
- Bring your stack: When the announcement of the spring release was made yesterday I was most interested to see the flurry of accompanying press releases. I saw news from RightScale, Cloudant, Opscode and 10Gen. These, and similar, organisations are the backbone of the cloud community and their support of Windows Azure (however extensive it may be) greatly increases the reach of Windows Azure into areas of the cloud playground where the cool kids are hanging out.
It will be interesting to see, over the coming weeks, how the markets and the clouderati respond to these announcements. It was a move that Microsoft had to make and they need to get the right messages about the changes out to the market in order to gain better traction for Windows Azure.
When Amazon announced RDS for SQL Server and .NET support for Elastic Beanstalk, the response over the next few hours and days was a gushy ‘AWS cosies up to .NET developers’ or something similar. My first thought upon reading the news was “Man, some people on the Azure team must be really, really pissed at the SQL Server team for letting SQL Server on to AWS”. It’s not that AWS is not a good place for .NET people to cosy up to, and some AWS people are very cosy indeed (except for one, who’s been avoiding me for a year), but .NET people getting friendly with AWS people is bad for Azure. While it is great for .NET developers, the problem for Microsoft is that SQL RDS erodes the primary competitive advantage of Windows Azure.
AWS has been a long time supporter of the Windows and .NET ecosystem but the missing part was a good story around SQL Server. Sure, you have always been able to roll your own SQL instance, but keeping it available and running is a pain. What was lacking, until this week, was a SQL Server database service that negated the need to muck around by yourself. What was needed was a service provided by AWS that you could just click on to enable. Not only does AWS now support SQL (although not 2012 yet), it seems to superficially offer a better SQL than Microsoft does on SQL Azure. I personally think that SQL Azure is a better product and has been developed, from the ground up, specifically for a cloud environment, but that process has left it somewhat incompatible with on-premise SQL Server. AWS’s RDS SQL is plain ol’ SQL Server that everyone is familiar with, with databases bigger than 150GB, backups, performance counters and other things that are lacking in SQL Azure. While the discerning engineer may understand the subtle edge that SQL Azure has over RDS SQL, it will be completely lost on the decision makers.
AWS has recently been making feints into the enterprise market, a stronghold of more established vendors, including Microsoft. And, if AWS wants to present a serious proposition to enterprise customers, they have to present a good Windows/.NET story without gaps, and it seems that they are beginning to fill in those gaps. It is particularly interesting and compelling for larger enterprises with a mish-mash of varied platforms, as there inevitably is in large organisations, where one cloud provider is able to take care of them all.
Windows Azure has Windows/.NET customer support at the core of its value proposition and SQL Azure is a big part of that. If you have a need for SQL Server functionality, why go to anyone other than a big brand that offers it as part of their core services (and I mean ‘service’, not just ‘ability to host or run’)? Windows Azure was that big brand offering that service, where the customer would choose it by default because of SQL support. Well, now there is another big brand with a compelling offering.
Microsoft obviously can’t go around refusing licenses for their software, and for a business that for decades has had ‘sell as many licenses as possible’ as their most basic cheerleader chant, it is virtually impossible to not sell licenses. The models for the new world of cloud computing clash right here with the old business models that Microsoft is struggling to adapt. For an organisation that is ‘all in’ on the cloud, the only ‘all in’ part of the messages that I am getting is that Microsoft wants to sell as many licenses of their products to cloud providers as possible — putting Windows Azure in a very awkward position. If it was me in the big Microsoft chair, I would have fought SQL RDS as long as possible — but hey, I’m not a highly influential sweaty billionaire, so my opinion doesn’t count and won’t make me a sweaty billionaire either.
The competitor to Windows Azure is not AWS, or AppEngine or any other cloud provider — the competitor is Windows Server, SQL Server and all the on-premise technologies that their customers are familiar with. I’m sure that Microsoft desperately wanted to get SQL onto RDS and helped as much as they could because that is what their customers were asking for (Microsoft is apparently quite big on listening to customers). I can’t help thinking that every time Microsofties went over for a meeting at the Amazon office to hammer out the details, the Azure team was left clueless in Redmond and the Amazon staff were chuckling behind their backs.
How does Microsoft reconcile their support for Windows Azure and their support for their existing customers and business models? How do they work with AWS as one of their biggest partners and competitors? While Microsoft struggles with these sorts of questions and tries to decide where to point the ship, Amazon will take whatever money it can off the table, thank you very much.
When Windows Azure first launched I thought that their well established sales channel and partner network would give them the edge – where loyal partners would sell the next big thing from Redmond, as they have done in the past. A few years down the line and Microsoft has been unable to turn that well-oiled machine into increased adoption of Windows Azure. Indeed, the reverse seems true where partners are corralling back to traditional enterprise IT and barely giving Windows Azure any attention. On the other hand Amazon Web Services has never had a sales channel and is, if you look at the broader Amazon philosophy, the epitome of disintermediation — where authors can sell to readers with no publishing or distribution intermediaries in between. Disintermediation is not a good brand when trying to attract ‘partners’.
Perhaps the existing Microsoft model harks back to days when software had to be shipped out of a distribution centre with manuals and media that had to be fed, one by one, into a carefully built server. That is no longer necessary: the customer can use their credit card and go directly to the source, and they believe, perhaps wrongly, that going directly to the supplier is the best thing. This direct-from-source model means that the traditional channel, where SIs pass the product along after adding some margin, is non-existent. Despite the direct sales model, there are still ‘partners’. Even Amazon Web Services appears to have them (with a newly announced ‘network’), but it is not what it used to be. It has gone from “We’ll market this product and you make money selling and supporting it” to “We’ll market, sell and (sort of) support this product and you make money by selling additional services to the customer (that you have just handed over to us)”. It is little wonder that the existing partners are struggling to see how this works for them.
Of course, those reading this post will be thinking that having a direct sales model is as it should be. The public cloud, it can be argued, is about the breaking of traditional IT models: if planning, provisioning, development, operation and architectures have changed, then why should the sales channel and acquisition be built on outdated practices? There are potentially two reasons why a well established sales channel may be necessary, or at least useful.
Firstly, people do need help getting their cloud stuff running. Being able to rent an instance using a credit card is not the only skill that is required to use the cloud and a lot of potential buyers are left out of the cloud revolution because they don’t know what to do with cloud technology once they have their hands on it. For cloud computing to truly break into the mainstream it cannot remain within the hacker community and needs to be able to be consumed by those organisations without the full breadth of skills. Someone from the channel to help them choose, architect and configure seems logical. Unfortunately though, in their attempt to make the cloud seem easy to use and consume, cloud providers do not want to mention that a partner is advised in order to get it working properly.
Secondly, there is a lot of non-cloud IT still happening. Part of the reason for this is not technical and is related to the huge investment that the incumbents have in traditional IT and their desire to keep their market under their control. There are also a lot of talented, well connected and wealthy salespeople selling traditional kit who would never sell public cloud offerings because it makes no financial sense to them. How much commission would a salesperson get selling a one petabyte multi-datacentre SAN versus the salesperson who sells Amazon S3 as the best solution? (The answer: a lot, lot more.) For every cloud ‘win’ there are thousands of traditional IT purchases simply because of the sheer number of vendors sweet talking the CTO, and the CTO has made time for them, their campaigns, their presentations and sales pitches. So while cloud providers are creating channel conflict, competing with their existing channels or simply don’t have one, Oracle, EMC and other enterprise vendors are making money hand over fist selling as much as they possibly can. And, just to rub salt in it, they are branding their products as ‘private cloud’, ensuring that the CTO can report back to the steering committee that they do have a cloud strategy and locking out the public cloud for a few more years.
I think that there is still a lot of development and maturing that has to take place with the channel strategies that get public clouds into the hands of the people who will pay every month. It may not be as complex as traditional IT where tin has to be physically put in place, but it is a lot more complicated than getting someone to read a book on a Kindle. Microsoft hasn’t been able to work their channel to Azure’s advantage and is probably dismantling the channel that took them twenty years to build. Amazon has direct sales in its DNA and is dipping its toes into partnerships, while potential partners are fearful of Amazon stomping all over them if they turn their backs. There is a middle ground that needs to be found and while the big providers take their time to sort it out, old school IT (branded as private cloud) continues to rake in the cash. Perhaps it is time to start thinking less about technologies and features and focus on building sustainable ecosystems that allow big cloud providers to work in harmony with their customers and providers of specialised skills, products and services.