Lori MacVittie has an excellent post, ‘When Black Boxes Fail: Amazon, Cloud and the Need to Know’, on how sparse Amazon’s documentation is about what actually triggers an Availability Zone failover.
“…most customers didn’t – and still don’t – know how Availability Zones really work and, more importantly, what triggers a fail over. What’s worse, what triggers a fail back? Amazon’s documentation is light. Very light. Like cloud light.”
As someone who is trying to work with cloud platforms at the architectural level, I share her frustration at the lack of detailed documentation on how things work. Without clear documentation it is difficult to make the right architectural decisions.
Amazon is not alone in this. Yesterday I had a conversation with a well-respected database professional who was shocked to hear that an Azure VM role could be ‘whacked’ (my term) for any ol’ reason and without warning. This had a direct impact on the architecture he was proposing, and it is not immediately obvious from the documentation.
The lack of documentation is probably due to a combination of factors:
- The dynamic nature of the environment, which quickly makes documentation outdated. Nobody wants outdated documentation hanging around on the Interwebs – they have a tendency not to forget.
- A degree of protection of trade secrets. Detailed documentation gives insight into how things work.
- Not having enough focus (or people) on producing usable documentation.
- Most documentation has to pimp up the services and make them look good. As much as AWS tells you to design for failure, they don’t want to drum into you that their services are prone to failure. I’m sure a lot of documentation gets the once-over from marketing – which is a bad thing for architects.
We are not yet at the nirvana of simply choosing a service and having the implementation abstracted away from us. The underlying infrastructure still seems to leak through the abstraction – to a lesser degree for some services (S3) than others (EBS). Because the abstractions leak, we need to understand what sits underneath them in more detail in order to make the right architectural decisions.
Even if we could get services ‘à la carte’, we would still need to know more about them, such as their expected latency, throughput and availability – these are, after all, attributes of the service. You won’t find documentation on those either.
The documentation assembled by other vendors (such as the material 10gen produces for MongoDB on AWS), along with what can be found in forums, indicates both the lack of, and the desire for, more documentation. Documentation that is less cloud light.