I think I’m getting old. I remember OS/2 and Windows NT, which arose from Microsoft’s failed partnership with IBM. I remember using DoubleSpace, which Microsoft was supposed to license from Stac Electronics and which instead turned into a lawsuit that sank the little guy. There are many other examples of failed partnerships and ‘embrace and replace’ tactics. And that is before we even get to the tactics they employed while trying to win the late-nineties browser wars.
It’s not that Microsoft doesn’t do partnerships – they have a very good channel and lots of partnerships that work well for them and their partners. But when it comes to working side by side on a technology, I don’t think that Microsoft can handle the culture. Let’s not even get into partnerships with the open source community.
The news of the Microsoft partnership with Hortonworks to do Hadoop smells funny to me. Not funny hilarious, but as in ‘this milk smells funny’. I just don’t see how Microsoft will ditch their own map-reduce plans and throw their lot in with Hadoop, as people are inferring from the announcement (though it does not actually say so). Big Data is, apparently, the next big thing and Microsoft has had a lot of people working on this for years. Pat Helland worked there for a while (he is now leaving) and, in his own words, has been working on:
…Cosmos, some of the plumbing for Bing. It stores hundreds of petabytes of data on tens of thousands of computers. Large scale batch processing using Dryad with a high-level language called SCOPE on top of it
Pat is a smart guy and has been working with unstructured data for a while. When you think about it, the same problem that Google solved with MapReduce, the model that Hadoop implements, has to exist in Bing. Surely Bing’s big data problem is big enough, and arguably they have already solved it? It just needs packaging, right?
Then there is SQL Server Parallel Data Warehouse, resulting from the DATAllegro acquisition in 2008, which is also about big data. Yes, it requires Big Tin, but those kinks can be worked out of the system.
So while it is great for Hadoop in general that Microsoft seems to be cosying up to them, I don’t think that this means that Microsoft is ‘all in’ with open source or Hadoop. I reckon it is a strategy to make sure that the customers that would have gone over to Hadoop anyway don’t feel compelled to stray too far from Microsoft, so that they can be reeled back in later. If the big data market (however that may be defined) is worth hundreds of millions per year today, in ten years’ time it is going to be worth billions, and you can bet that Microsoft will have some licences to sell by then.
Dryad, Microsoft’s MapReduce implementation, has finally found its way out of Microsoft Research and is now available as a public beta. Surprisingly, it is a quiet blog announcement, with none of the renaming that is typical of Microsoft, such as ‘Windows Server 2008 R2 High Performance Distributed Compute Services’ – or something equally catchy. Unsurprisingly for an enterprise software vendor, Dryad is limited to Windows HPC (High Performance Compute) Server – which means high-end tin and expensive licences. While most of the MapReduce secret sauce for the Dryad implementation probably comes from the HPC OS, it is disappointing that there is still no generally available MapReduce for the Microsoft stack on the horizon.
I recall, more than a year ago, wishing for Dryad (and DryadLINQ) on Azure to query Azure Table Storage (the Azure NoSQL data store) and generally thinking that Azure would get some MapReduce functionality as a service (like Amazon Elastic MapReduce – a Hadoop implementation on AWS) out of the Dryad project. But it seems that Microsoft is focussing on the enterprise MapReduce market for now.
I’m not that sure about the market for enterprise MapReduce and defer to the experts, wherever they may be. I thought that MapReduce was about using cheap resources (commodity hardware and open source licences) in order to scale out compute power. Surely if you have to pay the Microsoft tax and the high-end hardware tax, the case for scaling out weakens and you are better off just scaling up your already expensive servers? I am sure the research market will still go for Hadoop or similar, and maybe Microsoft sees something in the deep pockets of financial services.
Had Microsoft brought Dryad to Azure as ‘MapReduce as a Service’, there would be something worth looking into on the Microsoft stack, but until then MapReduce for the masses is likely to remain on Hadoop. Hadoop is the de facto MapReduce implementation for non-specialists, and as MapReduce gains traction as a way of solving problems, Microsoft will find it impossible to catch up.
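For anyone who has not met the programming model being argued over here, a toy word count is the usual illustration. This is only a single-process sketch in plain Python: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and are not the Hadoop or Dryad API. A real framework would run the map and reduce steps in parallel across many machines and handle the shuffle over the network.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = []
    for document in documents:
        # In a real cluster each document (or block) is mapped on a different node.
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big tin", "Big Data"]))
```

Because each map call sees only its own document and each reduce call sees only one key, the work can in principle be spread over as many cheap machines as you have, which is the scale-out case discussed above.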