David Heinemeier Hansson has posted an interesting find over at his blog Loud Thinking about IBM’s stated new goal to pursue Radical Simplification in their enterprise work. Essentially, Big Blue is starting to acknowledge that most enterprise web development is just too cumbersome and daunting for agile and powerful web development. Things like SOAP and WSDL are examples of this. What should be a simple task, like retrieving a weather forecast from a remote site, becomes a complicated mess of debugging SOAP calls, tweaking WSDL specifications and using wizards on the server to generate code that is impossible for the end user to debug, let alone understand.
Or as Sam Ruby puts it in his excellent presentation to IBM on the topic, Hello From The Open Source World:
For normal people, the perceived usefulness of a computer language is inversely proportional to the amount of theory the language forces you to learn.
Sam illustrates this in his slides by showing the classic “Hello World” example in a variety of programming languages, one per slide. But when he reaches the slide for WSDL, it’s a blank. Because, quite frankly, WSDL is so unwieldy that it’s hard to just write it on the fly, forcing many people to plunk down bucks to Microsoft or IBM to get their tools to generate it for them. It’s good business for those companies but, I think, bad news for the Internet as a whole.
Ruby says that essentially for a framework to succeed on the web, it has to enable a situation he terms Zero Training, a state where it is easy to get going in the language and to adapt examples to your own needs. Good programming languages have it, some web technologies have it, SOAP and WSDL don’t. Indeed, I feel like the growth of SOAP/WSDL in the enterprise has been in spite of the difficulties of developing for it (mainly because of Microsoft and IBM pushing it). Because, quite frankly, it’s a beast for anything but the most basic RPC-style calls. The reasons:
- SOAP allows you to abstract away the underlying transport protocol. But for 99% of SOAP communication, this protocol is HTTP, and the abstraction limits the control you have over HTTP.
- SOAP needs to validate the entire message before applications can operate on it. This usually means it has to load it up into a DOM as well, so calls that return lots of data or might take a while (the fun ones) are frowned upon.
- SOAP also assumes the COM/CORBA marshalling mechanism, so there is no streaming: you cannot touch the data until the entire document has been received, parsed into an XML DOM tree, and then mapped into an in-memory object tree. For large amounts of data, forget it.
- WSDL has a lot of syntax to map procedure names/arguments onto HTTP. Contrast this to REST, which embraces the inherent naming conventions of HTTP and applies them to a simple procedural call model.
- The assumption of these tools is that you will use their SOAP wizards instead of rolling your own code. The wizard is meant to be the only way, not an additional way.
- In fact, the idea that SOAP = Simple Object Access Protocol has become a profound irony.
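To make the REST-versus-SOAP contrast a bit more concrete, here is a rough sketch of the weather forecast request built both ways in Python. The host weather.example.com, the /forecast and /soap paths, the GetForecast operation and its namespace are all made up for illustration; only the SOAP 1.1 envelope namespace is real. Nothing is sent over the network, the functions just build the pieces of the request.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Hypothetical service, for illustration only.
BASE = "http://weather.example.com"


def rest_request(city):
    """REST style: the resource and its argument live in the URL itself."""
    url = f"{BASE}/forecast?{urlencode({'city': city})}"
    return "GET", url, {}, None


def soap_request(city):
    """SOAP 1.1 style: one POST endpoint, with the call described in the body."""
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        # GetForecast and its namespace are assumed names, not a real API.
        '<GetForecast xmlns="http://weather.example.com/ns">'
        f"<city>{escape(city)}</city>"
        "</GetForecast>"
        "</soap:Body>"
        "</soap:Envelope>"
    )
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": '"http://weather.example.com/ns/GetForecast"',
    }
    return "POST", f"{BASE}/soap", headers, body


if __name__ == "__main__":
    for build in (rest_request, soap_request):
        method, url, headers, body = build("New York")
        print(method, url)
        print(headers)
        print(body)
```

The REST request can be typed into a browser address bar. The SOAP one needs the envelope, the namespaces and the extra header to all line up before the server will even look at the city name, and that is before any WSDL enters the picture.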
In his presentation, Mr. Ruby points to PHP as an alternative model. PHP is not really a pretty programming model, and it tends to shy away from enforcing higher-level abstractions in favor of lower-level models. But it is precisely for this reason that PHP has succeeded where many other dynamic web application programming models have failed. Indeed, as the page Do You PHP? puts it:
Database abstraction is mostly a myth. There is nothing wrong with direct database calls’ making use of all the tricks and cheats your chosen database has to offer, to tweak as much performance as possible out of it.
And this is the kicker here. Complicated database models tend to abstract away the lower level details of individual databases, even when those details can be the difference between fast and sluggish performance. Remote invocation models like SOAP abstract away the details of the underlying calling mechanism, even when tweaking HTTP requests can be the difference between a fast web app and a slow one. The lower-level mechanism might require more work for the developer, it might be ugly as sin, and it might be wrong and inefficient on some levels, but it works, and more importantly, it works quickly. Because ultimately, the customer doesn’t care what technology is used on the backend, they just want their data fast.
Bringing it back to my own experience, I once had to work on an application where we searched a remote document repository via SOAP and got metadata back on matching documents in XML. The service would send back all matching results for a query, and in some cases this meant 150000 documents arriving over 5 minutes of streaming XML. Knowing that the user would probably want to see data as quickly as possible, I decided it would be good to render whatever data I had after receiving 10000 documents and tell the user to perhaps narrow their search down. And I tried to do this in SOAP in Visual Studio, and I failed. I could error out completely, but the mechanism had abstracted away the underlying XML-over-HTTP communication so thoroughly that I had no hooks to get at the lower level.
And so, I chucked it out and went back to basics. The outgoing call to the remote service was faked by filling in an XML string I grabbed from the original SOAP call via a packet sniffer. On the return, my program screams through the data with stream-based XML parsing. Once it hits 10000 records, it stops reading the input stream, displays what it has, and tells the user that they need to narrow their search criteria. All within 30 seconds.
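For the curious, the read-until-the-limit trick looks roughly like this. It is a sketch in Python rather than the code I actually wrote, and it assumes each result arrives as a `document` element with simple child fields and no XML namespaces; the function name and the element names are invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def read_first_records(stream, record_tag="document", limit=10000):
    """Collect up to `limit` records from an XML stream, then stop reading.

    Returns (records, truncated), where truncated is True if at least one
    more record was waiting beyond the limit.
    """
    records = []
    truncated = False
    # iterparse hands back each element as soon as its closing tag is seen,
    # so the whole response never has to be loaded into a tree first.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != record_tag:
            continue
        if len(records) >= limit:
            # One record past the limit: there is more, so stop reading.
            truncated = True
            break
        records.append({child.tag: child.text for child in elem})
        elem.clear()  # drop the children we have already copied out
    return records, truncated


if __name__ == "__main__":
    # Stand-in for the HTTP response body; a real caller would pass the
    # file-like object for the open connection instead.
    fake = "<results>" + "".join(
        f"<document><id>{i}</id><title>Doc {i}</title></document>"
        for i in range(25)
    ) + "</results>"
    records, truncated = read_first_records(io.BytesIO(fake.encode()), limit=10)
    print(len(records), truncated)
```

The point is that the caller decides when to stop. Once the limit is reached the loop simply ends, and whatever is left on the wire never has to be parsed.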
Those people who use SOAP are probably horrified by this and would rightly point out that I had to do a lot of lower-level work to get some of the same functionality promised by the one-click higher-level SOAP abstraction. But I think that’s precisely the problem. I needed something to work, and I didn’t much need it to be pretty. The SOAP framework locked me into a particular model, and once I needed more than that, it failed to deliver. Abstraction is nice, but abstraction always reduces speed, and abstraction always reduces flexibility. In most cases, it is possible to strike a balance where the abstraction helps more than it hurts, but until SOAP is able to handle the heavy lifting needed for serious web work, it isn’t worth my time.