Open source software security

Why the EU Will Always be Cooler than a Mashup

Ok, once again we return to the topic du jour, defiling mashups, this time in response to Chris' blog. As a side note, I've enabled anonymous comments for now; let me know if there are still problems (my opinion of Drupal is declining the more I use it). I'm tempted to take a lot of different avenues in explaining why I would strongly recommend against mashups, and there's a strong pull to use anecdotal evidence, but I think I'll stick to straightforward analysis. By this I don't just mean business analysis, but also engineering analysis.

When building any product there are factors that you want to maximize for your client (even if that client is yourself). The two factors relevant to this analysis are reliability and availability. These factors do not always align, but most of the time it is easy to maximize one without impinging on the other. Let us examine mashups by this yardstick, since it is a fair model to apply: good software meets both of these requirements.

Reliability: Because mashups rely on an external data source they are inherently unreliable. There are two main factors to this aspect of mashups. The first is the remote data source. Without any assurance (business or otherwise), the mashup engineer cannot assume that the API, data source, or data format will remain consistent. If a source decides to upgrade then, because their data is being made available for free, there is no requirement to forewarn users. Also, because the data is remote, there is no persistence. That is, the data changes from one moment to the next and isn't stored anywhere in the mashup. This means that if the data stream runs dry the mashup can't even hold its state at the last good data input. This makes the mashup extremely volatile: the developer can't guarantee that the mashup will present consistently from one moment to the next.

The second main factor in this equation is that mashups distort the traditional client-server model by multiplying the server nodes.
Even if the mashup collates data on the server side, the mashup server must still contact other data nodes in order to produce the mashup. While this method may be safer, if only because the mashup server could handle data deficiencies more gracefully, the added node relationships decrease the reliability of the mashup as a whole. If any of the nodes experiences problems, either natively or in delivery, the mashup could fail. This risk compounds with every source node added to the equation. Furthermore, if the mashup gathers its data on the client side (using JavaScript, AJAX, etc.), it is further complicated by stretching the browser model to support multiple concurrent data sources, which becomes problematic for sustainability, bandwidth, and browser compatibility. The reliability problem gets worse because the developer has no control over the client's scripting environment, and mashup failures are likely to degrade less gracefully.

Availability: Closely tied to reliability is the issue of mashup availability. Because the mashup relies on remote data sources piped over the network, and because its composition depends on APIs that must not change, no guarantee of availability can be made at all. Worse than that, the developer can't even take concrete steps to help ensure availability. They cannot introduce redundancy, error checking, or any of the traditional safeguards against downtime, because the data source sits on remote networks they do not control. In essence, the data is provided for free, and its reliability is worth what the consumer pays for it. For any engineer to make any guarantees as to the lifespan or availability of a mashup would be patently unethical in my opinion. To offer a mashup to a client would be a travesty. Sure, it might appear to work at one moment, but at the very next it could crumble.
The client has no guarantee that the mashup will persist beyond the moment and, worse than merely becoming static, the mashup would actually fail if and when the data source became unusable. Like using imagery from a remote site, you're never sure whether your viewers will see a broken image link or the completed composition.

Furthermore, I'd like to provide some specific rebuttals to arguments made for mashups:

1. Mashups are like open source (Drupal, Rails, etc.). This is just wrong. Open source is about providing code free of charge, not data. Code is mobile, executable, non-volatile data. The source of the code you download doesn't change after you've downloaded it. This means open source code can make guarantees as to availability - mashups cannot make such guarantees.
2. Remember a time when Google went down? Yes - What, just once? Isn't that enough? No? Ok, here's another. These two examples don't even include localized network outages that might cause Google to disappear for small portions of the internet.
3. A data source wouldn't change their API without notice - From the Google Maps Terms of Use: "Google reserves the right to release subsequent versions of the API and to require You to obtain and use the most recent version"... i.e. the Google API may change without notice.
4. "Mashups are transitory? Sure, and so's any application." There is a large distinction I'd like to draw between an application and data. An application is not transitory. It will continue to run as long as the hardware architecture remains the same. Data on the internet is transitory; the word processor on your laptop is not (although your laptop may be transitory depending on how you treat it or who made it (flammability is a secondary argument)).
5. "Build-don't-buy mentality". I'd like to say that using mashups isn't part of this equation. It's not about building or buying - I think both approaches are perfectly legitimate.
Mashups make use of free data with none of the guarantees that either building or buying could provide. If you build a piece of software you govern the spec, you control the development, and you guide the product. If you buy a piece of software you enter into a financial arrangement with the vendor and are afforded the legal protections that come with such a transaction. Mashup data is 'found data' with no legal protections or guarantees. That's why serious businesses that want to use a Google Maps API pay for the Enterprise license - that's why such a license exists!
6. Drawing geopolitical analogies to software is inflammatory and I'll leave it at that ;)
7. The hours of work spent on the GeorgeSchool campus map ensured that the map would persist. We entered into a contractual arrangement with the buyer to provide software that we could guarantee would work. By building the software we could make that guarantee. If we had either fully apprised the client of the fact that a Google Maps based solution might stop working at any time without notice, or paid for an enterprise license, then I would have condoned it. To build a Google Maps based solution without doing either, though, would have been irresponsible.
8. Lack of adoption doesn't mean lack of potential - This is exactly why I'm still betting on Java Applets. Come on Applets, you can do it!
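
As a footnote to the reliability argument above, here is a rough sketch of how the odds stack up as source nodes are added. It assumes every source fails independently and that the mashup only works when all of them are up; the 99% uptime figure and the function name `combined_availability` are made up for illustration, not measurements of any real service.

```python
from math import prod


def combined_availability(availabilities):
    """Chance that every source is up at the same moment.

    Assumes sources fail independently of one another and that the
    mashup needs all of them; real outages are often correlated, so
    treat this as a back-of-the-envelope estimate only.
    """
    return prod(availabilities)


# Hypothetical figure: each remote source is up 99% of the time.
PER_SOURCE_UPTIME = 0.99

for node_count in (1, 2, 5, 10):
    uptime = combined_availability([PER_SOURCE_UPTIME] * node_count)
    print(f"{node_count} source(s): mashup fully up {uptime:.2%} of the time")
```

Under these assumptions the combined figure can never be better than the weakest source, and it drops with every source added.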