Thursday, November 15, 2007

On Addressability of Data on the Web

I don't understand data sharing solutions that require centralization of data stores. Centralization is an organizational artifact, not a technical one. "The way to solve this is...everyone stop putting you data there and start putting it here." By the very nature of the web, this requirement will never be fully satisfied. There will always be something not in the "master database" that is relevant. And how do you consume data from multiple sources, i.e., multiple perspectives (ex., operational, financial, organizational)? Centralization is seldom even possible, by design, in classified environments and highly improbable when sharing data potentially leads to losing your budget. If data exists on the web and needs to be shared then the way to share is clearly pub/sub; to syndicate query results over HTTP. Probably, but not necessarily, XML will be used to carry data. Technically this should be a no-brainer. Wrap database stored procedures with methods on a Web Service and point to them with URLs. Everything we share on the web we share over HTTP using URLs for addressability. Addresses are fundamental to how data is managed in computers down to the hardware. Why should data on the web be any different? They aren't:

[Note: These links are illustrative. They don't work]

The results of these queries are usable by both people and software.

Sidebar: Just like hardware, data addresses aren't particularly people-friendly. Look to the ideas behind semantic web to help with that.

No comments: