Page:Aaron Swartz s A Programmable Web An Unfinished Work.pdf/64

8. CONCLUSION: A SEMANTIC WEB?

people at home with their weak data brew.) Which is why the Web invented APIs, letting us share the data with everyone who could think of a use for it.

Now we're not just a simple website, pushing pages to the browser and providing a "See also:"-style list of other pages one can visit. We're actually exchanging the data itself from application to application, making possible a new world of mashups and intelligent applications.

This is an entirely new notion of the tapestry—a tapestry of data instead of a tapestry of documents. Documents can't really be merged and integrated and queried; they serve mostly as isolated instances to be viewed and reviewed. But data are protean, able to shift into whatever shape best suits your needs.

But as our needs grow more varied, we need better ways to get at the data that will best serve them. Which is where our queries and dumps come in. No longer are we hampered by only being able to ask the questions a site's programmers have expected and accounted for; now we can ask whatever questions we like, or do processing that can't even be put in the form of a question at all. And once we combine these dumps from different data sources, the possibilities are endless.
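To make the idea of combining dumps concrete, here is a minimal sketch in Python. It assumes two small dumps from two different sites, loads them into an in-memory SQLite database, and asks a question neither site would have anticipated. The table names, column names, sample rows, and function names are all illustrative assumptions, not taken from any real site.

```python
import sqlite3

# Hypothetical dump from a book-catalog site: (title, author, year published).
BOOKS_DUMP = [
    ("Moby-Dick", "Herman Melville", 1851),
    ("Walden", "Henry David Thoreau", 1854),
    ("Emma", "Jane Austen", 1815),
]

# Hypothetical dump from a biography site: (name, birthplace).
AUTHORS_DUMP = [
    ("Herman Melville", "New York City"),
    ("Henry David Thoreau", "Concord"),
    ("Jane Austen", "Steventon"),
]


def load_dumps():
    """Load both dumps into one in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, author TEXT, year INTEGER)")
    conn.execute("CREATE TABLE authors (name TEXT, birthplace TEXT)")
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", BOOKS_DUMP)
    conn.executemany("INSERT INTO authors VALUES (?, ?)", AUTHORS_DUMP)
    return conn


def titles_with_birthplace_after(conn, year):
    """Return (title, author's birthplace) for books published after `year`.

    Answering this requires joining data from both dumps.
    """
    rows = conn.execute(
        "SELECT b.title, a.birthplace "
        "FROM books AS b JOIN authors AS a ON a.name = b.author "
        "WHERE b.year > ? ORDER BY b.year",
        (year,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = load_dumps()
    for title, birthplace in titles_with_birthplace_after(conn, 1850):
        print(title, "-", birthplace)
```

Neither site has to offer a "books by author's birthplace" page; once both dumps sit in the same database, the question is a single join.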

But where do we go from here?

Obviously the first step is to take the large dumps we've all made and load them into one big database. And, of course, we've started to see people do that, from research projects to commercial efforts like Metaweb's Freebase. Freebase is an enormous collaborative Web-editable RDF-like database, prepopulated with data extracted from Wikipedia and numerous other sources and supplemented with the contributions of various users. Freebase is still quite small, but its aims are ambitious: to create a database that combines numerous different sources and to provide it as a backend to people who want to build more intelligent applications.

Ideally, of course, intelligent applications won't be dependent on a single commercial site, like Freebase, but will merge and combine knowledge from various sites across the Web, crawling and trawling for more useful information and deciding which bits of it to trust.

Already we're seeing things like this in research projects. One of the most exciting Semantic Web tools is a program called cwm, hacked together between (or during) meetings by Sir Tim Berners-Lee himself. Cwm (pronounced coom) is one of the most amazing programs I've seen; it's a veritable data Swiss Army knife, all built on RDF.

Of course it does all the basics—reading and writing RDF files of various formats, combining multiple files, printing all the results out in a pretty format.
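As a rough illustration of what "combining multiple files" means for RDF-style data, the sketch below treats each file as a set of (subject, predicate, object) triples and merges by set union. This is not cwm's code or its interface; the line format is a simplified stand-in for N-Triples, the `ex:` identifiers and function names are hypothetical, and blank nodes (which a real RDF merge must keep distinct) are ignored.

```python
# Two hypothetical input files in a simplified one-triple-per-line format:
#   subject predicate object .
FILE_A = """
<ex:cwm> <ex:author> <ex:timbl> .
<ex:cwm> <ex:builtOn> "RDF" .
"""

FILE_B = """
<ex:cwm> <ex:builtOn> "RDF" .
<ex:timbl> <ex:name> "Tim Berners-Lee" .
"""


def parse_triples(text):
    """Parse the simplified format into a set of (subject, predicate, object)."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("."):
            line = line[:-1].strip()  # drop the statement terminator
        # Split on the first two runs of whitespace only, so an object
        # literal containing spaces stays in one piece.
        subject, predicate, obj = line.split(None, 2)
        triples.add((subject, predicate, obj))
    return triples


def merge(*graphs):
    """Combine graphs by set union; a triple stated twice is kept once."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def serialize(triples):
    """Write triples back out, one per line, in sorted order."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in sorted(triples))


if __name__ == "__main__":
    combined = merge(parse_triples(FILE_A), parse_triples(FILE_B))
    print(serialize(combined))
```

Because every statement is a self-contained triple, combining sources needs no schema negotiation: the statement that appears in both files simply collapses into one.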