Ruminations on the Geo-Semantic Web
Along with several other SNI-ites, I recently gave a presentation at the 2009 GITA Geospatial Infrastructure Solutions Conference. Mine was entitled “The Geo-Semantic Web, Looking Beyond the Buzzwords,” a topic that has been on my mind for some months. I was rather hoping to post a link to the recorded session video; however, the recordings are not yet available, and apparently GITA is charging for them. Not quite sure how I feel about that, but perhaps that’s a topic for another time.
In any case, I thought I would finally take the time to summarize some of my thoughts and opinions on the matter, and why I think the notion of the semantic web, in general, is important, as well as what we might stand to gain by adding a little dash of geo into it.
Read past the jump for all the fun.
I’m going to try to steer clear of the technical mumbo-jumbo here and instead summarize some of the main points from my GITA presentation.
Firstly, it seems like we’re all constantly inundated with some fancy-schmancy new set of buzzwords or catch phrases that we’re supposed to be on the lookout for, since they will all apparently be the “next big thing.”
Web 3.0 anybody? Heck, I’m still reeling over all the hoopla surrounding Web 2.0! But, I think it’s important to take a bit of a step back here and look at what this Web 2.0 stuff really is (or was… should I be using the past tense already? I’ll leave that as an exercise to the reader).
You can read through that linked Wikipedia article for what amounts to the official definition, I suppose, but here’s my take on it:
Technologically, not much changed or happened in the underlying web infrastructure.
Now, before I start a flame war there, yes, I know there were some changes, some things evolved and were made at least different, if not better, but when you look at the actual nuts and bolts of “The Web” it’s pretty much the same as it was during the glory days of Web 1.0 (before we even knew it had a version number, ah… ignorance was bliss, was it not?).
However, most of what really occurred was a fundamental shift in thinking. People and organizations started seeing the web more as an application platform, and not just a platform, but oftentimes a preferred platform. What that led to was the creation of actual web applications as opposed to web sites. Instead of just reading articles, searching for information, or browsing picture galleries, we’re actually using web applications to do, well, you know, real stuff.
This is not unlike all of the brouhaha surrounding Web 3.0 and the Semantic Web (which are often lumped together). The technology to do most of this “semantic stuff” already exists, and has for quite some time. RDF, for example, which serves as an underlying structural framework for most things semantic, became a W3C recommendation back in 1999. In fact, even though more recent definitions (specifically Wikipedia’s, which is, after all, pretty much the font of all human knowledge :) ) note that the acronym RSS is:
“most commonly translated as ‘Really Simple Syndication’”
the original version of the specification, published by Netscape way back in 1999, notes that the acronym stands for “RDF Site Summary.” So, I suppose if you want to get all technical about it (as I am wont to do, being some sort of über geek and all), if you were subscribing to and/or publishing RSS feeds back in 1999, you were pretty much using the Semantic Web version 1.0, so give yourself a big pat on the back for being such a forward-thinking early adopter!
True, we are starting to see some of the technologies mature (triple stores for example), and see wider adoption, but the underlying ideas and frameworks have been around for quite some time.
If the semantic web is to take off in a big way and become ubiquitous (as I am fairly convinced it will), it will be the result of a shift in thinking and perception more so than a rapid and radical leap in technology. Granted, that leap is bound to occur alongside all that semantic ubiquity (oooh… Semantic Ubiquity, band name?), but that shift in thinking is the important bit.
And what would that shift involve (in my opinion, at least)? Well, it’s really about starting to blur the lines between “data” (things lying around in relational databases, spreadsheets, XML documents, etc.) and “content.” After all, we’ve got quite a bit of both lying around in one form or another, but in a rather substantial preponderance of cases, one would find it difficult to use that data without first combining it with some form of “content,” or converting that content into some sort of normalized “data” that can be manipulated, queried, sorted, reported on, yada yada yada.
What the whole semantic web movement is attempting to do is blur those lines and build those bridges, so that you can break out of the typical mold of seeing “data” as rows in a table and “content” as a big blob of words or numbers without any structure and without any meaning (meaning to a computer, that is). The goal is for the two to be transparently intermingled and used together, ultimately making everyone’s lives easier (knock on wood) and, most importantly, reducing the amount of time and effort it takes to combine all of these disparate bits of content and data into actionable intelligence.
That being said, this is not an easy problem to solve, especially when we’re talking about the “plain” web, as opposed to the GeoWeb (yeah, that’s right, I made you read allll the way down here before I even started getting to the “geo” part!).
As I mentioned, I’m not going to delve into the technical nitty gritty of some of the current and/or proposed work being done on the “plain” web side in that respect (with one exception: DBpedia, which I happen to think is pretty spiffy, given my obvious affection for all things semantic and Wikipedia). However, getting back to the “geo” part, I think we may be a bit ahead of the game here.
The non-geo web is going to continue to evolve more and more towards a semantically enabled and linked infrastructure, but as those efforts march on, I think the geospatial crowd would be doing both themselves and the rest of our web family a service by beginning to think about our data (and content!) in terms of semantics.
Sound hard? Well, to be honest, yeah, there are challenges. However, think about it. Your average bits and pieces of geodata are already structured by their very nature. Whether that’s a shapefile, a KML file (with SchemaData of course!), GeoRSS, or anything else along those lines, a good bit of the work required to “semantify” the data is already there! The hardest part is starting to think about things not in terms of structured tables and databases, but as semantic graphs. Or, put more succinctly, think about embedding more meaning into your data. Beginning to think about things in this way is by far the biggest challenge I have had as I have been digging into, researching, and attempting to use and build things around semantic technologies, as I find myself so used to thinking about things in terms of tables and joins and rows and columns and so on and so forth. It has proven incredibly difficult to re-train my brain to think about what things mean and how to describe those relationships to a computer, even though one would think it might be the more natural way of doing it.
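To make the “tables versus graphs” idea a little more concrete, here is a minimal sketch in plain Python of how one row from an attribute table might be restated as subject/predicate/object statements (the same basic shape RDF uses). The hydrant record, its field names, and the row_to_triples helper are all made up for illustration; a real project would more likely lean on an RDF library and a shared vocabulary.

```python
# A hypothetical attribute-table row for a single hydrant feature.
# Field names and values are invented for this example.
row = {
    "id": "hydrant_17",
    "flow_gpm": 1200,
    "main_id": "main_4",
    "lon": -104.99,
    "lat": 39.74,
}


def row_to_triples(row, id_field="id"):
    """Restate one table row as (subject, predicate, object) triples.

    The row's identifier becomes the subject, each remaining column
    name becomes a predicate, and each cell value becomes an object.
    """
    subject = row[id_field]
    return [
        (subject, column, value)
        for column, value in row.items()
        if column != id_field
    ]


triples = row_to_triples(row)
for triple in triples:
    print(triple)
```

Nothing about the data itself changed; the only difference is that each fact now stands on its own and can be linked to facts from somewhere else, which is where the meaning starts to creep in.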
If we can get there, though, it opens up all kinds of doors for future applications, data interoperability, and analysis. Think about the sorts of things you might be able to do if, for example (disclaimer: I am notoriously terrible at coming up with relevant, let alone good, examples on the spur of the moment), instead of having a big ol’ geodataset full of various bits and pieces of information on fire hydrants and their locations, you were also embedding or linking meaning, or had some already there, automagically imported from another source. Meaning such as what fire hydrants do, that their flow rates relate to water, which comes from municipal sources, which are also used to provide H2O to nearby houses, which affects the pressure in those locations…
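As a rough sketch of that hydrant example, the snippet below stores a handful of made-up statements as triples in plain Python and follows the links from a hydrant to the houses that share its water main. Every identifier and predicate name here (draws_from, supplied_by, and so on) is hypothetical, and a real setup would presumably use a triple store and a query language such as SPARQL rather than list comprehensions.

```python
# Hypothetical statements about a tiny water network, written as
# (subject, predicate, object) triples. All names are invented.
TRIPLES = [
    ("hydrant_17", "is_a", "FireHydrant"),
    ("hydrant_17", "draws_from", "main_4"),
    ("main_4", "is_a", "WaterMain"),
    ("main_4", "fed_by", "municipal_supply"),
    ("house_a", "supplied_by", "main_4"),
    ("house_b", "supplied_by", "main_4"),
    ("house_c", "supplied_by", "main_9"),
]


def objects(subject, predicate):
    """Everything that `subject` points at via `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    """Everything that points at `obj` via `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def houses_affected_by(hydrant):
    """Houses supplied by any main that the given hydrant draws from."""
    affected = []
    for main in objects(hydrant, "draws_from"):
        affected.extend(subjects("supplied_by", main))
    return sorted(affected)


print(houses_affected_by("hydrant_17"))
```

The point is less the code than the fact that the relationships (“draws from,” “supplied by”) live in the data itself, so the question “whose water pressure might this hydrant affect?” becomes a matter of following links rather than something a human has to work out by hand.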
These are the sorts of things we might be doing and attempting to figure out and analyze today, but it is still us, the humans, who ultimately have to use our noggins, or some unwieldy and complex task-specific tools and software, to derive, describe, and/or use that meaning in order to make decisions. If we can cut down that lead time by even 5-10%, wouldn’t that give us a lot more time to get into doing some really crazy, new, difficult, and interesting stuff?
That’s my $0.02 worth anyway, and, granted, I glossed over a LOT of stuff here and it still ended up being a short novel, but I’ll do my best to start elaborating on some more specific examples and topics here in the near future :)

