Stitching the Semantic Web together with OWL at AAAI-06

Submitted by connolly on Fri, 2006-08-11 16:06.

I was pleased to find that AAAI '06 in Boston a couple of weeks ago drew a whole spectrum of people, some I know and some I don't, and work ranging from near to far from my own. The talk about the DARPA grand challenge was inspiring.

But closer to my work, I ran into Jeff Heflin, whom I worked with on DAML and especially the OWL requirements document. Amid too many papers about ontologies for the sake of ontologies, and threads like Is there real world RDF-S/OWL instance data?, his paper An Investigation into the Feasibility of the Semantic Web is a breath of fresh air. The introduction sets out their approach this way:

Our approach is to use axioms of OWL, the de facto Semantic Web language, to describe a map for a set of ontologies. The axioms will relate concepts from one ontology to the other. ... There is a well-established body of research in the area of automated ontology alignment. This is not our focus. Instead we investigate the application of these alignments to provide an integrated view of the Semantic Web data.

(emphasis mine). The rest of the paper justifies this approach, leading up to:

We first query the knowledge base from the perspective of each of the 10 ontologies that define the concept Person. We now ask for all the instances of the concept Person. The results vary from 4 to 1,163,628. We then map the Person concept from all the ontologies to the Person concept defined in the FOAF ontology. We now issue the same query from the perspective of this map and we get 1,213,246 results. The results now encompass all the data sources that commit to these 10 ontologies. Note: a pair wise mapping would have taken 45 mapping axioms to establish this alignment instead of the 9 mapping axioms that we used. More importantly due to this network effect of the maps, by contributing just a single map, one will automatically get the benefit of all the data that is available in the network.

That's fantastic stuff. (With 10 ontologies, pairwise alignment needs 10 choose 2 = 45 mapping axioms, while mapping each of them to FOAF's Person needs only 9.)

We now pause for a word from Steve Lawrence of NEC Research Institute, to lament the lack of free online proceedings for AAAI: "Articles freely available online are more highly cited. For greater impact and faster scientific progress, authors and publishers should aim to make research easy to access." OK, now back to the great paper...

Along the way, they give a definition of a knowledge function, K, that is remarkably similar to log:semantics from N3. They also define a commitment function that is basically the ontological closure pattern.
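To make the closure idea concrete: start from one document, then keep fetching the documents that define the classes and properties it uses, until nothing new turns up. Here is a minimal sketch of that pattern using the Python rdflib library; the function names, the fetch limit, and the fragment-stripping heuristic are mine, not anything from the paper or from N3.

from rdflib import Graph, URIRef, RDF

def defrag(uri):
    # fetch the defining document, not the individual term
    return uri.split('#')[0]

def ontological_closure(seed_uri, limit=20):
    graph, fetched, queue = Graph(), set(), [seed_uri]
    while queue and len(fetched) < limit:
        doc = queue.pop()
        if doc in fetched:
            continue
        fetched.add(doc)
        try:
            graph.parse(doc)   # roughly K(doc): the triples the document yields
        except Exception:
            continue           # skip documents that fail to fetch or parse
        # every property and class used may be defined elsewhere; queue those docs
        for s, p, o in graph:
            queue.append(defrag(str(p)))
            if p == RDF.type and isinstance(o, URIRef):
                queue.append(defrag(str(o)))
    return graph

# seed it with any RDF document URI; the schemas its terms come from get
# pulled in too, up to the fetch limit.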

The approach to querying all this data is something they call DLDB, which comes from a paper they submitted to the ISWC Practical and Scalable Semantic Systems workshop. Darn! No full-text proceedings online again. Ah... Jeff's pubs include a tech report version. To paraphrase: there's a table for each class and a table for each property that relates rows from the class tables. They use a DL reasoner to find subclass relationships, and they turn those into views. I have never seen this approach before; it sure looks promising. I wonder if we can integrate it into our dbview work somehow, and perhaps into our truth-maintenance system in the TAMI project.
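Here is a rough sqlite sketch of that layout, with invented class and property names, not DLDB's actual schema: a table per class, a table per property, and a view per class that unions in the tables of its reasoner-derived subclasses.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# one table per class, holding instance URIs
for cls in ("Person", "Student", "Professor"):
    cur.execute(f"CREATE TABLE {cls}_t (instance TEXT PRIMARY KEY)")

# one table per property, relating rows from the class tables
cur.execute("CREATE TABLE advisor_t (subj TEXT, obj TEXT)")

# suppose a DL reasoner reports Student and Professor as subclasses of Person;
# that inference becomes a view, so a query against Person sees all of them
cur.execute("""
    CREATE VIEW Person_v AS
        SELECT instance FROM Person_t
        UNION SELECT instance FROM Student_t
        UNION SELECT instance FROM Professor_t
""")

cur.execute("INSERT INTO Student_t VALUES ('http://example.org/alice')")
print(cur.execute("SELECT instance FROM Person_v").fetchall())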

This wasn't the only work at AAAI on scalable, practical knowledge representation. I caught just a glance at some other papers at the conference that exploit Wikipedia as a dataset in various algorithms. I hope to study those more.

I also ran into Ben Kuipers, whose Algernon and Access-Limited Logic have long appealed to me as an approach to reasoning that might work well when scaled up to Semantic Web data sets. That work is mostly on hold; we started talking about getting it going again, but didn't get very far into the conversation. I hope to pick that up again soon.

I gather the 1.0 release of OpenCyc happened at the conference; there's a lot of great stuff in Cyc, but only time will tell how well it will integrate with other Semantic Web stuff.

Meanwhile, a handy citation for Heflin's paper...

That's marked up using an XHTML/LaTeX/BibTeX idiom that I'm working on so that we get BibTeX for free:

@inproceedings{pan06a,
    title = "{An Investigation into the Feasibility of the Semantic Web}",
    author = {Z. Pan and A. Qasem and J. Heflin},
    booktitle = {Proc. of the Twenty-First National Conference on Artificial Intelligence (AAAI 2006)},
    year = {2006},
    address = {Boston, USA},
}

on Wikimania 2006, from a few hundred miles away

Submitted by connolly on Thu, 2006-08-10 16:26.

Wikimania 2006 was last week in Boston; I had it on my travel schedule, tentatively, months in advance, but I never came up with a solid justification, and there were conflicts, so I ended up not going.

I was very interested to see the online participation options, but I didn't get my hopes up too high, because I know that ConnectingAudiences is challenging.

I tried to participate in the transcription stuff in real time; installation of the Gobby collaborative editor went smoothly enough (it looks like an interesting alternative to SubEthaEdit, though it's client/server, not peer-to-peer; they're talking about switching to the Jabber protocol...) but I couldn't seem to connect to any sessions while people were active in them.

The real-time video feed of mako on a definition of Freedom was surprisingly good, though I couldn't give it my full attention during the work day. I didn't understand the problem he was speaking to (isn't GFDL good enough?) until I listened to Lessig on Free Culture and realized that CC share-alike and GFDL don't interoperate. (Yet another reason to keep the test of independent invention in mind at all times.)

Lessig read this quote, but only referred to the author using a photo that I couldn't see via the audio feed; when I looked it up, I realized there was a gap in this student's free culture education:

If we don't want to live in a jungle, we must change our attitudes. We must start sending the message that a good citizen is one who cooperates when appropriate, not one who is successful at taking from others.

RMS, 1992

Several sessions on the Wikipedia process look particularly interesting; I hope to find time to see or listen to a recording.

I bumped into TimBL online and reminded him about the Wikipedia and the Semantic Web panel; he had turned it down because of other travel obligations, but he just managed to stop by after all. I hope it went all right; he was pretty jet-lagged.

I see WikiSym 2006 coming up August 21-23, 2006 in Odense, Denmark. I'm not sure I can find justification to make travel plans on just a few weeks' notice. But Denny's hottest conference ever item burns like salt in an open wound and motivates me to give it a try. It looks like the SweetWiki folks, who participate in the GRDDL WG, will be there; that's the start of a justification...

tabulator maps in Argentina

Submitted by connolly on Mon, 2006-08-07 11:39.

My Spanish is a little rusty, but it looks like inktel is having fun with the tabulator's map support too.

tags pending: geo, tabulator

OpenID, verisign, and my life: mediawiki, bugzilla, mailman, roundup, ...

Submitted by connolly on Mon, 2006-07-31 15:45.

Please, don't ask me to manage another password! In fact, how about getting rid of most of the ones I already manage?

I have sent support requests for some of these; the response was understandable, if disappointing: when Debian/Ubuntu supports it, or at least when the core Mailman/MediaWiki folks support it, we'll give it a try. I opened Issue 18: OpenID support in roundup too; there are good OpenID libraries in Python, after all.

A nice thing about OpenID is that the service provider doesn't have to manage passwords either. I was thinking about where my OpenID password(s) should live, and I realized the answer is: nowhere. If we put the key fingerprint in the OpenID persona URL, I can build an OpenID server that does public-key challenge-response authentication and doesn't store any passwords at all.
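Here is a toy sketch of that idea, not an OpenID implementation and not how the protocol works on the wire: the persona URL carries only a public-key fingerprint, the server issues a nonce, the client signs it, and the server verifies against the published fingerprint. It assumes the Python cryptography package; all the names are mine.

import os, hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# user side: the key pair stays with the user; only a fingerprint is published
private_key = Ed25519PrivateKey.generate()
public_bytes = private_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
fingerprint = hashlib.sha256(public_bytes).hexdigest()   # embedded in the persona URL

# server side: challenge
nonce = os.urandom(32)

# user side: response
signature = private_key.sign(nonce)

# server side: check the key against the fingerprint, then verify the signature
assert hashlib.sha256(public_bytes).hexdigest() == fingerprint
private_key.public_key().verify(signature, nonce)   # raises InvalidSignature if forged
print("authenticated; the server stored no password at all")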

As I sat down to tinker with that idea, I remembered the VeriSign Labs OpenID service and gave it a try. Boy, it's nice! They use the user-chosen-photo anti-phishing trick and provide nice audit trails. So it will probably be quite a while before I feel the need to code my own OpenID server.

I'm still hoping for Mac Keychain support for OpenID. Meanwhile, has anybody seen a nice GNOME applet for keeping the state of my ssh-agent credentials and my CSAIL Kerberos credentials visible?

Slicing and dicing web data with Tabulator

Submitted by timbl on Wed, 2006-07-26 10:41.

There is a new version, 0.7, of the Tabulator out. This is the generic data browser that lets you do useful things with your RDF data the moment it's on the web.

It works by exploring the web of relationships between things, loading more data from the web as you go. Then, if you find a pattern of information you are interested in, it will search for all occurrences of that pattern and display them in tables, maps, calendars, and so on.

In the same session, you can explore, say, some geocoded photos taken on a trip with a GPS, and then separately explore where in the world the tabulator developers are based. Then you can project both datasets onto the same map, or onto the same calendar for data with a time component. This shows the cross-domain power of the semantic web.

This means you can correlate data from completely different domains. Think of all the different mash-ups people have made for putting things like friends' houses, photos, or coffee shops on the web. Each is a different mash-up for a different data source.

For data in RDF (or any XML with a GRDDL profile), though, you don't have to program anything. You can just explore it and map it. And you can map many different data sources at the same time.

Oh, and for developers, the core of the tabulator is an open-source RDF library with a complete, tested RDF/XML parser, a store which smushes on owl:sameAs and owl:[Inverse]FunctionalProperty, and a web-crawling query engine supporting basic SPARQL. Enjoy.
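For the curious, "smushing" means merging nodes that an owl:sameAs statement, or a shared value for an inverse functional property such as foaf:homepage or foaf:mbox, identifies as the same thing. A minimal sketch of the IFP case in Python, assuming nothing about the tabulator's actual JavaScript:

# nodes sharing a value for a property treated as inverse functional are
# merged into one node; owl:sameAs smushing works the same way, with the
# sameAs object taken as the shared key
IFPS = {"foaf:homepage", "foaf:mbox"}

triples = [
    ("_:a", "foaf:homepage", "http://example.org/~alice"),
    ("_:a", "foaf:name", "Alice"),
    ("_:b", "foaf:homepage", "http://example.org/~alice"),
    ("_:b", "foaf:knows", "_:c"),
]

canonical = {}    # node -> canonical node
seen = {}         # (ifp, value) -> first node seen with that value
for s, p, o in triples:
    if p in IFPS:
        canonical[s] = seen.setdefault((p, o), s)

smushed = {(canonical.get(s, s), p, canonical.get(o, o)) for s, p, o in triples}
print(sorted(smushed))    # _:a and _:b have collapsed into a single subject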

Comments Disabled

Submitted by ryanlee on Tue, 2006-07-25 14:51.

Due to an overwhelming signal-to-noise ratio in the wrong direction, I've disabled all anonymous commenting. We've tried to use spam auto-classification, but the volume is so large and diverse that eventually everything looks like it might be spam, and it's back to square one.

Thanks for your direct participation and input; from now on, we'll be looking for alternative ways to continue these conversations across the web.

Choosing flight itineraries using tabulator and data from Wikipedia

Submitted by connolly on Mon, 2006-07-17 18:13.

While planning a trip to Boston/Cambridge, I was faced with a blizzard of itinerary options from American Airlines. I really wanted to overlay them all on the same map or calendar or something. I pretty much got it to work:

That's a map view using the tabulator, which had another release today. The itinerary data is converted from HTML to RDF via grokOptions.xsl (and tidy).
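Roughly, that pipeline is: run tidy over the saved itinerary page to get well-formed XHTML, then apply the XSLT. A sketch of the sort of glue involved, assuming the tidy command and Python's lxml are at hand; the itinerary.html file name is just for illustration:

import subprocess
from lxml import etree

# tidy the saved itinerary page into XHTML (numeric entities keep lxml happy)
xhtml = subprocess.run(
    ["tidy", "-asxml", "-numeric", "-quiet", "itinerary.html"],
    capture_output=True, text=True).stdout

# apply the XSLT that extracts the itinerary data as RDF/XML
transform = etree.XSLT(etree.parse("grokOptions.xsl"))
rdfxml = transform(etree.fromstring(xhtml.encode()))
print(str(rdfxml))   # RDF/XML ready for the tabulator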

I can, in fact, see all the itineraries on the same calendar view. Getting these views to be helpful in choosing between the itineraries is going to take some more work, but this is a start.

Getting a map view required getting latitude/longitude info for the airports. I think getting Semantic Web data from Wikipedia is a promising approach. A while back, I figured out how to get lat/long data for airports from Wikipedia. This week, I added a Kid template, aptinfo.kid, and figured out how to serve up that data live from the DIG/CSAIL web server. For example, http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item is a URI for the Chicago airport, and when you GET it with HTTP, a little CGI script calls aptdata.py, which fetches the relevant page from Wikipedia (using an httplib2 cache), scrapes the lat/long and a few other details, and gives them back to you in RDF. Viewed with RDF/N3 glasses, it looks like:

#   Base was: http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD
@prefix : <#> .
@prefix apt: <http://www.daml.org/2001/10/html/airport-ont#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix go: <http://www.w3.org/2003/01/geo/go#> .
@prefix s: <http://www.w3.org/2000/01/rdf-schema#> .

:item a apt:Airport;
    apt:iataCode "ORD";
    s:comment "tz: -6";
    s:label "O'Hare International Airport";
    go:within_3_power_11_metres :nearbyCity;
    geo:lat "41.9794444444";
    geo:long "-87.9044444444";
    foaf:homepage <http://en.wikipedia.org/wiki/O%27Hare_International_Airport> .

:nearbyCity foaf:homepage <http://en.wikipedia.org/wiki/wiki/Chicago%2C_Illinois>;
    foaf:name "Chicago, Illinois" .

In particular, notice that:

  • I use the swig geo vocabulary, which the new GEO XG is set up to refine. The use of strings rather than datatyped floating point numbers follows the schema for that vocabulary.
  • I use distinct URIs for the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item) and the page about the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD).
  • I use an owl:InverseFunctionalProperty, foaf:homepage, to connect the airport to its Wikipedia article, and another, apt:iataCode, to relate the airport to its IATA code.
  • I use the GeoOnion pattern to relate the airport to a/the city that it serves. I'm not sure I like that pattern, but the idea is to make a browseable web of linked cities, states, countries, and other geographical items.

Hmm... I use rdfs:label for the name of the airport but foaf:name for the name of the city. I don't think that was a conscious choice. I may change that.

The timezone info is in an rdfs:comment. I hope to refine that in future episodes. Stay tuned.
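For flavor, here is my own rough sketch of that scrape-and-republish step, not the actual aptdata.py: fetch the article through an httplib2 cache, pull out decimal coordinates, and emit a few triples like the ones above. The regex is an assumption about how the coordinates happen to appear in the page.

import re, httplib2

def airport_triples(iata, wikipedia_url):
    http = httplib2.Http(".cache")                 # on-disk HTTP cache
    _, body = http.request(wikipedia_url)
    # assume decimal coordinates appear somewhere as "lat; long"
    m = re.search(r'(-?\d+\.\d+);\s*(-?\d+\.\d+)', body.decode("utf-8", "replace"))
    lat, lon = m.groups() if m else ("", "")
    return f"""@prefix apt: <http://www.daml.org/2001/10/html/airport-ont#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<#item> a apt:Airport;
    apt:iataCode "{iata}";
    geo:lat "{lat}";
    geo:long "{lon}";
    foaf:homepage <{wikipedia_url}> .
"""

print(airport_triples("ORD",
      "http://en.wikipedia.org/wiki/O%27Hare_International_Airport"))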

An Introduction and a JavaScript RDF/XML Parser

Submitted by dsheets on Mon, 2006-07-17 15:02.

My name is David Sheets. I will be a sophomore at MIT this fall. I like to be at the intersection of theory and practice.

This summer, I am working as a student developer on the Tabulator Project in the Decentralized Information Group at MIT's CSAIL. My charge has been to develop a new RDF/XML parser in JavaScript with a view to a JavaScript RDF library. I am pleased to report that I have finished the first version of the new RDF/XML parser.

Before this release, the only available RDF/XML parser in JavaScript was Jim Ley's parser.js. This parser served the community well for quite a while but fell short of the needs of the Tabulator Project. Most notably, it didn't parse all valid RDF/XML resources.

To rectify this, work on a new parser was begun. For maximum utility, a parser should be small, standards-compliant, widely portable, and fast. The result being released today is a JavaScript class that weighs in at under 400 source lines of code and 2.8K gzip-compressed (12K uncompressed).

To the best of my knowledge, RDFParser is fully compliant with the RDF/XML specification. The parser passes all of the positive parser test cases from the W3C. This was tested using jsUnit, a unit-testing framework similar to JUnit but for JavaScript. To run the automated tests against RDFParser, you can follow the steps here. This means the parser supports features such as xml:base, xml:lang, RDF Collections, XML literals, and so forth. If it's in the specification, it should be supported. An important point to note is that this parser, due to speed concerns, is non-validating. Additionally, RDFParser has been speed-optimized, resulting in code that is slightly less readable.

The new parser is not as portable as the old parser at this time. It has only been tested in Firefox 1.5 but should work in any browser that supports the DOM Level 2 specification.

RDFParser runs at a speed similar to Jim Ley's parser. One can easily construct example RDF/XML files that run faster on one parser or the other. I took five files that the tabulator might come across in day-to-day use and ran head-to-head benchmarks between the two parsers.

Parse time is heavily influenced by how compact the serialization is. The more nested the RDF/XML, the more scope frames must be created to track features from the specification; the less nested, the fewer steps it takes to traverse the DOM and the more triples come out of each DOM element.

Planned in the next release of RDFParser is a callback/continuation system so that the parser can yield in the middle of a parse run and allow other important page features to run.

API documentation for RDFParser included in the Tabulator 0.7 release is available.

Finally, I'd be happy to hear from you if you have questions, comments, or ideas regarding the RDFParser or related technologies.

Wondering about how PDFs phone home

Submitted by Danny Weitzner on Thu, 2006-07-13 21:40.

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Recently noticed this description of the Adobe Policy Server:

About Adobe LiveCycle Policy Server. Authors who protect their documents with Adobe LiveCycle Policy Server can audit what is done with each copy of the document (such as opening, printing, and editing). They can also change or revoke access rights at any time. If an author has revoked access to a document that is protected by Adobe LiveCycle Policy Server, Adobe Reader or Acrobat informs you that your access rights have been removed the next time you try to open the document.

I wonder how this works.

a walk thru the tabulator calendar view

Submitted by connolly on Tue, 2006-07-11 11:22.

The SIMILE Timeline is a nifty hack. DanBri writes about Who, what, where, when? of the Semantic Web, and in a message to the Semantic Web Interest Group, he asks

TimBL, any plans for Geo widgets to be wired into the tabulator?

Indeed, there are. Stay tuned for more on that... from the students who are actually doing the work, I hope. But I can't wait to tell everybody about the calendar view. Give it a try:

  1. Start with the development version of the tabulator.
  2. Select a source of calendar info. Morten's SPARQL Timeline uses RSS. The tabulator calendar groks dc:date, so something like W3C's main RSS feed will work fine. Put its URI in the tabulator's URI text field and hit "Add to outliner".
    add to outliner screenshot, before

    When it's done it should look something like this:

    add to outliner screenshot, after
    • For fun, open the Sources tab near the bottom. Note that the tabulator loads the RSS and DC schemas, plus all the schemas they reference, and so on; i.e. the ontological closure. Hmm... the RSS terms seem to be 404.
      sources screenshot
  3. Now navigate the outline down to one of the items.
    item screenshot
    and then re-focus (shift-click) on the rss:item class itself, and then open an item and select the date property.
    refocus screenshot
  4. Now hit "Tabulate selected properties". You'll get a table of items and their dates.
    table screenshot
  5. OK, so much for review of basic tabulator stuff. Now you're all set for the new stuff. Hit Calendar and scroll down a little:
    table screenshot

Note the Export button with the SPARQL option. That's a whole other item in itself, but for now, you can see the SPARQL query that corresponds to what you've selected to put on the calendar:

SELECT ?v0 ?v1 
WHERE
{
?v0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item> .
?v0 <http://purl.org/dc/elements/1.1/date> ?v1 .
}

Fun, huh?