Conference

Linked Data at WWW2007: GRDDL, SPARQL, and Wikipedia, oh my!

Submitted by connolly on Thu, 2007-05-17 16:29.

Last Tuesday, TimBL started to gripe that the WWW2007 program had lots of stuff that he wanted to see all at the same time; we both realized pretty soon: that's a sign of a great conference.

That afternoon, Harry Halpin and I gave a GRDDL tutorial. Deploying Web-scale Mash-ups by Linking Microformats and the Semantic Web is the title Harry came up with... I was hesitant to be that sensationalist when we first started putting it together, but I think it actually lived up to the billing. It's too bad last-minute complications prevented Murray Maloney from being there to enjoy it with us.

For one thing, GRDDL implementations are springing up all over. I donated my list to the community as the GrddlImplementations wiki topic, and when I came back after the GRDDL spec went to Candidate Recommendation on May 2, several more had sprung up.

What's exciting about these new implementations is that they go beyond the basic "here's some RDF data from one web page" mechanism. They're integrated with RDF map/timeline browsers, and SPARQL engines, and so on.
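For anyone who hasn't seen the basic mechanism: a GRDDL-aware agent looks in a page for links marked rel="transformation" and runs the transformations they point to in order to get RDF. Here is a minimal sketch of just that discovery step, using only Python's standard library. The sample page and the stylesheet URL are made up for illustration, and the sketch skips the profile check and the actual XSLT run.

```python
from html.parser import HTMLParser


class TransformationFinder(HTMLParser):
    """Collect href values of link/a elements whose rel includes 'transformation'."""

    def __init__(self):
        super().__init__()
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of link types
        rels = (attr.get("rel") or "").split()
        if "transformation" in rels and attr.get("href"):
            self.transformations.append(attr["href"])


def find_transformations(html_text):
    """Return the transformation URLs a page advertises, in document order."""
    finder = TransformationFinder()
    finder.feed(html_text)
    return finder.transformations


# Hypothetical page; the stylesheet URL is an assumption for illustration.
SAMPLE_PAGE = """
<html>
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Travel schedule</title>
    <link rel="transformation" href="http://example.org/hcal2rdf.xsl" />
  </head>
  <body><p class="vevent">Some event</p></body>
</html>
"""

if __name__ == "__main__":
    print(find_transformations(SAMPLE_PAGE))
```

The implementations mentioned here do much more than this, of course; the point is only that the hook in the page is small.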

The example from the GRDDL section of the semantic web client library docs (by Chris Bizer, Tobias Gauß, and Richard Cyganiak) is just "tell me about events on Dan's travel schedule" but that's just the tip of the iceberg: they have implemented the whole LinkedData algorithm (see the SWUI06 paper for details).

With all this great new stuff popping up all over, I felt I should include it in our tutorial materials. I'm not sure how long OpenLink Virtuoso has had GRDDL support (along with database integration, WEBDAV, RSS, Bugzilla support, and on and on), but it was news to me. But I also had to work through some bugs in the details of the GRDDL primer examples with Harry (not to mention dealing with some unexpected input on the HTML 5 decision). So the preparation involved some late nights...

I totally forgot to include the fact that Chime got the Semantic Technologies conference web site using microformats+GRDDL, and Edd did likewise with XTech.

But the questions from the audience showed they were really following along. I was a little worried when they didn't ask any questions about the recursive part of GRDDL; when I prompted them, they said they got it. I guess verbal explanations work; I'm still struggling to find an effective way to explain it in the spec. Harry followed up with some people in the halls about the spreadsheet example; as mnot said, Excel spreadsheets contain the bulk of the data in the enterprise.

One person was even following along closely enough to help me realize that the slide on monotonicity/partial understanding uses a really bad example.

The official LinkedData session was on Friday, but it spilled over to a few impromptu gatherings. On Wednesday evening, TimBL was browsing around with the tabulator; he asked for some URIs from the audience, and in no time we were browsing proteins and diseases, thanks to somebody who had re-packaged some LSID-based stuff as HTTP+RDF linked data.

Giovanni Tummarello showed a pretty cool back-link service for the Semantic Web. It included support for finding SPARQL endpoints relevant to various properties and classes, a contribution to the serviceDescription issue that the RDF Data Access Working Group postponed. I think I've seen a few other related ideas here and there; I'll try to put them in the ServiceDescription wiki topic when I remember the details...

Chris Bizer showed that dbpedia is the catalyst for an impressive federation of linked data. Back in March 2006, Toward Semantic Web data from Wikipedia was my wish into the web, and it's now granted. All those wikipedia infoboxes are now out there for SPARQLing. And other groups are hooking up musicbrainz and wordnet and so on. After such a long wait, it seems to be happening so fast! 

Speaking of fast, the Semantic MediaWiki project itself is starting to do performance testing with a full copy of wikipedia, Denny told us on Friday afternoon in the DevTrack.

Also speaking of fast, how did OpenLink go from not-on-my-radar to supporting every Semantic Web Technology I have ever heard of in about a year? I got part of the story in the halls... it started with ODBC drivers about a decade ago, which explains why their database integration is so good. Kingsley, here's hoping we get to play volleyball sometime. It's a shame we had just a few short moments together in the halls...

tags: (photos), grddl, www2007, travel

Stitching the Semantic Web together with OWL at AAAI-06

Submitted by connolly on Fri, 2006-08-11 16:06.

I was pleased to find that AAAI '06 in Boston a couple weeks ago had a spectrum of people I know and don't know and work that's near and far from my own. The talk about the DARPA grand challenge was inspiring.

But closer to my work, I ran into Jeff Heflin, who I worked with on DAML and especially the OWL requirements document. Amid too many papers about ontologies for the sake of ontologies and threads like Is there real world RDF-S/OWL instance data?, his Investigation into the Feasibility of the Semantic Web is a breath of fresh air. The introduction sets out their approach this way:

Our approach is to use axioms of OWL, the de facto Semantic Web language, to describe a map for a set of ontologies. The axioms will relate concepts from one ontology to the other. ... There is a well-established body of research in the area of automated ontology alignment. This is not our focus. Instead we investigate the application of these alignments to provide an integrated view of the Semantic Web data.

(emphasis mine). The rest of the paper justifies this approach, leading up to:

We first query the knowledge base from the perspective of each of the 10 ontologies that define the concept Person. We now ask for all the instances of the concept Person. The results vary from 4 to 1,163,628. We then map the Person concept from all the ontologies to the Person concept defined in the FOAF ontology. We now issue the same query from the perspective of this map and we get 1,213,246 results. The results now encompass all the data sources that commit to these 10 ontologies. Note: a pair wise mapping would have taken 45 mapping axioms to establish this alignment instead of the 9 mapping axioms that we used. More importantly due to this network effect of the maps, by contributing just a single map, one will automatically get the benefit of all the data that is available in the network.

That's fantastic stuff.
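The 45-versus-9 arithmetic in that quote is worth spelling out: aligning every pair of n ontologies directly takes n(n-1)/2 maps, while mapping each one to a shared hub (FOAF, in their case) takes n-1. A tiny sketch, with function names of my own choosing:

```python
def pairwise_mappings(n):
    """Maps needed to align every pair of n ontologies directly."""
    return n * (n - 1) // 2


def hub_mappings(n):
    """Maps needed when the other n - 1 ontologies each map to one hub."""
    return n - 1


if __name__ == "__main__":
    n = 10  # the ontologies that define Person, per the quote above
    print(pairwise_mappings(n), hub_mappings(n))
```

The gap grows quadratically, which is the network effect they're pointing at.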

We now pause for a word from Steve Lawrence, NEC Research Institute, to lament the lack of free online proceedings for AAAI: Articles freely available online are more highly cited. For greater impact and faster scientific progress, authors and publishers should aim to make research easy to access. OK, now back to the great paper...

Along the way, they give a definition of a knowledge function, K, that is remarkably similar to log:semantics from N3. They also define a commitment function that is basically the ontological closure pattern.

The approach to querying all this data is something they call DLDB, which comes from a paper they submitted to the ISWC Practical and Scalable Semantic Systems workshop. Darn! No full-text proceedings online again. Ah... Jeff's pubs include a tech report version. To paraphrase: there's a table for each class and a table for each property that relates rows from the class tables. They use a DL reasoner to find subclass relationships, and they make views out of them. I have never seen this approach before; it sure looks promising. I wonder if we can integrate it into our dbview work somehow and perhaps into our truth-maintenance system in the TAMI project.
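Here's how I picture the table-per-class, view-per-subclass idea, as a small sqlite3 sketch. This is my reading of the paraphrase above, not DLDB's actual schema: the class and property names are invented, and the subclass fact that would come from the DL reasoner is simply hard-coded.

```python
import sqlite3


def build_demo_db():
    """Build an in-memory database laid out one table per class and per property."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- One table per class: each row is an instance.
        CREATE TABLE Person  (id TEXT PRIMARY KEY);
        CREATE TABLE Student (id TEXT PRIMARY KEY);
        -- One table per property: each row relates two instances.
        CREATE TABLE advisor (subject TEXT, object TEXT);

        INSERT INTO Person  VALUES ('alice');
        INSERT INTO Student VALUES ('bob');
        INSERT INTO advisor VALUES ('bob', 'alice');

        -- Assumption: a reasoner reported Student subClassOf Person,
        -- so the view folds the subclass rows into the superclass.
        CREATE VIEW Person_view AS
            SELECT id FROM Person
            UNION
            SELECT id FROM Student;
    """)
    return db


def instances(db, relation):
    """List the ids in a class table or view (relation name is trusted; sketch only)."""
    rows = db.execute(f"SELECT id FROM {relation} ORDER BY id")
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(instances(db, "Person"))
    print(instances(db, "Person_view"))
```

Querying the view rather than the base table is what lets a plain SQL engine answer "all Persons" with the inferred members included.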

This wasn't the only work at AAAI on scalable, practical knowledge representation. I caught just a glance at some other papers at the conference that exploit wikipedia as a dataset in various algorithms. I hope to study those more.

I also ran into Ben Kuipers, whose Algernon and Access-Limited Logic has long appealed to me as an approach to reasoning that might work well when scaled up to Semantic Web data sets. That work is mostly on hold; we started talking about getting it going again, but didn't get very far into the conversation. I hope to pick that up again soon.

I gather the 1.0 release of OpenCyc happened at the conference; there's a lot of great stuff in cyc, but only time will tell how well it will integrate with other Semantic Web stuff.

Meanwhile, a handy citation for Heflin's paper...

That's marked up using an XHTML/LaTeX/BibTeX idiom that I'm working on so that we get BibTeX for free:

@inproceedings{pan06a,
    title = "{An Investigation into the Feasibility of the Semantic Web}",
    author = {Z. Pan and A. Qasem and J. Heflin},
    booktitle = {Proc. of the Twenty First National Conference on Artificial Intelligence (AAAI 2006)},
    year = {2006},
    address = {Boston, USA},
}

on Wikimania 2006, from a few hundred miles away

Submitted by connolly on Thu, 2006-08-10 16:26.

Wikimania 2006 was last week in Boston; I had it on my travel schedule, tentatively, months in advance, but I didn't really come up with a solid justification, and there were conflicts, so I ended up not going.

I was very interested to see the online participation options, but I didn't get my hopes up too high, because I know that ConnectingAudiences is challenging.

I tried to participate in the transcription stuff real-time; installation of the goby collaborative editor went smoothly enough (it looks like an interesting alternative to SubEthaEdit, though it's client/server, not peer-to-peer; they're talking about switching to the jabber protocol...) but I couldn't seem to connect to any sessions while people were active in them.

The real-time video feed of mako on a definition of Freedom was surprisingly good, though I couldn't give it my full attention during the work day. I didn't understand the problem he was speaking to (isn't GFDL good enough?) until I listened to Lessig on Free Culture and realized that CC share-alike and GFDL don't interoperate. (Yet another reason to keep the test of independent invention in mind at all times.)

Lessig read this quote, but only referred to the author using a photo that I couldn't see via the audio feed; when I looked it up, I realized there was a gap in this student's free culture education:

If we don't want to live in a jungle, we must change our attitudes. We must start sending the message that a good citizen is one who cooperates when appropriate, not one who is successful at taking from others.

RMS, 1992

These sessions on the wikipedia process look particularly interesting; I hope to find time to see or listen to a recording:

I bumped into TimBL online and reminded him about the Wikipedia and the Semantic Web panel; he had turned it down because of other travel obligations, but he just managed to stop by after all. I hope it went all right; he was pretty jet-lagged.

I see WikiSym 2006 coming up August 21-23, 2006 in Odense, Denmark. I'm not sure I can find justification to make travel plans on just a few weeks of notice. But Denny's hottest conference ever item burns like salt in an open wound and motivates me to give it a try. It looks like the SweetWiki folks, who participate in the GRDDL WG, will be there; that's the start of a justification...

Exporting databases in the Semantic Web with SPARQL, D2R, dbview, ARC, and such

Submitted by connolly on Fri, 2006-06-02 16:55.

The developer track at WWW2006 last week in Edinburgh was really cool; you had to show up on time or you couldn't fit in the room! One of the coolest talks was D2R-Server - Publishing Relational Databases on the Web as SPARQL-Endpoints. I see D2R Server is released now. Cool.

Yes, storing RDF in a SQL database using 3-column tables (or 4 or 5 or 6...) is cool as far as it goes, but I'm glad we're finally seeing more work on taking existing SQL databases (whose schemas are not designed with RDF in mind) and exporting them as RDF.

TimBL wrote a design note on Relational Databases on the Semantic Web in 1998. In 2002, I wrote dbview.py, a couple hundred lines of python that implements parts of it. Rob Crowell picked it up and the 2005/2006 version of dbview.py now does foreign keys and backlinks.

D2R gets points for using RDF for their configuration/mapping info. The slides showed turtle/n3. Why are the dbin brainlets in XML but not RDF? I wonder.

D2R Server has a mapping layer; dbview assumes that will be handled with rules. The choice of URIs for column names is interesting. D2R uses jdbc:mysql://127.0.0.1/wordpress#users1, but dbview is all about embedding a SQL database in HTTP space, so we use URIs like http://db.example/orders/customers/custno/1#item. In dbview, the decisions about when to use / and when to use # are made so that the result is browseable. In D2R, the default URIs don't matter as much because it's expected that they'll be mapped to a more well-known ontology/schema like foaf.
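To illustrate the / versus # choice, here is a rough sketch that builds a row URI in the shape of the example above and turns a row into triples. It is not the dbview.py code: the database/table/key-column/key-value path layout is inferred from that one example URI, and the column (property) URI pattern is purely hypothetical.

```python
from urllib.parse import quote


def row_uri(base, database, table, key_column, key_value):
    """URI for the thing a row describes.

    Everything before '#' names a document you can browse to;
    the '#item' fragment names the thing that document is about.
    """
    parts = (database, table, key_column, key_value)
    path = "/".join(quote(str(part), safe="") for part in parts)
    return f"{base.rstrip('/')}/{path}#item"


def column_uri(base, database, table, column):
    """Hypothetical property URI for a column (an assumption, not dbview's pattern)."""
    path = "/".join(quote(part, safe="") for part in (database, table))
    return f"{base.rstrip('/')}/{path}#{quote(column, safe='')}"


def row_to_triples(base, database, table, key_column, row):
    """Export one row (a dict of column name to value) as (subject, property, value) triples."""
    subject = row_uri(base, database, table, key_column, row[key_column])
    return [
        (subject, column_uri(base, database, table, column), value)
        for column, value in row.items()
        if column != key_column
    ]


if __name__ == "__main__":
    row = {"custno": 1, "name": "Ann"}
    for triple in row_to_triples("http://db.example", "orders", "customers", "custno", row):
        print(triple)
```

Because the subject URI minus its fragment is an ordinary HTTP URL, following your nose from any triple lands you on a page about that row.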

dbview is still just a few hundred lines of python; we haven't integrated the SPARQL parser that Yosi developed for cwm, nor integrated EricP's work on federated query.

Speaking of federated query... on Wednesday at the conference, I saw Tim Finin in the poster session. He showed me something the swoogle folks are cooking up: you give it a SPARQL query, and it looks at the terms used in your query and suggests documents you should put in your SPARQL dataset to run your query against. I hope to hear more about that.
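I don't know how the swoogle folks actually do it, but the "look at the terms used in your query" step might look something like the following sketch. It is a regex approximation, not a SPARQL parser, so it will be fooled by unusual syntax; the sample query is made up.

```python
import re

# Prologue declarations such as: PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX_RE = re.compile(r"PREFIX\s+(\w*):\s*<([^>]*)>", re.IGNORECASE)
# Full IRIs written in angle brackets
IRI_RE = re.compile(r"<([^<>\s]+)>")
# Prefixed names such as foaf:name
PNAME_RE = re.compile(r"(\w*):(\w+)")


def query_terms(query):
    """Return the sorted set of IRIs a query mentions, expanding prefixed names."""
    prefixes = dict(PREFIX_RE.findall(query))
    body = PREFIX_RE.sub("", query)          # drop the prologue
    terms = set(IRI_RE.findall(body))        # IRIs used directly
    body = IRI_RE.sub(" ", body)             # so IRIs aren't re-read as prefixed names
    for prefix, local in PNAME_RE.findall(body):
        if prefix in prefixes:               # ignore things like times in literals
            terms.add(prefixes[prefix] + local)
    return sorted(terms)


# Made-up query for illustration.
SAMPLE_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?p a foaf:Person ; foaf:name ?name }
"""

if __name__ == "__main__":
    for term in query_terms(SAMPLE_QUERY):
        print(term)
```

A service could then look up which documents define or use those terms and suggest them as the dataset.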

Somewhere in EricP's work is one of the several SPARQL-to-SQL rewriters out there... oh... I thought the HP tech report, A relational algebra for SPARQL was another one, but it seems to be by Richard Cyganiak, one of the D2R guys.

Benjamin Nowack's Feb 2006 item announced a SPARQL-to-SQL rewriter for his ARC RDF store for PHP.

Hmm... maybe it's time for a ScheduledTopicChat on SPARQL, SQL, and RDF? If you're interested, suggest a couple times that would be good for you in a comment or in mail to me and a public archive.

WWW2006 in Edinburgh: Identity, Reference, and Meaning

Submitted by connolly on Fri, 2006-06-02 14:40.

I went to Edinburgh last week for WWW2006.

I spent Tuesday in the workshop on Identity, Reference, and the Web (IRW2006). I didn't really finish my presentation slides in time, but I think my paper, A Pragmatic Theory of Reference for the Web is mostly coherent. Each section of the workshop got an entry in a semantic wiki; mine is the one that started at 12:00.

The IRE formalism presented by Valentina and Aldo was thought-provoking. I think their proxy-for is like foaf:topic (modulo the way they mix in time). And exact-proxy-for is like foaf:primaryTopic. Very handy. I wonder if foaf:primaryTopic should be promoted to its own thing, separate from all the social networking stuff in foaf.

Ginsberg's talk hit on one of the most important questions: "Do I commit to a document just because I use one of its terms?" His answer was basically to reify everything; I think we can do better than that. Peter Patel-Schneider's talk basically gave a 'no' answer to the question. I don't think we should go that far either, though from a standardization point of view, that's sorta where we're at.

Steve Pepper talked about published subjects and public resource identifiers; I can sympathize with his point that we have too many URL/URI/URN/IRI/XRI/etc. terms, but when he suggests that the answer is to make a new one, I'm not sure I agree. He argues to deprecate all the others, but as URI Activity lead at W3C, I'm not in a position where I can overrule people and deprecate things that they say they want. I agree with him that the 303 redirection is too much trouble, but he doesn't seem to be willing to use the HashURI pattern either, and as I said in the advice section of my paper, that's asking for trouble.

On Thursday, I was on a panel about tagging versus the Semantic Web: Meaning on the Web: Evolution or Intelligent Design?. Frank started by debunking 4 myths about the Semantic Web. I gotta find Frank's slides. "I'll hold up one finger whenever anybody says myth #1, and so on." As the other Frank was talking about tagging, Frank held up 2 and 3 fingers, and the audience pointed out that he should have held up 1 finger.

I talked without slides. I think I got away with it. I said that I don't expect symbolic reasoning to beat statistical methods when it comes to the wisdom of crowds, but who wants to delegate their bank balance or the targets of their mail messages to the wisdom of crowds? Sometimes we mean exactly what we say, not just something close.

I suggested that GRDDL+microformats is a practical way to get lots of Semantic Web data. And I brought up the problem with iCalendar timezones and noted that while timezones data should be published by the government entities that govern them, Semantic Web data from wikipedia might be a more straightforward mechanism and might be just as democratic.

So much for philosophical discussions; stay tuned for another item about SPARQL and databases and running code.

Getting (dis)organized for SxSWi in Austin

Submitted by connolly on Tue, 2006-03-07 20:49.

SxSWi looks to be quite the PathCross: The microformats panel on Monday is what put the conference on my radar this year, but it's just one of dozens of panels that I really want to see. It's overwhelming. Of course, that's part of the appeal of the Austin/SXSW scene: creative chaos. As a student, my creed was "never plan more than 15 minutes ahead." Life was much simpler, in many ways, back then.

Other stuff I'm looking forward to:

I'm driving down with Mary and the boys, stopping to visit folks here and there.

And on Tuesday night, my itinerary takes me to New York for the W3C workshop on usability and authorization.

RDF Calendar, GRDDL, Microformats, and all that at XML2005 in Atlanta

Submitted by connolly on Mon, 2005-11-21 15:21.

My talk was:

I unfortunately didn't leave any time for questions, but I had some interesting follow-up conversations:

  • Somebody asked about using GRDDL and RDF to track relationships between specs, products that support them, and all that. I recalled that when the folks that run the OASIS standards registry contacted W3C, we told them we prefer a more decentralized approach: each organization publishes stuff about their own standards, in RDF, and anybody can aggregate it. TimBL's roadmap diagrams show one approach. It is somewhat bit-rotten, but we have an automated system in production for publishing basic title/author/date/version metadata about our specs and we're adding more stuff over time; e.g. which WG produced the spec (for patent policy reasons), comment due dates, etc. I told him this had come up in spec-prod; while I'm happy for the discussion to go there, my impression that it had come up there before was wrong. I hope to organize my thoughts on this near NormativeReferences in the QA/ESW wiki and re-kindle discussion in spec-prod or qa-ig.
  • At lunch, somebody brought up my slide about email headers in RDF and asked if thunderbird has RDF support like mozilla and firefox. I don't know, but I hope to find out. DanBri? Anyone?

On the non-technical front, jamming with Len Bullard was a blast. We had a fascinating discussion of DRM and the recording industry where I relayed AaronSw's viewpoint that any model based on scarcity is uninteresting. Len says Prince is no longer independent, which contradicts the impression I got from studying Prince in Wikipedia recently. Len says the big customer ripe for SemWeb technology is transit, at least as much as intelligence. Gotta look into that.

Later in the evening Len brought out a fake book and Tony and Lauren and Eve and John sang and I tried to accompany them on Len's guitar. I was having so much fun that I raised a sizeable blood-blister on my strumming hand before I noticed. I think we did OK with Annie's Song as well as mangling lots of Beatles and such.

Then Len took the guitar and Eve asked him to play Angel from Montgomery by Bonnie Raitt. When he said he didn't know it, I was able to use my sidekick to find chords and lyrics and since it was your basic three chord number, he picked it up in no time.

As to the conference program...

Tue 15 Nov

Wed 16 Nov

Thu 17 Nov

ISWC2005 Experiences

Submitted by ryanlee on Wed, 2005-11-16 09:37.

Some of my notes on experiences at ISWC2005 in Galway, Ireland.

SIMILE was associated with part of the End User Semantic Web Interaction Workshop by way of Fresnel and with David Huynh's paper talk on Piggy Bank. As Eric had responsibilities as metadata chair for the conference, we were also behind the scenes running a conference-enhanced version of Semantic Bank. Having widespread exposure across a workshop, a highly anticipated paper talk, and particularly plenary sessions increased the visibility of our work and our persons; people were telling me how great Piggy Bank was, and I wasn't sure how they knew I was part of SIMILE.

We ran a contest to further promote the conference bank. The contest was a bit of a last minute plan and, in any future scenarios similar to a raffle, we should probably put in more advance planning for determining rules and winners.

The top two issues I heard at ISWC concerning our work were 1) whether or not the conference bank was queryable (yes, but not really - you would need to reverse engineer David's querying system to get to the subset of RDF you wanted, and it's grounded in faceted browsing, not the more free-form SPARQL) and 2) how hard it was to get Piggy Bank running. I watched a number of people struggle through the process who likely would have given up without some guidance from me. Probably the greatest benefit would come from cutting the Google Maps step out of the initialization wizard and/or fixing it so the link to acquiring a key is not modal.

There was further confusion on how to properly tag things in the bank (part of the contest rules), and it became clear that, in certain environments, Piggy Bank cost more than it was worth, even with an iPod Nano at stake. The hurdle to tag one paper seemed to be quite high.

We have much food for thought for the next round of enhancements.

Extra: an extremely small selection of some of my photos on Flickr from Galway and all Flickr photos tagged iswc2005.
