connolly's blog

FOAF and OpenID: two great tastes that taste great together

Submitted by connolly on Wed, 2007-10-24 23:00. :: | |

As Simon Willison notes, OpenID solves the identity problem, not the trust problem. Meanwhile, FOAF and RDF are potential solutions to lots of problems but not yet actual solutions to very many. I think they go together like peanut butter and chocolate, creating a deliciously practical testbed for our Policy Aware Web research.

Our struggle to build a community is fairly typical:

In Dec 2006, Ryan did a Drupal upgrade that included OpenID support, but that only held the spammers back for a couple weeks. Meanwhile, Six Apart is Opening the Social Graph:


... if you manage a social networking service, we strongly encourage you to embrace OpenID, hCard XFN, FOAF and the other open standards around data portability.

With that in mind, a suggestion to outsource to a centralized commercial blog spam filtering service seemed like a step in the wrong direction; we are the Decentralized Information Group after all; time to eat our own cooking!

The policy we have working right now is, roughly: you can comment on our blog if you're a friend of a friend of a member of the group.

In more detail, you can comment on our blog if:

  1. You can show ownership of a web page via the OpenID protocol.
  2. That web page is related by the foaf:openid property to a foaf:Person, and
  3. That foaf:Person is
    1. listed as a member of the DIG group in, or
    2. related to a dig member by one or two foaf:knows links.

The implementation has two components so far:

  • an enhancement to drupal's OpenID support to check a whitelist
  • a FOAF crawler that generates a whitelist periodically

We're looking into policies such as You can comment if you're in a class taught by a DIG group member, but there are challenges reconciling policies protecting privacy of MIT students with this approach.

We're also interested in federating with other communities. The Advogato community is particuarly interesting because

  1. The DIG group is pretty into Open Source, the core value of advogato.
  2. Advogato's trust metric is designed to be robust in the face of spammers and seems to work well in practice.

So I'd like to be able to say You can comment on our blog if you're certified Journeyer or above in the Advogato community. Advogato has been exporting basic foaf:name and foaf:knows data since a Feb 2007 update, but they didn't export the results of the trust metric computation in RDF.

Asking for that data in RDF has been on my todo list for months, but when Sean Palmer found out about this OpenID and FOAF stuff, he sent an enhancement request, and Steven Rainwater joined the #swig channel to let us alpha test it in no time. Sean also did a nice write-up.

This is a perfect example of the sort of integration of statistical methods into the Semantic Web that we have been talking about as far back as our DAML proposal in 2000:

Some of these systems use relatively simple and straightforward manipulation of well-characterized data, such as an access control system. Others, such as search engines, use wildly heuristic manipulations to reach less clearly justified but often extremely useful conclusions. In order to achieve its potential, the Semantic Web must provide a common interchange language bridging these diverse systems. Like HTML, the Semantic Web language should be basic enough that it does not impose an undue burden on the simplest web software systems, but powerful enough to allow more sophisticated components to use it to advantage as well.

Now we just have to enhance our crawler to get that data or otherwise integrate it with the drupal whitelist. (I'm particularly interested in using GRDDL to get FOAF data right from the OpenID page; stay tuned for more on that.) And I guess we need Advogato to provide a user interface for foaf:openid support... or maybe links to supplementary FOAF files via rdfs:seeAlso or owl:sameAs.

Soccer schedules, flight itineraries, timezones, and python web frameworks

Submitted by connolly on Wed, 2007-09-12 17:17. :: | | | |

The schedule for this fall soccer season came out August 11th. I got the itinerary for the trip I'm about to take on July 26. But I just now got them synchronized with the family calendar.

The soccer league publishes the schedule in somewhat reasonable HTML; to get that into my sidekick, I have a Makefile that does these steps:

  1. Use tidy to make the markup well-formed.
  2. Use 100 lines of XSLT (soccer-schedfix.xsl) to add hCalendar markup.
  3. Use glean-hcal.xsl to get RDF calendar data.
  4. Use to upload the calendar items via XMLRPC to the danger/t-mobile service, which magically updates the sidekick device.

But oops! The timezones come out wrong. Ugh... manually fix the times of 12 soccer games... better than manually keying in all the data... then sync with the family calendar. My usual calendar sync Makefile does the following:

  1. Use to download the calendar and task data via XMLRPC.
  2. Use to filter by category=family, convert from danger/sidekick/hiptop conventions to iCalendar standard conventions pour the records into a kid template to produce RDF Calendar (and hCalendar).
  3. Use to convert RDF Calendar to .ics format.
  4. Upload to family WebDAV server using curl.

Then check the results on my mac to make sure that when my wife refreshes her iCal subscriptions it will look right.

Oh no! The timezones are wrong again!

The sidekick has no visible support for timezones, but the start_time and end_time fields in the XMLRPC interface are in Z/UTC time, and there's a timezone field. However, after years with this device, I'm still mystified about how it works. The Makefiles approach is not conducive to tinkering at this level, so I worked on my REST interface, until it had crude support for editing records (using JSON syntax in a form field). What I discovered is that once you post an event record with a mixed up timezone, there's no way to fix it. When you use the device UI to change the start time, it looks OK, but the Z time via XMLRPC is then wrong.

So I deleted all the soccer game records, carefully factored the danger/iCalendar conversion code out of into for ease of testing, and goit it working for local Chicago-time events.

Then I went through the whole story again with my itinerary. Just replace tidy and soccer-schedfix.xsl with to get the itinerary from SABRE's text format to hCalendar:

  1. Upload itinerary to the sidekick.
  2. Manually fix the times.
  3. Sync with iCal. Bzzt. Off by several hours.
  4. Delete the flights from the sidekick.
  5. Work on some more.
  6. Upload to the sidekick again. Ignore the sidekick display, which is right for the parts of the itinerary in Chicago, but wrong for the others.
  7. Sync with iCal. Win!

I suppose I'm resigned that the only way to get the XMLRPC POST/upload right (the stored Z times, at least, if not the display) is to know what timezone the device is set to when the POST occurs. Sigh.

A March 2005 review corroborates my findings:

The Sidekick and the sync software do not seem to be aware of time zones. That means that your PC and your Sidekick have to be configured for the same time zone when they synchronize, else your appointments will be all wrong. is about my 5th iteration on this idea of a web server interface to my PDA data. It uses WSGI and JSON and Genshi, following Joe G's stuff. Previous itertions include:

  1. - quick n dirty perl hack (started April 2001)
  2. - screen scraping (Dec 2002)
  3. - XMLRPC with a python shelf and hardcoded RDF/XML output (Feb 2004)
  4. - conversion logic in python with kid templates and SPARQL-like filters over JSON-shaped data (March 2006)
It's pretty raw right now, but fleshing out the details looks like fun. Wish me luck.

Units of measure and property chaining

Submitted by connolly on Tue, 2007-07-31 13:42. :: | | | |

We're long overdue for standard URIs for units of measure in the Semantic Web.

The SUMO stuff has a nice browser (e.g. see meter), a nice mapping from wordnet, and nice licensing terms. Of course, it's not RDF-native. In particular, it uses n-ary relations in the form of functions of more than one argument; 1 hour is written (&%MeasureFn 1 &%HourDuration). I might be willing to work out a mapping for that, but other details in the KIF source bother me a bit: a month is modelled conservatively as something between 28 and 31 days, but a year is exactly 365 days, despite leap-years. Go figure.

There's a nice Units in MathML note from November 2003, but all the URIs are incomplete, e.g. http://.../units/yard .

The Sep 2006 OWL Time Working Draft has full URIs such as, but its approach to n-ary relations is unsound, as I pointed out in a Jun 2006 comment.

Tim sketched the Interpretation Properties idiom back in 1998; I don't suppose it fits in OWL-DL, but it appeals to me quite a bit as an approach to units of measure. He just recently fleshed out some details in Units of measure are modelled as properties that relate quantities to magnitudes; for example:

 track length [ un:mile 0.25].

This Interpretation Properties approach allows us to model composition of units in the natural way:

W is o2:chain of (A V).

where o2:chain is like property chaining in OWL 1.1 (we hope).

Likewise, inverse units are modelled as inverse properties:

s a Unit; rdfs:label "s".
hz rdfs:label "Hz"; owl:inverseOf s.

Finally, scalar conversions are modelled using product; for example, mile is defined in terms of meter like so:

(m 0.0254) product inch.
(inch 12) product foot.
(foot 3) product yard.
(yard 22) product chain.
(chain 10) product furlong.
(furlong 8)product mile.

I supplemented his ontology with some test/example cases, unit_ex.n3 and then added a few rules to flesh out the modelling. These rules converts between meters and miles:

# numeric multiplication associates with unit multiplication
{ (?U1 ?S1) un:product ?U2.
(?U2 ?S2) un:product ?U3.
(?S1 ?S2) math:product ?S3
} => { (?U1 ?S3) un:product ?U3 }

# scalar conversions between units
{ ?X ?UNIT ?V.
(?BASE ?CONVERSION) un:product ?UNIT.
(?V ?CONVERSION) math:product ?V2.
} => { ?X ?BASE ?V2 }.

Put them together and out comes:

    ex:track     ex:length  [
:chain 20.0;
:foot 1320.0;
:furlong 2.0;
:inch 15840.0;
:m 402.336;
:mile 0.25;
:yard 440.0 ] .

The rules I wrote for pushing conversion factors into chains isn't fully general, but it works in cases like converting from this:

(un:foot un:hz) o2:chain fps.
bullet speed [ fps 4000 ].

to this:

    ex:bullet     ex:speed  [
ex:fps 4000;
:mps 1219.2 ] .

As I say, I find this approach quite appealing. I hope to discuss it with people working on units of measure in development of a Delivery Context Ontology.

Linked Data at WWW2007: GRDDL, SPARQL, and Wikipedia, oh my!

Submitted by connolly on Thu, 2007-05-17 16:29. :: | | |

Last Tuesday, TimBL started to gripe that the WWW2007 program had lots of stuff that he wanted to see all at the same time; we both realized pretty soon: that's a sign of a great conference.

That afternoon, Harry Halpin and I gave a GRDDL tutorial. Deploying Web-scale Mash-ups by Linking Microformats and the Semantic Web is the title Harry came up with... I was hesitant to be that sensationalist when we first started putting it together, but I think it actually lived up to the billing. It's too bad last-minute complications prevented Murray Maloney from being there to enjoy it with us.

For one thing, GRDDL implementations are springing up all over. I donated my list to the community as the GrddlImplementations wiki topic, and when I came back after the GRDDL spec went to Candidate Recommendation on May 2, several more had sprung up.

What's exciting about these new implementations is that they go beyond the basic "here's some RDF data from one web page" mechanism. They're integrated with RDF map/timeline browsers, and SPARQL engines, and so on.

The example from the GRDDL section of the semantic web client library docs (by Chris Bizer, Tobias Gauß, and Richard Cyganiak) is just "tell me about events on Dan's travel schedule" but that's just the tip of the iceberg: they have implemented the whole LinkedData algorithm (see the SWUI06 paper for details).

With all this great new stuff popping up all over, I felt I should include it in our tutorial materials. I'm not sure how long OpenLink Virtuoso has had GRDDL support (along with database integration, WEBDAV, RSS, Bugzilla support, and on and on), but it was news to me. But I also had to work through some bugs in the details of the GRDDL primer examples with Harry (not to mention dealing with some unexpected input on the HTML 5 decision). So the preparation involved some late nights...

I totally forgot to include the fact that Chime got the Semantic Technologies conference web site using microformats+GRDDL, and Edd did likewise with XTech.

But the questions from the audience showed they were really following along. I was a little worried when they didn't ask any questions about the recursive part of GRDDL; when I prompted them, they said they got it. I guess verbal explanations work; I'm still struggling to find an effective way to explain it in the spec. Harry followed up with some people in the halls about the spreadsheet example; as mnot said, Excel spreadsheets contain the bulk of the data in the enterprise.

One person was even followingn along closely enough to help me realize that the slide on monotonicity/partial understanding uses a really bad example.

The official LinkedData session was on Friday, but it spilled over to a few impromptu gatherings; on Wednesday evening, TimBL was browsing around with the tabulator, and he asked for some URIs from the audience, and in no time, we were browsing protiens and diseases, thanks to somebody who had re-packaged some LSID-based stuff as HTTP+RDF linked data.

Giovanni Tummarello showed a pretty cool back-link service for the Semantic Web. It included support for finding SPARQL endpoints relevant to various properties and classes, a contribution to the serviceDescription issue that the RDF Data Access Working Group postponed. I think I've seen a few other related ideas here and there; I'll try to put them in the ServiceDescription wiki topic when I remember the details...

Chris Bizer showed that dbpedia is the catalyst for an impressive federation of linked data. Back in March 2006, Toward Semantic Web data from Wikipedia was my wish into the web, and it's now granted. All those wikipedia infoboxes are now out there for SPARQLing. And other groups are hooking up musicbrainz and wordnet and so on. After such a long wait, it seems to be happening so fast! 

Speaking of fast, the Semantic MediaWiki project itself is starting to do performance testing with a full copy of wikipedia, Denny told us on Friday afternoon in the DevTrack.

Also speaking of fast, how did OpenLink go from not-on-my-radar to supporting every Semantic Web Technology I have ever heard of in about a year? I got part of the story in the halls... it started with ODBC drivers about a decade ago, which explains why their database integration is so good. Kingsley, here's hoping we get to play volleyball sometime. It's a shame we had just a few short moments together in the halls...

tags: (photos), grddl, www2007, travel

IKL by Hayes et al. provides a semantics for N3?

Submitted by connolly on Thu, 2007-05-17 14:25. :: |

One my trip to Duke, just after I arrived on Thursday, Pat Hayes gave a talk about IKL; it's a logic with nice Web-like properties such as any collection of well-formed IKL sentences is itself well-formed. As he was talking, I saw lots of parallels to N3... propositions as terms, log:uri, etc.

By Friday night I was exhuasted from travel, lack of sleep, and conference-going, but I couldn't get the IKL/N3 ideas out of my head, so I had to code it up as another output mode of

The superman case works, though it's a bit surprising that rdf:type gets contextualized along with superman. The thread continues with the case of "if your homepage says you're vegetarian, then for the purpose of registration for this conference, you're vegetarian". I'm still puzzling over Pat's explanation a bit, but it seems to make sense.

Along with the IKL spec and IKL Guide, Pat also suggests:

Collaboration and crime at a distance at HASTAC, WWW2007

Submitted by connolly on Thu, 2007-05-17 13:44. :: |

I went to the 1st International HASTAC Conference, April 19-21, 2007 at Duke University in Durham, NC, USA. My stated role was to tell the story of How the W3C Process Got Its Stripes to this humanities research community on a The World Wide Web Evolves panel that Harry Halpin arranged.

After a short history of my role in the development of the Web and W3C, I noted that the Internet not only faciiltates remote collaboration; it also opens the door to crime at a distance. Extortion of the form "say... nice web site you got there; it would be a shame if something happened to it" is a reality. I'm interested in research into how much the Internet can tolerate before we see the tragedy of the commons.

I noted the Proof-of-work proves not to work result by Laurie and Clayton in 2004 as a fairly surprising result based on what looks like fairly straightforward and unsophisticated economic analysis of spam, zombies, etc. Does the humanities research community have expertise in statistics and economics of preserving cultural values such as open communication? (Oh yeah... and I meant to encourage them to look at social/ethical issues around OpenID and distributed authentication, but I completely forgot.)

While HASTAC is somewhat on the leading edge of the humanities community, I'm not sure their scope includes what I'm looking for.

Meanwhile, at the Web Science panel at WWW2007 in Banff, Peter asked "Where are the cultural anthropologists?" I was pleasantly surprised that some of them were there. Again, at Harry Halpin's prompting.

The Mercurial SCM: great for lots of stuff, but not the holy grail

Submitted by connolly on Fri, 2007-03-23 15:44. :: | |

I have been tracking the mercurial project for a couple years now. First just a bookmark under python+scm, then after using hg to code on an airplane about a year later, I was hooked. I helped get the microformats testing effort using mercurial about a year later, and did some noodling on Access control and version control: an over-constrained problem? around that same time.

Yesterday I played host to Matt Mackall as he gave a presentation, The Mercurial SCM, to the W3C Team. In the disucssion that followed, we touched on:

  • fractal project organization (touching on PartiaClone and the ForestExtension)
  • the toplogy of update flows in a large development system with
    overlapping communities with differentt access rights
  • comparisons with Darcs
  • hg hosting, large projects, user support

It seems that hg scales to very large projects, as long as they're fairly uniform, but it doesn't support the sort of tangly fractal web of inter-project dependencies that would make it the holy grail of version control systems.

A design for web content labels built from GRDDL and rules

Submitted by connolly on Thu, 2007-01-25 13:35. :: | | |

In #swig discussion, Tim mentioned he did some writing on labels and rules and OWL which prompted me to flesh out some related ideas I had. The result is a Makefile and four tests with example labels. One of them is:

All resources on are accessible for all users and meet WAI AA guidelines except those on which are not suitable for users with impaired vision.

I picked an XML syntax out of the air and wrote visaa.lbl:

<wai:AAuser />

And then in testdata.ttl we have:

<> a webarch:InformationResource.
<> a
:charlene a wai:AAuser.

Then we run the test thusly...

$ make visaa_test.ttl
xsltproc --output visaa.rdf label2rdf.xsl visaa.lbl
python ../../../2000/10/swap/ visaa.rdf lblrules.n3 owlAx.n3
testdata.ttl \
--think --filter=findlabels.n3 --n3 >visaa_test.ttl

and indeed, it concludes:

    <>     lt:suitableFor :charlene .

but doesn't conclude that pg2needsVision is OK for charlene.

The .lbl syntax is RDF data via GRDDL and label2rdf.xsl. Then owlAx.n3 is rules that derive from the RDFS and OWL specs; i.e. stuff that's already standard. As Tim wrote, A label is a fairly direct use of OWL restrictions. This is very much the sort of thing OWL is designed for. Only the lblrules.n3 bit goes beyond what's standardized, and it's written in the N3 Rules subset of N3, which, assuming a few built-ins, maps pretty neatly to recent RIF designs.

A recent item from Bijan notes a SPARQL-rules design by Axel; I wonder if these rules fit in that design too. I hope to take a look soonish.

She's a witch and I have the proof (in N3)

Submitted by connolly on Tue, 2007-01-02 22:28. :: |

A while back, somebody turned the Monty Python Burn the Witch sketch into an example resolution proof. Bijan and Kendall had some fun turning it into OWL.

I'm still finding bugs pretty regularly, but the cwm/n3 proof stuff is starting to mature; it works for a few PAW demo scenarios. Ralph asked me to characterize the set of problems it works for. I don't have a good handle on that, but this witch example seems to be in the set.

Transcribing the example resolution FOL KB to N3 is pretty straightforward; the original is preserved in the comments:

@prefix : <witch#>.
@keywords is, of, a.

#[1] BURNS(x) /\ WOMAN(x) => WITCH(x)

{ ?x a BURNS. ?x a WOMAN } => { ?x a WITCH }.


#[3] \forall x, ISMADEOFWOOD(x) => BURNS(x)
{ ?x a ISMADEOFWOOD. } => { ?x a BURNS. }.

#[4] \forall x, FLOATS(x) => ISMADEOFWOOD(x)
{ ?x a FLOATS } => { ?x a ISMADEOFWOOD }.



#[6] \forall x,y FLOATS(x) /\ SAMEWEIGHT(x,y) => FLOATS(y)

{ ?x a FLOATS. ?x SAMEWEIGHT ?y } => { ?y a FLOATS }.

# and, by experiment


Then we run cwm to generate the proof and then run the proof checker in report mode:

$ witch.n3  --think --filter=witch-goal.n3  --why >witch-pf.n3
$ --report witch-pf.n3 >witch-pf.txt

The report is plain text; I'll enrich it just a bit here. Note that in the N3 proof format, some formulas are elided. It makes some sense not to repeat the whole formula you get by parsing an input file, but I'm not sure why cwm elides results of rule application. It seems to give the relevant formula on the next line, at least:

  1. ...
    [by parsing <witch.n3>]

  2. :GIRL a :WOMAN .
    [by erasure from step 1]

    [by erasure from step 1]

  4. :DUCK a :FLOATS .
    [by erasure from step 1]

  5. @forAll :x, :y . { :x a wit:FLOATS; wit:SAMEWEIGHT :y . } log:implies {:y a wit:FLOATS . } .
    [by erasure from step 1]

  6. ...
    [by rule from step 5 applied to steps [3, 4]
    with bindings {'y': '<witch#GIRL>', 'x': '<witch#DUCK>'}]

  7. :GIRL a :FLOATS .
    [by erasure from step 6]

  8. @forAll :x . { :x a wit:FLOATS . } log:implies {:x a wit:ISMADEOFWOOD . } .
    [by erasure from step 1]

  9. ...
    [by rule from step 8 applied to steps [7]
    with bindings {'x': '<witch#GIRL>'}]

    [by erasure from step 9]

  11. @forAll :x . { :x a wit:ISMADEOFWOOD . } log:implies {:x a wit:BURNS . } .
    [by erasure from step 1]

  12. ...
    [by rule from step 11 applied to steps [10]
    with bindings {'x': '<witch#GIRL>'}]

  13. :GIRL a :BURNS .
    [by erasure from step 12]

  14. @forAll witch:x . { witch:x a :BURNS, :WOMAN . } log:implies {witch:x a :WITCH . } .
    [by erasure from step 1]

  15. ...
    [by rule from step 14 applied to steps [2, 13]
    with bindings {'x': '<witch#GIRL>'}]

  16. :GIRL a :WITCH .
    [by erasure from step 15]

All the files are in the swap/test/reason directory: witch.n3, witch-goal.n3, witch-pf.n3, witch-pf.txt. Enjoy.

Modelling HTTP cache configuration in the Semantic Web

Submitted by connolly on Fri, 2006-12-22 19:10. :: |

The W3C Semantic Web Interest Group is considering URI best practices, whether to use LSIDs or HTTP URIs, etc. I ran into some of them at MIT last week. At first it sounded like they wanted some solution so general it would solve the only two hard things in Computer Science: cache invalidation and naming things , as Phil Karlton would say. But then we started talking about a pretty interesting approach: using the semantic web to model cache configuration. It has long been a thorn in my side that there is no standard/portable equivalent ot .htaccess files, no RDF schema for HTTP and MIME, etc.

At WWW9 in May 2000, I gave a talk on formalizing HTTP caching. Where I used larch there, I'd use RDF, OWL, and N3 rules, today. I made some progress in that direction in August 2000: An RDF Model for GET/PUT and Document Management.

Web Architecture: Protocols for State Distribution is a draft I worked on around 1996 to 1999 wihthout ever really finishing it.

I can't find Norm Walsh's item on wwwoffle config, but I did find his XML 2003 paper Caching in with Resolvers:

This paper discusses entity resolvers, caches, and other strategies for dealing with access to sporadically available resources. Our principle focus is on XML Catalogs and local proxy caches. We’ll also consider in passing the ongoing debate of names and addresses, most often arising in the context of URNs vs. URLs.

In Nov 2003 I worked on Web Architecture Illustrated with RDF diagramming tools.

The tabulator, as it's doing HTTP, propagates stuff like content type, last modified, etc. from javascript into its RDF store. Meanwhile, the accessability evaluation and repair folks just released HTTP Vocabulary in RDF. I haven't managed to compare the tabulator's vocabulary with that one yet. I hope somebody does soon.

And while we're doing this little survey, check out the Uri Template stuff by Joe Gregorio and company. I haven't taken a very close look yet, but I suspect it'll be useful for various problems, if not this one in particular.

Syndicate content