connolly's blog

how much do I want to know about drupal?

Submitted by connolly on Wed, 2006-08-09 09:59. ::

breadcrumbs fell over again today. Disk full. Probably the spam database filled up with dreck. Again. While googling for reports of similar problems, I discovered drupal 4.7 has been out since May 1. They tout TimBL's blog in their release announcement. I wonder if they'd help us upgrade. Well, they do provide a video about upgrading. Maybe I'll find time to watch it.

Meanwhile, I discovered a couple of interesting articles on the design/architecture of drupal and how PHP is used: Drupal Programming from an Object-Oriented Perspective and the tongue-in-cheek The Road to Drupal Hell.

I'm not sure how much of this I really want to know. As I said back in my October item on PHP angst, I'm mostly playing simple customer when it comes to drupal. But I'm having a hard time investing in technology that I don't know inside and out.

In a #swig discussion where I was considering Zope alternatives (the one-big-file design has lost its charm), it occurred to me that I have read (parts of) the source to most everything that currently backs my personal web site: Zope, the python interpreter, libc, various bits of debian infrastructure, and the linux kernel. I wonder when that will become totally impractical, and I'll understand my web site no more than I understand my car.

tabulator maps in Argentina

Submitted by connolly on Mon, 2006-08-07 11:39. :: | |

My Spanish is a little rusty, but it looks like inktel is having fun with the tabulator's map support too.

tags pending: geo, tabulator

OpenID, verisign, and my life: mediawiki, bugzilla, mailman, roundup, ...

Submitted by connolly on Mon, 2006-07-31 15:45. ::

Please, don't ask me to manage another password! In fact, how about getting rid of most of the ones I already manage?

I have sent support requests for some of these; the response was understandable, if disappointing: when debian/ubuntu supports it, or at least when the core Mailman/mediawiki guys support it, we'll give it a try. I opened Issue 18: OpenID support in roundup too; there are good OpenID libraries in python, after all.

A nice thing about OpenID is that the service provider doesn't have to manage passwords either. I was thinking about where my OpenID password(s) should live, and I realized the answer is: nowhere. If we put the key fingerprint in the OpenID persona URL, I can build an OpenID server that does public key challenge-response authentication and doesn't store any passwords at all.
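
Here's a rough python sketch of that idea (hypothetical names throughout, and using a modern signature library as a stand-in; this is not the OpenID wire protocol, just the no-stored-password part): the persona URL carries a fingerprint of the user's public key, the server issues a random challenge, and the client signs it.

# sketch only: not an OpenID implementation, just the challenge-response idea.
# Uses the third-party "cryptography" package (Ed25519) as a stand-in.
import hashlib, os
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey)

def fingerprint(pub_bytes):
    # short hex digest of the raw public key, usable in a URL path
    return hashlib.sha1(pub_bytes).hexdigest()

# client side: the user holds the private key; the server stores nothing
client_key = Ed25519PrivateKey.generate()
pub_bytes = client_key.public_key().public_bytes(
    serialization.Encoding.Raw, serialization.PublicFormat.Raw)
persona = "https://openid.example/id/" + fingerprint(pub_bytes)

# server side: issue a random challenge
challenge = os.urandom(32)

# client side: sign the challenge, send back signature + public key
signature = client_key.sign(challenge)

# server side: check the key against the fingerprint in the persona URL,
# then verify the signature; no password is stored anywhere
assert persona.rsplit("/", 1)[-1] == fingerprint(pub_bytes)
try:
    Ed25519PublicKey.from_public_bytes(pub_bytes).verify(signature, challenge)
    print("authenticated as", persona)
except InvalidSignature:
    print("challenge failed")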

As I sat down to tinker with that idea, I remembered the verisign labs openid service and gave it a try. Boy, it's nice! They use the user-chosen photo anti-phishing trick and provide nice audit trails. So it will probably be quite a while before I feel the need to code my own OpenID server.

I'm still hoping for mac keychain support for OpenID. Meanwhile, has anybody seen a nice gnome applet for keeping the state of my ssh-agent credentials and my CSAIL kerberos credentials visible?

Choosing flight itineraries using tabulator and data from Wikipedia

Submitted by connolly on Mon, 2006-07-17 18:13. :: |

While planning a trip to Boston/Cambridge, I was faced with a blizzard of itinerary options from American Airlines. I really wanted to overlay them all on the same map or calendar or something. I pretty much got it to work:

That's a map view using the tabulator, which had another release today. The itinerary data in RDF is converted from HTML via grokOptions.xsl (and tidy).

I can, in fact, see all the itineraries on the same calendar view. Getting these views to be helpful in choosing between the itineraries is going to take some more work, but this is a start.

Getting a map view required getting latitude/longitude info for the airports. I think getting Semantic Web data from Wikipedia is a promising approach. A while back, I figured out how to get lat/long data for airports from wikipedia. This week, I added a Kid template, aptinfo.kid, and figured out how to serve up that data live from the DIG/CSAIL web server. For example, http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item is a URI for the Chicago airport; when you GET it with HTTP, a little CGI script calls aptdata.py, which fetches the relevant page from wikipedia (using an httplib2 cache), scrapes the lat/long and a few other details, and gives them back to you in RDF. Viewed with RDF/N3 glasses, it looks like:

#   Base was: http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD
@prefix : <#> .
@prefix apt: <http://www.daml.org/2001/10/html/airport-ont#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix go: <http://www.w3.org/2003/01/geo/go#> .
@prefix s: <http://www.w3.org/2000/01/rdf-schema#> .

:item a apt:Airport;
    apt:iataCode "ORD";
    s:comment "tz: -6";
    s:label "O'Hare International Airport";
    go:within_3_power_11_metres :nearbyCity;
    geo:lat "41.9794444444";
    geo:long "-87.9044444444";
    foaf:homepage <http://en.wikipedia.org/wiki/O%27Hare_International_Airport> .

:nearbyCity foaf:homepage <http://en.wikipedia.org/wiki/Chicago%2C_Illinois>;
    foaf:name "Chicago, Illinois" .

In particular, notice that:

  • I use the swig geo vocabulary, which the new GEO XG is set up to refine. The use of strings rather than datatyped floating point numbers follows the schema for that vocabulary.
  • I use distinct URIs for the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item) and the page about the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD).
  • I use an owl:InverseFunctionalProperty, foaf:homepage, to connect the airport to its wikipedia article, and another, apt:iataCode, to relate the airport to its IATA code.
  • I use the GeoOnion pattern to relate the airport to a/the city that it serves. I'm not sure I like that pattern, but the idea is to make a browseable web of linked cities, states, countries, and other geographical items.

Hmm... I use rdfs:label for the name of the airport but foaf:name for the name of the city. I don't think that was a conscious choice. I may change that.

The timezone info is in an rdfs:comment. I hope to refine that in future episodes. Stay tuned.
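
As for the scraping itself, here's roughly the kind of thing aptdata.py has to do (this is a sketch, not the actual script, and the regular expression is just a guess at one of the coordinate formats wikipedia uses): fetch the article through an httplib2 cache and pull out decimal latitude/longitude.

# Rough sketch in the spirit of aptdata.py (not the actual script).
import re
import httplib2

def airport_coords(article_title):
    h = httplib2.Http(".cache")          # on-disk cache, as described above
    url = "http://en.wikipedia.org/wiki/" + article_title
    resp, content = h.request(url)
    if resp.status != 200:
        raise IOError("GET %s -> %s" % (url, resp.status))
    text = content.decode("utf-8", "replace")
    # e.g. the "geo" microformat: <span class="geo">41.9794; -87.9044</span>
    m = re.search(r'class="geo"[^>]*>\s*(-?\d+\.\d+)\s*;\s*(-?\d+\.\d+)', text)
    if m is None:
        return None
    return m.group(1), m.group(2)

if __name__ == "__main__":
    print(airport_coords("O%27Hare_International_Airport"))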

a walk thru the tabulator calendar view

Submitted by connolly on Tue, 2006-07-11 11:22. :: | |

The SIMILE Timeline is a nifty hack. DanBri writes about Who, what, where, when? of the Semantic Web, and in a message to the Semantic Web Interest Group, he asks

TimBL, any plans for Geo widgets to be wired into the tabulator?

Indeed, there are. Stay tuned for more on that... from the students who are actually doing the work, I hope. But I can't wait to tell everybody about the calendar view. Give it a try:

  1. Start with the development version of the tabulator.
  2. Select a source of calendar info. Morten's SPARQL Timeline uses RSS. The tabulator calendar groks dc:date, so something like W3C's main RSS feed will work fine. Put its URI in the tabulator's URI text field and hit "Add to outliner".
    add to outliner screenshot, before

    When it's done it should look something like this:

    add to outliner screenshot, after
    • For fun, open the Sources tab near the bottom. Note that the tabulator loads the RSS and DC schemas, plus all the schemas they reference, and so on; i.e. the ontological closure. Hmm... the RSS terms seem to be 404.
      sources screenshot
  3. Now navigate the outline down to one of the items.
    item screenshot
    then re-focus (shift-click) on the rss item class itself, open an item, and select the date property.
    refocus screenshot
  4. Now hit "Tabulate selected properties". You'll get a table of items and their dates.
    table screenshot
  5. OK, so much for review of basic tabulator stuff. Now you're all set for the new stuff. Hit Calendar and scroll down a little:
    table screenshot

Note the Export button with the SPARQL option. That's a whole other item in itself, but for now, you can see the SPARQL query that corresponds to what you've selected to put on the calendar:

SELECT ?v0 ?v1
WHERE
{
    ?v0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item> .
    ?v0 <http://purl.org/dc/elements/1.1/date> ?v1 .
}

Fun, huh?

converting vcard .vcf syntax to hcard and catching up on CALSIFY

Submitted by connolly on Thu, 2006-06-29 00:17. :: | | | |

A while back I wrote about using JSON and templates to produce microformat data. I swapped some of those ideas in today while trying to figure out a simple, consistent model for recurring events using floating times plus locations.

I spent a little time catching up on the IETF CALSIFY WG; they meet Wednesday, July 12 at 9am in Montreal. I wonder how much progress they'll make on issues like the March 2007 DST change and the CalConnect recommendations toward an IANA timezone registry.

When I realized I didn't have a clear problem or use case in mind, I went looking for something that I could chew on in test-driven style.

So I picked up the hcard tests and built a vcard-to-hcard converter sort of out of spare parts. icslex.py handles low-level lexical details of iCalendar, which turn out to have lots in common with vCard: line breaking, escaping, that sort of thing. On top of that, I wrote vcardin.py, which has enough vcard schema smarts to break down the structured N and ADR and GEO properties so there's no microparsing below the JSON level. Then contacts.kid is a kid template that spits out the data following the hcard spec.
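
To give the flavor of the low-level work (a sketch, not icslex.py or vcardin.py; the helper names are made up): unfold continuation lines, split each content line into name, parameters, and value, and split structured values like N on unescaped semicolons.

# sketch of vCard/iCalendar lexing: unfolding, line splitting, structured values
import re

def unfold(text):
    # RFC 2425/2445 line unfolding: a line break followed by a space or tab
    # continues the previous line
    return re.sub(r'\r?\n[ \t]', '', text)

def properties(text):
    for line in unfold(text).splitlines():
        if not line.strip():
            continue
        namepart, value = line.split(':', 1)
        name, _, params = namepart.partition(';')
        yield name.upper(), params, value

def structured(value):
    # split a structured value like N or ADR on unescaped semicolons
    return re.split(r'(?<!\\);', value)

sample = "BEGIN:VCARD\r\nN:Celik;Tantek;;;\r\nFN:Tantek \r\n Celik\r\nEND:VCARD\r\n"
for name, params, value in properties(sample):
    if name == "N":
        family, given = structured(value)[:2]
        print("family-name:", family, "given-name:", given)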

It works like this:

python vcardin.py contacts.kid hcard/01-tantek-basic.vcf >,01.html

Then I used X2V to convert the results back to .vcf format, compared them using hcard testing tools (normalize.pl and such), and fixed the breakage. Lather, rinse, repeat... I have pretty much all the tests working except 16-honorific-additional-multiple.

It really is a pain to set up a loop for the additional-name field when that field is almost never used, let alone used with multiple values. This sort of structure is more natural in XML/XHTML/hCard than in JSON, in a way. And if I change the JSON structure from a string to a list, does that mean the RDF property should use a list/collection? Probably not... I probably don't need additional-name to be an owl:FunctionalProperty.
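
For concreteness, this is roughly the JSON shape in question (the field names follow the hCard class names, but the values and exact structure here are made up, not what vcardin.py actually emits); the question is whether additional-name should be a string or a list:

# hypothetical intermediate JSON for an N property; only additional-name
# plausibly wants to be multi-valued
import json

n_property = {
    "family-name": "Stevenson",
    "given-name": "John",
    "additional-name": ["Philip", "Paul"],   # the rarely-used multi-valued case
    "honorific-prefix": "Dr.",
    "honorific-suffix": "Jr.",
}
print(json.dumps(n_property, indent=1))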

Hmm... meanwhile, this contacts.kid template should mesh nicely with my wikipedia airport scraping hack...

See also: IRC notes from #microformats, from #swig.

fun with flock

Submitted by connolly on Fri, 2006-06-16 23:52. :: |

I just found this...

Flock has a nice WYSIWYG blog post editor that's a marvel to use. It does the quoting and citation for me. It lets me drag images from the Web, or my flickr stream straight into the post. It just makes posting so much more fun.

Labnotes » Blog Archive » Mucking around with Flock

Ooh... excerpt with 2 gestures.

Source view!

Let's try linking... and list editing...

Let's try pictures...

A photo on Flickr

Seems to work quite nicely!

Odd... posting works, but the posting dialog hangs.

Hmm... I'd rather use delicious than technorati as the base of my tags.


Blogged with Flock

Equality and inconsistency in the rules layer

Submitted by connolly on Mon, 2006-06-05 17:13. :: |

In the working group that originally did RDFS, there wasn't enough confidence to standardize things that could express inconsistencies, such as disjointness of classes, nor to do equality reasoning a la foaf:mbox. The 2004 versions of RDF and RDFS are now properly grounded in mathematical logic, and OWL specifies owl:disjointWith and owl:differentFrom as well as owl:InverseFunctionalProperty and owl:sameAs.

I have gotten quite a bit of mileage out of N3 rules that capture at least some of the meaning of these constructs. If we feed this to cwm or Euler:

{ ?p a owl:InverseFunctionalProperty.
  ?u ?p ?w.
  ?v ?p ?w.
} => { ?u = ?v }.

:Dan foaf:mbox <mailto:connolly@w3.org>;
    foaf:name "Dan C.".
:Daniel foaf:mbox <mailto:connolly@w3.org>;
    foaf:name "Daniel W. Connolly".

foaf:mbox a owl:InverseFunctionalProperty.

then they will conclude...

:Dan = :Daniel.

And we can use cwm's equality reasoning mode or explicit substitution-of-equals-for-equals rules a la { ?S1 ?P ?O. ?S1 = ?S2 } => { ?S2 ?P ?O } to conclude...

:Dan = :Dan, :Daniel;
    foaf:mbox <mailto:connolly@w3.org>;
    foaf:name "Dan C.", "Daniel W. Connolly" .

To capture owl:disjointWith, I use:

{ [ is rdf:type of ?x ] owl:disjointWith
[ is rdf:type of ?y ] }
=> { ?x owl:differentFrom ?y }.

Taking an example from the DAML walkthru that I worked on with Lynn Stein and Deb McGuinness back in 2000:

:Female a rdfs:Class;
    daml:disjointWith :Male;
    rdfs:subClassOf :Animal .

... and adding :bob a :Male and :jill a :Female gives :bob owl:differentFrom :jill. And from :pat a :Male, :Female, we get :pat owl:differentFrom :pat, which is clearly inconsistent.

It's pretty straightforward to write rules to express these features; the axiomatic semantics from Fikes and McGuinness in 2001 represents them in KIF. It's much less straightforward to be sure there are no bugs, such as the inconsistency reported by Baclawski of the UBOT project. So I think ter Horst's paper, Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary is a particularly interesting contribution. My study of his rules is in owlth.n3. In many cases, the rules are the ones I have been using for years:

# rdfp1
{ ?p a owl:FunctionalProperty.
  ?u ?p ?v.
  ?u ?p ?w.
} => { ?v = ?w }.

But the semantics he gives does not always interpret owl:sameAs as identity; I'd like to understand better why not. He looks for clashes of the form { ?v owl:differentFrom ?w; owl:sameAs ?w } and { ?v owl:disjointWith ?w. ?u a ?v, ?w }; isn't it enough to just look for ?x owl:differentFrom ?x and reduce owl:disjointWith to owl:differentFrom by the rule I gave above?

Exporting databases in the Semantic Web with SPARQL, D2R, dbview, ARC, and such

Submitted by connolly on Fri, 2006-06-02 16:55. :: | | |

The developer track at WWW2006 last week in Edinburgh was really cool; you had to show up on time or you couldn't fit in the room! One of the coolest talks was D2R-Server - Publishing Relational Databases on the Web as SPARQL-Endpoints.. I see D2R Server is released now. Cool.

Yes, storing RDF in a SQL database using 3-column tables (or 4 or 5 or 6...) is cool as far as it goes, but I'm glad we're finally seeing more work on taking existing SQL databases (whose schemas are not designed with RDF in mind) and exporting them as RDF.

TimBL wrote a design note on Relational Databases on the Semantic Web in 1998. In 2002, I wrote dbview.py, a couple hundred lines of python that implements parts of it. Rob Crowell picked it up and the 2005/2006 version of dbview.py now does foreign keys and backlinks.

D2R gets points for using RDF for their configuration/mapping info. The slides showed turtle/n3. Why are the dbin brainlets in XML but not RDF? I wonder.

D2R Server has a mapping layer; dbview assumes that will be handled with rules. The choice of URIs for column names is interesting. D2R uses jdbc:mysql://127.0.0.1/wordpress#users1, but dbview is all about embedding a SQL database in HTTP space, so we use URIs like http://db.example/orders/customers/custno/1#item. In dbview, the decisions about when to use / and when to use # are made so that the result is browseable. In D2R, the default URIs don't matter as much because it's expected that they'll be mapped to a more well-known ontology/schema like foaf.
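
To illustrate that URI scheme (a toy sketch, not dbview.py; the base URI, table, and columns are made up), each row becomes a #item whose URI embeds the table, key column, and key value, and each column becomes a predicate:

# toy row-to-RDF export in the dbview style (not the actual dbview.py)
import sqlite3

BASE = "http://db.example/orders"           # hypothetical base URI

def row_as_turtle(table, key_col, row, columns):
    subject = "<%s/%s/%s/%s#item>" % (BASE, table, key_col, row[key_col])
    lines = []
    for col in columns:
        if row[col] is not None:
            lines.append('%s <%s/%s#%s> "%s" .' % (subject, BASE, table, col, row[col]))
    return "\n".join(lines)

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE customers (custno INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Acme Ltd.')")
for row in db.execute("SELECT * FROM customers"):
    print(row_as_turtle("customers", "custno", row, row.keys()))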

dbview is still just a few hundred lines of python; we haven't integrated the SPARQL parser that Yosi developed for cwm, nor integrated EricP's work on federated query.

Speaking of federated query... on Wednesday at the conference, I saw Tim Finin in the poster session. He showed me something the swoogle folks are cooking up: you give it a SPARQL query, and it looks at the terms used in your query and suggests documents you should put in your SPARQL dataset to run your query against. I hope to hear more about that.

Somewhere in EricP's work is one of the several SPARQL-to-SQL rewriters out there... oh... I thought the HP tech report, A relational algebra for SPARQL was another one, but it seems to be by Richard Cyganiak, one of the D2R guys.

Benjamin Nowack's Feb 2006 item announced a SPARQL-to-SQL rewriter for his ARC RDF store for PHP.

Hmm... maybe it's time for a ScheduledTopicChat on SPARQL, SQL, and RDF? If you're interested, suggest a couple times that would be good for you in a comment or in mail to me and a public archive.

WWW2006 in Edinburgh: Identity, Reference, and Meaning

Submitted by connolly on Fri, 2006-06-02 14:40. :: | | | | |

I went to Edinburgh last week for WWW2006.

I spent Tuesday in the workshop on Identity, Reference, and the Web (IRW2006). I didn't really finish my presentation slides in time, but I think my paper, A Pragmatic Theory of Reference for the Web is mostly coherent. Each section of the workshop got an entry in a semantic wiki; mine is the one that started at 12:00.

The IRE formalism presented by Valentina and Aldo was thought-provoking. I think their proxy-for is like foaf:topic (modulo the way they mix in time). And exact-proxy-for is like foaf:primaryTopic. Very handy. I wonder if foaf:primaryTopic should be promoted to its own thing, separate from all the social networking stuff in foaf.

Ginsberg's talk hit on one of the most important questions: "Do I commit to a document just because I use one of its terms?" His answer was basically to reify everything; I think we can do better than that. Peter Patel-Schneider's talk basically gave a 'no' answer to the question. I don't think we should go that far either, though from a standardization point of view, that's sorta where we're at.

Steve Pepper talked about published subjects and public resource identifiers; I can sympathize with his point that we have too many URL/URI/URN/IRI/XRI/etc. terms, but when he suggests that the answer is to make a new one, I'm not sure I agree. He argues to deprecate all the others, but as URI Activity lead at W3C, I'm not in a position where I can overrule people and deprecate things that they say they want. I agree with him that the 303 redirection is too much trouble, but he doesn't seem to be willing to use the HashURI pattern either, and as I said in the advice section of my paper, that's asking for trouble.

On Thursday, I was on a panel about tagging versus the Semantic Web: Meaning on the Web: Evolution or Intelligent Design?. Frank started by debunking 4 myths about the Semantic Web. I gotta find Frank's slides. "I'll hold up one finger whenever anybody says myth #1, and so on." As the other Frank was talking about tagging, Frank held up 2 and 3 fingers, and the audience pointed out that he should have held up 1 finger.

I talked without slides. I think I got away with it. I said that I don't expect symbolic reasoning to beat statistical methods when it comes to the wisdom of crowds, but who wants to delegate their bank balance or the targets of their mail messages to the wisdom of crowds? Sometimes we mean exactly what we say, not just something close.

I suggested that GRDDL+microformats is a practical way to get lots of Semantic Web data. And I brought up the problem with iCalendar timezones and noted that while timezones data should be published by the government entities that govern them, Semantic Web data from wikipedia might be a more straightforward mechanism and might be just as democratic.

So much for philosophical discussions; stay tuned for another item about SPARQL and databases and running code.
