connolly's blog

Map and Territory in RDF APIs

Submitted by connolly on Tue, 2010-04-27 14:30.

RDF specs and APIs have made a bit of a mess out of a couple of pretty basic tools of math and computing: graphs and logic formulas. With the RDF next steps workshop coming up, Pat Hayes re-thinking RDF semantics, and Sandro thinking out loud about RDF2, I'd like us to think about RDF in more traditional terms. The scala programming language seems to be an interesting framework to explore how they relate to RDF.

The Feb 1999 RDF spec wasn't very clear about the map and the territory. It said that statements are made out of parts in the territory, rather than features on the map, which doesn't make very much sense. RDF APIs seem to inherit this confusion; e.g. from an RDF::Value class for ruby:

Examples:

Checking if a value is a resource (blank node or URI reference)

value.resource

Blank nodes and URI references are parts of the map; resources are in the territory.

Likewise in Package org.jrdf.graph:

Resource: A resource stands for either a Blank Node or a URI Reference.

The 2004 RDF specs take great pains to clarify these use/mention distinctions, but they also go on at great length.

Let's review Wikipedia on graphs:

In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. ...  The edges may be directed (asymmetric) or undirected (symmetric) ... and the edges are called directed edges or arcs; ... graphs which have labeled edges are called edge-labeled graphs.


With that in mind, in the swap-scala project, we summarize the RDF abstract syntax as an edge-labelled directed graph with just one or two wrinkles:

package org.w3.swap.rdf

trait RDFGraphParts {
  type Arc = (SubjectNode, Label, Node)

  type Node
  type Literal <: Node
  type SubjectNode <: Node
  type BlankNode <: SubjectNode
  type Label <: SubjectNode
}

The wrinkles are:

  • Arcs can only start from BlankNodes or Labels, i.e. SubjectNodes
  • Arc labels may also appear as Nodes
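To make the two wrinkles concrete, here is a minimal sketch in Python rather than scala. It is not part of swap-scala; the class and field names (Literal.lexical, BlankNode.local_id, Arc.obj) are made up for illustration, and the checks happen at run time where the scala trait does them with types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Literal:
    lexical: str          # hypothetical field name


@dataclass(frozen=True)
class BlankNode:
    local_id: str         # hypothetical field name


@dataclass(frozen=True)
class Label:
    uri: str


SubjectNode = Union[BlankNode, Label]
Node = Union[BlankNode, Label, Literal]


@dataclass(frozen=True)
class Arc:
    subject: SubjectNode
    label: Label
    obj: Node

    def __post_init__(self):
        # Wrinkle 1: arcs can only start from BlankNodes or Labels.
        if not isinstance(self.subject, (BlankNode, Label)):
            raise TypeError("an arc must start from a BlankNode or a Label")
        if not isinstance(self.label, Label):
            raise TypeError("an arc must be labelled with a Label")
        if not isinstance(self.obj, (BlankNode, Label, Literal)):
            raise TypeError("an arc must end at a Node")


RDF_TYPE = Label("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
RDF_PROPERTY = Label("http://www.w3.org/1999/02/22-rdf-syntax-ns#Property")
KNOWS = Label("http://xmlns.com/foaf/0.1/knows")

# Wrinkle 2: the label KNOWS also appears as a node, here as a subject.
graph = {
    Arc(BlankNode("b1"), KNOWS, BlankNode("b2")),
    Arc(KNOWS, RDF_TYPE, RDF_PROPERTY),
}
```

An arc that starts from a Literal raises a TypeError in this sketch, which is the run-time analogue of the SubjectNode bound in the trait.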

We use another trait to relate concrete datatypes to these abstract types:

trait RDFNodeBuilder extends RDFGraphParts {
  def uri(i: String): Label
  type LanguageTag = Symbol
  def plain(s: String, lang: Option[LanguageTag]): Literal
  def typed(s: String, dt: String): Literal
  def xmllit(content: scala.xml.NodeSeq): Literal
}

This doesn't pin down what a Label is, but in any concrete implementation, you can build one from a String using the uri method. The RDFNodeBuilder trait is used to implement RDF/XML, RDFa, and turtle parsers that are agnostic to the concrete implementation of an RDF graph.
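The "agnostic parser" idea can be sketched in a few lines of Python. This is an illustration only, not the swap-scala parsers: NodeBuilder, TupleBuilder and parse_simple_triple are hypothetical names, and the line syntax handled is a tiny subset (URI subject and predicate, URI or plain string object) that I picked to keep the example short.

```python
import re
from abc import ABC, abstractmethod


class NodeBuilder(ABC):
    """Abstract builder: the parser never sees the concrete node types."""

    @abstractmethod
    def uri(self, i): ...

    @abstractmethod
    def plain(self, s, lang=None): ...


class TupleBuilder(NodeBuilder):
    """One possible concrete implementation: nodes are plain tuples."""

    def uri(self, i):
        return ("uri", i)

    def plain(self, s, lang=None):
        return ("plain", s, lang)


# Matches '<s> <p> <o> .' or '<s> <p> "text" .' and nothing fancier.
TRIPLE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$')


def parse_simple_triple(line, builder):
    """Parse one line into an arc, building nodes only via the builder."""
    m = TRIPLE.match(line.strip())
    if m is None:
        raise ValueError("unsupported line: %r" % line)
    subject = builder.uri(m.group(1))
    label = builder.uri(m.group(2))
    if m.group(3) is not None:
        obj = builder.uri(m.group(3))
    else:
        obj = builder.plain(m.group(4))
    return (subject, label, obj)
```

Swapping TupleBuilder for some other NodeBuilder changes what the arcs are made of without touching the parser, which is the point of the trait.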

Now let's look at terms of first order logic:

 The set of terms is inductively defined by the following rules:

  1. Variables. Any variable is a term.
  2. Functions. Any expression f(t1,...,tn) of n arguments (where each argument ti is a term and f is a function symbol of valence n) is a term.

This is represented straightforwardly in scala a la:

package org.w3.swap.logic1

/**
 * A Term is either a Variable or a FunctionTerm.
 */
sealed abstract class Term { ... }

class Variable extends Term { ...}

abstract class FunctionTerm() extends Term {
  def fun: Any
  def args: List[Term]
}

The core RDF doesn't cover all of first order logic; it corresponds fairly closely to the conjunctive query fragment:

The conjunctive queries are simply the fragment of first-order logic given by the set of formulae that can be constructed from atomic formulae using conjunction ∧ and existential quantification ∃, but not using disjunction ∨, negation ¬, or universal quantification ∀.

We can then excerpt just the relevant parts of the definition of formulas:

The set of formulas is inductively defined by the following rules:

  1. Predicate symbols. If P is an n-ary predicate symbol and t1, ..., tn are terms then P(t1,...,tn) is a formula.
  2. Binary connectives. If φ and ψ are formulas, then (φ → ψ) is a formula. Similar rules apply to other binary logical connectives.
  3. Quantifiers. If φ is a formula and x is a variable, then ∀x φ and ∃x φ are formulas.

Our scala representation follows straightforwardly:

package org.w3.swap.logic1ec

sealed abstract class ECFormula
case class Exists(vars: Set[Variable], g: And) extends ECFormula
sealed abstract class Ground extends ECFormula
case class And(fmlas: Seq[Atomic]) extends Ground
case class Atomic(rel: Symbol, args: List[Term]) extends Ground

Now that we have scala representations for RDF graphs and conjunctive query formulas, how do we relate them? This is the fun part:

package org.w3.swap.rdflogic

import swap.rdf.RDFNodeBuilder
import swap.logic1.{Term, FunctionTerm, Variable}
import swap.logic1ec.{Exists, And, Atomic, ECProver, ECFormula}

/**
 * RDF has only ground, 0-ary function terms.
 */
abstract class Ground extends FunctionTerm {
  override def fun = this
  override def args = Nil
}

case class Name(n: String) extends Ground
case class Plain(s: String, lang: Option[Symbol]) extends Ground
case class Data(lex: String, dt: Name) extends Ground
case class XMLLit(content: scala.xml.NodeSeq) extends Ground


/**
 * Implement RDF Nodes (except BlankNode) using FOL function terms
 */
trait TermNode extends RDFNodeBuilder {
  type Node = Term
  type SubjectNode = Term
  type Label = Name

  def uri(i: String) = Name(i)

  type Literal = Term
  def plain(s: String, lang: Option[Symbol]) = Plain(s, lang)
  def typed(s: String, dt: String): Literal = Data(s, Name(dt))
  def xmllit(e: scala.xml.NodeSeq): Literal = XMLLit(e)
}

The abstract RDFGraphParts node types are implemented as first order logic terms. For formulas, we use a "holds" predicate:

object RDFLogic extends ... {
  def atom(s: Term, p: Term, o: Term): Atomic = {
    Atomic('holds, List(s, p, o))
  }
  def atom(arc: (Term, Term, Term)): Atomic = {
    Atomic('holds, List(arc._1, arc._2, arc._3))
  }
}

Then all the semantic machinery up to simple entailment between RDF graphs just falls out of conjunctive query.
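Here is roughly how that falls out, as a Python sketch rather than the swap-scala ECProver (which I'm not reproducing here). The assumptions: a term is a tagged tuple, blank nodes are the ("var", id) terms, a graph is a list of (s, p, o) argument triples of holds, and G simply entails H when some assignment of H's blank nodes to terms of G turns every atom of H into an atom of G. The function names are made up for this example.

```python
from typing import Dict, List, Optional, Tuple

Term = Tuple[str, str]            # ("name", uri), ("lit", text) or ("var", id)
Atom = Tuple[Term, Term, Term]    # the arguments of holds(s, p, o)


def is_var(term: Term) -> bool:
    return term[0] == "var"


def match_atom(pattern: Atom, fact: Atom,
               env: Dict[Term, Term]) -> Optional[Dict[Term, Term]]:
    """Extend env so that pattern becomes fact, or return None."""
    env = dict(env)
    for p, f in zip(pattern, fact):
        if is_var(p):
            # Bind the query variable, or check an earlier binding.
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env


def solve(query: List[Atom], facts: List[Atom],
          env: Optional[Dict[Term, Term]] = None) -> Optional[Dict[Term, Term]]:
    """Backtracking search for one answer to a conjunctive query."""
    env = {} if env is None else env
    if not query:
        return env
    first, rest = query[0], query[1:]
    for fact in facts:
        extended = match_atom(first, fact, env)
        if extended is not None:
            answer = solve(rest, facts, extended)
            if answer is not None:
                return answer
    return None


def simply_entails(g: List[Atom], h: List[Atom]) -> bool:
    """True when h, read as an existential conjunctive query, holds in g."""
    return solve(list(h), list(g)) is not None
```

Blank nodes in G are never bound; they just sit there as values that H's variables can be mapped to. The search is exponential in the worst case, which is fine for a sketch.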

I haven't done RDFS Entailment yet; the plan is to do basic rules first (N3rules or RIF BLD) and then use that for RDFS, OWL2-RL, and the like.

 

 

Existentials in ACL2 and Milawa make sense; how about level breakers?

Submitted by connolly on Wed, 2010-01-20 18:20.

Since my Sep 2006 visit to the ACL2 seminar, I've been trying to get my head around existentials in ACL2. The lightbulb finally went off this week while reading Jared's Dec 2009 Milawa thesis.

3.7 Provability

Now that we have a proof checker, we can use existential quantification to
decide whether a particular formula is provable. Recall from page 61 the notion
of a witnessing (Skolem) function.
We begin by introducing a witnessing function,
logic.provable-witness, whose defining axiom is as follows.


Definition 92: logic.provable-witness
(por* (pequal* ...))

Intuitively, this axiom can be understood as: if there exists an appeal which is
a valid proof of x, then (logic.provable-witness x axioms thms atbl) is such
an appeal.

Ding! Now I get it.
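A toy version of the trick, in Python, helped me check my understanding. This is not Milawa or ACL2 code, and the names are made up: "x is provable" is replaced by "n is composite", and an "appeal" is a pair of factors. In the quoted definition the witness is introduced by an axiom, not computed; the search below is only there to make the toy runnable.

```python
from typing import Tuple

Appeal = Tuple[int, int]   # toy stand-in for a proof object


def proofp(appeal: Appeal, n: int) -> bool:
    """Checker: is this appeal a valid proof that n is composite?"""
    a, b = appeal
    return 1 < a < n and 1 < b < n and a * b == n


def provable_witness(n: int) -> Appeal:
    """If some valid appeal for n exists, return one; otherwise anything."""
    for a in range(2, n):
        if n % a == 0:
            return (a, n // a)
    return (0, 0)          # arbitrary value when no proof exists


def provable(n: int) -> bool:
    # "There exists a proof of n" becomes a quantifier-free statement
    # about one particular candidate: the witness.
    return proofp(provable_witness(n), n)
```

So provable(15) is true because the witness (3, 5) checks out, and provable(7) is false because nothing the witness could return passes the checker.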

This witnessing stuff is published in other ACL2 publications, notably:

  • Structured Theory Development for a Mechanized Logic, M. Kaufmann and J Moore, Journal of Automated Reasoning 26, no. 2 (2001), pp. 161-203.

But I can't fit those in my tiny brain.

Thanks, Jared, for explaining it at my speed!

Here's hoping I can turn this new knowledge into code that converts N3Rules to ACL2 and/or Milawa's format. N3Rules covers RDF, RDFS, and, I think, OWL2-RL and some parts of RIF. Roughly the stuff FuXi covers.

I'm somewhat hopeful that the rest of N3 is just quoting. That's the intuition that got me looking into ACL2 and Milawa again after working on some TAG stuff using N3Logic to encode ABLP logic. Last time I tried turning N3 {} terms into lisp quote expressions was when looking at IKL as a semantic framework for N3. I didn't like the results that time; I'm not sure why I expect it to be different this time, but somehow I do...

Another question that's keeping me up at night lately: is there a way to fit level-breakers such as log:uri (or name and denotation, if not wtr from KIF) in the Milawa architecture somehow?

 

DIG losing the battle with spammers again

Submitted by connolly on Tue, 2009-03-10 11:56.

Blog spam went out of control again; the only remedy I could find was a very big hammer: turn off the drupal comments module altogether and in doing so, unpublish all comments ever posted to this site. I suppose they're still in the database and could be published again, if we could separate them from the spam.

The drupal expertise in our group seems to have gone on to greener pastures. That prompted me to divest from my family business drupal installation and start a hosted wordpress site, and it makes me wonder how safe the stuff that I write here is...

Any MIT students want to help this research group manage a community presence? Please get in touch.

OpenID "Hello World" on apache still deep magic

Submitted by connolly on Thu, 2009-01-08 18:37.

I have a home movie that I want to show to just a few friends around the Web. With OpenID, I should be able to just give my web server a list of my friends' pages, right?

I eventually found a README for mpopenid with just what I wanted:

PythonOption authorized-users "http://alice.com/ http://bob.com/"

But that wasn't on the top page of hits on a search for "apache OpenID". (Like most sites, mine runs on apache.) The top hit is mod_auth_openid, but its FAQ says my use case isn't directly supported:

Is it possible to limit login to some users, like htaccess/htpasswd does?
No. ... If you want to restrict to specific users that span multiple identity providers, then OpenID probably isn't the authentication method you want. Note that you can always do whatever vetting you want using the REMOTE_USER CGI environment variable after a user authenticates.

So I installed the prerequisites for mpopenid: libapache2-mod-python and python-elementtree were straightforward, but I struggled to find a version of python-openid that matched. I almost gave up at that point, but heartened by somebody else who got mpopenid working, I went back to searching and found a launchpad development version of mpopenid. That seems to work with python-openid-1.1.0.

In /etc/apache2/sites-available/mysite, I have this bit that glues mpopenid's login page into my site:

<Location "/openid-test-aux">
SetHandler mod_python
PythonOption action-path "/openid-test-aux"
PythonHandler mpopenid::openid
</Location>

And in mysite/movies/.htaccess, this bit says only I get to see http://mysite.example/sekret:

<Files "sekret">
PythonAccessHandler mpopenid::protect
PythonOption authorized-users "http://www.w3.org/People/Connolly/"
</Files>

The mpopenid README also shows an option to put the list of pages in a separate file:

PythonOption authorized-users-list-url file:///my/directory/allowed-users.txt

But I haven't tried that yet. So far I'm happy to put the list right in the .htaccess file.
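The check I'm asking the server to do is tiny. Here is a sketch in Python of the idea, and only the idea: this is not mpopenid's code, the function names are made up, and I'm assuming the option value is a whitespace-separated list of identity URLs as in the README example. A real implementation would also normalize the identifiers before comparing.

```python
def parse_authorized_users(option_value):
    """Split an authorized-users style string into a set of identity URLs."""
    return set(option_value.split())


def is_authorized(claimed_id, option_value):
    """True when the authenticated OpenID is on the allowed list.

    Exact string comparison only; identifier normalization (trailing
    slashes, case, redirects) is deliberately left out of this sketch.
    """
    return claimed_id in parse_authorized_users(option_value)
```

With the README's example value, is_authorized("http://alice.com/", "http://alice.com/ http://bob.com/") is true and any other identity is turned away. The same test could be run against the REMOTE_USER value that the mod_auth_openid FAQ mentions.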

The details of data in documents; GRDDL, profiles, and HTML5

Submitted by connolly on Fri, 2008-08-22 14:09.

GRDDL, a mechanism for putting RDF data in XML/XHTML documents, is specified mostly at the XPath data model level. Some GRDDL software goes beyond XML and supports HTML as she are spoke, aka tag soup. HTML 5 is intended to standardize the connection between tag soup and XPath. The tidy use case for GRDDL anticipates that using HTML 5 concrete syntax rather than XHTML 1.x concrete syntax involves no changes at the XPath level.

But in GRDDL and HTML5, Ian Hickson, editor of HTML 5, advocates dropping the profile attribute of the HTML head element in favor of rel="profile" or some such. I dropped by the #microformats channel to think out loud about this stuff, and Tantek said similarly, "we may solve this with rel="profile" anyway." The rel-profile topic in the microformats wiki shows the idea goes pretty far back.

Possibilities I see include:
  • GRDDL implementors add support for rel="profile" along with HTML 5 concrete syntax
  • GRDDL implementors don't change their code, so people who want to use GRDDL with HTML 5 features such as <video> stick to XML-wf-happy HTML 5 syntax and they use the head/@profile attribute anyway, despite what the HTML 5 spec says.
  • People who want to use GRDDL stick to XHTML 1.x.
  • People who want to put data in their HTML documents use RDFa.

I don't particularly care for the rel="profile" design, but one should choose one's battles and I'm not inclined to choose this one. I'm content for the market to choose.

 

sidekick calendar subscription for SXSW

Submitted by connolly on Sat, 2008-03-08 12:57.

At a conference, like in a good coding session, it's too easy to lose track of time, so I rely heavily on a PDA to remind me of appointments. The SXSW program has just the features I want:

  • an "add this to my calendar" button next to each session
  • a calendar feed of my choices

But I carry a hiptop, which doesn't support calendar subscription. I could copy-and-paste a few critical sessions to my hiptop, but when the climbing geeks offer an hCalendar feed, it becomes worthwhile to use iCal on the laptop, i.e. something that groks calendar subscription, as the master calendar device.

I have had a system for exporting my mobile calendar as a feed, but it's a tedious 4-step shell command sequence; it's OK once or twice a week, but here at SXSW, I want to sync up several times a day.

I have been moving my palmagent project from shell commands and Makefiles to a RESTful Web service, and this pushed me over the edge to add calendar feed support.

As usual, to pull the data from the hiptop's data servers:

  1. Make a directory to hold hiptop accounts and put it in hip_config.py:
    AccountsDir = "/Users/connolly/Desktop/danger-accts"
  2. Start hipwsgi.py running:
    pbjam:~/projects/palmagent$ python hipwsgi.py &
    Serving HTTP on 0.0.0.0 port 8080 ...
  3. Use dangerSync.py to log in and get some session credentials for half an hour of use:
    ~/Desktop/danger-accts/ACCT $ python ~/projects/palmagent/dangerSync.py \
    --prod --user ACCT \
    --passwd YOUR_PASSWORD_HERE \
    >session-id
  4. Visit http://0.0.0.0:8080/pim/ACCT and hit the Pull button.

Now you have event, task, contact, and note directories containing a JSON file for each record and hipwsgi.py lets you navigate them in a few different ways.

The pull feature is incremental; it grabs just the records that have changed since you previously pulled:

[page capture: "Pull majo from danger hiptop service", with "back to sync options" and "event" links]

 

The new feature today is the ical export, linked from the event categories page:

[page capture: "event", with a "back to sync options" link]

 

You can copy the address of that ical export link and subscribe to it from iCal, and bingo, there it is, merged with the SXSW calendar and such.

@@screenshot pending 
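For the curious, the export step amounts to something like the following Python sketch. It is not the palmagent code: the record fields ("id", "title", "start" and "end" as seconds since the epoch), the hiptop.example UID domain, and the function names are all assumptions made up for illustration, and it is written for Python 3.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def ical_escape(text):
    """Escape backslash, semicolon, comma and newline in an iCalendar TEXT value."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def utc_stamp(epoch_seconds):
    """Format seconds since the epoch as an iCalendar UTC date-time."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y%m%dT%H%M%SZ")


def event_to_vevent(record, now):
    """Turn one event record (hypothetical field names) into VEVENT lines."""
    return [
        "BEGIN:VEVENT",
        "UID:%s@hiptop.example" % record["id"],
        "DTSTAMP:" + utc_stamp(now),
        "DTSTART:" + utc_stamp(record["start"]),
        "DTEND:" + utc_stamp(record["end"]),
        "SUMMARY:" + ical_escape(record["title"]),
        "END:VEVENT",
    ]


def export_calendar(event_dir, now):
    """Read every JSON record in event_dir and return one VCALENDAR string."""
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//hiptop export sketch//EN"]
    for path in sorted(Path(event_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        lines.extend(event_to_vevent(record, now))
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"
```

Serving the result of export_calendar with a text/calendar content type is what lets a subscribing client such as iCal pick it up. Long lines are not folded here, which a more careful exporter would do.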

 

hAudio for microformats mixtapes, in progress

Submitted by connolly on Thu, 2008-03-06 17:00.

I was visiting a friend and I wanted to play Back When I Could Fly and the easiest way was to burn a CD and put it in their CD player and while I was at it I figured I might as well pick a few other songs... a sort of mixtape to say thanks for letting me crash there.

That sort of artifact is too precious to leave locked up in iTunes's proprietary format, even if it is XML; as I said in a July 2000 message:

There are very few data formats I trust... when I use the computer to capture my knowledge, I pretty much stick to plain text, [X]HTML, and email. I use JPG, PNG, and PDF if I must, but not for capturing knowledge for exchange, revision, etc.


So I wrote itunekb.py, which reads the iTunes data, picks out one playlist, and writes it out in hAudio format using a genshi template. The result is ordinary HTML at one level:

  1. Poems, Prayers And Promises by John Denver
    4:06 from A Song's Best Friend: The Very Best Of John Denver [Disc 1] (2004)
  2. Did You Feel The Mountains Tremble by Delirious?
    4:42 from WOW Worship: Orange (Disc 1) (2000)
  3. The Reason by Hoobastank
    3:52 from The Reason (2003)
  4. Back When I Could Fly by Trout Fishing In America
    3:29 from Family Music Party (1998)
  5. ...

At another level, it's yummy Semantic Web data.

Oops! Well, it used to be; but hAudio seems to be changing:

Here's hoping I find time to catch up.
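Meanwhile, the playlist-to-HTML step described above goes roughly like this Python 3 sketch. It is not itunekb.py: it uses string formatting where the real script uses a genshi template, the function names are made up, the class names (haudio, fn, contributor, duration, album) are my reading of one hAudio draft and may not match the current one, and the library keys are the ones I believe the iTunes XML export uses.

```python
import plistlib
from html import escape


def load_library(path):
    """Read an iTunes library XML export (a property list) into a dict."""
    with open(path, "rb") as fp:
        return plistlib.load(fp)


def playlist_tracks(library, playlist_name):
    """Return the track records of the named playlist, in playlist order."""
    for playlist in library["Playlists"]:
        if playlist["Name"] == playlist_name:
            return [library["Tracks"][str(item["Track ID"])]
                    for item in playlist.get("Playlist Items", [])]
    raise KeyError(playlist_name)


def format_duration(milliseconds):
    """Render a track length in milliseconds as minutes:seconds."""
    minutes, seconds = divmod(milliseconds // 1000, 60)
    return "%d:%02d" % (minutes, seconds)


def track_to_html(track):
    """One list item per track; class names are assumed from an hAudio draft."""
    return ('<li class="haudio"><span class="fn">%s</span> by '
            '<span class="contributor">%s</span> '
            '<span class="duration">%s</span> from '
            '<span class="album">%s</span></li>') % (
        escape(track["Name"]), escape(track["Artist"]),
        format_duration(track["Total Time"]), escape(track["Album"]))


def playlist_to_html(library, playlist_name):
    """Write the whole playlist as an ordered list."""
    items = "\n".join(track_to_html(t)
                      for t in playlist_tracks(library, playlist_name))
    return "<ol>\n%s\n</ol>" % items
```

Keeping the markup in one small function means that catching up with a changed hAudio draft should only touch track_to_html.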

I can only imagine...

Submitted by connolly on Sun, 2007-12-09 09:30.

I have a new bookmark. No, not a del.icio.us bookmark; not some bits in a file. This is the kind you have to go there to get... go to Cleveland, that is. It reads:

Thank you
for you love & support
for the Ikpia & Ogbuji families
At this time of real need.
We will never forget
Imose, Chica, & Anya

Imose, Chica, & Anya

Abundant Life International Church
Highland heights, OH

After working with Chime for a year or so on the GRDDL Working Group (he was the difference between a hodge-podge of test files and a nicely organized GRDDL Test Cases technical report), I was really excited to meet him at the W3C Technical Plenary in Cambridge in early November. His FuXi work is one of the best implementations of the way I think semantic web rules and proofs should go. When he told me some people didn't see the practical applications, it made me want to fly there and tell them how I think this will save lives and end world hunger.

So this past Tuesday, when I read the news about his family, the only way I could make my peace with it was to go and be with him. I can only imagine what he is going through. Eric Miller and Brian and David drove me to the funeral, but the line to say hi to the family was too long. And the interment service didn't really provide an opportunity to talk. So I was really glad that after I filled my plate at the reception, a seat across from Chime and Roschelle opened up for me and I got to sit and share a meal with them.

Grandpa Linus was at the table, too. His eulogy earlier at the funeral ended with the most inspiring spoken rendition of a song that I have ever heard:

Now The Hacienda's Dark The Town Is Sleeping
Now The Time Has Come To Part The Time For Weeping
Vaya Con Dios My Darling
Vaya Con Dios My Love

His eulogy is also posted as Grandparents' lament on The Kingdom Kids web site, along with details about a fund to help the family get back on their feet.

Free Culture: Why buy the Amazon Kindle when you can give and get an OLPC XO-1 for the same price?

Submitted by connolly on Tue, 2007-11-20 02:11.

I just discovered Kindle: Amazon's New Wireless Reading Device. About $10 per e-book sounds ok, but $0.10 to put my own files on it?!?! It can read blogs like Slashdot and boingboing for as little as $.99 per month over the $399 purchase price. It comes with wikipedia. Say... that sounds familiar... where else can I get wikipedia on a device with a nice display that works in daylight...

Oh yeah! The OLPC XO-1. For the same $400 (+ shipping) you can get one and give one away.

Håkon brought one to the video panel at the W3C TPAC this month, while the voice of Lawrence Lessig was still ringing in my head: What have we done about it? he asked again and again in his powerful OSCON 2002 talk:

Lawrence Lessig: I have been doing this for about two years--more than 100 of these gigs. This is about the last one. One more and it's over for me. So I figured I wanted to write a song to end it. But then I realized I don't sing and I can't write music. But I came up with the refrain, at least, right? This captures the point. If you understand this refrain, you're gonna' understand everything I want to say to you today. It has four parts:

  • Creativity and innovation always builds on the past.

  • The past always tries to control the creativity that builds upon it.

  • Free societies enable the future by limiting this power of the past.

  • Ours is less and less a free society.

 

I don't sing all that well either, but I play a little guitar, so when Håkon walked into the HTML WG meeting as un-conference pitches were next on the agenda, I pitched a jam session. I dedicated the opening number, With a Little Help from My Friends, to Sam Ruby whose comment prompted me to watch the Lessig show before the trip. The InstantGig was "surreal (but awesome)" according to one account.

Håkon's pitch for open standard video for our cultural heritage inspired One laptop per Kyle, the story of getting an XO-1 for my 8-year-old boy instead of the Windows PC he says he wants in order to play the games that his friends all play. Before the trip he told me that he wants to build a web site with lots and lots of games and I thought "but you're just one little boy." But I think I get it now...

He has a new name, by the way: Burn, as in Rip, Mix, and Burn. Rip, after 1 year of musical training, can sound out the Mario theme on trombone or piano in an afternoon, something I can't do after 20 years of training my mediocre ear. And the middle child, Mix, is so charming that if you stop at a red light, he'll have a new friend before the light turns green.

I have one give-one-get-one package on order for Burn; if you're feeling like a patron of the arts and you want to see what happens if Rip and Mix get one too, feel free to send us a Christmas Card with a little something inside.

And look out for SwordPedestal.com, which Kyle picked out. It's only a dream now, but I have a hunch it may one day rival Nintendo for the hearts and minds of a few million people.

brainstorming, issue tracking, and problem reporting... with tabulator?

Submitted by connolly on Mon, 2007-11-05 02:07.

I want to brainstorm about a bunch of issues in preparation for the HTML WG meeting this week. tracker's forms interface won't let me enter an issue until I fill out all the details... well, it doesn't really have that many constraints, except that it notifies the whole WG when an issue is added.

I want something more like the OmniOutliner experience... I want to brainstorm.

But when I'm done, I don't want to tediously copy and paste each field into tracker.

Clearly, I could write some python or XSLT to take OmniOutliner's XML and feed it to tracker afterward, but... can't we do better than that?

What if tabulator's UI were as smooth as OmniOutliner... and what if I could just push one button and get the toothpaste back in the tube, i.e. feed the outline into the tracker's REST interface?

p.s. why am I using emacs to write this? Apple Mail knows IntegrityIsJobOne, but in OS X 10.4, like iCal, it goes off into the weeds eating CPU for inexplicable reasons, and I don't invest debugging effort in stuff that isn't open source.

How do I feed this to breadcrumbs now? Does emacs have a markdown/ReStructuredText mode?
How about AtomPub support? I manually cut and pasted and cleaned up the line breaks. ugh!

I use Thunderbird on my PowerBook, but it's totally confused about offline operation. It goes to save to the drafts folder every now and again, but over IMAP... so if the net is flaky or down, (a) it doesn't actually save, and (b) it interrupts my drafting!!! OK... found the config option to use a local drafts folder under Tools/Account settings. (why not under preferences?) But Thunderbird doesn't do well filling the IMAP cache; I don't know to tell it to go offline until I've left the airport wifi, and at that point, it's too late to grab the mail I want to read. The Apple mailer does much better at using idle time to prefetch.

p.p.s. how do I use hReview and GRDDL to make the data in this gripe available as if it were a bugzilla entry? More on that to follow, I hope...