Map and Territory in RDF APIs

Submitted by connolly on Tue, 2010-04-27 14:30.

RDF specs and APIs have made a bit of a mess out of a couple of pretty basic tools of math and computing: graphs and logic formulas. With the RDF next steps workshop coming up, Pat Hayes re-thinking RDF semantics, and Sandro thinking out loud about RDF2, I'd like us to think about RDF in more traditional terms. The scala programming language seems to be an interesting framework in which to explore how they relate to RDF.

The Feb 1999 RDF spec wasn't very clear about the map and the territory. It said that statements are made out of parts in the territory, rather than features on the map, which doesn't make very much sense. RDF APIs seem to inherit this confusion; e.g. from an RDF::Value class for ruby:

Examples:

Checking if a value is a resource (blank node or URI reference)

value.resource

Blank nodes and URI references are parts of the map; resources are in the territory.

Likewise in Package org.jrdf.graph:

Resource: A resource stands for either a Blank Node or a URI Reference.

The 2004 RDF specs take great pains to clarify these use/mention distinctions, but they also go on at great length.

Let's review Wikipedia on graphs:

In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. ...  The edges may be directed (asymmetric) or undirected (symmetric) ... and the edges are called directed edges or arcs; ... graphs which have labeled edges are called edge-labeled graphs.


With that in mind, in the swap-scala project, we summarize the RDF abstract syntax as an edge-labelled directed graph with just one or two wrinkles:

package org.w3.swap.rdf

trait RDFGraphParts {
  type Arc = (SubjectNode, Label, Node)

  type Node
  type Literal <: Node
  type SubjectNode <: Node
  type BlankNode <: SubjectNode
  type Label <: SubjectNode
}

The wrinkles are:

  • Arcs can only start from BlankNodes or Labels, i.e. SubjectNodes
  • Arc labels may also appear as Nodes

We use another trait to relate concrete datatypes to these abstract types:

trait RDFNodeBuilder extends RDFGraphParts {
  def uri(i: String): Label

  type LanguageTag = Symbol
  def plain(s: String, lang: Option[LanguageTag]): Literal
  def typed(s: String, dt: String): Literal
  def xmllit(content: scala.xml.NodeSeq): Literal
}

This doesn't pin down what a Label is, but in any concrete implementation, you can build one from a String using the uri method. The RDFNodeBuilder trait is used to implement RDF/XML, RDFa, and turtle parsers that are agnostic to the concrete implementation of an RDF graph.
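For illustration only, a minimal concrete implementation might collapse every node type to String; SimpleBuilder is a made-up name, and a real implementation would want distinct types for labels, blank nodes, and literals:

// hypothetical sketch: about the simplest thing that satisfies RDFNodeBuilder
object SimpleBuilder extends RDFNodeBuilder {
  type Node = String
  type SubjectNode = String
  type BlankNode = String
  type Label = String
  type Literal = String

  def uri(i: String) = i
  def plain(s: String, lang: Option[LanguageTag]) = lang match {
    case Some(tag) => "\"" + s + "\"@" + tag.name
    case None      => "\"" + s + "\""
  }
  def typed(s: String, dt: String) = "\"" + s + "\"^^" + dt
  def xmllit(content: scala.xml.NodeSeq) = content.toString
}

Here SimpleBuilder.uri("http://example.org/vocab#x") is just the string itself; the point is that parsers written against RDFNodeBuilder never need to know.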

Now let's look at terms of first order logic:

 The set of terms is inductively defined by the following rules:

  1. Variables. Any variable is a term.
  2. Functions. Any expression f(t1,...,tn) of n arguments (where each argument ti is a term and f is a function symbol of valence n) is a term.

This is represented straightforwardly in scala à la:

package org.w3.swap.logic1

/**
 * A Term is either a Variable or a FunctionTerm.
 */
sealed abstract class Term { ... }

class Variable extends Term { ... }

abstract class FunctionTerm() extends Term {
  def fun: Any
  def args: List[Term]
}

The core RDF doesn't cover all of first order logic; it corresponds fairly closely to the conjunctive query fragment:

The conjunctive queries are simply the fragment of first-order logic given by the set of formulae that can be constructed from atomic formulae using conjunction ∧ and existential quantification ∃, but not using disjunction ∨, negation ¬, or universal quantification ∀.

We can then excerpt just the relevant parts of the definition of formulas:

The set of formulas is inductively defined by the following rules:

  1. Predicate symbols. If P is an n-ary predicate symbol and t1, ..., tn are terms then P(t1,...,tn) is a formula.
  2. Binary connectives. If φ and ψ are formulas, then (φ → ψ) is a formula. Similar rules apply to other binary logical connectives.
  3. Quantifiers. If φ is a formula and x is a variable, then ∀x φ and ∃x φ are formulas.

Our scala representation follows straightforwardly:
package org.w3.swap.logic1ec 

sealed abstract class ECFormula
case class Exists(vars: Set[Variable], g: And) extends ECFormula
sealed abstract class Ground extends ECFormula
case class And(fmlas: Seq[Atomic]) extends Ground
case class Atomic(rel: Symbol, args: List[Term]) extends Ground

Now that we have scala representations for RDF graphs and conjunctive query formulas, how do we relate them? This is the fun part:

package org.w3.swap.rdflogic

import swap.rdf.RDFNodeBuilder
import swap.logic1.{Term, FunctionTerm, Variable}
import swap.logic1ec.{Exists, And, Atomic, ECProver, ECFormula}

/**
 * RDF has only ground, 0-ary function terms.
 */
abstract class Ground extends FunctionTerm {
  override def fun = this
  override def args = Nil
}

case class Name(n: String) extends Ground
case class Plain(s: String, lang: Option[Symbol]) extends Ground
case class Data(lex: String, dt: Name) extends Ground
case class XMLLit(content: scala.xml.NodeSeq) extends Ground


/**
 * Implement RDF Nodes (except BlankNode) using FOL function terms.
 */
trait TermNode extends RDFNodeBuilder {
  type Node = Term
  type SubjectNode = Term
  type Label = Name

  def uri(i: String) = Name(i)

  type Literal = Term
  def plain(s: String, lang: Option[Symbol]) = Plain(s, lang)
  def typed(s: String, dt: String): Literal = Data(s, Name(dt))
  def xmllit(e: scala.xml.NodeSeq): Literal = XMLLit(e)
}

The abstract RDFNodeBuilder node types are implemented as first order logic terms. For formulas, we use a "holds" predicate:

object RDFLogic extends ... {
  def atom(s: Term, p: Term, o: Term): Atomic = {
    Atomic('holds, List(s, p, o))
  }
  def atom(arc: (Term, Term, Term)): Atomic = {
    Atomic('holds, List(arc._1, arc._2, arc._3))
  }
}

Then all the semantic machinery up to simple entailment between RDF graphs just falls out of conjunctive query.
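To make that concrete, here is a sketch of the graph-to-formula mapping, assuming blank nodes are implemented as Variables; graphFormula is a made-up name, not necessarily what swap-scala calls it:

// hypothetical sketch: a graph, i.e. a set of arcs, becomes an
// existentially quantified conjunction of 'holds atoms
def graphFormula(arcs: Seq[(Term, Term, Term)]): ECFormula = {
  val vars = arcs.flatMap { case (s, p, o) => List(s, p, o) }
                 .collect { case v: Variable => v }.toSet
  Exists(vars, And(arcs.map(a => RDFLogic.atom(a))))
}

Simple entailment then reduces to conjunctive-query answering: G1 entails G2 iff some binding of G2's variables to terms of G1 turns every atom of G2 into an atom of G1.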

I haven't done RDFS Entailment yet; the plan is to do basic rules first (N3rules or RIF BLD) and then use that for RDFS, OWL2-RL, and the like.

Accountability Appliances: What Lawyers Expect to See - Part III (User Interface)

I've written in the last two blogs about how lawyers operate in a very structured environment. This will have a tremendous impact on what they'll consider acceptable in a user interface. They might accept something which seems a bit like an outline or a form, but years of experience tell me that they will rail at anything code-like.

For example, we see

:MList a rdf:List

and automatically read

"MList" is the name of a list written in rdf

Or,

air:pattern {
:MEMBER air:in :MEMBERLIST.
}


and know that we are asking our system to look for a pattern in the data in which a particular "member" is in a particular list of members. Perhaps because law already involves learning to read, speak, and think in another language, most lawyers look at lines like those above and see no meaning.

Our current work-in-progress produces output that includes:


bjb reject bs non compliant with S9Policy 1

Because

phone record 2892 category HealthInformation

Justify

bs request instruction bs request content
type Request
bs request content intended beneficiary customer351
type Benefit Action Instruction
customer351 location MA
xphone record 2892 about customer351



Nearly every output item is a hotlink to something which provides definition, explanation, or derivation. Much of it is in "Tabulator", the cool tool that aggregates just the bits of data we want to know.

From a user-interface-for-lawyers perspective, this version of output is an improvement over our earlier ones because it removes a lot of things programmers do to solve computation challenges. It removes colons and semi-colons from places they're not commonly used in English (i.e., at the beginning of a term) and mostly uses words that are known to the general population. It also parses "humpbacks" - the programmers' traditional concatenation of a string of words - back into separate words. And it replaces hyphens and underscores - also used for concatenation - with blank spaces.

At last week's meeting, we talked about the possibility of generating output which simulates short English sentences. These might be stilted but would be most easily read by lawyers. Here's my first attempt at the top-level template:

 

Issue: Whether the transactions in [TransactionLogFilePopularName] {about [VariableName] [VariableValue]} comply with [MasterPolicyPopularName]?

Rule: To be compliant, [SubPolicyPopularName] of [MasterPolicyPopularName] requires [PatternVariableName] of an event to be [PatternValue1].

Fact: In transaction [TransactionNumber] [PatternVariableName] of the event was [PatternValue2].

Analysis: [PatternValue2] is not [PatternValue].

Conclusion: The transactions appear to be non-compliant with [SubPolicyName] of [MasterPolicyPopularName].



This seems to me approximately correct in the context of requests for the appliance to reason over millions of transactions with many sub-rules. A person seeking an answer from the system would create the Issue question. The Issue question is almost always going to ask whether some series of transactions violated a super-rule and often will have a scope limiter (e.g., in regards to a particular person or within a date scope or by one entity), denoted here by {}.

From the lawyer perspective, the interesting part of the result is the finding of non-compliance or possible non-compliance. So, the remainder of the output would be generated to describe only the failure(s) in a pattern-matching for one or more sub-rules. If there's more than one violation, the interface would display the Issue once and then the Rule to Conclusion steps for each non-compliant result.
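A rough sketch of how one such Rule-to-Conclusion block might be generated from a single non-compliant result; every name here is illustrative, not our actual vocabulary:

// hypothetical sketch: render one non-compliance finding as stilted English
case class Finding(subPolicy: String, masterPolicy: String,
                   patternVar: String, required: String,
                   transaction: String, actual: String)

def render(f: Finding): String =
  "Rule: To be compliant, " + f.subPolicy + " of " + f.masterPolicy +
    " requires " + f.patternVar + " of an event to be " + f.required + ".\n" +
  "Fact: In transaction " + f.transaction + " " + f.patternVar +
    " of the event was " + f.actual + ".\n" +
  "Analysis: " + f.actual + " is not " + f.required + ".\n" +
  "Conclusion: The transactions appear to be non-compliant with " +
    f.subPolicy + " of " + f.masterPolicy + "."

The Issue line would be rendered once, from the user's original question, and render would run once per failed sub-rule.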

I tried the bracketed template out on a lawyer I know. He insisted it was unintelligible when the []'s were left in but said it was manageable when he saw the same text without them.


For our Scenario 9, Transaction 15, an idealized top level display would say:


Issue: Whether the transactions in Xphone's Customer Service Log about Person Bob Same comply with MA Disability Discrimination Law?

Rule: To be compliant, Denial of Service Rule of MA Disability Discrimination Law requires reason of an event to be other than disability.

Fact: In transaction Xphone Record 2892 reason of the event was Infectious Disease.

Analysis: Infectious disease is not other than disability.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.



Each one of the bound values should have a hotlink to a Tabulator display that provides background or details.



Right now, we might be able to produce:


Issue: Whether the transactions in Xphone's Customer Service Log about Betty JB reject Bob Same comply with MA Disability Discrimination Law?

Rule: To be non-compliant, Denial of Service Rule of MA Disability Discrimination Law requires REASON of an event to be category Health Information.

Fact: In transaction Xphone Record 2892 REASON of the event was category Health Information.

Analysis: category Health Information is category Health Information.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.




This example highlights a few challenges.

1) It's possible that only failures of policies containing comparative matches (e.g., :v1 sameAs :v2; :v9 greaterThan :v3; :v12 withinDateRange :v4) are legally relevant. This needs more thought.

2) We'd need to name every sub-policy or have a default called UnnamedSubPolicy.

3) We'd need to be able to translate statute numbers to popular names and have a default instruction to include the statute number when no popular name exists.

4) We'd need some taxonomies (e.g., infectious disease is a sub-class of disability).

5) In a perfect world, we'd have some way to trigger a couple of alternative displays. For example, it would be nice to be able to trigger one of two rule structures: either one that says a rule requires a match or one that says a rule requires a non-match. The reason for this is that if we always have to use the same structure, about half of the outputs will be very stilted and cause the lawyers to struggle to understand.

6) We need some way to deal with something the system can't reason about. If the law requires the reason to be disability and the system doesn't know whether health information is the same as or different from disability, then it ought to be able to produce an analysis that says something along the lines of "The relationship between Health Information and disability is unknown" and a conclusion that says "Whether the transaction is compliant is unknown." If we're reasoning over millions of transactions, there are likely to be quite a few of these, and they ought to be presented after the non-compliant ones.

I can only imagine...

Submitted by connolly on Sun, 2007-12-09 09:30.

I have a new bookmark. No, not a del.icio.us bookmark; not some bits in a file. This is the kind you have to go there to get... go to Cleveland, that is. It reads:

Thank you
for you love & support
for the Ikpia & Ogbuji families
At this time of real need.
We will never forget
Imose, Chica, & Anya

Abundant Life International Church
Highland heights, OH

After working with Chime for a year or so on the GRDDL Working Group (he was the difference between a hodge-podge of test files and a nicely organized GRDDL Test Cases technical report), I was really excited to meet him at the W3C Technical Plenary in Cambridge in early November. His Fuxi work is one of the best implementations of the way I think semantic web rules and proofs should go. When he told me some people didn't see the practical applications, it made me want to fly there and tell them how I think this will save lives and end world hunger.

So this past Tuesday, when I read the news about his family, the only way I could make my peace with it was to go and be with him. I can only imagine what he is going through. Eric Miller and Brian and David drove me to the funeral, but the line to say hi to the family was too long. And the interment service didn't really provide an opportunity to talk. So I was really glad that after I filled my plate at the reception, a seat across from Chime and Roschelle opened up for me and I got to sit and share a meal with them.

Grandpa Linus was at the table, too. His eulogy earlier at the funeral ended with the most inspiring spoken rendition of a song that I have ever heard:

Now The Hacienda's Dark The Town Is Sleeping
Now The Time Has Come To Part The Time For Weeping
Vaya Con Dios My Darling
Vaya Con Dios My Love

His eulogy is also posted as Grandparents' lament on The Kingdom Kids web site, along with details about a fund to help the family get back on their feet.

FOAF and OpenID: two great tastes that taste great together

Submitted by connolly on Wed, 2007-10-24 23:00.

As Simon Willison notes, OpenID solves the identity problem, not the trust problem. Meanwhile, FOAF and RDF are potential solutions to lots of problems but not yet actual solutions to very many. I think they go together like peanut butter and chocolate, creating a deliciously practical testbed for our Policy Aware Web research.

Our struggle to build a community is fairly typical:

In Dec 2006, Ryan did a Drupal upgrade that included OpenID support, but that only held the spammers back for a couple weeks. Meanwhile, Six Apart is Opening the Social Graph:

 

... if you manage a social networking service, we strongly encourage you to embrace OpenID, hCard, XFN, FOAF and the other open standards around data portability.

With that in mind, a suggestion to outsource to a centralized commercial blog spam filtering service seemed like a step in the wrong direction; we are the Decentralized Information Group after all; time to eat our own cooking!

The policy we have working right now is, roughly: you can comment on our blog if you're a friend of a friend of a member of the group.

In more detail, you can comment on our blog if:

  1. You can show ownership of a web page via the OpenID protocol.
  2. That web page is related by the foaf:openid property to a foaf:Person, and
  3. That foaf:Person is
    1. listed as a member of the DIG group in http://dig.csail.mit.edu/data, or
    2. related to a dig member by one or two foaf:knows links.

The implementation has two components so far:

  • an enhancement to drupal's OpenID support to check a whitelist
  • a FOAF crawler that generates a whitelist periodically
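The whitelist step itself is simple once the foaf:knows links have been crawled; a sketch in Scala, with made-up names and a plain Map standing in for the RDF store:

// hypothetical sketch: members plus everyone within two foaf:knows
// links of a member end up on the comment whitelist
def whitelist(members: Set[String],
              knows: Map[String, Set[String]]): Set[String] = {
  val oneHop = members.flatMap(m => knows.getOrElse(m, Set.empty))
  val twoHop = oneHop.flatMap(p => knows.getOrElse(p, Set.empty))
  members ++ oneHop ++ twoHop
}

The real work is in the crawler: dereferencing each FOAF file, following rdfs:seeAlso links, and recording which pages each foaf:Person claims via foaf:openid.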

We're looking into policies such as You can comment if you're in a class taught by a DIG group member, but there are challenges reconciling policies protecting privacy of MIT students with this approach.

We're also interested in federating with other communities. The Advogato community is particularly interesting because

  1. The DIG group is pretty into Open Source, the core value of advogato.
  2. Advogato's trust metric is designed to be robust in the face of spammers and seems to work well in practice.

So I'd like to be able to say You can comment on our blog if you're certified Journeyer or above in the Advogato community. Advogato has been exporting basic foaf:name and foaf:knows data since a Feb 2007 update, but they didn't export the results of the trust metric computation in RDF.

Asking for that data in RDF has been on my todo list for months, but when Sean Palmer found out about this OpenID and FOAF stuff, he sent an enhancement request, and Steven Rainwater joined the #swig channel to let us alpha test it in no time. Sean also did a nice write-up.

This is a perfect example of the sort of integration of statistical methods into the Semantic Web that we have been talking about as far back as our DAML proposal in 2000:

Some of these systems use relatively simple and straightforward manipulation of well-characterized data, such as an access control system. Others, such as search engines, use wildly heuristic manipulations to reach less clearly justified but often extremely useful conclusions. In order to achieve its potential, the Semantic Web must provide a common interchange language bridging these diverse systems. Like HTML, the Semantic Web language should be basic enough that it does not impose an undue burden on the simplest web software systems, but powerful enough to allow more sophisticated components to use it to advantage as well.

Now we just have to enhance our crawler to get that data or otherwise integrate it with the drupal whitelist. (I'm particularly interested in using GRDDL to get FOAF data right from the OpenID page; stay tuned for more on that.) And I guess we need Advogato to provide a user interface for foaf:openid support... or maybe links to supplementary FOAF files via rdfs:seeAlso or owl:sameAs.

Soccer schedules, flight itineraries, timezones, and python web frameworks

Submitted by connolly on Wed, 2007-09-12 17:17.

The schedule for this fall soccer season came out August 11th. I got the itinerary for the trip I'm about to take on July 26. But I just now got them synchronized with the family calendar.

The soccer league publishes the schedule in somewhat reasonable HTML; to get that into my sidekick, I have a Makefile that does these steps:

  1. Use tidy to make the markup well-formed.
  2. Use 100 lines of XSLT (soccer-schedfix.xsl) to add hCalendar markup.
  3. Use glean-hcal.xsl to get RDF calendar data.
  4. Use hipAgent.py to upload the calendar items via XMLRPC to the danger/t-mobile service, which magically updates the sidekick device.

But oops! The timezones come out wrong. Ugh... manually fix the times of 12 soccer games... better than manually keying in all the data... then sync with the family calendar. My usual calendar sync Makefile does the following:

  1. Use dangerSync.py to download the calendar and task data via XMLRPC.
  2. Use hipsrv.py to filter by category=family, convert from danger/sidekick/hiptop conventions to iCalendar standard conventions, and pour the records into a kid template to produce RDF Calendar (and hCalendar).
  3. Use toIcal.py to convert RDF Calendar to .ics format.
  4. Upload to family WebDAV server using curl.

Then check the results on my mac to make sure that when my wife refreshes her iCal subscriptions it will look right.

Oh no! The timezones are wrong again!

The sidekick has no visible support for timezones, but the start_time and end_time fields in the XMLRPC interface are in Z/UTC time, and there's a timezone field. However, after years with this device, I'm still mystified about how it works. The Makefiles approach is not conducive to tinkering at this level, so I worked on my REST interface, hipwsgi.py until it had crude support for editing records (using JSON syntax in a form field). What I discovered is that once you post an event record with a mixed up timezone, there's no way to fix it. When you use the device UI to change the start time, it looks OK, but the Z time via XMLRPC is then wrong.

So I deleted all the soccer game records, carefully factored the danger/iCalendar conversion code out of hipAgent.py into calitems.py for ease of testing, and got it working for local Chicago-time events.

Then I went through the whole story again with my itinerary. Just replace tidy and soccer-schedfix.xsl with flightCal.py to get the itinerary from SABRE's text format to hCalendar:

  1. Upload itinerary to the sidekick.
  2. Manually fix the times.
  3. Sync with iCal. Bzzt. Off by several hours.
  4. Delete the flights from the sidekick.
  5. Work on calitems.py some more.
  6. Upload to the sidekick again. Ignore the sidekick display, which is right for the parts of the itinerary in Chicago, but wrong for the others.
  7. Sync with iCal. Win!

I suppose I'm resigned to the fact that the only way to get the XMLRPC POST/upload right (the stored Z times, at least, if not the display) is to know what timezone the device is set to when the POST occurs. Sigh.
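That is, the upload code has to do the zone arithmetic itself before POSTing; something like this sketch (in Scala with java.time, which is anachronistic for 2007 but shows the idea; the device zone is supplied out of band):

import java.time.{LocalDateTime, ZoneId}

// sketch: the stored Z/UTC time depends on the zone the device is set
// to at POST time, so convert explicitly rather than trusting the sync
def toZTime(local: LocalDateTime, deviceZone: ZoneId): String =
  local.atZone(deviceZone).toInstant.toString

// e.g. toZTime(LocalDateTime.parse("2007-09-15T08:15"),
//              ZoneId.of("America/Chicago"))
// gives "2007-09-15T13:15:00Z" (CDT is UTC-5)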

A March 2005 review corroborates my findings:

The Sidekick and the sync software do not seem to be aware of time zones. That means that your PC and your Sidekick have to be configured for the same time zone when they synchronize, else your appointments will be all wrong.

hipwsgi.py is about my 5th iteration on this idea of a web server interface to my PDA data. It uses WSGI and JSON and Genshi, following Joe G's stuff. Previous iterations include:

  1. pdkb.pl - quick n dirty perl hack (started April 2001)
  2. hipAgent.py - screen scraping (Dec 2002)
  3. dangerSync.py - XMLRPC with a python shelf and hardcoded RDF/XML output (Feb 2004)
  4. hipsrv.py - conversion logic in python with kid templates and SPARQL-like filters over JSON-shaped data (March 2006)

It's pretty raw right now, but fleshing out the details looks like fun. Wish me luck.

Units of measure and property chaining

Submitted by connolly on Tue, 2007-07-31 13:42.

We're long overdue for standard URIs for units of measure in the Semantic Web.

The SUMO stuff has a nice browser (e.g. see meter), a nice mapping from wordnet, and nice licensing terms. Of course, it's not RDF-native. In particular, it uses n-ary relations in the form of functions of more than one argument; 1 hour is written (&%MeasureFn 1 &%HourDuration). I might be willing to work out a mapping for that, but other details in the KIF source bother me a bit: a month is modelled conservatively as something between 28 and 31 days, but a year is exactly 365 days, despite leap-years. Go figure.

There's a nice Units in MathML note from November 2003, but all the URIs are incomplete, e.g. http://.../units/yard .

The Sep 2006 OWL Time Working Draft has full URIs such as http://www.w3.org/2006/time#seconds, but its approach to n-ary relations is unsound, as I pointed out in a Jun 2006 comment.

Tim sketched the Interpretation Properties idiom back in 1998; I don't suppose it fits in OWL-DL, but it appeals to me quite a bit as an approach to units of measure. He just recently fleshed out some details in http://www.w3.org/2007/ont/unit. Units of measure are modelled as properties that relate quantities to magnitudes; for example:

 track length [ un:mile 0.25].

This Interpretation Properties approach allows us to model composition of units in the natural way:

W is o2:chain of (A V).

where o2:chain is like property chaining in OWL 1.1 (we hope).

Likewise, inverse units are modelled as inverse properties:

s a Unit; rdfs:label "s".
hz rdfs:label "Hz"; owl:inverseOf s.

Finally, scalar conversions are modelled using product; for example, mile is defined in terms of meter like so:

(m 0.0254) product inch.
(inch 12) product foot.
(foot 3) product yard.
(yard 22) product chain.
(chain 10) product furlong.
(furlong 8) product mile.

I supplemented his ontology with some test/example cases, unit_ex.n3, and then added a few rules to flesh out the modelling. These rules convert between meters and miles:

# numeric multiplication associates with unit multiplication
{ (?U1 ?S1) un:product ?U2.
  (?U2 ?S2) un:product ?U3.
  (?S1 ?S2) math:product ?S3
} => { (?U1 ?S3) un:product ?U3 }.

# scalar conversions between units
{ ?X ?UNIT ?V.
  (?BASE ?CONVERSION) un:product ?UNIT.
  (?V ?CONVERSION) math:product ?V2.
} => { ?X ?BASE ?V2 }.

Put them together and out comes:

    ex:track     ex:length  [
:chain 20.0;
:foot 1320.0;
:furlong 2.0;
:inch 15840.0;
:m 402.336;
:mile 0.25;
:yard 440.0 ] .
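As a sanity check, multiplying the conversion factors out in plain code reproduces those numbers; this throwaway sketch is not part of the ontology:

// chain the scalar factors from meter up to mile
val metersPerFoot = 0.0254 * 12          // 0.3048
val metersPerYard = metersPerFoot * 3    // 0.9144
val metersPerMile = metersPerYard * 22 * 10 * 8  // 1609.344

val trackMeters = 0.25 * metersPerMile   // 402.336, matching :m above
val trackYards  = trackMeters / metersPerYard    // 440.0, matching :yard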

The rules I wrote for pushing conversion factors into chains aren't fully general, but they work in cases like converting from this:

(un:foot un:hz) o2:chain fps.
bullet speed [ fps 4000 ].

to this:

    ex:bullet     ex:speed  [
ex:fps 4000;
:mps 1219.2 ] .

As I say, I find this approach quite appealing. I hope to discuss it with people working on units of measure in development of a Delivery Context Ontology.

A design for web content labels built from GRDDL and rules

Submitted by connolly on Thu, 2007-01-25 13:35.

In #swig discussion, Tim mentioned he did some writing on labels and rules and OWL which prompted me to flesh out some related ideas I had. The result is a Makefile and four tests with example labels. One of them is:

All resources on example.com are accessible for all users and meet WAI AA guidelines except those on visual.example.com which are not suitable for users with impaired vision.

I picked an XML syntax out of the air and wrote visaa.lbl:

<label
    xmlns="http://www.w3.org/2007/01/lbl22/label"
    xmlns:mobilebp="http://www.w3.org/2007/01/lbl22/mobilebp@@#"
    xmlns:wai="http://www.w3.org/2007/01/lbl22/wai@@#"
    >
  <scope>
    <domain>example.com</domain>
    <except>
      <domain>visual.example.com</domain>
    </except>
  </scope>
  <audience>
    <wai:AAuser />
  </audience>
</label>

And then in testdata.ttl we have:

<http://example.com/pg1simple> a webarch:InformationResource.
<http://visual.example.com/pg2needsVision> a webarch:InformationResource.
:charlene a wai:AAuser.

Then we run the test thusly...

$ make visaa_test.ttl
xsltproc --output visaa.rdf label2rdf.xsl visaa.lbl
python ../../../2000/10/swap/cwm.py visaa.rdf lblrules.n3 owlAx.n3 testdata.ttl \
  --think --filter=findlabels.n3 --n3 >visaa_test.ttl

and indeed, it concludes:

    <http://example.com/pg1simple>     lt:suitableFor :charlene .

but doesn't conclude that pg2needsVision is OK for charlene.

The .lbl syntax is RDF data via GRDDL and label2rdf.xsl. Then owlAx.n3 contains rules that derive from the RDFS and OWL specs; i.e. stuff that's already standard. As Tim wrote, a label is a fairly direct use of OWL restrictions. This is very much the sort of thing OWL is designed for. Only the lblrules.n3 bit goes beyond what's standardized, and it's written in the N3 Rules subset of N3, which, assuming a few built-ins, maps pretty neatly to recent RIF designs.

A recent item from Bijan notes a SPARQL-rules design by Axel; I wonder if these rules fit in that design too. I hope to take a look soonish.

She's a witch and I have the proof (in N3)

Submitted by connolly on Tue, 2007-01-02 22:28.

A while back, somebody turned the Monty Python Burn the Witch sketch into an example resolution proof. Bijan and Kendall had some fun turning it into OWL.

I'm still finding bugs pretty regularly, but the cwm/n3 proof stuff is starting to mature; it works for a few PAW demo scenarios. Ralph asked me to characterize the set of problems it works for. I don't have a good handle on that, but this witch example seems to be in the set.

Transcribing the example resolution FOL KB to N3 is pretty straightforward; the original is preserved in the comments:


@prefix : <witch#>.
@keywords is, of, a.

#[1] BURNS(x) /\ WOMAN(x) => WITCH(x)

{ ?x a BURNS. ?x a WOMAN } => { ?x a WITCH }.

#[2] WOMAN(GIRL)
GIRL a WOMAN.

#[3] \forall x, ISMADEOFWOOD(x) => BURNS(x)
{ ?x a ISMADEOFWOOD. } => { ?x a BURNS. }.

#[4] \forall x, FLOATS(x) => ISMADEOFWOOD(x)
{ ?x a FLOATS } => { ?x a ISMADEOFWOOD }.

#[5] FLOATS(DUCK)

DUCK a FLOATS.

#[6] \forall x,y FLOATS(x) /\ SAMEWEIGHT(x,y) => FLOATS(y)

{ ?x a FLOATS. ?x SAMEWEIGHT ?y } => { ?y a FLOATS }.

# and, by experiment
# [7] SAMEWEIGHT(DUCK,GIRL)

DUCK SAMEWEIGHT GIRL.

Then we run cwm to generate the proof and then run the proof checker in report mode:

$ cwm.py witch.n3  --think --filter=witch-goal.n3  --why >witch-pf.n3
$ check.py --report witch-pf.n3 >witch-pf.txt

The report is plain text; I'll enrich it just a bit here. Note that in the N3 proof format, some formulas are elided. It makes some sense not to repeat the whole formula you get by parsing an input file, but I'm not sure why cwm elides results of rule application. It seems to give the relevant formula on the next line, at least:

  1. ...
    [by parsing <witch.n3>]

  2. :GIRL a :WOMAN .
    [by erasure from step 1]

  3. :DUCK :SAMEWEIGHT :GIRL .
    [by erasure from step 1]

  4. :DUCK a :FLOATS .
    [by erasure from step 1]

  5. @forAll :x, :y . { :x a wit:FLOATS; wit:SAMEWEIGHT :y . } log:implies {:y a wit:FLOATS . } .
    [by erasure from step 1]

  6. ...
    [by rule from step 5 applied to steps [3, 4]
    with bindings {'y': '<witch#GIRL>', 'x': '<witch#DUCK>'}]


  7. :GIRL a :FLOATS .
    [by erasure from step 6]

  8. @forAll :x . { :x a wit:FLOATS . } log:implies {:x a wit:ISMADEOFWOOD . } .
    [by erasure from step 1]

  9. ...
    [by rule from step 8 applied to steps [7]
    with bindings {'x': '<witch#GIRL>'}]


  10. :GIRL a :ISMADEOFWOOD .
    [by erasure from step 9]

  11. @forAll :x . { :x a wit:ISMADEOFWOOD . } log:implies {:x a wit:BURNS . } .
    [by erasure from step 1]

  12. ...
    [by rule from step 11 applied to steps [10]
    with bindings {'x': '<witch#GIRL>'}]

  13. :GIRL a :BURNS .
    [by erasure from step 12]

  14. @forAll witch:x . { witch:x a :BURNS, :WOMAN . } log:implies {witch:x a :WITCH . } .
    [by erasure from step 1]

  15. ...
    [by rule from step 14 applied to steps [2, 13]
    with bindings {'x': '<witch#GIRL>'}]


  16. :GIRL a :WITCH .
    [by erasure from step 15]


All the files are in the swap/test/reason directory: witch.n3, witch-goal.n3, witch-pf.n3, witch-pf.txt. Enjoy.

Modelling HTTP cache configuration in the Semantic Web

Submitted by connolly on Fri, 2006-12-22 19:10.

The W3C Semantic Web Interest Group is considering URI best practices, whether to use LSIDs or HTTP URIs, etc. I ran into some of them at MIT last week. At first it sounded like they wanted some solution so general it would solve the only two hard things in Computer Science: cache invalidation and naming things, as Phil Karlton would say. But then we started talking about a pretty interesting approach: using the semantic web to model cache configuration. It has long been a thorn in my side that there is no standard/portable equivalent of .htaccess files, no RDF schema for HTTP and MIME, etc.

At WWW9 in May 2000, I gave a talk on formalizing HTTP caching. Where I used larch there, I'd use RDF, OWL, and N3 rules, today. I made some progress in that direction in August 2000: An RDF Model for GET/PUT and Document Management.

Web Architecture: Protocols for State Distribution is a draft I worked on around 1996 to 1999 without ever really finishing it.

I can't find Norm Walsh's item on wwwoffle config, but I did find his XML 2003 paper Caching in with Resolvers:

This paper discusses entity resolvers, caches, and other strategies for dealing with access to sporadically available resources. Our principle focus is on XML Catalogs and local proxy caches. We’ll also consider in passing the ongoing debate of names and addresses, most often arising in the context of URNs vs. URLs.

In Nov 2003 I worked on Web Architecture Illustrated with RDF diagramming tools.

The tabulator, as it's doing HTTP, propagates stuff like content type, last modified, etc. from javascript into its RDF store. Meanwhile, the accessibility evaluation and repair folks just released HTTP Vocabulary in RDF. I haven't managed to compare the tabulator's vocabulary with that one yet. I hope somebody does soon.

And while we're doing this little survey, check out the Uri Template stuff by Joe Gregorio and company. I haven't taken a very close look yet, but I suspect it'll be useful for various problems, if not this one in particular.

A new Basketball season brings a new episode in the personal information disaster

Submitted by connolly on Thu, 2006-11-16 12:39.

Basketball season is here. Time to copy my son's schedule to my PDA. The organization that runs the league has their schedules online. (yay!) in HTML. (yay!). But with events separated by all <br>s rather than enclosed in elements. (whimper). Even after running it thru tidy, it looks like:

<br />
<b>Event Date:</b> Wednesday, 11/15/2006<br>
<b>Start Time:</b> 8:15<br />
<b>End Time:</b> 9:30<br />
...
<br />
<b>Event Date:</b> Wednesday, 11/8/2006<br />
<b>Start Time:</b> 8:15<br />

So much for XSLT. Time for a nasty perl hack.

Or maybe not. Between my no more undocumented, untested code new year's resolution and the maturity of the python libraries, my usual doctest-driven development worked fine; I was able to generate JSON-shaped structures without hitting that "oh screw it; I'll just use perl" point; the gist of the code is:

import re
import sys

import kid  # the kid templating package

def main(argv):
    dataf, tplf = argv[1], argv[2]
    tpl = kid.Template(file=tplf)
    tpl.events = eachEvent(file(dataf))

    for s in tpl.generate(output='xml', encoding='utf-8'):
        sys.stdout.write(s)

def eachEvent(lines):
    """turn an iterator over lines into an iterator over events

    Assumes a 'Last Name' line precedes the first 'Event Date' line.
    """
    e = None
    for l in lines:
        if 'Last Name' in l:
            surname = findName(l)
            e = mkobj("practice", "Practice w/%s" % surname)
        elif 'Event Date' in l:
            if 'dtstart' in e:
                yield e
                e = mkobj("practice", "Practice w/%s" % surname)
            e['date'] = findDate(l)
        elif 'Start Time' in l:
            e['dtstart'] = e['date'] + "T" + findTime(l)
        elif 'End Time' in l:
            e['dtend'] = e['date'] + "T" + findTime(l)
    if e is not None and 'dtstart' in e:
        yield e  # don't drop the final event

next = 0
def mkobj(pfx, summary):
    global next
    next += 1
    return {'id': "%s%d" % (pfx, next),
            'summary': summary,
            }

def findTime(s):
    """
    >>> findTime("<b>Start Time:</b> 8:15<br />")
    '20:15:00'
    >>> findTime("<b>End Time:</b> 9:30<br />")
    '21:30:00'
    """
    m = re.search(r"(\d+):(\d+)", s)
    hh, mm = int(m.group(1)), int(m.group(2))
    # the schedule gives no am/pm; games are all evening, so assume pm
    return "%02d:%02d:00" % (hh + 12, mm)

...

It uses my palmagent hackery: event-rdf.kid to produce RDF/XML which hipAgent.py can upload to my PDA. I also used the event.kid template to generate an hCalendar/XHTML version for archival purposes, though I didn't use that directly to feed my PDA.

The development took half an hour or so squeezed into this morning:

changeset:   5:7d455f25b0cc
user:        Dan Connolly http://www.w3.org/People/Connolly/
date:        Thu Nov 16 11:31:07 2006 -0600
summary:     id, seconds in time, etc.

changeset:   2:2b38765cec0f
user:        Dan Connolly http://www.w3.org/People/Connolly/
date:        Thu Nov 16 09:23:15 2006 -0600
summary:     finds date, dtstart, dtend, and location of each event

changeset:   1:e208314f21b2
user:        Dan Connolly http://www.w3.org/People/Connolly/
date:        Thu Nov 16 09:08:01 2006 -0600
summary:     finds dates of each event
