Fun with Embedded RDF and DOAP for the GRDDL profile

Submitted by connolly on Sat, 2006-02-25 01:13. ::

I just updated the GRDDL profile page to document the easy way to use GRDDL before the hard way.

This past week I swapped in all sorts of travel details and redid the travel schedule on my homepage as a result; I switched to hCalendar rather than my own home-grown schedule dialect while I was at it, and replaced my FOAF dialect with Embedded RDF.

Since that went so well, and I don't care to go mucking with XHTML at the DTD level, I took the RDDL and XLink markup out of the GRDDL profile page and replaced it with embedded RDF.

In order to document the domains and ranges of the GRDDL properties, their id attributes had to be on an element that dominates those links, i.e. it had to be on the dd element rather than the previous tt element. That clashes with the way XMDP uses id attributes, so I'm not using XMDP markup either. I'm just using Embedded RDF for the RDF Schema bits.

I figured the link to the mailing list archive deserved to be a 1st-class link, so I used DOAP. The DOAP schema is quite tabulator-happy. Good work there.

Investigating logical reflection, constructive proof, and explicit provability

Submitted by connolly on Thu, 2006-02-16 02:38. ::

Prompted by a question about RDF schemas for dollars, gallons, and other units, I found myself looking into SUMO again.

The SUMO/wordnet mappings are great, and the SUMO time concepts are backed by a nice KIF formalization of Allen's work, but it's overly constrained; I prefer the cyc style where the before and after relations apply to conferences and events, without indirection via time intervals.

But what got me really thinking was the way holdsDuring takes formulas as arguments, despite the claim that SUMO is written in SUO-KIF, which has pretty traditional first-order syntax and semantics. I eventually found this addressed explicitly:

... SUMO does include a number of predicates that take formulae as arguments, such as holdsDuring, believes, and KappaFn. In Sigma, we also perform a number of "tricks" which allow the user to state things which appears to be higher order, but which are in fact first order and have a simple syntactic transformation to standard first order form.

How does SUMO employ higher order logic? in the Ontology Portal FAQ, March 2005

I'm curious about how variables are handled in these tricks. The code is all there, and I downloaded it, but I didn't get very far.

I actually don't think first-order semantics are the best fit for the Semantic Web, where formulas refer to documents and the formulas they contain; reflection is a key mechanism, and prohibiting loops is unscalable. I think constructive proof is a much better way to think about proofs in N3.

I discovered Artemov's explicit provability stuff a while ago; my notes date from September 2003:

... explicit provability provides a tool of increasing the efficiency of a formal system by verified rules without explosion of the metamathematical level of the system.

So I dug out my LogicOfProofs larch transcription notes and the Explicit provability and constructive semantics article from the Bulletin of Symbolic Logic, volume 7, No.1, pp. 1-36, 2001 and started working on lp.n3, and an N3 rule for the constructive form of modus ponens:

# application
{ (t s) comp ts.
  t proves { F => G }.
  s proves F
} => { ts proves G }.

that is: if t is an algorithm for proving that F implies G, and s is an algorithm for proving F, then the composition of t and s is an algorithm for proving G. This Curry-Howard correspondence is really nifty.
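
Read through the Curry-Howard lens, that rule is just function application. Here's a minimal Python sketch of the idea (all names here are invented for illustration, not taken from lp.n3): a proof of F => G is modeled as a function from proofs of F to proofs of G, and the composition step applies it.

```python
# Curry-Howard sketch: proofs of implications are functions;
# the "comp" step of the rule above is function application.

def compose(t, s):
    """t proves F => G (as a function); s proves F; t(s) proves G."""
    return t(s)

# "if Socrates is a man then he is mortal", as a proof transformer:
def man_implies_mortal(proof_of_man):
    subject = proof_of_man[1]
    return ("mortal", subject)

a1 = ("man", "Socrates")             # a proof that Socrates is a man
ts = compose(man_implies_mortal, a1) # a proof that Socrates is mortal
print(ts)                            # ('mortal', 'Socrates')
```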

The proof that "Socrates is mortal" from "if Socrates is a man then he is mortal" and "Socrates is a man" looks like:

2000/10/swap$ python util/lp.n3 --think
( lpex:a2
lpex:a1 )
a l:_g0;
:comp [
a :Proof;
:proves {lpex:Socrates a lpex:Mortal .
} ] .

lpex:a1 a :Proof;
:proves {lpex:Socrates a lpex:Man .
} .

lpex:a2 a :Proof;
:proves {{
lpex:Socrates a lpex:Man .

} log:implies {lpex:Socrates a lpex:Mortal .
} .
} .

... which is much easier to read than cwm's --why style:

2000/10/swap$ python test/reason/socrates.n3 --think --why
@forSome :_g0 .
[ a pr:Conjunction,
pr:component [
a pr:Inference;
pr:evidence (
a pr:Extraction;
pr:because :_g0;
pr:gives {:socrates a :Man .
} ] );
pr:rule [
a pr:Extraction;
pr:because :_g0;
pr:gives {{
:socrates a :Man .

} log:implies {:socrates a :Mortal .
} .
} ] ],
pr:gives {:socrates a :Man,
:Mortal .
:socrates a :Man .

} log:implies {:socrates a :Mortal .
} .
} ].

I didn't actually see formulas occurring as terms in that 2001 paper. So it might be a distraction with respect to the original holdsDuring issue. And that version of the logic of proofs includes all of propositional calculus, including the law of the excluded middle. But among his accomplishments I see

Reflexive lambda-calculus. The Curry-Howard isomorphism converting intuitionistic proofs into typed lambda-terms is a simple instance of an internalization property of our system lambda-infinity which unifies intuitionistic propositions (types) with lambda-calculus and which is capable of internalizing its own derivations as lambda-terms.

so perhaps I should keep studying his stuff. I wish he'd use s-expressions and QUOTE like Moore's ACL2 paper and Chaitin's work rather than doing reflection with Gödel numbering. I wonder what HA is; ah... wikipedia to the rescue:

It is essentially Peano arithmetic, minus the law of the excluded middle...

Toward integration of cwm's proof structures with InferenceWeb's PML

Submitted by connolly on Wed, 2006-02-15 23:11. ::

The proof generation feature of cwm has been in development for a long time. The idea goes back at least as far as the section on proofs and validation in the original 1998 Semantic Web Roadmap, where we find seeds of the proof-based access control idea that is now funded under the policy aware web project:

The HTTP "GET" will contain a proof that the client has a right to the response.

And in the TAMI project, we're looking at proofs as part of an audit mechanism for accountable datamining.

The cwm development process incorporates aspects of extreme programming: test-driven development and a variation on pair programming; when somebody has a new feature working, somebody else in the group tries to (a) reproduce the test results and (b) review the tests, if not the code. When the pair agree that the tests are OK, we claim victory in a small group setting, and if that goes well, we make a release or at least send mail about the new feature. This typically takes a week or two or three.

In the case of cwm --why, I have been evaluating the tests since at least as far back as this December 2002 check-in comment on swap/test/reason/Makefile, and I still haven't made up my mind:

date: 2002/12/30 15:00:35; author: timbl; state: Exp; lines: +9 -6
--why works up to reason/t5. GK and SBP's list bugs fixed.

I have tried, and failed, many times to understand Tim's explanations of the simplicity of the reason proof ontology. I think I'm finally starting to get it. I'm nowhere near being certain it's free of use/mention problems, but I'm starting to see how it works.

The InferenceWeb folks have all sorts of nifty proof browsing stuff, and they're working with us in the TAMI project. In our meeting last August, they explained PML well enough that TimBL started on to-pml.n3, some rules to convert cwm's proof structures to PML. The rest of the integration task has been starved by work on SPARQL and Paulo moving to UTEP and all sorts of other stuff, but we seem to be moving again.

I tried running one of my versioning proofs through to-pml.n3 and then looking at it with the InferenceWeb proof browser, but I haven't got the PML structure right and it doesn't give very detailed diagnostics.

I made some progress by loading one of the PML examples into tabulator (alpha) and loading my swap/reason style proof in there and using the outline view to browse the structure. (It turns out that TimBL started coding the tabulator one day when he was having trouble reading a proof. GMTA ;) I discovered that PML is striped:

  • NodeSet
    • isConsequentOf
      • InferenceStep
        • hasAntecedent
          • NodeSet
            • ...

... where the swap/reason ontology just has Step objects and hangs the conclusions off them.
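
The difference can be sketched in a few lines of Python (the dict shapes and data below are invented for illustration, not the actual ontologies): a flat swap/reason-style step carries its conclusion and antecedent steps directly, and restriping it into PML's alternating NodeSet/InferenceStep form is a simple recursion.

```python
# Restriping sketch: flat step -> PML-style NodeSet/InferenceStep nesting.
# Dict keys mimic the PML property names; the data is made up.

def to_pml(step):
    """step: {'conclusion': ..., 'antecedents': [step, ...]}
    Returns a nested dict in the striped NodeSet/InferenceStep shape."""
    return {
        'type': 'NodeSet',
        'conclusion': step['conclusion'],
        'isConsequentOf': {
            'type': 'InferenceStep',
            'hasAntecedent': [to_pml(a) for a in step['antecedents']],
        },
    }

socrates = {'conclusion': 'Socrates is mortal',
            'antecedents': [{'conclusion': 'Socrates is a man',
                             'antecedents': []}]}
pml = to_pml(socrates)
```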

That was the key to some big changes to to-pml.n3. I don't have the output working in the PML browser yet, but Paulo sent me a pointer to a PML primer, which seems to have the remaining clues that I need.

See also: help checking/reading some proof related to versioning? to cwm-talk.

Using RDF and OWL to model language evolution

Submitted by connolly on Wed, 2006-02-15 18:37. :: | |

[diagram on whiteboard] Back in September, the TAG had this great whiteboard discussion of versioning.

[versioning diagram] TimBL captured an omnigraffle version, which is nice because it's XML and I can convert it to RDF with XSLT, but it's a mac thing and Dave Orchard, who's doing most of the relevant writing, was more inclined to building UML diagrams with Visio. Then Henry Thompson found this violet tool that's just what we need: a light-weight, cross-platform (Java, in this case) editor that produces pretty clean XML representations of UML class diagrams.

This week, Dave sent an update on the terminology section.

That was just the prompt I was looking for to swap in my work on formally defining W3C's namespace change policy options from last September. I thought for a while about how to test that the RDF version of the UML diagram was right, and I realized I could get cwm to smush producers and consumers based on cardinality constraints.

First we extract RDF from the UML diagram...

ext-vers$ make test-agent-pf.n3
xsltproc ... --output ext-vers-uml.rdf grokVioletUML.xsl ext-vers-uml.violet

... and then we mix with some rules to implement OWL cardinalities...

python $swap/ ext-vers-uml.rdf test-agent.n3 \
owl-excerpt.n3 rdfs-excerpt.n3 --think \
--filter=test-agent-goal.n3 --why >test-agent-pf.n3

And indeed, it concludes a1 = a2.
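
The smushing step itself is simple enough to sketch in Python (the data and property names below are invented, not the ext-vers model): if a property is limited to cardinality 1, then a subject with two values for it forces those values to be equal, much as cwm concludes a1 = a2 above.

```python
# Smushing sketch: cardinality-1 (functional) properties force
# co-reference. Data and property names here are made up.

def smush(triples, functional):
    """Return pairs (x, y) forced equal because some subject holds both
    x and y under a cardinality-1 property."""
    seen = {}
    equal = set()
    for s, p, o in triples:
        if p in functional:
            if (s, p) in seen and seen[(s, p)] != o:
                equal.add((seen[(s, p)], o))
            seen.setdefault((s, p), o)
    return equal

triples = [('doc', 'producer', 'a1'),
           ('doc', 'producer', 'a2')]
functional = {'producer'}            # at most one producer per document
print(smush(triples, functional))    # {('a1', 'a2')}
```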

I'm working on getting a proof/justification using --why and some proof browsing tools, but that's another story.

One of the reasons I'm pleased with this ontology that the TAG came up with in Edinburgh is that it allows me to formalize the concept of sublanguage that I have heard TimBL talk about now and again. For example:

  1. HTML2 is a sublanguage of HTML4.
  2. a production of chap2 is in HTML2; the intent is chap1in
  3. a consumption of chap2, ?C is in HTML4
  4. Show: ?C intent chap1in
  5. ?C intent chap1in by 1, 2, 3 and defn sublanguage
where the definition of sublanguage is:
{ [] ev:text ?TXT; :language [ is :sublanguage of ?BIG ]; :intent ?I.
  ?COMM ev:text ?TXT; :language ?BIG.
} => { ?COMM :intent ?I }.
{ [] ev:text ?TXT; :language ?BIG; :impact ?I.
  ?COMM ev:text ?TXT; :language [ is :sublanguage of ?BIG ].
} => { ?COMM :impact ?I }.
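
The first of those rules can be rendered in Python to check the HTML2/HTML4 example mechanically (the data below is invented to mirror steps 1-3 above):

```python
# Sublanguage rule sketch: if text TXT is produced in a language that is
# a sublanguage of BIG with intent I, then a consumption of TXT in BIG
# carries intent I. All data is illustrative.

sublanguage = {('HTML2', 'HTML4')}   # HTML2 is a sublanguage of HTML4

productions = [('chap2', 'HTML2', 'chap1in')]   # (text, language, intent)
consumptions = [('chap2', 'HTML4')]             # (text, language)

def intents(productions, consumptions, sublanguage):
    out = []
    for txt, small, i in productions:
        for txt2, big in consumptions:
            if txt == txt2 and (small, big) in sublanguage:
                out.append((txt2, big, i))
    return out

print(intents(productions, consumptions, sublanguage))
# [('chap2', 'HTML4', 'chap1in')]
```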

formally closing the feedback loop

Submitted by connolly on Fri, 2006-02-10 01:39. ::

Speaking of checking specs against tests, I suppose the first time I did that was when I was editing HTML 2. I maintained a collection of tests that I checked against the DTD whenever I changed it.

When we couldn't get other collaborators to install an SGML parser and do likewise, Mark Gaither and I announced the zero-install validation service, a precursor to the W3C markup validation service.

I'm on an Adventures in Formal Methods panel at the W3C tech plenary day. I expect it will be useful to tell that story to make the point that formalizing knowledge allows us to delegate tasks to the machine.

tags pending: web history? quality?

bnf2turtle -- write a turtle version of an EBNF grammar

Submitted by connolly on Fri, 2006-02-10 01:11. ::

In order to cross one of the few remaining t's on the SPARQL spec, I wrote bnf2turtle today. It turned out to be such a nice piece of code that I elaborated the module documentation using ReStructuredText. It's checked into the SPARQL spec editor's draft materials, but I'll probably move it to the swap codebase presently. Meanwhile, here's the formatted version of the documentation:

Author: Dan Connolly
Version: $Revision: 1.13 $ of 2006-02-10
Copyright: W3C Open Source License Share and enjoy.


Invoke a la:

python bnf2turtle.py foo.bnf >foo.ttl

where foo.bnf is full of lines like:

[1] document ::= prolog element Misc*

as per the XML formal grammar notation. The output is Turtle - Terse RDF Triple Language:

:document rdfs:label "document"; rdf:value "1";
 rdfs:comment "[1] document ::= prolog element Misc*";
 a g:NonTerminal;
  g:seq (
    [ g:star


Many specifications include grammars that look formal but are not actually checked, by machine, against test data sets. Debugging the grammar in the XML specification has been a long, tedious manual process. Only when the loop is closed between a fully formal grammar and a large test data set can we be confident that we have an accurate specification of a language [1].

The grammar in the N3 design note has evolved based on the original manual transcription into a python recursive-descent parser and subsequent development of test cases. Rather than maintain the grammar and the parser independently, our goal is to formalize the language syntax sufficiently to replace the manual implementation with one derived mechanically from the specification.

[1]and even then, only the syntax of the language.

Related Work

Sean Palmer's n3p announcement demonstrated the feasibility of the approach, though that work did not cover some aspects of N3.

In development of the SPARQL specification, Eric Prud'hommeaux developed Yacker, which converts EBNF syntax to perl and C and C++ yacc grammars. It includes an interactive facility for checking strings against the resulting grammars. Yosi Scharf used it in cwm Release 1.1.0rc1, which includes a SPARQL parser that is almost completely mechanically generated.

The N3/turtle output from yacker is lower level than the EBNF notation from the XML specification; it has the ?, +, and * operators compiled down to pure context-free rules, obscuring the grammar structure. Since that transformation is straightforwardly expressed in semantic web rules (see bnf-rules.n3), it seems best to keep the RDF expression of the grammar in terms of the higher level EBNF constructs.
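
The transformation in question is the classic one: X* becomes a fresh nonterminal with an empty production and a recursive one. A sketch (the rule shapes here are invented, not the bnf-rules.n3 vocabulary):

```python
# Compiling an EBNF star operator down to pure context-free rules:
#   _X_star ::= (empty) | X _X_star

def compile_star(sym):
    """Return (aux_nonterminal, rules) implementing sym*, where each
    rule is an (lhs, rhs-symbol-list) pair."""
    aux = '_%s_star' % sym
    return aux, [(aux, []),           # empty expansion
                 (aux, [sym, aux])]   # one more sym, then recurse

aux, rules = compile_star('Misc')
# rules == [('_Misc_star', []), ('_Misc_star', ['Misc', '_Misc_star'])]
```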

Open Issues and Future Work

The yacker output also has the terminals compiled to elaborate regular expressions. The best strategy for dealing with lexical tokens is not yet clear. Many tokens in SPARQL are case insensitive; this is not yet captured formally.

The schema for the EBNF vocabulary used here (g:seq, g:alt, ...) is not yet published; it should be aligned with swap/grammar/bnf and the bnf2html.n3 rules (and/or the style of linked XHTML grammar in the SPARQL and XML specifications).

It would be interesting to corroborate the claim in the SPARQL spec that the grammar is LL(1) with a mechanical proof based on N3 rules.


The N3 Primer by Tim Berners-Lee introduces RDF and the Semantic web using N3, a teaching and scribbling language. Turtle is a subset of N3 that maps directly to (and from) the standard XML syntax for RDF.

I started with a kludged and broken algorithm for handling the precedence of | vs concatenation in EBNF rules; for a moment I thought the task required a yacc-like LR parser, but then I realized recursive descent would work well enough. A dozen or so doctests later, it did indeed work. I haven't checked the resulting grammar against the SPARQL tests yet, but it sure looks right.

Then I wondered how much of the formal grammar notation from the XML spec I hadn't covered, so I tried it out on the XML grammar (after writing a 20 line XSLT transformation to extract the grammar from the XML REC) and it worked the first time! So I think it's reasonably complete, though it has a few details that are hard-coded to SPARQL.

See also: cwm-talk discussion, swig chump entry.

tabulator use cases: when can we meet? and PathCross

Submitted by connolly on Wed, 2006-02-08 13:48. ::

I keep all sorts of calendar info in the web, as do my colleagues and the groups and organizations we participate in.

Suppose it was all in RDF, either directly as RDF/XML or indirectly via GRDDL as hCalendar or the like.

Wouldn't it be cool to grab a bunch of sources, and then tabulate names vs. availability on various dates?

I would probably need rules in the tabulator; Jos's work sure seems promising.

Closely related is the PathCross use case...

Suppose I'm travelling to Boston and San Francisco in the next couple months. I'd like my machine to let me know I have a FriendOfaFriend who also lives there or plans to be there.
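
The matching itself is just set intersection once the data is in hand; here's a tiny sketch with invented names and dates:

```python
# PathCross sketch: intersect my travel plans with friends' plans.
# All names, places, and dates below are made up.

my_trips = {('Boston', '2006-03'), ('San Francisco', '2006-04')}
friends = {
    'Alice': {('Boston', '2006-03')},   # plans to be there too
    'Bob':   {('Austin', '2006-03')},
}

matches = {who: trips & my_trips
           for who, trips in friends.items() if trips & my_trips}
print(matches)   # {'Alice': {('Boston', '2006-03')}}
```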

See also the Open Group's Federated Free/Busy Challenge.

Is the Web any different than the telegraph? Trying to get to the bottom of the Net Neutrality Debate

Submitted by Danny Weitzner on Wed, 2006-02-08 01:50. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Ironically enough, just as the Net Neutrality debate is really heating up, Western Union decides it’s time to send the last telegram, ending an extraordinary 150+ year chapter in the history of human communications. (The first commercial telegram was sent in 1851 after Samuel Morse sent the first telegram using Morse code in 1844.) This historical confluence leads me to wonder: what’s so different about the Internet and the Web today from the telegraph? They’re all communications systems that send messages back and forth using a set of standardized communications protocols, data formats, and addresses.

Commenting on the threat of a non-neutral Internet service model, Vint Cerf stated the worry that many have:

“In the Internet world, both ends essentially pay for access to the Internet system, and so the providers of access get compensated by the users at each end,” said Cerf, who helped develop the Internet’s basic communications protocol. “My big concern is that suddenly access providers want to step in the middle and create a toll road to limit customers’ ability to get access to services of their choice even though they have paid for access to the network in the first place.”

What exactly is wrong with making customers pay for the choices they make, however? As far as I can tell, we all do pay for the network services that we use in rough proportion to the cost of those services. Today network costs are allocated between users (who pay for their own access to the Net) and large services (like Amazon, who pay a much higher price for their Internet connection because they put a lot more traffic load on the Net). But what would be wrong with a change to this arrangement — allowing large content providers like Google or Amazon to pay for the privilege of having their data get to customers faster than other services?

What’s wrong is that it would begin to break the unique many-to-many nature of how information is linked together on the Web. Here we begin to get at the heart of what makes the Web different from other communications networks. The Web is built to take advantage of the real-time, many-to-many communications capability of the underlying Internet. Many Internet applications we use are largely just point-to-point, such as email. But some, like the Web and instant messaging chat rooms, depend on the ability to aggregate messages from many sources around the Internet. Some of the most socially valuable and commercially popular types of services on the Web take advantage of the ability to link together information from across the Web into what appears to be a single information resource (aka a Web page). Consider:

  • portals — aggregating news or information from many different Web sites (ie. Yahoo)
  • search engines — taking advantage of the links that exist between documents (MSN, Google)
  • scientific research collaborations — using the Web, scientists have the ability to bring together information from different web pages in order to discover new knowledge by exploring links across data sets. (Scientific researchers were the first users of the Web for this very reason. Today this avenue is being pursued particularly in the Life Sciences community.)
  • blogs - today’s web logs are the leading edge of mass usage of truly web-like decentralized information systems, and they seem to be showing the cultural, political, and commercial benefits of this open architecture. (see below)

All of these uses of the Web depend on the ability to create links between pages and to have those links work for all users without any prior coordination. Changing the business arrangments associated with Internet service in way that would disrupt these links would have a dramatic and negative impact on the Web as we know it.

Here’s one more detailed illustration of the importance of many-to-many linking without prior coordination. [blog information flow] This image shows a rough view of the information, data and money flow associated with making this blog work. This blog aggregates data from several other web services. To the end user all the data on the page appears to be located at the same URL, but it is actually drawn from a number of different sites. These independent sites have no prior connection arrangements; the linkage of all the sites is made possible by the creativity and coordination of the blog host. (See the CDT Net Neutrality Reading Room for more information.)

It’s pretty clear to me that we’re still at the very beginning of the public policy debate over Net Neutrality. Some telecommunications companies have made dramatic statements of their plans to change the Internet business model, and these statements have caused justified alarm. I’m hoping that the debate can mature somewhat beyond the current rhetoric. While it’s easy to see why network operators (cable and telcos) want to preserve maximum flexibility as they enter new businesses (and invest a lot of money), my hope is that they will also realize that it’s not in their business interest to kill the Internet goose (or Google) that has laid such a golden egg for all.

python, javascript, and PHP, oh my!

Submitted by connolly on Tue, 2006-02-07 11:28. ::

My habits for developing quality code in python are bumping up against the fact that the deployment platforms for the web client and server are javascript and PHP, respectively.

I love the python doctest module. Here is a pretty good example of how I like to use it to simultaneously document and test code:

def splitFrag(uriref):
    """split a URI reference between the fragment and the rest.

    Punctuation is thrown away.

    >>> splitFrag("abc#def")
    ('abc', 'def')
    """
    i = uriref.rfind("#")
    if i >= 0:
        return uriref[:i], uriref[i + 1:]
    else:
        return uriref, None

def splitFragP(uriref, punct=0):
    """split a URI reference before the fragment

    Punctuation is kept.

    >>> splitFragP("abc#def")
    ('abc', '#def')
    """
    i = uriref.rfind("#")
    if i >= 0:
        return uriref[:i], uriref[i:]
    else:
        return uriref, ''
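
Doctests like these close the loop mechanically: the standard-library doctest module runs every example embedded in the docstrings and compares actual output against the expected output shown. A self-contained illustration:

```python
# The standard-library doctest module runs the examples embedded in
# docstrings and compares actual output to the expected output shown.
import doctest

def splitFrag(uriref):
    """split a URI reference between the fragment and the rest.

    >>> splitFrag("abc#def")
    ('abc', 'def')
    """
    i = uriref.rfind("#")
    return (uriref[:i], uriref[i + 1:]) if i >= 0 else (uriref, None)

if __name__ == "__main__":
    # testmod().failed is 0 when every docstring example passes
    print(doctest.testmod().failed)
```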

Another important way that python is self-documenting is that it meets the unambiguity requirement: you can pick up any .py file and trace every identifier back to what it refers to by following your nose:

  • for local variables, normal static scoping rules work; just scan up and look for an assignment or a function parameter
  • for imported names, find the relevant import statement. from foo import * is evil, of course.
  • global variables are explicitly declared as such

OK, full disclosure: you need to know the python built-ins, and when you see paramx.methody(z), you have an unbounded search for methody, which makes doctests that show what class paramx comes from pretty important. Mapping from the relevant import statement to the corresponding .py file may involve the usual search path nightmares; python doesn't solve that. redfoot's red_import is interesting. And I'm not sure if eggs are a step in the right direction or the wrong direction; gotta study them more. I try to ground import statements in the web a la:

import MySQLdb # MySQL for Python
               # any Python Database API-happy implementation will do.

... so that you can follow your nose from the ImportError traceback to resolve the dependency.

Now timbl has started migrating the swap/cwm stuff to javascript. Let's look at uri.js:

/** return the protocol of a uri **/
function uri_protocol(uri) {

Thanks for trying to document each function, but that sort of comment isn't worthwhile; the risk that it'll get out of sync with the code is greater than the information it provides. Back to naming...

function URIjoin(given, base) {
	if (base<0) {alert("Invalid base URL "+ base); return given}

Where does alert come from? Is that in the ECMAScript standard? Or in some Netscape devedge stuff?

But more importantly: why not raise an exception? Javascript does have throw/catch, no? Is it not the norm to use them? As I argued in my contribution to the Python, Tcl and Perl, oh my! thread in 1996, the community norms are at least as important, if not more important, than the language features, when it comes to developing quality code.

I keep running into javascript and PHP code that I want to read and wishing for doctests because I can't figure out which end is up.

Whence comes kb in this bit from term.js? Do I face an unbounded search?

RDFFormula.prototype.fromNT = function(str) {
	var len = str.length
	var x
	if (str[0] == '<') return kb.sym(str.slice(1,len-1))

Maybe I just need to study the standard libraries a bit more, but I hear that the drupal project has coding conventions lots of people like for developing quality PHP code; I hope to study those. And the PEAR community must have some norms and best practices. I went looking for javascript testing stuff and ran into JSAN, a CPAN work-alike. That sort of infrastructure naturally reinforces quality norms.

See also: delicious tags.

The Net Neutrality and Internet Architecture Fundamentals

Submitted by Danny Weitzner on Tue, 2006-02-07 09:25. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

The so-called ‘Net Neutrality’ debate is rising to full bore this week. In advance of a Senate Commerce Committee hearing, the commercial protagonists are staking out their positions. For the carriers, John Thorne, a top lawyer at Verizon (Verizon Executive Calls for End to Google’s ‘Free Lunch’) says:

“The network builders are spending a fortune constructing and maintaining the networks that Google intends to ride on with nothing but cheap servers,” Thorne told a conference marking the 10th anniversary of the Telecommunications Act of 1996. “It is enjoying a free lunch that should, by any rational account, be the lunch of the facilities providers.”

He went on to characterize the difficulties of finding enough investment incentive to construct new network infrastructure:

the task of getting thousands of local franchise agreements to offer cable television; and what he called “Google utopianism,” a concept he likened to “spiked Kool-Aid.”

On the other side, Vint Cerf, Google VP and leading Internet architect, responds that Google is:

worried that if net neutrality protections are not enacted, the Internet’s freedom could be compromised, limiting consumer choice, economic growth, technological innovation and U.S. global competitiveness.

“In the Internet world, both ends essentially pay for access to the Internet system, and so the providers of access get compensated by the users at each end,” said Cerf, who helped develop the Internet’s basic communications protocol. “My big concern is that suddenly access providers want to step in the middle and create a toll road to limit customers’ ability to get access to services of their choice even though they have paid for access to the network in the first place.”

Public interest advocacy groups are jumping in, too: from Public Knowledge a position paper, “Good Fences Make Bad Broadband: Preserving an Open Internet through Net Neutrality” (J. Windhausen, 6 Feb 2006) and from the Center for Democracy and Technology, an online reading room with several operational Internet traffic scenarios that illustrate how the Net works today and what might be threatened.

I begin to think that the very framing of the debate as about Net “Neutrality” confuses the issue. The real question is: is there any such thing as a non-neutral Internet? The Internet and the Web operate as fundamentally neutral and non-discriminatory platforms. There are certainly certain kinds of price discrimination, as Cerf notes, but content discrimination and preferential routing are contrary to the operation of the Internet and the Web. There may be other TCP/IP-based networks (some carriers are talking about deploying new Cable TV services with IPTV), but these aren’t the Internet.
