
IKL by Hayes et al. provides a semantics for N3?

Submitted by connolly on Thu, 2007-05-17 14:25.

On my trip to Duke, just after I arrived on Thursday, Pat Hayes gave a talk about IKL; it's a logic with nice Web-like properties, such as the fact that any collection of well-formed IKL sentences is itself well-formed. As he was talking, I saw lots of parallels to N3... propositions as terms, log:uri, etc.

By Friday night I was exhausted from travel, lack of sleep, and conference-going, but I couldn't get the IKL/N3 ideas out of my head, so I had to code it up as another output mode of n3absyn.py.

The superman case works, though it's a bit surprising that rdf:type gets contextualized along with superman. The thread continues with the case of "if your homepage says you're vegetarian, then for the purpose of registration for this conference, you're vegetarian". I'm still puzzling over Pat's explanation a bit, but it seems to make sense.
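
To give a flavor of the parallel, here's a toy N3 sketch (the names and the :believes/:doubts properties are inventions for illustration, not anything from Pat's talk): in N3, a formula in braces is itself a term, much as IKL's (that ...) construction turns a proposition into a term, so a belief context can mention a proposition without asserting it.

@prefix : <http://example/superman#>.

# The braced formulas are terms; neither claim about flying is
# asserted in the outer graph, only Lois's attitude toward each
# proposition. Note that even the rdf:type arc ("a") lives inside
# the belief context, which is the contextualization noted above.
:LoisLane :believes { :Superman a :FlyingThing }.
:LoisLane :doubts { :ClarkKent a :FlyingThing }.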

Along with the IKL spec and IKL Guide, Pat also suggests:

Collaboration and crime at a distance at HASTAC, WWW2007

Submitted by connolly on Thu, 2007-05-17 13:44.

I went to the 1st International HASTAC Conference, April 19-21, 2007 at Duke University in Durham, NC, USA. My stated role was to tell the story of How the W3C Process Got Its Stripes to this humanities research community on a panel, The World Wide Web Evolves, that Harry Halpin arranged.

After a short history of my role in the development of the Web and W3C, I noted that the Internet not only facilitates remote collaboration; it also opens the door to crime at a distance. Extortion of the form "say... nice web site you got there; it would be a shame if something happened to it" is a reality. I'm interested in research into how much the Internet can tolerate before we see the tragedy of the commons.

I noted the Proof-of-work proves not to work result by Laurie and Clayton in 2004 as a fairly surprising result, based on what looks like straightforward and unsophisticated economic analysis of spam, zombies, etc. Does the humanities research community have expertise in the statistics and economics of preserving cultural values such as open communication? (Oh yeah... and I meant to encourage them to look at social/ethical issues around OpenID and distributed authentication, but I completely forgot.)

While HASTAC is somewhat on the leading edge of the humanities community, I'm not sure their scope includes what I'm looking for.

Meanwhile, at the Web Science panel at WWW2007 in Banff, Peter asked "Where are the cultural anthropologists?" I was pleasantly surprised that some of them were there, again at Harry Halpin's prompting.

Updating network security community's understanding of privacy

Submitted by Danny Weitzner on Mon, 2007-05-07 17:53.

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

A few weeks ago a colleague reminded me of one of the early definitions of privacy in the computer security literature from Saltzer and Schroeder (The Protection of Information in Computer Systems):

“The term “privacy” denotes a socially defined ability of an individual (or organization) to determine whether, when, and to whom personal (or organizational) information is to be released.”

This definition reflects a view still widely held among computer security architects: that the way to achieve privacy policy ends is to control the release of information. To this end, great effort has been expended to design systems that control access to and flow of personal, sensitive information. While there are certainly good reasons to do this, access control alone has not been, and never will be, sufficient to achieve compliance with privacy, copyright or other information-related rules.

City of Boston Censoring Municipal WiFi

Submitted by Danny Weitzner on Tue, 2007-04-24 16:03.

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Various people (including David Sheets, a student of mine at MIT, and Seth Finkelstein) have pointed out over the last few days that the ‘free’ municipal WiFi service offered by the City of Boston comes with mandatory content filtering that blocks all kinds of sites that are nowhere close to illegal, nor sources of pornography that might be considered harmful to children. On the one hand, it’s not hard to see why city officials want to avoid the headline: “Boston’s free network a conduit to porn for city’s children, foiling parents’ filtering software.” But does that mean that it’s either wise public policy or constitutionally permissible for the city to offer wifi to the public with such sweeping and arbitrary constraints?

If the City is allowed to do this, then they can block just about anything: Web sites operated by the opposing political party, critiques of the Big Dig, not to mention http://yankees.mlb.com/. One has to ask whether this is really a path that any city would want to open up for itself.

As a constitutional matter, it’s not quite clear whether the government can require government-funded Internet service providers to filter content. In United States v. American Library Association, 539 U.S. 194 (2003), the US Supreme Court decided that the Congress could require libraries receiving federal Internet access subsidies (the e-rate) to filter out porn. However, it’s not clear whether this case applies to the muni Wifi situation. The Supreme Court explained:

A public library does not acquire Internet terminals in order to create a public forum for Web publishers to express themselves, any more than it collects books in order to provide a public forum for the authors of books to speak. It provides Internet access, not to “encourage a diversity of views from private speakers,” … but for the same reasons it offers other library resources: to facilitate research, learning, and recreational pursuits by furnishing materials of requisite and appropriate quality.

For what purpose is muni wifi offered? Isn’t it precisely to create an expanded public forum to increase the flow of information and new web services around the city?

This will be an interesting issue to watch.

The Mercurial SCM: great for lots of stuff, but not the holy grail

Submitted by connolly on Fri, 2007-03-23 15:44.

I have been tracking the mercurial project for a couple of years now. First it was just a bookmark under python+scm; then, after using hg to code on an airplane about a year later, I was hooked. I helped get the microformats testing effort using mercurial about a year after that, and did some noodling on Access control and version control: an over-constrained problem? around that same time.

Yesterday I played host to Matt Mackall as he gave a presentation, The Mercurial SCM, to the W3C Team. In the discussion that followed, we touched on:

  • fractal project organization (touching on PartialClone and the ForestExtension)
  • the topology of update flows in a large development system with overlapping communities with different access rights
  • comparisons with Darcs
  • hg hosting, large projects, user support

It seems that hg scales to very large projects, as long as they're fairly uniform, but it doesn't support the sort of tangly fractal web of inter-project dependencies that would make it the holy grail of version control systems.

MP3 patent mess and lessons for standards making

Submitted by Danny Weitzner on Mon, 2007-03-05 19:47.

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

The New York Times reports (Patent Fights Are a Legacy of MP3’s Tangled Origins, Douglas Heingartner, 5 March 2007, C03) on the mess over patent licensing for MP3 technology. While most (including Microsoft) had assumed/hoped that if they paid licensing fees to Fraunhofer they’d have the patent basis largely covered, now Alcatel (armed with the former Bell Labs patent portfolio from Lucent) and others are showing up demanding licensing fees, too. Microsoft just got hit with a $1.5 billion patent infringement judgment in the United States. Other vendors with MP3 as an integral part of their product are worried that their existing licensing arrangements may not insulate them from new demands for fees.

Leonardo Chiariglione, chair of the MPEG group, declares that this is bad for MP3 deployment:

“I consider the situation in general not positive for the wide adoption of the standard, which is what I have been working on.”

At the same time he laments the fact that there is little the standards bodies (ISO and MPEG) can do. Says the Times article:

For those confused about where to turn to obtain an MP3 license for a new device or piece of software, he offers little solace. “The rule is that the MPEG working group is not allowed to consider patent issues in our technical work, so there is nothing I can do about it….”

W3C’s Patent Policy takes a more activist approach to such matters. We won’t standardize any technology that cannot be implemented royalty-free, and if we find that there are threats to the RF status of a standard after it’s adopted, we can convene a special group to take action, including recommending changing or rescinding the standard.

US Congress Telecommunications and the Internet Subcommittee Hearing on the Future of the Web

Submitted by Danny Weitzner on Sun, 2007-03-04 15:23.

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Last week, the US House of Representatives Subcommittee on Telecommunications and the Internet had its first hearing of the year, the subject of which was the Future of the World Wide Web. Tim Berners-Lee was the sole witness at this hearing. The topic and witness choice were notable for a couple of reasons. This was the first meeting of the committee in the new session of Congress, and the Chair of the committee, Rep. Ed Markey (D-MA), announced his intention to take a long-range look at the larger issues facing communications policy in the United States. This, by itself, is a wonderful idea. The fact that he decided to start this series of hearings with the World Wide Web, as opposed to so many other topics he might have chosen, really speaks to the central importance that the Web has in our society. The fact that he chose Tim to testify was great, too, IMHO. :-)

The Committee covered a wide range of questions, including:

  • how will the Semantic Web change science and health care?
  • what are the key lessons to learn from the first phase of the Web about how to promote continued innovation?
  • what should be done, technically or legally, about spam, pornography available to kids, identity theft?
  • why did Tim decide to make Web technology available royalty-free?
  • does support for royalty-free standards imply that content and services on the Web also have to be free?
  • and even a slightly sheepish question about whether teleportation might be possible in the future

I’ve been to a lot of congressional hearings, especially in my earlier professional life as a lawyer and advocate for the Internet civil liberties organizations EFF and CDT. This was one of the most positive, thoughtful and forward-looking hearings that I’ve ever been to. Here you could see the Committee actually looking out into the future at the potential of the Web and trying to figure out what they could do (or not do) to help assure that it continues to grow and be available to all for commercial, political, cultural and personal use. Too often, Congress gets bogged down in its somewhat inevitable but short-sighted role as mediator amongst special interests. This was Congress at its best. It was great to be there.

You can read Tim’s testimony on the Web.

Ironically enough, though it’s easy to read the testimony, it’s not so easy to get an archived copy of the video feed from the hearing. Though most Congressional activity is recorded on video and much of it is streamed live by C-SPAN or others, there’s no organized way to get archived copies of the video. Carl Malamud is engaged in a serious effort to try to remedy this situation, including trying to encourage C-SPAN to make its archive of congressional video public.

In the meantime, Carl has kindly ripped the feeds from the hearing and put them up in two places (Google Video and archive.org).

I’m certainly going to be following Carl’s efforts and looking to help out where I can.

Update: C-SPAN has changed its policy and now provides public access with a Creative Commons license.

What is the Analogue for the Semantic Web? If the Web is like a Page+Links, the SW is like a...

Submitted by mc on Fri, 2007-02-16 16:27.

This is the first entry for the breadcrumbs blog from what may be called the Interaction perspective on things Semantic Webbish. To that end, i've been mulling over what the paradigm for the Semantic Web is - more particularly, what is the physical world analogue for this concept?
In order to design an interface to support a technology, to expose its potential for what it can do, it helps to know what it is - or, failing that, to have a model around which we can conceptualize what it is, what it does, and somewhat how it works. It's not unusual for a new technology or concept to be introduced via an analogue of a previous, familiar technology: "it's like this thing - but for this new bit." This "like this, but for this new bit" is what i've been looking for, for the SW.
What i'll propose (eventually) below is that one paradigm may be a notebook in the traditional sense of the term: the notebook as a place to capture work in progress. Taken one step (a big step) further, i'll argue that if the Web paradigm is a Page + Links, the paradigm for the Semantic Web may be a notebook + the memex.
The Web PAGE - it's a Page. With Links.
We have a great model for the Web. It's the page: text with images. We're all familiar with concepts of the page. It's clear, easy to grasp. I'd postulate we need a similar construct or paradigm or analogue for the Semantic Web. We have a long history with read-only text, whether as official public communication, or as unofficial comment.

We also have long experience (400+ years) of a particular technology's deployment of words and images in a page - whether as an illuminated manuscript, or an early printed text with woodcuts.

The one new thing the Web added to the notion of the page - the thing that makes it a Web page - is the hypertext link. The link is really the only core new concept introduced to the page - and more often than not, that link's job is to link to another page. The translation from the non-Web page to the Web page is not a terribly huge leap. The link as a concept is almost what we'd call "intuitive" in its use.
This is not to say that there are not a myriad of design considerations for making that new page+link approach useful, usable and accessible. We have developed whole suites of conventions on how to deliver pages effectively and have gone through now several generations of "web design" to ensure that text, image and link work. Yet despite over a decade of technological evolutions in the Web technology, the paradigm for describing what we create with the Web is the same: it's a page. With Links. The Page as paradigm informs how we design the page, the way we design the page. It's not a spreadsheet; it's not a network diagram. It's a page.
Even with Web 2.0, with RSS feeds, blogs, mash ups, we still have pages. The only model variant in Web 2, with location-based mash ups, is that the main image on the page is now a map. And again, maps are a familiar technology that has been around for millennia, and a technology most of us had some training in our education on how to use. It's amazing how much we use familiar technologies to model the representations for new ones, perhaps especially in computing. Bottom line, the web page as page is a clear model that rapidly communicates what the Web is largely about: enabling people to publish content, communicate ideas, and link into the myriad of other ideas available. The page is a powerful analogue for communicating this model, and it is, i would argue, because there is such a clear model that there has been such rapid adoption of the concepts, and interest across disciplines in the technology.
Analogue for the Semantic Web?
So, if the analogue for the Web is the page, what is the analogue for the Semantic Web? And why is finding this analogue important? Part of the answer may stem from whom the Semantic Web community wishes to attract to be involved as practitioners, innovators, creators, discoverers in this space. If it's the same range of passions and expertise that have brought so much to the Web from the arts, humanities, sciences, business and so on, then this question of model becomes critical.
Consider for a moment how the Semantic Web is described in the new First Stop Shop for What is It, Wikipedia.
The Wikipedia entry for the semantic web begins: The Semantic Web is an evolution of the World Wide Web in which information is machine processable (rather than being only human oriented), thus permitting browsers or other software agents to find, share and combine information more easily. It is a manifestation of W3C director Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange. At its core the Semantic Web consists of a data model called Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL) that facilitate formal description of concepts, terms, and relationships within a given domain. The burgeoning Semantic Web comprises newly created and/or transformed web data sources endowed with computer-processable meaning (semantics).
Now, all that description tells anyone about the Semantic Web is that it's for Machines. And i'm not sure i believe that the end game imagined for the Semantic Web is to make data easier for machines to process. It would seem that the machine-processable stuff is a means to an end, but not the end itself. The end is still about people, and PEOPLE being able to build knowledge by moving through linked information.
We might ask, then, if the Semantic Web has the same human-oriented goals as the Web, why not just use the same model for describing it: pages with links. I'd suggest that the page is not robust enough to support what more we get from the Semantic Web's far greater emphasis on the Link as opposed to the Page.
Because of how it's structured, content in the Semantic Web can be richly associated. We have the potential with the Semantic Web to explore things in new ways via these associations. For example, Beethoven as a composer can be associated with Genres in Music, and with specific Recordings as instances, which associate with various artists, and recording companies, or even with the politics of certain works being recorded or not. Beethoven is also associated with a particular period in History, and with the interaction of styles in that period, and hence there are correlations between music and architecture and scientific thought at the time. All these associations branch out from Beethoven directly. One might even say that such branches constitute a graph, and the page cannot reflect these possibilities. But neither, David Karger and i have argued, does a raw graph express that richness.
For one thing, besides being illegible on a number of levels, these graphs present only things which are directly connected on the graph. The Semantic Web has a technical facility to support inferring connections between points according to the expression of rules. For instance, one might see a connection between a Beethoven work and the structure of a poem or an equation, and be able to express that connection such that a new connection among these points becomes available. Seeing, finding and drawing those kinds of connections is a primary attribute which the Semantic Web can enable. The page cannot readily contain that possibility.
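
To make that concrete, here is a minimal N3 sketch of the kind of associative data and inference rule just described (every name in it is invented purely for illustration):

@prefix : <http://example/music#>.

:Beethoven a :Composer;
    :workedIn :ClassicalPeriod;
    :composed :Symphony9.
:Symphony9 :hasGenre :Symphony.
:rec1963 a :Recording;
    :performanceOf :Symphony9.

# a rule that draws a connection no single statement makes directly:
# a composer is associated with any recording that performs one of
# their works.
{ ?c :composed ?w. ?r :performanceOf ?w. } => { ?c :associatedWith ?r. }.
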
Beyond the Wikipedia definition for the Semantic Web, then, the Semantic Web's promise is to enable people to explore, associate, and connect information to build new knowledge. This sounds a lot like what V. Bush described in As We May Think as the Memex (see Chapter 2: Vannevar Bush and Memex, by Ronald D. Houston and Glynn Harmon in ARIST 41 for a fab overview of the perceptions of this paper since its publication).
Drawing of Bush's theoretical Memex machine (Life Magazine, November 19, 1945)
The key part of the Memex is making and sharing associations among divergent sources. Bush imagined professions of "trail blazers" (section 8 of As We May Think) emerging who would go about creating these inferences, and publishing them in new kinds of encyclopedias.
il n’y a pas de hors-memex ("there is nothing outside the memex")
Bush's view assumes that there are encyclopedias, and then trails built with the memex through those encyclopedias.
But what if there was only the memex? In a sense that's what the Semantic Web suggests with its emphasis on links and everything having its own unique id (uri). But does this idea of the Semantic Web as a vehicle that supports making associations - as a memex - get us closer to a readily translatable concept of the Semantic Web?
I'd like to tease out a few more parts of the memex idea as a way of addressing that point. First, Bush imagined the Memex very much as a tool for scientists - as a way to help researchers make sense of all the work not only they themselves, but their colleagues as well, would be doing. This focus on the researcher is particularly appropriate for this current exploration of possible paradigms because it focuses on the artefact of interest - the logbook - as an object supporting work in progress.
Second, the memex took in not only textual notes but images of observations the scientist would literally take with a camera while working. The memex is very much a multimedia repository, not only of others' extant work but of the scientist's own work-in-progress.
I think this notion of work in progress, of a personal work log, is critical. This is distinct from the read-only model of the Web, and moves towards a writerly as well as a readerly medium (to use Barthes's terms). But more particularly, it adds a new dimension to linking: from elements presented as finished pages to elements which are in the rough and personal, which may or may not (yet) be meant for publication. There is something of this middle or transitional ground happening on the Web. This entry is an example of it.
We see this writerly side in Blogs, rss feeds, tags, comments, ratings - all the places where Web 2 approaches are helping with more rapid publication and inter-commentary of content on the Web. But even this new writerly approach is not quite what the memex is also getting at with its model of something almost as familiar to us as the technology of the printed page in a printed book: and that is the scientist's notebook.
Scientist's Notebook as first pass at Semantic Web Analogue.
Notebooks can be the complete filling of pages, or scraps of information. They can be used for the capture of formal studies like experiments, observations in field work, or notes for future reference. But they are all unlike what we think of as the Web in a particular way: the web is public; we use its protocols to publish work for others. Lab/note books are personal, idiosyncratic, and in particular, they represent works in progress. We may find that we use the semantic web technologies both locally/personally as well as distributed/publicly. This *for the researcher* or for the researcher's work in progress as a model of part of the Semantic Web is very memex'y. It's also very different from what the web has become. I think this blending of personal use with the semantic web's potential for automatic association of external, associated resources is a significant shift in how most of us have been thinking about the semantic web.
Let me frame that last statement. There have been projects thinking about the semantic web desktop - using the semantic web as a personal or local server layer for data. There have also been projects like myTea which have imagined using the semantic web technologies to maintain transparent context histories as a great enabler for a bioinformatics lab book that can automatically track and record bioinformatics experiments as they develop.
i don't think however that anyone has previously proposed a paradigm, model, analogue for the semantic web as a researcher's notebook. with links. or, more properly, with memex, where the memex itself is an extension of the researcher's notes, observations, raw data, work in progress. I've already said that the page can't reflect the rich associative possibilities of what the semantic web promises, so one may ask how the analogue of a researcher's notebook, which is so idiosyncratic, could do any better.
One way is that it is possible in a notebook (or on a huge sheet of paper or on a whiteboard - other spaces of work in progress) to draw lines easily across notes to make connections. David Wang and i this past fall at UMD's MindLab were looking at a way to draw these kinds of lines between known points in an ontology to help create rules/inferences to make new connections in the knowledge base.
Indeed, there may be a great deal more we can take from the qualities of a researcher's notebook to use as a design prompt to capture more of the semantic web. But one of the important components of this notion of the semantic web as notebook + memex is that it situates the Semantic Web conceptually within the realm of the human. It also situates the semantic web as something that can be part of a process that is engaged with the user. Right now, very few semantic web tools, whether mspace, haystack or tabulator to name a few, support direct authoring.
The idea of the semantic web being pull-in-able into a researcher's context, where the notebook is constantly seeking associations to support the researcher's process, seems to me a compelling kind of inversion of the usual models - instead of putting stuff out there, we are bringing stuff in here, working it, potentially sharing it, but first and foremost using it, munging it, creating with it to develop new knowledge. Process rather than end.

Heterogeneous, Implicitly Structured, Implicitly Associated Data Capture
Another notion of the notebook which seems interesting is that it also breaks the page as a read-only, well-structured, well-presented information space. In the physical pages of the notebook, we see various forms of data entry where long exegesis is rare compared to short bursts of information, what Michael Bernstein calls "information scraps".
These various uses of the page-as-surface, for a variety of forms of content, also demonstrate the personal, though frequently work-related, work-in-progress attributes of this popular form of content capture. Again, we do see examples of a kind of information scrap on the Web - these can be one-liner blog entries pointing to news or other ideas, to tags, to recommendations, to comments. Indeed, entire modes of communication have been built up around short messages like texts, or widgets that communicate only the weather. But these info bytes, if you will, unlike the info scrap, are again meant for publication - for someone else to be able to consume. Tags may be an interesting boundary object, as they can be both personal markers - highly idiosyncratic - as well as group or public markers. But for the most part, the short bytes we find on the web are there for public consumption. Notebooks are workspaces, pre-publication resources, the working out of ideas.
This is why, for now in any case, i'm focussing on the idea of the note book rather than the personal journal or log. The note book or lab book is a place for taking notes on ideas; it is not the final forum for the ideas, but it is the gathering place for them. Another similar kind of physical world analogue for this kind of working-out process is the notecard stack. Indeed, one of the earliest hypertext systems, NoteCards, attempted to emulate this system of idea capture and reordering. Spatial hypertext systems like Tinderbox have also capitalized on the affordances of moveable cards or small objects to capture ideas, where these ideas can be spaced out and clustered, and where space in the organization communicates a kind of meaning - at least to the author of the structures.

The attributes of the notecard stack that i find particularly relevant are the usual purpose of the stack and the kinds of data the cards hold. When i was in high school, we were taught a particular methodology for notecards as the way to prepare a research paper. There were idea cards, quotation/paraphrase cards, and bibliography cards. These cards could be created in any order as material was discovered or ideas occurred: "only one idea to a card," "only one quotation per card," "only one reference per card" - the idea being of course that individual cards could be organized and reorganized spatially for getting a picture of the developing paper. Not all cards would be used. Gaps could be detected. The organized cards could then be put into one pile, and the paper written effectively from the turning over or laying out of a set of cards at a time (one exercise required us to generate an outline of the paper from the cards before proceeding to the paper-writing).
The relevance of the notecard model to the concept of the semantic web as personal work space with associated public data is the integration of personal ideas with external sources: the idea cards backed up with the quotations from external sources. In the case of notecards, these associations are either manually created by the researcher/author, or are presented by (and thus attributed to) another author.
The goals are the same: building new knowledge by capturing one's own ideas, and working with others' - whether these are ideas that come up in a conversation with others and are hastily jotted down, or are captured from a published source. there is an interplay here, a making of meaning. I mentioned spatial hypertext: Mark Bernstein's Tinderbox software, as noted, very much follows the notecard paradigm to support just this kind of intermixing activity: it enables links to be copied from the web into cards, and of course enables other kinds of data to be written into the cards. It blends capture of the external with capture of the personal. So do many digital notebook applications, like the Circus Ponies one i'm using for drafting this entry - they don't have the nice spatial affordances of Tinderbox, however; they are more locked to the paper metaphor. Something neither of these fine programs have, that i think the notebook or notecard + memex could bring, is the automatic discovery of association from both the personal and the external into the personal work space.
This is an idea that Max Van Kleek, Michael Bernstein, David Karger (at MIT) and myself are pushing on right now from one angle in something called "doing" (pronounced "doyng"), and that the Rich Tags project is pushing on in another, associated space. We're interested in finding ways/metaphors/paradigms to support the capture of personal structured data (like Michael's information scrap of a number jotted down that is a phone number) so that it can first be associated with what Max has started grabbing - the local context - and from there look at drawing in appropriate associated external contexts.
In a way the Haystack project modeled this ebb and flow of personal information, like calendar events, with external information, such as flight bookings. It created an integrated view of these information resources so that they would be concurrently available. No one knew they were working with semantic web data, and the opportunity to explore across contexts (like the Beethoven example way at the start of this entry) was not there. I think this time we're asking what this new thing would look like from the moment the computer is engaged. How might input mechanisms change? how might representations across applications-as-contexts differ if there was this collective "data soup" from which these contexts could draw/share?
For the moment, i'm imagining this context-rich interaction as the Semantic Web, and the way i'm thinking of it is as a researcher's Note Book (we are all knowledge workers at some point) of work in progress. A notebook. With the Memex.
The reason that vision of note book + memex appeals to me particularly is that it foregrounds an active engagement with the data - reading it, writing it, potentially sharing it for reuse. And i think that kind of in-process engagement with information - having in-work data and external data blending for the development of new knowledge - is what the Semantic Web is about.

(dig count added May Day, 2007).



© m.c. schraefel, 2007, visiting fellow, DIG. The ideas in this exegesis were initially stimulated by various conversations at January's DIG face to face meeting, then explored later in Jan, 2007 as part of a talk on work in progress while i was visiting the iSchool, University of Texas at Austin.

A design for web content labels built from GRDDL and rules

Submitted by connolly on Thu, 2007-01-25 13:35.

In #swig discussion, Tim mentioned he did some writing on labels and rules and OWL which prompted me to flesh out some related ideas I had. The result is a Makefile and four tests with example labels. One of them is:

All resources on example.com are accessible for all users and meet WAI AA guidelines except those on visual.example.com which are not suitable for users with impaired vision.

I picked an XML syntax out of the air and wrote visaa.lbl:

<label
xmlns="http://www.w3.org/2007/01/lbl22/label"
xmlns:mobilebp="http://www.w3.org/2007/01/lbl22/mobilebp@@#"
xmlns:wai="http://www.w3.org/2007/01/lbl22/wai@@#"
>
<scope>
<domain>example.com</domain>
<except>
<domain>visual.example.com</domain>
</except>
</scope>
<audience>
<wai:AAuser />
</audience>
</label>

And then in testdata.ttl we have:

<http://example.com/pg1simple> a webarch:InformationResource.
<http://visual.example.com/pg2needsVision> a
webarch:InformationResource.
:charlene a wai:AAuser.

Then we run the test thusly...

$ make visaa_test.ttl
xsltproc --output visaa.rdf label2rdf.xsl visaa.lbl
python ../../../2000/10/swap/cwm.py visaa.rdf lblrules.n3 owlAx.n3
testdata.ttl \
--think --filter=findlabels.n3 --n3 >visaa_test.ttl

and indeed, it concludes:

    <http://example.com/pg1simple>     lt:suitableFor :charlene .

but doesn't conclude that pg2needsVision is OK for charlene.

The .lbl syntax is RDF data via GRDDL and label2rdf.xsl. Then owlAx.n3 is a set of rules that derive from the RDFS and OWL specs; i.e. stuff that's already standard. As Tim wrote, A label is a fairly direct use of OWL restrictions. This is very much the sort of thing OWL is designed for. Only the lblrules.n3 bit goes beyond what's standardized, and it's written in the N3 Rules subset of N3, which, assuming a few built-ins, maps pretty neatly to recent RIF designs.
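
For flavor, here's a guess at the shape of a rule in lblrules.n3 (a sketch only: the lbl:covers property and the lt: namespace are stand-ins of mine, not the actual contents of the file, and the real rules presumably also handle the domain/except scoping, e.g. with string built-ins):

@prefix lbl: <http://www.w3.org/2007/01/lbl22/label#>.
@prefix lt: <http://example/lt#>. # stand-in; the real lt: namespace isn't shown in this post

# a resource within a label's scope is suitable for every user in
# the label's audience class.
{ ?label lbl:scope ?s. ?s lbl:covers ?page.
  ?label lbl:audience ?cls. ?user a ?cls. }
  => { ?page lt:suitableFor ?user. }.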

A recent item from Bijan notes a SPARQL-rules design by Axel; I wonder if these rules fit in that design too. I hope to take a look soonish.

She's a witch and I have the proof (in N3)

Submitted by connolly on Tue, 2007-01-02 22:28.

A while back, somebody turned the Monty Python Burn the Witch sketch into an example resolution proof. Bijan and Kendall had some fun turning it into OWL.

I'm still finding bugs pretty regularly, but the cwm/n3 proof stuff is starting to mature; it works for a few PAW demo scenarios. Ralph asked me to characterize the set of problems it works for. I don't have a good handle on that, but this witch example seems to be in the set.

Transcribing the example resolution FOL KB to N3 is pretty straightforward; the original is preserved in the comments:


@prefix : <witch#>.
@keywords is, of, a.

#[1] BURNS(x) /\ WOMAN(x) => WITCH(x)

{ ?x a BURNS. ?x a WOMAN } => { ?x a WITCH }.

#[2] WOMAN(GIRL)
GIRL a WOMAN.

#[3] \forall x, ISMADEOFWOOD(x) => BURNS(x)
{ ?x a ISMADEOFWOOD. } => { ?x a BURNS. }.

#[4] \forall x, FLOATS(x) => ISMADEOFWOOD(x)
{ ?x a FLOATS } => { ?x a ISMADEOFWOOD }.

#[5] FLOATS(DUCK)

DUCK a FLOATS.

#[6] \forall x,y FLOATS(x) /\ SAMEWEIGHT(x,y) => FLOATS(y)

{ ?x a FLOATS. ?x SAMEWEIGHT ?y } => { ?y a FLOATS }.

# and, by experiment
# [7] SAMEWEIGHT(DUCK,GIRL)

DUCK SAMEWEIGHT GIRL.

Then we run cwm to generate the proof and then run the proof checker in report mode:

$ cwm.py witch.n3  --think --filter=witch-goal.n3  --why >witch-pf.n3
$ check.py --report witch-pf.n3 >witch-pf.txt
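
(The goal file isn't shown here; a cwm filter is just an N3 rule whose conclusions become the output, so witch-goal.n3 is presumably along these lines:)

@prefix : <witch#>.

# pass through just the goal statement, if it has been derived
{ :GIRL a :WITCH } => { :GIRL a :WITCH }.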

The report is plain text; I'll enrich it just a bit here. Note that in the N3 proof format, some formulas are elided. It makes some sense not to repeat the whole formula you get by parsing an input file, but I'm not sure why cwm elides results of rule application. It seems to give the relevant formula on the next line, at least:

  1. ...
    [by parsing <witch.n3>]

  2. :GIRL a :WOMAN .
    [by erasure from step 1]

  3. :DUCK :SAMEWEIGHT :GIRL .
    [by erasure from step 1]

  4. :DUCK a :FLOATS .
    [by erasure from step 1]

  5. @forAll :x, :y . { :x a wit:FLOATS; wit:SAMEWEIGHT :y . } log:implies {:y a wit:FLOATS . } .
    [by erasure from step 1]

  6. ...
    [by rule from step 5 applied to steps [3, 4]
    with bindings {'y': '<witch#GIRL>', 'x': '<witch#DUCK>'}]


  7. :GIRL a :FLOATS .
    [by erasure from step 6]

  8. @forAll :x . { :x a wit:FLOATS . } log:implies {:x a wit:ISMADEOFWOOD . } .
    [by erasure from step 1]

  9. ...
    [by rule from step 8 applied to steps [7]
    with bindings {'x': '<witch#GIRL>'}]


  10. :GIRL a :ISMADEOFWOOD .
    [by erasure from step 9]

  11. @forAll :x . { :x a wit:ISMADEOFWOOD . } log:implies {:x a wit:BURNS . } .
    [by erasure from step 1]

  12. ...
    [by rule from step 11 applied to steps [10]
    with bindings {'x': '<witch#GIRL>'}]

  13. :GIRL a :BURNS .
    [by erasure from step 12]

  14. @forAll witch:x . { witch:x a :BURNS, :WOMAN . } log:implies {witch:x a :WITCH . } .
    [by erasure from step 1]

  15. ...
    [by rule from step 14 applied to steps [2, 13]
    with bindings {'x': '<witch#GIRL>'}]


  16. :GIRL a :WITCH .
    [by erasure from step 15]


All the files are in the swap/test/reason directory: witch.n3, witch-goal.n3, witch-pf.n3, witch-pf.txt. Enjoy.
