blogs

Drupal upgrade

Submitted by ryanlee on Fri, 2006-12-01 21:15. ::

I've upgraded the Drupal installation so we can use the OpenID module. A few things learned in the process:

  • The Drupal upgrade path requires incremental steps; to go from one minor version to another two numbers way means upgrading through every intermediate minor version until reaching the target. Earlier versions fail to be useful in 'knowing' which data model version the system is at, so an upgrade meant importing / guessing / dropping and repeating the cycle until I hit on the right one, which in itself was not an easy state to assess.
  • The module administration page loads every module, which can cause memory issues resulting in a blank page. Removing unnecessary, unused modules helps.
  • The JanRain OpenID 1.2.0 pear installation fails to install itself properly, requiring the moving of directories post-install.
  • The OpenID module does not respect settings on account creation. I wrote some code to fix this.

But we're now at the latest Drupal version; and more on OpenID later.

Addendum: And apparently I needed to enable the legacy module. Thanks to those who pointed out the symptoms. 'I' in this case is solely Ryan, not Tim, Dan, or anybody else; send any of your issues with the upgrade my way.

A new Basketball season brings a new episode in the personal information disaster

Submitted by connolly on Thu, 2006-11-16 12:39. :: | | | |

Basketball season is here. Time to copy my son's schedule to my PDA. The organization that runs the league has their schedules online. (yay!) in HTML. (yay!). But with events separated by all <br>s rather than enclosed in elements. (whimper). Even after running it thru tidy, it looks like:

<br />
<b>Event Date:</b> Wednesday, 11/15/2006<br>
<b>Start Time:</b> 8:15<br />
<b>End Time:</b> 9:30<br />
...
<br />
<b>Event Date:</b> Wednesday, 11/8/2006<br />
<b>Start Time:</b> 8:15<br />

So much for XSLT. Time for a nasty perl hack.

Or maybe not. Between my no more undocumented, untested code new year's resolution and the maturity of the python libraries, my usual doctest-driven development worked fine; I was able to generate JSON-shaped structures without hitting that oh screw it; I'll just use perl point; the gist of the code is:

def main(argv):
    dataf, tplf = argv[1], argv[2]
    tpl = kid.Template(file=tplf)
    tpl.events = eachEvent(file(dataf))

    for s in tpl.generate(output='xml', encoding='utf-8'):
        sys.stdout.write(s)

def eachEvent(lines):
    """turn an iterator over lines into an iterator over events
    """
    for l in lines:
        if 'Last Name' in l:
            surname = findName(l)
            e = mkobj("practice", "Practice w/%s" % surname)
        elif 'Event Date' in l:
            if 'dtstart' in e:
                yield e
                e = mkobj("practice", "Practice w/%s" % surname)
            e['date'] = findDate(l)
        elif 'Start Time' in l:
            e['dtstart'] = e['date'] + "T" + findTime(l)
        elif 'End Time' in l:
            e['dtend'] = e['date'] + "T" + findTime(l)

next = 0
def mkobj(pfx, summary):
    global next
    next += 1
    return {'id': "%s%d" % (pfx, next),
            'summary': summary,
            }

def findTime(s):
    """
    >>> findTime("<b>Start Time:</b> 8:15<br />")
    '20:15:00'
    >>> findTime("<b>End Time:</b> 9:30<br />")
    '21:30:00'
    """
    m = re.search(r"(\d+):(\d+)", s)
    hh, mm = int(m.group(1)), int(m.group(2))
    return "%02d:%02d:00" % (hh + 12, mm)

...

It uses my palmagent hackery: event-rdf.kid to produce RDF/XML which hipAgent.py can upload to my PDA. I also used the event.kid template to generate an hCalendar/XHTML version for archival purposes, though I didn't use that directly to feed my PDA.

The development took half an hour or so squeezed into this morning:

changeset:   5:7d455f25b0cc
user:        Dan Connolly http://www.w3.org/People/Connolly/
date:        Thu Nov 16 11:31:07 2006 -0600
summary:     id, seconds in time, etc.

changeset:   2:2b38765cec0f
user:        Dan Connolly http://www.w3.org/People/Connolly/
date:        Thu Nov 16 09:23:15 2006 -0600
summary:     finds date, dtstart, dtend, and location of each event

changeset:   1:e208314f21b2
user:        Dan Connolly http://www.w3.org/People/Connolly/
date:        Thu Nov 16 09:08:01 2006 -0600
summary:     finds dates of each event

Celebrating OWL interoperability and spec quality

Submitted by connolly on Sat, 2006-11-11 00:29. :: | |

In a Standards and Pseudo Standards item in July, Holger Knublauch gripes that SQL interoperability is still tricky after all these years, and UML is still shaking out bugs, while RDF and OWL are really solid. I hope GRDDL and SPARQL will get there soon too.

At the OWL: Experiences and Directions workshop in Athens today, as the community gathered to talk about problems they see with OWL and what they'd like to add to OWL, I felt compelled to point out (using a few slides) that:

  • XML interoperability is quite good and tools are pretty much ubiquitous, but don't forget the XML Core working group has fixed over 100 errata in the specifications since they were originally adopted in 1998.
  • HTML interoperability is a black art; the specification is only a small part of what you need to know to build interoperable tools.
  • XML Schema interoperability is improving, but interoperability problem reports are still fairly common, and it's not always clear from the spec which tool is right when they disagree.

And while the OWL errata do include a repeated sentence and a missing word, there have been no substantive problems reported in the normative specifications.

How did we do that? The OWL deliverables include:

OWL test results screenshot

Jeremy and Jos did great work on the tests. And Sandro's approach to getting test results back from the tool developers was particularly inspired. He asked them to publish their test results as RDF data in the web. Then he provided immediate feedback in the form of an aggregate report that included updates live. After our table of test results had columns from one or two tools, several other developers came out of the woodwork and said "here are my results too." Before long we had results from a dozen or so tools and our implementation report was compelling.

The GRDDL tests are coming along nicely; Chime's message on implementation and testing shows that the spec is quite straightforward to implement, and he updated the test harness so that we should be able to support Evaluation and Report Language (EARL) soon.

SPARQL looks a bit more challenging, but I hope we start to get some solid reports from developers about the SPARQL test collection soon too.

tags: QA, GRDDL, SPARQL, OWL, RDF, Semantic Web

Blogging is great

Submitted by timbl on Fri, 2006-11-03 10:11. ::

People have, since it started, complained about the fact that there is junk on the web. And as a universal medium, of course, it is important that the web itself doesn't try to decide what is publishable. The way quality works on the web is through links.

It works because reputable writers make links to things they consider reputable sources. So readers, when they find something distasteful or unreliable, don't just hit the back button once, they hit it twice. They remember not to follow links again through the page which took them there. One's chosen starting page, and a nurtured set of bookmarks, are the entrance points, then, to a selected subweb of information which one is generally inclined to trust and find valuable.

A great example of course is the blogging world. Blogs provide a gently evolving network of pointers of interest. As do FOAF files. I've always thought that FOAF could be extended to provide a trust infrastructure for (e..g.) spam filtering and OpenID-style single sign-on and its good to see things happening in that space.

In a recent interview with the Guardian, alas, my attempt to explain this was turned upside down into a "blogging is one of the biggest perils" message. Sigh. I think they took their lead from an unfortunate BBC article, which for some reason stressed concerns about the web rather than excitement, failure modes rather than opportunities. (This happens, because when you launch a Web Science Research Initiative, people ask what the opportunities are and what the dangers are for the future. And some editors are tempted to just edit out the opportunities and headline the fears to get the eyeballs, which is old and boring newspaper practice. We expect better from the Guardian and BBC, generally very reputable sources)

In fact, it is a really positive time for the web. Startups are launching, and being sold [Disclaimer: people I know] again, academics are excited about new systems and ideas, conferences and camps and wikis and chat channels and are hopping with energy, and every morning demands an excruciating choice of which exciting link to follow first.

And, fortunately, we have blogs. We can publish what we actually think, even when misreported.

Reinventing HTML

Submitted by timbl on Fri, 2006-10-27 16:14. ::

Making standards is hard work. Its hard because it involves listening to other people and figuring out what they mean, which means figuring out where they are coming from, how they are using words, and so on.

There is the age-old tradeoff for any group as to whether to zoom along happily, in relative isolation, putting off the day when they ask for reviews, or whether to get lots of people involved early on, so a wider community gets on board earlier, with all the time that costs. That's a trade-off which won't go away.

The solutions tend to be different for each case, each working group. Some have lots of reviewers and some few, some have lots of time, some urgent deadlines.

A particular case is HTML. HTML has the potential interest of millions of people: anyone who has designed a web page may have useful views on new HTML features. It is the earliest spec of W3C, a battleground of the browser wars, and now the most widespread spec.

The perceived accountability of the HTML group has been an issue. Sometimes this was a departure from the W3C process, sometimes a sticking to it in principle, but not actually providing assurances to commenters. An issue was the formation of the breakaway WHAT WG, which attracted reviewers though it did not have a process or specific accountability measures itself.

There has been discussion in blogs where Daniel Glazman, Björn Hörmann, Molly Holzschlag, Eric Meyer, and Jeffrey Zeldman and others have shared concerns about W3C works particularly in the HTML area. The validator and other subjects cropped up too, but let's focus on HTML now. We had a W3C retreat in which we discussed what to do about these things.

Some things are very clear. It is really important to have real developers on the ground involved with the development of HTML. It is also really important to have browser makers intimately involved and committed. And also all the other stakeholders, including users and user companies and makers of related products.

Some things are clearer with hindsight of several years. It is necessary to evolve HTML incrementally. The attempt to get the world to switch to XML, including quotes around attribute values and slashes in empty tags and namespaces all at once didn't work. The large HTML-generating public did not move, largely because the browsers didn't complain. Some large communities did shift and are enjoying the fruits of well-formed systems, but not all. It is important to maintain HTML incrementally, as well as continuing a transition to well-formed world, and developing more power in that world.

The plan is to charter a completely new HTML group. Unlike the previous one, this one will be chartered to do incremental improvements to HTML, as also in parallel xHTML. It will have a different chair and staff contact. It will work on HTML and xHTML together. We have strong support for this group, from many people we have talked to, including browser makers.

There will also be work on forms. This is a complex area, as existing HTML forms and XForms are both form languages. HTML forms are ubiquitously deployed, and there are many implementations and users of XForms. Meanwhile, the Webforms submission has suggested sensible extensions to HTML forms. The plan is, informed by Webforms, to extend HTML forms. At the same time, there is a work item to look at how HTML forms (existing and extended) can be thought of as XForm equivalents, to allow an easy escalation path. A goal would be to have an HTML forms language which is a superset of the existing HTML language, and a subset of a XForms language wit added HTML compatibility. We will see to what extend this is possible. There will be a new Forms group, and a common task force between it and the HTML group.

There is also a plan for a separate group to work on the XHTML2 work which the old "HTML working group" was working on. There will be no dependency of HTML work on the XHTML2 work.

As well as a new HTML work, there are other things want to change. The validator I think is a really valuable tool both for users and in helping standards deployment. I'd like it to check (even) more stuff, be (even) more helpful, and prioritize carefully its errors, warning and mild chidings. I'd like it to link to an explanations of why things should be a certain way. We have, by the way, just ordered some new server hardware, paid for by the Supporters program -- thank you!

This is going to be hard work. I'd like everyone to go into this realizing this. I'll be asking these groups to be very accountable, to have powerful issue tracking systems on the w3.org web site, and to be responsive in spirit as well as in letter to public comments. As always, we will be insisting on working implementations and test suites. Now we are going to be asking for things like talking with validator developers, maybe providing validator modules and validator test suites. (That's like a language test suite but backwards, in a way). I'm going to ask commenters to be respectful of the groups, as always. Try to check whether the comment has been made before, suggest alternative text, one item per message, etc, and add to technical perception social awareness.

This is going to be a very major collaboration on a very important spec, one of the crown jewels of web technology. Even though hundreds of people will be involved, we are evolving the technology which millions going on billions will use in the future. There won't seem like enough thankyous to go around some days. But we will be maintaining something very important and creating something even better.

Tim BL

p.s. comments are disabled here in breadcrumbs, the DIG research blog, but they are welcome in the W3C QA weblog.

Now is a good time to try the tabulator

Submitted by connolly on Thu, 2006-10-26 11:40. :: | | |

Tim presented the tabulator to the W3C team today; see slides: Tabulator: AJAX Generic RDF browser.

The tabulator was sorta all over the floor when I tried to present it in Austin in September, but David Sheets put it back together in the last couple weeks. Yay David!

In particular, the support for viewing the HTTP data that you pick up by tabulating is working better than ever before. The HTTP vocabulary has URIs like http://dig.csail.mit.edu/2005/ajar/ajaw/httph#content-type. That seems like an interesting contribution to the WAI ER work on HTTP Vocabulary in RDF.

Note comments are disabled here in breadcrumbs until we figure out OpenID comment policies and drupal etc.. The tabulator issue tracker is probably a better place to report problems anyway. We don't have OpenID working there yet either, unfortunately, but we do support email callback based account setup.

Adding Shoenfield, Brachman books to my bookshelf?

Submitted by connolly on Mon, 2006-10-16 13:08. ::

In discussion following my presentation to the ACL 2 seminar, Bob Boyer said "Shoenfield is the bible." And while talking with Elaine Rich before my Dean's Scholars lunch presentation, she strongly recommended Brachman's KR book.

This brought John Udell's library lookup gizmo out of the someday pile for a try. Support for the libraries near me didn't work out of the box. It's reasonably clear how to get it working, but a manual search showed that the libraries don't have the books anyway.

In the research library symposium I learned that even some on-campus researchers find it easier to buy books thru Amazon than to use their campus library. I have no trouble adding these two books to my amazon wishlist, but I hesitate to actually order them.

I make time for novels by McMurtry and Crichton and such, but the computer book industry is full of stuff that is rushed to market. If there's a topic that I'm really interested in, I can download the source the day it's released or follow the mailing list archives and weblogs pretty directly. By the time it has come out in book form, the chance to influence the direction has probably passed. So when I consider having my employer buy a book for my bookshelf, I have a hard time justifying more than $20 or so.

I devote one shelf to books that were part of my education... starting with The Complete Book of Model Car Building and A Field Guide to Rocks and Minerals thru The Facts for the Color Computer, OS/9, GoedelEscherBach, SmallTalk-80 and so on thru POSIX 1003.1. You might try the MyLowestBookshelf excercise. I had some fun with it a few years ago.

I do keep one shelf of Web and XML and networking books; not so much so that I can refer to their contents but rather to commemorate working with the people who wrote them.

I have Lessig's Free Culture on the top shelf, i.e. the "you really should have read this by now" guilt-pile. But I had better luck making time to listen to the recording of his Wikimania talk over the web.

For current events and new developments, I'll probably stick with online sources. But I think I'll order these two books; they seem to have stuff I can't get online.

postscript: for an extra $13.99, amazon offers to let me start reading the Brachman right now with their Amazon Online Reader. Hmm...

Wishing for XOXO microformat support in OmniOutliner

Submitted by connolly on Sat, 2006-09-16 21:46. :: |

OmniOutliner is great; I used it to put together the panel remarks I mentioned an earlier breadcrumbs item.

The XHTML export support is quite good, and even their native format is XML. But it's a bit of a pain to remember to export each time I save so that I can keep the ooutline version and the XHTML version in sync.

Why not just use an XHTML dialect like XOXO as the native format? I wonder what, if anything, their native format records that XOXO doesn't cover and couldn't be squeezed into XHTML's spare bandwidth somewhere.

There's also the question of whether XOXO can express anything that OmniOutliner can't handle, I suppose.

Blogged with Flock

Talking with U.T. Austin students about the Microformats, Drug Discovery, the Tabulator, and the Semantic Web

Submitted by connolly on Sat, 2006-09-16 21:36. :: | | | | | |

Working with the MIT tabulator students has been such a blast that while I was at U.T. Austin for the research library symposium, I thought I would try to recruit some undergrads there to get into it. Bob Boyer invited me to speak to his PHL313K class on why the heck they should learn logic, and Alan Cline invited me to the Dean's Scholars lunch, which I used to attend when I was at U.T.

To motivate logic in the PHL313K class, I started with their experience with HTML and blogging and explained how the Semantic Web extends the web by looking at links as logical propositions. cal screen shot I used my XML 2005 slides to talk a little bit about web history and web architecture, and then I moved into using hCalendar (and GRDDL, though I left that largely implicit) to address the personal information disaster. This was the first week or so of class and they had just started learning propositional logic, and hadn't even gotten as far as predicate calculus where atomic formulas like those in RDF show up. And none of them had heard of microformats. I promised not to talk for the full hour but then lost track of time and didn't get to the punch line, "so the computer tells you that no, you can't go to both the conference and Mom's birthday party because you can't be in two places at once" until it was time for them to head off to their next class.

One student did stay after to pose a question that is very interesting and important, if only tangentially related to the Semantic Web: with technology advancing so fast, how do you maintain balance in life?

While Boyer said that talk went well, I think I didn't do a very good job of connecting with them; or maybe they just weren't really awake; it was an 8am class after all. At the Dean's Scholars lunch, on the other hand, the students were talking to each other so loudly as they grabbed their sandwiches that Cline had to really work to get the floor to introduce me as a "local boy done good." They responded with a rousing ovation.

Elaine Rich had provided the vital clue for connecting with this audience earlier in the week. She does AI research and had seen TimBL's AAAI talk. While she didn't exactly give the talk four stars overall, she did get enough out of it to realize it would make an interesting application to add to a book that she's writing, where she's trying to give practical examples that motivate automata theory. So after I took a look at what she had written about URIs and RDF and OWL and such, she reminded me that not all the Deans Scholars are studying computer science; but many of them do biology, and I might do well to present the Semantic Web more from the perspective of that user community.

So I used TimBL's Bio-IT slides. They weren't shy when I went too fast with terms like hypertext, and there were a lot of furrowed brows for a while. But when I got to the FOAFm OMM, UMLS, SNP, Uniprot, Bipax, Patents all have some overlap with drug target ontology drug discovery diagram, I said I didn't even know some of these words and asked them which ones they knew. After a chuckle about "drug", one of them explained about SNP, i.e. single nucleotide polymorphism and another told me about OMM and the discussion really got going. I didn't make much more use of Tim's slides. One great question about integrating data about one place from lots of sources prompted me to tempt the demo gods and try the tabulator. The demo gods were not entirely kind; perhaps I should have used the released version rather than the development version. But I think I did give them a feel for it. In answer to "so what is it you're trying to do, exactly?" I gave a two part answer:

  1. Recruit some of them to work on the tabulator so that their name might be on the next paper like the SWUI06 paper, Tabulator: Exploring and Analyzing linked data on the Semantic Web.
  2. Integrate data accross applications and accross administrative boundaries all over the world, like the Web has done for documents.

We touched on the question of local and global consistency, and someone asked if you can reason about disagreement. I said that yes, I had presented a paper in Edinburgh just this May that demonstrated formally a disagreement between several parties

One of the last questions was "So what is computer science research anway?" which I answered by appeal to the DIG mission statement:

The Decentralized Information Group explores technical, institutional and public policy questions necessary to advance the development of global, decentralized information environments.

And I said how cool it is to have somebody in the TAMI project with real-world experience with the privacy act. One student followed up and asked if we have anybody with real legal background in the group, and I pointed him to Danny. He asked me afterward how to get involved, and it turned out that IRC and freenode are known to him, so the #swig channel was in our common neighborhood in cyberspace, even geography would separate us as I headed to the airport to fly home.

technorati tags:, ,

Blogged with Flock

ACL 2 seminar at U.T. Austin: Toward proof exchange in the Semantic Web

Submitted by connolly on Sat, 2006-09-16 21:15. :: | | |

 

In our PAW and TAMI projects, we're making a lot of progress on the practical aspects of proof exchange: in PAW we're working out the nitty gritty details of making an HTTP client (proxy) and server that exchange proofs, and in TAMI, we're working on user interfaces for audit trails and justifications and on integration with a truth maintenance system.

It doesn't concern me too much that cwm does some crazy stuff when finding proofs; it's the proof checker that I expect to deploy as part of trusted computing bases and the proof language specification that I hope will complete the Semantic Web standards stack.

But N3 proof exchange is no longer a completely hypothetical problem; the first examples of interoperating with InferenceWeb (via a mapping to PML) and with Euler are working. So it's time to take a close look at the proof representation and the proof theory in more detail.

My trip to Austin for a research library symposium at the University of Texas gave me a chance to re-connect with Bob Boyer. A while back, I told him about RDF and asked him about Semantic Web logic issues and he showed me the proof checking part of McCune's Robbins Algebras Are Boolean result:

Proofs found by programs are always questionable. Our approach to this problem is to have the theorem prover construct a detailed proof object and have a very simple program (written in a high-level language) check that the proof object is correct. The proof checking program is simple enough that it can be scrutinized by humans, and formal verification is probably feasible.

In my Jan 2000 notes, that excerpt is followed by...

I offer a 500 brownie-point bounty to anybody who converts it to Java and converts the ()'s in the input format to <>'s.

5 points for perl. ;-)

Bob got me invited to the ACL2 seminar this week; in my presentation, Toward proof exchange in the Semantic Web. I reviewed a bit of Web Architecture and the standardization status of RDF, RDFS, OWL, and SPARQL as background to demonstrating that we're close to collecting that bounty. (Little did I know in 2000 that TimBL would pick up python so that I could avoid Java as well as perl ;-)

Matt Kauffman and company gave all sorts of great feedback on my presentation. I had to go back to the Semantic Web Wave diagram a few times to clarify the boundary between research and standardization:

  • RDF is fully standardized/ratified
  • turtle has the same expressive capability as RDF's XML syntax, but isn't fully ratified, and
  • N3 goes beyond the standards in both syntax and expressiveness

One of the people there who knew about RDF and OWL and such really encouraged me to get N3/turtle done, since every time he does any Semantic Web advocacy, the RDF/XML syntax is a deal-killer. I tried to show them my work on a turtle bnf, but what I was looking for was in June mailing list discussion, not in my February bnf2turtle breadcrumbs item.

They asked what happens if an identifier is used before it appears in an @forAll directive and I had to admit that I could test what the software does if they wanted to, but I couldn't be sure whether that was by design or not; exactly how quantification and {}s interact in N3 is sort of an open issue, or at least something I'm not quite sure about.

Moore noticed that our conjunction introduction (CI) step doesn't result in a formula whose main connective is conjuction; the conjuction gets pushed inside the quantifiers. It's not wrong, but it's not traditional CI either.

I asked about ACL2's proof format, and they said what goes in an ACL2 "book" is not so much a proof as a sequence of lemmas and such, but Jared was working on Milawa, a simple proof checker that can be extended with new prooftechniques.

I started talking a little after 4pm; different people left at different times, but it wasn't until about 8 that Matt realized he was late for a squash game and headed out.

MLK and the UT TowerI went back to visit them in the U.T. tower the next day to follow up on ACL2/N3 connections and Milawa. Matt suggested a translation of N3 quantifiers and {}s into ACL2 that doesn't involve quotation. He offered to guide me as I fleshed it out, but I only got as far as installing lisp and ACL2; I was too tired to get into a coding fugue.

Jared not only gave me some essential installation clues, but for every technical topic I brought up, he printed out two papers showing different approaches. I sure hope I can find time to follow up on at least some of this stuff.

del.icio.us tags:, , , ,

Blogged with Flock

Syndicate content