connolly's blog

webizing TaskJuggler

Submitted by connolly on Fri, 2006-05-19 11:29.

Going over my bookmarks I rediscovered TaskJuggler:

TaskJuggler provides an optimizing scheduler that computes your project time lines and resource assignments based on the project outline and the constraints that you have provided. The built-in resource balancer and consistency checker offload you from having to worry about irrelevant details and ring the alarm if the project gets out of hand.

Sounds like this tool might be applicable to the hard problem of scheduling meetings with various constraints.

It seems to have a declarative project description language:

flags team

resource dev "Developers" {
  resource dev1 "Paul Smith" { rate 330.0 }
  resource dev2 "Sebastien Bono"
  resource dev3 "Klaus Mueller" { vacation 2002-02-01 - 2002-02-05 }

  flags team
}
resource misc "The Others" {
  resource test "Peter Murphy" { limits { dailymax 6.4h } rate 240.0 }
  resource doc "Dim Sung" { rate 280.0 }

  flags team
}

What might that look like in N3, i.e. in RDF that the tabulator could browse around? (See the N3 primer to get a feel for RDF and N3.) What would it take to webize TaskJuggler?
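Here's a rough guess, just to make the question concrete. The tj: vocabulary below is invented for illustration only; nobody has designed a TaskJuggler ontology that I know of. I'll hold the N3 in a Python string, the way I'd embed test data in a script:

# A made-up sketch of the TaskJuggler resource tree as N3.
# The tj: property names are guesses, not an established vocabulary.
RESOURCES_N3 = """
@prefix tj: <http://example.org/taskjuggler-vocab#> .
@prefix : <#> .

:dev a tj:Resource; tj:name "Developers"; tj:flag :team;
    tj:subResource :dev1, :dev2, :dev3 .
:dev1 tj:name "Paul Smith"; tj:rate 330.0 .
:dev2 tj:name "Sebastien Bono" .
:dev3 tj:name "Klaus Mueller";
    tj:vacation [ tj:start "2002-02-01"; tj:end "2002-02-05" ] .
"""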

Also... how does the taskjuggler consistency checker relate to OWL consistency checkers like pellet?

RDF, Microformats, and Javascript hacking in person at the 'tute

Submitted by connolly on Thu, 2006-05-04 16:14.

My regular schedule of working group meetings and conferences had a gap in April, and my list of reasons to chat with Ben was growing, and we're recruiting some UROPs to work on the tabulator project this summer, so I flew up for a visit to MIT.

I didn't have any particular appointments the first day, so I used the few spare minutes on the T between the airport and MIT to scare up contacts using my handheld gizmo. It turned out Aaron was in town and available for lunch in Harvard Square. We talked about life in start-ups, standards orgs, and research. He suggested layout stuff from Java and Apple should make its way into CSS and offered to write up a few details.

I spent much of Thursday with Ben working on javascript hacks to explore calendar data in RDFa. We did some whiteboard noodling about RDFa and microformats. He showed me the JavaScript shell, which is pretty cool... it gives a read-eval-print loop and tab completion in firefox... just like a lisp machine ;-) Elias dropped by and mixed in some javascript hacking he's been doing to connect SPARQL with the google calendar. Ben and I didn't get around to converting my itinerary to RDFa like we planned, but we got pretty close; he sent out a New RDFa demo the next day. That same day, he came down to meet with me again, but I had to go work on a DARPA report, so we were trying to figure out next steps, and he came up with a cool idea and sent it out: Proposal: hGRDDL, an extraction from Microformats to RDFa. Elias and I did some whiteboard noodling too, in the neighborhood of JSON and templates and microformats.

Elias is learning more than he ever wanted to about calendars and timezones. It's like Dougal Campbell said to microformats-discuss:

My server is in timezone A, but I live in timezone B, but I'm posting information about an event that will occur in timezone C. Shoot me now.

On this trip, a Samsonite Dimension Notebook Case from SAM's replaced my aging W3C bag in my travel kit. I think I like all the little pockets and such, but I'm not sure; sometimes I miss the simplicity of one big compartment. I'm sure that I'm not happy that my Kensington K33069 Universal AC/Car/Air Adapter stopped working somewhere between MIT and MKE.

That's one of the reasons that I always pack some light reading on dead-trees. I enjoyed escaping into Scott Lasser's Battle Creek on the way home. Baseball and fathers. Good stuff.

On GData, SPARQL update, and RDF Diff/Sync

Submitted by connolly on Tue, 2006-04-25 17:38.

The Google Data APIs Protocol is pretty interesting. It seems to be based on the Atom publishing protocol, which is a pretty straightforward application of HTTP and XML, so that's a good thing.

The query features seem to be less expressive than the SPARQL protocol, but GData has an update feature, while the SPARQL update issue is postponed. Updating at the triple level is tricky. I helped TimBL refine Delta: an ontology for the distribution of differences between RDF graphs a bit, and there's working code in cwm. But I haven't really managed to use it in practical settings. My PDA's calendar has an XMLRPC service where I can update a whole record at a time, just like GData. I assume caldav does likewise.
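For contrast, record-at-a-time updating is easy to picture. Something like this sketch; the endpoint and method names are hypothetical stand-ins, not the sidekick's actual API:

import xmlrpclib

# hypothetical endpoint and method names, for illustration only
server = xmlrpclib.ServerProxy("http://example.org/calendar/RPC2")
server.event.update("event42", {"summary": "DAWG telcon",
                                "dtstart": "2006-04-25T14:30:00Z"})
# the whole record is replaced; no diffing of individual triples needed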

The GData approach to concurrency looks quite reasonable. I haven't studied the authentication mechanism. I hope to get to that presently.

citing W3C specs from WWW conference papers

Submitted by connolly on Tue, 2006-04-25 10:19.

As I said in a July 2000 message to www-rdf-interest:

There are very few data formats I trust... when I use the computer to capture my knowledge, I pretty much stick to plain text, XML (esp XHTML, or at least HTML that tidy can turn into XHTML for me), RCS/CVS, and RFC822/MIME. I use JPG, PNG, and PDF if I must, but not for capturing knowledge for exchange, revision, etc.

And as I explained in a 1994 essay, converting from LaTeX is hard, so I try not to write in LaTeX either.

The Web conference has instructions for submitting PDF using LaTeX or MS Word and (finally!) for submitting XHTML. (The WWW2006 paper CSS stylesheet is horrible... who wants to read 9pt times on screen?!?! Anyway...) So when the IRW 2006 organizers told me they'd like a PDF version of my paper in that style, I dusted off my Transforming XHTML to LaTeX and BibTeX tools and got to work.

My paper cites a number of W3C specs, including HTML 4. The W3C tech reports index/digital library has an associated bibliography generator. I fed it http://www.w3.org/TR/html401 and it generated a nice bibliographic reference from an RDF data set. I'm interested in the ongoing citation microformats work that might make that transformation lossless, since I need not just XHTML, but BibTeX. What I'm doing currently is adding some bibtex vocabulary in class and rel attributes:

<dt class="TechReport">
<a name="HTML4" id="HTML4">[HTML4]</a>
</dt>

<dd><span class="author">Le Hors, Arnaud and Raggett, Dave and
Jacobs, Ian</span> Editors,
<cite> <a
href="http://www.w3.org/TR/1999/REC-html401-19991224">HTML 4.01
Specification</a> </cite>,
<span class="institution">W3C</span> Recommendation,
24 <span class="month">December</span> <span class="year">1999</span>,
<tt class="number">http://www.w3.org/TR/1999/REC-html401-19991224</tt>.
<a href="http://www.w3.org/TR/html401" title="Latest version of
HTML 4.01 Specification">Latest version</a> available at
http://www.w3.org/TR/html401 .</dd>

When run thru my xh2bib.xsl, out comes:

@TechReport{HTML4,
title = "{
HTML 4.01 Specification
}",
    author = {Le Hors, Arnaud
and Raggett, Dave
and Jacobs, Ian},
    institution = {W3C},
    month = {December},
    year = {1999},
    number = {http://www.w3.org/TR/1999/REC-html401-19991224},
    howpublished = { \url{http://www.w3.org/TR/1999/REC-html401-19991224} }
}

I think I should be using editor = rather than author =, but that didn't work the first time I tried it and I haven't investigated further.

In any case, I'm reasonably happy with the PDF output.

Access control and version control: an over-constrained problem?

Submitted by connolly on Tue, 2006-04-25 03:47.

For a long time, when it came to sharing code, life was simple: everything went into the w3ccvs repository via ssh; when I commit WWW/People/Connolly/Overview.html the latest version is propagated out to all the web mirrors automatically and everybody can see the results at http://www.w3.org/People/Connolly/. I can also write via HTTP, using Amaya or any command-line tool with PUT support, e.g. curl. The PUT results in a cvs commit, followed by the same mirroring magic.
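Here's a minimal sketch of such a PUT using python's httplib, with placeholder credentials; the server does the cvs commit and mirroring on its end:

import httplib, base64

body = open("Overview.html").read()
auth = "Basic " + base64.encodestring("connolly:PASSWORD").strip()  # placeholder
conn = httplib.HTTPConnection("www.w3.org")
conn.request("PUT", "/People/Connolly/Overview.html", body,
             {"Content-Type": "text/html", "Authorization": auth})
print conn.getresponse().status  # expect a 2xx on success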

We have a forms-based system for setting the access control for each file -- access control to the latest version, that is; access to older versions, to logs and the rest of the usual cvs goodies is limited to the few dozen people with access to the w3ccvs repository.

To overcome that limitation, Daniel Veillard set up dev.w3.org which conforms more to the open source norms, providing anonymous read-only CVS access and web-browseable history. There is no http-based access control there, though CVS commit access is managed with ssh.

The downside of dev.w3.org is that it doesn't support web publishing like w3ccvs does. http://dev.w3.org/cvsweb/2001/palmagent/event-test.html is the cvs history of event-test.html, not the page itself. The address that gives the page itself is http://dev.w3.org/cvsweb/~checkout~/2001/palmagent/event-test.html?rev=HEAD&content-type=text/html;%20charset=iso-8859-1. And every GET involves cvs reading the ,v file. And of course, relative links don't work.

For the SWAP project, we use a horrible kludge of both: we commit to w3ccvs, and every 15 minutes, dev.w3.org is updated by rsync.

Then I have a personal web site, where I run Zope. That gives me thru-the-web editing with revision history/audit/undo, but at the cost of having the whole site in one big file on the server, and no local goodies like cvs diff. Ugh. I'd like to switch to something else, but I haven't found anything else that talks webdav to iCal out-of-the-box. Plus, I can spare precious little time for sysadmin on my personal site, where there's basically just one user. I was really reluctant to use flickr.com URIs for my photos, but their tools are so much nicer that the alternative is that I basically don't publish photos at all. Plus, the social networking benefits of publishing on flickr are considerable. But that's really a separate point (or is it? hmm).

As I wrote in a January item, we're using svn in DIG. DIG is part of CSAIL, which is a long-time kerberos shop. Public key authentication is so much nicer than shared key. I can use cvs or svn over ssh to any number of places (mindswap, microformats, ...) with the same public key. But I need separate credentials (and a separate kinit step) for CSAIL. Ah... but it does propagate svn commit to http space in the right way. I think I'll try it ... see data4.

I'd rather use hg, which is peer-to-peer; it's theoretically possible to use svn as an hg peer, but that's slightly beyond the documented state-of-the-art.

busy day in #microformats

Submitted by connolly on Wed, 2006-04-19 19:53.
[Screenshot: hgk showing lots of commits today]

Last month at SXSWi, I met up with Ryan King and we seemed to be on the same wavelength about developing test cases for microformats. A couple weeks ago, we started using mercurial/hg to share microformats code and tests. I try to sync up with Ryan once a week or so. We had another busy day today.

We're still struggling a bit to shed our CVS habits and learn hg's push/pull/merge routine, but we're getting quite a bit of coding and testing done, as you can see in hg.microformats.org.

We talked a little about line lengths and tabs/indenting and various other open source housekeeping details.

I looked over some experimental code that Brian has added for <del> support. I simplified one big XPath expression and fixed a bug while I was at it, I think.

Brian's code is ahead of the specs when it comes to <abbr class="geo" title="41.9794444444;-87.9044444444">, but it works really well in, for example, the SXSWi hCalendar/hCard/google-maps mash-up, so I expect it'll get added to the specs and tests.

Earlier in the week, prompted by a #swig discussion of MicroModels, i.e. GRDDL and microformats, I collected some notes in the microformats wiki on profile URIs to reflect some discussions we had in Austin.

Consensus and community review in open source and open standards

Submitted by connolly on Thu, 2006-04-06 18:03.

Consensus is a core value of W3C and lots of other open standards and open source communities. I used to think that a decision where almost everybody agreed except a few objectors was an example of consensus. That was based on my experience in the IETF, with its "rough consensus and running code" mantra. Then I learned that this is quite a stretch with respect to the normal dictionary meaning of "consensus".

The debian community seems to be examining the meaning of "consensus":

Many things are done on behalf of the project without every individual member supporting them - for instance, Mark is vigorously opposed to Debian UK being granted a trademark license, even though Branden (and therefore the project) granted one. The key difference here is the difference between consensus and unanimity.

Matthew Garrett 2006-04-04

Definitions of "consensus" vary. The wikipedia article on consensus has a good synthesis: Achieving consensus requires serious treatment of every group member's considered opinion.

W3C's consensus policy formally distinguishes the case of even one objection from consensus:

The following terms are used in this document to describe the level of support for a decision among a set of eligible individuals:

  1. Consensus: A substantial number of individuals in the set support the decision and nobody in the set registers a Formal Objection. Individuals in the set may abstain. Abstention is either an explicit expression of no opinion or silence by an individual in the set. Unanimity is the particular case of consensus where all individuals in the set support the decision (i.e., no individual in the set abstains).
  2. Dissent: At least one individual in the set registers a Formal Objection.

...

In some cases, even after careful consideration of all points of view, a group might find itself unable to reach consensus. The Chair may record a decision where there is dissent (i.e., there is at least one Formal Objection) so that the group may make progress (for example, to produce a deliverable in a timely manner). Dissenters cannot stop a group's work simply by saying that they cannot live with a decision. When the Chair believes that the Group has duly considered the legitimate concerns of dissenters as far as is possible and reasonable, the group should move on.

That last bit is important, since "you can't schedule consensus," another lesson I learned from Michael Sperberg-McQueen. And we do try to schedule our deliverables.

The RDF Data Access Working Group (DAWG) has been working on SPARQL for quite a while now. Our first public release was October 2004. Since then, we have handled comments from a few dozen people and tried to reach consensus with them. We weren't always successful. Our request for Candidate Recommendation shows the outstanding formal objections, each one of which got reviewed by The Director. Though W3C did grant that request for Candidate Recommendation status for SPARQL today (yay!), we need to go back over some of the comments and make test cases and maybe some clarifications. I hope that, in the process, we can address some of the concerns of those with formal objections and achieve consensus with them.

Also, I remember a time (though I can't confirm it from The Tao of IETF or any of the other records I searched) when people and companies who wanted to deploy new technology on the Internet were expected to submit their proposal for community review before deploying widely. I wrote a message on squatting on link relationship names, x-tokens, registries, and URI-based extensibility to www-tag in April 2005, with concerns about several mechanisms which were deployed, some at giga-scale as far as I can tell, without any community review. I think I'll repeat just about the whole thing:


When somebody wants to deploy a new idiom or a new term in the Web, they're more than welcome to make up a URI for it...

"[URI] is an agreement about how the Internet community allocates names and associates them with the resources they identify."

webarch

We particularly encourage this for XML vocabularies...

"The purpose of an XML namespace (defined in [XMLNS]) is to allow the deployment of XML vocabularies (in which element and attribute names are defined) in a global environment and to reduce the risk of name collisions in a given document when vocabularies are combined."

webarch

But while making up a URI is pretty straightforward, it's more trouble than not bothering at all. And people usually don't do any more work than they have to.

There is a time and a place for just using short strings, but since short strings are scarce resources shared by the global community, fair and open processes should be used to manage them. Witness TCP/IP ports, HTML element names, Unicode characters, and domain names and trademarks -- different processes, with different escalation and enforcement mechanisms, but all accepted as fair by the global community, more or less, I think.

The IETF has a tradition of reserving tokens starting with "x-" for experimental use, with the expectation that they'll shed the x- prefix as they're registered by IANA. But it's not really clear how that transition happens.

Witness application/x-www-form-urlencoded. A horrible name, perhaps, but nobody has enough motivation to change it. It's been all the way thru the W3C process... twice now: once for HTML 4 and again in XForms. Hmm... I wonder if it's registered... nope.

A pattern that I'd like to see more of is

  1. start with a URI for a new term
  2. if it picks up steam, introduce a synonym that is a short string thru a fair/open process

I'm not sure where the motivation to complete step 2 will come from, but if it doesn't come at all, that's OK. Stopping with a URI term is a lot better than getting stuck with something like x-www-form-urlencoded.

Lately I'm seeing quite the opposite. The HTML specification includes a hook for grounding link relationships in URI space, but people aren't using it:

when Google sees the attribute (rel="nofollow") on hyperlinks, those links won't get any credit when we rank websites in our search results.

google Jan 2005 announcement

"By adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (or keyword/subject) of the current page."

technorati RelTag

What are the prefetching hints?

The browser looks for either an HTML <link> tag or an HTTP Link: header with a relation type of either next or prefetch.

mozilla prefetching FAQ

Google is sufficiently influential that they form a critical mass for deploying these things all by themselves. While Google enjoys a good reputation these days, and the community isn't complaining much, I don't think what they're doing is fair. Other companies with similarly influential positions used to play this game with HTML element names, and I think the community has decided that it's not fair or even much fun.

Deployment of the technorati RelTag thingy seems much more grass-roots, peer-to-peer. But even so, it's only a matter of time before we see a name clash. So perhaps it's fair, but it doesn't seem wise.

I think all three of these are cases of squatting on the community resource of link relationship names.

Should all new link relationships go thru the W3C HTML Working Group? No, of course not. The profile mechanism is there to decentralize the process.

Should W3C run a registry of link relationship names? That seems boring and inefficient, to me. It can't possibly cost less time and effort to apply for a W3C-registered link relationship name than it does to reserve a domain name and run a web server, can it?

If Google and Mozilla really want the community to agree to these short names, I'd be happy to see them use the W3C member submissions process.

A step forward with python and sshagent, and a walk around gnome security tools

Submitted by connolly on Wed, 2006-03-29 09:34.

At the August PAW meeting, I dropped a pointer in IRC to sshAuth.py, my attempt to use sshagent to make digital signatures. I started on it in September 2003 and banged my head against the wall for quite a while trying to get it to work.

Last night, while noodling on calendar synchronization and delegation, I took another run at the problem; this time, it worked! Thanks to paramiko:


from paramiko import Agent, RSAKey, Message
import Crypto.Util.randpool
import binascii

data = "hoopy" # data to sign
user = "connolly" # salt to taste

# get my public key out of authorized_keys
authkeys = file("/home/%s/.ssh/authorized_keys" % user)
authkeys.next() # skip 1st one
keyd = authkeys.next()
tn, uu, other = keyd.split() # key type, base64 blob, comment (no spaces)
keyblob = binascii.a2b_base64(uu)
pubkey = RSAKey(Message(keyblob))

# have the agent sign the data, then check the signature
# against the public key
pool = Crypto.Util.randpool.RandomPool()
a = Agent()
agtkey = a.get_keys()[0]
sigblob = agtkey.sign_ssh_data(pool, data)

print pubkey.verify_ssh_sig(data, Message(sigblob))

That skip 1st one bit took me a while to figure out. I have 2 keys in my ~/.ssh/authorized_keys file. I wonder if sshAuth.py would work with that fix.
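A more robust fix than skipping a fixed number of lines would be to match the agent's key against authorized_keys by its base64 blob. A sketch, untested, assuming paramiko keys expose get_base64():

from paramiko import Agent

agtkey = Agent().get_keys()[0]
want = agtkey.get_base64()
for line in file("/home/connolly/.ssh/authorized_keys"):
    fields = line.split(None, 2)  # key type, base64 blob, comment
    if len(fields) >= 2 and fields[1] == want:
        print "agent key found in authorized_keys:", fields[2:]
        break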

I also took a look at the state of the art in password agents and managers for gnome. revelation looks interesting. I'm still hoping for something like OpenID/SXIP integrated with password managers like the OSX keychain.

I took notes in the #swig channel while I was at it. I got a kick out of this exchange:

 
04:44:59 <Ontogon_> dan, are you talking to yourself?
04:45:32 <dajobe> he's talking to the web

hacking soccer schedules into hCalendar and into my sidekick

Submitted by connolly on Sun, 2006-03-26 00:43.

When I gave a microformats+GRDDL talk early this month, I listed my kid's soccer schedules on my wish list slide.

Those soccer schedules are now available. No matter how badly I need them in my sidekick calendar in order to coordinate other family activities, I couldn't bear the thought of manually keying them in. The bane of my existence is doing things that I know the computer could do for me (1998, 2002).

One of the schedules is in PDF. I toyed with a PDF to excel converter and with google's "view as HTML" cache feature but didn't get very far with those. But the other schedule is on a normal database-backed web site.

It took just 91 lines of XSLT to add hCalendar markup to the page. Each event was a row; I added a tbody element around each row so that I could use the row element as the class="description" element. I used an ASP recipe to figure out how to just add an attribute here and there and leave the rest alone. I didn't get the AM/PM XSLT code right the first time; silly, since I've written it before; I should have copied it, if not factored it out for reuse.
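For the record, here is that hour arithmetic written out in Python rather than XSLT; noon and midnight are the cases that bite:

def to24h(hour12, ampm):
    # 12 AM is hour 0 and 12 PM is hour 12; otherwise PM adds 12
    if hour12 == 12:
        hour12 = 0
    if ampm == "PM":
        hour12 = hour12 + 12
    return hour12

assert to24h(12, "AM") == 0   # midnight
assert to24h(12, "PM") == 12  # noon
assert to24h(7, "PM") == 19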

My PDA/cellphone is a t-mobile sidekick. There's a very nice XMLRPC interface to the data, but it went read-only and it's not supported, so I use a ClientForm screen-scraping hack to upload flight schedules and such to my PDA. hipAgent.py (in my palmagent project) gets its data thru a simple command-line interface or thru tab-separated text files. I had a set of eventlines.n3 rules for reducing RDF calendar data to tab-separated format, but its timezone support is quirky and it doesn't handle multi-line descriptions. So I bit the bullet and integrated cwm's RDF reader, via the simple myStore API, into hipAgent.py. It was simple enough:


elif o in ("--importRDF",):
    import uripath, os
    from myStore import load  # http://www.w3.org/2000/10/swap/
    addr = uripath.join("file:" + os.getcwd() + "/", a)
    kb = load(addr)
    importTimedRDF(number, passwd, kb, icon)

...

from myStore import Namespace
RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#")
ICAL = Namespace('http://www.w3.org/2002/12/cal/icaltzd#')

for ev in kb.each(pred = RDF.type, obj = ICAL.Vevent):
    titl = unicode(kb.the(subj = ev, pred = ICAL.summary)).encode("utf-8")
    progress("== found event:", titl)

    when = str(kb.the(subj = ev, pred = ICAL.dtstart))
    dt = when[:10]   # YYYY-MM-DD
    ti = when[11:16] # HH:MM

    loc = kb.any(subj = ev, pred = ICAL.location)
    if loc: loc = unicode(loc).encode("utf-8")
    desc = kb.any(subj = ev, pred = ICAL.description)
    if desc: desc = unicode(desc).encode("utf-8")

    progress("a.addTimedEvent", dt, ti)
    a.addTimedEvent(dt, ti,
                    titl, desc,
                    'minutes', 60, #@@hardcoded reminder lead time
                    where=loc)

So I successfully loaded my son's soccer schedule into my sidekick PDA calendar:

  1. GET the schedule
  2. tidy it
  3. add hCalendar markup (vevent, description, summary, dtstart, location) using XSLT
  4. convert to RDF/XML using a GRDDL transform for hCalendar, glean-hcal.xsl
  5. load into sidekick using hipAgent.py

The folks running the DB-backed web site could add hCalendar markup with even less hassle than I did (though they might have to think a little bit to produce well-formed XHTML), at which point I could replace the first 4 steps with GRDDL (either via a remote service or by adding GRDDL support to hipAgent.py or to cwm's myStore.load() function).
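Meanwhile, the first four steps are scriptable as they stand. A sketch, assuming tidy and xsltproc are on the path; add-hcal.xsl is a stand-in name for my 91-line stylesheet, and the schedule URL is a stand-in too:

import os, urllib

SCHED = "http://example.org/league/schedule"  # stand-in URL
urllib.urlretrieve(SCHED, "sched.html")                              # 1. GET the schedule
os.system("tidy -asxhtml -numeric -o sched.xhtml sched.html")        # 2. tidy it
os.system("xsltproc -o sched-hcal.xhtml add-hcal.xsl sched.xhtml")   # 3. add hCalendar markup
os.system("xsltproc -o sched.rdf glean-hcal.xsl sched-hcal.xhtml")   # 4. convert to RDF/XML
# 5. load into the sidekick with hipAgent.py --importRDF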

no more life in a textarea: MozEx and emacs to the rescue!

Submitted by connolly on Tue, 2006-03-21 00:01.

I have been living in a textarea since I started this blog, always a little nervous, knowing that firefox doesn't know that integrity is job one. That is: Firefox doesn't guarantee to save all work, by default; I don't consider that a big bug; it's a browser, not an editor, after all. I outsourced bookmarking to delicious because that's knowledge capture, and I don't rely on my browser for that.

But as TimBL has been saying since at least as far back as his 1998 design issues note,

If you think surfing hypertext is cool, that's because you haven't tried writing it. If you have found your bookmarks/favorites have become a more and more important part of your life, that's because you have learned to put up with the simplest form of hypertext editing there is as a compromise.
