connolly's blog

time, context, quoting, and reification

Submitted by connolly on Mon, 2006-03-20 13:10.

The time and context permathread has heated up again. I used to argue with TimBL endlessly about time and context until I read Contexts: A Formalization and Some Applications, from Guha's page at Stanford.

geocoding and hCards for airports from wikipedia

Submitted by connolly on Sun, 2006-03-19 04:37.

Inspired by the SXSW hCard/google map mash-up, I'm geocoding some of my travel data.

In palmagent, I started on a module to get airport lat/long info from wikipedia; it grabs some other hCard info as it goes:

~/devcvs/2001/palmagent$ python aptdata.py LAX
getting list of airports containing LAX in http://en.wikipedia.org/wiki/List_of_airports_by_IATA_code:_L
looking for LAX in text
link path: /wiki/Los_Angeles_International_Airport
finding data in http://en.wikipedia.org/wiki/Los_Angeles_International_Airport
{'url': 'http://en.wikipedia.org/wiki/Los_Angeles_International_Airport', 'org': 'Los Angeles International Airport', 'nickname': 'LAX', 'geo': {'latitude': 33.942500000000003, 'longitude': -117.59194444444445}, 'tz': -8}

I don't have a kid template yet, but I hope you get the idea.
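In the meantime, here is a rough sketch of the sort of hCard+geo markup I have in mind for that record, as a plain Python function; the function name and the exact markup are illustrative guesses, not palmagent's actual output:

# Stand-in for the missing kid template: render the aptdata.py record
# above as an hCard with an embedded geo. Illustrative markup only.
def hcard(apt):
    return ('<div class="vcard">'
            '<a class="url org" href="%(url)s">%(org)s</a>'
            ' (<span class="nickname">%(nickname)s</span>)'
            ' <span class="geo">'
            '<abbr class="latitude" title="%(lat)s">lat</abbr>, '
            '<abbr class="longitude" title="%(lon)s">long</abbr></span>'
            ' <abbr class="tz" title="%(tz)s">tz</abbr>'
            '</div>') % dict(apt,
                             lat=apt['geo']['latitude'],
                             lon=apt['geo']['longitude'])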

Along the way, I found a wikipedia airport project and this really cool NAC coding system; basically, it's base-30 lat/long/altitude. Two 4-digit numerals get you down to a building, and with 5 digits, you get down to the square meter. Looks like a great GeoOnion technique.
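For the curious, here is a minimal sketch of the base-30 idea in Python; it assumes the standard NAC character set (ten digits plus twenty consonants) and a simple linear mapping of lat/long, and it ignores altitude and the spec's rounding details:

# Minimal sketch of the NAC idea: encode lat/long as base-30 strings.
NAC_CHARS = "0123456789BCDFGHJKLMNPQRSTVWXZ"

def nac(lat, lon, digits=5):
    def encode(frac):
        out = []
        for _ in range(digits):
            frac *= 30
            d = int(frac)
            out.append(NAC_CHARS[min(d, 29)])
            frac -= d
        return ''.join(out)
    x = (lon + 180.0) / 360.0   # fraction of the way around, west to east
    y = (lat + 90.0) / 180.0    # fraction of the way up, south to north
    return encode(x), encode(y)

# e.g. nac(33.9425, -117.5919) for the LAX record above;
# 4 digits gets you roughly to a building, 5 roughly to a square meter.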


using JSON and templates to produce microformat data

Submitted by connolly on Sun, 2006-03-19 04:09.

In Getting my Personal Finance data back with hCalendar and hCard, I discussed using JSON-style records as an intermediate structure between tab-separated transaction report data and hCalendar. I just took it a step further in palmagent; hipsrv.py uses kid templates, so the markup can be tweaked independently of the normalization and SPARQL-like filtering logic. I expect to be able to do RDF/XML output thru templates too.

Working at the JSON level is nice; when I want to make a list of 3 numbers, I can just do that, unlike in XML where I have to make up names and think about whether to use a space-separated microparsed attribute value or a massively redundant element structure.
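For instance (an illustrative record, not hipsrv.py's actual schema):

# Illustrative only: the sort of record I mean, as a JSON-style structure.
trx = {"date": "2000-09-20",
       "payee": "SEPTEMBERS STEAKHOUSE",
       "splits": [29.33, 603.96, 334.45]}   # just a list of 3 numbers

# In XML, the same list forces a choice between a space-separated,
# microparsed attribute value ...
#     <trx splits="29.33 603.96 334.45"/>
# ... and a massively redundant element structure:
#     <trx><split>29.33</split><split>603.96</split><split>334.45</split></trx>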

It brings me back to my March 1997 essay for the Web Apps issue on Distributed Objects and noodling on VHLL types in ILU.

a quick take on Kiko, a nifty looking calendar service

Submitted by connolly on Sun, 2006-03-19 03:05.

I just tried out Kiko, an ajax calendar service. It looks kinda cool, but it failed to import my calendar data ("invalid file") and contacts (it spun forever). Update: they fixed the problems I reported the same day, and they seem to have most of the features I want planned. See comments for details. Hmm... reminders by IM are kinda interesting. But I'm not sure I could rely on a remote service for that; it's critical that I be able to create/change appointments and get reminders in the middle of the day, even if I have no GPRS. I reported the import problems and sent a feature request for subscribe. hCalendar/hCard support would be cool too.

A look at emerging Web security architectures from a Semantic Web perspective

Submitted by connolly on Fri, 2006-03-17 17:51.

W3C had a workshop, Toward a more Secure Web, this week. Citigroup hosted; the view from the 50th floor was awesome.

Some notes on the workshop are taking shape:

A look at emerging Web security architectures from a Semantic Web perspective

Comparing OpenID, SXIP/DIX, InfoCard, SAML to RDF, GRDDL, FOAF, P3P, XFN and hCard

At the W3C security workshop this week, I finally got to study SXIP in some detail after hearing about it and wondering how it compares to OpenID, Yadis, and the other "Identity 2.0" techniques brewing. And just in time, with a DIX/SXIP BOF at the Dallas IETF next week.

Getting my Personal Finance data back with hCalendar and hCard

Submitted by connolly on Wed, 2006-03-08 19:25.

The Quicken Interchange Format (QIF) is notoriously inadequate for clean import/export. The instructions for migrating Quicken data across platforms say:

  1. From the old platform, dump it out as QIF
  2. On the new platform, read in the QIF data
  3. After importing the file, verify that account balances in your new Quicken for Mac 2004 data file are the same as those in Quicken for Windows. If they don't match, look for duplicate or missing transactions.

I have not migrated my data from Windows 98 to OS X because of this mess. I use win4lin on my Debian Linux box as life-support for Quicken 2001.

Meanwhile, Quicken supports printing any report to a tab-separated file, and I found that an exhaustive transaction report represents transfers unambiguously. Since October 2000, when my testing showed that I could re-create various balances and reports from these tab-separated reports, I have been maintaining a CVS history of my exported Quicken data, splitting it every few years:

$ wc *qtrx.txt
   4785   38141  276520 1990-1996qtrx.txt
   6193   61973  432107 1997-1999qtrx.txt
   4307   46419  335592 2000qtrx.txt
   5063   54562  396610 2002qtrx.txt
   5748   59941  437710 2004qtrx.txt
  26096  261036 1878539 total

I started a little module on dev.w3.org... I currently call it Quacken, but I think I'm going to have to rename it for trademark reasons. I started with normalizeQData.py to load the data into PostgreSQL for use with saCASH, but then saCASH went Java/Struts and all, way before Debian supported Java well enough for me to follow along. Without a way to run them in parallel and sync back and forth, it was a losing proposition anyway.

Then I managed to export the data to the web by first converting it to RDF/XML:

qtrx93.rdf: $(TXTFILES)
	$(PYTHON) $(QUACKEN)/grokTrx.py $(TXTFILES) >$@

... and then using searchTrx.xsl, inside a trivial CGI script, to put up a search form, look for the relevant transactions, and return them as XHTML. I have done a few other reports with XSLT; nothing remarkable, but enough that I'm pretty confident I could reproduce all the reports I use from Quicken. But the auto-fill feature is critical, and I didn't see a way to do that.
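The CGI wrapper really is trivial; a sketch of the shape, assuming xsltproc is installed (the "payee" query parameter is hypothetical; searchTrx.xsl and qtrx93.rdf are the files named above, and the form-rendering half is omitted):

#!/usr/bin/env python
# Sketch only: run the XSLT search over the exported RDF/XML and stream
# the XHTML result back to the browser.
import cgi, subprocess, sys

form = cgi.FieldStorage()
payee = form.getfirst("payee", "")          # hypothetical search parameter

sys.stdout.write("Content-Type: text/html\r\n\r\n")
sys.stdout.flush()
subprocess.call(["xsltproc",
                 "--stringparam", "payee", payee,
                 "searchTrx.xsl", "qtrx93.rdf"])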

Then came google suggest and ajax. I'd really like to do an ajax version of Quicken.

I switched the data from CVS to mercurial a few months ago, carrying the history over. I seem to have 189 commits/changesets, of which 154 are on the qtrx files (others are on the makefile and related scripts). So that's about one commit every two weeks.

Mercurial makes it easy to keep the whole 10 year data set, with all the history, in sync on several different computers. So I had it all with me on the flight home from the W3C Tech Plenary in France, where we did a microformats panel. Say... transactions are events, right? And payee info is kinda like hCard...

So I factored out the parts of grokTrx.py that do the TSV file handling (trxtsv.py) and wrote an hCalendar output module (trxht.py).
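The heart of trxtsv.py is an iterator over transactions. Here is a rough sketch of the eachTrx idea; the column layout (date, payee, memo, category/class, amount) is assumed for illustration, and the real report has more fields:

# Sketch: iterate over a tab-separated Quicken transaction report,
# yielding one dict per transaction together with its splits.
def eachTrx(lines):
    trx = None
    for line in lines:
        fields = (line.rstrip('\n').split('\t') + [''] * 5)[:5]
        date, payee, memo, cat, amount = fields
        if date:                       # a nonempty date starts a new transaction
            if trx:
                yield trx
            trx = {'date': date, 'payee': payee, 'splits': []}
        if trx and amount:             # every line with an amount is a split
            trx['splits'].append({'memo': memo, 'cat': cat, 'amount': amount})
    if trx:
        yield trx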

I also added some SPARQL-ish filtering, so you can do:

 python trxht.py --account 'MIT 2000' --class 200009xml-ny  2000qtrx.txt

And get a little microformat expense report:

9/20/00  SEPTEMBERS STEAKHOUSE ELMSFORD NY      MIT 2000
         19:19                                c  [Citi Visa HI]/200009xml-ny       29.33
9/22/00  RAMADA INNS ELMSFORD GR ELMSFORD NY    MIT 2000
         3 nights                             c  [Citi Visa HI]/200009xml-ny      603.96
9/24/00  AVIS RENT-A-CAR 1 WHITE PLAINS NY      MIT 2000
                                              c  [Citi Visa HI]/200009xml-ny      334.45
1/16/01  MIT                                    MIT 2000
         MIT check # 20157686 dated 12/28/00  c  [Intrust Checking]/200009xml-ny  -967.74
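Under the hood, the SPARQL-ish filtering amounts to something like this; a sketch using the field names from the eachTrx sketch above, not the actual ClassFilter code in trxht.py:

# Sketch: keep a transaction if any of its splits mentions the requested
# class; compose any number of such filters with all().
def classFilter(cls):
    return lambda trx: any(cls in s['cat'] for s in trx['splits'])

def filtered(transactions, filters):
    return (t for t in transactions if all(f(t) for f in filters))

# e.g. filtered(eachTrx(open('2000qtrx.txt')), [classFilter('200009xml-ny')])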

Mercurial totally revolutionizes coding on a plane. There's no way I would have been as productive if I couldn't commit and diff and such right there on the plane. I'm back to using CVS for the project now, in order to share it over the net, since I don't have mercurial hosting figured out just yet. But here's the log of what I did on the plane:

changeset:   19:d1981dd8e140
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 20:48:44 2006 -0600
summary: playing around with places

changeset: 18:9d2f0073853b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:21:35 2006 -0600
summary: fixed filter arg reporting

changeset: 17:3993a333747b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:10:10 2006 -0600
summary: more dict work; filters working

changeset: 16:59234a4caeae
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:30:28 2006 -0600
summary: moved trx structure to dict

changeset: 15:425aab9bcc52
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 20:57:17 2006 +0100
summary: vcards for payess with phone numbers, states

changeset: 14:cbd30e67647a
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 19:12:38 2006 +0100
summary: filter by trx acct

changeset: 13:9a2b49bc3303
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:45:06 2006 +0100
summary: explain the filter in the report

changeset: 12:2ea13bafc379
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:36:09 2006 +0100
summary: class filtering option

changeset: 11:a8f550c8759b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:24:45 2006 +0100
summary: filtering in eachFile; ClassFilter

changeset: 10:acac37293fdd
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:53:18 2006 +0100
summary: moved trx/splits fixing into eachTrx in the course of documenting trxtsv.py

changeset: 9:5226429e9ef6
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:28:01 2006 +0100
summary: clarify eachTrx with another test

changeset: 8:afd14f2aa895
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:19:36 2006 +0100
summary: replaced fp style grokTransactions with iter style eachTrx

changeset: 7:eb020cda1e67
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:16:43 2006 +0100
summary: move isoDate down with field routines

changeset: 6:123f66ac79ed
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:14:45 2006 +0100
summary: tweak docs; noodle on CVS/hg scm stuff

changeset: 5:4f7ca3041f9a
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:04:07 2006 +0100
summary: split trxtsv and trxht out of grokTrx

changeset: 4:95366c104b42
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 14:48:04 2006 +0100
summary: idea dump

changeset: 3:62057f582298
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:55:48 2006 +0100
summary: handle S in num field

changeset: 2:0c23921d0dd3
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:38:54 2006 +0100
summary: keep tables bounded; even/odd days

changeset: 1:031b9758304c
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:19:05 2006 +0100
summary: table formatting. time to land

changeset: 0:2d515c48130b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 07:55:58 2006 +0100
summary: working on plane

I used doctest unit testing quite a bit, and rst for documentation.
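Here is the doctest pattern I mean, using isoDate (a function named in the changeset log above) as the example; the signature and body here are my guess, not the actual trxtsv.py code:

# Sketch: the tests live in the docstring and run via doctest.testmod().
def isoDate(m, d, y):
    """Normalize a Quicken-style m/d/yy date to ISO 8601.

    >>> isoDate('9', '20', '00')
    '2000-09-20'
    >>> isoDate('1', '16', '01')
    '2001-01-16'
    """
    yy = int(y)
    year = 1900 + yy if yy >= 70 else 2000 + yy
    return "%04d-%02d-%02d" % (year, int(m), int(d))

if __name__ == '__main__':
    import doctest
    doctest.testmod()

And here is the rst doc for trxht, so far: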

trxht -- format personal finance transactions as hCalendar

Usage

Run a transaction report over all of your data in some date range and print it to a tab-separated file, say, 2004qtrx.txt. Then invoke a la:

$ python trxht.py 2004qtrx.txt  >,x.html
$ xmlwf ,x.html
$ firefox ,x.html

You can give multiple files, as long as the ending balance of one matches the starting balance of the next:

$ python trxht.py 2002qtrx.txt 2004qtrx.txt  >,x.html

Support for SPARQL-style filtering is in progress. Try:

$ python trxht.py --class myclass myqtrx.txt  >myclass-transactions.html

to simulate:

describe ?TRX where { ?TRX qt:split [ qs:class "9912mit-misc"] }.

Future Work

  • add hCards for payees (in progress)
  • pick out phone numbers, city/state names
  • support a form of payee smushing on label
  • make URIs for accounts, categories, classes, payees
  • support round-trip with QIF; sync back up with RDF export work in grokTrx.py
  • move the quacken project to mercurial
  • proxy via dig.csail.mit.edu or w3.org? both?
  • run hg serve on homer? swada? login.csail?
  • publish hg log stuff in a _scm/ subpath; serve the current version at the top

Dates in drupal vs planetrdf

Submitted by connolly on Wed, 2006-03-08 11:13.

I fixed some markup errors in various of my posts last night. They all showed up on planetrdf again.

Is drupal doing something buggy? Or is it pilot error? i.e. is there some way for me to tell drupal to change the updated date but not the... umm... whatever date planetrdf uses?

Getting (dis)organized for SxSWi in Austin

Submitted by connolly on Tue, 2006-03-07 20:49.

SxSWi looks to be quite the PathCross: The microformats panel on Monday is what put the conference on my radar this year, but it's just one of dozens of panels that I really want to see. It's overwhelming. Of course, that's part of the appeal of the Austin/SXSW scene: creative chaos. As a student, my creed was "never plan more than 15 minutes ahead." Life was much simpler, in many ways, back then.

Other stuff I'm looking forward to:

I'm driving down with Mary and the boys, stopping to visit folks here and there.

And on Tuesday night, my itinerary takes me to New York for the W3C workshop on usability and authorization.

Reflections on the W3C Technical Plenary week

Submitted by connolly on Tue, 2006-03-07 20:31.

[Photo: "Here comes (some of) the TAG", originally uploaded by Norm Walsh]

The last item on the agenda of the TAG meeting in France was "Reviewing what we have learned during a full week of meetings". I proposed that we do it on the beach, and it carried.

By then, the network woes of Monday and Tuesday had largely faded from memory.

I was on two of the plenary day panels. Tantek reports on one of them: "Microformats voted best session at W3C Technical Plenary Day!" My presentation in that panel was on GRDDL and microformats. Jim Melton followed with his SPARQL/SQL/XQuery talk. Between the two of them, Noah Mendelsohn said he thought the Semantic Web might just be turning a corner.

My other panel presentation was Feedback loops and formal systems, where I talked about UML and OWL after touching on the contrast between symbolic approaches like the Semantic Web and statistical approaches like pagerank. Folksonomies are an interesting mixture of both, I suppose. Alistair took me to task for being sloppy with the term "chaotic system"; he's quite right that "complex system" is the more appropriate description of the Web.

The TAG discussion of that session started with jokes about how formal systems are soporific enough without putting them right after a big French lunch. TimBL mentioned the Scheme denotational semantics, and TV said that Jonathan Rees is now at Creative Commons. News to me. I spent many, many hours poring over his scheme48 code a few years back. I don't think I knew where the name came from until today: "Within 48 hours we had designed and implemented a working Scheme, including read, write, byte code compiler, byte code interpreter, garbage collector, and initialization logic."

The SemWeb IG meeting on Thursday was full of fun lightning talks and cool hacks. I led a GRDDL discussion that went well, I think. The SPARQL calendar demo rocked. Great last-minute coding, Lee and Elias!

There and back again

On the return leg of my itinerary, the captain announced the cruising altitude, as usual, and then added "... which means you'll spend most of today 6 miles above the earth."

My travel checklist worked pretty well, with a few exceptions. The postcard thing isn't a habit yet. I forgot a paperback book; that was OK since I slept quite a bit on the way over and got into the coding zone on the way back; more about that later, I hope.

Other Reflections

See also reflections by:

... and stay tuned for something from

See also: Flickr photo group, NCE bookmarks

Toward Semantic Web data from Wikipedia

Submitted by connolly on Tue, 2006-03-07 17:23.

When I heard about Wikimania 2006 in August in Boston, I put it on my travel schedule, at least tentatively.

Then I had an idea...

Wikipedia:Infobox where the data lives in wikipedia. sparql, anyone? or grddl?

my bookmarks, 2006-02-16

Then I put the idea in a wishlist slide in my presentation on microformats and GRDDL at the W3C technical plenary last week.

The next day, in the SemWeb IG meeting, I met Markus Krötzsch and at lunch I learned he's working on Semantic MediaWiki, a project to do just what I'm hoping for. From our discussion, I think this could work out really well.

For reference, he's 3rd from the left in a photo from wikimania 2005.

I use wikipedia quite regularly to look up airport codes, latitudes, longitudes, lists of postal codes, and the like; boy would I love to have it all in RDF... maybe using GRDDL on the individual pages, maybe a SPARQL interface from their DB... maybe both.

Hmm... the RDF export of their San Diego demo page seems to conflate pages with topics of pages. I guess I should file a bug.
