Getting my Personal Finance data back with hCalendar and hCard

Submitted by connolly on Wed, 2006-03-08 19:25. :: | |

The Quicken Interchange Format (QIF) is notoriously inadequate for clean import/export. The instructions for migrating Quicken data across platforms say:

  1. From the old platform, dump it out as QIF
  2. On the new platform, read in the QIF data
  3. After importing the file, verify that account balances in your new Quicken for Mac 2004 data file are the same as those in Quicken for Windows. If they don't match, look for duplicate or missing transactions.

I have not migrated my data from Windows98 to OS X because of this mess. I use win4lin on my debian linux box as life-support for Quicken 2001.

Meanwhile, Quicken supports printing any report to a tab-separated file, and I found that an exhaustive transaction report represents transfers unambiguously. Since October 2000, when my testing showed that I could re-create various balances and reports from these tab-separated reports, I have been maintaining a CVS history of my exported Quicken data, splitting it every few years:

   $ wc *qtrx.txt
4785 38141 276520 1990-1996qtrx.txt
6193 61973 432107 1997-1999qtrx.txt
4307 46419 335592 2000qtrx.txt
5063 54562 396610 2002qtrx.txt
5748 59941 437710 2004qtrx.txt
26096 261036 1878539 total

I started a little module on dev.w3.org... I call it Quacken currently, but I think I'm going to have to rename it for trademark reasons. I started with normalizeQData.py to load the data into postgress for use with saCASH, but then saCASH went Java/Struts and all way before debian supported Java well enough for me to follow along. Without a way to run them in parallel and sync back and forth, it was a losing proposition anyway.

Then I managed to export the data to the web by first converting it to RDF/XML:

qtrx93.rdf: $(TXTFILES)
$(PYTHON) $(QUACKEN)/grokTrx.py $(TXTFILES) >$@

... and then using searchTrx.xsl (inside a trivial CGI script) that puts up a search form, looks for the relevant transactions, and returns them as XHTML. I have done a few other reports with XSLT; nothing remarkable, but enough that I'm pretty confident I could reproduce all the reports I use from Quicken. But the auto-fill feature is critical, and I didn't see a way to do that.

Then came google suggest and ajax. I'd really like to do an ajax version of Quicken.

I switched the data from CVS to mercurial a few months ago, carrying the history over. I seem to have 189 commits/changesets, of which 154 are on the qtrx files (others are on the makefile and related scripts). So that's about one commit every two weeks.

Mercurial makes it easy to keep the whole 10 year data set, with all the history, in sync on several different computers. So I had it all with me on the flight home from the W3C Tech Plenary in France, where we did a microformats panel. Say... transactions are events, right? And payee info is kinda like hCard...

So factored out the parts of grokTrx.py that do the TSV file handling (trxtsv.py) and wrote an hCalendar output module (trxht.py).

I also added some SPARQL-ish filtering, so you can do:

 python trxht.py --account 'MIT 2000' --class 200009xml-ny  2000qtrx.txt

And get a little microformat expense report:

9/20/00 SEPTEMBERS STEAKHOUSE ELMSFORD NY  MIT 2000
 19:19c[Citi Visa HI]/200009xml-ny29.33
9/22/00 RAMADA INNS ELMSFORD GR ELMSFORD NY  MIT 2000
 3 nightsc[Citi Visa HI]/200009xml-ny603.96
9/24/00 AVIS RENT-A-CAR 1 WHITE PLAINS NY  MIT 2000
  c[Citi Visa HI]/200009xml-ny334.45
1/16/01 MIT  MIT 2000
 MIT check # 20157686 dated 12/28/00c[Intrust Checking]/200009xml-ny-967.74

Mercurial totally revolutionizes coding on a plane. There's no way I would have been as productive if I couldn't commit and diff and such right there on the plane. I'm back to using CVS for the project now, in order to share it over the net, since I don't have mercurial hosting figured out just yet. But here's the log of what I did on the plane:

changeset:   19:d1981dd8e140
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 20:48:44 2006 -0600
summary: playing around with places

changeset: 18:9d2f0073853b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:21:35 2006 -0600
summary: fixed filter arg reporting

changeset: 17:3993a333747b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:10:10 2006 -0600
summary: more dict work; filters working

changeset: 16:59234a4caeae
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:30:28 2006 -0600
summary: moved trx structure to dict

changeset: 15:425aab9bcc52
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 20:57:17 2006 +0100
summary: vcards for payess with phone numbers, states

changeset: 14:cbd30e67647a
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 19:12:38 2006 +0100
summary: filter by trx acct

changeset: 13:9a2b49bc3303
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:45:06 2006 +0100
summary: explain the filter in the report

changeset: 12:2ea13bafc379
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:36:09 2006 +0100
summary: class filtering option

changeset: 11:a8f550c8759b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 18:24:45 2006 +0100
summary: filtering in eachFile; ClassFilter

changeset: 10:acac37293fdd
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:53:18 2006 +0100
summary: moved trx/splits fixing into eachTrx in the course of documenting trxtsv.py

changeset: 9:5226429e9ef6
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:28:01 2006 +0100
summary: clarify eachTrx with another test

changeset: 8:afd14f2aa895
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 17:19:36 2006 +0100
summary: replaced fp style grokTransactions with iter style eachTrx

changeset: 7:eb020cda1e67
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:16:43 2006 +0100
summary: move isoDate down with field routines

changeset: 6:123f66ac79ed
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:14:45 2006 +0100
summary: tweak docs; noodle on CVS/hg scm stuff

changeset: 5:4f7ca3041f9a
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 16:04:07 2006 +0100
summary: split trxtsv and trxht out of grokTrx

changeset: 4:95366c104b42
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 14:48:04 2006 +0100
summary: idea dump

changeset: 3:62057f582298
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:55:48 2006 +0100
summary: handle S in num field

changeset: 2:0c23921d0dd3
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:38:54 2006 +0100
summary: keep tables bounded; even/odd days

changeset: 1:031b9758304c
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 09:19:05 2006 +0100
summary: table formatting. time to land

changeset: 0:2d515c48130b
user: Dan Connolly <connolly@w3.org>
date: Sat Mar 4 07:55:58 2006 +0100
summary: working on plane

I used doctest unit testing quite a bit, and rst for documentation:

trxht -- format personal finance transactions as hCalendar

Usage

Run a transaction report over all of your data in some date range and print it to a tab-separated file, say, 2004qtrx.txt. Then invoke a la:

$ python trxht.py 2004qtrx.txt  >,x.html
$ xmlwf ,x.html
$ firefox ,x.html

You can give multiple files, as long as the ending balance of one matches the starting balance of the next:

$ python trxht.py 2002qtrx.txt 2004qtrx.txt  >,x.html

Support for SPARQL-style filtering is in progress. Try:

$ python trxht.py --class myclass myqtrx.txt  >myclass-transactions.html

to simulate:

describe ?TRX where { ?TRX qt:split [ qs:class "9912mit-misc"] }.

Future Work

  • add hCards for payees (in progress)
  • pick out phone numbers, city/state names
  • support a form of payee smushing on label
  • make URIs for accounts, categories, classses, payees
  • support round-trip with QIF; sync back up with RDF export work in grokTrx.py
  • move the quacken project to mercurial
  • proxy via dig.csail.mit.edu or w3.org? both?
  • run hg serve on homer? swada? login.csail?
  • publish hg log stuff in a _scm/ subpath; serve the current version at the top