python, javascript, and PHP, oh my!

Submitted by connolly on Tue, 2006-02-07 11:28.

My habits for developing quality code in python are bumping up against the fact that the deployment platforms for the web client and server are javascript and PHP, respectively.

I love the python doctest module. The following is a pretty good example of how I like to use it to simultaneously document and test code:

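def splitFrag(uriref):
    """split a URI reference between the fragment and the rest.

    Punctuation is thrown away.

    >>> splitFrag("abc#def")
    ('abc', 'def')
    """
    # body not shown in the excerpt; a minimal version consistent with the doctest
    i = uriref.rfind("#")
    if i >= 0:
        return uriref[:i], uriref[i+1:]
    return uriref, None

def splitFragP(uriref, punct=0):
    """split a URI reference before the fragment

    Punctuation is kept.

    >>> splitFragP("abc#def")
    ('abc', '#def')
    """
    # body not shown in the excerpt; a minimal version consistent with the doctest
    i = uriref.rfind("#")
    if i >= 0:
        return uriref[:i], uriref[i:]
    return uriref, ''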
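For readers who haven't used doctest, here is a minimal sketch of how examples like these get checked. The names split_frag and run_examples are made up for illustration, and the function body is an assumption written only to satisfy its docstring; the doctest calls themselves are from the standard library.

```python
import doctest


def split_frag(uriref):
    """Split a URI reference between the fragment and the rest.

    >>> split_frag("abc#def")
    ('abc', 'def')
    >>> split_frag("abcdef")
    ('abcdef', None)
    """
    # Hypothetical body, written only to satisfy the examples above.
    i = uriref.rfind("#")
    if i >= 0:
        return uriref[:i], uriref[i + 1:]
    return uriref, None


def run_examples():
    """Run the docstring examples of split_frag; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    tests = finder.find(split_frag, "split_frag",
                        globs={"split_frag": split_frag})
    for test in tests:
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted


if __name__ == "__main__":
    print(run_examples())
```

In everyday use the usual shortcut is a doctest.testmod() call at the bottom of the module; the finder and runner are spelled out here only so the counts are visible.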

Another important way that python is self-documenting is that it meets the unambiguity requirement: you can pick up any .py file and trace every identifier back to what it refers to by following your nose:

  • for local variables, normal static scoping rules work; just scan up and look for an assignment or a function parameter
  • for imported names, find the relevant import statement. from foo import * is evil, of course.
  • global variables are explicitly declared as such

OK, full disclosure: you need to know the python built-ins, and when you see paramx.methody(z), you have an unbounded search for methody, which makes doctests that show what class paramx comes from pretty important. Mapping from the relevant import statement to the corresponding .py file may involve the usual search path nightmares; python doesn't solve that. redfoot's red_import is interesting. And I'm not sure if eggs are a step in the right direction or the wrong direction; gotta study them more. I try to ground import statements in the web a la:

import MySQLdb # MySQL for Python
               # any Python Database API-happy implementation will do.

... so that you can follow your nose from the ImportError traceback to resolve the dependency.
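One way to make that pointer survive into the traceback itself, rather than living only in a comment, might look like the sketch below. The helper name import_or_explain and the idea of pairing a module name with a home page are assumptions for illustration, not an established convention.

```python
import importlib


def import_or_explain(name, homepage):
    """Import module `name`; on failure, re-raise with a pointer to its home page.

    Hypothetical helper: the (name, homepage) pairing is an illustrative
    assumption, not part of any library.
    """
    try:
        return importlib.import_module(name)
    except ImportError as err:
        # Keep the original error chained so nothing is hidden.
        raise ImportError(
            "%s is required; see %s" % (name, homepage)
        ) from err


if __name__ == "__main__":
    json = import_or_explain(
        "json", "https://docs.python.org/3/library/json.html")
    print(json.dumps({"ok": True}))
```

The cost is one more function call per import; the comment style above is lighter and probably good enough when people read the source anyway.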

Now timbl has started migrating the swap/cwm stuff to javascript. Let's look at uri.js:

/** return the protocol of a uri **/
function uri_protocol(uri) {

Thanks for trying to document each function, but that sort of comment isn't worthwhile; the risk that it'll get out of sync with the code is greater than the information it provides. Back to naming...

function URIjoin(given, base) {
	if (base<0) {alert("Invalid base URL "+ base); return given}

Where does alert come from? Is that in the ECMAScript standard? Or in some Netscape devedge stuff?

But more importantly: why not raise an exception? Javascript does have throw/catch, no? Is it not the norm to use them? As I argued in my contribution to the Python, Tcl and Perl, oh my! thread in 1996, the community norms are at least as important, if not more important, than the language features, when it comes to developing quality code.
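For comparison, here is the norm in question sketched in python rather than javascript. The function and exception names are hypothetical and this is not a port of uri.js; the point is only that the caller gets to decide what to do about a bad base.

```python
class InvalidBaseURI(ValueError):
    """Raised when a base URI has no scheme to resolve against."""


def uri_scheme(uri):
    """Return the scheme of an absolute URI, or raise InvalidBaseURI.

    >>> uri_scheme("http://example.org/a#b")
    'http'
    """
    colon = uri.find(":")
    if colon < 0:
        # Report the problem to the caller instead of popping up a
        # dialog and carrying on with a guess.
        raise InvalidBaseURI("Invalid base URI: %r" % uri)
    return uri[:colon]


if __name__ == "__main__":
    try:
        uri_scheme("no-scheme-here")
    except InvalidBaseURI as err:
        print("caller handles it:", err)
```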

I keep running into javascript and PHP code that I want to read and wishing for doctests because I can't figure out which end is up.

Whence comes kb in this bit from term.js? Do I face an unbounded search?

RDFFormula.prototype.fromNT = function(str) {
	var len = str.length
	var x
	if (str[0] == '<') return kb.sym(str.slice(1,len-1))

Maybe I just need to study the standard libraries a bit more, but I hear that the drupal project has coding conventions lots of people like for developing quality PHP code; I hope to study those. And the PEAR community must have some norms and best practices. I went looking for javascript testing stuff and ran into JSAN, a CPAN work-alike. That sort of infrastructure naturally reinforces quality norms.


Arpeggio in D, a little three chord ditty

Submitted by connolly on Mon, 2006-01-09 17:58.

I ran across Ping on improvisation the other day. It seems he learned to play mostly from sheet music. I mostly learned by playing in a music group at church. I learned a few chords in a classroom setting, but mostly, I would sit down in the church group and George or Rudy would lead the group and I would try my best to follow. My crowning achievement was one day when neither of them was available; it was just me and a gal on flute, and we pulled it off. Unlike my friends with real talent who learned to play better than I ever will in their first year, it took me at least five years to achieve that level of competence on folk guitar. I need to hear the song and see the chords before I can play it. My ear training is proceeding very, very slowly; it took me years to learn to tune my own guitar.

I have picked up several guitars over the years, but it was years of wishing before we got a piano for our house this year. It was out of tune enough that I could tell, and I had to leave it that way for a month while it settled. On the day of the tuning appointment, I was tidying up the piano room a bit and I couldn't help but sit down and plunk around a bit. The piano tuner came in and asked if I was the piano player in the house; I said no, not really; my son was taking lessons; I just fake it, using my basic three-chord guitar sense. I was relieved that he didn't sneer at this approach, but rather agreed that they should teach chord progressions and the like to beginning piano players. In Ray, there's a flashback that shows him learning more that way.

Anyway, I found Ping's piece on music just as I was running out of steam for technical work, so I headed down to the piano and worked on a few of the easier pieces of my guitar music. I goofed or got frustrated with one or something... and then I wandered into this 1-5-1 bass arpeggio* thing in D... first just I/IV I/IV... then before long the V chord (A) shows up... and after that got monotonous, a Bm bridge showed up. And then I could hear a melody in my head. I can't play well enough to do both the melody and the bass line at the same time, but going back and forth, I sorta worked it out: a bit of sheet music

On the one hand, it's so simple that it's sort of embarrassing to call it an original composition. But it's not every day that my muse visits me this way, and I'm so in the habit of sharing in the Web that I started thinking about all the issues around music markup in the Web.

I'm not talking about mp3 vs ogg; I'm talking about sharing something editable:

There are very few data formats I trust... when I use the computer to capture my knowledge, I pretty much stick to plain text, XML (esp XHTML, or at least HTML that tidy can turn into XHTML for me), RCS/CVS, and RFC822/MIME.

I use JPG, PNG, and PDF if I must, but not for capturing knowledge for exchange, revision, etc.

GarageBand is a blast; I'm really afraid of becoming addicted to it and locking up all my music in there. Version 2 has support for western music notation. Plus, it lets you record tracks separately and mix them. So I gave that a whirl; you can listen to arpeg-d.mpg, mistakes and all; but there doesn't seem to be any way to get the music notation out of GarageBand. "The extraction of data created in GarageBand does not appear to be an easy task" -- Dent du Midi FAQ.

This is not the first time I have been in this position; I wrote a few songs in college and transcribed them on my Macintosh SE circa 1988. When I recovered the files a couple years ago, I searched for a more modern format and found ABC music notation that is editable and convertible to postscript sheet music and MIDI; fortunately, the Studio Session format documentation survived and I could write a python ditty to convert my data.

So tonight I captured Arpeggio in D for sharing:

... and a makefile to tie them together. I haven't really decided on the melody for the bridge, and abc2midi doesn't grok the bass clef extension so the bass should sound two octaves lower. But there you have it.

* reading up on musical terminology, I see that I'm perhaps misusing the term arpeggio; it's really a broken chord.

see also: advogato item, notes on debian linux and music tools

On Google, Jabber, and Jingle and good and evil in IM and IP networks

Submitted by connolly on Tue, 2006-01-03 16:32.

The 14 December jingle announcement gives a hint into google's approach to adding voice to their Google Talk offering. Actually, it gives quite a bit more than a hint; it comes with a jingle spec and an open source library implementation.

Google Talk has had pretty good "do no evil" karma since it started. The dominant commercial IM services (AOL/Yahoo/Microsoft) are each a world unto themselves. Your AIM screen name is just jim47 or whatever, not like an email address, and while clients like trillian and gaim can connect to them all, that's not something the big three encourage. Google Talk uses gmail addresses and the Jabber/XMPP protocol, which has the same network topology as email. While google isn't opening their service to actual server-to-server federation until they get a better handle on some operational issues (think: spam), they are using open protocols and they actively support gaim development.

Apple's iChat uses Jabber at some level too, though I haven't worked out the interoperability issues in practice. I think the last time I tried was before the Tiger release of OS X, when the Jabber support was much more under-the-covers.

The popularity of multi-protocol clients like gaim and trillian surprises me: after all, you can't have one chat room with AIM and MSN messenger users connected. Evidently this is just not a big deal. "IRC and instant messaging are very different paradigms," says the Adium X: IRC Howto. I guess I'm just too old school to get it; in the internet relay chat usage that I'm used to, channels (aka chat rooms) are the norm and private channels are the exception. I gather IM is the other way around. I have played with Jabber's support for bridging to other networks, but I have yet to find a reliable combination of:

  • a jabber client with bridging support that I can figure out how to use
  • and either
    • server software with bridging support that I can figure out how to use, or
    • an existing service with bridging support that I can use and trust (since my credentials pass thru their service)

The Jabber protocol has lots of pieces and extensions and such. There's a whole JEP process, in addition to the XMPP process where jabber technology feeds into the IETF. I don't quite have my head around the whole thing. I discovered that there are older and newer protocols for doing chat rooms in jabber that don't mix well. I wonder which of them, if either, the IETF has endorsed. An XMPP summary shows JEP-0045 for Multi-User Chat but no RFC. And I don't see XMPP among IETF Working Groups any more. I wonder what's up. The xmppwg mailing list archives show pretty recent activity.

The $2.6Bn acquisition of skype by eBay shows the value of networks of IM and voice users. Skype has a novel topology that comes from the same p2p designers that did Kazaa. As I understand it, they mostly use the p2p network for firewall traversal, which is the biggest problem, in practice, with deploying consumer voice chat. They keep the protocol details to themselves, though, and they have the only implementation, as a consequence. They have a centralized user directory too.

In my visit to the 62nd IETF in Minneapolis, MN, I learned what a sore spot firewall traversal is in Internet standardization. "Just use IPv6 and don't waste your time with those kludges" goes the one side; "but NAT works today" goes the other. Ugh. And since W3C started working more actively with developing countries, I hear more about the political aspects of IPv6. In the 1st world, we can dismiss claims that IPv4 addresses are running out as technically overblown, since we can afford to pay for the management fees and the NAT boxes. But the scarcity is a real economic issue in the developing world; plus it concentrates power in a way that engenders distrust.

Back to network topologies... the fact that Jabber has the same topology as conventional (SMTP) email means that it's subject to the same sorts of spam issues. I wonder if anybody has considered the IM2000 approach of redesigning the mail system as a pull delivery rather than as a push delivery system, so that recipients no longer bear the costs of receiving and storing unwanted messages. In an IM2000 world, senders have to hold still long enough to deliver a message, which makes it much easier to hold them accountable for nastiness.
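To make the push/pull distinction concrete, here is a toy in-memory model of pull delivery. It is a sketch under loud assumptions: the class and method names are invented for illustration, nothing here is taken from the IM2000 design documents, and there is no networking at all. What it shows is only where the storage cost lands: the body stays with the sender's store until the recipient asks for it.

```python
class SenderStore:
    """Message store run on the sender's side (hypothetical name)."""

    def __init__(self):
        self._messages = {}
        self._next_id = 0

    def post(self, recipient, body):
        """Keep the body here; return a small notice for the recipient."""
        self._next_id += 1
        self._messages[self._next_id] = (recipient, body)
        return {"store": self, "id": self._next_id, "size": len(body)}

    def fetch(self, recipient, msg_id):
        """Hand the body over only when the named recipient pulls it."""
        owner, body = self._messages[msg_id]
        if owner != recipient:
            raise PermissionError("not addressed to %s" % recipient)
        return body


class Recipient:
    """Holds notices only; bodies are pulled on demand."""

    def __init__(self, address):
        self.address = address
        self.notices = []

    def notify(self, notice):
        # Only the small notice arrives unasked.
        self.notices.append(notice)

    def read(self, index):
        notice = self.notices[index]
        return notice["store"].fetch(self.address, notice["id"])


if __name__ == "__main__":
    store = SenderStore()
    dan = Recipient("dan@example.org")
    dan.notify(store.post("dan@example.org", "lunch?"))
    print(dan.read(0))
```

Because the recipient has to come back to the sender's store to read anything, a sender that disappears right after sending has delivered nothing, which is the accountability property described above.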

upgrade to CivicSpace?

Submitted by connolly on Wed, 2005-12-21 18:05.

Hmm... the list of CivicSpace modules looks pretty interesting; TinyMCE is already on my breadcrumbs todo list. Maybe just grab them all at once?

The RSVP module looks interesting; does it work across sites somehow? or is it centralized?

Connecting DIG Student Projects to the MIT UROP listing

Submitted by connolly on Mon, 2005-12-19 00:51.

A couple MIT students have found their way to the #dig channel and asked about UROPs during IAP. I'm still learning about student rhythms at MIT; I was never a student here; I got my degree at U.T. Austin. My ten years with W3C have exposed me to the terms UROP and IAP before, but I have paged most of it out. Let's refresh our cache, shall we?

The Independent Activities Period (IAP) is a special four week term at MIT that runs from the first week of January until the end of the month. IAP 2006 takes place from January 9 through February 3.

IAP overview

In UROP info for supervisors, I see there's a form for listing projects. Hey... it would be cool if the student projects category here in this blog were automatically syndicated via that form. A meta-student-project?

Meanwhile, we do have a few notes on student projects among our DIG info for MIT students.

I'm not sure how items syndicated from Danny/Eric via the WordPress plug-in can get categorized; I suppose we can do it manually, after-the-fact?

I see a bunch of UROP openings for this time of year. The Building Games to Acquire Commonsense Knowledge project looks cool.

NOTE: It is expected that UROP students are supervised in the laboratory at all times, per the Institute's "no working alone" policy.

UROP safety issues

Sounds a bit like a "no coding alone" policy that I've been pushing around W3C and DIG, since discovering the value of pair programming, or a variant of it.

Toward richtext syndicated feed

Submitted by connolly on Wed, 2005-12-14 12:25.

Our RSS feed is plaintext, so when it's syndicated in Planet RDF and the like, there are no links or pictures or even paragraph breaks.

From #swig discussion, I gather that the state-of-the-art is to use nasty escaped markup, but I'm not up for that. The RDF Core WG didn't spend 18 months getting the details of parseType="Literal" right for nothing, did we?
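The difference between the two approaches can be sketched with the python standard library. The element name description and the sample body are assumptions for illustration, not our actual feed; the rdf:parseType attribute and its namespace are the real RDF/XML ones.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ET.register_namespace("rdf", RDF_NS)


def escaped_item(body):
    """Escaped markup: the feed carries one string that merely looks like HTML."""
    return "<description>%s</description>" % escape(body)


def literal_item(body):
    """Markup carried as real XML content, flagged with rdf:parseType="Literal"."""
    elt = ET.Element("description")
    elt.set("{%s}parseType" % RDF_NS, "Literal")
    elt.append(ET.fromstring(body))  # body must be well-formed XML
    return ET.tostring(elt, encoding="unicode")


if __name__ == "__main__":
    body = '<p>See <a href="http://example.org/">the notes</a>.</p>'
    print(escaped_item(body))
    print(literal_item(body))
```

A consumer parsing the first form sees no child elements, just text it has to guess is HTML and parse a second time; parsing the second form yields the p and a elements directly.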

I don't know if there are drupal modules available that Do The Right Thing, and due to my PHP angst I don't really want to know. But maybe there's a motivated student out there... ?

Go-Karting rush tainted by lack of OpenID for bug reporting about hypertext editing

Submitted by connolly on Sat, 2005-12-03 12:56.

[Image: Go-Kart medal ceremony]

I'm writing to my brother about go-karting in Montreal last night, but the OS X Mail editor doesn't grok hypertext. I'm tired of in-your-face URIs and poor-man's hypertext. The thunderbird editor groks hypertext but doesn't know that integrity is job one. I really should check that this bug is reported and report it if not, but I don't want to manage yet another bugzilla account. Oh for OpenID in bugzilla! I also want to make sure the lack of "treat this as plain text" in firefox is a reported bug.

SKOS, SIOC, and drupal taxonomy

Submitted by connolly on Mon, 2005-11-21 16:22.

I've been playing around with the drupal taxonomy module. Why doesn't XML2005 show up under conferences?

I wonder how to export the results as SKOS.
(see earlier noodling with SKOS)

I found the SIOC schema... it seems to create aliases for several widely-known terms. Ugh. I wonder why. Update: the SIOC folks are aware of this and are discussing the relationship with foaf and skos. And it uses multiple domains as if the meaning were the disjunction, despite the resolution (conjunction: the subject belongs to every listed class) of the issue of What should the semantics of multiple domain and range properties be?

XHTML for computer science research papers and bibliographies

Submitted by connolly on Thu, 2005-11-03 15:34.

In a PAW project discussion of writing assignments for WWW2006, KR, etc., I asked that we use XHTML rather than LaTeX to collaborate on the papers.

The WWW2006 deadline is too soon to make the transition, but I took the source of one of the papers in development and translated it to XHTML in order to test my Transforming XHTML to LaTeX and BibTeX tools. Since the tools have only been tested on one project, of course they needed some tweaks. And they'll need some more for figures.

But I'm hopeful that it'll be cost-effective to do things this way.

Meanwhile, there's a cite-formats discussion in the microformat community. My work includes a microformat for bibliography stuff. I haven't figured out URIs for the properties nor converted it to RDF just yet, like I did for my old index of URI schemes and like we did for automating publication of W3C tech reports.

Reflecting blog structure into the Semantic Web with SIOC?

Submitted by connolly on Mon, 2005-10-31 13:18.

After ryanlee's cool hack to repost wordpress items to drupal via the blogger API, Eric M. asked if Ryan had seen SIOC. I took a quick look; I can't find an example that has all the details like the xmlns declarations worked out.
