connolly's blog

DIG losing the battle with spammers again

Submitted by connolly on Tue, 2009-03-10 11:56. :: |

Blog spam went out of control again; the only remedy I could find was a very big hammer: turn off the drupal comments module altogether and in doing so, unpublish all comments ever posted to this site. I suppose they're still in the database and could be published again, if we could separate them from the spam.

The drupal expertise in our group seems to have gone on to greener pastures. That prompted me to divest from my family business drupal installation and start a hosted wordpress site and makes me wonder how safe is stuff that I write here...

Any MIT students want to help this research group manage a community presence? Please get in touch.

OpenID "Hello World" on apache still deep magic

Submitted by connolly on Thu, 2009-01-08 18:37. ::

I have a home movie that I just want to show to just a few friends around the Web. With OpenID, I should be able to just give my web server a list of my friends' pages, right?

I eventually found a README for mpopenid with just what I wanted:

PythonOption authorized-users "http://alice.com/ http://bob.com/"

But that wasn't on the top page of hits on a search for "apache OpenID". (Like most sites, mine runs on apache.) The top hit is mod_auth_openid, but its FAQ that says my use case isn't directly supported:

Is it possible to limit login to some users, like htaccess/htpasswd does?
No. ... If you want to restrict to specific users that span multiple identity providers, then OpenID probably isn't the authentication method you want. Note that you can always do whatever vetting you want using the REMOTE_USER CGI environment variable after a user authenticates.

So I installed the prerequisites for mpopenid: libapache2-mod-python and python-elementtree were straightforward, but I struggled to find a version of python-openid that matched. I almost gave up at that point, but heartened by somebody else who got mpopenid working, I went back to searching and found a launchpad development version of mpopenid. That seems to work with python-openid-1.1.0.

In /etc/apache2/sites-available/mysite, I have this bit that glues mpopenid's login page into my site:

<Location "/openid-test-aux">
SetHandler mod_python
PythonOption action-path "/openid-test-aux"
PythonHandler mpopenid::openid
</Location>

And in mysite/movies/.htaccess, this bit says only I get to see http://mysite.example/sekret:

<Files "sekret">
PythonAccessHandler mpopenid::protect
PythonOption authorized-users "http://www.w3.org/People/Connolly/"
</Files>

The mpopenid README also shows an option to put the list of pages in a separate file:

PythonOption authorized-users-list-url file:///my/directory/allowed-users.txt

But I haven't tried that yet. So far I'm happy to put the list right in the .htaccess file.

The details of data in documents; GRDDL, profiles, and HTML5

Submitted by connolly on Fri, 2008-08-22 14:09. :: | |
GRDDL, a mechanism for putting RDF data in XML/XHTML documents, is specified mostly at the XPath data model level. Some GRDDL software goes beyond XML and supports HTML as she are spoke, aka tag soup. HTML 5 is intended to standardize the connection between tag soup and XPath. The tidy use case for GRDDL anticipates that using HTML 5 concrete syntax rather than XHTML 1.x concrete syntax involves no changes at the XPath level

But in GRDDL and HTML5, Ian Hickson, editor of HTML 5, advocates dropping the profile attribute of the HTML head element in favor of rel="profile" or some such. I dropped by the #microformats channel to think out loud about this stuff, and Tantek said similarly, "we may solve this with rel="profile" anyway." The rel-profile topic in the microformats wiki shows the idea goes pretty far back.

Possibilities I see include:
  • GRDDL implementors add support for rel="profile" along with HTML 5 concrete syntax
  • GRDDL implementors don't change their code, so people who want to use GRDDL with HTML 5 features such as <video> stick to XML-wf-happy HTML 5 syntax and they use the head/@profile attribute anyway, despite what the HTML 5 spec says.
  • People who want to use GRDDL stick to XHTML 1.x.
  • People who want to put data in their HTML documents use RDFa.

I don't particularly care for the rel="profile" design, but one should choose ones battles and I'm not inclined to choose this one. I'm content for the market to choose.

 

sidekick calendar subscription for SXSW

Submitted by connolly on Sat, 2008-03-08 12:57. :: | |

At a conference, like in a good coding session, it's too easy to lose track of time, so I rely heavily on a PDA to remind me of appointments. The SXSW program has just the features I want:

  • an "add this to my calendar" button next to each session
  • a calendar feed of my choices

But I carry a hiptop, which doesn't support calendar subscription. I could copy-and-paste a few critical sessions to my hiptop, but when the climbing geeks offer an hCalendar feed, it becomes wortwhile to use iCal on the laptop, i.e. something that groks calendar subscription, as the master calendar device.

I have had a system for exporting my mobile calendar as a feed, but it's a tedious 4 step shell command sequence; it's OK once or twice a week, but here at SXSW, I want to sync up several times a day.

I have been moving my palmagent project from shell commands and Makefiles to a RESTful Web service, and this pushed me over the edge to add calendar feed support.

As usual, to pull the data from the hiptop's data servers:

  1. Make a directory to hold hiptop accounts and put it in hip_config.py:
    AccountsDir = "/Users/connolly/Desktop/danger-accts"
  2. Start hipwsgi.py running:
    pbjam:~/projects/palmagent$ python hipwsgi.py &
    Serving HTTP on 0.0.0.0 port 8080 ...
  3. Use dangerSync.py to log in and get some session credentials for half an hou of use:
    ~/Desktop/danger-accts/ACCT $ python ~/projects/palmagent/dangerSync.py \
    --prod --user ACCT \
    --passwd YOUR_PASSWORD_HERE \
    >session-id
  4. Visit http://0.0.0.0:8080/pim/ACCT and hit the Pull button.

Now you have event, task, contact, and note directories containing a JSON file for each record and hipwsgi.py lets you navigate them in a few different ways.

The pull feature is incremental; it grabs just the records that have changed since you previously pulled:

Pull majo from danger hiptop service

back to sync options

event

 

The new feature today is the ical export, linked from the event categories page:

event

back to sync options

 

You can copy the address of that ical export link and subscribe to it from iCal, and bingo, there it is, merged with the SXSW calendar and such.

@@screenshot pending 

 

hAudio for microformats mixtapes, in progress

Submitted by connolly on Thu, 2008-03-06 17:00. ::

I was visiting a friend and I wanted to play Back When I Could Fly and the easiest way was to burn a CD and put it in their CD player and while I was at it I figured I might as well pick a few other songs... a sort of mixtape to say thanks for letting me crash there.

That sort of artifact is too precious to leave locked up in iTunes's proprietary format, even if it is XML; as I said in a July 2000 message

There are very few data formats I trust... when I use
the computer to capture my knowledge, I pretty
much stick to plain text, [X]HTML, and email. I use JPG, PNG, and PDF if I must,
but not for capturing knowledge for exchange, revision, etc.


So I wrote itunekb.py, which reads the iTunes data, picks out one playlist, and writes it out in hAudio format using a genshi template. The result is ordinary HTML at one level:

  1. Poems, Prayers And Promises by John Denver
    4:06 from A Song's Best Friend: The Very Best Of John Denver [Disc 1] (2004)
  2. Did You Feel The Mountains Tremble by Delirious?
    4:42 from WOW Worship: Orange (Disc 1) (2000)
  3. The Reason by Hoobastank
    3:52 from The Reason (2003)
  4. Back When I Could Fly by Trout Fishing In America
    3:29 from Family Music Party (1998)
  5. ...

At another level, it's yummy Semantic Web data.

Oops! Well, it used to be; but hAudio seems to be changing:

Here's hoping I find time to catch up.

I can only imagine...

Submitted by connolly on Sun, 2007-12-09 09:30. :: | | | |

I have a new bookmark. No, not a del.icio.us bookmark; not some bits in a file. This is the kind you have to go there to get.. go to Cleveland, that is. It reads:

Thank you
for you love & support
for the Ikpia & Ogbuji families
At this time of real need.
We will never forget
Imose, Chica, & Anya

Imose, Chica, & Anya

Abundant Life International Church
Highland heights, OH

After working with Chime for a year or so on the GRDDL Working Group (he was the difference between a hodge-podge of test files and a nicely organized GRDDL Test Cases technical report), I was really excited to meet him at the W3C Technical Plenary in Cambridge in early November. His Fuxi work is one of the best implementations of the way I think semantic web rules and proofs should go. When he told me some people didn't see the practical applications, it made me want to fly there and tell them how I think this will save lives and end world hunger.

So this past Tuesday, when I read the news about his family, the only way I could make my peace with it was to go and be with him. I can only imagine what he is going through. Eric Miller and Brian and David drove me to the funeral, but the line to say hi to the family was too long. And the internment service didn't really provide an opportunity to talk. So I was really glad that after I filled my plate at the reception, a seat across from Chime and Roschelle opened up for me and I got to sit and share a meal with them.

Grandpa Linus was at the table, too. His eulogy earlier at the funeral ended with the most inspiring spoken rendition of a song that I have ever heard:

Now The Hacienda's Dark The Town Is Sleeping
Now The Time Has Come To Part The Time For Weeping
Vaya Con Dios My Darling
Vaya Con Dios My Love

His eulogy is also posted as Grandparents' lament on The Kingdom Kids web site, along with details about a fund to help the family get back on their feet.

Free Culture: Why buy the Amazon Kindle when you can give and get an OLPC XO-1 for the same price?

Submitted by connolly on Tue, 2007-11-20 02:11. :: |

I just discovered Kindle: Amazon's New Wireless Reading Device. About $10 per e-book sounds ok, but $0.10 to put my own files on it?!?! It can read blogs like Slashdot and boingboing for as little as $.99 per month over the $399 purchase price. It comes with wikipedia. Say... that sounds familiar... where else can I get wikipedia on a device with a nice display that works in daylight...

Oh yeah! The OLPC XO-1. For the same $400 (+ shipping) you can get one and give one away.

Håkon brought one to the video panel at the W3C TPAC this month, while the voice of Lawrence Lessig was still ringing in my head: What have we done about it? he asked again and again in his powerful OSCON 2002 talk:

Lawrence Lessig: I have been doing this for about two years--more than 100 of these gigs. This is about the last one. One more and it's over for me. So I figured I wanted to write a song to end it. But then I realized I don't sing and I can't write music. But I came up with the refrain, at least, right? This captures the point. If you understand this refrain, you're gonna' understand everything I want to say to you today. It has four parts:

  • Creativity and innovation always builds on the past.

  • The past always tries to control the creativity that builds upon it.

  • Free societies enable the future by limiting this power of the past.

  • Ours is less and less a free society.

 

I don't sing all that well either, but I play a little guitar, so when Håkon walked into the HTML WG meeting as un-conference pitches were next on the agenda, I pitched a jam session. I dedicated the opening number,With a Little Help from My Friends, to Sam Ruby whose comment prompted me to watch the Lessig show before the trip. The InstantGig was "surreal (but awesome)" according to one account.

Håkon's pitch for open standard video for our cultural heritage inspired One laptop per Kyle, the story of getting an XO-1 for my 8-year-old boy instead of the Windows PC he says he want in order to play the games that his friends all play. Before the trip he told me that he wants to build a web site with lots and lots of games and I thought "but you're just one little boy." But I think I get it now...

He has a new name, by the way: Burn, as in Rip, Mix, an Burn. Rip, after 1 year of musical training, can sound out the Mario theme on trombone or piano in an afternoon, something I can't do after 20 years of training my mediocre ear. And the middle child, Mix, is so charming that if you stop at a red light, he'll have a new friend before the light turns green.

I have one give-one-get-one package on order for Burn; if you're feeling like a patron of the arts and you want to see what happens if Rip and Mix get one too, feel free to send us a Christmas Card with a little something inside.

And look out for SwordPedestal.com, which Kyle picked out. It's only a dream now, but I have a hunch it may one day rival Nintendo for the hearts and minds of a few million people.

brainstorming, issue tracking, and problem reporting... with tabulator?

Submitted by connolly on Mon, 2007-11-05 02:07. ::
I want to brainstorm about a bunch of issues in preparation for the HTML WG meeting this week. tracker's forms interface won't let me enter an issue until I fill out all the details... well, it doesn't really have that many constraints, except that it notifies the whole WG when an issue is added.

I want something more like the OmniOutliner experience... I want to brainstorm.

But when I'm done, I don't want to tediously copy and paste each field into tracker.

Clearly, I could write some python or XSLT to take OmniOutliner's XML and feed it to tracker afterward, but... can't we do better than that?

What if tabulator's UI were as smooth as OmniOutliner... and what if I could just push one button and get the toothpaste back in the tube, i.e. feed the outline into the tracker's REST interface?

p.s. why am I using emacs to write this? Apple Mail knows IntegrityIsJobOne, but in OS X 10.4, like iCal, it goes off into the weeds eating CPU for inexplicable reasons, and I don't invest debugging effort in stuff that isn't open source.

How do I feed this to breadcrumbs now? Does emacs have a markdown/ReStructuredText mode?
How about AtomPub support? I manually cut and pasted and cleaned up the line breaks. ugh!

I use Thunderbird on my PowerBook, but it's totally confused about offline operation. It goes to save to the drafts folder every now and again, but over IMAP... so if the net is flakey or down, (a) it doesn't actually save, and (b) it interrupts my drafting!!! OK... found the config option to use a local drafts folder under Tools/Account settings. (why not under preferences?) But Thunderbird doesn't do well filling the IMAP cache; I don't know to tell it to go offline until I've left the airport wifi, and at that point, it's too late to grab the mail I want to read. The Apple mailer does much better at using idle time to prefetch.

p.p.s. how do I use hReview and GRDDL to make the data in this gripe available as if it were a bugzilla entry? More on that to follow, I hope...

FOAF and OpenID: two great tastes that taste great together

Submitted by connolly on Wed, 2007-10-24 23:00. :: | |

As Simon Willison notes, OpenID solves the identity problem, not the trust problem. Meanwhile, FOAF and RDF are potential solutions to lots of problems but not yet actual solutions to very many. I think they go together like peanut butter and chocolate, creating a deliciously practical testbed for our Policy Aware Web research.

Our struggle to build a community is fairly typical:

In Dec 2006, Ryan did a Drupal upgrade that included OpenID support, but that only held the spammers back for a couple weeks. Meanwhile, Six Apart is Opening the Social Graph:

 

... if you manage a social networking service, we strongly encourage you to embrace OpenID, hCard XFN, FOAF and the other open standards around data portability.

With that in mind, a suggestion to outsource to a centralized commercial blog spam filtering service seemed like a step in the wrong direction; we are the Decentralized Information Group after all; time to eat our own cooking!

The policy we have working right now is, roughly: you can comment on our blog if you're a friend of a friend of a member of the group.

In more detail, you can comment on our blog if:

  1. You can show ownership of a web page via the OpenID protocol.
  2. That web page is related by the foaf:openid property to a foaf:Person, and
  3. That foaf:Person is
    1. listed as a member of the DIG group in http://dig.csail.mit.edu/data, or
    2. related to a dig member by one or two foaf:knows links.

The implementation has two components so far:

  • an enhancement to drupal's OpenID support to check a whitelist
  • a FOAF crawler that generates a whitelist periodically

We're looking into policies such as You can comment if you're in a class taught by a DIG group member, but there are challenges reconciling policies protecting privacy of MIT students with this approach.

We're also interested in federating with other communities. The Advogato community is particuarly interesting because

  1. The DIG group is pretty into Open Source, the core value of advogato.
  2. Advogato's trust metric is designed to be robust in the face of spammers and seems to work well in practice.

So I'd like to be able to say You can comment on our blog if you're certified Journeyer or above in the Advogato community. Advogato has been exporting basic foaf:name and foaf:knows data since a Feb 2007 update, but they didn't export the results of the trust metric computation in RDF.

Asking for that data in RDF has been on my todo list for months, but when Sean Palmer found out about this OpenID and FOAF stuff, he sent an enhancement request, and Steven Rainwater joined the #swig channel to let us alpha test it in no time. Sean also did a nice write-up.

This is a perfect example of the sort of integration of statistical methods into the Semantic Web that we have been talking about as far back as our DAML proposal in 2000:

Some of these systems use relatively simple and straightforward manipulation of well-characterized data, such as an access control system. Others, such as search engines, use wildly heuristic manipulations to reach less clearly justified but often extremely useful conclusions. In order to achieve its potential, the Semantic Web must provide a common interchange language bridging these diverse systems. Like HTML, the Semantic Web language should be basic enough that it does not impose an undue burden on the simplest web software systems, but powerful enough to allow more sophisticated components to use it to advantage as well.

Now we just have to enhance our crawler to get that data or otherwise integrate it with the drupal whitelist. (I'm particularly interested in using GRDDL to get FOAF data right from the OpenID page; stay tuned for more on that.) And I guess we need Advogato to provide a user interface for foaf:openid support... or maybe links to supplementary FOAF files via rdfs:seeAlso or owl:sameAs.

Soccer schedules, flight itineraries, timezones, and python web frameworks

Submitted by connolly on Wed, 2007-09-12 17:17. :: | | | |

The schedule for this fall soccer season came out August 11th. I got the itinerary for the trip I'm about to take on July 26. But I just now got them synchronized with the family calendar.

The soccer league publishes the schedule in somewhat reasonable HTML; to get that into my sidekick, I have a Makefile that does these steps:

  1. Use tidy to make the markup well-formed.
  2. Use 100 lines of XSLT (soccer-schedfix.xsl) to add hCalendar markup.
  3. Use glean-hcal.xsl to get RDF calendar data.
  4. Use hipAgent.py to upload the calendar items via XMLRPC to the danger/t-mobile service, which magically updates the sidekick device.

But oops! The timezones come out wrong. Ugh... manually fix the times of 12 soccer games... better than manually keying in all the data... then sync with the family calendar. My usual calendar sync Makefile does the following:

  1. Use dangerSync.py to download the calendar and task data via XMLRPC.
  2. Use hipsrv.py to filter by category=family, convert from danger/sidekick/hiptop conventions to iCalendar standard conventions pour the records into a kid template to produce RDF Calendar (and hCalendar).
  3. Use toIcal.py to convert RDF Calendar to .ics format.
  4. Upload to family WebDAV server using curl.

Then check the results on my mac to make sure that when my wife refreshes her iCal subscriptions it will look right.

Oh no! The timezones are wrong again!

The sidekick has no visible support for timezones, but the start_time and end_time fields in the XMLRPC interface are in Z/UTC time, and there's a timezone field. However, after years with this device, I'm still mystified about how it works. The Makefiles approach is not conducive to tinkering at this level, so I worked on my REST interface, hipwsgi.py until it had crude support for editing records (using JSON syntax in a form field). What I discovered is that once you post an event record with a mixed up timezone, there's no way to fix it. When you use the device UI to change the start time, it looks OK, but the Z time via XMLRPC is then wrong.

So I deleted all the soccer game records, carefully factored the danger/iCalendar conversion code out of hipAgent.py into calitems.py for ease of testing, and goit it working for local Chicago-time events.

Then I went through the whole story again with my itinerary. Just replace tidy and soccer-schedfix.xsl with flightCal.py to get the itinerary from SABRE's text format to hCalendar:

  1. Upload itinerary to the sidekick.
  2. Manually fix the times.
  3. Sync with iCal. Bzzt. Off by several hours.
  4. Delete the flights from the sidekick.
  5. Work on calitems.py some more.
  6. Upload to the sidekick again. Ignore the sidekick display, which is right for the parts of the itinerary in Chicago, but wrong for the others.
  7. Sync with iCal. Win!

I suppose I'm resigned that the only way to get the XMLRPC POST/upload right (the stored Z times, at least, if not the display) is to know what timezone the device is set to when the POST occurs. Sigh.

A March 2005 review corroborates my findings:

The Sidekick and the sync software do not seem to be aware of time zones. That means that your PC and your Sidekick have to be configured for the same time zone when they synchronize, else your appointments will be all wrong.

 

 hipwsgi.py is about my 5th iteration on this idea of a web server interface to my PDA data. It uses WSGI and JSON and Genshi, following Joe G's stuff. Previous itertions include:

  1. pdkb.pl - quick n dirty perl hack (started April 2001)
  2. hipAgent.py - screen scraping (Dec 2002)
  3. dangerSync.py - XMLRPC with a python shelf and hardcoded RDF/XML output (Feb 2004)
  4. hipsrv.py - conversion logic in python with kid templates and SPARQL-like filters over JSON-shaped data (March 2006)
It's pretty raw right now, but fleshing out the details looks like fun. Wish me luck.
Syndicate content