Blog spam went out of control again; the only remedy I could find was a very big hammer: turn off the drupal comments module altogether and in doing so, unpublish all comments ever posted to this site. I suppose they're still in the database and could be published again, if we could separate them from the spam.
The drupal expertise in our group seems to have gone on to greener pastures. That prompted me to divest from my family business drupal installation and start a hosted wordpress site, and it makes me wonder how safe the stuff that I write here is...
Any MIT students want to help this research group manage a community presence? Please get in touch.
As Simon Willison notes, OpenID solves the identity problem, not the trust problem. Meanwhile, FOAF and RDF are potential solutions to lots of problems but not yet actual solutions to very many. I think they go together like peanut butter and chocolate, creating a deliciously practical testbed for our Policy Aware Web research.
Our struggle to build a community is fairly typical:
- Oct 2005: breadcrumbs launches (and I wish for OpenID support)
- Dec 2005: Tim gets 400+ friendly comments on his first item.
- Jun 2006: Comments disabled due to overwhelming spam
... if you manage a social networking service, we strongly encourage you to embrace OpenID, hCard, XFN, FOAF and the other open standards around data portability.
With that in mind, a suggestion to outsource to a centralized commercial blog spam filtering service seemed like a step in the wrong direction; we are the Decentralized Information Group after all; time to eat our own cooking!
The policy we have working right now is, roughly: you can comment on our blog if you're a friend of a friend of a member of the group.
In more detail, you can comment on our blog if:
- You can show ownership of a web page via the OpenID protocol.
- That web page is related by the foaf:openid property to a foaf:Person, and
- That foaf:Person is
- listed as a member of the DIG group in http://dig.csail.mit.edu/data, or
- related to a dig member by one or two foaf:knows links.
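The policy above amounts to a bounded breadth-first walk over foaf:knows links. Here is a minimal Python sketch of that idea, not the code we actually run: the function names, the plain-dict data shapes, and the choice to follow foaf:knows outward from members are all assumptions made for illustration.

```python
from collections import deque


def allowed_people(members, knows, max_hops=2):
    """Return every person within max_hops foaf:knows links of a group member.

    members: iterable of person identifiers (hypothetical; e.g. FOAF URIs).
    knows:   dict mapping a person to the people they foaf:knows
             (assumption: links are followed outward from members).
    """
    distance = {m: 0 for m in members}
    queue = deque(distance)
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return set(distance)


def may_comment(openid_url, openid_owner, allowed):
    """openid_owner maps an OpenID page to the foaf:Person it is the foaf:openid of."""
    person = openid_owner.get(openid_url)
    return person is not None and person in allowed


# Made-up example data: carol is three links away, so she is excluded.
members = ["tim", "dan"]
knows = {"tim": ["alice"], "alice": ["bob"], "bob": ["carol"]}
allowed = allowed_people(members, knows)
owners = {"http://example.org/bob": "bob", "http://example.org/carol": "carol"}
print(may_comment("http://example.org/bob", owners, allowed))
print(may_comment("http://example.org/carol", owners, allowed))
```

Proving ownership of the OpenID page is a separate step handled by the OpenID protocol itself; the sketch only covers the FOAF side of the check.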
The implementation has two components so far:
- an enhancement to drupal's OpenID support to check a whitelist
- a FOAF crawler that generates a whitelist periodically
We're looking into policies such as You can comment if you're in a class taught by a DIG group member, but there are challenges reconciling policies protecting privacy of MIT students with this approach.
We're also interested in federating with other communities. The Advogato community is particularly interesting because
- The DIG group is pretty into Open Source, the core value of advogato.
- Advogato's trust metric is designed to be robust in the face of spammers and seems to work well in practice.
So I'd like to be able to say You can comment on our blog if you're certified Journeyer or above in the Advogato community. Advogato has been exporting basic foaf:name and foaf:knows data since a Feb 2007 update, but they didn't export the results of the trust metric computation in RDF.
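A "Journeyer or above" policy reduces to comparing ranks in Advogato's ordered certification levels (Observer, Apprentice, Journeyer, Master). The Python sketch below shows that comparison only; the `certs` mapping from OpenID to level is a hypothetical input shape, since how the trust metric results would actually be exported is not settled here.

```python
# Advogato certification levels, lowest to highest.
LEVELS = ["Observer", "Apprentice", "Journeyer", "Master"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def meets_level(level, minimum="Journeyer"):
    """True if level is at or above minimum; unknown levels never qualify."""
    if level not in RANK:
        return False
    return RANK[level] >= RANK[minimum]


def whitelist_from_certs(certs, minimum="Journeyer"):
    """certs: hypothetical dict of OpenID URL -> certified level."""
    return sorted(openid for openid, level in certs.items()
                  if meets_level(level, minimum))


# Made-up example data.
certs = {
    "http://example.org/alice": "Master",
    "http://example.org/bob": "Apprentice",
    "http://example.org/carol": "Journeyer",
}
print(whitelist_from_certs(certs))
```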
Asking for that data in RDF has been on my todo list for months, but when Sean Palmer found out about this OpenID and FOAF stuff, he sent an enhancement request, and Steven Rainwater joined the #swig channel to let us alpha test it in no time. Sean also did a nice write-up.
This is a perfect example of the sort of integration of statistical methods into the Semantic Web that we have been talking about as far back as our DAML proposal in 2000:
Some of these systems use relatively simple and straightforward manipulation of well-characterized data, such as an access control system. Others, such as search engines, use wildly heuristic manipulations to reach less clearly justified but often extremely useful conclusions. In order to achieve its potential, the Semantic Web must provide a common interchange language bridging these diverse systems. Like HTML, the Semantic Web language should be basic enough that it does not impose an undue burden on the simplest web software systems, but powerful enough to allow more sophisticated components to use it to advantage as well.
Now we just have to enhance our crawler to get that data or otherwise integrate it with the drupal whitelist. (I'm particularly interested in using GRDDL to get FOAF data right from the OpenID page; stay tuned for more on that.) And I guess we need Advogato to provide a user interface for foaf:openid support... or maybe links to supplementary FOAF files via rdfs:seeAlso or owl:sameAs.
I've added a new commenting policy to combat our OpenID-based spammers. It's a whitelist based on FOAF. There will be more about it in this space as development moves forward, so stay tuned if you'd like to know how to be placed on the whitelist.
As for specifics, the whitelist implementation is a Drupal module that reads a list of OpenIDs from an externally generated set every hour. The user's OpenID is checked against the whitelist at login time, and matches are allowed to proceed with account creation, commenting, etc.
- a UI for settings
- better database transaction flow, particularly for error handling
- viewable whitelist
Feel free to contact me if you're interested in the module portion of this equation for use with your own openid.module (again, it does none of the whitelist generation).
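For readers who want the shape of the hourly-refresh-plus-login-check idea without the Drupal specifics, here is a rough Python sketch. It is not the module's code (which is a Drupal module): the `Whitelist` class, the `load` callback, and the `normalize` helper are invented for illustration, and the normalization shown is a deliberate simplification rather than full OpenID identifier normalization.

```python
import time


def normalize(openid):
    """Simplified comparison form: trim whitespace and any trailing slash."""
    return openid.strip().rstrip("/")


class Whitelist:
    """Caches an externally generated list of OpenIDs, reloading when stale."""

    def __init__(self, load, max_age=3600, clock=time.time):
        self._load = load          # callable returning an iterable of OpenIDs
        self._max_age = max_age    # seconds; 3600 matches the hourly refresh
        self._clock = clock        # injectable so the sketch is testable
        self._loaded_at = None
        self._ids = frozenset()

    def _refresh_if_stale(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._ids = frozenset(normalize(x) for x in self._load())
            self._loaded_at = now

    def allows(self, openid):
        """Called at login time: only matches may go on to create an account."""
        self._refresh_if_stale()
        return normalize(openid) in self._ids


# Made-up example: the loader would normally read the crawler's output.
whitelist = Whitelist(lambda: ["http://example.org/alice/"])
print(whitelist.allows("http://example.org/alice"))
print(whitelist.allows("http://example.org/mallory"))
```

Reloading lazily at check time, rather than from a scheduled job, keeps the sketch self-contained; either approach gives the same hourly freshness.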
I've upgraded the Drupal installation so we can use the OpenID module. A few things learned in the process:
- The Drupal upgrade path requires incremental steps; to go from one minor version to another two numbers away means upgrading through every intermediate minor version until reaching the target. Earlier versions fail to be useful in 'knowing' which data model version the system is at, so an upgrade meant importing / guessing / dropping and repeating the cycle until I hit on the right one, which in itself was not an easy state to assess.
- The module administration page loads every module, which can cause memory issues resulting in a blank page. Removing unnecessary, unused modules helps.
- The JanRain OpenID 1.2.0 pear installation fails to install itself properly, requiring the moving of directories post-install.
- The OpenID module does not respect settings on account creation. I wrote some code to fix this.
But we're now at the latest Drupal version; and more on OpenID later.
Addendum: And apparently I needed to enable the legacy module. Thanks to those who pointed out the symptoms. 'I' in this case is solely Ryan, not Tim, Dan, or anybody else; send any of your issues with the upgrade my way.
Tim presented the tabulator to the W3C team today; see slides: Tabulator: AJAX Generic RDF browser.
The tabulator was sorta all over the floor when I tried to present it in Austin in September, but David Sheets put it back together in the last couple weeks. Yay David!
In particular, the support for viewing the HTTP data that you pick up by tabulating is working better than ever before. The HTTP vocabulary has URIs like http://dig.csail.mit.edu/2005/ajar/ajaw/httph#content-type. That seems like an interesting contribution to the WAI ER work on HTTP Vocabulary in RDF.
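To make the vocabulary concrete, here is a small Python sketch that turns response headers into triples whose predicates live in that httph# namespace. The only thing taken from the example URI above is the namespace and the lower-cased header name as the local part; the function names, the blank-node subject, and the plain-literal values are assumptions for illustration and not necessarily how the tabulator itself records them.

```python
HTTPH = "http://dig.csail.mit.edu/2005/ajar/ajaw/httph#"


def header_triples(response_node, headers):
    """Map each header to (subject, predicate URI, literal value).

    Assumption: the predicate is the namespace plus the lower-cased header name.
    """
    triples = []
    for name, value in headers.items():
        predicate = HTTPH + name.strip().lower()
        triples.append((response_node, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialize as N-Triples, using a blank node for the response."""
    lines = []
    for subject, predicate, value in triples:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'_:{subject} <{predicate}> "{escaped}" .')
    return "\n".join(lines)


# Made-up example response.
print(to_ntriples(header_triples("resp1", {"Content-Type": "text/html"})))
```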
Note that comments are disabled here in breadcrumbs until we figure out OpenID comment policies, drupal, etc. The tabulator issue tracker is probably a better place to report problems anyway. We don't have OpenID working there yet either, unfortunately, but we do support email callback based account setup.
breadcrumbs fell over, again, today. Disk full. Probably the spam database filled up with drek. Again. While googling for reports of similar problems, I discovered drupal 4.7 is out since May 1. They tout TimBL's blog in their release announcement. I wonder if they'd help us upgrade. Well, they do provide a video about upgrading. Maybe I'll find time to watch it.
Meanwhile, I discovered a couple of interesting articles on the design/architecture of drupal and how PHP is used: Drupal Programming from an Object-Oriented Perspective and the tongue-in-cheek The Road to Drupal Hell.
I'm not sure how much of this I really want to know. As I said back in my October item on PHP angst, I'm mostly playing simple customer when it comes to drupal. But I'm having a hard time investing in technology that I don't know inside and out.
In a #swig discussion where I was considering Zope alternatives (the one-big-file design has lost its charm), it occurred to me that I have read (parts of) the source to most everything that currently backs my personal web site: Zope, the python interpreter, libc, various bits of debian infrastructure, and the linux kernel. I wonder when that will become totally impractical, and I'll understand my web site no more than I understand my car.
Due to an overwhelming signal-noise ratio in the wrong direction, I've disabled all anonymous commenting. We've tried to use spam auto-classification, but the volume is so large and diverse that eventually everything looks like it might be spam, and it's back to square one.
Thanks for your direct participation and input; from now on, we'll be looking for alternatives to continuing these conversations across the web.
I just found this...
Flock has a nice WYSIWYG blog post editor that's a marvel to use. It does the quoting and citation for me. It lets me drag images from the Web, or my flickr stream, straight into the post. It just makes posting so much more fun.
Let's try linking... and list editing...
Let's try pictures...
Seems to work quite nicely!
Odd... posting works, but the posting dialog hangs.
Hmm... I'd rather use delicious than technorati as the base of my tags.
Blogged with Flock
I have been living in a textarea since I started this blog, always a little nervous, knowing that firefox doesn't know that integrity is job one. That is: Firefox doesn't guarantee to save all work, by default; I don't consider that a big bug; it's a browser, not an editor, after all. I outsourced bookmarking to delicious because that's knowledge capture, and I don't rely on my browser for that.
But as TimBL has been saying since at least as far back as his 1998 design issues note,
If you think surfing hypertext is cool, that's because you haven't tried writing it. If you have found your bookmarks/favorites have become a more and more important part of your life, that's because you have learned to put up with the simplest form of hypertext editing there is as a compromise.
I fixed some markup errors in various of my posts last night. They all showed up on planetrdf again.
Is drupal doing something buggy? Or is it pilot error? i.e. is there some way for me to tell drupal to change the updated date but not the... umm... whatever date planetrdf uses?