blogs
<span style="display: none">Important New Jersey Supreme Court decision in Internet privacy</span>
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
The New Jersey Supreme Court (State of New Jersey v. Shirley Reid (A-105-06)) has issued an important decision on Internet users’ right to privacy. The case involves a dispute about whether an ISP violated a user’s privacy rights by turning over subscriber information (name, address, billing details) associated with a particular IP address. It turns out that the subpoena served on the ISP was invalid for a variety of reasons. As the user had a ‘reasonable expectation of privacy’ in her Internet activities and identifying information, and because the subpoena served on the ISP was invalid, the New Jersey court determined that the ISP should not have turned over the personal data.
The important aspect of this case in the evolving understanding of privacy on the Internet is the court’s recognition that we must look at privacy from the broad perspective of what can actually be discovered about people online. In this way, the ruling has significant strengths and weaknesses from a privacy perspective. On the one hand, the court finds that there is, today, an expectation of privacy in IP addresses because they are currently hard to link to personal identity. There have been lots of disputes in the US and the EU about whether IP addresses are ‘personally identifying information’ (“PII” in the jargon of privacy). This court takes a pragmatic view of this question and finds that IP addresses should be considered private for now, but that this may change. The court finds:
the reasonableness of the privacy interest may change as technology evolves. A reasonable expectation of privacy is required to establish a protected privacy interest…. Internet users today enjoy relatively complete IP address anonymity when surfing the Web. Given the current state of technology, the dynamic, temporarily assigned, numerical IP address cannot be matched to an individual user without the help of an ISP. Therefore, we accept as reasonable the expectation that one’s identity will not be discovered through a string of numbers left behind on a website.
The availability of IP Address Locator Websites has not altered that expectation because they reveal the name and address of service providers but not individual users. Should that reality change over time, the reasonableness of the expectation of privacy in Internet subscriber information might change as well. For example, if one day new software allowed individuals to type IP addresses into a “reverse directory” and identify the name of a user — as is possible with reverse telephone directories — today’s ruling might need to be reexamined.
Others have written about the legal details of this case and have suggested that it is a big win for privacy. Given the reliance on the shifting state of identity technology, I’m a little less sanguine.
This case is yet another reason why I believe (as I’ve explained elsewhere) that meaningful privacy on the Web requires rules that govern how personal information is used, not just what can be collected. Under the court’s reasoning, as our lives become more and more transparent, that transparency would justify increasingly harmful use of personal data. While it’s pretty hard to control how exposed we all become, we can still limit how powerful institutions (governments, etc.) use personal data about us.
<span style="display: none">Bob Metcalfe’s wisdom on patents and innovation</span>
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Ethernet inventor, journalist and now venture capitalist Bob Metcalfe speaks on the lessons from the Internet community for the global warming arena. In looking at how to accelerate technical innovation to address climate change, Metcalfe asserts that:
“… the place to do research is in university labs. The best vehicle for technology innovation is not patents, it’s students.”
Of course, Bob also manages to express his disdain for monopoly, Bell Labs, and even Al Gore. (See report by Martin LaMonica.) I’m not sure about those but think he’s right on with respect to patents.
<span style="display: none">On meetings</span>
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Ever the astute observer of the various features and bugs of our collective behavior, a longtime mentor of mine, Mitch Kapor, has coined a new definition:
Meetingboarding: (n) the sensation of being unable to breathe arising from continuous immersion in meeting after meeting
I’d add to this a characterization of email that I learned from Mitch many years ago:
The problem with email is that it has low emotional bandwidth.
-Mitch Kapor, circa 1991
Semantic Web in the news
Well, the Semantic Web has been in the news a bit recently.
There was the buzz about Twine, a "Semantic Web company", getting another round of funding. Then, Yahoo announced that it will pick up Semantic Web information from the Web, and use it to enhance search. And now the Times online mis-states that I think "Google could be superseded". Sigh. In an otherwise useful discussion, largely about what the Semantic Web is and how it will affect people, there was a misunderstanding which ended up being the title of the blog. In fact, the conversation as I recall started with a question: if search engines were the killer app for the familiar Web of documents, what will be the killer app for the Semantic Web?
Text search engines are of course good for searching the text in documents, but the Semantic Web isn't text documents, it is data. It isn't obvious what the killer apps will be - there are many contenders. We know that the sort of query you do on data is different: the SPARQL standard defines a query protocol which allows application builders to query remote data stores. So that is one sort of query on data which is different from text search.
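To make the contrast with text search concrete, here is a minimal sketch of how an application might prepare a query for a remote data store over the SPARQL protocol. The endpoint URL is a hypothetical placeholder, the FOAF query is just an illustrative choice, and the request is built but deliberately not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration; not a real service.
ENDPOINT = "http://example.org/sparql"

# An illustrative query over data (not text): names of things that have a foaf:name.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE { ?person foaf:name ?name } LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) an HTTP GET request carrying the query.

    The SPARQL protocol passes the query text in a 'query' parameter;
    the Accept header asks for results in the JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would send it; against a real endpoint the response would be a table of variable bindings rather than a ranked list of documents.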
One thing to always remember is that the Web of the future will have BOTH documents and data. The Semantic Web will not supersede the current Web. They will coexist. The techniques for searching and surfing the different aspects will be different but will connect. Text search engines don't have to go out of fashion.
The "Google will be superseded" headline is an unfortunate misunderstanding. I didn't say it. (We have, by the way, asked for it to be fixed. One can, after all, update a blog to fix errors, and this should be appropriate. Ian Jacobs wrote an email, left voice mail, and tried to post a reply to the blog, but the reply did not appear on the blog - moderated out? So we tried.)
Now of course, as the name of The Times was once associated with a creditable and independent newspaper :-), the headline was picked up and elaborated on by various well-meaning bloggers. So the blogosphere, which one might hope to be the great safety net under the conventional press, in this case just amplified the error.
I note that here the blogosphere was misled by an online version of a conventional organ. There are many who worry about the inverse, that decent material from established sources will be drowned beneath a tide of low-quality information from less creditable sources.
The Media Standards Trust is a group which has been working with the Web Science Research Initiative (I'm a director of WSRI) to develop ways of encoding the standards of reporting a piece of information purports to meet: "This is an eye-witness report"; or "This photo has not been massaged apart from: cropping"; or "The author of the report has no commercial connection with any products described"; and so on. Like Creative Commons, which lets you mark your work with a licence, the project involves representing social dimensions of information. And it is another Semantic Web application.
In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.
So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?
A great example of Semantic Web data which works this way is Linked Data. There is a growing mass of interlinked public data, much of it promoted by the Linked Open Data project. There is an upcoming Linked Data workshop on this at the WWW 2008 Conference in April in Beijing, and on June 17-18 in New York at the Linked Data Planet Conference. Linked data comes alive when you explore it with a generic data browser like the Tabulator. It also comes alive when you make mashups out of it. (See Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird; Using Wikipedia as a database.) It should be easier to make those mashups by just pulling RDF (maybe using RDFa or GRDDL) or using SPARQL, rather than having to learn a new set of APIs for each site and each application area.
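As a rough illustration of why shared identifiers make mashups cheap, the sketch below merges two tiny sets of triples and answers a question neither source could answer alone. Everything in it is an assumption made for the example: the URIs, the short predicate names, and the data are invented placeholders, and a real mashup would pull RDF with an RDF library rather than use Python tuples.

```python
# Each source is a set of (subject, predicate, object) triples.
# All URIs and values below are invented placeholders.
MUSIC_SOURCE = {
    ("http://example.org/artist/1", "name", "Example Band"),
    ("http://example.org/artist/1", "basedNear", "http://example.org/place/9"),
}
PLACE_SOURCE = {
    ("http://example.org/place/9", "name", "Example City"),
}


def merge(*graphs):
    """Merging graphs is just set union, because both use the same URIs."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def artists_with_place_names(graph):
    """Join artist names to place names through the 'basedNear' link."""
    rows = []
    for s, p, place in sorted(graph):
        if p != "basedNear":
            continue
        for artist_name in objects(graph, s, "name"):
            for place_name in objects(graph, place, "name"):
                rows.append((artist_name, place_name))
    return rows


mashup = artists_with_place_names(merge(MUSIC_SOURCE, PLACE_SOURCE))
```

The music source alone cannot name the place; only the merged graph can, and no site-specific API had to be learned to get there.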
I think there is an important "double bus" architecture here, in which there are separate markets for the raw data and for the mashed up data. Data publishers (e.g., government departments) just produce raw data now, and consumer-facing sites (e.g., soccer sites) mash up data from many sources. I might talk about this a bit at WWW 2008.
So in scanning new Semantic Web news, I'll be looking out for re-use of data. The momentum around Linked Open Data is great and exciting -- let us also make sure we make good use of the data.
<span style="display: none">Today - NPR Science Friday program on Web privacy issues</span>
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
National Public Radio’s Science Friday program will feature a discussion of online privacy with Alessandro Acquisti of CMU and yours truly a little later today. It’s live from 3:00 - 4:00 pm Eastern/US, rebroadcast at various times depending on where you live, and streamed on the Web.
Listen in. Call and challenge other listeners to think about the privacy questions raised by the Semantic Web!
Update: the broadcast is streamed at this link.
<span style="display: none">Transparency for behavioral profiling</span>
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Behavioral targeting is pervasive on the Web. As documented by a very nicely-researched New York Times story today (‘To Aim Ads, Web Is Keeping Closer Eye on You,’ NYT, by Louise Story, 10 March 2008), it’s now clear that each of us who uses popular search engines and portals is the subject of thousands of individual data collection events per month of Web usage.
I’m glad to see some clear analysis of the practice out there but would like to see an additional level of transparency. If it is the case that profiling is benign, then why not tell users what aspect of their profile triggered the placement of a particular ad? The ad delivery systems all make decisions about which ads to place for a given user from some properties of that user that are either known or inferred. Why not just tell us what those properties are along with the ad placement? This would go a long way toward eliminating the feeling that we’re being ‘spied on’ because it would eliminate any sense of secrecy about what is learned in the course of the behavioral monitoring. My guess is that many people would ignore the profile data, but some would check it, and we’d all have peace of mind from knowing that whatever is being done is happening out in the open.
According to the Times, data is collected on which web pages we look at and is then combined with other data (demographics, browsing history, purchases on partner sites, etc.). Right on cue, traditional privacy advocates declare that profiles developed in this way (based on our behavior) do (or should) make us feel uneasy:
“When you start to get into the details, it’s scarier than you might suspect,” said Marc Rotenberg, executive director of the Electronic Privacy Information Center, a privacy rights group. “We’re recording preferences, hopes, worries and fears.”
No doubt people (at least some people) feel alarmed about this, and probably others are either implicitly or explicitly happy to have the right ads targeted to them. As an online ad agency exec said in the article:
“Everyone feels that if we can get more data, we could put ads in front of people who are interested in them,” he said. “That’s the whole idea here: put dog food ads in front of people who have dogs.”
Unless we’re going to require an outright ban on this sort of behavioral targeting, the question is what to do about it. Is the goal to allay people’s fears? To limit the use of the profiles? Or to help people avoid incorrect targeting?
The statistics developed by comScore for the New York Times article do a nice job of illustrating the magnitude of data collection that happens. Jules Polonetsky, AOL’s Chief Privacy Officer, is launching a new consumer education campaign to explain the mechanics of data collection and tracking to users. The light that both the Times stories and the AOL campaign shed on marketing practices is valuable.
Many people are going to be far more interested in how this profiling actually affects them than in the overall magnitude of the practice. Is there any reason not to be upfront with people about the basis for delivering an ad? If there is, then there is reason to feel that we’re being deceived or manipulated, not assisted, by the behavior tracking techniques.
sidekick calendar subscription for SXSW
At a conference, like in a good coding session, it's too easy to lose track of time, so I rely heavily on a PDA to remind me of appointments. The SXSW program has just the features I want:
- an "add this to my calendar" button next to each session
- a calendar feed of my choices
But I carry a hiptop, which doesn't support calendar subscription. I could copy-and-paste a few critical sessions to my hiptop, but when the climbing geeks offer an hCalendar feed, it becomes worthwhile to use iCal on the laptop, i.e. something that groks calendar subscription, as the master calendar device.
I have had a system for exporting my mobile calendar as a feed, but it's a tedious 4-step shell command sequence; it's OK once or twice a week, but here at SXSW, I want to sync up several times a day.
I have been moving my palmagent project from shell commands and Makefiles to a RESTful Web service, and this pushed me over the edge to add calendar feed support.
As usual, to pull the data from the hiptop's data servers:
- Make a directory to hold hiptop accounts and put it in hip_config.py:
AccountsDir = "/Users/connolly/Desktop/danger-accts"
- Start hipwsgi.py running:
pbjam:~/projects/palmagent$ python hipwsgi.py &
Serving HTTP on 0.0.0.0 port 8080 ...
- Use dangerSync.py to log in and get some session credentials for half an hour of use:
~/Desktop/danger-accts/ACCT $ python ~/projects/palmagent/dangerSync.py \
--prod --user ACCT \
--passwd YOUR_PASSWORD_HERE \
>session-id
- Visit http://0.0.0.0:8080/pim/ACCT and hit the Pull button.
Now you have event, task, contact, and note directories containing a JSON file for each record and hipwsgi.py lets you navigate them in a few different ways.
The pull feature is incremental; it grabs just the records that have changed since you previously pulled:
Pull majo from danger hiptop service
back to sync options
event
anchor: 1204914757247
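The idea behind an anchor-based incremental pull can be sketched as follows. This is not the palmagent code or the hiptop service's actual protocol: the `modified` field name and the assumption that the anchor is a modification timestamp in milliseconds are hypothetical, chosen only to illustrate the technique.

```python
def pull_changes(records, anchor):
    """Return the records changed since `anchor`, plus the next anchor.

    `records` is a list of dicts with a hypothetical 'modified' timestamp
    (milliseconds); `anchor` is the value saved after the previous pull.
    """
    changed = [r for r in records if r["modified"] > anchor]
    # If nothing changed, keep the old anchor so the next pull starts there.
    new_anchor = max((r["modified"] for r in changed), default=anchor)
    return changed, new_anchor


# Invented sample records, for illustration only.
server_records = [
    {"id": "a", "modified": 100},
    {"id": "b", "modified": 250},
    {"id": "c", "modified": 300},
]

first_pull, anchor = pull_changes(server_records, anchor=0)
second_pull, anchor_after = pull_changes(server_records, anchor=anchor)
```

The first pull fetches everything and records the newest timestamp; the second pull, with nothing changed in between, fetches nothing and leaves the anchor where it was.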
The new feature today is the ical export, linked from the event categories page:
event
back to sync options
You can copy the address of that ical export link and subscribe to it from iCal, and bingo, there it is, merged with the SXSW calendar and such.
@@screenshot pending
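For readers curious what an ical export of the pulled records involves, here is a minimal sketch that turns event records into an iCalendar feed. It is not the hipwsgi.py implementation: the record field names (`id`, `summary`, `start`, `end`), the use of epoch milliseconds for times, and the PRODID string are all assumptions made for the example.

```python
from datetime import datetime, timezone


def ms_to_ical(ms):
    """Format epoch milliseconds as an iCalendar UTC date-time."""
    moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return moment.strftime("%Y%m%dT%H%M%SZ")


def escape_text(value):
    """Escape backslash, semicolon, comma and newline in iCalendar text."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def events_to_ical(records, now_ms):
    """Render event records (hypothetical field names) as one VCALENDAR."""
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//hiptop-export-sketch//EN",  # placeholder id
    ]
    for rec in records:
        lines += [
            "BEGIN:VEVENT",
            "UID:" + rec["id"],
            "DTSTAMP:" + ms_to_ical(now_ms),
            "DTSTART:" + ms_to_ical(rec["start"]),
            "DTEND:" + ms_to_ical(rec["end"]),
            "SUMMARY:" + escape_text(rec["summary"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Invented sample record, for illustration only.
sample = [{"id": "evt-1", "summary": "Panel, part 1", "start": 0, "end": 3600000}]
feed = events_to_ical(sample, now_ms=0)
```

Serving a string like `feed` at a stable URL is what lets a calendar client subscribe and re-fetch it, instead of importing a one-off copy.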
hAudio for microformats mixtapes, in progress
I was visiting a friend and I wanted to play Back When I Could Fly and the easiest way was to burn a CD and put it in their CD player and while I was at it I figured I might as well pick a few other songs... a sort of mixtape to say thanks for letting me crash there.
That sort of artifact is too precious to leave locked up in iTunes's proprietary format, even if it is XML; as I said in a July 2000 message:
There are very few data formats I trust... when I use
the computer to capture my knowledge, I pretty
much stick to plain text, [X]HTML, and email. I use JPG, PNG, and PDF if I must,
but not for capturing knowledge for exchange, revision, etc.
So I wrote itunekb.py, which reads the iTunes data, picks out one playlist, and writes it out in hAudio format using a genshi template. The result is ordinary HTML at one level:
- Poems, Prayers And Promises by John Denver
4:06 from A Song's Best Friend: The Very Best Of John Denver [Disc 1] (2004)
- Did You Feel The Mountains Tremble by Delirious?
4:42 from WOW Worship: Orange (Disc 1) (2000)
- The Reason by Hoobastank
3:52 from The Reason (2003)
- Back When I Could Fly by Trout Fishing In America
3:29 from Family Music Party (1998)
- ...
At another level, it's yummy Semantic Web data.
Oops! Well, it used to be; but hAudio seems to be changing:
- 00:16, 12 Feb 2008 ManuSporny (Replaced FN with TITLE per the microformats-new mailing list discussion)
Here's hoping I find time to catch up.
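For a sense of what writing a playlist out as hAudio looks like, here is a rough sketch using plain string formatting rather than the genshi template that itunekb.py uses. The class names (`haudio`, `title`, `contributor`, `duration`, `album`, `published`) are my reading of the hAudio draft, which as noted above is still changing, so treat them as assumptions; reading the iTunes library itself is left out, and the two tracks are taken from the playlist above.

```python
from html import escape

# Two tracks from the playlist above, as plain dicts.
TRACKS = [
    {"title": "Poems, Prayers And Promises", "artist": "John Denver",
     "duration": "4:06",
     "album": "A Song's Best Friend: The Very Best Of John Denver [Disc 1]",
     "year": 2004},
    {"title": "Back When I Could Fly", "artist": "Trout Fishing In America",
     "duration": "3:29", "album": "Family Music Party", "year": 1998},
]

# Class names below are assumptions based on the hAudio draft.
ITEM_TEMPLATE = (
    '<li class="haudio">'
    '<span class="title">{title}</span> by '
    '<span class="contributor">{artist}</span> '
    '<span class="duration">{duration}</span> from '
    '<span class="album">{album}</span> '
    '(<span class="published">{year}</span>)'
    '</li>'
)


def track_to_haudio(track):
    """Render one track as a list item, escaping every value for HTML."""
    safe = {key: escape(str(value)) for key, value in track.items()}
    return ITEM_TEMPLATE.format(**safe)


def playlist_to_html(tracks):
    """Render the whole playlist as an unordered list."""
    items = "\n".join(track_to_haudio(t) for t in tracks)
    return "<ul>\n" + items + "\n</ul>"


playlist_html = playlist_to_html(TRACKS)
```

Because the markup is still ordinary HTML, a browser shows a readable list while a microformats parser can pick out the same fields as data.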
The political power of (simple) Web computing
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
It’s pretty amazing what a little bit of structured computer power can do when deployed on the Web. Slate’s Delegate Calculator puts in the hands of Web-enabled citizens some simple computing power that helps us to understand how the delegate counts in the upcoming Democratic primaries may affect the final outcome for Obama and Clinton over the next hours, weeks and months. The knowledge about which states have how many delegates, how they might be apportioned, etc., is information that used to be a closely guarded secret of the political intelligentsia and the press. Now, it’s out there for all of us to see. It’s such a useful tool that many reporters from other publications are actually writing about it:
Jonathan Alter, Hillary’s Math Problem, Newsweek (4 March 2008)
Peter Baker, Clinton Down, but not Out, for the Count, Washington Post.
Jason Tuohey, Delegate Counter, Boston Globe
Carol Lockhead, Obama Wins Vermont, But Look at the Math, San Francisco Chronicle.
Granted, Slate has a relationship with some of those news outlets, but it’s still striking to see computing make the political news.
Important FCC hearing on Net Neutrality in Cambridge, MA
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
I’d encourage anyone in or around the Boston, MA area to come to the Federal Communications Commission’s field hearing on Broadband Network Management Practices. I’ll be testifying along with a range of witnesses: Dave Clark and David Reed (colleagues from MIT), representatives from various commercial groups, and a number of advocacy organizations such as Free Press. I understand Congressman Ed Markey, a longtime champion of the Internet and the Web, will also be appearing.
Here are the logistical details:
Monday, Feb 25, 2008
11:00 a.m. to 4:00 p.m.
Harvard Law School, Ames Courtroom, Austin Hall
1515 Massachusetts Avenue, Cambridge, Mass.

