Danny Weitzner's blog
Google, Viacom, Privacy and Copyright meet the social web
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
In all the recent uproar (New York Times, “Google Told to Turn Over User Data of YouTube,” Michael Helft, 4 July 2008) about the fact that Google has been forced to turn over a large pile of personally-identifiable information to Viacom as part of a copyright dispute (Opinion), there is a really interesting angle pointed out by Dan Brickley (co-creator of FOAF and general Semantic Web troublemaker). Dan points out in a blog entry today that while the parties before the court are arguing about whether the YouTube ID is, by itself, personally identifiable information, the fact is that the publicly visible part of this ID in the context of other information on the Web is sufficient to identify a lot about a person, not the least of which is their name. Dan explains:
YouTube users who have linked their YouTube account URLs from other social Web sites (something sites like FriendFeed and MyBlogLog actively encourage), are no longer anonymous on YouTube. This is their choice. It can give them a mechanism for sharing ‘favourited’ videos with a wide circle of friends, without those friends needing logins on YouTube or other Google services. This clearly has business value for YouTube and similar ‘social video’ services, as well as for users and Social Web aggregators.
Given such a trend towards increased cross-site profile linkage, it is unfortunate to read that YouTube identifiers are being presented as essentially anonymous IDs: this is clearly not the case. If you know my YouTube ID ‘modanbri’ you can quite easily find out a lot more about me, and certainly enough to find out with strong probability my real world identity. As I say, this is my conscious choice as a YouTube user; had I wanted to be (more) anonymous, I would have behaved differently. To understand YouTube IDs as being anonymous accounts is to radically misunderstand the nature of the modern Web.
Dan makes a really important point here. On the one hand, the fact that we are all more identifiable as a result of the social networks in which we exist suggests that the judge was just plain wrong (even wronger than others have already said) in saying that the YouTube IDs are not personally-identifiable. But on the other hand, to the extent that Dan is correct about the revealing nature of the social web (true for some of us now, more and more in the future), we have to face the fact that merely limiting disclosure of personal information from one source is less and less likely to protect privacy effectively across the Web.
Applying this view to the Viacom v. YouTube case suggests that privacy protection has to focus more on limiting how people and institutions can *use* personal information, even as we recognize that it is harder and harder to protect privacy by access control alone.
Some of my colleagues and I have written about this view of privacy as Information Accountability in last month’s Communications of the ACM.
A Political Denial of Service (PDOS) attack on blogger.com?
A little transparency would go a long way toward helping keep online political discourse open, especially in the particular corner of the blogosphere run by Google (i.e., blogger.com). The Herald Tribune (Bloggers take aim at Google - International Herald Tribune) reports on a controversy involving pro-Clinton blogs that might have been blocked as spam due to what we might call a PDOS (Political Denial of Service) attack in a skirmish between Obama and Clinton partisans. The IHT asks:
Was Google’s network of online services manipulated to silence critics of Barack Obama? That was the question buzzing on a corner of the blogosphere over the past few days, after several anti-Obama bloggers were unable to update their sites, which are hosted on Google’s Blogger service.
It is alleged that some pro-Clinton blogs were blocked after a number of pro-Obama users marked them as ‘spam’ on blogger.com. A Google spokesperson explained:
“It appears that our anti-spam filters caused some Blogger accounts to be blocked from creating new posts,” a Google spokesman, Adam Kovacevich, said in a statement. “While we are still investigating, we believe this may have been caused by mass spam e-mails mentioning the ‘Just Say No Deal’ network of blogs, which in turn caused our system to classify the blog addresses mentioned in the e-mails as spam.”
Kovacevich said that Google had restored posting rights to the affected blogs and that it was “very important” to Google “that Blogger remain a tool for political debate and free expression.” He gave no further details about Google’s spam-monitoring techniques or how they relate to the Blogger service.
It certainly would be useful if Google could provide some transparency into what they block and why. That way, either Google or the possibly malicious spam-flaggers could be held accountable for their behavior. (In a recent CACM piece on Information Accountability we explain why accountability is so important on the Web and how we might have more of it through additions to the architecture of the Web.)
Google does a very good job of giving transparent explanations when their search results contain information that has been blocked for legal reasons such as copyright takedown notices. I hope they can find a way to bring similar transparency to their part of the blogosphere.
Important New Jersey Supreme Court decision in Internet privacy
The New Jersey Supreme Court (State of New Jersey v. Shirley Reid (A-105-06)) has issued an important decision on Internet users’ right to privacy. The case involves a dispute about whether an ISP violated a user’s privacy rights by turning over subscriber information (name, address, billing details) associated with a particular IP address. It turns out that the subpoena served on the ISP was invalid for a variety of reasons. As the user had a ‘reasonable expectation of privacy’ in her Internet activities and identifying information, and because the subpoena served on the ISP was invalid, the New Jersey court determined that the ISP should not have turned over the personal data.
The important aspect of this case in the evolving understanding of privacy on the Internet is the court’s recognition that we must look at privacy from the broad perspective of what can actually be discovered about people online. In this way, the ruling has significant strengths and weaknesses from a privacy perspective. On the one hand, the court finds that there is, today, an expectation of privacy in IP addresses because they are currently hard to link to personal identity. There have been lots of disputes in the US and the EU about whether IP addresses are ‘personally identifying information’ (“PII” in the jargon of privacy). This court takes a pragmatic view of this question and finds that IP addresses should be considered private for now, but that this may change. The court finds:
the reasonableness of the privacy interest may change as technology evolves. A reasonable expectation of privacy is required to establish a protected privacy interest…. Internet users today enjoy relatively complete IP address anonymity when surfing the Web. Given the current state of technology, the dynamic, temporarily assigned, numerical IP address cannot be matched to an individual user without the help of an ISP. Therefore, we accept as reasonable the expectation that one’s identity will not be discovered through a string of numbers left behind on a website.
The availability of IP Address Locator Websites has not altered that expectation because they reveal the name and address of service providers but not individual users. Should that reality change over time, the reasonableness of the expectation of privacy in Internet subscriber information might change as well. For example, if one day new software allowed individuals to type IP addresses into a “reverse directory” and identify the name of a user — as is possible with reverse telephone directories — today’s ruling might need to be reexamined.
Others have written about the legal details of this case and have suggested that it is a big win for privacy. Given the reliance on the shifting state of identity technology, I’m a little less sanguine.
This case is yet another reason why I believe (as I’ve explained elsewhere) that meaningful privacy on the Web requires rules that govern how personal information is used, not just what can be collected. Under the court’s reasoning, as our lives become more and more transparent, that would justify increasingly harmful use of personal data. While it’s pretty hard to control how exposed we all become, we still can limit how powerful institutions (governments, etc.) use personal data about us.
Bob Metcalfe's wisdom on patents and innovation
Ethernet inventor, journalist and now venture capitalist Bob Metcalfe speaks on the lessons from the Internet community for the global warming arena. In looking at how to accelerate technical innovation to address climate change, Metcalfe asserts that:
“… the place to do research is in university labs.” “The best vehicle for technology innovation is not patents, it’s students.”
Of course, Bob also manages to express his disdain for monopoly, Bell Labs, and even Al Gore. (See report by Martin LaMonica.) I’m not sure about those but think he’s right on with respect to patents.
On meetings
Ever the astute observer of the various features and bugs of our collective behavior, a longtime mentor of mine, Mitch Kapor, has coined a new definition:
Meetingboarding: (n) the sensation of being unable to breathe arising from continuous immersion in meeting after meeting
I’d add to this a characterization of email that I learned from Mitch many years ago:
The problem with email is that it has low emotional bandwidth.
-Mitch Kapor, circa 1991
Today - NPR Science Friday program on Web privacy issues
National Public Radio’s Science Friday program will feature a discussion of online privacy with Alessandro Acquisti of CMU and yours truly a little later today. It’s live from 3:00 - 4:00 pm Eastern/US, rebroadcast at various times depending on where you live, and streamed on the Web.
Listen in. Call and challenge other listeners to think about the privacy questions raised by the Semantic Web!
Update: the broadcast is streamed at this link.
Transparency for behavioral profiling
Behavioral targeting is pervasive on the Web. As documented by a very nicely-researched New York Times story today (‘To Aim Ads, Web Is Keeping Closer Eye on You,’ NYT, by Louise Story, 10 March 2008), it’s now clear that each of us who uses popular search engines and portals is the subject of thousands of individual data collection events per month of Web usage.
I’m glad to see some clear analysis of the practice out there but would like to see an additional level of transparency. If it is the case that profiling is benign, then why not tell users what aspect of their profile triggered the placement of a particular ad? The ad delivery systems all make decisions about which ads to place for a given user from some properties of that user that are either known or inferred. Why not just tell us what those properties are along with the ad placement? This would go a long way toward eliminating the feeling that we’re being ‘spied on’ because it would eliminate any sense of secrecy about what is learned in the course of the behavioral monitoring. My guess is that many people would ignore the profile data, but some would check it, and we’d all have peace of mind from knowing that whatever is being done is happening out in the open.
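To make the idea concrete, here is a minimal Python sketch of an ad selector that returns, along with the ad, the profile properties that triggered it. Everything in it is hypothetical: the `Ad`, `Placement` and `select_ad` names, the property labels, and the "most matching properties wins" rule are assumptions made for illustration, not a description of how any real ad delivery system works.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    targets: frozenset  # profile properties this ad is aimed at (hypothetical labels)


@dataclass
class Placement:
    ad: Ad
    triggered_by: frozenset  # the part of the user's profile that caused this ad to be shown


def select_ad(profile, ads):
    """Pick the ad that matches the most profile properties.

    Returns a Placement carrying the matched properties, so they can be
    disclosed to the user next to the ad, or None if nothing matches.
    """
    best = None
    for ad in ads:
        matched = frozenset(profile) & ad.targets
        if matched and (best is None or len(matched) > len(best.triggered_by)):
            best = Placement(ad, matched)
    return best


ads = [
    Ad("dog food", frozenset({"owns_dog"})),
    Ad("hiking boots", frozenset({"owns_dog", "visits_outdoor_sites"})),
]
profile = {"owns_dog", "visits_outdoor_sites", "reads_news"}

placement = select_ad(profile, ads)
print(f"Ad: {placement.ad.name}; shown because of: {sorted(placement.triggered_by)}")
```

The point of the sketch is only that the selector already has the matched properties in hand when it makes its choice, so disclosing them costs little more than displaying them.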
According to the Times, data is collected on which web pages we look at and is then combined with other data (demographics, browsing history, purchases on partner sites, etc.). Right on cue, traditional privacy advocates declare that profiles developed in this way (based on our behavior) do (or should) make us feel uneasy:
“When you start to get into the details, it’s scarier than you might suspect,” said Marc Rotenberg, executive director of the Electronic Privacy Information Center, a privacy rights group. “We’re recording preferences, hopes, worries and fears.”
No doubt people (at least some people) feel alarmed about this, and probably others are either implicitly or explicitly happy to have the right ads targeted to them. As an online ad agency exec said in the article:
“Everyone feels that if we can get more data, we could put ads in front of people who are interested in them,” he said. “That’s the whole idea here: put dog food ads in front of people who have dogs.”
Unless we’re going to require an outright ban on this sort of behavioral targeting, the question is what to do about it. Is the goal to allay people’s fears? To limit the use of the profiles? Or to help people avoid incorrect targeting?
The statistics developed by comScore for the New York Times article do a nice job of illustrating the magnitude of data collection that happens. Jules Polonetsky, AOL’s Chief Privacy Officer, is launching a new consumer education campaign to explain the mechanics of data collection and tracking to users. The light that both the Times stories and the AOL campaign shed on marketing practices is valuable.
Many people are going to be far more interested in how this profiling actually affects them than in the overall magnitude of the practice. Is there any reason not to be upfront with people about the basis for delivering an ad? If there is, then there is reason to feel that we’re being deceived or manipulated, not assisted, by the behavior tracking techniques.
The political power of (simple) Web computing
It’s pretty amazing what a little bit of structured computer power can do when deployed on the Web. Slate’s Delegate Calculator puts in the hands of Web-enabled citizens some simple computing power that helps us to understand how the delegate counts in the upcoming Democratic primaries may affect the final outcome for Obama and Clinton over the next hours, weeks and months. The knowledge about which states have how many delegates, how they might be apportioned, etc., is information that used to be a closely guarded secret of the political intelligentsia and the press. Now, it’s out there for all of us to see. It’s such a useful tool that many reporters from other publications are actually writing about it:
Jonathan Alter, Hillary’s Math Problem, Newsweek (4 March 2008)
Peter Baker, Clinton Down, but not Out, for the Count, Washington Post.
Jason Tuohey, Delegate Counter, Boston Globe
Carol Lockhead, Obama Wins Vermont, But Look at the Math, San Francisco Chronicle.
Granted, Slate has a relationship with some of those news outlets, but it’s still striking to see computing make the political news.
Important FCC hearing on Net Neutrality in Cambridge, MA
I’d encourage anyone in or around the Boston, MA area to come to the Federal Communications Commission’s field hearing on Broadband Network Management Practices. I’ll be testifying along with a range of witnesses: Dave Clark and David Reed (colleagues from MIT), representatives from various commercial groups, and a number of advocacy organizations such as Free Press. I understand Congressman Ed Markey, a longtime champion of the Internet and the Web, will also be appearing.
Here are the logistical details:
Monday, Feb 25, 2008
11:00 a.m. to 4:00 p.m.
Harvard Law School, Ames Courtroom, Austin Hall
1515 Massachusetts Avenue, Cambridge, Mass.
Reciprocal Privacy for the Social Web (a.k.a. FOAF)
I’ve loved the idea of FOAF for a long time but have always been bothered by the privacy risks that would result if FOAF really took off as a way to represent our social networks. Here’s an idea about how to address privacy in open social networks such as those represented by FOAF-like data structures.
It’s called (for now) REP: Reciprocal Privacy for Social Networks
ReP is a proposal to establish a reasonable privacy balance in social networking environments. Today, more and more social networks are coming onto the Web and are working to share more data across the established boundaries that have previously separated these networks. Participants in social networks should have the benefit of widely shared agreements about how the information they present in those networks will be analyzed and used. To encourage the development of these social and legal privacy norms, we need a simple policy language for expressing rules associated with personal information, and a reliable, scalable mechanism for assessing accountability to those rules. We propose a new protocol by which those who share personal information on the Web can have increased confidence that this information will be used in a transparent manner and that users of the personal information can be held accountable for complying with the stated usage rules.
Privacy policies and associated technologies must provide individuals harmed by breaches with legal recourse against those who abuse the norms of information usage. Hence, agreements must be clear and structured in such a manner that there is a chance that the existing legal system could provide a remedy for harm. We should neither expect nor require that a single set of norms will be adequate for all users, all social networking contexts or all cultures, but there should be a common framework and a basic policy vocabulary that can express commonly used rules and be easily extended.
The key to sharing personal information across a diversity of privacy policy frameworks is to establish legal and technical mechanisms that ensure a baseline of social and legal accountability across varying rulesets. Participants in the ReP web must agree, as a condition of accessing anyone else’s personal information, that usage of personal information will be reported by the user to a log specified by the data subject. Further, anyone who uses the personal information must agree to require that the same set of rules (both the logging requirement and whatever usage rules came with the data) be applied to any subsequent users of the data. The log will allow the data subject to check that a specific usage of personal information complies with the specified usage limitations, and to follow the trail of accountability from the initial access of the data through to the final usage event.
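The logging requirement and the rules that travel with the data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ReP design: the class and function names (`PersonalData`, `UsageEvent`, `record_use`, `non_compliant`, `trail`) are invented here, the usage rules are reduced to a set of allowed purposes, and an in-memory list stands in for the log specified by the data subject.

```python
from dataclasses import dataclass, field


@dataclass
class UsageEvent:
    user: str           # who used the data
    purpose: str        # what they used it for
    obtained_from: str  # who they got the data from


@dataclass
class PersonalData:
    subject: str
    value: str
    allowed_purposes: frozenset               # usage rules, simplified (assumption)
    log: list = field(default_factory=list)   # stand-in for the subject's chosen log


def record_use(data, user, purpose, obtained_from):
    """Report a use to the subject's log before handing back the value.

    The rules and the log live on the data object, so anyone who passes
    the object on passes the same obligations on with it.
    """
    data.log.append(UsageEvent(user, purpose, obtained_from))
    return data.value


def non_compliant(data):
    """Logged uses whose purpose falls outside the stated usage rules."""
    return [e for e in data.log if e.purpose not in data.allowed_purposes]


def trail(data, user):
    """Follow the chain of custody from a user back toward the data subject."""
    source_of = {e.user: e.obtained_from for e in data.log}
    chain = [user]
    while chain[-1] in source_of and source_of[chain[-1]] not in chain:
        chain.append(source_of[chain[-1]])
    return chain


data = PersonalData("alice", "alice@example.org", frozenset({"contact"}))
record_use(data, "bob", "contact", obtained_from="alice")
record_use(data, "carol", "marketing", obtained_from="bob")

print([e.user for e in non_compliant(data)])
print(trail(data, "carol"))
```

In a real deployment the log would sit somewhere the data subject controls and the rules would be expressed in a shared policy vocabulary; the sketch only shows that compliance checking and trail-following become simple queries once every use is logged.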
This copy-left-inspired viral policy is the most effective way to assure that the original rules associated with personal data are respected as that data is re-used over and over again in a variety of contexts. In the event of misuse, the logs will provide a means to locate the mis-user and seek correction or other redress. In the event that a use of personal information is discovered which is NOT recorded in the person’s accountability log, that use is by definition a violation of the ReP policy. In many cases where such unauthorized use does real harm to the data subject, it will be possible, with some amount of forensic effort, to find the mis-user and enable redress. Of course, there will be anonymous mis-users of personal information. We cannot insulate Web users from those risks with ReP, but neither can any other privacy protection strategy that is feasible in an inherently open information environment.
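The "unlogged use is a violation" rule amounts to a set difference between the uses someone discovers and the uses the log records. A minimal Python sketch, assuming a use can be identified by a hypothetical (user, purpose) pair:

```python
from typing import NamedTuple


class Use(NamedTuple):
    user: str
    purpose: str


def unlogged_uses(discovered, log):
    """Return discovered uses that do not appear in the accountability log.

    Under the rule described above, each of these is a violation by
    definition, whatever the purpose was.
    """
    logged = set(log)
    return [use for use in discovered if use not in logged]


log = [Use("bob", "contact")]
discovered = [Use("bob", "contact"), Use("mallory", "marketing")]

print(unlogged_uses(discovered, log))
```

How uses come to be discovered, and how a mis-user is then tracked down, is the forensic part that no such check can automate.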
There’s more to read in a skeletal REP design document.
The policy is still rough and the technology hasn’t been built yet, but I’d still really like reactions.

