GPS Luddites - the English countryside rebels against satnav
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Seems that a number of villages in the English countryside are being overrun by errant trans-European trucks which are regularly misdirected by their GPS satnav systems onto roads that were better suited for horse-drawn carriages than big, long-distance trucks. According to the New York Times (“Wedmore Journal: Turn Back. Exit Village. Truck Shortcut Hitting Barrier.” Sarah Lyall, 4 December 2007, p.A7):
trucks and tractor-trailers come here [to Wedmore] all the time, as they do in similarly inappropriate spots across Britain, directed by G.P.S. navigation devices that fail to appreciate that the shortest route is not always the best route. “They have no idea where they are,” said Wayne Hahn, a local store owner who watches a daily parade of vehicles come to grief — hitting fences, shearing mirrors from cars and becoming stuck at the bottom of Wedmore’s lone hill.
The head of the parish council offers a practical suggestion:
John Sanderson, chairman of the parish council, has proposed a seemingly simple remedy: removing the route through Wedmore from the G.P.S. navigation systems used by large vehicles.
“We’d like them to have appropriate systems that would show some routes weren’t suitable for H.G.V.’s,” Mr. Sanderson said, using shorthand for heavy goods vehicles.
Mr. Sanderson said he would not go so far as to advocate eradicating Wedmore from the map.
But others go farther:
“We’ve said, ‘Just take us off the map,’ actually,” said Geoff Coombs, chairman of the parish council in Barrow Gurney, a village that, despite being too small to have a sidewalk, is host to some 15,000 vehicles a day, cars as well as larger vehicles, whose G.P.S. systems identify it as a good alternative route to Bristol Airport.
Semantic web geo-taggers, start your engines. There are lots of ways creative metadata could help here, but my guess is that as the Web gets ‘smarter,’ some of what happens out in the world as a result will seem just plain dumb.
Free speech-related privacy rights of book buying (and reading?) records
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Last week, a Federal Magistrate in Wisconsin published an important opinion articulating limits on the government’s power to demand access to records of individuals’ book-buying activity held by 3rd parties such as Amazon.com. The case (IN RE GRAND JURY SUBPOENA TO AMAZON.COM DATED AUGUST 7, 2006) arose in the course of an FBI/IRS investigation of an individual who sells lots of used books on Amazon and was suspected of large-scale tax evasion. In order to develop the case, the Federal investigators acting through a grand jury:
directed Amazon to provide virtually all of its records regarding D’Angelo, including the identities of the thousands of customers who had bought used books from D’Angelo. The government subsequently chose to reduce this scope of this request to the identification of 120 book buyers, 30 per year for the four years under investigation. The government’s plan was for special agents of the FBI and IRS to contact these 120 used book buyers in an attempt to develop concrete evidence necessary to lay a transactional foundation for criminal charges of fraud and tax evasion against D’Angelo. The government does not suspect Amazon or D’Angelo’s customers of any wrongdoing, nor does it consider them victims of D’Angelo; they simply are bricks in the evidentiary wall being erected by the grand jury.
Rather than comply with the subpoena, Amazon exercised its legal right to move to have the government request ‘quashed,’ as allowed under law. Responding to this motion to quash, the Magistrate acted to protect the First Amendment rights of the buyers whose identity would be revealed if Amazon responded to the subpoena. The Magistrate concluded that “the government is not entitled to unfettered access to the identities of even a small sample of this group of book buyers without each book buyer’s permission.” Hence, he ordered a special procedure by which those Amazon customers who bought from the suspect during the relevant time period would be asked, in a manner that did not reveal their identity, whether they would be willing, on a voluntary basis, to have their records turned over to the government.
In the end, the government withdrew the subpoena altogether, telling the Wisconsin State Journal that they were able to get names by analyzing the suspect’s seized computer.
Beyond the First Amendment rationale offered in this case, more striking is the Magistrate’s assessment of the public mood with respect to privacy in general in the wake of the Patriot Act and warrantless wiretapping activity.
…[I]t is an unsettling and un-American scenario to envision federal agents nosing through the reading lists of law-abiding citizens while hunting for evidence against somebody else. In this era of public apprehension about the scope of the USAPATRIOT Act, the FBI’s (now-retired) “Carnivore” Internet search program, and more recent highly-publicized admissions about political litmus tests at the Department of Justice, rational book buyers would have a non-speculative basis to fear that federal prosecutors and law enforcement agents have a secondary political agenda that could come into play when an opportunity presented itself. Undoubtedly a measurable percentage of people who draw such conclusions would abandon online book purchases in order to avoid the possibility of ending up on some sort of perceived “enemies list.”
While cautioning (in a footnote) that he did not formally recognize these fears to be well-founded, he nonetheless felt he had to act to limit government power in this case because:
…if word were to spread over the Net–and it would–that the FBI and the IRS had demanded and received Amazon’s list of customers and their personal purchases, the chilling effect on expressive e-commerce would frost keyboards across America. Fiery rhetoric quickly would follow and the nuances of the subpoena (as actually written and served) would be lost as the cyberdebate roiled itself to a furious boil. One might ask whether this court should concern itself with blogger outrage disproportionate to the government’s actual demand of Amazon. The logical answer is yes, it should: well-founded or not, rumors of an Orwellian federal criminal investigation into the reading habits of Amazon’s customers could frighten countless potential customers into canceling planned online book purchases, now and perhaps forever.
There are three very important caveats to add, however. First, this opinion is only that of one Federal magistrate in one district court. It is not binding on any other part of the country and there are often widely divergent opinions from magistrates. Second, we don’t know how this reasoning might apply to a subpoena issued by a private party in civil litigation (say a divorce lawyer looking to impugn the integrity of an opposing spouse by revealing unsavory reading habits). Finally, as the government dropped its request altogether, this case will never be heard by any other court to be either affirmed or overturned. So, it will hang out there as one view of the privacy problems associated with subpoenas of private information held by 3rd parties.
Data sharing and integration for local and state law enforcement
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
The Washington Post reports today (“System Lets Agencies In Area Share Data,” Mary Beth Sheridan, Thursday, November 29, 2007; Page B03) that over 60 state, local and federal law enforcement agencies in the Washington DC area announced a plan to share information (including 6 million mug shots and 14 million arrest records).
In what they called a breakthrough, law enforcement officials yesterday unveiled a computer system that will allow more than 60 state and local police agencies in the D.C. area to share mug shots and crime reports.
The system, Law Enforcement Information Exchange (LInX), functions like Google for police, except that the database contains law enforcement information.
This despite the fact that several years ago some civil libertarians criticized an earlier version of this multi-state data sharing system, ominously named MATRIX (Multistate Anti-Terrorism Information Exchange). The ACLU even went so far as to declare MATRIX ‘dead.’ However, now it seems that LInX includes participation from Florida, Georgia, Hawaii, Texas, Virginia, Washington, the DC-area, and soon New Mexico.
I don’t know what sort of technology the system is built on but if it’s not Semantic Web Linked Data-style architecture now, it should be and probably soon will be.
On the importance of ICANN - a wise view from Joi Ito
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
On leaving the ICANN Board of Directors after a 3 year term, Joi Ito, one of the true leaders of the global Internet/Web community, writes:
Joi Ito’s Web: Three years with ICANN
With all of it’s tumultuous history and bumps and warts, ICANN, in my opinion, is the best way that we can manage names and numbers on the Internet and any new thing to try to do what it does would be less fair and probably wouldn’t work. There are some technical architectures and ideas that might make ICANN less relevant, which would be a good thing. However, even relatively obvious things like IPv6, IDNs and DNSEC are having a hard time getting traction. I think that it would be nearly impossible to “redesign the DNS” and get people to use it. It would be like trying to redesign a flying airplane. On the other hand, their might be some evolutionary changes that make domain names less relevant.
While ICANN must continue to improve its openness and public accountability, I wholeheartedly support Joi’s view. Anyone reading this post or able to follow the link to his original owes ICANN a real ‘thank you.’
Giant Global Graph
Well, it has been a long time since my last post here. So many topics, so little time. Some talks, a couple of Design Issues articles, but no blog posts. To dissipate the worry of expectation of quality, I resolve to lower the bar. More about what I had for breakfast.
So The Graph word has been creeping in. BradFitz talks of the Social Graph, as does Alex Iskold, who discusses social graphs and network theory in general and points out that users want to own their own social graphs. He also points out that examples of graphs are the Internet and the Web. So what's with the Graph word?
Maybe it is because Net and Web have been used. For perfectly good things .. but different things.
The Net we normally use as short for Internet, which is the International Information Infrastructure. Al Gore promoted the National Information Infrastructure (NII) presumably as a political pragma at the time, but clearly it became International. So let's call it III. Let's think about the Net now as an invention which made life simpler and more powerful. It made it simpler because, instead of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, "It isn't the cables, it is the computers which are interesting". The Net was designed to allow the computers to be seen without having to see the cables.
Simpler, more powerful. Obvious, really.
Programmers could write at a more abstract level. Also, there was re-use of the connections, in that, as the packets flowed, a cable which may have been laid for one purpose now got co-opted for all kinds of uses which the original users didn't dream of. And users of the Net, the III, found that they could connect to all kinds of computers which had been hooked up for various reasons, sometimes now forgotten. So the new abstraction gave us more power, and added value by enabling re-use.
The word Web we normally use as short for World Wide Web. The WWW increases the power we have as users again. The realization was "It isn't the computers, but the documents which are interesting". Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really.
Also, it allowed unexpected re-use. People would put a document on the web for one reason, but it would end up being found by people using it in completely different ways. Two delights drove the Web: one of being told by a stranger your Web page has saved their day, and the other of discovering just the information you need and for which you couldn't imagine someone having actually had the motivation to provide it.
So the Net and the Web may both be shaped as something mathematicians call a Graph, but they are at different levels. The Net links computers, the Web links documents.
Now, people are making another mental move. There is realization now, "It's not the documents, it is the things they are about which are important". Obvious, really.
Biologists are interested in proteins, drugs, genes. Businesspeople are interested in customers, products, sales. We are all interested in friends, family, colleagues, and acquaintances. There is a lot of blogging about the strain, and total frustration that, while you have a set of friends, the Web is providing you with separate documents about your friends. One in facebook, one on linkedin, one in livejournal, one on advogato, and so on. The frustration that, when you join a photo site or a movie site or a travel site, you name it, you have to tell it who your friends are all over again. The separate Web sites, separate documents, are in fact about the same thing -- but the system doesn't know it.
There are cries from the heart (e.g. The Open Social Web Bill of Rights) for my friendship, that relationship to another person, to transcend documents and sites. There is a "Social Network Portability" community. It's not the Social Network Sites that are interesting -- it is the Social Network itself. The Social Graph. The way I am connected, not the way my Web pages are connected.
We can use the word Graph, now, to distinguish from Web.
I called this graph the Semantic Web, but maybe it should have been Giant Global Graph! Any worse than WWWW? ;-) Now the "Semantic Web" term has been established for a long time, I'm not proposing to change it. But let's think about the graph which it is. (Footnote: "Graph" also happens to be the word the RDF specifications use, but that is by the way. While an XML parser creates a DOM tree, an RDF parser creates an RDF graph in memory.)
So, if only we could express these relationships, such as my social graph, in a way that is above the level of documents, then we would get re-use. That's just what the graph does for us. We have the technology -- it is Semantic Web technology, starting with RDF, OWL and SPARQL. Not magic bullets, but the tools which allow us to break free of the document layer. If a social network site uses a common format for expressing that I know Dan Brickley, then any other site or program (when access is allowed) can use that information to give me a better service. Un-manacled to specific documents.
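As a rough illustration of what "above the level of documents" could look like, here is a minimal Python sketch. It assumes each site exports its statements as (subject, predicate, object) triples. The person URIs and the `merge_graphs` and `friends_of` helpers are invented for the example and are not any site's real API; only the FOAF namespace is real. A real system would use an RDF library rather than plain tuples.

```python
# Hypothetical sketch: two sites each publish a few statements about the
# same person. The person URIs are invented for illustration.
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

ME = "http://example.org/people/alice#me"
DAN = "http://example.org/people/dan#me"
CAROL = "http://example.org/people/carol#me"

# Each "document" is just a set of (subject, predicate, object) triples.
photo_site = {
    (ME, FOAF_KNOWS, DAN),
    (DAN, FOAF_NAME, "Dan"),
}
travel_site = {
    (ME, FOAF_KNOWS, CAROL),
    (ME, FOAF_KNOWS, DAN),  # the same statement, found on a second site
}


def merge_graphs(*graphs):
    """Union the triple sets; identical statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def friends_of(graph, person):
    """Everyone `person` is stated to know, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == person and p == FOAF_KNOWS)


graph = merge_graphs(photo_site, travel_site)
print(friends_of(graph, ME))
```

Because both sites name the same people with the same URIs, the merged graph holds one friendship per friend no matter how many documents mention it, which is the re-use the paragraph above describes.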
I express my network in a FOAF file, and that is a start of the revolution. I blogged on FOAF files earlier, before the major open SNS angst started. The data in a FOAF file can be read by other applications. Photo-sharing, travel sites, sites which accept your input because you are a part of the graph.
The less inviting side of sharing is losing some control. Indeed, at each layer --- Net, Web, or Graph --- we have ceded some control for greater benefits.
People running Internet systems had to let their computer be used for forwarding other people's packets, and connecting new applications they had no control over. People making web sites sometimes tried to legally prevent others from linking into the site, as they wanted complete control of the user experience, and they would not link out as they did not want people to escape. Until after a few months they realized how the web works. And the re-use kicked in. And the payoff started blowing people's minds.
Letting your data connect to other people's data is a bit about letting go in that sense. It is still not about giving to people data which they don't have a right to. It is about letting it be connected to data from peer sites. It is about letting it be joined to data from other applications.
It is about getting excited about connections, rather than nervous.
In the short, what-can-I-code-up-this-afternoon-to-fix-this term, it is about other sites following the lead of my.opera.com, livejournal, advogato, and so on (list) also exporting a public RDF URI for their members, with what information the person would like to share. Right now, this blog re-uses the FOAF data linked to us to fight spam.
In the long term vision, thinking in terms of the graph rather than the web is critical to us making best use of the mobile web, the zoo of wildly differing devices which will give us access to the system. Then, when I book a flight it is the flight that interests me. Not the flight page on the travel site, or the flight page on the airline site, but the URI (issued by the airlines) of the flight itself. That's what I will bookmark. And whichever device I use to look up the bookmark, phone or office wall, it will access a situation-appropriate view of an integration of everything I know about that flight from different sources. The task of booking and taking the flight will involve many interactions. And all throughout them, that task and the flight will be primary things in my awareness, the websites involved will be secondary things, and the network and the devices tertiary.
I'll be thinking in the graph. My flights. My friends. Things in my life. My breakfast. What was that? Oh, yogourt, granola, nuts, and fresh fruit, since you ask.
Free Culture: Why buy the Amazon Kindle when you can give and get an OLPC XO-1 for the same price?
I just discovered Kindle: Amazon's New Wireless Reading Device. About $10 per e-book sounds ok, but $0.10 to put my own files on it?!?! It can read blogs like Slashdot and boingboing for as little as $.99 per month over the $399 purchase price. It comes with wikipedia. Say... that sounds familiar... where else can I get wikipedia on a device with a nice display that works in daylight...
Oh yeah! The OLPC XO-1. For the same $400 (+ shipping) you can get one and give one away.
Håkon brought one to the video panel at the W3C TPAC this month, while the voice of Lawrence Lessig was still ringing in my head: What have we done about it? he asked again and again in his powerful OSCON 2002 talk:
Lawrence Lessig: I have been doing this for about two years--more than 100 of these gigs. This is about the last one. One more and it's over for me. So I figured I wanted to write a song to end it. But then I realized I don't sing and I can't write music. But I came up with the refrain, at least, right? This captures the point. If you understand this refrain, you're gonna' understand everything I want to say to you today. It has four parts:
Creativity and innovation always builds on the past.
The past always tries to control the creativity that builds upon it.
Free societies enable the future by limiting this power of the past.
Ours is less and less a free society.
I don't sing all that well either, but I play a little guitar, so when Håkon walked into the HTML WG meeting as un-conference pitches were next on the agenda, I pitched a jam session. I dedicated the opening number, With a Little Help from My Friends, to Sam Ruby whose comment prompted me to watch the Lessig show before the trip. The InstantGig was "surreal (but awesome)" according to one account.
Håkon's pitch for open standard video for our cultural heritage inspired One laptop per Kyle, the story of getting an XO-1 for my 8-year-old boy instead of the Windows PC he says he wants in order to play the games that his friends all play. Before the trip he told me that he wants to build a web site with lots and lots of games and I thought "but you're just one little boy." But I think I get it now...
He has a new name, by the way: Burn, as in Rip, Mix, and Burn. Rip, after 1 year of musical training, can sound out the Mario theme on trombone or piano in an afternoon, something I can't do after 20 years of training my mediocre ear. And the middle child, Mix, is so charming that if you stop at a red light, he'll have a new friend before the light turns green.
I have one give-one-get-one package on order for Burn; if you're feeling like a patron of the arts and you want to see what happens if Rip and Mix get one too, feel free to send us a Christmas Card with a little something inside.
And look out for SwordPedestal.com, which Kyle picked out. It's only a dream now, but I have a hunch it may one day rival Nintendo for the hearts and minds of a few million people.
brainstorming, issue tracking, and problem reporting... with tabulator?
I want something more like the OmniOutliner experience... I want to brainstorm.
But when I'm done, I don't want to tediously copy and paste each field into tracker.
Clearly, I could write some python or XSLT to take OmniOutliner's XML and feed it to tracker afterward, but... can't we do better than that?
What if tabulator's UI were as smooth as OmniOutliner... and what if I could just push one button and get the toothpaste back in the tube, i.e. feed the outline into the tracker's REST interface?
p.s. why am I using emacs to write this? Apple Mail knows IntegrityIsJobOne, but in OS X 10.4, like iCal, it goes off into the weeds eating CPU for inexplicable reasons, and I don't invest debugging effort in stuff that isn't open source.
How do I feed this to breadcrumbs now? Does emacs have a markdown/ReStructuredText mode?
How about AtomPub support? I manually cut and pasted and cleaned up the line breaks. ugh!
I use Thunderbird on my PowerBook, but it's totally confused about offline operation. It goes to save to the drafts folder every now and again, but over IMAP... so if the net is flakey or down, (a) it doesn't actually save, and (b) it interrupts my drafting!!! OK... found the config option to use a local drafts folder under Tools/Account settings. (why not under preferences?) But Thunderbird doesn't do well filling the IMAP cache; I don't know to tell it to go offline until I've left the airport wifi, and at that point, it's too late to grab the mail I want to read. The Apple mailer does much better at using idle time to prefetch.
p.p.s. how do I use hReview and GRDDL to make the data in this gripe available as if it were a bugzilla entry? More on that to follow, I hope...
FOAF and OpenID: two great tastes that taste great together
As Simon Willison notes, OpenID solves the identity problem, not the trust problem. Meanwhile, FOAF and RDF are potential solutions to lots of problems but not yet actual solutions to very many. I think they go together like peanut butter and chocolate, creating a deliciously practical testbed for our Policy Aware Web research.
Our struggle to build a community is fairly typical:
- Oct 2005: breadcrumbs launches (and I wish for OpenID support)
- Dec 2005: Tim gets 400+ friendly comments on his first item.
- Jun 2006: Comments disabled due to overwhelming spam
In Dec 2006, Ryan did a Drupal upgrade that included OpenID support, but that only held the spammers back for a couple weeks. Meanwhile, Six Apart is Opening the Social Graph:
... if you manage a social networking service, we strongly encourage you to embrace OpenID, hCard XFN, FOAF and the other open standards around data portability.
With that in mind, a suggestion to outsource to a centralized commercial blog spam filtering service seemed like a step in the wrong direction; we are the Decentralized Information Group after all; time to eat our own cooking!
The policy we have working right now is, roughly: you can comment on our blog if you're a friend of a friend of a member of the group.
In more detail, you can comment on our blog if:
- You can show ownership of a web page via the OpenID protocol.
- That web page is related by the foaf:openid property to a foaf:Person, and
- That foaf:Person is
  - listed as a member of the DIG group in http://dig.csail.mit.edu/data, or
  - related to a dig member by one or two foaf:knows links.
The implementation has two components so far:
- an enhancement to drupal's OpenID support to check a whitelist
- a FOAF crawler that generates a whitelist periodically
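To make the policy concrete, here is a minimal Python sketch of how such a whitelist could be computed. It is not the crawler described above. It assumes the FOAF data has already been fetched and flattened into two plain dicts, `knows` (person to the people they foaf:knows) and `openids` (person to their foaf:openid pages), and that links are followed outward from group members. All names, identifiers and URLs are invented for illustration.

```python
from collections import deque


def build_whitelist(members, knows, openids, max_hops=2):
    """Return the OpenID URLs of group members and of anyone reachable
    from a member by at most `max_hops` foaf:knows links."""
    distance = {m: 0 for m in members}  # fewest hops from any member
    queue = deque(members)
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not follow links past the hop limit
        for friend in knows.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    return {url for person in distance for url in openids.get(person, ())}


def may_comment(verified_openid, whitelist):
    """The OpenID login proves page ownership; this only checks the list."""
    return verified_openid in whitelist


# Invented example data.
members = ["dig:alice"]
knows = {
    "dig:alice": ["ex:bob"],
    "ex:bob": ["ex:carol"],
    "ex:carol": ["ex:dave"],  # dave is three hops out, so excluded
}
openids = {
    "dig:alice": ["http://alice.example/"],
    "ex:bob": ["http://bob.example/"],
    "ex:carol": ["http://carol.example/"],
    "ex:dave": ["http://dave.example/"],
}

whitelist = build_whitelist(members, knows, openids)
print(sorted(whitelist))
```

A breadth-first walk is used so that each person is recorded at their shortest distance from the group, which keeps the two-hop limit exact even when the foaf:knows links form cycles.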
We're looking into policies such as You can comment if you're in a class taught by a DIG group member, but there are challenges reconciling policies protecting privacy of MIT students with this approach.
We're also interested in federating with other communities. The Advogato community is particularly interesting because
- The DIG group is pretty into Open Source, the core value of advogato.
- Advogato's trust metric is designed to be robust in the face of spammers and seems to work well in practice.
So I'd like to be able to say You can comment on our blog if you're certified Journeyer or above in the Advogato community. Advogato has been exporting basic foaf:name and foaf:knows data since a Feb 2007 update, but they didn't export the results of the trust metric computation in RDF.
Asking for that data in RDF has been on my todo list for months, but when Sean Palmer found out about this OpenID and FOAF stuff, he sent an enhancement request, and Steven Rainwater joined the #swig channel to let us alpha test it in no time. Sean also did a nice write-up.
This is a perfect example of the sort of integration of statistical methods into the Semantic Web that we have been talking about as far back as our DAML proposal in 2000:
Some of these systems use relatively simple and straightforward manipulation of well-characterized data, such as an access control system. Others, such as search engines, use wildly heuristic manipulations to reach less clearly justified but often extremely useful conclusions. In order to achieve its potential, the Semantic Web must provide a common interchange language bridging these diverse systems. Like HTML, the Semantic Web language should be basic enough that it does not impose an undue burden on the simplest web software systems, but powerful enough to allow more sophisticated components to use it to advantage as well.
Now we just have to enhance our crawler to get that data or otherwise integrate it with the drupal whitelist. (I'm particularly interested in using GRDDL to get FOAF data right from the OpenID page; stay tuned for more on that.) And I guess we need Advogato to provide a user interface for foaf:openid support... or maybe links to supplementary FOAF files via rdfs:seeAlso or owl:sameAs.
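The "Journeyer or above" policy mentioned earlier could be folded into the comment check along these lines. This is only a sketch under assumptions: it takes the certification levels as already extracted into a dict keyed by OpenID URL (how they would be read from Advogato's RDF is not shown), and the function names and example URLs are hypothetical.

```python
# Advogato certification levels, lowest to highest.
LEVELS = ["Observer", "Apprentice", "Journeyer", "Master"]


def meets_level(cert_level, minimum="Journeyer"):
    """True if cert_level ranks at or above minimum; unknown levels fail."""
    if cert_level not in LEVELS:
        return False
    return LEVELS.index(cert_level) >= LEVELS.index(minimum)


def may_comment_federated(openid, whitelist, cert_levels, minimum="Journeyer"):
    """Allow a comment if the OpenID is on the FOAF whitelist, or if its
    owner holds a sufficient certification in the federated community."""
    return openid in whitelist or meets_level(cert_levels.get(openid), minimum)


# Invented example data.
whitelist = {"http://alice.example/"}
cert_levels = {
    "http://erin.example/": "Master",
    "http://frank.example/": "Apprentice",
}

print(may_comment_federated("http://erin.example/", whitelist, cert_levels))
print(may_comment_federated("http://frank.example/", whitelist, cert_levels))
```

Treating an unknown or missing level as a failure keeps the check conservative: someone with no certification on record is admitted only through the FOAF whitelist.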
Privacy Lost?
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
The New York Times writes about new location-sharing services and worries about Privacy Lost. No doubt there will be all sorts of privacy questions associated with these services but it seems pretty clear that people are going to flock to location-based services of all kinds. Some data points:
- As the reporter, Laura Holson, points out, over 50% of mobile phones now sold are GPS-capable. That number is certain to rise to near 100% over time.
- Even without GPS, mobile networks are pretty good at inferring location by triangulation. In the US it is a requirement that mobile network operators deliver location data to 911 operators in real time, accurate to within 50-100 meters (depending on the application).
- Non-real-time location can be just as revealing, if not more. Social- and Semantic Web enthusiasts are hot to deliver all sorts of geotagging services which are likely to be just as revealing as GPS. Yahoo and Apple are featuring geotagging in their photo services. New social mapping services such as Platial and Flappr will provide every bit as much detail, though not necessarily in real time. Imagine when the parent of a teenager discovers that the teen has posted photos that are geo-tagged to show that they were taken at a house the teen was barred from visiting.
Location is hot. In a nice post on the significance of Google’s acquisition of mobile/microblogger Jaiku, Chris Messina writes:
The Web 2.0 Address Book isn’t really about how you connect to someone. It’s not really about having their home, work and secret lair addresses. It’s not about having access to their 15 different cell phone numbers that change depending on whether they’re home, at work, in the car, on a plane, in front of their computer and so on. It’s not about knowing the secret handshake and token-based smoke-signal that gains you direct access to send someone a guaranteed email that will bypass their moats of antispam protection. In the real world (outside of Silicon Valley), people want to type in the name of the recipient and hit send and have it reach the destination, in whatever means necessary, and in as appropriate a manner as possible. For this to happen, recipients need to provide a whole lot more information about themselves and their contexts to the system in order for this whole song and dance to work.
And the founder of Jaiku explains to the New York Times:
Petteri Koponen, one of the two founders of Jaiku, described the service as a “holistic view of a person’s life,” rather than just short posts. “We extract a lot of information automatically, especially from mobile phones,” Mr. Koponen said from Mountain View, Calif., where the company is being integrated into Google. “This kind of information paints a picture of what a person is thinking or doing.”
Privacy is not lost simply because people find these services useful and start sharing location. Privacy could be lost if we don’t start to figure out what the rules are for how this sort of location data can be used. We’ve got to make progress in two areas:
- technical: how can users’ sharing and usage preferences be easily communicated to and acted upon by others? Suppose I share my location with a friend but don’t want my employer to know it. What happens when my friend, intentionally or accidentally, shares a social location map with my employer or with the public at large? How would my friend know that this is contrary to the way I want my location data used? What sorts of technologies and standards are needed to allow location data to be freely shared while respecting users’ usage limitation requirements?
- legal: what sort of limits ought there to be on the use of location data?
  - can employers require employees to disclose real time location data?
  - is there any difference between real-time and historical location data traces? (I doubt it)
  - under what conditions can the government get location data?
There’s clearly a lot to think about with these new services. I hope that we can approach this from the perspective that lots of location data will be flowing around and realize that the big challenge is to develop social, technical and legal tools to be sure that it is not misused.
What to do about Google and Doubleclick? Hold Google to its word with some Extreme factfinding about privacy practices
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
The proposed merger between Google and Doubleclick has raised hackles among those concerned about potential domination of the online advertising marketplace (especially Microsoft) but even more worry among privacy advocates. After a short talk over the weekend with a friend, Peter Swire, a thoughtful and knowledgeable privacy scholar, I came to the view that regulators have to develop a new, robust and scalable means of keeping track of what large data handlers such as Google are actually doing with personal information. (While the conversation with Peter was quite stimulating, I don’t know whether or not he agrees with what I’ve written here.) The mechanisms the exist today to help users make informed choices and policy makers set sound directions are simply inadequate to answer the kinds of questions posed by the Google-Doubleclick deal. Instead of formal, highly negotiated and scripted hearings, we need to much more open, flexible process in which technical experts and the interested public are able to ask detailed questions about current practices. This is not a criticism of either US or EU regulators. On both sides of the Atlantic there is a fine tradition of EU Data Protection Commissions and the US Federal Trade Commission engaging in careful and thoughtful probes of privacy-sensitive activities. However, these processes often take too long, end up producing results that are quite out of date. A lot of energy goes into addressing last year’s data handling practices by which time the leading edge of the industry has moved on.
In the 1990s, the FTC under Christine Varney’s leadership pushed operators of commercial websites to post policies stating how they handle personal information. That was an innovative idea at the time, but the power of personal information processing has swamped the ability of a static statement to capture the privacy impact of sophisticated services, and the level of generality at which these policies tend to be written often obscures the real privacy impact of the practices described. It’s time for regulators to take the next step and ensure that both individuals and policy makers have the information they need.
So, as part of investigating the Google-Doubleclick merger, regulators should appoint an independent panel of technical, legal and business experts to help them review, on an ongoing basis, the privacy practices of Google. Key components of this process should be:
- expert panel made up of those with technical, legal and business expertise from around the world
- public hearings at which Google technical experts are available to answer questions about operational details of personal data handling
- questions submitted by the public and organized in advance by the expert panel
- staff support for the panel from participating regulatory agencies
- real-time publication of questions and answers
- an annual report summarizing what the panel has learned
The Internet open source and open standards communities have learned a lot over the last decade about how to use the Web to facilitate open, collaborative and often rapid development of new technologies. Web users reap the benefit of these open processes with easy access to high-quality software. Indeed, the very infrastructure of the Web and the Internet has been largely developed in this sort of open, extreme technology development process. Making public policy is different than developing technical designs, but the in-depth fact-finding that is needed to make sound policy decisions could benefit a lot from the open, collaborative, online information gathering and sifting process that we already use for Web technology development. Of course, this would not supplant the traditional policy making role of regulators. Rather, this process would serve as a fact-gathering process to help inform regulators. If everyone were feeling really ambitious, perhaps there could even be cooperation between the various regulators around the world with a commitment to study the results from this process. Despite differences in privacy policy in different parts of the world, there has been an impressive record of information cooperation, especially at the staff level, amongst various privacy regulators around the world. This could be a good next step to take in that direction.
By way of background, regulators in the US (Federal Trade Commission) and Europe (Article 29 Working Party representing the EU’s Data Protection Authorities) are investigating both antitrust and privacy questions regarding the merger. The key privacy concern seems to be that Google would take all of the personal information it has about users (search terms, IP addresses, contents of email, location from map applications, etc.) and combine it with the personal data that Doubleclick has (demographics, click stream data from ads served) and end up with a REALLY powerful private surveillance machine.
Google says that they care about their users’ privacy rights and would never abuse the newfound power they propose to acquire. According to Nicole Wong:
“User, advertiser and publisher trust is paramount to the success of our business and to the success of our acquisition…. We can’t imagine taking any actions that would undermine these relationships or the trust people have in using our products and service.” (Washington Post, 20 April 2007)
But the question is: how will either policy makers or users know that their trust is being violated or pushed to an extreme that they’re not comfortable with? Google, to its credit, sees the need to provide more information about what it does with personal data. In testifying before the United States Senate, Google’s chief lawyer, David Drummond, said:
We are also exploring other ways to create more transparency in our privacy practices and policies. We have a lot of information about our privacy practices on our website, and we’re making that information even more accessible to users by adding video-format “tutorials” to help users understand privacy issues online in plain English. The first of these video tutorials has been viewed about 43,500 times on YouTube, and the second video launched earlier this week and has already been viewed hundreds of times.
But will expanded privacy policies and videos really be enough to help users make sound decisions? Privacy regulations place a large, and I believe unsustainable, burden on users to learn the details of how services such as Google use their personal information and then weigh the current benefit of the service against the perceived privacy cost. There is mounting evidence that people will trade off a lot of future privacy risk in exchange for current convenience. I doubt that simply presenting users with more and more choices will help us arrive at a privacy policy that is sound in the long run. For example, some privacy advocates (EPIC) demand that Google be required to get explicit permission from all users who have Doubleclick cookies before the information associated with those cookies can be used together with personal information from Google. EPIC also asks that a lot of other information about Google’s information handling practices be made available to users, consistent with traditional privacy notions of notice and access to personal information.
Imagine the question that Google might ask when seeking permission from a user to associate their Doubleclick cookie with Google data in a mobile search application:
Google Dialog Box (FAKE): We’d like to use some of the demographic information we have about you to give you more accurate, convenient directions on your mobile phone. We will also use this data to target ads to you, just like we do with your GMail account. Click ‘Yes’ to agree or ‘No’ and then you’ll be asked to type the latitude and longitude of your ten favorite locations.
The query may not be so extreme, but the idea will be the same.
So my view is that users could use a bit of help making these decisions. That help ought to come in the form of some baseline rules about how personal information can and cannot be used. The days of saying that all users need is ‘free choice’ are over. Of course, the problems discussed here with respect to Google apply equally to many other services on the Web that handle personal information. Google and its merger proposal present a good opportunity to start figuring out some of these questions, but the process and the answers would be applicable to many others as well. In order to figure out what policies should actually govern how data is used, a careful and ongoing investigation of Google’s practices, with the help of the independent panel I have suggested above, would be a good place to start.

