Danny Weitzner's blog
Data mining for (and about) the rest of us
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
The applefritter blog has a very clever entry showing how easy it is to data mine information publicly available on the Web in order to learn all sorts of revealing things about people. Tom Owad shows in his post, Data Mining 101: Finding Subversives with Amazon Wishlists | Applefritter, how it is possible to piece together relatively simple web technology to create detailed profiles of people who, in this case, have public wishlists on Amazon.com. Wishlist data can be combined with other services (Google maps, geocoding tools, etc.) to give a view of what sort of books people are interested in reading and where they live. (I was at first a bit surprised that wish lists are public by default, but it makes some sense. If you have a wish list, you want the people who might fulfill your wish to have access to it.)
Owad hastens to add that his exercise is not meant as an attack on Amazon; rather, his goal is to underscore the importance of putting limits on how the government can actually use the information that it gathers from a variety of sources. I agree with his view and have said so elsewhere, but think that this relatively simple hack proves far more. Owad’s hack (a term I use with only its positive connotations) points out the now widely-dispersed power to infer a lot about individuals based on information that we all leave behind on the Web. As web services, AJAX and Semantic Web technology become more popular on the Web, this trend will only increase geometrically. Some react to this by looking for ways to minimize the trail of data we leave. I believe that to expect people to do that is neither fair nor realistic. We want to encourage user services like wishlists and innovative uses of that data beyond its original intent, provided that these uses don’t contravene the data subjects’ expectations and provided everyone feels confident that there is a legal framework that protects against abuse. As we begin to rethink privacy laws for both the government and private sector, we should think more about how to set generally-accepted rules about how data will be used.
Identity theft experience highlights importance of use restrictions
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
The actual harm done by identity theft has some important clues for those concerned about privacy protection in environments where information flows around through multiple channels. In an excellent account of an identity theft victim’s experience (which began well before the Web, viruses, spam, etc.), Tom Zeller of the New York Times (Waking Up to Recurring ID Nightmares - New York Times, 9 Jan 2006) illustrates the need to focus on controlling the use of information, rather than trying to control collection. Zeller quotes a source from an identity verification company:
“They say once the horse is out of the barn, why bother closing the door?” Mr. Waller said, referring to the millions of bits of consumer data already leaked into the black market. “But even if someone has your Social Security number, if you can prevent them from using it, that’s the solution we should be driving towards.”
There certainly is some information that should never be collected in the first place, but this story illustrates that a key aspect of privacy protection in more open information environments will be creative mechanisms for controlling, both legally and technically (following the users’ own choices and preferences), how information is used. I’ve been trying to understand how to do this in recent speaking and writing. This is all still a work in progress.
How much bandwidth is enough?
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Om Malik has a nice post, “Need For Speed: How Real?” asking how much IP bandwidth is enough. He writes:
After years of being stuck in the slow lane, the US consumers are finally going to get a massive speed upgrade and taste the true broadband for the first time. From a 512 Kbps world to 6 Mbps, then 8 and soon 15 Mbps. it seems the future has finally arrived. And with that, the question. how much speed is enough? Can we the consumers really tell the difference between 15 and 30 Mbps?
[Research shows, he says, that as] we increase the speed, the real impact of the speed on what we do with it is marginal. Can your eyes tell the difference between a web-page loading in one second or 0.27 seconds. I guess not. If you can download a music file in 1.08 seconds, does that really mean you will be buying music all the time. No you perhaps will be buying better quality, and perhaps marginally more music. There is the other option, but its just easier to pay! Sure at 30 Mbps you can download DVD quality The Bourne Identity in 11 minutes, but its still going to take you 2 hours to watch it.
[..]
Don’t get me wrong…. I will upgrade, and hope the experience improves, but at some point, we need the applications that truly harness this speed come-along and are allowed to thrive. Not likely in the “we will control the net” attitude adopted by the incumbents. Even in truly immersive multiplayer games, its the latency, not the speed that matters.
The real bandwidth question is when are going to see an increase in the uplink speeds?
I’m actually not sure that this is the real question, though. It’s always fascinating to ask how much (of anything) is enough, but the question seems (a) unanswerable, and (b) perhaps not the most important one to be thinking about, at least from a public policy standpoint. As the so-called Net Neutrality debate heats up again in the US and around the world, policy makers will have to think about how to trade off the need for open, non-discriminatory networks against the desire for higher and higher capacity broadband pipes to the user. Perhaps there are ways to avoid an either-or choice, but I’d suggest that the development of the Internet has rested at least as much on openness as on bandwidth.
There’s an interesting discussion thread on this topic on Dave Farber’s Interesting People list.
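The download times in Malik's post can be checked with simple arithmetic. The sketch below is only an illustration: the file sizes are hypothetical values back-computed from his 30 Mbps figures (a song of roughly 4 MB, a movie of roughly 2.5 GB), not numbers he states, and the function name `transfer_seconds` is made up for this example. Protocol overhead and latency are ignored.

```python
def transfer_seconds(size_megabytes: float, rate_mbps: float) -> float:
    """Seconds to move a file of the given size over a link of the given rate.

    Sizes are in megabytes (1 MB = 8 megabits); rates are in megabits per second.
    Protocol overhead and latency are ignored.
    """
    if rate_mbps <= 0:
        raise ValueError("rate_mbps must be positive")
    return size_megabytes * 8 / rate_mbps


# Hypothetical file sizes, back-computed from the quoted 30 Mbps figures.
SONG_MB = 4.05
MOVIE_MB = 2475

# The downlink speeds mentioned in the quoted post, in Mbps.
for rate in (0.512, 6, 8, 15, 30):
    song_s = transfer_seconds(SONG_MB, rate)
    movie_min = transfer_seconds(MOVIE_MB, rate) / 60
    print(f"{rate:>6} Mbps: song {song_s:8.2f} s, movie {movie_min:7.1f} min")
```

Under these assumptions, each doubling of speed halves the wait, but once the wait is already short the absolute saving is small, which is the diminishing-returns point Malik makes.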
Judge Posner on privacy and government data mining
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Writing on the Washington Post Op-Ed page this week (Our Domestic Intelligence Crisis, 21 December 2005), Judge Richard Posner asserts that there is no privacy threat from the mere collection of personal information by the government. He writes:
These programs [such as Pentagon’s Counterintelligence Field Activity (CIFA)] are criticized as grave threats to civil liberties. They are not. Their significance is in flagging the existence of gaps in our defenses against terrorism. The Defense Department is rushing to fill those gaps, though there may be better ways.
The collection, mainly through electronic means, of vast amounts of personal data is said to invade privacy. But machine collection and processing of data cannot, as such, invade privacy. Because of their volume, the data are first sifted by computers, which search for names, addresses, phone numbers, etc., that may have intelligence value. This initial sifting, far from invading privacy (a computer is not a sentient being), keeps most private data from being read by any intelligence officer. (emphasis added)
This gets at the heart of the question about data mining. If you believe that privacy is, in Justice Brandeis’ words, “the right to be let alone,” then you probably don’t agree with Judge Posner. Even if the government does nothing with your data, the simple act of collecting and possessing it can hardly be said to be ‘leaving you alone’. But if you’re more concerned with privacy as the right to control how personal information is used and know that it will be used according to a clear set of rules, constitutional and statutory, then perhaps you’re more prepared to accept Posner’s view.
Another cellphone location tracking case: this time the government need not meet 4th Amendment probable cause requirement
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
I learned from an Ars Technica item of yet another federal court ruling on cellphone location tracking (US District Court for the Southern District of New York, Gorenstein, Mag., Opinion and Order 05 Mag. 1763, 21 December 2005). This time, the magistrate found that the government could gain access to at least a particular type of location information (cell site information) without satisfying the full 4th Amendment probable cause requirement.
The result in this new opinion by the fourth federal magistrate writing in as many months takes the opposite view from the first three. He concludes that the location information (at least cell tower information) is available to the government without satisfying the 4th Amendment probable cause standard. All they need to do is to show “specific and articulable facts” that the information sought is relevant to an ongoing criminal investigation. This is less than the full 4th Amendment standard, but entails significant independent judicial oversight of the privacy intrusion, unlike the virtually non-existent oversight required to get ‘dialed number’ information. (See my earlier post on this subject for more details.) Magistrate Gorenstein comes to this conclusion through what I believe may be a mistaken reading of the relevant statute. The reasoning is so convoluted that I hesitate to even try to summarize it here. In short, he concluded that a statute passed in 1994, the Communications Assistance for Law Enforcement Act (Pub. L. 103–414, 47 USC 1001, et seq.), implied, though did not explicitly state, that the lower standard was applicable to cell site location data. I don’t believe that this is what the statute actually says, and the Congressional Committee that wrote the statute didn’t either. The committee report (aka legislative history) explains that CALEA:
Expressly provides that the authority for pen registers and trap and trace devices cannot be used to obtain tracking or location information, other than that which can be determined from the phone number. H. Rep. No. 103-827.
Judges sometimes do have good arguments for interpreting statutes as they see them, even if the interpretation contradicts the ‘legislative history’. After all, the Congress writes the statute, so they’re expected to get it right so that we can all understand it on its ‘face’, without having to resort to extra explanations.
All that said, there’s reason to be other than completely gloomy about this result from a privacy perspective. First, this magistrate specifically limited his ruling to situations in which the government only seeks information about the cell site with which the target cell phone is actually communicating. That reveals someone’s location with a 10 mile to 2000 ft radius (depending on the density of cells) but does not enable the government to instantaneously ‘triangulate’ a person’s location to a finer resolution. (It is possible to infer a rough map of where the person travels, however.) And second, it’s generally the case that when lots of trial courts start coming to opposite conclusions on the same or related questions, there’s greater pressure on appeals courts (who have broader jurisdiction) to resolve the differences and settle on one common interpretation of the law. That’s more likely to happen on this issue now that there is disagreement. (There are some legal technicalities that make a quick appeal difficult, but it’s likely to happen sooner or later.)
A report on the first stage of this case is on Declan McCullagh’s Policy Blotter (2 September 2005).
“Cold Hits” - a new frontier in DNA profiling
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Last week the Washington DC Court of Appeals ruled (Case no. 05-CO-333, 15 December 2005, Washington, CJ) that it is permissible for the prosecution to use DNA evidence matching DNA collected from a crime scene against a database of DNA samples collected from a large population (in this case, 100,000+ criminal offenders in Virginia). The legal issue in the case on appeal was limited to the question of whether there was sufficient scientific consensus as to the method of assessing the statistical reliability of this procedure to justify the introduction of this evidence in court. In particular, the court asked whether there was agreement among experts on how to assess the ‘random match probability,’ the probability that the DNA match identified could have picked out the wrong person. However, the implications of the holding are far reaching in that the path appears to be more clear for matching unidentified DNA found at a crime scene against large databases of DNA data.
The underlying case giving rise to the appeal involves the murder of a man in Washington, DC. Initial investigation of the crime yielded a suspect in a related robbery, but failed to generate sufficient evidence to charge anyone with murder. In the course of investigating the scene, the police collected blood from the scene and then matched the DNA found in the blood against a database of Virginia criminal offenders. This profiling yielded a ‘cold hit,’ identifying a man named Raymond Jenkins. The trial court, however, refused to allow the prosecution to introduce this match in evidence, so the government appealed, bringing the case to the Washington, DC Court of Appeals, DC’s highest court. The Court of Appeals determined that the trial court made a mistake in blocking the introduction of the ‘cold hit’ evidence, so the case will now go back to the trial court with the government being able to introduce that match against the defendant. As I wrote above, there was no basis for the appeals court to address the larger policy implications of this sort of DNA matching, but this case does mark an important expansion of DNA profiling powers both in DC and likely in the rest of the country.
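To see why the statistics of a ‘cold hit’ are debated, it can help to look at how database size interacts with a per-person random match probability. The sketch below is a textbook-style illustration only, not the method used by the court or by any expert in the case: the per-person probability of one in a billion is a made-up figure, the function names are invented for this example, and the calculation assumes each database entry is an independent trial.

```python
import math


def expected_coincidental_matches(match_probability: float, database_size: int) -> float:
    """Expected number of unrelated people in the database who match by chance."""
    return match_probability * database_size


def prob_at_least_one_coincidence(match_probability: float, database_size: int) -> float:
    """Chance that a search of the whole database yields at least one chance match.

    Computes 1 - (1 - p)**n using log1p/expm1 so tiny probabilities keep precision.
    Assumes each entry is an independent trial, which is a simplification.
    """
    if not 0 <= match_probability < 1:
        raise ValueError("match_probability must be in [0, 1)")
    return -math.expm1(database_size * math.log1p(-match_probability))


# Hypothetical per-person random match probability (NOT a figure from the case).
P_RANDOM_MATCH = 1e-9
# Database size of the order mentioned in the post (100,000+ offenders).
DATABASE_SIZE = 100_000

print(expected_coincidental_matches(P_RANDOM_MATCH, DATABASE_SIZE))
print(prob_at_least_one_coincidence(P_RANDOM_MATCH, DATABASE_SIZE))
```

The point of the sketch is only that the chance of some coincidental hit grows with the size of the database searched, which is one reason the choice of statistic to present in court matters.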
Catching up on cell phone location tracking law
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Catching up on my reading after a busy few months, I’ve been looking through 3 recent opinions from US Magistrate Judges (the folks who generally have the first pass on government requests for court orders for wiretaps and other types of electronic surveillance). Since the middle of October, Federal courts in Maryland (05-4486), Texas (H-05-557M), and New York (M 05-1093 (JO)) have rejected government requests to get real time cell phone location tracking information because the government, so far, has not been willing to meet the Fourth Amendment ‘probable cause’ standard for justifying this intrusion.
The key legal issue in all of these cases is what level of judicial oversight is required to compel the disclosure of information sought by the government. Answering this question has turned on whether the data sought is “transactional information” generated by the mobile phone network (in which case it would be regulated under 18 USC 2703 sections (b) and (d)) or is equivalent to a mobile tracking device regulated under 18 USC 3117. If it is transactional data then it is available to the government under a so-called intermediate standard, less than probable cause, but more than a simple request. Under this rule, the government has to give a clear and articulable reason why the data is relevant to an ongoing investigation. It can’t just be going on a fishing expedition. Though the government advocated this position in all three cases, all three courts found that there must be a showing of probable cause. (This is all personally pretty interesting to me because I did a lot of work advocating for this new protection for transactional data back in 1994. At the time we (and I believe the Congress) were thinking about transactional information such as email and web access logs. We were not thinking about real time location data. My guess is that Congress will have to come back to this question to settle it.)
Kudos to EFF (where I’m proud to say I once worked) for filing an amicus curiae brief in the New York case and for bringing attention to this issue.
Added 22 December 2005: See this update
Princeton campus network exposes student identities
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
A group of Princeton University students have just launched an effort to expose privacy risks in the operation of the campus network. They’ve shown that each user’s username is publicly visible to anyone through a reverse DNS lookup. I hope the Princeton IT group will fix this but wonder how many other local networks are configured this way.
(One quibble: I found it ironic that the web site put up by these students fails to post its own P3P policy (the one Web standard for informing users about site privacy policies) and doesn’t even have a human-readable privacy policy. I hope they’ll fix both of these privacy gaps.)
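For readers unfamiliar with reverse DNS, here is a minimal sketch of what such a lookup involves, using Python's standard `socket.gethostbyaddr`. The address and hostname in the stand-in resolver are hypothetical (drawn from the reserved documentation ranges), and nothing here reflects how Princeton's network is actually configured; the helper names `reverse_lookup` and `fake_resolver` are invented for this example.

```python
import socket


def reverse_lookup(ip_address, resolver=socket.gethostbyaddr):
    """Return the hostname that reverse DNS reports for an IP, or None if there is none.

    `resolver` defaults to the real system lookup; it is injectable so the
    function can be exercised without touching the network.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip_address)
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None
    return hostname


def fake_resolver(ip_address):
    """Stand-in resolver with one hypothetical record, for illustration only."""
    records = {"192.0.2.10": ("jdoe.dorm.example.edu", [], ["192.0.2.10"])}
    if ip_address not in records:
        raise socket.herror(1, "Unknown host")
    return records[ip_address]


# With the stand-in resolver, a username-like label appears in the hostname.
print(reverse_lookup("192.0.2.10", resolver=fake_resolver))
print(reverse_lookup("192.0.2.99", resolver=fake_resolver))
```

If a network names each machine after its owner, anyone who sees the machine's IP address (in a web server log, for example) can run this kind of query and recover the name.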
Secret Laws: How does the cryptographic ‘law’ against security by obscurity apply to laws in a democracy?
The original appearance of this entry was in Danny Weitzner - Open Internet Policy
Last week, John Gilmore had a chance to convince the 9th Circuit Court of Appeals that he should be allowed to board commercial aircraft without showing ID. And perhaps more importantly, he argues that if there is a government rule requiring an ID, then the full extent of that rule/law should be made public. Gilmore claims that the rule requiring presentation of ID is an unreasonable search under the 4th Amendment and is unconstitutionally vague (violating his 5th Amendment due process rights) because the law isn’t even publicly available. The Department of Justice (defendant in this case) counters that courts have already accepted that searches at airports are acceptable under the 4th Amendment (see US v. Davis, 482 F.2d 893 (CA9, 1973)) and that the rule requiring searches need not be made public. While the Justice Department has not acknowledged the existence of any rules, it did offer to present something to the judges (though not to Gilmore) in a secret session.
There’s certainly a fundamental 4th Amendment question here, but what about our right to know the laws and rules under which we’re governed? In the world of Internet security, cryptographers generally accept Kerckhoffs’ law, holding that the security of a cryptographic algorithm must not be dependent on the secrecy of the ciphering method. That is, the mathematical process used in any coding system must be publicly visible. (Of course there will be secret keys that make the algorithm work; these need not be made public.) Kerckhoffs asserted this view because he believed that an algorithm should be strong enough that it remains secure if an adversary discovers it. Modern computer security thinking has extended this law to the more general principle that security mechanisms ought to be able to be subjected to public scrutiny so that we have the best chance of catching unintended flaws in the mechanism. So where does this leave these ID rules? Is it enough that we simply know they exist? (Gilmore and the rest of us know the basics of their operation from going through airport screening. We know we can’t get on a plane without showing ID.) Or is there some practical and/or principled reason why we should know the full extent of the rules?
The government asserts that even if the rule requiring presentation of ID exists, citizens have no constitutional right to see it. The trial court accepted the government’s argument that such a rule is a law enforcement procedure and as such need not be disclosed. The court reasoned that the substance of the rule is quite apparent by the practice of requiring ID presentation so there’s no need to see the details. The Justice Department’s brief likens the rule (if it exists) to a drug dealer profile used by border guards to catch potential drug smugglers. This is a rule to which we’re all subject in that when we cross the border manifesting traits that are on the profile, we’re going to be stopped and searched, but we have no right to see the actual profile. In fact, most people would probably agree that disclosing the details of the drug dealer profile could harm law enforcement efforts without any significant enhancement of civil liberties.
Gilmore, on the other hand, argues that in a free society there are simply no secret laws. In the case of ID checks or other law enforcement rules, how much transparency is enough?
Hear the oral arguments through this WMA link from the 9th Circuit Court of Appeals website.
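For readers who want to see Kerckhoffs’ law in practice, here is a small sketch using Python's standard library. It uses HMAC-SHA256, which is a message authentication code rather than a cipher, chosen simply because it ships with Python; the message text and helper names are made up for the example. The algorithm is completely public; the only secret is the key.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag with a fully public algorithm (HMAC-SHA256)."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag in constant time; succeeds only for the key that produced it."""
    return hmac.compare_digest(make_tag(key, message), tag)


# The key is the only secret; everything else could be printed on a billboard.
secret_key = secrets.token_bytes(32)
message = b"hypothetical message to protect"
tag = make_tag(secret_key, message)

# The key holder can verify the tag.
print(verify_tag(secret_key, message, tag))

# An adversary who knows the algorithm but must guess the key gets nowhere.
guessed_key = secrets.token_bytes(32)
print(verify_tag(guessed_key, message, tag))
```

Publishing the procedure costs nothing here because the protection never depended on hiding it, which is the analogy to published laws that the post is probing.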
9/11 Commission Progress report highlights information policy shortcomings
This entry was originally published at Danny Weitzner - Open Internet Policy
Earlier this week the former (but still active) 9/11 Commission issued a report card on progress in implementing the Commission’s recommendations. The summary gives respectable marks to various military and law enforcement objectives, but not even ‘Gentleman’s Cs’ to information policy priorities such as better information sharing between government agencies (grade = D), putting a privacy oversight board in place (grade = D), effective airline passenger pre-screening (grade = F) and developing real privacy guidelines (grade = D) for all this much-discussed information sharing. These are all areas that require not only public policy leadership by Congress and the Administration, but also creative technical solutions that enable transparency and accountability of the use of personal information in the important, but very privacy sensitive, area of homeland security.

