Query Log Privacy Questions

8 May 2007
WWW2007 Workshop: Query Log Analysis: Social and Technological Challenges

Banff, AB

Daniel J. Weitzner
Decentralized Information Group
MIT Computer Science and Artificial Intelligence Laboratory

These slides: http://dig.csail.mit.edu/2007/Talks/0508-query-log-privacy/


1. The advancing privacy challenge

2. Help from the history of the evolution of privacy and technology

3. Possible responses to privacy in query logs: two unsatisfying approaches and one new possibility

Privacy Challenges in the Web's first decade

AT&T TSD 3600, Gmail

Characteristics of Today's Privacy Challenge

  1. Lots of personal data
  2. held by lots of parties
  3. huge increase in analytic capacity and data integration techniques
  4. little time and attention to manage uses
  5. unclear rules when data crosses boundaries

Overall, transparency is a major factor in query-log-related privacy risks.

What Privacy Isn't

Saltzer and Schroeder (The Protection of Information in Computer Systems):

“The term ‘privacy’ denotes a socially defined ability of an individual (or organization) to determine whether, when, and to whom personal (or organizational) information is to be released.”

Privacy's Boundaries - The Home

Historical foundations - the home

The home "The house of everyone is to him as his castle and fortress, as well for his defence against injury and violence, as for his repose...."
Semayne's Case, All ER Rep 62 (Michaelmas Term 1604)

Privacy's Boundaries - The Home Breached

Early telephones "Ways may some day be developed by which the Government, without removing papers from secret drawers, can reproduce them in court, and by which it will be enabled to expose to a jury the most intimate occurrences of the home.... Can it be that the Constitution affords no protection against such invasions of individual security?"
Olmstead v. United States, 277 U.S. 438, 467 (1928) (Brandeis, J., dissenting)

Privacy's Boundaries - New Privacy Protections

Public phone booth "The Fourth Amendment protects people, not places. What a person knowingly exposes to the public, even in his own home or office, is not a subject of Fourth Amendment protection.... But what he seeks to preserve as private, even in an area accessible to the public, may be constitutionally protected."
Katz v. United States, 389 U.S. 347 (1967)

Privacy's Boundaries - New Challenges

The home "It would be foolish to contend that the degree of privacy secured to citizens by the Fourth Amendment has been entirely unaffected by the advance of technology...."
Kyllo v. United States, 533 U.S. 27 (2001) (Scalia, J.)

One approach -- Information Hiding: Privacy Sensitive Data Analysis

Goal: construct a database protocol that limits information access according to a formal definition of privacy

Privacy Definition: indistinguishability of the individual from the community

Method: measure the epsilon-indistinguishability of a database query transcript

Differential Privacy, Cynthia Dwork, 33rd International Colloquium on Automata, Languages and Programming, ICALP 2006, Part II, pp. 1–12, 2006.
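
As a rough illustration of the approach above, the sketch below answers a single counting query over a query log and adds Laplace noise, a standard way to obtain epsilon-differential privacy for counts. The log layout (user, query pairs), the function names, the sample entries, and the choice to count distinct users rather than raw queries are all assumptions made for this example; it is not the protocol from the cited paper.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent exponential draws with the same
    # mean is Laplace-distributed with that mean as its scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_user_count(query_log, term, epsilon, rng=None):
    """Noisy number of distinct users who issued a query containing `term`.

    Adding or removing one user changes the true count by at most 1,
    so Laplace noise with scale 1/epsilon is used (assumed sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    matching_users = {user for user, query in query_log if term in query.lower()}
    return len(matching_users) + laplace_noise(1.0 / epsilon, rng)


# Hypothetical log entries, invented for this example.
example_log = [
    ("u1", "flu symptoms"),
    ("u1", "flu remedies"),
    ("u2", "Flu shot near me"),
    ("u3", "banff weather"),
]
print(noisy_user_count(example_log, "flu", epsilon=0.5))
```

Smaller values of epsilon add more noise and make any one user's presence harder to infer; repeated queries consume more of the privacy budget, which this sketch does not track.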

see also Sweeney's k-anonymity work
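
For comparison, k-anonymity asks that every released record be indistinguishable from at least k-1 others on its quasi-identifiers. A minimal sketch of that check follows; the field names (`zip`, `age`, `query`) and the sample records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values on all quasi-identifier fields (0 for an empty release)."""
    groups = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return min(groups.values()) if groups else 0


# Hypothetical, already-generalised release of query-log records.
released = [
    {"zip": "021**", "age": "30-39", "query": "flu symptoms"},
    {"zip": "021**", "age": "30-39", "query": "banff hotels"},
    {"zip": "021**", "age": "40-49", "query": "tax forms"},
    {"zip": "021**", "age": "40-49", "query": "privacy law"},
]
print(k_anonymity(released, ["zip", "age"]))  # 2
```

Treating the query text itself as a quasi-identifier drops k to 1 in this sample, which hints at why free-text query logs are hard to anonymise this way.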

Questions upon the Success of Privacy Sensitive Data Analysis

A privacy-safe zone: Privacy sensitive data mining establishes a boundary, which, if respected, assures no privacy risk to the individual.

  1. how do you know that data usage remains within the privacy-safe zone:
    • over time?
    • across an institution?
  2. what legal rules apply outside the privacy-safe zone?

Another approach -- Consent and its limitations

IRBs and the basic privacy notice-and-consent model. Can today's privacy model (EU or US) be sufficient going forward?

Key will be purpose limitation, but we have a dilemma...

Dilemma: limited individual and regulatory capacity to control escalating data collection.

Current result of consent dilemma + increased inference power: strict about what's collected but loose about usage

Better result: loose about what is collected and strict about usage

Toward Another Approach...

  1. How many of you believe you are subject to law (any law)?
  2. How many of you follow (most) laws? [exclude speed limits]
  3. How many of you read all the laws to which you believe you are subject?
  4. How many have been to a court of law?

Information Accountability Through Policy Aware Systems

General view (amongst the 'digerati'): law has to catch up with new technology.

General question: how will laws catch up?

My question: how will the Web finally catch up with the 'real world'? In everyday life, the vast majority of 'policy' problems get worked out without recourse to the legal system.

Design goal: instrument the Web to provide seamless social interactions which allow us to avoid the legal system the way we do in the rest of life

Global perspective: In the shift from centralized to decentralized information systems we see a general trend:

ex ante policy enforcement barriers -> policy description with late binding of rules for accountability
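
One way to picture the shift described above is a system that records each use of data together with its stated purpose and checks those records against a policy only later, at audit time. The sketch below is a toy under that assumption; the `UsageEvent` fields, the policy layout, and every actor and purpose name are hypothetical and not drawn from any deployed system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    actor: str          # who used the data
    data_category: str  # what kind of data was used
    purpose: str        # the purpose stated at the time of use


def audit(events, policy):
    """Return the events whose purpose the policy does not permit.

    The policy is supplied at audit time rather than enforced at access
    time, so rules can be bound (or changed) after the data was used.
    """
    return [
        event
        for event in events
        if event.purpose not in policy.get(event.data_category, set())
    ]


# Hypothetical usage log and policies, invented for this example.
events = [
    UsageEvent("search_team", "query_log", "ranking_research"),
    UsageEvent("ad_partner", "query_log", "marketing"),
    UsageEvent("fraud_team", "query_log", "spam_detection"),
]
policy = {"query_log": {"ranking_research", "spam_detection"}}
stricter_policy = {"query_log": {"spam_detection"}}

print([e.actor for e in audit(events, policy)])           # ['ad_partner']
print([e.actor for e in audit(events, stricter_policy)])  # ['search_team', 'ad_partner']
```

Nothing here blocks access up front; the design choice is that collection stays loose while each use leaves a record that can be held to account.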

Discussion and More Information

For more information see:

Work described here is supported by the US National Science Foundation Cybertrust Program (05-518) and ITR Program (04-012).

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.