Beyond Secrecy: New Privacy Protection Strategies for Open Information Spaces

IEEE Internet Computing Technology & Society column

Daniel J. Weitzner <djweitzner@csail.mit.edu>
Principal Research Scientist
Decentralized Information Group
MIT Computer Science and Artificial Intelligence Laboratory

This document on the Web [http://dig.csail.mit.edu/2007/09/ieee-ic-beyond-secrecy-weitzner.html]

A later version of this column appears in IEEE Internet Computing, Sept/Oct 2007

Privacy is the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.

In 1967, Alan Westin set in motion the foundations of what most Western democracies now think of as privacy when he published his book, Privacy and Freedom.[1] His careful collection of sociological, legal, and historical perspectives on privacy in modern society grounded the concept in basic notions of human dignity and democracy at a time when more and more commercial and governmental institutions were using large mainframe "databanks" to collect and process personal information. People worried that human freedom would erode or that governments would tend toward tyranny, becoming tempted to misuse their newfound power over private data. Computer scientists at the time also worried about the privacy and security risks associated with computer systems that held increasingly sensitive, valuable data. In their seminal paper on information security,[2] Jerry Saltzer and Michael Schroeder adopted verbatim Westin's definition, intending for it to guide engineers in a direction that could produce privacy-protective systems.

Privacy Defined

At the core of Westin's definition is the understanding that privacy is the ability to control who has access to information and to whom that information is communicated. Given the computing technology of the time, we can see this definition's appeal as a means of controlling privacy intrusions that could result from automated data processing. Computing power was expensive, highly centralized in a relatively small number of institutions, and accessible to a select few entities. Thus, following Westin's view that privacy is synonymous with limited information flow, we can assume that if privacy policy could successfully contain the flow of personal information from large data banks, then computers' intrusive power would be effectively controlled.

Reflecting the views of Westin, Saltzer, and Schroeder, privacy became synonymous with secrecy in computer and network system designs -- that is, until today. Computer science researchers considered encryption algorithm development to be a boon to privacy. As cryptographer David Chaum explained to a US congressional committee in 1995,

Privacy technology allows people to protect their own information and other interests, while at the same time it maintains very high security for organizations. Essentially, it is the difference between ... a centralized system with disenfranchised participants (like the electronically tagged animals in feedlots) and ... a system where each participant is able to protect its own interests (like buyers and sellers on a town market square). [See http://web.archive.org/web/19970418074834 or www.digicash.com/publish/testimony.html.]

When the US government erected export-control barriers against the cryptographic technology used to ensure data confidentiality, a coalition of privacy advocates joined IT companies to get those barriers removed and enable widespread adoption of encryption for privacy protection. This pro-privacy coalition organized itself under the slogan "Security and Freedom through Encryption" and eventually influenced Congress to pass the SAFE Act in 1997. With cryptographically secured secrecy, humans could have privacy and freedom, following the definition that Westin created years before. Now, on the 40th anniversary of Privacy and Freedom's publication, let's consider present-day privacy challenges and see whether our basic privacy goals are well served by protection through secrecy.

This summer, the New York City police department made headlines with an announcement that it's adding 100 extra video cameras to the more than 3,000 (mostly private) video surveillance lenses that peer down on the comings and goings of those passing through lower Manhattan.[3] This is newsworthy in part because it will let police automatically read license-plate numbers, but, beyond that, the story speaks to our growing unease about how transparent our lives are becoming. We know that private and governmental organizations are collecting more and more data about us, and we're aware that some have advanced serious arguments regarding why this information must be available for these sectors' purposes -- arguments involving counterterrorism, fraud detection, and so on. However, we're also looking for some boundary behind which we can safely retreat. This boundary -- privacy -- is what protects us from overreaching corporations and acts to insure us against totalitarian tendencies in government.

A key challenge to privacy comes from the convergence of powerful computer technology, increases in video surveillance, GPS devices on our cell phones, and the rapidly declining cost of computer storage. In the past, the high cost of data storage and processing worked in privacy's favor, but in the Google age, private citizens, in addition to police and corporations, have at their fingertips supercomputing power that lets them find what they're looking for amidst revealing, ever-growing data stores.

To see how computational power over images available to all individuals and institutions is growing, consider Google's new "Street View" service: photographs of actual and relatively current street scenes in major cities accessible alongside search engines' standard mapping features. The views are so detailed, they not only help us recognize the building where we're supposed to meet a friend, they also let us see people heading into adult book stores or even watch a robbery in progress. Google announced that it will limit access to images taken outside women's crisis shelters to minimize the risk that abusive spouses will find their victims. But there's no way to hide every image that might contain sensitive or harmful information.

Social Challenges

The most fundamental challenge to 20th century privacy laws is more social than technical -- adding to the stream of personal data is a new wave of user-generated content in the form of blogs, YouTube videos, eBay transactions, and social networking sites such as MySpace. For various reasons, tens of millions of people worldwide reveal sensitive details about themselves to friends, merchants online or off, or even the public at large. Some suggest that we're moving toward a "post-privacy" transparent society. However, we shouldn't assume that those who share personal information with increasing transparency have ceded all privacy rights. Similarly, it would be unfair to assume that citizens in lower Manhattan have given up their privacy rights just because they don't put paper bags over their heads when they walk to work.

Privacy protection in an era in which information flows more freely than ever will require increased emphasis on laws that govern how we can use personal data, not just who can collect it or how long they can store it. Many of our current views of privacy are based on controlling access to information. We believed that if we could keep information about ourselves secret, prevent governments from accessing emails, and so on, then we would have privacy. In reality, privacy has always been about more than just confidentiality, and looking beyond secrecy as the sine qua non of privacy is especially important now. Recognizing that surveillance videos will exist to protect private property, spot terrorist threats, or simply enforce traffic laws, we must instead turn our attention to developing a policy consensus about how organizations can use this data.

New privacy laws should emphasize usage restrictions to guard against unfair discrimination based on personal information, even if it's publicly available. For instance, a prospective employer might be able to find a video of a job applicant entering an AIDS clinic or a mosque. Although the individual might have already made such facts public, new privacy protections would preclude the employer from making a hiring decision based on that information and attach real penalties for such abuses.

Computing Solutions

If we can no longer reliably achieve privacy policy goals by merely limiting access to information at one point on the Web, then what system designs will support compliance with policy rules? Exercising control at one point in a large information space ignores the very real possibility that the same data is either available or inferable from somewhere else. Thus, we must engineer policy-aware systems based on design principles suitably robust for Web-scale information environments. Here, we can learn from the design principles that enabled the Internet and the Web to function in a globally coordinated fashion without having to rely on a single point of control.

At MIT, my research group is investigating designs for information systems that can track how organizations use personal information to encourage rules compliance and enable what we call information accountability, which pinpoints use that deviates from established rules.[4] We should put computing power in the service of greater compliance with privacy rules, rather than simply allowing ever more powerful systems to be agents of intrusion.

Information accountability will emerge from the development of three basic capabilities: policy-aware audit logging, a policy language framework, and accountability reasoning tools. A policy-aware transaction log will initially resemble traditional network and database transaction logs, but it will also record data provenance, annotations about how the information was used, and the rules associated with that information. Policy-aware logging, not unlike TCP/IP packet routing, is a relatively dumb operation that is oblivious to the semantics of the subject of log entries. What matters is that events are logged. Other tools will be responsible for analysis and action based on the logs' content. Assessing policy compliance over a set of transactions logged at a heterogeneous set of endpoints by numerous human actors requires some common framework for describing policy rules and restrictions with respect to the information being used.
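To make this division of labor concrete, here is a minimal Python sketch, not a description of any system my group has built. The names LogEntry, PolicyAwareLog, and find_violations, along with every field name and sample value, are hypothetical and chosen only for illustration. The log records provenance, a usage annotation, and the attached rules without interpreting them; a separate analysis function then flags uses that fall outside those rules.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    """One policy-aware log record (all field names are illustrative)."""
    actor: str                   # who used the data
    data_id: str                 # which piece of information was used
    provenance: str              # where the data came from
    purpose: str                 # annotation: how the information was used
    allowed_purposes: frozenset  # rules attached to the information
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class PolicyAwareLog:
    """Append-only log: it records events and never interprets them."""

    def __init__(self):
        self._entries = []

    def record(self, entry):
        self._entries.append(entry)

    def entries(self):
        # Return a tuple so callers cannot modify the log through it.
        return tuple(self._entries)


def find_violations(log):
    """Separate analysis step: flag uses outside the attached rules."""
    return [e for e in log.entries() if e.purpose not in e.allowed_purposes]


rules = frozenset({"treatment", "billing"})  # assumed rule set for the example

log = PolicyAwareLog()
log.record(LogEntry("clinic", "visit-123", "patient intake form",
                    "treatment", rules))
log.record(LogEntry("employer", "visit-123", "public video archive",
                    "hiring", rules))

violations = find_violations(log)
for v in violations:
    print(f"{v.actor} used {v.data_id} for {v.purpose!r}")
```

Representing rules as a flat set of permitted purposes is a deliberate simplification. A real policy language would need a richer, shared vocabulary for describing rules and restrictions.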

We consider it improbable in the extreme that the entire world would ever agree on a single set of policy language primitives. However, drawing on Semantic Web techniques, including ontologies and rules languages, we believe larger and larger overlapping communities will be able to develop a shared policy vocabulary in a step-by-step, bottom-up fashion. Accountable systems must assist users in seeking answers to questions, such as "Is this piece of data allowed to be used for a given purpose? Is a string of inferences permissible for use in a given context, depending on the provenance of the data and the applicable rules?" It seems likely that we'll need special-purpose reasoners, based on specializations of general logic frameworks, to provide a scalable and open policy reasoner. Cryptographic techniques will play an important role in policy-aware systems, but rather than focus on confidentiality or access control -- as do privacy designs today -- cryptography will help create immutable audit logs and provide verifiable data provenance information.
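One well-known way cryptography can support audit logs of this kind is a hash chain, in which each entry commits to the hash of the entry before it. The Python sketch below illustrates that general technique under stated assumptions; it is not a design from the work cited here. The function names, the GENESIS constant, and the event fields are all hypothetical. A chain like this is tamper-evident rather than truly immutable: it does not detect removal of the most recent entries or a wholesale recomputation of the chain unless the latest hash is also signed or published elsewhere.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def entry_hash(prev_hash, event):
    """Hash an event together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain, event):
    """Add an event, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"event": event, "prev": prev_hash,
                  "hash": entry_hash(prev_hash, event)})


def verify(chain):
    """Return True only if every entry links to and matches its predecessor."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append(chain, {"actor": "clinic", "data": "visit-123", "purpose": "treatment"})
append(chain, {"actor": "employer", "data": "visit-123", "purpose": "hiring"})
intact = verify(chain)

# Rewriting a recorded purpose after the fact no longer matches its hash.
chain[1]["event"]["purpose"] = "treatment"
still_valid = verify(chain)

print(intact, still_valid)
```

Here verification succeeds on the untouched chain and fails once an entry has been edited, which is the property an accountability reasoner would rely on when trusting the log's contents.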

Access control and security techniques will remain vital to privacy protection -- access control is important for protecting sensitive information and, above all, preserving anonymity. My colleague from UC Berkeley, Deirdre Mulligan, recounts a situation in which a computer vision experiment on the Berkeley campus captured images of a group of female Iranian students engaged in a protest against Iranian human rights violations. Although they were free from harm on the campus, the fact that the researchers recorded the images and made them publicly available on the project's Web site put the students' family members, many of whom were still in Iran, at grave risk. The research group took down the images as soon as they realized the danger, but harm could have easily occurred already.

Clearly, the ability to remain anonymous, or at least unnoticed and unrecorded, can be vital to protect individuals against repressive governments. Although US law doesn't recognize a blanket right of anonymity, it does protect this right in specific contexts, especially where it safeguards political participation and freedom of association. Even though no general protection exists for anonymous speech, we have a right to keep private our role in the electoral process. Courts will protect the right of anonymous affiliation with political groups, such as the NAACP, against government intrusion. Finally, of course, we don't want our financial records or sensitive health information spilled all over the Web.

Nevertheless, in many cases the data that can do us harm is out there for one reason or another. With usage restrictions established in law and supported by technology, people can be assured that even though their lives are that much more transparent, powerful institutions must still respect boundaries that exist to preserve our individual dignity and assure a healthy civil society.

Acknowledgments

I thank colleagues Hal Abelson, Tim Berners-Lee, Joan Feigenbaum, Chris Hanson, Jim Hendler, Lalana Kagal, Gerry Sussman, and K. Waterman, all of whom have worked on the Transparent Accountable Data Mining Initiative, in which many of these ideas about privacy and accountability were explored. The views expressed here are entirely my own, and haven't been endorsed or approved by W3C or any of its members.

References

[1] A. Westin, Privacy and Freedom, The Bodley Head, 1967.

[2] J. Saltzer and M. Schroeder, "The Protection of Information in Computer Systems," Comm. ACM, vol. 17, no. 7, 1974.

[3] C. Buckley, "New York Plans Surveillance Veil for Downtown," The New York Times, 9 July 2007.

[4] Weitzner, Abelson, Berners-Lee, Feigenbaum, Hendler, Sussman, Information Accountability, MIT Tech. Report MIT-CSAIL-TR-2007, June 2007.

Daniel J. Weitzner is principal research scientist at the MIT Computer Science and Artificial Intelligence Lab, codirector of the MIT Decentralized Information Group, and the Technology & Society Policy Director of the W3C. You can find his blog at http://people.w3.org/~djweitzner/blog/.

Weitzner is Principal Research Scientist at the MIT Computer Science and Artificial Intelligence Laboratory and co-founder of the MIT Decentralized Information Group. He is also Technology and Society Policy Director of the World Wide Web Consortium. The views expressed here are purely his own and do not reflect the views of the World Wide Web Consortium or any of its members.


This work is licensed under a Creative Commons Attribution-NoDerivs 2.5 License.