Reciprocal Privacy (ReP) for the Social Web

Daniel Weitzner <djweitzner@csail.mit.edu>
MIT Decentralized Information Group
12 December 2007

This document: http://dig.csail.mit.edu/2007/12/rep.html

Earlier version at http://dig.csail.mit.edu/2007/12/rep-v0-01.html

Status: in progress. draft 0.02 (18 Feb 2008)

Overview

This is a proposal to re-establish privacy boundaries in social networking environments. ReP defines a minimum set of restrictions on usage of personal information as well as on the public assertions that may or may not be made based on inferences from social network data. Accountability to these rules will be achieved through social consensus along with technical assistance from tools that help participants in the social network. The policy vocabulary defined here is designed to help participants in social networks signal to each other about their own personal privacy boundaries.

With this vocabulary, social networks will be able to keep those privacy rules associated with individual elements of personal information as that data is woven into the growing Giant Global Graph of personal information. Most importantly, I hope ReP can become not just the basis of a social consensus but also a platform on which to build application widgets and other tools that:

help users to declare their privacy boundaries
help those who use personal information in social networks (everyone) to be aware of these boundaries, and
provide a lightweight layer of accountability to encourage Web users to respect each other's stated privacy boundaries.

ReP is designed with some thought given to legal rules that might come along in the future to add weight to the privacy boundaries that users seek to establish. Those legal frameworks, if they come at all, will come later. They have a chance of emerging if it looks like this accountability approach to social network privacy is a good idea, but social consensus will have to gel before legal rules can be created.

Background

Today, more and more social networks are coming onto the Web and are working to share more data across the boundaries that have previously separated these networks (Open Social and Social Graph APIs, for example). Participants in social networks should have the benefit of widely shared agreements about how the information they present in those networks will be used. To encourage the development of these social and legal privacy norms, we need a simple policy language for expressing rules associated with personal information, and a reliable, scalable mechanism for assessing accountability to those rules. We propose a new protocol by which those who share personal information on the Web can have increased confidence that this information will be used in a transparent manner and that users of the personal information can be held accountable for complying with the stated usage rules.

Privacy policies and associated technologies should eventually provide individuals harmed by breaches with legal recourse against those who abuse the norms of information usage. Hence, agreements must be clear and structured so that there is a chance that the existing legal system could provide a remedy for harm. We should neither expect nor require that a single set of norms will be adequate for all users, all social networking contexts or all cultures, but there should be a common framework and a basic policy vocabulary that can express commonly used rules and be easily extended.

The key to sharing personal information across a diversity of privacy policy frameworks is to establish legal and technical mechanisms that ensure a baseline of social and legal accountability across varying rulesets. Participants in the ReP web must agree, as a condition of accessing anyone else's personal information, that usage of personal information will be tagged with self-reported provenance information including the identity of the user and the sources of the personal information being used. Further, anyone who uses the personal information must agree to require that the same set of rules (both the logging requirement and whatever usage rules came with the data) be applied to any subsequent users of the data. This provenance will allow the data subject to check that a specific usage of personal information complies with the specified usage limitations, and to follow the trail of accountability from the initial access of the data through to the final usage event.

This copy-left-inspired viral policy is the most effective way to assure that the original rules associated with personal data are respected as that data is re-used over and over again in a variety of contexts. In the event of misuse, the provenance data will provide a means to locate the mis-user and seek correction or other redress. In many cases where such unauthorized use does real harm to the data subject, it will be possible, with some amount of forensic effort, to find the mis-user and enable redress. Of course, there will be anonymous mis-users of personal information. We cannot insulate Web users from those risks with ReP, but neither can any other privacy protection strategy that is feasible in an inherently open information environment.
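The rule inheritance and the trail of accountability described above can be sketched in a few lines. This is only an illustration under stated assumptions: ReP does not define a record format, so the dictionary layout, the `derive` and `trail` helpers, and the example identifiers are all hypothetical.

```python
def derive(records, new_id, user, source_ids, extra_rules=()):
    """Register a new use; it inherits every rule attached to its sources."""
    inherited = set(extra_rules)
    for source_id in source_ids:
        inherited |= records[source_id]["rules"]
    records[new_id] = {"user": user, "sources": list(source_ids), "rules": inherited}
    return records[new_id]


def trail(records, record_id):
    """Walk back from a usage event to the original sources of the data."""
    seen, stack, order = set(), [record_id], []
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        stack.extend(records[current]["sources"])
    return order


# Hypothetical example: a FOAF file, a graph built from it, a report built from the graph.
records = {"foaf": {"user": "alice", "sources": [], "rules": {"no-employer"}}}
derive(records, "dig-graph", "dig", ["foaf"])
derive(records, "report", "carol", ["dig-graph"], extra_rules={"non-comm"})

report_rules = records["report"]["rules"]  # rules that travelled with the data
report_trail = trail(records, "report")    # path back to the original FOAF file
```

The point of the sketch is that rules only accumulate as data is re-used: a downstream user can add restrictions but never drops the ones that came with the sources.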

While the basic social, legal and technical architecture proposed here is generic, we illustrate ReP using FOAF and other Semantic Web technologies. A more detailed explanation of the accountability approach to privacy on the Web can be found in our paper on Information Accountability.

Current Problem

Social networks, blogs, photo sharing sites and other applications known collectively as the Social Web are collecting and exposing an increasingly complex and far-reaching set of information describing the social relationships of millions of individuals on the Web. Privacy risks include not only exposure of one's own personal information, but also exposure of anyone else who is represented as a node in the data subject's FOAF graph. Not only does this social network expose the personal information of its individual members, but also the professional and personal relationships among all of these people. Scenario A illustrates a simple set of privacy problems that can arise.

Scenario A - privacy of FOAF graph constructed for blog comment filtering

The DIG Research Group runs a blog, and with it comes the common problem of blog comment spam. Rather than outsource the spam filtering with uncertain results, we wanted to use basic Semantic Web technology to implement exactly the commenting policy we chose. Our goal is to allow comments from anyone who is moderately closely related to us, i.e., no more than three degrees of separation from members of our group. The solution was to create a white list of all OpenIDs that appear in the graph of foaf:Person nodes that are no more than three foaf:knows links from one of our group members. This was relatively easy to do (at least for Joe Presbrey, as documented by Sean Palmer and Dan Connolly). The privacy risk comes from the fact that if it's easy for us to do this, then it's easy for anyone, especially since we've published the crawler code. We're only using data (FOAF files) that is publicly available, but in the course of doing this, we're creating a list of those people who are, in some loose way, associated with DIG. This is not information that they publish themselves, but something we infer from following links. We're making claims on their behalf, in their name. What's more, we've said that we will refuse comments from anyone who does not expose enough information about themselves to appear (or not) in this particular social network graph. That is not really such a grave loss for most, nor is it all that likely that any great privacy harm will come from this, but there are downsides:

employers who think the Semantic Web is a crazy idea might think twice before hiring them
spammers might use this list to market semantic web products
someone might mistakenly think that this means people in this particular network are located near MIT.
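
The white list construction described in Scenario A amounts to a bounded breadth-first search over foaf:knows links. A minimal sketch, assuming the FOAF data has already been crawled into plain Python dictionaries (the names `knows`, `openid_of`, `whitelist_openids` and the example URIs are hypothetical and are not taken from the published crawler code):

```python
from collections import deque


def whitelist_openids(knows, openid_of, group_members, max_hops=3):
    """Collect OpenIDs of everyone within max_hops foaf:knows links of the group."""
    distance = {member: 0 for member in group_members}
    queue = deque(group_members)
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not follow links beyond the hop limit
        for friend in knows.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    # People with no published OpenID cannot be white-listed.
    return {openid_of[p] for p in distance if p in openid_of}


# Hypothetical crawl result: a chain of foaf:knows links starting at a group member.
knows = {
    "ex:member": ["ex:p1"],
    "ex:p1": ["ex:p2"],
    "ex:p2": ["ex:p3"],
    "ex:p3": ["ex:p4"],
}
people = ["ex:member", "ex:p1", "ex:p2", "ex:p3", "ex:p4"]
openid_of = {p: "https://openid.example/" + p[3:] for p in people}

allowed = whitelist_openids(knows, openid_of, ["ex:member"])
```

In this example the person four links away is left off the list. The sketch also makes the privacy point concrete: the `distance` table is exactly the inferred "degrees from DIG" data that nobody in the graph published themselves.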

There are several privacy responses to these risks. One might dismiss them as unimportant. Worse, one might just throw up one's hands and figure there's nothing to do about them. Those may be acceptable answers where the privacy impact is low, but if the web-scale social network is going to grow, we will need a privacy approach that addresses these risks seriously without either forcing people to give up all expectation of privacy or driving people to hide their personal information in a disconnected set of networks.

Here's how ReP would address the privacy risks in this scenario:

When we use data about you from your FOAF file, we will inform you that we've done so by sending a message (described in more detail below) that contains a pointer to the use we make of your data, a way to contact us, and a characterization of the kind of use we make of your data. From this log, you will be able to check that we, and anyone else who uses your personal information elsewhere on the Web, are doing so consistently with the rules you set out with your FOAF file.

When others use data about you from the aggregation and analysis that DIG creates (such as the assertion that you're 1, 2 or 3 degrees removed from a DIG member), they will also have to report that usage to you (and/or perhaps to us... I'm still working out that architectural detail). Anyone who uses your data will also have an easy way to discover the rules that you've set to govern how your data will be used. I've laid out a very basic set of usage and inference restrictions here, and I'd expect that vocabulary of usage rules to evolve quite a bit over time. There is, however, a minimum set of rules that everyone in the ReP social web has to agree to and follow: the rules governing logging and accountability.

With these usage restrictions and accountability logging in place, anyone whose FOAF data becomes willingly or unwillingly included in the relationship graph that we create describing DIG's social network will have the means to check that their usage rules are being followed by DIG and by anyone else who publishes data about them. If you came across a mention of yourself, as identified by your FOAF URI, you could check to see that that usage was reported in your log. If you find an error or misuse of your data, you can complain to the user. Of course, it is possible that someone could look at this data and then use it offline in an unaccountable manner. The logging that ReP requires may be useful to figure out just where the data leaked out of the accountable ReP graph and was misused. Of course, like any other effort to investigate wrongdoing, this will take work and does not guarantee success. Still, this provides a stable privacy environment for those who are honest and those who are dishonest but afraid to break the rules. That is a big step forward from where we are today.
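
The check described above, comparing a mention you come across against your own log, can be sketched as follows. This is only an illustration: ReP does not yet define a log format, so the `user` and `document` keys, the `unreported_uses` helper and all URIs here are assumptions.

```python
def unreported_uses(observed_uses, usage_log):
    """Return observed uses of my data that have no matching entry in my log."""
    reported = {(entry["user"], entry["document"]) for entry in usage_log}
    return [use for use in observed_uses
            if (use["user"], use["document"]) not in reported]


# Hypothetical log of usage reports received by the data subject.
usage_log = [
    {"user": "http://dig.example/foaf#dig", "document": "http://dig.example/whitelist"},
]
# Hypothetical mentions of the data subject's FOAF URI found on the Web.
observed_uses = [
    {"user": "http://dig.example/foaf#dig", "document": "http://dig.example/whitelist"},
    {"user": "http://other.example/foaf#me", "document": "http://other.example/report"},
]

to_investigate = unreported_uses(observed_uses, usage_log)
```

Anything left in `to_investigate` is a use that was never reported, which is where a complaint or further forensic effort would start.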

Failure of Available Technical Solutions

Alternatives available for privacy protection in open social networks on the Web are generally unsatisfying:

Encryption of data and other access control mechanisms defeat the very purpose of the social network: locking up data one wants to share is self-defeating.
The multi-layer re-use of data enabled by the Semantic Web makes for powerful inferencing by multiple, unrelated parties, but is not well suited to policy control in the traditional bilateral model of policy agreements made between one data subject and one data user.
A mailbox checksum is OK for the spam project, but provides no broader privacy protection.
Anonymization of social network data doesn't work because of the ability to re-identify individuals (e.g., Dwork/Kleinberg).
Existing languages are inadequate:
- P3P purpose limitations are too imprecise (i.e., limited to current transaction and sharing with 3rd parties)
- XACML and SAML are primarily for access control, not usage limits

Limitations of Existing Privacy Law and Policy

Policy Language Elements

Reciprocal Accountability Through Decentralized Provenance

People are more likely to comply with rules (social or legal) if they believe that their non-compliance will be noticed. Hence, ReP proposes that, as a condition of using personal information, the person who uses or republishes personal data must attach a minimum amount of provenance information so as to enable social network participants to walk back through the web of connections that make up a social network and identify individuals who intentionally or inadvertently mis-use personal information.

The data user must attach the following to each new use of personal information:

Identity of the data subject (a profile ID or URI such as that of a foaf:Person)
Identity of the data user (such as the user's FOAF URI)
Pointer to the source(s) of personal information on which this use was based (back pointers beyond just the URI of the FOAF file used)
Timestamp of the usage event
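
The four items above map naturally onto a small record type. The following is a minimal sketch rather than a defined ReP format: the `ProvenanceRecord` class, the `make_record` helper, the field names and the example URIs are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    data_subject: str    # profile ID or FOAF URI of the person described
    data_user: str       # FOAF URI of whoever used the data
    sources: tuple       # pointers to the sources this use was based on
    timestamp: datetime  # when the usage event happened


def make_record(data_subject, data_user, sources, timestamp=None):
    """Build a record, refusing to create one that is missing a required item."""
    if not data_subject or not data_user:
        raise ValueError("both the data subject and the data user must be identified")
    if not sources:
        raise ValueError("at least one source pointer is required")
    when = timestamp or datetime.now(timezone.utc)
    return ProvenanceRecord(data_subject, data_user, tuple(sources), when)


# Hypothetical usage event: DIG uses Alice's FOAF file to build its white list.
record = make_record(
    "http://alice.example/foaf#me",
    "http://dig.example/foaf#dig",
    ["http://alice.example/foaf.rdf"],
    timestamp=datetime(2008, 2, 18, tzinfo=timezone.utc),
)
```

Making the record immutable (`frozen=True`) is a design choice of the sketch: a self-reported log entry is only useful for accountability if it is not quietly edited after the fact.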

For these purposes, a data usage event is:

creation of a document (whether published on the Web or created for restricted use) based on any information derived from the social network data
taking an action that could have an adverse impact on the individual who is the subject of the data.

Usage restrictions

These are a series of restrictions that can be applied to uses of personal information directly drawn from a person's profile (including the contents of a FOAF file) or inferred from the social network in which the data subject participates.

Restrictions:

<non-comm>: not for use for cc:commercial purposes
<group-only>: no use except by members of the named group <foaf:group>
<no-employer>: not for use in employment decisions
<no-finance>: not for use in financial decisions, including loan and insurance approval decisions
<no-contact>: do not use the information here to contact the person.

These restrictions are to be interpreted as adding usage restrictions to whatever rules are present given the applicable law and the terms of service of the environment from which the data originates.
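
A tool that helps data users respect these restrictions could check an intended use against them before acting. The sketch below assumes a hypothetical set of purpose labels ("commercial", "employment", "finance", "contact") and a hypothetical `violated_restrictions` helper; only the restriction names come from the list above.

```python
# Hypothetical mapping from each restriction to the purpose label it rules out.
BLOCKED_PURPOSE = {
    "non-comm": "commercial",
    "no-employer": "employment",
    "no-finance": "finance",
    "no-contact": "contact",
}


def violated_restrictions(restrictions, purpose, user_groups=frozenset(), named_group=None):
    """Return the subset of the subject's restrictions that this use would violate."""
    violated = {r for r in restrictions if BLOCKED_PURPOSE.get(r) == purpose}
    # group-only forbids any use by someone outside the named group.
    if "group-only" in restrictions and named_group not in user_groups:
        violated.add("group-only")
    return violated


subject_rules = {"no-employer", "non-comm", "group-only"}

hiring_check = violated_restrictions(
    subject_rules, "employment", user_groups={"ex:recruiters"}, named_group="ex:dig")
member_research = violated_restrictions(
    subject_rules, "research", user_groups={"ex:dig"}, named_group="ex:dig")
```

An empty result means none of the ReP restrictions object to the use; as noted above, applicable law and terms of service still apply on top of that.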

(We could also import some of the P3P purpose vocabulary, though most is ecommerce oriented and not generally applicable to the social network context.)

Republication restrictions

A special category of usage restrictions involves statements that others may make about an individual. In general, I believe that people have a right to 'speak' about others, including a right to express their opinion about others and to share what they may be able to learn about others. However, common courtesy, and sometimes law, place certain limits on what we casually will say about others in public. These restrictions are designed to remind others what aspects of our identity and personal status we would like kept private.

Republication restrictions include:

<no-religion>: may not assert religious affiliation
<no-ethnic>: may not assert ethnic identity
<no-health>: may not assert anything about the health status of the person
<no-politics>: may not assert anything about the person's political beliefs
<no-sex-orient>: may not assert anything about the person's sexual orientation
<no-location>: may not assert the data subject's location

It's important to be able to assert these kinds of limitations about what others may/should say about you, because of a distinctive feature of social networks: they reveal things about you that you don't explicitly reveal yourself. For example, by observing online behavior such as:

books I say I read
music I say I like
the composition of my social network

you may be able to infer attributes of my personal life that I have not explicitly disclosed. I can't prevent you from noticing these individual facts nor can I prevent you from seeking to draw your own inferences, but I can request that you not republish those inferences as assertions about me.
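
A publishing tool could honor such a request by filtering inferred statements before they are republished. This is a sketch under assumptions: the topic labels, the `republishable` helper and the example statements are hypothetical, and how an inference would be assigned a topic in practice is left open here.

```python
# Hypothetical mapping from each republication restriction to a topic label.
RESTRICTED_TOPIC = {
    "no-religion": "religion",
    "no-ethnic": "ethnicity",
    "no-health": "health",
    "no-politics": "politics",
    "no-sex-orient": "sexual-orientation",
    "no-location": "location",
}


def republishable(assertions, restrictions):
    """Keep only the assertions whose topic the data subject has not restricted."""
    blocked = {RESTRICTED_TOPIC[r] for r in restrictions if r in RESTRICTED_TOPIC}
    return [a for a in assertions if a["topic"] not in blocked]


# Hypothetical inferences drawn from reading lists and network composition.
inferred = [
    {"topic": "politics", "text": "likely supports party X"},
    {"topic": "music", "text": "likes jazz"},
]

safe_to_publish = republishable(inferred, {"no-politics", "no-health"})
```

Nothing here stops anyone from drawing the inference; the filter only acts at the point of republication, which matches the scope of the request described above.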

Developer/Source Author Recipes

@@sample policy@@

@@sample attribution@@

Further information

This is a partial bibliography and list of links that I have found useful in understanding social network privacy issues:

Academic research:

Behavior in Public Places: Notes on the Social Organization of Gathering, Erving Goffman (Free Press, 1963)
Research and Social Network Sites, maintained by Danah Boyd

Blog Posts:

MSM:

New York Times, "On Facebook, Scholars Link Up With Data," Stephanie Rosenbloom, December 17, 2007

Acknowledgements

Many thanks to colleagues from MIT's Decentralized Information Group and the TAMI project. Individual feedback from Dan Brickley, ... has been very helpful.

Further topics/changes:

what usage restrictions should there be on the transaction logs themselves? - likely answer: logs can only be analyzed for the purpose of checking compliance upon discovery of an adverse use of personal information. This may give rise to a second universal rule in the system: transaction data can only be used for investigating adverse acts.
can we encrypt log data to enforce the rule on use of log data?
consider putting all inference restrictions with usage restrictions.
consider a separate policy element on publication of personal data and inferences
may not need logging, rather just allow individuals to require that republication of personal data is done with attribution by the author so that the data subject can complain if there is a problem with accuracy or inappropriate use.
add 'I'm over it' policy

This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 Unported License.