Reciprocal Privacy (ReP) for the Social Web

Daniel Weitzner <djweitzner@csail.mit.edu>
MIT Decentralized Information Group
12 December 2007

This document: http://dig.csail.mit.edu/2007/12/rep-v0-01.html

Latest version: http://dig.csail.mit.edu/2007/12/rep.html

Status: in progress. draft 0.01

This is a proposal to establish a reasonable privacy balance in social networking environment. Today, more and more social networks are coming onto the Web and are working to share more data across the previously-established boundaries that have previously separate these networks. Participants in social networks should have the benefit of widely shared agreements about how the information they present in those networks will be analyzed and used. To encourage the development of these social and legal privacy norms, we need a simple policy language for expressing rules associated with personal information, and a reliable, scalable mechanism for assessing accountability with those rules. We propose a new protocol by which those who share personal information on the Web can have increased confidence that this information will be used in a transparent manner and that users of the personal information will be able to be held accountable to comply with the stated usage rules.

Privacy policies and associated technologies must provide individuals harmed by breaches with legal recourse against those who abuse the norms of information usage. Hence, agreements must be clear and structured in a manner that there is a chance that the existing legal system could provide a remedy for harm. We should neither expect nor require than a single set of norms will be adequate for all users, all social networking contexts or all cultures, but there should be a common framework and a basic policy vocabulary that can express commonly used rules and be easily extended.

The key to sharing personal information across a diversity of privacy policy frameworks is to establish legal and technical mechanisms that ensures a baseline of social and legal accountability across varying rulesets. Participants in the ReP web must agree as a condition of accessing anyone else's personal information that usage of personal information will be reported by the user to a log specified by the data subject. Further, anyone who uses the personal information must agree to require that the same set of rules (both the logging requirement and whatever usage rules came with the data) be applied to any subsequent users of the data. The log will allow the data subject to check that a specific usage of personal information complies with the specified usage limitations, and to follow the trail of accountability from the initial access of the data through to the final usage event.

This copy-left-inspired viral policy is the most effective way to assure that the original rules associated with personal data are respected as that data is re-used over and over again in a variety of contexts. In the event of misuse, the logs will provide a means to locate the mis-user and seek correction or other redress. In the event that a use of personal information is discovered which is NOT recorded in the person's accountability log, that use is by definition a violation of the ReP policy. In many cases where such unauthorized use does real harm to the data subject, it will be possible with some amount of forensic effort will find the mis-user and enable redress. Of course, there will be anonymous mis-users of personal information. We cannot insulate Web users from those risks with ReP, but neither can any other privacy protection strategy that is feasible in an inherently open information environment.

While the basic social, legal and technical architecture proposed here is generic, we illustrate ReP using FOAF and other Semantic Web technologies. A more detailed explanation of the accountability approach to privacy on the Web can be found in our paper on Information Accountability.

Current Problem

Social networks, blogs, photo sharing sites and other applications known collectively as the Social Web are collecting and exposing an increasingly complex and far-reaching set of information describing the social relationships of millions of individuals on the Web. Privacy risks include not only exposure of one's own personal information, but also anyone else who is represented as a node in the data subjects FOAF graph. Not only does this social network expose the personal information of its individual members, but also the professional and personal relationships among all of these people. Scenario A illustrates a simple set of privacy problems that can arise

Scenario A - privacy of FOAF graph constructed for blog comment filtering

The DIG Research Group runs a blog, along with which comes the common problem of blog comment spam. Rather than outsource the spam filtering with uncertain results, we wanted to use basic semantic web technology to implement exactly the commenting policy we chose. Our goal is to allow anyone who is moderately closely related, ie. no more than three degrees of separation from members of our group. The solution was to create a white list of all OpenIDs that appear in the graph of foaf:person's who are no more than three foaf:knows links from one of our group members. This was relatively easy to do (as least for Joe Presbrey, as documented by Sean Palmer and Dan Connolly). The privacy risk comes from the fact that if it's easy for us to do this, then it's easy for anyone, especially since we've published the crawler code. We're only using data (FOAF files) that is publicly available, but in the course of doing this, we're creating a list of those people who are, in some loose way associated with DIG. This is not information that they publish themselves, but something we infer from following links. We're making claims on their behalf, in their name. What's more, we've said that we will refuse comments from anyone who does not expose enough information about themselves to appear (or not) in this particular social network graph. That is not really such a grave loss for most, nor is it all that likely that any great privacy harm from this, but there are downsides:

employers who think the Semantic Web is a crazy idea might think twice before hiring them
spammers might use this list to market semantic web products
someone might mistakenly think that this means people in this particular network are located near MIT.

There are several privacy responses to these risks. One might dismiss them as unimportant. Worse. one might just throw up one's hands and figure there's nothing to do about them. Those may be acceptable answers where the privacy impact is low, but if web-scale social network is going to grow, we will need a privacy approach that addresses these risks seriously without either forcing people to give up all expectation or privacy and without driving people to hide their personal information in a disconnected set of networks.

Here's how ReP would address the privacy risks in this scenario:

When we use data about you from your FOAF file, we will inform you that we've done so by sending a message (described in more detail below) that contains a pointer to the use we make of your data, a way to contact us, and a characterization of the kind of use we make of you data. From this log, you will be able to check that we, and anyone else who uses your personal information elsewhere on the Web is doing so consistent with the rules you set out with your FOAF file.

When others use data about your from the aggregation and analysis that DIG creates (such as the assertion that you're 1, 2 or 3 degrees removed from a DIG member), they will also have to report that usage to you (and/or perhaps to us... I'm still working out that architectural detail). Anyone who uses your data will also have an easy way to discover the rules that you've set to govern how your data will be used. I've laid our a very basic set of usage and inference restrictions here, but I'd expect that vocabulary of usage rules to evolve quite a bit of over time, but there is a minimum set of rules that everyone in ReP social web has to agree to and follow: the rules governing logging and accountability.

With these usage restrictions and accountability logging in place, anyone whose FOAF data becomes willingly or unwillingly included in the relationship graph that we create describing DIG's social network will have the means to check that their usage rules are being followed by DIG and by anyone else who publishes data about them. If you came across a mention of yourself, as identified by your FOAF URI, you could check to see that that usage was reported in your log. If you find an error or misuse of your data, you can complain to the user. Of course, it is possible that someone could look at this data and then use it offline in an unaccountable manner. The logging the ReP requires may be useful to figure out just where the data leaked out of the accountable ReP graph and was misused. Of course, like any other effort to investigate wrongdoing, this will take work and does not guarantee success. Still this provides a stable privacy environment for those who are honest and those who are dishonest but afraid to break the rules. That is a big step forward from where we are today.

Failure of Available Technical Solutions

Alternatives available for privacy protection in open social networks on the Web are generally unsatisfying:

Encryption of data and other access control mechanisms defeats that very purpose of the social network: locking up data one wants to share is self-defeating, and
The multi-layer re-use of data enabled by the Semantic Web makes for powerful inferencing by multiple, unrelated parties, but is not well suited to policy control in the tradition bilateral model of policy agreements made between one data subject and one data user.
Mailbox checksum is ok for spam project, but provides no broader privacy protection
Anonymization of social network data doesn't work because of the ability to re-identify individuals (ie. Dwork/Kleinberg)
Existing languages are inadequate
- P3P purpose limitations too imprecise (ie. limited to current transaction and sharing with 3rd parties)
- XACML and SAML, are primarily for access control, not usage limits

Limitations of Existing Privacy Law and Policy

Policy Language Elements

ReP defines a minimum set of restrictions on direct usage of personal information as well as inferences that may or may not be drawn from social network data. Accountability to these rules is achieved by requiring that any information usage be logged for later compliance checking against the usage and inference rules.

Reciprocal Usage Accountability

The accountability rules specify the location of the log and the element required. The starred items (*) are mandatory for all participants in ReP, in order to create an environment where decentralized accountability is possible for all participants.

The data user must deposit into the log the following information:

*Identity of the data subject (such as foaf:person)
*Identity of the data user (such as the user's FOAF URI)
*URI specifying the data usage (this could be a web page URI, a mailto: URI, etc.)
*contact address for the person responsible for the usage and to whom complaints or corrections may be addressed
*pointer to the source(s) of personal information on which this us was based (back pointers beyond just the URI of the FOAF file used)
*timestamp of the usage event
self-assertion of purpose by the user

For these purposes, a data usage event is:

creation of a document (whether published on the Web or created for restricted use) based on any information derived from the FOAF data
taking an action that could have an adverse impact on the FOAF user.

The logging mechanism can be as simple as an email address that the FOAF:person archives for the purpose of later accountability assessment. Or, once SPARQL Update gets further along, this could be implemented in full Semantic Web style.

Usage restrictions

These are a series of restrictions that can be applied to uses of personal information directly drawn from the FOAF file or and inferred from the social network in which the data subject participates.

Restrictions:

<non-comm>: not for use for cc:commercial purposes
<group-only>: no use except members of named group <foaf:group>
<no-employer>: not for use in employment decisions
<no-finance>: not for use in financial decisions
<no-contact>: do not use the information here to contact the foaf:person.

(We may also import some of the P3P purpose vocabulary, though most is ecommerce oriented and not generally applicable to the social network context.)

Inference restrictions

Inference restrictions deny permission to draw certain inferences, including:

<no-religion>: may not infer religious affiliation
<no-ethnic>: may not infer ethnic identity
<no-health>: may not infer anything about the heath status of the person
<no-politics>: may not infer anything about the person's political believes
<no-location>: may not infer the data subject's location

Further topics/changes:

what usage restrictions should there be on the transaction logs themselves? - lilkely answer: logs can only be analyzed for the purpose of checking compliance upon discovery of an adverse use of perosnal information. This may give rise to a second universal rule in the system: transaction data can only be used for investigating adverse acts.
can we encrypt log data to enforce rule on use of log data?
consider putting all infrerence restrictions with usage restrictions.
consider a separate poliicy element on publication of personal data and inferences
may not need logging, rather just allow individuals to require that republication of personal data is done with attribution by the author so that the data subject can complain if there is a problem with accuracy or inappropriate use.
add 'I'm over it' policy

This work is licensed under a Creative Commons Attribution-No Derivative Works 3.0 Unported License.