Daniel Weitzner
<djweitzner@csail.mit.edu>
MIT Decentralized Information Group
12 December 2007
This document: http://dig.csail.mit.edu/2007/12/rep.html
Earlier version at http://dig.csail.mit.edu/2007/12/rep-v0-01.html
Status: in progress; draft 0.02 (18 Feb 2008)
This is a proposal to re-establish privacy boundaries in social networking environments. ReP defines a minimum set of restrictions on the usage of personal information, as well as on the public assertions that may or may not be made based on inferences from social network data. Accountability to these rules will be achieved through social consensus, along with technical assistance from tools that help participants in the social network. The policy vocabulary defined here is designed to help participants in social networks signal to each other about their own personal privacy boundaries.
With this vocabulary, social networks will be able to keep those privacy rules associated with individual elements of personal information as that data is woven into the growing Giant Global Graph of personal information. Most importantly, I hope ReP can become the basis not just of a social consensus, but also a platform on which to build application widgets and other tools that help participants in social networks manage their privacy boundaries.
ReP is designed with some thought in mind to legal rules that might come along in the future to add weight to the privacy boundaries that users seek to establish. Those legal frameworks, if they come at all, will come later. They have a chance of emerging if it looks like this accountability approach to social network privacy is a good idea, but social consensus will have to gel before legal rules can be created.
Today, more and more social networks are coming onto the Web and are working to share more data across the boundaries that have previously separated these networks (OpenSocial and the Social Graph API, for example). Participants in social networks should have the benefit of widely shared agreements about how the information they present in those networks will be used. To encourage the development of these social and legal privacy norms, we need a simple policy language for expressing rules associated with personal information, and a reliable, scalable mechanism for assessing compliance with those rules. We propose a new protocol by which those who share personal information on the Web can have increased confidence that this information will be used in a transparent manner and that users of the personal information can be held accountable for complying with the stated usage rules.
Privacy policies and associated technologies should eventually provide individuals harmed by breaches with legal recourse against those who abuse the norms of information usage. Hence, agreements must be clear and structured so that the existing legal system has a chance of providing a remedy for harm. We should neither expect nor require that a single set of norms will be adequate for all users, all social networking contexts, or all cultures, but there should be a common framework and a basic policy vocabulary that can express commonly used rules and be easily extended.
The key to sharing personal information across a diversity of privacy policy frameworks is to establish legal and technical mechanisms that ensure a baseline of social and legal accountability across varying rulesets. Participants in the ReP web must agree, as a condition of accessing anyone else's personal information, that usage of personal information will be tagged with self-reported provenance information, including the identity of the user and the sources of the personal information being used. Further, anyone who uses the personal information must agree to require that the same set of rules (both the logging requirement and whatever usage rules came with the data) be applied to any subsequent users of the data. This provenance will allow the data subject to check that a specific usage of personal information complies with the specified usage limitations, and to follow the trail of accountability from the initial access of the data through to the final usage event.
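To make the tagging concrete, here is a minimal sketch in Python with rdflib of how a data user might attach self-reported provenance to a use of personal data and carry the source's rules forward. The rep: namespace and all of its terms (rep:dataUser, rep:dataSource, rep:loggedAt, rep:usageRule, rep:mustLog) are assumptions of mine for illustration; ReP does not yet define concrete URIs for them.

    # A minimal sketch, assuming Python with rdflib; every rep: term below is
    # a hypothetical placeholder, not defined ReP vocabulary.
    from datetime import datetime, timezone
    from rdflib import Graph, Namespace, URIRef, Literal, BNode

    REP = Namespace("http://example.org/rep#")  # hypothetical ReP namespace

    def tag_usage(log, data_user, data_source, inherited_rules):
        """Record a new use of personal data with self-reported provenance,
        carrying the source's usage rules (plus the logging duty) forward."""
        event = BNode()
        log.add((event, REP.dataUser, URIRef(data_user)))
        log.add((event, REP.dataSource, URIRef(data_source)))
        log.add((event, REP.loggedAt,
                 Literal(datetime.now(timezone.utc).isoformat())))
        # The "viral" step: downstream users inherit the same rules.
        for rule in set(inherited_rules) | {REP.mustLog}:
            log.add((event, REP.usageRule, rule))
        return event

    log = Graph()
    tag_usage(log,
              data_user="http://dig.csail.mit.edu/#group",
              data_source="http://example.org/alice/foaf#me",
              inherited_rules=[])
    print(log.serialize(format="turtle"))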
This copyleft-inspired viral policy is the most effective way to ensure that the original rules associated with personal data are respected as that data is re-used over and over again in a variety of contexts. In the event of misuse, the provenance data will provide a means to locate the mis-user and seek correction or other redress. In many cases where such unauthorized use does real harm to the data subject, some amount of forensic effort will find the mis-user and enable redress. Of course, there will be anonymous mis-users of personal information. We cannot insulate Web users from those risks with ReP, but neither can any other privacy protection strategy that is feasible in an inherently open information environment.
While the basic social, legal and technical architecture proposed here is generic, we illustrate ReP using FOAF and other Semantic Web technologies. A more detailed explanation of the accountability approach to privacy on the Web can be found in our paper on Information Accountability.
Social networks, blogs, photo sharing sites and other applications known collectively as the Social Web are collecting and exposing an increasingly complex and far-reaching set of information describing the social relationships of millions of individuals on the Web. Privacy risks include not only exposure of one's own personal information, but also exposure of information about anyone else who is represented as a node in the data subject's FOAF graph. Not only does this social network expose the personal information of its individual members, but also the professional and personal relationships among all of these people. Scenario A illustrates a simple set of privacy problems that can arise.
The DIG Research Group runs a blog, along with which comes the common problem of blog comment spam. Rather than outsource the spam filtering with uncertain results, we wanted to use basic Semantic Web technology to implement exactly the commenting policy we chose. Our goal is to allow comments from anyone who is moderately closely related to us, i.e., no more than three degrees of separation from members of our group. The solution was to create a whitelist of all OpenIDs that appear in the graph of foaf:Person nodes who are no more than three foaf:knows links from one of our group members (a minimal sketch of such a crawl appears below). This was relatively easy to do (at least for Joe Presbrey, as documented by Sean Palmer and Dan Connolly). The privacy risk comes from the fact that if it's easy for us to do this, then it's easy for anyone, especially since we've published the crawler code. We're only using data (FOAF files) that is publicly available, but in the course of doing this, we're creating a list of those people who are, in some loose way, associated with DIG. This is not information that they publish themselves, but something we infer from following links. We're making claims on their behalf, in their name. What's more, we've said that we will refuse comments from anyone who does not expose enough information about themselves to appear (or not) in this particular social network graph. That is not really such a grave loss for most, nor is it all that likely that any great privacy harm will come of this, but there are downsides:
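For concreteness, here is a minimal sketch of the kind of whitelist crawl described above, written in Python with rdflib. This is my illustration, not the published DIG crawler; the seed URI is a placeholder, and a real crawler would also need to follow rdfs:seeAlso links and handle nodes that cannot be dereferenced.

    # A sketch of a three-degrees whitelist crawl; illustrative only, not the
    # actual DIG crawler code.
    from collections import deque
    from rdflib import Graph, Namespace, URIRef

    FOAF = Namespace("http://xmlns.com/foaf/0.1/")

    def build_whitelist(seeds, max_hops=3):
        """Collect foaf:openid values for everyone within max_hops
        foaf:knows links of the seed people."""
        whitelist = set()
        seen = {URIRef(u) for u in seeds}
        frontier = deque((URIRef(u), 0) for u in seeds)
        while frontier:
            person, hops = frontier.popleft()
            g = Graph()
            try:
                g.parse(person)  # dereference the person's FOAF document
            except Exception:
                continue         # skip unreachable or unparseable profiles
            for oid in g.objects(person, FOAF.openid):
                whitelist.add(str(oid))
            if hops < max_hops:
                for friend in g.objects(person, FOAF.knows):
                    if friend not in seen:
                        seen.add(friend)
                        frontier.append((friend, hops + 1))
        return whitelist

    # Hypothetical seed; in the scenario, each DIG member's FOAF URI.
    print(build_whitelist(["http://example.org/dig-member/foaf#me"]))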
There are several possible responses to these privacy risks. One might dismiss them as unimportant. Worse, one might just throw up one's hands and figure there's nothing to be done about them. Those may be acceptable answers where the privacy impact is low, but if the web-scale social network is going to grow, we will need a privacy approach that addresses these risks seriously, without either forcing people to give up all expectation of privacy or driving people to hide their personal information in a disconnected set of networks.
Here's how ReP would address the privacy risks in this scenario:
When we use data about you from your FOAF file, we will inform you that we've done so by sending a message (described in more detail below) that contains a pointer to the use we make of your data, a way to contact us, and a characterization of the kind of use we make of your data. From this log, you will be able to check that we, and anyone else who uses your personal information elsewhere on the Web, are doing so consistent with the rules you set out with your FOAF file.
When others use data about you from the aggregation and analysis that DIG creates (such as the assertion that you're 1, 2 or 3 degrees removed from a DIG member), they will also have to report that usage to you (and/or perhaps to us; I'm still working out that architectural detail). Anyone who uses your data will also have an easy way to discover the rules that you've set to govern how your data will be used. I've laid out a very basic set of usage and inference restrictions here, and I'd expect that vocabulary of usage rules to evolve quite a bit over time, but there is a minimum set of rules that everyone in the ReP social web has to agree to and follow: the rules governing logging and accountability.
With these usage restrictions and accountability logging in place, anyone whose FOAF data becomes willingly or unwillingly included in the relationship graph that we create describing DIG's social network will have the means to check that their usage rules are being followed by DIG and by anyone else who publishes data about them. If you came across a mention of yourself, as identified by your FOAF URI, you could check to see that that usage was reported in your log (a sketch of such a check appears below). If you find an error or misuse of your data, you can complain to the user. Of course, it is possible that someone could look at this data and then use it offline in an unaccountable manner. The logging that ReP requires may be useful to figure out just where the data leaked out of the accountable ReP graph and was misused. Of course, like any other effort to investigate wrongdoing, this will take work and does not guarantee success. Still, this provides a stable privacy environment for those who are honest and those who are dishonest but afraid to break the rules. That is a big step forward from where we are today.
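Here is a sketch of what that check might look like, again in Python with rdflib and with the same hypothetical rep: terms as above; the rep:usagePointer property and the log's location are my assumptions.

    # A sketch of a data subject checking their usage log; rep:usagePointer
    # is an assumed term, and "my-usage-log.ttl" is a placeholder location.
    from rdflib import Graph, Namespace, URIRef

    REP = Namespace("http://example.org/rep#")

    def usage_was_reported(log, usage_url):
        """True if some logged usage event points at the given use."""
        return any(log.triples((None, REP.usagePointer, URIRef(usage_url))))

    log = Graph()
    log.parse("my-usage-log.ttl", format="turtle")
    if not usage_was_reported(log, "http://example.org/dig/whitelist"):
        print("This use of my data was never reported; time to complain.")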
Alternatives available for privacy protection in open social networks on the Web are generally unsatisfying:
@@
People are more likely to comply with rules (social or legal) if they believe that their non-compliance will be noticed. Hence, ReP proposes that, as a condition of using personal information, the person who uses or republishes personal data must attach a minimum amount of provenance information, so as to enable social network participants to walk back through the web of connections that make up a social network and identify individuals who intentionally or inadvertently mis-use personal information.
The data user must attach the following to a new use of personal information:
For these purposes, a data usage event is:
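Since the exact required fields are still being worked out, the following is only a hedged sketch of how one such event might be serialized, reusing the hypothetical rep: terms from the sketches above.

    # Hedged sketch of a single serialized usage event; all rep: terms are
    # placeholders, not defined ReP vocabulary.
    from rdflib import Graph

    EVENT = """
    @prefix rep: <http://example.org/rep#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    [] a rep:UsageEvent ;
       rep:dataUser     <http://dig.csail.mit.edu/#group> ;
       rep:dataSource   <http://example.org/alice/foaf#me> ;
       rep:usagePointer <http://example.org/dig/whitelist> ;
       rep:usageKind    "inclusion in a comment whitelist" ;
       rep:loggedAt     "2008-02-18T00:00:00Z"^^xsd:dateTime .
    """

    g = Graph()
    g.parse(data=EVENT, format="turtle")
    print(len(g))  # six triples describing one usage event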
These are a series of restrictions that can be applied to uses of personal information directly drawn from a person's profile (including the contents of a FOAF file) or inferred from the social network in which the data subject participates.
Restrictions:
These restrictions are to be interpreted as adding usage restrictions to whatever rules are already present given the applicable law and the terms of service of the environment from which the data originates.
(We could also import some of the P3P purpose vocabulary, though most of it is e-commerce-oriented and not generally applicable to the social network context.)
A special category of usage restrictions involves statements that others may make about an individual. In general, I believe that people have a right to 'speak' about others, including a right to express their opinion about others and to share what they may be able to learn about others. However, common courtesy, and sometimes law, places certain limits on what we will casually say about others in public. These restrictions are designed to remind others what aspects of our identity and personal status we would like kept private.
Republication restrictions include:
It's important to be able to assert these kinds of limitations on what others may or should say about you, because of a distinctive feature of social networks: they reveal things about you that you don't explicitly reveal yourself. For example, by observing online behavior such as:
you may be able to infer attributes of my personal life that I have not explicitly disclosed. I can't prevent you from noticing these individual facts, nor can I prevent you from seeking to draw your own inferences, but I can request that you not republish those inferences as assertions about me (a sketch of such a request appears below).
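A hedged sketch of how such a request might be expressed alongside a FOAF profile follows; rep:noRepublicationOf and rep:InferredAttributes are terms I am assuming for illustration, not part of any defined vocabulary.

    # Sketch of a no-republication-of-inferences request; rep: terms assumed.
    from rdflib import Graph

    REQUEST = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rep:  <http://example.org/rep#> .

    <http://example.org/alice/foaf#me> a foaf:Person ;
        foaf:name "Alice" ;
        rep:noRepublicationOf rep:InferredAttributes .
    """

    Graph().parse(data=REQUEST, format="turtle")  # parses as valid Turtle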
@@sample policy@@
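Until the sample policy is filled in, here is my hedged guess at its shape, combining a usage restriction, the republication restriction above, and the mandatory logging rule; every rep: term remains an assumption.

    # Hypothetical sample policy attached to a FOAF profile; illustrative only.
    from rdflib import Graph

    POLICY = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rep:  <http://example.org/rep#> .

    <http://example.org/alice/foaf#me> a foaf:Person ;
        rep:usageRule rep:mustLog ;            # the non-negotiable baseline
        rep:usageRule rep:noCommercialUse ;    # an illustrative restriction
        rep:noRepublicationOf rep:InferredAttributes .
    """

    Graph().parse(data=POLICY, format="turtle")  # parses as valid Turtle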
@@sample attribution@@
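Likewise, a hedged guess at what a sample attribution might look like: the statement a data user would publish alongside their use, pointing back at the source and carrying its rules forward. Again, all rep: terms are assumptions.

    # Hypothetical attribution published alongside a use of Alice's data.
    from rdflib import Graph

    ATTRIBUTION = """
    @prefix rep: <http://example.org/rep#> .

    <http://example.org/dig/whitelist>
        rep:dataSource <http://example.org/alice/foaf#me> ;
        rep:dataUser   <http://dig.csail.mit.edu/#group> ;
        rep:usageRule  rep:mustLog, rep:noCommercialUse .
    """

    Graph().parse(data=ATTRIBUTION, format="turtle")  # parses as valid Turtle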
This is a partial bibliography and list of links that I have found useful in understanding social network privacy issues:
Many thanks to colleagues from MIT's Decentralized Information Group and the TAMI project. Individual feedback from Dan Brickley, ... has been very helpful.