
Introduction

With the advent of the World Wide Web, the pace at which people generate, access, and acquire data has accelerated significantly. While users naturally want to share data with each other and maintain social connections, they are often surprised or upset when their data is misused. Improper handling of data can have grave consequences, including privacy infringements, security breaches, and criminal intrusions. To protect themselves from such harm, people have increasingly resorted to self-censorship. However, self-censorship not only destroys the learning opportunities that information sharing can provide, but also often leads to social alienation. Consumers are thus constantly torn between shielding themselves from privacy risks and presenting their data to a wider audience. Our goal at DIG is to enable responsible use of information, and to enhance willingness to share, by targeting areas in which current privacy protection mechanisms do not suffice.

Background

In recent years, various data source platforms such as Facebook, LinkedIn and Twitter have become increasingly focused on issues of consumer data privacy protection. Currently, the primary approach taken to ensure this protection is through the use of access control.

Access control systems ask people to define privacy rules that grant access to their personal information, often constraining access to different groups of people: do you wish to share your photos with just your friends, with friends of friends, or perhaps with everyone who goes to your school? Such rules must be set for every data source the user has, each likely with different terminology and controls. In general, expecting consumers to make decisions about their privacy policies across all of these systems is impractical. In practice, the complexity and tedium of this process escalate quickly, leaving users confused by the settings. For instance, popular social networking sites such as Facebook frequently update their privacy settings to grant users more freedom in customizing their data access preferences, yet complaints about the inefficiency and inadequacy of these controls continue to plague such companies. It is no surprise that unhappy users abound: on Facebook alone there are more than 20 different access control options, each governing only a sliver of user data.

Another reason for consumers' growing dissatisfaction with access control systems is that access control assumes authorized users are also responsible users, which in reality is often not the case. Exclusive reliance on access control is brittle because information, once revealed, is completely uncontrollable: there is no effective mechanism in place to prevent misuse of the released information. Therefore, even when access control systems succeed in blocking unwanted viewers, they are ineffective as privacy protection for a large, decentralized system. On a system like the World Wide Web, once information is made publicly available, it is easy to copy or aggregate data to infer sensitive information. A recent paper by Fuming Shih and Sharon Paradesi demonstrates this exact problem through Global Inferencer, a data miner that uses linked data technology to perform unified searches across Facebook, Flickr, and other public data sites. Its results reveal a startling amount of information about users, based on inferences over individuals' lifestyles and other behaviors. Thus, controlling access to personal information on individual social networking sites alone still exposes users to privacy risks.

Existing tools and mechanisms make it difficult for users to protect their own privacy: specifying privacy rules is challenging, and there is no way for users to express how they want their data to be used. As the definition of privacy continues to evolve alongside novel ways of sharing and using data, new systems are required to fully address users' needs.

Enabling Information Accountability

At DIG, we are committed to enabling the widespread deployment of a policy-aware web: a system that provides all users with accessible and understandable views of the policies associated with information resources. The main motivation, as outlined in the Background section above, is that access control alone does not provide comprehensive information management. As mentioned previously, exclusive reliance on access restrictions leaves data usage uncontrollable once information is revealed.

In reality, individuals care more about how their data is used than about how it is accessed. Information policies should be created to accompany data usage, and accountability systems devised to reinforce proper compliance with these policies. We have published several papers addressing the significance of policy awareness and the need to shift from access control to rule-based access for the World Wide Web. We specifically recognize the power of transparent, accountable data mining, and of systems oriented towards information accountability rather than information security and access control. The main contribution of accountable systems is the ability of decentralized systems to determine whether each use of data is permitted by the relevant rules specified by the author, for that particular person, in that particular circumstance. Information accountability enhances privacy protection through transparency and deterrence. Our goal at DIG is to enable real-time detection and redress, thereby inhibiting data misuse.
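
To make this concrete, below is a minimal sketch, in Python, of what such a per-use compliance check might look like. The Policy and UsageRequest structures and the example purposes are hypothetical illustrations chosen for clarity; they are not an actual DIG API.

    from dataclasses import dataclass

    @dataclass
    class Policy:
        """Usage rules the author of a piece of data has specified."""
        allowed_users: set      # identities permitted to use the data
        allowed_purposes: set   # e.g. {"education", "research"}

    @dataclass
    class UsageRequest:
        """One particular use of the data: who, and in what circumstance."""
        user: str
        purpose: str

    def is_permitted(policy: Policy, request: UsageRequest) -> bool:
        """Decide whether this use, by this person, in this circumstance,
        complies with the author's rules."""
        return (request.user in policy.allowed_users
                and request.purpose in policy.allowed_purposes)

    # A photo its owner shares with two friends, for educational use only.
    photo_policy = Policy(allowed_users={"alice", "bob"},
                          allowed_purposes={"education"})
    print(is_permitted(photo_policy, UsageRequest("bob", "education")))    # True
    print(is_permitted(photo_policy, UsageRequest("bob", "advertising")))  # False

Unlike an access control check, such an evaluation can run at use time and be logged, so a violation is detectable and redressable after the fact rather than merely blocked in advance.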

It is important to note that we are proposing information accountability as a complement to access control, not an alternative. While access control governs the disclosure and availability of information, accountability enhances privacy protection through deterrence, much as legal and social rules work in our societies. Even though most of these rules are not enforced perfectly or automatically, citizens abide by them because our society has evolved to encourage compliance over violation. Parking is a familiar illustration: one obeys parking rules even when no traffic police are around. Existing social and legal norms govern what we can and cannot do, and they are so ingrained in our way of thinking that most people go through life following them. When problems occur, mechanisms exist to ensure that violators are reprimanded, with parking tickets or fines in this particular example. Instead of enforcing privacy policies through restricted access, we suggest helping users conform to policies by making them aware of the usage restrictions associated with the data and helping them understand the implications of their actions and of violating the policy. This process can proceed in stages, with each component of an accountability system phased in alongside existing access control systems. Ultimately, we aim to introduce a complete infrastructure that encourages transparency and accountability in how user data is collected and used.

Components of an Accountable System

Through years of research and exploration in this field, we have identified five major components that make up an accountable system, spanning from information generation to information usage. They are listed as follows:

  1. Data Provenance: data provenance is defined as the derivation history of a piece of data, and it is integral to the algorithms of many distributed systems. It is used frequently for trust and authority assignment; hence, handling provenance properly is an important task for an accountable system.
  2. Policy Representation: policy representation refers to the process of translating users' intents and restrictions on data usage into machine-readable form. For instance, a user should be able to easily specify a usage policy that allows his data to be used for educational purposes only (see the sketch after this list).
  3. Policy Reasoning: policy reasoning is the process machines engage in to determine whether or not usage policies are being violated. It comprises policy interpretation and evaluation.
  4. Transparent Monitoring: transparent monitoring describes the framework that supports tracking of data usage violations and real-time feedback for users. Most importantly, it needs to provide users with possible redress actions depending on the compliance results.
  5. Contextual Privacy: contextual privacy focuses on the changing contexts surrounding each user's privacy preferences, and on how privacy can be interpreted differently under evolving circumstances.
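
To make the components above more concrete, here is a minimal Python sketch of how a provenance record, a machine-readable usage policy, and a transparent monitoring step might fit together. The structures and field names are hypothetical illustrations, not an actual DIG implementation.

    from dataclasses import dataclass, field
    from datetime import datetime

    @dataclass
    class ProvenanceRecord:
        """Derivation history of a piece of data: its source and the
        chain of records it was derived from."""
        source: str                                       # e.g. "facebook:photo/123"
        derived_from: list = field(default_factory=list)  # parent ProvenanceRecords
        created_at: datetime = field(default_factory=datetime.now)

    @dataclass
    class UsagePolicy:
        """A user's intent translated into machine-readable form,
        e.g. 'my data may be used for educational purposes only'."""
        owner: str
        allowed_purposes: set

    def audit(policy: UsagePolicy, provenance: ProvenanceRecord,
              user: str, purpose: str) -> str:
        """Transparent monitoring: record the outcome of each use so the
        owner can detect violations and seek redress."""
        verdict = "compliant" if purpose in policy.allowed_purposes else "VIOLATION"
        return (f"{provenance.created_at.isoformat()} {user} used "
                f"{provenance.source} for '{purpose}': {verdict}")

    # A photo whose owner permits educational use only.
    photo = ProvenanceRecord(source="facebook:photo/123")
    policy = UsagePolicy(owner="alice", allowed_purposes={"education"})
    print(audit(policy, photo, user="bob", purpose="advertising"))

In a full accountable system, every such audit entry would also carry the data's provenance chain, so that misuse of derived or aggregated data can be traced back to the original policy.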

Thus far, DIG has conducted significant research on all five components outlined above and presented a wide range of applications aimed at providing complete information accountability on the web. Details of our work can be found on our Publications page. While many of the existing problems have been identified, each component offers a wealth of future research topics that we will continue to explore.


Written by Daniela Miao (dmiao@csail.mit.edu)
Updated: February 17, 2013