Transparent Accountable Data Mining

Daniel J. Weitzner & Hal Abelson
Decentralized Information Group
MIT Computer Science and Artificial Intelligence Laboratory

Deborah McGuinness
Knowledge Systems Lab
Stanford University

These slides: http://dig.csail.mit.edu/2006/Talks/0724-tami/

Overview

Why transparent, accountable systems?

General perspective of TAMI project:
- view of the relationship between law and society
- deep openness of Web-based social interactions
Privacy Challenges: Technical and legal
Accountability Design Requirements for Privacy Protection
Application of accountability to other public policy problems

Law and Society -- a pop quiz

How many of you believe you are subject to law (any law)?
How many of you follow (most) laws? [exclude speed limits]
How many of you have read all the laws to which you believe you are subject?
How many of you have been to a court of law?

General goal: Making the Web 'Policy Aware'

How will the Web finally catch up with the 'real world'? In everyday life, the vast majority of 'policy' problems get worked out without recourse to the legal system.

Design goal: instrument the Web to provide seamless social interactions that let us avoid the legal system, the way we do in the rest of life.

Global perspective: In the shift from centralized to decentralized information systems we see a general trend:

ex ante policy enforcement barriers -> policy description with late binding of rules for accountability
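
One way to picture the shift is a system that records each use of data without blocking it, and only later checks the record against whatever rules apply. The following is a minimal sketch of that idea, not the TAMI design: `UseEvent`, `UsageLog`, and the sample rule are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class UseEvent:
    """One recorded use of a data item (hypothetical record shape)."""
    actor: str
    data_item: str
    purpose: str


# A rule is a predicate over a logged event: True means the use is permitted.
Rule = Callable[[UseEvent], bool]


class UsageLog:
    """Records uses as they happen; rules are supplied only at audit time."""

    def __init__(self) -> None:
        self._events: List[UseEvent] = []

    def record(self, event: UseEvent) -> None:
        # No ex ante barrier: the use proceeds and is simply recorded.
        self._events.append(event)

    def audit(self, rules: List[Rule]) -> List[UseEvent]:
        # Late binding: the rules are chosen here, not at access time.
        return [e for e in self._events if not all(rule(e) for rule in rules)]


def no_marketing_use_of_health_data(event: UseEvent) -> bool:
    # Illustrative rule only; not taken from any actual statute.
    return not (event.data_item == "health_record" and event.purpose == "marketing")


log = UsageLog()
log.record(UseEvent("clinic", "health_record", "treatment"))
log.record(UseEvent("broker", "health_record", "marketing"))

violations = log.audit([no_marketing_use_of_health_data])
for event in violations:
    print(f"{event.actor} used {event.data_item} for {event.purpose}")
```

Because the rules are passed to `audit` rather than baked into `record`, the same log can be re-checked when the rules change, which is the point of late binding.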

B. Privacy Challenges -- then and now

B. Privacy challenges in decentralized systems

Less worry about collection

More worry about

aggregation
integration
wide-ranging inferencing

Property (intellectual)

Departure from Hollywood content (centralized production) -> Blogs, Flickr and LiveJournal (decentralized content we all make)

Property (intellectual)

Move from up-front enforcement barriers (DRM) -> open description of licensing terms (CC) with after-the-fact enforcement as needed

C. Privacy: the dilemma of consent

Can consent model (EU opt-in or US opt-out) be effective going forward?

Key will be purpose limitation, but we have a dilemma...

narrow purpose definition -> lots of choices = large time investment yields privacy protection but low flexibility
broad purpose definition -> few choices = small time investment required but less privacy protection

Dilemma: limited individual and regulatory capacity to control escalating data uses.

Result of consent dilemma + increased inference power: strict about what's collected but loose about usage

Collection Limitation -> Use Limitation

We're at the wrong end of the privacy spectrum and seeking the wrong results:

privacy today

Collection Limitation -> Use Limitation

Still suboptimal control point:

privacy goal for some

Collection Limitation -> Use Limitation

This is where we should be:

privacy goal for some

Collection Limitation -> Use Limitation

Why?

Rules express core values!!
Better allocation of individual and regulatory effort
Often the only logical evaluation point

Other uses of accountable systems

Health Privacy
Credit Reporting
Copyright management (DRM alternative)

Toward Transparent, Accountable Systems

How?

Laws and other social rules:

specify permissible and impermissible uses
require proof of permissible use along with adverse action in high-sensitivity situations (government access, health records, ...)
learn how to match expressivity of legal rules with inferencing power

Systems:

formal specification of rules over data (working on a purpose algebra)
tools to explore accountability to rules upon 'adverse action' (Hal Abelson)
generate and check proof of compliance (Deb McGuinness)
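
The slides do not define the purpose algebra, so the following is only a rough sketch of what rules over data plus a checkable compliance record might look like. It assumes purposes form a simple hierarchy in which a narrower purpose is covered by any broader permitted one; the hierarchy, the `Rule` and `Justification` types, and the `check_use` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical purpose hierarchy: each purpose maps to its broader parent.
PURPOSE_PARENT: Dict[str, Optional[str]] = {
    "any": None,
    "law_enforcement": "any",
    "counter_terrorism": "law_enforcement",
    "commercial": "any",
    "marketing": "commercial",
}


def ancestors(purpose: str) -> List[str]:
    """Return the purpose followed by each broader purpose up to the root."""
    chain: List[str] = []
    current: Optional[str] = purpose
    while current is not None:
        chain.append(current)
        current = PURPOSE_PARENT[current]
    return chain


@dataclass(frozen=True)
class Rule:
    """Permitted purposes for one category of data."""
    data_category: str
    permitted_purposes: FrozenSet[str]


@dataclass
class Justification:
    """Outcome of a check plus the steps that led to it."""
    compliant: bool
    steps: List[str]


def check_use(rule: Rule, data_category: str, purpose: str) -> Justification:
    steps = [f"use of '{data_category}' for purpose '{purpose}'"]
    if data_category != rule.data_category:
        # In this sketch, a rule that does not cover the data justifies nothing.
        steps.append(f"rule covers '{rule.data_category}', not '{data_category}'")
        return Justification(False, steps)
    for candidate in ancestors(purpose):
        if candidate in rule.permitted_purposes:
            steps.append(f"'{purpose}' falls under permitted purpose '{candidate}'")
            return Justification(True, steps)
    steps.append("no permitted purpose covers this use")
    return Justification(False, steps)


rule = Rule("passenger_record", frozenset({"law_enforcement"}))
ok = check_use(rule, "passenger_record", "counter_terrorism")
bad = check_use(rule, "passenger_record", "marketing")

for result in (ok, bad):
    print(result.compliant, "|", "; ".join(result.steps))
```

The `steps` list stands in for a proof of compliance: a record that can be inspected or re-checked when an adverse action is challenged. A real system would need a far richer rule language and proof format than this.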

Links and Acknowledgements

For more information see:

MIT Decentralized Information Group: http://dig.csail.mit.edu/
Transparent, Accountable Datamining Initiative (TAMI): http://groups.csail.mit.edu/dig/TAMI
Policy Aware Web project (PAW): http://www.policyawareweb.org/

Work described here is supported by the US National Science Foundation Cybertrust Program (05-518) and ITR Program (04-012).

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 License.