Transparent Accountable Data Mining Initiative (TAMI)

The TAMI Project is creating technical, legal, and policy foundations for transparency and accountability in large-scale aggregation and inferencing across heterogeneous information systems. We are outlining an information architecture for the Web that can provide transparent access to reasoning steps taken in the course of data mining, and accountability for use of personal information as measured by compliance with rules governing data usage.

February 2008

TAMI/e2esa face-to-face [agenda][minutes]

January 2008

Meeting with iARPA. Presentations: Danny's, Lalana's, and Jim's

Submitted paper to IEEE Policy 2008 (pdf)

December 2007

Tabulator extension release includes justification UI

Download and install extension

Example justification

Began work on Reciprocal Privacy for Social Networks

November 2007

Scenario 9: MA Disability Discrimination

Detailed workthrough of scenario 9

October 2007

Scenario 0: MIT Prox Card violation

September 2007

6.898 Fall course on Accountability architectures for WWW started

Draft specification of AIR (Accountability in RDF) AIR ontology

August 2007

Decided to move to a more AMORD-like language with dependency tracking

July 2007

First draft of Rei+ ontology

Started work on privacy policy language (Rei+)

TAMI Architecture (pdf)

March 2007

Developing Policy Aware Provenance design

February 2007

Beginning work on scenarios 8, 9, 10

January 2007

TAMI/e2eSA FTF meeting

Continuing work on Scenario 6

November 2006

Scenario 4 is starting to take shape, as is the Scheme code that implements one of the reasoning engines.

October 2006

We've begun work on a Data Purpose Algebra and are continuing to work on expressing Scenario 4.

September 2006

Work continues on cwm/n3 and TMS-based reasoners, as well as Scenario 4 and the user interface of the project.

June 2006

We participated in these events:

6/1 - DHS Privacy Workshop
6/28 & 6/29 - TAMI/PORTIA Privacy & Accountability Workshop

May 2006

Integrating Cwm with Inference Web

April 2006

We've decided to produce more complex scenarios in order to test/modify our design:

Scenario 4
- Agreed to expand Scenario 3 to include more data transfers, which will allow us to work with the handshakes between
  the rules of Routine Uses (a federal agency's statement of the terms and conditions for disseminating data from a particular system) and
  the rules of a System of Records Notice (a federal agency's statement of the terms and conditions for receiving data into a particular system)
- Drafted Scenario 4
- Summarized the SORN and Routine Uses for
Scenario 5
- Agreed to expand Scenario 4 to include case law, the decisions of federal courts regarding a conflict over the interpretation or application of a law
- Drafted Scenario 5, which discusses
  - F&H Barge Corp v. D&H Corp.
  - Doe v. DiGenova
  - Cardamone v. Cohen
- Drafted some notes about working with case law
Scenario 6
- Agreed to draft scenario in which the data is mined from the internet

March 2006

We presented a paper at the AAAI Spring Symposium "The Semantic Web Meets E-Government."

The symposium
Our paper - "Transparent Accountable Data Mining: New Strategies for Privacy Protection"
Daniel J. Weitzner, Harold Abelson, Tim Berners-Lee, Chris Hanson, James Hendler, Lalana Kagal, Deborah L. McGuinness, Gerald Jay Sussman, and K. Krasnow Waterman
Our powerpoint - Presented by Deborah McGuinness

We've started producing code:

Chris' code - using RDF. Written in a new notation (NS) embedded in MIT/GNU Scheme.
deduce-1.ns

Top-level code to analyze the scenario and report results.

scenario3.ns

The hypothetical facts of Scenario 3.

sorn.ns

"Secure Flight" SORN used in Scenario 3.

tami-entities.ns

Defines government organizations and their legal authorities, databases, and employees.

tami-schema.ns

RDF ontology for TAMI generally.

sorn-schema.ns
RDF ontology for
- System of Records Notices (SORNs), the means by which federal agencies notify the public of what data is being collected for what purposes.
- Including Routine Use notices, the means by which federal agencies notify the public with whom they will share particular data for what "authorized purposes".
ns-compiler.scm
ns-runtime-1.scm
ns-runtime-2.ns

Compiler and runtime support for NS notation.
Carlos' code - using RDF and N3
- Scenario 3
- Scenario 4

February 2006

We've agreed that we have enough common understanding to work through a scenario from end to end.

Produced Scenario 3 to work from.
Produced a representation (in triples) of a lawyer's reasoning.

Fall Term 2005 - Preliminary Work

One significant challenge was the range of backgrounds necessitated by the project. In order to ensure a common knowledge base, we provided each other with overviews of relevant topics:

Structure of US federal law (including how federal agency policies are derived)
Legal Reasoning
Classical Logic
- The system P-ND
Semantic Web logic
N3
- RDF & N3
Truth Maintenance Systems
Proof Markup Language (PML)

Produced a hypothetical use case

We created a fictional scenario that addresses some of the common public concerns. It involves an airline passenger who is a potential match in the testing of the Transportation Security Administration's Secure Flight program (formerly known as CAPPS); his identity is passed to the FBI's Joint Terrorism Task Force and, ultimately, he is arrested on an outstanding warrant for unpaid child support.

The scenario will allow us to test our ability to build a system that can proofcheck the answers to two important data mining questions:

Was the agency permitted to possess/acquire the data in question?
Was the inference they drew from the data reasonable?

The scenario was built specifically to require application of rules with three increasing levels of complexity:

self-referential statute (all necessary rule information is within one document) with a near-mathematical structure (e.g., greater than, less than)
externally-referential statute (necessary rule information must be obtained from other documents)
case law (running text from which rules must be extracted)

The pieces of our planned system

XML

Based upon current government efforts, we presume that the historical log of data collection, analysis, and transfer, as well as case activities, will exist in XML. Using our hypothetical, we created

A fictional transaction log. This version assumes that the transaction is traced back through multiple agencies' records and that the relevant items were concatenated in a single file.
A "cleansed" version. This version assumes that some system restructured the data into a more organized, readable format.

Note 1: Where possible, we used the National Information Exchange Model, the joint Department of Justice and Department of Homeland Security XML for law enforcement.

Note 2: The two versions do not contain identical information. The "cleansed" view contains more of the required information.

XSLT

Next Steps:

Determine whether the scope of this project includes using XSLT to convert the NIEM XML to RDF. If so, this work must be assigned.
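
Whatever tool is chosen, the conversion itself amounts to turning each logged event into a set of triples. The following is a minimal sketch of that idea in Python rather than XSLT. The log format, the element names, and the http://example.org/tami# namespace are assumptions made for illustration; they are not NIEM and not the project's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical log format: element and attribute names are invented for
# illustration and do not follow NIEM.
SAMPLE_LOG = """
<log>
  <event id="e1">
    <agency>TSA</agency>
    <action>transfer</action>
    <recipient>FBI</recipient>
  </event>
  <event id="e2">
    <agency>FBI</agency>
    <action>arrest</action>
  </event>
</log>
"""

BASE = "http://example.org/tami#"  # placeholder namespace (assumption)


def log_to_triples(xml_text):
    """Return one (subject, predicate, object) triple per child element of each event."""
    root = ET.fromstring(xml_text)
    triples = []
    for event in root.findall("event"):
        subject = BASE + event.get("id")
        for child in event:
            triples.append((subject, BASE + child.tag, child.text.strip()))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every object as a plain string literal."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(log_to_triples(SAMPLE_LOG)))
```

Emitting N-Triples keeps the sketch free of third-party libraries; a real conversion would also need to decide which values become resources rather than literals.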

RDF

We expect that the transactional data will be processed as RDF. A volunteer has produced an RDF version of the transaction data.

Chris Hanson has generated a skeletal RDF Schema for SORN documents and has used that vocabulary to create an example SORN. This uses an updated RDF/XML version of the above transaction data. K is drawing an RDF graph to demonstrate the SORN Schema.

N3

We are operating on the assumption that the rules should be expressed in N3. This required us to build quite a bit of common understanding about how to convert law to rules to N3. This appears to be an iterative process. So far, we have:

Produced rules in N3 for the "Deadbeat Dad" statute

Note: These rules will not answer either of our two goal questions, but this was an important first step in determining that we could convert any law into N3.

Produced an explanation of the Privacy Act's relevant rules for the collection of data

Note: This will create an opportunity to test a proof for our first goal - was an agency allowed to collect/acquire the data?

Produced a first draft in N3
Produced a more structured explanation of the Privacy Act's rules
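
To illustrate the target form of the law-to-rules-to-N3 conversion, here is a small Python sketch that writes a rule in N3's `{ ... } => { ... }` implication syntax. The predicate names, the example namespace, and the 5000 threshold are hypothetical; they are not the project's "Deadbeat Dad" rules and not a reading of the statute. math:greaterThan is the comparison built-in from the cwm math vocabulary.

```python
# Prefixes used by the generated N3. The empty prefix points at a placeholder
# namespace (assumption); the math prefix is the cwm built-in vocabulary.
PREFIXES = {
    "": "http://example.org/tami#",
    "math": "http://www.w3.org/2000/10/swap/math#",
}


def n3_prefixes(prefixes):
    """Render @prefix declarations, one per line."""
    return "\n".join(f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items())


def n3_rule(antecedent, consequent):
    """Format two lists of triple patterns as an N3 implication."""
    def block(patterns):
        return " ".join(f"{s} {p} {o} ." for s, p, o in patterns)

    return "{ " + block(antecedent) + " } => { " + block(consequent) + " } ."


if __name__ == "__main__":
    # Hypothetical rule: a person whose unpaid amount exceeds an
    # illustrative threshold is classified as meeting that threshold.
    rule = n3_rule(
        [("?p", ":owesSupport", "?amt"), ("?amt", "math:greaterThan", "5000")],
        [("?p", "a", ":AmountThresholdMet")],
    )
    print(n3_prefixes(PREFIXES))
    print(rule)
```

Keeping rules as data before serializing them makes it easier to revise them as the legal explanation is refined, which matters if the process is iterative.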

Next Steps:

Expand the N3 draft to match the more structured explanation of relevant Privacy Act rules.
Conduct legal research to determine what case law has been created to address the second goal - determining whether an inference was reasonable.
If there is insufficient case law, research scholarly studies?
Produce written rules regarding inferencing.
Convert rules to N3.

CWM

We have had quite a bit of discussion about which logic system is appropriate for this sort of project.

Because of our Semantic Web interests, we are focused on CWM. We have:

Applied N3 rules to facts in CWM
- The link shows a successful application of the "Deadbeat Dad" rules to facts
- The link also shows a successful application of the basic Privacy Act rules to facts
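
What applying rules to facts involves can be shown with a tiny forward-chaining sketch in Python. It is not CWM and does not use the project's N3 files; the triples and the single rule are hypothetical stand-ins for a question such as whether an agency was permitted to collect a record.

```python
def is_var(term):
    """Variables are strings that start with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match(pattern, fact, bindings):
    """Unify one triple pattern with one fact; return extended bindings or None."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def solve(patterns, facts, bindings):
    """Yield every set of bindings that satisfies all patterns against the facts."""
    if not patterns:
        yield bindings
        return
    for fact in facts:
        b = match(patterns[0], fact, bindings)
        if b is not None:
            yield from solve(patterns[1:], facts, b)


def forward_chain(facts, rules):
    """Apply (body, head) rules until no new triples appear; return all facts."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Materialize the matches before adding facts to the set.
            for b in list(solve(body, facts, {})):
                new_fact = tuple(b.get(t, t) for t in head)
                if new_fact not in facts:
                    facts.add(new_fact)
                    changed = True
    return facts


# Hypothetical facts and rule, for illustration only.
FACTS = {
    ("r1", "collectedBy", "TSA"),
    ("r1", "purpose", "passenger-screening"),
    ("TSA", "authorizedPurpose", "passenger-screening"),
}
RULES = [
    (
        [("?rec", "collectedBy", "?agency"),
         ("?rec", "purpose", "?purp"),
         ("?agency", "authorizedPurpose", "?purp")],
        ("?rec", "collection", "permitted"),
    ),
]

if __name__ == "__main__":
    for triple in sorted(forward_chain(FACTS, RULES)):
        print(triple)
```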

Next Steps:

Add the --why option, which shows a trace dump, to all CWM processes (for this project).
Direct the output of all CWM processes (for this project) to a file.
After they are produced, run the expanded N3 Privacy Act rules against the transactional RDF.
After they are produced, run the N3 inferencing rules against the transactional RDF.
Continue to discuss logic issues as we try these steps.

Truth Maintenance Systems

We are contemplating using a TMS as the storage mechanism for our proofs and/or as a deductive reasoner. We have:

Produced one sample, using the "Deadbeat Dad" rules in AMORD
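
The dependency-tracking idea behind a TMS can be sketched as follows. This is a simplified Python illustration, not AMORD and not the project's sample: each belief keeps a single justification (real truth maintenance systems allow several), and the belief and rule names in the usage are hypothetical.

```python
class JustificationStore:
    """Record, for each belief, the rule and antecedent beliefs that support it."""

    def __init__(self):
        # belief -> (rule name, tuple of antecedent beliefs)
        self.justifications = {}

    def assume(self, belief):
        """Add a premise that needs no further support."""
        self.justifications[belief] = ("premise", ())

    def derive(self, belief, rule, antecedents):
        """Add a belief supported by a rule and already-held antecedents."""
        missing = [a for a in antecedents if a not in self.justifications]
        if missing:
            raise ValueError(f"unsupported antecedents: {missing}")
        self.justifications[belief] = (rule, tuple(antecedents))

    def explain(self, belief, depth=0):
        """Return an indented proof trace for a belief, one line per step."""
        rule, antecedents = self.justifications[belief]
        lines = ["  " * depth + f"{belief}  [{rule}]"]
        for a in antecedents:
            lines.extend(self.explain(a, depth + 1))
        return lines

    def retract(self, belief):
        """Remove a belief and, recursively, every belief that depends on it."""
        self.justifications.pop(belief, None)
        dependents = [
            b for b, (_, ants) in self.justifications.items() if belief in ants
        ]
        for d in dependents:
            self.retract(d)


if __name__ == "__main__":
    store = JustificationStore()
    store.assume("record r1 was collected by agency A")
    store.assume("agency A is authorized to collect for purpose P")
    store.derive(
        "collection of r1 was permitted",
        "hypothetical-collection-rule",
        ["record r1 was collected by agency A",
         "agency A is authorized to collect for purpose P"],
    )
    print("\n".join(store.explain("collection of r1 was permitted")))
```

Because every conclusion carries its supports, the same structure serves both purposes mentioned above: it stores the proof, and retracting a premise shows which conclusions no longer hold.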

Next Steps:

Continue/complete discussions re: choosing monotonic, non-monotonic, or other logic
Decide whether to go forward with TMS comparator.

Inference Web/Proof Mark-up Language

We intend to register the question-answering systems used in TAMI with Inference Web and have those systems generate PML. We will then use Inference Web to view and manipulate explanations.

Next Steps:

Once we have a trace dump from one or two of these processes, we will use these as use cases to drive the work.
Simultaneously, we are working on getting CWM registered and generating PML. Once we have the initial registration and PML dumps, we will work on appropriate tactics for simplifying the presentation.