Addressing and Identifying Privacy Leakage from Query Logs:
An Accountability Approach





Oshani Seneviratne




11 June, 2009


Decentralized Information Group
MIT Computer Science and Artificial Intelligence Laboratory

About DIG

Problem Description - Privacy of Query Logs

Motivating Scenario

Possible Mechanisms for Privacy Protection

  1. Secrecy or Information Hiding
  2. Information Accountability

1. Information Hiding
(Privacy Sensitive Data Analysis)

Goal: Construct a database protocol that limits information access according to a formal definition of privacy

Privacy Definition: Indistinguishability of the individual from the community

Method: Measure the epsilon-indistinguishability of a database query transcript

Cynthia Dwork's Differential Privacy
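
A minimal sketch of how a single query could be answered under epsilon-differential privacy, assuming the Laplace mechanism applied to a counting query. The function names and the sample records are illustrative assumptions and do not come from the slides.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Answer a counting query with noise calibrated to epsilon.

    Adding or removing one individual changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical records: (name, age)
records = [("alice", 34), ("bob", 17), ("carol", 52), ("dave", 29)]
rng = random.Random(0)
noisy = private_count(records, lambda r: r[1] > 18, epsilon=0.5, rng=rng)
print(round(noisy, 2))  # the true count (3) plus random noise
```

A smaller epsilon means more noise and stronger indistinguishability; a larger epsilon means a more accurate answer and weaker protection.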

See Also: Latanya Sweeney's k-anonymity work
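
For comparison, a small sketch of a k-anonymity check: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The column names and rows are hypothetical.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical, already generalized records
rows = [
    {"zip": "021**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "021**", "age": "30-39", "diagnosis": "asthma"},
    {"zip": "021**", "age": "40-49", "diagnosis": "flu"},
    {"zip": "021**", "age": "40-49", "diagnosis": "diabetes"},
]
print(is_k_anonymous(rows, ["zip", "age"], k=2))  # True
print(is_k_anonymous(rows, ["zip", "age"], k=3))  # False
```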

Privacy-Safe Zone: Privacy-sensitive data mining establishes a boundary which, if respected, assures no privacy risk to the individual.

2. Information Accountability
(An Alternative to Secrecy)

Information Accountability: When information has been used, it should be possible to determine what happened, and to pinpoint use that is inappropriate

A Sample Regulatory Paradigm for Semantic Web Data

United States Fair Credit Reporting Act

Enabling Information Accountability

Policy Awareness is a property of Information Systems:

Components of our Accountability Framework

  1. Expressive Policy Language
  2. Reasoner
  3. Visualization Tool
  4. Usage-Aware Querying
  5. Checking Policy Compliance of Queries
  6. Policy Violations Validator
  7. Semantic Clipboard

1. Expressive Policy Language

AIR enables Dependency Tracking for Policy Explanations:
  • Dependencies are the specific set of premises from which a conclusion or policy decision was derived; this set is an effective explanation for the conclusion
  • Dependency tracking is the process of maintaining dependency sets for derived conclusions
  • We use a Truth Maintenance System (TMS) to track the dependencies of conclusions:
    • Keeps track of the logical structure of a derivation
    • Associates dependencies with each fact in the KB
    • Can assume and retract hypothetical premises
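
The bullets above can be illustrated with a toy dependency tracker. This is a sketch under simplifying assumptions (one justification per derived fact, facts as plain strings), not the TMS used by the AIR reasoner; the class and method names are hypothetical.

```python
class DependencyTracker:
    """Toy dependency tracker: premises plus one justification per derived fact."""

    def __init__(self):
        self.premises = set()
        self.justifications = {}  # derived fact -> (rule label, antecedent facts)

    def assume(self, fact):
        self.premises.add(fact)

    def holds(self, fact):
        return fact in self.premises or fact in self.justifications

    def derive(self, fact, rule, antecedents):
        missing = [a for a in antecedents if not self.holds(a)]
        if missing:
            raise ValueError(f"unsupported antecedents: {missing}")
        self.justifications[fact] = (rule, tuple(antecedents))

    def dependencies(self, fact):
        """Return the premises that the fact ultimately rests on."""
        if fact in self.premises:
            return {fact}
        _, antecedents = self.justifications[fact]
        deps = set()
        for antecedent in antecedents:
            deps |= self.dependencies(antecedent)
        return deps

    def retract(self, premise):
        """Remove a premise and every derived fact that loses its support."""
        self.premises.discard(premise)
        changed = True
        while changed:
            changed = False
            for fact, (_, antecedents) in list(self.justifications.items()):
                if not all(self.holds(a) for a in antecedents):
                    del self.justifications[fact]
                    changed = True

tms = DependencyTracker()
tms.assume("query mentions ssn")
tms.assume("ssn is restricted")
tms.derive("query is non-compliant", "SSN rule",
           ["query mentions ssn", "ssn is restricted"])
print(sorted(tms.dependencies("query is non-compliant")))
tms.retract("ssn is restricted")
print(tms.holds("query is non-compliant"))  # False
```

The dependency set returned for a conclusion is what a justification interface would present as the explanation.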

Example of dependencies

1. Expressive Policy Language

AIR Specification

  • AIR policies are written in the N3 serialization of RDF
  • Each AIR policy consists of one or more rules
    policy = { rule }
  • A rule consists of a pattern that, when matched, causes an action to fire. A description is optional.
    rule = { pattern, action [ description ]}
  • An action can either be an assertion, which is a set of facts that are added to the knowledge base or a nested rule
    action = { [ assert | assertion ] | rule }
    :Policy1 a air:Policy;
        air:rule [
            air:pattern { ... };
            air:assert { ... };
            air:rule [ ... ]
        ].
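
The policy / rule / action structure can be mimicked in a few lines of Python. This is only a sketch: patterns here are ground triples with no variables, unlike AIR patterns, and the `Rule` class and `apply_rules` function are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    pattern: frozenset                           # triples that must all be in the KB
    assertions: frozenset = frozenset()          # triples added when the pattern matches
    nested: list = field(default_factory=list)   # rules tried only after a match
    description: str = ""

def apply_rules(rules, kb):
    """Fire each rule whose pattern is satisfied; return descriptions of fired rules."""
    fired = []
    for rule in rules:
        if rule.pattern <= kb:
            kb.update(rule.assertions)
            fired.append(rule.description)
            fired.extend(apply_rules(rule.nested, kb))
    return fired

kb = {(":q1", "a", "s:Select"), (":q1", "mentions", "ex:ssn")}
policy = [Rule(
    pattern=frozenset({(":q1", "a", "s:Select")}),
    description="q1 is a SELECT query",
    nested=[Rule(
        pattern=frozenset({(":q1", "mentions", "ex:ssn")}),
        assertions=frozenset({(":q1", "air:non-compliant-with", ":SSNPolicy")}),
        description="q1 mentions SSN",
    )],
)]
print(apply_rules(policy, kb))  # ['q1 is a SELECT query', 'q1 mentions SSN']
```

The nested rule is only considered once its parent's pattern has matched, which mirrors how an `air:rule` inside a rule narrows the condition step by step.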

1. Expressive Policy Language

AIR Specification

@forAll :VAR1, :VAR2.

:Policy2 a air:Policy;
    air:rule [
        air:pattern { @forSome :VAR3 . :VAR3 a air:Policy; air:rule :VAR2 . };
        air:assertion { ... };
        air:description (:VAR1 " is a variable that is declared in :Policy2 and " :VAR2
                         " is a variable that is declared in this rule");
        air:rule [ ... ]
    ].

2. Reasoner

AIR Reasoner:

3. Visualization Tool

Justification UI

4. Usage-Aware Querying

  • SPARQL DB server specifies usage restrictions on data
    • DIG members and friends of DIG members may use this data for research purposes but not for marketing
    • CSAIL members may view this data but may not use it for research
  • Access control via OpenID
  • Policies in AIR
  • Query results annotated with usage restrictions
  • Queries themselves are logged in the DB and can be audited
Architecture of Usage-Aware querying
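
A rough sketch of the idea, assuming an in-memory store rather than a SPARQL server: every query is logged for later audit, and results come back annotated with the usage restrictions that apply to the requester's group. The group labels, class name, and field names are hypothetical.

```python
from datetime import datetime, timezone

# Usage restrictions from the bullets above, keyed by hypothetical group labels
USAGE_POLICY = {
    "dig":        {"permitted": ["research"], "prohibited": ["marketing"]},
    "dig-friend": {"permitted": ["research"], "prohibited": ["marketing"]},
    "csail":      {"permitted": ["view"],     "prohibited": ["research"]},
}

class UsageAwareStore:
    def __init__(self, rows):
        self.rows = rows
        self.query_log = []  # audit trail of every accepted query

    def query(self, user, group, query_text, predicate):
        if group not in USAGE_POLICY:
            raise PermissionError(f"{user} belongs to no authorized group")
        self.query_log.append({
            "user": user,
            "group": group,
            "query": query_text,
            "time": datetime.now(timezone.utc).isoformat(),
        })
        results = [row for row in self.rows if predicate(row)]
        # Results are returned together with the restrictions on their use
        return {"results": results, "usage": USAGE_POLICY[group]}

store = UsageAwareStore([{"name": "alice", "age": 34}, {"name": "bob", "age": 17}])
answer = store.query("carol", "csail", "age > 18", lambda row: row["age"] > 18)
print(answer["results"])  # [{'name': 'alice', 'age': 34}]
print(answer["usage"])    # {'permitted': ['view'], 'prohibited': ['research']}
```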

5. Checking Policy Compliance of Queries

  • Policy Assurance Tool implemented with:
    • Query logger
    • AIR Policy language
    • AIR Reasoner
    • Justification User Interface
    • SPARQL converter
    • Policy Development Support
  • We can test for particular kinds of variables in parts of clauses.
  • We can make compliance decisions based on logical constructs.
  • We can reason over a user's query history.
  • There is enough expressivity to encode interesting policies.

Architecture for policy compliance

Example: SSN Restriction Policy

An example of a simple policy:
"If an SSN is referred to in the query, either as the requested value or just to filter the data, the query is non-compliant."

Creating the SSN policy in AIR

SSN policy in AIR: the WHERE clause rule

First, checking that it is in fact a SPARQL query
:SSN_RULE1 a air:BeliefRule;
    air:label "SSN policy rule 1";
    air:description (:Q " is a SPARQL query with a WHERE clause.");
    air:pattern {
       :Q a s:Select;
       s:POSList :P;
       s:WhereClause :W.
   };
  
Checking if SSN is mentioned directly in the WHERE clause
:SSN_RULE4 a air:BeliefRule;
    air:label "SSN policy rule 4";
    air:description ("The query, " :Q ", includes a reference to an SSN in the WHERE clause");
    air:pattern {
        :P s:variable :V.
        :W s:TriplePattern :T.
        :T log:includes { :X <http://example.com/ssn> :Y }
    };
    air:assert { :Q air:non-compliant-with :SSNPolicy };
  

SSN policy in AIR: OPTIONAL and FILTER

Checking if SSN is mentioned in the OPTIONAL part of a WHERE clause
:SSN_OP02 a air:BeliefRule;
    air:label "SSN optional clause rule 02";
    air:pattern {
        :W s:OptionalGraphPattern :O.
        :O s:TriplePattern :T.
        :T log:includes { [] <http://example.com/ssn> [] }
    };
    air:description ("The query, " :Q ", includes a reference to an SSN in the OPTIONAL part of the WHERE clause");
    air:assert { :Q air:non-compliant-with :SSNPolicy_OptionalClause }.
... or used as a FILTER
:SSN_FR02 a air:BeliefRule;
    air:label "SSN filter rule 02";
    air:pattern {
        :P s:variable :V.
        :W s:TriplePattern :T.
        :T log:includes { :X <http://example.com/ssn> :V }.
        :W s:Filter :F. 
        :F s:TriplePattern :S.
        :S log:includes { :V [] [] }.
    };
    air:description ("The query, " :Q  ", filters on SSN variables"); 
    air:assert { :Q air:non-compliant-with :SSNPolicy_FilterRule }.

SPARQL Query that violates the SSN Restriction Policy

Return a table of the SSN and OpenID for all entities over age 18

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?id ?n WHERE {
  ?s <http://example.com/ssn> ?n.
  ?s foaf:age ?a.
  ?s foaf:openid ?id.
  FILTER (?a > 18)
}
  

Creating the Log: SPARQL Query To N3

The AIR reasoner cannot understand SPARQL, but it can understand N3, a human-readable representation of RDF. We provide an automated tool, sparql2n3, to convert from SPARQL to N3.
$ ./sparql2n3 query1.rq
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix s: <http://dig.csail.mit.edu/2009/IARPA-PIR/sparql#> .
@prefix : <http://dig.csail.mit.edu/2009/IARPA-PIR/query1#> .

:Query a s:Select;
   s:cardinality :ALL;
   s:POSList [
     s:variable :s;
     s:variable :id;
     s:variable :n;
   ];
   s:WhereClause :WHERE.

:WHERE a s:DefaultGraphPattern;
  s:TriplePattern  { :s <http://example.com/ssn> :n };
  s:TriplePattern  { :s <http://xmlns.com/foaf/0.1/age> :a };
  s:TriplePattern  { :s <http://xmlns.com/foaf/0.1/openid> :id };
  s:Filter [ 
      a s:ComparatorExpression;
      s:TriplePattern  { :a s:BooleanGT "18 "^^xsd:integer};
];
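
To make the effect of the SSN rules concrete, here is a Python sketch that applies the same three checks (WHERE clause, OPTIONAL clause, FILTER on an SSN variable) to a dictionary that mirrors the structure of the N3 above. The dictionary layout and the function name are assumptions for illustration; the AIR reasoner works on the N3 itself.

```python
SSN = "http://example.com/ssn"

# Hand-written mirror of the converted query: variables are bare names
query = {
    "select": ["s", "id", "n"],
    "where": [
        ("s", SSN, "n"),
        ("s", "http://xmlns.com/foaf/0.1/age", "a"),
        ("s", "http://xmlns.com/foaf/0.1/openid", "id"),
    ],
    "optional": [],
    "filter": [("a", "BooleanGT", 18)],
}

def ssn_violations(q):
    """Return the names of the SSN policies the query does not comply with."""
    violations = []
    # SSN predicate used directly in the WHERE clause
    if any(p == SSN for _, p, _ in q["where"]):
        violations.append("SSNPolicy")
    # SSN predicate used in the OPTIONAL part
    if any(p == SSN for _, p, _ in q["optional"]):
        violations.append("SSNPolicy_OptionalClause")
    # A variable bound to an SSN value is used as the subject of a FILTER expression
    ssn_vars = {o for _, p, o in q["where"] if p == SSN}
    if any(subject in ssn_vars for subject, _, _ in q["filter"]):
        violations.append("SSNPolicy_FilterRule")
    return violations

print(ssn_violations(query))  # ['SSNPolicy']
```

The filter check stays silent here because the query filters on the age variable, not on the variable bound to the SSN.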
  

Justification UI Demo




Logical Constructs

Many policies build upon basic constructs. We can generalize to create policy "templates".
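
One way to picture such a template, sketched in Python rather than AIR: a "restricted predicate" template takes a predicate URI and a policy name and yields a checker for that specific policy. The function name and the `PhonePolicy` example are hypothetical.

```python
def restricted_predicate_policy(predicate_uri, policy_name):
    """Build a checker that flags any query whose triple patterns use predicate_uri."""
    def check(triple_patterns):
        hit = any(p == predicate_uri for _, p, _ in triple_patterns)
        return policy_name if hit else None
    return check

# Two policies instantiated from the same template
ssn_policy = restricted_predicate_policy("http://example.com/ssn", "SSNPolicy")
phone_policy = restricted_predicate_policy("http://xmlns.com/foaf/0.1/phone", "PhonePolicy")

patterns = [
    ("s", "http://example.com/ssn", "n"),
    ("s", "http://xmlns.com/foaf/0.1/age", "a"),
]
print(ssn_policy(patterns))    # SSNPolicy
print(phone_policy(patterns))  # None
```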

6. Policy Violations Validator

Attribution License Violations Validator for Flickr Images

7. Semantic Clipboard

CC License Use and Restriction categories mapped in the menu selection.

Tool tip text displays whether the image can be copied or not.
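
A sketch of the kind of lookup that could sit behind such a tool tip, assuming the six standard Creative Commons license codes. The table layout, function name, and message wording are hypothetical and are not taken from the Semantic Clipboard implementation.

```python
# License code -> (commercial use allowed, derivatives allowed, share-alike required)
CC_LICENSES = {
    "by":       (True,  True,  False),
    "by-sa":    (True,  True,  True),
    "by-nd":    (True,  False, False),
    "by-nc":    (False, True,  False),
    "by-nc-sa": (False, True,  True),
    "by-nc-nd": (False, False, False),
}

def tooltip(license_code, commercial=False, derivative=False):
    """Return tool tip text for the intended use of an image."""
    commercial_ok, derivatives_ok, share_alike = CC_LICENSES[license_code]
    if commercial and not commercial_ok:
        return "Cannot copy: commercial use is not permitted"
    if derivative and not derivatives_ok:
        return "Cannot copy: derivative works are not permitted"
    note = "; share derivatives under the same license" if derivative and share_alike else ""
    return "Can copy with attribution" + note

print(tooltip("by-nc", commercial=True))
print(tooltip("by-sa", derivative=True))
```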

7. Semantic Clipboard



Right-click on the image with RDFa

The Quest for Assets
The Good, The Bad and The Wanted

References