About DIG
- We work on technical, institutional, and public policy questions necessary to advance the development of global, decentralized information environments.
- The following people are/were involved in this project at DIG:
Problem Description - Privacy of Query Logs
- Lots of user data could be collected from Search Engine query logs
- Query logs alone do not lead to leakage of private information
or to identifying individual searchers
- Sensitive information can be uncovered by
- Cross-referencing logs with other data (public or otherwise known)
- Using data-mining algorithms to find patterns in these logs
Possible Mechanisms for Privacy Protection
- Secrecy or Information Hiding
- Information Accountability
1. Information Hiding
(Privacy Sensitive Data Analysis)
Goal: Construct a database protocol that limits information access according to a formal definition of privacy
Privacy Definition: Indistinguishability of the individual from the community
Method: Measure the epsilon-indistinguishability of a database query transcript
Cynthia Dwork's Differential Privacy
See Also: Latanya Sweeney's k-anonymity work
Privacy-Safe Zone: Privacy sensitive data mining establishes a boundary,
which, if respected, assures no privacy risk to the individual.
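The epsilon-indistinguishability idea above can be illustrated with the Laplace mechanism from the differential privacy literature. This is a minimal sketch, not the protocol used in this project: the function name `noisy_count`, the sample log, and the chosen epsilon are all assumptions made for illustration.

```python
import random


def noisy_count(records, predicate, epsilon, rng):
    """Answer a counting query with Laplace noise calibrated to epsilon.

    Adding or removing one individual changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two independent exponential draws with rate
    # epsilon is Laplace-distributed with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


# Hypothetical query log: (user_id, query_text) pairs.
log = [(1, "flu symptoms"), (2, "weather"), (3, "flu vaccine"), (4, "news")]
rng = random.Random(0)
answer = noisy_count(log, lambda r: "flu" in r[1], epsilon=0.5, rng=rng)
```

A smaller epsilon adds more noise, which makes any one individual harder to distinguish from the community at the cost of less accurate answers.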
2. Information Accountability
(An Alternative to Secrecy)
Information Accountability: When information has been used, it should be possible to determine what happened, and to pinpoint use that is inappropriate
- Rules and law should govern how information is used: "It is
illegal to consider health status of applicant or her family in
hiring decisions"
- Interactions with data are logged to make machine-assisted, human-driven accountability possible
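A minimal sketch of the idea, assuming a usage log made of simple events and a rule table keyed on (data category, purpose). The `UsageEvent` class, the category labels, and `find_inappropriate_uses` are hypothetical names, not part of the project's tooling; the flagged events would still go to a human for review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    actor: str
    data_category: str  # e.g. "health-status"
    purpose: str        # e.g. "hiring-decision"


# Each rule names a (data category, purpose) pair that is not allowed.
PROHIBITED_USES = {
    ("health-status", "hiring-decision"),
}


def find_inappropriate_uses(usage_log):
    """Return the logged events whose use of data breaks a rule."""
    return [
        event for event in usage_log
        if (event.data_category, event.purpose) in PROHIBITED_USES
    ]


usage_log = [
    UsageEvent("hr-system", "employment-history", "hiring-decision"),
    UsageEvent("hr-system", "health-status", "hiring-decision"),
    UsageEvent("clinic", "health-status", "treatment"),
]
flagged = find_inappropriate_uses(usage_log)
```

Note that nothing is blocked at collection time: the data is used, the use is logged, and the inappropriate use is pinpointed afterwards.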
A Sample Regulatory Paradigm for Semantic Web Data
United States Fair Credit Reporting Act
- Nearly unlimited information collection
- Unlimited analysis
- Strict usage limits
- Harsh penalties for misuse
- Feedback loop to ensure accuracy
Enabling Information Accountability
Policy Awareness is a property of Information Systems:
- Accessible and Understandable views of the policies associated with information
- Machine-readable representations of policies
- Enables Accountability when rules are broken
Components of our Accountability Framework
- Expressive Policy Language
- Reasoner
- Visualization Tool
- Usage-Aware Querying
- Checking Policy Compliance of Queries
- Policy Violations Validator
- Semantic Clipboard
1. Expressive Policy Language
- AIR (Accountability In RDF) Policy Language
- Features:
- Rule-based policy language for accountability and access control
- Explanations for policy decisions through dependency tracking
- Customizable explanations, if required
- Grounded in Semantic Web technologies for greater
interoperability, reusability, and extensibility
AIR enables Dependency Tracking for Policy Explanations:
- The dependencies of a conclusion or policy decision (the specific set of premises from which it was derived) are an effective explanation for that conclusion
- Dependency tracking is the process of maintaining dependency sets for derived conclusions
- We use a Truth Maintenance System (TMS) to track the dependencies of conclusions:
- Keeps track of the logical structure of a derivation
- Associates dependencies with each fact in the KB
- Has ability to assume and retract hypothetical premises
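The dependency-tracking behaviour listed above can be sketched as follows. This is a toy illustration, not the TMS used by the AIR reasoner: the `DependencyTracker` class and its method names are assumptions.

```python
class DependencyTracker:
    """Stores facts together with the facts each one was derived from."""

    def __init__(self):
        self._support = {}  # fact -> set of facts it was directly derived from

    def assume(self, fact):
        """Add a premise; a premise has no supporting facts."""
        self._support[fact] = set()

    def derive(self, fact, from_facts):
        """Record a conclusion and the facts it directly depends on."""
        missing = [f for f in from_facts if f not in self._support]
        if missing:
            raise KeyError(f"unknown supporting facts: {missing}")
        self._support[fact] = set(from_facts)

    def premises(self, fact):
        """Return the set of premises that a fact ultimately rests on."""
        direct = self._support[fact]
        if not direct:
            return {fact}
        result = set()
        for supporting_fact in direct:
            result |= self.premises(supporting_fact)
        return result

    def retract(self, premise):
        """Remove a premise and every conclusion that depends on it."""
        doomed = [f for f in self._support if premise in self.premises(f)]
        for fact in doomed:
            del self._support[fact]

    def holds(self, fact):
        return fact in self._support


tms = DependencyTracker()
tms.assume("query mentions SSN")
tms.assume("policy forbids SSN")
tms.derive("query is non-compliant", ["query mentions SSN", "policy forbids SSN"])
explanation = tms.premises("query is non-compliant")
```

Here `explanation` is the dependency set of the conclusion, which is what serves as its explanation; retracting either premise also withdraws the conclusion.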
AIR Specification
- AIR policies are written in the N3 serialization of RDF
- Each AIR policy consists of one or more rules
policy = { rule }
- A rule is made up of a pattern that, when matched, causes an action to be fired. Optional: description
rule = { pattern, action [ description ] }
- An action can be either an assertion, which is a set of facts that are added to the knowledge base, or a nested rule
action = { [ assert | assertion ] | rule }
:Policy1 a air:Policy;
    air:rule [
        air:pattern { ... };
        air:assert { ... };
        air:rule [ ... ]
    ].
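The policy/rule/action structure above can be modelled in a few lines. This is a simplified sketch under stated assumptions: patterns are plain sets of ground facts (no variables), each rule is tried once in order, and the `Rule` class and `apply_policy` function are hypothetical names rather than part of AIR.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    pattern: frozenset                   # facts that must all be in the knowledge base
    assertion: frozenset = frozenset()   # facts added when the pattern matches
    nested: list = field(default_factory=list)  # rules tried only after a match
    description: str = ""


def apply_policy(rules, knowledge_base):
    """Fire every rule whose pattern matches; return (facts, descriptions)."""
    facts = set(knowledge_base)
    fired = []
    pending = list(rules)
    while pending:
        rule = pending.pop(0)
        if rule.pattern <= facts:
            facts |= rule.assertion
            if rule.description:
                fired.append(rule.description)
            # Nested rules become active only once their parent has matched.
            pending.extend(rule.nested)
    return facts, fired


inner = Rule(
    pattern=frozenset({("q1", "mentions", "ssn")}),
    assertion=frozenset({("q1", "non-compliant-with", "SSNPolicy")}),
    description="q1 refers to SSN",
)
outer = Rule(
    pattern=frozenset({("q1", "type", "Select")}),
    nested=[inner],
    description="q1 is a SELECT query",
)
facts, fired = apply_policy(
    [outer], {("q1", "type", "Select"), ("q1", "mentions", "ssn")}
)
```

The list of fired descriptions plays the role of the optional rule description: it records which rules contributed to the decision.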
AIR Specification
- Variables (N3 Quantification)
- use N3 syntax to quantify variables (@forAll, @forSome)
- used to declare universal/existential variables that can be used inside patterns
- variables are scoped to the file and identified by a unique URI. Two variables with the same URI are the same variable. If the variable is bound before a rule is invoked, it is passed as a constant
- Rule descriptions (air:description)
- list of variables and strings that are put together to provide the natural-language description
@forAll :VAR1, :VAR2.
:Policy2 a air:Policy;
    air:rule [
        air:pattern { @forSome :VAR3 . :VAR3 a air:Policy; air:rule :VAR2 . };
        air:assertion { ... };
        air:description (:VAR1 " is a variable that is declared in :Policy2 and " :VAR2
            " is a variable that is declared in this rule");
        air:rule [ ... ]
    ].
2. Reasoner
AIR Reasoner:
3. Visualization Tool
Justification UI
- Accepts reasoning results
- Allows exploration of graphical display
- Supports 3 views:
- Textual / N3 View
- Explanation View
- Lawyer View
- Included in the Tabulator Firefox Extension: http://dig.csail.mit.edu/2007/tab/
4. Usage-Aware Querying
- SPARQL DB server specifies usage restrictions on data
- DIG members and friends of DIG members may use this data
for research purposes but not for marketing
- CSAIL members may view this data but may not use it for
research
- Access control via OpenID
- Policies in AIR
- Query results annotated with usage restrictions
- Queries themselves are logged in the DB and can be
audited
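A minimal sketch of annotating query results with usage restrictions and logging the query for audit. The group and purpose labels, the `run_query` function, and the assumption that every listed group may at least view the data are illustrative; the actual server expresses these policies in AIR.

```python
ALL_PURPOSES = {"view", "research", "marketing"}

# Purposes each group may use the data for, following the two example
# restrictions above. Labels are hypothetical.
ALLOWED_PURPOSES = {
    "dig-member": {"view", "research"},
    "friend-of-dig-member": {"view", "research"},
    "csail-member": {"view"},
}

query_log = []  # every query is recorded so it can be audited later


def run_query(group, query_text, rows):
    """Log the query, then attach usage restrictions to each result row."""
    query_log.append({"group": group, "query": query_text})
    allowed = ALLOWED_PURPOSES.get(group, set())
    return [
        {
            "row": row,
            "allowed": sorted(allowed),
            "prohibited": sorted(ALL_PURPOSES - allowed),
        }
        for row in rows
    ]


results = run_query("csail-member", "SELECT ?name WHERE { ... }", [{"name": "Alice"}])
```

The restrictions travel with the results instead of blocking access, so later use can be checked against them.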
5. Checking Policy Compliance of Queries
- Policy Assurance Tool implemented with:
- Query logger
- AIR Policy language
- AIR Reasoner
- Justification User Interface
- SPARQL converter
- Policy Development Support
- We can test for particular kinds of variables in parts of clauses.
- We can make compliance decisions based on logical constructs.
- We can reason over a user's query history.
- There is enough expressivity to encode interesting policies.
Example: SSN Restriction Policy
An example of a simple policy:
"If an SSN is referred to in the query, either as the requested value or just to filter the data, the query is non-compliant."
- Define SSN: <http://example.com/ssn>.
- Check different parts of a query for SSN.
- Return as much compliance information as possible.
Creating the SSN policy in AIR
- What parts of a query can contain references to the SSN?
- WHERE clause in SELECT
- OPTIONAL part of a WHERE clause
- FILTER clause
- A variable that is bound to SSN elsewhere.
- ...the list continues.
- How do we create policies that catch these references?
- AIR rules implement pattern matching
- Rules can be chained to form policies
- Independent policies return information
SSN policy in AIR: the WHERE clause rule
First, checking that it is in fact a SPARQL query
:SSN_RULE1 a air:BeliefRule;
    air:label "SSN policy rule 1";
    air:description (:Q " is a SPARQL query with a WHERE clause.");
    air:pattern {
        :Q a s:Select;
            s:POSList :P;
            s:WhereClause :W.
    };
Checking if SSN is mentioned directly in the WHERE clause
:SSN_RULE4 a air:BeliefRule;
    air:label "SSN policy rule 4";
    air:description ("The query, " :Q ", includes reference to SSN number in the where clause");
    air:pattern {
        :P s:variable :V.
        :W s:TriplePattern :T.
        :T log:includes { :X <http://example.com/ssn> :Y }
    };
    air:assert { :Q air:non-compliant-with :SSNPolicy }.
SSN policy in AIR: OPTIONAL and FILTER
Checking if SSN is mentioned in the OPTIONAL part of a WHERE clause
:SSN_OP02 a air:BeliefRule;
    air:label "SSN optional clause rule 02";
    air:pattern {
        :W s:OptionalGraphPattern :O.
        :O s:TriplePattern :T.
        :T log:includes { [] <http://example.com/ssn> [] }
    };
    air:description ("The query, " :Q ", includes reference to SSN number in the OPTIONAL part of the WHERE clause");
    air:assert { :Q air:non-compliant-with :SSNPolicy_OptionalClause }.
... or used as a FILTER
:SSN_FR02 a air:BeliefRule;
    air:label "SSN filter rule 02";
    air:pattern {
        :P s:variable :V.
        :W s:TriplePattern :T.
        :T log:includes { :X <http://example.com/ssn> :V }.
        :W s:Filter :F.
        :F s:TriplePattern :S.
        :S log:includes { :V [] [] }.
    };
    air:description ("The query, " :Q ", filters on SSN variables");
    air:assert { :Q air:non-compliant-with :SSNPolicy_FilterRule }.
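The WHERE, OPTIONAL and FILTER checks above can be mirrored in ordinary code to show what each rule looks for. This is a rough sketch, not a translation of the AIR rules: the dictionary layout of the parsed query and the `check_ssn_policy` function are assumptions made for illustration.

```python
SSN = "http://example.com/ssn"


def check_ssn_policy(query):
    """Return a list of reasons the query is non-compliant (empty if none)."""
    reasons = []
    where = list(query.get("where", []))
    optional = list(query.get("optional", []))
    filter_vars = set(query.get("filter_vars", []))

    if any(predicate == SSN for _, predicate, _ in where):
        reasons.append("WHERE clause refers to SSN")
    if any(predicate == SSN for _, predicate, _ in optional):
        reasons.append("OPTIONAL clause refers to SSN")
    # Variables bound to an SSN value anywhere in the query.
    ssn_vars = {
        obj for _, predicate, obj in where + optional
        if predicate == SSN and obj.startswith("?")
    }
    if ssn_vars & filter_vars:
        reasons.append("FILTER uses a variable bound to SSN")
    return reasons


# Hypothetical parsed query: triple patterns plus the variables used in FILTERs.
query = {
    "where": [
        ("?s", SSN, "?n"),
        ("?s", "http://xmlns.com/foaf/0.1/age", "?a"),
    ],
    "optional": [],
    "filter_vars": ["?a"],
}
reasons = check_ssn_policy(query)
```

Each check reports independently, so one query can return several reasons, in the spirit of returning as much compliance information as possible.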
SPARQL Query that violates the SSN Restriction Policy
Return a table of the SSN and OpenID for all entities over age 18
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?s ?id ?n WHERE {
    ?s <http://example.com/ssn> ?n.
    ?s foaf:age ?a.
    ?s foaf:openid ?id.
    FILTER (?a > 18)
}
Creating the Log: SPARQL Query To N3
The AIR reasoner cannot understand SPARQL, but it can understand N3, a human-readable representation of RDF. We provide an automated tool, sparql2n3, to convert from SPARQL to N3.
$ ./sparql2n3 query1.rq
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix s: <http://dig.csail.mit.edu/2009/IARPA-PIR/sparql#> .
@prefix : <http://dig.csail.mit.edu/2009/IARPA-PIR/query1#> .
:Query a s:Select;
s:cardinality :ALL;
s:POSList [
s:variable :s;
s:variable :id;
s:variable :n;
];
s:WhereClause :WHERE.
:WHERE a s:DefaultGraphPattern;
s:TriplePattern { :s <http://example.com/ssn> :n };
s:TriplePattern { :s <http://xmlns.com/foaf/0.1/age> :a };
s:TriplePattern { :s <http://xmlns.com/foaf/0.1/openid> :id };
s:Filter [
a s:ComparatorExpression;
s:TriplePattern { :a s:BooleanGT "18 "^^xsd:integer};
];
Logical Constructs
Many policies build upon basic constructs.
- ~A: Restriction (not). May not view anything of type:A at all.
- e.g. the running example: "cannot query a user's SSN."
- A (+) B: Exclusion (xor). May view type:A or type:B, but never both.
- e.g. "cannot query both a user's bank account number and SSN."
- A <-> B: Inclusion (and). May only view type:A if type:B is also present.
- e.g. "can only get the photo of users over 18."
- A -> ~B: Blocking (ordered xor). Viewing type:A prevents viewing type:B.
- e.g. "cannot query for driver's license number after having queried SSN."
- Exclusion and blocking generalize to max(M,N).
- max(M,N) is defined as "may view up to M fields of N, for M <= N."
- easy to understand, difficult to program directly
- e.g. "may know up to 3 of: last name, DOB, SSN, driver's license number, bank account number."
We can generalize to create policy "templates".
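One way to picture such templates is as small functions that each return a compliance check over a user's query history. This is a sketch under assumptions: the history is modelled as an ordered list of field types already viewed, and the function names are hypothetical, not part of the AIR policies.

```python
def restriction(a):
    """~A: type A may never be viewed."""
    return lambda history: a not in history


def exclusion(a, b):
    """A (+) B: A or B may be viewed, but never both."""
    return lambda history: not (a in history and b in history)


def inclusion(a, b):
    """A <-> B: A may only be viewed if B is also present."""
    return lambda history: a not in history or b in history


def blocking(a, b):
    """A -> ~B: once A has been viewed, B may not be viewed afterwards."""
    def check(history):
        if a not in history:
            return True
        first_a = history.index(a)
        return b not in history[first_a + 1:]
    return check


def max_of(m, fields):
    """max(M,N): at most m of the n listed fields may be viewed."""
    return lambda history: len(set(fields) & set(history)) <= m


# Ordered list of field types a user has queried so far.
history = ["last-name", "ssn", "dob"]
identity_fields = ["last-name", "dob", "ssn", "license", "bank-account"]
within_limit = max_of(3, identity_fields)(history)
```

In this model, `exclusion(a, b)` gives the same answers as `max_of(1, [a, b])`, which is the sense in which exclusion generalizes to max(M,N).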
6. Policy Violations Validators
Attribution License Violations Validator for Flickr Images
7. Semantic Clipboard
- RDFa Extractor extracts all the RDFa embedded in an HTML page on Page Load.
- UI Enhancer overlays the user interface for better license awareness.
- Attribution XHTML Constructor composes attribution XHTML code snippet.
- User Interface consists of a context menu on images that can be copied to the System Clipboard.
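As a rough illustration of what an attribution constructor might produce, the sketch below builds an XHTML fragment that credits the author and links the license using Creative Commons RDFa attributes. The function name, its parameters, and the exact markup are assumptions; the snippet generated by the Semantic Clipboard itself may differ.

```python
from html import escape

CC_NS = "http://creativecommons.org/ns#"


def attribution_snippet(image_url, author_name, author_url, license_url):
    """Build an XHTML fragment that credits the author and links the license."""
    # Escape every value so it is safe inside attributes and text nodes.
    img = escape(image_url)
    name = escape(author_name)
    author = escape(author_url)
    lic = escape(license_url)
    return (
        f'<div xmlns:cc="{CC_NS}" about="{img}">'
        f'<img src="{img}" alt="" /> by '
        f'<a rel="cc:attributionURL" property="cc:attributionName" '
        f'href="{author}">{name}</a>, '
        f'<a rel="license" href="{lic}">license</a>'
        f'</div>'
    )


snippet = attribution_snippet(
    "http://example.com/photo.jpg",
    "A & B Photography",
    "http://example.com/ab",
    "http://creativecommons.org/licenses/by/3.0/",
)
```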
CC License Use and Restriction categories mapped in the menu selection.
Tool tip text displays whether the image can be copied or not.
Right Click on the Image with RDFa
The Quest for Assets
The Good, The Bad and The Wanted
- Our Research Focus is on Query Compliance with Privacy and Data Usage Policies
- The work done so far used SPARQL Queries and not Search Engine Queries
- Therefore we did not use the search query logs from Microsoft in this project