End to End Semantic Accountability (E2ESA) Deliverable: Scenario 9

Authors:: Li Ding; Lalana Kagal; K. Krasnow Waterman
Abstract:: A public report on a scenario for testing E2ESA infrastructure.
Latest version:: http://dig.csail.mit.edu/TAMI/2007/s9/deliverable
STATUS:: Working Draft, last modified on 2007-11-08.

1. Overview

1.1 Goals

Scenario 9 is a test bed for the policy language and policy-checking mechanisms of our accountability infrastructure. Its goals are to:

enumerate the relevant laws and policies, and verify whether the policy language can correctly represent them
enumerate policy-checking test cases, and verify whether our policy engine can consistently find, in the transaction logs, the uses of data that constitute a violation of law or policy

Below are several important notions used throughout this report:

transaction logs - audit logs of data-manipulation events, maintained by accountability servers (ASs)
laws and policies - the laws and policies that apply to the data, or to the data manipulation, recorded in the transaction logs
violation of laws or policies - violations may take the form of unauthorized access, dissemination, or manipulation of data, or of conclusions derived from such data

Part of our motivation is to ensure that people who suffer adverse consequences as a result of such privacy violations (ranging from the loss of an apartment or job to the loss of life or liberty) will have an aid on their path to redress. For this reason, each variation of our scenario illustrates a real-world harm that might result from violations of law and policy, and shows how this technology might help identify the system or human failure that caused it.

1.2 Requirements

The scenario should be "real" such that:
1. Events similar to those in the transaction logs can be found in real-world electronic audit systems.
2. The policy-checking process simulates the activities of real-world lawyers.
3. Policy-checking results can be consumed by judges, government managers, and lawyers.
4. There is a reasonable number of events for scalability testing.
The scenario should demonstrate complex points of failure, such that

it is not possible to avoid the failure by telling a single person or group, "Don't do x again."
In other words, the violation cannot be caught simply by stopping a single transfer or group of transfers. For example, even though actor A is entitled to access data D, it is not permissible to use D for purpose P.

The policy-checking process may need OWL DL inference over the transaction logs, the policies, or both to ensure that policy checking is complete. For example, knowing a diagnosis of Mycobacterium tuberculosis may not directly tell us whether it leads to "imminent risk of harm to the individual or others". If we use additional OWL inference over a disease ontology to infer that Mycobacterium tuberculosis is a kind of contagious disease, we can easily see the relation, because "imminent risk" is part of the definition of contagious disease.
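To make the inference step concrete, here is a minimal sketch in plain Python of transitive rdfs:subClassOf reasoning, which is only a small fragment of what an OWL DL reasoner does. The class names and axioms, including modeling "imminent risk" as a superclass of ContagiousDisease, are hypothetical illustrations and are not taken from an actual disease ontology.

```python
# Hypothetical mini-ontology: class -> set of direct superclasses (rdfs:subClassOf).
SUBCLASS_OF: dict[str, set[str]] = {
    "MycobacteriumTuberculosisInfection": {"BacterialInfection", "ContagiousDisease"},
    "BacterialInfection": {"Disease"},
    # Assumption: "imminent risk" is modeled as part of the definition of ContagiousDisease.
    "ContagiousDisease": {"PosesImminentRiskOfHarm"},
}


def superclasses(cls: str) -> set[str]:
    """Return every transitive superclass of cls (cls itself excluded)."""
    found: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def implies_imminent_risk(diagnosis: str) -> bool:
    """True if the diagnosis class is subsumed by PosesImminentRiskOfHarm."""
    return "PosesImminentRiskOfHarm" in superclasses(diagnosis)


print(implies_imminent_risk("MycobacteriumTuberculosisInfection"))  # True
print(implies_imminent_risk("BacterialInfection"))  # False
```

The diagnosis itself says nothing about risk; the relation only appears after the subclass closure is computed, which is the completeness concern raised above.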

1.3 Storyboard

Thread 1: Medicare transactions over an index case

Background: A hospital discovers that a patient has a rare form of tuberculosis (TB) that is resistant to known treatments and has a high likelihood of infecting people the patient has recently been in contact with.
Data collection: Upon receiving the index case, the CDC gathers information about the patient and the patient's contacts, acquiring data from various sources including the Web, university administrators, credit card companies, phone companies, etc.
Data mining: The CDC uses the collected information to find the subset of contacts who have a high probability of being infected.
Action: The CDC contacts the patient's contacts in the priority order given by the data mining results.

Thread 2: Business transactions that deny a service request

Background: After the CDC investigation, a phone company retains records that associate the CDC investigation with the record of each customer about whom the CDC inquired.
Data mining: Upon receiving a request for service from one such customer, a customer service specialist checks the company's database and collects information about the customer.
Action: The specialist denies the service request based on the collected information.

Thread 3: Legal investigation on adverse consequence

Background: A customer files a complaint asserting that his phone service request was denied as the result of discrimination based on his health status.
Data collection: In response to a discovery request, the phone company's lawyer queries the E2ESA system, through its interactive interface, for the relevant transaction logs; the lawyer then infers additional facts and manually finds the corresponding legal policies (e.g., HIPAA, MA disability discrimination).
Policy checking: The customer's lawyer asks the E2ESA system to apply the policies to the collected and manually inferred facts, and E2ESA finds apparent violations.
Action: The customer's lawyer files a motion for a finding in the customer's favor, presenting the policy checking results and justifications to the judge

1.4 Resources

2. Scenario Design

2.1 Notations and templates

Variations (of scenario 9)

A variation of scenario 9 can be used as a competence test for the expressiveness of the policy language or the functionality of the policy reasoner. A variation consists of three parts:

transaction logs: applicable events from scenario 9 transaction logs
laws and policies: applicable laws and policies from scenario 9 laws and policies
expected results: the expected policy language representation requirements and the expected results in determining violation of laws and policies
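A variation can be treated as a test case. The sketch below bundles the three parts and compares the violations a checker finds with the expected results. It assumes events are plain Python dictionaries and policies are functions returning the ids of violating events; both representations, and the toy checker, are hypothetical and are not the AIR format.

```python
from dataclasses import dataclass, field
from typing import Callable

Event = dict  # hypothetical: one logged event as a plain dictionary
Policy = Callable[[list[Event]], set[str]]  # hypothetical: returns ids of violating events


@dataclass
class Variation:
    name: str
    transaction_logs: list[Event]  # applicable events from the scenario 9 logs
    laws_and_policies: list[Policy]  # applicable laws and policies, as checker functions
    expected_violations: set[str] = field(default_factory=set)  # expected results

    def run(self) -> tuple[set[str], bool]:
        """Apply every policy to the logs; return (violations found, matches expectation)."""
        found: set[str] = set()
        for policy in self.laws_and_policies:
            found |= policy(self.transaction_logs)
        return found, found == self.expected_violations


def no_health_based_denial(events: list[Event]) -> set[str]:
    """Toy stand-in for MADD: flag denials that use health information."""
    return {
        e["id"]
        for e in events
        if e["type"] == "data-use-event" and e["action"] == "deny" and e["uses_health_info"]
    }


toy = Variation(
    name="toy variation",
    transaction_logs=[
        {"id": "event4", "type": "data-use-event", "action": "deny", "uses_health_info": True},
    ],
    laws_and_policies=[no_health_based_denial],
    expected_violations={"event4"},
)
print(toy.run())  # ({'event4'}, True)
```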

Events

event
- data-collection-event: party A collects and stores data M
- data-access-event: party B is able to see, but not keep, data M from party A
- data-dissemination-event: party A passes data M to party B
- data-use-event: party A uses data M to take action K (an action other than data manipulation)
- data-mining-event: party A uses data M about individual X to derive data N
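The event taxonomy above could be captured as a small data model. This is a sketch only: the field names (actor, counterparty, action, subject, derived) and the per-kind required fields are assumptions made for illustration and are not part of the E2ESA log ontology.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventKind(Enum):
    DATA_COLLECTION = "data-collection-event"        # A collects and stores M
    DATA_ACCESS = "data-access-event"                # B sees, but does not keep, M from A
    DATA_DISSEMINATION = "data-dissemination-event"  # A passes M to B
    DATA_USE = "data-use-event"                      # A uses M to take action K
    DATA_MINING = "data-mining-event"                # A uses M about X to derive N


@dataclass(frozen=True)
class Event:
    kind: EventKind
    actor: str                          # the acting party (B for data-access events)
    data: str                           # data M
    counterparty: Optional[str] = None  # the other party (A for access, B for dissemination)
    action: Optional[str] = None        # action K, for data-use events
    subject: Optional[str] = None       # individual X, for data-mining events
    derived: Optional[str] = None       # data N, for data-mining events


# Fields each kind must fill in beyond kind/actor/data (an assumption of this sketch).
REQUIRED_FIELDS = {
    EventKind.DATA_COLLECTION: (),
    EventKind.DATA_ACCESS: ("counterparty",),
    EventKind.DATA_DISSEMINATION: ("counterparty",),
    EventKind.DATA_USE: ("action",),
    EventKind.DATA_MINING: ("subject", "derived"),
}


def is_well_formed(event: Event) -> bool:
    """Check that the event carries every field its kind requires."""
    return all(getattr(event, name) is not None for name in REQUIRED_FIELDS[event.kind])


# Transaction 11: Xphone transferred Bob Same's record to the CDC.
transfer = Event(EventKind.DATA_DISSEMINATION, actor="Xphone",
                 data="Bob Same's record", counterparty="CDC")
print(is_well_formed(transfer))  # True
print(is_well_formed(Event(EventKind.DATA_USE, actor="Betty", data="Bob Same's records")))  # False
```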

Laws and Policies

rules for data-manipulation actions
- data collection
- dissemination
- data use
- data mining
rules for concept classification, e.g., which instances should be classified as "qualified handicapped individual"

2.2 Transaction Variations: types of points of failure

Variation 1. The simplest case: policy violation caused by direct use of information

In this variation, we show how one event in the transaction log (the event highlighted in red) directly violates one policy.

transaction logs
- thread 1
  - transaction 10
    - To conduct the TB investigation, the CDC queried Xphone for data about Alfred Newman
    - Xphone transferred Alfred Newman's record to the CDC
  - transaction 11
    - To conduct the TB investigation, the CDC queried Xphone for data about all people whose numbers Alfred Newman had called, including Bob Same
    - Xphone transferred Bob Same's record to the CDC
    - Xphone recorded the event of sending the CDC personal records, and the reason
- thread 2
  - transaction 15
    - event1: "Bob" requested that "Betty" (Xphone service rep) approve "install home phone for Bob"
    - event2: "Betty" queried the "Xphone database" to "find Bob Same's records"
    - event3: the "Xphone database" returned to "Betty" the records about "Bob Same"
    - event4: "Betty" refused "Bob" in reply to event1, with reason "Bob was identified by TB investigation as possible carrier".
laws and policies
- MA Disability Discrimination (MADD): no one (in MA) may use a person's health information to deny that person's benefits
  - Exception for risk of imminent harm
expected results
- violation of MADD: if there is no risk of imminent harm, Betty should not use Bob's health information to deny Bob's request for an Xphone benefit
- test whether the policy engine supports OWL inference on rdfs:subPropertyOf
test data and policy
test result

AIR can represent MADD, and the policy engine can detect the violation.

Figure 1. Violation of the Massachusetts Disability Discrimination law by one event (event4) in transaction 15 (variation 1)
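The check in this variation hinges on rdfs:subPropertyOf: the MADD rule is phrased in general terms, while the log uses more specific vocabulary. The sketch below assumes hypothetical property names (refusesServiceTo, citesTBInvestigationOf, deniesBenefitTo, usesHealthInfoOf) and applies the RDFS subproperty entailment rule (rdfs7) to a fixed point before testing the rule. It is an illustration, not how AIR or its reasoner is implemented.

```python
# Hypothetical triples (subject, predicate, object) describing event4 of transaction 15.
LOG = {
    ("event4", "refusesServiceTo", "Bob"),
    ("event4", "citesTBInvestigationOf", "Bob"),
}

# Hypothetical rdfs:subPropertyOf axioms mapping log vocabulary onto MADD vocabulary.
SUBPROPERTY_OF = {
    "refusesServiceTo": "deniesBenefitTo",
    "citesTBInvestigationOf": "usesHealthInfoOf",
}


def subproperty_closure(triples: set) -> set:
    """Apply RDFS rule rdfs7, (s p o) + (p subPropertyOf q) => (s q o), to a fixed point."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closed):
            q = SUBPROPERTY_OF.get(p)
            if q is not None and (s, q, o) not in closed:
                closed.add((s, q, o))
                changed = True
    return closed


def madd_violations(triples: set, imminent_harm: frozenset = frozenset()) -> set:
    """Events that deny X a benefit using X's health information, unless imminent harm applies."""
    facts = subproperty_closure(triples)
    return {
        s
        for s, p, o in facts
        if p == "deniesBenefitTo"
        and (s, "usesHealthInfoOf", o) in facts
        and s not in imminent_harm
    }


print(madd_violations(LOG))  # {'event4'}
print(madd_violations(LOG, imminent_harm=frozenset({"event4"})))  # set()
```

Without the closure step, the raw log never mentions deniesBenefitTo, so the rule would not fire; this is why the variation tests the engine's support for rdfs:subPropertyOf.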

Variation 2. Complicated case: testing policy representation (indirect use of information)

In this variation, we show a rule being violated by a chain of events. Betty denies the customer a benefit using records, obtained by someone else, that concern his health information. She is not permitted to use any data derived from that health information.

transaction logs
- thread 1 (same as variation 1)
- thread 2
  - transaction 15
laws and policies
- MA Disability Discrimination: no one (in MA) may use a person's health information to deny that person's benefits
- potentially some Xphone privacy policies
expected results
- violation of MADD: Bob's service request should not be refused, because one of the sources of this decision is his health information
- no violation of MADD is found without the lawyer-inferred events
- policy language SHOULD be able to represent a violation involving more than one event
test data and policy
test result:

AIR can represent MADD for indirect use of information.
The policy engine can detect the violation because it supports recursive inference.
AIR only supports blaming one event, i.e., event7; it cannot represent events 5-7 jointly violating MADD.

Figure 2. Violation of the Massachusetts Disability Discrimination law by three events (event5-7) in transaction 15 (variation 2)
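The recursive inference that detects indirect use can be sketched as a provenance walk: a decision is tainted if any data it relies on is, or is transitively derived from, health information. The item names and the split between logged and lawyer-inferred derivation edges are hypothetical; the split mirrors the expected result that no violation is found without the lawyer-inferred events.

```python
# Hypothetical derivation edges: data item -> items it was derived from.
LOGGED = {
    "Xphone note on Bob": {"CDC inquiry record for Bob"},
}
LAWYER_INFERRED = {
    "CDC inquiry record for Bob": {"Bob's TB exposure assessment"},
}
HEALTH_INFO = {"Bob's TB exposure assessment"}


def merge(*graphs: dict) -> dict:
    """Union several derivation graphs."""
    merged: dict = {}
    for graph in graphs:
        for item, sources in graph.items():
            merged.setdefault(item, set()).update(sources)
    return merged


def depends_on_health_info(item, derived_from, health_info, seen=None) -> bool:
    """Recursively check whether item is, or is derived from, health information."""
    if item in health_info:
        return True
    seen = set() if seen is None else seen
    if item in seen:  # guard against cycles in the derivation graph
        return False
    seen.add(item)
    return any(
        depends_on_health_info(src, derived_from, health_info, seen)
        for src in derived_from.get(item, ())
    )


# Hypothetically, the denial relied on "Xphone note on Bob".
print(depends_on_health_info("Xphone note on Bob", merge(LOGGED, LAWYER_INFERRED), HEALTH_INFO))  # True
print(depends_on_health_info("Xphone note on Bob", LOGGED, HEALTH_INFO))  # False
```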

Variation 3. DPA computation: data purpose inheritance

transaction logs
- thread 1 (same as variation 1)
- thread 2 (mostly the same as variation 2)
  - transaction 15
    - ...
    - (logged) event7: "Betty" refused "Bob" in-reply-to event1 with reason "Bob is blacklisted" at 10:11am to fulfill a business purpose
laws and policies
- DPA generic policy: derived data always inherits the union of the purposes of all its premises
expected results
- violation of DPA: the (observed) purpose of the data-use-event does not match the (allowed) purposes of the indirectly used information
- test whether the policy engine supports OWL inference on owl:complementOf, rdfs:subClassOf, and owl:sameAs
NOTE: we decided not to work on this variation because it is tested in scenario 4.
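Although this variation was not pursued, the DPA generic policy as stated is easy to sketch. The following assumes hypothetical data items, purposes, and a one-parent purpose hierarchy standing in for rdfs:subClassOf; owl:complementOf and owl:sameAs are not modeled. Following the stated policy, a derived item inherits the union of its premises' purposes, and a use is flagged when its observed purpose is not subsumed by any inherited (allowed) purpose.

```python
# Hypothetical purpose hierarchy standing in for rdfs:subClassOf (child -> parent).
PURPOSE_PARENT = {"TB investigation": "public health", "marketing": "business"}

# Hypothetical data: purposes attached directly, and premises of derived data.
ATTACHED = {"Bob's phone record": {"TB investigation"}}
PREMISES = {"Bob is blacklisted": ["Bob's phone record"]}


def purpose_and_ancestors(purpose: str) -> set:
    """The purpose itself plus every broader purpose above it."""
    result = {purpose}
    while purpose in PURPOSE_PARENT:
        purpose = PURPOSE_PARENT[purpose]
        result.add(purpose)
    return result


def inherited_purposes(item: str) -> set:
    """DPA generic policy as stated: union of the purposes of the item and all its premises.

    Assumes the premise graph is acyclic.
    """
    purposes = set(ATTACHED.get(item, set()))
    for premise in PREMISES.get(item, []):
        purposes |= inherited_purposes(premise)
    return purposes


def purpose_violation(item: str, observed_purpose: str) -> bool:
    """Flag a use whose observed purpose is not subsumed by any allowed purpose."""
    allowed = inherited_purposes(item)
    return allowed.isdisjoint(purpose_and_ancestors(observed_purpose))


# event7: "Betty" refused "Bob" because "Bob is blacklisted", to fulfill a business purpose.
print(purpose_violation("Bob is blacklisted", "business"))  # True
print(purpose_violation("Bob is blacklisted", "TB investigation"))  # False
```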

Variation 4. DPA computation: policy representation (policy preemption and purpose introduction)

When disclosing information, Xphone attaches the appropriate purposes to it as a result of enforcing its privacy policies.

transaction logs (same as variation 3)
laws and policies (same as variation 3)
expected violation
- introduce the appropriate purposes for the data "Bob's phone record" when sending the data to the CDC
- policy language SHOULD be able to represent the case where a data-use-event has more than one purpose
- policy language SHOULD be able to represent the Xphone Privacy Policy (rule-copy semantics)
- policy language SHOULD be able to represent policy exceptions. In this case, Verizon Privacy Principle 4 should be overridden by its exceptions
- policy language SHOULD be able to represent policy preemption. In this case, Verizon Privacy Principle 4 is overridden by the Verizon Disclosure Policy, and Electronic Communication Dissemination Rule 3 is overridden by Verizon Privacy Principle 4
test data
NOTE: this variation cannot be handled by AIR. Some kind of preemption or overriding mechanism is required.
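AIR could not express this variation, but the purpose-introduction part is simple to sketch outside it. The disclosure table and names below are hypothetical, and the check assumes that a data-use-event with several purposes is permitted only when every purpose is among those attached at disclosure. Exceptions and preemption would need a separate mechanism.

```python
from dataclasses import dataclass, field

# Hypothetical disclosure rules: (recipient, stated reason) -> purposes to attach.
DISCLOSURE_PURPOSES = {
    ("CDC", "TB investigation"): {"TB investigation"},
}


@dataclass
class Datum:
    name: str
    purposes: set = field(default_factory=set)


def disclose(datum: Datum, recipient: str, reason: str) -> Datum:
    """Return the recipient's copy, carrying the purposes the sender's policy introduces."""
    attached = DISCLOSURE_PURPOSES.get((recipient, reason), set())
    return Datum(datum.name, set(attached))


def use_permitted(datum: Datum, use_purposes: set) -> bool:
    """A data-use-event with several purposes is permitted only if all are attached."""
    return set(use_purposes) <= datum.purposes


record = disclose(Datum("Bob's phone record"), recipient="CDC", reason="TB investigation")
print(record.purposes)  # {'TB investigation'}
print(use_permitted(record, {"TB investigation"}))  # True
print(use_permitted(record, {"TB investigation", "business"}))  # False
```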

Possible future variation: Complicated case, testing policy representation (rule preemption)

In this variation, we show that an event is allowed once rule preemption is applied.

transaction
- ... see above
policy
- MA Disability Discrimination: (see above)
- FICTION RULE: anyone may use any of a person's information (including health information) to prevent future TB infection
expected violation
- Betty is allowed to use Bob's health information to deny Bob's request because FICTION RULE preempts MADD.
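One way to sketch the preemption mechanism this variation needs: collect the applicable rules, drop any rule preempted by another applicable rule, and let any remaining prohibition decide. The Rule structure, the event fields, and the stated purpose of Betty's refusal are assumptions for illustration; this is not a feature of AIR.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[dict], bool]    # does the rule cover this event?
    permits: bool                      # True = permission, False = prohibition
    preempts: frozenset = frozenset()  # names of rules this rule overrides


def decide(event: dict, rules: list) -> tuple:
    """Return (permitted, deciding rule names) after dropping preempted rules."""
    applicable = [r for r in rules if r.applies(event)]
    overridden = set().union(*(r.preempts for r in applicable))
    effective = [r for r in applicable if r.name not in overridden]
    prohibitions = [r.name for r in effective if not r.permits]
    if prohibitions:
        return False, prohibitions
    return True, [r.name for r in effective]


MADD = Rule("MADD",
            applies=lambda e: e["uses_health_info"] and e["denies_benefit"],
            permits=False)
FICTION_RULE = Rule("FICTION RULE",
                    applies=lambda e: e["uses_health_info"] and e["purpose"] == "prevent TB infection",
                    permits=True,
                    preempts=frozenset({"MADD"}))

# Betty's refusal, with a hypothetical stated purpose.
refusal = {"uses_health_info": True, "denies_benefit": True, "purpose": "prevent TB infection"}
print(decide(refusal, [MADD]))  # (False, ['MADD'])
print(decide(refusal, [MADD, FICTION_RULE]))  # (True, ['FICTION RULE'])
```

Preemption is modeled here as an explicit "preempts" relation between rule names rather than as numeric priorities, which keeps each override visible in the justification.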

Possible future variation: Complicated case, testing the connection to OWL using existing OWL inference features

In this variation, we show that the event log can be enriched, using OWL DL inference, with the ontology referenced by the transaction log.

Users may define a hierarchy of classes to classify diseases. For example, the class disease-TB may have a superclass "contagious-disease" (see a relevant ontology and NCBI page), whose members can cause imminent harm to other people. With such a class hierarchy, Xphone can justify its disclosure of customer information to the CDC.

One approach is to use owl:Restriction and DL class subsumption inference. When specifying the rule, owl:Restriction is preferred because we cannot assume in advance that class members can be fully enumerated.
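The advantage of owl:Restriction is that membership follows from an individual's properties rather than from a list. The sketch below mimics an owl:someValuesFrom restriction with hypothetical properties and individuals. Note that it is a closed-world check: OWL reasoning is open-world, so failing the test here would not, in OWL, prove that an individual is not a member.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SomeValuesFrom:
    """Mimics an owl:someValuesFrom restriction: some value of `prop` is in class `filler`."""
    prop: str
    filler: str


# Hypothetical assertions: individual -> property -> values, and value -> classes.
PROPERTY_VALUES = {
    "XDR-TB": {"transmissionMode": {"airborne"}},
    "tetanus": {"transmissionMode": {"wound contamination"}},
}
VALUE_TYPES = {
    "airborne": {"PersonToPersonTransmission"},
    "wound contamination": {"EnvironmentalTransmission"},
}

# Hypothetical definition of contagious-disease as a restriction, not an enumeration.
CONTAGIOUS_DISEASE = SomeValuesFrom("transmissionMode", "PersonToPersonTransmission")


def satisfies(individual: str, restriction: SomeValuesFrom) -> bool:
    """Closed-world check of the restriction against the asserted values only."""
    values = PROPERTY_VALUES.get(individual, {}).get(restriction.prop, set())
    return any(restriction.filler in VALUE_TYPES.get(v, set()) for v in values)


print(satisfies("XDR-TB", CONTAGIOUS_DISEASE))  # True
print(satisfies("tetanus", CONTAGIOUS_DISEASE))  # False
```

A newly logged disease is classified as soon as its transmission mode is asserted, without editing the class definition.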

Possible future variation: Complicated case, testing system scalability with the full transaction log

Possible future variation: Complicated case, testing the transaction log for information leakage

Should we prevent the CDC from disclosing TB infection information, or prevent the company from collecting such information?

Possible future variation: Complicated case, testing access control when accessing the Daisy Girl Scout database

How an RDF database can be equipped with privacy policy checking upon data access requests.
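A minimal sketch of such a guard, assuming a hypothetical in-memory triple store, a purpose-based policy function, and made-up member data; a real RDF store would enforce this inside its query engine rather than in a wrapper. Every decision is appended to an audit log, so that the access itself becomes part of the transaction log.

```python
from typing import Callable, Iterable, Optional

Triple = tuple[str, str, str]
# Hypothetical policy signature: (triple, requester, purpose) -> may the requester see it?
Policy = Callable[[Triple, str, str], bool]


class GuardedStore:
    """Hypothetical in-memory triple store that checks a privacy policy on every access."""

    def __init__(self, triples: Iterable[Triple], policy: Policy):
        self._triples = set(triples)
        self._policy = policy
        self.audit_log: list[tuple[str, str, Triple, bool]] = []

    def match(self, requester: str, purpose: str, s: Optional[str] = None,
              p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
        """Return matching triples the policy allows, logging every decision."""
        results = []
        for t in sorted(self._triples):
            if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
                allowed = self._policy(t, requester, purpose)
                self.audit_log.append((requester, purpose, t, allowed))
                if allowed:
                    results.append(t)
        return results


def phone_policy(t: Triple, requester: str, purpose: str) -> bool:
    """Hypothetical rule: phone numbers are released only for troop activities."""
    return t[1] != "phone" or purpose == "troop activity"


store = GuardedStore({("member1", "name", "Amy"), ("member1", "phone", "555-0100")}, phone_policy)
print(store.match("troop leader", "troop activity", s="member1"))
# [('member1', 'name', 'Amy'), ('member1', 'phone', '555-0100')]
print(store.match("researcher", "study"))  # [('member1', 'name', 'Amy')]
print(len(store.audit_log))  # 4
```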

3. Publication Plan

focus on the data-manipulation domain
domain ontology for logging data manipulation
- data summary (can we preserve users' privacy?): category, id
- events: data-use, data-transfer
- data purpose
applicable policies
- privacy policy
- data-use policy
- rules of evidence
- other policies
scenarios

4. Conclusions

We have developed a scenario to investigate the expressiveness of the policy language and the expected policy-checking mechanisms.

Change Log

Nov 11, minor revision according to K's input; updated variations 1 through 4.
Oct 29, added variation details; added example data and corresponding policies.
Oct 14, added the scenario design (transaction log) section; enumerated the variations for requirement testing.
Oct 11, added an introduction section giving an overview.

Last updated on Nov 11, 2007. Maintained by Li Ding.