1. Overview
1.1 Goals
Scenario 9 is a test-bed for policy language and policy checking
mechanisms in our accountability infrastructure with the following
goals:
- enumerate related laws and policies and verify whether the policy language can correctly represent these laws and policies
- enumerate policy checking test-cases and verify whether our policy engine can
consistently find uses of data in the transaction logs that
constitute a violation of law or policy
Below are several important notions used throughout this report:
- transaction logs - audit logs of data-manipulation
events, maintained by accountability servers (ASs)
- laws and policies - a list of laws and policies that apply to the data or data
manipulation in the transaction log
- violation of laws or policies - a violation
may take the form of unauthorized access to, dissemination of, or manipulation of data, or
unauthorized conclusions derived from that data
Part of our motivation is to ensure that people who suffer
adverse consequences (e.g., ranging from the loss of an
apartment or job to the loss of life or liberty) as a result of such
violations of their privacy will have an aid in their path to redress. For
this reason, each variation of our scenario illustrates a real-world harm that might
result from law and policy violations, and how this technology might help to
identify the system or human failure that caused it.
1.2 Requirements
- The scenario should be "real" such that:
- Events similar to those in the transaction logs can be found in
real world electronic audit systems
- Policy-checking process simulates real world lawyers'
activities
- Policy-checking results can be consumed by judges, government
managers, and lawyers
- There should be a reasonable number of events for scalability testing.
- The scenario should demonstrate complex
points of failure such that:
- it is not possible to avoid the failure by telling a single person
or group, "Don't do x again."
- In other words, the violation cannot be caught simply by stopping a single
transfer or group of transfers. For example, even though actor A is
entitled to have access to data D, it is not permissible to use D for
purpose P.
- The policy checking process may need OWL DL inference over
transaction logs, policies, or both to ensure completeness of policy
checking. For example, knowing the diagnosis of Mycobacterium tuberculosis
may not directly tell us whether it will
lead to "imminent risk of harm to the individual or others". If we use
additional OWL inference on a disease ontology to infer that Mycobacterium tuberculosis is a kind of contagious disease, we can easily see the relation, because "imminent risk" is part of the definition of a contagious disease.
1.3 Storyboard
Thread 1: Medicare transactions over an index case
- Background: A hospital discovers that a patient has a rare
form of Tuberculosis (TB) that is resistant to known treatment and has a
high likelihood of infecting recently contacted people
- Data collection: Upon receiving the index case, the CDC gathers
information about the person and the patient's contacts,
acquiring data from various sources including the Web,
university administrators, credit card companies, phone companies,
etc.
- Data mining: the CDC uses the collected information to find a subset of
contacts who have a high probability of being infected
- Action: the CDC contacts the patient's contacts prioritized by the
data mining results.
Thread 2: Business transaction that denies a service request
- Background: After the CDC investigation, a phone company retains
records that associate the CDC investigation with the record of each
customer about whom the CDC inquired.
- Data mining: Upon receiving a request for service from one such
customer, a customer service specialist checks the company's
database and collects information about the customer
- Action: The specialist denies the service request based on
the collected information
Thread 3: Legal investigation of an adverse consequence
- Background: A customer files a complaint
asserting that his phone service request was denied as the result of
discrimination based on his health status
- Data collection: In response to a discovery
request, the phone company's lawyer queries the
E2ESA system through its interactive interface for the relevant transaction logs, infers additional facts, and then manually finds the corresponding
legal policies (e.g. HIPAA, MA-disability-discrimination)
- Policy checking: The customer's lawyer asks the E2ESA
system to walk the policies over the collected and manually inferred facts
and E2ESA finds apparent violations
- Action: The customer's lawyer files a motion for a
finding in the customer's favor, presenting the policy checking
results and justifications to the judge
1.4 Resources
2. Scenario Design
2.1 Notations and templates
Variations (of scenario 9)
A variation of scenario 9 can be used as a competence test for the
expressiveness of the policy language or the functionality of the policy reasoner.
A variation consists of three parts:
- transaction logs: applicable events from scenario 9 transaction logs
- laws and policies: applicable laws and policies from scenario 9 laws and policies
- expected results: the expected policy language representation
requirements and the expected results in determining violation of laws
and policies
Events
- event
- data-collection-event: party A collects and stores data M
- data-access-event: party B is able to see, but not keep, data M
from party A
- data-dissemination-event: party A passes data M to party
B
- data-use-event: party A uses data M to take action K
(non-data-manipulation)
- data-mining-event: party A uses data M about individual X to
derive data N
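The event templates above can be sketched as a small data model. This is only an illustration: the class names, field names, and the per-kind required-field table are assumptions made for this sketch, not the ontology used by the accountability servers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventKind(Enum):
    COLLECTION = "data-collection-event"
    ACCESS = "data-access-event"
    DISSEMINATION = "data-dissemination-event"
    USE = "data-use-event"
    MINING = "data-mining-event"


@dataclass(frozen=True)
class Event:
    """One transaction-log entry (field names are hypothetical)."""
    kind: EventKind
    actor: str                           # the party performing the event
    data: str                            # data M
    counterparty: Optional[str] = None   # the other party (A or B)
    action: Optional[str] = None         # action K of a data-use-event
    subject: Optional[str] = None        # individual X of a data-mining-event
    derived: Optional[str] = None        # data N of a data-mining-event


# Fields each template needs beyond actor and data (an assumption).
REQUIRED = {
    EventKind.COLLECTION: (),
    EventKind.ACCESS: ("counterparty",),
    EventKind.DISSEMINATION: ("counterparty",),
    EventKind.USE: ("action",),
    EventKind.MINING: ("subject", "derived"),
}


def missing_fields(event: Event) -> list:
    """Return the names of required fields that are not filled in."""
    return [f for f in REQUIRED[event.kind] if getattr(event, f) is None]


refusal = Event(EventKind.USE, actor="Betty", data="Bob Same's records",
                action="refuse service request")
print(missing_fields(refusal))
```

A check like `missing_fields` is one way to reject malformed log entries before any policy is evaluated over them.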
Laws and Policies
- rules for data manipulation action
- data collection
- dissemination
- data use
- data mining
- rules for concept classification, e.g. which instances should be
classified as "qualified handicapped individual"
2.2 Transaction Variations -- types of point of failure
Variation 1. The simplest case: policy violation caused by direct use of information
In this variation, we show how one event in
the transaction log (event4 of transaction 15) directly violates one policy.
- transaction logs
- thread 1
- transaction 10
- To conduct TB investigation, CDC queried Xphone for data
about Alfred Newman
- Xphone transferred Alfred Newman's record to CDC
- transaction 11
- To conduct TB investigation, CDC queried Xphone for data
about all people whose numbers Alfred Newman had called,
including Bob Same
- Xphone transferred Bob Same's record to CDC
- Xphone recorded the event of sending CDC personal records
and the reason
- thread 2
- transaction 15
- event1: "Bob" requested "Betty" (Xphone service rep)
approve "install home phone for Bob"
- event2: "Betty" queries "Xphone database" to
"find Bob Same's records"
- event3: "Xphone database" informs "Betty" with the
records about "Bob Same"
- event4: "Betty" refused "Bob" in reply to event1 with reason
"Bob was identified by TB investigation as possible carrier".
- laws and policies
- MA Disability Discrimination (MADD): no one (in MA) should use
a person's health information to deny that person benefits
- Exception for risk of imminent harm
- expected results
- Violation of MADD: if there is no risk of imminent harm,
Betty should not use Bob's health information to deny Bob's
request for an Xphone benefit
- Test whether the policy engine supports OWL inference on
rdfs:subPropertyOf
- test data and policy
- test result
- AIR can represent MADD and Policy engine can detect the violation
Figure 1. Violation of the Massachusetts Disability Discrimination law in one event (event4) from transaction 15 (variation 1)
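The check expected in variation 1 can be sketched in a few lines. This is not the AIR policy or the actual policy engine: the triple vocabulary (refuses, deniesBenefitTo, hasReason, usesData, HealthInformation) is invented for illustration, and the rdfs:subPropertyOf step is a hand-written closure rather than a real reasoner.

```python
def super_properties(prop, sub_property_of):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    seen, stack = set(), [prop]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(sub_property_of.get(p, ()))
    return seen


def infer(triples, sub_property_of):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q."""
    return {(s, q, o)
            for (s, p, o) in triples
            for q in super_properties(p, sub_property_of)}


def madd_violations(triples, sub_property_of, imminent_harm=False):
    """Events that deny a benefit while using health information."""
    if imminent_harm:  # the exception stated in the law
        return []
    facts = infer(triples, sub_property_of)
    health = {s for (s, p, o) in facts
              if p == "rdf:type" and o == "HealthInformation"}
    denials = {s for (s, p, o) in facts if p == "deniesBenefitTo"}
    return sorted(e for (e, p, o) in facts
                  if p == "usesData" and o in health and e in denials)


# Hypothetical encoding of event4 from transaction 15.
SUB_PROPERTY_OF = {"refuses": ["deniesBenefitTo"], "hasReason": ["usesData"]}
LOG = {
    ("event4", "refuses", "Bob"),
    ("event4", "hasReason", "tb-status-of-Bob"),
    ("tb-status-of-Bob", "rdf:type", "HealthInformation"),
}

print(madd_violations(LOG, SUB_PROPERTY_OF))
```

Without the sub-property step the log only says "refuses" and "hasReason", so the rule, which is written against the more general properties, would not fire. That is the inference capability this variation is meant to test.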
Variation 2. Complicated case: test policy representation (indirect use of information)
In this variation, we show a rule violated by a chain of
events. Betty denies the customer a benefit using information that
someone else derived from the customer's health information. Indeed,
she is not supposed to use any data derived from the health
information.
- transaction logs
- thread 1 (same as variation 1)
- (logged) event1: "Bob" requested that "Betty" schedule
"Xphone installing a home phone for Bob" at 10:00am
- (logged) event1b: "Betty" queries "Xphone database" to "find
Bob Same's records"
- (logged) event1c: "Xphone database" informs "Betty"
with records about "Bob Same"
- (logged) event2: "Betty" called "Alex" between 10:05-10:10am
- (logged) event3: "Alex" queried XphoneDB for "Query status of
Bob" at 10:07am
- (logged) event4: XphoneDB transferred "Bob may have
TB infection" to "Alex" at 10:08am
- (lawyer inferred) event5: "Alex" derived "Bob is blacklisted"
from "Bob may have TB infection"
- (lawyer inferred) event6: "Alex" transferred "Bob
is blacklisted" to "Betty"
- (logged) event7: "Betty" refused
"Bob" in-reply-to event1 with reason "Bob is blacklisted" at
10:11am
- laws and policies
- MA Disability Discrimination: no one (in MA) should use
a person's health information to deny that person benefits
- potentially some Xphone privacy policies
- expected results
- violation of MADD: Bob's service request should not be refused,
because one of the sources of this decision is his health
information.
- non-violation of MADD without lawyer-inferred events
- policy language SHOULD be able to represent a violation involving more
than one event
- test data and policy
- test result:
- AIR can represent MADD on indirect usage of information.
- The policy engine can detect the violation because it supports recursive inference.
- AIR only supports blaming one event, i.e. event7. It cannot represent events 5-7 as jointly violating MADD.
Figure 2. Violation of the Massachusetts Disability Discrimination law in three events (event5-7) from transaction 15 (variation 2)
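The recursive inference needed for variation 2 amounts to following derivation links back from the reason given for the refusal. The sketch below is an illustration under assumed names (the `derived_from` map and both helper functions are hypothetical), not the AIR reasoner.

```python
def sources_of(item, derived_from):
    """All data items that `item` transitively depends on, itself included."""
    seen, stack = set(), [item]
    while stack:
        d = stack.pop()
        if d not in seen:
            seen.add(d)
            stack.extend(derived_from.get(d, ()))
    return seen


def uses_health_info(reason, derived_from, health_items):
    """True if the stated reason traces back to any health information."""
    return bool(sources_of(reason, derived_from) & health_items)


HEALTH_ITEMS = {"Bob may have TB infection"}
REASON_OF_EVENT7 = "Bob is blacklisted"

# Logged events alone record no derivation for the blacklist entry.
logged_only = {}

# The lawyer-inferred event5 adds the missing derivation link.
with_inferred = {"Bob is blacklisted": ["Bob may have TB infection"]}

print(uses_health_info(REASON_OF_EVENT7, logged_only, HEALTH_ITEMS))
print(uses_health_info(REASON_OF_EVENT7, with_inferred, HEALTH_ITEMS))
```

The two calls mirror the expected results: no violation is found from the logged events alone, and the violation appears once the lawyer-inferred derivation is added.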
Variation 3. DPA computation: data purpose inheritance
- transaction logs
- thread 1 (same as variation 1)
- thread 2 (mostly the same as variation 2)
- transaction 15
- ...
- (logged) event7: "Betty" refused "Bob" in-reply-to event1
with reason "Bob is blacklisted" at 10:11am to fulfill business
purpose
- laws and policies
- DPA generic policy: derived data always inherit the union of
purposes of all premises
- expected results
- violation of DPA: the (observed) purpose of the
data-use-event does not match the (allowed) purposes of the
indirectly used information
- test whether the policy engine supports OWL inference on owl:complementOf,
rdfs:subClassOf, owl:sameAs
- NOTE: we decided not to work on this variation because it is tested in scenario 4
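The generic policy in this variation (derived data inherit the union of the purposes of all premises) can be sketched as follows. The purpose labels, the `declared` and `derived_from` maps, and the function names are assumptions for illustration; the sketch also assumes the derivation graph has no cycles.

```python
def allowed_purposes(item, derived_from, declared):
    """Purposes of `item`: its declared purposes plus the union of the
    purposes of every premise it was derived from (acyclic graph assumed)."""
    purposes = set(declared.get(item, ()))
    for premise in derived_from.get(item, ()):
        purposes |= allowed_purposes(premise, derived_from, declared)
    return purposes


def violates_dpa(observed_purpose, item, derived_from, declared):
    """A data use violates the policy if its observed purpose is not
    among the purposes allowed for the data it uses."""
    return observed_purpose not in allowed_purposes(item, derived_from, declared)


# Hypothetical labels: the TB record was released for the investigation only.
DECLARED = {"Bob may have TB infection": {"tb-investigation"}}
DERIVED_FROM = {"Bob is blacklisted": ["Bob may have TB infection"]}

print(allowed_purposes("Bob is blacklisted", DERIVED_FROM, DECLARED))
print(violates_dpa("business", "Bob is blacklisted", DERIVED_FROM, DECLARED))
```

Here event7 uses "Bob is blacklisted" for a business purpose, while the only purpose inherited from its premise is the TB investigation, so the use is flagged.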
Variation 4. DPA computation: policy representation (policy preemption and purpose
introduction)
When disclosing information, Xphone attaches the appropriate purposes to it
as a result of enforcing its privacy policies.
- transaction logs (same as variation 3)
- laws and policies (same as variation 3)
- expected violation
- introduce the appropriate purposes for the data "Bob's phone
record" when sending the data to CDC
- policy language SHOULD be able to represent the case where
a data-use-event has more than one purpose.
- policy language SHOULD be able to represent the Xphone Privacy Policy
(rule-copy semantics)
- policy language SHOULD be able to represent policy exceptions. In
this case, Verizon Privacy Principle 4 should be overridden by its
exceptions
- policy language SHOULD be able to represent policy
preemption. In this case, Verizon Privacy Principle 4 is
overridden by the Verizon Disclosure Policy, and Electronic communication
dissemination rule 3 is overridden by Verizon Privacy Principle
4
- test data
- NOTE: this variation cannot be handled by AIR. Some kind of preemption or overriding mechanism is required.
possible future variation: Complicated case, test policy representation, rule preemption
In this variation, we show that an event is allowed once rule preemption
is applied.
- transaction
- policy
- MA Disability Discrimination: (see above)
- FICTION RULE: anyone may use any information about a person (including health
information) to prevent future TB infection
- expected violation
- Betty is allowed to use Bob's health information to deny Bob's
request because FICTION RULE preempts MADD.
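One simple way to picture preemption is an ordered rule list in which the first rule that has an opinion decides the outcome. This is only a sketch of that idea, not a mechanism offered by AIR; the event fields, verdict strings, and the default of permitting an event when no rule applies are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

Rule = Callable[[dict], Optional[str]]


def madd(event: dict) -> Optional[str]:
    """MADD: denying a benefit using health information is a violation."""
    if event["action"] == "deny-benefit" and event["uses_health_info"]:
        return "violation"
    return None  # rule does not apply


def fiction_rule(event: dict) -> Optional[str]:
    """FICTION RULE: any information may be used to prevent TB infection."""
    if event["purpose"] == "prevent-tb-infection":
        return "permitted"
    return None  # rule does not apply


def decide(event: dict, rules: List[Tuple[str, Rule]]) -> Tuple[Optional[str], str]:
    """Apply rules in precedence order; the first applicable rule wins."""
    for name, rule in rules:
        verdict = rule(event)
        if verdict is not None:
            return name, verdict
    return None, "permitted"  # assumed default when no rule applies


# Hypothetical encoding of Betty's refusal.
refusal = {"action": "deny-benefit", "uses_health_info": True,
           "purpose": "prevent-tb-infection"}

print(decide(refusal, [("MADD", madd)]))
print(decide(refusal, [("FICTION RULE", fiction_rule), ("MADD", madd)]))
```

With MADD alone the refusal is a violation; placing FICTION RULE ahead of MADD makes the same event permitted, which is the outcome this variation expects.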
possible future variation: Complicated case, test connection to OWL, using existing OWL
inference features
In this variation, we show that the event log can be enriched by
including the ontology referenced by the transaction log and applying OWL DL
inference.
Users may define a hierarchy of classes to classify diseases, e.g.
the class disease-TB may have a superclass "contagious-disease" (see a relevant ontology and NCBI page) which can cause imminent harm to other people. With such a class hierarchy, Xphone can justify its disclosure of a customer's information to the CDC.
E.g. use owl:Restriction and DL class subsumption inference.
When specifying the rule, owl:Restriction is preferred because we
cannot assume in advance that we can fully enumerate class members.
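The class-hierarchy argument can be illustrated with a plain transitive walk over superclass links. This is a stand-in for OWL DL subsumption, not a DL reasoner, and it does not handle owl:Restriction; the hierarchy contents and the set of classes treated as posing imminent harm are assumptions for illustration.

```python
def is_subclass_of(cls, ancestor, sub_class_of):
    """True if `ancestor` is reachable from `cls` via superclass links."""
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c == ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(sub_class_of.get(c, ()))
    return False


def poses_imminent_harm(disease_class, sub_class_of, harmful_classes):
    """True if the class falls under any class deemed to pose imminent harm."""
    return any(is_subclass_of(disease_class, h, sub_class_of)
               for h in harmful_classes)


# Hypothetical fragment of a disease ontology.
SUB_CLASS_OF = {"disease-TB": ["contagious-disease"]}
IMMINENT_HARM = {"contagious-disease"}

print(poses_imminent_harm("disease-TB", SUB_CLASS_OF, IMMINENT_HARM))
```

The rule is stated against the general class "contagious-disease", so new diseases added to the ontology are covered without enumerating them in the policy.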
possible future variation: Complicated case, testing system scalability, the full transaction
log
possible future variation: Complicated case, test transaction log? information leakage
Prevent the CDC from disclosing TB infection information, or prevent the company from collecting
such information?
possible future variation: Complicated case, test access control, accessing daisy girl scout
database
How an RDF database can be equipped with privacy policy checking upon data
access requests.
3. Publication Plan
- focus on data manipulation domain
- domain ontology for logging data manipulation
- data summary (can we keep user's privacy?): category, id
- events: data-use, data-transfer
- data purpose
- applicable policies
- privacy policy
- data-use policy
- rules of evidence
- other policies
- scenarios
4. Conclusions
We have generated a scenario to investigate the expressiveness of the policy language and the expected policy checking mechanisms.
Change Log
- Nov 11, minor revision according to K's input. updated variations 1 through 4
- Oct 29, added variation details. add example data and corresponding policies.
- Oct 14, adding scenario design - transaction log section. enumerate the
variations for requirement testing.
- Oct 11, adding introduction section to bring an overview
Last updated on Nov 11, 2007. Maintained by Li Ding.