
Accountability Appliances: What Lawyers Expect to See - Part III (User Interface)

I've written in the last two blogs about how lawyers operate in a very structured environment. This has a tremendous impact on what they'll consider acceptable in a user interface. They might accept something that looks a bit like an outline or a form, but years of experience tell me that they will rail at anything code-like.

For example, we see

:MList a rdf:List

and automatically read

"MList" is the name of a list written in rdf

Or,

air:pattern {
    :MEMBER air:in :MEMBERLIST.
}


and know that we are asking our system to look for a pattern in the data in which a particular "member" is in a particular list of members. Perhaps because learning the law already means learning to read, speak, and think in another language, most lawyers look at lines like those above and see no meaning.

Our current work-in-progress produces output that includes:


bjb reject bs non compliant with S9Policy 1

Because

phone record 2892 category HealthInformation

Justify

bs request instruction bs request content
type Request
bs request content intended beneficiary customer351
type Benefit Action Instruction
customer351 location MA
xphone record 2892 about customer351



Nearly every output item is a hotlink to something which provides definition, explanation, or derivation. Much of it is in "Tabulator", the cool tool that aggregates just the bits of data we want to know.

From a user-interface-for-lawyers perspective, this version of output is an improvement over our earlier ones because it removes a lot of the things programmers do to solve computation challenges. It removes colons and semi-colons from places they're not commonly used in English (e.g., at the beginning of a term) and mostly uses words that are known to the general population. It also parses "humpbacks" - the programmers' traditional concatenation of a string of words - back into separate words. And it replaces the hyphens and underscores that are also used for concatenation with blank spaces.
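For readers curious what that cleanup amounts to, here is a minimal Python sketch of the humpback-and-underscore translation; the function name and the sample terms are mine, not our actual code:

import re

def humanize(term):
    """Turn a programmer-style identifier back into ordinary words."""
    term = re.sub(r"[-_]+", " ", term)                 # hyphens/underscores -> spaces
    term = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term)   # split humpbacks like "HealthInformation"
    return term

print(humanize("HealthInformation"))      # -> Health Information
print(humanize("intended_beneficiary"))   # -> intended beneficiary
print(humanize("non-compliant"))          # -> non compliant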

At last week's meeting, we talked about the possibility of generating output which simulates short English sentences. These might be stilted but would be most easily read by lawyers. Here's my first attempt at the top-level template:

 

Issue: Whether the transactions in [TransactionLogFilePopularName] {about [VariableName] [VariableValue]} comply with [MasterPolicyPopularName]?

Rule: To be compliant, [SubPolicyPopularName] of [MasterPolicyPopularName] requires [PatternVariableName] of an event to be [PatternValue1].

Fact: In transaction [TransactionNumber] [PatternVariableName] of the event was [PatternValue2].

Analysis: [PatternValue2] is not [PatternValue1].

Conclusion: The transactions appear to be non-compliant with [SubPolicyPopularName] of [MasterPolicyPopularName].



This seems to me approximately correct in the context of requests for the appliance to reason over millions of transactions with many sub-rules. A person seeking an answer from the system would create the Issue question. The Issue question is almost always going to ask whether some series of transactions violated a super-rule and often will have a scope limiter (e.g., in regards to a particular person or within a date scope or by one entity), denoted here by {}.

From the lawyer perspective, the interesting part of the result is the finding of non-compliance or possible non-compliance. So, the remainder of the output would be generated to describe only the failure(s) in a pattern-matching for one or more sub-rules. If there's more than one violation, the interface would display the Issue once and then the Rule to Conclusion steps for each non-compliant result.
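To make the generation step concrete, here is a rough Python sketch of filling the top-level template from the variable bindings of one non-compliant pattern-match. The field names and values are purely illustrative, not our actual schema:

TEMPLATE = """\
Issue: Whether the transactions in {log} about {var} {val} comply with {master}?
Rule: To be compliant, {subpolicy} of {master} requires {pattern_var} of an event to be {expected}.
Fact: In transaction {txn} {pattern_var} of the event was {found}.
Analysis: {found} is not {expected}.
Conclusion: The transactions appear to be non-compliant with {subpolicy} of {master}."""

binding = {   # one non-compliant result from the reasoner (illustrative values)
    "log": "Xphone's Customer Service Log",
    "var": "Person", "val": "Bob Same",
    "master": "MA Disability Discrimination Law",
    "subpolicy": "Denial of Service Rule",
    "pattern_var": "reason",
    "expected": "other than disability",
    "txn": "Xphone Record 2892",
    "found": "Infectious Disease",
}

print(TEMPLATE.format(**binding))

For multiple violations, the Issue line would be printed once and the Rule-to-Conclusion lines repeated for each binding.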

I tried this out on a lawyer I know. He insisted it was unintelligible when the []'s were left in but said it was manageable when he saw the same text without them.


For our Scenario 9, Transaction 15, an idealized top level display would say:


Issue: Whether the transactions in Xphone's Customer Service Log about Person Bob Same comply with MA Disability Discrimination Law?

Rule: To be compliant, Denial of Service Rule of MA Disability Discrimination Law requires reason of an event to be other than disability.

Fact: In transaction Xphone Record 2892 reason of the event was Infectious Disease.

Analysis: Infectious disease is not other than disability.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.



Each one of the bound values should have a hotlink to a Tabulator display that provides background or details.



Right now, we might be able to produce:


Issue: Whether the transactions in Xphone's Customer Service Log about Betty JB reject Bob Same comply with MA Disability Discrimination Law?

Rule: To be non-compliant, Denial of Service Rule of MA Disability Discrimination Law requires REASON of an event to be category Health Information.

Fact: In transaction Xphone Record 2892 REASON of the event was category Health Information.

Analysis: category Health Information is category Health Information.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.




This example highlights a few challenges.

1) It's possible that only failures of policies containing comparative matches (e.g., :v1 sameAs :v2; :v9 greaterThan :v3; :v12 withinDateRange :v4) are legally relevant. This needs more thought.

2) We'd need to name every sub-policy or have a default called UnnamedSubPolicy.

3) We'd need to be able to translate statute numbers to popular names and have a default instruction to include the statute number when no popular name exists.

4) We'd need some taxonomies (e.g., infectious disease is a sub-class of disability).

5) In a perfect world, we'd have some way to trigger a couple of alternative displays. For example, it would be nice to be able to trigger one of two rule structures: either one that says a rule requires a match or one that says a rule requires a non-match. The reason is that if we always have to use the same structure, about half of the outputs will be very stilted, and lawyers will struggle to understand them.

6) We need some way to deal with cases the system can't reason about. If the law requires the reason to be disability and the system doesn't know whether health information is the same as or different from disability, then it ought to be able to produce an analysis that says something along the lines of "The relationship between Health Information and disability is unknown" and produce a conclusion that says "Whether the transaction is compliant is unknown." If we're reasoning over millions of transactions there are likely to be quite a few of these, and they ought to be presented after the non-compliant ones. (A small sketch of this three-valued outcome follows this list.)
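Here is a minimal Python sketch of that three-valued outcome (compliant, non-compliant, unknown). The tiny dictionary stands in for a real taxonomy, and the wording mirrors the suggested analysis and conclusion text; none of this is our actual reasoner:

# Tiny stand-in for a taxonomy: terms known to be (or not to be) kinds of disability.
IS_DISABILITY = {"Infectious Disease": True, "Non-payment": False}

def check(reason):
    known = IS_DISABILITY.get(reason)
    if known is True:
        return ("non-compliant", f"{reason} is a kind of disability.")
    if known is False:
        return ("compliant", f"{reason} is other than disability.")
    return ("unknown", f"The relationship between {reason} and disability is unknown.")

for r in ("Infectious Disease", "Non-payment", "Health Information"):
    status, analysis = check(r)
    print(f"{r}: {status} -- {analysis}")

Results could then be sorted so the non-compliant transactions are shown first and the unknowns after them.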

 

 

Accountability Appliances: What Lawyers Expect to See - Part II (Structure)

Submitted by kkw on Thu, 2008-01-10 14:16.

Building accountability appliances involves a challenging intersection between business, law, and technology. In my first blog about how to satisfy the legal portion of the triad, I explained that - conceptually - the lawyer would want to know whether particular digital transactions had complied with one or more rules. Lawyers, used to having things their own way, want more... they want to get the answer to that question in a particular structure.

All legal cases are decided using the same structure. As first year law students, we spend a year with highlighter in hand, trying to pick out the pieces of that structure from within the torrent of words of court decisions. Over time, we become proficient and -- like the child who stops moving his lips when he reads -- the activity becomes internalized and instinctive. From then on, we only notice that something's not right by its absence.

The structure is as follows:

  • ISSUE - the legal question that is being answered. Most typically it begins with the word "whether": "Whether the Privacy Act was violated?" Though the bigger question is whether an entire law was violated, because laws tend to have so many subparts and variables, we often frame a much narrower issue based upon a subpart that we think was violated, such as "Whether the computer matching prohibition of the Privacy Act was violated?"
  • RULE - provides the words and the source of the legal requirement. This can be the statement of a particular law, such as "The US Copyright law permits unauthorized use of copyrighted work based upon four conditions - the nature of the use, the nature of the work, the amount of the work used, and the likely impact on the value of the work. 17 USC § 107." Or, it can be a rule created by a court to explain how the law is implemented in practical situations: "In our jurisdiction, there is no infringement of a copyrighted work when the original is distributed widely for free because there is no diminution of market value. Field v. Google, Inc., 412 F. Supp. 2d 1106 (D. Nev. 2006)." [Note: The explanation of the citation formats for the sources has filled books and blogs. Here's a good brief explanation from Cornell.]
  • FACTS - the known or asserted facts that are relevant to the rule we are considering, and the source of the information. In a Privacy Act computer matching case, there will be assertions like "the defendant's CIO admitted in deposition that he matched the deadbeat dads list against the welfare list and if there were matches he was to divert the benefits to the custodial parent." In a copyright fair use case, a statement of facts might include "plaintiff admitted that he has posted the material on his website and places no limitations on access to or copying of the work."
  • ANALYSIS - is where the facts are pattern-matched to the rule. "The rule does not permit US persons to lose benefits based upon computer matched data unless certain conditions are met. Our facts show that many people lost their welfare benefits after the deadbeat data was matched to the welfare rolls without any of the other conditions being met." Or "There can be no finding of copyright infringement where the original work was so widely distributed for free that it had no market value. Our facts show that Twinky Co. posted its original material on the web on its own site and every other site where it could gain access without any attempt to control copying or access."

  • CONCLUSION - whether a violation has or has not occurred. "The computer matching provision of the Privacy Act was violated." or "The copyright was not infringed."

In light of this structure, we've been working on parsing the tremendous volume of words into their bare essentials so that they can be stored and computed to determine whether certain uses of data occurred in compliance with law. Most of our examples have focused on privacy.
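As a rough illustration of what storing those "bare essentials" might look like, here is a hypothetical Python structure (not our actual schema) with one slot per part of the legal form:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Finding:
    """One issue-rule-fact-analysis-conclusion unit, reduced to computable parts."""
    issue: str            # e.g. "Whether the computer matching prohibition ... was violated?"
    rule_source: str      # citation or sub-policy name, e.g. "17 USC § 107"
    rule_pattern: str     # the condition the rule requires (or prohibits)
    facts: List[str] = field(default_factory=list)   # relevant facts and their sources
    analysis: str = ""    # how the facts matched, or failed to match, the pattern
    conclusion: str = ""  # "compliant", "non-compliant", or "unknown"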

Today, the number of sub-rules, elements of rules, and facts is often so voluminous that there is not enough time for a lawyer or team of lawyers to work through them all. So, the lawyer guesses what's likely to be a problem and works from there; the more experienced or talented the lawyer, the more likely that the guess leads to a productive result. Conversely, this likely means that many violations are never discovered. One of the great benefits of our proposed accountability appliance is that it could quickly reason over a massive volume of sub-rules, elements, and facts to identify the transactions that appear to violate a rule or for which there's insufficient information to make a determination.

Although we haven't discussed it, I think there also will be a benefit to be derived from all of the reasoning that concludes that activities were compliant. I'm going to try to think of some high value examples.

 

 

Two additional blogs are coming:

Physically, what does the lawyer expect to see? At the simplest level, lawyers are expecting to see things in terms they recognize and without unfamiliar distractions; even the presence of things like curly brackets or metatags will cause most to insist that the output is unreadable. Because there is so much information, visualization tools present opportunities for presentations that will be intuitively understood.

And:

The 1st Lawyer to Programmer/Programmer to Lawyer Dictionary! Compliance, auditing, privacy, and a host of other topics now have lawyers and system developers interacting regularly. As we've worked on DIG, I've noticed how the same words (e.g., rules, binding, fact) have different meanings.

 

Accountability Appliances: What Lawyers Expect to See - Part I

Submitted by kkw on Wed, 2008-01-02 12:59.

Just before the holidays, Tim suggested I blog about "what lawyers expect to see" in the context of our accountability appliances projects. Unfortunately, being half-lawyer, my first response is that maddening answer of all lawyers - "it depends." And, worse, my second answer is - "it depends upon what you mean by 'see'". Having had a couple of weeks to let this percolate, I think I can offer some useful answers.

Conceptually, what does the lawyer expect to see? The practice of law has a fundamental dichotomy. The law is a world of intense structure -- the minutiae of sub-sub-sub-parts of legal code, the precise tracking of precedents through hundreds of years of court decisions, and so on. But the lawyers valued most highly are not those who are most structured. Instead, it is those who are most creative at manipulating the structure -- conjuring compelling arguments for extending a concept or reading existing law in just enough of a different light to convince others that something unexpected supersedes something expected. In our discussions, we have concluded that an accountability appliance we build now should address the former and not the latter.

For example, a lawyer could ask our accountability appliance if a single sub-rule had been complied with: "Whether the federal Centers for Disease Control was allowed to pass John Doe's medical history from its Epidemic Investigations Case Records system to a private hospital under the Privacy Act Routine Use rules for that system?" Or, he could ask a question which requires reasoning over many rules. Asking "Whether the NSA's data mining of telephone records is compliant with the Privacy Act?" would require reasoning over the nearly thirty sub-rules contained within the Privacy Act and would be a significant technical accomplishment. Huge numbers of hours are spent to answer these sorts of questions and the automation of the more linear analysis would make it possible to audit vastly higher numbers of transactions and to do so in a consistent manner.

If the accountability appliance determined that a particular use was non-compliant, the lawyer could not ask the system to find a plausible exception somewhere in all of law. That would require reasoning, prioritizing, and de-conflicting over possibly millions of rules -- presenting challenges that range from transcribing all of the rules into processable structure to creating reasoning technology that can efficiently process such a volume. Perhaps the biggest challenge, though, is the ability to analogize. The great lawyer draws on everything he's ever seen or heard and assimilates it into the new situation to his client's benefit. I believe that some of the greatest potential of the semantic web is in the ability to make comparisons -- I've been thinking about a "what's it like?" engine -- but this sort of conceptual analogizing still seems a ways off in the future.

 

Stay tuned for two additional blogs:

Structurally, what does the lawyer expect to see? The common law (used in the UK and most of its former colonies, including the US federal system and most US states) follows a standard structure for communicating. Whether a lawyer is writing a motion or a judge is writing a decision, there is a structure embedded within all of the verbiage. Each well-formed discussion includes five parts: issue, rule, fact, analysis, and conclusion.

Physically, what does the lawyer expect to see? At the simplest level, lawyers are expecting to see things in terms they recognize and without unfamiliar distractions; even the presence of things like curly brackets or metatags will cause most to insist that the output is unreadable. Because there is so much information, visualization tools present opportunities for presentations that will be intuitively understood.

And:

The 1st Lawyer to Programmer/Programmer to Lawyer Dictionary! Compliance, auditing, privacy, and a host of other topics now have lawyers and system developers interacting regularly. As we've worked on DIG, I've noticed how the same words (e.g., rules, binding, fact) have different meanings.

A look at emerging Web security architectures from a Semantic Web perspective

Submitted by connolly on Fri, 2006-03-17 17:51.

W3C had a workshop, Toward a more Secure Web this week. Citigroup hosted; the view from the 50th floor was awesome.

Some notes on the workshop are taking shape:

A look at emerging Web security architectures from a Semantic Web perspective

Comparing OpenID, SXIP/DIX, InfoCard, SAML to RDF, GRDDL, FOAF, P3P, XFN and hCard

At the W3C security workshop this week, I finally got to study SXIP in some detail after hearing about it and wondering how it compares to OpenID, Yadis, and the other "Identity 2.0" techniques brewing. And just in time, with a DIX/SXIP BOF at the Dallas IETF next week.

Talking About Your Friends (in FOAF)

Submitted by sandro on Mon, 2006-02-06 23:23.

Last week I took my first serious stab at making a FOAF file for myself. I found myself in the middle of a couple of social problems.

Naming My Friends

I have 123 "friends" on orkut -- people who have acknowledged they are my "friends". The orkut terms of use are pretty clear:

As an orkut member, you can create a profile or orkut community that includes personal information, such as your gender, age, occupation, hobbies, and interests, plus other content, such as photos. This information may be accessed and viewed by other orkut members.

So my friends have no expectation of privacy, right? But in a quick survey of 31 friends, only 55% said they were okay with me saying in my FOAF file that I knew someone by their name. (That is, just saying { sandro:sandro foaf:knows [ foaf:name "John Smith" ]. }) Only 39% said it was okay to include a secure-hash of their mailbox. 13% said I could include the plaintext as well.
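For reference, the "secure-hash of the mailbox" is foaf:mbox_sha1sum, the SHA1 of the mailto: URI. Here's a quick Python sketch of the three disclosure levels from the survey; the address is made up:

import hashlib

mbox = "mailto:john.smith@example.org"   # made-up address
sha1 = hashlib.sha1(mbox.encode("ascii")).hexdigest()

print('sandro:sandro foaf:knows [ foaf:name "John Smith" ] .')        # name only (55% ok)
print(f'sandro:sandro foaf:knows [ foaf:mbox_sha1sum "{sha1}" ] .')   # hashed mailbox (39% ok)
print(f'sandro:sandro foaf:knows [ foaf:mbox <{mbox}> ] .')           # plaintext mailbox (13% ok)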

Maybe I didn't ask the question well, but I suspect the real problem is that no matter how well people understood the situation, no one really understands it. No one knows what threats to them might materialize from me listing them in my FOAF file.

A friend of mine asked the question differently in a parallel survey, saying

Imagine one of your friends posts a list of their friends' real names. Nothing on the page ties your real name and online identities together, but your real name and your friend's real name are now on one Googlable page. Are you upset with your friend?

65% of his sample (which was also 31 people, but different people) said "No", they were not upset. 30% were "Not Sure" and the rest said Yes.

So I think the answer is: only list people when you have their explicit permission. I think people who have their own FOAF file probably can be assumed to be granting permission. Maybe it's best to just convince people to make themselves a FOAF file (via whatever service provider they like, which hides the details).

Protecting Pseudonymity

Meanwhile, I have about a hundred livejournal friends. They present a different problem. Livejournal already publishes a FOAF file listing them. It has several problems (like using bNodes instead of URIs to name people), but here's the real problem:

How do I relate my livejournal identity to my professional identity? Do I link to people's LJ identities from my work-related FOAF file? 38% of my polled population said "no". But, oddly, 35% of them (i.e., all but one) were okay with me linking to my own LJ identity, making them two hops away.

How much do hops count in the semantic web? Not much, I think.

I heard a few interesting stories in response to my poll. People seemed concerned about losing their jobs if they were too public about their blogs. It's okay to write "my employer sucks" if and only if the reader has to do some real work to figure out who your employer is. One friend mentioned having a job where the appearance of neutrality is important, so having an opinionated blog is fine if and only if their name is not obviously associated with it.

I'm reminded of Judge Jackson's "appearance of partiality" mess that helped save Microsoft.

So what's the right course of action? I could just avoid linking to LJ. I could link to a few people on LJ — including myself — with foaf:knows. So I know a few of the threats now. I don't really know the benefits, though.

Benefits?

Oh yeah. What is the Use Case for FOAF?

The most ineffable is whatever compelled me to build that orkut list in the first place, to surf around my friends to find the people I had missed, to discover the back-door connections between people. To search my brain (and computer) for everyone I knew.

In Guns, Germs, and Steel (page 271) Jared Diamond writes:

In traditional New Guinea society, if a New Guinean happened to encounter an unfamiliar New Guinean while both were away from their respective villages, the two engaged in a long discussion of their relatives, in an attempt to establish some relationship and hence some reason why the two should not attempt to kill each other.

Maybe this is related. Is that a trust-network issue?

More concretely, can I use the network of who-knows-who to figure out how much to trust people? This is the basis of friendster as a dating service, and the friends-network part of okcupid.

If I learn a little bit about someone -- they send me e-mail, or we're briefly introduced -- it's tempting to look up information about them, and to ask around about them. Does that really produce better results? I doubt it. It just reinforces my prejudices.

The most concrete thing I want from FOAF is convenient access control. This is one of the things LJ does; of course flickr learns about your friends for this reason, too. But they should both just be using the same data, right? An open standard for telling systems who you trust to see and do certain things.
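As a toy sketch of FOAF-driven access control (with no claim about how LJ or flickr actually implement it; the identifiers and the policy are made up):

# Hypothetical: grant access to a "friends-only" item based on the set of
# people an author's FOAF data says they know.
friends_of = {
    "http://example.org/sandro#me": {
        "http://example.org/alice#me",
        "http://bob.example.net/#me",
    },
}

def may_read(author, requester, visibility):
    if visibility == "public":
        return True
    if visibility == "friends":
        return requester in friends_of.get(author, set())
    return requester == author   # "private"

print(may_read("http://example.org/sandro#me",
               "http://example.org/alice#me", "friends"))   # True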

But I think I'll have to name them by their pseudonym, pointing to their web presence -- whatever it may be -- instead of what I know about them from the so-called real world. It's not people we should be talking about, it's personal sites / blogs / personal-points-of-web-presence. Fortunately, this is exactly how OpenID works.

Hmmm. Sounds like the next post might just have to be foaf:Person Seen As Harmful...
