tabulator

Accountability Appliances: What Lawyers Expect to See - Part III (User Interface)

I've written in the last two blogs about how lawyers operate in a very structured environment. This will have a tremendous impact on what they'll consider acceptable in a user interface. They might accept something which seems a bit like an outline or a form, but years of experience tell me that they will rail at anything code-like.

For example, we see

:MList a rdf:List

and automatically read

"MList" is the name of a list written in rdf

Or,

air:pattern {
:MEMBER air:in :MEMBERLIST.
}


and know that we are asking our system to look for a pattern in the data in which a particular "member" is in a particular list of members. Perhaps because law already requires learning to read, speak, and think in another language, most lawyers look at lines like those above and see no meaning.

Our current work-in-progress produces output that includes:


bjb reject bs non compliant with S9Policy 1

Because

phone record 2892 category HealthInformation

Justify

bs request instruction bs request content
type Request
bs request content intended beneficiary customer351
type Benefit Action Instruction
customer351 location MA
xphone record 2892 about customer351



Nearly every output item is a hotlink to something which provides definition, explanation, or derivation. Much of it is in "Tabulator", the cool tool that aggregates just the bits of data we want to know.

From a user-interface-for-lawyers perspective, this version of output is an improvement over our earlier ones because it removes a lot of things programmers do to solve computation challenges. It removes colons and semi-colons from places they're not commonly used in English (i.e., at the beginning of a term) and mostly uses words that are known in the general population. It also parses "humpbacks" - the programmers' traditional concatenation of a string of words - back into separate words. And, it replaces hyphens and underlines - also used for concatenation - with blank spaces.
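
For illustration only, here is a minimal sketch of that kind of cleanup, assuming the bound terms arrive as plain strings (this is not our actual code, just the idea):

import re

def humanize(term):
    """Split "humpbacked" (camelCase) terms and replace hyphens and
    underscores with spaces, as described above. Illustrative only."""
    # "BenefitActionInstruction" -> "Benefit Action Instruction"
    s = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', ' ', term)
    # "phone_record-2892" -> "phone record 2892"
    return re.sub(r'[-_]+', ' ', s)

print(humanize("BenefitActionInstruction"))   # Benefit Action Instruction
print(humanize("phone_record-2892"))          # phone record 2892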

At last week's meeting, we talked about the possibility of generating output which simulates short English sentences. These might be stilted but would be most easily read by lawyers. Here's my first attempt at the top-level template:

 

Issue: Whether the transactions in [TransactionLogFilePopularName] {about [VariableName] [VariableValue]} comply with [MasterPolicyPopularName]?

Rule: To be compliant, [SubPolicyPopularName] of [MasterPolicyPopularName] requires [PatternVariableName] of an event to be [PatternValue1].

Fact: In transaction [TransactionNumber] [PatternVariableName] of the event was [PatternValue2].

Analysis: [PatternValue2] is not [PatternValue1].

Conclusion: The transactions appear to be non-compliant with [SubPolicyPopularName] of [MasterPolicyPopularName].
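
To make the mechanics concrete, here is a rough sketch of how bound values might be dropped into that template. The flat dictionary of bindings and its key names are invented here for illustration; they are not the reasoner's actual output format:

# Hypothetical bindings for Scenario 9, Transaction 15; the keys mirror
# the bracketed slots in the template above.
bindings = {
    "SubPolicyPopularName": "Denial of Service Rule",
    "MasterPolicyPopularName": "MA Disability Discrimination Law",
    "PatternVariableName": "reason",
    "PatternValue1": "other than disability",
    "TransactionNumber": "Xphone Record 2892",
    "PatternValue2": "Infectious Disease",
}

rule = ("Rule: To be compliant, {SubPolicyPopularName} of {MasterPolicyPopularName} "
        "requires {PatternVariableName} of an event to be {PatternValue1}.")
fact = ("Fact: In transaction {TransactionNumber} {PatternVariableName} "
        "of the event was {PatternValue2}.")
analysis = "Analysis: {PatternValue2} is not {PatternValue1}."

for line in (rule, fact, analysis):
    print(line.format(**bindings))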



This seems to me approximately correct in the context of requests for the appliance to reason over millions of transactions with many sub-rules. A person seeking an answer from the system would create the Issue question. The Issue question is almost always going to ask whether some series of transactions violated a super-rule and often will have a scope limiter (e.g., with regard to a particular person, within a date range, or by one entity), denoted here by {}.

From the lawyer perspective, the interesting part of the result is the finding of non-compliance or possible non-compliance. So, the remainder of the output would be generated to describe only the failure(s) in a pattern-matching for one or more sub-rules. If there's more than one violation, the interface would display the Issue once and then the Rule to Conclusion steps for each non-compliant result.

I tried this out on a lawyer I know. He insisted it was unintelligible when the []'s were left in but said it was manageable when he saw the same text without them.


For our Scenario 9, Transaction 15, an idealized top level display would say:


Issue: Whether the transactions in Xphone's Customer Service Log about Person Bob Same comply with MA Disability Discrimination Law?

Rule: To be compliant, Denial of Service Rule of MA Disability Discrimination Law requires reason of an event to be other than disability.

Fact: In transaction Xphone Record 2892 reason of the event was Infectious Disease.

Analysis: Infectious disease is not other than disability.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.



Each one of the bound values should have a hotlink to a Tabulator display that provides background or details.



Right now, we might be able to produce:


Issue: Whether the transactions in Xphone's Customer Service Log about Betty JB reject Bob Same comply with MA Disability Discrimination Law?

Rule: To be non-compliant, Denial of Service Rule of MA Disability Discrimination Law requires REASON of an event to be category Health Information.

Fact: In transaction Xphone Record 2892 REASON of the event was category Health Information.

Analysis: category Health Information is category Health Information.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.




This example highlights a few challenges.

1) It's possible that only failures of policies containing comparative matches (e.g., :v1 sameAs :v2; :v9 greaterThan :v3; :v12 withinDateRange :v4) are legally relevant. This needs more thought.

2) We'd need to name every sub-policy or have a default called UnnamedSubPolicy.

3) We'd need to be able to translate statute numbers to popular names and have a default instruction to include the statute number when no popular name exists.

4) We'd need some taxonomies (e.g., infectious disease is a sub-class of disability).

5) In a perfect world, we'd have some way to trigger a couple of alternative displays. For example, it would be nice to be able to trigger one of two rule structures: either one that says a rule requires a match or one that says a rule requires a non-match. The reason for this is that if we always have to use the same structure, about half of the outputs will be very stilted and cause the lawyers to struggle to understand.

6) We need some way to deal with things the system can't reason about (see the sketch after this list). If the law requires the reason to be disability and the system doesn't know whether health information is the same as or different from disability, then it ought to be able to produce an analysis that says something along the lines of "The relationship between Health Information and disability is unknown" and produce a conclusion that says "Whether the transaction is compliant is unknown." If we're reasoning over millions of transactions there are likely to be quite a few of these and they ought to be presented after the non-compliant ones.
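
On point 6, here is a hedged sketch of the three-valued outcome and the ordering it implies. The names, the toy taxonomy, and the sorting rule are all assumptions for illustration, not part of the actual appliance:

from enum import Enum

class Outcome(Enum):
    NON_COMPLIANT = 0   # presented first
    UNKNOWN = 1         # "Whether the transaction is compliant is unknown."
    COMPLIANT = 2       # usually not shown to the lawyer at all

# Toy taxonomy: maps a term to the classes it is known to fall under.
taxonomy = {"Infectious Disease": {"disability"}}

def check(reason, prohibited_class="disability"):
    """Classify one transaction's reason against the prohibited class."""
    classes = taxonomy.get(reason)
    if classes is None:
        return Outcome.UNKNOWN        # the system can't reason about this term
    return Outcome.NON_COMPLIANT if prohibited_class in classes else Outcome.COMPLIANT

findings = [("Xphone Record 2892", check("Infectious Disease")),
            ("Xphone Record 3105", check("Health Information"))]   # record number invented
findings.sort(key=lambda f: f[1].value)   # non-compliant results before unknown ones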

 

 

Accountability Appliances: What Lawyers Expect to See - Part II (Structure)

Submitted by kkw on Thu, 2008-01-10 14:16.

Building accountability appliances involves a challenging intersection between business, law, and technology. In my first blog about how to satisfy the legal portion of the triad, I explained that - conceptually - the lawyer would want to know whether particular digital transactions had complied with one or more rules. Lawyers, used to having things their own way, want more... they want to get the answer to that question in a particular structure.

All legal cases are decided using the same structure. As first year law students, we spend a year with highlighter in hand, trying to pick out the pieces of that structure from within the torrent of words of court decisions. Over time, we become proficient and -- like the child who stops moving his lips when he reads -- the activity becomes internalized and instinctive. From then on, we only notice that something's not right by its absence.

The structure is as follows:

  • ISSUE - the legal question that is being answered. Most typically it begins with the word "whether": "Whether the Privacy Act was violated?" Though the bigger question is whether an entire law was violated, because laws tend to have so many subparts and variables, we often frame a much narrower issue based upon a subpart that we think was violated, such as "Whether the computer matching prohibition of the Privacy Act was violated?"
  • RULE - provides the words and the source of the legal requirement. This can be the statement of a particular law, such as "The US Copyright law permits unauthorized use of copyrighted work based upon four conditions - the nature of use, the nature of the work, the amount of the work used, and the likely impact on the value of the work. 17 USC § 107." Or, it can be a rule created by a court to explain how the law is implemented in practical situations: "In our jurisdiction, there is no infringement of a copyrighted work when the original is distributed widely for free because there is no diminution of market value. Field v. Google, Inc., 412 F. Supp. 2d 1106 (D. Nev. 2006)." [Note: The explanation of the citation formats for the sources has filled books and blogs. Here's a good brief explanation from Cornell.]
  • FACTS - the known or asserted facts that are relevant to the rule we are considering and the source of the information. In a Privacy Act computer matching case, there will be assertions like "the defendant's CIO admitted in deposition that he matched the deadbeat dads list against the welfare list and if there were matches he was to divert the benefits to the custodial parent." In a copyright fair use case, a statement of facts might include "plaintiff admitted that he has posted the material on his website and has no limitations on access or copying the work."
  • ANALYSIS - is where the facts are pattern-matched to the rule. "The rule does not permit US persons to lose benefits based upon computer matched data unless certain conditions are met. Our facts show that many people lost their welfare benefits after the deadbeat data was matched to the welfare rolls without any of the other conditions being met." Or "There can be no finding of copyright infringement where the original work was so widely distributed for free that it had no market value. Our facts show that Twinky Co. posted its original material on the web on its own site and every other site where it could gain access without any attempt to control copying or access."

  • CONCLUSION - whether a violation has or has not occurred. "The computer matching provision of the Privacy Act was violated." or "The copyright was not infringed."

In light of this structure, we've been working on parsing the tremendous volume of words into their bare essentials so that they can be stored and computed to determine whether certain uses of data occurred in compliance with law. Most of our examples have focused on privacy.

Today, the number of sub-rules, elements of rules, and facts is often so voluminous that there is not enough time for a lawyer or team of lawyers to work through them all. So, the lawyer guesses what's likely to be a problem and works from there; the more experienced or talented the lawyer, the more likely that the guess leads to a productive result. Conversely, this likely means that many violations are never discovered. One of the great benefits of our proposed accountability appliance is that it could quickly reason over a massive volume of sub-rules, elements, and facts to identify the transactions that appear to violate a rule or for which there's insufficient information to make a determination.

Although we haven't discussed it, I think there also will be a benefit to be derived from all of the reasoning that concludes that activities were compliant. I'm going to try to think of some high value examples.

 

 

Two additional blogs are coming:

Physically, what does the lawyer expect to see? At the simplest level, lawyers are expecting to see things in terms they recognize and without unfamiliar distractions; even the presence of things like curly brackets or metatags will cause most to insist that the output is unreadable. Because there is so much information, visualization tools present opportunities for presentations that will be intuitively understood.

And:

The 1st Lawyer to Programmer/Programmer to Lawyer Dictionary! Compliance, auditing, privacy, and a host of other topics now have lawyers and system developers interacting regularly. As we've worked on DIG, I've noticed how the same words (e.g., rules, binding, fact) have different meanings.

 

brainstorming, issue tracking, and problem reporting... with tabulator?

Submitted by connolly on Mon, 2007-11-05 02:07.
I want to brainstorm about a bunch of issues in preparation for the HTML WG meeting this week. tracker's forms interface won't let me enter an issue until I fill out all the details... well, it doesn't really have that many constraints, except that it notifies the whole WG when an issue is added.

I want something more like the OmniOutliner experience... I want to brainstorm.

But when I'm done, I don't want to tediously copy and paste each field into tracker.

Clearly, I could write some python or XSLT to take OmniOutliner's XML and feed it to tracker afterward, but... can't we do better than that?

What if tabulator's UI were as smooth as OmniOutliner... and what if I could just push one button and get the toothpaste back in the tube, i.e. feed the outline into the tracker's REST interface?
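
Something like this, maybe. A rough sketch, assuming OmniOutliner's XML export has item/text elements and that tracker's REST interface takes a simple form POST; both of those are assumptions, and the URL below is made up:

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

TRACKER_URL = "http://www.w3.org/html/wg/tracker/issues/new"   # hypothetical endpoint

def rows(outline_path):
    """Yield the text of each outline row (assumed <item><text> structure)."""
    tree = ET.parse(outline_path)
    for item in tree.getroot().iter("item"):
        text = item.findtext("text", default="").strip()
        if text:
            yield text

def file_issues(outline_path):
    """Push each brainstormed row to the tracker as a new issue."""
    for title in rows(outline_path):
        data = urllib.parse.urlencode({"title": title}).encode()
        urllib.request.urlopen(urllib.request.Request(TRACKER_URL, data=data))

# file_issues("brainstorm.oo3/contents.xml")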

p.s. why am I using emacs to write this? Apple Mail knows IntegrityIsJobOne, but in OS X 10.4, like iCal, it goes off into the weeds eating CPU for inexplicable reasons, and I don't invest debugging effort in stuff that isn't open source.

How do I feed this to breadcrumbs now? Does emacs have a markdown/ReStructuredText mode?
How about AtomPub support? I manually cut and pasted and cleaned up the line breaks. ugh!

I use Thunderbird on my PowerBook, but it's totally confused about offline operation. It goes to save to the drafts folder every now and again, but over IMAP... so if the net is flakey or down, (a) it doesn't actually save, and (b) it interrupts my drafting!!! OK... found the config option to use a local drafts folder under Tools/Account settings. (why not under preferences?) But Thunderbird doesn't do well filling the IMAP cache; I don't know to tell it to go offline until I've left the airport wifi, and at that point, it's too late to grab the mail I want to read. The Apple mailer does much better at using idle time to prefetch.

p.p.s. how do I use hReview and GRDDL to make the data in this gripe available as if it were a bugzilla entry? More on that to follow, I hope...

Now is a good time to try the tabulator

Submitted by connolly on Thu, 2006-10-26 11:40.

Tim presented the tabulator to the W3C team today; see slides: Tabulator: AJAX Generic RDF browser.

The tabulator was sorta all over the floor when I tried to present it in Austin in September, but David Sheets put it back together in the last couple weeks. Yay David!

In particular, the support for viewing the HTTP data that you pick up by tabulating is working better than ever before. The HTTP vocabulary has URIs like http://dig.csail.mit.edu/2005/ajar/ajaw/httph#content-type. That seems like an interesting contribution to the WAI ER work on HTTP Vocabulary in RDF.

Note comments are disabled here in breadcrumbs until we figure out OpenID comment policies, Drupal, etc. The tabulator issue tracker is probably a better place to report problems anyway. We don't have OpenID working there yet either, unfortunately, but we do support email callback based account setup.

Talking with U.T. Austin students about the Microformats, Drug Discovery, the Tabulator, and the Semantic Web

Submitted by connolly on Sat, 2006-09-16 21:36.

Working with the MIT tabulator students has been such a blast that while I was at U.T. Austin for the research library symposium, I thought I would try to recruit some undergrads there to get into it. Bob Boyer invited me to speak to his PHL313K class on why the heck they should learn logic, and Alan Cline invited me to the Dean's Scholars lunch, which I used to attend when I was at U.T.

To motivate logic in the PHL313K class, I started with their experience with HTML and blogging and explained how the Semantic Web extends the web by looking at links as logical propositions. I used my XML 2005 slides to talk a little bit about web history and web architecture, and then I moved into using hCalendar (and GRDDL, though I left that largely implicit) to address the personal information disaster. This was the first week or so of class and they had just started learning propositional logic, and hadn't even gotten as far as predicate calculus where atomic formulas like those in RDF show up. And none of them had heard of microformats. I promised not to talk for the full hour but then lost track of time and didn't get to the punch line, "so the computer tells you that no, you can't go to both the conference and Mom's birthday party because you can't be in two places at once" until it was time for them to head off to their next class.

One student did stay after to pose a question that is very interesting and important, if only tangentially related to the Semantic Web: with technology advancing so fast, how do you maintain balance in life?

While Boyer said that talk went well, I think I didn't do a very good job of connecting with them; or maybe they just weren't really awake; it was an 8am class after all. At the Dean's Scholars lunch, on the other hand, the students were talking to each other so loudly as they grabbed their sandwiches that Cline had to really work to get the floor to introduce me as a "local boy done good." They responded with a rousing ovation.

Elaine Rich had provided the vital clue for connecting with this audience earlier in the week. She does AI research and had seen TimBL's AAAI talk. While she didn't exactly give the talk four stars overall, she did get enough out of it to realize it would make an interesting application to add to a book that she's writing, where she's trying to give practical examples that motivate automata theory. So after I took a look at what she had written about URIs and RDF and OWL and such, she reminded me that not all the Dean's Scholars are studying computer science; but many of them do biology, and I might do well to present the Semantic Web more from the perspective of that user community.

So I used TimBL's Bio-IT slides. They weren't shy when I went too fast with terms like hypertext, and there were a lot of furrowed brows for a while. But when I got to the drug discovery diagram ("FOAF, OMM, UMLS, SNP, Uniprot, BioPAX, Patents all have some overlap with drug target ontology"), I said I didn't even know some of these words and asked them which ones they knew. After a chuckle about "drug", one of them explained about SNP, i.e. single nucleotide polymorphism, and another told me about OMM, and the discussion really got going. I didn't make much more use of Tim's slides. One great question about integrating data about one place from lots of sources prompted me to tempt the demo gods and try the tabulator. The demo gods were not entirely kind; perhaps I should have used the released version rather than the development version. But I think I did give them a feel for it. In answer to "so what is it you're trying to do, exactly?" I gave a two-part answer:

  1. Recruit some of them to work on the tabulator so that their name might be on the next paper like the SWUI06 paper, Tabulator: Exploring and Analyzing linked data on the Semantic Web.
  2. Integrate data across applications and across administrative boundaries all over the world, like the Web has done for documents.

We touched on the question of local and global consistency, and someone asked if you can reason about disagreement. I said that yes, I had presented a paper in Edinburgh just this May that demonstrated formally a disagreement between several parties.

One of the last questions was "So what is computer science research anyway?" which I answered by appeal to the DIG mission statement:

The Decentralized Information Group explores technical, institutional and public policy questions necessary to advance the development of global, decentralized information environments.

And I said how cool it is to have somebody in the TAMI project with real-world experience with the Privacy Act. One student followed up and asked if we have anybody with real legal background in the group, and I pointed him to Danny. He asked me afterward how to get involved, and it turned out that IRC and freenode are known to him, so the #swig channel was in our common neighborhood in cyberspace, even as geography would separate us as I headed to the airport to fly home.


Blogged with Flock

tabulator maps in Argentina

Submitted by connolly on Mon, 2006-08-07 11:39.

My Spanish is a little rusty, but it looks like inktel is having fun with the tabulator's map support too.

tags pending: geo, tabulator

Choosing flight itineraries using tabulator and data from Wikipedia

Submitted by connolly on Mon, 2006-07-17 18:13.

While planning a trip to Boston/Cambridge, I was faced with a blizzard of itinerary options from American Airlines. I really wanted to overlay them all on the same map or calendar or something. I pretty much got it to work:

That's a map view using the tabulator, which had another release today. The itinerary data in RDF is converted from HTML via grokOptions.xsl (and tidy).

I can, in fact, see all the itineraries on the same calendar view. Getting these views to be helpful in choosing between the itineraries is going to take some more work, but this is a start.

Getting a map view required getting latitude/longitude info for the airports. I think getting Semantic Web data from Wikipedia is a promising approach. A while back, I figured out how to get lat/long data for airports from Wikipedia. This week, I added a Kid template, aptinfo.kid, and I figured out how to serve up that data live from the DIG/CSAIL web server. For example, http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item is a URI for the Chicago airport, and when you GET it with HTTP, a little CGI script calls aptdata.py, which fetches the relevant page from Wikipedia (using an httplib2 cache) and scrapes the lat/long and a few other details and gives them back to you in RDF. Viewed with RDF/N3 glasses, it looks like:

#   Base was: http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD
@prefix : <#> .
@prefix apt: <http://www.daml.org/2001/10/html/airport-ont#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix go: <http://www.w3.org/2003/01/geo/go#> .
@prefix s: <http://www.w3.org/2000/01/rdf-schema#> .

:item a apt:Airport;
apt:iataCode "ORD";
s:comment "tz: -6";
s:label "O'Hare International Airport";
go:within_3_power_11_metres :nearbyCity;
geo:lat "41.9794444444";
geo:long "-87.9044444444";
foaf:homepage <http://en.wikipedia.org/wiki/O%27Hare_International_Airport> .

:nearbyCity foaf:homepage <http://en.wikipedia.org/wiki/wiki/Chicago%2C_Illinois>;
foaf:name "Chicago, Illinois" .

In particular, notice that:

  • I use the swig geo vocabulary, which the new GEO XG is set up to refine. The use of strings rather than datatyped floating point numbers follows the schema for that vocabulary.
  • I use distinct URIs for the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item) and the page about the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD).
  • I use an owl:InverseFunctionalProperty, foaf:homepage, to connect the airport to its Wikipedia article, and another, apt:iataCode, to relate the airport to its IATA code.
  • I use the GeoOnion pattern to relate the airport to a/the city that it serves. I'm not sure I like that pattern, but the idea is to make a browseable web of linked cities, states, countries, and other geographical items.

Hmm... I use rdfs:label for the name of the airport but foaf:name for the name of the city. I don't think that was a conscious choice. I may change that.

The timezone info is in an rdfs:comment. I hope to refine that in future episodes. Stay tuned.
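
If you want to poke at that data programmatically rather than through the tabulator, here's a hedged client-side sketch using rdflib (a common Python RDF library). Whether the service above still answers, and whether it serves N3 to your client, are assumptions; the URIs are the ones shown above:

from rdflib import Graph, Namespace, URIRef

GEO = Namespace("http://www.w3.org/2003/01/geo/wgs84_pos#")
airport = URIRef("http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item")

# Fetch the airport description served by the CGI script described above.
g = Graph()
g.parse("http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD", format="n3")

# Read back the lat/long scraped from Wikipedia.
print(g.value(airport, GEO.lat), g.value(airport, GEO.long))
# e.g. 41.9794444444 -87.9044444444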

a walk thru the tabulator calendar view

Submitted by connolly on Tue, 2006-07-11 11:22.

The SIMILE Timeline is a nifty hack. DanBri writes about Who, what, where, when? of the Semantic Web, and in a message to the Semantic Web Interest Group, he asks

TimBL, any plans for Geo widgets to be wired into the tabulator?

Indeed, there are. Stay tuned for more on that... from the students who are actually doing the work, I hope. But I can't wait to tell everybody about the calendar view. Give it a try:

  1. Start with the development version of the tabulator.
  2. Select a source of calendar info. Morten's SPARQL Timeline uses RSS. The tabulator calendar groks dc:date, so something like W3C's main RSS feed will work fine. Put its URI in the tabulator's URI text field and hit "Add to outliner".
    add to outliner screenshot, before

    When it's done it should look something like this:

    add to outliner screenshot, after
    • For fun, open the Sources tab near the bottom. Note that the tabulator loads the RSS and DC schemas, plus all the schemas they reference, and so on; i.e. the ontological closure. Hmm... the RSS terms seem to be 404.
      sources screenshot
  3. Now navigate the outline down to one of the items.
    item screenshot
    and then re-focus (shift click) on the rss item class itself, and then open an item and select the date property.
    refocus screenshot
  4. Now hit "Tabulate selected properties". You'll get a table of items and their dates.
    table screenshot
  5. OK, so much for review of basic tabulator stuff. Now you're all set for the new stuff. Hit Calendar and scroll down a little:
    table screenshot

Note the Export button with the SPARQL option. That's a whole other item in itself, but for now, you can see the SPARQL query that corresponds to what you've selected to put on the calendar:

SELECT ?v0 ?v1 
WHERE
{
?v0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item> .
?v0 <http://purl.org/dc/elements/1.1/date> ?v1 .
}

Fun, huh?

tabulator use cases: when can we meet? and PathCross

Submitted by connolly on Wed, 2006-02-08 13:48.

I keep all sorts of calendar info in the web, as do my colleagues and the groups and organizations we participate in.

Suppose it was all in RDF, either directly as RDF/XML or indirectly via GRDDL as hCalendar or the like.

Wouldn't it be cool to grab a bunch of sources, and then tabulate names vs. availability on various dates?
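
Just to make the wish concrete, here's a rough sketch of that tabulation with rdflib. The source URLs are made up, and the choice of dc:date plus an ical-style attendee property is an assumption about how the calendar data would be modelled:

from collections import defaultdict
from rdflib import Graph

# Hypothetical calendar sources, already converted to RDF (e.g. via GRDDL).
sources = ["http://example.org/connolly-calendar.rdf",
           "http://example.org/group-schedule.rdf"]

g = Graph()
for src in sources:
    g.parse(src)

q = """
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX cal: <http://www.w3.org/2002/12/cal/ical#>
SELECT ?who ?when WHERE {
    ?event dc:date ?when .
    ?event cal:attendee ?who .
}
"""

# Tabulate: name -> set of dates on which that person is busy.
busy = defaultdict(set)
for who, when in g.query(q):
    busy[str(who)].add(str(when)[:10])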

I would probably need rules in the tabulator; Jos's work sure seems promising.

Closely related is the PathCross use case...

Suppose I'm travelling to Boston and San Francisco in the next couple months. I'd like my machine to let me know I have a FriendOfaFriend who also lives there or plans to be there.

See also the Open Group's Federated Free/Busy Challenge.
