Web Technologies

Accountability Appliances: What Lawyers Expect to See - Part III (User Interface)

In my last two blog posts I wrote about how lawyers operate in a very structured environment. That structure will have a tremendous impact on what they'll consider acceptable in a user interface. They might accept something that looks a bit like an outline or a form, but years of experience tell me that they will rail against anything code-like.

For example, those of us who write code see

:MList a rdf:List

and automatically read

"MList" is the name of a list written in rdf


air:pattern {

and know that we are asking our system to look for a pattern in the data in which a particular "member" is in a particular list of members. Perhaps because learning the law is already like learning to read, speak, and think in another language, most lawyers look at lines like those above and see no meaning.
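For readers who do write code, "looking for a pattern in the data" can be made concrete. Below is a minimal Python sketch of matching a triple pattern that contains variables (terms starting with `?`) against a small set of triples. It is not the AIR reasoner's matching engine: the `:member` predicate, the sample data, and the `match`/`unify` helpers are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

def unify(pattern_term: str, data_term: str, binding: Dict[str, str]) -> bool:
    """Match one term, extending `binding` in place when the pattern term is a variable."""
    if pattern_term.startswith("?"):
        if pattern_term in binding:
            return binding[pattern_term] == data_term
        binding[pattern_term] = data_term
        return True
    return pattern_term == data_term

def match(pattern: List[Triple], data: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that makes all pattern triples appear in `data`."""
    binding = binding if binding is not None else {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in data:
        candidate = dict(binding)  # copy so a failed attempt leaves no trace
        if all(unify(p, t, candidate) for p, t in zip(first, triple)):
            yield from match(rest, data, candidate)

# Hypothetical data: ":MList a rdf:List" plus one membership triple.
DATA = [(":MList", "rdf:type", "rdf:List"),
        (":MList", ":member", ":customer351")]
PATTERN = [("?list", "rdf:type", "rdf:List"),
           ("?list", ":member", "?who")]
```

Here `list(match(PATTERN, DATA))` finds the single binding of `?list` to `:MList` and `?who` to `:customer351`, which is exactly the kind of result a lawyer should never have to read in this form.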

Our current work-in-progress produces output that includes:

bjb reject bs non compliant with S9Policy 1


phone record 2892 category HealthInformation


bs request instruction bs request content
type Request
bs request content intended beneficiary customer351
type Benefit Action Instruction
customer351 location MA
xphone record 2892 about customer351

Nearly every output item is a hotlink to something which provides definition, explanation, or derivation. Much of it is in "Tabulator", the cool tool that aggregates just the bits of data we want to know.

From a user-interface-for-lawyers perspective, this version of the output is an improvement over our earlier ones because it removes a lot of the artifacts programmers introduce to solve computational challenges. It removes colons and semicolons from places where they're not commonly used in English (i.e., at the beginning of a term) and mostly uses words known to the general population. It also parses "humpbacks" (the programmers' traditional concatenation of a string of words, as in HealthInformation) back into separate words. And it replaces hyphens and underscores, which are also used for concatenation, with blank spaces.
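The three clean-up steps described above can be sketched as a single function. This is a minimal Python illustration, assuming terms arrive as strings such as `air:BenefitActionInstruction`. The name `humanize` is hypothetical and this is not the project's actual display code. One design choice worth noting: it splits only at a lowercase-to-uppercase boundary, so acronyms and labels like `S9Policy` or `MList` stay intact, as they do in the output above.

```python
import re

def humanize(term: str) -> str:
    """Turn a programmer-style RDF term into plain words (illustrative sketch)."""
    # 1. Drop a leading prefix such as ":" or "air:" (everything up to the first colon).
    term = re.sub(r"^[A-Za-z0-9_-]*:", "", term)
    # 2. Hyphens and underscores, used for concatenation, become blank spaces.
    term = re.sub(r"[-_]+", " ", term)
    # 3. Split "humpbacks": insert a space where a lowercase letter meets a capital.
    term = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", term)
    # Collapse any runs of whitespace left over from the steps above.
    return " ".join(term.split())
```

For example, `humanize(":HealthInformation")` gives `Health Information`, and `humanize("phone_record-2892")` gives `phone record 2892`.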

At last week's meeting, we talked about the possibility of generating output which simulates short English sentences. These might be stilted but would be most easily read by lawyers. Here's my first attempt at the top-level template:


Issue: Whether the transactions in [TransactionLogFilePopularName] {about [VariableName] [VariableValue]} comply with [MasterPolicyPopularName]?

Rule: To be compliant, [SubPolicyPopularName] of [MasterPolicyPopularName] requires [PatternVariableName] of an event to be [PatternValue1].

Fact: In transaction [TransactionNumber] [PatternVariableName] of the event was [PatternValue2].

Analysis: [PatternValue2] is not [PatternValue1].

Conclusion: The transactions appear to be non-compliant with [SubPolicyPopularName] of [MasterPolicyPopularName].

This seems to me approximately correct in the context of requests for the appliance to reason over millions of transactions with many sub-rules. A person seeking an answer from the system would create the Issue question. The Issue question is almost always going to ask whether some series of transactions violated a super-rule, and it will often have a scope limiter (e.g., with regard to a particular person, within a date range, or by one entity), denoted here by {}.

From the lawyer perspective, the interesting part of the result is the finding of non-compliance or possible non-compliance. So, the remainder of the output would be generated to describe only the failure(s) in a pattern-matching for one or more sub-rules. If there's more than one violation, the interface would display the Issue once and then the Rule to Conclusion steps for each non-compliant result.
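The display rule in the last two paragraphs (state the Issue once, drop the {} limiter when there is none, then give Rule through Conclusion for each failed match) is simple to mechanize. Below is a minimal Python sketch. `Violation` and `render_report` are hypothetical names, and the field-to-slot mapping in the comments is my assumption about how the bound values would be passed in. Note that the rendered text contains no square brackets.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Violation:
    """One failed sub-rule match; each comment names the template slot the field fills."""
    sub_policy: str    # [SubPolicyPopularName]
    variable: str      # [PatternVariableName]
    required: str      # [PatternValue1]
    found: str         # [PatternValue2]
    transaction: str   # [TransactionNumber]

def render_report(log: str, master: str, violations: List[Violation],
                  scope: Optional[str] = None) -> str:
    """Issue once, then Rule/Fact/Analysis/Conclusion for each violation.

    `log` fills [TransactionLogFilePopularName], `master` fills
    [MasterPolicyPopularName], and `scope` fills the optional {about ...} limiter.
    """
    about = f" about {scope}" if scope else ""  # omit the limiter when there is none
    lines = [f"Issue: Whether the transactions in {log}{about} comply with {master}?"]
    for v in violations:
        lines += [
            f"Rule: To be compliant, {v.sub_policy} of {master} requires "
            f"{v.variable} of an event to be {v.required}.",
            f"Fact: In transaction {v.transaction} {v.variable} of the event was {v.found}.",
            f"Analysis: {v.found} is not {v.required}.",
            f"Conclusion: The transactions appear to be non-compliant with "
            f"{v.sub_policy} of {master}.",
        ]
    return "\n".join(lines)

if __name__ == "__main__":
    # Scenario 9, Transaction 15, using the bound values from the idealized display.
    print(render_report(
        "Xphone's Customer Service Log", "MA Disability Discrimination Law",
        [Violation("Denial of Service Rule", "reason", "other than disability",
                   "Infectious Disease", "Xphone Record 2892")],
        scope="Person Bob Same"))
```

With the Scenario 9 values, this reproduces the idealized display, except that "Infectious Disease" keeps its capital D in the Analysis line. The hotlinks to Tabulator would need a richer output format than plain strings.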

I tried this out on a lawyer I know. He insisted it was unintelligible when the square brackets were left in, but said it was manageable when he saw the same text without them.

For our Scenario 9, Transaction 15, an idealized top level display would say:

Issue: Whether the transactions in Xphone's Customer Service Log about Person Bob Same comply with MA Disability Discrimination Law?

Rule: To be compliant, Denial of Service Rule of MA Disability Discrimination Law requires reason of an event to be other than disability.

Fact: In transaction Xphone Record 2892 reason of the event was Infectious Disease.

Analysis: Infectious disease is not other than disability.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.

Each one of the bound values should have a hotlink to a Tabulator display that provides background or details.

Right now, we might be able to produce:

Issue: Whether the transactions in Xphone's Customer Service Log about Betty JB reject Bob Same comply with MA Disability Discrimination Law?

Rule: To be non-compliant, Denial of Service Rule of MA Disability Discrimination Law requires REASON of an event to be category Health Information.

Fact: In transaction Xphone Record 2892 REASON of the event was category Health Information.

Analysis: category Health Information is category Health Information.

Conclusion: The transactions appear to be non-compliant with Denial of Service Rule of MA Disability Discrimination Law.

This example highlights a few challenges.

1) It's possible that only failures of policies containing comparative matches (e.g., :v1 sameAs :v2; :v9 greaterThan :v3; :v12 withinDateRange :v4) are legally relevant. This needs more thought.

2) We'd need to name every sub-policy or have a default called UnnamedSubPolicy.

3) We'd need to be able to translate statute numbers to popular names and have a default instruction to include the statute number when no popular name exists.

4) We'd need some taxonomies (e.g., infectious disease is a sub-class of disability).

5) In a perfect world, we'd have some way to trigger a couple of alternative displays. For example, it would be nice to be able to trigger one of two rule structures: either one that says a rule requires a match or one that says a rule requires a non-match. The reason is that if we always have to use the same structure, about half of the outputs will be very stilted and the lawyers will struggle to understand them.

6) We need some way to deal with things the system can't reason about. If the law requires the reason to be disability and the system doesn't know whether health information is the same as or different from disability, then it ought to be able to produce an analysis that says something along the lines of "The relationship between Health Information and disability is unknown" and a conclusion that says "Whether the transaction is compliant is unknown." If we're reasoning over millions of transactions, there are likely to be quite a few of these, and they ought to be presented after the non-compliant ones.
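Challenges 2, 3, 4, and 6 fit together in a small sketch: default names, a taxonomy walked transitively, and a three-valued outcome where "unknown" sorts after "non-compliant". This is a minimal Python illustration under stated assumptions. The taxonomy entries other than "infectious disease is a sub-class of disability", the statute number, and all function names are hypothetical. The code also assumes that a term the taxonomy knows about, but cannot connect to the class, is *not* in that class. That closed-world choice is a policy decision the lawyers would need to weigh in on.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Set, Tuple

# Challenge 4: a hypothetical taxonomy, mapping each class to its direct superclasses.
TAXONOMY: Dict[str, Set[str]] = {
    "Infectious Disease": {"Disability"},
    "Tuberculosis": {"Infectious Disease"},
    "Unpaid Bill": {"Financial Reason"},
}

# Challenge 3: a hypothetical statute-number -> popular-name table.
POPULAR_NAMES: Dict[str, str] = {"Statute 0001": "MA Disability Discrimination Law"}

class Outcome(IntEnum):
    # The numeric order doubles as display order: non-compliant first, then unknown.
    NON_COMPLIANT = 0
    UNKNOWN = 1
    COMPLIANT = 2

def popular_name(statute: str) -> str:
    # Fall back to the statute number itself when no popular name exists.
    return POPULAR_NAMES.get(statute, statute)

def sub_policy_name(name: Optional[str]) -> str:
    # Challenge 2: a default label for sub-policies nobody has named.
    return name or "UnnamedSubPolicy"

def is_a(term: str, cls: str) -> Optional[bool]:
    """True/False if the taxonomy settles whether term is a kind of cls; None if it can't."""
    if term == cls:
        return True
    if term not in TAXONOMY:
        return None  # the system knows nothing about this term
    seen: Set[str] = set()
    stack = [term]
    while stack:  # depth-first walk up the superclass links
        for parent in TAXONOMY.get(stack.pop(), set()):
            if parent == cls:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False  # assumption: a known term with no path to cls is not a cls

def check_other_than(found: str, excluded: str) -> Tuple[Outcome, str, str]:
    """Rule shape 'X must be other than <excluded>'; returns (outcome, analysis, conclusion)."""
    verdict = is_a(found, excluded)
    if verdict is None:  # challenge 6: say so instead of guessing
        return (Outcome.UNKNOWN,
                f"The relationship between {found} and {excluded} is unknown.",
                "Whether the transaction is compliant is unknown.")
    if verdict:
        return (Outcome.NON_COMPLIANT, f"{found} is not other than {excluded}.",
                "The transactions appear to be non-compliant.")
    return (Outcome.COMPLIANT, f"{found} is other than {excluded}.",
            "The transactions appear to be compliant.")

def display_order(results: List[Tuple[str, Outcome]]) -> List[Tuple[str, Outcome]]:
    # Show only failures and unknowns, with the non-compliant results first.
    shown = [r for r in results if r[1] is not Outcome.COMPLIANT]
    return sorted(shown, key=lambda r: r[1])
```

Because `Tuberculosis` reaches `Disability` only through `Infectious Disease`, it is flagged as non-compliant, while `Health Information`, which is absent from the taxonomy, comes back as unknown.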



An Introduction and a JavaScript RDF/XML Parser

Submitted by dsheets on Mon, 2006-07-17 15:02.

My name is David Sheets. I will be a sophomore at MIT this fall. I like to be at the intersection of theory and practice.

This summer, I am working as a student developer on the Tabulator Project in the Decentralized Information Group at MIT's CSAIL. My charge has been to develop a new RDF/XML parser in JavaScript with a view to a JavaScript RDF library. I am pleased to report that I have finished the first version of the new RDF/XML parser.

Before this release, the only available RDF/XML parser in JavaScript was Jim Ley's parser.js. This parser served the community well for quite a while but fell short of the needs of the Tabulator Project. Most notably, it didn't parse all valid RDF/XML resources.

To rectify this, work began on a new parser. The result, released today, is a JavaScript class that weighs in at under 400 source lines of code and 2.8 KB gzip-compressed (12 KB uncompressed). For maximum utility, a parser should be small, standards-compliant, widely portable, and fast.

To the best of my knowledge, RDFParser is fully compliant with the RDF/XML specification: it passes all of the positive parser test cases from the W3C. This was verified with jsUnit, a unit testing framework similar to JUnit but for JavaScript; to run the automated tests against RDFParser yourself, you can follow the steps here. Passing those tests means the parser supports features such as xml:base, xml:lang, RDF Collections, XML literals, and so forth. If it's in the specification, it should be supported. An important caveat is that, for speed, the parser is non-validating. RDFParser has also been optimized for speed, which makes the code slightly less readable.
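RDF Collections are a good example of what "supported" means in practice. Under the RDF/XML specification, a property element with `rdf:parseType="Collection"` expands into a chain of blank nodes linked by `rdf:first` and `rdf:rest` and terminated by `rdf:nil`, and an empty collection yields `rdf:nil` directly. The sketch below shows that expansion in Python rather than RDFParser's JavaScript. The function names and the `_:b0`-style blank-node labels are hypothetical.

```python
from itertools import count
from typing import Callable, List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
Triple = Tuple[str, str, str]

def make_fresh(prefix: str = "_:b") -> Callable[[], str]:
    """Return a function producing fresh blank-node labels: _:b0, _:b1, ..."""
    counter = count()
    return lambda: f"{prefix}{next(counter)}"

def expand_collection(subject: str, predicate: str, items: List[str],
                      fresh: Callable[[], str]) -> List[Triple]:
    """Triples for a property element using rdf:parseType="Collection"."""
    if not items:
        return [(subject, predicate, RDF + "nil")]  # empty collection
    nodes = [fresh() for _ in items]                # one blank node per list cell
    triples = [(subject, predicate, nodes[0])]
    for i, (node, item) in enumerate(zip(nodes, items)):
        triples.append((node, RDF + "first", item))
        # Each cell points at the next one; the last points at rdf:nil.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "rest", rest))
    return triples
```

A two-item collection therefore costs five triples, which is one reason why collection-heavy files parse differently from flat ones.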

The new parser is not as portable as the old parser at this time. It has only been tested in Firefox 1.5 but should work in any browser that supports the DOM Level 2 specification.

RDFParser runs at a speed similar to Jim Ley's parser. One can easily construct RDF/XML files that run faster on one parser than on the other, so I took five files that the Tabulator might come across in day-to-day use and ran head-to-head benchmarks between the two parsers.

Parse time depends heavily on how compact the serialization is. The more nested the RDF/XML serialization, the more scope frames must be created to track features from the specification. The less nested it is, the fewer steps are needed to traverse the DOM and the more triples each DOM element yields.
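To make "scope frames" concrete, here is a minimal Python sketch in which every element gets a frame that inherits its parent's `xml:base` and `xml:lang` and overrides them where the element says so. This is an illustration of the idea, not RDFParser's frame structure. I am assuming that base URI and language are among the things a frame tracks, and `Frame` and `walk` are hypothetical names. The depth field shows why deeply nested serializations create more frames.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple
from urllib.parse import urljoin

# ElementTree reports xml:base / xml:lang under the XML namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

@dataclass(frozen=True)
class Frame:
    base: str            # in-scope base URI
    lang: Optional[str]  # in-scope language tag, if any
    depth: int           # how many frames deep this element sits

def walk(elem: ET.Element, parent: Frame) -> Iterator[Tuple[str, Frame]]:
    """Yield (tag, frame) for every element, creating one scope frame per element."""
    base = elem.get(XML_NS + "base")
    frame = Frame(
        # A relative xml:base is resolved against the enclosing base.
        base=urljoin(parent.base, base) if base is not None else parent.base,
        lang=elem.get(XML_NS + "lang", parent.lang),
        depth=parent.depth + 1,
    )
    yield elem.tag, frame
    for child in elem:
        yield from walk(child, frame)
```

For a document `<r xml:base="http://example.org/a/" xml:lang="en"><s xml:base="b/"><t xml:lang="fr"/></s></r>`, the innermost element ends up with base `http://example.org/a/b/` and language `fr`, three frames deep.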

The next release of RDFParser is planned to include a callback/continuation system, so that the parser can yield in the middle of a parse run and let other important page features run.
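The idea behind yielding mid-parse is cooperative multitasking: do a slice of work, hand control back, and resume later. In a browser this would typically be done with `setTimeout` callbacks. The Python sketch below uses a generator to show the same control flow. It is not the planned RDFParser mechanism, and `parse_in_slices`, `run_cooperatively`, and the slice size of 100 are hypothetical choices.

```python
from typing import Callable, Generator, Iterable, Iterator

def parse_in_slices(elements: Iterable[str], handle: Callable[[str], None],
                    slice_size: int = 100) -> Generator[int, None, int]:
    """Process elements, pausing after every `slice_size` items."""
    done = 0
    for element in elements:
        handle(element)
        done += 1
        if done % slice_size == 0:
            yield done  # pause point: control returns to the caller
    return done         # becomes StopIteration.value

def run_cooperatively(task: Iterator[int], other_work: Callable[[], None]) -> int:
    """Drive the parse to completion, running `other_work` at every pause."""
    while True:
        try:
            next(task)
        except StopIteration as stop:
            return stop.value
        other_work()  # e.g. let the page repaint or respond to the user
```

Parsing 250 elements with a slice size of 100 pauses twice, after 100 and after 200 elements, before finishing.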

API documentation is available for the RDFParser included in the Tabulator 0.7 release.

Finally, I'd be happy to hear from you if you have questions, comments, or ideas regarding the RDFParser or related technologies.

formally closing the feedback loop

Submitted by connolly on Fri, 2006-02-10 01:39.

Speaking of checking specs against tests, I suppose the first time I did that was when I was editing HTML 2. I maintained a collection of tests that I checked against the DTD whenever I changed it.

When we couldn't get other collaborators to install an SGML parser and do likewise, Mark Gaither and I announced the zero-install validation service, a precursor to the W3C markup validation service.

I'm on an Adventures in Formal Methods panel at the W3C tech plenary day. I expect it will be useful to tell that story to make the point that formalizing knowledge allows us to delegate tasks to the machine.

tags pending: web history? quality?

On Google, Jabber, and Jingle and good and evil in IM and IP networks

Submitted by connolly on Tue, 2006-01-03 16:32.

The 14 December Jingle announcement gives a hint of Google's approach to adding voice to their Google Talk offering. Actually, it gives quite a bit more than a hint: it comes with a Jingle spec and an open source library implementation.

Google Talk has had pretty good "do no evil" karma since it started. The dominant commercial IM services (AOL/Yahoo/Microsoft) are each a world unto themselves. Your AIM screen name is just jim47 or whatever, not jim47@aol.com like an email address, and while clients like trillian and gaim can connect to them all, that's not something the big three encourage. Google Talk uses gmail addresses and the Jabber/XMPP protocol, which has the same network topology as email. While Google isn't opening their service to actual server-to-server federation until they get a better handle on some operational issues (think: spam), they are using open protocols and they actively support gaim development.
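The "same network topology as email" point is visible in the address format itself. A Jabber ID has the form `[node@]domain[/resource]` (XMPP Core, RFC 3920), and, as with email, the domain part tells a server where to route the message. Below is a minimal Python sketch of splitting a JID. It skips the stringprep normalization that a real implementation must do, and `parse_jid` and the sample address are hypothetical.

```python
from typing import NamedTuple, Optional

class JID(NamedTuple):
    node: Optional[str]      # the "user" part, like the local part of an email address
    domain: str              # the server that routing targets, as in email
    resource: Optional[str]  # a particular client session, e.g. "Home"

def parse_jid(jid: str) -> JID:
    """Split a Jabber ID of the form [node@]domain[/resource] (no normalization)."""
    # The first "/" separates the resource; the resource itself may contain "/" or "@".
    bare, sep, resource = jid.partition("/")
    if "@" in bare:
        node, domain = bare.split("@", 1)
    else:
        node, domain = None, bare
    return JID(node, domain, resource if sep else None)
```

For example, `parse_jid("jim47@gmail.com/Home")` yields node `jim47`, domain `gmail.com`, and resource `Home`. That is unlike an AIM screen name, which carries no domain at all.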

Apple's iChat uses Jabber at some level too, though I haven't worked out the interoperability issues in practice. I think the last time I tried was before the Tiger release of OS X, when the Jabber support was much more under-the-covers.

The popularity of multi-protocol clients like gaim and trillian surprises me: after all, you can't have one chat room with AIM and MSN messenger users connected. Evidently this is just not a big deal. "IRC and instant messaging are very different paradigms," says the Adium X: IRC Howto. I guess I'm just too old school to get it; in the internet relay chat usage that I'm used to, channels (aka chat rooms) are the norm and private channels are the exception. I gather IM is the other way around. I have played with Jabber's support for bridging to other networks, but I have yet to find a reliable combination of:

  • a jabber client with bridging support that I can figure out how to use
  • and either
    • server software with bridging support that I can figure out how to use, or
    • an existing service with bridging support that I can use and trust (since my credentials pass thru their service)

The Jabber protocol has lots of pieces and extensions and such. There's a whole JEP process, in addition to the XMPP process where Jabber technology feeds into the IETF. I don't quite have my head around the whole thing. I discovered that there are older and newer protocols for doing chat rooms in Jabber that don't mix well. I wonder which of them, if either, the IETF has endorsed. An XMPP summary shows JEP-0045 for Multi-User Chat but no RFC. And I don't see XMPP among the IETF Working Groups any more. I wonder what's up. The xmppwg mailing list archives show pretty recent activity.
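On the wire, the chat-room difference can be as small as one child element. As I understand JEP-0045, a client joins a room by sending presence to `room@service/nick` with an `<x xmlns='http://jabber.org/protocol/muc'/>` child, whereas the older groupchat convention sends the same presence without it. Below is a minimal Python sketch that builds both stanzas with ElementTree. `join_presence` and the room, service, and nickname values are hypothetical.

```python
import xml.etree.ElementTree as ET

MUC_NS = "http://jabber.org/protocol/muc"

def join_presence(room: str, service: str, nick: str, muc: bool = True) -> str:
    """Build a presence stanza asking to enter room@service under `nick`.

    With muc=True it carries the JEP-0045 <x/> marker; without it, it is the
    bare presence used by the older groupchat convention.
    """
    presence = ET.Element("presence", {"to": f"{room}@{service}/{nick}"})
    if muc:
        # Written as a literal xmlns attribute so the output has no ns0: prefix.
        ET.SubElement(presence, "x", {"xmlns": MUC_NS})
    return ET.tostring(presence, encoding="unicode")
```

A server that only knows the older convention can still treat the MUC-style join as plain presence. Whether the two conventions interoperate for everything beyond joining is another matter.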

eBay's $2.6Bn acquisition of Skype shows the value of networks of IM and voice users. Skype has a novel topology, designed by the same p2p developers who built Kazaa. As I understand it, they mostly use the p2p network for firewall traversal, which is the biggest problem, in practice, with deploying consumer voice chat. They keep the protocol details to themselves, though, and as a consequence they have the only implementation. They have a centralized user directory too.

During my visit to the 62nd IETF in Minneapolis, MN, I learned what a sore spot firewall traversal is in Internet standardization. "Just use IPv6 and don't waste your time with those kludges," goes one side; "but NAT works today," goes the other. Ugh. And since W3C started working more actively with developing countries, I hear more about the political aspects of IPv6. In the first world, we can dismiss claims that IPv4 addresses are running out as technically overblown, since we can afford to pay the management fees and buy the NAT boxes. But the scarcity is a real economic issue in the developing world; plus, it concentrates power in a way that engenders distrust.

Back to network topologies... the fact that Jabber has the same topology as conventional (SMTP) email means that it's subject to the same sorts of spam issues. I wonder if anybody has considered the IM2000 approach of redesigning the mail system as a pull delivery rather than as a push delivery system, so that recipients no longer bear the costs of receiving and storing unwanted messages. In an IM2000 world, senders have to hold still long enough to deliver a message, which makes it much easier to hold them accountable for nastiness.
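The shift of costs in a pull design can be shown with a toy model. In the sketch below, the sender's side stores the message body, the recipient stores only a short notice, and the recipient decides which bodies to fetch. If the sender has vanished, nothing is delivered. This is a hypothetical Python illustration of the pull idea, not IM2000's actual protocol, and every class and function name in it is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Sender:
    """A sender's server: it, not the recipient, stores the message body."""
    name: str
    online: bool = True
    outbox: Dict[str, str] = field(default_factory=dict)

    def post(self, msg_id: str, body: str) -> None:
        self.outbox[msg_id] = body

    def fetch(self, msg_id: str) -> Optional[str]:
        # A sender that has gone away can't deliver anything.
        return self.outbox.get(msg_id) if self.online else None

@dataclass
class Recipient:
    """Keeps only short (sender, id) notices; bodies are pulled on demand."""
    notices: List[Tuple[Sender, str]] = field(default_factory=list)

    def notify(self, sender: Sender, msg_id: str) -> None:
        self.notices.append((sender, msg_id))

    def pull(self, wanted: Set[str]) -> List[str]:
        """Fetch bodies only from senders named in `wanted`; skip everything else."""
        bodies = []
        for sender, msg_id in self.notices:
            if sender.name in wanted:
                body = sender.fetch(msg_id)
                if body is not None:
                    bodies.append(body)
        return bodies

def send(sender: Sender, recipient: Recipient, msg_id: str, body: str) -> None:
    sender.post(msg_id, body)         # the body stays on the sender's side
    recipient.notify(sender, msg_id)  # only a small notice crosses the network
```

A spammer who disconnects after sending leaves behind notices that lead nowhere, which is the "hold still long enough to deliver" property in miniature.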

Drupal, OpenID, and the Mac OS X Keychain

Submitted by connolly on Mon, 2005-12-19 16:12.

Managing passwords via email callback is hampered by anti-spam mechanisms. I just helped a breadcrumbs user whose password message from drupal was classified as Junk by Mac OS X Mail.

Meanwhile, I did enough research on the Mac OS X keychain to trust it. Support for OpenID in Drupal is already on the OpenID wish list, and I've seen some progress.

It's not obvious to me how to connect the keychain to OpenID, but I'm sure there's a way. Any suggestions?
