Talking with U.T. Austin students about Microformats, Drug Discovery, the Tabulator, and the Semantic Web

Submitted by connolly on Sat, 2006-09-16 21:36. ::

Working with the MIT tabulator students has been such a blast that while I was at U.T. Austin for the research library symposium, I thought I would try to recruit some undergrads there to get into it. Bob Boyer invited me to speak to his PHL313K class on why the heck they should learn logic, and Alan Cline invited me to the Dean's Scholars lunch, which I used to attend when I was at U.T.

To motivate logic in the PHL313K class, I started with their experience with HTML and blogging and explained how the Semantic Web extends the web by treating links as logical propositions. I used my XML 2005 slides to talk a little bit about web history and web architecture, and then I moved on to using hCalendar (and GRDDL, though I left that largely implicit) to address the personal information disaster. It was the first week or so of class; they had just started learning propositional logic and hadn't yet gotten as far as the predicate calculus, where atomic formulas like those in RDF show up. And none of them had heard of microformats. I promised not to talk for the full hour, but then I lost track of time and didn't get to the punch line, "so the computer tells you that no, you can't go to both the conference and Mom's birthday party, because you can't be in two places at once," until it was time for them to head off to their next class.
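For anyone who hasn't seen hCalendar, the idea is just ordinary HTML annotated with a few agreed-upon class names, which a GRDDL transformation can then turn into machine-readable calendar data. A minimal sketch (the event details here are made up for illustration):

```html
<div class="vevent">
  <span class="summary">Mom's birthday party</span>:
  <abbr class="dtstart" title="2006-09-20">Sep 20</abbr>, at
  <span class="location">home</span>
</div>
```

The human-readable page and the machine-readable data are the same document; that's the hook into treating web content as logical propositions.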

One student did stay after to pose a question that is very interesting and important, if only tangentially related to the Semantic Web: with technology advancing so fast, how do you maintain balance in life?

While Boyer said that talk went well, I think I didn't do a very good job of connecting with them; or maybe they just weren't really awake; it was an 8am class after all. At the Dean's Scholars lunch, on the other hand, the students were talking to each other so loudly as they grabbed their sandwiches that Cline had to really work to get the floor to introduce me as a "local boy done good." They responded with a rousing ovation.

Elaine Rich had provided the vital clue for connecting with this audience earlier in the week. She does AI research and had seen TimBL's AAAI talk. While she didn't exactly give the talk four stars overall, she did get enough out of it to realize it would make an interesting application to add to a book that she's writing, where she's trying to give practical examples that motivate automata theory. So after I took a look at what she had written about URIs and RDF and OWL and such, she reminded me that not all the Dean's Scholars are studying computer science; but many of them do biology, and I might do well to present the Semantic Web more from the perspective of that user community.

So I used TimBL's Bio-IT slides. They weren't shy when I went too fast with terms like hypertext, and there were a lot of furrowed brows for a while. But when I got to the drug discovery diagram, where FOAF, OMIM, UMLS, SNP, UniProt, BioPAX, and patent data all have some overlap with a drug target ontology, I said I didn't even know some of these words and asked them which ones they knew. After a chuckle about "drug", one of them explained SNP, i.e. single nucleotide polymorphism, another told me about OMIM, and the discussion really got going. I didn't make much more use of Tim's slides. One great question about integrating data about one place from lots of sources prompted me to tempt the demo gods and try the tabulator. The demo gods were not entirely kind; perhaps I should have used the released version rather than the development version. But I think I did give them a feel for it. In answer to "so what is it you're trying to do, exactly?" I gave a two-part answer:

  1. Recruit some of them to work on the tabulator so that their name might be on the next paper like the SWUI06 paper, Tabulator: Exploring and Analyzing linked data on the Semantic Web.
  2. Integrate data across applications and across administrative boundaries all over the world, like the Web has done for documents.

We touched on the question of local and global consistency, and someone asked if you can reason about disagreement. I said that yes, I had presented a paper in Edinburgh just this May that demonstrated formally a disagreement between several parties.

One of the last questions was "So what is computer science research anyway?" which I answered by appeal to the DIG mission statement:

The Decentralized Information Group explores technical, institutional and public policy questions necessary to advance the development of global, decentralized information environments.

And I said how cool it is to have somebody in the TAMI project with real-world experience with the Privacy Act. One student followed up and asked if we have anybody with a real legal background in the group, and I pointed him to Danny. He asked me afterward how to get involved, and it turned out that he already knew IRC and freenode, so the #swig channel was in our common neighborhood in cyberspace, even as geography separated us when I headed to the airport to fly home.


Blogged with Flock

ACL 2 seminar at U.T. Austin: Toward proof exchange in the Semantic Web

Submitted by connolly on Sat, 2006-09-16 21:15. ::


In our PAW and TAMI projects, we're making a lot of progress on the practical aspects of proof exchange: in PAW we're working out the nitty gritty details of making an HTTP client (proxy) and server that exchange proofs, and in TAMI, we're working on user interfaces for audit trails and justifications and on integration with a truth maintenance system.

It doesn't concern me too much that cwm does some crazy stuff when finding proofs; it's the proof checker that I expect to deploy as part of trusted computing bases and the proof language specification that I hope will complete the Semantic Web standards stack.

But N3 proof exchange is no longer a completely hypothetical problem; the first examples of interoperating with InferenceWeb (via a mapping to PML) and with Euler are working. So it's time to take a close look at the proof representation and the proof theory in more detail.

My trip to Austin for a research library symposium at the University of Texas gave me a chance to re-connect with Bob Boyer. A while back, I told him about RDF and asked him about Semantic Web logic issues and he showed me the proof checking part of McCune's Robbins Algebras Are Boolean result:

Proofs found by programs are always questionable. Our approach to this problem is to have the theorem prover construct a detailed proof object and have a very simple program (written in a high-level language) check that the proof object is correct. The proof checking program is simple enough that it can be scrutinized by humans, and formal verification is probably feasible.

In my Jan 2000 notes, that excerpt is followed by...

I offer a 500 brownie-point bounty to anybody who converts it to Java and converts the ()'s in the input format to <>'s.

5 points for perl. ;-)

Bob got me invited to the ACL2 seminar this week; in my presentation, Toward proof exchange in the Semantic Web, I reviewed a bit of Web Architecture and the standardization status of RDF, RDFS, OWL, and SPARQL as background to demonstrating that we're close to collecting that bounty. (Little did I know in 2000 that TimBL would pick up python so that I could avoid Java as well as perl ;-)

Matt Kauffman and company gave all sorts of great feedback on my presentation. I had to go back to the Semantic Web Wave diagram a few times to clarify the boundary between research and standardization:

  • RDF is fully standardized/ratified
  • turtle has the same expressive capability as RDF's XML syntax, but isn't fully ratified, and
  • N3 goes beyond the standards in both syntax and expressiveness
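To make the syntax point concrete (my own example triple, not one from the talk), here is the same single statement in the standard RDF/XML syntax:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://example.org/#pat">
    <foaf:name>Pat</foaf:name>
  </rdf:Description>
</rdf:RDF>
```

and in turtle:

```
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://example.org/#pat> foaf:name "Pat" .
```

Same expressive capability, but you can see why the XML form is a harder sell for advocacy.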

One of the people there who knew about RDF and OWL and such really encouraged me to get N3/turtle done, since every time he does any Semantic Web advocacy, the RDF/XML syntax is a deal-killer. I tried to show them my work on a turtle bnf, but what I was looking for was in June mailing list discussion, not in my February bnf2turtle breadcrumbs item.

They asked what happens if an identifier is used before it appears in an @forAll directive and I had to admit that I could test what the software does if they wanted to, but I couldn't be sure whether that was by design or not; exactly how quantification and {}s interact in N3 is sort of an open issue, or at least something I'm not quite sure about.
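For the curious, the question concerned declarations like this (a made-up example of mine, not one from the seminar): in N3 one can write

```
@prefix : <http://example.org/ns#> .
@forAll :x .
{ :x a :Person } => { :x :status "mortal" } .
```

where @forAll declares :x as a universally quantified variable over the rule that follows. The open question is what it should mean if :x were used in a formula before the @forAll directive appears, or how the quantifier scopes across nested {} formulas.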

Moore noticed that our conjunction introduction (CI) step doesn't result in a formula whose main connective is conjunction; the conjunction gets pushed inside the quantifiers. It's not wrong, but it's not traditional CI either.

I asked about ACL2's proof format, and they said what goes in an ACL2 "book" is not so much a proof as a sequence of lemmas and such, but Jared was working on Milawa, a simple proof checker that can be extended with new proof techniques.

I started talking a little after 4pm; different people left at different times, but it wasn't until about 8 that Matt realized he was late for a squash game and headed out.

I went back to visit them in the U.T. tower the next day to follow up on ACL2/N3 connections and Milawa. Matt suggested a translation of N3 quantifiers and {}s into ACL2 that doesn't involve quotation. He offered to guide me as I fleshed it out, but I only got as far as installing lisp and ACL2; I was too tired to get into a coding fugue.

Jared not only gave me some essential installation clues, but for every technical topic I brought up, he printed out two papers showing different approaches. I sure hope I can find time to follow up on at least some of this stuff.


On the Future of Research Libraries at U.T. Austin

Submitted by connolly on Sat, 2006-09-16 17:14. ::

Wow. What a week!

I'm always on the lookout for opportunities to get back to Austin, so I was happy to accept an invitation to this 11 - 12 September symposium, The Research Library in the 21st Century, run by University of Texas Libraries:

In today's rapidly changing digital landscape, we are giving serious thought to shaping a strategy for the future of our libraries. Consequently, we are inviting the best minds in the field and representatives from leading institutions to explore the future of the research library and new developments in scholarly communication. While our primary purpose is to inform a strategy for our libraries and collections, we feel that all participants and their institutions will benefit.

I spent the first day getting a feel for this community, where evidently a talk by Clifford Lynch of CNI is a staple. "There is no scholarship without scholarly communication," he said, quoting Courant. He noted that traditionally, publishers disseminate and libraries preserve, but we're shifting to a world where the library helps disseminate and makes decisions on behalf of the whole world about which works to preserve. He said there's a company (I wish I had made a note of the name) that has worked out the price of an endowed web site; at 4% annual return, they figure it at $2500/gigabyte.

James Duderstadt from the University of Michigan told us that the day when the entire contents of the library fits on an iPod (or "a device the size of a football" for other audiences that didn't know about iPods ;-) is not so far off. He said that the University of Michigan started digitizing their 7.8 million volumes even before becoming a Google Book Search library partner. They initially estimated it would take 10 years, but the current estimate is 6 years and falling. He said that yes, there are copyright issues and other legal challenges, and he wouldn't be surprised to end up in court over it; he had done that before. Even the sakai project might face litigation. What got the most attention, I think, was when he relayed first-hand experience from the Spellings Commission on the Future of Higher Education; their report is available to those that know where to look, though it is not due for official release until September 26.

He also talked about virtual organizations, i.e. groups of researchers from universities all over, and even the "meta university," with no geographical boundaries at all. That sort of thing fueled my remarks for the Challenges of Access and Preservation panel on the second day. I noted that my job is all about virtual organizations, and if the value of research libraries is connected to recruiting good people, you should keep in mind the fact that "get together and go crazy" events like football games are a big part of building trust and loyalty.

Kevin Guthrie, President of ITHAKA, made a good point that starting new things is usually easier than changing old things, which was exactly what I was thinking when President Powers spoke of "preserving our investment" in libraries in his opening address. U.T. invested $650M in libraries since 1963. That's not counting bricks and mortar; that's special collections, journal subscriptions, etc.

My point that following links is 96% reliable sparked an interesting conversation; it was misunderstood as "96% of web sites are persistent" and then "96% of links persist"; when I clarified that it's 96% of attempts to follow links that succeed, and this is because most attempts to follow links are from one popular resource to another, we had an interesting discussion of ephemera vs. the scholarly record and which parts need what sort of attention and what sort of policies. The main example was that 99% of political websites about the California run-off election went offline right after the election. My main point was: for the scholarly record, HTTP/DNS is as good as it gets for the foreseeable future; don't throw up your hands at the 4% and wait for some new technology; apply your expertise of curation and organizational change to the existing technologies.

In fact, I didn't really get beyond URIs and basic web architecture in my remarks. I had prepared some points about the Semantic Web, but I didn't have time for them in my opening statement and they didn't come up much later in the conversation, except when Ann Wolpert, Director of Libraries at MIT, brought up DSpace a bit.

Betsy Wilson of the University of Washington suggested that collaboration would be the hallmark of the library of the future. I echoed that back in the wrap-up session referring to library science as the "interdisciplinary discipline"; I didn't think I was making that up (and a google search confirms I did not), but it seemed to be new to this audience.

By the end of the event I was pretty much up to speed on the conversation; but on the first day, I felt a little out of place and when I saw the sound engineer getting things ready, I mentioned to him that I had a little experience using and selling that sort of equipment. It turned out that he's George Geranios, sound man for bands like Blue Oyster Cult for about 30 years. We had a great conversation on digital media standards and record companies. I'm glad I sat next to David Seaman of the DLF at lunch; we had a mutual colleague in Michael Sperberg-McQueen. I asked him about IFLA, one of the few acronyms from the conversation that I recognized; he helped me understand that IFLA conferences are relevant, but they're about libraries in general, and the research library community is not the same. And Andrew Dillon got me up to speed on all sorts of things and made the panel I was on fun and pretty relaxed.

Fred Heath made an oblique reference to a New York Times article about moving most of the books out of the U.T. undergraduate library as if everyone knew, but it was news to me. Later in the week I caught up with Ben Kuipers; we didn't have time for my technical agenda of linked data and access limited logic, but we did discover that both of us were a bit concerned with the fragility of civilization as we know it and the value of books over DVDs if there's no reliable electricity.

The speakers' comments at the symposium were recorded; there's some chance that edited transcripts will appear in a special issue of a journal. Stay tuned for that. And stay tuned for more breadcrumbs items on talks I gave later in the week where I did get beyond the basic http/DNS/URI layer of Semantic Web Architecture.


Stitching the Semantic Web together with OWL at AAAI-06

Submitted by connolly on Fri, 2006-08-11 16:06. ::

I was pleased to find that AAAI '06 in Boston a couple weeks ago had a spectrum of people I know and don't know and work that's near and far from my own. The talk about the DARPA grand challenge was inspiring.

But closer to my work, I ran into Jeff Heflin, who I worked with on DAML and especially the OWL requirements document. Amid too many papers about ontologies for the sake of ontologies and threads like Is there real world RDF-S/OWL instance data?, his Investigation into the Feasibility of the Semantic Web is a breath of fresh air. The introduction sets out their approach this way:

Our approach is to use axioms of OWL, the de facto Semantic Web language, to describe a map for a set of ontologies. The axioms will relate concepts from one ontology to the other. ... There is a well-established body of research in the area of automated ontology alignment. This is not our focus. Instead we investigate the application of these alignments to provide an integrated view of the Semantic Web data.

(emphasis mine). The rest of the paper justifies this approach, leading up to:

We first query the knowledge base from the perspective of each of the 10 ontologies that define the concept Person. We now ask for all the instances of the concept Person. The results vary from 4 to 1,163,628. We then map the Person concept from all the ontologies to the Person concept defined in the FOAF ontology. We now issue the same query from the perspective of this map and we get 1,213,246 results. The results now encompass all the data sources that commit to these 10 ontologies. Note: a pair wise mapping would have taken 45 mapping axioms to establish this alignment instead of the 9 mapping axioms that we used. More importantly due to this network effect of the maps, by contributing just a single map, one will automatically get the benefit of all the data that is available in the network.

That's fantastic stuff.
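The arithmetic behind that network effect is worth spelling out: aligning all 10 Person concepts pairwise needs a mapping for every pair of ontologies, while mapping everything through one hub (FOAF, in their experiment) needs only one mapping per non-hub ontology. A quick sketch:

```python
from math import comb

n = 10  # ontologies defining a Person concept, per the paper

pairwise_mappings = comb(n, 2)  # every pair gets its own alignment: C(10, 2)
hub_mappings = n - 1            # map the other 9 to FOAF's Person

print(pairwise_mappings, hub_mappings)  # → 45 9
```

And the gap widens quadratically: each new ontology costs one mapping under the hub scheme, versus n more under the pairwise scheme.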

We now pause for a word from Steve Lawrence, NEC Research Institute, to lament the lack of free online proceedings for AAAI: Articles freely available online are more highly cited. For greater impact and faster scientific progress, authors and publishers should aim to make research easy to access. OK, now back to the great paper...

Along the way, they give a definition of a knowledge function, K, that is remarkably similar to log:semantics from N3. They also define a commitment function that is basically the ontological closure pattern.

The approach to querying all this data is something they call DLDB, which comes from a paper they submitted to the ISWC Practical and Scalable Semantic Systems workshop. Darn! no full text proceedings online again. Ah... Jeff's pubs include a tech report version. To paraphrase: there's a table for each class and a table for each property that relates rows from the class tables. They use a DL reasoner to find subclass relationships, and they make views out of them. I have never seen this approach before; it sure looks promising. I wonder if we can integrate it into our dbview work somehow and perhaps into our truth-maintenance system in the TAMI project.
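As I understand the scheme (this is my own sketch from the tech report's description, with made-up class names, not their actual schema): each class gets a table, and the subclass relationships the DL reasoner discovers become UNION views, so a query against a class transparently sees instances of its subclasses:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# one table per class (hypothetical classes for illustration)
cur.execute("CREATE TABLE Person_direct (uri TEXT)")
cur.execute("CREATE TABLE Student (uri TEXT)")

# the reasoner says Student is a subclass of Person, so the
# Person view unions the subclass table in
cur.execute("""CREATE VIEW Person AS
               SELECT uri FROM Person_direct
               UNION SELECT uri FROM Student""")

cur.execute("INSERT INTO Person_direct VALUES ('http://example.org/#alice')")
cur.execute("INSERT INTO Student VALUES ('http://example.org/#bob')")

# querying Person finds bob too, via the subclass view
rows = cur.execute("SELECT uri FROM Person ORDER BY uri").fetchall()
print(rows)
```

The nice property is that the subsumption reasoning is paid for once, at view-definition time, and ordinary SQL does the rest.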

This wasn't the only work at AAAI on scalable, practical knowledge representation. I caught just a glance at some other papers at the conference that exploit wikipedia as a dataset in various algorithms. I hope to study those more.

I also ran into Ben Kuipers, whose Algernon and Access-Limited Logic has long appealed to me as an approach to reasoning that might work well when scaled up to Semantic Web data sets. That work is mostly on hold; we started talking about getting it going again, but didn't get very far into the conversation. I hope to pick that up again soon.

I gather the 1.0 release of OpenCyc happened at the conference; there's a lot of great stuff in cyc, but only time will tell how well it will integrate with other Semantic Web stuff.

Meanwhile, a handy citation for Heflin's paper...

That's marked up using an XHTML/LaTeX/BibTeX idiom that I'm working on so that we get BibTeX for free:

@inproceedings{PanQasemHeflin2006,
    title = "{An Investigation into the Feasibility of the Semantic Web}",
    author = {Z. Pan and A. Qasem and J. Heflin},
    booktitle = {Proc. of the Twenty First National Conference on Artificial Intelligence (AAAI 2006)},
    year = {2006},
    address = {Boston, USA},
}

on Wikimania 2006, from a few hundred miles away

Submitted by connolly on Thu, 2006-08-10 16:26. ::

Wikimania 2006 was last week in Boston; I had it on my travel schedule, tentatively, months in advance, but I didn't really come up with a solid justification, and there were conflicts, so I ended up not going.

I was very interested to see the online participation options, but I didn't get my hopes up too high, because I know that ConnectingAudiences is challenging.

I tried to participate in the transcription stuff real-time; installation of the goby collaborative editor went smoothly enough (it looks like an interesting alternative to SubEthaEdit, though it's client/server, not peer-to-peer; they're talking about switching to the jabber protocol...) but I couldn't seem to connect to any sessions while people were active in them.

The real-time video feed of mako on a definition of Freedom was surprisingly good, though I couldn't give it my full attention during the work day. I didn't understand the problem he was speaking to (isn't GFDL good enough?) until I listened to Lessig on Free Culture and realized that CC share-alike and GFDL don't interoperate. (Yet another reason to keep the test of independent invention in mind at all times.)

Lessig read this quote, but only referred to the author using a photo that I couldn't see via the audio feed; when I looked it up, I realized there was a gap in this student's free culture education:

If we don't want to live in a jungle, we must change our attitudes. We must start sending the message that a good citizen is one who cooperates when appropriate, not one who is successful at taking from others.

RMS, 1992

These sessions on the wikipedia process look particularly interesting; I hope to find time to see or listen to a recording:

I bumped into TimBL online and reminded him about the Wikipedia and the Semantic Web panel; he had turned it down because of other travel obligations, but he just managed to stop by after all. I hope it went all right; he was pretty jet-lagged.

I see WikiSym 2006 coming up August 21-23, 2006 in Odense, Denmark. I'm not sure I can find justification to make travel plans on just a few weeks' notice. But Denny's hottest conference ever item burns like salt in an open wound and motivates me to give it a try. It looks like the SweetWiki folks, who participate in the GRDDL WG, will be there; that's the start of a justification...

how much do I want to know about drupal?

Submitted by connolly on Wed, 2006-08-09 09:59. ::

breadcrumbs fell over, again, today. Disk full. Probably the spam database filled up with drek. Again. While googling for reports of similar problems, I discovered drupal 4.7 is out since May 1. They tout TimBL's blog in their release announcement. I wonder if they'd help us upgrade. Well, they do provide a video about upgrading. Maybe I'll find time to watch it.

Meanwhile, I discovered a couple interesting articles on the design/architecture of drupal and how PHP is used: Drupal Programming from an Object-Oriented Perspective and the tongue-in-cheek The Road to Drupal Hell.

I'm not sure how much of this I really want to know. As I said back in my october item on PHP angst, I'm mostly playing simple customer when it comes to drupal. But I'm having a hard time investing in technology that I don't know inside and out.

In a #swig discussion where I was considering Zope alternatives (the one-big-file design has lost its charm), it occurred to me that I have read (parts of) the source to most everything that currently backs my personal web site: Zope, the python interpreter, libc, various bits of debian infrastructure, and the linux kernel. I wonder when that will become totally impractical, and I'll understand my web site no more than I understand my car.

tabulator maps in Argentina

Submitted by connolly on Mon, 2006-08-07 11:39. ::

My Spanish is a little rusty, but it looks like inktel is having fun with the tabulator's map support too.

tags pending: geo, tabulator

OpenID, verisign, and my life: mediawiki, bugzilla, mailman, roundup, ...

Submitted by connolly on Mon, 2006-07-31 15:45. ::

Please, don't ask me to manage another password! In fact, how about getting rid of most of the ones I already manage?

I have sent support requests for some of these; the response was understandable, if disappointing: when debian/ubuntu supports it, or at least when the core Mailman/mediawiki guys support it, we'll give it a try. I opened Issue 18: OpenID support in roundup too; there are good OpenID libraries in python, after all.

A nice thing about OpenID is that the service provider doesn't have to manage passwords either. I was thinking about where my OpenID password(s) should live, and I realized the answer is: nowhere. If we put the key fingerprint in the OpenID persona URL, I can build an OpenID server that does public-key challenge-response authentication and doesn't store any passwords at all.
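A minimal sketch of the idea (my own toy, not a real OpenID implementation; the fingerprint scheme and the stand-in signature check are both assumptions for illustration): the server keeps no credential database at all, because the persona URL itself carries a hash of the public key, and the client proves possession of that key by signing a fresh nonce.

```python
import hashlib, secrets

def fingerprint(pubkey_bytes):
    # short hash of the public key; this is what goes in the persona URL
    return hashlib.sha256(pubkey_bytes).hexdigest()[:16]

def make_challenge():
    return secrets.token_hex(16)

def verify(url_fingerprint, presented_pubkey, nonce, signature, check_sig):
    # 1. the presented key must hash to the fingerprint embedded in the
    #    URL, so the server stores nothing per-user
    if fingerprint(presented_pubkey) != url_fingerprint:
        return False
    # 2. the signature over the nonce must verify against the key;
    #    check_sig would wrap a real public-key library in practice
    return check_sig(presented_pubkey, nonce, signature)

# toy walk-through with a stand-in "signature" (hash-based, NOT real crypto)
pubkey = b"alice's public key bytes"
url_fp = fingerprint(pubkey)  # embedded in the OpenID persona URL
nonce = make_challenge()
sig = hashlib.sha256(pubkey + nonce.encode()).hexdigest()
check = lambda k, n, s: s == hashlib.sha256(k + n.encode()).hexdigest()
print(verify(url_fp, pubkey, nonce, sig, check))  # True
```

The point of the design is in step 1: losing the server's disk loses no secrets, because there were none to store.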

As I sat down to tinker with that idea, I remembered the verisign labs openid service and gave it a try. Boy, it's nice! They use the user-chosen photo anti-phishing trick and provide nice audit trails. So it will probably be quite a while before I feel the need to code my own OpenID server.

I'm still hoping for mac keychain support for OpenID. Meanwhile, has anybody seen a nice gnome applet for keeping the state of my ssh-agent credentials and my CSAIL kerberos credentials visible?

Slicing and dicing web data with Tabulator

Submitted by timbl on Wed, 2006-07-26 10:41. ::

There is a new version 07 of the Tabulator out. This is the generic data browser which lets you do useful things with your RDF data the moment it's on the web.

It works by exploring the web of relationships between things, loading more data from the web as you go. Then, if you find a pattern of information you are interested in, it will search for all occurrences of that pattern and display them in tables, maps, calendars, and so on.

In the same session, you can explore, say, some geocoded photos taken on a trip with a GPS, and then separately explore where in the world the tabulator developers are based. Then, you can project both datasets onto the same map, or onto the same calendar, for data with a time component. This shows the cross-domain power of the semantic web.

This means you can correlate data from completely different domains. Think of all the different mash-ups people have made for putting things like friends' houses, photos, or coffee shops on the web. Each is a different mash-up for a different data source.

For data in RDF (or any XML with a GRDDL profile), though, you don't have to program anything. You can just explore it and map it. And you can map many different data sources at the same time.

Oh, and for developers, the core of the tabulator is an open-source RDF library with a complete, tested RDF/XML parser, a store which smushes on owl:sameAs and owl:[Inverse]FunctionalProperty, and a web-crawling query engine supporting basic SPARQL. Enjoy.
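The owl:sameAs "smushing" can be pictured as a union-find over identifiers (a toy sketch of the technique, not the tabulator's actual code): once two URIs are declared the same, lookups through either one reach a single canonical node.

```python
# toy union-find "smusher" for owl:sameAs (illustrative only)
parent = {}

def find(uri):
    # walk to the canonical representative, compressing the path as we go
    parent.setdefault(uri, uri)
    while parent[uri] != uri:
        parent[uri] = parent[parent[uri]]
        uri = parent[uri]
    return uri

def same_as(a, b):
    # record an owl:sameAs assertion by merging the two equivalence classes
    parent[find(a)] = find(b)

same_as("http://example.org/#dan", "http://www.w3.org/People/Connolly/#me")

# both names now resolve to the same canonical node
print(find("http://example.org/#dan")
      == find("http://www.w3.org/People/Connolly/#me"))  # True
```

A real store would additionally merge the triples indexed under each identifier, and owl:InverseFunctionalProperty values (like foaf:mbox) would feed new same_as calls.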

Comments Disabled

Submitted by ryanlee on Tue, 2006-07-25 14:51. ::

Due to an overwhelming signal-noise ratio in the wrong direction, I've disabled all anonymous commenting. We've tried to use spam auto-classification, but the volume is so large and diverse that eventually everything looks like it might be spam, and it's back to square one.

Thanks for your direct participation and input; from now on, we'll be looking for alternatives to continuing these conversations across the web.
