XML

An Introduction and a JavaScript RDF/XML Parser

Submitted by dsheets on Mon, 2006-07-17 15:02.

My name is David Sheets. I will be a sophomore at MIT this fall. I like to be at the intersection of theory and practice.

This summer, I am working as a student developer on the Tabulator Project in the Decentralized Information Group at MIT's CSAIL. My charge has been to develop a new RDF/XML parser in JavaScript with a view to a JavaScript RDF library. I am pleased to report that I have finished the first version of the new RDF/XML parser.

Before this release, the only available RDF/XML parser in JavaScript was Jim Ley's parser.js. This parser served the community well for quite a while but fell short of the needs of the Tabulator Project. Most notably, it didn't parse all valid RDF/XML resources.

To rectify this, I began work on a new parser. The result being released today is a JavaScript class that weighs in at under 400 source lines of code and 2.8K gzip-compressed (12K uncompressed). For maximum utility, a parser should be small, standards-compliant, widely portable, and fast.

To the best of my knowledge, RDFParser is fully compliant with the RDF/XML specification. The parser passes all of the positive parser test cases from the W3C. This was tested using jsUnit, a unit testing framework similar to jUnit but for JavaScript. To run the automated tests against RDFParser, you can follow the steps here. Passing these tests means the parser supports features such as xml:base, xml:lang, RDF Collections, XML literals, and so forth. If it's in the specification, it should be supported. An important point to note is that this parser, due to speed concerns, is non-validating. Additionally, RDFParser has been optimized for speed, resulting in code that is slightly less readable.

The new parser is not as portable as the old parser at this time. It has only been tested in Firefox 1.5 but should work in any browser that supports the DOM Level 2 specification.

RDFParser runs at a speed similar to Jim Ley's parser. One can easily construct example RDF/XML files that run faster on one parser or another. I took five files that the tabulator might come across in day-to-day use and I ran head-to-head benchmarks between the two parsers.

Parse time is highly influenced by compact serialization. The more nested the RDF/XML serialization, the more scope frames must be created to track features from the specification. The less nested, the fewer steps to traverse the DOM, the more triples per DOM element.
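To make the nesting point concrete, here is a rough sketch in Python (not RDFParser's own code, which is JavaScript). It takes two hand-written serializations of the same two triples, one using property elements and one using property attributes, and counts the DOM elements and the deepest nesting level in each. The `ex:` namespace, the sample data, and the `shape` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same two triples, written with property elements (more nesting).
NESTED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/a">
    <ex:name>Alice</ex:name>
    <ex:city>Boston</ex:city>
  </rdf:Description>
</rdf:RDF>
"""

# Same two triples, written with property attributes (less nesting).
FLAT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/a"
                   ex:name="Alice" ex:city="Boston"/>
</rdf:RDF>
"""


def shape(xml_text):
    """Return (element_count, max_depth) for an XML document (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())

    def walk(element, depth):
        count, deepest = 1, depth
        for child in element:
            child_count, child_depth = walk(child, depth + 1)
            count += child_count
            deepest = max(deepest, child_depth)
        return count, deepest

    return walk(root, 1)


nested_shape = shape(NESTED)
flat_shape = shape(FLAT)
print("nested:", nested_shape)
print("flat:  ", flat_shape)
```

Both documents describe the same two triples, but the flat one gives a parser fewer elements to visit and fewer levels to track, so it carries more triples per DOM element.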

Planned in the next release of RDFParser is a callback/continuation system so that the parser can yield in the middle of a parse run and allow other important page features to run.
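The general idea of yielding mid-parse can be sketched as follows. This is only an illustration in Python using a generator, not the planned RDFParser API; `parse_in_slices`, `slice_size`, and `on_element` are names invented for this example. The worker visits a fixed number of elements, then hands control back to the caller, which can do other work before resuming.

```python
import xml.etree.ElementTree as ET


def parse_in_slices(xml_text, slice_size, on_element):
    """Visit every element, pausing after each `slice_size` elements.

    Hypothetical sketch: a real parser would emit triples in `on_element`.
    """
    root = ET.fromstring(xml_text)
    visited = 0
    for element in root.iter():
        on_element(element)
        visited += 1
        if visited % slice_size == 0:
            # Hand control back to the caller; the walk resumes on the next step.
            yield visited


DOC = "<root><a/><b/><c/><d/></root>"
seen = []
pauses = []
for progress in parse_in_slices(DOC, 2, lambda el: seen.append(el.tag)):
    # Other important work (e.g. keeping a page responsive) would run here.
    pauses.append(progress)

print(seen)
print(pauses)
```

The slice size trades responsiveness against overhead: smaller slices return control more often but spend more time switching back and forth.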

API documentation for RDFParser included in the Tabulator 0.7 release is available.

Finally, I'd be happy to hear from you if you have questions, comments, or ideas regarding the RDFParser or related technologies.

On GData, SPARQL update, and RDF Diff/Sync

Submitted by connolly on Tue, 2006-04-25 17:38.

The Google Data APIs Protocol is pretty interesting. It seems to be based on the Atom publishing protocol, which is a pretty straightforward application of HTTP and XML, so that's a good thing.

The query features seem to be less expressive than the SPARQL protocol, but GData has an update feature, while the SPARQL update issue is postponed. Updating at the triple level is tricky. I helped TimBL a bit in refining "Delta: an ontology for the distribution of differences between RDF graphs", and there's working code in cwm. But I haven't really managed to use it in practical settings. My PDA's calendar has an XMLRPC service where I can update a whole record at a time, just like GData. I assume CalDAV does likewise.
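For a feel of what a triple-level update looks like in the easy case, here is a minimal sketch in Python. It is not the Delta ontology and not cwm's code; it treats a graph as a plain set of (subject, predicate, object) tuples and assumes every node has a URI or is a literal. Blank nodes, which have no global name to compare on, are one reason the real problem is harder than this. The function names and sample data are invented for illustration.

```python
def diff_graphs(old, new):
    """Return (deletions, insertions) that turn `old` into `new`."""
    return old - new, new - old


def apply_delta(graph, deletions, insertions):
    """Apply a delta to a graph and return the updated graph."""
    return (graph - deletions) | insertions


EX = "http://example.org/"

old_graph = {
    (EX + "event1", EX + "summary", "Staff meeting"),
    (EX + "event1", EX + "start", "2006-04-25T10:00"),
}
new_graph = {
    (EX + "event1", EX + "summary", "Staff meeting"),
    (EX + "event1", EX + "start", "2006-04-25T11:00"),
}

deletions, insertions = diff_graphs(old_graph, new_graph)
patched = apply_delta(old_graph, deletions, insertions)
print(len(deletions), len(insertions))
```

Here the delta carries one deleted triple and one inserted triple, rather than the whole record, which is the contrast with the update-a-whole-record approach.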

The GData approach to concurrency looks quite reasonable. I haven't studied the authentication mechanism. I hope to get to that presently.

RDF Calendar, GRDDL, Microformats, and all that at XML2005 in Atlanta

Submitted by connolly on Mon, 2005-11-21 15:21.

My talk was:

I unfortunately didn't leave any time for questions, but I had some interesting follow-up conversations:

  • Somebody asked about using GRDDL and RDF to track relationships between specs, products that support them, and all that. I recalled that when the folks that run the OASIS standards registry contacted W3C, we told them we prefer a more decentralized approach: each organization publishes stuff about their own standards, in RDF, and anybody can aggregate it. TimBL's roadmap diagrams show one approach. It is somewhat bit-rotten, but we have an automated system in production for publishing basic title/author/date/version metadata about our specs and we're adding more stuff over time; e.g. which WG produced the spec (for patent policy reasons), comment due dates, etc. I told him this had come up in spec-prod; while I'm happy for the discussion to go there, my impression that it had come up there before was wrong. I hope to organize my thoughts on this near NormativeReferences in the QA/ESW wiki and re-kindle discussion in spec-prod or qa-ig.
  • At lunch, somebody brought up my slide about email headers in RDF and asked if thunderbird has RDF support like mozilla and firefox. I don't know, but I hope to find out. DanBri? Anyone?

On the non-technical front, jamming with Len Bullard was a blast. We had a fascinating discussion of DRM and the recording industry where I relayed AaronSw's viewpoint that any model based on scarcity is uninteresting. Len says Prince is no longer independent, which contradicts the impression I got from studying Prince in Wikipedia recently. Len says the big customer ripe for SemWeb technology is transit, at least as much as intelligence. Gotta look into that.

Later in the evening Len brought out a fake book and Tony and Lauren and Eve and John sang and I tried to accompany them on Len's guitar. I was having so much fun that I raised a sizeable blood-blister on my strumming hand before I noticed. I think we did OK with Annie's Song as well as mangling lots of Beatles and such.

Then Len took the guitar and Eve asked him to play Angel from Montgomery by Bonnie Raitt. When he said he didn't know it, I was able to use my sidekick to find chords and lyrics and since it was your basic three chord number, he picked it up in no time.

As to the conference program...

Tue 15 Nov

Wed 16 Nov

Thu 17 Nov
