
Choosing flight itineraries using tabulator and data from Wikipedia

Submitted by connolly on Mon, 2006-07-17 18:13. ::

While planning a trip to Boston/Cambridge, I was faced with a blizzard of itinerary options from American Airlines. I really wanted to overlay them all on the same map or calendar or something. I pretty much got it to work:

That's a map view using the tabulator, which had another release today. The itinerary data in RDF is converted from HTML via grokOptions.xsl (and tidy).

I can, in fact, see all the itineraries on the same calendar view. Getting these views to be helpful in choosing between the itineraries is going to take some more work, but this is a start.

Getting a map view required getting latitude/longitude info for the airports. I think getting Semantic Web data from Wikipedia is a promising approach. A while back, I figured out how to get lat/long data for airports from Wikipedia. This week, I added a Kid template, aptinfo.kid, and figured out how to serve up that data live from the DIG/CSAIL web server. For example, http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item is a URI for the Chicago airport; when you GET it with HTTP, a little CGI script calls aptdata.py, which fetches the relevant page from Wikipedia (using an httplib2 cache), scrapes the lat/long and a few other details, and gives them back to you in RDF (a sketch of this flow appears at the end of this post). Viewed with RDF/N3 glasses, it looks like:

#   Base was: http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD
@prefix : <#> .
@prefix apt: <http://www.daml.org/2001/10/html/airport-ont#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix go: <http://www.w3.org/2003/01/geo/go#> .
@prefix s: <http://www.w3.org/2000/01/rdf-schema#> .

:item a apt:Airport;
    apt:iataCode "ORD";
    s:comment "tz: -6";
    s:label "O'Hare International Airport";
    go:within_3_power_11_metres :nearbyCity;
    geo:lat "41.9794444444";
    geo:long "-87.9044444444";
    foaf:homepage <http://en.wikipedia.org/wiki/O%27Hare_International_Airport> .

:nearbyCity foaf:homepage <http://en.wikipedia.org/wiki/Chicago%2C_Illinois>;
    foaf:name "Chicago, Illinois" .

In particular, notice that:

  • I use the SWIG geo vocabulary, which the new GEO XG is set up to refine. The use of strings rather than datatyped floating point numbers follows the schema for that vocabulary.
  • I use distinct URIs for the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD#item) and the page about the airport (http://dig.csail.mit.edu/2006/wikdt/airports?iata=ORD).
  • I use an owl:InverseFunctionalProperty, foaf:homepage, to connect the airport to its Wikipedia article, and another, apt:iataCode, to relate the airport to its IATA code.
  • I use the GeoOnion pattern to relate the airport to the city it serves: go:within_3_power_11_metres says the two are within 3^11 metres (about 177 km) of each other. I'm not sure I like that pattern, but the idea is to make a browseable web of linked cities, states, countries, and other geographical items.

Hmm... I use rdfs:label for the name of the airport but foaf:name for the name of the city. I don't think that was a conscious choice. I may change that.

The timezone info is in an rdfs:comment. I hope to refine that in future episodes. Stay tuned.
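For the curious, here is a rough sketch of that fetch-and-scrape flow in Python. This is an illustration, not the actual aptdata.py: the function names and the coordinate pattern are invented, and real Wikipedia markup needs more forgiving scraping rules.

import re
import httplib2

def fetch_article(name):
    # httplib2 caches responses on disk, so repeated GETs
    # don't hammer Wikipedia.
    h = httplib2.Http(".cache")
    response, content = h.request("http://en.wikipedia.org/wiki/" + name)
    if response.status != 200:
        raise IOError("GET failed with status %d" % response.status)
    return content.decode("utf-8")

def scrape_latlong(html):
    # Illustrative only: look for decimal lat/long in a geo link,
    # e.g. params=41.9794_N_87.9044_W, and convert to signed floats.
    m = re.search(r"params=([\d.]+)_([NS])_([\d.]+)_([EW])", html)
    if m is None:
        return None
    lat = float(m.group(1)) * (1 if m.group(2) == "N" else -1)
    lon = float(m.group(3)) * (1 if m.group(4) == "E" else -1)
    return lat, lon

print(scrape_latlong(fetch_article("O%27Hare_International_Airport")))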

An Introduction and a JavaScript RDF/XML Parser

Submitted by dsheets on Mon, 2006-07-17 15:02. ::

My name is David Sheets. I will be a sophomore at MIT this fall. I like to be at the intersection of theory and practice.

This summer, I am working as a student developer on the Tabulator Project in the Decentralized Information Group at MIT's CSAIL. My charge has been to develop a new RDF/XML parser in JavaScript with a view to a JavaScript RDF library. I am pleased to report that I have finished the first version of the new RDF/XML parser.

Before this release, the only available RDF/XML parser in JavaScript was Jim Ley's parser.js. This parser served the community well for quite a while but fell short of the needs of the Tabulator Project. Most notably, it didn't parse all valid RDF/XML documents.

To rectify this, I began work on a new parser. For maximum utility, a parser should be small, standards-compliant, widely portable, and fast. The result being released today is a JavaScript class that weighs in at under 400 source lines of code and 2.8K gzip-compressed (12K uncompressed).

To the best of my knowledge, RDFParser is fully compliant with the RDF/XML specification: it passes all of the positive parser test cases from the W3C. This was verified using jsUnit, a unit testing framework for JavaScript similar to JUnit; to run the automated tests against RDFParser, you can follow the steps here. The parser thus supports features such as xml:base, xml:lang, RDF collections, XML literals, and so forth. If it's in the specification, it should be supported. One caveat: due to speed concerns, the parser is non-validating. Additionally, RDFParser has been speed-optimized, resulting in code that is slightly less readable.

The new parser is not as portable as the old parser at this time. It has only been tested in Firefox 1.5 but should work in any browser that supports the DOM Level 2 specification.

RDFParser runs at a speed similar to Jim Ley's parser. One can easily construct RDF/XML files that run faster on one parser or the other, so I took five files that the tabulator might come across in day-to-day use and ran head-to-head benchmarks between the two parsers.

Parse time is highly influenced by how compactly the RDF/XML is serialized. The more nested the serialization, the more scope frames must be created to track features from the specification; the flatter the serialization, the fewer steps it takes to traverse the DOM and the more triples each DOM element yields.
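To make that concrete, here are two serializations of the same two triples, using an invented ex: vocabulary (my illustration, not one of the benchmark files). The nested form makes the parser push a new scope frame at each level:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/doc">
    <ex:author>
      <rdf:Description rdf:about="http://example.org/alice">
        <ex:name>Alice</ex:name>
      </rdf:Description>
    </ex:author>
  </rdf:Description>
</rdf:RDF>

The flat form describes each node at the top level, so the DOM is shallower and each element yields its triples more directly:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/doc">
    <ex:author rdf:resource="http://example.org/alice"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/alice">
    <ex:name>Alice</ex:name>
  </rdf:Description>
</rdf:RDF>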

Planned in the next release of RDFParser is a callback/continuation system so that the parser can yield in the middle of a parse run and allow other important page features to run.

API documentation for RDFParser, included in the Tabulator 0.7 release, is available.

Finally, I'd be happy to hear from you if you have questions, comments, or ideas regarding the RDFParser or related technologies.

Wondering about how PDFs phone home

Submitted by Danny Weitzner on Thu, 2006-07-13 21:40. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Recently noticed this description of the Adobe Policy Server:

About Adobe LiveCycle Policy Server. Authors who protect their documents with Adobe LiveCycle Policy Server can audit what is done with each copy of the document (such as opening, printing, and editing). They can also change or revoke access rights at any time. If an author has revoked access to a document that is protected by Adobe LiveCycle Policy Server, Adobe Reader or Acrobat informs you that your access rights have been removed the next time you try to open the document.

I wonder how this works.

a walk thru the tabulator calendar view

Submitted by connolly on Tue, 2006-07-11 11:22. ::

The SIMILE Timeline is a nifty hack. DanBri writes about Who, what, where, when? of the Semantic Web, and in a message to the Semantic Web Interest Group, he asks

TimBL, any plans for Geo widgets to be wired into the tabulator?

Indeed, there are. Stay tuned for more on that... from the students who are actually doing the work, I hope. But I can't wait to tell everybody about the calendar view. Give it a try:

  1. Start with the development version of the tabulator.
  2. Select a source of calendar info. Morten's SPARQL Timeline uses RSS. The tabulator calendar groks dc:date, so something like W3C's main RSS feed will work fine. Put its URI in the tabulator's URI text field and hit "Add to outliner".
    add to outliner screenshot, before

    When it's done it should look something like this:

    add to outliner screenshot, after
    • For fun, open the Sources tab near the bottom. Note that the tabulator loads the RSS and DC schemas, plus all the schemas they reference, and so on; i.e. the ontological closure. Hmm... the RSS terms seem to be 404.
      sources screenshot
  3. Now navigate the outline down to one of the items.
    item screenshot
    and then re-focus (shift click) on the rss item class itself, and then open an item and select the date property.
    refocus screenshot
  4. Now hit "Tabulate selected properties". You'll get a table of items and their dates.
    table screenshot
  5. OK, so much for review of basic tabulator stuff. Now you're all set for the new stuff. Hit Calendar and scroll down a little:
    calendar screenshot

Note the Export button with the SPARQL option. That's a whole other item in itself, but for now, you can see the SPARQL query that corresponds to what you've selected to put on the calendar:

SELECT ?v0 ?v1 
WHERE
{
?v0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item> .
?v0 <http://purl.org/dc/elements/1.1/date> ?v1 .
}

Fun, huh?

BusinessWeek likes CDT’s middle ground on Net Neutrality

Submitted by Danny Weitzner on Fri, 2006-07-07 10:45. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

BusinessWeek technology columnist Stephen H. Wildstrom writes about The War for the Net’s Future. He describes the simplistic nature of the current debate (currently in a summer hiatus with the anti-neutrality side at a big advantage):

Like most policy debates, the Washington argument over “network neutrality” is thoroughly uninformative. Both the cable and telephone companies on one side and big Internet companies like Google (GOOG) and Microsoft (MSFT) on the other claim to be protecting consumers. But there’s a danger we users could get trampled in this fight among elephants.
The problem is that everyone wants to get into the business of on-demand video — phone companies, cable companies, and players such as Microsoft, Google, and Yahoo! (YHOO).

[..]

Fortunately, there’s a middle ground: We must acknowledge that public networks for everyone can exist alongside premium, private ones, and that these two types of networks can live by different rules. The Center for Democracy & Technology (CDT), a think tank on tech issues, argues for an approach that preserves the open nature of today’s Internet while creating space for premium networks. This solution truly serves the interests of consumers and most businesses.

CDT’s position paper describes the details of their position. I’m happy to have been able to contribute to CDT’s views through my paper: The Neutral Internet: An Information Architecture for Open Societies. (Full disclosure: I’m on CDT’s Board of Directors and founder of the organization, so I’m biased toward their way of approaching Internet issues.)

converting vcard .vcf syntax to hcard and catching up on CALSIFY

Submitted by connolly on Thu, 2006-06-29 00:17. ::

A while back I wrote about using JSON and templates to produce microformat data. I swapped some of those ideas in today while trying to figure out a simple, consistent model for recurring events using floating times plus locations.

I spent a little time catching up on the IETF CALSIFY WG; they meet Wednesday, July 12 at 9am in Montreal. I wonder how much progress they'll make on issues like the March 2007 DST change and the CalConnect recommendations toward an IANA timezone registry.

When I realized I didn't have a clear problem or use case in mind, I went looking for something that I could chew on in test-driven style.

So I picked up the hcard tests and built a vcard-to-hcard converter sort of out of spare parts. icslex.py handles low-level lexical details of iCalendar, which turn out to have lots in common with vCard: line breaking, escaping, that sort of thing. On top of that, I wrote vcardin.py, which has enough vcard schema smarts to break down the structured N and ADR and GEO properties so there's no microparsing below the JSON level. Then contacts.kid is a kid template that spits out the data following the hcard spec.
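As a flavor of what that lexical layer deals with, here is a minimal sketch of unfolding and unescaping in Python. It is a sketch of the shared vCard/iCalendar rules, not the actual icslex.py API:

import re

def unfold(text):
    # A line break followed by a space or tab continues the
    # previous line (vCard and iCalendar share this rule).
    return re.sub(r"\r?\n[ \t]", "", text)

def unescape(value):
    # Backslash escapes: \\ \; \, and \n (or \N) for newline.
    def repl(m):
        c = m.group(1)
        return "\n" if c in "nN" else c
    return re.sub(r"\\(.)", repl, value)

for line in unfold(open("hcard/01-tantek-basic.vcf").read()).splitlines():
    if ":" in line:
        name, value = line.split(":", 1)
        print(name, "=", unescape(value))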

It works like this:

python vcardin.py contacts.kid hcard/01-tantek-basic.vcf >,01.html

Then I used X2V to convert the results back to .vcf format, compared them using the hcard testing tools (normalize.pl and such), and fixed the breakage. Lather, rinse, repeat... I have pretty much all the tests working except 16-honorific-additional-multiple.

It really is a pain to set up a loop for the additional-name field when that field is almost never used, let alone used with multiple values. This sort of structure is more natural in XML/XHTML/hCard than in JSON, in a way. And if I change the JSON structure from a string to a list, does that mean the RDF property should use a list/collection? Probably not... I probably don't need additional-name to be an owl:FunctionalProperty.
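Concretely, the tension is between two JSON shapes for the structured N property (the field names follow the hCard class names, but the exact layout here is my illustration, not necessarily what vcardin.py emits):

{"n": {"given-name": "Tantek", "additional-name": "", "family-name": "Çelik"}}

versus

{"n": {"given-name": ["Tantek"], "additional-name": [], "family-name": ["Çelik"]}}

The second shape handles 16-honorific-additional-multiple naturally, at the cost of wrapping every singleton value in a list.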

Hmm... meanwhile, this contacts.kid template should mesh nicely with my wikipedia airport scraping hack...

See also: IRC notes from #microformats, from #swig.

Net Neutrality is the ‘toughest issue’ in the US Senate’s telecommunications bill

Submitted by Danny Weitzner on Tue, 2006-06-27 17:44. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

According to Reuters, Senator Ted Stevens, the chairman of the Senate Commerce Committee, has said that ‘Net Neutrality’ is the toughest issue facing Senators, and that there is not yet sufficient support in the Senate for the bill he has proposed. ‘Net Neutrality’ will be the subject of committee votes tomorrow, with various proposals being offered for the committee to consider.

Sens. Olympia Snowe, a Maine Republican, and Byron Dorgan, a North Dakota Democrat, plan to offer an amendment that would prevent broadband providers from giving priority to any individual company’s content. Snowe said that the debate would likely spill out of the committee and on to the Senate floor. “It won’t be the end of it, it will be the beginning,” she said.

The contention about the issue is, I believe, a positive sign. The committee has recognized that this is a high-priority but nonetheless complex issue.

Net Neutrality: This is serious

Submitted by timbl on Wed, 2006-06-21 16:35. ::

( real video, download m4v )

When I invented the Web, I didn't have to ask anyone's permission. Now, hundreds of millions of people are using it freely. I am worried that that is going to end in the USA.

I blogged on net neutrality before, and so did a lot of other people (see e.g. Danny Weitzner, SaveTheInternet.com, etc.). Since then, some telecommunications companies have spent a lot of money on public relations and TV ads, and the US House seems to have wavered from the path of preserving net neutrality. There has been some misinformation spread about, so here are some clarifications. (real video; MPEGs to come)

Net neutrality is this:

If I pay to connect to the Net with a certain quality of service, and you pay to connect with that or greater quality of service, then we can communicate at that level.
That's all. It's up to the ISPs to make sure they interoperate so that that happens.

Net Neutrality is NOT asking for the internet for free.

Net Neutrality is NOT saying that one shouldn't pay more money for high quality of service. We always have, and we always will.

There have been suggestions that we don't need legislation because we haven't had it. These are nonsense, because in fact we have had net neutrality in the past -- it is only recently that real explicit threats have occurred.

Control of information is hugely powerful. In the US, the threat is that companies control what I can access for commercial reasons. (In China, control is by the government for political reasons.) There is a very strong short-term incentive for a company to grab control of TV distribution over the Internet even though it is against the long-term interests of the industry.

Yes, regulation to keep the Internet open is regulation. And mostly, the Internet thrives on lack of regulation. But some basic values have to be preserved. For example, the market system depends on the rule that you can't photocopy money. Democracy depends on freedom of speech. Freedom of connection, with any application, to any party, is the fundamental social basis of the Internet, and, now, the society based on it.

Let's see whether the United States is capable of acting according to its important values, or whether it is, as so many people are saying, run by the misguided short-term interests of large corporations.

I hope that Congress can protect net neutrality, so I can continue to innovate in the internet space. I want to see the explosion of innovations happening out there on the Web, so diverse and so exciting, continue unabated.

My paper on Internet Neutrality

Submitted by Danny Weitzner on Mon, 2006-06-19 23:48. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

I’ve just posted a paper on the ‘Net Neutrality’ issue, entitled “The Neutral Internet: An Information Architecture for Open Societies” [PDF]. This paper argues that it is important for Congress to enact legislative protection for the essential non-discriminatory and neutral aspects of the Internet.

The United States Senate is likely to take a key vote on the issue this week, so it’s an important time to pay attention and make your voice heard.

The paper identifies four key aspects of the Neutral Internet that must be protected:

  1. Non-discriminatory routing of packets
  2. User control and choice over service levels
  3. Ability to create and use new services and protocols without prior approval of network operators
  4. Non-discriminatory peering of backbone networks

I suggest that:

These principles taken together constitute the social contract among Internet service providers that has been indispensable to its great openness and success. They are equally important regardless of whether the service is broadband or narrowband, wireless or wireline, fiber optic, copper pair or coax. Understanding the Internet requires taking this holistic view of the Internet as a set of business, technical and social arrangements. While traditional telecommunications policy thinking divides the world into ‘facilities’ and different bandwidth levels, these are not the appropriate categories within which we should regulate or de-regulate the Internet. Indeed, the very foundation of the Internet is its ability to connect efficiently a broad array of quite different networks, allowing a publisher of information to reach a global audience without regard to which or what kind of network the recipient is on. To allow the nation’s leading Internet access providers to upend this fundamental global understanding would be to undermine the Internet itself.

There are those who argue that it’s dangerous to ‘regulate’ the Internet and therefore there should be no non-discrimination rules enacted. Some network operators are simply self-serving and want the freedom to take advantage of the tremendous wealth that the Internet generates without playing by the simple, common rules that make it so economically powerful and socially useful. Other well-meaning friends of the Internet worry that getting the FCC involved will risk meddlesome regulation and lead to unintended consequences. I worry about this too, but think that Congress should be able to enact Internet Neutrality protections (for example, by creating a right to complain against discriminatory conduct at the FTC or FCC) without too much risk. Think of regular old telephone service, whose cornerstone for more than 100 years has been basic common-carriage non-discrimination. True, many countries, including the US, have all but eliminated the price-regulation aspect of common carriage. But we still maintain the prohibition against content discrimination simply because it’s vital to the way the telephone system works. This doesn’t result in burdensome regulation, but rather reminds everyone of the importance of maintaining an open voice telephone system.

At the same time, I suggest that some of the Net Neutrality proponents have actually distorted the debate (and hurt our chances to protect the Internet) by trying, intentionally or not, to extend Internet non-discrimination principles to other broadband services such as cable television. The paper explains:

By distinguishing between Internet Neutrality and more general Net Neutrality, it is possible to establish basic non-discriminatory neutrality requirements that will preserve the neutral aspects of the Internet that have brought commercial and non-commercial benefits to hundreds of millions of people around the world. At the same time, policy makers should carefully monitor the evolution of new broadband networks and services. As long as those new networks operate in a manner that does not actively interfere with or unfairly compete against Internet services, policy makers should allow the private sector a freer hand in designing and operating new broadband infrastructure.

I’d welcome comments on the paper and will likely write more about this in the coming days.

Update: Associated Press coverage of the paper: “All Online Traffic May Not Be Equal,” 20 June 2006.

fun with flock

Submitted by connolly on Fri, 2006-06-16 23:52. ::

I just found this...

Flock has a nice WYSIWYG blog post editor that's a marvel to use. It does the quoting and citation for me. It lets me drag images from the Web, or my flickr stream straight into the post. It just makes posting so much more fun.

Labnotes » Blog Archive » Mucking around with Flock

Ooh... excerpt with 2 gestures.

Source view!

Let's try linking... and list editing...

Let's try pictures...

A photo on Flickr

Seems to work quite nicely!

Odd... posting works, but the posting dialog hangs.

Hmm... I'd rather use delicious than technorati as the base of my tags.


Blogged with Flock
