On Google, Jabber, and Jingle and good and evil in IM and IP networks

Submitted by connolly on Tue, 2006-01-03 16:32. :: |

The 14 December Jingle announcement gives a hint at Google's approach to adding voice to their Google Talk offering. Actually, it gives quite a bit more than a hint; it comes with a Jingle spec and an open source library implementation.

Google Talk has had pretty good "do no evil" karma since it started. The dominant commercial IM services (AOL/Yahoo/Microsoft) are each a world unto themselves. Your AIM screen name is just jim47 or whatever, not jim47@aol.com like an email address, and while clients like trillian and gaim can connect to them all, that's not something the big three encourage. Google Talk uses Gmail addresses and the Jabber/XMPP protocol, which has the same network topology as email. While Google isn't opening their service to actual server-to-server federation until they get a better handle on some operational issues (think: spam), they are using open protocols, and they actively support gaim development.
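
To make the email analogy concrete, here's a rough sketch in Python (the JIDs are made up, and this is not Google's code or the libjingle API, just an illustration) of the kind of XML stanza a Jabber client sends:

    import xml.etree.ElementTree as ET

    def message_stanza(from_jid, to_jid, body):
        """Build the <message/> stanza a client would send over its XMPP stream."""
        msg = ET.Element("message", {"from": from_jid, "to": to_jid, "type": "chat"})
        ET.SubElement(msg, "body").text = body
        return ET.tostring(msg, encoding="unicode")

    # The receiving server is found from the domain part of the JID, much as an
    # SMTP server is found from the domain part of an email address (via an
    # _xmpp-server._tcp DNS SRV lookup rather than an MX lookup).
    print(message_stanza("alice@gmail.com", "bob@example.org", "hello"))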

Apple's iChat uses Jabber at some level too, though I haven't worked out the interoperability issues in practice. I think the last time I tried was before the Tiger release of OS X, when the Jabber support was much more under-the-covers.

The popularity of multi-protocol clients like gaim and trillian surprises me: after all, you can't have one chat room with AIM and MSN Messenger users connected. Evidently this is just not a big deal. "IRC and instant messaging are very different paradigms," says the Adium X: IRC Howto. I guess I'm just too old school to get it; in the internet relay chat usage that I'm used to, channels (aka chat rooms) are the norm and private channels are the exception. I gather IM is the other way around. I have played with Jabber's support for bridging to other networks, but I have yet to find a reliable combination of:

  • a jabber client with bridging support that I can figure out how to use
  • and either
    • server software with bridging support that I can figure out how to use, or
    • an existing service with bridging support that I can use and trust (since my credentials pass thru their service)

The Jabber protocol has lots of pieces and extensions and such. There's a whole JEP process, in addition to the XMPP process where Jabber technology feeds into the IETF. I don't quite have my head around the whole thing. I discovered that there are older and newer protocols for doing chat rooms in Jabber that don't mix well. I wonder which of them, if either, the IETF has endorsed. An XMPP summary shows JEP-0045 for Multi-User Chat but no RFC. And I don't see XMPP among IETF Working Groups any more. I wonder what's up. The xmppwg mailing list archives show pretty recent activity.

The $2.6Bn acquisition of Skype by eBay shows the value of networks of IM and voice users. Skype has a novel topology, designed by the same p2p people who did Kazaa. As I understand it, they mostly use the p2p network for firewall traversal, which is the biggest problem, in practice, with deploying consumer voice chat. They keep the protocol details to themselves, though, and as a consequence they have the only implementation. They have a centralized user directory too.

In my visit to the 62nd IETF in Minneapolis, MN, I learned what a sore spot firewall traversal is in Internet standardization. "Just use IPv6 and don't waste your time with those kludges," goes one side; "but NAT works today," goes the other. Ugh. And since W3C started working more actively with developing countries, I hear more about the political aspects of IPv6. In the first world, we can dismiss claims that IPv4 addresses are running out as technically overblown, since we can afford to pay for the management fees and the NAT boxes. But the scarcity is a real economic issue in the developing world; plus it concentrates power in a way that engenders distrust.

Back to network topologies... the fact that Jabber has the same topology as conventional (SMTP) email means that it's subject to the same sorts of spam issues. I wonder if anybody has considered the IM2000 approach of redesigning the mail system as a pull-delivery rather than a push-delivery system, so that recipients no longer bear the costs of receiving and storing unwanted messages. In an IM2000 world, senders have to hold still long enough to deliver a message, which makes it much easier to hold them accountable for nastiness.
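
Roughly, and purely as a toy sketch of the idea (my reading of the pull-delivery notion, not IM2000's actual protocol), it looks something like this:

    # Toy pull-delivery model: the sender's side stores the message; the
    # recipient gets only a tiny notification and fetches the body on demand.
    class SenderOutbox:
        def __init__(self):
            self.messages = {}
            self.next_id = 0

        def post(self, sender, body):
            """Deposit a message; the storage cost stays with the sender."""
            self.next_id += 1
            self.messages[self.next_id] = (sender, body)
            return self.next_id          # this notice id is all that gets pushed

        def fetch(self, msg_id):
            """Recipient pulls the body only if it decides the notice is wanted."""
            return self.messages.get(msg_id)

    outbox = SenderOutbox()
    notice = outbox.post("alice@example.org", "lunch at noon?")
    print(outbox.fetch(notice))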

Links on the Semantic Web

Submitted by timbl on Fri, 2005-12-30 15:04. :: | |

On the web of [x]HTML documents, the links are critical. Links are references to 'anchors' in other documents, and they use URIs which are formed by taking the URI of the document and adding a # sign and the local name of the anchor. This way, local anchors get a global name.

On the Semantic Web, links are also critical. Here, the local name, and the URI formed using the hash, refer to arbitrary things. When a semantic web document gives information about something, and uses a URI formed from the name of a different document, like foo.rdf#bar, then that's an invitation to look up the document, if you want more information about it. I'd like people to use them more, and I think we need to develop algorithms for deciding when to follow Semantic Web links as a function of what we are looking for.
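
For instance (a toy illustration in Python rather than anything from the Tabulator code, with made-up URIs), splitting such a link into "document to fetch" and "thing within it" is just:

    from urllib.parse import urldefrag, urljoin

    base = "http://example.org/people/card.rdf"
    link = urljoin(base, "foo.rdf#bar")   # resolve the link relative to this document
    doc, frag = urldefrag(link)           # strip off the fragment

    print(link)    # http://example.org/people/foo.rdf#bar
    print(doc)     # the document to look up for more information about the thing
    print(frag)    # bar -- the local name for the thing within that document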

To play with semantic web links, I made a toy semantic web browser, Tabulator. Toy, because it is hacked up in Javascript (a change from my usual Python) to experiment with these ideas. It is AJAR - Asynchronous Javascript and RDF. I started off with Jim Ley's RDF Parser and added a little data store. The store understands the minimal OWL ([inverse] functional properties, sameAs) needed to smush nodes representing the same thing together, so it doesn't matter if people use many different URIs for the same thing, which of course they can. It has a simple index and supports simple query. The API is more or less the one which cwm had been tending toward in Python.
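
To give the flavour of the smushing, here is a little Python sketch of mine (not the store's actual Javascript API): nodes linked by owl:sameAs, or sharing the value of an inverse functional property like foaf:mbox, get merged under one canonical node with a small union-find.

    SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
    MBOX = "http://xmlns.com/foaf/0.1/mbox"    # treated as inverse functional

    class Store:
        def __init__(self):
            self.triples = []
            self.canon = {}                    # maps a node to a more canonical one

        def canonical(self, node):
            while node in self.canon:
                node = self.canon[node]
            return node

        def smush(self, a, b):
            """Record that a and b denote the same thing."""
            a, b = self.canonical(a), self.canonical(b)
            if a != b:
                self.canon[b] = a

        def add(self, s, p, o):
            self.triples.append((s, p, o))
            if p == SAME_AS:
                self.smush(s, o)
            elif p == MBOX:                    # same mailbox => same person
                for s2, p2, o2 in self.triples:
                    if p2 == MBOX and o2 == o and s2 != s:
                        self.smush(s2, s)

        def about(self, subject):
            """Everything known about a thing, whichever URI was used for it."""
            subject = self.canonical(subject)
            return [(p, o) for s, p, o in self.triples
                    if self.canonical(s) == subject]

    store = Store()
    store.add("http://a.example/#tim", MBOX, "mailto:tim@example.org")
    store.add("http://b.example/foaf#me", MBOX, "mailto:tim@example.org")
    store.add("http://b.example/foaf#me", "http://xmlns.com/foaf/0.1/name", "Tim")
    print(store.about("http://a.example/#tim"))   # the name shows up despite different URIs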

Then, with the DOM and CSS and Ecmascript standards bookmarked, the rest was just learning the difference between Javascript and Python. Fun, anyway.

The result ... insert a million disclaimers ... experimental, work in progress, only runs on Firefox for no serious reason, not accessible, too slow, etc ... is at least a platform for looking at Semantic Web data in a fairly normal way, but also following links. A blue dot indicates something which could be downloaded. Download some data before exploring the data in it. Note that as you download multiple FOAF files, for example, the data from them merges into the unified view. (You may have to collapse and re-expand an outline.)

Here is the current snag, though. Firefox security does not allow a script from a given domain to access data from any other domain, unless the scripts are signed, or made into an extension. And looking for script signing tools (for OS X?) led me to dead ends. So if anyone knows how to do that, let me know. Until I find a fix for that, the power of following links -- which is that they can potentially go anywhere -- is alas not evident!

How much bandwidth is enough?

Submitted by Danny Weitzner on Wed, 2005-12-28 09:28. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Om Malik has a nice post, “Need For Speed: How Real?” asking how much IP bandwidth is enough. He writes:

After years of being stuck in the slow lane, the US consumers are finally going to get a massive speed upgrade and taste the true broadband for the first time. From a 512 Kbps world to 6 Mbps, then 8 and soon 15 Mbps. it seems the future has finally arrived. And with that, the question. how much speed is enough? Can we the consumers really tell the difference between 15 and 30 Mbps?

[Research shows, he says, that as] we increase the speed, the real impact of the speed on what we do with it is marginal. Can your eyes tell the difference between a web-page loading in one second or 0.27 seconds. I guess not. If you can download a music file in 1.08 seconds, does that really mean you will be buying music all the time. No you perhaps will be buying better quality, and perhaps marginally more music. There is the other option, but its just easier to pay! Sure at 30 Mbps you can download DVD quality The Bourne Identity in 11 minutes, but its still going to take you 2 hours to watch it.

[..]

Don’t get me wrong…. I will upgrade, and hope the experience improves, but at some point, we need the applications that truly harness this speed come-along and are allowed to thrive. Not likely in the “we will control the net” attitude adopted by the incumbents. Even in truly immersive multiplayer games, its the latency, not the speed that matters.

The real bandwidth question is when are going to see an increase in the uplink speeds?
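
As a quick back-of-the-envelope check of the numbers in that quote (treating a DVD-quality movie file as roughly 2.5 GB, which is my assumption rather than Om’s):

    # Rough download-time arithmetic; the 2.5 GB file size is assumed.
    movie_bytes = 2.5e9
    for mbps in (15, 30):
        seconds = movie_bytes * 8 / (mbps * 1e6)
        print("%d Mbps: about %.0f minutes" % (mbps, seconds / 60))
    # 30 Mbps gives roughly 11 minutes, as Om says; either way the download
    # finishes long before the two hours it takes to watch the movie.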

I’m actually not sure that this is the real question, though. It’s always fascinating to ask how much (of anything) is enough, but the question seems (a) unanswerable, and (b) perhaps not the most important one to be thinking about, at least from a public policy standpoint. As the so-called Net Neutrality debate heats up again in the US and around the world, policy makers will have to think about how to trade off the need for open, non-discriminatory networks against the desire for higher and higher capacity broadband pipes to the user. Perhaps there are ways to avoid an either-or choice, but I’d suggest that the development of the Internet has rested at least as much on openness as on bandwidth.

There’s an interesting discussion thread on this topic on Dave Farber’s Interesting People list.

Judge Posner on privacy and government data mining

Submitted by Danny Weitzner on Fri, 2005-12-23 09:26. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Writing on the Washington Post Op-Ed page this week (Our Domestic Intelligence Crisis, 21 December 2005), Judge Richard Posner asserts that there is no privacy threat from the mere collection of personal information by the government. He writes:

These programs [such as Pentagon’s Counterintelligence Field Activity (CIFA)] are criticized as grave threats to civil liberties. They are not. Their significance is in flagging the existence of gaps in our defenses against terrorism. The Defense Department is rushing to fill those gaps, though there may be better ways.

The collection, mainly through electronic means, of vast amounts of personal data is said to invade privacy. But machine collection and processing of data cannot, as such, invade privacy. Because of their volume, the data are first sifted by computers, which search for names, addresses, phone numbers, etc., that may have intelligence value. This initial sifting, far from invading privacy (a computer is not a sentient being), keeps most private data from being read by any intelligence officer. (emphasis added)

This gets at the heart of the question about data mining. If you believe that privacy is, in Justice Brandeis’ words, “the right to be let alone,” then you probably don’t agree with Judge Posner. Even if the government does nothing with your data, the simple act of collecting and possessing it can hardly be said to be ‘leaving you alone’. But if you’re more concerned with privacy as the right to control how personal information is used and to know that it will be used according to a clear set of rules, constitutional and statutory, then perhaps you’re more prepared to accept Posner’s view.

Another cellphone location tracking case: this time the government need not meet 4th Amendment probable cause requirement

Submitted by Danny Weitzner on Thu, 2005-12-22 21:00. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

I learned from an Ars Technica item of yet another federal court ruling on cellphone location tracking (US District Court for the Southern District, Gorenstein, Mag., Opinion and Order 05 Mag. 1763, 21 December 2005). This time, the magistrate found that the government could gain access to at least a particular type of location information (cell site information) without satisfying the full 4th Amendment probable cause requirement.

The result in this new opinion, by the 4th federal magistrate to write on the question in as many months, takes the opposite view from the first 3. He concludes that the location information (at least cell tower information) is available to the government without satisfying the 4th Amendment probable cause standard. All they need to do is to show “specific and articulable facts” that the information sought is relevant to an ongoing criminal investigation. This is less than the full 4th Amendment standard, but entails significant independent judicial oversight of the privacy intrusion, unlike the virtually non-existent oversight required to get ‘dialed number’ information. (See my earlier post on this subject for more details.) Magistrate Gorenstein comes to this conclusion through what I believe may be a mistaken reading of the relevant statute. The reasoning is so convoluted that I hesitate to even try to summarize it here. In short, he concluded that a statute passed in 1994, the Communications Assistance for Law Enforcement Act (Pub. L. 103–414, 47 USC 1001, et seq.), implied, though did not explicitly state, that the lower standard was applicable to cell site location data. I don’t believe that this is what the statute actually says, and the Congressional Committee that wrote the statute didn’t either. The committee report (aka legislative history) explains that CALEA:

Expressly provides that the authority for pen registers and trap and trace devices cannot be used to obtain tracking or location information, other than that which can be determined from the phone number. H. Rep. No. 103-827.

Judges sometimes do have good arguments for interpreting statutes as they see them, even if the interpretation contradicts the ‘legislative history’. After all, the Congress writes the statute, so they’re expected to get it right so that we can all understand it on its ‘face’, without having to resort to extra explanations.

All that said, there’s reason to be other than completely gloomy about this result from a privacy perspective. First, this magistrate specifically limited his ruling to situations in which the government only seeks information about the cell site with which the target cell phone is actually communicating. That reveals someone’s location within a 10 mile to 2000 ft radius (depending on the density of cells) but does not enable the government to instantaneously ‘triangulate’ a person’s location to a finer resolution. (It is possible to infer a rough map of where the person travels, however.) And second, it’s generally the case that when lots of trial courts start coming to opposite conclusions on the same or related questions, there’s greater pressure on appeals courts (who have broader jurisdiction) to resolve the differences and settle on one common interpretation of the law. That’s more likely to happen on this issue now that there is disagreement. (There are some legal technicalities that make a quick appeal difficult, but it’s likely to happen sooner or later.)

A report on the first stage of this case is on Declan McCullagh’s Policy Blotter (2 September 2005).

Drupal, OpenID, and the Mac OS X Keychain

Submitted by connolly on Mon, 2005-12-19 16:12. :: | | |

Managing passwords via email callback is hampered by anti-spam mechanisms. I just helped a breadcrumbs user whose password message from drupal was classified as Junk by Mac OS X Mail.

Meanwhile, I did enough research on the Mac OS X keychain to trust it. Support for OpenID in drupal is already on the OpenID wish list, and I've seen some progress.

It's not obvious to me how to connect the keychain to OpenID, but I'm sure there's a way. Any suggestions?
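
One piece that does seem scriptable is reading a stored password back out of the keychain. Here's a hedged sketch using the security command-line tool (the service and account names are made up, and the exact flags may vary across OS X versions, so treat it as a starting point rather than gospel):

    import subprocess

    def keychain_password(service, account):
        """Ask the Mac OS X keychain for the password stored for (service, account)."""
        result = subprocess.run(
            ["security", "find-generic-password",
             "-s", service, "-a", account, "-w"],    # -w: print just the password
            capture_output=True, text=True, check=True)
        return result.stdout.strip()

    # e.g. a credential that a Drupal helper script or OpenID consumer might need:
    # print(keychain_password("openid.example.org", "connolly"))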

Thank you for all the comments

Submitted by timbl on Mon, 2005-12-19 16:10. ::

Oops! Thanks for all the wonderful welcoming comments. We've had rather a lot, and had to turn the comments off on the first blog. I can't answer them all, but I would point out one thing. I just played my part. I built on the work of others -- the Internet, invented 20 years before the web, by Vint Cerf and Bob Kahn and colleagues, for example, and hypertext, a word coined by Ted Nelson for an idea of links which was already implemented in many non-networked systems. I just put these technologies together. And then, it all took off because of this amazing community of enthusiasts, who have done such incredible things with the technology, and are still advancing it in so many ways.

By the way, this blog is at DIG, the Decentralised Information group at MIT's CSAIL. I intend it to be geeky semantic web stuff mostly. For example, it won't be for W3C questions which should really be addressed to working groups.

So thanks for all the support, no need for more general 'thank you' comments! Thank *you* all.

“Cold Hits” - a new frontier in DNA profiling

Submitted by Danny Weitzner on Mon, 2005-12-19 11:36. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Last week the Washington DC Court of Appeals ruled (Case no. 05-CO-333, 15 December 2005, Washington, CJ) that it is permissible for the prosecution to use DNA evidence matching DNA collected from a crime scene against a database of DNA samples collected from a large population (in this case, 100,000+ criminal offenders in Virginia). The legal issue in the case on appeal was limited to the question of whether there was sufficient scientific consensus as to the method of assessing the statistical reliability of this procedure to justify the introduction of this evidence in court. In particular, the court asked whether there was agreement among experts on how to assess the ‘random match probability,’ the probability that the DNA match identified could have picked out the wrong person. However, the implications of the holding are far reaching in that the path appears to be more clear for matching unidentified DNA found at a crime scene against a large database of DNA data.
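
To see why the statistics are contested, here is a rough, purely illustrative calculation (the numbers are invented for illustration, not taken from the opinion): even a tiny per-person random match probability adds up when a whole database of 100,000 profiles is searched.

    # Illustrative only; the random match probability here is an assumed figure.
    random_match_probability = 1e-7       # assumed chance that a random person matches
    database_size = 100_000

    # Chance that at least one innocent person in the database matches by accident.
    p_false_hit = 1 - (1 - random_match_probability) ** database_size
    print("%.3f" % p_false_hit)           # about 0.010, i.e. roughly 1 in 100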

The underlying case giving rise to the appeal involves the murder of a man in Washington, DC. Initial investigation of the crime yielded a suspect in a related robbery, but failed to generate sufficient evidence to charge anyone with murder. In the course of investigating the scene, the police collected blood from the scene and then matched the DNA found in the blood against a database of Virginia criminal offenders. This profiling yielded a ‘cold hit,’ identifying a man named Raymond Jenkins. The trial court, however, refused to allow the prosecution to introduce this match in evidence, so the government appealed, bringing the case to the Washington, DC Court of Appeals, DC’s highest court. The Court of Appeals determined that the trial court made a mistake in blocking the introduction of the ‘cold hit’ evidence, so the case will now go back to the trial court with the government being able to introduce that match against the defendant. As I wrote above, there was no basis for the appeals court to address the larger policy implications of this sort of DNA matching, but this case does mark an important expansion of DNA profiling powers, both in DC and likely in the rest of the country.

Connecting DIG Student Projects to the MIT UROP listing

Submitted by connolly on Mon, 2005-12-19 00:51. :: | | |

A couple MIT students have found their way to the #dig channel and asked about UROPs during IAP. I'm still learning about student rhythms at MIT; I was never a student here; I got my degree at U.T. Austin. My ten years with W3C has exposed me to the terms UROP and IAP before, but I have paged most of it out. Let's refresh our cache, shall we?

The Independent Activities Period (IAP) is a special four week term at MIT that runs from the first week of January until the end of the month. IAP 2006 takes place from January 9 through February 3.

IAP overview

In UROP info for supervisors, I see there's a form for listing projects. Hey... it would be cool if the student projects category here in this blog were automatically syndicated via that form. A meta-student-project?

Meanwhile, we do have a few notes on student projects among our DIG info for MIT students.

I'm not sure how items syndicated from Danny/Eric via the WordPress plug-in can get categorized; I suppose we can do it manually, after-the-fact?

I see a bunch of UROP openings for this time of year. The Building Games to Acquire Commonsense Knowledge project looks cool.

NOTE: It is expected that UROP students are supervised in the laboratory at all times, per the Institute's "no working alone" policy .

UROP safety issues

Sounds a bit like a "no coding alone" policy that I've been pushing around W3C and DIG, since discovering the value of pair programming, or a variant of it.

Catching up on cell phone location tracking law

Submitted by Danny Weitzner on Fri, 2005-12-16 11:50. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

Catching up on my reading after a busy few months, I’ve been looking through 3 recent opinions from US Magistrate Judges (the folks who generally have the first pass on government requests for court orders for wiretaps and other types of electronic surveillance). Since the middle of October, Federal courts in Maryland (05-4486), Texas (H-05-557M), and New York (M 05-1093 (JO)) have rejected government requests to get real time cell phone location tracking information because the government, so far, has not been willing to meet the Fourth Amendment ‘probable cause’ standard for justifying this intrusion.

The key legal issue in all of these cases is what level of judicial oversight is required to compel the disclosure of information sought by the government. Answering this question has turned on whether the data sought is “transactional information” generated by the mobile phone network (in which case it would be regulated under 18 USC 2703 sections (b) and (d)) or whether it is equivalent to a mobile tracking device regulated under 18 USC 3117. If it is transactional data then it is available to the government under a so-called intermediate standard, less than probable cause, but more than a simple request. Under this rule, the government has to give a clear and articulable reason why the data is relevant to an ongoing investigation. It can’t just be going on a fishing expedition. Though the government advocated this position in all three cases, all three courts found that there must be a showing of probable cause. (This is all personally pretty interesting to me because I did a lot of work advocating for this new protection for transactional data back in 1994. At the time we (and I believe the Congress) were thinking about transactional information such as email and web access logs. We were not thinking about real time location data. My guess is that Congress will have to come back to this question to settle it.)

Kudos to EFF (where I’m proud to say I once worked) for filing an amicus curiae brief in the New York case and for bringing attention to this issue.

Added 22 December 2005: See this update