Privacy Lost?

Submitted by Danny Weitzner on Tue, 2007-10-23 09:48. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

The New York Times writes about new location-sharing services and worries about Privacy Lost. No doubt there will be all sorts of privacy questions associated with these services, but it seems pretty clear that people are going to flock to location-based services of all kinds. Some data points:

  • As the reporter, Laura Holson, points out, over 50% of mobile phones now sold are GPS-capable. That number is certain to rise to near 100% over time.
  • Even without GPS, mobile networks are pretty good at inferring location by triangulation. In the US it is a requirement that mobile network operators deliver location data to 911 operators in real time, accurate to within 50-100 meters (depending on the application).
  • Non-real-time location can be just as revealing, if not more. Social- and Semantic Web enthusiasts are hot to deliver all sorts of geotagging services which are likely to be just as revealing as GPS. Yahoo and Apple are featuring geotagging in their photo services. New social mapping services such as Platial and Flappr will provide every bit as much detail, though not necessarily in real time. Imagine the parent of a teenager discovering that the teen has posted photos that are geo-tagged to show they were taken at a house the teen was barred from visiting.

Location is hot. In a nice post on the significance of Google’s acquisition of mobile/microblogger Jaiku, Chris Messina writes:

The Web 2.0 Address Book isn’t really about how you connect to someone. It’s not really about having their home, work and secret lair addresses. It’s not about having access to their 15 different cell phone numbers that change depending on whether they’re home, at work, in the car, on a plane, in front of their computer and so on. It’s not about knowing the secret handshake and token-based smoke-signal that gains you direct access to send someone a guaranteed email that will bypass their moats of antispam protection. In the real world (outside of Silicon Valley), people want to type in the name of the recipient and hit send and have it reach the destination, in whatever means necessary, and in as appropriate a manner as possible. For this to happen, recipients need to provide a whole lot more information about themselves and their contexts to the system in order for this whole song and dance to work.

And the founder of Jaiku explains to the New York Times:

Petteri Koponen, one of the two founders of Jaiku, described the service as a “holistic view of a person’s life,” rather than just short posts. “We extract a lot of information automatically, especially from mobile phones,” Mr. Koponen said from Mountain View, Calif., where the company is being integrated into Google. “This kind of information paints a picture of what a person is thinking or doing.”

Privacy is not lost simply because people find these services useful and start sharing location. Privacy could be lost if we don’t start to figure out what the rules are for how this sort of location data can be used. We’ve got to make progress in two areas:

  • technical: how can users’ sharing and usage preferences be easily communicated to, and acted upon by, others? Suppose I share my location with a friend but don’t want my employer to know it. What happens when my friend, intentionally or accidentally, shares a social location map with my employer or with the public at large? How would my friend know that this is contrary to the way I want my location data used? What sorts of technologies and standards are needed to allow location data to be freely shared while respecting users’ usage limitation requirements?
  • legal: what sort of limits ought there to be on the use of location data?
      • can employers require employees to disclose real-time location data?
      • is there any difference between real-time and historical location data traces? (I doubt it)
      • under what conditions can the government get location data?
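To make the technical question concrete: one approach is to attach machine-readable usage restrictions to each location datum, so that a recipient's tools can check the owner's preferences before re-disclosing it. A minimal sketch, assuming a hypothetical audience vocabulary ("friends", "employer", "public"); none of these names come from any actual standard:

```python
from dataclasses import dataclass, field

@dataclass
class LocationReport:
    """A location datum tagged with the owner's usage restrictions."""
    owner: str
    lat: float
    lon: float
    # Audiences the owner permits (hypothetical vocabulary, not a real spec).
    allowed_audiences: set = field(default_factory=set)

def may_reshare(report: LocationReport, audience: str) -> bool:
    """A recipient's tool consults the owner's restrictions before re-sharing."""
    return audience in report.allowed_audiences

# I share my location with friends, but not with my employer or the public.
mine = LocationReport("danny", 42.36, -71.09, allowed_audiences={"friends"})

assert may_reshare(mine, "friends")
assert not may_reshare(mine, "employer")  # a friend's mapping tool should refuse
assert not may_reshare(mine, "public")
```

The hard part, of course, is not the check itself but getting every downstream service to carry the restriction along with the data and honor it.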

There’s clearly a lot to think about with these new services. I hope that we can approach this from the perspective that lots of location data will be flowing around, and realize that the big challenge is to develop social, technical and legal tools to be sure that it is not misused.

What to do about Google and Doubleclick? Hold Google to its word with some Extreme fact-finding about privacy practices

Submitted by Danny Weitzner on Mon, 2007-10-08 11:24. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

The proposed merger between Google and Doubleclick has raised hackles among those concerned about potential domination of the online advertising marketplace (especially Microsoft), but even more worry among privacy advocates. After a short talk over the weekend with a friend, Peter Swire, a thoughtful and knowledgeable privacy scholar, I came to the view that regulators have to develop a new, robust and scalable means of keeping track of what large data handlers such as Google are actually doing with personal information. (While the conversation with Peter was quite stimulating, I don’t know whether or not he agrees with what I’ve written here.) The mechanisms that exist today to help users make informed choices and policy makers set sound directions are simply inadequate to answer the kinds of questions posed by the Google-Doubleclick deal. Instead of formal, highly negotiated and scripted hearings, we need a much more open, flexible process in which technical experts and the interested public are able to ask detailed questions about current practices. This is not a criticism of either US or EU regulators. On both sides of the Atlantic there is a fine tradition of EU Data Protection Commissions and the US Federal Trade Commission engaging in careful and thoughtful probes of privacy-sensitive activities. However, these processes often take too long and end up producing results that are quite out of date. A lot of energy goes into addressing last year’s data handling practices, by which time the leading edge of the industry has moved on.

In the 1990s, the FTC under Christine Varney’s leadership pushed operators of commercial websites to post policies stating how they handle personal information. That was an innovative idea at the time, but the power of personal information processing has swamped the ability of a static statement to capture the privacy impact of sophisticated services, and the level of generality at which these policies tend to be written often obscures the real privacy impact of the practices described. It’s time for regulators to take the next step and assure that both individuals and policy makers have the information they need.

So, as part of investigating the Google-Doubleclick merger, regulators should appoint an independent panel of technical, legal and business experts to help them review, on an ongoing basis, the privacy practices of Google. Key components of this process should be:

  • expert panel made up of those with technical, legal and business expertise from around the world
  • public hearings at which Google technical experts are available to answer questions about operational details of personal data handling
  • questions submitted by the public and organized in advance by the expert panel
  • staff support for the panel from participating regulatory agencies
  • real-time publication of questions and answers
  • an annual report summarizing what the panel has learned

The Internet open source and open standards communities have learned a lot over the last decade about how to use the Web to facilitate open, collaborative and often rapid development of new technologies. Web users reap the benefit of these open processes with easy access to high-quality software. Indeed, the very infrastructure of the Web and the Internet has been largely developed in this sort of open, extreme technology development process. Making public policy is different from developing technical designs, but the in-depth fact-finding that is needed to make sound policy decisions could benefit a lot from the open, collaborative, online information gathering and sifting process that we already use for Web technology development. Of course, this would not supplant the traditional policy making role of regulators. Rather, this process would serve as a fact-gathering process to help inform regulators. If everyone were feeling really ambitious, perhaps there could even be cooperation between the various regulators around the world with a commitment to study the results from this process. Despite differences in privacy policy in different parts of the world, there has been an impressive record of informal cooperation, especially at the staff level, amongst various privacy regulators around the world. This could be a good next step to take in that direction.

By way of background, regulators in the US (Federal Trade Commission) and Europe (Article 29 Working Party, representing the EU’s Data Protection Authorities) are investigating both antitrust and privacy questions regarding the merger. The key privacy concern seems to be that Google would take all of the personal information it has about users (search terms, IP addresses, contents of email, location from map applications, etc.) and combine it with the personal data that Doubleclick has (demographics, click stream data from ads served) and end up with a REALLY powerful private surveillance machine.

Google says that they care about their users’ privacy rights and would never abuse the newfound power they propose to acquire. According to Nicole Wong:

“User, advertiser and publisher trust is paramount to the success of our business and to the success of our acquisition…. We can’t imagine taking any actions that would undermine these relationships or the trust people have in using our products and service.” (Washington Post, 20 April 2007)

But the question is: how will either policy makers or users know that their trust is being violated or pushed to an extreme that they’re not comfortable with? Google, to its credit, sees the need to provide more information about what it does with personal data. In testifying before the United States Senate, Google’s chief lawyer, David Drummond, said:

We are also exploring other ways to create more transparency in our privacy practices and policies. We have a lot of information about our privacy practices on our website, and we’re making that information even more accessible to users by adding video-format “tutorials” to help users understand privacy issues online in plain English. The first of these video tutorials has been viewed about 43,500 times on YouTube, and the second video launched earlier this week and has already been viewed hundreds of times.

But will expanded privacy policies and videos really be enough to help users make sound decisions? Privacy regulations place a large, and I believe unsustainable, burden on users to learn the details of how services such as Google use their personal information and then weigh the current benefit of the service against the perceived privacy cost. There is mounting evidence that people will trade off a lot of future privacy risk in exchange for current convenience. I doubt that simply presenting users with more and more choices will help us arrive at a privacy policy that is sound in the long run. For example, some privacy advocates (EPIC) demand that Google be required to get explicit permission from all of those who have Doubleclick cookies before the information associated with those cookies can be used together with personal information from Google. EPIC also asks that a lot of other information about Google’s information handling practices be made available to users, consistent with traditional privacy notions of notice and access to personal information.

Imagine the question that Google might ask when seeking permission from a user to associate their Doubleclick cookie with Google data in a mobile search application:

Google Dialog Box (FAKE): We’d like to use some of the demographic information we have about you to give you more accurate, convenient directions on your mobile phone. We will also use this data to target ads to you, just like we do with your GMail account. Click ‘Yes’ to agree, or ‘No’ and then you’ll be asked to type the latitude and longitude of your ten favorite locations.

The query may not be so extreme, but the idea will be the same.

So my view is that users could use a bit of help making these decisions. That help ought to come in the form of some baseline rules about how personal information can and cannot be used. The days of saying that all users need is ‘free choice’ are over. Of course, the problems discussed here with respect to Google apply equally to many other services on the Web that handle personal information. Google and its merger proposal present a good opportunity to start figuring out some of these questions, but the process and the answers would be applicable to many others as well. In order to figure out what policies should actually govern how data is used, a careful and ongoing investigation of Google’s practices, with the help of the independent board I have suggested above, would be a good place to start.

New Commenting Policy

Submitted by ryanlee on Tue, 2007-10-02 12:57. ::

I've added a new commenting policy to combat our OpenID-based spammers. It's a whitelist based on FOAF. There will be more about it in this space as development moves forward, so stay tuned if you'd like to know how to be placed on the whitelist.

As for specifics, the whitelist implementation is a Drupal module that reads a list of OpenIDs from an externally generated set every hour. The user's OpenID is checked against the whitelist at login time, and matches are allowed to proceed with account creation, commenting, etc.
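The actual module is Drupal (PHP), but the control flow it describes can be sketched in a few lines of Python; the names here are hypothetical, and as in the real module, the whitelist itself is generated externally and only re-read periodically:

```python
import time

class OpenIDWhitelist:
    """Cache an externally generated OpenID whitelist, refreshing it on a TTL
    (the Drupal module re-reads its list every hour)."""
    def __init__(self, fetch, ttl=3600):
        self.fetch = fetch        # callable returning the current OpenID list
        self.ttl = ttl
        self._ids = set()
        self._loaded = 0.0        # forces a fetch on first use

    def allows(self, openid):
        if time.time() - self._loaded > self.ttl:
            self._ids = set(self.fetch())
            self._loaded = time.time()
        return openid in self._ids

# At login time: whitelisted OpenIDs may proceed to account creation,
# commenting, etc.; everyone else is turned away.
wl = OpenIDWhitelist(lambda: ["https://alice.example/", "https://bob.example/"])
assert wl.allows("https://alice.example/")
assert not wl.allows("https://spammer.example/")
```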

Potential improvements:

  • a UI for settings
  • better database transaction flow, particularly for error handling
  • viewable whitelist

Feel free to contact me if you're interested in the module portion of this equation for use with your own openid.module (again, it does none of the whitelist generation).

Soccer schedules, flight itineraries, timezones, and python web frameworks

Submitted by connolly on Wed, 2007-09-12 17:17. ::

The schedule for this fall soccer season came out August 11th. I got the itinerary for the trip I'm about to take on July 26. But I just now got them synchronized with the family calendar.

The soccer league publishes the schedule in somewhat reasonable HTML; to get that into my sidekick, I have a Makefile that does these steps:

  1. Use tidy to make the markup well-formed.
  2. Use 100 lines of XSLT (soccer-schedfix.xsl) to add hCalendar markup.
  3. Use glean-hcal.xsl to get RDF calendar data.
  4. Use to upload the calendar items via XMLRPC to the danger/t-mobile service, which magically updates the sidekick device.

But oops! The timezones come out wrong. Ugh... manually fix the times of 12 soccer games... better than manually keying in all the data... then sync with the family calendar. My usual calendar sync Makefile does the following:

  1. Use to download the calendar and task data via XMLRPC.
  2. Use to filter by category=family, convert from danger/sidekick/hiptop conventions to iCalendar standard conventions, and pour the records into a kid template to produce RDF Calendar (and hCalendar).
  3. Use to convert RDF Calendar to .ics format.
  4. Upload to family WebDAV server using curl.

Then check the results on my mac to make sure that when my wife refreshes her iCal subscriptions it will look right.

Oh no! The timezones are wrong again!

The sidekick has no visible support for timezones, but the start_time and end_time fields in the XMLRPC interface are in Z/UTC time, and there's a timezone field. However, after years with this device, I'm still mystified about how it works. The Makefile approach is not conducive to tinkering at this level, so I worked on my REST interface until it had crude support for editing records (using JSON syntax in a form field). What I discovered is that once you post an event record with a mixed-up timezone, there's no way to fix it. When you use the device UI to change the start time, it looks OK, but the Z time via XMLRPC is then wrong.

So I deleted all the soccer game records, carefully factored the danger/iCalendar conversion code out of into for ease of testing, and got it working for local Chicago-time events.

Then I went through the whole story again with my itinerary. Just replace tidy and soccer-schedfix.xsl with to get the itinerary from SABRE's text format to hCalendar:

  1. Upload itinerary to the sidekick.
  2. Manually fix the times.
  3. Sync with iCal. Bzzt. Off by several hours.
  4. Delete the flights from the sidekick.
  5. Work on some more.
  6. Upload to the sidekick again. Ignore the sidekick display, which is right for the parts of the itinerary in Chicago, but wrong for the others.
  7. Sync with iCal. Win!

I suppose I'm resigned that the only way to get the XMLRPC POST/upload right (the stored Z times, at least, if not the display) is to know what timezone the device is set to when the POST occurs. Sigh.

A March 2005 review corroborates my findings:

The Sidekick and the sync software do not seem to be aware of time zones. That means that your PC and your Sidekick have to be configured for the same time zone when they synchronize, else your appointments will be all wrong.

is about my 5th iteration on this idea of a web server interface to my PDA data. It uses WSGI and JSON and Genshi, following Joe G's stuff. Previous iterations include:

  1. - quick n dirty perl hack (started April 2001)
  2. - screen scraping (Dec 2002)
  3. - XMLRPC with a python shelf and hardcoded RDF/XML output (Feb 2004)
  4. - conversion logic in python with kid templates and SPARQL-like filters over JSON-shaped data (March 2006)
It's pretty raw right now, but fleshing out the details looks like fun. Wish me luck.

Technical standards and role of democracy

Submitted by Danny Weitzner on Tue, 2007-09-04 09:33. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

The pitched lobbying battle over whether the Microsoft Office Open XML specification should be recognized as an international technical standard has brought to the mainstream press in the US the important, though generally obscure, question of just how global information technology standards ought to be set. The debate has gotten a lot of attention because of accusations of ‘vote buying,’ something that everyone can understand and have an opinion about, but I think this misses the real issue: Is the traditional ‘one-country, one-vote’ approach to setting technical standards that we have inherited from the 19th century the best way to set global information technology standards for the Internet, the Web, and other widely used pieces of information infrastructure?

Here’s some background: This morning’s New York Times reports (”Microsoft Favored to Win Open Document Vote,” Kevin O’Brien, 4 Sept 2007) that

Amid intense lobbying, Microsoft is expected to squeak out a victory this week to have its open document format, Office Open XML, recognized as an international standard, people tracking the vote said Monday….“After what basically has amounted to unprecedented lobbying, I think that Microsoft’s standard is going to get the necessary amount of support,” said Pieter Hintjens, president of Foundation for a Free Information Infrastructure, a Brussels group that led the opposition.

But a recent report in PC Magazine (”ISO votes to reject Microsoft’s OOXML as standard,” Peter Sayer, 4 Sept 2007) indicates that

Microsoft Corp. has failed in its attempt to have its Office Open XML document format fast-tracked straight to the status of an international standard by the International Organization for Standardization.

The proposal must now be revised to take into account the negative comments made during the voting process.

This early report is confirmed by the Wall Street Journal (”Microsoft Fails to Win Approval On File Format for Office,” CHARLES FORELLE, 4 Sept 2007)

All of this follows reports that Microsoft was engaging in vote-buying (”Sweden’s OOXML vote declared invalid,” Infoworld, Martin Wallström, 31 August 2007) and (”Microsoft Memo to Partners in Sweden Surfaces: Vote Yes for OOXML - Updated,” Groklaw, 29 August 2007), so the question has become whether or not the process was fair.

This Tammany Hall-style debate really misses the point, though. Of course it’s ethically objectionable for any participant in this process to ‘buy’ or offer incentives for votes. But let’s suppose that none of that sort of behavior happened: does the world really arrive at the best IT standards by taking a vote of all countries on the planet? Broad participation in standards development is vital, but the experience of the Internet and the Web suggests that the primary mode of participation should be software developers writing programs, not governments casting votes. The Internet (through the IETF) and the World Wide Web (through W3C), along with lots of valuable open source software, have evolved into global standards through a much more bottom-up, consensus-based process that sets standards based on a much more meritocratic, substantive assessment of which parts of which technical specifications are actually used to make systems interoperable. When it comes to Internet and Web standard setting, we don’t just take a vote to anoint a design as a standard; we combine working groups developing specifications with a requirement that the features proposed for standardization are actually implemented, widely used, and have been demonstrated to be interoperable across a range of products and services. (At W3C and (sometimes) OASIS we also require that everyone who participates in the standards setting process make assurances that the standard can be implemented without paying patent license fees.)

The Internet and the Web have grown into truly open platforms because of a process that grants ’standards’ status to technology AFTER it has proven that it has consensus support behind it and that it is actually the basis for interoperability. The strength of this process is that design ideas are subjected to the test of user acceptance and the marketplace. It’s clear that this sort of technical scrutiny, rather than vote-counting, would serve the process well. As the Wall Street Journal article explains, the underlying dispute is really a set of technical questions:

Those opposed to Open XML say it isn’t really open at all — that it is actually so complex and so loaded with Microsoft-specific features that no one but Microsoft can use it fully. Critics also allege technical failings and say the format needlessly duplicates an existing format, called Open Document, used by IBM and many open-source programmers.

Microsoft says it has opened up the Office formats to encourage competition and interoperability, not squelch it. Open XML should be a standard in addition to Open Document, Microsoft argues, because Open XML allows for more features.

The way to figure out whether OOXML is good for interoperability or not is to see whether independent developers actually use it.

Is there politics in this process? Of course! But the decisions to be made have to do with assessment of implementation experience, as opposed to vote counting. This is in contrast to the traditional standards process, which accords the label ’standard’ based on a vote of country representatives. Democracy is a great thing for government decisions, but not a great way to design new technology.

Does broadband speed matter?

Submitted by Danny Weitzner on Wed, 2007-08-29 10:25. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

The Washington Post reports this morning (Japan’s Warp-Speed Ride to Internet Future, 29 August 2007) that in Japan

Broadband service here is eight to 30 times as fast as in the United States — and considerably cheaper. Japan has the world’s fastest Internet connections, delivering more data at a lower cost than anywhere else, recent studies show.

And that the comparative disadvantage faced by Internet users in the US means that

The speed advantage allows the Japanese to watch broadcast-quality, full-screen television over the Internet, an experience that mocks the grainy, wallet-size images Americans endure.

The clear message is that any country without widely deployed, fast broadband Internet access will fall behind in the competitive marketplace for developing new Internet services.

While it’s clear that Japanese users do actually have access to more affordable, high bandwidth access, it’s less clear to me that this will result in the US falling behind in the application/service development arena. The one example that the article cites for this proposition that Japan is racing ahead is that new telemedicine services are now possible:

The burgeoning optical fiber system is hurtling Japan into an Internet future that experts say Americans are unlikely to experience for at least several years.

Shoji Matsuya, director of diagnostic pathology at Kanto Medical Center in Tokyo, has tested an NTT telepathology system scheduled for nationwide use next spring.

It allows pathologists — using high-definition video and remote-controlled microscopes — to examine tissue samples from patients living in areas without access to major hospitals. Those patients need only find a clinic with the right microscope and an NTT fiber connection.

“Before, we did not have the richness of image detail,” Matsuya said, noting that Japan has a severe shortage of pathologists. “With this equipment, I think it is possible to make a definitive remote diagnosis of cancer.”

Japan’s leap forward, as the United States has lost ground among major industrialized countries in providing high-speed broadband connections, has frustrated many American high-tech innovators.

Is it really the case that this is impossible in the US? After all, what’s required for this sort of remote diagnostic service is just broadband to the clinic with the microscope. Surely if a clinic can afford one of these microscopes, then it shouldn’t be too hard to pay for the added cost of broadband service, even if it’s a bit more expensive in the US than in Japan. What’s more, it’s not clear that many such new applications require the super-high bandwidth offered by fiber. A one- or two-minute delay in getting the microscope image from clinic to remote doctor hardly seems ‘fatal’.

All of the statistics about US broadband lag cited in this article and elsewhere refer to overall residential broadband penetration. But applications such as telemedicine don’t depend on universal broadband access. In the end I find the arguments behind the “we’re falling behind” panic to be a little thin.

More than bandwidth, I really worry about lack of openness in the operation of our Internet infrastructure. We’ve seen pretty extraordinary innovation in Web-based services over the last decade and I’d argue that the key to this has been open, non-discriminatory provision of Internet access services.

“The experience of the last seven years shows that sometimes you need a strong federal regulatory framework to ensure that competition happens in a way that is constructive,” said Vinton G. Cerf, a vice president at Google…. The opening of Japan’s copper phone lines to DSL competition launched a “virtuous cycle” of ever-increasing speed, said Cisco’s [Robert] Pepper. The cycle began shortly after Japanese politicians — fretting about an Internet system that in 2000 was slower and more expensive than what existed in the United States — decided to “unbundle” copper lines.
In the United States, a similar kind of competitive access to phone company lines was strongly endorsed by Congress in a 1996 telecommunications law. But the federal push fizzled in 2003 and 2004, when the Federal Communications Commission and a federal court ruled that major companies do not have to share phone or fiber lines with competitors. The Bush administration did not appeal the court ruling.

So, if there’s a choice between more bandwidth or more openness, I’m for the open platform.

Units of measure and property chaining

Submitted by connolly on Tue, 2007-07-31 13:42. ::

We're long overdue for standard URIs for units of measure in the Semantic Web.

The SUMO stuff has a nice browser (e.g. see meter), a nice mapping from wordnet, and nice licensing terms. Of course, it's not RDF-native. In particular, it uses n-ary relations in the form of functions of more than one argument; 1 hour is written (&%MeasureFn 1 &%HourDuration). I might be willing to work out a mapping for that, but other details in the KIF source bother me a bit: a month is modelled conservatively as something between 28 and 31 days, but a year is exactly 365 days, despite leap-years. Go figure.

There's a nice Units in MathML note from November 2003, but all the URIs are incomplete, e.g. http://.../units/yard .

The Sep 2006 OWL Time Working Draft has full URIs such as, but its approach to n-ary relations is unsound, as I pointed out in a Jun 2006 comment.

Tim sketched the Interpretation Properties idiom back in 1998; I don't suppose it fits in OWL-DL, but it appeals to me quite a bit as an approach to units of measure. He just recently fleshed out some details in Units of measure are modelled as properties that relate quantities to magnitudes; for example:

 track length [ un:mile 0.25].

This Interpretation Properties approach allows us to model composition of units in the natural way:

W is o2:chain of (A V).

where o2:chain is like property chaining in OWL 1.1 (we hope).

Likewise, inverse units are modelled as inverse properties:

s a Unit; rdfs:label "s".
hz rdfs:label "Hz"; owl:inverseOf s.

Finally, scalar conversions are modelled using product; for example, mile is defined in terms of meter like so:

(m 0.0254) product inch.
(inch 12) product foot.
(foot 3) product yard.
(yard 22) product chain.
(chain 10) product furlong.
(furlong 8) product mile.

I supplemented his ontology with some test/example cases, unit_ex.n3, and then added a few rules to flesh out the modelling. These rules convert between meters and miles:

# numeric multiplication associates with unit multiplication
{ (?U1 ?S1) un:product ?U2.
(?U2 ?S2) un:product ?U3.
(?S1 ?S2) math:product ?S3
} => { (?U1 ?S3) un:product ?U3 }

# scalar conversions between units
{ ?X ?UNIT ?V.
(?BASE ?CONVERSION) un:product ?UNIT.
(?V ?CONVERSION) math:product ?V2.
} => { ?X ?BASE ?V2 }.

Put them together and out comes:

    ex:track     ex:length  [
:chain 20.0;
:foot 1320.0;
:furlong 2.0;
:inch 15840.0;
:m 402.336;
:mile 0.25;
:yard 440.0 ] .
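The same chain of scalar conversions can be checked with ordinary arithmetic. Each pair below mirrors one of the product statements (meters per inch, inches per foot, and so on); the function name is mine, not part of the ontology:

```python
# Each (unit, factor) pair reads "previous unit * factor = this unit",
# mirroring "(m 0.0254) product inch", "(inch 12) product foot", etc.
factors = [("inch", 0.0254), ("foot", 12), ("yard", 3),
           ("chain", 22), ("furlong", 10), ("mile", 8)]

def meters_per(unit):
    """Multiply the chain of conversion factors from meters up to `unit`."""
    scale = 1.0
    for name, factor in factors:
        scale *= factor
        if name == unit:
            return scale
    raise ValueError(unit)

track_miles = 0.25
assert abs(meters_per("mile") - 1609.344) < 1e-9
# 0.25 mile is 402.336 m, matching the :m value in the inferred output,
# and 1320 feet, matching :foot.
assert abs(track_miles * meters_per("mile") - 402.336) < 1e-9
assert abs(track_miles * meters_per("mile") / meters_per("foot") - 1320.0) < 1e-6
```

This is exactly what the two N3 rules do: the first collapses chained product factors, and the second applies the collapsed factor to rewrite a quantity from one unit property to another.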

The rules I wrote for pushing conversion factors into chains aren't fully general, but they work in cases like converting from this:

(un:foot un:hz) o2:chain fps.
bullet speed [ fps 4000 ].

to this:

    ex:bullet     ex:speed  [
ex:fps 4000;
:mps 1219.2 ] .

As I say, I find this approach quite appealing. I hope to discuss it with people working on units of measure in development of a Delivery Context Ontology.

More on privacy issues with Apple's DRM-less iTunes Plus

Submitted by Danny Weitzner on Sun, 2007-06-03 21:14. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

There’s been more discussion of Apple iTunes Plus DRM-less music and its practice of embedding personal account information into the tracks that are sold without copy protection. I’ve earlier expressed my support for this accountability approach to copyright protection, as opposed to burdensome DRM systems. However, privacy complaints (BBC, Anger over DRM-free iTunes tracks) are appearing over the use of personal information in this way.

Looking through Apple’s privacy policy (updated 23 December 2004) and iTunes terms of service (updated 30 May 2007), I found no mention of this otherwise hidden use of personal information. The terms of service do say:

(xii) iTunes Plus Products do not contain security technology that limits your usage of such Products, and Usage Rules (iii) – (vi) do not apply to iTunes Plus Products. You may copy, store and burn iTunes Plus Products as reasonably necessary for personal, noncommercial use.

Seems that this would have been a good place to indicate the new use of users’ information. A simple notice here could point out that passing tracks around, which appears to be permitted as long as it is for “personal, non-commercial use,” also results in having your personal information passed around. Perhaps I missed this, or perhaps Apple plans to add it. I’m going to ask around to get clarification.

Update: EFF and O’Reilly also report that the iTunes files may have individual differences (that could allegedly be used for individual tracking) even beyond the personal information that is visible.

A glimpse of sanity in the online copyright arena

Submitted by Danny Weitzner on Thu, 2007-05-31 21:56. ::

The original appearance of this entry was in Danny Weitzner - Open Internet Policy

With Apple’s announcement of DRM-free music downloadable through iTunes, it appears that we may actually be heading toward a sane, scalable approach to copyrighted commercial content on the Web. Tracks from EMI and other music publishers can now be purchased in two versions, a locked up version for the usual 99 cents or a higher-quality and DRM-free version for $1.29. I got an entire album (Jacqueline Du Pre playing the Dvořák & Elgar Cello Concertos with the Chicago Symphony) for a mere $9.95 in unlocked form.

As several observers have pointed out, these DRM-free tracks do come with a catch — your name is embedded inside the MPEG-4 file so that if you decide to casually share these files around with your hundred thousand closest friends on the Net (exactly the result that DRM has tried, unsuccessfully, to prevent) then you’re at some risk of getting caught and of having personal information spread around the Net with your illegally-copied files. Following some instructions from an independent Apple news blog, I was able to verify that my name was put into these files upon being downloaded:

[Daniel-Weitzners-Computer:iTunes Music/...] djweitzn% strings *.m4a | grep name
nameDaniel Weitzner
nameDaniel Weitzner
nameDaniel Weitzner
nameDaniel Weitzner

In addition to my name it appears that my .mac account id, through which I purchased the tracks, was also included.

The big news here goes beyond just copyright. Apple has decided to jettison heavyweight DRM enforcement in favor of an approach that allows the free flow of data with back-end accountability. I believe this is just one step in a larger trend toward what I’ve been calling ‘accountable systems.’

An exclusive reliance on access restrictions such as DRM leads to technology and policy perspectives where information, once revealed, is completely uncontrolled. It’s like focusing all one’s attention on closing the barn door and ignoring what might happen to the horses after they’ve escaped. The reality is that even when information is widely available, society has interests in whether or not that information is used appropriately. Information policies should reflect those interests, and information technology should support those policies.

In research we’ve been doing on accountable systems approaches to privacy and copyright, we seek an alternative to the “hide it or lose it” approach that currently characterizes policy compliance on the Web. Our alternative is to design systems that are oriented toward information accountability and appropriate use, rather than information security and access restriction. I think what Apple is doing here will come to be seen as an early step in a large-scale transformation in how we approach a wide variety of policy issues on the Web.

Watch this space for more.

Linked Data at WWW2007: GRDDL, SPARQL, and Wikipedia, oh my!

Submitted by connolly on Thu, 2007-05-17 16:29. ::

Last Tuesday, TimBL started to gripe that the WWW2007 program had lots of stuff that he wanted to see all at the same time; we both realized pretty soon: that's a sign of a great conference.

That afternoon, Harry Halpin and I gave a GRDDL tutorial. Deploying Web-scale Mash-ups by Linking Microformats and the Semantic Web is the title Harry came up with... I was hesitant to be that sensationalist when we first started putting it together, but I think it actually lived up to the billing. It's too bad last-minute complications prevented Murray Maloney from being there to enjoy it with us.

For one thing, GRDDL implementations are springing up all over. I donated my list to the community as the GrddlImplementations wiki topic, and when I came back after the GRDDL spec went to Candidate Recommendation on May 2, several more had sprung up.

What's exciting about these new implementations is that they go beyond the basic "here's some RDF data from one web page" mechanism. They're integrated with RDF map/timeline browsers, and SPARQL engines, and so on.

The example from the GRDDL section of the semantic web client library docs (by Chris Bizer, Tobias Gauß, and Richard Cyganiak) is just "tell me about events on Dan's travel schedule" but that's just the tip of the iceberg: they have implemented the whole LinkedData algorithm (see the SWUI06 paper for details).

With all this great new stuff popping up all over, I felt I should include it in our tutorial materials. I'm not sure how long OpenLink Virtuoso has had GRDDL support (along with database integration, WEBDAV, RSS, Bugzilla support, and on and on), but it was news to me. But I also had to work through some bugs in the details of the GRDDL primer examples with Harry (not to mention dealing with some unexpected input on the HTML 5 decision). So the preparation involved some late nights...

I totally forgot to include the fact that Chime got the Semantic Technologies conference web site using microformats+GRDDL, and Edd did likewise with XTech.

But the questions from the audience showed they were really following along. I was a little worried when they didn't ask any questions about the recursive part of GRDDL; when I prompted them, they said they got it. I guess verbal explanations work; I'm still struggling to find an effective way to explain it in the spec. Harry followed up with some people in the halls about the spreadsheet example; as mnot said, Excel spreadsheets contain the bulk of the data in the enterprise.

One person was even following along closely enough to help me realize that the slide on monotonicity/partial understanding uses a really bad example.

The official LinkedData session was on Friday, but it spilled over to a few impromptu gatherings; on Wednesday evening, TimBL was browsing around with the tabulator, and he asked for some URIs from the audience, and in no time, we were browsing proteins and diseases, thanks to somebody who had re-packaged some LSID-based stuff as HTTP+RDF linked data.

Giovanni Tummarello showed a pretty cool back-link service for the Semantic Web. It included support for finding SPARQL endpoints relevant to various properties and classes, a contribution to the serviceDescription issue that the RDF Data Access Working Group postponed. I think I've seen a few other related ideas here and there; I'll try to put them in the ServiceDescription wiki topic when I remember the details...

Chris Bizer showed that dbpedia is the catalyst for an impressive federation of linked data. Back in March 2006, Toward Semantic Web data from Wikipedia was my wish into the web, and it's now granted. All those wikipedia infoboxes are now out there for SPARQLing. And other groups are hooking up musicbrainz and wordnet and so on. After such a long wait, it seems to be happening so fast! 

Speaking of fast, the Semantic MediaWiki project itself is starting to do performance testing with a full copy of wikipedia, Denny told us on Friday afternoon in the DevTrack.

Also speaking of fast, how did OpenLink go from not-on-my-radar to supporting every Semantic Web Technology I have ever heard of in about a year? I got part of the story in the halls... it started with ODBC drivers about a decade ago, which explains why their database integration is so good. Kingsley, here's hoping we get to play volleyball sometime. It's a shame we had just a few short moments together in the halls...

tags: (photos), grddl, www2007, travel