I’ve loved the idea of FOAF for a long time but have always been bothered by the privacy risks that would result if FOAF really took off as a way to represent our social networks. Here’s an idea about how to address privacy in open social networks such as those represented by FOAF-like data structures.
It’s called (for now) ReP: Reciprocal Privacy for Social Networks
ReP is a proposal to establish a reasonable privacy balance in social networking environments. Today, more and more social networks are coming onto the Web and are working to share more data across the boundaries that have previously separated these networks. Participants in social networks should have the benefit of widely shared agreements about how the information they present in those networks will be analyzed and used. To encourage the development of these social and legal privacy norms, we need a simple policy language for expressing rules associated with personal information, and a reliable, scalable mechanism for holding users of that information accountable to those rules. We propose a new protocol by which those who share personal information on the Web can have increased confidence that this information will be used in a transparent manner and that users of the personal information can be held accountable for complying with the stated usage rules.
Privacy policies and associated technologies must provide individuals harmed by breaches with legal recourse against those who abuse the norms of information usage. Hence, agreements must be clear and structured so that the existing legal system has a chance of providing a remedy for harm. We should neither expect nor require that a single set of norms will be adequate for all users, all social networking contexts, or all cultures, but there should be a common framework and a basic policy vocabulary that can express commonly used rules and be easily extended.
This copyleft-inspired viral policy is the most effective way to ensure that the original rules associated with personal data are respected as that data is re-used over and over again in a variety of contexts. In the event of misuse, the logs will provide a means to locate the mis-user and seek correction or other redress. If a use of personal information is discovered that is NOT recorded in the person’s accountability log, that use is by definition a violation of the ReP policy. In many cases where such unauthorized use does real harm to the data subject, it will be possible, with some forensic effort, to find the mis-user and enable redress. Of course, there will be anonymous mis-users of personal information. ReP cannot insulate Web users from those risks, but neither can any other privacy protection strategy that is feasible in an inherently open information environment.
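To make the accountability-log idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical, since the ReP technology hasn't been built yet: `PersonalDatum`, `record_use`, `derive`, and `is_violation` are illustrative names, and a real log would need to be distributed and tamper-evident rather than a Python list. It shows the two properties the paragraph relies on: derived data inherits the original policy and log (the copyleft-style part), and any use missing from the log counts as a violation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PersonalDatum:
    """Hypothetical model: a piece of personal data carrying its usage policy and log."""
    subject: str
    value: str
    allowed_purposes: set[str]
    log: list[tuple[str, str]] = field(default_factory=list)  # (user, purpose) entries


def record_use(datum: PersonalDatum, user: str, purpose: str) -> None:
    """Log a use in the subject's accountability log; refuse uses the policy forbids."""
    if purpose not in datum.allowed_purposes:
        raise PermissionError(f"{purpose!r} is not permitted for {datum.subject}'s data")
    datum.log.append((user, purpose))


def derive(datum: PersonalDatum, new_value: str) -> PersonalDatum:
    """Copyleft-style: derived data inherits the same policy and shares the same log."""
    return PersonalDatum(datum.subject, new_value, datum.allowed_purposes, datum.log)


def is_violation(datum: PersonalDatum, user: str, purpose: str) -> bool:
    """A discovered use that is absent from the accountability log is, by definition, a violation."""
    return (user, purpose) not in datum.log
```

For example, after `record_use(email, "bob", "contact")`, a copy made with `derive(email, ...)` still shows Bob's use as logged, while an unlogged "marketing" use by anyone is flagged.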
There’s more to read in a skeletal ReP design document.
The policy is still rough and the technology hasn’t been built yet, but I’d still really like reactions.
I have a new bookmark. No, not a del.icio.us bookmark; not some bits in a file. This is the kind you have to go there to get... go to Cleveland, that is. It reads:
for your love & support
for the Ikpia & Ogbuji families
At this time of real need.
We will never forget
Imose, Chica, & Anya
Abundant Life International Church
Highland Heights, OH
After working with Chime for a year or so on the GRDDL Working Group (he was the difference between a hodge-podge of test files and a nicely organized GRDDL Test Cases technical report), I was really excited to meet him at the W3C Technical Plenary in Cambridge in early November. His Fuxi work is one of the best implementations of the way I think semantic web rules and proofs should go. When he told me some people didn't see the practical applications, it made me want to fly there and tell them how I think this will save lives and end world hunger.
So this past Tuesday, when I read the news about his family, the only way I could make my peace with it was to go and be with him. I can only imagine what he is going through. Eric Miller and Brian and David drove me to the funeral, but the line to say hi to the family was too long. And the interment service didn't really provide an opportunity to talk. So I was really glad that after I filled my plate at the reception, a seat across from Chime and Roschelle opened up for me and I got to sit and share a meal with them.
Grandpa Linus was at the table, too. His eulogy earlier at the funeral ended with the most inspiring spoken rendition of a song that I have ever heard:
Now The Hacienda's Dark The Town Is Sleeping
Now The Time Has Come To Part The Time For Weeping
Vaya Con Dios My Darling
Vaya Con Dios My Love
Seems that a number of villages in the English countryside are being overrun by errant trans-European trucks which are regularly misdirected by their GPS satnav systems onto roads that were better suited for horse-drawn carriages than big, long-distance trucks. According to the New York Times (“Wedmore Journal: Turn Back. Exit Village. Truck Shortcut Hitting Barrier.” Sarah Lyall, 4 December 2007, p.A7):
trucks and tractor-trailers come here [to Wedmore] all the time, as they do in similarly inappropriate spots across Britain, directed by G.P.S. navigation devices that fail to appreciate that the shortest route is not always the best route. “They have no idea where they are,” said Wayne Hahn, a local store owner who watches a daily parade of vehicles come to grief — hitting fences, shearing mirrors from cars and becoming stuck at the bottom of Wedmore’s lone hill.
The head of the parish council offers a practical suggestion:
John Sanderson, chairman of the parish council, has proposed a seemingly simple remedy: removing the route through Wedmore from the G.P.S. navigation systems used by large vehicles.
“We’d like them to have appropriate systems that would show some routes weren’t suitable for H.G.V.’s,” Mr. Sanderson said, using shorthand for heavy goods vehicles.
Mr. Sanderson said he would not go so far as to advocate eradicating Wedmore from the map.
But others go farther:
“We’ve said, ‘Just take us off the map,’ actually,” said Geoff Coombs, chairman of the parish council in Barrow Gurney, a village that, despite being too small to have a sidewalk, is host to some 15,000 vehicles a day, cars as well as larger vehicles, whose G.P.S. systems identify it as a good alternative route to Bristol Airport.
Semantic web geo-taggers, start your engines. There are lots of ways creative metadata could help here, but my guess is that as the Web gets ‘smarter,’ some of what happens out in the world as a result will seem just plain dumb.
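Here is one small illustration of the kind of metadata Mr. Sanderson is asking for: if road segments carry a tag saying they are unsuitable for heavy goods vehicles, a router can honour it instead of blindly taking the shortest path. This is a minimal sketch using Dijkstra's algorithm, assuming a made-up graph format and an invented tag name, `unsuitable_for_hgv`; the distances are toy numbers, not real roads, and real satnav data models are far richer.

```python
import heapq


def shortest_route(graph, start, goal, vehicle="car"):
    """Dijkstra's shortest path over a road graph whose edges carry metadata tags.

    graph maps node -> list of (neighbor, km, tags). Edges tagged
    "unsuitable_for_hgv" (a hypothetical tag) are skipped when vehicle == "hgv".
    Returns (distance_km, path), or (inf, []) if the goal is unreachable.
    """
    queue = [(0.0, start, [start])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, km, tags in graph.get(node, []):
            if vehicle == "hgv" and "unsuitable_for_hgv" in tags:
                continue  # honour the metadata instead of the raw distance
            if neighbor not in settled:
                heapq.heappush(queue, (dist + km, neighbor, path + [neighbor]))
    return float("inf"), []


# Toy network: the short way goes through a village lane tagged as unsuitable for HGVs.
roads = {
    "junction": [("village", 3.0, {"unsuitable_for_hgv"}), ("bypass", 4.0, set())],
    "village": [("airport", 2.0, set())],
    "bypass": [("airport", 3.0, set())],
}
print(shortest_route(roads, "junction", "airport"))                 # (5.0, ['junction', 'village', 'airport'])
print(shortest_route(roads, "junction", "airport", vehicle="hgv"))  # (7.0, ['junction', 'bypass', 'airport'])
```

The car still takes the short way through the village; the truck is sent round the bypass even though it is longer, which is exactly the "shortest is not always best" point.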
Last week, a Federal Magistrate in Wisconsin published an important opinion articulating limits on the government’s power to demand access to records of individuals’ book-buying activity held by 3rd parties such as Amazon.com. The case (IN RE GRAND JURY SUBPOENA TO AMAZON.COM DATED AUGUST 7, 2006) arose in the course of an FBI/IRS investigation of an individual who sells lots of used books on Amazon and was suspected of large-scale tax evasion. In order to develop the case, the Federal investigators acting through a grand jury:
directed Amazon to provide virtually all of its records regarding D’Angelo, including the identities of the thousands of customers who had bought used books from D’Angelo. The government subsequently chose to reduce this scope of this request to the identification of 120 book buyers, 30 per year for the four years under investigation. The government’s plan was for special agents of the FBI and IRS to contact these 120 used book buyers in an attempt to develop concrete evidence necessary to lay a transactional foundation for criminal charges of fraud and tax evasion against D’Angelo. The government does not suspect Amazon or D’Angelo’s customers of any wrongdoing, nor does it consider them victims of D’Angelo; they simply are bricks in the evidentiary wall being erected by the grand jury.
Rather than comply with the subpoena, Amazon exercised its legal right to move to quash the government’s request. Responding to this motion to quash, the Magistrate acted to protect the First Amendment rights of the buyers whose identities would be revealed if Amazon responded to the subpoena. The Magistrate concluded that “the government is not entitled to unfettered access to the identities of even a small sample of this group of book buyers without each book buyer’s permission.” Hence, he ordered a special procedure by which those Amazon customers who bought from the suspect during the relevant time period would be asked, in a manner that did not reveal their identity, whether they would be willing, on a voluntary basis, to have their records turned over to the government.
In the end, the government withdrew the subpoena altogether, telling the Wisconsin State Journal that it was able to get names by analyzing the suspect’s seized computer.
Beyond the First Amendment rationale offered in this case, even more striking is the Magistrate’s assessment of the public mood with respect to privacy in general in the wake of the Patriot Act and warrantless wiretapping activity.
…[I]t is an unsettling and un-American scenario to envision federal agents nosing through the reading lists of law-abiding citizens while hunting for evidence against somebody else. In this era of public apprehension about the scope of the USAPATRIOT Act, the FBI’s (now-retired) “Carnivore” Internet search program, and more recent highly-publicized admissions about political litmus tests at the Department of Justice, rational book buyers would have a non-speculative basis to fear that federal prosecutors and law enforcement agents have a secondary political agenda that could come into play when an opportunity presented itself. Undoubtedly a measurable percentage of people who draw such conclusions would abandon online book purchases in order to avoid the possibility of ending up on some sort of perceived “enemies list.”
While cautioning (in a footnote) that he did not formally recognize these fears to be well-founded, he nonetheless felt he had to act to limit government power in this case because:
…if word were to spread over the Net–and it would–that the FBI and the IRS had demanded and received Amazon’s list of customers and their personal purchases, the chilling effect on expressive e-commerce would frost keyboards across America. Fiery rhetoric quickly would follow and the nuances of the subpoena (as actually written and served) would be lost as the cyberdebate roiled itself to a furious boil. One might ask whether this court should concern itself with blogger outrage disproportionate to the government’s actual demand of Amazon. The logical answer is yes, it should: well-founded or not, rumors of an Orwellian federal criminal investigation into the reading habits of Amazon’s customers could frighten countless potential customers into canceling planned online book purchases, now and perhaps forever.
There are three very important caveats to add, however. First, this opinion is only that of one Federal magistrate in one district court. It is not binding on any other part of the country, and magistrates often reach widely divergent opinions. Second, we don’t know how this reasoning might apply to a subpoena issued by a private party in civil litigation (say, a divorce lawyer looking to impugn the integrity of an opposing spouse by revealing unsavory reading habits). Finally, as the government dropped its request altogether, this case will never be heard by any other court to be either affirmed or overturned. So, it will hang out there as one view of the privacy problems associated with subpoenas of private information held by 3rd parties.
The Washington Post reports today (“System Lets Agencies In Area Share Data,” Mary Beth Sheridan, Thursday, November 29, 2007; Page B03) that over 60 state, local and federal law enforcement agencies in the Washington DC area announced a plan to share information (including 6 million mug shots and 14 million arrest records).
In what they called a breakthrough, law enforcement officials yesterday unveiled a computer system that will allow more than 60 state and local police agencies in the D.C. area to share mug shots and crime reports.
The system, Law Enforcement Information Exchange (LInX), functions like Google for police, except that the database contains law enforcement information.
This despite the fact that several years ago some civil libertarians criticized an earlier version of this multi-state data sharing system, ominously named MATRIX (Multistate Anti-Terrorism Information Exchange). The ACLU even went so far as to declare MATRIX ‘dead.’ However, it now seems that LInX includes participation from Florida, Georgia, Hawaii, Texas, Virginia, Washington, the DC area, and soon New Mexico.
I don’t know what sort of technology the system is built on but if it’s not Semantic Web Linked Data-style architecture now, it should be and probably soon will be.
On leaving the ICANN Board of Directors after a three-year term, Joi Ito, one of the true leaders of the global Internet/Web community, writes:
Joi Ito’s Web: Three years with ICANN
With all of its tumultuous history and bumps and warts, ICANN, in my opinion, is the best way that we can manage names and numbers on the Internet and any new thing to try to do what it does would be less fair and probably wouldn’t work.
There are some technical architectures and ideas that might make ICANN less relevant, which would be a good thing. However, even relatively obvious things like IPv6, IDNs and DNSSEC are having a hard time getting traction. I think that it would be nearly impossible to “redesign the DNS” and get people to use it. It would be like trying to redesign a flying airplane. On the other hand, there might be some evolutionary changes that make domain names less relevant.
While ICANN must continue to improve its openness and public accountability, I wholeheartedly support Joi’s view. Anyone reading this post or able to follow the link to his original owes ICANN a real ‘thank you.’
Well, it has been a long time since my last post here. So many topics, so little time. Some talks, a couple of Design Issues articles, but no blog posts. To dissipate the worry of expectation of quality, I resolve to lower the bar. More about what I had for breakfast.
So the Graph word has been creeping in. BradFitz talks of the Social Graph, as does Alex Iskold, who discusses social graphs and network theory in general and points out that users want to own their own social graphs. He also points out that examples of graphs are the Internet and the Web. So what's with the Graph word?
Maybe it is because Net and Web have been used. For perfectly good things .. but different things.
The Net we normally use as short for Internet, which is the International Information Infrastructure. Al Gore promoted the National Information Infrastructure (NII) presumably as a political pragma at the time, but clearly it became International. So let's call it III. Let's think about the Net now as an invention which made life simpler and more powerful. It made it simpler because, instead of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, "It isn't the cables, it is the computers which are interesting". The Net was designed to allow the computers to be seen without having to see the cables.
Simpler, more powerful. Obvious, really.
Programmers could write at a more abstract level. Also, there was re-use of the connections, in that, as the packets flowed, a cable which may have been laid for one purpose now got co-opted for all kinds of uses which the original users didn't dream of. And users of the Net, the III, found that they could connect to all kinds of computers which had been hooked up for various reasons, sometimes now forgotten. So the new abstraction gave us more power, and added value by enabling re-use.
The word Web we normally use as short for World Wide Web. The WWW increases the power we have as users again. The realization was "It isn't the computers, but the documents which are interesting". Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really.
Also, it allowed unexpected re-use. People would put a document on the web for one reason, but it would end up being found by people using it in completely different ways. Two delights drove the Web: one of being told by a stranger your Web page has saved their day, and the other of discovering just the information you need and for which you couldn't imagine someone having actually had the motivation to provide it.
So the Net and the Web may both be shaped as something mathematicians call a Graph, but they are at different levels. The Net links computers, the Web links documents.
Now, people are making another mental move. There is realization now, "It's not the documents, it is the things they are about which are important". Obvious, really.
Biologists are interested in proteins, drugs, genes. Businesspeople are interested in customers, products, sales. We are all interested in friends, family, colleagues, and acquaintances. There is a lot of blogging about the strain, and total frustration that, while you have a set of friends, the Web is providing you with separate documents about your friends. One in facebook, one on linkedin, one in livejournal, one on advogato, and so on. The frustration that, when you join a photo site or a movie site or a travel site, you name it, you have to tell it who your friends are all over again. The separate Web sites, separate documents, are in fact about the same thing -- but the system doesn't know it.
There are cries from the heart (e.g. The Open Social Web Bill of Rights) for my friendship, that relationship to another person, to transcend documents and sites. There is a "Social Network Portability" community. It's not the Social Network Sites that are interesting -- it is the Social Network itself. The Social Graph. The way I am connected, not the way my Web pages are connected.
We can use the word Graph, now, to distinguish from Web.
I called this graph the Semantic Web, but maybe it should have been Giant Global Graph! Any worse than WWWW? ;-) Now, the "Semantic Web" term has been established for a long time, and I'm not proposing to change it. But let's think about the graph which it is. (Footnote: "Graph" also happens to be the word the RDF specifications use, but that is by the way. While an XML parser creates a DOM tree, an RDF parser creates an RDF graph in memory.)
So, if only we could express these relationships, such as my social graph, in a way that is above the level of documents, then we would get re-use. That's just what the graph does for us. We have the technology -- it is Semantic Web technology, starting with RDF, OWL, and SPARQL. Not magic bullets, but the tools which allow us to break free of the document layer. If a social network site uses a common format for expressing that I know Dan Brickley, then any other site or program (when access is allowed) can use that information to give me a better service. Un-manacled to specific documents.
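The reason a common format gives re-use is that an RDF graph is just a set of (subject, predicate, object) triples: once two sites describe the same person with the same URI, combining their data is a set union. Here is a minimal stdlib-only sketch. The example.org URIs and the `merge` and `objects` helpers are made up for illustration (only the foaf: namespace URI is real), and a real system would use an RDF library and SPARQL rather than this toy.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical person URIs standing in for real FOAF profiles.
ME = "http://example.org/people/me#i"
DAN = "http://example.org/people/dan#i"
JOI = "http://example.org/people/joi#i"

# Two hypothetical site exports. URIs and literals are both plain strings here,
# a simplification that a real RDF library would not make.
site_a = {(ME, FOAF + "knows", DAN)}
site_b = {
    (ME, FOAF + "knows", JOI),
    (DAN, FOAF + "name", "Dan Brickley"),
}


def merge(*graphs):
    """Union of triple sets; enough here because there are no blank nodes to rename apart."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def objects(graph, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


graph = merge(site_a, site_b)
friends = objects(graph, ME, FOAF + "knows")  # friends known via either site
```

Neither site alone knows both of my friends plus Dan's name; the merged graph does, without either site having been designed with the other in mind.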
I express my network in a FOAF file, and that is a start of the revolution. I blogged on FOAF files earlier, before the major open SNS angst started. The data in a FOAF file can be read by other applications. Photo-sharing, travel sites, sites which accept your input because you are a part of the graph.
The less inviting side of sharing is losing some control. Indeed, at each layer --- Net, Web, or Graph --- we have ceded some control for greater benefits.
People running Internet systems had to let their computer be used for forwarding other people's packets, and connecting new applications they had no control over. People making web sites sometimes tried to legally prevent others from linking into the site, as they wanted complete control of the user experience, and they would not link out as they did not want people to escape. Until after a few months they realized how the web works. And the re-use kicked in. And the payoff started blowing people's minds.
Letting your data connect to other people's data is a bit about letting go in that sense. It is still not about giving to people data which they don't have a right to. It is about letting it be connected to data from peer sites. It is about letting it be joined to data from other applications.
It is about getting excited about connections, rather than nervous.
In the short, what-can-I-code-up-this-afternoon-to-fix-this term, it is about other sites following the lead of my.opera.com, livejournal, advogato, and so on (list) in also exporting a public RDF URI for their members, with whatever information the person would like to share. Right now, this blog re-uses the FOAF data linked to us to fight spam.
In the long term vision, thinking in terms of the graph rather than the web is critical to us making best use of the mobile web, the zoo of wildly differing devices which will give us access to the system. Then, when I book a flight it is the flight that interests me. Not the flight page on the travel site, or the flight page on the airline site, but the URI (issued by the airlines) of the flight itself. That's what I will bookmark. And whichever device I use to look up the bookmark, phone or office wall, it will access a situation-appropriate view of an integration of everything I know about that flight from different sources. The task of booking and taking the flight will involve many interactions. And all throughout them, that task and the flight will be primary things in my awareness, the websites involved will be secondary things, and the network and the devices tertiary.
I'll be thinking in the graph. My flights. My friends. Things in my life. My breakfast. What was that? Oh, yogourt, granola, nuts, and fresh fruit, since you ask.
I just discovered Kindle: Amazon's New Wireless Reading Device. About $10 per e-book sounds ok, but $0.10 to put my own files on it?!?! It can read blogs like Slashdot and boingboing for as little as $.99 per month over the $399 purchase price. It comes with wikipedia. Say... that sounds familiar... where else can I get wikipedia on a device with a nice display that works in daylight...
Håkon brought one to the video panel at the W3C TPAC this month, while the voice of Lawrence Lessig was still ringing in my head: What have we done about it? he asked again and again in his powerful OSCON 2002 talk:
Lawrence Lessig: I have been doing this for about two years--more than 100 of these gigs. This is about the last one. One more and it's over for me. So I figured I wanted to write a song to end it. But then I realized I don't sing and I can't write music. But I came up with the refrain, at least, right? This captures the point. If you understand this refrain, you're gonna' understand everything I want to say to you today. It has four parts:
Creativity and innovation always builds on the past.
The past always tries to control the creativity that builds upon it.
Free societies enable the future by limiting this power of the past.
Ours is less and less a free society.
I don't sing all that well either, but I play a little guitar, so when Håkon walked into the HTML WG meeting as un-conference pitches were next on the agenda, I pitched a jam session. I dedicated the opening number, With a Little Help from My Friends, to Sam Ruby, whose comment prompted me to watch the Lessig show before the trip. The InstantGig was "surreal (but awesome)" according to one account.
Håkon's pitch for open standard video for our cultural heritage inspired One laptop per Kyle, the story of getting an XO-1 for my 8-year-old boy instead of the Windows PC he says he wants in order to play the games that his friends all play. Before the trip he told me that he wants to build a web site with lots and lots of games and I thought "but you're just one little boy." But I think I get it now...
He has a new name, by the way: Burn, as in Rip, Mix, and Burn. Rip, after 1 year of musical training, can sound out the Mario theme on trombone or piano in an afternoon, something I can't do after 20 years of training my mediocre ear. And the middle child, Mix, is so charming that if you stop at a red light, he'll have a new friend before the light turns green.
I have one give-one-get-one package on order for Burn; if you're feeling like a patron of the arts and you want to see what happens if Rip and Mix get one too, feel free to send us a Christmas Card with a little something inside.
And look out for SwordPedestal.com, which Kyle picked out. It's only a dream now, but I have a hunch it may one day rival Nintendo for the hearts and minds of a few million people.
I want something more like the OmniOutliner experience... I want to brainstorm.
But when I'm done, I don't want to tediously copy and paste each field into tracker.
Clearly, I could write some python or XSLT to take OmniOutliner's XML and feed it to tracker afterward, but... can't we do better than that?
What if tabulator's UI were as smooth as OmniOutliner... and what if I could just push one button and get the toothpaste back in the tube, i.e. feed the outline into the tracker's REST interface?
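For the record, the "write some python" route might look roughly like this. It assumes the outline has been exported as OPML (rather than relying on OmniOutliner's native XML schema) and that the tracker accepts JSON issues via POST to a hypothetical endpoint; the `title` and `parent` field names are made up, and the requests are built but not sent.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def outline_items(opml_text):
    """Yield (ancestor_path, text) for every <outline> node in an OPML document."""
    body = ET.fromstring(opml_text).find("body")
    if body is None:
        return

    def walk(node, path):
        for child in node.findall("outline"):
            text = child.get("text", "")
            yield path, text
            yield from walk(child, path + [text])  # children carry their ancestors' text

    yield from walk(body, [])


def tracker_requests(opml_text, endpoint):
    """Build one POST per outline node for a hypothetical JSON tracker API (nothing is sent)."""
    requests = []
    for path, text in outline_items(opml_text):
        payload = {"title": text, "parent": " / ".join(path) or None}
        requests.append(urllib.request.Request(
            endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        ))
    return requests
```

Sending is then a loop over `urllib.request.urlopen(req)`; keeping the parent path lets the tracker preserve the outline's nesting instead of flattening the brainstorm.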
p.s. why am I using emacs to write this? Apple Mail knows IntegrityIsJobOne, but in OS X 10.4, like iCal, it goes off into the weeds eating CPU for inexplicable reasons, and I don't invest debugging effort in stuff that isn't open source.
How do I feed this to breadcrumbs now? Does emacs have a markdown/ReStructuredText mode?
How about AtomPub support? I manually cut and pasted and cleaned up the line breaks. ugh!
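Here's roughly what AtomPub support would buy. In the Atom Publishing Protocol (RFC 5023), a client creates a post by POSTing an Atom entry document, with content type `application/atom+xml;type=entry`, to a collection URI. A sketch, assuming the blog exposed such a collection (the collection URI below is invented); it builds the request without sending it.

```python
import urllib.request
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def atom_entry(title, text, author_name):
    """Serialize a minimal Atom entry (RFC 4287): id, title, updated, author, content."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = author_name
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="text")
    content.text = text  # plain text, so no hand-cleaning of HTML line breaks
    return ET.tostring(entry, encoding="utf-8")


def atompub_post(collection_uri, entry_bytes):
    """Build (but do not send) the POST that creates a new member in an AtomPub collection."""
    return urllib.request.Request(
        collection_uri,
        data=entry_bytes,
        method="POST",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )
```

Usage would be something like `atompub_post("http://blog.example/atom/collection", atom_entry("Mail gripes", text, "me"))`, handed to `urllib.request.urlopen` along with whatever authentication the server requires.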
I use Thunderbird on my PowerBook, but it's totally confused about offline operation. It goes to save to the drafts folder every now and again, but over IMAP... so if the net is flaky or down, (a) it doesn't actually save, and (b) it interrupts my drafting!!! OK... found the config option to use a local drafts folder under Tools/Account settings. (Why not under preferences?) But Thunderbird doesn't do well at filling the IMAP cache; I don't think to tell it to go offline until I've left the airport wifi, and at that point, it's too late to grab the mail I want to read. The Apple mailer does much better at using idle time to prefetch.
p.p.s. how do I use hReview and GRDDL to make the data in this gripe available as if it were a bugzilla entry? More on that to follow, I hope...
As Simon Willison notes, OpenID solves the identity problem, not the trust problem. Meanwhile, FOAF and RDF are potential solutions to lots of problems but not yet actual solutions to very many. I think they go together like peanut butter and chocolate, creating a deliciously practical testbed for our Policy Aware Web research.
Our struggle to build a community is fairly typical:
- Oct 2005: breadcrumbs launches (and I wish for OpenID support)
- Dec 2005: Tim gets 400+ friendly comments on his first item.
- Jun 2006: Comments disabled due to overwhelming spam
... if you manage a social networking service, we strongly encourage you to embrace OpenID, hCard XFN, FOAF and the other open standards around data portability.
With that in mind, a suggestion to outsource to a centralized commercial blog spam filtering service seemed like a step in the wrong direction; we are the Decentralized Information Group after all; time to eat our own cooking!
The policy we have working right now is, roughly: you can comment on our blog if you're a friend of a friend of a member of the group.
In more detail, you can comment on our blog if:
- You can show ownership of a web page via the OpenID protocol.
- That web page is related by the foaf:openid property to a foaf:Person, and
- That foaf:Person is
- listed as a member of the DIG group in http://dig.csail.mit.edu/data, or
- related to a dig member by one or two foaf:knows links.
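The whitelist computation boils down to a breadth-first search over foaf:knows links. A minimal sketch, assuming the FOAF data has already been fetched and parsed into plain dictionaries; `whitelist` and its parameters are hypothetical names, not our crawler's actual code. It follows links outward from members, which is one reading of "related to a dig member"; proving ownership of the OpenID page is left to drupal's OpenID login.

```python
from collections import deque


def whitelist(members, knows, openid, max_hops=2):
    """OpenID URLs of members and of anyone within max_hops foaf:knows links of a member.

    members: person URIs listed as group members
    knows:   person URI -> set of person URIs they foaf:knows (outgoing links only)
    openid:  person URI -> OpenID page URL (the foaf:openid property)
    """
    hops = {m: 0 for m in members}
    queue = deque(members)
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # don't expand beyond friend-of-a-friend
        for friend in knows.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    # People without a foaf:openid can't log in, so they don't appear in the whitelist.
    return {openid[p] for p in hops if p in openid}
```

Breadth-first order guarantees each person is recorded at their shortest distance from a member, so a friend-of-a-friend-of-a-friend is excluded even if they are also reachable by a longer path.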
The implementation has two components so far:
- an enhancement to drupal's OpenID support to check a whitelist
- a FOAF crawler that generates a whitelist periodically
We're looking into policies such as You can comment if you're in a class taught by a DIG group member, but there are challenges reconciling policies protecting privacy of MIT students with this approach.
We're also interested in federating with other communities. The Advogato community is particularly interesting because
- The DIG group is pretty into Open Source, the core value of advogato.
- Advogato's trust metric is designed to be robust in the face of spammers and seems to work well in practice.
So I'd like to be able to say You can comment on our blog if you're certified Journeyer or above in the Advogato community. Advogato has been exporting basic foaf:name and foaf:knows data since a Feb 2007 update, but they didn't export the results of the trust metric computation in RDF.
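With the trust metric results in hand, the Advogato policy becomes a threshold test alongside the FOAF whitelist. A sketch, assuming the certification levels have already been pulled out of Advogato's RDF export into a dict keyed by OpenID URL (the export's actual vocabulary isn't shown here, so that mapping is an assumption); the four levels are Advogato's Observer, Apprentice, Journeyer, and Master, and `may_comment` is a hypothetical name.

```python
# Advogato certification levels, lowest to highest.
LEVELS = ["Observer", "Apprentice", "Journeyer", "Master"]


def may_comment(openid_url, foaf_whitelist, level_of, minimum="Journeyer"):
    """Allow friends-of-friends (FOAF whitelist) or anyone certified at `minimum` or above.

    level_of: OpenID URL -> Advogato level, assumed extracted from Advogato's RDF.
    Unknown users default to Observer, the lowest level.
    """
    if openid_url in foaf_whitelist:
        return True
    level = level_of.get(openid_url, "Observer")
    return LEVELS.index(level) >= LEVELS.index(minimum)
```

Either community can vouch for a commenter, so joining Advogato becomes another way into the conversation without anyone at DIG having to know you.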
Asking for that data in RDF has been on my todo list for months, but when Sean Palmer found out about this OpenID and FOAF stuff, he sent an enhancement request, and Steven Rainwater joined the #swig channel to let us alpha test it in no time. Sean also did a nice write-up.
This is a perfect example of the sort of integration of statistical methods into the Semantic Web that we have been talking about as far back as our DAML proposal in 2000:
Now we just have to enhance our crawler to get that data or otherwise integrate it with the drupal whitelist. (I'm particularly interested in using GRDDL to get FOAF data right from the OpenID page; stay tuned for more on that.) And I guess we need Advogato to provide a user interface for foaf:openid support... or maybe links to supplementary FOAF files via rdfs:seeAlso or owl:sameAs.
Some of these systems use relatively simple and straightforward manipulation of well-characterized data, such as an access control system. Others, such as search engines, use wildly heuristic manipulations to reach less clearly justified but often extremely useful conclusions. In order to achieve its potential, the Semantic Web must provide a common interchange language bridging these diverse systems. Like HTML, the Semantic Web language should be basic enough that it does not impose an undue burden on the simplest web software systems, but powerful enough to allow more sophisticated components to use it to advantage as well.