Well, the Semantic Web has been in the news a bit recently.
There was the buzz about Twine, a "Semantic Web company", getting another round of funding. Then, Yahoo announced that it will pick up Semantic Web information from the Web, and use it to enhance search. And now the Times online mis-states that I think "Google could be superseded". Sigh. In an otherwise useful discussion, largely about what the Semantic Web is and how it will affect people, there was a misunderstanding which ended up being the title of the blog. In fact, the conversation as I recall started with a question: if search engines were the killer app for the familiar Web of documents, what will be the killer app for the Semantic Web?
Text search engines are of course good for searching the text in documents, but the Semantic Web isn't text documents, it is data. It isn't obvious what the killer apps will be - there are many contenders. We know that the sort of query you do on data is different: the SPARQL standard defines a query protocol which allows application builders to query remote data stores. So that is one sort of query on data which is different from text search.
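To make the contrast with text search concrete, here is a minimal sketch of what a query on data looks like. It only builds the request URL that the SPARQL protocol uses for a query sent by HTTP GET (the query travels in a "query" parameter). The endpoint address is a made-up placeholder, and the query assumes the store holds FOAF data; neither comes from any particular service.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: a stand-in for any store that speaks the SPARQL protocol.
ENDPOINT = "http://example.org/sparql"

# Ask for people and their names, assuming the store uses the FOAF vocabulary.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name
WHERE { ?person foaf:name ?name }
LIMIT 10
"""

def build_request_url(endpoint: str, query: str) -> str:
    """Return the URL for a SPARQL query sent via HTTP GET."""
    # The protocol carries the query text in the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})

url = build_request_url(ENDPOINT, QUERY)
print(url)
```

Note that the query names a pattern in the data (things which have a foaf:name), not words to look for in a page.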
One thing to always remember is that the Web of the future will have BOTH documents and data. The Semantic Web will not supersede the current Web. They will coexist. The techniques for searching and surfing the different aspects will be different but will connect. Text search engines don't have to go out of fashion.
The "Google will be superseded" headline is an unfortunate misunderstanding. I didn't say it. (We have, by the way, asked for it to be fixed. One can, after all, update a blog to fix errors, and this should be appropriate. Ian Jacobs wrote an email, left voice mail, and tried to post a reply to the blog, but the reply did not appear on the blog - moderated out? So we tried.)
Now of course, as the name of The Times was once associated with a creditable and independent newspaper :-), the headline was picked up and elaborated on by various well-meaning bloggers. So the blogosphere, which one might hope to be the great safety net under the conventional press, in this case just amplified the error.
I note that here the blogosphere was misled by an online version of a conventional organ. There are many who worry about the inverse, that decent material from established sources will be drowned beneath a tide of low-quality information from less creditable sources.
The Media Standards Trust is a group which has been working with the Web Science Research Initiative (I'm a director of WSRI) to develop ways of encoding the standards of reporting a piece of information purports to meet: "This is an eye-witness report"; or "This photo has not been massaged apart from: cropping"; or "The author of the report has no commercial connection with any products described"; and so on. Like creative commons, which lets you mark your work with a licence, the project involves representing social dimensions of information. And it is another Semantic Web application.
In all this Semantic Web news, though, the proof of the pudding is in the eating. The benefit of the Semantic Web is that data may be re-used in ways unexpected by the original publisher. That is the value added. So when a Semantic Web start-up either feeds data to others who reuse it in interesting ways, or itself uses data produced by others, then we start to see the value of each bit increased through the network effect.
So if you are a VC funder or a journalist and some project is being sold to you as a Semantic Web project, ask how it gets extra re-use of data, by people who would not normally have access to it, or in ways for which it was not originally designed. Does it use standards? Is it available in RDF? Is there a SPARQL server?
A great example of Semantic Web data which works this way is Linked Data. There is a growing mass of interlinked public data, much of it promoted by the Linked Open Data project. There is an upcoming Linked Data workshop on this at the WWW 2008 Conference in April in Beijing, and on June 17-18 in New York at the Linked Data Planet Conference. Linked data comes alive when you explore it with a generic data browser like the Tabulator. It also comes alive when you make mashups out of it. (See Playing with Linked Data, Jamendo, Geonames, Slashfacet and Songbird; Using Wikipedia as a database). It should be easier to make those mashups by just pulling RDF (maybe using RDFa or GRDDL) or using SPARQL, rather than having to learn a new set of APIs for each site and each application area.
I think there is an important "double bus" architecture here, in which there are separate markets for the raw data and for the mashed up data. Data publishers (e.g., government departments) just produce raw data now, and consumer-facing sites (e.g., soccer sites) mash up data from many sources. I might talk about this a bit at WWW 2008.
So in scanning new Semantic Web news, I'll be looking out for re-use of data. The momentum around Linked Open Data is great and exciting -- let us also make sure we make good use of the data.
Well, it has been a long time since my last post here. So many topics, so little time. Some talks, a couple of Design Issues articles, but no blog posts. To dissipate the worry of expectation of quality, I resolve to lower the bar. More about what I had for breakfast.
So The Graph word has been creeping in. BradFitz talks of the Social Graph, as does Alex Iskold, who discusses social graphs and network theory in general and points out that users want to own their own social graphs. He also points out that examples of graphs are the Internet and the Web. So what's with the Graph word?
Maybe it is because Net and Web have been used. For perfectly good things .. but different things.
The Net we normally use as short for Internet, which is the International Information Infrastructure. Al Gore promoted the National Information Infrastructure (NII) presumably as a political pragma at the time, but clearly it became International. So let's call it III. Let's think about the Net now as an invention which made life simpler and more powerful. It made it simpler because, instead of having to navigate phone lines from one computer to the next, you could write programs as though the net were just one big cloud, where messages went in at your computer and came out at the destination one. The realization was, "It isn't the cables, it is the computers which are interesting". The Net was designed to allow the computers to be seen without having to see the cables.
Simpler, more powerful. Obvious, really.
Programmers could write at a more abstract level. Also, there was re-use of the connections, in that, as the packets flowed, a cable which may have been laid for one purpose now got co-opted for all kinds of uses which the original users didn't dream of. And users of the Net, the III, found that they could connect to all kinds of computers which had been hooked up for various reasons, sometimes now forgotten. So the new abstraction gave us more power, and added value by enabling re-use.
The word Web we normally use as short for World Wide Web. The WWW increases the power we have as users again. The realization was "It isn't the computers, but the documents which are interesting". Now you could browse around a sea of documents without having to worry about which computer they were stored on. Simpler, more powerful. Obvious, really.
Also, it allowed unexpected re-use. People would put a document on the web for one reason, but it would end up being found by people using it in completely different ways. Two delights drove the Web: one of being told by a stranger your Web page has saved their day, and the other of discovering just the information you need and for which you couldn't imagine someone having actually had the motivation to provide it.
So the Net and the Web may both be shaped as something mathematicians call a Graph, but they are at different levels. The Net links computers, the Web links documents.
Now, people are making another mental move. There is a realization now, "It's not the documents, it is the things they are about which are important". Obvious, really.
Biologists are interested in proteins, drugs, genes. Businesspeople are interested in customers, products, sales. We are all interested in friends, family, colleagues, and acquaintances. There is a lot of blogging about the strain, and total frustration that, while you have a set of friends, the Web is providing you with separate documents about your friends. One in facebook, one on linkedin, one in livejournal, one on advogato, and so on. The frustration that, when you join a photo site or a movie site or a travel site, you name it, you have to tell it who your friends are all over again. The separate Web sites, separate documents, are in fact about the same thing -- but the system doesn't know it.
There are cries from the heart (e.g. The Open Social Web Bill of Rights) for my friendship, that relationship to another person, to transcend documents and sites. There is a "Social Network Portability" community. It's not the Social Network Sites that are interesting -- it is the Social Network itself. The Social Graph. The way I am connected, not the way my Web pages are connected.
We can use the word Graph, now, to distinguish from Web.
I called this graph the Semantic Web, but maybe it should have been Giant Global Graph! Any worse than WWWW? ;-) Now, the "Semantic Web" term has been established for a long time, and I'm not proposing to change it. But let's think about the graph which it is. (Footnote: "Graph" also happens to be the word the RDF specifications use, but that is by the way. While an XML parser creates a DOM tree, an RDF parser creates an RDF graph in memory.)
So, if only we could express these relationships, such as my social graph, in a way that is above the level of documents, then we would get re-use. That's just what the graph does for us. We have the technology -- it is Semantic Web technology, starting with RDF, OWL and SPARQL. Not magic bullets, but the tools which allow us to break free of the document layer. If a social network site uses a common format for expressing that I know Dan Brickley, then any other site or program (when access is allowed) can use that information to give me a better service. Un-manacled to specific documents.
I express my network in a FOAF file, and that is a start of the revolution. I blogged on FOAF files earlier, before the major open SNS angst started. The data in a FOAF file can be read by other applications. Photo-sharing, travel sites, sites which accept your input because you are a part of the graph.
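As a rough illustration of why a common format gives re-use, here is a toy sketch in which a graph is just a set of (subject, predicate, object) triples. The people, sites and URIs under example.org are all invented for the example; only the FOAF namespace and its "knows" and "name" properties are real. A real application would use an RDF library rather than bare tuples.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical URIs for two people (not real FOAF files).
ME = "http://example.org/alice/foaf.rdf#me"
FRIEND = "http://example.org/bob/foaf.rdf#me"

# One site knows who my friends are ...
social_site = {(ME, FOAF + "knows", FRIEND)}
# ... and a completely separate site knows something about one of them.
photo_site = {(FRIEND, FOAF + "name", "Bob")}

# Because both use the same URIs for the same things, merging is set union.
graph = social_site | photo_site

def friend_names(graph, person):
    """Names of everyone the given person is said to know."""
    friends = {o for s, p, o in graph if s == person and p == FOAF + "knows"}
    return {o for s, p, o in graph if s in friends and p == FOAF + "name"}

print(friend_names(graph, ME))
```

Neither site on its own can answer "what are my friends called?"; the merged graph can, and neither site had to know about the other.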
The less inviting side of sharing is losing some control. Indeed, at each layer --- Net, Web, or Graph --- we have ceded some control for greater benefits.
People running Internet systems had to let their computer be used for forwarding other people's packets, and connecting new applications they had no control over. People making web sites sometimes tried to legally prevent others from linking into the site, as they wanted complete control of the user experience, and they would not link out as they did not want people to escape. Until after a few months they realized how the web works. And the re-use kicked in. And the payoff started blowing people's minds.
Letting your data connect to other people's data is a bit about letting go in that sense. It is still not about giving to people data which they don't have a right to. It is about letting it be connected to data from peer sites. It is about letting it be joined to data from other applications.
It is about getting excited about connections, rather than nervous.
In the short, what-can-I-code-up-this-afternoon-to-fix-this term, it is about other sites following the lead of my.opera.com, livejournal, advogato, and so on (list) also exporting a public RDF URI for their members, with what information the person would like to share. Right now, this blog re-uses the FOAF data linked to us to fight spam.
In the long term vision, thinking in terms of the graph rather than the web is critical to us making best use of the mobile web, the zoo of wildly differing devices which will give us access to the system. Then, when I book a flight it is the flight that interests me. Not the flight page on the travel site, or the flight page on the airline site, but the URI (issued by the airlines) of the flight itself. That's what I will bookmark. And whichever device I use to look up the bookmark, phone or office wall, it will access a situation-appropriate view of an integration of everything I know about that flight from different sources. The task of booking and taking the flight will involve many interactions. And all throughout them, that task and the flight will be primary things in my awareness, the websites involved will be secondary things, and the network and the devices tertiary.
I'll be thinking in the graph. My flights. My friends. Things in my life. My breakfast. What was that? Oh, yogourt, granola, nuts, and fresh fruit, since you ask.
People have, since it started, complained about the fact that there is junk on the web. And as a universal medium, of course, it is important that the web itself doesn't try to decide what is publishable. The way quality works on the web is through links.
It works because reputable writers make links to things they consider reputable sources. So readers, when they find something distasteful or unreliable, don't just hit the back button once, they hit it twice. They remember not to follow links again through the page which took them there. One's chosen starting page, and a nurtured set of bookmarks, are the entrance points, then, to a selected subweb of information which one is generally inclined to trust and find valuable.
A great example of course is the blogging world. Blogs provide a gently evolving network of pointers of interest. As do FOAF files. I've always thought that FOAF could be extended to provide a trust infrastructure for (e.g.) spam filtering and OpenID-style single sign-on, and it's good to see things happening in that space.
In a recent interview with the Guardian, alas, my attempt to explain this was turned upside down into a "blogging is one of the biggest perils" message. Sigh. I think they took their lead from an unfortunate BBC article, which for some reason stressed concerns about the web rather than excitement, failure modes rather than opportunities. (This happens, because when you launch a Web Science Research Initiative, people ask what the opportunities are and what the dangers are for the future. And some editors are tempted to just edit out the opportunities and headline the fears to get the eyeballs, which is old and boring newspaper practice. We expect better from the Guardian and BBC, generally very reputable sources)
In fact, it is a really positive time for the web. Startups are launching, and being sold [Disclaimer: people I know] again, academics are excited about new systems and ideas, conferences and camps and wikis and chat channels are hopping with energy, and every morning demands an excruciating choice of which exciting link to follow first.
And, fortunately, we have blogs. We can publish what we actually think, even when misreported.
Making standards is hard work. It's hard because it involves listening to other people and figuring out what they mean, which means figuring out where they are coming from, how they are using words, and so on.
There is the age-old tradeoff for any group as to whether to zoom along happily, in relative isolation, putting off the day when they ask for reviews, or whether to get lots of people involved early on, so a wider community gets on board earlier, with all the time that costs. That's a trade-off which won't go away.
The solutions tend to be different for each case, each working group. Some have lots of reviewers and some few, some have lots of time, some urgent deadlines.
A particular case is HTML. HTML has the potential interest of millions of people: anyone who has designed a web page may have useful views on new HTML features. It is the earliest spec of W3C, a battleground of the browser wars, and now the most widespread spec.
The perceived accountability of the HTML group has been an issue. Sometimes this was a departure from the W3C process, sometimes a sticking to it in principle, but not actually providing assurances to commenters. An issue was the formation of the breakaway WHAT WG, which attracted reviewers though it did not have a process or specific accountability measures itself.
There has been discussion in blogs where Daniel Glazman, Björn Hörmann, Molly Holzschlag, Eric Meyer, Jeffrey Zeldman and others have shared concerns about how W3C works, particularly in the HTML area. The validator and other subjects cropped up too, but let's focus on HTML now. We had a W3C retreat in which we discussed what to do about these things.
Some things are very clear. It is really important to have real developers on the ground involved with the development of HTML. It is also really important to have browser makers intimately involved and committed. And also all the other stakeholders, including users and user companies and makers of related products.
Some things are clearer with hindsight of several years. It is necessary to evolve HTML incrementally. The attempt to get the world to switch to XML, including quotes around attribute values and slashes in empty tags and namespaces all at once, didn't work. The large HTML-generating public did not move, largely because the browsers didn't complain. Some large communities did shift and are enjoying the fruits of well-formed systems, but not all. It is important to maintain HTML incrementally, as well as continuing a transition to a well-formed world, and developing more power in that world.
The plan is to charter a completely new HTML group. Unlike the previous one, this one will be chartered to do incremental improvements to HTML and also, in parallel, to xHTML. It will have a different chair and staff contact. It will work on HTML and xHTML together. We have strong support for this group, from many people we have talked to, including browser makers.
There will also be work on forms. This is a complex area, as existing HTML forms and XForms are both form languages. HTML forms are ubiquitously deployed, and there are many implementations and users of XForms. Meanwhile, the Webforms submission has suggested sensible extensions to HTML forms. The plan is, informed by Webforms, to extend HTML forms. At the same time, there is a work item to look at how HTML forms (existing and extended) can be thought of as XForm equivalents, to allow an easy escalation path. A goal would be to have an HTML forms language which is a superset of the existing HTML language, and a subset of an XForms language with added HTML compatibility. We will see to what extent this is possible. There will be a new Forms group, and a common task force between it and the HTML group.
There is also a plan for a separate group to work on the XHTML2 work which the old "HTML working group" was working on. There will be no dependency of HTML work on the XHTML2 work.
As well as the new HTML work, there are other things I want to change. The validator I think is a really valuable tool both for users and in helping standards deployment. I'd like it to check (even) more stuff, be (even) more helpful, and prioritize carefully its errors, warnings and mild chidings. I'd like it to link to explanations of why things should be a certain way. We have, by the way, just ordered some new server hardware, paid for by the Supporters program -- thank you!
This is going to be hard work. I'd like everyone to go into this realizing this. I'll be asking these groups to be very accountable, to have powerful issue tracking systems on the w3.org web site, and to be responsive in spirit as well as in letter to public comments. As always, we will be insisting on working implementations and test suites. Now we are going to be asking for things like talking with validator developers, maybe providing validator modules and validator test suites. (That's like a language test suite but backwards, in a way). I'm going to ask commenters to be respectful of the groups, as always. Try to check whether the comment has been made before, suggest alternative text, one item per message, etc, and add to technical perception social awareness.
This is going to be a very major collaboration on a very important spec, one of the crown jewels of web technology. Even though hundreds of people will be involved, we are evolving the technology which millions going on billions will use in the future. There won't seem like enough thankyous to go around some days. But we will be maintaining something very important and creating something even better.
p.s. comments are disabled here in breadcrumbs, the DIG research blog, but they are welcome in the W3C QA weblog.
There is a new version 07 of the Tabulator out. This is the generic data browser which lets you do useful things with your RDF data the moment it's on the web.
It works by exploring the web of relationships between things, loading more data from the web as you go. Then, if you find a pattern of information you are interested in, it will search for all occurrences of that pattern and display them in tables, maps, calendars, and so on.
In the same session, you can explore, say, some geocoded photos taken on a trip with a GPS, and then separately explore where in the world the tabulator developers are based. Then, you can project both datasets onto the same map. Or onto the same calendar, for data with a time component. This shows the cross-domain power of the semantic web.
This means you can correlate data from completely different domains. Think of all the different mash-ups people have made for putting things like friends' houses, photos, or coffee shops on the web. Each a different mash-up for a different data source.
For data in RDF (or any XML with a GRDDL profile), though, you don't have to program anything. You can just explore it and map it. And you can map many different data sources at the same time.
Oh, and for developers, the core of the tabulator is an open source RDF library with a complete tested RDF/XML parser, a store which smushes on owl:sameAs and owl:[Inverse]FunctionalProperty, and a web-crawling query engine supporting basic SPARQL. Enjoy.
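For anyone wondering what "smushing" means: when two URIs are declared to name the same thing, the store merges them so that everything said about either is found under one node. The following is only a sketch of that idea for owl:sameAs, using a small union-find over plain tuples. It is not the Tabulator's code, it ignores the functional-property cases, and the URIs under a.example and b.example are invented.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def smush(triples):
    """Merge nodes linked by owl:sameAs and rewrite triples to one name each."""
    parent = {}

    def find(node):
        # Follow parent links to the canonical name, shortening the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Keep the alphabetically smaller URI so the result is deterministic.
                keep, drop = sorted((a, b))
                parent[drop] = keep

    # The sameAs statements have done their job; drop them from the output.
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}

# Two hypothetical URIs for the same person, published by different sites.
A = "http://a.example/foaf#me"
B = "http://b.example/people/alice#i"
NAME = "http://xmlns.com/foaf/0.1/name"
MBOX = "http://xmlns.com/foaf/0.1/mbox"

data = {
    (A, NAME, "Alice"),
    (B, MBOX, "mailto:alice@example.org"),
    (A, OWL_SAME_AS, B),
}
merged = smush(data)
print(sorted(merged))
```

After smushing, the name and the mailbox both hang off the same node, which is what lets data from different sources show up in one view.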
When I invented the Web, I didn't have to ask anyone's permission. Now, hundreds of millions of people are using it freely. I am worried that that is going to end in the USA.
I blogged on net neutrality before, and so did a lot of other people. (see e.g. Danny Weitzner, SaveTheInternet.com, etc.) Since then, some telecommunications companies spent a lot of money on public relations and TV ads, and the US House seems to have wavered from the path of preserving net neutrality. There has been some misinformation spread about. So here are some clarifications. (real video Mpegs to come)
Net neutrality is this:
If I pay to connect to the Net with a certain quality of service, and you pay to connect with that or greater quality of service, then we can communicate at that level. That's all. It's up to the ISPs to make sure they interoperate so that that happens.
Net Neutrality is NOT asking for the internet for free.
Net Neutrality is NOT saying that one shouldn't pay more money for high quality of service. We always have, and we always will.
There have been suggestions that we don't need legislation because we haven't had it. These are nonsense, because in fact we have had net neutrality in the past -- it is only recently that real explicit threats have occurred.
Control of information is hugely powerful. In the US, the threat is that companies control what I can access for commercial reasons. (In China, control is by the government for political reasons.) There is a very strong short-term incentive for a company to grab control of TV distribution over the Internet even though it is against the long-term interests of the industry.
Yes, regulation to keep the Internet open is regulation. And mostly, the Internet thrives on lack of regulation. But some basic values have to be preserved. For example, the market system depends on the rule that you can't photocopy money. Democracy depends on freedom of speech. Freedom of connection, with any application, to any party, is the fundamental social basis of the Internet, and, now, the society based on it.
Let's see whether the United States is capable of acting according to its important values, or whether it is, as so many people are saying, run by the misguided short-term interests of large corporations.
I hope that Congress can protect net neutrality, so I can continue to innovate in the internet space. I want to see the explosion of innovations happening out there on the Web, so diverse and so exciting, continue unabated.
Net Neutrality is an international issue. In some countries it is addressed better than others. (In France, for example, I understand that the layers are separated, and my colleague in Paris attributes getting 24Mb/s net, a phone with free international dialing and digital TV for 30euros/month to the resulting competition.) In the US, there have been threats to the concept, and a wide discussion about what to do. That is why, though I have written and spoken on this many times, I blog about it now.
Twenty-seven years ago, the inventors of the Internet designed an architecture which was simple and general. Any computer could send a packet to any other computer. The network did not look inside packets. It is the cleanness of that design, and the strict independence of the layers, which allowed the Internet to grow and be useful. It allowed the hardware and transmission technology supporting the Internet to evolve through a thousandfold increase in speed, yet still run the same applications. It allowed new Internet applications to be introduced and to evolve independently.
When, seventeen years ago, I designed the Web, I did not have to ask anyone's permission. The new application rolled out over the existing Internet without modifying it. I tried then, and many people still work very hard, to make the Web technology, in turn, a universal, neutral, platform. It must not discriminate against particular hardware, software, underlying network, language, culture, disability, or against particular types of data.
Anyone can build a new application on the Web, without asking me, or Vint Cerf, or their ISP, or their cable company, or their operating system provider, or their government, or their hardware vendor.
It is of the utmost importance that, if I connect to the Internet, and you connect to the Internet, we can then run any Internet application we want, without discrimination as to who we are or what we are doing. We pay for connection to the Net as though it were a cloud which magically delivers our packets. We may pay for a higher or a lower quality of service. We may pay for a service which has the characteristics of being good for video, or quality audio. We each pay to connect to the Net, but no one can pay for exclusive access to me.
When I was a child, I was impressed by the fact that the installation fee for a telephone was everywhere the same in the UK, whether you lived in a city or on a mountain, just as the same stamp would get a letter to either place.
To actually design legislation which allows creative interconnections between different service providers, but ensures neutrality of the Net as a whole may be a difficult task. It is a very important one. The US should do it now, and, if it turns out to be the only way, be as draconian as to require financial isolation between IP providers and businesses in other layers.
The Internet is increasingly becoming the dominant medium binding us. The neutral communications medium is essential to our society. It is the basis of a fair competitive market economy. It is the basis of democracy, by which a community should decide what to do. It is the basis of science, by which humankind should decide what is true.
Let us protect the neutrality of the net.
"Two types that had far better leave to their betters
the civilized art of exchanging letters
are those who disdain to make any response,
and those who infallibly answer at once!"
The regularity of this blog fails on both counts.
One meme of RDF ethos is that the direction one chooses for a given property is arbitrary: it doesn't matter whether one defines "parent" or "child"; "employee" or "employer". This philosophy (from the Enquire design of 1980) is that one should not favor one way over another. One day, you may be interested in following the link one way, another day, or someone else, the other way.
On the other hand, one should also not encourage people to declare both a property and its inverse, which would simply double the number of definitions out there, and give one more axis of arbitrary variation in the way information is expressed. Therefore, the design of the tabulator is to make the system treat forward and backward links equivalently.
The design of N3 also was influenced by this. The ability to write
:Joe is f:parent of :Fred.
makes it easier to write (or generate) N3 without having to use f:child. This in turn reduces the pressure to define both.
The only loss in not having both is that there is no label for the reverse link. (In some cases I have defined an unnamed predicate which is declared as the inverse and has a label.)
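One way to picture "treating forward and backward links equivalently" is a store that indexes every statement in both directions, so that no inverse property is ever needed. This is a hypothetical sketch, not the tabulator's actual store; the class and method names are mine, and the prefixed names are kept as plain strings for brevity.

```python
from collections import defaultdict

class LinkStore:
    """Toy triple store which can follow any property in either direction."""

    def __init__(self):
        self._forward = defaultdict(set)   # (subject, predicate) -> objects
        self._backward = defaultdict(set)  # (object, predicate) -> subjects

    def add(self, subject, predicate, obj):
        # Index the one statement both ways; nothing extra is declared.
        self._forward[(subject, predicate)].add(obj)
        self._backward[(obj, predicate)].add(subject)

    def objects(self, subject, predicate):
        """Follow the link forwards: subject --predicate--> ?"""
        return set(self._forward.get((subject, predicate), ()))

    def subjects(self, predicate, obj):
        """Follow the same link backwards: ? --predicate--> obj"""
        return set(self._backward.get((obj, predicate), ()))

store = LinkStore()
# ":Joe is f:parent of :Fred." states the triple (:Fred, f:parent, :Joe).
store.add(":Fred", "f:parent", ":Joe")

print(store.objects(":Fred", "f:parent"))   # Fred's parents
print(store.subjects("f:parent", ":Joe"))   # Joe's children, with no f:child defined
```

The second lookup is the point: the "child" direction is answered from the "parent" data alone.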
Do you have a URI for yourself? If you are reading this blog and you have the ability to publish stuff on the web, then you can make a FOAF page, and you can give yourself a URI.
A lot of people have published data about themselves without using a URI for themselves. This means I can't refer to them in other data. So please take a minute to give yourself a URI. If you have a FOAF page, you may just have to add rdf:about="" and voila you have a URI http://example.com/Alan/foaf.rdf#ABC. (I suggest you use your initials for the last bit). Check it works in the Tabulator.
The URI will start with "http" (so I can look it up using HTTP) and it will have # in it, so the URI of your foaf file is different from the URI for you.
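The split between the two URIs can be seen with the standard library alone. This small sketch reuses the example URI from above; the variable names are just for illustration.

```python
from urllib.parse import urldefrag

# The URI for the person, as in the example above.
person_uri = "http://example.com/Alan/foaf.rdf#ABC"

# Everything before the "#" is what actually gets fetched over HTTP;
# the part after it picks out one thing described in that document.
document_uri, local_name = urldefrag(person_uri)

print(document_uri)  # http://example.com/Alan/foaf.rdf
print(local_name)    # ABC
```

So looking up the person means fetching the FOAF file and then finding what it says about "ABC", and the file and the person never get confused with each other.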
The AWWW says that everything of importance deserves a URI. Go ahead and give yourself a URI. You deserve it!
On the web of [x]HTML documents, the links are critical. Links are references to 'anchors' in other documents, and they use URIs which are formed by taking the URI of the document and adding a # sign and the local name of the anchor. This way, local anchors get a global name.
On the Semantic Web, links are also critical. Here, the local name, and the URI formed using the hash, refer to arbitrary things. When a semantic web document gives information about something, and uses a URI formed from the name of a different document, like foo.rdf#bar, then that's an invitation to look up the document, if you want more information about it. I'd like people to use them more, and I think we need to develop algorithms for deciding when to follow Semantic Web links as a function of what we are looking for.
The result .. insert a million disclaimers... experimental, work in progress, only runs on Firefox for no serious reason, not accessible, too slow, etc ... at least is a platform for looking at Semantic Web data in a fairly normal way, but also following links. A blue dot indicates something which could be downloaded. Download some data before exploring the data in it. Note that as you download multiple FOAF files for example the data from them merges into the unified view. (You may have to collapse and re-expand an outline).
Here is the current snag, though. Firefox security does not allow a script from a given domain to access data from any other domain, unless the scripts are signed, or made into an extension. And looking for script signing tools (for OS X?) led me to dead ends. So if anyone knows how to do that, let me know. Until I find a fix for that, the power of following links -- which is that they can potentially go anywhere -- is alas not evident!