Links on the Semantic Web
On the web of [x]HTML documents, the links are critical. Links are references to 'anchors' in other documents, and they use URIs which are formed by taking the URI of the document and adding a # sign and the local name of the anchor. This way, local anchors get a global name.
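Here is a minimal Python sketch of that naming scheme. It uses only the standard library's urldefrag. The function names global_name and split_name are illustrative and do not come from any existing tool.

```python
from urllib.parse import urldefrag


def global_name(document_uri: str, local_name: str) -> str:
    """Form a global URI: the document's URI, a '#', then the local anchor name."""
    base, _ = urldefrag(document_uri)  # drop any fragment already present
    return f"{base}#{local_name}"


def split_name(uri: str) -> tuple:
    """Recover (document URI, local name) from a URI with a fragment."""
    base, fragment = urldefrag(uri)
    return base, fragment


uri = global_name("http://example.org/foo.html", "intro")
print(uri)              # http://example.org/foo.html#intro
print(split_name(uri))  # ('http://example.org/foo.html', 'intro')
```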
On the Semantic Web, links are also critical. Here, the local name, and the URI formed using the hash, refer to arbitrary things. When a semantic web document gives information about something and uses a URI formed from the name of a different document, like foo.rdf#bar, that is an invitation to look up the document if you want more information about it. I'd like people to use them more, and I think we need to develop algorithms for deciding when to follow Semantic Web links as a function of what we are looking for.
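Such an algorithm could take many forms. One very simple possibility is sketched below in Python. It is a hypothetical heuristic and not the Tabulator's actual rule. From the statements about the thing we are focused on, it follows only the predicates we care about. It strips the fragment to get the document to look up, and it skips documents already loaded. To keep the sketch short, all terms are assumed to be URIs.

```python
from urllib.parse import urldefrag


def documents_to_follow(triples, focus, wanted_predicates, loaded):
    """Return the documents worth fetching to learn more about `focus`.

    Hypothetical heuristic: follow only links reached through predicates we
    care about, and only to documents not loaded yet. All terms are assumed
    to be URI strings (no literals) to keep the sketch short.
    """
    candidates = {urldefrag(focus)[0]}  # the focus URI names a document too
    for s, p, o in triples:
        if s == focus and p in wanted_predicates:
            candidates.add(urldefrag(o)[0])
    return sorted(candidates - set(loaded))


FOAF = "http://xmlns.com/foaf/0.1/"
triples = [
    ("http://example.org/foo.rdf#bar", FOAF + "knows", "http://example.net/people.rdf#alice"),
    ("http://example.org/foo.rdf#bar", FOAF + "workplaceHomepage", "http://example.com/"),
]
print(documents_to_follow(triples, "http://example.org/foo.rdf#bar",
                          {FOAF + "knows"}, loaded={"http://example.org/foo.rdf"}))
# ['http://example.net/people.rdf']
```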
To play with semantic web links, I made a toy semantic web browser, Tabulator. It is a toy because it is hacked up in Javascript (a change from my usual Python) to experiment with these ideas. It is AJAR - Asynchronous Javascript and RDF. I started off with Jim Ley's RDF Parser and added a little data store. The store understands the minimal OWL ([inverse] functional properties, sameAs) needed to smush nodes representing the same thing together, so it doesn't matter if people use many different URIs for the same thing, which of course they can. It has a simple index and supports simple query. The API is more or less the one which cwm had been tending toward in Python.
Then, with the DOM and CSS and Ecmascript standards bookmarked, the rest was just learning the difference between Javascript and Python. Fun, anyway.
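The smushing can be sketched in Python with a union-find structure. This is a rough illustration and not the Tabulator's code. owl:sameAs merges two nodes directly. Two subjects sharing a value of an inverse functional property (such as foaf:mbox) are merged. Two values of a functional property on one subject are also merged. The caller lists which properties are functional; that is an assumption, since a real store would learn it from OWL data. The index is omitted, so match simply scans every triple.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


class SmushingStore:
    """Toy triple store that merges ("smushes") nodes denoting the same thing.

    Assumptions: terms are plain strings, and the caller supplies which
    properties are functional or inverse functional.
    """

    def __init__(self, functional=(), inverse_functional=()):
        self.fps = set(functional)
        self.ifps = set(inverse_functional)
        self.parent = {}  # union-find forest: term -> parent term
        self.triples = set()

    def find(self, term):
        """Canonical representative of term (with path halving)."""
        self.parent.setdefault(term, term)
        while self.parent[term] != term:
            self.parent[term] = self.parent[self.parent[term]]
            term = self.parent[term]
        return term

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        if p == SAME_AS:
            self.union(s, o)
        self._smush()

    def _smush(self):
        # Repeat until stable, since one merge can reveal another.
        changed = True
        while changed:
            changed = False
            seen = {}
            for s, p, o in self.triples:
                if p in self.ifps:    # same object => same subject
                    key, node = ("ifp", p, self.find(o)), s
                elif p in self.fps:   # same subject => same object
                    key, node = ("fp", p, self.find(s)), o
                else:
                    continue
                if key not in seen:
                    seen[key] = node
                elif self.find(seen[key]) != self.find(node):
                    self.union(seen[key], node)
                    changed = True

    def match(self, s=None, p=None, o=None):
        """Triples (in canonical form) matching a pattern; None is a wildcard."""
        pattern = [None if t is None else self.find(t) for t in (s, p, o)]
        result = set()
        for triple in self.triples:
            canon = tuple(self.find(t) for t in triple)
            if all(want is None or want == got for want, got in zip(pattern, canon)):
                result.add(canon)
        return result


FOAF = "http://xmlns.com/foaf/0.1/"
store = SmushingStore(inverse_functional=[FOAF + "mbox"])
store.add("http://a.example/card#me", FOAF + "mbox", "mailto:tim@example.org")
store.add("http://b.example/foaf.rdf#tbl", FOAF + "mbox", "mailto:tim@example.org")
store.add("http://b.example/foaf.rdf#tbl", FOAF + "name", "Tim")
# Both URIs now denote one node, so asking about either one finds the name.
print(store.match(s="http://a.example/card#me", p=FOAF + "name"))
```

Re-running the smush on every add is quadratic, which is fine for a toy but not for large data.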
The result (insert a million disclaimers: experimental, work in progress, only runs on Firefox for no serious reason, not accessible, too slow, etc.) is at least a platform for looking at Semantic Web data in a fairly normal way, while also following links. A blue dot indicates something which could be downloaded. Download some data first, then explore the data in it. Note that as you download multiple FOAF files, for example, the data from them merges into the unified view. (You may have to collapse and re-expand an outline.)
Here is the current snag, though. Firefox security does not allow a script from a given domain to access data from any other domain, unless the scripts are signed or made into an extension. Looking for script-signing tools (for OS X?) led me to dead ends, so if anyone knows how to do that, let me know. Until I find a fix for that, the power of following links (which is that they can potentially go anywhere) is alas not evident!
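The browser's check compares origins. As a rough model, two URLs share an origin when scheme, host and port all agree. Real browsers apply more rules than this, so treat the model as an assumption. Under it, a page served from www.w3.org may read other www.w3.org documents but not a FOAF file hosted elsewhere. Here is a small Python sketch of that comparison:

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """(scheme, host, port) of a URL, filling in the scheme's default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, parts.hostname or "", port


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


page = "http://www.w3.org/2005/ajar/tab"
print(same_origin(page, "http://www.w3.org:80/People/Berners-Lee/card"))  # True
print(same_origin(page, "http://example.org/foaf.rdf"))                  # False
```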
This all sounds a lot like Web 2.0, whatever that is. I just searched all the text on this page and the term "web 2.0" is not used once. To you Tim, and all the people writing the comments, thank you for that.
You might want to look at MindRaider, a Semantic Web Outliner written in Java that runs on Linux and Windows (at least). It is on my "must-try" list, but I have only downloaded it (I have not actually tried it...)
Tim
"I'd like people to use them more, and I think we need to develop algorithms which for deciding when to follow Semantic Web links as a function of what we are looking for."
A tagging system would be a great way to organize it. I also just wanted to say thank you very much for creating the web; it has made my life so much better and happier.
Since you mentioned often programming in Python, and you're writing programs in Javascript, you might be interested in MochiKit. MochiKit adds various features to Javascript which make it easier to work with and to write reasonable code with; many of these features are modelled after those in Python. In particular, it has iterators, extensible comparison and repr functions, and several functional programming tools. It also has excellent support for easily constructing, locating, and manipulating DOM nodes, working with XMLHTTPRequests (asynchronously) and JSON, and adding and removing event handlers without worrying about those already present.
Hi Tim, I've just read your post. Consider the dynamic script tag method; you may find it useful.
Link for the full explanation:
http://www.theurer.cc/blog/2005/12/15/web-services-json-dump-your-proxy/
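The dynamic script tag method works because a script element may load from any domain. The page adds such an element pointing at the data source, typically with a callback name in the query string. The source then replies with JavaScript that calls that function with the data. Below is a minimal Python sketch of the server's side. It assumes the parameter is called `callback`, which is a common convention rather than a standard. It only works if the data source cooperates, and the page ends up executing whatever the other server sends.

```python
import json
import re

# Accept only dotted ASCII JavaScript identifiers as callback names, so a
# crafted request cannot inject arbitrary script into the response.
_CALLBACK = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def script_tag_response(callback: str, data) -> str:
    """Wrap JSON data in a call to `callback`, for loading via a script tag."""
    if not _CALLBACK.fullmatch(callback):
        raise ValueError(f"unsafe callback name: {callback!r}")
    return f"{callback}({json.dumps(data)});"


print(script_tag_response("handleData", {"name": "Tim"}))
# handleData({"name": "Tim"});
```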
I am looking to pull CEO names from URLs, if in fact those names are listed somewhere within a web site. I don't have the URL where their name is listed, but I do have the URL of their company (their name is potentially on a sub-page). What are your ideas for accomplishing this? I was thinking of using Teleport Ultra, but it has no understanding of what it has gathered, nor is that understanding captured in the underlying HTML, as you are aware.
I built something similar to what you are describing. A screen-scraping spider is the first thing you need. The second is a natural language processing tool like Lockheed Martin's AeroText, which has out-of-the-box rules for extracting people, places and things from unstructured text. At one time they had an XML server: you passed the server an HTML file and it returned an XML file with the requested relationships parsed out.
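The first step, the spider, can be sketched with Python's standard library alone. The names are illustrative. The crawl stays on the company's own host so that it is bounded. The entity-extraction step (AeroText or any other NLP tool) is not shown.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect href values of <a> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def same_site_links(page_url: str, html: str) -> list:
    """Absolute, fragment-free links from html that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).hostname
    links = []
    for href in collector.hrefs:
        absolute = urldefrag(urljoin(page_url, href))[0]
        if urlsplit(absolute).hostname == host and absolute not in links:
            links.append(absolute)
    return links


def crawl(start_url: str, max_pages: int = 50) -> dict:
    """Breadth-first crawl of one site; returns {url: html}."""
    queue, seen, pages = deque([start_url]), {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=10) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:  # covers URLError, HTTPError and timeouts
            continue
        pages[url] = html
        for link in same_site_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Calling crawl("http://www.example.com/") returns the raw pages, which would then be passed to the extraction tool.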
David,
Interesting! Could we get in touch? Email me at tob_baker@yahoo.com
Thanks!
By the way, I think the world needs a good tool for users to browse the Semantic Web. RDF is really cool, but it scares people. First of all, when you use the full-blown XML-syntax, it's not very human-readable. But using the Semantic Web for gathering information is maybe even less straightforward. Even "the geeks" won't browse it by reading the source.
So I think there's definitely a need for a nice presentation of RDF. It can be done via XSLT and/or CSS, transforming it into HTML for example, so that links to other resources are clickable for the user. But that means every RDF provider would have to build their own stylesheets, and that will slow down or even prevent the widespread adoption of metadata in the form of RDF.
I'm not completely into the Semantic Web, but I'm really interested in how it's going to be accessed by the end-user.
For a discussion of using XSLT, CSS, diagrams, etc., see VisualizingRDF in the ESW Wiki. Hmm... probably should add Fresnel and such.
Perhaps not surprisingly due to its use of Gecko, Camino seems quite happy to run Tabulator.
Thought you'd like to know.
Well, if you request a page using XMLHttpRequest, it's easy for the owner of the domain to modify the page, for example, send malicious javascript, which is then executed on your page as if it were your script. This allows the owner of the other domain to read your cookies, for example. There are other issues, but I think this one (Cross Site Scripting) is the most dangerous one.
You can download JavaScript from external sources, but you can't do it using XMLHttpRequest. Quite weird, I agree.
Thanks for bringing the WWW to the world, and I wish you well.
Hmm... now that is a way of using the creative commons license that I had not even imagined.
Please reconsider whether it's a good idea, though. See Avoiding URI aliases in the Web Architecture document.
Hi Tim;
Have you considered proxying external requests via a server script? Scott Andrew LePera demonstrated this method in 2001 to make XML-RPC work to external domains from a javascript client via JSRS.
Of course, then you get to use Python on the server side if you like.
Yes, that is one solution. See the breakdown in the to-do list. The disadvantage, of course, is that it doesn't scale: it is a centralized solution, and while it would work with the current level of use, I just don't like to build in a dependency on a central server.
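Here is a sketch of the proxy approach using Python's standard library. The names are hypothetical, and the `uri` query parameter is an assumption. The page's script asks its own server for /proxy?uri=..., and that server fetches the remote document and relays it. The browser therefore only ever talks to one domain. This also shows the objection above: every request funnels through one central server. As written it is an open proxy, which a real deployment would need to restrict.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit
from urllib.request import urlopen


def proxy_target(path: str) -> str:
    """Extract and validate the target from a request path like /proxy?uri=..."""
    uris = parse_qs(urlsplit(path).query).get("uri", [])
    if len(uris) != 1:
        raise ValueError("expected exactly one uri parameter")
    if urlsplit(uris[0]).scheme not in ("http", "https"):
        raise ValueError("only http and https URIs may be proxied")
    return uris[0]


class ProxyHandler(BaseHTTPRequestHandler):
    """Fetch the requested URI server-side and relay it to the browser."""

    def do_GET(self):
        try:
            target = proxy_target(self.path)
        except ValueError as err:
            self.send_error(400, str(err))
            return
        try:
            with urlopen(target, timeout=15) as upstream:
                body = upstream.read()
                ctype = upstream.headers.get("Content-Type", "application/octet-stream")
        except OSError as err:  # network failures and HTTP errors
            self.send_error(502, str(err))
            return
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the proxy on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ProxyHandler).serve_forever()
```

Calling serve() starts it on port 8000. A page served by the same host could then fetch /proxy?uri=http%3A%2F%2Fexample.org%2Ffoaf.rdf.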
I've implemented codebase principal support now. See the help file.
The domain restriction when making HTTP calls from JavaScript in a browser must be overcome soon!! I suppose it's a security issue which I don't fully understand. At least the browser could present a dialog warning the user that a script will be accessing a different domain.
It can actually be a very severe security problem: Imagine a script from an "untrusted" website making HTTP calls to some website that uses cookies to remember login information or that uses ip filtering to control access (or some local network service): The script will be able to access information its owner normally wouldn't be supposed to see and it might even change passwords and other things ...
XSS (cross site scripting) can be used to subvert the trust users place in SSL certs. A malicious site could simulate the browsing of a trusted site and convince the user to enter in their auth information, giving the malicious code access equivalent to that of the victim. It's a kind of "man-in-the-middle" attack with the user's own computer serving as the proxy in the middle.
The problem is the blurring of the line between code and data, and the solution is going to require a massive change in how we think about software, data and network design.
Congratulations on diving into JavaScript; I hope you'll find it's better than its reputation.
The reason the Tabulator shows a blank page in Opera is that the page is coded with XHTML-style SCRIPT tags, closed with /> rather than with a separate closing tag. Since this is technically invalid HTML, Opera only supports it if the page is sent as application/xhtml+xml. (This will be fixed, because some HTML pages use the invalid form, but that bug report spent some time as a WONTFIX because it is a standards violation.)
Tim, I know it is work in progress but may I suggest a trip to validator.w3.org as the natural next step? :-)
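The fix is to give each script element an explicit end tag. If there are many pages to change, a rough Python sketch can rewrite the self-closing form mechanically. It uses a regular expression aimed only at this one pattern and is not a general HTML rewriter. The function name is illustrative.

```python
import re

# Matches <script ... /> (XHTML-style empty element), case-insensitively.
_SELF_CLOSING_SCRIPT = re.compile(r"<script\b([^>]*?)\s*/>", re.IGNORECASE)


def expand_script_tags(markup: str) -> str:
    """Rewrite <script .../> as <script ...></script>, valid in HTML and XHTML."""
    return _SELF_CLOSING_SCRIPT.sub(r"<script\1></script>", markup)


print(expand_script_tags('<script type="text/javascript" src="tab.js" />'))
# <script type="text/javascript" src="tab.js"></script>
```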
"only runs on Firefox for no serious reason"
Could you expand on this? (There HAS to be a technical reason.)
Many people absolutely ADORE Internet Explorer,
and there is a sense of being cheated...
BTW:
LOL: the term AJAX has not even celebrated its first birthday. Out of curiosity, what would you have called AJAR (Asynchronous Javascript and RDF) one year ago? :-)
Lastly:
Before there was AJAX, there was in fact TAXI
(by Tim Bray). Why did it NOT take off?
http://www.xml.com/pub/a/2001/03/14/taxi.html
Briefly, Firefox is one of the browsers I use. I haven't the time now to debug the tabulator experiment on several machines. I haven't knowingly used any features unsupported by other browsers. I'd be interested in patches if someone else fixes it to work on other browsers.
Stuff we now call AJAX was around long before 2004. I liked Dean Jackson's and Jim Ley's work with SVG over RDF, FoafNaut, etc. Found a slide of mine in some talk about UI architecture from 2003/6.
Signing tools are part of Mozilla NSS, available from mozilla.org (I'm not sure about OS X, but they work fine under Linux and Windows).
Last I tried (Mozilla 1.7.x, Firefox 1.0.x), they worked.
I also had bidirectional communication between signed javascript and signed java working fine.
Since you are already running in Firefox one easy way around the same-domain restriction would be to move your JavaScript into a Greasemonkey script.
As was the case with Tomi Häsä (see above), I'm coming up with a blank page after linking to http://www.w3.org/2005/ajar/tab. I tried opening the page using Firefox, Opera and IE, without result.
I went to W3C and found several other Tabulator/AJAW pages that I couldn't link to, such as http://www.w3.org/2005/10/ajaw/tab.html.
Anyone?
Tabulator's Help/About page lacks a link to the usual "project" page. Are you planning to create such a "project", or will we have to "bite our nails" until then? I'd like to have access to the code, see how it evolves, and so on; a SourceForge-style project is what I would expect.
Thanks a lot for being there, congratulations on Tabulator, and happy new year.
The code is all under the W3C open source licence. You are of course very welcome to copy and hack it, for commercial or non-commercial use. I didn't emphasize it, as I guess I take it for granted. I'm sorry it isn't packaged for easy download; in the meantime you can download the files individually. (For code management I'm using the w3.org web site, which I am all set up for and which is very easy for me.)
I am not sure how much time I will have for incorporating patches. I'd like to make a plugin view architecture for things like business-card views of people, and map views of things with locations. You should be able to use extensions just by letting the tabulator know about them (using RDF). Like with Piggybank.
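One way such a plugin view architecture could look is shown in the hypothetical Python sketch below. Views are registered against an rdf:type, and the browser picks a view from a node's type, falling back to the outline. In the design described above, the registration would itself be RDF read by the tabulator. Here a plain dictionary stands in for that.

```python
from typing import Callable, Dict, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

Triple = Tuple[str, str, str]
View = Callable[[str, List[Triple]], str]


class ViewRegistry:
    """Choose how to display a node from its rdf:type (hypothetical design)."""

    def __init__(self, default: View):
        self.default = default
        self.by_type: Dict[str, View] = {}

    def register(self, rdf_type: str, view: View) -> None:
        self.by_type[rdf_type] = view

    def render(self, node: str, triples: List[Triple]) -> str:
        for s, p, o in triples:
            if s == node and p == RDF_TYPE and o in self.by_type:
                return self.by_type[o](node, triples)
        return self.default(node, triples)


def outline_view(node: str, triples: List[Triple]) -> str:
    rows = [f"  {p}: {o}" for s, p, o in triples if s == node]
    return "\n".join([node] + rows)


def business_card_view(node: str, triples: List[Triple]) -> str:
    props = {p: o for s, p, o in triples if s == node}
    return f"[ {props.get(FOAF + 'name', node)} | {props.get(FOAF + 'mbox', '')} ]"


views = ViewRegistry(default=outline_view)
views.register(FOAF + "Person", business_card_view)
data = [
    ("http://example.org/card#tim", RDF_TYPE, FOAF + "Person"),
    ("http://example.org/card#tim", FOAF + "name", "Tim"),
    ("http://example.org/card#tim", FOAF + "mbox", "mailto:tim@example.org"),
]
print(views.render("http://example.org/card#tim", data))
# [ Tim | mailto:tim@example.org ]
```

A map view for things with locations would be registered the same way, against whatever class or property marks a location.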
I don't see anything on page http://www.w3.org/2005/ajar/tab
(The length for the Homepage in a comment is too short for my hobby page, so I'm using TinyURL.)
Useful links and examples for RDF and the Semantic Web. And it seems that lots of semantic websites are springing up. Good news.
The Tabulator's UI presentation of RDF model is very nice.
But in my opinion, I still wonder whether those who know little about RDF/OWL/semantic concepts will accept expanding and collapsing tree nodes with those simple property names and values. Even though the important links are provided for further understanding, I still find it somewhat difficult to get a clear view of the whole thing when browsing the nodes. It seems that RDF, which is designed for machine reading and machine comprehension, lacks other UI presentation standards for human-oriented reading and comprehension (CSS/XSLT is not enough, I think).
JavaScript/browser-based applications are nice enough, and lots of other applications are moving to JavaScript in the browser. So I think JavaScript is enough and no Python is needed (I know little about Python). (Advertisement: generating JavaScript from existing Java code: Java2Script Pacemaker: http://j2s.sourceforge.net)
Hi Tim,
Have you given Piggy Bank from Simile a try?
If so what's your opinion of it?
I certainly have Piggy Bank installed. In fact, one motivator for the tabulator was to help explain some of the features I have been asking the Piggy Bank folks to add. I think Piggy Bank has got a lot of great stuff in it. It stores, it publishes, it has text search, and maps(!), and nice style. It is fast, with a Java store, and it can access anything on the web (as it is a Firefox extension).
It may be that I haven't got used to using it. It manages information in small chunks, I tend to think in files (web resources). I found I really wanted the outline mode - and an easy way to follow links. I've briefly discussed this with Stephano.
I wanted something I (and anyone else) could hack around with easily, as I think the whole area of the semantic web user interface is going to see lots of research and innovation in the next few years.
Also check out mspace, among many other RDF browsers out there. It has an iTunes-like interface, but it is music-specific. I'd like to be able to create an interface with that ease of use for an arbitrary new cross-section of my applications and data out there on the web, just by browsing.
I like Piggy Bank too; it beats Haystack and Brown Sauce in terms of surfing the triples you have gathered. You are right about the need for an outliner mode. I get the feeling of one when I look at my FOAF file, which only has a few items. When you start paging to deal with what was once a single web page, you lose continuity of thought. I'm annoyed at anonymous nodes that contain no useful text or visual information. They clutter the screen with what appears to be useless information. I'm sure they may be structurally useful to the statement from which they came, but couldn't they just include that information with the node that does contain something?
Mspace looks nice, but something about it makes me think it's not a general-purpose tool.
Has anything happened with Piggy Bank in the last six months? All I see is the same handful of scrapers that were there day 1.
If PB is the way to go, there’s a very nice dovetail with microformats, as it’s pushing the standardization of markup and its link to meaning within XHTML documents.
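The suggestion about anonymous nodes is to show their contents as part of the node that refers to them, rather than as separate items. That can be sketched as below. It is a hypothetical rendering rule, not taken from any of the browsers mentioned here. Blank nodes are assumed to be labelled N-Triples style ("_:b1"), and predicates are written as prefixed names for brevity.

```python
def is_blank(term: str) -> bool:
    # Assumption: blank nodes are labelled N-Triples style, e.g. "_:b1".
    return term.startswith("_:")


def outline(node, triples, depth=0, path=frozenset()):
    """Lines describing node; anonymous objects are expanded in place."""
    path = path | {node}  # nodes on the current branch, to stop cycles
    pad = "  " * depth
    lines = []
    for s, p, o in triples:
        if s != node:
            continue
        if is_blank(o) and o not in path:
            lines.append(f"{pad}{p}:")
            lines.extend(outline(o, triples, depth + 1, path))
        else:
            lines.append(f"{pad}{p}: {o}")
    return lines


triples = [
    ("http://example.org/card#me", "foaf:name", "Tim"),
    ("http://example.org/card#me", "contact:office", "_:b1"),
    ("_:b1", "contact:city", "Cambridge"),
]
print("\n".join(outline("http://example.org/card#me", triples)))
```

The anonymous office node never appears on its own. Its city is shown indented under contact:office.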
With Piggy Bank itself, no, though we've been talking about where to go. PB is based on Longwell, a faceted browser, which has seen some development over the past half year towards reusable display configuration.
Our general mailing list is a good place to listen in on recent developments.
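A faceted browser like Longwell lets you narrow a collection by clicking property values. For each value, it shows how many items carry it. The core computation might look like the hypothetical Python sketch below. It is not Longwell's code, and the short predicate names stand in for real RDF properties.

```python
from collections import Counter


def facet_counts(triples, items, facets):
    """For each facet predicate, count how many of `items` have each value."""
    counts = {p: Counter() for p in facets}
    for s, p, o in set(triples):  # ignore duplicate statements
        if s in items and p in counts:
            counts[p][o] += 1
    return counts


def narrow(triples, items, predicate, value):
    """Keep only the items that have predicate = value (one facet click)."""
    return {s for s, p, o in triples if s in items and p == predicate and o == value}


triples = [
    ("a", "type", "Person"), ("b", "type", "Person"), ("c", "type", "Place"),
    ("a", "based_near", "Boston"), ("c", "based_near", "Boston"),
]
items = {"a", "b", "c"}
print(facet_counts(triples, items, ["type", "based_near"]))
people = narrow(triples, items, "type", "Person")
print(facet_counts(triples, people, ["based_near"]))
```

After each click, the counts are recomputed over the narrowed set. That is what keeps the remaining choices meaningful.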
If you're willing to change your Firefox configuration, you could set signed.applets.codebase_principal_support to true and then get privileges via netscape.security.PrivilegeManager.enablePrivilege("UniversalBrowserRead").
This is described on http://www.mozilla.org/projects/security/components/signed-scripts.html.
Thanks, done. The version as of today, 2006-01-03, requests UniversalBrowserRead for different-domain documents, so it will read data from any site for anyone who has codebase principal support enabled.


I entered a few sites and the script seems to freeze while loading... How long do I need to wait?