What is the Analogue for the Semantic Web? If the Web is like a Page+Links, the SW is like a...

Submitted by mc on Fri, 2007-02-16 16:27.

This is the first entry for the breadcrumbs blog from what may be called the Interaction perspective on things Semantic Webbish. To that end, i've been mulling over what the paradigm for the Semantic Web is - more particularly, what the physical world analogue for this concept is.
In order to design an interface to support a technology, to expose its potential for what it can do, it helps to know what it is - or, failing that, to have a model around which we can conceptualize what it is, what it does, and roughly how it works. It's not unusual for a new technology or concept to be introduced via an analogue of a previous, familiar technology: "it's like this thing - but for this new bit." This "like this, but for this new bit" is what i've been looking for, for the SW.
What i'll propose (eventually) below is that one paradigm may be a notebook in the traditional sense of the term: a place to capture work in progress. Taken one step (a big step) further, i'll argue that if the Web paradigm is a Page + Links, the paradigm for the Semantic Web may be a notebook + the memex.
The Web PAGE - it's a Page. With Links.
We have a great model for the Web. It's the page: text with images. We're all familiar with concepts of the page. It's clear, easy to grasp. I'd postulate we need a similar construct or paradigm or analogue for the Semantic Web. We have a long history with read-only text, whether as official public communication, or as unofficial comment.

We also have long experience (400+ years) of a particular technology's deployment of words and images on a page - whether as an illuminated manuscript or an early printed text with woodcuts.

The one new thing the Web added to the notion of the page - the thing that makes it a Web page - is the hypertext link. The link is really the only core new concept introduced to the page, and more often than not, that link's job is to link to another page. The translation from the non-Web page to the Web page is not a terribly huge leap. The link as a concept is almost what we'd call "intuitive" in its use.
This is not to say that there are not a myriad of design considerations for making that new page+link approach useful, usable and accessible. We have developed whole suites of conventions on how to deliver pages effectively and have now gone through several generations of "web design" to ensure that text, image and link work. Yet despite over a decade of evolution in Web technology, the paradigm for describing what we create with the Web is the same: it's a page. With Links. The Page as paradigm informs the way we design the page. It's not a spreadsheet; it's not a network diagram. It's a page.
Even with Web 2.0, with RSS feeds, blogs, mash ups, we still have pages. The only model variant in Web 2.0, with location-based mash ups, is that the main image on the page is now a map. And again, maps are a familiar technology that has been around for millennia, and one most of us were trained to use as part of our education. It's amazing how much we use familiar technologies to model the representations for new ones, perhaps especially in computing. Bottom line, the web page as page is a clear model that rapidly communicates what the Web is largely about: enabling people to publish content, communicate ideas, and link into the myriad of other ideas available. The page is a powerful analogue for communicating this model, and it is, i would argue, because there is such a clear model that there has been such rapid adoption of the concepts, and such interest in the technology across disciplines.
Analogue for the Semantic Web?
So, if the analogue for the Web is the page, what is the analogue for the Semantic Web? And why is finding this analogue important? Part of the answer may depend on whom the Semantic Web community wishes to attract to be involved as practitioners, innovators, creators, discoverers in this space. If it's the same range of passions and expertise that have brought so much to the Web from the arts, humanities, sciences, business and so on, then this question of model becomes critical.
Consider for a moment how the Semantic Web is described in the new First Stop Shop for What is It, Wikipedia.
The Wikipedia entry for the semantic web begins: The Semantic Web is an evolution of the World Wide Web in which information is machine processable (rather than being only human oriented), thus permitting browsers or other software agents to find, share and combine information more easily. It is a manifestation of W3C director Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange. At its core the Semantic Web consists of a data model called Resource Description Framework (RDF), a variety of data interchange formats (e.g RDF/XML, N3, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL) that facilitate formal description of concepts, terms, and relationships within a given domain. The burgeoning Semantic Web comprises newly created and/or transformed web data sources endowed with computer-processable meaning (semantics).
Now, all that description tells anyone about the semantic Web is that it's for machines. And i'm not sure i believe that the end game imagined for the Semantic Web is to make data easier for machines to process. That machine-processable stuff would seem to be a means to an end, but not the end itself. The end is still about people, and PEOPLE being able to build knowledge by moving through linked information.
We might ask, then, if the Semantic Web has the same human-oriented goals as the Web, why not just use the same model for describing it: pages with links. I'd suggest that the page is not robust enough to support what we gain from the Semantic Web's far greater emphasis on the Link as opposed to the Page.
Because of how it's structured, content in the Semantic Web can be richly associated, and with those associations comes the potential to explore things in new ways. For example, Beethoven as a composer can be associated with genres in music, and with specific recordings as instances, which in turn are associated with various artists and recording companies, or even with the politics of certain works being recorded or not. Beethoven is also associated with a particular period in history, and with the interaction of styles in that period; hence there are correlations between music and architecture and scientific thought at the time. All these associations branch out from Beethoven directly. One might even say that such branches constitute a graph, and the page cannot reflect these possibilities. But neither, as David Karger and i have argued, does a raw graph express that richness.
For one thing, besides being illegible on a number of levels, these graphs present only things which are directly connected on the graph. The Semantic Web has a technical facility to support inferring connections between points according to rules. For instance, one might see a connection between a Beethoven work and the structure of a poem or an equation, and be able to express that connection as a rule, so that new connections among these points become available. Seeing, finding and drawing those kinds of connections is a primary capability the Semantic Web can enable. The page cannot readily contain that possibility.
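To make the branching concrete, here is a minimal Python sketch, assuming a toy in-memory list of subject-predicate-object triples in the spirit of RDF. Every node and predicate name in it (`composed`, `hasRecording`, `RecordingA` and so on) is a hypothetical label for illustration, not a real vocabulary; an actual Semantic Web store would use URIs and an RDF library. A breadth-first walk shows how far the associations reach from Beethoven.

```python
from collections import deque

# Hypothetical toy triples (subject, predicate, object); all names are illustrative only.
TRIPLES = [
    ("Beethoven", "hasGenre", "SymphonyGenre"),
    ("Beethoven", "composed", "Symphony9"),
    ("Symphony9", "hasRecording", "RecordingA"),
    ("RecordingA", "performedBy", "OrchestraX"),
    ("RecordingA", "releasedBy", "LabelY"),
    ("Beethoven", "activeIn", "PeriodP"),
    ("PeriodP", "contemporaryWith", "ArchitectureOfP"),
]


def neighbours(node, triples):
    """Return the (predicate, object) pairs for edges leaving `node`."""
    return [(p, o) for s, p, o in triples if s == node]


def reachable(start, triples, max_depth=3):
    """Breadth-first walk along outgoing edges; returns {node: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] >= max_depth:
            continue  # do not expand beyond the depth limit
        for _, obj in neighbours(node, triples):
            if obj not in seen:
                seen[obj] = seen[node] + 1
                queue.append(obj)
    return seen
```

Calling `reachable("Beethoven", TRIPLES)` returns each associated node with its distance in hops; note that it only follows links someone has already asserted.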
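A rule that creates new connections can be sketched as naive forward chaining: apply each rule to the known triples and repeat until nothing new appears. This is a minimal Python sketch under stated assumptions; the single rule ("two things with the same structure are structurally related") and all names in it are hypothetical, and real Semantic Web systems express such rules in dedicated rule languages rather than in hand-written code like this.

```python
# Hypothetical asserted facts: a work, a poem and an equation share a structure.
FACTS = {
    ("Symphony5", "hasStructure", "ThemeAndVariations"),
    ("PoemP", "hasStructure", "ThemeAndVariations"),
    ("EquationE", "hasStructure", "ThemeAndVariations"),
}


def shared_structure_rule(facts):
    """Hypothetical rule: if A and B (A != B) have the same structure S,
    infer (A, 'structurallyRelatedTo', B)."""
    by_structure = {}
    for s, p, o in facts:
        if p == "hasStructure":
            by_structure.setdefault(o, set()).add(s)
    inferred = set()
    for members in by_structure.values():
        for a in members:
            for b in members:
                if a != b:
                    inferred.add((a, "structurallyRelatedTo", b))
    return inferred


def forward_chain(facts, rules):
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts) - facts
        if not new:
            return facts
        facts |= new
```

Starting from three asserted facts, `forward_chain(FACTS, [shared_structure_rule])` adds six inferred `structurallyRelatedTo` triples (one per ordered pair), connecting the symphony, the poem and the equation even though no one linked them directly.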
Beyond the Wikipedia definition for the Semantic Web, then, the Semantic Web's promise is to enable people to explore, associate, and connect information to build new knowledge. This sounds a lot like what Vannevar Bush described in "As We May Think" as the Memex (see Chapter 2, "Vannevar Bush and Memex," by Ronald D. Houston and Glynn Harmon in ARIST 41, for a fab overview of the perceptions of this paper since its publication).
Drawing of Bush's theoretical Memex machine (Life Magazine, November 19, 1945)
The key part of the Memex is making and sharing associations among divergent sources. Bush imagined a profession of "trail blazers" (section 8 of As We May Think) emerging, who would go about creating these inferences and publishing them in new kinds of encyclopedias.
il n’y a pas de hors-memex (there is no outside-the-memex)
Bush's view assumes that there are encyclopedias, and then trails built with the memex through these encyclopedias.
But what if there was only the memex? In a sense that's what the Semantic Web suggests with its emphasis on links and on everything having its own unique identifier (URI). But does this idea of the Semantic Web as a vehicle that supports making associations - as a memex - get us closer to a readily translatable concept of the Semantic Web?
I'd like to tease out a few more parts of the memex idea as a way of addressing that point. First, Bush imagined the Memex very much as a tool for scientists - as a way to help researchers make sense of all the work that not only they themselves, but their colleagues as well, would be doing. This focus on the researcher is particularly appropriate for this exploration of possible paradigms because it focuses on the artefact of interest - the logbook - as an object supporting work in progress.
Second, the memex took in not only textual notes but images of observations the scientist would literally take with a camera while working. The memex is very much a multimedia repository not only of others' extant work but of the scientist's own work-in-progress.
I think this notion of work in progress, of personal work log, is critical. It is distinct from the read-only model of the Web, and moves towards a writerly as well as a readerly medium (to use Barthes's terms). More particularly, it adds a new dimension to linking: from elements presented as finished pages to elements which are rough and personal, and which may or may not (yet) be meant for publication. There is something of this middle or transitional ground happening on the Web. This entry is an example of it.
We see this writerly side in blogs, RSS feeds, tags, comments, ratings - all the places where Web 2.0 approaches are helping with more rapid publication of, and inter-commentary on, content on the Web. But even this new writerly approach is not quite what the memex gets at with its model of something almost as familiar to us as the printed page in a printed book: the scientist's notebook.
Scientist's Notebook as first pass at Semantic Web Analogue.
Notebooks can be filled with complete pages or with scraps of information. They can be used to capture formal studies like experiments, observations in field work, or notes for future reference. But they are all unlike what we think of as the Web in a particular way: the web is public; we use its protocols to publish work for others. Lab books and notebooks are personal, idiosyncratic, and in particular, they represent works in progress. We may find that we use semantic web technologies both locally/personally and in a distributed/public way. This view of part of the Semantic Web as *for the researcher*, or for the researcher's work in progress, is very memex'y. It's also very different from what the web has become. I think this blending of personal use with the semantic web's potential for automatic association of external resources is a significant shift in how most of us have been thinking about the semantic web.
Let me frame that last statement. There have been projects thinking about the semantic web desktop - using the semantic web as a personal or local server layer for data. There have also been projects like myTea which have imagined using the semantic web technologies to maintain transparent context histories as a great enabler for a bioinformatics lab book that can automatically track and record bioinformatics experiments as they develop.
i don't think, however, that anyone has previously proposed a paradigm, model, or analogue for the semantic web as a researcher's notebook. With links. Or, more properly, with memex, where the memex itself is an extension of the researcher's notes, observations, raw data, work in progress. I've already said that the page can't reflect the rich associative possibilities of what the semantic web promises, so one may ask how the analogue of a researcher's notebook, which is so idiosyncratic, could do any better.
One way is that it is possible in a notebook (or on a huge sheet of paper or on a whiteboard - other spaces of work in progress) to draw lines easily across notes to make connections. This past fall at UMD's MindLab, David Wang and i were looking at a way to draw these kinds of lines between known points in an ontology to help create rules/inferences that make new connections in the knowledge base.
Indeed, there may be a great deal more we can take from the qualities of a researcher's notebook as design prompts for capturing more of the semantic web. But one of the important components of this notion of the semantic web as notebook + memex is that it situates the Semantic Web conceptually within the realm of the human. It also situates the semantic web as something that can be part of a process in which the user is engaged. Right now, very few semantic web tools (mSpace, Haystack and Tabulator, to name a few) support direct authoring.
The idea of seeing the semantic web pulled into a researcher's context, where the notebook is constantly seeking associations to support the researcher's process, seems to me a compelling inversion of the usual models: instead of putting stuff out there, we are bringing stuff in here, working it, potentially sharing it, but first and foremost using it, munging it, creating with it to develop new knowledge. Process rather than end.

Heterogeneous, Implicitly Structured, Implicitly Associated Data Capture
Another notion of the notebook which seems interesting is that it also breaks with the page as a read-only, well-structured, well-presented information space. In the physical pages of the notebook, we see various forms of data entry, where long exegesis is rare compared to short bursts of information, what Michael Bernstein calls "information scraps".
These various uses of the page-as-surface, for a variety of forms of content, also demonstrate the personal, though frequently work-related, work-in-progress attributes of this popular form of content capture. Again, we do see examples of a kind of information scrap on the Web - one-liner blog entries pointing to news or other ideas, tags, recommendations, comments. Indeed, entire modes of communication have been built up around short messages like texts, or widgets that communicate only the weather. But these info bytes, if you will, unlike the info scrap, are again meant for publication - for someone else to be able to consume. Tags may be an interesting boundary object as they can be both personal markers - highly idiosyncratic - and group or public markers. But for the most part, the short bytes we find on the web are there for public consumption. Notebooks are workspaces, pre-publication resources, the working out of ideas.
This is why, for now in any case, i'm focussing on the idea of the notebook rather than the personal journal or log. The notebook or lab book is a place for taking notes on ideas; it is not the final forum for the ideas, but it is the gathering place for them. Another similar physical world analogue for this kind of working-out process is the notecard stack. Indeed, one of the earliest hypertext systems, NoteCards, attempted to emulate this system of idea capture and reordering. Spatial hypertext systems like Tinderbox have also capitalized on the affordances of moveable cards or small objects to capture ideas, where these ideas can be spaced out and clustered, and where space in the organization communicates a kind of meaning - at least to the author of the structures.

The attributes of the notecard stack that i find particularly relevant are the usual purpose of the stack and the kinds of data the cards hold. When i was in high school, we were taught a particular methodology for notecards as the way to prepare a research paper. There were idea cards, quotation/paraphrase cards, and bibliography cards. These cards could be created in any order as material was discovered or ideas occurred: "only one idea to a card; only one quotation per card," "only one reference per card" - the idea being, of course, that individual cards could be organized and reorganized spatially to get a picture of the developing paper. Not all cards would be used. Gaps could be detected. The organized cards could then be put into one pile, and the paper written effectively from the turning over or laying out of a set of cards at a time (one exercise required us to generate an outline of the paper from the cards before proceeding to the paper-writing).
The relevance of the notecard model to the concept of the semantic web as personal work space with associated public data is the integration of personal ideas with external sources: the idea cards backed up with the quotations from external sources. In the case of notecards, these associations are either manually created by the researcher/author, or are presented by (and thus attributed to) another author.
The goals are the same: building new knowledge by capturing one's own ideas, and working with others' - whether these are ideas that come up in a conversation with others and are hastily jotted down, or are captured from a published source. There is an interplay here, a making of meaning. I mentioned spatial hypertext: Mark Bernstein's Tinderbox software, as noted, very much follows the notecard paradigm to support just this kind of intermixed activity: it enables links to be copied from the web into cards, and of course enables other kinds of data to be written into the cards. It blends capture of the external with capture of the personal. So do many digital notebook applications, like the Circus Ponies one i'm using for drafting this entry; they don't have the nice spatial affordances of Tinderbox, however - they are more locked to the paper metaphor. Something neither of these fine programs has, which i think the notebook or notecard + memex could bring, is the automatic discovery of associations from both the personal and the external into the personal work space.
This is an idea that Max Van Kleek, Michael Bernstein, David Karger (at MIT) and i are pushing on right now from one angle, in something called "doing" (pronounced "doyng"), and that the Rich Tags project is pushing on in a related space. We're interested in finding ways/metaphors/paradigms to support the capture of personal structured data (like Michael's information scrap of a jotted-down number that is a phone number) so that it can first be associated with what Max has started grabbing - the local context - and from there, look at drawing in appropriate associated external contexts.
In a way the Haystack project modeled this ebb and flow of personal information like calendar events with external information such as flight bookings. It created an integrated view of these information resources so that they would be concurrently available. No one knew they were working with semantic web data, and the opportunity to explore across contexts (like the Beethoven example way at the start of this entry) was not there. I think this time we're asking the question: what would this new thing look like from the moment the computer is engaged? How might input mechanisms change? How might representations across applications-as-contexts differ if there was this collective "data soup" from which these contexts could draw/share?
For the moment, i'm imagining this context-rich interaction as the Semantic Web, and the way i'm thinking of it is as a researcher's notebook (we are all knowledge workers at some point) of work in progress. A notebook. With the Memex.
The reason that vision of notebook + memex appeals to me particularly is that it foregrounds an active engagement with the data - reading it, writing it, potentially sharing it for reuse. And i think that kind of in-process engagement with information - having work-in-progress data and external data blend for the development of new knowledge - is what the Semantic Web is about.
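As a rough illustration of turning an information scrap into structured data tied to its local context, here is a minimal Python sketch. It is not how "doing" or Rich Tags work; the phone-number regular expression, the `scrap:1` identifier and the predicate names are all hypothetical assumptions, and real phone-number detection is considerably messier.

```python
import re

# Hypothetical, deliberately loose pattern: a digit, at least six digit/space/
# punctuation characters, then a final digit (e.g. "617-555-0123").
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def structure_scrap(text, local_context, scrap_id="scrap:1"):
    """Turn a free-text scrap into (subject, predicate, object) triples.

    `local_context` maps hypothetical predicate names to values describing
    where and when the scrap was captured.
    """
    triples = [(scrap_id, "rawText", text)]
    # Recognise structure inside the scrap: here, only phone-number-like runs.
    for match in PHONE_PATTERN.finditer(text):
        triples.append((scrap_id, "mentionsPhoneNumber", match.group()))
    # Attach the local context, so the scrap can later be associated with external data.
    for predicate, value in local_context.items():
        triples.append((scrap_id, predicate, value))
    return triples
```

For example, `structure_scrap("call Sam 617-555-0123 re: talk", {"capturedDuring": "meeting:dig-f2f"})` yields three triples: the raw text, the recognised number, and the capture context, which is the hook for drawing in external data later.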




© m.c. schraefel, 2007, visiting fellow, DIG. The ideas in this exegesis were initially stimulated by various conversations at January's DIG face-to-face meeting, then explored later in January 2007 as part of a talk on work in progress while i was visiting the iSchool, University of Texas at Austin.