Expeditions

4. Project Description

Abstract

A common dream through the ages is of humanity empowered by technology. The visions of Vannevar Bush and Douglas Engelbart, and the disruptive developments of the Internet and the World Wide Web, were predicated on the technology base available at the time. We look ahead to a time when what is today the latest technology has been widely deployed, when processor speed, network bandwidth and interface technology are orders of magnitude beyond those of today. We ask what sort of software will give a person the ability to master her immersion in the accumulated knowledge and the constant interactions of humankind, and to manipulate and contribute to them. Our expedition into the necessary software architectures is driven from the human interface, by the tasks of scientists, learners and teachers, rich or poor. We allow ourselves to dream again, afresh, of the ultimate human interface. Then we determine what underlying technological developments will be necessary to support it, in terms of powerful processing systems and their interconnection. This expedition will take us, we think, into distributed computation, inference and query; into social machinery, including policy-aware systems of privacy, trust and accountability. But we cannot be completely certain.

Introduction

When Vannevar Bush dreamed (1945) of a scientist enabled by technology, that scientist sat at a desk full of microfilm, photosensors, and motors. Doug Engelbart (1960s) put him in front of a monochrome vector graphic screen, with the power of a multi-user mainframe, with no external connectivity. The first web users (1990) sat at 1-megapixel grayscale displays, with kbps connectivity and a few Mips of processing power. Today (2007), a typical web user sits at a screen of a few megapixels, with a few Gips of processing power and connectivity at a few Mbps, to a web of around 10^11 documents. Extrapolating, we can only assume that our power, our connectivity, and the resources available to us will again be dramatically greater.

Interestingly, over the last 17 years, while AJAX clients have improved the interface, and an impressive creativity has been applied to variations in web site design and business models, little has changed in the basic functionality of the interface and the information space. This expedition assumes that the coming increase in power will in fact enable disruptive changes in the way technology empowers humanity. It also assumes that the magnitude of the online space of people, knowledge and, increasingly, agents will be overwhelming if tackled using only the existing technology. It assumes, thirdly, that the magnitude of the problems which we face will be such that only the concerted effort of humanity, operating with a joint power much greater than that of the individual, will be able to solve them.

Motivation

It is appropriate to recall that the motivations of innovators through the centuries have been rooted in the driving concerns of their time. The global desires for peace, the eradication of poverty and disease, and the elevation of standards of living and education call for collective decision making, collective knowledge, and collective creativity. We seek to support these functions with technology which allows new social systems, more effective, fair and stable, to be developed over the connected medium. The management of trust, and the selection of scientific truths, are human systems which can be increasingly mediated by technology. Such systems in turn will be enabled by fundamental advances in the functionality of the information system. This expedition seeks to provide a stepwise more powerful platform for an individual, but an individual always as a member of a community: in fact, of many overlapping communities. We do not expect to cure diseases, or even to create the socio-technical structure which will be able to cure a disease. We aim, though, to create the technology platform which will allow such structures to be created intuitively. We aim to prototype such a system, so that it can be deployed as it becomes possible, and later as it becomes critically essential.

Assumptions

These are the assumptions we make and the techniques we envisage employing to reach these goals. Of course, at this stage these are hunches, based on past experience and analogy with other cases. The nature of research is to overturn assumptions and discover completely new techniques. So this, then, is a starting position, an initial attitude.

Web of people

Firstly, our approach will recognize that every subsystem which grows on the Internet is a product of both a technical protocol and an interlinked social protocol. [@@ref web science] Neither the design of new systems nor the analysis of their effect can be done effectively by attempting to take only one aspect into consideration. (The motivation is, at root, social, as a social good is generally the goal of the design.) Our user, whether scientist or underprivileged learner, is part of a social community, and the transfer of knowledge happens within social protocols, just as with today's wikis and blogs.

Secondly, we recognize that the web of people interconnected by technology is even now very large (10^11 pages, perhaps ~10^9 people @@?) and is no homogeneous mass. It is complex, its large-scale behaviors following from the small-scale design in a non-obvious fashion. Indeed, it exhibits scale-free properties [@@ref] which make it in many ways more effective than other forms of network [@@ref]. We believe that the design of new systems should explicitly take this into account and leverage it in order to achieve the project goals.

((As an example, as a user expresses information, they will be prompted for international terms as a preference, then national or domain-specific ones, and lastly given the option of making new terms in a new ontology. As another example, we expect quality of data to follow from a competitive, bubbling ecosystem of different systems of review and filtering, from that proposed locally through to that trusted internationally.)) [@@ example necessary?? Remove?]

Graph of concepts

A significant element of our approach will be to build on semantic web technology. This is an innovation which has been progressively developed over the last decade, though it has not (yet) had a disruptive effect. We believe it represents an important paradigm shift which will be a significant element of the next major change. It is the latest in a series of shifts of plane of thought toward greater levels of abstraction from the infrastructure. The Internet allowed programs to communicate without having to concern themselves with the cables over which the communication was going to flow. The web allows programmers and users to work with a set of interconnected documents without concerning themselves with the computers which store and exchange those documents. The web of things raises us to the next level, by allowing programmers and users to work with real things --- people, chemicals, cars, agreements, stars, whatever their interest --- without concerning themselves with the documents in which these things, abstract and concrete, are described. While the basic technology for interchanging such data has been defined [@@ref] and further components of the architecture are in the process of design [@@RIF], there has been very little work on understanding the impact of this for the user interface. Still less work has been aimed at the dramatic reinvention of human interface technology which we seek.
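
To make the shift concrete, the following is a minimal sketch in Python, with purely hypothetical URIs and property names, of working with a graph of things rather than a set of documents: every statement is a link between identified things, and a description of a thing can be assembled without reference to the documents that carried the statements.

    # Minimal sketch: a graph of things identified by URIs, independent of the
    # documents that describe them. All URIs and property names are hypothetical.

    triples = {
        ("http://ex.org/person/alice", "worksOn", "http://ex.org/project/drugX"),
        ("http://ex.org/project/drugX", "targets", "http://ex.org/chem/proteinY"),
        ("http://ex.org/chem/proteinY", "label", "Protein Y"),
    }

    def describe(subject):
        """Return every (property, value) pair asserted about a subject,
        regardless of which document or server the assertion came from."""
        return [(p, o) for (s, p, o) in triples if s == subject]

    print(describe("http://ex.org/project/drugX"))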

We envisage the abstract web of objects and the social web being mutually supportive and inextricably intertwined. The graph of data will depend on the underlying social system to function, while the social system will in fact be implemented using the graph of data.

Research Challenges

Human Interface -- @@ Karger contribute

Our own existing work has pioneered semantic web user interfaces, specifically the management of a large merged dataset [@Haystack], an interface to the open world of linked data [@@tabr] and, recently, a read-write interface to the same space. Our goals, however, are much higher: that the interface allow the correction, extension and annotation of information in the graph; that changes made (by human input, machine inference, or by sensor) to the graph of knowledge propagate rapidly to all those whose interfaces include a view of that data; that the system be robust against the loss of central servers, operating efficiently in an ad-hoc peer-to-peer mode. These are features which have been developed individually, but not applied to the situation of an interface to the fully deployed semantic web.

Our expedition into the necessary software architectures is driven from the human interface side. We take as our target scenarios the tasks of scientists, learners and teachers, whether in rich or poor environments. We allow ourselves, like Bush and Engelbart, to dream of the ultimate human interface, but this time with new understandings of the power of machines, and a stronger platform of existing technology as a base. Then we determine what underlying technological developments will be necessary to support the functionality we desire, in terms of the powerful processing systems and their interconnection which we assume will be at our disposal in the near future.

The systems we have built have allowed users to explore links across the web of data, to find a pattern, and then to generalize by searching for all similar patterns. The functionality has been crudely implemented, and has been slow in operation, even with amounts of data (100-1000 results) which are very much smaller than the result sets which we may expect at the more populous, highly connected end of the scale in the future web of things.
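
As an illustration of the kind of operation involved, the following sketch (with hypothetical data and names, not our implementation) turns a concrete chain of links found by browsing into a template with variables and searches the store for every similar pattern; the challenge is to do this at web scale and at interactive speed.

    # Minimal sketch of "find a pattern, then generalize": a concrete chain of
    # links found by browsing is turned into a template with variables, and the
    # store is searched for every other match. Data and names are hypothetical.

    triples = [
        ("alice", "worksOn", "drugX"), ("drugX", "targets", "proteinY"),
        ("bob",   "worksOn", "drugZ"), ("drugZ", "targets", "proteinQ"),
    ]

    def generalize(pattern):
        """pattern is a list of (s, p, o) where strings starting with '?' are
        variables; yield every binding of the variables that matches the data."""
        def match(remaining, binding):
            if not remaining:
                yield dict(binding)
                return
            (s, p, o), rest = remaining[0], remaining[1:]
            for (ts, tp, to) in triples:
                b = dict(binding)
                ok = True
                for want, got in ((s, ts), (p, tp), (o, to)):
                    if want.startswith("?"):
                        if b.setdefault(want, got) != got:
                            ok = False
                    elif want != got:
                        ok = False
                if ok:
                    yield from match(rest, b)
        yield from match(pattern, {})

    # "Who works on anything that targets anything?" -- generalized from the
    # single concrete example the user browsed to.
    for binding in generalize([("?who", "worksOn", "?d"), ("?d", "targets", "?t")]):
        print(binding)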

Research challenge: Explore possible completely new user interface metaphors. Existing work has, in the cause of easy adoption by users, been careful to use existing user interface metaphors wherever possible. However, this may have unintentionally constrained the design and the total functionality of the interface. This challenge is to investigate the user interface space as unconstrained as possible by existing applications, looking for new ways of using the hardware which we extrapolate will be available in the future.

Research challenge: Allow awareness of the social implications of provenance. Throughout the design, an integrated view of data may be presented directly to the user, combining information of diverse provenance. Immediately to hand, though, for the discerning user, indeed the safe user, must be the provenance of any specific item of information, or rather the social properties, such as trust, acceptable use and licensing, associated with that provenance. In 2007 this is already an issue for normal web pages and browsers [@@ref Phishing]; the issue is clearly more complex when information has been merged and in some cases inferred from many sources. Our minor investigations in earlier work [@@ref Tab www2007 paper] serve to illustrate the magnitude of the challenge.
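
One plausible shape for such support, sketched below with hypothetical sources and metadata, is a store in which every statement carries the source it came from, and every source carries the social properties (trust, licensing) that the interface must keep immediately to hand when displaying merged data.

    # Minimal sketch, assuming a quad store: each statement carries the source it
    # was loaded from, and each source carries social metadata (trust, license)
    # that the interface can surface next to any displayed item. All names are
    # hypothetical.

    quads = [
        ("drugX", "dosage", "5mg",  "http://labA.example/data"),
        ("drugX", "dosage", "50mg", "http://random-wiki.example/page"),
    ]

    source_info = {
        "http://labA.example/data":        {"trust": "peer-reviewed", "license": "CC-BY"},
        "http://random-wiki.example/page": {"trust": "unreviewed",    "license": "unknown"},
    }

    def display(subject, prop):
        """Show merged values together with the social properties of their provenance."""
        for s, p, o, src in quads:
            if s == subject and p == prop:
                meta = source_info.get(src, {})
                print(f"{subject} {prop} {o}   [from {src}; "
                      f"trust={meta.get('trust')}, license={meta.get('license')}]")

    display("drugX", "dosage")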

Research challenge: Allow control and awareness of the disposition of new input. Consider a user inputting information into this system. Just as a user reading must be aware of the social implications of provenance, so a user must be aware of, and in general in control of, the social issues around the information he or she is putting in. The current classic problem of the teenager sharing inappropriate things with a wide public on social networking sites is a 2007 example of a general issue in user interface design. This too is exacerbated by the complexity of the semantic web.

Research challenge: Enable the evolution of user-constructed social machines. The elaboration of the web of knowledge by the global team of human editors does not take place in an unconstrained orgy of expression, but through the mechanics of a host of interconnected social protocols. These include scientific review, group formation, election, appeal, liaison, negotiation, and argumentation, to name a few. Social machines (BL WTW 1999?) are such systems supported by technology. The 'web 2.0' wave of user-generated content sites typically involves isolated machines. We hypothesize that (a) there are forms of social machine which will be significantly more effective than those we have today; (b) these protocols interlink in society and must interlink in the web; and (c) such machines are unlikely to be developed in a single deliberate effort within a single project, but, if the whole user population is able to construct and adapt them, they will evolve.

Infrastructure for Social Responsibility @@Danny

It seems evident from our work to date [@@ref PAW, TAMI, etc] that the social machinery for effective systems needs a certain level of common support for socially responsible systems throughout the infrastructure. This policy awareness includes the limiting of access for privacy, but also the tracking of acceptable uses of information as it is passed around and combined.

If the computational infrastructure of the future Web is going to be able to provide the capabilities discussed in the introduction, machine-readable descriptions will need to be designed not only of the information content (via Semantic Web technologies) but also of acceptable use and other privacy- and trust-related policies and practices. In this section, we summarize both the need for an end-to-end accountability framework and some of the underlying reasoning challenges in making the Web Policy Aware.

Provenance and Dependency tracking @@ Gerry

An effective user interface for a collaborative context would ideally allow changes to data to be entered by others, or streamed from other sources such as sensors, and to cause an immediate change in the display presented to the user. This is not possible using existing architectures, in which web resources tend to be loaded once, or queried once, to build a display which is static thereafter. To support the direct propagation of changes, existing triple store functionality will need to be replaced by Truth Maintenance Systems (TMS) [@@ref]: stores which can retract data, and any inferences arising from data from a certain source. A goal is that a user might control the social criteria which select sources (such as, for example, the social standing of the author, or the endorsement of the source by a party in a given class), and that this should allow a filtered view to be given, based only on that information: a hypothetical view, fundamental to exploring possibilities.
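
The following is a minimal truth-maintenance sketch, with a single hypothetical rule and hypothetical sources, showing the behavior we require of the store: a derived statement records the base statements that justify it, so retracting a source withdraws its consequences as well.

    # Minimal truth-maintenance sketch: every derived fact records its
    # justifications (the base facts it depends on), so retracting a source
    # also retracts everything that rested only on that source.
    # The rule, facts, and sources here are hypothetical.

    from collections import defaultdict

    base = {}                     # fact -> source it was asserted from
    support = defaultdict(list)   # derived fact -> list of justification sets

    def assert_fact(fact, source):
        base[fact] = source
        propagate()

    def retract_source(source):
        for fact in [f for f, s in base.items() if s == source]:
            del base[fact]
        # keep only derived facts that still have a justification of surviving base facts
        for fact in list(support):
            support[fact] = [j for j in support[fact] if j.issubset(base)]
            if not support[fact]:
                del support[fact]
        propagate()

    def propagate():
        # one hypothetical rule: worksOn(x, p) and targets(p, t) => studies(x, t)
        for (r1, x, p) in list(base):
            if r1 != "worksOn":
                continue
            for (r2, p2, t) in list(base):
                if r2 == "targets" and p2 == p:
                    just = {("worksOn", x, p), ("targets", p, t)}
                    key = ("studies", x, t)
                    if just not in support[key]:
                        support[key].append(just)

    assert_fact(("worksOn", "alice", "drugX"), "srcA")
    assert_fact(("targets", "drugX", "proteinY"), "srcB")
    print(("studies", "alice", "proteinY") in support)   # True
    retract_source("srcB")
    print(("studies", "alice", "proteinY") in support)   # False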

Research challenge n: To build a system supporting this functionality, in which changes can propagate both ways, poses some constraints on the logic which can be involved in the inference. Determine propagation and retraction algorithms which function with various logics, and investigate the fundamental and computational limitations of these alternatives within a scale-free web.

Security and privacy

Core design principles of today’s Internet and World Wide Web are aimed at moving data packets from one machine to another as simply and quickly as possible. Content-obliviousness of basic communication protocols is a huge asset in the fulfillment of this over-arching goal. Unfortunately, it is also a major obstacle to stakeholder control of data and networks, e.g., to the achievement of user privacy, enterprise-network security, and after-the-fact accountability. Whether one can, by modifying the basic web architecture, preserve the flexibility and scalability of today’s web while enabling information stakeholders to achieve privacy, security, and accountability is a major open question.

Research challenge 3: Explore network-architectural principles that enable controlled dissemination of personal information and robust protection of web-based resources. Are these principles consistent with the dynamism, heterogeneity, scalability, and human-centered nature that we envision for the future web? Formulate precise definitions of these seemingly contradictory goals, and explore the existence of provable, quantifiable trade-offs among them.

Tracking Appropriate Use

Another capability needed in Policy Aware systems is the ability to apply different policies appropriately in different situations, based on their use contexts. For example, consider medical personnel needing access to an accident victim's medical information in an emergency. The normal mechanisms for gaining such access may be prohibitively time-consuming, but ignoring the information (as is done in most current situations) can often lead to complications and/or loss of life. A context-aware solution could allow the emergency personnel to override controls while being warned "You are breaking the law unless this is being done in a life-threatening situation (and this access will be logged)." Developing an infrastructure that can allow such context-aware capabilities is an important aspect of reaching the expedition's vision.
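
A minimal sketch of the intended behavior follows; the policy, record structure, and context label are all hypothetical. The point is that the override path is itself governed: it warns the user and leaves an audit trail rather than simply failing or silently succeeding.

    # Minimal sketch of a context-aware override, assuming a hypothetical policy
    # check: normal access control is enforced, but a declared emergency context
    # lets the request through with an explicit warning and an audit-log entry
    # that can later be held to account.

    import datetime

    audit_log = []

    def request_record(user, record, context):
        if user in record["authorized_users"]:
            return record["data"]
        if context == "life-threatening-emergency":
            audit_log.append({
                "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "user": user,
                "record": record["id"],
                "context": context,
            })
            print("WARNING: you are breaking the law unless this is a "
                  "life-threatening situation; this access has been logged.")
            return record["data"]
        raise PermissionError("access denied")

    record = {"id": "patient-42", "authorized_users": {"dr_house"},
              "data": "allergy: penicillin"}
    print(request_record("paramedic_kim", record, "life-threatening-emergency"))
    print(audit_log)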

Research challenge: Provide a context mechanism that is strong enough to recognize extenuating circumstances yet flexible enough to allow overriding (with audit) in appropriate situations. Current logics of context <cite> generally provide one or the other of these capabilities, but not an acceptable mix that can both let humans make the final decisions with respect to use and also guarantee accountability (see section xxx) so as to avoid malicious or improper access. We will explore this issue by exploring the integration of truth-maintenance systems [@@ref] with context mechanisms and accountability logics [@@ref]. This will be based on our past NSF-funded work (sections XX.x and XX.y).

Support for Differing Interpretations

Just as contexts are required in information access, they are also needed in interpreting data being explored via datamining or other machine-based analysis mechanisms. A common flaw in today's datamining algorithms is that they rely on a feedforward mechanism which is assumed to provide the, rather than an, account of the data. Yet it is exactly the fact that different scientists can interpret the same data in different ways that provides for the argumentation and testing so crucial to scientific discourse. Mechanisms that can allow different communities to simultaneously have their own interpretations of data resources, but also understand the interpretations of others, will be a powerful enabler to the [[what are we calling it]] of the future Web.

Research challenge: Design scalable mechanisms that allow multiple interpretations and visualizations of data resources to be developed simultaneously. Current datamining and visualization techniques generally are designed such that interpretation biases are hidden in procedural code or in problem encodings. Making the different ontological commitments of competing interpretations explicit, and linked together, can provide a mechanism for different views of data to be simultaneously developed and explored. We will start from current work in Web ontologies developed on the Semantic Web [@@ref], extending them to incorporate techniques for user communities to better identify the implicit biases of different analyses and to explicate and share these varying interpretations. This work will also include interface work in making it possible to develop and link the vocabularies, tie these to the analyses of data descriptions, and explore how the data would be interpreted based on alternate communities' approaches.
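
The following toy sketch (with hypothetical thresholds, communities, and observations) illustrates what it means for interpretation biases to be explicit: the same raw data is classified under two declared ontological commitments, and the resulting disagreement is inspectable rather than buried in procedural code.

    # Minimal sketch: the same raw observations interpreted under two explicit,
    # competing commitments, kept side by side rather than hidden in analysis
    # code. Thresholds, communities, and data are hypothetical.

    observations = [("site1", 3.2), ("site2", 7.9), ("site3", 5.1)]  # pollutant level

    interpretations = {
        "agencyA": {"polluted_if_above": 5.0},
        "agencyB": {"polluted_if_above": 7.5},
    }

    def classify(community):
        commitment = interpretations[community]
        return {site: ("polluted" if level > commitment["polluted_if_above"] else "clean")
                for site, level in observations}

    for community in interpretations:
        # The two communities disagree about site3, and the disagreement is
        # explicit and inspectable because each commitment is declared.
        print(community, classify(community))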

Computation in the Web - @@Joan

The twin challenges of the web, of finding things and of filtering the available information (finding the needle in the haystack, and drinking from the firehose), will, we expect, become orders of magnitude more challenging. Even fairly simple manipulations of the user interface could pose questions ("What is the average age of all people?") which, naively interpreted, would demand large amounts of computation and/or access to data from all over the world. We choose to investigate decentralized rather than centralized solutions to this problem, even though centralized indexes have proven possible at current web scale.
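
As a sketch of the decentralized direction (with hypothetical nodes and data), a question like the average age of all people need not ship every record to one place: each node can return a small partial aggregate, and partial aggregates combine associatively on the way back through the network.

    # Minimal sketch of answering "what is the average age of all people?"
    # without central collection: each node returns only a partial aggregate
    # (sum, count), and partial aggregates combine as they travel back.
    # Nodes and data are hypothetical.

    nodes = {
        "nodeA": [34, 29, 71],
        "nodeB": [18, 45],
        "nodeC": [52, 60, 41, 38],
    }

    def local_aggregate(ages):
        return (sum(ages), len(ages))

    def combine(a, b):
        return (a[0] + b[0], a[1] + b[1])

    total = (0, 0)
    for node, ages in nodes.items():   # in practice: concurrent and peer-to-peer
        total = combine(total, local_aggregate(ages))

    print("average age:", total[0] / total[1])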

@@ Behind all this UI we will need efficient computation. We imagine that the processing power and data will be distributed. (Not obvious, but we do.)

Algorithms and complexity theory

Classical theory of computation (ToC) provides computational models, formal notions of “efficient computation,” and both positive (i.e., algorithmic) and negative (i.e., hardness) results about whether various problems of interest can be solved efficiently. Network connections between computers are present in some of these models, e.g., in those that address “parallel computation” and “distributed computation.” However, even these parallel and distributed models fail to capture the nature of web-based computation. New models and definitions will be needed for the design and analysis of algorithms for the future web (and for proofs that there are no good algorithms for some problems of interest).

Theory of web-based computation is unlikely to be just a small variation on the existing theory of distributed computation; some radically new ideas will probably be needed. Even something as central as the classical ToC paradigm of an algorithm that operates on a fixed problem instance and terminates (or “converges”) with a fixed, correct output after a maximum number of steps that is a function of the size of the problem instance is inadequate for web-based computation, where there may not be a fixed problem instance, and computations may never terminate. For example, a large-scale, web-based game or conference need never actually “converge” on a correct “output” but rather may run continuously and indefinitely, as participants come and go and adapt their strategies. Algorithms that support these activities will have to be formulated and analyzed in novel ways.

Research challenge: Formulate the definition(s) that a computational system must satisfy if it is to be called a “web.” Which critical resources are consumed in web-based computation, and what upper bounds on the consumption of these resources must be satisfied for such a computation to be considered “efficient”? Formulate notions of “reduction” that can be used to prove that one web-based computational problem is at least as hard as another or that two such problems are equivalent.

Research challenge: Identify core problems of web-based computation. Prove hardness results for those core problems for which efficient algorithms cannot be found, and investigate reformulations of these problems (or re-design of the future web) that prevent them from becoming barriers to fruitful web-based activity.

Web-based human-aided computation

One intriguing research direction is the creation of formal foundations for the nascent field of human-aided computing. One big success of this field is the creation of CAPTCHAs (Completely Automated Public Turing Tests to Tell Computers and Humans Apart), which are tests that distinguish humans (who are the intended users of web-based services) from computers (which can be programmed to abuse these services), by posing problems that are apparently hard for computers but easy for humans. CAPTCHAs are in widespread use today. Providing theoretical foundations for human-aided computation is a particularly novel challenge. Many observers have celebrated the “democratization” of the information environment that has been wrought by blogs, wikis, chatrooms, and, underlying it all, powerful search. More human input to the search process will make the information environment even more democratic, but it will also strain the algorithmic and mathematical foundations of correctness and information quality that have traditionally been present in the technological world. Trust, noise, and scalability all play a part in human-aided computation, and these words mean different things when applied to humans from what they mean when applied to computers.
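
One very simple form such aggregation might take is sketched below, with hypothetical labelers, trust weights, and items: each human answer is weighted by an estimate of the answerer's reliability, and the weighted majority is taken per item. The research challenge is to put constructions like this on a sound theoretical footing in the presence of noise, collusion, and web scale.

    # Minimal sketch of aggregating noisy human answers: each labeler has a
    # trust weight, and the answer chosen for each item is the weighted
    # majority. Labelers, weights, and answers are hypothetical.

    from collections import defaultdict

    trust = {"ann": 0.9, "bob": 0.6, "eve": 0.2}

    answers = [   # (labeler, item, answer)
        ("ann", "img1", "cat"), ("bob", "img1", "cat"), ("eve", "img1", "dog"),
        ("ann", "img2", "dog"), ("bob", "img2", "cat"), ("eve", "img2", "cat"),
    ]

    votes = defaultdict(lambda: defaultdict(float))
    for labeler, item, answer in answers:
        votes[item][answer] += trust[labeler]

    for item, tally in votes.items():
        winner = max(tally, key=tally.get)
        print(item, "->", winner, dict(tally))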

Research challenge: Develop the theoretical foundations of web-based, human-aided computation; in particular, develop algorithms that allow web-connected computers to leverage and aggregate the results of millions of human actions. Explore the power and limitations of increasing human involvement in web-based computation.

Controlling Inconsistency -- @@ Jim

In creating an infrastructure that can aid humans in finding, using and controlling information, many things that are currently encoded in procedural ways will need to become more declarative. This is already happening on the Web as database descriptors include business rules and other processing "logics" [@@ref], as scientific workflow tools become more prevalent and usable [@@ref], as service-based description policies continue to be extended [@@ref] and as Semantic Web technologies result in the creation of more ontologies with machine readable descriptions of domain relationships [@@ref]. Standardization of rule formats for the Web is an ongoing activity [@@ref] and research projects have shown the viability of new technologies for logic- and proof- based Web interactions (including the NSF funded projects of the PIs described in section XX.x and XX.y). However, how to make these reasoning-based approaches function in the Web setting remains a challenging problem.

Research Challenge xxx: Explore computationally appropriate logics for handling the necessarily inconsistent and open system of [[@@whatever we are calling it]]. The open and distributed nature of the future Web will require that rule sets be linked together, and that policies defined in one domain may need to be used in another, possibly unanticipated, context. Formal logics that are powerful enough to encode Web policies generally have problems both with tractability (even restrictive subsets may have exponential or undecidable behaviors) and with inconsistency – from a contradiction it is possible to derive any proposition. However, in an open system, inconsistency is sure to arise from error, disagreement, or malicious behavior. Logics that control contradiction, whether intuitionist or paraconsistent, have been explored [@@ref], but none has yet been shown to function in a Web-scale environment. We will explore this issue by augmenting annotated logic programs [@@ref], which have good scaling properties, but which require extensions to be expressive enough for Web policies.
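
The flavor of the approach is sketched below (with hypothetical propositions and sources): each proposition carries its own set of truth annotations, so a contradiction is recorded and contained at that proposition rather than allowing arbitrary conclusions to follow from it, as it would under classical logic.

    # Minimal sketch of keeping inconsistency local, in the spirit of annotated /
    # paraconsistent approaches: each proposition carries annotations drawn from
    # {"true", "false"}, so a contradiction on one proposition is marked as
    # "both" instead of infecting every other proposition. Names are hypothetical.

    from collections import defaultdict

    annotations = defaultdict(set)   # proposition -> subset of {"true", "false"}

    def assert_annotated(prop, value):
        annotations[prop].add(value)

    def status(prop):
        a = annotations[prop]
        if a == {"true", "false"}:
            return "both (inconsistent)"
        if a:
            return next(iter(a))
        return "unknown"

    assert_annotated("drugX-is-safe", "true")    # from one source
    assert_annotated("drugX-is-safe", "false")   # from a conflicting source
    assert_annotated("drugX-targets-proteinY", "true")

    print(status("drugX-is-safe"))            # both (inconsistent) -- contained
    print(status("drugX-targets-proteinY"))   # true -- unaffected by the conflict
    print(status("unrelated-claim"))          # unknown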


4b) Leadership and Collaboration - Danny

- Working with W3C for Tech transfer to standards

- Working with WSRI to generate Web Science curriculum materials

- Including budget to actually build out a system more useable than a raw prototype, so that people will use it and we can explore how they do and what further research is necessary.

Data and testbeds

We attempt to research software architectures and techniques for an age in which large amounts of data are available. This will require prototyping with some significant

Metrics for success

- We choose scenarios, such as, perhaps, a scientist in the process of investigating possible drugs, or a student investigating the possible correlation between the sale of particular recreational products and ecological factors such as pollutants, or a... @@ insert here


FODDER

From Jim:

Reasoning Challenges for the Future Web