Semantic Web 2009 - Research Papers
Reviews of submission #194: "Provenance of Inferences: A Trail of Proofs in Linked Open Data"

------------------------ Submission 194, Review 1 ------------------------

Reviewer: external

Overall Rating
  Must reject: It is important to reject this paper
Originality
  Poor
Technical Soundness
  Poor (some major issues)
Presentation
  Horrible: Needs shepherding (help)
Expertise
  2 (Passing Knowledge)

Summarize the Scientific Contribution
  The paper describes a software prototype that crawls linked open data, applies N3 rules to the data, and uses SPARQL Update to publish the entailments and "proof" data.

Summarize your review
  The paper should be rejected because it does not offer a novel scientific contribution, it is poorly organized, it contains many grammatical errors that significantly reduce readability, and it does not adequately address related work in the field.

The Review
  The paper offers a description of a prototype system that crawls linked open data and computes entailments based on N3 rules. The use of "real world" examples throughout the paper makes it accessible to the average reader, but the lack of formal technical content is disappointing.

  The paper describes a system, but does a poor job of explaining the motivation for that system. It systematically ignores how issues of data quality impact trustworthiness. E.g., in the first paragraph of Section 2 the authors argue that an attached triple has "importance based on the number of proofs supporting the same derived fact," but they make no mention of the veracity of the data that constitutes the proof.

  The paper is poorly organized. E.g., the motivation and use cases sections contain overlapping content, and neither contains the expected content. It is unclear why Section 2.3 "Desired Features" appears in the motivation section. Section 3.1 and Section 2.1 have considerable overlap.

  The paper does not address a great deal of related work, examples of which are included in the comments to the authors.

Comments to Authors
  The paper provides a brief system description but does a poor job of making clear what the novel contribution of this work is. If the paper is submitted elsewhere, you should make the contribution of this work explicit.

  The paper should be revised to be more careful with the use of terms like trustworthiness and proof. It conflates the issues of justification/proof and veracity, and should be more specific about which problem it is trying to address.

  The presentation of "linked data garbage collection" does not appear to be complete. The programming-language garbage collection analogy is not helpful. In particular, what in your delete scenario corresponds to an in-scope variable? I.e., if un-justified triples are deleted from the data web and you have not described a way to distinguish between ground facts and entailments, then either a cycle must exist or all data will be deleted (see the sketch below).

  This paper contains numerous grammatical errors that would have been identified by a careful reading by a native English speaker. These types of errors make reading the paper difficult and put a significant extra burden on the reviewer.

  The W3C DAWG activity standardizing SPARQL extensions, including update, should be referenced and its relevance discussed.

  Justification of logical entailment has a rich literature that is ignored by the paper. Consider, as a starting point, the ISWC 2008 best paper, "Laconic and Precise Justifications in OWL."
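  To make the garbage-collection concern concrete, here is a minimal sketch of such a sweep; the triple names, the proof structure, and the procedure below are invented for illustration and are not taken from the paper. Without a way to mark ground facts as roots, only cycles of mutual justification survive and everything else is deleted:

```python
# Hypothetical illustration (invented names, not the authors' algorithm):
# each triple maps to a list of proofs; each proof is a list of premise triples.

def sweep(store):
    """Repeatedly delete triples that have no proof whose premises are all still present."""
    store = dict(store)
    while True:
        dead = [t for t, proofs in store.items()
                if not any(all(p in store for p in proof) for proof in proofs)]
        if not dead:
            return store
        for t in dead:
            del store[t]

store = {
    "t_asserted": [],                # a ground fact: no recorded proof
    "t_derived":  [["t_asserted"]],  # entailed from the ground fact
    "t_cycle_a":  [["t_cycle_b"]],   # two triples that justify each other
    "t_cycle_b":  [["t_cycle_a"]],
}

print(sweep(store))
# Only t_cycle_a and t_cycle_b survive: the ground fact looks "un-justified"
# and is swept, taking its entailment with it.  Treating ground facts as
# never-collectable roots is exactly the distinction the paper needs to spell out.
```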
  Reference 5 appears to have an incorrect title: it does not match the link, and it instead matches reference 3.

------------------------ Submission 194, Review 2 ------------------------

Reviewer: external

Overall Rating
  Should reject: I argue for rejecting this paper.
Originality
  Poor
Technical Soundness
  Poor (some major issues)
Presentation
  Horrible: Needs shepherding (help)
Expertise
  3 (Knowledgeable)

Summarize the Scientific Contribution
  The main contribution of this paper is that it proposes to generate links from the linked data cloud and to preserve the provenance trails of those links, so that these computations become permanent and are at the same time kept consistent through a garbage collection process.

Summarize your review
  The paper tackles an interesting aspect of the maintenance of the linked data cloud. However, many of the aspects are dealt with in a naive way that does not adequately, or in a principled manner, consider the complexity of dealing with such a wealth of distributed knowledge.

The Review
  As described above, I consider the problem tackled by the authors to be very relevant for the linked data community, in the sense that it may be an important alternative for making the links that can be generated across distributed RDF sources / SPARQL endpoints, by means of rules or any other method, permanent, so that they can later be exploited more easily, while still maintaining the consistency of the obtained results when the original data change.

  However, my main concern is that the authors refer only briefly to the large body of work that was done two decades ago on truth maintenance systems; this work and the lessons learned from it should be reviewed more carefully. For instance, the garbage-collection-based proposal presented in Section 2.1 seems too naive to solve many of the problems that may appear, especially when taking into account that the update of distributed sources may be done in parallel by different data owners.

  This is not the only case where the proposals seem rather simple and do not go into the necessary details. For instance, the statement that just by using rdf:label the data will be multilingual is quite simplistic and hides much of the complexity that may be involved in the creation of real multilingual links among data sources.

  Besides, there is no clear description of how rules are executed in this environment. The authors rather concentrate on issues related to content negotiation and the management of the HTTP protocol in different situations, instead of describing how a distributed rule engine may work. It seems that the work will be done in a centralised way instead. I am not against this (in the current Web there are examples of systems that work in a centralised fashion and perform very well, e.g., Google), but there should be a clear description of how the rules are encoded, how they are executed across sources, etc.

  Finally, there is a lack of a proper evaluation of the proposed solution, focusing on either theoretical advantages and disadvantages or experimental ones.

Comments to Authors

------------------------ Submission 194, Review 3 ------------------------

Reviewer: external

Overall Rating
  I do not care what happens to this paper
Originality
  Fair
Technical Soundness
  Excellent (flawless)
Presentation
  Passable: Comparable to the usual
Expertise
  4 (Expert)

Summarize the Scientific Contribution
  This is mostly a vision statement and a draft of how it could be implemented: we should be able to reason on the Semantic Web, e.g.,
  by collecting facts, applying rules, and writing the results of this reasoning back to our server for later reuse.

Summarize your review
  I give great credit to the authors for wanting to provide a vision statement, and I would encourage them to in fact show this as a demo, argue it, and engage in discussions. I do not believe this vision paper is polished and convincing enough to make it to the main track at this point (it is more workshop material).

The Review
  What is described in the paper can be, and in fact has been, implemented. The way this is proposed does make technical sense. My problem with this paper is that the idea of having an "ever-running crawler" and a completely distributed system is, in the light of what the Internet has turned into over the last many years, unrealistically romantic.

  I would love to say "but the paper provides compelling examples and well-worked-out use cases involving real users, with empirical evidence that an ever-crawling daemon is in fact sustainable and provides benefits", but this is not the case. Also, if an original crawler was able to obtain those proofs using deterministic algorithms, somebody else could do the same if really interested at some later point, so the mechanism becomes a caching scheme; is this the most efficient way to do it? To be discussed.

  In the absence of a discussion that compares with other models, I am left with my original impression: it won't scale, it would be possible to do all this in a much more industrial, centralized way (while still preserving distributed data origins), and its advantages are feeble. This leaves the paper as a good vision statement, something to discuss, etc. (e.g., at an appropriate workshop), but I do not think it is "complete" and convincing enough for the ISWC main track. It really depends on the other submissions as well, though, so I will not argue actively for rejection.

Comments to Authors
  The ideas are obviously interesting and should be debated in some venue.