Provenance-Embedding Documents
Abstract
As computer and network technology improves, information will become
more fluid, and the sharing and recombining of data in a
decentralized fashion will become even more prevalent than it is
today. Knowing the provenance and policies associated with data
will be an important part of any attempt to regulate such an
environment. Models that rely on centralized servers for this data
are too rigid for use on the Internet, and models that rely on separate
metadata files require extra transfers and accesses each time
metadata is needed. In an environment such as the Internet,
inline metadata will become the most convenient way to express
provenance information. To increase the ability of end users to
create and use documents with known provenance and policy data, I
present a method of embedding provenance metadata within
documents, using RDFa. I present a JavaScript API for extracting
this information, so utilities that use this metadata to compute
interesting properties can easily be written. I present tools that
will allow end users to easily create provenance-embedded
documents. I present an annotated MediaWiki that produces
provenance-embedded documents from its internal database. Finally,
I present numerous tools for allowing end users to visualize
provenance information in unique ways.
Thesis Outline
- Introduction
- Background
- Technologies
- Policy motivations
- Provenance-Embedded Documents
- Data structure embedded with RDFa
- JavaScript APIs for using PED structures
- Cryptography?
- Document Creation Tools
- Copy-paste bookmarklets
- Maybe a copy-paste Firefox extension?
- Notification of what licenses user can select
- Validator which checks for satisfaction of composite document
- Check for literal satisfaction of CC-BY and CC-NC
from user-generated rulesets?
- MediaWiki
- Generates RDFa embedded pages
- Has extended wiki markup to ease creation of PEDs
- Document Viewing Tools
- Provenance Browser
- Proof of Concept: Creative Commons Rights Engine
- Reasoner design and implementation
- Application Scenarios: TAMI Scenario 10
- Conclusion
Italics are topics that would be nice to cover, but not essential.
Projected Schedule
3/8 Implement copy-paste bookmarklet, implement JavaScript API
3/15 Write policy background, MediaWiki hacking
3/22 Write up ontology, write up bookmarklets, MediaWiki hacking
3/29 Write up MediaWiki
4/5 Code "provenance browser"
4/12 Code "provenance browser"
4/19 Debugging week + Write up provenance browser
4/26 Debugging week + Writing week
5/3 Writing week
5/10 Writing week
5/17 FULL DRAFT COMPLETED
5/17-5/28 Editing
MAY 28 THESIS DUE
Components
Provenance-Embedding Documents
Status: Draft 1 done
Description
This is the spec for the Provenance-Embedding Document. Sandro,
Danny, and I worked out an initial spec. It is enough to get plenty of work started,
but several issues remain unresolved, among them: what namespace to use,
how best to integrate with PML, and what ontology to use for allowed purposes.
JavaScript API
Status: Prototyped
Description
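As an illustration only, the kind of extraction such an API might perform can be sketched as follows. The function name `extractStatements`, the plain-object node shape (`attrs`, `text`, `children`), and the `dc:` predicates are assumptions made for this example, not the prototyped API; in a browser the same walk would run over DOM elements. The sketch handles only `about`, `rel` with `resource`, and `property` with an optional `content`, so it is not a conforming RDFa processor.

```javascript
// Hypothetical sketch: collect (subject, predicate, object) statements from a
// tree of element-like objects shaped as { attrs, text, children }.
// Simplified on purpose; real RDFa processing has many more rules.
function extractStatements(node, subject, out) {
  out = out || [];
  const attrs = node.attrs || {};

  // `about` sets the subject for this node and its descendants.
  const current = attrs.about !== undefined ? attrs.about : subject;

  // `rel` + `resource` states a link from the subject to another resource.
  if (attrs.rel && attrs.resource !== undefined) {
    out.push({ subject: current, predicate: attrs.rel, object: attrs.resource });
  }

  // `property` states a literal value: `content` if present, else the text.
  if (attrs.property) {
    const value = attrs.content !== undefined ? attrs.content : (node.text || "");
    out.push({ subject: current, predicate: attrs.property, object: value });
  }

  for (const child of node.children || []) {
    extractStatements(child, current, out);
  }
  return out;
}

// Example input: a quoted fragment annotated with its source and creator.
// The URL and names are placeholders.
const fragment = {
  attrs: { about: "#quote-1", rel: "dc:source", resource: "http://example.org/original" },
  children: [
    { attrs: { property: "dc:creator" }, text: "Example Author" },
  ],
};

const statements = extractStatements(fragment, "");
console.log(statements);
```

A utility that computes properties over provenance could then work on the flat list of statements rather than on the markup itself.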
Copy-Paste Functionality
Status: Prototyped
Description
I wrote a Firefox extension to provide provenance-preserving copy.
After discussion with Sandro, we decided to also create a JavaScript bookmarklet for better cross-platform compatibility.
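A minimal sketch of what provenance-preserving copy might produce, assuming the copied markup is wrapped in a `span` that records its source in RDFa. The function names, the `ped-` identifier scheme, and the choice of the Dublin Core `dc:source` term are assumptions for illustration only; the namespace to use is still an open question in the spec, and this is not the code of the extension or the bookmarklet.

```javascript
// Escape a value for safe use inside a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap copied markup in a span whose RDFa attributes
// state that the fragment's source is `sourceUrl`.
// `copiedHtml` is assumed to be markup already and is inserted unchanged.
function wrapWithProvenance(copiedHtml, sourceUrl, fragmentId) {
  const id = escapeAttr(fragmentId);
  return (
    '<span id="' + id + '" about="#' + id + '"' +
    ' xmlns:dc="http://purl.org/dc/terms/"' +
    ' rel="dc:source" resource="' + escapeAttr(sourceUrl) + '">' +
    copiedHtml +
    "</span>"
  );
}

// Placeholder values; a real tool would take these from the page being copied.
const wrapped = wrapWithProvenance(
  "<em>quoted text</em>",
  "http://example.org/article?id=1&lang=en",
  "ped-1"
);
console.log(wrapped);
```

In a bookmarklet, `copiedHtml` might be taken from the current selection and `sourceUrl` from `window.location.href`; how the wrapped markup then reaches the clipboard is left out of this sketch.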
License Notification
Status: CC Version Implemented
Description
MediaWiki
Status: NONE
Description
Provenance Browser
Status: NONE
Description
Creative Commons Scenarios
Status: Draft 1 done
Description
The Creative Commons scenarios are a use case
for the provenance ontology. I will write a simple reasoning engine
that computes the result of combining various Creative
Commons licenses. I will then combine this with the JavaScript API
to create tools that let users examine Creative Commons
metadata. Further, I will create use cases for these end-user tools
in the form of a TAMI scenario that revolves around a professor
composing a presentation with slides from various sources.
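The license-combination step described above can be sketched as a small rule check. This is a simplified illustration, not the planned reasoner: it assumes the composite counts as a derivative of every source, models only the six standard Creative Commons licenses by their NonCommercial, NoDerivatives, and ShareAlike terms, and ignores license versions and jurisdictions. The names `LICENSES` and `allowedLicenses` are hypothetical.

```javascript
// Each license is described by the restrictions it carries.
// nc = NonCommercial, nd = NoDerivatives, sa = ShareAlike.
// All six require attribution, so that flag is omitted.
const LICENSES = {
  "CC-BY":       { nc: false, nd: false, sa: false },
  "CC-BY-SA":    { nc: false, nd: false, sa: true },
  "CC-BY-NC":    { nc: true,  nd: false, sa: false },
  "CC-BY-ND":    { nc: false, nd: true,  sa: false },
  "CC-BY-NC-SA": { nc: true,  nd: false, sa: true },
  "CC-BY-NC-ND": { nc: true,  nd: true,  sa: false },
};

// Return the licenses under which a derivative combining all the given
// sources could be released, under the simplified rules in this sketch.
function allowedLicenses(sourceLicenses) {
  for (const name of sourceLicenses) {
    if (!Object.prototype.hasOwnProperty.call(LICENSES, name)) {
      throw new Error("Unknown license: " + name);
    }
  }

  // A NoDerivatives source cannot be adapted at all.
  if (sourceLicenses.some((name) => LICENSES[name].nd)) return [];

  const anyNc = sourceLicenses.some((name) => LICENSES[name].nc);
  const saSources = new Set(sourceLicenses.filter((name) => LICENSES[name].sa));

  return Object.keys(LICENSES).filter((candidate) => {
    // A NonCommercial source forces the result to be NonCommercial too.
    if (anyNc && !LICENSES[candidate].nc) return false;
    // A ShareAlike source forces the result to use that same license.
    for (const sa of saSources) {
      if (sa !== candidate) return false;
    }
    return true;
  });
}

console.log(allowedLicenses(["CC-BY", "CC-BY-NC"]));
console.log(allowedLicenses(["CC-BY-SA", "CC-BY-NC"]));
```

An empty result means the sources cannot be combined into a derivative under these rules, as in the second call, where the ShareAlike source requires CC-BY-SA while the NonCommercial source requires a NonCommercial license. A tool that notifies users which licenses they can select could present the returned list directly.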
Scenario 10, part 1
Scenario 10, part 2
Scenario 10, part 3
Creative Commons bookmarklet
Harvey C Jones