Policy Aware Content Reuse
(CC Attribution License Violations Detection + Semantic Clipboard)
Accountability F2F
18 May 2009
Oshani Seneviratne
Decentralized Information Group
MIT
Computer Science and Artificial Intelligence Laboratory
Overview of the Project
Initiated the project during the "Networks for Web Science Student Exchange" at the University of Southampton (Soton)
Became my SM thesis work
Consists of three subprojects:
Assessment of Copyright Violations on the Web
Attribution License Violations Validator for Flickr Images
Semantic Clipboard on Tabulator
Assessment of Copyright Violations on the Web
Goal: Find how many CC-licensed images are embedded in Web sites without proper attribution.
Criterion: Attribution to the original source or the owner must appear within reasonable proximity of the embedded image.
Samples: Generated using the Technorati blog crawler. A fair sample was ensured by randomizing the seed URIs and by using the Technorati "Authority Rank".
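The proximity criterion above can be illustrated with a minimal sketch. It assumes that "reasonable proximity" means a fixed character window around the embedded image in the page source. The function name, the window size, and the example URLs are hypothetical and are not taken from the project.

```python
def has_nearby_attribution(html, img_src, owner_name, source_url, window=300):
    """Return True if the owner name or the source URL appears within
    `window` characters of the place where img_src is embedded.

    Assumption: proximity is measured in characters of raw HTML.
    """
    pos = html.find(img_src)
    if pos == -1:
        raise ValueError("image is not embedded in this page")
    start = max(0, pos - window)
    end = pos + len(img_src) + window
    # Look at the text around the image URL, excluding the URL itself.
    neighbourhood = (html[start:pos] + html[pos + len(img_src):end]).lower()
    return owner_name.lower() in neighbourhood or source_url.lower() in neighbourhood


IMG = "http://farm1.static.flickr.com/2/12345_abc.jpg"
PAGE = "http://www.flickr.com/photos/alice/12345"

attributed = '<p><img src="%s"/> Photo by <a href="%s">Alice</a></p>' % (IMG, PAGE)
bare = '<p><img src="%s"/></p>' % IMG
```

Here `has_nearby_attribution(attributed, IMG, "Alice", PAGE)` holds, while the same call on `bare` does not. A DOM-based distance would likely be more robust than a character window; the window is used only to keep the sketch short.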
Analysis of the Results
Sample | Websites | Total Number of Images | Correctly Identified Images | Misattribution | Precision
1      | 67       | 426                    | 183                         | 78%            | 55%
2      | 70       | 241                    | 113                         | 80%            | 42%
3      | 70       | 466                    | 268                         | 94%            | 39%
Results were manually checked to verify their validity
Very high license violation rate, with reasonable precision
Attribution License Violations Validator for Flickr Images
Site Crawler searches for all the links embedded in the given site.
Flickr Query Evaluator obtains license specific information using the photo id.
License Checker checks the license and proper attribution details.
Notification System gives the instances of license violations in a human-friendly manner.
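The four components above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the validator's actual code: the Flickr lookup is replaced by a caller-supplied function `lookup_photo` (hypothetical; in practice it might wrap a call such as Flickr's `flickr.photos.getInfo`), licenses are represented as CC license URIs, and attribution is checked anywhere on the page rather than near the image.

```python
import re
from html.parser import HTMLParser

# Flickr static image URLs of this period have the shape
#   http://farm{farm}.static.flickr.com/{server}/{photo_id}_{secret}.jpg
FLICKR_IMG = re.compile(r"static\.flickr\.com/\d+/(\d+)_[0-9a-f]+")


class ImageCollector(HTMLParser):
    """Site Crawler stand-in: collects the src of every <img> on one page."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)


def find_violations(html, lookup_photo):
    """Return human-readable messages for unattributed CC BY images.

    lookup_photo(photo_id) is a hypothetical stand-in for the Flickr Query
    Evaluator; it returns {"owner", "page", "license"} or None.
    """
    collector = ImageCollector()
    collector.feed(html)
    messages = []
    for src in collector.image_urls:
        match = FLICKR_IMG.search(src)
        if not match:
            continue  # not a Flickr-hosted image
        info = lookup_photo(match.group(1))
        # License Checker: only licenses with a BY clause need attribution.
        if info is None or "/licenses/by" not in info["license"]:
            continue
        if info["owner"] not in html and info["page"] not in html:
            # Notification System: one readable line per violation.
            messages.append(
                "%s is used without attributing %s (%s)"
                % (src, info["owner"], info["license"])
            )
    return messages


catalog = {
    "2550000000": {
        "owner": "Alice",
        "page": "http://www.flickr.com/photos/alice/2550000000",
        "license": "http://creativecommons.org/licenses/by/2.0/",
    }
}
page = '<img src="http://farm4.static.flickr.com/3000/2550000000_abcdef1234.jpg"/>'
violations = find_violations(page, catalog.get)
```

With the sample `catalog`, `violations` holds one message for the unattributed image. Appending "Photo by Alice" to `page` yields an empty list.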
Output from Attribution License Violations Validator for Flickr Images
Semantic Clipboard on Tabulator
RDFa Extractor extracts all the RDFa embedded in an HTML page on page load.
UI Enhancer overlays the user interface for better license awareness.
Attribution XHTML Constructor composes attribution XHTML code snippet.
User Interface consists of a context menu on images that can be copied to the System Clipboard.
RDFa → Triples → Attribution XHTML
Right-click on an image annotated with RDFa.
Triples corresponding to the subject of the image are extracted from the page.
Attribution XHTML is constructed and placed on the System Clipboard.
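The last step, going from triples to an attribution snippet, might look like the following sketch. It assumes the triples have already been extracted as plain `(subject, predicate, object)` string tuples and that the page uses the ccREL properties `cc:attributionName` and `cc:attributionURL` plus a license relation. The function name and the exact markup of the output are illustrative, not the Semantic Clipboard's actual output.

```python
from xml.sax.saxutils import escape, quoteattr

CC = "http://creativecommons.org/ns#"
XHV = "http://www.w3.org/1999/xhtml/vocab#"


def attribution_xhtml(image_uri, triples):
    """Compose an attribution XHTML snippet for image_uri.

    Only triples whose subject is the image are used, mirroring the
    "triples corresponding to the subject of the image" step.
    """
    props = {p: o for s, p, o in triples if s == image_uri}
    license_uri = props.get(XHV + "license") or props.get(CC + "license")
    if license_uri is None:
        raise ValueError("no license triple found for " + image_uri)
    # Fallbacks below are assumptions made for this sketch.
    name = props.get(CC + "attributionName", "Unknown")
    url = props.get(CC + "attributionURL", image_uri)
    return (
        f'<div xmlns:cc="{CC}" about={quoteattr(image_uri)}>'
        f'<img src={quoteattr(image_uri)} alt=""/> by '
        f'<a rel="cc:attributionURL" property="cc:attributionName" '
        f'href={quoteattr(url)}>{escape(name)}</a>, licensed under '
        f'<a rel="license" href={quoteattr(license_uri)}>{escape(license_uri)}</a>'
        f'</div>'
    )


image = "http://example.org/photo.jpg"
triples = [
    (image, CC + "attributionName", "Alice"),
    (image, CC + "attributionURL", "http://example.org/alice"),
    (image, XHV + "license", "http://creativecommons.org/licenses/by/3.0/"),
    ("http://example.org/other.jpg", CC + "attributionName", "Bob"),
]
snippet = attribution_xhtml(image, triples)
```

The resulting `snippet` credits Alice, links to her attribution URL, and carries the license link, so the pasted markup remains machine-readable.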
Visual Cues in Semantic Clipboard
CC License Use and Restriction categories mapped in the menu selection.
Tool tip text displays whether the image can be copied or not.
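One way to derive such cues is to read the use and restriction clauses directly from the CC license URI. The sketch below assumes licenses are given as standard `creativecommons.org/licenses/...` URIs and that an image without a recognizable CC license is treated as not copyable. The function names and the tooltip wording are hypothetical.

```python
import re

CC_LICENSE = re.compile(r"creativecommons\.org/licenses/([a-z-]+)/")


def describe_license(license_uri):
    """Map a CC license URI to its use and restriction flags, or None."""
    match = CC_LICENSE.search(license_uri or "")
    if not match:
        return None
    codes = set(match.group(1).split("-"))  # e.g. {"by", "nc", "sa"}
    return {
        "attribution_required": "by" in codes,
        "commercial_use": "nc" not in codes,
        "derivatives": "nd" not in codes,
        "share_alike": "sa" in codes,
    }


def tooltip(license_uri):
    """Return tooltip text stating whether the image can be copied."""
    flags = describe_license(license_uri)
    if flags is None:
        # Assumption: no detectable CC license means "do not copy".
        return "No CC license found: this image cannot be copied."
    notes = []
    if flags["attribution_required"]:
        notes.append("attribution required")
    if not flags["commercial_use"]:
        notes.append("non-commercial use only")
    if not flags["derivatives"]:
        notes.append("no derivative works")
    if flags["share_alike"]:
        notes.append("share alike")
    return "This image can be copied (" + ", ".join(notes) + ")."


example = tooltip("http://creativecommons.org/licenses/by-nc-sa/2.0/")
```

The same flags could also drive which entries of the context menu are enabled.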
Challenges
Attribution License Violations Validator:
Tracking provenance
Tracking subsequent changes in the license
Scope of the human readable attribution notice
Limited Flickr support for license expression
Semantic Clipboard:
License granularity in the HTML DOM
Browser dependence
Usability vs. Operating System independence
Future Work
Check for other types of license violations
Give credit to the original content creator as requested
Extend to other media types
Handle different levels of license granularity
Implement persistent storage
Perform a user study
Questions?
oshani@csail.mit.edu