Source code: tar.gz, zip
You can upload a photo to Flickr, attach a Creative Commons (CC) license to it, and make it publicly available. But there is no way of knowing whether somebody else has reused that photo unless you happen to spot it yourself. It would be useful to have an automatic way of checking whether a CC-licensed photo has been republished on a blog without attribution to the original creator when the license includes the "BY" (attribution) condition.
The crawler implements a basic BFS algorithm to check for Flickr image URIs in a given site.
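The breadth-first traversal can be sketched roughly as follows. This is a minimal illustration, not the project's actual crawler: the names `crawl_bfs`, `fetch`, `is_flickr_image` and `max_pages` are hypothetical, and page fetching is injected as a callable so that the sketch does not assume any particular HTTP or HTML-parsing library.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlparse

# Hypothetical fetcher type: given a page URL, return
# (links found on the page, image URIs found on the page).
Fetcher = Callable[[str], Tuple[Iterable[str], Iterable[str]]]


def crawl_bfs(
    start_url: str,
    fetch: Fetcher,
    is_flickr_image: Callable[[str], bool],
    max_pages: int = 100,
) -> Dict[str, List[str]]:
    """Visit pages of one site breadth-first and map each page
    to the Flickr image URIs found on it."""
    site = urlparse(start_url).netloc
    found: Dict[str, List[str]] = {}
    visited = {start_url}
    queue = deque([start_url])
    pages_seen = 0

    while queue and pages_seen < max_pages:
        page = queue.popleft()  # FIFO order gives breadth-first traversal
        pages_seen += 1
        links, images = fetch(page)

        matches = [uri for uri in images if is_flickr_image(uri)]
        if matches:
            found[page] = matches

        for link in links:
            # Stay on the given site and never enqueue a page twice.
            if urlparse(link).netloc == site and link not in visited:
                visited.add(link)
                queue.append(link)

    return found
```

In a real run, `fetch` would download the page and extract its `<a href>` and `<img src>` values; the `max_pages` cap is an added safeguard so that the crawl terminates on large sites.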
A Flickr image URI takes one of the following formats:
The id of the photo can be extracted from the photo URI. Using this id, the information related to the photo can be retrieved by calling several methods in the Flickr API. This information includes the original creator's Flickr user account and name, and the CC license information pertaining to the photo. If the photo has a CC license attached, the crawler then checks the page again to see whether the original creator is attributed. With the QDOS SPARQL endpoint, more of the photo owner's data (FOAF URI, etc.) can be obtained, which allows a more thorough search and a notification in case the license terms are violated.
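The id extraction and attribution check might look like the sketch below. It rests on several assumptions that do not come from this page: the two URI shapes in the comments follow Flickr's publicly documented URL patterns, `flickr.photos.getInfo` is used as one example of the API methods involved, and the helper names (`extract_photo_id`, `photo_info_url`, `is_attributed`, `violates_by`) are hypothetical. The attribution test is a plain case-insensitive substring match, which is far cruder than a thorough search would be.

```python
import re
from typing import Iterable, Optional
from urllib.parse import urlencode, urlparse

# Assumed URI shapes (Flickr's public URL documentation, not this page):
#   photo page:   /photos/{user}/{photo_id}/
#   static image: /{server}/{photo_id}_{secret}[_{size}].jpg
_PAGE_RE = re.compile(r"^/photos/[^/]+/(\d+)")
_STATIC_RE = re.compile(r"/(\d+)_[0-9a-zA-Z]+(?:_[a-z0-9]+)?\.(?:jpg|gif|png)$")


def extract_photo_id(uri: str) -> Optional[str]:
    """Return the photo id embedded in a Flickr URI, or None if not found."""
    path = urlparse(uri).path
    match = _PAGE_RE.search(path) or _STATIC_RE.search(path)
    return match.group(1) if match else None


def photo_info_url(photo_id: str, api_key: str) -> str:
    """Build a REST request URL for flickr.photos.getInfo (no request is sent)."""
    query = urlencode({
        "method": "flickr.photos.getInfo",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": 1,
    })
    return "https://api.flickr.com/services/rest/?" + query


def is_attributed(page_text: str, owner_identifiers: Iterable[Optional[str]]) -> bool:
    """True if any non-empty owner identifier (user name, real name,
    profile URL, ...) appears in the page text, ignoring case."""
    haystack = page_text.casefold()
    return any(ident and ident.casefold() in haystack for ident in owner_identifiers)


def violates_by(page_text: str,
                owner_identifiers: Iterable[Optional[str]],
                requires_attribution: bool) -> bool:
    """Flag a page only when the license requires attribution and none is found."""
    return requires_attribution and not is_attributed(page_text, owner_identifiers)
```

Here `requires_attribution` stands for the outcome of inspecting the license returned by the API; mapping Flickr's license values to the "BY" condition is left out of the sketch.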
A hypothetical scenario using the AIR policy language is described in this document. To express a policy in AIR, the existing Creative Commons RDFa schema had to be extended, because it was not expressive enough to define a custom AIR policy that reasons over a transaction event log.