Logging/Authenticating SPARQL client and server
Mike Stunes
$Date: 2008-08-27 13:37:40 -0400 (Wed, 27 Aug 2008) $
$Revision: 24930 $
$Author: stunes $
$Id: README 24930 2008-08-27 17:37:40Z stunes $

Contents
--------
0. Quick Start
1. Introduction
2. Dependencies
3. Getting Started
4. Files
5. Internal variables
6. Access Policy
7. Privacy Policy
8. Logging
9. Creating server instances
10. Client
11. TODOs

0. Quick Start
--------------
To run the SPARQL server:

a. Make sure you've installed the following and that they are on your
   PYTHONPATH:

   rdflib
      Tested with version 2.4.0.

   Python OpenID library
      Tested with version 2.1.1 of this library--version 1.x known not
      to work. Should work with any 2.x version.

   ElementTree
      (By default, this will try to use the built-in ElementTree from
      Python 2.5 and above, falling back on the external ElementTree if
      necessary.)

   PyOpenSSL
      Tested with version 0.7.

   Cwm
      Place a symlink named "swap" in the same directory as
      sparql_server_ssl.py, pointing at the "swap" directory of your cwm
      distribution, or otherwise make sure that the "swap" directory is
      accessible to Python.

b. Generate a self-signed certificate and key using openssl:

   > openssl req -new -x509 -keyout key.pem -out cert.pem

   Move these into the certs directory.

c. Modify the configuration file, sparql_server_config:
   - change rules_file to your authorization file or use
     access_policy/access_policy.n3 to restrict access to DIG members
   - change policy_file to the usage policy or use
     access_policy/air_access_policy.n3 as a default
   - change keyfile to the key you just generated
   - change certfile to the certificate you just generated
   - change data_* to the MySQL store that contains the data you want to
     protect. data_identifier should be set to rdfstore (rdflib
     requirement)
   - change log_* to the MySQL store where you want to store the log of
     queries. data_identifier should be set to rdfstore

d. Start the MySQL server(s).

e. Start the SPARQL server:

   > python sparql_server_ssl.py

f.
   You will be asked to enter the pass phrase of the key/certificate you
   generated. Enter the phrase.

g. The script will attempt to import the libraries and then start the
   server. It will report the address and port number at which the
   server is running, for example https://server:8080/

h. To test the server, open your Firefox browser and go to
   https://server:8080/webform

i. You will be asked to log in with your OpenID.

j. After successfully logging in, you will see a query interface. Type
   out a query that matches the data in your store or use the default:

   SELECT * WHERE {?s ?p ?o.}

k. If properly installed, you will receive the SPARQL results tagged
   with the usage policy.

1. Introduction
---------------
This set of programs is an implementation of the SPARQL protocol, with
some extensions that are useful for working with sensitive data.

The server behaves much like a standard SPARQL endpoint, with the
difference that it requires OpenID authentication from the client. The
endpoint also supports SSL encryption with a server-side certificate, to
keep data exchanges confidential.

Also, the server logs all incoming queries in an RDF format. See
"Logging" below for more details.

The server accepts a function that determines whether a given OpenID is
allowed to use the server. See "Access Policy" below for more details.

The server uses rdflib stores to hold the server's data and log. These
are not created automatically.

Navigating to the base URI of the server will provide the user with a
login form where he/she can enter his/her OpenID.
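
As an illustration of the access-control function mentioned above, here
is a minimal sketch of an allow-list check. Only the shape of the
function (an OpenID string in, a boolean out) comes from this README.
The names is_authorized and ALLOWED_OPENIDS, and the example.org
identifiers, are hypothetical and are not part of this distribution.

```python
# Hypothetical allow-list of OpenID identifiers (placeholders only).
ALLOWED_OPENIDS = frozenset([
    "http://alice.example.org/",
    "http://bob.example.org/",
])


def is_authorized(openid):
    """Return True if the given OpenID (a string) may use the server."""
    # Exact string match against the allow-list; anything else is denied.
    return openid in ALLOWED_OPENIDS
```

A function of this shape would be the one handed to the server as
authorizedFunc (see "Creating server instances" below).
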
Other pages under the root directory that are used:

For interactive use through a browser:
    /webform -- interactive webform where a user can enter a query
    /login -- handles requests for session credentials (user is sent
        here from the login page)
    /loginComplete -- finishes the authentication process (user is sent
        here from his/her OP)

For noninteractive use:
    /getSession -- page the client library requests when starting
        authentication
    /clientLogin -- page that /getSession redirects to; opened in a
        browser by the client library; redirects to OP
    /clientComplete -- page that OP redirects to; finishes the
        authentication process

For both uses:
    /sparql -- page where actual queries are sent
    /policy -- page that returns an applicable privacy/usage policy

Navigating to any other path will return a 404 Not Found.

The client library provides a class that encapsulates the interface to
an instance of the server. More details will be provided below.

2. Dependencies
---------------
The server requires that you have the following libraries installed:

rdflib
    Tested with version 2.4.0.

Python OpenID library
    Tested with version 2.1.1 of this library--version 1.x known not to
    work. Should work with any 2.x version.

ElementTree
    (By default, this will try to use the built-in ElementTree from
    Python 2.5 and above, falling back on the external ElementTree if
    necessary.)

PyOpenSSL
    Tested with version 0.7.

Also, the included example code requires a cwm distribution on your
machine. Place a symlink named "swap" in the same directory as
sparql_server_ssl.py, pointing to the "swap" directory of your cwm
installation.

3. Getting Started
------------------
First, read "Dependencies" above and make sure all of the needed
libraries are installed.

The server needs two rdflib graphs available, one for logging and one
for data, which by default are set up to be persistent MySQL stores. The
database parameters are specified in sparql_server_config.

New rdflib stores can be created easily.
This example shows how to create one on a database "test", on a MySQL
server running on sql.example.org. (Substitute "user" and "pw" with a
valid username and password, respectively.) "identifier" in the third
line stands for a string of your choosing that is hashed to create the
table names in the MySQL database, allowing one database to hold more
than one RDF store. This identifier is specified, along with the
database parameters, in the config file.

    >>> import rdflib
    >>> configString = "host=sql.example.org,user=user,password=pw,db=test"
    >>> store = rdflib.plugin.get('MySQL', rdflib.store.Store)(identifier)
    >>> store.open(configString, create=True)

Data from an RDF file or a URI pointing to RDF data can be imported like
so:

    >>> graph = rdflib.ConjunctiveGraph(store)
    >>> graph.parse('http://example.org/some_data.rdf')
    >>> graph.commit()

Changes to a persistent rdflib store do not become permanent until
commit() is called.

It is also a good idea to create or otherwise acquire a valid SSL
certificate for the server, and edit the config file
(sparql_server_config by default) to point to the certificate and key
files. A self-signed certificate can easily be made using OpenSSL:

    > openssl req -new -x509 -keyout key.pem -out cert.pem

The server includes default files to use as an access control policy and
as a policy attached to the results, but different files can be
specified using the options rules_file and policy_file in the config.

Also, the server needs a directory to store persistent OpenID
information (nonces, etc.), which by default is a subdirectory named
"openids" in the directory containing the executable.

After all of the external components described above are in place, the
server may be started by executing "sparql_server_ssl.py".

4. Files
--------
sparql_server_new.py -- The server executable.
sparql_server_ssl.py -- The server, with SSL support.
sparql_client.py -- The client library.

5. Internal variables
---------------------
debug: if True, print debugging output to the console.
verbose: if True, print additional information above and beyond
    debugging to the console.

6. Access Policy
----------------
The server accepts a function that implements the server's access
control. The server, when initialized, is passed a function with this
spec:

    some_fn(openid) -> bool

The function takes an OpenID identifier, as a string, and returns True
if the user is allowed to use the server, False otherwise.

7. Privacy Policy
-----------------
The server accepts a function that returns a privacy/usage policy to the
client when a request is made for "https:///policy". A link to this
policy is also incorporated into the results returned from the server,
via the SPARQL "link" tag.

This function should take an instance of SparqlRequestHandler (or any
subclass of BaseHTTPRequestHandler) and return the applicable policy to
the client. It is designed this way so that the policy can be returned
as any MIME type, with any headers that are necessary.

8. Logging
----------
The server logs all requests in an rdflib store, which is passed to the
server when it is initialized.

The server logs the following triples to the log store:

    @prefix data: : rdf:type data:LogItem;
        data:hasQuery ^^owl:normalizedString;
        data:hasRequester ^^owl:normalizedString;
        data:hasTimestamp ^^owl:dateTime;
        data:hasOpenId ^^owl:anyURI;
        data:authenticated ^^(data:authenticatedTrue | data:authenticatedFalse)

9. Creating server instances
----------------------------
A server instance is created like so:

    foo = SparqlServer(rdfstore, logstore, openid_store, authorizedFunc,
                       returnPolicyFunc, KEYFILE, CERTFILE, serveraddress,
                       RequestHandlerClass)

with these parameters:

    rdfstore: instance of rdflib.Graph that stores the server's data.
    logstore: instance of rdflib.Graph that the server logs to.
    openid_store: instance of openid.store.filestore.FileOpenIDStore
        that holds persistent OpenID data
    authorizedFunc: function described in "Access Policy" above
    returnPolicyFunc: function that returns a privacy/usage policy to
        the user. See "Privacy Policy" above
    KEYFILE: filename (string) of the private key file to use for SSL
    CERTFILE: filename (string) of the certificate file to use for SSL
    serveraddress: tuple of (hostname, port) that the server listens on
    RequestHandlerClass: this should be SparqlRequestHandler.

A function start_server has been provided for convenience.

10. Client
----------
The client library provides a class that encapsulates a connection to a
SPARQL server.

Suppose that we have a server running at
https://tasty-snack.mit.edu:3456. We can create an object representing
it as follows:

    import sparql_client
    from sparql_client import SparqlWrapper
    foo = SparqlWrapper('https://tasty-snack.mit.edu:3456')

After creating the wrapper, the client must authenticate him/herself to
the server with his/her OpenID. This is done like so:

    foo.authenticate('http://somebody.youropenid.com')

This will open a browser window pointed to a page on the server that
redirects to the user's OP. After the authentication is done, the user
is redirected to a listener that the client has started on
localhost:9876, which presents an informational message stating that the
authentication has succeeded and that queries may now be run.

Queries are run like so:

    foo.setQueryString('SELECT * WHERE {?s ?p ?o}')
    (optionally) foo.setReturnFormat() (currently not implemented)
    (optionally) foo.addExtraURITag(key, value)
    results = foo.query()

query() returns an instance of SparqlResults, which functions as an
iterator that returns the XML returned by the server.
It also supports the following methods:

    getURL(): returns the URL used to get the query results
    getInfo(): returns additional metadata from the query results

Extra support for parsing query results into more usable data structures
may be implemented in the future.

11. TODOs
---------