#!/usr/bin/python2.4
"""HTTP access to an SQL DB thru RDF glasses.

By `Dan Connolly`_ and Rob Crowell.

.. _Dan Connolly: http://www.w3.org/People/Connolly/

Tim Berners-Lee wrote a design note, `Relational Databases on the
Semantic Web`_. This module started as an implementation of v1.17 of
that note from 2002/03/06. It has since been revised to support more
browsing features.

.. _Relational Databases on the Semantic Web: http://www.w3.org/DesignIssues/RDB-RDF.html

License
-------

Share and enjoy.

Copyright (c) 2001-2006 W3C (MIT, INRIA, Keio)
`W3C Open Source License`_

.. _W3C Open Source License: http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231

Browsing
--------

This module (especially class `DBViewHandler()`) implements an HTTP
server. To run it, see `Usage`.

The main resource (see `dbdescsym()` and `aboutDB()`) is a description
of the database, i.e. a list of all of the tables. You can follow the
(RDF) links to the table descriptions (see `tabledescsym()` and
`aboutTable()`), to the table data (`tabledatasym()` and `tableDump()`),
and to individual items (`itemsym()` and `rowDump()`).

Query
-----

The goal is to support SPARQL (@@LinkMe). The original implementation
predates SPARQL; it was an HTML forms interface. It has been disabled.

In that interface, a query regards a number of tables; for each table,
we're given:

- the table's name
- the fields we want selected from the table (if any)
- the table's primary key (if relevant)
- the fields we want to join with other tables' primary keys
- an optional/supplementary WHERE clause

Configuration: foreign keys, hidden columns, and additional RDF
---------------------------------------------------------------

dbview relies on a configuration file, often called dbviewconf.n3. This
file MUST be specified on the command line using the --configFile=XXX.rdf
option, unless the --doctest flag is present. The file can be written in
either RDF/XML or N3. The following is a complete configuration file in
N3::

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix db: <http://www.w3.org/2000/10/swap/db#>.

    <http://example/services/dbview> a db:DatabaseDocument;
        db:mysqlHost "localhost";
        db:sqldb "db";
        db:mysqlUser "dbuser";
        db:mysqlPassword "dbpass";
        db:httpHost "db.example";
        db:httpPort 8081;
        db:httpPath "/services/dbview";
        db:httpAllowedClients ("1.2.3.4" "5.6.7.8");
        rdfs:seeAlso <file:///home/crowell/db.rdf>;
        rdf:type db:RelativeURIExporter;
        rdf:type db:ForlinkExporter;
        rdf:type db:BacklinkExporter;
        rdf:type db:NumberOfRowsExporter;
        rdf:type db:Editable.
In this configuration:

- the URI for the database document which is available directly to clients
  is http://example/services/dbview (this is also the base URI used by
  dbview; all URIs generated by dbview will be relative to this location)
- the database named 'db' running on the local machine is connected to
  with the username 'dbuser' and password 'dbpass'
- the HTTP service created by dbview is available on the machine with
  address 'db.example', listening on port 8081
- the string that dbview expects all URIs to begin with is
  'http://db.example:8081/services/dbview', built from the provided
  httpHost, httpPort, and httpPath
- only two clients are allowed to access the database: 1.2.3.4 and 5.6.7.8
- additional information about foreign keys, no-export columns, and other
  RDF is available in the local file /home/crowell/db.rdf
- dbview should export URIs relative to 'http://example/services/dbview'
  rather than absolute URIs
- dbview should give information about foreign rows when describing a
  single row, to make browsing easier
- dbview should show backlinks when describing a single row
- dbview should indicate the number of rows in a table when describing a
  single table in its table schema document
- db:Editable indicates that the database schema can be edited

Running dbview Behind A Proxy
-----------------------------

The configuration given above is adequate for running dbview behind a proxy
server. dbview is running on the machine db.example, and the proxy server is
listening for requests on port 80 on example. The proxying process roughly
follows the script below.

- a client requests http://example/services/dbview from the proxy
- configured previously, the proxy server translates this into a request
  for http://db.example:8081/services/dbview
- knowing it is running at db.example:8081 with a path of
  '/services/dbview', dbview parses the URI and starts generating a
  database schema document
- dbview generates a database schema using relative URIs which are all
  relative to 'http://example/services/dbview', so the document refers to
  the database by the relative URI <#.database>
- the proxy server returns this document to the client, which expands
  <#.database> against the URI it requested into the correct absolute URI

If absolute URIs had been requested instead of relative, the document
generated by dbview would have correctly referred to the database by the
absolute URI http://example/services/dbview#.database.

Foreign Keys, No Export Columns, Extra RDF
------------------------------------------

There may be one or more .rdf files in the same directory that contain
information about foreign keys, columns that should be ignored by dbview, and
any other additional RDF that should be exported about the database, tables,
or columns. The foreign keys extracted from the actual database schema are
combined with the foreign keys indicated in the rdf file.

If the file exists locally, it is reparsed each time it is modified, so that
dbview does not need to be restarted each time this file is changed. Note,
however, that it IS a good idea to restart dbview if structural changes are
made to the actual database (adding tables, removing columns, etc.).

Assuming that http://example.org/dbview/db is your db:databaseDocument as
defined in the dbviewconf file, it is probably a good idea to declare the
following in these schema files::

    xml:base="http://example.org/dbview/db/"

Otherwise, absolute URIs must be used for all columns and tables, which
reduces portability if you ever move the dbview service.
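With that xml:base declared, relative references such as ``urls#acl`` resolve
to the same column and table URIs that dbview itself builds; compare the
`colsym()` and `tablesym()` helpers defined below (the 'urls' table and 'acl'
column are only illustrative):

>>> tablesym('http://example.org/dbview/db', 'urls')
'http://example.org/dbview/db/urls#.table'
>>> colsym('http://example.org/dbview/db', 'urls', 'acl')
'http://example.org/dbview/db/urls#acl'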
Statements made about the database itself, however, still need to be written
using the absolute URI http://example.org/dbview/db#.database.

To indicate that urls.acl -> acls.acl, use the following RDF::

    <rdf:Description rdf:about="urls#acl">
      <db:mapsTo rdf:resource="acls#acl"/>
    </rdf:Description>

To prevent dbview from exporting or searching a column, use::

    <rdf:Description rdf:about="urls#acl">
      <rdf:type rdf:resource="http://www.w3.org/2000/10/swap/db#NoExportColumn"/>
    </rdf:Description>

To prevent dbview from exporting or searching a table, use::

    <rdf:Description rdf:about="urls#.table">
      <rdf:type rdf:resource="http://www.w3.org/2000/10/swap/db#NoExportTable"/>
    </rdf:Description>

These schema files may also include arbitrary RDF about the database, tables,
or columns. A few statements are special: any statements declaring columns as
db:NoExportColumn or indicating a db:mapsTo element are handled as described
above, and rdfs:label statements, if present, override the default labels
provided by dbview. All other RDF statements are exported without modification
on the database and table schema pages, as appropriate.

To provide a new label for the database::

    <rdf:Description rdf:about="http://example.org/dbview/db#.database">
      <rdfs:label>all of our data</rdfs:label>
    </rdf:Description>

To provide a new label for a table::

    <rdf:Description rdf:about="urls#.table">
      <rdfs:label>an example table</rdfs:label>
    </rdf:Description>

To provide a new label for a column::

    <rdf:Description rdf:about="urls#acl">
      <rdfs:label>an important column</rdfs:label>
    </rdf:Description>

Subclasses
----------

dbview automatically detects a subclass relationship between tables. When
tableA is a subclass of tableB, rows in tableA are given URIs that point to
rows in tableB. When the row URI is dereferenced, data from tableB, as well as
from tableA (and any other subclasses of tableB), is pulled out and exported.
This relationship is recursive, so that if tableC is a subclass of tableA,
rows of tableC are also given URIs for rows in tableB.

Every primary key in tableA must also point to a primary key in tableB; this
indicates that tableA only provides additional information about tableB.

Dependencies
------------

The code is designed to work with the `Python Database API`_ v2.0, though it
has only been tested with MySQL, so some of the code might be MySQL-specific.
In particular, we read the `MySQL Reference Manual`_ for version 4.0.2-alpha,
generated on 7 March 2002.

For RDF serialization we use the RDF/XML serializer from the *Semantic Web
Application Platform* (SWAP_).

.. _Python Database API: http://www.python.org/topics/database/DatabaseAPI-2.0.html
.. _MySQL Reference Manual: http://www.mysql.com/documentation/mysql/bychapter/manual_toc.html
.. _SWAP: http://www.w3.org/2000/10/swap/

History
-------

Version: $Revision$ of $Date$

The first announcement was a `chump entry`_ on 2002-02-27 with reference to
some `irc notes`_.

.. _chump entry: http://rdfig.xmlhack.com/2002/02/27/2002-02-27.html#1014821419.001175
.. _irc notes: http://ilrt.org/discovery/chatlogs/rdfig/2002-02-27#T18-29-01

Colophon: Testing and Documentation
-----------------------------------

The examples in the docstrings below are executable doctest_ unit tests.
Check them a la::

    $ python dbview.py --doctest

Note: doctest prior to Python 2.4 doesn't seem to work.

.. _doctest: http://www.python.org/doc/lib/module-doctest.html

We're updating the documentation to use rst_ format for use with epydoc_.
It's a bit tedious.

.. _epydoc: http://epydoc.sourceforge.net/
.. _rst: http://docutils.sourceforge.net/docs/user/rst/quickstart.html
"""
__docformat__ = "restructuredtext en"

#@@I think guido doesn't like CVS keywords. Consult python style guide
# to find community consensus.
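# A minimal sketch of the subclass behaviour described in the module
# docstring above, using the ForeignKeys, Subclass, and itemsym helpers
# defined below ('tableA', 'tableB', and the key column 'id' are only
# illustrative):
#
#   fk = ForeignKeys()
#   fk.add('tableA', 'tableB', 'id', 'id')    # tableA.id -> tableB.id
#   pk = {'tableA': Set(['id']), 'tableB': Set(['id'])}
#   sc = Subclass(fk, pk)
#   itemsym('http://example/dbview/db', 'tableA', ('id',), (23,), sc)
#   # gives 'http://example/dbview/db/tableB/item/id/23#item'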
__version__ = "$Id: dbview.py,v 1.43 2006/05/18 19:19:21 connolly Exp $"[1:-1] import BaseHTTPServer import cgi # for URL-encoded query parsing import codecs import confParser import logging import os import re import socket import SocketServer #for multi-threading import urllib #convert "a/b" to "a%2fb" from sets import Set import MySQLdb # MySQL for Python # http://sourceforge.net/projects/mysql-python # any Python Database API-happy implementation will do. from MySQLdb.connections import ProgrammingError, OperationalError import swap # http://www.w3.org/2000/10/swap/ from swap.RDFSink import SYMBOL, LITERAL, ANONYMOUS, FORMULA from swap.toXML import ToRDF from swap import myStore #for reading in the configuration file from swap.llyn import RDFStore from swap.term import Node, Literal from swap.uripath import refTo class Namespace: """A collection of URIs with a common prefix. >>> r = Namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#') >>> r.type.uriref() 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' >>> Namespace('http://example/vocab#').sym('class') 'http://example/vocab#class' Copied from mid_proxy_ which derives from AaronSw's rdfapi_ that he told me about in #rdfig. .. _mid_proxy: http://www.w3.org/2000/04/maillog2rdf/mid_proxy.py .. _rdfapi: http://cvs.plexdev.org/viewcvs/viewcvs.cgi/plex/plex/plexrdf/rdfapi.py?rev=1.6&content-type=text/vnd.viewcvs-markup """ def __init__(self, nsname): self.nsname = nsname def __getattr__(self, lname): if not(lname.startswith("__")): return myStore.symbol(self.nsname + lname) def __str__(self): return self.nsname def sym(self, lname): return self.nsname + lname class ForeignKeys: """Keeps track of the foreign keys defined for a database. >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.mapTo('table1', 'key1', 'table2') 'key2' >>> fk.mapTo('table1', 'key1b', 'table2') 'key2b' >>> fk.mapFrom('table2', 'key2', 'table1') 'key1' >>> fk.mapsSomewhere('table1', 'key1') True >>> fk.mapsSomewhere('table1', 'someCol') False >>> for t, c in fk.mapToAll('table1', 'key1'): ... [t, c] ['table2', 'key2'] >>> for t, c in fk.mapFromAll('table2', 'key2'): ... [t, c] ['table1', 'key1'] >>> fk.tblsTo('table1') ['table2'] >>> fk.tblsFrom('table2') ['table1'] >>> for s in fk.srcTbls(): ... s 'table1' >>> for d in fk.destTbls(): ... d 'table2' >>> for sc, dt, dc in fk.srcMap('table1'): ... [sc, dt, dc] ['key1', 'table2', 'key2'] ['key1b', 'table2', 'key2b'] >>> for s, d in fk.mapCols('table1', 'table2'): ... [s, d] ['key1', 'key2'] ['key1b', 'key2b'] >>> for c in fk.colsFrom('table1'): ... c 'key1' 'key1b' >>> for c in fk.colsTo('table2'): ... 
c 'key2b' 'key2' >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.removeTbl('table1') >>> fk.colsFrom('table1') [] >>> fk.tblsTo('table1') [] >>> fk.colsTo('table2') [] >>> fk.tblsFrom('table2') [] >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.removeTbl('table2') >>> fk.colsFrom('table1') [] >>> fk.colsTo('table2') [] >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.removeCol('table1', 'key1') >>> fk.colsFrom('table1') ['key1b'] >>> fk.colsTo('table2') ['key2b'] >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.removeCol('table2', 'key2') >>> fk.colsFrom('table1') ['key1b'] >>> fk.colsTo('table2') ['key2b'] >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.add('table3', 'table2', 'key3c', 'key2c') >>> fk.add('table3', 'table1', 'key3d', 'key1d') >>> fk.removeSrcTbl('table1') >>> fk.colsFrom('table1') [] >>> fk.colsTo('table1') ['key1d'] >>> fk.colsTo('table2') ['key2c'] >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1', 'key2') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.add('table3', 'table2', 'key3c', 'key2c') >>> fk.add('table3', 'table1', 'key3d', 'key1d') >>> fk.removeDestTbl('table1') >>> fk.colsTo('table1') [] >>> fk.colsFrom('table1') ['key1', 'key1b'] """ def __init__(self): """Creates a new, empty foreign key container.""" #{srcT: {srcC: {destT: destC}}} self._mapStScDt = {} #{srcT: {destT: {srcC: destC}}} self._mapStDtSc = {} #{destT: {destC: {srcT: srcC}}} self._mapDtDcSt = {} #{destT: {srcT: {srcC: destC}}} self._mapDtStSc = {} def add(self, srcT, destT, srcC, destC): """Creates a mapping: srcT.srcC -> destT.destC.""" #to make the various methods of ForeignKeys as fast as possible, we #store the data in four different ways. When we add or remove, we #have to be careful to add 4 times and remove 4 times. logging.debug("%s.%s -> %s.%s", srcT, srcC, destT, destC) m = self._mapStScDt if not m.has_key(srcT): m[srcT] = {} if not m[srcT].has_key(srcC): m[srcT][srcC] = {} m[srcT][srcC][destT] = destC m = self._mapDtDcSt if not m.has_key(destT): m[destT] = {} if not m[destT].has_key(destC): m[destT][destC] = {} m[destT][destC][srcT] = srcC m = self._mapStDtSc if not m.has_key(srcT): m[srcT] = {} if not m[srcT].has_key(destT): m[srcT][destT] = {} m[srcT][destT][srcC] = destC m = self._mapDtStSc if not m.has_key(destT): m[destT] = {} if not m[destT].has_key(srcT): m[destT][srcT] = {} m[destT][srcT][srcC] = destC def mapTo(self, srcT, srcC, destT): """Indicates how a given column maps to another table. mapTo(srcT, srcC, desT) -> desC """ return self._mapStScDt.get(srcT, {}).get(srcC, {}).get(destT, '') def mapsSomewhere(self, srcT, srcC): """Returns True if srctT.srcC points to some column of some table.""" return len(self._mapStScDt.get(srcT, {}).get(srcC, {}).keys()) > 0 def mapToAll(self, srcT, srcC): """Gets all of the mappings where srcT.srcC is the source. mapToAll(srcT, srcC) -> [[destT, destC]] """ mapping = self._mapStScDt.get(srcT, {}).get(srcC, {}) ret = [[x, mapping[x]] for x in mapping.keys()] for item in ret: yield item def mapFrom(self, destT, destC, srcT): """Indicates how a given column is mapped to by another table. 
mapFrom(destT, destC, srcT) -> srcC """ logging.debug("mapFrom(%s, %s, %s)", destT, destC, srcT) return self._mapDtDcSt.get(destT, {}).get(destC, {}).get(srcT, '') def mapFromAll(self, destT, destC): """Indicates how destT.destC is mapped to by all tables. Returns an iterator. mapFromAll(destT, destC) -> iterator for [[srcT, srcC]] """ mapping = self._mapDtDcSt.get(destT, {}).get(destC, {}) ret = [[x, mapping[x]] for x in mapping.keys()] for item in ret: yield item def tblsTo(self, srcT = None): """Indicates how a given table maps to all other tables. Returns an iterator over destination tables. tblsTo(srcT) -> iterator for [destT] """ return self._mapStDtSc.get(srcT, {}).keys() def srcTbls(self): """Gets all the tables that are the source of some mapping. srcTbls() -> iterator for [srcT, destT] """ for item in self._mapStDtSc: yield item def destTbls(self): """Gets all the tables that are the destination of some mapping. destTbls() -> iterator for [destT] """ for item in self._mapDtStSc: yield item def tblsFrom(self, destT): """Indicates how a given table is mapped to by all other tables. tblsFrom(destT) -> [srcT] """ return self._mapDtStSc.get(destT, {}).keys() def srcMap(self, srcT): """Indicates a complete mapping from a source table to all other tables. Gives an iterator rather than a list. srcMap(srcT) -> [srcC, destT, destC] """ ret = [] scMap = self._mapStScDt.get(srcT, {}) for sc in scMap.keys(): dtMap = scMap.get(sc, {}) for dt in dtMap.keys(): ret.append([sc, dt, dtMap[dt]]) for item in ret: yield item def mapCols(self, srcT, destT): """Indicates the mapping from one table to another. mapTbl(srcT, destT) -> [[srcC, destC]] """ mapping = self._mapStDtSc.get(srcT, {}).get(destT, {}) return [[x, mapping[x]] for x in mapping] def colsFrom(self, srcT): """Gets a list of columns in srcT that point somewhere.""" return self._mapStScDt.get(srcT, {}).keys() def colsTo(self, destT): """Gets a list of columns in destT that are pointed to by something.""" return self._mapDtDcSt.get(destT, {}).keys() def removeSrcTbl(self, table): """Removes all relationships which have table as the source.""" if self._mapStScDt.has_key(table): del self._mapStScDt[table] del self._mapStDtSc[table] for dt in self._mapDtDcSt.keys(): for dc in self._mapDtDcSt[dt].keys(): for st in self._mapDtDcSt[dt][dc].keys(): if st == table: del self._mapDtDcSt[dt][dc][st] if self._mapDtDcSt[dt][dc] == {}: del self._mapDtDcSt[dt][dc] if self._mapDtDcSt[dt] == {}: del self._mapDtDcSt[dt] for dt in self._mapDtStSc.keys(): for st in self._mapDtStSc[dt].keys(): if st == table: del self._mapDtStSc[dt][st] if self._mapDtStSc[dt] == {}: del self._mapDtStSc[dt] def removeDestTbl(self, table): """Removes all relationships involving table as the destination.""" if self._mapDtDcSt.has_key(table): del self._mapDtDcSt[table] del self._mapDtStSc[table] for st in self._mapStScDt.keys(): for sc in self._mapStScDt[st].keys(): if table in self._mapStScDt[st][sc].keys(): del self._mapStScDt[st][sc][table] if self._mapStScDt[st][sc] == {}: del self._mapStScDt[st][sc] if self._mapStScDt[st] == {}: del self._mapStScDt[st] for st in self._mapStDtSc.keys(): if table in self._mapStDtSc[st].keys(): del self._mapStDtSc[st][table] if self._mapStDtSc[st] == {}: del self._mapStDtSc[st] def removeTbl(self, table): """Removes all relationships involving table.""" self.removeSrcTbl(table) self.removeDestTbl(table) def removeSrcCol(self, table, column): """Removes all relationships involving table.column as a source.""" if self._mapStScDt.get(table, 
{}).has_key(column): del self._mapStScDt[table][column] if self._mapStScDt[table] == {}: del self._mapStScDt[table] for dt in self._mapStDtSc.get(table, {}).keys(): if column in self._mapStDtSc[table][dt].keys(): del self._mapStDtSc[table][dt][column] if self._mapStDtSc[table][dt] == {}: del self._mapStDtSc[table][dt] if self._mapStDtSc[table] == {}: del self._mapStDtSc[table] for dt in self._mapDtDcSt.keys(): for dc in self._mapDtDcSt[dt].keys(): col = self._mapDtDcSt[dt][dc].get(table, '') if col == column: del self._mapDtDcSt[dt][dc][table] if self._mapDtDcSt[dt][dc] == {}: del self._mapDtDcSt[dt][dc] if self._mapDtDcSt[dt] == {}: del self._mapDtDcSt[dt] for dt in self._mapDtStSc.keys(): if column in self._mapDtStSc[dt].get(table, {}).keys(): del self._mapDtStSc[dt][table][column] if self._mapDtStSc[dt][table] == {}: del self._mapDtStSc[dt][table] if self._mapDtStSc[dt] == {}: del self._mapDtStSc[dt] def removeDestCol(self, table, column): """Removes all relationships involving table.column as destination.""" for st in self._mapStScDt.keys(): for sc in self._mapStScDt[st].keys(): col = self._mapStScDt[st][sc].get(table, '') if col == column: del self._mapStScDt[st][sc][table] if self._mapStScDt[st][sc] == {}: del self._mapStScDt[st][sc] if self._mapStScDt[st] == {}: del self._mapStScDt[st] for st in self._mapStDtSc.keys(): for sc in self._mapStDtSc[st].get(table, {}).keys(): col = self._mapStDtSc[st][table][sc] if col == column: del self._mapStDtSc[st][table][sc] if self._mapStDtSc[st][table] == {}: del self._mapStDtSc[st][table] if self._mapStDtSc[st] == {}: del self._mapStDtSc[st] if column in self._mapDtDcSt.get(table, {}).keys(): del self._mapDtDcSt[table][column] for st in self._mapDtStSc.get(table, {}).keys(): for sc in self._mapDtStSc[table][st].keys(): col = self._mapDtStSc[table][st][sc] if col == column: del self._mapDtStSc[table][st][sc] if self._mapDtStSc[table][st] == {}: del self._mapDtStSc[table][st] if self._mapDtStSc[table] == {}: del self._mapDtStSc[table] def removeCol(self, table, column): """Removes all relationships involving table.column.""" self.removeSrcCol(table, column) self.removeDestCol(table, column) class Subclass: """Keeps track of all the subclass relationships that exist. >>> fk = ForeignKeys() >>> fk.add('subTableA', 'superTableA', 'subKeyA', 'superKeyA') >>> fk.add('superTableA', 'superTableB', 'superKeyA', 'superKeyB') >>> pk = {} >>> pk['subTableA'] = Set(['subKeyA']) >>> pk['superTableA'] = Set(['superKeyA']) >>> pk['superTableB'] = Set(['superKeyB']) >>> subc = Subclass(fk, pk) >>> children = subc.getChildrenOf('superTableB') >>> children.sort() >>> children ['superTableA'] >>> subc.getChildrenOf('imaginedTable') [] >>> subc.getParentOf('subTableA') 'superTableA' >>> subc.getParentOf('imaginedTable') >>> for t in subc.superclassTrace('subTableA', ['subKeyA']): ... t ['superTableA', ['superKeyA']] ['superTableB', ['superKeyB']] """ def __init__(self, foreignKeys = ForeignKeys(), primaryKeys = {}): """Creates a new Subclass container that derives its information from the primary keys and a ForeignKeys instance. :param foreignKeys: a ForeignKeys instance, or None :param primaryKeys: {table: Set([primKey, primKey])} """ self._parentAsKey = {} #{parent: [child]} self._childAsKey = {} #{child:parent} self.fk = foreignKeys #dangerous, since foreignKeys could change! 
#get foreign key data, if it exists if foreignKeys: self._incorporateForeignKeys(foreignKeys, primaryKeys) def _isSubclass(self, thisTable, thatTable, foreignKeys, primaryKeys): """Determines whether or not thisTable is a subclass of thatTable. >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1a', 'key2a') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.add('table1', 'table3', 'key1c', 'key3c') >>> pk = {} >>> pk['table1'] = Set(['key1a', 'key1b']) >>> pk['table2'] = Set(['key2a', 'key2b']) >>> pk['table3'] = Set(['key3c']) >>> sc = Subclass(fk, pk) >>> sc._isSubclass('table1', 'table2', fk, pk) True >>> sc._isSubclass('table2', 'table1', fk, pk) False >>> sc._isSubclass('table1', 'table3', fk, pk) False >>> sc._isSubclass('table4', 'table5', fk, pk) False >>> tricky = ForeignKeys() >>> tricky.add('table1', 'table2', 'key1a', 'key2a') >>> tricky.add('table1', 'table2', 'key1b', 'key2a') >>> sc = Subclass(tricky, pk) >>> sc._isSubclass('table1', 'table2', tricky, pk) False >>> tricky = ForeignKeys() >>> tricky.add('table1', 'table2', 'key1a', 'key2a') >>> tricky.add('table1', 'table2', 'key1a', 'key2b') >>> sc = Subclass(tricky, pk) >>> sc._isSubclass('table1', 'table2', tricky, pk) False >>> tricky = ForeignKeys() >>> tricky.add('table1', 'table2', 'key1a', 'key2a') >>> sc = Subclass(tricky, pk) >>> sc._isSubclass('table1', 'table2', tricky, pk) False >>> pk = {} >>> pk['table1'] = Set(['key1a', 'key1b']) >>> pk['table2'] = Set(['key2']) >>> tricky = ForeignKeys() >>> tricky.add('table1', 'table2', 'key1a', 'key2') >>> tricky.add('table1', 'table2', 'key1b', 'key2') >>> sc = Subclass(tricky, pk) >>> sc._isSubclass('table1', 'table2', tricky, pk) False """ emptySet = Set([]) if primaryKeys.get(thisTable, emptySet) == emptySet: return False if primaryKeys.get(thatTable, emptySet) == emptySet: return False spk = primaryKeys[thisTable] #source primary keys dpk = primaryKeys[thatTable] #destination primary keys if not isinstance(spk, Set): raise TypeError, "the value of keys in prim keys must be a Set." if len(spk) != len(dpk): return False mapping = foreignKeys.mapCols(thisTable, thatTable) sfk = Set([x[0] for x in mapping]) #source foreign keys dfk = Set([x[1] for x in mapping]) #source primary keys return spk.issubset(sfk) and dpk.issubset(dfk) #subclass if pk < fk def _incorporateForeignKeys(self, foreignKeys, primaryKeys): """Extracts all subclass relationships out of the foreign keys and primary keys, and adds them to self. >>> fk = ForeignKeys() >>> fk.add('table1', 'table2', 'key1a', 'key2a') >>> fk.add('table1', 'table2', 'key1b', 'key2b') >>> fk.add('table1', 'table3', 'key1c', 'key3c') >>> pk = {} >>> pk['table1'] = Set(['key1a', 'key1b']) >>> pk['table2'] = Set(['key2a', 'key2b']) >>> pk['table3'] = Set(['key3c']) >>> sc = Subclass(foreignKeys=fk, primaryKeys=pk) >>> sc.getChildrenOf('table2') ['table1'] >>> sc.getChildrenOf('table1') [] >>> sc.getChildrenOf('table3') [] """ for st in foreignKeys.srcTbls(): for dt in foreignKeys.tblsTo(st): if self._isSubclass(st, dt, foreignKeys, primaryKeys): logging.debug("%s subClassOf %s", st, dt) try: self._add(st, dt) except ValueError: logging.warning("%s has more than one parent!", st) def _add(self, subTable, superTable): """Adds a relationship subTable.subKey -> superTable.superKey if subTable is a subclass of superTable. 
>>> sc = Subclass() >>> sc._add('ta', 'tb') >>> sc._add('tb', 'tc') >>> sc.getParentOf('ta') 'tb' >>> sc.getParentOf('tb') 'tc' >>> sc.getParentOf('tc') >>> sc.getChildrenOf('ta') [] >>> sc.getChildrenOf('tb') ['ta'] >>> sc.getChildrenOf('tc') ['tb'] >>> try: ... sc._add('ta', 'td') ... exThrown = False ... except ValueError: ... exThrown = True >>> exThrown True >>> sc.getParentOf('ta') 'tb' """ ## Add the relationship to the superclass map #if we don't know anything about superTable, add a blank if not self._parentAsKey.has_key(superTable): self._parentAsKey[superTable] = [] if subTable in self._parentAsKey[superTable]: raise ValueError, "%s is already a child of %s" % (superTable, subTable) self._parentAsKey[superTable].append(subTable) if self._childAsKey.has_key(subTable): raise ValueError, "%s is already the parent of %s" % \ (self._childAsKey[subTable], subTable) self._childAsKey[subTable] = superTable def getChildrenOf(self, superTable): """Gets a list of tables which are subclasses of superTable.""" return self._parentAsKey.get(superTable, []) def getParentOf(self, subTable): """Gets the list of tables that are superclasses of subTable.""" return self._childAsKey.get(subTable, None) def superclassTrace(self, table, columns): """Iteratively looks up the superclasses of table until it finds one which has no parent. If more than one superclass exists, the first one is chosen. :param table: the table whose superclasses are wanted :param columns: list of columns to trace through all the superclasses :return: an iterator over ordered list of lists, indicating the super tables and their analog in the super table; the last element represents the highest superclass [[nextT, [c3, c4]], [finalT, [c5, c6]]] """ ret = [] while 1: ret.append([table, columns]) parent = self.getParentOf(table) if parent == None: ret = ret[1:] break else: columns = [self.fk.mapTo(table, c, parent) for c in columns] table = parent #broke out of the loop, now iterate over the results for i in ret: yield i class ExtraRDF: """This class keeps up with all of the extra RDF that dbview is interested in exporting. It is capable of keeping up with extra RDF concering the database, tables, and columns of tables. >>> s = myStore.store >>> e = ExtraRDF('http://example/services/dbview') >>> e.addDatabaseStatement(pred=s.symbol('http://ex/r#a'), ... obj=s.symbol('http://ex/r#b')) >>> e.addDatabaseStatement(pred=s.symbol('http://ex/r#1'), ... obj=s.symbol('http://ex/r#2')) >>> e.databaseStatements() [[a, b], [1, 2]] >>> e.addTableStatement(table='t', pred=s.symbol('http://ex/r#c'), ... obj=s.symbol('http://ex/r#d')) >>> e.addTableStatement(table='t', pred=s.symbol('http://ex/r#e'), ... obj=s.symbol('http://ex/r#f')) >>> e.addTableStatement(table='u', pred=s.symbol('http://ex/r#g'), ... obj=s.symbol('http://ex/r#h')) >>> e.tableStatements(table = 'u') [[g, h]] >>> e.tableStatements(table = 't') [[c, d], [e, f]] >>> e.tableStatements(table = 'q') [] >>> e.addColumnStatement(table='t', column='v', ... pred=s.symbol('http://ex/r#i'), ... obj=s.symbol('http://ex/r#j')) >>> e.addColumnStatement(table='w', column='x', ... pred=s.symbol('http://ex/r#k'), ... obj=s.symbol('http://ex/r#l')) >>> e.columnStatements(table='t', column='v') [[i, j]] >>> e.columnStatements(table='w', column='v') [] >>> e.columnStatements(table='w', column='x') [[k, l]] >>> e.addColumnStatement(table='t', column='nex', ... obj=s.symbol('http://www.w3.org/2000/10/swap/db#NoExportColumn'), ... 
pred=s.symbol('http://www.w3.org/1999/02/22-rdf-syntax-ns#type')) >>> e.noExportColumns('t') ['nex'] >>> e.colsMatchingStatement(table='t', ... pred=s.symbol('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'), ... obj=s.symbol('http://www.w3.org/2000/10/swap/db#NoExportColumn')) ['nex'] >>> tableURI = s.symbol('http://example/services/dbview/blah#.table') >>> e.tableName(tableURI) 'blah' >>> columnURI = s.symbol('http://example/services/dbview/blah#col') >>> e.columnName(columnURI) ('blah', 'col') """ def __init__(self, dbdocaddr): """Creates a new, empty store for extra RDF statements.""" myStore.literal("7") #make sure the store is initialized self._dbdocaddr = dbdocaddr self.store = myStore.store self.formula = self.store.newFormula() def dbURI(self): return self.store.symbol(dbsym(self._dbdocaddr)) def dbSchemaURI(self): return self.store.symbol(self._dbdocaddr) def tableURI(self, table): return self.store.symbol(tablesym(self._dbdocaddr, table)) def tableSchemaURI(self, table): return self.store.symbol(tableschemasym(self._dbdocaddr, table)) def columnURI(self, table, column): return self.store.symbol(colsym(self._dbdocaddr, table, column)) def tableName(self, tableURI): """Gets the string name out of a table URI.""" if tableURI.uriref().startswith(self._dbdocaddr): uriSplit = tableURI.uriref().split('/') if tableURI.fragid == '.table': return uriSplit[-1].split('#')[0] raise ValueError("%s is not a table URI" % tableURI.uriref()) def columnName(self, columnURI): """Gets a tuple of (table, column) from a column URI.""" if columnURI.uriref().startswith(self._dbdocaddr): uriSplit = columnURI.uriref().split('/') if columnURI.fragid != '.table': return (uriSplit[-1].split('#')[0], uriSplit[-1].split('#')[1]) raise TypeError("%s is not a column URI" % columnURI.uriref()) def addDatabaseStatement(self, pred, obj): """Adds a statement that has the database itself as the subject.""" dbURI = self.dbURI() self.formula.add(subj=dbURI, pred=pred, obj=obj) def addTableStatement(self, table, pred, obj): """Adds a statement that has a table as the subject.""" tableURI = self.tableURI(table) self.formula.add(subj=tableURI, pred=pred, obj=obj) def addColumnStatement(self, table, column, pred, obj): """Adds a statement that has table.column as the subject.""" columnURI = self.columnURI(table, column) self.formula.add(subj=columnURI, pred=pred, obj=obj) def databaseStatements(self): """Gets all of the statements made about the database.""" ret = [] for s in self.formula.statementsMatching(subj=self.dbURI()): ret.append([s.predicate(), s.object()]) return ret def databaseSchemaStatements(self): """Gets all of the statements made about the database schema.""" ret = [] for s in self.formula.statementsMatching(subj=self.dbSchemaURI()): ret.append([s.predicate(), s.object()]) return ret def tableStatements(self, table): """Gets all of the statements made about a table.""" ret = [] for s in self.formula.statementsMatching(subj=self.tableURI(table)): ret.append([s.predicate(), s.object()]) return ret def tableSchemaStatements(self, table): """Gets all the statements made about a table's schema.""" ret = [] st = self.formula.statementsMatching(subj=self.tableSchemaURI(table)) for s in st: ret.append([s.predicate(), s.object()]) return ret def columnStatements(self, table, column): """Gets all of the statements made about the column table.column.""" ret = [] for s in self.formula.statementsMatching(subj=self.columnURI(table, column)): ret.append([s.predicate(), s.object()]) return ret def 
colsMatchingStatement(self, table, pred, obj): """Gets a list of all columns from a table that are the subject in a statement with a given predicate and object. """ ret = [] cols = self.formula.each(pred=pred, obj=obj) for colURI in cols: t, column = self.columnName(colURI) if t == table: ret.append(column) return ret def noExportColumns(self, table): """Gets a list of all no export columns for a table.""" return self.colsMatchingStatement(table, pred=RDF.type, obj=SwDB.NoExportColumn) ## GLOBAL SETTINGS # We use the 'z' flag to avoid absolute URIs in the output so that # dbview can be run behind a reverse proxy. The 'r' flag specifies that # absolute URIs should be used. Relative_nsdecl = 'z' Absolute_nsdecl = 'r' #RDF_MediaType = "application/rdf+xml" #@@ cf. RDF Core RDF_MediaType = "text/plain" #used for looking at RDF in firefox SwDB = Namespace("http://www.w3.org/2000/10/swap/db#") # in particular: db.n3,v 1.5 2002/03/06 23:12:00 RDFS = Namespace("http://www.w3.org/2000/01/rdf-schema#") RDF = Namespace("http://www.w3.org/1999/02/22-rdf-syntax-ns#") LOG = Namespace("http://www.w3.org/2000/10/swap/log#") def dbsym(dbdocaddr): """Build a URI for the database from a URI for the database description. We have a need for arbitrary names that don't clash with table and column names. We chose names with '.' at the end (or beginning?), taking advantage of: 'You cannot use the '.' character in names because it is used to extend the format by which you can refer to columns.' -- 6.1.2_ Database, Table, Index, Column, and Alias Names .. _6.1.2: http://www.mysql.com/documentation/mysql/bychapter/manual_Reference.html#Legal_names >>> dbsym('http://example/dbview/w3c') 'http://example/dbview/w3c#.database' """ return "%s#.database" % dbdocaddr def tablesym(dbdocaddr, tbl): """Build a URI for a table from its name and a URI for the database. >>> tablesym('http://example/dbview/w3c', 'domains') 'http://example/dbview/w3c/domains#.table' """ return "%s/%s#.table" % (dbdocaddr, tbl) def tableschemasym(dbdocaddr, table): """Build a URI for a table's schema. >>> tableschemasym('http://example/dbview/w3c', 'domains') 'http://example/dbview/w3c/domains' """ return "%s/%s" % (dbdocaddr, table) def tableeditsym(dbdocaddr, table): """Build a URI for a table's edit page. >>> tableeditsym('http://example/dbview/w3c', 'domains') 'http://example/dbview/w3c/domains/edit' """ return "%s/%s/edit" % (dbdocaddr, table) def colsym(dbdocaddr, table, col): """Build a URI for a database table column. >>> colsym('http://example/dbview/w3c', 'domains', 'leader') 'http://example/dbview/w3c/domains#leader' Use an empty column name to get a namespace name for the columns. >>> colsym('http://example/dbview/w3c', 'domains', '') 'http://example/dbview/w3c/domains#' """ return "%s/%s#%s" % (dbdocaddr, table, col) def tabledatasym(dbdocaddr, table): """Build a URI for the data in a table. >>> tabledatasym('http://example/dbview/w3c', 'domains') 'http://example/dbview/w3c/domains/data' """ return "%s/%s/data" % (dbdocaddr, table) def tabledescsym(dbdocaddr, table): """Build a URI for a table description. >>> tabledescsym('http://example/dbview/w3c', 'domains') 'http://example/dbview/w3c/domains' """ return "%s/%s" % (dbdocaddr, table) def itemsym(dbdocaddr, table, cols, vals, subclass = Subclass()): """Build a URI for an item (row) in a database. 
>>> itemsym('http://example/dbview/w3c', 'domains', ('id',), (23,)) 'http://example/dbview/w3c/domains/item/id/23#item' >>> itemsym('http://example/stuff', 'tbl', ('c1', 'c2'), ('v1', 'v2')) 'http://example/stuff/tbl/item/c1/v1/c2/v2#item' """ #if some value is None or '', throw an exception #@@this check might not be necessary, commented out for now # for item in vals: # if item == None or item == '': # raise ValueError("An item was either None or '', URI not made;" + # "table=%s, cols=%s, vals=%s", table, cols, vals) trace = [x for x in subclass.superclassTrace(table, cols)] if len(trace) == 0: #if table has no parent superclass topTable = table topColumns = cols else: topTable = trace[-1][0] topColumns = trace[-1][1] key = '/'.join(["%s/%s" % (c, urllib.quote(str(v))) for c, v in zip(topColumns, vals)]) return "%s/%s/item/%s#item" % (dbdocaddr, topTable, key) def parseIndexString(path, nex=[]): """Parses an index string and returns a list of [key, value] pairs. >>> parseIndexString("id/8/emailId/7") [['id', '8'], ['emailId', '7']] """ segments = path.split("/") clauses = [] if len(segments) % 2 == 1: raise ValueError("the index string '%s' is invalid" % path) for i in range(0, len(segments), 2): n = segments[i] v = segments[i+1] if n not in nex: clauses.append([n, v]) return clauses class DBViewServer(SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer): """Export an SQL database, read-only, into HTTP using RDF. This is a threading HTTP server that handles requests if initialized with the appropriate handler, such as DBViewHandler. """ def __init__(self, addr, handlerClass, db, dbName, configfiles, allowIP, httpBase, httpPath, relativeURI, backlinks, forlinks, exportNumberOfRows, editable): BaseHTTPServer.HTTPServer.__init__(self, addr, handlerClass) self._db = db self._path = httpPath #prefix for URIs that dbview parses in do_GET self._base = httpBase #prefix for URIs that dbview creates self._dbName = dbName #the name of the database ## Foreign Keys # We get foreign key data both from the database itself, and also from # the dbSchema rdf file. This data is then combined to define # self._foreignKeys, which contains all of the foreign key information # that we know about. # # Data from the dbSchema rdf file is combined with _getForeignKeys() # to create a complete list of foreign key relationships. self._foreignKeys = ForeignKeys() ## Subclass Relationships # Dictionary of tables that are a subclass of another table. Table # t1 is a subclass of table t2 when t1 has primary key c1 and t2 has # primary key c2, and t1.c1 -> t2.c2 as per the foreign keys above. # # When t1 is a subclass of t2, URIs for rows in t1 should instead # be made to their corresponding rows in t2. So a URI for t1 that # would be http://example/dbview/t1/item/c1/4 is turned into # http://example/dbview/t2/item/c2/4 self._subclass = None ## Other Statements # In addition to providing information about foreign keys and no export # columns, the user may also provide arbitrary RDF to be included # whenever a particular table is being dumped or described. self._extraRDF = ExtraRDF(self._base) ## Number Of Rows # When printing the table schema, dbview can export the number of rows # in the table. self._exportNumRows = exportNumberOfRows ## Access Control # To control access, we optionally maintain a list of allowed IP. # If we are keeping this list, then self.checkIP is True, and # self.allowIP is a list of numeric IP addresses. 
self.allowIP = allowIP ## DB Schema RDF File # We want to make sure we're using the most recent schema rdf document, # without having to restart dbview each time we make a modification. # We take advantage of os.stat() to check the change_time of the # configuration files whenever do_GET is called. If the config files # we are currently using are out of date, we re-parse the config files # and update self._foreignKeys and self._extraRDF self._configfiles = configfiles self._configCTimes = [-1 for x in configfiles] #check all files now ## Relative URI # Determines whether or not dbview dumps relative URIs in the RDF # it generates. if relativeURI: self._sinkFlags = Relative_nsdecl else: self._sinkFlags = Absolute_nsdecl ## Backlinks # When dumping a row, dbview can additionally display both items that # point to the row. self._backlinks = backlinks self._forlinks = forlinks ## Editable # If set, this allows the database schema to be modified through the # web at the /.edit URI self.editable = editable ## Initialization # Get the foreign keys out of the database and then read the rdf # config file. self.updateConfigFile() def _getPrimaries(self): """Extracts the primary keys from the database. The keys are stored in self._primaries, which is a dictionary with keys as the table name and values are Sets of all the primary keys. """ self._primaries = {} for tbl in tableNames(self._db): self._primaries[tbl] = Set() for colName, ty, nullable, isKey, dunno, dunno \ in showColumns(self._db, tbl): if isKey == 'PRI': self._primaries[tbl].add(colName) return def _extractForeignKeysFromDB(self, foreignKeys): """Extracts the foreign keys from the database. :param foreignKeys: an instance of ForeignKeys """ for tbl in tableNames(self._db): fk = showForeignKeys(self._db, tbl) for thatTable in fk.get(tbl, []): for columnPair in fk[tbl][thatTable]: foreignKeys.add(tbl, thatTable, columnPair[0], columnPair[1]) logging.debug("extracted foreign key %s.%s -> %s.%s", tbl, columnPair[0], thatTable, columnPair[1]) def _filesChanged(self): for i in range(len(self._configfiles)): configfile = self._configfiles[i] #strip off the file:// part if configfile.startswith("file://"): configfile = configfile[7:] else: logging.warning("The config file cannot be read.") continue #compare change times from last read to the current currentCTime = getChangeTime(configfile) if self._configCTimes[i] != currentCTime: #the file is changed self._configCTimes[i] = currentCTime #update the change time logging.debug("%s changed at %d!", configfile, currentCTime) return True return False #no file changed def _cleanupForeignKeys(self, db, foreignKeys, primaries, extraRDF): """Removes any relationships involving tables or columns that do not actually exist in the database, due to data found in the config file. Relationships where a destination table has no primary keys is also removed, since URIs cannot be built for these tables (bnodes are used in the table data). 
""" #get a list of all tables involved in fk relations srcTbls = [x for x in foreignKeys.srcTbls()] destTbls = [x for x in foreignKeys.destTbls()] tables = srcTbls + destTbls for table in tables: #make sure table actually exists if not tableExists(db, table): logging.warning("table %s does not exist!", table) foreignKeys.removeTbl(table) elif extraRDF.tableURI(table) in \ extraRDF.formula.each(pred=RDF.type, obj=SwDB.NoExportTable): logging.warning("table %s is not exported!", table) foreignKeys.removeTbl(table) else: #make sure all columns for table exist dbColumns = [x[0] for x in showColumns(db, table)] colsTo = foreignKeys.colsTo(table) colsFrom = foreignKeys.colsFrom(table) configColumns = colsTo + colsFrom for column in configColumns: if column not in dbColumns: logging.warning("column %s.%s does not exist!", table, column) foreignKeys.removeCol(table, column) #get rid of statements where a bnode needs to be the URI if len(primaries.get(table, [])) == 0: logging.warning("%s has no primary keys!", table) foreignKeys.removeDestTbl(table) def updateConfigFile(self): """Checks to see if the config file is outdated, and re-parses it when necessary. No-export columns are stored in self._extraRDF. At the end of the reparsing, foreign keys are extracted from the database and appended to the information pulled from the config file. Subclass relationships are also updated. If the config file contains a table or column reference which does not exist, a warning is logged. :return: True if the configuration was updated """ ## Parse Configuration File # Since we allow multiple configuration files (to allow, for instance, # one configuration file to include db:mapsTo and db:noExport # statements while allowing a separate one to include additional # information to include in dbview), we keep a single list of no export # columns, foreign keys, and additional statements. As each config # file is parsed, the statements extracted from them are added to # this list. Once all files are parsed, the list is used to set up # dbview. # # First, look at all of the config files and see if any of them have # been updated. If so, then reparse all of them (easier than keeping # up with each individually). If not, just return. logging.debug("Checking the configuration files ...") if not self._filesChanged(): return False #wipe out the old config self._extraRDF = ExtraRDF(self._base) self._foreignKeys = ForeignKeys() self._getPrimaries() for configfile in self._configfiles: logging.info("reading config file %s ...", configfile) confParser.parseDBSchema(configfile, dbsym(self._base), self._base, self._foreignKeys, self._extraRDF) #extract foreign keys from the database itself self._extractForeignKeysFromDB(self._foreignKeys) #make sure all of the user's foreign key relationships are valid self._cleanupForeignKeys(db=self._db, foreignKeys=self._foreignKeys, primaries=self._primaries, extraRDF=self._extraRDF) #get the subclass relationships and deal with the primary keys self._subclass = Subclass(self._foreignKeys, self._primaries) class DBViewHandler(BaseHTTPServer.BaseHTTPRequestHandler): """Handles requests from HTTP clients. The primary work is done in the do_GET method, which deals with decoding the URI path and performing the requested action. 
""" EMPTY_TRIPLES_TO_SHOW = 3 #number of empty triples to show in /edit def log_message(self, format, *args): msg = format % args logging.debug(msg) def checkTable(self, table): """Raises a ValueError if table does not exist or is not exported.""" if table == '': raise ValueError('A table name must be specified.') if not tableExists(self.server._db, table): raise ValueError('The table %s does not exist.' % table) tableURI = self.server._extraRDF.tableURI(table) if tableURI in self.server._extraRDF.formula.each(pred=RDF.type, obj=SwDB.NoExportTable): raise ValueError('the table %s does not exist' % table) def do_POST(self): length = int(self.headers.getheader('Content-length')) data = self.rfile.read(length) #@@handle posts! def do_GET(self): """Decode URI path and dispatch.""" try: s = self.server #see if we need to update the config file s.updateConfigFile() #simple access control; if you're not in the users.conf file, you #are not allowed access. if len(self.server.allowIP)>0 and \ not self.client_address[0] in self.server.allowIP: base = "/".join(s._base.split('/')[:3]) thisDoc = "%s%s" % (base, self.path) self.forbidden("Your IP address is not allowed.", thisDoc) return #define the relative PATH to the database, i.e. /dbview/w3 dbdocpath = s._path if self.path[:len(dbdocpath)] == dbdocpath: rest = self.path[len(dbdocpath):] thisDoc = "%s%s" % (s._base, rest) if rest == '': # http://example.org/dbview/db self.describeDB() return elif rest == '/.edit': self.editDB(thisDoc) return elif rest[0] == '/': splits = rest[1:].split("/") if len(splits) == 1: # http://example.org/dbview/db/employees table = rest[1:] self.checkTable(table) self.describeTable(table) return elif len(splits) == 2 and splits[-1] == "data": # http://example.org/dbview/db/employees/data table = splits[0] self.checkTable(table) self.dumpTable(table) return elif len(splits) == 2 and splits[-1] == "edit": # http://example/dbview/db/employees/edit table = splits[0] if tableExists(s._db, table): self.editTable(table) else: raise ValueError('table does not exist') return elif len(splits) == 2 and splits[-1] == "item": # http://example.org/dbview/db/employees/item self.notFound("Column values must be provided.", thisDoc) return elif len(splits) == 2: self.notFound("An unknown URI was requested for" + " the table.", thisDoc) return elif len(splits) > 2 and splits[1] == "item": # http://example.org/dbview/db/employees/item/id/3 table = splits[0] self.checkTable(table) self.dumpRow(table, "/".join(splits[2:])) return else: self.notFound("An unknown URI was requested.", thisDoc) return else: self.notFound('An unknown URI was requested.', thisDoc) else: # to figure out the URI here, we need to make this look like a # uri that starts with base instead of path. # self.path = http://localhost/somewhere # s._base = http://remotehost/other/cool/place # thisDoc = http://remotehost/somewhere base = "/".join(s._base.split('/')[:3]) thisDoc = "%s%s" % (base, self.path) self.notFound('An unknown URI was requested.', thisDoc) except socket.error, e: if e.args[0]: return #user hit 'stop' else: raise e except ValueError, e: self.notFound(e, thisDoc) return def _seeAlsoOptions(self, configFile): seeAlsoOptions = "" #the for the select for c in self.server._configfiles: if c != configFile: seeAlsoOptions += "" % (c, c) else: seeAlsoOptions += """ """ % (c, c) return seeAlsoOptions def editDBForm(self, configFile, thisDoc): """Sends an HTTP 200 and displays an HTML form suitable for editing a database schema. 
:param configFile: the string URI for the config file that should be read from and written to """ server = self.server #reparse the config file extraRDF = ExtraRDF(server._base) foreignKeys = ForeignKeys() confParser.parseDBSchema(configFile, dbsym(server._base), server._base, foreignKeys, extraRDF) f = extraRDF.formula dbURI = extraRDF.store.symbol(dbsym(server._base)) self.send_response(200, '') self.send_header('Content-type', 'text/html; charset=UTF-8') self.end_headers() self.wfile.write(""" DBViewHandler: Edit Database Schema: %s

Edit Database Schema: %s

""" % (server._dbName, server._dbName)) self.wfile.write("""

The main configuration file may specify one or more schema files that contain user-provided RDF statements about the database schema and its associated tables. These are included as the object of an rdfs:seeAlso statement in the main dbview configuration. This page allows you to edit one configuration file at a time.

If no files are listed, then add at least one rdfs:seeAlso statement to the configuration.

""") #print the drop down for see also self.wfile.write("""
configuration file:
""" % (self._seeAlsoOptions(configFile))) ################################################################# ## DBView Options ## self.wfile.write("""

Database Options

""") #get the label labelStatement = f.the(subj=dbURI, pred=RDFS.label) if labelStatement: label = labelStatement.value() else: label = '' self.wfile.write("""
label:
""" % (label, configFile)) ################################################################# ## DB Metadata ## self.wfile.write("""

Database Metadata

When the database is described, additional RDF can be printed.

""") statements = [] for s in f.statementsMatching(subj=dbURI): statements.append([s.predicate(), s.object()]) for i in range(self.EMPTY_TRIPLES_TO_SHOW): statements.append([None, None]) cnt = 0 for pred, obj in statements: predVal, objVal, literalChecked = self._editFormStatement(pred, obj) self.wfile.write(""" """ % (server._dbName, cnt, predVal, cnt, objVal, cnt, literalChecked)) cnt += 1 self.wfile.write("""
%s literal
""" % (configFile)) ################################################################# ## Tables ## #print the heading for the tables self.wfile.write("""

Tables

Various configuration options are available for each table individually. Tables that are editable link to their edit pages.

""") tblNames = [x for x in tableNames(server._db)] tblNames.sort() for table in tblNames: tableURI = extraRDF.tableURI(table) tableSchemaURI = extraRDF.store.symbol(tableschemasym( server._base, table)) #get the label labelObj = f.the(subj=tableURI, pred=RDFS.label) if labelObj: label = labelObj.value() else: label = '' #get hidden hidden = f.contains(subj=tableURI, pred=RDF.type, obj=SwDB.NoExportTable) if hidden: hiddenChecked = " checked='checked'" else: hiddenChecked = "" #get editable editable = f.contains(subj=tableSchemaURI, pred=RDF.type, obj=SwDB.Editable) if editable: editableChecked = " checked='checked'" else: editableChecked = "" #get updateable updateable = f.contains(subj=tableURI, pred=RDF.type, obj=SwDB.Updateable) if updateable: updateableChecked = " checked='checked'" else: updateableChecked = "" if editable: tableLink = "%s" % \ (refTo(thisDoc, tableeditsym(server._base, table)), table) else: tableLink = table self.wfile.write(""" """ % (tableLink, table, label, table, hiddenChecked, table, editableChecked, table, updateableChecked)) self.wfile.write("""
%s label: hidden editable updateable
""" % (configFile)) self.wfile.write("") def _editFormStatement(self, pred, obj): #spit out a resource or literal? if obj==None: objVal = '' literalChecked = "" elif issubclass(obj.__class__, Literal): objVal = obj.value() literalChecked = " checked='checked'" elif issubclass(obj.__class__, Node): objVal = obj.uriref() literalChecked = "" else: raise RuntimeError("can't make a form with %s" % obj) if pred==None: predVal = '' else: predVal = pred.uriref() return predVal, objVal, literalChecked def editTableForm(self, table, configFile): """Sends an HTTP 200 status and displays an HTML form suitable for editing a table's schema. :param configFile: the string URI for the current config file """ def columnOptions(cols, selectedColumn=None): """Generates options for a select involving columns in table.""" ret = "" ret += "" for col in cols: if col == selectedColumn: ret += """ """ % (col, table, col) else: ret += """ """ % (col, table, col) return ret def tableColumnOptions(tableNames, selectedTbl=None, selectedCol=None): """Generates the options for all tables and columns.""" ret = "" #add a 'nothing' option to remove a foreign key if selectedTbl==None: ret += """ """ else: ret += "\n" for tbl in tableNames: cols = tableColumnNames(server._db, tbl, [])[1] for col in cols: if (tbl == selectedTbl and col == selectedCol): ret += """ """ % (tbl, col, tbl, col) else: ret += "\n" % \ (tbl, col, tbl, col) return ret #@@implement a mutex on the file resource server = self.server #to read triples from only this file, re-read the file extraRDF = ExtraRDF(server._base) foreignKeys = ForeignKeys() confParser.parseDBSchema(configFile, dbsym(server._base), server._base, foreignKeys, extraRDF) tableURI = extraRDF.tableURI(table) #get all the column names cols = tableColumnNames(server._db, table, [])[1] #write the header info self.send_response(200, '') self.send_header("Content-type", "text/html; charset=UTF-8") self.end_headers() self.wfile.write(""" DBViewHandler: Edit Table Schema: %s

Edit Table Schema: %s

""" % (table, table)) ################################################################## ## General Options ## self.wfile.write("""

The main configuration file may specify one or more schema files that contain user-provided RDF statements about table schema. These are included as the object of an rdfs:seeAlso statement in the main dbview configuration. This page allows you to edit one configuration file at a time.

If no files are listed, then add at least one rdfs:seeAlso statement to the configuration.

configuration file:
""" % (self._seeAlsoOptions(configFile))) ################################################################## ## Table Metadata ## self.wfile.write("

Table Metadata

\n") #allow the user to toggle the dbview options noExportStatement = extraRDF.formula.contains(subj=tableURI, pred=RDF.type, obj=SwDB.NoExportTable) if noExportStatement: noExportChecked = " checked='checked'" else: noExportChecked = "" updateableStatement = extraRDF.formula.contains(subj=tableURI, pred=RDF.type, obj=SwDB.Updateable) if updateableStatement: updateableChecked = " checked='checked'" else: updateableChecked = "" #get the label labelSymb = extraRDF.formula.the(subj=tableURI, pred=RDFS.label) if labelSymb: label = labelSymb.value() else: label = "" #get the row labels from rowLabelsFromSymb = None match = extraRDF.formula.each(pred=RDFS.subPropertyOf, obj=RDFS.label) for m in match: if extraRDF.columnName(m)[0] == table: rowLabelsFromSymb = m break if rowLabelsFromSymb: rowLabelsFrom = columnOptions(cols, extraRDF.columnName(rowLabelsFromSymb)[1]) else: rowLabelsFrom = columnOptions(cols, None) self.wfile.write("""

Table Options

DBView understands certain configuration options which control how tables are exported.

  • hidden - the table does not appear in the database schema, foreign keys into the table are exported as literals, and its data cannot be retrieved
  • editable - the table may be edited via the web (this page!)
  • updateable - data may be added to the table via the web
  • label - the rdfs:label to use for the table instead of the table's name; if there is no text, the default is used
  • row labels from - if a column is selected, this column's value is used to generate a label for this table's rows
hidden editable updateable
label:
row labels from:
""" % (noExportChecked, updateableChecked, label, rowLabelsFrom, table, configFile)) self.wfile.write("""

Other RDF

Additional RDF statements can be included in the schema file; these are also emitted when DBView describes the table.

To remove a statement, delete the text in the statement's predicate and object boxes.

""") #get all of the user's specified RDF about the table inputs = "" statements = [] for s in extraRDF.formula.statementsMatching(subj=tableURI): statements.append([s.predicate(), s.object()]) for i in range(self.EMPTY_TRIPLES_TO_SHOW): statements.append([None, None]) #print input elements for each triple self.wfile.write("""
""") cnt = 0 for pred, obj in statements: if not (pred==RDF.type and obj in [SwDB.NoExportTable, SwDB.Updateable]): predVal, objVal, literalChecked = self._editFormStatement(pred, obj) self.wfile.write(""" """ % (table, cnt, predVal, cnt, objVal, cnt, literalChecked)) cnt += 1 self.wfile.write("""
%s literal
""" % (table, configFile)) ################################################################# ## Column Metadata ## self.wfile.write("

Column Metadata

") colOptions = "" for col in cols: colOptions += "" % (col, col) #toggle the dbview options self.wfile.write("""

Column Options

DBView understands certain configuration options which control how columns are exported.

  • hidden - the column does not appear in its table's schema, foreign keys into the column are exported as literals, and its data cannot be retrieved
  • resource - the column's values are URIs rather than literals
  • relative to - if a column is a resource, this string is prepended to all URIs generated from the column's values; this is useful for adding a prefix such as 'http://' or 'mailto:'
""") self.wfile.write("""
""") cnt = 0 for col in cols: colURI = extraRDF.columnURI(table, col) #see if the column is hidden noExportStatement = extraRDF.formula.contains(subj=colURI, pred=RDF.type, obj=SwDB.NoExportColumn) if noExportStatement: noExportChecked = " checked='checked'" else: noExportChecked = "" #see if the column is a resource resourceStatement = extraRDF.formula.contains(subj=colURI, pred=RDF.type, obj=SwDB.Resource) if resourceStatement: resourceChecked = " checked='checked'" prefixStatement = extraRDF.formula.the(subj=colURI, pred=SwDB.relativeTo) if prefixStatement: resourcePrefix = prefixStatement.value() else: resourcePrefix = "" else: resourceChecked = "" resourcePrefix = "" self.wfile.write(""" """ % (table, col, cnt, noExportChecked, cnt, resourceChecked, cnt, resourcePrefix)) cnt += 1 self.wfile.write("""
%s.%s hidden resource relative to:
""" % (table, configFile)) ################################################################## ## Foreign Keys ## tblNames = [x for x in tableNames(server._db)] tblNames.sort() self.wfile.write("

Foreign Keys

") self.wfile.write("""

dbview exports foreign key mappings by emitting a link to a URI for a column's value rather than a literal. The foreign key relationships are also used when defining subclass relationships.

To remove a foreign key relationship, set the 'points to' value to the empty entry at the top of the list.

""") self.wfile.write("""
""") #print all foreign key relationships from this file cnt = 0 colMap = [x for x in foreignKeys.srcMap(table)] for i in range(self.EMPTY_TRIPLES_TO_SHOW): colMap.append([None, None, None]) for col, dt, dc in colMap: self.wfile.write(""" """ % (cnt, columnOptions(cols, col), cnt, tableColumnOptions(tblNames, dt, dc))) cnt += 1 self.wfile.write("""
points to:
""" % (table, configFile)) ################################################################## ## Links ## self.wfile.write("""

Links

Links are a special kind of foreign key relationship that uses a separate linking table to connect two columns. The link's entry and exit columns are transparent; when a row in the source table is dereferenced, the link table's entry column is searched for matches and all rows in the destination table reached through the link's exit column are printed.

""") #find all links where this colum is one of the 4 in a link links = [] for col in cols: for subj in extraRDF.formula.each(pred=SwDB.linkSource, obj=extraRDF.columnURI(table, col)): if subj not in links: links.append(subj) for subj in extraRDF.formula.each(pred=SwDB.linkEntry, obj=extraRDF.columnURI(table, col)): if subj not in links: links.append(subj) for subj in extraRDF.formula.each(pred=SwDB.linkExit, obj=extraRDF.columnURI(table, col)): if subj not in links: links.append(subj) for subj in extraRDF.formula.each(pred=SwDB.linkDestination, obj=extraRDF.columnURI(table, col)): if subj not in links: links.append(subj) cnt = 0 links.append(None) for link in links: #get the link properties if link: src = extraRDF.formula.the(subj=link, pred=SwDB.linkSource) srcTbl, srcCol = extraRDF.columnName(src) dest = extraRDF.formula.the(subj=link, pred=SwDB.linkDestination) destTbl, destCol = extraRDF.columnName(dest) entry = extraRDF.formula.the(subj=link, pred=SwDB.linkEntry) entryTbl, entryCol = extraRDF.columnName(entry) linkExit = extraRDF.formula.the(subj=link, pred=SwDB.linkExit) exitTbl, exitCol = extraRDF.columnName(linkExit) pred = extraRDF.formula.the(subj=link, pred=SwDB.linkPredicate) predString = pred.uriref() else: srcTbl = '' srcCol = '' destTbl = '' destCol = '' entryTbl = '' entryCol = '' exitTbl = '' exitCol = '' predString = '' self.wfile.write(""" """ % (cnt, tableColumnOptions(tblNames, srcTbl, srcCol), cnt, tableColumnOptions(tblNames, entryTbl, entryCol), cnt, tableColumnOptions(tblNames, exitTbl, exitCol), cnt, tableColumnOptions(tblNames, destTbl, destCol), cnt, predString)) cnt += 1 self.wfile.write("""
columns from:
point to:
which links to:
which points to:
with predicate:
""" % (configFile)) ################################################################## ## Column Extra RDF ## self.wfile.write("""

Other RDF

Additional RDF statements can be included in the schema file; these are also emitted when DBView describes the column.

To remove a statement, delete the text in the statement's predicate and object boxes.

""") cnt = 0 colStatements = [] #[col, pred, obj] for col in cols: colURI = extraRDF.columnURI(table, col) for s in extraRDF.formula.statementsMatching(subj=colURI): colStatements.append([col, s.predicate(), s.object()]) for i in range(self.EMPTY_TRIPLES_TO_SHOW): colStatements.append([None, None, None]) for col, pred, obj in colStatements: #print the triples if (not (pred==RDF.type and obj in [SwDB.NoExportColumn, SwDB.Resource]) and not (pred==SwDB.relativeTo)): predVal, objVal, literalChecked = self._editFormStatement(pred, obj) self.wfile.write(""" """ % (cnt, columnOptions(cols, col), col, cnt, predVal, col, cnt, objVal, cnt, literalChecked)) cnt += 1 self.wfile.write("""
literal:
""" % (table, configFile)) ################################################################# ## Footer ## self.wfile.write(""" """) def forbidden(self, message='', thisDocument=''): """Report an HTTP 403 'Forbidden' error. This HTML document indicates that the client is not allowed to access the data. :param message: an optional string to be included in the error message :param thisDocument: the URI for this error document """ s = self.server reldbdoc = refTo(thisDocument, dbsym(s._base)) self.send_response(403, message) self.send_header("Content-type", "text/html; charset=UTF-8") self.end_headers() self.wfile.write(""" DBViewHandler: 403: Forbidden

Forbidden (403)

%s

The client %s does not have permission to access the requested resource.

cf 10.4.4 403 Forbidden from the HTTP specification.

""" % (message, self.client_address[0])) def notFound(self, message='', thisDocument=''): """Report an HTTP 404 'Not Found' error. This HTML document indicates that the requested resource was not found, and has a link to the RDF database schema. :param message: an optional string to be included in the error message :param thisDocument: the URI for this error document """ s = self.server reldbdoc = refTo(thisDocument, dbsym(s._base)) self.send_response(404, message) self.send_header("Content-type", "text/html; charset=UTF-8") self.end_headers() self.wfile.write(""" DBViewHandler: 404: Not Found

Not Found (404)

%s

Have a look at the database schema.

cf 10.4.5 404 Not Found from the HTTP specification.

""" % (message, reldbdoc)) def describeDB(self): """Describe the database in browseable RDF. Sends a 200 response, HTTP headers, and calls `aboutDB()`. """ self.send_response(200, "@@not sure you're winning yet, actually") self.send_header("Content-type", RDF_MediaType) self.end_headers() s = self.server store = myStore.store formula = store.newFormula() #write statements about the database to the store aboutDB(dbdocaddr=s._base, dbName=s._dbName, store=store, formula=formula, eachTable=tableNames(s._db), extraRDF=s._extraRDF, label=True, tables=True) #write a brief description of each table in the database for table in tableNames(db=self.server._db): tableURI = s._extraRDF.tableURI(table) if tableURI in s._extraRDF.formula.each(pred=RDF.type, obj=SwDB.NoExportTable): continue #don't describe this table aboutTable(eachCol=showColumns(s._db, table), dbdocaddr=s._base, table=table, store=store, formula=formula, subclass=s._subclass, extraRDF=s._extraRDF, numRows=None, columns=False, database=False, showSubclass=False, showPrimaryKeys=False, tableData=False) #close the formula and dump its contents formula = formula.close() sink = ToRDF(self.wfile, s._base, flags=s._sinkFlags) sink.bind('rdf', str(RDF)) sink.bind('rdfs', str(RDFS)) sink.bind('db', str(SwDB)) store.dumpBySubject(formula, sink) def describeTable(self, table): """Describes a table in the database in RDF. Sends a 200 response, HTTP headers, and calls `aboutTable()`. """ self.send_response(200, "@@not sure you're winning yet, actually") self.send_header("Content-type", RDF_MediaType) self.end_headers() s = self.server store = myStore.store formula = store.newFormula() #write brief statements about the database to the store aboutDB(dbdocaddr=s._base, store=store, formula=formula, eachTable=[], extraRDF=s._extraRDF, dbName=s._dbName, tables=False) #write statements about the table to the store if s._exportNumRows: numRows = numberOfRows(s._db, table) else: numRows = None cols = [x for x in showColumns(s._db, table)] colNames = [x[0] for x in cols] aboutTable(eachCol=cols, dbdocaddr=s._base, table=table, store=store, formula=formula, subclass=s._subclass, extraRDF=s._extraRDF, numRows=numRows) #print information about each column for column in colNames: aboutColumn(column=column, table=table, dbdocaddr=s._base, store=store, formula=formula, foreignKeys=s._foreignKeys, extraRDF=s._extraRDF, subclass=s._subclass) #finished writing RDF; time to export formula = formula.close() thisDocument = tableschemasym(s._base, table) sink = ToRDF(self.wfile, thisDocument, flags=s._sinkFlags) sink.bind('rdf', str(RDF)) sink.bind('rdfs', str(RDFS)) sink.bind('db', str(SwDB)) store.dumpBySubject(formula, sink) def dumpTable(self, table): """Dumps all the data in a table to the output stream. Mostly just calls `tableDump()`. 
""" self.send_response(200, "@@cross those fingers") self.send_header("Content-type", RDF_MediaType) self.end_headers() s = self.server store = myStore.store formula = store.newFormula() #dump enough information about the table to work back to the dbDocument aboutTable([], dbdocaddr=s._base, table=table, store=store, formula=formula, subclass=s._subclass, extraRDF=s._extraRDF, showSubclass=False, columns=False, showPrimaryKeys=False, database=False) #dump all of the table's data into the store tableDump(db=s._db, dbdocaddr=s._base, table=table, store=store, formula=formula, foreignKeys=s._foreignKeys, subclass=s._subclass, extraRDF=s._extraRDF) #we are done writing to the store formula = formula.close() thisDoc = tabledatasym(s._base, table) sink = ToRDF(self.wfile, thisDoc, flags=s._sinkFlags) sink.bind('rdf', str(RDF)) sink.bind('rdfs', str(RDFS)) sink.bind('db', str(SwDB)) sink.bind(table, colsym(s._base, table, '')) sink.bind('kludge_%s' % table, "%s." % colsym(s._base, table, '')) store.dumpChronological(formula, sink) def editTable(self, table): """Presents an html document which contains a form that can be used to modify the schema of the table, if the table is declared as editable. """ s = self.server extraRDF = s._extraRDF if [RDF.type, SwDB.Editable] in extraRDF.tableSchemaStatements(table): self.editTableForm(table, configFile=s._configfiles[0]) else: self.forbidden('This schema is not editable.') def editDB(self, thisDoc): """Shows an html document used for editing the database schema.""" s = self.server if s.editable: self.editDBForm(configFile=s._configfiles[0], thisDoc=thisDoc) else: self.forbidden("The database schema is not editable.") def dumpRow(self, table, indexString): """Prints the values of all columns for a given row. Mostly just calls `rowDump()`. 
""" #make sure the index string is OK try: parsedIndexString = parseIndexString(indexString) except ValueError, error: #print an error and exit self.notFound(str(error)) return self.send_response(200, "@@cross those fingers") self.send_header("Content-type", RDF_MediaType) self.end_headers() s = self.server store = myStore.store formula = store.newFormula() #dump enough about the table to get back to the dbDocument aboutTable([], dbdocaddr=s._base, table=table, store=store, formula=formula, subclass=s._subclass, extraRDF=s._extraRDF, showSubclass=False, columns=False, showPrimaryKeys=False, database=False) #dump information about this row cols = [x[0] for x in parsedIndexString] vals = [x[1] for x in parsedIndexString] nsTables = rowDump(db=s._db, dbdocaddr=s._base, table=table, store=store, cols=cols, vals=vals, formula=formula, foreignKeys=s._foreignKeys, subclass=s._subclass, backlinks=s._backlinks, forlinks=s._forlinks, extraRDF=s._extraRDF) #dump some brief information about each table that was involved for nsTable in nsTables: aboutTable([], dbdocaddr=s._base, table=nsTable, store=store, formula=formula, subclass=s._subclass, extraRDF=s._extraRDF, showSubclass=False, columns=False, showPrimaryKeys=False, database=False, tableData=False) #dump the store back to the user formula = formula.close() try: thisDoc = itemsym(s._base, table, cols, vals, subclass=s._subclass) except UnicodeEncodeError: logging.warning("Encoding error when generating document for " + "%s in table %s", indexString, table) return thisDoc = thisDoc[:-len("#item")] sink = ToRDF(self.wfile, thisDoc, flags=s._sinkFlags) #common namespaces sink.bind('rdf', str(RDF)) sink.bind('rdfs', str(RDFS)) #bind this table as a namespace sink.bind('kludge_%s' % table, "%s." % (colsym(s._base, table, ''))) sink.bind('%s' % table, "%s" % (colsym(s._base, table, ''))) #bind all tables used for backlinks and subclasses for nsTable in nsTables: sink.bind('%s' % nsTable, '%s' % colsym(s._base, nsTable, '')) sink.bind('kludge_%s' % nsTable, '%s.' % colsym(s._base, nsTable, '')) store.dumpBySubject(formula, sink) def getChangeTime(filename): """Gets the time at which a file was most recently changed. :param filename: the file in question """ return os.stat(filename).st_ctime def rowWhere(parsedIndexString): """Returns a string suitable for giving to asSQL as condextra. :parsedIndexString: a list as generated by parseIndexString() >>> rowWhere([['id', '8'], ['emailId', '7']]) "id='8' AND emailId='7'" """ clauseString = ["%s='%s'" % (x[0], x[1]) for x in parsedIndexString] return " AND ".join(clauseString) def asSQL(fields, tables, keys={}, joins=[], condextra = ''): """Format a query as SQL. Roughly: select fields from tables where keyJoins and condextra :param fields: a list of lists of fieldnames, one list per table :param tables: a list of table names :param keys: a dictionary that maps tables to given primary key fields :param joins: a list of lists... the bottom half of a matrix... keyJoins[i][j] is None or the name of a field in table i to join with the primary key of table j. :param condextra: an SQL expression added to the end of the WHERE clause >>> asSQL([['uri']], ['uris']) 'select uris.uri from uris;' >>> asSQL([['uri']], ['uris'], {'uris': 'id'}) 'select uris.uri from uris;' >>> asSQL([['family', 'email', 'city', 'id'], ... ['plenary', 'meal_choice']], ['users', 'techplenary2002'], ... 
{'users': 'id'}, [[], ['id']], None) 'select users.family,users.email,users.city,users.id,techplenary2002.plenary,techplenary2002.meal_choice from users,techplenary2002 where techplenary2002.id = users.id;' >>> asSQL([['ResourceName', 'ResourceID'], ... ['ActivityName', 'ActivityID'], ... ['Percent_of_person', 'Assignment_ID']], ... ['Resources', 'Activities', 'Assignments'], ... {'Activities': 'ActivityID', ... 'Resources': 'ResourceID', ... 'Assignments': 'Assignment_ID'}, ... [[], ['ResourceID'], ['MyResourceID', 'MyActivityID']], ... None) 'select Resources.ResourceName,Resources.ResourceID,Activities.ActivityName,Activities.ActivityID,Assignments.Percent_of_person,Assignments.Assignment_ID from Resources,Activities,Assignments where Activities.ResourceID = Resources.ResourceID AND Assignments.MyResourceID = Resources.ResourceID AND Assignments.MyActivityID = Activities.ActivityID;' >>> asSQL([['uri'], ['id', 'name'], (), ... ['id', 'family', 'given', 'phone'], ()], ... ['uris', 'groupDetails', 'ids', 'userDetails', 'hierarchy'], ... {'hierarchy': 'id', ... 'ids': 'id', ... 'uris': 'id', ... 'groupDetails': 'id'}, ... [[], ['sub'], [None, 'sponsor'], ... ['emailUrisId', None, 'id'], ... [None, None, 'sub', None]], ... "hierarchy.type='U' and hierarchy.super=30310") "select uris.uri,groupDetails.id,groupDetails.name,userDetails.id,userDetails.family,userDetails.given,userDetails.phone from uris,groupDetails,ids,userDetails,hierarchy where groupDetails.sub = uris.id AND ids.sponsor = groupDetails.id AND userDetails.emailUrisId = uris.id AND userDetails.id = ids.id AND hierarchy.sub = ids.id AND hierarchy.type='U' and hierarchy.super=30310;" """ #check that at least one field was specified if fields == [[]]: raise ValueError('at least one field must be specified in a query.') fldnames = [] for ti in range(len(tables)): for f in fields[ti]: fldnames.append("%s.%s" % (tables[ti], f)) cond = '' for i in range(len(tables)): for j in range(i): if joins[i][j]: jexpr = "%s.%s = %s.%s" % (tables[i], joins[i][j], tables[j], keys[tables[j]]) if cond: cond = cond + ' AND ' cond = cond + jexpr if condextra: if cond: cond = cond + ' AND ' + condextra else: cond = condextra q = 'select %s from %s' % (','.join(fldnames), ','.join(tables)) if cond: q = q + ' where ' + cond return q + ';' def rowSerialize(db, rows, pKeyIndex, columnNames, dbdocaddr, table, foreignKeys, store, formula, subclass, extraRDF, forlinks=False): """Dump the data from some rows into an RDFStore. 
:param rows: an iterator over row data as returned from the python DB API :param pKeyIndex: a list of the indices of primary keys in columnNames :param columnNames: list of column names corresponding to row (noExportColumns already filtered out) :param dbdocaddr: the base URI :param table: a string representing the table :param foreignKeys: an instance of ForeignKeys :param store: the RDFStore to add statements to :param formula: the formula to write statements into :param subclass: an instance of Subclass :param db: a database connection object (needed to generate proper URIs for foreign rows) :param extraRDF: additional RDF to include when serializing a column :return: [status, tables] where status is True if one or more statements was written into formula, and tables is a list of all tables used in writing foreign rows >>> fk = ForeignKeys() >>> fk.add('users', 'people', 'id', 'personId') >>> pk = {'users': Set(['id']), 'people': Set(['personId'])} >>> sc = Subclass(fk, pk) >>> rows = [[19, 'test@example.org', 'test user']] >>> pKeyIndex = [0] >>> columnNames = ['id', 'email', 'name'] >>> base = 'http://example.org/dbview' >>> table = 'users' >>> extraRDF = ExtraRDF('http://example/services/dbview') >>> store=RDFStore() >>> fmla=store.newFormula() >>> rowSerialize(None, rows, pKeyIndex, columnNames, base, table, fk, ... store, fmla, sc, extraRDF) [True, []] >>> for s in fmla.statements: ... '%s %s %s' % (s.subject(), s.predicate(), s.object()) 'item type .table' 'item id 19' 'item email test@example.org' 'item name test user' >>> s.subject().uriref() 'http://example.org/dbview/people/item/personId/19#item' """ def writeColumn(table, column, columnURI, rowURI, value): """A helper method that writes a column to the store.""" #consider the extra rdf to see if we should take special action if extraRDF.formula.contains(subj=extraRDF.columnURI(table, column), pred=RDF.type, obj=SwDB.Resource): #this column is of type Resource (exports resource not string) rel = extraRDF.formula.the(subj=extraRDF.columnURI(table, column), pred=SwDB.relativeTo) if rel: relativeTo = rel.value() valueURI = store.symbol("%s%s" % (relativeTo, value)) formula.add(subj=rowURI, pred=columnURI, obj=valueURI) else: #not relative, assume path is absolute valueURI = store.symbol(value) formula.add(subj=rowURI, pred=columnURI, obj=valueURI) else: formula.add(subj=rowURI, pred=columnURI, obj=store.intern(value)) return ## Serialize Row tableURI = store.symbol(tablesym(dbdocaddr, table)) forTables = [] rowDataFound=False for row in rows: rowDataFound=True keyColNames = [columnNames[i] for i in pKeyIndex] keyColValues = [row[i] for i in pKeyIndex] try: if len(keyColNames) == 0: #create a blank node since we can't make a unique URI rowURI = store.newBlankNode(formula) else: rowURI = store.symbol(itemsym(dbdocaddr, table, keyColNames, keyColValues, subclass)) except UnicodeEncodeError: #use a blank node instead of a URI, since we can't make one rowURI = store.newBlankNode(formula) #make an error message for the user errorString = "" for i in range(len(keyColNames)): colName = keyColNames[i] colVal = keyColValues[i] errorString += "%s=%s " % (colName, colVal) logging.warning("Encoding error when generating a URI for %s" + "in table %s", errorString, table) formula.add(subj=rowURI, pred=RDF.type, obj=tableURI) #iterate over all the elements of row, which contains all the data for i in range(len(row)): colName = columnNames[i] colValue = row[i] if colValue == None or colValue=='': #don't export nulls/empty continue colValue = 
str(colValue).decode('iso8859-1') columnURI = store.symbol(colsym(dbdocaddr, table, colName)) #when foreign keys are concerned, we need to generate a URI for #each row of each table that the column points to. This is not #trivial, because we need to generate the *correct* URI for the #foreign row (the foreign key might not point to the primary #key of the other database, or the other database might have #a multi-column primary key)--see getForeignRowIdentifiers() #there is no foreign key (or database connection) if (db==None) or (not foreignKeys.mapsSomewhere(table, colName)): writeColumn(table, colName, columnURI, rowURI, colValue) continue #for every column that table.colName points to for dt, dc in foreignKeys.mapToAll(table, colName): #get information about the URI for foreign rows in dt dtPKeyIndex, dtColNames = tableColumnNames(db, dt, []) dtPColumnNames = [dtColNames[x] for x in dtPKeyIndex] #if table is a subclass of dt and if dc is a pkey, don't write #prevents the id of the row from being printed n times, where #n is the depth of the superclass trace trace = [x for x in subclass.superclassTrace(table, [colName])] if [dt, [dc]] in trace: continue #don't do anything try: foreignRowIds = getForeignRowIdentifiers(db=db, table=dt, cols=[dc], vals=[colValue], pColumnNames=dtPColumnNames) except ValueError: #foreign table has no primary keys logging.debug("could not generate URIs for %s", dt) writeColumn(table, column, columnURI, rowURI, colValue) continue foreignRowIdCols = foreignRowIds[0] foreignRowIdVals = foreignRowIds[1] for foreignRowIdVal in foreignRowIdVals: #vals of pKeys in dt try: foreignRowId = itemsym(dbdocaddr, dt, foreignRowIdCols, foreignRowIdVal, subclass) foreignRowURI = store.symbol(foreignRowId) formula.add(subj=rowURI, pred=columnURI, obj=foreignRowURI) if forlinks: #rows of dt identified by foreignRowIdCols=friVals forPKeyIndex, forCols = tableColumnNames(db, dt, extraRDF.noExportColumns(dt)) forQuery = asSQL([forCols], [dt], {}, {}, rowWhere(zip(foreignRowIdCols, foreignRowIdVal))) forRows = eachRow(db, forQuery) rowWritten, tbls = rowSerialize(db=db, rows=forRows, pKeyIndex=forPKeyIndex, columnNames=forCols, dbdocaddr=dbdocaddr, table=dt, foreignKeys=foreignKeys, store=store, formula=formula, subclass=subclass, extraRDF=extraRDF) if rowWritten: forTables.append(dt) for t in tbls: if t not in forTables: forTables.append(t) except UnicodeEncodeError: logging.warning("Encoding error for %s.%s=%s", dt, keyColNames[0], str(keyColValues[0])) continue return [rowDataFound, forTables] def getForeignRowIdentifiers(db, table, cols, vals, pColumnNames): """Queries the database using the information about the values of columns for a specific table, and returns a list of primary key column names and a list of the values of these columns for each row that matches the query. Note that this method gives values only for the current table; subclass relationships are not taken into account. 
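A rough sketch of the query built when the lookup columns are not
themselves the primary key (hypothetical table 'people' with primary
key 'personId', looked up by 'email')::

    query = asSQL([['personId']], ['people'], {}, {},
                  rowWhere([['email', 'a@example.org']]))
    # -> "select people.personId from people where email='a@example.org';"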
:param table: the table whose rows are wanted :param cols: the columns used to locate rows in table :param pColumnNames: an ordered list of primary key column names; ordered in the same way that tableColumnNames does :return: [[colName], [[colVal], [colVal]]] """ if len(pColumnNames) == 0: raise ValueError('cannot get foreign rows with no primary keys') #if the columns are all primary keys, we don't need to query; #there can only be one result and we already have its URI if Set(pColumnNames) == Set(cols): return pColumnNames, [vals] #if we're looking for something that isn't a primary key, we start having #to do lots of queries; one for each ROW in the result. #perhaps, in this case, a special page could be built instead? #@@open issue query = asSQL([pColumnNames], [table], {}, {}, rowWhere(zip(cols, vals))) logging.debug("foreign rows query: %s", query) #execute the query rows = eachRow(db, query) retVals = [] #generate the results for row in rows: retVals.append(row) return pColumnNames, retVals def tableExists(db, table): """Returns True if the table exists.""" c = db.cursor() query = "DESCRIBE %s" % table try: c.execute(query) return True except MySQLdb.ProgrammingError, e: if e.args[0] == 1146: #Table 'table' doesn't exist return False else: #some other, unknown exception happened raise e def tableNames(db): """Return an interator over the names of the tables in the database. :param db: a database connection object """ c = db.cursor() c.execute("SHOW TABLES") # mysql-specific? # http://www.mysql.com/documentation/mysql/bychapter/manual_MySQL_Database_Administration.html#SHOW res = [] while 1: row = c.fetchone() if not row: break yield row[0] def tableColumnNames(db, table, noExportColumns = []): """Gets the columns in a table and returns a 2-tuple of results. The first element of the list is a tuple of integer indices of the primary keys in the list of names, or [] if there is no primary key defined for the table. The second element is an list of column names. If the primary keys for t1 are 'id' and 'oid' are are indices 0 and 4, tableColumnNames(db, t1) -> ([0, 4], ['id', 'oid']) :param db: the MySQLdb connection object (@@does it have to be MySQLdb?) :param table: a string representing the table :param noExportColumns: a list of strings representing columns that should not be exported """ keys, cols = [], [] col = 0 for colName, ty, nullable, isKey, dunno, dunno in showColumns(db, table): if colName not in noExportColumns: cols.append(colName) if isKey == "PRI": keys.append(col) col += 1 return keys, cols def eachRow(db, query): """Executes a query and returns an iterator over the rows returned. :param db: a database connection object :param query: an SQL query :return: an iterator over the results of executing query on the database """ cursor = db.cursor() cursor.execute(query) while 1: row = cursor.fetchone() if not row: break yield row def tableDump(db, dbdocaddr, table, store, formula, foreignKeys, subclass, extraRDF): """Dumps the data in a table into an RDF sink. 
:param db: a database connection object :param dbdocaddr: the base for all URIs that dbview creates :param table: the string name of the table :param store: an instance of RDFStore :param formula: the formula where new statements are written :param foreignKeys: an instance of ForeignKeys :param subclass: an instance of Subclass :param extraRDF: an instance of ExtraRDF """ nex = extraRDF.noExportColumns(table) pKeyIndex, columnNames = tableColumnNames(db, table, nex) query = asSQL([columnNames], [table]) logging.debug("tableDump query: %s", query) rowSerialize(rows=eachRow(db, query), pKeyIndex=pKeyIndex, table=table, columnNames=columnNames, dbdocaddr=dbdocaddr, store=store, foreignKeys=foreignKeys, formula=formula, subclass=subclass, db=db, extraRDF=extraRDF) def _includeSubclasses(db, dbdocaddr, table, cols, vals, store, formula, foreignKeys, subclass, extraRDF): """When dereferencing a URI for a table that has subclasses, this method extracts data about all of the subclasses and includes it. :param db: a database connection object :param dbdocaddr: the base for all URIs that dbview creates :param table: the string name of the table whose subclasses are included :param cols: a list of the primary keys for table :param vals: a list of the values of the columns in cols :param store: an instance of RDFStore, where statements are written to :param formula: the formula statements are written into :param foreignKeys: an instance of ForeignKeys :param subclass: an instance of Subclass :param extraRDF: an instance of ExtraRDF :return: a list of all the tables whose data was included """ #go through all the tables that point here to get their data otherTables = [] for st in subclass.getChildrenOf(table): try: stPKeyIndex, stColNames = tableColumnNames(db, st, extraRDF.noExportColumns(st)) except ProgrammingError: logging.debug("could not find subclass table %s", st) else: #get the names of the primary keys of st stPKeyNames = showPrimKeys(db, st) query = asSQL([stColNames], [st], {}, {}, rowWhere(zip(stPKeyNames, vals))) logging.debug("subclass query: %s", query) ret, ignore = rowSerialize(rows = eachRow(db, query), pKeyIndex=stPKeyIndex, columnNames=stColNames, dbdocaddr=dbdocaddr, table=st, foreignKeys=foreignKeys, store=store, formula=formula, subclass=subclass, db=db, extraRDF=extraRDF) #if something was actually written about this table, note it if ret: otherTables.append(st) #recurse to get all the subclasses of st res = _includeSubclasses(db=db, dbdocaddr=dbdocaddr, table=st, foreignKeys=foreignKeys, cols=stColNames, vals=vals, store=store, formula=formula, subclass=subclass, extraRDF=extraRDF) for x in res: otherTables.append(x) return otherTables def rowDump(db, dbdocaddr, table, cols, vals, store, formula, foreignKeys, subclass, extraRDF, backlinks=True, forlinks=False, includeSubclasses=True): """Get the data in a row and dump as RDF/XML. We assume there is just one row that matches the given indexString. Each column in the indexString must be a defined primary key for the table in question, however. 
:param db: the MySQLdb object for the database connection :param dbdocaddr: the base URI :param table: a string representing the table :param cols: a list of columns that identify the row :param vals: a list of vals that are associated with cols :param store: an RDFStore :param formula: the formula to write statements into :param foreignKeys: an instance of ForeignKeys :param subclass: information about which tables are subclasses of others :param extraRDF: an instance of ExtraRDF :param backlinks: True if other rows that point to this row are shown :param forlinks: True if rows that are pointed to by this row are shown :param includeSubclasses: True if data from subclasses is shown """ nex = extraRDF.noExportColumns(table) pKeyIndex, columnNames = tableColumnNames(db, table, nex) query = asSQL([columnNames], [table], {}, {}, rowWhere(zip(cols, vals))) logging.debug("rowDump query: %s", query) #write the data for this table written, tbls = rowSerialize(rows=eachRow(db, query), pKeyIndex=pKeyIndex, table=table, columnNames=columnNames, dbdocaddr=dbdocaddr, store=store, foreignKeys=foreignKeys, formula=formula, subclass=subclass, db=db, extraRDF=extraRDF, forlinks=forlinks) #when dereferencing a URI for a table that has subclasses, we need to #include all of the information found in those tables as well if includeSubclasses: otherTables = _includeSubclasses(db=db, dbdocaddr=dbdocaddr, table=table, foreignKeys=foreignKeys, cols=cols, vals=vals, store=store, formula=formula, subclass=subclass, extraRDF=extraRDF) for t in otherTables: if t not in tbls: tbls.append(t) ## Backlinks # When dumping a row in t1 with primary key c1 and value of that # v, then you dump all rows in t2 where c2 is v and there is a # mapping t2.c2 -> t1.c1 # # Backlinks also dump any information held in subclass tables of tables # that point into the table. if backlinks: if len(cols) > 1: logging.debug("backlinks on multi-column keys not implemented") else: logging.debug("looking for links into %s", table) backTables = _findBacklinks(db=db, table=table, indexCol=cols[0], indexVal=vals[0], extraRDF=extraRDF, foreignKeys=foreignKeys, store=store, formula=formula, dbdocaddr=dbdocaddr, subclass=subclass, forlinks=forlinks) for backTable in backTables: tbls.append(backTable) return tbls #tables need namespaces def _dumpTerm(term, extraRDF, formula, seen=[]): """Traverses the RDF provided in extraRDF, following bnodes and writing information about them into the formula. """ def addStatement(statement): """A helper method which adds a statement to formula.""" formula.add(subj=statement.subject(), pred=statement.predicate(), \ obj=statement.object()) #if we are dealing with a bnode, then dump all we know about it, as well #as any bnodes in statements involving it if issubclass(term.__class__, formula.newBlankNode().__class__) and \ term not in seen: seen.append(term) for s in extraRDF.formula.statementsMatching(subj=term): addStatement(s) _dumpTerm(s.object(), extraRDF, formula, seen) for s in extraRDF.formula.statementsMatching(obj=term): addStatement(s) _dumpTerm(s.subject(), extraRDF, formula, seen) def _prepareLink(db, dbdocaddr, entryTbl, entryTblCol, exitTbl, exitTblCol, linkTbl, linkEntry, linkExit, linkEntryVal, store, formula, extraRDF, subclass, foreignKeys, shouldDumpMatches=False): """If a link exists between st and dt, then this code will return a list of the row URIs for the link as well as the tables used in it. 
The link travels from st.sc -> dt.dc via a link between linkTbl.linkEntry and linkTbl.linkExit, where linkTbl.linkEntry -> st.sc and linkTbl.linkExit -> dt.dc. However, the first constraint is not checked by _prepareLink, so switching st, dt, linkEntry, and linkExit will get links that travel the other direction. The 'linking rows' in linkTbl are first found by creating a query like 'SELECT linkExit FROM linkTbl WHERE linkEntry=indexVal;' Once this query is executed, we have one row for each match in the linking table, where a match has foreignColumn=indexVal. From these rows, the desired values of dc for each match are retrieved (the value of the linkExit column in linkTbl). These values are then used to query dt with something like 'SELECT * FROM dt WHERE dc=linkExitValue;' And now we have the values of all the columns in dt, which was linked to by the link between linkTbl.linkEntry and linkTbl.linkExit. :param db: database connection object :param dbdocaddr: the address to the DatabaseDocument :param entryTbl: the table which is pointed to by linkEntry :param entryTblCol: the column in entryTbl pointed to by linkEntry :param exitTbl: the table which is pointed to by linkExit :param exitTblCol: the column in exitTbl pointed to by linkExit :param linkEntryVal: the value of linkEntry that identifies the linking row :param linkTbl: the table which contains the linkSubject and linkObject :param linkEntry: the linkSubject column name :param linkExit: the linkExit column name :param shouldDumpMatches: True if the object row of links should be printed :return: [symbols, tables] where symbols is a list of symbols for rows involved in the link, and tables is a list of tables of the symbols """ #returns a list of URIs which are connected to the source table dstURIs = [] linkTables = [] #get the row in the linking table linkTblIndex, linkTblNames = tableColumnNames(db, linkTbl, []) backlinkQuery = asSQL([[linkExit]], [linkTbl], {}, {}, rowWhere([[linkEntry, linkEntryVal]])) logging.debug("serializeLink query: %s", backlinkQuery) linkingRows = eachRow(db, backlinkQuery) for linkingRow in linkingRows: #this row points to originalTable.originalColumn #and the dt.dc we want linkExitVal = linkingRow[0] if shouldDumpMatches: #@@should be on if forward links are enabled exitTblIndex, exitTblNames = tableColumnNames(db, exitTbl, extraRDF.noExportColumns(exitTbl)) whereQ = rowWhere([[exitTblCol, linkExitVal]]) query = asSQL([exitTblNames], [exitTbl], {}, {}, whereQ) logging.debug("serializeLink linking query: %s", query) matchRows = eachRow(db, query) rowSerialize(db, rows=matchRows, pKeyIndex=exitTblIndex, dbdocaddr=dbdocaddr, columnNames=exitTblNames, table=exitTbl, foreignKeys=foreignKeys, store=store, formula=formula, subclass=subclass, extraRDF=extraRDF) #include any subclass stuff logging.debug('checking for subclasses of %s', exitTbl) subTables = _includeSubclasses(db=db, dbdocaddr=dbdocaddr, table=exitTbl, subclass=subclass, foreignKeys=foreignKeys, store=store, formula=formula, cols=[exitTblCol], vals=[linkExitVal], extraRDF=extraRDF) for subTable in subTables: linkTables.append(subTable) dstURI = itemsym(dbdocaddr, exitTbl, [exitTblCol], [linkExitVal], subclass) dstURI = store.symbol(dstURI) dstURIs.append(dstURI) return dstURIs, linkTables def _findLinks(db, dbdocaddr, originalTable, originalColumn, originalColVal, linkTable, linkColumn, extraRDF, foreignKeys, subclass, store, formula, shouldDumpMatches=False): """When doing backlinks, if an explicit Link exists, then the link is dumped to the 
store in preference to the usual backlink. This essentially hides a linking table and directly dumps the rows it links to. :param originalTable: the table whose backlinks are being generated :param originalColumn: the name of the index column in originalTable :param originalColVal: the value of originalColumn :param linkTable: the table which contains the link entry and exit :param linkColumn: the column which is the link entry or exit :param shouldDumpMatches: True if matching rows should be dumped (forlinks) """ linkTables = [] #stores the names of all tables printed due to links linkURI = extraRDF.columnURI(linkTable, linkColumn) #get all statements where linkTable.linkColumn is a linkSubject or linkObj subjLinks = extraRDF.formula.each(pred=SwDB.linkSubject, obj=linkURI) objLinks = extraRDF.formula.each(pred=SwDB.linkObject, obj=linkURI) links = subjLinks + objLinks if len(links) == 0: return False, linkTables else: for link in links: #linkEntry is the column which is a linkSubject #linkExit is the column which is a linkObject #linkEntryTbl and linkExitTbl should be the same linkPred = extraRDF.formula.the(subj=link, pred=SwDB.linkPredicate) linkEntry = extraRDF.formula.the(subj=link, pred=SwDB.linkSubject) linkExit = extraRDF.formula.the(subj=link, pred=SwDB.linkObject) linkEntryTbl, linkEntryCol = extraRDF.columnName(linkEntry) linkExitTbl, linkExitCol = extraRDF.columnName(linkExit) #some sanity checks @@check for this earlier! if linkEntryTbl != linkExitTbl: raise RuntimeError("oops1") if linkEntryCol == linkExitCol: raise RuntimeError("oops2") if linkExitTbl != linkTable: raise RuntimeError("oops3") #dump all we know about the predicate _dumpTerm(term=linkPred, extraRDF=extraRDF, formula=formula, seen=[]) for st, sc in foreignKeys.mapToAll(linkEntryTbl, linkEntryCol): #linkEntryTbl.linkEntryCol -> st.sc for dt, dc in foreignKeys.mapToAll(linkExitTbl, linkExitCol): #linkExitTbl.linkExitCol -> dt.dc; link between sc and dt #ask db what the URI for the unknown table is #(originalTable is either st or dt; we don't know which if st == originalTable: #unknown row is the destination linkTables.append(dt) srcURI = store.symbol(itemsym(dbdocaddr, st, [sc], [originalColVal], subclass)) #write information about the link and destURIs, tbls = _prepareLink(dbdocaddr=dbdocaddr, entryTbl=st, entryTblCol=sc, exitTbl=dt, exitTblCol=dc, db=db, linkEntryVal=originalColVal, linkTbl=linkTable, linkEntry=linkEntryCol, linkExit=linkExitCol, extraRDF=extraRDF, subclass=subclass, foreignKeys=foreignKeys, store=store, formula=formula, shouldDumpMatches=shouldDumpMatches) #make a note of all the tables that were used for tbl in tbls: linkTables.append(tbl) #add a statement connecting source to destionation for destURI in destURIs: formula.add(subj=srcURI, pred=linkPred, obj=destURI) elif dt == originalTable: #unknown row is the source linkTables.append(st) destURI = store.symbol(itemsym(dbdocaddr, dt, [dc], [originalColVal], subclass)) srcURIs, tbls = _prepareLink(dbdocaddr=dbdocaddr, entryTbl=dt, entryTblCol=dc, exitTbl=st, exitTblCol=sc, db=db, linkEntryVal=originalColVal, linkTbl=linkTable, linkEntry=linkExitCol, linkExit=linkEntryCol, extraRDF=extraRDF, subclass=subclass, foreignKeys=foreignKeys, store=store, formula=formula, shouldDumpMatches=shouldDumpMatches) for tbl in tbls: linkTables.append(tbl) for srcURI in srcURIs: formula.add(subj=srcURI, pred=linkPred, obj=destURI) else: raise RuntimeError("the link is malformed") return True, linkTables def _findBacklinks(db, table, indexCol, indexVal, 
foreignKeys, store, formula, dbdocaddr, subclass, extraRDF, forlinks, completeBacklinks=False): """Prints all backlinking rows and returns a list of tables that were included in that set. :param table: the table whose backlinks are wanted :param indexCol: the column in table which is the primary key :param completeBacklinks: True if links into any column of table, not just the primary key columns, should be retrieved :return: a list of tables that point into table """ backTables = [] for dt, dc in foreignKeys.mapFromAll(table, indexCol): dtPKeyIndex, dtColNames = tableColumnNames(db, dt, extraRDF.noExportColumns(dt)) query = asSQL([dtColNames], [dt], {}, {}, rowWhere([[dc, indexVal]])) logging.debug("backlink query: %s", query) #see if there are any links associated with the foreign table linkExists, linkTables = _findLinks(originalTable=table, originalColumn=indexCol, linkTable=dt, linkColumn=dc, extraRDF=extraRDF, foreignKeys=foreignKeys, dbdocaddr=dbdocaddr, subclass=subclass, db=db, originalColVal=indexVal, store=store, formula=formula, shouldDumpMatches=forlinks) if linkExists: for linkTable in linkTables: backTables.append(linkTable) else: #serialize the information in the foreign table dtInfo, ignore = rowSerialize(rows=eachRow(db, query), pKeyIndex=dtPKeyIndex, dbdocaddr=dbdocaddr, columnNames=dtColNames, table=dt, foreignKeys=foreignKeys, store=store, formula=formula, subclass=subclass, db=db, extraRDF=extraRDF) if dtInfo: #a row was found in the database backTables.append(dt) #go get information from subclasses of foreign table subTables = _includeSubclasses(db=db, dbdocaddr=dbdocaddr, table=dt, subclass=subclass, foreignKeys=foreignKeys, store=store, formula=formula, cols=dtColNames, vals=[indexVal], extraRDF=extraRDF) for subTable in subTables: backTables.append(subTable) return backTables def numberOfRows(db, table): """Gets the number of rows currently in a table and returns a long.""" c = db.cursor() query = "SELECT COUNT(*) FROM %s" % table c.execute(query) row = c.fetchone() return row[0] def showPrimKeys(db, table): """Gets a list of names of the primary keys of a table.""" ret = [] for colName, ty, nullable, isKey, dunno, dunno in showColumns(db, table): if isKey == "PRI": ret.append(colName) return ret def showColumns(db, table): """Query for column info; return an interator of rows: colName, ty, nullable, isKey, dunno, dunno = row """ c = db.cursor() try: c.execute("show columns from %s" % table) except ProgrammingError, e: if e.args[0] == 1146: logging.info("table '%s' does not exist." % (table)) else: while 1: row = c.fetchone() if not row: break yield row def showForeignKeys(db, table): """Extracts the foreign keys from a table and returns a dictionary. 
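A rough sketch of the kind of constraint line the regular expressions
below pick apart (hypothetical table and column names; the exact layout
of SHOW CREATE TABLE output varies by MySQL version)::

    import re

    # hypothetical constraint line from SHOW CREATE TABLE output
    line = "CONSTRAINT `fk1` FOREIGN KEY (`personId`) REFERENCES `people` (`id`)"
    m = re.search(r'\(`(.*)`\) REFERENCES `(.*)` \(`(.*)`\)', line)
    # m.group(1) -> 'personId', m.group(2) -> 'people', m.group(3) -> 'id'
    # so the returned dictionary would be {'people': [['personId', 'id']]}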
:return: {thatTable: [[thisA, thatA], [thisB, thatB]]}, where:: table.thisA -> thatTable.thatA table.thisB -> thatTable.thatB """ ret = {} c = db.cursor() query = "SHOW CREATE TABLE %s" % table c.execute(query) row = c.fetchone() createTableSQL = row[1] #the SQL used to recreate the table #get all of the lines of SQL that indicate constraints fkFinder = re.compile('^ CONSTRAINT .* FOREIGN KEY .*$', re.MULTILINE) fks = fkFinder.findall(createTableSQL) #go through each line and pull out thatTable, and the column names fieldExtractor = re.compile('\(`(.*)`\) REFERENCES `(.*)` \(`(.*)`\)') for fk in fks: result = fieldExtractor.search(fk) thatTable = result.group(2) thisColumn = result.group(1) thatColumn = result.group(3) if not ret.has_key(thatTable): ret[thatTable] = [] ret[thatTable].append([thisColumn, thatColumn]) return ret def aboutDB(dbdocaddr, store, formula, eachTable, extraRDF, dbName="database", label=True, tables=True, showExtraRDF=True, dbSchema=True): """Describe a database into an RDFStore. :param dbdocaddr: the base for all URIs generated by dbview :param store: the RDFStore statements should be written into :param formula: the formula new statements are written into :param eachTable: an iterator over the tables in the database :param extraRDF: additional RDF statements :param dbName: the string name of the database, used for rdfs:label :param label: True if dbview should generate a default rdfs:label :param tables: True if the database should list all of its tables :param showExtraRDF: True if the user's additional RDF is displayed :param dbSchema: True if db:databaseSchema should be shown >>> dbdocaddr = 'http://example/dbview/w3c' >>> eachTable = ['table1', 'table2'] >>> dbName = 'database' >>> store = RDFStore() >>> formula = store.newFormula() >>> extraRDF = ExtraRDF('http://example/services/dbview') >>> pred = RDFS.label >>> obj = store.intern('a useful database') >>> extraRDF.addDatabaseStatement(pred=pred, obj=obj) >>> aboutDB(dbdocaddr, store, formula, eachTable, extraRDF, dbName) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, type, 'http://www.w3.org/2000/10/swap/db#DatabaseDocument') (.database, databaseSchema, 'http://example/dbview/w3c') (.database, table, 'http://example/dbview/w3c/table1#.table') (.database, table, 'http://example/dbview/w3c/table2#.table') (.database, label, 'md5:ae61d1cccefe836b2306e5bfd3a7d32b') >>> formula = store.newFormula() >>> aboutDB(dbdocaddr, store, formula, eachTable, extraRDF, dbName, ... tables=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, type, 'http://www.w3.org/2000/10/swap/db#DatabaseDocument') (.database, databaseSchema, 'http://example/dbview/w3c') (.database, label, 'md5:ae61d1cccefe836b2306e5bfd3a7d32b') >>> formula = store.newFormula() >>> aboutDB(dbdocaddr, store, formula, eachTable, extraRDF, dbName, ... showExtraRDF=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, type, 'http://www.w3.org/2000/10/swap/db#DatabaseDocument') (.database, databaseSchema, 'http://example/dbview/w3c') (.database, table, 'http://example/dbview/w3c/table1#.table') (.database, table, 'http://example/dbview/w3c/table2#.table') (.database, label, 'md5:11e0eed8d3696c0a632f822df385ab3c') >>> formula = store.newFormula() >>> aboutDB(dbdocaddr, store, formula, eachTable, extraRDF, dbName, ... showExtraRDF=False, label=False) >>> for s in formula.statements: ... 
s.subject(), s.predicate(), s.object().uriref() (.database, type, 'http://www.w3.org/2000/10/swap/db#DatabaseDocument') (.database, databaseSchema, 'http://example/dbview/w3c') (.database, table, 'http://example/dbview/w3c/table1#.table') (.database, table, 'http://example/dbview/w3c/table2#.table') """ dbaddr = dbsym(dbdocaddr) dbURI = store.symbol(dbaddr) # personell rdfs:type db:databaseDocument formula.add(subj=store.symbol(dbaddr), pred=RDF.type, obj=SwDB.DatabaseDocument) # personell#.database db:databaseSchema personell if dbSchema: formula.add(subj=dbURI, pred=SwDB.databaseSchema, obj=store.symbol(dbdocaddr)) if tables: for table in eachTable: tableURI = store.symbol(tablesym(dbdocaddr, table)) if tableURI in extraRDF.formula.each(pred=RDF.type, obj=SwDB.NoExportTable): continue #this is a NoExportTable # personnel#.database db:table personnel/employees#_table formula.add(subj=dbURI, pred=SwDB.table, obj=tableURI) ## RDFS label # If a label is present in the extra rdf, then we print it and we are done. # Otherwise, if label=True, we generate a default label. labelPresent = False if showExtraRDF: for statement in extraRDF.databaseStatements(): if statement[0] == RDFS.label: labelPresent = True formula.add(subj=dbURI, pred=statement[0], obj=statement[1]) _dumpTerm(term=statement[1], extraRDF=extraRDF, formula=formula, seen=[]) # personell#.database rdfs:label "database" if not labelPresent and label: formula.add(subj=dbURI, pred=RDFS.label, obj=store.intern(dbName)) def aboutTable(eachCol, dbdocaddr, table, store, formula, extraRDF, subclass=Subclass(), numRows=None, label=True, showSubclass=True, columns=True, showPrimaryKeys=True, tableData=True, database=True, showExtraRDF=True, tableSchema=True): """Describe a table into an RDFStore. :param eachCol: an interator over column info as given by showColumns() :param dbdocaddr: the base for all URIs created by dbview :param table: the string table name :param store: an instance of RDFStore :param formula: the formula where new statements are written :param subclass: an instance of Subclass :param extraRDF: additional RDF statements to be added, if relevant :param numRows: the number of rows in the table, or None to not export :param label: True if dbview should create rdfs:label for tables :param showSubclass: True if rdfs:subClassOf should be shown :param columns: True if the columns of a table should be shown :param showPrimaryKeys: True if db:primaryKey is shown :param tableData: True if db:tableData is shown :param database: True if database db:table table is shown :param showExtraRDF: True if user's additional RDF about tables is shown :param tableSchema: True if db:tableSchema is shown >>> fk = ForeignKeys() >>> fk.add('group', 'people', 'leader', 'id') >>> fk.add('group', 'entities', 'id', 'id') >>> pk = {'group':Set(['id']), 'people':Set(['id']), ... 'entities':Set(['id'])} >>> sc = Subclass(fk, pk) >>> dbdocaddr = 'http://example/dbview/w3c' >>> table = 'table1' >>> store = RDFStore() >>> extraRDF = ExtraRDF('http://example/services/dbview') >>> subj = store.symbol(tablesym(dbdocaddr, 'group')) >>> pred = RDFS.label >>> obj = store.intern('a whimsical table') >>> extraRDF.addTableStatement(table=table, pred=pred, obj=obj) >>> extraRDF.addColumnStatement(table='group', column='leader', ... pred=RDF.type, obj=SwDB.NoExportColumn) >>> eachCol = [('id', 'int', None, 'PRI', None, 'auto_increment'), ... ('name', 'varchar', True, None, None, None), ... 
('leader', 'int', None, None, None, None)] >>> formula = store.newFormula() >>> numRows = None >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows, label=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, table, 'http://example/dbview/w3c/table1#.table') (.table, type, 'http://www.w3.org/2000/10/swap/db#Table') (.table, tableSchema, 'http://example/dbview/w3c/table1') (.table, tableData, 'http://example/dbview/w3c/table1/data') (.table, label, 'md5:0a6d893ab74fb5e9fa40a0b4090e8828') (.table, column, 'http://example/dbview/w3c/table1#id') (.table, primaryKeyColumn, 'http://example/dbview/w3c/table1#id') (.table, column, 'http://example/dbview/w3c/table1#name') (.table, column, 'http://example/dbview/w3c/table1#leader') >>> formula = store.newFormula() >>> numRows = 77 >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, table, 'http://example/dbview/w3c/table1#.table') (.table, type, 'http://www.w3.org/2000/10/swap/db#Table') (.table, tableSchema, 'http://example/dbview/w3c/table1') (.table, tableData, 'http://example/dbview/w3c/table1/data') (.table, numberOfRows, 'md5:28dd2c7955ce926456240b2ff0100bde') (.table, label, 'md5:0a6d893ab74fb5e9fa40a0b4090e8828') (.table, column, 'http://example/dbview/w3c/table1#id') (.table, primaryKeyColumn, 'http://example/dbview/w3c/table1#id') (.table, column, 'http://example/dbview/w3c/table1#name') (.table, column, 'http://example/dbview/w3c/table1#leader') >>> formula = store.newFormula() >>> numRows = 77 >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows, showExtraRDF=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, table, 'http://example/dbview/w3c/table1#.table') (.table, type, 'http://www.w3.org/2000/10/swap/db#Table') (.table, tableSchema, 'http://example/dbview/w3c/table1') (.table, tableData, 'http://example/dbview/w3c/table1/data') (.table, numberOfRows, 'md5:28dd2c7955ce926456240b2ff0100bde') (.table, label, 'md5:270e33da79c5156c1ba3b42cbc190c6c') (.table, column, 'http://example/dbview/w3c/table1#id') (.table, primaryKeyColumn, 'http://example/dbview/w3c/table1#id') (.table, column, 'http://example/dbview/w3c/table1#name') (.table, column, 'http://example/dbview/w3c/table1#leader') >>> formula = store.newFormula() >>> numRows = None >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows, showExtraRDF=False, ... label=False) >>> for s in formula.statements: ... 
s.subject(), s.predicate(), s.object().uriref() (.database, table, 'http://example/dbview/w3c/table1#.table') (.table, type, 'http://www.w3.org/2000/10/swap/db#Table') (.table, tableSchema, 'http://example/dbview/w3c/table1') (.table, tableData, 'http://example/dbview/w3c/table1/data') (.table, column, 'http://example/dbview/w3c/table1#id') (.table, primaryKeyColumn, 'http://example/dbview/w3c/table1#id') (.table, column, 'http://example/dbview/w3c/table1#name') (.table, column, 'http://example/dbview/w3c/table1#leader') >>> formula = store.newFormula() >>> numRows = None >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows, tableData=False, tableSchema=False, ... database=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.table, type, 'http://www.w3.org/2000/10/swap/db#Table') (.table, label, 'md5:0a6d893ab74fb5e9fa40a0b4090e8828') (.table, column, 'http://example/dbview/w3c/table1#id') (.table, primaryKeyColumn, 'http://example/dbview/w3c/table1#id') (.table, column, 'http://example/dbview/w3c/table1#name') (.table, column, 'http://example/dbview/w3c/table1#leader') >>> formula = store.newFormula() >>> numRows = None >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows, showPrimaryKeys=False, columns=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (.database, table, 'http://example/dbview/w3c/table1#.table') (.table, type, 'http://www.w3.org/2000/10/swap/db#Table') (.table, tableSchema, 'http://example/dbview/w3c/table1') (.table, tableData, 'http://example/dbview/w3c/table1/data') (.table, label, 'md5:0a6d893ab74fb5e9fa40a0b4090e8828') >>> table = 'group' >>> formula = store.newFormula() >>> numRows = 77 >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows) >>> for s in formula.statements: ... s.subject().uriref()[-14:], s.predicate(), s.object().uriref() ('/w3c#.database', table, 'http://example/dbview/w3c/group#.table') ('c/group#.table', type, 'http://www.w3.org/2000/10/swap/db#Table') ('c/group#.table', tableSchema, 'http://example/dbview/w3c/group') ('c/group#.table', tableData, 'http://example/dbview/w3c/group/data') ('c/group#.table', numberOfRows, 'md5:28dd2c7955ce926456240b2ff0100bde') ('c/group#.table', subClassOf, 'http://example/dbview/w3c/entities#.table') ('/w3c#.database', table, 'http://example/dbview/w3c/entities#.table') ('ntities#.table', type, 'http://www.w3.org/2000/10/swap/db#Table') ('ntities#.table', tableSchema, 'http://example/dbview/w3c/entities') ('ntities#.table', label, 'md5:07214c6750d983a32e0a33da225c4efd') ('c/group#.table', label, 'md5:db0f6f37ebeb6ea09489124345af2a45') ('c/group#.table', column, 'http://example/dbview/w3c/group#id') ('c/group#.table', primaryKeyColumn, 'http://example/dbview/w3c/group#id') ('c/group#.table', column, 'http://example/dbview/w3c/group#name') >>> fk.add('entities', 'things', 'id', 'id') >>> pk['things'] = Set(['id']) >>> sc = Subclass(fk, pk) >>> table = 'group' >>> formula = store.newFormula() >>> aboutTable(eachCol=eachCol, dbdocaddr=dbdocaddr, table=table, ... store=store, formula=formula, extraRDF=extraRDF, subclass=sc, ... numRows=numRows) >>> for s in formula.statements: ... 
s.subject().uriref()[-14:], s.predicate(), s.object().uriref() ('/w3c#.database', table, 'http://example/dbview/w3c/group#.table') ('c/group#.table', type, 'http://www.w3.org/2000/10/swap/db#Table') ('c/group#.table', tableSchema, 'http://example/dbview/w3c/group') ('c/group#.table', tableData, 'http://example/dbview/w3c/group/data') ('c/group#.table', numberOfRows, 'md5:28dd2c7955ce926456240b2ff0100bde') ('c/group#.table', subClassOf, 'http://example/dbview/w3c/entities#.table') ('/w3c#.database', table, 'http://example/dbview/w3c/entities#.table') ('ntities#.table', type, 'http://www.w3.org/2000/10/swap/db#Table') ('ntities#.table', tableSchema, 'http://example/dbview/w3c/entities') ('ntities#.table', subClassOf, 'http://example/dbview/w3c/things#.table') ('/w3c#.database', table, 'http://example/dbview/w3c/things#.table') ('/things#.table', type, 'http://www.w3.org/2000/10/swap/db#Table') ('/things#.table', tableSchema, 'http://example/dbview/w3c/things') ('/things#.table', label, 'md5:e64c3132023b29762b354a750b474c53') ('ntities#.table', label, 'md5:07214c6750d983a32e0a33da225c4efd') ('c/group#.table', label, 'md5:db0f6f37ebeb6ea09489124345af2a45') ('c/group#.table', column, 'http://example/dbview/w3c/group#id') ('c/group#.table', primaryKeyColumn, 'http://example/dbview/w3c/group#id') ('c/group#.table', column, 'http://example/dbview/w3c/group#name') """ dbaddr = dbsym(dbdocaddr) #the #.database resource tableSchemaURI = store.symbol(tableschemasym(dbdocaddr, table)) tableURI = store.symbol(tablesym(dbdocaddr, table)) # personnell#.database db:table personell/employees#.table if database: formula.add(subj=store.symbol(dbaddr), pred=SwDB.table, obj=tableURI) # personell/employees#.table rdf:type db:Table formula.add(subj=tableURI, pred=RDF.type, obj=SwDB.Table) # personnel/employees#.table db:tableSchema personnel/employees if tableSchema: schemaURI = store.symbol(tableschemasym(dbdocaddr, table)) formula.add(subj=tableURI, pred=SwDB.tableSchema, obj=schemaURI) # personnel/employees#.table db:tableData personnel/employees/data . if tableData: formula.add(subj=tableURI, pred=SwDB.tableData, obj=store.symbol(tabledatasym(dbdocaddr, table))) # personell/employees#.table db:numberOfRows 690. if numRows != None: formula.add(subj=tableURI,pred=SwDB.numberOfRows, obj=store.intern(numRows)) # personell/employees#.table db:editPage personell/employees/edit editable = extraRDF.formula.contains(subj=tableSchemaURI, pred=RDF.type, obj=SwDB.Editable) if editable: tableEditURI=store.symbol(tableeditsym(dbdocaddr, table)) formula.add(subj=tableURI, pred=SwDB.editPage, obj=tableEditURI) ## Superclass relationships # Indicate that this table is a subclass of its parent, its parent is a # subclass of its parent, and so forth if showSubclass: parent = subclass.getParentOf(table) #print the immediate subclass relationship if parent: superURI = store.symbol(tablesym(dbdocaddr, parent)) formula.add(subj=tableURI, pred=RDFS.subClassOf, obj=superURI) aboutTable([], dbdocaddr, parent, store, formula, subclass=subclass, extraRDF=extraRDF, columns=False, showPrimaryKeys=False, tableData=False) #include any additional information provided for the table labelPresent = False if showExtraRDF: for statement in extraRDF.tableStatements(table=table): if statement[0] == RDFS.label: labelPresent = True formula.add(subj=tableURI, pred=statement[0], obj=statement[1]) _dumpTerm(term=statement[1], extraRDF=extraRDF, formula=formula, seen=[]) # rdfs:label "employees" . 
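    # If the extra RDF did not supply an rdfs:label, fall back to using the
    # table name itself as the label.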
    if not labelPresent and label:
        formula.add(subj=tableURI, pred=RDFS.label, obj=store.intern(table))

    ## Column Information
    # Indicate that a column belongs to this table, and optionally which
    # column is a primary key for the table.
    if columns or showPrimaryKeys:
        for colName, ty, nullable, isKey, dunno1, dunno2 in eachCol:
            # don't include noExportColumns at all
            if colName in extraRDF.noExportColumns(table): continue

            colURI = store.symbol(colsym(dbdocaddr, table, colName))

            # personnel/employees#.table db:column personnel/employees#email
            if columns:
                #don't add the column if it defines part of a subclass relation
                formula.add(subj=tableURI, pred=SwDB.column, obj=colURI)

            # personnel/employees#.table db:primaryKeyColumn personnel/employees#email
            if isKey == 'PRI' and showPrimaryKeys:
                formula.add(subj=tableURI, pred=SwDB.primaryKeyColumn,
                            obj=colURI)


def aboutColumn(column, table, dbdocaddr, store, formula, foreignKeys,
                subclass, extraRDF, showExtraRDF=True, showDomain=True,
                showRange=True, showMapsTo=True, label=True):
    """Describes a column into an RDFStore.

    :param column: the string name of the column
    :param table: the string name of the table
    :param dbdocaddr: the server base URI
    :param store: an instance of RDFStore
    :param formula: the formula to write statements into
    :param foreignKeys: an instance of ForeignKeys
    :param subclass: an instance of Subclass()
    :param extraRDF: an instance of ExtraRDF
    :param showExtraRDF: True if the user's RDF about the column is shown
    :param showDomain: True if the domain of a column is shown
    :param showRange: True if the range (foreign key column) is shown
    :param showMapsTo: True if db:mapsTo should be shown for columns
    :param label: True if dbview should generate a label

    >>> fk = ForeignKeys()
    >>> fk.add('group', 'people', 'leader', 'id')
    >>> fk.add('group', 'entities', 'id', 'id')
    >>> sc = Subclass(fk, {'group':Set(['id']), 'people':Set(['id']),
    ...                    'entities':Set(['id'])})
    >>> dbdocaddr = 'http://example/services/dbview'
    >>> table = 'group'
    >>> column = 'id'
    >>> store = RDFStore()
    >>> extraRDF = ExtraRDF(dbdocaddr)
    >>> pred = RDFS.label
    >>> obj = store.intern('yummy data')
    >>> extraRDF.addColumnStatement(table, column, pred, obj)
    >>> extraRDF.addColumnStatement('table222', column, pred=RDF.type,
    ...                             obj=SwDB.NoExportColumn)

    >>> formula=store.newFormula()
    >>> aboutColumn(column, table, dbdocaddr, store, formula=formula,
    ...             foreignKeys=fk, subclass=sc, extraRDF=extraRDF)
    >>> for s in formula.statements:
    ...     s.subject(), s.predicate(), s.object().uriref()
    (id, type, 'http://www.w3.org/2000/10/swap/db#Column')
    (id, domain, 'http://example/services/dbview/group#.table')
    (id, range, 'http://example/services/dbview/entities#.table')
    (id, mapsTo, 'http://example/services/dbview/entities#id')
    (id, label, 'md5:fa4bf76df33d824de95343f0b631edf1')

    >>> formula=store.newFormula()
    >>> aboutColumn(column, table, dbdocaddr, store, formula, fk, sc,
    ...             extraRDF, showDomain=False)
    >>> for s in formula.statements:
    ...     s.subject(), s.predicate(), s.object().uriref()
    (id, type, 'http://www.w3.org/2000/10/swap/db#Column')
    (id, range, 'http://example/services/dbview/entities#.table')
    (id, mapsTo, 'http://example/services/dbview/entities#id')
    (id, label, 'md5:fa4bf76df33d824de95343f0b631edf1')

    >>> formula=store.newFormula()
    >>> aboutColumn(column, table, dbdocaddr, store, formula, fk, sc,
    ...             extraRDF, showExtraRDF=False)
    >>> for s in formula.statements:
    ...
s.subject(), s.predicate(), s.object().uriref() (id, type, 'http://www.w3.org/2000/10/swap/db#Column') (id, domain, 'http://example/services/dbview/group#.table') (id, range, 'http://example/services/dbview/entities#.table') (id, mapsTo, 'http://example/services/dbview/entities#id') (id, label, 'md5:b80bb7740288fda1f201890375a60c8f') >>> formula=store.newFormula() >>> aboutColumn(column, table, dbdocaddr, store, formula, fk, sc, ... extraRDF, showExtraRDF=False, label=False) >>> for s in formula.statements: ... s.subject(), s.predicate(), s.object().uriref() (id, type, 'http://www.w3.org/2000/10/swap/db#Column') (id, domain, 'http://example/services/dbview/group#.table') (id, range, 'http://example/services/dbview/entities#.table') (id, mapsTo, 'http://example/services/dbview/entities#id') """ #do not export anything if this is a no export column if column in extraRDF.noExportColumns(table): return colURI = store.symbol(colsym(dbdocaddr, table, column)) tableURI = store.symbol(tablesym(dbdocaddr, table)) # personnel/employees#email rdfs:type db:Column formula.add(subj=colURI, pred=RDF.type, obj=SwDB.Column) # personnel/employees#email rdfs:domain personnel/employees#.table if showDomain: formula.add(subj=colURI, pred=RDFS.domain, obj=tableURI) # personnel/employees#.table rdfs:range personnel/people#.table if showRange or showMapsTo: for dt, dc in foreignKeys.mapToAll(table, column): # dt = destination (foreign) table, # dc = destination col #use the highest superclass for range (not immediate) trace = [x for x in subclass.superclassTrace(dt, [dc])] if len(trace) == 0: #dt has no parent otherTable = dt otherColumn = dc else: otherTable = trace[-1][0] otherColumn = trace[-1][1][0] if showRange: otherTableURI = store.symbol(tablesym(dbdocaddr, otherTable)) formula.add(subj=colURI, pred=RDFS.range, obj=otherTableURI) #indicate the immediate map if showMapsTo: otherColURI = store.symbol(colsym(dbdocaddr, otherTable, otherColumn)) formula.add(subj=colURI, pred=SwDB.mapsTo, obj=otherColURI) #include any additional information about the column labelPresent = False if showExtraRDF: for s in extraRDF.columnStatements(table=table, column=column): if s[0] == RDFS.label: labelPresent = True formula.add(subj=colURI, pred=s[0], obj=s[1]) _dumpTerm(term=s[1], extraRDF=extraRDF, formula=formula, seen=[]) if not labelPresent and label: # personnel/employees#email rdfs:label "email" formula.add(subj=colURI, pred=RDFS.label, obj=store.intern(column)) ############################### # Test harness... # @@horribly outdated, and doesn't work anymore def testSvc(): import sys host, port, user, passwd, httpHost, httpPort = sys.argv[2:8] port = int(port) dbName = 'administration' #@@ db=MySQLdb.connect(host=host, port=port, user=user, passwd=passwd, db=dbName) hostPort = (httpHost, int(httpPort)) httpd = DBViewServer(hostPort, DBViewHandler, db, '/', dbName) logging.info("testSvc base: %s", httpd._base) logging.info("Serving HTTP on port %s...", httpPort) httpd.serve_forever() def testShow(): import sys host, port, user, passwd = sys.argv[2:6] port = int(port) dbName = 'w3c' #@@ db=MySQLdb.connect(host=host, port=port, user=user, passwd=passwd, db=dbName) dbaddr = 'http://example/w3c' #@@hmm... when did the API change from sink to wfile? 
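    # testShow dumps the database description and the 'domains' table
    # description as RDF to stdout.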
    #sink = ToRDF(sys.stdout, dbaddr, flags=Relative_nsdecl)
    aboutDB(db, dbaddr, sys.stdout).endDoc()
    aboutTable(db, dbaddr, 'domains', sys.stdout).endDoc()

def testUI():
    import sys
    queryUI(sys.stdout.write, 3, '/', "niftydb")

def testQ():
    import sys
    host, port, user, passwd = sys.argv[2:6]
    port = int(port)

    path = '/administration/.dbq?name1=users&fields1=family%2Cemail%2Ccity%2Cid&key1=id&name2=techplenary2002&fields2=plenary%2Cmeal_choice&key2=&kj2_1=id&name3=&fields3=&key3=&kj3_1=&kj3_2=&name4=&fields4=&key4=&kj4_1=&kj4_2=&kj4_3='
    path, fields = path.split('?')
    logging.info("CGI parse: %s", cgi.parse_qs(fields))
    fields, tables, keys, joins, cond = parseQuery(fields)
    logging.info("SQL parse: %s %s %s %s %s", fields, tables, keys, joins, cond)
    logging.info("as SQL: %s", asSQL(fields, tables, keys, joins, cond))

    sink = ToRDF(sys.stdout, 'stdout:', flags=Relative_nsdecl)
    db = MySQLdb.connect(host=host, port=port, user=user, passwd=passwd,
                         db='administration')
    dbaddr = 'http://example/administration'
    askdb(db, dbaddr, sink, fields, tables, keys, joins, cond)
    sink.endDoc()


class Usage(Exception):
    """
    dbview.py command line arguments

    Usage
    -----

    $ dbview.py --configFile=dbviewconf.rdf

    dbview.py also recognizes the following command line arguments:

    --help              displays this message
    --configFile=file   starts dbview.py with a given config file
    --test              runs all the tests
    --doctest           runs the doctest unit tests
    --dbtest            runs the database tests
    --verbose           debug messages are logged
    --silent            no messages are logged

    Unless the doctest is being performed, a configFile is required.
    If a dbtest is being performed, the configFile is used to determine
    which database should be used to perform the test.  If the tables
    used by the test already exist in that database, the test fails to
    prevent the data from being overwritten.
    """
    def __str__(self):
        return self.__doc__


def main(confFile):
    """Runs dbview normally after parsing the config file."""
    c = confParser.parseDBConf(confFile)  #returns a DBViewConf object

    #connect to the database
    db = MySQLdb.connect(host=c.mysqlHost, port=c.mysqlPort, user=c.mysqlUser,
                         passwd=c.mysqlPass, db=c.sqldb)

    #run the http server
    hostTuple = (c.httpHost, c.httpPort)
    httpd = DBViewServer(addr=hostTuple, handlerClass=DBViewHandler,
                         dbName=c.sqldb, db=db, httpPath=c.httpPath,
                         configfiles=c.schemaRDF, allowIP=c.allowedClients,
                         httpBase=c.dbDocument, relativeURI=c.relativeURI,
                         backlinks=c.backlinks, forlinks=c.forlinks,
                         exportNumberOfRows=c.exportNumberOfRows,
                         editable=c.editable)

    #print value of various dbview parameters
    logging.info("db document: %s", c.dbDocument)
    logging.info("database: %s", c.sqldb)
    logging.info("http path: http://%s:%d%s", c.httpHost, c.httpPort,
                 c.httpPath)
    logging.info("media type: %s", RDF_MediaType)
    if c.backlinks: logging.info("backlinks: yes")
    else: logging.info("backlinks: no")
    if c.forlinks: logging.info("forlinks: yes")
    else: logging.info("forlinks: no")
    if c.relativeURI: logging.info("relative URI: yes")
    else: logging.info("relative URI: no")
    if c.exportNumberOfRows: logging.info("num rows: yes")
    else: logging.info("num rows: no")
    if c.editable: logging.info("editable: yes")
    else: logging.info("editable: no")

    logging.info("Starting dbview ...")
    httpd.serve_forever()


def _docTest():
    """Run doctest unit tests."""
    import doctest
    import sys
    if sys.version_info[:2] < (2, 4):
        from warnings import warn
        warn("doctest from python 2.3 doesn't seem to work")
    doctest.testmod()


def _dbTest(configFile):
    """Runs the db tests.
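
    A typical invocation (dbviewconftest.rdf is only the conventional name
    for the test configuration file; any file in the dbviewconf format
    will do) looks like::

        $ dbview.py --dbtest --configFile=dbviewconftest.rdf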
    This test uses the database specified in the config file, creates new
    tables, and populates them with data.  Then various queries are executed
    on the data, and the results are compared with their expected values.
    """
    ## Testing Procedure
    # To set up the test, we first need to connect to the database and
    # fill the test tables with data.  To do this, we need to have a live
    # database to test against, which is specified in a configuration
    # file with the same format as dbviewconf.rdf.  This file is usually
    # called dbviewconftest.rdf.  If this file does not exist, the test fails.
    #
    # To prevent mishaps, we check that the database specified in the
    # configuration file does not already contain the tables used in testing;
    # if it does, the test fails.  After the test completes, the testing
    # tables are destroyed.
    #
    # This test also implicitly tests the parsing of the dbviewconf.rdf file,
    # so if this parsing isn't working the test will (hopefully) fail.

    #start dbview so we can perform tests on it
    import thread  #so we can run dbview and perform tests
    import time

    if configFile is None:
        logging.critical("A configFile must be specified for db tests.")
        sys.exit(1)

    logging.info("Preparing to run tests...")

    #to perform the test, we need to know where the http server is located
    c = confParser.parseDBConf(configFile)
    dbviewURI = "http://%s:%d%s" % (c.httpHost, c.httpPort, c.httpPath)

    #start dbview
    thread.start_new_thread(main, tuple([configFile]))

    #we need to wait for dbview to start running before we can proceed
    triesLeft = 3
    dbviewStarted = False
    while triesLeft > 0:
        try:
            urllib.urlopen(dbviewURI)
            dbviewStarted = True
            triesLeft = 0  #get out of here
        except IOError:  # there was an error getting there
            time.sleep(2)
            triesLeft = triesLeft - 1
    if not dbviewStarted:
        logging.critical("dbview http server could not be contacted; is " +
                         "the configuration file correct?")
        sys.exit(1)

    #set up the database
    logging.info("Populating mysql tables...")
    try:
        db = MySQLdb.connect(host=c.mysqlHost, port=c.mysqlPort,
                             user=c.mysqlUser, passwd=c.mysqlPass,
                             db=c.sqldb)
    except OperationalError, err:
        #could not connect to the database for some reason
        errNum = err[0]
        if errNum == 1044:
            logging.critical("the database %s specified " % (c.sqldb) +
                             "in the configFile does not exist, and dbview " +
                             "cannot create it. Create this database and " +
                             "try again.")
            sys.exit(1)
        elif errNum == 1045:
            logging.critical("the user %s specified " % (c.mysqlUser) +
                             "in the configFile does not have permission " +
                             "to access the database %s." % (c.sqldb))
            sys.exit(1)
        else:
            logging.critical(err[1])
            sys.exit(1)

    #we have a connection to the database, so make sure the tables don't exist
    for table in tableNames(db):
        if table in ['sat_scores', 'customer', 'location', 'shipping']:
            logging.critical("data already exists in %s.%s" % (c.sqldb, table))
            sys.exit(1)


if __name__ == '__main__':
    """Parses the command-line options and acts on them.  Either starts
    dbview or runs one of the selected test suites.  Command line options
    are specified in Usage.
    """
    import getopt
    import sys

    logging.basicConfig(format='%(levelname)s %(module)s: %(message)s')

    ## Parse Command Line Arguments
    # Available options are given in the Usage class.  We're either running
    # a test suite or starting dbview normally.
    #
    # If we are performing the doctest, we ignore the configFile and
    # use the settings that are mandated by the test suite.  Once the test
    # completes, dbview is not started and the results are logged.
If the # dbtest flag is found, then we are testing the database, and we use the # file specified in configFile to set up the test. This database MUST NOT # ALREADY EXIST. # # If a test is not being performed, configFile is a required argument. arguments = getopt.getopt(sys.argv[1:], "", ["test", "doctest", "dbtest", "configFile=", "help", "verbose", "silent"]) confFile = None doDocTest = False #true to run the doctests doDbTest = False #true to run the dbtest logging.getLogger().setLevel(logging.INFO) #might be overwritten in args for arg in arguments[0]: #cycle through the command-line arguments if arg[0] == "--configFile": confFile = arg[1] elif arg[0] == "--test": doDocTest = True doDbTest = True elif arg[0] == '--doctest': doDocTest = True elif arg[0] == '--dbtest': doDbTest = True elif arg[0] == '--help': raise Usage elif arg[0] == '--silent': logging.getLogger().setLevel(999) #nothing elif arg[0] == '--verbose': logging.getLogger().setLevel(logging.DEBUG) # Running Mode # Test to see which test suites were requested. If at least one test suite # is requested, dbview does not run normally, and all of the requested # tests are performed. This is accomplished by setting a boolean flag # for each test that is requested. If all the flags are false, then # we're not testing anything. # # In the absense of any test flags, dbview is started by a call to main(). if doDocTest: _docTest() if doDbTest: _dbTest(confFile) if not (doDocTest or doDbTest): if confFile == None: raise Usage else: main(confFile) # $Log: dbview.py,v $ # Revision 1.43 2006/05/18 19:19:21 connolly # remove noExportColumns param from rowSerialize(); wasn't used # get the right noExportColumns for backlinks # # Revision 1.42 2006/05/18 17:30:05 ted # putting danc's version back in for rob to sync # # Revision 1.40 2006/05/18 04:08:13 connolly # oops # # Revision 1.39 2006/05/18 04:06:33 connolly # more xmlns serialization kludge/tweaks # # Revision 1.38 2006/05/18 03:49:22 connolly # - added a bunch more namespace decls in rowDump() to try to # work around toXML serializer issues; doesn't seem to work, though # # - changed flags="z" to flags=Relative_nsdecl to be more transparent # # Revision 1.37 2006/05/18 03:04:39 connolly # use python2.3 logging api # # Revision 1.36 2006/05/18 02:51:43 connolly # backlinks! # # Revision 1.35 2006/05/18 02:06:27 connolly # some debug info toward backlinks # # Revision 1.34 2006/05/17 22:59:16 connolly # fix handling of foreign keys (clearly insufficiently tested) # # Revision 1.33 2006/05/17 22:53:51 connolly # fix call to rowSerialize in tableDump # # Revision 1.32 2006/05/17 22:53:03 connolly # fix call to asSQL # # Revision 1.31 2006/05/17 22:49:48 connolly # make keys and joins on asSQL optional # # Revision 1.30 2006/05/17 22:48:26 connolly # get rid of leftover reference to cursor in tableColumnNames # # Revision 1.29 2006/05/17 22:45:30 connolly # - factor out common code in tableDump and rowDump # # - use asSQL in rowDump, tableDump # - factor out showColumns() # - forget DOC.Work # - kill dead latin1 code # # Revision 1.28 2006/05/17 21:14:21 connolly # oops... 
tabledata, not tabledesc # # Revision 1.27 2006/05/17 21:11:56 connolly # add link from table description to table data # # Revision 1.26 2006/05/17 21:06:11 connolly # - fixed bug in rowSerialize computing goofy about URI # - added multi-column testcase for itemsym (and fixed a bug) # - got rid of superfluous unicode() calls on table names # - clarified tableDump() docstring that it only dumps key data # - refined tableDump() to use colsym() and itemsym() # # Revision 1.25 2006/05/17 20:31:18 connolly # move str() inside itemsym # # Revision 1.24 2006/05/17 20:21:51 connolly # more on rowDump split # # Revision 1.23 2006/05/17 20:20:38 connolly # more on rowDump split: fKeys capitalization # clearly we need more unit tests. hm. # # Revision 1.22 2006/05/17 20:16:20 connolly # more work on rowDump split; fix nex # # Revision 1.21 2006/05/17 19:34:00 timbl # read primary keys to figure subclass relationships # # Revision 1.20 2006/05/17 19:22:08 timbl # untested read primary keys # # Revision 1.19 2006/05/17 19:21:17 timbl # untested read primary keys # # Revision 1.18 2006/05/17 19:19:57 timbl # untested read primary keys # # Revision 1.17 2006/05/17 19:19:04 timbl # untested read primary keys # # Revision 1.16 2006/05/17 18:57:21 connolly # str() on key values # # Revision 1.15 2006/05/17 18:52:44 connolly # fix tableDump to use tablesym() and itemsym() # # Revision 1.14 2006/05/17 18:30:26 connolly # in aboutTable, make a link to the database # # Revision 1.13 2006/05/17 18:26:14 connolly # - changed table names a la w3c/domains#_table # # - converted DBViewServer docstring notes on URI structure # to functions an unit tests # # - moved Namespaces up # # Revision 1.12 2006/05/17 17:32:59 connolly # use domains in testShow # # Revision 1.11 2006/05/17 17:26:44 connolly # update testShow # # Revision 1.10 2006/05/17 14:43:28 connolly # update testShow to use current db name: w3c # # Revision 1.9 2006/05/17 14:38:02 connolly # factored rowDump into 2 parts, one of which can be tested # _without_ a DB connection # # Revision 1.8 2006/05/17 02:28:40 connolly # simplify table (class) labels # # Revision 1.7 2006/05/17 01:39:22 connolly # try flags="z" on ToRDF # # Revision 1.3 2006/05/08 21:53:48 rcrowell # (rcrowell) Changed through Jigsaw. # # Revision 1.19 2003/03/02 06:07:46 connolly # updated per bind API change, XML serializer move to toXML module # # Revision 1.18 2002/03/16 06:14:53 connolly # allow command-line test cases for --testSQL # # Revision 1.17 2002/03/16 05:59:43 connolly # xml mime type more convenient; fixed buggy namespace bindings # # Revision 1.16 2002/03/08 06:45:24 connolly # fixed bug with empty where clause # added --testSQL # handle NULL values by not asserting anything. Hmm... # # Revision 1.15 2002/03/08 00:11:36 connolly # HTTP export of schemas working. # UI fleshed out a bit. # # Revision 1.14 2002/03/07 23:25:01 connolly # moved ui path to not conflict with table names # # Revision 1.13 2002/03/07 00:24:43 connolly # stop conflating stuff, per designissues thingy. # lightly tested... # # Revision 1.12 2002/03/06 17:24:39 connolly # add pointer to msql doc for mysql-specific schema-interrogation stuff # # Revision 1.11 2002/03/06 17:20:18 timbl # (timbl) Changed through Jigsaw. # # Revision 1.10 2002/03/06 06:37:32 connolly # structure browsing is starting to work: # listing tables in a database, # listing columns in a table. # Converting to RDF/RDFS works. # SwDB namespace name needs deciding. # HTTP export is TODO. 
# # Revision 1.9 2002/03/06 05:41:32 connolly # OK! basic query interface, including joins, # seems to work. # # Revision 1.8 2002/03/06 03:44:05 connolly # the multi-table case is starting to work... # I haven't captured the joined relationships yet... # I'm thinking about turning the fields structure inside out... # # Revision 1.7 2002/03/06 00:03:16 connolly # starting to work out the forms UI... # # Revision 1.6 2002/03/05 22:16:24 connolly # per-table namespace of columns starting to work... # # Revision 1.5 2002/03/05 22:04:18 connolly # HTTP interface starting to work... #
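# ----------------------------------------------------------------------
# Client-side sketch (not part of dbview itself).  Assuming a service
# configured as in the example dbviewconf file, the exported RDF documents
# can be fetched from an allowed client with plain urllib; the base URI
# and table name below are hypothetical:
#
#     import urllib
#     base = "http://db.example:8081/services/dbview"
#     print urllib.urlopen(base).read()                      # database schema
#     print urllib.urlopen(base + "/employees").read()       # table schema
#     print urllib.urlopen(base + "/employees/data").read()  # table data
# ----------------------------------------------------------------------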