toXML
/Users/yosi/CVSROOT/WWW/2000/10/swap/toXML.py

$Id: toXML.py,v 1.39 2007/06/26 02:36:15 syosi Exp $
 
 
This module implements basic sources and sinks for RDF data.
It defines a stream interface for such data.
It has a command line interface, can work as a web query engine,
and has a built-in test(), all of which demonstrate how it is used.
 
To make a new RDF processor, subclass RDFSink.
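
A minimal sketch of that subclassing pattern follows. It assumes only the
method names listed on this page (startDoc, makeStatement, endDoc). StandInSink
is a hypothetical stand-in for RDFSink.RDFSink so the example runs without the
swap package; CollectingSink and the example.org URIs are likewise illustrative,
and the real base class may behave differently.

```python
# Stand-in for RDFSink.RDFSink so the sketch runs on its own.
# Only the method names come from this page; the bodies are assumptions.
class StandInSink:
    def startDoc(self):
        pass

    def makeStatement(self, quad, why=None):
        pass

    def endDoc(self, rootFormulaPair=None):
        pass


class CollectingSink(StandInSink):
    """Hypothetical processor: records every quad sent to it."""

    def __init__(self):
        self.quads = []
        self.open = False

    def startDoc(self):
        self.open = True

    def makeStatement(self, quad, why=None):
        if not self.open:
            raise RuntimeError("makeStatement called outside startDoc/endDoc")
        self.quads.append(quad)

    def endDoc(self, rootFormulaPair=None):
        self.open = False


SYMBOL = 0   # values as listed in the Data section of this page
LITERAL = 2
ex = "http://example.org/#"   # hypothetical namespace

sink = CollectingSink()
sink.startDoc()
# Quad order is (context, predicate, subject, object).
sink.makeStatement(((SYMBOL, ex + "doc"),
                    (SYMBOL, ex + "title"),
                    (SYMBOL, ex + "book"),
                    (LITERAL, "Weaving the Web")))
sink.endDoc()
```

A source (for example a parser) would drive the sink through the same three
calls, which is what makes sources and sinks interchangeable.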
 
See also:
 
Notation 3
http://www.w3.org/DesignIssues/Notation3
 
Closed World Machine - an RDF Processor
http://www.w3.org/2000/10/swap/cwm
 
To do: See also "@@" in comments
 
Internationalization:
- Decode incoming N3 file as unicode
- Encode outgoing file
- unicode \u  (??) escapes in parse
- unicode \u  (??) escapes in string output
 
Note: Unicode strings currently work in this code
but fail when they are output to the Python debugger's
interactive window.
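
One way the "encode outgoing file" item could be handled is with a stream
writer from the standard codecs module, which this module already imports.
This is a sketch only: the in-memory buffer and the sample string are
illustrative, and nothing here is taken from the module's actual code.

```python
import codecs
import io

raw = io.BytesIO()                       # stands in for an output file
writer = codecs.getwriter("utf-8")(raw)  # text in, UTF-8 bytes out

writer.write(u"<title>caf\u00e9</title>")
encoded = raw.getvalue()
```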
 
______________________________________________
 
Module originally by Dan Connolly.
TimBL added RDF stream model.

 
Modules
       
RDFSink
codecs
re
string
sys
triple_maker
urllib
urlparse

 
Classes
       
RDFSink.RDFStructuredOutput(RDFSink.RDFSink)
ToRDF
tmToRDF
XMLWriter

 
class ToRDF(RDFSink.RDFStructuredOutput)
    keeps track of most recent subject, reuses it
 
 
Method resolution order:
ToRDF
RDFSink.RDFStructuredOutput
RDFSink.RDFSink

Methods defined here:
__init__(self, outFp, thisURI=None, base=None, flags='')
#@ Not actually complete, and can encode anyway
bind(self, prefix, namespace)
dummyClone(self)
return a version of myself which will only count occurrences
endAnonymous(self, subject, verb)
endAnonymousNode(self, subj=None)
endDoc(self, rootFormulaPair=None)
endFormulaObject(self, pred, subj)
endFormulaSubject(self, subj)
endListObject(self, subject, verb)
endListSubject(self, subj)
flushStart(self)
makeComment(self, str)
makeStatement(self, tuple, why=None, aIsPossible=0)
referenceTo(self, uri)
Conditional relative URI
startAnonymous(self, tuple)
startAnonymousNode(self, subj, isList=0)
startDoc(self)
startFormulaObject(self, tuple)
startFormulaSubject(self, context)
startListObject(self, tuple, isList=0)
startListSubject(self, subj)
startWithParseType(self, parseType, tuple)

Data and other attributes defined here:
flagDocumentation = '\nFlags to control RDF/XML output (after --rdf=) ...e URIs.\nz - Allow relative URIs for namespaces\n\n'

Methods inherited from RDFSink.RDFSink:
checkNewId(self, uri)
The store can override this to raise an exception if the
id is not in fact new. This is useful because it is helpful
to generate IDs in diagnostically meaningful ways, but this lays them
open to possibly clashing in pathological cases.
countNamespace(self, namesp)
On output, count how many times each namespace is used
genId(self)
intern(self, something)
namespaceCounts(self)
newBlankNode(self, context, uri=None, why=None)
newExistential(self, context, uri=None, why=None)
newFormula(self, uri=None)
newList(self, l, context)
newLiteral(self, str, dt=None, lang=None)
newSymbol(self, uri)
newUniversal(self, context, uri=None, why=None)
newXMLLiteral(self, doc)
reopen(self)
Un-End a document
 
If you have added stuff to a document, thought you were done, and
then want to add more, call this to get back into the state in which makeStatement
is again acceptable. Remember to end the document again when done.
setDefaultNamespace(self, uri)
Pass on a binding hint for later use in output
 
This really is just a hint. The parser calls this to pass on
the default namespace which it came across, as this is a
useful hint for a human readable prefix for output of the same
namespace. Otherwise, output processors will have to invent or
avoid using namespaces, which will look ugly.
setGenPrefix(self, genPrefix)
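
The class summary, "keeps track of most recent subject, reuses it", can be
illustrated with a small sketch. It is not ToRDF's implementation: the
function write_grouped, the simplified (subject, predicate QName, literal)
triples and the ex: prefix are all assumptions made for the example.

```python
from xml.sax.saxutils import escape, quoteattr


def write_grouped(triples, write):
    """Emit one rdf:Description per run of consecutive triples
    that share a subject, instead of one per triple."""
    current = None   # most recent subject; None while no element is open
    for subj, pred_qname, literal in triples:
        if subj != current:
            if current is not None:
                write("</rdf:Description>\n")
            write("<rdf:Description rdf:about=%s>\n" % quoteattr(subj))
            current = subj
        write("  <%s>%s</%s>\n" % (pred_qname, escape(literal), pred_qname))
    if current is not None:
        write("</rdf:Description>\n")


out = []
write_grouped([("http://example.org/a", "ex:p", "1"),
               ("http://example.org/a", "ex:q", "2"),
               ("http://example.org/b", "ex:p", "3")],
              out.append)
text = "".join(out)
```

The first two triples land inside a single rdf:Description element because
the subject is unchanged between them.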

 
class XMLWriter
    taken from
Id: tsv2xml.py,v 1.1 2000/10/02 19:41:02 connolly Exp connolly
 
Takes as argument a writer which does the (eg utf-8) encoding
 
  Methods defined here:
__init__(self, encodingWriter, counter, squeaky=0, version='1.0')
closeTag(self)
data(self, str)
emptyElement(self, n, attrs=[], prefixes={})
endDocument(self)
endElement(self)
figurePrefix(self, uriref, rawAttrs, prefixes)
flushClose(self)
indent(self, extra=0)
makeComment(self, str)
makePI(self, str)
newline(self, howmany=1)
passXML(self, st)
startElement(self, n, attrs=[], prefixes={})

Data and other attributes defined here:
attrEsc = <_sre.SRE_Pattern object>
dataEsc = <_sre.SRE_Pattern object>
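
The no-argument endElement(self) and the attrEsc/dataEsc patterns suggest a
writer that remembers which elements are open and escapes text with regular
expressions. The sketch below shows that general technique under those
assumptions; TinyXMLWriter, its character classes and its entity table are
hypothetical and are not copied from XMLWriter.

```python
import re

DATA_ESC = re.compile(r"[&<>]")      # characters escaped in element content
ATTR_ESC = re.compile(r"[&<>\"]")    # attribute values also escape quotes
ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def _escape(pattern, text):
    return pattern.sub(lambda m: ENTITIES[m.group(0)], text)


class TinyXMLWriter:
    def __init__(self, write):
        self._write = write
        self._open = []   # stack of currently open element names

    def startElement(self, name, attrs=()):
        parts = [name] + ['%s="%s"' % (k, _escape(ATTR_ESC, v))
                          for k, v in attrs]
        self._write("<%s>" % " ".join(parts))
        self._open.append(name)

    def data(self, text):
        self._write(_escape(DATA_ESC, text))

    def endElement(self):
        # The name comes from the stack, so the caller need not repeat it.
        self._write("</%s>" % self._open.pop())

    def endDocument(self):
        # Close anything still open, innermost first.
        while self._open:
            self.endElement()


out = []
w = TinyXMLWriter(out.append)
w.startElement("a", [("href", 'x"y')])
w.data("1 < 2 & 3")
w.startElement("b")
w.endDocument()
xml = "".join(out)
# xml == '<a href="x&quot;y">1 &lt; 2 &amp; 3<b></b></a>'
```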

 
class tmToRDF(RDFSink.RDFStructuredOutput)
    Trying to do the same as above, using the TripleMaker interface
 
 
Method resolution order:
tmToRDF
RDFSink.RDFStructuredOutput
RDFSink.RDFSink

Methods defined here:
IsOf(self)
__init__(self, outFp, thisURI=None, base=None, flags='')
addAnonymous(self, Id)
addLiteral(self, lit, dt=None, lang=None)
addNode(self, node, nameLess=0)
addQuestionMarkedSymbol(self, sym)
addSymbol(self, sym)
backwardPath(self)
beginAnonymous(self)
beginFormula(self)
beginList(self)
checkIsOf(self)
declareExistential(self, sym)
declareUniversal(self, sym)
end(self)
endAnonymous(self)
endFormula(self)
endList(self)
endStatement(self)
forewardPath(self)
nodeIDize(self, argument)
referenceTo(self, uri)
Conditional relative URI
start(self)

Methods inherited from RDFSink.RDFStructuredOutput:
endAnonymousNode(self, endAnonymousNode)
endFormulaObject(self, pred, subj)
endFormulaSubject(self, subj)
startAnonymous(self, triple, isList=0)
startAnonymousNode(self, subj)
startFormulaObject(self, triple)
startFormulaSubject(self, context)

Methods inherited from RDFSink.RDFSink:
bind(self, prefix, uri)
Pass on a binding hint for later use in output
 
This really is just a hint. The parser calls bind to pass on
the prefix which it came across, as this is a useful hint for
a human readable prefix for output of the same
namespace. Otherwise, output processors will have to invent or
avoid using namespaces, which will look ugly.
checkNewId(self, uri)
The store can override this to raise an exception if the
id is not in fact new. This is useful because it is helpful
to generate IDs in diagnostically meaningful ways, but this lays them
open to possibly clashing in pathological cases.
countNamespace(self, namesp)
On output, count how many times each namespace is used
endDoc(self, rootFormulaPair)
End a document
 
Call this once only at the end of parsing so that the receiver can wrap
things up, optimize, intern, index and so on.  The pair given is the (type, value)
identifier of the root formula of the thing parsed.
genId(self)
intern(self, something)
makeComment(self, str)
This passes on a comment line which of course has no semantics.
 
This is only useful in direct piping of parsers to output, to preserve
comments in the original file.
makeStatement(self, tuple, why=None)
add a statement to a stream/store.
 
raises URISyntaxError on bad URIs
tuple is a quad (context, predicate, subject, object) of things generated by calls to newLiteral etc
why is reason for the statement.
namespaceCounts(self)
newBlankNode(self, context, uri=None, why=None)
newExistential(self, context, uri=None, why=None)
newFormula(self, uri=None)
newList(self, l, context)
newLiteral(self, str, dt=None, lang=None)
newSymbol(self, uri)
newUniversal(self, context, uri=None, why=None)
newXMLLiteral(self, doc)
reopen(self)
Un-End a document
 
If you have added stuff to a document, thought you were done, and
then want to add more, call this to get back into the state in which makeStatement
is again acceptable. Remember to end the document again when done.
setDefaultNamespace(self, uri)
Pass on a binding hint for later use in output
 
This really is just a hint. The parser calls this to pass on
the default namespace which it came across, as this is a
useful hint for a human readable prefix for output of the same
namespace. Otherwise, output processors will have to invent or
avoid using namespaces, which will look ugly.
setGenPrefix(self, genPrefix)
startDoc(self)
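
The bind and countNamespace descriptions above hint at a two-step approach:
tally how often each namespace occurs, then choose prefixes, preferring the
ones the parser passed on. The sketch below illustrates that idea only.
NamespaceTally, split_namespace and the generated "nsN" prefixes are
hypothetical and do not reproduce RDFSink's behaviour.

```python
from collections import Counter


def split_namespace(uri):
    """Split at the last '#', or failing that the last '/'."""
    for sep in "#/":
        i = uri.rfind(sep)
        if i >= 0:
            return uri[:i + 1], uri[i + 1:]
    return "", uri


class NamespaceTally:
    def __init__(self):
        self.counts = Counter()
        self.hints = {}   # namespace -> prefix suggested through bind()

    def bind(self, prefix, uri):
        # Only a hint: the first suggestion for a namespace wins.
        self.hints.setdefault(uri, prefix)

    def countNamespace(self, namesp):
        self.counts[namesp] += 1

    def prefixes(self):
        """Use hinted prefixes where available, invented ones otherwise."""
        result = {}
        for n, (ns, _count) in enumerate(self.counts.most_common()):
            result[ns] = self.hints.get(ns, "ns%d" % n)
        return result


tally = NamespaceTally()
tally.bind("foaf", "http://xmlns.com/foaf/0.1/")
for uri in ["http://xmlns.com/foaf/0.1/name",
            "http://xmlns.com/foaf/0.1/mbox",
            "http://example.org/v#thing"]:
    ns, local = split_namespace(uri)
    tally.countNamespace(ns)
chosen = tally.prefixes()
```

Without the bind hint, the foaf namespace would receive an invented prefix
too, which is the "ugly" output the docstring warns about.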

 
Functions
       
bNodePredicate()
dummyWrite(x)
findLegal(dict, str)
swap(List, a, b)
xmldata(write, str, markupChars)

 
Data
        ADDED_HASH = '#'
ALL4 = (0, 1, 2, 3)
ANONYMOUS = 3
CONTEXT = 0
DAML_LISTS = 1
DAML_NS = 'http://www.daml.org/2001/03/daml+oil#'
DAML_sameAs = (0, 'http://www.daml.org/2001/03/daml+oil#sameAs')
DAML_sameAs_URI = 'http://www.daml.org/2001/03/daml+oil#sameAs'
DPO_NS = 'http://www.daml.org/2001/03/daml+oil#'
FORMULA = 1
LITERAL = 2
LITERAL_DT = 21
LITERAL_LANG = 22
List_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
Logic_NS = 'http://www.w3.org/2000/10/swap/log#'
N3_Empty = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#Empty')
N3_List = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#List')
N3_first = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#first')
N3_forAll_URI = 'http://www.w3.org/2000/10/swap/log#forAll'
N3_forSome_URI = 'http://www.w3.org/2000/10/swap/log#forSome'
N3_nil = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#nil')
N3_rest = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#rest')
NCNameChar = 0
NCNameStartChar = 1
NODE_MERGE_URI = 'http://www.w3.org/2000/10/swap/log#is'
OBJ = 3
PARTS = (1, 2, 3)
PRED = 1
RDF_NS_URI = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'
RDF_li = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#li')
RDF_spec = 'http://www.w3.org/TR/REC-rdf-syntax/'
RDF_type = (0, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type')
RDF_type_URI = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type'
SUBJ = 2
SYMBOL = 0
XMLLITERAL = 25
XML_NS_URI = 'http://www.w3.org/XML/1998/namespace'
option_noregen = 0
parsesTo_URI = 'http://www.w3.org/2000/10/swap/log#parsesTo'
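
The index constants above line up with the quad order documented for
makeStatement: (context, predicate, subject, object). The sketch below
re-declares a few of the listed values so it runs on its own. The helper
functions and the example.org URIs are hypothetical.

```python
# Values copied from the Data section above.
CONTEXT, PRED, SUBJ, OBJ = 0, 1, 2, 3
PARTS = (PRED, SUBJ, OBJ)
SYMBOL = 0
RDF_type = (SYMBOL, 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type')


def is_type_statement(quad):
    """True when the quad's predicate is rdf:type."""
    return quad[PRED] == RDF_type


def triple_of(quad):
    """Drop the context, keeping predicate, subject, object in that order."""
    return [quad[i] for i in PARTS]


quad = ((SYMBOL, "http://example.org/doc"),      # context (formula)
        RDF_type,                                # predicate
        (SYMBOL, "http://example.org/#me"),      # subject
        (SYMBOL, "http://example.org/#Person"))  # object
```

Note that the predicate comes before the subject in this ordering, so
positional unpacking as (subject, predicate, object) would be wrong.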