\chapter{Policy Assurance}

\section{Introduction to Policy Assurance}

In a broad sense, policies exist in almost any situation where multiple agents are in contention for a shared resource. If these agents were in cooperation, or if there were sufficient resources, there would be no contention, and thus, no need for a policy system. The policies give a system a basis for mitigating contention.

Policies manifest themselves in multiple forms: as laws and customs in a human world; as permissions and restrictions in a digital world; as habits in nature. Policy seeks to identify particular actions, perhaps evaluated in a particular context, and make a decision of some sort as a result of those actions. Policies exist for a number of reasons. They may protect a shared resource from overuse or from simultaneous users, which may damage the resource. They may regulate or restrict access to certain kinds of resources, as access may cause side effects within a system. Policies themselves often have justifications, to explain their existence.

In the everyday world, rules may exist for our own protection, or the protection of others. As an example, we consider the public road system. There are a large number of policies which concern the use of public roads, for any purpose, by any individual. The existence of the policies is something of a fair bargain: without the policies, the road system would deteriorate to the point of being useless; without a road system, many other activities (such as trade, visiting others, providing medical care, obtaining food, etc.) would become exceedingly difficult. The policies may not be ideal, but they are a superior option to the complete absence of the system.

A well-known policy states that ``all users of motorized vehicles on public roads must be certified.'' In common terms, we know this as \emph{all drivers must have a license}. This policy is simple to check, but expensive to enforce: an agent (usually a police officer) may check for compliance with this policy by asking someone operating a motorized vehicle to present their license for inspection. Enforcement is expensive because the means for doing so involve either having a police officer check everyone's license (which is not feasible), or requiring a license in order to enable a vehicle's operation (which may also be infeasible). As a result, there is some non-zero probability that there are unlicensed drivers on the road.

In this case, complete policy assurance is exceedingly difficult. Every time an officer checks a driver's license, there is a record which includes the date and time of the check, the officer's identity, the identity of the driver and the vehicle, and information about the driver's license. This provides policy assurance for a single incident, but complete assurance is not feasible. In practice, the choice to check every single action versus some probabilistic number of interactions is itself a policy decision, hopefully the result of a rigorous risk or cost-benefit analysis.

In a digital world, the marginal costs of explicit checks and of storing large amounts of data are far lower, and in many situations, effectively negligible. Thus, it is feasible to perform a rigorous check of every action against policies. As an example, consider a Web site with a policy that states, ``all users of the advanced features of this Web site must have a valid account.'' Regardless of the definitions of user, feature, valid, and account that a Web site may choose, it is straightforward to perform a check every time a user tries to access an advanced feature. The Web site may maintain a log of successful and unsuccessful attempts to use advanced features.
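The following is a minimal sketch of such a check, assuming an in-memory set of valid accounts and an in-memory log. The names \Verb|AccessLog| and \Verb|check_access|, and the sample users, are hypothetical and are not part of the system described in this thesis.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessLog:
    """In-memory log of every attempt to use an advanced feature."""
    entries: list = field(default_factory=list)

    def record(self, user, feature, allowed):
        # Store successful and unsuccessful attempts alike.
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "feature": feature,
            "allowed": allowed,
        })


def check_access(user, feature, valid_accounts, log):
    """Check one request against the policy and log the outcome."""
    allowed = user in valid_accounts
    log.record(user, feature, allowed)
    return allowed


log = AccessLog()
valid_accounts = {"alice"}  # hypothetical account store

print(check_access("alice", "export", valid_accounts, log))    # True
print(check_access("mallory", "export", valid_accounts, log))  # False
```

Because every request passes through \Verb|check_access|, the log holds one entry per attempt, which is the digital counterpart of the per-incident record in the driver's license example.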

\begin{figure}
\centering
\includegraphics{overview}
\caption{Overview of an integrated policy assurance system. From \cite{iarpa-pir-slides}.}
\label{overview}
\end{figure}

We believe that the AIR language, the reasoner, and the Semantic Web approach are a good fit for the problem of securing a database with sensitive data~\cite{iarpa-pir-slides}. The policies we write only need to look at the queries that a user makes. In the case of most policies, we only need to look at a single query to determine compliance. The reasoner compares a user's query with an administrator's policy, and finds code in the query that matches some template in the policy. The reasoner output provides positive, verified confirmation that a query is or is not in compliance with a policy, while only divulging enough information from the query to back up the compliance claim. The policy assurance approach fulfills the need for positive confirmation and traceability of query compliance using the minimal amount of data. Indeed, if the very contents or results of the query are confidential, we must tread carefully.

\section{User Roles and Perspectives}

The system we present herein has multiple usage scenarios, to help end users implement this system and integrate it into existing databases.

\subsection{The Administrator}

A database administrator, or DBA, would be the first user to interact with the system. The DBA is the entity responsible for maintaining a data set, and thus, for creating policies that regulate access to the data set. It is possible that the DBA does not have access to the queries that a user will make, and it is also possible that the DBA only has access to a data set's metadata.

In order to create a policy, a DBA must have a list of the fields in a database, and in particular, the data types of those fields as URIs. Our implementation of policies at present is largely dependent on finding a query that binds to a particular data type. We are working on a tool that helps to automate this process by determining what types are used in a data set, though this is moot if the DBA cannot access the data set proper.

It is up to the DBA to determine what kinds of policies they wish to implement. The DBA may be bound by local and national laws, by department practices, or by any number of other factors in creating policies. In all likelihood, the policies that the DBA needs to implement will be expressible in terms of the primitives that we define in the next chapter. The DBA would then use the Web-based tool to create policies using our templates, and possibly check for compliance using some sample queries the same way a user would. Some policies can be ``history aware'', meaning that the policy looks at all of a user's past queries in addition to the current query when making a compliance decision.
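To illustrate the idea of a history-aware policy, the following sketch assumes a hypothetical rule that forbids a user from retrieving a given combination of attributes across all of their queries. Each query is reduced to the list of attributes it retrieves; this representation, the rule, and the function name are assumptions made for illustration and are not the AIR policies used in this thesis.

```python
def history_aware_check(history, current, forbidden_combination):
    """Return True when the history plus the current query stays compliant.

    The (hypothetical) rule: a user may not retrieve every attribute in
    `forbidden_combination`, whether in one query or spread over several.
    """
    seen = set()
    for query in [*history, current]:
        seen |= set(query["retrieves"])
    # Non-compliant only if the whole forbidden combination has been seen.
    return not set(forbidden_combination) <= seen


forbidden = {"ex:name", "ex:ssn"}
history = [{"retrieves": ["ex:name"]}]
current = {"retrieves": ["ex:ssn"]}

# Looking at the current query alone, nothing is wrong.
print(history_aware_check([], current, forbidden))       # True
# Taking the user's past queries into account changes the decision.
print(history_aware_check(history, current, forbidden))  # False
```

The two calls differ only in the history they are given, which is the distinction between a single-query policy and a history-aware one.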

\subsection{The User}

The second major user of this system is someone who wishes to access the database. There are two possible modes of operation here. If the DBA or a system administrator has configured the policy assurance system as a SPARQL add-in, it is possible that the user will see no change, other than having some queries rejected for lack of compliance. However, if the DBA chooses to implement the Web-based option, the user would be able to see the compliance output of their query in a Tabulator-equipped Web browser. The reasoner's output would help the ``honest but curious'' user make compliant queries to the database.

\subsection{The Auditor}

The third user of the policy assurance system is the auditor. This is a person or entity charged with the responsibility of assuring that the policies that the DBA wrote are correct, and that the system achieves compliance. The auditor would be able to access the query history, the policies, and the reasoner outputs, and manually verify that things are working correctly. An important future work of this project is to provide tools that help the auditor perform query analysis.

\section{Modes of Operation}

As implemented in this thesis, the policy assurance system exists entirely outside of any database implementation. It would allow analysis of a query history by a user or administrator, and allow a user or administrator to check new queries for compliance before sending them to the database. With no further modification, this system could work, slowly but effectively, in a hypothetical ``air gap'' environment where a user sends a query to an administrator for manual verification and entry. An important future work is to integrate this completely with a SPARQL endpoint; we describe the work needed here in the Future Work section.

\section{Demonstration}

In this section, we demonstrate the workflow for an administrator who wishes to create a policy and use it to check queries for compliance using the Firefox Web browser with the Tabulator browser plugin. This section expands on the sample use scenario presented in the introduction. The workflow for all policy types is similar.

\subsection{Describing a Free-Text Policy}

The administrator wants to encode a policy that says, ``users may not find out where members of the database live.'' In the case of the administrator's data set, this means that users may not see any data of the type \Verb|ex:address|: users may neither USE nor RETRIEVE such data.

The first thing that the administrator needs to do is to create a policy. The administrator heads to the policy generation page,
\begin{Verbatim}
http://dig.csail.mit.edu/2009/policy-assurance/generator/
\end{Verbatim}

\begin{figure}
\centering
\includegraphics[scale=0.55]{create01}
\caption{Restriction Policy Creator Web Page}
\label{create01}
\end{figure}

The administrator clicks the ``Restriction Policy'' link, since this policy is most in line with what the administrator wants to do. The administrator enters the following input:

\begin{itemize}
    \item Policy name: no-address
    \item Policy description: Users may not find the home address of members of the database.
    \item Namespaces: \Verb|@prefix ex: <http://example.com/#>.|
    \item Included attributes: Variable: \Verb|ex:address|. Click ``both'' to check USE and RETRIEVE.
\end{itemize}

\begin{figure}
\centering
\includegraphics[scale=0.55]{create02}
\caption{Tabulator Representation of the Sample ``No Address'' Restriction Policy}
\label{create02}
\end{figure}

The administrator clicks ``Submit!'' to generate the policy (see figure \ref{create01}), and it appears in Tabulator as shown in figure \ref{create02}.

The administrator can type Ctrl+U to see the source AIR code of the policy. The source of this policy is in appendix~\ref{no-address}.

The policy has a unique URI. Since the administrator will need the URI later, the administrator saves the URI, as in figure \ref{uri77}.

\begin{figure}
\begin{Verbatim}
http://dig.csail.mit.edu/2009/policy-assurance/generator/make_
restriction_policy.py?policyName=no-address&textDescription=Users+may
+not+find+the+home+address+of+members+of+the+database.&namespaces=%40
prefix+ex%3A+%3Chttp%3A%2F%2Fexample.com%2F%23%3E.&Attribute1=ex%3A
address&Var1=both&url=whatever
\end{Verbatim}
\caption{URI for the demo ``no address'' policy.}
\label{uri77}
\end{figure}
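The policy URI is an ordinary query string built from the form inputs. As a sketch, it can be reproduced with Python's standard \Verb|urllib.parse.urlencode|. The parameter names below are read off the URI in figure \ref{uri77}; the function name \Verb|restriction_policy_uri| is hypothetical, and we assume that the generator accepts the parameters in this order.

```python
from urllib.parse import urlencode

# Endpoint taken from the URI shown in the figure.
GENERATOR = ("http://dig.csail.mit.edu/2009/policy-assurance/generator/"
             "make_restriction_policy.py")


def restriction_policy_uri(name, description, namespaces, attribute, mode):
    """Build the URI of a restriction policy from the form inputs."""
    params = [
        ("policyName", name),
        ("textDescription", description),
        ("namespaces", namespaces),
        ("Attribute1", attribute),
        ("Var1", mode),          # "both" checks USE and RETRIEVE
        ("url", "whatever"),     # value copied from the figure as-is
    ]
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return GENERATOR + "?" + urlencode(params)


uri = restriction_policy_uri(
    "no-address",
    "Users may not find the home address of members of the database.",
    "@prefix ex: <http://example.com/#>.",
    "ex:address",
    "both",
)
print(uri)
```

Under these assumptions, the printed string should coincide with the URI in the figure once its line breaks are removed.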

\subsection{Checking a Compliant Query}

An administrator wants to check the following SPARQL query for compliance. Since it does not mention \Verb|ex:address|, the administrator suspects that it will be compliant:

\begin{Verbatim}
PREFIX example: <http://example.com/#>

SELECT * WHERE {
     ?s example:ssn ?ssn.
     ?s example:age ?age.
     ?s example:name ?name.
     FILTER (?age > 18)
}
\end{Verbatim}

The administrator visits the translation page to perform the translation:
\begin{Verbatim}
http://dig.csail.mit.edu/2009/policy-assurance/sparql2n3.py
\end{Verbatim}

\begin{figure}
\centering
\includegraphics[scale=0.55]{create03}
\caption{Converting a SPARQL Query to N3}
\label{create03}
\end{figure}

The administrator enters the query into the translator and gets the result shown in figure \ref{query55}. The translated query appears in a Web browser in figure \ref{create03}.

\begin{figure}
\begin{Verbatim}
@prefix s: <http://dig.csail.mit.edu/2009/IARPA-PIR/sparql#> .

:Query19369095151250692756 a s:SPARQLQuery;

s:clause [
  s:triplePattern  { :s <http://example.com/#ssn> :ssn };
  s:triplePattern  { :s <http://example.com/#age> :age };
  s:triplePattern  { :s <http://example.com/#name> :name };
  s:triplePattern  { :age s:booleanGT "18 "};

]; 
   s:retrieve [
      s:var :age;
      s:var :name;
      s:var :s;
      s:var :ssn;
].
\end{Verbatim}
\caption{Compliant SPARQL query, converted to N3.}
\label{query55}
\end{figure}

\begin{figure}
\centering
\includegraphics[scale=0.55]{create04}
\caption{Tabulator Representation of a Sample SPARQL Query}
\label{create04}
\end{figure}

The administrator clicks the ``View in Tabulator'' link, to see what the translation looks like in Tabulator. The result is in figure \ref{create04}.

The translation has a unique URI, which is the URI of the Tabulator page. The administrator saves the URI, as in figure \ref{uri88}, for checking the query for compliance.

\begin{figure}
\begin{Verbatim}
http://dig.csail.mit.edu/2009/policy-assurance/print-input.py?input=
%40prefix+s%3A+%3Chttp%3A%2F%2Fdig.csail.mit.edu%2F2009%2FIARPA-PIR
%2Fsparql%23%3E+.%0A%0A%3AQuery19369095151250692756+a+s%3ASPARQLQuery
%3B%0A%0As%3Aclause+[%0A++s%3AtriplePattern++{+%3As+%3Chttp%3A%2F
%2Fexample.com%2F%23ssn%3E+%3Assn+}%3B%0A++s%3AtriplePattern++{+
%3As+%3Chttp%3A%2F%2Fexample.com%2F%23age%3E+%3Aage+}%3B%0A++s%3A
triplePattern++{+%3As+%3Chttp%3A%2F%2Fexample.com%2F%23name%3E+%3A
name+}%3B%0A++s%3AtriplePattern++{+%3Aage+s%3AbooleanGT+%2218+%22}%3B
%0A%0A]%3B+%0A+++s%3Aretrieve+[%0A++++++s%3Avar+%3Aage%3B%0A+++
+++s%3Avar+%3Aname%3B%0A++++++s%3Avar+%3As%3B%0A++++++s%3Avar+%3A
ssn%3B%0A].%0A%0A
\end{Verbatim}
\caption{URI for the translation of the compliant demo query.}
\label{uri88}
\end{figure}

To check the query for compliance against the policy, the administrator goes to the policy execution page:
\begin{Verbatim}
http://dig.csail.mit.edu/2009/policy-assurance/run-policy.py
\end{Verbatim}

The administrator pastes in the URIs of the query and the policy into the correct text boxes, and clicks ``Execute''. A ``View in Tabulator'' link appears, as in figure \ref{create05}, which the administrator clicks. The Tabulator page appears, and the administrator views the output, as in figure \ref{create06}. The administrator can use the justification pane to get more information about this decision, as in figure \ref{create07}.

\begin{figure}
\centering
\includegraphics[scale=0.55]{create05}
\caption{Policy Execute Web Page}
\label{create05}
\end{figure}

\begin{figure}
\centering
\includegraphics[scale=0.5]{create06}
\caption{Tabulator Compliance Summary for Sample Query and Sample Policy}
\label{create06}
\end{figure}

\begin{figure}
\centering
\includegraphics[scale=0.5]{create07}
\caption{Tabulator Compliance Justification UI for Sample Query and Sample Policy}
\label{create07}
\end{figure}

\begin{figure}
\centering
\includegraphics[scale=0.5]{create08}
\caption{Tabulator Non-Compliance Summary for Sample Query and Sample Policy}
\label{create08}
\end{figure}

\begin{figure}
\centering
\includegraphics[scale=0.5]{create09}
\caption{Tabulator Non-Compliance Justification UI for Sample Query and Sample Policy}
\label{create09}
\end{figure}

\subsection{Checking a Non-Compliant Query}

The administrator now checks a non-compliant query:

\begin{Verbatim}
PREFIX ex: <http://example.com/#>
SELECT * WHERE {
     ?s ex:address ?a.
}
\end{Verbatim}

This translation has the URI listed in figure \ref{uri99}.

\begin{figure}
\begin{Verbatim}
http://dig.csail.mit.edu/2009/policy-assurance/print-input.py?input=%
40prefix+s%3A+%3Chttp%3A%2F%2Fdig.csail.mit.edu%2F2009%2FIARPA-PIR%2F
sparql%23%3E+.%0A%0A%3AQuery4606263041250694128+a+s%3ASPARQLQuery%3B
%0A%0As%3Aclause+[%0A++s%3AtriplePattern++{+%3As+%3Chttp%3A%2F%2F
example.com%2F%23address%3E+%3Aa+}%3B%0A%0A]%3B+%0A+++s%3Aretrieve+
[%0A++++++s%3Avar+%3Aa%3B%0A++++++s%3Avar+%3As%3B%0A].%0A%0A
\end{Verbatim}
\caption{URI for the translation of the non-compliant demo query.}
\label{uri99}
\end{figure}
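The translation URI carries the entire N3 text in its \Verb|input| parameter, so the translated query can be recovered from the URI alone. The sketch below does this with Python's standard \Verb|urllib.parse| module; the function name \Verb|n3_from_uri| is hypothetical, and the URI is the one in figure \ref{uri99} with its line breaks removed.

```python
from urllib.parse import parse_qs, urlsplit


def n3_from_uri(uri):
    """Recover the N3 text embedded in a print-input.py URI."""
    query_string = urlsplit(uri).query
    # parse_qs undoes the percent-encoding and maps "+" back to a space.
    return parse_qs(query_string)["input"][0]


uri = (
    "http://dig.csail.mit.edu/2009/policy-assurance/print-input.py?input=%"
    "40prefix+s%3A+%3Chttp%3A%2F%2Fdig.csail.mit.edu%2F2009%2FIARPA-PIR%2F"
    "sparql%23%3E+.%0A%0A%3AQuery4606263041250694128+a+s%3ASPARQLQuery%3B"
    "%0A%0As%3Aclause+[%0A++s%3AtriplePattern++{+%3As+%3Chttp%3A%2F%2F"
    "example.com%2F%23address%3E+%3Aa+}%3B%0A%0A]%3B+%0A+++s%3Aretrieve+"
    "[%0A++++++s%3Avar+%3Aa%3B%0A++++++s%3Avar+%3As%3B%0A].%0A%0A"
)

print(n3_from_uri(uri))
```

The decoded text has the same shape as the N3 translation of the compliant query, with a single triple pattern on the \Verb|address| attribute.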

Following the same steps as before, the administrator can see that this query is non-compliant, as in figures \ref{create08} and \ref{create09}.
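The two decisions in this demonstration can be mimicked outside the reasoner with a short sketch. It is not the AIR reasoner and produces no justification tree; it assumes a simplified query representation (a list of triple patterns plus a list of retrieved variables), and it assumes that USE means the attribute appears in some triple pattern and that RETRIEVE means a variable bound to the attribute is returned. The function \Verb|check_restriction| is hypothetical, and the \Verb|FILTER| of the compliant query is omitted because it does not affect this policy.

```python
ADDRESS = "http://example.com/#address"


def check_restriction(query, attribute, mode="both"):
    """Return (compliant, reasons) for one query and one restricted attribute."""
    reasons = []
    matching = [(s, p, o) for (s, p, o) in query["patterns"] if p == attribute]
    # Variables that would hold values of the restricted attribute.
    bound = {o for (_, _, o) in matching if o.startswith("?")}

    if mode in ("use", "both") and matching:
        reasons.append("USE: the attribute appears in a triple pattern")
    if mode in ("retrieve", "both") and bound & set(query["retrieve"]):
        reasons.append("RETRIEVE: a variable bound to the attribute is returned")
    return (not reasons, reasons)


compliant_query = {
    "patterns": [
        ("?s", "http://example.com/#ssn", "?ssn"),
        ("?s", "http://example.com/#age", "?age"),
        ("?s", "http://example.com/#name", "?name"),
    ],
    "retrieve": ["?age", "?name", "?s", "?ssn"],
}
non_compliant_query = {
    "patterns": [("?s", ADDRESS, "?a")],
    "retrieve": ["?a", "?s"],
}

print(check_restriction(compliant_query, ADDRESS))
print(check_restriction(non_compliant_query, ADDRESS))
```

The list of reasons is a crude stand-in for the justification pane: it reports only the part of the query that triggered the decision, in the spirit of divulging no more than is needed to support the compliance claim.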

\subsection{Demo Notes}

This demonstration will work in any Firefox Web browser with a recent version of the Tabulator plugin installed. The demo will also work in any Web browser without the Tabulator plugin. However, without the plugin, a browser will either display the textual output of the reasoner, or prompt the user to save the output to a file on disk.

\section{Summary}

This chapter presented an overview of the system implemented in this thesis. It described a sample scenario, and defined three perspectives on the system. It provided a Web-based demonstration of the three major components of the system: the SPARQL to N3 query converter, the automated policy generator, and the user-friendly reasoner output. The following chapters offer more technical information about the design and implementation of this system.