changeset 355:3d0f96c49ec5 default tip

Updated paper
author Sigurd Meldgaard <stm@daimi.au.dk>
date Wed, 15 Dec 2010 12:34:53 +0100
parents 4f48ec3417bb
children
files paper/pysmcl.tex
diffstat 1 files changed, 261 insertions(+), 87 deletions(-) [+]
line wrap: on
line diff
--- a/paper/pysmcl.tex	Wed Dec 15 12:03:21 2010 +0100
+++ b/paper/pysmcl.tex	Wed Dec 15 12:34:53 2010 +0100
@@ -1,4 +1,4 @@
-\documentclass[11pt,a4paper]{article}
+\documentclass[11pt]{llncs}
 \usepackage[utf8]{inputenc}
 % \usepackage[danish]{babel}
 \usepackage[T1]{fontenc}
@@ -8,8 +8,6 @@
 \usepackage{hyperref}
 \usepackage{url}
 \usepackage{fullpage}
-%\usepackage[firstpage]{draftwatermark}
-%\SetWatermarkText{\input "|hg parents --template='{node}'"}
 
 \newcommand{\s}{\sigma}
 \newcommand{\NP}{\mathcal{NP}}
@@ -205,17 +203,18 @@
 
       \nt{e} &::=& \VAR{x} &\mbox{variable reference}\\
       &$\mid$& $x$[$e$] &\mbox{\rm array indexing}\\
-      &$\mid$& \nt{f}(\nt{e}) &\mbox{\rm function application}\\
+      &$\mid$& \nt{f}( $\overline{\nt{e}}$  ) &\mbox{\rm function application}\\
       &$\mid$ & \nt{n} &\mbox{number}\\
-      &$\mid$ & $e_1\oplus e_2$ &\mbox{binary operator, $\oplus \in \{+,-,*\}$}\\
+      &$\mid$ & $e_1\diamond e_2$ &\mbox{binary operator, $\diamond \in \{+,-,*\}$}\\
       &$\mid$ & $e_1\sim e_2$ &\mbox{comparison, $\sim \in \{==,<,>,<=,>=\}$}\\
-      &$\mid$& \KWD{input}($e_1, e_2, e_3, e_4$) &\rm retrieve secret value identified by $e_1$ from \\ 
+      &$\mid$& \KWD{input}($str, e_2, e_3, e_4$) &\rm retrieve secret value described by 
+      char string $str$ from \\ 
       & & & party \nt{e}$_2$ in the range from $e_3$ to $e_4$\\
       %            &$\mid$& \KWD{Secret}(\nt{e}) &\mbox{\rm result of \nt{e} becomes secret}\\
-      &$\mid$& \KWD{output}(\nt{e}) &\mbox{\rm value of \nt{e}$_2$ becomes public for everyone}\\
-      &$\mid$& \KWD{output}(\nt{e}$_1$,\nt{e}$_2$) &\mbox{\rm value of \nt{e}$_2$ becomes public for identity \nt{e}$_1$}\\
-      &$\mid$& \KWD{result}(\nt{e}$_1$,\nt{e}$_2$) &\mbox{\rm value of \nt{e}$_2$ is marked} \\
-        & & & \mbox{ as the result of the computation}\\
+      &$\mid$& \KWD{output}(\nt{e}) &\mbox{\rm value of \nt{e} becomes public for everyone}\\
+      &$\mid$& \KWD{output}(\nt{e}$_1$,\nt{e}$_2$) &\mbox{\rm value of \nt{e}$_2$ becomes known to only identity \nt{e}$_1$}\\
+      &$\mid$& \KWD{result}(\nt{e}) &\mbox{\rm value of \nt{e} is marked} \\
+        & & & \mbox{ as a result of the computation}\\
       &$\mid$& \KWD{random}() &\mbox{\rm random secret number}\\
       &$\mid$& \KWD{random\_bit}() &\mbox{\rm random secret value in \{0,1\}}\\
       &$\mid$& \KWD{player}() &\mbox{\rm the id of the current player}\\
@@ -470,11 +469,11 @@
 to be left to be done ``by hand''.
 
 An important difference from the work on SMCL is the way we handle the
-information that is released from ``out'' statements. Where in SMCL an
-``out(x)'' statement caused any variable \texttt{x} depends on to be
+information that is released from \KWD{out} statements. Where in SMCL an
+\KWD{out(x)} statement caused any variable \texttt{x} depends on to be
 considered insecure from the view point of the analysis. Our approach
 as described is to require the user to declare exactly what can
-intentionally be leaked (by using ``result''), and then inferring that
+intentionally be leaked (by using \KWD{result}), and then inferring that
 it is OK to reveal what is revealed.
 
 \subsection{Functions returning secret values}
@@ -509,9 +508,14 @@
 variable are guaranteed to lie, assuming that the assertions on the
 input data hold true.
 
-The range analysis is based on values in a finite lattice (by the use
-of a widening function), so we know that a fix-point analysis will
-terminate.
+The range analysis is based on values in a finite lattice using of a
+\emph{widening function}, giving the analysis high precision on small
+numbers (exact up to 128), and then gradually loosing precision
+(approximating by rounding up to the closest power of two) and then
+giving up (represented by a ``bottom'' value) on very high values
+(larger than $2^{32}$). in this way we ensure that a fix-point
+analysis will always terminate. The chosen limits are somewhat
+artificial, but seems to work well for many common tasks.
 
 The current implementation is doing a simplistic analysis on the
 control-flow graph, but it is still useful to use in combination with
@@ -675,95 +679,268 @@
 continue, then it pays off to collect them and wait for all of them at
 the same time.  How to approximate this (which cannot be done in the
 current version) is left for future work.
+
+
 \section{Security in the UC model}
+In this section, we will show using the UC framework that running
+a PySMCL program on top of a correctly implemented run-time system
+is ``as secure as'' an running an idealized program that does the same computation,
+but only outputs the values marked as intended results.
+
+\subsection{Control-safe Programs}
+To prove our result, we will need to assume that the program is what we call
+{\em control-flow safe}:
+
+\begin{enumerate}
+\item VIFF automatically parallellizes some computations
+and the relative ordering of these operations may therefore not be well defined.
+But in PySMCL we make sure to serialize all input and output operations, so they always are executed in the order in which they 
+appear in the control flow. (This may be not be needed, if it already follows from the
+general semantics).
+\item The program is such that for any set of inputs, the following holds:
+if the program executes an output operation making a non-result value $x$ known, 
+it will always at a later point open a set of result-values $y_1,...y_t$ 
+from which $x$ can be simulated ``stand-alone''. More precisely, we assume 
+that, given the inputs, $x$ has distribution
+independent of everything else\footnote{this would be the case if $x$  is the result of masking a secret value with something random, or if $x$ depends deterministically on the inputs.}, and
+there exists a probabilistic polynomial time machine $M_x$ where
+the distribution of $M_x(y_1,...,y_t)$ is statistically indistinguishable from the distribution of
+$x$. 
+\item
+Using the same notation as in the previous item, it must be the case that in every program
+execution, no input operation is executed before all $y_1,...,y_t$ have been opened.
+\end{enumerate}
+
+Of course, the last two conditions are void if $M_x$ does not need any input. In general,
+however, it should be noted that it is not enough to assume that openings of $y_1,..,y_t$ appear
+in the program and that $x$ can be simulated from $y_1,...,y_t$,
+we do need the last two condition to avoid programs that would not be secure. For instance,
+if the program branches on $x$, and in one branch $y_1,...,y_t$ are opened, while in the other branch nothing happens, this clearly means it would not in general be secure to open $x$: we might reveal information on the $y_i$'s even in cases where they were not supposed to be opened. Moreover,
+if a corrupt player were to supply an input value after $x$ is opened but before any $y_i$ is known, he gets a chance to make his input depend on the $y_i$'s because $x$ may 
+depend on the $y_i$'s. So this would not be equivalent to a ideal functionality that only
+reveals the intended results.
+
+Some remarks are in order here, on the extent to which the PySMCL verifier helps
+to ensure that a program is control-flow safe. 
 
-One way to think of the runtime is as an ideal functionality $F$ that
+A first point here is to argue that
+the fact that the analysis only considers functions that are marked by
+\KWD{ideal\_functionality}, is not a problem. This is because
+the only parts of $P$ that can \KWD{output} and \KWD{input}
+information are the ones that are marked this way: Secret information 
+can only be created by
+those functions, and they can only be called from other such
+functions\footnote{in practice they can be
+called, but they need an extra parameter, the \emph{runtime},
+without the runtime the program cannot create or operate on secret values,
+and so such calls cannot do harm, only cause the program to fail.}, and they cannot pass the information to non-analyzed
+functions (unless the user specifically asks for an
+exception). Therefore secret information can only be revealed by an
+analyzed function doing an \KWD{output}. 
+
+Next, note that the verifier will create a proof burden to show the
+relationship between the \KWD{output} value and the \KWD{results},
+this corresponds directly to showing the existence of the machine
+$M_x$ mentioned above. Carrying the proof burden has to be done
+manually at the moment, but once this is done, we actually know that
+the program is control-flow safe, because the preprocessor
+automatically checks the other conditions that relate to the
+control-flow. The proof burden is made by assigning to each
+\KWD{output} the set of \KWD{result} expressions that are guaranteed
+by a control-flow analysis to be met in the future before meeting any
+input-statements.
+
+\subsection{Handling Range Checks on Inputs}
+In PySMCL, \KWD{input} expressions always specify a range that
+the input value is supposed to be contained in, and the program analysis
+is based on the assumption that this is actually true. However, not all
+run-times in VIFF can do such range checks -- in fact,  those that are 
+only secure against passive corruption do not have to. Moreover, several applications
+do the range check on inputs by means that are external to the PySMCL program.
+For instance, the player who is supplying the input is sent an applet that checks 
+whether the inputs satisfy the required conditions, and it is assumed that users
+would not have sufficient time and resources to reverse-engineer the applet.
+
+In such a case, there are no security requirements we can meaningfully
+specify in case the inputs are not in range. The standard way to reflect this in
+the UC framework is to specify that the involved ideal functionality ``breaks down'' 
+if it gets input that is not in range, i.e., it sends its internal state to the 
+adversary/environment, and let him decide all future actions. Since now everything is 
+public and events are under adversarial control, there are no security requirements
+on the protocol implementing the functionality in such a case.
+
+In the following, we will always assume that our functionalities have such
+an ``exception handling'', without explicitly specifying it. This effectively means that we only
+have to prove something under the assumption that all inputs are in range. 
+However, it is trivial to extend our analysis
+and prove a result corresponding to Theorem \ref{thm} below for the case where
+the run-time enforces range checks.
+
+
+\subsection{Modeling Run-Times and Program Executions}
+We will model the run-time system underlying a PySMCL program as
+an ideal functionality $F$ in the UC framework.
+In a nutshell, we think of the runtime as a functionality $F$ that
 can receive inputs from the players, evaluate arithmetic circuits on
 its internal memory and output results. All of this when asked to do
-it by honest players.
+it by honest players. More precisely, it is defined as follows:
+
+\subsubsection{Run-Time Functionality $F$}
+ 
+$F$ keeps at all times an internal list of variables that have been referenced
+in earlier commands, together with their values. The list is initially empty.
+
+$F$ will receive commands from players. $F$ will execute
+a command once all honest players have sent that same command.
+Each time $F$ executes a command, it tells the environment which command
+was executed (but nothing about the values involved).
+Commands are of the following types:
 
-Running a PySMCL program $P$ corresponds to executing the protocol
+\begin{description}
+\item[\KWD{input(i,v, val).}]
+On receiving $input(i,v,val)$ for new variable $v$ from $P_i$ and 
+$input(i,v,\bot)$ from all honest players, $F$ stores $val$ as the value of variable $v$.
+\item[$v_1= n$, $v_1= v_2$]
+On receiving one of these commands 
+(for a number $n$ or existing variable $v_2$), $F$ records $n$ or
+the value of $v_2$ as the new value of $v_1$.
+\item[$v_1=  v_2 \diamond v_3$]
+On receiving this command for existing variables $v_2, v_3$ with values $val_2, val_3$, store $val_2\diamond val_3$ as the value of $v_1$. Here, $\diamond$
+may be any of the arithmetic or comparison operators defined in PySMCL. 
+\item[\KWD{output(v).}] 
+On receiving this command for existing variable $v$, send the value of $v$ to all parties (and to the environment).
+\item[\KWD{output(i,v).}]
+On receiving this command for existing variable $v$, send the value of $v$ to player $P_i$ only.
+\end{description}
+
+
+We now want to model running a PySMCL program $P$ as the execution of a protocol
 $\pi_P$ in a hybrid model where $F$ is available.
 
-$\pi_P$ works as follows: The players follow the instructions of
-$P$. When $P$ contains an \KWD{input} statement the designated player
-sends input to the $F$, when the program contains computation on
-hidden values, the players ask $F$ to do it, and when $P$ contains
-\KWD{output} statements the players ask $F$ to output the target
-values.
-
-Any if-statement with a secret condition will be rewritten, and
-looping is not allowed with a secret condition, so the programs will
-never branch on secret values and the players will never have to
-disagree, or be uncertain on what to do next. The correctness of the
-protocol thus relies on the correctness of the if-rewriting, so the
-computation of $F_P$ is equivalent to that of $P$.
+$\pi_P$ works as follows: We first note that we will rely on what we argued previously, namely that rewriting of if-statements results in an equivalent program. Accordingly, we consider the program as it looks after the rewrite.  This means that there is a uniquely defined 
+control-flow in $P$ that only depends on $P$ and the values that are output from $P$. The protocol now consists of each player
+simulating $P$ locally, using calls 
+to $F$ whenever needed. More precisely, players do the following:
+\begin{itemize}
+\item
+Expressions are evaluated in the standard order, and players assign unique variable names to
+every temporary result. This means executing any assignment or expression evaluation in PySMCL can be reduced to  executing operations of form $v_1= v_2 \diamond v_3$, $v_1= n$, or 
+$v_1= v_2$. Players send the corresponding commands to $F$.
+\item
+When an input expression \KWD{input(str, i, low, high)}
+referring to input from $P_i$ is encountered, players
+output \KWD{input(str, i, low, high)} and send $input(i,v,\bot)$ to $F$
+except $P_i$ who sends $input(i,v,val)$, where $val$ is his input value and
+$v$ is a new variable name.
+\item
+When an \KWD{output(e)} expression is encountered, players output
+send 
+\KWD{output(v)} to $F$ where $v$ is the variable storing the value of $e$.
+Each player outputs the value returned by $F$.
+When an \KWD{output(i,e)} expression is encountered, players send
+\KWD{output(i,v)} to $F$ where $v$ is the variable storing the value of $e$.
+$P_i$ outputs the value returned by $F$, all other players output
+\KWD{output(i,e)}.
+\item
+Players terminate when the program terminates.
+\end{itemize}
 
-Now we define a functionality $F_P$, working just as $F$, but it has a
-new command $P$ that, when called, makes the functionality run the
-program $P$ --- taking inputs when $P$ would, outputting \emph{only}
-what is marked as \KWD{result} in $P$. At all times $F_P$ will tell
-the adversary where in the program execution it is.
+In order to be able to say something about the security of $\pi_p$
+we exploit the fact that $P$ can also be seen in a natural way as a 
+specification of an ideal functionality $F_P$, namely one that runs
+$P$ but only outputs the intended results. More precisely, we define
+$F_P$ as follows:
+
+\subsubsection{Program Functionality $F_P$}
 
-We now want to show that $\pi_P$ with access to $F$ is implementing
-$F_P$.
+$F_P$ knows the program $P$ it is supposed to execute. It also has
+a (initially empty) list of internal variables together with their current values.
 
-The main property we need to show, is that any \KWD{output} that $\pi_P$
-will do, can only release information (do an \KWD{output}) when there is
-a way to simulate this output from the information $F_P$ outputs from
-the $result$ instructions, as of now the verifier will print a proof
-burden for each \KWD{output} not directly within an \KWD{result} to show
-the consequences of the program.
+$F_P$ initially waits to receive a ``start'' command from all honest players.
+It will then start executing $P$ using the same control flow that players use
+in $\pi_P$. The only special provisions needed are for statements involving
+input or output: 
 
-A function marked \KWD{ideal\_functionality} should not be called from
-functions that are not themselves marked (in practice they can be
-called, but they are given an extra parameter, the \emph{runtime},
-without the runtime program cannot create or operate on secret values,
-and so cannot do harm, only cause the program to fail).
-
-Also we need to show that the only parts of $\pi_P$ that can \KWD{output}
-information are the ones that are marked by
-\KWD{@ideal\_functionality}: Secret information can only be created by
-those functions, and they can only be called from other such
-functions, and they cannot pass the information to non-analyzed
-functions (unless the user specifically asks for an
-exception). Therefore secret information can only be revealed by an
-analyzed function doing an \KWD{output}, and there will be created a
-proof burden to show the relationship between the \KWD{outputted}'ed value and
-the \KWD{result}.
+\begin{description}
+\item[\KWD{input.}]
+When an input expression \KWD{input(str, i, low, high)}
+referring to input from $P_i$ is encountered, 
+send  \KWD{input(str, i, low, high)} 
+to all players and wait to receive a value 
+\KWD{val} from $P_i$.Then
+continue the computation, using \KWD{val} as the value for the expression.
+\item[\KWD{output(e).}]
+When an \KWD{output(e)} expression for a non-result value is encountered,
+store the value of $e$ in a separate variable, output nothing, and continue the computation. 
+When an output expression for a \KWD{result} value is encountered, send
+the value of $e$ to the environment. When the environment sends \KWD{deliver} back,
+send the value to all players.
+\item[\KWD{output(i,e).}]
+When an \KWD{output(i,e)} expression is encountered, if $P_i$ is corrupt,
+send the value of $e$ to the environment. When the environment sends \KWD{deliver} back,
+send a notification \KWD{output(i,e)} to all players. If $P_i$ is honest,
+send a notification \KWD{output(i,e)}  to the environment. When the 
+environment sends \KWD{deliver} back, send the value of $e$ to $P_i$ only, and
+a notification \KWD{output(i,e)} to all other players. 
+\end{description}
 
-The security of the implemented protocol is now inherited from the
-primitives of the runtime. Because they are proven to be UC-secure and
-disclose no information at all, any composition of them will have the same
-property.
+\bigskip
 
-
-Whenever $\pi_P$ outputs information, this information is coming from
-an \KWD{output} statement (because $\pi_P$ is composed of UC-secure
-primitives releasing no information at all).
+\subsection{The Result}
+We now want to show that $\pi_P$ with access to $F$ securely implements
+$F_P$. More precisely, we have
 
-A simulator for $\pi_P$ with access to $F_P$ works as follows: it runs
-$F_P$, that will output only the information marked as \KWD{result} in
-the program $P$. The simulator records this, and to simulate the
-transcript of $\pi_P$ it uses the algorithm (that was proved to exist
-by the user as a fulfillment of the proof burden) for deriving a value
-indistinguishable from the \KWD{output} from the \KWD{result}.
-
-In VIFF there exists several different runtimes with different
-security and performance characteristics with respect to correctness,
-passive/active security, termination etc. The implemented protocol
-will be as secure as the chosen runtime. At the time of writing VIFF
+\begin{theorem}
+\label{thm}
+If $P$ is control-flow safe, then $\pi_P$ with access to $F$ implements
+$F_P$ with statistical security against unbounded environments
+corrupting any number of players.
+\end{theorem}
+\begin{remark}
+The security statement claimed here may seem very strong, but this is
+because modeling the run-time as the functionality $F$ means that 
+we abstract away any limitations that a concrete implementation
+of $F$ might have. In particular, in VIFF there exists several different 
+runtimes with different security and performance characteristics with respect to correctness,
+passive/active security, termination etc. Since the Theorem works with 
+no limitations on the environment, the UC composition theorem
+implies that
+running a program based on a concrete run-time will essentially be as
+as secure as the chosen runtime. At the time of writing VIFF
 has runtimes for both actively and passively corrupted players with
 perfect security based on the GMW87-protocol, and a cryptographic
 runtime with self-trust. %TODO cite properly
+\end{remark}
+\begin{proof}
+We prove the statement by exhibiting an algorithm for a simulator, that is a program with access
+  to the ideal functionality $F_P$ that can create a view of the
+  protocol indistinguishable from that of $\pi_P$ running on top of the ideal runtime $F$. The only real source of leaked information is that from ``output'' statements.
+
+The simulator starts by sending \KWD{start} to $F_P$, and then follow the structure of the program.
+
+\begin{description}
+\item[\KWD{input.}]  When an input expression \KWD{input(str, i, low,
+    high)} is encountered, send \KWD{input(str, i, low, high)} let all
+  players send input to $F_P$.
+%\item[\KWD{result(e).}]
+%When a result expression \KWD{result(e)} is encountered, wait for $F_P$ to send the result.
+\item[\KWD{output(e).}]
+When an \KWD{output(e)} expression for a non-result value is encountered $F_P$ will not output anything, but continue computation. By assuming the program is control-flow safe, we know that $F_P$ will come through all the result calls necessary for computing a value that is statistically indistinguishable from that of \KWD{e} without requiring anymore interaction. When that has happened the simulator uses those results to compute a value \KWD{e} and outputs that.
+
+\end{description}
+
+%Specify an algorithm for the simulator, argue that it works.
+%Algorithm should follow structure for $\pi_P$ and $F_P$.
+\end{proof}
+
 
 \begin{figure}[htbp]
   \centering
   \includegraphics[width=0.5\linewidth]{uc}
   \caption{For the computation to be secure, no adversary must be able
-    to distinguish between communicating with the simulator and
-    communicating with the real protocol running on top of the ideal
-    runtime.}
+    to distinguish between communicating with the real protocol
+    running on top of the ideal runtime and communicating with the
+    simulator.}
 \end{figure}
 
 
@@ -906,10 +1083,6 @@
 Installation instructions
 is in the \verb|INSTALL| text file.
 
-%\input{core}
-
-%\input references.tex
-
 \section{Conclusion}
 We have described an embedded domain specific language for describing
 secure multiparty computations, and the analysis we do to ensure
@@ -932,5 +1105,6 @@
 
 % Local Words (en_US): preprocessing multi preprocessor deferreds
 % Local Words (en_US): Composability analyses runtimes
+% Local Words (en_US): runtime functionalities
 % Local Words: MPC CACE PySMCL SMCL VIFF