Buckets:

|
download
raw
120 kB

Computable Stochastic Processes

Pieter Collins
Department of Knowledge Engineering
Maastricht University
pieter.collins@maastrichtuniversity.nl

15 September 2014

Abstract

The aim of this paper is to present an elementary computable theory of probability, random variables and stochastic processes. The probability theory is based on existing approaches using valuations and lower integrals. Various approaches to random variables are discussed, including the approach based on completions in a Polish space. We apply the theory to the study of stochastic dynamical systems in discrete-time, and give a brief exposition of the Wiener process as a foundation for stochastic differential equations. The theory is based within the framework of type-two effectivity, so has an explicit direct link with Turing computation, and is expressed in a system of computable types and operations, so has a clean mathematical description.

1 Introduction

In this paper, we present a computable theory of probability, random variables and stochastic processes, with the aim of providing a theoretical foundation for the rigorous numerical analysis of discrete-time continuous-state Markov chains and stochastic differential equations. The first part of the paper provide an exposition of the approach to probability distributions using valuations and the development of integrals of positive lower-semicontinuous and of bounded continuous functions, and on the approach to random variables as limits of almost-everywhere defined continuous partial functions. In the second part, we show that our approach allows one to very quickly derive computability results for discrete-time stochastic processes. In the third part, we provide a new construction of the Wiener process in which sample paths are effectively computable, and use this to show that the solutions to stochastic differential equations can be effectively computed.

An early approach to constructive measure theory was developed in [BC72]; see also [BB85]. The standard approach to a constructive theory of probability measures, as developed in [JP89, Eda95a, SS06, Esc09], is through valuations, which are measures restricted to open sets. The most straightforward approach to integration is the Choquet or horizontal integral, a lower integral introduced within the framework of domain theory in [Tix95]; see also [Kön97, Law04]. The lower integral on valuations in the form used here was given in [Vic08]. Relationships between the constructive and classical approaches were given in [Eda95a]. Explicit representations of valuations within the framework of type-two effectivity were given in [Sch07], and representation of probability measures using probabilistic processes were given by [SS06]. In [Esc09], a language EPCL for nondeterministic and probabilistic computation was given, based on the PCL language of [Esc04]. In [HR09], a theory of measure was developed for the study of algorithmic randomness.A constructive theory of measurable functions was also developed in [BC72, BB85]. The theory is developed using abstract integration spaces, and the integral is extended from test functions to integrable functions by taking limits. However, the approach we use here, in which measurable functions are defined as limits of effectively-converging Cauchy sequences of continuous functions was introduced in [Spi03] and further developed in [Spi06, CS09]. Random variables over discrete domains were defined in [Mis07], based on work of [Var02]. This was extended to random variables over continuous domains in [GLV11], but the construction allows only for continuous random variables, and is overly-restrictive in practice.

To the best of our knowledge there has been relatively little work on constructive and computable approaches to stochastic processes. An early constructive theory of discrete-time stochastic processes focusing on stopping times was given in [Cha72]. A fairly comprehensive theory though technically advanced theory based on stochastic relations is developed in [Dob07]; the approach here is considerably simpler. The monadic properties of the lower integral on valuations, were noted by [Vic11], and of the completion construction by [OS10]. .

We use the framework of type-two effectivity (TTE), in which computations are performed by Turing machines working on infinite sequences, as a foundational theory of computability. We believe that this framework is conceptually simpler for non-specialists than the alternative of using a domain-theoretic framework. Since in TTE we work entirely in the class of quotients of countably-based (QCB) spaces, which form a cartesian closed category, many of the basic operations can be carried out using simple type-theoretic constructions such as the $\lambda$ -calculus.

We assume that the reader has a basic familiarity with classical probability theory (see e.g. [Shi95]) and stochastic processes (see [Fri75, WI81, GS04]). Much of this article is concerned with giving computational meaning to classical concepts and arguments. The main difficulty lies in the use of $\sigma$ -algebras in classical probability, which have poor computability properties. Instead, we use only topological constructions, which can usually be effectivised directly. In particular, we define types of measurable functions as a completion of types of continuous functions.

2 Computable Analysis

In the theory of type-two effectivity, computations are performed by Turing machines acting on sequences over some alphabet $\Sigma$ . A computation performed by a machine $\mathcal{M}$ is valid on an input $p \in \Sigma^\omega$ if the computation does not halt, and writes infinitely many symbols to the output tape. A type-two Turing machine therefore performs a computation of a partial function $\eta : \Sigma^\omega \rightharpoonup \Sigma^\omega$ ; we may also consider multi-tape machines computing $\eta : (\Sigma^\omega)^n \rightharpoonup (\Sigma^\omega)^m$ . It is straightforward to show that any machine-computable function $\Sigma^\omega \rightharpoonup \Sigma^\omega$ is continuous on its domain.

In order to relate Turing computation to functions on mathematical objects, we use representations of the underlying sets, which are partial surjective functions $\delta : \Sigma^\omega \rightharpoonup \mathbb{X}$ . An operation $\mathbb{X} \rightarrow \mathbb{Y}$ is $(\delta_{\mathbb{X}}; \delta_{\mathbb{Y}})$ -computable if there is a machine-computable function $\eta : \Sigma^\omega \rightharpoonup \Sigma^\omega$ with $\text{dom}(\eta) \supset \text{dom}(\delta_{\mathbb{X}})$ such that $\delta_{\mathbb{Y}} \circ \eta = f \circ \delta_{\mathbb{X}}$ on $\text{dom}(\delta_{\mathbb{X}})$ . Representations are equivalent if they induce the same computable functions. If $\mathbb{X}$ is a topological space, we say that a representation $\delta$ of $\mathbb{X}$ is an admissible quotient representation if (i) whenever $f : \mathbb{X} \rightarrow \mathbb{Y}$ is such that $f \circ \delta$ is continuous, then $f$ is continuous, and (ii) whenever $\phi : \Sigma^\omega \rightharpoonup \mathbb{X}$ is continuous, there exists continuous $\eta : \Sigma^\omega \rightharpoonup \Sigma^\omega$ such that $\phi = \delta \circ \eta$ . A computable type is a pair $(\mathbb{X}, [\delta])$ where $\mathbb{X}$ is a space and $[\delta]$ is an equivalence class of admissible quotient representations of $\mathbb{X}$ . A multi-valued function $F : \mathbb{X} \rightrightarrows \mathbb{Y}$ is computably selectable if there is a machine-computable function $\eta : \Sigma^\omega \rightharpoonup \Sigma^\omega$ with $\text{dom}(\eta) \supset \text{dom}(\delta_{\mathbb{X}})$ such that $\delta_{\mathbb{Y}} \circ \eta \in F \circ \delta_{\mathbb{X}}$ on $\text{dom}(\delta_{\mathbb{X}})$ ; note that different names of $x \in \mathbb{X}$ may give rise to different values of $y \in \mathbb{Y}$ .The category of computable types with continuous functions is Cartesian closed, and the computable functions yield a Cartesian closed subcategory. For any types $\mathbb{X}, \mathbb{Y}$ there exist a canonical product type $\mathbb{X} \times \mathbb{Y}$ with computable projections $\pi_{\mathbb{X}} : \mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{X}$ and $\pi_{\mathbb{Y}} : \mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{Y}$ , and a canonical exponential type $\mathbb{Y}^{\mathbb{X}}$ such that evaluation $\epsilon : \mathbb{Y}^{\mathbb{X}} \times \mathbb{X} \rightarrow \mathbb{Y} : (f, x) \mapsto f(x)$ is computable. Since objects of the exponential type are continuous function from $\mathbb{X}$ to $\mathbb{Y}$ , we also denote $\mathbb{Y}^{\mathbb{X}}$ by $\mathbb{X} \rightarrow \mathbb{Y}$ or $\mathcal{C}(\mathbb{X}; \mathbb{Y})$ ; in particular, whenever we write $f : \mathbb{X} \rightarrow \mathbb{Y}$ , we imply that $f$ is continuous. There is a canonical equivalence between $(\mathbb{X} \times \mathbb{Y}) \rightarrow \mathbb{Z}$ and $\mathbb{X} \rightarrow (\mathbb{Y} \rightarrow \mathbb{Z})$ given by $\tilde{f}(x) : \mathbb{Y} \rightarrow \mathbb{Z} : \tilde{f}(x)(y) = f(x, y)$ .

There are canonical types representing basic building blocks of mathematics, including the natural number type $\mathbb{N}$ and the real number type $\mathbb{R}$ . We use a three-valued logical type with elements ${\mathsf{F}, \mathsf{T}, \perp}$ representing false, true, and indeterminate or unknowable, and its subtypes the Boolean type $\mathbb{B}$ with elements ${\mathsf{F}, \mathsf{T}}$ and the Sierpinski type $\mathbb{S}$ with elements ${\mathsf{T}, \perp}$ . Given any type $\mathbb{X}$ , we can identify the type $\mathcal{O}(\mathbb{X})$ of open subsets $U$ of $\mathbb{X}$ with $\mathbb{X} \rightarrow \mathbb{S}$ via the characteristic function $\chi_U$ . Further, standard operations on these types, such as arithmetic on real numbers, are computable.

A sequence $(x_n)$ is an effective Cauchy sequence if $d(x_m, x_n) < \epsilon_{\max(m,n)}$ where $(\epsilon_n){n \in \mathbb{N}}$ is a known computable sequence with $\lim{n \rightarrow \infty} \epsilon_n = 0$ , and a strong Cauchy sequency if $\epsilon_n = 2^{-n}$ . The limit of an effective Cauchy sequence of real number is computable.

We shall also need the type $\mathbb{H} \equiv \mathbb{R}_{<}^{+, \infty}$ of positive real numbers with infinity under the lower topology. The topology on the lower halfline $\mathbb{H}$ is the topology of lower convergence, with open sets $(a, \infty]$ for $a \in \mathbb{R}^+$ and $\mathbb{H}$ itself. A representation of $\mathbb{H}$ then encodes an increasing sequence of positive rationals with the desired limit. We note that the operators $+$ and $\times$ are computable on $\mathbb{H}$ , where we define $0 \times \infty = \infty \times 0 = 0$ , as is countable supremum $\sup : \mathbb{H}^\omega \rightarrow \mathbb{H}$ , $(x_0, x_1, x_2, \dots) \mapsto \sup{x_0, x_1, x_2, \dots}$ . Further, $\text{abs} : \mathbb{R} \rightarrow \mathbb{H}$ is computable, as is the embedding $\mathbb{S} \hookrightarrow \mathbb{H}$ taking $\mathsf{T} \mapsto 1$ and $\perp \mapsto 0$ . We let $\mathbb{I}_{<}$ be the unit interval $[0, 1]$ , again with the topology of lower convergence with open sets $(a, 1]$ for $a \in [0, 1)$ and $\mathbb{I}$ itself, and $\mathbb{I}_{>}$ the interval with the topology of upper convergence.

A computable metric space is a pair $(\mathbb{X}, d)$ where $\mathbb{X}$ is a computable type, and $d : \mathbb{X} \times \mathbb{X} \rightarrow \mathbb{R}^+$ is a computable metric, such that the extension of $d$ to $\mathbb{X} \times \mathcal{A}(\mathbb{X})$ defined by $d(x, A) = \inf{d(x, y) \mid y \in A}$ is computable as a function into $\mathbb{R}_{<}^{+, \infty}$ . This implies that given an open set $U$ we can compute $\epsilon > 0$ such that $B_\epsilon(x) \subset U$ , which captures the relationship between the metric and the open sets. The effective metric spaces of [Wei99] are a concrete class of computable metric space.

A type $\mathbb{X}$ is effectively separable if there is a computable function $\xi : \mathbb{N} \rightarrow \mathbb{X}$ such that $\text{rng}(\xi)$ is dense in $\mathbb{X}$ .

Throughout this paper we shall use the term “compute” to indicate that a formula or procedure can be effectively carried out in the framework of type-two effectivity. Other definitions and equations may not be possible to verify constructively, but hold from axiomatic considerations.

3 Computable Measure Theory

The main difficulty with classical measure theory is that Borel sets and Borel measures have very poor computability properties. Although a computable theory of Borel sets was given in [Bra05], the measure of a Borel set is in general not computable in $\mathbb{R}$ . However, we can consider an approach to measure theory in which we may only compute the measure of open sets. Since open sets are precisely those which can be approximated from inside, we expect to be able to compute lower bounds for the measure of an open set, but not upper bounds. The above considerations suggest an approach which has become standard in computable measure theory, namely that using valuations [JP89, Eda95a, SS06, Esc09].Definition 1 (Valuation). The type of (continuous) valuations on $\mathbb{X}$ is the subtype of continuous functions $\nu : \mathcal{O}(\mathbb{X}) \rightarrow \mathbb{H}$ satisfying $\nu(\emptyset) = 0$ and the modularity condition $\nu(U) + \nu(V) = \nu(U \cup V) + \nu(U \cap V)$ for all $U, V \in \mathcal{O}(\mathbb{X})$ .

A valuation $\nu$ on $\mathbb{X}$ is finite if $\nu(\mathbb{X})$ is finite, effectively finite if $\nu(\mathbb{X})$ is a computable real number, and locally finite if $\nu(U) < \infty$ for any $U$ with compact closure. An effectively finite valuation computably induces an upper-valuation on closed sets $\bar{\nu} : \mathcal{A}(\mathbb{X}) \rightarrow \mathbb{R}_{>}^+$ by $\bar{\nu}(A) = \nu(\mathbb{X}) - \nu(\mathbb{X} \setminus A)$ . The following proposition gives standard monotonicity and convergence properties for type of valuations.

Proposition 2. Let $\nu : \mathcal{O}(\mathbb{X}) \rightarrow \mathbb{H}$ be continuous. Then $\nu$ satisfies the monotonicity condition $\nu(U) \leq \nu(V)$ whenever $U \subset V$ , and the continuity condition $\nu(\bigcup_{n=0}^{\infty} U_n) = \lim_{n \rightarrow \infty} \nu(U_n)$ whenever $U_n$ is an increasing sequence of open sets.

The proof is immediate from properties of continuous functions into $\mathbb{H}$ . An immediate consequence is that $\nu(U) \leq \bar{\nu}(A)$ whenever $U \subset A$ .

An explicit representation of valuations $\mathcal{M}[0, 1]$ on the unit interval was given in [Wei99] using the basic open sets $I_{a,b,r} = {\nu \in \mathcal{M}[0, 1] \mid \nu(a, b) > r}$ for $a, b, r \in \mathbb{Q}$ with $0 \leq a < b \leq 1$ and $r > 0$ . Various representations for arbitrary spaces were given in [Sch07].

The following theorem [Eda95b, Corollary 5.3] shows that valuations and measures are equivalent on locally-compact Hausdorff spaces.

Theorem 3. On a countably-based locally-compact Hausdorff space, finite Borel measures and continuous valuations are in one-to-one correspondence.

In [AM02], it was shown that any continuous valuation on a locally compact sober space extends to a unique Borel measure. This result provides a link with classical measure theory, but are not needed for a purely constructive approach; valuations themselves are the objects of study, and we only (directly) consider the measure of open and closed sets.

The following result shows that the measure of a sequence of small sets approaches zero. Recall that a space $\mathbb{X}$ is regular if for any point $x$ and open set $U$ , there exists an open set $V$ and a closed set $A$ such that $x \in V \subset A \subset U$ .

Lemma 4. Let $\mathbb{X}$ be a separable regular space, and $\nu$ a finite valuation on $\mathbb{X}$ . If $U_n$ is any sequence of open sets such that $U_{n+1} \subset U_n$ and $\bigcap_{n=0}^{\infty} U_n = \emptyset$ , then $\nu(U_n) \rightarrow 0$ as $n \rightarrow \infty$ .

Proof. Since $\mathbb{X}$ is separable and regular, there exist open sets $V_{n,k}$ and closed sets $A_{n,k}$ such that $V_{n,k} \subset A_{n,k} \subset V_{n,k+1} \subset U_n$ and $\bigcup_{k \rightarrow \infty} V_{n,k} = U_n$ . Then $\lim_{k \rightarrow \infty} \nu(V_{n,k}) = \nu(U_n)$ . Suppose $\inf_{n \in \mathbb{N}} \nu(U_n) > \epsilon > 0$ . Choose a sequence $\delta_n$ such that $\sum_{n=0}^{\infty} \delta_n = \delta < \epsilon$ , and sets $V_n \subset A_n \subset U_n$ such that $\nu(V_n) \geq \nu(U_n) - \delta_n$ . Then $\bigcap_{n=0}^N A_n = U_N \setminus \bigcup_{n=0}^N (U_n \setminus A_n)$ , so $\nu(\bigcap_{n=0}^N A_n) \geq \nu(U_N) - \sum_{n=0}^N \nu(U_n \setminus A_n) \geq \nu(U_N) - \sum_{n=0}^N (\nu(U_n) - \nu(A_n)) \geq \nu(U_N) - \sum_{n=0}^N \delta_n \geq \epsilon - \delta > 0$ . Since $\nu$ is upper-continuous on closed sets, $\nu(\bigcap_{n=0}^{\infty} A_n) \geq \epsilon - \delta > 0$ . Then $\bigcap_{n=0}^{\infty} A_n \neq \emptyset$ , contradicting $\emptyset = \bigcap_{n=0}^{\infty} U_n \supset \bigcap_{n=0}^{\infty} A_n$ . $\square$

Definition 5. Given a sub-topology $\mathcal{V}$ on $\mathbb{X}$ and a valuation $\nu$ on $\mathbb{X}$ , a conditional valuation is a function $\nu(\cdot|\cdot) : \mathcal{O}(\mathbb{X}) \times \mathcal{V} \rightarrow \mathbb{H}$ such that $\nu(U \cap V) = \nu(U|V)\nu(V)$ for all $U \in \mathcal{O}(\mathbb{X})$ and $V \in \mathcal{V}$ .

Clearly, $\nu$ can be computed given $\nu$ restricted to $\mathcal{V}$ and $\nu(\cdot|\cdot)$ . The conditional valuation $\nu(\cdot|V)$ is uniquely defined if $\nu(V) \neq 0$ . However, since $\nu(U \cap V) : \mathbb{R}_{<}^+$ but $1/\nu(V) : \mathbb{R}_{>}^{+,\infty}$ , the conditional valuation $\nu(\cdot|V)$ cannot be computed unless we are also given a set $A \in \mathcal{A}(\mathbb{X})$ such that $V \subset A$ and $\bar{\nu}(A \setminus V) = 0$ , in which case we have $\nu(U|V) = \nu(U \cap V)/\bar{\nu}(A)$ . We define the $\nu$ -regular setsas those for which $\nu(\partial V) = 0$ , so that $\nu(U|V)$ is continuous for open $U$ and non-null $\nu$ -regular $V$ .

Just as for classical probability, we say (open) sets $U_1, U_2$ are independent if $\nu(U_1 \cap U_2) = \nu(U_1)\nu(U_2)$ .

We can define a notion of integration for positive lower-semicontinuous functions by the Choquet or horizontal integral; see [Tix95, Law04, Vic08].

Definition 6 (Lower horizontal integral). Given a valuation $\nu : (\mathbb{X} \rightarrow \mathbb{S}) \rightarrow \mathbb{H}$ , define the lower integral $(\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H}$ by

Xψdν=sup{m=1n(pmpm1)ν(ψ1(pm,])(p0,,pn)Q and 0=p0<p1<<pn}.(1)\int_{\mathbb{X}} \psi \, d\nu = \sup \left\{ \sum_{m=1}^n (p_m - p_{m-1}) \nu(\psi^{-1}(p_m, \infty]) \mid (p_0, \dots, p_n) \in \mathbb{Q}^* \text{ and } 0 = p_0 < p_1 < \dots < p_n \right\}. \quad (1)

Note that we could use any dense set of computable positive real numbers, such as the dyadic rationals $\mathbb{Q}_2$ , instead of the rationals in (1). Since each sum is computable, and the supremum of countably many elements of $\mathbb{H}$ is computable, we immediately obtain:

Proposition 7. Given names of a valuation $\nu$ in $(\mathbb{X} \rightarrow \mathbb{S}) \rightarrow \mathbb{H}$ and of a function $\psi$ in $\mathbb{X} \rightarrow \mathbb{H}$ , the lower integral $\int_{\mathbb{X}} \psi , d\nu$ is computable in $\mathbb{H}$ .

Note that although an alternative form for the sum is given through the equality

m=1n(pmpm1)ν(ϕ1(pm,])=m=1npmν(ϕ1(pm,pm+1])\sum_{m=1}^n (p_m - p_{m-1}) \nu(\phi^{-1}(p_m, \infty]) = \sum_{m=1}^n p_m \nu(\phi^{-1}(p_m, p_{m+1}])

where $p_{n+1} = \infty$ , the lower integral cannot be computed in this form since $\nu(\phi^{-1}(p_m, p_{m+1}]) = \nu(\phi^{-1}(p_m, \infty]) - \nu(\phi^{-1}(p_{m+1}, \infty])$ is uncomputable in $\mathbb{H}$ .

It is fairly straightforward to show that the integral is linear,

X(a1ψ1+a2ψ2)dν=a1Xψ1dν+a2Xψ2dν(2)\int_{\mathbb{X}} (a_1 \psi_1 + a_2 \psi_2) \, d\nu = a_1 \int_{\mathbb{X}} \psi_1 \, d\nu + a_2 \int_{\mathbb{X}} \psi_2 \, d\nu \quad (2)

for all $a_1, a_2 \in \mathbb{H}$ and $\psi_1, \psi_2 : \mathbb{H} \rightarrow \mathbb{H}$ .

If $\chi_U$ is the characteristic function of a set $U$ , then $\int_{\mathbb{X}} \chi_U , d\nu = \nu(U)$ , and it follows that if $\phi = \sum_{i=1}^n a_i \chi_{U_i}$ is a step function, then $\int_{\mathbb{X}} \phi , d\nu = \sum_{i=1}^n a_i \nu(U_i)$ .

Given a (lower-semi)continuous linear functional $\mu : (\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H}$ , we can define a function $\mathcal{O}(\mathbb{X}) \rightarrow \mathbb{H}$ by $U \mapsto \mu(\chi_U)$ for $U \in \mathcal{O}(\mathbb{X})$ . By linearity,

μ(χU)+μ(χV)=μ(χUV)+μ(χUV).\mu(\chi_U) + \mu(\chi_V) = \mu(\chi_{U \cap V}) + \mu(\chi_{U \cup V}).

Hence $\mu$ induces a valuation on $\mathbb{X}$ . We therefore obtain a computable equivalence between the type of valuations and the type of positive linear lower-semicontinuous functionals:

Theorem 8. The type of valuations $(\mathbb{X} \rightarrow \mathbb{S}) \rightarrow \mathbb{H}$ is computably equivalent to the type of continuous linear functionals $(\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H}$ .

Types of the form $(\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ for a fixed type $\mathbb{T}$ form a monad [Str72] over $\mathbb{X}$ , and are particularly easy to work with.

In [Eda95a, Section 4], a notion of integral $\mathcal{C}_{\text{bd}}(\mathbb{X}; \mathbb{R}) \rightarrow \mathbb{R}$ on continuous bounded functions was introduced based on the approximation by measures supported on finite sets of points. Our lower integral on positive lower-semicontinuous functions can be extended to bounded functions as follows:

Definition 9 (Bounded integration). A valuation $\mu$ on $\mathbb{X}$ is effectively finite if there is a (known) computable real $c \in \mathbb{R}$ such that $\mu(\mathbb{X}) = c$ .An upper-semicontinuous function $f : \mathbb{X} \rightarrow \mathbb{R}_>$ is effectively bounded if there is a (known) computable real $b \in \mathbb{R}$ such that $f(x) < b$ for all $x \in \mathbb{X}$ .

If $\mu$ is effectively finite and $f$ is effectively bounded, then the function $b - f : \mathbb{X} \rightarrow \mathbb{R}<$ is computable (given names of $b$ and $f$ ), and we define the integral $\mathcal{C}_{\text{bd}}(\mathbb{X}; \mathbb{R}_>) \rightarrow \mathbb{R}>$ by

Xf(x)dμ(x)=bcX(bf(x))dμ(x).\int_{\mathbb{X}} f(x) d\mu(x) = b c - \int_{\mathbb{X}} (b - f(x)) d\mu(x).

Similarly, if $f : \mathbb{X} \rightarrow \mathbb{R}<$ has a computable lower bound $a$ , we define the integral $\mathcal{C}{\text{bd}}(\mathbb{X}; \mathbb{R}<) \rightarrow \mathbb{R}<$ by

Xf(x)dμ(x)=X(a+f(x))dμ(x)ac.\int_{\mathbb{X}} f(x) d\mu(x) = \int_{\mathbb{X}} (a + f(x)) d\mu(x) - a c.

A continuous function $f : \mathbb{X} \rightarrow \mathbb{R}$ is effectively bounded if there are a (known) computable reals $a, b \in \mathbb{R}$ such that $a < f(x) < b$ for all $x \in \mathbb{X}$ . Then we define the integral $\mathcal{C}_{\text{bd}}(\mathbb{X}; \mathbb{R}) \rightarrow \mathbb{R}$ by

Xf(x)dμ(x)=X(a+f(x))dμ(x)ac=bcX(bf(x))dμ(x).\int_{\mathbb{X}} f(x) d\mu(x) = \int_{\mathbb{X}} (a + f(x)) d\mu(x) - a c = b c - \int_{\mathbb{X}} (b - f(x)) d\mu(x).

It is clear that the integrals defined above are computable in $\mathbb{R}_\geq$ and that the lower and upper integrals agree if $f$ is continuous. If $\mathbb{X}$ is compact, then any (semi)continuous function is effectively bounded, so the integrals always exist.

In order to define a valuation given a positive linear functional $\mathcal{C}{\text{cpt}}(\mathbb{X}; \mathbb{R}) \rightarrow \mathbb{R}$ on compactly-supported continuous functions, we need some way of approximating the characteristic function of an open set by continuous functions. If $\mathbb{X}$ is effectively regular, then given any open set $U$ , we can construct an increasing sequence of closed sets $A_n$ such that $\bigcup{n \rightarrow \infty} A_n = U$ . Further, a type $\mathbb{X}$ is effectively quasi-normal if given disjoint closed sets $A_0$ and $A_1$ , we can construct a continuous function $\phi : \mathbb{X} \rightarrow [0, 1]$ such that $\phi(A_0) = {0}$ and $\phi(A_1) = {1}$ using an effective Uryshon lemma; see [Sch09] for details.

We then have an effective version of the Riesz representation theorem:

Theorem 10. Suppose $\mathbb{X}$ is an effectively regular and quasi-normal type. Then type of locally-finite valuations $(\mathbb{X} \rightarrow \mathbb{S}) \rightarrow \mathbb{H}$ is effectively equivalent to the type of positive linear functionals $\mathcal{C}_{\text{cpt}}(\mathbb{X} \rightarrow \mathbb{R}) \rightarrow \mathbb{R}$ on continuous functions of compact support.

We consider lower-semicontinuous functionals $(\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H}$ to be more appropriate as a foundation for computable measure theory than the continuous functionals $(\mathbb{X} \rightarrow \mathbb{R}) \rightarrow \mathbb{R}$ , since the equivalence given by Theorem 8 is entirely independent of any assumptions on the type $\mathbb{X}$ whereas the equivalence of Theorem 10 requires extra properties of $\mathbb{X}$ and places restrictions on the function space.

A similar monadic approach to probability measures [Esc09] based on type theory identified the type of probability measures on the Cantor space $\Omega = {0, 1}^\omega$ with the type of integrals $(\Omega \rightarrow \mathbb{I}) \rightarrow \mathbb{I}$ where $\mathbb{I} = [0, 1]$ is the unit interval.

4 Computable Random Variables

A computable theory of random variables should, at a minimum, enable us to perform certain basic operations, including:

  • (i) Given a random variable $X$ and open set $U$ , compute lower-approximation to $\mathbb{P}(X \in U)$ .
  • (ii) Given random variables $X_1, X_2$ , compute the random variable $X_1 \times X_2$ giving the joint distribution.
  • (iii) Given a random variable $X$ and a continuous function $f$ , compute the image $f(X)$ .- (iv) Given a sequence of random variables $X_1, X_2, \dots$ converging effectively in probability, compute a limit random variable $X_\infty = \lim_{m \rightarrow \infty} X_m$ .
  • (v) Given a probability distribution $\nu$ on a sufficiently nice space $\mathbb{X}$ , compute a random variable $X$ with distribution $\nu$ .
  • (vi) Given random variables $X_1, X_2$ , compute a random variable $X_1 \otimes X_2$ such that $\mathbb{P}(X_1 \otimes X_2) \in (U_1 \times U_2) = \mathbb{P}(X_1 \in U_1)\mathbb{P}(X_2 \in U_2)$ .

Property (i) states that we can compute the distribution of a random variable, while property (ii) implies that a random variable is more than its distribution; it also allows us to compute its joint distribution with another random variable. Property (iii) also implies that for random variables $X_1, X_2$ on a computable metric space $(\mathbb{X}, d)$ , the random variable $d(X_1, X_2)$ is computable in $\mathbb{R}^+$ , so the probability $\mathbb{P}(d(X_1, X_2) < \epsilon)$ is computable in $\mathbb{I}_<$ , and $\mathbb{P}(d(X_1, X_2) \leq \epsilon)$ is computable in $\mathbb{I}_>$ . Property (iv) is a completeness property and allows random variables to be approximated. Property (v) shows that random variables can realise a given distribution, while property (vi) shows that independent random variables can be constructed realising a given distribution. These properties are similar to those used in [Ker08].

The standard approach to probability theory used in classical analysis is to define random variables as measurable functions over a base probability space. Given types $\mathbb{X}$ and $\mathbb{Y}$ , a representation of the Borel measurable functions $f : \mathbb{X} \rightarrow \mathbb{Y}$ was given in [Bra05], but this does not allow one to compute lower bounds for the measure of $f^{-1}(V)$ for $V \in \mathcal{O}(\mathbb{Y})$ . Ideally, one would like a representation of bounded measurable functions $f : \mathbb{X} \rightarrow \mathbb{R}$ such that for every finite measure $\mu$ on $\mathbb{X}$ , the integral $\int_{\mathbb{X}} f(x) d\mu(x)$ is computable. But then $f(y) = \int_{\mathbb{X}} f(x) d\delta_y(x)$ would be computable, so $f$ would be continuous. Any effective approach to measurable functions and integration must therefore take some information about the measure into account.

In the approach of [BC72], a notion of full-measure set was given independently of a specific measure, but this introduces additional technical details. In [BB85], integrable functions are defined as limits in an integration space of functions, and measurable functions through approximation by integrable function. In the approach of [GLV11] a notion of continuous random variable was introduced as a continuous function on $\text{supp}(\nu)$ , where $\nu$ is a valuation on the Cantor space ${0, 1}^\omega$ . However, in order to define a joint distribution, we need to fix the measure $\nu$ , but for fixed $\nu$ , the set of continuous functions is not expressive enough. For example, using the standard probability measure $P$ on ${0, 1}^\omega$ , there is no continuous total function $X : {0, 1}^\omega \rightarrow {0, 1}$ such that $\mathbb{P}(X(\omega) = 1) = 1/3$ . In [Spi03], a type of integrable real-valued functions is defined as the completion of the continuous functions under the metric defined by $d(f, g) = \int_{\mathbb{X}} |f(x) - g(x)| d\mu(x)$ , and extended to a type of measurable functions. This approach is natural, constructive, and allows for integrals of measurable functions to be computed; it is this approach we shall use here.

We will consider random variables on a fixed probability space $(\Omega, P)$ . Since any probability distribution on a Polish space is equivalent to a distribution on the standard Lebesgue-Rokhlin probability space [Roh52], it is reasonable to take the base space to be the Cantor space $\Sigma = {0, 1}^\omega$ and $P$ the standard measure.

4.1 Measurability

Definition 11 (Continuous random variable). An continuous random variable on $(\Omega, P)$ with values in $\mathbb{X}$ is a continuous function $X : \Omega \rightarrow \mathbb{X}$ .

We will sometimes write $\mathcal{P}(X \in U)$ as a shorthand for $P({\omega \in \Omega \mid X(\omega) \in U})$ . Continuous random variables $X$ and $Y$ are considered equal if $P({\omega \in \Omega \mid X(\omega) \neq Y(\omega)}) = 0$ . In other words, $X$ and $Y$ are almost-surely equal.Suppose $\mathbb{X}$ is a Polish space, i.e. a space which is separable and complete under the metric $d$ . Define the Fan metric on continuous random variables by

d(X,Y)=sup{εQ+P({ωΩd(X(ω),Y(ω))>ε})>ε}=inf{εQ+P({ωΩd(X(ω),Y(ω))ε})<ε}.(3)\begin{aligned} d(X, Y) &= \sup\{\varepsilon \in \mathbb{Q}^+ \mid P(\{\omega \in \Omega \mid d(X(\omega), Y(\omega)) > \varepsilon\}) > \varepsilon\} \\ &= \inf\{\varepsilon \in \mathbb{Q}^+ \mid P(\{\omega \in \Omega \mid d(X(\omega), Y(\omega)) \geq \varepsilon\}) < \varepsilon\}. \end{aligned} \quad (3)

Given the probability distribution (valuation) $P$ and a computable metric $d : \mathbb{X} \times \mathbb{X} \rightarrow \mathbb{R}^+$ , the Fan metric on continuous random variables is easily seen to be computable. The convergence relation defined by the Fan metric corresponds to convergence in probability. As an alternative to using the Fan metric, we can consider a uniform structure on $\mathbb{X}$ , or, if the metric $d$ on $\mathbb{X}$ is bounded, the distance $d(X, Y) := \int_{\Omega} d(X(\omega), Y(\omega)) dP(\omega)$ .

Definition 12 (Measurable random variable). The type of measurable random variables is the effective completion of the type of continuous random variables under the Fan metric (3). We write $X : \Omega \rightsquigarrow \mathbb{X}$ if $X$ is a measurable random variable taking values in $\mathbb{X}$ , and let $\mathcal{R}(\mathbb{X})$ be the type of measurable random variables with values in $\mathbb{X}$ .

In other words, a random variable is represented by a sequence $(X_0, X_1, X_2, \dots)$ of continuous random variables satisfying $d(X_m, X_n) < 2^{-\min(m,n)}$ , and two such sequences are equivalent (represent the same random variable) if $d(X_{1,n}, X_{2,n}) \rightarrow 0$ as $n \rightarrow \infty$ .

By standard results on the completion, the Fan metric on continuous random variables extends computably to measurable random variables. For if $m > n$ , then $|d(X_m, Y_m) - d(X_n, Y_n)| \leq d(X_m, X_n) + d(Y_m, Y_n) \leq 2 \cdot 2^{-n}$ , so $(d(X_m, Y_m)){m \in \mathbb{N}}$ is an effective Cauchy sequence converging to a value we define as $d(\lim{n \rightarrow \infty} X_n, \lim_{n \rightarrow \infty} Y_n)$ . Further, if $(X_0, X_1, X_2, \dots)$ is an effective Cauchy sequence converging to $X_{\infty}$ , then $d(X_n, X_{\infty}) \leq 2^{-n}$ .

Remark 13. Although a measurable random variable $X$ is defined relative to the underlying space $\Omega$ , we cannot in general actually compute $X(\omega)$ in any meaningful sense for fixed $\omega \in \Omega$ ! The expression $X(\omega)$ only makes sense for random variables given as continuous functions $\Omega \rightarrow \mathbb{X}$ .

It will sometimes be useful to consider random variables taking only finitely many values.

Definition 14 (Simple random variable). An simple random variable on $(\Omega, P)$ with values in $\mathbb{X}$ is a continuous function $X : \Omega \rightarrow \mathbb{X}$ which takes finitely many values.

Clearly, if the base space $\Omega$ is connected, then any simple random variable is constant, but for base space $\Sigma$ , any continuous random variable can be effectively approximated by simple random variables, which immediately yields effective approximation by measurable random variables.

Lemma 15. Given any continuous function $X : \Sigma \rightarrow \mathbb{X}$ , we can compute a sequence of simple functions $X_m$ converging effectively to $X$ in the uniform metric.

Proof. Take $I_m(\omega) = \omega|m 0^\omega$ for all $\omega \in \Sigma$ , $m \in \mathbb{N}$ , and let $X_m = X \circ I_m$ . Then $d(X_m, X) = \sup{\omega \in \Sigma} d(X(I_m(\omega)), X(\omega))$ is computable, being the supremum of a continuous function over a compact set. Further, $d(X_m, X) \rightarrow 0$ as $m \rightarrow \infty$ since $X$ is uniformly continuous, so $X_m$ converges effectively. $\square$

It is also useful to consider more general classes of random variables by allowing for partial functions on a full-measure set. This is important if the base space $\Omega$ is connected, but $\mathbb{X}$ is path-connected.

Definition 16 (Piecewise-continuous random variable). A piecewise-continuous random variable on $(\Omega, P)$ with values in $\mathbb{X}$ is a continuous partial function $X : \Omega \rightarrow \mathbb{X}$ such that $(\text{dom}(X) \in \mathcal{O}(\Omega))$ and $P(\text{dom}(X)) = 1$ .We use the terminology “piecewise-continuous” since $X : \omega \rightarrow \mathbb{X}$ may arise as the restriction of a piecewise-continuous function to its continuity set.

By [Wei99, Theorem 2.2.4], machine-computable functions ${0, 1}^\omega \rightarrow {0, 1}^\omega$ are defined on a $G_\delta$ -subsets of ${0, 1}^\omega$ . Indeed, any function into a metric space is continuous on a $G_\delta$ set of points. This makes functions defined and continuous on a full-measure $G_\delta$ -subset of $\Omega$ a natural class of random variables.

Definition 17 (Almost-surely continuous random variable). An almost-surely-continuous random variable on $(\Omega, P)$ with values in $\mathbb{X}$ is a continuous partial function $X : \Omega \rightarrow \mathbb{X}$ such that $\text{dom}(X)$ is a $G_\delta$ set and $P(\text{dom}(X)) = 1$ , where $P(\bigcap_{n \in \mathbb{N}} U_n) = 1$ if $P(U_n) = 1$ for all $n$ .

The following result shows that almost-surely continuous random variables are measurable random variables for the base space $\Sigma$ .

Proposition 18. Suppose the base probability space is ${0, 1}^\omega$ . Then any almost-surely-continuous random variable defined on a full-measure open set is effectively a measurable random variable. However, not all measurable random variables are almost-surely continuous.

Proof. Given a almost-surely-continuous random variable $X$ , we can construct a sequence of continuous random variables converging weakly to $X$ . Let $\xi : {0, 1}^\omega \rightarrow {0, 1}^\omega$ be the code of machine-computable function representing $X$ , so $X = \delta \circ \xi$ where $\delta : {0, 1}^\omega \rightarrow \mathbb{X}$ is a representation of $\mathbb{X}$ , and $\text{dom}(X) = \text{dom}(\xi)$ . Fix $\epsilon = 2^{-n} > 0$ . Consider the set of $\omega$ on which $\xi$ is defined and for some $\delta$ maps the $\delta$ -ball about $\omega$ into the $\epsilon$ -ball about $\xi(\omega)$ . Since $\xi$ is continuous on its domain, this set is $\text{dom}(\xi)$ . Hence on some full-measure open set, $\xi$ is provably defined up to error $2^{-n}$ . Since this set is a countable union of cylinder sets, we can compute $X_n$ agreeing with $X_\infty$ up to $2^{-n}$ on a set of measure $1 - 2^{-n}$ .

Conversely, we can define a strong Cauchy sequence $X_n$ of piecewise-continuous random variable taking values in ${0, 1}$ such that $X_n = 1$ on a decreasing sequence of closed sets $W_n$ of measure $(1 + 2^{-n})/2$ whose limit is a Cantor set. Then $\lim_{n \rightarrow \infty} X_n$ is discontinuous on a set of positive measure. $\square$

Similarly to the notion of measurable function, we can define the notion of measurable set.

Definition 19 (Measurable set). A measurable set $A$ in $\Omega$ (more precisely, the characteristic function $\chi_A$ of a measurable set $A$ ) is a measurable random variable in $\mathbb{I} = [0, 1]$ such that $\mathbb{P}(\chi_A \in {0, 1}) = 1$ .

If $\Omega = \Sigma$ , then the characteristic function $\chi_A$ is the limit in probability of $\chi_{A_n}$ for clopen sets $A_n$ . Equivalently, any measurable set is a limit of an effective Cauchy sequence of clopen sets $(A_n)_{n \in \mathbb{N}}$ under the metric $d(A_m, A_n) = P(A_m \Delta A_n)$ .

In classical measure theory, it is also useful to consider the indicator function of a set of values of a random variable, defined as

I[XS]:Ω{0,1}:ω{1if X(ω)S;0if X(ω)S.(4)I[X \in S] : \Omega \rightsquigarrow \{0, 1\} : \omega \mapsto \begin{cases} 1 & \text{if } X(\omega) \in S; \\ 0 & \text{if } X(\omega) \notin S. \end{cases} \quad (4)

If $X$ is a continuous random variable, and $U$ is open, then $I[X \in U]$ is computable as a function $\Omega \rightarrow [0, 1]<$ , and if $A$ is closed, then $I[X \in A]$ is computable in $\Omega \rightarrow [0, 1]_>$ . These indicator functions cannot be seen as random variables as the range spaces are not Hausdorff. Indicator functions for measurable random variables taking values in the Polish space ${0, 1} \subset \mathbb{R}$ are only computable as measurable random variables for clopen sets as the following example shows:Example 20 (Uncomputability of indicator functions). Let $X$ be a random variable and $U$ an open set. Suppose $I[X \in U]$ were to be computable as a measurable random variable given $X$ and $A$ . Then $\mathbb{P}(X \in A) = \mathbb{P}(I[X \in U] \geq \frac{1}{2})$ would be computable in $[0, 1]>$ , so $\mathbb{P}(X \in U)$ would be computable in $[0, 1]$ . Taking $X = \delta_x$ gives $\mathbb{P}(X \in U) = 1$ if $x \in U$ and 0 if $x \notin U$ , so $U$ would be effectively closed.

However, we shall see that for an open set $U$ , the indicator function $I[X \in U]$ induces a valuation on $\Omega$ which is computable. This makes indicator functions useful when we are only interested in information about probabilities and expectations, such as the submartingale inequality (16).

4.2 Distribution

We now consider the probability distribution of a measurable random variable. Let $\mathbb{X}$ be a computable metric space. For a closed set $A$ , define $\overline{N}\epsilon(A) := {x \in \mathbb{X} \mid d(x, A) \leq \epsilon}$ , and for an open set $U$ define $I_\epsilon(U) := \mathbb{X} \setminus (\overline{N}\epsilon(\mathbb{X} \setminus U)) = {x \in U \mid \exists \delta > 0, B(x, \epsilon + \delta) \subset U}$ . Since $d(x, A)$ is computable in $R{<}^{+,\infty}$ by definition of a computable metric space, $\overline{N}\epsilon(A)$ is computable as a closed set, so $I_\epsilon(U)$ is computable as an open set. Note that $I{\epsilon_1+\epsilon_2}(U) \subset I{\epsilon_1}(I_{\epsilon_2}(U))$ .

Definition 21 (Distribution of a measurable random variable). For a measurable random variable $X$ , define its distribution by

P(XU)=sup{P(YV)ϵϵQ+,VIϵ(U),d(Y,X)<ϵ}.\mathbb{P}(X \in U) = \sup\{P(Y \in V) - \epsilon \mid \epsilon \in \mathbb{Q}^+, V \subset I_\epsilon(U), d(Y, X) < \epsilon\}.

where $Y$ ranges over continuous random variables and $V$ over open sets.

Theorem 22 (Computability of distribution). Suppose $(\mathbb{X}, d)$ is a computable metric space. The distribution of a measurable random variable $X$ taking values in $\mathbb{X}$ is a valuation, and is computable from a name of $X$ . If $X$ is a continuous random variable, then $\mathbb{P}(X \in U) = P({\omega \in \Omega \mid X(\omega) \in U})$ .

Proof. Suppose $X, Y$ are continuous random variables, $d(X, Y) < \epsilon$ and $V \subset I_\epsilon(U)$ . Then $\mathcal{P}(X \in U) \geq \mathcal{P}(Y \in V \wedge d(X, Y) < \epsilon) \geq \mathcal{P}(Y \in V) - \mathcal{P}(d(X(\omega), Y(\omega)) \geq \epsilon) \geq \mathcal{P}(Y(\omega) \in V) - \epsilon$ .

Now take $(X_n){n \in \mathbb{N}}$ to be any sequence of continuous random variables converging effectively to a measurable random variable $X$ . By definition of $\mathbb{P}(X \in U)$ , we have $\mathbb{P}(X \in U) \geq \mathcal{P}(X_m \in I{2^{-m}}) - 2^{-m}$ for all $m$

Fix $\delta > 0$ and take a continuous random variable $Y$ such that $d(X, Y) < \epsilon$ and $\mathcal{P}(Y \in I_\epsilon(U)) - \epsilon > \mathbb{P}(X \in U) - \delta$ . By taking $m$ sufficiently large, we can ensure that $\mathcal{P}(Y \in I_{2^{1-m}+\epsilon}(U)) - \epsilon > \mathbb{P}(X \in U) - \delta$ . Then since $d(X_m, Y) \leq \epsilon + 2^{-m}$ , we have $\mathcal{P}(X_m \in I_{2^{-m}}) - 2^{-m} \geq \mathcal{P}(Y \in I_{\epsilon+2^{1-m}}(U)) - (\epsilon + 2^{1-m}) > \mathbb{P}(X \in U) - (\delta + 2^{1-m})$ . Since $\delta$ is arbitrary and $m$ may be taken arbitrarily large, we have $\mathbb{P}(X \in U) = \sup_{n \in \mathbb{N}} \mathcal{P}(X_n \in I_{2^{-n}}(U)) - 2^{-n}$ , so is computable in $\mathbb{I}_{<}$ .

If $X$ is a continuous random variable, $\mathcal{P}(X \in I_\epsilon(U)) \nearrow \mathcal{P}(X \in U)$ as $\epsilon \rightarrow 0$ by continuity, so taking $X_n = X_\infty = X$ above, we have $\mathbb{P}(X \in U) = \lim_{n \rightarrow \infty} \mathcal{P}(X \in I_\epsilon(U)) - \epsilon = \mathcal{P}(X \in U)$ . $\square$

Remark 23. Although the notation $\mathbb{P}(X \in U)$ suggests that we can define a measurable random variable $X$ as a function from $\Omega$ to $\mathbb{X}$ , such a function could not be constructed in general, and is not required to compute probabilities.

Corollary 24. If $X$ is a random variable and $A$ is a closed set, then $\mathbb{P}(X \in A)$ is computable in $[0, 1]_>$ .

We now show that we can construct a random variable with a given distribution. The result below is a variant of [HR09, Theorem 1.1.1], which shows that any distribution is effectively measurably isomorphic to a distribution on ${0, 1}^\omega$ , and the proof is similar.We first prove the following generally-useful decomposition result, which is essentially a special case of the effective Baire category theorem [YMT99, Bra01].

Lemma 25. Let $X$ be an effectively separable computable metric space, and $\mu$ be a measure on $X$ . Then given any $\epsilon > 0$ , we can compute a topological partition $\mathcal{B}$ of $X$ such that $\text{diam}(B) < \epsilon$ for all $B \in \mathcal{B}$ , and $\mu(X \setminus \bigcup \mathcal{B}) = 0$ .

Proof. For any $\delta > 0$ , and any $x \in \mathbb{X}$ , ${r > 0 \mid \mu(\overline{B}(x, r) \setminus B(x, r)) < \delta}$ is a computable open dense set. We can therefore construct a sequence of rationals $q_k$ such that $|q_k - q_{k+1}| < 2^{-k-1}$ and $\mu(\overline{B}(x, q_k) \setminus B(x, q_k)) < 2^{-k}$ . Then taking $r_\delta(x) = \lim_{k \rightarrow \infty} q_k$ yields a suitable radius.

Since $X$ is effectively separable, it has a computable dense sequence $(x_n){n \in \mathbb{N}}$ . For $\epsilon > 0$ , we take as topological partition the sets $B(x_n, r_\epsilon(x_n)) \setminus \bigcup{m=0}^{n-1} \overline{B}(x_m, r_\epsilon(x_m))$ for $n \in \mathbb{N}$ . $\square$

Theorem 26. Let $\mathbb{X}$ be a computable metric space, and $\nu$ be a valuation on $\mathbb{X}$ . Then we can compute a measurable random variable $X$ on base space ${0, 1}^\omega$ such that for any open $U$ , $\mathbb{P}(X \in U) = \nu(U)$ .

Proof. For each $n$ , construct a countable topological partition $\mathcal{B}_n$ such that each $B \in \mathcal{B}n$ has radius at most $2^{-n}$ , and $\sum{B \in \mathcal{B}_n} \mu(\partial B) = 0$ such that $\nu(\bigcup \mathcal{B}n) > 1 - 2^{-n-1}$ . By taking intersections if necessary, we can assume that each $\mathcal{B}{n+1}$ is a refinement of $\mathcal{B}_n$ .

We now construct random variables $X_n$ as follows. Suppose we have constructed cylinder sets $W_{n,m} \subset {0, 1}^\infty$ such that $P(W_{n,m}) < \mu(B_{n,m})$ and $\sum_m P(W_{n,m}) > 1 - 2^{-n}$ . Since $B_{n,m}$ is a union of open sets ${B_{n+1,m,1}, \dots, B_{n+1,m,k}} \subset \mathcal{B}{n+1}$ , we can effectively compute dyadic numbers $p{n+1,m}$ such that $p_{n+1,m,k} < \mu(B_{n+1,m,k})$ and $\sum_k p_{n+1,m,k} \geq P(W_{n,m})$ . We then partition $W_{n,m}$ into cylinder sets $W_{n+1,m,k}$ each of measure $p_{n+1,m,k}$ . We take $X_{n+1}$ to map $W_{n+1,m,k}$ to a point $x_{n+1,m,k} \in B_{n+1,m,k}$ . It is clear that $X_n$ is a strongly-convergent Cauchy sequence, so converges to a random variable $X_\infty$ .

It remains to show that $\mathbb{P}(X_\infty \in U) = \mu(U)$ for all $U \in \mathcal{O}(X)$ . This follows since for given $n$ we have $\mathbb{P}(X_n \in U) > \mu(I_{2^{1-n}}(U)) - 2^{-n} \nearrow \mu(U)$ as $n \rightarrow \infty$ . $\square$

4.3 Product and Image

If $X_1, X_2$ are continuous random variables, define the product $X_1 \times X_2$ as the functional product

(X1×X2)(ω)=(X1(ω),X2(ω)).(X_1 \times X_2)(\omega) = (X_1(\omega), X_2(\omega)).

Our second main result is that we can also compute products of measurable random variables.

Theorem 27 (Computability of products). If $X_n$ and $Y_n$ are effective Cauchy sequences of continuous random variables, then $X_n \times Y_n$ is an effective Cauchy sequence.

Proof. If $m > n$ , then $\mathcal{P}(d(X_{m+1} \times Y_{m+1}, X_{n+1} \times Y_{n+1}) \geq 2^{-n}) = \mathcal{P}(d(X_{m+1}, X_{n+1}) \geq 2^{-n} \vee d(Y_{m+1}, Y_{n+1}) \geq 2^{-n}) \leq \mathcal{P}(d(X_{m+1}, X_{n+1}) \geq 2^{-(n+1)}) + \mathcal{P}(d(Y_{m+1}, Y_{n+1}) \geq 2^{-(n+1)}) \leq 2^{-(n+1)} + 2^{-(n+1)} = 2 \cdot 2^{-(n+1)} = 2^{-n}$ . Hence $d(X_m \times Y_m, X_n \times Y_n) \leq 2^{-n}$ as required. $\square$

If $X : \Omega \rightarrow \mathbb{X}$ is a continuous random variable, and $f : \mathbb{X} \rightarrow \mathbb{Y}$ is continuous, then $f \circ X$ is a continuous random variable. By taking limits of effective Cauchy sequences, it is clear that we can define the image of a measurable random variable under a uniformly continuous function. However, it is possible to compute the image under an arbitrary continuous function.

Theorem 28 (Continuous mapping). *If $X_n$ is an effective Cauchy sequence of continuous random variables taking values in $\mathbb{X}$ and $f : \mathbb{X} \rightarrow \mathbb{Y}$ is continuous, then $f \circ X_n$ is an effective Cauchy sequence. Further, for any open $V \subset \mathbb{Y}$ , $\mathbb{P}(\lim_{n \rightarrow \infty} f \circ X_n \in V) = \mathbb{P}(\lim_{n \rightarrow \infty} X_n \in f^{-1}(V))$ .*The proof is based on the non-effective version of this result from [MW43].

Proof. Consider the open subsets of $\mathbb{X}$ defined as

Bδ,ϵ(f)={xXyX,d(x,y)<δd(f(x),f(y))>ϵ}.B_{\delta,\epsilon}(f) = \{x \in \mathbb{X} \mid \exists y \in \mathbb{X}, d(x, y) < \delta \wedge d(f(x), f(y)) > \epsilon\}.

Since $f$ is continuous, for all $\epsilon > 0$ , $\bigcap_{\delta > 0} B_{\delta,\epsilon}(f) = \emptyset$ .

For all $x, y, z \in \mathbb{X}$ ,

d(f(x),f(y))>ϵ    d(f(x),f(z))>ϵ/2d(f(y),f(z))>ϵ/2    d(x,z)δd(y,z)δzBδ,ϵ/2(f).\begin{aligned} d(f(x), f(y)) > \epsilon &\implies d(f(x), f(z)) > \epsilon/2 \vee d(f(y), f(z)) > \epsilon/2 \\ &\implies d(x, z) \geq \delta \vee d(y, z) \geq \delta \vee z \in B_{\delta,\epsilon/2}(f). \end{aligned}

Hence for random variables $X, Y, Z$ ,

P(d(f(X),f(Y))>ϵ)P(d(X,Z)>δ)+P(d(Y,Z)>δ)+P(ZBδ,ϵ/2(f)).\mathbb{P}(d(f(X), f(Y)) > \epsilon) \leq \mathbb{P}(d(X, Z) > \delta) + \mathbb{P}(d(Y, Z) > \delta) + \mathbb{P}(Z \in B_{\delta,\epsilon/2}(f)).

Let $X_n$ be an effective Cauchy sequence of continuous random variables with limit $X$ . Fix $\epsilon > 0$ . By continuity of the distribution of $X$ , we have $\mathbb{P}(X \in B_{\delta,\epsilon/2}) < \epsilon/2$ for sufficiently small $\delta$ , and by computability of the distribution, we can effectively find such a $\delta$ . Take $N(\epsilon)$ so that $2^{-N(\epsilon)} < \min{\delta, \epsilon/4}$ . Then for $n \geq N(\epsilon)$ , we have $\mathbb{P}(d(X, X_n) > \delta) < \mathbb{P}(d(X, X_n) > 2^{-n}) < 2^{-n} < \epsilon/4$ . Therefore if $m, n \geq N(\epsilon)$ , we have $\mathbb{P}(d(f(X_m), f(X_n)) > \epsilon) \leq \mathbb{P}(d(X_m, X) > \delta) + \mathbb{P}(d(X_n, X) > \delta) + \mathbb{P}(X \in B_{\delta,\epsilon/2}(f)) < \epsilon$ . Thus $f(X_n)$ is an effective Cauchy sequence, satisfying $d(f(X_m), f(X_n)) < \epsilon$ whenever $m, n \geq N(\epsilon)$ .

Fix $\epsilon > 0$ , and choose $\delta$ such that $\mathbb{P}(f(X) \in V) < \mathbb{P}(f(X) \in I_\delta(V)) + \epsilon/2$ . For $n$ sufficiently large, we have $d(f(X_n), f(X)) < \min{\delta, \epsilon/2}$ . Then $\mathbb{P}(f(X) \in V) \geq \mathbb{P}(f(X_n) \in I_\epsilon(V)) - \epsilon = \mathbb{P}(X_n \in f^{-1}(I_\epsilon(V))) - \epsilon$ . Taking $n \rightarrow \infty$ gives $\mathbb{P}(f(X) \in V) \geq \mathbb{P}(X \in f^{-1}(I_\epsilon(V))) - \epsilon$ , and taking $\epsilon \rightarrow 0$ given $f^{-1}(I_\epsilon(V)) \rightarrow f^{-1}(V)$ , so $\mathbb{P}(f(X) \in V) \geq \mathbb{P}(X \in f^{-1}(V))$ . For the reverse inequality, for $n$ sufficiently large, we have $d(f(X_n), f(X)) < \min{\delta, \epsilon/2}$ , so $\mathbb{P}(f(X_n) \in V) \geq \mathbb{P}(f(X) \in I_\delta(V)) - \epsilon/2$ . Then $\mathbb{P}(f(X) \in V) < \mathbb{P}(f(X_n) \in V) + \epsilon = \mathbb{P}(X_n \in f^{-1}(V)) + \epsilon$ . Taking $n \rightarrow \infty$ gives $\mathbb{P}(f(X) \in V) \leq \mathbb{P}(X \in f^{-1}(V)) + \epsilon$ , and since $\epsilon$ is arbitrary, $\mathbb{P}(f(X) \in V) \leq \mathbb{P}(X \in f^{-1}(V))$ . $\square$

If $X$ is a measurable $\mathbb{X}$ -valued random variable and $f : \mathbb{X} \rightarrow \mathbb{Y}$ where $\mathbb{Y}$ is not a metric space, then $f(X)$ is not defined as a random variable, since we have only a notion of random variables on metric spaces. However, the distribution of $f(X)$ is well-defined and computable, with $\mathbb{P}(f(X) \in V) := \mathbb{P}(X \in f^{-1}(V))$ for $V \in \mathcal{O}(V)$ . Indeed, we can compute the joint distribution of $f(X)$ and $Z \in \mathcal{R}(\mathbb{Z})$ by $\mathbb{P}(f(X) \in V \wedge Z \in W) := \mathbb{P}(X \in f^{-1}(V) \wedge Z \in W)$ for $V \in \mathcal{O}(\mathbb{Y})$ and $W \in \mathcal{O}(\mathbb{Z})$ .

4.4 Expectation

The expectation of a random variable is not continuous in the weak topology; for example, we can define continuous random variables $X_n$ taking value $2^n$ on a subset of $\Omega$ of measure $2^{-n}$ , so that $X_n \rightarrow 0$ but $\mathbb{E}(X_n) = 1$ for all $n$ . For this reason, we need a new type of integrable random variables.

Definition 29 (Integrable random variable). Let $(\mathbb{X}, d)$ be a metric space. The type of integrable random variables is the effective completion of the type of continuous random variables under the metric

d1(X,Y)=Ωd(X(ω),Y(ω))dP(ω).(5)d_1(X, Y) = \int_{\Omega} d(X(\omega), Y(\omega)) dP(\omega). \quad (5)If $d$ is a bounded metric, then this metric is equivalent to the Fan metric.

For integrable random variables taking values in the reals, the expectation is defined in the usual way:

Definition 30 (Expectation). If $X : \Omega \rightarrow \mathbb{R}$ is a continuous real-valued random variable, the expectation of $X$ is given by the integral

E(X)=ΩX(ω)dP(ω),\mathbb{E}(X) = \int_{\Omega} X(\omega) dP(\omega),

which always exists since $X$ has compact values.

If $X : \Omega \rightsquigarrow \mathbb{R}$ is an integrable real-valued random variable, and $X$ is presented as $\lim_{n \rightarrow \infty} X_n$ for some sequence of random variables satisfying $\mathbb{E}[|X_{n_1} - X_{n_2}|] \leq 2^{-\min(n_1, n_2)}$ , define

E(X)=limnE(Xn),\mathbb{E}(X) = \lim_{n \rightarrow \infty} \mathbb{E}(X_n),

which is an effective Cauchy sequence since $|\mathbb{E}(X_{n_1}) - \mathbb{E}(X_{n_2})| \leq \mathbb{E}(|X_{n_1} - X_{n_2}|)$ .

We can effectivise Lebesgue spaces $\mathcal{L}^p(\mathbb{X})$ of integrable random variables through the use of effective Cauchy sequences in the natural way: If $(\mathbb{X}, |\cdot|)$ is a normed space, then the type of $p$ -integrable random variables with values in $\mathbb{X}$ is the effective completion of the type of $p$ -integrable continuous random variables under the metric $d_p(X, Y) = |X - Y|_p$ induced by the norm

Xp=(ΩX(ω)pdP(ω))1/p=(E(Xp))1/p.(6)\|X\|_p = \left( \int_{\Omega} |X(\omega)|^p dP(\omega) \right)^{1/p} = (\mathbb{E}(|X|^p))^{1/p}. \quad (6)

We can easily prove the Cauchy-Schwarz and triangle inequalities for measurable random variables $|XY|_{pq/(p+q)} \leq |X|_p \cdot |Y|_q$ and $|X + Y|_p \leq |X|_p + |Y|_p$ .

Theorem 31 (Expectation). Let $X$ be a positive real-valued random variable such that $\mathbb{E}(X) < \infty$ . Then

E(X)=0P(X>x)dx=0P(Xx)dx.\mathbb{E}(X) = \int_0^{\infty} \mathbb{P}(X > x) dx = \int_0^{\infty} \mathbb{P}(X \geq x) dx.

Note that the first integral is computable in $\mathbb{R}_{<}^+$ , but the second integral is in general uncomputable in $\mathbb{R}_{>}^+$ , due to the need to take the limit as the upper bound of the integral goes to infinity. However, the second integral may be computable if the tail is bounded, for example, if $X$ takes bounded values. The proof follows from the definition of the lower integral:

Proof. First assume $X$ is a continuous random variable, so by definition, $\mathbb{E}(X) = \int_{\Omega} X(\omega) dP(\omega)$ .

The definition of the lower horizontal integral gives $\int_{\Omega} X(\omega) dP(\omega) \geq \sum_{i=0}^{n-1} (x_i - x_{i-1}) P({\omega \mid X(\omega) > x_i})$ for all values $0 = x_0 < x_1 < \dots < x_n$ . Take $x_i - x_{i-1} < \epsilon$ for all $i$ . Then $\mathbb{E}(X) + \epsilon = \int_{\Omega} X(\omega) + \epsilon dP(\omega) \geq \sum_{i=1}^n (x_i - x_{i-1}) P({\omega \mid X(\omega) + \epsilon > x_i}) = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} \mathbb{P}(X(\omega) > x_i - \epsilon) dx \geq \sum_{i=1}^n \int_{x_{i-1}}^{x_i} \mathbb{P}(X(\omega) > x_{i-1}) dx \geq \sum_{i=1}^n \int_{x_{i-1}}^{x_i} \mathbb{P}(X(\omega) > x) dx = \int_0^{x_n} \mathbb{P}(X(\omega) > x) dx$ . Taking $n \rightarrow \infty$ gives $\mathbb{E}(X) \geq \int_0^{\infty} \mathbb{P}(X > x) dx - \epsilon$ , and since $\epsilon$ is arbitrary, $\mathbb{E}(X) \geq \int_0^{\infty} \mathbb{P}(X > x) dx$ .

The definition of the lower horizontal integral gives for all $\epsilon > 0$ , there exist $0 = x_0 < x_1 < \dots < x_n$ , such that $\int_{\Omega} X(\omega) dP(\omega) \leq \sum_{i=1}^n (x_i - x_{i-1}) P({\omega \mid X(\omega) > x_i}) + \epsilon$ . By refining the partition if necessary, we can assume $x_i - x_{i-1} < \epsilon$ for all $i$ . Then $\mathbb{E}(X) - \epsilon \leq \sum_{i=1}^n (x_i - x_{i-1}) P({\omega \mid X(\omega) > x_i}) = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} P({\omega \mid X(\omega) > x_i}) dx \leq \sum_{i=1}^n \int_{x_{i-1}}^{x_i} P({\omega \mid X(\omega) > x}) dx = \int_0^{x_n} P({\omega \mid X(\omega) > x}) dx \leq \int_0^{\infty} P({\omega \mid X(\omega) > x}) dx$ . Hence $\mathbb{E}(X) \leq \int_0^{\infty} \mathbb{P}(X > x) dx + \epsilon$ , and since $\epsilon$ is arbitrary, $\mathbb{E}(X) \leq \int_0^{\infty} \mathbb{P}(X > x) dx$ .

The case of measurable random variables follows by taking limits.

We show $\int_0^{\infty} \mathbb{P}(X \geq x) dx = \mathbb{E}(X)$ since $\int_0^{\infty} \mathbb{P}(X > x) dx \leq \int_0^{\infty} \mathbb{P}(X \geq x) dx \leq \int_0^{\infty} \mathbb{P}(X + \epsilon > x) dx = \epsilon + \int_0^{\infty} \mathbb{P}(X \geq x) dx$ for any $\epsilon > 0$ . $\square$

By changing variables in the integral, we obtain:Corollary 32. If $X$ is a real-valued random variable, then for any $\alpha \geq 1$ ,

E(Xα)=0αxα1P(X>x)dx=0αxα1P(Xx)dx.\mathbb{E}(|X|^\alpha) = \int_0^\infty \alpha x^{\alpha-1} \mathbb{P}(X > x) dx = \int_0^\infty \alpha x^{\alpha-1} \mathbb{P}(X \geq x) dx.

Remark 33 (Expectation of a distribution). Theorem 31 shows that the expectation of a random variable depends only on its distribution. Indeed, we can define the expectation of a probability valuation $\pi$ on $[0, \infty[$ by

E(π)=0π([x,[)dx=0π([x,[)dx.\mathbb{E}(\pi) = \int_0^\infty \pi([x, \infty[) dx = \int_0^\infty \pi([x, \infty[) dx.

If $f : \mathbb{X} \rightarrow \mathbb{R}_{<}^+$ , then we can compute the lower expectation of $f(X)$ by

E<(f(X)):=0P(Xf1([λ,[))dλ.(7)\mathbb{E}_{<}(f(X)) := \int_0^\infty \mathbb{P}(X \in f^{-1}([\lambda, \infty[)) d\lambda. \quad (7)

We have an effective version of the classical dominated convergence theorem.

Theorem 34 (Dominated convergence). Suppose $X_n \rightarrow X$ weakly, and there is an integrable function $Y : \Omega \rightarrow \mathbb{R}$ such that $|X_n| \leq Y$ for all $n$ (i.e. $\mathbb{P}(Y - |X_n| \geq 0) = 1$ ) and that $\mathbb{E}|Y| < \infty$ . Then $X_n$ converges effectively under the metric (5). In particular, the limit of $\mathbb{E}(X_n)$ always exists

Proof. Since $\mathbb{E}(Y) < \infty$ , the probabilities $\mathbb{P}(Y \geq y) \rightarrow 0$ as $y \rightarrow \infty$ . For fixed $\epsilon > 0$ , let $b(\epsilon) = \sup{y \mid \mathbb{P}(Y \geq y) \geq \epsilon}$ , which is computable in $\mathbb{R}{>}$ given $\epsilon$ . Then $\sup{\int_A Y dP \mid P(A) \leq \epsilon} \leq \int{b(\epsilon)}^\infty \mathbb{P}(Y \geq y) dy = \mathbb{E}(Y) - \int_0^{b(\epsilon)} \mathbb{P}(Y > y) dy$ in $\mathbb{R}{>}^+$ . For continuous random variables $X_m, X_n$ with $2^{-m}, 2^{-n} < \epsilon$ , taking $A_\epsilon = {\omega \mid d(X_m(\omega), X_n(\omega)) \geq \epsilon}$ gives $\mathbb{E}(|X_m - X_n|) \leq \epsilon + \int{A_\epsilon} |X_m(\omega) - X_n(\omega)| dP(\omega) \leq \epsilon + \int_{A_\epsilon} |X_m(\omega)| + |X_n(\omega)| dP(\omega) \leq \epsilon + \int_{A_\epsilon} 2|Y| dP \leq \epsilon + 2 \int_{b(\epsilon)}^\infty \mathbb{P}(Y \geq y) dy$ , which converges effectively to 0 as $\epsilon \rightarrow 0$ . $\square$

4.5 Independence and Conditioning

The concept of conditional random variable is subtle even in classical probability theory. The basic idea is that if we condition a random quantity $Y$ on some information of kind $\mathcal{X}$ , then we can reconstruct $Y$ given a value $x$ . Classically, conditional random variables are not defined, but conditional distributions and expectations are. Conditional expectations can be shown to exist using the Radon-Nikodym derivative, but this is uncomputable [HRW11].

In the classical case, we condition relative to a sub-sigma-algebra of the measure space. In the computable case, it makes sense to consider instead a sub-topology $\mathcal{T}$ on $\Omega$ . We first need to define concepts of $\mathcal{T}$ measurability and $\mathcal{T}$ independence

Definition 35. Let $\mathcal{T}$ be a topology on $\Omega$ . We say a measurable random variable $X$ is $\mathcal{T}$ -measurable if $X$ is a limit of $\mathcal{T}$ -continuous random variables $X_m$ , and write $\mathcal{R}_{\mathcal{T}}(\mathbb{X})$ for the type of $\mathcal{T}$ -measurable random variables with values in $\mathbb{X}$ .

We define independence relative to a sub-topology using the identity random variable $I : \Omega \rightarrow \Omega$ .

Definition 36 (Independence). Given a topology $\mathcal{T}$ on $\Omega$ , we say a random variable $X : \Omega \rightsquigarrow \mathbb{X}$ is independent of $\mathcal{T}$ if

P(X×IU×W)=P(XU)P(W)(8)\mathbb{P}(X \times I \in U \times W) = \mathbb{P}(X \in U)P(W) \quad (8)

whenever $U \in \mathcal{O}(\mathbb{X})$ and $W \in \mathcal{T}$ . We write $\mathcal{R}{\perp \mathcal{T}}(\mathbb{X})$ for the type of $\mathcal{T}$ -independent random variables with values in $\mathbb{X}$ .We say random variables $X_1, \dots, X_k$ are jointly independent of $\mathcal{T}$ if the product $\prod{i=1}^k X_i$ is independent of $\mathcal{T}$ .

We say random variables $X_1, X_2$ taking values in $\mathbb{X}_1, \mathbb{X}_2$ are independent if for all open $U_1 \subset \mathbb{X}_1$ and $U_2 \subset \mathbb{X}_2$ , we have

P(X1U1X2U2)=P(X1U1)P(X2U2).(9)\mathbb{P}(X_1 \in U_1 \wedge X_2 \in U_2) = \mathbb{P}(X_1 \in U_1) \cdot \mathbb{P}(X_2 \in U_2). \quad (9)

If $X : \Omega \rightarrow \mathbb{X}$ is continuous, the definition of independence with respect to $\mathcal{T}$ reduces to

P({ωΩX(ω)UωW})=P(XU)P(W)P(\{\omega \in \Omega \mid X(\omega) \in U \wedge \omega \in W\}) = P(X \in U)P(W)

If $X$ is independent of $\mathcal{T}$ , and $\mathcal{T}$ is the topology generated by a continuous random variable $Y$ , then $X$ is independent of $Y$ , since

P(XUYV)=P(XUIY1(V))=P(XU)P(Y1(V))=P(XU)P(YV).\mathbb{P}(X \in U \wedge Y \in V) = \mathbb{P}(X \in U \wedge I \in Y^{-1}(V)) = \mathbb{P}(X \in U)P(Y^{-1}(V)) = \mathbb{P}(X \in U)\mathbb{P}(Y \in V).

Note that it is possible for $X_1, X_2$ to be independent of $\mathcal{T}$ , but $X_1 \times X_2$ not to be. Clearly if $X_1$ and $X_2$ are independent real-valued integrable random variables, then $\mathbb{E}(X_1 X_2) = \mathbb{E}(X_1)\mathbb{E}(X_2)$ .

Since the space $(\mathbb{T}, \mathcal{T})$ need not be a Kolmogorov $(T^0)$ space, it is useful to quotient out sets of points which cannot be distinguished from each other by the topology $\mathcal{T}$ . The resulting quotient space $\mathbb{T}/\mathcal{T}$ is determined by the equivalence relation

t1Tt2    UT,t1U    t2U,t_1 \sim_{\mathcal{T}} t_2 \iff \forall U \in \mathcal{T}, t_1 \in U \iff t_2 \in U,

and quotient map $q_{\mathcal{T}} : \mathbb{T} \rightarrow \mathbb{T}/\mathcal{T}$ is computable. It is easy to see that the quotient space is Hausdorff if, and only if,

t1Tt2U1,U2T,t1U1t2U2U1U2=,t_1 \sim_{\mathcal{T}} t_2 \vee \exists U_1, U_2 \in \mathcal{T}, t_1 \in U_1 \wedge t_2 \in U_2 \wedge U_1 \cap U_2 = \emptyset,

and that if the quotient space is Hausdorff, then we can define a metric by

dT(qT(t1),qT(t2))=sup{d(U1,U2)U1t1U2t2}where d(U1,U2)=inf{d(s1,s2)s1U1s2U2}.\begin{aligned} d_{\mathcal{T}}(q_{\mathcal{T}}(t_1), q_{\mathcal{T}}(t_2)) &= \sup\{d(U_1, U_2) \mid U_1 \ni t_1 \wedge U_2 \ni t_2\} \\ \text{where } d(U_1, U_2) &= \inf\{d(s_1, s_2) \mid s_1 \in U_1 \wedge s_2 \in U_2\}. \end{aligned}

Further, if $\mathcal{T}$ is the preimage of a Hausdorff topology by a continuous function, then the quotient space $\mathbb{T}/\mathcal{T}$ is Hausdorff.

We now give our notion of conditional random variable $Y|\mathcal{X}$ , where $\mathcal{X}$ is a topology on $\Omega$ .

Definition 37 (Conditional random variable). Let $\mathcal{X}$ be a topology on $\Omega$ such that the quotient space $\Omega/\mathcal{X}$ is a Polish space. A random variable taking values in $\mathbb{Y}$ conditional on $\mathcal{X}$ is a continuous function $Y|\mathcal{X} : (\Omega, \mathcal{X}) \rightarrow \mathcal{R}(\mathbb{Y})$ such that $Y|\mathcal{X}(\omega)$ is independent of $\mathcal{X}$ for all $\omega \in \Omega$ .

Note that we do not require that the values of $Y|\mathcal{X}$ be jointly independent of $\mathcal{X}$ .

It is convenient to allow a different space for the conditioning variable and consider functions $Y| : \mathbb{X} \rightarrow \mathcal{R}_{\perp \mathcal{X}}(\mathbb{Y})$ , and write $Y|x$ for $Y|(x)$ . The idea of conditioning is that knowing the value of a $\mathcal{X}$ -measurable random variable in $\mathbb{X}$ , we know the random variable $Y$ .

Proposition 38. *Let $\mathcal{X}$ a topology on $\Omega$ . The operator $\mathcal{R}{\mathcal{X}}(\mathbb{X}) \times (\mathbb{X} \rightarrow \mathcal{R}{\perp \mathcal{X}}(\mathbb{Y})) \rightarrow \mathcal{R}(\mathbb{Y})$ extending the operator on continuous random variables $(\Omega \rightarrow \mathbb{X}) \times (\mathbb{X} \times \Omega \rightarrow \mathbb{Y}) : (X, Y) \mapsto Y|(X(\omega), \omega)$ , is computable.*Proof. For the case that each $Y|x$ is a continuous random variable, note that $Y| : \mathbb{X} \rightarrow (\Omega \rightarrow \mathbb{Y}) \equiv \mathbb{X} \times \Omega \rightarrow \mathbb{Y}$ .

Suppose $X$ is a $\mathcal{X}$ -measurable simple random variable. Let $U \in \mathcal{X}$ , and let $x_1, \dots, x_j$ be the values of $X$ lying in $U$ . Then $\mathbb{P}(X \in U \wedge Y \in V) = \sum_{i=1}^j \mathbb{P}(X = x_i \wedge [Y|x_i] \in V)$ . Since each $Y|x_i$ is independent of $\mathcal{X}$ , $\sum_{i=1}^j \mathbb{P}(X = x_i \wedge [Y|x_i] \in V) = \sum_{i=1}^j \mathbb{P}(X_m = x_i) \mathbb{P}([Y|x_i] \in V) = \int_{\omega \in U} \mathbb{P}([Y|X(\omega)] \in V) d\mathbb{P}X$ .

Since measurable random variables are limits of fast converging Cauchy sequences of continuous random variables, we have $X : \mathbb{N} \times \Omega \rightarrow \mathbb{X}$ and $Y| : \mathbb{X} \times \mathbb{N} \times \Omega \rightarrow \mathbb{Y}$ ; further, we can assume each $X_m$ is a simple random variable. Denote $X(m, \cdot)$ by $X_m$ , $Y|(\cdot, n, \cdot)$ by $Y_n|$ , $Y|(x, n, \cdot)$ by $Y_n|x$ , and $Y|(X_m(\omega), n, \omega) = Y_n|X_m$ .

For fixed $m$ , $X_m$ takes finitely many values $x_{m,1}, \dots, x_{m,k_m}$ , on clopen sets $W_{m,1}, \dots, W_{m,k_m}$ in $\mathcal{X}$ . Since for all $x$ , and for $n_1, n_2 \geq n$ , $d(Y_{n_1}|x, Y_{n_2}|x) < 2^{-n}$ , we have $d(Y_{n_1}|X_m, Y_{n_2}|X_m) < k_m 2^{-n}$ , so $(Y_n|X_m)_{n \in \mathbb{N}}$ is an effective Cauchy sequence, and converges to a random variable $Y|X_m$ .

By the argument of the proof of Theorem 28, for any $n$ there exists $M(n)$ such that $\mathbb{P}(d(Y|X_{m_1}, Y|X_{m_2}) > 2^{-n}) < 2^{-n}$ whenever $m_1, m_2 \geq M(n)$ , so $(Y|X_m)_{m \in \mathbb{N}}$ converges effectively. $\square$

Theorem 39. Let $\mathcal{X}$ a topology on $\Omega$ such that the quotient space $\Omega/\mathcal{X}$ is a Polish space. Then there is a computable embedding of $(\Omega/\mathcal{X} \rightarrow \mathcal{R}_{\perp \mathcal{X}}(\mathbb{Y})) \hookrightarrow \mathcal{R}(\mathbb{Y})$ extending the operator on continuous random variables given by $Y|\mathcal{X} \mapsto Y|\mathcal{X}(\omega)$ .

Proof. Apply Proposition 38 where $\mathbb{X} = \Omega/\mathcal{X}$ and $X$ is the quotient map. $\square$

The following result is an analogue of classical results on conditioning.

Proposition 40. Suppose $Y|\mathcal{X} : \Omega/\mathcal{X} \rightarrow \mathcal{R}_{\perp \mathcal{X}}(\mathbb{Y})$ and $Y : \mathcal{R}(\mathbb{Y})$ the unconditioned version given by Theorem 39. $Y$ is $\mathcal{X}$ -measurable if, and only if, for all $\omega$ , $Y|\mathcal{X}(\omega)$ is a constant random variable. $Y$ is independent of $\mathcal{X}$ if, and only if, $Y|\mathcal{X}(\omega_1) = Y|\mathcal{X}(\omega_2)$ for all $\omega_1, \omega_2 \in \Omega$ .

We cannot compute $Y$ given a conditional random variable $Y|\mathcal{X}$ , and a $\mathcal{X}$ -measurable random variable $X$ taking values in $\mathbb{X}$ , since the value of $X$ might not be sufficient to identify a unique element of $\mathcal{X}$ . However, we can compute the joint distribution of $X \times Y$ using conditional probabilities. We define these in terms of conditional random variables, and show that they satisfy the usual classical properties.

Definition 41. The conditional probability $\mathbb{P}(Y|\mathcal{X})$ is the function $(\Omega, \mathcal{X}) \rightarrow \mathcal{P}(\mathbb{Y})$ defined by

P(YX):(Ω,X)×O(Y)I<:(ω,V)P([YX(ω)]V).(10)\mathbb{P}(Y|\mathcal{X}) : (\Omega, \mathcal{X}) \times \mathcal{O}(\mathbb{Y}) \rightarrow \mathbb{I}_< : (\omega, V) \mapsto \mathbb{P}([Y|\mathcal{X}(\omega)] \in V). \quad (10)

We define $\mathbb{P}(Y \in V|\mathcal{X}) : (\Omega, \mathcal{X}) \rightarrow \mathbb{I}<$ , $\mathbb{P}(Y|x) : \mathcal{O}(\mathbb{Y}) \rightarrow \mathbb{I}$ and $\mathbb{P}(Y \in V|x) : \mathbb{I}<$ in the natural way.

We now show that the joint distribution of $\mathcal{X}$ -measurable $X$ and $Y$ can be computed from $X$ and $Y|\mathcal{X}$ . As a consequence of Theorems 22 and 27, we can define a distribution conditional on a random variable lying in a given set:

Definition 42. Let $X : \mathcal{R}(\mathbb{X})$ be a random variable, and $U \in \mathcal{O}(\mathbb{X})$ . Define the induced valuation on $\Omega$ by

P[XU](V)=P(X×IU×V).P[\cdot|X \in U](V) = \mathbb{P}(X \times I \in U \times V).

Write $P(V|X \in U) = P\cdot|X \in U$ .Note that $P[\cdot|X \in U]$ has total measure $P(U)$ , and if $X$ is continuous, $P\cdot|X \in U = P(X^{-1}(U) \cap V)$ .

Further, if $X$ is $\mathcal{X}$ -measurable, then the projection $\Omega \rightarrow \Omega/\mathcal{X}$ induces a measure on $\Omega/\mathcal{X}$ .

Proposition 43. If $X : \mathcal{R}(\mathbb{X})$ is a $\mathcal{X}$ -measurable, and $Y|\mathcal{X} : (\Omega, \mathcal{X}) \rightarrow \mathcal{R}(\mathbb{Y})$ has values which are independent of $\mathcal{X}$ , then

P(XUYV)=[ω]Ω/XP(YX[ω]V)dP([ω]XU).P(X \in U \wedge Y \in V) = \int_{[\omega] \in \Omega/\mathcal{X}} \mathbb{P}(Y|\mathcal{X}[\omega] \in V) dP([\omega]|X \in U).

Proof. The result clearly holds for simple random variables $X$ , and extends to all random variables. $\square$

Note that if $X : \mathcal{R}(\mathbb{X})$ , and $Y|\mathbb{X} : \mathbb{X} \rightarrow \mathcal{R}(\mathbb{Y})$ , then the distributions $\xi = \mathbb{P}[X]$ and $\eta_x = \mathbb{P}[Y|x]$ are computable as valuation. Define the joint distribution $\xi \rtimes \eta$ on $\mathbb{X} \times \mathbb{Y}$ by its integrals

X×Yψ(x,y)dξη(x,y)=XYψ(x,y)dηx(y)dξ(x)(11)\int_{\mathbb{X} \times \mathbb{Y}} \psi(x, y) d\xi \rtimes \eta(x, y) = \int_{\mathbb{X}} \int_{\mathbb{Y}} \psi(x, y) d\eta_x(y) d\xi(x) \quad (11)

In particular $\xi \rtimes \eta(U \times V) = \int_{x \in U} \eta_x(V) d\xi(x)$ . Proposition 43 strengthens this result by weakening the requirement that $Y$ is defined on $\mathbb{X}$ itself.

Definition 44. Suppose each $Y|\mathcal{X}[\omega]$ is an integrable random variable. The conditional expectation $\mathbb{E}(Y|\mathcal{X})$ is the function $(\Omega, \mathcal{X}) \rightarrow \mathbb{R}$ defined by

E(YX):ωE(YX(ω)).(12)\mathbb{E}(Y|\mathcal{X}) : \omega \mapsto \mathbb{E}(Y|\mathcal{X}(\omega)). \quad (12)

Then for any random variable $X : \mathcal{R}(\mathbb{X})$ , we have $\mathbb{E}[Y|X] : \mathcal{R}(\mathbb{R})$ by composition.

Proposition 45. If $Y|\mathcal{X} : (\Omega, \mathcal{X}) : \mathcal{R}(\mathbb{Y})$ is independent of $\mathcal{X}$ , then for any $\mathcal{X}$ -measurable $X : \mathcal{R}(\mathbb{R})$ , we have

E(XE(YX))=E(XY).(13)\mathbb{E}(X\mathbb{E}(Y|\mathcal{X})) = \mathbb{E}(XY). \quad (13)

In particular, $\mathbb{E}(Y) = \mathbb{E}(\mathbb{E}(Y|\mathcal{X}))$ .

Proof. If $Y|\mathcal{X}$ takes finitely many values, each on sets $A_i \in \mathcal{X}$ , each value is continuous, and $X$ is simple taking values $x_i$ on $A_i$ , then

E(XY)=ωΩX(ω)YX(ω)(ω)dP(ω)=ωΩi=1kI[ωAi]xiYX(Ai)(ω)dP(ω)=i=1kxiωAiYX(Ai)(ω)dP(ω)=i=1kP(Ai)xiE(YX(Ai))=E(XE(YX)).\begin{aligned} \mathbb{E}(XY) &= \int_{\omega \in \Omega} X(\omega) Y|\mathcal{X}(\omega)(\omega) dP(\omega) = \int_{\omega \in \Omega} \sum_{i=1}^k I[\omega \in A_i] x_i Y|\mathcal{X}(A_i)(\omega) dP(\omega) \\ &= \sum_{i=1}^k x_i \int_{\omega \in A_i} Y|\mathcal{X}(A_i)(\omega) dP(\omega) = \sum_{i=1}^k P(A_i) x_i \mathbb{E}(Y|\mathcal{X}(A_i)) = \mathbb{E}(X\mathbb{E}(Y|\mathcal{X})). \end{aligned}

The result follows by extension to measurable random variables. $\square$

In the definition of conditional random variable, we use objects of type $\mathbb{X} \rightarrow \mathcal{R}(\mathbb{Y})$ , which are random-variable-valued functions, rather than random functions with type $\mathcal{R}(\mathbb{X} \rightarrow \mathbb{Y})$ . The latter type encodes strictly more information than the former.

Theorem 46 (Random function). The natural bijection $\mathcal{R}(\mathbb{X} \rightarrow \mathbb{Y}) \hookrightarrow (\mathbb{X} \rightarrow \mathcal{R}(\mathbb{Y}))$ is computable, but its inverse is not continuous.

Proof. For fixed $x$ , evaluation $\varepsilon_x : (\mathbb{X} \rightarrow \mathbb{Y}) \rightarrow \mathbb{Y} : f \mapsto f(x)$ is computable, so by Theorem 28, $\varepsilon(F) : \mathcal{R}(\mathbb{Y})$ is computable for any $F : \mathcal{R}(\mathbb{X} \rightarrow \mathbb{Y})$ given $x$ . Hence the function $x \mapsto \varepsilon_x(F)$ is computable.

Conversely, let $X = {0, 1}^\omega$ and $Y = {0, 1}$ . Define $F(x, \omega, n) = 1$ if $x|_n = \omega|_n$ , and 0 otherwise. Then for fixed $x$ , $F(d(x, \cdot, n), 0) = 2^{-n}$ , so $F(x, \cdot, n)$ converges to 0 uniformly in $x$ .

For fixed $\omega$ , $d(F(\cdot, \omega, n_1), F(\cdot, \omega, n_2)) = \sup_{x \in X} d(F(x, \omega, n_1), F(x, \omega, n_2)) = 1$ , since (for $n_1 < n_2$ ) there exists $x$ such that $x|{n_1} = \omega|{n_1}$ but $x|{n_2} \neq \omega|{n_2}$ . Hence $d(F(\cdot, \cdot, n_1), F(\cdot, \cdot, n_2)) = 1$ for all $n_1, n_2$ , and the sequence is not a Cauchy sequence in $\mathcal{R}(\mathbb{X} \rightarrow \mathbb{Y})$ . $\square$## 5 Discrete-Time Stochastic Processes

A discrete-time stochastic process with state space $\mathbb{X}$ is a random variable $\vec{X} = (X_0, X_1, X_2, \dots)$ taking values in $\mathbb{X}^\infty$ . A Markov process is a stochastic process such that $X_{n+1}$ depends only on the previous state $X_n$ , so is determined by the conditional value $X_{n+1}|X_n$ , such that $X_{n+1}|X_n = x_n$ is independent of $(X_0, X_1, \dots, X_{n-1})$ . A Markov process is stationary if the distributions of $X_{n+1}|X_n$ , i.e. $\mathbb{P}((X_{n+1}|X_n = x_n) \in U)$ , are equal. In this case we can write $\mathbb{P}(X_{n+1}|X_n) = F_n : \mathbb{X} \rightarrow (\Omega \rightsquigarrow \mathbb{X})$ , where $F_n(x, \omega) = F_n(\omega_0, \omega_1, \dots) = F(\omega_n)$ . Hence the process is defined by $F : \mathbb{X} \rightarrow \mathcal{R}(\mathbb{X})$ .

Typically, we are only interested in the distribution of the states $X_n$ , and so rather than treating $X_n$ as a random variable $X_n : \Omega \rightsquigarrow \mathbb{X}$ , we consider $X_n \in \mathcal{P}(\mathbb{X})$ . Then the Markov process is defined by $F : \mathbb{X} \rightarrow \mathcal{P}(\mathbb{X})$ .

When working in Cartesian-closed categories, objects of the form $(\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ for some fixed type $\mathbb{T}$ are an example of a monad [Str72]. They support standard manipulations which make them ideal for the representation of dynamic systems. When $\mathbb{T} = \mathbb{S}$ , the Sierpinski type, we obtain categories of overt and compact sets [Esc04], which form a basis for discrete-time nondeterministic systems [Col09]. Since $\mathcal{P}(\mathbb{X})$ is a subtype of $(\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H}$ we can take $\mathbb{T} = \mathbb{H}$ and obtain the same operators for discrete-time stochastic systems.

Proposition 47. Let $\mathbb{T}$ , $\mathbb{X}$ and $\mathbb{Y}$ be elements of the category of computable types. Then the following operators are computable:

    1. The embedding of $\mathbb{X}$ in $(\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ given by $\delta_x(\phi) = \phi(x)$ for $x \in \mathbb{X}$ and $\phi : \mathbb{X} \rightarrow \mathbb{T}$ .
    1. The canonical equivalence between $\mathbb{X} \rightarrow ((\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T})$ and $(\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow (\mathbb{X} \rightarrow \mathbb{T})$ given by $F^\psi(x) = F(x)(\psi)$ for $F : \mathbb{X} \rightarrow ((\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T})$ and $\psi : \mathbb{Y} \rightarrow \mathbb{T}$ .*
    1. An element $f$ of $\mathbb{X} \rightarrow \mathbb{Y}$ lifts to an operator $f_$ from $(\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ to $((\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T})$ defined by $f_\mu(\psi) = \mu(\psi \circ f)$ for $\mu : (\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ and $\psi : \mathbb{Y} \rightarrow \mathbb{T}$ .
    1. An element $F$ of $\mathbb{X} \rightarrow ((\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T})$ lifts to an operator $F_$ from $(\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ to $((\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T})$ defined by $F_\mu(\psi) = \mu(\lambda x. F(x)(\psi))$ .
    1. Given $F : (\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ and $G : (\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ , the skew-products $F \times G : (\mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ defined by $(F \times G)(\psi) = F(\lambda x. G(\lambda y. \psi(x, y)))$ . and $(F \ltimes G)(\psi) = G(\lambda y. F(\lambda x. \psi(x, y)))$ .

We write $F \times G$ for the product if $F \times G = F \ltimes G$ for all $F, G$ in some restricted class of interest.

For the case of set types, the embedding $\mathbb{X} \hookrightarrow ((\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T})$ a singleton set; for measures, the point-measure $\delta_x$ . Note that if $F : (\mathbb{X} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ and $G : (\mathbb{Y} \rightarrow \mathbb{T}) \rightarrow \mathbb{T}$ , then in general $F \times G$ and $F \ltimes G$ are not equal. Equality (i.e. commutativity of the product) does hold in many important cases, including products of measures. The generalisation to monads $\mathcal{M}(\mathbb{X})$ requires canonical operators $\mathbb{X} \rightarrow \mathcal{M}(\mathbb{X})$ and $(\mathbb{X} \rightarrow \mathcal{M}(\mathbb{Y})) \rightarrow (\mathcal{M}(\mathbb{X}) \rightarrow \mathcal{M}(\mathbb{Y}))$ .

We now apply the standard push-forward operators of Proposition 47 to the case of probability measures. Computability of the operators on $(\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H}$ is clear, it remains to check the linearity properties and the unit total measure.

Lemma 48. There is a computable point-measure operator taking $x \in \mathbb{X}$ to $\delta_x \in \mathcal{P}(\mathbb{X})$ .

Proof. For $\psi : \mathbb{X} \rightarrow \mathbb{H}$ , define $\delta_x(\psi) = \psi(x)$ . Then $\delta_x(\alpha_1\psi_1 + \alpha_2\psi_2) = (\alpha_1\psi_1 + \alpha_2\psi_2)(x) = \alpha_1\psi_1(x) + \alpha_2\psi_2(x) = \alpha_1\delta_x(\psi_1) + \alpha_2\delta_x(\psi_2)$ , and if $\psi \equiv 1$ , then $\delta_x(\psi) = 1$ , so $\delta_x$ is a probability measure. $\square$Proposition 49. *There is a computable push-forward operator taking a function $F : \mathbb{X} \rightarrow \mathcal{P}(\mathbb{Y})$ and $\mu \in \mathcal{P}(\mathbb{X})$ to the push-forward distribution $F_\mu \in \mathcal{P}(\mathbb{Y})$ is computable.

Proof. For $\psi : \mathbb{Y} \rightarrow \mathbb{T}$ , we have $F_\mu(\psi) = \mu(\lambda x. F(x)(\psi))$ is computable. We need to check that $F_\mu$ is a probability measure. It is easy to verify that $F^(\alpha_1\psi_1 + \alpha_2\psi_2) = \alpha_1F^(\psi_1) + \alpha_2F^*(\psi_2)$ , and hence $F_*\mu(\alpha_1\psi_1 + \alpha_2\psi_2) = \alpha_1F_*\mu(\psi_1) + \alpha_2F_\mu(\psi_2)$ . If $\psi \equiv 1$ , then $\phi = F^\psi = \lambda x. F(x)(\psi) \equiv 1$ since for all $x \in \mathbb{X}$ , $\phi(x) = F(x)(\psi)$ and $F(x)$ is a probability measure. Then $\mu(\phi) = 1$ as $\mu$ is a probability measure. $\square$

Corollary 50. If $f : \mathbb{X} \rightarrow \mathbb{Y}$ , then $f$ induces a computable operator $f_ : \mathcal{P}(\mathbb{X}) \rightarrow \mathcal{P}(\mathbb{Y})$ by $f_*\mu = F_*\mu$ where $F(x) = \delta_{f(x)}$ . Explicitly, $f_\mu(\psi) = \mu(\psi \circ f)$ for $\psi : \mathbb{Y} \rightarrow \mathbb{H}$ .

Proposition 51. *The push-forward operator taking a function $F : \mathbb{X} \rightarrow \mathcal{P}(\mathbb{Y})$ and probability measure $\mu \in \mathcal{P}(\mathbb{X})$ to the joint distribution $(\mu, F_*\mu) = (\text{id} \times F)_\mu$ on $\mathbb{X} \times \mathbb{Y}$ is computable

Proof. We have $F : \mathbb{X} \rightarrow ((\mathbb{Y} \rightarrow \mathbb{H}) \rightarrow \mathbb{H})$ and $\mu : ((\mathbb{X} \rightarrow \mathbb{H}) \rightarrow \mathbb{H})$ . Define $(\text{id} \times F)$ to be the function $\mathbb{X} \rightarrow \mathcal{P}(\mathbb{X} \times \mathbb{Y})$ given by $(\text{id} \times F)(x)(\psi) = F(x)(\lambda y. \psi(x, y))$ . Note that if $\psi : \mathbb{X} \times \mathbb{Y} \rightarrow \mathbb{H}$ is the constant function 1, then $\lambda y. \psi(x, y) \equiv 1$ , so $(\text{id} \times F)(x)(\psi) = F(x)(\phi) = 1$ since $F(x)$ is a probability distribution. Then by Proposition 49, $(\text{id} \times F)_*\mu$ is computable in $\mathcal{P}(\mathbb{X} \times \mathbb{Y})$ . $\square$

We first consider the simplest approach to stochastic processes, where we only compute the distribution of the states. A Markov process is then defined by a stochastic update rule $F$ for states of a dynamic system. Given $x \in \mathbb{X}$ , the probability distribution of the next state is $F(x)$ . Denoting the state at time $n$ by a random variable $\mathbb{X}n$ , a Markov process can be written $F(x)(U) = \mathbb{P}(\mathbb{X}{n+1} \in U \mid \mathbb{X}_n = x)$ .

Definition 52. The type of simple Markov processes on a type $\mathbb{X}$ is $\mathbb{X} \rightarrow \mathcal{P}(\mathbb{X})$ .

Since a continuous function $f : \mathbb{X} \rightarrow \mathbb{Y}$ induces a natural operator $F : \mathbb{X} \rightarrow \mathcal{P}(\mathbb{Y})$ by $F(x)(\psi) = \psi(f(x))$ for $\psi : \mathbb{Y} \rightarrow \mathbb{H}$ , any deterministic system can be seen as a stochastic system.

The main result on Markov processes is that given the probability distribution $\mu_0$ of the state $x_0$ at time 0, we can compute the joint probability distributions up to time $n$ .

Theorem 53. Let $F : \mathbb{X} \rightarrow \mathcal{P}(\mathbb{X})$ be a Markov process. Then given a probability distribution $\mu_0$ of the initial state $x_0$ , the probability distributions $\mu_n$ of the state $x_n$ at time $n$ , and the joint probability distribution $\gamma_n$ of the states $(x_0, \dots, x_n)$ up to time $n$ , are computable.

The proof is trivial given the categorical constructions of Proposition 47:

Proof. Compute $\mu_n \in \mathcal{P}(\mathbb{X})$ recursively by $\mu_n = F_*\mu_{n-1}$ , which are computable by Proposition 49. Compute the joint distributions $\gamma_n \in \mathcal{P}(\mathbb{X}^{n+1})$ recursively by $\gamma_0 = \mu_0$ and $\gamma_n = (\text{id} \times F)*\gamma{n-1}$ , which are computable by Proposition 51. $\square$

Note that $\mu_n = (\pi_n)_*\gamma_n$ , where $\pi_n : \mathbb{X}^{n+1} \rightarrow \mathbb{X}$ is given by $\pi_n(x_0, \dots, x_n) = x_n$ ; in other words, the distribution at time $n$ can be extracted from the joint distribution up to time $n$ .

We can also consider the state as a random variable on the base probability space $\Omega$ . This approach yields a random variable $X_n$ for the state at time $n$ .

Definition 54. A parameterised Markov process on a type $\mathbb{X}$ is defined by a conditional random variable $F : \mathbb{X} \rightarrow \mathcal{R}(\mathbb{X})$ and a random variable $X_0 : \mathcal{R}(\mathbb{X})$ .

Given a parameterised Markov process, we can trivially extract the distribution of $X_0$ and the conditional distribution function $\Phi : \mathbb{X} \rightarrow \mathcal{P}(\mathbb{X})$ by $\Phi(x)(\psi) = \mathbb{P}(\lambda \omega. \psi(F(x, \omega)))$ .

The following result shows that a parameterised Markov process gives rise to random variables $X_0, X_1, X_2, \dots$ over the probability space $(\Omega, P)^\omega$ .Theorem 55. If $F : \mathbb{X} \rightarrow (\Omega \rightsquigarrow \mathbb{X})$ is a parameterised Markov process, and $X_{\text{init}} : \Omega \rightsquigarrow \mathbb{X}$ is a random variable giving the initial probability distribution, then we can compute the stochastic process $(X_0, X_1, X_2, \dots)$ as a random variable $\Omega \rightsquigarrow \mathbb{X}^\infty$ .

Proof. Let $\Omega_i$ be a copy of $\Omega$ for each $i \in \mathbb{N}$ , and define $X_n : \Omega_0 \times \Omega_1 \times \dots \times \Omega_n \times \dots \rightarrow \mathbb{X}$ recursively by $X_0(\omega_0, \omega_1, \omega_2, \dots) = X_{\text{init}}(\omega_0)$ and $X_n(\omega_0, \omega_1, \dots, \omega_n, \dots) = F(X_{n-1}(\omega_0, \omega_1, \dots, \omega_{n-1}, \dots))(\omega_n)$ . Then each $X_n$ is computable by computability of random variables from conditional random variables given by Theorem 39. Further, $X_n$ is dependent on $(\omega_0, \omega_1, \dots, \omega_n)$ only. $\square$

6 The Wiener process

The Wiener process $W(t)$ or $W_t$ is a random process such that $W(0) = 0$ , the distribution function $t \mapsto W(t)$ is almost surely continuous in the weak topology, and $W(t)$ has independent increments with $W(t) - W(s) \sim N(0, t - s)$ for $0 \leq s < t$ , where $N(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$ . The Wiener process is used in the definition of a stochastic differential equation

dX(t)=f(X(t),t)dt+g(X(t),t)dW(t).dX(t) = f(X(t), t) dt + g(X(t), t) dW(t).

There are many comprehensive books available for continuous-time stochastic processes, notably [Fri75, Eva13]

Theorem 56. A sample path of the Wiener process is almost-surely $\alpha$ -Hölder continuous for all $\alpha < 1/2$ .

The following result on the maximum of the Wiener process up to a given time is based on the André reflection principle.

Theorem 57. Denote by $M(t)$ the maximum of the Wiener process up to time $t$ . Then

P(M(t)X)=2P(W(t)>X).\mathbb{P}(M(t) \geq X) = 2\mathbb{P}(W(t) > X).

There are two main constructions of a Wiener process. The Paley-Wiener construction yields a Wiener process on $[0, 1]$ as

W(t)=A0t+2πn=1Ansin(nπt)nW(t) = A_0 t + \frac{\sqrt{2}}{\pi} \sum_{n=1}^{\infty} A_n \frac{\sin(n\pi t)}{n}

where the $A_n$ are independent $N(0, 1)$ random variables. The simpler Lévy-Ciesielski construction uses wavelets. Let $h_{n,k}$ be the $(n, k)$ -th Haar function, defined for $0 \leq k < 2^n$ by

hn,k(x)={+2n/2for k2ntk+1/22n,2n/2for k+1/22ntk+12n,0otherwise.h_{n,k}(x) = \begin{cases} +2^{n/2} & \text{for } \frac{k}{2^n} \leq t \leq \frac{k+1/2}{2^n}, \\ -2^{n/2} & \text{for } \frac{k+1/2}{2^n} \leq t \leq \frac{k+1}{2^n}, \\ 0 & \text{otherwise.} \end{cases}

Let be $s_{n,k}$ be the $(n, k)$ -th Schauder function defined by

sn,k(t)=0thn,k(τ)dτ.s_{n,k}(t) = \int_0^t h_{n,k}(\tau) d\tau.

Note that $\sup_{t \in [0,1]} s_{n,k}(t) = 2^{-n/2}$ . Let $A_{n,k}$ be a sequence of independent $N(0, 1)$ random variables on a probability space $(\Omega, P)$ . Then

W(t)=n=0k=02n1An,ksn,k(t)W(t) = \sum_{n=0}^{\infty} \sum_{k=0}^{2^n-1} A_{n,k} s_{n,k}(t)is a Wiener process on $[0, 1]$ .

It should be noted that the sum $\sum_{n=0}^{\infty} \sum_{k=0}^{2^n-1} A_{n,k}(\omega) s_{n,k}(t)$ does not converge for all values of the random variables $A_{n,k}$ . However, if the $A_{n,k}$ have growth bounded by $\alpha^{n/2}$ where $\alpha < 2$ , then $\sum_{n=0}^{\infty} \sum_{k=0}^{2^n-1} A_{n,k} s_{n,k}(t)$ converges uniformly. By the Borel-Cantelli lemma, $\mathbb{P}(A_{n,k} \geq 2^{n/2} \text{ i.o.}) = 0$ .

However, given only finitely many values of $A_{n,k}(\omega)$ , we cannot compute a uniform approximation to the sample path $W(\omega)$ , or even an approximation in $L^2([0, 1])$ . In other words, the function $\omega \rightarrow \sum A_{n,k}(\omega) s_{n,k}$ is not a computable function from $\Omega$ to $C([0, 1])$ or $L^2([0, 1])$ . However, it is the case that for any open subset $U$ of $C([0, 1])$ , the probability $P({\omega \mid \sum A_{n,k}(\omega) s_{n,k} \in U})$ is computable in $\mathbb{H}$ . Further, there is a sequence of closed compact subsets $K_n$ of $\Omega$ such that $\mathbb{P}(K_n) \rightarrow 1$ as $n \rightarrow \infty$ and $W$ is computable on each $K_n$ .

We now give a modification of the Lévy-Ciesielski construction with base space $\Omega = {0, 1}^\omega$ for which the Wiener process is a continuous function $W : \Omega \rightarrow C([0, 1])$ . In fact, we obtain sample paths which are Hölder-continuous in $C^\alpha$ for any $\alpha < 1/2$ , though we shall only prove the continuous case.

Theorem 58 (Computable Wiener process). Let $\Omega = {0, 1}^\omega$ and $P$ be the standard probability measure on $\Omega$ . Then there exists a computable Wiener process $W : \Omega \rightarrow C[0, 1]$ with open full measure domain.

Sketch of proof. The basic idea is to modify the Lévy-Ciesielski construction so that after a finite number of bits of information we can bound the size of $A_{n,k}$ for all sufficiently large $n$ .

For the event described by $|A_{n,k}| < n$ whenever $n \geq m$ , we have

n=mk=12nP(An,k<n)1n=m2nP(An,kn)1n=m2n212πnet2/2dt1n=m2n+1en2/412πnet2/4dt1n=m2n+14n41=112m.\begin{aligned} \prod_{n=m}^{\infty} \prod_{k=1}^{2^n} \mathbb{P}(|A_{n,k}| < n) &\geq 1 - \sum_{n=m}^{\infty} 2^n \mathbb{P}(|A_{n,k}| \geq n) \geq 1 - \sum_{n=m}^{\infty} 2^n \cdot 2 \cdot \frac{1}{\sqrt{2\pi}} \int_n^{\infty} e^{-t^2/2} dt \\ &\geq 1 - \sum_{n=m}^{\infty} 2^{n+1} \cdot e^{-n^2/4} \cdot \frac{1}{\sqrt{2\pi}} \int_n^{\infty} e^{-t^2/4} dt \geq 1 - \sum_{n=m}^{\infty} 2^{n+1} \cdot 4^{-n} \cdot 4^{-1} = 1 - \frac{1}{2^m}. \end{aligned}

whenever $m \geq 6$ , since for $n \geq 6$ we have $e^{-n^2/4} < 4^{-n}$ and $\frac{1}{\sqrt{2\pi}} \int_n^{\infty} e^{-t^2/4} < 1/4$ .

We can therefore construct numbers $\beta_{m,n}$ such that $\beta_{m,n} = n$ whenever $n > m$ , $\beta_{m+1,n} \geq \beta_{m,n}$ for all $m, n$ , and

P(n=0,,,k=0,,2n1,An,k<βm,n)=1/2m.\mathbb{P}(\forall n = 0, \dots, \infty, \forall k = 0, \dots, 2^n - 1, |A_{n,k}| < \beta_{m,n}) = 1/2^m.

We now partition a full-measure open subset of $\Omega$ into sets $\Omega_m$ of measure $1/2^{m+1}$ such that every $|A_{n,k}(\omega)| < \beta_{m,n}$ but not every $|A_{n,k}(\omega)| < \beta_{m-1,n}$ whenever $\omega \in \Omega_m$ . On each $\Omega_m$ we can computably construct the corresponding values of $A_{n,k}(\omega)$ . In particular, on $\Omega_m$ , every $A_{n,k}$ is bounded and $|A_{n,k}| < n$ whenever $n > m$ , so

W(ω,t)n=0mk=02n1An,k(ω)sn,k(t)=n=m+1k=02n1An,k(ω)sn,k(t)n=m+1n2n/20 as m.\begin{aligned} |W(\omega, t) - \sum_{n=0}^m \sum_{k=0}^{2^n-1} A_{n,k}(\omega) s_{n,k}(t)| &= \left| \sum_{n=m+1}^{\infty} \sum_{k=0}^{2^n-1} A_{n,k}(\omega) s_{n,k}(t) \right| \\ &\leq \sum_{n=m+1}^{\infty} n \cdot 2^{-n/2} \rightarrow 0 \text{ as } m \rightarrow \infty. \end{aligned}

7 Stochastic integration

A continuous-time real-valued stochastic process defined over the interval $[0, T]$ is a random variable taking values in $\mathcal{C}([0, T]; \mathbb{R})$ . Since the indefinite integral $\mathcal{C}([0, T]; \mathbb{R}) \rightarrow \mathcal{C}([0, T]; \mathbb{R})$taking $\xi$ to the function $t \mapsto \int_0^t \xi(s) ds$ is computable, so is the integral $t \mapsto \int_0^t X(s) ds$ . In stochastic integration, we aim to give a meaning to the integral

0tX(s)dW(s)\int_0^t X(s) dW(s)

for a process $X$ with respect to the Wiener process.

We say that a process $X(t)$ is nonanticipative with respect to the Wiener process if $X(t)$ depends only on $X_0$ and on $W|{[0,t]}$ , the restriction of $W$ to $[0,t]$ . Formally, letting $\mathcal{F}t$ be the topology on $\Omega$ generated by $X_0$ and $W|{[0,t]}$ , then $X|{[0,t]}$ is a limit of $\mathcal{F}_t$ -continuous functions $X_n : \Omega \rightarrow C([0,t]; \mathbb{R})$ .

It turns out that this integral cannot be computed pathwise by the Stieltjes integral. Instead, one uses the Itô integral, which is first defined for step processes, and then extended to continuous processes. In this section, we prove that the standard construction of the Itô integral effectivises.

A stochastic process $X(\cdot)$ is a step process if there are random variables $X_i$ , $i = 0, \dots, n-1$ and times $0 = t_0 < t_1 < \dots < t_n = T$ such that $X(t) = X_i$ for $t \in [t_i, t_{i+1})$ . We formally write $X(t) = X_i I[t \in [t_i, t_{i+1})]$ , where $I[t \in [t_i, t_{i+1})]$ is the indicator function with value 1 if $t \in [t_i, t_{i+1})$ and 0 otherwise. It is straightforward to show that if $\mathbb{E}(X_i^2) < \infty$ for all $i$ , then the step process is well-defined as an element of $M^2(L^2([0,T]; \mathbb{R}))$ , where $L^2(L^2([0,T]; \mathbb{R}))$ is the space of Lesbesgue-integrable functions on $[0,T]$ , and $M^2$ the space of square-integrable random variables.

We first show that given $\xi \in C([0,T]; \mathbb{R})$ , we can compute step functions $\eta$ taking values in the Lesbesgue space $L^2([0,T]; \mathbb{R})$ .

Theorem 59. Given $\xi \in C([0,T]; \mathbb{R})$ , we can compute a sequence of step function $\eta_n : [0,T] \rightarrow \mathbb{R}$ such that $\eta_n \rightarrow \xi$ effectively in $L^2([0,T]; \mathbb{R})$ .

Proof. Choose a sequence $\delta_n > 0$ effectively converging to 0, an choose partitions $\mathcal{T}n = {0 = t{n,0} < t_{n,1} < \dots < t_{n,m_n} = T}$ where each $t_{n,i}$ is computable and $t_{n,i+1} - t_{n,i} < \delta_n$ for all $n, i$ . Compute $\eta_{n,i} = \xi(t_{n,i})$ and define $\eta_n(t) = \eta_{n,i}$ for $t_{n,i} \leq t < t_{n,i+1}$ . Clearly $\eta_n \in L^2([0,T]; \mathbb{R})$ . The integral $\int_{t=0}^T (\xi(t) - \eta_n(t))^2 dt = \sum_{i=0}^{m_n-1} \int_{t_i}^{t_{i+1}} (\xi(t) - \xi(t_{n,i}))^2 dt$ is computable, and converges to 0 as $n \rightarrow \infty$ by continuity of $\xi$ , so convergence is effective. $\square$

Note that continuity of $\xi$ is required to compute $\xi(t_i)$ , but we do not need to know the modulus of continuity to compute the rate of convergence of $\eta_n$ . By Theorem 28, this pathwise computation extends to random variables, and it is clear that if $X$ is nonanticipative with respect to $W$ , then so are the step processes $X_n$ .

Definition 60 (Itô integral for step processes). Given a step process $X = \sum_{i=0}^{n-1} X_i I[t \in [t_i, t_{i+1})]$ , we define the Itô integral as

0TX(t)dW(t)=i=0n1Xi(W(ti+1)W(ti)).\int_0^T X(t) dW(t) = \sum_{i=0}^{n-1} X_i (W(t_{i+1}) - W(t_i)).

This definition can be extended to an indefinite integral: Take $m(s) = \max{i \mid t_i < s}$ and define

0tX(t)dW(t)=i=0m(s)1Xi(W(ti+1)W(ti))+Xm(s)(W(s)W(tm(s))).(14)\int_0^t X(t) dW(t) = \sum_{i=0}^{m(s)-1} X_i (W(t_{i+1}) - W(t_i)) + X_{m(s)} (W(s) - W(t_{m(s)})). \quad (14)

Lemma 61. *The Itô integral of a step process is computable as a continuous process. Further, if $X(\cdot)$ is nonanticipative with respect to the Wiener process, then so is its Itô integral.*Proof. By the defining equation (14), $W(t)$ is continuous when restricted to each interval $[t_i, t_{i+1}]$ , and clearly the integral is continuous over the step boundaries. It is also clear that the Itô integral is nonanticipative, at time $t$ since it depends only on $W(s)$ for $s \leq t$ . $\square$

The following Itô equality is crucial, since it relates the stochastic integral with an ordinary integral.

Lemma 62. If $X = \sum X_k \chi_{[t_k, t_{k+1})}$ is a step process, and $X_k$ is independent of $W(t) - W(s)$ for all $t > s > t_k$ , then

E(0TX(t)dW(t))2=E0TX(t)2dt.(15)\mathbb{E}\left(\int_0^T X(t) dW(t)\right)^2 = \mathbb{E} \int_0^T X(t)^2 dt. \quad (15)

Proof.

E(0TX(t)dW(t))2=E(0TX(s)dW(s)0TX(t)dW(t))=E(i=0m1Xi(W(ti+1)W(ti))j=0m1Xj(W(tj+1)W(tj)))=i,j=0m1E(XiXj(W(ti+1)W(ti))(W(tj+1)W(tj)))\begin{aligned} \mathbb{E}\left(\int_0^T X(t) dW(t)\right)^2 &= \mathbb{E}\left(\int_0^T X(s) dW(s) \int_0^T X(t) dW(t)\right) \\ &= \mathbb{E}\left(\sum_{i=0}^{m-1} X_i (W(t_{i+1}) - W(t_i)) \sum_{j=0}^{m-1} X_j (W(t_{j+1}) - W(t_j))\right) \\ &= \sum_{i,j=0}^{m-1} \mathbb{E}(X_i X_j (W(t_{i+1}) - W(t_i))(W(t_{j+1}) - W(t_j))) \end{aligned}

If $i < j$ , then since $X_i, X_j$ are independent of $(W(t_{j+1}) - W(t_j))$ ,

E(XiXj(W(ti+1)W(ti))(W(tj+1)W(tj)))=E(XiXj(W(ti+1)W(ti)))E(W(tj+1)W(tj))=0\begin{aligned} &\mathbb{E}(X_i X_j (W(t_{i+1}) - W(t_i))(W(t_{j+1}) - W(t_j))) \\ &= \mathbb{E}(X_i X_j (W(t_{i+1}) - W(t_i))) \mathbb{E}(W(t_{j+1}) - W(t_j)) = 0 \end{aligned}

and a similar estimate holds for $i > j$ . Hence

E(0TX(t)dW(t))2=i=0m1E(Xi2(W(ti+1)W(ti))2)=i=0m1E(Xi2)E(W(ti+1)W(ti))2=i=0m1E(Xi2)(ti+1ti)=E(i=0m1Xi2(ti+1ti))=E(0TX(t)2dt).\begin{aligned} \mathbb{E}\left(\int_0^T X(t) dW(t)\right)^2 &= \sum_{i=0}^{m-1} \mathbb{E}(X_i^2 (W(t_{i+1}) - W(t_i))^2) = \sum_{i=0}^{m-1} \mathbb{E}(X_i^2) \mathbb{E}(W(t_{i+1}) - W(t_i))^2 \\ &= \sum_{i=0}^{m-1} \mathbb{E}(X_i^2) (t_{i+1} - t_i) = \mathbb{E}\left(\sum_{i=0}^{m-1} X_i^2 (t_{i+1} - t_i)\right) \\ &= \mathbb{E}\left(\int_0^T X(t)^2 dt\right). \end{aligned}

$\square$

In order to bound the expected maximum value along a path, we will use martingale properties of the integrated process.

Definition 63 ((Sub)martingale). A discrete stochastic process $X$ is a martingale if for all $k$ , $\mathbb{E}(|X_k|) < \infty$ and $\mathbb{E}(X_{k+1}|\mathcal{F}k) = X_k$ , and a submartingale if for all $k$ , $\mathbb{E}(|X_k|) < \infty$ and $\mathbb{E}(X{k+1}|\mathcal{F}_k) \geq X_k$ .

A stochastic process $X$ is a martingale if for all $t$ , $\mathbb{E}(|X_t|) < \infty$ and for all $t > s$ , $\mathbb{E}(X_t|X_s) = X_s$ , and a submartingale if $\mathbb{E}(X_t|X_s) \geq X_s$ .

We can give sufficient conditions for a process to be a (sub)martingale avoiding the use of conditional expectation. We say $X$ has independent increments if for any $t_0 < t_1 < \dots < t_n$ , the increments $X_{t_i} - X_{t_{i-1}}$ for $i = 1, \dots, n$ are all independent. If $X$ has independent increments, then $X$ is a martingale if $\mathbb{E}(X_t - E_s) = 0$ whenever $t > s$ and a submartingale if $\mathbb{E}(X_t - X_s) \geq 0$ .Lemma 64. The Itô integral of a step process has independent increments with zero expectation, so is a martingale.

Proof. Let $Y$ be the integrated process $Y(t) = \int_0^t X(s)dW(s) = Y(s) + \int_s^t X(r) dW(r)$ . Then

Y(t)=Y(s)+Xm(s)(W(tm(s)+1)W(s))+i=m(s)m(t)1Xi(W(ti+1)W(ti))Y(s)+Xm(t)(W(t)W(tm(t))).Y(t) = Y(s) + X_{m(s)}(W(t_{m(s)+1}) - W(s)) + \sum_{i=m(s)}^{m(t)-1} X_i(W(t_{i+1}) - W(t_i))Y(s) \\ + X_{m(t)}(W(t) - W(t_{m(t)})).

Since $W(t_3) - W(t_2)$ is independent of $X(t_1)$ whenever $t_1 < t_2 < t_3$ , we have $\mathbb{E}(Y(t)) = \mathbb{E}(Y(s))$ or $\mathbb{E}(Y(t) - Y(s)) = 0$ . $\square$

Lemma 65. If $(X_k){k=1,\dots,m}$ is a martingale and $\phi$ is convex, then $(\phi(X_k)){k=1,\dots,m}$ is a submartingale. Similarly, if $X(\cdot)$ is a martingale, then $\phi(X(\cdot))$ is a submartingale.

Proof. By Jensen's inequality, $\mathbb{E}(\phi(X_{k+1})|\mathcal{F}k) \geq \phi(\mathbb{E}(X{k+1}|\mathcal{F}_k)) = \phi(\mathbb{E}(X_k))$ . $\square$

We now give some estimates known as martingale inequalities which will allow us to compute limits of processes in $C([0, T]; \mathbb{R})$ .

Lemma 66 (Discrete submartingale inequality). Let $(X_k)$ be a discrete positive submartingale, and $Y_n = \max_{k=1,\dots,n} X_k$ . Then for any $\lambda > 0$ ,

λP(Ynλ)E(XnI[Ynλ])EXn.(16)\lambda \mathbb{P}(Y_n \geq \lambda) \leq \mathbb{E}(X_n I[Y_n \geq \lambda]) \leq \mathbb{E}X_n. \quad (16)

Note that the expression $\mathbb{E}(X_n I[Y_n \geq \lambda])$ makes sense since the function $\max : \mathbb{R}^n \rightarrow \mathbb{R}$ is computable and the characteristic function $\xi_{[\lambda, \infty)}$ is computable $\mathbb{R} \rightarrow [0, 1]>$ for a given $\lambda$ . Hence $\mathbb{E}(X_n I[Y_n \geq \lambda])$ is computable as the upper integral $\mathbb{E}>(f(X_1, \dots, X_n))$ where $f = \chi_{\geq \lambda} \circ \max$ .

The basic idea of the proof is to consider events $X_k \geq \lambda$ and for all $i < k$ , $X_i < \lambda$ . However, these sets of events are neither closed nor open, so we cannot directly probabilities or expectations.

Proof. Let $A$ be the event $\max_{k=1,\dots,n} X_k \geq \lambda$ . For fixed $\delta > 0$ , let $A_{\delta,k}$ be the event $\bigwedge_{i=1}^{k-1} (X_i \leq \lambda - \delta) \wedge (X_k \geq \lambda)$ , and $A_\delta$ the event $\bigcup_{k=1}^n A_{\delta,k}$ . Note that $A = \bigcup_{\delta>0} A_\delta$ . Since $A_{\delta,k}$ holds on a closed set, the characteristic function is upper semicontinuous and we can consider the upper horizontal integral of $\chi_{A_{\delta,k}}$ . Since $X_n - X_k$ is independent of $X_1, \dots, X_k$ for $k < n$ , and hence independent of $A_{\delta,k}$ , we have

E(XnχAδ,k)=E((XnXk)χAδ,k)+E(XkχAδ,k)=E(XnXk)E(χAδ,k)+E(XkχAδ,k)E(XkχAδ,k).\mathbb{E}(X_n \chi_{A_{\delta,k}}) = \mathbb{E}((X_n - X_k) \chi_{A_{\delta,k}}) + \mathbb{E}(X_k \chi_{A_{\delta,k}}) \\ = \mathbb{E}(X_n - X_k) \mathbb{E}(\chi_{A_{\delta,k}}) + \mathbb{E}(X_k \chi_{A_{\delta,k}}) \geq \mathbb{E}(X_k \chi_{A_{\delta,k}}).

Then summing probabilities over each $A_{\delta,k}$ gives

λP(XAδ)k=1nλP(Aδ,k)=k=1nE(λχAδ,k)k=1nE(XkχAδ,k)k=1nE(XnχAδ,k)=E(k=1nXnχAδ,k)=E(XnχAδ)\lambda \mathbb{P}(X \in A_\delta) \leq \sum_{k=1}^n \lambda \mathbb{P}(A_{\delta,k}) = \sum_{k=1}^n \mathbb{E}(\lambda \chi_{A_{\delta,k}}) \\ \leq \sum_{k=1}^n \mathbb{E}(X_k \chi_{A_{\delta,k}}) \leq \sum_{k=1}^n \mathbb{E}(X_n \chi_{A_{\delta,k}}) = \mathbb{E}(\sum_{k=1}^n X_n \chi_{A_{\delta,k}}) = \mathbb{E}(X_n \chi_{A_\delta})

By Lemma 4, $\mathbb{P}(X \in A_\delta) \rightarrow \mathbb{P}(X \in A)$ as $\delta \rightarrow 0$ , and the result follows. $\square$

We can also show the standard result that if $X_t$ is a martingale, and $\mathbb{E}|X_T|^\alpha < \infty$ for some $\alpha \geq 1$ , then $\mathbb{P}(\max_{t \leq T} |X_t| \geq \epsilon) \leq \frac{1}{\epsilon^\alpha} \mathbb{E}|X_T|^\alpha$ , but will not need this in the sequel. Instead, we use the following extension to square-integrable martingales:

Lemma 67 (Discrete integrable martingale inequality). Let $(X_k)_{k=1,\dots,n}$ be a discrete martingale. Then

E(maxk=1,,nXk2)4E(Xn2).(17)\mathbb{E}(\max_{k=1,\dots,n} |X_k|^2) \leq 4\mathbb{E}(X_n^2). \quad (17)Proof. Define $Y = \max_{k=1,\dots,n} |X_k|$ and $Z = X_n$ . By the stronger form of the submartingale inequality of Lemma 66, we have $\lambda \mathbb{P}(Y \geq \lambda) \leq \mathbb{E}(I[Y \geq \lambda]Z)$ . Since $Y = \int_0^\infty I[Y \geq \lambda] d\lambda$ and $Y^2 = 2 \int_0^\infty \lambda I[Y \geq \lambda] d\lambda$ , we have

EY2=2E0λI[Yλ]dλ2E0I[Yλ]Zdλ=2E(Z0I[Yλ]dλ)=2E(ZY).\mathbb{E}Y^2 = 2\mathbb{E} \int_0^\infty \lambda I[Y \geq \lambda] d\lambda \leq 2\mathbb{E} \int_0^\infty I[Y \geq \lambda] Z d\lambda = 2\mathbb{E} \left( Z \int_0^\infty I[Y \geq \lambda] d\lambda \right) = 2\mathbb{E}(ZY).

Hölder's inequality gives $\mathbb{E}(YZ) \leq (\mathbb{E}(Y^2))^{1/2} (\mathbb{E}(Z^2))^{1/2}$ . Therefore we have $\mathbb{E}(Y^2) \leq 2\mathbb{E}(YZ) \leq 2(\mathbb{E}(Y^2))^{1/2} (\mathbb{E}(Z^2))^{1/2}$ . Hence $(\mathbb{E}Y^2)^{1/2} \leq 2(\mathbb{E}Z^2)^{1/2}$ yielding $\mathbb{E}Y^2 \leq 4\mathbb{E}Z^2$ . $\square$

The results above on discrete (sub)martingales extend to continuous processes. We will require:

Theorem 68 (Integrable martingale inequality). Let $X(\cdot)$ be a martingale. Then

E(maxt[0,T]X(t)2)4E(X(T)2).(18)\mathbb{E}(\max_{t \in [0,T]} |X(t)|^2) \leq 4\mathbb{E}(X(T)^2). \quad (18)

Proof. Let ${t_1, t_2, \dots}$ be a dense subset of $[0, T]$ . Then by Lemma 67, $\mathbb{E}(\max_{t \in [0,T]} |X(t)|^2) = \lim_{n \rightarrow \infty} \mathbb{E}(\max_{k=1,\dots,n} |X(t_k)|^2) \leq \lim_{n \rightarrow \infty} 4\mathbb{E}(X(T)^2) = 4\mathbb{E}(X(T)^2)$ . $\square$

Combining these results, we see that for nonanticipative step processes $X(\cdot)$ , the Itô integral $\int_{s=0}^t X(s) dW(s)$ is computable and is a continuous martingale. The Itô equality

E(0TX(t)dW(t))2=E(0TX(t)dt)2.\mathbb{E} \left( \int_0^T X(t) dW(t) \right)^2 = \mathbb{E} \left( \int_0^T X(t) dt \right)^2.

shows that the integrated process is a square-integrable random variable, and the martingale inequality

E(maxt[0,T]0tX(s)dW(s)2)4E(0TX(t)dW(t))2\mathbb{E} \left( \max_{t \in [0,T]} \left| \int_0^t X(s) dW(s) \right|^2 \right) \leq 4\mathbb{E} \left( \int_0^T X(t) dW(t) \right)^2

gives uniform bounds on the integrated process. Combining these inequalities gives

E(maxt[0,T]0tX(s)dW(s)2)4E(0TX(t)dt)2.\mathbb{E} \left( \max_{t \in [0,T]} \left| \int_0^t X(s) dW(s) \right|^2 \right) \leq 4\mathbb{E} \left( \int_0^T X(t) dt \right)^2.

In terms of norms on the processes, we have

XdW,22X2,2.\| \int X dW \|_{\infty,2} \leq 2 \| X \|_{2,2}.

Now if $X_n$ is a Cauchy sequence of step processes converging effectively to $X_\infty$ in the $| \cdot |_{2,2}$ norm, we have form $m \geq n$ ,

XmXndW,22XmXn2,22(XmX2,2+XnX2,2).\| \int X_m - X_n dW \|_{\infty,2} \leq 2 \| X_m - X_n \|_{2,2} \leq 2(\| X_m - X_\infty \|_{2,2} + \| X_n - X_\infty \|_{2,2}).

Hence $\int X_n dW$ converges effectively in the $| \cdot |_{\infty,2}$ norm. Further, for continuous processes

0TX(t)2dtTmaxt[0,T](X(t)2)=T(maxt[0,T]X(t))2,\int_0^T X(t)^2 dt \leq T \max_{t \in [0,T]} (X(t)^2) = T(\max_{t \in [0,T]} |X(t)|)^2,

Theorem 69 (Computability of the Itô integral). If $X$ is a square-integrable step process, then $\int X dW$ is a continuous process computable from $X$ . If $(X_n){n \in \mathbb{N}}$ is a sequence of step processes converging effectively to $X_\infty$ in the $| \cdot |{2,2}$ norm, then $\int X_n dW$ is a Cauchy sequence converging effectively in the $| \cdot |_{\infty,2}$ norm to a process which we define as $\int X_\infty dW$ . Further, if $X$ is a continuous process, then

XdW,22TX,2.\| \int X dW \|_{\infty,2} \leq 2\sqrt{T} \| X \|_{\infty,2}.Remark 70. If $X$ is a continuous computable non-anticipative process such that $|X|_{\infty,2} < \infty$ , then $\int X dW$ is computable in $\mathcal{R}(\mathcal{C}([0,1] \rightarrow \mathbb{R}))$ from process $X$ .

Remark 71. Also, Theorem 69, which asserts the computability of the Itô integral, is much more general than the author seems to imply. First, there are some common Itô-integrable processes, such as $\text{sign } W(t)$ , which are neither step processes nor have continuous (even right-continuous) paths. Moreover, it is implicit in the author's work how to handle these processes computably. The space of square $W$ -integrable processes (up to some basic equivalence) is just the closed subspace of $L^2(\Omega \times [0, T], P \otimes dt)$ spanned by the nonanticipative step processes relative to $W$ . Then it is clear that for any $X$ in this space, $X$ is a function of $W$ , and $X_t(\omega) = X(W(\omega), t)$ is a random variable (computable from $W$ and $X$ ) where $(X, W)$ has the desired joint distribution with $W$ .

8 Stochastic differential equations

We now consider stochastic differential equations

dX(t)=f(X(t))dt+g(X(t))dW(t);X(0)=X0.dX(t) = f(X(t)) dt + g(X(t)) dW(t); \quad X(0) = X_0.

The integral form is

X(t)=X0+0tf(X(s))ds+0tg(X(s))dW(s)=:J[X](t).X(t) = X_0 + \int_0^t f(X(s)) ds + \int_0^t g(X(s)) dW(s) =: J[X](t).

We assume $f, g : \mathbb{R} \rightarrow \mathbb{R}$ are Lipschitzian functions with constants $K$ and $L$ , respectively. We first assume $X_0 \in M^2(\mathbb{R})$ is square-integrable, though the case of a constant $X_0 = x_0$ will actually suffice.

Lemma 72. Suppose $f : \mathbb{R} \rightarrow \mathbb{R}$ is a Lipschitzian function with constant $K$ , and suppose $Y, Z$ are stochastic processes with $Y(0) = Z(0)$ . Then

d2,(f(Y),f(Z))Kd2,(Y,Z).(19)d_{2,\infty}(f(Y), f(Z)) \leq K d_{2,\infty}(Y, Z). \quad (19)

Proof. $\mathbb{E}(\max_{t \in [0, T]} |f(Y(t)) - f(Z(t))|)^2 \leq \mathbb{E}(\max_{t \in [0, T]} K |Y(s) - Z(s)|)^2 \leq K^2 \mathbb{E}(\max_{t \in [0, T]} |Y(s) - Z(s)|)^2$ . $\square$

Lemma 73 (Computability of non-stochastic integrals). The non-stochastic integral $t \mapsto \int_0^t X(s) ds$ of a continuous stochastic process is computable. Further,

d2,(Ydt,Zdt)Td2,(Y,Z).(20)d_{2,\infty}(\int Y dt, \int Z dt) \leq T d_{2,\infty}(Y, Z). \quad (20)

Proof. Computability is immediate from computability of non-random integrals. Further, we have

E(maxt[0,T]0tX(s)ds2)E(maxt[0,T]0tX(s)ds)2E(0TX(t)dt)2E(Tmaxt[0,T]X(t))2=T2E(maxt[0,T]X(t))2.\begin{aligned} \mathbb{E}\left(\max_{t \in [0, T]} \left| \int_0^t X(s) ds \right|^2\right) &\leq \mathbb{E}\left(\max_{t \in [0, T]} \int_0^t |X(s)| ds\right)^2 \leq \mathbb{E}\left(\int_0^T |X(t)| dt\right)^2 \\ &\leq \mathbb{E}(T \max_{t \in [0, T]} |X(t)|)^2 = T^2 \mathbb{E}(\max_{t \in [0, T]} |X(t)|)^2. \end{aligned} \quad \square

For stochastic integrals, we recall from Section 7 that

E(maxt[0,T]0tX(s)dW(s)2)4TE(maxr[0,T]X(r))2\mathbb{E}\left(\max_{t \in [0, T]} \left| \int_0^t X(s) dW(s) \right|^2\right) \leq 4T \mathbb{E}(\max_{r \in [0, T]} |X(r)|)^2so

d2,(YdW,ZdW)2Td2,(Y,Z).d_{2,\infty}(\int Y dW, \int Z dW) \leq 2\sqrt{T}d_{2,\infty}(Y, Z).

Now

J[Y](t)J[Z](t)=0tf(Y(s))f(Z(s))ds+0tg(Y(s))g(Z(s))dW(s)J[Y](t) - J[Z](t) = \int_0^t f(Y(s)) - f(Z(s)) ds + \int_0^t g(Y(s)) - g(Z(s)) dW(s)

The expected square of the supremum of this quantity is

d2,(J[Y],J[Z])=J[Y]J[Z]2,f(Y)f(Z)dt2,+g(Y)g(Z)dW2,=d2,(f(Y)dt,f(Z)dt)+d2,(g(Y)dW,g(Z)dW)Td2,(f(Y),f(Z))+2Td2,(g(Y),g(Z))(TK+2TL)d2,(Y,Z).\begin{aligned} d_{2,\infty}(J[Y], J[Z]) &= \|J[Y] - J[Z]\|_{2,\infty} \leq \|\int f(Y) - f(Z) dt\|_{2,\infty} + \|\int g(Y) - g(Z) dW\|_{2,\infty} \\ &= d_{2,\infty}(\int f(Y) dt, \int f(Z) dt) + d_{2,\infty}(\int g(Y) dW, \int g(Z) dW) \\ &\leq Td_{2,\infty}(f(Y), f(Z)) + 2\sqrt{T}d_{2,\infty}(g(Y), g(Z)) \leq (TK + 2\sqrt{T}L)d_{2,\infty}(Y, Z). \end{aligned}

Hence we have

d2,(J[Y]J[Z])(KT+2LT)d2,(Y,Z).d_{2,\infty}(J[Y] - J[Z]) \leq (KT + 2L\sqrt{T}) d_{2,\infty}(Y, Z).

Taking $T < \min(1/2K, 1/16L^2)$ gives $(KT + 2L\sqrt{T}) < 1$ , so

d2,(J[Y],J[Z])κd2,(Y,Z)d_{2,\infty}(J[Y], J[Z]) \leq \kappa d_{2,\infty}(Y, Z)

where $\kappa < 1$ .

The integral form of the equation i.e. the Picard operator, is therefore a contraction map for small enough $T$ . Taking $X_0$ to be the constant process, $X_0(t) \equiv X_0$ , and setting $X_{n+1} = J[X_n]$ for all $n \in \mathbb{N}$ gives an effectively convergent subsequence. The initial difference is given by

d2,(X1X0)κ(EX02)1/2.d_{2,\infty}(X_1 - X_0) \leq \kappa (\mathbb{E}X_0^2)^{1/2}.

By joining together computations of the solution for small enough $T$ , we obtain:

Theorem 74 (Computability of Lipschitz stochastic differential equations). Consider the stochastic differential equation

dX(t)=f(X(t))dt+g(X(t))dW(t);X(0)=X0dX(t) = f(X(t)) dt + g(X(t)) dW(t); \quad X(0) = X_0

where $f, g : \mathbb{R} \rightarrow \mathbb{R}$ are Lipschitz and $X_0 \in R(\mathbb{R})$ . Then the solution $X(t)$ is computable as a random variable taking values in $C([0, \infty); \mathbb{R})$ .

Proof. Let $K$ be the Lipschitz constant for $f, g$ , and choose $T < \min(4, 1/16K^2)$ . For a given $x_0$ , the solution with initial condition $X(0) = x_0$ is computable in $M^2(C([0, T]; \mathbb{R}))$ , and hence in $R(C([0, T]; \mathbb{R}))$ . Hence the solution operator given initial condition $x_0 \in \mathbb{R}$ is computable $\mathbb{R} \rightarrow R(C([0, T]; \mathbb{R}))$ . For an initial condition which is a probability distribution over $\mathbb{R}$ , the solution is computable over $[0, T]$ by Theorem 69. Then the random variable $X(T)$ is computable by projection. The result follows by recursively computing $X$ over the intervals $[kT, (k+1)T]$ . $\square$

9 Conclusions

In this paper, we have developed a theory of probability, random variables and stochastic processes which is sufficiently powerful to effectively compute the solution of stochastic differential equations. The theory uses type-two effectivity to provide an underlying machine model of computation, but is largely developed using type theory in the cartesian-closed category of quotients of countably-based spaces, which has an effective interpretation. The approach extends existing work on probability via valuations and random variables in metric spaces via limits ofCauchy sequences. Ultimately, we hope that this work will form a basic for practical software tools for the rigorous computational analysis of stochastic systems.

Acknowledgement: The author would like to thank Bas Spitters for many interesting discussions on measurable functions and type theory, and pointing out the connection with monads.

References

  • [AM02] Mauricio Alvarez-Manilla. Extension of valuations on locally compact sober spaces. Topology Appl., 124:397–433, 2002.
  • [BB85] Errett Bishop and Douglas Bridges. Constructive analysis, volume 279 of Grundlehren der Mathematischen Wissenschaften. Springer, 1985.
  • [BC72] Errett Bishop and Henry Cheng. Constructive measure theory. American Mathematical Society, 1972.
  • [Bra01] Vasco Brattka. Computable versions of Baire’s category theorem. In Proc. 26th International Symposium on Mathematical Foundations of Computer Science, pages 224–235. Springer, 2001.
  • [Bra05] Vasco Brattka. Effective Borel measurability and reducibility of functions. Math. Logic Quarterly, 51:19–44, 2005.
  • [Cha72] Yuen-Kwok Chan. A constructive approach to the theory of stochastic processes. Transactions of the American Mathematical Society, 165:37–44, 1972.
  • [Col09] Pieter Collins. Computable types for dynamic systems. In Proceedings of the Fifth Conference on Computability in Europe, 2009.
  • [CS09] Thierry Coquand and Bas Spitters. Integrals and valuations. J. Logic Analysis, 1(3):1–22, 2009.
  • [Dob07] Ernst-Erich Doberkat. Stochastic Relations: Foundations for Markov Transition Systems. Chapman & Hall, 2007.
  • [Eda95a] Abbas Edalat. Domain theory and integration. Theor. Comput. Sci., 151:163–193, November 1995.
  • [Eda95b] Abbas Edalat. Dynamical systems, measures, and fractals via domain theory. Inf. Comput., 120:32–48, July 1995.
  • [Esc04] Martín Escardó. Synthetic topology. Electron. Notes Theor. Comput. Sci., 87:21–156, November 2004.
  • [Esc09] Martín Escardó. Semi-decidability of may, must and probabilistic testing in a higher-type setting. Electron. Notes Theor. Comput. Sci., 249:219–242, August 2009.
  • [Eva13] Craig Evans. An Introduction to Stochastic Differential Equations. AMS, 2013.
  • [Fri75] Avner Friedman. Stochastic Differential Equations. Dover, 1975.
  • [GLV11] Jean Goubault-Larrecq and Daniele Varacca. Continuous random variables. In Proceedings of the 2011 IEEE 26th Annual Symposium on Logic in Computer Science, pages 97–106, Washington, DC, USA, 2011.- [GS04] I. Gikhman and A. Skorohod. The Theory of Stochastic Processes 1. Springer, 2004.
  • [HR09] Mathieu Hoyrup and Cristóbal Rojas. Computability of probability measures and Martin-Löf randomness over metric spaces. Information and Computation, 207:830–847, 2009.
  • [HRW11] Mathieu Hoyrup, Cristóbal Rojas, and Klaus Weihrauch. Computability of the radon-nikodym derivative. In Benedikt Löwe, Dag Normann, Ivan Soskov, and Alexandra Soskova, editors, Models of Computation in Context, volume 6735 of Lecture Notes in Computer Science, pages 132–141. Springer, 2011.
  • [JP89] C. Jones and G. Plotkin. A probabilistic powerdomain of evaluations. In Proceedings of the Fourth Annual Symposium on Logic in computer science, pages 186–195, Piscataway, NJ, USA, 1989.
  • [Ker08] Götz Kersting. Random variables — without basic space. In J. Blath, P. Mörters, and M. Scheutzow, editors, Trends in Stochastic Analysis. Cambridge University Press, 2008.
  • [Kön97] H. König. Measure and Integration. Springer-Verlag, 1997.
  • [Law04] Jimmie D. Lawson. Domains, integration and ‘positive analysis’. Mathematical Structures in Comp. Sci., 14:815–832, December 2004.
  • [Mis07] Michael Mislove. Discrete random variables over domains. Theor. Comput. Sci., 380:181–198, July 2007.
  • [MW43] H.B. Mann and A. Wald. On stochastic limit and order relationships. Ann. Math. Statistics, 14(3):217–226, 1943.
  • [OS10] Russell O’Connor and Bas Spitters. A computer-verified monadic functional implementation of the integral. Theor. Comput. Sci., 411(37):3386–3402, 2010.
  • [Roh52] V. A. Rohlin. On the fundamental ideas of measure theory, volume 71 of Translations. American Mathematical Society, 1952. Translated from Russian.
  • [Sch07] Matthias Schröder. Admissible representations of probability measures. Electron. Notes Theor. Comput. Sci., 167:61–78, January 2007.
  • [Sch09] Matthias Schröder. An effective Tietze-Urysohn theorem for QCB-spaces. J. Univers. Comput. Sci., 15(6):1317–1336, 2009.
  • [Shi95] Al’bert Nikolaevich Shiryaev. Probability. Springer, 1995.
  • [Spi03] Bas Spitters. Constructive and intuitionistic integration theory and functional analysis. PhD thesis, Katholieke Universiteit Nijmegen, 2003.
  • [Spi06] Bas Spitters. Constructive algebraic integration theory. Ann. Pure Appl. Logic, 137(1–3):380–390, 2006.
  • [SS06] Matthias Schröder and Alex Simpson. Probabilistic observations and valuations. Electron. Notes Theor. Comput. Sci., 155:605–615, May 2006.
  • [Str72] Ross Street. The formal theory of monads. J. Pure Appl. Math., 2:149–168, 1972.[Tix95] R. Tix. Stetige Bewertungen auf topologischen Räumen. PhD thesis, Master's Thesis, Technische Universität Darmstadt, 1995.

[Var02] Daniele Varacca. The powerdomain of indexed valuations. In Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science, pages 299–, Washington, DC, USA, 2002.

[Vic08] Steven Vickers. A localic theory of lower and upper integrals. Math. Log. Quart., 54(1):109–123, 2008.

[Vic11] Steven Vickers. A monad of valuation locales. http://www.cs.bham.ac.uk/~sjv/Riesz.pdf, 2011.

[Wei99] Klaus Weihrauch. Computability on the probability measures on the Borel sets of the unit interval. Theor. Comput. Sci., 219:421–437, May 1999.

[WI81] S. Watanabe and N. Ikeda. Stochastic Differential Equations and Diffusion Processes. North-Holland, 1981.

[YMT99] M. Yasugi, T. Mori, and Y. Tsujii. Effective properties of sets and functions in metric spaces with computability structure. Theor. Comput. Sci., 219(1-2):467–486, 1999.

Xet Storage Details

Size:
120 kB
·
Xet hash:
e53845a9feeb8bc26c3af871938275b89727119d12e8caad55524a30d384e47a

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.