In probability theory, a Markov kernel (also known as a stochastic kernel or probability kernel) is a map that in the general theory of Markov processes plays the role that the transition matrix does in the theory of Markov processes with a finite state space.[1]
Let $(X,\mathcal{A})$ and $(Y,\mathcal{B})$ be measurable spaces. A Markov kernel with source $(X,\mathcal{A})$ and target $(Y,\mathcal{B})$, sometimes written as $\kappa:(X,\mathcal{A})\to(Y,\mathcal{B})$, is a function $\kappa:\mathcal{B}\times X\to[0,1]$ with the following properties:
- For every (fixed) $B_0\in\mathcal{B}$, the map $x\mapsto\kappa(B_0|x)$ is $\mathcal{A}$-measurable.
- For every (fixed) $x_0\in X$, the map $B\mapsto\kappa(B|x_0)$ is a probability measure on $(Y,\mathcal{B})$.
In other words, it associates to each point $x\in X$ a probability measure $\kappa(\mathrm{d}y|x):B\mapsto\kappa(B|x)$ on $(Y,\mathcal{B})$ such that, for every measurable set $B\in\mathcal{B}$, the map $x\mapsto\kappa(B|x)$ is measurable with respect to the $\sigma$-algebra $\mathcal{A}$.[2]
Take $X=Y=\mathbb{Z}$, and $\mathcal{A}=\mathcal{B}=\mathcal{P}(\mathbb{Z})$ (the power set of $\mathbb{Z}$). Then a Markov kernel is fully determined by the probability it assigns to singletons $\{m\}$, $m\in Y=\mathbb{Z}$, for each $n\in X=\mathbb{Z}$:
$$\kappa(B|n)=\sum_{m\in B}\kappa(\{m\}|n),\qquad \forall n\in\mathbb{Z},\ \forall B\in\mathcal{B}.$$
Now the random walk $\kappa$ that goes to the right with probability $p$ and to the left with probability $1-p$ is defined by
$$\kappa(\{m\}|n)=p\,\delta_{m,n+1}+(1-p)\,\delta_{m,n-1},\quad \forall n,m\in\mathbb{Z},$$
where $\delta$ is the Kronecker delta. The transition probabilities $P(m|n)=\kappa(\{m\}|n)$ for the random walk are equivalent to the Markov kernel.
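As an illustration, the random-walk kernel can be evaluated on finite sets by summing the singleton probabilities. This is a minimal sketch: the function name `walk_kernel` and the choice $p=1/3$ are assumptions made for the example, and only finite sets $B$ are handled.

```python
from fractions import Fraction

def walk_kernel(B, n, p):
    """kappa(B | n) for the simple random walk, for a finite set B of integers.

    Sums the singleton masses p * delta_{m,n+1} + (1 - p) * delta_{m,n-1} over m in B.
    """
    total = 0
    for m in B:
        if m == n + 1:
            total += p          # step to the right
        elif m == n - 1:
            total += 1 - p      # step to the left
    return total

p = Fraction(1, 3)                   # illustrative value, not from the text
print(walk_kernel({1}, 0, p))        # 1/3
print(walk_kernel({-1, 1}, 0, p))    # 1
print(walk_kernel({0, 5}, 0, p))     # 0
```

Exact fractions are used so that the total mass of the two neighbours of $n$ is exactly 1, with no floating-point rounding.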
More generally, take $X$ and $Y$ both countable and $\mathcal{A}=\mathcal{P}(X)$, $\mathcal{B}=\mathcal{P}(Y)$.
Again a Markov kernel is defined by the probability it assigns to singleton sets for each $i\in X$:
$$\kappa(B|i)=\sum_{j\in B}\kappa(\{j\}|i),\qquad \forall i\in X,\ \forall B\in\mathcal{B}.$$
We define a Markov process by defining a transition probability $P(j|i)=K_{ji}$, where the numbers $K_{ji}$ define a (countable) stochastic matrix $(K_{ji})$, i.e.
$$\begin{aligned}K_{ji}&\geq 0,\qquad &\forall (j,i)\in Y\times X,\\\sum_{j\in Y}K_{ji}&=1,\qquad &\forall i\in X.\end{aligned}$$
We then define
$$\kappa(\{j\}|i)=K_{ji}=P(j|i),\qquad \forall i\in X,\ \forall j\in Y.$$
Again, the transition probability, the stochastic matrix and the Markov kernel are equivalent reformulations.
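The correspondence between a stochastic matrix and a kernel can be sketched for a finite state space. The two-state matrix below and the names `K`, `is_stochastic` and `kernel` are hypothetical, chosen only for illustration. The entry `K[j][i]` follows the index convention $K_{ji}$ used above (target first, source second).

```python
from fractions import Fraction

# Hypothetical two-state example: K[j][i] is the probability of moving from i to j.
X = ["a", "b"]
Y = ["a", "b"]
K = {
    "a": {"a": Fraction(9, 10), "b": Fraction(1, 2)},
    "b": {"a": Fraction(1, 10), "b": Fraction(1, 2)},
}

def is_stochastic(K, X, Y):
    """Check K_ji >= 0 for all (j, i) and sum over j of K_ji == 1 for every i."""
    nonnegative = all(K[j][i] >= 0 for j in Y for i in X)
    columns_sum_to_one = all(sum(K[j][i] for j in Y) == 1 for i in X)
    return nonnegative and columns_sum_to_one

def kernel(B, i):
    """kappa(B | i): sum of the singleton probabilities K_ji over j in B."""
    return sum(K[j][i] for j in B)

print(is_stochastic(K, X, Y))    # True
print(kernel({"b"}, "a"))        # 1/10
print(kernel({"a", "b"}, "a"))   # 1
```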
Markov kernel defined by a kernel function and a measure
Let $\nu$ be a measure on $(Y,\mathcal{B})$, and $k:Y\times X\to[0,\infty]$ a measurable function with respect to the product $\sigma$-algebra $\mathcal{A}\otimes\mathcal{B}$ such that
$$\int_Y k(y,x)\,\nu(\mathrm{d}y)=1,\qquad \forall x\in X,$$
then $\kappa(\mathrm{d}y|x)=k(y,x)\,\nu(\mathrm{d}y)$, i.e. the mapping
$$\begin{cases}\kappa:\mathcal{B}\times X\to[0,1]\\\kappa(B|x)=\int_B k(y,x)\,\nu(\mathrm{d}y)\end{cases}$$
defines a Markov kernel.[3] This example generalises the countable Markov process example, where $\nu$ was the counting measure. Moreover, it encompasses other important examples such as the convolution kernels, in particular the Markov kernels defined by the heat equation. The latter example includes the Gaussian kernel on $X=Y=\mathbb{R}$ with $\nu(\mathrm{d}x)=\mathrm{d}x$ standard Lebesgue measure and
$$k_t(y,x)=\frac{1}{\sqrt{2\pi}\,t}\,e^{-(y-x)^2/(2t^2)}.$$
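A numerical check of the normalisation condition for a Gaussian kernel might look as follows. This sketch assumes the density of a normal distribution with mean $x$ and standard deviation $t$, that is $k_t(y,x)=e^{-(y-x)^2/(2t^2)}/(\sqrt{2\pi}\,t)$. The function names and the midpoint-rule integrator are illustrative choices, not part of the source.

```python
import math

def k(y, x, t):
    """Gaussian density in y with mean x and standard deviation t (assumed form)."""
    return math.exp(-(y - x) ** 2 / (2 * t ** 2)) / (math.sqrt(2 * math.pi) * t)

def kappa_interval(a, b, x, t):
    """kappa([a, b] | x) in closed form, using the error function."""
    def cdf(z):
        return 0.5 * (1 + math.erf((z - x) / (t * math.sqrt(2))))
    return cdf(b) - cdf(a)

def midpoint_integral(a, b, x, t, steps=20000):
    """Midpoint-rule approximation of the integral of k(., x) over [a, b]."""
    h = (b - a) / steps
    return h * sum(k(a + (i + 0.5) * h, x, t) for i in range(steps))

# Total mass over a wide interval should be close to 1.
print(abs(midpoint_integral(-10.0, 10.0, 0.0, 1.0) - 1.0) < 1e-6)   # True
# Closed form and numerical integration should agree on [-1, 1].
print(abs(kappa_interval(-1, 1, 0, 1) - midpoint_integral(-1, 1, 0, 1)) < 1e-6)  # True
```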
Measurable functions
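Take $(X,\mathcal{A})$ and $(Y,\mathcal{B})$ arbitrary measurable spaces, and let $f:X\to Y$ be a measurable function. Now define $\kappa(\mathrm{d}y|x)=\delta_{f(x)}(\mathrm{d}y)$, i.e.
$$\kappa(B|x)=\mathbf{1}_B(f(x))=\mathbf{1}_{f^{-1}(B)}(x)=\begin{cases}1&\text{if }f(x)\in B\\0&\text{otherwise}\end{cases}$$
for all $B\in\mathcal{B}$.
Note that the indicator function $\mathbf{1}_{f^{-1}(B)}$ is $\mathcal{A}$-measurable for all $B\in\mathcal{B}$ iff $f$ is measurable.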
This example allows us to think of a Markov kernel as a generalised function with a (in general) random rather than certain value. That is, it is a multivalued function where the values are not equally weighted.
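The deterministic kernel induced by a function can be sketched in a few lines. The helper name `deterministic_kernel` and the squaring map are assumptions for illustration. Sets $B$ are represented by any Python container supporting `in`.

```python
def deterministic_kernel(f):
    """Return kappa with kappa(B, x) = 1 if f(x) lies in B, else 0 (a point mass at f(x))."""
    def kappa(B, x):
        return 1 if f(x) in B else 0
    return kappa

square = deterministic_kernel(lambda x: x * x)   # illustrative choice of f
print(square({4, 9}, 3))   # 1, since f(3) = 9 is in B
print(square({4, 9}, 5))   # 0, since f(5) = 25 is not in B
```

Every value of such a kernel is 0 or 1, which is what distinguishes an ordinary function from a kernel with genuinely random output.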
As a less obvious example, take $X=\mathbb{N}$, $\mathcal{A}=\mathcal{P}(\mathbb{N})$, and $(Y,\mathcal{B})$ the real numbers $\mathbb{R}$ with the standard sigma algebra of Borel sets. Then
$$\kappa(B|n)=\begin{cases}\mathbf{1}_B(0)&n=0\\\Pr(\xi_1+\cdots+\xi_x\in B)&n\neq 0\end{cases}$$
where $x$ is the number of elements at the state $n$, $\xi_i$ are i.i.d. random variables (usually with mean 0) and where $\mathbf{1}_B$ is the indicator function. For the simple case of coin flips this models the different levels of a Galton board.
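For the coin-flip case, the kernel can be computed exactly from binomial coefficients. The sketch below rests on two assumptions that are not stated in the text: the steps are fair $\pm 1$ coin flips, and the number of summands equals the level $n$, as on a Galton board. The name `galton_kernel` is hypothetical.

```python
from fractions import Fraction
from math import comb

def galton_kernel(B, n):
    """kappa(B | n) = Pr(xi_1 + ... + xi_n in B) for fair +/-1 steps (assumption).

    B is any container supporting `in`; n is the level of the board.
    """
    if n == 0:
        return Fraction(1 if 0 in B else 0)      # point mass at 0
    total = Fraction(0)
    for heads in range(n + 1):
        position = 2 * heads - n                 # value of the sum with `heads` +1 steps
        if position in B:
            total += Fraction(comb(n, heads), 2 ** n)
    return total

print(galton_kernel({0}, 2))          # 1/2
print(galton_kernel({-2, 0, 2}, 2))   # 1
print(galton_kernel({0}, 3))          # 0
```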
Composition of Markov Kernels
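Given measurable spaces $(X,\mathcal{A})$, $(Y,\mathcal{B})$ we consider a Markov kernel $\kappa:\mathcal{B}\times X\to[0,1]$ as a morphism $\kappa:X\to Y$. Intuitively, rather than assigning to each $x\in X$ a sharply defined point $y\in Y$, the kernel assigns a "fuzzy" point in $Y$ which is only known with some level of uncertainty, much like actual physical measurements. If we have a third measurable space $(Z,\mathcal{C})$, and probability kernels $\kappa:X\to Y$ and $\lambda:Y\to Z$, we can define a composition $\lambda\circ\kappa:X\to Z$ by the Chapman-Kolmogorov equation
$$(\lambda\circ\kappa)(\mathrm{d}z|x)=\int_Y\lambda(\mathrm{d}z|y)\,\kappa(\mathrm{d}y|x).$$
The composition is associative by the Monotone Convergence Theorem and the identity function considered as a Markov kernel (i.e. the delta measure $\kappa_1(\mathrm{d}x'|x)=\delta_x(\mathrm{d}x')$) is the unit for this composition.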
With this composition, measurable spaces (as objects) and Markov kernels (as morphisms) form a category, the category of Markov kernels, first defined by Lawvere.[4]
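On finite spaces the Chapman-Kolmogorov integral reduces to a sum, so composition becomes a matrix product. The sketch below checks associativity and the unit law on one example. The kernels `kap`, `lam`, `mu` and the helper names are hypothetical, with `kap[y][x]` standing for $\kappa(\{y\}|x)$.

```python
from fractions import Fraction as F

def compose(lam, kap, X, Y, Z):
    """(lam o kap)[z][x] = sum over y in Y of lam[z][y] * kap[y][x]."""
    return {z: {x: sum(lam[z][y] * kap[y][x] for y in Y) for x in X} for z in Z}

def identity(X):
    """Delta kernel: all mass of the measure at x sits on x itself."""
    return {x2: {x: F(1 if x2 == x else 0) for x in X} for x2 in X}

S = [0, 1]  # one two-point space used as X, Y and Z for simplicity
kap = {0: {0: F(1, 2), 1: F(1, 4)}, 1: {0: F(1, 2), 1: F(3, 4)}}
lam = {0: {0: F(1, 3), 1: F(1)}, 1: {0: F(2, 3), 1: F(0)}}
mu = {0: {0: F(0), 1: F(1, 5)}, 1: {0: F(1), 1: F(4, 5)}}

left = compose(mu, compose(lam, kap, S, S, S), S, S, S)
right = compose(compose(mu, lam, S, S, S), kap, S, S, S)
print(left == right)                                   # True: associativity
print(compose(kap, identity(S), S, S, S) == kap)       # True: right unit
print(compose(identity(S), kap, S, S, S) == kap)       # True: left unit
```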
Probability Space defined by Probability Distribution and a Markov Kernel
A composition of a probability space $(X,\mathcal{A},P_X)$ and a probability kernel $\kappa:(X,\mathcal{A})\to(Y,\mathcal{B})$ defines a probability space $(Y,\mathcal{B},P_Y=\kappa\circ P_X)$, where the probability measure is given by
$$P_Y(B)=\int_X\int_B\kappa(\mathrm{d}y|x)\,P_X(\mathrm{d}x)=\int_X\kappa(B|x)\,P_X(\mathrm{d}x)=\mathbb{E}_{P_X}\kappa(B|\cdot).$$
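In the finite case the formula for $P_Y$ is a weighted sum of the kernel's columns. The following sketch uses a made-up weather example. The state names, the numbers and the function `pushforward` are assumptions for illustration.

```python
from fractions import Fraction as F

def pushforward(kap, P_X, X, Y):
    """P_Y({y}) = sum over x in X of kappa({y} | x) * P_X({x})."""
    return {y: sum(kap[y][x] * P_X[x] for x in X) for y in Y}

X = ["sun", "rain"]
Y = ["dry", "wet"]
kap = {                                  # kap[y][x] = kappa({y} | x), hypothetical values
    "dry": {"sun": F(9, 10), "rain": F(1, 5)},
    "wet": {"sun": F(1, 10), "rain": F(4, 5)},
}
P_X = {"sun": F(3, 4), "rain": F(1, 4)}

P_Y = pushforward(kap, P_X, X, Y)
print(P_Y["dry"], P_Y["wet"])   # 29/40 11/40
```

The result is again a probability measure, since each column of the kernel sums to 1 and the weights $P_X$ sum to 1.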
Let $(X,\mathcal{A},P)$ be a probability space and $\kappa$ a Markov kernel from $(X,\mathcal{A})$ to some $(Y,\mathcal{B})$. Then there exists a unique measure $Q$ on $(X\times Y,\mathcal{A}\otimes\mathcal{B})$, such that:
$$Q(A\times B)=\int_A\kappa(B|x)\,P(\mathrm{d}x),\quad \forall A\in\mathcal{A},\quad \forall B\in\mathcal{B}.$$
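For finite spaces the measure $Q$ on rectangles can be computed directly, and its marginals recover $P$ and the measure obtained by integrating the kernel against $P$. The data and the name `joint` below are hypothetical and serve only as a sketch.

```python
from fractions import Fraction as F

X = ["sun", "rain"]
Y = ["dry", "wet"]
kap = {                                  # kap[y][x] = kappa({y} | x), hypothetical values
    "dry": {"sun": F(9, 10), "rain": F(1, 5)},
    "wet": {"sun": F(1, 10), "rain": F(4, 5)},
}
P = {"sun": F(3, 4), "rain": F(1, 4)}

def joint(A, B):
    """Q(A x B) = sum over x in A of kappa(B | x) * P({x})."""
    return sum(sum(kap[y][x] for y in B) * P[x] for x in A)

print(joint(X, Y))              # 1: Q is a probability measure
print(joint(["sun"], Y))        # 3/4: the first marginal is P
print(joint(X, ["dry"]))        # 29/40: the second marginal
```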
Regular conditional distribution
Let $(S,\mathcal{Y})$ be a Borel space, $X$ an $(S,\mathcal{Y})$-valued random variable on the measure space $(\Omega,\mathcal{F},P)$ and $\mathcal{G}\subseteq\mathcal{F}$ a sub-$\sigma$-algebra. Then there exists a Markov kernel $\kappa$ from $(\Omega,\mathcal{G})$ to $(S,\mathcal{Y})$, such that $\kappa(\cdot,B)$ is a version of the conditional expectation $\mathbb{E}[\mathbf{1}_{\{X\in B\}}\mid\mathcal{G}]$ for every $B\in\mathcal{Y}$, i.e.
$$P(X\in B\mid\mathcal{G})=\mathbb{E}\left[\mathbf{1}_{\{X\in B\}}\mid\mathcal{G}\right]=\kappa(\cdot,B),\qquad P\text{-a.s.}\ \ \forall B\in\mathcal{Y}.$$
It is called a regular conditional distribution of $X$ given $\mathcal{G}$ and is not uniquely defined.
Transition kernels generalize Markov kernels in the sense that for all $x\in X$, the map
$$B\mapsto\kappa(B|x)$$
can be any type of (non-negative) measure, not necessarily a probability measure.
- §36. Kernels and semigroups of kernels