1 Introduction
Geostatistical data are data that could, in principle, be measured anywhere within
a domain of interest. Examples of such data are:
– Gold grades within a gold mine.
– Particulate Matter (PM) in air samples.
Unlike in a point pattern, the observation locations themselves are not of primary
interest - these are usually fixed.
The interest is in aspects of the variable that have not been measured yet:
– maps of estimated values.
– maps of exceedance probabilities (i.e. regions where the probability of a particular
event is greater/less than a specified amount.)
– estimates of averages/aggregates over regions.
2 Random Fields and Stationarity
In geostatistics, data are assumed to be a partial realisation of a random field
{Z(s) : s ∈ D}
– D is a fixed subset of ℝ².
– The spatial index varies continuously throughout D.
– For a fixed s, Z(s) is a random variable.
– For a fixed realisation of this process, our observations are a function of space:
Z(s1), Z(s2), . . . , Z(sn).
Since we observe only a single (partial) realisation of the random field, assuming stationarity
of some sort gives us a form of replication, which makes estimation possible.
A random process Z(·) is second-order stationary if
1. E[Z(s)] = μ for all s ∈ D.
2. Cov(Z(si), Z(sj)) = C(si − sj) for all si, sj ∈ D.
C(·) is called the covariance function. Note that, by the definition of covariance,
Var(Z(si)) = Cov(Z(si), Z(si)) = C(0)
In our notation, we use h to denote the spatial lag (a vector) h = si − sj and ||h|| to denote the
distance between the two locations.
If C(·) is only a function of the distance between si and sj (and not the direction), then
we say that the process is isotropic.
If C(·) is a function of both the distance and the direction between si and sj, we say that the
process is anisotropic.
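As a minimal sketch of the distinction, assume an exponential covariance model C(h) = σ² exp(−||h||/φ). The model, the parameter names `sigma2`, `phi` and `stretch`, and their values are illustrative assumptions, not part of these notes. The first function depends on the lag only through its length, so it is isotropic; the second rescales the x-component of the lag before measuring distance, which makes the covariance depend on direction.

```python
import math

def cov_isotropic(h, sigma2=1.0, phi=2.0):
    """Exponential covariance (assumed model): depends only on the length of h."""
    return sigma2 * math.exp(-math.hypot(h[0], h[1]) / phi)

def cov_anisotropic(h, sigma2=1.0, phi=2.0, stretch=3.0):
    """Same model after rescaling the x-component of the lag (hypothetical)."""
    return sigma2 * math.exp(-math.hypot(h[0] / stretch, h[1]) / phi)

# Two lags of equal length but different direction.
east, north = (1.0, 0.0), (0.0, 1.0)

iso_same = math.isclose(cov_isotropic(east), cov_isotropic(north))
aniso_same = math.isclose(cov_anisotropic(east), cov_anisotropic(north))
print("isotropic model agrees across directions:", iso_same)
print("anisotropic model agrees across directions:", aniso_same)
```

In both cases the value at lag zero is `sigma2`, which plays the role of C(0).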
A random process is intrinsically stationary if
1. E[Z(s)] = μ for all s ∈ D.
2. Var(Z(si) − Z(sj)) = 2γ(si − sj) for all si, sj ∈ D.
2γ(·) is known as the variogram, and γ(·) is known as the semivariogram. The
semivariogram shows how the dissimilarity between Z(si) and Z(si + h) varies with h.
The variogram is closely related to the covariance function. Both are functions of the
spatial lag h.
3 Semivariogram
3.1 Properties of the semivariogram
The semivariogram has the following properties:
– γ(−h) = γ(h)
– γ(0) = 0
– γ(h)/||h||² → 0 as ||h|| → ∞. This implies that the semivariogram can increase to
∞, but only more slowly than ||h||².
– γ is conditionally negative definite. In other words, for any set of m locations
s1, . . . , sm ∈ D and any m real numbers {ai} such that $\sum_{i=1}^{m} a_i = 0$, it holds that
$$\sum_{i=1}^{m} \sum_{j=1}^{m} a_i a_j \gamma(s_i - s_j) \le 0$$
– If the process is isotropic, γ(h) depends on the lag h only through the distance ||h||.
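The conditional negative-definiteness property can be checked numerically. The sketch below assumes a linear semivariogram γ(h) = |h| on a line (an illustrative choice, as are the sites and weights) and evaluates the double sum for weights that add up to zero. For such weights the double sum equals −Var(Σ ai Z(si)), which is why it cannot be positive.

```python
def gamma_linear(h):
    """Linear semivariogram on a line, gamma(h) = |h| (assumed for illustration)."""
    return abs(h)

def quadratic_form(sites, weights, gamma):
    """Return sum_i sum_j a_i * a_j * gamma(s_i - s_j)."""
    return sum(
        a_i * a_j * gamma(s_i - s_j)
        for s_i, a_i in zip(sites, weights)
        for s_j, a_j in zip(sites, weights)
    )

sites = [0.0, 1.0, 2.0]        # three locations on a transect
weights = [1.0, -2.0, 1.0]     # weights sum to zero

q = quadratic_form(sites, weights, gamma_linear)
print("sum of weights:", sum(weights))
print("quadratic form:", q)
```

A single check like this does not prove the property; it only illustrates it for one choice of sites and weights.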
A graph of the semivariogram plotted against separation h conveys information about
continuity and spatial variability of the process (see Figure 1).
If there is no spatial correlation between Z(si) and Z(sj ), the semivariogram will be a
horizontal line.
The shape of the semivariogram near the origin indicates the degree of smoothness of
the variable.
Figure 1: Semivariogram
– The shape reflects the covariance when si and sj are close together.
– A parabolic shape here means that it is a very smooth process.
– A linear shape indicates it is less smooth.
A vertical jump at the origin, known as a non-zero nugget, corresponds to high spatial irregularity.
– It means that two observations fairly close together can have very different values.
– A non-zero nugget is a discontinuity and could be due to measurement error.
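To make the nugget and the behaviour near the origin concrete, here is a small sketch that assumes an exponential semivariogram with a nugget. The model and the parameter names and values (`nugget`, `partial_sill`, `phi`) are assumptions chosen for illustration. The function is 0 at distance zero but jumps to roughly the nugget value for any positive distance, however small, and levels off at `nugget + partial_sill` for large distances.

```python
import math

def semivariogram(dist, nugget=0.5, partial_sill=1.5, phi=2.0):
    """Exponential semivariogram with a nugget (assumed model and values)."""
    if dist == 0:
        return 0.0  # gamma(0) = 0 by definition
    return nugget + partial_sill * (1.0 - math.exp(-dist / phi))

# Evaluate at zero, just above zero, and at increasing distances.
for d in (0.0, 1e-6, 1.0, 5.0, 50.0):
    print(d, round(semivariogram(d), 4))
```

Setting `nugget=0.0` removes the jump, so the curve then rises continuously from zero.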
3.2 Semivariogram and Covariance Function
If Z(·) is second-order stationary, then it can be shown that
γ(h) = C(0) − C(h)
Further, if C(h) → 0 as ||h|| → ∞, then it also holds that γ(h) → C(0).
Note that C(0) is the variance of Z(s), and is also the sill of the semivariogram.
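The relation γ(h) = C(0) − C(h) can be derived in a few lines. The sketch below uses only the definition of the variogram and the two conditions of second-order stationarity:

$$
\begin{aligned}
2\gamma(h) &= \mathrm{Var}\big(Z(s + h) - Z(s)\big) \\
&= \mathrm{Var}\big(Z(s + h)\big) + \mathrm{Var}\big(Z(s)\big) - 2\,\mathrm{Cov}\big(Z(s + h), Z(s)\big) \\
&= C(0) + C(0) - 2C(h)
\end{aligned}
$$

Dividing by 2 gives γ(h) = C(0) − C(h).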
Since the two functions are closely related, why not just use the covariance function?
– The class of intrinsically stationary processes is larger than the class of second-order
stationary processes (although only barely).
– Estimation of the semivariogram is more reliable than estimation of the covariance
function as the former does not require estimation of the mean.
3.3 Estimation of Semivariogram
Note that an alternative expression for the semivariogram can be obtained from the
following derivation:
$$
\begin{aligned}
2\gamma(h) &= \mathrm{Var}\big(Z(s + h) - Z(s)\big) \\
&= E\big[\{(Z(s + h) - Z(s)) - (\mu - \mu)\}^2\big] \\
&= E\big[(Z(s + h) - Z(s))^2\big]
\end{aligned}
$$
Thus one way to estimate this function is to average all squared differences (Z(si) − Z(sj))²
for pairs of observations taken the same lag apart, in the same direction.
$$\hat{\gamma}(h) = \frac{1}{2|N(h)|} \sum_{(s_i, s_j) \in N(h)} \big(Z(s_i) - Z(s_j)\big)^2$$
– N(h) is the set of distinct pairs separated by lag h:
N(h) = {(si, sj) : si − sj = h}
This is known as the classical semivariogram estimator or sample variogram. It can
provide point estimates of γ at observed values of h.
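The classical estimator can be sketched in a few lines of plain Python. The function name, the tolerance argument `tol`, and the toy data on a regular transect are assumptions made for illustration. The function collects every pair whose lag si − sj equals the requested vector and returns half the mean squared difference, together with the number of pairs used.

```python
def classical_semivariogram(coords, values, lag, tol=1e-9):
    """Half the mean squared difference over pairs with s_i - s_j = lag."""
    squared_diffs = []
    for (xi, yi), zi in zip(coords, values):
        for (xj, yj), zj in zip(coords, values):
            # Keep the pair (s_i, s_j) only if s_i - s_j matches the lag vector.
            if abs((xi - xj) - lag[0]) <= tol and abs((yi - yj) - lag[1]) <= tol:
                squared_diffs.append((zi - zj) ** 2)
    if not squared_diffs:
        return None, 0  # no pairs observed at this lag
    n_pairs = len(squared_diffs)
    return sum(squared_diffs) / (2 * n_pairs), n_pairs

# Toy data: four observations on a regular transect (made-up values).
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [1.0, 3.0, 2.0, 5.0]

gamma_1, n_1 = classical_semivariogram(coords, values, lag=(1, 0))
gamma_2, n_2 = classical_semivariogram(coords, values, lag=(2, 0))
print("lag (1, 0):", gamma_1, "from", n_1, "pairs")
print("lag (2, 0):", gamma_2, "from", n_2, "pairs")
```

With irregularly spaced data few pairs share exactly the same lag, so in practice pairs are usually grouped into distance (and possibly direction) bins; the exact-match version here follows the definition of N(h) literally.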
A widely used library for geostatistics is gstat, an R package that provides functions for
computing and fitting sample variograms.