High-dimensional Minimum Variance Portfolio Estimation
Based on High-frequency Data
Abstract
This paper studies the estimation of high-dimensional minimum variance portfolio
(MVP) based on the high frequency returns which can exhibit heteroscedasticity and
possibly be contaminated by microstructure noise. Under certain sparsity assumptions
on the precision matrix, we propose estimators of the MVP and prove that our portfolios asymptotically achieve the minimum variance in a sharp sense. In addition, we
introduce consistent estimators of the minimum variance, which provide reference targets. Simulation and empirical studies demonstrate the favorable performance of the
proposed portfolios.
Key Words: Minimum variance portfolio; High dimension; High frequency; CLIME
estimator; Precision matrix.
JEL Codes: C13, C55, C58, G11
1 Introduction
1.1 Background
Since the ground-breaking work of Markowitz (1952), the mean-variance portfolio has caught
significant attention from both academics and practitioners. To implement such a strategy in
practice, the accuracy in estimating both the expected returns and the covariance structure
of returns is vital. It has been well documented that the estimation of the expected returns
is more difficult than the estimation of covariances (Merton (1980)), and the impact on
portfolio performance caused by the estimation error in the expected returns is larger than
that caused by the error in covariance estimation. These difficulties pose serious challenges
for the practical implementation of the Markowitz portfolio optimization.
Mean-variance optimization has also been criticized from the portfolio standpoint for invalid
preferences under the axioms of choice, inconsistent dynamic behavior, etc.
The minimum variance portfolio (MVP) has received growing attention over the past few
years (see, e.g., DeMiguel et al. (2009a) and the references therein). It avoids the difficulties
in estimating the expected returns and is on the efficient frontier. In addition, the MVP is
found to perform well on real data. Empirical studies in Haugen and Baker (1991), Chan
et al. (1999), Schwartz (2000), Jagannathan and Ma (2003) and Clarke et al. (2006) have
found that the MVP can enjoy both lower risk and higher return compared with some
benchmark portfolios. These features make the MVP an attractive investment strategy in
practice.
The MVP is more natural in the context of high-frequency data. Over short time horizons, the mean return is usually completely dominated by the volatility. Consequently, as a
prudent common practice, when the time horizon of interest is short, the expected returns
are often assumed to be zero (see, e.g., Part II of Christoffersen (2012) and the references
therein). Fan et al. (2012a) make this assumption when considering the management of
portfolios that are rebalanced daily or every other few days. When the expected returns are
zero, the mean-variance optimization reduces to the risk minimization problem, in which
one seeks the MVP.
In addition, there are benefits of using high-frequency data. On the one hand, the large
number of observations can potentially facilitate a better understanding of the covariance
structure of returns. Developments in this direction in the high-dimensional setting include
Wang and Zou (2010); Tao et al. (2011); Zheng and Li (2011); Tao et al. (2013); Kim et al.
(2016); Aït-Sahalia and Xiu (2017); Xia and Zheng (2018); Dai et al. (2019); Pelger (2019),
among others. On the other hand, high-frequency data allow short-horizon rebalancing
and hence the portfolios can adjust quickly to time variability of volatilities/co-volatilities.
However, high-frequency data do come with significant challenges in analysis. Complications
arise due to heteroscedasticity and microstructure noise, among others.
We consider in this paper the estimation of high-dimensional MVP using high-frequency
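data. To be more specific, given $p$ assets whose returns $X = (X_1, \ldots, X_p)^\top$ have covariance
matrix $\Sigma$, we aim to find
\[
\arg\min_{w} \; w^\top \Sigma w \quad \text{subject to } w^\top \mathbf{1} = 1, \tag{1.1}
\]
where $w = (w_1, \ldots, w_p)^\top$ represents the weights put on different assets, and $\mathbf{1} = (1, \ldots, 1)^\top$
is the $p$-dimensional vector with all entries being 1. The optimal solution is given by
\[
w_{\mathrm{opt}} = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}, \tag{1.2}
\]
which yields the minimum risk
\[
R_{\min} = w_{\mathrm{opt}}^\top \Sigma\, w_{\mathrm{opt}} = \frac{1}{\mathbf{1}^\top \Sigma^{-1} \mathbf{1}}. \tag{1.3}
\]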
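The closed forms (1.2) and (1.3) can be evaluated directly. The following is a minimal sketch in Python with NumPy; the function name `min_variance_portfolio` and the 3-asset matrix `sigma_demo` are hypothetical and serve only as an illustration.

```python
import numpy as np

def min_variance_portfolio(sigma):
    """Return (weights, minimum risk) from the closed forms (1.2) and (1.3)."""
    ones = np.ones(sigma.shape[0])
    # Solve Sigma x = 1 instead of forming the inverse explicitly.
    x = np.linalg.solve(sigma, ones)
    denom = ones @ x                      # 1' Sigma^{-1} 1
    return x / denom, 1.0 / denom

# Hypothetical 3-asset covariance matrix, for illustration only.
sigma_demo = np.array([[0.04, 0.01, 0.00],
                       [0.01, 0.09, 0.02],
                       [0.00, 0.02, 0.16]])
w_opt, r_min = min_variance_portfolio(sigma_demo)
equal_weight_risk = np.full(3, 1 / 3) @ sigma_demo @ np.full(3, 1 / 3)
print("weights:", w_opt, "minimum risk:", r_min)
```

By construction the weights sum to one, and any other fully invested portfolio (such as the equal-weight one) has a risk no smaller than `r_min`.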
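More generally, one may be interested in the following optimization problem: for a given
vector $\beta = (\beta_1, \ldots, \beta_p)^\top$ and constant $c$,
\[
\arg\min_{w} \; \tilde{w}^\top \Sigma \tilde{w} \quad \text{subject to } w^\top \mathbf{1} = c, \quad \text{where } \tilde{w} = (\beta_1 w_1, \ldots, \beta_p w_p)^\top, \tag{1.4}
\]
or its equivalent formulation:
\[
\arg\min_{\tilde{w}} \; \tilde{w}^\top \Sigma \tilde{w} \quad \text{subject to } \tilde{w}^\top \beta^{-1} = c, \quad \text{where } \beta^{-1} := (1/\beta_1, \ldots, 1/\beta_p)^\top. \tag{1.5}
\]
Such a setting applies, for example, in leveraged investment. We remark that the optimization problem (1.5) can be reduced to (1.1) by noticing that if $\tilde{w}$ solves (1.5), then
$\check{w} := (\tilde{w}_1/\beta_1, \ldots, \tilde{w}_p/\beta_p)^\top / c$ solves (1.1) with $\tilde{\Sigma} = \mathrm{diag}(\beta_1, \ldots, \beta_p)\, \Sigma\, \mathrm{diag}(\beta_1, \ldots, \beta_p)$, and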
vice versa. For this reason, the two optimization problems (1.1) and (1.4) or (1.5) can be
transformed into each other. In the rest of the paper, we focus on the problem (1.1).
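The reduction from (1.5) to (1.1) can be checked numerically. The sketch below assumes a randomly generated positive definite $\Sigma$ and arbitrary values of $\beta$ and $c$ (all hypothetical), and uses the standard Lagrangian solution $\tilde{w} = c\,\Sigma^{-1}\beta^{-1} / \big((\beta^{-1})^\top \Sigma^{-1} \beta^{-1}\big)$ of (1.5).

```python
import numpy as np

rng = np.random.default_rng(0)
p, c = 4, 2.5                                  # hypothetical dimension and budget
a = rng.standard_normal((p, p))
sigma = a @ a.T + p * np.eye(p)                # positive definite by construction
beta = rng.uniform(0.5, 2.0, size=p)
ones = np.ones(p)

# Solve (1.5): minimize w~' Sigma w~ subject to w~' beta^{-1} = c.
b = 1.0 / beta
x = np.linalg.solve(sigma, b)
w_tilde = c * x / (b @ x)

# Solve (1.1) with Sigma replaced by diag(beta) Sigma diag(beta).
sigma_tilde = np.diag(beta) @ sigma @ np.diag(beta)
y = np.linalg.solve(sigma_tilde, ones)
w_mvp = y / (ones @ y)

# The rescaled solution of (1.5) should coincide with the MVP weights.
w_check = (w_tilde / beta) / c
print(np.max(np.abs(w_check - w_mvp)))
```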
A main challenge of solving the optimization problem (1.1) comes from high-dimensionality
because modern portfolios often involve a large number of assets. See, for example, Zheng
and Li (2011), Fan et al. (2012a), Fan et al. (2012b), Ao et al. (2018), Xia and Zheng (2018),
and Dai et al. (2019) on issues about and progress made on vast portfolio management.
On the other hand, the estimation of the minimum risk defined in (1.3) is also a problem
of interest. This provides a reference target for the estimated minimum variance portfolios.
In practice, because the true covariance matrix is unknown, the sample covariance matrix $S$ is usually used as a proxy, and the resulting “plug-in” portfolio, $w_p = S^{-1}\mathbf{1} / (\mathbf{1}^\top S^{-1}\mathbf{1})$,
has been widely adopted. How well does such a portfolio perform? This question has been
considered in Basak et al. (2009). The following simulation result visualizes their first finding (Proposition 1 therein). Figure 1 shows the risk of the plug-in portfolio based on 100
replications. One can see that the actual risk $R(w_p) = w_p^\top \Sigma w_p$ of the plug-in portfolio
can be devastatingly higher than the theoretical minimum risk. On the other hand, the
perceived risk $\hat{R}_p = w_p^\top S w_p$ can be even lower than the theoretical minimum risk. Such
contradictory phenomena lead to two questions: (1) Can we consistently estimate the true
minimum risk? and (2) More importantly, can we find a portfolio with a risk close to the
true minimum risk?
[Figure 1 plot, “Comparison of risks”: perceived risk, actual risk, and minimum risk (y-axis: risk) against replication (x-axis: replication).]
Figure 1. Comparison of actual and perceived risks of the plug-in portfolio. The portfolios
are constructed based on returns simulated from i.i.d. multivariate normal distribution with
mean zero and covariance matrix Σ calibrated from real data; see Section 4 for details.
The numbers of assets and observations are 80 and 252, respectively. The comparison is
replicated 100 times.
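A simulation in the spirit of Figure 1 can be sketched as follows. The covariance matrix here is a hypothetical one-factor structure, not the matrix calibrated from real data that is used in the paper, so the numbers will differ from the figure; only the qualitative pattern is meant to carry over.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, n_rep = 80, 252, 100                     # assets, observations, replications

# Hypothetical one-factor covariance (NOT the calibrated matrix of the paper).
loadings = rng.uniform(0.5, 1.5, size=p)
sigma = 1e-4 * (np.outer(loadings, loadings) + np.eye(p))
chol = np.linalg.cholesky(sigma)
ones = np.ones(p)
r_min = 1.0 / (ones @ np.linalg.solve(sigma, ones))

actual = np.empty(n_rep)
perceived = np.empty(n_rep)
for k in range(n_rep):
    returns = rng.standard_normal((n, p)) @ chol.T   # rows are i.i.d. N(0, sigma)
    s = np.cov(returns, rowvar=False)                # sample covariance matrix S
    x = np.linalg.solve(s, ones)
    w_p = x / (ones @ x)                             # plug-in portfolio
    actual[k] = w_p @ sigma @ w_p                    # risk under the true Sigma
    perceived[k] = w_p @ s @ w_p                     # risk under S

print("minimum risk   :", r_min)
print("mean actual    :", actual.mean())
print("mean perceived :", perceived.mean())
```

Under these assumptions the actual risk is never below the minimum risk, while the perceived risk tends to fall below it.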
Because of such issues with the plug-in portfolio, alternative methods have been proposed.
Jagannathan and Ma (2003) argue that imposing a no-short-sale constraint helps.
More generally, Fan et al. (2012b) study the MVP under the following gross-exposure constraint:
\[
\arg\min_{w} \; w^\top \Sigma w \quad \text{subject to } w^\top \mathbf{1} = 1 \text{ and } \|w\|_1 \le \lambda, \tag{1.6}
\]
where $\|w\|_1 = \sum_{i=1}^{p} |w_i|$ and $\lambda$ is a chosen constant. They derive the following bound on
the risk of the estimated portfolio. If $\hat{\Sigma}$ is an estimator of $\Sigma$, then the solution to (1.6) with $\Sigma$
replaced by $\hat{\Sigma}$, denoted by $\hat{w}_{\mathrm{opt}}$, satisfies
\[
|R(\hat{w}_{\mathrm{opt}}) - R_{\min}| \le \lambda^2 \cdot \|\hat{\Sigma} - \Sigma\|_\infty, \tag{1.7}
\]
where for any weight vector $w$, $R(w) = w^\top \Sigma w$ stands for the risk measured by the variance
of the portfolio return, and for any matrix $A = (a_{ij})$, $\|A\|_\infty := \max_{ij} |a_{ij}|$. In particular,
$\|\hat{\Sigma} - \Sigma\|_\infty$ is the maximum element-wise estimation error in using $\hat{\Sigma}$ to estimate $\Sigma$. Fan et al.
(2012a) consider the high-frequency setting, where they use the two-scale realized covariance
matrix (Zhang et al. (2005)) to estimate the integrated covariance matrix (see Section 2.1
below for related background), and establish concentration inequalities for the element-wise
estimation error. These concentration inequalities imply that even if the number of assets $p$
grows faster than the number of observations $n$, one still has that $\|\hat{\Sigma} - \Sigma\|_\infty \to 0$ as $n \to \infty$;
see equation (18) in Fan et al. (2012a) for the precise statement. In particular, bound (1.7)
guarantees that under the gross-exposure constraint, the difference between the risk associated
with $\hat{w}_{\mathrm{opt}}$ and the minimum risk is asymptotically negligible.
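Bounds of the type (1.7) are driven by an elementary inequality: for any weight vector $w$, $|w^\top(\hat{\Sigma} - \Sigma)w| \le \|w\|_1^2 \, \|\hat{\Sigma} - \Sigma\|_\infty$, so that a cap on the gross exposure $\|w\|_1$ controls how much element-wise estimation error can distort the risk. The sketch below checks this inequality numerically; it is not a reproduction of the argument in Fan et al. (2012b), and the matrices and the helper `risk_gap_and_bound` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 50
a = rng.standard_normal((p, p))
sigma = a @ a.T / p                            # hypothetical true covariance
noise = 0.01 * rng.standard_normal((p, p))
sigma_hat = sigma + (noise + noise.T) / 2      # hypothetical noisy estimate

def risk_gap_and_bound(w, sigma, sigma_hat):
    """Return |w'(Sigma_hat - Sigma)w| and the bound ||w||_1^2 * max|entry error|."""
    diff = sigma_hat - sigma
    gap = abs(w @ diff @ w)
    bound = np.abs(w).sum() ** 2 * np.abs(diff).max()
    return gap, bound

checks = []
for _ in range(100):
    w = rng.standard_normal(p)
    w /= w.sum()                               # fully invested, shorts allowed
    checks.append(risk_gap_and_bound(w, sigma, sigma_hat))

print(all(gap <= bound for gap, bound in checks))
```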
The difference between the risk of an estimated portfolio and the minimum risk going to
zero, however, may not be sufficient to guarantee (near) optimality. In fact, under rather general assumptions (which do not exclude factor models), the minimum risk $R_{\min} = 1/(\mathbf{1}^\top \Sigma^{-1} \mathbf{1})$
may go to zero as the number of assets $p \to \infty$; see Ding et al. (2018) for a thorough discussion. If indeed the minimum risk goes to 0 as $p \to \infty$, then the difference $|R(\hat{w}_{\mathrm{opt}}) - R_{\min}|$
going to 0 is not enough to guarantee (near) optimality. Based on the above consideration,
we seek an asset allocation $\hat{w}$ that satisfies a stronger sense of consistency, in that
the ratio between the risk of the estimated portfolio and the minimum risk goes to one, i.e.,
\[
\frac{R(\hat{w})}{R_{\min}} \xrightarrow{p} 1 \quad \text{as } p \to \infty, \tag{1.8}
\]
where $\xrightarrow{p}$ stands for convergence in probability.
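A toy calculation shows why a vanishing difference is weaker than (1.8). Assume, purely for illustration, that $\Sigma = I_p$, so that $R_{\min} = 1/p$, and consider the (hypothetical) portfolio that puts equal weights on only half of the assets. Its risk is $2/p$: the difference from $R_{\min}$ is $1/p \to 0$, yet the ratio stays at 2.

```python
import numpy as np

def gap_and_ratio(p):
    """Toy case Sigma = I_p: compare a suboptimal portfolio with the MVP."""
    sigma = np.eye(p)
    r_min = 1.0 / p                        # (1.3) with Sigma = I_p
    w_half = np.zeros(p)
    w_half[: p // 2] = 2.0 / p             # equal weights on half of the assets
    risk = w_half @ sigma @ w_half         # equals 2 / p for even p
    return risk - r_min, risk / r_min

results = {p: gap_and_ratio(p) for p in (10, 100, 1000)}
for p, (gap, ratio) in results.items():
    print(f"p={p:5d}  gap={gap:.4f}  ratio={ratio:.2f}")
```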
1.2 Main contributions of the paper
Our contributions mainly lie in the following aspects.
We propose estimators of the minimum variance portfolio that can accommodate stochastic
volatility and market microstructure noise, which are intrinsic to high-frequency returns.
Under some sparsity assumptions on the inverse of the covariance matrix (also known as the
precision matrix), our estimated portfolios enjoy the desired convergence (1.8).
We also introduce consistent estimators of the minimum risk. One such estimator does
not depend on the sparsity assumption and also enjoys a CLT.
1.3 Organization of the paper
The paper is organized as follows. In Section 2, we present our estimators of the MVP
and show that their risks converge to the minimum risk in the sense of (1.8). We have
an estimator that incorporates stochastic volatility (CLIME-SV) and one that incorporates
stochastic volatility and microstructure noise (CLIME-SVMN). A consistent estimator of the
minimum risk is proposed in Section 2.4, for which we also establish the CLT. An extension of
our method to utilize factors is developed in Section 3. Section 4 presents simulation results
to illustrate the performance of both portfolio and minimum risk estimations. Empirical
study results based on S&P 100 Index constituents are reported in Section 5. We conclude
our paper with a brief summary in Section 6. All proofs are given in the Appendix.
2 Estimation Methods and Asymptotic Properties
2.1 High-frequency data model
We assume that the latent p-dimensional log-price process (Xt) follows a diffusion model:
\[
dX_t = \mu_t \, dt + \Theta_t \, dW_t, \quad \text{for } t \ge 0, \tag{2.1}
\]
where $(\mu_t) = (\mu_t^1, \ldots, \mu_t^p)^\top$ is the drift process, $(\Theta_t) = (\theta_t^{ij})_{1 \le i,j \le p}$ is a $p \times p$ matrix-valued
process called the spot co-volatility process, and $(W_t)$ is a $p$-dimensional Brownian motion.
Both $(\mu_t)$ and $(\Theta_t)$ are stochastic, càdlàg, and may depend on $(W_t)$, all defined on a
common filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0})$.
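Let
\[
\Sigma_t = \Theta_t \Theta_t^\top := (\sigma_t^{ij})
\]
be the spot covariance matrix process. The ex-post integrated covariance (ICV) matrix over
an interval, say $[0, 1]$, is
\[
\Sigma_{\mathrm{ICV}} = \Sigma_{\mathrm{ICV},1} = (\sigma^{ij}) := \int_0^1 \Sigma_t \, dt.
\]
Denote its inverse by $\Omega_{\mathrm{ICV}} := \Sigma_{\mathrm{ICV}}^{-1}$. The ex-post minimum risk, $R_{\min}$, is obtained by
replacing the $\Sigma$ in (1.3) with $\Sigma_{\mathrm{ICV}}$.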
Let us emphasize that in general, ΣICV is a random variable which is only measurable
to F1, and so is Rmin. It is therefore in principle impossible to construct a portfolio that
is measurable to F0 to achieve the minimum risk Rmin. Practical implementation of the
minimum variance portfolio relies on making forecasts of ΣICV based on historical data.
The simplest approach is to assume that $\Sigma_{\mathrm{ICV},t} \approx \Sigma_{\mathrm{ICV},t+1}$ (see footnote 1), where $\Sigma_{\mathrm{ICV},t}$ stands for the
ICV matrix in period $[t-1, t]$. Under such an assumption, if we can construct a portfolio $w$
based on the observations during $[t-1, t]$ (and hence measurable only with respect to $\mathcal{F}_t$) that
approximately minimizes the ex-post risk $w^\top \Sigma_{\mathrm{ICV},t}\, w$, then if we hold the portfolio during
the next period $[t, t+1]$, the actual risk $w^\top \Sigma_{\mathrm{ICV},t+1}\, w$ is still approximately minimized. In
this article we adopt such a strategy.
2.2 High-frequency case with no microstructure noise
We first consider the case when there is no microstructure noise; in other words, one observes
the true log-prices $(X_t^i)$.
Our approach to estimating the minimum variance portfolio relies on the constrained $\ell_1$-minimization
for inverse matrix estimation (CLIME) proposed in Cai et al. (2011). The
original CLIME method is developed under the i.i.d. observation setting. Specifically, suppose one has $n$ i.i.d. observations from a population with covariance matrix $\Sigma$. Let $\Omega := \Sigma^{-1}$
be the corresponding precision matrix. The CLIME estimator of $\Omega$ is defined as
\[
\hat{\Omega}_{\mathrm{CLIME}} := \arg\min_{\Omega'} \|\Omega'\|_1 \quad \text{subject to } \|\hat{\Sigma}\,\Omega' - I\|_\infty \le \lambda, \tag{2.2}
\]
where $\hat{\Sigma}$ is the sample covariance matrix, $I$ is the identity matrix, and for any matrix
$A = (a_{ij})$, $\|A\|_1 := \sum_{i,j} |a_{ij}|$ and, recall that, $\|A\|_\infty = \max_{ij} |a_{ij}|$. The $\lambda$ is a tuning
parameter, and is usually chosen via cross-validation.
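Because both the objective and the constraint in (2.2) are element-wise, the program separates into $p$ linear programs, one per column of $\Omega'$. The following is a minimal sketch of that computation using `scipy.optimize.linprog`. The function name `clime`, the simulated data, and the value of `lam` are hypothetical, and the final symmetrization step (keeping the entry of smaller magnitude for each pair) follows the convention in Cai et al. (2011), which the text above does not spell out; treat it as an assumption.

```python
import numpy as np
from scipy.optimize import linprog

def clime(sigma_hat, lam):
    """Column-by-column LP sketch of CLIME (2.2), followed by symmetrization."""
    p = sigma_hat.shape[0]
    # Write a column as beta = u - v with u, v >= 0, so that
    # sum(u) + sum(v) equals ||beta||_1 at the optimum.
    cost = np.ones(2 * p)
    a_ub = np.vstack([np.hstack([sigma_hat, -sigma_hat]),
                      np.hstack([-sigma_hat, sigma_hat])])
    omega = np.zeros((p, p))
    for j in range(p):
        e_j = np.zeros(p)
        e_j[j] = 1.0
        # |sigma_hat @ beta - e_j| <= lam, as two one-sided constraints.
        b_ub = np.concatenate([lam + e_j, lam - e_j])
        res = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        if not res.success:
            raise RuntimeError(f"LP for column {j} failed: {res.message}")
        omega[:, j] = res.x[:p] - res.x[p:]
    # Symmetrize: for each pair (i, j) keep the entry of smaller magnitude.
    smaller = np.where(np.abs(omega) <= np.abs(omega.T), omega, omega.T)
    return np.triu(smaller) + np.triu(smaller, 1).T

# Hypothetical demo: i.i.d. Gaussian data with a tridiagonal precision matrix.
rng = np.random.default_rng(3)
p, n = 8, 500
omega_true = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
chol = np.linalg.cholesky(np.linalg.inv(omega_true))
data = rng.standard_normal((n, p)) @ chol.T
sigma_hat = np.cov(data, rowvar=False)
omega_hat = clime(sigma_hat, lam=0.1)      # lam fixed by hand here, not cross-validated
```

In this sketch $\lambda$ is fixed by hand; in practice it would be tuned, for example by cross-validation as noted above. When $\hat{\Sigma}$ is singular the linear programs may be infeasible for small $\lambda$, in which case the function raises an error.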
Footnote 1: This is related to the phenomenon that the volatility process is often found to be nearly unit root, in
which case the one-step ahead prediction is approximately the current value. The assumption is also used
in Fan et al. (2012a) and Dai et al. (2019), among others.
The CLIME method is designed for the following uniformity class of precision matrices.
For any 0 ≤ q < 1, s0 = s0(p) < ∞ and M = M(p) < ∞, let
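\[
\mathcal{U}(q, s_0, M) = \Big\{ \Omega = (\Omega_{ij})_{p \times p} : \Omega \text{ positive definite}, \; \|\Omega\|_{L_1} \le M, \; \max_{1 \le i \le p} \sum_{j=1}^{p} |\Omega_{ij}|^q \le s_0 \Big\}, \tag{2.3}
\]
where $\|\Omega\|_{L_1} := \max_{1 \le j \le p} \sum_{i=1}^{p} |\Omega_{ij}|$.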
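Membership in the class (2.3) can be checked directly for a given matrix. The sketch below computes the three quantities involved; the helper names are hypothetical, the input is assumed symmetric, and for $q = 0$ the sum $\sum_j |\Omega_{ij}|^q$ is interpreted as the number of nonzero entries in row $i$ (an assumed convention).

```python
import numpy as np

def uniformity_class_stats(omega, q):
    """Return (||Omega||_{L1}, max_i sum_j |Omega_ij|^q, smallest eigenvalue)."""
    l1_norm = np.abs(omega).sum(axis=0).max()          # maximum absolute column sum
    if q == 0:
        row_sparsity = np.count_nonzero(omega, axis=1).max()
    else:
        row_sparsity = (np.abs(omega) ** q).sum(axis=1).max()
    min_eig = np.linalg.eigvalsh(omega).min()          # omega assumed symmetric
    return l1_norm, row_sparsity, min_eig

def in_uniformity_class(omega, q, s0, m):
    l1_norm, row_sparsity, min_eig = uniformity_class_stats(omega, q)
    return bool(min_eig > 0 and l1_norm <= m and row_sparsity <= s0)

# Hypothetical tridiagonal precision matrix with p = 6.
p = 6
omega_demo = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
l1_demo, s_demo, eig_demo = uniformity_class_stats(omega_demo, q=0)
print(l1_demo, s_demo, eig_demo)
```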
When the observations are i.i.d. sub-Gaussian and the underlying $\Omega$
belongs to $\mathcal{U}(q, s_0(p), M(p))$, Cai et al. (2011) establish consistency of $\hat{\Omega}_{\mathrm{CLIME}}$ when $q$, $s_0(p)$
and $M(p)$ satisfy $M^{2-2q} s_0 (\log p / n)^{(1-q)/2} \to 0$; see Theorem 1(a) therein.
In the high-frequency setting, due to stochastic volatility, the leverage effect, etc., the returns
are not i.i.d., so the results in Cai et al. (2011) do not apply. Before we discuss how
to tackle this difficulty, we remark that the sparsity assumption $\Omega = \Sigma^{-1} \in \mathcal{U}(q, s_0, M)$
appears to be reasonable in financial applications. For example, if the returns are assumed
to follow a (conditional) multivariate normal distribution with covariance matrix $\Sigma$, then
the $(i, j)$th element of $\Omega$ being 0 is equivalent to the returns of the $i$th and $j$th assets
being conditionally independent given the other asset returns. For stocks in different sectors,
many pairs might be conditionally independent or only weakly dependent.
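The link between zeros of $\Omega$ and conditional independence can be illustrated numerically. In the sketch below, the tridiagonal precision matrix is hypothetical; the conditional covariance of a pair of coordinates given the rest is computed as a Schur complement of $\Sigma$. The covariance $\Sigma$ itself is dense, yet the conditional covariance vanishes exactly where $\Omega$ has a zero.

```python
import numpy as np

p = 5
# Hypothetical tridiagonal precision matrix: only neighbouring assets interact.
omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
sigma = np.linalg.inv(omega)

def conditional_cov(sigma, i, j):
    """Covariance of (X_i, X_j) given all other coordinates (Schur complement)."""
    a = [i, j]
    b = [k for k in range(sigma.shape[0]) if k not in a]
    s_ab = sigma[np.ix_(a, b)]
    return sigma[np.ix_(a, a)] - s_ab @ np.linalg.solve(sigma[np.ix_(b, b)], s_ab.T)

cond_02 = conditional_cov(sigma, 0, 2)     # omega[0, 2] == 0
cond_01 = conditional_cov(sigma, 0, 1)     # omega[0, 1] != 0
print("marginal cov(0, 2)   :", sigma[0, 2])
print("conditional cov(0, 2):", cond_02[0, 1])
print("conditional cov(0, 1):", cond_01[0, 1])
```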
Now we discuss how to adapt CLIME to the high-frequency setting. Our goal is to
estimate $\Sigma_{\mathrm{ICV}}$, or, more precisely, its inverse $\Omega_{\mathrm{ICV}} := \Sigma_{\mathrm{ICV}}^{-1}$. In order to apply CLIME, we
need to decide which $\hat{\Sigma}$ to use in (2.2).
When the true log-prices are observed, one of the most commonly used estimators
for ΣICV is the realized covariance (RCV) matrix. Specifically, for each asset i, suppose
the observations at stage $n$ are $(X^i_{t^{i,n}_\ell})$, where $0 = t_0^{i,n} < t_1^{i,n} < \cdots < t_{N_i}^{i,n} = 1$ are the
observation times. The n characterizes the observation frequency, and Ni → ∞ as n → ∞.