1 Introduction

Generalized Method of Moments (GMM) refers to a class of estimators constructed by exploiting the sample moment counterparts of population moment conditions (sometimes known as orthogonality conditions) of the data generating model. GMM estimators have become widely used, for the following reasons:

• GMM estimators have large sample properties that are easy to characterize in ways that facilitate comparison. A family of such estimators can be studied a priori in ways that make asymptotic efficiency comparisons easy. The method also provides a natural way to construct tests which take account of both sampling and estimation error.

• In practice, researchers find it useful that GMM estimators can be constructed without specifying the full data generating process (which would be required to write down the maximum likelihood estimator). This characteristic has been exploited in analyzing partially specified economic models, in studying potentially misspecified dynamic models designed to match target moments, and in constructing stochastic discount factor models that link asset pricing to sources of macroeconomic risk.

Books with good discussions of GMM estimation, together with a wide array of applications, include Cochrane (2001), Arellano (2003), Hall (2005), and Singleton (2006). For a theoretical treatment of this method see Hansen (1982), along with the self-contained discussions in those books. See also Ogaki (1993) for a general discussion of GMM estimation and applications, and see Hansen (2001) for a complementary entry that, among other things, links GMM estimation to related literatures in statistics. For a collection of recent methodological advances related to GMM estimation see Ghysels and Hall (2002). While some of these other references explore the range of substantive applications, in what follows we focus more on the methodology.

2 Setup

As we will see, formally there are two alternative ways to specify GMM estimators, but they have a common starting point. Data are a finite number of realizations of the process {x_t : t = 1, 2, ...}. The model is specified as a vector of moment conditions:

    E f(x_t, β_o) = 0,

where f has r coordinates and β_o is an unknown vector in a parameter space P ⊂ R^k. To achieve identification we assume that on the parameter space P

    E f(x_t, β) = 0 if, and only if, β = β_o.    (1)

The parameter β_o is typically not sufficient to write down a likelihood function. Other parameters are needed to specify fully the probability model that underlies the data generation. In other words, the model is only partially specified. Examples include:

i) linear and nonlinear versions of instrumental variables estimators as in Sargan (1958), Sargan (1959), and Amemiya (1974);

ii) rational expectations models as in Hansen and Singleton (1982), Cumby et al. (1983), and Hayashi and Sims (1983);

iii) security market pricing of aggregate risks as described, for example, by Cochrane (2001), Singleton (2006), and Hansen et al. (2007);

iv) matching and testing target moments of possibly misspecified models as described by, for example, Christiano and Eichenbaum (1992) and Hansen and Heckman (1996).

Regarding example iv, many related methods have been developed for estimating correctly specified models, dating back to some of the original applications in statistics of method-of-moments type estimators. The motivation for such methods was computational. See Hansen (2001) for a discussion of this literature and how it relates to GMM estimation.
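To fix ideas before turning to the asymptotic theory, the following sketch writes down the moment function for one concrete case of example i: a linear instrumental variables model y_t = c_t′β_o + u_t with instruments z_t satisfying E[z_t u_t] = 0, so that f(x_t, β) = z_t (y_t − c_t′β), where the text's data point x_t collects (y_t, c_t, z_t). The notation c_t, z_t and the Python helpers below (`iv_moments`, `g_N`, and the `data` layout) are illustrative assumptions introduced here, not part of the original entry; the later sketches reuse them.

```python
import numpy as np

def iv_moments(data, beta):
    """Moment contributions f(x_t, beta) = z_t * (y_t - c_t' beta) for a
    linear instrumental variables model (example i).

    data: dict with arrays y of shape (N,), X of shape (N, k) holding the
    regressors c_t, and Z of shape (N, r) holding the instruments z_t.
    Returns an (N, r) array whose t-th row is f(x_t, beta).
    """
    residuals = data["y"] - data["X"] @ beta      # y_t - c_t' beta
    return data["Z"] * residuals[:, None]         # each instrument scaled by the residual

def g_N(data, beta):
    """Sample average of the moment contributions; this is the statistic
    written g_N(beta) in Section 2.1 below."""
    return iv_moments(data, beta).mean(axis=0)
```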
With advances in numerical methods, the fully efficient maximum likelihood method and its Bayesian counterparts have become much more tractable. On the other hand, there continues to be an interest in the study of dynamic stochastic economic models that are misspecified because of their purposeful simplicity. Thus moment matching remains an interesting application for the methods described here. Testing target moments remains valuable even when maximum likelihood estimation is possible (for example, see Bontemps and Meddahi (2005)).

2.1 Central Limit Theory and Martingale approximation

The parameter dependent average

    g_N(β) = (1/N) ∑_{t=1}^{N} f(x_t, β)

is featured in the construction of estimators and tests. When the Law of Large Numbers is applicable, this average converges to E f(x_t, β). As a refinement of the identification condition, we assume that

    √N g_N(β_o) ⇒ Normal(0, V),    (2)

where ⇒ denotes convergence in distribution and V is a covariance matrix assumed to be nonsingular. In an iid data setting, V is the covariance matrix of the random vector f(x_t, β_o). In a time series setting,

    V = lim_{N→∞} N E[g_N(β_o) g_N(β_o)′],    (3)

which is the long run counterpart to a covariance matrix.

Central limit theory for time series is typically built on martingale approximation (see Gordin (1969) or Hall and Heyde (1980)). For many time series models, the martingale approximators can be constructed directly and there is specific structure to the V matrix. A leading example is when f(x_t, β_o) defines a conditional moment restriction. Suppose that x_t, t = 0, 1, ..., generates a sigma algebra F_t, that E[|f(x_t, β_o)|²] < ∞, and that

    E[f(x_{t+ℓ}, β_o) | F_t] = 0

for some ℓ ≥ 1. This restriction is satisfied in models of multi-period security market pricing and in models that restrict multi-period forecasting. If ℓ = 1, then g_N is itself a martingale; but when ℓ > 1 it is straightforward to find a martingale m_N with stationary increments and finite second moments such that

    lim_{N→∞} E[|g_N(β_o) − m_N(β_o)|²] = 0,

where |·| is the standard Euclidean norm. Moreover, the lag structure may be exploited to show that the limit in (3) is¹

    V = ∑_{j=−ℓ+1}^{ℓ−1} E[f(x_t, β_o) f(x_{t+j}, β_o)′].    (4)

When there is no exploitable structure to the martingale approximator, the matrix V is the spectral density at frequency zero:

    V = ∑_{j=−∞}^{∞} E[f(x_t, β_o) f(x_{t+j}, β_o)′].

¹The sample counterpart to this formula is not guaranteed to be positive semidefinite. There are a variety of ways to exploit this dependence structure when constructing a positive semidefinite estimate. See Eichenbaum et al. (1988) for an example.

2.2 Minimizing a Quadratic Form

One approach for constructing a GMM estimator is to minimize the quadratic form:

    b_N = argmin_{β∈P} g_N(β)′ W g_N(β)

for some positive definite weighting matrix W. Alternative weighting matrices W are associated with alternative estimators. Part of the justification for this approach is that

    β_o = argmin_{β∈P} E[f(x_t, β)]′ W E[f(x_t, β)].

The GMM estimator mimics this identification scheme by using a sample counterpart.

There are a variety of ways to prove consistency of GMM estimators. Hansen (1982) established a uniform law of large numbers for random functions when the data generation is stationary and ergodic. This uniformity is applied to show that

    lim_{N→∞} sup_{β∈P} |g_N(β) − E[f(x_t, β)]| = 0

and presumes a compact parameter space. The uniformity in the approximation carries over directly to the GMM criterion function g_N(β)′ W g_N(β). See Newey and McFadden (1994) for a more complete catalog of approaches of this type.
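The quadratic-form construction just described translates directly into a small numerical routine. The sketch below is a hypothetical illustration, reusing the `g_N` helper from the earlier sketch and taking a user-supplied positive definite W (for example, the identity matrix for a first pass); the compactness of P discussed here would correspond to adding explicit bounds on the optimizer.

```python
import numpy as np
from scipy.optimize import minimize

def gmm_objective(beta, data, W, moment_average):
    """Quadratic-form GMM criterion g_N(beta)' W g_N(beta)."""
    g = moment_average(data, beta)
    return g @ W @ g

def gmm_estimate(data, W, moment_average, beta_start):
    """Minimize the criterion over beta; the minimizer plays the role of b_N."""
    result = minimize(gmm_objective, np.asarray(beta_start, dtype=float),
                      args=(data, W, moment_average), method="BFGS")
    return result.x
```

For instance, `gmm_estimate(data, np.eye(r), g_N, beta_start)` would produce an estimate based on the identity weighting matrix.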
The compactness of the parameter space is often not imposed in applications, and this commonly invoked result is therefore less useful than it might seem. Instead, the compactness restriction is a substitute for checking the behavior of the approximating function far away from β_o, to make sure that spurious optimizers are not induced by approximation error. This tail behavior can be important in practice, so a direct investigation of it can be fruitful.

For models with parameter separation,

    f(x, β) = X h(β),

where X is an r × m matrix constructed from x and h is a one-to-one function mapping P into a subset of R^m, there is an alternative way to establish consistency. See Hansen (1982) for details. Models that are linear in the variables, or models based on matching moments that are nonlinear functions of the underlying parameters, can be written in this separable form.

The choice of W = V^{−1} receives special attention, in part because

    N g_N(β_o)′ V^{−1} g_N(β_o) ⇒ χ²(r).

While the matrix V is typically not known, it can be replaced by a consistent estimator without altering the large sample properties of b_N. When using martingale approximation, the implied structure of V can often be exploited, as in formula (4). When there is no such exploitable structure, the methods of Newey and West (1987b) and others, which are based on frequency-domain methods for time series data, can be employed.

For asset pricing models there are other choices of a weighting matrix, motivated by considerations of misspecification. In these models with parameterized stochastic discount factors, the sample moment conditions g_N(β) can be interpreted as a vector of pricing errors associated with the parameter vector β.
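To complement the discussion of W = V^{−1}, here is one way the long-run covariance matrix in formula (4) might be estimated and fed back into the quadratic-form estimator. This is a hypothetical sketch reusing the helpers from the earlier sketches; the truncation lag would be ℓ − 1 under the conditional moment restriction of Section 2.1, and, as the footnote to (4) warns, the truncated sum need not be positive semidefinite in finite samples. The frequency-domain estimators of Newey and West (1987b) mentioned above are not implemented here.

```python
import numpy as np

def long_run_variance(moments, max_lag):
    """Sample analogue of formula (4): a truncated sum of autocovariances of
    the moment contributions, with `moments` an (N, r) array of f(x_t, beta)
    evaluated at a preliminary estimate of beta."""
    N = moments.shape[0]
    V = moments.T @ moments / N                       # lag-zero term
    for j in range(1, max_lag + 1):
        gamma_j = moments[j:].T @ moments[:-j] / N    # lag-j cross product
        V += gamma_j + gamma_j.T                      # contributions of lags +j and -j
    return V

# One common two-step recipe (again a sketch built on the hypothetical helpers):
#   b1 = gmm_estimate(data, np.eye(r), g_N, beta_start)          # first step, W = identity
#   V_hat = long_run_variance(iv_moments(data, b1), max_lag)     # e.g. max_lag = ell - 1
#   b2 = gmm_estimate(data, np.linalg.inv(V_hat), g_N, b1)       # second step, W = V^{-1}
```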