How to ensure the properties of the covariance matrix when fitting a multivariate normal model by maximum likelihood?



Suppose that I have the following model

$$y_i = f(x_i, \theta) + \varepsilon_i$$

where $y_i \in \mathbb{R}^K$, $x_i$ is a vector of explanatory variables, $\theta$ is the parameter vector of the non-linear function $f$, and $\varepsilon_i \sim N(0, \Sigma)$, where $\Sigma$ naturally is a $K \times K$ matrix.

The goal is, as usual, to estimate $\theta$ and $\Sigma$. The obvious choice is the maximum likelihood method. The log-likelihood for this model (assuming we have a sample $(y_i, x_i)$, $i = 1, \dots, n$) looks like

$$l(\theta, \Sigma) = -\frac{nK}{2}\log(2\pi) - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^{n}(y_i - f(x_i, \theta))'\Sigma^{-1}(y_i - f(x_i, \theta))$$

Now this seems simple: the log-likelihood is specified, so plug in the data and use some algorithm for non-linear optimisation. The problem is how to ensure that $\Sigma$ is positive definite. Using, for example, optim in R (or any other non-linear optimisation algorithm) will not guarantee that $\Sigma$ is positive definite.

So the question is: how do I ensure that $\Sigma$ stays positive definite? I see two possible solutions:

  1. Reparametrise $\Sigma$ as $R'R$, where $R$ is an upper-triangular or symmetric matrix. Then $\Sigma$ will always be positive definite and $R$ can be left unconstrained (see the sketch after this list).

  2. Use profile likelihood. Derive the formulas for $\hat{\theta}(\Sigma)$ and $\hat{\Sigma}(\theta)$. Start with some $\theta_0$ and iterate $\hat{\Sigma}_j = \hat{\Sigma}(\hat{\theta}_{j-1})$, $\hat{\theta}_j = \hat{\theta}(\hat{\Sigma}_j)$ until convergence.
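
A minimal sketch of option 1 in R (the non-linear $f$, the simulated data, and the starting values are purely illustrative stand-ins, not part of the question): $\Sigma$ is rebuilt as $R'R$ from unconstrained upper-triangular entries inside the objective passed to optim.

```r
## Sketch of option 1: parametrise Sigma through an upper-triangular R,
## so Sigma = R'R is positive (semi-)definite by construction and the
## entries of R can be left unconstrained for optim().
set.seed(1)
K <- 2; n <- 200
x <- matrix(rnorm(n), n, 1)
f <- function(x, theta) cbind(theta[1] * exp(theta[2] * x),   # toy non-linear f,
                              theta[3] * x)                   # purely illustrative
Sigma_true <- matrix(c(1, 0.6, 0.6, 2), 2)
Y <- f(x, c(1, 0.5, -1)) + matrix(rnorm(n * K), n, K) %*% chol(Sigma_true)

negloglik <- function(par) {
  theta <- par[1:3]
  R <- matrix(0, K, K)
  R[upper.tri(R, diag = TRUE)] <- par[-(1:3)]   # unconstrained entries of R
  Sigma <- crossprod(R)                         # R'R
  E <- Y - f(x, theta)
  ld <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  ## negative log-likelihood up to an additive constant
  0.5 * n * ld + 0.5 * sum((E %*% solve(Sigma)) * E)
}

fit <- optim(c(0, 0, 0, 1, 0, 1), negloglik, method = "BFGS")
R_hat <- matrix(0, K, K)
R_hat[upper.tri(R_hat, diag = TRUE)] <- fit$par[-(1:3)]
Sigma_hat <- crossprod(R_hat)
```

Because the optimiser never sees $\Sigma$ itself, no explicit positive-definiteness constraint is needed; in practice a small ridge on $\Sigma$ can guard against an exactly singular $R$ during the search.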

Is there some other way, and what about these two approaches: will they work, and are they standard? This seems like a pretty standard problem, but a quick search did not give me any pointers. I know that Bayesian estimation would also be possible, but for the moment I would prefer not to engage in it.


I have the same issue in a Kalman filter algorithm, but the problem is much more complicated and it is not as easy to use the Hamilton trick. I wonder, then, whether a simpler thing to do would be to simply use $\log(\det\Sigma + 1)$. This way I force the code not to throw an error and do not change the solution. This also has the benefit of forcing this term to have the same sign as the final part of the likelihood. Any ideas?
econ_pipo

Answers:



Assuming that, in constructing the covariance matrix, you are automatically taking care of the symmetry issue, your log-likelihood will be undefined when $\Sigma$ is not positive definite, because of the $\log\det\Sigma$ term in the model, right? To prevent a numerical error when $\det\Sigma \le 0$, I would precalculate $\det\Sigma$ and, if it is not positive, set the log-likelihood to -Inf; otherwise continue. You have to calculate the determinant anyway, so this does not cost you any extra calculation.
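
A hedged sketch of that guard in R, assuming the negative log-likelihood is being minimised (so the rejected value is +Inf rather than -Inf) and that the residual matrix E has already been formed by the surrounding code:

```r
## Reuse the determinant both for the positivity check suggested above
## and for the log-det term of the objective.
negloglik_term <- function(Sigma, E, n) {
  dS <- det(Sigma)
  if (!is.finite(dS) || dS <= 0) return(Inf)  # reject this Sigma; optim() accepts
                                              # Inf at non-initial parameter values
  0.5 * n * log(dS) + 0.5 * sum((E %*% solve(Sigma)) * E)
}
```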



As it turns out, you can use profile maximum likelihood to ensure the necessary properties. You can prove that for a given $\hat{\theta}$, $l(\hat{\theta}, \Sigma)$ is maximised by

$$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}\hat{\varepsilon}_i\hat{\varepsilon}_i',$$

where

$$\hat{\varepsilon}_i = y_i - f(x_i, \hat{\theta}).$$

Then it is possible to show that

$$\sum_{i=1}^{n}(y_i - f(x_i, \hat{\theta}))'\hat{\Sigma}^{-1}(y_i - f(x_i, \hat{\theta})) = \text{const},$$

hence we only need to maximise

$$l_R(\theta, \Sigma) = -\frac{n}{2}\log\det\hat{\Sigma}.$$

Naturally, in this case $\Sigma$ will satisfy all the necessary properties. The proofs are identical to those for the case when $f$ is linear, which can be found in Time Series Analysis by J. D. Hamilton, page 295, hence I omitted them.
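
For completeness, a minimal sketch of this concentrated likelihood in R, reusing the illustrative f, Y and x from the sketch under the question (those are assumptions, not part of this answer): the optimiser searches over $\theta$ only, and $\hat{\Sigma}$ is recovered from the residuals afterwards.

```r
## Profile/concentrated log-likelihood: Sigma is replaced by the residual
## covariance crossprod(E)/n, so only theta is optimised and Sigma_hat is
## positive (semi-)definite automatically.
neg_concentrated <- function(theta) {
  E <- Y - f(x, theta)                      # f, Y, x from the earlier sketch
  Sigma_hat <- crossprod(E) / nrow(Y)       # (1/n) * sum_i e_i e_i'
  0.5 * nrow(Y) * as.numeric(determinant(Sigma_hat, logarithm = TRUE)$modulus)
}

fit_c     <- optim(c(0, 0, 0), neg_concentrated, method = "BFGS")
E_hat     <- Y - f(x, fit_c$par)
Sigma_hat <- crossprod(E_hat) / nrow(Y)
```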



An alternative parameterization of the covariance matrix is in terms of the eigenvalues $\lambda_1, \dots, \lambda_p$ and $p(p-1)/2$ "Givens" angles $\theta_{ij}$.

That is, we can write

$$\Sigma = G^T \Lambda G$$

where G is orthonormal, and

$$\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_p)$$

with $\lambda_1 \geq \dots \geq \lambda_p \geq 0$.

Meanwhile, $G$ can be parameterized uniquely in terms of $p(p-1)/2$ angles $\theta_{ij}$, where $i = 1, 2, \dots, p-1$ and $j = i, \dots, p-1$.[1]

(details to be added)

[1]: Hoffman, Raffenetti, Ruedenberg. "Generalization of Euler Angles to N-Dimensional Orthogonal Matrices". J. Math. Phys. 13, 528 (1972).
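
A hedged sketch of this parameterisation in R; the loop ordering of the angles and the use of unconstrained log-eigenvalues (which gives positive but not ordered eigenvalues) are illustrative choices, not taken from the reference.

```r
## Sketch: Sigma = t(G) %*% Lambda %*% G, with G a product of p(p-1)/2
## Givens (plane) rotations and Lambda = diag(exp(log_lambda)), so the
## eigenvalues are positive by construction (ordering not enforced here).
make_sigma_givens <- function(log_lambda, angles) {
  p <- length(log_lambda)
  G <- diag(p)
  k <- 1
  for (i in 1:(p - 1)) {
    for (j in (i + 1):p) {
      Gij <- diag(p)                        # rotation in the (i, j) plane
      Gij[c(i, j), c(i, j)] <- matrix(c(cos(angles[k]),  sin(angles[k]),
                                        -sin(angles[k]), cos(angles[k])), 2)
      G <- G %*% Gij
      k <- k + 1
    }
  }
  t(G) %*% diag(exp(log_lambda), p) %*% G
}

Sigma <- make_sigma_givens(log_lambda = c(0.2, -0.1, 0.4),
                           angles     = c(0.3, -0.7, 1.1))
```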


The matrix $G$ is actually orthogonal, because $\Sigma$ is a symmetric matrix. This is the approach I was going to recommend: it basically amounts to rotating the $y_i$ vector and the model function $f(x_i, \theta)$ so that the errors are independent, then applying OLS to each of the rotated components (I think).
probabilityislogic


Along the lines of charles.y.zheng's solution, you may wish to model $\Sigma = \Lambda + CC'$, where $\Lambda$ is a diagonal matrix and $C$ is a Cholesky factor of a rank update to $\Lambda$. You then only need to keep the diagonal of $\Lambda$ positive to keep $\Sigma$ positive definite. That is, you should estimate the diagonal of $\Lambda$ and the elements of $C$ instead of estimating $\Sigma$ directly.
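
A small sketch of this parameterisation in R; the rank of $C$ and the use of exp() to keep the diagonal of $\Lambda$ positive are illustrative choices.

```r
## Sketch: Sigma = Lambda + C C', with Lambda diagonal and positive.
## Only log_diag and the (unconstrained) entries of C are estimated.
make_sigma_lowrank <- function(log_diag, C_entries, r) {
  p <- length(log_diag)
  C <- matrix(C_entries, p, r)
  diag(exp(log_diag), p) + tcrossprod(C)   # exp() keeps the diagonal positive
}

Sigma <- make_sigma_lowrank(log_diag  = c(0, 0.5, -0.3),
                            C_entries = rnorm(3 * 2), r = 2)
```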


Can the below-diagonal elements in this setting be anything I want, as long as the diagonal is positive? When I simulate matrices this way in numpy, not all of them are positive definite.
sztal

Λ is a diagonal matrix.
shabbychef