Journal of University of Shanghai for Science and Technology, 2019, Vol. 41, Issue (4): 331-338

Effective Robust Estimation Based on the Generalized Empirical Likelihood Method for Longitudinal Data
XU Xiaolin, FAN Yali, SU Yiguan
College of Science, University of Shanghai for Science and Technology, Shanghai 200093, China
Abstract: Based on the generalized empirical likelihood method, a robust and efficient estimator is proposed for jointly estimating the mean and covariance matrix of longitudinal data in linear models. The model is re-parameterized through the modified Cholesky decomposition, the estimates are obtained by the Lagrange multiplier method, and the estimates of the mean and covariance matrix are then recovered. A simulation study comparing the proposed method with other robust estimators in the literature shows that the proposed method is more efficient. Finally, the method is applied to the CD4 cell data; cross-validation results show that it is more reliable.
Key words: longitudinal data; robustness; efficiency; empirical likelihood; alternating optimization

1 Model and Estimation Method

1.1 Initial Model

Consider the linear model for longitudinal data with $n$ subjects and $m$ observations per subject:

 ${y_{ij}} = {{x}}_{ij}^{\rm T}{{{\beta}} _0} + {\varepsilon _{ij}},\;\;\;\;\;\;i = 1,2, \cdots, n,\;\;\;j = 1,2, \cdots, m$ (1)

Let ${{{Y}}_i} = {({y_{i1}}, \cdots, {y_{im}})^{\rm T}}$ and ${{{X}}_i} = {({{{x}}_{i1}}, \cdots ,{{{x}}_{im}})^{\rm T}}$. This paper focuses on efficient robust estimation of $\,{{{\beta }}_0}$ and the error covariance matrix ${{{\varSigma }}_0}$ in model (1).

1.2 Model Reparameterization

Let ${{{\varepsilon }}_i} = {({\varepsilon _{i1}}, \cdots, {\varepsilon _{im}})^{\rm T}}$. By the modified Cholesky decomposition, there exists a unit lower triangular matrix

 ${{C}} = \left( {\begin{array}{*{20}{c}} 1 & 0 & \cdots & 0\\ { - {c_{21}}}& 1& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ { - {c_{m1}}}& { - {c_{m2}}}& \cdots &1 \end{array}} \right)$

such that ${{{e}}_i} = {({e_{i1}}, \cdots, {e_{im}})^{\rm T}} = {{C}}{{{\varepsilon }}_i}$ has uncorrelated components with ${\rm{var}} ({e_{ij}}) = b_j^2,\;j = 1,2, \cdots ,m$. Model (1) can then be rewritten as

 $\left\{ \begin{array}{l} {y_{i1}} = {{x}}_{i1}^{\rm T}{{{\beta }}_0} + {e_{i1}} \\ {y_{ij}} = {{x}}_{ij}^{\rm T}{{{\beta }}_0} + {c_{j1}}{\varepsilon _{i1}} + \cdots+ {c_{j,j - 1}}{\varepsilon _{i,j - 1}} + {e_{ij}},\\ \;\;\; i = 1,2, \cdots, n,\quad j = 2,3, \cdots, m \\ \end{array} \right.$ (2)
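The matrix ${{C}}$ is chosen precisely so that it diagonalizes the error covariance; this identity is what later allows ${{{\varSigma }}_0}$ to be recovered from ${{{s}}_0}$ and the $b_j^2$:

 ${\rm{var}}({{{e}}_i}) = {{C}}\,{\rm{var}}({{{\varepsilon }}_i})\,{{{C}}^{\rm T}} = {{C}}{{{\varSigma }}_0}{{{C}}^{\rm T}} = {\rm{diag}}(b_1^2, \cdots ,b_m^2),\quad {{{\varSigma }}_0} = {{{C}}^{ - 1}}{\rm{diag}}(b_1^2, \cdots ,b_m^2){({{{C}}^{ - 1}})^{\rm T}}$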

Let ${{{s}}_0} = {({c_{21}},{c_{31}},{c_{32}}, \cdots, {c_{m,m - 1}})^{\rm T}}$, ${{\hat{ Z}}_{ij}} = ({{\textit{0}}}_{(j - 2)(j - 1)/2}^{\rm T}, {\hat \varepsilon _{i1}}, \cdots ,{\hat \varepsilon _{i,j - 1}},{{\textit{0}}}_{(m - 1)m/2 - (j - 1)j/2}^{\rm T})^{\rm T}$, ${\hat{ Z}}_i^{( - 1)} = {({{\hat{ Z}}_{i2}}, \cdots ,{{\hat{ Z}}_{im}})^{\rm T}}$, ${{{\theta }}_0} = {({{\beta }}_0^{\rm T},{{s}}_0^{\rm T})^{\rm T}}$, ${{X}}_i^{( - 1)} = {({{{x}}_{i2}}, \cdots , {{{x}}_{im}})^{\rm T}}$, and ${{{D}}_1} = ({{x}}_{i1}^{\rm T},{{\textit{0}}}_{m(m - 1)/2}^{\rm T})$,

with ${\hat{ D}}_i^{( - 1)} = ({{X}}_i^{( - 1)},{\hat{ Z}}_i^{( - 1)})$ and ${\hat{ D}}_i^{} = {({{{D}}_1},{\hat{ D}}_i^{( - 1)})^{\rm T}}$. Stacking all subjects, define

 ${{Y}} ={({{{Y}}_1},{{{Y}}_2}, \cdots, {{{Y}}_n})^{\rm T}} = {({y_1},{y_2}, \cdots, {y_{nm}})^{\rm T}}$
 ${\hat{ D}} = {({{\hat{ D}}_1},{{\hat{ D}}_2}, \cdots, {{\hat{ D}}_n})^{\rm T}} = {({{d}}_1^{\rm T},{{d}}_2^{\rm T}, \cdots, {{d}}_{nm}^{\rm T})^{\rm T}}$
 ${{e}} = {({{{e}}_1},{{{e}}_2}, \cdots ,{{{e}}_n})^{\rm T}} = {({\omega _1},{\omega _2}, \cdots, {\omega _{nm}})^{\rm T}}$

Then, with $N = nm$, model (2) can be written compactly as

 ${y_k} = {{d}}_k^{\rm T}{{\theta }} + {\omega _k},\quad k = 1,2, \cdots ,N$ (3)
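The zero-padding in the definition of ${{\hat{ Z}}_{ij}}$ makes the construction of ${\hat{ D}}_i$ easy to get wrong, so the following sketch builds it for one subject from $X_i$ and the plug-in residuals (the function name and NumPy layout are our own; this is an illustration, not the paper's code):

```python
import numpy as np

def subject_design(Xi, eps):
    """Build the augmented design matrix D_i of model (3) for one subject:
    row j stacks x_ij with the zero-padded residual vector Z_ij, whose
    nonzero block eps[0..j-2] starts at offset (j-2)(j-1)/2 (paper indexing)."""
    m, p = Xi.shape
    q = m * (m - 1) // 2          # length of s_0
    Di = np.zeros((m, p + q))
    Di[:, :p] = Xi                # mean part: x_ij
    for j in range(1, m):         # 0-based row j here = paper's row j+1
        off = j * (j - 1) // 2    # row-wise offset into the s_0 block
        Di[j, p + off : p + off + j] = eps[:j]
    return Di
```

Row-wise placement matches the ordering of ${{{s}}_0} = (c_{21}, c_{31}, c_{32}, \cdots)^{\rm T}$, so ${\hat{ D}}_i {{\theta }}$ reproduces the right-hand side of model (2).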

1.3 Generalized Empirical Likelihood Estimation

Following Bondell and Stefanski [4], the generalized empirical likelihood estimator minimizes

 $L({{\theta }},{{w}}) = \sum\limits_{k = 1}^N {{w_k}\log (N{w_k})}$ (4)

subject to the constraints

 $\sum\limits_{k = 1}^N {{w_k}} = 1,\;\;\;\sum\limits_{k = 1}^N {{w_k}} ({y_k} - {{d}}_k^{\rm T}{{\theta }}){{{d}}_k} = 0$ (5)
 $\sum\limits_{k = 1}^N {{w_k}} {({y_k} - {{d}}_{k}^{\rm T}{{\theta }})^2} = \tilde \sigma _T^2$ (6)

 ${{\tilde \theta }} = \arg \min\; L({{\theta }},{{w}}), \quad{\hat w_k} = \arg \min \;L({{\theta }},{{w}})$

With ${{W}} = {\rm{diag}}({\hat w_1}, \cdots ,{\hat w_N})$, the estimator takes the weighted least squares form

 ${\hat{ \theta }} = ({{{D}}^{\rm T}}{{WD}}){}^{ - 1}{{{D}}^{\rm T}}{{WY}}$ (7)

Let ${r_k}({\hat{ \theta }}) = {y_k} - {{d}}_k^{\rm T}{\hat{ \theta }}$. Since $\displaystyle\sum\limits_{k = 1}^N {{{\hat w}_k}\left\{ {r_k^2({\hat{ \theta }}) - \tilde \sigma _T^2} \right\} = } 0$ and $\displaystyle\sum\limits_{k = 1}^N {\hat w{}_k} = 1$, under the assumption $\;\tilde \sigma _T^2 \leqslant \hat \sigma _{{\rm{OLS}}}^2$ we obtain

 $\begin{split} \sum\limits_{k = 1}^N {{{\hat w}_k}} r_k^2({\hat{ \theta }}) &= \sum\limits_{k = 1}^N {{{\hat w}_k}} \tilde \sigma _T^2 = \tilde \sigma _T^2 \leqslant \hat \sigma _{\rm{OLS}}^2 = \\ & (1/N)\sum\limits_{k = 1}^N {r_k^2({{{\hat{ \theta }}}_{\rm{OLS}}})} \leqslant (1/N)\sum\limits_{k = 1}^N {r_k^2({\hat{ \theta }})} \end{split}$ (8)

so that

 $\sum\limits_{k = 1}^N {{{\hat w}_k}r_k^2({\hat{ \theta }})} \leqslant (1/N)\sum\limits_{k = 1}^N {r_k^2({\hat{ \theta }})} \;\Rightarrow\; \sum\limits_{k = 1}^N {\left\{ {{{\hat w}_k} - (1/N)} \right\}r_k^2({\hat{ \theta }})} \leqslant 0$

That is, observations with large squared residuals tend to receive weights below $1/N$, which is the source of the robustness. The estimation procedure is as follows.

a. Using the first observation of each subject, obtain an initial least squares estimate of $\,{{{\beta }}_0}$, denoted $\,{{\hat{ \beta }}_b}$; compute the residuals ${\hat \varepsilon _{ij}} = {y_{ij}} - {{x}}_{ij}^{\rm T}{{\hat{ \beta }}_b}$ and set ${\varepsilon _{ij}} = {\hat \varepsilon _{ij}}$.

b. For model (3) with ${{\theta }} = {({{\beta }}_0^{\rm T},{{s}}_0^{\rm T})^{\rm T}}$, apply the two-stage method of Bondell et al. [4] to obtain ${\hat{ \theta }}$; its first $p$ components give $\,{{\hat{ \beta }}_0}$. From the residuals $\,{\hat \omega _k} = y{}_k - {{d}}_k^{\rm T}{\hat{ \theta }}$, a robust estimate of $b_j^2$ is obtained by the median absolute deviation (MAD) method.

c. With ${{\hat{ s}}_0}$ and $\hat b_j^2$ in hand, the covariance matrix ${{\hat{ \varSigma }}_0}$ of the longitudinal data is recovered by reversing the Cholesky decomposition.
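Step c can be sketched directly: since ${{C}}{{{\varSigma }}_0}{{{C}}^{\rm T}} = {\rm{diag}}(b_1^2, \cdots ,b_m^2)$, inverting ${{C}}$ recovers the covariance. The function name is our own and the row-wise ordering of `s0` follows the definition of ${{{s}}_0}$ above; this is an illustrative sketch, not the paper's code:

```python
import numpy as np

def rebuild_covariance(s0, b2, m):
    """Recover Sigma_0 from the Cholesky factors: C Sigma_0 C^T = diag(b2).

    s0 : entries (c21, c31, c32, ..., c_{m,m-1}), row-wise below the diagonal
    b2 : the innovation variances b_1^2, ..., b_m^2
    """
    C = np.eye(m)
    idx = 0
    for j in range(1, m):
        for l in range(j):
            C[j, l] = -s0[idx]    # C has -c_{jl} below the diagonal
            idx += 1
    Cinv = np.linalg.inv(C)
    return Cinv @ np.diag(b2) @ Cinv.T
```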

2 Alternating Optimization Algorithm

The Lagrangian of problem (4)-(6) is

 $\begin{split} {L_0}({{w}},{{\theta }},{{\varLambda }}) =& \sum\limits_{k = 1}^N {{w_k}\log (N{w_k})} - {\lambda _1}\left( {\sum\limits_{k = 1}^N {{w_k} - 1} } \right) - \\ & {{\lambda }}_2^{\rm T}\sum\limits_{k = 1}^N {{w_k}\left( {{y_k} - {{d}}_k^{\rm T}{{\theta }}} \right){{{d}}_k}} - {\lambda _3}\sum\limits_{k = 1}^N {{w_k}\left\{ {{{\left( {{y_k} - {{d}}_k^{\rm T}{{\theta }}} \right)}^2} - \tilde \sigma _{T}^2} \right\}} \end{split}$

Setting $\partial {L_0}/\partial {w_k} = 0$ for $k = 1,2, \cdots, N$ and $\partial {L_0}/\partial {\lambda _1} = 0$ gives

 ${w_k} = w_k^*\Big/\sum\limits_{k = 1}^N {w_k^*}$

where $w_k^ * = \exp\left[ {{\lambda }}_2^{\rm T}{{{d}}_{{k}}}{r_k}({{\theta }}) + {\lambda _3}\left\{ {r_k^2({{\theta }}) - \tilde \sigma _T^2} \right\} \right]$.

Setting $\partial {L_0}/\partial {{{\lambda }}_2} = 0$, $\partial {L_0}/\partial {\lambda _3} = 0$, and $\partial {L_0}/\partial {{\theta }} = 0$ yields

 $\left\{\begin{split} &\sum\limits_{k = 1}^N {w_k^ * {r_k}({{\theta }}){{{d}}_k} = 0\;} \\ &\sum\limits_{k = 1}^N {w_k^ * \left\{ {r_k^2({{\theta }}) - \tilde \sigma _T^2} \right\} = 0\;} \;\;\\ &\sum\limits_{k = 1}^N {w_k^ * } \left\{ {{{{d}}_k}{{d}}_k^{\rm T}{{{\lambda }}_2} + 2{r_k}({{\theta }}){{{d}}_k}} \right\} = 0\;\;\end{split}\right.$ (9)

which can be shown to reduce to
 $\left\{\begin{split}& \sum\limits_{k = 1}^N {\hat w_k^ * } \left\{ {r_k^2({\hat{ \theta }}) - \tilde \sigma _T^2} \right\} = 0\\ &\sum\limits_{k = 1}^N {\hat w_k^ * {r_k}({\hat{ \theta }}){{{d}}_k} = 0} \\ &\hat w_k^ * - \exp \left[ {{{\hat \lambda }_3}\left\{ {r_k^2({\hat{ \theta }}) - \tilde \sigma _T^2} \right\}} \right] = 0,\\ &\qquad k = 1,2, \cdots, N\end{split}\right.$ (10)

Define

 $\begin{split} J({\lambda _3},{{\theta }}) &= (1/N)\sum\limits_k {\exp \left[{\lambda _3}\left\{ {r_k^2({{\theta }}) - \tilde \sigma _T^2} \right\}\right]} \\ &= \left\{ \exp ( - {\lambda _3}\tilde \sigma _T^2)/N \right\}\sum\limits_k {\exp \left\{ {{\lambda _3}r_k^2({{\theta }})} \right\}} \end{split}$

Let ${J_1}({\lambda _3},{{\theta }}) = \partial J/\partial {\lambda _3}$, ${J_{11}}({\lambda _3},{{\theta }}) = {\partial ^2}J/\partial \lambda _3^2$, and ${J_2}({\lambda _3},{{\theta }}) = \partial J/\partial {{\theta }}$. The system (10) is then equivalent to the stationarity conditions

 $\left\{ \begin{array}{l} {J_1}({\lambda _3},{{\theta }}) = 0 \\ {J_2}({\lambda _3},{{\theta }}) = 0 \\ \end{array} \right.$ (11)

For fixed ${{\theta }}$, let ${\hat \lambda _3}({{\theta }})$ denote the solution of

 ${J_1}({\hat \lambda _3}({{\theta }}),{{\theta }}) = 0$ (12)

Substituting ${\lambda _3} = {\hat \lambda _3}({{\theta }})$ into $J({\lambda _3},{{\theta }})$ gives a function of ${{\theta }}$ alone, $Q({{\theta }}) = J({\hat \lambda _3}({{\theta }}),{{\theta }})$. Let ${{\hat{ \theta }}_Q}$ denote the value of ${{\theta }}$ at which $Q({{\theta }})$ attains its maximum; the first derivative of $Q$ must vanish there, i.e.,

 ${J_1}({\hat \lambda _3}({{\hat{ \theta }}_Q}),{{\hat{ \theta }}_Q})\left( \partial {\hat \lambda _3}({{\theta }})/\partial {{\theta }} \right)\big|_{{{\theta }} = {{\hat{ \theta }}_Q}} + {J_2}({\hat \lambda _3}({{\hat{ \theta }}_Q}),{{\hat{ \theta }}_Q}) = 0$ (13)

Since ${J_1}({\hat \lambda _3}({{\hat{ \theta }}_Q}),{{\hat{ \theta }}_Q}) = 0$ by (12), writing ${\hat \lambda _{3Q}} = {\hat \lambda _3}({{\hat{ \theta }}_Q})$, the pair $({\hat \lambda _{3Q}},{{\hat{ \theta }}_Q})$ satisfies

 $\left\{ \begin{array}{l} {J_1}({{\hat {\lambda} }_{3Q}},{{{\hat{ {\theta} }}}_Q}) = 0 \\ {J_2}({{{\hat {\lambda}} }_{3Q}},{{{\hat{ {\theta} }}}_Q}) = 0 \\ \end{array} \right.$ (14)

This motivates the following alternating algorithm.

a. Set $\;\tilde \sigma _T^2 = \min (\hat \sigma _{\rm {OLS}}^2,\;\hat \sigma _{\rm{LTS}}^2)$, where $\;\hat \sigma _{\rm{LTS}}^2$ is the residual scale estimate of the sample under the least trimmed squares (LTS) estimator.

b. Take the least trimmed squares estimate ${{\hat{ \theta }}^{(0)}}$ as the initial value of ${{\theta }}$, and fix a small tolerance $\;\varepsilon > 0$.

c. With ${{\theta }} = {{\hat{ \theta }}^{(0)}}$ fixed, minimize $J({\lambda _3},{{\hat{ \theta }}^{(0)}})$ over ${\lambda _3}$, i.e., find the value of ${\lambda _3}$ at which ${J_1}({\lambda _3},{{\hat{ \theta }}^{(0)}}) = 0$, denoted $\hat \lambda _3^{(0)} = \arg \min J({\lambda _3},{{\hat{ \theta }}^{(0)}})$.

d. With ${\lambda _3} = \hat \lambda _3^{(0)}$ fixed, maximize $J(\hat \lambda _3^{(0)},{{\theta }})$ over ${{\theta }}$, i.e., solve ${J_2}(\hat \lambda _3^{(0)},{{\theta }}) = 0$ for ${{\theta }}$; the solution ${{\hat{ \theta }}^{(1)}}$ is the updated parameter estimate.

e. Repeat steps c-d, denoting the value after the $i$-th update by ${{\hat{ \theta }}^{(i)}}$, until $\left| {{{{\hat{ \theta }}}^{(i + 1)}} - {{{\hat{ \theta }}}^{(i)}}} \right| < \varepsilon$ or the number of iterations exceeds $1\;000$; denote the final iterate by ${{\hat{ \theta }}^{(*)}}$.

f. Compute the residuals ${r_k}({{\hat{ \theta }}^{(*)}})$ and the weights ${w_k} = \exp \left\{ {\hat \lambda _3^{(*)}r_k^2({{{\hat{ \theta }}}^{(*)}})} \right\}\Big/\sum\limits_{k=1}^N {\exp \left\{ {\hat \lambda _3^{(*)}r_k^2({{{\hat{ \theta }}}^{(*)}})} \right\}}$, where $\hat \lambda _3^{(*)}$ is the final value of ${\lambda _3}$. The final estimate of ${{\theta }}$ is then given by (7); its first $p$ components give the mean parameter $\,{{\hat{ \beta }}_0}$, and reversing the Cholesky decomposition of ${\hat{ \theta }}$ yields the covariance matrix ${{\hat{ \varSigma }}_0}$ of the longitudinal data.
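The loop in steps c-e can be sketched as follows. This is a simplified illustration, not the paper's implementation: the initial value is ordinary least squares rather than LTS, the one-dimensional minimization over $\lambda_3$ uses a coarse grid restricted to $\lambda_3 \leqslant 0$ (the downweighting regime), and the name `gel_alternate` and its arguments are our own:

```python
import numpy as np

def gel_alternate(D, y, sigma2_T, tol=1e-8, max_iter=100):
    """Alternating optimization sketch: a 1-D search for lambda3 (step c)
    alternated with a weighted least-squares update of theta (step d)."""
    theta = np.linalg.lstsq(D, y, rcond=None)[0]  # OLS start; the paper uses LTS
    lam = 0.0
    for _ in range(max_iter):
        r2 = (y - D @ theta) ** 2
        # step c: minimize J(lam, theta) over lam on a coarse grid, lam <= 0
        grid = np.linspace(-5.0 / (r2.mean() + 1e-12), 0.0, 401)
        J = np.exp(np.outer(grid, r2 - sigma2_T)).mean(axis=1)
        lam = grid[np.argmin(J)]
        # step d: J2 = 0 is a weighted normal equation with w_k ∝ exp(lam r_k^2)
        z = lam * r2
        w = np.exp(z - z.max())        # shift exponents for numerical stability
        w /= w.sum()
        theta_new = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * y))
        if np.max(np.abs(theta_new - theta)) < tol:   # step e stopping rule
            theta = theta_new
            break
        theta = theta_new
    return theta, lam
```

When $\tilde \sigma _T^2$ is well below the contaminated residual variance, the search settles on a negative $\lambda_3$, so observations with large squared residuals receive exponentially small weights in the $\theta$-update, which is exactly the downweighting mechanism described after (8).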

3 Simulation Study

 $\left[ {\begin{array}{*{20}{c}} 1 & 0.7 & 0.4 \\ 0.7 & 1 & 0.2 \\ 0.4 & 0.2 & 1 \end{array}} \right]$

Set $\,{{{\beta }}_0} = {({\beta _1},{\beta _2},{\beta _3})^{\rm T}} = {(1,1,1)^{\rm T}}$, with 100 simulation replications. The initial value of ${{\theta }}$ was the ordinary least squares estimate ${{\hat{ \theta }}_{\rm{OLS}}}$. Xu denotes the method proposed in this paper, He the robust generalized estimating equation method of He et al. [5], and Qin the robust estimating equation method of Qin et al. [7]. Since Qin et al. [7] studied longitudinal data with dropouts whereas this paper considers only contamination, the missingness indicators were all set to 1 in the simulations, meaning no data are missing. Because the data may be contaminated by outliers, $\,{{{\beta }}_0}$ is estimated by the first $p$ components of ${\hat{ \theta }}$ in the final expression (7), and the covariance ${{\varSigma }}$ is recovered by reversing the Cholesky decomposition of ${\hat{ \theta }}$. In the recovery step, R indicates that the median absolute deviation (MAD) method was used and NR the ordinary (non-robust) treatment, so as to check whether the proposed method resists outliers; for the other two methods, R and NR likewise denote their robust and non-robust versions. Table 1 compares the bias (BIAS), standard error (SE), and mean squared error (MSE) of $\,{{\hat{ \beta }}_0}$ under the three methods.

The accuracy of the covariance estimate was measured by the quadratic loss (QL) and the entropy loss (EL):

 $QL = {\varDelta _Q}({{\varSigma }},{\hat{ \varSigma }}) = {\left( {{\rm{trace}}({{{\varSigma }}^{ - 1}}{\hat{ \varSigma }} - {{I}})} \right)^2}$
 $EL = {\varDelta _E}({\hat{ \varSigma }},{{\varSigma }}) = {\rm{trace}}({{{\varSigma }}^{ - 1}}{\hat{ \varSigma }}) - \log \left| {{{{\varSigma }}^{ - 1}}{\hat{ \varSigma }}} \right|$
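Both losses, taken exactly as displayed, are straightforward to compute (the function names are our own; note that, as written, $EL$ attains its minimum value $m$, the dimension, rather than 0 at $\hat{{\varSigma }} = {{\varSigma }}$):

```python
import numpy as np

def ql_loss(sigma, sigma_hat):
    """Quadratic loss Delta_Q(Sigma, Sigma_hat) as displayed above."""
    M = np.linalg.solve(sigma, sigma_hat) - np.eye(sigma.shape[0])
    return np.trace(M) ** 2

def el_loss(sigma, sigma_hat):
    """Entropy loss Delta_E(Sigma_hat, Sigma) as displayed above."""
    M = np.linalg.solve(sigma, sigma_hat)
    _, logdet = np.linalg.slogdet(M)   # log|Sigma^{-1} Sigma_hat|
    return np.trace(M) - logdet
```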

4 Empirical Analysis

 ${{X}} = {({x_1},{x_2},x_2^2,{x_3},x_3^2,{x_4},x_4^2,{x_5},{x_6},x_6^2,{x_7},x_7^2)^{\rm T}}$ (15)

 ${y_{ij}} = {{X}}_{ij}^{\rm T}{{\beta }} + {\varepsilon _{ij}},\quad i = 1,2 ,\cdots, 340,\;\;\;j = 1,2,3$ (16)

 ${\rm{MSE}}_{{\rm{CV}}} = \frac{1}{n}\sum\limits_{i = 1}^n {\left| {{y_i} - {{\hat y}_{( - i)}}} \right|}$ (17)

where ${\hat y_{( - i)}}$ denotes the prediction of ${y_i}$ obtained with the $i$-th observation deleted.
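Criterion (17) can be computed by an explicit leave-one-out loop; the fitter below is plain least squares purely for illustration (`cv_criterion` and `ols` are our own names, and any fitter returning a coefficient vector can be passed in its place):

```python
import numpy as np

def cv_criterion(X, y, fit):
    """Leave-one-out criterion of eq. (17): average absolute error of
    predictions for observation i made with observation i deleted."""
    n = len(y)
    err = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        beta = fit(X[mask], y[mask])   # refit without observation i
        err += abs(y[i] - X[i] @ beta)
    return err / n

# ordinary least squares as an illustrative fitter
ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
```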

References
[1] THOMAS D R, GRUNKEMEIER G L. Confidence interval estimation of survival probabilities for censored data[J]. Journal of the American Statistical Association, 1975, 70(352): 865-871. DOI:10.1080/01621459.1975.10480315
[2] OWEN A. Empirical likelihood ratio confidence regions[J]. The Annals of Statistics, 1990, 18(1): 90-120. DOI:10.1214/aos/1176347494
[3] OWEN A. Empirical likelihood for linear models[J]. The Annals of Statistics, 1991, 19(4): 1725-1747. DOI:10.1214/aos/1176348368
[4] BONDELL H D, STEFANSKI L A. Efficient robust regression via two-stage generalized empirical likelihood[J]. Journal of the American Statistical Association, 2013, 108(502): 644-655. DOI:10.1080/01621459.2013.779847
[5] HE X M, FUNG W K, ZHU Z Y. Robust estimation in generalized partial linear models for clustered data[J]. Journal of the American Statistical Association, 2005, 100(472): 1176-1184. DOI:10.1198/016214505000000277
[6] QIN G Y, ZHU Z Y, FUNG W K. Robust estimation of generalized partially linear model for longitudinal data with dropouts[J]. Annals of the Institute of Statistical Mathematics, 2016, 68(5): 977-1000. DOI:10.1007/s10463-015-0519-8
[7] QIN G Y, ZHU Z Y. Robust estimation of mean and covariance for longitudinal data with dropouts[J]. Journal of Applied Statistics, 2015, 42(6): 1240-1254. DOI:10.1080/02664763.2014.999033
[8] CRESSIE N, READ T R C. Multinomial goodness-of-fit tests[J]. Journal of the Royal Statistical Society. Series B, 1984, 46(3): 440-464.
[9] CORCORAN S A. Bartlett adjustment of empirical discrepancy statistics[J]. Biometrika, 1998, 85(4): 967-972. DOI:10.1093/biomet/85.4.967
[10] ZEGER S L, DIGGLE P J. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters[J]. Biometrics, 1994, 50(3): 689-699. DOI:10.2307/2532783
[11] WANG Y G, LIN X, ZHU M. Robust estimating functions and bias correction for longitudinal data analysis[J]. Biometrics, 2005, 61(3): 684-691. DOI:10.1111/j.1541-0420.2005.00354.x