Gram–Schmidt–Fisher scoring algorithm for parameter orthogonalization in MLE


Abstract

The estimation of parameters is a key component in statistical modelling and inference. However, parametrization of certain likelihood functions could lead to highly correlated estimates, causing numerical problems, mathematical complexities and difficulty in estimation or erroneous interpretation and subsequently inference. In statistical estimation, the concept of orthogonalization is familiar as a simplifying technique that allows parameters to be estimated independently and thus separates information from each other. We introduce a Fisher scoring iterative process that incorporates the Gram–Schmidt orthogonalization technique for maximum likelihood estimation. A finite mixture model for correlated binary data is used to illustrate the implementation of the method with discussion of application to oesophageal cancer data.


Public Interest Statement

In statistical estimation, the concept of orthogonalization is simply a technique that allows parameters to be estimated independently, through the rotation of the parameter space. The paper introduces an iterative algorithm that utilizes the Fisher scoring process and incorporates the Gram–Schmidt orthogonalization technique for maximum likelihood estimation. The algorithm that we propose provides an exceedingly convenient method for multi-parameter statistical problems first to reduce numerical difficulties and second to improve accuracy in parameter estimation.

1. Introduction

The problem of estimating parameters is one of the key stages in fitting a statistical model to a set of data. Typically, the model will contain some parameters which are not of interest in themselves, but whose values affect the inferences that we make about the parameters of direct interest. Parametrization used when working with certain distributions could lead to high correlations of the maximum likelihood estimates. Maximization algorithms converge rapidly if the initial estimates are good, the likelihood function is well approximated by a quadratic in the neighbourhood of the parameter space and the information matrix is well conditioned, which means that the parameter estimates are not strongly intercorrelated. High intercorrelation of parameters causes numerical problems, and difficulty or erroneous interpretation in the parameter estimation. In statistical estimation, the concept of orthogonalization is familiar as a simplifying technique that allows parameters to be estimated independently, through rotation of the parameter space. Computationally, all that is required is that the original parameters should be computable from the new ones, and vice-versa. An injective transformation will therefore be desirable, though not necessary.

Ross (ross) gave a comprehensive discussion of techniques that can be used to reduce correlation in particular problems. Among them are sequential and nested maximizations. Kalbfleisch and Sprott (kalbliesch) discuss methods aimed at eliminating nuisance parameters from the likelihood function so that inference can be made about only the parameters of interest. Amari (amari, amari1) derived, from a theoretical point of view, general criteria for the existence and construction of orthogonal parameters for statistical models using differential geometry. Cox and Reid (cox) gave a more general procedure and covered the advantages in maximum likelihood estimation. The approach, though popular in theoretical statistics, is computationally impractical in many applications. Willmot (willmot) discussed orthogonal parametrization but only for a two-parameter family of discrete distributions; his method does not extend to problems with more parameters. Hurlimann (hurlimann) discusses the existence of parametrizations orthogonal to the mean, characterized by a partial differential equation involving the mean, the variance and the cumulant generating function. Godambe (godambe) deals with the problem of nuisance parameters within a semi-parametric set-up which includes the class of distributions associated with generalized linear models. The technique uses the optimum orthogonal estimating functions of Godambe and Thompson (godambe1). Bonney and Kissling (bonney) applied the Gram–Schmidt orthogonalization (Clayton, clayton) to multi-normal variates and presented an application in genetics. Kwagyan, Apprey and Bonney (kwagyan1) presented the idea of Gram–Schmidt parameter orthogonalization in genetic analysis. A major consequence of parameter orthogonalization is that the maximum likelihood estimate of one parameter is unaffected by, or changes only slowly with, misspecification of the other. The most directly interpretable consequence is that the maximum likelihood estimates of orthogonal coordinates are asymptotically independent.

The remainder of this paper is organized as follows. In Section 2, the disposition model for clustered binary data proposed by Bonney (bonney1, bonney2) and further investigated by Kwagyan (kwagyan) is introduced and used as a motivation. In Section 3, an overview of maximum likelihood estimation and parameter orthogonalization is presented. Section 4 develops the proposed Gram–Schmidt parameter orthogonalization and iterative scheme for maximum likelihood estimation. In Section 5, the fitting algorithm of the disposition model is discussed with application to oesophageal cancer data in Chinese families. Finally, the closing section wraps up with a discussion of the methodology and outlines directions for further research.

2. Model for correlated binary data

Consider a sample of $N$ groups, each of size $n_{i},$ $i = 1, \dots, N;$ and let $Y_{i} = {(Y_{i 1}, \dots, Y_{i n_{i}})}^{T}$ denote the vector of binary outcomes for the $i$th group. It is postulated that the outcomes within a group are possibly correlated. Further suppose the $j$th subject in the $i$th group has $1 \times p$ individual-specific covariates $X_{i j} = (x_{i j 1}, \dots, x_{i j p})$ and let the $i$th group have group-specific covariates $Z_{i} = (z_{1}, \dots, z_{q})$. Let $δ_{i j}$ denote the conditional probability of $Y_{i j} = 1$ given $Y_{i j^{'}} = 1$; that is, $δ_{i j} = P (Y_{i j} = 1 | Y_{i j^{'}} = 1), j \neq j^{'}, j, j^{'} = 1, 2, \dots, n_{i} .$

We call $δ_{i j}$ the individual disposition, which is simply interpreted as the probability of the outcome on one unit given that another unit from the same cluster has the attribute. This in essence is an indication of aggregation. Assume further that, within the group, any pair of observations satisfies the relation $\frac{P (Y_{i j} = 1, Y_{i j^{'}} = 1)}{P (Y_{i j} = 1) P (Y_{i j^{'}} = 1)} = \frac{1}{α_{i}}, α_{i} > 0, j \neq j^{'}, j, j^{'} = 1, 2, \dots, n_{i},$

where $α_{i}$ is common for all pairs. Clearly, $α_{i} = 1$ implies independence of the observations. Thus, $α_{i}$ is a measure of the departure from independence and is called the relative disposition. With the above definitions, Bonney (bonney1, bonney2), with further investigation through a latent mixture formulation by Kwagyan (kwagyan), has shown that the joint distribution for the $N$ groups can be based on $L (Θ) = \prod_{i = 1}^{N} \{(1 - α_{i}) \prod_{j = 1}^{n_{i}} (1 - y_{i j}) + α_{i} \prod_{j = 1}^{n_{i}} δ_{i j}^{y_{i j}} {(1 - δ_{i j})}^{1 - y_{i j}}\}$ (1)

The $α_{i}$ and $δ_{i j}$ are generally modelled as $α_{i} = \frac{1 + e^{- [M (Z_{i}) + D (Z_{i})]}}{1 + e^{- M (Z_{i})}}, \quad δ_{i j} = \frac{1}{1 + e^{- [M (Z_{i}) + D (Z_{i}) + W (X_{i j})]}}$

where $M (Z_{i})$ is a function of the mean effects, $D (Z_{i})$ is a measure of within-group dependence and $W (X_{i j})$ is a function describing the effect of the individual covariates. Typically, $M (Z_{i}), D (Z_{i})$ and $W (X_{i j})$ are modelled, respectively, as $\begin{matrix} M (Z_{i}) & = λ_{0} + λ_{1} Z_{1} + \dots + λ_{q} Z_{q} \\ D (Z_{i}) & = γ_{0} + γ_{1} Z_{1} + \dots + γ_{q} Z_{q} \\ W (X_{i j}) & = β_{1} X_{1} + β_{2} X_{2} + \dots + β_{r} X_{r} \end{matrix}$

where $Θ^{'} = \{Λ, Γ, β\}$

and $Λ = \{λ_{0}, \dots, λ_{q}\}, Γ = \{γ_{0}, \dots, γ_{q}\}, β = \{β_{1}, \dots, β_{r}\}$

are unknown parameters to be estimated. When data consist of a few large clusters or truncated or size-biased samples, Bonney (bonney3) has shown that there is little or no information to separate the effects of M(Z) and D(Z). The estimates of the parameters tend to be highly correlated. Similar problems occur in other approaches as well. The popular estimating equations approach, GEE (Liang & Zeger, liang), suffers the same problems when there are only a few large clusters (see discussion of Prentice, prentice).
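To make the structure of Equation (1) concrete, the following is a minimal sketch of the likelihood contribution of a single group. The function names, and the choice to pass $M (Z_{i})$, $D (Z_{i})$ and $W (X_{i j})$ as already-evaluated numbers, are assumptions made for illustration; this is not the authors' software.

```python
import math


def relative_disposition(M, D):
    """alpha_i = (1 + exp(-(M + D))) / (1 + exp(-M))."""
    return (1.0 + math.exp(-(M + D))) / (1.0 + math.exp(-M))


def group_likelihood(y, M, D, W):
    """Contribution of one group to the likelihood in Equation (1).

    y : list of 0/1 outcomes for the group
    M : value of M(Z_i), D : value of D(Z_i)
    W : list with the value of W(X_ij) for each member of the group
    """
    alpha = relative_disposition(M, D)
    # prod_j (1 - y_ij) equals 1 only when nobody in the group is affected.
    all_unaffected = 1.0 if all(v == 0 for v in y) else 0.0
    l0 = 1.0
    for y_ij, w_ij in zip(y, W):
        delta = 1.0 / (1.0 + math.exp(-(M + D + w_ij)))  # individual disposition
        l0 *= delta if y_ij == 1 else (1.0 - delta)
    return (1.0 - alpha) * all_unaffected + alpha * l0
```

Under these assumptions the probabilities of all outcome patterns of a group sum to one, and the ratio of the joint probability of two affected members to the product of their marginal probabilities is $1 / α_{i}$, as the definition of the relative disposition requires.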

3. Maximum likelihood estimation & parameter orthogonalization

There are different statistical methods for estimating parameters, but the approach most commonly used is that based on maximum likelihood estimation. Maximum likelihood estimates of parameters are those values of the parameters that make the likelihood function a maximum. Let $y = (y_{1}, y_{2}, \dots, y_{n})$ be observed data from a population with a probability distribution $P (y; θ)$ indexed by the vector of unknown parameters $θ = (θ_{1}, \dots, θ_{p})$ . The likelihood function for $θ$ is the joint distribution $L (θ; y) = \prod_{i = 1}^{n} P (y_{i}; θ)$

According to the likelihood principle, $\hat{θ}$ is regarded as the value of $θ$ which is most strongly supported by the observed data. In practice, $\hat{θ}$ is obtained by direct maximization of $\log L (θ)$ or by solving the set of equations from the score function, $U (θ),$ where $U (θ) = \frac{\partial \log L}{\partial θ} = 0$

for which the Hessian matrix $H (θ) = (\frac{\partial^{2} \log L}{\partial θ \partial θ^{T}})$ is negative definite. The matrix $I (θ) = E [- H (θ)] = E (- \frac{\partial^{2} \log L}{\partial θ \partial θ^{T}})$ is called the Fisher (expected) information matrix for $θ$, and its inverse, $Ω (θ) = {[I (θ)]}^{- 1},$ gives the asymptotic variances and covariances of the maximum likelihood estimates.

Suppose $θ$ is a p parameter vector that is partitioned into two vectors $θ_{1}$ and $θ_{2}$ of lengths $p_{1}$ and $p_{2}$ , respectively, $p_{1} + p_{2} = p$ . Cox and Reid (cox) define $θ_{1}$ to be orthogonal to $θ_{2}$ if the elements of the information matrix satisfy $E (\frac{\partial l}{\partial θ_{s}} \frac{\partial l}{\partial θ_{t}}) = E (- \frac{\partial^{2} l}{\partial θ_{s} \partial θ_{t}}) = 0, s = 1, \dots, p_{1}; t = p_{1} + 1, \dots, p$

If this holds for all $θ$ in the parameter space, it is sometimes called global orthogonality. If it holds at only one parameter value say $θ^{0}$ , then $θ_{1}$ and $θ_{2}$ are said to be locally orthogonal at $θ^{0}$ .

We now discuss the Cox and Reid (cox) approach to the construction of orthogonal parameters. Suppose that initially the likelihood is specified in terms of $(θ, γ)$, $γ = (γ_{1}, \dots, γ_{p}),$ and let $l (θ, γ)$ be the log-likelihood function. Assume $θ$ is the parameter of interest and $γ$ the set of nuisance parameters. We seek a transformation from $(θ, γ)$ to a new set of orthogonal parameters $(θ, λ)$, $λ = (λ_{1}, \dots, λ_{p})$. It is easiest to think of the original parameters $γ$ as some function of the new parameters, $λ$, that is $γ = γ (θ, λ)$

Then, the log-likelihood function can be expressed as $l^{*} (θ, λ) = l (θ, γ_{1} (θ, λ), \dots, γ_{p} (θ, λ)),$

where now $l^{*}$ refers to the log-likelihood in the new parametrization. Taking derivatives of this equation with respect to the new parameters, we have by the chain rule; $\frac{\partial l^{*}}{\partial θ} = \frac{\partial l}{\partial θ} + \sum_{j = 1}^{p} \frac{\partial l}{\partial γ_{j}} \frac{\partial γ_{j}}{\partial θ}$

and $\frac{\partial^{2} l^{*}}{\partial θ \partial λ_{k}} = \sum_{l = 1}^{p} \frac{\partial^{2} l}{\partial θ \partial γ_{l}} \frac{\partial γ_{l}}{\partial λ_{k}} + \sum_{j = 1}^{p} \sum_{l = 1}^{p} \frac{\partial^{2} l}{\partial γ_{j} \partial γ_{l}} \frac{\partial γ_{l}}{\partial λ_{k}} \frac{\partial γ_{j}}{\partial θ} + \sum_{l = 1}^{p} \frac{\partial l}{\partial γ_{l}} \frac{\partial^{2} γ_{l}}{\partial θ \partial λ_{k}}$ (2)

The derivatives of $γ$ are not functions of the data, and hence are constant with respect to expectation. Therefore, after taking expectations with respect to the distribution of the data indexed by the parameters $(θ, γ)$, the last term in Equation (2) is zero, because the score $\partial l / \partial γ_{l}$ has zero expectation. If the parameters are orthogonal, then again from Equation (2), $\sum_{l = 1}^{p} \frac{\partial γ_{l}}{\partial λ_{k}} (E \{\frac{\partial^{2} l}{\partial θ \partial γ_{l}}\} + \sum_{j = 1}^{p} E \{\frac{\partial^{2} l}{\partial γ_{l} \partial γ_{j}}\} \frac{\partial γ_{j}}{\partial θ}) = 0$,

so that the orthogonality equations are $\sum_{l = 1}^{p} \frac{\partial γ_{l}}{\partial λ_{k}} (E \{\frac{\partial^{2} l}{\partial θ \partial γ_{l}}\} + \sum_{j = 1}^{p} E \{\frac{\partial^{2} l}{\partial γ_{l} \partial γ_{j}}\} \frac{\partial γ_{j}}{\partial θ}) = 0, k = 1, \dots, p .$

We require the transformation from $(θ, γ)$ to $(θ, λ)$ to have a non-zero Jacobian; hence, $\sum_{j = 1}^{p} E \{\frac{\partial^{2} l}{\partial γ_{j} \partial γ_{k}}\} \frac{\partial γ_{j}}{\partial θ} = - E \{\frac{\partial^{2} l}{\partial θ \partial γ_{k}}\}, k = 1, \dots, p .$

This is a system of $p$ differential equations which must be solved for $γ (θ, λ)$. In fact, since $λ$ does not enter explicitly into the equations, the solution for $γ_{j}$ can contain an arbitrary function of $λ$ as the integrating constant. It is noted in the discussion of Cox and Reid that, although it is sometimes theoretically possible to find a differential equation, simple explicit solutions of the differential equation are not feasible for some models. There are also cases where explicit solutions are possible, but the original nuisance parameters cannot be explicitly expressed in terms of the orthogonal parameters (Hills, hills). In general, global orthogonalization cannot be achieved by this approach either.

4. Gram–Schmidt parameter orthogonalization

The Gram–Schmidt orthogonalization process (Clayton, clayton) is equivalent to the linear transformation from $Θ = (θ_{1}, \dots, θ_{p})$ to $Θ^{*} = (θ_{1}^{*}, \dots, θ_{p}^{*})$ defined as: $\begin{matrix} θ_{1}^{*} & = θ_{1} \\ θ_{2}^{*} & = θ_{2} - b_{21} θ_{1} \\ & \vdots \\ θ_{j}^{*} & = θ_{j} - \sum_{k = 1}^{j - 1} b_{j (j - k)} θ_{j - k}, j = 2, \dots, p \end{matrix}$ (3)

In a linear regression set-up, $θ_{j}^{*}$ is $θ_{j}$ adjusted for $θ_{1}, \dots, θ_{j - 1},$ and $b_{j k},$ $j = 2, \dots, p;$ $k = 1, \dots, (j - 1)$ are the multiple regression coefficients. Writing this in matrix notation, we have $Θ^{*} = B Θ$

where $B$ , the transformation matrix, is lower triangular with ones along the diagonal. The transformation matrix is chosen so that $θ_{1}^{*}, θ_{2}^{*}, \dots, θ_{p}^{*}$ are mutually uncorrelated. The Jacobian of the transformation is unity. Suppose $Ω$ is the covariance matrix of $Θ$ , then the covariance matrix of $Θ^{*},$ $Ω^{*} = B Ω B^{T}$ is a diagonal matrix. We recommend that the parameters be ordered in terms of interest. This ensures that the parameters of most interest are least affected by round-off errors.

4.1. Evaluation of the transformation matrix

To evaluate the transformation matrix, $B$, let $Ω = (c_{i j}),$ $i, j = 1, \dots, p,$ where $c_{i j} = cov (θ_{i}, θ_{j})$ is the $(i, j)$th entry of the inverse of the expected information matrix $E (- \frac{\partial^{2} \log L}{\partial θ \partial θ^{T}})$, so that the $c_{i j}$ are the elements of the covariance matrix. The orthogonality relation, $c_{i j}^{*} = cov (θ_{i}^{*}, θ_{j}^{*}) = 0$ $(i \neq j),$ implies that for $i < j$, $\begin{matrix} 0 & = c_{i j}^{*} = cov (θ_{i}^{*}, θ_{j}^{*}) \\ & = cov (θ_{i} - \sum_{k^{'} = 1}^{i - 1} b_{i (i - k^{'})} θ_{i - k^{'}}, θ_{j} - \sum_{k = 1}^{j - 1} b_{j (j - k)} θ_{j - k}) \\ & = cov (θ_{i}, θ_{j}) - \sum_{k = 1}^{j - 1} b_{j (j - k)} cov (θ_{i}, θ_{j - k}) - \sum_{k^{'} = 1}^{i - 1} b_{i (i - k^{'})} \{cov (θ_{i - k^{'}}, θ_{j}) - \sum_{k = 1}^{j - 1} b_{j (j - k)} cov (θ_{i - k^{'}}, θ_{j - k})\} \\ & = c_{i j} - \sum_{k = 1}^{j - 1} b_{j (j - k)} c_{i, j - k} - \sum_{k^{'} = 1}^{i - 1} b_{i (i - k^{'})} \{c_{i - k^{'}, j} - \sum_{k = 1}^{j - 1} b_{j (j - k)} c_{i - k^{'}, j - k}\} \\ & = c_{i j} - \sum_{k = 1}^{j - 1} b_{j (j - k)} c_{i, j - k} - \sum_{k^{'} = 1}^{i - 1} b_{i (i - k^{'})} cov (θ_{i - k^{'}}, θ_{j}^{*}) \\ & = c_{i j} - \sum_{k = 1}^{j - 1} b_{j (j - k)} c_{i, j - k}, \text{ since } cov (θ_{i - k^{'}}, θ_{j}^{*}) = 0 \end{matrix}$

Thus, the system of linear equations to be solved for the entries of the transformation matrix, based on the covariance matrix, is: $c_{i j} = \sum_{k = 1}^{j - 1} b_{j (j - k)} c_{i (j - k)}, i = 1, \dots, (j - 1), j = 2, \dots, p .$ (4)

Consequently, solving the system of linear equations (4), we obtain the elements of the transformation matrix as $b_{j k} = \sum_{r = 1}^{j - 1} c_{r j} I_{r (j - k)}, j = 2, \dots, p; k = 1, \dots, (j - 1),$ (5)

where $I_{r (j - k)}$ are entries of the information matrix. If the expected information is difficult to evaluate, the observed information matrix can be used as an approximation.
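As a numerical illustration, the sketch below builds the transformation matrix row by row by solving the linear system (4) directly, rather than through a closed-form expression, and then forms $B Ω B^{T}$ so that its off-diagonal entries can be inspected. The helper names (`solve`, `transformation_matrix`, `congruence`) and the use of plain Python lists are assumptions made for illustration only.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def transformation_matrix(C):
    """Unit lower-triangular B such that B C B^T is diagonal.

    For each row j, the coefficients b_jk (k < j) solve
    sum_k b_jk * c_ik = c_ij for all i < j, i.e. Equation (4).
    B stores -b_jk below the diagonal, as in the matrix form Theta* = B Theta.
    """
    p = len(C)
    B = [[1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]
    for j in range(1, p):
        D = [[C[i][k] for k in range(j)] for i in range(j)]  # covariance of the first j parameters
        rhs = [C[i][j] for i in range(j)]
        coeffs = solve(D, rhs)
        for k in range(j):
            B[j][k] = -coeffs[k]
    return B


def congruence(B, C):
    """Return the matrix product B C B^T."""
    p = len(C)
    BC = [[sum(B[i][k] * C[k][j] for k in range(p)) for j in range(p)] for i in range(p)]
    return [[sum(BC[i][k] * B[j][k] for k in range(p)) for j in range(p)] for i in range(p)]
```

For a symmetric positive definite covariance matrix, each sub-system has a unique solution, and the product returned by `congruence` should be diagonal up to rounding error.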

For illustration, when $p = 4$ , we have Gram–Schmidt orthogonalization process in matrix notation as, $(\begin{matrix} θ_{1}^{*} \\ θ_{2}^{*} \\ θ_{3}^{*} \\ θ_{4}^{*} \end{matrix}) = (\begin{matrix} 1 & 0 & 0 & 0 \\ - b_{21} & 1 & 0 & 0 \\ - b_{31} & - b_{32} & 1 & 0 \\ - b_{41} & - b_{42} & - b_{43} & 1 \end{matrix}) (\begin{matrix} θ_{1} \\ θ_{2} \\ θ_{3} \\ θ_{4} \end{matrix})$

The orthogonalization relationship of the parameters $(θ_{1}^{*}, θ_{2}^{*}, θ_{3}^{*}, θ_{4}^{*})$ implies $\begin{matrix} 0 & = c_{12} - b_{21} c_{11} \\ 0 & = c_{13} - b_{32} c_{12} - b_{31} c_{11} \\ 0 & = c_{23} - b_{32} c_{22} - b_{31} c_{12} \\ 0 & = c_{14} - b_{43} c_{13} - b_{42} c_{12} - b_{41} c_{11} \\ 0 & = c_{24} - b_{43} c_{23} - b_{42} c_{22} - b_{41} c_{12} \\ 0 & = c_{34} - b_{43} c_{33} - b_{42} c_{23} - b_{41} c_{13} \end{matrix}$

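And so the system of linear equations to be solved for the elements of the transformation matrix is: $(\begin{matrix} c_{11} & 0 & 0 & 0 & 0 & 0 \\ 0 & c_{11} & c_{12} & 0 & 0 & 0 \\ 0 & c_{12} & c_{22} & 0 & 0 & 0 \\ 0 & 0 & 0 & c_{11} & c_{12} & c_{13} \\ 0 & 0 & 0 & c_{12} & c_{22} & c_{23} \\ 0 & 0 & 0 & c_{13} & c_{23} & c_{33} \end{matrix}) (\begin{matrix} b_{21} \\ b_{31} \\ b_{32} \\ b_{41} \\ b_{42} \\ b_{43} \end{matrix}) = (\begin{matrix} c_{12} \\ c_{13} \\ c_{23} \\ c_{14} \\ c_{24} \\ c_{34} \end{matrix})$

Let $Q$ be the matrix of coefficients for the system of equations above; then, we note that $Q$ is a patterned block diagonal matrix. Furthermore, let $D_{r} (r = 1, 2, 3)$ be the diagonal blocks of $Q$; then, in this illustration, $Q = (\begin{matrix} D_{1} & 0 & 0 \\ 0 & D_{2} & 0 \\ 0 & 0 & D_{3} \end{matrix})$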

where $D_{1} = (c_{11}), D_{2} = (\begin{matrix} c_{11} & c_{12} \\ c_{12} & c_{22} \end{matrix}), D_{3} = (\begin{matrix} c_{11} & c_{12} & c_{13} \\ c_{12} & c_{22} & c_{23} \\ c_{13} & c_{23} & c_{33} \end{matrix})$

Thus, $D_{r}$ is the covariance matrix of the first $r$ parameters. It can easily be shown that $D_{r}$ is symmetric and positive definite and so $Q$ is non-singular. A unique solution thus exists for this system of linear equations and, in general, for Equation (4). And so, the elements of the transformation matrix, $B$, can be obtained as in Equation (5),

where $I_{i j}$ are the entries of the information matrix.

Having obtained $Θ^{*} = (θ_{1}^{*}, \dots, θ_{p}^{*})$ the original parameters, $Θ = (θ_{1}, \dots, θ_{p})$ can be obtained recursively as $\begin{matrix} θ_{1} & = θ_{1}^{*} \\ θ_{j} & = θ_{j}^{*} + \sum_{k = 1}^{j - 1} b_{j (j - k)} θ_{j - k}, j = 2, \dots, p \end{matrix}$

or writing this using matrix notation, we have $Θ = B^{- 1} Θ^{*}$
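The forward transformation of Equation (3) and the recursive recovery of the original parameters can be sketched as follows. The function names and the convention of storing the coefficients $b_{j k}$ in a nested list `b` (0-based, with only the entries `b[j][k]` for `k < j` used) are assumptions made for illustration.

```python
def to_orthogonal(theta, b):
    """theta*_j = theta_j - sum_{k<j} b[j][k] * theta_k, as in Equation (3)."""
    return [theta[j] - sum(b[j][k] * theta[k] for k in range(j))
            for j in range(len(theta))]


def to_original(theta_star, b):
    """Recover theta recursively: theta_j = theta*_j + sum_{k<j} b[j][k] * theta_k."""
    theta = []
    for j, value in enumerate(theta_star):
        # theta[k] for k < j has already been recovered at this point.
        theta.append(value + sum(b[j][k] * theta[k] for k in range(j)))
    return theta
```

Because the transformation matrix is unit lower triangular, the recursion needs no matrix inversion: each original parameter depends only on its own orthogonal counterpart and on the original parameters already recovered.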

4.2. Block orthogonalization

Let $Θ^{T} = \{Λ, Γ, β\}$ be a set of parameters to be estimated where $Λ = {(λ_{0}, λ_{1}, \dots, λ_{q})}^{T}; Γ = {(γ_{0}, γ_{1}, \dots, γ_{p})}^{T}; β = {(β_{1}, β_{2}, \dots, β_{r})}^{T}$

Suppose further that the parameter vectors in $Θ^{T} = \{Λ, Γ, β\}$ are correlated. Then, we wish to find a new set $Θ^{* T} = \{Λ^{*}, Γ^{*}, β^{*}\}$ through a linear transformation such that the parameter vectors in $Θ^{*}$ are mutually uncorrelated. We allow for correlation, if any, within each set of parameters in $Θ^{*}$.

Further suppose the vector $β$ is the set of parameters of interest. Then, the Gram–Schmidt orthogonalization process computes the new set of parameters in terms of the original parameters as follows: $\begin{matrix} β^{*} & = β \\ Λ^{*} & = Λ - B_{21} β \\ Γ^{*} & = Γ - B_{31} β - B_{32} Λ \end{matrix}$

where $B_{21},$ $B_{31}$ and $B_{32}$ are matrices with dimensions $q \times r$, $p \times r$ and $p \times q$, respectively.

In matrix notation, we write $(\begin{matrix} β^{*} \\ Λ^{*} \\ Γ^{*} \end{matrix}) = (\begin{matrix} 1 & 0 & 0 \\ - B_{21} & 1 & 0 \\ - B_{31} & - B_{32} & 1 \end{matrix}) (\begin{matrix} β \\ Λ \\ Γ \end{matrix})$

or $Θ^{*} = B Θ$

where $B$, the transformation matrix, is a lower triangular block matrix with unit matrices along the diagonal, chosen such that $Λ^{*}, Γ^{*}, β^{*}$ are mutually uncorrelated, so that the covariance matrix of $Θ^{*}$ is block diagonal. The Jacobian of the transformation is unity. The only way in which this procedure could break down would be if one of the vectors of $Θ^{*}$ were identically zero. From the method of formation of $Θ^{*}$, it is clear that each of its vectors is a linear combination of the vectors in $Θ$. If one of them were identically zero, this would mean that the vectors in $Θ$ are linearly dependent, thus contradicting the assumption of independence. Extensions to more than three sets of vectors of parameters readily follow.
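The block coefficients can be obtained in the same way as in the scalar case. The following worked equations are a sketch under assumed notation: $Ω_{a b}$ denotes the covariance block between parameter vectors $a$ and $b$, and $G_{β}$, $G_{Λ}$ are hypothetical labels for the two coefficient blocks multiplying $β$ and $Λ$ in the third row of the transformation.

```latex
% Second row: \Lambda^{*} = \Lambda - B_{21}\beta must be uncorrelated with \beta.
\mathrm{cov}(\Lambda^{*}, \beta)
  = \Omega_{\Lambda\beta} - B_{21}\,\Omega_{\beta\beta} = 0
  \quad\Longrightarrow\quad
  B_{21} = \Omega_{\Lambda\beta}\,\Omega_{\beta\beta}^{-1}.

% Third row: \Gamma^{*} = \Gamma - G_{\beta}\beta - G_{\Lambda}\Lambda must be
% uncorrelated with both \beta and \Lambda, which gives one linear system.
\begin{pmatrix} G_{\beta} & G_{\Lambda} \end{pmatrix}
\begin{pmatrix}
  \Omega_{\beta\beta}   & \Omega_{\beta\Lambda} \\
  \Omega_{\Lambda\beta} & \Omega_{\Lambda\Lambda}
\end{pmatrix}
=
\begin{pmatrix} \Omega_{\Gamma\beta} & \Omega_{\Gamma\Lambda} \end{pmatrix}.
```

The coefficient matrix in the second system is the joint covariance matrix of $(β, Λ)$, which plays the role that $D_{r}$ plays in the scalar case.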

4.3. Gram–Schmidt–Fisher scoring algorithm

We introduce a modification of the Fisher scoring algorithm that incorporates the Gram–Schmidt process to obtain an information matrix which is diagonal, thus ensuring the approximate (near) orthogonality and consequently the stability of the estimates of the new parameters. Since the transformation is linear and injective, the original parameters are readily obtained.

Let $l (θ; y)$ be the log-likelihood in the original parametrization; then, the Fisher scoring algorithm is given by the iterative routine $θ_{m + 1} = θ_{m} + {[I (θ_{m})]}^{- 1} U (θ_{m})$ (6)

where $I (θ)$ is the expected information matrix and $U (θ)$ is the score function.

Suppose $θ$ is transformed to $θ^{*}$ through the Gram–Schmidt orthogonalization process. Let $B = (b_{i j})$ be the matrix of transformation, with elements given by Equation (5). Then, since $B$ is square and non-singular, we can write from Equation (6): $\begin{matrix} B θ_{m + 1} & = B θ_{m} + B {[I (θ_{m})]}^{- 1} [B^{T} {(B^{T})}^{- 1}] U (θ_{m}) \\ B θ_{m + 1} & = B θ_{m} + \{B {[I (θ_{m})]}^{- 1} B^{T}\} [{(B^{T})}^{- 1} U (θ_{m})] \\ θ_{m + 1}^{*} & = θ_{m}^{*} + {[I (θ_{m}^{*})]}^{- 1} U (θ_{m}^{*}) \end{matrix}$ (7)

where $\begin{matrix} θ_{m}^{*} & = B θ_{m} \\ {[I (θ_{m}^{*})]}^{- 1} & = B {[I (θ_{m})]}^{- 1} B^{T} \text{ is asymptotically diagonal} \\ U (θ_{m}^{*}) & = {(B^{T})}^{- 1} U (θ_{m}) \end{matrix}$

We shall call Equation (7) the Gram–Schmidt–Fisher scoring algorithm.

4.3.1. Algorithmic process

The proposed Gram–Schmidt–Fisher scoring algorithm is a two-stage iterative process that alternates between Equations (5) and (7) until convergence, and proceeds as follows:

1. Start with an initial estimate of the original parameter, $θ$, denoted by $θ^{(0)}$.
2. Estimate $B$ at $θ^{(0)}$ from Equation (5), that is, $b_{j k} = \sum_{r = 1}^{j - 1} c_{r j} I_{r (j - k)}, j = 2, \dots, p; k = 1, \dots, (j - 1)$.
3. Determine the initial estimate of the orthogonal parametrization, $θ^{* (0)}$, from Equation (3), i.e. $θ_{1}^{* (0)} = θ_{1}^{(0)}$, $θ_{j}^{* (0)} = θ_{j}^{(0)} - \sum_{k = 1}^{j - 1} b_{j (j - k)} θ_{j - k}^{(0)}, j = 2, \dots, p$.
4. Update $θ^{*}$ to obtain a new value $θ^{* (1)}$ from Equation (7), that is, $θ^{* (1)} = θ^{* (0)} + {[I (θ^{* (0)})]}^{- 1} U (θ^{* (0)})$.
5. Update $θ$ to obtain a new value $θ^{(1)}$, recursively, as $θ^{(1)} = B^{- 1} θ^{* (1)}$.
6. Repeat steps 2 through 5 using $θ^{* (1)}$ to obtain $θ^{* (2)}$.
7. Stop when $\| θ^{* (n - 1)} - θ^{* (n)} \| < ε$.

4.4. Example

We consider a sample problem which concerns inference about the difference between two exponential means. Let $Y_{1}$ and $Y_{2}$ be independent exponential random variables with means $ϕ$ and $(ϕ + ψ)$, respectively. Then, the joint distribution is given by the function $f (y_{1}, y_{2} | ϕ, ψ) = \frac{1}{ϕ (ϕ + ψ)} \exp (- [\frac{y_{1}}{ϕ} + \frac{y_{2}}{ϕ + ψ}])$

The score vector is $U (ϕ, ψ) = (\begin{matrix} \frac{\partial l}{\partial ϕ} \\ \frac{\partial l}{\partial ψ} \end{matrix}) = (\begin{matrix} - \frac{n}{ϕ} - \frac{n}{ϕ + ψ} + \frac{\sum_{i = 1}^{n} y_{1 i}}{ϕ^{2}} + \frac{\sum_{i = 1}^{n} y_{2 i}}{{(ϕ + ψ)}^{2}} \\ - \frac{n}{ϕ + ψ} + \frac{\sum_{i = 1}^{n} y_{2 i}}{{(ϕ + ψ)}^{2}} \end{matrix})$

The information matrix, with the parameters ordered as $(ψ, ϕ)$, is $I_{ψ ϕ} = (\begin{matrix} i_{ψ ψ} & i_{ψ ϕ} \\ i_{ψ ϕ} & i_{ϕ ϕ} \end{matrix}) = (\begin{matrix} \frac{n}{{(ϕ + ψ)}^{2}} & \frac{n}{{(ϕ + ψ)}^{2}} \\ \frac{n}{{(ϕ + ψ)}^{2}} & \frac{n}{ϕ^{2}} + \frac{n}{{(ϕ + ψ)}^{2}} \end{matrix})$

and its inverse, the variance–covariance matrix, is $I^{ψ ϕ} = (\begin{matrix} i^{ψ ψ} & i^{ψ ϕ} \\ i^{ψ ϕ} & i^{ϕ ϕ} \end{matrix}) = (\begin{matrix} \frac{1}{n} (2 ϕ^{2} + 2 ϕ ψ + ψ^{2}) & - \frac{1}{n} ϕ^{2} \\ - \frac{1}{n} ϕ^{2} & \frac{1}{n} ϕ^{2} \end{matrix})$

4.4.1. Cox and Reid approach

Orthogonal parametrization, following the Cox–Reid method, requires solving the differential equation $\{\frac{1}{{(ϕ + ψ)}^{2}} + \frac{1}{ϕ^{2}}\} \frac{\partial ϕ}{\partial ψ} = - \frac{1}{{(ϕ + ψ)}^{2}}$

This can be solved by separation of variables, leading to $a (λ) = ϕ (ψ + ϕ) / (ψ + 2 ϕ),$ where $a (λ)$ is an arbitrary function of $λ .$ Cox–Reid suggest setting $a (λ) = e^{λ}$ as a suitable choice. Clearly, this does not produce a unique solution for $ϕ$ , regardless of the parametrization of $a (λ) .$ Thus, different reparametrizations may lead to different modified likelihoods, so that a Cox–Reid estimator may not exist or there may be many of them.

4.4.2. Gram–Schmidt approach

The proposed Gram–Schmidt approach requires seeking a transformation $(ψ, ϕ)$ to $(ψ, λ (ψ, ϕ))$ such that $λ = ϕ - b_{21} ψ$ and solving the equation $i^{ϕ ψ} = b_{21} i^{ψ ψ}$

From the variance–covariance matrix, we obtain $b_{21} (ϕ, ψ) = - \frac{ϕ^{2}}{(2 ϕ^{2} + 2 ϕ ψ + ψ^{2})} ⟹ λ = ϕ + [\frac{ϕ^{2}}{(2 ϕ^{2} + 2 ϕ ψ + ψ^{2})}] ψ$

As with the Cox–Reid approach, it is impractical to express $ϕ$ uniquely in terms of $ψ$ and $λ$ and subsequently to formulate a modified approximate orthogonal joint distribution function. Therefore, one could proceed iteratively, using the proposed Gram–Schmidt–Fisher scoring algorithm.
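The iterative route can be sketched for this two-parameter example as follows. The code follows the steps of Section 4.3.1 with the parameters ordered as $(ψ, ϕ)$; the function name, the starting values and the stopping rule are assumptions made for illustration, and the sketch is not the authors' implementation.

```python
def gsf_exponential(y1, y2, phi, psi, tol=1e-10, max_iter=50):
    """Gram-Schmidt-Fisher scoring for two exponential samples of equal size
    with means phi and phi + psi; parameters are ordered (psi, phi)."""
    n = len(y1)  # assumes len(y1) == len(y2)
    s1, s2 = sum(y1), sum(y2)
    for _ in range(max_iter):
        mu2 = phi + psi
        # Score vector in the original parametrization.
        u_psi = -n / mu2 + s2 / mu2 ** 2
        u_phi = -n / phi + s1 / phi ** 2 + u_psi
        # Asymptotic covariance matrix (inverse expected information).
        c11 = (phi ** 2 + mu2 ** 2) / n   # var(psi)
        c12 = -phi ** 2 / n               # cov(psi, phi)
        c22 = phi ** 2 / n                # var(phi)
        # Transformation coefficient: solves c12 = b21 * c11.
        b21 = c12 / c11
        # Orthogonal parameters: psi* = psi, lam = phi - b21 * psi.
        psi_star, lam = psi, phi - b21 * psi
        # Transformed score (B^T)^{-1} U and the diagonal of B C B^T.
        u1, u2 = u_psi + b21 * u_phi, u_phi
        v1, v2 = c11, c22 - b21 * c12
        # Scoring update in the orthogonal parametrization.
        new_psi_star, new_lam = psi_star + v1 * u1, lam + v2 * u2
        step = max(abs(new_psi_star - psi_star), abs(new_lam - lam))
        # Back-transform to the original parameters with the same B.
        psi = new_psi_star
        phi = new_lam + b21 * psi
        if step < tol:
            break
    return phi, psi
```

For example, `gsf_exponential([1.0, 2.0, 3.0], [4.0, 6.0, 8.0], phi=1.0, psi=1.0)` should return values close to the sample mean of the first sample and the difference of the two sample means. In this particular model the scoring step lands on those values almost immediately, so the example illustrates the bookkeeping of the algorithm rather than any gain in speed.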

5. Fitting the disposition model

We now present procedures for estimating the parameters in the correlated model of Equation (1). The calculations are standard, and so we only outline the results. We write the likelihood of the joint distribution of the $i$th group as $L_{i} (Y_{i}; θ) = (1 - α_{i}) \prod_{j = 1}^{n_{i}} (1 - y_{i j}) + α_{i} L_{0 i} (θ | y_{i})$

where $\begin{matrix} L_{0 i} (θ | y_{i}) & = \prod_{j = 1}^{n_{i}} δ_{i j}^{y_{i j}} {(1 - δ_{i j})}^{1 - y_{i j}} \\ δ_{i j} (θ) & = \frac{1}{1 + \exp \{- [M (Z_{i}) + D (Z_{i}) + W (X_{i j})]\}} \\ α_{i} (θ) & = \frac{1 + \exp \{- [M (Z_{i}) + D (Z_{i})]\}}{1 + \exp [- M (Z_{i})]} \end{matrix}$

Let $Υ_{i j} (θ) = M (Z_{i}) + D (Z_{i}) + W (X_{i j})$

and define the following: $Υ_{i j}^{(1)} = \frac{\partial}{\partial θ} Υ_{i j} (θ)$ and $Υ_{i j}^{(2)} = \frac{\partial^{2}}{\partial θ^{T} \partial θ} Υ_{i j} (θ)$

The contribution of the i-th group to the log-likelihood is the term $log L_{i} (θ | y_{i}) = log [\{1 - α (θ)\} \prod (1 - y_{i j}) + α (θ) L_{0 i} (θ | y_{i})]$

and the contribution to the score function is the term $U_{i} = \frac{\partial}{\partial θ} \log L_{i} (θ) = A_{i} (θ) α_{i}^{*} + D_{i} (θ) U_{0 i} (θ | y),$

where $\begin{matrix} α_{i}^{*} (θ | y) & = \frac{\partial}{\partial θ} \log α_{i} (θ) = δ_{i 0} (1 - α_{i}) \frac{\partial}{\partial θ} M (Z_{i}) - (1 - δ_{i 0}) \frac{\partial}{\partial θ} D (Z_{i}) \\ δ_{i 0} & = \frac{1}{1 + \exp \{- [M (Z_{i}) + D (Z_{i})]\}} \\ A_{i} (θ) & = α_{i} (θ) [L_{0 i} (θ | y) - \prod_{j = 1}^{n_{i}} (1 - y_{i j})] / L_{i}, \\ D_{i} (θ) & = \frac{α_{i} L_{0 i}}{L_{i}} \\ U_{0 i} (θ | y_{i}) & = \frac{\partial}{\partial θ} \log L_{0 i} (θ | y) = \sum_{j = 1}^{n_{i}} (y_{i j} - δ_{i j}) Υ_{i j}^{(1)} \end{matrix}$

Setting $U (θ) = \sum_{i = 1}^{N} U_{i} = 0$ , we obtain the score equations. Closed-form solutions are not possible and so the equations are solved by iterative procedures to obtain the maximum likelihood estimates of the parameters.

The contribution of the i-th group to the Hessian matrix is the term $H_{i} (θ) = \frac{\partial^{2} l_{i} (θ)}{\partial θ \partial θ^{t}} = \sum_{j = 1}^{n_{i}} [(y_{i j} - δ_{i j}) Υ_{i j}^{(2)} - δ_{i j} (1 - δ_{i j}) Υ_{i j}^{(1)} Υ_{i j}^{(1) T}] .$

Estimates of the parameter vector can be obtained by the Newton–Raphson iteration routine, which is given by the updating formula $θ_{s + 1} = θ_{s} - {[H (θ_{s})]}^{- 1} U (θ_{s})$

where $θ_{s}$ is the estimate of the sth iteration.

The contribution of the i-th group to the Fisher information matrix is the term $I_{i} (θ) = α_{i} I_{0 i} + A_{i} α_{i}^{*} α^{* T} + D_{i} [α^{*} U_{0 i}^{T} + U_{0 i} α^{* T}] - D_{i} (1 - α_{i}) U_{0 i} U_{0 i}^{T}$

where $I_{0_{i}} = \sum_{j = 1}^{n_{i}} δ_{i j} (1 - δ_{i j}) Υ_{i j}^{(1)} Υ_{i j}^{(1) T}$

and where $A_{i},$ $D_{i}$ and $U_{0 i}$ are evaluated at $y = (y_{1}, y_{2}, \dots, y_{n}) = 0$ . The estimates can be alternatively obtained by the Fisher scoring method which is given by $θ_{s + 1} = θ_{s} + {[I (θ_{s})]}^{- 1} U (θ_{s})$

The asymptotic variance–covariance matrix of the parameter estimates, $C (θ) = {[I (θ)]}^{- 1}$, is the inverse of the information matrix. Thus, the transformation matrix for use in the proposed Gram–Schmidt–Fisher scoring algorithm can be obtained as described in Equation (5). Specifically, the transformation matrix, a lower triangular matrix with ones along the diagonal, is $B = (b_{j k})$, where $b_{j k} = \sum_{r = 1}^{j - 1} c_{r j} I_{r (j - k)},$ $j = 2, \dots, p;$ $k = 1, \dots, (j - 1),$ and where $(c_{r j})$ and $(I_{r (j - k)})$ are entries of the variance–covariance matrix, $C (θ),$ and the information matrix, $I (θ),$ respectively.

The parameter estimates can subsequently be obtained by the Gram–Schmidt–Fisher scoring algorithm given by $θ_{s + 1}^{*} = θ_{s}^{*} + {[I (θ_{s}^{*})]}^{- 1} U (θ_{s}^{*})$

where $θ_{s}^{*} = B θ_{s}; \quad U (θ_{s}^{*}) = {(B^{T})}^{- 1} U (θ_{s}); \quad {[I (θ_{s}^{*})]}^{- 1} = B {[I (θ_{s})]}^{- 1} B^{T}$

5.1. Application to oesophageal cancer in Chinese families

This application involves the study of oesophageal cancer in 2951 nuclear families collected in Yangcheng County, Shanxi Province in China (Kwagyan, kwagyan). In this study, we consider the nuclear family unit as a group. The outcome variable is whether or not an individual is affected with oesophageal cancer. The objective of the study is to assess the presence and aggregation of oesophageal cancer in these families. Table 1 summarizes the distribution of the number of affected individuals by family size. Of the 2951 families, 1580 (53%) had no affected individuals. The combined total number of individuals from the studied families was 14310, with mean $\pm$ sd age of 48.26 $\pm$ 18.17 years. Males comprise 56.4% of the individuals, and 25.4% indicated drinking alcohol.

Table 1. Distribution of family size by number of affecteds


Respondents within a family have correlated outcomes which are influenced in part or wholly by the group variables as well as the variables on the individual respondent. The main objective here is to assess the presence of familial aggregation of oesophageal cancer, adjusting for measured risk factors: sex, age and alcohol consumption. Here, the model for the regression analysis of disposition is parametrized as $M (Z) = λ,$ $D (Z) = γ$ and $W (X) = β_{1} * sex + β_{2} * alcohol + β_{3} * age$

In this application, the vector $β = (β_{1}, β_{2}, β_{3})$ is the set of parameters of interest. Computations were performed using CORRDAT, a computer program we developed (Bonney, Kwagyan, Slater, & Slifker, 1997), which was linked with the likelihood optimization software MULTIMAX (Bonney, Kwagyan, & Apprey, 1997). Computations can also be accomplished in MATLAB. Table 2 gives estimates of the correlation matrices. The correlations between the original parameters are quite high compared with those between the orthogonal parameters. In particular, the correlation between $λ$ and $γ$ is $-0.179$; the correlation between $λ$ and $β_{1}$ is $-0.414$; and that between $λ$ and $β_{3}$ is high, $-0.841$. The correlations between the orthogonal parameters are near zero. The correlation between $λ^{*}$ and $γ^{*}$ is $-0.026$; the correlation between $λ^{*}$ and $β_{1}^{*}$ is now low, 0.004; and that between $λ^{*}$ and $β_{3}^{*}$ is also low, 0.0188.

As expected, the orthogonal likelihood converged in fewer iterations than the non-orthogonal one: the orthogonal likelihood converged after 17 iterations, whereas the non-orthogonal one converged after 56 iterations. Estimates of the parameters are summarized in Table 3. The maximum likelihood estimate of the relative dependence parameter, $\hat{γ} = 0.577$, with a corresponding asymptotic 95% confidence interval of (0.529, 0.625), suggests that there is significant familial aggregation of oesophageal cancer in the families sampled. Sex and age have significant positive effects, while the effect of alcohol is negative. The results suggest that males are at a higher risk of getting oesophageal cancer than females and that the disease is more prevalent in older people. The negative effect of alcohol seems to suggest that it has the propensity to lower the risk of oesophageal cancer in the Chinese population studied, perhaps if drunk in moderation. In summary, we conclude that oesophageal cancer aggregates in the families sampled.
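As a quick arithmetic check on the reported interval, and assuming it is the usual symmetric Wald interval (an assumption; the text does not state how the interval was computed), the implied standard error of $\hat{γ}$ can be backed out from the endpoints:

$\hat{γ} \pm 1.96\,\widehat{\mathrm{se}}(\hat{γ}) = (0.529,\ 0.625) \quad\Longrightarrow\quad \widehat{\mathrm{se}}(\hat{γ}) \approx \dfrac{0.625 - 0.529}{2 \times 1.96} \approx 0.0245$

The midpoint of the interval, $(0.529 + 0.625)/2 = 0.577$, agrees with the reported point estimate.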

Table 2. Correlation matrix from original and orthogonal parametrization. Fisher scoring algorithm used for original parametrization and Gram–Schmidt–Fisher scoring algorithm used for orthogonal parametrization


Table 3. Regression analysis of disposition to oesophageal cancer. Fisher scoring algorithm used for original parametrization and Gram–Schmidt–Fisher scoring algorithm for orthogonal parametrization


6. Conclusions

Parameter orthogonalization is used as an aid in computation, approximation and interpretation, and in the elimination or reduction of the effect of nuisance parameters. The algorithm that we have proposed, based on the Gram–Schmidt orthogonalization process, is computationally feasible. Unlike the approach of Cox and Reid, which requires solving a system of differential equations to obtain orthogonality, the proposed method only requires solving a system of linear equations. Our approach has some similarities with the conjugate gradient algorithm, but is distinctly different from it. The conjugate gradient method is an optimization routine, whereas the method we have proposed is principally an approximate orthogonalization technique combined with a maximization algorithm to aid in parameter estimation. Amari (amari1) claims that global orthogonalization is possible only in a special case. Our method allows for global orthogonalization, provided the information matrix can be found or estimated. The approach we have proposed is based on exact calculations. The transformation is surjective, that is, the original parameters can be recovered from the new ones. Clearly, the process can be cumbersome if a large number of parameters must be accurately estimated, because the information matrix has to be inverted at every stage to obtain the covariance matrix and subsequently the transformation matrix. The advantages, however, are that convergence is generally rapid in terms of the number of iterations, reliable and accurate. Block parametrization of parameters would be desirable if interest is in sets of parameters and the dimension of the parameter space is large. The work we have presented in this article is the first attempt to consider the use of the Gram–Schmidt process for estimation of the parameters. It is possible, however, that closer scrutiny, practical considerations, numerical studies of stability and other numerical properties, and simulation studies to evaluate and compare convergence times would suggest modifications and refinements to the methods we have discussed.

In conclusion, the algorithm that we have proposed provides an exceedingly convenient method for multi-parameter statistical problems, first to reduce numerical difficulties and second to improve accuracy in parameter estimation. The method should be efficient when function minimization is used in both linear and non-linear maximum likelihood estimation, and particularly useful for estimation problems with a small parameter space.

Additional information

Funding

The work was supported in part by NIH/NCATS [grant number UL1TR000101, previously UL1RR031975] and NIH/NIMHD [grant number G12MD007597].

Notes on contributors

John Kwagyan

Dr John Kwagyan is a mathematician and a public health and medical researcher. His research interests include mathematical and statistical modelling of correlated data, clinical trials, survival models, big data analytics, statistical genetics, and pharmacokinetic and pharmacodynamic modelling. He is currently the director of Biostatistics, Epidemiology and Research Design of the Georgetown-Howard University Center for Clinical and Translational Science, Howard University College of Medicine, and adjunct professor in the Department of Mathematics, Howard University, Washington, DC, USA.

References

Amari, S. I. (1982). Differential geometry of curved exponential families- curvatures and information loss. Annals of Statistics, 10, 357–85.
Amari, S. I. (1985). Differential geometry in statistics. New York, NY: Springer Verlag.
Bonney, G. E., & Kissling, G. E. (1986). Gram–Schmidt orthogonalization of multinormal variates: Applications in genetics. Biometrical Journal, 28, 417–425.
Bonney, G. E. (1995). Some new results on regressive models in family studies. Proceedings of the Biometrics Section, American Statistical Association, 177–182.
Bonney, G. E. (1998a). Regression analysis of disposition to correlated binary outcomes (Scientific Report). Philadelphia, PA: Fox Chase Cancer Center.
Bonney, G. E. (1998b). Regression analysis of disposition to correlated binary outcomes. Unpublished Manuscript.
Bonney, G. E., Kwagyan, J., & Apprey, V. (1997). MULTIMAX-A computer package for MULTI-objective MAXimization with applications in genetics and epidemiology. The American Journal of Human Genetics, 61, 447.
Bonney, G. E., Kwagyan, J., Slater, E., & Slifker, M. (1997). CORRDAT-A computer package for CORRelated DATa. The American Journal of Human Genetics, 61, A194.
Cox, D. R., & Reid, N. (1987). Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society: Series B, 49, 1–39.
Cox, D. R., & Reid, N. (1989). On the stability of maximum likelihood estimators of orthogonal parameters. Canadian Journal of Statistics, 17, 229–233.
Clayton, D. G. (1971). Gram-Schmidt orthogonalization. Journal of the Royal Statistical Society: Series C (Applied Statistics), 20, 335–338.
Godambe, V. P. (1991). Orthogonality of estimating functions and nuisance parameters. Biometrika, 78, 143–151.
Godambe, V. P., & Thompson, M. E. (1989). An extension of quasi-likelihood estimation (with discussion). Journal of Statistical Planning and Inference, 22, 137–72.
Hills, S. E. (1987). Parameter orthogonality and approximate conditional inference [Discussion]. Journal of the Royal Statistical Society: Series B, 49, 1–39.
Hurlimann, W. (1992). On parameter orthogonality of the mean. Statistical Papers, 33, 69–74.
Kalbfleisch, J. D., & Sprott, D. A. (1970). Application of likelihood methods to models involving large numbers of parameters. Journal of the Royal Statistical Society: Series B (Methodological), 32, 175–208.
Kwagyan, J. (2001). Further investigation of the disposition model for correlated binary outcomes (PhD thesis). Philadelphia, PA: Department of Statistics, Temple University.
Kwagyan, J., Apprey, V., & Bonney, G. E. (2001). Parameter orthogonalization in genetic analysis. Genetic Epidemiology, 21, 163 IGES 059.
Liang, K. Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics, 44, 1033–1048.
Ross, G. J. S. (1970). The efficient use of function minimization in non-linear maximum-likelihood estimation. Applied Statistics, 19, 205–221.
Willmot, G. E. (1988). Parameter orthogonality for a family of discrete distributions. Journal of the American Statistical Association, 83, 517–521.
