817
Views
5
CrossRef citations to date
0
Altmetric
Original Articles

Generating survival times with time-varying covariates using the Lambert W Function

, , , , &
Pages 135-153 | Received 20 Sep 2018, Accepted 21 Jul 2019, Published online: 08 Aug 2019
 

Abstract

Simulation studies provide an important statistical tool in evaluating survival methods, requiring an appropriate data-generating process to simulate data for an underlying statistical model. Many studies with time-to-event outcomes use the Cox proportional hazard model. While methods for simulating such data with time-invariant predictors have been described, methods for simulating data with time-varying covariates are sorely needed. Here, we describe an approach for generating data for the Cox proportional hazard model with time-varying covariates when event times follow an Exponential or Weibull distribution. For each distribution, we derive a closed-form expression to generate survival times and link the time-varying covariates with the hazard function. We consider a continuous time-varying covariate measured at regular intervals over time, as well as time-invariant covariates, in generating time-to-event data under a number of scenarios. Our results suggest this method can lead to simulation studies with reliable and robust estimation of the association parameter in Cox-Weibull and Cox-Exponential models.

Acknowledgments

We thank Dr. Timothy Heeren and Dr. Michael Pencina for their input in the simulation analysis.

Disclosure statement

There were no competing interests in the conduct of this study.

Data availability statement

The data used in this study are available at NIH BioLINCC (https://biolincc.nhlbi.nih.gov/home/). They can also be provided to interested researchers on written request to FHS. Request for FHS data may be done by submitting a proposal through the FHS web-based research application. A catalog of the FHS data repository may be accessed through the FHS website: www.framinghamheartstudy.org/researchers/description-data/

Ethics, approval and consent

The research protocols of the FHS are reviewed annually by the Institutional Review Board of the Boston University Medical Center and by the Observational Studies Monitoring Board of the National Heart, Lung and Blood Institute. Since 1971, written consent has been obtained from participants before each examination. Information about the content of the Framingham Heart Study research examinations is presented to the participants at each examination cycle in the text of the corresponding consent form and in a discussion with a trained admitting coordinator at the beginning of the scheduled appointment. Information from every completed consent form is coded and recorded in a database. Questions regarding the ethical conduct of research are presented by FHS investigators to the FHS Ethics Advisory Board. The Advisory Board reviews, discusses, and make recommendations regarding these questions.

Consent to publish

Manuscript does not contain any individual person’s data.

R Codes For Generating Survival Times Using Time-Varying Covariates

##########################################

### Modeling Survival and Longitudinal ###

##########################################

# Sample Size = 1000; Event Rate = 10%; Link = 0.500;

library(MCMCpack)

library(stats)

library(Rlab)

library(MASS)

library(Matrix)

library(mvtnorm)

library(survival)

library(ggplot2)

library(coda)

library(lattice)

library(boa)

library(nlme)

library(car)

set.seed(33) # save seed

rm(list = ls())

Set <- as.character(1:1000)

for (K in Set) {

#################################

### Creating Global Variables ###

#################################

Time <- c(0, 5, 10, 15, 20, 25)

N=1000

M=6

###############################

### Creating Random Effects ###

###############################

Ubeta = c(4.250, 0.250)

G = structure(.Data = c(0.29, -0.00465, -0.00465, 0.000320), .Dim = c(2, 2))

UY = rmvnorm(N, mean = Ubeta, sigma = G, method = "chol")

Z = matrix(0,M,length(dim(G)))

Z[,1] <- 1

Z[,2] <- c(Time)

muy <- matrix(0,N,M)

Y <- matrix(0,N,M)

R <- diag(rnorm(M,1.000,0.00001), M)

V=Z%*%G%*%t(Z) + R

###############################

### Covariate Specification ###

###############################

SEX = rbern(N,0.540)

TREATMENT = rbern(N,0.500)

for (i in 1:N){if (SEX[i] == 0) SEX[i] = 1 else SEX[i] = 2 }

AGE = round(rnorm(N,35,5))

###########################################

### Generating the Trajectories Outcome ###

###########################################

for (j in 1:M){

for (i in 1:N){

muy[i, j] <- UY[i,1] + UY[i,2]*(Time[j])

}

}

muy <- data.frame(muy)

names(muy) <- c("X1", "X2", "X3", "X4", "X5", "X6")

###########################

### Simulating Y Values ###

###########################

Y <- rmvnorm(n=N, mean = colMeans(muy), sigma = V, method = "chol")

Y <- data.frame(Y)

names(Y) <- c("Y1", "Y2", "Y3", "Y4", "Y5", "Y6")

Error <- rnorm(N, 0, 0.001)

for (i in 1:N){

Y[i, ] <- Y[i, ] + 0.05*AGE[i] + Error[i]

}

boxplot(Y, main = "Boxplot of time-varying covariate measures by Exam")

##############################

### Linear Mixed Model Fit ###

##############################

Timeinterval <- rep(Time, N)

YSimul <- c(t(as.matrix(Y)))

ID <- rep(1:N, each = M)

CAGE <- rep(AGE, each = M)

Long <- data.frame(ID, YSimul, Timeinterval, CAGE)

LME <- lme(YSimul ∼ Timeinterval + CAGE, random = ∼ Timeinterval | ID, data = Long, control = lmeControl(msMaxIter = 100, msVerbose = TRUE, opt = "optim"))

U <- coef(LME)

names(U) <- c("X1_1", "X2_1")

LME_Coeff <- data.frame(coefficients(LME))

colMeans(LME_Coeff)

############################

### Lambert's W Function ###

############################

LambertW = function(z, b=0,maxiter=10,eps=.Machine$double.eps,

min.imag = 1e-9) {

if (any(round(Re(b)) != b))

stop("branch number for W must be an integer")

if (!is.complex(z) && any(z<0)) z = as.complex(z)

## series expansion about -1/e

##

## p = (1 - 2*abs(b)).*sqrt(2*e*z+2);

## w = (11/72)*p;

## w = (w - 1/3).*p;

## w = (w+1).*p - 1

##

## first-order version suffices:

##

w = (1 - 2*abs(b))*sqrt(2*exp(1)*z+2) - 1

## asymptotic expansion at 0 and Inf

##

v = log(z + as.numeric(z==0 & b==0)) + 2*pi*b*1i;

v=v - log(v + as.numeric(v==0))

## choose strategy for initial guess

##

c = abs(z + exp(-1));

c = (c>1.45 - 1.1*abs(b));

c=c | (b*Im(z) > 0) | (!Im(z) & (b == 1))

w = (1 - c)*w+c*v

## Halley iteration

##

for (n in 1:maxiter) {

p = exp(w)

t=w*p - z

f = (w != -1)

t=f*t/(p*(w+f) - 0.5*(w+2.0)*t/(w+f))

w=w - t

if (abs(Re(t)) < (2.48*eps)*(1.0+abs(Re(w)))

&& abs(Im(t)) < (2.48*eps)*(1.0+abs(Im(w))))

break

}

if (n==maxiter) warning(paste("iteration limit (",maxiter,") reached, result of W may be inaccurate", sep=""))

if (all(Im(w) < min.imag)) w = as.numeric(w)

return(w)

}

#############################################

#### Specification of Parameter Estimates ###

#############################################

link <- 0.500

Agebeta <- 0.050

Sexbeta <- -0.500

#################################################

### Survival Times Using Weibull Distribution ###

#################################################

wshape = 3

# Lambda (Scale) = (1/Lambda)^v

wikiscale = 150

wscale = 1/(wikiscale)^(wshape)

Uni <- runif(N, min = 0, max = 1)

Numweib <- -log(Uni)

Denweib <- wscale*exp(Agebeta*AGE + Sexbeta*SEX + link*(U[,1]))

ratioweib <- link*(U[,2])*(1/wshape)*((Numweib/Denweib)⁁(1/wshape))

Lweib <- LambertW(ratioweib)

Survweib <- Lweib*1/(link*(U[,2])*(1/wshape))

hist(Survweib)

##################################################

### Censoring Times Using Uniform Distribution ###

#################################################

cwshape = 3

cscale = 14

Censoring <- rweibull(n=N, shape = cwshape, scale = cscale)

# Censoring <- runif(n=N, min = 25, max = 30)

##############################

### Time to Event Variable ###

##############################

Timetoevent = pmin(Survweib, Censoring)

eventoccur = as.numeric(Survweib < = Censoring)

table(eventoccur)

summary(coxph(Surv(Timetoevent, eventoccur) ∼ AGE + SEX))

###################

### Data Output ###

###################

ID <- rep(1:N)

LongSurv <- data.frame(ID, as.matrix(Y), U, AGE, SEX, Survweib)

write.csv(LongSurv, paste("JMLSD_N1000_E10_S1.00_L0.50_", K ,"",".csv", sep=""), row.names = FALSE)}

Additional information

Funding

This work was supported by the National Heart, Lung and Blood Institute's Framingham Heart Study (Contracts No. N01-HC-25195 and HHSN268201500001I) of the National Institutes of Health and Boston University School of Medicine. A portion of this research was conducted using Boston University Linux Cluster for Genetic Analysis (LinGA) funded by the National Center for Research Resources (NIH NCRR).

Notes on contributors

Michael P. LaValley

JSN and LAC conceived the study. The data were cleaned by JSN and LAC. All authors contributed towards the design of the simulation scenarios implemented in the study. The analysis was carried out by JSN. The manuscript was written by JSN. All authors contributed toward reviewing and revising the final manuscript.

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.