Abstract
When data consist of clusters of potentially correlated observations, conditional logistic regression can be used to estimate the association between a binary outcome and covariates conditionally on the cluster effects. Surveys can use multistage sampling in which individuals from the same conditioning cluster (e.g., family) are sampled with potentially different probabilities. We show that conditional logistic regression of survey data using standard inflation (weighted) estimation, in which observations are weighted by the inverse of their inclusion probabilities, can result in biased estimators when the number of observations sampled from each conditioning cluster is small. We propose methods based on weighted pseudo-pairwise likelihoods that combine the conditional logistic likelihoods for all pairs of observations consisting of a positive and a negative outcome within a conditioning cluster and weight each pairwise likelihood by the inverse of the pair's joint inclusion probability within the cluster. Design-based variance estimators for the regression coefficient estimators are provided. Limited simulations demonstrate that the proposed methods produce approximately unbiased regression coefficient estimates and variance estimates, but can be considerably less efficient than maximum likelihood estimation when the sampling is uninformative. The proposed methods are illustrated with an analysis of data from the Hispanic Health and Nutrition Examination Survey.
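To make the pairwise construction concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a single covariate, and it assumes that individuals are sampled independently within a cluster, so that a pair's joint inclusion probability is the product of the two individual within-cluster inclusion probabilities. The function names `discordant_pairs` and `fit_beta` are hypothetical, and no design-based variance estimation is attempted. For a pair with a positive outcome at covariate value x_i and a negative outcome at x_j, the conditional logistic likelihood is 1 / (1 + exp(-(x_i - x_j) * beta)), and the weighted sum of its logarithm over all such pairs is maximized by Newton's method.

```python
import math
from collections import defaultdict
from itertools import product


def discordant_pairs(cluster, y, x, pi):
    """List (d, w) for every within-cluster pair with one positive and one negative outcome.

    d = x_case - x_control is the covariate difference for the pair.
    w = 1 / (pi_case * pi_control) is the pair weight.
    ASSUMPTION: independent sampling within a cluster, so the joint
    inclusion probability of a pair is the product of the two
    individual within-cluster inclusion probabilities.
    """
    members = defaultdict(list)
    for k, c in enumerate(cluster):
        members[c].append(k)

    pairs = []
    for idx in members.values():
        cases = [k for k in idx if y[k] == 1]
        controls = [k for k in idx if y[k] == 0]
        for i, j in product(cases, controls):
            pairs.append((x[i] - x[j], 1.0 / (pi[i] * pi[j])))
    return pairs


def fit_beta(pairs, n_iter=50, tol=1e-10):
    """Maximize sum_k w_k * log(sigmoid(d_k * beta)) over a scalar beta by Newton's method."""
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0  # first derivative of the weighted log pseudo-likelihood
        info = 0.0   # negative second derivative
        for d, w in pairs:
            p = 1.0 / (1.0 + math.exp(-d * beta))
            score += w * (1.0 - p) * d
            info += w * p * (1.0 - p) * d * d
        if info == 0.0:
            raise ValueError("no informative discordant pairs")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


# Toy data: cluster 0 contributes two discordant pairs; cluster 1 has no
# negative outcomes and therefore contributes none.
cluster = [0, 0, 0, 1, 1]
y = [1, 0, 0, 1, 1]
x = [1.0, 0.0, 2.0, 3.0, 4.0]
pi = [0.5, 0.5, 1.0, 1.0, 1.0]

pairs = discordant_pairs(cluster, y, x, pi)
beta_hat = fit_beta(pairs)
```

In this toy example the two pairs have covariate differences +1 and -1, so an unweighted pairwise fit would give an estimate of 0, whereas the pair weights of 4 and 2 move the weighted estimate to log 2. With real survey data the joint inclusion probabilities would come from the sampling design rather than from the product assumption used here.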