Abstract
Given the very large amount of data collected every day through population surveys, much new research could reuse this information instead of collecting new samples. Unfortunately, the relevant data are often scattered across different files obtained through different sampling designs. Data fusion is a set of methods used to combine information from different sources into a single dataset. In this article, we focus on a specific problem: the fusion of two data files, one of which is quite small. We propose a model-based procedure that combines logistic regression with an Expectation-Maximization (EM) algorithm. Results show that, despite the small amount of data, this procedure can perform better than standard matching procedures.
Acknowledgments
This study uses data from two separate projects. The “Living in Switzerland 1999–2020” project is carried out by the Swiss Household Panel (SHP) of the Université de Neuchâtel and the Swiss Federal Statistical Office (SFSO). The SMASH 2002 survey was conducted by a multicenter, multidisciplinary group from the Institute for Social and Preventive Medicine in Lausanne*; the Institute for Psychology, Psychology of Development and Developmental Disorders, University of Berne, Switzerland**; and the Sezione Sanitaria, Dipartimento della sanità e della socialità, Canton Ticino***: Véronique Addor*, Françoise Alsaker**, Andrea Bütikofer**, Chantal Diserens*, Laura Inderwildi Bonivento***, André Jeannin*, Guy van Melle*, Pierre-André Michaud*, Françoise Narring**, Joan-Carles Suris*, Annemarie Tschumper**. We would like to thank Jean-Philippe Antonietti for his careful reading of the article.