Original Articles

Analysis of repairable systems with severe left censoring or truncation

David Trindade & Swami Nathan

ABSTRACT

Left censoring or left truncation occurs when specific failure information on machines is not available before a certain age. If only the number of failures but not the actual failure times before a certain age is known, we have left censoring. If neither the number of failures nor the times of failure are known, we have left truncation. A datacenter will typically include servers and storage equipment installed on different dates. However, data collection on failures and repairs may not begin on the installation date. Often, the capture of reliability data starts only after the initiation of a service contract on a particular date. Thus, such data may exhibit severe left censoring or truncation, since machines may have operated for considerable time periods without any reliability history being recorded. This situation is quite different from the notion of left censoring in non-repairable systems, which has been dealt with extensively in the literature. Parametric modeling methods are less intuitive when the data has severe left censoring. In contrast, non-parametric methods based on the Mean Cumulative Function (MCF), recurrence rate plots, and calendar time analysis are simple to use and can provide valuable insights into the reliability of repairable systems, even under severe left censoring or truncation. The techniques shown have been successfully applied at a large server manufacturer to quantify the reliability of computer servers at customer sites. In this discussion, the techniques will be illustrated with actual field examples.
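As an illustration of the non-parametric approach described in the abstract, the following is a minimal sketch of a Mean Cumulative Function estimate for systems whose observation begins at a nonzero age (left truncation). It is not taken from the article: the function name `mcf_left_truncated`, the `(start_age, end_age, failure_ages)` tuple layout, and the sample data are all hypothetical. The sketch assumes the standard risk-set form of the estimator, in which each failure age contributes the number of failures at that age divided by the number of systems under observation at that age. Under left truncation the resulting curve measures accumulated recurrences from the earliest observed age onward, not from age zero.

```python
from collections import Counter


def mcf_left_truncated(systems):
    """Estimate the MCF from systems observed over age windows.

    systems: list of (start_age, end_age, failure_ages) tuples, where
    observation of a system covers ages in (start_age, end_age].
    This layout is an assumption made for illustration.
    Returns a list of (age, mcf_estimate) pairs at each distinct failure age.
    """
    # Count failures at each distinct age, checking they fall in the window.
    events = Counter()
    for start, end, failures in systems:
        for t in failures:
            if not (start < t <= end):
                raise ValueError(f"failure at age {t} outside window ({start}, {end}]")
            events[t] += 1

    curve = []
    mcf = 0.0
    for t in sorted(events):
        # Risk set: systems actually under observation at age t.
        at_risk = sum(1 for start, end, _ in systems if start < t <= end)
        mcf += events[t] / at_risk
        curve.append((t, mcf))
    return curve


# Hypothetical data: the second and third systems enter observation late,
# for example when a service contract starts.
systems = [
    (0, 100, [20, 60]),
    (50, 150, [60, 120]),
    (80, 200, [90]),
]

for age, value in mcf_left_truncated(systems):
    print(f"age {age}: MCF estimate {value:.3f}")
```

Because the denominator counts only systems under observation at each age, machines with unrecorded early history do not dilute the estimate at ages where they contributed no data. With left censoring, where the count of earlier failures is known, that count could instead be added as an initial offset for the affected system; that variant is not shown here.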

Additional information

Notes on contributors

David Trindade

David Trindade is the Chief of Best Practices and Fellow at Bloom Energy. Previously, he was a Distinguished Principal Engineer at Sun Microsystems. He has been Senior Director of Software Quality at Phoenix Technologies, Senior Fellow and Director of Reliability and Applied Statistics at Advanced Micro Devices (AMD), Worldwide Director of Quality and Reliability at General Instruments, and Advisory Engineer at IBM. He has a B.S. in Physics, an M.S. in Statistics, an M.S. in Material Sciences, and a Ph.D. in Mechanical Engineering and Statistics. He has been an adjunct lecturer at the University of Vermont and Santa Clara University, teaching courses in statistical analysis, reliability, probability, and applied statistics, especially design of experiments (DOE) and statistical process control (SPC). In 2008, he received the IEEE Reliability Society's Lifetime Achievement Award. He is a Senior Member of IEEE and ASQ and a Fellow of the American Statistical Association.

Swami Nathan

Swami Nathan is a Senior Quality and Reliability Engineer at Intel. His interests encompass field data analysis, statistical analysis, and reliability and availability modeling of complex systems. He received his B.Tech. from the Indian Institute of Technology and his M.S. and Ph.D. degrees in reliability engineering from the University of Maryland, College Park. He has authored over a dozen papers in peer-reviewed journals and international conferences and holds two patents.
