Editorial

Networked knowledge, combinatorial creativity, and (statistical) innovation

During one of a series of 10-hour drives to Pennsylvania, I came across the Facebook page “Brain Pickings” by Maria Popova. The page covers topics across literature, science, art, philosophy, and the various other tentacles of human thought drawn from the extended marginalia of the author’s search for meaning (Popova 2006). Over its 15 years of existence, the author has, on several occasions, shared her thoughts on networked knowledge and combinatorial creativity, which I find insightful. She states that “(practical creativity is) the ability to connect the seemingly unconnected and meld existing knowledge into new insight about some element of how the world works.” She thinks that creativity requires “connect(ing) countless dots and cross-pollination of ideas from a wealth of disciplines, to combine and recombine these pieces and build new castles”. This resonated with a familiar thought: my own perspective that innovation is not always about novelty but perhaps about novel combinations of existing ideas, or about how existing ideas reveal knowledge gaps and serve as springboards to other novel ideas.

This happenstance motivated me to explore its meaning and relevance through the journal I edit, where I often string together tendrils of musings to encourage a culture of synthesis and curiosity among young statisticians. The diversity of innovative statistical research in the last two issues, Vol 30(1) and (2), whether in methodology or application and spread across a variety of topic areas encountered in the biopharmaceutical industry, delightfully provides a source of examples of networked knowledge and combinatorial creativity. It is fascinating to unravel the combinatorial thought involved in matching problem to solution and in piecing together multiple strategies or techniques to create satisfactory solutions. For example, because the industry is in constant pursuit of breakthroughs, data are generally sparse. Statisticians often generate data and create models to understand disease progression when clinical data are absent. Two papers exemplify the combination of these techniques: “Joint model for longitudinal mixture of normal and zero-inflated power series correlated responses” (Sharifian et al. 2020) and “Pair copula construction for longitudinal data with zero-inflated power series marginal distributions” (Sefidi et al. 2020). In the first article, the authors explored multiple statistical concepts – the analysis of longitudinal mixed continuous and count responses, correlation among mixed responses, and zero-inflation modeling – to handle particularly complex data in a nuanced way. In the second article, the authors explored pair copula constructions with a D-vine structure to model marginal distributions and used members of the power series family of distributions for repeated count measurements with excess zeros.
Another example is the paper “Construction of a survival tree for dependent censoring”, which examines the problem of constructing a set of subgroups of covariate space, wherein each element had the same failure, and uses a copula to represent the dependency between failure and censoring times while accommodating classification and regression trees in survival analysis (Shimokawa and Miyaoka 2020).

In the examples mentioned, all the techniques used are available in the statistical literature, but the authors combined them thoughtfully to create data models that capture the finer details their problems require. This is not to suggest that new ideas and techniques must be complicated to be novel or innovative. Nor do I think networked knowledge and combinatorial creativity imply complexity. In fact, networked knowledge is a resource that can be used to reduce complexity in innovation. We are reminded of the meta-theoretical principle called Occam’s razor – “pluralitas non est ponenda sine necessitate” (plurality should not be posited without necessity). There are times when the novelty is in the simplicity of the approach. For example, in the paper “Investigation of the relationship between heart rate and functional class in pulmonary hypertension” (Lawrence and Hu 2020), the authors explore the utility of a non-linear mixed-effects model to model heart rate. They were able to achieve their goal of understanding heart rate acceleration and its relationship to different potential measures of disease severity without having to resort to elaborate means.

Usually, innovation is created through observing the system, performing combinations or creating perturbations, and then observing what happens. That is, networked knowledge in the end provides new insights into, or sheds new light on, a problem serendipitously; put differently, there is an open-loop feedback in innovation, which is also implied in combinatorial creativity. This type of innovation can be seen in the interesting work on “Prediction of treatment effect perception in cosmetics using machine learning” (Salah et al. 2020). The article discusses discrete choice experiments (DCEs) in cosmetics product development and the application of machine learning to explore combinations of attributes that determine product utility beyond linear or low-dimensional combinations. Of note, a DCE is a quantitative technique for eliciting individual preferences among alternative products. The higher-level or latent combinations can unravel hidden dependencies and assist more precisely with product design and marketing decisions. The positive contribution of machine learning in this example serves as feedback on its new role in cosmetic product development. In another application entitled “Baseline selection in concentration-QTc (C-QTc) modeling: impact on assay sensitivity”, the authors seek to understand the impact of baseline selection in C-QTc modeling (time-matched, pre-dose, average) on assay sensitivity (Huang et al. 2020). Their analysis reveals that baseline selection does have an impact on prediction from C-QTc modeling, and that the impact depends on study design (parallel, crossover). Awareness of this dependency cautions subsequent C-QTc studies about their ability to detect a difference. In a similar exploration but in the distant field of adverse event (AE) signal detection, the authors focused on the use of a likelihood ratio test (LRT)-based method in identifying AE signals associated with left ventricular assist devices (LVADs) using medical device reporting data (Xu et al. 2020). During the exercise, the authors also compared the method with proportional reporting ratios, Bayesian confidence propagation neural network, and simplified Bayes methods and found the LRT to be the most conservative in terms of controlling type-I error and the false discovery rate. Given this finding, an overlapping group of authors applied the LRT-based method for safety signal detection to LVADs, this time modifying the LRT to account for exposure time as a more nuanced exploration of the potential risk associated with medical device use. Their finding is that even with a more conservative testing procedure, patients using the HeartWare Ventricular Assist Device had a greater incidence of stroke than those using the HeartMate II (Jung et al. 2020). Lastly, in the paper “Analysis of recurrent hypoglycemic events”, the authors explored three models for the recurrent event of hypoglycemia – a gamma frailty model with variance estimated by the inverse of the observed Fisher information matrix, a gamma frailty model with the sandwich variance estimator, and a piecewise negative binomial regression model. Their major finding is that the sandwich variance estimator performed better when the frailty model is misspecified (Ma et al. 2020).

Innovation can also spring from deliberate identification of knowledge gaps or limitations within a body of networked knowledge. The newly added knowledge either comes from another existing idea or is completely novel. For example, in the elegantly written paper “Tests for equivalence of two survival functions: Alternatives to the (proportional hazards) PH and (proportional odds) PO models”, the author proposes equivalence tests for the difference of two survival functions under the class of log transformation models and for that of two cumulative hazard functions under the partly Aalen’s model (Shen 2020). The author argues that this test addresses the inadequacies of popular testing procedures in settings where both the proportional hazards (PH) and proportional odds (PO) assumptions can be violated. It is, however, an extension of the log-rank test proposed by Wellek (1993) under the PH model and of an alternative test by Martinez et al. (2017) under the PO survival model. The idea of using the class of log transformation models and cumulative hazard functions that are partly Aalen already exists in the literature. In a paper from a related field, “Conditional restricted mean survival time for multiple events”, the authors propose a simple, non-parametric estimator of conditional mean survival time for multiple events to quantify treatment effect with a clinically meaningful interpretation (Qiu et al. 2020). The authors noted that, among the clinical events used to form a composite endpoint, only the first occurrence of the composite endpoint event is considered in the primary efficacy analysis and the others are masked. They argued that such an approach may not be ideal, and that existing methods to mitigate this shortcoming rely on certain model assumptions that may greatly affect the inferences for treatment effect.
There are other articles in the last two issues exhibiting this thought trajectory, e.g., “A simulation study of approaches for handling disease progression in dose-finding clinical trials” (Biard et al. 2020), “Empirical weighted Bayesian tolerance intervals” (Tran 2020), “Relative efficiency of equal versus unequal cluster sizes in cluster randomized trials with a small number of clusters” (Liu et al. 2020), and “Empirical likelihood-based inferences for median medical cost regression models with censored data”, where the authors are forced to create empirical likelihood-based inferences for median medical cost because there is no test available with such a robust estimator. Readers are invited to explore the thought process described in these papers within Vol 30(1) and (2).

The limitation or gap in knowledge could be scientific, as shown above, or it could concern practical efficiency. Combinatorial creativity is naturally relevance seeking, and so is innovation. Innovation is restricted to intentional attempts to bring about benefits from new changes, whether by filling gaps or by improving practical efficiency. In the following papers, practical efficiency is the main goal, e.g., “Bayesian pooling versus sequential integration of small preclinical trials: a comparison within linear and nonlinear modeling frameworks”, where the authors recursively update posterior distributions as soon as new data become available to considerably reduce the computation time (La Gamba et al. 2020). In another paper, “Fast (quadratic lower bound) QLB algorithm and hypothesis tests in logistic model for ophthalmologic bilateral correlated data”, the authors focus on investigating the relationship between the disease probability and covariates (e.g., age, weight, gender) via logistic regression for the analysis of bilateral correlated data (Lin et al. 2020). They propose a new minorization-maximization algorithm and a fast QLB algorithm to calculate the maximum likelihood estimates of the vector of regression coefficients. How beneficial these approaches are may depend on tacit knowledge. Tacit knowledge is the knowledge of techniques, methods, and designs that work in certain ways and with certain consequences, even when one cannot explain exactly why (Polanyi 1969). Tacit knowledge represents knowledge based on the experience of individuals and can serve as a guide to what is a worthwhile and relevant pursuit.

Networked knowledge and combinatorial creativity are dependent on tacit knowledge, i.e., tacit knowledge is important in guiding innovation. Tacit knowledge is required to make fine discriminations, to detect patterns, to judge familiarity (and therefore to notice anomalies), and to draw on a rich mental model of causal relations. Tacit knowledge is improved by collaboration and constant curiosity, since curiosity is a powerful motivator for learning and for gaining expertise. Tacit knowledge and curiosity also aid in problem formulation, such as filling knowledge gaps or improving practical efficiency. Great ideas may sometimes seem obvious because the solution has all the parts of the question lining up and shedding light on a resolution. “Obvious” answers are not visible to most people, partly because most people are not thinking about the question. Ideas only come to those who recognize a problem and look for innovative solutions through networked knowledge and combinatorial creativity. For example, an innovation could seem completely out of the blue to most people but is probably based on deep tacit knowledge, as in the case of the paper “A curtailed selection procedure for comparing Bernoulli outcomes with a control”. In this work, the authors propose a sequential selection procedure for comparing the success probability of k (> 1) experimental Bernoulli populations with a control Bernoulli population. The procedure integrates the indifference zone formulation and the subset selection formulation to select either the best population or a random-sized subset that contains the best population (Chen and Hsu 2020).

Reading through these papers, each of which combines different ideas, is like exploring a medieval text called a florilegium, as Maria Popova described in Brain Pickings. Florilegia, coming from the Latin for “flower” and “gather”, were compilations of excerpts from other writings, essentially mashing up selected passages and connecting dots from existing texts to illuminate a specific topic or doctrine or idea (Popova 2011). It then becomes clear that the old can be new again through a cycle of combinations, re-combinations, and sometimes unusual linkages across disciplines over time.

Innovation is no different from networked knowledge and combinatorial creativity – interconnecting dots, sometimes in unexpected or unconventional ways. I hope that the examples herein can help a young statistician navigate statistical innovation with a mindset of synthesis, curiosity, and creativity.

Acknowledgments

The author would like to thank Junjing Lin and Zachary Siebers for helpful editorial comments.

References

  • Biard, L., B. Cheng, G. A. Manji, and S. M. Lee. 2020. A simulation study of approaches for handling disease progression in dose-finding clinical trials. Journal of Biopharmaceutical Statistics 1–12. doi:10.1080/10543406.2020.1814796.
  • Chen, P., and L. Hsu. 2020. A curtailed selection procedure for comparing Bernoulli outcomes with a control. Journal of Biopharmaceutical Statistics 30: 1–11.
  • Huang, D., J. Chen, and Y. Tsong. 2020. Baseline selection in concentration-QTc modeling: Impact on assay sensitivity. Journal of Biopharmaceutical Statistics 1–12. doi:10.1080/10543406.2020.1814797.
  • Jung, M. Y., R. Ward, Z. Xu, J. Xu, Z. Yao, L. Huang, and R. Tiwari. 2020. Application of a likelihood ratio test based method for safety signal detection to left ventricular assist devices. Journal of Biopharmaceutical Statistics 30: 1–8.
  • La Gamba, F., T. Jacobs, J. Serroyen, H. Geys, and C. Faes. 2020. Bayesian pooling versus sequential integration of small preclinical trials: A comparison within linear and nonlinear modeling frameworks. Journal of Biopharmaceutical Statistics 30: 1–12.
  • Lawrence, J., and Z. Hu. 2020. Investigation of the relationship between heart rate and functional class in pulmonary hypertension. Journal of Biopharmaceutical Statistics 1–9. doi:10.1080/10543406.2020.1814800.
  • Lin, Y. Q., Y. S. Zhang, G. L. Tian, and C. X. Ma. 2020. Fast QLB algorithm and hypothesis tests in logistic model for ophthalmologic bilateral correlated data. Journal of Biopharmaceutical Statistics 30: 1–17.
  • Liu, J., C. Xiong, L. Liu, G. Wang, L. Jingqin, F. Gao, … Y. Li. 2020. Relative efficiency of equal versus unequal cluster sizes in cluster randomized trials with a small number of clusters. Journal of Biopharmaceutical Statistics 1–16. doi:10.1080/10543406.2020.1814795.
  • Ma, C., Y. Qu, and H. Fu. 2020. Analysis of recurrent hypoglycemic events. Journal of Biopharmaceutical Statistics 30: 1–9.
  • Martinez, E. E., D. Sinha, W. Wang, S. R. Lipsitz, and R. J. Chappell. 2017. Tests for equivalence of two survival functions: Alternative to the tests under proportional hazards. Statistical Methods in Medical Research 26 (1):75–87. doi:10.1177/0962280214539282.
  • Polanyi, M. 1969. Knowing and being: Essays, ed. M. Grene. Chicago: University of Chicago Press.
  • Popova, M. 2006. Brain pickings – An inventory of the meaningful life.
  • Popova, M. 2011. Networked knowledge and combinatorial creativity – Brain pickings.
  • Qiu, J., D. Zhou, H. J. Hung, J. Lawrence, and S. Bai. 2020. Estimation on conditional restricted mean survival time with counting process. Journal of Biopharmaceutical Statistics 1–15. doi:10.1080/10543406.2020.1814799.
  • Salah, S., L. Colomb, A. M. Benize, C. Cornillon, A. Shaiek, J. Charbit, and A. Schritz. 2020. Prediction of treatment effect perception in cosmetics using machine learning. Journal of Biopharmaceutical Statistics 30: 1–8.
  • Sefidi, S., M. Ganjali, and T. Baghfalaki. 2020. Pair copula construction for longitudinal data with zero-inflated power series marginal distributions. Journal of Biopharmaceutical Statistics 1–17. doi:10.1080/10543406.2020.1832108.
  • Sharifian, N., E. Bahrami Samani, and M. Ganjali. 2020. Joint model for longitudinal mixture of normal and zero-inflated power series correlated responses Abbreviated title: Mixture of normal and zero-inflated power series random-effects model. Journal of Biopharmaceutical Statistics 1–24. doi:10.1080/10543406.2020.1814798.
  • Shen, P. S. 2020. Tests for equivalence of two survival functions: Alternatives to the PH and PO models. Journal of Biopharmaceutical Statistics 30: 1–12.
  • Shimokawa, A., and E. Miyaoka. 2020. Construction of a survival tree for dependent censoring. Journal of Biopharmaceutical Statistics 30: 1–16.
  • Tran, H. 2020. Empirical weighted Bayesian tolerance intervals. Journal of Biopharmaceutical Statistics 1–11. doi:10.1080/10543406.2020.1814801.
  • Wellek, S. 1993. A log-rank test for equivalence of two survivor functions. Biometrics 49 (3):877–881. doi:10.2307/2532208.
  • Xu, Z., J. Xu, Z. Yao, L. Huang, M. Jung, and R. Tiwari. 2020. Evaluating medical device adverse event signals using a likelihood ratio test method. Journal of Biopharmaceutical Statistics 30: 1–10.
