Statistical Genre Analysis: Toward Big Data Methodologies in Technical Communication

S. Scott Graham, Sang-Yeon Kim, Danielle M. DeVasto, & William Keith

Abstract

This article pilots a study in statistical genre analysis, a mixed-method approach for (a) identifying conventional responses as a statistical distribution within a big data set and (b) assessing which deviations from the conventional might be more effective for changes in audience, purpose, or context. The study assesses pharmaceutical sponsor presentations at the Food and Drug Administration (FDA) drug advisory committee meetings. Preliminary findings indicate the need for changes to FDA conflict-of-interest policies.

Notes

Only ES132 (clinical experience), SS111 (disease studies), SS113 (preclinical studies), SS114 (RCT–efficacy), and SS115 (RCT–hazard) appeared to provide adequate variance. All other sources of evidence were used so rarely that even small deviations registered as outliers. Entering variables with outliers into an analysis may bias the estimates, particularly when the analysis is underpowered because of a small sample size. The eight sources of evidence with saliently skewed distributions were thus excluded from the subsequent analyses. Categorical variables of major research interest (i.e., sponsors, diseases, medical products) contained too many attributes to reveal a comprehensible pattern of relationships with other variables.

Results from significance testing should be interpreted cautiously. As in most observational studies, the independent variables should be considered probable, not absolute, causes of the outcomes.

The current significance testing was underpowered primarily because of the small sample size (N = 45) and even smaller cell sizes across conditions (3 ≤ n ≤ 17). Different results may emerge with a complete data set.

The current summary of the data is based on descriptive statistics only and is statistically inconclusive. Post hoc analyses were intentionally not conducted because significance testing might produce misleading outcomes as a result of the small sample size.

To produce an effective yet parsimonious linear model, only the predictor variables whose zero-order correlation with the outcome measure exceeded .20 were selected as candidate antecedents (see Figures A.3 and A.4 in the Appendix). When the variables thus entered were strongly associated with one another, only one representative variable was kept in the model, and all other covariates were removed to lower potential multicollinearity. The survivors were usually the ones with the largest amount of variance compared with the rest. For example, subject-matter experts tended to be affiliated with universities (r = .76) and clinics (r = .70), and university associates often held positions in a clinic (r = .60). In this case, only subject-matter experts remained in the model because that variable brought the largest variance to the testing (subject-matter experts = 81.8, university = 6.55, clinic = 4.20). Accordingly, SS114, conflict of interest, duration in hours, and subject-matter expert entered the equation simultaneously to predict the approval rates. The independent variables were standardized to further reduce multicollinearity and ultimately to secure the reliability of parameter estimates. Cook's distance statistic identified no outliers (0.00 ≤ ds ≤ 0.13). Multicollinearity remained minimal, with the highest variance inflation factor reaching only 1.35, well below the threshold of concern of 5 (Cohen, Cohen, West, & Aiken, 2003).
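The screening steps described in this note can be sketched in code. The following is a minimal illustration only, not the authors' actual procedure or software: the function names, the use of the absolute value of r, the collinearity cutoff of .60, and the toy data are all assumptions introduced here. Variance inflation factors are taken from the diagonal of the inverse of the predictor correlation matrix, which is one standard way to compute them.

```python
from math import sqrt
from statistics import mean, stdev, variance


def pearson_r(x, y):
    """Zero-order (Pearson) correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_predictors(predictors, outcome, r_min=0.20, r_collinear=0.60):
    """Keep predictors with |r| > r_min against the outcome, then drop any
    candidate strongly correlated with an already-kept, higher-variance one.
    Both thresholds are assumptions for this sketch."""
    candidates = [name for name, values in predictors.items()
                  if abs(pearson_r(values, outcome)) > r_min]
    # Visit candidates from largest to smallest variance so that the
    # highest-variance member of a collinear cluster is the survivor.
    candidates.sort(key=lambda name: variance(predictors[name]), reverse=True)
    kept = []
    for name in candidates:
        if all(abs(pearson_r(predictors[name], predictors[k])) < r_collinear
               for k in kept):
            kept.append(name)
    return kept


def standardize(values):
    """Convert raw scores to z scores (mean 0, sample SD 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small matrix."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(predictors):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    names = list(predictors)
    corr = [[pearson_r(predictors[a], predictors[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Toy data, invented purely for illustration (not the study's data).
outcome = [1, 2, 3, 4, 5, 6]
toy_predictors = {
    "expert": [0, 10, 20, 30, 40, 50],
    "university": [0, 1, 2, 3, 4, 6],
    "noise": [1, 3, 3, 1, 1, 3],
}
kept = screen_predictors(toy_predictors, outcome)
```

In the toy data, "university" is nearly collinear with "expert" and has far less variance, so only "expert" survives, which mirrors the logic of the subject-matter expert example. Cook's distance is omitted here because it requires fitting the full regression model.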

η² = .05 has been considered a small but nontrivial effect, and η² = .12 often represents a moderate effect in social science (Cohen, 1992). It is important to note that the current model explains more variance in approval rates (R² = .40, ) than the preceding regression model (R² = .36, ) even with one fewer predictor (see Figure A.9).
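For readers unfamiliar with the effect size named in this note, η² in a one-way design is the between-groups sum of squares divided by the total sum of squares. The sketch below shows that arithmetic under stated assumptions: the function name and the toy groups are hypothetical and do not reproduce the study's data or analysis.

```python
from statistics import mean


def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2
                     for group in groups)
    return ss_between / ss_total


# Toy groups, invented for illustration only.
toy_effect = eta_squared([[1, 2, 3], [2, 3, 4]])
```

With these toy groups, the between-groups sum of squares is 1.5 and the total is 5.5, so η² is about .27.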

Additional information

Notes on contributors

S. Scott Graham

S. Scott Graham is the Director of the Scientific and Medical Communications Laboratory and a member of the English Department at the University of Wisconsin-Milwaukee. His research is devoted to exploring deliberation among technical experts and public stakeholders in matters of scientific and medical policy.

Sang-Yeon Kim

Sang-Yeon Kim is an assistant professor in the Department of Communication, University of Wisconsin-Milwaukee. His research is focused on social influence, persuasion, construct validation, and cross-cultural communication.

Danielle M. DeVasto

Danielle M. DeVasto is a PhD student in the rhetoric and composition program in the Department of English at the University of Wisconsin-Milwaukee. Her research interests reside at the intersections of visuality, rhetoric, science, and uncertainty.

William Keith

William Keith is a professor of Rhetoric at the University of Wisconsin-Milwaukee.
