Theory, Contexts, and Mechanisms

Challenges Solving Science Tasks with Text–Picture Combinations Persist beyond Secondary School

Pages 759-783 | Received 16 Apr 2019, Accepted 11 Mar 2020, Published online: 24 Jun 2020

Abstract

Combinations of text and different types of pictures are commonplace in biology as in science in general. The single representations (i.e., text, picture) constituting a text–picture combination may contain redundant or complementary information. The ability to identify and integrate information in different kinds of text–picture combinations is indispensable for engaging in science and is normatively expected to be acquired in school. In this experimental study, which was not preregistered in an independent institutional registry, N = 240 undergraduate students worked on 2 constructed-response biology tasks originating from authentic final exams to obtain the higher education entrance qualification. The material carried equivalent information between conditions but consisted of redundant or non-redundant text–picture combinations or only texts. Analysis of variance revealed negative effects of the depictive representations on students’ performance, but these effects only occurred in 1 of the 2 tasks: Students in the text-only group reached the highest mean score, those who worked with the non-redundant text–picture combinations the lowest mean score. Implications for science education are discussed.

Introduction

Combinations of texts and pictures that together depict a particular subject matter are ubiquitous in everyday life. Such combinations are also widely used in scientific contexts (e.g., Bowen & Roth, 2002; Lemke, 1998), for example a microscopic photo or a line graph with accompanying text. The ability to independently identify and integrate information from text–picture combinations enables an individual to participate in societal discourse and to make reasonable decisions and is thereby an important aim of schooling (e.g., Krajcik & Sutherland, 2010; Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland, 2005). More specifically, students intending to enter a science career who enroll in science majors in tertiary education need to identify and integrate relevant information in scientific text–picture combinations, regardless of how the information is presented, as part of their jobs and their studies. Identifying and integrating information in text–picture combinations is therefore an important receptive part of communication in science. Despite the prevalence of these combinations in science, many students, even those who have graduated from secondary school, struggle with them, especially with pictures in text–picture combinations (Cromley et al., 2010). Neglecting pictures is especially problematic when the information conveyed by the picture is not repeated in the text.

When presented in combination, text and picture may contain complementary or equivalent information; the combinations thereby differ in their redundancy. In the context of learning, educational psychologists have established seemingly contradictory effects of redundancy: On the one hand, Mayer (2001) showed that learning from text and picture is superior to learning from text alone. On the other hand, redundancy of text and picture may increase cognitive load, which interferes with learning due to limited working memory capacity (Sweller, 2005). Similar mechanisms are expected to operate when a task at hand needs to be comprehended. Overall, research has yet to establish when redundancy helps and when it hinders. Specifically, it is not yet clear how redundant and non-redundant text–picture combinations in constructed-response items affect students’ ability to solve these items. Testing this with university students who have completed school sheds light on how well school prepares students to process text–picture combinations that differ in their redundancy. University students are normatively expected to be able to process all kinds of text–picture combinations, and deficient science communication skills likely impede their academic success in tertiary education.

We therefore compared students’ test performance in constructed-response science items when working with non-redundant or redundant text–picture combinations or text-only material. This comparison provides information on the extent to which students use or possibly disregard pictorial and textual information as a function of the text–picture combinations’ redundancy. In addition, it demonstrates to what extent redundant pictures added to text affect test performance. Our experiment thus gauges the effectiveness of current teaching with regard to students’ receptive science communication skills, and its results might inform new policy.

Theoretical Framework

Types of External Representations and their Importance in Biology Contexts

Representations are objects that stand for something else (Schnotz, 2002). External representations (ERs) are divided into descriptive representations and depictions (Schnotz, 2002). Descriptive representations consist of symbols characterized by an arbitrary structure; conventions determine which object a symbol represents. Typical examples are texts and mathematical equations (Schnotz, 2002). Depictive representations, however, consist of iconic signs. Whereas realistic pictures, such as photographs and drawings, are similar to the represented object, logical pictures, for example graphs and diagrams, are more abstract (Schnotz, 2002). Combinations of two or more single representations are referred to as multiple external representations (MERs).

In the discipline of biology, it is common to combine different types of ERs (Tsui & Treagust, 2013), for example texts and pictures; that is, MERs are highly prevalent in biology. The use of depictive representations is highly important for communicating within the discipline of biology, for example when representing biological phenomena on the macro or micro level (Eilam & Gilbert, 2014), such as a plant and its different cell types. Pictures can visualize biological phenomena that cannot be seen by the naked eye (e.g., at the submicroscopic level). Logical pictures allow scientists to economically summarize and present large amounts of data (Roth et al., 1999). In sum, depictive representations added to text are indispensable for discovering and explaining findings in the discipline of biology and are thus omnipresent in scientific biology publications and course material of biology studies at universities (Arsenault et al., 2006; Bowen & Roth, 1998; Roth et al., 1999). Therefore, students, especially those aspiring to a science career, should have learnt how to process all kinds of text–picture combinations by the end of secondary school.

Processing MERs

When trying to understand an MER, individuals construct an internal mental model in their cognitive system. According to the integrated model of text and picture comprehension (Schnotz, 2014; Schnotz & Bannert, 2003), a sensory register conveys the information of external representations (i.e., texts and pictures) to the working memory. Here, text is first processed in a descriptive subsystem and then in a depictive one; the opposite is assumed to apply to pictures. Individuals also integrate prior knowledge stored in their long-term memory when constructing an internal mental model (Schnotz, 2014).

According to Mayer’s multimedia principle, an MER constituted by text and picture promotes better comprehension than text-only learning material (Mayer, 2001). This is in line with Paivio’s dual coding theory, which also assumes two different cognitive subsystems: one for verbal information such as texts and one for nonverbal information such as depictive representations (Paivio, 1986). Provided that students access all types of representation, MERs constituted by text and picture would thus engage both subsystems in their processing. As an important benefit of MERs, Ainsworth (2006) stated that in cases where inserting all information in one single representation would be too complicated, complementary information could be conveyed by more than one single representation. Empirically, undergraduate students working with text–picture combinations on three tasks outperformed students provided only with textual material (0.57 ≤ d ≤ 0.80; Eilam & Poyas, 2008).

However, not every depictive representation added to the text helps identify and process information. For instance, it is assumed that task-inappropriate types of depictive representations may interfere with constructing a task-appropriate internal mental model (Schnotz & Bannert, 2003). Moreover, it appears that students do not fully use pictures’ potential: Cromley et al. (2010) showed that undergraduate students working with texts and pictures dealing with biology topics often just skimmed the pictures or skipped over them. Students who sufficiently focused on pictures, however, employed a higher proportion of high-level cognitive activities (Cromley et al., 2010). Not paying sufficient attention to pictures will be particularly problematic when the depictive information is not repeated in the text; no adequate mental model can be constructed due to the lack of depictive information.

Redundancy in MERs

As already mentioned, the representations constituting an MER may offer complementary or equivalent information; therefore, MERs differ in terms of redundancy and students encounter somewhat differing demands when processing MERs. Ainsworth (1999) distinguishes two subclasses of complementary MERs: those consisting of single representations with totally different information and those consisting of single representations with partially shared information. However, completely redundant MERs also exist. Teachers, test developers, or researchers presumably do not as a rule consciously apply design principles in terms of redundancy when constructing MERs. To advise which principles to adhere to, the effects of redundancy need to be better understood; such principles would also potentially enhance the practical utility of MERs in learning contexts.

Sweller’s redundancy principle in multimedia learning draws attention to the fact that redundant material unnecessarily increases working memory load and could thereby compromise information processing and transfer to long-term memory (Sweller, 2005). For novel information to be stored in long-term memory, it first needs to be processed in the working memory, but working memory capacity is limited. According to Mayer’s multimedia principle, adding pictures to text in instructional material has positive effects compared to presenting text alone (Mayer, 2001). He summarized nine studies that demonstrated this positive effect regarding students’ test performance after studying instructional material on the functional principle of, for example, pumps, brakes, or lightning, with a moderate median effect size of 0.67 for the retention of steps in the presented process and a large median effect size of 1.50 regarding problem-solving transfer. Mayer’s cognitive theory of multimedia learning also includes considerations on cognitive load. For example, redundant texts added to narrations and pictures should be avoided to reduce cognitive load caused by text–picture material (Mayer, 2001).

Several experimental studies revealed positive effects of redundancy in MERs. Additional redundant pictures in the stem of multiple-choice science items reduced item difficulty as compared to text-only items when the item stem presented information essential for solving items (d = 0.53; Lindner et al., 2018). Saß et al. (2012) also observed positive effects on students’ test performance when combining redundant pictures in the stem, in the answer options, or in both with texts in multiple-choice items (η² = .03 and .04, corresponding to ds of approximately 0.35 and 0.41). A study on university students regarding comprehension of a pulley system’s kinematics showed that students who worked with redundant text–picture material outperformed students who worked with only text or pictures (d = 1.69 and 0.98, respectively; Hegarty & Just, 1993). However, it should be noted that only 47 students participated in that study.
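The correspondence between η² and d quoted above can be checked with a short calculation. The following is a minimal sketch that assumes the common two-group approximation d = 2·sqrt(η² / (1 − η²)); the source does not state which conversion was used, and the function name is purely illustrative.

```python
import math


def eta_squared_to_d(eta_sq: float) -> float:
    """Convert eta squared to Cohen's d via f = sqrt(eta^2 / (1 - eta^2)), d = 2f.

    Assumption: two groups of equal size. This is a common approximation and
    not necessarily the exact conversion behind the values quoted in the text.
    """
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    f = math.sqrt(eta_sq / (1.0 - eta_sq))  # Cohen's f
    return 2.0 * f


for eta_sq in (0.03, 0.04):
    print(f"eta^2 = {eta_sq:.2f} -> d is roughly {eta_squared_to_d(eta_sq):.2f}")
```

Under this assumption, the two η² values map to the approximate ds given in the paragraph above.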

On the other hand, Bobis et al. (1993) observed a negative effect of redundancy in instructional material on paper-folding activities. Performance of students provided only with diagrammatic instructions was superior to that of students who received additional redundant texts; the authors explained this difference through an increased cognitive load in the redundancy group. The performance of students who only received text was not significantly different from that of students in the redundancy condition. Similarly, Pociask and Morrison (2008) observed a negative effect of redundant compared to non-redundant text–picture material presented during the learning phase (approximately d = 0.81) when 41 university students worked on instructional units regarding orthopedic physical therapy.

These studies have several characteristics in common, but some of them also differ in their requirements, which might explain the different findings. The studies compared the effect of redundant text–picture combinations (MERs) with single types of representations (either texts or pictures). Consequently, the amount of test material differed between groups, allowing for other explanations of mean test score differences than the MERs’ redundancy. In their low-powered study, Pociask and Morrison (2008) compared redundant and non-redundant MERs, but because redundant features were eliminated, the amount of test material also differed between the groups. Saß et al. (2012) and Lindner et al. (2018) placed pictures into multiple-choice items and generated redundancy based on the text. This kind of redundant text–picture combination is specific to assessment and does not reflect MERs in the scientific discourse; it therefore lacks ecological validity in this science-related context. The presented pictures were less extensive than, for example, those of Pociask and Morrison (2008). Many previous studies used multiple-choice items or practical tasks rather than constructed-response items, potentially placing different demands on working memory. The material in previous research combined a certain type of depictive representation with text, which does not reflect the combination of diverse depictive representations in the authentic scientific discourse. Moreover, in most previous studies, participants were primary or lower secondary school students (but see Hegarty & Just, 1993; Pociask & Morrison, 2008). Yet, all studies suggest that students did allocate some of their attention to pictures when they were presented with text–picture combinations.

Purpose of This Study

Since MERs are an integral part of communicating in science, the ability to identify and integrate information in different kinds of MERs is indispensable for engaging in science. An aim of science education in school is for all students to acquire this ability. Students who want to enter a career in science are especially required to process MERs, regardless of how the information is presented. In terms of MERs’ redundancy, this means that students should be able to identify and integrate information presented in redundant or non-redundant MERs after finishing upper secondary school.

In this experimental study, we therefore analyzed undergraduate students’ test performance on constructed-response biology tasks consisting of redundant or non-redundant MERs with equal amounts of material across conditions (Footnote 1). We tested undergraduate students because they have already finished upper secondary school and the results give an insight into how well science education in school enabled them to work with different MERs in a biological context. Unlike in the condition with redundant MERs, students working with non-redundant MERs depended on identifying and integrating depictive information to solve the tasks. To gain further insights on how students presented with redundant MERs draw upon the information conveyed by the depictive representations as opposed to relying on the text, we added a text-only group (i.e., a group containing no depictive representations). This third experimental condition allowed us to examine whether redundant depictive representations actually help the students identify task-relevant information in comparison to text alone. Comparisons between experimental conditions allow for inferences about students’ utilization of the material and thus further our understanding of how students process multiple external representations. Such knowledge is necessary to deduce how best to foster students’ ability to independently identify and integrate information from MERs. We used test material based on final exam tasks to obtain the higher education entrance qualification, the German Abitur, to relate to authentic biology education. In this study, we investigated the following research questions (RQs):

RQ1: How does redundancy in MERs (text–picture combinations with redundant as compared with complementary information) affect undergraduate biology students’ performance in a biology test?

RQ2: Do depictive representations accompanied by text reiterating task-relevant depictive information units lead to undergraduate biology students performing better in a biology test as compared with text-only material? (Footnote 2)

Method

Participants

In light of previous research, we performed an a priori power analysis for detecting an effect of medium size (f = 0.25) by means of a one-way analysis of variance (ANOVA; Footnote 3) with a power level of 0.95 (Faul et al., 2007). Our sample consisted of 245 university bachelor students majoring in biology in Germany, falling only seven students short of the required sample size of N = 252. Scores on the dependent variable were missing for five students, who were excluded from the sample. Thus, the remaining sample size is N = 240 (mean age M = 21.8 years, SD = 3.4); 70.0% of the students were female.

Procedure

All data collection was conducted by the first author. At the beginning of the session, students were informed about the scope of the study and that it was part of a dissertation project. They were informed that all data collection would be completely anonymous and that they would face no adverse consequences if they declined to participate or withdrew from the study at any time. Further, students were informed that participation in the study was neither a course requirement nor an opportunity for extra credit and that test performance would have no effect on their grades. Potential participants were made aware that they could leave before the study began or at any later time, which a few students did. The willingness of the remaining students to participate was implied through survey completion. Dispensing with written consent obviated recording participants’ names. In a paper-and-pencil test, all students independently worked on two biology tasks in a constructed-response format (described in more detail in the following subsection). The total time allowed for both tasks was 45 min. The administrator terminated this part of the session after the designated test duration. To avoid effects of task order, we counterbalanced the order of the two biology tasks. For each task, students noted the time when they began and completed it.
Within classes, the students were randomly assigned to one of three experimental groups, so that class-level characteristics shared by students attending the same class were distributed equally across experimental groups. In the material of the no-redundancy group (n = 79), all information relevant for solving the task was distributed between texts and depictive material; the redundancy group (n = 79) received material in which all relevant information of the depictive material was repeated in the text; and the material of the text-only group (n = 82) consisted solely of texts, which were nearly the same as the text of the redundancy group (for details see subsection “Material”). The tasks were phrased identically across groups and made no mention of the different experimental conditions. This also means that participants in the two conditions containing depictive material as well as text were not explicitly instructed that the pictures might contain task-relevant information. The test imposed no requirements on students beyond those that are commonplace for biology majors. In a second step, students answered a questionnaire consisting of covariates and questions regarding demographic data. Students could take part in a draw to win diverse vouchers with a value between €5 and €50. Moreover, we rewarded participation with sweets. At the end of the study, students were fully debriefed.
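Random assignment within classes, as described above, can be sketched as a blocked randomization with the class as the block. This is only an illustration: the source does not say how the assignment was carried out, so the shuffle-and-deal scheme, the seed, the function name, and the class rosters below are all assumptions.

```python
import random
from collections import Counter

GROUPS = ("no-redundancy", "redundancy", "text-only")


def assign_within_class(student_ids, rng):
    """Randomly assign the students of one class to the three groups.

    The roster is shuffled and students are dealt out in turn, so group
    sizes within a class differ by at most one (an assumed scheme).
    """
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: GROUPS[i % len(GROUPS)] for i, sid in enumerate(shuffled)}


rng = random.Random(2020)  # fixed seed only to make the sketch reproducible
classes = {"class_1": range(1, 11), "class_2": range(11, 18)}  # hypothetical rosters
assignment = {name: assign_within_class(ids, rng) for name, ids in classes.items()}

for name, mapping in assignment.items():
    print(name, dict(Counter(mapping.values())))
```

Because every class contributes students to every group, differences between classes cannot masquerade as differences between experimental conditions.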

Material

The tasks were based on Abitur exam tasks from 2008 (Task A) and 2009 (Task B) from the German federal state North Rhine-Westphalia (Brixius et al., 2012; see Footnote 4). The Abitur is the higher education entrance qualification in Germany, the school leaving certificate obtained at the end of secondary school. The reasons for using Abitur exam tasks were twofold: Their authenticity ensures that the tasks do not require abilities beyond those which students are normatively expected to possess at the end of secondary school, and solving the tasks requires students to analyze and discuss complex biological topics while working with MERs. We slightly adapted the task material according to conditions; the task material in the no-redundancy group was almost the original task material from the Abitur exam tasks. While systematically varying the task material between the three experimental groups, we kept the amount of textual material, content, and the question to be answered in an open-ended format constant.

In Task A, students were asked to explain the molecular genetic cause of a mitochondrial disease and its disease pattern (Brixius et al., 2012, pp. 2008-13, 2008-15 to 2008-17). The students received textual material and, depending on the group they were assigned to, also depictive representations to solve the task. Task A contained three texts, and its depictive representations comprised two biology-specific schematic drawings and a graph. For example, one of the schematic drawings showed the mitochondrial DNA of humans. We present a demonstration example, which closely resembles this particular schematic drawing and accompanying text of the actual material of Task A, to illustrate how we created redundancy (Figures 1–3); the depictive representations were identical between the two conditions that provided text–picture combinations (Footnote 5). To be able to solve Task A, the specific information unit which had to be taken from this particular figure is the base position for the coding of a particular tRNA. This is therefore a depictive information unit. In the no-redundancy group, this information unit was exclusively provided by the figure (see blue marking in the picture of Figure 1). In the redundancy group, the respective base position was also given in the text (see blue markings in the picture and the text of Figure 2). In the text-only group, this information unit was exclusively provided by the text; no figures were included in the material (see blue marking in Figure 3). The text section associated with this particular figure also contained task-relevant information that was only in the text in all three groups (see red markings in Figures 1–3). We refer to these as descriptive information units. The second biology-specific schematic of Task A showed a section of the genome of a healthy person and a person suffering from the genetic disease depicted as sequences of letters representing nucleobases of the mitochondrial DNA.
The third depictive representation of Task A was a scatter plot with 10 data points and a compensation curve. Because in the redundancy group the relevant depictive information was repeated in the text, we had to slightly extend the length of the text in the no-redundancy group to keep the text length of these two groups constant. Equivalence of text length was achieved by content-neutral extension. The text-only group got all information in textual form; we therefore refer to information units that are conveyed by depictive representations in the other two conditions as originally depictive information.

Figure 1. Exemplary test material closely resembling parts of the actual material for the no-redundancy group in Task A. Textual information that was required to solve the task is highlighted in red, relevant depictive information is highlighted in blue. No highlights were included in the material of the actual task provided to the students.

Figure 2. Exemplary test material closely resembling parts of the actual material for the redundancy group in Task A. Textual information that was required to solve the task is highlighted in red, relevant (originally) depictive information is highlighted in blue. No highlights were included in the material of the actual task provided to the students.

Figure 3. Exemplary test material closely resembling parts of the actual material for the text-only group in Task A. Textual information that was required to solve the task is highlighted in red, relevant originally depictive information is highlighted in blue. No highlights were included in the material of the actual task provided to the students.

In contrast to Task A, Task B concerns evolution and ecology rather than genetics. In Task B, students had to explain the specific adaptations of fire salamander larvae to a certain woodland’s environmental factors (Brixius et al., 2012, pp. 2009-1 to 2009-3). Task B’s material contained two texts, and its depictive representations comprised a 2 × 2 table containing four values, a column chart containing two columns, and a scatter plot with 13 data points. The information density of these depictive representations was thus lower than that of the figures of Task A. Furthermore, the depictive representations of Task B were less biology-specific than two of the figures of Task A, and it is likely that students knew this type of depictive representation from various disciplines. Overall, the specific depictive representations in Task B might make it easier to solve, and thus more likely to be solved, than Task A. Adapting the material to experimental conditions followed the same procedure as for Task A.

Scoring

A category scheme was developed for each task to score students’ responses based on the suggested solutions provided in association with the tasks (Brixius et al., 2012). We identified information units relevant for solving the task both in the figures and in the texts of the original task material. Depending on the group students were assigned to, they may only have been able to retrieve relevant information units from the text (see subsection “Material”). Identification of a task-relevant element was scored not only when the information was named verbatim, but also when students’ answers implied that the information unit must have been extracted (Footnote 6). If neither was the case, students received no point for the information unit in question. The first author rated all data according to the category schemes. To ensure the objectivity of the judgment, a random 25% of student responses for each task were independently rated by a second rater. We then determined Cohen’s kappa as a coefficient to measure the degree of raters’ agreement (Cohen, 1960). Inter-rater reliability was satisfactory, ranging from Cohen’s κ = .62–.91 for the nine information units of Task A and Cohen’s κ = .64–1.0 for the nine information units of Task B. This strength of agreement is substantial (κ = .61–.80) to almost perfect (κ = .81–1.0) following the classification of Landis and Koch (1977). Rating of Task B responses revealed one low kappa value (κ = .42), although raters’ agreement was high (89.1%), a paradox known to occur when assignments to a category’s expressions are very unequal (Feinstein & Cicchetti, 1990). The scoring was based on students’ utilization of task-relevant elements of the materials; we refer to those elements as relevant information units. The category scheme for the demonstration example is shown in Supplementary Material Table S1.
In each of the two actual tasks, students were required to use a total of five information units from the figures to solve the task, and four information units from the text. Thus, students could score a maximum of nine points in each task. Additionally, we calculated subscores for each task considering the five depictive and the four descriptive information units separately to investigate whether the different types of representations differentially drive observed effects. Due to our test design, redundancy group students got all depictive information units also in textual form, and students in the text-only group worked only with text. The information units underlying the subscore for depictive information units were thus in fact differentially represented across groups.
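The kappa paradox mentioned above (high percentage agreement alongside a low κ when one category dominates) can be made concrete with a small calculation. The sketch below implements Cohen's kappa for two raters from its standard definition; the ratings are invented for illustration (1 = information unit credited, 0 = not credited) and are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes of the same responses."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of the raters' marginal proportions.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / n**2
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


def percent_agreement(rater_a, rater_b):
    """Share of responses (in percent) on which both raters gave the same code."""
    return 100.0 * sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


# Hypothetical ratings of 20 responses; both data sets contain two disagreements.
balanced_a = [1] * 9 + [0] * 9 + [1, 0]
balanced_b = [1] * 9 + [0] * 9 + [0, 1]
skewed_a = [1] * 17 + [1, 0, 0]  # almost every response is credited
skewed_b = [1] * 17 + [0, 1, 0]

for label, a, b in (("balanced", balanced_a, balanced_b), ("skewed", skewed_a, skewed_b)):
    print(f"{label}: agreement = {percent_agreement(a, b):.1f}%, kappa = {cohens_kappa(a, b):.2f}")
```

Both invented data sets show 90% agreement, yet κ drops from .80 in the balanced case to roughly .44 in the skewed case, because chance agreement is much higher when nearly all responses fall into one category.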

Control Measures

Cognitive Abilities

Cognitive abilities of students were assessed using the nonverbal scale N2 of the KFT 4-12 + R (Heller & Perleth, 2000), which is a German version of the Cognitive Abilities Tests of Thorndike and Hagen (1971). This scale consists of 25 items, each presenting a pair of meaningfully matching figures, a third figure, and five answer options. Students had to choose the one figure among the answer options that relates to the third figure as the second figure relates to the first. The items were summed up to form a composite score with an acceptable internal consistency, Cronbach’s α = .78.

Working Memory Capacity

Students’ working memory capacity was measured using an updating task based on the spatial-figural updating task of Wilhelm et al. (2013). In each trial of this test, between 2 and 5 differently colored rectangles were repeatedly presented within a 3 × 3 matrix. All colors appearing in a trial were presented together at the beginning of this trial. Then single colored rectangles appeared at different positions within the matrix for a number of times unknown to the students. The order of rectangles was generated at random but was kept constant for all tested classes. Students had to remember the last position of each color. Unlike Wilhelm and colleagues, who presented the test to each participant individually on a computer, we presented it on a large screen to all students of a class. The test consists of 11 trials; an overall score was obtained by averaging the proportion of correctly remembered color positions across trials. Internal consistency was measured using trial scores and showed a poor value of Cronbach’s α = .52.
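The overall score and the internal consistency described above can be sketched as follows. The code uses the standard formula for Cronbach's α, α = k/(k − 1) · (1 − Σ item variances / variance of the sum score), applied to per-trial proportions; the tiny data set (four students, three trials instead of eleven) and the function names are invented for illustration.

```python
from statistics import mean, variance


def overall_score(trial_proportions):
    """Average the proportion of correctly remembered positions across trials."""
    return mean(trial_proportions)


def cronbach_alpha(scores):
    """Cronbach's alpha; `scores` holds one list of trial (item) scores per student."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 students and 2 trials, equal row lengths")
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical proportions correct per trial (rows = students, columns = trials).
trial_scores = [
    [1.00, 0.50, 0.75],
    [0.50, 0.50, 0.25],
    [1.00, 1.00, 0.50],
    [0.50, 0.00, 0.50],
]

for row in trial_scores:
    print(f"overall score = {overall_score(row):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(trial_scores):.2f}")
```

A low α, as reported for the updating task, indicates that the trial scores do not vary together closely enough for their average to be treated as a reliable measure.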

Additional Covariates

At the end of the session, we asked for students’ secondary school leaving certificate grade point average (GPA). In Germany, passing grades range between 1 and 4, whereby 1 is the highest grade. Students also reported their last biology grade in school, their sex, and their age. Furthermore, they indicated whether they had taken an Abitur exam in biology (Footnote 7) and whether they already knew one or both tasks.

Data Analysis

We calculated ANOVAs and analyses of covariance (ANCOVAs) to answer our research questions. We conducted all analyses with IBM SPSS Statistics (version 24, IBM Corporation, Citation2016). Because we sampled students from different classes to obtain a sufficiently large sample, individual observations were nested in classes; random assignment to groups prevented confounding of class and group. In addition to the fixed effect of group, we therefore modeled a fixed effect of class, resulting in a two-way ANOVA. We followed up on significant main effects of the factor experimental group by conducting multiple comparisons: Bonferroni-corrected post hoc tests answered our main research questions, namely whether text–picture combinations differentially affected students’ task performance when they contained complementary or redundant information (RQ1) and whether depictive representations accompanied by redundant text improved performance as compared with text-only material (RQ2). To be comprehensive, we also report the Bonferroni-corrected post hoc tests between the no-redundancy group and the text-only group.
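
As an illustration of the Bonferroni correction used for the post hoc tests, the sketch below multiplies each unadjusted p value by the number of comparisons and caps the result at 1, which is one common way of reporting adjusted p values. The function name and the input values are hypothetical; they are not results from this study.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p values: p * m, capped at 1.0,
    where m is the number of comparisons in the family."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Three pairwise comparisons among three groups (hypothetical raw p values):
# no-redundancy vs. redundancy, redundancy vs. text-only,
# no-redundancy vs. text-only.
raw = [0.010, 0.040, 0.500]
adjusted = bonferroni_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

With three comparisons, a raw p value must fall below .05 / 3 ≈ .0167 for the adjusted value to stay below .05, so only the first hypothetical comparison would remain significant.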

Results

Baseline Equivalence

Two-way ANOVAs with experimental group and classroom as fixed effects were conducted to ascertain that randomly assigning students to the no-redundancy, the redundancy, or the text-only group had not led to significant differences between the groups with regard to cognitive abilities, working memory capacity, the last biology grade in school, and students’ GPA of the school leaving certificate. Participants in the three groups did not significantly differ with regard to these variables (cf. Table 1). Working memory was not included in further analyses because of our measure’s low reliability. Moreover, the different experimental conditions did not result in statistically significant differences regarding time-on-task. We further established, by conducting Pearson’s chi-square tests, that the experimental groups did not differ in terms of students’ sex, whether they had taken an Abitur exam in biology, and whether they knew at least one task; none of the tests was statistically significant, χ2(2) = 0.06, p = .971; χ2(2) = 0.32, p = .852; χ2(2) = 0.82, p = .662, respectively.
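
The p values of these chi-square tests can be checked by hand, because for 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form p = exp(−χ2 / 2). The sketch below applies this identity to the reported statistics; the function name is hypothetical, and small deviations in the third decimal are to be expected because the reported χ2 values are rounded.

```python
import math


def chi2_p_value_df2(chi2):
    """Upper-tail p value of a chi-square statistic with exactly 2 degrees
    of freedom. The closed form exp(-x / 2) holds only for df = 2."""
    if chi2 < 0:
        raise ValueError("chi-square statistics are non-negative")
    return math.exp(-chi2 / 2)


# Reported statistics: sex, Abitur exam in biology, task familiarity.
reported = {0.06: 0.971, 0.32: 0.852, 0.82: 0.662}
recomputed = {chi2: chi2_p_value_df2(chi2) for chi2 in reported}
```

For other degrees of freedom a statistics library would be needed, since no comparably simple closed form exists for the general case.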

Table 1. Descriptive statistics and ANOVA results for the covariates.

Effects of Group on Biology Test Performance

Effects of Group by Task

We calculated separate two-way ANOVAs for the two biology tasks to investigate whether the group, more specifically, the kind of material students worked with, had an effect on students’ performance while accounting for the fact that students had been drawn from different classes. Results for Task A revealed that there was a statistically significant effect of the group, F(2, 228) = 23.32, p < .001, ηp2 = .170. Post hoc tests with Bonferroni correction showed that the three mean test scores were significantly different from each other: Students who had to identify relevant information in depictive and textual representations (no-redundancy group; M = 3.44, SD = 2.49) reached a significantly lower test score than students who worked exclusively with texts (text-only group; M = 6.24, SD = 2.44) and students who got texts and additional pictures containing redundant information (redundancy group; M = 4.68, SD = 2.86) with p < .001, d = 1.13, and p = .005, d = 0.46, respectively; students of the text-only group also outperformed students of the redundancy group, p < .001, d = 0.59 (for detailed distribution of the data, see Supplementary Material S1). In contrast, a two-way ANOVA for Task B showed no statistically significant differences between the mean scores of the no-redundancy group (M = 5.90, SD = 2.13), the redundancy group (M = 6.15, SD = 1.63), and the text-only group (M = 5.95, SD = 1.60), F < 1, p = .669, ηp2 = .004. The main effect of classroom was statistically significant for both tasks signifying mean level differences between classrooms, F(3, 228) = 11.51, p < .001, ηp2 = .132 for Task A, and F(3, 228) = 8.60, p < .001, ηp2 = .102 for Task B. The interactions of experimental group and classroom were, however, not statistically significant, F(6, 228) = 1.35, p = .236, ηp2 = .034 for Task A and F(6, 228) = 1.97, p = .071, ηp2 = .049 for Task B.
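
The effect sizes reported above can be approximately recovered from the published statistics. The sketch below computes partial eta squared from an F value and its degrees of freedom, and Cohen's d from two group means and standard deviations. Two assumptions are made that the text does not confirm: the pooled standard deviation is computed as if the groups were of equal size, and d is based on the unadjusted means. The function names are hypothetical.

```python
import math


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from F and its degrees of freedom:
    (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d with a pooled SD that assumes equal group sizes
    (an assumption; group sizes are not given in this section)."""
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd


# Task A, main effect of group: F(2, 228) = 23.32.
eta_task_a = partial_eta_squared(23.32, 2, 228)

# Task A, pairwise comparisons based on the reported means and SDs.
d_text_vs_no_red = cohens_d(6.24, 2.44, 3.44, 2.49)  # text-only vs. no-redundancy
d_red_vs_no_red = cohens_d(4.68, 2.86, 3.44, 2.49)   # redundancy vs. no-redundancy
d_text_vs_red = cohens_d(6.24, 2.44, 4.68, 2.86)     # text-only vs. redundancy
```

Under these assumptions the recomputed values lie within rounding distance of the reported ηp2 = .170 and d = 1.13, 0.46, and 0.59.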

Effects of Group by Task and Type of Representation

Next, we considered the depictive versus textual representations of each task separately to elucidate whether information conveyed by these representations was differentially driving the effect. We conducted a set of two-way ANOVAs and post hoc tests with Bonferroni correction (see Figure 4). For both depictive and textual information in Task A, the result pattern was comparable to the effect of the experimental group observed for the entire task, but different in effect size, F(2, 228) = 29.27, p < .001, ηp2 = .204 and F(2, 228) = 13.55, p < .001, ηp2 = .106, respectively (for post hoc test results, see Supplementary Material Table S2). The difference between the no-redundancy group and the redundancy group for textual information units was, however, no longer statistically significant (see Figure 4). In contrast to results for the entire Task B, the main effect of group was statistically significant for the depictive information, F(2, 228) = 4.20, p = .016, ηp2 = .036; yet, no post hoc test with Bonferroni correction reached conventional levels of statistical significance. The mean subtest scores based on the textual information units of the no-redundancy group and the text-only group were, however, statistically significantly different, resulting in a statistically significant main effect, F(2, 228) = 3.14, p = .045, ηp2 = .027. The remaining simple comparisons for textual information units in Task B were not statistically significant (cf. Supplementary Material Table S2).

Figure 4. Mean subtest scores of different information units of Task A and Task B by group collapsed across classroom. The mean subtest scores are presented for relevant information units (originally) represented by figures or texts of Task A and for information units (originally) represented by figures and tables or by texts of Task B. Fig = Figure; Tab = Table. Error bars represent standard errors. All highlighted significance levels for pairwise comparisons were adjusted using Bonferroni correction. **p < .01; ***p < .001.

The main effect of classroom was statistically significant for both depictive and textual information in Task A, F(3, 228) = 11.49, p < .001, ηp2 = .131 and F(3, 228) = 7.69, p < .001, ηp2 = .092, respectively. The interactions of experimental group and classroom were, however, not statistically significant, F(6, 228) = 1.21, p = .301, ηp2 = .031 for depictive information in Task A, and F(6, 228) = 1.39, p = .219, ηp2 = .035 for textual information in Task A. Likewise for Task B, the main effect of classroom was statistically significant for both depictive and textual information, F(3, 228) = 7.65, p < .001, ηp2 = .091 and F(3, 228) = 4.57, p = .004, ηp2 = .057, respectively, and the interactions of experimental group and classroom were not statistically significant, F(6, 228) = 1.92, p = .078, ηp2 = .048 for depictive information in Task B, and F(6, 228) = 1.38, p = .222, ηp2 = .035 for textual information in Task B.

Analyses of Covariance Including Students’ Cognitive Abilities

To also account for the effect of individual students’ cognitive abilities on their test performance, we ran a set of ANCOVAs with students’ cognitive abilities as the covariate. Cognitive abilities had a statistically significant positive effect on students’ mean test scores for the entire Task A, F(1, 224) = 14.59, p < .001, ηp2 = .061. This effect also existed for the two subscores separately considering depictive or textual information units of Task A, F(1, 224) = 7.63, p = .006, ηp2 = .033 and F(1, 224) = 17.57, p < .001, ηp2 = .073, respectively. For Task B, however, the covariate cognitive abilities did not significantly affect students’ test performance beyond the effects of the two factors experimental group and classroom, neither for the entire Task B nor examining depictive or textual information units separately, F(1, 224) = 3.68, p = .056, ηp2 = .016, F(1, 224) = 2.88, p = .091, ηp2 = .013, and F(1, 224) = 1.66, p = .199, ηp2 = .007, respectively.

Notably, the addition of cognitive abilities did not substantially alter the group’s effects. The group effect on students’ test performance in Task A was still statistically significant after controlling for the effect of their cognitive abilities, F(2, 224) = 24.11, p < .001, ηp2 = .177 for the entire Task A; F(2, 224) = 28.16, p < .001, ηp2 = .201 for depictive information in Task A; F(2, 224) = 15.30, p < .001, ηp2 = .120 for textual information in Task A. The effect of experimental group remained non-significant for the entire Task B and statistically significant for the depictive information units, F < 1, p = .726, ηp2 = .003 and F(2, 224) = 3.66, p = .027, ηp2 = .032, respectively. The group effect became non-significant when textual information in Task B was considered, F(2, 224) = 2.97, p = .053, ηp2 = .026 (cf. Supplementary Material, Table S3 for descriptive data and—where appropriate—post hoc test results).

The main effect of classroom also remained statistically significant for the entire Task A and for both depictive and textual information in Task A when students’ cognitive abilities were controlled for, F(3, 224) = 12.38, p < .001, ηp2 = .142, F(3, 224) = 12.45, p < .001, ηp2 = .143, and F(3, 224) = 8.00, p < .001, ηp2 = .097, respectively. The interactions of experimental group and classroom were again not statistically significant, F(6, 224) = 1.16, p = .329, ηp2 = .030 for the entire Task A, F(6, 224) = 1.04, p = .399, ηp2 = .027 for depictive information in Task A, and F(6, 224) = 1.37, p = .226, ηp2 = .035 for textual information in Task A. The main effect of classroom was also statistically significant for the entire Task B and for both depictive and textual information in Task B, F(3, 224) = 7.76, p < .001, ηp2 = .094, F(3, 224) = 6.55, p < .001, ηp2 = .081, and F(3, 224) = 4.63, p = .004, ηp2 = .058, respectively. As was the case for Task A, the interactions of experimental group and classroom were not statistically significant, F(6, 224) = 1.66, p = .132, ηp2 = .043 for the entire Task B, F(6, 224) = 1.63, p = .141, ηp2 = .042 for depictive information in Task B, and F(6, 224) = 1.27, p = .275, ηp2 = .033 for textual information in Task B.

Discussion

In science communication, text–picture combinations are omnipresent. This is true for the communication of research findings, but also for the communication of science to the public. Anyone who wants to participate in society in a self-reliant way should therefore be able to identify and integrate information in different types of MERs, and this applies particularly to students who want to pursue a career in science. Students who have graduated from secondary school should be equipped to process all kinds of MERs.

Effect of the Task Material

Our study showed that notable differences in undergraduate students’ test performance might result from how the task material is presented. In Task A, students who had to identify and integrate information units in both descriptive and depictive representations reached the lowest test scores, far lower than students who could retrieve all information from the text (i.e., the redundancy group [see RQ1] and the text-only group). Students who exclusively worked with textual material performed best. When pictures with some redundant relevant information were presented in addition to nearly the same text, test performance was also markedly lower than in the text-only condition (see RQ2). Unexpectedly, however, the variation of the task material produced such effects on test performance only for Task A, the genetics task; students’ performance on Task B, the ecology and evolution task, was not consistently affected by the experimental manipulation. We discuss possible explanations why effects of the task material as observed for Task A failed to show for Task B in the following subsection.

The absence of a positive effect of text–picture material compared to exclusively textual material is contrary to most previous studies, but the studies’ material differed in meaningful ways. These differences may explain why the results of our study differ from those of previous studies (e.g., Hegarty & Just, Citation1993; Lindner et al., Citation2018; Saß et al., Citation2012). The way we created redundancy in our task material differed, for example, from the studies of Saß et al. (Citation2012) and Lindner et al. (Citation2018). Instead of repeating nearly all information units of the text in a depictive representation, we repeated task-relevant depictive information units in the text. The latter kind of redundancy is common in authentic scientific communication and therefore important for students who want to pursue a career in science. Although there is disagreement over whether scientific journals contain redundant text–picture combinations (Lemke, Citation1998; Roth et al., Citation1999), one can summarize that, when redundancy occurs, only relevant depictive information is repeated in the text, where it can help the reader correctly interpret the depictive representations (for graphs, see Roth et al., Citation1999). Furthermore, the depictive representations in our task material contained a greater amount of information than the multiple-choice items generated for the assessment of considerably younger students’ science achievement (e.g., Lindner et al., Citation2018; Saß et al., Citation2012). This means that students in our study had to exclude irrelevant information in depictive representations and identify the relevant information, a requirement that has been shown to elevate cognitive load and error rates in graph reading tasks (Strobel et al., Citation2018). In addition, multiple-choice items with little text make lower demands on mental model construction than the constructed-response items with considerable amounts of material in this study. Hegarty and Just (Citation1993) presented material with which a relatively simple process close to everyday life was to be reconstructed, whereas in our study the domain-specific content requirements were higher, which was also reflected in the depictive representations, particularly those of Task A.

In line with our results, several studies observed that students sometimes struggle with depictive representations. Research by Roth and colleagues revealed that students, including college students, have difficulties interpreting graphs (e.g., Bowen et al., Citation1999; Bowen & Roth, Citation1998; Roth & Bowen, Citation1999). The fact that students in our study did not sufficiently explore the depictive representations in MERs raises the question of what they expect when dealing with MERs. Because pictures are often decorative in science lessons, students may not regard pictures first and foremost as a necessary source of information. Cook (Citation2006) assumes that the less one knows about a topic, the more difficulties one experiences with visual representations of it. Prior content knowledge is necessary to select relevant information and to simultaneously hold more information in working memory. Although students in tertiary education should have knowledge of the content covered by Abitur exams, better knowledge of mitochondrial DNA’s structure or genetic disorders may have helped to identify relevant depictive information, especially in Task A. However, the observation that students who worked with texts alone outperformed students who worked with almost identical texts and additional redundant pictures in Task A implies that students did not simply disregard the depictive representations: the additional redundant pictures had a detrimental effect on students’ test performance. According to Paivio’s dual coding theory (Paivio, Citation1986), addressing both subsystems should have facilitated mental model construction. This was apparently not the case here: students actually performed worse when presented with redundant pictures in addition to the text (i.e., the redundancy group) than when presented merely with verbal information (i.e., the text-only group).

Contrary to what Sweller’s redundancy principle would suggest, the results of our study did not show a negative effect of redundant text–picture combinations (i.e., repeating task-relevant information from the picture in the text) compared to non-redundant text–picture combinations (where task-relevant information is split between pictures and text). While Sweller refers to learning, Bobis et al. (Citation1993) observed a negative effect of redundant instructional material on paper-folding activities in comparison to diagrams only, and explained this result by an increased cognitive load for participants in the redundancy group. However, requirements for executing prescribed actions are not directly comparable to requirements for solving constructed-response science items. Whereas our design precluded the potentially confounding influence of differentially extensive material between the no-redundancy and redundancy condition that limited previous research (Bobis et al., Citation1993; Pociask & Morrison, Citation2008), cognitive demands might still be higher when information from the verbal and visual subsystem must be integrated (no-redundancy group) than when all information can be extracted by a single subsystem (redundancy group). It is conceivable that students working with MERs in Task A focused on the texts and did not pay sufficient attention to pictures, which would be consistent with the study of Cromley et al. (Citation2010). Consequently, students of the no-redundancy group, who needed to identify information in the depictive representations to solve the tasks, reached lower test scores than students of the redundancy group, who were presented with the same depictive representations but could have relied completely on the text. Although the text of the text–picture material of both the no-redundancy and the redundancy group occasionally referred to the figure, the material of the no-redundancy group did not contain any content-related referential connections that might have prompted students to focus on relevant parts of the depictive representations, as the redundant textual information did in the redundancy group. Such prompting might help explain why the redundancy group outperformed the no-redundancy group. Previous research demonstrated that prompts, such as arrows and bold printing, between associated information in complementary text–picture materials increase science performance, presumably by facilitating text–picture integration (Saß & Schütte, Citation2016). Eye-tracking research could reveal whether students working with redundant MERs actually move from a particular text passage to the associated redundant part in the depictive representation.

Separating depictive and textual information revealed nearly the same pattern of results as occurred for the entire Task A, while effect sizes differed. Notably, the effect size of the group effect for depictive information was almost twice as large as that for textual information in Task A, which underlines the relevance of pictures in the presented MERs for test performance. With respect to depictive representations, the differences between groups were in line with the above assumptions. However, it was remarkable that, for originally textual information units that were provided by text in all groups, the participants working exclusively with textual material also outperformed the participants in both conditions working with MERs and correctly utilized more of those information units in their constructed responses. Original textual information was presented virtually identically in all three materials. A possible explanation might be higher cognitive demands for students of the no-redundancy group and the redundancy group, who had to split their attention between external representations and then integrate information of two subsystems. The depictive representations might have served as a distraction in the sense that they directed attention away from the text. In the absence of depictive representations, that is, in the text-only group, students did not have to split their attention between text and pictures and could thus focus on the text, where all relevant information was to be found.

A further explanation might be that negative affective-motivational effects of the complex biology-specific pictures in the material for Task A in the no-redundancy group and the redundancy group contributed to a lack of students’ attention to the depictive representations. One might speculate that students of these groups were discouraged when they looked at the pictures and spent less effort on comprehending task material than students of the text-only group; yet they spent comparable amounts of time working on the tasks.

Comparison of Results of Task A and B

Differences between the results of the two tasks might be explained by the depictive representations and the topic of each task. While Task A, the task on genetics, included two biology-specific depictive representations, Task B, the evolution and ecology task, did not. Task B contained two graphs and one data table and these types of representations are not per se biology-specific, but relatively common in everyday life. Consequently, students are likely to encounter them in out-of-school contexts as well as in other school subjects, thus having more learning opportunities for these types of representations. Therefore, it is conceivable that Task B seemed less intimidating and was easier for them to comprehend than Task A. Difficulties that occur when working with domain-specific depictive representations were also shown in several previous studies (e.g., Halverson et al., Citation2011; Novick et al., Citation2012). For example, Novick et al. (Citation2012) showed that undergraduate biology students have difficulties interpreting information in phylogenetic trees. In addition, the depictive representations of Task A contain a high density of information and only a small proportion of these information units was task-relevant and thus repeated in the text of the redundancy group. In contrast, depictive representations of Task B comprised fewer information units. Therefore, a higher proportion of depictive information units was task-relevant and was repeated in the text of the redundancy group. The biology-specificity and the high density of information may have led to the assumed distracting or discouraging effect of depictive representations of Task A that did not occur in Task B.

In terms of the tasks’ topics, genetics is often described as difficult, for example because of the need to switch between different levels of biological organization, such as cell and organism level (Knippels et al., Citation2005). Although students only had to switch between different subcellular levels in depictive representations of Task A, this requirement may have been a reason to perceive the task as difficult and to be discouraged by the depictive representations.

As expected, students’ cognitive abilities were significantly related to test performance. That an effect of individual students’ cognitive abilities was observed beyond the effects of experimental group and class for Task A but not for Task B might also indicate that Task A was more difficult. Importantly, the effect sizes of the statistically significant group effects did not notably change after controlling for cognitive abilities. MERs challenged students with high cognitive ability test scores as much as students with low scores, and general cognitive ability did not compensate for the lack of ability to identify and integrate relevant information in MERs. In fact, the effects of students’ cognitive abilities were small or of medium size and thus considerably lower than the effects we observed for group.

Strength and Limitations

To the best of our knowledge, our study was the first to compare the effects of redundant and non-redundant biological text–picture combinations and one of only a few studies that recruited students who already obtained their secondary school leaving certificate to investigate the effects of MERs’ redundancy. The kind of redundancy we created is similar to the redundancy that—if any—awaits students when they get involved in authentic scientific communication.

Both biology tasks can be considered ecologically valid because they have actually been used in Abitur exams. Unlike other studies, our tasks had a constructed-response format, which places higher cognitive demands on students than multiple-choice items. Multiple-choice items might allow excluding answer options based on some aspects’ incompatibility with one’s mental model. Every single multiple-choice item is usually far less complex than the Abitur tasks and each answer option may be collated with one’s mental model; moreover, the likelihoods of the answer options may be collated. Solving a constructed-response item, by contrast, requires test-takers to construct answers based on a correct mental model. At the same time, the authenticity of our tasks entailed a high selectivity and the constructed-response format constrained the test length to two tasks. Therefore, task topic is confounded with characteristics of the depictive representations that we discuss as distinguishing the genetics task from the evolution and ecology task so that we cannot exclude the possibility that its specific topic or other local specifics in the selected genetics task (i.e., Task A) caused the observed deleterious effects of depictive representations in multiple external representations.

With regard to scoring, we cannot ascertain that raters were blind to condition at all times. The response format allows for the possibility that students’ responses indicated whether they had been assigned to a text–picture condition (no-redundancy or redundancy group) or the text-only condition. In light of the satisfactory inter-rating results, a substantial impact seems rather unlikely as both raters’ scoring process would have needed to be equally biased.

The differential results we observed between tasks warrant a thorough investigation of potential negative effects on students’ performance as a function of the type of depictive representation. As a first step, a repetition of the experiment with similar tasks is required to examine whether the effect of task material replicates with other biology-specific representations high in information density and fails to show in tasks lacking these characteristics. Secondly, tasks covering a broader range of topics should be examined to test whether similar effects do indeed occur. From an applied perspective, it would also be instructive to include more advanced students and systematically vary students’ time spent in tertiary education as an additional factor in the research design. Ultimately, one would hope that the effects of task material would be absent after a certain amount of time as a biology major. Furthermore, due to our working memory test’s low reliability, future research has yet to shed light on working memory’s role in solving constructed-response tasks based on MERs.

Moreover, we could observe the outcome of task solving in our study but not the underlying process. Therefore, eye-tracking research could disclose to what extent students focus on and use depictive and descriptive representations in each task. The present results indicate that depictive representations received less attention than text and that not all of their relevant elements were identified. However, textual redundancy seems to help identify relevant parts in depictive representations. Eye-tracking may therefore also provide information as to whether textual redundancy serves as a prompt. It can also shed light on whether students are able to systematically decode depictive information or whether they try to decode it in an unstructured way.

Conclusion and Implications

It is not enough for science education in school to only convey content knowledge. Process-related skills such as communication, which includes interpreting domain-specific depictive representations and different kinds of MERs, must also be conveyed to the students. These skills are fundamental for all students but especially for those who want to pursue a career in science. Given that authentic communication heavily depends on the use of domain-specific figures and MERs that present information in different ways, our results suggest that secondary school education, at least in Germany, might improve on how it prepares students to pursue a career in science. Because the tasks we used were Abitur exam tasks, bachelor students majoring in biology, that is, students who have successfully completed upper secondary school, should be able to solve these tasks. As domain-specific depictive representations negatively affected students’ test performance, students might benefit from more explicit instructions and more learning opportunities regarding how to independently identify and integrate information from all kinds of MERs. Moreover, students might benefit from more targeted instruction on how to process MERs that are more or less redundant. National guidelines are already in place which require proficient working with representations as part of communication competence (for Germany: Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland, Citation2005). But the inclusion of students’ process-related skills in science classes alongside subject contents has been proposed only in the past few years. This change in science education is ongoing. Teachers may need time to adapt and possibly more tangible support, which is ideally based on rigorous intervention research. The encouraging policy decision to explicitly incorporate process-related skills into required student learning outcomes so far seems to fall short in translating into (effective) teaching.

In light of the significance of multiple external representations in biology and science more generally, we propose that curricula should explicitly include both working with MERs of different degrees of redundancy and working with biology-specific depictive representations. Overall, Western science curricula include working with different representations (e.g., Department for Education, Citation2015; NGSS Lead States, Citation2013), but not specifically enough considering the requirements that studying science places on students. This specificity is lacking not only in Germany. Similar demands have already been made by Halverson et al. (Citation2011) with regard to biology-specific depictive representations, while Roth and colleagues criticized that science education does not even prepare students to work with authentic, more general depictive representations (i.e., graphs; e.g., Roth & Bowen, Citation1999). Furthermore, teachers should not only explicitly teach students how to deal with any kind of MERs and depictive representations, but also use more authentic scientific literature in science lessons, for example, excerpts of research articles. Being thus empowered, students will hopefully also overcome any unease they might experience when confronted with text–picture combinations.

Acknowledgments

We thank Safa Dadkhah Aseman and Christina Kelting for their help in inter-rating and data entry. Special thanks to all instructors who kindly provided access to their courses.

Disclosure statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Notes

1 This study was not preregistered with an analysis plan in an independent institutional registry. However, inquiries about study materials may be directed to the first author. Furthermore, the raw data supporting the conclusions of this manuscript will be made available by the authors to any qualified researcher.

2 In authentic scientific communication, redundancy results from information of the depictive representation being repeated in the text. Pictures are therefore considered the primary material.

3 The targeted sample size could not be obtained within a single classroom. We thank an anonymous reviewer for suggesting that we account for this data structure by including classroom as a second factor in our analytic approach, thereby potentially improving its statistical power.

4 Although the Abitur examinations in Germany are not centralized at the national level, defined uniform examination requirements mandate comparability between the requirements of Abitur exam tasks in biology in different German federal states (Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland, Citation2004).

5 Copyright issues prevent us from reproducing our actual task materials. Therefore, we developed a demonstration example which is merely intended to show how we created redundancy. In contrast to the actual task, which addresses a hereditary disease, the implied disease of the demonstration example is fictitious and not meant to be biologically correct. Inquiries about intervention materials should be directed to the first author.

6 The category schemes can be requested from the first author.

7 Students in upper secondary school in Germany take courses with either a basic or a higher level of requirements; they take final examinations in only 4–5 subjects, depending on the federal state.

References