
Difference in Learning Among Students Doing Pen-and-Paper Homework Compared to Web-Based Homework in an Introductory Statistics Course


ABSTRACT

A repeated crossover experiment comparing learning among students handing in pen-and-paper homework (PPH) with students handing in web-based homework (WBH) has been conducted. The system used in the experiments, the tutor-web, has been used to deliver homework problems to thousands of students in mathematics and statistics over several years. Since 2011, experimental changes have been made regarding how the system allocates items to students, how grading is done, and the type of feedback provided. The experiment described here was conducted annually from 2011 to 2014. Approximately 100 students in an introductory statistics course participated each year. The main goals were to determine whether the above-mentioned changes had an impact on learning as measured by test scores, in addition to comparing learning among students doing PPH with students handing in WBH.

The difference in learning between students doing WBH compared to PPH, measured by test scores, increased significantly from 2011 to 2014 with an effect size of 0.634. This is a strong indication that the changes made in the tutor-web have a positive impact on learning. Using the data from 2014, a significant difference in learning between WBH and PPH was detected with an effect size of 0.416, supporting the use of WBH as a learning tool.

1. Introduction

Enrollment in universities has increased substantially in the past decade in most OECD (Organisation for Economic Co-operation and Development) countries. In Iceland, the increase in tertiary-level enrollment was 40% between 2000 and 2010 (OECD Citation2013). This increase has resulted in larger class sizes at the University of Iceland, especially in undergraduate courses. As stated by Black and Wiliam (Citation1998), several studies have shown firm evidence that innovations designed to strengthen the frequent feedback that students receive about their learning yield substantial learning gains. Providing students with frequent quality feedback is time consuming, and in large classes this can be very costly. It is therefore important to investigate whether web-based homework (WBH), which does not require marking by teachers but provides feedback to students, can replace (at least to some extent) traditional pen-and-paper homework (PPH). To investigate this, an experiment has been conducted over a 4-year period in an introductory course in statistics at the University of Iceland. About 100 students participated each year. The experiment is a repeated crossover experiment, so the same students were exposed to both methods, WBH and PPH.

The learning environment tutor-web (http://tutor-web.net) used in the experiments has been under development during the past decade in the University of Iceland. Two research questions are of particular interest.

  1. Have changes made in the tutor-web had an impact on learning, as measured by test performance?

  2. Is there a difference in learning, as measured by test performance, between students doing WBH and PPH after the changes made in the tutor-web?

This section gives an overview of different learning environments in the context of the functionality of the tutor-web (Section 1.1), focusing on how to allocate exercises (problems) to students. A literature review of studies, conducted to investigate a potential difference in learning between WBH and PPH, is given in Section 1.2 followed by a brief discussion about formative assessment and feedback (Section 1.3). Finally, a short description of the tutor-web is given in Section 1.4.

1.1. Web-Based Learning Environments

A number of web-based learning environments are available on the web, some open and free to use, others commercial products. Several types of systems have emerged, including learning management systems (LMS), learning content management systems (LCMS), and adaptive and intelligent web-based educational systems (AIWBES). The LMS is designed for planning, delivering, and managing learning events, usually adding little value to the learning process and not supporting internal content processes. The primary role of an LCMS, on the other hand, is to provide a collaborative authoring environment for creating and maintaining learning content (Ismail Citation2001). In AIWBES, the focus is on the student. Such systems adapt to the needs of each and every student (Brusilovsky and Peylo Citation2003), in contrast to many systems that are merely a network of static hypertext pages (Brusilovsky Citation1999).

A number of web-based learning environments use intelligent methods to provide personalized content or navigation, such as the one described in Own (Citation2006). However, only a few systems use intelligent methods for exercise item allocation (Barla et al. Citation2010). The use of intelligent item allocation algorithms (IAA) is, however, a common practice in testing. Computerized adaptive testing (Wainer Citation2000) is a form of computer-based testing wherein the test is tailored to the examinee's ability level by means of item response theory (IRT). IRT is the framework used in psychometrics for the design, analysis, and grading of computerized tests to measure abilities (Lord Citation1980). As Wauters, Desmet, and Van Den Noortgate (Citation2010) argue, IRT is potentially a valuable method for adapting the item sequence to the learner's knowledge level. However, the IRT methods are designed for testing, not learning, and as shown in Stefansson and Sigurdardottir (Citation2011) and Jonsdottir and Stefansson (Citation2014), the IRT models are not appropriate since they do not take learning into account. New methods for IAA in learning environments are therefore needed.

Several systems can be found that are specifically designed for providing content in the form of exercise items. Examples of systems providing homework exercises are the WeBWork system (Gage, Pizer, and Roth Citation2002), ASSiSTments (Razzaq et al. Citation2005), ActiveMath (Melis et al. Citation2001), OWL (Hart et al. Citation1999), LON-CAPA (Kortemeyer et al. Citation2008), and WebAssign (Brunsmann et al. Citation1999). None of those systems use intelligent methods for item allocation; instead, a fixed set of items is submitted to the students or items are drawn randomly from a pool.

1.2. Web-Based Homework Versus Pen-and-Paper Homework

A number of studies have been conducted to investigate a potential difference in learning between WBH and PPH. In most of the studies reviewed, no significant difference was detected (Bonham, Deardorff, and Beichner Citation2003; Cole and Todd Citation2003; Demirci Citation2007; Kodippili and Senaratne Citation2008; Palocsay and Stevens Citation2008; Lenz Citation2010; LaRose Citation2010; Gok Citation2011; Williams Citation2012). In three of the studies reviewed, WBH was found to be more effective than PPH as measured by final exam scores. In the first study, described by Dufresne et al. (Citation2002), data were gathered in various offerings of two large introductory physics courses taught by four lecturers over a 3-year period. The OWL system was used to deliver WBH. The authors found that WBH led to higher overall exam performance, although the difference in average gain for the five instructor-course combinations was not statistically significant. In the second paper, VanLehn et al. (Citation2005) described Andes, a physics tutoring system. The performance of students working in the system was compared to that of students doing PPH over four years. Students using the system did significantly better on the final exam than the PPH students. However, the study has one limitation: the two groups were not taught by the same instructors. Finally, Brewer and Becker (Citation2010) described a study in multiple sections of college algebra. The WBH group used an online homework system developed by the textbook publisher. The authors concluded that the WBH group generally scored higher on the final exam, but no significant difference existed between the mathematical achievement of the control and treatment groups except among low-skilled students, where the WBH group exhibited significantly higher mathematical achievement.

Even though most of the studies comparing WBH and PPH show no difference in learning, the fact that WBH students do not do worse than PPH students makes WBH a favorable option, especially in large classes where correcting PPH is very time consuming. Also, students' perception of WBH has been shown to be positive (Hauk and Segalla Citation2005; VanLehn et al. Citation2005; Demirci Citation2007; Roth, Ivanchenko, and Record Citation2008; Smolira Citation2008; Hodge, Richardson, and York Citation2009; LaRose Citation2010).

All the studies reviewed were conducted using a quasi-experimental design, that is, students were not randomly assigned to the treatment groups. Either multiple sections of the same course were tested, where some sections did PPH while the other(s) did WBH, or the two treatments were assigned in different semesters. This could lead to bias, for example, due to differences in the student groups or lecturers participating in the two treatment arms of the experiments. The experiment described in this article is a repeated randomized crossover experiment, so the same students were exposed to both WBH and PPH, resulting in a more accurate estimate of the potential difference between the two methods.

1.3. Assessment and Feedback

Assessments are frequently used by teachers to assign grades to students (assessment of learning), but assessment can also be used as part of the learning process (assessment for learning) (Garfield et al. Citation2011). The term summative assessment (SA) is often used for the former and formative assessment (FA) for the latter. The concepts of feedback and FA overlap strongly and, as stated by Black and Wiliam (Citation1998), the terms do not have a tightly defined and widely accepted meaning. Therefore, some definitions will be given below.

Taras (Citation2005) defined SA as “... a judgment which encapsulates all the evidence up to a given point. This point is seen as a finality at the point of the judgment” (p. 468) and about FA she writes “... FA is the same process as SA. In addition for an assessment to be formative, it requires feedback that indicates the existence of a ‘gap’ between the actual level of the work being assessed and the required standard” (p. 468). A widely accepted definition of feedback is then provided in Ramaprasad (Citation1983): “Feedback is information between the actual level and the reference level of a system parameter which is used to alter the gap in some way” (p. 4).

Stobart (Citation2008) suggested the following distinction between levels of feedback complexity: knowledge of results (KR), which only states whether the answer is correct or incorrect; knowledge of correct response (KCR), where the correct response is given when the answer is incorrect; and elaborated feedback (EF), where, for example, an explanation of the correct answer is given.

The terms formative assessment, feedback, and the distinction between the different types of feedback will be used here as defined above.

1.4. The Tutor-Web

The tutor-web (http://tutor-web.net) project is an ongoing research project. The functionalities of the system have changed considerably during the past decade. A pilot version, written only in HTML and Perl, was described by Stefansson (Citation2004). A newer version, implemented in Plone (Nagle Citation2010), was described in detail by Jonsdottir, Jakobsdottir, and Stefansson (Citation2015). The newest version, described in Lentin et al. (Citation2014), is a mobile web site and runs smoothly on tablets and smartphones. Also, users do not need to be connected to the Internet when answering exercises, but only when downloading the item banks.

The tutor-web is an LCMS including exercise item banks within mathematics and statistics. The system is open and free to use for everyone with access to the web. At the heart of the system is formative assessment. Intelligent methods are used for item allocation in such a way that the difficulty of the items allocated adapts to the students' ability level. Since the focus of the experiment described here is on the effect of doing exercises (answering items) in the system, only functionalities related to that will be described. A more detailed description of the tutor-web is given in the above-mentioned papers.

1.4.1. Item Allocation Algorithm

In the systems used for WBH named in Section 1.1, a fixed set of items is allocated to students or items are drawn randomly, with uniform probability, from a pool of items. This was also the case in the first version of the tutor-web. A better way might be to implement an IAA so that the difficulty of the items adapts to the students' ability. As discussed in Section 1.1, current IRT methods are not appropriate when the focus is on learning; therefore, a new type of IAA has been developed using the following basic criteria:

  • increase the difficulty level as the student learns

  • select items so that a student can only complete a session with a high grade by completing the most difficult items

  • select items from previous sessions to refresh memory.

Items are grouped into lectures in the tutor-web system, where each lecture covers a specific topic, such as discrete distributions in an introductory statistics course or limits in a basic calculus course. Within a lecture, the difficulty of an item is simply calculated as the ratio of incorrect responses to the total number of responses. The items are then ranked according to their difficulty, from the easiest item to the most difficult one, as illustrated in the sketch below.
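As a concrete illustration of this ranking, the short R sketch below computes the empirical difficulty of a set of items and ranks them from easiest to hardest. The data are simulated and purely hypothetical; this is not tutor-web source code.

```r
# Illustrative sketch only: empirical item difficulty and ranking.
set.seed(1)
n_items     <- 10
n_total     <- rep(50, n_items)                         # responses per item (assumed)
n_incorrect <- rbinom(n_items, n_total, runif(n_items, 0.2, 0.8))

difficulty <- n_incorrect / n_total                     # ratio of incorrect to total responses
r <- rank(difficulty, ties.method = "first")            # difficulty rank, 1 = easiest item
```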

The implementation of the first criterion (shown above) has changed over the years. In the first version of the tutor-web, all items within a lecture were assigned uniform probability of being chosen for every student. This was changed in 2012 with the introduction of a probability mass function (pmf) that calculates the probability of an item being chosen for a student. The pmf is exponentially related to the ranking of the item and also depends on the student's grade,

(1)   $p(r) = \begin{cases} \dfrac{q^{r}}{c}\cdot\dfrac{m-g}{m} + \dfrac{g}{N\cdot m} & \text{if } g \leq m,\\[6pt] \dfrac{q^{N-r+1}}{c}\cdot\dfrac{g-m}{1-m} + \dfrac{1-g}{N\cdot(1-m)} & \text{if } g > m, \end{cases}$

where q is a constant (0 ≤ q ≤ 1) controlling the steepness of the function, N is the total number of items belonging to the lecture, r is the difficulty rank of the item (r = 1, 2, ..., N), g is the grade of the student (0 ≤ g ≤ 1), and c is a normalizing constant, $c = \sum_{i=1}^{N} q^{i}$. Finally, m is a constant (0 < m < 1) so that when g < m, the pmf is strongly decreasing and the mass is mostly located at the easy items; when g = m, the pmf is uniform; and when g > m, the pmf is strongly increasing with the mass mostly located at the difficult items. This was changed in 2013 so that the mode of the pmf moves to the right with increasing grade, which is achieved by using the following pmf based on the beta distribution,

(2)   $p(r) = \dfrac{\left(\frac{r}{N+1}\right)^{\alpha}\left(1-\frac{r}{N+1}\right)^{\beta}}{\sum_{i=1}^{N}\left(\frac{i}{N+1}\right)^{\alpha}\left(1-\frac{i}{N+1}\right)^{\beta}},$

where r is the ranked item difficulty (r = 1, 2, ..., N) and α and β are constants controlling the shape of the function. The three different pmfs used over the years (uniform, exponential, and beta) are shown in Figure 1. Looking at the rightmost panel, which shows the pmf currently used, it can be seen that a beginning student (with a score of 0) receives easy items with high probability. As the grade increases, the mode of the pmf shifts to the right until the student reaches a top score, resulting in a high probability of getting the most difficult items. Using this pmf, the first two criteria for the IAA listed above are fulfilled.
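The two grade-dependent pmfs can be sketched in a few lines of R. The functions below follow Equations (1) and (2) directly; the particular values of q, m, α, and β used in the example calls are illustrative assumptions, not the settings used in the tutor-web.

```r
# Sketch of the exponential pmf of Equation (1).
# r: difficulty rank(s), g: student grade in [0, 1], N: number of items.
pmf_exp <- function(r, g, N, q = 0.85, m = 0.5) {
  cc <- sum(q^(1:N))                                    # normalizing constant c
  if (g <= m) {
    q^r / cc * (m - g) / m + g / (N * m)                # mass mostly on easy items when g < m
  } else {
    q^(N - r + 1) / cc * (g - m) / (1 - m) + (1 - g) / (N * (1 - m))
  }
}

# Sketch of the beta-based pmf of Equation (2).
pmf_beta <- function(r, N, alpha, beta) {
  x <- (1:N) / (N + 1)
  (x[r]^alpha * (1 - x[r])^beta) / sum(x^alpha * (1 - x)^beta)
}

N <- 20
sum(pmf_exp(1:N, g = 0.2, N))                           # both pmfs sum to 1
sum(pmf_beta(1:N, N, alpha = 5, beta = 2))              # mode toward the harder items
```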

Figure 1. The different pmfs used in the item allocation algorithm. Left: uniform. Middle: exponential. Right: beta.


The last criterion for the IAA is related to how people forget. Ebbinghaus (Citation1913) was one of the first to research this issue. He proposed the forgetting curve and showed in his studies that learning and the recall of learned information depend on the frequency of exposure to the material. It was therefore decided in 2012 to change the IAA in such a way that students are now occasionally allocated items from previous lectures to refresh memory.

1.4.2. Grading

Although the main goal of making the students answer exercises in the tutor-web is learning, there is also a need to evaluate the students' performance. The students are permitted to continue answering items until they (or the instructor) are satisfied, which makes grading a nontrivial issue. In the first version of the tutor-web, the last eight answers counted (with equal weight) toward the tutor-web grade. Students were given one point for a correct answer and minus half a point for an incorrect one. The idea was that old sins should be forgotten when students are learning. This had an unfortunate side effect: students often quit answering items after seven correct attempts in a row (Jonsdottir, Jakobsdottir, and Stefansson Citation2015), which is a perfectly logical response since a student with a sequence of seven correct answers and one incorrect one needs another eight correct answers in a row to increase the grade. The tutor-web grade was also found to be a poor predictor of students' performance on a final exam, the grade being too high (Lentin et al. Citation2014). It was therefore decided in 2014 to change the grading scheme (GS) and use min(max(n/2, 8), 30) items after n attempts when calculating the tutor-web grade. That is, a minimum of eight answers is used, then n/2 answers once n exceeds 16, but never more than 30 answers. Using this GS, the weight of each answer is smaller than before (when n > 16), thus eliminating the fear of answering the eighth item incorrectly while simultaneously making it more difficult for students to get a top grade, since more answers are used when calculating the grade.
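A minimal sketch of the 2014 rule is given below. The 0/1 scoring of individual answers and the equal weighting within the window are assumptions made for illustration, since only the number of counted answers, min(max(n/2, 8), 30), is specified above.

```r
# Sketch of the 2014 grading scheme: after n attempts, the most recent
# min(max(n/2, 8), 30) answers count toward the grade (equal weights assumed).
tw_grade <- function(answers) {            # answers: 1 = correct, 0 = incorrect
  n <- length(answers)
  k <- min(max(floor(n / 2), 8), 30)       # number of answers counted
  mean(tail(answers, min(k, n)))           # cannot count more answers than given
}

tw_grade(c(rep(1, 7), 0))                  # 8 attempts: one mistake costs 1/8 of the grade
tw_grade(c(rep(1, 39), 0))                 # 40 attempts: 20 answers count, mistake costs 1/20
```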

1.4.3. Feedback

The quality of the feedback is a key feature in any procedure for formative assessment (Black and Wiliam Citation1998). In the first version of the tutor-web, only KR/KCR-type feedback was provided. Sadler (Citation1989) suggested that KR-type feedback is insufficient if the feedback is to facilitate learning, so in 2012 explanations were added to the items in the tutor-web item bank, thus providing students with EF. A question from a lecture covering inferences for proportions is shown in Figure 2. Here, the student has answered incorrectly (marked in red). The correct answer is marked in green and an explanation is given below it.

Figure 2. A question from a lecture on inferences for proportions. The students are informed what the correct answer is and shown an explanation of the correct answer.


1.4.4. Summary of Changes in the Tutor-Web

In the sections above, changes related to the IAA, grading, and feedback were reviewed. A summary of the changes discussed is shown in Table 1.

Table 1. Summary of changes in the tutor-web.

2. Material and Methods

The data used for the analysis were gathered in an introductory course in statistics at the University of Iceland from 2011 to 2014. Every year some 200 first-year students in chemistry, biochemistry, geology, pharmacology, food science, nutrition, tourism studies, and geography were enrolled in the course. The course was taught by the same instructor over the time span of the experiment. About 60% of the students had already taken a course in basic calculus the semester before, while the rest of the students had a much weaker background in mathematics. Around 60% of the students were female and 40% male. The students needed to hand in homework four times during the course. The subjects of the homework were discrete distributions, continuous distributions, inference about means, and inference about proportions. The students were told at the beginning of the course that there would be several in-class tests during the semester, but they were not told how many, at what time points, or on which topics they would be examined. The final grade in the course consisted of four parts: the final exam (50%), the four homework assignments (10%), in-class tests (15%), and assignments in the statistical software R (25%).

The experiment conducted is a repeated randomized crossover experiment. The design of the experiment is shown in Figure 3.

Figure 3. The design of the experiment. The experiment was repeated four times from 2011 to 2014.


Each year the class was split randomly into two groups. One group was instructed to do exercises in the tutor-web system in the first homework assignment (WBH), while the other group handed in written homework (PPH). The exercises on the PPH assignment and in the tutor-web were similar and covered the same topics. Shortly after the students handed in their homework, they took a test in class. The groups were crossed before the next homework, that is, the former WBH students handed in PPH and vice versa, and again the students were tested. Each year this procedure was repeated and the test scores from the four exams registered. The students were not made aware of the experiment but were told that the groups were formed to manage the number of PPH assignments that needed to be corrected at a time. There were no indications that the students were aware of the experiment. The number of students taking each exam is shown in Table 2.

Table 2. Number of students taking the tests.

To answer the first research question, stated in Section 1, the following linear mixed model is fitted to the data from 2011 to 2014 and nonsignificant factors removed,

(3)   $g_{mlhyi} = \mu + \alpha_m + \beta_l + \gamma_h + \delta_y + (\alpha\gamma)_{mh} + (\beta\gamma)_{lh} + (\delta\gamma)_{yh} + s_i + \epsilon_{mlhyi},$

where g is the test grade, α is the math background (m = weak, strong), β is the lecture material (l = discrete distributions, continuous distributions, inference about means, inference about proportions), γ is the type of homework (h = PPH, WBH), δ is the year (y = 2011, 2012, 2013, 2014), and s is the random student effect ($s_i \sim N(0, \sigma_s^2)$). The interaction term $(\alpha\gamma)$ measures whether the effect of type of homework is different between students with strong and weak math background and $(\beta\gamma)$ whether the effect of type of homework is different for the lecture material covered. The interaction term $(\delta\gamma)$ is of special interest since it measures the effect of changes made in the tutor-web system during the four years of experiments.
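A sketch of how model (3) can be fitted with the lme4 package is shown below. The data frame scores and its column names (grade, math, lecture, hw, year, student) are hypothetical placeholders for the data layout described above, not the actual data set.

```r
# Sketch only: model (3) with a random intercept per student.
library(lme4)

m3 <- lmer(grade ~ math + lecture + hw + year +
             math:hw + lecture:hw + year:hw + (1 | student),
           data = scores)
```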

To answer the second research question, only data gathered in 2014 are used and the following linear mixed model is fitted to the data,

(4)   $g_{mlhi} = \mu + \alpha_m + \beta_l + \gamma_h + (\alpha\gamma)_{mh} + (\beta\gamma)_{lh} + s_i + \epsilon_{mlhi},$

with α, β, γ, and s as above. If the interaction terms are found to be nonsignificant, the γ factor is of special interest since it measures the potential difference in learning between students doing WBH and PPH.
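Continuing the sketch above (same hypothetical data frame and column names), model (4) restricted to the 2014 data could be fitted as follows.

```r
# Sketch only: model (4) fitted to the 2014 data.
m4 <- lmer(grade ~ math + lecture + hw + math:hw + lecture:hw + (1 | student),
           data = subset(scores, year == "2014"))
```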

In addition to collecting the exam grades, the students answered a survey at the end of each semester. In total, 442 students responded to the surveys (121 in 2011, 88 in 2012, 131 in 2013, and 102 in 2014). Two of the questions are related to the use of the tutor-web and the students' perception of WBH and PPH homework:

  1. Do you learn by answering items in the tutor-web? (yes/no)

  2. What do you prefer for homework? (PPH/WBH/Mix of PPH and WBH).

3. Results

3.1. Analysis of Exam Scores

In order to see which factors relate to exam scores, the linear mixed model in Equation (3) was fitted to the exam score data using R (R Core Team Citation2014). The lmer function in the lme4 package, which includes functions to fit linear and generalized linear mixed-effects models (Bates et al. Citation2014), was used. The interaction terms $(\alpha\gamma)_{mh}$ and $(\beta\gamma)_{lh}$ were found to be nonsignificant and therefore removed from the model. This indicates that the effect of homework type does not depend on math background or lecture material covered. However, the $(\delta\gamma)_{yh}$ interaction was found to be significant, implying that the effect of the type of homework is not the same during the four years. The resulting final model can be written as

(5)   $g_{mlhyi} = \mu + \alpha_m + \beta_l + \gamma_h + \delta_y + (\delta\gamma)_{yh} + s_i + \epsilon_{mlhyi}.$

The estimates of the parameters and the associated t-values are shown in Table 3, along with p-values calculated using the lmerTest package (Kuznetsova, Brockhoff, and Christensen Citation2013). Estimates of the variance components were $\hat{\sigma}_s^2 = 1.84$ and $\hat{\sigma}^2 = 3.33$. The reference group (included in the intercept) consists of the students in the 2011 course with weak math background handing in PPH on discrete distributions. Residual plots revealed no violation of model assumptions, such as nonnormal errors or random effects.
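A sketch of how such p-values can be obtained is given below (same hypothetical data frame and column names as in the earlier sketches). Loading lmerTest makes summary() of an lmer fit report Satterthwaite-based p-values for the fixed effects.

```r
# Sketch only: final model (5) with p-values from lmerTest.
library(lmerTest)

m5 <- lmer(grade ~ math + lecture + hw + year + year:hw + (1 | student),
           data = scores)
summary(m5)      # fixed-effect estimates with t- and p-values
VarCorr(m5)      # variance components (student and residual)
```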

Table 3. Parameter estimates for the final model used to answer research question 1. The reference group consists of the students in the 2011 course with weak math background handing in PPH on discrete distributions. Grades were given on the 0–10 scale. The effect size related to research question 1 is in bold font.

By looking at the estimate for the year2014:tw term, it can be seen that the difference between the WBH and PPH groups in 2014 differs significantly from that in 2011, the reference year (p = 0.012), indicating that the changes made to the tutor-web had a positive impact on learning. The difference in effect size between WBH and PPH in 2011 and 2014 is 0.634. It should also be noted that the effect size of math background is large (1.680).

In order to answer the second question, the model in Equation (4) was fitted to the data from 2014. The interaction terms were both nonsignificant and therefore removed from the model. The final model can be written as

(6)   $g_{mlhi} = \mu + \alpha_m + \beta_l + \gamma_h + s_i + \epsilon_{mlhi}.$

The estimates of the parameters and the associated t- and p-values are shown in Table 4. Estimates of the variance components were $\hat{\sigma}_s^2 = 1.48$ and $\hat{\sigma}^2 = 2.84$. The reference group (included in the intercept) consists of the students with weak math background handing in PPH on discrete distributions.

Table 4. Parameter estimates for the final model used to answer research question 2. The reference group (included in the intercept) consists of the students with weak math background handing in PPH on discrete distributions. Grades were given on the 0–10 scale. The effect size related to research question 2 is in bold font.

By looking at the table, it can be noted that the difference between the WBH and PPH groups is significant (p = 0.009) and the estimated effect size is 0.416, indicating that the students did better after handing in WBH than PPH. Again, the effect size of math background is large (1.379).

3.2. Analysis of Student Surveys

In general, the students' perception of the tutor-web system is very positive. In the student surveys conducted over the four years, over 90% of the students felt that they learned using the system. Despite the positive attitude toward the system, about 80% of the students preferred a mixture of PPH and WBH over PPH or WBH alone.

It is interesting to look at the difference in perception over the four years, shown in Figure 4. As stated above, the GS was changed in 2014, making it more difficult to get a top grade for homework in the system than in PPH. This led to general frustration in the student group. The fraction of students preferring to hand in only PPH, rather than WBH or a mix of the two, more than tripled compared to the previous years.

Figure 4. Results from the student survey. Left: “Do you learn from the tutor-web?” Right: “What is your preference for homework?”


4. Conclusion and Future Work

The learning environment tutor-web has been under development during the past decade at the University of Iceland. An experiment has been conducted to answer the following research questions.

  1. Have changes made in the tutor-web had an impact on learning as measured by test performance?

  2. Is there a difference in learning, as measured by test performance, between students doing PPH and WBH after the changes made in the tutor-web?

The experiment was conducted over four years in an introductory course on statistics. It is a repeated crossover experiment, so the same students were exposed to both methods, WBH and PPH.

The difference between the WBH and PPH groups in 2014 was found to differ significantly from that in 2011 (p = 0.012), indicating that the changes made to the tutor-web had a positive impact on learning as measured by test scores. The difference in effect size between WBH and PPH in 2011 and 2014 is 0.634. Several changes were made in the system between 2011 and 2014, as shown in Table 1. As can be seen in the table, the changes are somewhat confounded, but moving from uniform probability to the pmf shown in Equation (2) when allocating items, allocating items from old material to refresh memory, changing the GS so that min(max(n/2, 8), 30) items count toward the grade instead of eight, providing EF instead of KR/KCR-type feedback, and having a mobile version appear to have had a positive impact on learning.

To answer the second research question, only data gathered in 2014 were used. The difference between the WBH and PPH groups was found to be significant (p = 0.009) with effect size 0.416 indicating that the students did better after handing in WBH than PPH. In both models the effect size of math background was large (1.680 and 1.379).

The tutor-web project is an ongoing research project and the tutor-web team will continue to work on improvements to the system. Planned improvements related to the exercise items concern the quality of items and feedback, the GS, and the IAA.

4.1. Quality of Items and Feedback

As pointed out by Garfield (Citation1994), it is important to have items that require students to understand the concepts, not only items that test skills in isolation from a problem context. It is therefore important to have items that encourage deep learning rather than surface learning (Biggs Citation1987).

One goal of the tutor-web team is to collect metadata for each item in the item bank. One classification of the items will reflect how deep an understanding is required, using, for example, the Structure of the Observed Learning Outcomes (SOLO) taxonomy (Biggs and Collis Citation1982). According to SOLO, the following three structural levels make up a cycle of learning. “Unistructural: The learner focuses on the relevant domain and picks one aspect to work with. Multistructural: The learner picks up more and more relevant or correct features but does not integrate them. Relational: The learner now integrates the parts with each other, so that the whole has a coherent structure and meaning” (p. 152).

In addition to the SOLO framework, to reflect the difficulty of items in statistics courses, items could also be classified based on the cognitive statistical learning outcomes suggested by delMas (Citation2002), Garfield and Ben-Zvi (Citation2008), and Garfield and delMas (Citation2010). These learning outcomes have been defined as (Garfield and Franklin Citation2011): “Statistical literacy, understanding and using the basic language and tools of statistics. Statistical reasoning, reasoning with statistical ideas and making sense of statistical information. Statistical thinking, recognizing the importance of examining and trying to explain variability and knowing where the data came from, as well as connecting data analysis to the larger context of a statistical investigation” (pp. 4–5). Items measuring these concepts could be ranked in hierarchical order of difficulty, starting with statistical literacy items as the least difficult and ending with the most difficult items measuring statistical thinking.

4.2. Grading Scheme

The GS used in a learning environment such as the tutor-web influences the behavior of the students (Jonsdottir, Jakobsdottir, and Stefansson Citation2015). The GS used in the tutor-web was changed in 2014, eliminating some problems but introducing a new one: the students found it unfair. The following criteria will be used to develop the GS further.

The GS should:

  • entice students to continue to request items, thus learning more

  • reflect current knowledge well

  • be fair in students’ minds.

Currently, a new GS is being implemented. Instead of giving equal weight to the items used to calculate the grade, newer items are given more weight using the following formula,

(7)   $w(l) = \begin{cases} \alpha & \text{when } l = 1,\\[4pt] (1-\alpha)\cdot\dfrac{\left(1-\frac{l}{n_g+1}\right)^{s}}{\sum_{i=2}^{n_g}\left(1-\frac{i}{n_g+1}\right)^{s}} & \text{when } 1 < l \leq n_g,\\[4pt] 0 & \text{when } l > n_g, \end{cases}$

where l is the lagged item number (l = 1 being the most recent item answered), α is the weight given to the most recent answer, $n_g$ is the number of answers included in the grade, and s is a parameter controlling the steepness of the function. Some weight functions for a student who has answered 30 items are shown in Figure 5. As can be seen in the figure, the newest answers get the most weight and older answers (old sins) get less.
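The proposed weight function can be sketched directly in R. The default parameter values below correspond to the left panel of Figure 5 and are otherwise arbitrary choices for illustration.

```r
# Sketch of the weight function in Equation (7).
# l: lag of the answer (1 = most recent), alpha: weight of the newest answer,
# ng: number of answers included in the grade, s: steepness parameter.
w <- function(l, alpha = 0.15, s = 1, ng = 15) {
  denom <- sum((1 - (2:ng) / (ng + 1))^s)
  ifelse(l == 1, alpha,
         ifelse(l <= ng, (1 - alpha) * (1 - l / (ng + 1))^s / denom, 0))
}

weights <- w(1:30)     # weights for a student who has answered 30 items
sum(weights)           # the weights sum to 1
```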

Figure 5. The weight function for a student that has answered 30 items for different values of the parameters. Left: α=0.15,s=1,ng=15. Right: α=0.10,s=2,ng=30.


The students will be informed of their current grade as well as what their grade will be if they answer the next item correctly to entice them to continue requesting items. Studies investigating the effect of the new GS will be conducted in 2016–2017.

4.3. Item Allocation Algorithm

In the current version of the IAA, the items are ranked according to their difficulty level, calculated as the ratio of incorrect responses to the total number of responses. This is, however, not optimal since the ranking places the items at equal distances on the difficulty scale. A solution to this problem could be to use the ratio of incorrect responses to the total number of responses directly in the IAA instead of the ranking. Another solution would be to implement a more sophisticated method for estimating the difficulty of the items using IRT, but as mentioned earlier those methods are designed for testing, not learning. However, it would be interesting to extend the IRT models by including a learning parameter that would make the models more suitable for a learning environment. Finally, it is of interest to investigate formally the impact of allocating items from old material to refresh memory.

Funding

The tutor-web project has been supported by the Ministry of Education and the Marine Research Institute of Iceland, the United Nations University, the University of Iceland, and the European Union's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 613571—MareFrame.

References

  • Barla, M., Bieliková, M., Ezzeddinne, A., Kramar, T., Simko, M., and Vozár, O. (2010), “On the Impact of Adaptive Test Question Selection for Learning Efficiency,” Computers and Education, 55, 846–857.
  • Bates, D., Maechler, M., Bolker, B. M., and Walker, S. (2015). “Fitting Linear Mixed-Effects Models Using lme4,” Journal of Statistical Software, 67(1), 1–48.
  • Biggs, J. B. (1987), Student Approaches to Learning and Studying. Research Monograph, Melbourne: Australian Council for Educational Research, Ltd.
  • Biggs, J. B., and Collis, K. F. (1982), Evaluating the Quality of Learning: The SOLO Taxonomy, New York: Academic Press.
  • Black, P., and Wiliam, D. (1998), “Assessment and Classroom Learning,” Assessment in Education: Principles, Policy and Practice, 5(1), 7–74.
  • Bonham, S. W., Deardorff, D. L., and Beichner, R. J. (2003), “Comparison of Student Performance using Web and Paper-Based Homework in College-Level Physics,” Journal of Research in Science Teaching, 40, 1050–1071.
  • Brewer, D. S., and Becker, K. (2010), “Online Homework Effectiveness for Underprepared and Repeating College Algebra Students,” Journal of Computers in Mathematics and Science Teaching, 29, 353–371.
  • Brunsmann, J., Homrighausen, A., Six, H.-W., and Voss, J. (1999), “Assignments in a Virtual University—The Webassign-System,” in Proceedings of the 19th World Conference on Open Learning and Distance Education, Vienna, Austria.
  • Brusilovsky, P. (1999), “Adaptive and Intelligent Technologies for Web-Based Education,” Kunstliche Intelligenz, 4, 19–25.
  • Brusilovsky, P., and Peylo, C. (2003), “Adaptive and Intelligent Web-Based Educational Systems,” International Journal of Artificial Intelligence in Education, 13(2–4), 159–172.
  • Cole, R. S., and Todd, J. B. (2003), “Effects of Web-Based Multimedia Homework with Immediate Rich Feedback on Student Learning in General Chemistry,” Journal of Chemical Education, 80, 1338–1343.
  • delMas, R. (2002), “Statistical Literacy, Reasoning, and Learning: A Commentary,” Journal of Statistics Education, 10(3).
  • Demirci, N. (2007), “University Students’ Perceptions of Web-Based vs. Paper-Based Homework in a General Physics Course,” Eurasia Journal of Mathematics, Science and Technology Education, 3, 29–34.
  • Dufresne, R., Mestre, J., Hart, D. M., and Rath, K. A. (2002), “The Effect of Web-Based Homework on Test Performance in Large Enrollment Introductory Physics Courses,” Journal of Computers in Mathematics and Science Teaching, 21, 229–251.
  • Ebbinghaus, H. (1913), Memory: A Contribution to Experimental Psychology (Vol. 3), New York: Teachers College, Columbia University.
  • Gage, M., Pizer, A., and Roth, V. (2002), “WeBWorK: Generating, Delivering, and Checking Math Homework via the Internet,” in ICTM2 International Congress for Teaching of Mathematics at the Undergraduate Level, Crete, Greece: Wiley.
  • Garfield, J., and Ben-Zvi, D. (2008), Developing Students’ Statistical Reasoning: Connecting Research and Teaching Practice, Dordrecht, The Netherlands: Springer Science & Business Media.
  • Garfield, J., and delMas, R. (2010), “A Web Site that Provides Resources for Assessing Students’ Statistical Literacy, Reasoning and Thinking,” Teaching Statistics, 32, 2–7.
  • Garfield, J., and Franklin, C. (2011), “Assessment of Learning, for Learning, and as Learning in Statistics Education,” in Teaching Statistics in School Mathematics-Challenges for Teaching and Teacher Education, Dordrect, The Netherlands: Springer Science & Business Media, pp. 133–145.
  • Garfield, J., Zieffler, A., Kaplan, D., Cobb, G. W., Chance, B. L., and Holcomb, J. P. (2011), “Rethinking Assessment of Student Learning in Statistics Courses,” The American Statistician, 65, 1–10.
  • Garfield, J. B. (1994), “Beyond Testing and Grading: Using Assessment to Improve Student Learning,” Journal of Statistics Education, 2(1), 1–11.
  • Gok, T. (2011), “Comparison of Student Performance using Web-and Paper-Based Homework in Large Enrollment Introductory Physics Courses,” International Journal of Physical Sciences, 6, 3778–3784.
  • Hart, D., Woolf, B., Day, R., Botch, B., and Vining, W. (1999), “OWL: An Integrated Web-Based Learning Environment,” in International Conference on Mathematics/Science Education and Technology, pp. 106–112, San Antonio.
  • Hauk, S., and Segalla, A. (2005), “Student Perceptions of the Web-Based Homework Program WeBWorK in Moderate Enrollment College Algebra Classes,” Journal of Computers in Mathematics and Science Teaching, 24, 229–253.
  • Hodge, A., Richardson, J. C., and York, C. S. (2009), “The Impact of a Web-Based Homework Tool in University Algebra Courses on Student Learning and Strategies,” Journal of Online Learning and Teaching, 5, 616–628.
  • Ismail, J. (2001), “The Design of an E-Learning System: Beyond the Hype,” The Internet and Higher Education, 4(3–4), 329–336.
  • Jonsdottir, A. H., Jakobsdottir, A., and Stefansson, G. (2015), “Development and use of an Adaptive Learning Environment to Research Online Study Behaviour,” Educational Technology and Society, 18, 132–144.
  • Jonsdottir, A. H., and Stefansson, G. (2014), “From Evaluation to Learning: Some Aspects of Designing a Cyber-University,” Computers and Education, 78, 344–351.
  • Kodippili, A., and Senaratne, D. (2008), “Is Computer-Generated Interactive Mathematics Homework More Effective than Traditional Instructor-Graded Homework?” British Journal of Educational Technology, 39, 928–932.
  • Kortemeyer, G., Kashy, E., Benenson, W., and Bauer, W. (2008), “Experiences using the Open-Source Learning Content Management and Assessment System Lon-Capa in Introductory Physics Courses,” American Journal of Physics, 76, 438–444.
  • Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2013), “lmertest: Tests for Random and Fixed Effects for Linear Mixed Effect Models (lmer objects of lme4 package),” R Package Version 2.0-3.3, p. 2.
  • LaRose, P. G. (2010), “The Impact of Implementing Web Homework in Second-Semester Calculus,” Primus, 20, 664–683.
  • Lentin, J., Jonsdottir, A. H., Stern, D., Mokua, V., and Stefansson, G. (2014), “A Mobile Web for Enhancing Statistics and Mathematics Education,” in ICOTS9 Proceedings, Arizona, USA. arXiv:1406.5004.
  • Lenz, L. (2010), “The Effect of a Web-Based Homework System on Student Outcomes in a First-Year Mathematics Course,” Journal of Computers in Mathematics and Science Teaching, 29, 233–246.
  • Lord, F. (1980), Applications of Item Response Theory to Practical Testing Problems, Hillsdale, NJ: L. Erlbaum Associates.
  • Melis, E., Andres, E., Budenbender, J., Frischauf, A., Goduadze, G., Libbrecht, P., Pollet, M., and Ullrich, C. (2001), “ActiveMath: A Generic and Adaptive Web-Based Learning Environment,” International Journal of Artificial Intelligence in Education, 12, 385–407.
  • Nagle, R. (2010), A User’s Guide to Plone 4, Houston, TX: Enfold Systems Inc.
  • OECD (2013), Education at a Glance 2013. OECD Indicators, Paris, France: OECD Publishing.
  • Own, Z. (2006), “The Application of an Adaptive Web-Based Learning Environment on Oxidation–Reduction Reactions,” International Journal of Science and Mathematics Education, 4, 73–96.
  • Palocsay, S. W., and Stevens, S. P. (2008), “A Study of the Effectiveness of Web-Based Homework in Teaching Undergraduate Business Statistics,” Decision Sciences Journal of Innovative Education, 6, 213–232.
  • R Core Team (2014), R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Retrieved from https://www.R-project.org/
  • Ramaprasad, A. (1983), “On the Definition of Feedback,” Behavioral Science, 28, 4–13.
  • Razzaq, L. M., Feng, M., Nuzzo-Jones, G., Heffernan, N. T., Koedinger, K. R., Junker, B., Ritter, S., Knight, A., Mercado, E., Turner, T. E., et al. (2005), “The Assistment Project: Blending Assessment and Assisting,” in Proceedings of the 12th Annual Conference on Artificial Intelligence in Education, pp. 555–562, Amsterdam.
  • Roth, V., Ivanchenko, V., and Record, N. (2008), “Evaluating Student Response to Webwork, a Web-Based Homework Delivery and Grading System,” Computers and Education, 50, 1462–1482.
  • Sadler, D. R. (1989), “Formative Assessment and the Design of Instructional Systems,” Instructional Science, 18, 119–144.
  • Smolira, J. C. (2008), “Student Perceptions of Online Homework in Introductory Finance Courses,” Journal of Education for Business, 84, 90–95.
  • Stefansson, G. (2004), “The Tutor-web: An Educational System for Classroom Presentation, Evaluation and Self-Study,” Computers and Education, 43, 315–343.
  • Stefansson, G., and Sigurdardottir, A. J. (2011), “Web-Assisted Education: From Evaluation to Learning,” Journal of Psychiatric Research, 38, 47–60.
  • Stobart, G. (2008), Testing Times: The Uses and Abuses of Assessment, New York: Routledge.
  • Taras, M. (2005), “Assessment–Summative and Formative–Some Theoretical Reflections,” British Journal of Educational Studies, 53, 466–478.
  • VanLehn, K., Lynch, C., Schulze, K., Shapiro, J. A., Shelby, R., Taylor, L., Treacy, D., Weinstein, A., and Wintersgill, M. (2005), “The Andes Physics Tutoring System: Lessons Learned,” International Journal of Artificial Intelligence in Education, 15, 147–204.
  • Wainer, H. (2000), Computerized Adaptive Testing, Hillsdale, NJ: L. Erlbaum Associates.
  • Wauters, K., Desmet, P., and Van Den Noortgate, W. (2010), “Adaptive Item-Based Learning Environments Based on the Item Response Theory: Possibilities and Challenges,” Journal of Computer Assisted Learning, 26, 549–562.
  • Williams, A. (2012), “Online Homework vs. Traditional Homework: Statistics Anxiety and Self-Efficacy in an Educational Statistics Course,” Technology Innovations in Statistics Education, 6(1).