
La confiabilidad de los evaluadores de la prueba escrita de un examen de certificación de español con fines académicos

Pages 168–181 | Received 18 Jan 2018, Accepted 28 Sep 2018, Published online: 17 Dec 2018

Bibliography

  • Attali, Y., W. L. Lewis y M. Steier. 2012. “Scoring with the Computer: Alternative Procedures for Improving the Reliability of Holistic Essay Scoring”. Language Testing 30 (1): 125–141.
  • Bachman, L. y A. Palmer. 2010. Language Assessment in Practice: Developing Language Assessments and Justifying their Use in the Real World. Oxford: Oxford University Press.
  • Barkaoui, K. 2007. “Rating Scale Impact on EFL Essay Marking: A Mixed-Method Study”. Assessing Writing 12 (2): 86–107.
  • Barkaoui, K. 2010a. “Variability in ESL Essay Rating Processes: The Role of the Rating Scale and Rater Experience”. Language Assessment Quarterly 7: 54–74.
  • Barkaoui, K. 2010b. “Explaining ESL Essay Holistic Scores: A Multilevel Modeling Approach”. Language Testing 27 (4): 515–535.
  • Barkaoui, K. 2011. “Think-Aloud Protocols in Research on Essay Rating: An Empirical Study of their Veridicality and Reactivity”. Language Testing 28 (1): 51–75.
  • Becker, A. 2011. “Examining Rubrics Used to Measure Writing Performance in U.S. Intensive English Programs”. The CATESOL Journal 22 (1): 113–130.
  • Bejar, I. 2012. “Rater Cognition: Implications for Validity”. Educational Measurement: Issues and Practice 31 (3): 2–9.
  • Canale, M. y M. Swain. 1980. “Theoretical Bases of Communicative Approaches to Second Language Teaching and Testing”. Applied Linguistics 1 (1): 1–47.
  • Cheng, C., H. C. Yang, H. Wen-Chin y S. Gwo-Ji. 2017. “Learning, Behaviour and Reaction Framework: A Model for Training Raters to Improve Assessment Quality”. Assessment & Evaluation in Higher Education 42 (5): 705–723.
  • Consejo de Europa. 2001. Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge: Press Syndicate of the University of Cambridge.
  • Cumming, A. 2013. “Assessing Integrated Writing Tasks for Academic Purposes: Promises and Perils”. Language Assessment Quarterly 10 (1): 1–8.
  • Deane, P. 2013. “On the Relation between Automated Essay Scoring and Modern Views of the Writing Construct”. Assessing Writing 18 (1): 7–24.
  • DiPardo, A., B. A. Storms y M. Selland. 2011. “Seeing Voices: Assessing Writerly Stance in the NWP Analytic Writing Continuum”. Assessing Writing 16 (3): 170–188.
  • East, M. 2009. “Evaluating the Reliability of a Detailed Analytic Scoring Rubric for Foreign Language Writing”. Assessing Writing 14 (2): 88–115.
  • Eckes, T. 2009. “Many-Facet Rasch Measurement”. En Reference Supplement to the Manual for Relating Language Examinations to the Common European Framework of Reference for Languages: Learning, Teaching, Assessment (Section H), ed. S. Takala. Estrasburgo: Consejo de Europa/División de Política de la Lengua. https://rm.coe.int/1680667a23
  • Eckes, T. 2012. “Operational Rater Types in Writing Assessment: Linking Rater Cognition to Rater Behavior”. Language Assessment Quarterly 9 (3): 270–292.
  • Engelhard, G. y S. A. Wind. 2013. “Rating Quality Studies Using Rasch Measurement Theory”. Research Report 3. Nueva York: The College Board.
  • Enright, M. K. y T. Quinlan. 2010. “Complementing Human Judgment of Essays Written by English Language Learners with E-Rater® Scoring”. Language Testing 27 (3): 317–334.
  • Esfandiari, R. y C. M. Myford. 2013. “Severity Differences among Self-Assessors, Peer-Assessors, and Teacher Assessors Rating EFL Essays”. Assessing Writing 18 (2): 111–131.
  • González, E. F., N. P. Trejo y R. Roux. 2017. “Assessing EFL University Students’ Writing: A Study of Score Reliability”. Revista Electrónica de Investigación Educativa 19 (2): 91–103.
  • Hamp-Lyons, L. 2007. “Worrying about Rating”. Assessing Writing 12: 1–9.
  • Han, T. 2017. “Scores Assigned by Inexpert EFL Raters to Different Quality EFL Compositions, and the Raters’ Decision-Making Behaviors”. International Journal of Progressive Education 13 (1): 136–152.
  • Huang, J. y C. J. Foote. 2010. “Grading between the Lines: What Really Impacts Professors’ Holistic Evaluation of ESL Graduate Student Writing?” Language Assessment Quarterly 7 (1): 37–41.
  • Jonsson, A. y G. Svingby. 2007. “The Use of Scoring Rubrics: Reliability, Validity and Educational Consequences”. Educational Research Review 2 (2): 130–144.
  • Kane, M. T. 2013. “Validating the Interpretations and Uses of Test Scores”. Journal of Educational Measurement 50 (1): 1–73.
  • Knoch, U. 2007a. “Do Empirically Developed Rating Scales Function Differently to Conventional Rating Scales for Academic Writing?” Spaan Fellow Working Papers in Second or Foreign Language Assessment 5: 1–36.
  • Knoch, U. 2007b. Diagnostic Writing Assessment: The Development and Validation of a Rating Scale. Tesis doctoral, University of Auckland.
  • Knoch, U. 2009. “Diagnostic Assessment of Writing: A Comparison of Two Rating Scales”. Language Testing 26 (2): 275–304.
  • Lim, G. S. 2011. “The Development and Maintenance of Rating Quality in Performance Writing Assessment: A Longitudinal Study of New and Experienced Raters”. Language Testing 28 (4): 543–560.
  • Linacre, J. M. 2012. “Facets Tutorial 2”: 1–40. http://www.winsteps.com/a/ftutorial2.pdf
  • Linacre, J. M. 2015. Facets Computer Program for Many-Facet Rasch Measurement, version 3.71.4. Beaverton, OR: Winsteps.com
  • Mendoza, A. 2014. “Las prácticas de evaluación docente y las habilidades de escritura requeridas en el nivel posgrado”. Innovación Educativa 14 (66): 147–175.
  • Mendoza, A. 2015. “La selección de las tareas de escritura en los exámenes de lengua extranjera destinados al ámbito académico”. Revista Nebrija de Lingüística Aplicada 18: 106–123.
  • Mendoza, A. 2017. El argumento de validación de la prueba escrita del examen de español como lengua extranjera para el ámbito académico de la UNAM. Tesis doctoral, Universidad Nacional Autónoma de México.
  • Mendoza, A. 2018. “El uso de Many-Facet Rasch Measurement para examinar la calidad del proceso de corrección de pruebas de desempeño”. Revista Mexicana de Investigación Educativa 23 (77): 597–625.
  • Mendoza, A. y U. Knoch. 2018. “Examining the Validity of the Analytic Rating Scale for a Spanish Test for Academic Purposes Using the Argument-Based Approach to Validation”. Assessing Writing 35: 41–55. https://doi.org/10.1016/j.asw.2017.12.003
  • Myford, C. M. 2012. “Rater Cognition Research: Some Possible Directions for the Future”. Educational Measurement: Issues and Practice 31 (3): 48–49.
  • Myford, C. M. y E. W. Wolfe. 2003. “Detecting and Measuring Rater Effects Using Many-Facet Rasch Measurement: Part I”. Journal of Applied Measurement 4: 386–422.
  • Prieto, G. 2011. “Evaluación de la ejecución mediante el modelo Many-Facet Rasch Measurement”. Psicothema 23 (2): 233–238.
  • Rezaei, A. R. y M. Lovorn. 2010. “Reliability and Validity of Rubrics for Assessment through Writing”. Assessing Writing 15 (1): 18–39.
  • Wang, Z. y L. Yao. 2013. “The Effects of Rater Severity and Rater Distribution on Examinees’ Ability Estimation for Constructed-Response Items”. ETS Research Report No. RR-13-23. Princeton, NJ: Educational Testing Service.
  • Weigle, S. C. 2002. Assessing Writing. Cambridge: Cambridge University Press.
  • Weigle, S. C. y K. Parker. 2012. “Source Text Borrowing in an Integrated Reading/Writing Assessment”. Journal of Second Language Writing 21: 118–133.
  • Wind, S. A. y G. Engelhard. 2013. “How Invariant and Accurate are Domain Ratings in Writing Assessment?” Assessing Writing 18 (4): 278–299.
  • Wind, S. A., C. Stager y Y. J. Patil. 2017. “Exploring the Relationship between Textual Characteristics and Rating Quality in Rater-Mediated Writing Assessments: An Illustration with L1 and L2 Writing Assessments”. Assessing Writing 34: 1–15.
  • Wolfe, E. W. y A. McVay. 2012. “Application of Latent Trait Models to Identifying Substantively Interesting Raters”. Educational Measurement: Issues and Practice 31 (3): 31–37.
  • Zhao, C. G. 2012. “Measuring Authorial Voice Strength in L2 Argumentative Writing: The Development and Validation of an Analytic Rubric”. Language Testing 30 (2): 201–230.
