Rater-Mediated Assessment of Iranian Undergraduate Students’ College Essays: Many-Facet Rasch Modelling

Esfandiari, Rajab

doi:10.22049/jalda.2021.27032.1234

Document Type : Research Article

Author

Rajab Esfandiari

Associate Professor, Department of English Language, Faculty of Humanities, Imam Khomeini International University, Qazvin, Iran

https://doi.org/10.22049/jalda.2021.27032.1234

Abstract

In rater-mediated assessments, the ratings awarded to language learners’ written or spoken performances do not necessarily reflect their language abilities, because a number of construct-irrelevant factors may affect the knowledge they demonstrate. Rater subjectivity and rating scales are among the variables that may influence the final results. The purpose of the present study was to examine the extent to which the ratings awarded to university students’ essays reflected the effect of these two factors. To that end, 150 Iranian EFL teachers rated ten five-paragraph essays that BA students had written as a course requirement at Imam Khomeini International University. The raters used two rating scales to rate the essays on a number of assessment criteria. The study rested on a partial rating design, and the Rasch-based computer program FACETS was used to analyze the data. The FACETS analyses showed that raters differed considerably in the severity they exercised when rating the essays. The results also showed rater bias interactions with holistic rating scales. The implications of the findings for proposing procedures to reduce the effects of such extraneous variables are discussed.
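As a rough illustration of what rater severity means in a many-facet Rasch analysis, the sketch below computes rating-category probabilities from the rating-scale form of the model, in which the log-odds of category k over category k-1 equal examinee ability minus rater severity minus criterion difficulty minus the k-th threshold. The function names and all numeric values are hypothetical and chosen only for illustration. They are not taken from the study, and this is not the estimation procedure that FACETS performs.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Category probabilities under a rating-scale many-facet Rasch model.

    Assumes log(P_k / P_{k-1}) = ability - severity - difficulty - thresholds[k-1],
    with all quantities in logits. Returns probabilities for categories
    0..len(thresholds).
    """
    location = ability - severity - difficulty
    # Running sums of (location - tau_k) give each category's log-numerator;
    # category 0 has log-numerator 0 by convention.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + location - tau)
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_rating(ability, severity, difficulty, thresholds):
    """Model-expected rating: sum of category index times its probability."""
    probs = category_probabilities(ability, severity, difficulty, thresholds)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values (logits): one essay, one criterion, two raters.
thresholds = [-1.0, 0.0, 1.0]  # a four-category scale, scored 0 to 3
lenient = expected_rating(0.5, -1.0, 0.0, thresholds)  # lenient rater
severe = expected_rating(0.5, 1.0, 0.0, thresholds)    # severe rater
```

Under these assumed values the same essay receives a lower expected rating from the more severe rater, which is the kind of construct-irrelevant variation the analysis is meant to separate from the writer's ability.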

Keywords

20.1001.1.2383591.2021.9.1.6.7

Main Subjects

Language Assessment and Testing

Article Title [Persian]

Using the Many-Facet Rasch Model to Examine the Essays of Undergraduate English Students in Rater-Mediated Assessments

Author [Persian]

Dr. Rajab Esfandiari

Associate Professor, Department of English Language, Faculty of Humanities, Imam Khomeini International University, Qazvin, Iran

Abstract [Persian]

In rater-mediated assessments, the scores awarded to language learners’ written or spoken performances do not necessarily reflect their language ability, because other factors can affect the final results. Rater subjectivity and rating scales are among the factors that influence the ability learners appear to have. The present study examines these factors in the scores awarded to students’ essays. To this end, 150 Iranian raters were asked to evaluate ten essays that undergraduate students had written in an “Essay Writing” course, using rating scales and assessment criteria. The data were analyzed with the Facets software, and the results showed that raters exercised different degrees of severity when scoring. The results also indicated that raters showed bias toward the holistic rating scale. The pedagogical implications of the findings for reducing rater bias toward rating scales and assessment criteria, and thereby improving scoring, are discussed.

Keywords [Persian]

Holistic scale
Analytic scale
Bias
Rater subjectivity
Severity

Journal of Applied Linguistics and Applied Literature: Dynamics and Advances

Volume 9, Issue 1
April 2021
Pages 93-119
