Investigating Rater Perceptions in the Assessment of Speaking
Keywords: assessment, speaking, assessment of oral production, reliability, bias
Abstract
In the assessment of spoken production, numerous factors can be identified behind the decisions raters make when evaluating samples of oral performance. Inter- and intra-rater factors are relatively well documented in reliability and validity studies. Some of those identified in the literature involve the effects of examinee pairing or of familiarity with the examinees; others point to gender and gender-role perceptions (O’Sullivan 2008); still others appear to be connected with the body language and non-verbal cues that accompany oral production (cf. Krahmer and Swerts 2004, Seiter, Weger, Jensen and Kinzer 2010). While some studies of the assessment of spoken English in exam contexts suggest that raters may feel less comfortable assessing pronunciation than other aspects of a speaker’s performance (Orr 2002, Hubbard, Gilbert and Pidcock 2006, Brown 2006, De Velle 2008), more recent investigations of rater behaviour, drawing on electronic evidence from training, standardisation maintenance and online examination programmes, tentatively show that pronunciation is in fact the first category examiners attend to (Hubbard 2011, Chambers and Ingham 2011, Krakowian 2011, Seed 2012, Tynan 2015, Kang and Ginther 2019). This paper examines a large collection of assessments stored in an electronic system to investigate what raters actually seem to attend to when ostensibly following rating scales.
