Fairness, Justice, and Language Assessment: The Role of Measurement

Tim McNamara, Ute Knoch and Jason Fan
Oxford University Press 2019
See page 88 for details

 

Assessment is an important part of most language teachers’ roles, and research shows that language teachers typically spend between a third and half of their time on assessment and related activities. Given that assessment takes up such a large part of language teachers’ time, it is surprising how little attention it gets in teacher training courses and qualifications. Language assessment literacy has been identified as a key professional development need for language teachers. For this reason, books such as Fairness, Justice, and Language Assessment: The Role of Measurement are a much-needed resource for ongoing professional development.

The authors, based at Melbourne’s Language Testing Research Centre (LTRC), will be familiar names to those teachers already interested in language testing. Tim McNamara’s Language Testing (Oxford University Press, 2000) is a great introduction to the field, and Language Testing: The Social Dimension (John Wiley & Sons, 2006), which focuses on social aspects of language testing, provides a good foundation for this book to build upon. According to the authors, the two main goals of the book are to ‘explore the difference between fairness and justice in language assessment’ and to demonstrate how test fairness can be shown and improved using techniques such as Rasch measurement (no, not something that a doctor would do with a ruler, but more on this later!).

It should be noted that this book is most definitely not an ‘introduction to testing’ book, but that shouldn’t put teachers off reading it. The book consists of nine chapters which effectively introduce the reader to the fairness and justice of language tests in relation to test validity and then guide the reader through the role and application of Rasch measurement and analysis, and how this leads to improvements in test fairness.

The book starts with an introduction in which the authors outline the importance of using test analysis to determine test quality. In Chapter 2, test validity, justice and fairness are discussed, with commentary on the social role of language tests.

Chapter 3 provides a clear introduction to the basic Rasch model, something that is likely to be unfamiliar to teachers without knowledge of statistics or an MA in Language Testing. The chapter is written for non-experts and, in my opinion, provides one of the clearest guides to Rasch measurement that I have read. The differences between classical test theory (measures such as item difficulty, item discrimination and test reliability, for example) and Rasch measurement, which is a more complex type of test analysis, are clearly explained. The chapter gives guidance on how to conduct a simple Rasch analysis, describes what happens during the analysis, and then explains a range of key aspects of the output and how to interpret them.

Chapters 4 and 5 describe how Rasch measurement can be used beyond the simple ‘right or wrong’ answers whose analysis was the focus of Chapter 3. The authors present a (fairly simple) guide to applying and understanding a more complex type of Rasch analysis known as the many-facets Rasch model. This can be used for investigating measures such as inter-rater reliability (the level of agreement between two or more raters in their assessment of a test taker’s performance), which is important for establishing the fairness and reliability of speaking and writing assessments, for example. These chapters are well supported by numerous figures and tables that help explain areas that may be new to most language teachers.

Chapter 6 explains how the findings of Rasch analysis can be applied to demonstrate test fairness (or unfairness). While valuable, I found this chapter the hardest going, as it was pretty dense (citing a lot of previous research) and it probably included just a few too many acronyms for me (DIF, UDIF, NUDIF, EBB1 and EBB2, anyone?). This is probably a chapter that would be useful to re-read after I have actually carried out some Rasch analysis, so worth returning to later.

Chapters 7 and 8 are definitely aimed at teachers who already have more advanced knowledge of Rasch measurement and ‘who have an interest in gaining a deeper knowledge of the area’. Like earlier chapters, both are clearly organised and well supported by visuals, and even though they are aimed at those with more expertise, I found these chapters easier to digest and follow than Chapter 6.

The book concludes by reflecting on the need to balance fairness and justice in language testing, and nicely clarifies how tests can be just but not fair. Although the examples given relate to high-stakes tests (such as IELTS), the message is equally important to those of us involved in designing and delivering institutional tests that can determine whether students are eligible for scholarships or can gain entry into a range of tertiary qualifications.

Overall, I think it would be fair to say that this is probably not a book that would appeal to all language teachers, especially those with little or no experience of statistical analyses of language tests. I would recommend reading it chapter by chapter rather than jumping to a chapter of interest. However, it is definitely a valuable resource for those ‘amateur’ test writers looking to get a better understanding of what their test results actually tell them and the implications of these results in today’s world. Unfortunately, test data analysis is often ignored in low- to medium-stakes assessments. The authors state that this book is intended to develop language testing expertise. Fairness, Justice, and Language Assessment: The Role of Measurement goes some way towards this, explaining the need for test analysis and how to go about it. Worth having on the bookshelf of every teachers’ room.

Mark Dawson-Smith
Mark Dawson-Smith is Team Manager at Wintec in Hamilton, New Zealand and is passionate about language assessment.

 



 

Summary of books reviewed

 

Future issues

If you have any articles on the themes below, I would like to hear from you. Articles should be 1200–1800 words.

 

For any further information contact me at Robert.mclarty@pavpub.com

 

Topics for 2020
