Tuesday, January 30, 2024

Part 5: Research? Science? An AI Example

So, we're trying to find the research behind AI scoring: the stuff we need to know as practitioners. The previous posts describe how the participants at CREST evaluated the citations provided on Pearson's website about Automated Scoring. As we looked at each citation, we developed suggestions for understanding and evaluating what is research and what isn't. Here's the next citation and evaluation:

Citation: Foltz, P. W. (2007). Discourse coherence and LSA. In T. K. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), LSA: A road to meaning. Mahwah, NJ: Lawrence Erlbaum Publishing.

Here are our notes:

Authors: Foltz directs research at Pearson Technologies Group. He was a pioneer in AI scoring.


Source: LSA: A Road to Meaning, published by Lawrence Erlbaum Publishing. In my limited experience, Erlbaum is a respected publisher of research.


Validity: The chapter referenced is in the section of the text about various essay scorers. The chapters in that section describe the tools; they are not peer-reviewed scientific research about the tools. The previous and following sections of the book do address psychometric issues in assessment and scoring.

Purpose: From the abstract to this chapter, we learn that the chapter is not really about using AI to score writing but about modeling coherence by comparing one section of text to the next: “For all coherent discourse, however, a key feature is the subjective quality of the overlap and transitions of the meaning as it flows across the discourse. LSA provides an ability to model this quality of coherence and quantify it by measuring the semantic similarity of one section of text to the next.” It’s not really research on how well AI scoring works in assessment, which is how the website presents it.

Again, we see some places that we need to go to find the research. And, we see that comparing semantic quality of one section or text to another isn't really the same as giving a kid a grade on a standardized assessment that will impact their graduation and the funding of their school community. Bias remains. And we still haven't found the research and science that we're looking for. 
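
To make "measuring the semantic similarity of one section of text to the next" a little more concrete, here is a minimal sketch in Python. Fair warning: this is not LSA and it is not how Pearson's scorer works. LSA would first map word counts into a reduced semantic space (using singular value decomposition) before comparing sections; this sketch skips that step and compares raw word counts with cosine similarity. The function names and the three sample sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words (a plain bag of words)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine between two word-count vectors; 0.0 if either one is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacent_similarities(sections):
    """Similarity of each section to the section that follows it."""
    vectors = [word_counts(s) for s in sections]
    return [cosine_similarity(x, y) for x, y in zip(vectors, vectors[1:])]


# Hypothetical sample text, invented for this illustration.
sections = [
    "The dog chased the ball across the yard.",
    "The ball rolled under the fence and the dog barked.",
    "Interest rates rose sharply last quarter.",
]
scores = adjacent_similarities(sections)
print(scores)
```

Look at what comes out: one number for each pair of neighboring sections, higher when they share more words and zero when they share none. It is a measure of overlap. Nothing in it is a grade.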

Discern Original Purpose

Furthermore, the abstract tells us something about the purpose (even if it is from 2007): 

LSA, the method behind the essay assessor, was created to help machines understand human language well enough to do what people wanted them to do. Reminds me of Oppenheimer. Understanding the atom can do a lot of good. But it can also be used to do harm. LSA was developed to help a machine understand language. I can't see that its purpose was to assign a score in an assessment regime. It also reminds me of that guy in Jurassic Park. Have we gotten so excited that we can do something that we forgot to think about whether or not we should?


