Wednesday, April 17, 2024

Morath's Presentation on Hybrid Scoring: Corrections to Commentary about How Hybrid Scoring Was Communicated

TEA posted this PowerPoint to answer questions about hybrid scoring; Morath presented it this week. Here's the link: https://tea.texas.gov/student-assessment/testing/hybrid-scoring-key-questions.pdf

Inaccurate Information about Communication

On slide 8, the presentation explains that "TEA provided information in other stakeholder group presentations, e.g., Texas Science Education Leadership Association (TSELA), Texas Social Studies Supervisors Association (TSSSA), Texas Council of Teachers of English Language Arts (TCTELA), Coalition of Reading and English Supervisors of Texas (CREST)" in November of 2023.

This is not true. TEA did NOT provide information to TCTELA or CREST in 2023 about hybrid scoring. 

There was NO CREST conference in 2023.

There was a TCTELA conference in February of 2023, but hybrid scoring was NOT mentioned. Here are my notes, because TEA did not release the slides. I also recorded the session on my iPhone and reviewed my personal recording today. There was NO mention of hybrid scoring.

 https://docs.google.com/document/d/1K6SFfwEPeKXO43WHEugVHcQwJSicCVNJqXLlNHRF6AI/edit?usp=sharing 

The FIRST time ELAR folks heard about this was from their Testing Coordinators and from a PDF posted on the TEA website in December of 2023. To reiterate: the top ELAR organizations in Texas did NOT have the information. And from what I know, the regional service centers did NOT cover any of the information about hybrid scoring until after December of 2023. We didn't know. And they didn't tell us in the way described in this presentation. The information on slide 8 is blatantly inaccurate. My momma would call that a lie.

Other Information and Questions

The slides do give us some good information. But we still aren't getting the answers we need for transparency, instructional intervention, or initial instruction.



Thursday, April 11, 2024

79% vs. 8% Zeros: Let's Panic about the Right Thing: Some Facts to Consider about "AI" Scoring

 

Raise Your Hand Texas posted this graphic on April 10, 2024.

There's lots to say initially, and more to say after that. 

First, it's not AI. It's an Automated Scoring Engine. 

It does what it's told and doesn't learn. Not sure that helps anyone feel better. Semantics at this point. To-may-to. To-mah-to.

Second: 79% of re-testers in DECEMBER received zeros on the extended constructed responses.

These are the folks who haven't passed the English II exam all the other times they've taken it and aren't even in English II classes anymore. Here's a scenario:

Senior-level student. End of the first semester of their last year in school. They need the English II credit to graduate. Their school needs them to pass because that gets them points for accountability stuff. They've taken the test this many times: when they were in English II, the summer after their sophomore year, the December of their junior year, the spring of their junior year, the summer after their junior year, and now the December of their senior year. They have the next spring and summer opportunities as well. Yeah. That's 8 times. And...a lot of these kids have been doing this for both English I and English II. 16 times. And...a lot of these kids have been doing this retesting thing for YEARS. AND, some of these kids were in English I, English II, and English III at the same time because they failed the class and the assessments. And...some of these kids were in an English class and a remedial class. And...when they originally took English II, the Extended Constructed Response, Short Constructed Response, and new item types weren't a thing. So they haven't had initial instruction or experience on much of any of it. Only remediation, if all the rules were followed.

And they made a zero on the ECR in December of 2023 when most folks didn't know that a computer would be scoring the responses. While the lack of information and knowledge about scoring is a problem - a big one - it should be NO surprise that these kids made zeros. It's not the computer's fault. More on that later. 

These kids - the 79% of re-testers who made a zero in December - aren't failing because a computer scored their writing. Most of them didn't write. Some of them (from what we saw on the 2023 responses) wrote nothing, typed placeholders like a single parenthesis, said "IDK", or "Fuck this." They aren't failing because they haven't had instruction. They are failing because they are just DONE with the testing bullshit. They are disenfranchised, tired, and ruined.

The computer isn't the problem: the culture created by the testing regime is the problem. 

Third: So Where Did RYHT Get This Data, and Why Did They Post It Now?

Author's Purpose and Genre, y'all. It's rhetoric; it's propaganda.

Raise Your Hand Texas wants you to VOTE. The post was incendiary and meant to rally the masses and voting base. It's a classic technique for campaigns and lobbying. We SHOULD vote. We SHOULD be angry. But the data posted is a bit deceptive in directing our actions for the kids in front of us. 

Part of their information comes from the TEA accountability report given to the SBOE. Here it is.  And part of it comes from a research portal that I've mentioned in previous blogs. 

Notice also that they are only showing you English II. There are other grades and results out there.

Here's a chart of what you'd find in the TEA Research Portal. I took the data and made a chart. ASE means Automated Scoring Engine. 

Source: TEA Research Portal

Action: Note the differences in the population, the prompt, and the rubric. Yes. There's a problem with the number of kids scoring a zero. But it's probably not a computer problem. Let's dive deeper. 

Fourth: There's More from Texas School Alliance and Others

The Texas School Alliance (and others) have been talking to TEA. You can become a member here. Various news organizations are also reporting and talking to TEA, asking hard questions and receiving answers. And school officials have been able to view student responses from the December 2023 retest.

In researching the reports and ideas, here are some key things to know and understand before acting. 

Difference in the Type of Writing

The big change that caused the zeros probably wasn't the computer scoring thingy. Kids are answering different kinds of prompts than they had been answering before. Before, kids wrote in persuasive or explanatory modes about a topic they knew something about. They used their own schema and probably didn't care much about the topic. Now, kids have to write in argumentative or explanatory modes in a classic essay format or in correspondence (which we haven't seen yet). They have to write from their understanding of a text, not their schema.

Action: We need to make sure kids know how to do the right kind of writing. We need to make sure kids can comprehend the texts. 

The Rubric is Different

The rubric used in 2023 and 2024 retesting is DIFFERENT because the type of writing is different. From discussions with others, people are pretty confused about how that new rubric works and what it means. If teachers are confused, KIDS are confused. There has been little to no TEA training on the rubric other than some samples. 

Action: Really study our kids' responses and the scoring alongside the scoring guides.

The Test ISN'T Harder Overall: SO WHAT? 

You can argue with me on this if you want to. But there's a detailed process all tests go through before and after administration. It's called equating. It's where we get the scale scores, why the cut points for raw scores are different each year, and why we don't know what passing is until after the tests have been scored. 
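If equating sounds abstract, here's a toy version of one classic method (linear equating). To be clear: the numbers and the function below are mine, made up for illustration - STAAR uses more sophisticated IRT-based equating - but the core idea is the same. Scores from each form get placed on a common scale so a harder form doesn't punish kids.

```python
# A toy sketch of LINEAR equating -- not TEA's actual method (STAAR uses
# IRT-based equating). The idea: map a raw score from this year's form
# onto a reference form's scale, so "passing" reflects the same level of
# performance no matter which form a kid happened to get.

def linear_equate(raw_score, new_mean, new_sd, ref_mean, ref_sd):
    """Place a raw score from the new form onto the reference form's scale."""
    z = (raw_score - new_mean) / new_sd   # relative standing on the new form
    return ref_mean + z * ref_sd          # same relative standing on the old scale

# Hypothetical numbers: the new form was harder (lower average raw score),
# so the same raw score equates HIGHER.
print(linear_equate(raw_score=38, new_mean=34, new_sd=8, ref_mean=40, ref_sd=9))  # 44.5
```

And that's why nobody can tell you the passing raw score in advance: the equating runs on the data after kids test.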

The ECR is probably harder. So are some of the SCRs. Some of the item types and questions are harder. Some questions have always been harder than others. That's not new. TEA is right that the test isn't harder overall; it's just that there are other things that matter to living, breathing, feeling humans.

Just because the overall test is not harder, the student experience is not the same. When stuff is new to kids, it's scary. When stuff is hard and scary, and when stuff is high stakes, kids have anxiety. This impacts test scores in ways that are unrelated to psychometrics. Just because the test isn't harder psychometrically doesn't mean the experience isn't psychologically challenging in ways that impact instruction.

Action: We need to do more work preparing students for the testing experience - both the technical, online test details and the social-emotional realities. Especially kids who have previously experienced failure.

People Who Viewed the Zero Responses Agreed with the Computer Ratings

TEA reported before (December 2023) that the ASE is trained and MUST agree with human raters. It's part of their process. 
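TEA hasn't published what "agree" means in numbers. In the automated scoring world, the usual yardsticks are exact agreement, adjacent agreement, and quadratically weighted kappa. Here's a sketch of that kind of check - the scores below are invented, and the 0-5 scale is my assumption, not TEA's published criteria.

```python
# A sketch of a standard human-vs-engine agreement check from the
# automated essay scoring literature. All scores are invented; TEA has
# not published its actual agreement criteria or thresholds.
from sklearn.metrics import cohen_kappa_score

human  = [0, 2, 3, 1, 4, 0, 2, 3, 1, 2]   # human rater scores (0-5 scale assumed)
engine = [0, 2, 3, 2, 4, 0, 2, 2, 1, 2]   # engine scores on the same responses

exact    = sum(h == e for h, e in zip(human, engine)) / len(human)
adjacent = sum(abs(h - e) <= 1 for h, e in zip(human, engine)) / len(human)
qwk      = cohen_kappa_score(human, engine, weights="quadratic")

print(f"exact agreement:    {exact:.0%}")    # engine matched the human exactly
print(f"adjacent agreement: {adjacent:.0%}") # within one score point
print(f"quadratic kappa:    {qwk:.2f}")      # big disagreements penalized more
```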

And the folks I've talked to - the ones who viewed the responses at TEA from kids at their campuses who scored a zero - agree that the papers deserved a zero. Most of them submitted no rescoring requests, or only one or two.

This means that our colleagues agree that the ASE isn't the problem. 

The Hybrid Scoring Study Conducted by TEA

Here's the link.  And here's a screenshot of the Executive Summary on page 5. 

Huh? Here's what I think it means: 

Paragraph One: They studied the Spring 2023 results. The ASE did a good job. They are going to use the results of the study to evaluate future tests - including the one given in December of 2023.

What we don't know: Who did the study? What was the research design? What is "sufficient performance criteria"? Where is the data and criteria for the "field-test programmed" and "operationally held out validation samples"? What do those terms mean and how do they function in the programming and study? 
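My best guess at what "operationally held out validation samples" means, based on standard machine learning practice and NOT on anything TEA has confirmed: you fit the engine on part of the human-scored responses and judge it only on responses it never saw. Something like this:

```python
# A guess at what a "held out validation sample" involves, based on
# standard practice -- TEA hasn't confirmed any of this. The engine is
# fit on some human-scored essays and evaluated ONLY on essays it never
# saw, so the agreement numbers aren't inflated by memorization.
from sklearn.model_selection import train_test_split

essays = [f"response {i}" for i in range(10)]   # stand-ins for real responses
scores = [0, 2, 3, 1, 4, 0, 2, 3, 1, 2]        # invented human ratings

train_x, held_out_x, train_y, held_out_y = train_test_split(
    essays, scores, test_size=0.3, random_state=42
)
# Real pipeline: fit the engine on (train_x, train_y), then report
# agreement only on (held_out_x, held_out_y).
print(len(train_x), "training responses;", len(held_out_x), "held out")
```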

Paragraph Two: They studied the first 50% of the extended constructed responses scored by the automated scoring engine (ASE). The ASE did a good job for the subpops used for accountability purposes.

What we don't know: What is "sufficient performance criteria"? How did the subpops compare and what was used as criteria to know that the comparisons were fair? What are the models? 

Paragraph Three: The way the test is made now (with different texts and prompts each time), the scoring engines will have to be reprogrammed for each test administration AND while scoring is going on. They are going to reprogram on all score points as the data arises during scoring. As changes are made to the models, all the stuff that was scored with the old models will be scored again. (That's good research practice, with both a priori and constant comparative analysis. BUT we don't really know what research protocols they are following, so I'm just guessing here.) As more essays are scored, they'll find new ways that kids answer that confuse the models and need rerouting. Responses that were rerouted before the new codes will be rescored with the new models.
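If my reading of paragraph three is right, the cycle would look roughly like the sketch below. Every name in it is invented for illustration; TEA has not published its actual workflow.

```python
# My sketch of the retrain-and-rescore cycle paragraph three seems to
# describe. All names here are invented; TEA hasn't published its code.

def run_administration(batches, model, retrain, score, needs_human):
    """Score responses batch by batch, rescoring old work when the model changes."""
    machine_scored, routed_to_humans = {}, []
    for batch in batches:                      # responses arrive during scoring
        new_model = retrain(model, batch)      # models update as data comes in
        if new_model is not model:
            model = new_model
            # Anything scored under the old model gets rescored under the new one.
            for r in machine_scored:
                machine_scored[r] = score(model, r)
        for r in batch:
            if needs_human(model, r):          # confusing responses get rerouted
                routed_to_humans.append(r)
            else:
                machine_scored[r] = score(model, r)
    return machine_scored, routed_to_humans
```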

What we don't know: We still don't know how the ASE is programmed and monitored. Like, at all. 

Paragraph Four: Condition codes change based on paragraph three. They'll keep working on changing them and refining that process during current and future administrations for all grades. They will have to, because the grades, texts, prompts, and human ratings all change each administration. The data also changes as more stuff is scored, so the codes have to change as the student responses are scored.

Action: All of us need to read the whole study. And we need to ask more questions. For now, I think there's not a real ASE scoring problem with the zero responses. The 79% that got zeros probably earned them, and we need to look elsewhere for the cause.





Wednesday, April 10, 2024

13 Trips and A Request for Feedback

Note: I wrote this last week during an Abydos session on Building Community. We used a grouping strategy called "Pointing," originally developed by Peter Elbow. But we ran out of time. I'd like to try it out with my friends online so I can use the data to show how we extend the grouping activities for revision and differentiated instruction.

Would you help? Can you read my writing and then "point out" the words and phrases that "penetrate your skull" or seem to be the center of "grabbity" that catch your attention? If you have time, give me an idea of why the words or phrases stuck out to you. 

There are 13 trips for a sprinkler repair. Unlucky? Always. Why?

You never have all the right things. It's the wrong size. Wrong gauge. Wrong thread. Wrong length. Just wrong-wrong-wrong. And that was wrong too.

1. You bought the wrong thing the first time because you were hapless enough to go to the store to buy what you thought you needed before you started digging.

2. You returned to the store to buy the new thing, but when you went back to install it, you realized that thing didn't fit any better than the first thing. You need that other thing that makes the second thing fit. Misfortune again.

3. When you put the adapter on to make the part fit, a foreboding crack undermines your efforts. So now you need more parts to fix that thing too.

4. When fixing the new problem, you use an old tool at a cursed angle and the plastic for the new part snaps. You go buy the new part only to realize upon return home that...

5. Jinxed, you need a different tool so you can remove that thing that broke. So you go back to the store to buy the tool to remove the part that broke so you can use the part you bought the last time.

6. Now that things are in place, you dry fit the parts only to find that the ill-fated glue you bought the last time - only last week - has already turned solid. 

7. And the lid on the other thing that makes the first thing work won't come off. So you need another tool or chemical to open it, or a whole new one, so you just buy both. But there are three choices now for the same product that used to work just fine, and the wretched original is nowhere to be found.

8. When you are digging past the mud and clay, you realize that the blighted person before you must have taped the thing together with Gorilla Glue and an old garden hose and you'll have to fix that part too. 

9. Then you go to put the new part on and realize that some star-crossed do-it-yourselfer laid the new pipe on the old pipe and there isn't room for the replacement connector. So you go back for a flexible bendy part to add to the other parts you just bought.

10. So you dig a bigger trench because you think you might lift up the pipe to make some room with the bendy part you just bought. Avalanches of dirt cover your previous attempts. So you carefully leverage what you have in the dark. Blindly, you lift until you hear a crack and notice that on the other side of your repair there is a T where the damaged pipe branches off in two other directions.

11. So you dig a bigger trench to uncover the new problems. Traipse back to the store to get more things. Things seem to be going fine when you dry fit the parts and use the new glue stuff. Only to realize that you weren't quite ready for the glue. The pipe is still too long, but now the connections are too short, and you'll have to catastrophically cut off the T and replace the whole thing because the pipe is too close to accept a new adapter.

12. Back to the store. For all the new things and other adapter things. You dry fit and double-check everything, only to realize - calamity - when you reach for the glue that you didn't close the lids properly and the gunk is now a dried goo all over your new tools.

13. Since you've already spent the dire equivalent of a pro's fee in funds and wait time, you throw your muddy gloves down into the hole, wipe your bloody hands on your soaked pants, and ask Siri to call the sprinkler repair guy that you should have called before your first, ill-fated trip.


MH's Pointing Feedback: I love the phrase "avalanches of dirt."  It creates a vivid image in my mind.

I like the repeated use of the word "thing."  In real life, I get frustrated with too many uses of "thing," but it fits perfectly here because I'm guessing you don't really know what all the "things" are really called!  It shows from the beginning that you probably shouldn't have been the one making this repair.
The phrase "foreboding crack" tells me "uh-oh."