Monday, May 1, 2023

Readability, TEA and Hemingway, DOK and Worms

Questions from the Field

I wanted to get your thoughts and opinion about reporting DOK and Bloom's information with items and readability formulas with passages.

DOK was not designed for how people are using it. I talked with Norman Web, himself about it. Even got a picture and an autograph. DOK was supposed to match the rigor of the curriculum to the assessment NOT the rigor of the questions themselves. And this distinction is worth noting. Questions are not written at DOK levels, the curriculum is. DOK is supposed to measure a gap between the assessment and the standards. So...writing questions at DOK levels skips important context and grounding from the standards. Does TEA use DOK to write questions? I'd like to see their charts. 

Blooms: This is the model I recommend for Blooms.

Research Ideas and Next Steps: 
  1. When we pair the KS with the SE, what is the Bloom Level? (for each breakout in the assessed curriculum) 
  2. When we look at the item types, what is the connection between Bloom and DOK? (for each item type and TEK) This will have to build over time because we will only have certain item types for certain TEKS for a while. And it will vary by grade. 
  3. Does TEA's interpretation of Bloom and DOK on the assessment match the curriculum? 
  4. Once we can see the gaps/alignment, then we can make some decisions for metrics, practice, and instructional interventions. 
    1. What do these metrics tell us about student performance/growth? 
    2. How do these metrics inform Tier One Instruction? 
    3. How do these metrics help us form progressions of learning and identification of pseudoconcepts that result in refined teaching and intervention for Tier Two Instruction? 
    4. How do these metrics help us write better items and give better feedback to students, parents, and teachers? 
We are taking our cue from TEA to use F/K readability scores and the "Hemingway" app they recommend, so I feel like the info we are collecting is TEA-ish-aligned, but is it the type of readability score you think teachers will want to see or care about?

Thanks for sharing this. I didn't know they were recommending Hemingway. I had to do some research. What do you mean by F/K readability? Flesh-Kincaid? Hemingway uses a similar algorithm as the F/K but somewhat different. It uses the Automated Readability Index. 

Commentary: I think TEA's move here is a good one; however, all readability formulas are flawed. I like that Hemingway's formula uses length of words (loosely associated with vocabulary) and length of sentences (directly associated with T-units and idea density/complexity). 

Note that Hemingway and The Automated Readability Index are not really grade level descriptions that teachers are used to. These numbers are NOT grade level markers that we see in Guided Reading, DRA, Fountas and Pinnell, Reading A-Z, Reading Recovery, or even in Lexiles. These measures do not measure the grade level of the text, but describe the amount of education a reader would need to understand the texts. TEA is using these measures to defend the passages. Teachers use readability measures to match readers to texts they can read easily on their own, texts that are useful for instruction, and texts that will be too frustrating to read alone. It would be a mistake for teachers to use Hemingway to match readers to texts because that's not what it does. 

Hemingway is more about CRAFTING text for readers so they will be successful.  The purpose of the scale is what is important here: how do you write in a way that most people can understand your message and purpose? Writing for people with 9th or 10th grade education levels is ok, but many people aren't that proficient. The Hemingway app and measures help you simplify your writing so that it lands where people with 4th to 6th grade experiences can understand what you intend to convey. Again (as we saw with DOK), we have a disconnect between purpose and how the thing is being used. 

We cannot provide Lexile scores for a few reasons (cost of license being primary), but we can provide some more content-based and not just language-based readability formulas, such as might be seen in Fountas-Pinnell readers.

Lexiles. Eye roll. So many better measures out there. Glad they aren't useful to you.
Content based measures. Hmmmm. That's problematic semantically. I wouldn't say that Fountas-Pinnell readers are content based measures as their levels are also language and text feature based. In ELAR, there really isn't any content past early phonics and some grammar. The rest is process. I know of NO way to measure content levels. 

Do you see a need/want for that among teachers, or is a simple language-based tool like F/K enough in your opinion?

What I see here is the potential for confusion. Already we have a mismatch between TEA recommendations of using Flesch-Kincaid and an app that uses something different. In addition, the semantics and purpose seem similar but have distinctions in practice that confound their use and application with matching students with texts, measuring growth, selecting curriculum materials, writing assessments, and planning instruction for both reading and writing. What a mess! There's a military term I'd like to use here...

Here's another wrench in the works --as if we didn't have enough to worry about:  When you use these formulas to measure readability of questions as units of meaning (like passages), questions are FAR FAR FAR above grade level of any measure. Questions are dense, complex little beasts that no one is talking about at any grade level in any content area. 
Grade 3 Full Length Practice Questions 1-3 analyzed by Hemingway: 
Screenshot 2023-05-01 at 1.52.31 PM.png
Screenshot 2023-05-01 at 1.56.17 PM.png
Screenshot 2023-05-01 at 2.01.17 PM.png

As you can see, using TEA's own recommendation, 3rd graders would need an experience of a fourth or fifth grader to answer the first three questions on the assessment. And that's after reading the passage. The more I look at this stuff, the more I believe we aren't measuring curriculum or student growth or any of the things we think we are measuring.  

Initial thoughts on solutions: 1) Give a language based, classical formula that teachers understand. 2) Give a second, grade level "experience" measure for comprehension and schema or reader's ease. This gives us a chance to help teachers understand what TEA is doing here. (Reminds me of Author's Purpose and Craft. TEA has a specific purpose for their readability stuff. It's about making sure they can defend (to the legislature and parents) what kinds of texts they are using to assess the curriculum. Teachers' have different reasons - ones that have to do with supporting a child and their growth instead of the validity of the assessment. 
 
Secondly, we have been tracking Bloom's and DOK as quickly-trained evaluators (I had a three-hour seminar years and years ago at xxx; xxx has had some haphazard PD as a middle school teacher over the years). As you no doubt know, for a STAAR assessment, we find a lot of DOK 2 / Bloom's "Analyzing" items, and so it seems like it might not be the most useful metric, but we are also not experts and might be missing some subtleties between TEKS and items that would give a more varied report. So my question is two-part. Do you agree that we are likely to see similar DOK/Bloom designations across many items, and if so (or not) is this information you think teachers will want or could use in classroom instruction or reteach? Is the information bang worth the research and editorial review bucks for DOK and Bloom? And perhaps DOK is appropriate and Bloom's not (I kind of lean this way personally)? Sop that's four questions, I guess. :) 

Can-o-worms. Earlier, I described problems with DOK and questions. If you are not matching the question and the curriculum to determine the DOK, then the data you get from that doesn't do what most think it would do. So...that has to be fixed first. 

Do I think we are likely to see similar DOK/Bloom designations across many items? My first response is: People tend to do what they have always done. So yes. TEA tends to do all kinds of things from one design to the next. So no. 

My second response is: How does any of that help the teacher? We see this ongoing work in training for unpacking standards. But honestly, if TEA isn't transparent about what they thing is DOK or Blooms, then we are guessing. Do we have solid instructional practices and understanding of the curriculum that LEADS to success on these descriptions? Labeling them without that seems like a waste of time to me. Teachers might want to put DOK and Blooms as compliance measures for item analysis or in lesson plans, but honestly...what does this change for the instructional impact on students? "Oh, it looks like 75% of our kids missed DOK 2 questions." Now what? 

My third response is this: We haven't even gotten results back. Districts and people are downtrodden and devastated and confused. They are all feeling a little nutty about everything. It's too early to even know or make a good decision. I'm wondering if we are making all of this so much more confusing than it ought to be. Mom always says, "Rest in a storm." That never feels good in a storm. But what good are we going to do if we try to build a DOK nest in a tornado? 

Is this information you think teachers will want or could use in classroom instruction or reteach? 

I don't know. Do we know what kind of instruction fixes DOK problems? I'm not sure we do. Is the DOK or Bloom what's actually causing the problem? For lots of reasons, I don't think so. There are too many variables and too many of them cross pollenate and create varieties of problems that didn't exist before or for everyone. There are SO many instructional implications before we ever get to a question and its level that are not being addressed. It seems counterintuitive to fixate on DOK before we know and repair the underlying issues. 

Here's an example. A local school here decided they wanted a greenhouse and a tennis complex. Funds were acquired. Community was excited. Structures were built. Programs began. Years passed. In the nearby school building, walls and whole classrooms in the school threatened to collapse. Know why? There was no problem with the structure and quality of the building. The contractors had done masterful construction that should have lasted a century or more. The greenhouse and tennis complex were built on the well planned and placed drain fields to take the water away from the sodden clay of our panhandle soil. The problem isn't the structure of the building/question/DOK. The problem is how the whole system worked together. 

Is the information bang worth the research and editorial review bucks for DOK and Bloom? And perhaps DOK is appropriate and Bloom's not (I kind of lean this way personally)? The problem is that we have to make decisions now when we don't have the land survey to tell us how things are built and how we should proceed. 

It's a crapshoot. We might spend a lot of our resources to make something that isn't useful. We might make something that looks good and attracts attention but isn't consequential for helping those we want to serve the most: our students. I'm more "meh" on Bloom's as well. I just can't bring myself to care much about either one when I consider all the things that need to be fixed before labeling a question with a level that we can't validate. I also think the question types themselves indicate their own DOK.