Sunday, March 3, 2019

Is STAAR Too Hard to Read?

Are STAAR passages too hard to read? I've read several articles and blog posts complaining about how hard the passages are. I both agree and disagree. Apparently, the state government is getting enough flak that they are going to hear some testimony about it on Tuesday. There are problems here that go deeper than just running the passages through a readability formula in Microsoft Word.

TEA conducts external reviews at which teachers vet the passages. Teachers can comment on sentence length and make choices about softening the vocabulary. But there are NO resources available at those sessions to test readability. (And when I did my original study on readability, TEA staff told me they didn't use readability measures and relied on teacher opinions from the external review process.) By the time of the external review, the passages have already been selected and reviewed by TEA staff and testing-company developers. When I went to a prompt review session in April, I learned and wrote about measures they use to evaluate texts before submitting them for review. It was the first time I'd heard how staff and developers vet readability. Perhaps I am behind the 8 ball...

Yet we teachers have known for a long time that readability has always been a problem with these passages. I wrote about it in my master's thesis, Math Problems Don't Read Once Upon a Time. In it, I shared quantitative data showing that math problems on TAKS were written significantly above the level of the same grade's reading passages. (It was published in the R&E Journal. This copy has the editor's comments, so it's pretty fun to read for other reasons too. Thanks, Jill Aufil and Joyce!)

There are TWO major problems here that need to be considered: 

1. No SINGLE quantitative readability measure will be able to pinpoint the reading level of the passage. 
2. HIGH STAKES testing and accountability to rate schools is a key factor in how this stuff plays out with kids in classrooms and on test day.

I lied. THREE: 
3. There are qualitative features of text complexity that must be considered along with culturally relevant features that reflect our diverse population. 


Quantitative Features: 

The quantitative features of each readability measure differ. The one you use depends on your purpose. And you should ALWAYS look at how these formulas are derived and ask yourself: 
  • Does this formula capture meaning and complexity? 
  • How are "they" making decisions about "unique" words and other features? 
  • What's behind the math? 
  • How does this match what I know about the text and my readers? 
More practically, it's powerful to think about these questions: 
  • What does this analysis tell you about your TEKS? (Curriculum)
  • What does this analysis tell you about what you need to teach? (Pedagogy)
  • What does this analysis tell you that readers will/won't need? Can or can't do? (Students' Reading Processes, Schema, and Background)


So I took the first 722 words of Animal Farm and put it through its paces in several readability formulas. 
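For the record, here's roughly what the counting under the hood looks like. This is my own naive Python sketch - the regexes and the vowel-group syllable counter are stand-ins, not any vendor's actual method - and notice how fast even the basic counting goes sideways:

    import re

    def counts(text):
        # Naive heuristics. Every tool implements these differently,
        # which is part of why the scores below disagree.
        words = re.findall(r"[A-Za-z']+", text)
        sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        def syllables(word):
            # Rough vowel-group count; real tools use dictionaries.
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))
        return len(words), len(sents), sum(syllables(w) for w in words)

    print(counts("Mr. Jones, of the Manor Farm, had locked the hen-houses "
                 "for the night, but was too drunk to remember to shut the "
                 "popholes."))
    # -> 24 words ("hen-houses" splits in two) and 2 "sentences"
    #    ("Mr." fools the splitter). Naive counting in action.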

Lexiles: 

The Lexile measure uses mean sentence length and mean word frequency. Long sentences tend to be hard to read; shorter ones tend to be easier. The rarer the words in a text, the more difficult it tends to be. Makes pretty good sense. But sometimes sentences are long and NOT difficult. (Ahem: Alexander and the Terrible, Horrible, No Good, Very Bad Day. And that's just a title.) Animal Farm has a Lexile measure of 1300-1400. When you convert that to grade level (and you should probably know what's behind that math), the book falls at the 11th or 12th grade level. (Ask your fellow English teachers what level that book is usually taught.) It has a mean sentence length of 25.79 words and a mean log word frequency of 3.61.
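Nobody outside MetaMetrics can recompute that 1300-1400, because the formula and the word-frequency corpus behind it are proprietary. But the two raw ingredients look something like this sketch; the frequency numbers here are invented for illustration:

    import math

    # Lexile's actual corpus and calibration are proprietary; this only
    # shows the two raw ingredients. The frequency table is invented.
    def lexile_ingredients(words, num_sentences, freq):
        msl = len(words) / num_sentences                  # mean sentence length
        mlwf = sum(math.log10(freq.get(w.lower(), 1))     # rare words drag
                   for w in words) / len(words)           # this number down
        return msl, mlwf

    toy_freq = {"the": 1_000_000, "farm": 12_000, "popholes": 2}  # made up
    print(lexile_ingredients(["The", "farm", "popholes"], 1, toy_freq))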

Here's the thing: When you look at the actual text, what makes this hard to read really has nothing to do with the sentence length or the word frequency. Here's the first sentence: "Mr. Jones, of the Manor Farm, had locked the hen-houses for the night, but was too drunk to remember to shut the popholes." That's a long sentence. Nothing hard about it, even for kids in 3rd or 4th grade. I've never heard of popholes. Have you? But I can pretty much tell what that means because I've had chickens. Now I know that this is just one sentence. The demands of the text are collective. But I think you can see that there are some features of complexity in this text that have nothing to do with the math behind the 1300-1400 rating. 

Fry: 

Good ole Fry. I got to sit next to him during a round table session once. Felt like I'd met Elvis. Fry takes the average number of syllables per 100 words and the average number of sentences per 100 words. Some fancy division and plotting on a graph and you have a grade level. Animal Farm plots out at about the middle of 9th grade. 140 syllables per 100 words. 4.1 sentences per 100 words. 
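In case you want the arithmetic: here's a sketch of Fry's two inputs, with totals back-solved from the figures above rather than counted fresh, so treat them as approximate:

    # Fry's two inputs, scaled to a 100-word sample. The totals are
    # back-solved from the per-100 figures above, so they're approximate.
    total_words, total_sentences, total_syllables = 722, 30, 1011

    syllables_per_100 = 100 * total_syllables / total_words   # ~140
    sentences_per_100 = 100 * total_sentences / total_words   # ~4.2; reported as 4.1 above
    # Fry plots that (x, y) point on his graph and reads off the grade
    # band it lands in -- there's no closed-form equation behind it.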

Already we have a huge discrepancy in grade-level reporting. Is this text 9th grade? Is this text 12th grade? That'd be pretty important when assigning a text to a particular grade level for high stakes testing. There are people out there writing about how far off the STAAR test is from grade level. It depends on what measure you use. And all of them are flawed. I'm pretty sure a book about Pokemon would score high on the syllable count, and I'm just as sure it's nowhere near the complexity of Animal Farm in any fashion, even if the sentences are long.


Flesch-Kincaid: 

This measure is the one I see most people using. It's pretty important that you know the reasoning and the math behind the rating. There are actually TWO scores here: Flesch Reading Ease and Flesch-Kincaid Grade Level.

Flesch Reading Ease

This scale works with average sentence length and average number of syllables per word. It's pretty popular; the US Department of Defense uses it for its documents, and Florida uses it for insurance policies. A score between 60 and 70 is considered an acceptable standard. Animal Farm comes out at 65.5 for Reading Ease. That's supposed to describe a text written in English plain enough to be understood by 13- to 15-year-olds. That's 8th and 9th grade.

Ah. Now we have justification to put Animal Farm in 8th grade. We might even put it on a test for accountability. But y'all. Look at the math behind this booger. I know that I'm a literacy person and allergic to math, but it doesn't make sense how these numbers could tell you how difficult a text is to read. I'm going to write it in words to emphasize the ridiculous nature of the math. Sorry. Take the total number of words and divide it by the total number of sentences, then multiply that by 1.015. Take the total number of syllables and divide it by the total number of words, then multiply that by 84.6. Subtract both results from 206.835. That's how hard it is to read. What???!!!! Where did 1.015 come from? And 206.835? I know there's a psychometric reason behind it all, but seriously...we're getting pretty far away from how readers make meaning with text here.
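Here's that same math in code, using approximate counts for the excerpt. Note that my counts land the score in the low 60s instead of at the tool's 65.5 - the gap is nothing but disagreement over what counts as a syllable, which rather proves the point:

    # Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    words, sentences, syllables = 722, 28, 1011   # approximate counts

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    print(round(score, 1))   # ~62.2; the tool above reported 65.5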

Flesch-Kincaid Grade Level


The grade-level scale works similarly to the Reading Ease math, just with different constants. Animal Farm comes out to a 10.5 reading level.

But look at the math. Take the total number of words and divide by the total number of sentences, then multiply by 0.39. Add that to the total number of syllables divided by the total number of words, multiplied by 11.8. Then subtract 15.59. Um....Ok. Lots of problems here. The constants weight the two ingredients in ways nobody can explain at a glance, and the grade levels the formula produces have no ceiling - they can run far past 12th grade. How does that help us peg a grade level for our kids and assign a rating for a high stakes test? On top of that, we now have a fourth grade level added to how we might quantify this text.
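The same sketch for the grade-level formula; with my approximate counts it lands near the 10.5 reported above:

    # Flesch-Kincaid Grade Level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    words, sentences, syllables = 722, 28, 1011   # approximate counts

    grade = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    print(round(grade, 1))   # ~11.0, near the 10.5 reported above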

Accelerated Reader

AR uses ATOS. The formula uses word count, average word length, sentence length, and average vocabulary level. It pegs the text at a 7.7 grade level.

So now the text is appropriate for assessment at 7th grade? I know that's not what we do, but seriously - it points out flaws in what we are looking at to assign texts to kids and to evaluate how we assess students. Note also that the average vocabulary level is "3". We should be asking how they derive that number and look at the list (and the research behind that list) they are using to make that decision.

The Guardian posted an article today about Mr Greedy and The Grapes of Wrath. I like what they say.


Multiple Features:

I looked at some others too. Gunning Fog uses sentence length and the percentage of hard words - words with three or more syllables, not counting proper nouns, familiar jargon, compound words, or syllables added by common suffixes. The resulting scale is supposed to show how many years of English education you'd need to understand the text on a first reading. Animal Farm's Gunning Fog is 12.6 - hard to read. Be sure to ask yourself how they determine hard words.
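The published Fog formula, sketched below. Since I didn't hand-count the hard words, that figure is back-solved from the 12.6 score and should be treated as hypothetical:

    # Gunning Fog index:
    #   0.4 * ((words / sentences) + 100 * (hard_words / words))
    words, sentences = 722, 28   # approximate counts
    hard_words = 41              # hypothetical: back-solved from the 12.6
                                 # score, not an actual count

    fog = 0.4 * (words / sentences + 100 * hard_words / words)
    print(round(fog, 1))         # ~12.6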


The Coleman-Liau index uses the number of characters in a word instead of syllables. Some funky math here too: 0.0588 x L - 0.296 x S - 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. It's supposed to tell you what grade level you must have achieved to be able to understand the text. Animal Farm's Coleman-Liau is 8th grade.
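In code, with a character count I didn't actually make - the letters figure is back-solved so the index lands near the grade-8 result, so treat it as hypothetical:

    # Coleman-Liau index: 0.0588 * L - 0.296 * S - 15.8
    #   L = letters per 100 words, S = sentences per 100 words
    words, sentences = 722, 28   # approximate counts
    letters = 3062               # hypothetical: ~4.2 letters per word,
                                 # back-solved to land near grade 8

    L = 100 * letters / words    # ~424
    S = 100 * sentences / words  # ~3.9
    print(round(0.0588 * L - 0.296 * S - 15.8, 1))   # ~8.0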

SMOG is another one that's supposed to tell you how many years of instruction you need to comprehend a text. It counts the words with three or more syllables in a sample of sentences and takes the square root of that polysyllable count (scaled to a 30-sentence sample). Uh...square root, y'all. For comprehension. I don't get it. Animal Farm's SMOG is 8th grade.
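And here's where that square root lives, for the record; the polysyllable count is back-solved from the grade-8 result, not measured:

    import math

    # SMOG grade: 3.1291 + 1.043 * sqrt(30 * polysyllables / sentences)
    sentences = 28        # approximate count
    polysyllables = 20    # hypothetical: words of 3+ syllables,
                          # back-solved from the grade-8 result

    smog = 3.1291 + 1.043 * math.sqrt(30 * polysyllables / sentences)
    print(round(smog, 1))  # ~8.0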



High Stakes and Readability

We are NOT going to find a SINGLE readability measure that evaluates the complexity of a text. We might be able to use T-units to help us measure how texts grow in complexity from one grade to the next, but I think that's still flawed, as there are MULTIPLE measures that constitute text complexity. In terms of readability formulas for quantitative purposes, it's important to consider multiple measures. It's important to consider the qualitative and cultural measures in terms of how reading intersects with our humanity. It's important to think about how these measures are used and for what purpose. If we are going to have a psychometrically valid assessment, I'm not sure the field has developed a valid measure of text complexity that justifies how we are using the results to measure student progress or the effectiveness of schools' instructional programs.

Qualitative and Culturally Responsive Measures

The lowest measure of Animal Farm that I found pegged the text at 7th grade. It's not a good choice for a 7th grader. And you know it. There are rubrics out there to evaluate the qualitative features of texts. We should be using them. But here's something else I want to leave you with. I was listening to Kylene Beers yesterday. She described a scenario where a student was reading a state assessment passage about a volcano. The student skipped a question about the lava cascading down the mountain. When they asked him why he skipped it, he said something like this: I just couldn't figure out what dishwasher soap had to do with lava. Cascade.

For high stakes assessment, we are NOT going to be able to find texts that are at the right grade level for each kid. Our language is too rich. Meaning is too nuanced. Words are too dependent on context, language of origin, regional characteristics and use, personal knowledge and background, etc. Assessing a kid with a text is more about what the kid DOES with the text than what the text is on its own. Find a way to measure that, and fix the ways we are picking texts for assessment. Until then, let's back off the high stakes and let kids read and write. I think you'll find that they can and will. At much higher levels than you can ever measure.


Note to Legislators: You will continue to receive criticism about text complexity until other issues are resolved and developed. It doesn't matter what readability formula or machine you use; people can put the texts into any of them and come up with numbers to dispute the grade level a passage appears at on the state assessment. For right now:
1. Understand what is implied by each formula.
2. Increase transparency at the agency during external review - provide multiple statistics/readability results for passages under review so teachers can see them and use them as resources for making decisions.
3. Call for more research and development in how we can better select and measure the complexity of the texts students are exposed to for assessment purposes.



