Tuesday, August 15, 2023

2023 STAAR Extended Constructed Response Analysis, Interpretations, and Recommendations

The extended constructed response seems like a hinge-point because success depends on what kids DO while they are reading and responding. Success on these items can be a gateway to success on the other items. 

So - here are some things to think about as you interpret results, design interventions, and plan initial instruction. 

By the way - If you didn't know, you have access to ALL the writing students have done on the exam. It's important to print those out for the kids you have. Revision is the best way to improve writing. And it's a good place to start in helping students become aware of their thinking and reasoning. Eventually, we'll have pdfs of it all. Right now, we're having to clip, cut, and paste. Your testing coordinator will be able to give you access in the system to see student results and will have the pdfs eventually. 

Here's a hyperlinked PDF of the analysis that you can use in your PLCs. 

Saturday, August 12, 2023

So many ECR Zeros on STAAR? Why?

Check out these scorepoint graphs: 



Why so many zeros? Well...it was new. And 3rd graders are NINE. But perhaps we need to look at the obvious...or what is obviously hidden before we start diving into TEKS.





Here's an analogy - did you know that if you break open a cottonwood twig, there is a STAR of Texas hidden inside? (Project Learning Tree has a cool lesson about it here.) The star was there all along, but you didn't know it was there. 

In the TECH world, they study the "user experience." Kids all over the state told us that they didn't have an ECR on the assessment. Why did they think that? Remember - most kiddos are taking this test on a tiny Chromebook screen with no mouse. Here's what they saw: 


Semantics: 

Students are asked to write an argumentative essay. Not an ECR. Could the problem be semantics? Kids need to know they will be asked to write a composition or an essay. (And not an S A. Before my kids saw the word typed out, they thought I was just saying two letters.) ECR is assessment item-type vocabulary and is not used with kids on the test. Same goes for passages - that's what we call them, but that's not the academic language used on the assessment - STAAR says "selections." Our academic synonyms might be causing some of the confusion. 

User Experience: 


And...did they SEE a place to type the essay? Sure, there's a little arrow that says to scroll down. But on some computers, the scrollbar that shows there is more to see doesn't appear until you hover over the right-hand side of the screen. On other computers it does. 

That's a wonky user experience for something kids aren't familiar with. With which kids aren't familiar. Whatever. You get the point. 

Our solution: Kids need to be beyond familiar with the TECH elements. They need to be fluent with them. And our words need to match their experience - "You'll be asked to write an essay or a composition. You'll need to scroll down or use your arrow keys to see where you need to type." 



Font and Text Features: 

Another problem is the way the paired passages appear. 

See how these scroll on the same "page"? It looks like "Laws for Less Trash" might be a subheading for "Rewards for Recycling." 
That's a problem. Kids need to make sure they are attending to the bold material at the beginning of the passage and know that "selections" means two different passages. Another semantics issue. 
But this is also a text feature issue. I didn't see any subheadings in the third grade test, so I had to go to 4th grade. 

See how the titles of the passages are center-justified and set in a larger font? Now compare the subheading: left-justified and smaller. These are text features that serve as cues for readers about what the author is doing, as well as when a new passage/selection appears. 

It will be important to teach kids how to understand and decode the text features and navigation elements (like that little, almost invisible grey arrow on the right-hand side of each screen that says there's more below and you need to scroll down). 



Using Cambium Components for Self-Regulation: 


The very top row is designed to help students see how much of the test they have completed, with the blue bar and the percentage complete. They know how much energy they need to use and how much time to save. But they can also see where they have and have not answered questions. The little red triangle tells kids where they have NOT answered questions. This would have been a huge cue to students that they'd missed the essay question. But did they know it was there? Were they fluent in using the tools to monitor their progress and to check for completion? 

Before we start digging into TEKS (and especially accountability ratings and class rosters), let's do some talking with kids about their experience and their approach when taking the exam. Our solutions can start with modeling how the platform works and using similar tools during daily classroom instruction so that students are fluent with technology experiences beyond a familiarity with a tutorial or mention of the tools. Let's make sure students understand the user experience and how to use the tools to enhance their comprehension and demonstration of grade level curriculum. Until then, they're walking in a Texas creekbed under the cottonwoods, not knowing about the hidden treasures all around them. 



Monday, May 1, 2023

Readability, TEA and Hemingway, DOK and Worms

Questions from the Field

I wanted to get your thoughts and opinion about reporting DOK and Bloom's information with items and readability formulas with passages.

DOK was not designed for how people are using it. I talked with Norman Webb himself about it. Even got a picture and an autograph. DOK was supposed to match the rigor of the curriculum to the assessment, NOT the rigor of the questions themselves. And this distinction is worth noting. Questions are not written at DOK levels; the curriculum is. DOK is supposed to measure a gap between the assessment and the standards. So...writing questions at DOK levels skips important context and grounding from the standards. Does TEA use DOK to write questions? I'd like to see their charts. 

Bloom's: This is the model I recommend for Bloom's.

Research Ideas and Next Steps: 
  1. When we pair the KS with the SE, what is the Bloom Level? (for each breakout in the assessed curriculum) 
  2. When we look at the item types, what is the connection between Bloom and DOK? (for each item type and TEK) This will have to build over time because we will only have certain item types for certain TEKS for a while. And it will vary by grade. 
  3. Does TEA's interpretation of Bloom and DOK on the assessment match the curriculum? 
  4. Once we can see the gaps/alignment, then we can make some decisions for metrics, practice, and instructional interventions. 
    1. What do these metrics tell us about student performance/growth? 
    2. How do these metrics inform Tier One Instruction? 
    3. How do these metrics help us form progressions of learning and identification of pseudoconcepts that result in refined teaching and intervention for Tier Two Instruction? 
    4. How do these metrics help us write better items and give better feedback to students, parents, and teachers? 
We are taking our cue from TEA to use F/K readability scores and the "Hemingway" app they recommend, so I feel like the info we are collecting is TEA-ish-aligned, but is it the type of readability score you think teachers will want to see or care about?

Thanks for sharing this. I didn't know they were recommending Hemingway. I had to do some research. What do you mean by F/K readability? Flesch-Kincaid? Hemingway uses an algorithm similar to F/K, but not the same one: it uses the Automated Readability Index. 
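If you want to see how close the two formulas really are, here's a minimal Python sketch of both. The published coefficients are the standard ones; the syllable counter is a crude vowel-group heuristic (real tools do this more carefully), so treat the F/K numbers as approximate:

```python
import re

def _words(text):
    return re.findall(r"[A-Za-z']+", text)

def _sentences(text):
    # Count runs of terminal punctuation; assume at least one sentence.
    return max(1, len(re.findall(r"[.!?]+", text)))

def _syllables(word):
    # Crude heuristic: count vowel groups. Real syllabification is harder.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def automated_readability_index(text):
    # ARI (the formula behind Hemingway's score): characters per word
    # and words per sentence.
    words = _words(text)
    chars = sum(len(w) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / _sentences(text)) - 21.43

def flesch_kincaid_grade(text):
    # F/K grade level: words per sentence and syllables per word.
    words = _words(text)
    syllables = sum(_syllables(w) for w in words)
    return 0.39 * (len(words) / _sentences(text)) + 11.8 * (syllables / len(words)) - 15.59
```

Both formulas reward short words and short sentences, but they weight them differently, so the same passage can land a grade level or two apart on the two scales.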

Commentary: I think TEA's move here is a good one; however, all readability formulas are flawed. I like that Hemingway's formula uses length of words (loosely associated with vocabulary) and length of sentences (directly associated with T-units and idea density/complexity). 

Note that Hemingway and the Automated Readability Index are not really the grade-level descriptions teachers are used to. These numbers are NOT the grade-level markers we see in Guided Reading, DRA, Fountas and Pinnell, Reading A-Z, Reading Recovery, or even in Lexiles. These formulas do not measure the grade level of the text; they describe the amount of education a reader would need to understand it. TEA is using these measures to defend the passages. Teachers use readability measures to match readers to texts they can read easily on their own, texts that are useful for instruction, and texts that will be too frustrating to read alone. It would be a mistake for teachers to use Hemingway to match readers to texts because that's not what it does. 

Hemingway is more about CRAFTING text for readers so they will be successful.  The purpose of the scale is what is important here: how do you write in a way that most people can understand your message and purpose? Writing for people with 9th or 10th grade education levels is ok, but many people aren't that proficient. The Hemingway app and measures help you simplify your writing so that it lands where people with 4th to 6th grade experiences can understand what you intend to convey. Again (as we saw with DOK), we have a disconnect between purpose and how the thing is being used. 

We cannot provide Lexile scores for a few reasons (cost of license being primary), but we can provide some more content-based and not just language-based readability formulas, such as might be seen in Fountas-Pinnell readers.

Lexiles. Eye roll. So many better measures out there. Glad they aren't useful to you.
Content based measures. Hmmmm. That's problematic semantically. I wouldn't say that Fountas-Pinnell readers are content based measures as their levels are also language and text feature based. In ELAR, there really isn't any content past early phonics and some grammar. The rest is process. I know of NO way to measure content levels. 

Do you see a need/want for that among teachers, or is a simple language-based tool like F/K enough in your opinion?

What I see here is the potential for confusion. Already we have a mismatch between TEA recommendations of using Flesch-Kincaid and an app that uses something different. In addition, the semantics and purpose seem similar but have distinctions in practice that confound their use and application with matching students with texts, measuring growth, selecting curriculum materials, writing assessments, and planning instruction for both reading and writing. What a mess! There's a military term I'd like to use here...

Here's another wrench in the works - as if we didn't have enough to worry about: When you use these formulas to measure the readability of questions as units of meaning (like passages), questions are FAR FAR FAR above grade level on any measure. Questions are dense, complex little beasts that no one is talking about, at any grade level, in any content area. 
Grade 3 Full Length Practice Questions 1-3 analyzed by Hemingway: 
[Screenshots of the Hemingway analyses for Questions 1, 2, and 3]

As you can see, using TEA's own recommendation, 3rd graders would need the experience of a fourth or fifth grader to answer the first three questions on the assessment. And that's after reading the passage. The more I look at this stuff, the more I believe we aren't measuring curriculum or student growth or any of the things we think we are measuring.  

Initial thoughts on solutions: 1) Give a language-based, classical formula that teachers understand. 2) Give a second, grade-level "experience" measure for comprehension and schema or reader's ease. This gives us a chance to help teachers understand what TEA is doing here. (Reminds me of Author's Purpose and Craft. TEA has a specific purpose for their readability stuff. It's about making sure they can defend - to the legislature and parents - what kinds of texts they are using to assess the curriculum. Teachers have different reasons - ones that have to do with supporting a child and their growth instead of the validity of the assessment.) 
 
Secondly, we have been tracking Bloom's and DOK as quickly-trained evaluators (I had a three-hour seminar years and years ago at xxx; xxx has had some haphazard PD as a middle school teacher over the years). As you no doubt know, for a STAAR assessment, we find a lot of DOK 2 / Bloom's "Analyzing" items, and so it seems like it might not be the most useful metric, but we are also not experts and might be missing some subtleties between TEKS and items that would give a more varied report. So my question is two-part. Do you agree that we are likely to see similar DOK/Bloom designations across many items, and if so (or not) is this information you think teachers will want or could use in classroom instruction or reteach? Is the information bang worth the research and editorial review bucks for DOK and Bloom? And perhaps DOK is appropriate and Bloom's not (I kind of lean this way personally)? So that's four questions, I guess. :) 

Can-o-worms. Earlier, I described problems with DOK and questions. If you are not matching the question and the curriculum to determine the DOK, then the data you get from that doesn't do what most think it would do. So...that has to be fixed first. 

Do I think we are likely to see similar DOK/Bloom designations across many items? My first response is: People tend to do what they have always done. So yes. TEA tends to do all kinds of things from one design to the next. So no. 

My second response is: How does any of that help the teacher? We see this ongoing work in training for unpacking standards. But honestly, if TEA isn't transparent about what they think is DOK or Bloom's, then we are guessing. Do we have solid instructional practices and understanding of the curriculum that LEAD to success on these descriptions? Labeling them without that seems like a waste of time to me. Teachers might want to put DOK and Bloom's as compliance measures for item analysis or in lesson plans, but honestly...what does this change for the instructional impact on students? "Oh, it looks like 75% of our kids missed DOK 2 questions." Now what? 

My third response is this: We haven't even gotten results back. Districts and people are downtrodden and devastated and confused. They are all feeling a little nutty about everything. It's too early to even know or make a good decision. I'm wondering if we are making all of this so much more confusing than it ought to be. Mom always says, "Rest in a storm." That never feels good in a storm. But what good are we going to do if we try to build a DOK nest in a tornado? 

Is this information you think teachers will want or could use in classroom instruction or reteach? 

I don't know. Do we know what kind of instruction fixes DOK problems? I'm not sure we do. Is the DOK or Bloom's what's actually causing the problem? For lots of reasons, I don't think so. There are too many variables, and too many of them cross-pollinate and create varieties of problems that didn't exist before or for everyone. There are SO many instructional implications before we ever get to a question and its level that are not being addressed. It seems counterintuitive to fixate on DOK before we know and repair the underlying issues. 

Here's an example. A local school here decided they wanted a greenhouse and a tennis complex. Funds were acquired. Community was excited. Structures were built. Programs began. Years passed. Then, in the nearby school building, walls and whole classrooms threatened to collapse. Know why? There was no problem with the structure and quality of the building. The contractors had done masterful construction that should have lasted a century or more. The greenhouse and tennis complex had been built on top of the well-planned, well-placed drain fields meant to take water away from the sodden clay of our panhandle soil. The problem isn't the structure of the building/question/DOK. The problem is how the whole system works together. 

Is the information bang worth the research and editorial review bucks for DOK and Bloom? And perhaps DOK is appropriate and Bloom's not (I kind of lean this way personally)? The problem is that we have to make decisions now when we don't have the land survey to tell us how things are built and how we should proceed. 

It's a crapshoot. We might spend a lot of our resources to make something that isn't useful. We might make something that looks good and attracts attention but isn't consequential for helping those we want to serve the most: our students. I'm more "meh" on Bloom's as well. I just can't bring myself to care much about either one when I consider all the things that need to be fixed before labeling a question with a level that we can't validate. I also think the question types themselves indicate their own DOK. 

Sunday, March 12, 2023

Should we take the interim? And then what? Part One

Draft

Should we take the interim? 

Yes. 

Reason One: It's a giant research study to see if it is possible to measure growth over time instead of on a one-day, high-stakes assessment. That's what the legislation originally asked TEA to study. Part of me wants to say it can be done. And that's not really how the interim is being used right now. Reminds me of Dr. Ian Malcolm in Jurassic Park, who says, "Your scientists were so preoccupied with whether they could, they didn't stop to think if they should." Right now, the interim is supposed to predict the likelihood of passing the STAAR at the end of the year. So many variables are in place socio-emotionally, culturally, academically, and within each subject domain and test design, that I fear we are not measuring what we think we are anyway. It's a good idea to see if this works or not. 

Reason Two: Prove to them that teachers are the ones who know best, not an assessment. I'd really like to see the data that says it predicts what it says it does. But from what I've seen in reports from campuses last year and their STAAR results, the interim data didn't match any of the projections for ELAR. So...let's take the thing and then bust out our best to make sure kids are learning beyond the progressions and predictions. 

Reason Three: It gives the kids an "at bat" with the new format and item types. I'm ok with that rationale...except: Have we explicitly taught digital reading skills and transfer of knowledge and strategies for the new item types, TEKS, and skills? Have we had enough massed and distributed practice on these skills before weighing the baby again? If we used the interim as an instructional tool, maybe. We could use the interim as a guided or collaborative practice. But as another source of decision-making data? Not sure it accomplishes our goals to make kids do things alone that we already know they don't have enough experience to do well. Sounds like a good way to disenfranchise struggling learners with further beliefs about how dumb they are. It's like signing up for fiber internet and paying for it before the lines get to your neighborhood.  

No. It's a giant waste of time for kids and teachers. 

Reason One: After examining the data, I have NO idea what I'm supposed to do in response to help the teachers or the kids. More on that later. 

Reason Two: It's demeaning and demoralizing. Do I really want to tell a kid in March, a few days before the real deal, that they have abysmal chances of meeting expectations? Do I really want to tell teachers that x of their z kids aren't in the right quadrant to show growth when they have less than two weeks after spring break to do something about it? If they even believe that the kids took the exam seriously? They already know the kids didn't use their strategies and purposefully blew off three or more days of precious instructional time while taking the dang thing. 

Reason Three: Did we do something about the last data we collected on the interim? Do the kids know their results? Have they made a plan to improve? Do we have a specific plan? Have we fixed the problems that caused the first set of results? People are having data digs and meetings to tell teachers what to do and how these predictions are going to play out for accountability. We're having tutorial programs and hours to meet 4545. We're doing some stuff, but is it really a detailed and reasoned response to resolve the causes of the data? Have we fed the baby enough to cause weight gain before weighing it again? No. 

Reason Four: The data is correlational, not causal. The data on the interim tells us the correlations between one data collection (STAAR last year) and the next assessment. Results are correlated to the probability of success or failure and do not pinpoint the cause of the success or failure. When working with human subjects, is it humane to use correlational data to make instructional decisions about nuanced human intricacies - for individuals in such complex settings, under soul-crushing accountability for personal and collective judgments? 

An additional problem with the interim is that you don't have a full trend line until you have three data points. Statistically, it doesn't make sense to take last year's STAAR results (which was a different test using different standards) and pair it with a second interim. There is no trend line until the third assessment even if the assessments were measuring the same thing. 
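A quick way to convince yourself of the three-point rule: a straight line fits any two points perfectly, so a two-point "trend" contains zero evidence about fit. Only a third point can disagree with the line. Here's a tiny sketch (ordinary least squares, with hypothetical score data):

```python
def fit_line(points):
    """Ordinary least-squares fit; returns (slope, sum of squared residuals)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    return slope, sse

# Two administrations: the line is forced through both points exactly,
# so the residual is always zero - no information about how well the "trend" fits.
_, sse_two = fit_line([(1, 42), (2, 61)])

# A third administration can finally reveal misfit (noise vs. real trend).
_, sse_three = fit_line([(1, 42), (2, 61), (3, 45)])
```

With two data points the residual is zero by construction; any apparent growth could be signal or could be noise, and the math cannot tell you which until a third point arrives.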

Yet that's what teachers were asked to do: treat the data as an indictment of their instructional practices and resulting student performance, based on numbers that don't mean what they were told they meant. Furthermore, teachers are told to revisit previous CBAs and other data to determine what needs reteaching. The advice is well-meaning, but in practice it is too unwieldy and flawed to do anything other than make teachers want to pull their hair out and cry out in desperation and stress. 

More on that in Part Two: We took the interim. Now what? 

Sunday, January 15, 2023

Stations for Editing and a Few Questions

Mr. Manly Transcript and possible stations. Note that you will need to edit out one line after the firefighter is introduced. 



Extending with the ECR: With Love and Linked Lessons

 Good Morning Dr. Rose,

 

Today, I will begin using QA12345 with my students. However, this semester I have honors students, and I was wondering what you think I can do to increase the rigor using the QA12345. What are some ways we can have them elaborate, or expand their thinking and writing using the QA12345?

 

Any suggestions or feedback helps.

 

Thank you,


High School Teacher


How lovely to hear from you. Thank you for asking. I was working with a group of teachers in Small City, Texas this week who had similar questions. I see a few directions to go. 

One: First we use the QA12345 to get the basic topic sentences. Then we use strategies such as looping to help students think of the next thing to say that connects with the previous statement, deepening the elaboration into a paragraph. With my students, I always teach prove-it's next, then depth charge. At this point, students are ready for Starring or CAFE Squidd. If they are writing narratives, I present tampering-with-time lessons from After the End. By this point, students are ready to delete stuff from their writing that is repetitive. I also introduce the dead giveaways from Gretchen's site and throwaway writing activities. My friend Cheryl gives kids this activity to think about within- and between-paragraph structures to try as well. 

Two: Sharing. Students should be sharing their writing by reading it aloud to peers and receiving feedback. Start with Pointing and then follow the first two rows of activities. 

Three: Examine the craft of other writers and how they develop their ideas. Then try out these ideas in your own writing. Share the befores and afters in small groups. Here's how we worked that out in Small City, Texas this week. Kids began with lesson one in Text Structures from the Masters.  We introduced it by saying that as they are maturing as 9th graders, we see a lot of growth in the second semester. They are becoming more of who they are going to be as adults. (We're trying to combat the immaturity we have seen after COVID.) 

Next, we had them annotate the kernels in the Hippocratic Oath. Then we went deeper to analyze how the author pitchforked:  "I swear by Apollo the physician, and Aesculapius the surgeon, likewise Hygeia and Panacea, and call all the gods and goddesses to witness, that I will observe and keep this underwritten oath, to the utmost of my power and judgment." We noticed that he was referencing authorities that influenced him. We noticed how he put these as items in a series using /likewise/ as a connector as well as the common conjunction, /and/. We noticed how the prepositional phrase at the end allowed the writer to clarify the depth of his devotion and efforts. 

Teachers then re-entered their writing to show how they could try out these techniques. After modeling the process, teachers gave students time to imitate these moves in their own writing. They color-coded or annotated their moves and revisions just as we did in the mentor text and tested out the ideas with their feedback groups. 

For the next lesson, we chose Sojourner Truth's Ain't I a Woman? speech. We followed the same write, share, kernelize to comprehend, and then annotate for craft process as with the other text. We dug in deeply to name colloquialisms and the irony in the speech. (Normally when people talk in rough mannerisms, they are considered dumb. But Truth's analysis here is astute and wise, full of rhetorical techniques.) We dug into the anaphora (repetition) in her rhetorical questions and the impact they had on us as readers and on delivering the message. We looked at the biblical allusions and how they were used as criticisms/attacks on the reasoning of those who didn't take her perspective. 

Next, teachers re-entered their writing to try out the anaphora or one of the other techniques. Since we were in a PLC, teachers were able to take different techniques and try them out in class. These became more modeling texts that they could use in class. (It is important to note: teachers may have prepared the writing beforehand, but when it came time for class, they wrote live and explained their thinking aloud.) Next, students tried the strategies in their own texts, shared them with peers, etc. 

That's a lot. Let me know if you want to talk on the phone or zoom. As teachers implement these lessons, we'll have some exemplars. 

With Love and Lessons, 
Dr. Rose

Monday, December 5, 2022

Best STAAR Resources...For Now

What STAAR resources should I buy? What online program is the best? What books should I read? Where are the best resources for Author's Purpose and Craft? For ECR and SCR? 

All. The. Time people are asking. 

Right now? The best resources you have are on the STAAR redesign website from TEA. Look at ALL the scoring guides, even if you teach a grade not listed on that material. Look at all the released new item assessments, even for the grades you don't teach. The guides, collectively, give you the best information about how TEA is designing items for all grades and all items. 

Look carefully at how the passages and questions intersect. For example, when students are asked to combine sentences, look at the passage to see WHY they need to be combined. It's usually vague references with pronouns, repetition, or the connection between clauses with coordinating or subordinating conjunctions. 

Look at how the passages are set up with the introductory or footnote material, especially for excerpts. Look at how the multipart items are connected. Consider the deep thematic links between excerpts and sections. 

This may sound tacky, but publishers have not had adequate time or source material to prepare products that fully match what was released. The last updates were in October of 2022. And we still haven't seen the TEKSGuide for High School. If you see stuff printed before that, you run the risk of getting the publisher's interpretation instead of TEA's. 

Yes, students need online practice opportunities. TEA provides practice on the Cambium site, with TFAR, and Interim assessments. Let's start there. 

Unpopular Opinion: The time to buy materials from publishers that matches content, rigor, and question types is not now. Perhaps next year. 

Note: I have worked with content and item reviews for Sirius Education Solutions. I believe they have done a wonderful job with examining the standards, what is out there from TEA, and ways to give feedback to students in their online platform. From what I have examined in other platforms, this provides the most curated experience for students needing practice with online formats and item types. As new information is presented, the content, item types, and passages are updated and refined.