Monday, November 6, 2023

Resource for SCR Scoring Guide for ALL Grades and Content

    TEA is a huge entity. If you could hear my accent, huge would have several syllables for emphasis. Size is the only explanation I can come up with for why some people get information and others don't. The people involved there do the best they can to balance security, permissions, and all the rest - but it's frustrating. Just today, I saw a post from someone who didn't know that 3rd graders could be assessed with an argumentative prompt on any genre of reading.

    Texas Council for the Social Studies had a conference in October of 2023. Jo Ann Bilderback and Carmen Trejo presented for WestEd and TEA. (Someone who knows the connections can explain that to me sometime.) The resources associated with the session were created in February of 2023. Looks like the session was also offered in Region 14 at some point over the winter.

    And now it's November. Next month our high school retesters will be taking the exam again. We need (and needed) this information. And it looks like great stuff! Why are we finding out about it by accident? Why are only certain folks getting access? What am I missing? I try to be on top of things and to study, research, and attend sessions. What should we all do to have timely access to critical information?

    I looked online where TEA posts recent presentations. Can't find it anywhere. So here it is: 

SCR Training from the TCSS conference

General Session Slides

RLA Content Slides

Science Content Slides

SS Content Slides

RLA Annotated Docs 

Science Annotated Docs

SS Annotated Docs

Now the question: Is there something out there about ECR that would help us? 

Another training from September.


Monday, October 30, 2023

Is it RESEARCH? AI Scoring: A Call For Readers and Thinkers

Think and reason with me? Pretty Please? 

CAVEAT: So, I really do think we are OK with AI scoring of the type of writing that STAAR will assess. CAVEAT: And I think this will be better than what happened with scoring in the last administration. (I've seen pretty much identical papers get very different scores. And there's a lot of confusion out there because of it about what kids need to do - even bad advice like "just use less words.")

But AI scoring is going to be a reality. It's already used effectively with TSI. It's already used with TELPAS. Not sure I can say effectively because I don't have any information or experience about how that's working. 

Now it's time to dig deeper. As I've been discussing things with folks I respect, these are some thoughts we should consider: 

Concerns: 

AI can be: 

  • technically flawed
  • unresponsive to originality and nuance
  • based on predictability formulas

Essential questions: 

In addition, is there consensus in the field about AI's value, validity, and reliability? Are we looking at real research? And what does the research we are given say? What other voices are out there? 

Background: 

Pearson reports that their scoring is "backed by research and [is] unparalleled in the field." At the end of the website are several articles. I'm waiting on access to some of these. In the meantime, let's take a closer look at what's offered as research.

I've started a spreadsheet here that we can use to collaborate. 

Call for research and analysis: 

As Joyce Armstrong Carroll and Abydos state: I'm [researching.] Won't you join me? 

Step One: 

Can you join me in reviewing these texts? We need an annotated copy of the articles. If you have information for the other cells, we need help in evaluating those details. What else should I consider? 

Step Two: 

What other peer reviewed research of excellence should be considered? I've added a second tab for that analysis. 





Sunday, October 29, 2023

Hybrid/AI Scoring: Content and Implications for Instruction


Something to Read for Background and Further Research:

Here's what we know about "hybrid" automatic scoring: https://www.pearsonassessments.com/large-scale-assessments/k-12-large-scale-assessments/automated-scoring.html

Be sure to look at this one too: https://www.pearsonassessments.com/large-scale-assessments/k-12-large-scale-assessments/automated-scoring/automated-scoring--5-things-to-know--and-a-history-lesson.html 

What's been going on? 

So, there's a LOT of data that has already been collected. (Collection has been going on in the background since at least 1994, and the IEA, the Intelligent Essay Assessor, probably used data and tech that preceded even that.) Student papers. Rater scores. Rubrics. Processes. Refinements. Pearson tested 39 million responses in 2019. And Pearson isn't the only one working on this stuff.

What's happening in December 2023?

The test has already been designed and field tested. 

The passages and genres and prompts have already been selected. The scoring guide has already been prepared by humans. They've decided what the possible answers are. They've decided what text evidence should match it. The humans have decided what wrong answers and evidence are probable. The humans have decided what paraphrases and synonyms are likely. 

Sample/anchor papers have been scored and uploaded into the machine/system. 

The machine/system is programmed with all of this information. The machine is programmed with the rubric.

When retesters in December submit their tests, the machine gets the papers. The machine/system already knows how sample papers scored. It scans the new submissions and compares to what it already "knows." A score is generated. 

When the system/machine is challenged, the writing gets sent to a human. Sometimes the human gets the paper and the system/machine gets the paper. The human and system/machine calibrate or recalibrate. This will happen with about 25% - of papers? of scoring attempts? Not sure. All we have is the 25%.

It's called hybrid because humans decide what the system/machine looks for. It's called hybrid because humans are scoring continuously between and with the machine. 
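For folks who like to see the moving parts, here's a toy sketch of the loop described above: compare a new response to human-scored anchor papers, borrow the closest anchor's score, and route low-confidence papers to a human. This is purely illustrative Python; the function names, the word-overlap similarity, and the routing threshold are my assumptions, not Pearson's actual method.

```python
def word_set(text):
    """Reduce a response to a set of lowercase words."""
    return set(text.lower().split())

def score_response(response, anchors, confidence_threshold=0.5):
    """Compare a new response to human-scored anchor papers.

    anchors: list of (anchor_text, human_score) pairs.
    Returns (score, needs_human_review).
    """
    words = word_set(response)
    best_score, best_similarity = 0, 0.0
    for anchor_text, human_score in anchors:
        anchor_words = word_set(anchor_text)
        # Jaccard similarity: shared words / total distinct words
        overlap = len(words & anchor_words) / len(words | anchor_words)
        if overlap > best_similarity:
            best_similarity, best_score = overlap, human_score
    # Low similarity to every anchor = low confidence: send it to a
    # human, the way some share of real papers get a second look.
    return best_score, best_similarity < confidence_threshold

# Hypothetical anchor papers with human scores
anchors = [
    ("the author argues that music lessons build discipline", 4),
    ("music is fun", 1),
]
score, needs_human = score_response(
    "the author argues music lessons teach discipline", anchors)
```

A real system would use far richer features than word overlap (syntax, semantics, the rubric itself), but the shape of the process - humans score the anchors, the machine compares and defers when unsure - matches what's described above.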

Accuracy/Validity

Can a computer do this? Is it fair to kids and the ways they interpret the text? Some people will argue with me...but YES, it's fair. Much more fair than what we've had before. Why? Let's think about what's being assessed here. It's not really TEKS. And it's not really writing for ECR. 

We want to know - and it's a good thing to know - 
  • Can kids read and understand a text of any genre? 
  • Can kids follow written instructions? 
  • Can kids use data to make decisions? 
  • Do kids know how to read stuff at their grade level? 
  • Can kids make decisions about important ideas and communicate them in a variety of ways (correspondence, use information, argue, etc.)? 
When TEA released some of the information about the new test for ELAR, they gave us some really important information: they showed us a guide for raters. The guide included the text, possible answers, and possible text evidence that could support it. The guide explained what a good response would include with the rubric description. 

So...kids will read a passage (or passages). They'll be given a prompt with a genre. The passage will have clearly supported answers and text evidence to match it. Texas teachers will have written, revised, and validated this information. The item will have been field tested and reviewed psychometrically. And all of that will be programmed into the computer.

Here's how I think it will work: We teach kids to dig for the good stuff. 

Implications for Instruction: 

1. They don't want to, but kids are going to have to read the passages and the prompts in full. 
2. Deep comprehension is essential. 
3. Digital reading and digital composing skills must be transferred from physical reading and writing tasks. In other words, learners must see what this stuff looks like in practice IN the platform or some other digital platform. 
4. Readers and writers must learn not just WHAT the tools in the platforms are but HOW they can be used to support reading and writing processes. The Cambium tutorial builds familiarity but offers NO information on how the tools aid the thinking processes.
5. Text evidence is KING. And it must be connected to answering the question. 
6. One single acronym is NOT going to help kids answer all the ways this thing can be assessed. RACE isn't good enough. (I actually despise it because it causes bad results. More on that another time.)

Application: 

Here's what we tried with a group of kids who scored between 3 and 5 on their essays last year. It's a PowerPoint of notes and the steps we used in working through the processes of revising their previous work. We pulled up the Cambium testing platform and used what they have available on the test. We learned a TON of stuff that kids said they did and didn't do; what they knew and didn't know. I still need to think about that for a while. But in the meantime, I hope this helps you make some instructional decisions and next steps.



Hybrid/AI Scoring: Mechanics and Implications for Instruction

TEA kind of announced that AI is scoring stuff for ECR and SCR when they offered the testing administrator training this month. Ok. So we found out about it in a backhanded way. Moving on. 

My little rescue dog, Joy, is trained well. For one trick only: Be cute. Then she sits. Like Joy, the AI scoring machine will do exactly what it is trained to do, regardless of what we call it. For ECR/SCR on STAAR, the scoring will do exactly what it says it will do. 

We are going to be ok. It's time. We already have the technology to hear, transcribe, and interpret English vocabulary and syntax. Phones, text messages, emails, docs, etc. It's working pretty well. (I remember when Dragon Dictation started. Didn't work too well. But it learned and got better.) 

A friend of mine saw the video I posted on facebook about this: 

Hi😊I just came across your video you posted after meeting with your favorite English teachers. I’m not savvy to all the things your referencing in the video, but I will say…if AI allows the teachers a way to grade the students essays, I’m all for it. I would love to have all those HOURS back with my mom, my favorite English teacher. It was just our normal at the time…but looking back, she spent so much precious personal time at home “grading papers” when I was growing up.
❤️


Wow. That kind of hurt. And, we know that marking papers is actually a waste of time. See my dissertation. And I can't tell you how many hours I wasted in carrying those bags of papers home. 

So the implications for instruction are: 

1. Make sure teachers and students have access to technology that can "score" and grade papers for mechanics. Even if it costs money. This should be provided by TEA if they are using it to score. Assessment must match instruction. AI scoring resources are now instructional materials. 

2. Stop editing (grading) papers for writers. Don't do something a computer can do. 

Now - our next treat is going to be what all of this means for scoring content. More on that next. 



Tuesday, October 24, 2023

Notes: “Pleasurable Poetry Analysis: Extracting Theme, Craft, Structure and Beauty without Tying the Poem to a Chair and Beating It…,” with Gretchen Bernabei (3-12)

 www.tinyurl.com/BeachBernabei2023

Who is someone you know with a quirky habit? Or one unusual behavior? Describe it. Write about that for 3 minutes:

My thoughts: 

Quicklist - Cousin Tommy? Joy? 

Joy thinks she is a guard dog. I don't need a guard dog. She's also a chow hound. I don't need that either. She knows only one trick: be cute. Actually, it means "sit" to her. So, when we are at the table, she "sits" thinking that she'll get a treat. When Dean is there, we don't ever really hear her because they share the plate and he offers small tidbits throughout the meal. But, when he's absent, things are very different. 

When we keep eating and talking, ignoring her, she makes a small, barely audible "huff." As if to say, "Excuse me, I'm being cute and no one is noticing." The huffs punctuate the dinner, coming more and more frequently. Until she feels like we can't hear her at all and her sharp exclamation of alarm pierces the air: and she is scolded for being not so cute.

Nobody starts with a blank page. 

Read aloud from Possums

Read aloud from again. Highlight 2-3 lines you really like and want to call your own. 

We shared our underlined phrases. Why did we underline it? What did the lines do to you? What were you thinking? Talk to me about that. 

(Thoughts: It always surprises me how long we talk about the poem and our reactions. It's like we do a full poetic analysis with talk before we ever do another thing. It happens like this in class, too. Just being humans and reacting to the words, our thoughts, confusions, reactions, nodding our heads and seeing more as we see what others have felt. "I don't think I would have noticed that if you hadn't said that." How does Gretchen know when to stop talking about the lines and when to move into the next steps of the lesson? Gretchen notices and names their noticing with the academic language: enjambment and poet choices: The poem is awkward representing the contrasting ideas, forcing us to stop our reading like the possum stops. We must remain motionless like the possum. The words are poking at us with a stick.)

From her book.

Showed us the chunks. Quirky thing.  We marked our copy, using different colors. 

Read the poem one more time to confirm the chunks and what they mean. We call this kernelizing. Pointed out that the text isn't the same size - that all paragraphs aren't the same size. It's just what each chunk does - tracks the movement of the mind like stepping stones (Thomas Newkirk). It's coherent!

Can you predict what I'm going to ask you to do? Write for 10 minutes to restructure your writing or to write something new. What kind of poem will you create? 

Tricks for Gluttony

Sitting next to my dining room chair

the dog's acting cute: silent sitting, 

expecting me to deliver a reward

Because that's the only trick she knows:

"Be cute!" She sits for a piece of duck jerky.

I wonder: what do I do for treats? 

What am I waiting for people to notice? 

Until they don't and I under my breath 

Huff for a notice, a glance, a praise

Repeat the throat clearing as if unheard

Until a sharp piercing call interrupts

Irritating and Self-Focused Interference

Am I seeking what others can provide? 

A reward and noticing: superficial sustenance

when I'm already full and capable of feeding myself. 

My AHA: Wow. I didn't expect the depth that the structure caused. It shifted my prewriting to a new place! Exciting revisions. As I talked with Gretchen before the session, we talked about how we begin to see ourselves as writers and poets through this process. It made me think more about schema. Kids often approach comprehension from their own schema. Which is good for the reading until we go to the kinds of questions valued on STAAR. We don't want them to use schema of self. We want them to use schema of text. But...if they are working from a schema of authorship...if I were the poet of this piece...I'd be doing x to cause y...That's a kind of personal schema I can support in instruction about poems and author's craft. What a difference it makes when we experience ELAR as both reader and writer! 

It was hard for some in the crowd to write. She said, "Leave each other alone..." and shared what she would offer them for lessons. We laughed and talked. She reminded us of how she would monitor and grade in class. We shared with each other at our table. Then folks read aloud to the whole group. 

We covered 7 of the 8 TEKS strands (all but research).

Kids can write responses from their own poems and from the poems of others. The more kids experience both hats - reader/writer - the better equipped they are! 


Monday, October 23, 2023

Notes: Teach Rhymes with Beach: Gretchen Bernabei: What's Important (and What's Not)

 Literacy's Democratic Roots by Thomas Newkirk - drawing heavily from this because teaching is an act of patriotism. Fighting against ignorance to prepare a population that votes and perpetuates the experiment of democracy. 

Envelopes- 

tinyurl.com/BeachBernabei2023

Opening thought: from his first page, xiii: "If you enter...Cherish, hold dear." 

Think about what's important...Let's explore our own thoughts. What do we think is important in our classrooms? (Besides the humans.)

Envelope: Two columns: Quicklist of important things: 

Life lessons you want to convey: (my thoughts)

1. Grace and acceptance

2. Doing your best is most important

3. Everything we do is preparation to do the next thing (my thoughts) 

Life skills you want them to walk away with

4. How to think and reason

5. How to consume and critique a text

6. How to write and create

Three practices you want them to learn (my thoughts) 

7. How to listen and respond

8. How to notice when something is wrong

9. How to find solutions when things aren't right

Add one more thing that's not already on the list (my thoughts) 

10. How to make friends and support others

Other Side: Not important

Things you did as a student that aren't important now (my thoughts) 

1. Diagramming sentences; outlining a chapter

2. balancing chemical equasions (can't spell math) 

3. write something and then type it

Things that you are required to do in classroom that are required by others (my thoughts)

4. write the objective on the board

5. read from a script

6. take attendance on time

Add to the list to outdated things that you have to do (my thoughts) 

7. grading

8. CBA's

9. meetings

One more thing: 

10. Follow rules from assessment regime

Put lists to the side. Asked volunteers to come to the front. This is how she teaches kids to answer questions fully. (She goes through the process she explains in QA12345: Through dialogue.) 

Question; Answer; How do you know? Huh? What does that mean? How else do you know? Huh? What does that mean? So...your answer is...what? If we put the questions on mute, you would have heard a complete response. 

First person: What's your name? 

Second person: What town are we in right now? 

Third person: Is it hot outside today? 

She knows that when a student wants to take on her role, they are on the way to success. She uses a lot of visual texts to begin the work. Told the story of how her students used the structure with The Wizard of Oz. Used sentence stems from the website to ask the questions. These structures become internal and automatic.

Return to your list. Choose three. (She shared the ideas she added to her envelope.) Circle three of them. Or four. :) Pick one that you wouldn't mind talking about with folks in this room. 

Open the envelope to write on the flap side. 

Kernel Essay: Topic at the top. The other kernels will go under the flap. What you write will go on the envelope outside. 

Answer/Claim/Opinion; One way I know; This means; Another way I know; This means/shows; And so, answer repeated.

CBA's should be outdated and abandoned. I know this because everyone is required to do them, but nothing much changes in learning and teaching. Why would we waste time doing something that makes no difference? Another way I know is that the scores on CBA's don't match the results we end up getting on STAAR. Kids who pass a CBA don't necessarily pass the STAAR or get those kinds of questions right on the final exam. Kids who fail a CBA don't necessarily fail the STAAR or get those kinds of questions wrong. So, CBA's should be outdated practices and abandoned as ineffective pedagogical practices.

Shared out our work. Rhetorical triangle - someone writes something for somewhere for someone else to hear. It's not real communication until it's shared. It's an exercise for a school laboratory if only the teacher reads.

Don't do it this way: This isn't very good. It's just off the top of my head. I didn't have time to revise.

Don't do it this way: Read from it and then ad lib to add. Just read it the way it is. 

Hear three people. 

Heard some from the crowd. 

Everything that has been written or read has structure. There are hundreds that you can use once you learn the concept. Picking what you want to use empowers students to explain what they mean.

Other thoughts from Thomas Newkirk

See tinyurl for thoughts and resources. 

Notes: Reading Lenses-Deep Analysis with Jenny Martin at Teach Rhymes with Beach

 “Reading Lenses--Deep Analysis,” with Jenny Martin (3-12): Is authentic reading engagement truly possible? Absolutely! Using the “Reading Lenses” strategy with rich texts, teachers will learn how to boost their students’ reading interactions and immediately facilitate rigorous analysis. Students are empowered with confidence and skills that support STAAR reading and constructed responses. This session also helps teachers better understand the “thinking” within their TEKS.

Vocabulary: Listened to Speech on the Death of Dr. Martin Luther King Jr. by Robert F. Kennedy. As we listened, we coded the text for vocabulary: context clues (CC), Greek and Latin/Foreign (GL), and multiple meaning words (MM). Then we met in small groups to discuss and share ideas about how these words added to the message by discussing meaning. 

Author's Purpose and Craft: APC elements + Why are these choices made by the author? What's in the text that makes you believe and understand the purpose? We spent some time thinking about R. Kennedy's purpose. Discussed in table to find a common purpose. "Unification is needed." "Martin Luther is a legacy that tells us what we should focus on for unification. Honor MLK." "To inform about passing and to persuade the audience to act in a way to honor MLK's life purpose." Add what you liked to what you have already written. Then we highlighted things in the speech that supported that idea. 

Next: Explain or analyze how the use of text structure contributes to the author's purpose. We are working up to these kinds of questions. Use Gretchen Bernabei's rope and brand/kernelizing process to determine the structure. Use the text structure to explain how it's put together to deliver the message/purpose. Example Kernel: Big announcement: big loss; Who was this man? Where do we go from here? Here are our choices. Here's a poem that is a beautiful expression of the moment. Here's the plan. Nails the landing with: so we dedicate ourselves to... Name the speech structure: Honoring a Great Person. (Kids can write from this structure as well.) 

Next: How does the photograph, caption, and poem contribute to the author's purpose? Why are they there? What is the effect? How does it help you as a reader to connect to and understand the message?

Next: Metaphor/personification: Highlight with a different color. Discuss impact and use. 

Next: Point of view - the best example comes from his having had a member of his own family assassinated

Next: Mood and voice: word choice in opposition; repetition: difficult; How does this impact mood and voice. Discuss. Share out. 

Next: What makes it argumentative? Rhetorical devices/appeals and logical fallacies. Discussed logos, pathos, ethos. Exaggerations.  ID those. (Note, there are additional features including argumentative structures.) 

Next: Use the lens of text structure to reread - look for pitchforking. Highlight three examples of pitchforking. Discuss what they are doing rhetorically. Enter our writing and use them. (Note: wouldn't it be a good idea to use these for pieces of text evidence that support ideas? Then we could explain each? Hmmm.) Shared out the end of paragraph 9 and all of paragraphs 7 and 10. Paragraph 9 uses parallelism and anaphora.

The point is to re-enter the work repeatedly to deeply analyze. We can make sentence frames and have the kids imitate. These are all highly tested. We can also use this as a mentor text to model explicitly about the skills we are teaching. Kids are familiar with the text and we can focus on the skill over time. 

Next: Use question stems handout and respond in writing. Try two vocab stems. Try two or three author's purpose stems. 

6th Grade Question Stems

Notes: Teach Rhymes with Beach: Melanie Mayer's Keynote - Six Strategies for Success

 1. The "Devil" Cards: Take a deck of cards. Write the kids' names on them. Ask a question. Have kids turn and talk. Then pull a card and have that person share.

2. Charting the reading: Several ways to do this...

Take a piece of paper and fold it into four columns. 

Option One: Questions: What do you wonder? 

Words: A new/powerful word for me is ____. I think it means/adds clarity by ________ because _________. 

Summary: Who's the speaker? What's the subject? What's the situation? Why does it matter? 

Inference: The writer doesn't say directly, but I can tell from the evidence that...

(Change the chart to fit your focus, such as figurative language: simile, metaphor, personification)

Option Two: Text to self, text, world. Write a structured paragraph response. 

Option Three: Answer, Prove it (text evidence) explain it (use transitions - furthermore) Summarize it 

3. Self- Assess and Peer Advice

Did you start sentences with capital letters?

Did you write x number of sentences? 

Did you use transition words? (Or your focus for the day)

Did you capitalize I? 

Give yourself a check for each item. Revise and submit. 

Peer Advice: Two things you heard. Use word wall of writer/authorial choices to help name and give examples. Name two what if's of author choices that might improve the text. 

4. 5 Minute Grammar Lessons

Show examples from your writing. Let them notice the structure. Give them check points to evaluate. Have them write and share. Next day, use the structure to answer a question about the day's learning. Add it to a quiz.

5. Utilize Tiering

Gave the example of DOL bell ringer to find the mistakes with homonyms/multiple meaning words. Build several tasks...

Tier One: Find the mistakes. 

Tier Two: Find other homonyms and how they are used. 

Tier Three: Write your own examples or re-enter your writing. 

6. Establish a culture and climate of grace. 

GPS: You moron. You missed the exit again. I told you 4.6 miles ago to get in the right lane. What were you thinking?

She doesn't even huff - she just says "rerouting" or "recalculating." Our kids need this too. We do this by being accessible and approachable. We do this by being interruptible. We do this by using positive words and putting on our smiles with our outfit.

Story - The average person lives to 76. That's a fixed number of days. At 56, I've lived 20,440 of my days. I only have 7,300 left if I live to the American average. What I do matters...and I'm running out of time. I want to spend it doing the right things for those in front of me. Psalm 90:12 and Psalm 118:24.
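The day-counting in that story is straight multiplication, using simple 365-day years (so ignoring leap days). A quick check in Python, just to show the arithmetic:

```python
# Day-counting from the story above, using 365-day years (no leap days).
average_lifespan_years = 76
age = 56

days_lived = age * 365                            # 56 years in days
days_left = (average_lifespan_years - age) * 365  # 20 years in days

print(days_lived, days_left)  # 20440 7300
```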

Monday, October 2, 2023

Phonics IS the V in MSV

Gonna be tacky for a while. (This is actually an old issue, but continues to have serious, unintended consequences. A bit scared to post it, as people will probably attack me because they think I'm stupid and misinformed or slamming phonics. Which I'm not. Well, not stupid or ignorant about this issue.) 

 To those who are demonizing the 3 cuing systems...I wonder if they realize that phonics IS the V in MSV (Meaning, Structural, Visual). To those writing legislation to outlaw curriculum with a V in it, do they really want us to NOT look at the phonics? I can just imagine us rowing through books with blindfolds like Sandra Bullock in Bird Box. Ridiculous. 

Friends, MSV isn't the devil some claim it to be. Here's some information to consider.

Semantics: 

Visual - What are the letters? What are the sounds they make to produce the word in print? Phonics IS the visual component of language that we use to translate ink and pixels into sounds. 

"Does it look right?" isn't a suggestion to guess what the word is. It's not really even a strategy. It's a piece of the way the language functions. 

And it's cultural. 

Marie Clay was a New Zealander. They say "Have a go" like Texans say "try it and see." "Does it look right" is a cue for kids to monitor and self-regulate.

Let me translate this maligned question for those that don't understand the science: 

Here's what happens with the visual, phonics cue: 

  • The kid says a word after looking at print. 
  • Right or wrong, the teacher wants them to check their accuracy. 
  • Texan: "Kid, match what you heard yourself say with the visuals/letters in the print."
  • Linguist: "Use your phonemic awareness of sounds and connect it with your understanding of phonics."
Is a kid going to understand the Texan or the linguist? Bless your heart.

A Solution: 

Isn't it easier to ask kids if what they said looks like what's on the page instead of asking them to explain the linguistic processes and names of phoneme types? I thought we were trying to look at words to say/hear them so we can understand what the words mean. House or horse. It's a big deal. 

If a kid can tell you what a digraph is but can't tell you what their mouth does to make the word, is the linguist helping? Furthermore, neither the Texan nor the linguist will be successful in teaching reading if the kid says the word correctly and doesn't know the difference between a house and a horse.

There's more to teaching reading than phonics. V is just one of many cues. (And for those that don't know the sciences, there's more than three.)

Wednesday, August 30, 2023

A process and resources for authentic STAAR remediation, response, instruction, and assessment: An English I Case Study and Suggestions


Premise: Before we collect more data, let's intervene on the data we have. 

Premise: Our work does not have to look like a STAAR test. 

Premise: There's a lot of thinking and work to be done BEFORE you can answer a question. Most of that isn't a TEK. 

Premise: Answering multiple choice questions is a completely different skill than any of our TEKS. Answering questions is about thinking and reasoning that you do while reading the source text and while analyzing the question and choices.

Premise: The English course is essentially the SAME course for all grades. Hate me if you want, but listening, speaking, reading, writing, viewing, and thinking are the same K-12. The only difference is text complexity. 

Premise: Thinkers must be able to examine a text and make decisions without the use of questions and in their own writing and that of others. 

Premise: Skills and content are best practiced and mastered when we mess with and apply those things in our own writing.

Explanation and Process: 

Prioritizing what we have for specific purposes

  • Use the 2023 released exam for the previous grade level for analysis and instructional material. 
  • Use the 2023 ECR and SCR samples. Use them for exemplars. Use them for sample texts to revisit/revise. Give them to students so they can grapple with their responses (or lack of response) and improve. 
  • Use the 2022 full-length practice exam for crafting common assessments or formative assessments. (I know, it's not psychometrically aligned, but it's what we have. You could also use and revise some of the previous STAAR exams.) 
  • Save the 2023 released exam for the current grade for the benchmark in December or January.

Resources for English I:

For Instruction:

Analysis and Tools for 8th Grade Released Exam

Powerpoint (in draft to show the process, more to come as time permits) 

For Assessment

The Benefits of Learning to Play an Instrument: Text and Sample Prompts for ECR, SCR


Tuesday, August 15, 2023

2023 STAAR Extended Constructed Response Analysis, Interpretations, and Recommendations

The extended constructed response seems like a hinge-point because success depends on what kids DO while they are reading and responding. Success on these items can be a gateway to success on the other items. 

So - here's some things to think about as you are interpreting results, designing interventions, and planning initial instruction. 

By the way - If you didn't know, you have access to ALL the writing students have done on the exam. It's important to print those out for the kids you have. Revision is the best way to improve writing. And it's a good place to start in helping students become aware of their thinking and reasoning. Eventually, we'll have pdfs of it all. Right now, we're having to clip, cut, and paste. Your testing coordinator will be able to give you access in the system to see student results and will have the pdfs eventually. 

Here's a hyperlinked pdf of the analysis that you can use in your PLCs. 

Saturday, August 12, 2023

So many ECR Zeros on STAAR? Why?

Check out these scorepoint graphs: 



Why so many zeros? Well...it was new. And 3rd graders are NINE. But perhaps we need to look at the obvious...or what is obviously hidden before we start diving into TEKS.





Here's an analogy - did you know that if you break open a cottonwood twig, that there is a STAR of Texas hidden inside? (Project Learning Tree has a cool lesson about it here.) The star was there all along, but you didn't know it was there. 

In the TECH world, they study the "user experience." Kids all over the state told us that they didn't have an ECR on the assessment. Why did they think that? Remember - most kiddos are taking this test on a tiny Chromebook screen with no mouse. Here's what they saw: 


Semantics: 

Students are asked to write an argumentative essay. Not an ECR. Could the problem be semantics? Kids need to know they will be asked to write a composition or an essay. (And not an "S A." Before my kids saw the word typed out, they thought I was just saying two letters.) "ECR" is assessment item-type vocabulary and is not used with kids on the test. Same goes for "passages" - that's what we call them, but it's not the academic language used on the assessment. STAAR says "selections." Our academic synonyms might be causing some of the confusion. 

User Experience: 


And...did they SEE a place to type the essay? Sure, there's a little arrow that says to scroll down. But on some computers, the scrollbar that signals there is more to see doesn't appear until you hover over the right-hand side of the screen; on others it's always visible. 

That's a wonky user experience for something kids aren't familiar with. With which kids aren't familiar. Whatever. You get the point. 

Our solution: Kids need to be beyond familiar with the TECH elements. They need to be fluent with them. And our words need to match their experience - "You'll be asked to write an essay or a composition. You'll need to scroll down or use your arrow keys to see where you need to type." 



Font and Text Features: 

Another problem is the way the paired passages appear. 

See how these scroll on the same "page"? It looks like "Laws for Less Trash" might be a subheading for "Rewards for Recycling." That's a problem. Kids need to make sure they are attending to the bold material at the beginning of the passage and know that "selections" means two different passages. Another semantics issue. 

But this is also a text feature issue. I didn't see any subheadings in the third grade test, so I had to go to 4th grade. 

See how the passage titles are center-justified and set in a larger font? Now compare to the subheading: left-justified and a smaller font. These text features serve as cues for readers about what the author is doing, as well as when a new passage/selection begins. 

It will be important to teach kids how to understand and decode the text features and navigation elements (like that little almost invisible grey arrow on the right hand side of each screen that says there's more below and you need to scroll down.) 



Using Cambium Components for Self-Regulation: 


The very top row is designed to help students understand and see how much of the test they have completed with the blue bar and the percentage complete. They know how much energy they need to use and how much time to save. But they can also see where they have and have not answered questions. The little red triangle tells kids where they have NOT answered questions. This would have been a huge cue to students that they'd missed answering the essay question. But did they know it was there? Were they fluent in using the tools to monitor their progress and to check for completion? 

Before we start digging into TEKS (and especially accountability ratings and class rosters), let's do some talking with kids about their experience and their approach when taking the exam. Our solutions can start with modeling how the platform works and using similar tools during daily classroom instruction so that students are fluent with technology experiences beyond a familiarity with a tutorial or mention of the tools. Let's make sure students understand the user experience and how to use the tools to enhance their comprehension and demonstration of grade level curriculum. Until then, they're walking in a Texas creekbed under the cottonwoods, not knowing about the hidden treasures all around them. 



Monday, May 1, 2023

Readability, TEA and Hemingway, DOK and Worms

Questions from the Field

I wanted to get your thoughts and opinion about reporting DOK and Bloom's information with items and readability formulas with passages.

DOK was not designed for how people are using it. I talked with Norman Webb himself about it. Even got a picture and an autograph. DOK was supposed to match the rigor of the curriculum to the assessment, NOT the rigor of the questions themselves. And this distinction is worth noting. Questions are not written at DOK levels; the curriculum is. DOK is supposed to measure a gap between the assessment and the standards. So...writing questions at DOK levels skips important context and grounding from the standards. Does TEA use DOK to write questions? I'd like to see their charts. 

Bloom's: This is the model I recommend for Bloom's.

Research Ideas and Next Steps: 
  1. When we pair the KS with the SE, what is the Bloom Level? (for each breakout in the assessed curriculum) 
  2. When we look at the item types, what is the connection between Bloom and DOK? (for each item type and TEK) This will have to build over time because we will only have certain item types for certain TEKS for a while. And it will vary by grade. 
  3. Does TEA's interpretation of Bloom and DOK on the assessment match the curriculum? 
  4. Once we can see the gaps/alignment, then we can make some decisions for metrics, practice, and instructional interventions. 
    1. What do these metrics tell us about student performance/growth? 
    2. How do these metrics inform Tier One Instruction? 
    3. How do these metrics help us form progressions of learning and identification of pseudoconcepts that result in refined teaching and intervention for Tier Two Instruction? 
    4. How do these metrics help us write better items and give better feedback to students, parents, and teachers? 
We are taking our cue from TEA to use F/K readability scores and the "Hemingway" app they recommend, so I feel like the info we are collecting is TEA-ish-aligned, but is it the type of readability score you think teachers will want to see or care about?

Thanks for sharing this. I didn't know they were recommending Hemingway. I had to do some research. What do you mean by F/K readability? Flesch-Kincaid? Hemingway uses an algorithm similar to F/K but not identical: the Automated Readability Index. 
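For the curious, the two formulas are easy to compare side by side. A minimal sketch in Python (the coefficients come from the published formulas; the sample counts are invented for illustration, and this is not Hemingway's actual implementation):

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level -- the 'F/K' measure TEA references.
    Complexity comes from sentence length and SYLLABLES per word."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def automated_readability_index(characters, words, sentences):
    """ARI -- the formula behind Hemingway's grade estimate.
    Complexity comes from sentence length and CHARACTERS per word."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

# A hypothetical passage (100 words, 5 sentences, 130 syllables, 450 letters)
# measured both ways:
fk = flesch_kincaid_grade(100, 5, 130)
ari = automated_readability_index(450, 100, 5)
# ari comes out roughly two grade levels higher than fk for this same text
```

Same text, two different "grade levels" - F/K leans on syllables per word, ARI on characters per word, which is why the numbers drift apart even though both formulas reward short sentences.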

Commentary: I think TEA's move here is a good one; however, all readability formulas are flawed. I like that Hemingway's formula uses length of words (loosely associated with vocabulary) and length of sentences (directly associated with T-units and idea density/complexity). 

Note that Hemingway and the Automated Readability Index are not really the grade-level descriptions teachers are used to. These numbers are NOT the grade-level markers we see in Guided Reading, DRA, Fountas and Pinnell, Reading A-Z, Reading Recovery, or even in Lexiles. They don't measure the grade level of the text; they estimate the amount of education a reader would need to understand it. TEA is using these measures to defend the passages. Teachers use readability measures to match readers with texts they can read easily on their own, texts that are useful for instruction, and texts that will be too frustrating to read alone. It would be a mistake for teachers to use Hemingway to match readers to texts because that's not what it does. 

Hemingway is more about CRAFTING text for readers so they will be successful.  The purpose of the scale is what is important here: how do you write in a way that most people can understand your message and purpose? Writing for people with 9th or 10th grade education levels is ok, but many people aren't that proficient. The Hemingway app and measures help you simplify your writing so that it lands where people with 4th to 6th grade experiences can understand what you intend to convey. Again (as we saw with DOK), we have a disconnect between purpose and how the thing is being used. 

We cannot provide Lexile scores for a few reasons (cost of license being primary), but we can provide some more content-based and not just language-based readability formulas, such as might be seen in Fountas-Pinnell readers.

Lexiles. Eye roll. So many better measures out there. Glad they aren't useful to you.
Content-based measures. Hmmmm. That's semantically problematic. I wouldn't say that Fountas-Pinnell readers are content-based measures, as their levels are also language and text-feature based. In ELAR, there really isn't any content past early phonics and some grammar. The rest is process. I know of NO way to measure content levels. 

Do you see a need/want for that among teachers, or is a simple language-based tool like F/K enough in your opinion?

What I see here is the potential for confusion. Already we have a mismatch between TEA recommendations of using Flesch-Kincaid and an app that uses something different. In addition, the semantics and purpose seem similar but have distinctions in practice that confound their use and application with matching students with texts, measuring growth, selecting curriculum materials, writing assessments, and planning instruction for both reading and writing. What a mess! There's a military term I'd like to use here...

Here's another wrench in the works --as if we didn't have enough to worry about:  When you use these formulas to measure readability of questions as units of meaning (like passages), questions are FAR FAR FAR above grade level of any measure. Questions are dense, complex little beasts that no one is talking about at any grade level in any content area. 
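You can sanity-check that claim by running a rough ARI computation over a raw question stem. A sketch, assuming naive tokenizing (sentences split on ./!/?, only letters and digits counted as characters); the question stem below is invented in the STAAR style, not taken from the actual exam:

```python
import re

def rough_ari(text):
    # Approximate ARI on a raw string: sentences split on ./!/?,
    # words split on whitespace, characters counted as letters and digits.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = text.split()
    chars = sum(len(re.findall(r"[A-Za-z0-9]", w)) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / sentences) - 21.43

# A simple third-grade-ish sentence vs. a made-up STAAR-style stem:
simple = "The cat sat on the mat."
stem = ("Which sentence from the selection best supports the idea that "
        "recycling programs require community participation to succeed?")
# The dense one-sentence stem lands many grade levels above the simple sentence.
```

One long sentence packed with multisyllabic academic words ("selection," "participation") is exactly the shape these formulas punish, which is why question stems score so far above the passages they accompany.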
Grade 3 Full Length Practice Questions 1-3 analyzed by Hemingway: 
[Screenshots: Hemingway readability results for questions 1-3]

As you can see, using TEA's own recommendation, 3rd graders would need the experience of a fourth or fifth grader just to answer the first three questions on the assessment. And that's after reading the passage. The more I look at this stuff, the more I believe we aren't measuring curriculum or student growth or any of the things we think we are measuring.  

Initial thoughts on solutions: 1) Give a language-based, classical formula that teachers understand. 2) Give a second, grade-level "experience" measure for comprehension, schema, or reader's ease. This gives us a chance to help teachers understand what TEA is doing here. (Reminds me of Author's Purpose and Craft. TEA has a specific purpose for their readability work: making sure they can defend to the legislature and parents what kinds of texts they are using to assess the curriculum. Teachers have different reasons, ones that have to do with supporting a child and their growth instead of the validity of the assessment.) 
 
Secondly, we have been tracking Bloom's and DOK as quickly-trained evaluators (I had a three-hour seminar years and years ago at xxx; xxx has had some haphazard PD as a middle school teacher over the years). As you no doubt know, for a STAAR assessment, we find a lot of DOK 2 / Bloom's "Analyzing" items, and so it seems like it might not be the most useful metric, but we are also not experts and might be missing some subtleties between TEKS and items that would give a more varied report. So my question is two-part. Do you agree that we are likely to see similar DOK/Bloom designations across many items, and if so (or not), is this information you think teachers will want or could use in classroom instruction or reteach? Is the information bang worth the research and editorial review bucks for DOK and Bloom? And perhaps DOK is appropriate and Bloom's not (I kind of lean this way personally)? So that's four questions, I guess. :) 

Can-o-worms. Earlier, I described problems with DOK and questions. If you are not matching the question and the curriculum to determine the DOK, then the data you get from that doesn't do what most think it would do. So...that has to be fixed first. 

Do I think we are likely to see similar DOK/Bloom designations across many items? My first response is: People tend to do what they have always done. So yes. TEA tends to do all kinds of things from one design to the next. So no. 

My second response is: How does any of that help the teacher? We see this ongoing work in training for unpacking standards. But honestly, if TEA isn't transparent about what they think is DOK or Bloom's, then we are guessing. Do we have solid instructional practices and understanding of the curriculum that LEADS to success on these descriptions? Labeling them without that seems like a waste of time to me. Teachers might want to put DOK and Bloom's as compliance measures for item analysis or in lesson plans, but honestly...what does this change for the instructional impact on students? "Oh, it looks like 75% of our kids missed DOK 2 questions." Now what? 

My third response is this: We haven't even gotten results back. Districts and people are downtrodden and devastated and confused. They are all feeling a little nutty about everything. It's too early to even know or make a good decision. I'm wondering if we are making all of this so much more confusing than it ought to be. Mom always says, "Rest in a storm." That never feels good in a storm. But what good are we going to do if we try to build a DOK nest in a tornado? 

Is this information you think teachers will want or could use in classroom instruction or reteach? 

I don't know. Do we know what kind of instruction fixes DOK problems? I'm not sure we do. Is the DOK or Bloom's level what's actually causing the problem? For lots of reasons, I don't think so. There are too many variables, and too many of them cross-pollinate and create varieties of problems that didn't exist before or for everyone. There are SO many instructional implications before we ever get to a question and its level that are not being addressed. It seems counterintuitive to fixate on DOK before we know and repair the underlying issues. 

Here's an example. A local school here decided they wanted a greenhouse and a tennis complex. Funds were acquired. Community was excited. Structures were built. Programs began. Years passed. In the nearby school building, walls and whole classrooms in the school threatened to collapse. Know why? There was no problem with the structure and quality of the building. The contractors had done masterful construction that should have lasted a century or more. The greenhouse and tennis complex were built on the well planned and placed drain fields to take the water away from the sodden clay of our panhandle soil. The problem isn't the structure of the building/question/DOK. The problem is how the whole system worked together. 

Is the information bang worth the research and editorial review bucks for DOK and Bloom? And perhaps DOK is appropriate and Bloom's not (I kind of lean this way personally)? The problem is that we have to make decisions now when we don't have the land survey to tell us how things are built and how we should proceed. 

It's a crapshoot. We might spend a lot of our resources to make something that isn't useful. We might make something that looks good and attracts attention but isn't consequential for helping those we want to serve the most: our students. I'm more "meh" on Bloom's as well. I just can't bring myself to care much about either one when I consider all the things that need to be fixed before labeling a question with a level that we can't validate. I also think the question types themselves indicate their own DOK.