Inter-Rater Reliability Session Reflection

Over the course of two days, Courtney and I hosted about 60 of our area's 80 principals. The focus of the work was to do a round of calibration for inter-rater reliability when rating teachers. Many APPR plans require leaders to attend this type of training on a somewhat regular basis, while some districts are doing this work in-house. It is part of the regs, so districts have to find a way to do it.

The session was structured around scoring all four domains of Danielson. So we had pre-conference notes and a lesson plan for them to work through for Domain One. One thing they struggled with immediately was the feeling that these artifacts weren't enough. Many said that they would ask more questions, or leading ones, to try to elicit more information than the artifacts provided. One principal said he runs his pre-observation conferences with a focus on the Effective score level, essentially asking, "Are you here?" or "How do we get you here?" I find this to be a fair method. It recalls good rubric use: here are the performance criteria, this is what proficiency looks like, let's work towards that.

When the module is done in this format, as TLS does when calibrating in Albany, the artifacts exist as is. No leading questions, no other help. What you see is what you get. Generally, our principals scored this teacher too high in Domain One. There was discussion of the students with different needs, there were standards clearly enumerated, there was a scheduled lesson plan. The audience found this performance to be proficient. However, the calibrated score is developing. While the standards were listed, there wasn't much connection to actual learning. And while there was a scheduled plan, it generally revolved around what students would be doing, not what they would be learning.

Then we watched about 20 minutes of the teacher in the classroom. It is a 4/5 literacy classroom, with ELLs, and she is a second-year teacher. I was struck by how many of the leaders in the room flat out laughed at the teacher. We've shown a developing teacher to an audience of teachers before. And, teachers being more raw, it is somewhat expected. But it was off-putting that the building leaders' first reaction was to ridicule her, and that their coaching instincts didn't kick in. I opted to remind them of the evidence collection cycle, where the collection happens first, the alignment second, and only later the value judgements. Suffice it to say, they scored her performance too low in Domains Two and Three. Put simply, the calibrators of this video adhere very strictly to the language of the rubric, and our attendees generally disagreed. To digress, I just spent a day with math teachers who were disagreeing about the anchor papers for the math 3-8 assessments. Our facilitator just kept reminding the crowd that these are the calibrated scores. You may not agree, but compliance is necessary for standardization across the state.

One other thing worth noting: we frequently show a teacher who exemplifies student cognitive engagement, constructivist learning, and 21st-century learning. We almost always get pushback that this isn't real life: show us a regular teacher. So, ha ha ha, we show a developing, regular ol' teacher, and we get pushback: show us something exemplary.

Finally, the group was right on in scoring Domain Four, teacher reflection. They generally agreed there was hope for this teacher if she could translate her accurate reflection into future planning and practice.

