Assessing Teachers: Output Data

This is the first in a series of posts on how to judge teacher effectiveness using input and output data. In this first post, I focus on output data and the conversation that education circles should be having about how to properly use quantitative data to assess teachers.

Much of the discussion in education circles today revolves around performance pay and standardized testing. The reality is that these conversations keep people from understanding the true basis of the problem, because they skip the most important question: how do we judge teacher effectiveness? There are, essentially, two forms of data that can be used to judge teacher effectiveness: input data and output data.

I think most of the educational community would agree that metrics like standardized tests or end-of-course tests cannot be the sole barometer of teacher effectiveness. At the same time, I think these types of quantitative data (outputs) should play an integral part in identifying the best teachers. “How much of a part” is a question that needs to be hammered out, especially if we want the best possible education for our kids. Teachers must be willing to accept that some form of output data is going to play a role in assessing teacher quality, but rather than kick and scream over whether that is fair, teachers should work diligently to voice rational arguments about how that data is used and what types of data are valid.

For instance, the discussion should revolve around linear growth models of assessing teacher effectiveness, which look at how far each student moves during the year rather than only where the student ends up. If a student comes into fourth grade reading at a first-grade level and leaves fourth grade reading at a third-grade level, it would be ridiculous to claim that the teacher was ineffective: the student is still below grade level, but gained two years of reading in a single year. After all, we are not turning out widgets. Here are some important questions we should all consider:

  1. How MUCH growth should we expect from a student in any given school year?
  2. How do we pre-test (September) and post-test (June) in a manner that is suited to each individual student and representative of what they have learned?
  3. How do we organize the tests in an interdisciplinary fashion to emphasize critical thinking skills while also gaining a full picture of student progress in all areas?
  4. What do we do with students who miss “X”% of the school year due to chronic illness or poor attendance?
  5. How do we improve retention over the summer so that students don’t lose knowledge while they are outside the academic environment?

These five questions are just a sampling of how we need to start thinking about output data in judging teacher effectiveness.
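
To make the growth-model idea concrete, here is a minimal Python sketch that contrasts a status check (is the student at grade level in June?) with a growth check (did the student gain at least the expected amount since September?). It assumes reading results can be expressed as grade-level equivalents and uses one grade level per year as a placeholder for expected growth; both are assumptions for illustration only, and that placeholder is exactly the number question 1 asks us to settle. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class StudentScores:
    """Hypothetical pre/post results, assumed to be grade-level equivalents."""
    name: str
    pre_level: float   # September pre-test result
    post_level: float  # June post-test result

    def growth(self) -> float:
        """Grade levels gained between the pre-test and the post-test."""
        return self.post_level - self.pre_level


def meets_status_target(s: StudentScores, grade: int) -> bool:
    """Status view: is the student at or above grade level in June?"""
    return s.post_level >= grade


def meets_growth_target(s: StudentScores, expected_growth: float = 1.0) -> bool:
    """Growth view: did the student gain at least the expected amount?
    The default of 1.0 grade level per year is a placeholder assumption."""
    return s.growth() >= expected_growth


def class_mean_growth(students: list[StudentScores]) -> float:
    """Average growth across a class roster."""
    return mean(s.growth() for s in students)


if __name__ == "__main__":
    grade = 4
    roster = [
        StudentScores("A", pre_level=1.0, post_level=3.0),  # the fourth grader described above
        StudentScores("B", pre_level=5.0, post_level=5.5),  # invented for contrast
    ]
    for s in roster:
        print(s.name, s.growth(), meets_status_target(s, grade), meets_growth_target(s))
    print("class mean growth:", class_mean_growth(roster))
```

In this sketch, student A (1.0 to 3.0) fails the status check but shows two years of growth, while student B passes the status check with only half a year of growth. That difference is what a growth model is meant to surface; what the expected growth should be, and how to handle absences, remain the open questions listed above.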

Why educators don’t engage in this type of question-and-answer session with policymakers at all levels is beyond me, and why policymakers avoid this conversation is impossible to understand. Simply fighting quantitative data as a metric for teacher assessment is a losing battle. I much prefer discussing “how” to make it work over debating whether it is “fair,” simply because society does not care whether teachers think it is fair. People want results, and the speed at which information now moves from one entity to another means that accountability is easier to monitor and carries more weight in decisions.

Educators: we need to stop fighting and start talking.

2 Comments.

  1. Have you read “Making the Grades: My Misadventures in the Standardized Testing Business” by Todd Farley yet? If not, it’s a must-read if you want to have a discussion about giving any weight to standardized tests. I know you wanted to talk about how to do standardized testing well rather than whether it’s right, but if it’s not right, is it worth the discussion?

    The very simplified example I typically use when I say that ALL testing is arbitrary is this:

    Imagine a teacher gives a list of 100 words (these can also be concepts, formulae, dates etc.) to be studied for a test which will consist of 10 of these items.

    Student A enters the test knowing 90 of these words. It could be a result of studying hard, but only managing to learn (or memorize) 90 of them. It could be that the student had a lot of prior knowledge and actually studied nothing.

    Student B enters the test knowing 10 of these words. Same story – we don’t know how the student came to know these 10 words/concepts/dates/theories, but on the day of the test, he knows 10.

    Let’s say that the 10 items student B knows just happen to be the 10 that student A doesn’t. And, let’s say that these 10 words end up being the very 10 on the test. How much confidence do we have in a test that gives Student A 0% and Student B 100%?

    It’s important to realize that this is what we do every time we give a test. We can’t possibly test everything from the unit, so we make choices, and some of those choices seem completely arbitrary. How many tests have you written where you looked at a question and asked yourself, “Seriously? THIS is on the test? But I know x, y and z cold! This seems so random.” As a tutor, I would see tests from dozens of different teachers and schools that, in theory, covered the same provincial content, yet varied wildly in which skills were or weren’t included. So that’s non-standardized testing.

    As for standardized testing, I’ve tutored kids for standardized provincial exams, the SAT, ACT, AP exams . . . the problem with standardized tests is that being standardized makes them completely knowable. To demonstrate this to my students, I can take an SAT reading comprehension passage I’ve never seen before, go straight to the questions without reading the accompanying text, and consistently score between 80% and 100% just because the questions themselves are so predictable. I can tell by looking at a question what the right answer is likely to be, what the question is intended to test, what the fake-out answers look like, etc.

    You might say, sure, I taught standardized test prep professionally for 8 years, so of course I’m going to reach the level of familiarity where I can outsmart the test. But what about the kids? Well, the more we rely on standardized testing, the more accustomed the students will get to it, too. In Canada, we only write the SAT if we want to go to a US university, because our institutions don’t expect it. So I’ve never actually written an SAT myself, and didn’t have much occasion to even teach it until the last decade of my career. But how many years of standardized testing do US school kids go through? (We always hear about the Gr. 3 test in Texas up here.) Eight years of exposure is easy to reach if standardized tests become a common measure of education and are therefore administered yearly. If I can master it in less than 8 years, who’s to say the kids won’t, too?

    We haven’t even discussed the issue of marking the tests, which is best left to Todd Farley. It’s an excellent read, and it confirms what I’ve heard from markers of provincial exams here. There’s a brief interview with the author here: http://scholasticadministrator.typepad.com/thisweekineducation/2010/01/some-people-who-tell-all-about-the-industry-they-worked-in-are-greeted-as-brave-whistleblowers-and-embraced-by-the-media-and.html

    More generally, I simply don’t believe any test can give you an accurate picture of what someone has learned from a particular teacher. Students are taught fractions every year from Grade 4 to Grade 8 (roughly). Just because it finally clicks after 3 years, does that mean that the earlier teachers had no hand in the process? Do we know that the student’s revelation in Grade 7 had anything to do with the current teacher? What if the student made a connection all on her own that suddenly crystallized her learning? What if she saw a documentary on television or read a book from the library? Because learning requires repetition and reflection, I don’t know that it’s possible to “award” a certain teacher with credit for a particular piece of learning. Maybe a student in Gr. 8 performs outstandingly on “Taming of the Shrew” because *last* year’s teacher was so good at laying a Shakespeare foundation for “Romeo and Juliet”, but the student understandably struggled in Gr. 7 because it was the first time she’d ever seen Shakespeare.

    And the very fact that we’re talking about retaining information over the summer implies that whatever results we measure at the end of one year could very well reflect short-term learning. Long-term learning takes longer than that to manifest and solidify. It reminds me of a certain tutoring franchise that instructs its tutors to end every class with, “So kids, what are 3 things we learned today?” and write 3 things on the board, knowing that they will then release these kids to parents waiting in the hall who will ask, “So what did you learn today?” The fact that these kids have been primed to parrot back 3 items (and even the slower students could surely remember one thing to tell their parents) gives parents the illusion that their children have truly “learned” these items. The real learning is debatable.

    We can’t figure out (or at least agree upon) how to quantify success in our adult lives. I’m not so sure we should be so presumptuous as to think we can quantify success in students when we can’t even agree on the purpose, function, or goals of schooling (citizenship? self-actualization? employability?).

  2. Upon re-reading, I realized I didn’t apologize as strongly as I would have liked for going where you didn’t want to go. So, first of all, a proper apology: yes, I debated the merits of testing instead of accepting your premise that it can’t be fought. Sorry about that. :)

    So, to try to be a bit more constructive, I could see myself being swayed if we changed the nature of “testing” from student vs. a pre-set list of questions to something more interactive.

    When I first work with a new student, I need to see him work through questions to really know where we’re starting from. Knowing he got 60% on a test is useless. Did he fully understand 60% of the test and leave the rest blank? (If so, why? Time? Didn’t understand those questions?) Or did he get 60% of the way through every question without finishing any of them?

    I also need to see how the student responds to feedback. You’d be shocked at how typical these tutoring sessions with a struggling student are:

    Student: I’m stuck.
    Tutor: Well, what do you know how to do from this point in the solution?
    Student continues to successfully complete the entire question on his own.
    ----------

    Student: I’m stuck.
    Tutor: If you took a wild guess, what would you do next?
    Student continues to successfully complete the entire question on his own.
    ----------

    Student: I’m stuck.
    Tutor: Well, what if you read the question again?
    Student continues to successfully complete the entire question on his own.

    I could, of course, go on and on. Sometimes an actual hint is given, but often just a reminder to take stock and evaluate is enough. But I can tell you that these same students are not doing well on their tests…

    So, that’s a big problem with either the nature of testing (we perform very few other tests in such isolation with so little support) or the nature of how we prepare students for tests. Any kid who’s been tutored regularly will tell you that one of their test strategies is simply to imagine their tutor sitting with them asking, “What do you know how to do next?” or “Read the question again to see if there’s a clue or something you missed.” The common joke is, “Can I just take a tape recorder with your voice into the test with me?”

    So, if we change the definition of testing, then perhaps I can get on board with discussing how to use tests to evaluate the success of teachers and students. Until then, I’m forced to apologize again. :)
