Unfogged: Comment on Talky Times about assessing teachers

A comment from the Gelman post:

I think you may have missed a very important point here: the system is being graded on a linear scale when the marginal improvements are not linear. In simpler terms, they are assessing an increase from 2.0 to 2.1 (delta = 0.1) the same as an increase from 3.6 to 3.7 (same delta of 0.1). But going from 3.6 to 3.7 is much more difficult than going from 2.0 to 2.1, simply due to the upper-bound scoring of 4.
To put it another way, imagine there is a weight loss contest. A 300 lb. person can lose 20 lbs. with not much difficulty. But can a 120 lb. person also lose 20 lbs. so easily? I'm not sure they have addressed the non-linearity of their system properly.

Posted by: apostropher | Link to this comment | 03-10-11 9:37 AM

Gelman isn't talking about how crappy the methodology for making the tenure judgment is.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:38 AM

4: but I'm not sure that this is a system where you want to fully model the nonlinearity; assuming tenure spots are a fixed good across the school system (which may not be very accurate, but anyhow), a teacher who can perform way above average with poor students is of much greater benefit than a teacher who can perform slightly above average with excellent students.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:40 AM

I mean, if she really is that good of a teacher, and she wants a better chance of achieving tenure, she could request a transfer to a shittier (non-selective) school.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:41 AM

6: Teacher evaluation shouldn't be done according to a mathematical formula. That's just an excuse for school administrators to half-ass one of their most important duties.

Posted by: apostropher | Link to this comment | 03-10-11 9:43 AM

8: well, that's a different question.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:44 AM

7: This seems to assume that "good teacher" is an absolute value and that the same teacher who is excellent at teaching g&t kids will be excellent at teaching deeply struggling kids and vice versa. My experience would say, not so much.

Posted by: Jimmy Pongo | Link to this comment | 03-10-11 9:45 AM

6: Teacher evaluation shouldn't be done according to a mathematical formula. That's just an excuse for school administrators to half-ass one of their most important duties.

Might also be usefully re-written as:
"Teacher evaluation shouldn't be done according to some Principal's 'impressions' and 'observations.' That's just an excuse for school administrators to 'fire the ones that are too black.'"

Posted by: Annelid Gustator | Link to this comment | 03-10-11 9:48 AM

To follow on from Apostropher @4:

Two thirds of her students are already achieving the top score on the tests. She may be teaching them lots of stuff, but they can't score better than the top score when retested. So her effectiveness on those kids is not being measured.

Posted by: jim | Link to this comment | 03-10-11 9:49 AM
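
A rough sketch of the ceiling effect jim describes, for anyone who wants to poke at it. Everything here is made up for illustration: the class rosters, the 0.3 "true" gain, and the function names are assumptions, and this is not the formula the school system uses. It only shows how a hard cap at 4 hides improvement for students already at the top.

```python
CEILING = 4.0  # top score on the test, as in the quoted Gelman comment

def measured_gain(true_start, true_gain, ceiling=CEILING):
    """Gain the test can actually see when scores are capped at `ceiling`."""
    before = min(true_start, ceiling)
    after = min(true_start + true_gain, ceiling)
    return after - before

def class_average_gain(true_starts, true_gain):
    """Average measured gain over a class where every student truly gains `true_gain`."""
    return sum(measured_gain(s, true_gain) for s in true_starts) / len(true_starts)

# Hypothetical rosters (assumptions, not data from the article).
weak_class = [2.0] * 30                  # everyone far from the cap
strong_class = [4.0] * 20 + [3.6] * 10   # two thirds already at the top score

print(class_average_gain(weak_class, 0.3))
print(class_average_gain(strong_class, 0.3))
```

With these invented numbers both teachers add the same 0.3 to every student, but the measured class averages come out at roughly 0.3 and 0.1, because twenty of the thirty strong students have nowhere to go on the scale.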

10: well, fine. But that still leaves open the question (again, assuming that tenure slots across the school system are fixed which, who knows) of whether a teacher who is dedicated and hard-working and pretty good with well-prepared, gifted children is as useful to the school system as a teacher who is impressively good with ill-prepared, troubled children.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:49 AM

12: true, which is why the model includes a lot of other weighted terms to account for that kind of ceiling (or, in other cases, presumably floor) effect.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:50 AM

What Sifu said, more or less. The non-linearity argument works both ways. It may be numerically harder to go from 3.0 to 3.1 than from 2.0 to 2.1, but that doesn't mean it is pedagogically harder.

Posted by: Ginger Yellow | Link to this comment | 03-10-11 9:52 AM

It could be worse, since apparently failing to get tenure in this school system doesn't mean getting fired. But still, they're using a measure which is objectively stupid as a way to evaluate teachers who teach good students. I don't understand how this is defensible, even if, as Sifu suggests, you think that teaching of good students isn't very important.

Posted by: essear | Link to this comment | 03-10-11 9:53 AM

16: looking at the quote from the school department rep in the article, it looks like they understand that the unreliability of the measure makes it problematic as a single measure, and are instead just waiting for a two year sample to clear a given bar, something that should happen eventually even with a wildly varying measure:

"We are saying that a teacher's tenure decision should simply be delayed (not denied) until that teacher has demonstrated effective practice for consecutive years in all three categories. The alternative is what we've had in the past -- 90-plus percent of teachers who are up for tenure receive it. Do you think journalists deserve lifetime jobs after their third year in the business?"

Now, there's a question (posed in the article) of whether that uncertainty is going to drive talented people away, but it doesn't seem evident to me that this system is going to result in talented people either losing their jobs or being denied tenure permanently.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 9:57 AM
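
A back-of-the-envelope sketch of the "should happen eventually" claim. It assumes each year's evaluation clears the bar independently with the same probability `p`, which is a big simplification (the function name and the values of `p` are hypothetical, and nothing here comes from the actual tenure rules). Under that assumption, the chance of getting two passing years in a row within a given number of years can be computed with a small recurrence.

```python
def prob_two_in_a_row(p, years):
    """Probability of at least one pair of consecutive passing years within
    `years` evaluations, assuming each year passes independently with
    probability p (an assumption, not something the article states)."""
    # Track the probability of having no passing pair yet, split by
    # whether the most recent year was a fail or a pass.
    no_run_fail, no_run_pass = 1.0, 0.0
    for _ in range(years):
        no_run_fail, no_run_pass = (
            (no_run_fail + no_run_pass) * (1 - p),  # this year fails
            no_run_fail * p,                        # this year passes after a fail
        )
    return 1.0 - (no_run_fail + no_run_pass)

for p in (0.3, 0.5, 0.8):
    print(p, [round(prob_two_in_a_row(p, n), 3) for n in (2, 3, 5, 10)])
```

For example, with `p = 0.5` the chance is 0.25 after two years and 0.375 after three. If the measure carries a persistent bias against a particular teacher, `p` could be low and "eventually" could be a long time, which is the worry raised further down the thread.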

11 is right. I'm wary of teacher evaluations of each other. They can be very useful; they can also be gamed, or a venue to take out personal aggression.

None of these assessments should be taken as the end-all-be-all, but taken together, with repetition, they can give a decent picture of a teacher.

Posted by: heebie-geebie | Link to this comment | 03-10-11 9:59 AM

something that should happen eventually even with a wildly varying measure
Not necessarily.

Posted by: Eggplant | Link to this comment | 03-10-11 10:00 AM

But that still leaves open the question (again, assuming that tenure slots across the school system are fixed which, who knows) of whether a teacher who is dedicated and hard-working and pretty good with well-prepared, gifted children is as useful to the school system as a teacher who is impressively good with ill-prepared, troubled children.

Um, no, but only because you've already stacked the question by labeling the G&T teacher "pretty good" and the troubletown teacher "impressively good." Take away that preweighting and I think you're left with a question of balance within the system. And in that case it's not a matter of whether any given teacher with strengths in x is more valuable than one with strengths in y, it's what needs most improving in a particular system in a particular time. Tenure approval may not be the best or the only tool available for sorting that out.

The idea that G&T kids will naturally do just fine under whatever conditions they are given is, I think, patently false. I don't think school systems should be weighted towards nurturing the "talented tenth", but I also don't think that we should push as many resources as possible towards those struggling the most and expect everyone else to sort themselves out.

Posted by: Jimmy Pongo | Link to this comment | 03-10-11 10:06 AM

17: Honestly, I think the problem here is equation of 'tenure' with 'lifetime job'. If we've got a tool that allows us to identify a teacher that genuinely sucks, and it's reliable enough to be used to deny tenure, it should be treated as reliable enough to fire for cause, which you can do with a tenured teacher.

There's something weird about 'We have this tool that can identify a terrible teacher. So we've identified this terrible teacher, and what we're going to do about it is leave her in the classroom as an at-will employee -- we don't want to actually keep her away from teaching the kids, we just want to have a category of teachers that we could fire whimsically if we wanted to.'

Posted by: LizardBreath | Link to this comment | 03-10-11 10:06 AM

Of course, it's also undeniably the case that a statistical model is going to fail to capture the full complexity of the situation, so you're inevitably going to be able to find examples of teachers who were ill-served by the system, and that may well be what's happening in this case. There might be excellent reasons not to use the model, or to give it less weight in tenure decisions, but the fact that examples of people who have been plausibly personally ill-served by the model exist is not one of those reasons.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:07 AM

21: right, but they are not using it to deny tenure. They're using it to award tenure, and, implicitly, to delay tenure for those who were not awarded it.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:08 AM

Um, no, but only because you've already stacked the question by labeling the G&T teacher "pretty good" and the troubletown teacher "impressively good."

Um, that's based on the parameters given by the model in the article, um.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:09 AM

19: for a broad enough definition of "eventually", sure. Realistically, no, not necessarily.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:10 AM

None of these assessments should be taken as the end-all-be-all, but taken together, with repetition, they can give a decent picture of a teacher.

What makes you say this? I ask because such assessments seem to presuppose that there's a clear, not to mention relatively unitary, definition of what good teaching is. But maybe I'm misunderstanding what you mean or the methodology. That seems totally possible to me, given that the world of assessment has always struck me as a huge racket, a world in which people (those producing the assessment schemes) who aren't very good at math can pass themselves off as reliable experts to people who are even worse at math (administrators) and are looking for some kind of measure that will allow them to quantify things that probably are better left qualitative (even with all of the inherent problems in such measures).

Posted by: Von Wafer | Link to this comment | 03-10-11 10:13 AM

25: Still no. Compare a teacher who inherits kids taught the previous year by a teacher fantastically good at raising scores to one who inherits underperforming kids.

Posted by: Eggplant | Link to this comment | 03-10-11 10:13 AM

That clearly made no sense at all, and I'm cool with that.

Posted by: Von Wafer | Link to this comment | 03-10-11 10:13 AM

I guess what I'm thinking is that if consistent poor results on this don't lead to firing, then that means that people using it to delay tenure don't actually believe in it, and if they don't really trust it, they shouldn't be using it even to delay tenure.

Even with the very high uncertainty, there's some number of years after which if the measure is worth anything, you'd expect the average to settle down at a number that meant something about the teacher. If the school system isn't willing to implement that as a reason for firing for cause for consistent low performers, then something's screwy. (And if teaching good students is going to make anyone look like a consistent low performer, then that's really going to destabilize the system.)

Posted by: LizardBreath | Link to this comment | 03-10-11 10:15 AM

The idea that G&T kids will naturally do just fine under whatever conditions they are given is, I think, patently false.

Yes, they need to be shaken, not stirred.

Posted by: Ginger Yellow | Link to this comment | 03-10-11 10:15 AM

28: No tenure for you!

Posted by: apostropher | Link to this comment | 03-10-11 10:16 AM

27: so you're saying that the teacher's entire population of students, in every school year, was previously taught by one of those two teachers? Well, okay, I suppose that's a plausible thought experiment (although I assume in practice it never happens), unless the teacher's class is made up of seventh graders in their first year at a new school, which is the case for the woman in the article.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:16 AM

26 made perfect sense. (BTW, did you see my email the other day? I think your FB account is sending out spam.)

Posted by: LizardBreath | Link to this comment | 03-10-11 10:16 AM

31: Too late!

Posted by: JP Stormcrow | Link to this comment | 03-10-11 10:17 AM

24: You're um!

It seemed to me that you were making a more general statement about the system evaluation and the value of the sort of teachers who would be represented well or poorly by it.

Fine, I'll go read the article now.

Posted by: Jimmy Pongo | Link to this comment | 03-10-11 10:18 AM

32: It doesn't have to be the entire population, and why would you assume it never happens? There are small schools, you know.

Posted by: Eggplant | Link to this comment | 03-10-11 10:18 AM

29.2: I don't disagree, but nothing in the article addresses that point; the woman profiled has only had a single evaluation (she's been on the job for less than three years) and as far as I'm reading nobody discusses what the eventual outcome is for a consistent low performer, and there's no evidence that teaching good students will lead to that outcome.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:19 AM

Further to 36: Right, my kids were in an elementary school with two classes in each grade. For any fourth grade teacher, if both of the third grade teachers happened to be good, or happened to suck, that would have an impact on their entire population of students.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:20 AM

29: Surely they eventually fire people who consistently fail to get tenure? (Maybe not. I guess I'm just used to academia where "didn't get tenure" means "got fired".)

Posted by: essear | Link to this comment | 03-10-11 10:21 AM

What makes you say this? I ask because such assessments seem to presuppose that there's a clear, not to mention relatively unitary, definition of what good teaching is.

I think if you combined:
1. a half-dozen teaching evaluations by different colleagues and the principal
2. statistics on how students performed at the beginning of the year
3. statistics on how students performed in classes which build on the class in question (not always available)
4. some objective standardized test on sample topics, so that you can compare across institutions

you could get a decent picture on whether or not a teacher is effectively covering the curriculum.

I have all sorts of doubts about the value of the curriculum, mind you. But I also have wild misgivings about teachers individually deviating from a curriculum, especially if some flexibility is built in, and they deviate beyond that. Too often that means they're imposing religious beliefs, at least around here.

Posted by: heebie-geebie | Link to this comment | 03-10-11 10:21 AM

29: I didn't get your e-mail, no, but I did get about a gazillion others saying what I'm sure yours said: stop fucking spamming me, Von! But the thing is, I don't use my facebook account. I haven't logged into it (or out of it) for at least six months. Which I guess is beside the point. Anyway, I changed my password, because that's what someone who knows things (an assessment expert) told me to do. And then I sued Mark Zuckerberg.

Posted by: Von Wafer | Link to this comment | 03-10-11 10:22 AM

36: because New York City has an extremely large, mobile, heterogeneous population, including many recent immigrants. Maybe on Staten Island somewhere? Even at a small, suburban elementary school it's somewhat difficult for me to picture.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:22 AM

Additionally, teachers may or may not improve from year to year, different teachers draw students from different communities, and I'm sure many other effects make it difficult to get an unbiased measure.

Posted by: Eggplant | Link to this comment | 03-10-11 10:22 AM

there's no evidence that teaching good students will lead to that outcome.

It depends on the exact assessment tool, of course, but it seems to be a risk given that there's a hard cap on how well good students can do, as discussed earlier in the thread.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:22 AM

31: As JP says, it's too late. But given the size of the UC's budget deficit, they might eliminate my department, I guess, which would take care of the problem (me).

Posted by: Von Wafer | Link to this comment | 03-10-11 10:23 AM

38: right, but the fourth grade teachers would likely have to be pretty extraordinary, or remarkably terrible, for it to completely overload the measure (which, remember, has thirty-something terms besides prior increase in test scores). If that's not the case, then sure, it's a shittily designed statistic. But I don't see any reason (based on the article) to believe that to be the case.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:24 AM

43: that seems to pretty directly contradict your prior point about the following-in-the-footsteps-of-genius effect, given that the measure was explicitly designed to account for many of those sorts of factors.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:25 AM

40 might make sense to me, though I think there are some details that need working out. Still, you're now my favorite assessment expert. The problem, though, is that I suspect what you propose would be so expensive to implement that it's a non-starter for most school districts.

Posted by: Von Wafer | Link to this comment | 03-10-11 10:27 AM

for it to completely overload the measure (which, remember, has thirty-something terms besides prior increase in test scores). If that's not the case, then sure, it's a shittily designed statistic

See, I don't know what all the terms in the measure are, but I think we've already been given enough information to conclude that these thirty-something extra terms are useless. If the output of the measure is "I don't know if this teacher is the very worst in the system or right in the middle of the pack", the measure is fucked. I don't need to know any more details to conclude that. It seems plausible that the measure is fucked mostly because this teacher is teaching students already pushing against the ceiling, but if it's fucked more broadly, then it only gets worse to use it across the board.

Posted by: essear | Link to this comment | 03-10-11 10:28 AM

You know something that really throws me about the article? What on earth does 'margin of error' mean? I know what it means when you're talking about taking a statistical sample from a larger population -- you're looking at the risk that you randomly drew a sample that doesn't resemble the population overall. But I can't think of anything that would go into a teacher assessment formula that would be that kind of statistical sample.

It seems to mean something like 'this is the range within which your real effectiveness lies', but how on earth could you determine that, without some way of measuring 'real effectiveness'?

There may be a satisfying answer, but I'm confused.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:30 AM

47: How does it contradict? I'm pointing out that it's unlikely that all teachers are drawing students randomly from the same population. I suppose some of the thirty-something terms could account for all effects I've mentioned (and others I'm not clever enough to come up with) but I'm skeptical.

Posted by: Eggplant | Link to this comment | 03-10-11 10:30 AM

There may be a satisfying answer, but I'm confused.

True of nearly everything for me.

Posted by: Von Wafer | Link to this comment | 03-10-11 10:31 AM

49: why? It provides information. A more generally reliable measure would be better, sure: what is it?

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:32 AM

What we need are lots of twins.

Posted by: Eggplant | Link to this comment | 03-10-11 10:32 AM

I also wonder what these NY state assessments look like; are they reasonable and graded in a consistent way? When and where I was growing up, our state assessments involved lots of essay questions that as far as I can tell were graded by a combination of eyeballing the length, skimming for key words, and flipping a coin. I don't think the grading process allowed enough time for them to actually be read, let alone thought about. (Part of my evidence for the terrible grading practices is that I never landed in the top category on the science test, so, you know, I could just be arrogant and deluded. But these were the only standardized tests I ever took where I consistently failed to end up at the top.)

Posted by: essear | Link to this comment | 03-10-11 10:34 AM

Still, you're now my favorite assessment expert.

Hooray!

The problem, though, is that I suspect what you propose would be so expensive to implement that it's a non-starter for most school districts.

My guess is more cynical: I bet they spend as much on ETS already, and ETS lobbying would prevent any assessment that implies that strict testing alone is not amazing and beautiful and fun.

Posted by: heebie-geebie | Link to this comment | 03-10-11 10:35 AM

But I can't think of anything that would go into a teacher assessment formula that would be that kind of statistical sample.

How well a student tests on any one particular day has sampling error.

Posted by: heebie-geebie | Link to this comment | 03-10-11 10:36 AM

51: look at the diagram shown at the top of the article; the formula takes into account student, school, classroom, and district effects for every student in the class. I assume (based on the way the formula is discussed in the article) that those coefficients are generated in an attempt to account for the variations and complications you enumerate.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:37 AM

53: This is a real point of disagreement. If a metric isn't at least fairly reliable, I think it's a mistake to use it at all. I'm going to have to think about what I mean by 'fairly' there, and why, but I'm very convinced.

Okay, while I was typing that, some of what I was worried about jelled -- part of the problem about a metric that has some information, but isn't highly reliable, is that you don't know what's throwing it off. If you know that there's random error in it, you're fine, even if there's lots of random error in it -- just do it enough times, and you'll get something reliable. If you've got systematic biases you haven't identified and don't understand (and presumably if you'd identified and understood them, you could get rid of them) relying on the metric is going to lead to systematic unwanted effects on whatever you're doing, even if there's some real information in there.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:38 AM
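
A minimal simulation of the random-versus-systematic distinction drawn above. The numbers (a "true" value of 50, noise with standard deviation 20, a bias of -10) and the function name are invented for illustration, and this says nothing about how large either kind of error is in the real measure. The point is only that averaging many readings shrinks random noise but leaves a constant bias untouched.

```python
import random
import statistics

def average_of_measurements(true_value, noise_sd, bias, n, rng):
    """Mean of n noisy readings, each shifted by the same (unknown) bias."""
    readings = [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings)

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_VALUE = 50.0       # hypothetical "real effectiveness"

# Lots of random error, no bias: the average homes in on the truth.
noisy_only = average_of_measurements(TRUE_VALUE, noise_sd=20.0, bias=0.0, n=10_000, rng=rng)

# Same random error plus a systematic bias: the average homes in on the wrong number.
noisy_biased = average_of_measurements(TRUE_VALUE, noise_sd=20.0, bias=-10.0, n=10_000, rng=rng)

print(noisy_only, noisy_biased)
```

With these made-up settings the first average lands close to 50 and the second close to 40, however many more readings you add. Repetition fixes the first problem and does nothing for the second.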

58: I'm supposed to read the articles now?

Posted by: Eggplant | Link to this comment | 03-10-11 10:42 AM

57: True, and I hadn't thought of that, but it should cancel out somewhat across the entire class (not that I can do the math as I sit here, but the risk that a kid will do unusually well or poorly will be balanced out by the assumption that other kids in the class will also vary randomly around their actual levels of achievement). The amount of error in the metric as reported seems way too high for that to explain it all.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:42 AM
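
The math gestured at above can be sketched quickly. Assuming each student's score on test day wobbles independently around their actual level with the same standard deviation (an assumption, and the class size of 30 and the 0.5-point wobble are invented), the wobble in the class average should be about the per-student wobble divided by the square root of the class size. The simulation below checks that against the formula.

```python
import math
import random
import statistics

def class_mean_spread(n_students, student_noise_sd, n_trials, rng):
    """Standard deviation of the class-average test-day error across many simulated test days."""
    means = []
    for _ in range(n_trials):
        errors = [rng.gauss(0, student_noise_sd) for _ in range(n_students)]
        means.append(statistics.fmean(errors))
    return statistics.pstdev(means)

rng = random.Random(1)  # fixed seed so the run is repeatable
N_STUDENTS = 30         # hypothetical class size
STUDENT_SD = 0.5        # hypothetical day-to-day wobble per student, in score points

simulated = class_mean_spread(N_STUDENTS, STUDENT_SD, n_trials=2000, rng=rng)
predicted = STUDENT_SD / math.sqrt(N_STUDENTS)

print(simulated, predicted)
```

Both numbers come out near 0.09, so independent test-day noise of half a point per student would move a 30-student class average by less than a tenth of a point. That is consistent with the suspicion that the reported error is too large to be explained by this alone, though shared disruptions (which are not independent across students) would not average out this way.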

59: well, first of all, you're throwing out most of quantitative social science there. Second of all, you can have systematic biases in very reliable measures just as easily as you can in unreliable measures.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:42 AM

61: the formula is also designed to deal with missing data, and I assume many of the measures (like, of school effects and so on) are estimates based on existing data, so you're multiplying the unreliability of a single student's measure by many students over many years. I think there are some other complications, too, but haven't thought it through, quite.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:44 AM

How well a student tests on any one particular day has sampling error.
Also, a teacher teaches 150 days in the year, some for like 6 hours per day. Of the 1000 classroom hours or whatever, your observations come from (at the very generous end) 20 of those hours.

Suppose the teacher has exogenous problems on 2 of the 6 days with 3 hours of observations? "HOLY CRAP! THIS TEACHER IS TERRIBLE!"

Posted by: Annelid Gustator | Link to this comment | 03-10-11 10:44 AM

62: This all comes down to 'most', 'fairly' and 'what's the difference between something reliable enough to be worth looking at for informational purposes, and reliable enough to start basing HR policy on'.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:44 AM

Hmm. Bugmenot isn't working for me. Wasn't there some Unfogged username and password?

Posted by: Eggplant | Link to this comment | 03-10-11 10:44 AM

49: why? It provides information. A more generally reliable measure would be better, sure: what is it?

"Somewhere between 0 and 52nd percentile, we think" is really a pretty minimal amount of information.

Now, if what's happening is that teachers who teach students near the ceiling aren't being measured well, it might be that it provides much sharper information on the bulk of teachers. That could be useful.

But, even in that case, you wouldn't want to use it to make definitive decisions about people's careers unless it was giving you accurate information. If it tells you one person is at the 90 +/- 1 percentile, by all means, use that as strong evidence to give them tenure. If it tells you they're at the 7 +/- 2 percentile, get rid of them. But denying someone tenure because it gives you minimal information about them, somewhat skewed to the negative end, when all their students and coworkers think they're great? That's pretty fucked up.

Posted by: essear | Link to this comment | 03-10-11 10:45 AM

True, and I hadn't thought of that, but it should cancel out somewhat across the entire class

Unless it's a factor that's biasing the class - flu bug sweeping school, construction outside, etc.

The amount of error in the metric as reported seems way too high for that to explain it all.

This, definitely. It's so high that you wonder if the journalist doesn't understand what they're trying to explain.

Posted by: heebie-geebie | Link to this comment | 03-10-11 10:45 AM

65: well, unless you believe strongly that 90+% of teachers in the NYC public school system are worthy of tenure after their third year, I'd say the burden of proof is on you to show that it's less reliable than the previous (implicit) measure.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:46 AM

A bunch of parents at the school next door to us are trying to get their kids into a school in the next district over, because that school has much better test scores. But from what I hear from someone who has taught there, the school has good scores because they have a laser-like focus on drilling for tests.

No way I'd send my kid there... I want a school with a laser-like focus on creating well-rounded students.

Posted by: Spike | Link to this comment | 03-10-11 10:49 AM

you're throwing out most of quantitative social science there

Most of quantitative social science is a little more scientific than astrology.

Posted by: apostropher | Link to this comment | 03-10-11 10:49 AM

69: Tenure for 90% of teachers doesn't sound unreasonable to me, without knowing a priori how many of them are awful. Certainly I don't think more than 10% of the teachers I had growing up deserved to be fired, even if they weren't all fantastic. I don't see much benefit to having more turnover, more inexperienced teachers, and less expectation that it's a long-term career, when there don't seem to be hordes of qualified people beating down the doors to take the job.

Posted by: essear | Link to this comment | 03-10-11 10:49 AM

69: What do you think 'worthy of tenure' should mean? AFAIK, 'tenure' for a public school teacher means 'can be fired for cause, can't be fired without cause'. If someone's been doing a job successfully for three years, and their supervisors think they're doing well enough that they want to keep them on the job, I think moving them into the 'can only be fired with cause' category is generally quite reasonable.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:49 AM

67.last: but, again, "denying someone tenure" in this situation really doesn't sound a bit like what "denying someone tenure" would mean at the college level. It doesn't mean they get fired, it doesn't mean they won't get tenure in the future, it doesn't mean they get paid less. All it means is that they don't have the job security of tenure for another year at least. It sounds like they are largely using the positive side of the measure; if somebody seems like they're probably at least average, then they'll get tenure. If somebody seems like maybe they aren't, they'll wait and see. If somebody seems like they have been worse than average for many years running, and there have to be layoffs, maybe those will be the people who are laid off, instead of just firing the youngest teachers, as was the practice previously.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:50 AM

90+% of teachers in the NYC public school system are worthy of tenure after their third year

Without knowing what percentage don't make it to year 3, I'm hesitant to grant that statistic much power.

Posted by: apostropher | Link to this comment | 03-10-11 10:51 AM

73: so are you arguing for the status quo? Which, again, means that if there are layoffs they will almost certainly be predominantly among the newest teachers, regardless of competence.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:52 AM

71: well, okay, so you're saying 1. that statistical measures have no place in the classroom and, more so, 2. that statistical measures have no place in evaluations of human behavior?

Posted by: Sifu Tweety | Link to this comment | 03-10-11 10:54 AM

Which, again, means that if there are layoffs they will almost certainly be predominantly among the newest teachers, regardless of competence.

Compared to this sort of thing, I'm for the status quo. To the extent that it doesn't affect a large percentage of teachers, it doesn't fix the problem you're talking about at all: either we have a big percentage of the teachers working at-will, or layoffs will still be of the most junior teachers, with a couple of slightly less junior teachers who had a fluky bad result on this metric.

So if that's the problem we're trying to solve, we're looking at taking a significant percentage of teachers out of the current 'firing only for cause' system, and making them at-will employees. I don't like that. Either this sort of data is good enough to fire someone for cause, because they're a bad teacher, or it's not reliable enough for that and shouldn't be used to make layoff decisions.

Posted by: LizardBreath | Link to this comment | 03-10-11 10:58 AM

77: How about "take confidence intervals seriously and don't use weak statistical evidence as the sole basis for decisions that matter to people"?

Posted by: essear | Link to this comment | 03-10-11 10:59 AM

OK, having now read the article I still have no idea where Sifu got Isaacson as pretty good versus some hypothetical other teacher as fantastic. Unless you assume that Heebie's hypothetical (that she's somehow tricking her successful students and impressed colleagues into thinking she's great, but really the test outs her) has captured the real truth. I don't buy it.

Not to dismiss all of the sigmas I don't really understand, but this here is a pretty nifty exposé on the essay scoring industry for standardized student tests. Short version: it not just the Department of Education that wants more of those threes to be fours, or fours to be threes.

Posted by: Jimmy Pongo | Link to this comment | 03-10-11 11:00 AM

I'm saying that 1. statistical measures are a poor measure of performance of certain activities (great for free throw accuracy, not so great for teaching ability) and, more so, that 2. the social sciences are sciences only by a very loose definition of the word.

Posted by: apostropher | Link to this comment | 03-10-11 11:01 AM

59: well, first of all, you're throwing out most of quantitative social science there

Is it really true that most quantitative social science does not even rise to the level of "fairly reliable", and if so, what's so bad about throwing it out?

Posted by: nosflow | Link to this comment | 03-10-11 11:03 AM

One problem I have with value-added and standardized testing methods of teaching evaluation is that they don't measure the job done, which is the act of teaching. You can break down the act of teaching into component parts: creating lesson plans which cover the curriculum, for example; or keeping good classroom control; or answering students' questions. Each of those aspects can be evaluated. If a teacher is assessed to have performed each aspect of h/h job well, then s/he has performed the job of teaching well.

If a teacher performs each aspect of the job satisfactorily, but the students don't improve in their standardized test scores? There could be all kinds of reasons for that, some of them external to the classroom.

And some of the reasons may be internal to the classroom. However, to the extent that they aren't, or can't be incorporated as a component part of a teaching evaluation, they are not identified, and are never communicated to the employee. In other words, a school district that takes action against a teacher (termination, or denial of tenure, or extension of probationary status, or whatever) because students' scores don't improve, even though the teacher's performance is otherwise fine, is taking action against an employee based on something that it has never pointed out to the employee, nor given h/h an opportunity to correct.

I'm not saying it's always a bad thing to make employment decisions based on results, rather than performance. Results-based assessment is standard for lots of jobs: that's how CEOs are evaluated, for example, or investment bankers. But of course the teaching profession is very different from those jobs. CEOs and investment bankers are graded on results because those are high risk, high reward jobs. Teachers are on a set salary schedule, and are paid relatively little for the work that they do.

I mean, imagine you're a teacher. You're evaluated in the standard way, and you're deemed to have great classroom rapport, solid lesson plans, and an expansive knowledge base. Your evaluators don't identify anything you should change. But your Value Added score is repeatedly low. Why is that? Is it because your principal keeps assigning you kids who were ahead of the curve last year, and so show relatively little improvement this year? Is it because there's something about your teaching style that depresses scores? If so, what should you change? How do you improve? You don't know! Now, having been given no guidelines for improvement and no criticisms of your performance, you could be fired, or at least have your probationary period extended, which means that next year your principal can fire you for looking at her cross-eyed. Teaching is a job, and teachers are employees, and the profession needs to attract good people. Making the job less reasonable and less stable is not, to my mind, the right course.

Posted by: jms | Link to this comment | 03-10-11 11:03 AM

Haven't read the whole thread yet, but yeah, teacher assessment is basically impossible. Admin and colleague reviews are based on a tiny slice of observation, and are prone to personality differences, class/race/sex biases, etc. Student feedback? Hilarious. Grade and test results? What a disgusting shameful mess. People talk a lot of shit about wanting to "get those bad teachers outta there" or "reward the few who aren't lazy" or whatever. And I guess that would be nice; in the environment I teach in, there is zero (0) reward for being a good teacher except you might be more likely to keep your job from semester to semester. I try to be a good teacher because otherwise I would hate my life too much to go on living.

I sort of feel like, for public school teachers, the bar needs to be raised on the way in, not once they're already there. I hear nightmare stories from my students in the Ed program about what the attitude is supposed to be toward students, and some of the best future-Ed students I've had, I've encouraged to switch tracks and go into library science, where they might actually get a chance to make a difference in a kid's life. (I would never tell anyone not to become a teacher, but to those who are talking about how being an Ed major is killing them, I say, listen, college is supposed to be the fun, exciting, intellectual part of moving into your career. If you think this sucks, wait until you're at Prison High busting your balls to bore a bunch of underserved teens to death.)

Posted by: AWB | Link to this comment | 03-10-11 11:05 AM

that's how CEOs are evaluated

Hahahahah

Posted by: Annelid Gustator | Link to this comment | 03-10-11 11:05 AM

Just read the first page of that article, but yeah: ETS basically wrote NCLB. Standardized test questions are chosen based on how effectively they produce pretty normal distributions of right answers. They are not chosen based on how well they measure understanding, whatsoever. So it's really really effective to throw in a loopy vocabulary word, or an unexpected math trick that isn't really testing what appears to be the subject of the problem. Etc. It's beyond horrible.

Posted by: heebie-geebie | Link to this comment | 03-10-11 11:06 AM

You do know what the problem is that tenure's intended to solve, right? That in the absence of tenure, there's a serious financial incentive to fire older, more expensive teachers and keep a high turnover of cheap kids churning through the schools.

If I were a school district negotiating a union contract in good faith, and I thought I had a good objective metric for competence, I might try to negotiate an adjustment to seniority based on competence, such that very high-performing junior teachers could move up the ladder to be safe from layoffs. Trying to move to a non-seniority based system across the board just seems as if it would lead to open season on the higher-paid older teachers.

Posted by: LizardBreath | Link to this comment | 03-10-11 11:08 AM

87 to Sifu.

Posted by: LizardBreath | Link to this comment | 03-10-11 11:09 AM

Just to make another point that I think is kind of trivial, but which might be worth making explicit: even if you had an objectively quantifiable way to separate teachers who deserved tenure from those who didn't, and had a statistical test that could tell you with 95% confidence which side of the line a teacher fell on, you would still, in a large school system, end up denying tenure to people who deserve it. If you start trusting a much broader confidence interval, you're going to make a lot more mistakes like this.
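A rough sketch of that point, under loud assumptions: the 5% is read here as a per-teacher chance of landing on the wrong side of the line, the errors are treated as independent, and the head counts are invented purely for illustration.

```python
def expected_wrong_denials(deserving_teachers, error_rate):
    """Expected number of deserving teachers wrongly placed below the line."""
    return deserving_teachers * error_rate


def chance_of_at_least_one(deserving_teachers, error_rate):
    """Probability that at least one deserving teacher is misclassified,
    assuming the errors are independent (an assumption, not a given)."""
    return 1 - (1 - error_rate) ** deserving_teachers


# Hypothetical numbers of deserving teachers up for tenure.
for n in (20, 200, 2000):
    expected = expected_wrong_denials(n, 0.05)
    at_least_one = chance_of_at_least_one(n, 0.05)
    print(f"{n} teachers: about {expected:.0f} wrong denials expected, "
          f"P(at least one) = {at_least_one:.3f}")
```

The expected count grows in step with the size of the system, so a rule that looks safe for one school still produces a steady stream of mistakes across a large district.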

Posted by: essear | Link to this comment | 03-10-11 11:09 AM

That in the absence of tenure, there's a serious financial incentive to fire older, more expensive teachers and keep a high turnover of cheap kids churning through the schools.

Now more than ever:

Gov. Rick Perry said Wednesday the state's not to blame if teachers lose their jobs as school districts grapple with the potential loss of billions of state dollars. [...] He said if he were deciding, he'd focus on "non-teaching" staff - which a number of school districts have said wouldn't suffice to meet the cuts. [...] Texas could fire every school superintendent, all principals and assistant principals, every school counselor, every librarian, every school nurse, all cafeteria workers, custodians and bus drivers - all 329,574 non-teacher jobs - and still not save the $11.6 billion in public education cuts.

Posted by: apostropher | Link to this comment | 03-10-11 11:15 AM

I associate these kinds of not-reliable measurement tools with Yglesias and his crappy basketball analysis based on unreliable statistics. To hell with half-assed unreliable quantification.

Posted by: Robert Halford | Link to this comment | 03-10-11 11:37 AM

I've encouraged to switch tracks and go into library science, where they might actually get a chance to make a difference in a kid's life. (I would never tell anyone not to become a teacher, but to those who are talking about how being an Ed major is killing them, I say, listen, college is supposed to be the fun, exciting, intellectual part of moving into your career.

I can think of few educational programs less fun, exciting, or intellectual than library science ("masters of" version). The jobs seem ok, though, if you can get one. But there aren't a whole lot of school library jobs, as far as I know.

Posted by: fake accent | Link to this comment | 03-10-11 11:49 AM

Another not-insignificant issue is the degree to which these types of year-to-year comparisons are based on an implicit assumption that you're dealing with roughly the same group of kids. In most urban districts that I've had anything to do with, that's somewhere between wrongheaded and insane -- you're often dealing with upwards of 30% annual turnover among students.

Posted by: Witt | Link to this comment | 03-10-11 11:50 AM

92 is correct, except the jobs mostly suck too.

Posted by: peep | Link to this comment | 03-10-11 11:53 AM

92: We have a very good MLS program at Public College where I teach, so I feel pretty good about recommending it. They seem to have an excellent track record of getting good jobs for students and setting them up with internships in their chosen field. I wouldn't recommend it if I didn't know people who had gone there and ended up loving their life afterward.

Posted by: AWB | Link to this comment | 03-10-11 12:00 PM

95: I'm sure you are conscientous and thoughtul in your recommendations, AWB.

94 was just gratuitous peepian self-deprecation.

Posted by: peep | Link to this comment | 03-10-11 12:07 PM

96.1 +f

Posted by: peep | Link to this comment | 03-10-11 12:08 PM

This is basically nothing that W. Edwards Deming's game with the red and white beads doesn't cover. I'm not even sure that statistical process control is a methodology that has any place in education, even if it was a valid process control system, which this doesn't appear to be as it's making decisions about people's careers based on results within the system deviance.

they don't measure the job done, which is the act of teaching.

No, they don't. Just imagine if we knew what good teachers actually do to the kids - then we could get the others to do the same things!

Posted by: Alex | Link to this comment | 03-10-11 12:24 PM

That Ms. Isaacson's 7th percentile could actually be as low as zero or as high as the 52nd percentile is surely simply a reflection of the basic statistics of the number of students and the way the test scores apparently work.

Although there are complicated formulas for deriving baseline and "expected" scores, her actual 3.63 not meeting the expected 3.69 and other discussion in the article makes it appear that the actual results *are* just averages (although possibly adjusted a bit I'd guess) of the proficiency levels (1,2,3 or 4) of her students from the Math and ELA Proficiency Standard tests (there is a "scaled score" that averages in the 600s, but it is not mentioned in the article). Since she has 66 students, a difference of .06 (her shortfall) is equivalent to 4 students scoring one level higher (if it were simply the average, and given the fact from the article one student had a 2, her actual distribution would be 43 4s, 22 3s and one 2). Whatever they are using for margin of error, it would appear that in her case it would extend to ~3.70 (52%--just greater than median), which would be plausible.
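The arithmetic above can be checked with a short sketch. It assumes, as the comment does, that the score is a plain average of proficiency levels; the 43/22/1 split is the comment's guess rather than published data, and the function name is made up for illustration.

```python
from fractions import Fraction


def mean_level(counts):
    """Average proficiency level for a {level: number_of_students} mapping."""
    total_students = sum(counts.values())
    total_points = sum(level * n for level, n in counts.items())
    return Fraction(total_points, total_students)


# Guessed distribution from the comment (an assumption, not published data).
actual = {4: 43, 3: 22, 2: 1}
# The same class if four students moved up from a 3 to a 4.
shifted = {4: 47, 3: 18, 2: 1}

actual_mean = mean_level(actual)      # 240/66
shifted_mean = mean_level(shifted)    # 244/66

print(f"actual:  {float(actual_mean):.3f}")
print(f"shifted: {float(shifted_mean):.3f}")
print(f"gain:    {float(shifted_mean - actual_mean):.3f}")
```

With 66 students each one-level move is worth 1/66 of a point, so four moves cover a gap of roughly .06.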

Posted by: JP Stormcrow | Link to this comment | 03-10-11 12:30 PM

100

87.2: that seems like a reasonable complaint, and it sounds like the answer is yes, you're basically okay with the status quo. None of this particularly has to do with the plight of the woman in the article, or with the validity of the statistical measure in question, but as a general staked-out position it's not something I have a problem with.

82: well, it is certainly the case that it's often impossible to measure complex natural phenomena with terribly high accuracy. If you throw it out you're throwing out any chance of explicitly measuring those things, and relying instead on the hunch that implicit measurements will be better. That seems like a sucker's game to me.

81: I'm not convinced that the division between "effectively measured statistically" and "not susceptible to statistical analysis" is all that clear, and I think that kind of blanket statement about social science elides both the variety in social science and the variety in quote-unquote "real" science. Do you think that models of climate change provide a better picture of what's actually happening with the planet's weather than models of (say) attention-driven eye movement do of the perceptual system? Or do you think that climate science (to stick with the example) also isn't real science?

79: that's fine. I haven't seen any particular evidence that they're using this as the sole basis but sure, if they are, I'm agin' it.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 12:31 PM

101

I haven't read the thread, so perhaps this has been mentioned, but the emphasis on measuring teacher effectiveness in order to address educational outcomes is, according to a number of education policy people, somewhat wrongheaded in the first place.

Drum had this chart not long ago, demonstrating quite clearly that educational level of parents is a much more significant factor in student achievement. Very interesting! I also heard an hour-long piece on a DC public radio station recently emphasizing that the socioeconomic milieu of students (simply the neighborhood they live in) is the #2 factor. Their list, in order of that which affects a child's educational outcome, went:

1. Educational level of parent (usually mother)
2. Socioeconomic setting
3. Teacher effectiveness/rapport

The policy makers featured in the NPR piece were pushing for an initiative to allow school voucher monies to be used as rent vouchers, to allow parents to move to a higher SES area, which on a very modest level just means a more secure neighborhood.

That we're skipping past these issues and placing nearly the entire burden of educational outcome on teachers, who are often dealing with very much less than optimal classroom situations from year to year (as Witt says in 93), is silly.

Posted by: parsimon | Link to this comment | 03-10-11 12:33 PM

102

101: I think that's probably absolutely correct about the ranking of factors that affect student performance. But the only one of those factors that the schools can really influence is what happens in the classroom, which explains the focus on it.

Posted by: LizardBreath | Link to this comment | 03-10-11 12:37 PM

103

So I finally read the article and am now trying to make sense of the formula. It looks to me like it tries to fit the test score improvement for each student as a sum of contributions from district, school, classroom, and something called "Student Characteristics". If I understand it correctly, that means teachers' scores are likely heavily dependent on how well the other teachers at their school do.

Posted by: Eggplant | Link to this comment | 03-10-11 12:38 PM

104

100 generally: I'm getting the sense that you're conflating skepticism about the practical usefulness of a metric with a very large degree of uncertainty in making employment decisions, with skepticism about using statistical tools in science at all. I don't think there are many people on the latter bench.

Posted by: LizardBreath | Link to this comment | 03-10-11 12:39 PM

105

am now trying to make sense of the formula.

To the extent that you can walk us through it, that'd be great -- I looked at it and didn't really know where to start without definitions for all the variables and so on.

Posted by: LizardBreath | Link to this comment | 03-10-11 12:41 PM

106

104: well, my initial points could be summarized thus: 1. nothing in the article tells us whether the metric is good or bad, 2. it is perfectly possible -- even likely -- that a well-thought-out tenure system would result in this woman not getting tenure her first year. That broadened slightly into 3. there's no intrinsic reason why you can't get useful information about something like teacher performance out of a well-constructed model and 4. even if that model has very big error bars, it can still be meaningful. The discussion has broadened somewhat past that, and I think people have been taking me to defend the actual implementation of the system, which I'm not, particularly; I'm just saying that there's nothing in the article which would tell us that it was bad or working in a non-optimal way. It's possible that I've gotten a bit beyond that in the course of things, but those are basically the points I would contend.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 12:44 PM

107

106: That's all fair, I'd just want to leaven it with 5. In the current political climate, there's a lot of energy around removing tenure protections and making it easier to fire teachers, so there's reason to be skeptical about a metric not presented comprehensibly that serves that purpose.

Posted by: LizardBreath | Link to this comment | 03-10-11 12:49 PM

108

I'm just saying that there's nothing in the article which would tell us that it was bad or working in a non-optimal way.

Well, that it identifies an apparently (by ordinary eyeballing methods) very successful teacher as awful suggests, while certainly not conclusively establishing, that something's wrong with it.

Posted by: LizardBreath | Link to this comment | 03-10-11 12:51 PM

109

105: Well, this is really just uneducated guesswork, but I think I_C, I_S, and I_D are just arrays with ones and zeros to indicate which district, school, and class a given student is in (eg, I_Dil is 1 if student <i> is in district <l>). The various α's are similarly sparse arrays whose non-zero values are what they are trying to estimate.
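One way to picture that guess, as a toy sketch only: the formula itself is not reproduced in this thread, so the additive form, every number, and every name below are assumptions made for illustration. Each indicator row holds a single 1 marking the student's district, school, or class, and the matching α array holds that group's effect.

```python
def dot(indicator_row, effects):
    """Pick out one group effect using a 0/1 indicator row."""
    return sum(flag * effect for flag, effect in zip(indicator_row, effects))


# Hypothetical effects (the alphas the model would be trying to estimate).
alpha_district = [0.02, -0.01]
alpha_school = [0.05, 0.00, -0.03]
alpha_class = [0.10, -0.04]

# Indicator rows for one student: district 0, school 2, class 1.
I_D = [1, 0]
I_S = [0, 0, 1]
I_C = [0, 1]

# Stand-in for the "Student Characteristics" contribution (made-up value).
student_term = 0.08

predicted_gain = (student_term
                  + dot(I_D, alpha_district)
                  + dot(I_S, alpha_school)
                  + dot(I_C, alpha_class))
print(round(predicted_gain, 2))
```

If the model really is additive like this, the class term is whatever is left over once the district and school terms are accounted for, which would fit the worry that a teacher's score depends on how colleagues do.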

Posted by: Eggplant | Link to this comment | 03-10-11 12:52 PM

110

102: But the only one of those factors that the schools can really influence is what happens in the classroom, which explains the focus on it.

It's not the only factor that public policy can influence, however. And it's not as though the emphasis on teacher performance is an in-school thing; it's driven by public policy. In other words, the schools have been handed the situation as is, and told that it's their job to provide an outcome that cannot be provided in the absence of a concurrent policy approach that addresses the other, in fact more important, factors.

Posted by: parsimon | Link to this comment | 03-10-11 12:52 PM

111

And, of course, I could be completely wrong, knowing nothing about statistics in social sciences.

Posted by: Eggplant | Link to this comment | 03-10-11 12:55 PM

112

107: well, you know the political climate in New York City better than I do, so if it seems to you like that's what's happening then I can hardly make an informed counter-argument. As far as the metric being presented comprehensibly, though, I don't think we have any way to know whether or not that's happening; the article presents it as some kind of incomprehensible magic, but I think it's clear that's some combination of ignorance on the reporter's part and a specific agenda they're pushing. Certainly Andrew Gelman didn't seem bowled over by the mysterious incomprehensibility of it all.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 12:56 PM

113

well, you know the political climate in New York City better than I do, so if it seems to you like that's what's happening then I can hardly make an informed counter-argument.

This has been the basis of pretty much everything I've heard from the right-wing about education for years -- we need to be able to fire teachers more easily. Bloomberg's making a push for it in NY now, but it's been nationwide.

That doesn't mean anything about this metric specifically, but it means I'm looking at it carefully.

Posted by: LizardBreath | Link to this comment | 03-10-11 12:59 PM

114

108: does it? They establish that the teacher shows up to work early and stays late, that the teacher has two Ivy League degrees (whatever that means), that the principal of the school likes the teacher, and that one of the teacher's students likes her. None of those things strikes me as particularly dispositive. On the other hand, we have a (possibly unreliable) metric which says that she was unable to bring as many of her students as expected up to a level above "proficient" on a standardized test we know nothing about. I don't think that we know anything at all about whether she's a good teacher unless the criteria are 1. early riser, 2. went to good schools and 3. not universally hated.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:00 PM

115

I guess all that stuff about students being retained or new to the city must be hidden in X_i.

Posted by: Eggplant | Link to this comment | 03-10-11 1:05 PM

116

Certainly Andrew Gelman didn't seem bowled over by the mysterious incomprehensibility of it all.

Although, the article says, essentially, that you'd think these numbers were average scores, but you'd be wrong, they're not. And then Gelman works them out on the basis of the assumption that they're average scores. He may be right, and the article may be wrong, but I'm not seeing what he did other than guessing the article was wrong.

Posted by: LizardBreath | Link to this comment | 03-10-11 1:05 PM

117

116: well, I think his argument is that average scores can explain a lot of it, which would indicate that the coefficients for the other terms were quite small. It doesn't mean they're not there, or that they wouldn't be large for certain students, just that they aren't that important to the score in her particular case.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:07 PM

118

114: Well, you know her principal doesn't just like her personally, but specifically says she's an above average teacher. You know her students largely get into very good high schools (presumably they're that kind of kid anyway, but she doesn't seem to have damaged them all that much). You know that whatever type of teacher she is, she's not slacking, which cuts out a large chunk of underperformers.

None of that proves she's not a terrible teacher, but I think it's enough to make you look twice at a metric that says she is.

Posted by: LizardBreath | Link to this comment | 03-10-11 1:09 PM

119

117: But he doesn't have the raw data -- all of his numbers are guesses. He could be way off on the raw data and the coefficients for the other terms could be significant.

Posted by: LizardBreath | Link to this comment | 03-10-11 1:11 PM

120

Well, you know her principal doesn't just like her personally, but specifically says she's an above average teacher.

Yes, and gives specifics. It seems clear that the principal disagrees with the conclusion of the formula. So that's one person, who presumably has a lot of access to the woman's teaching, but with unknown biases.

You know her students largely get into very good high schools (presumably they're that kind of kid anyway, but she doesn't seem to have damaged them all that much)

Managing to not fuck up a bunch of students who have already gained admission to a selective school does not strike me as particularly indicative of anything.

You know that whatever type of teacher she is, she's not slacking, which cuts out a large chunk of underperformers.

Possibly the case, but I've known lots of people who worked and worked and worked and still were not particularly good at what they did.

None of that proves she's not a terrible teacher, but I think it's enough to make you look twice at a metric that says she is.

Which is why something like Gelman's analysis is useful. If that analysis is correct in finding that improvement in average test score is a big chunk of where the score comes from, then it seems likely that the metric is biased against teachers who do a reasonably good job with excellent students (and in favor of teachers who do a reasonably good job with poor-to-average students). Which puts us pretty much back at square one, where (given the relative ease of recruiting teachers for more elite schools) I still don't understand why that would be a bad thing for the NYC school system.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:16 PM

121

Sifu, I think I tend to dispute "even if that model has very big error bars, it can still be meaningful." Maybe we're disagreeing about what "meaningful" means.

Posted by: essear | Link to this comment | 03-10-11 1:16 PM

122

121: even if a model has very big error bars, it can provide information over and above the information you would have without a model.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:17 PM

123

None of that proves she's not a terrible teacher, but I think it's enough to make you look twice at a metric that says she is.

On the other hand, I assume the journalist hunted down the teacher that best exemplified the widest gulf possible between appearance of being a quality teacher, and percentile performance.

So she is probably better than 7th percentile, but she's also possibly not as great as her colleagues think.

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:18 PM

124

even if that model has very big error bars, it can still be meaningful.

I'm sure that's true, but so often when these kinds of questions come up there's a failure to account for the human, and particularly the bureaucratic, tendency to want to reduce everything to a simple metric for the purposes of convenience. But she got a 3.69! It's right there in the numbers! No more thought needed! It's so incredibly tempting for a large organization to put way more weight behind a quantifiable metric used for ranking purposes than it probably should -- as a practical matter, few human organizations are really able to say "this provides one piece of additional information beyond what we'd know otherwise, but needs to be heavily leavened with qualitative analysis and tells something but not all that much." IME in the real world decision makers, who don't understand statistics at all, just look at the ranking list of numbers, say, huh, seventh percentile is bad, and I'm going home for the day, and don't pursue things any further. This is particularly scary in the employment context.

I should probably shut up about this topic as it involves math.

Posted by: Robert Halford | Link to this comment | 03-10-11 1:23 PM

125

Maybe I'm misunderstanding Gelman's analysis. I don't see him saying anything about how the model predicted that her kids should have achieved (is it a simple average of all their test scores?) a 3.69. All he did was say that if we are talking a simple average, then the 3.63 they got probably broke down like so, and if four kids had moved up from a three to a four, then she would have met her goal. That doesn't seem to tell us anything at all about whether the goal was set sensibly.

Posted by: LizardBreath | Link to this comment | 03-10-11 1:25 PM

126

124: well, sure. Previously, the quantifiable metric was "who has been here longer?" This is presumably a more sophisticated metric than that, which potentially raises problems (due to the involvement of human decision makers) but also potentially provides more information.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:26 PM

127

It's kind of like hunting down the most egregious example of someone being sentenced to the death penalty when he or she didn't deserve it: just because you can find a case or two of injustice doesn't mean the whole system is messed up.

I ban myself.

Posted by: parsimon | Link to this comment | 03-10-11 1:27 PM

128

It's kind of like hunting down the most egregious example of someone being sentenced to the death penalty when he or she didn't deserve it

Good call on the banning.

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:28 PM

129

I guess I should have read Gelman's link before doing my 99. Agree with Tweety that it can add some value even with the big error bars, but it appears to be way overemphasized (and as others have noted this case was surely selected since it was rather extreme).

Posted by: JP Stormcrow | Link to this comment | 03-10-11 1:28 PM

130

125: I see. It doesn't tell us anything about that, no. It just tells us that the metric is not likely based on a bunch of baroque variables that are out of her control.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:29 PM

131

124 I should probably shut up about this topic as it involves math.

No, you shouldn't. You're making exactly the right point.

If, before the existence of this model, no one would have looked at the numbers and said "huh, most of the teachers with similar students would have had 47 getting a 4, and she had 43 getting a 4 [or whatever] -- she must be a really bad teacher!", then just because your goofy statistical model tells you that you should think this outcome is really bad, that doesn't mean you should believe it.

(And, I would argue, the model in this case didn't actually tell you that -- the huge confidence interval means the model basically threw up its hands and said "hell if I know, I don't feel great about this teacher, but I don't know if she's really bad or not".)

Posted by: essear | Link to this comment | 03-10-11 1:30 PM

132

Previously, the quantifiable metric was "who has been here longer?"

Well, that metric is 100% accurate in describing who has been there the longest, and doesn't purport to give (extremely unreliable) information about one's quality as a teacher. The potential for abuse is quite different.

Posted by: Robert Halford | Link to this comment | 03-10-11 1:30 PM

133

130: The "expectations" still might be*--but in the end a teacher's score comes down to moving 4 versus 8 versus 12 kids out of 60 or so up a notch on the test.

*Would love to see a distribution of the expected change of score across a large population of the teachers--hers was .12 (~12% of her kids), was that typical?

Posted by: JP Stormcrow | Link to this comment | 03-10-11 1:32 PM

134

132: that certainly seems fair. On the other hand, there's a case to be made that laying off teachers without giving any consideration to whether or not those teachers are good at their jobs, while also strictly fair, may not be an optimal solution.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:33 PM

135

Well, that metric is 100% accurate in describing who has been there the longest

So the aliens would have you believe.

Posted by: Annelid Gustator | Link to this comment | 03-10-11 1:34 PM

136

It just tells us that the metric is not likely based on a bunch of baroque variables that are out of her control.

But it looks like the metric is based (in part) on variables that are out of her control (specifically, the performance of other students at her school and the performance of other schools in her district). They try to model those effects, but those massive error bars suggest they can't.

Posted by: Eggplant | Link to this comment | 03-10-11 1:35 PM

137

124: This is presumably a more sophisticated metric than that, which potentially raises problems (due to the involvement of human decision makers) but also potentially provides more information.

This comes back to what I was saying before about making teachers at-will employees. No one's suggesting laying off on the basis of this 'more sophisticated' metric. They're suggesting using this metric to keep teachers in the category that may be fired whimsically rather than for cause.

Posted by: LizardBreath | Link to this comment | 03-10-11 1:35 PM

138

131: what the model did tell you is that it is extremely confident that she's not one of the best teachers in [ whatever set they're calculating across ], according to the standards they've set. I still don't understand how that isn't useful information.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:35 PM

139

The problem seems to be that the difference between the 7th percentile and the 75th percentile is minuscule. This is probably because a) the test has only four possible scores, and b) there really isn't that much variation in effectiveness among the teachers.

This means that percentile is probably the wrong way to draw the cut-off for terrible teachers. There may still be teachers whose students underperform year after year, in a way that shows up on standardized tests and can be used as grounds for giving the teacher extra supervision and training.

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:36 PM

140

130: If the goal is set on the basis of a bunch of baroque variables out of her control, which Gelman says nothing about, then whether she meets it is likewise based on those variables.

Posted by: LizardBreath | Link to this comment | 03-10-11 1:36 PM

141

136: well, sort of, but what Gelman's quick calculations indicate is that the coefficients of all those other elements are likely to be, on average, very small. And I don't think the size of the error bars comes from problems modeling those (again, likely very small) coefficients. They probably come from variability in doing predictive modeling of small, highly variable populations based on not very much data.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:38 PM

142

Aaand I really should go work.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 1:39 PM

143

132: Well, that metric is 100% accurate in describing who has been there the longest

I'm not so sure we can even say that. I remember several of my teachers complaining about some weird loophole in the MPS teachers' contract which meant that their pre-1970 years teaching in Wisconsin were not applicable to their seniority, while younger teachers who had post-1970 Wisconsin time did get to count that time toward retirement.

What confuses me about all of the pressure for standardized evaluation of teachers, and a rigorous adherence to such evaluations when it comes to apportioning benefits or compensation or whatever, is that we really don't see this demand for most other classes of employees. Not for CEOs, not for Army officers, not for garbage collectors, not for doctors -- it seems like teachers are always imagined to be so slippery when it comes to evaluation methodologies that only the most draconian and instrumental ones have even a chance of succeeding. And yet no one is ever called out in the national political scene for making this assertion.

I've had bad teachers and very, very good teachers. Apart from the ones who couldn't keep their hands off the merchandise, it seemed to me like most of what places a teacher along that continuum is based on their affective teaching style, and their overall pedagogical method. You can mandate certain methods of course, and check to see whether they're being followed fairly easily. But to gauge that affective quality, it seems like you just need to have someone periodically checking in with teachers and students. Of course, that layer of administration is always the most vulnerable to budget cuts (after the really invisible stuff like curriculum support), so by the time we get to this point, it's usually way too late.

Posted by: Natilo Paennim | Link to this comment | 03-10-11 1:42 PM

144

The evaluation used in some of the districts around here (including the one my wife teaches in) is JPAS. The focus is on classroom management, instruction techniques, and interaction with the students. If you're really bored there's a link to a 155 page PDF on the system on that page. From what I've seen it seems pretty reasonable.

Posted by: gswift | Link to this comment | 03-10-11 1:43 PM

145

140 gets it right.

Posted by: parsimon | Link to this comment | 03-10-11 1:43 PM

146

Am I understanding correctly that the student results are being scored as 4, 3, 2, or 1? Instead of using the raw scores from the test which would be much more precise?

That seems so obviously stupid to me, I think I must be misunderstanding.

Posted by: peep | Link to this comment | 03-10-11 1:44 PM

147

is that we really don't see this demand for most other classes of employees.

I don't think this is true. The assessment bullshit all originated in the private sector. The public isn't demanding it, but the people with vested interests in the company do.

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:45 PM

148

146: If it's anything like the state tests I had to take in school, the trouble is that a significant part of the score comes from essay questions, which are graded as 1, 2, 3, or 4. So there really isn't a useful fine-grained output. (Aside from the multiple-choice section, but I think that was less important for the final grade.)

Posted by: essear | Link to this comment | 03-10-11 1:48 PM

149

I really don't worry about bad teachers. I do worry about rogue school districts, however. You can have whole school districts where, from top down, everyone is invested in teaching Evangelical Christianity. If everything is kept local and observational, then that is going to be unchallenged forever.

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:48 PM

150

Of course, you can have rogue states which go ahead and just mandate that fundamentalism be taught as truth.

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:50 PM

151

147: But the places it originated in the private sector are things like telemarketing operations. Where it would still be somewhat suspect. For teaching, you're talking about a pretty large array of variables, that, in my opinion, suffer from being reduced to test scores, or even worse, abstracts of test scores.

Posted by: Natilo Paennim | Link to this comment | 03-10-11 1:52 PM

152

141.1: I don't think that's true. It's entirely possible for the sum of multiple variables to depend heavily on each.

Posted by: Eggplant | Link to this comment | 03-10-11 1:52 PM

153

148: I had forgotten it could be an essay test. Of course, then the whole thing is even more subjective.

Posted by: peep | Link to this comment | 03-10-11 1:53 PM

154

IOW, all Gelman shows is that her score can change based on a small number of her students. It's entirely possible that her score can significantly change based on a fluctuation in the performance of students in her school.

Posted by: Eggplant | Link to this comment | 03-10-11 1:56 PM

155

I'm a total inconsistent trainwreck when it comes to grading essays (under much, much lower stakes conditions.)

Posted by: heebie-geebie | Link to this comment | 03-10-11 1:56 PM

156

155: We require consistency in our trainwrecks!

Posted by: peep | Link to this comment | 03-10-11 2:00 PM

157

WE WILL MAKE YOUR TRAINS WRECK ON TIME!

Posted by: OPINIONATED FASCISTS | Link to this comment | 03-10-11 2:02 PM

158

Here is a FAQ for teachers on the reports (not sure it is the most recent, though). I had assumed it was 1, 2, 3, 4 only; however:

The predicted score, actual score, and value-added data are in proficiency ratings, on a continuum from 1.00 to 4.50. A proficiency rating of 1.00 corresponds to the lowest score a student in Performance Level 1 can attain on the state ELA and Mathematics tests. A proficiency rating of 1.99 corresponds to the highest score a student can attain and still be at Performance Level 1. A proficiency rating of 4.50 corresponds to the highest score that can be attained on the test.

The following factors were used to calculate each student's predicted score.
Student Characteristics
Prior year ELA score
Prior year Math score
Free or reduced price lunch
Special Education (differentiated by recommended services)
English Language Learner status
Number of suspensions (in prior year)
Number of absences (in prior year)
Student retained in grade (before prior year)
Attended summer school
New to school
Ethnicity
Gender

Classroom Characteristics
Average prior year ELA
Average prior year Math score
Percent free/reduced lunch
Percent special education (differentiated by recommended services)
Percent English Language Learner (ELL) status
Average number of suspensions (in prior-year)
Average number of absences
Percent of students retained in grade (before prior year)
Percent attended summer school
Percent new to school
Percent by ethnicity
Percent by gender
Class size

Posted by: JP Stormcrow | Link to this comment | 03-10-11 2:02 PM

159

158: So you'd be better off with a classroom of low-performing students who show a marginal improvement than with high-performers who stay at about the same high-performing level?

Posted by: Natilo Paennim | Link to this comment | 03-10-11 2:04 PM

160

Although it looks like 159 is a bizarre side-effect, 158 is a pretty well thought-out list of things where you can easily measure the statistical consequence on test scores and compensate for it in a formula.

Posted by: heebie-geebie | Link to this comment | 03-10-11 2:08 PM

161

Re: The FAQ: it describes 2008-09 but not last year's so there may be some differences.

Here's one freaking interesting tidbit:

Q:New York State tests are given in January (ELA) and March (Math). How is this handled?

A:In the Teacher Data Reports, student achievement on New York State tests is attributed to the teacher in the school year of the test.

She was ELA, so January.

Posted by: JP Stormcrow | Link to this comment | 03-10-11 2:08 PM

162

The problem isn't with the complexity of the model or the applicability/weighting of the covariates. It's that the raw data is all standardized test scores, which aren't even particularly good measures of student achievement, let alone educator effectiveness.

Posted by: apostropher | Link to this comment | 03-10-11 2:09 PM

163

Aren't January and March in the same school year?

Posted by: heebie-geebie | Link to this comment | 03-10-11 2:10 PM

164

161: So, vacation starts in February, since anything the students learn after that will be credited to next year's teacher.

Posted by: peep | Link to this comment | 03-10-11 2:11 PM

165

164: Aha! So it's in the teacher's interest to fill their students' heads with nonsense after February.

Posted by: Eggplant | Link to this comment | 03-10-11 2:12 PM

166

158 is more comprehensive than I thought it would be.

I bow to your google fu. I spent about 15 minutes looking for something like that.

Posted by: lemmy caution | Link to this comment | 03-10-11 2:12 PM

167

162: While I'm not crazy (to put it mildly) about the focus on standardized tests, they are about the only quantifiable way we have to measure student performance. I don't think relying on test scores is a problem in itself, I just wish it didn't end up distorting the classroom environment so much.

Posted by: LizardBreath | Link to this comment | 03-10-11 2:14 PM

168

Also:
Throughout the report, teachers are compared only to 'Peer Teachers.' Peer teachers are defined as:

* Teachers who teach the same grade and subject area and
* Teachers who have similar overall levels of experience. The teacher experience categories are:
o 1 year
o 2 years
o 3 years
o More than 3 years

Posted by: JP Stormcrow | Link to this comment | 03-10-11 2:15 PM

169

It's that the raw data is all standardized test scores, which aren't even particularly good measures of student achievement, let alone educator effectiveness.

No, I agree. I'd be appalled if it led to someone's termination in the face of all contrary evidence that they were a great teacher. Nevertheless, observations, camaraderie, and anecdotes suggesting that someone is a great teacher should also be taken with a grain of salt.

I think there is a role for state (or better, national) government to play in supervising crazy school districts, and I don't know how you do it that circumvents good old boy networks, except by accessing the students via testing.

Posted by: heebie-geebie | Link to this comment | 03-10-11 2:16 PM

170

167: Have you read the NY Post expose of New York state testing? (via Daily Howler) http://www.nypost.com/p/news/opinion/opedcolumnists/new_york_school_testing_con_rTZb2QqrcKue5gjQ2UtktM

Do you buy it?

Posted by: peep | Link to this comment | 03-10-11 2:17 PM

171

Further to 168, While predicted score, actual score, and value-added result are reported in proficiency ratings and are not adjusted for experience, the percentiles compare a teacher's results to the results of other teachers in the same experience category.

So I suspect that some of the experience categories might not have a large population and this might add to the error bars on percentiles in addition to how few or many students you have (which is all I thought of).

Posted by: JP Stormcrow | Link to this comment | 03-10-11 2:21 PM

172

only quantifiable way we have to measure student performance

True, and I'm not advocating doing away with them. I'm saying that we all know that those scores are imperfect measures of the students, so they have an even more abstract correlation for the teachers of the students. And as Tweety noted, the data here are coming from "small, highly variable populations". Basing employment categories on something like that is just nuts.

Posted by: apostropher | Link to this comment | 03-10-11 2:23 PM

173

170: Oh, yeah, that absolutely. I don't think there's anything necessarily wrong with standardized tests in principle. In practice, they're usually all fucked up.

Posted by: LizardBreath | Link to this comment | 03-10-11 2:26 PM

174

And specifically, IIRC NY has had one of those periods where student performance on state tests has been getting better and better, while their performance on national tests has stayed right in the same place. I'd have to poke around to check my memory, but I'm pretty sure the state's been gaming its own tests to look good.

Posted by: LizardBreath | Link to this comment | 03-10-11 2:29 PM

175

173: Thanks! I sometimes wonder if I'm wrong to rely on Bob Somerby.

My general sense is that parsimon's 101 captures the dilemma of the teachers and administrators.
They are being asked to do something impossible, so they respond in the only reasonable way, by figuring out different ways of cheating.

Posted by: peep | Link to this comment | 03-10-11 2:31 PM

176

Here is a link to the NYC (new this year) "Tenure Decision Making Framework". Here is some of the introductory verbiage:

For too long, we have granted the same tenure distinction to our most effective teachers as we have to our least effective. Along the way, we have forgotten that tenure is actually a high honor: a commitment for life, awarded to those who have demonstrated they can perform at a high level for the duration of a career. Our current approach demeans the teaching profession and does nothing to help our kids. From now on, only teachers who demonstrate significant professional skill and meaningful, positive impact on student learning will receive lifetime employment.

Posted by: JP Stormcrow | Link to this comment | 03-10-11 2:33 PM

177

Along the way, we have forgotten that tenure is actually a high honor: a commitment for life, awarded to those who have demonstrated they can perform at a high level for the duration of a career.

Unmitigated crap. All tenure is, for a school teacher, is the right not to be fired except for cause.

Posted by: LizardBreath | Link to this comment | 03-10-11 2:39 PM

178

174: Here are the details of the tests. Also the press release from last year's results release--it appears that they raised the bar on proficiency levels.

Posted by: JP Stormcrow | Link to this comment | 03-10-11 2:42 PM

179

177: Yes, I was thinking of what it must feel like as a teacher logging into that website because you need to find out some basic information on this important aspect of your freaking job, and you get confronted with a load of crap like that.

Here are some concerns about the teacher evaluation reports from someone at The Earth School (not sure if they are part of the system or not?). Covers some of the topics mentioned here with a bit more data (but still mostly anecdotal).

Posted by: JP Stormcrow | Link to this comment | 03-10-11 4:35 PM

180

"The Earth School"? You know, it might be a perfectly good school, but I can't help thinking that anyone who was involved in naming that place must be incredibly pompous and annoying.

Posted by: essear | Link to this comment | 03-10-11 4:46 PM

181

179:

The DOE's technical advisers for the TDRs warned they should not be used to judge teacher performance. Not only did these advisers refuse to endorse "any particular use [of the model] for accountability, promotion or tenure" of teachers, they warned that "Test scores capture only one dimension of teacher effectiveness, and ... are not intended as a summary measure of teacher performance".

I'm curious to know if that refusal to endorse is because the score is one-dimensional or if they have other reasons.

Posted by: Eggplant | Link to this comment | 03-10-11 4:50 PM

182

181: Not sure. Here is a Times article from 2008 when it was being introduced more broadly. Something has changed since:

"They won't be used in tenure determinations or the annual rating process," the memo said. "Many of you have told us how useful it would be to better understand how your efforts are influencing student progress."

and

The State Legislature this spring prohibited the use of student test scores in teacher tenure decisions.

Posted by: JP Stormcrow | Link to this comment | 03-10-11 4:58 PM

183

180: Why do you hate hippies, essear? Is it just a reflexive thing? I doubt that it's better that the school be named after a wealthy benefactor.

Gosh, I seem to be grumpy.

Posted by: parsimon | Link to this comment | 03-10-11 6:56 PM

184

I have some general discussion of value added models on my blog. Short version, this is a hard problem because teachers are not the biggest influence on student achievement so to rank teachers properly you have to try to adjust for all the other things that matter more (or about as much).

Posted by: James B. Shearer | Link to this comment | 03-10-11 7:40 PM

185

The blockquoted bit is appalling. If your statistical measure is that unreliable, you shouldn't be making important decisions like tenure based on it. Either the reporter is getting something very wrong, or this is just inexcusable.

The quoted range sounds plausible to me (and is probably neglecting model error).

Posted by: James B. Shearer | Link to this comment | 03-10-11 7:42 PM

186

The Earth is the wealthiest benefactor of *all* Parsimon. So there.

Posted by: Turgid Jacobian | Link to this comment | 03-10-11 7:43 PM

187

186: Then it's totally fine to have The Earth School. Schools named after wealthy benefactors are clearly pompous and annoying, as essear said, so I guess that's his objection.

Posted by: parsimon | Link to this comment | 03-10-11 7:53 PM

188

Can I at least complain about the "The" without being accused of hippie-hating?

Posted by: essear | Link to this comment | 03-10-11 8:09 PM

189

You are comparing the students' actual results with the predicted results. But the predicted results have an error term (the last term in the formula). This allows for the fact that the predictions are not perfect. Averaging over a bunch of students will reduce the random component of this error but not eliminate it (particularly since 60 is not an especially large sample). Suppose for example all the students have a 50-50 chance of being a 3 or a 4. Then the average would be 3.5, but with 60 students 3.4 (or lower) or 3.6 (or higher) would not be all that unexpected. This would correspond to 24 (or fewer) or 36 (or more) heads in 60 coin tosses.
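
For anyone who wants to check the coin-toss arithmetic, here is a minimal sketch. It assumes exactly the toy setup above (60 students, each independently a 3 or a 4 with probability 1/2); the function name `two_sided_tail` is made up for illustration and has nothing to do with the actual Teacher Data Report model.

```python
from math import comb

def two_sided_tail(n, low, high):
    """P(X <= low or X >= high) for X ~ Binomial(n, 1/2)."""
    total = 2 ** n  # number of equally likely head/tail sequences
    lower = sum(comb(n, k) for k in range(0, low + 1))
    upper = sum(comb(n, k) for k in range(high, n + 1))
    return (lower + upper) / total

n_students = 60
# 24 fours and 36 threes average 3.4; 36 fours and 24 threes average 3.6.
p_tail = two_sided_tail(n_students, 24, 36)
print(f"P(class average <= 3.4 or >= 3.6) = {p_tail:.3f}")
```

The printed probability is the chance that pure luck alone moves the class average by 0.1 or more in either direction under these assumptions.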

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:09 PM

190

How does it contradict? I'm pointing out that it's unlikely that all teachers are drawing students randomly from the same population. I suppose some of the thirty-something terms could account for all effects I've mentioned (and others I'm not clever enough to come up with) but I'm skeptical.

The models generally assume that student assignment is random with respect to all unmodeled variables. This is more because this is convenient than because it is plausible.

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:13 PM

191

188: Can I at least complain about the "The" without being accused of hippie-hating?

Well, okay. Though you'll have to say whether The New School (for Social Research) is okay.

And while I don't have a huge stake in this, I don't think I'm accusing you of hippie-hating as much as identifying hippie-hating. Or rather, that's why I asked. But really, it's fine. I'm pretty sure hippie-scorning counts as normal now.

Posted by: parsimon | Link to this comment | 03-10-11 8:19 PM

192

Honestly, I think the problem here is equation of 'tenure' with 'lifetime job'. If we've got a tool that allows us to identify a teacher that genuinely sucks, and it's reliable enough to be used to deny tenure, it should be treated reliable enough to fire for cause, which you can do with a tenured teacher.

For better or worse this is entirely contrary to how tenure systems work. It is always easier to keep tenure than get tenure in the first place.

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:23 PM

193

184: the outline of the problem reminded me a bit of VORP in sabermetrics; without understanding that very well either, I wonder how similar they are.

Posted by: Sifu Tweety | Link to this comment | 03-10-11 8:30 PM

194

This, definitely. It's so high that you wonder if the journalist doesn't understand what they're trying to explain.

The error is high because the signal (the effect teachers have on student achievement) is low compared to the noise (random variation and all the factors like student IQ and parental income that are more important (or about as important) as teachers). What is harder to explain is why so much faith is being placed in such a flawed measure.

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:31 PM

195

Well, okay. Though you'll have to say whether The New School (for Social Research) is okay.

Given that most of us no longer think of women's suffrage as "new", I think they maybe should have reconsidered the name at some point.

Posted by: essear | Link to this comment | 03-10-11 8:35 PM

196

193

the outline of the problem reminded me a bit of VORP in sabermetrics; without understanding that very well either, I wonder how similar they are.

Evaluating teachers is like trying to evaluate baseball managers (or hitting coaches). This is much harder than evaluating the players or the students.

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:39 PM

197

105: Now the name just indicates the place, which we know of, which produces interesting work, and which is called The New School, no particular strings attached. Or so I thought.

Posted by: parsimon | Link to this comment | 03-10-11 8:40 PM

198

Maybe they could rename it "The ν School". Or "The Nüskool".

Posted by: essear | Link to this comment | 03-10-11 8:42 PM

199

The place formerly known as The New School. Design a marvelous, or wacky, symbol for it. Though that might be too unconventional, which would be bad.

Posted by: parsimon | Link to this comment | 03-10-11 8:46 PM

200

This is a bit misleading as tenure and salary scales where younger workers are underpaid and older workers are overpaid go together. Switching to a system in which everybody is paid the same and there is no tenure wouldn't necessarily save money.

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:51 PM

201

Just to make another point that I think is kind of trivial, but which might be worth making explicit: even if you had an objectively quantifiable way to separate teachers who deserved tenure from those who didn't, and had a statistical test that could tell you with 95% confidence which side of the line a teacher fell on, you would still, in a large school system, end up denying tenure to people who deserve it. If you start trusting a much broader confidence interval, you're going to make a lot more mistakes like this.

Any system that doesn't give everybody tenure will deny tenure to some people that deserve it. A 5% error rate is probably a lot lower than the current rate. It may be better that 20 guilty men go free than one innocent man be convicted but I don't think it is better to hire 20 bad teachers than not hire one good teacher.
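
A back-of-the-envelope sketch of what a 5% error rate means at scale. It assumes each deserving teacher is independently misclassified with probability 0.05, which is a simplification of "95% confidence", and the count of 1,000 deserving teachers is a made-up number for illustration, not a figure from the thread.

```python
def expected_wrong_denials(n_deserving, error_rate=0.05):
    """Expected number of deserving teachers wrongly denied tenure."""
    return n_deserving * error_rate

def prob_at_least_one(n_deserving, error_rate=0.05):
    """Chance that at least one deserving teacher is wrongly denied."""
    return 1 - (1 - error_rate) ** n_deserving

n_deserving = 1000  # hypothetical size, for illustration only
print(expected_wrong_denials(n_deserving))
print(prob_at_least_one(n_deserving))
```

Whether that number of mistakes is acceptable is the actual disagreement here; the arithmetic only shows that some mistakes are close to certain in a big system.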

Posted by: James B. Shearer | Link to this comment | 03-10-11 8:56 PM

202

100

well, it is certainly the case that it's often impossible to measure complex natural phenomena with terribly high accuracy. If you throw it out you're throwing out any chance of explicitly measuring those things, and relying instead on the hunch that implicit measurements will be better. That seems like a sucker's game to me.

Value added models have more power when looking at questions like whether teachers with master's degrees are more effective than teachers without master's degrees (teachers with additional educational credentials are often paid more solely on that basis), where you can average over thousands of classrooms. However, even in such cases it is hard to find significant effects. So using these models to evaluate individual teachers seems like wishful thinking.

Posted by: James B. Shearer | Link to this comment | 03-10-11 9:16 PM

203

102

I think that's probably absolutely correct about the ranking of factors that affect student performance. But the only one of those factors that the schools can really influence is what happens in the classroom, which explains the focus on it.

But this doesn't explain the totally unrealistic expectations about what schools can accomplish. Another NYT article:

The No Child Left Behind Act, introduced in 2001 by President George W. Bush and passed by Congress with bipartisan support, requires that all schools bring 100 percent of their students to proficiency in math and reading by 2014. Mr. Duncan has called this requirement "utopian."

Posted by: James B. Shearer | Link to this comment | 03-10-11 9:27 PM

204

158

Free or reduced price lunch

This seems to be the proxy for household income. Obviously it is very crude.

Posted by: James B. Shearer | Link to this comment | 03-10-11 9:34 PM

205

||
If AWB is around, I just watched the most recent Simpsons episode. The episode as a whole kind of goes nowhere, but the guest voice of the pharmaceutical factory owner is played by Werner Herzog. The highlight is probably the last line of the episode, the traumatized memories of the adult Augustus Gloop: "My god, the tube. The tube."
|>

Posted by: Jimmy Pongo | Link to this comment | 03-10-11 9:58 PM

206

This is a bit misleading as tenure and salary scales where younger workers are underpaid and older workers are overpaid go together.

Again with that "overpaid" nonsense. Overpaid around here maxes out at a whopping 55K a year.

Posted by: gswift | Link to this comment | 03-10-11 10:22 PM

207

206

Again with that "overpaid" nonsense. Overpaid around here maxes out at a whopping 55K a year.

Add relatively if it makes you feel better.

Posted by: James B. Shearer | Link to this comment | 03-10-11 10:35 PM

208

What does "relatively overpaid" mean?

Posted by: essear | Link to this comment | 03-10-11 10:46 PM

209

I like to put real numbers out there just to make it clear how ridiculous James is. My wife happens to be in the pay lane I mentioned and it's the lane for BA +40 semester hours (she gets to be in the exalted +40 lane by having BA's in both Chem and Geology). Oh and that max takes 15 years to reach. Sure it's going to take another dozen years or so before she maxes out but man it is going to be sweet when she does. Any recommendations on best places to buy an island James? Caribbean? South Pacific?

Posted by: gswift | Link to this comment | 03-10-11 10:58 PM

210

Is "relatively" supposed to mean in comparison to private school teachers, James? Or compared to some other hypothetical job the teachers could have? Or just compared to their younger selves? I'm puzzled by your word choice.

Posted by: essear | Link to this comment | 03-10-11 11:08 PM

211

"Overpaid" is how wingers claim 50K a year and a pension is "unaffordable" after they spent decades cutting taxes and subsequently balancing budgets by shortchanging pension funds.

Posted by: gswift | Link to this comment | 03-10-11 11:23 PM

212

205: Yay! I'll watch it!

Posted by: AWB | Link to this comment | 03-10-11 11:33 PM

213

Poking around with the Google, I found a page full of vitriol directed at teachers in New York City who make $45k a year. I just can't fathom how people can get themselves so worked up about other people making an unremarkable amount of money.

Posted by: essear | Link to this comment | 03-10-11 11:38 PM

214

Jesus, this bitching about teacher salaries pisses me off. Do people have any idea what $45K will get you in NYC? This is the same city whose paper regularly prints explanations of why $200K just isn't enough to live on these days.

Posted by: AWB | Link to this comment | 03-10-11 11:43 PM

215

Apparently it's slightly more than starting NYC police officers make and this is inexcusable. I don't want to link to the page because it's so offensive.

Posted by: essear | Link to this comment | 03-10-11 11:46 PM

216

214: You're forgetting that teachers get off work at 2pm every day and have summers off. Gosh, AWB. The greed.

Posted by: Stanley | Link to this comment | 03-10-11 11:50 PM

217

The most recent Simpsons episode, or one of the more recent ones, anyway, has a character use the phrase "Generation Awesome."

Posted by: fake accent | Link to this comment | 03-11-11 12:11 AM

218

||
8.9 magnitude earthquake off Japan. A friend just emailed me from outside of Tokyo and said it was the worst shaking she's ever felt.
|>

Posted by: Jesus McQueen | Link to this comment | 03-11-11 12:33 AM

219

Fuck that's horrible. That's going to be awful.

Posted by: Keir | Link to this comment | 03-11-11 12:35 AM

220

At a certain level of inaccuracy, adverse selection takes over. Since the error bars are larger than 50 percentile ranks[1], you will have significant numbers in the following four groups:

1) Teachers who are actually in the top half of the distribution and correctly classified as being in the top half of the distribution

2) Teachers who are actually in the bottom half of the distribution and correctly classified as being in the bottom half of the distribution.

3) Teachers who are actually in the top half of the distribution and incorrectly classified as being in the bottom half of the distribution

4) Teachers who are actually in the bottom half of the distribution and incorrectly classified as being in the top half of the distribution.

Now consider the likely persistency in the teaching profession of the four groups, and note the effect that this will have on the average quality of the population over time.
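
A rough simulation of how big the two misclassified groups get when the measurement is noisy. Everything here is an assumption for illustration: true quality is drawn from a standard normal, the measured score is true quality plus normal noise with twice the standard deviation, and teachers are split at the median. None of these numbers come from the actual reports.

```python
import random

def classify(n_teachers=10_000, noise_sd=2.0, seed=0):
    """Count teachers by (true half, measured half) under noisy measurement."""
    rng = random.Random(seed)
    true_q = [rng.gauss(0, 1) for _ in range(n_teachers)]
    measured = [q + rng.gauss(0, noise_sd) for q in true_q]
    # Median cut-offs for the true and the measured distributions.
    true_cut = sorted(true_q)[n_teachers // 2]
    meas_cut = sorted(measured)[n_teachers // 2]
    counts = {("top", "top"): 0, ("bottom", "bottom"): 0,
              ("top", "bottom"): 0, ("bottom", "top"): 0}
    for q, m in zip(true_q, measured):
        key = ("top" if q >= true_cut else "bottom",
               "top" if m >= meas_cut else "bottom")
        counts[key] += 1
    return counts

counts = classify()
misclassified = counts[("top", "bottom")] + counts[("bottom", "top")]
for (true_half, measured_half), n in counts.items():
    print(f"truly {true_half}, measured {measured_half}: {n}")
print(f"misclassified share: {misclassified / sum(counts.values()):.2f}")
```

Raising `noise_sd` pushes the misclassified share toward one half, at which point the classification carries essentially no information about true quality.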

Posted by: dsquared | Link to this comment | 03-11-11 12:40 AM

221

219: I'm watching the coverage on Al Jazeera; there's a huge tsunami and a few fires so far, but no reports yet of widespread structural damage.

Posted by: Jesus McQueen | Link to this comment | 03-11-11 12:52 AM

222

Grim video of the tsunami on the BBC news front page, with moving cars getting caught in it. The fires being carried along by it are quite amazing.

Posted by: asilon | Link to this comment | 03-11-11 1:28 AM

223

it's actually just on fire. it's scary as hell.

(and it's very shallow. and holy shit, fuck that.)

Posted by: Keir | Link to this comment | 03-11-11 1:59 AM

224

The problem here isn't that anyone is objecting to statistical methods, or even that anyone is unaware that this one controls for a bunch of social variables. The problem is that it's not a valid control system, because a) it doesn't appear to make any attempt to measure the system's inherent variance, i.e. how much the target metric varies year on year in the absence of any systematic change in its inputs, and b) the confidence intervals are so wide that the model is adding to the noise. If you make decisions on the basis of this measurement, you are more likely than not to be making them on the basis of random noise, and thus adding to the noise.

If this is typical of teacher assessment, the strawman teacher union would be entirely right to oppose it bitterly, because it amounts to punishing their members at random in the hope it will scare them into working harder. People who want to do Skinner box experiments on public servants will support it because that's roughly what it amounts to.

Posted by: Alex | Link to this comment | 03-11-11 2:38 AM

225

Perhaps it's a form of Existentialist Public Management. Although I am fully aware that my perceptions may be shaped by the mere chances of a godless and uncaring universe, and that bigger social forces dominate us all, I am going to grant tenure to Miss A and sack Miss B purely as an acte gratuit, to demonstrate by its sheer absurdity that nevertheless I retain my free will as an individual.

Posted by: Alex | Link to this comment | 03-11-11 2:41 AM

226

||
Hi Alex, are you in Sheffield? If so, inside or outside?
|>

Posted by: chris y | Link to this comment | 03-11-11 2:53 AM

227

No - should I be?

Posted by: Alex | Link to this comment | 03-11-11 3:03 AM

228

No, you shouldn't.

Posted by: chris y | Link to this comment | 03-11-11 3:05 AM

229

chris, just because he's started saying he's in a godless and uncaring universe doesn't necessarily mean he's in Sheffield. He could be in Leeds.

Posted by: ajay | Link to this comment | 03-11-11 4:39 AM

230

fuck, are the LibDems having a parallel conference in Leeds? They get everywhere.

Posted by: chris y | Link to this comment | 03-11-11 5:03 AM

231

Oh, right. Sorry, I was making an anti-Leeds (and anti-Sheffield) joke. No political content implied.

Posted by: ajay | Link to this comment | 03-11-11 5:17 AM

232

210

Teaching pay scales are generally seniority based. With every year of experience you are paid more. So a 60 year old teacher might be paid twice as much (the disparity will not always be this extreme) as a 30 year old teacher for doing exactly the same job. There is little evidence that experienced teachers (after the first couple of years) are any better. So if you believe teachers overall are being paid about right then the old teachers are overpaid and the young teachers are underpaid (but career earnings will be about right). If you believe teachers in some particular school district are underpaid (or alternatively overpaid) then you can only say old teachers are overpaid and young teachers are underpaid relative to the average pay for teachers in their school district.
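
The arithmetic can be sketched as follows. The figures are purely illustrative assumptions, not a real district's pay scale: a linear scale running from $45,000 at age 30 to double that at age 60, compared with a flat salary that pays the same career total.

```python
def seniority_scale(start_pay, end_pay, years):
    """Return a linear pay scale with one entry per year of service."""
    step = (end_pay - start_pay) / (years - 1)
    return [start_pay + step * year for year in range(years)]

# Illustrative assumption: 31 pay years (ages 30 through 60), pay doubling
# over the career as in the extreme case described above.
scale = seniority_scale(45_000, 90_000, 31)

# Flat salary with the same undiscounted career total.
flat_pay = sum(scale) / len(scale)

# Years in which the seniority scale pays less / more than the flat salary.
underpaid_years = sum(1 for pay in scale if pay < flat_pay)
overpaid_years = sum(1 for pay in scale if pay > flat_pay)

if __name__ == "__main__":
    print("flat equivalent:", flat_pay)
    print("years below flat:", underpaid_years)
    print("years above flat:", overpaid_years)
```

Under these assumptions the flat equivalent is the midpoint of the scale, $67,500, so the first half of the career is paid below it and the second half above it. The sketch ignores discounting and the chance of leaving early.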

I have some more remarks about seniority pay on my blog.

Posted by: James B. Shearer | Link to this comment | 03-11-11 5:43 AM

233

224

According to Yglesias you are pretending you can evaluate teachers in order to justify paying them more.

Posted by: James B. Shearer | Link to this comment | 03-11-11 5:57 AM

234

Isaylegs can go get fucked as far as I'm concerned. What does that blathering pablum pusher have to say about anything?

I do not agree with him, I don't read him, and (a special note for Bob, who once gave the impression he thought I was MY) I am not him.

Posted by: Alex | Link to this comment | 03-11-11 6:06 AM

235

189

Some more about the error term. One source of error is the influence of basically random events like whose parents will get divorced that year. But another source of error comes from things like parental education that have a predictable influence but that you can't include in the model because you don't have the required data. So a class might randomly include a bunch of students who are actually better (or worse) than they appear to the model (which only knows, for example, whether they are receiving a free lunch and not their exact household income). As I noted above, this is often modeled by assuming classroom assignment is random with respect to unmodeled student characteristics. This by itself is enough to produce substantial uncertainty, but the actual situation is often worse in that assignment is not random, so the errors do not tend to cancel out as expected. In other words, a teacher may be more likely (than random assignment would predict) to get an entire class of students who are mostly better or worse than they appear to the model.
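
A toy simulation of the difference, offered as a sketch only. Each student gets one unmodeled characteristic drawn as standard normal (an assumption), and the non-random case is represented by the most extreme stand-in available, sorting students before cutting them into classes. Real assignment would sit somewhere between the two. The function name and parameters are hypothetical.

```python
import random
import statistics

def class_mean_spread(n_classes=200, class_size=25, clustered=False, seed=1):
    """Standard deviation across classes of the mean unmodeled characteristic.

    clustered=False: students are assigned to classes at random.
    clustered=True: students are sorted first, so similar students share a
    class (an extreme stand-in for non-random assignment).
    """
    rng = random.Random(seed)
    students = [rng.gauss(0.0, 1.0) for _ in range(n_classes * class_size)]
    if clustered:
        students.sort()
    classes = [
        students[i * class_size:(i + 1) * class_size]
        for i in range(n_classes)
    ]
    class_means = [statistics.mean(c) for c in classes]
    return statistics.pstdev(class_means)

if __name__ == "__main__":
    print("random assignment:   ", class_mean_spread(clustered=False))
    print("clustered assignment:", class_mean_spread(clustered=True))
```

Under random assignment the class-level error shrinks to roughly 1/sqrt(class_size) of the student-level spread, because individual errors mostly cancel. Under clustering it barely shrinks at all, and that error is attributed to the teacher.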

Posted by: James B. Shearer | Link to this comment | 03-11-11 6:21 AM

236

and (a special note for Bob, who once gave the impression he thought I was MY) I am not him.

Don't worry about that. Bob thinks the same of anyone who isn't crawling through the sewers clenching in his teeth a knife soaked in the blood of some center-left bourgeois: "damned procedural liberal."

Posted by: Annelid Gustator | Link to this comment | 03-11-11 6:36 AM

237

I know this is a misuse of the tool of "rational expectations", which is only properly used to prove a) the rich deserve to rule, and b) governments can do no good, but a teacher who is a rational utility maximizer will actually look ahead, and see that if they keep teaching they will eventually make more money. Good teachers who are "underpaid" in year 3 will be the exact same teachers who are "overpaid" in year 20, and teachers in year 3 know this.

Posted by: Walt Someguy | Link to this comment | 03-11-11 6:51 AM

238

234: If you read the Upanishads, you would see that you were him after all.

Posted by: Walt Someguy | Link to this comment | 03-11-11 6:52 AM

239

Good teachers who are "underpaid" in year 3 will be the exact same teachers who are "overpaid" in year 20, and teachers in year 3 know this.

Right, but it seems that James doesn't believe in inter-temporal utility maximization. Or even in deferred compensation generally. For public employees, at least. Doesn't seem to have a problem with his pension.

Posted by: Annelid Gustator | Link to this comment | 03-11-11 6:56 AM

240

Of course I don't believe in ITUM myself, but I also don't believe folks are nearly the rational actors that JBS and his fellow-travelers seem to.

Posted by: Annelid Gustator | Link to this comment | 03-11-11 7:01 AM

241

237

Sure you can see this as a form of deferred pay but it only works if teachers know they can't be easily dismissed. So this sort of pay scale and tenure rights go together which was my original point.

Posted by: James B. Shearer | Link to this comment | 03-11-11 7:03 AM
