We've been revising our student evaluations at Heebie U and I have unconventional thoughts which I will put in the comments.
My role in this is that for the past few years, I've been on the Very Important Committee that uses student evaluations as a factor in Very Important Decisions.
First, Heebie U is unlike most schools in that our numerical ratings on student evaluations are ridiculously inflated. Yes, we have strong teachers and value teaching, but often a school-wide average will be 4.5/5 on a given item. Instructors get 4.8/5 and 5/5 class averages not that infrequently. 3/5 is probably the lowest class average I've ever seen, on any item.
In light of that, I don't know how well my next comment generalizes. But: the vast majority of our student comments in the open box section are insightful and helpful and honest. Sometimes they're pissed and venting, but there's usually a legitimate point to it.
Overall, I think it's incredibly hard to assess teaching, and this movement to invalidate student evaluations seems like a highly mixed bag, to me.
(Not that this post is about invalidating them. But it's an erstwhile theme: they're sexist and racist and need to be thrown out, rather than fixed.)
What happened to me?
I was elegantly symmetrical in the context of the base-10 numbering system. It's all gone now.
I used to be able to do basic wiki research. I used to have pride.
On a five point scale from strongly agree to strongly disagree, please tell me how you feel about the following statement. "I could have been a contender."
Do you strongly agree with the statement, agree with the statement, neither agree nor disagree, disagree, or strongly disagree with the statement?
I'm so sorry. I drank too much whisky today.
I agree with the statement.
But Heebie, the base is 10! If the mathematicians don't take a stand for sanity who will?
IIRC people tend to cluster at the 3 in the Likert, so there's that.
The problem with six point scales is if you give people more options to be wishy washy, they'll take them. Then you don't have enough variance.
"How good a pilot would you say you are, on a scale of one to ten?"
https://www.youtube.com/watch?v=aOIh0o5M-Cw
I wonder if it's possible to fight rating inflation through exhortation. My last job, the ratings were on a 5 point scale and were super inflated -- I usually got almost straight 5s, with a 4 or two thrown in occasionally so as not to look perfunctory. Most of my peers got mostly 4s and 5s, with any information about how well they were doing in the ratio of 4s to 5s, and the one woman my boss had contempt for and tortured, and would have fired if she could, got 3s, which the boss thought of as a bitter insult but which was "Satisfactory" on the form. (My current boss has flaked out on doing annual reviews for the last two years.)
I think an intro paragraph stating something like "We're an office with high standards and a successful hiring process -- we think of our lawyers as generally excellent, and 'satisfactory' means 'satisfactory by the high standards of the office'," might convince people to use the whole scale.
And I found my maxed-out ratings kind of depressing. I could see room for improvement in my performance all over the place, but if I was already at the top of the scale, I'd be a chump to work any harder, right?
I'm so glad I don't supervise anybody anymore.
I kind of miss it -- I did at my old job, but not any more. I'm very good at being genuinely delighted when anyone successfully does anything useful (does this mean I have low standards? Perhaps, but it works for me), and that seemed to generally turn into people I was supervising doing good work. And I find teaching people how to do stuff both more interesting than getting my own work done, and pleasantly inflating for my ego.
Teaching people stuff annoys me. To be fair to people, I haven't tried it very often so maybe I just need more practice.
To be fair to me, people are objectively annoying.
I suppose it is a widely held belief that men are more likely to be geniuses, but don't people know that geniuses are almost always terrible teachers?
The genius thing is particularly aggravating in math when it comes from, say, a precalculus student who is breathlessly asserting that Male Professor is a fucking genius at graphing sine and cosine, whereas Female Professor is just pretty good.
Basically everybody who goes on about how whatever genetic grouping they belong to is better than others is attempting to compensate for their own personal shittiness.
Even a six point scale seems like too many. See the five-point scale comments above, but really if you have actual commentary I don't see why you need more than a three point scale.
First, Heebie U is unlike most schools in that our numerical ratings on student evaluations are ridiculously inflated.
When I was in college I think it was common for professors to get all 5s. I don't remember how I got that impression, but I was somebody who, by default, would have given a range of scores and then at some point I got in the habit of giving all 5s for any professor I liked and I expect that was because I heard that was normal.
I wonder if it's possible to fight rating inflation through exhortation.
I think you'd need both exhortation and the promise that nothing punitive would happen to people who got 3.5s or 4s. If the results of the evaluation feed into salary/promotions then there's an incentive for an unspoken, "you scratch my back I'll scratch yours" approach.
Jammies' old job had enforced "this many people must fall into each category, 1-5" so that they'd always have a ready supply of 1-2s to fire when they wanted to save money. He spent his last 18 months getting 1s and 2s after being fine for his first 15 years, and then got laid off.
A 6 point scale is perfect: first it's clumped in thirds: Good (5-6), Middle (3-4), Poor (1-2). Then it's just +/- thinking to distinguish between the items in a third. It's quick and easy and feels not-too-reductionist. It's the most common weight that I give math problems when I write an exam. 6 point scales 4-evah.
I like that idea. I also think it works better for something like movies in which there's a range of, "this was perfectly fine, but not my taste." I think that if I was evaluating a service I'd probably end up using it as 1=didn't successfully deliver the service; 2-3 = delivered the service, but the experience was unpleasant in some way; 4-6 varying degrees of good.
15.2: Serving on NIH study sections (groups of volunteer scientists who read and score grant proposals) is annoying for a lot of reasons, but one of them is the contradictory instructions we're given.
They emphasize over and over that we should use the whole range (1-9) when scoring our pile of proposals, but then they define 8-9 as "serious flaws". Funding is so competitive these days that most of the time we don't have get any proposals with serious flaws.
most of the time we don't have get any proposals with serious flaws.
Have or get: pick whichever one you prefer.
Yeah, when I say that using the whole range of a scale would be nice, that doesn't mean using it all on every occasion. That is, you'd expect that most employees, most places, are at least satisfactory most of the time. Mostly I was thinking that the modal rating should be the middle of the scale, and you'd expect more variance above the middle than below, which would be for genuine fuckups.
Yes. When I tutored, the grading guidance on papers was to assume that most people were normal, and thus to start with a grade of 55% and add or subtract as appropriate.
We were asked in a survey about what we thought about ours and in the optional response I linked a bunch of studies about how evaluations were rotten.
55% is failing by most U.S. grading standards.
Or misanthropy. Either confirms my biases.
29 is correct. The number of people who might conceivably submit something with serious flaws, and actually bother to go through the 5,000 mind-numbing steps involved in submitting a proposal, is minuscule. Totally different from journals where hopeless zeroes submit stuff to Nature and The Scientific World Journal alike.
At my workplace, we are told the average performance evaluation is supposed to be a 3 out of 5, and they are supposed to be normally distributed. The first part is easy; the second part is not, because most supervisors only have like 5 direct reports, so giving anyone anything other than a 3 seems like the sort of statement you have to have a mountain of evidence to justify.
37.1: Plus, you need various university officials to complete parts of grants. The university wants paid.
Redirect the indirects.
I try to complete evaluations with some eye on what they're likely to be used for. That means usually treating questions about others' performance as equivalent to a binary choice between "this person should be fired" and "this person should be retained," so I give a lot of high scores for middling performance. I'm sure there are organizations in this world that use feedback on individual employees in a constructive way, but that's not the way to bet.
More on teaching evaluations: https://cae.appstate.edu/inclusive-excellence/bias-course-evaluations?fbclid=IwAR1tsJhqoHiyK0AUjxXarmLshXCFOCQL06mB8BVSzUX27lv-Y_ypBw9iiiA
A colleague tells me that at her psycho former tech firm, in addition to stack ranking they had to be explicit about how they were upholding the official corporate values, which included "bias for action." I told her the only bias for action I would exhibit would be walking off the job the minute I read those words.
"X continues to exhibit a bias toward action in all cases where that does not conflict with Newton's laws of motion."
"X upholds corporate values by billing for 100% of the time he spends pooping without pooping during face to face meetings."
For interviewing at my workplace, we use a four point scale: strong no hire, weak no hire, weak hire, strong hire. There's no room to not make a decision, and no gradations for levels of reservations/wishy-washiness. Often helpful to make a decision, but anxiety producing for indecisive people like me.
46: That makes a lot of sense for situations like hiring where you do actually have to make a decision. Things like course evaluations where you don't really have to do anything with the information are very different.
It's probably good that I don't have any evaluations to do. I've been reading Elements of Surprise and I would probably have trouble not saying something like, "Has not sent a picture of his penis to any coworkers since the in-office training."
||
I should submit this as a guest post but I'm lazy, it's Friday afternoon and so who cares, and it's vaguely relevant to the academic context. A white student is suing a medical school for racial discrimination, which is very dog-bites-man, but her grounds for the suit are as follows:
According to a summary of her claims in the judge's ruling, the admissions officer asked Katchur about her race, and when she answered that she was white, the admissions officer asked "if plaintiff was sure and suggested that plaintiff obtain an expensive genetic test to see if she could qualify as Native American or American Indian to garner better chances of being accepted to Jefferson." The admissions officer also told Katchur, she said, "that she advised a past Caucasian applicant to obtain a genetic test, that the applicant learned that he was partially African American, and that he was accepted into Jefferson on account of his race." Black applicants have a better chance of admission, Katchur said she was told.
If the admissions officer really said all that, this is much closer to an apples-to-apples comparison than you usually get in these cases: the admissions officer did effectively tell her that a version of her white self with exactly the same background, privileges, access to education, and cultural experiences, but with racially distinct "DNA," would do better in the admissions process. What she can't prove is that the one-drop-minority version of herself would have gotten admitted. But the officer admitted to using a 99% bullshit racial litmus test for applicants, so some kind of discrimination seems to have been applied -- it's just not clear whether the applicant personally suffered because of it.
I doubt that this is the best way to change policies, but if admissions offices are really doing this, it's garbage and deserves shaming.
|>
People cannot subjectively distinguish 10 gradations in quality. Even objective measures will likely get overwhelmed by noise at the 1/10 level. I agree that 4 or 5 categories is much better.
Related: we switched to an entirely decimal scale, so now you can earn from 0.0 to 4.0 in 0.1 increments. It's totally impossible to do so rigorously in a quantitative course, but I can't even imagine what the humanities do, except to just give out traditional 4,3,2,1 and ignore the interstitial grades.
Even with the 5 category scale, I maintain that there are two types of people: those who give them out in a frequency resembling a bell curve (very few 1s and 5s) and those who give them out as all or nothing (very few 2s and 4s). Changes in the ratio of these people drastically change your distribution and average.
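To put rough numbers on that, here is a minimal sketch. Both rater profiles are invented for illustration (nothing here is measured from real evaluations), and the names `BELL_CURVE`, `ALL_OR_NOTHING`, `mixed_distribution`, and `mean_score` are hypothetical. It just blends the two kinds of raters in different proportions and reports the average you'd see.

```python
# Hypothetical rater profiles: probability of giving each score from 1 to 5.
# These numbers are made up for illustration, not taken from any data.
BELL_CURVE = {1: 0.05, 2: 0.20, 3: 0.50, 4: 0.20, 5: 0.05}
ALL_OR_NOTHING = {1: 0.15, 2: 0.025, 3: 0.05, 4: 0.025, 5: 0.75}


def mixed_distribution(share_all_or_nothing):
    """Blend the two profiles, weighted by the share of all-or-nothing raters."""
    w = share_all_or_nothing
    return {s: w * ALL_OR_NOTHING[s] + (1 - w) * BELL_CURVE[s] for s in range(1, 6)}


def mean_score(dist):
    """Expected score under a distribution over the scores 1 to 5."""
    return sum(score * p for score, p in dist.items())


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5, 0.75, 1.0):
        dist = mixed_distribution(share)
        print(f"{share:.0%} all-or-nothing raters: mean {mean_score(dist):.2f}")
```

Under these assumed profiles the average moves by more than a full point as the mix shifts from all bell-curve raters to all all-or-nothing raters, even though nobody's opinion of the thing being rated has changed.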
49: Finally, something from Philly as hard to understand as Eagles fans.
We've all thrown a battery at Santa at least once.
It's going to be a good week for discussing racism and medical schools.
When I taught (as a female adjunct), I used to dread the comments on my student evals (because, yes: the gender bias is real, and adjunct faculty are all too often at the mercy of these consumer satisfaction surveys).
My numerical ratings were high (and no doubt ridiculously inflated), and I did get some good comments: some very positive; some critical (mostly about the reading/assignment load), but written in a spirit of honest engagement (and yes, I did lighten the load in light of those comments, because I realized I was being unrealistic in my expectations); some just sort of garden-variety griping about being forced to take a dumb history course when this was supposed to be a STEM school, etc., etc.
But always, I knew, there would be that one student who would write something hostile and mean, and whose comment would indicate that I had utterly failed to communicate the spirit and purpose of this endeavour that is supposed to be called education.
For example: "This professor is hostile to religion, and doesn't understand the Catholic faith."
Yes, in an anonymous survey, a disgruntled student (a B student who didn't understand why he didn't get an A) complained of my not understanding the basic tenets of the Roman Catholic faith. And I'm thinking, 'Kid, I have probably forgotten more of the details of the Baltimore Catechism than you have ever known, or will ever know.' Totally absurd (like I didn't go to RC schools from pre-K to the end of high school? Like I wasn't raised in a cloyingly insular Catholic environment? Like the principal of my high school, Monsignor L*nney, wasn't my dad's own cousin? Like I don't understand Catholicism, fer f*ck's sake?!).
But it's kind of annoying when that kid's complaint is being registered as a mark against your teaching, you know?
"Student evaluations" are "consumer satisfaction surveys" is the main point I'd like to make.
At the other end of the annoyance spectrum, a student totally trolled me, and made me laugh out loud.
Note: I am Canadian, not Irish, and I don't (consciously) present myself as either. Would never describe myself as "Canadian," much less as "Irish," in a classroom setting.
6. Instructor's fairness and consistency in grading.
"She is Irish and Irish people are not fair graders and I know this fact from the history of British Imperialism."
Yeah, I laughed.
Our car sales guy asked us to complete the survey rating him and noted that for every point below 10 that he got he had to make it up with three subsequent 10s to get a good employee evaluation. Maybe he was full of shit and says that to get 10s.
My wife, as an educator, prefers to avoid grade inflation on surveys and one time gave some online service evaluation a 7/10. The company called us to ask why we were so dissatisfied and she had to explain that to her 7 indicated acceptable service but nothing that blew us away.
Our employee evaluations are a 3x3 grid (accomplishments and culture/leadership) but 1 or 3 on either scale means significant outlier that does require justification. 90% of people get 2/2.
Really if they remade the Bo Derek movie and called it 6 I don't think it would have the same meaning.
This is so true, and academe is so terrible at this stuff. The box-ticking forms we get for grad students are the worst: 'is this student in the top 2% of students you have ever taught? Top 5%?' In about eight different categories, and anything less than a half/half mix of 2's and 5's is the kiss of death. (I wouldn't be surprised if nobody has *ever* worked through the math to answer these accurately -- if they did, I'm pretty sure their students didn't get into grad school.) Imho the human brain automatically translates all scales into: (1) Wonderful, to be endorsed and rewarded in every way; (2) pretty darn good; (3) fine; (4) not quite fine, tbh; (5) disaster area, avoid. Anything more nuanced or complicated is basically a demand that you confabulate, and most people are going to err in the direction of charity.
I wish Kahneman and Tversky had worked on this, they prolly could have sorted it out in a few weeks and spared us all a lot of aggro.
61: Right. I always feel a bit guilty filling out those recommendation forms, but I know that if I don't rate a student as the GREATEST EVAR!! I'm basically killing their chances of getting into a graduate program.
I think this has been brought up here before, but when I was going up for tenure, I was explicitly told not to list Europeans as potential letter writers even if they were the big shots in my field. The reason being that in Europe people don't write absurdly hyperbolic recommendation letters by default.
What about Spaniards? They seem pretty hyperbolic?
I think it's just ethnic prejudice.
Speaking of racism, it's weird that Ralph Northam (see 55) hasn't specified which person he is in the racist photo. It's like the Schrödinger's cat of racist symbols: right now, he's neither the blackface man nor the KKK guy, but he's also both of them. Or something.
We were just speaking of ethnic prejudice.
What's funny, in a morbid way, is that his opponent was too focused on attracting racist votes to investigate if Northam was vulnerable to charges of racism.
"That's not a picture of me in blackface. I was merely wearing a klan robe!"
Josh Marshall made the joke I was trying to make, but better.
Sources now reporting that Northam doesn't think he is in racist picture. https://talkingpointsmemo.com/news/northam-says-wasnt-him-racist-photo
What is his explanation going to be? "I was blackout drunk pretty much the whole time I was in medical school. I don't remember a thing from back then. Actually that's why I decided I better get into politics."
"I was blackout drunk yesterday when I said I was in the photo and apologized for it." Maybe he'll say that.
"I choose now to live as a drunk man."
Today's lesson: Science sometimes involves as much sitting on uncomfortable seats listening to dull talk as religion.
That could have been me during one of my drunken stupors.
You get drunk and go to student science fairs. That's sick.
Drunk people coming in to see baking soda volcano explode too soon disrupt things.
If this science doesn't end soon, I'm going to turn against evolution.
What's funny, in a morbid way, is that his opponent was too focused on attracting racist votes to investigate if Northam was vulnerable to charges of racism.
It's Lincoln-Douglas all over again.
56: apparently I have nothing to say about the OP topic, but in researching family history I read an interesting-to-me, 19-year-old book about a Carmelite order in L.A. that has become a bastion of conservatism, focusing on its Mexican cultural roots, and it tempted me to dive into the vast literature on Catholicism after Vatican II. I really don't have time to do that. Book recommendations are welcome. (I suspect NW has the largest library on the subject among people here.)
84: Whatever you learn, do tell!
I'm still not going to watch the whole second episode.
Northam's quote in his medical school yearbook is on target. "There are more old drunks than old doctors in the world so I think I'll have another beer."
I'm not sure that holds up with a proper intent to treat design.
Interesting. Northam is straight up denying that he's in the photo.
I would think that that's the sort of thing that could be determined. You know, with science and stuff.
"In order to test and improve facial recognition technology and adapt it to use in challenging circumstances, we will enroll 200 participants with light skin and take their photographs from three angles (front, left, right). The participants' faces will then be covered with burned cork, because tradition, and machine learning algorithms will be employed to.... You know what? Forget we said anything."
I'm surprised that Northam doesn't see the writing on the wall. There's absolutely nothing that makes him essential, nothing that he brings to the table that we can't get from someone else. He's utterly replaceable (and there's a really good replacement ready and waiting), and there's no identifiable upside to keeping Northam in office. And yet, he's thinking he can just moonwalk his way through the problem.
It's a real Hail Mary move. If it doesn't work, he's exposed as a bald faced liar on top of everything else.
Not that people don't expect politicians to lie, but they at least expect them to be reasonably good at it.
Not that people don't expect politicians to lie, but they at least expect them to be reasonably good at it.
They do? How long has Trump been President?
95: I think maybe he makes up in enthusiasm and shamelessness what he lacks in skill.
Interesting critique from the time of Northam's inauguration suggesting he and his first lady in speeches refuse to even use the terms black, African-American, or PoC - apparently out of a misguided notion of "unity"? I verified this at least in his inaugural address (2,395 words) and first address to the state legislature (3,720 words).
Even if it isn't him in the photo, did he pick or approve the picture?
OT: The deer is back and took a dump on my patio.
84 - not really a large collection but I do recommend "The Smoke of Satan", an amusing account of the various headbangers who decided that Vatican II was invalid and therefore had been called by an antipope. One difficulty they then faced was deciding who the pope should be. Most solved it by deciding that there was no pope at all ("Sedevacantism"). This can all, of course, be rigorously proven.
Once in a used bookstore I saw a self-published 70s volume with a title like 'World Peace or Armageddon? It's Up To The Pope!', and have been haunted ever since by my failure to buy it. Have always wondered what else might turn out to be Up To The Pope.
At least one other thing he dropped the ball on.