So I'm thinking I ought to ... measure my success by how much time students spend on homework outside of class.
This sounds monstrous, and 100% backwards.
Also, will you be conducting surveys, or just guessing at the amount of time they've spent based on how prepared they seem?
I completely forgot how to do calculus, even though I use calculus-based math every day. The main transferable skill I got out of it was the ability to not panic when I see a bunch of numbers I don't understand.
I remember why you can calculate the area under a curve. I don't remember how.
Heebie:
Did you read the Jump Math article in the NY Times?
This sounds monstrous
You only say that because heebie's a woman.
"classes which don't have a canonical sequel"
Is this the case for calculus?
This sounds as if you're going to change from a method that successfully teaches calculus to the bulk of your students, to a method that will weed out and lose a whole bunch of them, while possibly teaching the best students more.
If your students are largely planning to be high school math teachers, which is the impression I've got, is that actually a net gain? This sounds like something that might be better for a seminar for those students who are seriously considering math grad school.
3: Fill the area with little squares of equal size and count the squares. Then make the squares smaller and smaller until something happens.
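In code, that recipe looks something like this (a Python sketch, purely illustrative; the curve y = x^2, the interval [0, 1], and the square sizes are all made up):

def area_by_squares(f, a, b, h):
    # Tile the region under f on [a, b] with h-by-h squares and count them.
    cols = round((b - a) / h)
    count = 0
    for i in range(cols):
        x = a + i * h              # left edge of this column
        count += int(f(x) / h)     # whole squares that fit under the curve here
    return count * h * h           # each square has area h^2

def f(x):
    return x * x

for h in (0.1, 0.01, 0.001):
    print(h, area_by_squares(f, 0.0, 1.0, h))
# As h shrinks, the total creeps up toward 1/3, which is the "something" that happens.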
Is it possible for you to compare your students' grades in later math classes - or other classes where you think your brain-training should help - with students who took the same class from someone else?
3. Construct a box that is rectangular, except that the top is curved. Fill the box with water. Pour the water into a measuring cup, write down the measurement, and divide that number by the width of the box.
Construct a box that is rectangular, except that the top is curved.
I told you that plastic has to go in the top shelf of the washer. Now we'll have to get a new one.
Am I churning out students who are better-than-expected at understanding Calculus, but at the expense of developing their transferable skills?
That seems unlikely to be possible for anybody who can teach calculus clearly. I'd just call that a win and keep going.
9: This is a really good question -- are you coming to the conclusion that there's a problem because your students do well in your class and then collapse when they hit another teacher? Or anything like that?
8: Yeah, that's the why that I can remember. The how involves squiggly lines and Greek letters.
10 would not work, as there is no way in hell that I am capable of making a curved box that is water-tight. Certainly not without using a quantity of aquarium sealant that would throw off the calculation completely.
I do remember that the differential of x^2 is 2x but I can't remember why.
((x+ε)^2 - x^2)/ε -> (2εx + ε^2)/ε -> 2x + ε.
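And a quick numerical check of that limit, for anyone who wants to watch the epsilons shrink (Python; the point x = 3 is arbitrary):

def f(x):
    return x * x

x = 3.0
for eps in (0.1, 0.01, 0.001):
    # difference quotient (f(x + eps) - f(x)) / eps should approach 2x = 6
    print(eps, (f(x + eps) - f(x)) / eps)   # prints roughly 6.1, 6.01, 6.001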
All your threes are backwards.
Which proves he was born in Kenya.
I didn't go to a school where math majors were being groomed for high school teacher-dom, but I third 9 -- if your students are coming out with a decent understanding of calculus, I don't think you need to upend your methods unless something is going wrong for them down the line. If they aren't spending enough time on homework in your mind, give them more problem sets.
Okay, hands up: who wants me to STFU?
Ok, let me clarify things.
1. Moore method can be done in lower level classes, but only if you have extreme flexibility there, which we don't, so I'm planning on doing it in upper level classes next semester.
2. "classes which don't have a canonical sequel" Is this the case for calculus?
Yes - Cal 1,2,3 are very well-established sets of material.
3. This sounds as if you're going to change from a method that successfully teaches calculus to the bulk of your students, to a method that will weed out and lose a whole bunch of them, while possibly teaching the best students more.
No...I don't think it's coming across how much care the teacher must take during classtime to monitor where each student is at, and structure things accordingly. This doesn't really work in large classes.
I don't mean to be negative about the Moore method -- I took a seminar in law school that was sort of like that, and it was absolutely fantastic. But it was a small, self-selected group, and even in that group a couple of people fell off track and ended up really not getting anything out of it -- it seems as if it'd be really difficult to get students who weren't totally committed and engaged to learn that way.
That's an interesting educational question overall -- if you ask the very best, most successful students, how they learn best, you still end up not knowing if the best students are the best because they were exposed to the best methods of teaching/learning, so everyone should be doing the same sorts of things, or if those methods are only the best for students who already come in on the top of the heap, and you need different methods for average students.
25 crossed with 24, and isn't responding to it.
The point of a math class is really not to teach someone how to calculate volume by shells. The point is to develop the portion of their brain that can represent systems symbolically, and then work within their symbolic model and know what their work means in the real task.
Understanding complicated systems because they've been clearly explained is not quite the same as developing that portion of your brain. It helps, but indirectly.
re: 25
That was certainly my feeling re: the Oxford tutorial method, which definitely seemed to suit some students better than others. While the tutor can give fairly intensive one-on-one help for students, there isn't [ime] the tightness of organization of the material that a more lectures/group-tutorial based course has, and some people really need/benefit from that.
But it was a small, self-selected group, and even in that group a couple of people fell off track and ended up really not getting anything out of it -- it seems as if it'd be really difficult to get students who weren't totally committed and engaged to learn that way.
I think so. I think it's done poorly very often. I keep coming back to the idea that the skill to develop is having very close tabs on every student in the room, and what they understand.
This is done by asking them to explain something, in complete sentences. And by eavesdropping when they are discussing something in a group.
But it really takes a lot of concentration from the teacher to keep track of it.
27.last: Then I'm the best person ever at giving directions.
Define the area under that curve as the numeraire.
What else gets kids to invest time outside of class?
You start to think in a complex way about material when you start to devote your leisure-brain to the topic, and play with the ideas in your head. The problem is that students approach homework as a contract to be completed, because homework is obnoxious and they wouldn't be doing it otherwise. It's not particularly engaging.
In the best normal math class I ever had, the homework taught the whole lesson all over again in a different way, asked students to prove or develop key concepts (often before we'd been assigned the relevant reading or gone over it in class, so this involved real thinking), and went over the same sort of problem in a few different ways. Class time was devoted to the ordinary business of explaining the math in a straightforward way.
It worked really well.
I think the group problem solving approach can work well with small classes, IF the students are already acculturated to doing a material amount of prep work on their own. Has anyone seen it work well with large (20+) classes?
re: 32
I think you have to just accept that 99% of your students won't give as much of a shit about the subject as you do. It is really nice when you manage to get someone genuinely into a subject/topic, but most students just want to do the minimum to get by.
31: Here's an alternate solution:
http://www.smbc-comics.com/index.php?db=comics&id=1338
35: But that would mean 1 out of 20 students won't get math.
From the Wikipedia page you linked:
The University of Chicago offers the following Moore method classes: honors calculus, analysis, algebra, geometry, and number theory along with one or two Moore method electives each year.
This is totally false, as far as I'm aware. I did take one class with only about six or seven students that seems to have been pretty close to following this method, though, and it worked really well, but I have trouble imagining how it would work in a big class.
35 is really missing the point.
I want to develop a part of their brain. We know that that happens best when you love the material: not because of sheer love, but because people then engage with the material on its own terms. So we should take clues from that pathway - real engagement - when designing a class.
but I have trouble imagining how it would work in a big class.
I do, too. Our classes are small, though.
For a small, upper-level class, it sounds like a great idea.
When I was being negative about it, I was thinking you were going to be teaching calculus to freshmen that way.
ah. So you're not talking about the calculus sequence but a class like real analysis.
What about introducing them to differential equations empirically? Numerical integrators like octave are free. Creating a base notebook for import and suggesting cases to explore would take work, but might pay off.
Or a guided tour of interesting systems via youtube-- 2-D Burgers equation, reaction-diffusion systems, maybe predator-prey dynamical systems for which stability analysis of solutions would be accessible to calculus students. Diffusion-limited aggregation, finite-element stress calculations; basically, applied topics that are interesting, are at the periphery of accessibility, and lend themselves to numerical puttering around.
All this as a way to motivate thinking about systems, maybe suggest topics that HS teachers can themselves later suggest to their students as more interesting than the volumes of solids.
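Something like the following is the kind of base notebook I have in mind, just to be concrete (a sketch in Python rather than Octave, with a crude Euler step instead of a real integrator, and all parameter values invented):

# Lotka-Volterra predator-prey model, stepped forward with plain Euler.
# Students would just twiddle the constants and watch what happens.
alpha, beta, delta, gamma = 1.0, 0.1, 0.075, 1.5   # made-up rates
x, y = 10.0, 5.0                                   # initial prey, predators
dt, steps = 0.01, 5000

for n in range(steps):
    dx = alpha * x - beta * x * y     # prey reproduce and get eaten
    dy = delta * x * y - gamma * y    # predators eat and die off
    x, y = x + dt * dx, y + dt * dy
    if n % 500 == 0:
        print(round(n * dt, 1), round(x, 2), round(y, 2))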
Heebie, I think that what you are doing is a good idea. In general, I think it's best that students be exposed to a wide variety of types of instruction. Every student has their own optimal way of learning that they need to figure out for themselves. Some will thrive in environments that others struggle with, or will fail to thrive under methods in which others do well. But if students only ever get taught one way, there is little opportunity for each individual to find out about and seek out the methods by which they learn best.
So, mixing it up is good, and I applaud you for adding some variety.
re: 39
But you want them to engage that part of their brain by working more outside of class, and I'm saying there's a tension between that and the desire of most students to do as little as possible. I'm sure you are right that changes in class structure or teaching methodology can partly encourage whatever it is you want them to do, but if what you want them to do is a lot more work outside class that's always going to be a very tough sell.
I would like to figure out a way to re-invigorate Calculus, but Moore Method is definitely not it. That's part of the reason I wanted to clarify for myself what makes Moore Method useful, and how I came to this "time engaged outside of class" conclusion. Aside from having heard that studies show that blah blah blah.
Anyway, I'm really not sure how to revitalize Calculus.
I'm saying there's a tension between that and the desire of most students to do as little as possible.
Sure. This tension exists for me, as a parent, as a teacher, as everything that's preventing me from curling up with a book in a patch of sunshine, and a cat in my lap.
What about introducing them to differential equations empirically? Numerical integrators like octave are free. Creating a base notebook for import and suggesting cases to explore would take work, but might pay off.
I don't love diff eq terribly much, but this sort of thing would make a fabulous first year seminar, where you are totally disengaged from a traditional list of topics to cover and can really explore a topic at the students' pace.
It's a metaphor. For my vagina.
I really am curious about 9 -- the post sounded as if you thought your current teaching methods were producing students who understood calculus better than most, but who were less skilled at learning math generally. Do you notice that your calc students seem weaker than students from other classes when they get into the upper level work? Or is this just a worry?
On revitalizing Calc... not that I know anything, but challenge problems? Give them some optional problems that should really stretch them, and tell them that a successful answer, presented for the class, will wipe out a missing homework or something like that?
How this thread reads in my brain:
"Math math math. Math math, math math math!"
"Teachy, teachy. Teachy teachy teachy -- teachy teachy -- teachy."
"Math! Math math math."
"Math math, teachy teachy."
[Pauly Shore]
"Teachy math, teachy math, teachy teachy math math."
re: 47
And further to this, if what you are doing is working well [which it sounds like it is], maybe that's the right way to go?
[I may be talking out of my arse here]
53: Some French dude made millions that way.
On further reflection, it appears that the man could play music, but not talk, out of his ass. I regret the error.
Is it possible for you to compare your students' grades in later math classes - or other classes where you think your brain-training should help - with students who took the same class from someone else?
Math Ed studies are notoriously indeterminate. Did my class weed out fewer students? What determines which students go on to take another math class? Etc.
I believe they've shown that students who take an IBL math class are more likely to take another math class, for what that's worth.
the post sounded as if you thought your current teaching methods were producing students who understood calculus better than most, but who were less skilled at learning math generally.
In reality, I'm vain enough to think my students accomplish both just fine. I'd say: fewer students get weeded out in my classes, and those borderline students leave my class with a decent understanding of the basics, and then crash and burn under a future teacher.
As a field, I think math teachers would be serving society better if we focused on developing students' ability to think/communicate/model things symbolically, and less on getting them to understand fixed traditional topics.
43.1,2 is how I taught myself most of the math I actually use, as opposed to all the rigorous stuff I learned in classes later. Teaching a computer how to do something goes a long way toward understanding what it means, and getting visual representations of what's happening in dynamics instead of just looking at equations also goes a long way.
Teaching a computer how to do something goes a long way toward understanding what it means
I had a stats professor who made us build a program to do regression analysis. It seemed to work.
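The whole exercise really does fit on one screen, which is probably why it sticks; a sketch in Python (the data points are invented):

# Ordinary least-squares fit of y = a + b*x, done by hand -- no stats library.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
sxx = sum((x - xbar) ** 2 for x in xs)

b = sxy / sxx           # slope
a = ybar - b * xbar     # intercept
print("slope:", round(b, 3), "intercept:", round(a, 3))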
I've taught some Moore method classes, and they've gone very well. But I've never done it in a normal college setting, so I don't know how well it transfers over.
One huge potential advantage is that most students (even the really good ones!) are really really really terrible at talking about math. The sentences don't make sense, they don't say what they actually mean, they're not clear. Teaching people how to make precise correct informative statements is much more important than any actual content in math (after arithmetic), and Moore method actually makes some progress in that direction.
It makes me angry that high school math doesn't require students to write in complete sentences. I've been trying to get my calculus students to do that this year, but with a huge class there's really no way to do that. (I.e. I can get more students to write more words, but the words won't actually make sense so what's the point.)
57: Computer science should be part of the standard high school curriculum. It's certainly more important than whatever math they're learning, and probably more important than the science they're learning. If coding were taught early on it would open the door to redoing the intro math curriculum in a way that emphasized concepts over doing computations yourself. Currently it's too hard to really test concepts, but making people code things up goes a long way towards that. You can't code something that you don't understand.
Computer science should be part of the standard high school curriculum. It's certainly more important than whatever math they're learning,
I really agree with this. Also, statistics should be taught as a mainstream high school course.
One huge potential advantage is that most students (even the really good ones!) are really really really terrible at talking about math.
And this!
Also, statistics should be taught as a mainstream high school course.
I took AP stats in high school and retained very little of it, far less than I have of calculus. Partly this may be because the teacher was teaching it for only I think the third time, but it's also, you know, the curriculum at my high school was all pointing towards calculus (and the AP stats material is statistics-without-calculus), so it was rather a different mode of thought.
Which isn't to say that stats shouldn't be in the HS curriculum, of course!
Before you retire the system, could you put the clear, informative, this is how you do calculus classes online? Kthx. /guy who never learned calculus.
For instance, I remember that there is such a thing as a chi-squared test, but only very vaguely what it's for and certainly not how to carry one out.
66: Proc freq data = xxx; tables x*y/chisq; run;
It's all coming back to me now! Thanks, Moby.
I have many strong reactions to things discussed in this thread, but I'm not saying anything because I keep thinking of the times when language teaching has been discussed here and how frustrating it was that everyone seemed to think they knew so much about it.
69: I'm always happy to share acontextual knowledge.
70: As a rule of thumb I'd expect math people to not be frustrated in the same way. Basically because we tend to undervalue experience and domain specific knowledge, while overvaluing ideas and generalization. Which is to say, I think you should say what you're thinking and not be worried that you'd be being frustrating.
61: You can't code something that you don't understand.
Well, you can't correctly code something you don't understand. But I've seen a lot of well-formed code that works perfectly well on 1 or 2 test cases, but is completely wrong.
But yes it is easier for someone who knows what they're doing to spot conceptual mistakes in code than in prose.
I find that light opera is the form of expression which makes conceptual errors easiest to spot.
Re:70
That's the Unfogged dark-side Dunning-Kruger effect. I expect we've all felt it when people are confidently bullshitting within our areas of genuine expertise, and contributed to it at other times, too.
But yes it is easier for someone who knows what they're doing to spot conceptual mistakes in code than in prose.
Yeah, someone who knows what they're doing. Also, it depends a lot on the code in question.
I'm also interested in hearing Blume's thoughts.
I wonder whether, if we all agreed to respond to every query or interaction we experienced in the next seven days by quoting 61, as a sort of viral seeding experiment, we'd be any more likely to bring its suggestion to fruition. It would be worth all the strange looks.
Before you retire the system, could you put the clear, informative, this is how you do calculus classes online?
These do exist, if you're actually curious about sinking a little time into it.
78, further: that is, if it worked, it'd be worth all the strange looks. Otherwise not so much.
(I recently "simplified" a function definition from "foo sofar = foldl (\acc f -> f acc) sofar fs" to "foo sofar = foldl (flip ($)) sofar fs" (and I just realized it could have been reduced further to "foo = flip (foldl (flip ($))) fs") and, yeah, you know, it's clear to me how it works because I went through the simplification process, but it's still more opaque now than it was before.)
82 looks decently rigorous. There are also versions that you can do less intensely, to get the flavor of calculus at the curiosity level.
I can vouch for this guy, for the fast-and-loose-but-interesting version.
Also it's available free all over the place, it looks like.
Can you guys find some for topology? Because I'm completely lost.
My momma said not to trust girls who taught math on the first date.
What kind of topology have you been looking at? The point-set topology is not particularly HEY NEAT WHOA! You'd want a bendy, stretchy introduction to algebraic topology.
Also, statistics should be taught as a mainstream high school course
in principle yes, in practice statistics is usually taught so abominably badly that I am not sure.
88: I'm trying to understand how one gets from point-set topology (or pointless topology) to the bendy, stretchy stuff (specifically manifolds).
Yeah, high school statistics runs into the usual problem of "someone who actually understands statistics has much better job options than teaching high school." It's a hard problem.
89, 92: Do you think it's taught worse than other high school level math?
I have naturally low blood pressure, so feel free to raise it with infuriating yet hilarious examples.
Of course CS also runs into the problem in 92.
I dunno, I feel like the CS course I had in high school was pretty good, actually.
90: Take a rubber band, stretch it out into some other shape. This gives you two different topological spaces (in the sense of point-set topology), they're homeomorphic (which is to say, indistinguishable from the point of view of point-set topology). That's how point-set topology is related to stretchy rubber topology.
Of course in this example you don't really need general point-set topology, you could just do everything with metric spaces. And basically that's true for all of manifold topology. So most of what you're learning in point-set topology isn't really aimed at topology as such, but is instead a common language which is useful throughout mathematics.
Of course, I've never actually taken point-set topology, so take my opinion with a grain of salt.
Do you think it's taught worse than other high school level math?
Yes, very much so. In general, stats courses taught by mathematicians are awful (because mathematicians so very, very often assume that statistics is really just applied probability). In high school they tend to just teach the whole thing as a set of magic formulas with no underlying logic, and then a bit of "here's how to draw a histogram" that is meant to motivate the magic formulas.
97: I'm at a pretty rudimentary level. I would really like to see how one uses the definition of a topological space (which seems, to me, to be setting up little more than a distributive lattice) to say something useful about a metric space.
I also didn't take point-set before stretchy topology. Worked fine.
I would love to take a well taught introductory stats course, but I too have the impression that such a beast barely exists.
Point-set topology isn't there to say new interesting things about metric spaces, it's there to allow you to generalize your intuition from metric spaces to more general settings.
103: Not good enough! My topological intuition sucks hairy balls.
I was able to understand this graph, so I guess I know enough statistics to understand this issue at least!
http://www.good.is/post/the-only-gay-marriage-graph-that-really-matters/
104: Which cannot be combed entirely smooth, no matter how hard you try.
It's not my specialty, but here are a few sources:
-appendix of Reif's textbook Statistical and Thermal Physics
I find Andrew Gelman's blog consistently interesting, and the little that I've read of his writing has been clear as well. He's written an intro book as well as a book about teaching stats.
I would love to take a well taught introductory stats course, but I too have the impression that such a beast barely exists.
I took a very nice upper-level stats class from which, unfortunately, I remember very little.
My take-away is that there's an element of statistics which involves cultivating a decent, but not very sophisticated, statistical intuition and beyond that you get into a bunch of specific techniques which are each useful and not too complex, but are also narrow enough that they are difficult to remember if you don't actually use them.
whoops, link typo
wordy explanation of Bayes' theorem:
http://yudkowsky.net/rational/bayes
re: 99
In high school they tend to just teach the whole thing as a set of magic formulas with no underlying logic, and then a bit of "here's how to draw a histogram" that is meant to motivate the magic formulas.
That was my experience in 1st year psychology, too.
What I found genuinely useful was when I read some history/philosophy of science books on the history of statistics and statistical methods, and then plugged my way through some stats for business and econometrics texts.*
* albeit at a very noddy level. I couldn't actually apply much of it now.
106: a simple piercing and you can comb your balls as smooth as you like. Ahh topology.
My vague understanding in statistics is that there's some underlying philosophical dispute (between some sort of Bayesianism and some other position that I've never really heard explained coherently) that results in everyone just compromising on magic formulas with no underlying logic.
...in everyone just compromising on magic formulas with no underlying logic.
There is a great deal of underlying logic to the formulas. There is now some debate over previously unspoken assumptions. It doesn't really affect life as far as what most people working with statistics do for a living.
113: There's plenty of underlying logic in, say, testing for statistical significance. Using p=.05 as some kind of universal standard, though, is the application of a magical formula. Though I hope that it's mostly non-statisticians mis-applying it.
That's something more like QWERTY than magic. Back when you had to do the math by hand, .05 was useful.
I think the underlying logic, in most cases, is that the magic formulas are easy to calculate with nothing more than pen, paper, and some lookup tables.
You might think I was pwned, but it's only n=1.
In high school they tend to just teach the whole thing as a set of magic formulas with no underlying logic, and then a bit of "here's how to draw a histogram" that is meant to motivate the magic formulas.
Not only in high school. Certainly all of the statistics I ever got out of any physics coursework -- most of which was kind of crammed into lab sessions, because actually talking about data analysis is apparently beneath the people teaching the classes -- was of the cookbook form. How, but not why, to do a chi-squared test, etc. Poisson statistics probably showed up in an actual class at some point. To a large extent the only thing I know to do when estimating errors is taking square roots of things.
It's only fairly recently that I've figured out that statistics is actually really interesting and that I want to know a lot more about it.
Also, I find some of the vague philosophical pronouncements about Bayesianism that I've encountered pretty appealing, but I also know a lot of people who think they can make boring and useless work interesting and important by chanting things like "Markov chain Monte Carlo" and throwing massive amounts of computing power at irrelevant problems, so I'm a little unclear on how much of it is hucksterism.
If you want to learn how to do statistics (not learn statistics), this site is very good. That is, if you know basically what you need to do but are foggy on the details, that site is a huge help.
...so I'm a little unclear on how much of it is hucksterism.
66.236% +/- 12.456
121: This answer is correct, but if you want to get full credit you are going to have to show your work.
"Hucksterism" is too strong of a word.
Do we need to have a chat about significant figures?
...so I'm a little unclear on how much of it is hucksterism.
66.236% +/- 12.456
Back in the early days of the internet I got in some big argument with Sas/ha Vol/okh about whether you could sometimes say that A was ahead of B even if they were "within the margin of error." It seemed to me quite obvious that when calculating your 95% interval there was no reason to count the chances that A was way way ahead of B as evidence against A beating B. But Sasha kept saying weird things that didn't make any sense about how the null hypothesis was that they were tied, and saying that it was only liberals who believed in versions of statistics that make sense.
Where by "early days of the internet" I meant "early days of blogosphere." And sadly neither VC nor Sausagely have archives that go back far enough.
But the point was if you have a margin of error of 3, that means that you have 95% confidence that person A's vote is within 3 of what you say it is. So suppose A is beating B by 5.5 in your poll. You then do some calculation with bell curves to figure out what percent of the time A is beating B, and it may very well come out to be over 95%. (Of course there are some second order effects here since the two vote amounts aren't independent, but that wasn't the point of the argument.)
It seemed to me quite obvious that when calculating your 95% interval there was no reason to count the chances that A was way way ahead of B as evidence against A beating B.
You want to do what they call a one-tailed test. As a general rule, don't do a one-tailed test. As a specific rule, taking into account the nuances of any given situation, don't do a one-tailed test.
That's something more like QWERTY than magic. Back when you had to do the math by hand, .05 was useful.
My impression was that the first appearance of .05 was in an application in which having that be the bound for significance made sense, that is, where the choice of which p-value bound to check was derived from some antecedent understanding of the nature of the problem area. But on recounting that now it seems as if it must be apocryphal, unless that first appearance was just incredibly influential.
I recently read this book that I found on the front table of the Seminary Co-op, which was pretty interesting as history, but frustratingly skimpy on mathematical details.
129: Karl Pearson was a big deal.
128 is doing exactly what we're complaining about here, falling back on some cookbook rule given by magic. So, why shouldn't we do one tailed tests if what we're interested in is a one tailed question?
(For example, is the point that the approximation of real life distributions by bell curves tends to break down at the extreme ends of the curve?)
Was somebody looking for mathstat videos?
66: chi-square test means your 'test statistic' would have a chi square distribution if some idealized assumptions are satisfied. For example, if you square a standard normal, or add up an independent series of same, you get a number with a chi square distribution.
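That last sentence is easy to check by simulation, if it helps (a Python sketch; three degrees of freedom and 100,000 draws are arbitrary choices):

# Sum of k squared independent standard normals ~ chi-square with k degrees
# of freedom, which has mean k and variance 2k.
import random
import statistics

k, trials = 3, 100_000
samples = [sum(random.gauss(0, 1) ** 2 for _ in range(k)) for _ in range(trials)]

print(statistics.mean(samples))      # should land near 3
print(statistics.variance(samples))  # should land near 6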
132: That's not a cookbook rule. It is a guideline intended to stop the well-known human tendency to nudge things in the direction that is expected or hoped for. And it isn't a rule of statistics. It is a rule that no reviewer will let you do a one-tailed test unless only one tail is physically possible.
Yeah, it's actually pretty silly, right? Suppose we have two candidates and a poll says one of them is getting 51 +/- 3% of the vote. Assuming I can treat this as a Gaussian, they will win about 75% of the time. Let's say I got the error bar wrong, and it looks more like a Gaussian of +/- 3.5%. Then they still win 71% of the time. It's just not sensitive to the tails of the distribution.
So you're saying it's solely a convention that has no rationale or reason other than that we all decided on a less sensible convention? (After all, it's only nudging things in that direction if it's not the convention.)
Oops, those numbers were for 52 +/- 3% and 52 +/- 3.5%. Anyway, the point remains. If it were 51%, the win percentages would be 63% and 61%, respectively.
115: I understand why you would use p=.05 rather than, say, p=.045 or something, and to some extent that has to be arbitrary, so might as well just pick a tractable number. I'm just saying that when you're drawing from a pool of, say, five hundred studies, you'll probably want a significance level at least as strong as 1/500. This is important when, e.g., interpreting new medical research.
Even switching to a flat distribution of the same width only changes the probability by a couple of percent.
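For anyone playing along at home, those figures fall straight out of the normal CDF; a sketch (Python, no stats library needed, and it comes out within a point of the percentages quoted above):

# P(candidate's true share > 50%) if the poll estimate is Gaussian with the
# given mean and standard deviation.
from math import erf, sqrt

def p_win(mean, sd):
    z = (mean - 50.0) / sd
    return 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF at z

for mean, sd in [(52, 3.0), (52, 3.5), (51, 3.0), (51, 3.5)]:
    print(mean, sd, round(100 * p_win(mean, sd)))   # roughly 75, 72, 63, 61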
My recollection from the argument (which unfortunately doesn't seem to be on the internet) was that Sasha would say if you got a result of 52 +/- 3% then you literally *learned nothing* because you were not able to reject the null hypothesis (that the candidates are tied) with 95% confidence. Saying that you learned something means you're insufficiently conservative and jump too quickly to conclusions.
134 is making zero sense to me. 'Splain yourself, Moby.
136: No. When you do a test like that for a survey, you are testing whether A and B are different which involves both tails whether you like it or not.
Pure silliness. Suppose my null hypothesis was that candidate A would win in a landslide?
143: Yeah, I feel like I must be misrepresenting his opinion because it's so obviously insane, but I'm quite sure this was something that he actually claimed. Again the argument wasn't that it was actually right, but only that it was Conservative.
142: But I don't want to test whether A and B are different, I want to test whether A is beating B!
143: It would be completely fair to do that, if you did it before hand. So long as you are willing to ignore any other inference from that test (e.g. that A will win by a small amount).
But, you can't just go look at the results after a poll is done and post hoc exclude the possibility that B could have come out ahead of A (which is what is described in 127). I mean, I can't stop you from doing it, but it means you are doubling your chances of a false positive.
I mean, I certainly wouldn't claim any high degree of certainty about the outcome or be surprised if I was wrong, based on a poll that reported 52 +/- 3%, but it's still information that should make me provisionally more confident about what the outcome will be.
I guess this probably makes me a Bayesian, but it just sounds like common sense.
But, you can't just go look at the results after a poll is done and post hoc exclude the possibility that B could have come out ahead of A (which is what is described in 127). I mean, I can't stop you from doing it, but it means you are doubling your chances of a false positive.
You're not excluding anything, but you're certainly free to post hoc update your belief about which candidate will win in light of the information you obtained.
There are all sorts of cases where I will argue at length about how people are biasing themselves by not doing a blind analysis, not taking into account trials factors (as Benquo pointed out in 138), and so on. I don't see this as one of those cases.
147: And you can calculate how certain you should be given poll results. That's fine.
146: So you're saying the rule is "95% confidence, unless your question happens to be something where only one tail is relevant in which case you need even more confidence" and the reason for the higher requirement in the latter case is "it decreases your chance of false positives." Well sure it does, but so would holding all questions to that higher standard!
I think you're confusing issues of scientific ethics with issues of statistics.
Where's Cosma when we need him, anyway? And if I want to learn more about statistics, can I somehow just dump information directly from his brain to mine?
You're not excluding anything, but you're certainly free to post hoc update your belief about which candidate will win in light of the information you obtained.
If you conducted a new poll and used the previous results to inform your hypothesis for that poll, you'd be golden. If you ran one test and then used that result to run another test using the exact same data, you'd need to adjust your p value for multiple comparisons, which would involve (basically) doubling your p value. Which would get you the same p value as a 2-tailed test.
If you ran one test and then used that result to run another test using the exact same data, you'd need to adjust your p value for multiple comparisons, which would involve (basically) doubling your p value.
This looks like what I would call a "trials factor" (although Google is either tailoring its results for me or this is an expression primarily used by high-energy physicists, so statisticians might call it something else). But I'm not seeing how it's relevant for the question Unfoggetarian was asking.
Maybe I should take a class in explaining stuff?
The term you are looking for is "Multiple Comarisons."
Multiple comparisons. I took away the p because it was greater than .05.
Really, if you want to calculate the probability of an event directly, you shouldn't be using a significance test.
156: Right, but the same factors apply in calculating the CI for your probability estimate.
Sweet jesus is this confusing.
I still think you're just confusing which things are matters of ethics and which are matters of statistics. There's an interaction between a mathematical model on the one hand and some statistical test on the other. If you take the XKCD jellybeans as a good example (http://xkcd.com/882/), the problem there is that you let your data affect your model in an irrational way. There's just no reason at the beginning to think that color is relevant, and so the model you should have is that you're actually doing the same experiment over and over again, which changes the statistical calculation you're doing. But this is a question of what you know about jellybeans, color, and acne a priori, and it's a matter of scientific ethics to actually report all the tests you ran, but it's not fundamentally an issue of statistics.
159: Now I want to smack you hard.
And not just in the usual way that I want to smack people I meet.
Do I understand correctly that I'm allowed to do one-sided tests with 95% confidence, but only if I wrote myself a certified piece of mail prior to that in which I promised that this was the test I was going to run?
159: Well, but there are cases which are very similar to the XKCD jellybeans example where the issues are slightly different. When someone looks for a new particle decaying to two photons, for instance, and they're looking for a bump at a given mass, and they tell you they see with 4-sigma confidence an excess at 115 GeV (say), you should revise the significance downward for the fact that it could also have been at 117 or 120 or.... Here the analogue of "color" is "mass," it is relevant, and you're not really doing the same experiment over and over again, but the mere fact that you're doing many experiments means that one of them is much more likely to show a statistically significant result if you're treating them all as independent significance tests.
And not just in the usual way that I want to smack people I meet.
s/b "And not in the good buttsex way."
Sometimes I post a comment and then wonder if I really should have posted it from work.
You cannot use the results of the two-tailed test to decide you want to run a one-tailed test. You can, if you'd like, run a one-tailed test you decide on before hand.
But shouldn't you *always* decide to do a one-sided test beforehand if you're doing a poll on an election?
Let me start over. If you want to know if A is different from B, you have a two-tailed hypothesis. That seems pretty clear. If you want to know if A is greater than B, you have a one-tailed hypothesis. A difference that would give you p = .1 for a two-tailed test will give you p = .05 for a one-tailed.
If you did a test for A is greater than B and a test for B is greater than A, you have done two one-tailed tests that check exactly the same hypothesis as the two-tailed test but you have magically doubled your chances of finding a significant difference between A and B compared to your chances with the single two-tailed test. Without any adjustment for multiple comparisons, you've taken the exact same data and the exact same calculations and doubled your chances of finding a statistically significant result. You did this by lying to yourself and your mother and I are disgusted with you.
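To put numbers on that doubling (a Python sketch; the z value of 1.645 is just a convenient example):

# The same test statistic, read off as a one-tailed versus a two-tailed p-value.
from math import erf, sqrt

def phi(z):                               # standard normal CDF
    return 0.5 * (1 + erf(z / sqrt(2)))

z = 1.645                                 # made-up test statistic
p_one = 1 - phi(z)                        # P(Z > z)
p_two = 2 * (1 - phi(abs(z)))             # P(|Z| > z)

print(round(p_one, 3))   # about 0.05
print(round(p_two, 3))   # about 0.10 -- exactly double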
Hrm, ok, now I see the point that you're making.
But I still don't see how the fact that one thing is twice as likely to yield a significant result as another thing actually makes the former wrong (rather than the latter). I mean 5% is double 2.5%.
170.2: Because you don't really know the true parameter, assuming that A can only be greater than or equal to B isn't safe.
157: Using a 1-tailed confidence interval yields the intuitively correct result for a measured difference near zero, of a 50% chance that each candidate wins. Using the CI corresponding to the 2-tailed test just gives you a bizarre 47.5% chance for each, with 5% somehow lost into the aether.
Of course, around 0 significance tests become irrelevant.
167: To give some detail, when conducting an election poll, you usually want the CI, not a single significance test. The CI based on the one-tailed test will be artificially narrow because it assumes it isn't possible for B to beat A. If it isn't actually possible for B to beat A, you don't need a poll.
172 and 173 seem to contradict each other.
If I understand correctly, the Bayesian story here is as follows. Each person is an unfair coin with some actual probability of turning up candidate A or candidate B. The problem is that you don't know those actual probabilities. So you start with some prior assumed distribution on the possible states of unfair coins (based on your theoretical understanding of how elections work; for example, you should probably be including incumbency effects in your model). Now you update your priors using Bayes' theorem based on the result of your survey. This gives you a new distribution. Ok, now if I have any question (like does A beat B) I just look at the distribution, add up the probabilities of all the cases where A beats B by each possible amount, and I get the probability of each answer. Ok, now if I need to know whether that probability is good enough to print in a newspaper (and really we should just be reporting the probability anyway), I ask whether it is over 95% or not. But I certainly should count the cases where A clobbers the hell out of B as contributing to the cases where A beats B.
This story makes a lot of sense to me, in a way that other stories about statistics never have. Of course the hard part is in the priors...
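A toy version of that story, for concreteness (a Python sketch; the flat prior and the 520-of-1000 survey result are both invented, and the flat prior is exactly the hard part I'm waving away):

# Bayesian updating for "does A have over 50% support?"
# Prior: Beta(1, 1), i.e. flat. Survey: 520 of 1000 respondents for A.
# Posterior: Beta(1 + 520, 1 + 480). Estimate P(share > 0.5) by sampling it.
import random

a_prior, b_prior = 1, 1
for_a, against_a = 520, 480

a_post = a_prior + for_a
b_post = b_prior + against_a

draws = 100_000
wins = sum(random.betavariate(a_post, b_post) > 0.5 for _ in range(draws))
print(wins / draws)   # roughly 0.90 for these made-up numbers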
The numbers I quoted in 135 and 137 were just sampling a Gaussian and counting how many times A wins. I don't understand any of this one-tailed and two-tailed confidence intervals business.
This thread makes me doubt that statistics should be taught as part of the standard high school curriculum.
You wouldn't expect this from his main research or game theory textbook, but R My/erson taught a class on decision analysis & probability models that seemed pretty good at developing one's intuitions about probability through (spreadsheet) simulations. At the time, the pace felt too slow, but I probably got a fair bit out of it. And then I found $5.
177: Yeah, when it comes down to it college might be the right place for statistics. It should be a class more people take (for example, premeds should have to take it), but you need precalculus plus some discrete math. CS on the other hand has fewer prerequisites and is a prerequisite for more things.
I don't think 172 is using confidence intervals in the way they were intended to be used.
172 (and 157) A lesson here is you should not add up the expected errors as if they are independent, as if you could get a result of each candidate getting > 50 %. (If this thread is about vote counting error, not polling error, I'm p0wned.)
If you want to construct CI using intuition from standard deviations, R Halford might prefer references like Castaneda and Hazelwood.
181: I thought (under the Bayesian story) these are the same thing. That is, returning a result of 52 +/- 3% is just shorthand for saying that the posterior distribution is approximately a Gaussian with median 52 and standard deviation 1.5.
If that's not what it's saying, then what is it saying?
I've mentioned nothing Bayesian and it has no plausible connection with 126/7 as expressed. They are related, but you still have to calculate based on some consistent method.
Also, 183 isn't obviously sensical to me.
-appendix of Reif's textbook Statistical and Thermal Physics
I just checked my copy and see a bunch of things about how to evaluate integrals, but not much discussion of statistics.
Reif
I loved that book. Mine has this baby-poop-colored cover.
Posterior to what?
Some low hanging fruit.
Posterior means after you take into account the survey results.
You take it into account if you have another survey, as I mentioned once above.
A better way to put my question would have been, what is the prior distribution.
You start with some prior distribution (telling you the odds that A has x% support). Then you do a survey. Then you update to get the posterior distribution. Then you can ask any questions you want about the posterior distribution (e.g "does A have over 50% support?") and then calculate the odds of each answer.
Saying that the survey result is 52 +/- 3 just means the posterior distribution is a bell curve around 52 with sd 1.5. And a straightforward bell curve calculation gives you the odds that A is winning. I haven't done the details but you can say that it's 95% sure that A is winning when A is doing just barely less well than 50+margin of error.
Well right the prior distribution is the tricky part. But if you don't have any prior theoretical knowledge of the situation then you're not going to be able to infer anything anyway. (This is where I need to know more philosophy of science to make this argument correctly.)
Pollsters use 3% margin of error to mean s.d. of 1.5%, not 3%? That also changes the numbers I quoted above, then, I meant +/- 3% as 3% s.d.
I think the "margin of error" is two standard deviations, because that's what corresponds to (roughly) 95% confidence.
Yeah, Wikipedia confirms that newspaper error margins correspond to 1.96 standard deviations in each direction.
56
In reality, I'm vain enough to think my students accomplish both just fine. I'd say: fewer students get weeded out in my classes, and those borderline students leave my class with a decent understanding of the basics, and then crash and burn under a future teacher.
There is a case of course that it is the purpose of an introductory course to weed out the marginal students and that you are doing them no favors by encouraging them to continue on an academic track on which they are unlikely to be successful.
168
If you did a test for A is greater than B and a test for B is greater than A, you have done two one-tailed tests that check exactly the same hypothesis as the two-tailed test but you have magically doubled your chances of finding a significant difference between A and B compared to your chances with the single two-tailed test. Without any adjustment for multiple comparisons, you've taken the exact same data and the exact same calculations and doubled your chances of finding a statistically significant result. You did this by lying to yourself and your mother and I are disgusted with you.
This seems right to me. Assume we are doing a poll in an election between A and B, the only source of error is statistical (no biased sample, all responders correctly report how they are going to vote, no surveyors making up responses, etc.), all votes are either for A or B, and you have no prior view of who is ahead. You want to minimize the chance you will report A (or B) is winning when they aren't. Assume the worst case, which is when A and B are exactly tied. Assume that with your sample size, 5% of the time A will lead by 52-48 or more (and 5% of the time B will lead by 52-48 or more). Then if you report A (or B) is winning when the margin is 52-48 or more, you will make an error 10% of the time, which I believe is the two-sided result.
Of course there is an objection that if the election is tied and is decided by a coin flip (or something) then you would still be right half the time. But I would argue you weren't really right in reporting A (or B) was winning, you were just lucky. But the issues are rather subtle.
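A quick simulation of that setup, for anyone who wants to check the 10% figure (a Python sketch; the sample size of 1691 is my own back-of-the-envelope choice to put the one-sided 5% point at a 52-48 split):

# True race is exactly 50-50. Call the race whenever either candidate polls
# at 52% or better; under these assumptions that happens about 10% of the time.
import random

n = 1691          # assumed sample size
trials = 10_000
called = 0
for _ in range(trials):
    votes_for_a = sum(random.random() < 0.5 for _ in range(n))
    share = votes_for_a / n
    if share >= 0.52 or share <= 0.48:
        called += 1   # we'd have (wrongly) reported someone winning

print(called / trials)   # comes out near 0.10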
177
This thread makes me doubt that statistics should be taught as part of the standard high school curriculum.
Yes a little probability is probably ok but statistics can quickly get you in deep water.
192.2: And a straightforward bell curve calculation gives you the odds that A is winning.
Is "A" Charlie Sheen?
You have a sample mean to give you an estimate of an unknown population mean. If you did the survey right and by the usual means you see in the paper, you now know there is a 95% chance that mean is within +/-3 of X. You are proposing that we see if the 50% line (or B's vote) is outside of +/- 2 standard deviations from X? Is that right? If so, you are dumping part of the error. There is only a 95% chance that your 95% CI you used the second time around is right. The test is whether the 50% mark (for two mutually exclusive and exhaustive choices) or B's estimate is in your 95% CI.
193: This has nothing to do with philosophy of science as I understand the term. And you can infer perfectly well from a single survey. You just can't adjust your inference outside of standard sampling theory.
182: I was assuming that we were describing the quantity (A% - B%), i.e. the difference. Obviously if you're modeling A and B as independent variables you need to do more work.
180: I agree that this is not the intended use, but as far as I can tell there's nothing in the generation of confidence intervals that requires them to be the complements of a typical tailed test. For distributions like the normal that are location-invariant, the probability that the sample mean is within a certain interval relative to the true parameter is equivalent to the probability that the parameter is within a related interval around the sample mean. Unless I'm making some kind of technical error (which might well be the case), this implies that you can calculate P(true_mean>sample_mean)=P(sample_mean
193: If you're thinking about Bayesian priors, there are plenty of standard "uninformative" priors. Of course the fact that you can choose between different supposedly uninformative priors is itself problematic.
202.2: OK. But the conversation was confusing enough and I didn't want to make it weird.
Paragraph 2 should have finished:
"Unless I'm making some kind of technical error (which might well be the case), this implies that you can calculate P(true_mean>sample_mean)=P(sample_mean
Dammit, stupid carats. Let's see if the escape character will work. Paragraph 2 should have finished:
"Unless I'm making some kind of technical error (which might well be the case), this implies that you can calculate P(true_mean>sample_mean)=P(sample_mean<true_mean), even when sample_mean is sitting right on top of your null hypothesis.
I only read the first sentence of that paragraph.
It's after ten and I saw the equal sign. I don't read equations after ten.
Moby Hick : reading equations : 10:00 pm :: mogwai : feeding : midnight.
208: I've never seen tripartite ratios like that before. Is that a thing?
I just figured it should be a thing.
Is it equivalent to
Moby Hick : reading equations :: mogwai : feeding
AND
reading equations : 10:00 pm :: feeding : midnight
?
I think it's a failure of our linear way of writing. Truly, it ought to be represented:
Moby : reading equations
. . . .
10:00 pm
and then a stick emerging along the z-axis, so to speak, from the center of the triangle, and on the other end of the stick:
Mogwai : feeding
. . . .
midnight
In other words, like a triangularly-weight-shaped barbell.
214 to 212. And I thought 208 was awesome.
Essear has always harbored a hatred for triangularly-weight-shaped barbells, so.
It appears he's overcome his grudge!
I need a phone with a 3-d display.
Someone was telling me about 3-D TVs at BestBuy, and for some reason I was picturing some amazing hologram Star Wars table thing. I was totally overreacting, mouth agape, etc.
Later Jammies was like, "Seriously? Why were you hamming it up?" and I realized that it was a wear-glasses Captain-Eo style good old 3D.
That won't help me play Angry Birds.
221: I don't know. I'm mostly convinced that the 3D-TV technologies are for the birds.
Exactly. I want to see the feathers go poof.
Andrew Gelman's talk linked in this post is interesting on the question of how to interpret statistical significance when effects are small. But given that we don't typically expect political candidates to be very precisely evenly matched, I'm not sure it's relevant to the above discussion.
So, most of this statistics stuff is way over my head, not something I couldn't understand if I tried, but not something that I can pick up easily from blog comments.
So, I'd really like to learn statistics in an introductory course but in a semi-nuanced way. I wish that there were a bunch of Boston area unfogged people who felt the same way so that we could just hire a tutor together, but we're all spread out.
225: We're not that spread out, just bimodal.
P.S.: The harvard ext. intro stat class is kind of meh.
This was not a good conversation to try to understand.
225
So, I'd really like to learn statistics in an introductory course but in a semi-nuanced way. ...
You could just buy a textbook and go through it on your own. I would suggest (based on very little) Statistics. New copies are expensive but used ones are just a few dollars.
226: Right, but redfoxtailshrub, who is interested in a good statistics class, is not, you know, in Boston. That's what I meant by spread out. I think that there was somebody else who did not live near rfts who was also kind of interested.
There was a thread in which dsquared said that econometricians had made really significant contributions to statistics, and somebody else (snarkout maybe?) pointed out that economists were stuck on regressions as a means of analysis. I tried to do an advanced google search on the site, but I got the terms wrong. There was a link to a good introductory textbook explaining some of the new ways of analyzing data. (a friend of mine who is an economist is interested.)
I want to see the feathers go poof.
Randy Johnson obliges.
Randian Johnson is obligated only to himself.
[AP Stats somewhere upthread]
When I saw the material for my eldest kid's AP Stat course, I forbade him to take the AP test, to forestall his skipping a college intro course. Although they are fraught as well.
I never had stats as an undergrad so I don't know what they teach.
Introductory college courses on statistics are so uniformly dreadful that I shudder to think of what a high school class would be like. I am very happy I never took one. (What I was taught about statistics in my physics classes matches essear's account exactly.) I hope we do it better here than most other places, but even so I am not very happy with it. Certainly the introductory stats-for-engineers class I taught my first semester on the faculty was dreadful.
With those bona fides established, let me recommend, in all seriousness, The Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith.
As a second book, the modestly titled All of Statistics, by my friend Larry Wasserman, is actually extremely good, if you remember calculus. It grew out of an honors introductory course Larry taught for several years.
On the specific issue about whether candidate A is ahead of candidate B when they overlap, it sounds like Volokh was being an ass (such a surprise), but not entirely without a point. I'd explain it like this. The usual meaning for "margin of error" in a poll is that it gives the (half-) width of the 95% confidence interval, i.e., everything outside that interval can be excluded with at most a 5% risk of error. If we wanted a higher confidence level, the margin of error would grow; if we tolerated a higher error rate, it would shrink. It would also shrink with more people being polled. Having the interval centered on one side of 50:50, but including it, means we have only very weak evidence in favor of either candidate.
If we say the margin of error was 3%, I can back out the sample size to be about 1110 (assuming the simplest [binomial] polling model). Say A is ahead in the poll by 51:49. What's the likelihood of getting a poll outcome that much in favor of A if they're really tied? A bit over 25%. If A is really behind 49:51? A bit over 9%. If A is really behind 48:52? 2%. Can you rule out A being behind 25:75? Yes, barring novelistic coincidences. You see how this goes. You can call it for A on this evidence, but you'd be wrong as often as one time in four, which doesn't seem like good odds.
And Moby is right about the awful consequences of just changing around your hypotheses without taking that into account in your testing procedure; that gets you not just jellybeans but interspecies role-taking in post-mortem Atlantic salmon.
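For anyone who wants to check that arithmetic, here is a minimal sketch (mine, not part of the comment above), assuming the simple binomial model it mentions and treating "margin of error" as two standard errors, which is what reproduces the roughly-1110 sample size:

from math import sqrt, ceil
from scipy.stats import norm

moe = 0.03
n = ceil(0.25 * (2 / moe) ** 2)     # 1112 respondents, i.e. the "about 1110" above
observed = 0.51                     # A's share in the poll

for true_p in (0.50, 0.49, 0.48, 0.25):
    se = sqrt(true_p * (1 - true_p) / n)
    tail = norm.sf((observed - true_p) / se)   # P(a poll at least this favorable to A | true_p)
    print(f"true p = {true_p:.2f}: {tail:.3f}")

This prints roughly 0.25, 0.09, 0.02, and essentially zero, matching the figures quoted.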
151.last: For that, I charge extra.
230: Hastie, Tibshirani and Friedman's Elements of Statistical Learning is great, and free online. My lecture notes on data mining and undergraduate data analysis resemble them in the latter respect.
I don't know shit about, well, about much of anything, really.
233: May I ask what about AP worried you? I'm a statistician who used to teach undergraduates, and I thought that AP was much better than the typical undergraduate course. Most instructors I knew at other colleges thought so, too.
237: It may have been the instructor who taught it in our high school as much as the AP material itself. I just recall at some point in the semester (our school does "block" scheduling so you get double periods for a semester rather than a whole year) looking at his homework and what they were learning and quizzing him and thinking "this is not enough". Some of it was about that child as well, a brilliant kid with bad study habits who would have easily crammed for a 4 or 5 and forgotten it all by the next weekend. (But that was even easier to do in college ... but that is another story--although he actually did end up taking a decent stat course in college on his second try at it.)
This would have been back about 2002 so I am a bit hazy. But those caveats aside, overall I do recall thinking it was lightweight compared to AP Calc and even the AP Comp Sci, which had a dynamite teacher at our school; the code had to work plus meet a bunch of other criteria that she was quite strict with.
Does 233 imply that the kid was required to take a statistics course in college? That would be unusual, wouldn't it?
Not unusual---statistics is required in a lot of undergraduate programs.
"a lot"? Really? Maybe "a few."
More accurate might be "programs taken by a lot of students," like engineering, business, psychology, and econ.
239: No, but in the fields of study he was likely to go into, yes. I probably did over-react*. My own undergrad stats was calculus- and math-heavy with lots of proofs and probably failed along the lines dsquared mentioned. I found a probability course I took a lot more useful. I almost think the best approach is some basic introduction to the concepts as part of a standard college-prep math sequence, and then in college a course specifically targeted at a particular field that can use real problems from that field as relevant examples. I can't recall which one it was now, but I used to use a biostatistics college text of my wife's because I could at least minimally relate to the problems.
*The AP classes at our school were widely variable and having had three go through them, I could see that much of that variability was due to the individual teachers.
I'm glad 235 mentions the Hastie et al. book, because it reminds me that I downloaded it last summer, read a couple of chapters, thought "this is interesting, I should read it all when I have some time," and promptly forgot about it.
I sort of wish someone had forced me to take a statistics class at some point, although I suppose if they had it would have been a boring cookbook class and I probably wouldn't have learned anything.
241, 242: I would have thought any hard science but I guess not. But yes in my experience: econ, engineering, bio, environmental sciences, business, psychology, any social science and applied math. So as Kreskin says, a lot of students.
Generally not in the hard sciences, no. I forgot about general education requirements in 242.
Most of those students (undergrad business, psych, social science) would likely not have a "real" stats course.
Though, TBH, even in grad school my prob&stats class was ridiculously heavy on cookbook bs.
249: Sure, but ideally, beyond the cookbook approach (which is "dangerous"), it would concentrate on concepts enough that 1) the person could read and understand how statistics is used in the standard work of their field, and 2) they would know when to seek expert help if they need to do something beyond the simplest applications in their own work.
249: Yeah, true. I don't have that much confidence in my own education on the topic--but I absolutely maintain my chauvinism towards those who didn't have to start with Kolmogorov.
Not for any good reason, naturally.
||
Myself, Bonsaisue, and our two squirrely progeny are planning to be in NYC over Memorial Day weekend. We'll be coming the Thursday and leaving Monday. Would there be any interest in a Mineshaftian ingathering?
|>
||
How can I get rid of these slugs that are polluting my doorstep?
|>
I sort of wish someone had forced me to take a statistics class at some point...
You could probably find a highly specialized dominatrix for that.
235.last: Hey, classification trees in R. I need that. Thanks.
255: That's a pathetic little confidence interval you have and you know it. Don't you, worm?
254: it's okay, I can understand it: we'd be indistinguishable from the bridge-and-tunnel crowd.
255, 257: I so did not need that image in my head during next week's student project presentations. Thanks for nothing, mineshaft!
You could probably find a highly specialized edition of "The Secret" dominatrix for that.
You could probably find a highly specialized dominatrix for that.
She may be expensive, but oh, let me tell you about our goodness of fit...
She can't help with non-paramour-tric analysis.
235: Cosma, can I ask you a question if you're still around?
My understanding of confidence intervals in null hypothesis testing is that a 95% confidence interval of +-X means that in the long run, given this data generating process, and this data analytic procedure, you will generate a point estimate that is within X points of the true value 95% of the time. But further: this does not imply that in a particular dataset you can exclude values outside it with 5% risk of error.
Sadly, I don't have the book that formed my understanding of this -- Royall's Statistical Evidence: A Likelihood Paradigm -- in front of me, but I'll quote from an article that recapitulates a lot of Royall:
Turning to confidence intervals, an x% confidence interval means that when using a specified sampling procedure the true value of an estimated parameter will fall within the boundaries of the resulting intervals x% of the time. It does not mean that the true value will fall within this particular one interval x% of the time and it does not mean that one can be x% certain that the true value is within any particular interval. As can be seen from the definition of confidence intervals, they are intimately related to the logic of the Neyman-Pearson framework. Within that framework they provide an index of precision and sensitivity in terms of sampling and long-term objective probability. That is, it is the procedure (e.g., rejecting the null or generating an interval) that has a long-term conditional error probability (of, e.g., rejecting the null given that it is true or generating an interval that does not contain the true population parameter value). Translating that into an index of evidence for or against a particular hypothesis given only particular obtained results is neither obvious nor feasible. Confidence intervals simply do not speak directly to the question of what these and only these data imply for any hypothesis, because confidence intervals depend on considering a whole class of (imagined or real) replications.
I know you have students you are actually paid to teach, but if you get a chance I'd be very interested if you could help me resolve the discrepancy I see here. Is the discrepancy only apparent? Or is it just that you, and everyone else who treats margins of error as statements of confidence in the true population value, are interpreting outside of the N-P framework?
(I'm glad to see you recommend All of Statistics. I got that book and was planning to read it this summer.)
I guess I treat, or should treat, margins of error as statements of confidence about this particular method of describing the true population. Like, given a bunch of Star Trek-style alternate timelines, in x% of them this method describes the datum in question within n units.
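That alternate-timelines reading can be checked by brute force. A sketch (the true value, poll size, and number of simulated timelines below are all made up for illustration): draw many polls from one fixed true proportion and count how often the usual 95% interval around each poll's estimate covers it.

import numpy as np

rng = np.random.default_rng(0)
true_p, n, trials = 0.52, 1100, 100_000            # hypothetical values
p_hat = rng.binomial(n, true_p, size=trials) / n   # one poll estimate per "timeline"
half_width = 1.96 * np.sqrt(p_hat * (1 - p_hat) / n)
covered = np.abs(p_hat - true_p) <= half_width
print(covered.mean())                              # comes out close to 0.95

The 95% is a property of the procedure across the simulated timelines; any single interval either contains the true value or it doesn't.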
264: The sample mean can be used (if you do it right) as the estimate of the true mean. If nobody is looking.
266: ?
Of course you're using the sample mean as an estimate of the true mean. My question is about the inferential component, the correct interpretation of statements of confidence about the validity of that estimate.
265: If n = 456321, you have a 2.1% chance of being in the timeline where the green women in bikinis are attracted to humans.
My understanding of confidence intervals in null hypothesis testing is that a 95% confidence interval of +-X means that in the long run, given this data generating process, and this data analytic procedure, you will generate a point estimate that is within X points of the true value 95% of the time.
...
This isn't quite right: you will generate a value within X 95% of the time ASSUMING THE NULL HYPOTHESIS IS TRUE. So if you "reject" the null hypothesis whenever you observe a value outside the interval, you will make an error (in the sense that you are "rejecting" the null hypothesis when it is correct) at most 5% of the time.
If the null hypothesis is not true there is little you can say without making assumptions about which alternative hypotheses are likely.
... But further: this does not imply that in a particular dataset you can exclude values outside it with 5% risk of error.
This is a little unclear. Perhaps an example will clarify things. Suppose we have a coin which we can flip and observe heads or tails. Our null hypothesis is that the coin is fair and heads and tails will each be observed 50% of the time. Making these assumptions, we can calculate a value of n (which I am too lazy to do) such that if we toss the coin n times then 95% of the time we will observe a number of heads between .48*n and .52*n (because n has to be an integer, these values .48, .52 and 95% won't be exact). So .48*n-.52*n (or .48-.52 if we normalize by dividing by n) is our confidence interval. Suppose we observe .51*n heads. Then we can say this is consistent with the null hypothesis (there is an implicit significance level here of 5%). (And if we observed say .53*n heads we could say this is inconsistent with the null hypothesis, again with an implicit significance level.) But we cannot say there is a 95% chance that the true head probability is between .49 and .53 (or in any other small interval like .48 to .52) without making additional assumptions.
If in the above example we made the additional assumptions that the tosses are independent random trials, that each has probability p of producing heads, and that our coin was picked randomly from a population of such coins in which p was equally likely to be any value from 0 to 1, then if we observed .51*n heads in n tosses we could compute an interval (which I believe would be about .49 to .53) in which we are 95% confident that the true value of our coin's p lies. But this is going well beyond the traditional null hypothesis framework.
Disclaimer - I am not a statistician.
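Filling in the n that the comment above was too lazy to compute: the normal approximation puts it at about 2,400 tosses. A sketch (mine, under exactly the assumptions above, with a flat Beta prior standing in for the coin population in which p is equally likely to be anything from 0 to 1):

from scipy.stats import binom, beta

n = int(round((1.96 * 0.5 / 0.02) ** 2))          # about 2401 tosses
coverage = binom.cdf(int(0.52 * n), n, 0.5) - binom.cdf(int(0.48 * n) - 1, n, 0.5)
print(n, coverage)                                # the .48n-.52n band catches roughly 95% of fair-coin results

k = round(0.51 * n)                               # now suppose we observe 51% heads
lo, hi = beta.ppf([0.025, 0.975], k + 1, n - k + 1)   # posterior for p under the flat prior
print(lo, hi)                                     # roughly .49 to .53, as guessed above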
267: That estimate has a different confidence interval that you calculate by [waves hands].
235: In order to calculate the percent of the time that I'd be wrong, you're calculating the percent of the time that I'd be wrong *if the null hypothesis is true.* But certainly the probability that the null hypothesis is true is substantially less than 100%! Right? Or am I missing something here?
Disclaimer: I haven't done random surveys in 15 years.
ASSUMING THE NULL HYPOTHESIS IS TRUE.
right, thank you. Jesus, it's hard to maintain the correct representation of this stuff. Anyway, you helpfully reminded me of just why it's wrong -- or I thought it was -- to take the margin of error as a reflection of one's confidence in a particular estimate, as opposed to one's confidence in the entire procedure.
If in the above example we made the additional assumptions that the tosses are independent random trials, that each has probability p of producing heads, and that our coin was picked randomly from a population of such coins in which p was equally likely to be any value from 0 to 1, then if we observed .51*n heads in n tosses we could compute an interval (which I believe would be about .49 to .53) in which we are 95% confident that the true value of our coin's p lies. But this is going well beyond the traditional null hypothesis framework.
Which of those assumptions are not in the null hypothesis significance testing framework? Independent, identically distributed is an assumption of NHST, right? The uniform distribution of p does sound like an assumption you don't make in NHST -- that actually sounds like a Bayesian prior to me, but I don't really understand Bayesian stats; I can obviously barely retain frequentist stats.
Is the reason Cosma and others feel comfortable taking a margin of error as a measure of confidence in a particular estimate because people who interpret polling data make additional assumptions about the distribution of p?
Anyway, you helpfully reminded me of just why it's wrong -- or I thought it was -- to take the margin of error as a reflection of one's confidence in a particular estimate, as opposed to one's confidence in the entire procedure.
To elaborate: because you have no idea whether A is "really" beating B, or vice versa, in the particular case, and because confidence intervals are generated using the assumption that A and B are tied, they can't speak to the particular case, only to the long-run procedure. Do I have that right?
265: If n = 456321, you have a 2.1% chance of being in the timeline where the green women in bikinis are attracted to humans.
She-Hulk is human, Moby. It's not her fault she got a blood transfusion from Bruce Banner after he got irradiated by gamma rays.
Anyway, at least now I'm getting a handle on what the other point of view on statistics is and why it would change how you dealt with questions like whether A is beating B. From the two points of view the bell curves that you get out at the end of the calculation mean totally different things:
1) The odds of various outcomes assuming some null hypothesis. You look at where your outcome lies.
vs.
2) The odds of various realities given some priors and the fixed outcome. You calculate the part of the graph corresponding to your question.
In particular, if I'm understanding things right, the confidence interval is the collection of hypotheses which, if true, would generate this outcome at least 5% of the time. This is, in general, very different from claiming that 95% of the time one of those hypotheses is true. The latter kind of claim (that 95% of the time one of these hypotheses is true) has more flexibility in terms of which collection of hypotheses you use, while for the former there's obviously only one interval you're allowed to use.
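To see how differently the two curves get used, a toy contrast (an invented poll of 1100 with 561 for A; none of these numbers come from the thread): a tail area computed under the "tied" null, versus the posterior probability that A leads under a flat prior.

from scipy.stats import binom, beta

n, heads = 1100, 561                                        # hypothetical poll, A at 51%
p_value = binom.sf(heads - 1, n, 0.5)                       # P(a result at least this favorable to A | tied)
posterior_A_leads = beta.sf(0.5, heads + 1, n - heads + 1)  # P(p > 0.5 | data, flat prior)
print(p_value, posterior_A_leads)                           # roughly 0.26 and 0.75

Similar-looking bell curves, but the first number answers "how surprising is this if they're tied?" and the second answers "how likely is it that A is ahead?"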
274: Yes. Or rather, each null has its own CIs. "A equals B" is one null, and "A equals whatever value you got for the point estimate" is another.
273: Standard significance tests do not assume that p has a uniform distribution. The logic of tests also does not require independent, identically distributed observations.
273
Which of those assumptions are not in the null hypothesis significance testing framework? Independent, identically distributed is an assumption of NHST, right? The uniform distribution of p does sound like an assumption you don't make in NHST -- that actually sounds like a Bayesian prior to me, but I don't really understand Bayesian stats; I can obviously barely retain frequentist stats.
Independent random trials is generally part of the null hypothesis. And you can also test for it (but people often don't). So if your sequence of tosses was HTHTHTHT etc. for 100 tosses, you could also reject the null hypothesis, even though it looks fine if you just consider the fraction of heads.
If you assume the null hypothesis is true then you know the value of p (probability of heads) independent of your observations. If you assume a Bayesian prior distribution over values of p then you can update the prior with each observation, but you won't reject it even if you observe HTHTHT... forever.
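To make that last point concrete, a sketch (with a made-up perfectly alternating sequence): a flat-prior posterior on p only ever sees the head count, so HTHT... leaves it sitting contentedly at one half, even though the pattern itself is wildly non-random-looking.

from scipy.stats import beta

seq = "HT" * 50                                     # HTHTHT... for 100 tosses
heads = seq.count("H")
posterior = beta(heads + 1, len(seq) - heads + 1)   # flat prior on p, updated on the count alone
print(posterior.mean(), posterior.interval(0.95))   # centered on 0.5, nothing looks amiss

switches = sum(a != b for a, b in zip(seq, seq[1:]))
print(switches, "alternations out of", len(seq) - 1)   # 99 of 99: a test on the pattern itself would balk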
A little poking around about the frequentist approach suggests that the point of view I found self-parodying ("if you can't exclude the null hypothesis you've literally learned nothing") does seem to be part of that point of view. That just sounds really weird and unintuitive, and it also leads to a bunch of the problems that then need to be avoided with new rules. For example, if you did the same survey 10 times you could learn nothing the first 9 times and then exclude the null hypothesis the 10th time. This is clearly ridiculous, so you need to add a bunch of rules about what you are and aren't allowed to do.
If you just did the Bayesian approach and learned something from the first 9 tests, you'd end up with a prior distribution pretty strongly concentrated around the "null hypothesis," and then when you do the 10th test it's just built into the system (with no additional fixes or tweaks) that you'd need a really, really strong result to change your priors enough for the null hypothesis to become unlikely.
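One way to put a number on the ten-surveys worry (my arithmetic, assuming ten independent surveys, each tested at the usual 5% level):

k, alpha = 10, 0.05
print(1 - (1 - alpha) ** k)    # about 0.40: a 40% chance of at least one spurious "significant" survey

Which is roughly the sort of thing the extra frequentist rules (multiple-comparison corrections, pre-specified stopping rules) exist to patch.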
Yes, the seemingly self-parodying stuff about being able to conclude nothing from a non-significant result is indeed part of the theory, at least in some presentations of it. But it's hard to say what the frequentist approach is, because the theory of statistics is only a weak guide to practice, and because statistics is applied in all sorts of ways in different disciplines---sometimes by PhDs, and sometimes by researchers with little statistical training.
277: Not that I'm the ultimate authority, but while 276 seemed more or less right to me, I don't know what this
In particular, if I'm understanding things right, the confidence interval is the collection of hypotheses which, if true, would generate this outcome at least 5% of the time.
means, but I think it's very unlikely to be right.
You pick one hypothesis. You designate it the null.
If you're a Fisherian, you can decide if your data are improbable, given the null, and call that evidence against it (but you can never generate evidence for it). That's inference from the particular dataset to the question of whether it's unlikely that (for example) A and B are really tied, but I didn't think there was such a thing as a Fisherian confidence interval. I thought they were strictly Neyman Pearson. Also, I thought that Royall book made a darn convincing case for Fisherian inference not being legitimate.
If you're a Neyman-Pearsonian, you decide how often you want to make the mistake of acting as if the hypothesis you designated null is true when it's false, and false when it's true, in the long run. Your confidence interval is the region in which you decide to behave as if the null is true. It has this feature: given that the null is true (thanks, James), in the long run with the same procedure 95% of the time the true population value will fall in that interval. This isn't like saying that the CI is collection of hypotheses.
really weird and unintuitive
yeah. That Royall book I mentioned makes the case for a different paradigm for inference because it's incredibly unintuitive. (Witness that I work on getting this stuff straight over and over and find a couple of months sufficient to make a lot of mistakes in how I state things.)
I still don't understand the justification for this: i.e., everything outside that interval can be excluded with at most a 5% risk of error.
283
If you're a Neyman-Pearsonian, you decide how often you want to make the mistake of acting as if the hypothesis you designated null is true when it's false, and false when it's true, in the long run. ...
I think this is only half right. Data consistent with the null hypothesis doesn't give any sort of bound on the probability that the null hypothesis is false. I don't think you can say anything about how often you will mistakenly accept the null hypothesis as true without additional assumptions.
I'm starting to wonder if this stuff is even worse than interpretation of quantum mechanics, as things that lead smart people to get hopelessly confused and mired in lengthy and sometimes meaningless arguments go.
What I understand of the Bayesian paradigm seems fairly intuitive and satisfying, if there's a finite, discrete set of options that exhaust all possibilities you might be interested in, so that no matter what prior you start with you won't go too wrong in the long run.
On the other hand, if you have an infinite set of possible hypotheses to consider and no natural measure to use on that set, I don't see how any framework for inference would be completely well-defined, which I guess is what drives the desire to just make limited statements like "assuming the null hypothesis, this result doesn't look very probable" -- how do you know if that's evidence for some other hypothesis, if you don't have a good characterization of what they all are?
I'm not sure if these are the real issues that drive these debates, but they're the things that seem troublesome to me based on very limited knowledge.
285.3 has it right. The contorted interpretation of significance tests comes from considering the null hypothesis in isolation. If you want to test an isolated null hypothesis, without specifying alternative hypotheses, the p-value (the tail area of the sampling distribution under the null) is about the only thing you can sensibly calculate.
283: The confidence property is definitely a Neyman-Pearson idea---Fisher had his own system of interval estimation, called "fiducial inference." Fisher may have sometimes used confidence intervals too---I'm not sure.
This Cosma post makes interesting reading.
I think this is only half right. Data consistent with the null hypothesis doesn't give any sort of bound on the probability that the null hypothesis is false. I don't think you can say anything about how often you will mistakenly accept the null hypothesis as true without additional assumptions.
Sorry, what I *meant* was (it is very hard to write clearly about this stuff): you need to decide the upper bound you want to put on each type of mistake before you decide what you want the width of your confidence interval to be. In order to control the "true when it's false" type of error, you need an estimate of effect size. So yes, you need another assumption. However, I think as N-P originally formulated their framework, both parts are important components of the procedure.
285.3 and 286:
Have I mentioned I really liked that Royall book? It makes a case for an inferential framework in which you pick two hypotheses to compare and use the ratio formed by (the likelihood of getting the data|Hypothesis A)/(the likelihood of getting the data|Hypothesis B) to quantify the evidence for A over B. It does wind up being more intuitive, IMO.
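For a feel of what that ratio looks like on polling-sized data, a toy calculation (invented numbers, not Royall's): how strongly does 561 heads-for-A out of 1100 favor "A is at 51%" over "A is at 49%"?

from scipy.stats import binom

n, k = 1100, 561
lr = binom.pmf(k, n, 0.51) / binom.pmf(k, n, 0.49)
print(lr)    # about 2.4: only weak evidence for the first hypothesis over the second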
285.3 My real objection to unobjectionable statements like "assuming the null hypothesis, this result doesn't look very probable" is that they don't match how people want to use the results. A more Bayesian "here are the odds that a betting house should place on A beating B" is much closer to what people are looking for when considering election polling, for example.
I enjoyed the Royall book, too. Two others I'd recommend are Comparative Statistical Inference by Barnett and Statistical Inference by Oakes.
288 does sound pretty cool
290: This is where I often get to as well. I certainly see that rigorous use of the standard tests is what you want if you're talking about using the results to add to the corpus of scientific knowledge, justify beginning a significant engineering project, or embark on an extensive public health initiative--things like that. However, a lot of the time people are just trying to guide their thinking about likely outcomes.
Here is a possible "for instance" concerning significance and potential liability, one I had some passing familiarity with from afar (but could well be wrong on the details). It concerned a cluster of disease occurrences in a particular work location. There were some plausible (but certainly not proven) causal factors, but the question at one stage merely concerned the incidence of this disease in the potentially-affected population versus the general population. I forget exactly how the question was asked (what the null hypothesis was), but it turned out the difference in rates was not statistically significant at any of the accepted levels, though it seemed reasonably "close" (say ~.85) from a layman's perspective. So, "move along, nothing to see here". I probably have something wrong in the specific case*, but the more general question is whether that is even the right way to look at something like that. How should the test even be formulated? Type I and Type II errors would arguably both be significantly unfair to one party or the other. Would something more "odds-based" be appropriate in assigning liability and awards, and is that ever done? (So this is as much a law question as a statistics one.)
*I've moved nearly completely out of the circle of folks involved, so I do not know the final outcome, but the nature and interpretation of those results caused an ugly rift in the social circle of those affected. Some years later the potentially liable party took steps that could be interpreted as implicitly acknowledging some link, but did so very carefully. I actually would not be surprised if there were some settlements in the end.
295: I don't think there's a methodological way out of the fundamental dilemma of needing to make a binary decision as to liability, and needing a cutoff point at which to do it. There will always be a tradeoff between the likelihood of hits and misses/sensitivity and specificity/type I and type II/whatever you want to call it. You'd have to establish a threshold for actionable proof in any inferential paradigm. What do you mean, "odds-based"?
(I mean, in theory you could grade the degree of financial liability with the certainty of the evidence. That would be weird.)
296: What do you mean, "odds-based"?
I think I meant something like your parenthetical: if certain, set liability = x; from there, proportionately decrease it with increasing uncertainty (i.e., scale with the odds that the one party did cause the harm). It acknowledges that you will never get to ground truth and tries to compensate accordingly. This is probably legally, morally and intellectually incoherent, but I think it is right for some personal value of right.
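A minimal sketch of that scaling idea (entirely a toy of mine, not a claim about how any court actually assigns damages):

def scaled_award(full_damages: float, p_causation: float) -> float:
    # toy proportional-liability rule: discount the full award by the assessed
    # probability that the party actually caused the harm
    return full_damages * p_causation

print(scaled_award(1_000_000, 0.60))   # 600000.0 if causation is judged 60% likely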
This is probably legally, morally and intellectually incoherent, but I think it is right for some personal value of right.
It doesn't seem that crazy... at least Jaynes is on your side.
Someone really needs to make a musical about the bayesians vs. the frequentists. What rhymes with Neyman-Pearson?
Frequency probability is not well-defined.
This is false. There are at least three good definitions of frequency probability.
This is my definition! If you don't like it, I have others.
Brilliant, thank you Cosma.
But I was thinking something with a little more song and dance. Slicked back hair. A few shivs. Jets vs. Sharks.
The right librettist might pomo the story by bringing in Efron as a gay beat poet in Act 2.
If you don't like it, I have others scribbled on the back of a student's t-test.
303: Perhaps you've had too much Guinness.
264: I have a response, but it is too large for this comment box, and will turn into a blog post. (And possibly a lecture in data analysis next spring.) Shorter me: I think Deborah Mayo gets this pretty much right; see e.g. her paper with David Cox (though they skip confidence intervals there, oddly).
I have a response, but it is too large for this comment box, and will turn into a blog post.
That's what they always say, and then a hundred years later, still nothing.