Mildly NSFW due to a link on the front page to a less SFW article.
However!
These answers have been provided by users of OkCupid softheaded enough to think that its simplistic polling model makes any sense!
For instance, take the question "Is it logically inconsistent to support the death penalty but oppose abortion?". What might be the, you know, conditions of possibility for answering that? You'd have to postulate rationales for each position, right? Regarding those you might be able to say that they're logically (!) inconsistent, or not—but that of course leaves the original question unanswered (and it's not hard to find rationales that would validate either conclusion). Perhaps you've canvassed all possible combinations of all possible rationales for each position, and determined that either each such combination is consistent or each such combination is inconsistent, and you're also pretty confident that no further rationales are lurking, unthought-of, out there in the realm of possibility. But no one answering the question has done that (I feel it is safe so to assert).
Which doesn't, I suppose, rob the map of all interest, but it does mean that you don't really know what question the answerers were answering.
Every map is constructed to go from bright green to bright red, which makes it a bit hard to tell whether these are significant differences. It would be nice to see something that represents the percentage breakdown in each state more accurately.
(Well, not quite that, because obviously, I think the answer to that question is "no", but I also think it's obviously "no", enough so that I'm not sure, not just what the people who think the answer is "yes" might possibly be thinking, but also what the question-askers might possibly be asking.)
I like the last map. Impressive.
4: the use of "logically" in that question is indeed puzzling. "Morally" would have been better, maybe.
5: Except that, as I understand the procedure, it isn't. It could be that 92% of the people in Idaho said they would rather give up the right to vote than the right to bear arms, but 97-99% of the people everywhere else did, and Idaho would look bright red on that map.
Um, swap the order of the two rights in my comment.
7: You're using numbers. I don't get those. Can you put it in a color coded map, please?
Oh, also, while that is certainly true, they are measuring some difference, yes? I still think it is impressive to know that 7% of Idaho would give up the right to vote, while only 3% of Californians would....
you don't really know what question the answerers were answering.
I suspect in that particular case the question most of the answerers were answering was "are Republicans hypocrites?"
It's important to keep in mind that these questions are written by individual OkCupid users.
10: But we don't know that, I was just making up numbers. All we know is that Idaho deviates from the mean in that direction more than anywhere else, but we don't know how far they deviate. It could be that 80% of Idahoans would and 2% of everyone else, or that 2% of Idahoans would and 1% of everyone else, or....
It would all be much improved by simple labels of what the extremes correspond to in terms of % answers.
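To make the scaling complaint concrete, here is a minimal Python sketch. It assumes the maps use plain min-max normalization across states, which is a guess about the procedure rather than anything OkCupid has confirmed, and the percentages are as made up as the ones in the comments above.

```python
def minmax_scale(values):
    """Rescale so the smallest value maps to 0.0 (green) and the largest to 1.0 (red)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every state gave the same answer; there is nothing to spread out.
        return {state: 0.0 for state in values}
    return {state: (v - lo) / (hi - lo) for state, v in values.items()}

# Hypothetical "percent answering yes" figures; not real OkCupid numbers.
narrow = {"Idaho": 7.0, "California": 3.0, "Ohio": 5.0}
wide = {"Idaho": 80.0, "California": 2.0, "Ohio": 41.0}

narrow_colors = minmax_scale(narrow)
wide_colors = minmax_scale(wide)

print(narrow_colors)
print(wide_colors)
```

Under that assumption both dictionaries produce identical colors, even though one spans 4 percentage points and the other spans 78, which is why a labeled legend would help.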
The OkCupid algorithm always seems to match me with zillions of people who list The Fountainhead as a favorite book, so I don't trust them to do useful things with data.
12: most are, some aren't. And that's no excuse!
4: the use of "logically" in that question is indeed puzzling. "Morally" would have been better, maybe.
And yet you could still have the positions that (a) abortion is to be opposed because it's an instrument of women's freedom and (b) the death penalty is to be supported because wrongdoers should forfeit their lives, and these are not obviously inconsistent positions morally or logically (whatever specifically moral inconsistency would be), whatever else is true of them.
Indeed, you're only allowed to write new questions after you've answered some huge sum of already existing questions, which means that the only people who can write new questions are those who thought that the already existing questions were somehow worth answering.
most are, some aren't
I think all the ones these maps are based on are.
16.last: sure. I didn't say it would stop being stupid, only that "logically" was particularly puzzling.
13: Yeah, I know the numbers were made up; I suppose my point is that it does tell you something, even if it isn't statistically significant.
So, um, yes. You're right, the maps would be much better with labels.
it does tell you something, even if it isn't statistically significant
If it isn't statistically significant, it doesn't tell you anything.
I wonder how Netflix's data would compare as a way to map out regional trends.
21: Right. I've got to stop admitting that I'm wrong in a more direct manner.
22: now 10% more accurate when recommending abortions!
For that matter, why hasn't Netflix tried to spin off a dating service?
For that matter, why hasn't Netflix tried to spin off a dating service?
There's more demand for their service if people watch movies separately? Not to mention keeping the shut-ins shut in probably helps.
There's more demand for their service if people watch movies separately?
I don't know about that. I've had a Netflix for days that I would've watched already if eekbeat were around. n=1.
I've had a Netflix for days that I would've watched already if eekbeat were around.
But you have the account, and they don't really lose much if you just hold on to a DVD for days. (I sometimes have one sitting around for months before I finally watch it....)
If two people effectively have a joint account rather than two separate accounts, that's where they would be losing money.
But you have the account, and they don't really lose much if you just hold on to a DVD for days.
It would be best for them if everyone just held on to their DVDs for months—postage adds up.
Really, the most a dating service could do, from Netflix's perspective, would be to get people to go through DVDs more rapidly, and while that might mean upgrading to a more expensive plan for some people, they don't make their money by the disc, but they do have to pay for each disc's round trip. It wouldn't really spur anyone to join, either, since presumably it would work by taking advantage of your rental and rating record, accreted over long years.
28 is poignant.
"n=1" is the loneliest equation that you'll ever do.
32: you're discounting word of mouth, though. Or happy couples, featured in advertising. Bonanza!
32: Depends whether they charge for the dating service, or get extra advertising revenue. "You live 0.2 miles from a compatible user who is currently watching Mad Men, also likes Godard and hated Fire Walk with Me. For a mere $10, you could be watching the rest of the season with her soon!"
Or happy couples, featured in advertising.
Are they advertising the dating service itself, or is that an adjunct of their main dvd-shipping business?
Netflix should just add in an (opt-in) way to see other users who are near you both geographically and tastewise, with some other stats displayed, and to message them, but not call it a dating service.
Or maybe this is a terrible business model that would never make sense. Who knows? Fortran ate my brain today.
If two people effectively have a joint account rather than two separate accounts, that's where they would be losing money.
Ah, good point. There's a certain cost-to-doing-business thing (shipping fees, lost DVDs {I, for one, have had several get lost in the post; they seem to absorb the cost without blinking}) that also comes into play. So, sure, they make more money temporarily if no one watches movies. But if no one watches movies no one keeps a Netflix account.
There's a good amount of post-watching-data they could be using to do something, I guess, was the point I was interested in. A netflix dating service seems like a great idea. They know where you are and what you like, movie-wise.
Somebody go pitch this to Netflix is all I'm saying. Also pitch with my ongoing complaint that Netflix should include a note telling me why the fuck I ordered X movie six months ago. "Oh, mcmanus suggested it? Word."
Somebody go pitch this to Netflix is all I'm saying.
Get me a hobo!
I see that I've been massively pwned and hobo'd. You bastards!
There's a good amount of post-watching-data they could be using
They collect data on a movie's date-effectiveness?
There are too many embarrassing movies in my queue or that I've watched for me to want to share that information. Hopefully there would be some way to manipulate your profile.
Somebody go pitch this to Netflix is all I'm saying.
If you sell the idea to them, please send me a nickel. Preferably a shiny one.
If they did, they could come up with tailor-made recommendations for what the prospective couple between the members of whom they mediate should watch, based on what's been effective with others with similar movie-watching habits.
42: do you want a match for your ideal you, or for the real you?
the prospective couple between the members of whom they mediate
Whoa. Fortran really did break my brain. Does this parse as English for normal people?
Should be "between the members of which". Your eagle eyes save the day once again, essear!
There are too many embarrassing movies in my queue or that I've watched for me to want to share that information. Hopefully there would be some way to manipulate your profile.
I guess you don't use the "Friends" feature that currently exists, then? I've had second thoughts about adding things because of it. Then I remember the reality-TV-watching habits of my friends with otherwise refined tastes, and no longer feel embarrassed.
45: Everyone needs to have their secrets, neb.
And no, not really, but I'd prefer to charm them with my good taste first, and then slowly work in the horrifying elements.
48: I do, but the only person I'm friends with actually lived with me, so she's well aware of my bad taste.
It's a good feature, by the way, for roommates, so that if you don't share an account you can still keep tabs on what is in the house for you to pilfer.
21 reminds me of the religious views, as expressed on facebook, of a friend of mine: "There is a magical threshold called alpha = .05, which is the source of all truth".
Because being a great truth, the statement in 52 springs to mind whenever other truths are mentioned.
LAST NIGHT FORTRAN SAVED MY LIFE
If it isn't statistically significant, it doesn't tell you anything.
Take that, "impressionistic" historians!
Also, now I know what people are talking about when they talk about running coin-tossing, dice-rolling, trying-to-not-goat-choosing computer simulations.
Is there a way on Netflix to have "two users" on one account? I think that my BF and I are screwing up their rating system, because we have different tastes in movies. I'd like to be able to get recommendations for me and separate ones for him.
I'd like to see the "rape fantasy" map broken out by gender. There's a big difference between a male respondent being game to play along with his female partner's rape fantasy and a female respondent's being game to play along with her male partner's rape fantasy (heteronormativity preemptively acknowledged). I wouldn't be surprised if some or even most of the differential among states on that question can be explained by variations in the gender composition of the user base.
Totally OT: So I just woke from a dream in which I was about to be married to Ogged (whom I have never met), but discovered evidence that he was a serial killer. When he discovered that I was obliquely Tweeting and FB-statusing in the hopes of someone figuring it out and calling the police, he began menacing me with an icepick as I feigned ignorance of his crimes. LizardBreath arrived just in time to save me, tackling Ogged from behind and tying his hands together. It was all very cinematic.
The main reveal came when I realized there was a room in Ogged's house where he hung 8x10 glossies of his intended victims with his weapon of choice suspended over them. One by one, these little displays would disappear, as would the people they represented.
I have no idea why I am dreaming about Ogged. I sort of figure my brain chose a random name and it could just as easily have been any of you. But I do like to think that LizardBreath would be the one to save me if I was in a pickle.
60: What did ogged look like in your dream?
8x10 glossies of his intended victims with his weapon of choice suspended over them
I'm assuming he didn't include information about what room he was going to perform the deed in so as to throw Colonel Mustard et alia off the scent?
52: I hope your friend corrects for multiple comparisons.
61: You know how dreams are, sort of hard to tell, but vaguely along the general Mexican Chris Noth lines.
3: It would be nice to see something that represents the percentage breakdown in each state more accurately.
Or as in the case of the question, Should burning your nation's flag be illegal?, have the actual data points plotted. (It was near the bottom of the earlier "Rape Fantasies and Hygiene By State" post.) Here is the resulting map, and it is a nice visualization of where we Commies live (actually it could use a blow-up). Austin stands out in Texas (as do college towns across the country if you look closely), Seattle vs. Tacoma, Denver vs. Colorado Springs, Cleveland and Columbus vs. other Ohio cities, The People's Republic of Greater Vermont (includes parts of New Hampshire and Western Massachusetts). Lots of good stuff.
I'm amused by how interesting I find these maps given how essentially bogus I think the underlying data is. N=115,000, which I vaguely seem to recall is fine for a national sample, but of course it's not evenly distributed.
Given how much time they spend snarking on North Dakota, I'm extremely curious as to how many people from ND actually participated.
66: The number of people isn't the problem. You can get away with about 1,200 for a national sample (depending on the CI you want, etc.). It appears to be a self-selected sample. You can't fix that by getting more people.
I'm pretty fascinated with the MINUTE results. My only real world experience with the state involved the boyfriend who swung far right and left me for Rush Limbaugh after moving there. I gather I've harbored an unfair and totally inaccurate impression of the state all these years.
Um, Minnesota results. Apparently my bb auto-corrects "MN" to "MINUTE."
66.2: You can get some idea of that from the map with the points plotted (pretty sparse, but needs a bigger scale map to really see, or the numbers would help as well).
And it is GREAT data, although yes there are of course a number of significant underlying selection biases. But with those caveats it is wonderful stuff (especially if you had (as they do) some basic demographic data to identify some of those selection biases).
The Minnesota results actually line up pretty well with my (limited) experience of the state.
67: I'm not sure what it means that I first read "CI" as "confidential informant." I haven't even seen The Wire!
(You meant "confidence index," right?)
Minnesota results
I know Emerson has said this before, but I would guess that MN is probably one of the most polarized rural/city states there is politics wise.
72.2: Confidence interval. When they report a survey in the news, they usually report a confidence interval (e.g. +/- 3%). This is usually the 95% confidence interval (which goes with the alpha
I seem to have lost the end of my last comment.
72.2: Confidence interval. When they report a survey in the news, they usually report a confidence interval (e.g. +/- 3%). This is usually the 95% confidence interval (which goes with the alpha = .05), though the news doesn't always report that part.
The more respondents, the narrower the CI for a given alpha, but diminishing returns set in pretty quickly after 1,000 people. Of course, if you are interested in some sub-group (or in comparing one group to another), you will need more people.
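The diminishing returns can be sketched in a few lines of Python. This uses the textbook normal-approximation formula for a proportion and assumes a simple random sample with the worst-case p = 0.5; `margin_of_error` is just an illustrative helper name. None of it applies as-is to a self-selected sample.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation 95% CI for a proportion.

    Assumes a simple random sample of size n; z=1.96 goes with alpha = .05.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes chosen to bracket the figures mentioned in the thread.
for n in (100, 1000, 1200, 10000, 115000):
    print(f"n={n:>6}: +/- {100 * margin_of_error(n):.1f}%")
```

The half-width shrinks with the square root of n, so going from 100 to 1,000 respondents buys far more precision than going from 10,000 to 115,000.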
76: Ah, thanks, that makes sense.
Of course, if you are interested in some sub-group (or in comparing one group to another), you will need more people.
Right. I have this hazy notion that to do certain kinds of state-by-state comparisons with different subpopulations, the Census Bureau usually surveys 30,000 people, but maybe I'm completely off.
74: That claim was also made frequently about Pennsylvania during the 2008 primary; the neologism "Pennsyltucky" was much bandied about to describe the rural part of the state, at least in my circles. And I know a guy who is always talking about how Vermont was famously, strongly rural right-wing Republican until the last decade or two. And Washington and/or Oregon, I forget which but it's plausible of both, supposedly has very different political climates along the coast and inland. Austin, of course, is a cosmopolitan oasis, I gather.
There's stiff competition for "one of the most polarized rural/city states there is politics wise," I gather.
There's stiff competition for "one of the most polarized rural/city states there is politics wise," I gather.
Maybe. I should say that when I say city/rural about MN it is really Minneapolis-St. Paul metro area, everywhere else. If I am looking at the population data correctly over half the people in the state live in the Minn-St. Paul metro area.
77: When comparing sub-populations, the sample size depends on the alpha you want (nearly always .05) as well as how different you expect the sub-populations to be. Not surprisingly, you need a much bigger sample to look for a small difference between two groups than you need to look for a large difference. There are canned power analysis programs to assist with this.
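A rough sketch of what such a power calculation does, in Python. It uses the standard normal-approximation formula for comparing two proportions; the 80% power target and the example percentages are assumptions for illustration, and `n_per_group` is a hypothetical helper, not any particular program's API.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate respondents needed in EACH group to detect p1 vs p2.

    Two-sided test, normal approximation, equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical gaps: a small one (3% vs 7%) and a large one (10% vs 40%).
small_gap = n_per_group(0.03, 0.07)
large_gap = n_per_group(0.10, 0.40)

print("small difference:", small_gap, "per group")
print("large difference:", large_gap, "per group")
```

The required sample grows with the inverse square of the difference, so halving the gap you want to detect roughly quadruples the number of people you need.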
78: The map I link in 65 (flag burning question) illustrates that pretty well. (I really do want a blow up). And in some ways it is the perfect question. For the bigger cities you can see the suburban rings.
81: And not saying that the question maps onto Republican/Democrats perfectly, it is more of a traditionalist/rootless cosmopolitan divide.
Maybe. I should say that when I say city/rural about MN it is really Minneapolis-St. Paul metro area, everywhere else. If I am looking at the population data correctly over half the people in the state live in the Minn-St. Paul metro area.
Rural/city: I learned from this week's New Yorker that Duluth is very very Democratic.
It appears to be a self-selected sample. You can't fix that by getting more people.
Actually, you can, if you're clever about it, by properly stratifying the sample* (which is easier to do the more respondents you get). As Stormcrow suggests in 70, making demographic corrections goes a long way to countering the sample bias of self-selection. Obviously some bias remains, but there is a creditable argument that this particular bias is no worse, and possibly a little better, than that introduced by non-response.
I did some work for a polling outfit that bet their business on the truth of the latter proposition, and thus far they have done well. They poll exclusively online, recruiting their respondents in a fashion that allows them to be demographically sorted. The trick in their business model is that they compensate the respondents, and calibrate the compensation carefully to recruit enough respondents with various demographic traits in order to round out the sample with the lowest possible outlay. So, for example, if a poll of the online population tends to undersample old people in rural areas, they will offer higher compensation to those individuals.
I won't identify the company because it's potentially personally identifying, but if you read a certain highbrow newsweekly published in London you will see their polling frequently.
*N.B. I'm not claiming okcupid has done this, just that it's possible in theory and in practice
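For anyone curious what the demographic correction looks like mechanically, here is a minimal post-stratification sketch in Python. The strata, answers, and population shares are all invented for illustration; this is not the unnamed pollster's method or anything OkCupid is known to do.

```python
def poststratified_mean(sample, population_shares):
    """Weight each stratum's mean answer by its share of the real population.

    sample: dict mapping stratum -> list of 0/1 answers
    population_shares: dict mapping stratum -> population share (sums to 1)
    """
    total = 0.0
    for stratum, share in population_shares.items():
        answers = sample[stratum]
        total += share * sum(answers) / len(answers)
    return total

# Hypothetical online sample that over-represents younger respondents.
sample = {
    "under_40": [1] * 60 + [0] * 20,  # 80 respondents, 75% yes
    "40_plus": [1] * 5 + [0] * 15,    # 20 respondents, 25% yes
}
# Hypothetical true population shares for the same two strata.
population_shares = {"under_40": 0.4, "40_plus": 0.6}

n_yes = sum(sum(a) for a in sample.values())
n_all = sum(len(a) for a in sample.values())
raw = n_yes / n_all
weighted = poststratified_mean(sample, population_shares)

print(f"raw: {raw:.2f}, post-stratified: {weighted:.2f}")
```

The correction only removes bias that runs through the variables you stratify on, and thin strata (20 people here) make the weighted estimate noisy, which is where a large respondent pool helps.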
There's stiff competition for "one of the most polarized rural/city states there is politics wise," I gather.
The residents of New York and California would surely agree.
Erm, what makes Minnesota unusual is the rural white strongly democratic parts of the state, not a rural/urban split.
Actually, considering Minnesota hasn't been strongly democratic, the urban/rural divide in voting must be pretty small.
There aren't very many "rural" people, period. Minnesota contains lots of people who believe themselves to be living in small towns, actually live in exurbs, and are religious maniacs.
Paging Emerson!
You can reach him by snailmail at this address.
84: Yes, you can certainly do better internet-based surveys. What you describe is an attempt to get as far as possible from a self-selected sample given the constraints of the medium. I'm guessing that they use well below 100k respondents to do this (how could they afford it otherwise?). My point is that numbers don't matter as much as where the numbers come from. There is a lot of ground between an ideal random-sample-with-replacement and what is actually done given non-response issues, cost, etc. But, adding more people, past a certain point, is not improving the methodology.
contains lots of people who believe themselves to be living in small towns, actually live in exurbs,
Isn't this true of much of the country now (by population, anyway)?
Those maps are nothing but an exercise in misinformation and piss me off.
However! A fine quatorze juillet to all. This is a fun read.
. . . what the French Revolution was really like - a digestive eruption of all the basest instincts in the lowest elements of society, led by power-drunk ideologues of the radical Left. . . . was utterly unlike the American rebellion against the English colonial officials - which amounted to a regional secession, led by the responsible members of the upper middle class.
Looking back, we see that black slaves in America at the Founding lived much worse lives than did poor Frenchmen, and had vastly fewer rights. Would that have justified a massive slave rebellion, ending with the murder of George Washington, Thomas Jefferson, and every other slave-owning aristocrat?
Sorry, was there a question in there?
Also, Happy Woody Guthrie's Birthday!
90: your surmises are correct. And I agree with you on all counts.
The outfit I was talking about deliberately oversamples by a much larger factor than an offline pollster (which they can easily do because it is much cheaper than telephone or in-person polling), not to increase the sample size per se, but to facilitate the stratification.
92.1 is shorter me, except that I don't get pissed-off so much.
Fun maps. Results are entirely unreliable, but very intuitive, aren't they? Emerson would have a field day with them.
Erm, what makes Minnesota unusual is the rural white strongly democratic parts of the state
What parts of the state would those be?
Duluth is the main rural city in the north.
60: Also OT, I had an odd AWB-related Unfogged dream last night. I was in her kitchen, and there was a tureen of soup with a banana in it. She suggested that I remove the banana, since it wouldn't taste good in the soup, and then we ate the soup. Also, all of our conversation was via Unfogged comments.
92.1, 94, 95: Unreliable for what? At some level you folks are all utterly barking mad. Like the statistician who had his head stuck so far up his methodology that he was bitching about the pop-soda map. Get off your scientistic high horses and meet the world. Flaws and all, this is probably among the best datasets about broad-based public attitudes in the history of the world.
Hyperbole much JP? A little, when I'm overwrought.
On a very cursory glance, I appear to be mostly right:
http://www.fairvote2020.org/2004/12/bush-kerry-by-county.html
Smaller rural/urban divide than most places. The mostly dem rural parts seem more common in Wisconsin, but that may be just in hectares, not people.
I may well be talking nonsense.
Q: Do you like Hibbing?
A: Why you dirty-minded man, I've never hibbed.
102: "Flaws and all, this is probably among the best datasets about broad-based public attitudes in the history of the world."
I'll try to estimate a p value for that statement and get back to you. Though, 'among the best' isn't easy to quantify.
"Among the best" is not a strong statement. I, for example, am among the best 290 million soccer players in the United States.
JP is right. This is a gold mine of information. Unfortunately, they're presenting it in a really lousy way.
This is a gold mine of information.
It's so hard to read tone. Are you guys really serious? OKCupid users IMO are extremely unrepresentative of the general population in terms of age (younger), education (more), social class (higher) and technological prowess (way, way, way more).
They're probably pretty representative in broadly-defined ethnicity (that is, the six widely used categories) and religion (maybe a slight over-representation of out atheists). But I would be very hard pressed to think that their opinions on hot-button social issues or moral questions could tell me anything reliable about 305 million Americans, most of whom are older, poorer, and less-well-off than they are.
I think essear and JP are just trolling the quants.
On a very cursory glance, I appear to be mostly right:
A large part of the North that went for Kerry is covered by the Superior National Forest and the Boundary Waters Wilderness area. I think that map tends to be somewhat misleading because of the very small number of people in those areas.
Yes, but CJB, in for example California, Oregon, Washington, Pennsylvania, Illinois, there were more 60-75% districts for Bush. In Minnesota, as in Wisconsin, less solidly blue, there were few.
Let's try that again:
Yes, but CJB, in for example California, Oregon, Washington, Pennsylvania, Illinois, there were plenty of 60-75% districts for Bush. In Minnesota and Wisconsin, less solidly blue, there were just a few.
Part of the problem is that I am conflating Liberal and Democrat, which I shouldn't be. MN may have a decent number of Democrats. I don't usually go around asking people their political affiliations, but spending some time in those areas they are not what I would consider liberal.
Do not know how it has gone recently, but at one time northern Minnesota had a fairly strong union presence due to the iron ore mining, so it was smallish towns/rural with a more Democratic slant than usual for that mix. So that's one key to understanding Minnesota political demographics.
"What do you mean key? Mesabi?"
I'm not trolling anyone, and I don't know what "quant" means in this context (surely we don't have multiple people working in quantitative finance in this thread?).
OKCupid users IMO are extremely unrepresentative of the general population in terms of age (younger), education (more), social class (higher) and technological prowess (way, way, way more).
Well, then if nothing else it's a huge dataset on geographical variations in attitudes among middle-class, internet-savvy 20-40 year olds, which is not uninteresting in itself. (Look at how many psych studies use 18- to 22-year-old university students. One doesn't have to sample the whole population to learn interesting things.)
But I would guess that their dataset is large enough that, by being clever, one can extract reasonably good information about the population as a whole with at least coarse geographic breakdowns. This would obviously take much more work than has gone into making these maps.
The philosophy of the modern GOP, brought to you by Sen. Jeff Sessions: "Empathy for one party is always prejudice against another."
117: By 'quant' I mean someone who uses quantitative research methods, particularly in the social sciences.
I may well be talking nonsense.
New mouseover text?
117: Looking for geographic patterns is one of the tasks for which a survey like OKcupid is worst adapted. Things like tech savvy, income, age and education vary greatly with geography. When a psychologist does an experiment with students, they are looking at how certain traits interact with or relate to each other. They aren't trying (or shouldn't be trying) to estimate prevalence.
(Look at how many psych studies use 18- to 22-year-old university students. One doesn't have to be a stickler to find this suspect.)
come now nosflow, surely it's a pretty fair methodology to find the pulse of 18-22 year old university students.....
118: Gotta love that the only other judge specifically identified as a threat to The American Way is Justice Ginsburg. Damned women-folk with their golldarn empathy.
The judicial system is a zero-sum game. If women yes, then people no.
125: We can't help it. It's the hormones.
117: By 'quant' I mean someone who uses quantitative research methods, particularly in the social sciences.
It's possible that my lack of social science training makes me effectively innumerate for these purposes, but: really? You're saying this database of order a million responses each to hundreds or thousands of questions does not contain interesting information?
It's not how one would design an experiment to learn about these things, but it's there. It's observational data. And you don't think it can be mined to learn interesting things?
116.2: Nicely done.
Stormcrow has it right. While the Arrowhead (the northern pointy bit in MN) is no longer the hotbed of syndicalism it once was and, as per CJB, it's not densely populated, it is a big part of what helps get Dems elected on a state-wide level.
I don't think we're that different from national political trends. We're a bit like an over-sized Vermont, with a more cosmopolitan urban center (thanks in no small part to recent immigrant populations). Where I think MN politics can be distinct is in having leftist or progressive politics get more of a seat at the table, particularly in the Twin Cities. Like everywhere else, there are plenty of Limbaugh-loving assholes here too, and, like everywhere else, they cluster especially densely in the suburbs.
I suspect some people are talking past each other on the subject of the OKCupid data.
one set is saying "but the selection method has bias!"
another "wow, huge database, lots of interesting stuff in there!"
These points aren't contradictory.
128: On a quick examination, I wasn't able to tell how they got their respondents and other crucial information. You may or may not be able to learn something from it.
But it is certainly possible to collect millions of responses and not learn anything accurate (except incidentally). The classic example is the Literary Digest poll for the 1936 election. They got 2 million responses and predicted Landon would beat FDR by a huge margin. Gallup got the right answer with a relatively tiny sample because he had a far better method.
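The general point is easy to simulate. The Python sketch below compares a huge poll where willingness to respond depends on your opinion against a small poll where everyone contacted answers. The support level and response rates are invented for illustration and are not the actual 1936 figures; `poll` is a hypothetical helper.

```python
import random

def poll(n, true_support, rate_if_yes, rate_if_no, rng):
    """Contact n people; each one responds with a probability tied to their view.

    Returns the share of respondents who support the candidate.
    """
    yes = no = 0
    for _ in range(n):
        supports = rng.random() < true_support
        rate = rate_if_yes if supports else rate_if_no
        if rng.random() < rate:
            if supports:
                yes += 1
            else:
                no += 1
    return yes / (yes + no)

rng = random.Random(1936)  # fixed seed so the run is repeatable
TRUE_SUPPORT = 0.60        # hypothetical true share backing the candidate

# Big poll: opponents are much more likely to mail the ballot back.
big_biased = poll(200_000, TRUE_SUPPORT, rate_if_yes=0.2, rate_if_no=0.5, rng=rng)
# Small poll: everyone contacted responds, so no selection on opinion.
small_random = poll(1_200, TRUE_SUPPORT, rate_if_yes=1.0, rate_if_no=1.0, rng=rng)

print(f"big biased poll:   {big_biased:.3f}")
print(f"small random poll: {small_random:.3f}")
```

With these made-up rates the big poll settles near 0.375 (0.12 / 0.32) no matter how many people are contacted, while the small one lands within a few points of the true 0.60. More responses only make the wrong answer more precise.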
130: You're right, of course. I'm just annoyed by the accusation of trolling.
131 128: On a quick examination, I wasn't able to tell how they got their respondents and other crucial information.
It's a free dating site. People sign up and answer questions to try to get better matches. So, yes, it's a very self-selected sample.
On the other hand, for each person on the site, the site owners know how they answered every question, plus their self-reported age, gender, race, education level, languages spoken, and income, most of which are relatively accurately self-reported, I think (filtering out the people who say their income is $1 million+ when it obviously isn't). Surely with sufficient effort one can use all this information to learn interesting things.
I never said the raw yes/no counts were interesting. I think it would take serious work to try to get controlled and reliable numbers out of the data. But it is a huge amount of data.
Well, then if nothing else it's a huge dataset on geographical variations in attitudes among middle-class, internet-savvy 20-40 year olds, which is not uninteresting in itself.
Oh, for sure. I was just reacting to the OK Cupid blog's apparent belief that they are actually describing geographic trends among Americans, rather than a subgroup.
On preview: Pwned by 130, but I'm posting because I figure essear deserves to hear that I don't think he's (?) a troll.
110: It's so hard to read tone.
Yes, it's hard to even know myself sometimes when I'm just being authentically obtuse or obtuse for effect. But in this case I am being mostly serious (ignoring the hyperbole) along the lines that essear discusses in 117. In particular, I am not assuming that I can necessarily learn about the real parameters for all of America, but rather the expressed attitudes on interesting topics from a fairly broad sweep of people who have meeting other people for romance (pretty universal) as the common denominator. And some of the demographic stuff could be removed by analysis. But, true, it certainly would not have an adequate representation of contentedly married older folk, for instance.
But it does have the advantage of being answers in which the respondent had an actual interest (even if it does not reflect their "real" values), and furthermore, answers which the respondents were not aware would be used for this purpose (so in some sense they are being spied on in aggregate).
And on preview, yes to 130. Anyway, I would love to see the actual dataset. And it was the map of the actual points for the flag burning question that really piqued my interest.
134.last: Me on the other hand...
"It's not how one would design an experiment to learn about these things, but it's there. It's observational data. And you don't think it can be mined to learn interesting things?"
But it never will.
And it was the map of the actual points for the flag burning question that really piqued my interest.
I really don't think you should trust that map to find you a safe place to practice your vile, unpatriotic deeds, JP.
"And a word about statistical validity: the best questions on OkCupid have been answered over a million times. Therefore we have unique insights into the American mindset. A quick comparison:
OkCupid Question Popularity
Old media could only get 3,050 people to answer a poll about Obama. And it was enough to call the election with confidence.
[...]
300,000 people have answered that question in 3 parts, and there are thousands more questions with as large or larger data sets."
He could possibly know it's nonsense, it's a lighthearted post, but it doesn't sound like it to me.
I'll be practice empathy anywhere I want.
Me on the other hand...
Awww, bubelah, not you either.
plus their self-reported age, gender, race, education level, languages spoken, and income, most of which are relatively accurately self-reported
Ooh, now I'm wildly curious about whether they have a bulge in 1/1 birthdays. Since they (ahem) send birthday e-mails, I'm sure OK Cupid has stats on those sorts of things.
Also, I wonder whether there is a disproportionate amount of __/__/_0 birthdays (among people who signed up this year), due to people who want to think of themselves as 19, 29, 39, etc.
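If anyone did have the birthdays, both checks would be a few lines. A minimal sketch, assuming the data is just a list of `datetime.date` values; the function names and the sample dates are made up for illustration, and "year ending in 0" is used as a stand-in for the __/__/_0 pattern.

```python
from collections import Counter
from datetime import date

def jan1_excess(birthdays):
    """Observed share of Jan 1 birthdays divided by the roughly 1/365.25
    share expected if birthdays were spread evenly over the year."""
    counts = Counter((d.month, d.day) for d in birthdays)
    observed = counts[(1, 1)] / len(birthdays)
    return observed * 365.25

def decade_year_share(birthdays):
    """Share of birth years ending in 0 (about 0.10 with no heaping)."""
    return sum(d.year % 10 == 0 for d in birthdays) / len(birthdays)

# Made-up data: 10 people claiming 1980-01-01, 90 with an ordinary date.
birthdays = [date(1980, 1, 1)] * 10 + [date(1975, 6, 15)] * 90
print(jan1_excess(birthdays))        # a ratio well above 1 suggests a bulge
print(decade_year_share(birthdays))
```

A ratio near 1 for the first and a share near 0.10 for the second would mean no obvious fudging; real birthdays are not perfectly uniform, so small deviations would not prove much.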
141: All you'd have to do is get access to the data, get it into some form usable by a statistical package, and do the analysis. Which is why 137 is accurate.
130
These points aren't contradictory.
Aren't they? How interesting is a self-selected database just because it's large? 110 points out a number of demographic differences between OKCupid users and the general population, but misses the biggest of all: OKCupid users are also (supposedly) overwhelmingly single. It's not an extrapolation from college students, but it might as well be.
The OKCupid map could be very useful if similar dating services showed similar data and I wanted to pick the dating service that was right for my area. As it is, it really reveals very little about actual attitudes in our culture as a whole, though.
Admittedly, this is all ex recto; the link in the original post is blocked for me here at work. Oh well.
140: I'm not sure what your specific malfunction is, but you've got my empathy anyway.
Or maybe you were just trolling the grammarians?
How interesting is a self-selected database just because it's large?
"self selected" is slippery, here. It applies to almost all sampling, after all --- as a matter of degree. How much you can account for selection issues, and how decorrelated the information you are asking about is from the selection bias is what makes all this sort of stuff hairy.
I think it's pretty clear you'd have trouble probing that data set for population wide opinions on a lot of issues, but the obvious demographic skew is hardly narrow enough to make the data uninteresting.
I haven't any opinion on the posted polling info, note, I'm talking generally here.
137 is probably true, at least externally.
A lot of companies are selling this sort of data to each other though.
140: be And I'll do it with typos! Up against the wall Redneck Grammarians!
144: I do have me a good case of Yglesias's Disease*, that is for sure (I wrote a whole long comment on that over at Berube's place once, but will spare folks the details here. Short answer: In addition to honestly earned grammatical ignorance, I also think that I have a mild cognitive deficit that also manifests itself in other ways (ADD'y and Tourette'sish)).
145: And quit fucking posting relevant material before I can respond.
*But I do hope that if I were in his position I would recognize it and get a fracking copy editor.
92: Looking back, we see that black slaves in America at the Founding lived much worse lives than did poor Frenchmen, and had vastly fewer rights. Would that have justified a massive slave rebellion, ending with the murder of George Washington, Thomas Jefferson, and every other slave-owning aristocrat?
Um... yes?
149: Somebody should do an on-line poll to get a definitive answer.
Shorter 148.2: I coulda been a New Haven firefighter if it weren't for that "wise" Latina! But thanks to Samuel "White Like Me" Alito and John "I Fucking Fooled Them" Roberts there is hope.
I should have put 150 on the new thread.
There are also two major uses for this kind of dataset, which I think are subject to different criticisms:
1) Assessing population-wide views, such as trying to assess what the answers to these questions would be if we could actually poll everyone in America.
This is essentially an issue of determining averages for a broad sweep of the population, which would be highly affected by the self-selection bias in the case of these culturally-relevant questions where (I believe most of us suspect) the self-selected group has significantly differing views from the broad population. This is where the need for further data-mining and stratification would be exceptionally necessary to try and produce accurate estimates from the dataset.
2) Assessing the variation in these views across the country (or other forms of group slicing).
For these purposes, it doesn't matter as much that the average response is skewed by self-selection bias. What matters instead is the degree to which answers/views correlate with geography (or whatever other form of slicing) after controlling for the relevant demographics. If you think that regional differences in opinion are actually due to a population-wide effect rather than a pure demographic effect or a concentrated subgroup within geographic regions (and if the latter, one that would either disproportionately appear or fall out of the self-selected sample), then this methodology remains fairly valid for showing the potential spread of opinions and where various geographic regions lie on the spectrum. Even better, it controls for demographic effects, which means you get an idea of how views differ within a relatively similar demographic, rather than say a culturally conservative skew due entirely to an older population.
Of course, if you want to use the data to determine the geographic spread in opinions including differences in demographics (say, for elections or something like that), then you again need to get back to slicing, dicing, and stratifying the data to pull out anything useful. But still, there's at least one use for which this data should be pretty decent without too much work.
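For what the "slicing, dicing, and stratifying" might look like in its simplest form, here is a post-stratification sketch. It is only an illustration under assumptions: the strata, the sample, and the population shares are invented, and `poststratify` is a hypothetical helper, not anything OkCupid is known to do.

```python
from collections import defaultdict

def poststratify(responses, population_shares):
    """Reweight a skewed sample to known population shares.

    responses: iterable of (stratum, answered_yes) pairs
    population_shares: dict of stratum -> share of the target
        population (assumed to sum to 1)
    Returns the population-weighted share answering yes.
    """
    yes = defaultdict(int)
    total = defaultdict(int)
    for stratum, answered_yes in responses:
        total[stratum] += 1
        yes[stratum] += int(answered_yes)
    missing = set(population_shares) - set(total)
    if missing:
        raise ValueError(f"no respondents in strata: {sorted(missing)}")
    return sum(share * yes[s] / total[s]
               for s, share in population_shares.items())

# Invented sample: 80 under-40s (60 say yes) and 20 over-40s (5 say yes).
sample = ([("under40", True)] * 60 + [("under40", False)] * 20
          + [("over40", True)] * 5 + [("over40", False)] * 15)

raw = sum(ans for _, ans in sample) / len(sample)
adjusted = poststratify(sample, {"under40": 0.4, "over40": 0.6})
print(f"raw: {raw:.2f}, adjusted: {adjusted:.2f}")
```

This only corrects for the traits you can observe and stratify on. If the respondents in a stratum differ from the non-respondents in that same stratum, the reweighted number inherits that difference.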
153.2: The biggest issue I could see with #2 would be if OKCupid had a strongly different demographic or reputation in different areas (for instance if in some areas there were very strong regional or local competitors among specific demographics such as religious folk or what have you).
153.2 sounds like "If you assume that this information tells you something useful about geographical distribution, then it tells you something useful about geographical distribution," but I'm probably missing something.
Actually, you can, if you're clever about it, by properly stratifying the sample* (which is easier to do the more respondents you get). As Stormcrow suggests in 70, making demographic corrections goes a long way to countering the sample bias of self-selection. Obviously some bias remains, but there is a creditable argument that this particular bias is no worse, and possibly a little better, than that introduced by non-response.
I don't think I agree with this, I could see it working for some samples but the self-selection associated with OKCupid is just too weird to be controlled by observables. The 50 year old left-handed guys living in the Northeast subscribing to OKCupid will still be quite different from the typical 50 year old left-handed Northeastern guy in the general population.
On 153.2, I agree with Halford, and would say further that the geographic selection for OKCupid seems like it could be very different than the geographic selection in the population as a whole. A 30 year old OKCupid subscriber in North Dakota is likely to be atypical in ways that a 30 year old OKCupid subscriber in NY City is not.
155: Well, I'm mostly describing the necessary precondition for this to tell you something about geographic spreads, which is mostly that a) they exist, and b) the geographic differences in attitude correlate positively across demographic groups. I think both are pretty reasonable assumptions.
157: A positive correlation alone would tell you something valid about the direction of geographic differences, but couldn't you still be wildly off about the magnitudes?
Using the commonplace heuristic of "social science research results that reinforce my preconceived notions are presumptively valid", I will point out that the outlier status of WV on the guns question will stand up to any amount of additional scrutiny.
158: Yeah, absolutely. Well, except for the magnitude within the self-selected group. In the case of something like OKCupid, which will probably select for more educated, internet-using, 20-30 somethings, that self-selected group may give a perfect view into the magnitudes of differences among a potential dating pool even for those of us who haven't used OKCupid.