Re: Mates of State

1

Mildly NSFW due to a link on the front page to a less SFW article.


Posted by: Becks | Link to this comment | 07-13-09 9:56 PM
horizontal rule
2

However!

These answers have been provided by users of OkCupid softheaded enough to think that its simplistic polling model makes any sense!

For instance, take the question "Is it logically inconsistent to support the death penalty but oppose abortion?". What might be the, you know, conditions of possibility for answering that? You'd have to postulate rationales for each position, right? Regarding those you might be able to say that they're logically (!) inconsistent, or not—but that of course leaves the original question unanswered (and it's not hard to find rationales that would validate either conclusion). Perhaps you've canvassed all possible combinations of all possible rationales for each position, and determined that either each such combination is consistent or each such combination is inconsistent, and you're also pretty confident that no further rationales are lurking, unthought-of, out there in the realm of possibility. But no one answering the question has done that (I feel it is safe so to assert).

Which doesn't, I suppose, rob the map of all interest, but it does mean that you don't really know what question the answerers were answering.


Posted by: nosflow | Link to this comment | 07-13-09 10:07 PM
horizontal rule
3

Every map is constructed to go from bright green to bright red, which makes it a bit hard to tell whether these are significant differences. It would be nice to see something that represents the percentage breakdown in each state more accurately.


Posted by: essear | Link to this comment | 07-13-09 10:11 PM
horizontal rule
4

(Well, not quite that, because obviously, I think the answer to that question is "no", but I also think it's obviously "no", enough so that I'm not sure, not just what the people who think the answer is "yes" might possibly be thinking, but also what the question-askers might possibly be asking.)


Posted by: nosflow | Link to this comment | 07-13-09 10:11 PM
horizontal rule
5

I like the last map. Impressive.


Posted by: Parenthetical | Link to this comment | 07-13-09 10:13 PM
horizontal rule
6

4: the use of "logically" in that question is indeed puzzling. "Morally" would have been better, maybe.


Posted by: Beefo Meaty | Link to this comment | 07-13-09 10:16 PM
horizontal rule
7

5: Except that, as I understand the procedure, it isn't. It could be that 92% of the people in Idaho said they would rather give up the right to vote than the right to bear arms, but 97-99% of the people everywhere else did, and Idaho would look bright red on that map.


Posted by: essear | Link to this comment | 07-13-09 10:17 PM
horizontal rule
8

Um, swap the order of the two rights in my comment.


Posted by: essear | Link to this comment | 07-13-09 10:17 PM
horizontal rule
9

7: You're using numbers. I don't get those. Can you put it in a color coded map, please?


Posted by: Parenthetical | Link to this comment | 07-13-09 10:20 PM
horizontal rule
10

Oh, also, while that is certainly true, they are measuring some difference, yes? I still think it is impressive to know that 7% of Idaho would give up the right to vote, while only 3% of Californians would....


Posted by: Parenthetical | Link to this comment | 07-13-09 10:21 PM
horizontal rule
11

you don't really know what question the answerers were answering.

I suspect in that particular case the question most of the answerers were answering was "are Republicans hypocrites?"


Posted by: teofilo | Link to this comment | 07-13-09 10:22 PM
horizontal rule
12

It's important to keep in mind that these questions are written by individual OkCupid users.


Posted by: teofilo | Link to this comment | 07-13-09 10:23 PM
horizontal rule
13

10: But we don't know that, I was just making up numbers. All we know is that Idaho deviates from the mean in that direction more than anywhere else, but we don't know how far they deviate. It could be that 80% of Idahoans would and 2% of everyone else, or that 2% of Idahoans would and 1% of everyone else, or....
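A rough sketch of why this matters. It assumes the maps use a simple min-max stretch from the lowest state (bright green) to the highest (bright red). That is a guess; the actual colouring method isn't stated in the thread. Under that assumption, very different underlying numbers come out with identical colours. The hypothetical `legend` helper shows how labelling the extremes would tell them apart.

```python
# Hypothetical sketch: assumes a plain min-max colour stretch, which may not be
# what the OkCupid maps actually do.

def minmax_scale(shares: dict[str, float]) -> dict[str, float]:
    """Map each state's share to [0, 1]: 0 = bright green, 1 = bright red."""
    lo, hi = min(shares.values()), max(shares.values())
    span = hi - lo
    return {state: (v - lo) / span if span else 0.0 for state, v in shares.items()}

def legend(shares: dict[str, float]) -> str:
    """Label the two ends of the colour scale with the real percentages."""
    return f"green = {min(shares.values()):.0%}, red = {max(shares.values()):.0%}"

# The two made-up scenarios from the comment: a huge gap and a tiny one.
big_gap = {"Idaho": 0.80, "Ohio": 0.02, "Maine": 0.02}
small_gap = {"Idaho": 0.02, "Ohio": 0.01, "Maine": 0.01}

print(minmax_scale(big_gap) == minmax_scale(small_gap))  # same colours either way
print(legend(big_gap))
print(legend(small_gap))
```

Both scenarios paint Idaho bright red and everyone else bright green; only the legend reveals that one is an 80% vs 2% split and the other a 2% vs 1% split.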


Posted by: essear | Link to this comment | 07-13-09 10:23 PM
horizontal rule
14

It would all be much improved by simple labels of what the extremes correspond to in terms of % answers.


Posted by: essear | Link to this comment | 07-13-09 10:25 PM
horizontal rule
15

The OkCupid algorithm always seems to match me with zillions of people who list The Fountainhead as a favorite book, so I don't trust them to do useful things with data.


Posted by: essear | Link to this comment | 07-13-09 10:29 PM
horizontal rule
16

12: most are, some aren't. And that's no excuse!

4: the use of "logically" in that question is indeed puzzling. "Morally" would have been better, maybe.

And yet you could still have the positions that (a) abortion is to be opposed because it's an instrument of women's freedom and (b) the death penalty is to be supported because wrongdoers should forfeit their lives, and these are not obviously inconsistent positions morally or logically (whatever specifically moral inconsistency would be), whatever else is true of them.


Posted by: nosflow | Link to this comment | 07-13-09 10:30 PM
horizontal rule
17

Indeed, you're only allowed to write new questions after you've answered some huge number of already existing questions, which means that the only people who can write new questions are those who thought that the already existing questions were somehow worth answering.


Posted by: nosflow | Link to this comment | 07-13-09 10:32 PM
horizontal rule
18

most are, some aren't

I think all the ones these maps are based on are.


Posted by: teofilo | Link to this comment | 07-13-09 10:32 PM
horizontal rule
19

16.last: sure. I didn't say it would stop being stupid, only that "logically" was particularly puzzling.


Posted by: Beefo Meaty | Link to this comment | 07-13-09 10:35 PM
horizontal rule
20

13: Yeah, I know the numbers were made up; I suppose my point is that it does tell you something, even if it isn't statistically significant.

So, um, yes. You're right, the maps would be much better with labels.


Posted by: Parenthetical | Link to this comment | 07-13-09 10:36 PM
horizontal rule
21

it does tell you something, even if it isn't statistically significant

If it isn't statistically significant, it doesn't tell you anything.
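For a concrete sense of it, here is a standard pooled two-proportion z-test, a common way to check whether two states' answer rates differ. The 7% vs 3% figures are the made-up ones from upthread, and the sample sizes are invented; the maps don't report how many people answered in each state, which is exactly the problem.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # shared rate under "no difference"
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Same 7% vs 3% split, two invented sample sizes per state.
for n in (100, 1000):
    z, p = two_proportion_z_test(7 * n // 100, n, 3 * n // 100, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n} per state: z = {z:.2f}, p = {p:.3f} ({verdict} at alpha = .05)")
```

With 100 respondents per state the gap is well within noise (p is roughly 0.19); with 1,000 per state it is far from it. Identical percentages, opposite conclusions.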


Posted by: Beefo Meaty | Link to this comment | 07-13-09 10:40 PM
horizontal rule
22

I wonder how Netflix's data would compare as a way to map out regional trends.


Posted by: essear | Link to this comment | 07-13-09 10:43 PM
horizontal rule
23

21: Right. I've got to stop admitting that I'm wrong in a more direct manner.


Posted by: Parenthetical | Link to this comment | 07-13-09 10:43 PM
horizontal rule
24

Um, start, not stop.


Posted by: Parenthetical | Link to this comment | 07-13-09 10:43 PM
horizontal rule
25

22: now 10% more accurate when recommending abortions!


Posted by: Beefo Meaty | Link to this comment | 07-13-09 10:44 PM
horizontal rule
26

For that matter, why hasn't Netflix tried to spin off a dating service?


Posted by: essear | Link to this comment | 07-13-09 10:49 PM
horizontal rule
27

For that matter, why hasn't Netflix tried to spin off a dating service?

There's more demand for their service if people watch movies separately? Not to mention keeping the shut-ins shut in probably helps.


Posted by: nosflow | Link to this comment | 07-13-09 10:50 PM
horizontal rule
28

There's more demand for their service if people watch movies separately?

I don't know about that. I've had a Netflix for days that I would've watched already if eekbeat were around. n=1.


Posted by: Stanley | Link to this comment | 07-13-09 10:54 PM
horizontal rule
29

I've had a Netflix for days that I would've watched already if eekbeat were around.

But you have the account, and they don't really lose much if you just hold on to a DVD for days. (I sometimes have one sitting around for months before I finally watch it....)

If two people effectively have a joint account rather than two separate accounts, that's where they would be losing money.


Posted by: essear | Link to this comment | 07-13-09 10:58 PM
horizontal rule
30

But you have the account, and they don't really lose much if you just hold on to a DVD for days.

It would be best for them if everyone just held on to their DVDs for months—postage adds up.


Posted by: nosflow | Link to this comment | 07-13-09 10:59 PM
horizontal rule
31

28 is poignant.


Posted by: Beefo Meaty | Link to this comment | 07-13-09 10:59 PM
horizontal rule
32

Really, the most a dating service could do, from Netflix's perspective, would be to get people to go through DVDs more rapidly, and while that might mean upgrading to a more expensive plan for some people, they don't make their money by the disc, but they do have to pay for each disc's round trip. It wouldn't really spur anyone to join, either, since presumably it would work by taking advantage of your rental and rating record, accreted over long years.


Posted by: nosflow | Link to this comment | 07-13-09 11:01 PM
horizontal rule
33

28 is poignant.

"n=1" is the loneliest equation that you'll ever do.


Posted by: nosflow | Link to this comment | 07-13-09 11:02 PM
horizontal rule
34

32: you're discounting word of mouth, though. Or happy couples, featured in advertising. Bonanza!


Posted by: Beefo Meaty | Link to this comment | 07-13-09 11:05 PM
horizontal rule
35

32: Depends whether they charge for the dating service, or get extra advertising revenue. "You live 0.2 miles from a compatible user who is currently watching Mad Men, also likes Godard and hated Fire Walk with Me. For a mere $10, you could be watching the rest of the season with her soon!"


Posted by: essear | Link to this comment | 07-13-09 11:08 PM
horizontal rule
36

Or happy couples, featured in advertising.

Are they advertising the dating service itself, or is that an adjunct of their main dvd-shipping business?

Netflix should just add in an (opt-in) way to see other users who are near you both geographically and tastewise, with some other stats displayed, and to message them, but not call it a dating service.
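Purely as a thought experiment, "near you both geographically and tastewise" could be as simple as a distance cutoff plus a correlation of star ratings on titles both people have rated. Nothing here is a real Netflix API; the user records, field names, thresholds, and the choice of Pearson correlation are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

def taste_similarity(a: dict, b: dict) -> float:
    """Pearson correlation of two users' star ratings over titles both have rated."""
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        return 0.0
    mean_a = sum(a[t] for t in shared) / len(shared)
    mean_b = sum(b[t] for t in shared) / len(shared)
    da = [a[t] - mean_a for t in shared]
    db = [b[t] - mean_b for t in shared]
    norm = sqrt(sum(x * x for x in da)) * sqrt(sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / norm if norm else 0.0

def nearby_matches(me, others, max_miles=10.0, min_similarity=0.5):
    """Opted-in users within max_miles whose ratings correlate with mine, best first."""
    matches = []
    for user in others:
        if not user["opted_in"]:          # respect the opt-in
            continue
        miles = haversine_miles(*me["loc"], *user["loc"])
        sim = taste_similarity(me["ratings"], user["ratings"])
        if miles <= max_miles and sim >= min_similarity:
            matches.append((user["name"], miles, sim))
    return sorted(matches, key=lambda m: m[2], reverse=True)

# Made-up users around Minneapolis, plus one in Duluth.
me = {"loc": (44.98, -93.27),
      "ratings": {"Mad Men S1": 5, "Breathless": 5, "Fire Walk with Me": 1, "Transformers": 2}}
others = [
    {"name": "user_a", "opted_in": True, "loc": (44.97, -93.26),
     "ratings": {"Mad Men S1": 4, "Breathless": 5, "Fire Walk with Me": 2, "Transformers": 1}},
    {"name": "user_b", "opted_in": True, "loc": (46.79, -92.10),
     "ratings": {"Mad Men S1": 5, "Breathless": 4, "Fire Walk with Me": 1, "Transformers": 1}},
    {"name": "user_c", "opted_in": True, "loc": (44.99, -93.28),
     "ratings": {"Mad Men S1": 1, "Breathless": 2, "Fire Walk with Me": 5, "Transformers": 5}},
    {"name": "user_d", "opted_in": False, "loc": (44.98, -93.26),
     "ratings": {"Mad Men S1": 5, "Breathless": 5, "Fire Walk with Me": 1, "Transformers": 2}},
]
print([name for name, _, _ in nearby_matches(me, others)])
```

Only user_a survives: user_b has similar taste but lives in Duluth, user_c is close by but loved Fire Walk with Me, and user_d never opted in.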


Posted by: nosflow | Link to this comment | 07-13-09 11:10 PM
horizontal rule
37

Or maybe this is a terrible business model that would never make sense. Who knows? Fortran ate my brain today.


Posted by: essear | Link to this comment | 07-13-09 11:10 PM
horizontal rule
38

If two people effectively have a joint account rather than two separate accounts, that's where they would be losing money.

Ah, good point. There's a certain cost-to-doing-business thing (shipping fees, lost DVDs {I, for one, have had several get lost in the post; they seem to absorb the cost without blinking}) that also comes into play. So, sure, they make more money temporarily if no one watches movies. But if no one watches movies no one keeps a Netflix account.

There's a good amount of post-watching-data they could be using to do something, I guess, was the point I was interested in. A Netflix dating service seems like a great idea. They know where you are and what you like, movie-wise.

Somebody go pitch this to Netflix is all I'm saying. Also pitch with my ongoing complaint that Netflix should include a note telling me why the fuck I ordered X movie six months ago. "Oh, mcmanus suggested it? Word."


Posted by: Stanley | Link to this comment | 07-13-09 11:10 PM
horizontal rule
39

Somebody go pitch this to Netflix is all I'm saying.

Get me a hobo!


Posted by: Beefo Meaty | Link to this comment | 07-13-09 11:11 PM
horizontal rule
40

I see that I've been massively pwned and hobo'd. You bastards!


Posted by: Stanley | Link to this comment | 07-13-09 11:13 PM
horizontal rule
41

There's a good amount of post-watching-data they could be using

They collect data on a movie's date-effectiveness?


Posted by: nosflow | Link to this comment | 07-13-09 11:13 PM
horizontal rule
42

There are too many embarrassing movies in my queue or that I've watched for me to want to share that information. Hopefully there would be some way to manipulate your profile.


Posted by: Parenthetical | Link to this comment | 07-13-09 11:13 PM
horizontal rule
43

Somebody go pitch this to Netflix is all I'm saying.

If you sell the idea to them, please send me a nickel. Preferably a shiny one.


Posted by: essear | Link to this comment | 07-13-09 11:14 PM
horizontal rule
44

If they did, they could come up with tailor-made recommendations for what the prospective couple between the members of whom they mediate should watch, based on what's been effective with others with similar movie-watching habits.


Posted by: nosflow | Link to this comment | 07-13-09 11:14 PM
horizontal rule
45

42: do you want a match for your ideal you, or for the real you?


Posted by: nosflow | Link to this comment | 07-13-09 11:15 PM
horizontal rule
46

the prospective couple between the members of whom they mediate

Whoa. Fortran really did break my brain. Does this parse as English for normal people?


Posted by: essear | Link to this comment | 07-13-09 11:16 PM
horizontal rule
47

Should be "between the members of which". Your eagle eyes save the day once again, essear!


Posted by: nosflow | Link to this comment | 07-13-09 11:17 PM
horizontal rule
48

There are too many embarrassing movies in my queue or that I've watched for me to want to share that information. Hopefully there would be some way to manipulate your profile.

I guess you don't use the "Friends" feature that currently exists, then? I've had second thoughts about adding things because of it. Then I remember the reality-TV-watching habits of my friends with otherwise refined tastes, and no longer feel embarrassed.


Posted by: essear | Link to this comment | 07-13-09 11:18 PM
horizontal rule
49

45: Everyone needs to have their secrets, neb.

And no, not really, but I'd prefer to charm them with my good taste first, and then slowly work in the horrifying elements.


Posted by: Parenthetical | Link to this comment | 07-13-09 11:18 PM
horizontal rule
50

48: I do, but the only person I'm friends with actually lived with me, so she's well aware of my bad taste.


Posted by: Parenthetical | Link to this comment | 07-13-09 11:19 PM
horizontal rule
51

It's a good feature, by the way, for roommates, so that if you don't share an account you can still keep tabs on what is in the house for you to pilfer.


Posted by: Parenthetical | Link to this comment | 07-13-09 11:20 PM
horizontal rule
52

21 reminds me of the religious views, as expressed on facebook, of a friend of mine: "There is a magical threshold called alpha = .05, which is the source of all truth".


Posted by: nosflow | Link to this comment | 07-13-09 11:21 PM
horizontal rule
53

52: I can't imagine why.


Posted by: Beefo Meaty | Link to this comment | 07-13-09 11:24 PM
horizontal rule
54

Because being a great truth, the statement in 52 springs to mind whenever other truths are mentioned.


Posted by: Walt Someguy | Link to this comment | 07-13-09 11:38 PM
horizontal rule
55

FORTRAN ATE MY BALLS


Posted by: OPINIONATED GRANDMA | Link to this comment | 07-14-09 12:03 AM
horizontal rule
56

LAST NIGHT FORTRAN SAVED MY LIFE


Posted by: OPINIONATED INDEEP | Link to this comment | 07-14-09 2:18 AM
horizontal rule
57

If it isn't statistically significant, it doesn't tell you anything.

Take that, "impressionistic" historians!

Also, now I know what people are talking about when they talk about running coin-tossing, dice-rolling, trying-to-not-goat-choosing computer simulations.


Posted by: eb | Link to this comment | 07-14-09 4:16 AM
horizontal rule
58

Is there a way on Netflix to have "two users" on one account? I think that my BF and I are screwing up their rating system, because we have different tastes in movies. I'd like to be able to get recommendations for me and separate ones for him.


Posted by: Bostoniangirl | Link to this comment | 07-14-09 5:41 AM
horizontal rule
59

I'd like to see the "rape fantasy" map broken out by gender. There's a big difference between a male respondent being game to play along with his female partner's rape fantasy and a female respondent's being game to play along with her male partner's rape fantasy (heteronormativity preemptively acknowledged). I wouldn't be surprised if some or even most of the differential among states on that question can be explained by variations in the gender composition of the user base.


Posted by: pain perdu | Link to this comment | 07-14-09 6:32 AM
horizontal rule
60

Totally OT: So I just woke from a dream in which I was about to be married to Ogged (whom I have never met), but discovered evidence that he was a serial killer. When he discovered that I was obliquely Tweeting and FB-statusing in the hopes of someone figuring it out and calling the police, he began menacing me with an icepick as I feigned ignorance of his crimes. LizardBreath arrived just in time to save me, tackling Ogged from behind and tying his hands together. It was all very cinematic.

The main reveal came when I realized there was a room in Ogged's house where he hung 8x10 glossies of his intended victims with his weapon of choice suspended over them. One by one, these little displays would disappear, as would the people they represented.

I have no idea why I am dreaming about Ogged. I sort of figure my brain chose a random name and it could just as easily have been any of you. But I do like to think that LizardBreath would be the one to save me if I was in a pickle.


Posted by: A White Bear | Link to this comment | 07-14-09 6:53 AM
horizontal rule
61

60: What did ogged look like in your dream?

8x10 glossies of his intended victims with his weapon of choice suspended over them

I'm assuming he didn't include information about what room he was going to perform the deed in so as to throw Colonel Mustard et alia off the scent?


Posted by: M/tch M/lls | Link to this comment | 07-14-09 6:57 AM
horizontal rule
62

Or, as here, an ice pick-le


Posted by: Di Kotimy | Link to this comment | 07-14-09 6:58 AM
horizontal rule
63

52: I hope your friend corrects for multiple comparisons.
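The worry in numbers: a state-by-state map is roughly 50 comparisons, and if they were independent tests at alpha = .05, the chance of at least one spurious "significant" state is high. The Bonferroni correction shown here is one common fix (others exist); independence of the tests is an assumption made for the sketch.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m

print(round(familywise_error_rate(0.05, 50), 3))  # 0.923
print(bonferroni_threshold(0.05, 50))
```

So with 50 uncorrected comparisons you'd expect to find something "significant" over 90% of the time; requiring p < .001 per state brings the overall false-alarm rate back to about .05.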


Posted by: Moby Hick | Link to this comment | 07-14-09 6:58 AM
horizontal rule
64

61: You know how dreams are, sort of hard to tell, but vaguely along the general Mexican Chris Noth lines.


Posted by: A White Bear | Link to this comment | 07-14-09 7:02 AM
horizontal rule
65

3: It would be nice to see something that represents the percentage breakdown in each state more accurately.

Or as in the case of the question, Should burning your nation's flag be illegal?, have the actual data points plotted. (It was near the bottom of the earlier "Rape Fantasies and Hygiene By State" post.) Here is the resulting map, and it is a nice visualization of where we Commies live (actually it could use a blow-up). Austin stands out in Texas (as do college towns across the country if you look closely), Seattle vs. Tacoma, Denver vs. Colorado Springs, Cleveland and Columbus vs. other Ohio cities, The People's Republic of Greater Vermont (includes parts of New Hampshire and Western Massachusetts). Lots of good stuff.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 7:16 AM
horizontal rule
66

I'm amused by how interesting I find these maps given how essentially bogus I think the underlying data is. N=115,000, which I vaguely seem to recall is fine for a national sample, but of course it's not evenly distributed.

Given how much time they spend snarking on North Dakota, I'm extremely curious as to how many people from ND actually participated.


Posted by: Witt | Link to this comment | 07-14-09 7:18 AM
horizontal rule
67

66: The number of people isn't the problem. You can get away with about 1,200 for a national sample (depending on the CI you want, etc.). It appears to be a self-selected sample. You can't fix that by getting more people.
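A toy simulation of the self-selection point, with invented join rates: suppose the true "yes" share is 50%, but "yes" people are twice as likely to sign up. The estimate then converges to about 67%, and piling on respondents only makes you more confident in the wrong number.

```python
import random

def self_selected_estimate(n, true_yes=0.5, yes_join_rate=0.02, no_join_rate=0.01, seed=0):
    """Share answering 'yes' among n respondents who chose to sign up themselves.

    Assumption: 'yes' people join at twice the rate of 'no' people, so a given
    respondent is a 'yes' with probability proportional to true_yes * yes_join_rate.
    """
    joined_yes = true_yes * yes_join_rate
    joined_no = (1 - true_yes) * no_join_rate
    p_yes = joined_yes / (joined_yes + joined_no)  # 2/3 with the defaults
    rng = random.Random(seed)
    return sum(rng.random() < p_yes for _ in range(n)) / n

for n in (1_200, 115_000):
    print(n, round(self_selected_estimate(n), 3))
```

Both sample sizes land near 0.667 rather than the true 0.5; the larger one is just closer to the biased value.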


Posted by: Moby Hick | Link to this comment | 07-14-09 7:24 AM
horizontal rule
68

I'm pretty fascinated with the MINUTE results. My only real world experience with the state involved the boyfriend who swung far right and left me for Rush Limbaugh after moving there. I gather I've harbored an unfair and totally inaccurate impression of the state all these years.


Posted by: Di Kotimy | Link to this comment | 07-14-09 7:24 AM
horizontal rule
69

Um, Minnesota results. Apparently my bb auto-corrects "MN" to "MINUTE."


Posted by: Di Kotimy | Link to this comment | 07-14-09 7:27 AM
horizontal rule
70

66.2: You can get some idea of that from the map with the points plotted (pretty sparse, but needs a bigger scale map to really see, or the numbers would help as well).

And it is GREAT data, although yes there are of course a number of significant underlying selection biases. But with those caveats it is wonderful stuff (especially if you had (as they do) some basic demographic data to identify some of those selection biases).


Posted by: JP Stormcrow | Link to this comment | 07-14-09 7:27 AM
horizontal rule
71

69: Paging Emerson!


Posted by: JP Stormcrow | Link to this comment | 07-14-09 7:28 AM
horizontal rule
72

The Minnesota results actually line up pretty well with my (limited) experience of the state.

67: I'm not sure what it means that I first read "CI" as "confidential informant." I haven't even seen The Wire!

(You meant "confidence index," right?)


Posted by: Witt | Link to this comment | 07-14-09 7:32 AM
horizontal rule
73

interval


Posted by: JP Stormcrow | Link to this comment | 07-14-09 7:34 AM
horizontal rule
74

Minnesota results

I know Emerson has said this before, but I would guess that MN is probably one of the most polarized rural/city states there is politics wise.


Posted by: CJB | Link to this comment | 07-14-09 7:35 AM
horizontal rule
75

72.2: Confidence interval. When they report a survey in the news, they usually report a confidence interval (e.g. +/- 3%). This is usually the 95% confidence interval (which goes with the alpha


Posted by: Moby Hick | Link to this comment | 07-14-09 7:38 AM
horizontal rule
76

I seem to have lost the end of my last comment.

72.2: Confidence interval. When they report a survey in the news, they usually report a confidence interval (e.g. +/- 3%). This is usually the 95% confidence interval (which goes with the alpha = .05), though the news doesn't always report that part.
The more respondents, the narrower the CI for a given alpha, but diminishing returns set in pretty quickly after 1,000 people. Of course, if you are interested in some sub-group (or in comparing one group to another), you will need more people.
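The diminishing returns are easy to see with the usual normal-approximation margin of error for a proportion. This assumes a simple random sample and uses p = 0.5, the worst case; it says nothing about self-selection bias.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation CI for a proportion (simple random sample)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

for n in (100, 1_000, 1_200, 10_000, 115_000):
    print(f"n = {n:>7,}: +/- {margin_of_error(n):.1%}")
```

Going from 100 to 1,000 people cuts the margin from about 9.8% to about 3.1%; going from 1,200 (about 2.8%) to 115,000 only buys another couple of points, because the margin shrinks with the square root of n.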


Posted by: Moby Hick | Link to this comment | 07-14-09 7:42 AM
horizontal rule
77

76: Ah, thanks, that makes sense.

Of course, if you are interested in some sub-group (or in comparing one group to another), you will need more people.

Right. I have this hazy notion that to do certain kinds of state-by-state comparisons with different subpopulations, the Census Bureau usually surveys 30,000 people, but maybe I'm completely off.


Posted by: Witt | Link to this comment | 07-14-09 7:44 AM
horizontal rule
78

74: That claim was also made frequently about Pennsylvania during the 2008 primary; the neologism "Pennsyltucky" was much bandied about to describe the rural part of the state, at least in my circles. And I know a guy who is always talking about how Vermont was famously, strongly rural right-wing Republican until the last decade or two. And Washington and/or Oregon, I forget which but it's plausible of both, supposedly has very different political climates along the coast and inland. Austin, of course, is a cosmopolitan oasis, I gather.

There's stiff competition for "one of the most polarized rural/city states there is politics wise," I gather.


Posted by: Cyrus | Link to this comment | 07-14-09 7:45 AM
horizontal rule
79

There's stiff competition for "one of the most polarized rural/city states there is politics wise," I gather.

Maybe. I should say that when I say city/rural about MN it is really Minneapolis-St. Paul metro area, everywhere else. If I am looking at the population data correctly over half the people in the state live in the Minn-St. Paul metro area.


Posted by: CJB | Link to this comment | 07-14-09 7:51 AM
horizontal rule
80

77: When comparing sub-populations, the sample size depends on the alpha you want (nearly always .05) as well as how different you expect the sub-populations to be. Not surprisingly, you need a much bigger sample to look for a small difference between two groups than you need to look for a large difference. There are canned power analysis programs to assist with this.
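A back-of-the-envelope version of what those power programs compute, using the textbook normal-approximation formula for comparing two proportions with a two-sided test. Canned tools may use slightly different formulas or continuity corrections, so treat these numbers as ballpark.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate respondents needed in each group to detect p1 vs p2 (two-sided)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # about 1.96 for alpha = .05
    z_beta = z(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

print(n_per_group(0.50, 0.55))  # small difference: roughly 1,560 per group
print(n_per_group(0.50, 0.70))  # large difference: roughly 90 per group
```

A five-point gap needs over fifteen times as many people per group as a twenty-point gap.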


Posted by: Moby Hick | Link to this comment | 07-14-09 7:52 AM
horizontal rule
81

78: The map I link in 65 (flag burning question) illustrates that pretty well. (I really do want a blow up). And in some ways it is the perfect question. For the bigger cities you can see the suburban rings.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 7:52 AM
horizontal rule
82

81: And not saying that the question maps onto Republican/Democrats perfectly, it is more of a traditionalist/rootless cosmopolitan divide.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 7:55 AM
horizontal rule
83

Maybe. I should say that when I say city/rural about MN it is really Minneapolis-St. Paul metro area, everywhere else. If I am looking at the population data correctly over half the people in the state live in the Minn-St. Paul metro area.

Rural/city: I learned from this week's New Yorker that Duluth is very very Democratic.


Posted by: Cryptic ned | Link to this comment | 07-14-09 7:56 AM
horizontal rule
84

It appears to be a self-selected sample. You can't fix that by getting more people.

Actually, you can, if you're clever about it, by properly stratifying the sample* (which is easier to do the more respondents you get). As Stormcrow suggests in 70, making demographic corrections goes a long way to countering the sample bias of self-selection. Obviously some bias remains, but there is a credible argument that this particular bias is no worse, and possibly a little better, than that introduced by non-response.

I did some work for a polling outfit that bet their business on the truth of the latter proposition, and thus far they have done well. They poll exclusively online, recruiting their respondents in a fashion that allows them to be demographically sorted. The trick in their business model is that they compensate the respondents, and calibrate the compensation carefully to recruit enough respondents with various demographic traits in order to round out the sample with the lowest possible outlay. So, for example, if a poll of the online population tends to undersample old people in rural areas, they will offer higher compensation to those individuals.

I won't identify the company because it's potentially personally identifying, but if you read a certain highbrow newsweekly published in London you will see their polling frequently.

*N.B. I'm not claiming okcupid has done this, just that it's possible in theory and in practice
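The basic idea of a demographic correction, sketched as simple post-stratification: compute the answer rate within each demographic group, then weight each group by its share of the real population rather than its share of the sample. The strata, counts, and population shares below are invented, and this is not a description of what the polling outfit or OkCupid actually does. It also only corrects for the traits you weight on.

```python
def poststratified_estimate(sample: dict, population_share: dict) -> float:
    """Weight each stratum's 'yes' rate by its population share, not its sample share.

    sample maps stratum -> list of 0/1 answers; population_share maps stratum -> fraction.
    """
    total = 0.0
    for stratum, share in population_share.items():
        answers = sample[stratum]
        total += share * (sum(answers) / len(answers))
    return total

# Invented online sample that over-represents young urban users 4 to 1.
sample = {
    "young_urban": [1] * 480 + [0] * 320,   # 800 people, 60% yes
    "old_rural":   [1] * 40 + [0] * 160,    # 200 people, 20% yes
}
population_share = {"young_urban": 0.5, "old_rural": 0.5}  # assumed true split

raw = sum(sum(v) for v in sample.values()) / sum(len(v) for v in sample.values())
print(f"raw: {raw:.2f}, weighted: {poststratified_estimate(sample, population_share):.2f}")
```

The raw sample says 52% yes; reweighting to the assumed 50/50 population brings it to 40%. Oversampling (as in 93) helps because each stratum then has enough people for its own rate to be stable.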


Posted by: pain perdu | Link to this comment | 07-14-09 7:57 AM
horizontal rule
85

There's stiff competition for "one of the most polarized rural/city states there is politics wise," I gather.

The residents of New York and California would surely agree.


Posted by: pain perdu | Link to this comment | 07-14-09 7:59 AM
horizontal rule
86

Erm, what makes Minnesota unusual is the rural white strongly democratic parts of the state, not a rural/urban split.


Posted by: David Weman | Link to this comment | 07-14-09 7:59 AM
horizontal rule
87

Actually, considering Minnesota hasn't been strongly democratic, the urban/rural divide in voting must be pretty small.


Posted by: David Weman | Link to this comment | 07-14-09 8:01 AM
horizontal rule
88

There aren't very many "rural" people, period. Minnesota contains lots of people who believe themselves to be living in small towns, actually live in exurbs, and are religious maniacs.


Posted by: Cryptic ned | Link to this comment | 07-14-09 8:04 AM
horizontal rule
89

Paging Emerson!

You can reach him by snailmail at this address.


Posted by: pain perdu | Link to this comment | 07-14-09 8:06 AM
horizontal rule
90

84: Yes, you can certainly do better internet-based surveys. What you describe is an attempt to get as far as possible from a self-selected sample given the constraints of the medium. I'm guessing that they use well below 100k respondents to do this (how could they afford it otherwise). My point is that numbers don't matter as much as where the numbers come from. There is a lot of ground between an ideal random-sample-with-replacement and what is actually done given non-response issues, cost, etc. But, adding more people, past a certain point, is not improving the methodology.


Posted by: Moby Hick | Link to this comment | 07-14-09 8:10 AM
horizontal rule
91

contains lots of people who believe themselves to be living in small towns, actually live in exurbs,

Isn't this true of much of the country now (by population, anyway)?


Posted by: soup biscuit | Link to this comment | 07-14-09 8:11 AM
horizontal rule
92

Those maps are nothing but an exercise in misinformation and piss me off.

However! A fine quatorze juillet to all. This is a fun read.

. . . what the French Revolution was really like - a digestive eruption of all the basest instincts in the lowest elements of society, led by power-drunk ideologues of the radical Left. . . . was utterly unlike the American rebellion against the English colonial officials - which amounted to a regional secession, led by the responsible members of the upper middle class.
Looking back, we see that black slaves in America at the Founding lived much worse lives than did poor Frenchmen, and had vastly fewer rights. Would that have justified a massive slave rebellion, ending with the murder of George Washington, Thomas Jefferson, and every other slave-owning aristocrat?

Sorry, was there a question in there?

Also, Happy Woody Guthrie's Birthday!


Posted by: Sir Kraab | Link to this comment | 07-14-09 8:14 AM
horizontal rule
93

90: your surmises are correct. And I agree with you on all counts.

The outfit I was talking about deliberately oversamples by a much larger factor than an offline pollster (which they can easily do because it is much cheaper than telephone or in-person polling), not to increase the sample size per se, but to facilitate the stratification.


Posted by: pain perdu | Link to this comment | 07-14-09 8:14 AM
horizontal rule
94

92.1 is shorter me, except that I don't get pissed-off so much.


Posted by: Moby Hick | Link to this comment | 07-14-09 8:18 AM
horizontal rule
95

Fun maps. Results are entirely unreliable, but very intuitive, aren't they? Emerson would have a field day with them.


Posted by: David Weman | Link to this comment | 07-14-09 8:18 AM
horizontal rule
96

Erm, what makes Minnesota unusual is the rural white strongly democratic parts of the state

What parts of the state would those be?


Posted by: CJB | Link to this comment | 07-14-09 8:19 AM
horizontal rule
97

The north.


Posted by: David Weman | Link to this comment | 07-14-09 8:21 AM
horizontal rule
98

The north.

How are you judging that?


Posted by: CJB | Link to this comment | 07-14-09 8:25 AM
horizontal rule
99

98: He has a compass.


Posted by: Moby Hick | Link to this comment | 07-14-09 8:26 AM
horizontal rule
100

Duluth is the main rural city in the north.


Posted by: Cryptic ned | Link to this comment | 07-14-09 8:27 AM
horizontal rule
101

60: Also OT, I had an odd AWB-related Unfogged dream last night. I was in her kitchen, and there was a tureen of soup with a banana in it. She suggested that I remove the banana, since it wouldn't taste good in the soup, and then we ate the soup. Also, all of our conversation was via Unfogged comments.


Posted by: emdash | Link to this comment | 07-14-09 8:29 AM
horizontal rule
102

92.1, 94, 95: Unreliable for what? At some level you folks are all utterly barking mad. Like the statistician who had his head stuck so far up his methodology that he was bitching about the pop-soda map. Get off your scientistic high horses and meet the world. Flaws and all, this is probably among the best datasets about broad-based public attitudes in the history of the world.

Hyperbole much JP? A little, when I'm overwrought.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 8:29 AM
horizontal rule
103

On a very cursory glance, I appear to be mostly right:

http://www.fairvote2020.org/2004/12/bush-kerry-by-county.html

Smaller rural/urban divide than most places. The mostly Dem rural parts seem more common in Wisconsin, but that may be just in hectares, not people.

I may well be talking nonsense.


Posted by: David Weman | Link to this comment | 07-14-09 8:30 AM
horizontal rule
104

I hate Duluth.


Posted by: Kobe | Link to this comment | 07-14-09 8:30 AM
horizontal rule
105

Q: Do you like Hibbing?
A: Why you dirty-minded man, I've never hibbed.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 8:31 AM
horizontal rule
106

102: "Flaws and all, this is probably among the best datasets about broad-based public attitudes in the history of the world."

I'll try to estimate a p value for that statement and get back to you. Though, 'among the best' isn't easy to quanitfy.


Posted by: Moby Hick | Link to this comment | 07-14-09 8:35 AM
horizontal rule
107

Or to quantify.


Posted by: Moby Hick | Link to this comment | 07-14-09 8:36 AM
horizontal rule
108

"Among the best" is not a strong statement. I, for example, am among the best 290 million soccer players in the United States.


Posted by: Cryptic ned | Link to this comment | 07-14-09 8:44 AM
horizontal rule
109

JP is right. This is a gold mine of information. Unfortunately, they're presenting it in a really lousy way.


Posted by: essear | Link to this comment | 07-14-09 8:45 AM
horizontal rule
110

This is a gold mine of information.

It's so hard to read tone. Are you guys really serious? OKCupid users IMO are extremely unrepresentative of the general population in terms of age (younger), education (more), social class (higher) and technological prowess (way, way, way more).

They're probably pretty representative in broadly-defined ethnicity (that is, the six widely used categories) and religion (maybe a slight over-representation of out atheists). But I would be very hard pressed to think that their opinions on hot-button social issues or moral questions could tell me anything reliable about 305 million Americans, most of whom are older, poorer, and less-well-off than they are.


Posted by: Witt | Link to this comment | 07-14-09 8:53 AM
horizontal rule
111

I think essear and JP are just trolling the quants.


Posted by: Moby Hick | Link to this comment | 07-14-09 8:55 AM
horizontal rule
112

On a very cursory glance, I appear to be mostly right:

A large part of the North that went for Kerry is covered by the Superior National Forest and the Boundary Waters Wilderness area. I think that map tends to be somewhat misleading because of the very small number of people in those areas.


Posted by: CJB | Link to this comment | 07-14-09 8:58 AM
horizontal rule
113

Yes, but CJB, in for example California, Oregon, Washington, Pennsylvania, Illinois, there were more 60-75% districts for Bush. In Minnesota, as in Wisconsin, less solidly blue, there were few.


Posted by: David Weman | Link to this comment | 07-14-09 9:06 AM
horizontal rule
114

Let's try that again:

Yes, but CJB, in for example California, Oregon, Washington, Pennsylvania, Illinois, there were plenty of 60-75% districts for Bush. In Minnesota and Wisconsin, less solidly blue, there were just a few.


Posted by: David Weman | Link to this comment | 07-14-09 9:10 AM
horizontal rule
115

Part of the problem is that I am conflating liberal and Democrat, which I shouldn't be. MN may have a decent number of Democrats. I don't usually go around asking people their political affiliations, but having spent some time in those areas, I would not consider the people there liberal.


Posted by: CJB | Link to this comment | 07-14-09 9:17 AM
horizontal rule
116

Do not know how it has gone recently, but at one time northern Minnesota had a fairly strong union presence due to the iron ore mining, so it was smallish towns/rural with a more Democratic slant than usual for that mix. So that's one key to understanding Minnesota political demographics.

"What do you mean key? Mesabi?"


Posted by: JP Stormcrow | Link to this comment | 07-14-09 9:23 AM
horizontal rule
117

I'm not trolling anyone, and I don't know what "quant" means in this context (surely we don't have multiple people working in quantitative finance in this thread?).

OKCupid users IMO are extremely unrepresentative of the general population in terms of age (younger), education (more), social class (higher) and technological prowess (way, way, way more).

Well, then if nothing else it's a huge dataset on geographical variations in attitudes among middle-class, internet-savvy 20-40 year olds, which is not uninteresting in itself. (Look at how many psych studies use 18- to 22-year-old university students. One doesn't have to sample the whole population to learn interesting things.)

But I would guess that their dataset is large enough that, by being clever, one can extract reasonably good information about the population as a whole with at least coarse geographic breakdowns. This would obviously take much more work than has gone into making these maps.


Posted by: essear | Link to this comment | 07-14-09 9:25 AM
horizontal rule
118

The philosophy of the modern GOP, brought to you by Sen. Jeff Sessions: "Empathy for one party is always prejudice against another."


Posted by: Sir Kraab | Link to this comment | 07-14-09 9:29 AM
horizontal rule
119

117: By 'quant' I mean someone who uses quantitative research methods, particularly in the social sciences.


Posted by: Moby Hick | Link to this comment | 07-14-09 9:32 AM
horizontal rule
120

100: rural city

No.


Posted by: Cyrus | Link to this comment | 07-14-09 9:32 AM
horizontal rule
121

I may well be talking nonsense.

New mouseover text?


Posted by: Sir Kraab | Link to this comment | 07-14-09 9:33 AM
horizontal rule
122

117: Looking for geographic patterns is one of the tasks for which a survey like OKCupid is worst adapted. Things like tech savvy, income, age and education vary greatly with geography. When a psychologist does an experiment with students, they are looking at how certain traits interact with or relate to each other. They aren't trying (or shouldn't be trying) to estimate prevalence.


Posted by: Moby Hick | Link to this comment | 07-14-09 9:35 AM
horizontal rule
123

(Look at how many psych studies use 18- to 22-year-old university students. One doesn't have to be a stickler to find this suspect.)


Posted by: nosflow | Link to this comment | 07-14-09 9:37 AM
horizontal rule
124

come now nosflow, surely it's a pretty fair methodology to find the pulse of 18-22 year old university students.....


Posted by: soup biscuit | Link to this comment | 07-14-09 9:47 AM
horizontal rule
125

118: Gotta love that the only other judge specifically identified as a threat to The American Way is Justice Ginsburg. Damned women-folk with their golldarn empathy.


Posted by: Di Kotimy | Link to this comment | 07-14-09 9:48 AM
horizontal rule
126

The judicial system is a zero-sum game. If women yes, then people no.


Posted by: Cryptic ned | Link to this comment | 07-14-09 9:49 AM
horizontal rule
127

125: We can't help it. It's the hormones.


Posted by: Sir Kraab | Link to this comment | 07-14-09 10:01 AM
horizontal rule
128

117: By 'quant' I mean someone who uses quantitative research methods, particularly in the social sciences.

It's possible that my lack of social science training makes me effectively innumerate for these purposes, but: really? You're saying this database of order a million responses each to hundreds or thousands of questions does not contain interesting information?

It's not how one would design an experiment to learn about these things, but it's there. It's observational data. And you don't think it can be mined to learn interesting things?


Posted by: essear | Link to this comment | 07-14-09 10:01 AM
horizontal rule
129

116.2: Nicely done.

Stormcrow has it right. While the Arrowhead (the northern pointy bit in MN) is no longer the hotbed of syndicalism it once was and, as per CJB, it's not densely populated, it is a big part of what helps get Dems elected on a state-wide level.

I don't think we're that different from national political trends. We're a bit like an over-sized Vermont, with a more cosmopolitan urban center (thanks in no small part to recent immigrant populations). Where I think MN politics can be distinct is in having leftist or progressive politics get more of a seat at the table, particularly in the Twin Cities. Like everywhere else, there are plenty of Limbaugh-loving assholes here too, and, like everywhere else, they cluster especially densely in the suburbs.


Posted by: Jimmy Pongo | Link to this comment | 07-14-09 10:04 AM
horizontal rule
130

I suspect some people are talking past each other on the subject of the OKCupid data.

one set is saying "but the selection method has bias!"

another "wow, huge database, lots of interesting stuff in there!"

These points aren't contradictory.


Posted by: soup biscuit | Link to this comment | 07-14-09 10:05 AM
horizontal rule
131

128: On a quick examination, I wasn't able to tell how they got their respondents and other crucial information. You may or may not be able to learn something from it.

But it is certainly possible to collect millions of responses and not learn anything accurate (except incidentally). The classic example is the Literary Digest poll for the 1936 election. They got 2 million responses and predicted Landon would beat FDR by a huge margin. Gallup got the right answer with a relatively tiny sample because he had a far better method.


Posted by: Moby Hick | Link to this comment | 07-14-09 10:10 AM
horizontal rule
132

130: You're right, of course. I'm just annoyed by the accusation of trolling.


Posted by: essear | Link to this comment | 07-14-09 10:11 AM
horizontal rule
133

131 128: On a quick examination, I wasn't able to tell how they got their respondents and other crucial information.

It's a free dating site. People sign up and answer questions to try to get better matches. So, yes, it's a very self-selected sample.

On the other hand, for each person on the site, the site owners know how they answered every question, plus their self-reported age, gender, race, education level, languages spoken, and income, most of which are relatively accurately self-reported, I think (filtering out the people who say their income is $1 million+ when it obviously isn't). Surely with sufficient effort one can use all this information to learn interesting things.

I never said the raw yes/no counts were interesting. I think it would take serious work to try to get controlled and reliable numbers out of the data. But it is a huge amount of data.


Posted by: essear | Link to this comment | 07-14-09 10:15 AM
horizontal rule
134

Well, then if nothing else it's a huge dataset on geographical variations in attitudes among middle-class, internet-savvy 20-40 year olds, which is not uninteresting in itself.

Oh, for sure. I was just reacting to the OK Cupid blog's apparent belief that they are actually describing geographic trends among Americans, rather than a subgroup.

On preview: Pwned by 130, but I'm posting because I figure essear deserves to hear that I don't think he's (?) a troll.


Posted by: Witt | Link to this comment | 07-14-09 10:15 AM
horizontal rule
135

110: It's so hard to read tone.

Yes, it's hard even to know myself sometimes when I'm just being authentically obtuse or obtuse for effect. But in this case I am being mostly serious (ignoring the hyperbole), along the lines that essear discusses in 117. In particular, I am not assuming that I can necessarily learn about the real parameters for all of America, but rather the expressed attitudes on interesting topics from a fairly broad sweep of people who have meeting other people for romance (pretty universal) as the common denominator. And some of the demographic stuff could be removed by analysis. But, true, it certainly would not have an adequate representation of contentedly married older folk, for instance.

But it does have the advantage of being answers in which the respondents had an actual interest (even if they do not reflect their "real" values), and furthermore, answers which the respondents were not aware would be used for this purpose (so in some sense they are being spied on in aggregate).

And on preview, yes to 130. Anyway, I would love to see the actual dataset. And it was the map of the actual points for the flag burning question that really piqued my interest.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 10:16 AM
horizontal rule
136

134.last: Me on the other hand...


Posted by: JP Stormcrow | Link to this comment | 07-14-09 10:17 AM
horizontal rule
137

"It's not how one would design an experiment to learn about these things, but it's there. It's observational data. And you don't think it can be mined to learn interesting things?"

But it never will.


Posted by: David Weman | Link to this comment | 07-14-09 10:17 AM
horizontal rule
138

And it was the map of the actual points for the flag burning question that really piqued my interest.

I really don't think you should trust that map to find you a safe place to practice your vile, unpatriotic deeds, JP.


Posted by: M/tch M/lls | Link to this comment | 07-14-09 10:20 AM
horizontal rule
139

"And a word about statistical validity: the best questions on OkCupid have been answered over a million times. Therefore we have unique insights into the American mindset. A quick comparison:

OkCupid Question Popularity

Old media could only get 3,050 people to answer a poll about Obama. And it was enough to call the election with confidence.
[...]
300,000 people have answered that question in 3 parts, and there are thousands more questions with as large or larger data sets."

He could possibly know it's nonsense (it's a lighthearted post), but it doesn't sound like it to me.


Posted by: David Weman | Link to this comment | 07-14-09 10:22 AM
horizontal rule
140

I'll be practice empathy anywhere I want.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 10:22 AM
horizontal rule
141

Me on the other hand...

Awww, bubelah, not you either.

plus their self-reported age, gender, race, education level, languages spoken, and income, most of which are relatively accurately self-reported

Ooh, now I'm wildly curious about whether they have a bulge in 1/1 birthdays. Since they (ahem) send birthday e-mails, I'm sure OK Cupid has stats on those sorts of things.

Also, I wonder whether there is a disproportionate amount of __/__/_0 birthdays (among people who signed up this year), due to people who want to think of themselves as 19, 29, 39, etc.


Posted by: Witt | Link to this comment | 07-14-09 10:22 AM
horizontal rule
142

141: All you'd have to do is get access to the data, get it into some form usable by a statistical package, and do the analysis. Which is why 137 is accurate.


Posted by: Moby Hick | Link to this comment | 07-14-09 10:27 AM
horizontal rule
143

130
These points aren't contradictory.

Aren't they? How interesting is a self-selected database just because it's large? 110 points out a number of demographic differences between OKCupid users and the general population, but misses the biggest of all: OKCupid users are also (supposedly) overwhelmingly single. It's not an extrapolation from college students, but it might as well be.

The OKCupid map could be very useful if similar dating services showed similar data and I wanted to pick the dating service that was right for my area. As it is, it really reveals very little about actual attitudes in our culture as a whole, though.

Admittedly, this is all ex recto; the link in the original post is blocked for me here at work. Oh well.


Posted by: Cyrus | Link to this comment | 07-14-09 10:27 AM
horizontal rule
144

140: I'm not sure what your specific malfunction is, but you've got my empathy anyway.


Posted by: M/tch M/lls | Link to this comment | 07-14-09 10:27 AM
horizontal rule
145

Or maybe you were just trolling the grammarians?


Posted by: M/tch M/lls | Link to this comment | 07-14-09 10:29 AM
horizontal rule
146

How interesting is a self-selected database just because it's large?

"self selected" is slippery, here. It applies to almost all sampling, after all --- as a matter of degree. How much you can account for selection issues, and how decorrelated the information you are asking about is from the selection bias is what makes all this sort of stuff hairy.

I think it's pretty clear you'd have trouble probing that data set for population wide opinions on a lot of issues, but the obvious demographic skew is hardly narrow enough to make the data uninteresting.

I haven't any opinion on the posted polling info, note; I'm talking generally here.


Posted by: soup biscuit | Link to this comment | 07-14-09 10:31 AM
horizontal rule
147

137 is probably true, at least externally.

A lot of companies are selling this sort of data to each other though.


Posted by: soup biscuit | Link to this comment | 07-14-09 10:33 AM
horizontal rule
148

140: be And I'll do it with typos! Up against the wall Redneck Grammarians!

144: I do have me a good case of Yglesias's Disease*, that is for sure (I wrote a whole long comment on that over at Berube's place once, but will spare folks the details here. Short answer: In addition to honestly earned grammatical ignorance, I also think that I have a mild cognitive deficit that also manifests itself in other ways (ADD'y and Tourette'sish)).

145: And quit fucking posting relevant material before I can respond.

*But I do hope that if I were in his position I would recognize it and get a fracking copy editor.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 10:33 AM
horizontal rule
149

92: Looking back, we see that black slaves in America at the Founding lived much worse lives than did poor Frenchmen, and had vastly fewer rights. Would that have justified a massive slave rebellion, ending with the murder of George Washington, Thomas Jefferson, and every other slave-owning aristocrat?

Um... yes?


Posted by: ajay | Link to this comment | 07-14-09 10:33 AM
horizontal rule
150

149: Somebody should do an online poll to get a definitive answer.


Posted by: Moby Hick | Link to this comment | 07-14-09 10:36 AM
horizontal rule
151

Shorter 148.2: I coulda been a New Haven firefighter if it weren't for that "wise" Latina! But thanks to Samuel "White Like Me" Alito and John "I Fucking Fooled Them" Roberts there is hope.


Posted by: JP Stormcrow | Link to this comment | 07-14-09 10:36 AM
horizontal rule
152

I should have put 150 on the new thread.


Posted by: Moby Hick | Link to this comment | 07-14-09 10:39 AM
horizontal rule
153

There are also two major uses for this kind of dataset, which I think are subject to different criticisms:

1) Assessing population-wide views, such as trying to assess what the answers to these questions would be if we could actually poll everyone in America.

This is essentially an issue of determining averages for a broad sweep of the population, which would be highly affected by the self-selection bias in the case of these culturally relevant questions where (I believe most of us suspect) the self-selected group has significantly different views from the broad population. This is where further data-mining and stratification would be especially necessary to try to produce accurate estimates from the dataset.


2) Assessing the variation in these views across the country (or other forms of group slicing).

For these purposes, it doesn't matter as much that the average response is skewed by self-selection bias. What matters instead is the degree to which answers/views correlate with geography (or whatever other form of slicing) after controlling for the relevant demographics. If you think that regional differences in opinion are actually due to a population-wide effect rather than a pure demographic effect or a concentrated subgroup within geographic regions (and if the latter, one that would either disproportionately appear or fall out of the self-selected sample), then this methodology remains fairly valid for showing the potential spread of opinions and where various geographic regions lie on the spectrum. Even better, it controls for demographic effects, which means you get an idea of how views differ within a relatively similar demographic, rather than say a culturally conservative skew due entirely to an older population.

Of course, if you want to use the data to determine the geographic spread in opinions including differences in demographics (say, for elections or something like that), then you again need to get back to slicing, dicing, and stratifying the data to pull out anything useful. But still, there's at least one use for which this data should be pretty decent without too much work.


Posted by: Po-Mo Polymath | Link to this comment | 07-14-09 10:41 AM
horizontal rule
154

153.2: The biggest issue I could see with #2 would be if OKCupid had a strongly different demographic or reputation in different areas (for instance if in some areas there were very strong regional or local competitors among specific demographics such as religious folk or what have you).


Posted by: JP Stormcrow | Link to this comment | 07-14-09 10:50 AM
horizontal rule
155

153.2 sounds like "If you assume that this information tells you something useful about geographical distribution, then it tells you something useful about geographical distribution," but I'm probably missing something.


Posted by: Robert Halford | Link to this comment | 07-14-09 10:58 AM
horizontal rule
156

Actually, you can, if you're clever about it, by properly stratifying the sample* (which is easier to do the more respondents you get). As Stormcrow suggests in 70, making demographic corrections goes a long way to countering the sample bias of self-selection. Obviously some bias remains, but there is a creditable argument that this particular bias is no worse, and possibly a little better, than that introduced by non-response.

I don't think I agree with this. I could see it working for some samples, but the self-selection associated with OKCupid is just too weird to be controlled by observables. The 50-year-old left-handed guys living in the Northeast subscribing to OKCupid will still be quite different from the typical 50-year-old left-handed Northeastern guy in the general population.

On 153.2, I agree with Halford, and would say further that the geographic selection for OKCupid seems like it could be very different than the geographic selection in the population as a whole. A 30 year old OKCupid subscriber in North Dakota is likely to be atypical in ways that a 30 year old OKCupid subscriber in NY City is not.


Posted by: PGD | Link to this comment | 07-14-09 11:32 AM
horizontal rule
157

155: Well, I'm mostly describing the necessary precondition for this to tell you something about geographic spreads, which is mostly that a) they exist, and b) the geographic differences in attitude correlate positively across demographic groups. I think both are pretty reasonable assumptions.


Posted by: Po-Mo Polymath | Link to this comment | 07-14-09 12:00 PM
horizontal rule
158

157: A positive correlation alone would tell you something valid about the direction of geographic differences, but couldn't you still be wildly off about the magnitudes?


Posted by: PGD | Link to this comment | 07-14-09 12:23 PM
horizontal rule
159

Using the commonplace heuristic of "social science research results that reinforce my preconceived notions are presumptively valid", I will point out that the outlier status of WV on the guns question will stand up to any amount of additional scrutiny.


Posted by: Knecht Ruprecht | Link to this comment | 07-14-09 12:30 PM
horizontal rule
160

158: Yeah, absolutely. Well, except for the magnitude within the self-selected group. In the case of something like OKCupid, which will probably select for more educated, internet-using, 20-30 somethings, that self-selected group may give a perfect view into the magnitudes of differences among a potential dating pool even for those of us who haven't used OKCupid.


Posted by: Po-Mo Polymath | Link to this comment | 07-14-09 12:52 PM
horizontal rule