The Ecological Fallacy


U.S. states with proportionally more immigrants have proportionally more households with income above $100k.[1] Ergo, immigrants are more likely than non-immigrants to have household incomes above $100k.

Foreign Born vs Income

There’s a strong positive correlation between the proportion of a state’s residents who are foreign born and the proportion of its households with income over $100k.

Hopefully something feels off about that logic. Because it’s wrong. Actually the relationship between income and being an immigrant at the individual level is the opposite.

Compared with residents born in America, a lower proportion of immigrants have household incomes above $100k.

Each column above sums to 100%; for example, 77.9% of immigrants in America have a household income below $100k.[2]

Deducing from the first chart that immigrants are more likely to be well-off is committing the ecological fallacy—attributing qualities at the individual level because of a relationship at a group level.[3]
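To see how the two levels can disagree, here is a minimal sketch in Python. The five states and all of their counts are made up for illustration (they are not Current Population Survey figures), and the `pearson` helper is just a plain textbook correlation. The setup assumes that richer states attract more immigrants, while within every state immigrants are less likely than natives to clear $100k.

```python
# Hypothetical counts, invented purely for illustration (not real survey data).
# Each tuple: (state, natives, natives_over_100k, immigrants, immigrants_over_100k)
STATES = [
    ("A", 9800, 1176, 200, 12),
    ("B", 9500, 1425, 500, 38),
    ("C", 9000, 1800, 1000, 100),
    ("D", 8000, 2240, 2000, 280),
    ("E", 7300, 2409, 2700, 446),
]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Group level: one data point per state.
immigrant_share = [imm / (nat + imm) for _, nat, _, imm, _ in STATES]
high_income_share = [
    (nat_hi + imm_hi) / (nat + imm) for _, nat, nat_hi, imm, imm_hi in STATES
]
state_level_r = pearson(immigrant_share, high_income_share)

# Individual level: pool everyone and compare the two groups directly.
native_rate = sum(s[2] for s in STATES) / sum(s[1] for s in STATES)
immigrant_rate = sum(s[4] for s in STATES) / sum(s[3] for s in STATES)

print(f"state-level correlation: {state_level_r:.2f}")
print(f"natives over $100k:      {native_rate:.1%}")
print(f"immigrants over $100k:   {immigrant_rate:.1%}")
```

In this toy data the state-level correlation is strongly positive, yet immigrants are less likely than natives to be above $100k, both pooled across states and within each state. The group-level pattern comes from where immigrants live, not from how individual immigrants fare.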

That example was pretty easy to catch, not least because it feels intuitive that immigrants would tend to have lower income than non-immigrants. But not all ecological fallacies are so easy to spot.

For example, there’s a negative correlation between a state’s median income and the percent of its 2012 presidential election vote that went to Romney.


State median income is negatively correlated with proportion of the state’s vote in the 2012 election that went to Romney.[4]

It’s easy to picture rich and liberal cities like San Francisco and New York, hear the phrase “latte liberal” a couple times, and believe that higher income is in fact correlated with voting Democratic.

At an individual level, though, higher income is associated with voting Republican.

Higher income is associated with voting for Republicans.
This image is from a great New York Times blog post by Andrew Gelman and Avi Feller, “Red vs. Blue in a New Light.” Gelman popularized the recognition that the traditional red-blue electoral college map often leads to conclusions that suffer from the ecological fallacy.

The (simplified) explanation for this apparent paradox? Across the country, lower-income folk tend to vote Democratic; within blue states, upper-income folk also vote Democratic, but in red states they vote Republican. For more on this, read Andrew Gelman’s very awesome book Red State, Blue State, Rich State, Poor State, or at least read this paper of his [pdf] about it.
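That explanation can be sketched with two made-up states. All of the vote counts below are hypothetical (they are not real election returns), and `republican_share` is an illustrative helper. The assumed setup follows the explanation above: income barely matters in the richer blue state, but matters a lot in the poorer red state.

```python
# Hypothetical vote counts, invented for illustration (not real election data).
# state -> income group -> (total voters, Republican voters)
VOTES = {
    "poorer red state": {"low": (7000, 3500), "high": (3000, 2100)},
    "richer blue state": {"low": (4000, 1600), "high": (6000, 2520)},
}


def republican_share(cells):
    """Republican share across an iterable of (voters, republican_voters) pairs."""
    cells = list(cells)
    return sum(rep for _, rep in cells) / sum(total for total, _ in cells)


# Group level: Republican share of each state's total vote.
state_share = {
    state: republican_share(groups.values()) for state, groups in VOTES.items()
}

# Individual level: Republican share by income group, pooled across states.
income_share = {
    income: republican_share(groups[income] for groups in VOTES.values())
    for income in ("low", "high")
}

for state, share in state_share.items():
    print(f"{state}: {share:.1%} Republican")
for income, share in income_share.items():
    print(f"{income}-income voters: {share:.1%} Republican")
```

In these toy numbers the richer state is the less Republican one, yet high-income voters are more Republican than low-income voters once everyone is pooled. Both statements are true at once; the fallacy is reading the first as if it were the second.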

Making judgments based on group-level data isn’t always bad; sometimes group-level data is more accurate [pdf], and at the very least it’s easier to gather. Quite often, conclusions drawn from group-level data are correct, as with a famous 1968 dataset correlating smoking with various cancers.


At the state level, consumption of cigarettes is positively correlated with deaths from lung cancer.[5]

So. Be cautious anytime someone points to group-level data to say something about members of that group. Sometimes the fallacious thinking will be obvious, but often it will not.

 

Notes

[1]. Unless otherwise noted, all data comes from the Current Population Survey; these figures were calculated from 2008-2012 data.

[2]. Income is presented in bands in the Current Population Survey (e.g., $45,000-$47,499). Running a ranked (nonparametric) t-test across the whole range of incomes also indicates that immigrants tend to have lower incomes.

[3]. Thanks to David Freedman at UC Berkeley for this example from his technical report on the ecological fallacy, which was itself inspired by the original paper discussing the ecological fallacy at length (Robinson, 1950).

[4]. Voting data pulled from Wikipedia.

[5]. This dataset, made famous by Fraumeni (1968), was pulled from CMU’s StatLib.
