Advertisement: Statwing makes an easy-to-use data analysis tool. Click through any of the images below to see that analysis in Statwing (and play around with related data).
U.S. states with proportionally more immigrants have proportionally more households with income above $100k. Ergo, immigrants are more likely than non-immigrants to have household incomes above $100k.
Hopefully something feels off about that logic. Because it’s wrong. Actually the relationship between income and being an immigrant at the individual level is the opposite.
Deducing from the first chart that immigrants are more likely to be well-off is committing the ecological fallacy: inferring something about individuals from a relationship that has only been observed between groups.
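To see how a group-level pattern and an individual-level pattern can point in opposite directions, here is a minimal sketch in Python. Every number in it is made up for illustration (four hypothetical states, not Current Population Survey data), and the `pearson` helper is just a hand-rolled correlation so the example needs nothing beyond the standard library.

```python
# Hypothetical counts per state; all numbers are invented for illustration.
# (immigrants, immigrants earning > $100k, non-immigrants, non-immigrants earning > $100k)
states = {
    "A": (50, 3, 950, 114),
    "B": (100, 9, 900, 162),
    "C": (200, 28, 800, 208),
    "D": (300, 54, 700, 224),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Group level: one point per state.
immigrant_share = [imm / (imm + nat) for imm, _, nat, _ in states.values()]
high_income_share = [
    (imm_hi + nat_hi) / (imm + nat) for imm, imm_hi, nat, nat_hi in states.values()
]
group_corr = pearson(immigrant_share, high_income_share)

# Individual level: pool every person across all states.
immigrant_rate = sum(s[1] for s in states.values()) / sum(s[0] for s in states.values())
native_rate = sum(s[3] for s in states.values()) / sum(s[2] for s in states.values())

print(f"state-level correlation: {group_corr:.2f}")
print(f"immigrants above $100k: {immigrant_rate:.1%}")
print(f"non-immigrants above $100k: {native_rate:.1%}")
```

In this toy data the state-level correlation is strongly positive, yet the pooled immigrant rate is lower than the non-immigrant rate. The hypothetical immigrants simply live disproportionately in states where everyone earns more.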
That example was pretty easy to catch, not least because it feels intuitive that immigrants would tend to have lower income than non-immigrants. But not all ecological fallacies are so easy to spot.
For example, there’s a negative correlation between per capita income in a state and the percent of the 2012 presidential election vote that went to Romney.
It’s easy to picture rich and liberal cities like San Francisco and New York, hear the phrase “latte liberal” a couple times, and believe that higher income is in fact correlated with voting Democratic.
At an individual level, though, higher income is associated with voting Republican.
The (simplified) explanation for this apparent paradox? Across the country, lower-income folk tend to vote Democratic; within blue states, upper-income folk also vote Democratic, but in red states they vote Republican. For more on this, read Andrew Gelman’s very awesome book Red State, Blue State, Rich State, Poor State, or at least read this paper of his [pdf] about it.
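The mechanism described above can be sketched with a few lines of Python. The four states and all vote counts below are hypothetical, chosen only so that the income gap in voting is small in the "blue" states and large in the "red" ones; they are not real election returns.

```python
# Hypothetical voters per state; all numbers are invented for illustration.
# (lower-income voters, of whom Republican, upper-income voters, of whom Republican)
states = {
    "Blue1": (400, 140, 600, 228),
    "Blue2": (500, 190, 500, 210),
    "Red1": (650, 299, 350, 238),
    "Red2": (700, 336, 300, 216),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Group level: share of upper-income voters vs. Republican vote share, per state.
upper_share = [hi / (lo + hi) for lo, _, hi, _ in states.values()]
rep_share = [(lo_r + hi_r) / (lo + hi) for lo, lo_r, hi, hi_r in states.values()]
state_corr = pearson(upper_share, rep_share)

# Individual level: pool all voters by income group.
upper_rep = sum(s[3] for s in states.values()) / sum(s[2] for s in states.values())
lower_rep = sum(s[1] for s in states.values()) / sum(s[0] for s in states.values())

# Within each state: how much more Republican are upper-income voters?
gaps = {name: hi_r / hi - lo_r / lo for name, (lo, lo_r, hi, hi_r) in states.items()}

print(f"state-level correlation: {state_corr:.2f}")
print(f"upper-income Republican share: {upper_rep:.1%}")
print(f"lower-income Republican share: {lower_rep:.1%}")
for name, gap in gaps.items():
    print(f"{name}: upper-minus-lower gap = {gap:+.1%}")
```

With these made-up counts, richer states look less Republican, pooled upper-income voters look more Republican, and the within-state gap is a few points in the blue states but more than twenty in the red ones.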
Making judgments based on group-level data isn’t always bad; sometimes group-level data is more accurate [pdf], and at the very least it’s easier to gather. Quite often, conclusions drawn from group-level data are correct, as with a famous 1968 dataset correlating smoking with various cancers (click the image to see cigarettes vs. other cancers, too).
So. Be cautious anytime someone points to group-level data to say something about members of that group. Sometimes the fallacious thinking will be obvious, but often it will not.
Unless otherwise noted, all data comes from the Current Population Survey, using the 2008-2012 samples. Click the picture to see detailed statistical output.
Income is presented in bands in the Current Population Survey (e.g., $45,000-$47,499). Running a ranked (nonparametric) t-test across the whole range of incomes also indicates that immigrants tend to have lower incomes.
Thanks to David Freedman at UC Berkeley for this example from his technical report on the ecological fallacy, which was itself inspired by the original paper discussing the ecological fallacy at length (Robinson, 1950).
Voting data pulled from Wikipedia.
This dataset, made famous by Fraumeni (1968), was pulled from CMU’s StatLib.