America Goes to the Doctor

Advertisement: Statwing makes an easy-to-use data analysis tool. Click through any of the images below to see that analysis in Statwing (and play around with the dataset).

You know how when you go to the doctor you get your height, weight, blood pressure, etc. measured? We popped thousands of those readings into Statwing (from Kaggle’s Practice Fusion Analyze This 2012 Prediction Challenge), and the data is pretty fun to play with.[1] This dataset isn’t perfectly representative, but it gives a decent feel for how vital signs, heights, and weights are distributed in the population.[2] Note that the data is U.S. only.[3]

For example, here’s the distribution of weights of Americans:

Weight by Gender

Click the image to see the analysis in Statwing. It’s much more interactive, has averages and medians and the like, and you can run additional analyses of other variables.


We all know that height and weight are correlated, but how much? (Click through for statistical test results.)

Weight vs. Height

Click the image to see it in Statwing (with statistical test results), and to run analyses on other vital signs.
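
If you'd rather put a number on "how much" yourself, one common measure is Pearson's correlation coefficient, which runs from -1 to 1. The post doesn't say which statistic Statwing reports, so treat this as a minimal sketch of one reasonable choice rather than what the tool does. The function name `pearson_r` and the sample readings are made up for illustration; they are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative values only (inches, pounds); not real patient data.
    heights_in = [62, 64, 66, 68, 70, 72]
    weights_lb = [130, 150, 145, 170, 190, 185]
    print(round(pearson_r(heights_in, weights_lb), 2))
```

The same function works for any pair of vital signs, as long as both lists are in the same patient order.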


When your blood pressure is taken, two numbers are recorded (you’ll hear “### over ###”). Turns out the first number (systolic) is your blood pressure when the heart’s ventricles have just contracted and pumped blood, so the pressure is at its highest point in the heartbeat cycle. The second number (diastolic) is your blood pressure when the ventricles are full of blood and about to pump, so the pressure is at its lowest. In this dataset, the two are correlated to roughly the same degree as height and weight are.

Systolic by Diastolic


The two numbers don’t just differ from each other; they’re also used to assess different health issues. For example, high systolic blood pressure is a better indicator of potential cardiac issues than high diastolic blood pressure.


There’s a lot more fun stuff in this dataset. You can pivot by region, age, body temperature, and more. Take a look.




[1] Data prep: Data was downloaded from here. We started with visits to family practitioners, general practitioners, and doctors of internal medicine from the training_SyncTranscript file. We removed records with no height or weight, outliers that appeared to be measurement errors (for example, weight numbers that were probably kilograms instead of pounds), folks experiencing moderate hypothermia (temperature readings 90° Fahrenheit and lower), and one individual who ruined all our charts by weighing 1150 pounds. We then randomly selected data so there was no more than one visit per patient, then brought in data from training_SyncPatient to get age, gender, and location. We didn’t have exact birthdays so ages are estimated from birth year.
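
The prep steps in [1] can be sketched roughly as follows. This is an illustration under stated assumptions, not the code we ran: the field names (`patient_id`, `specialty`, `height_in`, `weight_lb`, `temp_f`, `birth_year`), the specialty labels, and the 1,000-pound cap are all hypothetical. Only the 90° Fahrenheit rule comes from the note above. Spotting kilogram-for-pound mix-ups is left out because the note gives no rule for it, and visits with no temperature reading are kept, which is also an assumption.

```python
import random

# Hypothetical labels for the three kinds of doctors mentioned in the note.
SPECIALTIES = {"Family Practice", "General Practice", "Internal Medicine"}
MIN_TEMP_F = 90.0        # from the note: readings at or below 90 F are dropped
MAX_WEIGHT_LB = 1000.0   # assumed cap; it would exclude the 1150-pound record


def clean_visits(visits, patients, reference_year, seed=0):
    """Filter visits, keep one random visit per patient, attach patient info.

    visits: list of dicts, one per visit (assumed field names, see above).
    patients: dict mapping patient_id to a dict with birth_year, gender, state.
    reference_year: year used to estimate age from birth year.
    """
    rng = random.Random(seed)
    by_patient = {}
    for visit in visits:
        if visit.get("specialty") not in SPECIALTIES:
            continue
        if visit.get("height_in") is None or visit.get("weight_lb") is None:
            continue
        if visit["weight_lb"] > MAX_WEIGHT_LB:
            continue
        temp = visit.get("temp_f")
        if temp is not None and temp <= MIN_TEMP_F:
            continue
        by_patient.setdefault(visit["patient_id"], []).append(visit)

    rows = []
    for patient_id, group in by_patient.items():
        patient = patients.get(patient_id)
        if patient is None:
            continue  # no demographic record to join against
        chosen = rng.choice(group)  # at most one visit per patient
        rows.append({
            **chosen,
            "gender": patient["gender"],
            "state": patient["state"],
            # No exact birthdays, so age is only an estimate.
            "age": reference_year - patient["birth_year"],
        })
    return rows
```

Fixing the random seed makes the one-visit-per-patient choice repeatable, which helps if the charts need to be regenerated later.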

[2] The dataset isn’t necessarily representative of the American public generally. Younger people are significantly underrepresented. People often go to the doctor when they’re sick, so you might expect (for example) a higher than typical number of feverish temperatures. Et cetera. As a sanity check we compared dataset averages with known averages for height and weight (per gender) and found them to be within an inch and within a few pounds.

[3] The data would look quite different if it were not U.S.-only. For example, Americans weigh about 15 pounds more than Germans on average.
