Ridge regression

Statwing now enables you to use any of three kinds of regression.

1. Ordinary Least Squares (OLS): OLS is the most common kind of regression.

2. M-estimation: M-estimation regression downweights the impact of outliers. One problem with OLS regression is that if the variable being predicted has an outlier (e.g., most values are between 100 and 200, but one value is 1,000), that extreme outlier has a large impact on the result. M-estimation downweights the outlier so that one outlier can’t have a large impact on the results.

3. Ridge regression: Both OLS and M-estimation run into trouble with “multicollinearity,” which occurs when two or more input variables are correlated with each other as well as with the output variable. Often in a situation like that, OLS or M-estimation will attribute nearly all of the effect to one of those variables instead of splitting it between them. Ridge regression lets you tune your regression results so that the effect is spread more evenly among correlated variables.
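To make the multicollinearity point concrete, here is a minimal sketch in plain Python. It is not Statwing's implementation; the data, the helper names (`dot`, `fit_two_inputs`), and the penalty value `lam` are all made up for illustration. It fits two nearly identical, centered inputs with no intercept, once with no penalty (OLS) and once with a ridge penalty.

```python
# Two nearly identical inputs: x2 tracks x1 closely (multicollinearity).
# All numbers here are invented for illustration.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
# The output is built entirely from x1, so OLS can fit it using x1 alone.
y = [2.0 * v for v in x1]


def dot(a, b):
    """Sum of elementwise products of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


def fit_two_inputs(x1, x2, y, lam=0.0):
    """Solve (X'X + lam*I) b = X'y for two centered inputs, no intercept.

    lam = 0 gives ordinary least squares; lam > 0 gives ridge regression.
    """
    a11 = dot(x1, x1) + lam
    a22 = dot(x2, x2) + lam
    a12 = dot(x1, x2)
    t1, t2 = dot(x1, y), dot(x2, y)
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2x2 system.
    return ((a22 * t1 - a12 * t2) / det, (a11 * t2 - a12 * t1) / det)


ols = fit_two_inputs(x1, x2, y)
ridge = fit_two_inputs(x1, x2, y, lam=1.0)
print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

With these made-up numbers, OLS puts essentially the whole effect on the first input (coefficients of roughly 2 and 0), while ridge with `lam = 1.0` gives roughly 0.97 and 0.93, spreading the effect across the two correlated inputs. A larger `lam` pulls the coefficients closer together and shrinks them further toward zero.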

To explore, go to our regression demo dataset and click the little cog icon in the upper right corner of the result card, then select a different type of regression.

Appraisers: Give 10% off Statwing, Get 10% off Statwing

By giving other appraisers a 10% discount on Statwing, you can now get Statwing for 10% off, 20% off, or even for free. When you next sign into Statwing you’ll see a “Get 10% off” button in the upper right. Click it to get a personalized link. Copy that link and send it…

Fannie Mae: GLA Adjustments Should Be Higher

Fannie Mae announced last week that it is concerned that many GLA adjustments are “artificially low.” As evidence, they noted that more expensive homes have much higher Price/GLAs than less expensive homes, but only slightly higher GLA adjustments. Fannie Mae implicitly blamed both itself and automated review systems: “The…

Delightful logistic regression

Not unexpectedly, we followed up linear regression with logistic regression. It’s ready to go, so give it a shot with our demo dataset, predicting how likely someone is to be married based on their age, sex, religion, and anything else you’d like to use to predict it: https://www.statwing.com/demos/logistic-regression

Delightful multiple linear regression is finally here!

We’ve finally finished multiple linear regression, and it’s awesome. You’ll never find an easier way to disentangle how input variables affect an output variable.

- Results explained in plain English
- Automatic alerts about issues with the regression model, and how to fix them
- Automatic visualizations
- Two-click data transformations to improve your model
- Plainly written guides to the regression process and interpreting…

Statwing integrates with Quandl

Next time you use Quandl, click the “Statwing” button to statistically analyze a Quandl dataset. Let’s say you were looking at this Quandl dataset of bitcoin prices over time. You’d click the “Statwing” button in the right sidebar, find yourself in Statwing, and then start playing with the data. In two clicks you can get a nice…

Creating new variables and cleaning data in Statwing

You can now create new variables for analysis in Statwing:

Type: Bucketing numbers into groups
Example: 6-7 → “Satisfied”; 3-5 → “Neutral”; 1-2 → “Unsatisfied”

Type: Grouping categories together
Example: USA & Mexico & Canada → “North America”; Colombia & Venezuela & (etc.) → “South America”

Type: Mathematical functions and formulas
Example: =median(Score1, Score2, Score3) → median of the…

Yelp Dataset Challenge

Yelp is awarding $50k+ in prizes for interesting analyses as part of its Dataset Challenge. We’ve loaded its dataset of ~42k businesses, their ratings, and their attributes into Statwing to make them easy to explore. Check it out.

Data from Defense Department Program 1033

The New York Times just released a dataset with every order from state and local governments for surplus military items (the result of a Freedom of Information Act request). We’ve loaded the dataset into Statwing to make it easy to explore. Check it out.

Every MLB pitch thrown in 2014

We found a really fun dataset to play around with: every Major League Baseball pitch thrown so far in 2014, as tracked by the PITCHf/x system. It includes everything you’d want to know about a pitch: its speed, the type of pitch, how much it broke on each spatial axis, how many men were on…