Rundown of data tools

Our friends over at just published a blog post walking through dozens of different data tools (Hadoop, data visualization, databases, etc.). It’s not exhaustive but it’s written in human-friendly language, so we like it.

It’s football season again!

So we’d love to direct your attention to our favorite Statwing blog post of old, Ten Years of NFL Plays Analyzed, Visualized, Quizzified. If you like NFL football, you’ll love that post. If you don’t love NFL football, here’s a lovely post about the human lifecycle presented via responses to the General Social Survey.

Ridge regression

Statwing now enables you to use any of three kinds of regression.

1. Ordinary Least Squares (OLS): OLS is the most common kind of regression.
2. M-estimation: M-estimation regression downweights the impact of outliers. One problem with OLS regression is that if the variable being predicted has an outlier (e.g., most values are between 100…
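To make the OLS versus M-estimation contrast concrete, here is a minimal sketch in plain Python. It assumes Huber weighting fitted by iteratively reweighted least squares, which is one common M-estimator; the excerpt does not say which estimator Statwing uses. The data, the function names, and the tuning constant `k=1.345` are illustrative assumptions.

```python
from statistics import median


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def ols_fit(xs, ys):
    """Ordinary least squares: every point gets the same weight."""
    return weighted_line_fit(xs, ys, [1.0] * len(xs))


def huber_fit(xs, ys, k=1.345, iterations=20):
    """Huber M-estimate via iteratively reweighted least squares (a sketch)."""
    a, b = ols_fit(xs, ys)
    for _ in range(iterations):
        residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
        # Robust scale estimate: median absolute residual, rescaled.
        scale = median([abs(r) for r in residuals]) / 0.6745
        if scale == 0:
            break
        cutoff = k * scale
        # Points with large residuals are downweighted, never dropped.
        ws = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in residuals]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b


# Hypothetical data: y = 1 + 2x, except the last value is an outlier.
xs = list(range(10))
ys = [1.0 + 2.0 * x for x in xs]
ys[-1] = 100.0

ols_intercept, ols_slope = ols_fit(xs, ys)
huber_intercept, huber_slope = huber_fit(xs, ys)
print("OLS slope:", round(ols_slope, 3))
print("Huber slope:", round(huber_slope, 3))
```

In this toy case the single outlier pulls the OLS slope far from 2, while the reweighted fit stays close to the slope of the other nine points.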

Appraisers: Give 10% off Statwing, Get 10% off Statwing

By giving a 10% discount on Statwing to other appraisers, you can now get Statwing for 10% off, 20% off, or even for free. When you next sign in to Statwing you’ll see a “Get 10% off” button in the upper right. Click that to get a personalized link. Copy that link and send it…

Fannie Mae: GLA Adjustments Should Be Higher

Fannie Mae announced last week that it is concerned that many GLA adjustments are “artificially low.” As evidence, they noted that more expensive homes have much higher Price/GLAs than less expensive homes, but only slightly higher GLA adjustments. Fannie Mae implicitly blamed both itself and automated review systems: “The…

Delightful logistic regression

Not unexpectedly, we followed up linear regression with logistic regression. It’s ready to go, so give it a shot with our demo dataset, predicting how likely someone is to be married based on their age, sex, religion, and anything else you’d like to use to predict it.
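For readers who want to see what such a model does under the hood, here is a minimal sketch of logistic regression with a single predictor, fitted by gradient ascent on the log-likelihood. It is not Statwing's implementation, and the ages and marital statuses below are made up for illustration; they are not the demo dataset.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by gradient ascent; returns (b0, b1)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += learning_rate * g0 / n
        b1 += learning_rate * g1 / n
    return b0, b1


def to_feature(age):
    """Center and rescale age so gradient ascent behaves well."""
    return (age - 37.5) / 10.0


# Hypothetical data: age and whether the person is married (1) or not (0).
ages = [20, 25, 30, 35, 40, 45, 50, 55]
married = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic([to_feature(a) for a in ages], married)


def predict_married(age):
    """Estimated probability of being married at a given age."""
    return sigmoid(b0 + b1 * to_feature(age))


print("P(married | age 20):", round(predict_married(20), 2))
print("P(married | age 55):", round(predict_married(55), 2))
```

Adding more predictors such as sex or religion would mean one coefficient per input, with categorical inputs encoded as 0/1 indicator variables.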

Delightful multiple linear regression is finally here!

We’ve finally finished multiple linear regression, and it’s awesome. You’ll never find an easier way to disentangle how input variables affect an output variable.

- Results explained in plain English
- Automatic alerts about issues with the regression model, and how to fix them
- Automatic visualizations
- Two-click data transformations to improve your model
- Plainly written guides to…

Statwing integrates with Quandl

Next time you use Quandl, click the “Statwing” button to statistically analyze a Quandl dataset. Let’s say you were looking at this Quandl dataset of bitcoin prices over time. You’d click the “Statwing” button in the right sidebar, find yourself in Statwing, and then start playing with the data. In two clicks you can get…

Creating new variables and cleaning data in Statwing

You can now create new variables for analysis in Statwing:

Type: Bucketing numbers into groups
Example:
    6-7 → “Satisfied”
    3-5 → “Neutral”
    1-2 → “Unsatisfied”

Type: Grouping categories together
Example:
    USA & Mexico & Canada → “North America”
    Colombia & Venezuela & (etc.) → “South America”

Type: Mathematical functions and formulas
Example:
    =median(Score1, Score2, Score3) → median of the…
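The same three kinds of derived variable can be sketched in a few lines of plain Python. This is only an illustration of the transformations described above, not how Statwing implements them; the function names and the lookup table are hypothetical.

```python
from statistics import median


def bucket_satisfaction(score):
    """Bucket a 1-7 rating into a labeled group."""
    if 6 <= score <= 7:
        return "Satisfied"
    if 3 <= score <= 5:
        return "Neutral"
    if 1 <= score <= 2:
        return "Unsatisfied"
    return None  # outside the 1-7 scale


# Group individual categories under a broader label.
# Only the countries named in the example are listed here.
REGION_BY_COUNTRY = {
    "USA": "North America",
    "Mexico": "North America",
    "Canada": "North America",
    "Colombia": "South America",
    "Venezuela": "South America",
}


def region_for(country):
    """Look up the grouped category, or None if the country is unmapped."""
    return REGION_BY_COUNTRY.get(country)


def median_score(score1, score2, score3):
    """Equivalent of the =median(Score1, Score2, Score3) formula."""
    return median([score1, score2, score3])


print(bucket_satisfaction(6))
print(region_for("Canada"))
print(median_score(3, 9, 5))
```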

Yelp Dataset Challenge

Yelp is awarding $50k+ in prizes for interesting analyses as part of its Dataset Challenge. We’ve loaded its dataset of ~42k businesses, their ratings, and their attributes into Statwing to make them easy to explore. Check it out.