SAS versus R for business analysts


Over on R4Stats, I replied to Bob Muenchen’s article, Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

Personally, I think SAS is a wonderful application. My SAS experience began with SAS programming on mainframes back in 1989 (along with Fortran) and went on to include SAS Enterprise Guide (I wrote the first two editions of SAS for Dummies with Chris Hemedinger) and SAS Enterprise Miner. I have also used JMP, SAS Data Integration Studio, SAS Forecast Studio and several other SAS tools.

On the other hand, I have used R on several projects since 2004, and S (the precursor to R) in biopharma since the ’90s. I find R truer to being a modern programming language, while SAS is truer to being an analyst’s programming language. Perhaps I am biased, but given the way I attack problems with data and my typical need to massage the data in a wide range of ways, SAS is simply superior in my opinion. The flow of the language, its readability and the powerful DATA step still make it my favorite programming world. However, if I need almost any statistical test under the sun, R is clearly superior.

Unfortunately, R doesn’t have a clear, de-facto GUI (graphical user interface) that is well-designed

Read more

Joyful or informative charts? Best practices in visual analytics

Stephen Few, noted visual analytics expert and the original inspiration for our work in the field, recently wrote about criticisms of data visualization best practices. In particular, Amanda Cox of the New York Times said, “There’s a strand of the data viz world that argues that everything could be a bar chart. That’s possibly true but also possibly a world without joy.” And Nathan Yau of Flowing Data wrote, “in visualization you eventually learn that there’s more to the process than efficient graphical perception and avoidance of all things round. Design matters, no doubt, but your understanding of the data matters much more.” Both have bodies of work that I admire, but I am surprised by these comments.

This discussion reminds me of a similar problem in marketing and web analytics. Generating traffic that leads to sales is good. Eventually, someone finds a way to generate traffic that produces few new sales, but management is misled into thinking this must be good because traffic leads to sales. This is similar to saying “look, this chart is beautiful” about a chart that is hard to interpret or understand. So, while we delivered fun graphs, we shared minimal information. This may be good for traffic, but not so much for higher sales.

I suspect that part of this criticism can be traced back to Stephen’s recent critique of Tableau, “Tableau Veers from the Path.” In it, he discusses a new graph type in Tableau, the packed bubble chart, and contrasts it with bar charts. This is an example of the “avoidance of all things round.” Is Stephen truly anti-joy? Will an example show him to be wrong? Let’s give it a try and you can judge for yourself.

Here’s a packed bubble chart example

Read more

Alteryx Inspire 2013 Tableau Talk

Download the presentation Picture This! Your Data in Tableau. Watch my big data talk here and download the example workbooks here (requires Tableau 8 to open). Attending this conference was a great experience. Based on multiple customer discussions at the conference, Alteryx is a great product for personal data integration, data enrichment and predictive analytics. … Read more

Estimating future success rates from initial experience, surveys and observation (tutorial)

A wide range of common business questions are often decided incorrectly because decision-makers overlook, forget or neglect to apply a simple concept from statistics. In this tutorial we walk you through several examples to help you avoid this potentially costly mistake. Examples where this technique can help include:

Is my ad worth the price?
Conversion rate: how many visitors became paying customers after clicking on a Google ad and visiting a special offer web page? Based on the revenue generated, is the ad price too high?

How many of my customers have children?
Estimating customer demographics: based on a one-day survey in every store, what percentage of our entire customer base has children?

Who will win the election?
Survey results: what percentage of likely voters will vote for Obama, based on the responses from 1,000 people in a poll?

Bringing down the house?
Winning a bet: if my friend flips a coin 10 times and it lands on heads 9 times, is this a “fair” coin?

All of these questions and many others can be answered with the technique explained and demonstrated in this article.
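The excerpt does not name the technique, so the following is only an illustration of one common tool for the poll question: a normal-approximation (Wald) confidence interval for a proportion. It assumes 95% confidence (z = 1.96) and uses the worst case p̂ = 0.5, which gives the widest interval for a poll of 1,000 people; the function names are hypothetical.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation (Wald) confidence interval
    for a proportion; z = 1.96 corresponds to roughly 95% confidence."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Return (low, high), clipped to the valid range [0, 1]."""
    moe = margin_of_error(p_hat, n, z)
    return max(0.0, p_hat - moe), min(1.0, p_hat + moe)


if __name__ == "__main__":
    # p_hat = 0.5 is the worst case: it maximizes p * (1 - p).
    moe = margin_of_error(0.5, 1000)
    print(f"Margin of error for a 1,000-person poll: +/- {moe:.1%}")
```

Running it prints a margin of about ±3.1%, the familiar figure quoted for national polls of this size. The Wald interval is the simplest choice; it becomes unreliable for very small samples or proportions near 0% or 100%.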


Which states have the most Miss America winners?

Here is a fun example about the Miss America pageant that appeared on the Ask.com home page.

Notice that 27% of Ask.com users picked the correct state for the most Miss America winners. Is that good? To judge, we should ask how you would perform if you had no information and simply guessed. With four choices and only one correct answer, you have a 1 in 4 chance (that’s 1/4 = 25%) of guessing the answer even if you have no clue.

So, is 27% actually better than all of these people just guessing? The answer is “it depends” on a missing piece of information: how many people answered this question. If 100 people answered it and 27 answered correctly, there is a good chance that they were all simply guessing. However, if 10,000 people answered this question and 2,700 answered correctly, there is a good chance that some of them did better than just guessing.
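One standard way to put numbers on “it depends” is a one-sided binomial test: if everyone guessed with a 25% chance of success, how likely is it to see at least this many correct answers? The sketch below is an illustration, not necessarily the method the full tutorial uses. It computes the exact tail probability in log space (so n = 10,000 does not overflow) along with the normal-approximation z-score; the function names are hypothetical.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is computed in log space with lgamma, so large n such as
    10,000 neither overflows nor underflows before exponentiating."""
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total


def z_score(k: int, n: int, p: float) -> float:
    """Standard deviations between k and the count expected from guessing, n*p."""
    return (k - n * p) / math.sqrt(n * p * (1 - p))


if __name__ == "__main__":
    guess_rate = 0.25  # four answer choices, one correct
    for n, k in [(100, 27), (10_000, 2_700)]:
        print(f"{k:,} of {n:,} correct: z = {z_score(k, n, guess_rate):.2f}, "
              f"P(at least this many by pure guessing) = "
              f"{binom_upper_tail(k, n, guess_rate):.2g}")
```

With 100 respondents, 27 correct is less than half a standard deviation above the 25 expected from guessing, so guessing easily explains it. With 10,000 respondents, 2,700 correct is more than four standard deviations above the expected 2,500, and the tail probability is tiny.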


The classic illustration of success- flip a coin

You may be puzzled at this point. Don’t fear. Let me move to a simpler example: flipping a coin. Believe it or not, it is very similar to the multiple-choice question above; the main difference is the chance of “success” (guessing heads or tails correctly), which is 1 in 2, or 50%. So, if I flip the coin once and you guess right, then 100% of flips were guessed correctly. However, one correct guess wouldn’t lead me to believe that you can see the future (or that the coin is unfair and always lands on heads). How many correct guesses would it take? As I have seen in many business situations, what does your intuition or gut say?

Five out of five correct?
Twelve out of fourteen?
80 out of 100?
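To check your gut against the arithmetic, the sketch below computes, exactly, the chance that someone who is purely guessing (50% per flip) gets at least k of n flips right, for each of the three cases above. It is an illustrative one-sided calculation under that assumption, not necessarily the approach the rest of the tutorial takes; `prob_at_least` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int) -> Fraction:
    """Exact P(at least k correct out of n) when each guess is right with
    probability 1/2: count the favourable outcomes and divide by 2**n."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2 ** n)


if __name__ == "__main__":
    for k, n in [(5, 5), (12, 14), (80, 100)]:
        p = prob_at_least(k, n)
        print(f"{k} of {n} correct by pure guessing: {float(p):.3g}")
```

Five out of five happens by luck about 3% of the time (exactly 1 in 32), twelve or more out of fourteen well under 1% of the time, and 80 or more out of 100 essentially never. Using `Fraction` keeps the small cases exact, which makes the results easy to verify by hand.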

Here’s the good news, there is a simple

Read more

Data Driven Conference 2012

We are having a great time at the Data Driven Conference in Columbus! Our first session was standing room only and we are presenting the same talk a second time at 1:30 in E161.

Interesting questions from the audience include “How do you get better at asking the right questions that lead to better analysis?” and “How do you communicate with IT to get better data?”

To buy a copy of The Accidental Analyst, please visit www.AccidentalAnalyst.com.

Here is the infographic we created

Read more

Book Excerpt: Tableau 7 Quick Table Calculations and Custom Table Calculations

This is a free preview from Rapid Graphs with Tableau Software 7, available in print and Kindle on Amazon and on the Nook at Barnes and Noble. Due to width constraints on this blog, you may notice some loss in resolution compared with the purchased book, which has approximately 2.5 times better resolution.

Read more

Freakalytics newsletter, August 2012

Thank you for your interest in our newsletter. Please share it with colleagues who can benefit from it. We are happy to share some great news: “The Accidental Analyst” is now available on Amazon in the US and Europe! We will be offering 90-minute book workshops around the country and a one-day course … Read more

Webcast: “Big Data” on your laptop, fast, informative and at your command

NOTE: This fun review of “big data” was inspired by a recent presentation I gave on behalf of Tableau Software at the Big Data Conference in Chicago. You can find Part 1 of this three-part webcast here, “Performance to Cost Index & my personal history with ‘Big Data’,” and Part 2 here, “‘Big Data’ in US History, Exploring the 1790 US Census.” This part of the big data series is free; just subscribe or sign in below.

In this presentation, I share an example of working with big data stored on my laptop and the entire analysis happens without any type of connection to remote servers or databases. My analysis uses two tables of interest, the first has 216 million records, over ten years of airline ticket pricing in the US while the second table has 72 million records of US airlines performance data extracted from Hadoop. In the demonstration, which uses currently available technologies, we will quickly explore and analyze this data for interesting trends and patterns.

Read more