Category Archives: Data

An R function to check if you are online…

Whether you use R to access online databases in the cloud or want to scrape website data, it's always better to check if you are online before triggering numerous error and warning messages. I wrote this function to work with no additional R packages required. Also, this function is fast and transparent about what occurs when you use it.

This function was developed ...

Read the entire article on PowerTrip Analytics

How long should my presentation be?

Always leave them wanting more!
–P.T. Barnum
Creator of "The Greatest Show On Earth!®"

About our Presentation Length Calculator
Compelling presentations that are clear and actionable are the lifeblood of successful analyst teams. Often, analysts have worked on problems for days or weeks and have much more material to present than is relevant or useful for the audience that will take action. To help analysts communicate effectively, we created the Presentation Length Calculator. It offers guidance on how many slides to have ready based on the amount of time you have for your meeting, the backgrounds of the key people in your audience and the level of decision-maker you are presenting to.

The formulas behind these recommendations are based on feedback collected from over 40 executives, more than 100 managers/directors and hundreds of experienced analysts. Of course, no formula is universal, so use these recommendations as guidelines and tailor them based on your industry, your company, your experiences with the audience and your success with past presentations.

The hardest part for many analysts is offering a course of action that is grounded in their analysis. However, for business decision-makers, this is where an analyst actually creates the greatest value for the company. Our data shows that making useful recommendations for a course of action is critical to the long-term success of analyst teams. If you are nervous about making recommendations, a good approach is to share the level of uncertainty in your analysis and to explain how you will measure success, or the need for course corrections, as the decision-maker implements the recommended actions.

Click to use the presentation length calculator by Freakalytics

Read the entire article on PowerTrip Analytics

Communicate Your Results, a Case Study from the 7th C of The Accidental Analyst

You can have brilliant ideas,
but if you can’t get them across,
your ideas won’t get you anywhere.

–Lee Iacocca
Engineer on the original Ford Mustang design team and
CEO during Chrysler’s comeback in the 1980s.


From the Seven C’s of Data Analysis Framework
Maria is a Senior Sales Analyst for an online pet supply store described in our book, The Accidental Analyst: Show Your Data Who’s Boss. She was approached by marketing managers for insights on which states have the best opportunities for sales growth from additional marketing investment. Download Maria's presentation for the marketing managers.
Here's a checklist ...

Read the entire article on PowerTrip Analytics

Open source editor for D3 data visualization

At Freakalytics, we've used the D3 data visualization library on several client projects and have been impressed with the nearly infinite set of graphing, charting and mapping possibilities. Unfortunately, we were less impressed with the steep learning curve, level of effort and complexity involved in developing and customizing the desired visualizations.

Perhaps you have seen D3 in the New York Times? D3 examples like those in the New York Times are typically made by teams with expertise in D3 and related web technologies.  Now, forward-leaning visual analytics companies like Qlik are opening their API to work in harmony with the wide range of D3 visualizations.

Now, the really good news! An open source effort at the Data Lab of The University of Washington has created Lyra, a point-and-click editor for creating D3 visualizations. We've used it and were impressed with it, so we wanted to share it with you as a learning resource or even a productivity tool. Keep in mind that Lyra is still experimental and requires some effort on your part to properly embed it in your work. The UW Data Lab has created some nice videos, tutorials, examples, a Wiki and a discussion group for your learning benefit.

One of the examples posted by the Data Lab is a classic data visualization piece, Napoleon's March.

RStudio Keyboard Quick Reference by Freakalytics

At Freakalytics, we frequently use R (often referred to as RStats) in our client projects and wanted to share our success using the RStudio Integrated Development Environment (IDE) with you. So, we created the RStudio® Keyboard Quick Reference by Freakalytics. It is available to you, compliments of Freakalytics, as a PDF and later in this article as a searchable data table.
The RStudio IDE was built by the team at RStudio to make you more productive in the R world. It is a free, open source application for Windows, Linux, Mac and UNIX desktop users. RStudio Desktop includes an interactive R console, a smart editor that supports direct code execution, graphing interfaces, code history, a debugger and project management for R code and related files.
Download the RStudio® Keyboard Quick Reference by Freakalytics. The reference card is available as a PDF download for your convenience. The PDF version is printable and usable in most e-book applications.
In addition to the PDF version, we are pleased to share online access to the RStudio® Keyboard Quick Reference as a searchable data table (click here to access the searchable data table in a dedicated window). This searchable data table has all the shortcuts from the PDF -and- advanced shortcuts not shown on the PDF version (which is one page, for newer users of R).
Continue reading

From Business Intelligence to Visual Analytics
Craft a Winning Data Strategy

Stephen McDaniel and Eileen McDaniel, Ph.D.
Freakalytics, LLC

Topics: Data Analysis, Visual Analytics and Business Intelligence

This was originally published in the
TDWI FlashPoint Newsletter in August of 2014
Italicized sections, images and their captions were not part of the TDWI version.

Until recently, visual analytics was considered a niche area. Those days are quickly passing; almost every major analytics and BI vendor is either launching or developing a product focused on visual analytics.

As a data professional, you’ll face challenges in integrating these products into your existing BI infrastructure. How can you successfully implement new visual analytics tools and keep your business customers engaged and happy? Step in front of the inevitable progression to visual analytics by crafting a winning data strategy.

We suggest starting in these three areas:

The growth of WalMart as an interactive dashboard

1. Learn the basics of the visual analytics tools used by the business analysts in your organization. Follow the process of how a real-world project is executed. Solving a typical business problem will give you a chance to experience firsthand what users are doing. You will be surprised at how the tools change your view of the data warehouse and “proper” data structures.

We’ve had many data professionals attend our analytics workshops. Even those with years of experience in the field tell us that managing code and databases is a completely different way of thinking about data compared to analyzing an issue, which has drastically different constraints and goals. Investigating a real problem that the business is facing should help you to see many possible ways that your data stores can be adjusted to enable successful analysis.

Check out our book on the principles of visual analytics, grounded in the scientific method, The Accidental Analyst.  Stephen Few called it, "... a wonderful book, filled with practical advice."

Tools to consider include Qlik Sense, SAP Lumira, TIBCO Spotfire, Tableau and MicroStrategy Analytics Desktop. We have successfully used all of these tools in our work with various clients. They all have differing strengths, workflows and design philosophies. Read more about these products and others in our Candid Quadrants report.


2. Find an ally in each of your key business areas, preferably an expert analyst who can give you a viewpoint from “the other side.” Leverage these analysts for invaluable knowledge to design better data structures in the form of tables, graphs, and system maps in your data systems. This is far more effective than decoding the whole process by yourself. When building data warehouses and downstream analytic data stores, we’ve discovered that expert analysts are often excited and motivated to collaborate on improving the efficiency and value of the data sources in their analyses.

Unlike traditional BI projects, data projects are now a journey with many twists and unexpected turns. Working closely with business allies who understand both the data teams and the business is key to success.

Read more about collaborating with business and data teams in our previous TDWI article.


3. Commit to the reality that self-service data management with desktop spreadsheets and databases among business users is not going away. Instead, it will only continue to accelerate over the next few years. Part of this reality is driven by the fact that the appropriate data structure is often dependent on the analysis problem at hand. Another reason driving this growth is that more data streams are flowing into organizations, often at a rate that is overwhelming for analysts and data teams alike.

In our experience, when we help business users improve their data management skills, they are less likely to make mistakes or inaccurate assumptions about the data. They also better comprehend and appreciate the hard work involved in maintaining central systems.

Seize the opportunity to be more successful in your career as a data professional by understanding and incorporating the new landscape of BI and visual analytics into your data warehouse and collaborating closely with business users to establish a strong environment for analytics. Ultimately, data warehouses are about making better decisions in a timely manner, and these suggestions can help you further the utility of your data warehouse.

Excel is a powerful ad-hoc data cleansing tool
Excel has been bashed by statisticians and data teams for years. However, it's a powerful tool for one-off data review and rapid cleansing of data for an urgent analysis. It's also ubiquitous, both in presence and knowledge amongst business analysts. Tread carefully if you think you can "take it away". You can definitely reduce reliance with better tools, training, support and evangelism.

Learn more about Excel for business analytics in our free, recorded webinar.


Stephen McDaniel is Chief Data Scientist at Freakalytics, LLC and author of several books on analytic software. Eileen McDaniel, PhD, is author of The Accidental Analyst and Director of Analytic Communications at Freakalytics. Both work with clients on strategic analytic projects, teach courses on analytics and are on the faculty at INFORMS.

Three great data visualization stories from Qlik World 2014

At the Qlik World 2014 expert data visualization panel, moderator Donald Farmer asked each of us on the panel to offer up one of our favorite data visualizations for inspiration and learning.
Alberto Cairo, Visualization at The University of Miami, offered up John Snow's 1854 map of London that helped demonstrate that cholera was spread by contaminated water. Previously, many believed it was spread by noxious vapours or "bad air".

By meticulously canvassing the St James neighborhood and collecting high-quality data, John Snow was able to convince the local authorities to disable the well at the center of his map of cholera victims. The victims are recorded as points overlaid on a contemporary map of the neighborhood.


Read more about John Snow's map and recognition of him as a founder of modern epidemiology.
Kaiser Fung, Business Analytics and Data Visualization at NYU, cited a highly interactive work exploring the distribution of regional dialects in the United States, "How Y’all, Youse and You Guys Talk". After you answer a number of questions about how you would pronounce various words, this data visualization shows you the parts of the country where people speak most similarly to you. We really like that it gives you previews of your regional dialect based on the last answer as you complete the numerous questions, which is great motivation to keep going!

Read more about the person behind this data-intensive and statistical model-driven approach to smart data visualization.
Stephen (of Freakalytics) cited Hans Rosling's TED talk about myths and realities of people in the "developing world". This talk is engrossing because he uses data visualization -and- his extensive work with this body of data to tell several surprising narratives that debunk many pre-conceived notions about the developing world (mistakenly characterized by many as the third world).

While people are impressed by the cool motion of the bubble charts and Hans's passion about the topic, the real message is that storytelling doesn't just happen by throwing up some charts. It's about being knowledgeable about your data, the topic you are discussing and bringing forth thoughtful analyses to inform and spur your audience towards better actions.

Favorite charts for business analytics

I was asked by Donald Farmer of Qlik about my favorite charts. Donald is leading a keynote panel on data visualization at Qlik's World Conference with Alberto Cairo, Kaiser Fung and me.

While I can't say that I have a favorite chart, I can definitely state that I often rely heavily on three chart types for much of my work with clients.

#1 A bar chart is likely the most versatile chart type. It can represent data by category, data over time and, my favorite, aligned bars (sometimes called trellised or latticed).




#2 Data over time is big in business, especially seeing how we are performing this year versus last.  Here's a great way to easily see this year (bright purple) versus last year (light gray) in a line chart.  We are doing much better this year, with only April and May showing low or no growth.



#3 Maps are critical to understanding business performance by location.  I often scale the data against a benchmark or target and use a diverging color palette to find best performers and places that may need some help or guidance by management.
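As a rough illustration of scaling against a target before choosing a diverging color, here is a minimal Python sketch. The state figures, the function names and the 5% tolerance band are all hypothetical choices made for this example; they are not taken from the map described above.

```python
def performance_vs_target(actual, target):
    """Return actual as a ratio of target (1.0 means exactly on target)."""
    if target <= 0:
        raise ValueError("target must be positive")
    return actual / target


def diverging_bucket(ratio, tolerance=0.05):
    """Classify a ratio into one side of a diverging palette centered on 1.0.

    The tolerance band around 1.0 is an assumed value, used here so that
    locations close to target get the neutral middle color.
    """
    if ratio < 1.0 - tolerance:
        return "below target"
    if ratio > 1.0 + tolerance:
        return "above target"
    return "near target"


# Hypothetical sales and targets by state (illustrative numbers only).
sales = {"WA": 120_000, "OR": 80_000, "CA": 510_000}
targets = {"WA": 100_000, "OR": 100_000, "CA": 500_000}

buckets = {
    state: diverging_bucket(performance_vs_target(sales[state], targets[state]))
    for state in sales
}
print(buckets)
```

Centering the palette on the target, rather than on the raw sales values, keeps large states from dominating the map simply because they are large.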


Shark Tank investments – using data to understand the Sharks

Using data from the first four seasons of the Shark Tank, Freakalytics has assembled a few fascinating insights for fans and potential entrepreneurs that may come before the Sharks in future seasons.

While Barbara Corcoran is the most frequent investor, Mark Cuban is the investor with the largest amount invested and Mr. Wonderful invests the most, on average.



Lori Greiner paid the highest amount relative to the valuation proposed by the entrepreneur. Note that a significant number of the investments were made at 1 times the asking valuation (the investor accepted the entrepreneur's valuation), but the vast majority of investments were made below the entrepreneur's asking valuation.
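The post does not spell out how the "valuation vs ask" ratio was computed. A minimal Python sketch, assuming the usual implied-valuation arithmetic (cash offered divided by the equity share), might look like the following. The function names and the deal figures are hypothetical, and royalty and loan terms are left out.

```python
def implied_valuation(amount, equity_pct):
    """Company valuation implied by paying `amount` for `equity_pct` percent."""
    if not 0 < equity_pct <= 100:
        raise ValueError("equity_pct must be in (0, 100]")
    return amount * 100 / equity_pct


def valuation_vs_ask(ask_amount, ask_pct, deal_amount, deal_pct):
    """Ratio of the deal's implied valuation to the entrepreneur's asking valuation."""
    return implied_valuation(deal_amount, deal_pct) / implied_valuation(ask_amount, ask_pct)


# Hypothetical pitch: the entrepreneur asks $100k for 10% (a $1M valuation),
# and a Shark closes at $100k for 20% (a $500k valuation).
ratio = valuation_vs_ask(100_000, 10, 100_000, 20)
print(f"{ratio:.0%}")  # prints 50%
```

Under this assumption, a ratio of 100% means the investor accepted the asking valuation, and a ratio above 100% means the investor paid more than the entrepreneur asked.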


Just three of the Sharks appeared in all four seasons: Mr. Wonderful, Daymond John and Robert Herjavec. Of these three, Daymond invested the most total dollars.


Notice how showing the same information from the previous chart, a tree map, as a line chart offers different insights at first glance. Investor frequency of appearance and trends by season are now much clearer.


Here we bring several charts together as an interactive, analytic dashboard. Notice the refinements to the scatter plot: reference lines for average investment size and average valuation vs ask.


Barbara is among the most conservative investors on the show, paying just 49% of entrepreneur valuation and outlaying an average investment of just $92k.


Lori is the most aggressive investor, with an average 87% valuation vs entrepreneur ask. She often invests in products that are ready for the market now and need a rapid go-to-market plan.


Mark Cuban is the last Shark selected in the analytic dashboard. I will leave it to you to examine his results for insights.


A few closing thoughts from my analysis

+ Of active investors, Barbara & Daymond pay least on company asking value, 49% & 53%

+ Lori willing to pay the most versus company ask, 87%
   - Perhaps due to her quick payback horizon as a TV/marketing expert
   - Paid highest ratio ever, 300%

+ Mark Cuban invested largest amount, $1.25M
   - Strikes solid bargain at 63% of asking value

+ Mr. Wonderful has largest average investment size at $175k
   - Similar to Mark, 57% of company asking value

+ Robert is most conservative, $90k/investment
   - 65% of company asking valuation

Disclaimer: there are potential data entry problems in this data, and my analysis assumptions vary slightly from other analyses of Shark Tank that I found online. However, I believe the key trends and insights are substantially similar to those in the other analyses. Royalty and loan schemes were ignored in this analysis.

Join us at the Qlik 2014 World Conference


Eileen and I have been invited to attend the launch of Qlik® Sense at the Qlik 2014 conference in Orlando. As featured speakers, we are holding a session on The Accidental Analyst®, a reliable framework for building analyses to answer real-world business questions. I will also be on an expert data visualization panel with Alberto Cairo. We will discuss what makes data visualizations stand out to clearly inform decision-makers.

We had a great briefing with Qlik CTO Anthony Deighton about Qlik Sense.  So we dove right in and created a dashboard example from a data source that is included with Qlik Sense Desktop. We look forward to seeing what people are already doing with Qlik Sense and hearing about the future directions they may go with it.
