TCC EU 2011- Tableau Server Scaling and Performance Best Practices

With Dan Jewett, VP of Product Management, Tableau
(Dan and Stephen previously worked together at Brio in the late 90’s)

Some comments are the opinion of Freakalytics and
not necessarily those of Tableau

This content is live blogged; there may be occasional errors or omissions.
 
 
Overview

Architecture
Hardware
Server config
Database or data engine
Workbook optimizations for Server publishing
 
 
1) Architecture- Tableau prefers that most customers see the server as a black box

Content is authored in Tableau Desktop and published to Tableau Server
People then access your content via no-client AJAX technology (like Gmail)
HTTP(S) access goes through the bundled Apache web server- please leave the web server configuration alone if at all possible!

The VizQL and web app processes handle your requests; they work with a security layer to use
– Tableau caching
– Data engine
– Databases and Tableau Data Extracts

Holding this all together is a repository with user management, content management and a search engine (the Apache Lucene project)
 
 
2) Hardware- don’t be thrifty here, this makes a big difference in the user experience!

For a modest investment in hardware, you can radically increase performance and capacity
Dell server example- $14k US buys 8 cores, 64 GB memory and 3 TB of fast SCSI (RAID 5); $38k US triples the CPU count, with 128 GB RAM and 4 TB of RAID
With 100 users, this tripling of server capacity adds perhaps 10% to total project cost but gives much better capacity at peak load times (e.g. Monday mornings when everyone logs in)

Windows Server 2003 or 2008- you want a 64-bit OS and hardware!

MEMORY- more is better!   If you must trade off, I would trade some CPUs for more memory.

Fast disks matter a lot- a RAID configuration is better
Lots of data extracts can mean a need for high storage capacity

Can be hosted on a virtual machine, but physical machines typically perform better (based on anecdotal feedback).

Tableau web clients are chatty with the Tableau Server system, so a moderate amount of network bandwidth is needed between the two.   However, there are many little requests, so network latency can be an issue.   Consider co-locating smaller servers worldwide instead of one big central system- vizzes render as lots of little tiles with JavaScript for interaction.
 
 
3) Distributed components- a scalability strategy, not a performance strategy (allows lots of users, but not faster vizzes!)

The primary Tableau Server machine in the cluster acts as the load balancer for client requests
Add worker machines to share the load
Moving the data engine off to its own worker machine is the simplest way to increase performance- the data engine is the big memory user!

Caching is per process- distribution can actually diminish performance since the cache is not shared among machines or processes.

Keep your machines in the same subnet if possible
Firewalls and DMZs can slow network communication, sometimes significantly, depending on how locked down your systems are…
 
 
4) Caching

A request comes in from a user at a web client:
a) Fastest- the view was rendered before and is in cache; just send the cached images- no queries or calcs needed!
b) If there is no cached image, is the SQL query result in cache?   If so, use the cached data to render the view and send it to the user.
c) If there is no cached query, hopefully the database is fast and can quickly send results.
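The three-tier lookup above can be sketched as a small cache cascade. This is an illustrative sketch of the pattern, not Tableau's actual implementation; the names `get_view`, `run_query` and `render_view` are placeholders.

```python
# Minimal sketch of the three-tier cache cascade described above.
# Tier a: rendered-view (image) cache; tier b: query-result cache;
# tier c: the live database (here, a caller-supplied function).

image_cache = {}   # rendered views, keyed by view identity
query_cache = {}   # raw query results, keyed by SQL text

def get_view(view_key, sql, run_query, render_view):
    # a) Fastest path: the rendered view is already cached.
    if view_key in image_cache:
        return image_cache[view_key]
    # b) No image, but the query result may still be cached.
    if sql in query_cache:
        data = query_cache[sql]
    else:
        # c) Miss on both tiers: hit the database.
        data = run_query(sql)
        query_cache[sql] = data
    image = render_view(data)
    image_cache[view_key] = image
    return image
```

Note the payoff: a repeat request for the same view skips both the render and the database round trip entirely, which is why cache sizing (below) matters so much on a busy server.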

Three cache control strategies in the Server config dialog
Minimize queries– hold as long as there is memory available- users hitting refresh data will force data to refresh
Balanced– holds no longer than your specified number of minutes
Most up-to-date– no caching of models or data at all, always fresh/hot data from the oven, but at a much higher cost to both the database and the Tableau Server!

Model cache size- how many vizzes to cache (a dashboard with 4 views uses 4 caches)- the default is 30- it should be much higher on a server!   (100-200???)
Query cache- size in MB of query results to cache- the default is 64 MB- it should be much higher on a server!   (2,000-4,000 MB on a 64 GB server)
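For reference, a sketch of how such cache sizes might be raised with the tabadmin tool of this era. The key names (`vizqlserver.modelcachesize`, `vizqlserver.querycachesize`) are my best recollection of the settings involved and should be verified against your Tableau Server version's documentation before use.

```shell
# Illustrative only -- verify key names against your version's docs.
tabadmin stop
tabadmin set vizqlserver.modelcachesize 200    # viz models cached (default 30)
tabadmin set vizqlserver.querycachesize 2048   # MB of query results (default 64)
tabadmin configure
tabadmin start
```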

How many server processes per server core?   Two VizQL and two application server processes per core is a good start.   If caching is critical, add more VizQLs.   If your server is lower-end, reduce to one of each per core.
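The rule of thumb above can be written out as a tiny sizing helper. This is my own encoding of the guidance from the talk, not a Tableau-provided tool; the extra-VizQL amount for cache-heavy workloads is an assumption.

```python
def process_counts(cores, cache_heavy=False, low_end=False):
    """Rule of thumb from the talk: 2 VizQL and 2 app-server
    processes per core; drop to 1 per core on low-end hardware.
    Adding one extra VizQL per core when caching is critical is
    an illustrative choice, not a stated number."""
    per_core = 1 if low_end else 2
    vizql = per_core * cores
    app = per_core * cores
    if cache_heavy:
        vizql += cores  # extra VizQLs hold more cache
    return {"vizql": vizql, "app": app}
```

For example, an 8-core server starts at 16 VizQL and 16 application processes, while a lower-end 4-core box would run 4 of each.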
 
 
5) Database or extract?

Database- use a live connection when you need live data or want database security based on users.
Data extract- faster (unless you have Teradata, Vertica, Netezza, etc.); prefer extracts when report data changes on a predictable schedule; security can be handled by user filters at publish time or by Tableau Server view restrictions.

Note that there are many ways to optimize extracts, including hiding unused data items, filtering based on dashboard needs, aggregating data by visible dimensions, and rolling dates up to the level of detail actually used.   Stephen has seen extracts reduced by 50-99% with these methods.
 
 
6) Better server performance

Real-time virus scanners can kill system performance on the Server- consider a nightly virus check or restricted scanning instead

Server timeouts can be tuned- by default, a user session is released 4 hours after the last request

To prevent runaway queries, Tableau's default config terminates queries lasting longer than 30 seconds.   You might set this lower or higher…   This setting also impacts scheduled extract refreshes.
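A sketch of how the query timeout might be raised via tabadmin. The key name (`vizqlserver.querylimit`, in seconds) is my best recollection for Tableau Server of this era; confirm it against your version's documentation, and remember that scheduled extract refreshes are bound by the same limit.

```shell
# Illustrative only -- verify the key name against your version's docs.
tabadmin stop
tabadmin set vizqlserver.querylimit 60   # seconds before a query is killed (default 30)
tabadmin configure
tabadmin start
```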
 
 
7) Workbook optimization

If your workbook is slow on the desktop, it will be slow on Server!
A large workbook file on the desktop should be examined for optimization and removal of unneeded elements
Smaller workbooks are better
Custom bins can be slow
20-80 worksheets in a workbook- avoid this if possible
Tabbed views are slower to render than single views
Large crosstab views can be very slow (10-1,000 pages of crosstabs)

