DV Posts

Data Visualization readings – last 4 months of 2013.

(time to read them is shrinking…)

0. The Once and Future Prototyping Tool of Choice

1. Block by Block, Brooklyn’s Past and Present

2. Data Visualization and the Blind


4. Old Charts

5. Back To Basics

6. In-Memory Data Grid Key to TIBCO’s Strategy

7. Submarine Cable Map

8. Interview with Nate Silver

9. Qlikview.Next will be available in 2014

10. Importance of color?

11. Qlikview.Next has a gift for Tableau and Datawatch

12. (October 2013) Tableau posts 90% revenue gain and tops 1,000 staffers, files for $450 million secondary offering

13. The Science Of A Great Subway Map

14. SEO Data Visualization with Tableau

15. John Tukey “Badmandments”

Supplementary BADMANDMENTS:

  • 91. NEVER plan any analysis before seeing the DATA.
  • 92. DON’T consult with a statistician until after collecting your data.
  • 94. LARGE enough samples always tell the truth.

16. Thinking about proper uses of data visualization.

17. Big BI is Stuck: Illustrated by SAP BusinessObjects Explorer

18. IBM (trying to catch up?) bets on big data visualization

19. Site features draft designs and full views of the Treemap Art project (By Ben Shneiderman)

20. A Guide to the Quality of Different Visualization Venues

21. Short History of (Nothing) Data Science

22. Storytelling: Hans Rosling at Global Health – beyond 2015

23. DataWatch Quarterly Review: Rapid Growth Finally Materializing

24. QlikView Extension – D3 Animated Scatter Chart


25. SlopeGraph for QlikView (D3SlopeGraph QlikView Extension)

26. Recipe for a Pareto Analysis

27. Color has meaning

28. TIBCO’s Return To License Growth Frustratingly Inconsistent

29. Automated Semantics and BI

30. What is wrong with definition of Data Science?

31. Scientific data became so complex, we have to Invent new Math to deal with it

32. Samples

My Best Wishes for 2014 to all visitors of this Blog!


2013 was a very successful year for the Data Visualization (DV) community, for DV vendors, and for this Data Visualization Blog (the number of visitors grew from an average of 16,000 to 25,000+ per month).

From a certain point of view, 2013 was the year of Tableau: it went public, it now has the largest market capitalization among DV vendors (more than $4B as of today), its strategy ("Data to the People!") became the most popular among DV users, and it again had the largest YoY revenue growth (almost 75%!) among DV vendors. Tableau already employs more than 1,100 people and still has 169+ job openings as of today. I wish for Tableau to stay the leader of our community and to keep its YoY growth above 50% – this will not be easy.

Qliktech is the largest DV vendor: in 2014 it will exceed the half-billion-dollar revenue benchmark (probably closer to $600M by the end of 2014) and will employ almost 2,000 people. Qlikview is one of the best DV products on the market. I wish that in 2014 Qliktech will create cloud services similar to Tableau Online and Tableau Public, and that Qlikview.Next will keep Qlikview Desktop Professional (in addition to the HTML5 client).

I wish TIBCO would stop trying to improve BI or make it better – you cannot reanimate a dead horse; instead, I wish Spotfire would embrace the "Data to the People" approach and act accordingly. For Spotfire my biggest wish is that TIBCO will spin it off the same way EMC did with VMWare. And yes, I wish Spotfire Cloud Personal would be free and able to read at least local flat files and local DBs like Access.

2014 (or maybe 2015?) may witness a new, fourth DV player joining the competition: Datawatch recently bought Panopticon, and if it completes the integration of all its products correctly and adds the features the other DV vendors above already have (like cloud services), it can be a very competitive player. I wish them luck!


Microsoft released a lot of advanced and useful DV-related functionality in 2013, and I wish (I have been recycling this wish for many years now) that Microsoft will finally package most of its Data Visualization functionality into one DV product and add it to Office 20XX (as they did with Visio) and Office 365, instead of shipping a bunch of plug-ins for Excel and SharePoint.

It is a mystery to me why Panorama, Visokio and Advizor Solutions are still relatively small players, despite all three of them having excellent DV features and products. Based on Tableau's 2013 IPO experience, maybe the best way for them is to go public and get new blood? I wish for them to learn from Tableau's and Qlikview's success and try this path in 2014-15…

For Microstrategy my wish is very simple – they are the only traditional BI player who realized that BI is dead; they started a transition into the DV market in 2013 (actually even before 2013), and I wish them all the success they can handle!

I also think that a few thousand Tableau, Qlikview and Spotfire customers (say, 5% of the customer base) will need deeper analytics in 2014 and beyond, and they will try to complement their Data Visualizations with Advanced Visualization technologies from vendors like http://www.avs.com/

My best wishes to everyone! Happy New Year!


With the releases of Spotfire Silver (soon to be Spotfire Cloud) and Tableau Online, and with the attempts of a few Qlikview partners (but not Qliktech itself yet) to move to the Cloud and provide their Data Visualization platforms and software as a service, the attributes, parameters and concerns of such VaaS or DVaaS (Visualization as a Service) offerings are important to understand. Below is an attempt to review those "Cloud" details, at least at a high level (with the natural limitations of space and time applied to the review).

But before that, let's underscore that Clouds are not in the skies but rather in huge, weird buildings with special physical and infrastructure security, like this Data Center in Georgia:


You can see some real old-fashioned clouds above the building, but they are not what we are talking about. Inside the Data Center you can see a lot of racks, each with 20+ servers, which, together with all the secure network and application infrastructure, contain these modern "Clouds":


Attributes and Parameters of mature SaaS (and VaaS as well) include:

  • Multitenant and scalable architecture (this topic is too big and needs its own blogpost or article). You can review Tableau's whitepaper about Tableau Server scalability here: http://www.tableausoftware.com/learn/whitepapers/tableau-server-scalability-explained
  • SLA – service level agreement with up-time, performance, security-related and disaster recovery metrics and certifications like SSAE16.
  • UI and Management tools for User Privileges, Credentials and Policies.
  • System-wide Security: SLA-enforced and monitored Physical, Network, Application, OS and Data Security.
  • Protection and/or encryption of all fields/columns, or at least the sensitive ones (like SSN).
  • Application Performance: Transaction processing speed, Network Latency, Transaction Volume, Webpage delivery times, Query response times
  • 24/7 high availability: facilities with reliable and backup power and cooling, certified network infrastructure, N+1 redundancy, 99.9% (or 99.99%, or whatever your SLA with clients promises) up-time – see the downtime sketch right after this list.
  • Detailed historical availability, performance and planned maintenance data with Monitoring and Operational Dashboards, Alerts and Root Cause Analysis
  • Disaster recovery plan with multiple backup copies of customers' data in near real time at the disk level, and a multilevel backup strategy that includes disk-to-disk-to-tape backup, where tape backups serve as a secondary level of backup, not as the primary disaster recovery data source.

  • Fail-over that cascades from server to server and from data center to data center in the event of a regional disaster, such as a hurricane or flood.
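
As a quick sanity check on those up-time numbers, here is a minimal sketch in JavaScript (the language of the D3 toolkit discussed later in this post) converting an SLA up-time percentage into allowed downtime per year; the function name is just an illustration, not any vendor's API:

// Convert an SLA up-time percentage into allowed downtime (hours per year).
function downtimePerYear(uptimePercent) {
  var hoursPerYear = 24 * 365;                     // 8,760 hours in a non-leap year
  return hoursPerYear * (1 - uptimePercent / 100); // hours of allowed downtime
}

console.log(downtimePerYear(99.9));  // 8.76 hours (~526 minutes) per year
console.log(downtimePerYear(99.99)); // 0.876 hours (~53 minutes) per year

In other words, the difference between "three nines" and "four nines" in your SLA is the difference between a full working day of outage per year and under an hour.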

While Security, Privacy, Latency and Hidden Cost are usually the biggest concerns when considering SaaS/VaaS, other Cloud concerns are surveyed and visualized below. A recent survey and diagram were published by Charlie Burns this month:


Other surveys and diagrams were published by Shane Schick in October 2011 and by KPMG in February 2013. Here are the concerns captured by the KPMG survey:


As you can see above, a rack in a Data Center can contain multiple servers and other devices (like routers and switches), often redundant (at least 2, or sometimes N+1). Recently I designed a hosting VaaS Data Center for Data Visualization and Business Intelligence cloud services, and here is a simplified version of it, for just one rack, as a sample.

You can see a redundant network, redundant firewalls, redundant switches for the DMZ (the so-called "Demilitarized Zone", where users from outside the firewall can access servers like web or FTP), redundant main switches and redundant load balancers, redundant Tableau servers, redundant Teradata servers, redundant Hadoop servers, redundant NAS servers, etc. (not all devices are shown on the diagram of this rack):


I have received many questions from this Data Visualization Blog's visitors about the differences between compensation for full-time employees and contractors. It turned out that many visitors are actually contractors, hired for their Tableau, Qlikview or Spotfire skills, and some visitors are considering converting to consulting, or vice versa: from consulting to full-time. I am not an expert in all these compensation and especially benefits-related questions, but I promised myself that my blog would be driven by visitors' requests, so I googled a little about contractor vs. full-time worker compensation, and below is a brief description of what I found:

The Federal Insurance Contributions Act mandates a payroll tax split between employer and employee: each side pays 6.2% for Social Security (with a 2013 maximum of $7,049.40) and 1.45% for Medicare on all income, for a total (2013) of 15.3% of gross compensation.
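
To make that arithmetic concrete, here is a minimal sketch of the employer-side 2013 calculation (the $7,049.40 cap is 6.2% of the 2013 Social Security wage base of $113,700; the function and constant names are illustrative):

// Employer-side FICA for 2013, as described above; the employee side mirrors it.
var SS_RATE = 0.062;        // Social Security rate, each side
var SS_WAGE_BASE = 113700;  // 2013 wage base; 6.2% of it gives the $7,049.40 cap
var MEDICARE_RATE = 0.0145; // Medicare rate, each side, no cap

function employerFica2013(grossPay) {
  var socialSecurity = SS_RATE * Math.min(grossPay, SS_WAGE_BASE);
  var medicare = MEDICARE_RATE * grossPay;
  return socialSecurity + medicare;
}

// Example: on a $100,000 salary the employer pays 6,200 + 1,450 = $7,650,
// and the employee pays the same, totaling 15.3% of gross compensation.
console.log(employerFica2013(100000)); // 7650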


In addition, you have to take into account the employer's contribution to the employee's medical benefits (for a family it is about $1,000 per month), unemployment taxes, the employer's 401(k) contribution, STD and LTD (short- and long-term disability insurance), pension plans, etc.

I also added into my estimate of the contractor rate some "protection" for at least a 1-month gap between contracts, and 1 month of salary as a bonus for full-time employees.


Basically, the result of my minimal estimate is as follows: as a contractor, you need to get a rate at least 50% higher than the base hourly rate of a full-time employee. I calculate this base hourly rate as the employee's base salary divided by 1,872 hours, where 1,872 = (52 weeks × 40 hours) – 3 weeks of vacation – 5 sick days – 6 holidays = 2,080 hours – 208 hours (the minimum for a reasonable PTO, Personal Time Off) = 1,872 working hours per year.

I did not take into account any variations related to the use of W-2 or 1099 forms or corp-to-corp arrangements, nor many other fine details (like relocation requirements and the overhead associated with middlemen like headhunters and recruiters) and differences between the compensation of a full-time employee and a consultant working on contract – this is just my rough estimate; please consult with experts and do not ask me any questions related to MY estimate, which is this:

  • Contractor Rate should be 150% of the base rate of a FullTimer (see the sketch of this arithmetic below).
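
Here is the same rough estimate as a small sketch, using my 1,872-hour working year from above (the $97,000 example salary is hypothetical; function and variable names are illustrative):

// My rough estimate above: contractor rate vs. full-timer base hourly rate.
var WORKING_HOURS = 52 * 40 - 208; // 2,080 hours minus 208 hours of PTO = 1,872

function baseHourlyRate(baseSalary) {
  return baseSalary / WORKING_HOURS;       // full-timer's base hourly rate
}

function minContractorRate(baseSalary) {
  return 1.5 * baseHourlyRate(baseSalary); // at least 150% of the base rate
}

// Example (hypothetical salary): a $97,000 full-timer has a base rate of ~$51.82/hour,
// so an equivalent contractor should ask for at least ~$77.72/hour.
console.log(minContractorRate(97000).toFixed(2)); // "77.72"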

In general, using contractors (especially for business analytics) instead of full-timers is basically the same mistake as outsourcing and off-shoring: companies doing that do not understand that their main assets are their full-time people. Contractors are usually not as engaged, and they are not in the business of preserving the company's intellectual property.

For reference, see the results of the Dr. Dobb's 2013 Salary Survey for software developers, which are very comparable with the salaries of Qlikview, Tableau and Spotfire developers and consultants (except that, in my experience, the salaries of Data Visualization consultants are 10-15% higher than the salaries of software developers):


This means that for 2013 the average rate for Qlikview, Tableau and Spotfire developers and consultants should be around 160% of the base rate of an average full-timer (which implies an average base salary of about $97,000), which estimates the effective equivalent pay to a contractor for 1,872 hours per year as $155,200 – and this is only for an average consultant... If you take less, then somebody has tricked you; but if you read the above, you already know that.

2,400 years ago the concept of Data Visualization was less known, but even then Plato said, "Those who tell stories rule society".


I have witnessed multiple times how storytelling triggered Venture Capitalists (VCs) to invest. Usually my CEO (the biggest BS master on our team) would start with a 60-second Story (VCs call it an "Elevator Pitch"); then (if interested) the VCs would do long due-diligence research on the Data (and specs, docs and code) presented by our team, and after that they would spend comparable time analyzing Data Visualizations (charts, diagrams, slides, etc.) of our Data, trying to prove or disprove the original Story.

Some of the conclusions from all this startup storytelling activity were:

  • Data: without Data nothing can be proved or disproved (Action needs Data!)

  • View: the best way to analyze Data and trust it is to Visualize it (Seeing is Believing!)

  • Discovery of Patterns: visually discoverable trends, outliers, clusters, etc., which form the basis of the Story and follow-up actions

  • Story: the Story (based on that Data) is the Trigger for the Actions (Story shows the Value!)

  • Action(s): start with a drilldown to the needle in the haystack; embed Data Visualization into the business – it is not eye candy but a practical way to improve the business

  • Data Visualization has 5 parts: Data (main), View (enabler), Discovery (visually discoverable Patterns), Story (trigger for Actions) and finally the 5th Element – Action!

  • Life is not fair: in the end, the Storytellers were the people who benefited the most… (no Story, no Glory!)


And yes, Plato was correct – at least partially, and for his time. The diagram above uses an analogy with the 5 classical Greek elements. Plato wrote about four classical elements (earth, air, water, and fire) almost 2,400 years ago (citing an even more ancient philosopher), and his student Aristotle added a fifth element, aithêr (aether in Latin, "ether" in English) – both men are in the center of the first picture above.

Back to our time: Storytelling is a hot topic; enthusiasts are saying that "Data is easy, good storytelling is the challenge" (http://www.resource-media.org/data-is-easy/#.URVT-aVi4aE) or even that "Data Science is Storytelling" (http://blogs.hbr.org/cs/2013/03/a_data_scientists_real_job_sto.html). Nothing could be further from the truth: my observation is that most Storytellers (with a few known exceptions, like Hans Rosling or Tableau founder Pat Hanrahan) ARE NOT GOOD at visualizing, but they still wish to participate in our hot Data Visualization party. All I can say is: "Welcome to the party!"

It may be a challenge for me and you, but not for the people who held a conference about storytelling this winter, on 2/27/13 in Nashville, TN: http://www.tapestryconference.com/ :

Some more reasonable people refer to storytelling as data journalism and narrative visualization: http://www.icharts.net/blogs/2013/pioneering-data-journalism-simon-rogers-storytelling-numbers

Tableau founder Pat Hanrahan recently talked about "Showing is Not Explaining". In parallel, Tableau is planning (after version 8.0) to add features that support storytelling through the construction of visual narratives and the effective communication of ideas; see it here:

Collection of resources on storytelling topic can be found here: http://www.juiceanalytics.com/writing/the-ultimate-collection-of-data-storytelling-resources/

You may also want to check what Stephen Few thinks about it here: http://www.perceptualedge.com/blog/?p=1632

Storytelling, as an important part of Data Visualization (using the Greek analogy: the 4th classical element, Air, coming after Data (Earth), View (Water) and Discovery (Fire) and before Action (Aether)), has a practical effect on the visualization itself. For example:

  • if a Data View is not needed for the Story or for further Actions, then it can be hidden or removed;

  • if the number of Data Views in a Dashboard is diluting the impact of the (preferably short) Data Story, then the number of Views should be reduced (usually to 2 or 3 per dashboard);

  • if the number of DataPoints per View is too large and is affecting the triggering power of the Story, then it can be reduced too (in conversations with Tableau, they even recommended 5,000 DataPoints per View as the threshold between local and server-based rendering).


Below you can find samples of Guidelines and Good Practices for Data Visualization (mostly with Tableau) which I have used recently.

Some of these samples are Tableau-specific, but others (maybe with modifications) can be reused for other Data Visualization platforms and tools. I will appreciate feedback, comments and suggestions.

Naming Convention for Tableau Objects

  • Use CamelCase identifiers: capitalize the 1st letter of each concatenated word.

  • Use a suffix with a preceding underscore to indicate the type (example: _tw for workbooks).
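
For example, a sales workbook could be named SalesByRegion_tw. The same pattern can extend to other object types – e.g. SalesByRegion_tds for a shared data source – though only the _tw suffix above is part of the stated convention; _tds is a hypothetical illustration.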

Workbook Sizing Guidelines

  • Use fewer than 5 charts per Dashboard; minimize the number of visible tabs/worksheets.

  • Move calculations and functions from the Workbook to the Data.

  • Use fewer than 5,000 data-points per Chart/Dashboard to enable client-side rendering.

  • To enable shared sessions, don't use filters and interactivity if they are not needed.

Guidelines for Colors, Fonts, Sizes

  • To express desirable/undesirable points, use green for good, red for bad, yellow for warning.

  • When you are not describing a "good-bad" situation (thanks to the feedback of a visitor under the alias "SF"), try to use pastel, neutral and colorblind-friendly colors, e.g. similar to the "Color Blind 10" palette from Tableau.

  • Use "web-safe" fonts to approximate what users will see from Tableau Server.

  • Use either auto-resize or standard sizes (targeting the smaller screen) for Dashboards.

Data and Data Connections used with Tableau

  • Try to avoid pulling more than 15,000 rows for live Data Connections.

  • For Data Extract-based connections, 10M rows is the recommended maximum.

  • For widely distributed Workbooks, use application IDs instead of personal credentials.

  • A job failure due to expired credentials leads to suspension from the Schedule, so try to keep embedded credentials up to date.


Tableau Data Extracts (TDE)

  • If a refresh of a TDE takes more than 2 hours, consider redesigning it.

  • Reuse and share TDEs and Data Sources as much as possible.

  • Use incremental Data Refresh instead of full Refresh when possible.

  • Designate a unique ID for each row when incremental Data Refresh is used.

  • Try to use the free Tableau Data Extract API instead of a licensed Tableau Server to create Data Extracts.

Scheduling of Background Tasks with Tableau

  • Serial Schedules are recommended; avoid hourly Schedules.

  • Avoid scheduling during peak hours (8am-6pm); consider weekly instead of daily schedules.

  • Optimize Schedule size: group tasks related to the same project into one Schedule; if total task execution exceeds 8 hours, split the Schedule into a few with similar names but preferably different starting times.

  • Maximize the usage of monthly and weekly Schedules (as opposed to daily Schedules), and the usage of weekends and nights.

Guidelines for using Charts

  • Use Bars to compare across categories; use colors with stacked or side-by-side Bars for deeper analysis.

  • Use Lines for viewing trends over time; consider Area Charts for multiple lines.

  • Minimize the usage of Pie Charts; when appropriate, use them for showing proportions, and limit pies to six wedges.

  • Use Maps to show geocoded data; consider using Maps as interactive filters.

  • Use Scatter plots to analyze outliers and clusters and to construct regressions.


You can find more about Guidelines and Good Practices for Data Visualization here: http://www.tableausoftware.com/public/community/best-practices

The most popular approach to visualization (among business users) is to use a Data Visualization (DV) tool like Tableau (or Qlikview, or Spotfire), where a lot of features are already implemented for you. Recent proof of this amazing popularity is that at least 100 million people (as of February 2013) have used Tableau Public as their Data Visualization tool of choice; see


However, to make your documents and stories (and not just your data visualization applications) driven by your data, you may need another approach – coding the visualization of your data into your story – and visualization libraries like the popular D3 toolkit can help you. D3 stands for "Data-Driven Documents". The author of D3, Mike Bostock, designs interactive graphics for The New York Times – one of his latest samples is here:


and the NYT allows him to do a lot of Open Source work, which he demonstrates at his website here:

https://github.com/mbostock/d3/wiki/Gallery .


Mike was a "visualization scientist" and a computer science PhD student at Stanford University, and a member of a famous group of people now known as the "Stanford Visualization Group":


This Visualization Group was the birthplace of Tableau's prototype – sometimes they called it "a visual interface for exploring data", and its other name is Polaris:


and we know that the creators of Polaris started Tableau Software. Another of the Group's popular "products" was a graphical toolkit for visualization (mostly in JavaScript, as opposed to Polaris, which was written in C++) called Protovis:


– and Mike Bostock was one of Protovis's main co-authors. Less than 2 years ago, the Visualization Group suddenly stopped developing Protovis and recommended that everybody switch to the D3 library


authored by Mike. This library is Open Source (only 100KB in ZIP format) and can be downloaded from here:



In order to use D3, you need to be comfortable with HTML, CSS, SVG, JavaScript programming and the DOM (and other web standards); an understanding of the jQuery paradigm will be useful too. Basically, if you want to be even partially as good as Mike Bostock, you need to have the mindset of a programmer (in addition, I guess, to a business-user mindset), like this D3 expert:


Most successful early D3 adopters combine 3+ mindsets: programmer, business analyst, data artist and sometimes even data storyteller. For your programmer's mindset, you may be interested to know that D3 has a large set of plugins, see:


and a rich API, see https://github.com/mbostock/d3/wiki/API-Reference

You can find hundreds of D3 demos, samples, examples, tools, products and even a few companies using D3 here: https://github.com/mbostock/d3/wiki/Gallery
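
To give a flavor of what D3 code looks like, here is a minimal sketch in the spirit of Mike Bostock's introductory bar-chart tutorial: it binds an array of sample numbers to DIV elements and lets the data drive each element's width (the data values and the CSS class name are just illustrations):

<!DOCTYPE html>
<meta charset="utf-8">
<script src="http://d3js.org/d3.v3.min.js"></script>
<style>
  div.bar { background: steelblue; color: white; margin: 1px; padding: 3px; font: 12px sans-serif; }
</style>
<body>
<script>
  // One "bar" DIV per datum; the datum drives the width of its element.
  var data = [4, 8, 15, 16, 23, 42];

  d3.select("body").selectAll("div.bar")
      .data(data)              // join the data to the (empty) selection
    .enter().append("div")     // create a DIV for each new datum
      .attr("class", "bar")
      .style("width", function(d) { return d * 10 + "px"; })
      .text(function(d) { return d; });
</script>

Even this tiny example shows the D3 paradigm: you do not draw charts directly; you declare how data maps to document elements, and D3 keeps the two in sync.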



