(this is a repost from my other blog: http://tableau7.wordpress.com/2012/06/09/tableau-and-big-data/ )
Big Data can be useless without multi-layer data aggregations, hierarchical or cube-like intermediary Data Structures, when ONLY a few dozens, hundreds or thousands data-points exposed visually and dynamically every single viewing moment to analytical eyes for interactive drill-down-or-up hunting for business value(s) and actionable datum (or “datums” – if plural means data). One of best expression of this concept (at least how I interpreted it) I heard from my new colleague who flatly said:
“Move the function to the data!”
I got recently involved with multiple projects using large data-sets for Tableau-based Data Visualizations (100+ millions of rows and even Billions of records!). Some of largest examples of their sizes I used were: 800+ millions of records and other was 2+ billions of rows.
So this blog post is to express my thoughts about such Big Data (in average examples above have about 1+ KB per CSV record before compression and other advanced DB tricks, like columnar Databases used by Data Engine of Tableau) as back-end for Tableau. But please keep in mind that as a 32-bit tool, Tableau itself is not ready for Big Data. In addition, I think Big Data is mostly a buzzword and BS and we are forced by marketing BS masters to use sometimes this stupid term.
Here are some Factors involved into Data Delivery from main and designated Database (Back-ends like Teradata, DB2, SQL Server or Oracle) for Tableau-based Big Data Visualizations) into “local” Tableau Visualizations (many people still trying to use Tableau as a Reporting tool as oppose to (Visual) Analytical Tool:
Queuing thousands of Queries to Database Server. There is no guarantee your Tableau query will be executed immediately; in fact it WILL be delayed.
Speed of Tableau Query when it will start to be executed depends on sharing CPU cycles, RAM and other resources with other queries executed SIMULTANEOSLY with your query.
Buffers, pools and other resources available for particular user(s) and queries at your Database Server are different and depends on privileges and settings given to you as a Database User
Network speed: between some servers it can be 10Gbits (or even more), in most cases it is 1Gbit inside server rooms, outside of server rooms I observed in many old buildings (over wired Ethernet) max 100Mbits coming into user’s PC; in case if you using Wi-Fi it can be even less (say 54 Mbits?). If you are using internet it can be even less (I observed speed in some remote offices as 1 Mbit or so over old T-1 lines); if you using VPN it will max out at 4Mbits or less (I observed it in my home office).
Utilization of network. I use Remote Desktop Protocol – RDP to VM (from my workstation or notebook; (VM or VDI Virtual Machine, sitting in server room) and connected to servers with network speed of 1Gbit, but it still using maximum 3% of network speed (about 30 MBits, which is about 3 Megabytes of data per second, which is probably about few thousands of records per seconds.