Analytics



 

My best wishes for 2015 to the visitors of this Data Visualization blog!

2014 was a very unusual year for the Data Visualization community. The most important event was a huge shift in market competition: Tableau was the clear winner, QLIK lost its leadership position, and Spotfire is slowly declining after TIBCO went private. The pleasant surprise was Microsoft, which is finally trying to package Power BI separately from Office. Other competitors, such as Microstrategy, Panorama and Datawatch, were unable to gain a bigger share of the Data Visualization market.

2014 was again the year of Tableau: its market capitalization exceeded $6B, its YoY growth was again the highest, its sales are approaching $0.5B/year, its number of employees is almost the same as QLIK’s, its LinkedIn index exceeded 90000, and its number of job openings increased again (337 as of today)! For the last few months I have personally stopped comparing Data Visualization products, since Tableau is the clear overall winner and it will be difficult for others to catch up unless Tableau starts making mistakes like the ones QLIK and Spotfire/TIBCO made during the last few years.

2014 was very confusing for many members of the QLIK community, me included. The Qlik.Next project resulted in a new product, Qlik Sense (I don’t see much success for it), Qlikview 12 is still not released, and prices for both QLIK products are no longer public. QLIK’s market capitalization is below $3B despite still solid sales (over $0.5B/year), and its YoY growth is far below Tableau’s. Qlikview’s LinkedIn index is now around 60000 (far below Tableau’s) and Qlik Sense’s LinkedIn index is only 286… QLIK has only 124 job openings as of today, almost 3 times fewer than Tableau!

Curiously, BI guru Mr. Donald Farmer, who joined QLIK 4 years ago (a few months before the release of Qlikview 11) and who was the biggest propagandist of Qlik.Next/Qlik Sense, was moved from his VP of Product Management position to a new “VP of Innovation” role at QLIK just before the release of Qlik Sense, and we hear much less from Donald now. Sadly, during these 4 years Qlikview 12 was never released, QLIK never released anything similar to the free Tableau Reader, the free Tableau Public or Tableau Online (I am still hoping for Qlikview in the Cloud), and all Qlikview prices were unpublished…

As a member of the Spotfire community, I was sad to see the failure of Spotfire (and its parent TIBCO) to survive as a public company: on December 5, Vista Equity Partners completed the acquisition of TIBX for $4.3 billion. I estimate Spotfire’s sales at around $200M/year (assuming they are 20% of TIBCO’s sales). Spotfire’s LinkedIn index (far below Tableau’s and Qlikview’s) is around 12000, and its number of job openings is too small. I hope Vista Equity Partners will spin off Spotfire in an IPO as soon as possible and move all of Spotfire’s development, support, marketing and sales into one American location, preferably somewhere in Massachusetts (e.g. back to Somerville).

Here is a farewell line chart (bottom of the image) for the TIBX symbol, which stopped trading 3 weeks ago, compared with the DATA and QLIK time series (upper and middle line charts) for all of 2014:


Observing and comparing multiple (similar) multidimensional objects over time and visually discovering multiple interconnected trends is the ultimate Data Visualization task, regardless of the specific research area: it can be chemistry, biology, economics, sociology, publicly traded companies or even so-called “Data Science”.

For the purposes of this article I like the dataset published by the World Bank: 1000+ measures (they call them World Development Indicators) for 250+ countries over 50+ years, theoretically more than 10 million data points:

http://data.worldbank.org/data-catalog/world-development-indicators?cid=GPD_WDI

Of course, some data points are missing, so I restricted myself to 20 countries, 20 years and 25 measures (a more reasonable dataset with about 10000 data points). That gave me 500 time series for 20 objects (countries), and I tried to imitate how analysts and scientists would use visualizations to “discover” trends and other data patterns in such a situation and, if possible, to extrapolate this approach to larger datasets in practical projects. My visualization of this dataset can be found here:

http://public.tableausoftware.com/views/wdi12/Trends?:showVizHome=no

In addition to the Trends line chart (please choose an indicator in the filter at the bottom of the chart), I added to my Tableau visualization above a Motion Chart for any chosen indicator(s) and a Motion Map Chart for the GDP indicator. A similar visualization of this dataset was done by Google here: http://goo.gl/g2z1b6 .
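A minimal Python sketch of the bookkeeping behind that subset, assuming the indicator values have been exported as “long” rows of (country, indicator, year, value); that layout, the placeholder names and the random values are assumptions for illustration, not the World Bank’s format or data. It groups rows into one time series per (country, indicator) pair and counts them: 20 countries × 25 indicators give 500 series, and 20 years per series give 10000 possible data points (the same product at full scale, 1000 × 250 × 50, is 12.5 million).

```python
import itertools
import random
from collections import defaultdict


def group_series(rows):
    """Group long-format rows into one time series per (country, indicator).

    rows: iterable of (country, indicator, year, value); value is None when
    the source has no observation for that year.
    """
    series = defaultdict(dict)
    for country, indicator, year, value in rows:
        series[(country, indicator)][year] = value
    return series


def summarize(series):
    """Return (number of series, possible data points, missing data points)."""
    possible = sum(len(points) for points in series.values())
    missing = sum(1 for points in series.values() for v in points.values() if v is None)
    return len(series), possible, missing


# Placeholder subset: 20 countries x 25 indicators x a 20-year window.
countries = [f"country_{i}" for i in range(20)]
indicators = [f"indicator_{j}" for j in range(25)]
years = range(1994, 2014)  # placeholder 20-year window
rng = random.Random(0)
rows = [
    (c, ind, y, None if rng.random() < 0.1 else rng.random())  # ~10% simulated gaps
    for c, ind, y in itertools.product(countries, indicators, years)
]

n_series, n_points, n_missing = summarize(group_series(rows))
print(n_series, n_points)  # 500 10000
print(f"missing: {n_missing}")
```

Keeping the data “long” until the last moment makes it easy to drop a country, an indicator or a year range before pivoting into the per-country series that the line charts display.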

As you can see below, with samples of just 6 indicators (out of the 1000+ published by the World Bank), the behavior of the monitored objects (countries) is vastly different.

GDP trends: the clear leader is the USA, China is the fastest growing among the economic leaders, and Japan has been almost stagnant for the last 20 years (please note that I use the “GDP color” of each country for all other 1000+ indicators and line charts):

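Statements like “fastest growing” can be made explicit by ranking countries on compound annual growth rate, (last / first)^(1 / years) - 1. The sketch below is one possible way to do that, not how the charts in this post were produced; the function names are hypothetical and the two series are made-up illustrative numbers, not World Bank values.

```python
def cagr(first_value: float, last_value: float, years: int) -> float:
    """Compound annual growth rate between two positive observations `years` apart."""
    if first_value <= 0 or last_value <= 0 or years <= 0:
        raise ValueError("CAGR needs positive values and a positive number of years")
    return (last_value / first_value) ** (1.0 / years) - 1.0


def rank_by_growth(series: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank series by CAGR between their first and last yearly values, fastest first."""
    rates = {
        name: cagr(values[0], values[-1], len(values) - 1)
        for name, values in series.items()
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative numbers only: "A" doubles over 10 years, "B" is flat.
example = {
    "A": [100.0 * 2 ** (t / 10) for t in range(11)],
    "B": [500.0] * 11,
}
for name, rate in rank_by_growth(example):
    print(f"{name}: {rate:.1%}")  # A: 7.2%, then B: 0.0%
```

CAGR only looks at the first and last points, so a noisy series can be misranked; fitting a line to the logarithm of the values is a more robust alternative.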

Life expectancy: Switzerland and Japan provide the longest life to their citizens, while citizens of India and Russia are expected to live less than 70 years. Australia is probably improving life expectancy faster than the other countries in this subset of 20.


Health expenditures per capita: a group of 4 countries, Switzerland, Norway (fastest growing?), Luxembourg and the USA, spends about $9000 per person per year on health, while India, Indonesia and China spend less than $500:


Consumer price index: prices in Russia, India and Turkey are growing faster than elsewhere, while prices in Japan and Switzerland have been almost unchanged for the last 20 years:


Mobile phones per 100 persons: Russia has 182 mobile phones per 100 people (fastest growing in the last 10 years), while India has fewer than 70 cellular phones per 100 people.


Military expenses as a percentage of budget (a lot of data is missing when it comes to military expenses!): the USA, India and Russia spend more than the others; guess why that is:


 

You can find many examples of visual monitoring of multiple objects over time. One example is https://www.tradingview.com/, where over 7000 objects (publicly traded companies) are monitored across hundreds of indicators (such as share price, market capitalization, EBITDA, income, debt, assets etc.). Here is an example I made for a previous blog post: https://www.tradingview.com/e/xRWRQS5A/

For the last 6 years, every February my inbox has been bombarded by messages from colleagues, friends and visitors to this blog containing references, quotes and PDFs of Gartner’s Magic Quadrant (MQ) for Business Intelligence (BI) and Analytics Platforms; the latest can be found here: http://www.gartner.com/technology/reprints.do?id=1-1QLGACN&ct=140210&st=sb .

Last year I was able to ignore this noise (funnily enough, I was busy migrating thousands of users at a very large company from Business Objects and Microstrategy to Tableau-based visual reports), but in February 2014 I got so many questions about it that I am basically forced to share my opinion.

  • First of all, as I have said on this blog many times, BI is dead and has been replaced by Data Visualization and Visual Analytics. That was finally acknowledged by Gartner itself, which placed Tableau, QLIK and Spotfire in the “Leaders Quadrant” of the MQ for the 2nd year in a row.

  • Secondly, the last 6 MQs (2009-2014) are suspicious to me, because in all of them Gartner (with complete disregard of reality) placed all 6 “misleading” vendors of wasteful BI platforms (IBM, SAP, Oracle, SAS, Microstrategy and Microsoft) in the Leaders Quadrant! Over the last 6 years those 6 vendors convinced customers to buy their BI software for over $60B, and much more than that was spent on maintenance, support, development, consulting, upgrades and other IT expenses.

There is nothing magic about these MQs: they are the result of Gartner’s 2-dimensional understanding of BI, Analytics and Data Visualization (DV) platforms, features and usage. According to Gartner, the 1st measure (X axis) is “Completeness of Vision” and the 2nd measure (Y axis) is “Ability to Execute”, which allows them to distribute DV and BI vendors among 4 quadrants: top right, “Leaders”; top left, “Challengers”; bottom right, “Visionaries”; and bottom left, “Niche Players” (or, you could say, Leftovers).

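The quadrant assignment itself is just two threshold comparisons. A minimal sketch, assuming each vendor’s position is expressed as two scores normalized to [0, 1] with the dividing lines at 0.5; Gartner publishes a chart, not numeric scores, so the `Vendor` class, the midpoint and the coordinates below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vendor:
    name: str
    vision: float     # X axis, "Completeness of Vision" (hypothetical 0..1 score)
    execution: float  # Y axis, "Ability to Execute" (hypothetical 0..1 score)


def quadrant(v: Vendor, midpoint: float = 0.5) -> str:
    """Map a position on the two MQ axes to a quadrant name.

    Points exactly on a dividing line are assigned to the right/top side
    (an arbitrary tie-breaking assumption).
    """
    right = v.vision >= midpoint
    top = v.execution >= midpoint
    if right and top:
        return "Leaders"
    if top:
        return "Challengers"
    if right:
        return "Visionaries"
    return "Niche Players"


# Made-up coordinates for illustration only.
for v in [Vendor("V1", 0.8, 0.9), Vendor("V2", 0.3, 0.7),
          Vendor("V3", 0.6, 0.2), Vendor("V4", 0.1, 0.1)]:
    print(v.name, quadrant(v))
```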

I decided to compare my opinions (expressed on this blog many times) with Gartner’s (they wrote 78 pages about it!) as follows: take Gartner’s top 3 Leaders; then take Gartner’s top 3 Visionaries (projecting all vendors except the top 3 Leaders onto the X axis); then take Gartner’s top 3 Challengers (projecting all vendors except the top 3 Leaders and top 3 Visionaries onto the Y axis); then take the top 3 “Niche Players” from the rest of Gartner’s list (above); and make “similar” choices myself. My list is wider than Gartner’s, because Gartner missed DV vendors that are important to me, like Visokio, and vendors like Datawatch and Advizor Solutions were not included in the MQ in order to please Gartner’s favorites. See the comparison of opinions below:

As you may have noticed, in order to compare my opinion I had to use Gartner’s terms like Leader, Challenger etc., which is not exactly how I see it. Basically, my opinion overlapped with Gartner’s in only 25% of cases in 2014, which is slightly higher than in previous years; I guess the success of Tableau and QLIK is the reason for that.
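The selection procedure described above (top 3 Leaders, then the top 3 by projection on the X axis, then the top 3 by projection on the Y axis, then the top 3 of the rest) can be sketched as a greedy loop. The text does not say how the “top” Leaders and Niche Players were ranked, so this sketch assumes the sum of the two coordinates; the vendor names and coordinates are hypothetical, not Gartner’s placements.

```python
def pick_top(positions, exclude, key, k=3):
    """Return the k best-ranked vendor names not already in `exclude`."""
    candidates = [name for name in positions if name not in exclude]
    return sorted(candidates, key=lambda name: key(positions[name]), reverse=True)[:k]


def select_groups(positions, k=3):
    """Greedy 4-step selection; positions maps vendor -> (vision_x, execution_y)."""
    steps = [
        ("Leaders", lambda p: p[0] + p[1]),        # assumption: rank by x + y
        ("Visionaries", lambda p: p[0]),           # projection onto the X axis
        ("Challengers", lambda p: p[1]),           # projection onto the Y axis
        ("Niche Players", lambda p: p[0] + p[1]),  # assumption: best of the rest by x + y
    ]
    chosen, groups = set(), {}
    for label, key in steps:
        picked = pick_top(positions, chosen, key, k)
        groups[label] = picked
        chosen.update(picked)  # each vendor can appear in only one group
    return groups


# Hypothetical coordinates for 13 vendors.
positions = {
    "A": (0.90, 0.95), "B": (0.85, 0.90), "C": (0.80, 0.85),
    "D": (0.75, 0.30), "E": (0.70, 0.25), "F": (0.65, 0.20),
    "G": (0.20, 0.80), "H": (0.15, 0.75), "I": (0.10, 0.70),
    "J": (0.42, 0.40), "K": (0.35, 0.30), "L": (0.30, 0.20),
    "M": (0.05, 0.05),
}
for label, names in select_groups(positions).items():
    print(label, names)
```

Because each step excludes vendors already picked, a vendor far to the right but also high up is counted as a Leader, not as a Visionary, which matches the “except TOP 3 Leaders” wording of the procedure.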

The BI market reached $14B in 2013, and at least $1B of it was spent on Data Visualization tools. Here is a short summary of the state of each vendor mentioned above in the “DV Blog” column:

  1. Tableau: $232M in sales, $6B market cap, 82% YoY growth (fastest in the DV market), leader in DV mindshare; its declared goals are “Data to the People” and ease of use.

  2. QLIK: $470M in sales, $2.5B market cap, leader in DV market share; it attempts to improve BI, but will remove Qlikview Desktop from Qlik.Next.

  3. Spotfire: sales under $200M; it has the most mature platform for Visual Analytics and the best DV cloud services, but is limited by its corporate parent (TIBCO).

  4. Visokio: a private DV vendor with limited marketing and sales, but with some of the richest and most mature DV functionality.

  5. SAS: has the most advanced analytics functionality (not easy to learn and use) and targets data scientists and power users who can afford it instead of the free R.

  6. Revolution Analytics: as the provider of a commercial version of R and commercial support for it, it is a “cheap” alternative to SAS.

  7. Microsoft: has the most advanced BI and DV technology stack for software developers, but has no real DV product and no plan to have one in the future.

  8. Datawatch: $33M in sales, $281M market cap; has mature DV, BI and real-time visualization functionality, and experienced management and sales force.

  9. Microstrategy: $576M in sales, $1.4B market cap; a BI veteran with complete BI functionality that recently realized the BI market is not growing and made a desperate attempt to get into the DV market.

  10. Panorama: a BI veteran with an excellent, easy-to-use front end to the Microsoft BI stack; has good DV functionality and social and collaborative BI features.

  11. Advizor Solutions: a private DV veteran with an almost complete set of DV features and the ability to do predictive analytics interactively, visually and without coding.

  12. RapidMiner: a commercial provider of an open-source-based, easy-to-use advanced analytics platform integrated with R.

A similar MQ for “Advanced Analytics Platforms” can be found here: http://www.gartner.com/technology/reprints.do?id=1-1QXWEQQ&ct=140219&st=sg; have fun:


In addition to the differences mentioned in the table above, I need to say that I do not think Big Data is defined well enough to be mentioned 30 times in a review of “BI and Analytics Platforms”, and I do not think the vendors mentioned by Gartner are ready for it, but maybe that is a topic for a different blog post…

Update: 

Analytics extrapolates visible data into the future (“predicts”) and, through mathematical modeling, enables us to see beyond 6-dimensional subsets of data. The ability to do this visually, interactively and without programming … vastly expands the number of potential users of Visual Analytics. I am honored to present one of the most advanced experts in this area, Mr. Cogswell, who decided to share his thoughts as a guest blogger here. The guest blog post below is written by Mr. Douglas Cogswell, the Founder, President and CEO of ADVIZOR Solutions Inc.

Formed in 2003, ADVIZOR combines data visualization and in-memory data management expertise with usability knowledge and predictive analytics to produce an easy-to-use, point-and-click product suite for business analysis. ADVIZOR’s Visual Discovery™ software spun out of a distinguished research heritage at Bell Labs that spans nearly two decades and produced over 20 patents.

Mr. Cogswell is a well-known thought leader, and below he discusses the next step in Data Visualization technology: the point where the limitations of the human eye prevent users from comprehending multidimensional data patterns (say, more than 6 dimensions) or from estimating/predicting future trends from past data. Such multidimensional “comprehension” and estimation of future trends require mathematical modeling in the form of Predictive Analytics, as the natural extension of Data Visualization. This, in turn, requires the integration of Predictive Analytics and interactive Data Visualization. Such integration will be accepted much more easily by business users and analysts if it requires no coding.

Mr. Cogswell discusses the need for, and the possibility of, such integration in his article below (Copyright ADVIZOR Solutions, 2014).


Integrating Predictive Analytics and Interactive Data Visualization WITHOUT any Coding!

It’s a new year, and many organizations are mulling over how and where they will make new investments. One area getting a lot of attention these days is predictive analytics tools. The need to better understand the present and predict what might happen in the future for competitive advantage is enticing many to look at what these tools can do. TechRadar spoke with James Fisher, who said that 85 percent of the organizations that have adopted these tools believe they have positively impacted their business.

Fast Fact Based Decision Making is Critical.

“Businesses are collecting information on their customers’ mobile habits, buying habits, web-browsing habits… The list really does go on,” he said. “However, it is what businesses do with that data that counts. Analytics technology allows organizations to analyze their customer data and turn it into actionable insights, in a way that benefits business.”

Business interest in predictive analytics is expected to keep growing well beyond this year: Gartner reported in early 2013 that approximately 70 percent of the best-performing enterprises will either manage or have a view of their processes with predictive analytics tools by 2016. By doing this, businesses will gain a better sense of what is happening within their own networks and corporate walls and of which actions could have the best impact, as well as increased visibility across their industries. This will give them situational awareness across the business, making operations much easier than in past years.

Simplicity and Ease of Use are Key.

Analytics is something every business should be figuring out. There are more software options than ever, so executives will need to figure out which solution will work best for them and their teams. According to InformationWeek’s Doug Henschen, the “2014 InformationWeek Analytics, Business Intelligence, and Information Management Survey” found that business users and salespeople need easy-to-use, visual data analytics that are intuitive and easily accessible from anywhere, at any time. These data visualization business intelligence tools can give a competitive edge to the companies adopting them.

“The demand for these more visual analytics tools leads to one of the biggest complaints about analytics,” he said. Ease-of-use challenges have crippled the utilization rate of this software.  But that is changing.  “Analytics and BI vendors know that IT groups are overwhelmed with requests for new data sources and new dimensions of data that require changes to reports and dashboards or, worse, changes to applications and data warehouses,” he wrote. “It’s no wonder that ‘self-service’ capabilities seem to be showing up in every BI software upgrade.”

A recent TDWI research report titled “Data Visualization and Discovery for Better Business Decisions” found that companies do have future plans focused on these analytics and on how they can use them. In fact, 60 percent said their organizations are currently using business visualization for snapshot reports, scorecards, or display. About one-third are using it for discovery and analysis, and 26 percent for operational alerting. However, companies are looking to expand how they use the technology: 45 percent are looking to adopt it for discovery and analysis, and 39 percent for alerts.

“Visualization is exciting, but organizations have to avoid the impulse to clutter users’ screens with nothing more than confusing ‘eye candy’,” Stodder wrote. “One important way to do this is to evaluate closely who needs what kind of visualizations. Not all users may need interactive, self-directed visual discovery and analysis; not all need real-time operational alerting.”

Data Visualization & Predictive Analytics Naturally Complement Each Other.

Effective data visualizations are designed to complement human perception and our innate ability to see and respond to patterns.  We are wired as humans to perceive meaningful patterns, structure, and outliers in what we see.  This is critical to making smarter decisions and improving productivity, and essential to the broader trend towards self-directed analysis and BI reporting, and tapping into new sources of data.

Visualization also encourages “storytelling” and new forms of collaboration.  It makes it really easy to not only “see” stories in data, but also to highlight what is actionable to colleagues. 

On the other hand, the human mind is limited in its ability to “see” very many correlations at once.  While visualization is great for seeing patterns across 2, or 4 or maybe 6 criteria at a time, it breaks down when there are many more variables than that.  Very few people are able to untangle correlations and patterns across, say, 15 or 25 or 75 or in some cases 300+ criteria that exist in many corporate datasets.

Predictive Analytics, by contrast, is not capacity-constrained! It uses mathematical tools and statistical algorithms to examine and determine patterns in one set of data . . . in order to predict behavior in another set of data. It integrates well with in-memory data and data visualization, and leads to faster and better decision making.
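To make “examine patterns in one set of data in order to predict behavior in another” concrete, here is a minimal sketch using a nearest-centroid classifier, chosen only because it is one of the simplest statistical learners; it is not what ADVIZOR or any other product is claimed to use. The data is synthetic: 25 explanatory fields, where the outcome depends on four of them at once, which is exactly the kind of multi-variable pattern that is hard to see in a chart.

```python
import random
from statistics import fmean


def fit_centroids(rows):
    """Learn one mean feature vector (centroid) per label from (features, label) rows."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


N_FEATURES = 25
rng = random.Random(42)


def make_row():
    """Synthetic record: the label depends on fields 0, 3, 7 and 12 together."""
    x = tuple(rng.gauss(0.0, 1.0) for _ in range(N_FEATURES))
    label = 1 if x[0] + x[3] - x[7] + x[12] > 0 else 0
    return x, label


history = [make_row() for _ in range(500)]    # "one set of data" to learn from
new_cases = [make_row() for _ in range(200)]  # "another set of data" to predict
centroids = fit_centroids(history)
accuracy = sum(predict(centroids, x) == y for x, y in new_cases) / len(new_cases)
print(f"accuracy on new cases: {accuracy:.0%}")
```

Nothing in the code cares whether there are 2 or 300 fields; the model is fitted on historical records and then applied to records it has never seen.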

Making it Simple & Delivering Results.

The challenge is that most of the predictive analytics software tools on the market require the end user to be able to program in SQL in order to prep data, and to have some statistics background to build models in R … or SPSS … or SAS. At ADVIZOR Solutions our vision has been to empower business analysts and users to build predictive models without any code or statistics background.


The results have been extremely promising: inquisitive and curious-minded end users with a sense for causality in their data can easily do this, and are turning around models in just a few hours. As a result, they are using data in new and powerful ways to make better business decisions.

Three Key Enablers to a Simple End-User Process.

The three keys to making this happen are: (1) having all the relevant data offloaded from the database or data mart into RAM, (2) allowing the business user to explore it visually, and (3) providing a really simple modeling interface.

Putting the data in RAM is key to making it easy to condition, so that the business user can create modeling factors (such as time lags, factors from data in multiple tables, etc.) without having to go back and condition the data in the underlying databases, which is usually a time-consuming process that involves coordinating with IT and/or DBAs.

Allowing the business user to explore it visually is key to generating and vetting hypotheses about what really matters before building and running models.
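A minimal sketch of conditioning data in memory: two hypothetical tables (lists of dicts standing in for tables already loaded into RAM) are joined to produce per-customer modeling factors, including a recency window. The table layouts, field names and the 90-day window are assumptions for illustration only, not any product’s data model.

```python
from collections import defaultdict
from datetime import date, timedelta

# Two hypothetical in-memory "tables", e.g. already loaded from a database.
customers = [
    {"customer_id": 1, "region": "East"},
    {"customer_id": 2, "region": "West"},
    {"customer_id": 3, "region": "East"},
]
purchases = [
    {"customer_id": 1, "day": date(2014, 11, 20), "amount": 40.0},
    {"customer_id": 1, "day": date(2014, 6, 1), "amount": 25.0},
    {"customer_id": 2, "day": date(2014, 12, 10), "amount": 90.0},
]


def build_factors(customers, purchases, as_of, window_days=90):
    """Derive per-customer modeling factors by joining two tables in memory.

    Factors (hypothetical examples): region from the customer table, plus the
    count and total amount of purchases in the `window_days` before `as_of`.
    """
    start = as_of - timedelta(days=window_days)
    count = defaultdict(int)
    total = defaultdict(float)
    for p in purchases:
        if start <= p["day"] < as_of:
            count[p["customer_id"]] += 1
            total[p["customer_id"]] += p["amount"]
    return {
        c["customer_id"]: {
            "region": c["region"],
            "recent_purchases": count[c["customer_id"]],
            "recent_amount": total[c["customer_id"]],
        }
        for c in customers
    }


factors = build_factors(customers, purchases, as_of=date(2015, 1, 1))
print(factors[1])  # {'region': 'East', 'recent_purchases': 1, 'recent_amount': 40.0}
```

Changing the window or adding a factor is a one-line change here, rather than a request to IT to alter the underlying database.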

Providing really simple interfaces that automate the actual statistics part of the process lets the business user focus on their data, not the statistics of the model.  That simple modeling process includes:

  • Select the Target & Base Populations
    • The “target” is the group you want to study (e.g., people who responded to your campaign)
    • The “base” is the group you want to compare the target to (e.g., everybody who received the campaign)
  • Visually Explore the data and develop Hypotheses
    • This helps set up which explanatory fields to include …
    • … and which additional ones may need to be added
  • Select list of Explanatory Fields
    • The “explanatory fields” are the factors in your data that might explain what makes the target different from other entities in your data
  • Build Model
  • Iterate
  • Understand and Communicate what the model is telling you
  • Predict / Score Base Population
  • Get lists of Scored potential targets
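The steps above leave the modeling technique open. A minimal sketch of the target/base workflow, assuming logistic regression (a common choice for response models, not necessarily what ADVIZOR uses) fitted by plain gradient descent; the campaign, the `tenure` and `visits` fields and the responses are all simulated.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(w, x):
    """Bias w[0] plus the dot product of the remaining weights with x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights [bias, w1, ..., wn] by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


# Base population: everybody who received a hypothetical campaign, with two
# standardized explanatory fields. Target: those who responded (simulated).
rng = random.Random(7)
base = []
for i in range(400):
    tenure, visits = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    responded = rng.random() < sigmoid(-1.0 + 0.5 * tenure + 1.5 * visits)
    base.append({"id": i, "fields": (tenure, visits), "target": responded})

# Build the model: P(member of target | explanatory fields).
weights = train_logistic([r["fields"] for r in base], [int(r["target"]) for r in base])

# Score the base population, then list the highest-scoring non-responders.
for r in base:
    r["score"] = sigmoid(linear(weights, r["fields"]))
prospects = sorted((r for r in base if not r["target"]),
                   key=lambda r: r["score"], reverse=True)
print("weights:", [round(w, 2) for w in weights])
print("top prospects:", [r["id"] for r in prospects[:5]])
```

Ranking only the non-responders by score is one reading of “Get lists of Scored potential targets”; a real project would also validate the model on held-out data before acting on the list.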

Check out how you can do this with no code in this 8 min YouTube video.

Best Done In-house with Your Team.

In our experience this type of work is best done in-house with your team.  That’s because it’s not a “black box”, it’s a process.  And since your team knows the data and its context better than anybody else, they are the ones best suited to discuss, interpret, and apply the results.  In our experience, over and over again it has been proven that knowing the data and context is the key factor  … and that you don’t need a statistics degree to do this.


Quick Example: Consumer Packaged Goods Sales.

In recent client work, a well-known consumer packaged goods company was trying to untangle what was driving sales. They had several key questions they were attempting to answer:

  • What factors drive sales?
  • How do peaks in incremental sales relate to the Social Media spikes?
    • For all brands
    • By each brand
  • How does it vary by media provider?  By type of post?
  • Can we use this data to forecast incremental sales? Which factors have the biggest impact?

They had lots of data, including sales by brand by week and a variety of potential influences: a variety of their own promotions, call center stats, social media posts, and sentiment mined from those social media posts (e.g., was the post “positive”, “neutral”, or “negative”). The key step in creating the right explanatory fields was developing time lags for each of these potential influences, since the impact on sales was not necessarily immediate: for example, positive Twitter posts this week may have some impact on this week’s sales, but more likely the impact will be on sales 1 week later, or maybe 2 weeks or 4 weeks later, etc.
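Creating the lagged explanatory fields is mechanical once the weekly data is in memory. A minimal sketch, assuming weekly series stored as plain Python lists; the function names, the lags (1, 2 and 4 weeks, matching the examples above) and the numbers are hypothetical, not the client’s data.

```python
def add_lags(weekly, lags=(1, 2, 4)):
    """Build lagged copies of a weekly series.

    Returns {lag: values} where values[t] is weekly[t - lag], or None when
    week t - lag falls before the start of the data.
    """
    return {
        lag: [weekly[t - lag] if t - lag >= 0 else None for t in range(len(weekly))]
        for lag in lags
    }


def lagged_rows(sales, posts, lags=(1, 2, 4)):
    """Pair each week's sales with lagged post counts, dropping weeks without enough history."""
    lagged = add_lags(posts, lags)
    rows = []
    for t, s in enumerate(sales):
        features = {f"posts_lag_{lag}": lagged[lag][t] for lag in lags}
        if None not in features.values():
            rows.append({"week": t, "sales": s, **features})
    return rows


# Hypothetical weekly data: positive posts per week and sales per week.
posts = [5, 3, 8, 2, 7, 4, 6]
sales = [100, 102, 98, 110, 104, 115, 101]
for row in lagged_rows(sales, posts):
    print(row)
# first row: {'week': 4, 'sales': 104, 'posts_lag_1': 2, 'posts_lag_2': 8, 'posts_lag_4': 5}
```

Each `posts_lag_*` column then becomes a separate explanatory field, so the model can tell whether this week’s posts matter more for sales next week or a month later. Weeks without enough history are dropped rather than filled.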

Powerful Results.

What we learned was that there were multiple influences and that their intensity varied by brand. Seasonality was no longer the major driver. New influences, including social media posts and online promotions, were now in the top spot. We also learned that the key influences can and should be managed. This was critical: there is a lag between, for example, a negative Twitter post and its impact on sales. As a result, a quick positive response to a negative post can heavily offset that negative post.

In Summary.

An easy-to-use data discovery and analysis tool that integrates predictive analytics with interactive data visualization, placed in the hands of business analysts and end users, can make a huge difference in how data is analyzed, how fast that can happen, and how the results are then communicated to and accepted by the decision makers in an organization.

And stay tuned. We’ll next be talking about the people side of predictive analytics: if there is now technology that lets you create and use models without writing any code, then what people skills and processes are required to do this well?