Excel as a Business Intelligence Platform – Part 2
Author: Ugur Kadakal, Ph.D., CEO and founder of Pagos, Inc.
Dos&Don’ts of Building Successful Dashboards in Excel
In previous article (see “Excel as BI Platform” here) I discussed Excel’s use as a Business Intelligence platform and why it is exceedingly popular software among business users. In this article I will talk about some of the principles to follow when building a dashboard or a report in Excel.
One of the greatest advantages of Excel is its flexibility: it puts little or no constraints on the user’s ability to create their ideal dashboard environments. As a result, Excel is being used as a platform for solving practically any business challenge. You will find individuals using Excel to solve a number of business-specific challenges in practically any organization or industry. This makes Excel the ultimate business software.
On the other hand, this same flexibility can lead to errors and long term maintenance issues if not handled properly. There are no constraints on data separation, business logic or the creation of a user interface. Inexperienced users tend to build their Excel files by mixing them up. When these facets of a spreadsheet are not properly separated, it becomes much harder to maintain those workbooks and they become prone to errors.
In this article, I will discuss how you can build successful dashboards and reports by separating data, calculations and the user interface.
It is very important to keep data separate from the rest of your dashboard. The data that is being used by calculations or the user interface has to reside unmodified somewhere in your workbook. There should be no formulas or user interface controls in the data sheets. How this works also highly depends on how data is imported into your workbook. Let’s talk about some of the approaches of handling data in Excel dashboards.
There are situations where data is updated infrequently and can be fit into a workbook without causing portability issues. These are situations where data does not need to be updated more often than (approximately) once a week. In this case, manually extracting the data from other systems and placing it into a workbook can be a manageable task. However, some organizations may not have the leeway to manually update data every week. It may even be more expensive than fully automating that process.
Another major factor in having static data in a workbook is the data size and the portability of the file. Excel has limitations on embedding data. These limitations depend on the version of Excel. For example, Excel 2003 has a row limit of 65,536. You cannot embed a larger data set into a single worksheet in Excel 2003. The alternative is to break the data into multiple worksheets, which isn’t recommended as it may lead to complex calculation logic that may become too difficult to manage. The other alternative is to place that data in a pivot cache. While a pivot cache can handle much larger data sets, they are limited to using formulas available in pivot tables. In Excel 2007, the row limitation went up to 1,048,576. This provides 16x more space than Excel 2003, which is a significant improvement. However, as you get closer to embedding data sets this large, your workbook size will likely grow into double or triple digits (in MB). It will get more difficult to copy those workbooks and share them with colleagues. Even opening those files will be excruciatingly slow. Needless to say, I do not advise embedding data sets of this size into Excel. If your Excel file is already bigger than 50MB because of data, it is a good time to think about other ways to handle your data.
Note that Microsoft has significantly increased Excel’s capabilities in handling large data sets in Excel 2010 with the introduction of PowerPivot. Combined with better data compaction and superior performance of its new columnar, in-memory database engine, it allows users to handle much larger data sets at a faster pace than its predecessors. Data Analysis Expressions (DAX) was also introduced in Excel 2010 and provides an extensive library of formulas that can be used in PowerPivot data sets. While DAX is not as rich as Excel’s native formula library, it is still rich enough for competent Excel users to get creative by combining DAX and native formulas to achieve their goals.
Despite these inefficiencies, the majority of enterprises were/are still using Excel 2003 or 2007 at the time that this article was written. While most enterprises have plans to migrate to Windows 7 and Office 2010, those migrations take an extremely long time. That’s why we are basing our recommendations in this article on Excel 2007.
Excel has extensive capabilities in querying data from external data sources via OLEDB or ODBC. This gives you two advantages: (1) it keep your data outside of your workbook which will effectively reduce your workbook size and (2) it will allow you to refresh data at any time, which eliminates the necessity of manual data updating. There are two approaches when bringing data into a workbook from external data sources. The first is to import the whole data set and use native Excel features (PivotTables, filters, etc.) and formulas to produce the necessary results. This approach works fine if the data being placed in Excel is not too large. But if you are querying large data sets, it is important to bring only necessary data and perform all filtering in the database by passing filter parameters from the user interface into your SQL statements. This requires advanced knowledge of SQL.
Naturally, importing external data is not for novice Excel users. It requires an extensive understanding of SQL languages and data structures. As your source data gets larger and more complex, the SQL expertise required in writing those queries goes up. Nevertheless, it will allow you to connect to the most commonly used data sources and to query large data sets that can then be used inside of the workbook.
One of the biggest advantages of Excel is its ability to build complex business logic through a visual, easy-to-use grid interface. This is also important when it comes to building dashboards and reports. Once the source data is embedded in a workbook, business users can easily build formulas and calculate various parameters and key performance indicators (KPIs) that are vital when it comes to decision making.
But it is important to follow certain principles when building those formulas in order to create successful dashboards and reports. It is extremely important to separate yours calculations from the rest of the workbook. In the same manner as data was isolated into an individual worksheet, all calculations should be isolated from the rest of the workbook by placing them in different worksheets as well.
Another important principle is to use named ranges when referencing data inside of data sheets. This will make the application easier to maintain and audit. Note that some Excel experts disagree with any usage of named ranges. I recommend using named ranges in dashboards as much as possible despite the fact that it will create some extra work when updating the data.
Excel offers two main approaches to building calculations: (1) using VBA and (2) using native worksheet formulas. Stay away from building your formulas in VBA. If it is possible, always use native formulas to build your calculations. Using VBA will create an additional layer of complexity to your workbooks. It will make your workbooks harder to maintain and audit.
Excel has an extensive library of native worksheet formulas. The formula library is so rich that there are often multiple formulas or formula combinations that can get you to the same results. Some of those formulas are more preferable than others. Formulas such as SUMIFS, COUNTIFS and AVERAGEIFS are an easy ways to create complex multicolumn calculations. They can be used as anything from pivot table replacements to pattern matching utilities to text sorting tools. However, they are not available prior to Excel 2007. If you are using Excel 2003, their single criteria counterparts (SUMIF, COUNTIF and AVERAGEIF) are available.
Another important category is database formulas (such as DSUM and DCOUNT) if your dashboard requires complex queries. They can be used as alternatives to the formulas mentioned above and can make the creation of calculations that reference large data sets significantly more efficient.
Reference formulas, such as LOOKUP (all variations), INDEX and MATCH, can also be important tools in your repertoire when building dashboards. They will allow you to easily sort through large data sets and find results based on input parameters that are selected by the user. We recommend staying away from volatile reference formulas such as INDIRECT unless you have no other choice. They will make it harder to maintain your dashboards, especially if frequent structural changes are necessary.
The user interface is the third main facet in any BI application. Excel offers extensive capabilities in building the user interface of your dashboard or report. The user interface be broken into two parts: (1) the user controls (which is where the user makes selections and expects the visual and data portion of the dashboard to get updated) and (2) the data visualization (which includes features such as data tables, pivot tables and charts).
By definition, dashboards are interactive. They require users to interact with data in order to drill down and identify trends in order to make intelligent business decisions. The user interactivity can be built into your dashboard using form controls. They offer a visually appealing and easy-to-use interface for your dashboards. In the same manner as the calculations page, stay away from VBA and ActiveX controls. Always use simpler form controls and use cell references and formulas to handle connectivity to your calculation sheets. I also recommend keeping all input ranges for your form controls in a separate worksheet. Keep cell references in the same sheet. It will make it easier when connecting those references with your calculations and will, subsequently, make the maintenance of your dashboard easier. This can be seen in the example below:
The first figure shows the user interface portion of the dashboard. “Customer Support” is selected inside of the Department filter. The second screen shows the sheet where input ranges and cell references are placed. Note that cells C4:C11 contains input ranges for that control while cell C1 contains the cell reference. Note that form controls always return the ID. You need to place an INDEX formula to produce its description.
Pivot Tables and Pivot Charts
Pivot tables and pivot charts can be an important part of your dashboard. They provide tremendous flexibility when interacting with a large data set. Through the use of features like rows, columns, data, filters and slicers (supported in Excel 2010), you can create advanced dashboards.
Note that features such as pivot tables or pivot charts have their own built-in controls and user interactivity. While they are automatically created when you insert a pivot table or a pivot chart, connecting those controls to the rest of your dashboard requires additional formula building. You should always be mindful that those controls only update the pivot that they belong to and will not synchronize with other modules.
Charts are great additions to your dashboard. They offer invaluable features in terms of data visualization. While standard charts do not offer interactivity in the same manner as pivot charts do, one can build similar interactivity by connecting form control selections to chart data through creative formulas. By doing so, you can create charts from results and KPIs in your calculation sheets.
Conditional Formatting and Sparklines
Conditional formatting is another powerful feature that can provide invaluable visualizations for your dashboard. The addition of color scales and data bars in Excel 2007 in combination with icon sets allow users to build complex dashboards. Another important feature that I consider in the same category is the Sparkline chart.
While Excel is the most commonly used BI tool in the market today, it is also prone to errors. This becomes especially true when used by inexperienced users who are not familiar with the aforementioned principles. In this article, I tried to address how best to avoid those risks to create successful dashboards and reports.
Over the past decade, we have experienced a great technological shift towards online collaboration thanks to the ever-expanding Internet. Like with any software, web-based access and collaboration is required in BI. While Excel is a very powerful software for BI, it is still desktop-based and lacks collaboration features. In the next article, I will talk about how you can web-enable your Excel-based dashboards and collaborate with your colleagues.