Monday, April 28, 2014

Making a Reusable Tableau Table Contents

There are lots of examples in the Tableau Community of how to build and use tables of contents (TOC) for navigating between worksheets and dashboards. Lately there has also been some discussion on the Tableau forums and ideas sections about making this easier. Whenever I read these I think of Tbeleau Co-Foiunder Pat Hanrahan's oft quoted maxim, "Easy is hard," or "Easy is hard, right?" Then I immediately appreciate the hard work all the gracious Tableaurati share to make all our lives easier and simpler. What that in mind, I have been thinking about a flexible way to do Tables of Contents and navigation controls. Here are my thoughts. I used the Super Store data source that comes with Tableau.  The workbook I came up with is on Tableau Public and is embedded in this post. It is not a finished, shiny thing, but just enough to illustrate the concepts.

The Challenge

During a challenge developing TOCs and navigational controls for a use case in my day job, I wondered if it would be possible to use a separate data source for the TOC that had no direct relationship to the the primary and any secondary data that underlies a visualization. One thing that makes this idea attractive is that the data source sits alone. No calculated fields in a primary or secondary source. So, colleagues who have access to the underlying data won't have reason to ask, and I won't have to answer, this question, "What is that "Go to xxxx" column in the data, and why is it populated for every record?"

  • Make the TOC data source independent of any primary or secondary data sources.
  • Make the contents of the TOC labels easy to manage and change.
  • Allow the data source for the TOC to support TOC navigational controls.
  • Provide a way to incorporate standard tool tips or flexible ones, depending on the use case. This includes the ability to insert data from a view or dashboard into the tool tips.

The first step is building a framework for the database.  In this example, I was in a hurry, so I just made up a little .txt file that looks like this:

This works perfectly fine with a table you can put together with your favorite data source type (Excel, Access, SQL based table, etc.)

The item field will be used to filter the text label of the TOC entry.
The text field is the actual text label that is displayed in the TOC entry. Note that the text for item 7 is "View ToC". 
The tooltip field is an optional field that holds any predetemined instructional text associated with a TOC entry.
The optional type field identifies what kind of use the record is intended to fill.

The example listing is just that, an example. This data source can have as few or as many records as necessary. Obviously, it can change on the fly, as well, as more TOC entries or instructional tool tips are required.

The second step is to connect to this data in Tableau using the appropriate connector. In order to make this Tableau data source available to others and facilitate it's standardization, I extracted it.

The third step is to make the individual TOC entries. Follow this process:
  1. Select this data connection, then start a new worksheet(Ctrl+M).
  2. The item' field in the data source is a record id. To see how the ids relate to the TOC text, drag item, text, and type to the Row shelf to make a reference table.
  3. Start another new worksheet. Drag the item dimension to the Filter shelf, and select just one of the values, for example, 2.
  5. Drag the 'text' dimension to the Text shelf.
  6. To help with formatting, I added a calculated dimension named 'header', and set its literal value to ''.
  7. Drag the 'header' dimension to the Row and Column shelves.  Right click on one of these and uncheck Show Header.
  9. Choose Format | Borders from Tableau's menu, then set Row and Column Dividers as desired.
  10. Chose Format | Shading, then set the Default Worksheet shading as desired.
  11. Choose Format | Font, then set the Default Worksheet font as desired.
  12. Duplicate the worksheet to preserve this formatting.
  13. On this duplicated worksheet, edit the item dimension filter on the Filter shelf by selecting another single one of the values. This will change the text displayed in the new worksheet.
  14. Repeat step 10 and 11 in this list as necessary. Be sure to include a worksheet that filters to the View ToC item, in this case, filter item to 7.
The fourth step is to make the Table of Contents dashboard, as follows:
  1. Make a new dashboard and set the height large enough to hold all the TOC entries. Give the dashboard a name, like "Table of Contents" or "Navigation Menu"
  2. Add a vertical layout container to the dashboard.
  3. Add the individual TOC entry worksheets to the vertical layout container in the order needed. See if you can spot the joke in the dashboard image below.
  4. Add filter actions to the dashboard that target the dashboards or worksheets of interest. For each action, be sure to clear the check boxes for the component sheets on any target dashboard. Also, set the target filters to Selected Fields, but add no filters.
  5. Set Run action on to Select.

The fifth step is to add the View ToC worksheet to any dashboard that requires a one click return to the Tableau of Contents dashboard itself. Each of the sample dashboards in the embedded workbook have this sheet included. Here is an example:

Since each TOC entry is a worksheet of its own, each can have its own tool tip if necessary. One way to facilitate this is to include the text of any tool tip in the TOC data source. In this example data source, the field. 'tooltip" serves this purpose.  For example, here is the data source that drives the TOC content. Note that the 'tooltip' field has the value 'Go to Bar Graph Dashboard'.

The Tableau Public workbook has some other thoughts in it about how to use this independent data source idea. 
Please offer your comments or suggestions for improvement.


First I did some research, checking out a number of table of contents and navigational controls examples from Tableau Zen Masters Joe Mako, Steve Wexler, Craig Bloodworth, Andy Kriebel, Ryan Robataille, Kelly Martin, Jonathan Drummey, Ramon Martinez, Alan Smithee and others. The most common solution was one with which I was familiar -- usng calculated fields to hold text that could be used in conjunction with shape marks (arrows, custom shapes, etc.) to give instructions via tooltips, and fire filter actions to control navigation to dashboards and worksheets in various ways. This method is described in Tableau's Knowledgebase article, Creating a Table of Contents to Navigate to Other Dashboards. But this in not quite what I wanted. Then Matt Francis posted his very cool article titled Create a default TableauTemplate - now with added 8.1 goodness. Matt got me thinking about the possibility of adding a generic data source to the template that could serve as a framwork for doing TOCs, navigational controls, help buttons, and other things.

Thursday, September 27, 2012

Clarifying the No-Kill Shelter Stats

Recently, Cincinnati City Beat published an article online and in print about the no-kill animal shelter movement as an alternative to mass euthanasia. The article included this infographic

It caught the eye, but on further inspection seemed to me a bit confusing and misleading. So, in this post I explain why and offer some alternatives.

Four-leggeds vs Humans

Among other facts, the article reports the magnitude of the current situation. It specifies the number of adopters who will use various methods of adopting, i.e., Swingers (can be persuaded to adopt from a shelter or rescue), Definitely using a shelter or rescue, Definitely using a breeder. So these are the adopters, not the pets. The article also mentions the unfortunate evidence that 4 million pets are "unadopted" from shelters and eventually euthanized, that is, pets not people. The graphic tries to portray the quantitative differences between the types of adopters and the number of euthanized pets using scaled abstract images of pets for each case. Two different messages, but the same metaphor -- not the best approach. When I first looked at this image in the printed version, I figured it was trying to communicate that 17 million large dogs, 5 million small dogs, 1.5 million rabbits, and 4 million cats were adopted from shelters. Then I read the captions on each pet image, but still wondered why pet images. Brain pain. There had to be a clearer way of communicating this message.

Not for the color blind

Adding insult to injury, maybe, the image seemed to me to be not optimized for color blind viewer, or even for viewers wanting a relaxing visual experience. To test this, I submitted the images to the Vischeck web site. It has a detailed info page with links to other sites like this one for research and testing.

Naomi Robbins recently talked about this subject, and Vischeck in her Forbes blog.

Here are the Vischeck results:

Deuteranope (a form of red/green color deficit)

Protanope (another form of red/green color deficit)
Tritanope (a blue/yellow color deficit - very rare)

Interestingly, the rare person with tritanope color deficit would at least see that the three pets associated with adoption intentions are in the same hue family and should perhaps be considered together, while the cat image related to euthanasia is a completely different hue. But most folks with color deficit will get little to no visual clues from the colors used.

Some Alternatives

The point of visual data analysis and data visualization is data sensemaking. It is usually not necessary to invent new schemes or metaphors to make the data come alive. For one thing, more often than not, these new inventions or trendy displays obscure the message or story in the data. Also, it is very hard to do easy well. And, the new-fangled stuff usually hurts the brain, big time. So, what that leaves is using visual displays that incorporate best practices, something Tableau Software does by default. And, this applies to data journalism as much as it does for hard-core statistical analytics. Stephen Few has a very comprehensive discussion on this subject in Criteria for Evaluating Visual EDA Tools. The sensemaking task with the pet adoption data is basically a Part-to-Whole and Ranking Displays exercise. Few discusses this in detail in Chapter 8 of his popular book Now you see it: Simple Visualization Techniques for Quantitative Analysis. In the Tableau dashboard display below I present four alternatives to the menagerie version published by City Beat.

The crosstab shows all the data, both the raw data and percents of total. Maybe that is all that is necessary. It is clear which intention is most common. It also does not mix adoption intentions with euthanized pets. Notice the caption note explaining the Swing Adopter.

The three graphs use Tableau's color blind palette, which yields this color perception result when passed through Vischeck:

The pie chart shows very little. However, in this case, since the three intentional statements have very different response levels, the pie seems to work. However, the pie chart is inherently messy, and should be avoided. Again, I have tried to mitigate this by providing tooltips that fill the the information gaps -- categorical labels, percents, and notes.

The stacked bar chart shows the raw cumulative raw counts of adopters segmented by intention. It takes up very little space. It is possible to gauge the relative prevalence of the intentions compared to each other. Hover over any segment to see both its raw count and percent of total adopters (a bit more information and somewhat less direct). Again, the information about euthanized pets is not directly available. Notice, too, that the tool tip for Swing Adopters includes some additional explanation about swing adopters.

The bar chart in the lower right hand corner most closely illustrates a best practice for this analytical question. Bars encode values as lengths, a pre-attentive characteristic easily handled by the human visual system. They are much more suitable for encoding differences in the value of a single measure across one or more dimensions. In the example, like the other alternatives, all the relevant data is directly available to the viewer, or just a tooltip away.

Bottom line, even though a question or some data you want to visualize and share is simple, you should make every effort to communicate the story in an unambiguous manner, being respectful of your audience's brains. First, do no harm.

Thursday, September 20, 2012

The 47% is not really the 47%, is it?

Recently US media outlets reported on presidential candidate Mitt Romney's musing that voters with no federal tax liability would likely vote for incumbent President Barack Obama. Subsequently Simon Rogers at the Guardian Data Blog in the UK published an investigation of state-by-state variability in this percentage. The piece also explored other indicators, like % on medicare, % without medical insurance, and so on.

Simon used a mashup that involved Google Maps. It seemed to me that without much effort the same or similar could be done with Tableau. So, starting with the data link in the Guardian article, I ended up with the following viz. Easy peasy for Tableau.

The dashboard design challenges that presented themselves were allowing 1) the viewer to select a measure to review, and 2) managing the legends so that they respected the conceptual features of the Guardian version. To start with, I decided to try to accomplish these objectives in Tableau without changing the data structure. The date presented by the Guardian is a typical pivoted Excel table, with a row for every state followed by many columns of measures -- one for each subject of interest. Not ideal. The discussion below outlines the basic approach that other Tableau practitioners have documented in blog post.

Changing the measure

The Guardian version uses a drop down box above the Google map that allowed the viewer to select a single measure of interest. To accomplish this in Tableau requires using a parameter and at least one calculated field. The parameter holds the choices, which are string surrogates for the measure columns in the data. The calculated field(s) hold values for the measure of interest, and set titles. The parameter, named "Show this...", looks like this:

A CASE calculated field, called "This indicator", has the following definition:

case [Show this ...]
when 'No tax liability' then [The 36%: percent of tax filers with no liability]
when 'Poverty' then [The 15%: percent living in poverty]
when 'Tax returns over $1 million' then [The 1%: percent of tax returns over $1m]
when 'Veterans' then [The 7%: percent of population who are veterans]
when 'On Medicare' then [The 15%: percent of population on medicare]
when 'Without medical insurance' then [The 16%: percent without medical insurance]
when 'Unemployed (July, 2012)' then [The 8%: percent unemployed (July 2012)]
when 'Over 65 years old' then [The 13%: population aged over 65]

A second CASE calculated field, called "Indicator Legend Label", looks like this:

case [Show this ...]
when 'No tax liability' then 'The 36%: percent of tax filers with no liability'
when 'Poverty' then 'The 15%: percent living in poverty'
when 'Tax returns over $1 million' then 'The 1%: percent of tax returns over $1 million'
when 'Veterans' then 'The 7%: percent of population who are veterans'
when 'On Medicare' then 'The 15%: percent of population on medicare'
when 'Without medical insurance' then 'The 16%: percent without medical insurance'
when 'Unemployed (July, 2012)' then 'The 8%: percent unemployed (July 2012)'
when 'Over 65 years old' then 'The 13%: population aged over 65'

It sets the title above the map to correspond the to measure selected from the parameter drop-sown list.

Standardizing the legends

In the Guardian version, regardless of the measure chosen, the thematic map uses four or five classes, with a consistent color range for the legend. The class boundaries change depending on the range of values in the data for the selected measure.

To accomplish this in Tableau requires some planning and manual color legend adjustments. First, I made another CASE style calculated field, called "color legends" as follows, based on the value of the Show this... parameter:

case [Show this ...]
when 'No tax liability' then
if [This indicator] <=25 then '0 to 25'
elseif [This indicator] <=30 then '25 to 30'
elseif [This indicator] <=35 then '30 to 35'
elseif [This indicator] <=40 then '35 to 40'
elseif [This indicator] >40 then '40 or more'
when 'Poverty' then
if [This indicator] <=5 then '5 or less'
elseif [This indicator] <=10 then '5 to 10'
elseif [This indicator] <=15 then '10 to 15'
elseif [This indicator] <=20 then '15 to 20'
elseif [This indicator] >20 then '20 or more'

when 'Tax returns over $1 million' then
if [This indicator] <=.01 then '0.0 to 0.01'
elseif [This indicator] <=.12 then '0.01 to 0.12'
elseif [This indicator] <=.24 then '0.12 to 0.24'
elseif [This indicator] <=.5 then '0.24 to 0.50'
elseif [This indicator] >.5 then '0.50 to 0.60'

when 'Veterans' then
if [This indicator] <=6 then '0 to 6'
elseif [This indicator] <=7 then '6 to 7'
elseif [This indicator] <=8 then '7 to 8'
elseif [This indicator] <=9 then '8 to 9'
elseif [This indicator] >9 then '9 or more'

when 'On Medicare' then
if [This indicator] <=8 then '0 to 8'
elseif [This indicator] <=11 then '8 to 11'
elseif [This indicator] <=14 then '11 to 14'
elseif [This indicator] <=17 then '14 to 17'
elseif [This indicator] >17 then '17 to 25'

when 'Without medical insurance' then
if [This indicator] <=5 then '0 to 5'
elseif [This indicator] <=10 then '5 to 10'
elseif [This indicator] <=15 then '10 to 15'
elseif [This indicator] <=20 then '15 to 20'
elseif [This indicator] >20 then '20 or more'

when 'Unemployed (July, 2012)' then
if [This indicator] <=5 then '0 to 5'
elseif [This indicator] <=7 then '5 to 7'
elseif [This indicator] <=9 then '7 to 9'
elseif [This indicator] <=11 then '9 to 11'
elseif [This indicator] >11 then '11 or more'

when 'Over 65 years old' then
if [This indicator] <=10 then '0 to 10'
elseif [This indicator] <=12 then '10 to 12'
elseif [This indicator] <=14 then '12 to 14'
elseif [This indicator] <=16 then '14 to 16'
elseif [This indicator] >16 then '16 or more'


Next, I dropped this calculated field onto the color shelf. Then, one at a time, I changed the measure of interest, which changed the color shelf so that it only included the classes associated with that measure. For each measure, I set the color of each class such that the lowest ordered values always had the same color RGB value, and so on through each class member for each measure.

The map and tooltips

The map itself is a filled map mark type. It uses the Gray color scheme, with only the Base and Country Borders options turned on, and Washout set to 0%. This provides the cleanest presentation of state boundaries.

The elements called out in the tooltip:

are on the Level of Detail shelf to make them available to the tooltip editor. The look and feel come from Tableau's RTF editor controls.

Download the workbook to learn more.

Wednesday, July 18, 2012

Lazy Countries without the Lollipops

Today (July 18, 2012) the Guardian Data Blog asked the question, "Which are the laziest countries on earth?". The data they used is from the British medical journal The Lancet. To visualize the data, the Guardian used Tableau Public. You can see the interactive work below:

This visualization uses a chart type dubbed a 'Lollipop Chart' by Andy Cotgreave when he worked with the DataStudio (). Andy identified the features of this techique this way:
1. Can be used when all dimension members have high values (i.e. long/tall bars in a bar chart)
2. Greatly reduces the data-ink ratio while maintaining a clear link to axis labels
3. All the users I’ve shown it to so far have really engaged with it – they think it’s both pretty and easy to read

I also like the fact that it works if you add more dimensions to make small multiples:

The first question I had when I saw the Guardian's outing of the laziest countries was, "How do countries within a region stack up against each other?" I also wondered if gender sorted differently for the countries within a region. Then there was the whole issue of averages and variability. As it was published, the Guardian visualization didn't get me there for any of these questions.

One of the hurdles for me was the lollipops themselves. In this example, they take up way too much space for my liking. So, I downloaded the workbook from Tableau Public and made some changes.

One change I made was to use a dot plot instead of a lollipop chart. This saved two lines for each country pane, and showed me more quickly the differences by gender within a country. I also added a filter for the region, so I could see all the countries in a region at the same time instead of having to scroll and remember. Thirdly, I added a parameter to make sorting flexible and interactive. Finally, I added a multi-dimensional strip plot so I could see the distribution of values for each country, one gender at a time.

Combining all these changes together results in the following interactive exploratory analytical tool. Play around with it and let me know if you get more insight about the laziest countries.

Tuesday, July 17, 2012

Sometimes the pie slices = more than 100% - Yummy!

Earlier today I received a message to look at this "cool pie chart" from the web site. The chart is supposed to show the percent of respondents to a survey who answered a question about actions ERs could do to reduce repeat ER visits by recently discharged patients. They gave away this image for free, encouraging readers to copy and use it in their own blogs or web sites, with complete attribution. Or, you could buy the report that includes this chart for $127.

The chart was followed by this explanation:

HIN's Reducing Avoidable ER Visits e-survey conducted in October 2011 captured how 134 healthcare organizations are working to staunch the flow and expense of avoidable ED use and point low-acuity patients in the direction of appropriate care. According to survey respondents, some effective strategies to reduce avoidable readmissions by the recently discharged are:

· Phone follow-up within 2 days: 49 percent
· Primary care physician visit within 3-5 days: 44.9 percent
· Medication review: 38.8 percent
· Home visit: 34.7
· Notify primary care physician of discharge: 30.6 percent

Not good. Don't do this.

Pies are for dessert (apple or pumpkin, please), even if there are only two slices (one for me - say 80%, and one for you - say 20%). Otherwise we are going to fight over whose piece is bigger, because the human brain doesn't visually process angles well at all.

In this case though, the larger offense to the viewers and their brains is that you are supposed to suspend what you know about pies (all the slices taken together are supposed to equal the whole pie - 100%). Here the pie is apparently larger than the sum of its parts. What the author is really trying to convey, we think, is that the question was multi-select, so respondents could choose all that apply. So it is not a parts to the whole problem. To convey the percent of respondents who answered each choice, an ordered bar chart does the trick, or simply a sorted text table. Here is how an ordered bar chart might look. It conveys all the information or the original with no cognitive load (extra processing effort) in a smaller space. And the little Best Practice threshold parameter let's the author and consumers of the data clarify their POVs interactively if necessary. It is clear which interventions are more recommended than others, and by how much -- for example, a phone follow-up at discharge had nearly twice as many mentions as offering a provider appointment at discharge. We can deduce this even without the value labels, because the human brain is much better at deciphering differences in length.

Comments are welcome.

Monday, June 4, 2012

Finding WiFi Locations in NYC :: 2010 Data

Sometime this month, I have to travel to NYC, and will need access to WiFi. Hopefully, it will be free. But where to start looking? Well, the NYC Open Data program happens to have a 2010 vintage list of WiFi locations in the whole city. Each entry has complete name and address information, along with lat/long. Free vs Fee is another one of the dimensions of the data. Here is what I did with the data in short order, using Tableau Public.

The data set includes a City field, which I thought would have five entries, one for each Borough. However, it seems to be a collection of the Borough names along with neighborhood names. The city 'New York' is really Manhattan Borough, which is of interest to me for this trip. There are quite a few listings there. The other Borough names are in the list of Cities, but contain few locations each, serving, I suspect, as a catch all when no neighborhood is appropriate.

Sunday, June 3, 2012

Exploring Johan Santana's No-hitter

Saturday, June 2, 2012, ESPN More Sports' Stats & Info site carried a special look at Johan Santana's history making no-hitter in a New York Mets - St Louis Cardinal's match-up. The story included a playing field visualization of each out in the game, using Tableau Public. The visualization places a circle at the location of each out. The circles are color-coded by type of play (fly out, ground out, strike out). The inning in which an out occurred is encoded by varying the size of the circle; all outs in inning one supposedly one size, inning two, slightly larger, and so on). Take a look at the image below:

Nice work, but I felt a need for more interaction. What if you want to see quickly all the outs in a particular inning - not easy or simple enough. The hold-over tool tip requires too much effort to decode. The color encoding in the original conforms to Tableau's automatic best practice, but a more color-blind friendly palette might be as effective. Finally, the size encoding is based on the use of a numeric measure for inning, 0.0 to 9.0, when really these are discrete values (more of a dimension/category).

So, I did a rework, shown below. It includes a simple box score, so to speak, identifying each out in each inning, by type. Now the viewer can immediately see how the inning went for Santana, with almost no cognitive load. The box score has instructions telling the viewer to select an inning header to highlight all the inning's outs on the playing field viz. Better interaction. Also, you can select more than one inning, and compare them to one another. The size encoding on the field view is improved for the innings. Size is not really the best encoding here, because the larger circles might suggest "bigger" or "more" or "better", which is not the case here, but it works in a pinch.

When I showed this rework to the author, his comment was, "all we need now is an asterisk for the disputed foul-ball call at 3rd base in the 6th inning." So, back to the authoring board I went to take care of that. For the disputed call, a conditional text field makes a footnote appear in the tooltip when you hover over the box score or playing field marks for the groundout; it reminds that the previous pitch resulted in a hit that was called foul, but replay showed was fair. The footnote does not appear when you hover anywhere else. At the same time, I improved the tooltip, so that all the information one needed was in a simple color coded sentence, based on the play type. This kind of tooltip flexibility is very easy to do in Tableau and is widely applicable.

And baseball fanatic colleague Matt Booher suggested adding links to video clips to the out-by-out strip at the top, for key plays. So, now, if you hover over game out #16 (6th inning out #1) or game out #22 (8th inning out #1), you will see links to the MLB video clips for these outs. A new browser window opens in each case.

Try out the viz below, and think of a "Yes, and ..." that might apply to your work and data.