Uncertainty in data visualisation through the lens of statistical inference

Di Cook
Monash University

Visualising Uncertainty
Rostock Retreat
June 21, 2021

https://dicook.org/files/Rostock2021/slides.html
Image credit: Di Cook, 2018

Hello 👋🏻

Professor, Monash University, Melbourne, Australia

👉🏻 https://dicook.org/files/Rostock2021/slides.html

Motivation

What do you read from each display?

These images were motivated by work on the Australian Cancer Atlas. They show thyroid cancer incidence as a choropleth map (left) and as a new type of display, a hexagon tile map (right).

Kobakian and Cook (unpublished) https://github.com/srkobakian/experiment

What do you read from each display?

High thyroid cancer incidence is mostly located on the east coast.

High thyroid cancer incidence is evident around Brisbane, Sydney, Perth and in some inner-city Melbourne areas.

"Land doesn’t get cancer, people do"

We need to establish a new display for Australia that allows us to read the spatial distribution of values measured on people.

To test which design is better we are going to use the lineup protocol. Each slide shows 12 plots, numbered 1 through 12 at the top of each plot. One of the 12 is a data plot and the remaining 11 are null plots. Pick the plot that is most different from the others.

Write your choices down just for yourself, without sharing, for now.

Ready?

[Lineup slides: each shows 12 plots, numbered 1 through 12.]

Check your choices

The data plot is in these positions

page	location
8	6
9	3
10	8
11	5
12	9
14	3

Conducting visual inference

About the lineup protocol

Based on the statistical justice system

Compare the data plot (accused) with a population of null plots (innocents)
If the data plot (accused) is picked from the lineup, then reject the null hypothesis because it looks different (guilty).
The $p$-value is the probability that the accused would look this different (guilty) if they were actually not different (innocent).

Wickham et al (2010) IEEE TVCG

plot	question	null
Choropleth maps	Is there a spatial trend?	No relationship between location and statistic value
Tag cloud	Is this document the same as that document?	No difference in word counts
Treemap	Is the distribution within higher-level categories the same?	No difference in proportions within categories
Histogram	Is the underlying distribution smooth?	Distribution is smooth
Histogram	Is the underlying distribution bell-shaped?	Distribution is normal
Residual plot	Are residuals normally distributed?	Distribution is normal
Scatterplot	Are the two variables associated?	No association
Scatterplot, coloured	Are points clustered by colour?	No difference between coloured groups
Time series	Does the mean change over time?	Same mean over time
Time series	Does the variability change over time?	Same variability over time

A visual t-test

Nulls: samples simulated from same normal distribution
Null hypothesis: Both groups are samples from the same distribution

A visual t-test: take 2

Nulls: samples generated by permuting labels A, B
Null hypothesis: Both groups are samples from the same distribution
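A minimal sketch of the permutation approach in base R, using the ten observations from the x/y table in Procedure (1/5). The helper `mean_diff()` and the seed are illustrative assumptions, not part of the slides. Shuffling the labels keeps the measurements fixed but removes any link between group and value.

```r
# The ten observations shown in the Procedure (1/5) table
d <- data.frame(
  x = c("B", "B", "A", "A", "A", "B", "A", "B", "B", "B"),
  y = c(0.92, 2.18, -0.27, -3.92, -2.12, 1.73, -1.65, 1.81, 3.51, 3.60)
)

set.seed(2021)  # arbitrary seed, only for reproducibility

# One null data set: shuffle the labels A, B and keep y unchanged.
# Under the null hypothesis the labels carry no information about y,
# so any shuffle is as plausible as the observed labelling.
null_d <- transform(d, x = sample(x))

# Hypothetical helper: difference in group means (B minus A)
mean_diff <- function(df) {
  unname(diff(tapply(df$y, df$x, mean)))
}

obs_diff  <- mean_diff(d)       # difference in the observed data
null_diff <- mean_diff(null_d)  # difference in one permuted data set
```

Repeating the shuffle gives as many null data sets as the lineup needs; each is plotted with exactly the same plot specification as the data.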

Procedure (1/5)

x	y
B	0.92
B	2.18
A	-0.27
A	-3.92
A	-2.12
B	1.73
A	-1.65
B	1.81
B	3.51
B	3.60

ID	koala_NSW	koala_VIC	bilby_NSW	bilby_VIC
grey	23	43	11	8
cream	56	89	22	17
white	35	72	13	6
black	28	44	19	16
taupe	25	37	21	12

Variables in columns
Observations in rows

https://vita.had.co.nz/papers/tidy-data.pdf
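As a hedged illustration, the second table appears to store two variables (animal and state) in its column names. One way to bring it into the "variables in columns, observations in rows" form is sketched below in base R. The new column names `animal`, `state` and `count` are assumptions made for this example; the slides do not say what the `ID` values or the cell counts represent.

```r
# The wide table from the slide
wide <- data.frame(
  ID        = c("grey", "cream", "white", "black", "taupe"),
  koala_NSW = c(23, 56, 35, 28, 25),
  koala_VIC = c(43, 89, 72, 44, 37),
  bilby_NSW = c(11, 22, 13, 19, 21),
  bilby_VIC = c(8, 17, 6, 16, 12)
)

value_cols <- names(wide)[-1]

# Stack the four value columns on top of each other, column by column
long <- data.frame(
  ID    = rep(wide$ID, times = length(value_cols)),
  key   = rep(value_cols, each = nrow(wide)),
  count = unlist(wide[value_cols], use.names = FALSE)
)

# Split "koala_NSW" style names into two separate variables
parts <- strsplit(long$key, "_")
long$animal <- sapply(parts, `[`, 1)
long$state  <- sapply(parts, `[`, 2)
long$key <- NULL
```

Each row of `long` is now one observation (one ID, animal and state combination), which is the shape that a `ggplot()` aesthetic mapping expects.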

Procedure (2/5)

Use the grammar of graphics to define a plot

ggplot(aes(x = x, y = y, colour = x)) +
  geom_point() +
  stat_summary(fun = "mean", 
               shape = 4, size=1) +
  scale_colour_brewer("", palette="Dark2") +
  some_nice_plot_styling

Based on this mapping, what would be considered null (not interesting)?

Procedure (3/5)

Add data or null

data %>%
  ggplot(aes(x = x, y = y, colour = x)) +
  geom_point() +
  stat_summary(fun = "mean", 
               shape = 4, size=1) +
  scale_colour_brewer("", palette="Dark2") +
  some_nice_plot_styling

Procedure (4/5)

Hide the data among a field of nulls

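library(nullabor)
library(ggplot2)
library(dplyr)   # provides the %>% pipe

# Embed the true data among 9 null data sets, each made by permuting y;
# the .sample column identifies the panel
lineup(null_permute("y"), n = 10,
       true = data) %>%
  ggplot(aes(x = x, y = y, colour = x)) +
  geom_point() +
  stat_summary(fun = "mean",
               shape = 4, size = 1) +
  scale_colour_brewer("", palette = "Dark2") +
  facet_wrap(~.sample) +
  some_nice_plot_styling   # placeholder for theme settings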
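To make the idea of hiding the data concrete, here is a base R sketch of what assembling a lineup involves. It is an illustration under stated assumptions, not nullabor's implementation: the variable names `m`, `pos`, `panels` and `lineup_d` are made up for this example, and only the `.sample` column name follows the code above.

```r
# The ten observations from the Procedure (1/5) table
d <- data.frame(
  x = c("B", "B", "A", "A", "A", "B", "A", "B", "B", "B"),
  y = c(0.92, 2.18, -0.27, -3.92, -2.12, 1.73, -1.65, 1.81, 3.51, 3.60)
)

set.seed(2021)        # arbitrary seed, only for reproducibility
m   <- 10             # number of plots in the lineup
pos <- sample(m, 1)   # randomly chosen position of the data plot

panels <- lapply(seq_len(m), function(i) {
  panel <- d
  if (i != pos) {
    # Null plot: permute y, breaking any association with x
    panel$y <- sample(panel$y)
  }
  panel$.sample <- i  # panel identifier, used for facetting
  panel
})

# One long data frame: m panels stacked, ready for facet_wrap(~.sample)
lineup_d <- do.call(rbind, panels)
```

The position `pos` stays hidden from observers, so the only way to pick out the data plot is for it to look different from the nulls.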

Procedure (5/5)

Ask uninvolved, independent observers to pick the plot that is most different from the others in the lineup. Under the null hypothesis, the chance that any observer chooses the data plot is 1/10 (or $1/m$ generally, where $m$ is the number of plots in the lineup).

Suppose you had 23 observers, and 8 of them chose the data plot as the most different.

pvisual(x=8, K=23, m=10)

##      x simulated       binom
## [1,] 8    0.0099 0.001229827
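The `binom` column can be reproduced with base R, assuming the observers choose independently and each picks the data plot with probability $1/m$ under the null. With $X \sim \text{Binomial}(K, 1/m)$, the $p$-value is $P(X \ge x)$. The variable name `p_binom` is chosen here for illustration.

```r
x <- 8    # observers who picked the data plot
K <- 23   # total number of observers
m <- 10   # number of plots in the lineup

# P(X >= x) = P(X > x - 1) for X ~ Binomial(K, 1/m)
p_binom <- pbinom(x - 1, size = K, prob = 1 / m, lower.tail = FALSE)
p_binom
```

This agrees with the `binom` value printed above (about 0.00123). The `simulated` column comes from a different calculation and is not reproduced by this sketch.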

How do we know it works?

Validation experiment

Majumder et al (2013) conducted a validation study comparing the performance of the lineup protocol, as assessed by human evaluators, with the classical test, using subjects recruited through Amazon's Mechanical Turk.

http://datascience.unomaha.edu/turk/exp2/index.html

Power analysis of human evaluation relative to classical test.

How can it be used to compare plot designs?

Power to compare plot design

Hofmann et al (2012) show how power, interpreted as the proportion of observers who detect the data plot, can be compared across plot designs to establish which design is better.

The paper covers how to make the power calculations and generate confidence intervals for the power of different designs.
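As a rough sketch of the idea, power for each design can be estimated as the proportion of observers who picked the data plot, with an exact binomial confidence interval from base R's `binom.test()`. The counts below are hypothetical and only illustrate the calculation; they are not results from any study, and the analysis in Hofmann et al (2012) may use a different method.

```r
# Hypothetical counts, for illustration only
detected <- c(design_A = 14, design_B = 6)   # observers who picked the data plot
shown    <- c(design_A = 20, design_B = 20)  # observers who saw the lineup

# Estimated power: proportion of observers detecting the data plot
power_hat <- detected / shown

# Exact (Clopper-Pearson) 95% confidence interval for each design
ci <- t(sapply(names(detected), function(design) {
  binom.test(detected[[design]], shown[[design]])$conf.int
}))
colnames(ci) <- c("lower", "upper")

cbind(power = power_hat, ci)
```

A design whose interval sits clearly above the other's would be the better one for revealing that structure, under the assumption that observers respond independently.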

Back to the maps experiment

Thyroid cancer incidence in Australia

Experimental design

Factor: choropleth, hexagon tile map
Structure: NW to SE trend, hotspot in 3 cities, hotspot in all cities

Null: Simulation from a variogram model to model spatial dependence, using gstat package (144 null sets)
Replicates: Four
Lineups: 12 plots in a lineup
Data plot: Trend added to null, for one plot in a lineup

Displays: Two sets, with coin flip for data displayed as choropleth or hexagon tile map, sets A and B
Subjects: 42 subjects for set A, and 53 for set B
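One reading of these numbers, sketched in R: three structures crossed with four replicates give 12 lineups, and 12 plots per lineup give the 144 null sets. This is an assumption about how the counts fit together, not a description taken from the experiment's code; the variable names are illustrative.

```r
structure_type <- c("NW to SE trend", "hotspot in 3 cities", "hotspot in all cities")
replicate_id   <- 1:4

# One row per lineup: every structure appears in every replicate
lineups <- expand.grid(structure = structure_type, replicate = replicate_id)
n_lineups <- nrow(lineups)

plots_per_lineup <- 12

# Each plot starts from its own simulated null set; in each lineup,
# one of them has the trend added to become the data plot
n_null_sets <- n_lineups * plots_per_lineup
n_null_sets
```

Under this reading, each of the 12 lineups is then shown as a choropleth in one set (A or B) and as a hexagon tile map in the other.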

Results

How to do this yourself

Get a copy of the nullabor package

# Released version from CRAN
install.packages("nullabor")

# Or the development version from GitHub
# install.packages("remotes")
remotes::install_github("dicook/nullabor")

Look at the "Get started" documentation at http://dicook.github.io/nullabor/index.html

Statistical inference architecture

is built to measure uncertainty

Group A1 Data vis challenge

Which display of uncertainty is better?

Thanks for listening!

Here's what I hope you take away from this talk:

Uncertainty means what might change if you had a different sample
This is what statistical inference addresses
Plots can be embedded into an inferential framework
Crowd-sourcing can help with conducting inference with plots
Visual inference might be helpful to test your new ideas for adding indications of uncertainty into your displays

Additional reading

Buja et al (2009) Statistical Inference for Exploratory Data Analysis and Model Diagnostics, RSPT A
Wickham et al (2010) Graphical Inference for Infovis, TVCG
Hofmann et al (2012) Graphical Tests for Power Comparison of Competing Designs, TVCG
Majumder et al (2013) Validation of Visual Statistical Inference, Applied to Linear Models, JASA
Yin et al (2013) Visual Mining Methods for RNA-Seq data: Examining Data structure, Understanding Dispersion estimation and Significance Testing, JDMGP
Zhao et al (2014) Mind Reading: Using An Eye-tracker To See How People Are Looking At Lineups, IJITA
Lin et al (2015) Does host-plant diversity explain species richness in insects? Ecological Entomology
Roy Chowdhury et al (2015) Using Visual Statistical Inference to Better Understand Random Class Separations in High Dimension, Low Sample Size Data, CS
Loy et al (2017) Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners, JCGS
Roy Chowdhury et al (2018) Measuring Lineup Difficulty By Matching Distance Metrics with Subject Choices in Crowd-Sourced Data, JCGS
Vanderplas et al (2020) Testing Statistical Charts: What Makes a Good Graph? ARSIA
Vanderplas et al (2021) Statistical significance calculations for scenarios in visual inference. Stat.

Acknowledgements

Slides created via the R package xaringan, with wattle theme created from xaringanthemer.

The chakra comes from remark.js, knitr, and R Markdown.

Slides are available at https://dicook.org/files/Rostock2021/slides.html and supporting files at https://github.com/dicook/Rostock2021.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
