
Going beyond 2D and 3D to visualise higher dimensions, for ordination, clustering and other models

Di Cook
Monash University

vISEC
June 22, 2020

https://dicook.org/files/vISEC2020/slides_tourr.html






Image credit: Gentoo Penguins, Wikimedia Commons

1 / 46

Outline

  • Getting started: tourr, spinifex, geozoo
  • What is a tour?
  • Different types of tours
  • Interpreting what you see
  • Saving your tour plot
2 / 46

Getting set up

3 / 46

tourr

install.packages("tourr")
help(package="tourr")
library("tourr")

Implements geodesic interpolation and basis generation functions that allow you to create new tour methods from R.

4 / 46

spinifex

install.packages("spinifex")
help(package="spinifex")
library("spinifex")

Implements manual control, where the contribution of a selected variable can be adjusted between -1 and 1, to examine how sensitive the structure in the data is to that variable. The result is an animation where the variable is toured completely into and out of the projection.

5 / 46

geozoo

install.packages("geozoo")
help(package="geozoo")
library("geozoo")

Geometric objects defined in 'geozoo' can be simulated or displayed in the R package 'tourr'.

6 / 46
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] geozoo_0.5.1 spinifex_0.2.0 tourr_0.5.6
## [4] xaringanthemer_0.3.0
##
## loaded via a namespace (and not attached):
## [1] sysfonts_0.8.1 digest_0.6.25 showtextdb_3.0 bitops_1.0-6
## [5] magrittr_1.5 evaluate_0.14 xaringan_0.16 rlang_0.4.6
## [9] stringi_1.4.6 rmarkdown_2.3 tools_4.0.1 stringr_1.4.0
## [13] showtext_0.8-1 xfun_0.14 yaml_2.2.1 compiler_4.0.1
## [17] htmltools_0.5.0 knitr_1.28
7 / 46

Grab the runthis.R file from https://github.com/dicook/vISEC2020

in the skills_showcase folder. (Or the slides_tour.Rmd for everything!)

8 / 46

Get some new data

9 / 46
remotes::install_github("allisonhorst/palmerpenguins")
library(tidyverse)
library(palmerpenguins)
penguins <- penguins %>% filter(!is.na(bill_length_mm))

See https://allisonhorst.github.io/palmerpenguins/ for more details.

10 / 46
Image credits: Adélie, Gentoo and Chinstrap penguins, Wikimedia Commons
11 / 46
library(ochRe)
ggplot(penguins,
       aes(x=flipper_length_mm,
           y=body_mass_g,
           colour=species,
           shape=species)) +
  geom_point(alpha=0.7,
             size=2) +
  scale_colour_ochre(
    palette="nolan_ned") +
  theme(aspect.ratio=1,
        legend.position="bottom")

12 / 46

Our first tour

13 / 46
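# Map each species to one of three ochRe colours
clrs <- ochre_pal(
  palette="nolan_ned")(3)
col <- clrs[
  as.numeric(
    penguins$species)]
# Grand tour of the four measurement variables (columns 3 to 6)
animate_xy(penguins[,3:6],
           col=col,
           axes="off",
           fps=15)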

14 / 46

What did you see?

15 / 46
  • clusters ✅
  • outliers ✅
  • linear dependence ✅
  • elliptical clusters with slightly different shapes ✅
  • separated elliptical clusters with slightly different shapes ✅
16 / 46

What is a tour?

A grand tour is by definition a movie of low-dimensional projections constructed in such a way that it comes arbitrarily close to showing all possible low-dimensional projections; in other words, a grand tour is a space-filling curve in the manifold of low-dimensional projections of high-dimensional data spaces.

$x_i \in \mathbb{R}^p$ is the $i$th data vector.

$F$ is a $p \times d$ orthonormal basis, $F^\top F = I_d$, where $d$ is the projection dimension.

The projection of $x_i$ onto $F$ is $y_i = F^\top x_i$.

The tour is indexed by time, $F(t)$, where $t \in [a, z]$. The starting and target frames are denoted $F_a = F(a)$ and $F_z = F(z)$.

The animation of the projected data is given by the path $y_i(t) = F(t)^\top x_i$.
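To make the notation concrete, here is a minimal base R sketch of a single projection. The helper name `random_basis` and the choice of the built-in `iris` measurements as example data are assumptions made for illustration; they do not come from the slides.

```r
# Minimal sketch: one random 2D projection of 4D data, using base R only.
set.seed(2020)

# Hypothetical helper: a random p x d orthonormal basis via QR decomposition.
random_basis <- function(p, d = 2) {
  qr.Q(qr(matrix(rnorm(p * d), nrow = p, ncol = d)))
}

# Data matrix X (n x p); each row is one x_i. Standardise the variables
# so that no single variable dominates the projection.
X <- scale(as.matrix(iris[, 1:4]))
p <- ncol(X)

basis <- random_basis(p, d = 2)   # plays the role of F

# Orthonormality check: F'F should be the d x d identity matrix.
round(crossprod(basis), 10)

# Project all observations at once: row i of Y is y_i = F' x_i.
Y <- X %*% basis
dim(Y)
```

A tour repeats this projection step for a smoothly changing sequence of bases, which is what `F(t)` denotes.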

17 / 46

Geodesic interpolation between planes

The tour is indexed by time, $F(t)$, where $t \in [a, z]$. The starting and target frames are denoted $F_a = F(a)$ and $F_z = F(z)$.

The animation of the projected data is given by the path $y_i(t) = F(t)^\top x_i$.
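The following is a rough base R sketch of interpolating between two planes along a geodesic, using principal angles obtained from a singular value decomposition. It is an illustration under stated assumptions, not the tourr implementation: the function names `random_basis` and `geodesic_frames` are hypothetical, and the sketch assumes the two planes do not share a direction (all principal angles are non-zero).

```r
# Sketch: geodesic interpolation between two 2D planes in 4D (base R only).
set.seed(1)

# Hypothetical helper: random p x d orthonormal basis.
random_basis <- function(p, d = 2) qr.Q(qr(matrix(rnorm(p * d), p, d)))

geodesic_frames <- function(Fa, Fz, n_steps = 10) {
  sv <- svd(crossprod(Fa, Fz))       # Fa'Fz = U D V'
  Ga <- Fa %*% sv$u                  # principal directions in the start plane
  Gz <- Fz %*% sv$v                  # principal directions in the target plane
  cos_tau <- pmin(sv$d, 1)           # cosines of the principal angles
  tau <- acos(cos_tau)
  # Part of each target direction orthogonal to its start direction,
  # scaled to unit length (assumes every principal angle is non-zero).
  W <- Gz - sweep(Ga, 2, cos_tau, "*")
  W <- sweep(W, 2, sqrt(colSums(W^2)), "/")
  lapply(seq(0, 1, length.out = n_steps), function(s) {
    # Rotate each principal direction through a fraction s of its angle.
    G <- sweep(Ga, 2, cos(s * tau), "*") + sweep(W, 2, sin(s * tau), "*")
    G %*% t(sv$u)                    # express in the start frame's orientation
  })
}

Fa <- random_basis(4)
Fz <- random_basis(4)
frames <- geodesic_frames(Fa, Fz, n_steps = 10)
```

Every intermediate frame stays orthonormal. The path starts exactly at `Fa` and ends in the plane spanned by `Fz`, although the final frame may be rotated within that plane relative to `Fz`.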

18 / 46

A grand tour is like a random walk (with interpolation) through the space of all possible planes.

19 / 46

Let's take a look at some common high-d shapes with a grand tour

20 / 46

4D spheres

Hollow

Solid
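As a minimal sketch of how such point clouds can be simulated, the base R code below draws points on the surface of a 4D sphere and inside it. The geozoo package supplies ready-made generators for these shapes; the version here is a hand-rolled illustration, and the variable names are assumptions.

```r
# Sketch: simulate hollow and solid 4D spheres in base R.
set.seed(4)
n <- 500
p <- 4

# Hollow: normalise standard normal vectors to unit length, which gives
# points uniformly distributed on the surface of the sphere.
Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
hollow <- Z / sqrt(rowSums(Z^2))

# Solid: shrink each surface point by a random radius. Taking the p-th root
# of a uniform variable spreads points uniformly through the volume.
radius <- runif(n)^(1 / p)
solid <- hollow * radius
```

Either matrix can then be viewed with a grand tour, for example `tourr::animate_xy(hollow, axes = "off")`.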

21 / 46

4D cubes

Hollow

Solid

22 / 46

Others

Torus

Mobius

23 / 46

Reading axes - interpretation

Length and direction of axes relative to the pattern of interest

24 / 46

25 / 46

26 / 46

Reading axes - interpretation

27 / 46

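Gentoo is separated from the others by a contrast of fl (flipper length) and bd (bill depth)

Chinstrap is separated from the others by a contrast of bl (bill length) and bm (body mass)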

28 / 46

There may be multiple and different combinations of variables that reveal similar structure. ☹️

The tour can help to discover these, too. 😂

29 / 46

Other tour types

  • guided: follows the optimisation path for a projection pursuit index.
  • little: interpolates between all variables.
  • local: rocks back and forth from a given projection, so shows all possible projections within a radius.
  • dependence: two independent 1D tours
  • frozen: fixes some variable coefficients, others vary freely.
  • manual: control the coefficient of one variable, to examine the sensitivity of structure to this variable. (In the spinifex package)
  • slice: use a section instead of a projection.
30 / 46

guided tour

new target bases are chosen using a projection pursuit index function

31 / 46

$\text{maximize}_{F}\; g(F^\top x)$ subject to $F$ being orthonormal

  • holes: This is an inverse Gaussian filter, which is optimised when there is not much data in the center of the projection, i.e. a "hole" or donut shape in 2D.
  • central mass: The opposite of holes, high density in the centre of the projection, and often "outliers" on the edges.
  • LDA/PDA: An index based on the linear discriminant dimension reduction (and penalised), optimised by projections where the named classes are most separated.
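To illustrate what an index function `g` computes, here is a base R sketch of the holes and central mass indexes in one commonly used form. The exact normalisation is an assumption and may differ in detail from what tourr implements; the function names `holes_index` and `cmass_index` are hypothetical. Both assume the projected data has been standardised.

```r
# Sketch: holes and central mass indexes for projected data Y (n x d).
holes_index <- function(Y) {
  d <- ncol(Y)
  m <- mean(exp(-0.5 * rowSums(Y^2)))   # average Gaussian weight at each point
  (1 - m) / (1 - exp(-d / 2))           # large when few points sit near the centre
}

cmass_index <- function(Y) {
  d <- ncol(Y)
  m <- mean(exp(-0.5 * rowSums(Y^2)))
  (m - exp(-d / 2)) / (1 - exp(-d / 2)) # large when many points sit near the centre
}

set.seed(32)
n <- 1000

# A ring (donut) with unit variance in each coordinate ...
angle <- runif(n, 0, 2 * pi)
ring <- sqrt(2) * cbind(cos(angle), sin(angle))

# ... compared with an ordinary 2D Gaussian blob.
blob <- matrix(rnorm(2 * n), nrow = n, ncol = 2)

c(ring = holes_index(ring), blob = holes_index(blob))
```

In this form the two indexes sum to one, so a projection that scores highly on holes necessarily scores poorly on central mass. A guided tour searches over `F` for projections that increase the chosen index.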
32 / 46

Grand

Might accidentally see best separation

Guided, using LDA index

Moves to the best separation

33 / 46

manual tour

control the coefficient of one variable, reduce it to zero, increase it to 1, maintaining orthonormality
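The idea can be sketched for the simplest case, a 1D projection, in base R. This is an illustration only: spinifex handles 2D projections with a different construction, and the helper `set_contribution` is a hypothetical name. When one coefficient is set to a chosen value, the remaining coefficients are rescaled so the projection vector keeps unit length.

```r
# Sketch: manual control of one variable's coefficient in a 1D projection.
set_contribution <- function(f, k, value) {
  # f: unit-length projection vector; k: integer position of the variable;
  # value: desired coefficient for that variable, between -1 and 1.
  stopifnot(abs(value) <= 1, abs(f[k]) < 1)
  out <- f
  out[k] <- value
  # Rescale the other coefficients so the squared coefficients sum to 1.
  out[-k] <- f[-k] * sqrt((1 - value^2) / (1 - f[k]^2))
  out
}

# Start from a projection where all four penguin variables contribute equally.
f <- c(bl = 0.5, bd = 0.5, fl = 0.5, bm = 0.5)

# Move bl (position 1) out of the projection, back to its start, then fully in.
steps <- lapply(c(0, 0.5, 1), function(v) set_contribution(f, 1, v))
steps
```

With the value at 0 the variable is removed from the projection; with the value at 1 it is the only variable contributing.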

34 / 46

Manual tour

  • start from best projection, given by projection pursuit
  • bl contribution controlled
  • if bl is removed from the projection, Adelie and Chinstrap are mixed
  • bl is important for Adelie

35 / 46

Manual tour

  • start from best projection, given by projection pursuit
  • fl contribution controlled
  • clusters are less separated when fl is fully contributing
  • fl is important, in small amounts, for Gentoo

36 / 46

Local tour

Rocks from and to a given projection, in order to observe the neighbourhood
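One way to picture the neighbourhood of a projection is to generate target bases that are small perturbations of the starting basis. The base R sketch below does this by adding a little noise and then taking the nearest orthonormal matrix. It is not how tourr builds a local tour; the helper names and the step size `eps` are assumptions for illustration.

```r
# Sketch: nearby target bases around a given projection (base R only).
set.seed(37)

# Nearest orthonormal matrix to A, from its singular value decomposition.
nearest_orthonormal <- function(A) {
  sv <- svd(A)
  sv$u %*% t(sv$v)
}

# Perturb the starting basis slightly, then restore orthonormality.
nearby_basis <- function(start, eps = 0.05) {
  noise <- matrix(rnorm(length(start)), nrow(start), ncol(start))
  nearest_orthonormal(start + eps * noise)
}

start <- qr.Q(qr(matrix(rnorm(8), nrow = 4, ncol = 2)))
targets <- replicate(5, nearby_basis(start), simplify = FALSE)

# Frobenius distance of each target from the starting basis.
sapply(targets, function(B) norm(B - start, type = "F"))
```

A local tour would alternate between `start` and targets like these, so the view rocks out to a nearby projection and back again.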

37 / 46

Projection dimension and displays

38 / 46

How do I use tours

39 / 46
  • Classification:
    • to check assumptions of models
    • to examine separations between groups
    • determine variable importance
    • examine boundaries
    • random forest diagnostics vote matrix
  • Dimension reduction
    • go beyond 2 PCs
    • work with much higher dimensional data
    • check for nonlinear dependencies
40 / 46
  • Clustering
    • examine shape of clusters
    • separation between clusters
    • compare cluster solutions
    • view the dendrogram in data space
  • Compositional data
    • shapes and clusters in a simplex
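As a small sketch of the "go beyond 2 PCs" use listed under dimension reduction, the base R code below keeps as many principal components as are needed to reach a chosen share of the variance, ready to be passed to a tour. The built-in `iris` data and the 99% threshold are arbitrary choices for illustration.

```r
# Sketch: keep more than two principal components for touring.
X <- as.matrix(iris[, 1:4])
pca <- prcomp(X, scale. = TRUE)

# Cumulative proportion of variance explained by the components.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of components reaching the (arbitrary) 99% threshold.
k <- which(var_explained >= 0.99)[1]
pcs <- pca$x[, seq_len(k), drop = FALSE]
dim(pcs)
```

When `k` is larger than 2, a scatterplot of the first two components hides part of the retained variation, and a tour such as `tourr::animate_xy(pcs)` can show the rest.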
41 / 46

Saving for publication

Method 1, using plotly (see reading axes code chunk):

  1. Generate each frame and index it, collecting all frames in one big array
  2. Make one big ggplot with all frames overplotted, mapping the otherwise unused aesthetic frame to your index
  3. Pass to ggplotly
  4. Save to html using htmltools::save_html()

or try using

spinifex::play_tour_path()
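The four steps of the plotly method can be sketched as follows. This is a simplified illustration, not the code from the reading axes chunk: it uses the built-in `iris` data and independent random bases, so the frames jump rather than move smoothly as interpolated tour frames would. The names `random_basis` and `tour_frames` and the output file name are assumptions.

```r
# Sketch: build indexed frames, then animate them with plotly.
set.seed(42)
X <- scale(as.matrix(iris[, 1:4]))

# Hypothetical helper: random p x d orthonormal basis.
random_basis <- function(p, d = 2) qr.Q(qr(matrix(rnorm(p * d), p, d)))

# Step 1: generate each frame and index it, stacked into one data frame.
n_frames <- 5
tour_frames <- do.call(rbind, lapply(seq_len(n_frames), function(i) {
  Y <- X %*% random_basis(ncol(X))
  data.frame(x = Y[, 1], y = Y[, 2], group = iris$Species, frame = i)
}))

# Steps 2 to 4 need ggplot2, plotly and htmltools, so run them only in an
# interactive session where those packages are installed.
if (interactive() &&
    requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("plotly", quietly = TRUE) &&
    requireNamespace("htmltools", quietly = TRUE)) {
  library(ggplot2)
  # Step 2: one ggplot with all frames overplotted; `frame` is not a ggplot2
  # aesthetic, but plotly uses it to split the animation.
  p <- ggplot(tour_frames, aes(x = x, y = y, colour = group, frame = frame)) +
    geom_point() +
    theme(aspect.ratio = 1)
  # Step 3: convert to an animated plotly object.
  pg <- plotly::ggplotly(p)
  # Step 4: save as a standalone HTML page.
  htmltools::save_html(pg, file = "tour.html")
}
```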
42 / 46

Saving for publication

Method 2, using gifski and tourr::render_gif(). See lots of code chunks!
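A minimal sketch of this method is shown below, assuming the tourr and gifski packages are installed. The data set, the number of frames and the output path are arbitrary choices for illustration, and argument defaults may differ between tourr versions.

```r
# Sketch: render a short grand tour to an animated GIF.
if (requireNamespace("tourr", quietly = TRUE) &&
    requireNamespace("gifski", quietly = TRUE)) {
  library(tourr)
  # Write to a temporary directory so nothing is left in the working folder.
  gif_path <- file.path(tempdir(), "tour.gif")
  render_gif(as.matrix(iris[, 1:4]),   # numeric data to tour
             grand_tour(),             # tour path
             display_xy(),             # 2D scatterplot display
             gif_file = gif_path,
             frames = 20)
}
```

The resulting file can be embedded in a web page or in slides like any other GIF.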

43 / 46

Summary

We can learn a little more about the data if we have a tour in the toolbox. It can help us to

  • understand dependencies between multiple variables
  • examine the shapes of clusters
  • detect outliers
44 / 46

Thanks

Slides created via the R package xaringan, with iris theme created from xaringanthemer.

The chakra comes from remark.js, knitr, and R Markdown.

Slides are available at https://dicook.org/files/vISEC20/slides_tourr.html and supporting files at https://github.com/dicook/vISEC2020.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

46 / 46
