Posts

In 1997, I wrote a paper that was accepted for the newly fledged Journal of Statistical Software. That article is still available at https://www.jstatsoft.org/article/view/v002i06. It looks nothing like the original published paper. Papers today are published only in PDF format, unlike the originals, which were delivered as HTML, with PDF as a secondary format to help readers who preferred a printed copy. My paper was titled “Calibrate your Eyes to Recognise High-Dimensional Shapes from their Low-Dimensional Projections”.

CONTINUE READING

Multicollinearity: This was one of the comments from a recent review of a paper: “As you note in the paper, it seems likely that there are still issues with multi-collinearity.” Multicollinearity means that the observations are collinear in some combination of the variables. This has been relaxed in practice to mean substantial association between explanatory variables. When your explanatory variables have substantial association between them, it means that you don’t have a stable base on which to build a model.
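
To make "substantial association between explanatory variables" concrete, here is a minimal sketch that is not taken from the post. It assumes just two explanatory variables, in which case the variance inflation factor (VIF) of each reduces to 1 / (1 - r^2), where r is their correlation. The data values and function names are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two explanatory variables.

    With only two predictors, regressing one on the other gives
    R^2 = r^2, so the VIF is 1 / (1 - r^2).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Illustrative (made-up) data: x2_assoc nearly duplicates x1, x2_weak does not.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_assoc = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x2_weak = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]

vif_assoc = vif_two_predictors(x1, x2_assoc)
vif_weak = vif_two_predictors(x1, x2_weak)
print(f"VIF with strong association: {vif_assoc:.1f}")
print(f"VIF with weak association:   {vif_weak:.2f}")
```

A large VIF means the variance of the coefficient estimates is inflated, which is one way to quantify the unstable base described above. With more than two explanatory variables, each VIF would instead come from regressing that variable on all of the others.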

CONTINUE READING

useR! 2018 was the first useR! conference to be held in the southern hemisphere, and the feedback from participants has been very positive. I have been asked to write about the organisation, and this is a good way to get some of the planning, decisions and operations into print, so that it might be useful for others charged with conference organisation. There are a lot of people who made the conference a success, and their contributions need to be acknowledged.

CONTINUE READING

Goal: I just gave a short talk at ISCB-ASC 2018 about visualising high-dimensional data, which involves showing dynamic graphics. In the past, I have run the tour, captured the window, saved it as a movie, and embedded this in the R Markdown xaringan slides. It seems a bit discombobulated to make the slides this way, and a better way to work would be to make a tour animation using plotly. It took me two days to get this working, because of little mistakes that were not easy to debug by googling the problem.

CONTINUE READING

Download your data: You can get access to your own electricity and gas usage data from https://www.citipower.com.au/our-services/myenergy. You will need a copy of your power bill, which has your smart meter number and meter ID, to register for an account. Reading the data: The data structure is described here. The data is not especially nicely formatted (surprise). The main components are: the time resolution is half-hourly, and the values for each day are spread across the columns.
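
Half-hourly values spread across the columns mean one row per day with 48 readings, which usually needs reshaping into one record per reading before plotting. The sketch below is hypothetical and does not reproduce the actual file layout: it assumes a date string in `YYYYMMDD` form and a list of 48 readings already pulled out of a row, and it assumes each reading is stamped with the start of its half-hour interval.

```python
from datetime import datetime, timedelta

def day_row_to_long(date_str, readings):
    """Turn one day's row of half-hourly readings into (timestamp, value) pairs.

    date_str: day in YYYYMMDD form (assumed format, not confirmed by the post).
    readings: 48 values in time order, one per half hour.
    """
    start = datetime.strptime(date_str, "%Y%m%d")
    return [
        (start + timedelta(minutes=30 * i), value)
        for i, value in enumerate(readings)
    ]

# Made-up example row: a date plus 48 fake usage values in kWh.
row_date = "20180101"
row_values = [0.1 * (i % 4) for i in range(48)]

long_records = day_row_to_long(row_date, row_values)
for timestamp, kwh in long_records[:3]:
    print(timestamp.isoformat(), kwh)
```

Once every day is in this long form, the records can be concatenated and summarised by hour of day or day of week.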

CONTINUE READING

In this assignment, the focus was to practice data cleaning. Students suggested questions to build a class survey, to get to know the interests of other class members, and then completed the composed survey. After cleaning the data, a few summary plots of interesting aspects of the data were made. There are some common mistakes that rookies often make when constructing data plots: packing too much into a single graphic, leaving categorical variables unordered, reversing norms for response and explanatory variables, conditioning in the wrong order, plotting counts when proportions should be the focus, not normalizing by counts, and using a boxplot for a small sample size.

CONTINUE READING

I’m sitting watching cricket tonight, the first day of the Australia vs West Indies Boxing Day test. Just now, a video of retired batsman Chris Rogers being honored was played, along with a plot of his batting record, shown on screen similar to the one below: Howzat? What are they trying to show? What’s the data in this plot? Is it a bar chart? A histogram? What does color mean?

CONTINUE READING

This week I have been visiting the Department of Statistical Sciences at Cornell University. This is the home of many venerable statisticians. At first sight it appears that statisticians are spread all over the university, and technically they are because funding comes from many directions, but almost all are actually located in a suite in Comstock Hall. Professor Paul Velleman is one of the pioneers of data-centrist thinking about statistics. He produced the software called DataDesk in the early 90s that some saw as rivaling LispStat and particularly JMP for introductory statistics classes.

CONTINUE READING

This week I have been visiting the new Center for Statistics and Applications in Forensic Evidence. The center involves four universities, CMU, ISU, UC-Irvine, U. Virginia, and is a NIST Center of Excellence. The kickoff event occurred over Oct 26-27 at ISU, organized by Center Director, Professor Alicia Carriquiry. The speaker list included Barry Scheck (Co-Founder, The Innocence Project), Jo Handelsman (The White House Office of Science & Technology Policy), Philip Dawid (Emeritus Professor of Statistics, University of Cambridge), Anil Jain (Michigan State University) and Stephen Fienberg (CMU).

CONTINUE READING

Spent a couple of hours this morning talking at the http://it.monash.edu/data-science workshop organised by Michael Brand from Monash University. Good questions, good discussion.

Here is a link to my slides.

CONTINUE READING