For your info
This tutorial and the included exercises are currently under construction. If you want to run the analyses, please compile
RevBayes
from the dev-coalescent branch as described here (for the development branch). For plotting with theR
packageRevGadgets
, please install the package from the “dev_pop_size” branch using devtools:devtools::install_github("revbayes/RevGadgets", ref = "dev_pop_size")
.
This tutorial describes how to run a demographic analysis using the coalescent process in RevBayes
.
Demographic inference is about estimating population dynamics and in this tutorial, we will specifically focus on population size estimation.
The coalescent process provides a flexible way of estimating population size trajectories through time.
We are primarily interested in asking:
(1) what the population size for our given population was?
and (2) how population sizes have changed over time, e.g., increased or decreased towards the present or experienced a bottleneck.
The input data are usually sequence alignments or estimated trees.
In case of sequence data, trees and population size parameters will be jointly estimated.
Here, we consider different types of analysis, starting with a constant demographic history and doing more complex analyses in later exercises.
The first analyses will consider data from isochronous samples, i.e., data from samples that have all been collected at the same time.
In the second part of the exercises, you will be asked to also analyze data from heterochronous samples, i.e., samples that have been collected at different points in time.
The data for these analyes are taken from Vershinina et al. (2021).
The following table provides an overview of different coalescent models. The tutorials column links directly to the respective tutorial with isochronous data. The tutorial on heterochronous data shows how to apply the models to data from different points in time.
Model | Description | Demographic function | Prior on the population sizes | Interval change point method | Multiple loci analysis possible? | Tutorial |
---|---|---|---|---|---|---|
Constant | A simple coalescent model with one constant population size. | constant | individually chosen | - | Yes | Constant |
Linear | A coalescent model with a linear population size trajectory. | linear | individually chosen | - | Yes | |
Exponential | A coalescent model with an exponential population size trajectory. | exponential | individually chosen | - | Yes | |
Bayesian Skyline Plot without autocorrelation | A coalescent model with a piecewise constant population size trajectory. The different pieces (intervals) are not correlated. The times of interval change are dependent on coalescent events (“event-based”). | constant | individually chosen, independent and identically distributed (iid) | - | Yes | Skyline |
Bayesian Skyline Plot with autocorrelation (Drummond et al. 2005) |
A coalescent model with a piecewise constant population size trajectory. The different pieces (intervals) are correlated, each population size is drawn from an exponential distribution with the mean being the previous population size. The times of interval change are dependent on coalescent events (“event-based”). | constant | $\theta_i | \theta_{i-1}$ (exponential distribution with mean $\theta_{i-1}$) | event-based | No, only single loci | Little description included in skyline Example script |
Extended Bayesian Skyline Plot (Heled and Drummond 2008) |
A coalescent model with a piecewise constant or linear population size trajectory. The different pieces are not correlated. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). The number of changes and thus the number of intervals is determined by stochastic variable search. | constant | independent and identically distributed (iid) | event-based - number and position of change points estimated | Yes | Little description included in skyline Example script |
Skyride (Minin et al. 2008) |
A coalescent model with a piecewise constant population size trajectory. The different pieces are correlated via a Gaussian Markov Random Field (GMRF) Prior, the degree of smoothing is determined by a precision parameter. The times of interval change are dependent on coalescent events (“event-based”). | constant | $\theta_i | \theta_{i-1}$ (GMRF Prior with precision parameter) | event-based | No, only single loci | Little description included in skyline Example script |
Skygrid (Gill et al. 2012) |
A coalescent model with a piecewise constant population size trajectory. The different pieces are correlated via a Gaussian Markov Random Field (GMRF) Prior and are equally spaced in time, the degree of smoothing is determined by a precision parameter. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). Can be considered as extension of the skyride model. | constant or linear | $\theta_i | \theta_{i-1}$ (GMRF Prior with precision parameter) | equally sized | Yes | |
Gaussian Markov Random Field (GMRF) Prior with interval change points independent from coalescent events | A coalescent model with a piecewise constant or linear population size trajectory. The different pieces are correlated via a Gaussian Markov Random Field (GMRF) Prior. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). | constant or linear | $\theta_i | \theta_{i-1}$ (GMRF Prior) | specified - here equally sized | Yes | With sequences as input data: GMRF With trees as input data: GMRF treebased |
Horseshoe Markov Random Field (HSMRF) Prior with interval change points independent from coalescent events (Faulkner et al. 2020) | A coalescent model with a piecewise constant or linear population size trajectory. The different pieces are correlated via a Horseshoe Markov Random Field (HSMRF) Prior. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). | constant or linear | $\theta_i | \theta_{i-1}$ (HSMRF Prior) | specified - here equally sized | Yes | Little description included in GMRF Script in HSMRF |
Compound Poisson Process (CPP) Prior (similar to Opgen-Rhein et al. (2005)) |
A coalescent model with a piecewise constant or linear population size trajectory. The number of pieces (intervals) is not fixed. A Poisson Prior is used on the number of interval change points, additional priors need to be defined for population sizes and change points. The number of change points is determined via reversible jump MCMC (rjMCMC). | constant or linear | independent and identically distributed (iid) | specified - number and position of change points estimated | Yes | Little description included in GMRF Script in CPP |
Piecewise (similar to Pybus and Rambaut (2002), but with flexible combinations) |
A coalescent model with user-defined demographic functions for a number of intervals. Base demographic functions included in RevBayes are constant, linear and exponential population size trajectories. These can be combined in an arbitrary way, but the most ancient one should always be constant due to computational reasons. | constant and linear and exponential | individually chosen, independent and identically distributed (iid) | specified - individually chosen | No, only single loci | Piecewise |
In order to perform the different analyses in this tutorial, you will need to create a directory on your computer for this tutorial and download a few files.
In the tutorial directory on your computer, create a subdirectory called “data” and download the data files that you can find on the left of this page.
You should now have the following files in your data folder:
horses_isochronous_sequences.fasta: an alignment in FASTA format, containing sequences of 36 horse taxa, all sampled at the same time.
horses_heterochronous_sequences.fasta: an alignment in FASTA format, containing sequences of 173 horse taxa, sampled at various times.
horses_heterochronous_ages.tsv: a tab seperated table listing the heterochronous horse samples and their ages. For extant taxa, the minimum age is 0.0 (i.e. the present).
It is useful to create .Rev
scripts for the different analyes.
We will also provide a full script in every tutorial that you can easily run for the whole analysis.
Please create a “scripts” directory in the tutorial directory.
You can begin by clicking on the first exercise!
If you want to turn back to this page, it is listed on the left as prerequisite in all the exercises. The exercise for heterochronous data is based on the exercises for isochronous data. It aims at highlighting the changes you have to make when considering samples with different ages. It is therefore recommended to do the exercises for isochronous data first.
The first five exercises work with isochronous data, the last one with heterochronous data:
After doing all the exercises, you can compare the resulting population sizes. Have a look at the summary.