# Coalescent Analyses

## Overview

This tutorial describes how to run a demographic analysis using the coalescent process in RevBayes. Demographic inference is about estimating population dynamics and in this tutorial, we will specifically focus on population size estimation. The coalescent process provides a flexible way of estimating population size trajectories through time. We are primarily interested in asking: (1) what the population size for our given population was? and (2) how population sizes have changed over time, e.g., increased or decreased towards the present or experienced a bottleneck. The input data are usually sequence alignments or estimated trees. In case of sequence data, trees and population size parameters will be jointly estimated. Here, we consider different types of analysis, starting with a constant demographic history and doing more complex analyses in later exercises. The first analyses will consider data from isochronous samples, i.e., data from samples that have all been collected at the same time. In the second part of the exercises, you will be asked to also analyze data from heterochronous samples, i.e., samples that have been collected at different points in time. The data for these analyes are taken from Vershinina et al. (2021).

The following table provides an overview of different coalescent models. The tutorials column links directly to the respective tutorial with isochronous data. The tutorial on heterochronous data shows how to apply the models to data from different points in time.

Model Description Demographic function within intervals Prior on the population sizes Interval change-point method Multiple loci analysis possible? Tutorial
Constant A simple coalescent model with one constant population size. constant individually chosen - Yes Constant
Linear A coalescent model with a linear population size trajectory. linear individually chosen - Yes
Exponential A coalescent model with an exponential population size trajectory. exponential individually chosen - Yes
Bayesian Skyline Plot without autocorrelation A coalescent model with a piecewise constant population size trajectory. The different pieces (intervals) are not correlated. The times of interval change are dependent on coalescent events (“coalescent event based”). piecewise constant individually chosen, independent and identically distributed (iid) coalescent event based No, only single loci Skyline
Bayesian Skyline Plot with autocorrelation
(Drummond et al. 2005)
A coalescent model with a piecewise constant population size trajectory. The different pieces (intervals) are correlated, each population size is drawn from an exponential distribution with the mean being the previous population size. The times of interval change are dependent on coalescent events (“coalescent event based”). piecewise constant $\theta_i | \theta_{i-1}$ (exponential distribution with mean $\theta_{i-1}$) coalescent event based No, only single loci Little description included in Skyline

Example script
Extended Bayesian Skyline Plot
(Heled and Drummond 2008)
A coalescent model with a piecewise constant or linear population size trajectory. The different pieces are not correlated. The times of interval change are dependent on coalescent events (“coalescent event based”), they are independent and identically distributed (iid). The number of changes and thus the number of intervals is determined by stochastic variable search. piecewise constant or linear independent and identically distributed (iid) coalescent event based - number and position of change points estimated Yes Little description included in Skyline

Example script
Skyride
(Minin et al. 2008)
A coalescent model with a piecewise constant population size trajectory. The different pieces are correlated via a Gaussian Markov Random Field (GMRF) Prior, the degree of smoothing is determined by a precision parameter. The times of interval change are dependent on coalescent events (“coalescent event based”). piecewise constant $\theta_i | \theta_{i-1}$ (GMRF Prior with precision parameter) coalescent event based No, only single loci Little description included in Skyline

Example script
Skygrid
(Gill et al. 2012)
A coalescent model with a piecewise constant population size trajectory. The different pieces are correlated via a Gaussian Markov Random Field (GMRF) Prior and are equally spaced in time, the degree of smoothing is determined by a precision parameter. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). piecewise constant $\theta_i | \theta_{i-1}$ (GMRF Prior with precision parameter) coalescent event independent - equally sized Yes Little description included in GMRF

Example script
Gaussian Markov Random Field (GMRF) Prior with interval change points independent from coalescent events (Faulkner et al. 2020) A coalescent model with a piecewise constant or linear population size trajectory. The different pieces are correlated via a Gaussian Markov Random Field (GMRF) Prior. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). piecewise constant or linear $\theta_i | \theta_{i-1}$ (GMRF Prior) coalescent event independent, user-specified - here equally sized Yes With sequences as input data: GMRF

With trees as input data: GMRF treebased
Horseshoe Markov Random Field (HSMRF) Prior with interval change points independent from coalescent events (Faulkner et al. 2020) A coalescent model with a piecewise constant or linear population size trajectory. The different pieces are correlated via a Horseshoe Markov Random Field (HSMRF) Prior. The times of interval change are independent from coalescent events, they are independent and identically distributed (iid). piecewise constant or linear $\theta_i | \theta_{i-1}$ (HSMRF Prior) coalescent event independent, user-specified - here equally sized Yes Little description included in GMRF

Script in HSMRF
Skyfish model
(similar to Opgen-Rhein et al. (2005))
A coalescent model with a piecewise constant or linear population size trajectory. The number of pieces (intervals) is not fixed. A Poisson Prior is used on the number of interval change points, additional priors need to be defined for population sizes and change points. The number of change points is determined via reversible jump MCMC (rjMCMC). piecewise constant or linear independent and identically distributed (iid) coalescent event independent, number and position of change points estimated Yes Skyfish
Piecewise
(similar to Pybus and Rambaut (2002), but with flexible combinations)
A coalescent model with user-defined demographic functions for a user-defined number of intervals. Base demographic functions included in RevBayes are constant, linear and exponential population size trajectories. These can be combined in an arbitrary way, but the most ancient one should always be constant due to computational reasons. piecewise constant, linear, or exponential; can be different for every interval individually chosen, independent and identically distributed (iid) coalescent event independent, user-specified No, only single loci Piecewise

## Preparation

In order to perform the different analyses in this tutorial, you will need to create a directory on your computer for this tutorial and download a few files.

### The Data

In the tutorial directory on your computer, create a subdirectory called “data” and download the data files that you can find on the left of this page.

You should now have the following files in your data folder:

• horses_isochronous_sequences.fasta: an alignment in FASTA format, containing sequences of 36 horse taxa, all sampled at the same time.

• horses_heterochronous_sequences.fasta: an alignment in FASTA format, containing sequences of 173 horse taxa, sampled at various times.

• horses_heterochronous_ages.tsv: a tab seperated table listing the heterochronous horse samples and their ages. For extant taxa, the minimum age is 0.0 (i.e. the present).

### The Scripts

It is useful to create .Rev scripts for the different analyes. We will also provide a full script in every tutorial that you can easily run for the whole analysis.

Please create a “scripts” directory in the tutorial directory.

### The Figures

The R package RevGadgets provides functionality for plotting RevBayes results. In each exercise, you can plot the population size trajectories resulting from the different analyses.

If you want to plot the results from the exercises, please create a “figures” directory in the tutorial directory and install the R package RevGadgets.

## Exercises

You can begin by clicking on the first exercise!

If you want to turn back to this page, it is listed on the left as prerequisite in all the exercises. The exercise for heterochronous data is based on the exercises for isochronous data. It aims at highlighting the changes you have to make when considering samples with different ages. It is therefore recommended to do the exercises for isochronous data first.

The first five exercises work with isochronous data, the last one with heterochronous data:

## Summary

After doing all the exercises, you can compare the resulting population sizes. Have a look at the summary.

1. Drummond A.J., Rambaut A., Shapiro B., Pybus O.G. 2005. Bayesian Coalescent Inference of Past Population Dynamics from Molecular Sequences. Molecular Biology and Evolution. 22:1185–1192. 10.1093/molbev/msi103
2. Faulkner J.R., Magee A.F., Shapiro B., Minin V.N. 2020. Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories. Biometrics. 76:677–690. https://doi.org/10.1111/biom.13276
3. Gill M.S., Lemey P., Faria N.R., Rambaut A., Shapiro B., Suchard M.A. 2012. Improving Bayesian Population Dynamics Inference: A Coalescent-Based Model for Multiple Loci. Molecular Biology and Evolution. 30:713–724. 10.1093/molbev/mss265
4. Griffiths R.C., Tavare S., Bodmer W.F., Donnelly P.J. 1994. Sampling theory for neutral alleles in a varying environment. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences. 344:403–410. 10.1098/rstb.1994.0079
5. Heled J., Drummond A.J. 2008. Bayesian inference of population size history from multiple loci. BMC Evolutionary Biology. 8:289. 10.1186/1471-2148-8-289
6. Kingman J.F.C. 1982. On the Genealogy of Large Populations. Journal of Applied Probability. 19:27–43. 10.2307/3213548
7. Minin V.N., Bloomquist E.W., Suchard M.A. 2008. Smooth Skyride through a Rough Skyline: Bayesian Coalescent-Based Inference of Population Dynamics. Molecular Biology and Evolution. 25:1459–1471. 10.1093/molbev/msn090
8. Opgen-Rhein R., Fahrmeir L., Strimmer K. 2005. Inference of demographic history from genealogical trees using reversible jump Markov chain Monte Carlo. BMC Evolutionary Biology. 5:6. 10.1186/1471-2148-5-6
9. Pybus O.G., Rambaut A. 2002. GENIE: estimating demographic history from molecular phylogenies. Bioinformatics. 18:1404–1405. 10.1093/bioinformatics/18.10.1404
10. Vershinina A.O., Heintzman P.D., Froese D.G., Zazula G., Cassatt-Johnstone M., Dalén L., Der Sarkissian C., Dunn S.G., Ermini L., Gamba C., Groves P., Kapp J.D., Mann D.H., Seguin-Orlando A., Southon J., Stiller M., Wooller M.J., Baryshnikov G., Gimranov D., Scott E., Hall E., Hewitson S., Kirillova I., Kosintsev P., Shidlovsky F., Tong H.-W., Tiunov M.P., Vartanyan S., Orlando L., Corbett-Detig R., MacPhee R.D., Shapiro B. 2021. Ancient horse genomes reveal the timing and extent of dispersals across the Bering Land Bridge. Molecular Ecology. 30:6144–6161. https://doi.org/10.1111/mec.15977