Branch-Specific Diversification Rate Estimation

How to estimate branch-specific shifts in diversification rates

Sebastian Höhna and Michael R. May

Last modified on December 13, 2018

Outline: Estimating Branch-Specific Speciation & Extinction Rates


This tutorial describes how to specify a branch-specific branching-process models in RevBayes; a birth-death process where diversification rates vary among branches, similar to Rabosky (2014). The probabilistic graphical model is given for each component of this tutorial. The goal is to obtain estimate of branch-specific diversification rates using Markov chain Monte Carlo (MCMC).

The Birth-Death-Shift Process


An example of a tree with a single event of diversification rate change, from $(\lambda_1, \mu_1)$ to $(\lambda_2, \mu_2)$. a) A tree showing the complete cladogenetic process, including extinct lineages. b) The reconstructed process for the tree shown in a.

shows an example in which speciation and extinction rates change among lineages. The speciation and extinction rates at the root of the tree of are $(\lambda_1, \mu_1)$. There was one event of change to the speciation and extinction rates on the tree from $(\lambda_1, \mu_1)$ to $(\lambda_2, \mu_2)$. From a casual inspection of the tree, it appears that the single change in speciation and/or extinction rate in the tree of affected the diversity. Note that the clade above the event of speciation/extinction rate change has fewer living species, and more extinct species, than the clade that maintained the ancestral speciation and extinction rates. This is exactly the type of situation we attempt to uncover.

Here we will describe the birth-death-shift process. The parameters in this model are:

The process is described as follows. In a small interval of time, $\Delta t$, a lineage speciates with probability $\lambda \Delta t$, goes extinct with probability $\mu \Delta t$, or changes its rate with probability $\eta \Delta t$. When a speciation event occurs, both daughter lineages inherit the speciation and extinction rates of the parent lineage. When an event of rate change occurs, new speciation and extinction rates are drawn from the probability distributions, $f_{\lambda}(\cdot)$ and $f_{\mu}(\cdot)$. The affected lineage then continues, but with the modified speciation and extinction rates. When an extinction event occurs, the lineage is terminated at the event time.

Samples of an MCMC analysis under a Birth-Death-Shift model. The colors represent different classes of rates for the speciation and extinction rates ($\lambda$ and $\mu$, respectively. Note that the speciation and extinction rates at the root of the tree differ for the different MCMC samples: $(\lambda’,\mu’)$, $(\lambda’’,\mu’’)$, and $(\lambda’’’,\mu’’’)$.

Estimating Branch-Specific Diversification Rates


In this analysis we are interested in estimating the branch-specific diversification rates. We are going to use the dnCBDP distribution which uses a finite number of rate-categories instead of drawing rates from a continuous distribution directly.

Here we adopt an approach using (few) discrete rate categories instead. This allows us to numerically integrate over all possible rate categories using a system of differential equation originally described by Maddison et al. (2007) (see also FitzJohn et al. (2009) and FitzJohn (2010)). The numerical procedure beaks time into very small time intervals and sums over all possible events occurring in that interval (see ).

You don’t need to worry about any of the technical details. It is important for you to realize that this model assumes that new rates at a rate-shift event are drawn from a given (discrete) set of rates.

Read the tree


Begin by reading in the observed tree.

observed_phylogeny <- readTrees("data/primates_tree.nex")[1]

From this tree, we can get some helpful variables:

taxa <- observed_phylogeny.taxa()
root <- observed_phylogeny.rootAge()
tree_length <- observed_phylogeny.treeLength()

Additionally, we initialize a variable for our vector of moves and monitors.

moves    = VectorMoves()
monitors = VectorMonitors()

Finally, we create a helper variable that specifies the number of discrete rate categories, another helper variable for the expected number of rate-shift events, the total number of species, and the variation in rates.

NUM_RATE_CATEGORIES = 6
EXPECTED_NUM_EVENTS = 2
NUM_TOTAL_SPECIES = 367
H = 0.587405

Using these variables we can easily change our script, for example, to use more or fewer categories and test the impact.

Specifying the model


Priors on rates


Discretization of a lognormal distribution. The two left figures have 4 rate categories and the two right plots have 10 rate categories. The top plots have the 95% probability interval spanning one order of magnitude (sd $=0.587405$) and the bottom plots have the 95% probability interval spanning two orders of magnitude (sd $=2*0.587405$) .

Instead of using a continuous probability distribution we will use a discrete approximation of the distribution, as done for modeling rate variation across sites (Yang 1994) and for modeling relaxed molecular clocks (Drummond et al. 2006). That means, we assume that the speciation rates are drawn from one of the $N$ quantiles of the lognormal distribution. For this we will use the function fnDiscretizeDistribution which takes in a distribution as its first argument and the number of quantiles as the second argument. The return value is a vector of quantiles. We use it as a deterministic variable and every time the parameters of the base distribution (i.e., the lognormal distribution in our case) change the quantiles will update automatically as well. Thus we only need to specify parameters for our base distribution, the lognormal distribution. We choose a log-uniform distribution as the prior distribution for the mean parameter of the lognormal distribution. fixed on our expected diversification rate, which is .

speciation_mean ~ dnLoguniform( 1E-6, 1E2)
moves.append( mvScale(speciation_mean, lambda=1, tune=true, weight=2.0) )

Next, we choose an exponential prior distribution with mean of $H$ for the variation in speciation rates.

rate_sd ~ dnExponential( 1.0 / H )
moves.append( mvScale(rate_sd, lambda=1, tune=true, weight=2.0) )

Now, we can compute the speciation rate categories. We will use a lognormal distribution discretized into NUM_RATE_CATEGORIES quantiles and the parameters that we should created.

speciation := fnDiscretizeDistribution( dnLognormal(ln(speciation_mean), rate_sd), NUM_RATE_CATEGORIES )

Similarly, we define the prior on the extinction rate in the same way as we did for the speciation rate.

extinction_mean ~ dnLoguniform( 1E-6, 1E2)
extinction_mean.setValue( speciation_mean / 2.0 )
moves.append( mvScale(extinction_mean, lambda=1, tune=true, weight=2.0) )

However, we assume that extinction rate is the same for all categories. Therefore, we simply replicate using the rep function the extinction rate NUM_RATE_CATEGORIES times.

extinction := rep( extinction_mean, NUM_RATE_CATEGORIES )

Next, we need a rate parameter for the rate-shifts events. We do not have much prior information about this rate but we can provide some realistic ranges. For example, we can specify a uniform distribution that the goes from 0 to 100 expected events. Remember that this is only possible if the tree is known and not estimated simultaneously because only if the tree is do we also know the tree length. As usual for rate parameter, we apply a scaling move to the event_rate variable.

event_rate ~ dnUniform(0.0, 100.0/tree_length)
event_rate.setValue(EXPECTED_NUM_EVENTS/tree_length)
moves.append( mvScale(event_rate, lambda=1, tune=true, weight=2.0) )

Additionally, we need a parameter for probability that the process starts at the root in any of the diversification-rate categories. We use a uniform/equal prior distribution on the diversification-rate categories.

rate_cat_probs <- simplex( rep(1, NUM_RATE_CATEGORIES) )

Shifts in the Extinction Rate

We might want to allow the extinction rate to change as well. As with the speciation rate, we discretize the lognormal distribution into a finite number of rate categories.

extinction_categories := fnDiscretizeDistribution( dnLognormal(ln(extinction_mean), H), NUM_RATE_CATEGORIES )

Now, we must create a vector that contains each combination of speciation- and extinction-rates. This allows the rate of speciation to change without changing the rate of extinction and vice versa. The resulting vector should be $N^2$ elements long. We call these the `paired’ rate categories.

k = 1
for(i in 1:NUM_RATE_CATEGORIES) {
    for(j in 1:NUM_RATE_CATEGORIES) {
        speciation[k]   := speciation_categories[i]
        extinction[k++] := extinction_categories[j]
    }
}

Now we also need to specify a root prior for $N^2$ elements.

rate_cat_probs <- simplex( rep(1, NUM_RATE_CATEGORIES * NUM_RATE_CATEGORIES) )

Note however, that this type of analysis will take significantly longer to run!

Incomplete Taxon Sampling


We know that we have sampled 233 out of 367 living primate species. To account for this we can set the sampling parameter as a constant node with a value of 233 / 367.

rho <- observed_phylogeny.ntips() / NUM_TOTAL_SPECIES

Root age


The birth-death process requires a parameter for the root age. In this exercise we use a fix tree and thus we know the age of the tree. Hence, we can get the value for the root from the (Magnuson-Ford and Otto 2012) tree. This is done using our global variable root defined above and nothing else has to be done here.

The time tree


Now we have all of the parameters we need to specify the full branch-specific birth-death model. We initialize the stochastic node representing the time tree.

timetree ~ dnCDBDP( rootAge           = root,
                    speciationRates   = speciation,
                    extinctionRates   = extinction, 
                    Q                 = fnJC(NUM_RATE_CATEGORIES),
                    delta             = event_rate, 
                    pi                = rate_cat_probs,
                    rho               = rho,
                    condition         = "time" )

And then we attach data to it.

timetree.clamp(observed_phylogeny)

For summary and plotting purposes, we need to extract the branch-specific diversification rate estimate from the tree. We will use a stochastic rate mapping algorithm, which is derived from stochastic character mapping.

moves.append( mvGibbsDrawCharacterHistory(timetree, weight=1) )
branch_lambda := timetree.averageSpeciationRate()
branch_mu := timetree.averageExtinctionRate()
branch_net_div := branch_lambda - branch_mu
branch_rel_ext := branch_mu / branch_lambda

Finally, we create a workspace object of our whole model using the model() function.

mymodel = model(speciation)

The model() function traversed all of the connections and found all of the nodes we specified.

Running an MCMC analysis


Specifying Monitors


For our MCMC analysis, we need to set up a vector of monitors to record the states of our Markov chain. First, we will initialize the model monitor using the mnModel function. This creates a new monitor variable that will output the states for all model parameters when passed into a MCMC function.

monitors.append( mnModel(filename="output/primates_FRC_BDSP.log",printgen=10, separator = TAB) )

Additionally, we create an extended-Newick monitor. The extended-Newick monitor writes the tree to a file and adds parameter values to the branches and/or nodes of the tree. We can thus print the tree with the average speciation and extinction rates, as well as the net diversification (speciation - extinction) and relative extinction (extinction / speciation) rates, for each branch into a file. We will need this file later to estimate and visualize the posterior distribution of the rates at the branches.

monitors.append( mnStochasticBranchRate(cdbdp=timetree, printgen=1, filename="output/primates_BDS_rates.log") )
monitors.append( mnExtNewick(filename="output/primates_BDS_rates.trees", isNodeParameter=FALSE, printgen=1, tree=timetree, branch_lambda, branch_mu, branch_net_div, branch_rel_ext) )

Finally, create a screen monitor that will report the states of specified variables to the screen with mnScreen:

monitors.append( mnScreen(printgen=10, event_rate, mean_speciation, root_category, total_num_events) )

Initializing and Running the MCMC Simulation


With a fully specified model, a set of monitors, and a set of moves, we can now set up the MCMC algorithm that will sample parameter values in proportion to their posterior probability. The mcmc() function will create our MCMC object:

mymcmc = mcmc(mymodel, monitors, moves, nruns=1, combine="mixed")

Now, run the MCMC:

mymcmc.run(generations=10000,tuning=200)

When the analysis is complete, you will have the monitored files in your output directory. You can then visualize the branch-specific rates by attaching them to the tree. This is actually done automatically in our mapTree function.

treetrace = readTreeTrace("output/primates_BDS_rates.trees", treetype="clock")
map_tree = mapTree(treetrace,"output/primates_BDS_rates_MAP.tree")

Now you can open the tree in FigTree.

⇨ The Rev file for performing this analysis: mcmc_BDS.Rev

Exercise 1


  1. Drummond A.J., Ho S.Y.W., Phillips M.J., Rambaut A. 2006. Relaxed Phylogenetics and Dating with Confidence. PLoS Biology. 4:e88. 10.1371/journal.pbio.0040088
  2. FitzJohn R.G., Maddison W.P., Otto S.P. 2009. Estimating trait-dependent speciation and extinction rates from incompletely resolved phylogenies. Systematic Biology. 58:595–611. 10.1093/sysbio/syp067
  3. FitzJohn R.G. 2010. Quantitative Traits and Diversification. Systematic Biology. 59:619–633. 10.1093/sysbio/syq053
  4. Maddison W.P., Midford P.E., Otto S.P. 2007. Estimating a binary character’s effect on speciation and extinction. Systematic Biology. 56:701. 10.1080/10635150701607033
  5. Magnuson-Ford K., Otto S.P. 2012. Linking the Investigations of Character Evolution and Species Diversification. The American Naturalist. 180:225–245. 10.1086/666649
  6. Rabosky D.L. 2014. Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PLoS One. 9:e89543. 10.1371/journal.pone.0089543
  7. Yang Z. 1994. Maximum Likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates Over Sites: Approximate Methods. Journal of Molecular Evolution. 39:306–314. 10.1007/BF00160154