This tutorial provides the third protocol from our recent publication (Höhna et al. 2017). The first protocol is described in the Nucleotide substitution models and the second protocol is described in the Partitioned data analysis.
You should read first the General Introduction to Model selection tutorial, which explains the theory and standard algorithms for estimating marginal likelihoods and Bayes factors.
The models we use here are equivalent to the models described in the previous exercise on substitution models (continuous time Markov models). To specify the model please consult the previous exercise. Specifically, you will need to specify the following substitution models:
Just to be safe, it is better to clear the workspace (if you did not just restart RevBayes):
clear()
Now set up the model as in the previous exercise. You should start with the simple Jukes-Cantor substitution model. Setting up the model requires:
PhyloCTMC
.The following procedure for estimating marginal likelihoods is valid for
any model in RevBayes. You will need to repeat this later for other
models. First, we create the variable containing the power-posterior
analysis. This requires that we provide a model and vector of moves, as
well as an output file name. The cats
argument sets the number of
stepping stones.
pow_p = powerPosterior(mymodel, moves, monitors, "output/model1.out", cats=50)
We can start the power-posterior analysis by first burning in the chain and and discarding the first 10000 states. This will help ensure that analysis starts from a region of high posterior probability, rather than from some random point.
pow_p.burnin(generations=10000,tuningInterval=1000)
Now execute the run with the .run()
function:
pow_p.run(generations=1000)
Once the power posteriors have been saved to file, create a stepping stone sampler. This function can read any file of power posteriors and compute the marginal likelihood using stepping-stone sampling.
ss = steppingStoneSampler(file="output/model1.out", powerColumnName="power", likelihoodColumnName="likelihood")
These commands will execute a stepping-stone simulation with 50 stepping
stones, sampling 1000 states from each step. Compute the marginal
likelihood under stepping-stone sampling using the member function
marginal()
of the ss
variable and record the value in Table
[tab:ml_cytb].
ss.marginal()
Path sampling is an alternative to stepping-stone sampling and also takes the same power posteriors as input.
ps = pathSampler(file="output/model1.out", powerColumnName="power", likelihoodColumnName="likelihood")
Compute the marginal likelihood under stepping-stone sampling using the
member function marginal()
of the ps
variable and record the value
in Table [tab_ml_subst_models].
ps.marginal()
We have kept this description of how to use stepping-stone-sampling and path-sampling very generic and did not provide the information about the model here. Our main motivation is to show that the marginal likelihood estimation algorithms are independent of the model. Thus, you can apply these algorithms to any model, e.g., relaxed clock models and birth-death models, as well.
Model | Path-Sampling | Stepping-Stone-Sampling |
---|---|---|
JC ($M_1$) | ||
HKY ($M_2$) | ||
GTR ($M_3$) | ||
GTR+$\Gamma$ ($M_4$) | ||
GTR+I ($M_5$) | ||
GTR+$\Gamma$+I ($M_6$) |
Now you can continue to the next tutorial: Model selection of partition models or Model averaging of substitution models