This tutorial provides the third protocol from our recent publication (Höhna et al. 2017). The first protocol is described in the Substitution Models and the second protocol is described in the Partitioned data analysis.

You should read first the General Introduction to Model selection tutorial, which explains the theory and standard algorithms for estimating marginal likelihoods and Bayes factors.

The models we use here are equivalent to the models described in the previous exercise on substitution models (continuous time Markov models). To specify the model please consult the previous exercise. Specifically, you will need to specify the following substitution models:

- Jukes-Cantor (JC) substitution model (Jukes and Cantor 1969)
- Hasegawa-Kishino-Yano (HKY) substitution model (Hasegawa et al. 1985)
- General-Time-Reversible (GTR) substitution model (Tavaré 1986)
- Gamma (+G) model for among-site rate variation (Yang 1994)
- Invariable-sites (+I) model (Hasegawa et al. 1985)

Just to be safe, it is better to clear the workspace (if you did not just restart RevBayes):

```
clear()
```

Now set up the model as in the previous exercise. You should start with the simple Jukes-Cantor substitution model. Setting up the model requires:

- Loading the data and retrieving useful variables about it
(
*e.g.*, number of sequences and taxon names). - Specifying the instantaneous-rate matrix of the substitution model.
- Specifying the tree model including branch-length variables.
- Creating a random variable for the sequences that evolved under
the
`PhyloCTMC`

. - Clamping the data.
- Creating a model object.
- Specifying the moves for parameter updates.

The following procedure for estimating marginal likelihoods is valid for
any model in RevBayes. You will need to repeat this later for other
models. First, we create the variable containing the power-posterior
analysis. This requires that we provide a model and vector of moves, as
well as an output file name. The `cats`

argument sets the number of
stepping stones.

```
pow_p = powerPosterior(mymodel, moves, monitors, "output/model1.out", cats=50)
```

We can start the power-posterior analysis by first burning in the chain and and discarding the first 10000 states. This will help ensure that analysis starts from a region of high posterior probability, rather than from some random point.

```
pow_p.burnin(generations=10000,tuningInterval=1000)
```

Now execute the run with the `.run()`

function:

```
pow_p.run(generations=1000)
```

Once the power posteriors have been saved to file, create a stepping stone sampler. This function can read any file of power posteriors and compute the marginal likelihood using stepping-stone sampling.

```
ss = steppingStoneSampler(file="output/model1.out", powerColumnName="power", likelihoodColumnName="likelihood")
```

These commands will execute a stepping-stone simulation with 50 stepping
stones, sampling 1000 states from each step. Compute the marginal
likelihood under stepping-stone sampling using the member function
`marginal()`

of the `ss`

variable and record the value in Table
[tab:ml_cytb].

```
ss.marginal()
```

Path sampling is an alternative to stepping-stone sampling and also takes the same power posteriors as input.

```
ps = pathSampler(file="output/model1.out", powerColumnName="power", likelihoodColumnName="likelihood")
```

Compute the marginal likelihood under stepping-stone sampling using the
member function `marginal()`

of the `ps`

variable and record the value
in Table [tab_ml_subst_models].

```
ps.marginal()
```

As an example we provide the file
**RevBayes_scripts/marginalLikelihood_JukesCantor.Rev**.

We have kept this description of how to use stepping-stone-sampling and
path-sampling very generic and did not provide the information about the
model here. Our main motivation is to show that the marginal likelihood
estimation algorithms are independent of the model. Thus, you can apply
these algorithms to any model, *e.g.*, relaxed
clock models and birth-death models, as well.

- Compute the marginal likelihoods of the
*cytb*alignment for the following substitution models:- Jukes-Cantor (JC) substitution model
- Hasegawa-Kishino-Yano (HKY) substitution model
- General-Time-Reversible (GTR) substitution model
- GTR with gamma distributed-rate model (GTR+G)
- GTR with invariable-sites model (GTR+I)
- GTR+I+G model

- Enter the marginal likelihood estimate for each model in the corresponding cell of Table [tab:ml_cytb].
- Which is the best fitting substitution model?

Model |
Path-Sampling |
Stepping-Stone-Sampling |
---|---|---|

JC ($M_1$) | ||

HKY ($M_2$) | ||

GTR ($M_3$) | ||

GTR+$\Gamma$ ($M_4$) | ||

GTR+I ($M_5$) | ||

GTR+$\Gamma$+I ($M_6$) |

Now you can continue to the next tutorial: Model selection of partition models or Model averaging of substitution models

- Hasegawa M., Kishino H., Yano T. 1985. Dating of the Human-Ape Splitting by a molecular Clock of Mitochondrial DNA. Journal of Molecular Evolution. 22:160–174. 10.1007/BF02101694
- Höhna S., Landis M.J., Heath T.A. 2017. Phylogenetic Inference using RevBayes. Current Protocols in Bioinformatics. 10.1002/cpbi.22
- Jukes T.H., Cantor C.R. 1969. Evolution of Protein Molecules. Mammalian Protein Metabolism. 3:21–132. 10.1016/B978-1-4832-3211-9.50009-7
- Tavaré S. 1986. Some Probabilistic and Statistical Problems in the Analysis of DNA Sequences. Some Mathematical Questions in Biology: DNA Sequence Analysis. 17:57–86.
- Yang Z. 1994. Maximum Likelihood Phylogenetic Estimation from DNA Sequences with Variable Rates Over Sites: Approximate Methods. Journal of Molecular Evolution. 39:306–314. 10.1007/BF00160154