Rev Language Reference


pathSampler - Path-sampling marginal likelihood estimation

Applies the path-sampling technique (Gelman & Meng 1998), also known as thermodynamic integration (Ogata 1989; Lartillot & Philippe 2006), to estimate the marginal likelihood of a model from a series of unnormalized "power posterior" densities in which the likelihood term is raised to a power B between 0 and 1.

Usage

pathSampler(String filename, String powerColumnName, String likelihoodColumnName, String separator)

Arguments

filename : String (pass by value)
The filename where the likelihood samples are stored in.
powerColumnName : String (pass by value)
The name of the column that holds the values of the powers.
likelihoodColumnName : String (pass by value)
The name of the column that holds the likelihood values.
separator : String (pass by value)
The field separator character. Values on each line of the file are separated by this character. If sep = "" the separator is 'white space', that is one or more spaces, tabs, newlines or carriage returns.

Details

The RevBayes implementation of path sampling allows for an arbitrary choice of the powers B, and uses the trapezoidal rule for numerical integration. For K power posteriors starting with the posterior proper (B_1 = 1) and ending with the prior (B_K = 0), the path-sampling log marginal likelihood estimate r_PS is therefore equal to: r_PS = \sum_{k=1}^{K-1} ( (\sum{i=1}^n L_{k,i} / n) + (\sum{i=1}^n L_{k+1,i} / n) ) * (B_k - B{k+1})/2 where n denotes the number of MCMC samples from each power posterior, and L_{k,i} denotes the log likelihood of the i-th MCMC sample from the k-th power posterior. The estimate is returned by calling the `.marginal()` method on the sampler object. We can also compute the standard error of this estimate in two different ways using the `.stdError()` method. By default, RevBayes uses the formula: SE = sqrt( \sum_{k=1}^K w_k^2/4 Var(L_k)/N_{eff,k} ) where w_k is the k-th weight, Var(L_k) is the variance of the log likelihoods from the k-th power posterior, and N_{eff,k} is the effective sample size for the k-th power posterior. This is the formula given by dos Reis et al. (2018; Appendix 2), but RevBayes uses a different numerical integration technique, and therefore a different set of weights: | B_2 - B_1 if k = 1 w_k = < B_{k+1} - B_{k-1} if 1 < k < K | B_K - B_{K-1} if k = K These weights generalize Eqs. 51 and 52 of Lartillot & Philippe (2006) to cases in which the powers B are not evenly spaced between 0 and 1. The standard error can also be derived using the stationary bootstrap method (Politis & Romano 1994), which divides the log likelihood values from each power posterior into consecutive blocks of random size (drawn from a geometric distribution) and resamples them with replacement. The RevBayes implementation of the stationary block bootstrap method is based on the mcmc3r R package (Álvarez-Carretero et al. 2022), and can be accessed by calling `.stdError(bootstrap=TRUE)`.

Example

# Create a simple model (unclamped)
a ~ dnExponential(1)
mymodel = model(a)

# Create a move vector and a monitor vector
moves[1] = mvScale(a, lambda = 1.0, weight = 1.0)
monitors[1] = mnFile(a, filename = "output/out.log")

# Run 64 power posterior analyses, each with 100 samples
pow_p = powerPosterior(mymodel, monitors, moves, "output/out.pp", cats=64, sampleFreq=1)
pow_p.run(generations=200, burninFraction=0.5)

# Create a path sampler object from a file that RevBayes
# automatically generated by concatenating the likelihoods from
# individual power posteriors
ps = pathSampler(file="output/out.pp", powerColumnName="power",
                 likelihoodColumnName="likelihood")

# Compute the marginal likelihood
ps.marginal()

# Compute the standard error
ps.stdError()                                               # default formula
ps.stdError(bootstrap=TRUE)                                 # bootstrap (100 replicates)
ps.stdError(bootstrap=TRUE, replicates=50, printFiles=TRUE) # bootstrap (50 replicates, save output)

Methods

See Also