Rev Language Reference


ConditionalPosteriorOrdinate - Conditional posterior ordinate (a.k.a. cross-validation)

Model selection via leave-one-out cross-validation. Cross-validation assesses the fit of a model by using a subset of the data to estimate the parameters (i.e., to train the model) and the remaining observations to evaluate predictive performance. In the most extreme case, leave-one-out cross-validation, we use all but one observation to estimate the parameters, P(\theta | X_{-i}), then compute the probability of observing the removed data point, P(X_i | \theta), and integrate over all parameters: P(X_i | X_{-i}) = \int P(X_i | \theta) P(\theta | X_{-i}) d\theta. This process is repeated for every data point.
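The integral above need not be evaluated by refitting the model once per data point. As a sketch, assuming the data points are conditionally independent given \theta and writing X_{-i} for the data without observation i, a standard identity expresses the same quantity as a harmonic mean over samples \theta^{(s)} drawn from the full-data posterior. The sample index s and sample count S are notation introduced here, and this page does not state whether the implementation uses exactly this estimator.

```latex
\begin{align}
P(X_i \mid X_{-i})
  &= \left( \int \frac{1}{P(X_i \mid \theta)} \, P(\theta \mid X) \, d\theta \right)^{-1} \\
  &\approx \left( \frac{1}{S} \sum_{s=1}^{S} \frac{1}{P(X_i \mid \theta^{(s)})} \right)^{-1},
  \qquad \theta^{(s)} \sim P(\theta \mid X)
\end{align}
```

The first line follows from P(\theta | X) = P(X_i | \theta) P(X_{-i} | \theta) P(\theta) / P(X), so dividing by P(X_i | \theta) and integrating gives P(X_{-i}) / P(X).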

Usage

ConditionalPosteriorOrdinate(String filename, String[] columnNamesToSkip, String separator)

Arguments

filename : String (pass by value)
The name of the file where the likelihood samples are stored.
columnNamesToSkip : String[] (pass by value)
The names of the columns that we are going to skip.
Default : [ Iteration, Posterior, Likelihood, Prior, Replicate_ID ]
separator : String (pass by value)
The field separator character. Values on each line of the file are separated by this character. If separator = "", the separator is 'white space', that is, one or more spaces, tabs, newlines or carriage returns.

Details

A cross-validation analysis assumes that a trace of sampled probabilities for each data point (e.g., each site or column) of the data is stored in a file. These per-data-point probabilities are the input for computing the leave-one-out cross-validation probability. We read in this trace using the function `ConditionalPosteriorOrdinate`. This "constructor" function requires the `filename` as an argument. You can then calculate the leave-one-out cross-validation probability using the member method `.predictiveProbability()`. In the current implementation, this method requires two arguments: `counts`, a vector of observations given as real numbers; and `log`, which indicates whether the probabilities in the trace are log-transformed. The method returns the leave-one-out cross-validation probability.
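To make the computation concrete, here is a minimal Python sketch of how a leave-one-out probability could be estimated from such a trace. It is an illustration outside of Rev, not the RevBayes implementation. The function names `log_cpo` and `log_predictive_probability` are hypothetical. The sketch assumes the trace holds log-probabilities, estimates each data point's leave-one-out probability by the harmonic mean of its sampled probabilities, and assumes `counts` weights each data point's log-probability in the total.

```python
import math


def log_cpo(log_probs):
    """Log leave-one-out probability of one data point.

    log_probs: the log-probabilities of this data point, one per posterior
    sample. Uses the harmonic-mean estimate S / sum_s(1 / p_s), computed in
    log space with the log-sum-exp trick to avoid overflow.
    """
    if not log_probs:
        raise ValueError("need at least one sample")
    neg = [-lp for lp in log_probs]          # log(1 / p_s)
    m = max(neg)
    log_sum_inv = m + math.log(sum(math.exp(v - m) for v in neg))
    return math.log(len(log_probs)) - log_sum_inv


def log_predictive_probability(trace, counts=None):
    """Sum of per-data-point log leave-one-out probabilities.

    trace:  list of columns, one per data point, each a list of sampled
            log-probabilities (assumed layout, hypothetical).
    counts: optional weight per data point (assumption: each data point's
            log-probability is multiplied by its count); defaults to 1.
    """
    if counts is None:
        counts = [1.0] * len(trace)
    if len(counts) != len(trace):
        raise ValueError("counts and trace must have the same length")
    return sum(c * log_cpo(col) for c, col in zip(counts, trace))


if __name__ == "__main__":
    # Two data points, two posterior samples each (made-up numbers).
    toy_trace = [
        [math.log(0.5), math.log(0.5)],
        [math.log(0.2), math.log(0.8)],
    ]
    print(log_predictive_probability(toy_trace, counts=[2, 3]))
```

Working in log space matters here because the harmonic mean is dominated by the smallest sampled probabilities, whose reciprocals can overflow when computed directly.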

Example

# Create a vector of observations (e.g., site frequency spectrum)
obs_sfs = [ 305082, 44248, 32223, 28733, 28220, 26205, 27477, 26618, 27533, 26945, 28736, 28671, 31277, 31250, 34352, 34859, 38331 ]

# Read a pre-existing trace and construct the analysis object
cpo = ConditionalPosteriorOrdinate( filename="output/StairwayPlot_esfs.log" )

# Calculate the leave-one-out cross-validation probability
cpo.predictiveProbability( obs_sfs, log=FALSE )

Methods

See Also