Implementing functions, distributions, and moves

Last modified on August 15, 2018

Implementing a function

There are two main classes of functions in RevBayes: regular functions and member functions. For our example, we will be implement a regular function. First we need to add two files to the RevBayes source code, Func_hyperbolicCosineFunction.cpp and Func_hyperbolicCosineFunction.h. These will go within revbayes/src/revlanguage/functions/math since we are adding a function and it is a mathematical function. Second, we need to register the function by modifying revbayes/src/revlanguage/workspace/RbRegister_Func.cpp.

RevLanguage Header file

The file Func_hyperbolicCosine.h should look like the following:

#ifndef Func_hyperbolicCosine_h
#define Func_hyperbolicCosine_h

#include "Real.h"
#include "RlTypedFunction.h"

#include <string>

namespace RevLanguage {
    
    /**
     * The RevLanguage wrapper of the hyperbolic Cosine function (sinh()).
     *
     * The RevLanguage wrapper of the hyperbolic function function connects
     * the variables/parameters of the function and creates the internal HyperbolicCosineFunction object.
     * Please read the HyperbolicCosineFunction.h for more info.
     *
     *
     * @copyright Copyright 2024-
     * @author The RevBayes Development Core Team (<your-name>)
     * @since 2024-04-26, version 1.0
     *
     */
    class Func_hyperbolicCosine :  public TypedFunction<Real> {
        
    public:
        // Basic utility functions
        Func_hyperbolicCosine*                         clone(void) const;                                          //!< Clone the object
        static const std::string&                       getClassType(void);                                         //!< Get Rev type
        static const TypeSpec&                          getClassTypeSpec(void);                                     //!< Get class type spec
        std::string                                     getFunctionName(void) const;                                //!< Get the primary name of the function in Rev
        const TypeSpec&                                 getTypeSpec(void) const;                                    //!< Get the type spec of the instance
        
        // Function functions you have to override
        RevBayesCore::TypedFunction<double>*            createFunction(void) const;                                 //!< Create internal function object
        const ArgumentRules&                            getArgumentRules(void) const;                               //!< Get argument rules
        
    };
    
}


#endif /* Func_hyperbolicCosine_h */

RevLanguage Implementation file

And the Func_hyperbolicCosine.cpp should look like this:

#include "Func_hyperbolicCosine.h"
#include "Probability.h"
#include "Real.h"
#include "RlDeterministicNode.h"
#include "TypedDagNode.h"
#include "GenericFunction.h"

using namespace RevLanguage;

double* hyperbolicCosine(double x)
{
    double result = (exp(x) + exp(-x))/2;
    return new double(result);
}


/**
 * The clone function is a convenience function to create proper copies of inherited objected.
 * E.g. a.clone() will create a clone of the correct type even if 'a' is of derived type 'b'.
 *
 * \return A new copy of the function.
 */
Func_hyperbolicCosine* Func_hyperbolicCosine::clone( void ) const
{
    return new Func_hyperbolicCosine( *this );
}


RevBayesCore::TypedFunction<double>* Func_hyperbolicCosine::createFunction( void ) const
{
    RevBayesCore::TypedDagNode<double>* x = static_cast<const Real&>( this->args[0].getVariable()->getRevObject() ).getDagNode();
    
    return RevBayesCore::generic_function_ptr< double >( hyperbolicCosine, x );
}


/* Get argument rules */
const ArgumentRules& Func_hyperbolicCosine::getArgumentRules( void ) const
{
    static ArgumentRules argumentRules = ArgumentRules();
    static bool          rules_set = false;
    
    if ( !rules_set )
    {
        
        argumentRules.push_back( new ArgumentRule( "x", Real::getClassTypeSpec(), "The value.", ArgumentRule::BY_CONSTANT_REFERENCE, ArgumentRule::ANY ) );
        
        rules_set = true;
    }

    return argumentRules;
}


const std::string& Func_hyperbolicCosine::getClassType(void)
{
    static std::string rev_type = "Func_hyperbolicCosine";
    
    return rev_type;
}

/* Get class type spec describing type of object */
const TypeSpec& Func_hyperbolicCosine::getClassTypeSpec(void)
{
    static TypeSpec rev_type_spec = TypeSpec( getClassType(), new TypeSpec( Function::getClassTypeSpec() ) );
    
    return rev_type_spec;
}


/**
 * Get the primary Rev name for this function.
 */
std::string Func_hyperbolicCosine::getFunctionName( void ) const
{
    // create a name variable that is the same for all instance of this class
    std::string f_name = "cosh";
    
    return f_name;
}

const TypeSpec& Func_hyperbolicCosine::getTypeSpec( void ) const
{
    static TypeSpec type_spec = getClassTypeSpec();
    
    return type_spec;
}

The function RevBayesCore::generic_function_ptr< ResultType >(CppFunction, arg1, ...) in Func_hyperbolicCosine::createFunction( ) automatically constructs a RevBayesCore::Function from the C++ function so that we do not have to.

If CppFunction is has multiple type signatures, as is the case with functions like sqrt( ), generic_function_ptr< > will not work. In such cases, we can solve this problem by writing a wrapper function with a single type signature. For example,

// Specialize sqrt to only work on doubles
double mysqrt(double x)
{
   return sqrt(x);
}

Registering the new function

In order to make the new function available for use within the Rev language, we first need to register it. To do this,

Go to the src/revlanguage/workspace/ directory.
Open the file RbRegister_Func.cpp file in your editor.
Scroll down until you find the #include commands for math functions.
Add the line #include "Func_hyperbolicCosine.h" in the correct alphabetical order for that group.
Scroll down in that file until you find the section of the code that adds math functions.
Add the line addFunction( new Func_hyperbolicCosine() );

Functions with caching and optimized recalculation

The above method should be prefered for implementing new functions in RevBayes unless the function needs to save intermediate results and use them during recalculation. For example, if our function computes x*(y+z), then if only x has changed, we could save the old value of y+z and re-use it instead of computing y+z from scratch. (In this case, y+z is not expensive enough to be worth saving, but is just a simple illustration of an intermediate result.)

In such cases we will also need to write a RevBayesCore::Function. A RevBayesCore::Function serves a different role than a RevLanguage::Function. A RevLanguage::Function interprets the function arguments and connects them to the RevBayesCore::Function, which performs the actual calculation and holds the result. If we write our own RevBayesCore::Function, then we can use the update() method to intelligently recalculate the value, and we can add data memembers to save intermediate results;

Here will illustrate how to write a simple RevBayesCore::Function using the hyperbolic cosine example. Our example does’t actually save any intermediate results, it just illustrates how a RevBayesCore::Function works. It is therefore equivalent to the RevBayesCore::Function that is automatically generated by generic_function_ptr.

RevBayesCore Header file

First, we will write our new header file. Within our header file, we need to #include a few other RevBayes header files, including TypedDagNode.h since our typed function deals with nodes of DAGs. Note that the directory structure of core is similar to that of the revlanguage. The file src/core/functions/math/HyperbolicCosineFunction.h will be:

#ifndef HyperbolicCosineFunction_h 
#define HyperbolicCosineFunction_h

#include "TypedFunction.h"
#include "TypedDagNode.h"
#include <cmath>

namespace RevBayesCore {
    /**
     * \brief Hyperbolic Cosine of a real number.
     *
     * Compute the hyperbolic cosine of a real number x. (cosh(x) = (exp(x) + exp(-x))/2).
     *
     * \copyright (c) Copyright 2009-2018 (GPL version 3)
     * \author <your-name>
     * \since Version 1.0, 2015-01-31
     *
     */
    class HyperbolicCosineFunction : public TypedFunction<double> {

        public:
                                          HyperbolicCosineFunction(const TypedDagNode<double> *a); 

            HyperbolicCosineFunction*     clone(void) const; //!< creates a clone
            void                          update(void);      //!< recomputes the value

        protected:
            void                          swapParameterInternal(const DagNode *oldP, const DagNode *newP); //!< Implementation of swapping parameters

        private:
            const TypedDagNode<double>* x;

    };
}


#endif

The first part of this file should be the standard header that goes in all the files giving a brief description about what that file is as well as information about the copyright and the author of that file.

RevBayesCore Implementation file

Next, after including the necessary header files, we have to ensure that our new function is included within the RevBayesCore namespace. Here we are implementing our hyperbolic cosine function as its own class that is derived from the typed function class. This class stores the hyperbolic cosine of a value that is held in a DAG node. We have also defined a clone method which can create a clone of our class, and an update method which will update the value of our Hyperbolic Cosine class whenever the value of the DAG node changes.

The file src/core/functions/math/HyperbolicCosineFunction.cpp will look like:

#include "HyperbolicCosineFunction.h"

using namespace RevBayesCore;

HyperbolicCosineFunction::HyperbolicCosineFunction(const TypedDagNode<double> *a) : TypedFunction<double>( new double(0.0) ),
x( a )
{
    addParameter( a );
}


HyperbolicCosineFunction* HyperbolicCosineFunction::clone( void ) const
{
    return new HyperbolicCosineFunction(*this);
}


void HyperbolicCosineFunction::swapParameterInternal(const DagNode *oldP, const DagNode *newP)
{
    if (oldP == x)
    {
        x = static_cast<const TypedDagNode<double>* >( newP );
    }
    
}

void HyperbolicCosineFunction::update( void )
{
    // get the current value of x
    double xValue = x -> getValue();

    // compute the function result
    double result = (exp(xValue) + exp(-xValue))/2;

    // update the stored value
    *value = result;        

}

Alternate implemention for `createFunction`

In order to use the new RevBayesCore::HyperbolicCosineFunction class, we need to modify the RevLanguage::Func_hyperbolicCosine class to use it instead of generic_function_ptr( ). To do this,

Open src/revlanguage/functions/math/Func_hyperbolicCosine.cpp in your editor.
Add an #include statement to make the RevBayesCore::HyperbolicCosineFunction class visible
Modify the definition of Func_hyperbolicCosine::createFunction() as follows:

...
#include "HyperbolicCosineFunction.h"
...
         
RevBayesCore::TypedFunction<double>* Func_hyperbolicCosine::createFunction( void ) const
{
    
    RevBayesCore::TypedDagNode<double>* x = static_cast<const Real&>( this->args[0].getVariable()->getRevObject() ).getDagNode();
    
    return new RevBayesCore::HyperbolicCosineFunction( x );
}

Implementing a distribution

General info before getting started

Within the RevBayes core directory, there are subdirectories for different categories of distributions. All mathematical distributions that have been implemented exist in core/distributions/math.
Note that when implementing a new distribution, you will need to create .cpp and .h files in both the revlanguage directory and the core directory. (For a refresher on the difference between these two directories, refer to the Getting familiar with the code section of this Developer’s guide). The overall naming format remains the same for every distribution in RevBayes. In the Beta Binomial Distribution example provided below, I specify what naming convention to use when creating each file.
It is often helpful to look at/borrow code from existing RevBayes distributions for general help on syntax and organization.

For the language side, one of the most important things is the create distribution function (it converts user-arguments into calculations). Also, the getParameterRules function is important (to get the degrees of freedom & other things). It is often helpful to look at the code of existing distributions for general help on syntax & organization.

Within every new distribution, you will need to include some functions. For example, each new distribution must have: the get class type, name, and help functions. You may not need to implement these from scratch (if they’re dictated by the parent class & are already present), but you will need to implement other functions within your distribution (e.g. cdf, rv, quantile).
Distributions have a prefix DN (dag node), and all moves have a prefix MV. RevBayes takes the name within & creates the DN automatically, so be aware of this. For a refresher on DAG nodes, refer to Getting familiar with the code.

In the following steps, we’ll implement the Beta Binomial Distribution as an example, for syntax purposes.

Steps

Create new .cpp & .h files in revlanguage/distributions/math/, named Dist_betabinomial.cpp and Dist_betaBinomial.h. Note: all files in this directory will follow this naming format: Dist_<nameofdistribution>.[cpp|h].

To populate these files, look at existing examples of similar distributions for specific info on what to include & on proper syntax. For example, for the Beta Binomial distribution, I checked the existing Binomial Distribution code for guidance.
```
//add code here
```
Test
1. Create new .cpp & .h files in core/distributions/math/, named BetaBinomialDistribution.cpp and BetaBinomialDistribution.h.
  
  Note: This is the object oriented wrapper code that references the functions hard-coded in the next step. All files in this directory will follow this naming format: <NameOfDistribution>Distribution.[cpp|h].
2. Create new .cpp and .h files in core/math/Distributions/, named DistributionBetaBinomial.cpp and DistributionBetaBinomial.h.
  
  These are the raw procedural functions in the RevBayes namespace (e.g. pdf, cdf, quantile); they are not derived functions. RbStatistics is a namespace. To populate these files, look at existing examples of similar distributions to get an idea of what functions to include, what variables are needed, and the proper syntax.
  
  Note: This is the most time-consuming step in the entire process of implementing a new distribution. All files in this directory will follow this naming format: Distribution<NameOfDistribution>.[cpp|h]
Navigate to revlanguage/workspace/RbRegister_Dist.cpp

Every new implementation must be registered in RevBayes. All register files are located in the revlanguage/workspace directory, and there are different files for the different types of implementations (RbRegister_Func.cpp is for registering functions; RbRegister_Move is for registering moves; etc.). We are implementing a distribution, so we will edit the RbRegister_Dist.cpp file.
You need to have an include statement at the top of the RbRegister script, to effectively add your distribution to the RevBayes language. You also need to add code to the bottom of this file, and give it a type and a new constructor. Generally, you can look within the file for an idea of proper syntax to use.

For the Beta Binomial distribution, we navigate to the section in the file with the header ‘Distributions’ and then look for the sub-header dealing with ‘math distributions’. Then, add the following line of code:
```
#include "Dist_betaBinomial.h"
```
This step registers the header file for the beta binomial distribution, effectively adding it to RevBayes.

Next, navigate to the section of the file that initializes the global workspace. This section defines the workspace class, which houses info on all distributions. Then, add the following line of code:
```
AddDistribution< Natural		>( new Dist_betaBinomial());
```
This adds the distribution to the workspace. Without this step, the beta binomial distribution will not be added to the revlanguage.

Note: Depending on the kind of distribution, you may need to change Natural to a different type (e.g. Probability, Real, RealPos, etc.).

After registering your distribution, you are ready to test your code.
Before pushing your changes, you should ensure your code is working properly.

There are multiple ways to do this. As a best practice, you should first compile it to ensure there are no errors. Once it compiles with no problems, you can test in various ways (e.g. run each individual function within the new Beta Binomial distribution in R, then run the Binomial distribution with a Beta prior in Rev and see if the output matches). For more information, see the developer tutorials on validation and testing. (TODO)

After ensuring your code runs properly, you are ready to add it to the git repo. We recommend reading through the RevBayes Git Workflow section of the Developer’s guide before pushing.

Implementing a Metropolis-Hastings Move

General info before getting started

The steps to implementing a new move vary slightly, depending on the move’s type (e.g., Metropolis-Hastings versus Gibbs). For the purpose of this guide, we will focus on a Metropolis-Hastings move.

In general, the fastest and easiest way to get help is to find the most similar move already implemented in RevBayes and use it as a guide. Remember that, as with implementing a new distribution or function, you’ll need to add relevant code to both the core of RevBayes and the language. Also remember that you’ll need to work out the math appropriate for your move (e.g., the Hastings ratio) ahead of time.

Steps

Orienting within the repository - For the core, navigate in the repository to src/core/moves. For a Metropolis-Hastings move, we’ll then go into the proposal directory. In this directory, you can find several templates for generic proposal classes, as well as subdirectories containing moves for specific parameter types. To keep things easy, we’ll focus on a single scalar parameter, so we’ll navigate one step further into the scalar directory. For the language, navigate to src/revlanguage/moves. For this example, as we did in the core, we’ll focus on a move for a scalar parameter, so we’ll then open scalar. To register our new move after it’s implemented, we’ll also need to update the file src/revlanguage/workspace/RbRegister_Move.cpp.
Creating new files for the core - As an example, we’ll implement a new move that draws a random value from a Gamma distribution and proposes a new scalar by multiplying the current value by the draw from the Gamma. This move will be called a “Gamma Scaling move”. Since this move is similar to an existing scaling move, we can start by copying the file ScaleProposal.h and naming the new copy GammaScaleProposal.h. As a reminder, we’re working in the directory src/core/moves/proposal/scalar/.

Once the new header file is created and named, we can update the content to match our new move. The simplest changes involve renaming things to match the new move (e.g., updating the preprocessor macro from ScaleProposal_H to GammaScaleProposal_H, or changing the name of object references and constructor from ScaleProposal to GammaScaleProposal). The comments at the top of the header file that describe how the move works should also be updated, but these changes will obviously be specific to the move being implemented.

Next, we’ll need to create a new .cpp file containing the implementation of our new move. As with the header, it’s easiest to copy and rename an existing file, so we’ll use ScaleProposal.cpp as our template, copy it, and rename to GammaScaleProposal.cpp. As with the header file, most of the necessary changes involve updating the names of variables and function names. If the move requires access to other math functions, additional header files may need to be included at the top. Explore src/core/math as needed to find the necessary functions or distributions. In this case, we will need access to methods associated with the Gamma distribution, so we will add #include "DistributionGamma.h". For our example, the number and type of variables used by our move are the same as our template based on the Scale move, so we don’t need to modify the constructor or variable initialization, other than updating the constructor name. Similarly for this example, we don’t need to alter the code in the ::cleanProposal, ::clone, ::prepareProposal, ::printParameterSummary, ::undoProposal, ::swapNodeInternal, and ::tune methods as these are common to our template and new moves (and will be identical to many of the scalar moves), but we do need to update the class names associated with the methods (i.e., ScaleMove:: -> GammaScaleMove::). For the ::getProposalName method, we need to update the string in the method that provides a descriptive name for the move - name = "Gamma Scaling". The bulk of the necessary changes for the new move will come in the ::propose method and the help description above the method. For this example, the new ::propose method looks like this:
```
/*
 * Perform the proposal.
 *
 * A gamma scaling proposal draws a random number from a gamma distribution u ~ Gamma(lambda,1) and scales the current vale by u
 * lambda is the tuning parameter of the proposal which influences the size of the proposals by changing the shape of the Gamma.
 *
 * \return The hastings ratio.
 */
double GammaScaleProposal::propose( double &val )
{
    
    // Get random number generator
    RandomNumberGenerator* rng     = GLOBAL_RNG;
    
    // copy value
    storedValue = val;
    
    // Generate new value (no reflection, so we simply abort later if we propose value here outside of support)
    double u = RbStatistics::Gamma::rv(lambda,1,*rng);
    val *= u;
    
    // compute the Hastings ratio
    double ln_hastings_ratio = 0;
    try
    {
        // compute the Hastings ratio
        double forward = RbStatistics::Gamma::lnPdf(lambda, 1, u);
        double backward = RbStatistics::Gamma::lnPdf(lambda, 1, (1.0/u));
        
        ln_hastings_ratio = backward - forward - log(u);    // The -log(u) term is the Jacobian
    }
    catch (RbException e)
    {
        ln_hastings_ratio = RbConstants::Double::neginf;
    }

    return ln_hastings_ratio;
}
```
In this case, the Hastings ratio involves the probability density of the forward move (the scaling factor u), the density of the corresponding backward move (the scaling factor $\frac{1}{u}$), and the solution to the Jacobian ($-\frac{1}{u}$).
Creating new files for the rev language - After implementing the detailed machinery to perform the new move in the RevBayes core, you need to modify a few files associated with the Rev language to make it available to users. As a reminder, the first set of these files is found in src/revlanguage/moves. For our example, we will be focusing specifically on a move for a scalar value, so navigate to the scalar subdirectory. As with the files in the core, we will copy and rename a header (.h) and implementation (.cpp) file from the Scale move. In this case, our new header file will be called Move_GammaScale.h and our new implementation file will be called Move_GammaScale.cpp. In the header file, simply update the names of the preprocessor macro, the class, and the objects. Also remember to update the help comments near the top of the file.

In the new .cpp file, begin by updating the names of the header files included at the top. Note that we include, and will need to update, the names of both the header for the core GammaScaleProposal.h and the workspace Move_GammaScale.h. Most of the rest of the changes in this file involve updating the names of classes and objects associated with this move, but we also need to update the string specifying the type in the ::getClassType method and specifying the constructor name in the ::getMoveName method. Pay special attention to the rules specified in ::getParameterRules and make sure they satisfy the constraints required by the new move. Update these rules as needed, using existing rules from other moves as templates.

For our particular example, we also need to make one additional change. Because we’ve only updated the ScaleProposal, and not the ScaleProposalContinuous, we need to remove the part of the code in this move that could call ScaleProposalContinuous. The ::constructInternalObject method now looks like this (compare to the corresponding method from Move_Scale.cpp):
```
void Move_GammaScale::constructInternalObject( void )
{
    // we free the memory first
    delete value;
    
    RevBayesCore::Proposal *p = NULL;
    
    // now allocate a new sliding move
    double d = static_cast<const RealPos &>( lambda->getRevObject() ).getValue();
    double w = static_cast<const RealPos &>( weight->getRevObject() ).getValue();
    double r = static_cast<const Probability &>( tuneTarget->getRevObject() ).getValue();
    RevBayesCore::TypedDagNode<double>* tmp = static_cast<const RealPos &>( x->getRevObject() ).getDagNode();
    RevBayesCore::StochasticNode<double> *n = dynamic_cast<RevBayesCore::StochasticNode<double> *>( tmp );
    p = new RevBayesCore::GammaScaleProposal(n, d, r);
    bool t = static_cast<const RlBoolean &>( tune->getRevObject() ).getValue();
    
    value = new RevBayesCore::MetropolisHastingsMove(p, w, t);
    
}
```
Updating the Rev language register - The last step in implementing our new move is to make sure that it’s registered in the Rev language. To do this, we will need to update the file RbRegister_Move.cpp in src/revlanguage/workspace. In this file, we’ll need to include the header for our new move #include "Move_GammaScale.h" in the section corresponding to moves on real values. We’ll also need to add the constructor to the workspace by updating the ::initializeMoveGlobalWorkspace method to include addTypeWithConstructor( new Move_GammaScale() );, again in the section corresponding to moves on real values.
Testing the performance of the new move - If properly implemented, a new move can be validated by running an MCMC analysis where the clamped data are ignored and one tries to sample only the prior. This can be done in RevBayes by calling model.ignoreAllData() to mark the data as ignored. The recommended strategy is to implement the simplest possible model that uses a variable of the type appropriate for the new move, and assigning that variable an easily validated prior (e.g., a uniform). Run the analysis with only the new move operating on that variable and then plot the variable’s marginal distribution to make sure it matches the prior.

Implementing functions, distributions, and moves

Last modified on August 15, 2018

Overview

RevLanguage Header file

RevLanguage Implementation file

Registering the new function

Functions with caching and optimized recalculation

RevBayesCore Header file

RevBayesCore Implementation file

Alternate implemention for createFunction

General info before getting started

Steps

General info before getting started

Steps

Alternate implemention for `createFunction`