Monday, November 10, 2014

Bayesian priors for diversification studies

In my last post, I was asking whether one should condition the likelihood on survival in the context of diversification studies. The whole discussion, however, was purely at the level of the likelihood. Now, how should we look at this problem from a Bayesian perspective? 

Assuming that $\theta = (\lambda, \mu)$ is our vector representing the speciation and extinction rates, I was suggesting the following algorithmic model:

Nature has fixed distribution $\phi(\theta)$
   Nature chooses $\theta$ from $\phi$
   Nature runs diversification process given $\theta$
until process survives

where $\phi$ is the natural variation in diversification rate among clades (both extant and extinct). Our observed phylogeny can then be seen as one particular draw from this simulation model. Conditioning on the observed phylogeny $T$ leads to the following posterior distribution:

(1)    $p(\theta \mid T) \propto \phi(\theta) \, p(T \mid \theta) $

involving the likelihood unconditional on ultimate survival. I was then considering two limiting cases of this equation in the context of a maximum a posteriori approach.

Now, simply identifying $\phi$ with our prior and sampling from the posterior distribution defined by equation 1 would represent a meaningful Bayesian formalization of the problem. However, this assumes that we can see our prior as a reasonable estimate of the true natural variation in diversification rates $\phi$  and not just as a representation of some loosely defined prior beliefs about possible values of $\theta$. So, in particular, this has to be an informative prior.

Do we have practical ways to do this? Yes, I think so. In fact, I can see two different approaches to this problem.

Fossil data  Suppose we are able to come up with an acceptable estimate of the distribution of diversification rates from the fossil record. Since $\phi$ in equation 1 is to be interpreted as the distribution of diversification rates across all clades, both extant and extinctwe can directly identify $\phi$ with our empirical estimate $\hat \phi$ obtained from fossils, which leads us to the following posterior distribution:

(2)    $p(\theta \mid T) \propto \hat \phi(\theta) \, p(T \mid \theta)$.

Neontological data  Alternatively, suppose that we already have estimates of $\theta$ across several other clades (e.g. several mammalian orders), obtained by fitting or conditioning diversification models on their respective phylogenies. We would like to use this information to derive the prior for our clade of interest (say, Carnivores). Our prior information, however, is now itself conditional on survival, so, we can not simply calculate the mean and the variance of the speciation and extinction rates across other clades and use it as a proxy for $\phi$, since $\phi$ is not conditional on extinction.

On the other hand, we can re-write equation 1 as follows:

$p(\theta \mid T) = \phi(\theta) p(T \mid \theta) = \phi(\theta) p(S \mid \theta) \, p(T \mid \theta) / p(S \mid \theta) = p(\theta \mid S) \, p(T \mid \theta, S)$.

where $p(S \mid \theta)$ is, as in the previous post, the probability of ultimate survival of the clade given $\theta$. So, here, we have two factors: a conditional prior:

$p(\theta \mid S) \propto \phi(\theta) p(S \mid \theta)$,

which can be represented by the following sampling model:

   choose $\theta$ from $\phi$
   run a diversification process under $\theta$
until process survives
return $\theta$

clearly suggesting that this is the prior distribution of values of $\theta$ across surviving groups. And a likelihood now conditional on survival:

$p(T \mid \theta, S) = p(T \mid \theta) / p(S \mid \theta)$,

which we have already seen. What all this means is that, instead of working with $\phi$ and with the unconditional likelihood, we can equivalently work with a prior conditioned on survival, but then we should combine it with the conditional likelihood. In practice, we can for instance calculate the mean and the variance of speciation and extinction rates estimated on other mammalian orders and, using a moment-matching argument, derive an empirical prior $\hat \psi(\theta)$, now conditional on survival. The posterior is then:

(3)    $p(\theta \mid T) \propto \hat \psi(\theta) \, p(T \mid \theta, S)$.

Note that this idea of combining the conditional likelihood with a conditional prior is similar to what is proposed in Stadler et al (2013).

Thus, it all depends on what prior information you have. If your prior information is obtained from fossil evidence, then this should be considered as unconditional prior, to be used in conjunction with an unconditional likelihood (equation 2). If, on the other hand, your prior information has been gathered from other neontological studies, then this prior information is conditional on survival, and as such it should be be combined with a conditional likelihood (equation 3).

No prior information  Now, what if we do not want to use fossil evidence, nor any information gained from other clades? We would like to say that we do not know anything about $\lambda$ and $\mu$ and, accordingly, use a non-informative prior. However, which likelihood should we combine with this uninformative prior? conditional or unconditional? 

This is a bit tricky. First, there would be much to say about uninformative priors in general, before dealing with the question of how to use them in the present case. Second, technically, the only Bayesian prior that we can meaningfully introduce here should express our uncertainty about $\phi$. So, we would end up with two levels in our model: $\theta$ sampled from $\phi$, itself sampled from some hyperprior. We could explicitly derive something along those lines, but this would be uselessly complicated for the present matter. Instead, we can pretend that we implicitly marginalize our complicated two-level model over the hyperprior, ending up with an effective prior directly on $\theta$.

At the end, I think it is just a question of deciding whether we want to express lack of information about $\theta$ among all clades or only among surviving clades. In the first case, we would combine our uninformative prior with the unconditional likelihood, in the second case, with the conditional likelihood*. Personally, I think I would prefer to express lack of information among all clades, for it is not true that I don't know anything about the diversification rate of an extant species group: I know in particular that $\lambda - \mu$ cannot be unreasonably low, otherwise, I would not have observed this clade.

In summary, in a Bayesian context, there seems to be some flexibility in which version of the likelihood we can use: in fact, it all depends on the meaning attached to the prior.

We can of course imagine more complicated settings: estimating $\phi$ by simultaneously considering several clades in the context of a hierarchical Bayesian model, constructing integrative models jointly considering fossil and molecular data, etc. All this can lead to relatively complicated arguments and calculations for deriving the exact prior to be used  but this is exactly where algorithmic representations can be useful.


Stadler, T., Kühnert, D., Bonhoeffer, S., & Drummond, A. J. (2013). Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV). Proceedings of the National Academy of Sciences, 110(1), 228–233. doi:10.1073/pnas.1207965110

* Strictly speaking, there is a third case: we could also entertain the model in which there is in fact no available variation under $\phi$, so that $\phi$ is highly concentrated around some value $\theta_0$ (as considered in my previous post). But we don't know the location of $\theta_0$ and therefore we put an uninformative prior on it. In that case, we would combine the uninformative prior with the conditional likelihood. However, personally, I am not convinced by this specific model: I tend to think that there is variation among clades, so that the clades that we observe are statistically enriched in higher diversification rates -- a fact that should be included in, and not factored out from, what we call our empirical data.


  1. Hi,

    I certainly miss something but the $\theta$'s drawn from your algorithmic model are not $\phi$-distributed anymore (their actual distribution may be derived from $\phi$). Is the $phi$ of your equation 1 the same as in the beginning of your text?


    Gilles Didier

  2. Hi Gilles,

    yes, the phi is the same.

    without any conditioning:

    Nature chooses theta from phi
    Nature runs diversification process given theta
    until process survives

    gives you theta from p(theta | S) (the conditional prior)

    Now, if you also condition on observed tree T:

    Nature chooses theta from phi
    Nature runs diversification process given theta
    until process survives AND simulated phylogeny matches observed phylogeny

    then you sample theta from equation 1, right? (perhaps I should have made the actual conditioning on observed T explicit in my notation).

    Or are you referring to something else?

  3. I have rephrased the argument (next post)... I hope this clarifies.