Concerning my last post, I realize that my notation for rejection sampling models is a bit elliptic. In particular, the rejection step corresponding to the fact that we condition on the observed data is never explicitly written. So, let me just reformulate the technical arguments and restate the main idea.

In the general case, i.e. for any Bayesian model, the rejection sampling algorithm could be written as follows:

What this algorithm returns is one single value of $\theta$ sampled from the posterior distribution:

$p(\theta \mid D) \propto p(\theta) p(D \mid \theta)$.

Now, in the case of a diversification model, we have to account for the fact that we observe only surviving species clades. In addition, we imagine that there is some natural variation among clades in diversification rates, represented by a distribution $\phi$ on the diversification parameter $\theta = (\lambda, \mu)$, such that, each clade produced by Nature would have its speciation and extinction rates independently sampled from $\phi$.

Thus, altogether, our complete rejection sampling would look like:

REPEAT

repeatIn the general case, i.e. for any Bayesian model, the rejection sampling algorithm could be written as follows:

REPEAT

sample $\theta$ from prior

simulate data given $\theta$

UNTIL simulated data match observed data $D$

return $\theta$

$p(\theta \mid D) \propto p(\theta) p(D \mid \theta)$.

Of course, you can imagine that one would typically run this algorithm many times and then use the resulting sample of independent draws from the posterior to estimate posterior mean, median, credibility intervals, etc.

Note that I use upper case letters for the REPEAT-UNTIL loop corresponding to the conditioning on empirical data -- to distinguish it from other repeat-until loops that might also be invoked, just because of the data acquisition protocol.

Thus, altogether, our complete rejection sampling would look like:

REPEAT

Nature chooses $\theta$ from $\phi$

Nature runs diversification process given $\theta$

until process survives

UNTIL simulated phylogeny matches observed phylogeny $T$

return $\theta$

We now have two repeat-until loops: the inner one represents the fact that, by construction, we only consider non extinct groups. The outer one represents our conditioning on the observed phylogeny $T$. However, this could just as well be written as:

REPEAT

Nature chooses $\theta$ from $\phi$Nature runs diversification process given $\theta$

UNTIL simulated phylogeny matches observed phylogeny

return $\theta$

simply because an empty tree will never match our observed phylogeny anyway. Thus, we end up with a value of $\theta$ sampled from a very simple posterior distribution, one that does not invoke any conditional likelihood (this was my equation 1 in the previous post):

(1) $p(\theta \mid T) \propto \phi(\theta) \, p(T \mid \theta) $

This posterior distribution combines a prior not conditional on survival $\phi(\theta)$, with a likelihood also not conditional on survival $p(T \mid \theta)$.

Now, you can always rewrite this as:

$p(\theta \mid T) \propto \phi(\theta) p(S \mid \theta) \, p(T \mid \theta) / p(S \mid \theta)$

or, equivalently:

(2) $p(\theta \mid T) \propto p(\theta \mid S) \, p(T \mid \theta, S)$

where $p(S \mid \theta)$ is the probability of survival of the clade. Thus, we now have a prior conditional on survival together with a likelihood also conditional on survival. Equation 1 and 2 are equivalent, and you can choose between them, fundamentally, based on which prior information you have (fossils: equation 1, other neontological studies: equation 2).

But in any case, you never get, under this algorithmic model, an unconditional prior combined with a conditional likelihood. The only way to get this combination is by considering

*another*model:
REPEAT

choose $\theta$ from some prior $p(\theta)$

repeat

Nature runs diversification process given $\theta$

until process survives

UNTIL simulated phylogeny matches observed phylogeny

which basically corresponds to the case where Nature has produced many clades, all under the exact same diversification rates (same parameter vector $\theta$), and we have randomly sampled one among the surviving clades thus produced. Here, we still draw $\theta$ from some prior distribution $p(\theta)$, but this distribution now merely represents our prior uncertainty about the fixed value of $\theta$ 'chosen by Nature' -- not the natural variation of $\theta$ among groups. So, here, we cannot collapse the two repeat-until loops, we have to keep the two: an internal loop corresponding to the actual 'repeated natural experiment', under fixed but unknown $\theta$, and an external loop simply representing our Bayesian inference about this unknown value of $\theta$, by rejection sampling, based on some subjective prior $p(\theta)$.

Personally, I consider this second model less convincing than the first one: we do see in practice quite some variation among closely related groups, so, we should invoke some distribution $\phi$ representing this variation.

Of course, ultimately, none of the two models is really convincing: as soon as you acknowledge the existence of variation in diversification rates between clades, then you should also expect variation within clades: if diversification rates differ between rodents and primates, then, they almost certainly vary among primates, or among rodents. Mathematically, $\theta$ should itself be the result of a stochastic process along the lineages. In fact, some interesting work has already been done along those lines, using a compound Poisson process (Rabosky, 2014).

===

Personally, I consider this second model less convincing than the first one: we do see in practice quite some variation among closely related groups, so, we should invoke some distribution $\phi$ representing this variation.

Of course, ultimately, none of the two models is really convincing: as soon as you acknowledge the existence of variation in diversification rates between clades, then you should also expect variation within clades: if diversification rates differ between rodents and primates, then, they almost certainly vary among primates, or among rodents. Mathematically, $\theta$ should itself be the result of a stochastic process along the lineages. In fact, some interesting work has already been done along those lines, using a compound Poisson process (Rabosky, 2014).

===

Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees.

*PLoS One*,*9*(2), e89543. doi:10.1371/journal.pone.0089543