Before getting into challenging problems, let us have a quick look at two simple cases of Bayesian methods that appear to have good frequentist calibration properties. Both are Gaussian comparative models.

(1) Revell and Reynolds (2012) estimate the parameters of a Brownian model of quantitative trait evolution with intraspecific variation, on data simulated under various conditions and on trees with 50 taxa. They report that the frequency at which their 95% credible intervals contain the true value of the parameter(s) is statistically indistinguishable from 95% in all cases analyzed. This corresponds to the standard definition of calibration, as frequentist coverage associated with simple parameter estimation.
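The logic of such a coverage check can be sketched in a few lines. This is a minimal stand-in, not the Brownian model of Revell and Reynolds: a plain normal mean with a weak conjugate prior plays the role of the evolutionary parameter, and we count how often the 95% credible interval captures a fixed true value across simulated replicates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a frequentist coverage check of Bayesian
# credible intervals: normal mean, known variance, weak conjugate prior.
n_reps, n_obs = 2000, 20
theta = 2.0                  # fixed true parameter value
sigma, tau = 1.0, 10.0       # known sampling sd, prior sd (weak prior)

# Conjugate posterior variance is the same for every replicate.
post_var = 1.0 / (1.0 / tau**2 + n_obs / sigma**2)

hits = 0
for _ in range(n_reps):
    x = rng.normal(theta, sigma, n_obs)         # simulate one data set
    post_mean = post_var * x.sum() / sigma**2   # conjugate posterior mean
    # Does the central 95% credible interval contain the true value?
    hits += abs(post_mean - theta) <= 1.96 * np.sqrt(post_var)

print(hits / n_reps)
```

With a prior this weak, the posterior essentially tracks the likelihood, so the observed coverage should be close to 0.95 up to Monte Carlo error.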

(2) Ancestral trait reconstruction based on a phylogenetic regression model (Lartillot, 2013). Here, two correlated traits are assumed to evolve according to a bivariate Brownian model. Both traits are known in extant species, and one (the predictor) is also known in the ancestors (interior nodes). The other trait is to be predicted for all ancestors along the phylogeny.

The frequentist calibration turns out to be very good in the sense that, on average, 95% of all credible intervals across the phylogeny contain the true value of the trait to be predicted. On the other hand, if you look at one particular ancestor along the phylogeny, then, across all replicates, the credible interval associated with that particular ancestor may contain the true value significantly more often than 95% of the time -- or significantly less often (minimum observed is 87%), depending on the ancestor. However, ancestor-specific deviations cancel out and the average coverage over ancestors is 95%. In other words, what I seem to get here is not

*point-wise* calibration, but *group-wise* calibration (I still don't completely understand why). Group-wise calibration is probably the only thing we can hope to obtain in many cases, in particular in an empirical Bayes context. But this is also most often what we need in order to make sense of our probabilistic evaluations in frequentist terms: most often, we want to know how many false discoveries we are making across ancestors (or genes, or observations, etc.).
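The distinction can be reproduced in a toy setting, assuming (hypothetically) that each "ancestor" is just a normal mean drawn once from the prior, with a single observation and a conjugate posterior. Shrinkage toward the prior mean then gives over-coverage for true values near the prior mean and under-coverage for extreme ones, yet the group average stays at 95%:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy analogue of ancestors: true values theta_i drawn ONCE from the
# prior N(0, 1), then held fixed across replicates (frequentist view).
n_groups, n_reps = 200, 1000
theta = rng.normal(0.0, 1.0, n_groups)

# With x | theta ~ N(theta, 1) and prior N(0, 1), the posterior is
# N(x / 2, 1 / 2); half-width of the central 95% credible interval:
half = 1.96 * np.sqrt(0.5)

hits = np.zeros(n_groups)
for _ in range(n_reps):
    x = rng.normal(theta, 1.0)        # one observation per "ancestor"
    post_mean = x / 2.0               # shrinkage toward the prior mean
    hits += np.abs(post_mean - theta) <= half

pointwise = hits / n_reps
# Point-wise coverage varies with |theta_i| (too high near 0, too low
# in the tails), but the group-wise average is close to 0.95.
print(pointwise.min(), pointwise.max(), pointwise.mean())
```

The per-ancestor coverages spread well below and above 95%, while their average lands near 95% because the fixed true values look like draws from the prior -- a mechanism plausibly related to (though not identical with) what happens along the phylogeny.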

One should of course not generalize too hastily from these two relatively anecdotal examples. Gaussian models are relatively easy cases. Under some simple normal models, you can even obtain exact calibration for any sample size, using suitably invariant uninformative priors (Berger, 1985). But this is encouraging, at least in the context of relatively simple parametric problems with gentle probability distributions.
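The exact-calibration case is easy to check numerically. With a flat (improper, translation-invariant) prior on a normal mean and known variance, the posterior is N(x̄, σ²/n), so the 95% credible interval coincides with the classical confidence interval and has exactly 95% frequentist coverage for any fixed true mean and any sample size -- the simplest instance of the invariance arguments Berger discusses. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)

# Flat prior on a normal mean, known variance: credible interval equals
# the classical confidence interval xbar +/- 1.96 * sigma / sqrt(n).
theta, sigma, n, n_reps = 3.7, 2.0, 5, 5000
half = 1.96 * sigma / np.sqrt(n)

# Simulate the sampling distribution of xbar directly.
xbar = rng.normal(theta, sigma / np.sqrt(n), n_reps)
coverage = np.mean(np.abs(xbar - theta) <= half)
print(coverage)
```

Here the coverage is exact by construction (up to Monte Carlo error in the simulation), whatever `theta` and `n` are set to.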

Yes, I know. Phylogenetics is not just about simple parametric problems with gentle probability distributions.

--

Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag.

Lartillot, N. (2013). A phylogenetic Kalman filter for ancestral trait reconstruction using molecular data. Bioinformatics, epub ahead of print.

Revell, L. J., & Graham Reynolds, R. (2012). A new Bayesian method for fitting evolutionary models to comparative data with intraspecific variation. Evolution, 66:2697–2707.
