Comments on The Bayesian kitchen: Second-order amino-acid replacement processes

This is an interesting idea indeed. Yet, it is not...

2016-12-16T16:26:06.091+01:00

This is an interesting idea indeed. Yet it is not quite clear to me how this model fits with the pruning algorithm. Under a 'traditional' first-order Markov model of substitution, the partial likelihood at a tip u corresponds to Pr(D(u)=y|N(u)=x), where D(u) is the observed sequence data (at one particular site) and N(u) is the state at tip u. Then Pr(D(u)=y|N(u)=x)=1 whenever x 'agrees' with y and 0 otherwise (e.g., for nucleotide data, if y is a purine then Pr(D(u)=y|N(u)=x)=1 if x is in {A,G}). Now, under the model you propose here, Pr(D(u)=y|N(u)=x)=1 for all substitutions x that lead to nucleotide/amino acid/codon y, and 0 otherwise. But surely one cannot simply ignore the timing of these substitutions, right? More generally, because the Markov model you propose here involves only non-observable states, it is not quite clear to me how the 'connection' to actual sequences works.
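
To make the two tip conditions above concrete, here is a minimal Python sketch. It rests on assumptions that are not in the post: the hidden state of the second-order model is encoded as an ordered pair (previous, current), the function names and the small ambiguity table are made up for illustration, and a site that has never substituted (no previous state) is not handled.

```python
# Hypothetical sketch of tip partial likelihoods for one site (nucleotide data).
NUCS = "ACGT"

# A few IUPAC-style ambiguity codes: observed symbol -> compatible nucleotides.
AMBIG = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def tip_partials_first_order(y):
    """Pr(D(u)=y | N(u)=x) for each x in NUCS: 1 if x agrees with y, else 0."""
    allowed = AMBIG[y]
    return [1.0 if x in allowed else 0.0 for x in NUCS]


# Assumed encoding of the second-order model: the hidden state is the last
# substitution, written as an ordered pair (previous, current), previous != current.
PAIRS = [(a, b) for a in NUCS for b in NUCS if a != b]


def tip_partials_second_order(y):
    """1 for every pair state whose current nucleotide agrees with y, else 0.

    The previous nucleotide is summed over implicitly, and nothing here
    records when the substitution happened.
    """
    allowed = AMBIG[y]
    return [1.0 if cur in allowed else 0.0 for (_prev, cur) in PAIRS]


print(tip_partials_first_order("R"))        # purine: compatible with A and G
print(sum(tip_partials_second_order("A")))  # number of pair states ending in A
```

Under this encoding, the tip vector for an observed A puts weight 1 on every pair that ends in A. It carries no information about the timing of that substitution, which is exactly the point the question is raising.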

Thanks for your comments and questions.. good poi...

2016-12-14T18:31:36.228+01:00

Thanks for your comments and questions.

Good point about time-reversibility -- although I am not sure about the exact reasons why we would really want to enforce this property. It is certainly convenient (the pulley principle, as you point out), but it is not necessarily adequate, empirically speaking (e.g. we know that mutation rates, for instance, are not time-reversible).

Concerning the second point: what you say is true, but after all, the same objection could be raised in the case of classical first-order processes: the amino acids that are accepted at a given site, given the current state, do not depend on the substitution rate. Yet it is reasonable to imagine that a more constrained site should in principle undergo more conservative amino-acid replacements than a less constrained site.

Interesting post! (1) One important feature of th...

2016-12-14T05:38:22.157+01:00

Interesting post!

(1) One important feature of the above formulation is that it automatically loses reversibility! Thus, the likelihood depends on the choice of the root node. Is there a formulation of a "second order" process that can maintain reversibility?
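
One way to see the loss of reversibility in point (1) is to write the second-order process as a first-order chain on ordered pairs (previous, current) and look for transitions that cannot be undone. The sketch below is only an illustration under that assumed encoding: it uses a toy 3-letter alphabet and hypothetical unit rates, not the model from the post.

```python
from itertools import product

AA = "ABC"  # toy 3-letter alphabet standing in for the 20 amino acids

# Assumed augmented state space: ordered pairs (previous, current), previous != current.
STATES = [(a, b) for a, b in product(AA, AA) if a != b]


def rate(s, t):
    """Toy rate: (a, b) -> (b, c) is allowed with rate 1.0 when c != b, else 0.0."""
    (_a, b), (b2, c) = s, t
    return 1.0 if (b2 == b and c != b) else 0.0


def one_way_transitions():
    """Pairs (s, t) with a positive forward rate but a zero reverse rate."""
    return [
        (s, t)
        for s in STATES
        for t in STATES
        if s != t and rate(s, t) > 0.0 and rate(t, s) == 0.0
    ]


for s, t in one_way_transitions():
    print(s, "->", t)
```

Detailed balance, pi(s) q(s,t) = pi(t) q(t,s), cannot hold with positive stationary probabilities when q(s,t) > 0 and q(t,s) = 0. In this encoding, (a,b) -> (b,c) has a reverse move only when c = a, so every move to a third amino acid is one-way regardless of the rate values chosen. This only shows that the pair chain itself is not reversible; it does not settle whether some other formulation could restore reversibility.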

(2) Another feature of the above proposal is that it depends on the current and previous amino acid states regardless of how long the current amino acid has been present. Thus a highly conserved site and a purely neutral site have the same rates if they have the same two most recent amino acids.

A different formulation of a second order chain is to have the rates depend on the current amino acid and the amino acid t time units ago.

This can be generalized to a k-th order chain by having the rates depend on the current amino acid and the amino acids i*t time units ago for i=1 to k-1.

This class of models has the nice feature that it approaches the first-order model for fixed k as t -> 0, but can be made to approach the model where the rates depend on the whole substitution history if we let k -> infinity and t -> 0 in an appropriate manner.
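
The fixed-lag second-order chain described above (rates depending on the current amino acid and the amino acid t time units ago) can be simulated with a small Gillespie-style loop. This is a rough sketch under my own assumptions: `simulate_fixed_lag`, `lagged_state` and `toy_rate` are hypothetical names, the rate function is arbitrary, and the state before time 0 is taken to be the initial state.

```python
import random


def lagged_state(path, now, lag):
    """State of the path `lag` time units before `now` (initial state before time 0)."""
    past = path[0][1]
    for tm, st in path:
        if tm + lag <= now:
            past = st
    return past


def simulate_fixed_lag(rate, states, x0, lag, t_end, rng):
    """Simulate a chain whose jump rates depend on (current state, state `lag` ago).

    `rate(cur, past, new)` is a user-supplied (hypothetical) rate function.
    Returns the list of (jump_time, new_state) events, starting with (0.0, x0).
    """
    path = [(0.0, x0)]
    now = 0.0
    while True:
        cur = path[-1][1]
        past = lagged_state(path, now, lag)
        # Rates are constant until the lagged state next changes, which
        # happens `lag` time units after each past jump.
        next_lag_change = min(
            (tm + lag for tm, _ in path[1:] if tm + lag > now),
            default=float("inf"),
        )
        rates = {b: rate(cur, past, b) for b in states if b != cur}
        total = sum(rates.values())
        wait = rng.expovariate(total) if total > 0 else float("inf")
        horizon = min(next_lag_change, t_end)
        if now + wait >= horizon:
            # No jump before the rates change: move to the horizon and redraw
            # (valid because exponential waiting times are memoryless).
            now = horizon
            if now >= t_end:
                return path
            continue
        now += wait
        r = rng.random() * total
        for b, q in rates.items():
            r -= q
            if r <= 0:
                break
        path.append((now, b))


def toy_rate(cur, past, new):
    """Arbitrary example: moving back to the lagged state is twice as fast."""
    return 1.0 if new == past else 0.5


path = simulate_fixed_lag(toy_rate, "ABC", "A", lag=0.5, t_end=10.0, rng=random.Random(1))
print(len(path) - 1, "substitutions")
```

With lag = 0 the lagged state always equals the current one, so the rates reduce to a function of the current state alone. That matches the remark that the model approaches the first-order process as t -> 0. Extending the sketch to a k-th order chain would mean looking up the states at lags i*t for i = 1 to k-1 instead of a single lag.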

Q: Do there exist non-trivial high-order reversible models in this fixed-lag class? The answer is not obvious to me.