Circular codes revisited: A statistical approach
GIANNERINI, SIMONE;
2011-01-01
Abstract
In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has attracted great interest and undergone rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes, with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach to address two questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C³ maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability, although there is great variability among sequences. Second, we focus on this code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored so as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes to reading frame synchronization.