Many attempts have been made to relate a phenotypic trait, the habitat temperature of organisms, to their genotypes, most importantly to the compositions of their genomes and proteomes. However, despite an accumulation of anecdotal evidence, an exact and conclusive relationship between the two has remained elusive.
We present an exhaustive study of the relationship between the amino acid composition of proteomes, the nucleotide composition of DNA, and the optimal growth temperature (OGT) of prokaryotes. Using 204 complete proteomes of archaea and bacteria with OGTs spanning −10°C to +110°C, we enumerated all possible sets of amino acids and found a set whose total fraction in a proteome correlates with OGT to a remarkable extent. This universal set is Ile, Val, Tyr, Trp, Arg, Glu, Leu (IVYWREL), and the correlation coefficient is as high as 0.93. In contrast, the G+C content of the 204 complete genomes shows no significant correlation with OGT (R = −0.10). The fraction of A+G in coding DNA, however, correlates considerably with temperature, owing to the codon patterns of the IVYWREL amino acids. We also found a strong and independent correlation between OGT and the frequency with which pairs of A and G nucleotides appear as nearest neighbors in genome sequences; this adaptation is achieved via codon bias. Finally, we analyze the physical basis of the observed amino acid composition bias and conclude that it arises from two factors: positive design, which lowers the energy of the native state of proteins, and negative design, which raises the energy of misfolded conformations. Together these factors widen the energy gap in proteins and thereby increase their stability. These findings establish a direct link between the principles of protein structure and stability and the evolutionary mechanisms of thermophilic adaptation.
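The exhaustive search over amino acid sets can be illustrated with a short sketch. The abstract does not describe the implementation, so the following is only an illustration under stated assumptions: each proteome is a list of one-letter protein sequences, each organism has a known OGT, and the score for a set is the Pearson correlation between the set's summed fraction and OGT. The names `composition`, `pearson`, and `best_subset` are hypothetical helpers introduced here. With all 20 amino acids there are about 10^6 subsets, so a pure-Python loop like this one would be slow on 204 proteomes; a vectorized version would be advisable in practice.

```python
from itertools import combinations
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(proteome):
    """Fraction of each standard amino acid over all residues of a proteome.

    Assumption: a proteome is given as a list of one-letter sequences;
    non-standard letters are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for seq in proteome:
        for aa in seq:
            if aa in counts:
                counts[aa] += 1
                total += 1
    if total == 0:
        raise ValueError("proteome contains no standard residues")
    return {aa: c / total for aa, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_subset(compositions, ogts, alphabet=AMINO_ACIDS, max_size=None):
    """Enumerate every subset of `alphabet` and return (r, subset) with the
    highest correlation between the subset's total fraction and OGT.

    The full alphabet is skipped by default: its fractions sum to 1 in
    every proteome, so it carries no information.
    """
    if max_size is None:
        max_size = len(alphabet) - 1
    best = (float("-inf"), ())
    for k in range(1, max_size + 1):
        for subset in combinations(alphabet, k):
            fractions = [sum(comp[aa] for aa in subset) for comp in compositions]
            r = pearson(fractions, ogts)
            if r > best[0]:
                best = (r, subset)
    return best
```

For a quick check on toy data, restrict `alphabet` to a few letters, for example `best_subset(comps, ogts, alphabet="IVK")`. The search space grows as 2^n, which is why the restricted alphabet keeps the example fast.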