f) The “s” region is located outside of the ORF. g) A second cagA gene is present between cagM and cagP. h) (tr), truncation. i) Mongolian gerbil-adapted, originally
from gastric ulcer. j) The vacA gene is split. k) According to reference [139], the sequence might not represent a complete genome, although it is deposited as a complete circular genome in GenBank. l) The “m” region was not available because of a deletion in the center of the ORF.

Japanese/Korean core genomes diverged from the European and then the Amerind

A phylogenetic tree was constructed from seven concatenated genes (atpA, efp, mutY, ppa, trpC, ureI and yphC, which were used for multi-locus sequence typing (MLST) [18] and phylogenetic analyses [19, 20]) (Additional file 1 (= Figure S1)). The tree showed that the 6 East Asian strains, namely the 4 Japanese strains (F57, F32, F30 and F16) and the 2 Korean strains (strain 51 and strain 52), are close to the known subpopulation
hspEAsia of hpEastAsia, whereas 4 strains (Shi470 [21], v225d [22], Sat464 and Cuz20) are close to another subpopulation of hpEastAsia, hspAmerind. Strains 26695, HPAG1, G27, P12, B38, B8 and SJM180 were assigned to hpEurope. Strains J99 and 908 were assigned to hspWAfrica of hpAfrica1. PeCan4 was tentatively assigned to hspAmerind, although it appears to be separate from the above 4 hspAmerind strains and somewhat closer to other subgroups (a subgroup of hpEurope, hspMaori and a group of “unclassified Asia” in the HpyMLST database [18]). We deduced the common core genome structure of these 20 genomes based on the conservation of gene order using CoreAligner [23] (Table 1). CoreAligner determines the set of core genes among related genomes not by universal conservation of genes but by conservation of neighborhood relationships between orthologous gene pairs, allowing some exceptions. As a result, CoreAligner identified different numbers of
core genes among strains (1364-1424), which reflect deletion, duplication and splitting of the core genes in the individual strains. For phylogenetic analysis among the strains, we further extracted 1079 well-defined core orthologous groups (OGs) as those that were universally conserved, non-domain-separated, and with one-to-one correspondence (see Methods). The concatenated sequence of all well-defined core OGs resulted in a well-resolved phylogenetic tree (Figure 1). The tree was composed of two clusters, one containing the Japanese, Korean and Amerind strains and the other containing the European and West African strains. The tree strongly supported a model in which the Japanese/Korean strains (hspEAsia) and the Amerind strains (hspAmerind) diverged from their common ancestor, which in turn had diverged long before from the ancestor shared with the European strains (hpEurope).
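
The selection of well-defined core OGs and their concatenation can be illustrated with a minimal sketch. It is not the pipeline used here and does not reproduce CoreAligner; the function names, the toy OG table, the set of domain-separated OGs and the short aligned sequences are all hypothetical and serve only to show the three filters (universal conservation, no domain separation, one-to-one correspondence) followed by strain-wise concatenation.

```python
from typing import Dict, FrozenSet, List

def select_core_ogs(
    og_members: Dict[str, Dict[str, List[str]]],
    strains: List[str],
    domain_split: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Return OG ids that have exactly one gene in every strain
    and are not flagged as domain-separated."""
    core = []
    for og_id in sorted(og_members):
        if og_id in domain_split:
            continue  # OG was split into separate domains: excluded
        members = og_members[og_id]
        # exactly one member per strain = universal and one-to-one
        if all(len(members.get(s, [])) == 1 for s in strains):
            core.append(og_id)
    return core

def concatenate_alignments(
    alignments: Dict[str, Dict[str, str]],
    core_ogs: List[str],
    strains: List[str],
) -> Dict[str, str]:
    """Join the aligned sequences of the selected OGs, strain by strain."""
    return {s: "".join(alignments[og][s] for og in core_ogs) for s in strains}

# Hypothetical toy input (gene ids and sequences are invented).
strains = ["F57", "26695", "J99"]
og_members = {
    "OG1": {"F57": ["a1"], "26695": ["b1"], "J99": ["c1"]},
    "OG2": {"F57": ["a2", "a2b"], "26695": ["b2"], "J99": ["c2"]},  # duplicated in F57
    "OG3": {"F57": ["a3"], "26695": ["b3"]},                        # absent from J99
    "OG4": {"F57": ["a4"], "26695": ["b4"], "J99": ["c4"]},
    "OG5": {"F57": ["a5"], "26695": ["b5"], "J99": ["c5"]},         # domain-separated
}
alignments = {
    "OG1": {"F57": "ATG-", "26695": "ATGA", "J99": "ATGC"},
    "OG4": {"F57": "GGT", "26695": "GGA", "J99": "G-A"},
}

core = select_core_ogs(og_members, strains, domain_split=frozenset({"OG5"}))
concatenated = concatenate_alignments(alignments, core, strains)
```

In this toy case only OG1 and OG4 pass all three filters. The resulting per-strain concatenated sequences are the kind of input that a tree-building program would take.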