The gaps between contigs in scaffolds were closed using unassembled mate-pair reads or by sequencing PCR products amplified with primers flanking the gaps. The assembly and gap closure of TX16 were difficult due to the large number of repetitive sequences in the genome. The addition of the large-insert 8 kb library with deep clone coverage facilitated assembly and scaffolding, generating high-quality contigs and scaffolds in the de novo assembly. E. faecium strain TX1330 was sequenced by
454 GS20 technology to 6x sequence coverage for fragment reads and by 454 FLX to 69.8x sequence coverage for paired-end reads. TX1330 was also assembled using the 454 Newbler assembler. Plasmids were identified by circularization of DNA
sequences by paired-end reads, and were also experimentally verified by PFGE analysis of SmaI- and ApaI-digested genomic DNA followed by hybridization with PCR-generated probes complementary to the 5′ and 3′ ends of plasmid contigs. PFGE hybridization profiles were then compared to identify neighboring plasmid contigs. Gene prediction for both E. faecium TX16 and TX1330 was performed with Glimmer 3 [75] and GeneMark [76]. tRNAScan [77] was used for tRNA prediction, RNAmmer [78] for rRNA prediction, and RFAM/Infernal for other non-coding RNA genes [79]. Manual annotation was facilitated by the Genboree genome browser (http://www.genboree.org). Conserved protein domains were searched using Pfam [80], COG [81], and InterProScan [82]. Other tools such as PsortB [83, 84], ExPASy ENZYME [85], and the Transport Classification Database [86] were also used to facilitate the annotation. For manual annotation, each entry was annotated by two annotators independently, and the differences were reconciled at the end of the annotation. Genomic sequences and annotations for 20 other draft
E. faecium strains, including 1,141,733; 1,230,933; 1,231,408; 1,231,410; 1,231,501; 1,231,502; C68; Com12; Com15; D344SRF; E1039; E1071; E1162; E1636; E1679; E980; TC6; TX82; TX0133A; U0317, were obtained from NCBI. A complete list of the strains and their clinical sources is provided in Table 2.

Genome characterization

DNA and protein sequence alignments were performed using BLASTN and BLASTP [87], respectively, unless otherwise stated. Prophage loci were identified using both the Prophinder program [47] and Prophage Finder [46]. Prophinder uses BLASTP to search for phage proteins in the ACLAME database, while Prophage Finder uses BLASTX to search the input DNA sequence against an NCBI database of phage genomes. Possible prophage loci were also reviewed manually. The IslandViewer [52] server was used to analyze possible genomic islands on the chromosome. IslandViewer integrates the sequence composition-based genomic island prediction programs IslandPath-DIMOB [50] and SIGI-HMM [51] as well as the comparative genomics-based program IslandPick [53].
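As an illustration of how calls from several prediction programs might be consolidated for manual review, the following minimal sketch merges overlapping chromosomal intervals reported by different predictors and records which tools support each merged region. It is not IslandViewer's actual integration logic; the function name `merge_predictions`, the interval format, and all coordinates are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# (start, end) in chromosome coordinates, inclusive; format assumed for this sketch
Interval = Tuple[int, int]


def merge_predictions(
    calls: Dict[str, List[Interval]]
) -> List[Tuple[int, int, List[str]]]:
    """Merge overlapping intervals from several predictors.

    Returns one (start, end, supporting_tools) tuple per merged region,
    ordered by start coordinate.
    """
    # Flatten to (start, end, tool) and sort by start coordinate.
    flat = sorted((s, e, tool) for tool, ivs in calls.items() for s, e in ivs)

    merged: list = []
    for start, end, tool in flat:
        if merged and start <= merged[-1][1]:
            # Overlaps the current region: extend it and record the tool.
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2].add(tool)
        else:
            # No overlap: open a new region.
            merged.append([start, end, {tool}])

    return [(s, e, sorted(tools)) for s, e, tools in merged]


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the TX16 genome.
    example = {
        "IslandPath-DIMOB": [(1000, 5000), (20000, 26000)],
        "SIGI-HMM": [(4500, 8000)],
        "IslandPick": [(21000, 24000), (40000, 45000)],
    }
    for region in merge_predictions(example):
        print(region)
```

Regions supported by more than one tool could then be prioritized for manual inspection, although the choice of how to weight agreement between predictors is left open here.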