L. helveticus belongs to the largest of eight proposed phylogenetic Lactobacillus subgroups, the L. acidophilus-L. delbrueckii group (16). This subgroup consists largely of GI tract-associated species, and the 16S rRNA sequence of L. helveticus shares 98.4% identity with that of L. acidophilus, which suggested that the DPC 4571 cheese culture is particularly closely related to the probiotic commensal L. acidophilus strain NCFM. We confirmed the positioning of L. helveticus within the group by constructing a phylogenetic supertree with 47 ribosomal proteins (Fig. 2), a robust method of phylogenetic analysis (14). The relationship between L. helveticus and L. acidophilus is reflected in separate branching from the sequenced genomes of dairy L. delbrueckii and probiotic Lactobacillus gasseri/L. johnsonii species. More importantly, this relatedness is reflected in the fact that 75% of predicted DPC 4571 ORFs have orthologues in the
L. acidophilus NCFM genome (orthology was defined as a BLASTP E value of <10⁻²⁰). Considering the level of conservation between the DPC 4571 and NCFM genomes and the dramatically different environments that the strains inhabit, it was of interest to define the gene sets that differentiated the two closely related organisms (summarized in Table
2). Each genome included approximately 500 predicted genes that were not conserved in the other genome. The majority of these differentiating genes have an assigned function from in silico analyses (the complete lists of genes are compiled in Table S6 and Table S7 in the supplemental material). More than half of the genes with predicted functions that were acquired by DPC 4571 or lost from NCFM encode, unsurprisingly, transposases. However, four restriction/modification (R/M) systems (Lhv27-Lhv28, Lhv260, Lhv1152 to -1158, and Lhv1978-Lhv1979) and two independent putative restriction endonucleases (Lhv1031 and Lhv1478-Lhv1479) are DPC 4571 specific, as is a large section involved in fatty acid metabolism (Lhv1922 to Lhv1933). The R/M systems may account for the poor transformation efficiencies we observed for DPC 4571 compared to other lactobacilli (data not shown). The presence of additional fatty acid biosynthesis genes in dairy LAB cultures that are absent from
L. acidophilus and L. gasseri has been noted previously (27, 46). In addition, a number of DPC 4571-specific amino acid metabolism genes were identified (see Table S6 in the supplemental material). These DPC 4571-specific genes tend to be clustered within two large 100-kb sections and a number of 15- to 30-kb sections. This clustering is indicative of lateral gene transfer events, and GC content distribution analysis of the genome highlighted a difference at one of the 100-kb regions (Fig.
1). The region is characterized by a GC content of 42% (5 percentage points higher than the rest of the genome), and it is flanked by IS elements and unique 12-bp direct-repeat (TCATCTACTTTC) sequences. In addition, the region is not conserved in
L. acidophilus or L. johnsonii. These data are consistent with recently acquired chromosomal regions or genomic islands (5) that have been described in Lactobacillus plantarum (24) but not in lactobacilli of the acidophilus complex. The putative genomic island (GEI) includes the lipid biosynthesis genes described above, in addition to predicted restriction endonuclease and amino acid metabolism genes (cysteine synthase and serine acetyltransferase).
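The GC content distribution analysis and the direct-repeat observation described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`gc_fraction`, `windowed_gc`, `flag_atypical_windows`, `find_all`), the window size, the step, and the threshold are all hypothetical choices made for illustration, and the toy sequence stands in for a real genome.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window, step):
    """Return (start, gc_fraction) for each full window along seq."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


def flag_atypical_windows(seq, window, step, min_delta):
    """Return windows whose GC fraction exceeds the whole-sequence
    GC fraction by at least min_delta (an illustrative threshold)."""
    overall = gc_fraction(seq)
    return [
        (start, gc)
        for start, gc in windowed_gc(seq, window, step)
        if gc - overall >= min_delta
    ]


def find_all(seq, motif):
    """Return every start position of motif in seq (overlaps allowed)."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


# Toy data only: an AT-rich background with one GC-rich insert.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
atypical = flag_atypical_windows(toy_genome, window=20, step=20, min_delta=0.05)

# The 12-bp direct repeat reported in the text, placed in a toy sequence.
repeat_sites = find_all("AATCATCTACTTTCGGTCATCTACTTTC", "TCATCTACTTTC")
```

On a real genome the window would be on the order of kilobases, and a flagged window flanked by IS elements and by hits from `find_all` would be a candidate for closer inspection rather than proof of lateral transfer.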
Since diverging from its GI tract relative, DPC 4571 has lost a range of genes that were highlighted by Altermann et al. (2) and Pridmore et al. (35) as encoding features likely to contribute to the ability of probiotic lactobacilli to colonize and interact with the intestinal mucosa and microbiota (Table 2). Fewer putative cell wall-anchoring proteins have been detected in the genome of L. helveticus DPC 4571 (see Table S8 in the supplemental material) than in that of L. acidophilus NCFM. Nine proteins with the motif LPXTG were found (22 have been described for L. acidophilus), and none of them has a predicted gram-positive anchoring domain. Regarding ORFs with the LPQTXE motif, only two of the four in NCFM were detected (with the motif LPQTGE). Both have low homology to much larger surface proteins, and one appears to be inactivated due to a frameshift. A further six surface proteins from L. acidophilus NCFM are lacking in L. helveticus DPC 4571 (see Table S7 in the supplemental material), including the LPXTG motif-containing proteins Lba1633, Lba1634, and SlpA (Lba169), which has been described as being involved in adherence (6a). However, another protein involved in adhesion, fibronectin binding protein FbpA (Lba1148), is highly conserved. The predicted mucus binding proteins (Lba1020, Lba1377, Lba1392, Lba1460, Lba1609, Lba1652, and Lba1709), some of which contain anchoring motifs, are all missing from DPC 4571 (see Table S7 in the supplemental material). In fact, no complete mucus binding proteins were detected in the L. helveticus genome. Another feature of the probiotic L. acidophilus NCFM strain is the ability to utilize a wide range of sugars as energy sources. NCFM has 20 phosphoenolpyruvate-dependent phosphotransferase (PEP-PTS) systems by which the sugars are transported into the cytoplasm and phosphorylated. DPC 4571 demonstrates the classic limited
L. helveticus fermentation profile (see Table S9 in the supplemental material), which is reflected in the fact that only nine individual PEP-PTS systems were identified in the genome. Moreover, the transporters for the complex dietary carbohydrates raffinose and fructooligosaccharides and most of the predicted glucosidase enzymes (10 of 13 genes) in the NCFM genome are missing or inactivated (see Table S7 in the supplemental material). Other deletions that were of interest were the three NCFM potential autonomous units. Neither NCFM nor DPC 4571 has complete prophage sequences (
47), but NCFM does encode three potential autonomous units, designated pauLAI, -II, and -III (2). The core seven ORFs of pauLAII and pauLAIII and adjacent R/M system genes are all missing from DPC 4571 (see Table S7 in the supplemental material). The core seven ORFs for pauLAI and surrounding ORFs, including the prophage maintenance killer system and antidote genes, are also missing from DPC 4571, although one type III R/M system is encoded at the DPC 4571 locus corresponding to pauLAI. Finally, the contributions of IS element-mediated events to gene interruption and deletion were investigated by examining the disruption of local gene synteny between the NCFM and DPC 4571 genomes by using ACT. Ten interrupted genes and deletions at 31 loci were predicted (see Tables S6 and S7 and text in the supplemental material). However, the IS-associated differences account for only a fraction of the differences observed between the DPC 4571 and NCFM genomes despite the abundance of IS elements in the
L. helveticus strain.
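The motif-based screen for candidate cell wall-anchored proteins mentioned above (LPXTG and LPQTXE, where X is any residue) can be sketched as a regular-expression scan over predicted protein sequences. This is an assumption-laden illustration, not the method used in the study: `MOTIFS`, `scan_proteins`, and the toy protein identifiers and sequences are hypothetical. A motif hit alone does not establish anchoring; as noted in the text, none of the nine LPXTG proteins in DPC 4571 has a predicted gram-positive anchoring domain.

```python
import re

# X in the motif names is treated as "any single residue letter".
MOTIFS = {
    "LPXTG": re.compile(r"LP[A-Z]TG"),
    "LPQTXE": re.compile(r"LPQT[A-Z]E"),
}


def scan_proteins(proteins):
    """Map each motif name to a list of (protein_id, first_hit_position).

    proteins: dict of protein_id -> amino acid sequence (one-letter code).
    Only the first occurrence per protein is reported.
    """
    hits = {name: [] for name in MOTIFS}
    for protein_id, seq in proteins.items():
        upper = seq.upper()
        for name, pattern in MOTIFS.items():
            match = pattern.search(upper)
            if match:
                hits[name].append((protein_id, match.start()))
    return hits


# Toy sequences only; the identifiers are placeholders, not real ORFs.
toy_proteins = {
    "toy1": "MKKLPETGAAA",   # carries LPETG
    "toy2": "MSSLPQTGEKK",   # carries LPQTGE, which also contains LPQTG
    "toy3": "MAAAAA",        # no motif
}
motif_hits = scan_proteins(toy_proteins)
```

Note that a sequence carrying LPQTGE is reported under both motif names, because LPQTG is itself an instance of LPXTG; a real screen would need to decide how to treat such overlaps.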