Phylogenomics reveals deep molluscan relationships
Published online 22 September 2011
Evolutionary relationships among the eight major lineages of Mollusca have remained unresolved despite their diversity and importance. Previous investigations of molluscan phylogeny, based primarily on nuclear ribosomal gene sequences1, 2, 3 or morphological data4, have been unsuccessful at elucidating these relationships. Recently, phylogenomic studies using dozens to hundreds of genes have greatly improved our understanding of deep animal relationships5. However, limited genomic resources spanning molluscan diversity have prevented use of a phylogenomic approach. Here we use transcriptome and genome data from all major lineages (except Monoplacophora) and recover a well-supported topology for Mollusca. Our results strongly support the Aculifera hypothesis placing Polyplacophora (chitons) in a clade with a monophyletic Aplacophora (worm-like molluscs). Additionally, within Conchifera, a sister-taxon relationship between Gastropoda and Bivalvia is supported. This grouping has received little consideration and contains most (>95%) molluscan species. Thus we propose the node-based name Pleistomollusca. In light of these results, we examined the evolution of morphological characters and found support for advanced cephalization and shells as possibly having multiple origins within Mollusca.
With over 100,000 described extant species in eight major lineages, Mollusca is the second most speciose animal phylum6. Many molluscs are economically important as food and producers of pearls and shells whereas others cause economic damage as pests, biofoulers and invasive species. Molluscs are also biomedically important as models for the study of brain organization, learning and memory as well as vectors of parasites. Although shelled molluscs have one of the best fossil records of any animal group, evolutionary relationships among major molluscan lineages have been elusive. Morphological disparity among the major lineages of Mollusca has prompted numerous conflicting phylogenetic hypotheses (Fig. 1). The vermiform Chaetodermomorpha (also known as Caudofoveata) and Neomeniomorpha (also known as Solenogastres) traditionally have been considered to represent the plesiomorphic state of Mollusca because of their ‘simple’ internal morphology and lack of shells7. Whether these two lineages constitute a monophyletic group, Aplacophora8, or a paraphyletic grade4, 9 has been widely debated. Some workers have considered the presence of sclerites a synapomorphy for a clade Aculifera, uniting Polyplacophora (chitons; which have both sclerites and shells) and Aplacophora. In contrast, Polyplacophora has alternatively been placed with Conchifera (Bivalvia, Cephalopoda, Gastropoda, Monoplacophora and Scaphopoda) in a clade called Testaria uniting the shelled molluscs4. Morphology has been interpreted to divide Conchifera into a gastropod/cephalopod clade (Cyrtosoma) and a bivalve/scaphopod clade (Diasoma)6. 
Unfortunately, because of varying interpretations of features as derived or plesiomorphic, a lack of clear synapomorphies, and often unclear character homology, the ability of morphology to resolve such deep phylogenetic events is limited.
Molecular investigations of molluscan phylogeny have relied primarily on nuclear ribosomal gene sequences (18S and 28S)1, 2, 3, 10, and have also offered little resolution. Maximum likelihood (ML) analyses of 18S, 28S or both1 recovered most major lineages monophyletic, but support at deeper nodes was generally weak. Subsequent analyses of a combined data set (18S, 28S, 16S, cytochrome c oxidase I and histone H3)2 yielded similar results, namely that bivalves were not monophyletic and support values at most deep nodes were low. Expanding on this study, further work supported a sister-taxon relationship between chitons and monoplacophorans (Serialia) but support at other deep nodes was generally low3. Moreover, Mollusca was not recovered monophyletic (a result significantly supported by Approximately Unbiased, AU, tests; Supplementary Table 1) possibly due to contaminated neomenioid sequences10. Morphological and traditional molecular phylogenetic approaches have failed to robustly reconstruct mollusc phylogeny. Notably, several recent phylogenomic studies (for example, refs 5 and 11) have significantly advanced our understanding of metazoan evolution by using sequences derived from genome and transcriptome data. With this approach, numerous orthologous protein-coding genes can be identified and employed in phylogeny reconstruction. Many of these genes are constitutively expressed and can be easily recovered from even limited expressed sequence tag (EST) surveys. Additionally, these genes are usually informative for inferring higher-level phylogeny because of their conserved nature due to their functional importance.
Here, we used such a phylogenomic approach to investigate evolutionary relationships among the major lineages of Mollusca. High-throughput transcriptome data were collected from 18 operational taxonomic units (OTUs; Supplementary Table 2), and augmented with publicly available ESTs and genomes (Supplementary Table 3). To increase data set completeness, data from closely related species were combined in eleven cases, resulting in a total of 42 mollusc OTUs. Every major lineage of Mollusca was represented in the data set by at least two distantly related species, except for monoplacophorans that live in deep marine habitats and could not be procured in adequate condition for transcriptome analyses. For sequence processing and orthology determination, a bioinformatic pipeline was developed that builds upon previous studies (see Methods and Supplementary Fig. 2). This pipeline identified 308 orthologous genes suitable for concatenation and phylogenetic analyses (Supplementary Table 4), totalling 84,614 amino acid positions. To determine the appropriate outgroup to Mollusca, preliminary analyses including a broad range of lophotrochozoans and the cnidarian Nematostella were conducted. Nematostella was included to verify that neomenioid data did not contain cnidarian contamination (see Methods). Maximum likelihood (ML) analyses using the best-fitting model for each gene strongly supported Annelida as the sister taxon of Mollusca (bootstrap support, bs = 100, Supplementary Fig. 3), whereas Bayesian inference (BI) placed Entoprocta + Cycliophora sister to Mollusca with poor support (posterior probability, pp = 0.62, Supplementary Fig. 4). Relationships among major lineages of Mollusca were consistent between analyses with multiple outgroups (Supplementary Figs 3–4) or with only Annelida as outgroup (Fig. 2 and Supplementary Fig. 5; additional information on outgroup selection in Supplementary Results). 
On the basis of these results, Annelida was selected as outgroup for all other analyses to reduce computational complexity and potential homoplasy from distant or fast-evolving outgroups. This final data matrix including all 308 genes (Fig. 3) had an average percentage of genes sampled per taxon of 41% and an overall matrix completeness of 25.6%, comparable to other major phylogenomic data sets (for example, ref. 11). ML and BI analyses of this matrix yielded nearly identical topologies within Mollusca, except for relationships among basal gastropods and placements of the sea slug Pleurobranchaea and the bivalve Mytilus (Fig. 2 and Supplementary Fig. 5). High leaf stability scores for all OTUs (Supplementary Table 3) and strong support for most nodes suggest all OTUs were represented by sufficient data to be reliably placed. Remarkably, branch lengths were relatively uniform; cephalopods did not show long branches as previously reported in analyses of 18S and 28S1, 2, 3, 10.
All major lineages of Mollusca were monophyletic with strong support (bs = 100%, pp = 1.00). Importantly, there was strong support at all deep nodes, although the node placing Scaphopoda received moderate support in ML (bs = 72%) but strong support in BI (pp = 0.98). A clade including Aplacophora and Polyplacophora was unequivocally supported (bs = 100%, pp = 1.00) and placed sister to Conchifera, consistent with the Aculifera hypothesis. Moreover, we found strong support (bs = 100%, pp = 0.99) for a sister relationship between Neomeniomorpha and Chaetodermomorpha, supporting the Aplacophora hypothesis but contrary to previous molecular1, 2, 3, 10 and morphological4 studies. To evaluate alternatives to the Aculifera and Aplacophora hypotheses, we used AU tests (Supplementary Table 5). These tests rejected the Testaria hypothesis, which allies chitons with the other shelled molluscs (P < 0.02) and placement of either aplacophoran taxon as sister to all other molluscs (both P < 0.01).
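The two completeness figures quoted above for the final matrix measure different things, and a small sketch may make the distinction concrete. The Python below assumes that an OTU counts as sampled for a gene whenever it has at least one residue in that gene's alignment, and that overall completeness is the share of cells in the concatenated matrix that hold a residue. Both definitions are assumptions made for illustration and are not taken from the Methods. The function names, OTU labels and sequences are hypothetical toy values.

```python
from typing import Dict, List

# Characters treated as missing data in an aligned protein sequence (assumption).
MISSING = set("-?X")

# gene name -> {OTU name -> aligned sequence}; an OTU missing from a gene is simply not a key.
Alignments = Dict[str, Dict[str, str]]


def genes_sampled_per_taxon(alignments: Alignments, taxa: List[str]) -> Dict[str, float]:
    """Percentage of genes in which each OTU has at least one residue."""
    n_genes = len(alignments)
    result = {}
    for taxon in taxa:
        present = sum(
            1
            for seqs in alignments.values()
            if any(c not in MISSING for c in seqs.get(taxon, ""))
        )
        result[taxon] = 100.0 * present / n_genes
    return result


def matrix_completeness(alignments: Alignments, taxa: List[str]) -> float:
    """Percentage of cells (OTU x alignment position) that hold a residue."""
    filled = 0
    total = 0
    for seqs in alignments.values():
        length = len(next(iter(seqs.values())))  # all sequences in a gene share one length
        total += length * len(taxa)
        for taxon in taxa:
            filled += sum(1 for c in seqs.get(taxon, "") if c not in MISSING)
    return 100.0 * filled / total


# Toy data, invented for illustration only.
TOY_TAXA = ["otu_A", "otu_B", "otu_C"]
TOY_ALIGNMENTS: Alignments = {
    "gene1": {"otu_A": "MKV-", "otu_B": "MKVL"},                 # otu_C not sampled
    "gene2": {"otu_A": "AG", "otu_B": "--", "otu_C": "AG"},      # otu_B is all gaps
}

if __name__ == "__main__":
    per_taxon = genes_sampled_per_taxon(TOY_ALIGNMENTS, TOY_TAXA)
    print(per_taxon)
    print(sum(per_taxon.values()) / len(per_taxon))
    print(matrix_completeness(TOY_ALIGNMENTS, TOY_TAXA))
```

Under these definitions, position-level completeness can be no higher than the average gene sampling, because a gene counted as sampled may still be only partly covered by a short EST contig. That ordering matches the two values reported for the 308-gene matrix.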
Aculiferan monophyly supports interpretation of the Palaeozoic taxon ‘Helminthochiton’ thraivensis as possessing features intermediate between chitons and aplacophorans12, and interpretation of dorsal, serially arranged calcareous structures as a possible aculiferan synapomorphy13. Specifically, the chaetoderm Chaetoderma14 and some, but not all, neomenioids15 possess dorsal, serially repeated sclerite-secreting regions during development. Notably, chiton valves are not thought to be homologous to aculiferan sclerites16, although certain genes involved in patterning these structures may be. Our results highlight a need for developmental gene expression studies of aculiferans to address this issue. Within a monophyletic Conchifera (bs = 100%, pp = 0.98), Gastropoda and Bivalvia were supported as derived sister taxa (bs = 100%, pp = 1.0). Traditionally, a sister relationship between gastropods and bivalves, which relates the two most speciose lineages of molluscs, has received little consideration. However, this relationship has been recovered in molecular studies with relatively limited taxon sampling across Mollusca5, 17. Similarities between the veliger larvae of gastropods and lamellibranch bivalves have been long recognized. Most notably, both possess larval retractor muscles and a velum muscle ring18. Another potential synapomorphy is loss of the anterior ciliary rootlet in locomotory cilia of gastropods and bivalves19. Because of strong support for a gastropod/bivalve clade in most analyses and the implications of this hypothesis for understanding molluscan evolution, we propose the node-based name Pleistomollusca, which includes the last common ancestor of Gastropoda and Bivalvia and all descendants (Fig. 4).
Etymology of this name (pleistos from Greek for ‘most’) recognizes the incredible species diversity of this clade of molluscs which we conservatively estimate to contain >95% of described mollusc species.
Sister to Pleistomollusca is Scaphopoda (albeit with moderate support in ML; bs = 72%, pp = 0.98) and Cephalopoda represents the sister taxon of all other conchiferan lineages sampled. Despite strong support values for a gastropod/bivalve clade, AU tests failed to reject Scaphopoda as sister to any other conchiferan lineage (P > 0.5). Given the limited sampling for Scaphopoda, additional data may help solidify its position. Nonetheless, all results presented here clearly refute the traditional view of a sister relationship between gastropods and cephalopods (Cyrtosoma; P < 0.01). Features thought to be diagnostic of this clade include a well-developed, free head with cerebrally innervated eyes and a nervous system with visceral loop inwards of the dorsoventral musculature6. However, these characters must be reinterpreted as either symplesiomorphies lost in scaphopods and bivalves, or convergences. Notably, the high degree of cephalization in gastropods and cephalopods has recently been suggested to have evolved independently20. The phylogenomic approach used here also holds promise for resolving relationships within major lineages. For example, although their phylogeny has been widely debated, our broadly sampled caenogastropod subtree was strongly supported throughout (bs = 100, pp = 1.0) and consistent with previous morphological analysis21. We also recovered opisthobranchs paraphyletic with respect to Pulmonata, agreeing with recent morphological and molecular studies22. Additionally, our analyses confirm bivalve monophyly with deposit-feeding protobranchs sister to filter-feeding lamellibranchs.
To assess robustness of the reconstructed topology further, we examined the influences of matrix completeness, gene inclusion and substitution models on phylogenetic reconstruction (Supplementary Table 6). Analyses of the 200 and 100 best-sampled genes (Supplementary Figs 6 and 7) recovered the same branching order and relative level of support among major lineages as the full data set. For gene inclusion, matrices of only non-ribosomal (Supplementary Fig. 8) and only ribosomal protein genes (Supplementary Fig. 9) were analysed to address issues of different gene classes (for example, ribosomal proteins) biasing phylogenetic signal5. Support values for deep nodes inferred from non-ribosomal protein genes were generally weak and Aplacophora, Polyplacophora and Bivalvia were not recovered monophyletic. In contrast, analysis of only ribosomal protein genes recovered all major lineages monophyletic with strong support in BI but moderate support for most deep nodes in ML (see also ref. 17). Although ribosomal protein and non-ribosomal protein genes seem to be contributing different amounts of phylogenetic signal, support for most nodes was greater when all gene classes were included, in accordance with previous phylogenomic studies5, 11. We also performed an analysis based on very conservative orthology determination using only the 243 genes for which our method and InParanoid identified the same Lottia sequence as orthologous to the primer taxon (Drosophila) sequence (see Methods). Branching order (Supplementary Fig. 10) was identical to the tree based on all 308 genes (Fig. 2). Our ML analyses differ from other phylogenomic studies by using gene-specific amino acid substitution models rather than a single model across the entire matrix. Thus, for comparative reasons, we also ran single-model ML analyses using the WAG + CAT + F model (Supplementary Fig. 11) and the LG + CAT + F model (Supplementary Fig. 12). 
These analyses yielded the same relationships as the ML analysis using the best-fitting model for each gene (Supplementary Fig. 5) with similar overall support in all three analyses. We also assessed the effect of model selection by performing a BI analysis using the CAT-GTR model on the data set of the 100 best-sampled genes (Supplementary Fig. 7); this model is too computationally intensive for the full 308 gene data set. Except for the placement of Pleurobranchaea, this analysis yielded the same branching order as the analysis using the CAT model (Fig. 2) with similar support values. Finally, even an approximately ML analysis (Supplementary Fig. 13), which is less computationally intensive, yielded the same relationships among major lineages as the fully parameterized ML analysis. A primary goal of resolving molluscan phylogeny is to improve our understanding of their early evolutionary history. Perhaps more than any other animal group, understanding of molluscan early evolution has been constrained by the notion of a generalized bauplan or ‘archetype’ which is still propagated by some invertebrate zoology textbooks. Arguably, such a viewpoint has hindered our ability to consider how individual characters have evolved within Mollusca. Using a modified version of a morphological character matrix4, we performed ancestral state reconstruction using maximum parsimony and a simplified topology based on our results (Fig. 4) to infer ancestral states for 60 characters across Mollusca (Supplementary Table 7). Even though monoplacophoran transcriptome data were unavailable herein, we were able to evaluate how placement of Monoplacophora influences our understanding of early molluscan evolution. Ancestral state reconstruction of most characters for the last common ancestor of Mollusca was unaffected by the placement of monoplacophorans. 
We considered three possibilities: (1) Monoplacophora basal within Conchifera, (2) sister to Polyplacophora, and (3) absent from the analysis. In all three cases, only 6 out of 60 characters were influenced (Table 1). For example, ancestral state reconstruction for shell(s) secreted by a shell gland and periostracum changed between absent (Monoplacophora basal conchiferan) and equivocal (Monoplacophora sister to Polyplacophora, or not considered).
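How the placement of a single lineage can change a parsimony reconstruction at the root can be illustrated with a minimal sketch of the bottom-up pass of Fitch parsimony for one unordered character on a rooted binary tree. This is one standard way to obtain most-parsimonious root states and is offered as an assumption about the general technique, not as the software or settings used for the analysis described above. The tree shapes follow the simplified topology in the text (Aplacophora + Polyplacophora sister to Conchifera, with Cephalopoda sister to Scaphopoda + Pleistomollusca). The 0/1 states are invented so that placement matters; they are not codings from the character matrix, and no outgroup is included.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_root_states(tree: Tree, states: Dict[str, int]) -> Set[int]:
    """Bottom-up Fitch pass: return the most-parsimonious state set at the root.

    A set with more than one state means the reconstruction is equivocal.
    """
    if isinstance(tree, str):
        return {states[tree]}
    left, right = tree
    a = fitch_root_states(left, states)
    b = fitch_root_states(right, states)
    shared = a & b
    # Intersection if the children agree on some state, otherwise union.
    return shared if shared else a | b


# Simplified topologies for the three treatments of Monoplacophora.
APLACOPHORA: Tree = ("Neomeniomorpha", "Chaetodermomorpha")
CONCHIFERA: Tree = ("Cephalopoda", ("Scaphopoda", ("Gastropoda", "Bivalvia")))

WITHOUT_MONO: Tree = ((APLACOPHORA, "Polyplacophora"), CONCHIFERA)
MONO_BASAL_CONCHIFERA: Tree = ((APLACOPHORA, "Polyplacophora"), ("Monoplacophora", CONCHIFERA))
MONO_SISTER_POLY: Tree = ((APLACOPHORA, ("Polyplacophora", "Monoplacophora")), CONCHIFERA)

# Hypothetical toy character (0 = absent, 1 = present); NOT real matrix codings.
TOY_STATES: Dict[str, int] = {
    "Neomeniomorpha": 0,
    "Chaetodermomorpha": 0,
    "Polyplacophora": 0,
    "Monoplacophora": 0,
    "Cephalopoda": 1,
    "Scaphopoda": 1,
    "Gastropoda": 1,
    "Bivalvia": 1,
}

if __name__ == "__main__":
    for label, tree in [
        ("Monoplacophora basal within Conchifera", MONO_BASAL_CONCHIFERA),
        ("Monoplacophora sister to Polyplacophora", MONO_SISTER_POLY),
        ("Monoplacophora absent", WITHOUT_MONO),
    ]:
        print(label, sorted(fitch_root_states(tree, TOY_STATES)))
```

With these toy states, the root resolves to state 0 only when Monoplacophora is basal within Conchifera and is equivocal in the other two cases. The sketch shows the mechanism by which placement can matter; it does not reproduce the reconstruction of any particular character in Table 1.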