Introduction
Caulerpa J.V.Lamouroux is currently recognized as the single genus in the green algal family Caulerpaceae Kützing (Bryopsidales) with 104 currently accepted species (Guiry and Guiry 2023) including two fossil taxa. Caulerpa species are common inhabitants of tropical and subtropical intertidal and subtidal zones. Like other bryopsidalean algae, Caulerpa thalli lack transverse cell walls, but the genus distinguishes itself by the presence of cell wall ingrowths traversing the cell lumen to provide structural support. Thalli typically consist of a horizontally growing stolon with downward growing colourless rhizophores and upward growing photosynthetic fronds termed assimilators (Zubia et al. 2020). Species are primarily distinguished on the basis of their assimilator morphology, which can be leaf-like or consist of a central axis (rachis) bearing lateral branchlets (ramuli) of various shapes. However, morphological plasticity, overlapping morphological species boundaries, and the occurrence of cryptic species have led several researchers to advocate the use of DNA barcodes for reliable species identification (Belton et al. 2014, Draisma et al. 2014, Sauvage et al. 2021). A DNA barcode is a short, specific segment of DNA that is used for identifying and classifying species. It is typically a standardized region of the genome that exhibits sufficient variability among different species while remaining conserved among individuals of the same species. DNA barcoding involves sequencing the barcode region of unknown samples and comparing the sequences against databases of pre-identified species, known as reference libraries (Phillips et al. 2022), such as the Barcode of Life Data Systems (BOLD, http://www.boldsystems.org) (Ratnasingham and Hebert 2007) and GenBank (https://www.ncbi.nlm.nih.gov/genbank/) (Benson et al. 2013), using the Basic Local Alignment Search Tool (BLAST, https://blast.ncbi.nlm.nih.gov/) (Johnson et al. 2008).
The BLAST top hit is considered conspecific with the queried sequence if it is 100% identical or if the difference does not exceed the DNA barcode gap. The DNA barcode gap is defined as the region that separates the distribution of infraspecific pairwise distances from that of interspecific distances among related taxa for a given DNA barcode. The chloroplast-encoded tufA gene was selected as the DNA barcode for marine green macroalgae (with the exception of the Cladophorales) (Saunders and Kucera 2010). Caulerpa DNA barcoding has been applied in biodiversity assessment studies (Barata 2008, Kazi et al. 2013, Belton et al. 2015, Fernández-García et al. 2016, Belton et al. 2019, Dumilag et al. 2019, Darmawan et al. 2021), in systematics studies (Belton et al. 2014, Draisma et al. 2014, Sauvage et al. 2021), in the identification of macroalgae in the aquarium trade (Stam et al. 2006, Vranken et al. 2018, Woodhouse and Zuccarello 2021), in the determination of kleptoplast origins in sacoglossan sea slugs (Waegele et al. 2011, Wade and Sherwood 2017), and in surveying endolithic phototrophs by metabarcoding (Sauvage et al. 2016). The complete tufA gene is 1221–1230 nucleotides (nt) long in Caulerpa, but the barcode region that is usually targeted for species identification spans only about two thirds of it, i.e., nt positions 289–1108 of the 1230 nt (811–820 nt). It is difficult to define species boundaries in Caulerpa based on tufA DNA sequence differences, because a fixed tufA DNA barcode gap does not exist (Sauvage et al. 2013). Kazi et al. (2013) observed an overlap between levels of intraspecific genetic distance and interspecific genetic distance in the genus Caulerpa. Different molecular species delineation methods indicated different numbers of Caulerpa species (Belton et al. 2014).
Sauvage et al. (2016) performed a species delimitation analysis on a dataset of 901 ulvophycean tufA sequences (891 nt in length), including 172 Caulerpaceae, using the Generalized Mixed Yule Coalescent method (GMYC) (Fujisawa and Barraclough 2013) and the Automatic Barcode Gap Discovery method (ABGD) (Puillandre et al. 2012). They noted a high incongruence between the two methods for the Caulerpaceae, i.e., 65 GMYC species and 139 ABGD species or Molecular Operational Taxonomic Units (MOTUs). It is paramount that researchers using DNA barcodes to assist in species identification have a sound understanding of the level of DNA sequence differentiation, especially when a taxonomic group is understudied and, therefore, inadequately represented in GenBank. Moreover, many Caulerpa accessions in GenBank are incorrectly identified and names are usually not updated by the submitters (Stam et al. 2006, Woodhouse and Zuccarello 2021). This complicates species identification by researchers who are not familiar with Caulerpa. For example, Zuldin et al. (2019) concluded that they found C. macrodisca based on tufA, but they were not aware that they also sampled C. megadisca sequences from GenBank (JN645149, JN645154, Belton et al. 2014). Their queried specimen had morphological features of both C. macrodisca (ramuli attached to assimilators) and C. megadisca (very large ramuli), but this clade remained internally unresolved in the phylogeny they presented, so it cannot be concluded with certainty that they found C. macrodisca. Moreover, the authors did not submit their newly generated tufA sequence to GenBank.
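The barcode gap concept introduced in this paragraph can be illustrated with a minimal Python sketch. It assumes uncorrected pairwise distances (p-distances) on pre-aligned sequences and treats gaps and ambiguous bases as missing data; the function names, specimen labels, and toy sequences are hypothetical and are not taken from any of the cited studies.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differing += x != y
    if compared == 0:
        raise ValueError("sequences share no comparable sites")
    return differing / compared


def barcode_gap(records: dict[str, tuple[str, str]]) -> float:
    """Smallest interspecific distance minus largest infraspecific distance.

    `records` maps a specimen label to (species name, aligned sequence).
    A positive value indicates a gap; zero or negative indicates overlap.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records.values(), 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need both infraspecific and interspecific pairs")
    return min(inter) - max(intra)


# Toy data (hypothetical): two specimens for each of two species.
toy = {
    "spec1": ("species A", "ACGTACGTAC"),
    "spec2": ("species A", "ACGTACGTAT"),
    "spec3": ("species B", "ACGTTCGAAT"),
    "spec4": ("species B", "ACGTTCGAAC"),
}
gap = barcode_gap(toy)
```

In this toy data set the largest infraspecific distance is 0.1 and the smallest interspecific distance is 0.2, so a gap exists. In Caulerpa tufA the two distributions overlap according to the studies cited above, in which case the returned value would be zero or negative.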
Belton et al. (2014) proposed tufA reference sequences for nine Caulerpa species of the so-called C. racemosa-peltata complex (Sauvage et al. 2013) (Suppl. Table S1). Belton et al. (2019) proposed tufA reference sequences for 29 Caulerpa species from southern Australia (Suppl. Table S1) and Sauvage et al. (2021) proposed a tufA reference sequence for their newly described species C. wysorii. For newly described species (i.e., C. coppejansii, C. megadisca, C. perplexa, C. wysorii), the reference sequence was obtained from the type specimen. Otherwise a sequence was chosen from a specimen collected near the type location. For C. fergusonii, no sequence was available from tropical Sri Lanka (type location) and instead a sequence was selected from temperate Australia with the remark that it should be replaced once a sequence from Sri Lanka becomes available (Belton et al. 2019). These studies also recognized ‘dark taxa’, which are genetically distinct specimens that could not be linked to existing species or have not yet been formally described as new species (Page 2016). Belton et al. (2014) reported Caulerpa sp. 3 (JQ894932) and Caulerpa sp. 10 (JN645159), both having globose ramuli like in C. racemosa. Both unidentified species were only represented by a single specimen and, therefore, the authors refrained from attributing a name to them as they did for nine other clades in the C. racemosa-peltata complex. Sauvage et al. (2021) described a dark taxon from Famà et al. (2002) (Caulerpa sp., AJ417962) as new species (i.e., C. wysorii) after more specimens were found and sequenced. They also discovered a new dark taxon ‘C. nuda’ (MT441788) in their study. Belton et al. (2019) noticed that the tufA sequence under the name C. flexilis (AJ417970) in Famà et al. (2002) differed by ten nt substitutions from the tufA sequences representing C. flexilis in their study. It formed a separate lineage and they presumed that Famà et al. (2002) misidentified their specimen. 
Draisma et al. (2014) found that their specimen identified as C. brownii var. selaginoides J.Agardh from New Zealand (FM956037, FR668294) differed by 2.7% in tufA and 1.7% in rbcL from their C. brownii (C.Agardh) Endlicher from Australia (FR848341, FR848359), suggesting that they were dealing with two distinct species. The same study also included two C. longifolia specimens that differed by 2.0% in tufA (FR848338, FM956040). One of the two (FM956040) is now considered to belong to a distinct species, C. crispata (Belton et al. 2019), but the name has not been updated in GenBank. Draisma et al. (2014) also recognized multiple cryptic species in the morphospecies C. verticillata and C. ambigua based on DNA sequence differentiation. They retained the morphospecies names, but added a number to each clade as they did not know to which molecular species clade the respective types could belong.
The molecular identification of specimens to any level of biological organization through DNA barcodes necessarily depends a priori on known taxon designations brought about through the current state of taxonomic practices (DeSalle 2006). The aim of the present study was to compile a tufA reference sequence database for the genus Caulerpa including dark taxa and cryptic species complexes. Therefore, we reviewed the published Caulerpa tufA sequences in the context of currently accepted taxonomy in AlgaeBase (Guiry and Guiry 2023). One sequence for each species clade was selected to serve as reference sequence and a quality index (QI) score was assigned to each based on geographic proximity to the type location. As a measure for proximity we adopted the hierarchical Marine Ecoregions of the World (MEOW) bioregionalization system proposed by Spalding et al. (2007). The MEOW system is primarily based on the distribution of marine vertebrates and invertebrates but is commonly used to describe seaweed distribution patterns (e.g., Leliaert et al. 2018, Gabriel et al. 2020, Yip et al. 2020, Lagourgue et al. 2022). MEOW classifies coastal and shelf areas into 12 realms, 62 provinces, and 232 ecoregions. Caulerpa can be found in all realms except the Arctic realm and Southern Ocean realm. Detailed boundaries of each ecoregion can be found at https://databasin.org/ (accessed on 23 June 2023, MEOW dataset uploaded by The Nature Conservancy GIS Staff).
We also reviewed the available tufA sequences for the sister-clade of the Caulerpaceae, which can serve as outgroup in phylogenetic analyses. Draisma et al. (2014) demonstrated that simple sterile thalli identified as Pseudochlorodesmis F.Børgesen formed a sister-clade of the Caulerpaceae. However, Pseudochlorodesmis is polyphyletic and Verbruggen et al. (2009) proposed to use Pseudochlorodesmis as a form genus and to subsume Siphonogramen I.A.Abbott & Huisman and Botryodesmis Kraft therein. AlgaeBase adopted the taxonomic treatment of Cremen et al. (2019) and classified Pseudochlorodesmis in the family Halimedaceae Link. Caulerpaceae and Halimedaceae are sister-families in Cremen et al. (2019), but their study did not include the Pseudochlorodesmis clade that is sister to the Caulerpaceae in Draisma et al. (2014).
Materials and Methods
The list of the 102 currently accepted extant Caulerpa species, including information about type locations, was downloaded from AlgaeBase (accessed on 11 September 2023) and is shown in Supplementary Table S1 with additional notes on the taxonomy of these species.
All Caulerpa and Pseudochlorodesmis tufA sequences were downloaded from GenBank (accessed on 11 June 2023), including some submitted under different genus names (Caulerpella Prud’homme & Lokhorst, Siphonogramen), and aligned using the BioEdit Sequence Alignment Editor v.7.0.5.3 (Hall 1999). We added the tufA alignment of Draisma et al. (2014), which included Caulerpa sequences not submitted to GenBank and sequences of the Halimedineae Hillis-Colinvaux ex Verbruggen & Guiry to serve as outgroup in a pilot analysis. We selected two sequences from Barata (2008), which are not in GenBank. These sequences represent C. kempfii (a species not represented in GenBank) and C. lanuginosa, which differed by 22 nt out of 820 (2.7%) from a C. lanuginosa sequence in GenBank (DQ652496) and must, therefore, be considered to be a distinct species. New tufA sequences were generated for four species, i.e., C. bikinensis, C. fergusonii, C. filicoides, and C. matsueana (collection information in Suppl. Table S1 and GenBank). Caulerpa bikinensis and C. matsueana were sequenced for the first time. The new C. fergusonii and C. filicoides sequences were from specimens collected closer to their respective type locations than the ones already in GenBank. The new tufA sequences were generated following Draisma et al. (2014) using the primer combination tufAF/tufAR1 for C. bikinensis and C. filicoides, tu157F/tu818R (Sauvage et al. 2014) for C. fergusonii, and tu157F/tufAR1 for C. matsueana. A Maximum Likelihood (ML) guide tree (not shown) was generated on the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at/) (Nguyen et al. 2015, Trifinopoulos et al. 2016, Minh et al. 2020) with partitioning of the data by codon position (Chernomor et al. 2016) and otherwise default settings, including the implemented ModelFinder to choose the best-fitting substitution model (Kalyaanamoorthy et al. 2017), the ultrafast bootstrap approximation feature (UFBoot) (Hoang et al. 2018), and the Shimodaira-Hasegawa-like approximate likelihood ratio test (SH-aLRT) (Guindon et al. 2010).
Gaps (–) were treated as missing data. UFBoot support values ≥ 95% correspond roughly to a probability of 95% that a clade is true. For SH-aLRT, values ≥ 80% are considered 95% reliable. The tree file was opened in FigTree 1.4.4 (Rambaut 2018) and rooted with the Dichotomosiphonaceae G.M.Smith, which is sister to all other Halimedineae (Draisma et al. 2014). Thus the taxa belonging to the sister-clade of the Caulerpaceae (i.e., Pseudochlorodesmis pro parte) could be identified. A new guide tree (not shown) was generated with only the sequences from the Caulerpaceae clade and its Pseudochlorodesmis sister-clade as outgroup. All subclades within the Caulerpaceae were evaluated as to which species clade they represented, cross-referencing with the original publications and checking available voucher material. One sequence was selected for each species clade that had no reference sequence assigned to it yet. The 35 reference sequences proposed in Belton et al. (2014, 2019) that were not obtained from type specimens were re-evaluated. If multiple species clades were candidates for one species name, then these were numbered (e.g., C. ambigua-1, C. ambigua-2, etc.) and one sequence of each was selected as reference sequence. Ideally, the reference sequence was from the type specimen; otherwise we aimed to select a reference sequence obtained from a morphologically similar specimen that was collected in close proximity to the type location. Other selection criteria were also taken into account, i.e., sequence length (at least 500 nt), overlap with the 820 nt barcode region, and the availability of a voucher. A quality index (QI) score was assigned to each tufA reference sequence as follows: T) sequence obtained from the type specimen, 1) from a specimen from the type location, 2) from the same ecoregion (sensu Spalding et al. 2007) as the type location, 3) from the same marine province as the type, 4) from the same marine realm, and 5) from another realm or unknown origin.
Reference sequences of dark taxa were not given a QI score. Exact type locations are not available for many species. For example, the type location of C. selago is reported to be the Red Sea (Silva et al. 1996). The best candidate for a reference sequence is from a specimen from the Red Sea. It is given a QI range score of 1–3, because it could be from the type location, but it is certainly from the same marine province (the Red Sea is subdivided into two ecoregions).
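The QI scoring rules described above can be expressed as a short Python sketch. This is only one possible reading of the rules, not the procedure actually used to fill Supplementary Table S1: the `Locality` class, the `qi_score` function, and the locality strings in the example are hypothetical, and the handling of imprecise type locations (returning a range such as 1–3) is an assumption modelled on the C. selago example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locality:
    """Collection site with its MEOW ecoregion, province, and realm.

    A field left as None means that level is unknown or unspecified.
    """
    site: Optional[str] = None
    ecoregion: Optional[str] = None
    province: Optional[str] = None
    realm: Optional[str] = None


# Hierarchy levels from finest to coarsest, with their QI scores.
LEVELS = (("site", 1), ("ecoregion", 2), ("province", 3), ("realm", 4))


def qi_score(specimen: Locality, type_loc: Locality,
             from_type_specimen: bool = False) -> str:
    """Return 'T', a score '1' to '5', or a range such as '1–3'."""
    if from_type_specimen:
        return "T"
    best = None  # finest level that cannot be ruled out (type locality imprecise)
    for level, score in LEVELS:
        s, t = getattr(specimen, level), getattr(type_loc, level)
        if s is None:
            continue  # specimen origin unknown at this level
        if t is None:
            if best is None:
                best = score  # a match at this level remains possible
            continue
        if s == t:
            return f"{best}–{score}" if best is not None else str(score)
        best = None  # confirmed mismatch rules out any finer match
    return "5"


# Example modelled on C. selago: type location known only as "the Red Sea".
# The locality strings are illustrative labels, not verified MEOW entries.
selago_type = Locality(province="Red Sea and Gulf of Aden",
                       realm="Western Indo-Pacific")
selago_specimen = Locality(site="hypothetical site",
                           ecoregion="Northern and Central Red Sea",
                           province="Red Sea and Gulf of Aden",
                           realm="Western Indo-Pacific")
selago_qi = qi_score(selago_specimen, selago_type)
```

Under these assumptions the example yields the range score 1–3, because the specimen is confirmed to share the marine province of the type while a match at the site or ecoregion level cannot be excluded.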
The 1230 nt Caulerpa-Pseudochlorodesmis tufA reference sequence alignment was analysed in IQ-TREE as described above (ML inference) and the tree was rooted with the Pseudochlorodesmis clade. ModelFinder selected the following models for codon positions 1, 2, and 3, respectively: GTR+F+G4, HKY+F+G4, and GTR+F+I+G4. The selected model for the second codon position was F81+F+G4 when only the 820 nt barcode region was analysed.
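For readers who wish to repeat the codon-partitioned analysis with a local IQ-TREE installation rather than the web server, the partitioning by codon position could be described in a NEXUS sets block. The Python sketch below writes such a block. It assumes that the reading frame starts at alignment position 1, which has not been verified against the alignment file; the function name and the charset names are hypothetical.

```python
def codon_partition_nexus(alignment_length: int, first_codon_start: int = 1) -> str:
    """Build a NEXUS sets block with one charset per codon position.

    Assumes an ungapped reading frame beginning at `first_codon_start`
    (1-based alignment coordinate).
    """
    if alignment_length < 3 or first_codon_start < 1:
        raise ValueError("alignment too short or invalid start position")
    lines = ["#nexus", "begin sets;"]
    for pos in range(3):
        start = first_codon_start + pos
        # "start-end\3" selects every third site from `start` onwards
        lines.append(f"    charset codon{pos + 1} = {start}-{alignment_length}\\3;")
    lines.append("end;")
    return "\n".join(lines) + "\n"


partition_block = codon_partition_nexus(1230)
```

The resulting text could be saved to a file and supplied to IQ-TREE as a partition file, so that ModelFinder selects a separate substitution model for each codon position, as reported above.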
Results
Eighteen tufA sequences were assigned to Caulerpa’s sister clade Pseudochlorodesmis, including ten derived from kleptoplasts. They had been submitted to GenBank as Pseudochlorodesmis sp., Siphonogramen abbreviatum (W.J.Gilbert) I.A.Abbott & Huisman, Bryopsidales sp., or Uncultured Ulvophyceae. These eighteen sequences were thought to represent nine distinct species-level taxa (MOTUs) based on sequence differentiation, and nine sequences, one per taxon, were selected as outgroup and labeled Pseudochlorodesmis-1 to -9 (Suppl. Table S1, Fig. 1).
Eighteen tufA sequences were assigned to the Caulerpa subgenus Caulerpella, including two derived from kleptoplasts. They had been submitted to GenBank as Caulerpa sp. or as Caulerpella ambigua (Okamura) Prud'homme & Lokhorst. These eighteen sequences were thought to represent nine distinct species-level taxa based on sequence differentiation and nine were selected for the reference sequence alignment and labeled C. ambigua-1 to -9 (Suppl. Table S1, Fig. 1). Four species were found in Okinawa, the nearest location to the type location of C. ambigua (i.e., Bonin Is). Two species were found in the western Atlantic, where the type location of the only other currently accepted species in the subgenus and section Caulerpella, i.e., C. vickersiae, is located (Suppl. Table S1).
In addition to the two C. verticillata sister-species clades recognized by Draisma et al. (2014), here as C. verticillata-1 and -2, a third species was recognized, i.e., C. verticillata-3 (Suppl. Table S1, Fig. 1). Caulerpa verticillata-1 was found in the West Indies, the type location of C. verticillata, but also in the Indo-Pacific. Caulerpa verticillata-2 (from Indonesia) and -3 (from Brazil, voucher not seen by present authors) are each only represented by a single tufA sequence in GenBank. Caulerpa lanuginosa is also represented by two species (as mentioned in the Materials & Methods section), here labeled C. lanuginosa-1 (from the Florida Keys, type location) and C. lanuginosa-2 (from Brazil, figs 25‒26 in Barata 2008).
The 39 tufA reference sequences proposed in Belton et al. (2014, 2019) and Sauvage et al. (2021) were retained, except those for C. fergusonii (JN851136) and C. cupressoides (AJ417929), which were replaced with sequences from specimens closer to their respective type locations (Suppl. Table S1). Sixty-five of the selected reference sequences had a QI score ≤ 4, meaning that they were from the same marine realm as the type specimen (Suppl. Table S1). Forty-five had a QI score ≤ 3 (same marine province), 29 had a QI ≤ 2 (same ecoregion). After inspection of GenBank accession AJ417970 (as C. flexilis), it was concluded that it should not be attributed to a dark taxon as suggested in Belton et al. (2019) for reasons outlined in the Discussion. When the two C. brownii tufA sequences from Draisma et al. (2014) were compared to those from Belton et al. (2019) (KF649856-58) from Australia and Lang et al. (2017) from New Zealand (MG721694), it was concluded that Draisma et al. (2014)’s C. brownii var. selaginoides from New Zealand (FM956037) belonged to the C. brownii species clade and that their C. brownii from Australia (FR848341) must be considered as dark taxon. It was added to the alignment as Caulerpa sp.1 ecad brownii. The other dark taxa mentioned in the Introduction (Belton et al. 2014, Sauvage et al. 2021) were retained and labeled Caulerpa sp.2 ecad racemosa, Caulerpa sp.3 ecad racemosa, and Caulerpa sp.4 ecad nuda (Suppl. Table S1).
The four newly generated Caulerpa tufA sequences, as well as three sequences from the study by Draisma et al. (2014) that were selected as reference sequence, but were not yet in GenBank, were submitted to GenBank and have the accession numbers OR887543-49 (Suppl. Table S1). In addition, the sequence of C. verticillata-2 (FM956071, 759 nt) was updated (FM956071.2, 843 nt). GenBank accession JN185577 (Pseudochlorodesmis-2) was not previously reported in an article. The final alignment (Suppl. File 2) included 98 taxa (including nine outgroup taxa, Pseudochlorodesmis spp.) and is 1230 nt positions long, but includes only three complete tufA sequences. Four selected sequences were shorter than 600 nt, 86 sequences covered > 90% of the 820 nt barcode region (Suppl. Table S1). A three-codon gap needed to be introduced to the C. ambigua sequences to restore alignment, except in C. ambigua-4, which is sister to all other species in the subgenus Caulerpella (Fig. 1). A two-codon gap was introduced in the sequences of C. papillosa and C. vesiculifera (both subgenus Caulerpa section Sedoideae) in the same alignment position as the gap in C. ambigua. The C. scalpelliformis (subgenus Caulerpa section Caulerpa) sequence had a single codon gap six codons upstream. The 1230 nt alignment contained 375 parsimony-informative sites (86, 35, and 254 at, respectively, codon positions 1, 2, and 3) and 748 constant sites. The 820 nt barcode region contained 339 parsimony-informative and 431 constant sites. The ML tree (Log-likelihood score -11027.0375) based on the 1230 nt tufA alignment with 98 taxa is shown in Figure 1. The ML tree based on the 820 nt barcode region (Log-likelihood -9820.6706) had the same topology (not shown).
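The site counts reported above were taken from the IQ-TREE output. As a rough illustration of how such counts arise, the following Python sketch classifies alignment columns as constant, parsimony-informative, or singleton, treating gaps and ambiguous bases as missing data. It is a simplified re-implementation under stated assumptions (for instance, columns without any unambiguous base are counted as constant), so its results may differ slightly from those of IQ-TREE; the function name and the toy alignment are hypothetical.

```python
from collections import Counter


def classify_sites(alignment: list[str]) -> dict[str, int]:
    """Count constant, parsimony-informative, and singleton columns."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    counts = {"constant": 0, "parsimony_informative": 0, "singleton": 0}
    for column in zip(*alignment):
        # Tally unambiguous bases only; gaps and other symbols are missing data
        states = Counter(c for c in (x.upper() for x in column) if c in "ACGT")
        if len(states) <= 1:
            counts["constant"] += 1
        elif sum(1 for n in states.values() if n >= 2) >= 2:
            # at least two states, each present in at least two sequences
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


def classify_by_codon_position(alignment: list[str]) -> list[dict[str, int]]:
    """Repeat the classification separately for codon positions 1, 2, and 3."""
    return [classify_sites([seq[pos::3] for seq in alignment]) for pos in range(3)]


# Toy alignment (hypothetical): four sequences, two codons.
toy_alignment = ["ATGAAA", "ATGACA", "ATCACG", "ATCAA-"]
toy_counts = classify_sites(toy_alignment)
toy_by_position = classify_by_codon_position(toy_alignment)
```

The per-position counts reported above sum as expected (86 + 35 + 254 = 375). If all remaining variable sites are singletons, the 1230 nt alignment would contain 1230 − 375 − 748 = 107 of them and the 820 nt barcode region 820 − 339 − 431 = 50.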
Discussion
The tree in Figure 1 is the most comprehensive Caulerpa phylogeny published to date, representing 73 of the 102 currently accepted extant Caulerpa species (Guiry and Guiry 2023). Half of the species for which no DNA barcode is proposed, are likely synonyms (Suppl. Table S1). All proposed reference sequences were obtained from specimens collected in the wild, except for C. sp.3 ecad racemosa, which was collected at a local market in Okinawa. The tree includes four dark Caulerpa taxa and multiple representatives for a monophyletic C. ambigua clade (9), a monophyletic C. lanuginosa (2), and a polyphyletic C. verticillata (3) (Fig. 1), adding up to a total of 89 Caulerpa tufA reference sequences (Suppl. Table S1). A DNA sequence for C. matsueana is provided here for the first time. The C. matsueana sequence reported in Draisma et al. (2014) (not in GenBank) is now considered C. opposita. We also submitted a DNA sequence under the name C. bikinensis for the first time. However, GenBank accession FN667649 (as C. manorensis) is considered conspecific based on a single nucleotide substitution in 851 nt overlap and re-examination of the voucher. Caulerpa manorensis is described as having flattened, mostly oppositely arranged, ramuli (Nizamuddin 1964), whereas ramuli in C. bikinensis can be flat, obovoid, or ellipsoidal, and mostly alternately arranged (Tsuda 2021). FN667649 has flattened, mostly alternating ramuli. C. bikinensis and C. manorensis are not closely related (Fig. 1). The here proposed tufA reference sequence of C. manorensis (KY819068), the only one available, differs only by one and two nt substitutions from, respectively, that of C. macra (KF256089) and C. veravalensis (KC153501). The close relationship between these three species (Fig. 1) is demonstrated for the first time. 
Caulerpa macra has three-dimensional ramuli that may come in a variety of forms (oviform, pyriform, claviform to slightly bulbous), but never flat, either distichously or radially arranged (Belton et al. 2014). Only C. macra is represented by multiple sequences in GenBank (all under the name C. racemosa or C. racemosa var. macra). The infraspecific variation among tufA sequences assigned to C. macra by Belton et al. (2014) exceeds that of the difference between C. manorensis, C. veravalensis, and C. macra, casting doubt about their independent species status. One of the two species delimitation methods applied by Belton et al. (2014) suggested that the C. macra clade in their study represented two species. A close relationship between C. veravalensis and C. manorensis may have been expected, both having flattened assimilators, a feature they share with C. faridii and C. qureshii for which no DNA sequence data exist. Caulerpa qureshii was considered a synonym of C. veravalensis by Draisma et al. (2014, suppl. table s2). All these four species with flattened ramuli were described from the Western Indian ecoregion, but only C. veravalensis has been sequenced from this region and only C. manorensis has been reported from outside this region. The C. macra-manorensis-veravalensis complex is either a morphologically very plastic species or the tufA gene has insufficient discriminative power in this clade. Extreme morphological varieties have been attributed to a single species before based on identical tufA DNA sequences (e.g., C. wysorii, Sauvage et al. 2021). Fernández-García et al. (2016) found the East Pacific endemic C. vanbosseae Setchell & N.L.Gardner to be polyphyletic inside an East Pacific subclade of C. chemnitzia and concluded that the former was a reduced growth form of the latter. We refrain from merging the C. macra-manorensis-veravalensis complex into a single species. 
We recommend that its genetic variation should be further explored with more specimens and other DNA markers, e.g., the chloroplast-encoded rpoA gene, which was introduced by Sauvage et al. (2021) as DNA marker to distinguish between Caulerpa species, supplementing the tufA barcode.
Caulerpa cupressoides and C. serrulata are strongly supported sister-species in the tufA phylogeny (Fig. 1) and their proposed reference sequences only differ by four nt substitutions in the 820 nt barcode region. The 1230 nt tufA sequences from two complete chloroplast sequences of both species (MG797569 and MK792749) differ by six nt substitutions. In contrast, however, phylogenies based on the rbcL gene, which is generally slightly less variable than tufA in the genus Caulerpa, showed them not to be closely related (de Senerpont Domis et al. 2003, Kazi et al. 2013, Belton et al. 2019, Gao et al. 2020). Morphology and other tested DNA markers (chloroplast-encoded ycf10-chlB and rpoA, nuclear ITS and 18S rDNA) support the close relationship (de Senerpont Domis et al. 2003, Kazi et al. 2013, Draisma unpublished data). We replaced Belton et al.’s (2019) proposed reference sequence for C. cupressoides (AJ417929 from Famà et al. 2002, no voucher) with one of equal length from Stam et al. (2006) and with an equal QI score (= 2), but that differed by 1 nt substitution at position 1075, which is near the 3’-end of the sequenced fragment. This substitution at a first codon position and resulting in an amino acid substitution, was not found in any other Caulerpa tufA sequence, except in three other sequences published in Famà et al. (2002). It could very well be a base call mistake, considering that the substitution is at a position near the end of the forward sequence reaction where the chromatogram becomes increasingly difficult to interpret and where the reverse reaction was probably not yet legible. Moreover, Famà et al. (2002) were the first to sequence tufA in Caulerpa and thus had no reference sequences available other than the ones generated by themselves.
Accessions FM956016 and FM956019 (unpublished) are both under the name C. lessonii in GenBank, which is morphologically similar to C. cupressoides. The voucher of FM956019 is a mix of C. cupressoides and C. lessonii morphologies and DQ652345 (as C. cupressoides) was re-identified by W.F. Prud’homme van Reine (personal communication to S.G.A. Draisma) as C. lessonii. These three sequences were part of the C. cupressoides-serrulata clade in the guide tree, suggesting C. lessonii may not be a distinct species, and we do not propose a reference sequence for it. There are several GenBank accessions under the name C. microphysa, but none from the Central Indo-Pacific marine realm where its type location is located. They are all part of the C. lentillifera clade. There is a GenBank accession under the name C. lentillifera (FM956024) from the same marine province as the type location of C. microphysa. Draisma et al. (2014) already suggested that these two morphologically similar entities are conspecific. No reference sequence is proposed for C. microphysa.
As stated in the Results section, GenBank accession AJ417970 (as C. flexilis) was excluded from the list of dark taxa. Inspection of the tufA alignment revealed that the ten nt substitutions occur within the last thirty nt of the sequenced fragment (3’-end) and that this part of the sequence is similar to that in the subgenus Caulerpa. Caulerpa flexilis was the single representative of the subgenus Araucarioideae in the study by Famà et al. (2002), who were the first to sequence tufA in the genus Caulerpa. Therefore, we conclude that the ten nt difference is likely due to an incorrect interpretation of the chromatogram near the 3’-end and that the specimen is indeed C. flexilis. The chromatogram and the voucher specimen are not available for re-examination.
We did not strictly pursue an optimal QI score. For C. ashmeadii a tufA sequence with a better QI score was available, i.e., KF977086 (921 nt), which was collected 100 km from the type location (QI score = 2). The tufA sequence covers 100% of the barcode region and is identical to that of the selected reference sequence, which was collected 400 km from the type location in a neighbouring ecoregion, but in a different realm (QI score = 5), only 25 km from the realm boundary. Given the close proximity of the two alternatives, our preference was for the one with a complete tufA sequence. The complete chloroplast and mitochondrial genomes of this specimen have also been sequenced (Sauvage et al. 2019). For C. brachypus, KF649909 is from a location closer to the type location (QI score = 2) than the one from the reference sequence proposed in the present study (QI score = 5), but its length is only 441 nt (421 nt in the barcode region, i.e., 51%). Where the sequence overlaps, it is identical to that of C. subserrata (AJ417935, no voucher), which may lead to the conclusion that these two entities are conspecific. The proposed reference sequence for C. brachypus (KF314158) differs by one nt substitution from KF649909 and two more from AJ417935 outside the barcode region. AJ417935 was treated as C. brachypus in Sauvage et al. (2013). Several Caulerpa species are only represented by a single tufA GenBank accession, i.e., C. bartoniae, C. bikinensis (new), C. biserrulata, C. delicatula (as C. lanuginosa var. delicatula), C. heterophylla, C. manorensis, C. matsueana (new), C. minuta (as C. parvifolia), C. selago, C. subserrata, C. vesiculifera, as well as the aforementioned four dark taxa and the cryptic taxa C. ambigua-1, -5, -6, and -9 and C. verticillata-2 and -3. The tufA sequences of C. kempfii and C. lanuginosa-2 were not submitted to GenBank, but Barata (2008) reported the latter taxon twice in her thesis.
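The coverage percentages quoted in this paragraph follow from simple interval arithmetic on alignment coordinates. The Python sketch below shows the calculation for the barcode region (nt positions 289–1108 of the 1230 nt alignment, as stated in the Introduction). The function name is hypothetical, and the start and end coordinates in the example were chosen only to reproduce the reported 441 nt length and 421 nt overlap; they are not the actual coordinates of KF649909.

```python
BARCODE_START = 289   # first nt position of the barcode region (1-based, inclusive)
BARCODE_END = 1108    # last nt position of the barcode region (1-based, inclusive)
BARCODE_LENGTH = BARCODE_END - BARCODE_START + 1  # 820 nt


def barcode_coverage(start: int, end: int) -> tuple[int, float]:
    """Return (overlap in nt, percent of the barcode region covered).

    `start` and `end` are 1-based inclusive alignment coordinates of a
    contiguous sequence fragment.
    """
    if start < 1 or end < start:
        raise ValueError("invalid coordinates")
    overlap = max(0, min(end, BARCODE_END) - max(start, BARCODE_START) + 1)
    return overlap, 100 * overlap / BARCODE_LENGTH


# Hypothetical coordinates giving a 441 nt fragment with 421 nt in the barcode region.
fragment_overlap, fragment_percent = barcode_coverage(269, 709)
```

With these assumed coordinates the fragment covers 421 of 820 nt, i.e., about 51% of the barcode region, whereas a complete tufA sequence spanning positions 1–1230 covers 100%.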
Only seven Pseudochlorodesmis species (plus one variety) have been described of which five are flagged as currently taxonomically accepted in AlgaeBase and two were transferred to the genus Siphonogramen in which they are the only two species. As explained in the Introduction, the genus Pseudochlorodesmis is considered a polyphyletic form genus, but only those tufA sequences that formed a sister-clade to the Caulerpaceae, were selected for the present study. The type location of the type species of the genus, i.e., P. furcellata (Zanardini) Børgesen, was not specified, but is likely in the Adriatic Sea (Mediterranean Sea) (Silva et al. 1996). None of the nine Pseudochlorodesmis species in the outgroup of the present study are from the Mediterranean Sea (Suppl. Table S1). Other Pseudochlorodesmis lineages have representatives from the Mediterranean (Verbruggen et al. 2009, Sauvage et al. 2016, Cremen et al. 2019).
The tufA reference sequence alignment from the present study and Supplementary Table S1 will also be available from https://doi.org/10.5281/zenodo.10209540. Updated versions will be published when new reference sequences become available, and existing reference sequences will be replaced when longer tufA sequences or sequences with a better QI score become available. We recommend that future Caulerpa diversity assessment studies include this reference sequence alignment in the analysis. Since species boundaries remain contested in the genus Caulerpa, morphological observation remains important, especially as long as not all accepted species are included in the tufA alignment. A Caulerpa species without its tufA sequenced may be classified into a Caulerpa section based on morphology with some confidence, but its phylogenetic position within a section cannot reliably be predicted without a DNA sequence, as previous studies showed (Belton et al. 2014, 2019, Draisma et al. 2014). Chromatograms need to be checked carefully for nt substitutions in closely related sister-species that are also morphologically similar, especially when a species is represented by only a single tufA sequence in GenBank or found outside its confirmed distributional range.
Phylogenetic relationships between Caulerpa subgenera remain largely unresolved. The tree in Figure 1 may serve as a guide tree to select taxa for multiple gene sequencing to infer more robust phylogenetic relationships between subclades, which subsequently can be used to study the historical biogeography of the genus for which species distributions should be confirmed with tufA barcodes. Phylogeographic structure has been demonstrated in some species with tufA, e.g., C. prolifera (Varela-Álvarez et al. 2015), C. chemnitzia (Fernández-García et al. 2016), and C. macrodisca (Pattarach et al. 2019). However, it must be carefully evaluated whether a species was recently introduced to a region. When species checklists are updated, the nomenclature of the old list is updated and new species records are added, but previous records are seldom removed. The South China Sea coasts of Vietnam and the Philippines are of similar length and lie in the same climate zone. There are many islands in the sea between them that may serve as stepping stones and the sea surface current changes direction during the year (Wyrtki 1961), suggesting there are no major dispersal barriers. However, Sørensen’s similarity index Cs (Magurran 1988) of the South China Sea seaweed floras of these two countries is low, Cs = 0.3649 (Phang et al. 2016). This low similarity is likely an artefact resulting from taxonomic inconsistencies. Seaweed checklists can be more objectively compared using DNA barcodes. The Caulerpa tufA reference sequence database presented in the present study will help researchers to use the same name for a taxon.
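The effect of inconsistent naming on checklist comparisons can be illustrated with a minimal Python sketch of Sørensen’s similarity index, Cs = 2j / (a + b), where j is the number of taxa shared by two lists and a and b are the numbers of taxa in each list. The function names, the toy checklists, and the synonym table are hypothetical and do not represent the Vietnamese or Philippine floras.

```python
def sorensen_index(flora_a: set[str], flora_b: set[str]) -> float:
    """Sørensen similarity Cs = 2j / (a + b) for two species checklists."""
    if not flora_a and not flora_b:
        raise ValueError("both checklists are empty")
    shared = len(flora_a & flora_b)
    return 2 * shared / (len(flora_a) + len(flora_b))


def harmonize(flora: set[str], synonyms: dict[str, str]) -> set[str]:
    """Replace each name by its reference name where a mapping is known."""
    return {synonyms.get(name, name) for name in flora}


# Toy checklists (hypothetical): one taxon is listed under two different names.
list_a = {"sp. A", "sp. B", "sp. C", "sp. D"}
list_b = {"sp. A", "sp. B-synonym", "sp. E"}
synonyms = {"sp. B-synonym": "sp. B"}

cs_raw = sorensen_index(list_a, list_b)
cs_harmonized = sorensen_index(harmonize(list_a, synonyms),
                               harmonize(list_b, synonyms))
```

In this toy example, mapping the synonym to a common reference name raises Cs from 2/7 to 4/7. Assigning names through a shared barcode reference alignment would play the role of the synonym table.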
Supplementary Material
Supplementary materials are available at the Jeju Journal of Island Science website (https://www.jjis.or.kr/).