Abstract Detail



Machine learning approach in resolving species complexes.

Species complexes are a long-standing problem in taxonomy. Even when hundreds of morphological features are measured, the high degree of similarity among members of a species complex makes the groups difficult to distinguish. Taxonomists have used classification-based methods, such as discriminant analysis and random forest, and ordination methods, such as NMDS and PCA, to analyze groups based on morphological features and to reduce dimensionality. Classification analysis introduces human bias into the identity of the groups because the groups must be labeled in advance, whereas dimensionality reduction does not provide information about the similarity between groups. Hence, a robust method is needed that can identify the number of possible clusters and the features that explain the most variation in the dataset. Here, we present two case studies in which we used spectral clustering to delimit species complexes in the genus Hedychium (Zingiberaceae). We propose that spectral clustering can be used to find the clusters because, unlike simple k-means clustering, it can discover arbitrarily shaped clusters even when standard kernels (such as the RBF kernel) are used. The genus Hedychium is known to contain several species complexes and has thus challenged taxonomists for decades. We measured 150 morphological characters, both vegetative and reproductive, for each taxon (n = 5 to 20) from multiple populations belonging to two major complexes (the spicatum-complex and the coronarium-complex). We identified seven groups in the spicatum-complex (five known species, one variety, and one unidentified) and three groups in the coronarium-complex (three known species). All the known species in both complexes are distributed across the Indian subcontinent and Southeast Asia. A Spearman correlation test was performed to identify and remove highly correlated traits.
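The two steps named above, removing highly correlated traits and then clustering with an RBF-kernel spectral method, can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic concentric rings from scikit-learn, the helper `drop_correlated`, the 0.9 correlation threshold, and `gamma=50.0` are all assumptions chosen for illustration, and the redundant third column is constructed so that the filter has something to remove.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score


def drop_correlated(X, threshold=0.9):
    """Greedily keep columns whose |Spearman rho| with every
    already-kept column is below `threshold` (hypothetical helper)."""
    # Spearman's rho is the Pearson correlation of the column-wise ranks.
    ranks = np.apply_along_axis(rankdata, 0, X)
    rho = np.corrcoef(ranks, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(rho[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept


# Synthetic stand-in for a trait matrix: two concentric rings (non-convex groups).
X2, y_true = make_circles(n_samples=300, factor=0.3, noise=0.03, random_state=0)
# Add a redundant "trait": a monotone transform of column 0, so Spearman rho = 1.
X = np.column_stack([X2, np.exp(X2[:, 0])])

kept = drop_correlated(X, threshold=0.9)
X_filtered = X[:, kept]

# k-means assumes roughly convex clusters; spectral clustering works on a
# similarity graph built here from the RBF kernel exp(-gamma * ||x - y||^2).
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_filtered)
spectral_labels = SpectralClustering(
    n_clusters=2, affinity="rbf", gamma=50.0, random_state=0
).fit_predict(X_filtered)

# Adjusted Rand index: 1.0 means perfect agreement with the true grouping.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_spectral = adjusted_rand_score(y_true, spectral_labels)
print("kept columns:", kept)
print("ARI k-means: %.2f, ARI spectral: %.2f" % (ari_kmeans, ari_spectral))
```

On real morphological data the kernel width (`gamma`) and the number of clusters would need to be chosen with care, for example by comparing results over a range of values; the fixed values here only suit this toy example.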
Spectral clustering suggests 5-10 clusters in the spicatum-complex and 3-5 clusters in the coronarium-complex. H. spicatum, one of the most widely distributed species, forms two or three clusters together with other sympatric species. These heterogeneous clusters suggest that the continuity in characters may be an outcome of hybridization, which is also supported by results from interspecies crosses. Logistic regression and mutual-information-based feature selection, performed after spectral clustering, identified the number of flowers opening per day, the number of fertile bracts, notch depth, and the notch-to-labellum ratio as important characters in explaining the variation observed among the clusters. Our results suggest that spectral clustering combined with feature selection can enhance our understanding of species complexes.
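The post-clustering feature selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the data are randomly generated, the cluster labels stand in for spectral clustering output, the trait names (`trait_a` to `trait_e`) are hypothetical, and ranking by the mean absolute logistic regression coefficient is one of several reasonable ways to summarize a multiclass model.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for cluster assignments produced by spectral clustering.
labels = np.repeat([0, 1, 2], 60)

# Hypothetical traits: only trait_a and trait_b carry cluster signal.
trait_names = ["trait_a", "trait_b", "trait_c", "trait_d", "trait_e"]
X = rng.normal(size=(labels.size, len(trait_names)))
X[:, 0] += 3.0 * labels          # trait_a shifts with every cluster
X[:, 1] += 3.0 * (labels == 2)   # trait_b separates cluster 2 only

# Standardize so that coefficient magnitudes are comparable across traits.
X_scaled = StandardScaler().fit_transform(X)

# Mutual information between each trait and the cluster label.
mi = mutual_info_classif(X_scaled, labels, random_state=0)

# Multiclass logistic regression; coef_ has shape (n_classes, n_traits).
clf = LogisticRegression(max_iter=1000).fit(X_scaled, labels)
coef_strength = np.abs(clf.coef_).mean(axis=0)

mi_ranking = [trait_names[i] for i in np.argsort(mi)[::-1]]
coef_ranking = [trait_names[i] for i in np.argsort(coef_strength)[::-1]]
print("ranked by mutual information:", mi_ranking)
print("ranked by |coefficient|:     ", coef_ranking)
```

Traits that rank highly under both criteria are the most plausible candidates for characters that distinguish the clusters. The two criteria can disagree, since mutual information scores each trait on its own while regression coefficients are fitted jointly.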

Related Links:
Tropical Ecology and Evolution Lab

1 - Indian Institute of Science Education and Research Bhopal, Biological Sciences, Lab-303, Academic building 3, Near Bhauri Village, Bhopal Bypass road, Bhopal, MP, 462066, India
2 - Indian Institute of Science, Computer Science and Automation, Bangalore, Karnataka, 560012, India

species complex
spectral clustering
machine learning

Presentation Type: Poster
This poster will be presented at 5:30 pm. The Poster Session runs from 5:30 pm to 7:00 pm. Posters with odd poster numbers are presented at 5:30 pm, and posters with even poster numbers are presented at 6:15 pm.
Number: PSY025
Abstract ID: 1053
Candidate for Awards: None

Copyright © 2000-2019, Botanical Society of America. All rights reserved