Murguia Romero, Miguel [1], Serrano-Estrada, Bernardo [2], Melo-Samper-Palacios, Ubaldo [1], Ricker, Martin [1], Magallon, Susana [1], Salazar, Gerardo [1].

A color model to document and follow up records' integrity in biodiversity databases.

Large biodiversity databases (LBDB) commonly adopt the Darwin Core standard, following its nomenclatural and semantic definitions. Although this standard facilitates information exchange, it fails to document the validation status of the contained information. The validation process of large databases requires formal informatic techniques, and the most appropriate one currently available is the Relational Model for Databases (RM). The RM establishes the CRUDE integrity constraints (Column, Referential, User-defined, Domain, and Entity), and is the best model to guide the design of techniques to apply and document the validation process of LBDB. Our objective is to build a model to document and communicate the state of User-defined integrity constraints as the validation process for LBDB. We use IBdata, the web system for consulting and managing the collection database of ~1.3 million specimens of Mexico's National Herbarium (MEXU), to develop a model for documenting both the User-defined integrity constraints and the result of applying them to each record. The so-called "Color Model" considers an indeterminate number of User-defined integrity constraints that, when applied to each record in the database, may or may not be satisfied. The Color Model works on three levels: logical (l-L), database (l-DB), and user interface (l-UI). On the l-L level, each record-validation rule tuple is assigned a color from a three-rooted tree, representing the basic status of a validation result as follows: green, consistent; red, inconsistent; and yellow, pseudoconsistent, depending on the resolution and threshold at which the validation is performed. Each of these three colors can evolve to another color (for example, from red to green) when the values in the corresponding fields have been reviewed and corrected, such that the validation result becomes consistent.
On the l-DB level, the specification of each User-defined integrity constraint, used as a validation rule, is stored in an R-table. On the l-UI level, a circle of the color corresponding to the result of the validation associated with each record-validation rule pair is displayed. The Color Model has been implemented at its three levels in IBdata, allowing the user to know the status of each record-validation rule pair. The Color Model documents the validation process at the record level for multiple User-defined integrity constraints that can be applied in bulk, and it makes it possible to monitor the validation status of each record-validation rule pair as a tool for community data curation.
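
As an illustration of the logical level only, the following is a minimal sketch, not the IBdata implementation. It assumes a single hypothetical validation rule (a latitude range check with a tolerance band) and hypothetical names throughout (`Color`, `Rule`, `latitude_rule`, `validate`, the record fields, and the numeric bounds). It shows how each record-validation rule pair could receive a color, and how a red result could become green once the record is corrected and the rule is applied again.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Color(Enum):
    GREEN = "consistent"
    YELLOW = "pseudoconsistent"
    RED = "inconsistent"


@dataclass(frozen=True)
class Rule:
    """A user-defined integrity constraint used as a validation rule."""
    rule_id: str
    description: str
    check: Callable[[dict], Color]


def latitude_rule(record: dict, lo: float = 14.5, hi: float = 32.7,
                  tolerance: float = 0.5) -> Color:
    """Hypothetical rule: bounds and tolerance are illustrative values only."""
    lat = record["latitude"]
    if lo <= lat <= hi:
        return Color.GREEN
    # Inside the tolerance band: consistent only at a coarser threshold.
    if lo - tolerance <= lat <= hi + tolerance:
        return Color.YELLOW
    return Color.RED


def validate(records: List[dict],
             rules: List[Rule]) -> Dict[Tuple[str, str], Color]:
    """Return one color per (record id, rule id) pair."""
    return {(rec["id"], rule.rule_id): rule.check(rec)
            for rec in records for rule in rules}


rules = [Rule("R1", "Latitude within expected range", latitude_rule)]
records = [
    {"id": "rec-1", "latitude": 19.3},
    {"id": "rec-2", "latitude": 33.0},
    {"id": "rec-3", "latitude": 45.0},
]

status = validate(records, rules)

# After a curator reviews and corrects rec-3, the rule is applied again
# and the color for that pair changes from red to green.
records[2]["latitude"] = 20.0
status_after = validate(records, rules)
```

In this sketch the dictionary keyed by (record id, rule id) plays the role of the record-validation rule pairs; a database-level version would presumably store the rule specifications and results in tables rather than in memory.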

1 - Universidad Nacional Autónoma de México, Instituto de Biología, 3er Circuito Exterior, Ciudad Universitaria, Coyoacán, Mexico, DF, 04510, México
2 - SERES Sistemas Especializados, Mexico City, DF, MEXICO

relational model
user-defined constraints
validation of biological databases

Presentation Type: Oral Paper
Session: BIHDII, Biodiversity Informatics & Herbarium Digitization II
Location: Virtual/Virtual
Date: Wednesday, July 21st, 2021
Time: 10:15 AM (EDT)
Number: BIHDII002
Abstract ID: 680
