Unlocking the Power of Data Standardization: A Guide to CDISC and FAIRplus Cookbook for Streamlined Research. Transforming data management to foster adoption of FAIR principles

Looking for a resource that provides guidance on FAIR data management in the Life Sciences? Look no further than the FAIR Cookbook! It is a collection of recipes that offer a deep dive into the technical aspects of FAIR data management and the infrastructure it requires. It also provides applied examples of FAIRification with clinical trial, epidemiological, and molecular data, making it a valuable resource for data stewards, data managers, and data curators. One of its recipes covers the application of the FAIR principles in the context of CDISC-SDTM, a standard model for organizing, annotating, and formatting data from clinical trials.

The FAIR Cookbook is a live resource that provides guidance on how to implement the FAIR Principles in the Life Sciences. It is intended for a wide range of audiences, including researchers, data stewards, software developers, policymakers, and trainers. The recipes fall into several types (guidance, technical, hands-on, background, and review) and are classified according to their intended audience. The Cookbook is based on a grant funded by the EU and IMI, and it is an international effort involving partners from several countries.

The FAIR Cookbook is a community-driven resource that is being populated and improved iteratively in an open manner. It is funded by the IMI FAIRplus project, which comprises a coalition of Europe’s leading experts in data interoperability, standards, pre-clinical to clinical translation, and long-term sustainable data repositories. The Cookbook also links to relevant, mature, and complementary community resources in the Life Sciences, such as the RDMkit, FAIRsharing, bio.tools, and TeSS, as well as the Pistoia Alliance’s FAIR Toolkit for Life Science Industry and more generic resources, such as ‘The Turing Way’ handbook for reproducible data science.

The FAIR principles provide guidelines for making data more easily findable, accessible, interoperable, and reusable. These principles have been widely adopted by funding agencies, industry leaders, and organizations in various fields.

The FAIRplus Cookbook is organized around these principles and offers atomic recipes for implementing them.

Ensuring that data is FAIR is important for facilitating data sharing and reuse, which can lead to more efficient and effective scientific research. However, there are also ethical considerations surrounding data sharing and reuse, such as ensuring data privacy and security and obtaining informed consent from study participants. These issues are explored in the FAIRplus Cookbook.

I first came across this valuable resource while mapping clinical trial data to the standards of CDISC (Clinical Data Interchange Standards Consortium).
CDISC-SDTM is a standard model and framework for organizing, annotating, and formatting data from clinical trials. Regulatory agencies such as the US FDA require clinical trial results to be submitted in this format. The SDTM Implementation Guide (SDTMIG) provides expanded guidance on implementing SDTM for specific use cases or “domains”. CDISC standards provide procedures and guidelines for encoding project-specific data that might not fit into the existing domains.
Mapping a non-conforming data dictionary to the CDISC SDTM standard and CDISC Controlled Terminology presents some challenges. Given the complexity of the CDISC standards, any project team intending to convert its datasets to SDTM at any point in the project life cycle should aim to align with the standard as early in the process as possible. Data dictionaries and data collection instruments should, where possible, be aligned to the relevant CDISC standards to facilitate later data conversions. The FAIR Cookbook site was very useful for getting the tabulation model project started.
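To make the mapping step more concrete, here is a minimal Python sketch of how a project-specific record might be converted into a row of the SDTM Demographics (DM) domain. The source field names (`subject_id`, `gender`, `birth_date`), the source date format, the study identifier, and the way `USUBJID` is built are all assumptions made for illustration; a real project would take them from its own data dictionary and from the SDTMIG.

```python
from datetime import datetime

# Hypothetical mapping from a project-specific data dictionary
# to SDTM DM variable names (the source names are assumptions).
SOURCE_TO_SDTM = {
    "subject_id": "SUBJID",
    "gender": "SEX",
    "birth_date": "BRTHDTC",
}


def to_iso8601(value, source_format="%d/%m/%Y"):
    """Convert a date string to ISO 8601 (YYYY-MM-DD), the format used by SDTM date variables."""
    return datetime.strptime(value, source_format).date().isoformat()


def map_record(record, study_id):
    """Build one DM row from a source record."""
    row = {"STUDYID": study_id, "DOMAIN": "DM"}
    for source_name, sdtm_name in SOURCE_TO_SDTM.items():
        row[sdtm_name] = record[source_name]
    row["BRTHDTC"] = to_iso8601(row["BRTHDTC"])
    # Assumption: the unique subject identifier is STUDYID-SUBJID;
    # each project defines its own convention.
    row["USUBJID"] = f"{study_id}-{row['SUBJID']}"
    return row


source = {"subject_id": "001", "gender": "F", "birth_date": "23/04/1980"}
dm_row = map_record(source, "STUDY01")
print(dm_row)
```

Keeping the mapping in a single dictionary makes it easy to review against the data dictionary early in the project, which is exactly where alignment with the standard pays off.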
CDISC provides a comprehensive framework for the standardization of data collection, organization, and analysis in clinical trials, with the aim of improving data quality, consistency, and interoperability.
The main components of the CDISC standards are the following:
– CDASH (Clinical Data Acquisition Standards Harmonization): CDASH provides guidelines for the collection of data during clinical trials, with a focus on data quality and consistency. It aims to standardize the collection of data across multiple trials, making it easier to compare and combine data from different sources.
– SDTM (Study Data Tabulation Model): SDTM is a standard format for organizing and presenting data from clinical trials. It provides a framework for the tabulation of clinical trial data in a consistent and standardized way, making it easier to analyze and compare data from different sources.
– ADaM (Analysis Data Model): ADaM provides a standard format for the analysis of clinical trial data. It provides a framework for the creation of analysis datasets that can be used for statistical analysis, reporting, and submission to regulatory agencies.
– Define-XML: Define-XML is an XML file that provides metadata for the datasets used in a clinical trial. It describes the variables, datasets, and relationships between them, providing a clear and comprehensive overview of the data used in the trial.
– Controlled Terminology: Controlled Terminology provides a standardized set of codes and definitions for data elements used in clinical trials. It aims to ensure consistency and accuracy in the use of terminology across different trials and data sources.
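As an illustration of how Controlled Terminology is applied in practice, the following Python sketch normalizes free-text values to a codelist and reports the values that cannot be mapped. The codelist shown is an illustrative subset for a sex variable and the synonym table is hypothetical; the authoritative terms should always be taken from the current CDISC Controlled Terminology release.

```python
# Illustrative subset of submission values for a sex codelist.
# Assumption: check the current CDISC Controlled Terminology for the full list.
SEX_CODELIST = {"M", "F", "U"}

# Hypothetical synonyms found in a non-conforming source dataset.
SEX_SYNONYMS = {"male": "M", "female": "F", "unknown": "U"}


def normalize_sex(raw):
    """Return the controlled term for a raw value, or None if it cannot be mapped."""
    cleaned = raw.strip()
    if cleaned.upper() in SEX_CODELIST:
        return cleaned.upper()
    return SEX_SYNONYMS.get(cleaned.lower())


def check_terms(values):
    """Split raw values into mapped controlled terms and unmapped originals."""
    mapped, unmapped = [], []
    for value in values:
        term = normalize_sex(value)
        if term is None:
            unmapped.append(value)
        else:
            mapped.append(term)
    return mapped, unmapped


mapped, unmapped = check_terms(["Male", "f", "FEMALE", "n/a"])
print(mapped, unmapped)
```

Collecting the unmapped values rather than silently dropping them gives the data curator a list to review, either by extending the synonym table or by correcting the source data.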

Another part of the CDISC standards I worked on is SEND (Standard for Exchange of Nonclinical Data), which provides a way to exchange non-clinical data between organizations, such as pharmaceutical companies and regulatory agencies. The SEND Implementation Guide defines the structure and content of the non-clinical data that should be submitted to regulatory agencies to support the evaluation of a drug’s safety and efficacy. It specifies the format and content of datasets for non-clinical studies, including general toxicology, reproductive toxicology, carcinogenicity, and other studies, and it gives detailed instructions for the organization of data, variable naming conventions, controlled terminology, and other elements necessary for standardized data exchange. The guide is intended to be used in conjunction with other CDISC standards, including the Study Data Tabulation Model (SDTM), which provides a standard format for clinical trial data.

Together, these standards help ensure that data is submitted in a consistent and standardized format, which facilitates data exchange, analysis, and evaluation.


