Why RWD in rare diseases needs a common data model

For rare diseases, the precision of Real-World Data (RWD) can be lost when samples are aggregated across countries. There is a strong case for a common data model in Europe that forms a single population by combining multiple autonomous data sources into a federated database.

The advantages of a common data model 

“The most interesting data source to improve patient pathways and avoid delayed diagnoses would be a biomedical data warehouse combined from different care providers, but we don’t have that yet,” says Prof. Pierre-Antoine Gourraud. “We are debating how to combine multiple databases. The use of synthetic data is most promising. We have been using the Avatar methods for the past three years. That would be extremely powerful, to see the flow of patients from one location to the other.”

Currently, in many rare disease data queries, data from some countries cannot be used because the number of patients falls below a threshold, and small groups carry an inherent risk of patient identification. A common data model would alleviate concerns about patient anonymity by increasing the population size, and would also improve data quality and consistency.
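To make the threshold effect concrete, here is a minimal sketch in Python. The country counts and the minimum cell size of 10 are invented for illustration; real thresholds depend on the data source and its governance rules. Pooling raises the count above the threshold, but it does not by itself guarantee anonymity.

```python
# Hypothetical per-country patient counts for one rare-disease query.
MIN_CELL_SIZE = 10  # assumed disclosure threshold; real rules vary by data source


def releasable(count: int, threshold: int = MIN_CELL_SIZE) -> bool:
    """A count can be reported only if it meets the minimum cell size."""
    return count >= threshold


def summarize(counts: dict[str, int], threshold: int = MIN_CELL_SIZE) -> dict:
    """Compare country-by-country reporting with a single pooled population."""
    per_country = {c: n for c, n in counts.items() if releasable(n, threshold)}
    suppressed = sorted(c for c in counts if c not in per_country)
    pooled = sum(counts.values())
    return {
        "reportable_separately": sum(per_country.values()),
        "suppressed_countries": suppressed,
        "pooled_total": pooled,
        "pooled_releasable": releasable(pooled, threshold),
    }


# Illustrative numbers only, not taken from any real dataset.
counts = {"FR": 14, "DE": 7, "ES": 4, "IT": 9}
result = summarize(counts)
print(result)
```

In this made-up case, only one country could be reported on its own, while the pooled population keeps every patient in the analysis.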

What’s “under the hood” of a common data model: consistent data coding and structure

Beyond larger sample sizes, hospital data could in future combine genomics and imaging information that is consistently coded and structured in a more accessible way. This would enrich the data available to researchers when they drill down, and deliver more meaningful results.

“The future is ‘in-patient’ Electronic Medical Records data, plus genomics and imaging,” says Gilles Paubert, but for these to be added the data must be coded and structured. “In the future we’ll see a federated data model, which will avoid downloading data from a hospital, but will enable working within a system that connects multiple hospitals.”

However, before this can happen a great deal of work is needed to ensure consistency of coding and structure within the data, and to encourage change from “paper” data capture to electronic data capture that is easy to use and saves clinicians’ time.
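As a rough illustration of what consistent coding involves, the sketch below maps site-specific diagnosis codes onto one shared vocabulary. Every name in it is hypothetical: the site names, the local codes, and the "CONCEPT:" identifiers are placeholders, not codes from any real terminology.

```python
from dataclasses import dataclass

# Hypothetical mapping from (site, local code) to a shared concept identifier.
LOCAL_TO_COMMON = {
    ("hospital_a", "MUCO"): "CONCEPT:0001",
    ("hospital_b", "CF-01"): "CONCEPT:0001",
    ("hospital_b", "DMD-02"): "CONCEPT:0002",
}


@dataclass(frozen=True)
class Record:
    site: str
    patient_id: str
    local_code: str


def harmonize(records):
    """Split records into rows mapped to the shared vocabulary and unmapped rows."""
    mapped, unmapped = [], []
    for r in records:
        # Normalize whitespace and case before lookup, since free-typed codes vary.
        concept = LOCAL_TO_COMMON.get((r.site, r.local_code.strip().upper()))
        if concept is None:
            unmapped.append(r)  # kept for manual review rather than silently dropped
        else:
            mapped.append({"site": r.site, "patient_id": r.patient_id, "concept": concept})
    return mapped, unmapped


records = [
    Record("hospital_a", "a-001", "muco "),
    Record("hospital_b", "b-001", "CF-01"),
    Record("hospital_b", "b-002", "UNKNOWN"),
]
mapped, unmapped = harmonize(records)
print(mapped)
print(unmapped)
```

Once two sites' codes resolve to the same concept, their patients can be counted together. The unmapped list shows how much coding work remains.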

This perspective is shared by Prof. Pierre-Antoine Gourraud:

“About 60% to 80% of what we do in our biomedical database at the hospital is to look for a number of patients that correspond to a couple of criteria. People are looking for a particular type or set of patients and we then combine the coded data. I think the combination of criteria to identify the patients, as well as to better define assumptions, is what would really make a big difference in the future.”



Federated database: an example

Federated or “virtual” databases consist of interconnected databases that remain autonomous and can be geographically decentralized; the constituent databases are not merged. Through a uniform interface that abstracts the underlying sources, users can store and retrieve data with a single query: the federated database queries each constituent database management system and composites the results into a combined answer.
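The single-query behaviour described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular federated product works: two in-memory SQLite databases stand in for autonomous sites, and the table name, column names, and "CONCEPT:" identifiers are all hypothetical.

```python
import sqlite3


def make_site_db(rows):
    """Create an in-memory database standing in for one autonomous site."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE diagnosis (patient_id TEXT, concept TEXT)")
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?)", rows)
    return conn


def federated_count(sites, concept):
    """Send the same query to every site and composite the per-site counts."""
    query = "SELECT COUNT(DISTINCT patient_id) FROM diagnosis WHERE concept = ?"
    per_site = {}
    for name, conn in sites.items():
        (n,) = conn.execute(query, (concept,)).fetchone()
        per_site[name] = n
    # Summing assumes no patient is recorded at more than one site.
    return {"per_site": per_site, "total": sum(per_site.values())}


sites = {
    "site_fr": make_site_db([
        ("p1", "CONCEPT:0001"),
        ("p2", "CONCEPT:0001"),
        ("p3", "CONCEPT:0002"),
    ]),
    "site_de": make_site_db([("p9", "CONCEPT:0001")]),
}
answer = federated_count(sites, "CONCEPT:0001")
print(answer)
```

Each site's rows stay in its own database. Only the per-site counts are brought together, which is why a shared schema and shared coding matter: the same query text has to make sense everywhere.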

Federated databases allow for a type of analysis where the data stays where it has been produced, which protects data privacy.

“Many people are very excited about distributed analysis, which is a great example of technology where the data stays where it has been produced and the analyses travel. It leaves the privacy protection and responsibility to the institution hosting the data. It allows for more control over who has access to the data and what they do with it, for example in order to perform analytics and AI processes on the data,” says Prof. Pierre-Antoine Gourraud.
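One simple way to picture "the analyses travel" is a summary function that runs at each institution and returns only aggregates, which a coordinator then combines. The sketch below is illustrative, not a description of any specific platform; the variable (age at diagnosis) and all values are invented.

```python
import math


def local_summary(ages):
    """Runs inside the hosting institution; returns aggregates only, never rows."""
    return {"n": len(ages), "sum": sum(ages), "sum_sq": sum(a * a for a in ages)}


def pooled_mean_sd(summaries):
    """Combine per-site aggregates into a pooled mean and sample standard deviation."""
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    # Sample variance from sums; needs at least two patients overall.
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(max(variance, 0.0))


# Invented ages at diagnosis held by two institutions.
site_a = [2, 4]
site_b = [6, 8]
summaries = [local_summary(site_a), local_summary(site_b)]
mean, sd = pooled_mean_sd(summaries)
print(mean, sd)
```

The pooled result matches what a centralised analysis of all four values would give, yet neither site has shared patient-level rows. In practice a hosting institution would also decide which analyses it accepts and whether small aggregates may leave at all.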

