For rare diseases, the precision of Real-World Data (RWD) can be lost when samples are aggregated across countries. There is a strong case for creating a common data model in Europe that forms a single population, with multiple autonomous data sources combined into a federated database.
“The most interesting data source to improve patient pathways and avoid delayed diagnoses would be a biomedical data warehouse combined from different care providers, but we don’t have that yet,” says Prof. Pierre-Antoine Gourraud. “We are debating about how to combine multiple databases. The use of synthetic data is most promising. We have been using the Avatar methods for the past three years. That would be extremely powerful, to see the flow of patients from one location to the other.”
Currently, in many rare disease data queries, data from some countries cannot be used because the patient sample size falls below a threshold number, and small groups carry an inherent risk of patient identification. A common data model would alleviate concerns about patient anonymity by increasing the population size, and would also improve data quality and consistency.
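The threshold effect described above can be illustrated with a minimal sketch. The threshold value, the country codes and the patient counts below are all made up for illustration; real disclosure thresholds are set by each data custodian.

```python
# Hypothetical minimum group size below which counts are withheld.
MIN_CELL_SIZE = 10


def releasable(counts, threshold=MIN_CELL_SIZE):
    """Return only the groups whose patient count meets the threshold."""
    return {group: n for group, n in counts.items() if n >= threshold}


# Illustrative, invented patient counts for one rare disease, per country.
per_country = {"FR": 7, "DE": 9, "IT": 4, "ES": 12}

# Queried country by country, most groups are too small to be released.
country_level = releasable(per_country)

# Treated as a single population, the same patients clear the threshold.
pooled = releasable({"EU": sum(per_country.values())})

print(country_level)
print(pooled)
```

In this invented example only one country could be reported on its own, while the pooled population of 32 patients is usable in full.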
Not only would sample sizes improve, but future hospital data could also combine information on genomics and imaging, consistently coded and structured in a more accessible way. This would enrich the data available to researchers when drilling down into it, delivering more meaningful results.
“The future is ‘In patient’ Electronic Medical Records data, plus genomics and imaging,” says Gilles Paubert, but for these to be added the data must be coded and structured. “In the future we’ll see a federated data model, which will avoid downloading data from a hospital, but will enable working within a system that connects multiple hospitals.”
However, before this can happen a great deal of work is needed to ensure consistency of coding and structure within the data, and to encourage change from “paper” data capture to electronic data capture that is easy to use and saves clinicians’ time.
This perspective is shared by Prof. Pierre-Antoine Gourraud:
“About 60% to 80% of what we do in our biomedical database at the hospital is to look for a number of patients that correspond to a couple of criteria. People are looking for a particular type or set of patients and we then combine the coded data. I think the combination of criteria to identify the patients, as well as to better define assumptions, is what would really make a big difference in the future.”
Federated or “virtual” databases consist of interconnected databases that remain autonomous and can be geographically decentralized, with no data integration occurring in the disparate databases. Through a uniform user interface that abstracts the underlying sources, users can store and retrieve data with a single query. The federated database queries each of the constituent database management systems and combines the results of those queries into a single result.
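The single-query pattern can be sketched as follows. This is a toy illustration, not a description of any particular federated product: two in-memory SQLite databases stand in for autonomous hospital systems, and the table layout, site names and diagnosis codes are assumptions made for the example.

```python
import sqlite3


def make_site(rows):
    """Create an in-memory database standing in for one autonomous hospital system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, diagnosis_code TEXT, age INTEGER)")
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?)", rows)
    return conn


def federated_count(sites, diagnosis_code):
    """Send the same query to every site and combine the per-site answers."""
    query = "SELECT COUNT(*) FROM patients WHERE diagnosis_code = ?"
    per_site = {}
    for name, conn in sites.items():
        # Each site runs the query against its own data; rows are never copied out.
        per_site[name] = conn.execute(query, (diagnosis_code,)).fetchone()[0]
    return per_site, sum(per_site.values())


# Invented records; "X1" is a placeholder code, not a real classification entry.
sites = {
    "hospital_a": make_site([(1, "X1", 34), (2, "X2", 51)]),
    "hospital_b": make_site([(1, "X1", 8), (2, "X1", 62), (3, "X3", 45)]),
}

per_site, total = federated_count(sites, "X1")
print(per_site, total)
```

The sketch only works because both sites share the same table structure and coding, which is the role a common data model would play across real hospitals.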
Federated databases allow for a type of analysis where the data stays where it has been produced, which protects data privacy.
“Many people are very excited about distributed analysis, which is a great example of technology where the data stays where it has been produced and the analyses travel. It leaves the privacy protection and responsibility to the institution hosting the data. It allows for more control over who has access to the data and what they do with it, for example in order to perform analytics and AI processes on the data,” says Prof. Pierre-Antoine Gourraud.
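As a rough illustration of analyses travelling while data stays in place, the sketch below computes a mean age across two sites from locally produced aggregates. The function names and the age values are hypothetical, and a real deployment would add access control and disclosure checks that are left out here.

```python
def local_summary(ages):
    """Runs inside the hosting institution; returns aggregates only, never patient rows."""
    return {"n": len(ages), "total": sum(ages)}


def pooled_mean(summaries):
    """Runs at the coordinating centre, which sees only the per-site aggregates."""
    n = sum(s["n"] for s in summaries)
    return sum(s["total"] for s in summaries) / n if n else None


# Invented patient ages held separately by two institutions.
site_a_ages = [34, 51, 47]
site_b_ages = [8, 62]

summaries = [local_summary(site_a_ages), local_summary(site_b_ages)]
mean_age = pooled_mean(summaries)
print(mean_age)
```

The pooled result equals the mean that would be obtained from the combined records, yet only a count and a sum leave each institution.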