With over 30 years of combined experience working with various pharmaceutical laboratories in the field of biometrics, I currently hold a biostatistics and data science position at Roche France within the real-world data unit. Since 2019, I have also been responsible for data sharing with researchers requesting access to clinical studies generated by Roche France, in order to make the secondary use of our health data a reality.
Background: To estimate the data utility that remains under each, we evaluated three data strategies: anonymization, a federated approach, and OMOP-CDM transformation.
Methods: CDISC-SDTM data from a retrospective HER2+ breast cancer study (73 variables) were anonymized and mapped to OMOP-CDM. Using DataSHIELD, we tested a federated approach by splitting the SDTM and OMOP databases into three samples. The statistical analyses (descriptive statistics, regression methods, survival analyses) obtained with each method were compared against the raw CDISC-SDTM gold standard, focusing on information loss, consistency, and reproducibility.
Results: None of the anonymization methods successfully reproduced all statistical analyses. The federated approach demonstrated good consistency but showed decreased accuracy in multivariate models due to database variability. Conversely, CDISC-SDTM was successfully mapped to OMOP-CDM, showing high statistical concordance.
Conclusions: Whilst data were successfully mapped to OMOP, utility was reduced when further privacy-preserving methods were applied. A trade-off has to be found between privacy and usefulness of data.
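As a minimal illustration of the federated comparison described in the Methods, the sketch below splits a synthetic dataset into three samples, shares only per-sample aggregates (count, sum, sum of squares), and compares the combined mean and variance with the values computed on the pooled data. It does not use DataSHIELD or the study data. The function names `local_aggregates` and `combine` and the synthetic values are assumptions made purely for illustration.

```python
import random
import statistics

def local_aggregates(values):
    """Summaries a site could share instead of patient-level rows (hypothetical helper)."""
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }

def combine(aggregates):
    """Pooled mean and sample variance computed from per-site aggregates only."""
    n = sum(a["n"] for a in aggregates)
    total = sum(a["sum"] for a in aggregates)
    total_sq = sum(a["sum_sq"] for a in aggregates)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

# Synthetic values standing in for one continuous variable (not study data).
rng = random.Random(0)
pooled = [rng.gauss(55.0, 10.0) for _ in range(300)]

# Split the pooled data into three samples, mirroring the three-way split.
sites = [pooled[0:100], pooled[100:200], pooled[200:300]]

fed_mean, fed_var = combine([local_aggregates(s) for s in sites])

# "Gold standard": the same statistics computed directly on the pooled data.
gold_mean = statistics.mean(pooled)
gold_var = statistics.variance(pooled)

print("mean difference:", abs(fed_mean - gold_mean))
print("variance difference:", abs(fed_var - gold_var))
```

Descriptive statistics of this kind can be recombined from aggregates up to floating-point error. The sketch does not cover the multivariate models for which the abstract reports decreased accuracy.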