Analyze with DataSHIELD

What is DataSHIELD?

Some research projects require pooling data from several studies to obtain sample sizes large enough for detecting interactions. Unfortunately, important ethico-legal constraints often prevent or impede the sharing of individual-level data across multiple studies.

DataSHIELD aims to address this issue. It is a method that enables advanced statistical analysis of individual-level data from several sources without physically pooling the data from these sources. Visit DataSHIELD.org for more information.
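To make the idea of analysis without pooling concrete, here is a minimal Python sketch. It is only an illustration of the general principle and not DataSHIELD's or Opal's actual API: the names `StudySummary`, `summarize` and `pooled_mean` are hypothetical, the data are made up, and the sketch omits the disclosure checks a real deployment would need. Each study reduces its individual-level values to aggregates, and only those aggregates are combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudySummary:
    """Aggregates returned by one study (hypothetical type); no individual records."""
    n: int
    total: float


def summarize(values):
    """Would run at the study site: reduce individual-level values to aggregates."""
    return StudySummary(n=len(values), total=float(sum(values)))


def pooled_mean(summaries):
    """Would run on the analyst's side: combine per-study aggregates only."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no observations in any study")
    return sum(s.total for s in summaries) / n


# Made-up individual-level data that never leave their respective study.
study_a = [5.0, 7.0, 9.0]
study_b = [4.0, 6.0]

summaries = [summarize(study_a), summarize(study_b)]
print(pooled_mean(summaries))  # 6.2, the same as the mean of the pooled values
```

The analyst obtains the same mean as if the five values had been pooled, yet only a count and a sum per study were exchanged.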

In collaboration with the DataSHIELD team at the University of Bristol, the OBiBa team built the complex software infrastructure required to securely run DataSHIELD analyses on data stored in Opal.

For detailed information about the implementation of the DataSHIELD method in Opal, see the Opal R and DataSHIELD User Guide.

Features

With DataSHIELD in Opal, studies can:

  • Create large virtual pooled datasets from the harmonized datasets of multiple studies and run sophisticated analyses such as linear regressions.
  • Download and manage analytical DataSHIELD R packages from the DataSHIELD CRAN repository.
  • Set up fine-grained user permissions on the tables and variables made available for DataSHIELD analysis.
  • Set up authentication certificates between DataSHIELD clients and Opal.
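As a hedged illustration of how a linear regression can be fitted on a virtual pooled dataset, the Python sketch below fits a one-predictor line from per-study sums only. This is a textbook technique chosen for illustration and is not necessarily how DataSHIELD's own functions are implemented; `site_statistics` and `fit_pooled_line` are hypothetical names and the data are made up.

```python
def site_statistics(xs, ys):
    """Would run at each study: return only the sums needed for least squares."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }


def fit_pooled_line(stats_list):
    """Would run on the analyst's side: add up the sums and solve for the line."""
    n = sum(s["n"] for s in stats_list)
    sx = sum(s["sx"] for s in stats_list)
    sy = sum(s["sy"] for s in stats_list)
    sxx = sum(s["sxx"] for s in stats_list)
    sxy = sum(s["sxy"] for s in stats_list)

    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("predictor has no variance across studies")

    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Made-up data following y = 1 + 2x, split across two studies.
stats = [
    site_statistics([0, 1, 2], [1, 3, 5]),
    site_statistics([3, 4], [7, 9]),
]
print(fit_pooled_line(stats))  # (1.0, 2.0)
```

Because ordinary least squares depends on the data only through these sums, the fitted line is identical to the one obtained from the physically pooled records.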

A real-world example of a DataSHIELD deployment is described in the BioSHaRE user story.