Store and Document Data with Opal

What is Opal?

Opal is OBiBa's core data management application. This server application provides all the necessary tools to import, transform and describe data. Subject identifiers can also be managed at data import and export time.

Analysis

Thanks to its integration with R, complex statistical analyses and reports can be produced. The implementation of the DataSHIELD process allows advanced statistical analysis across multiple studies without sharing or disclosing any individual-level data.

Integration

Because Opal is integrated with Amber and Mica, studies using Opal can seamlessly and securely import data collected with Amber. They can also create web data portals with Mica that query Opal databases to obtain real-time aggregated reports on subject data.

Secured REST web services are also available to automate server management (Python command-line tools) or to access data (from R, Python, or any web-capable tool).

Features

Data Warehouse

Here are some of the main features of Opal's data warehouse technologies:

  • Store data on an unlimited number of variables,
  • Support for MongoDB, MySQL, MariaDB and PostgreSQL as database backends,
  • Customized variable dictionaries,
  • Import data from CSV, SPSS, SAS, Stata files and from SQL databases,
  • Export data to CSV, SPSS, SAS, Stata files and to SQL databases,
  • Incremental data importation,
  • Connect directly to multiple data sources such as SQL databases and LimeSurvey,
  • Store data about any type of "entity", such as subject, sample, geographic area, etc.,
  • Store data of any type (e.g., texts, numbers, geo-localisation, images, videos, etc.),
  • Import and store genotype data as VCF (Variant Call Format) files,
  • Advanced indexing functionality using Elasticsearch,
  • SQL API for selecting, filtering, grouping and joining table data.

Resources

Resources are datasets or computation units whose location is described by a URL and whose access is protected by credentials. When resources are assigned to an R/DataSHIELD server session, remote big or complex datasets and high-performance computers become accessible to data analysts. Opal provides an interface for managing access to resources and assigning them to an R/DataSHIELD server session, in integration with the resourcer R package. When using resources, the Opal installation is very lightweight because no database and no import process are required: the R server accesses the data where they are originally located.
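
As a rough illustration of the idea that a resource is a location (URL) plus credentials, here is a minimal Python sketch. The Resource class, its field names and the example URL are hypothetical and chosen for illustration only; they are not the Opal or resourcer API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of a resource: where it is and how to access it."""
    name: str
    url: str
    identity: Optional[str] = None  # assumed field: e.g. a user name
    secret: Optional[str] = None    # assumed field: e.g. a password or token

    def scheme(self):
        """URL scheme, which hints at how the resource would be reached."""
        return urlparse(self.url).scheme

    def requires_credentials(self):
        """True when an identity or a secret has been provided."""
        return self.identity is not None or self.secret is not None

    def describe(self):
        """Summary that is safe to log: it never includes the secret."""
        parsed = urlparse(self.url)
        access = "protected" if self.requires_credentials() else "open"
        return f"{self.name}: {parsed.scheme}://{parsed.netloc}{parsed.path} ({access})"


# Made-up resource pointing at a remote CSV file.
resource = Resource(
    name="cohort_csv",
    url="https://data.example.org/files/cohort.csv",
    identity="analyst",
    secret="not-a-real-secret",
)
print(resource.describe())
```

The point of the sketch is that only the description travels: the data stay at the URL, and the credentials are kept apart from anything that is displayed or logged.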

Learn more

Views and Derived Variables

Opal provides the software infrastructure to create virtual tables of derived variables, called "views", that can be persisted on disk or exported to files. Main features are:

  • Comprehensive JavaScript library of utility functions commonly used to derive new variables (e.g. unit conversion); see the Magma Javascript API,
  • User-friendly interfaces to recode variables without programming,
  • Instant summary statistics computation of the new derived variables.
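
To make the notion of a derived variable concrete, the following is a conceptual sketch in Python. Opal itself derives variables with its JavaScript library, so nothing here is Opal code: the function names, variable names and age thresholds are all assumptions made for illustration.

```python
def cm_to_m(height_cm):
    """Unit conversion: centimetres to metres; missing values stay missing."""
    return None if height_cm is None else height_cm / 100.0


def recode_age_group(age):
    """Recode a continuous age into a categorical variable (assumed thresholds)."""
    if age is None:
        return None
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"


def derive_view(rows):
    """Build a 'view': new rows of derived variables; source rows are not modified."""
    return [
        {
            "id": row["id"],
            "height_m": cm_to_m(row.get("height_cm")),
            "age_group": recode_age_group(row.get("age")),
        }
        for row in rows
    ]


# Made-up source table with one missing height value.
source = [
    {"id": "P1", "height_cm": 172, "age": 34},
    {"id": "P2", "height_cm": None, "age": 70},
]
view = derive_view(source)
print(view)
```

The view is computed from the source table rather than stored alongside it, which mirrors the "virtual table" idea: the original data remain unchanged.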

Privacy, Confidentiality and Security

Opal provides a state-of-the-art software infrastructure for data encryption, participant identifiers management and user authentication/authorization. Main features are:

  • Public Key Infrastructure (PKI) allowing Opal to manage public-private key pairs for encrypting and decrypting data,
  • Authentication using either certificates, username/password or token mechanisms,
  • Integration with any OpenID Connect providers,
  • Advanced participant identifiers manager enabling multiple identifiers per participant,
  • Distinct and highly secure database for storing participant identifiers,
  • Granular permission management down to the variable level,
  • REST web services using HTTPS protocol.

Opal File System

Study operations involve file management and exchange. Opal comes with its own file system to facilitate these processes. Main features are:

  • Centralized file management,
  • Access controls.

Genotypes

Genotyping data can be stored in Opal as VCF (Variant Call Format) files. This functionality is available as a plugin. Main features are:

  • Support of VCF and BCF formats,
  • Basic statistics,
  • Sample-participant mapping,
  • Extraction of VCF files combined with phenotype criteria.

R Interface

Opal includes a module enabling statistical data analysis using R. Main features are:

  • R server monitoring from Opal,
  • Secured data access from R,
  • Opal R package (opalr),
  • DataSHIELD R packages,
  • Import R datasets into Opal,
  • Export Opal datasets into R,
  • Opal files management from R,
  • R server workspaces can be saved and restored.

Learn more

Python Interface

Opal includes a module enabling statistical data analysis using Python. Main features are:

  • Secured data access from Python,
  • Opal Python package (obiba-opal),
  • Import various file formats into Opal,
  • Export Opal datasets into Python,
  • Opal files management from Python.

Learn more

SQL API

Opal's tables can be queried with SQL:

  • Execute SQL from the web interface and download SQL output,
  • Execute SQL from R with the opalr package,
  • Execute SQL from the Python client with the sql command.
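
The kind of query this enables (filtering, grouping, aggregating) can be sketched locally. The example below uses Python's built-in sqlite3 module as a stand-in, which is an assumption made so the sketch runs anywhere: it does not connect to Opal, and the participants table and its columns are made up.

```python
import sqlite3

# In-memory SQLite database standing in for an Opal table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE participants (id TEXT PRIMARY KEY, sex TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO participants VALUES (?, ?, ?)",
    [("P1", "F", 34), ("P2", "M", 51), ("P3", "F", 47), ("P4", "M", 29)],
)

# Filter, group and aggregate: count and mean age of participants aged 30+, by sex.
query = """
    SELECT sex, COUNT(*) AS n, AVG(age) AS mean_age
    FROM participants
    WHERE age >= 30
    GROUP BY sex
    ORDER BY sex
"""
rows = conn.execute(query).fetchall()
for sex, n, mean_age in rows:
    print(sex, n, mean_age)
conn.close()
```

Against a real server, a query like this would be submitted through one of the clients listed above; the exact SQL dialect Opal accepts should be checked in its documentation.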

Learn more

Reporting

Opal leverages R's advanced graphics and statistical capabilities by allowing the design of reports in R Markdown format. Main features are:

  • Scheduled execution (with email notifications),
  • Advanced statistical analysis,
  • Advanced graphics,
  • Secured data access,
  • RStudio IDE can be used for designing reports.

Learn more

Indexing

Opal automatically indexes imported data in its embedded search engine (Elasticsearch). This allows very fast retrieval and complex querying of the data. Main features are:

  • Real-time data dictionary search capability,
  • Real-time data faceted search capability,
  • Contingency tables.

Web Services (API)

Opal is built on REST web services: everything is accessible through a URL. Any client that can make an HTTPS request can be a client of an Opal server. Main features are:

  • The resources can be obtained in JSON or binary form (Protobuf),
  • Client authentication can be done by providing username/password credentials or a token, or by establishing two-way SSL authentication,
  • Clients are already available in JavaScript, R, Python and PHP.
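
As a minimal sketch of what "any client that can make an HTTPS request" means in practice, the Python code below builds an authenticated request using only the standard library. The server URL, the /ws/datasources path and the use of HTTP Basic authentication for username/password credentials are assumptions made for illustration; check the Opal web services documentation for the paths and headers a given server actually accepts.

```python
import base64
import json
import urllib.request


def build_opal_request(base_url, path, username, password):
    """Build (but do not send) an authenticated GET request for a REST resource."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    # HTTP Basic authentication (assumed scheme): base64("username:password").
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",  # ask for JSON rather than Protobuf
        },
        method="GET",
    )


def fetch_json(request):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical server URL and resource path, for illustration only.
request = build_opal_request(
    "https://opal.example.org", "/ws/datasources", "some-user", "some-password"
)
print(request.get_method(), request.full_url)
```

Building the request is kept separate from sending it, so the URL and headers can be inspected without a server; calling fetch_json(request) would perform the actual HTTPS call.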

Download

Requirements

Opal is a Java-based application, so it should run on any platform for which a Java Virtual Machine is provided.

Opal is a stand-alone web server application and therefore does not require a web server container such as Tomcat or Jetty to be installed.

Instructions

Detailed installation instructions can be found in the Opal Installation Guide. We provide packages for Debian-based systems (Debian, Ubuntu, etc.) and Fedora-based systems (Fedora, CentOS, etc.). We strongly recommend using these packages, as they greatly simplify installation and upgrades.

For Debian-based systems, see the instructions in our Debian package repository.

Opal [latest] (.deb)

For Fedora-based systems, see the instructions in our RPM package repository.

Opal [latest] (.rpm)

On all other platforms, follow the installation instructions provided in the Opal Installation Guide.

Opal [latest] (.zip)

Configuration

Once Opal is installed, it can be configured to match your needs and environment. Please follow the instructions provided in the Opal Configuration Guide.

Try Opal

To take a closer look at Opal, try our demo site with one of the following credentials:

  • Administrator: administrator/password
  • DataSHIELD user: dsuser/P@ssw0rd