
Metadata Collection

To ensure legal compliance, data interoperability, and reusability, ELIXIR Luxembourg requires specific metadata and documentation to accompany each dataset submission. This section outlines the required and recommended materials to prepare before uploading your data.

Data Information Sheet

The Data Information Sheet is used to document any access restrictions associated with the cohort data you are submitting.

Note

This sheet is typically completed during the legal negotiation phase and included as an annex to the data sharing agreement.

The Data Information Sheet defines a list of distinct datasets, which ELIXIR Luxembourg handles independently of one another. Each dataset receives a unique accession number and its own entry in the ELIXIR Data Catalogue, so users can request only the datasets they need, in line with the principle of data minimisation.

Tip

Define datasets according to how they will be reused. For example, if clinical data are always needed to interpret sequencing results, the two data types should form a single dataset.

Supporting Documentation

To promote data interoperability and transparency, the following documents are requested for each dataset:

Data Dictionary

Submit a comprehensive data dictionary that includes:

  • A list of all variables
  • Descriptions and value ranges
  • Notes on any de-identification procedures applied

If variables were modified during de-identification (e.g., pseudonymisation, randomisation), describe the method used for each affected variable.
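A data dictionary can be kept as a simple spreadsheet or CSV file. The sketch below assumes a CSV layout with one row per variable; the column names (`name`, `description`, `value_range`, `deidentification`) and the example variables are hypothetical and are not a template prescribed by ELIXIR Luxembourg.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Variable:
    """One row of a data dictionary (hypothetical column layout)."""
    name: str
    description: str
    value_range: str
    deidentification: str = ""  # method applied to this variable, if any


def data_dictionary_csv(variables: list[Variable]) -> str:
    """Serialise the variables as CSV text: a header row, then one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Variable)])
    writer.writeheader()
    for variable in variables:
        writer.writerow(asdict(variable))
    return buffer.getvalue()


# Hypothetical example variables.
example = [
    Variable("subject_id", "Pseudonymous subject code", "S000000-S999999",
             "Original IDs replaced by random codes"),
    Variable("age_years", "Age at baseline visit, in years", "18-90",
             "Ages above 90 aggregated to 90"),
    Variable("sex", "Self-reported sex", "F, M, Other"),
]

if __name__ == "__main__":
    print(data_dictionary_csv(example), end="")
```

When saving the text to disk, open the file with `newline=""` so the line endings produced by the `csv` module are kept unchanged.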

While not mandatory, the following documents are strongly recommended. They enhance trust in the dataset and help users assess its suitability and compliance risks.

Data De-identification Procedure

Due to varying interpretations of data protection laws across jurisdictions, dataset providers should describe the de-identification process used. This helps users evaluate risks and align data classification with local policies.

Include details such as:

  • Aggregation methods
  • Removal of direct identifiers
  • Randomisation of subject codes
  • Any other anonymisation techniques

For guidance, refer to the European Data Protection Board’s pseudonymisation guidelines.
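As an illustration of randomising subject codes, the following Python sketch replaces each original subject identifier with a random code drawn from the operating system's secure random generator. The function names, the `S` prefix, the code width, and the example identifiers are hypothetical. The resulting mapping re-links the shared data to the original identifiers, so it has to be kept separately and securely by the data provider (or destroyed, depending on the chosen approach). Replacing codes alone does not make a dataset anonymous.

```python
import secrets


def randomise_subject_codes(original_ids, prefix="S", width=6):
    """Map each distinct original ID to a random code such as 'S048213'.

    The returned dict is the re-identification key: keep it separate from
    the shared dataset and store it securely.
    """
    distinct_ids = list(dict.fromkeys(original_ids))  # drop duplicates, keep order
    if len(distinct_ids) > 10 ** width:
        raise ValueError("code width too small for the number of subjects")
    # Distinct random numbers, so the new codes carry no information about the old IDs.
    numbers = secrets.SystemRandom().sample(range(10 ** width), len(distinct_ids))
    return {old: f"{prefix}{n:0{width}d}" for old, n in zip(distinct_ids, numbers)}


def pseudonymise(records, id_field, mapping):
    """Return copies of the records with the subject ID replaced by its new code."""
    return [{**record, id_field: mapping[record[id_field]]} for record in records]


if __name__ == "__main__":
    key = randomise_subject_codes(["LUX-001", "LUX-002", "LUX-001"])
    shared = pseudonymise([{"subject_id": "LUX-002", "age_years": 54}], "subject_id", key)
    print(shared)
```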

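Consent Form Templates

To verify that the data were collected lawfully and that their intended use complies with data subjects' consent, users often request access to the consent form templates. Providing these templates is therefore recommended.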

Note

ELIXIR Luxembourg only collects templates, not signed consent forms.

Checklist for Preparing Data Files

To meet ELIXIR Luxembourg’s standards for data quality, FAIR principles, and long-term usability, please follow these guidelines when preparing your files:

File Naming and Organization

  • Use clear, consistent, and descriptive filenames.
  • Avoid spaces, accented characters, and special symbols; hyphens and underscores are the only special characters allowed.
  • Apply the same naming principles to variable names.

Refer to the RDMkit file naming guide for best practices.
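The naming rules above can be checked automatically before upload. This sketch assumes that names may contain ASCII letters, digits, hyphens, and underscores, plus dot-separated extensions such as `.csv` or `.vcf.gz`; the pattern and function names are illustrative and should be adapted if your dataset needs different conventions.

```python
import re
from pathlib import Path

# Assumed rule: ASCII letters, digits, hyphens and underscores, optionally
# followed by dot-separated extensions such as ".csv" or ".vcf.gz".
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+(?:\.[A-Za-z0-9]+)*")


def is_valid_name(name: str) -> bool:
    """True if the whole name matches the assumed naming rule."""
    return ALLOWED_NAME.fullmatch(name) is not None


def find_invalid_names(root: str) -> list[str]:
    """List files and folders under root (as relative paths) whose names break the rule."""
    root_path = Path(root)
    return sorted(
        path.relative_to(root_path).as_posix()
        for path in root_path.rglob("*")
        if not is_valid_name(path.name)
    )


if __name__ == "__main__":
    for bad in find_invalid_names("."):
        print("Rename:", bad)
```

Running `find_invalid_names` on the submission's root folder lists every file or folder whose own name breaks the assumed rule.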

File Formats

Use non-proprietary, widely accepted formats that support long-term preservation and reuse.

Checksums

Include a checksums.sha256 file in the root folder listing the SHA-256 checksum of every submitted file, so that the integrity of the data can be verified.

See the FAIR Cookbook checksum recipe for instructions.
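If you prefer a scripted alternative to command-line tools, the sketch below computes SHA-256 checksums with Python's standard `hashlib` module and writes them to `checksums.sha256` in the root folder. It assumes the line format produced by GNU `sha256sum` (the hex digest, two spaces, then the path relative to the root folder), so the file can also be checked with `sha256sum -c checksums.sha256` run from the root folder; confirm with ELIXIR Luxembourg if a different layout is expected. The function names are hypothetical.

```python
import hashlib
from pathlib import Path

CHECKSUM_FILE = "checksums.sha256"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(root: str) -> Path:
    """Write '<sha256>  <relative/path>' lines for every file under root."""
    root_path = Path(root)
    out = root_path / CHECKSUM_FILE
    lines = [
        f"{sha256_of(path)}  {path.relative_to(root_path).as_posix()}\n"
        for path in sorted(root_path.rglob("*"))
        if path.is_file() and path != out  # never hash the checksum file itself
    ]
    # newline="\n" keeps Unix line endings on every platform.
    with out.open("w", encoding="utf-8", newline="\n") as handle:
        handle.writelines(lines)
    return out


def verify_checksums(root: str) -> list[str]:
    """Return the relative paths that are missing or whose hash has changed."""
    root_path = Path(root)
    problems = []
    text = (root_path / CHECKSUM_FILE).read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        expected, relative = line.split("  ", 1)
        target = root_path / relative
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(relative)
    return problems


if __name__ == "__main__":
    print("Wrote", write_checksums("."))
```

Running `verify_checksums` on the same folder after a transfer should return an empty list if every file arrived intact.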

README File

A README.md or README.txt file should be placed in the root folder. It should describe:

  • Folder structure
  • Common variables
  • Data sources
  • Any other relevant context

Refer to the RDMkit README guide for more details.
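A README skeleton covering the points listed above can be generated as a starting point. In this sketch the section titles, the `TODO` placeholders, and the function names are assumptions; only the folder listing is filled in automatically, and the remaining sections still have to be written by hand. The function refuses to overwrite an existing `README.md`.

```python
from pathlib import Path

README_TEMPLATE = """# {title}

## Folder structure

{tree}

## Common variables

TODO: describe variables shared across files, e.g. subject identifiers.

## Data sources

TODO: describe where and how the data were collected or generated.

## Additional context

TODO: add any other information needed to interpret the data.
"""


def folder_tree(root: str, max_depth: int = 2) -> str:
    """Markdown list of folders and files under root, nested up to max_depth levels."""
    root_path = Path(root)
    lines = []
    for path in sorted(root_path.rglob("*")):
        depth = len(path.relative_to(root_path).parts)
        if depth > max_depth:
            continue
        suffix = "/" if path.is_dir() else ""
        lines.append("    " * (depth - 1) + f"- {path.name}{suffix}")
    return "\n".join(lines)


def write_readme(root: str, title: str) -> Path:
    """Create README.md in root; mode 'x' refuses to overwrite an existing file."""
    content = README_TEMPLATE.format(title=title, tree=folder_tree(root))
    target = Path(root) / "README.md"
    with target.open("x", encoding="utf-8") as handle:
        handle.write(content)
    return target


if __name__ == "__main__":
    print("Created", write_readme(".", "My cohort dataset"))
```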

Note

Additional documentation or file formats may be requested to comply with legal requirements or to meet interoperability needs specific to the dataset’s intended use.