Archiving data

Generating data and software, as well as publishing results in scientific journals, takes a lot of effort. It is therefore important to archive your research output properly. Archiving data means ensuring that a copy of your dataset is kept in a secure location for the long term (10 years or more).

Research data comprise all information, digital and non-digital, that is generated during the scientific process and on which scientific conclusions are based. In the research data lifecycle, it is important to distinguish between the stage in which data are still actively being analysed and changed (mutable data) and the stage in which data have been processed and have reached a stable state (immutable data). The same distinction applies to storage and archiving systems, because each has different functional and technical requirements: active storage suits mutable data, whereas archiving systems are intended for immutable data.

Deciding which research data to archive requires carefully weighing several factors so that valuable and relevant data are preserved for future use. The following step-by-step process can help you make informed decisions about archiving research data:

Define Archiving Goals and Criteria

  • Identify the purpose of archiving: Is it for research integrity, future reference, data sharing, reproducibility, or compliance with funding or institutional requirements? (The overview at the end of this section lists which files to save for research integrity and for reuse of data.)
  • Determine the criteria for data selection: Consider factors such as data quality, significance, uniqueness, potential for reuse, and alignment with your research objectives.

 

Assess Data Value and Significance

  • Evaluate the importance of the data to your research outcomes and conclusions.
  • Consider the potential value of the data to other researchers, disciplines, or future studies.
  • Prioritize data that underpin published findings or have the potential to contribute to new insights.

 

Address Ethical and Legal Considerations

  • Ensure that the data to be archived comply with ethical standards, privacy regulations, and data protection laws.
  • Anonymize or de-identify sensitive or personal information as necessary.
  • Obtain the necessary permissions or consent for data sharing and archiving, especially for data collected from human participants.
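
As one illustration of de-identification, the sketch below replaces a direct identifier with a keyed pseudonym and drops columns such as names before a table is archived. It assumes the table is held as a list of dictionaries, a hypothetical column name participant_id, and a secret key that is kept separately from the archived data; the helper pseudonymize is hypothetical, not part of any library. Note that this is pseudonymization, not anonymization: anyone holding the key can re-link records, indirect identifiers (age, postcode, rare conditions) may still allow re-identification, and pseudonymized data generally still count as personal data under data protection law such as the GDPR.

```python
import hashlib
import hmac


def pseudonymize(rows, id_column, key, drop_columns=()):
    """Hypothetical helper: return copies of `rows` (a list of dicts) in which
    `id_column` is replaced by a keyed hash and `drop_columns` are removed.
    The input rows are not modified."""
    result = []
    for row in rows:
        # Leave out direct identifiers (e.g. names) that should not be archived at all
        new_row = {k: v for k, v in row.items() if k not in drop_columns}
        # HMAC-SHA256 with a secret key: the same ID always maps to the same pseudonym,
        # but without the key the mapping cannot be recomputed by guessing IDs
        digest = hmac.new(key, str(row[id_column]).encode("utf-8"), hashlib.sha256)
        # Keep the first 16 hex characters (64 bits) for readability
        new_row[id_column] = digest.hexdigest()[:16]
        result.append(new_row)
    return result


if __name__ == "__main__":
    # Hypothetical key for the demo; in practice generate one (e.g. secrets.token_bytes(32))
    # and store it separately from the archived data, or destroy it if re-linking is never needed
    secret_key = b"replace-with-a-securely-stored-key"
    rows = [
        {"participant_id": "P001", "name": "Alice", "score": 12},
        {"participant_id": "P002", "name": "Bob", "score": 9},
    ]
    for row in pseudonymize(rows, "participant_id", secret_key, drop_columns=("name",)):
        print(row)
```

A keyed hash is used rather than a plain hash because study IDs are usually short and predictable, so an unkeyed hash could be reversed simply by hashing every possible ID.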

 

Assess Data Quality and Documentation

  • Ensure that the data are well-documented, properly organized, and adequately described.
  • Include metadata that provide context, methods, variables, and any relevant information for understanding the data.

 

Evaluate Reusability and Reproducibility

  • Choose data that can be easily understood and reused by others, facilitating reproducibility and validation.
  • Consider whether the data and associated documentation are sufficient to replicate your research methods and findings.

 

Select Data with Long-Term Value

  • Focus on data that have enduring relevance, regardless of current trends or specific project timelines.
  • Prioritize data that contribute to broader scientific knowledge and have the potential for continued impact.

 

Include Raw and Processed Data

  • Whenever possible, archive both raw and processed data, as well as any intermediate results or transformations.
  • Raw data enable others to apply different analyses and methods, while processed data show how you prepared and analysed the data.

 

Consider Data Granularity

  • Decide whether to archive comprehensive datasets or subsets that are most relevant to specific research questions or areas.

 

Involve Collaborators and Stakeholders

  • Discuss data archiving decisions with co-authors, collaborators, or other stakeholders to ensure a shared understanding and agreement.

 

Document Archiving Decisions

  • Maintain clear records of the data archiving process, including the rationale for selecting or excluding specific datasets.
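
One way to keep such records is a simple manifest that lists each candidate file, whether it was archived, and why. The sketch below is a minimal example under these assumptions: the decisions are recorded by hand in a dictionary, the manifest is written as a CSV file, and a SHA-256 checksum is stored for each archived file so that the archived copy can later be checked against the original. The helpers write_manifest and sha256_of and all file names are hypothetical.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(decisions, manifest_path):
    """Write one CSV row per file: path, decision, rationale, checksum.

    `decisions` maps a file path to an (archive: bool, rationale: str) tuple.
    Checksums are only computed for files selected for archiving."""
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["file", "decision", "rationale", "sha256"])
        for path, (archive, rationale) in sorted(decisions.items()):
            checksum = sha256_of(path) if archive else ""
            writer.writerow([path, "archive" if archive else "exclude", rationale, checksum])


if __name__ == "__main__":
    # Demo with two throwaway files in a temporary directory
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw_measurements.csv"
        scratch = Path(tmp) / "scratch_plot.png"
        raw.write_text("id,value\n1,3.2\n", encoding="utf-8")
        scratch.write_bytes(b"placeholder")
        decisions = {
            str(raw): (True, "Raw data underlying the published results"),
            str(scratch): (False, "Temporary plot, can be regenerated from the code"),
        }
        manifest = Path(tmp) / "archive_manifest.csv"
        write_manifest(decisions, manifest)
        print(manifest.read_text(encoding="utf-8"))
```

Storing the manifest together with the archived files keeps the rationale next to the data, and recomputing the checksums later shows whether any archived file has changed.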

 

Use Data Repositories

  • Choose reputable data repositories that align with your field of research and offer appropriate metadata and access options.

 

Review and Update

  • Regularly review and update your archiving decisions to ensure they align with evolving research goals and practices.

At a minimum, two perspectives determine which data to archive: preserving data for research integrity, and enabling other researchers to reuse the data. The files to archive from each perspective are listed below.

Perspective of scientific integrity

  • All raw, processed and analysed data
  • Documentation (e.g., codebooks, lab journals, protocols) necessary for understanding the data
  • Readme.txt file to help others understand the contents and purpose of the associated files or code
  • Informed consent forms
  • Approval letter from the Ethical Review Board
  • If applicable: Data Management Plan

Perspective of reuse of data

  • Final versions of analysed data; if possible, also raw and processed data
  • Documentation (e.g., codebooks, lab journals, protocols) necessary for understanding the data
  • Readme.txt file to help others understand the contents and purpose of the associated files or code
  • Template of the informed consent form used in the study