Data preservation

ARCHIVING DATA

Generating data and software, as well as publishing results in scientific journals, takes a lot of effort, so it is important to archive your research output properly. Archiving data means ensuring that a copy of your dataset is kept in a secure location for the long term (10 years or more).

Research data is all information, digital and non-digital, generated as part of the scientific process, on which scientific conclusions are based. In the research data lifecycle, it is important to distinguish between the stage where data is actively undergoing analysis (referred to as mutable data) and the stage where research data has been processed and has reached a stable state (known as immutable data). The same distinction applies to storage and archiving systems, which have different functional and technical requirements.

Deciding which research data to archive involves careful evaluation of different factors to ensure that valuable and relevant data are preserved for future use. Here's a step-by-step process to help you make informed decisions about archiving research data:

Define Archiving Goals and Criteria

Identify the purpose of archiving: Is it for research integrity, future reference, data sharing, reproducibility, or compliance with funding or institutional requirements? (The table below shows which files need to be saved for the purposes of research integrity and reuse of data.)
Determine the criteria for data selection: Consider factors such as data quality, significance, uniqueness, potential for reuse, and alignment with your research objectives.

Assess Data Value and Significance

Evaluate the importance of the data to your research outcomes and conclusions.
Consider the potential value of the data to other researchers, disciplines, or future studies.
Prioritize data that underpin published findings or have the potential to contribute to new insights.

Address Ethical and Legal Considerations

Ensure that the data to be archived comply with ethical standards, privacy regulations, and data protection laws.
Anonymize or de-identify sensitive or personal information as necessary.
Obtain necessary permissions or consents for data sharing and archiving, especially for human subjects data.
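
As an illustration of the de-identification step above, the following is a minimal sketch that replaces a participant identifier with a keyed hash and drops directly identifying fields. The function name, the field names (participant_id, name) and the key handling are hypothetical and chosen only for this example. Note that this is pseudonymization rather than full anonymization: anyone holding the secret key can re-link records, and the remaining fields may still identify a person, so it does not by itself guarantee compliance with privacy regulations.

```python
import hashlib
import hmac


def pseudonymize(records, id_field, secret_key, drop_fields=()):
    """Return copies of the records with id_field replaced by a keyed hash
    and every field listed in drop_fields removed.

    records     -- list of dicts, one per participant or observation
    id_field    -- name of the field holding the identifier (hypothetical)
    secret_key  -- bytes; must be stored separately from the archived data
    drop_fields -- names of directly identifying fields to remove
    """
    cleaned = []
    for record in records:
        # HMAC-SHA256 gives the same pseudonym for the same identifier,
        # so records belonging to one participant stay linkable.
        digest = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        new_record = {
            key: value
            for key, value in record.items()
            if key not in drop_fields and key != id_field
        }
        new_record[id_field] = digest[:16]  # shortened for readability
        cleaned.append(new_record)
    return cleaned
```

The original records are left unchanged, so the identifiable version can be stored (or destroyed) according to the consent and ethical approval that apply to the study.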

Assess Data Quality and Documentation

Ensure that the data are well-documented, properly organized, and adequately described.
Include metadata that provide context, methods, variables, and any relevant information for understanding the data.
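
One way to keep such metadata consistent across datasets is to generate the descriptive file from a small script. The sketch below is only an example under assumed conventions: the function name build_readme, the chosen fields and the layout are hypothetical, and a data repository or research group may prescribe a different template.

```python
def build_readme(title, description, methods, variables):
    """Return the text of a plain-text README describing a dataset.

    variables maps each variable name to a (unit, meaning) pair.
    """
    lines = [
        f"Title: {title}",
        f"Description: {description}",
        f"Methods: {methods}",
        "",
        "Variables:",
    ]
    for name, (unit, meaning) in variables.items():
        lines.append(f"  {name} [{unit}]: {meaning}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only; replace with the details of your own dataset.
    text = build_readme(
        title="Example dataset",
        description="Temperature readings from a laboratory experiment",
        methods="Measured every 10 minutes with a calibrated sensor",
        variables={"temp": ("degC", "air temperature")},
    )
    print(text)
```

The resulting text can be saved alongside the data files so that the context, methods and variables travel with the dataset.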

Evaluate Reusability and Reproducibility

Choose data that can be easily understood and reused by others, facilitating reproducibility and validation.
Consider whether the data and associated documentation are sufficient to replicate your research methods and findings.

Select Data with Long-Term Value

Focus on data that have enduring relevance, regardless of current trends or specific project timelines.
Prioritize data that contribute to broader scientific knowledge and have the potential for continued impact.

Include Raw and Processed Data

Whenever possible, archive both raw and processed data, as well as any intermediate results or transformations.
Raw data enable others to apply different analyses and methods, while processed data showcase your research approach.

Consider Data Granularity

Decide whether to archive comprehensive datasets or subsets that are most relevant to specific research questions or areas.

Involve Collaborators and Stakeholders

Discuss data archiving decisions with co-authors, collaborators, or other stakeholders to ensure a shared understanding and agreement.

Document Archiving Decisions

Maintain clear records of the data archiving process, including the rationale for selecting or excluding specific datasets.
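
Such records can be as simple as a table with one row per dataset. The following sketch shows one possible structure; the class name ArchivingDecision, its fields and the CSV layout are assumptions made for this example, not a prescribed format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ArchivingDecision:
    dataset: str      # name or path of the dataset
    archived: bool    # True if selected for archiving
    rationale: str    # why it was selected or excluded
    decided_on: str   # date of the decision, ISO format (YYYY-MM-DD)


def decisions_to_csv(decisions):
    """Return the decisions as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(ArchivingDecision)]
    )
    writer.writeheader()
    for decision in decisions:
        writer.writerow(asdict(decision))
    return buffer.getvalue()
```

Keeping the log in a plain format such as CSV means it can be archived together with the data and read without special software.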

Use Data Repositories

Choose reputable data repositories that align with your field of research and offer appropriate metadata and access options.

Review and Update

Regularly review and update your archiving decisions to ensure they align with evolving research goals and practices.

Two factors come into play when determining the data to be archived at a minimum: the motivation to preserve data for research integrity and the desire to facilitate data sharing among fellow researchers.

Perspective of scientific integrity	Perspective of reuse of data
All raw, processed and analysed data	Final versions of analysed data; if possible, also raw and processed data
Documentation (e.g., codebooks, lab journals, protocols) necessary for understanding the data	Documentation (e.g., codebooks, lab journals, protocols) necessary for understanding the data
Readme.txt file to help others understand the contents and purpose of the associated files or code	Readme.txt file to help others understand the contents and purpose of the associated files or code
Informed consent forms	Template of the informed consent form used in the study
Approval letter from the Ethical Review Board
If applicable: Data Management Plan

FAQ

What is the difference between raw, processed, and analysed data?
Raw data are the original data that you have collected but have not yet processed or analysed, for instance audio files, archives, observations, field notes and data from experiments. Data that you did not collect yourself but are reusing may also be considered raw data.
Processed data are the data that you have digitised, translated, transcribed, cleaned, validated, checked and/or anonymised.
Analysed data are the models, graphs, tables, texts and so on that you have created based on the raw and the processed data, and that are intended to aid in the discovery of useful information, the presentation of conclusions, and decision-making.

Does TU/e have an archive for research data?
No. At the moment there is no archive at TU/e for long-term storage of research data.
