Proper data organization enables you, and others, to find the correct version of the correct dataset quickly and easily. Consistent and clear file names, a logically arranged set of folders for storing data files, and consistent version management all help.
Good file names help you identify the correct datasets without having to open them. Good file names are consistent (based on a file naming convention), distinctive (distinguishing between file versions and between files on similar subjects) and indicative (meaningful).
A file name should contain all the descriptive information on its own, without relying on the folder in which it is stored.
The order of the elements that make up a file name (subject, date, version, file type) matters if you want your data files to sort in a particular order.
Avoid using special characters in file names: \ / : * ? < > | [ ] & $ . These may cause processing problems. Preferably replace spaces in file names with underscores. Finally, write a readme file that explains the meaning of your file names, and keep it with your files.
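The naming advice above can be sketched in a few lines of Python. This is a minimal, hypothetical example rather than a prescribed standard: the helper names `sanitize` and `make_file_name`, the element order subject_date_version, the YYYYMMDD date format and the zero-padded `v01` version label are all assumptions chosen for illustration. Writing the date as year, month, day and padding version numbers means a plain alphabetical sort also puts files in chronological and version order.

```python
import re
from datetime import date

# The special characters the guide advises against (dots are left alone here).
FORBIDDEN = r'[\\/:*?<>|\[\]&$]'

def sanitize(part: str) -> str:
    """Drop discouraged special characters and replace spaces with underscores."""
    part = re.sub(FORBIDDEN, "", part)
    return re.sub(r"\s+", "_", part.strip())

def make_file_name(subject: str, when: date, version: int, ext: str) -> str:
    """Compose subject_YYYYMMDD_vNN.ext (a hypothetical convention)."""
    # Year-month-day dates sort chronologically as plain text.
    stamp = when.strftime("%Y%m%d")
    # Zero-padding keeps v02 before v10 in an alphabetical sort.
    return f"{sanitize(subject)}_{stamp}_v{version:02d}.{ext.lstrip('.')}"
```

For example, `make_file_name("Interview transcripts: group A", date(2024, 3, 5), 2, "docx")` returns `Interview_transcripts_group_A_20240305_v02.docx`.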
Note: software for renaming multiple files at once is available, for example PSRenamer and Ant Renamer.
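If no dedicated renaming tool is at hand, a short script can apply the same clean-up to every file in a folder. The Python sketch below is an illustration under assumptions (the names `clean_name` and `batch_rename` are hypothetical). It removes the special characters listed earlier, replaces spaces with underscores, skips any rename that would overwrite an existing file, and by default only reports what it would do.

```python
import re
from pathlib import Path

# The special characters the guide advises against (dots are kept, so extensions survive).
FORBIDDEN = re.compile(r'[\\/:*?<>|\[\]&$]')

def clean_name(name: str) -> str:
    """Remove discouraged characters and replace runs of whitespace with underscores."""
    return re.sub(r"\s+", "_", FORBIDDEN.sub("", name).strip())

def batch_rename(folder: Path, dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and, if dry_run is False, perform) renames for files directly in folder."""
    plan: list[tuple[str, str]] = []
    taken: set[str] = set()
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # leave subfolders untouched
        new_name = clean_name(path.name)
        if new_name == path.name:
            continue  # already follows the convention
        target = path.with_name(new_name)
        if target.exists() or new_name in taken:
            continue  # never overwrite; leave clashes for manual review
        taken.add(new_name)
        plan.append((path.name, new_name))
        if not dry_run:
            path.rename(target)
    return plan
```

Running it with `dry_run=True` first lets you check the planned renames before any file is changed.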
- More about file naming on the pages of Stanford University Libraries
A clear folder structure is just as important as good file names for finding data files quickly and easily. Think about how you will arrange your research data (and other research-related documents) in folders and subfolders right from the start of your project. A hierarchy of three or four levels is usually sufficient.
You can arrange your data files into folders by file type, by method (e.g. interview, survey, experiment, observation) or by type of material (data, documentation, publications).
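A planned folder hierarchy can also be created in one go, which makes it easy to reuse the same structure across projects. In this hypothetical Python sketch, the top level is arranged by type of material, the second by method and the third separates raw from processed data. The folder names and the `LAYOUT` and `create_structure` names are illustrative assumptions, not a recommended standard.

```python
from pathlib import Path

# Hypothetical three-layer layout: type of material / method / processing stage.
LAYOUT = {
    "data": {
        "interview": ["raw", "processed"],
        "survey": ["raw", "processed"],
    },
    "documentation": {},
    "publications": {},
}

def create_structure(root: Path, layout: dict) -> None:
    """Recursively create the folders described by layout under root."""
    for name, children in layout.items():
        folder = root / name
        # exist_ok=True makes the function safe to run again on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        if isinstance(children, dict):
            create_structure(folder, children)
        else:
            for child in children:
                (folder / child).mkdir(exist_ok=True)
```

Keeping the layout in one dictionary makes the structure easy to document in the project's readme file and to adjust before any data are stored.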
The guide Managing and Sharing Data from the UK Data Archive (pp. 13-14) gives a very clear explanation of how and why to manage versions, especially when you collaborate with others.