Archiving and preserving your research data involves more than keeping your data files on your lab server. In addition to capturing information about your data, you should consider the following:
- What formats will you choose to store your data?
- How long will you retain your data?
- Where can you archive your data? How can you ensure the repository or archive you've chosen is trustworthy?
The file format in which you keep your data is a primary factor in whether you or anyone else will be able to use the data in the future. As technology continually changes, researchers should plan for both hardware and software obsolescence. How will your data be read if the software used to produce them becomes unavailable?
Formats more likely to be accessible in the future are:
- Open, documented standards
- Commonly used by a research community
- Standard representations (ASCII, Unicode)
- Uncompressed (if you need to compress files to conserve space, limit compression to your third backup copy)
Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.
Examples of preferred format choices:
- PDF/A, not Word
- ASCII or CSV, not Excel
- MPEG-4, not Quicktime
- TIFF or JPEG2000, not GIF or JPG
- XML or RDF, not RDBMS
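As one illustration of migrating data out of a software-specific format, the following minimal sketch exports every table in a SQLite database to a plain UTF-8 CSV file, using only the Python standard library. SQLite stands in here for an RDBMS in general, and the function name `export_tables_to_csv` is hypothetical, chosen for this example. Other database systems would need their own drivers, and the original database file should still be kept alongside the exported copies.

```python
import csv
import sqlite3
from pathlib import Path


def export_tables_to_csv(db_path, out_dir):
    """Write each user table in a SQLite database to its own UTF-8 CSV file.

    Returns the list of CSV paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    conn = sqlite3.connect(db_path)
    try:
        # List user tables, skipping SQLite's internal bookkeeping tables.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        for table in tables:
            # Quote the identifier so unusual table names are handled safely.
            quoted = '"' + table.replace('"', '""') + '"'
            cursor = conn.execute(f"SELECT * FROM {quoted}")
            header = [column[0] for column in cursor.description]
            target = out_dir / f"{table}.csv"
            with open(target, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                writer.writerow(header)   # column names first
                writer.writerows(cursor)  # then every row of the table
            written.append(target)
    finally:
        conn.close()
    return written
```

A call such as `export_tables_to_csv("lab.db", "archive_csv")` (both names are placeholders) would produce one CSV file per table. Column types and constraints are not carried over into CSV, so it is worth recording them in the documentation that accompanies the data.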
All research data collected or generated as part of a government-sponsored program should be retained for a minimum of 3 years from the end of the project, in order to comply with potential FOIA requests. If you collect data about humans, animals, or agricultural products, you must retain your data in accordance with the Georgia Board of Regents Records Retention Schedule. The policy specifies the retention periods for many research-related records, in addition to certain types of research data, and should be reviewed by everyone involved in a research project.
|Type of Research Data||Retention Period (data from projects that are not of major significance)||Retention Period (data from projects of national or international significance, interest, or controversy)|
|Animal Care and Use||3 years||Permanent|
|Human Subjects||70 years (if there are potential long-term effects to human subjects)||Permanent|
|Agricultural||70 years (if project has potential long-term environmental effects)||Permanent|
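The table above can be encoded as a small lookup, which may be handy when a lab tracks many datasets. This is only a sketch: the key names, the function `retention_period`, and its parameters are hypothetical, and the values are transcribed from the table rather than from the schedule itself, which remains the authoritative source. Where the table does not state a period (for example, human subjects data with no potential long-term effects), the function returns `None` rather than guessing.

```python
PERMANENT = "Permanent"

# Rows of the table above; the keys are labels invented for this sketch.
KNOWN_TYPES = {"animal_care_and_use", "human_subjects", "agricultural"}


def retention_period(data_type, major_significance=False, long_term_effects=False):
    """Return the retention period listed in the table, or None if unlisted.

    major_significance: project is of national or international significance,
        interest, or controversy.
    long_term_effects: project has potential long-term effects (on human
        subjects, or on the environment for agricultural data).
    """
    if data_type not in KNOWN_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    if major_significance:
        return PERMANENT
    if data_type == "animal_care_and_use":
        return "3 years"
    if long_term_effects:
        return "70 years"
    # The table gives no period for this case; consult the schedule.
    return None
```

For example, `retention_period("human_subjects", long_term_effects=True)` returns `"70 years"`, while any data type with `major_significance=True` returns `"Permanent"`.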
Science and Engineering Data Repositories
For a more complete list of data repositories, see DataBib, a searchable catalog of research data repositories.
Data that have been created at Georgia Tech or by GT researchers, in any discipline, can be archived in SMARTech, the GT repository created to capture, distribute, and preserve digital products of faculty and researchers. Authors can archive their digital works in a variety of formats, including datasets. For more information on how to deposit data into SMARTech, please review the submission guidelines or contact us.
Factors for Evaluating Data Repositories
While choosing to deposit your data into a repository is a great decision for preservation and access, not all data repositories are alike. When deciding where to deposit your research data, there are several factors to consider.
- How is the repository sustained? What is its business model? Is this a recently established repository, or has it been around for a while?
- Is there evidence of an explicit institutional commitment to preservation?
- What is the repository's preservation policy or plan?
- Has the repository worked to ensure compliance with the OAIS Reference Model? (Compliance may be demonstrated through an audit against criteria built on OAIS, such as the Trustworthy Repositories Audit & Certification (TRAC) checklist or ISO 16363.)
- Who is the audience for the repository?
If you have questions about a repository and whether it is a suitable home for your research data, contact its staff and ask how they will preserve and disseminate your data. In many cases, the repository will want to know in advance if you plan to archive your data there, and the staff will appreciate hearing from you. They may also be able to help with your data management plan.