Research Data Management

Data Management Plans

When creating a Data Management Plan (DMP) for a grant proposal, always check the specific funding agency or program solicitation for guidelines. See our Federal Agency Public Access Policies Research Guide for information about agency guidelines. Remember that you can use the DMPTool to help develop your plan.

In general, there are six core components to a DMP:

  1. Description of data to be produced or collected, including data standards or formats
  2. Identification of protocols or workflows to help manage data throughout the project
  3. Description of documentation and metadata standards to describe the data
  4. Plan for short-term data storage & backup, including necessary security measures
  5. Plan for sharing data, including legal and ethical issues, intellectual property issues, or access policies and provisions
  6. Plan for data preservation, archiving, and long-term access

Share & Archive your Data

Making your data openly accessible is important for ensuring scientific integrity and promoting open inquiry, and it is often required or highly encouraged by funding agencies and publishers.

For many disciplines, research data are commonly deposited in and shared through a disciplinary repository. Review the Registry of Research Data Repositories to determine whether an appropriate repository is available to you.

Through the SMARTech Repository, the Georgia Tech Library is able to support the long-term preservation and sharing of certain types of research data. SMARTech is an open access repository, so it cannot accommodate proprietary or otherwise confidential research data. Review the SMARTech Data Submission Guidelines.

Archiving and preserving your research data involves more than keeping your data files on your lab server. In addition to capturing information about your data, you should consider the following:

Choosing File Formats for Preservation

The file format in which you keep your data is a primary factor in whether you and others will be able to use your data in the future. As technology continually changes, researchers should plan for both hardware and software obsolescence. How will your data be read if the software used to produce them becomes unavailable?

Formats more likely to be accessible in the future are:

  • Non-proprietary
  • Open, documented standards 
  • Commonly used by a research community
  • Standard representations (ASCII, Unicode)
  • Unencrypted
  • Uncompressed (If you need to compress files to conserve space, limit compression to your third backup copy.)

Consider migrating your data into a format with the above characteristics, in addition to keeping a copy in the original software format.
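One of the characteristics above, a standard text representation, can be checked mechanically. The following is a minimal sketch, assuming that UTF-8 is the Unicode encoding of interest (other Unicode encodings such as UTF-16 are not tested); the function name `detect_text_encoding` is hypothetical.

```python
def detect_text_encoding(path):
    """Return 'ascii', 'utf-8', or None if the file is valid in neither.

    Assumption: only ASCII and UTF-8 are tested; other Unicode
    encodings (for example UTF-16) are reported as None.
    """
    with open(path, "rb") as handle:
        raw = handle.read()
    # ASCII is tried first because every ASCII file is also valid UTF-8.
    for encoding in ("ascii", "utf-8"):
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None
```

A result of None suggests the file is binary or uses another encoding, and may be a candidate for migration to a plain-text format.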

Examples of preferred format choices:

  • PDF/A, not Word
  • ASCII or CSV, not Excel
  • MPEG-4, not QuickTime
  • TIFF or JPEG2000, not GIF or JPG
  • XML or RDF, not RDBMS

Research Data Retention Periods

All research data collected or generated as part of a government-sponsored program should be retained for a minimum of 3 years from the end of the project, in order to comply with potential FOIA requests. If you collect data about humans, animals, or agricultural products, you must retain your data in accordance with the Georgia Board of Regents Records Retention Schedule. The policy specifies the retention periods for many research-related records, in addition to certain types of research data, and should be reviewed by everyone involved in a research project.

| Type of Research Data | Retention Period (data from projects that are not of major significance) | Retention Period (data from projects of national or international significance, interest, or controversy) |
|---|---|---|
| Animal Care and Use | 3 years | Permanent |
| Human Subjects | 70 years (if there are potential long-term effects to human subjects) | Permanent |
| Agricultural | 70 years (if project has potential long-term environmental effects) | Permanent |
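The retention periods above can be turned into an earliest disposal date. This is a rough sketch only: it assumes the period is counted from the project end date (the text states this only for the 3-year government-sponsored minimum), it covers only the cases listed in the table, and the names `RETENTION_YEARS` and `retention_end` are hypothetical. Always confirm against the Records Retention Schedule itself.

```python
from datetime import date

# Years to retain, copied from the table above; None means "Permanent".
# The 70-year entries apply when long-term effects are possible. Cases the
# table does not list are deliberately left out and raise KeyError.
RETENTION_YEARS = {
    ("animal care and use", False): 3,
    ("human subjects", False): 70,
    ("agricultural", False): 70,
    ("animal care and use", True): None,
    ("human subjects", True): None,
    ("agricultural", True): None,
}


def retention_end(data_type, project_end, major_significance=False):
    """Return the date retention ends, or None for permanent retention.

    Assumption: the period is counted from the project end date.
    """
    years = RETENTION_YEARS[(data_type.lower(), major_significance)]
    if years is None:
        return None
    try:
        return project_end.replace(year=project_end.year + years)
    except ValueError:
        # Feb 29 in a non-leap target year: round forward to March 1
        # so the data are never discarded early.
        return date(project_end.year + years, 3, 1)
```

For example, `retention_end("Animal Care and Use", date(2020, 6, 30))` gives a date three years after the project end, while any project flagged with `major_significance=True` returns None.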

Depositing your data in a research data repository will facilitate its discovery and preservation. 

Science and Engineering Data Repositories

Astronomy

Atmospheric Science

Biology

Chemistry

Earth Science

Earthquake Engineering/Seismology

Nanotechnology

Oceanography

Space Science

Social Sciences

For a more complete list of data repositories, see DataBib, a searchable catalog of research data repositories.

Data that have been created at Georgia Tech or by GT researchers, in any discipline, can be archived in SMARTech, the GT repository created to capture, distribute, and preserve digital products of faculty and researchers. Authors can archive their digital works in a variety of formats, including datasets. For more information on how to deposit data into SMARTech, please review the submission guidelines or contact Susan Parham, our Research Data expert.

Factors for Evaluating Data Repositories

While choosing to deposit your data into a repository is a great decision for preservation and access, not all data repositories are alike. When deciding where to deposit your research data, there are several factors to consider.

  • How is the repository sustained? What is its business model? Is this a recently established repository or has it been around for a while?
  • Is there evidence of an explicit institutional commitment to preservation?
  • What is the repository's preservation policy or plan?
  • Has the repository worked to ensure compliance with the OAIS Reference Model (for example, through an audit against the Trustworthy Repositories Audit & Certification (TRAC) checklist or ISO 16363)?
  • Who is the audience for the repository?

If you have questions about a repository and whether it is a suitable home for your research data, contact the repository staff and ask how they will preserve and disseminate your data. In many cases, the repository will want to know in advance if you plan to archive your data with them, and they will appreciate hearing from you. Additionally, they may be able to help with your data management plan.

SMARTech Data Submission Guidelines

SMARTech, or Scholarly Materials And Research @ Georgia Tech, is a repository for the capture of the intellectual output of the Institute in support of its teaching and research missions. Datasets may be deposited into SMARTech in order to preserve them and make them accessible worldwide. Data archived in SMARTech will be made available through the web-accessible repository at no cost to the depositors or users. Because SMARTech is a repository for completed work, any submitted data should be in their final form – SMARTech is not storage or collaborative space for works in progress.

To deposit data into SMARTech, you must complete the following steps:

  1. Ensure that your data are eligible to be deposited into SMARTech by reviewing the information below. SMARTech is an open repository, so sensitive data should not be deposited into SMARTech. Additionally, SMARTech is unable to accommodate datasets that are larger than 2 GB. 
  2. Review the SMARTech Public Deposit License, which grants Georgia Tech the right to distribute your data. Sign the license and email it to smartech@library.gatech.edu. A scanned version of the signed license is acceptable.
  3. Prepare the metadata and documentation required for your deposit. This information will be used to create a SMARTech record so others can locate your data, and it will be used to create a README file, which will help future users understand your data and how they might use them in their own work.
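To make the documentation step concrete, here is a minimal sketch of how deposit metadata could be rendered as a plain-text README. The fields used (title, creators, description, per-file notes) and the function name `build_readme` are hypothetical illustrations, not the fields SMARTech actually requires; consult the submission guidelines for the real list.

```python
def build_readme(title, creators, description, files):
    """Render a plain-text README from deposit metadata.

    All field names are hypothetical; `files` maps each file name
    to a one-line note describing its contents.
    """
    lines = [
        f"Title: {title}",
        f"Creators: {'; '.join(creators)}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    # One indented line per file so future users can see what each file holds.
    lines += [f"  {name}: {note}" for name, note in files.items()]
    return "\n".join(lines) + "\n"
```

Writing the README as plain text keeps the documentation itself in a format that is likely to remain readable.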

What types of data may be submitted to SMARTech?

Although SMARTech is able to accommodate a variety of scholarly materials (see the mission and collecting policy of SMARTech for more information), datasets that are acceptable for deposit into SMARTech are generally defined as digital information structured by formal methodology for the purpose of creating new research or scholarship that is conducted by Georgia Tech faculty, researchers, or students. Data may be in a variety of digital formats suitable for communication, interpretation, or processing. Examples include:

  • Observational data (e.g., sensor readings, survey instruments)
  • Experimental data (e.g., lab equipment readings)
  • Simulation data (e.g., climate models)
  • Derived or compiled data (e.g., compiled databases, text or data mining)

All materials considered for inclusion in SMARTech will be assessed using the following criteria:

Content Considerations: Materials will be assessed on their enduring value and their fit within the collecting priorities of SMARTech.

Technical Considerations: Both the Library’s ability to commit to the preservation of the bitstream/digital item and the technical quality of the materials will be considered in accepting materials for inclusion in SMARTech. Zipped and/or tarred files are discouraged but may be used when a dataset is too large or contains individual files that should be distributed as a bundle. Please contact Susan Parham in advance if you plan to submit zipped or tarred files. The Library is currently unable to accept files that are larger than 2 GB in total. Multiple files may be deposited for one study, but the total volume of those files (including zipped/tarred files) cannot exceed 2 GB. Considerations of digital file formats are discussed below. Depending on the size of the dataset, depositors can transfer data to the Library through a number of different mechanisms (for example, by email or on a USB drive). Please contact Susan Parham with any questions.
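Before contacting the Library, you can check whether a deposit folder stays within the 2 GB total. The sketch below assumes that "2 GB" means 2 × 1024³ bytes (the guidelines do not say whether decimal or binary gigabytes are meant) and uses hypothetical names `deposit_size` and `fits_size_limit`.

```python
import os

# Assumption: "2 GB" is read as 2 * 1024**3 bytes. Lower this to
# 2 * 1000**3 if you want the more conservative decimal reading.
MAX_DEPOSIT_BYTES = 2 * 1024**3


def deposit_size(directory):
    """Return the total size in bytes of all files under `directory`."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def fits_size_limit(directory, limit=MAX_DEPOSIT_BYTES):
    """Return True if the combined size of all files does not exceed `limit`."""
    return deposit_size(directory) <= limit
```

Because the limit applies to the combined volume of all files in a study, the check sums every file in the folder, including any zipped or tarred bundles.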

Legal Considerations: When depositing your data into SMARTech, you certify that you are legally able to do so and that making your data publicly accessible will not violate federal, state, city, or institution policy. Depositors should be careful to ensure that the content they submit contains no confidential or sensitive information. Confidential or sensitive information includes all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or Georgia Tech policy. Please see the Georgia Tech Data Access Policy, Data Security Classification Handbook, and Data Protection Safeguards for more information.

Highly sensitive data (classified as Category III data according to Georgia Tech policies) includes personally identifiable information, biochemical information, or health information that reveals an individual’s health condition and/or history of health services use. Examples of this include:

1. Personal information that, if exposed, can lead to identity theft. “Personal information” means the first name or first initial and last name in combination with and linked to any one or more of the following data elements about the individual:

  • Social security number
  • Driver’s license number or state identification card number issued in lieu of a driver’s license number
  • Passport number
  • Financial account number, or credit card or debit card number

2. Health information, also known as “protected health information (PHI),” which includes health records combined in any way with one or more of the following data elements about the individual:

  • Names
  • All geographic subdivisions smaller than a state, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people, and the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000
  • All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
  • Telephone numbers
  • Fax numbers 
  • Electronic mail addresses
  • Social security numbers 
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • Vehicle identifiers and serial numbers, including license plate numbers
  • Device identifiers and serial numbers
  • Web Universal Resource Locators (URLs)
  • Internet Protocol (IP) address numbers
  • Biometric identifiers, including finger and voice prints
  • Full face photographic images and any comparable images
  • Any other unique identifying number, characteristic, or code
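A few of the identifiers listed above have recognizable textual shapes, so a simple pattern scan can flag lines that deserve a closer look before deposit. This is a heuristic sketch only: the regular expressions are assumptions chosen for illustration, they will miss many identifiers (names, dates, record numbers) and can produce false positives, and they do not replace a manual review against Georgia Tech policy.

```python
import re

# Illustrative patterns for a handful of the identifier types listed above.
# These are assumptions, not an official or complete definition.
PATTERNS = {
    "social security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "URL": re.compile(r"\bhttps?://\S+", re.IGNORECASE),
}


def scan_text(text):
    """Return {identifier type: [1-based line numbers]} for matching lines."""
    hits = {}
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(label, []).append(number)
    return hits
```

An empty result means only that none of these particular patterns matched, not that the data are free of sensitive information.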

Digital File Formats: SMARTech accepts all file formats, although depending on the format of the file, SMARTech may be limited in its ability to preserve the digital file and end users may be limited in their ability to use it. Further, action may be taken to normalize or transform the submitted data into a different format. In the case that normalization or transformation is necessary, the original data file (in the original format) will be included in the repository alongside the transformed file (in the new format). To the best of your abilities, deposit data in formats that are “Supported” or “Known.”

Supported

When an item’s format is public and open as is the case with formats such as Adobe PDF, HTML, JPEG, or AIFF, it is categorized as a “Supported format.” Items in this category can be used in the future through migration or emulation and the Library makes a commitment to do so.

  • Adobe PDF (pdf)
  • XML (xml)
  • HTML (html, htm)
  • Rich Text (rtf)
  • Text (txt)
  • PostScript (ps, eps, ai)
  • GIF (gif)
  • PNG (png)
  • JPEG (jpg, jpeg)
  • TIFF (tif, tiff)
  • WAV (wav)
  • MPEG (mpa, abs, mpeg)
  • AIFF (aiff, aif, aifc)

Known

When an item is submitted in a proprietary format it is categorized as “Known.” This category indicates that the specifics of the program code for that format are not public but the format is so widely used that the ability to use it in the future is almost certain.

  • RealAudio (ra, ram)
  • Basic (au, snd)
  • Microsoft Excel (xls)
  • Microsoft Project (mpp, mpx, mpd)
  • Microsoft Visio (vsd)
  • FileMaker/FMP3 (fm)
  • LaTeX (latex)
  • Mathematica (ma)
  • TeX (tex)
  • TeXdvi (dvi)
  • Video Quicktime (mov, qt)
  • BMP (bmp)
  • Adobe Photoshop (pdd, psd)
  • Microsoft PowerPoint (ppt)
  • Photo CD (pcd)
  • Microsoft Word (doc)
  • WordPerfect (wpd)
  • SGML (sgml) 

Unsupported

“Unsupported” formats are those that the Library cannot commit to converting to some usable form in the future. In consultation with the depositor, a decision will be made about whether to include the item in SMARTech, and if it is accepted, readable descriptive information will be included. In the case of unsupported formats, the Library will request that the item also be submitted in a supported or known format, if it is at all possible to do so.
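The three categories can be approximated with a lookup on file extension, using the extensions listed above. In this sketch the function name `format_category` is hypothetical, and treating every unlisted extension as "Unsupported" is a simplifying assumption; the Library makes the actual determination.

```python
from pathlib import Path

# Extensions copied from the "Supported" and "Known" lists above.
SUPPORTED = {
    "pdf", "xml", "html", "htm", "rtf", "txt", "ps", "eps", "ai",
    "gif", "png", "jpg", "jpeg", "tif", "tiff", "wav",
    "mpa", "abs", "mpeg", "aiff", "aif", "aifc",
}
KNOWN = {
    "ra", "ram", "au", "snd", "xls", "mpp", "mpx", "mpd", "vsd", "fm",
    "latex", "ma", "tex", "dvi", "mov", "qt", "bmp", "pdd", "psd",
    "ppt", "pcd", "doc", "wpd", "sgml",
}


def format_category(filename):
    """Classify a file by extension as 'Supported', 'Known', or 'Unsupported'.

    Assumption: any extension not on either list is reported as
    'Unsupported', which may not match the Library's final decision.
    """
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension in SUPPORTED:
        return "Supported"
    if extension in KNOWN:
        return "Known"
    return "Unsupported"
```

Running this over a deposit folder gives a quick list of files that may need a companion copy in a supported or known format.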

Data Management Training

There are excellent resources that can be used to improve your data management skills:

If you are interested in Research Data Management Training, contact Susan Parham.