Research data submission guidelines

SMARTech, or Scholarly Materials And Research @ Georgia Tech, is a repository for the capture of the intellectual output of the Institute in support of its teaching and research missions. Datasets may be deposited into SMARTech in order to preserve them and make them accessible worldwide. Data archived in SMARTech will be made available through the web-accessible repository at no cost to the depositors or users. Because SMARTech is a repository for completed work, any submitted data should be in their final form; SMARTech is not a storage or collaborative space for works in progress.


Steps to data deposit

To deposit data into SMARTech, you must complete the following four steps:

  1. Ensure that your data are eligible to be deposited into SMARTech by reviewing the information below. SMARTech is an open repository, so sensitive data should not be deposited into SMARTech. Additionally, SMARTech is unable to accommodate datasets that are larger than 2 GB. 
  2. Review the SMARTech Public Deposit License, which grants Georgia Tech the right to distribute your data, then sign it and return it to the Research Data Librarian (lizzy.rolando@library.gatech.edu). A scanned copy of the signed license is acceptable.
  3. Complete the metadata form with information about your data. The information you provide here will be used to create the cataloging record that will make your data discoverable to others.
  4. Create a “README” file to include with your data. This file is necessary so that future users will understand your data.

What types of data may be submitted to SMARTech?

Although SMARTech accommodates a variety of scholarly materials (see the SMARTech mission and collecting policy for more information), datasets acceptable for deposit are generally defined as digital information structured by formal methodology for the purpose of creating new research or scholarship, produced by Georgia Tech faculty, researchers, or students. Data may be in a variety of digital formats suitable for communication, interpretation, or processing. Examples include:

  • Observational data (e.g., sensor readings, survey instruments)
  • Experimental data (e.g., lab equipment readings)
  • Simulation data (e.g., climate models)
  • Derived or compiled data (e.g., compiled databases, text or data mining)

All materials considered for inclusion in SMARTech will be assessed using the following criteria:

Content Considerations: Materials will be assessed on their enduring value and their fit within the collecting priorities of SMARTech.

Technical Considerations: Both the Library’s ability to commit to the preservation of the bitstream/digital item and the technical quality of the materials will be considered in accepting materials for inclusion in SMARTech. Zipped and/or tarred files are discouraged but may be used if a dataset is too large or contains individual files that should be distributed as a bundle. Please contact the Research Data Librarian in advance if you plan to submit zipped or tarred files. The Library is currently unable to accept files that are larger than 2 GB in total. Multiple files may be deposited for one study, but the total volume of those files (including zipped/tarred files) cannot exceed 2 GB. Considerations of digital file formats are discussed below. Depending on the size of the dataset, depositors can transfer data to the Library through a number of different mechanisms (for example, by email or on a USB drive). Please contact us with any questions.
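Before contacting the Library, a depositor may want to check the combined size of a dataset against the 2 GB limit described above. The following is a minimal Python sketch, not an official SMARTech tool. The function names are hypothetical, and the script assumes that "2 GB" means 2 × 1024³ bytes, since the guidelines do not say whether a binary or decimal gigabyte is intended.

```python
from pathlib import Path

# Assumption: "2 GB" is read as 2 * 1024**3 bytes. The guidelines do not
# specify whether a binary or a decimal gigabyte is meant.
SIZE_LIMIT_BYTES = 2 * 1024**3


def total_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (or of root itself if it is a file)."""
    if root.is_file():
        return root.stat().st_size
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def within_deposit_limit(root: Path, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True when the combined size of the files does not exceed the limit."""
    return total_size_bytes(root) <= limit
```

For example, `within_deposit_limit(Path("my_dataset"))` reports whether everything in a (hypothetical) `my_dataset` folder, including any zipped or tarred files, fits within the limit. If the decimal reading of 2 GB applies, pass `limit=2 * 10**9` instead.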

Legal Considerations: When depositing your data into SMARTech, you certify that you are legally able to do so and that making your data publicly accessible will not violate federal, state, city, or institution policy. Depositors should be careful to ensure that the content they submit contains no confidential or sensitive information. Confidential or sensitive information includes all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or Georgia Tech policy. Please see the Georgia Tech Data Access Policy, Data Security Classification Handbook, and Data Protection Safeguards for more information.

Highly sensitive data (classified as Category III data according to Georgia Tech policies) includes personally identifiable information, biochemical information, or health information that reveals an individual’s health condition and/or history of health services use. Examples of this include:

1. Personal information that, if exposed, can lead to identity theft. “Personal information” means the first name or first initial and last name in combination with and linked to any one or more of the following data elements about the individual:

  • Social security number
  • Driver’s license number or state identification card number issued in lieu of a driver’s license number
  • Passport number
  • Financial account number, or credit card or debit card number

2. Health information, also known as “protected health information (PHI),” which includes health records combined in any way with one or more of the following data elements about the individual:

  • Names
  • All geographic subdivisions smaller than a state, including street address, city, county, precinct, zip code, and their equivalent geocodes. The initial three digits of a zip code may be retained if, according to the current publicly available data from the Bureau of the Census, the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; for all such geographic units containing 20,000 or fewer people, the initial three digits must be changed to 000
  • All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
  • Telephone numbers
  • Fax numbers 
  • Electronic mail addresses
  • Social security numbers 
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • Vehicle identifiers and serial numbers, including license plate numbers
  • Device identifiers and serial numbers
  • Web Universal Resource Locators (URLs)
  • Internet Protocol (IP) address numbers
  • Biometric identifiers, including finger and voice prints
  • Full face photographic images and any comparable images
  • Any other unique identifying number, characteristic, or code
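The zip code and age rules in the list above are the most mechanical of these elements, so they can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not a de-identification tool: the function names are hypothetical, and `small_zip3` is a set the caller must build from current Census data (three-digit prefixes whose combined population is 20,000 or fewer). Applying these two rules does not by itself make a dataset suitable for deposit, because every other element in the list must also be removed.

```python
from typing import Set


def generalize_zip(zip_code: str, small_zip3: Set[str]) -> str:
    """Keep only the first three digits of a zip code.

    Prefixes listed in small_zip3 (units of 20,000 or fewer people,
    supplied by the caller from Census data) are replaced with "000".
    """
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        raise ValueError(f"not a usable zip code: {zip_code!r}")
    return "000" if prefix in small_zip3 else prefix


def generalize_age(age: int) -> str:
    """Report ages over 89 as the single category "90 or older"."""
    if age < 0:
        raise ValueError(f"age cannot be negative: {age}")
    return "90 or older" if age > 89 else str(age)
```

With a placeholder set such as `{"999"}` (made up for illustration, not real Census data), `generalize_zip("30332", {"999"})` returns `"303"` and `generalize_zip("99901", {"999"})` returns `"000"`.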

Digital File Formats: SMARTech accepts all file formats, although depending on the format of the file, SMARTech may be limited in its ability to preserve the digital file and end users may be limited in their ability to use it. Further, action may be taken to normalize or transform the submitted data into a different format. If normalization or transformation is necessary, the original data file (in the original format) will be included in the repository alongside the transformed file (in the new format). To the best of your ability, deposit data in formats that are “Supported” or “Known.”

Supported

When an item’s format is public and open as is the case with formats such as Adobe PDF, HTML, JPEG, or AIFF, it is categorized as a “Supported format.” Items in this category can be used in the future through migration or emulation and the Library makes a commitment to do so.

  • Adobe PDF (pdf)
  • XML (xml)
  • HTML (html, htm)
  • Rich Text (rtf)
  • Text (txt)
  • PostScript (ps, eps, ai)
  • GIF (gif)
  • PNG (png)
  • JPEG (jpg, jpeg)
  • TIFF (tif, tiff)
  • WAV (wav)
  • MPEG (mpa, abs, mpeg)
  • AIFF (aiff, aif, aifc)

Known

When an item is submitted in a proprietary format it is categorized as “Known.” This category indicates that the specifics of the program code for that format are not public but the format is so widely used that the ability to use it in the future is almost certain.

  • RealAudio (ra, ram)
  • Basic (au, snd)
  • Microsoft Excel (xls)
  • Microsoft Project (mpp, mpx, mpd)
  • Microsoft Visio (vsd)
  • FileMaker/FMP3 (fm)
  • LaTeX (latex)
  • Mathematica (ma)
  • TeX (tex)
  • TeXdvi (dvi)
  • Video Quicktime (mov, qt)
  • BMP (bmp)
  • Adobe Photoshop (pdd, psd)
  • Microsoft PowerPoint (ppt)
  • Photo CD (pcd)
  • Microsoft Word (doc)
  • WordPerfect (wpd)
  • SGML (sgml) 

Unsupported

“Unsupported” formats are those that the Library cannot commit to converting to some usable form in the future. In consultation with the depositor, the Library will decide whether to include the item in SMARTech; if it is accepted, readable descriptive information will be included with it. For unsupported formats, the Library will also request that the item be submitted in a supported or known format, if it is at all possible to do so.
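As a rough pre-submission check, the three categories above can be approximated by looking up a file's extension in the lists on this page. The Python sketch below does only that. The function name `format_category` is hypothetical, the extension sets are copied from the lists above, and matching on extension alone is an assumption: the Library's own assessment of a file may differ.

```python
from pathlib import Path

# Extensions copied from the "Supported" and "Known" lists on this page.
SUPPORTED = {
    "pdf", "xml", "html", "htm", "rtf", "txt", "ps", "eps", "ai",
    "gif", "png", "jpg", "jpeg", "tif", "tiff", "wav",
    "mpa", "abs", "mpeg", "aiff", "aif", "aifc",
}
KNOWN = {
    "ra", "ram", "au", "snd", "xls", "mpp", "mpx", "mpd", "vsd", "fm",
    "latex", "ma", "tex", "dvi", "mov", "qt", "bmp", "pdd", "psd",
    "ppt", "pcd", "doc", "wpd", "sgml",
}


def format_category(filename: str) -> str:
    """Classify a file by its final extension, ignoring case."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in SUPPORTED:
        return "Supported"
    if ext in KNOWN:
        return "Known"
    # Anything not listed, including files without an extension.
    return "Unsupported"
```

Files that come back as "Unsupported" are the ones to discuss with the Library in advance, ideally alongside a copy in a supported or known format.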