Research Guides: Finding Lost Government Data: Preserve & Protect Your Data

How can I preserve and protect my data?

If you’re working with research data, it’s important to keep it safe and usable—not just for now, but for the future too. Here are a few tips to help you preserve and protect your data:

Back it up! Use the 3-2-1 rule: keep 3 copies of your data, on 2 different types of storage (like an external hard drive and the cloud), and store 1 copy off-site, in a different geographic location from where you work.
Use open formats. Save your files in simple, non-proprietary formats like CSV or TXT so they stay accessible over time - no matter what software you are using in the future.
Write it down. Consider adding a README file or clear notes that explain what your data is, how you collected it, and what the different columns or variables mean. Trust me, future-you will thank you.
Use a trusted repository. When you’re done with your project, upload your dataset to a trusted platform. These sites make your data secure, easy to find, and citable.
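
As a minimal sketch of the "use open formats" and "write it down" tips, the Python snippet below saves a small table as CSV and writes a plain-text README next to it. The function name, file names, and README layout are made up for illustration; adapt them to your own project.

```python
import csv
from pathlib import Path


def save_with_readme(rows, columns, out_dir, description):
    """Write rows to data.csv and a README.txt that explains each column.

    `columns` maps each column name to a short human-readable description.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Plain UTF-8 CSV: readable by almost any software, now and later.
    data_path = out_dir / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)

    # The README records what the data is and what each column means.
    readme_lines = [
        description,
        "",
        "Files:",
        "  data.csv (UTF-8, comma-separated)",
        "",
        "Columns:",
    ]
    readme_lines += [f"  {name}: {meaning}" for name, meaning in columns.items()]
    readme_path = out_dir / "README.txt"
    readme_path.write_text("\n".join(readme_lines) + "\n", encoding="utf-8")
    return data_path, readme_path
```

For example, calling save_with_readme with one row and a two-entry columns dictionary produces a folder containing data.csv and a README.txt that lists both columns with their descriptions.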

What are data repositories?

A data repository is a digital place where datasets are stored, organized, and shared. Think of a repository like a digital filing cabinet where researchers store and share their research and data after a project is finished. It helps keep everything organized, findable, and open for others to use.

Depositing your research and datasets in a trusted data repository supports the long-term preservation and protection of your data, because researchers, institutions, and funders need to make sure data is:

Safe from loss or tampering
Findable & usable by others in the future
Citable for academic credit
Compliant with legal and ethical standards

Before submitting your data to a repository, make sure the repository is considered to be “trusted.” A trusted data repository is a secure, reliable place where research data is stored, managed, and made available for future use. It follows best practices to protect the data’s quality, keep it accessible over time, and support responsible data sharing.

So, what makes a repository a trusted data repository?

Follows international standards for data preservation and access
Uses secure storage and multiple backups distributed in separate geographic locations
Ensures metadata (descriptive information) is included
Offers long-term preservation and clear data use policies
May be certified (such as with CoreTrustSeal)

How do trusted data repositories work?

These are a few of the strategies used by trusted data repositories to preserve and protect research data for long-term usability, integrity, and accessibility:

1. Data Integrity & Authenticity

Checksums & Hashing: Repositories use cryptographic hashes to verify data hasn’t been altered or corrupted.
Version Control: Any changes to datasets are tracked and documented, often with previous versions retained for transparency.
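
The checksum idea can be tried on your own files. The sketch below is an illustration using Python's standard hashlib module with SHA-256, not a description of any particular repository's process; the function names are made up. It records a checksum for every file in a folder, then reports files that have since changed or gone missing.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): sha256_of_file(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def changed_files(folder, manifest):
    """List files that are missing or whose checksum no longer matches."""
    folder = Path(folder)
    problems = []
    for rel_path, expected in manifest.items():
        target = folder / rel_path
        if not target.is_file() or sha256_of_file(target) != expected:
            problems.append(rel_path)
    return problems
```

Build the manifest when you finish a dataset and keep it somewhere separate from the data. Running changed_files later against the same folder, or against a backup copy, shows whether anything was altered or lost in the meantime.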

2. Long-Term Preservation

File Format Migration: Data may be converted into standardized, non-proprietary formats (e.g. CSV, TIFF) to ensure future readability.
Replication & Redundancy: Data is stored in multiple geographic locations or mirrored across systems to prevent loss due to hardware failure or disasters.
Regular Audits: Periodic checks and audits ensure that stored data remains accessible and uncorrupted over time.
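
To illustrate how replication and audits work together, here is a small Python sketch that compares one file across several copies and flags any copy that is missing or disagrees with the others. It is a simplified assumption of how such a check might look, using a majority vote over SHA-256 digests; the function names and folder layout are hypothetical, and real repositories use their own tooling.

```python
import hashlib
from collections import Counter
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file, or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def audit_replicas(relative_name, replica_dirs):
    """Compare one file across replica folders.

    Returns (majority_digest, suspect_dirs), where suspect_dirs lists the
    replicas whose copy is missing or differs from the most common digest.
    """
    digests = {str(d): file_digest(Path(d) / relative_name) for d in replica_dirs}
    present = [h for h in digests.values() if h is not None]
    if not present:
        # No replica has the file at all: every location is suspect.
        return None, sorted(digests)
    majority, _ = Counter(present).most_common(1)[0]
    suspects = sorted(d for d, h in digests.items() if h != majority)
    return majority, suspects
```

With three copies, one corrupted copy is outvoted by the two good ones and can be replaced from them. A majority vote cannot settle a tie, so with only two copies it is safer to compare each against a checksum recorded when the data was first stored.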

3. Access Control & Security

Authentication & Authorization: Only authorized users can access sensitive or embargoed data.
Encryption: Data is often encrypted during storage and transmission to prevent unauthorized access.
Compliance: Trusted repositories comply with legal, ethical, and regulatory standards like the General Data Protection Regulation (GDPR) or HIPAA (where applicable).

4. Metadata & Documentation

Rich metadata: Descriptive, structural, and administrative metadata help with discoverability, reusability, and citation.
Persistent Identifiers: DOIs or other unique identifiers are assigned to datasets to ensure stable referencing.
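
As a rough picture of what a metadata record with a persistent identifier can look like, the Python sketch below builds a small record, checks it for missing fields, and prints it as JSON. The field names, the list of required fields, and the DOI are placeholders chosen for illustration; they do not follow any specific repository's schema. The only real convention used is that a DOI can be resolved by prefixing it with https://doi.org/.

```python
import json

# Hypothetical list of fields a repository might require.
REQUIRED_FIELDS = ("title", "creators", "publication_year", "identifier", "license")


def missing_fields(record):
    """Return required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def doi_url(doi):
    """Turn a bare DOI such as '10.1234/abcd' into an https://doi.org link."""
    return "https://doi.org/" + doi.strip()


record = {
    "title": "Example stream temperature readings",  # made-up dataset
    "creators": ["Researcher, Example"],
    "publication_year": 2024,
    "identifier": doi_url("10.1234/example-dataset"),  # placeholder, not a real DOI
    "license": "CC-BY-4.0",
    "description": "Hourly water temperature at two sampling sites.",
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

Checking for missing fields before you deposit is a quick way to catch gaps, such as a forgotten license, that would make the dataset harder to find, reuse, or cite.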

5. Standards & Certifications

CoreTrustSeal, ISO 16363, or TRAC Certification: Many trusted repositories seek certification to demonstrate compliance with best practices for digital preservation and trustworthiness.

6. Data Curation & Stewardship

Human Oversight: Professional data curators review submissions to ensure completeness, quality, and adherence to repository policies.
Community Engagement: Repositories often collaborate with research communities to align with domain-specific standards.

What are common repositories I could use?

There are two main types of repositories:

General repositories: These are big, open to all kinds of research topics, and accept almost any type of research. They’re designed to make work easy to cite, share, and discover.
Subject or discipline-specific repositories: These are more specialized and focused on one subject area. They usually have rules about what kind of data they’ll accept, based on what makes the most sense for that field.

Most of these repositories use open licenses, like Creative Commons, which means the data is free for others to access and reuse (with credit, of course!).

The Amherst College Library maintains a list of reputable and commonly used open repositories. Browse our curated list of general cross-disciplinary and subject-specific data repositories on our Open Access Repositories page.

How do I find out more?

If you’re curious to learn more about open access and trusted data repositories, check out these resources:

Amherst College Library Open Access
Amherst College Library Open Access Repositories
Amherst College Library Copyright - Creative Commons
Federal Agency Funding Requirements
FAIRsharing.org: a curated, informative, and educational resource on data and metadata standards and how they relate to databases and data policies.