Data is More Important Than Hardware or Software

Time can pause and your heart thud rapidly upon the realization of data loss. Vint Cerf, a co-creator of the Internet, has warned that even if you still have your data, you may not be able to read it. You back up your files, year after year, and one day you try to open them and are surprised to find that you cannot read them anymore. The reason is that the programs or apps you used to create and view the files have evolved in a way that makes your files obsolete. As Cerf warns, it happens.

Read-only Archival File Format

If you never need to change the file again, save it as a PDF. PDF began as an Adobe format and is now an ISO standard, so PDF files hold up very, very well. Even if the information changes, you can save snapshots of the data as PDF. This will stand you in good stead. Make sure the PDF is saved in an archival profile such as PDF/A, with no custom settings.

Source Formats

PDF is good for snapshots, but what about the source files for those PDFs, and what about things that are visual rather than written? Use the most open formats you can find that are supported everywhere. Open source software can live forever, and the method for reading and writing its files can be inspected in the source code. With an open format, someone out there can always work out how to decode the files again. Great formats to use are:

Documents, Spreadsheets, Presentations

LibreOffice using the Open Document Format

Digital Video

MPEG-4

Digital Graphics

SVG

Photos from Camera or Post Processed

RAW

GIMP (its native XCF format)

Digital Audio

FLAC

Special Mention about Plain-Text

A plain-text file is the most universal and potentially the most long-lived format you can use, but only if you take care. Plain-text files can still get corrupted so that they are not immediately readable. Even when this happens, you can usually recover most or all of the information from a plain-text file, where recovery would be far less feasible with other formats.

  • Try to use the ASCII character set for plain-text files.
  • ASCII is among the oldest and surest character sets to recover.
  • Most recovery tools, even advanced ones, may become confused by Unicode where ASCII will work.
  • Verify ASCII files whenever they are backed up.
  • A second copy of the ASCII files saved as PDF can improve recovery efforts in the future.
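The "verify ASCII files" step above can be sketched in a few lines of Python. This is only a minimal illustration, not a complete verification tool: the function name check_ascii_file is made up for this example, and recording a SHA-256 checksum is an assumption about what "verify" should mean (it lets you compare the file against the same checksum at the next backup).

```python
import hashlib
from pathlib import Path


def check_ascii_file(path):
    """Return (is_ascii, sha256_hex, first_bad_offset) for one file.

    Hypothetical helper for illustration only.
    """
    data = Path(path).read_bytes()
    # ASCII bytes are 0-127; look for the first byte outside that range.
    bad_offset = next((i for i, b in enumerate(data) if b > 127), None)
    # The checksum can be stored next to the backup and compared later.
    digest = hashlib.sha256(data).hexdigest()
    return bad_offset is None, digest, bad_offset
```

If the first value comes back False, the third value tells you where in the file the first non-ASCII byte sits, which is usually enough to find and replace the offending character before the file goes into the backup.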

Databases

Although databases seem complex, they can nearly always be exported to plain-text. Put those ASCII files in a standardized, compressed file format. Accompany the compressed data with a document on how to load the files back into a database structure and all is well. A delimited file can always be refitted into a high speed binary format. The reverse, getting your data back out of a binary format you can no longer read, is far less feasible, so do not become complacent with binary formats.
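As a rough sketch of that export, here is how it might look for a SQLite database using only the Python standard library. SQLite, the ZIP container, the README.txt name and the export_database function are all assumptions made for this example; other databases have their own export tools. Text is written as UTF-8, which is byte-for-byte ASCII as long as the data itself is ASCII.

```python
import csv
import io
import sqlite3
import zipfile


def export_database(conn, archive_path):
    """Dump every user table to CSV inside one ZIP file; return table names.

    Hypothetical helper: conn is an open sqlite3 connection.
    """
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    notes = ["Each <table>.csv holds one table; the first row is the column names.",
             "Recreate the tables with the statements below, then import each CSV.",
             ""]
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for table in tables:
            cursor = conn.execute(f'SELECT * FROM "{table}"')
            buffer = io.StringIO(newline="")
            writer = csv.writer(buffer)
            writer.writerow([col[0] for col in cursor.description])  # header row
            writer.writerows(cursor)                                 # data rows
            archive.writestr(f"{table}.csv", buffer.getvalue())
            # Keep the original CREATE TABLE statement as restore instructions.
            schema = conn.execute(
                "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
                (table,)).fetchone()[0]
            notes.append(schema + ";")
        archive.writestr("README.txt", "\n".join(notes) + "\n")
    return tables
```

Calling export_database(sqlite3.connect("mydata.db"), "mydata-export.zip") (both file names are placeholders) would leave one archive holding a CSV per table plus the restore notes, which is the "data plus instructions" bundle described above.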

Storage

You can store your data long term on hosted services such as GitHub or Amazon. If the data is not sensitive, those are good options for long-term storage. Still, something can happen that cuts off your access to those services, so local storage will be more dependable. The most reliable storage technology is actual hard drives. As long as you make duplicate backups onto multiple hard drives and swap those drives out every few years, you will have durable access to the data. Make sure the data has been fully written to the drives before you power them down, and that you power them down properly.
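Here is a minimal sketch of the "duplicate backups, fully written" idea in Python. The function names are invented for this example, and comparing SHA-256 checksums is an assumption about how to confirm the copies match. Note that a read-back right after copying may be served from the operating system's cache, so this is a sanity check rather than proof that the drive itself is healthy.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Checksum a file in chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, destination_dirs):
    """Copy source into each directory; return {copy_path: checksum_matches}.

    Hypothetical helper: each directory would sit on a different drive.
    """
    expected = sha256_of(source)
    results = {}
    for directory in destination_dirs:
        target = Path(directory) / Path(source).name
        shutil.copy2(source, target)
        # Ask the OS to flush the copy to the device before trusting it.
        with open(target, "rb+") as handle:
            os.fsync(handle.fileno())
        results[str(target)] = sha256_of(target) == expected
    return results
```

Any False in the returned dictionary means that copy does not match the original and should be redone before the drive is unplugged.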
