Vigilance Regarding Data Corruption in Linux

Managing your own data on Linux can be a useful way to sustain long-term access to it. That requires knowing some of the limits in how Linux may handle personal data, and then taking the right precautions to keep information intact.

Where to Start?

Any operating system you use can crash. Stated guarantees cannot be proven, and real-world evidence shows that most mainstream systems crash. Never assume that what you are using is 100% reliable. That way you have the opportunity to reduce surprises such as ruined data.

Ubuntu is a brand of Linux-based operating system I have used since 2009. I had been on the 14.04 LTS version for nearly a year when it crashed hard, as I explain in Ubuntu Crash and Kernel Panic. The maintainers of Ubuntu were planning to update their Web version to systemd technology on 9 Mar. 2015. When my crash happened on 8 Mar. 2015, the Sunday prior, I considered that the two events were related.

Crashing can have ramifications. I speculated that online system updates to Ubuntu were a cause of the downtime I encountered. That now seems incorrect. It is more likely the result of the system awakening from a deep sleep after several hours, an issue due to ACPI support.

Slashdot discusses other issues with ACPI in the article, Linux Might Need To Claim Only ACPI 2.0 Support For BIOS. I can relate to that suggestion, and I go into some detail below about my own encounters with ACPI, in particular the risk of data corruption when running an operating system.

Crash Waking Up

The reason for my crash seems to have more to do with the way wake-up from sleep mode was handled. I rarely let my computer go to sleep, since I use it from boot-up to shutdown and hardly step away until I am done. Any time in sleep mode is usually between 1 and 10 minutes. Ubuntu handles those situations pretty well. Unfortunately, I did not shut down my computer when I began to handle another matter, and a few hours later I had problems.

Broken Blu-ray Player

The Blu-ray player, which I had owned for only 3 months and which was manufactured in Oct. 2014, was producing a loud squeaking noise. I looked on the web, only to realize this is a long-standing problem with certain Blu-ray players going back years, and there was no fix. I took the player apart and applied some machine oil. That fixed the squeak, but the player required the spindle to rotate at a certain speed, and so all my disks but 1 would play. After many hours, approaching midnight, I let it go. Manufacturing warranties aren’t my thing and I probably voided it anyway. The Blu-ray player now lives in a landfill.

Broken Operating System

Once I realized the Blu-ray player was going nowhere, it was time to clean up and put away the tools, test disks, cotton swabs, and disposable rags. I sat in front of the computer and saw that the light on the power button was off. I knew I had not shut it down, so it had gone into deep sleep. I thought nothing of that and just pressed the Enter key. Nothing. I pressed the power button and waited. It started coming back on. I logged in and saw some strange things. All of that is explained in more detail in my article, Ubuntu Crash and Kernel Panic.

Data Recovery

Using a Live USB I keep updated, I reached into the hard drive and pulled off all my data. That was not as straightforward as it sounds, since some of that data was locked from a security standpoint. A 5-minute process took an hour because of the numerous commands and examinations I had to conduct to successfully pull the data off the hard drive onto multiple external drives. I got into the habit of pushing to multiple drives because I’ve been burned hard in the past by relying too much on a single drive. I had the data, and I was pleased.

I am running an SSD, and power issues can be unkind to SSDs in some scenarios. ZDNet covers this in their article, How SSD power faults scramble your data. In my case, I have ruled that out, but it is good to be mindful of the possibilities.

Recovery

The point is that a system crash recovery procedure should be predicated on a good backup regime: reinstall the operating system, then restore configurations and data. Spending significant time on recovering an existing install can be less profitable unless a specific reason exists to attempt it. That is the approach I’ve taken.

ACPI Problems

The root of the issue is that ACPI is a standard co-developed by Microsoft (originally together with Intel and Toshiba). Most of the general laptops and desktops Linux is typically installed on as of 2015 are hardware systems designed for Microsoft Windows and not Linux. That is a very important condition. Gaps exist between what the developers of Linux can reasonably do and the level of support required to handle ACPI well across the board, on every system where Linux could be installed. Be careful with sleep mode.

Data Corruption

When some Linux-based operating systems wake from sleep mode, data can be ruined. Important files the system needs in order to run can be ruined, resulting in a crash. File systems that hold a copy of your data in a temporary bucket can be stopped prematurely. Information can be partially written, causing damage that takes heavy labor and a lot of time to recover from. All from sleep mode.

Deteriorating Backups

Backing up data is not enough, it turns out. Even though you send data to an external hard drive, that data may not have been written to the drive completely. It does not matter whether you use the command line, drag and drop, or a variation on the right-click "send to" method: the visual feedback can be an illusion.

The progress bar showing the copy disappears and you think it is done. You run the command on the command line and, after some time, the prompt returns. The right-click upload method finishes lightning quick, or the follow-on progress bar finishes, and all seems well. The truth is that the data transfer is finished from the point of view of the software, but the data may still sit in a queue (the kernel's write cache and the drive's own buffer) while the hardware works on writing it completely.

Some flash drives may not have appropriate indicator lights, but on an external hard drive the indicator lights tend to be more reliable. Anyone not noticing these indications may detach the drive from the USB port thinking the work is done. Even the safe removal functions in the operating system may falsely allow detachment. Premature removal may result in data not being written, or in a list of file names with no data behind them.
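
One way to narrow the gap between "the copy returned" and "the data is on the drive" is to ask the kernel for a flush explicitly instead of trusting the progress bar. The following is a minimal sketch in Python, not a definitive tool. The function name copy_and_flush is hypothetical and only for illustration. It relies on os.fsync, which asks the kernel to write a file's cached data to the device. A drive's internal buffer may still hold data briefly afterward, so unmounting before unplugging is still the safer habit.

```python
import os
import shutil


def copy_and_flush(src, dst):
    """Copy src to dst, then ask the kernel to push the data to the device.

    Hypothetical helper for illustration; returns the size of the copy in bytes.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # empty Python's own buffer into the kernel
        os.fsync(fout.fileno())  # ask the kernel to write the file out

    # Flush the containing directory as well, so the new file name is recorded.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    except OSError:
        pass  # some file systems do not support fsync on a directory
    finally:
        os.close(dir_fd)
    return os.path.getsize(dst)
```

Calling copy_and_flush("report.odt", "/media/backup/report.odt") (both paths are placeholders) returns only after the flush request has completed, which is a stronger signal than a prompt that merely came back.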

Reducing Data Corruption

Linux, in particular, does not always commit data immediately when you might prefer it done. You will lose data if you are not watchful. If you are going to use mainstream Linux-based operating systems, you will have to develop a habit and approach that involves dealing with data in a technical way. The graphical mechanisms are nice. They resemble the convenience you have in mainstream commercial operating systems. On Linux, though, they can be misleading.

Next, you would ideally verify data across different environments. That means using two different versions of Linux to verify data. Maybe not every day, but at least once a month. This is especially important for data that must outlive specific versions of hardware and operating systems. When uploading data to external hard drives, you cannot rely on convenience. You have to take the time to examine the result, using techniques that track I/O activity to its conclusion, including triggering a data flush on demand.
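
As a rough illustration of tracking pending I/O and triggering a flush on demand, the sketch below reads the Dirty and Writeback figures that Linux exposes in /proc/meminfo and calls os.sync() between two readings. The function names are hypothetical. The figures are system-wide, so on a busy machine they rarely reach exactly zero. Treat this as a way to watch the queue drain, not as proof that a particular file is safe.

```python
import os


def pending_write_kb(meminfo_text):
    """Return the Dirty and Writeback figures (in kB) from /proc/meminfo text."""
    pending = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("Dirty", "Writeback"):
            pending[name] = int(rest.split()[0])
    return pending


def flush_and_report(path="/proc/meminfo"):
    """Read pending writes, force a flush, then read them again (Linux only)."""
    with open(path) as f:
        before = pending_write_kb(f.read())
    os.sync()  # ask the kernel to write out all cached data
    with open(path) as f:
        after = pending_write_kb(f.read())
    return before, after
```

Running flush_and_report() right after a large copy typically shows a large Dirty value before the flush and a much smaller one after it. The sync command in a terminal requests the same flush.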

Frequent backup is far less useful than accurate backup. Most would prefer to have an old file than a newer, ruined file that is useless. Reducing data corruption in terms of backup is all about verifying the information. A portable version control system like Git could be useful in making sure you have versions. The key is to make sure that Git is writing the data properly, and that you can back up and recover your Git repository. That is a journey requiring tech-savvy commitment.
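
Verifying a backup can be as simple as comparing checksums of the originals against the copies. The sketch below is one possible approach, not something Git does for you, and the function names are hypothetical. It walks a source folder and reports every file that is missing from the backup or whose SHA-256 digest differs. One caveat is that a file read back right after copying may come from the kernel's cache and not from the drive. The check is more trustworthy after the drive has been unmounted and reattached, or when run from a second machine.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths of files missing from, or different in, the backup."""
    problems = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            if not os.path.isfile(dst) or sha256_of(src) != sha256_of(dst):
                problems.append(rel)
    return sorted(problems)
```

An empty list from verify_backup means every source file has a byte-identical copy in the backup folder. Anything listed should be copied again and rechecked.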

Back to the Start

I mentioned that no system is 100% reliable. Technology is constantly being updated, but it may be premature to say it is constantly being improved. Millions and millions of lines of code collectively cannot be vetted for every scenario. In the end, reliability and correctness are the responsibility of the person using the technology: not to fix it, but to be aware of how much reliability and correctness is available and to make arrangements as necessary. Choices and knowledge are indispensable to that effort.
