
Random errors in data from ALL disks in the system

Status: Not open for further replies.
Joined: Jul 23, 2015
Messages: 47
Motherboard: Asus Z97MX-gaming 5
CPU: i7-4790K
Graphics: GTX 960
Mac: iMac
Mobile Phone: Android
It all started when the machine rebooted during a backup, so I felt the urge to binary-compare the backup folder with its desktop equivalent. I assumed everything was OK. It wasn't.

I found numerous mismatched files all over the place, even in folders that weren't being processed at the time. I ran Disk Utility and it reported no problems. After a lot of troubleshooting I came to this horrifying conclusion:

I have random errors in the data coming from EVERY disk in the system, including external USB disks. In 10 GB of data split into files of roughly 700-800 MB each, I get about one or two errors. The funny thing is, the errors "move": after emptying the disk cache or rebooting, they usually pop up in OTHER files that previously checked out OK. For example, a zip on my system failed verification in BetterZip, and after a reboot the same zip suddenly verified OK again. I've also started seeing random weirdness in the system, like Chrome crashing 5 times in a row when opening the same page, DESPITE quitting and restarting it. Then all of a sudden it works again. I've had a few sudden kernel panics, especially during heavy disk use (duh!). VMware has had some hiccups, and at one point the guest OS suddenly wouldn't boot anymore. As you can probably tell, something is VERY VERY wrong.

How I'm testing this: I have a test folder with a set of movies on my NAS. Before copying these files to my troubled Hackintosh, I checksum them with an MD5 tool. After copying, I verify the copies against this MD5 file and find that some files are corrupted. After a reboot, other files are corrupted, while some that were corrupted before now check out OK. In other words: random errors in the data coming from the disk. This happens on all 3 internal disks (SSD and SATA) as well as on any external disk attached to the Hackintosh. Disk Utility doesn't find any errors. I've hex-compared a broken file to the correct one, and the differences were about a hundred places with changed bytes, 1-4 bytes at each position.
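The checksum-then-verify test described above can be scripted so it is easy to repeat before and after a reboot. A minimal sketch in Python 3, assuming nothing beyond the standard library; it writes a manifest in the `digest  relative/path` layout that GNU `md5sum` uses and re-checks it later. The function names `md5_of`, `write_manifest` and `verify_manifest` are made up for illustration and are not part of any existing tool.

```python
import hashlib
import os

CHUNK = 1 << 20  # hash 1 MiB at a time so multi-GB movie files never need to fit in memory


def md5_of(path):
    """Return the hex MD5 digest of one file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def write_manifest(folder, manifest):
    """Hash every file under `folder`, writing 'digest  relative/path' lines (md5sum-style)."""
    with open(manifest, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(folder):
            dirs.sort()  # walk subfolders in a deterministic order
            for name in sorted(files):
                full = os.path.join(root, name)
                rel = os.path.relpath(full, folder)
                out.write(f"{md5_of(full)}  {rel}\n")


def verify_manifest(folder, manifest):
    """Re-hash every file listed in `manifest`; return relative paths that are missing or differ."""
    bad = []
    with open(manifest, encoding="utf-8") as f:
        for line in f:
            digest, rel = line.rstrip("\n").split("  ", 1)
            full = os.path.join(folder, rel)
            if not os.path.isfile(full) or md5_of(full) != digest:
                bad.append(rel)
    return bad
```

Run `write_manifest` against the NAS copy, copy the folder over, then call `verify_manifest` on the local copy. Keep the manifest file outside the folder being hashed, otherwise it ends up hashing itself. On a machine with bad RAM, getting a different list of mismatches on each run (especially after a reboot or a cache purge) is exactly the "moving errors" symptom described above.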

The hard part now is telling which files are damaged and which aren't. Zip files are easy, DMGs perhaps too, but images and video files don't have any checksums to verify. I have a Time Machine backup going all the way back to when I built the Hackintosh, so assuming the problem wasn't there from the start, I do at least have good versions of the files I had back THEN. But how do I know where to draw the line date-wise? The only idea I can come up with is MD5'ing the entire first Time Machine snapshot and then comparing it against one snapshot at a time until I see a high number of mismatches. That would take forever. Do you guys have any better ideas on how to figure out which files are OK, or how far back I have to go in Time Machine?
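Since ZIP archives store a CRC-32 for every member, the "zip files are easy" part can at least be automated across a whole folder. A minimal sketch using Python's standard `zipfile` module, whose `ZipFile.testzip()` reads every member and returns the name of the first one that fails its CRC check (or `None`). The helper names `check_zip` and `scan_zips` are hypothetical, made up for this example.

```python
import os
import zipfile
import zlib


def check_zip(path):
    """Return None if every member's CRC-32 checks out, else a short description of the problem."""
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()  # reads every member and compares it to its stored CRC-32
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError) as exc:
        return f"corrupt archive: {exc}"  # damaged headers or a broken compressed stream
    except RuntimeError as exc:
        return f"could not verify: {exc}"  # e.g. encrypted members and no password given
    if bad_member is not None:
        return f"CRC mismatch in member {bad_member!r}"
    return None


def scan_zips(folder):
    """Walk `folder` and yield (path, problem) for every .zip file that fails verification."""
    for root, dirs, files in os.walk(folder):
        dirs.sort()
        for name in sorted(files):
            if name.lower().endswith(".zip"):
                path = os.path.join(root, name)
                problem = check_zip(path)
                if problem is not None:
                    yield path, problem
```

DMGs carry their own checksum as well, and on macOS `hdiutil verify` checks it. Keep in mind that while the bad RAM is still installed, any of these checks can raise a false alarm (or miss damage), so the results are only trustworthy after the RAM has been replaced.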

For me, and perhaps for some of you too, this is a wake-up call. A Hackintosh is cool and affordable, but sometimes weird errors sneak up on us and may even get into our backups, and one day you may sit there like a stupid, sorry mess, wondering whether ANY of your double or triple backups will actually save your files.

Note: I haven't installed ANY updates, not even security updates, since the last time I had the feeling that everything was working great, so I have NO idea where this is coming from.
 
What is the file system?

Journaled HFS+, the default and recommended format.

I kept digging around and pondering the issue until I ran memtest, and it spat out a never-ending stream of mismatches. It ran for about 2 seconds before complaining, so most likely the memory has gone to sh*t, or possibly the memory controller or the mobo, though my guess is the RAM. My biggest concern now isn't replacing the RAM but figuring out a way to determine which point I need to go back to in Time Machine after replacing it. It's incredibly hard to judge when it started and how to find that point.
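One way to avoid MD5'ing an entire snapshot for every comparison: on HFS+, Time Machine stores files that didn't change between backups as hard links to the earlier copy, so those can be skipped without reading them. What remains is files that were written again; among those, a file with the same size but different content is suspicious, especially movies and photos that are rarely edited in place. A minimal sketch, assuming you pass in two snapshot folders yourself (on HFS+ backup disks they usually live under `Backups.backupdb`, but check your own layout) and that hard-linked copies report the same device and inode. The function name `suspicious_changes` is made up, and it flags a legitimate same-size edit just as readily as corruption, so treat its output as hints rather than proof.

```python
import hashlib
import os


def _md5(path, chunk=1 << 20):
    """Hex MD5 of one file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def suspicious_changes(old_snapshot, new_snapshot):
    """Relative paths present in both snapshots with the same size but different content.

    Files hard-linked between the snapshots (same device and inode) are skipped,
    because they are literally the same data on the backup disk.
    """
    suspects = []
    for root, dirs, files in os.walk(old_snapshot):
        dirs.sort()
        for name in sorted(files):
            old_path = os.path.join(root, name)
            rel = os.path.relpath(old_path, old_snapshot)
            new_path = os.path.join(new_snapshot, rel)
            if os.path.islink(old_path) or not os.path.isfile(new_path):
                continue  # skip symlinks and files that were deleted or renamed
            old_st, new_st = os.stat(old_path), os.stat(new_path)
            if os.path.samestat(old_st, new_st):
                continue  # hard-linked: unchanged between the two snapshots
            if old_st.st_size == new_st.st_size and _md5(old_path) != _md5(new_path):
                suspects.append(rel)
    return suspects
```

Running it on consecutive pairs of snapshots, oldest first, and looking for the first pair where the number of suspects jumps is one way to narrow down the date. Do this only after the RAM has been replaced, since bad RAM can corrupt the reads of the backup itself.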

It is possible that the data on the disks is OK and that only the RAM is bad, so the system cannot correctly read back what is recorded on the disks.
 
Run MemTest86 for a full 24 hours to see if you have bad RAM.
 
I've considered that possibility, but think about it: as soon as a program reads a document, changes it in RAM and writes it back, the file is probably crap. What may save me is that this hasn't been going on for very long, so most data on the disks hasn't been affected. Unless, of course, the RAM is really bad and the wrong areas of the disks have been overwritten.

I'm still open to suggestions on how to validate which files are OK when there is no checksum available :/

Voiletdragon: it was the RAM.