General NVMe Drive Problems (Fatal)

c-o-pr · Jun 8, 2022

maxxx said:
You can get more info here.

Thank you. Due to the wonky nature of these forums you are quoting back to me the same information I offered to the forum back in the day before starting this thread, and for all I know back in the day I was repeating something others had already offered.

I hope you will agree that there's no info about what or why in that article. It's just a Scientology "run-down" of effects. No explanation of why.

I've tried to study the ATA spec and read a lot of history regarding the feature; I've been tracking the Trim subject for more than 10 years.

Trim, in general, is the single most mystical aspect of computing, followed maybe by divide-by-zero, but the comparison is not fair to divide-by-zero. The details have been lost to the web, but the best information I've ever found was early exchanges by Microsoft (Wintel) when the first internal SSDs were becoming available: their engineers got excited about getting the OS involved with the inner workings of new storage the same way they were excited about Winmodem, which was an early approach that gave the "valuable" Intel CPU something to do, modulating and demodulating, allowing systems integrators to save a couple of dollars of parts cost over using a dedicated modem. The ATA spec itself was one of these "inventions" in the days of SCSI, where an "active" bus terminator often cost hundreds of dollars (today a $5 USB drive is 100 billion times more complex than a SCSI terminator).

Way back, real Mac users howled that Apple would not enable Trim on third-party drives, claiming they were being denied a critical feature by a greedy company, but over in Unix-land it was well understood that Trim was a systems bugaboo with bad edge cases. Apple was simply helping users not shoot themselves in the foot.

As to if, when, and how Apple applies Trim on supported drives, the subject is itself a mystery, where all knowledge is anecdotal.

10 years ago, over at Ars Technica, you could read a PhD-dissertation-length diatribe on Flash pertaining to Trim, and you would come away thinking that the author has no idea why Trim is needed.

The best research I've ever seen on Trim says that it can be effective in a broader design to address wear-leveling, which is the most troubling aspect of SSD design. Most users think it makes their drives faster, but the opposite tends to be true. Trim itself is a workload, and one that actually may be avoided by the drive supplier. There's really no Trim spec, just an ATA opcode with endlessly morphing semantics.

You may ask yourself today: Why is Trim not supported, and never has been supported at all, on removable drives?

You may ask yourself why Linux implements Trim as two separate modes: 1) a "discard" mount option in fstab, and 2) the fstrim(8) command?

Getting back to the immediate subject at hand, what I meant by explanation in my earlier post is a reckoning over why some controllers freak out and die, what kind of Trim commands macOS issues and when, and why these commands are used.

The fact that Apple NVMe drives enable Trim by default is more about Intel than Apple. NVMe came out of an Intel-led industry workgroup (it predates Optane), and the spec bears Intel's stamp. So the previous Wintel ways of thinking are behind it.

As to what Apple does with it, that's anybody's guess. Too bad everything we need to know is locked in a shed outside the walls of the garden.

Today, if the hackintosher "disables" Trim using SetApfsTrimTimeout, is that a good thing? If so, why is Trim even a feature? If it's a bad thing, how do you feel about doing something bad to your drive to get your hack to work? How good or bad is it?! Does it matter that your data is at stake?

I started this thread because I lost two NVMe drives and all their data during the first month of building a new hack. As far as I know there will never be any explanation, and any other drives could blow at any moment!

But then I guess things have always been like this.

c-o-pr · Jun 9, 2022

980 Pro update Part Deux

The boot delay came back and lasted almost 15 minutes, including a 10 min stretch of heavy drive activity during mid boot and another 5 mins of heavy activity during late boot, which then turned into a black desktop with mouse cursor and normal drive activity but no workee.

The last console message right before the mid-boot stall was something about "BOOT dirscleaner" (I'm an idiot for not taking a pic).

At the time I was doing all kinds of stuff with my hack, including

• Just deleted APFS snapshots on the 980 made by CCC
• Working with an NVMe external drive which sometimes causes macOS to go loopy (I think it is related to having a full Monterey install on the ext drive, because another similar NVMe ext drive works fine and all drives work fine in M.2 slots, wheee!)
• Fiddling with BIOS Active State Power Management (ASPM) features
• And recently had been working with BIOS overclock tweaks...

Am I asking for it, or what?!

I guess I thought I had some actual news here but reading my own report it looks kinda dumb... oh well

(annd Post Reply)

maxxx · Jun 10, 2022

c-o-pr said:
Thank you. Due to the wonky nature of these forums you are quoting back to me the same information I offered to the forum back in the day before starting this thread, and for all I know back in the day I was repeating something others had already offered.

...

Today, if the hackintosher "disables" Trim using SetApfsTrimTimeout, is that a good thing? If so, why is Trim even a feature? If it's a bad thing, how do you feel about doing something bad to your drive to get your hack to work? How good or bad is it?! Does it matter that your data is at stake?

I started this thread because I lost two NVMe drives and all their data during the first month of building a new hack. As far as I know there will never be any explanation, and any other drives could blow at any moment!

But then I guess things have always been like this.

What can I say? This is Hackintosh to begin with; it is not supported by Apple or anyone for that matter, it's just a hack. So if you lost an SSD or two, don't fret about it. I've had tens of SSDs that I played with, and the most recent victim is a Patriot Viper (using the new Rainier controller designed by InnoGrit); it is a dud. Stick with what's known to be safe, but where is the fun in exploring if you're playing safe? It's annoyingly fun.

I hardly touch my MacBook or Mac mini, because nothing is broken.

UtterDisbelief · Jun 10, 2022

c-o-pr said:
Thank you. Due to the wonky nature of these forums you are quoting back to me the same information I offered to the forum back in the day before starting this thread, and for all I know back in the day I was repeating something others had already offered.

...

Like you I find the magic "trim" aura confusing. For many years now though, I have actively disabled it using the "trimforce" Terminal command, if it is ever made active, and have never suffered any poor performance or longevity issues. My first ever SSD is no longer a boot drive but in a USB caddy for offline storage (SSDs may fail to write eventually, but can still be read if the controller holds up).

What's more the SSDs I use all feature firmware "wear levelling" and "garbage collection", so why use software?

However, my simplistic view has been shouted down on occasion ...

c-o-pr · Jun 10, 2022

maxxx said:
... it's annoyingly fun. I hardly touch my macbook nor mac mini, because nothing is broken.

If I had had a good experience with an NVMe and then run into a glitch, I would be more easygoing, but it went to hell with my first drive, then the replacement. It took almost 2 months to work it out with Sabrent, and their support was very reluctant to stand behind their product. Huge PITA. Then the 3rd, a Samsung 980 Pro, got hard errors, and dealing with Samsung was slightly easier. The 980 Pro seemed great, then a firmware update in Jan brought on the boot delay... What fun!

When I got started on this a year ago, I thought I'd split my time between tinkering with the hack and putting the hack to good use on other projects, replacing a 2008 Mac Pro. The good use got eclipsed by the tinkering, and the Mac Pro is still on my desk because it's reliable. The hack has become a bugaboo because if I switch over, I have to keep another computer in reserve for when it poops the bed... I guess I didn't really think this through.

I do enjoy the tinkering though, and have been learning a lot about self-determination over my PC. It has also made me more jaded.

I think I belong on Linux, but when I'm using Linux, WRT habits it's 'garr, why does it feel wrong'. I am a Machead, and so doomed to a struggle of bearing the contradictions.

maxxx · Jun 11, 2022

c-o-pr said:
If I had a good experience with a NVMe then ran into a glitch, I would be more easy going, but it went to hell with my first drive, then the replacement. It took almost 2 months to work it out with Sabrent and their support was very reluctant to stand behind
I do enjoy the tinkering though, and been learning a lot about self determination over my PC. It has also made me more jaded.

I think I belong on Linux, but when I'm using Linux WRT to habits it's 'garr why does it feel wrong'. I am a Machead, and so doomed to a struggle of bearing the contradictions.

I feel you there. I started the hack about a year ago too. I've been on Windows for as long as I can remember; Linux not so much, I only did a few installs with Slackware and Ubuntu. It's not my thing.

I've done 6 PC builds for these hacks, almost a kind of addiction, costing a small fortune, sometimes due to my own failure to read the manual/forum; e.g., I bought an RX 6700 XT, which turned out not to be supported by macOS. Now I hardly look back to Windows.

It sounds to me like your problem is/was related to the NVMe. Get an SN750 or SN850 as a better option/future upgrade; I bet it will all go smoothly. The price of SSDs is dropping like a rock at Newegg and Amazon, and it will be a few bucks well spent.

hardcorefs · Jul 9, 2022

UtterDisbelief said:
Like you I find the magic "trim" aura confusing. For many years now though, I have actively disabled it using the "trimforce" Terminal command, if it is ever made active, and have never suffered any poor performance or longevity issues. My first ever SSD is no longer a boot drive but in a USB caddy for offline storage (SSDs may fail to write eventually, but can still be read if the controller holds up).

What's more the SSDs I use all feature firmware "wear levelling" and "garbage collection", so why use software?

However, my simplistic view has been shouted down on occasion ...

Because they pertain to completely different things.

The computer & storage devices are both black boxes.
(I wrote a dissertation on why forensic write blockers were potentially crap.
Basically, "how does sticking a 3rd black box between 2 other black boxes guarantee the integrity of anything?")

Anyway back to the original thesis, one pertains to things the storage device knows & controls, the other that only the computer knows.
The "trim" is a half assed attempt at fixing that.

It does not help that SSD devices stole a lot of terms then misused them, completely confusing the situation.

I suspect that many devices are broken due to shitty internal firmware in the flash controllers, and crap going offline or becoming unserviceable is due to more bugs in storage parameters.
Back in 2006 I built a realtime system to analyse flash controllers,
and on some I found that writing certain data strings to certain flash chip pages could totally erase the devices,
and in some cases these were the SAME pages that were used to store user data.....
And I thought, what are the odds...... of the data in a user's file triggering this.

It looked like "ransomware" or "military" code had made its way into the flash controller chips..
but in reality I suspect it was a recovery system used in mass production, for testing or if they ****ed up the firmware (slap on an SPI probe, write the bytes & the chip recovers to ROM base code).
Many of these flash controllers have a ROM booter that loads a default firmware, then runs it;
the default firmware then uses a "wedge table" to take over from the booter, sometimes totally, other times partially.
Then this in turn wedges out fixed ROM code via additional "wedge tables", so you can imagine that if you have a bad firmware, it is possible to get a product into a situation where you cannot get new firmware back in.

In reality I'm surprised we manage to get any of this crap to be reliable....

If I were 30 years younger, it would be an interesting project to pull the code out of some of these devices.

UtterDisbelief · Jul 10, 2022

hardcorefs said:
Because they pertain to completely different things.

...

Thanks for your explanation. I appreciate that. :thumbup:

As end-users we have to take on trust that manufacturers and OS coders are making best use of the SSD medium. My own point was only that macOS's built-in "trim" commands seem, from personal experience, superseded by routines nowadays included in SSD controller firmware. I've had zero issues not using "trim", be it different or similar to other techniques, for speed or longevity. I'm not qualified to dissect the subject with a forensic eye! So I appreciate your view. Cheers.

c-o-pr · Jul 21, 2022

hardcorefs said:
Anyway back to the original thesis, one pertains to things the storage device knows & controls, the other that only the computer knows.
The "trim" is a half assed attempt at fixing that.

Trim is no doubt an absurdly malformed design. If you study it, it looks worse and worse.

My orienting observation about the design of Trim is that it's a form of write, and if receiving writes can help the storage controller manage free space, well... the storage controller can just look at regular writes! Put another way, any time a write is issued, the device has learned it can garbage collect the flash under the written extent. From there, you need a page table to map interface extents (LBAs) onto free flash. When the write comes in, map fresh pages under it and send the pages that had previously been mapped to the garbage collector. Putting VM into a storage controller should be no big deal these days. The secret sauce will be workload tailoring.
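To make the page-table idea concrete, here is a minimal sketch in Python. It is a toy model only, not how any real controller firmware works: the class name ToyFTL, the one-LBA-per-page mapping, and the "erase everything stale at once" collector are all assumptions made up for illustration.

```python
class ToyFTL:
    """Toy flash translation layer (hypothetical, for illustration only).

    Maps each LBA to one physical page. Real controllers deal with erase
    blocks, wear counts, and much more; none of that is modelled here.
    """

    def __init__(self, num_pages):
        self.free_pages = list(range(num_pages))  # erased pages, ready to program
        self.page_table = {}                      # LBA -> physical page
        self.garbage = []                         # stale pages awaiting erase

    def write(self, lba):
        """Map a fresh page under lba; the previously mapped page becomes garbage."""
        if not self.free_pages:
            self.collect()
        if not self.free_pages:
            raise RuntimeError("no free flash pages")
        new_page = self.free_pages.pop(0)
        old_page = self.page_table.get(lba)
        if old_page is not None:
            # An overwrite tells the device the old page is no longer needed.
            self.garbage.append(old_page)
        self.page_table[lba] = new_page
        return new_page

    def collect(self):
        """Pretend to erase stale pages and return them to the free pool."""
        self.free_pages.extend(self.garbage)
        self.garbage.clear()
```

Writing the same LBA twice, e.g. `ftl = ToyFTL(4); ftl.write(7); ftl.write(7)`, leaves the first page in `ftl.garbage` without any Trim-like command being involved. Note that in this model an LBA that was written once and never written again stays in the page table indefinitely, since the device only learns about stale pages from overwrites.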

To stay sane about SSDs, I just keep in mind my assumption that when MSFT started playing with the early products, they had very little logic in the drive controller, they wanted to adapt it to ATA, and engineers saw a short-term opportunity to add value to the OS. There's a long history of such compromises, e.g., CHS addressing in spinning drives, where the driver writer had to put a layout map for the specific drive into the OS. It wasn't nefarious; it's just that logic was expensive and you had already paid for a huge general-purpose logic block, so use it! Today, things are vastly more complex. I've been looking at old Byte magazines from the late 70s and 80s and my mind is boggling at how far the complexity has come.

Strangely, when Lisp was invented, and much later when Xerox PARC invented the PC as we know it, they were working with a billion times less power than we have, yet all the key elements of what we use today were engineered; the seeds of what the character Kevin Flynn in Tron Legacy, played by old Jeff Bridges, called the ISOs: the isomorphic programs; the code that gives rise to itself.

There's a crazy kind of hope in such stories... In a weird way, the challenges are now greater than ever.

ghat · Jul 30, 2022

hi
I don't know if I have this problem, but my NVMe seems to have died or something, after I kept my box shut off for a month. It was running macOS Big Sur. I had ordered a SS 980 Pro to replace the SS 960 I currently have on the box (which apparently has failed), but just cancelled the order, and ordered a WDC Black SN850.
