
General NVMe Drive Problems (Fatal)

Motherboard
Asus z590 ROG Maximus XIII Hero
CPU
i9-11900K
Graphics
RX 6600 XT
There's discussion in the Monterey threads about slow boot times with Samsung drives. Dortania (vit9696) has narrowed this down to problems with TRIM support in Apple's NVMe driver, and is asking users to help with testing.

Moreover, according to the same article, Apple's NVMe driver is known to be fatal for Phison-based drive brands: workloads can cause the drive to overload and fail, losing all data, which I have seen myself!


//Incompatible with IONVMeFamily (die under heavy load):
  • GIGABYTE 512 GB M.2 PCIe SSD (e.g. GP-GSM2NE8512GNTD) (need more tests)
  • ADATA Swordfish 2 TB M.2-2280
  • SK Hynix HFS001TD9TNG-L5B0B
  • SK Hynix P31
  • Samsung PM981 models
  • Micron 2200V MTFDHBA512TCK
  • Asgard AN3+ (STAR1000P)
  • Netac NVME SSD 480//
+ SABRENT ROCKET 4
MY DIRECT EXPERIENCE OF 2 NEW DRIVES DYING WITHIN WEEKS

//dreamwhite commented on May 21:
I don't know if you'll ever consider the following, but apparently many models of Phison E12 (like the Sabrent Rocket NVMe 3.0 TLC with 64-layer NAND) have broken TRIM support and eventually lead to failure of the disk. In less than a year, I wrote just 5.2 TB of data to my 512 GB model and the drive's estimated life was already down to 94%. According to Sabrent's TBW warranty, the 500 GB 3.0 TLC model should have 800 TBW of life, which is a joke. Please avoid buying NVMes with the Phison E12 controller. I opted for a WD Black SN750 and it works flawlessly both with and without TRIM ^^//
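dreamwhite's numbers are worth a sanity check: 5.2 TB of host writes against a rated 800 TBW should consume well under 1% of the drive's endurance, yet the drive reported only 94% life remaining. A quick sketch of that arithmetic, using only the figures from the quote above (the "write amplification" reading at the end is my interpretation, assuming the extra wear comes from internal writes such as the broken TRIM behaviour):

```python
# Figures quoted by dreamwhite above (assumed accurate as reported).
host_writes_tb = 5.2        # total host writes, in TB
rated_endurance_tbw = 800   # Sabrent's rated endurance for the 500 GB TLC model
reported_life_pct = 94      # remaining life the drive itself reported

# Wear we would expect from host writes alone:
expected_wear_pct = host_writes_tb / rated_endurance_tbw * 100   # 0.65%

# Wear the drive actually reported:
observed_wear_pct = 100 - reported_life_pct                      # 6%

# Implied write amplification, if the extra wear were all internal writes:
amplification = observed_wear_pct / expected_wear_pct            # roughly 9x

print(f"expected wear: {expected_wear_pct:.2f}%")
print(f"observed wear: {observed_wear_pct}%")
print(f"implied write amplification: {amplification:.1f}x")
```

In other words, the drive burned through endurance roughly nine times faster than the host writes alone would explain.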

The Dortania report confirms my experience with Sabrent and WD regarding Phison. When I was building a new hack last spring I chose a Sabrent Rocket 4, which quickly died; a replacement from Sabrent also quickly died, as documented in these posts:

MY BUILD

DRIVE FAILURE INCIDENT

FOLLOWUP

—Sry for too many words, I like words

I replaced the Sabrent with a WD Black SN750 which so far has no problems.

I have also used a Samsung 980 Pro in this build, which I am now returning (literally mailing it today) because of file data loss and SMART media errors. I managed to recover back to the WD SN750 (ohh, the stories I could tell).

THE ABOVE DORTANIA GITHUB LINK WILL LEAD YOU TO THE FOLLOWING OPENCORE ENHANCEMENT TO SUPPORT TRIM TIMEOUT TESTING

//SetApfsTrimTimeout: integer
Failsafe: -1
Requirement: 10.14 (not required for older)
Description: Set trim timeout in microseconds for APFS filesystems on SSDs.
[EXPLANATION]
The APFS filesystem is designed in a way that the space controlled via the spaceman structure is either used or free. This may be different in other filesystems where the areas can be marked as used, free, and unmapped. All free space is trimmed (unmapped/deallocated) at macOS startup. The trimming procedure for NVMe drives happens in LBA ranges due to the nature of the DSM command, with up to 256 ranges per command. The more fragmented the memory on the drive is, the more commands are necessary to trim all the free space. Depending on the SSD controller and the level of drive fragmentation, the trim procedure may take a considerable amount of time, causing noticeable boot slowdown. The APFS driver explicitly ignores previously unmapped areas and repeatedly trims them on boot. To mitigate against such boot slowdowns, the macOS driver introduced a timeout (9.999999 seconds) that stops the trim operation when not finished in time. On several controllers, such as Samsung, where the deallocation process is relatively slow, this timeout can be reached very quickly. Essentially, it means that the level of fragmentation is high, thus macOS will attempt to trim the same lower blocks that have previously been deallocated, but never have enough time to deallocate higher blocks. The outcome is that trimming on such SSDs will be non-functional soon after installation, resulting in additional wear on the flash.
One way to work around the problem is to increase the timeout to an extremely high value, which at the cost of slow boot times (extra minutes) will ensure that all the blocks are trimmed. Set this option to a high value, such as 4294967295, to ensure that all blocks are trimmed. Alternatively, use over-provisioning, if supported, or create a dedicated unmapped partition where the reserve blocks can be found by the controller. Conversely, the trim operation can be disabled by setting a very low timeout value, e.g. 999. Refer to this for details.
//
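For reference, this quirk lives under Kernel → Quirks in OpenCore's config.plist. A minimal sketch of raising the timeout to the maximum, per the documentation quoted above (only the relevant keys are shown; merge into your existing config rather than pasting this over it):

```xml
<key>Kernel</key>
<dict>
    <key>Quirks</key>
    <dict>
        <!-- 4294967295 (max UInt32) effectively removes the trim timeout,
             trading longer boots for complete trimming. A very low value
             such as 999 disables trimming instead, and -1 (the failsafe)
             keeps Apple's default behaviour. -->
        <key>SetApfsTrimTimeout</key>
        <integer>4294967295</integer>
    </dict>
</dict>
```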

HTH
 
Important info, thanks for sharing your experience.
 
I have a WD Black SN750 and use the -1 value for the trim. Should I use a specific value?
 
Interesting. Does this problem only appear under macOS Monterey? Does it affect older macOS versions?

My Kingston A2000 and KC2500 NVMe M.2 SSDs use Silicon Motion controllers and so far no problems (I run only Mojave and Catalina on them).
 
I have also used a Samsung 980 Pro in this build, which I am now returning (literally mailing today) because of file data loss and SMART media errors.
Interesting. I just upgraded to Monterey from Big Sur yesterday. I am running a Samsung SSD 970 EVO Plus and not seeing any issues (yet) beyond very slightly slower boot times.
 
Interesting. Does this problem only appear under macOS Monterey? Does it affect older macOS versions?

My Kingston A2000 and KC2500 NVMe M.2 SSDs use Silicon Motion controllers and so far no problems (I run only Mojave and Catalina on them).
I've only noticed the issue in discussions relating to Monterey installed on some NVMe SSD drives, including the 970 EVO NVMe (which seems most prominent at this time), but that doesn't necessarily mean it's exclusive to Monterey.

In Big Sur, with OC 0.7.4 (and sometimes in 0.7.5), I've experienced an early boot hang which could possibly be related to (but not attributable to) the same root cause, whatever that may be. I assume that will be flushed out upstream when the devs get around to it. This hang was noticeable when OpenLinuxBoot.efi was enabled. I use standard SSD drives.
 
I've only noticed the issue in discussions relating to Monterey installed on some NVMe SSD drives, including the 970 EVO NVMe (which seems most prominent at this time), but that doesn't necessarily mean it's exclusive to Monterey.

In Big Sur, with OC 0.7.4 (and sometimes in 0.7.5), I've experienced an early boot hang which could possibly be related to (but not attributable to) the same root cause, whatever that may be. I assume that will be flushed out upstream when the devs get around to it. This hang was noticeable when OpenLinuxBoot.efi was enabled. I use standard SSD drives.
Running in verbose mode showed that my system got stuck on the following operation:
Code:
Doing boot task: bootroot
What is happening during that process? I restarted the system several times and this is the only bottleneck in my case. I'm using a 1TB 970 Evo Plus NVMe.
 

Attachments

  • IMG_0686.JPG (9.5 MB)
Running in verbose mode showed that my system got stuck on the following operation:
Code:
Doing boot task: bootroot
What is happening during that process? I restarted the system several times and this is the only bottleneck in my case. I'm using a 1TB 970 Evo Plus NVMe.
You should probably ask someone who owns a 970 EVO and is dealing with the issues at hand. Cheers.
 
Interesting. Does this problem only appear under macOS Monterey? Does it affect older macOS versions?

My Kingston A2000 and KC2500 NVMe M.2 SSDs use Silicon Motion controllers and so far no problems (I run only Mojave and Catalina on them).
This appears to be a long-standing issue, exclusive to some Phison and Samsung controllers.
 