- Joined: Mar 29, 2012
- Messages: 11
- Motherboard: Gigabyte X299X Designare 10G - F3b - OpenCore UEFI
- CPU: i9-10940X
- Graphics: Radeon VII
After spending almost all of my very limited free time over the last six months working on debugging this motherboard's notorious hard reset → boot failure issue, I am happy to be able to finally announce that I have identified the apparent root cause of this issue, reported it to Gigabyte, and developed a prototype modified version of OpenCore with a workaround for this issue.
The root cause of this issue is a rather nasty bug in the motherboard's UEFI firmware. Long story short, any attempt to connect drivers to a specific child handle created by the AMI Generic LPC Super I/O Driver (at PciRoot(0x0)/Pci(0x1F,0x0)/Acpi(PNP0303,0x0)) while booted normally will (re)enable the ICC/OC Watchdog Timer (WDT) with an 8-second timeout.
Rather oddly, attempting to manually reload or disable the WDT via direct register writes (or by calling gBS->SetWatchdogTimer (0, 0, 0, NULL);) will not work at this point. I suspect that some other mechanism besides the WDT is also involved in the timed forced reset.
This ICC/OC WDT is a watchdog timer provided by the Intel X299 PCH that is primarily used to recover from an unstable overclock and similar situations. There are a great many technical details regarding how this timer works and how it interacts with various parts of the UEFI firmware, but for our purposes, all you really need to know is this: if the WDT is enabled and the timeout expires, the computer will immediately shut down, then start back up with a number of overclocking-related settings overridden by alternate settings (these are basically safe defaults).
Note that this does not actually modify the setup settings, which is why you will see a discrepancy between what is shown in the UEFI Setup and what is actually active for various overclocking-related options until you re-apply the settings via "save & exit" and return to a normal boot.
At this point, you will be shown the "Boot failure detected" warning dialog, at least on F3b and below (F3c is a whole different dumpster fire that I won't go into). If you enter setup and boot via boot override, you will be operating with the alternate settings. This is why you will, for example, lose your XMP profiles or your CPU overclocks when booting this way.
Until you re-apply settings via "save & exit", you will remain in this "boot failure" mode (I believe the generic AMI term for this may be "safe mode"), even after rebooting. This is why you will continue to encounter the "Boot failure detected" warning dialog on every boot if you keep booting by entering setup and booting via boot override.
Now here's the incredibly weird thing. When you're in this "boot failure" mode, attempts to connect drivers to that specific problematic child handle created by the AMI Generic LPC Super I/O Driver do not cause any issues. I don't have any good explanation for this difference in behavior. My best guess is that it's due to some sort of difference in the firmware initialization between the two cases interacting with poorly written UEFI driver and/or application code, but I can't really make an informed guess, and honestly, even then I am reaching.
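To make the observed behavior easier to follow, here is a toy model in plain C. It is only a sketch of what is described above, not firmware code: the `BoardModel` struct, every function name, and the single integer "setting" are hypothetical stand-ins, and the real mechanism inside the firmware is unknown.

```c
#include <assert.h>
#include <stdbool.h>

#define WDT_TIMEOUT_S 8  /* timeout reported in the post */

/* Hypothetical model of the board state; not a real firmware structure. */
typedef struct {
    int  setup_value;        /* what UEFI Setup shows (e.g. a CPU ratio) */
    int  safe_value;         /* alternate "safe default" used after a WDT reset */
    bool boot_failure_mode;  /* sticky: survives reboots until "save & exit" */
    bool wdt_armed;
    int  wdt_seconds_left;
} BoardModel;

void board_init(BoardModel *b, int setup_value, int safe_value) {
    b->setup_value = setup_value;
    b->safe_value = safe_value;
    b->boot_failure_mode = false;
    b->wdt_armed = false;
    b->wdt_seconds_left = 0;
}

/* The setting that is actually in effect, as opposed to what Setup displays. */
int active_value(const BoardModel *b) {
    return b->boot_failure_mode ? b->safe_value : b->setup_value;
}

/* Connecting drivers to the problematic handle arms the WDT, but only
 * during a normal boot; in boot failure mode nothing happens. */
void connect_problem_handle(BoardModel *b) {
    if (!b->boot_failure_mode) {
        b->wdt_armed = true;
        b->wdt_seconds_left = WDT_TIMEOUT_S;
    }
}

/* Advance time. Returns true if the WDT expired and forced a reset. */
bool board_tick(BoardModel *b, int seconds) {
    assert(seconds >= 0);
    if (!b->wdt_armed) {
        return false;
    }
    b->wdt_seconds_left -= seconds;
    if (b->wdt_seconds_left > 0) {
        return false;
    }
    b->wdt_armed = false;
    b->wdt_seconds_left = 0;
    b->boot_failure_mode = true;  /* stored setup_value is left untouched */
    return true;
}

/* A plain reboot (or boot override) does not clear boot failure mode. */
void board_reboot(BoardModel *b) {
    b->wdt_armed = false;
    b->wdt_seconds_left = 0;
}

/* Re-applying settings via "save & exit" returns to a normal boot. */
void setup_save_and_exit(BoardModel *b) {
    b->boot_failure_mode = false;
    b->wdt_armed = false;
    b->wdt_seconds_left = 0;
}
```

In this model, calling `connect_problem_handle` and then letting 8 seconds pass via `board_tick` flips the board into boot failure mode, after which `active_value` returns the safe default while `setup_value` is unchanged. Only `setup_save_and_exit` clears the flag.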
At any rate, further speculation on this bizarre behavior is not worthwhile at this point: only Gigabyte or AMI can fix the problem, as they're the only ones with access to the actual source code (as well as the tools required to properly debug this issue at the necessary level).
I submitted a ticket via Gigabyte's eSupport site on December 1st describing the issue in detail. I am still waiting for a response, but honestly, I'm not sure if or when we'll see a fix for this issue. Gigabyte hasn't released even a single UEFI firmware update for any of their X299X motherboards in almost a full year now, and that's a rather bad sign.
With regards to OpenCore, the primary reason it causes these boot failures is its use of recursive gBS->ConnectController calls combined with child path rollups.
The workaround I came up with and implemented is really not that great for a lot of reasons, and I'm not sure that it'd be suitable for inclusion upstream. At the absolute least, it would need to become configuration-based, and it almost certainly needs to be moved into a separate lib module or at least into separate functions within the existing lib.
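For readers unfamiliar with why a recursive connect reaches the problematic handle, here is a small plain-C illustration using a mock handle tree. It is not OpenCore code and not the workaround described in this post, whose details have not been published. It assumes, purely for illustration, that a hardcoded workaround could take the shape of skipping one blocked device path during the walk. `Node`, `connect_recursive`, `connect_recursive_filtered`, and `BLOCKED_PATH` are all hypothetical names, and the `connected` flag stands in for a real `gBS->ConnectController` call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Device path from the post, in textual form. */
#define BLOCKED_PATH "PciRoot(0x0)/Pci(0x1F,0x0)/Acpi(PNP0303,0x0)"

/* Mock handle: a textual device path plus child handles. */
typedef struct Node {
    const char  *path;
    struct Node *children[MAX_CHILDREN];
    size_t       child_count;
    int          connected;  /* set once "drivers" have been connected */
} Node;

/* Stand-in for a recursive connect: visits the handle and every
 * descendant. Returns the number of handles connected. */
size_t connect_recursive(Node *n) {
    assert(n != NULL);
    size_t count = 1;
    n->connected = 1;
    for (size_t i = 0; i < n->child_count; i++) {
        count += connect_recursive(n->children[i]);
    }
    return count;
}

/* Assumed workaround shape: same walk, but a handle whose path matches
 * `blocked` is skipped together with its whole subtree. */
size_t connect_recursive_filtered(Node *n, const char *blocked) {
    assert(n != NULL && blocked != NULL);
    if (strcmp(n->path, blocked) == 0) {
        return 0;  /* never touch the problematic handle */
    }
    size_t count = 1;
    n->connected = 1;
    for (size_t i = 0; i < n->child_count; i++) {
        count += connect_recursive_filtered(n->children[i], blocked);
    }
    return count;
}
```

With a tree of `PciRoot(0x0)` → `Pci(0x1F,0x0)` → two Super I/O children (one of them the blocked path), the unfiltered walk connects all four handles, while the filtered walk connects three and leaves the blocked one alone. A real implementation would have to compare actual device path protocol data rather than strings, and would need to avoid any call that lets the firmware recurse into the child on its own.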
I have not tested it very extensively yet (it's been less than 24 hours since I created the first fully working prototype/proof-of-concept version). It has worked fairly well in the limited testing I've done so far, with one exception: a slew of very perplexing errors that I ran into when I finally disabled or removed most of the overly verbose debug printing left over from development. These turned out to be caused by what look like some stupid timing-related issues, which were easy to work around temporarily, although a more robust solution is probably going to be a headache. Still, initial appearances can be deceiving.
Also, I'm not totally confident that this hardcoded approach will even work universally on this board across all configurations, settings, firmware revisions, etc. It probably will, but that'll need to be tested.
I plan to spend some time on testing this out in greater depth and to further clean up & improve the code before I push this to a public repo.
Anyways, I thought I might as well share what I've discovered and done in here. This has been a very long and painful journey, but the end is finally in sight (at least via the workaround if nothing else), and I've learned many interesting things along the way (although if I hadn't, there's no way I would have kept hacking away at this for so long, lol).