
[SOLVED] NVIDIA related kernel panic, chronic, severe...

Status
Not open for further replies.
My shiny new Hackintosh
https://www.tonymacx86.com/threads/...ene-i7-6700k-asus-geforce-gtx970-gigi.202270/

Hardware Overview:
Model Name: Mac Pro
Model Identifier: MacPro3,1
Processor Name: Intel Core i7
Processor Speed: 4.01 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB
Memory: 16 GB
Bus Speed: 400 MHz
Boot ROM Version: MP31.006C.B05
SMC Version (system): 1.25f4

The machine has been working perfectly since final assembly and testing about a week ago. It's been absolutely problem-free, and every OSX app and function has "just worked". But then [cue Jaws theme]:

Yesterday evening while playing a favourite game I heard audio stutter, then saw video stutter, then WHAM, crash -- not just to desktop but to a reboot (kernel panic). "That was strange," sez I, and when boot was complete I cautiously restarted the game, then played for a couple of hours without incident. Thought it might have been a power glitch or gremlins, who knows. A once-off event. But tonight, I can't do anything gpu intensive for more than a few minutes without a kernel panic:


Tue Sep 27 20:48:25 2016

*** Panic Report ***
panic(cpu 0 caller 0xffffff7f9721aeab): NVRM[0/2:0:0]: Read Error 0x006100c0: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff82108ae000 0x124020a1, D0, P3/4
[...]

Tue Sep 27 20:56:55 2016

*** Panic Report ***
panic(cpu 6 caller 0xffffff7f8aa1aeab): NVRM[0/2:0:0]: Read Error 0x00100f04: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8203f8e000 0x124020a1, D0, P3/4
[...]

Tue Sep 27 21:02:59 2016

*** Panic Report ***
panic(cpu 0 caller 0xffffff7f8941aeab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff820296e000 0x124020a1, D0, P3/4
[...]

Tue Sep 27 21:12:27 2016

*** Panic Report ***
panic(cpu 0 caller 0xffffff7f88f10eab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8200e7e000 0x124020a1, D0, P3/4
[...]


In every case the backtrace leads to the NVIDIA driver and Apple IOKit.
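
For anyone comparing their own logs, here is a minimal Python sketch that pulls the fields out of the four panic headlines above. The field names (`code`, `cfg`, `bar0`) are my own labels, not NVIDIA's, and the regular expression only assumes the layout seen in these four lines. One thing it makes obvious: all three CFG words read back as 0xffffffff every time. On PCI/PCIe, all-ones is what a read usually returns when the device does not answer, so this pattern may point at the card dropping off the bus rather than at one particular driver build. Treat that reading as an assumption, since the driver's log format is not documented here.

```python
import re

# Panic headlines copied verbatim from the four reports above.
PANIC_LINES = [
    "panic(cpu 0 caller 0xffffff7f9721aeab): NVRM[0/2:0:0]: Read Error 0x006100c0: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff82108ae000 0x124020a1, D0, P3/4",
    "panic(cpu 6 caller 0xffffff7f8aa1aeab): NVRM[0/2:0:0]: Read Error 0x00100f04: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8203f8e000 0x124020a1, D0, P3/4",
    "panic(cpu 0 caller 0xffffff7f8941aeab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff820296e000 0x124020a1, D0, P3/4",
    "panic(cpu 0 caller 0xffffff7f88f10eab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8200e7e000 0x124020a1, D0, P3/4",
]

# Layout inferred from the lines above only; other driver versions may differ.
PANIC_RE = re.compile(
    r"panic\(cpu (?P<cpu>\d+) caller (?P<caller>0x[0-9a-f]+)\): "
    r"NVRM\[[^\]]+\]: Read Error (?P<code>0x[0-9a-f]+): "
    r"CFG (?P<cfg>0x[0-9a-f]+(?: 0x[0-9a-f]+)*), "
    r"BAR0 (?P<bar0>0x[0-9a-f]+)"
)


def parse_panic(line):
    """Return the headline fields as a dict, or None if the line does not match."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "code": int(m.group("code"), 16),   # value printed after "Read Error"
        "cfg": [int(w, 16) for w in m.group("cfg").split()],
        "bar0": int(m.group("bar0"), 16),   # first word printed after "BAR0"
    }


def cfg_all_ones(report):
    """True when every CFG word is 0xffffffff (assumed: device not answering)."""
    return all(word == 0xFFFFFFFF for word in report["cfg"])


reports = [parse_panic(line) for line in PANIC_LINES]
for r in reports:
    print(f"cpu={r['cpu']} code={r['code']:#010x} "
          f"bar0={r['bar0']:#010x} cfg_all_ones={cfg_all_ones(r)}")
```

The value after "Read Error" changes between panics while the CFG words and the first BAR0 word stay the same, so the common factor across all four crashes is the all-ones read, not any one failing value.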

I can't play the aforementioned game and I can't run the Heaven benchmark any more; it runs for a while -- not a consistent length of time, but "sorta" the same length of time -- and then wham, audio glitch, audio loop, screen goes dark, crash. And about 3 out of 4 times the reboot does not succeed; I get lots of disk LED blinking but a dark screen. However, if I cycle power, I get a normal boot with both screens working... so long as I just run ordinary apps that don't challenge the GTX970. Hackie seems perfectly normal so long as I don't stress the GPU.

I note that my GPU temp has been reaching 70C when playing a demanding game, and reached 66C before the Heaven benchmark crashed. Is this "too hot" for the GTX970? Googling around, the consensus seems to be that 70C during demanding gameplay is not unusual, and 90C is the trouble threshold.

In some discussions about stock Apple Macs, I have seen advice along the lines of "disable your NVIDIA GPU and use only the native graphics." Since the whole point of my Hackie is to support the GTX970 this is not helpful :)

I am running NVIDIA Web Driver 346.03.15f02 -- is this a dangerous revision? Should I roll forward, or back, to some other release? [See below: I've tried every valid release of the web driver and nothing helps so far.]


Lastly I ask with dread... does this mean my expensive new Asus GeForce GTX970 is defective? What is an NVRM Read Error and whose NVRAM is being read? mobo? gpu? or does NVRM mean my M2 SSD, the system disk? What diagnostics can I use to locate the problem definitely in one component? Is this a known symptom of something such as overheating, card not properly seated, or other "dumb" physical causes that I might be able to remedy? I am about to try running Geekbench for the GPU to see if one particular test mode always causes the panic.

I'm bewildered because I have not changed *anything* about my Hackie since I got it running, yet all of a sudden the kernel is panicking after just a minute or two of any graphics-intensive app. So it sounds to me like hardware failure and that means hassle and expense (sigh).

POSTSCRIPT: I just found a link here at tonymacx86 that might address my issue:
https://www.tonymacx86.com/threads/...e-to-nvidia-driver-346-03-15f02.201022/page-8
What a relief it will be if this works. Will report back.

UPDATE: nope, too bad. I tried:
(a) rolling back to 15b01; that did not stop the NVRM panic when running Heaven.
(b) 15f01; no good either, panic after a similar number of seconds.
(c) back to 15f02; no good, panic.
(d) updating OSX via the App Store and trying 15f03, which is compatible with the latest patch; no good, the Heaven benchmark still panics.

I am now out of ideas, other than to have a long talk with my retailer. Please tell me my GTX970 is not broken...

FINAL POSTSCRIPT:
https://www.******.com/r/hackintosh/comments/4fxlow/gtx_970_panic_restarts_under_heavy_load/
This sounds similar. Should I change my system definition from MacPro 3,1 to iMac 15,1? Might this help?
I also have read in one similar case a suggestion that too weak a power supply might cause the GPU to fail in this way. Does anyone think my P/S is underpowered for the configuration?
 
It's getting worse. Crashed at 0144 this morning, unattended, idle. Nothing running but desktop, browser, Steam, Hardware Monitor. Same error. I would be pretty sure it was hardware by now, except that too many other people are having similar crashes; it seems more likely that there's an insidious driver issue than that Asus has released a dud board.

Code:
Anonymous UUID:       037C27B0-51E4-83E3-95A0-DC9BE0A3AB89

Wed Sep 28 01:44:42 2016

*** Panic Report ***
panic(cpu 0 caller 0xffffff7f9bb18f1b): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8213a8e000 0x124020a1, D0, P3/4
Backtrace (CPU 0), Frame : Return Address
0xffffff9216893810 : 0xffffff8019adab52
0xffffff9216893890 : 0xffffff7f9bb18f1b
0xffffff9216893950 : 0xffffff7f9bbf473e
0xffffff92168939b0 : 0xffffff7f9bbf47e9
0xffffff92168939e0 : 0xffffff7f9bea8141
0xffffff9216893a30 : 0xffffff7f9bea7801
0xffffff9216893aa0 : 0xffffff7f9bc21f73
0xffffff9216893ac0 : 0xffffff7f9bb1f43e
0xffffff9216893b70 : 0xffffff7f9bb1ccd4
0xffffff9216893d60 : 0xffffff7f9bb1df09
0xffffff9216893e40 : 0xffffff7f9c041078
0xffffff9216893e80 : 0xffffff7f9c002f2f
0xffffff9216893ea0 : 0xffffff7f9c050692
0xffffff9216893ec0 : 0xffffff7f9c050879
0xffffff9216893ef0 : 0xffffff801a0b52a6
0xffffff9216893f40 : 0xffffff801a0b3111
0xffffff9216893f80 : 0xffffff801a0b3206
0xffffff9216893fb0 : 0xffffff8019bc9117
      Kernel Extensions in backtrace:
         com.nvidia.web.NVDAResmanWeb(10.1.1)[372259D5-EEF0-3278-8478-8E09B6A46FDF]@0xffffff7f9bab2000->0xffffff7f9bd91fff
            dependency: com.apple.iokit.IOPCIFamily(2.9)[5447B943-A94D-3BD4-A60F-98B24D19CE93]@0xffffff7f9a24b000
            dependency: com.apple.iokit.IONDRVSupport(2.4.1)[4EB2843C-C821-3AD0-B333-575FD6ED6FB1]@0xffffff7f9ae9f000
            dependency: com.apple.iokit.IOGraphicsFamily(2.4.1)[A360453D-2050-3C49-A549-AC0DD5E87917]@0xffffff7f9ae58000
            dependency: com.apple.AppleGraphicsDeviceControl(3.12.8)[81C2784E-285A-38A7-A16E-515DCB816E0A]@0xffffff7f9baac000
         com.nvidia.web.NVDAGM100HalWeb(10.1.1)[65EFFD2C-C437-3581-BBE8-F54D24FD909D]@0xffffff7f9bd92000->0xffffff7f9bf8efff
            dependency: com.nvidia.web.NVDAResmanWeb(10.1.1)[372259D5-EEF0-3278-8478-8E09B6A46FDF]@0xffffff7f9bab2000
            dependency: com.apple.iokit.IOPCIFamily(2.9)[5447B943-A94D-3BD4-A60F-98B24D19CE93]@0xffffff7f9a24b000
         com.nvidia.web.GeForceWeb(10.1.1)[60573B81-B200-3DA9-A60C-C86EFBD9B8D8]@0xffffff7f9bffa000->0xffffff7f9c089fff
            dependency: com.apple.iokit.IOPCIFamily(2.9)[5447B943-A94D-3BD4-A60F-98B24D19CE93]@0xffffff7f9a24b000
            dependency: com.apple.iokit.IONDRVSupport(2.4.1)[4EB2843C-C821-3AD0-B333-575FD6ED6FB1]@0xffffff7f9ae9f000
            dependency: com.nvidia.web.NVDAResmanWeb(10.1.1)[372259D5-EEF0-3278-8478-8E09B6A46FDF]@0xffffff7f9bab2000
            dependency: com.apple.iokit.IOGraphicsFamily(2.4.1)[A360453D-2050-3C49-A549-AC0DD5E87917]@0xffffff7f9ae58000
            dependency: com.apple.iokit.IOAcceleratorFamily2(205.11)[569DA297-BC38-35C0-B909-6E8686BE0928]@0xffffff7f9bf8f000

BSD process name corresponding to current thread: kernel_task
Boot args: dart=0 mbasd=0 nvda_drv=1 

Mac OS version:
15G1004

Kernel version:
Darwin Kernel Version 15.6.0: Mon Aug 29 20:21:34 PDT 2016; root:xnu-3248.60.11~1/RELEASE_X86_64
Kernel UUID: E349749B-3303-3DDF-959C-B5885A0E1F6E
Kernel slide:     0x0000000019800000
Kernel text base: 0xffffff8019a00000
__HIB  text base: 0xffffff8019900000
System model name: MacPro3,1 (Mac-F42C88C8)

System uptime in nanoseconds: 5038536293623
last loaded kext at 4302919723577: com.apple.driver.AppleXsanScheme    3 (addr 0xffffff7f9c08a000, size 32768)
last unloaded kext at 4378810600913: com.apple.driver.AppleXsanScheme    3 (addr 0xffffff7f9c08a000, size 32768)
loaded kexts:
com.nvidia.CUDA    1.1.0
com.nvidia.web.GeForceWeb    10.1.1
com.nvidia.web.NVDAGM100HalWeb    10.1.1
com.nvidia.web.NVDAResmanWeb    10.1.1
org.tw.CodecCommander    2.6.2
com.driver.LogJoystick    2.0
com.insanelymac.IntelMausiEthernet    2.1.0d3
net.osx86.kexts.GenericUSBXHCI    1.2.11
org.hwsensors.driver.GPUSensors    1707
org.hwsensors.driver.ACPISensors    1707
org.hwsensors.driver.CPUSensors    1707
org.netkas.driver.FakeSMC    1707
com.rehabman.driver.USBInjectAll    0.5.10
com.apple.driver.AudioAUUC    1.70
com.apple.filesystems.autofs    3.0
com.apple.driver.AppleUpstreamUserClient    3.6.1
com.apple.driver.AppleMCCSControl    1.2.13
com.apple.driver.AppleHDA    274.12
com.apple.driver.pmtelemetry    1
com.apple.iokit.IOUserEthernet    1.0.1
com.apple.iokit.IOBluetoothSerialManager    4.4.6f1
com.apple.Dont_Steal_Mac_OS_X    7.0.0
com.apple.driver.AppleHV    1
com.apple.driver.AppleOSXWatchdog    1
com.apple.driver.AppleIntelPCHPMC    1.1
com.apple.driver.AppleIntelSlowAdaptiveClocking    4.0.0
com.apple.driver.ACPI_SMC_PlatformPlugin    1.0.0
com.apple.driver.AppleUSBLegacyHub    900.4.1
com.apple.AppleFSCompression.AppleFSCompressionTypeDataless    1.0.0d1
com.apple.AppleFSCompression.AppleFSCompressionTypeZlib    1.0.0
com.apple.BootCache    38
com.apple.iokit.IOAHCIBlockStorage    2.8.5
com.apple.driver.AppleAHCIPort    3.1.8
com.apple.driver.AppleACPIButtons    4.0
com.apple.driver.AppleHPET    1.8
com.apple.driver.AppleRTC    2.0
com.apple.driver.AppleACPIEC    4.0
com.apple.driver.AppleSMBIOS    2.1
com.apple.driver.AppleAPIC    1.7
com.apple.nke.applicationfirewall    163
com.apple.security.quarantine    3
com.apple.security.TMSafetyNet    8
com.apple.kext.triggers    1.0
com.apple.driver.AppleSMBusController    1.0.14d1
com.apple.iokit.IOAcceleratorFamily2    205.11
com.apple.AppleGraphicsDeviceControl    3.12.8
com.apple.driver.DspFuncLib    274.12
com.apple.kext.OSvKernDSPLib    525
com.apple.iokit.IOSurface    108.2.3
com.apple.iokit.IOSerialFamily    11
com.apple.iokit.IOBluetoothFamily    4.4.6f1
com.apple.driver.CoreCaptureResponder    1
com.apple.driver.corecapture    1.0.4
com.apple.iokit.IONDRVSupport    2.4.1
com.apple.driver.AppleHDAController    274.12
com.apple.iokit.IOGraphicsFamily    2.4.1
com.apple.iokit.IOHDAFamily    274.12
com.apple.iokit.IOAudioFamily    204.4
com.apple.vecLib.kext    1.2.0
com.apple.iokit.IOSlowAdaptiveClockingFamily    1.0.0
com.apple.driver.AppleSMC    3.1.9
com.apple.driver.IOPlatformPluginLegacy    1.0.0
com.apple.driver.IOPlatformPluginFamily    6.0.0d7
com.apple.iokit.IOSCSIArchitectureModelFamily    3.7.7
com.apple.driver.usb.IOUSBHostHIDDevice    1.0.1
com.apple.driver.usb.cdc    5.0.0
com.apple.driver.usb.networking    5.0.0
com.apple.driver.usb.AppleUSBHostCompositeDevice    1.0.1
com.apple.driver.usb.AppleUSBHub    1.0.1
com.apple.iokit.IONetworkingFamily    3.2
com.apple.iokit.IOUSBFamily    900.4.1
com.apple.iokit.IOAHCIFamily    2.8.1
com.apple.driver.usb.AppleUSBXHCIPCI    1.0.1
com.apple.driver.usb.AppleUSBXHCI    1.0.1
com.apple.iokit.IOUSBHostFamily    1.0.1
com.apple.driver.AppleUSBHostMergeProperties    1.0.1
com.apple.driver.AppleEFINVRAM    2.0
com.apple.iokit.IOHIDFamily    2.0.0
com.apple.driver.AppleEFIRuntime    2.0
com.apple.iokit.IOSMBusFamily    1.1
com.apple.security.sandbox    300.0
com.apple.kext.AppleMatch    1.0.0d1
com.apple.driver.AppleKeyStore    2
com.apple.driver.AppleMobileFileIntegrity    1.0.5
com.apple.driver.AppleCredentialManager    1.0
com.apple.driver.DiskImages    417.4
com.apple.iokit.IOStorageFamily    2.1
com.apple.iokit.IOReportFamily    31
com.apple.driver.AppleFDEKeyStore    28.30
com.apple.driver.AppleACPIPlatform    4.0
com.apple.iokit.IOPCIFamily    2.9
com.apple.iokit.IOACPIFamily    1.4
com.apple.kec.Libm    1
com.apple.kec.pthread    1
com.apple.kec.corecrypto    1.0
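
To check the claim that the backtrace runs through the NVIDIA driver, each return address can be matched against the kext load ranges printed in the report above. This is a minimal Python sketch: the addresses and ranges are copied from this one report, and the helper name `owner` is hypothetical. Addresses outside the three listed ranges are simply labelled "other" (probably the kernel itself, given the kernel text base, but the script does not verify that).

```python
from collections import Counter

# Load ranges copied from "Kernel Extensions in backtrace" in the report above.
KEXT_RANGES = {
    "com.nvidia.web.NVDAResmanWeb": (0xffffff7f9bab2000, 0xffffff7f9bd91fff),
    "com.nvidia.web.NVDAGM100HalWeb": (0xffffff7f9bd92000, 0xffffff7f9bf8efff),
    "com.nvidia.web.GeForceWeb": (0xffffff7f9bffa000, 0xffffff7f9c089fff),
}

# Return addresses copied from the backtrace, in the order they were printed.
RETURN_ADDRESSES = [
    0xffffff8019adab52, 0xffffff7f9bb18f1b, 0xffffff7f9bbf473e,
    0xffffff7f9bbf47e9, 0xffffff7f9bea8141, 0xffffff7f9bea7801,
    0xffffff7f9bc21f73, 0xffffff7f9bb1f43e, 0xffffff7f9bb1ccd4,
    0xffffff7f9bb1df09, 0xffffff7f9c041078, 0xffffff7f9c002f2f,
    0xffffff7f9c050692, 0xffffff7f9c050879, 0xffffff801a0b52a6,
    0xffffff801a0b3111, 0xffffff801a0b3206, 0xffffff8019bc9117,
]


def owner(address):
    """Name the listed kext whose load range contains the address, else 'other'."""
    for name, (start, end) in KEXT_RANGES.items():
        if start <= address <= end:
            return name
    return "other"


frames = [(addr, owner(addr)) for addr in RETURN_ADDRESSES]
counts = Counter(name for _, name in frames)

for addr, name in frames:
    print(f"{addr:#018x}  {name}")
print(dict(counts))
```

For this report, 13 of the 18 frames fall inside the three NVIDIA web-driver kexts, and the panic caller address 0xffffff7f9bb18f1b sits inside NVDAResmanWeb. That shows where the panic was raised, not what caused it.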
 
Here is some advice from ROG, but it applies to a Windoze-only utility; I don't think these GPU settings are accessible from OSX.

1) Open Nvidia Control Panel
2) Disable GPU audio in the "set up digital audio" menu
3) Go to Manage 3D settings -> Select Program settings -> Click add then select the game from the list
4) From Options change:
a) Power management mode: Prefer maximum performance
b) Triple buffer: Off
c) Thread optimisation: ON
d) Vertical sync: Off


I'm guessing there is no Nvidia Control Panel utility for OSX. Anyone know of another way to get at these settings?

I also found this (to me) cryptic note on InsanelyMac forum:

I run a 9 series GA board and I had emuvariable.efi and partitiondxe.efi installed. I ran clover again and during setup I unchecked these options. Installed Clover r2905 and rebooted the KP had gone.

but I'm afraid I don't understand what this poster is talking about, what .efi files are, etc. Also this is old info (2014) and may no longer apply.

Also found another post on this forum, same problem but with an MSI board. So it's getting hard to blame the hardware, but I am at wits' end, no idea what I can try next if this is a software issue. Help... help...
 
RESOLVED!

This is actually a little embarrassing. And I would recommend to anyone else who is having this maddening problem: dumb as it seems, give this a try.

This morning I went to check on the Hackie and the graphics card seemed stone dead -- no video output at all. I thought it was time to call the vendor at last, but before talking to them I would do just one more thing. Obviously (20/20 hindsight) this was what I should have tried first, before embarking on a carnival of errors and frustrations trying to fix software that wasn't broken...

I opened up the case at last and just rocked the Asus Strix card a bit -- didn't pull it, just pressed gently on each end, jiggled it a little in the slot. Left the case open and the HDD out (everything I need is on the SSD), booted up -- and wow, there was the BIOS splash screen and Clover screen, and I had normal video.

So just for grins and giggles I started the Heaven benchmark (remember that for the last 2 days I have only been able to run this benchmark for a few seconds before a kernel panic). And it ran. It just ran. I left it running. It kept running. GPU temp reached its usual 70-71C under heavy load, but never a glitch. Yippee!

So... now I wonder. Is there something about the mobo and the huge heavy GTX970 card that makes it really marginal when the mobo is in the normal, vertical position? Is there something marginal about that PCI slot? Do Asus mobos and graphics cards suffer from a low build quality that makes them fragile in operation? Is there a really sketchy solder joint on this particular GTX card? Or is this just the brave new world of modern high-density, high-temp cards?

I'm gonna try running the machine on its side (that will compromise the cooling a bit but I think the fans will compensate) for a few days to see whether it's happier in that orientation. I don't remember seeing this kind of physical-connector flakiness "in the old days" (i.e. building PC-based Linux systems about 15 years ago). This kind of thing I used to see with Qbus and Unibus backplane machines :)

Anyway, this is great news, though I am kind of facepalming because I wish I had just got out the screwdrivers and tried this before going through many wasted hours of head-banging. At least I learned something about emergency recovery procedures, booting from external media, etc.
 
Interesting :) I have been seeing that same NVRM Read Error for the last couple of days. I did have the gfx card out a few days ago when testing an old ATi 5770 card. So I have just whipped it out again, given the slot a blow, and re-fitted it; hopefully it will resolve like yours did.
 