RESOLVED: NVIDIA related kernel panic, chronic, severe...

Discussion in 'General Help' started by Tazling, Sep 28, 2016.

  1. Tazling

    Tazling

    Joined:
    Aug 31, 2016
    Messages:
    58
    Sep 28, 2016 at 5:52 AM #1
    My shiny new Hackintosh
    https://www.tonymacx86.com/threads/...ene-i7-6700k-asus-geforce-gtx970-gigi.202270/

    Hardware Overview:
    Model Name: Mac Pro
    Model Identifier: MacPro3,1
    Processor Name: Intel Core i7
    Processor Speed: 4.01 GHz
    Number of Processors: 1
    Total Number of Cores: 4
    L2 Cache (per Core): 256 KB
    L3 Cache: 8 MB
    Memory: 16 GB
    Bus Speed: 400 MHz
    Boot ROM Version: MP31.006C.B05
    SMC Version (system): 1.25f4

    has been working perfectly since final assembly and test about a week ago. It's been absolutely problem-free and every OSX app and function has "just worked". But then [cue Jaws theme]:

    Yesterday evening while playing a favourite game I heard audio stutter, then saw video stutter, then WHAM, crash -- not just to desktop but to a reboot (kernel panic). "That was strange," sez I, and when boot was complete I cautiously restarted the game, then played for a couple of hours without incident. I thought it might have been a power glitch or gremlins, who knows; a one-off event. But tonight, I can't do anything GPU-intensive for more than a few minutes without a kernel panic:


    Tue Sep 27 20:48:25 2016

    *** Panic Report ***
    panic(cpu 0 caller 0xffffff7f9721aeab): NVRM[0/2:0:0]: Read Error 0x006100c0: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff82108ae000 0x124020a1, D0, P3/4
    [...]

    Tue Sep 27 20:56:55 2016

    *** Panic Report ***
    panic(cpu 6 caller 0xffffff7f8aa1aeab): NVRM[0/2:0:0]: Read Error 0x00100f04: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8203f8e000 0x124020a1, D0, P3/4
    [...]

    Tue Sep 27 21:02:59 2016

    *** Panic Report ***
    panic(cpu 0 caller 0xffffff7f8941aeab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff820296e000 0x124020a1, D0, P3/4
    [...]

    Tue Sep 27 21:12:27 2016

    *** Panic Report ***
    panic(cpu 0 caller 0xffffff7f88f10eab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8200e7e000 0x124020a1, D0, P3/4
    [...]


    In every case the backtrace leads to the NVIDIA driver and Apple IOKit.
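
    A minimal Python sketch for comparing the four panic headers above. The field names (read_error, cfg, bar0) are labels chosen here for illustration and are an assumption, not NVIDIA's documented meaning of the message. What is standard PCI behaviour is that a configuration read from a device that does not respond comes back as all ones (0xffffffff). All four reports show exactly that in the CFG words, which is consistent with (though not proof of) the card intermittently dropping off the bus.

```python
import re

# Panic headers copied verbatim from the four reports above.
PANIC_LINES = [
    "panic(cpu 0 caller 0xffffff7f9721aeab): NVRM[0/2:0:0]: Read Error 0x006100c0: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff82108ae000 0x124020a1, D0, P3/4",
    "panic(cpu 6 caller 0xffffff7f8aa1aeab): NVRM[0/2:0:0]: Read Error 0x00100f04: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8203f8e000 0x124020a1, D0, P3/4",
    "panic(cpu 0 caller 0xffffff7f8941aeab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff820296e000 0x124020a1, D0, P3/4",
    "panic(cpu 0 caller 0xffffff7f88f10eab): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8200e7e000 0x124020a1, D0, P3/4",
]

# Field names are hypothetical labels; the NVRM message layout is not documented here.
PATTERN = re.compile(
    r"NVRM\[(?P<device>[^\]]+)\]: Read Error (?P<read_error>0x[0-9a-f]+): "
    r"CFG (?P<cfg>(?:0x[0-9a-f]+ ?)+), BAR0 (?P<bar0>0x[0-9a-f]+)"
)


def parse_panic(line):
    """Extract the NVRM fields from one panic header, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    cfg_words = [int(word, 16) for word in match.group("cfg").split()]
    return {
        "device": match.group("device"),
        "read_error": int(match.group("read_error"), 16),
        "cfg": cfg_words,
        "bar0": int(match.group("bar0"), 16),
        # All-ones config words are what a PCI read returns when nothing answers.
        "cfg_all_ones": all(word == 0xFFFFFFFF for word in cfg_words),
    }


for line in PANIC_LINES:
    report = parse_panic(line)
    print(f"{report['read_error']:#010x}  cfg_all_ones={report['cfg_all_ones']}")
```

    The read-error value differs between reports while the CFG words and BAR0 stay the same, so the common factor is the failed read itself rather than one particular register.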

    I can't play the aforementioned game and I can't run the Heaven benchmark any more; it runs for a while -- not a consistent length of time, but roughly the same each time -- and then wham: audio glitch, audio loop, screen goes dark, crash. And about 3 times out of 4 the reboot does not succeed; I get lots of disk LED blinking but a dark screen. However, if I cycle power, I get a normal boot with both screens working... so long as I just run ordinary apps that don't challenge the GTX970. Hackie seems perfectly normal so long as I don't stress the GPU.

    I note that my GPU temp has been reaching 70C when playing a demanding game, and reached 66C before the Heaven benchmark crashed. Is this "too hot" for the GTX970? Googling around, the consensus seems to be that 70C during demanding gameplay is not unusual, and 90C is the trouble threshold.

    In some discussions about stock Apple Macs, I have seen advice along the lines of "disable your NVIDIA GPU and use only the native graphics." Since the whole point of my Hackie is to support the GTX970 this is not helpful :)

    I am running NVIDIA Web Driver 346.03.15f02 -- is this a dangerous revision? Should I roll forward, or back, to some other release? [See below: I've tried every valid release of the web driver and nothing helps so far.]


    Lastly I ask with dread... does this mean my expensive new Asus GeForce GTX970 is defective? What is an NVRM Read Error, and whose NVRAM is being read? The mobo's? The GPU's? Or does NVRM mean my M.2 SSD, the system disk? What diagnostics can I use to locate the problem definitively in one component? Is this a known symptom of something such as overheating, a card not properly seated, or other "dumb" physical causes that I might be able to remedy? I am about to try running Geekbench for the GPU to see if one particular test mode always causes the panic.

    I'm bewildered because I have not changed *anything* about my Hackie since I got it running, yet all of a sudden the kernel is panicking after just a minute or two of any graphics-intensive app. So it sounds to me like hardware failure and that means hassle and expense (sigh).

    POSTSCRIPT: just found a link here at tonymacx86 that might address my issue
    https://www.tonymacx86.com/threads/...e-to-nvidia-driver-346-03-15f02.201022/page-8
    What a relief it will be if this works. Will report back.

    UPDATE: nope, too bad. I tried:
    (a) rolling back to 15b01; that did not stop the NVRM panic when running Heaven.
    (b) 15f01; no good either, panic after a similar number of seconds.
    (c) going back to 15f02; no good, panic.
    (d) updating OSX via the App Store and trying 15f03, which is compatible with the latest patch; no good, the Heaven benchmark still panics.

    I am now out of ideas, other than to have a long talk with my retailer. Please tell me my GTX970 is not broken...

    FINAL POSTSCRIPT:
    https://www.reddit.com/r/hackintosh/comments/4fxlow/gtx_970_panic_restarts_under_heavy_load/
    This sounds similar. Should I change my system definition from MacPro3,1 to iMac15,1? Might this help?
    I have also read, in one similar case, a suggestion that too weak a power supply might cause the GPU to fail in this way. Does anyone think my power supply is underpowered for this configuration?
     
    Last edited: Sep 28, 2016
  2. Tazling

    Tazling

    Joined:
    Aug 31, 2016
    Messages:
    58
    Sep 28, 2016 at 5:59 PM #2
    It's getting worse. Crashed at 0144 this morning, unattended, idle. Nothing running but desktop, browser, Steam, Hardware Monitor. Same error. I would be pretty sure it was hardware by now, except that too many other people are having similar crashes; it seems more likely that there's an insidious driver issue than that Asus has released a dud board.

    Code (Text):

    Anonymous UUID:       037C27B0-51E4-83E3-95A0-DC9BE0A3AB89

    Wed Sep 28 01:44:42 2016

    *** Panic Report ***
    panic(cpu 0 caller 0xffffff7f9bb18f1b): NVRM[0/2:0:0]: Read Error 0x00000100: CFG 0xffffffff 0xffffffff 0xffffffff, BAR0 0xde000000 0xffffff8213a8e000 0x124020a1, D0, P3/4
    Backtrace (CPU 0), Frame : Return Address
    0xffffff9216893810 : 0xffffff8019adab52
    0xffffff9216893890 : 0xffffff7f9bb18f1b
    0xffffff9216893950 : 0xffffff7f9bbf473e
    0xffffff92168939b0 : 0xffffff7f9bbf47e9
    0xffffff92168939e0 : 0xffffff7f9bea8141
    0xffffff9216893a30 : 0xffffff7f9bea7801
    0xffffff9216893aa0 : 0xffffff7f9bc21f73
    0xffffff9216893ac0 : 0xffffff7f9bb1f43e
    0xffffff9216893b70 : 0xffffff7f9bb1ccd4
    0xffffff9216893d60 : 0xffffff7f9bb1df09
    0xffffff9216893e40 : 0xffffff7f9c041078
    0xffffff9216893e80 : 0xffffff7f9c002f2f
    0xffffff9216893ea0 : 0xffffff7f9c050692
    0xffffff9216893ec0 : 0xffffff7f9c050879
    0xffffff9216893ef0 : 0xffffff801a0b52a6
    0xffffff9216893f40 : 0xffffff801a0b3111
    0xffffff9216893f80 : 0xffffff801a0b3206
    0xffffff9216893fb0 : 0xffffff8019bc9117
          Kernel Extensions in backtrace:
             com.nvidia.web.NVDAResmanWeb(10.1.1)[372259D5-EEF0-3278-8478-8E09B6A46FDF]@0xffffff7f9bab2000->0xffffff7f9bd91fff
                dependency: com.apple.iokit.IOPCIFamily(2.9)[5447B943-A94D-3BD4-A60F-98B24D19CE93]@0xffffff7f9a24b000
                dependency: com.apple.iokit.IONDRVSupport(2.4.1)[4EB2843C-C821-3AD0-B333-575FD6ED6FB1]@0xffffff7f9ae9f000
                dependency: com.apple.iokit.IOGraphicsFamily(2.4.1)[A360453D-2050-3C49-A549-AC0DD5E87917]@0xffffff7f9ae58000
                dependency: com.apple.AppleGraphicsDeviceControl(3.12.8)[81C2784E-285A-38A7-A16E-515DCB816E0A]@0xffffff7f9baac000
             com.nvidia.web.NVDAGM100HalWeb(10.1.1)[65EFFD2C-C437-3581-BBE8-F54D24FD909D]@0xffffff7f9bd92000->0xffffff7f9bf8efff
                dependency: com.nvidia.web.NVDAResmanWeb(10.1.1)[372259D5-EEF0-3278-8478-8E09B6A46FDF]@0xffffff7f9bab2000
                dependency: com.apple.iokit.IOPCIFamily(2.9)[5447B943-A94D-3BD4-A60F-98B24D19CE93]@0xffffff7f9a24b000
             com.nvidia.web.GeForceWeb(10.1.1)[60573B81-B200-3DA9-A60C-C86EFBD9B8D8]@0xffffff7f9bffa000->0xffffff7f9c089fff
                dependency: com.apple.iokit.IOPCIFamily(2.9)[5447B943-A94D-3BD4-A60F-98B24D19CE93]@0xffffff7f9a24b000
                dependency: com.apple.iokit.IONDRVSupport(2.4.1)[4EB2843C-C821-3AD0-B333-575FD6ED6FB1]@0xffffff7f9ae9f000
                dependency: com.nvidia.web.NVDAResmanWeb(10.1.1)[372259D5-EEF0-3278-8478-8E09B6A46FDF]@0xffffff7f9bab2000
                dependency: com.apple.iokit.IOGraphicsFamily(2.4.1)[A360453D-2050-3C49-A549-AC0DD5E87917]@0xffffff7f9ae58000
                dependency: com.apple.iokit.IOAcceleratorFamily2(205.11)[569DA297-BC38-35C0-B909-6E8686BE0928]@0xffffff7f9bf8f000

    BSD process name corresponding to current thread: kernel_task
    Boot args: dart=0 mbasd=0 nvda_drv=1

    Mac OS version:
    15G1004

    Kernel version:
    Darwin Kernel Version 15.6.0: Mon Aug 29 20:21:34 PDT 2016; root:xnu-3248.60.11~1/RELEASE_X86_64
    Kernel UUID: E349749B-3303-3DDF-959C-B5885A0E1F6E
    Kernel slide:     0x0000000019800000
    Kernel text base: 0xffffff8019a00000
    __HIB  text base: 0xffffff8019900000
    System model name: MacPro3,1 (Mac-F42C88C8)

    System uptime in nanoseconds: 5038536293623
    last loaded kext at 4302919723577: com.apple.driver.AppleXsanScheme    3 (addr 0xffffff7f9c08a000, size 32768)
    last unloaded kext at 4378810600913: com.apple.driver.AppleXsanScheme    3 (addr 0xffffff7f9c08a000, size 32768)
    loaded kexts:
    com.nvidia.CUDA    1.1.0
    com.nvidia.web.GeForceWeb    10.1.1
    com.nvidia.web.NVDAGM100HalWeb    10.1.1
    com.nvidia.web.NVDAResmanWeb    10.1.1
    org.tw.CodecCommander    2.6.2
    com.driver.LogJoystick    2.0
    com.insanelymac.IntelMausiEthernet    2.1.0d3
    net.osx86.kexts.GenericUSBXHCI    1.2.11
    org.hwsensors.driver.GPUSensors    1707
    org.hwsensors.driver.ACPISensors    1707
    org.hwsensors.driver.CPUSensors    1707
    org.netkas.driver.FakeSMC    1707
    com.rehabman.driver.USBInjectAll    0.5.10
    com.apple.driver.AudioAUUC    1.70
    com.apple.filesystems.autofs    3.0
    com.apple.driver.AppleUpstreamUserClient    3.6.1
    com.apple.driver.AppleMCCSControl    1.2.13
    com.apple.driver.AppleHDA    274.12
    com.apple.driver.pmtelemetry    1
    com.apple.iokit.IOUserEthernet    1.0.1
    com.apple.iokit.IOBluetoothSerialManager    4.4.6f1
    com.apple.Dont_Steal_Mac_OS_X    7.0.0
    com.apple.driver.AppleHV    1
    com.apple.driver.AppleOSXWatchdog    1
    com.apple.driver.AppleIntelPCHPMC    1.1
    com.apple.driver.AppleIntelSlowAdaptiveClocking    4.0.0
    com.apple.driver.ACPI_SMC_PlatformPlugin    1.0.0
    com.apple.driver.AppleUSBLegacyHub    900.4.1
    com.apple.AppleFSCompression.AppleFSCompressionTypeDataless    1.0.0d1
    com.apple.AppleFSCompression.AppleFSCompressionTypeZlib    1.0.0
    com.apple.BootCache    38
    com.apple.iokit.IOAHCIBlockStorage    2.8.5
    com.apple.driver.AppleAHCIPort    3.1.8
    com.apple.driver.AppleACPIButtons    4.0
    com.apple.driver.AppleHPET    1.8
    com.apple.driver.AppleRTC    2.0
    com.apple.driver.AppleACPIEC    4.0
    com.apple.driver.AppleSMBIOS    2.1
    com.apple.driver.AppleAPIC    1.7
    com.apple.nke.applicationfirewall    163
    com.apple.security.quarantine    3
    com.apple.security.TMSafetyNet    8
    com.apple.kext.triggers    1.0
    com.apple.driver.AppleSMBusController    1.0.14d1
    com.apple.iokit.IOAcceleratorFamily2    205.11
    com.apple.AppleGraphicsDeviceControl    3.12.8
    com.apple.driver.DspFuncLib    274.12
    com.apple.kext.OSvKernDSPLib    525
    com.apple.iokit.IOSurface    108.2.3
    com.apple.iokit.IOSerialFamily    11
    com.apple.iokit.IOBluetoothFamily    4.4.6f1
    com.apple.driver.CoreCaptureResponder    1
    com.apple.driver.corecapture    1.0.4
    com.apple.iokit.IONDRVSupport    2.4.1
    com.apple.driver.AppleHDAController    274.12
    com.apple.iokit.IOGraphicsFamily    2.4.1
    com.apple.iokit.IOHDAFamily    274.12
    com.apple.iokit.IOAudioFamily    204.4
    com.apple.vecLib.kext    1.2.0
    com.apple.iokit.IOSlowAdaptiveClockingFamily    1.0.0
    com.apple.driver.AppleSMC    3.1.9
    com.apple.driver.IOPlatformPluginLegacy    1.0.0
    com.apple.driver.IOPlatformPluginFamily    6.0.0d7
    com.apple.iokit.IOSCSIArchitectureModelFamily    3.7.7
    com.apple.driver.usb.IOUSBHostHIDDevice    1.0.1
    com.apple.driver.usb.cdc    5.0.0
    com.apple.driver.usb.networking    5.0.0
    com.apple.driver.usb.AppleUSBHostCompositeDevice    1.0.1
    com.apple.driver.usb.AppleUSBHub    1.0.1
    com.apple.iokit.IONetworkingFamily    3.2
    com.apple.iokit.IOUSBFamily    900.4.1
    com.apple.iokit.IOAHCIFamily    2.8.1
    com.apple.driver.usb.AppleUSBXHCIPCI    1.0.1
    com.apple.driver.usb.AppleUSBXHCI    1.0.1
    com.apple.iokit.IOUSBHostFamily    1.0.1
    com.apple.driver.AppleUSBHostMergeProperties    1.0.1
    com.apple.driver.AppleEFINVRAM    2.0
    com.apple.iokit.IOHIDFamily    2.0.0
    com.apple.driver.AppleEFIRuntime    2.0
    com.apple.iokit.IOSMBusFamily    1.1
    com.apple.security.sandbox    300.0
    com.apple.kext.AppleMatch    1.0.0d1
    com.apple.driver.AppleKeyStore    2
    com.apple.driver.AppleMobileFileIntegrity    1.0.5
    com.apple.driver.AppleCredentialManager    1.0
    com.apple.driver.DiskImages    417.4
    com.apple.iokit.IOStorageFamily    2.1
    com.apple.iokit.IOReportFamily    31
    com.apple.driver.AppleFDEKeyStore    28.30
    com.apple.driver.AppleACPIPlatform    4.0
    com.apple.iokit.IOPCIFamily    2.9
    com.apple.iokit.IOACPIFamily    1.4
    com.apple.kec.Libm    1
    com.apple.kec.pthread    1
    com.apple.kec.corecrypto    1.0
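
    As a rough cross-check of the backtrace above, the return addresses can be matched against the load ranges listed under "Kernel Extensions in backtrace". This is only a sketch: the addresses and ranges are copied from this one report, and treating every frame outside the three listed kexts as kernel code is an assumption.

```python
from collections import Counter

# Load ranges copied from "Kernel Extensions in backtrace" in the report above.
KEXT_RANGES = {
    "com.nvidia.web.NVDAResmanWeb": (0xFFFFFF7F9BAB2000, 0xFFFFFF7F9BD91FFF),
    "com.nvidia.web.NVDAGM100HalWeb": (0xFFFFFF7F9BD92000, 0xFFFFFF7F9BF8EFFF),
    "com.nvidia.web.GeForceWeb": (0xFFFFFF7F9BFFA000, 0xFFFFFF7F9C089FFF),
}

# Return addresses copied from the backtrace, top frame first.
RETURN_ADDRESSES = [
    0xFFFFFF8019ADAB52, 0xFFFFFF7F9BB18F1B, 0xFFFFFF7F9BBF473E,
    0xFFFFFF7F9BBF47E9, 0xFFFFFF7F9BEA8141, 0xFFFFFF7F9BEA7801,
    0xFFFFFF7F9BC21F73, 0xFFFFFF7F9BB1F43E, 0xFFFFFF7F9BB1CCD4,
    0xFFFFFF7F9BB1DF09, 0xFFFFFF7F9C041078, 0xFFFFFF7F9C002F2F,
    0xFFFFFF7F9C050692, 0xFFFFFF7F9C050879, 0xFFFFFF801A0B52A6,
    0xFFFFFF801A0B3111, 0xFFFFFF801A0B3206, 0xFFFFFF8019BC9117,
]

OTHER = "other (presumably kernel)"


def owner(address):
    """Name the listed kext whose load range contains the address, if any."""
    for name, (start, end) in KEXT_RANGES.items():
        if start <= address <= end:
            return name
    # Assumption: frames outside the listed kexts belong to the kernel itself.
    return OTHER


frame_owners = [owner(address) for address in RETURN_ADDRESSES]
counts = Counter(frame_owners)

for index, (address, name) in enumerate(zip(RETURN_ADDRESSES, frame_owners)):
    print(f"frame {index:2d}  {address:#018x}  {name}")
```

    On this report, 13 of the 18 frames land inside the three NVIDIA web-driver kexts (7 in NVDAResmanWeb, 2 in NVDAGM100HalWeb, 4 in GeForceWeb), and the remaining 5 fall outside them.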
     
     
  3. Tazling

    Tazling

    Joined:
    Aug 31, 2016
    Messages:
    58
    Sep 28, 2016 at 6:16 PM #3
    Here is some advice from ROG, but it applies to a Windoze-only utility; I don't think these GPU settings are accessible from OSX.

    1) Open Nvidia Control Panel
    2) Disable GPU audio in the "set up digital audio" menu
    3) Go to Manage 3D settings -> Select Program settings -> Click add then select the game from the list
    4) From Options change:
    a) Power management mode: Prefer maximum performance
    b) Triple buffer: Off
    c) Thread optimisation: ON
    d) Vertical sync: Off


    I'm guessing there is no Nvidia Control Panel utility for OSX. Anyone know of another way to get at these settings?

    I also found this (to me) cryptic note on InsanelyMac forum:

    I run a 9 series GA board and I had emuvariable.efi and partitiondxe.efi installed. I ran clover again and during setup I unchecked these options. Installed Clover r2905 and rebooted the KP had gone.

    but I'm afraid I don't understand what this poster is talking about, what .efi files are, etc. Also this is old info (2014) and may no longer apply.

    Also found another post on this forum, same problem but with an MSI board. So it's getting hard to blame the hardware, but I am at wits' end, no idea what I can try next if this is a software issue. Help... help...
     
    Last edited: Sep 28, 2016
  4. Tazling

    Tazling

    Joined:
    Aug 31, 2016
    Messages:
    58
    Sep 29, 2016 at 7:09 PM #4
    RESOLVED!

    This is actually a little embarrassing. And I would recommend to anyone else who is having this maddening problem: dumb as it seems, give this a try.

    This morning I went to check on the Hackie and the graphics card seemed stone dead -- no video output at all. I thought it was time to call the vendor at last, but before talking to them I would do just one more thing. Obviously (20/20 hindsight) this was what I should have tried first, before embarking on a carnival of errors and frustrations trying to fix software that wasn't broken...

    I opened up the case at last and just rocked the Asus Strix card a bit -- didn't pull it, just pressed gently on each end, jiggled it a little in the slot. Left the case open and the HDD out (everything I need is on the SSD), booted up -- and wow, there was the BIOS splash screen and Clover screen, and I had normal video.

    So just for grins and giggles I started the Heaven benchmark (remember that for the last 2 days I have only been able to run this benchmark for a few seconds before a kernel panic). And it ran. It just ran. I left it running. It kept running. GPU temp reached its usual 70-71C under heavy load, but never a glitch. Yippee!

    So... now I wonder. Is there something about the mobo and the huge heavy GTX970 card that makes the connection really marginal when the mobo is in the normal, vertical position? Is there something marginal about that PCIe slot? Do Asus mobos and graphics cards suffer from a low build quality that makes them fragile in operation? Is there a really sketchy solder joint on this particular GTX card? Or is this just the brave new world of modern high-density, high-temp cards?

    I'm gonna try running the machine on its side (that will compromise the cooling a bit but I think the fans will compensate) for a few days to see whether it's happier in that orientation. I don't remember seeing this kind of physical-connector flakiness "in the old days" (i.e. building PC-based Linux systems about 15 years ago). This kind of thing I used to see with Qbus and Unibus backplane machines :)

    Anyway, this is great news, though I am kind of facepalming because I wish I had just got out the screwdrivers and tried this before going through many wasted hours of head-banging. At least I learned something about emergency recovery procedures, booting from external media, etc.
     
