
Screen freezes when a CUDA application is running

Status
Not open for further replies.
I have a Hackintosh built on X99 with a GTX 1080 Ti. The system seems flawless and stable: it survives burn-in with Prime95 (P95) and Valley (OpenGL) for more than a week, and it works well for daily use.

However, as soon as I run anything that uses CUDA, the system crashes. Specifically, the screen freezes and the sound output keeps looping on a fragment about one second long.

For example, my Hackintosh crashes soon after any one of the following applications is started:

1. Keras (TensorFlow backend) MNIST sample
2. PyTorch MNIST sample
3. CUDA samples/5_Simulations/particles

The crash comes at a random time, but never long after CUDA starts being used. Is this because CUDA on macOS is not stable enough, because a Hackintosh cannot run CUDA stably, or because I did something wrong when building my Hackintosh?

Thanks for any reply!

By the way, the same hardware runs CUDA deep learning applications on Linux for a very long time without errors, so the hardware should be OK.

According to other posts, some people have reported a similar issue related to PMDrvr.kext. However, disabling EIST and removing PMDrvr.kext did not solve my problem.
 
I tried to debug, but nothing shows up in the system log. It seems the freeze prevents the logger from recording anything.

My Hackintosh build seems quite stable as long as no CUDA application is executed. After disabling EIST and removing PMDrvr.kext, the freezing issue is still there.
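
Since the system log stays empty, one way to at least record when the machine froze is a heartbeat file that is forced to disk on every write. This is only a minimal sketch: the function names, the file path, and the 5-second interval are my own assumptions, and it records the time of the freeze, not its cause.

```python
import os
import time
from datetime import datetime


def write_heartbeat(path, note=""):
    """Append one timestamped line and force it to disk before returning."""
    stamp = datetime.now().isoformat(timespec="seconds")
    line = f"{stamp} {note}".rstrip() + "\n"
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()             # push Python's buffer to the OS
        os.fsync(f.fileno())  # ask the OS to commit the data to the drive
    return line


def run_heartbeat(path, interval_s=5.0, beats=None):
    """Write a heartbeat every interval_s seconds; beats=None runs until interrupted."""
    count = 0
    while beats is None or count < beats:
        write_heartbeat(path, f"beat {count}")
        count += 1
        time.sleep(interval_s)
    return count


def last_heartbeat(path):
    """Return the last non-empty line of the heartbeat file, or None if it is empty."""
    with open(path, encoding="utf-8") as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    return lines[-1] if lines else None
```

Running `run_heartbeat("heartbeat.log")` in a terminal next to the CUDA job and calling `last_heartbeat("heartbeat.log")` after the reboot gives the last moment the machine was still alive, to within one interval.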
 
I had exactly the same issue, and I solved it by changing my overclock settings and removing PMDrvr.kext. Can you please upload your BIOS settings?
 
Thanks for your reply.

Do you mean that your build also froze only when a CUDA application was running? Have you completely solved the issue, i.e. is your build now stable under heavy CUDA load for a long time? Can you share some details about how you solved it?

I just use the built-in XMP setting, since my memory is a 3200 XMP kit. With XMP enabled, the cores are set to 38x with a 31x cache ratio, and the memory frequency is set to 3200.
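
For reference, a multiplier maps to a clock as ratio × base clock. A minimal sketch of that arithmetic, assuming the usual 100 MHz BCLK (the actual BCLK is not stated in this thread, and `ratio_to_ghz` is just an illustrative helper):

```python
BCLK_MHZ = 100.0  # assumption: stock base clock; not stated in this thread


def ratio_to_ghz(ratio, bclk_mhz=BCLK_MHZ):
    """Convert a CPU or cache multiplier to a clock in GHz."""
    return ratio * bclk_mhz / 1000.0


# The XMP values mentioned above: 38x core, 31x cache.
for name, ratio in [("core", 38), ("cache", 31)]:
    print(f"{name}: {ratio}x -> {ratio_to_ghz(ratio):.1f} GHz")
```

Under that assumption, 38x corresponds to 3.8 GHz on the cores and 31x to 3.1 GHz on the cache.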

My other BIOS settings are similar to those recommended in [https://www.tonymacx86.com/threads/...0-13-on-x99-full-success.227001/#post-1542618].

I will try disabling XMP after work today and see whether my CUDA tasks become stable.
 
Hi,
I was facing the same kind of problem while using After Effects, Premiere Pro, or DaVinci Resolve, but it is gone now after I made a few changes in my BIOS. I am uploading my BIOS settings:
P_20180327_155106.jpg

P_20180327_155149.jpg

P_20180327_155209.jpg
 
@alok0182 Thanks for your reply.

From your uploaded settings, I noticed the following changes:
1) all cores synced to 40x with a 30x cache ratio
2) 1.9 V CPU input voltage
3) 4095 for both the long and short duration package power limits

Did I miss anything, and can you share the reason for these changes?

My motherboard is an ASUS X99-II. I noticed that this board sets the CPU and cache ratios to 38 and 31 automatically. If I just change them to Auto, the random freezing seems to be alleviated. I started a heavy CUDA test, and it ran for over 10 hours (unfortunately, it froze again after 14 hours). So I conclude that the CPU ratio contributes to the random freezing issue.

However, I do not think this instability is caused by excessive overclocking, because:

1) I do not think the XMP setting is excessive. For the 6800K, 3.8 GHz with a 31x cache ratio should be within the default design. (I also tried setting the cache ratio to 28x, the default, and the freezing issue was still there, so it is not related to the cache ratio.)
2) My configuration can run at 4.0 GHz with a 35x cache ratio given a suitable voltage increase. It is completely stable on Windows and Linux under all kinds of stress tests (CPU, CUDA, and others). On macOS, the random freezing only appears with CUDA applications.

Meanwhile, many people have also reported that PMDrvr.kext definitely causes problems with CUDA, and this matches my experiments.

(I am not familiar with macOS CPU power management, so this is just a guess:) the issue is somehow related to the interaction between CPU frequency management and CUDA.

Based on my observations, I can narrow the issue down as follows: in my X99 Hackintosh build with an i7-6800K, a CUDA application causes freezing when
1) EIST is enabled with PMDrvr.kext;
2) the CPU ratio in the BIOS is not set to Auto.

Therefore, I strongly suspect that some bug is triggered when the CPU ratio is not set to its default.

@RehabMan Could you have a look at this issue, since you are an expert in this area? Is it possible that it is related to Piker-Alpha's ssdtPRGen or anything associated with XCPM?

@alok0182 Since your system is overclocked, could you share some details about how you handle CPU frequency matters such as XCPM?

UPDATE1:
Unfortunately, the heavy CUDA test failed: the system froze again after about 14 hours. However, stability is clearly better than before. I will try adding some voltage and tuning the BIOS settings to see whether CUDA can be used stably.
 
Hi.
From what I have observed, this issue is related to CPU voltage and RAM voltage. When I set the CPU cache ratio above 30, the voltage demand was very high and I also faced the freezing issue. I then read some online articles about the i7-6800K and found that it is a poor CPU for overclocking because of the high voltage it needs for stability. So I tweaked the core ratio, cache ratio, CPU input voltage, RAM voltage for my memory kit, and VCCIO CPU 1.05V, and found that these BIOS settings work best in my case. One more thing: I did not patch my BIOS as described in this [https://www.tonymacx86.com/threads/...0-13-on-x99-full-success.227001/#post-1542618] post.
I am also uploading my EFI folder. Hope this helps you.
 

Attachments

  • MY SSD 10.13.3.zip
    31.9 MB · Views: 162
Hi @xiaodai,

After a week of working with an updated Substance Painter, I have not experienced any freezes at all.

I did some tests with and without PMDrvr.


With PMDrvr (time before freeze):

Iray render in Substance Painter
CPU and GPU: 1:30
GPU only: 0:10

CUDA-Z: 5 minutes (approx.)

AME: no issues


Without PMDrvr, I can run all 3 simultaneously without issues.

I am running 10.13.3

Nvidia Web Driver: 387.10.10.25.156

CUDA Driver Version: 387.128

My build is nearly identical to KGP’s except:
  • I don’t have an NVMe drive
  • I use DSM2-ASUS-X99-A-II-USB.kext
  • I skipped section E.9.2
Also, I did patch my BIOS.

I did not make any changes to XCPM.

What version of High Sierra are you running? Did you reinject your frequencies after upgrading? What are you running that needs a 14-hour test?

I upgraded my clone drive to 10.13.4, but there is no CUDA driver for it yet.

Hope this helps.
 

Hi @raptortrax,

Thanks for your reply and your test results.

I have a new observation: OpenCL also causes a freeze, e.g. LuxMark in Stress Test mode.

My OS version is also 10.13.3 (there are several builds of 10.13.3; to be specific, I am using the last one before 10.13.4). Both the NVIDIA driver and the CUDA driver are up to date.

Regarding XCPM, can you share the details of your XCPM KernelToPatch entries? Do you enable EIST in your BIOS when you do not use PMDrvr.kext? When you mention frequency injection, do you mean Piker-Alpha's ssdtPRGen.sh?

I am a developer of CUDA applications, and my work also depends on some deep learning applications. To validate the stability of my machine with CUDA (on the Hackintosh), I run a deep learning task, and it causes a freeze after 10+ hours (no EIST, no PMDrvr.kext, no overclocking, and a suitable voltage increase).

Meanwhile, the same machine stays stable for weeks under heavy CUDA computation on Linux, so the hardware seems to be fine.
 
By the way, do you mean that just opening CUDA-Z on its performance page causes a freeze? CUDA-Z's workload seems very low, and the tool is quite old.
 