Windows 10 BSoD (DPC_WATCHDOG_VIOLATION / VIDEO_TDR_FAILURE) due to graphics driver or faulty hardware?

  • Thread starter: Spen_2

For quite some time now I have regularly been getting BSoDs with DPC_WATCHDOG_VIOLATION (133) and, after some driver changes, VIDEO_TDR_FAILURE (116).

  • DPC_WATCHDOG_VIOLATION (133): My screen and audio freeze. Sometimes they come back after a few seconds, otherwise it ends in a Blue Screen (Green Screen for me).
  • VIDEO_TDR_FAILURE (116): The screen turns black but audio keeps going. This continues until I press a key on the keyboard or a mouse button, at which point the system immediately hard reboots.


At first it felt like it was coming from the network card / driver of the mainboard since it most often happens when I do something in the Edge browser (the new Chromium based one).

I have already turned off hardware acceleration. That made no difference.


I have already put a ton of time into this issue (not counting the time, days even, it has already cost me through crashes and reboots during work in my home office).


Based on my analysis of the memory dumps, I concluded that the problem comes from my EVGA NVIDIA GTX 1080 graphics card or its driver. I have now confirmed this by disabling the graphics card completely and working only with the onboard graphics of my CPU (Device Manager -> GPU -> Disable). Since I did that, all issues, including the system stuttering I experienced before, have gone away. I want to clarify that I have been using this graphics card for over 3 years and the issues only started some months ago; I had no issues before. I'm not aware of any change I made to the system that could have caused this.


To fix it I tried the following (in bold everything that might actually be relevant based on my findings with the GPU):

  • update motherboard BIOS
  • install all available drivers from the motherboard support page (IRST, ....)
  • try different versions of the network driver (even unofficial ones)
  • disable hardware acceleration in Microsoft Edge
  • free up storage (free storage on C now ~200 GB)
  • completely reinstalled Windows
  • boot into safe mode, uninstall the NVIDIA drivers with DDU and install
    • the newest available NVIDIA driver (457.09)
    • the oldest available NVIDIA driver (440.97)

I saved and analyzed many memory dumps. With the driver changes mentioned above I noticed changes in the error behavior, but the failure did not go away.


I am now trying to determine whether my GPU is defective (hardware failure) or whether there is an issue with the driver (or with the combination of Windows and the driver).


I hope that you can help me with that. I no longer have warranty on the graphics card and really would like to avoid having to buy a new GPU at the moment.


See different memory dumps and my system specs below.


Memory dumps (2) with the newest available Nvidia driver (457.09):


DPC_WATCHDOG_VIOLATION (133)

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.

Arguments:

Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending component can usually be identified with a stack trace.

Arg2: 0000000000000501, The DPC time count (in ticks).

Arg3: 0000000000000500, The DPC time allotment (in ticks).

Arg4: fffff806520fb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains additional information regarding this single DPC timeout



SYMBOL_NAME: dxgmms2!VidSchiWorkerThreadTimerCallback+46

MODULE_NAME: dxgmms2

IMAGE_NAME: dxgmms2.sys

IMAGE_VERSION: 10.0.19041.546

STACK_COMMAND: .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET: 46

FAILURE_BUCKET_ID: 0x133_DPC_dxgmms2!VidSchiWorkerThreadTimerCallback

OS_VERSION: 10.0.19041.1

BUILDLAB_STR: vb_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {7c8b50ff-53da-11c4-a8b3-cef8f5e196be}



# Child-SP RetAddr Call Site

00 ffffde80`1e7d9c88 fffff806`51887372 nt!KeBugCheckEx

01 ffffde80`1e7d9c90 fffff806`5172c2cd nt!KeAccumulateTicks+0x15e2c2

02 ffffde80`1e7d9cf0 fffff806`5172c871 nt!KiUpdateRunTime+0x5d

03 ffffde80`1e7d9d40 fffff806`517266e3 nt!KiUpdateTime+0x4a1

04 ffffde80`1e7d9e80 fffff806`5172eff2 nt!KeClockInterruptNotify+0x2e3

05 ffffde80`1e7d9f30 fffff806`5162ecd5 nt!HalpTimerClockInterrupt+0xe2

06 ffffde80`1e7d9f60 fffff806`517f6cba nt!KiCallInterruptServiceRoutine+0xa5

07 ffffde80`1e7d9fb0 fffff806`517f7227 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa

08 ffffb60d`2ec848f0 fffff806`516becdb nt!KiInterruptDispatchNoLockNoEtw+0x37

09 ffffb60d`2ec84a80 fffff806`5170bbda nt!KeYieldProcessorEx+0x1b

0a ffffb60d`2ec84a90 fffff806`51709c53 nt!KxWaitForLockOwnerShip+0x2a

0b ffffb60d`2ec84ac0 fffff806`5cd33696 nt!KeAcquireInStackQueuedSpinLockAtDpcLevel+0x73

0c ffffb60d`2ec84af0 fffff806`516bdef9 dxgmms2!VidSchiWorkerThreadTimerCallback+0x46

0d ffffb60d`2ec84b50 fffff806`516bd735 nt!KiExpireTimer2+0x429

0e ffffb60d`2ec84c60 fffff806`516e4cc4 nt!KiTimer2Expiration+0x165

0f ffffb60d`2ec84d20 fffff806`517fc255 nt!KiRetireDpcList+0x874

10 ffffb60d`2ec84fb0 fffff806`517fc040 nt!KxRetireDpcList+0x5

11 ffffb60d`2eecd910 fffff806`517fb70e nt!KiDispatchInterruptContinue

12 ffffb60d`2eecd940 fffff806`5165095a nt!KiDpcInterrupt+0x2ee

13 ffffb60d`2eecdad0 fffff806`5165086c nt!KiExitThreadWait+0x4a

14 ffffb60d`2eecdb10 fffff806`517074a7 nt!KiFastExitThreadWait+0x40

15 ffffb60d`2eecdb40 fffff806`55850d4f nt!KeDelayExecutionThread+0x3b7

16 ffffb60d`2eecdbd0 fffff806`516a29a5 iaStorAVC!EventQueue::main+0xc3

17 ffffb60d`2eecdc10 fffff806`517fc868 nt!PspSystemThreadStartup+0x55

18 ffffb60d`2eecdc60 00000000`00000000 nt!KiStartSystemThread+0x28


See memory dump file here: 2020_11_03_1_MEMORY.DMP
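
To put Arg2 and Arg3 from the dump above into perspective, here is a small sketch that converts the hexadecimal tick counts into seconds. The tick length of 15.625 ms is an assumption on my side (it is the usual default clock interval on Windows, but the actual value on this machine was not measured), and the helper name `ticks_to_seconds` is only for illustration.

```python
# Arg2 and Arg3 exactly as WinDbg printed them for the 0x133 bug check above.
ARG2_DPC_TIME_COUNT = "0000000000000501"
ARG3_DPC_TIME_ALLOTMENT = "0000000000000500"

# ASSUMPTION: one clock tick lasts 15.625 ms (a common default on Windows).
# The real interval on a given machine may differ.
ASSUMED_TICK_MS = 15.625


def ticks_to_seconds(hex_ticks: str, tick_ms: float = ASSUMED_TICK_MS) -> float:
    """Convert a hexadecimal tick count from the dump into seconds."""
    return int(hex_ticks, 16) * tick_ms / 1000.0


count = int(ARG2_DPC_TIME_COUNT, 16)
allotment = int(ARG3_DPC_TIME_ALLOTMENT, 16)

print(f"DPC time count:     {count} ticks (~{ticks_to_seconds(ARG2_DPC_TIME_COUNT):.2f} s)")
print(f"DPC time allotment: {allotment} ticks (~{ticks_to_seconds(ARG3_DPC_TIME_ALLOTMENT):.2f} s)")
print(f"Exceeded by:        {count - allotment} tick(s)")
```

Under that assumption the allotment works out to about 20 seconds, so the bug check fires on the first tick after the limit is reached. That would fit the freezes that either recover after a few seconds or end in a crash.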


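The graphics stack (dxgmms2!VidSchiWorkerThreadTimerCallback) tries to acquire a spin lock at DPC level and keeps waiting for it. Presumably the Nvidia driver holds that lock and does not release it. That is probably what causes the DPC_WATCHDOG_VIOLATION in this case.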


DPC_WATCHDOG_VIOLATION (133)

The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL or above.

Arguments:

Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending component can usually be identified with a stack trace.

Arg2: 0000000000000501, The DPC time count (in ticks).

Arg3: 0000000000000500, The DPC time allotment (in ticks).

Arg4: fffff8021eafb320, cast to nt!DPC_WATCHDOG_GLOBAL_TRIAGE_BLOCK, which contains additional information regarding this single DPC timeout



SYMBOL_NAME: dxgkrnl!DpiFdoDpcForIsr+37

MODULE_NAME: dxgkrnl

IMAGE_NAME: dxgkrnl.sys

STACK_COMMAND: .thread ; .cxr ; kb

BUCKET_ID_FUNC_OFFSET: 37

FAILURE_BUCKET_ID: 0x133_DPC_dxgkrnl!DpiFdoDpcForIsr

OS_VERSION: 10.0.19041.1

BUILDLAB_STR: vb_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {ba837505-1263-7a6a-27ed-8858d50757c2}



# Child-SP RetAddr Call Site

00 ffffaf00`cdd3fe18 fffff802`1e287372 nt!KeBugCheckEx

01 ffffaf00`cdd3fe20 fffff802`1e126853 nt!KeAccumulateTicks+0x15e2c2

02 ffffaf00`cdd3fe80 fffff802`1e12633a nt!KeClockInterruptNotify+0x453

03 ffffaf00`cdd3ff30 fffff802`1e02ecd5 nt!HalpTimerClockIpiRoutine+0x1a

04 ffffaf00`cdd3ff60 fffff802`1e1f6cba nt!KiCallInterruptServiceRoutine+0xa5

05 ffffaf00`cdd3ffb0 fffff802`1e1f7227 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa

06 ffffd001`f5430970 fffff802`1e10bbd0 nt!KiInterruptDispatchNoLockNoEtw+0x37

07 ffffd001`f5430b00 fffff802`1e109c53 nt!KxWaitForLockOwnerShip+0x20

08 ffffd001`f5430b30 fffff802`2b264667 nt!KeAcquireInStackQueuedSpinLockAtDpcLevel+0x73

09 ffffd001`f5430b60 fffff802`1e0e535e dxgkrnl!DpiFdoDpcForIsr+0x37

0a ffffd001`f5430bb0 fffff802`1e0e4644 nt!KiExecuteAllDpcs+0x30e

0b ffffd001`f5430d20 fffff802`1e1fc255 nt!KiRetireDpcList+0x1f4

0c ffffd001`f5430fb0 fffff802`1e1fc040 nt!KxRetireDpcList+0x5

0d ffffd001`f5baa830 fffff802`1e1fb70e nt!KiDispatchInterruptContinue

0e ffffd001`f5baa860 fffff802`1e1f648b nt!KiDpcInterrupt+0x2ee

0f ffffd001`f5baa9f0 fffff802`2b263b6c nt!KeSynchronizeExecution+0x5b

10 ffffd001`f5baaa30 fffff802`2ef7320e dxgkrnl!DpSynchronizeExecution+0xac

11 ffffd001`f5baaa80 fffff802`2ef93011 nvlddmkm+0x81320e

12 ffffd001`f5baab20 fffff802`1e0a29a5 nvlddmkm+0x833011

13 ffffd001`f5baac10 fffff802`1e1fc868 nt!PspSystemThreadStartup+0x55

14 ffffd001`f5baac60 00000000`00000000 nt!KiStartSystemThread+0x28


See memory dump file here: 2020_11_03_2_MEMORY.DMP


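Same thing again with a slightly different outcome. This time dxgkrnl!DpiFdoDpcForIsr waits for a spin lock at DPC level that is never released, and nvlddmkm (the Nvidia driver) shows up further down the stack. That is probably what causes the DPC_WATCHDOG_VIOLATION again.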
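
To make the stack above easier to read, here is a rough sketch that splits it at nt!KiDpcInterrupt. My reading of the layout (an interpretation, not something the dump states) is that the frames above that point ran in DPC context and the frames below it belong to the thread that was interrupted. The function names `parse_frames`, `split_at_dpc_interrupt` and `non_kernel` are made up for this example.

```python
import re

# Frames copied from the second 0x133 dump above
# (frame number, Child-SP, RetAddr, call site).
STACK = """\
00 ffffaf00`cdd3fe18 fffff802`1e287372 nt!KeBugCheckEx
01 ffffaf00`cdd3fe20 fffff802`1e126853 nt!KeAccumulateTicks+0x15e2c2
02 ffffaf00`cdd3fe80 fffff802`1e12633a nt!KeClockInterruptNotify+0x453
03 ffffaf00`cdd3ff30 fffff802`1e02ecd5 nt!HalpTimerClockIpiRoutine+0x1a
04 ffffaf00`cdd3ff60 fffff802`1e1f6cba nt!KiCallInterruptServiceRoutine+0xa5
05 ffffaf00`cdd3ffb0 fffff802`1e1f7227 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
06 ffffd001`f5430970 fffff802`1e10bbd0 nt!KiInterruptDispatchNoLockNoEtw+0x37
07 ffffd001`f5430b00 fffff802`1e109c53 nt!KxWaitForLockOwnerShip+0x20
08 ffffd001`f5430b30 fffff802`2b264667 nt!KeAcquireInStackQueuedSpinLockAtDpcLevel+0x73
09 ffffd001`f5430b60 fffff802`1e0e535e dxgkrnl!DpiFdoDpcForIsr+0x37
0a ffffd001`f5430bb0 fffff802`1e0e4644 nt!KiExecuteAllDpcs+0x30e
0b ffffd001`f5430d20 fffff802`1e1fc255 nt!KiRetireDpcList+0x1f4
0c ffffd001`f5430fb0 fffff802`1e1fc040 nt!KxRetireDpcList+0x5
0d ffffd001`f5baa830 fffff802`1e1fb70e nt!KiDispatchInterruptContinue
0e ffffd001`f5baa860 fffff802`1e1f648b nt!KiDpcInterrupt+0x2ee
0f ffffd001`f5baa9f0 fffff802`2b263b6c nt!KeSynchronizeExecution+0x5b
10 ffffd001`f5baaa30 fffff802`2ef7320e dxgkrnl!DpSynchronizeExecution+0xac
11 ffffd001`f5baaa80 fffff802`2ef93011 nvlddmkm+0x81320e
12 ffffd001`f5baab20 fffff802`1e0a29a5 nvlddmkm+0x833011
13 ffffd001`f5baac10 fffff802`1e1fc868 nt!PspSystemThreadStartup+0x55
14 ffffd001`f5baac60 00000000`00000000 nt!KiStartSystemThread+0x28
"""

# Two hex digits, two address columns, then the call site.
FRAME_RE = re.compile(r"^[0-9a-f]{2}\s+\S+\s+\S+\s+(?P<site>\S+)$")


def parse_frames(text):
    """Return (module, call_site) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue
        site = match.group("site")
        # "nt!Foo+0x1" -> "nt"; "nvlddmkm+0x81320e" (no symbols) -> "nvlddmkm"
        module = re.split(r"[!+]", site, maxsplit=1)[0]
        frames.append((module, site))
    return frames


def split_at_dpc_interrupt(frames):
    """Split into (frames above nt!KiDpcInterrupt, frames below it)."""
    for index, (_, site) in enumerate(frames):
        if site.startswith("nt!KiDpcInterrupt"):
            return frames[:index], frames[index + 1:]
    return frames, []


def non_kernel(frames):
    """Keep only call sites that are not in the kernel image (nt)."""
    return [site for module, site in frames if module != "nt"]


dpc_side, thread_side = split_at_dpc_interrupt(parse_frames(STACK))
print("DPC side:          ", non_kernel(dpc_side))
print("Interrupted thread:", non_kernel(thread_side))
```

For this dump, the DPC side contains only dxgkrnl!DpiFdoDpcForIsr, while both nvlddmkm frames sit on the interrupted thread. That alone does not show who holds the lock, it only shows which code was waiting and which code was running on that CPU before the DPC started.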


Memory dump with the oldest available Nvidia driver (440.97):


VIDEO_TDR_FAILURE (116)

Attempt to reset the display driver and recover from timeout failed.

Arguments:

Arg1: ffffe085fbe08010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).

Arg2: fffff80475374860, The pointer into responsible device driver module (e.g. owner tag).

Arg3: 0000000000000000, Optional error code (NTSTATUS) of the last failed operation.

Arg4: 000000000000000d, Optional internal context dependent data.



SYMBOL_NAME: nvlddmkm+b24860

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

STACK_COMMAND: .thread ; .cxr ; kb

FAILURE_BUCKET_ID: 0x116_IMAGE_nvlddmkm.sys

OS_VERSION: 10.0.19041.1

BUILDLAB_STR: vb_release

OSPLATFORM_TYPE: x64

OSNAME: Windows 10

FAILURE_ID_HASH: {c89bfe8c-ed39-f658-ef27-f2898997fdbd}



# Child-SP RetAddr Call Site

00 ffffcd8c`6e725868 fffff804`726014be nt!KeBugCheckEx

01 ffffcd8c`6e725870 fffff804`72600b21 dxgkrnl!TdrBugcheckOnTimeout+0xfe

02 ffffcd8c`6e7258b0 fffff804`7200d683 dxgkrnl!TdrIsRecoveryRequired+0x1b1

03 ffffcd8c`6e7258e0 fffff804`7206770c dxgmms2!VidSchiReportHwHang+0x62f

04 ffffcd8c`6e7259e0 fffff804`720a20d7 dxgmms2!VidSchWaitForCompletionEvent+0x33fec

05 ffffcd8c`6e725a60 fffff804`720a11ba dxgmms2!VidSchiWaitForDrainFlipQueue+0x8f

06 ffffcd8c`6e725b50 fffff804`7205aed0 dxgmms2!VidSchiDrainFlipQueue+0x1a

07 ffffcd8c`6e725b80 fffff804`7205acfa dxgmms2!VidSchiRun_PriorityTable+0x1c0

08 ffffcd8c`6e725bd0 fffff804`660a29a5 dxgmms2!VidSchiWorkerThread+0xca

09 ffffcd8c`6e725c10 fffff804`661fc868 nt!PspSystemThreadStartup+0x55

0a ffffcd8c`6e725c60 00000000`00000000 nt!KiStartSystemThread+0x28


See memory dump file here: 2020_11_07_4_MEMORY.DMP

In this case I'm not so sure about the failure mode. It seems to wait for something that most probably never happens.
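
One thing that can be cross-checked in the 0x116 dump above is whether Arg2 and SYMBOL_NAME point at the same place inside nvlddmkm.sys. The sketch below assumes they do and derives the load address of the driver from them. The variable names are mine, and the resulting base address is only implied by these two values, not read from the dump's module list.

```python
# Values copied from the VIDEO_TDR_FAILURE (116) dump above.
ARG2_POINTER = 0xFFFFF80475374860  # Arg2: pointer into the responsible driver module
SYMBOL_OFFSET = 0xB24860           # from SYMBOL_NAME: nvlddmkm+b24860

# ASSUMPTION: Arg2 and SYMBOL_NAME describe the same address. In that case,
# subtracting the offset gives the address where nvlddmkm.sys was loaded.
implied_base = ARG2_POINTER - SYMBOL_OFFSET

# A driver image is mapped on a page boundary, so a plausible base address
# should be a multiple of 4 KiB.
page_aligned = implied_base % 0x1000 == 0

print(f"Implied nvlddmkm base: {implied_base:#018x}")
print(f"Page aligned:          {page_aligned}")
```

The result is a page-aligned address, which is consistent with Arg2 pointing into nvlddmkm.sys at the offset given in SYMBOL_NAME. It says nothing about whether the hang is caused by the driver code or by the hardware behind it.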


System Specs:

  • Windows insider
  • Operating System: Windows 10 Pro 64-bit (10.0, Build 19042) (19041.vb_release.191206-1406)
  • Mainboard: Gigabyte GA-Z77M-D3H (rev. 1.1) with BIOS version F15a (type: UEFI)
  • CPU: Intel(R) Core(TM) i7-3770K (no overclocking)
  • RAM: Corsair DRAM 2x8GB (16 GB), DDR3 1600 MHz with XMP Profile enabled
  • GPU: EVGA NVIDIA GTX 1080 tested with driver version 457.09 and 440.97
  • Storage: 2x Samsung SSDs with 250 GB via SATA in RAID 0



I hope someone can help me with that. Please let me know if any further information is needed.
