Garrett1347
Hello. I have what I believe is a hardware error. Four times today I have gotten a BSOD with a WHEA error. In the crash dump I see an error involving my GPU. I used !errrec ffff858f76155038 to see that there is a PCI Express error, but I cannot figure out how to interpret the AER information; I'm way out of my depth already. I've reseated my GPU and PCIe riser twice and cleaned out my case. Windows and the GPU drivers have both been updated.
My questions:
1: What is the actual error? I see that there is an error, but I cannot figure out what it is.
2: Can this information pinpoint my issue to my PCIe slot, PCIe riser, or GPU?
3: What do the Bucket IDs and blackboxes mean?
4: There is a warning at the top "*** WARNING: Unable to verify timestamp for nvlddmkm.sys". How can I fix this?
5: From googling, this could be a heat issue or maybe a driver issue. How could I tell if it is a heat issue? I monitor the GPU temperature, and the BSOD has occurred at a range of different temperatures.
Symbol search path is: srv*
Executable search path is:
Windows 10 Kernel Version 18362 MP (12 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Machine Name:
Kernel base = 0xfffff806`58400000 PsLoadedModuleList = 0xfffff806`58848150
Debug session time: Thu Apr 23 15:07:17.951 2020 (UTC - 4:00)
System Uptime: 0 days 0:10:01.572
Loading Kernel Symbols
...............................................................
................................................................
................................................................
.......
Loading User Symbols
Loading unloaded module list
.........
For analysis of this file, run !analyze -v
0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000004, PCI Express Error
Arg2: ffff858f76155038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000
Debugging Details:
------------------
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
KEY_VALUES_STRING: 1
PROCESSES_ANALYSIS: 1
SERVICE_ANALYSIS: 1
STACKHASH_ANALYSIS: 1
TIMELINE_ANALYSIS: 1
DUMP_CLASS: 1
DUMP_QUALIFIER: 400
BUILD_VERSION_STRING: 10.0.18362.815 (WinBuild.160101.0800)
SYSTEM_PRODUCT_NAME: To Be Filled By O.E.M.
SYSTEM_SKU: To Be Filled By O.E.M.
SYSTEM_VERSION: To Be Filled By O.E.M.
BIOS_VENDOR: American Megatrends Inc.
BIOS_VERSION: P5.70
BIOS_DATE: 06/25/2019
BASEBOARD_MANUFACTURER: ASRock
BASEBOARD_PRODUCT: X370 Gaming-ITX/ac
BASEBOARD_VERSION:
DUMP_FILE_ATTRIBUTES: 0x8
Kernel Generated Triage Dump
DUMP_TYPE: 2
BUGCHECK_P1: 4
BUGCHECK_P2: ffff858f76155038
BUGCHECK_P3: 0
BUGCHECK_P4: 0
HARDWARE_VENDOR_ID: 10DE
HARDWARE_DEVICE_ID: 2184
BUGCHECK_STR: 0x124_AuthenticAMD
CPU_COUNT: c
CPU_MHZ: e09
CPU_VENDOR: AuthenticAMD
CPU_FAMILY: 17
CPU_MODEL: 71
CPU_STEPPING: 0
BLACKBOXBSD: 1 (!blackboxbsd)
BLACKBOXNTFS: 1 (!blackboxntfs)
BLACKBOXPNP: 1 (!blackboxpnp)
BLACKBOXWINLOGON: 1
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT
PROCESS_NAME: System
CURRENT_IRQL: b
ANALYSIS_SESSION_HOST: THURSBY
ANALYSIS_SESSION_TIME: 04-23-2020 20:35:27.0543
ANALYSIS_VERSION: 10.0.18362.1 amd64fre
STACK_TEXT:
fffff806`5e675528 fffff806`58eff188 : 00000000`00000124 00000000`00000004 ffff858f`76155038 00000000`00000000 : nt!KeBugCheckEx
fffff806`5e675530 fffff806`5cba1920 : ffff858f`76182130 00000000`00000000 ffff858f`76155038 00000000`00000000 : hal!HalBugCheckSystem+0xd8
fffff806`5e675570 fffff806`587410a2 : ffff858f`76182130 fffff806`5e6755f9 00000000`00000000 ffff858f`76155038 : PSHED!PshedBugCheckSystem+0x10
fffff806`5e6755a0 fffff806`5d2ce884 : 00000000`00000000 00000000`00000000 ffff858f`761816c0 ffff858f`76154010 : nt!WheaReportHwError+0x382
fffff806`5e675660 fffff806`5d2cf002 : ffffdb00`d8d66a00 fffff806`5e675700 ffffdb00`d8d66ab0 ffff858f`7c04b000 : pci!ExpressRootPortAerInterruptRoutine+0x270
fffff806`5e6756c0 fffff806`5d2cf0b9 : ffffdb00`d8d66a00 fffff806`6392fad7 ffff858f`7d011d90 fffff806`63f1f2ce : pci!ExpressRootPortInterruptRoutine+0x22
fffff806`5e675720 fffff806`58457921 : ffff858f`7d002000 ffff858f`7cef95f0 00000000`00000000 fffff806`585c4860 : pci!ExpressRootPortMessageRoutine+0x9
fffff806`5e675750 fffff806`5842cc75 : 00000000`00000000 00000000`00000000 00000000`00000033 00001f80`00000e00 : nt!KiInterruptMessageDispatch+0x11
fffff806`5e675780 fffff806`585c3b9f : fffff806`5e6758c0 ffffdb00`d8d66a00 00000000`00000001 fffff806`6392e9b4 : nt!KiCallInterruptServiceRoutine+0xa5
fffff806`5e6757d0 fffff806`585c3e67 : ffff858f`7acfa000 fffff806`6392ef24 00000000`00000000 ffff858f`7acfa000 : nt!KiInterruptSubDispatch+0x11f
fffff806`5e675840 fffff806`5848944e : ffff858f`00000000 fffff806`63924b21 00000000`00000001 ffff858f`7ae0c010 : nt!KiInterruptDispatch+0x37
fffff806`5e6759d0 fffff806`6391c30c : fffff806`5e675a30 00000000`00000004 00000000`00000000 ffff4136`9690ed17 : nt!KzLowerIrql+0x1e
fffff806`5e675a00 fffff806`5e675a30 : 00000000`00000004 00000000`00000000 ffff4136`9690ed17 fffff806`640f28a4 : nvlddmkm+0x10c30c
fffff806`5e675a08 00000000`00000004 : 00000000`00000000 ffff4136`9690ed17 fffff806`640f28a4 fffff806`639264b7 : 0xfffff806`5e675a30
fffff806`5e675a10 00000000`00000000 : ffff4136`9690ed17 fffff806`640f28a4 fffff806`639264b7 00000000`00000002 : 0x4
THREAD_SHA1_HASH_MOD_FUNC: 9942ada41b31ad6db57124b7001a4ac9b6f6b447
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: 9ff85a32e98290cb6c35521368931aeb8a562fe1
THREAD_SHA1_HASH_MOD: 3cd8166e6fc0c5e4e6cd65bd834411554e61d4a6
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: AuthenticAMD
IMAGE_NAME: AuthenticAMD
DEBUG_FLR_IMAGE_TIMESTAMP: 0
STACK_COMMAND: .thread ; .cxr ; kb
FAILURE_BUCKET_ID: 0x124_AuthenticAMD_PCIEXPRESS_VENID_10DE_DEVID_2184_UNSUPPORTED_REQUEST_UNEXPECTED_COMPLETION_COMPLETION_TIMEOUT
BUCKET_ID: 0x124_AuthenticAMD_PCIEXPRESS_VENID_10DE_DEVID_2184_UNSUPPORTED_REQUEST_UNEXPECTED_COMPLETION_COMPLETION_TIMEOUT
PRIMARY_PROBLEM_CLASS: 0x124_AuthenticAMD_PCIEXPRESS_VENID_10DE_DEVID_2184_UNSUPPORTED_REQUEST_UNEXPECTED_COMPLETION_COMPLETION_TIMEOUT
TARGET_TIME: 2020-04-23T19:07:17.000Z
OSBUILD: 18362
OSSERVICEPACK: 815
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 272
PRODUCT_TYPE: 1
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
OSEDITION: Windows 10 WinNt TerminalServer SingleUserTS
OS_LOCALE:
USER_LCID: 0
OSBUILD_TIMESTAMP: unknown_date
BUILDDATESTAMP_STR: 160101.0800
BUILDLAB_STR: WinBuild
BUILDOSVER_STR: 10.0.18362.815
ANALYSIS_SESSION_ELAPSED_TIME: 97d
ANALYSIS_SOURCE: KM
FAILURE_ID_HASH_STRING: km:0x124_authenticamd_pciexpress_venid_10de_devid_2184_unsupported_request_unexpected_completion_completion_timeout
FAILURE_ID_HASH: {e0a29fee-d01a-e76c-63f8-d723054dad61}
Followup: MachineOwner
---------
0: kd> !errrec ffff858f76155038
===============================================================================
Common Platform Error Record @ ffff858f76155038
-------------------------------------------------------------------------------
Record Id : 01d619a101f67288
Severity : Informational (3)
Length : 672
Creator : Microsoft
Notify Type : PCI Express Error
Timestamp : 4/23/2020 19:07:17 (UTC)
Flags : 0x00000000
===============================================================================
Section 0 : PCI Express
-------------------------------------------------------------------------------
Descriptor @ ffff858f761550b8
Section @ ffff858f76155148
Offset : 272
Length : 208
Flags : 0x00000001 Primary
Severity : Recoverable
Port Type : Legacy Endpoint
Version : 1.1
Command/Status: 0x0010/0x0007
Device Id :
VenIdevId : 10de:2184
Class code : 018000
Function No : 0x00
Device No : 0x00
Segment : 0x0000
Primary Bus : 0x0c
Second. Bus : 0x00
Slot : 0x0000
Dev. Serial # : 0000000000000000
Express Capability Information @ ffff858f7615517c
Device Caps : 10008de1 Role-Based Error Reporting: 1
Device Ctl : 2917 ur FE NF CE
Dev Status : 001b UR fe NF CE
Root Ctl : 0000 fs nfs cs
AER Information @ ffff858f761551b8
Uncorrectable Error Status : 00114000 UR ecrc mtlp rof UC ca CTO fcp ptlp sd dlp und
Uncorrectable Error Mask : 00000000 ur ecrc mtlp rof uc ca cto fcp ptlp sd dlp und
Uncorrectable Error Severity : 00462030 ur ecrc MTLP ROF uc ca cto FCP ptlp SD DLP und
Correctable Error Status : 0000a000 ADV rtto rnro dllp tlp re
Correctable Error Mask : 00000000 adv rtto rnro dllp tlp re
Caps & Control : 00000014 ecrcchken ecrcchkcap ecrcgenen ecrcgencap FEP
Header Log : 04000001 00002203 0c040000 f7f7f7f7
Root Error Command : 00000000 fen nfen cen
Root Error Status : 00000000 MSG# 00 fer nfer fuf mur ur mcr cer
Correctable Error Source ID : 00,00,00
Correctable Error Source ID : 00,00,00
===============================================================================
Section 1 : Processor Generic
-------------------------------------------------------------------------------
Descriptor @ ffff858f76155100
Section @ ffff858f76155218
Offset : 480
Length : 192
Flags : 0x00000000
Severity : Informational
Proc. Type : x86/x64
Instr. Set : x64
CPU Version : 0x0000000000870f10
Processor ID : 0x0000000000000000
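For reference, the two AER values in the dump above (Uncorrectable Error Status 00114000 and Uncorrectable Error Severity 00462030) can be decoded by hand. The following is a minimal sketch, assuming the standard PCI Express AER bit positions for those registers; the table and function names are illustrative and not part of WinDbg. A status bit that is set marks an error that was logged, and the matching severity bit says whether the device treats that error as fatal or non-fatal.

```python
# Assumed bit layout of the PCIe AER Uncorrectable Error Status register.
# The abbreviations in parentheses match the flags printed by !errrec.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error (dlp)",
    5: "Surprise Down (sd)",
    12: "Poisoned TLP (ptlp)",
    13: "Flow Control Protocol Error (fcp)",
    14: "Completion Timeout (cto)",
    15: "Completer Abort (ca)",
    16: "Unexpected Completion (uc)",
    17: "Receiver Overflow (rof)",
    18: "Malformed TLP (mtlp)",
    19: "ECRC Error (ecrc)",
    20: "Unsupported Request (ur)",
}


def decode_uncorrectable(status, severity):
    """Return (error name, 'fatal' or 'non-fatal') for each bit set in status."""
    results = []
    for bit, name in sorted(UNCORRECTABLE_BITS.items()):
        if status & (1 << bit):
            kind = "fatal" if severity & (1 << bit) else "non-fatal"
            results.append((name, kind))
    return results


# Values copied from the AER Information section of the dump.
for name, kind in decode_uncorrectable(0x00114000, 0x00462030):
    print(f"{name}: {kind}")
```

With the values from this dump, the set bits are Completion Timeout, Unexpected Completion, and Unsupported Request, which agrees with the uppercase flags (UR, UC, CTO) in the !errrec output and with the suffix of the FAILURE_BUCKET_ID string.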