ATCTech_JQS
I'm a system builder at a mom-and-pop PC repair shop / MSP. I've recently built a few systems with nearly identical hardware configurations, and some have been producing increasingly frequent bluescreens with the bugcheck from the title. I have done my own analysis with BlueScreenView (and I can readily upload the dump files). The official MS documentation on this bugcheck says that Parameter 1 identifies the point of failure more precisely. In every case I've seen, Parameter 1 is "0x7", which indicates a BOOT failure. However, further resources are limited, and I've found virtually nothing explaining what a "BOOT" failure actually is or how to remedy it.
The troubleshooting I've done thus far:
Run memtest86+, in some cases for several hours. No errors detected.
Run hardware stress tests on the system as a whole, as well as memory, CPU, disks, individually. No crash or error condition resulted from these tests.
Replaced the motherboard outright in one machine after going through some very basic troubleshooting with the manufacturer (ASRock). We suspected this may have been a known issue to them, perhaps drivers or a manufacturing defect. They were not especially helpful: once I indicated that I had compatible memory from their QVL that had been thoroughly memtested, they simply opted to RMA the board. I hadn't even shipped the "failed" one out before getting a replacement installed, which promptly began crashing in the same way.
Installed all available driver and Windows updates.
Checked physical connections (SATA, power, etc.).
Run chkdsk /r
The hardware specs for the machines are as follows:
ASRock J3455-ITX motherboard
Samsung 860 EVO 250GB SSD
8GB Crucial DDR3-1600L
Seasonic 520W M12II Modular PSU
Cooler Master Elite 130 ITX chassis
Rarely do these machines even have discrete graphics cards. As you can see, there aren't many components to blame for hardware failures. We admittedly have not done much to rule out the SSD. I'm well aware SSDs are not bulletproof (literally or figuratively) but the notion that the SSDs in 2 systems built over a month apart failed nearly simultaneously doesn't sit well. These problem machines are from separate clients, in different locations, so it is unlikely to be some sort of environmental issue.
I feel relatively confident this is not a hardware issue - more likely driver or OS, perhaps BIOS. As such, I'm mostly looking for some help in analyzing the dump files. They produce diagnostic information with a rate of consistency I very rarely, if ever, see, but kernel debugging surpasses my expertise a bit.
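For anyone who wants to check the consistency across dumps themselves, here is a minimal sketch of how the crashes could be tallied by bugcheck code and Parameter 1. It assumes the BlueScreenView list has been exported to CSV; the column names ("Bug Check Code", "Parameter 1") and the sample rows are assumptions for illustration, with made-up values, and are not taken from my actual dumps.

```python
import csv
import io
from collections import Counter


def tally_bugchecks(rows, code_col="Bug Check Code", param_col="Parameter 1"):
    """Count crashes per (bugcheck code, parameter 1) pair.

    `rows` is any iterable of dicts, e.g. a csv.DictReader.
    The column names are assumptions about the export format.
    """
    counts = Counter()
    for row in rows:
        # Both fields are hex strings; parameters may contain a backtick
        # separating the high and low 32 bits (e.g. 00000000`00000007).
        code = int(row[code_col], 16)
        param1 = int(row[param_col].replace("`", ""), 16)
        counts[(code, param1)] += 1
    return counts


# Made-up sample export; 0x00000abc is a placeholder, not a real bugcheck.
SAMPLE = """Dump File,Bug Check Code,Parameter 1
a.dmp,0x00000abc,00000000`00000007
b.dmp,0x00000abc,00000000`00000007
c.dmp,0x00000abc,00000000`00000007
d.dmp,0x00000abc,00000000`00000001
"""

counts = tally_bugchecks(csv.DictReader(io.StringIO(SAMPLE)))
for (code, param1), n in counts.most_common():
    print(f"bugcheck {code:#x}, parameter 1 {param1:#x}: {n} crash(es)")
```

If nearly every crash lands in the same (code, Parameter 1) bucket across machines, that would back up what I'm seeing by eye in the individual dumps.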