Windows Server The cat finally died! (AGP Aperture Size was wrong.)

  • Thread starter: Robbie Hatley

Robbie Hatley

First of all... wow, no new threads in this group in 6 days?!

That's rather unusual.



Now, to the main topic of this post: As some of you guys and gals

may recall, I've been posting in here every couple weeks for the last

few months about my "COMPUTER KEEPS CRASHING" issue. To recap, my

Win2K-SP4 system has been crashing roughly once a day for months.

The crash always involves the following symptoms happening

simultaneously:

- usb mouse goes offline (mouse pointer stops responding to it)

- serial mouse does NOT go offline (mouse pointer responds)

- all networks (Ethernet, USB, IEEE-1394, Internet) go offline

- sound starts cutting in and out about 3 times per second



I've tried many things to fix it, all to no avail. At times

I thought I'd fixed the problem, only to have it come back

a few days later. As I put it in one post:



But de cat came back, he couldn't stay no long-er,

Yes de cat came back de very next day,

De cat came back -- thought she were a goner,

But de cat came back for it wouldn't stay away.



Well, guess what? I finally killed the damn cat, and the

problem wasn't even REMOTELY close to any of my previous

guesses, or to any of the advice others gave me. (No

criticism of folks here implied; the actual cause was so

bizarre that I don't blame anyone for not guessing it.)

The cause? Incorrect AGP aperture size.



I'd never have guessed that, but for an incident that

occurred about 10 days ago. I was working along, and

suddenly my screen froze for three seconds (stopped

responding to mouse or keyboard), went black for two

seconds, then returned to normal. THAT LOOKED VERY

FAMILIAR. I'd seen that before! It was a video-mode

reset. My old CRT monitor used to go "CLINK", go black

for 2 seconds, go "CLANK", then return to normal. My

new LCD monitor uses solid-state electronics instead of

noisy mechanical relays, so it doesn't CLINK/CLANK, but

it still goes black for 2 seconds during mode reset.



I seemed to recall having that problem before, so I looked

in my Computer Journal (a text file in which I record

computer maintenance issues from time to time). Sure

enough, from 2005, I found these entries:



~~~~~~~~~~~~ BEGIN COMPUTER JOURNAL EXCERPT ~~~~~~~~~~~~~~~~~~~



Sun. Jun. 19, 2005:

I've been having some annoying problems lately:



I've been experiencing frequent video-mode resets, especially

while operating scrollbars on windows screens; the monitor goes

"CLINK", goes black for 2 seconds, goes "CLANK", then returns to

normal, except for some scrambled content in windows, and some

bad pixels in window frames, both of which usually (but not

always) will correct themselves on minimize, restore.



Also, sometimes my system just freezes up in the middle of work,

video image frozen, no response from keyboard (CAP-LOCK and

NUM-LOCK buttons won't toggle LEDs), no response from mouse

(pointer is frozen). I have to press the front-panel "Reset"

button on my machine to un-freeze it.



Tue. Jun. 24, 2005, 4:00AM:

I replaced my video card with a BFG nVidia GeForce MX 4000 128MB

AGP8X. The instructions were very adamant about several issues:

1. The old drivers MUST be uninstalled before installing new

drivers.

2. The BIOS "AGP Aperture" setting MUST match the number of MB

of RAM on the video card.

3. System BIOS and video BIOS "shadow" or "caching" MUST be

turned OFF.



Sat Aug 27, 2005:

After about 90 days of heavy use, the problems listed above are

completely gone. I think now that these problems were entirely

due to a bad video card.



~~~~~~~~~~~~ END COMPUTER JOURNAL EXCERPT ~~~~~~~~~~~~~~~~~~~



So it turned out, I'd had both the crash problem and the video mode

reset problem before. And I'd fixed them, and they'd stayed fixed

for some months.



So, what did I do to fix them? Four things:



1. Replaced video card.

2. Set "System-BIOS Caching" to "Off" in BIOS settings.

3. Set "Video RAM Caching" to "Off" in BIOS settings.

4. Set "AGP Aperture Size" to match video RAM size in BIOS settings.



So what changed, that would cause the malfunctions to resume?



1. Video card? Still same card, and still seems to be working ok.

2. System BIOS Caching? Still "Off".

3. Video RAM Caching? Still "Off".

4. AGP Aperture Size = Video RAM size?

Video RAM Size = 128MB.

AGP Aperture Size = 256MB.

OOPS.



So about 10 days ago (around March 9, 2010) I changed my AGP Aperture

Size, which had somehow got set to 256MB, back to 128MB where it

belongs. I haven't had any crashes or mode resets since.
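
For anyone wanting to repeat that check, here is a minimal sketch in Python of the comparison: the settings noted down after the 2005 fix versus what the BIOS setup screen shows now. It assumes both sets of values are typed in by hand (nothing here reads the BIOS), and the dictionary keys and the function name changed_settings are just illustrative labels.

```python
# Settings as recorded after the 2005 fix (hand-typed; key names are illustrative).
baseline = {
    "System BIOS Caching": "Off",
    "Video RAM Caching": "Off",
    "AGP Aperture Size (MB)": 128,
}

# Settings as read off the BIOS setup screen in March 2010 (also hand-typed).
current = {
    "System BIOS Caching": "Off",
    "Video RAM Caching": "Off",
    "AGP Aperture Size (MB)": 256,
}


def changed_settings(old, new):
    """Return {name: (old_value, new_value)} for every setting that differs."""
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


drift = changed_settings(baseline, current)
for name, (was, now) in drift.items():
    print(f"{name}: was {was!r}, now {now!r}")
```

Keeping the baseline in the same text file as the journal would make a drifted setting like this one easier to spot the next time.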



Odd that AGP Aperture Size is such a critical setting! I actually

examined that setting about a month ago while trying to fix my crash

problem. I looked it up on the web, and most sites give this advice:



"AGP Aperture size can be set to just about anything you want;

it doesn't matter; it has little effect on performance; most

people should just set it to 256MB and leave it there."



The only problem with that advice is, it's 100% pure unadulterated

BULLSHIT! The correct advice would be:



"Always set AGP Aperture Size to the value demanded by your video

card manufacturer. Failure to comply may cause memory corruption

and system crashes due to a mismatch between how much memory the

video driver thinks is allocated for AGP usage, and how much memory

BIOS has *ACTUALLY* allocated. If Aperture size is set too high --

say, 256MB when it should be 128MB -- your video driver will tell

Windows that the upper 128MB of AGP Aperture memory is 'unused and

available for system use'. Windows will then attempt to use memory

which is actually in-use by BIOS and Hardware. This will cause

other drivers -- mouse, networks, sound -- to be overwritten in

RAM, causing the system to crash. Conversely, if AGP Aperture Size

is set too low, your video driver will write video data to unused

memory which is *NOT* linked to your video card, resulting in

corruption of video images."



So I think now that my crashing and video problems in 2005 were *NOT*

due to a bad video card, but rather to AGP Aperture Size being set

incorrectly.



Similarly, my crashing problems over the last few months were not

due to viruses, software, hardware, BIOS, power glitches, cosmic

rays, driver conflicts, services, daemons, demons, aliens, karma,

etc, etc, etc... but rather to AGP Aperture Size being set

incorrectly.



Amazing (and infuriating) how one obscure, poorly-understood

(and completely undocumented) BIOS setting can wreak so much havoc.



Live and Learn.





De cat didn't come back, no, he went away for good,

no more to be seen in dis neighborhood!

Yep, de cat didn't come back, fo he went to meet his maker;

that old man Hatley, he's a real cat breaker!





--

Cheers,

Robbie Hatley

lonewolf at well dot com

www dot well dot com slant tilde lonewolf slant
 
Buffalo wrote:



> PPS: top posted because of the long explaination. :)




.... or you could have edited the quote? :-)
 
The cat came back again. :-( (Not AGP Aperture Size after all.)

"Buffalo" wrote:



> I really don't believe that was the cause of your problems.

> Reducing the Video Aperture Size may have 'fixed' it, I don't

> believe that was what was actually causing the problem.




Apparently not.



> However, if it keeps working for you, so be it.




It didn't. :-(



After working flawlessly for 10 days after I'd decreased my AGP

Aperture size from 256MB to 128MB, on the 11th day, my system

suddenly crashed again.



Same symptoms as always:

- Sound suddenly went silent.

- Mouse pointer stopped responding to USB Mouse.

- Mouse pointer continued responding to serial Mouse.

- All networks went offline.

(All symptoms happen suddenly and simultaneously, always the same.)



> PS: I have never heard of matching the Video Aperture Size to the

> same value of the video ram on your video graphics card.




Different video-card manufacturers have different recommendations.

My card says "set AGP Aperture equal to video RAM size".

My mom's card says "set AGP Aperture equal to 1/4 of system RAM size".



> I know setting it too high can cause your computer to refuse to boot.




Yes, it could eat up too much of the available address space.



> There are other factors to consider when choosing the Video Aperture Size.

> Check Video Aperture Size out on Google or another search engine to learn

> what it is, what is does, and why it 'can' be important.




The "AGP Aperture Size" information available on the Internet tends to be

vague and/or suspect. But I now don't think this is where my problem lies,

anyway.



I'm thinking now it's drivers or hardware. Probably hardware.

Probably VIDEO hardware, because lately I've noticed that these crashes

usually happen when I'm watching YouTube videos or playing video-intensive

games.



I opened up my computer, blew out all the dust, reconnected one fan

that wasn't even running, cleaned all heatsinks, removed and cleaned

and reinstalled all circuit boards, and reseated all power-supply

connectors.



I also noticed that 3 electrolytic capacitors on my video card are

bulging at the top (the 3 triangular pressure relief panels are splitting

apart) and they're leaking a brown crumbly substance (electrolyte?).

Not a good sign.



Worse, 3 electrolytic capacitors on my motherboard are also bulging,

splitting, and leaking.



I'm now suspecting out-of-spec power supply unit, causing damage to

both MB and video card.



Unless the capacitor damage is due to heat, in which case it could be

that the video-card and MB manufacturers chose capacitors which don't

perform well when exposed to constant 110F temperatures for 7 years.



So really, I need a new case, fans, power supply, MB, and video card.

But I can't afford any of it.



So now, my approach to the crash issue is:

1. I'm increasing ventilation, trying to get temperatures down.

2. I uninstalled and reinstalled all of my motherboard, video-card,

and monitor drivers back to factory originals, in case it's

an updated driver causing the crashes.

3. I re-installed all of the software I uninstalled earlier, because

this is clearly not a software issue.

4. If it keeps crashing, I'll just have to live with it until I can

afford to upgrade hardware.



This is becoming increasingly off-topic in this group because I can

see now it's likely not a Windows-2000 issue, so I'll stop posting

about this issue in this group, unless future evidence indicates

the OS is involved.



--

Cheers,

Robbie Hatley

lonewolf at well dot com

www dot well dot com slant tilde lonewolf slant
 
The cat came back again. :-( (Not AGP Aperture Size after all.)

Robbie Hatley wrote:



> I'm thinking now it's drivers or hardware. Probably hardware.

> Probably VIDEO hardware, because lately I've noticed that these crashes

> usually happen when I'm watching YouTube videos or playing video-intensive

> games.




I know it's not supposed to matter these days but in the old days we'd

have suspected an IRQ problem. Have you checked the BIOS settings in

that area .... so that it *is* letting Windows handle the interrupt

settings?
 
APIC assigns IRQ 20 to all Southbridge functions.

"Sid Elbow" wrote:



> Robbie Hatley wrote:

>

> > I'm thinking now it's drivers or hardware. Probably hardware.

> > Probably VIDEO hardware, because lately I've noticed that these crashes

> > usually happen when I'm watching YouTube videos or playing video-intensive

> > games.


>

> I know it's not supposed to matter these days but in the old days we'd

> have suspected an IRQ problem. Have you checked the BIOS settings in

> that area .... so that it *is* letting Windows handle the interrupt

> settings?




I did check that out. I set all the IRQs manually in BIOS instead

of letting the PIC (Programmable Interrupt Controller) set them

automatically.



BUT, that's irrelevant, because when Windows starts up, it turns

the PIC chip *OFF*, and uses the APIC (Advanced Programmable

Interrupt Controller) chip instead. The APIC remaps all the IRQs.



"System Information/Hardware Resources/Conflicts & Sharing" reveals

that everything has a separate IRQ except for the following 7 items,

all of which are sharing IRQ 20:



1. USB Host Controller 1

2. USB Host Controller 2

3. USB 2.0 Host Controller

4. Network Controller

5. Audio Processing Unit

6. Audio Codec Interface

7. IEEE-1394 ("Firewire") Controller



Those are exactly the items that all go off-line instantly

(and *ONLY* those items) when my system crashes.



I don't think it's an IRQ issue, though. I happen to know that

those 7 items are all handled by the same IC on the motherboard,

namely the "Southbridge". So it makes sense that this chip

would have only one IRQ.
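
The observation above, that the devices sharing IRQ 20 are the same ones that drop out in a crash, can be checked mechanically. A rough sketch in Python, assuming the device/IRQ pairs are copied by hand from System Information. The serial-port IRQ below is a placeholder (the post does not give it), and the function names are made up for illustration.

```python
from collections import defaultdict

# (device, irq) pairs. The IRQ 20 entries are the seven listed in the post;
# the serial-port entry is a hypothetical value, for illustration only.
assignments = [
    ("USB Host Controller 1", 20),
    ("USB Host Controller 2", 20),
    ("USB 2.0 Host Controller", 20),
    ("Network Controller", 20),
    ("Audio Processing Unit", 20),
    ("Audio Codec Interface", 20),
    ("IEEE-1394 Controller", 20),
    ("Serial Port (serial mouse)", 4),  # assumed IRQ, not from the post
]

# Devices that go offline in a crash, per the post.
failed = {
    "USB Host Controller 1",
    "USB Host Controller 2",
    "USB 2.0 Host Controller",
    "Network Controller",
    "Audio Processing Unit",
    "Audio Codec Interface",
    "IEEE-1394 Controller",
}


def devices_by_irq(pairs):
    """Group device names by the IRQ they are assigned to."""
    groups = defaultdict(list)
    for device, irq in pairs:
        groups[irq].append(device)
    return dict(groups)


def shared_irqs(groups):
    """Keep only the IRQs that more than one device is using."""
    return {irq: devs for irq, devs in groups.items() if len(devs) > 1}


shared = shared_irqs(devices_by_irq(assignments))
for irq, devs in shared.items():
    match = set(devs) == failed
    print(f"IRQ {irq}: {len(devs)} devices, matches failed set: {match}")
```

A match like this only shows that the failing devices have something in common; it does not say whether the common factor is the interrupt line or the chip behind it.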



Something is screwing-up the Southbridge. Voltage, temperature,

drivers, hardware conflict (something jamming the PCI bus), or some

such thing.



Maybe too much heat. I notice the Southbridge heatsink is so hot it

burns my finger when I touch it. I'll leave the computer case's

left cover off for the next fortnight and see if the crashing stops.



I'll also keep Winbond's "Hardware Doctor" up and running in the

background from now on. It monitors voltages and temperatures and

sets off an audio alarm if something gets out of spec.
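
The general idea of that kind of monitoring (compare each reading to an allowed range, raise an alarm when one falls outside) can be sketched in a few lines of Python. This is not how Hardware Doctor itself works internally, which I don't know. The voltage limits below assume the usual plus-or-minus 5% tolerance quoted for ATX supply rails, the temperature limit is an arbitrary placeholder, and the sample readings are invented.

```python
# Allowed (low, high) range per sensor. Voltage ranges assume a +/-5% tolerance;
# the temperature ceiling is a made-up placeholder, not a vendor figure.
LIMITS = {
    "+12V": (11.40, 12.60),
    "+5V": (4.75, 5.25),
    "+3.3V": (3.135, 3.465),
    "Southbridge temp (C)": (0.0, 70.0),
}


def out_of_spec(readings, limits=LIMITS):
    """Return (name, value, low, high) for every reading outside its range."""
    problems = []
    for name, value in readings.items():
        low, high = limits[name]
        if not (low <= value <= high):
            problems.append((name, value, low, high))
    return problems


# Invented sample readings, for illustration only.
sample = {
    "+12V": 11.2,
    "+5V": 5.02,
    "+3.3V": 3.31,
    "Southbridge temp (C)": 82.0,
}

alarms = out_of_spec(sample)
for name, value, low, high in alarms:
    print(f"ALARM: {name} = {value} (allowed {low} to {high})")
```

Logging the readings with a timestamp, rather than only sounding an alarm, would show whether a voltage sag or a temperature spike comes just before each crash.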



But in the end, in light of the bulging/leaking capacitors I found

on my video card and MB yesterday, I think now that the problem is

"hardware wearout". I need a new computer. Sigh. Once I get a

decent job, I'll build myself a new machine and put this one out to

pasture.



--

Cheers,

Robbie Hatley

lonewolf at well dot com

www dot well dot com slant tilde lonewolf slant
 
APIC assigns IRQ 20 to all Southbridge functions.

Robbie Hatley wrote:

> But in the end, in light of the bulging/leaking capacitors I found

> on my video card and MB yesterday, I think now that the problem is

> "hardware wearout". I need a new computer. Sigh. Once I get a

> decent job, I'll build myself a new machine and put this one out to

> pasture.




If you can't replace the caps yourself, perhaps you could look up a

replacement board for your PC (exact same model) and then you could just

replace that without really installing anything else. You may find it on the

Internet really cheap.

If you buy a different model motherboard, then you may have to install mb

drivers, etc.

Buffalo

PS: Yes, there were a bunch of bad caps being used years ago, even by some

of the big names.
 