Topics Related to Bad Memory


So you think you might have problems with your memory, eh? We don't usually come to this conclusion ourselves, but sometimes we do. In particular, if a system is exhibiting various odd, seemingly unrelated system crashes, we will pronounce that "the computer is sick". Similarly, if programs which are normally rock solid, like TSKMON itself, the command processor, start abnormally terminating, we will say "your computer is sick". One customer with a sick computer had the SYSOP program bomb out of the dashboard with memory protection errors.

A sick computer does not necessarily mean sick memory. But I'll tell you one thing: 9 times out of 10, if your memory IS sick, it's going to PASS any diagnostic you can throw at it with flying colors. I wish I had a paycheck for every time I've had this conversation:

"I think your memory might be sick"
"No way. The computer tests it every time it boots up"

In fact, on the many occasions we HAVE concluded that memory was sick (invariably by observing that the problems instantly go away when the memory is replaced), I've only seen ONE computer that actually reported a problem in the BIOS power-on memory check.

I guess now is as good a time as any to mention that we made an effort to install a "TSX startup memory check" into the operating system. It was not a very successful effort, because conducting the test caused TSX to fail to boot on many computers that it otherwise had no problem with. There is a TSGEN parameter named AUTOMEM that you can set to 1 to activate the test. It might help TSX detect bad memory, and it might also help it detect very odd computers which have huge "holes" in the memory addressing scheme. For example, we have seen computers in which the first 16 megs were accessed at normal addresses, but the upper 16 megs had addresses starting at 2 billion instead of 17 million.
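
To put rough numbers on that last example, here is a small Python sketch of the arithmetic. It assumes "16 megs" means 16 x 1024 x 1024 bytes, and it takes "2 billion" to be exactly 2**31 purely for illustration; the real starting address on those machines is not recorded here.

```python
# Illustrative arithmetic only. The 2**31 start address is an ASSUMPTION
# standing in for "about 2 billion"; it is not a measured value.
MEG = 1024 * 1024

lower_end = 16 * MEG              # first address past the lower 16 megs
expected_upper_start = lower_end  # where the upper 16 megs would normally begin
assumed_upper_start = 2**31       # where they began on the odd machines (assumed)

hole_bytes = assumed_upper_start - expected_upper_start

print(f"lower 16 megs end at   {lower_end:,}")
print(f"upper 16 megs start at {assumed_upper_start:,} (assumed)")
print(f"hole in between:       {hole_bytes:,} bytes ({hole_bytes // MEG} megs)")
```

The "17 million" in the text is just 16 megs written out in bytes (16,777,216), so under these assumptions the hole is a little under 2 gigabytes wide.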

As cheap as memory is these days, if you think yours is bad, replace it and see if that makes the problems go away. In the remainder of this article, I'll address a couple of topics which might avoid your having to do this.

First, you should know that you can use the TOTMEMPGS gen parameter to force TSX not to use all of the memory. Let's suppose that you have 32 megs but you can run for an hour or so in 16 for purposes of testing. Use TOTMEMPGS to hack off the upper 16 megs and force TSX not to use them. If your symptoms disappear, you have good reason to think that the upper SIMM is bad.
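
If it helps to work out a value, here is a minimal Python sketch of the conversion from megabytes to pages. It rests on two assumptions that you should check against your TSGEN documentation before relying on it: that TOTMEMPGS is a count of memory pages, and that a page is 4096 bytes. The function name is made up for this example.

```python
# Sketch: convert an amount of memory into a page count.
# ASSUMPTIONS: TOTMEMPGS counts pages, and a page is 4096 bytes.
# Verify both against your TSGEN documentation.
MEG = 1024 * 1024

def pages_for_megs(megs, page_size=4096):
    """Return the number of whole pages in `megs` megabytes."""
    return (megs * MEG) // page_size

print("all 32 megs:  ", pages_for_megs(32))
print("lower 16 only:", pages_for_megs(16))
```

If your page size turns out to be different, pass it as the second argument rather than editing the arithmetic.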

The obvious next question is, "How can I tell TSX to use the upper 16 megs but not the lower 16 megs?" The answer is, reset TOTMEMPGS so TSX sees all 32 megabytes, and use the RMEMSTART1 and RMEMEND1 parameters to tell TSX to stay away from the bottom 16 megs. I think the commands in the TSX32.GEN file would be:

SET RMEMSTART1=0
SET RMEMEND1=16777216

Remember, those parameters stay set until you set them back to zero; just commenting those lines out of the gen file won't undo the settings.
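
For other ranges, the same two lines can be worked out with a short Python sketch like the one below. It assumes RMEMSTART1 and RMEMEND1 take byte addresses and that the end value is the first byte past the range, which is how the example above reads; the function names are made up for illustration and are not part of TSX.

```python
# Sketch: build the gen-file lines that keep TSX out of a memory range.
# ASSUMPTION: RMEMSTART1/RMEMEND1 take byte addresses, with the end value
# being the first byte past the range, as in the 0..16777216 example.
MEG = 1024 * 1024

def reserve_lines(start_meg, end_meg):
    """SET lines telling TSX to stay away from start_meg up to end_meg."""
    if not 0 <= start_meg < end_meg:
        raise ValueError("need 0 <= start_meg < end_meg")
    return [f"SET RMEMSTART1={start_meg * MEG}",
            f"SET RMEMEND1={end_meg * MEG}"]

def undo_lines():
    """SET lines that put both parameters back to zero."""
    return ["SET RMEMSTART1=0", "SET RMEMEND1=0"]

print("\n".join(reserve_lines(0, 16)))  # stay away from the bottom 16 megs
print("\n".join(undo_lines()))          # explicitly undo it afterwards
```

The second pair of lines is the explicit "set them back to zero" step, for when you are done testing.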

The other thing to keep in mind about "bad memory" is that there are lots of ways computers can get sick besides the memory. We saw one computer that had a bad memory controller. And lots of times, especially with 486/100 motherboards, we have seen the symptoms disappear when the system cache is turned off. This is generally a very painful thing to do; if disabling the cache solves your problem, but slows your system to a snail's pace, you need to chuck that motherboard in the trash. One customer, however, saw an INCREASE in system performance (and resolution of sick computer syndrome) when he disabled his cache.

So two things you can do to try to solve your very frustrating problems, without buying more memory just to test with, are restricting TSX to only part of the memory and disabling the cache. One other thing that comes to mind is to pull out all the cards that you can boot without, and see if the problem is caused by some interference between Ethernet cards, serial multiplexors, and so forth. Finally, these modern BIOSes have a bazillion settings so they can adapt to different hardware configurations. If those settings are off, anything can happen. You might try setting the BIOS to factory defaults, or taking values like memory wait states and making them more conservative to see if you are pushing your hardware past its limits. I hope these comments help -- good luck!