TCLUG Archive

Re: [TCLUG:5462] "Race Condition"?



At 11:23 PM 4/21/99 -0500, you wrote:
>- Well it can be any shared resource not neccessarily memory. It
> could be a harddisk, or a serial port, or a mouse.
>- It does not have to be processes in software, it can be hardware
> mechanical or electrical or chemical processes even.

I thought I would pipe up with a couple of examples that can occur even on the simplest of hardware (and the funniest stories I could recall). The sharing is not necessarily obvious...

Let's say you have a program that is trying to save "your most important data" to disk... you could be (basically) in a race condition with the power -- will the power fail before all the data gets committed to disk? The usual time this is a concern is when your machine switches over to a UPS and has only so long to shut down.

Funny example from days of yore (i.e., my dad): in the 70s a Colorado power utility bought a bunch of mainframes from a certain manufacturer in the Twin Cities. The mainframes ran on 400 Hz power (60 Hz is basically bad), and used a flywheel in the power converter to change the cycle rate. The flywheel would keep things going during minor (fraction of a second) power dips. The power utility had about two bowling alleys' worth of lead-acid batteries to keep the mainframes going for one second after a power failure; they even had little go-karts to run around and tend to the batteries. The one second (or so) was enough time to fire up the diesel generator, which would then power the mainframes. The only time this emergency system was used live, everything worked except the starter battery for the generator.
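The "race with the power" in the save-to-disk case can at least be narrowed in software. Here is a rough sketch in Python, not anything from the original discussion: the function name save_durably is made up for illustration, and the approach (write a temporary file, force it to disk, then rename it over the old one) is one common technique, assuming a filesystem where rename is atomic. It does not sync the containing directory, so it shrinks the window rather than closing it.

```python
import os
import tempfile


def save_durably(path, data):
    """Hypothetical helper: replace `path` with `data` (bytes) so that a
    crash leaves either the old file or the new one, not a half-written mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the OS
            os.fsync(tmp.fileno())   # ask the OS to push it to the disk
        os.replace(tmp_path, path)   # swap the new file into place
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the power fails before os.replace, the old file is still there; if it fails after, the new one is. The race is still on, but losing it costs you less.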

Level interrupts (like the PCI bus) are a race condition for some hardware/software designs. One of NT's problems (and NetWare's, and...) has to do with how it handles interrupts on the PCI bus. Basically, an interrupt signal is a voltage change (it rises or falls), holds, then goes back. A level interrupt is noted by the CPU after the voltage change. On a lot of level interrupt systems the line is held until the interrupt handler tells the device to release it. Here is the catch: the software must tell it to go away in a coordinated and time-constrained fashion (sti/cli only control whether the CPU accepts interrupts; they do not make the device drop the line). Otherwise, as soon as interrupts are enabled again, the CPU will register another interrupt, usually repeatedly. What happens then is that the CPU dumps a bunch of stuff onto the interrupt-time stack and transfers control to the interrupt handler... doing this repeatedly will cause the stack to overflow (which causes another CPU exception, etc.).
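To make the re-entry problem a bit more concrete, here is a toy model in Python. It is only a sketch: Device, deliver, InterruptStackOverflow and max_depth are all invented for illustration, and real hardware is far messier. It assumes a handler that lets interrupts back in before the device has dropped the line, so each pass nests one level deeper on the interrupt stack.

```python
class InterruptStackOverflow(Exception):
    """Raised when the modeled interrupt stack runs out of room."""


class Device:
    """Hypothetical device driving a level-triggered interrupt line."""

    def __init__(self):
        self.line_asserted = False

    def raise_interrupt(self):
        self.line_asserted = True

    def acknowledge(self):
        # The handler tells the device to release the line.
        self.line_asserted = False


def deliver(device, handler, max_depth=8):
    """Model a CPU taking interrupts for as long as the line is asserted.

    Returns how many times the handler was entered. max_depth stands in
    for the limited size of the interrupt-time stack.
    """
    entries = 0

    def enter(depth):
        nonlocal entries
        if depth > max_depth:
            raise InterruptStackOverflow("interrupt stack exhausted")
        entries += 1
        handler(device)
        # Line still held: the CPU sees another interrupt and pushes
        # another frame on top of the one already there.
        if device.line_asserted:
            enter(depth + 1)

    if device.line_asserted:
        enter(1)
    return entries
```

A handler that calls device.acknowledge() is entered once and that is the end of it. A handler that forgets (or is too slow, which this toy does not model) gets re-entered until the modeled stack gives out.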

According to http://www.infoworld.com/cgi-bin/displayStory.pl?990317.ecxeon.htm this is a reason why NT wants more than one CPU:

Dual-processor Pentium III Xeon systems should also help IT managers trying to overcome Windows NT's tendency to spike to 100 percent usage when subjected to numerous simultaneous interrupts, which in turn leads to system crashes, said analysts at the Aberdeen Group, in Boston. Deploying a second processor should help alleviate some of those crashes, they said.

All good things,
Randy Maas
randym@acm.org