(NOTE: For the latest developments from every motherboard maker on this developing recall, please see this PCSTATS article - Intel 6-Series Chipset Recall - Sandy Bridge Intel P67 & H67) "Wanting to help you, our readers, we searched for available information on the problem. Since many would like to hear from Intel itself, we contacted Mikhail Rybakov, Intel PR Manager Russia/C.I.S., by phone and asked him a few questions. Here's what we've managed to find out.
So what's the problem? One of the transistors turned out to have a higher leakage current than planned. Either the dielectric layer was too thin for the chosen voltage, or the voltage was too high for that chip design; it's not clear how the error was made. Such things happen much more often than we hear about them. In this case, however, Intel was unlucky, because the problematic transistor sits in the clock generator circuit for the four SATA-300 (3 Gb/s) ports. Under certain conditions this may cause controller synchronization errors, which in turn can lead to read and write errors. At best, this reduces drive performance, since data has to be read or written several times until the transfer is confirmed. Under the least favorable conditions, data may be corrupted. This is a possibility, not a certainty.
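To see why retries cost performance, here is a rough sketch. It assumes, hypothetically, that each transfer fails independently with probability `p_error`, that every error is detected, and that a failed transfer is simply repeated until it succeeds. Real SATA error recovery also spends time on timeouts and link resets, which this ignores. The function names and error rates are illustrative only.

```python
def expected_attempts(p_error: float) -> float:
    """Mean number of attempts per transfer when each attempt fails
    independently with probability p_error and is repeated until it
    succeeds (geometric distribution)."""
    if not 0.0 <= p_error < 1.0:
        raise ValueError("p_error must be in [0, 1)")
    return 1.0 / (1.0 - p_error)


def effective_throughput(raw_mb_s: float, p_error: float) -> float:
    """Useful throughput once repeated transfers are accounted for.

    Simplification: the extra time spent detecting each error is ignored.
    """
    return raw_mb_s / expected_attempts(p_error)


if __name__ == "__main__":
    # 300 MB/s is roughly the nominal payload rate of a SATA 3 Gb/s link;
    # the error rates below are hypothetical.
    for p in (0.0, 0.01, 0.1, 0.5):
        print(f"p_error={p:<5} -> {effective_throughput(300.0, p):6.1f} MB/s")
```

For example, `effective_throughput(300.0, 0.5)` gives 150.0 MB/s: if half the attempts fail, each transfer needs two attempts on average. Undetected errors are a different matter, since retrying cannot help with data that was corrupted silently.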
This is not a logic error in the die topology (such as a broken interconnect), but a potential problem that may appear over time as a result of wear. Serious errors are caught as soon as the first wafer is made, because the chips are run through a number of logic tests. How does one find a less serious error? All manufacturers use broadly similar accelerated aging methods: a batch of chips is exposed to high temperatures in a heat chamber, as well as to elevated voltages, to simulate prolonged wear. Rigorous mathematical models then allow engineers to predict the mean time between failures (MTBF) from the failure statistics gathered in these wear tests. That's exactly what we're dealing with today: a prediction from Intel (we'll discuss the exact figures and time frames later). It's important to understand that this is a statistical estimate, not an observed fact. There are simply no three-year-old machines based on the new chipsets yet in which actual defects could have shown up.
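Intel has not published the models behind its prediction. As a rough illustration only, the sketch below uses one common textbook approach: an Arrhenius model for temperature acceleration, a simple exponential model for voltage acceleration, and a constant-failure-rate point estimate of MTBF (equivalent device-hours at use conditions divided by the number of failures). The activation energy, voltage coefficient, and test figures are hypothetical assumptions, not Intel's numbers.

```python
import math

# Boltzmann constant in eV/K (CODATA value).
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Thermal acceleration factor between use and stress temperatures (Arrhenius model)."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def voltage_af(gamma_per_v: float, v_use: float, v_stress: float) -> float:
    """Voltage acceleration factor from a simple exponential model.

    gamma_per_v is an assumed, empirically fitted parameter.
    """
    return math.exp(gamma_per_v * (v_stress - v_use))


def mtbf_point_estimate(devices: int, stress_hours: float, failures: int, total_af: float) -> float:
    """Crude MTBF estimate at use conditions, assuming a constant failure rate.

    Simplification: every device is counted as running the full stress time,
    even those that failed part-way through the test.
    """
    if failures <= 0:
        raise ValueError("a point estimate needs at least one failure")
    equivalent_use_hours = devices * stress_hours * total_af
    return equivalent_use_hours / failures


if __name__ == "__main__":
    # Hypothetical stress test: 500 chips, 1000 h at 125 C and 1.30 V,
    # with assumed use conditions of 55 C and 1.05 V.
    af = arrhenius_af(0.7, 55.0, 125.0) * voltage_af(2.0, 1.05, 1.30)
    mtbf = mtbf_point_estimate(500, 1000.0, 2, af)
    print(f"combined acceleration factor: {af:.1f}")
    print(f"estimated MTBF: {mtbf:,.0f} hours ({mtbf / 8760:.1f} years)")
```

The large acceleration factor is what lets roughly six weeks in a chamber stand in for years of normal use. Real reliability work usually reports confidence bounds rather than a bare point estimate, and wear-out mechanisms are often modeled with Weibull statistics instead of a constant failure rate.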
Since the data stored on a computer is often worth much more than the computer itself (unless it's a gaming rig), Intel made the tough decision not to wait for actual trouble. As Murphy's law states, "Anything that can go wrong will go wrong," so they had to find a solution."