Renowned hardware historian and reverse engineer Ken Shirriff recently identified the exact transistors in the original Intel Pentium processor that caused the “FDIV bug,” which led to a $475 million product recall in 1994. As shown in his Mastodon thread, Shirriff examined the chip’s PLA under a microscope. The PLA contains the faulty division lookup table, the root cause of Intel’s first major failure 30 years ago.
The photo above shows the processor die of the original Pentium, Intel’s first P5-architecture processor, which helped the company become a household name. The Pentium was made on an 800nm process, and the image was stitched together from multiple microscope photographs. The die contains 3.1 million transistors; the transistor grids are visible under a microscope, and the functional blocks on the die can be identified. Compare that to today’s processors, which have tens of billions of transistors and are almost impossible to read this way.
The math error behind the FDIV bug was caused by incorrect entries in a PLA (Programmable Logic Array). The Pentium’s floating-point unit was much faster than those of contemporary chips, thanks to the SRT division algorithm. SRT generates two bits of the quotient per clock cycle, compared to one bit per clock cycle in the Pentium’s predecessor.
For this to work, SRT required a 2048-cell lookup table in the PLA, containing the values -2, -1, 0, 1, and 2, packed into a very compact 112 rows. The values are encoded by the presence or absence of transistors at the grid points. This would be a great strategy, except for one drawback: five entries in the table are missing key transistors, so they default to 0 instead of the correct “2”.
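To make the mechanism concrete, here is a minimal Python sketch of radix-4 SRT-style division, which produces one quotient digit from the set -2 to 2 (two bits) per step. It is an illustration under stated assumptions, not the Pentium's actual circuit: the function names are hypothetical, the digit is chosen by exact rounding rather than by a PLA lookup on a few leading bits, and the "faulty" band is invented purely to show what happens when a cell that should return 2 returns 0.

```python
from fractions import Fraction

def select_digit(ratio):
    """Nearest digit in {-2, ..., 2}; an exact stand-in for the hardware lookup table."""
    return max(-2, min(2, round(ratio)))

def faulty_select_digit(ratio):
    """Hypothetical defect: a band that should yield 2 yields 0 instead."""
    if Fraction(19, 10) <= ratio <= 2:
        return 0
    return select_digit(ratio)

def srt_divide(x, d, steps=28, select=select_digit):
    """Divide x by d (both assumed to lie in [1, 2)), one radix-4 digit per step."""
    x, d = Fraction(x), Fraction(d)
    w = x / 4                      # partial remainder, scaled so it starts small
    q_total = 0
    for _ in range(steps):
        shifted = 4 * w            # shift the remainder left by two bits
        q = select(shifted / d)    # choose the next quotient digit
        w = shifted - q * d        # subtract the chosen multiple of the divisor
        q_total = 4 * q_total + q  # append the digit to the quotient
    return Fraction(q_total, 4 ** (steps - 1))

good = srt_divide(Fraction(39, 20), 1)
bad = srt_divide(Fraction(39, 20), 1, select=faulty_select_digit)
unaffected = srt_divide(Fraction(3, 2), Fraction(5, 4), select=faulty_select_digit)
```

In this sketch, `good` agrees with 39/20 to within the precision of 28 steps, while `bad` is far off because a single wrong digit lets the partial remainder grow beyond what later digits can correct. `unaffected` never lands in the invented faulty band and stays accurate, which hints at why most divisions on the real chip were correct.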
These incorrect entries cause errors in floating-point division, but how rarely the error occurs was debated at the time. After Professor Thomas Nicely discovered the FDIV bug, Intel dismissed it as irrelevant, stating that it would only occur once every 27,000 years. IBM countered that it could happen every 24 days and stopped selling Pentiums. Intel bowed to enormous financial pressure and recalled all affected chips, resulting in a loss of $475 million (see our 30th anniversary post for more on this).
“Clever mathematicians discovered the Pentium division algorithm and the missing entries in 1995 by examining the error pattern,” Shirriff says. “But I can confirm it in silicon.” Moreover, Shirriff’s investigation found 16 missing entries, 11 more than the five originally known. Those 11 don’t cause errors simply “due to luck”. Intel later fixed the problem by filling all unused entries in the table with 2s, a quick fix that worked and saved a lot of space in later versions of the Pentium.
A more complete description of the Pentium chip and the bug can be found in Shirriff’s full Mastodon thread. Shirriff promises a deeper dive into the investigation in the coming days on his blog, which may cover the possibility of repairing faulty Pentiums by physically editing the PLA.