## What Every Programmer Should Know About Floating-Point Arithmetic

-brazil- writes

*"Every programmer forum gets a steady stream of novice questions about numbers not 'adding up.' Apart from repetitive explanations, SOP is to link to a paper by David Goldberg which, while very thorough, is not very accessible for novices. To alleviate this, I wrote The Floating-Point Guide, as a floating-point equivalent to Joel Spolsky's excellent introduction to Unicode. In doing so, I learned quite a few things about the intricacies of the IEEE 754 standard, and just how difficult it is to compare floating-point numbers using an epsilon. If you find any errors or omissions, you can suggest corrections."*
## Only scratching the surface (Score:5, Interesting)

You really need to talk about associativity (and the lack of it), i.e. (a + b) + c != a + (b + c), and the problems this can cause when vectorizing or otherwise parallelizing floating-point code.
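For instance (a Python sketch; the values are arbitrary):

```python
# Floating-point addition is not associative: regrouping changes which
# intermediate result gets rounded, so the final bits can differ.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds 0.1 + 0.2 first
right = a + (b + c)  # rounds 0.2 + 0.3 first

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

A parallel reduction that sums partial results in a different order can hit exactly this.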

And any talk about fp is incomplete without touching on catastrophic cancellation.
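A minimal sketch of cancellation in Python, using the classic 1 - cos(x) example:

```python
import math

x = 1e-8

# Naive formula: cos(x) rounds to exactly 1.0 in double precision, so the
# subtraction cancels catastrophically and every significant digit is lost.
naive = 1.0 - math.cos(x)

# Algebraically equivalent form that avoids the subtraction:
# 1 - cos(x) == 2 * sin(x/2)**2
stable = 2.0 * math.sin(x / 2.0) ** 2

print(naive)   # 0.0 -- completely wrong
print(stable)  # about 5e-17 -- close to the true value
```

Same math, wildly different answers; the rewrite is what numerical libraries do internally.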

## If you want accuracy... (Score:3, Interesting)

use BCD math. With h/w support it's fast enough...

Why don't any languages except COBOL and PL/I use it?
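Most other languages at least ship decimal arithmetic as a library, even without hardware BCD; e.g., Python's decimal module:

```python
from decimal import Decimal

binary_sum = 0.1 + 0.1 + 0.1          # binary float: not exactly 0.3
decimal_sum = Decimal("0.1") * 3      # decimal: exactly 0.3

print(binary_sum == 0.3)              # False
print(decimal_sum == Decimal("0.3"))  # True
```

Software decimal is slower than hardware floats, of course, which is part of the answer to the question above.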

## I'd just avoid it (Score:5, Interesting)

Given the great complexity of dealing with floating point numbers properly, my first instinct, and my advice to anybody not already an expert on the subject, is to avoid them at all cost. Many algorithms can be redone in integers, similarly to Bresenham, and work without rounding errors at all. It's true that with SSE, floating point can sometimes be faster, but anyone who doesn't know what he's doing is vastly better off without it. At the very least, find a more experienced coworker and have him explain it to you before you shoot your foot off.
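For example, a hypothetical money computation done in integer cents sidesteps the rounding entirely:

```python
# Floating-point dollars accumulate error:
total = 0.0
for _ in range(100):
    total += 0.10          # add ten cents, 100 times
print(total == 10.0)       # False -- total is 9.99999999999998

# Integer cents are exact:
total_cents = 0
for _ in range(100):
    total_cents += 10      # same thing, in cents
print(total_cents == 1000) # True
```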

## No, base 10 arithmetic isn't "more accurate". (Score:4, Interesting)

The article gives the impression that base 10 arithmetic is somehow "more accurate". It's not. You still get errors for, say, 1/3 + 1/3 + 1/3. It's just that the errors are different.
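A quick illustration with Python's decimal module:

```python
from decimal import Decimal, getcontext

getcontext().prec = 28                 # 28 significant decimal digits
third = Decimal(1) / Decimal(3)        # 0.3333...3, rounded after 28 digits
print(third + third + third)           # 0.9999999999999999999999999999, not 1
```

Base 10 only moves the problem around: 0.1 becomes exact, 1/3 stays inexact.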

Rational arithmetic, where you carry along a numerator and denominator, is accurate for addition, subtraction, multiplication, and division. But the numerator and denominator tend to get very large, even if you use GCD to remove common factors from both.
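A sketch with Python's fractions module (the fraction chosen is arbitrary):

```python
from fractions import Fraction

third = Fraction(1, 3)
print(third + third + third == 1)  # True: rational arithmetic is exact

# But numerators/denominators grow fast, even with automatic GCD reduction:
x = Fraction(10007, 10009)         # arbitrary fraction already in lowest terms
y = x ** 16                        # denominators compound under multiplication
print(len(str(y.denominator)))     # 65 -- a 65-digit denominator already
```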

It's worth noting that, while IEEE floating point defines an 80-bit extended format (implemented by x86 hardware), PowerPCs, IBM mainframes, Cell processors, and VAXen do not have it. All machines compliant with the IEEE floating-point standard should get the same answers; the others won't. This was a big enough issue that, when the Macintosh went from Motorola 68xxx CPUs to PowerPC CPUs, most of the engineering applications were not converted: getting a different answer from the old version was unacceptable.

## Re:If you want accuracy... (Score:4, Interesting)

also it would absolutely be very slow

Depends on the architecture. IBM's most recent POWER and System z chips have hardware support for BCD arithmetic.

## Re:Analog Computers (Score:2, Interesting)

Irrational numbers are not as much of a problem as rational numbers that can't be represented in the base used.

Let's say our computer has 6-digit decimal precision. If you add two irrational numbers, say pi and e, you'll get 5.85987. It's imprecise, but the imprecision is unavoidable, since the exact sum can't be written with finitely many digits in any base.
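The hypothetical 6-digit machine can be simulated with Python's decimal module:

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 6  # pretend we have a 6-digit decimal computer

pi = Decimal(str(math.pi))  # construction from a string is exact
e = Decimal(str(math.e))

print(pi + e)  # 5.85987 -- the addition rounds to 6 digits
```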

But if you add 3/7 and 2/3 you get 1.09524, which is imprecise even though a precise answer (23/21) does exist.

## Hard to debug floating point when it goes wrong! (Score:5, Interesting)

Over at Evans Hall at UC Berkeley, stroll down the 8th-floor hallway. On the wall, you'll find an envelope filled with flyers titled, "Why is Floating-Point Computation so Hard to Debug when it Goes Wrong?"

It's Prof. Kahan's challenge to the passerby - figure out what's wrong with a trivial program. His program is just 8 lines long, has no adds, subtracts, or divisions. There's no cancellation or giant intermediate results.

But Kahan's malignant code computes the absolute value of a number incorrectly on almost every computer with fewer than 39 significant digits.

Between seminars, I picked up a copy, and had a fascinating time working through his example. (Hint: Watch for radioactive roundoff errors near singularities!)

Moral: When things go wrong with floating point computation, it's surprisingly difficult to figure out what happened. And assigning error-bars and roundoff estimates is really challenging!

Try it yourself at:

http://www.cs.berkeley.edu/~wkahan/WrongR.pdf [berkeley.edu]

## Re:Hard to debug floating point when it goes wrong (Score:2, Interesting)

Some Mathematica code:

The result: http://www.untruth.org/~josh/real-rounding.png [untruth.org]

## Only in a perfect world, what about MS Access? (Score:1, Interesting)

And I've seen the addition of two money columns defined in Access produce magical values. I'm sure somebody here can explain the situation better than I can, but I've seen $1.00 + $0.50 become $1.49. In MS Access's defense, though, a float is a poor way to define money, especially in MS Access. I was just hired to band-aid the broken solution.
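I can't speak to Access internals, but the generic failure mode looks like this in any language: a float that displays as a clean amount is actually stored slightly below it, so truncating to cents drops a cent. A hypothetical example:

```python
price = 2.675  # actually stored as 2.67499999999999982...

# Truncating to cents, as naive money code sometimes does:
cents = int(price * 100)
print(cents)  # 267 -- one cent short of the 268 you'd expect
```

Which is exactly the $1.50-becomes-$1.49 flavor of bug.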

## Re:Analog Computers (Score:4, Interesting)

``Nobody would expect someone to write down 1/3 as a decimal number, but because people keep forgetting that computers use binary floating point numbers, they do expect them not to make rounding errors with numbers like 0.2.''

A problem which is exacerbated by the fact that many popular programming languages use (base 10) decimal syntax for (base 2) floating-point literals. Which, first of all, starts people off on the wrong foot (you would think that if "0.2" is a valid float literal, it could be represented exactly as a float), and, secondly, makes it impractical to write literals for certain values that _could_ actually be represented exactly as a float.
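You can see which value the literal 0.2 actually denotes by converting the float to Decimal, which performs an exact conversion:

```python
from decimal import Decimal

# The literal "0.2" really denotes the nearest binary double:
print(Decimal(0.2))
# 0.200000000000000011102230246251565404236316680908203125

# Whereas 0.125 (= 2**-3) is a binary fraction and is stored exactly:
print(Decimal(0.125))  # 0.125
```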

## Thanks to Sun (Score:5, Interesting)

Note that the cited paper location is docs.sun.com; this version of the article has corrections and improvements over the original ACM paper. Sun has provided this to interested parties for 20-odd years (I have no idea what they paid ACM for the rights to distribute it).

http://www.netlib.org/fdlibm/ [netlib.org] is the Sun provided freely distributable libm that follows (in a roundabout way) from the paper.

I don't recall if K.C. Ng's terrific "infinite pi" code is included (it was in Sun's libm); it takes care of Intel hardware by doing the range reduction with enough bits for the particular argument to be nearly equivalent to infinite-precision arithmetic.

Sun's floating point group did much to advance the state of the art in deployed and deployable computer arithmetic.

Kudos to the group (one hopes that Oracle will treat them with the respect they deserve).

## Re:#1 Floating Point Rule (Score:4, Interesting)

Repeatability. If your code and language are standard-compliant, then you'll get the same floating-point math results as someone using another compliant language on any other platform. Not crucial for some tasks, but it certainly is for others, such as scientific work.

Wouldn't it be great if you could flip a switch in your computer to change all double-precision fp from a 53-bit mantissa to 52 bits? If your results suddenly came out radically different, you'd know your first set of results couldn't be trusted.
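That switch doesn't exist, but you can fake the experiment in software. A sketch with Python's decimal module, rerunning the same computation at two precisions:

```python
from decimal import Decimal, getcontext

def residual(prec):
    # (1/3) * 3 - 1 is exactly 0 in real arithmetic; any nonzero
    # result is pure rounding error at the given precision.
    getcontext().prec = prec
    return (Decimal(1) / 3) * 3 - 1

print(residual(28))  # -1E-28
print(residual(12))  # -1E-12
# The answer changes radically with precision -- a red flag that the
# result is dominated by rounding error, not by the underlying math.
```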

Repeatability is highly overrated. It's no good if you get the wrong results, and a different computer system gets you identical wrong results.

## Re:Simple, effective and useful (Score:3, Interesting)

That's what I was thinking too. But hey, what do I know, I just work with computers, I'm not a mathematician. :)

The way some folks do it,

0.1 + 2 = 0 + 2

0 + 2 = 2

There was a thread on here a few weeks ago where I explained it in the context of calculating payroll. If you're calculating fractional hours, then those decimals come in handy.

1 minute = 0.0166666666666667 hours.

Depending on how many decimal places you keep, it can really mess with your pay.

0.01 * 60 = 0.6

0.02 * 60 = 1.2

0.0166 * 60 = 0.996

0.0167 * 60 = 1.002
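The drift over a full day, with hypothetical numbers (480 minutes at $15/hour):

```python
minutes = 480          # one 8-hour day, hypothetical
rate = 15.00           # dollars per hour, hypothetical

exact = minutes / 60 * rate            # the right answer: 120.0
truncated = minutes * 0.0166 * rate    # per-minute factor chopped at 4 places
rounded = minutes * 0.0167 * rate      # per-minute factor rounded up

print(round(exact, 2))      # 120.0
print(round(truncated, 2))  # 119.52 -- 48 cents short, every single day
print(round(rounded, 2))    # 120.24 -- 24 cents over
```

Which rounding the payroll system picks decides who pockets the difference.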

For hourly folks, check your paychecks. I'd bet the company is using the most advantageous rounding for their profit rather than for accuracy.

I was recently told, regarding something where one interval = 0.0083333 hours (1/120), that it should always be simply cut off (not rounded) at 1 decimal place. I tried to explain that that would make the numbers totally wrong.

1 interval = 0.0

10 intervals = 0.0

10 instances of 10 would then be 0.0, rather than 0.8. They wanted "absolute" accuracy over thousands of instances, but still insisted that chopping it off at one decimal place was the way they wanted it. *sigh*
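The chop-each-then-sum versus sum-then-chop difference, sketched (using the 1/120 interval from above):

```python
import math

interval = 1 / 120  # 0.008333... units per instance

def chop(x, places=1):
    # Truncate (not round) to the given number of decimal places.
    factor = 10 ** places
    return math.floor(x * factor) / factor

# Chopping each instance first: every term becomes 0.0.
chopped_sum = sum(chop(interval) for _ in range(100))
print(chopped_sum)           # 0.0

# Summing first, then chopping once at the end:
print(chop(100 * interval))  # 0.8
```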

I do understand why floating point numbers can induce errors, but is it necessary to make it worse by adding in sloppy math?