## What Every Programmer Should Know About Floating-Point Arithmetic

Posted by Soulskill from the gaining-understanding-bit-by-bit dept.

-brazil- writes

*"Every programmer forum gets a steady stream of novice questions about numbers not 'adding up.' Apart from repetitive explanations, SOP is to link to a paper by David Goldberg which, while very thorough, is not very accessible for novices. To alleviate this, I wrote The Floating-Point Guide, as a floating-point equivalent to Joel Spolsky's excellent introduction to Unicode. In doing so, I learned quite a few things about the intricacies of the IEEE 754 standard, and just how difficult it is to compare floating-point numbers using an epsilon. If you find any errors or omissions, you can suggest corrections."*
## Interval arithmetic (Score:5, Insightful)

## Re:Only scratching the surface (Score:1, Insightful)

If you're interested in that, you're better off reading the article by David Goldberg (linked in TFS and the first paragraph of TFA). The whole point of -brazil-'s page is that it sums up some essential issues for novice programmers who find those in-depth descriptions complicated and daunting. Better to keep his page simple than duplicate an already-existing article and become just as inaccessible to newbies.

## Re:Analog Computers (Score:3, Insightful)

Precision isn't that big a deal (we aren't so good at making physical things that 7 decimal digits become problematic; even on something the scale of an aircraft carrier, 6 digits is enough to place things within ~1 millimeter).

The bigger issue is how the errors combine when doing calculations, especially iterative calculations.

## Re:#1 Floating Point Rule (Score:5, Insightful)

"The floating-point types are float and double, which are conceptually associated with the 32-bit single-precision and 64-bit double-precision format IEEE 754 values and operations as specified in IEEE Standard for Binary Floating-Point Arithmetic , ANSI/IEEE Std. 754-1985 (IEEE, New York)."

http://java.sun.com/docs/books/jvms/second_edition/html/Overview.doc.html [sun.com]

## Another potential solution is Interval arithmetic (Score:4, Insightful)

## Re:If you want accuracy... (Score:5, Insightful)

Maybe because BCD is the worst possible way to do 'proper' decimal arithmetic; it would also be very slow.

BCD = 2 decimal digits per 8 bits (4 bits per dd). Working 'inside' the byte sucks.

Instead you can put 20 decimal digits in 64 bits (3.2 bits per dd) and do math much faster.

Why don't any languages except COBOL and PL/I use it?

Exactly

## Re:If you want accuracy... (Score:2, Insightful)

## Re:#1 Floating Point Rule (Score:4, Insightful)

That'd be like not using Java because it doesn't represent ints using ones' complement; if your code relies on the specific internal implementation of data primitives you're probably doing something wrong.

(Before I get replies: Of course sometimes these things really do matter, but not often enough to dismiss a multi-purpose language.)

## Re:Analog Computers (Score:3, Insightful)

The problem is, if you're doing a long string of calculations--say, a loop that repeats calculations thousands of times with the outcome of the last calculation becoming the input for the next (approximating integrals often does this)--then the rounding errors can accumulate if you're not paying attention to how the floating point works.
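A minimal sketch of that kind of accumulation, assuming Python and a made-up workload of adding 0.1 one million times. The helper name `naive_sum` is hypothetical; `math.fsum` is the standard-library function that tracks the lost low-order bits and returns a correctly rounded sum.

```python
import math

def naive_sum(values):
    """Add values left to right; each addition rounds to the nearest double."""
    total = 0.0
    for v in values:
        total += v
    return total

values = [0.1] * 1_000_000

naive = naive_sum(values)      # rounding error from each step feeds the next
accurate = math.fsum(values)   # correctly rounded sum of the same inputs

print(naive)                   # slightly off from 100000.0
print(accurate)                # 100000.0
print(abs(naive - accurate))   # size of the accumulated error
```

The individual errors are tiny, but none of them cancel by design, so the drift grows with the number of iterations.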

## Re:I'd just avoid it (Score:3, Insightful)

Depends how many "integers" you use. If you need accuracy - and "scientific" computing certainly does - then don't use floats. Decide how much accuracy you need, and implement that with as many bytes of data as it takes.

Floats are for games and 3D rendering, not "science".

## Re:If you want accuracy... (Score:3, Insightful)

Instead you can put 20 decimal digits in 64 bits (3.2 bits per dd) and do math much faster.

I want accurate math, not estimates.

Math with 20 decimal digits in 64 bits is proper decimal arithmetic. It acts exactly like BCD does, it just doesn't waste tons of space and CPU power.
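A rough sketch of the two encodings being compared, assuming Python integers as a stand-in for machine words. The helper `to_packed_bcd` and the cents example are hypothetical illustrations. One detail worth keeping in mind: the largest unsigned 64-bit value is 18446744073709551615, so every 19-digit number fits in 64 bits but only some 20-digit numbers do.

```python
def to_packed_bcd(n: int) -> int:
    """Pack a non-negative integer as BCD, one decimal digit per 4-bit nibble."""
    packed, shift = 0, 0
    while True:
        n, digit = divmod(n, 10)
        packed |= digit << shift
        shift += 4
        if n == 0:
            return packed

UINT64_MAX = 2**64 - 1                      # 18446744073709551615

value = 1234567890123456789                 # 19 decimal digits
print(hex(to_packed_bcd(1234)))             # 0x1234
print(value <= UINT64_MAX)                  # True: fits in one 64-bit word
print(value.bit_length())                   # bits needed as a binary integer
print(to_packed_bcd(value).bit_length())    # bits needed as packed BCD

# Scaled-integer decimal arithmetic: store cents instead of dollars.
price_cents = 1999
total_cents = price_cents * 3
print(f"{total_cents // 100}.{total_cents % 100:02d}")   # 59.97
```

Both representations give exact decimal results; the binary integer simply needs fewer bits and uses ordinary integer instructions.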

## Re:I'd just avoid it (Score:5, Insightful)

You've never done any scientific computing, it seems. While it's a very broad term, and floats are certainly not the best tool for *all* computing done by science, anyone with even the most basic understanding knows that IEEE 754 floats *are* the best tool most of the time and exactly the result of deciding how much accuracy you need and implementing that with as many bytes of data as it takes. Hardly anything in the natural sciences needs more accuracy than a 64 bit float can provide.

## Re:Simple, effective and useful (Score:3, Insightful)

Sure it can be: by starting with simple explanations fit for novices (who usually aren't actually doing serious numerical math and simply wonder how come 0.1 + 0.2 != 0.3) and getting into more details progressively.

And I mention the alternatives to floating-point formats and when to use what.
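For readers who have not seen the classic novice surprise, a small sketch in Python (chosen here for illustration, not taken from the guide itself). `Decimal` from the standard library is one of the alternative formats; constructing it from a float shows the exact binary value that is really stored.

```python
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)     # False
print(repr(total))      # 0.30000000000000004

# The literal 0.1 is stored as the nearest binary fraction, not as 1/10.
print(Decimal(0.1))

# A decimal type built from strings keeps the values exact.
decimal_total = Decimal("0.1") + Decimal("0.2")
print(decimal_total == Decimal("0.3"))   # True
```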

## Re:Interval arithmetic (Score:5, Insightful)

Gah. Yet another unintelligible Wikipedia mathematics article. For once I'd like to see an article that does a great job *teaching* about a subject. Perhaps Wikipedia isn't the right home for this sort of content, but my general feeling whenever reading something on Wikipedia is that the content was drafted by a bunch of overly precise wankers focusing on the absolute right terminology without focusing on helping the reader understand the content.

## Re:strictfp (Score:3, Insightful)

Not really. It might point to BigDecimal, but leave strictfp out of it. Remember, this is for starting programmers, not creators of advanced 3D or math libs.

## Re:No, base 10 arithmetic isn't "more accurate". (Score:1, Insightful)

The article gives the impression that base 10 arithmetic is somehow "more accurate". It's not.

Anything that works out exactly in base 2 also works out exactly in base 10, but not vice versa. I'd call that "more accurate", but YMMV.
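The claim follows from 2 being a factor of 10: a fraction has a finite expansion in a base exactly when every prime factor of its reduced denominator divides that base. A small sketch, assuming Python and a hypothetical helper named `terminates`:

```python
import math
from fractions import Fraction

def terminates(fr: Fraction, base: int) -> bool:
    """True if fr has a finite digit expansion in the given base."""
    d = fr.denominator          # Fraction keeps this in lowest terms
    g = math.gcd(d, base)
    while g > 1:
        # Strip every factor the denominator shares with the base.
        while d % g == 0:
            d //= g
        g = math.gcd(d, base)
    return d == 1

for fr in (Fraction(1, 8), Fraction(1, 10), Fraction(1, 3)):
    print(fr, "base 2:", terminates(fr, 2), "base 10:", terminates(fr, 10))
```

1/8 terminates in both bases, 1/10 only in base 10, and 1/3 in neither.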

## Re:Stop with the educational articles (Score:3, Insightful)

Knowing how to do things correctly - like proper floating point math - is one of the ways to separate the true CS professional from the wannabe new graduates.

True, except that HR people and hiring managers neither know nor care about doing things correctly; they just want cheap and fast. Just make sure you have all the right TLAs on your resume and you'll get a job. You can put "IEEE 754 expert" down though. They won't recognize the reference, so maybe they'll be impressed by it.

## Re:If you want accuracy... (Score:3, Insightful)

You completely missed my point.

I'm not comparing BCD to floating point; I'm comparing BCD with other ways of encoding decimal numbers in a computer.

## Re:Simple, effective and useful (Score:4, Insightful)

I don't think you are correct about two numbers not being "nearly equal" when they are both close to zero, but with opposite signs. The function returns "true" in this case, no? Are you suggesting this is undesirable? I could see for some use cases that property might be undesirable, but if that's what you meant it wasn't clear. Certainly that property is desirable for some applications.

IMO this sort of thing is a good reason NOT to write a nearlyequals(a,b) function. That will just lull you into a false sense of security that the same rules are appropriate in every case.

You need to consider each case on its own merits to decide what is meant by "nearly equals" in context.

In some cases that may be best defined in terms of absolute error, in some cases that may be best defined in terms of error relative to the value and in yet other cases it may be best defined in terms of the error relative to the current precision which is related to the value for larger numbers but becomes fixed for smaller (subnormal) numbers.
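The three definitions can be sketched side by side. This is an illustration under assumptions, not the guide's function: the helper names and tolerances are made up, and `math.ulp` requires Python 3.9 or later. For zero and subnormal inputs `math.ulp` returns the smallest positive subnormal, which matches the "becomes fixed for smaller numbers" behaviour described above.

```python
import math

def close_absolute(a: float, b: float, tol: float) -> bool:
    """Error bound is the same no matter how large the values are."""
    return abs(a - b) <= tol

def close_relative(a: float, b: float, rel: float) -> bool:
    """Error bound scales with the larger magnitude."""
    return abs(a - b) <= rel * max(abs(a), abs(b))

def close_ulps(a: float, b: float, max_ulps: int) -> bool:
    """Error bound counted in units of the last place at that magnitude."""
    return abs(a - b) <= max_ulps * math.ulp(max(abs(a), abs(b)))

# Tiny values with opposite signs: "equal" in absolute terms only.
print(close_absolute(1e-20, -1e-20, 1e-9))   # True
print(close_relative(1e-20, -1e-20, 1e-9))   # False

# Large values one unit apart: the verdict flips.
print(close_absolute(1e15, 1e15 + 1, 1e-9))  # False
print(close_relative(1e15, 1e15 + 1, 1e-9))  # True

# Result of a short computation versus the literal it should equal.
print(close_ulps(0.1 + 0.2, 0.3, 2))         # True
```

The same pair of numbers passes one test and fails another, which is the argument for choosing the definition per use case.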

## Re:If you want accuracy... (Score:4, Insightful)

If you want accuracy, BCD is still a failure. It only does base 10 instead of base 2. A truly accurate math system would use 2 integers, one for numerator and one for denominator and thus get all rational numbers. If you need irrationals you get even more complicated. But don't pretend BCD is accurate, it fails miserably on common math problems like 1/3.
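A short sketch of the numerator/denominator approach, using Python's standard `fractions.Fraction` purely as an example of such a type. The decimal comparison assumes the default 28-digit `decimal` context.

```python
from decimal import Decimal
from fractions import Fraction

third = Fraction(1, 3)
print(third * 3 == 1)                        # True: 1/3 is stored exactly

# A decimal type has to cut 1/3 off after a finite number of digits.
decimal_third = Decimal(1) / Decimal(3)
print(decimal_third * 3 == 1)                # False

# Rationals also make the 0.1 + 0.2 case exact.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True
```

The trade-off is that numerators and denominators can grow without bound during a long calculation, so exactness is paid for in memory and time.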

## Re:Analog Computers (Score:4, Insightful)

Well, it would depend on what you're doing the calculations for, and how you're doing them.

Say it used diesel fired engines, and you were instructed to calculate the fuel consumption per engine revolution, and then apply that to a trip. I don't know the specifics on an aircraft carrier, so I'll just make up some numbers.

At full speed, the ship travels at 12 nautical miles per hour (knots). The engines spin at 300rpm. It burns 1275 gallons of fuel per hour.

That's 18,000 engine revolutions per hour, or 0.0708334 gallons per revolution.

1,000 nautical miles at 12 knots = 83.3333334 hours.

If you are to travel 1,000 nautical miles, 18,000 * 83.3333334 = 1,500,000.0012 revolutions. At 0.0707334 gallons per revolution, that would be 106,100.100085 gallons.

But knowing that it burns 1,275 gallons per hour at 12 knots, and you will be traveling for 83.3333334 hours, you will require 106,250.000085 gallons. Using the measure of gallons per revolution to try to come up with a very precise number to work with, you've actually fallen short by 150 gallons for the trip. I can imagine a slight embarrassment by having your aircraft carrier run out of fuel just 7 minutes from its destination.

Using 7 decimal places of precision, when it's multiplied so many times, it can easily cause errors.

I'd be pretty sure they aren't counting gallons per revolution, I only used that as an example of where errors could happen. If you're considering the full length of the ship, 0.1 inches is more than enough to believe you have a good number. :) I believe due to expansion of the metals, the total length of the ship may change more than that depending on if it's a hot or cold day. :)
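The made-up figures above can be rechecked with a short script. This is a sketch in Python with hypothetical variable names; exact fractions give the reference answer, and the two float lines use the per-revolution figure as it is written in each place in the comment (0.0708334 and 0.0707334).

```python
from fractions import Fraction

speed_knots = 12
rpm = 300
gallons_per_hour = 1275
distance_nm = 1000

revs_per_hour = rpm * 60                                   # 18,000
hours = Fraction(distance_nm, speed_knots)                 # 83 1/3, exact
gal_per_rev = Fraction(gallons_per_hour, revs_per_hour)    # exact ratio

by_hours = gallons_per_hour * hours
by_revs = gal_per_rev * revs_per_hour * hours
print(by_hours, by_revs)            # both exactly 106250

# Same route through the 7-decimal-place figures quoted in the comment.
rounded_a = 0.0708334 * 18000 * 83.3333334
rounded_b = 0.0707334 * 18000 * 83.3333334
print(round(rounded_a, 3))          # about 106250.1
print(round(rounded_b, 3))          # about 106100.1
```

With 0.0708334 the rounded route lands within about a tenth of a gallon of the exact answer; the 150-gallon gap appears only with 0.0707334, so a single wrong digit in an intermediate value matters far more here than the rounding itself.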

## Re:Analog Computers (Score:3, Insightful)

Sure, you can make it a problem, but it isn't particularly insidious.

And the part where I say "The bigger issue is how the errors combine when doing calculations" is a pretty compact version of what you said.

## Re:#1 Floating Point Rule (Score:3, Insightful)

There are some decent points there, but a lot of them aren't really related to IEEE 754 compatibility. For example, bullet point #5 on their first-page list of five "gratuitous mistakes" is that Java doesn't support operator overloading. But by that standard, C sucks too, and yet is somehow used in lots of floating-point libraries.

## Re:Amazing how few programmers use real maths. (Score:3, Insightful)

It is safe to compare to any small integer, not just zero, as long as you are checking if the value came from an assignment. It is also safe to use small negative powers of two.

One big problem I have is with programmers who religiously add these epsilon functions and screw up algorithms. In my experience, about 99% of the == statements with floating point are explicitly testing "did the assignment to the same value earlier get executed?" Comparing the bit patterns is exactly what is wanted, stop messing it up!
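A sketch of the "was this value assigned earlier?" pattern, assuming Python. The sentinel `UNSET` and the function `scale` are hypothetical. Small integers and small negative powers of two are exactly representable in binary floating point, which is why exact comparison is reliable for them.

```python
UNSET = -1.0   # sentinel: a small integer, stored exactly

def scale(values, factor=UNSET):
    """Multiply each value by factor, or leave values alone if none was given."""
    if factor == UNSET:          # exact test: is this still the assigned sentinel?
        factor = 1.0
    return [v * factor for v in values]

print(scale([1.0, 2.0]))         # [1.0, 2.0]
print(scale([1.0, 2.0], 0.5))    # [0.5, 1.0]

# Sums of small negative powers of two are exact as well.
print(0.5 + 0.25 == 0.75)        # True

# "Small" has a limit: above 2**53 not every integer is representable.
print(float(2**53) + 1.0 == float(2**53))   # True
```

An epsilon comparison in `scale` would wrongly treat a legitimate factor near -1.0 as "not set", which is the kind of breakage the comment is complaining about.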

## Re:.9999999984 Post (Score:3, Insightful)

Bullshit.

1. The web was very much alive when the FDIV bug was discovered.

2. I seriously doubt you as a teenager spent 2 weeks finding this bug, this guy (who happens to be the one who found it) spent some weeks digging through everything to prove it was the FPU that bit him:

http://www.trnicely.net/pentbug/pentbug.html [trnicely.net]

Oh, and even when the "computer is wrong" it's still a human error.