Puzzle 12 Multiplying

 import pandas as pd
 
 v = pd.Series([.1, 1., 1.1])
 out = v * v
 expected = pd.Series([.01, 1., 1.21])
 if (out == expected).all():
     print('Math rocks!')
 else:
     print('Please reinstall universe & reboot.')

Guess the Output

Try to guess what the output is before moving to the next page.

This code will print: Please reinstall universe & reboot.

out == expected returns a Boolean pandas.Series. The all method returns True if all elements are True.
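To see the two steps in isolation, here is a minimal sketch. The series `a` and `b` hold made-up values chosen for illustration; they are not part of the puzzle.

```python
import pandas as pd

# Illustrative values, not taken from the puzzle
a = pd.Series([1, 2, 3])
b = pd.Series([1, 0, 3])

mask = a == b            # element-wise comparison, yields a Boolean Series
print(mask.tolist())     # [True, False, True]
print(mask.all())        # False: at least one element differs
print((a == a).all())    # True: every element matches
```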

When you look at out and expected, they seem the same:

 In [1]: out
 Out[1]:
 0    0.01
 1    1.00
 2    1.21
 dtype: float64
 In [2]: expected
 Out[2]:
 0    0.01
 1    1.00
 2    1.21
 dtype: float64

But when we compare, we see something strange:

 In [2]: out == expected
 Out[2]:
 0    False
 1     True
 2    False
 dtype: bool

Only the middle value (1.0) is equal.

Looking deeper, we see the problem:

 In [3]: print(out[2])
 1.2100000000000002

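Pandas and print show the value differently. When Pandas displays a Series, it rounds floats (to six decimal places by default), which hides the tiny error. print on a single element shows enough digits to identify the value exactly.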
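One way to make the hidden digits visible is to raise the display precision temporarily or to format a single value yourself. The sketch below assumes the default `display.precision` setting is in effect; the choice of 17 digits is an illustration, picked because 17 significant digits are enough to tell any two distinct 64-bit floats apart.

```python
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v

# Default display rounds the values, hiding the error
print(out)

# Temporarily ask pandas for more digits
with pd.option_context('display.precision', 17):
    print(out)

# Or format a single element with 17 significant digits
print(format(out[2], '.17g'))
```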

String Representation

Always remember that the string representation of an object is not the object itself. This is beautifully illustrated by René Magritte's painting The Treachery of Images.

Some new developers, when seeing this or similar issues, come to the message boards and say, “We found a bug in Pandas!” The usual answer is, “Read the fine manual” (RTFM).

Floating point is sort of like quantum physics: the closer you look, the messier it gets.

— Grant Edwards

The basic idea behind this issue is that floating-point numbers sacrifice accuracy for speed (i.e., they cheat). Don’t be shocked. It’s a trade-off we make a lot in computer science.

The result you see conforms to the floating-point specification (IEEE 754). If you run the same calculation in Go, Rust, C, Java, … you will see the same output.
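Pandas is not needed to reproduce the behavior. As a quick check, the same multiplication with built-in Python floats gives the same result:

```python
# No pandas involved: built-in Python floats behave the same way
product = 1.1 * 1.1
print(product)              # 1.2100000000000002
print(product == 1.21)      # False
print(0.1 * 0.1 == 0.01)    # False
print(1.0 * 1.0 == 1.0)     # True
```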

If you want to learn more about floating-point numbers, see the links in the Further Reading section. The main point you need to remember is that they are not exact, and that the gap between neighboring representable values grows as the numbers get bigger.
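The loss of accuracy at larger magnitudes can be observed directly. This sketch uses `numpy.spacing`, which returns the distance from a value to the next representable float; the sample magnitudes are arbitrary picks for illustration.

```python
import numpy as np

# np.spacing(x) is the distance from x to the next representable float
for x in (1.0, 1e8, 1e16):
    print(x, np.spacing(x))

# At 1e16 adjacent floats are 2 apart, so adding 1 is lost
print(1e16 + 1 == 1e16)  # True
```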

You’re going to work a lot with floating-point numbers and will need to compare pandas.Series or pandas.DataFrame objects. Don’t expect everything to be exactly equal; think of an acceptable threshold and use the numpy.allclose function.

 In [4]: import numpy as np
 In [5]: np.allclose(out, expected)
 Out[5]: True

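numpy.allclose has options you can tweak: rtol (relative tolerance), atol (absolute tolerance), and equal_nan (whether NaN values compare as equal). See the documentation.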

 import numpy as np
 import pandas as pd
 
 v = pd.Series([.1, 1., 1.1])
 out = v * v
 expected = pd.Series([.01, 1., 1.21])
 if np.allclose(out, expected):
     print('Math rocks!')
 else:
     print('Please reinstall universe & reboot.')
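numpy.allclose treats two values a and b as close when |a - b| <= atol + rtol * |b|. The following sketch shows how the tolerances change the outcome for the puzzle's data; the tolerance values are picked for illustration only and are not a recommendation.

```python
import numpy as np
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v
expected = pd.Series([.01, 1., 1.21])

# Element-wise view of which values are within the default tolerance
print(np.isclose(out, expected))

# Illustrative tolerances: absolute tolerance only, no relative part
print(np.allclose(out, expected, rtol=0, atol=1e-12))   # True

# With zero tolerance the check is an exact comparison again
print(np.allclose(out, expected, rtol=0, atol=0))       # False
```

Which tolerance is acceptable depends on your data: an absolute threshold that suits values near 1 can be far too loose or far too strict for values of a very different magnitude.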

If you need better accuracy, look into the decimal module, which provides correctly rounded decimal floating-point arithmetic.
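As a rough sketch of the decimal approach, here is the puzzle redone with plain Python lists of Decimal values instead of a pandas.Series (a simplification made for this example). Note that the values are built from strings: constructing a Decimal from a float such as 1.1 would carry over the float's representation error.

```python
from decimal import Decimal

# Build Decimals from strings; Decimal(1.1) would inherit the float's error
v = [Decimal('0.1'), Decimal('1'), Decimal('1.1')]
out = [x * x for x in v]
expected = [Decimal('0.01'), Decimal('1'), Decimal('1.21')]

print(out == expected)  # True
```

The price is speed: Decimal arithmetic is considerably slower than hardware floats, so it tends to fit cases such as money better than large numerical workloads.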

Further Reading

Floating-Point Arithmetic: Issues and Limitations in the Python Documentation

http://docs.python.org/3/tutorial/floatingpoint.html

floating point zine by Julia Evans

http://twitter.com/b0rk/status/986424989648936960

What Every Computer Scientist Should Know About Floating-Point Arithmetic

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

numpy.allclose Documentation

http://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html

Built-in decimal Module

http://docs.python.org/3/library/decimal.html
