Puzzle 12 | Multiplying |
| import pandas as pd |
| |
| v = pd.Series([.1, 1., 1.1]) |
| out = v * v |
| expected = pd.Series([.01, 1., 1.21]) |
| if (out == expected).all(): |
|     print('Math rocks!') |
| else: |
|     print('Please reinstall universe & reboot.') |
Guess the Output | |
---|---|
Try to guess what the output is before moving to the next page. |
This code will print: Please reinstall universe & reboot.
out == expected returns a Boolean pandas.Series. The all method returns True if all elements are True.
When you look at out and expected, they seem the same:
| In [1]: out |
| Out[1]: |
| 0 0.01 |
| 1 1.00 |
| 2 1.21 |
| dtype: float64 |
| In [2]: expected |
| Out[2]: |
| 0 0.01 |
| 1 1.00 |
| 2 1.21 |
| dtype: float64 |
But when we compare, we see something strange:
| In [2]: out == expected |
| Out[2]: |
| 0 False |
| 1 True |
| 2 False |
| dtype: bool |
Only the middle value (1.0) is equal.
Looking deeper, we see the problem:
| In [3]: print(out[2]) |
| 1.2100000000000002 |
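The difference is in how the value is displayed: Pandas rounds floats for display (six digits after the decimal point by default), while print shows enough digits to identify the stored value uniquely.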
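One way to look past the rounded display is to format each value with more digits. The following is a minimal sketch, not the only approach: it uses Python's built-in format with 17 significant digits, which is enough to distinguish any two float64 values. The name digits is introduced here for illustration.

```python
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v

# 17 significant digits are enough to tell any two float64 values apart
digits = [format(x, '.17g') for x in out]
print(digits)

# The stored results are not the same floats as the literals 0.01 and 1.21
print(out[0] == .01, out[2] == 1.21)
```

The first and last entries show extra trailing digits that the default Pandas display hides.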
String Representation | |
---|---|
Always remember that the string representation of an object is not the object itself. This is beautifully illustrated by the painting The Treachery of Images. |
Some new developers, when seeing this or similar issues, come to the message boards and say, “We found a bug in Pandas!” The usual answer is, “Read the fine manual” (RTFM).
Floating point is sort of like quantum physics: the closer you look, the messier it gets.
— Grant Edwards
The basic idea behind this issue is that floating-point numbers sacrifice accuracy for speed (i.e., they cheat). Don’t be shocked. It’s a trade-off we make a lot in computer science.
The result you see conforms with the floating-point specification (IEEE 754). If you run the same calculation in Go, Rust, C, Java, or most other languages, you will see the same output.
If you want to learn more about floating points, see the links at the end of this section. The main point you need to remember is that they are not exact: a float64 holds roughly 15 to 17 significant decimal digits, so the absolute error grows as the number gets bigger.
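To make the "worse as numbers get bigger" point concrete, here is a small sketch using numpy.spacing, which returns the distance from a value to the next representable float of the same type. The sample magnitudes and the names big and lost are chosen only for illustration.

```python
import numpy as np

# np.spacing(x) is the gap between x and the next representable float64
for x in [1.0, 1e8, 1e16]:
    print(x, np.spacing(x))

# Near 1e16 adjacent float64 values are 2 apart, so adding 1 is lost
big = 1e16
lost = (big + 1.0 == big)
print(lost)
```

The relative precision stays about the same at every magnitude; it is the absolute gap between neighboring values that grows.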
You’re going to work a lot with floating points and will need to compare pandas.Series or pandas.DataFrame objects. Don’t expect everything to be exactly equal; think of an acceptable threshold and use the numpy.allclose function.
| In [4]: import numpy as np |
| In [5]: np.allclose(out, expected) |
| Out[5]: True |
numpy.allclose has many options you can tweak. See the documentation.
| import numpy as np |
| import pandas as pd |
| |
| v = pd.Series([.1, 1., 1.1]) |
| out = v * v |
| expected = pd.Series([.01, 1., 1.21]) |
| if np.allclose(out, expected): |
|     print('Math rocks!') |
| else: |
|     print('Please reinstall universe & reboot.') |
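The two options you will most likely tweak are rtol (relative tolerance, default 1e-05) and atol (absolute tolerance, default 1e-08). numpy.allclose treats two values a and b as close when |a - b| <= atol + rtol * |b|. The sketch below tries a few settings on the puzzle data; the specific tolerance values are illustrative assumptions, not recommendations.

```python
import numpy as np
import pandas as pd

v = pd.Series([.1, 1., 1.1])
out = v * v
expected = pd.Series([.01, 1., 1.21])

# Default tolerances: rtol=1e-05, atol=1e-08
default_ok = np.allclose(out, expected)

# Tighter, purely relative tolerance (illustrative values)
strict_ok = np.allclose(out, expected, rtol=1e-12, atol=0)

# Zero tolerance is the same as exact equality, which fails here
exact_ok = np.allclose(out, expected, rtol=0, atol=0)

# np.isclose applies the same test element by element
mask = np.isclose(out.to_numpy(), expected.to_numpy())

print(default_ok, strict_ok, exact_ok, mask)
```

Because the rtol term scales with the expected value, a relative tolerance keeps working across magnitudes, while atol matters mostly when comparing against values near zero.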
If you need better accuracy, look into the decimal module, which provides correctly rounded decimal floating-point arithmetic.
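As a rough sketch of what that looks like for the puzzle's numbers, the example below repeats the multiplication with decimal.Decimal in plain Python lists rather than a pandas.Series. Note the assumption that values are built from strings: constructing a Decimal from a float copies the float's binary error.

```python
from decimal import Decimal

# Build Decimals from strings; Decimal(0.1) would carry over the float's error
v = [Decimal('0.1'), Decimal('1.0'), Decimal('1.1')]
out = [x * x for x in v]
expected = [Decimal('0.01'), Decimal('1'), Decimal('1.21')]

# Exact comparison now succeeds
print(out == expected)

# A Decimal built from the float 0.1 is not the same as Decimal('0.1')
print(Decimal(0.1) == Decimal('0.1'))
```

The price is speed: Decimal arithmetic runs in software and is much slower than hardware floats.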
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html