How to do it...

  1. To get an idea of how the equals operator works, let's compare each element to a scalar value:
>>> college_ugds_ == .0019
  1. This works as expected but becomes problematic whenever you attempt to compare DataFrames with missing values. This same equals operator may be used to compare two DataFrames with one another on an element-by-element basis. Take, for instance, college_ugds_ compared against itself, as follows:
>>> college_self_compare = college_ugds_ == college_ugds_
>>> college_self_compare.head()
  1. At first glance, all the values appear to be equal, as you would expect. However, using the all method to determine if each column contains only True values yields an unexpected result:
>>> college_self_compare.all()
UGDS_WHITE    False
UGDS_BLACK    False
UGDS_HISP     False
UGDS_ASIAN    False
UGDS_AIAN     False
UGDS_NHPI     False
UGDS_2MOR     False
UGDS_NRA      False
UGDS_UNKN     False
dtype: bool
  1. This happens because missing values do not compare equal to one another. If you tried to count missing values using the equals operator and summing up the boolean columns, you would get zero for each one:
>>> (college_ugds_ == np.nan).sum()
UGDS_WHITE    0
UGDS_BLACK    0
UGDS_HISP     0
UGDS_ASIAN    0
UGDS_AIAN     0
UGDS_NHPI     0
UGDS_2MOR     0
UGDS_NRA      0
UGDS_UNKN     0
dtype: int64
  1. The primary way to count missing values uses the isnull method:
>>> college_ugds_.isnull().sum()
UGDS_WHITE    661
UGDS_BLACK    661
UGDS_HISP     661
UGDS_ASIAN    661
UGDS_AIAN     661
UGDS_NHPI     661
UGDS_2MOR     661
UGDS_NRA      661
UGDS_UNKN     661
dtype: int64
  1. The correct way to compare two entire DataFrames with one another is not with the equals operator but with the equals method, which treats missing values in the same positions as equal:
>>> college_ugds_.equals(college_ugds_)
True
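The steps above can be condensed into one self-contained sketch. Because the college dataset is not reproduced here, it assumes a small hypothetical DataFrame, `df`, standing in for `college_ugds_`; its values are made up for illustration. The last step is not part of the recipe: it shows one possible NaN-aware element-by-element comparison that treats two missing values in the same position as equal.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for college_ugds_ (values invented for illustration),
# with one missing value in each column.
df = pd.DataFrame({
    "UGDS_WHITE": [0.0333, np.nan, 0.5922],
    "UGDS_BLACK": [0.9353, 0.2600, np.nan],
})

# NaN never compares equal to anything, including itself,
# so a self-comparison is not all True.
self_compare = df == df
print(self_compare.all())       # every column -> False

# Comparing against np.nan is always False, so this counts nothing.
print((df == np.nan).sum())     # every column -> 0

# isnull (alias isna) is the reliable way to count missing values.
print(df.isnull().sum())        # every column -> 1

# DataFrame.equals treats NaNs in the same location as equal.
print(df.equals(df))            # True

# Assumed extra step: element-wise comparison where a cell counts as
# equal if the values match or both are missing.
other = df.copy()
nan_aware = (df == other) | (df.isna() & other.isna())
print(nan_aware.all().all())    # True
```

Unlike `equals`, which returns a single boolean, the NaN-aware mask keeps the element-by-element shape, so it can still be summed or filtered per column.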