There's more...

For many operations, pandas has multiple ways to do the same thing. In the preceding recipe, the criteria for salary uses two separate boolean expressions. Similarly to SQL, Series have a between method, with the salary criteria equivalently written as follows:

>>> criteria_sal = employee.BASE_SALARY.between(80000, 120000)

Another useful application of isin is to provide a sequence of values automatically generated by some other pandas statements. This would avoid any manual investigating to find the exact string names to store in a list. Conversely, let's try to exclude the rows from the top five most frequently occurring departments:

>>> top_5_depts = employee.DEPARTMENT.value_counts().index[:5]
>>> criteria = ~employee.DEPARTMENT.isin(top_5_depts)
>>> employee[criteria]

The SQL equivalent of this would be as follows:

SELECT 
*
FROM
EMPLOYEE
WHERE
DEPARTMENT not in
(
SELECT
DEPARTMENT
FROM (
SELECT
DEPARTMENT,
COUNT(1) as CT
FROM
EMPLOYEE
GROUP BY
DEPARTMENT
ORDER BY
CT DESC
LIMIT 5
)
);

Notice the use of the pandas not operator, ~, which negates all boolean values of a Series.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.98.250