T

SettingWithCopyWarning

The SettingWithCopyWarning is only a warning, so your code will still run and produce a result. However, the warning is a “code smell”: the result may not be what you intended, and part of your code probably needs to be rewritten.

Let’s work with one of our small example data sets to recreate the warning.

import pandas as pd

dat = pd.read_csv("data/concat_1.csv")
print(dat)
   A  B  C  D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3

T.1 Modifying a Subset of Data

It’s pretty common to subset your data for values you need, and then make changes to that subset.

subset = dat[["A", "C"]]
print(subset)
   A  C
0 a0 c0
1 a1 c1
2 a2 c2
3 a3 c3
# this will trigger the warning
subset["new"] = ["bunch", "of", "new", "values"]
print(subset)

   A  C    new
0 a0 c0  bunch
1 a1 c1     of
2 a2 c2    new
3 a3 c3 values
/var/folders/2b/qckmp39n7qn1dh0tpcm8g89w0000gn/T/ipykernel_29772/4023129152.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  subset["new"] = ["bunch", "of", "new", "values"]

This comes down to views versus copies. Subsetting a dataframe can return either a view that shares its data with the original or an independent copy, so Pandas does not know for certain whether you are working on a subsetted copy of the original dataframe or want to make changes to the original dataframe.

The way we fix this is to be explicit when we are working with a subset of the data we plan to modify.

subset = dat[["A", "C"]].copy() # explicitly copy
print(subset)
   A  C
0 a0 c0
1 a1 c1
2 a2 c2
3 a3 c3
# no more warning!
subset["new"] = ["bunch", "of", "new", "values"]
print(subset)
   A  C    new
0 a0 c0  bunch
1 a1 c1     of
2 a2 c2    new
3 a3 c3 values
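
To see what the explicit copy buys you, the following minimal sketch checks that a change made to the copied subset never reaches the original dataframe. It builds the example data inline as a stand-in for data/concat_1.csv (an assumption, so the snippet runs without the file).

```python
import pandas as pd

# stand-in for data/concat_1.csv, built inline so the snippet is self-contained
dat = pd.DataFrame(
    {
        "A": ["a0", "a1", "a2", "a3"],
        "B": ["b0", "b1", "b2", "b3"],
        "C": ["c0", "c1", "c2", "c3"],
        "D": ["d0", "d1", "d2", "d3"],
    }
)

# explicitly copy the subset so it owns its data
subset = dat[["A", "C"]].copy()
subset["new"] = ["bunch", "of", "new", "values"]

# the new column exists only on the copy, not on the original
print(list(subset.columns))
print(list(dat.columns))
```

Only subset gains the new column; dat keeps its original four columns.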

In longer analysis and data processing scripts, the SettingWithCopyWarning is not always “close” to where the subsetting happened, so you may need to trace your code back to where you subset your data and add the .copy() call there. There were a few points in the textbook where we made .copy() calls; this was to avoid the SettingWithCopyWarning.

T.2 Replacing a Value

When you want to replace a particular value in a dataframe, make sure you do the entire replacement in a single .loc[] or .iloc[] call.

# reset our data
dat = pd.read_csv("data/concat_1.csv")
print(dat)
   A  B  C  D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3

If you filter your rows and columns in separate steps, you will also run into the SettingWithCopyWarning, and the original dataframe is left unchanged.

# want to replace the c2 value
# filter the rows and separately select the column
dat.loc[dat["C"] == "c2"]["C"] = "new value"

print(dat)
   A  B  C  D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
/var/folders/2b/qckmp39n7qn1dh0tpcm8g89w0000gn/T/ipykernel_29772/3306879196.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dat.loc[dat["C"] == "c2"]["C"] = "new value"
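
One way to see why the replacement is lost is to write the chained expression as two separate statements, which is roughly what Python evaluates. The sketch below uses an inline stand-in for data/concat_1.csv and a hypothetical intermediate name, filtered. It also silences warnings so the output stays readable, which is for this demonstration only and not something to do in real code.

```python
import warnings

import pandas as pd

# stand-in for data/concat_1.csv, built inline so the snippet is self-contained
dat = pd.DataFrame(
    {
        "A": ["a0", "a1", "a2", "a3"],
        "B": ["b0", "b1", "b2", "b3"],
        "C": ["c0", "c1", "c2", "c3"],
        "D": ["d0", "d1", "d2", "d3"],
    }
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the warning for this demonstration only
    # step 1: filtering rows with a boolean mask produces a new dataframe
    filtered = dat.loc[dat["C"] == "c2"]
    # step 2: the assignment lands on that intermediate object, not on dat
    filtered["C"] = "new value"

print(filtered)
print(dat)
```

Here filtered holds the new value, while dat still contains c2.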

Instead, you want to do the entire replacement in a single step.

dat = pd.read_csv("data/concat_1.csv")
dat.loc[dat["C"] == "c2", ["C"]] = "new value"
print(dat)
   A  B         C  D
0 a0 b0        c0 d0
1 a1 b1        c1 d1
2 a2 b2 new value d2
3 a3 b3        c3 d3
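
The same one-step pattern can be sketched with a named boolean mask and a scalar column label, which may keep the row filter readable as the condition grows. The data is rebuilt inline as a stand-in for data/concat_1.csv, and the name mask is hypothetical.

```python
import pandas as pd

# stand-in for data/concat_1.csv, built inline so the snippet is self-contained
dat = pd.DataFrame(
    {
        "A": ["a0", "a1", "a2", "a3"],
        "B": ["b0", "b1", "b2", "b3"],
        "C": ["c0", "c1", "c2", "c3"],
        "D": ["d0", "d1", "d2", "d3"],
    }
)

# rows to change, stored in a named boolean mask
mask = dat["C"] == "c2"

# rows and column are given together in one .loc[] call,
# so the assignment targets dat itself
dat.loc[mask, "C"] = "new value"

print(dat["C"].tolist())
```

Because the row filter and the column label sit in the same .loc[] call, no intermediate dataframe is involved and the value is written to dat directly.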

T.3 More Resources

For more detail, there is a great blog post by Benjamin Pryke for Dataquest that walks you through this warning: https://www.dataquest.io/blog/settingwithcopywarning/

Kevin Markham from Data School also has a great YouTube video on the topic, titled “How do I avoid a SettingWithCopyWarning in pandas”: https://www.youtube.com/watch?v=4R4WsDJ-KVc
