The SettingWithCopyWarning is just a warning, so your code will still run and produce a result. However, if you do see it, treat it as a “code smell”: a sign that something in your code may need to be rewritten.
Let’s work with one of our small example data sets to recreate the warning.
import pandas as pd
dat = pd.read_csv("data/concat_1.csv")
print(dat)
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
It’s pretty common to subset your data for values you need, and then make changes to that subset.
subset = dat[["A", "C"]]
print(subset)
A C
0 a0 c0
1 a1 c1
2 a2 c2
3 a3 c3
# this will trigger the warning
subset["new"] = ["bunch", "of", "new", "values"]
print(subset)
A C new
0 a0 c0 bunch
1 a1 c1 of
2 a2 c2 new
3 a3 c3 values
/var/folders/2b/qckmp39n7qn1dh0tpcm8g89w0000gn/T/ipykernel_29772/4023129152.py:2: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
subset["new"] = ["bunch", "of", "new", "values"]
The warning comes from how Python passes objects by reference: a subsetting operation can return either a view onto the original data or a separate copy, so Pandas does not know for certain whether you want to change only the subset or the original dataframe as well.
The way we fix this is to be explicit when we are working with a subset of the data we plan to modify.
subset = dat[["A", "C"]].copy() # explicitly copy
print(subset)
A C
0 a0 c0
1 a1 c1
2 a2 c2
3 a3 c3
# no more warning!
subset["new"] = ["bunch", "of", "new", "values"]
print(subset)
A C new
0 a0 c0 bunch
1 a1 c1 of
2 a2 c2 new
3 a3 c3 values
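As a quick check on the fix above, the following sketch confirms that changes made to an explicit copy never reach the original dataframe. It builds the example data inline instead of reading data/concat_1.csv, which is an assumption made only so the snippet runs on its own.

```python
import pandas as pd

# stand-in for data/concat_1.csv, built inline from the printed values above
dat = pd.DataFrame(
    {
        "A": ["a0", "a1", "a2", "a3"],
        "B": ["b0", "b1", "b2", "b3"],
        "C": ["c0", "c1", "c2", "c3"],
        "D": ["d0", "d1", "d2", "d3"],
    }
)

# explicitly copy the subset before modifying it
subset = dat[["A", "C"]].copy()
subset["new"] = ["bunch", "of", "new", "values"]

# the new column exists only on the copy, not on the original
print(list(subset.columns))
print(list(dat.columns))
```

The first list includes the "new" column and the second does not, which is what you want when the subset is meant to be its own object.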
In longer analysis and data processing scripts, the SettingWithCopyWarning is not always “close” to where the subsetting happened, so you may need to trace your code back to where the subset was created. There were a few points in the book where we made .copy() calls. This was to avoid the SettingWithCopyWarning.
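One way to narrow down where the warning fires is to record warnings with Python's standard warnings module and print the file and line each one points to. This is a minimal sketch rather than a Pandas-specific tool. The small inline dataframe is a stand-in for the example data, and whether anything gets recorded depends on whether your Pandas version emits a warning for this code.

```python
import warnings

import pandas as pd

# small inline stand-in for the example data
dat = pd.DataFrame({"A": ["a0", "a1"], "C": ["c0", "c1"]})

subset = dat[["A", "C"]]  # the subset is created here ...

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    subset["new"] = ["new", "values"]  # ... but any warning points here

# report what was recorded and which line it points to
for w in caught:
    print(f"{w.category.__name__} at {w.filename}:{w.lineno}")
```

The reported line is the assignment, not the subsetting, so from there you still work backward to where subset was created and add the .copy() call at that point.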
When you want to replace a particular value in a dataframe, make sure you do the entire replacement in a single .loc[] or .iloc[] call.
# reset our data
dat = pd.read_csv("data/concat_1.csv")
print(dat)
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
If you filter your rows and columns in separate steps, you will also run into the SettingWithCopyWarning.
# want to replace the c2 value
# filter the rows and separately select the column
dat.loc[dat["C"] == "c2"]["C"] = "new value"
print(dat)
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 c2 d2
3 a3 b3 c3 d3
/var/folders/2b/qckmp39n7qn1dh0tpcm8g89w0000gn/T/ipykernel_29772/3306879196.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
dat.loc[dat["C"] == "c2"]["C"] = "new value"
Instead, you want to do the entire replacement in a single step.
dat = pd.read_csv("data/concat_1.csv")
dat.loc[dat["C"] == "c2", ["C"]] = "new value"
print(dat)
A B C D
0 a0 b0 c0 d0
1 a1 b1 c1 d1
2 a2 b2 new value d2
3 a3 b3 c3 d3
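The difference between the two forms can be checked directly. This sketch again uses an inline stand-in for the example data (an assumption, so it runs without the CSV file). It silences warnings around the two-step version, since the point here is only to compare the results.

```python
import warnings

import pandas as pd

# inline stand-in for the example data (only the columns needed here)
dat = pd.DataFrame(
    {"A": ["a0", "a1", "a2", "a3"], "C": ["c0", "c1", "c2", "c3"]}
)

# two steps: .loc[...] returns a new object, and ["C"] = ... assigns into that object
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    dat.loc[dat["C"] == "c2"]["C"] = "new value"
chained_result = dat["C"].tolist()  # the original is unchanged

# one step: the row filter and the column go into the same .loc[] call
dat.loc[dat["C"] == "c2", "C"] = "new value"
single_result = dat["C"].tolist()

print(chained_result)
print(single_result)
```

Only the second list contains "new value", because the two-step form assigns into a temporary object that is thrown away.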
For more detail, there is a great blog post by Benjamin Pryke for Dataquest that walks you through this warning: https://www.dataquest.io/blog/settingwithcopywarning/
Kevin Markham from Data School also has a great YouTube video on the topic titled How do I avoid a SettingWithCopyWarning in pandas: https://www.youtube.com/watch?v=4R4WsDJ-KVc