python pandas 0.16: SettingWithCopyWarning incorrectly reported

As per my other question:
Python Anaconda: how to test if updated libraries are compatible with my existing code?
I curse the day I was forced to upgrade to pandas 0.16.
One of the things I don't understand is why I get a chained assignment warning when I do something as banal as adding a new field to an existing dataframe and initialising it with 1:
mydataframe['x']=1
causes the following warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
mydataframe['x']=1
I understand there can be problems when assigning values to a copy of a dataframe, but here I am just adding a new field to a dataframe! How am I supposed to change my code (which worked perfectly in previous versions of pandas)?

Here's an attempt at an answer, or at least an attempt to reproduce the message. (Note that you may only get this message once and might need to start a new shell or do %reset in ipython to get this message.)
In [1]: %reset
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.16.0'
Here are three variations of setting a new column to 1. The first two do not generate the warning, but the third one does. (The second one is thanks to @Jeff's suggestion.)
In [4]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df['z'] = 1
In [5]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df = df[1:]
...: df['z'] = 1
In [6]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df2 = df[1:]
...: df2['z'] = 1
-c:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Perhaps others can correct me if I'm wrong, but I believe the warning here relates to df2 being a slice of df. However, that's not really an issue in this case, as the resulting df and df2 are what I would have expected:
In [7]: df
Out[7]:
   x   y
0  1  77
1  2  88
2  3  99
In [8]: df2
Out[8]:
   x   y  z
1  2  88  1
2  3  99  1
I know this is going to be terrible to say, but when I get that message I just check to see whether the command did what I wanted or not and don't overly think about the warning. But whether you get a warning message or not, checking that a command did what you expected is really something you need to do all the time in pandas (or matlab, or R, or SAS, or Stata, ... )
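One way to follow that advice is to record warnings explicitly and assert on the outcome. This is only a minimal sketch, not part of the original answer: it assumes the standard-library warnings module is used so that every occurrence is reported (rather than only the first one per session), and the name warning_names is made up for illustration. Newer pandas versions may record no warning at all for this code.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [77, 88, 99]})
df2 = df[1:]

with warnings.catch_warnings(record=True) as caught:
    # Report every occurrence instead of only the first one per session.
    warnings.simplefilter("always")
    df2['z'] = 1

# Category names of whatever was recorded (may be empty on newer pandas).
warning_names = [w.category.__name__ for w in caught]
print(warning_names)

# Whether or not a warning appeared, check the outcome explicitly:
# the new column should exist on df2 only, not on df.
assert 'z' not in df.columns
assert df2['z'].tolist() == [1, 1]
```

Recording the warnings this way avoids having to restart the shell to see the message again.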

This will not generate the warning:
df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
df2 = df[1:].copy()
df2['z'] = 1

Related

SettingWithCopyWarning A value is trying to be set on a copy of a slice from a DataFrame [duplicate]

This is one of the lines in my code where I get the SettingWithCopyWarning:
value1['Total Population']=value1['Total Population'].replace(to_replace='*', value=4)
Which I then changed to :
row_index= value1['Total Population']=='*'
value1.loc[row_index,'Total Population'] = 4
This still gives the same warning. How do I get rid of it?
Also, I get the same warning for a convert_objects(convert_numeric=True) function that I've used, is there any way to avoid that.
value1['Total Population'] = value1['Total Population'].astype(str).convert_objects(convert_numeric=True)
This is the warning message that I get:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
If you use .loc[row, column] and still get the same warning, it's probably because your dataframe was derived from another dataframe (for example by filtering it). In that case, use .copy() when creating it.
This is a step-by-step error reproduction:
import pandas as pd
d = {'col1': [1, 2, 3, 4], 'col2': [3, 4, 5, 6]}
df = pd.DataFrame(data=d)
df
#    col1  col2
# 0     1     3
# 1     2     4
# 2     3     5
# 3     4     6
Creating a new column and updating its value:
df['new_column'] = None
df.loc[0, 'new_column'] = 100
df
#    col1  col2 new_column
# 0     1     3        100
# 1     2     4       None
# 2     3     5       None
# 3     4     6       None
I receive no warning here. But let's create another dataframe from the previous one:
new_df = df.loc[df.col1>2]
new_df
#    col1  col2 new_column
# 2     3     5       None
# 3     4     6       None
Now, using .loc, I will try to replace some values in the same manner:
new_df.loc[2, 'new_column'] = 100
However, I got this hateful warning again:
A value is trying to be set on a copy of a slice from a DataFrame. Try
using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
SOLUTION
Using .copy() when creating the new dataframe avoids the warning:
new_df_copy = df.loc[df.col1>2].copy()
new_df_copy.loc[2, 'new_column'] = 100
Now, you won't receive any warnings!
If your dataframe is created using a filter on top of another dataframe, always use .copy().
Have you tried setting directly?:
value1.loc[value1['Total Population'] == '*', 'Total Population'] = 4
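As a self-contained sketch of this suggestion combined with the .copy() advice above: the frame raw, its Year column, and the sample values are all invented for illustration, since the question does not show how value1 was built. It also uses pd.to_numeric for the numeric conversion, on the assumption that it is an acceptable stand-in for convert_objects, which was deprecated in later pandas releases.

```python
import pandas as pd

# Hypothetical source data; the question does not show the real frame.
raw = pd.DataFrame({
    'Year': [2010, 2010, 2011, 2010],
    'Total Population': ['12', '*', '99', '7'],
})

# Take an explicit copy of the filtered rows so later writes are unambiguous.
value1 = raw[raw['Year'] == 2010].copy()

# Replace the placeholder with a single .loc call using a boolean mask.
mask = value1['Total Population'] == '*'
value1.loc[mask, 'Total Population'] = '4'

# Convert the cleaned strings to numbers.
value1['Total Population'] = pd.to_numeric(value1['Total Population'])

print(value1)
```

Because value1 is an independent copy, raw still contains the original '*' placeholder afterwards.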
I have no idea how bad the data storage/memory implications are with this but it fixes it every time for your average dataframe:
def addCrazyColFunc(df):
    dfNew = df.copy()
    dfNew['newCol'] = 'crazy'
    return dfNew
Just like the message says... make a copy and you're good to go. If someone can fix the above without the copy, please comment. None of the .loc suggestions above work for this case.
I came here because I wanted to conditionally set the value of a new column based on the value in another column.
What worked for me was numpy.where:
import numpy as np
import pandas as pd
...
df['Size'] = np.where((df.value > 10), "Greater than 10", df.value)
From the numpy docs, this is equivalent to:
[xv if c else yv
 for c, xv, yv in zip(condition, x, y)]
Which is a pretty nice use of zip...
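To make that equivalence concrete, here is a small sketch with made-up data. The labels 'large' and 'small' and the column contents are assumptions chosen for illustration; both branches use strings so the result has a single type.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'value': [5, 12, 30]})
condition = df['value'] > 10

# Vectorised version: pick 'large' where the condition holds, else 'small'.
df['Size'] = np.where(condition, 'large', 'small')

# Element-by-element version of the same selection.
expected = ['large' if c else 'small' for c in condition]

print(df)
```

The vectorised call and the list comprehension produce the same labels in the same order.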
The warning is about whether an update made through a sliced index is also applied to the source df.
If you only intend to update the derived frame (the replica), try adding pd.set_option('mode.chained_assignment', None) before the line where the warning is raised:
df_value = pd.DataFrame({ 'Total Population':['a','b','c','*'] })
value1 = df_value[ df_value['Total Population']=='*']
pd.set_option('mode.chained_assignment', None) # <=== SettingWithCopyWarning Off
row_index = value1['Total Population']=='*'
value1.loc[row_index,'Total Population'] = 44
pd.set_option('mode.chained_assignment', 'warn') # <=== SettingWithCopyWarning Default
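A variation on the same idea, offered only as a sketch: instead of toggling the pandas option, the standard-library warnings module can silence the message for a single block. The filter pattern below is an assumption based on the warning text quoted above, and the replacement value is written as the string '44' to match the string column.

```python
import warnings

import pandas as pd

df_value = pd.DataFrame({'Total Population': ['a', 'b', 'c', '*']})
value1 = df_value[df_value['Total Population'] == '*']

with warnings.catch_warnings():
    # Ignore only messages containing the SettingWithCopyWarning wording;
    # (?s) lets '.' match the newlines inside the message.
    warnings.filterwarnings(
        'ignore', message=r'(?s).*set on a copy of a slice')
    row_index = value1['Total Population'] == '*'
    value1.loc[row_index, 'Total Population'] = '44'

# Filters are restored automatically when the with-block ends.
print(value1)
print(df_value)
```

Only value1 is changed here; df_value keeps its original '*' entry.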
Got the solution:
I created a new DataFrame and stored the value of only the columns that I needed to work on, it gives me no errors now!
Strange, but worked.
I was able to avoid the same warning message with syntax like this:
value1.loc[:, 'Total Population'].replace('*', 4)
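Note, however, that replace returns a new Series and leaves value1 unchanged, so the warning disappears only because nothing is being assigned. To keep the result you still have to assign it back, i.e. value1['Total Population']=value1['Total Population']...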
Specifying it is a copy worked for me. I just added .copy() at the end of the statement
value1['Total Population'] = value1['Total Population'].replace(to_replace='*', value=4).copy()
This should fix your problem:
value1.loc[:, 'Total Population'] = value1.loc[:, 'Total Population'].replace(to_replace='*', value=4)

Add column Python [duplicate]

Hi, I am trying to add a new column ("A") to an existing data frame. Its values will be 1 or 3 depending on the information in one of the other columns ("B"):
df["A"] = np.where(df["B"] == "reported-public", 1,3)
When doing so I am getting the warning message:
<ipython-input-239-767754e40f8a>:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Any idea why?
Thanks
Any idea why?
A very simple explanation is that you are slicing the data and then assigning a value to the slice. Is this slice the same object as your original dataframe? We don't know exactly what pandas is doing underneath. In some situations the value will also be assigned into your original dataframe. If it works, then it probably got assigned correctly. That's why it's a warning rather than an error.
Here is a link with a more detailed explanation:
How to deal with SettingWithCopyWarning in Pandas
I have made dummy data as follows, to the best of my abilities based on your limited sample:
import numpy as np
import pandas as pd
data = []
data.append([1, "reported-private"])
data.append([2, "reported-private"])
data.append([3, "reported-public"])
df = pd.DataFrame(data, columns=['Number', 'B'])
Using the command provided, with numpy 1.19.5 and pandas 1.2.4:
df["A"] = np.where(df["B"] == "reported-public", 1,3)
I get the following output, probably the one you're expecting:
Number                 B  A
     1  reported-private  3
     2  reported-private  3
     3   reported-public  1
Now, the warning is hinting that you might want to use .loc from pandas itself, or maybe .apply for extra functionality. For example:
df['A'] = df.apply(lambda row: 1 if row.B == 'reported-public' else 3, axis = 1)
The output of this approach is the same as before:
Number                 B  A
     1  reported-private  3
     2  reported-private  3
     3   reported-public  1
So to sum up, it might be a version problem. If it is, try changing the version or try the second approach. Cheers.
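For completeness, the .loc route that the warning text points to could look roughly like the sketch below. It reuses the dummy data from this answer and assumes df is a standalone frame rather than a slice of another one. Filling the default value first and then overwriting the matching rows is just one possible arrangement.

```python
import pandas as pd

df = pd.DataFrame({
    'Number': [1, 2, 3],
    'B': ['reported-private', 'reported-private', 'reported-public'],
})

# Start with the default value for every row...
df['A'] = 3
# ...then overwrite only the rows that match, in a single .loc call.
df.loc[df['B'] == 'reported-public', 'A'] = 1

print(df)
```

If df itself came from filtering a larger frame, adding .copy() at the point where it is created is the usual way to keep this assignment unambiguous.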
You can always disable this behavior, as shown below (taken from this post):
import pandas as pd
pd.options.mode.chained_assignment = None # default='warn'

Returning a copy versus a view warning when using Python pandas dataframe

My goal is to convert the date column in dataframe df from object type to datetime type, but I keep running into view-versus-copy warnings when running the program.
I've found some useful information at this link: https://stackoverflow.com/a/25254087/3849539
I tested the following three solutions. All of them work as expected, but with different warning messages. Could anyone help explain their differences and point out why there is still a warning about returning a view versus a copy? Thanks.
Solution 1: df['date'] = df['date'].astype('datetime64')
test.py:85: SettingWithCopyWarning: A value is trying to be set on a
copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['date'] = df['date'].astype('datetime64')
Solution 2: df['date'] = pd.to_datetime(df['date'])
~/report/lib/python3.8/site-packages/pandas/core/frame.py:3188:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
test.py:85: SettingWithCopyWarning: A value is
trying to be set on a copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Solution 3: df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])
~/report/lib/python3.8/site-packages/pandas/core/indexing.py:1676:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
Changing how you do the datetime conversion will not fix the SettingWithCopyWarning. You get it because the df you are working with is already a slice of some larger data frame. Pandas is simply warning you that you are working with the slice and not the full data. Try instead to create a new column in df: you'll get the warning, but the column will exist in your slice. It won't exist in the original data set.
You can turn off these warnings if you know what you are doing by using pd.options.mode.chained_assignment = None # default='warn'
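As an illustration of the point about slices, here is a small sketch in which the slice is copied before the conversion. The frame full, its columns, and the filter condition are invented for the example, since the question does not show where df comes from.

```python
import pandas as pd

# Hypothetical larger frame that df is derived from.
full = pd.DataFrame({
    'date': ['2021-01-01', '2021-02-15', '2021-03-31'],
    'value': [1, 2, 3],
})

# An explicit copy makes df independent of full.
df = full[full['value'] > 1].copy()

# The conversion now applies to df only.
df['date'] = pd.to_datetime(df['date'])

print(df.dtypes)
print(full.dtypes)
```

The date column in full keeps its original type; only the copy is converted.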
I got similar warnings recently. After several tries, at least in my case, the problem is not related to your 3 solutions. It might be your 'df'.
If your df was a slice of another pandas df, such as:
df = dfOrigin[slice,:] or
df = dfOrigin[[some columns]] or
df = dfOrigin[one column]
Then, if you do anything on df, that warning will appear. Try using df = dfOrigin[[]].copy() instead.
Code to reproduce this:
import numpy as np
import pandas as pd
np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list('ABC'))
print("Original dfOrigin")
print(dfOrigin)
# A B C
# 0 4 5 9
# 1 0 6 5
# 2 8 6 6
# 3 6 6 1
df = dfOrigin[['B', 'C']]  # pandas tracks this as derived from dfOrigin
df.loc[:, 'B'] = df['B'].astype(str)  # Get SettingWithCopyWarning
df2 = dfOrigin[['B', 'C']].copy()  # Explicit, independent copy
df2['B'] = df2['B'].astype(str)  # OK

Viewing pandas dataframe raises SettingWithCopyWarning in interactive session

I ran into a strange situation that I don't have an explanation for at all. Let's start with a simple pandas dataframe:
import pandas as pd
a = pd.DataFrame({'A': [1, 1, -1]})
Say I want to select only the positive rows into a new dataframe and then modify the selected/filtered dataframe:
b = a[a['A'] > 0]
Then, when I modify b, the pandas SettingWithCopyWarning is raised, which is expected since b is just a view of a:
b['B'] = -999
The warning is raised:
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
But it is fine when I overwrite the variable a:
a = a[a['A'] > 0]
a['B'] = -999
This does NOT raise the warning, and a simple id() check also shows that a is now a completely different object. However, in an interactive session (notebook, ipython or python shell), this still raises the warning if you VIEWED the variable first. That is, in one cell you do:
a = pd.DataFrame({'A': [1, 1, -1]})
a
which displays nicely:
Out[4]:
   A
0  1
1  1
2 -1
Then in the next cell (or line, in ipython or the python shell), you do the same thing:
a = a[a['A'] > 0]
a['B'] = -999
and the warning is raised:
ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Why would this simple viewing action make a difference? From what I understand, it should not raise this warning (especially since an id() check shows that a became a new object with a different id value). My second question is: is this the only way for this kind of behavior to happen?
