Viewing pandas dataframe raises SettingWithCopyWarning in interactive session - python

I ran into a strange situation that I can't explain at all. Let's start with a simple pandas dataframe:
import pandas as pd
a = pd.DataFrame({'A': [1, 1, -1]})
Say I want to select only the positive rows into a new dataframe and then modify the filtered dataframe:
b = a[a['A'] > 0]
Then, when I modify b, pandas raises SettingWithCopyWarning. That is expected, since b is derived from a:
b['B'] = -999
warning is raised:
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
but it will be okay when I overwrite the variable a:
a = a[a['A'] > 0]
a['B'] = -999
This does NOT raise the warning, and a simple id() check shows that a is now a completely different object. However, in an interactive session (notebook, IPython or Python shell), the same code still raises the warning if you VIEWED the variable first. That is, in one cell you do:
a = pd.DataFrame({'A': [1, 1, -1]})
a
which will display nicely:
Out[4]:
   A
0  1
1  1
2 -1
Then, in the next cell (or line, in IPython or the Python shell), you do the same thing as before:
a = a[a['A'] > 0]
a['B'] = -999
the warning is raised:
ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Why would simply viewing the variable make this difference? From what I understand, it should not raise the warning (especially since an id() check shows that a became a new object with a different id). My second question: is this the only way this kind of behavior can happen?
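One way to sidestep the ambiguity described in the question, regardless of what the interactive session is holding on to, is to take an explicit copy of the filtered rows before modifying them. The following is only a minimal sketch, assuming the same toy frame as above (the name positive is illustrative). It does not explain the display-related behavior; it only avoids depending on it:

```python
import pandas as pd

a = pd.DataFrame({'A': [1, 1, -1]})

# Take an explicit, independent copy of the filtered rows.
positive = a[a['A'] > 0].copy()

# Adding a column now unambiguously targets `positive` only.
positive['B'] = -999

print(positive)
print(a.columns.tolist())  # the original frame keeps its single column
```

Because positive is created with .copy(), it no longer matters whether the original frame is still referenced elsewhere in the session.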

Related

Returning a copy versus a view warning when using Python pandas dataframe

My goal is to convert the date column of dataframe df from object type to datetime type, but I keep running into view-versus-copy warnings when running the program.
I found some useful information at https://stackoverflow.com/a/25254087/3849539
I tested the following three solutions. All of them work as expected, but with different warning messages. Could anyone explain their differences and why I still get the warning about returning a view versus a copy? Thanks.
Solution 1: df['date'] = df['date'].astype('datetime64')
test.py:85: SettingWithCopyWarning: A value is trying to be set on a
copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['date'] = df['date'].astype('datetime64')
Solution 2: df['date'] = pd.to_datetime(df['date'])
~/report/lib/python3.8/site-packages/pandas/core/frame.py:3188:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
test.py:85: SettingWithCopyWarning: A value is
trying to be set on a copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Solution 3: df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])
~/report/lib/python3.8/site-packages/pandas/core/indexing.py:1676:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
Changing how you do the datetime conversion will not fix the SettingWithCopyWarning. You get it because the df you are working with is already a slice of some larger dataframe. Pandas is simply warning you that you are modifying the slice and not the full data. To see this, try creating a new column in df: you'll get the warning, and the column will exist in your slice but not in the original data set.
If you know what you are doing, you can turn off these warnings with pd.options.mode.chained_assignment = None # default='warn'
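To make the point about slices concrete, here is a small sketch of the conversion done on an explicit copy rather than on a slice. The frame raw and its contents are hypothetical stand-ins for the larger dataframe that df was taken from; they are not from the original question:

```python
import pandas as pd

# Hypothetical stand-in for the larger frame that df was sliced from.
raw = pd.DataFrame({
    'date': ['2021-01-01', '2021-02-15', '2021-03-31'],
    'value': [1, 2, 3],
})

# .copy() makes df an independent frame rather than a flagged slice of raw.
df = raw[raw['value'] > 1].copy()

# Any of the three conversion styles can now be used; this is "Solution 2".
df['date'] = pd.to_datetime(df['date'])

print(df.dtypes)
```

The conversion only affects df. The date column in raw is left as it was.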
I got similar warnings recently. After several tries, I found that, at least in my case, the problem was not related to the three solutions. It might be your df itself.
If your df is a slice of another pandas dataframe, such as:
df = dfOrigin[slice,:] or
df = dfOrigin[[some columns]] or
df = dfOrigin[one column]
then modifying df can trigger the warning. Try using df = dfOrigin[[some columns]].copy() instead.
Code to reproduce this:
import numpy as np
import pandas as pd
np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list('ABC'))
print("Original dfOrigin")
print(dfOrigin)
# A B C
# 0 4 5 9
# 1 0 6 5
# 2 8 6 6
# 3 6 6 1
df = dfOrigin[['B', 'C']]  # Selection that pandas flags as a possible copy of dfOrigin
df.loc[:, 'B'] = df['B'].astype(str)  # Raises SettingWithCopyWarning
df2 = dfOrigin[['B', 'C']].copy()  # Explicit, independent copy
df2['B'] = df2['B'].astype(str)  # OK, no warning

Set first and last row of a column in a dataframe

I've been reading over this and still find the subject a little confusing:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Say I have a pandas DataFrame and I want to set the first and last row elements of a single column at the same time. I can do this:
df.iloc[[0, -1]].mycol = [1, 2]
which tells me "A value is trying to be set on a copy of a slice from a DataFrame." and that this is potentially dangerous.
I could use .loc instead, but then I need to know the index labels of the first and last rows (in contrast, .iloc lets me access by position).
What's the safest Pandasy way to do this?
To get to this point:
# Django queryset
query = market.stats_set.annotate(distance=F("end_date") - query_date)
# Generate a dataframe from this queryset, and order by distance
df = pd.DataFrame.from_records(query.values("distance", *fields), coerce_float=True)
df = df.sort_values("distance").reset_index(drop=True)
Then, I try calling df.distance.iloc[[0, -1]] = [1, 2]. This raises the warning.
The issue isn't with iloc itself. df.iloc[[0, -1]] returns a new object, so assigning to .mycol on it modifies that temporary copy rather than df. You can do this all within a single iloc call:
df.iloc[[0, -1], df.columns.get_loc('mycol')] = [1, 2]
Usually ix is used if you want mixed integer- and label-based access, but it doesn't work in this case since -1 isn't actually in the index, and apparently ix isn't smart enough to know it should be the last index. (Note that ix was later deprecated and has been removed as of pandas 1.0.)
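As a runnable illustration of the single-call approach, here is a minimal sketch on a made-up frame (the column other and the index labels are hypothetical). It also shows a label-based variant that looks up the first and last labels by position, which assumes the index labels are unique:

```python
import pandas as pd

df = pd.DataFrame(
    {'mycol': [10, 20, 30, 40], 'other': ['a', 'b', 'c', 'd']},
    index=['w', 'x', 'y', 'z'],
)

# Option 1: purely positional; translate the column label to a position.
df.iloc[[0, -1], df.columns.get_loc('mycol')] = [1, 2]

# Option 2: label-based; fetch the first and last index labels by position.
# This assumes the index labels are unique.
df.loc[df.index[[0, -1]], 'other'] = ['first', 'last']

print(df)
```

Both forms perform the row and column selection in one indexing operation, so the assignment goes directly to df rather than to an intermediate object.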
What you're doing is called chained indexing. You can use iloc on just that column to avoid the warning:
In [24]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df
Out[24]:
a b c
0 1.589940 0.735713 -1.158907
1 0.485653 0.044611 0.070907
2 1.123221 -0.862393 -0.807051
3 0.338653 -0.734169 -0.070471
4 0.344794 1.095861 -1.300339
In [25]:
df['a'].iloc[[0,-1]] ='foo'
df
Out[25]:
a b c
0 foo 0.735713 -1.158907
1 0.485653 0.044611 0.070907
2 1.12322 -0.862393 -0.807051
3 0.338653 -0.734169 -0.070471
4 foo 1.095861 -1.300339
If you do it the other way then it raises the warning:
In [27]:
df.iloc[[0,-1]]['a'] ='foo'
C:\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\IPython\kernel\__main__.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
if __name__ == '__main__':

python pandas 0.16: SettingWithCopyWarning incorrectly reported

As per my other question:
Python Anaconda: how to test if updated libraries are compatible with my existing code?
I curse the day I was forced to upgrade to pandas 0.16.
One of the things I don't understand is why I get a chained assignment warning when I do something as banal as adding a new field to an existing dataframe and initialising it with 1:
mydataframe['x']=1
causes the following warning:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
mydataframe['x']=1
I understand there can be problems when assigning values to a copy of a dataframe, but here I am just adding a new field to a dataframe! How am I supposed to change my code (which worked perfectly in previous versions of pandas)?
Here's an attempt at an answer, or at least an attempt to reproduce the message. (Note that you may only get this message once and might need to start a new shell or do %reset in ipython to get this message.)
In [1]: %reset
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
In [2]: import pandas as pd
In [3]: pd.__version__
Out[3]: '0.16.0'
Here are 3 variations of setting a new column to 1. The first two do not generate the warning, but the third one does. (The second one is thanks to @Jeff's suggestion.)
In [4]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df['z'] = 1
In [5]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df = df[1:]
...: df['z'] = 1
In [6]: df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
...: df2 = df[1:]
...: df2['z'] = 1
-c:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable
/indexing.html#indexing-view-versus-copy
Perhaps others can correct me if I'm wrong, but I believe the warning here relates to df2 being a copy of a slice of df. However, that's not really an issue, as the resulting df and df2 are what I would have expected:
In [7]: df
Out[7]:
x y
0 1 77
1 2 88
2 3 99
In [8]: df2
Out[8]:
x y z
1 2 88 1
2 3 99 1
I know this is going to be terrible to say, but when I get that message I just check to see whether the command did what I wanted or not and don't overly think about the warning. But whether you get a warning message or not, checking that a command did what you expected is really something you need to do all the time in pandas (or matlab, or R, or SAS, or Stata, ... )
This will not generate the warning:
df = pd.DataFrame({ 'x':[1,2,3], 'y':[77,88,99] })
df2 = df[1:].copy()
df2['z'] = 1
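Another option, sketched here as an alternative that the answer above does not itself propose, is DataFrame.assign, which returns a new dataframe with the extra column instead of modifying a slice in place:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [77, 88, 99]})

# assign() returns a new DataFrame with the extra column; df is not modified.
df2 = df[1:].assign(z=1)

print(df2)
print(df.columns.tolist())
```

This keeps the "slice, then add a column" step as a single expression, and the result is the same df and df2 as in the examples above.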

A value is trying to be set on a copy of a slice from a DataFrame-warning even after using .loc

I am getting this warning:
C:\Python27\lib\site-packages\pandas\core\indexing.py:411: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
even though I am using df.loc, as the documentation suggests. Why?
import nltk

def sentenceInReview(df):
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print("size of df: " + str(df.size))
    df.loc[:, 'review_text'] = df.review_text.map(lambda x: tokenizer.tokenize(x))
    print(df[:3])
I ran into this problem earlier today. It is related to the way Python passes object references around when calling functions, assigning variables, and so on.
Unlike in, say, R, assigning an existing dataframe to a new variable in Python doesn't make a copy, so any operation on the 'new' dataframe still acts on the original underlying data.
The way around this is to make a deep copy (see docs) whenever you want an independent copy of something. See:
import copy
import pandas as pd

data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, columns=['num'])
dfh = df.head(3)  # This assignment doesn't actually make an independent copy
dfh.loc[:, 'num'] = dfh['num'].apply(lambda x: x + 1)
# This raises the warning
# Use the deepcopy function provided in the standard library module 'copy'
df_copy = copy.deepcopy(df.head(3))
df_copy.loc[:, 'num'] = df_copy['num'].apply(lambda x: x + 1)
# Making a deep copy breaks the reference to the original df. Hence, no more warnings.
Here's a bit more on this topic that might explain the way Python does it better.
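Applied to a function like the one in the question, the same idea could look like the sketch below. It uses DataFrame.copy() instead of copy.deepcopy, and the helper name tokenize_reviews, the sample data, and the use of str.split as a stand-in for a real sentence tokenizer are all assumptions made for illustration:

```python
import pandas as pd

def tokenize_reviews(df):
    """Return a new frame with review_text split into tokens (hypothetical helper)."""
    out = df.copy()  # work on an independent copy of whatever was passed in
    # str.split is a simple stand-in for a real sentence tokenizer
    out['review_text'] = out['review_text'].map(lambda text: text.split())
    return out

reviews = pd.DataFrame({'review_text': ['good value', 'too slow', 'works fine', 'meh']})
subset = reviews.head(3)            # a slice of the original frame
tokens = tokenize_reviews(subset)

print(tokens)
```

Copying inside the function means the caller can pass either a full dataframe or a slice, and neither is modified as a side effect.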
A common reason for the warning message "A value is trying to be set on a copy of a slice from a DataFrame" is a slice over another slice!
For example:
dfA = dfB[['x', 'y', 'z']]
dfC = dfA[['x', 'z']]
With the code above, you may get this message because dfC is a slice of dfA, while dfA is itself a slice of dfB. In other words, dfC is a slice over another slice, dfA, and both are linked to dfB. In that situation, using .copy(), deepcopy or similar approaches did not help in my case :-(
Solution:
dfA = dfB[['x', 'y', 'z']]
dfC = dfB[['x', 'z']]
Hopefully the above explanation helps :-)
Try inserting the values using pd.Series(data,index= index_list)

SettingWithCopyWarning, even when using loc (?) [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 3 years ago.
I get SettingWithCopyWarning errors in cases where I would not expect them:
N.In <38>: # Column B does not exist yet
N.In <39>: df['B'] = df['A']/25
N.In <40>: df['B'] = df['A']/50
/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/indexing.py:389: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
self.obj[item] = s
and
N.In <41>: df.loc[:,'B'] = df['A']/50
/Users/josh/anaconda/envs/py27/lib/python2.7/site-packages/pandas/core/indexing.py:389: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
self.obj[item] = s
Why does it happen in case 1 and 2?
In case 1, df['A'] creates a copy of df. As explained by the Pandas documentation, this can lead to unexpected results when chaining, thus a warning is raised. Case 2 looks correct, but false positives are possible:
Warning: The chained assignment warnings / exceptions are aiming to
inform the user of a possibly invalid assignment. There may be false
positives; situations where a chained assignment is inadvertently
reported.
To turn off SettingWithCopyWarning for a single dataframe, use (note that the is_copy attribute was deprecated in later pandas releases)
df.is_copy = False
To turn off chained assignment warnings altogether, use
pd.options.mode.chained_assignment = None
Another solution that should suppress the warning:
df = df.copy()
df['B'] = df['A']/25
df['B'] = df['A']/50
