SettingWithCopyWarning happens on DataFrame.astype unreasonably - python

This is very strange and annoying: I have a Python script that contains the DataFrame below:
>>> x_pattern
sim_target_id line_on_trench top bot orientation session_id
4 0 sim_1 sim_10 X_overlay 1
64 0 sim_8 sim_31 X_overlay 1
If I try:
>>> x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
A familiar warning is raised:
86: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
However, if I insert the following lines into my script:
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'],'value': [1, 2, 3],'ccc':['a','vb','c']})
df1.value = df1.value.astype('int')
No SettingWithCopyWarning is raised for the df1 operations!
I tried the recommended .loc method, which does not work, and I tried doing the astype in place, which does not work either. Could someone help me?
Appendix: how the DataFrame is created
The DataFrame is created from an SQLite database:
sqlite_path = 'xxx'
engine2 = create_engine('sqlite:///{}'.format(sqlite_path))
connection2 = engine2.connect()
resoverall = connection2.execute("SELECT \
sim_target_id,line_on_trench,top,bot,orientation,session_id \
FROM \
sim_targets \
WHERE \
sim_target_id In ({});".format(','.join(selected_id))) #pattern info
sim_targets = pd.DataFrame(resoverall.fetchall())
sim_targets.columns = resoverall.keys()
print sim_targets.dtypes
x_pattern = sim_targets[(sim_targets['orientation']=='X_overlay')&(sim_targets['sim_target_id'].isin(x_sim_id))]
print x_pattern
x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
The output will be:
sim_target_id object
line_on_trench object
top object
bot object
orientation object
session_id object
dtype: object
sim_target_id line_on_trench top bot orientation session_id
0 4 0 sim_1 sim_10 X_overlay 1
1 64 0 sim_8 sim_31 X_overlay 1
test.py:37: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
I just tried entering the DataFrame manually, and then the warning does not appear. But I can't tell the difference between the manual df and the imported df; they look exactly the same, in both values and dtypes.
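There is no answer to this question here, but the difference between the two frames is most likely not in their values or dtypes: x_pattern comes from boolean-indexing sim_targets, and pandas remembers that it was derived from another frame, whereas df1 (and a frame typed in by hand) has no parent. A minimal sketch under that assumption, with a made-up three-row stand-in for sim_targets and a hypothetical helper count_copy_warnings, compares the three cases and shows that an explicit .copy() after the filter avoids the warning:

```python
import warnings

import pandas as pd


def count_copy_warnings(frame):
    """Hypothetical helper: convert 'sim_target_id' to int on `frame`
    and return how many SettingWithCopy-style warnings were emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        frame['sim_target_id'] = frame['sim_target_id'].astype(int)
    return sum('SettingWithCopy' in w.category.__name__ for w in caught)


# Made-up stand-in for the SQL result: object-dtype columns, as in the question
sim_targets = pd.DataFrame(
    {'sim_target_id': [4, 7, 64],
     'orientation': ['X_overlay', 'Y_overlay', 'X_overlay']},
    dtype=object,
)

# A frame built directly has no parent, so there is nothing to warn about
n_manual = count_copy_warnings(sim_targets.copy())

# Boolean filtering: pandas records that x_pattern came from sim_targets
x_pattern = sim_targets[sim_targets['orientation'] == 'X_overlay']
n_filtered = count_copy_warnings(x_pattern)

# An explicit .copy() makes x_pattern independent, so the assignment is not flagged
x_pattern = sim_targets[sim_targets['orientation'] == 'X_overlay'].copy()
n_copied = count_copy_warnings(x_pattern)

print(n_manual, n_filtered, n_copied)
```

With the default settings of pandas 2.x, n_filtered is typically 1 while the other two counts are 0. With Copy-on-Write enabled (pd.options.mode.copy_on_write = True), this warning is not emitted at all.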

Related

Getting errors while running in Jupyter notebook

I'm having trouble running this code in Python. This is my code:
import pandas as pd
import numpy as np
stars_with_planet = pd.read_csv(r'C:\Users\Stars\starswithplanet.csv')
df1 = pd.DataFrame(stars_with_planet)
stars_without_planet = pd.read_csv(r'C:\Users\Stars\starswithoutplanet.csv')
df2 = pd.DataFrame(stars_without_planet)
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)]
df4 = df2.loc[(df2['TeffK'] >= 3500) & (df2['TeffK'] <= 5400)]
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1)\
.isin(df4[['[Fe/H]']].apply(tuple, axis=1))
It is showing the following error after the last line:
C:\Users\AG\AppData\Local\Temp/ipykernel_5940/3520898032.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: pandas.pydata.org/pandas-docs/stable/user_guide/…
  df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1)\
Please help; I am using Jupyter Notebook.
The CSV files are linked below:
https://drive.google.com/file/d/1eDf2G969tdaxZrM9mQXk3mSKHrjABRUQ/view?usp=sharing
https://drive.google.com/file/d/1t8OZGgxaXbbp5X-9Ms8NJd4AZfYUMOGC/view?usp=sharing
What you are seeing at the line below is not an error:
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
It's a warning:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The pandas SettingWithCopyWarning warns you that you may be doing a chained assignment that might not work as expected.
Here the issue is that df3 was created by filtering df1, so modifications to df3 will not propagate back to your original df1.
If you don't care about keeping df1 updated, but you only care about df3, you could do this:
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)].copy()
...
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
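A minimal sketch of the .copy() route, using a made-up four-row stand-in for starswithplanet.csv (the real files are only linked) and a placeholder condition instead of the isin test, shows that the new column lives only in df3 while df1 is left untouched:

```python
import pandas as pd

# Made-up stand-in for starswithplanet.csv
df1 = pd.DataFrame({'TeffK': [3000, 4000, 5000, 6000],
                    '[Fe/H]': [0.1, -0.2, 0.3, 0.0]})

# .copy() makes df3 an independent frame, so assigning to it is not flagged
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)].copy()
df3['check'] = df3['[Fe/H]'] > 0  # placeholder condition, not the question's isin test

print(df3)
print('check' in df1.columns)  # False: df1 never receives the new column
```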
Otherwise, you can do as the warning suggests. I'm not entirely sure what your expected outcome is, but the code below updates df1 directly:
df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400), 'check'] = df1[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
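A minimal sketch of the direct .loc route, again with made-up stand-ins for both CSV files. As a swapped-in simplification, it uses Series.isin on the single [Fe/H] column in place of the tuple-based check, which should give the same result for non-missing values; rows outside the TeffK range end up with NaN in the new column:

```python
import pandas as pd

# Made-up stand-ins for the two linked CSV files
df1 = pd.DataFrame({'TeffK': [3000, 4000, 5000, 6000],
                    '[Fe/H]': [0.1, -0.2, 0.3, 0.0]})
df2 = pd.DataFrame({'TeffK': [3600, 4200, 7000],
                    '[Fe/H]': [-0.2, 0.5, 0.1]})
df4 = df2.loc[(df2['TeffK'] >= 3500) & (df2['TeffK'] <= 5400)]

in_range = (df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)
# One .loc call on df1 itself: there is no intermediate slice to be flagged.
# Series.isin on the single column stands in for the tuple-based check.
df1.loc[in_range, 'check'] = df1.loc[in_range, '[Fe/H]'].isin(df4['[Fe/H]'])
print(df1)
```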

Pandas SettingWithCopyWarning for unclear reason

Consider the following example code
import pandas as pd
import numpy as np
pd.set_option('display.expand_frame_repr', False)
foo = pd.read_csv("foo2.csv", skipinitialspace=True, index_col='Index')
foo.loc[:, 'Date'] = pd.to_datetime(foo.Date)
for i in range(0, len(foo)-1):
    if foo.at[i, 'Type'] == 'Reservation':
        for j in range(i+1, len(foo)):
            if foo.at[j, 'Type'] == 'Payout':
                foo.at[j, 'Nights'] = foo.at[i, 'Nights']
                break
mask = (foo['Date'] >= '2018-03-31') & (foo['Date'] <= '2019-03-31')
foo2019 = foo.loc[mask]
foopayouts2019 = foo2019.loc[foo2019['Type'] == 'Payout']
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
with foo2.csv as:
Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
This gives the following warning:
/usr/lib/python2.7/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
The warning does not mention a line number, but appears to be coming from the line:
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
At least, if I comment that line out, the warning goes away. So I have two questions.
First, what is causing that warning? I've been trying to use .loc where appropriate, including in the line where the warning is (possibly) coming from. If the problem is actually earlier, where is it?
Second, which is the better choice, .apply or astype, as used in the following lines of code?
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
It seems that both of them work, except for that warning.
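On the second question, a small comparison (the two-value Series is made up) suggests both produce the same integers. astype converts the whole array in one vectorized step, while apply calls np.int64 once per element in Python, so astype is usually the more idiomatic choice. Note that astype('int64') raises an error if the column still contains NaN, and copy=False does not make astype work in place: it still returns a new Series, which must be assigned back.

```python
import numpy as np
import pandas as pd

# Made-up Nights values, shaped like the filtered payout rows
nights = pd.Series([3.0, 2.0], index=[3, 5], name='Nights')

via_apply = nights.apply(np.int64)    # calls np.int64 once per element
via_astype = nights.astype('int64')   # one vectorized conversion of the array

print(via_apply.tolist(), via_astype.tolist())
```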
I would change a few things in the code.
First, check whether the current row is a Reservation and the next row is a Payout by using shift(), and use np.where() to decide which rows take the forward-filled (ffill()) Nights values instead of the if/else loop:
foo.Date=pd.to_datetime(foo.Date) #convert to datetime
c=foo.Type.eq('Reservation')&foo.Type.shift(-1).eq('Payout')
foo.Nights=np.where(~c,foo.Nights.ffill(),foo.Nights) #replace if else with np.where
Or:
c=foo.Type.shift().eq('Reservation')&foo.Type.eq('Payout')
foo.Nights=np.where(c,foo.Nights.ffill(),foo.Nights)
Then use Series.between() to check whether the dates fall between two dates:
foo2019 = foo[foo.Date.between('2018-03-31','2019-03-31')].copy() #changes
foopayouts2019 = foo2019[foo2019['Type'] == 'Payout'].copy() #changes .copy()
Or directly:
foopayouts2019=foo[foo.Date.between('2018-03-31','2019-03-31')&foo.Type.eq('Payout')].copy()
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64) #.astype(int)
Index Date Type Nights Amount Payout
3 3 2018-09-11 Payout 3 NaN 1500.0
5 5 2019-02-16 Payout 2 NaN 2000.0
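Putting the pieces together, a minimal sketch that inlines the question's foo2.csv via io.StringIO and uses Series.mask with shift() (a swapped-in variant of the np.where step above) builds foopayouts2019 with .copy() and converts Nights with a plain [] assignment. In recent pandas versions, df.loc[:, 'Nights'] = ... tries to write into the existing float column in place, so the dtype may stay float64; df['Nights'] = ... replaces the column instead.

```python
import io

import pandas as pd

# The question's foo2.csv, inlined so the sketch runs on its own
csv_text = """Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
"""
foo = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True, index_col='Index')
foo['Date'] = pd.to_datetime(foo['Date'], format='%m/%d/%Y')

# A Payout row directly after a Reservation takes that Reservation's Nights
is_payout = foo['Type'].eq('Payout')
after_reservation = foo['Type'].shift().eq('Reservation')
foo['Nights'] = foo['Nights'].mask(is_payout & after_reservation, foo['Nights'].shift())

# Filter once and take an explicit copy, so later assignments are not flagged
in_period = foo['Date'].between('2018-03-31', '2019-03-31')
foopayouts2019 = foo.loc[in_period & is_payout].copy()

# [] assignment replaces the column, so the int64 dtype sticks
foopayouts2019['Nights'] = foopayouts2019['Nights'].astype('int64')
print(foopayouts2019)
```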

Warning - value is trying to be set on a copy of a slice

I get the following warning when I run this code. I have tried every solution I can think of, but I cannot get rid of it. Kindly help!
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
import math
task2_df['price_square'] = None
i = 0
for row in data.iterrows():
    task2_df['price_square'].at[i] = math.pow(task2_df['price'][i],2)
    i += 1
For starters, I don't see your warning on pandas v0.19.2 (tested with the code at the bottom of this answer). But that's probably irrelevant to solving your issue. You should avoid iterating over rows in Python-level loops. The NumPy arrays that pandas uses are specifically designed for vectorized numerical computations:
import pandas as pd
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = df['price'].pow(2)
print(df)
price price_square
0 54.74 2996.4676
1 12.34 152.2756
2 35.45 1256.7025
3 51.31 2632.7161
Test on Pandas v0.19.2 with no warnings / errors:
import math
import pandas as pd
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = None
i = 0
for row in df.iterrows():
    df['price_square'].at[i] = math.pow(df['price'][i],2)
    i += 1
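The question does not show how task2_df is built. A likely cause of the warning is that it was filtered out of another frame and the loop then writes through the chained task2_df['price_square'].at[i]. A minimal sketch under that assumption (the frame data, its made-up qty column, and the qty > 0 filter are all hypothetical) takes an explicit .copy() and replaces the loop with one vectorized assignment:

```python
import pandas as pd

# Hypothetical source frame; the question's `data` is not shown
data = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31],
                     'qty': [1, 0, 3, 2]})

# Hypothetical filter; .copy() gives task2_df its own data
task2_df = data[data['qty'] > 0].copy()

# One column assignment replaces the row loop and the chained .at[] write
task2_df['price_square'] = task2_df['price'] ** 2
print(task2_df)
```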

Annoying pandas SettingWithCopyWarning, even after trying loc[:,]

def __f(x, df):
    #markets = ['A','B']
    markets = ['A']
    for market in markets:
        code = df.loc[df.name==x,'code'].tolist()
        if code:
            return ','.join(code)
        else:
            return np.nan
code_null.loc[:,'code'] = code_null['blockname'].apply(__f,args=(code_name,))
I always get a SettingWithCopyWarning:
.virtualenv/python3/lib/python3.6/site-packages/pandas/core/indexing.py:537: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I also tried:
code_null.loc[:,'code'] = code_null.loc[:,'blockname'].apply(__f,args=(code_name,))
but got the same warning.
code_null.loc[:,'code'] = code_null['blockname'].apply(__f,args=(code_name,)).copy()
Try using:
code_null.loc[:,(code)] = code_null[(blockname)].apply(__f,args=(code_name,))
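Copy method is not preferred:
Calling .copy() on the right-hand side does not help, because the warning concerns code_null (the frame being assigned into), not the assigned value; per the docs, whether an indexing operation returns a view or a copy is not guaranteed.
Preferred method from the docs: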
dfc = pd.DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
dfc.loc[0,'A'] = 11
dfc
A B
0 11 1
1 bbb 2
2 ccc 3
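The question does not show how code_null is created. Assuming it is a filtered subset of a larger frame (the name df, the isnull filter, and the sample rows below are hypothetical, and code_name is assumed to have the name and code columns that __f reads), a sketch that avoids the warning takes an explicit .copy() and replaces the per-row apply with a single map over a lookup built once with groupby. Unmatched names become NaN, as in __f:

```python
import pandas as pd

# Hypothetical data: blocks with missing codes, and a name -> code table
df = pd.DataFrame({'blockname': ['x', 'y', 'z'], 'code': ['c1', None, None]})
code_name = pd.DataFrame({'name': ['y', 'y', 'w'], 'code': ['a1', 'a2', 'b1']})

# .copy() makes code_null independent of df, so assigning to it is not flagged
code_null = df[df['code'].isnull()].copy()

# Build the name -> 'code1,code2' lookup once instead of filtering per row
lookup = code_name.groupby('name')['code'].agg(','.join)
code_null['code'] = code_null['blockname'].map(lookup)
print(code_null)
```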

pandas standalone series and from dataframe different behavior

Here are my code and the warning message. If I change s to a standalone Series by using s = pd.Series(np.random.randn(5)), there is no such warning. I am using Python 2.7 on Windows.
It seems that a standalone Series and a Series taken from a column of a DataFrame behave differently? Thanks.
My purpose is to change the values of the Series itself, rather than change a copy.
Source code:
import numpy as np
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
                     dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
#s = pd.Series(np.random.randn(5))
for i in range(len(s)):
    if s.iloc[i] > 0:
        s.iloc[i] = s.iloc[i] + 1
    else:
        s.iloc[i] = s.iloc[i] - 1
Warning message:
C:\Python27\lib\site-packages\pandas\core\indexing.py:132: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Content of 123.csv:
c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
Edit 1: the lambda solution does not seem to work. I printed s before and after, and the values are the same:
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
print s
s.apply(lambda x:x+1 if x>0 else x-1)
print s
0 0
1 1
2 0
3 1
4 0
Name: c_d, dtype: int64
0 0
1 1
2 0
3 1
4 0
regards,
Lin
By doing s = sample['c_d'], s is tied to the c_d column of your original DataFrame sample, so changing the values of s may also change sample. That's why you got the warning.
You can do s = sample['c_d'].copy() instead, so that changing the values of s doesn't change the c_d column of sample.
I suggest you use the apply function instead. Note that apply returns a new Series and does not modify s in place, which is why s looked unchanged in Edit 1, so assign the result back:
sample['c_d'] = sample['c_d'].apply(lambda x: x+1 if x>0 else x-1)
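A minimal sketch of writing the result straight back into sample, assuming the question's 123.csv (inlined via io.StringIO so it runs on its own) and swapping in numpy.where for the per-element lambda:

```python
import io

import numpy as np
import pandas as pd

# The question's 123.csv, inlined
csv_text = """c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
"""
sample = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1,
                     dtype={0: str, 1: str, 2: str, 3: float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')

# Compute the new values for the whole column at once and assign them to the
# DataFrame itself, instead of editing a Series pulled out of it
c_d = sample['c_d']
sample['c_d'] = np.where(c_d > 0, c_d + 1, c_d - 1)
print(sample['c_d'].tolist())  # [-1, 2, -1, 2, -1]
```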
