Annoying pandas SettingWithCopyWarning, even after trying .loc[:, ] - python

code_null.loc[:,'code'] = code_null['blockname'].apply(__f,args=(code_name,))
def __f(x, df):
    #markets = ['A','B']
    markets = ['A']
    for market in markets:
        code = df.loc[df.name==x,'code'].tolist()
        if code:
            return ','.join(code)
        else:
            return np.nan
I always get a SettingWithCopyWarning:
.virtualenv/python3/lib/python3.6/site-packages/pandas/core/indexing.py:537: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Also tried:
code_null.loc[:,'code'] = code_null.loc[:,'blockname'].apply(__f,args=(code_name,))
But got same warning.

code_null.loc[:,'code'] = code_null['blockname'].apply(__f,args=(code_name,)).copy()

Try using:
code_null.loc[:,('code')] = code_null[('blockname')].apply(__f,args=(code_name,))
Copy method is not preferred:
The .copy() method is not guaranteed and should be avoided per the DOCS
Preferred method from the docs:
dfc = pd.DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
dfc.loc[0,'A'] = 11
dfc
     A  B
0   11  1
1  bbb  2
2  ccc  3
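If code_null was itself produced by filtering another DataFrame, one option that avoids both the warning and an explicit copy is to assign through .loc on the parent frame with a row mask. The following is only a minimal sketch under that assumption: the parent frame blocks, its contents, the sample code_name table, and the helper name lookup_codes are all hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table standing in for code_name
code_name = pd.DataFrame({"name": ["alpha", "alpha", "beta"],
                          "code": ["A1", "A2", "B1"]})

# Hypothetical parent frame; code_null is assumed to be the rows of this
# frame whose 'code' value is missing
blocks = pd.DataFrame({"blockname": ["alpha", "beta", "gamma"],
                       "code": [np.nan, "B9", np.nan]})

def lookup_codes(name, df):
    """Join all codes registered under `name`, or return NaN if there are none."""
    codes = df.loc[df["name"] == name, "code"].tolist()
    return ",".join(codes) if codes else np.nan

# Select rows and column in a single .loc call on the parent frame,
# instead of assigning into a filtered intermediate frame
mask = blocks["code"].isnull()
blocks.loc[mask, "code"] = blocks.loc[mask, "blockname"].apply(
    lookup_codes, args=(code_name,))
print(blocks)
```

Rows that already have a code are left untouched, and names with no match in the lookup table stay missing.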

Related

Pandas SettingWithCopyWarning for unclear reason

Consider the following example code
import pandas as pd
import numpy as np
pd.set_option('display.expand_frame_repr', False)
foo = pd.read_csv("foo2.csv", skipinitialspace=True, index_col='Index')
foo.loc[:, 'Date'] = pd.to_datetime(foo.Date)
for i in range(0, len(foo)-1):
    if foo.at[i, 'Type'] == 'Reservation':
        for j in range(i+1, len(foo)):
            if foo.at[j, 'Type'] == 'Payout':
                foo.at[j, 'Nights'] = foo.at[i, 'Nights']
                break
mask = (foo['Date'] >= '2018-03-31') & (foo['Date'] <= '2019-03-31')
foo2019 = foo.loc[mask]
foopayouts2019 = foo2019.loc[foo2019['Type'] == 'Payout']
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
with foo2.csv as:
Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
This gives the following warning:
/usr/lib/python2.7/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
The warning does not mention a line number, but appears to be coming from the line:
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
At least, if I comment that line out, the warning goes away. So, I have two questions.
First, what is causing the warning? I've been trying to use .loc where appropriate, including on the line where the warning is (possibly) coming from. If the problem is actually earlier, where is it?
Second, which is the better choice, .apply or astype, as used in the following lines of code?
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
It seems that both of them work, except for that warning.
I would change a few things in the code.
First, use shift() to check whether the current row is a Reservation and the next row is a Payout, then use np.where() to forward-fill Nights into the rows that need it:
foo.Date=pd.to_datetime(foo.Date) #convert to datetime
c=foo.Type.eq('Reservation')&foo.Type.shift(-1).eq('Payout')
foo.Nights=np.where(~c,foo.Nights.ffill(),foo.Nights) #replace if else with np.where
Or:
c=foo.Type.shift().eq('Reservation')&foo.Type.eq('Payout')
np.where(c,foo.Nights.ffill(),foo.Nights)
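Then use Series.between() to check whether the dates fall between two dates, and call .copy() so that each filtered result is an independent DataFrame rather than a slice of foo (assigning into such a slice is what triggers the warning):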
foo2019 = foo[foo.Date.between('2018-03-31','2019-03-31')].copy() #changes
foopayouts2019 = foo2019[foo2019['Type'] == 'Payout'].copy() #changes .copy()
Or directly:
foopayouts2019=foo[foo.Date.between('2018-03-31','2019-03-31')&foo.Type.eq('Payout')].copy()
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64) #.astype(int)
   Index       Date    Type  Nights  Amount  Payout
3      3 2018-09-11  Payout       3     NaN  1500.0
5      5 2019-02-16  Payout       2     NaN  2000.0
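On the second question (.apply versus astype), a minimal sketch of the astype route is shown here. The small inline frame is a made-up stand-in for foo2.csv, and the variable names in_range and payouts are illustrative. Plain column assignment is used rather than .loc[:, 'Nights'] because, as far as I know, recent pandas versions try to set values in place with .loc and may keep the existing float dtype.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the data read from foo2.csv
foo = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-07", "2018-09-11",
                            "2019-02-16", "2019-04-25"]),
    "Type": ["Payout"] * 4,
    "Nights": [2.0, 3.0, 2.0, 7.0],
})

# Filter once, then copy so the result is independent of foo
in_range = foo["Date"].between("2018-03-31", "2019-03-31")
payouts = foo[in_range & foo["Type"].eq("Payout")].copy()

# astype converts the whole column at once; apply(np.int64) would call
# the conversion once per element
payouts["Nights"] = payouts["Nights"].astype(np.int64)
print(payouts.dtypes)
```

Because payouts is an explicit copy, the assignment does not touch foo, whose Nights column stays float.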

SettingWithCopyWarning happens on DataFrame.astype unreasonably

This is very strange and annoying: I have a Python script that contains the DataFrame below:
>>> x_pattern
sim_target_id line_on_trench top bot orientation session_id
4 0 sim_1 sim_10 X_overlay 1
64 0 sim_8 sim_31 X_overlay 1
If I try:
>>> x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
A familiar warning is raised:
86: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
However, if I insert the lines below into my script:
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'],'value': [1, 2, 3],'ccc':['a','vb','c']})
df1.value = df1.value.astype('int')
No 'SettingWithCopyWarning' will be raised on the df1 operations!
I tried the recommended .loc method, which does not work. I also tried astype in place, which does not work either. Could someone help me?
Appendix - How is the DataFrame created:
The DataFrame is created from a sqlite database:
sqlite_path = 'xxx'
engine2 = create_engine('sqlite:///{}'.format(sqlite_path))
connection2 = engine2.connect()
resoverall = connection2.execute("SELECT \
sim_target_id,line_on_trench,top,bot,orientation,session_id \
FROM \
sim_targets \
WHERE \
sim_target_id In ({});".format(','.join(selected_id))) #pattern info
sim_targets = pd.DataFrame(resoverall.fetchall())
sim_targets.columns = resoverall.keys()
print sim_targets.dtypes
x_pattern = sim_targets[(sim_targets['orientation']=='X_overlay')&(sim_targets['sim_target_id'].isin(x_sim_id))]
print x_pattern
x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
The output will be:
sim_target_id object
line_on_trench object
top object
bot object
orientation object
session_id object
dtype: object
sim_target_id line_on_trench top bot orientation session_id
0 4 0 sim_1 sim_10 X_overlay 1
1 64 0 sim_8 sim_31 X_overlay 1
test.py:37: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
x_pattern['sim_target_id'] = x_pattern['sim_target_id'].astype(int)
I just tried to manually input the DataFrame and the warning won't appear. But I can't tell the difference between the manual df and imported df, they look just the same - value and dtypes.
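One likely difference is that x_pattern is obtained by boolean-mask filtering of sim_targets, whereas df1 is built directly. The sketch below assumes that this is the cause and takes an explicit copy after filtering. The small sim_targets frame and the x_sim_id list are made-up stand-ins for the database result, not data from the question.

```python
import pandas as pd

# Made-up stand-in for the frame built from resoverall.fetchall();
# every column is object dtype, as in the printed dtypes
sim_targets = pd.DataFrame({
    "sim_target_id": ["4", "64", "7"],
    "orientation": ["X_overlay", "X_overlay", "Y_overlay"],
}, dtype=object)
x_sim_id = ["4", "64"]  # hypothetical list of selected ids

keep = ((sim_targets["orientation"] == "X_overlay")
        & sim_targets["sim_target_id"].isin(x_sim_id))

# .copy() makes x_pattern an independent frame, so the assignment below
# is not treated as a write into a slice of sim_targets
x_pattern = sim_targets[keep].copy()
x_pattern["sim_target_id"] = x_pattern["sim_target_id"].astype(int)
print(x_pattern.dtypes)
```

The original sim_targets frame keeps its string ids; only the copy is converted.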

Pandas: Modify a particular level of Multiindex, using replace method several times

I am trying to use the replace method several times in order to change the indices of a given level of a MultiIndex pandas DataFrame.
As seen here: Pandas: Modify a particular level of Multiindex, @John got a solution that works great as long as the replace method is used once.
The problem is that it does not work if I use this method several times.
E.g.
df.index = df.index.set_levels(df.index.levels[0].str.replace("dataframe_",'').replace("_r",' r'), level=0)
I get the following error message:
AttributeError: 'Index' object has no attribute 'replace'
What am I missing?
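The first .str.replace returns an Index, which has no replace method of its own, so go through the .str accessor again and use str.replace twice: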
idx = df.index.levels[0].str.replace("dataframe_",'').str.replace("_r",' r')
df.index = df.index.set_levels(idx, level=0)
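Another solution is to convert the level with to_series and then replace by dictionary (note that a dictionary replace matches whole values, not substrings, unless regex=True is passed):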
d = {'dataframe_':'','_r':' r'}
idx = df.index.levels[0].to_series().replace(d)
df.index = df.index.set_levels(idx, level=0)
And a solution with map and fillna, if the data is large and performance is important:
d = {'dataframe_':'','_r':' r'}
s = df.index.levels[0].to_series()
df.index = df.index.set_levels(s.map(d).fillna(s), level=0)
Sample:
df = pd.DataFrame({
'A':['dataframe_','_r', 'a'],
'B':[7,8,9],
'C':[1,3,5],
}).set_index(['A','B'])
print (df)
              C
A          B
dataframe_ 7  1
_r         8  3
a          9  5
d = {'dataframe_':'','_r':' r'}
idx = df.index.levels[0].to_series().replace(d)
df.index = df.index.set_levels(idx, level=0)
print (df)
      C
A  B
   7  1
 r 8  3
a  9  5
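The sample above uses labels that equal the dictionary keys exactly. For labels that merely contain those substrings, which is the case in the question, the chained str.replace approach can be checked with a short sketch. The labels here are made up for illustration, and regex=False is passed explicitly so the patterns are treated as literal text.

```python
import pandas as pd

# Made-up labels that contain the substrings to be replaced
df = pd.DataFrame({
    "A": ["dataframe_x_r1", "dataframe_y_r2"],
    "B": [7, 8],
    "C": [1, 3],
}).set_index(["A", "B"])

# Each .str.replace returns an Index, so the second call goes through
# the .str accessor again
idx = (df.index.levels[0]
       .str.replace("dataframe_", "", regex=False)
       .str.replace("_r", " r", regex=False))
df.index = df.index.set_levels(idx, level=0)
print(df)
```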

Warning - value is trying to be set on a copy of a slice

I get the warning below when I run this code. I tried all the solutions I could think of, but cannot get rid of it. Kindly help!
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
import math
task2_df['price_square'] = None
i = 0
for row in data.iterrows():
    task2_df['price_square'].at[i] = math.pow(task2_df['price'][i],2)
    i += 1
For starters, I don't see your warning on Pandas v0.19.2 (tested with the code at the bottom of this answer). But that's probably irrelevant to solving your issue. You should avoid iterating over rows in Python-level loops. The NumPy arrays that pandas uses are designed for vectorized numerical computation, so square the whole column at once:
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = df['price'].pow(2)
print(df)
   price  price_square
0  54.74     2996.4676
1  12.34      152.2756
2  35.45     1256.7025
3  51.31     2632.7161
Test on Pandas v0.19.2 with no warnings / errors:
import math
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = None
i = 0
for row in df.iterrows():
    df['price_square'].at[i] = math.pow(df['price'][i],2)
    i += 1

pandas standalone series and from dataframe different behavior

Here is my code and the warning message. If I change s to be a standalone Series by using s = pd.Series(np.random.randn(5)), there is no such warning. I am using Python 2.7 on Windows.
It seems that a standalone Series and a Series taken from a column of a DataFrame behave differently? Thanks.
My purpose is to change the values of the Series itself, rather than change a copy.
Source code:
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
#s = pd.Series(np.random.randn(5))
for i in range(len(s)):
    if s.iloc[i] > 0:
        s.iloc[i] = s.iloc[i] + 1
    else:
        s.iloc[i] = s.iloc[i] - 1
Warning message,
C:\Python27\lib\site-packages\pandas\core\indexing.py:132: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Content of 123.csv,
c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
Edit 1: the lambda solution does not seem to work. I printed s before and after and got the same values:
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
print s
s.apply(lambda x:x+1 if x>0 else x-1)
print s
0 0
1 1
2 0
3 1
4 0
Name: c_d, dtype: int64
Backend TkAgg is interactive backend. Turning interactive mode on.
0 0
1 1
2 0
3 1
4 0
regards,
Lin
By doing s = sample['c_d'], if you make a change to the values of s then your original DataFrame sample also changes. That's why you got the warning.
You can do s = sample['c_d'].copy() instead, so that changing the values of s doesn't change the c_d column of the DataFrame sample.
I suggest you use the apply function instead. It returns a new Series rather than modifying s in place, so assign the result back:
s = s.apply(lambda x:x+1 if x>0 else x-1)
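Since apply returns a new Series and leaves its input untouched, the result has to be written back for the column to change. A minimal sketch follows, using a small inline frame in place of 123.csv; the np.where variant is an alternative of my own and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Inline stand-in for the c_d column read from 123.csv
sample = pd.DataFrame({"c_d": [0, 1, 0, 1, 0]})

# Vectorized equivalent, computed from the original values for comparison
expected = np.where(sample["c_d"] > 0, sample["c_d"] + 1, sample["c_d"] - 1)

# apply builds a new Series; assigning it back updates the column
sample["c_d"] = sample["c_d"].apply(lambda x: x + 1 if x > 0 else x - 1)
print(sample["c_d"].tolist())
```

Assigning to the column on sample directly, rather than mutating the extracted Series s element by element, also sidesteps the warning from the original loop.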
