Pandas SettingWithCopyWarning for unclear reason

Consider the following example code
import pandas as pd
import numpy as np
pd.set_option('display.expand_frame_repr', False)
foo = pd.read_csv("foo2.csv", skipinitialspace=True, index_col='Index')
foo.loc[:, 'Date'] = pd.to_datetime(foo.Date)
for i in range(0, len(foo)-1):
    if foo.at[i, 'Type'] == 'Reservation':
        for j in range(i+1, len(foo)):
            if foo.at[j, 'Type'] == 'Payout':
                foo.at[j, 'Nights'] = foo.at[i, 'Nights']
                break
mask = (foo['Date'] >= '2018-03-31') & (foo['Date'] <= '2019-03-31')
foo2019 = foo.loc[mask]
foopayouts2019 = foo2019.loc[foo2019['Type'] == 'Payout']
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
with foo2.csv containing:
Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
This gives the following warning:
/usr/lib/python2.7/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
The warning does not mention a line number, but appears to be coming from the line:
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
At least, if I comment that line out, the warning goes away. So, I have two questions.
First, what is causing that warning? I've been trying to use .loc where appropriate, including in the line the warning is (possibly) coming from. If the problem is actually earlier, where is it?
Second, which is the better choice, .apply or astype, as used in the following lines of code?
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
It seems that both of them work, except for that warning.

I would change a few things in the code.
First, check whether the current row is a Reservation and the next row is a Payout by using shift(), and forward-fill (ffill) the values where the condition matches by using np.where():
foo.Date = pd.to_datetime(foo.Date)  # convert to datetime
c = foo.Type.eq('Reservation') & foo.Type.shift(-1).eq('Payout')
foo.Nights = np.where(~c, foo.Nights.ffill(), foo.Nights)  # replace the if/else with np.where
Or:
c = foo.Type.shift().eq('Reservation') & foo.Type.eq('Payout')
foo.Nights = np.where(c, foo.Nights.ffill(), foo.Nights)
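To check that the shift()/np.where() version really matches the nested loop from the question, here is a small sketch that runs both on the Type and Nights columns of foo2.csv and compares the results. It assumes, as in the sample file, that every Payout sits directly below its Reservation: the loop would also handle a gap between the two rows, while the shift-based version only looks one row up.

```python
import numpy as np
import pandas as pd

# Type and Nights columns from foo2.csv.
foo = pd.DataFrame({
    "Type": ["Reservation", "Payout"] * 4,
    "Nights": [2.0, np.nan, 3.0, np.nan, 2.0, np.nan, 7.0, np.nan],
})

# The question's nested loop, writing into an explicit copy of Nights.
looped = foo["Nights"].copy()
for i in range(len(foo) - 1):
    if foo.at[i, "Type"] == "Reservation":
        for j in range(i + 1, len(foo)):
            if foo.at[j, "Type"] == "Payout":
                looped.iloc[j] = foo.at[i, "Nights"]
                break

# Vectorised version: a Payout right after a Reservation takes the
# forward-filled value; every other row keeps its own Nights.
c = foo["Type"].shift().eq("Reservation") & foo["Type"].eq("Payout")
vectorised = pd.Series(np.where(c, foo["Nights"].ffill(), foo["Nights"]), index=foo.index)

print(vectorised.tolist())
```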
Then use Series.between() to check whether the dates fall between two dates:
foo2019 = foo[foo.Date.between('2018-03-31', '2019-03-31')].copy()  # .copy() added
foopayouts2019 = foo2019[foo2019['Type'] == 'Payout'].copy()  # .copy() added
Or directly:
foopayouts2019 = foo[foo.Date.between('2018-03-31', '2019-03-31') & foo.Type.eq('Payout')].copy()
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)  # or .astype(int)
   Index       Date    Type  Nights  Amount  Payout
3      3 2018-09-11  Payout       3     NaN  1500.0
5      5 2019-02-16  Payout       2     NaN  2000.0
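Putting the pieces together, here is a self-contained sketch of the whole pipeline. As far as I understand it, boolean filters such as foo.loc[mask] return a new frame that pandas still flags as possibly derived from foo, so a later assignment into it warns even when .loc[:, ...] is used. Taking .copy() at the point where the subset is created removes that ambiguity. The sketch embeds foo2.csv with io.StringIO and assumes an explicit month/day/year date format; both are assumptions for the sake of a runnable example. It also compares .apply(np.int64) with .astype('int64'): both give the same int64 column, but astype is a single vectorised cast while apply calls np.int64 once per element, and both fail if any NaN remains in the column.

```python
import io

import numpy as np
import pandas as pd

# Contents of foo2.csv, embedded so the sketch runs without the file.
CSV = """Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
"""

foo = pd.read_csv(io.StringIO(CSV), skipinitialspace=True, index_col="Index")
# Assumption: dates are month/day/year, as in the sample file.
foo["Date"] = pd.to_datetime(foo["Date"], format="%m/%d/%Y")

# Give each Payout the Nights of the Reservation directly above it.
c = foo["Type"].shift().eq("Reservation") & foo["Type"].eq("Payout")
foo["Nights"] = np.where(c, foo["Nights"].ffill(), foo["Nights"])

# Filter once and take an explicit copy: foopayouts2019 owns its data, so
# assigning to it later is not an attempt to modify foo.
in_period = foo["Date"].between("2018-03-31", "2019-03-31")
foopayouts2019 = foo[in_period & foo["Type"].eq("Payout")].copy()

# Both casts give the same values; astype is one vectorised call,
# while apply calls np.int64 once per element.
via_apply = foopayouts2019["Nights"].apply(np.int64)
via_astype = foopayouts2019["Nights"].astype("int64")
foopayouts2019["Nights"] = via_astype
print(foopayouts2019)
```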

Related

Getting errors while running in Jupyter notebook

I'm having trouble running this code in python. This is my code:
import pandas as pd
import numpy as np
stars_with_planet = pd.read_csv(r'C:\Users\Stars\starswithplanet.csv')
df1 = pd.DataFrame(stars_with_planet)
stars_without_planet = pd.read_csv(r'C:\Users\Stars\starswithoutplanet.csv')
df2 = pd.DataFrame(stars_without_planet)
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)]
df4 = df2.loc[(df2['TeffK'] >= 3500) & (df2['TeffK'] <= 5400)]
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1)\
.isin(df4[['[Fe/H]']].apply(tuple, axis=1))
It is showing the following error after the last line:
C:\Users\AG\AppData\Local\Temp/ipykernel_5940/3520898032.py:1:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead See the caveats in the documentation:
pandas.pydata.org/pandas-docs/stable/user_guide/… df3['check'] =
df3[['[Fe/H]']].apply(tuple, axis=1)\
Please help; I am using Jupyter Notebook.
The CSV files are linked below:
https://drive.google.com/file/d/1eDf2G969tdaxZrM9mQXk3mSKHrjABRUQ/view?usp=sharing
https://drive.google.com/file/d/1t8OZGgxaXbbp5X-9Ms8NJd4AZfYUMOGC/view?usp=sharing

What you are seeing at the line below is not an error:
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
It's a warning:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The pandas SettingWithCopyWarning warns you that you may be doing some chained assignments that may not work as expected.
Basically, the issue is that modifying df3 will not modify your original df1.
If you don't care about keeping df1 updated, but you only care about df3, you could do this:
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)].copy()
...
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
Otherwise, you can do as the warning suggests. I'm not entirely sure what your expected outcome is, but the code below updates df1 directly:
df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400), 'check'] = df1[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
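Since the linked CSV files are not reproduced here, the following sketch uses small hypothetical stand-ins for df1 and df2 with the same column names (the values are invented purely for illustration). It shows the .copy() route end to end. It also assumes that, with only one column involved, wrapping each row in a tuple is unnecessary and Series.isin can compare the [Fe/H] values directly; for data without missing values this should give the same result as the tuple-based version.

```python
import pandas as pd

# Hypothetical stand-ins for starswithplanet.csv and starswithoutplanet.csv.
df1 = pd.DataFrame({"TeffK": [3400, 3600, 5000, 5600], "[Fe/H]": [0.1, 0.2, -0.1, 0.3]})
df2 = pd.DataFrame({"TeffK": [3550, 4200, 6000], "[Fe/H]": [0.2, 0.0, 0.3]})

# between() is inclusive on both ends by default, like the >= / <= pair.
df3 = df1.loc[df1["TeffK"].between(3500, 5400)].copy()
df4 = df2.loc[df2["TeffK"].between(3500, 5400)].copy()

# df3 is an independent frame, so adding a column to it does not warn.
df3["check"] = df3["[Fe/H]"].isin(df4["[Fe/H]"])
print(df3)
```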

Warning - value is trying to be set on a copy of a slice

I get this warning when I run the code below. I have tried every solution I can think of, but cannot get rid of it. Kindly help!
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
import math
task2_df['price_square'] = None
i = 0
for row in data.iterrows():
    task2_df['price_square'].at[i] = math.pow(task2_df['price'][i],2)
    i += 1

For starters, I don't see your warning on Pandas v0.19.2 (tested with the code at the bottom of this answer). But that's probably irrelevant to solving your issue. You should avoid iterating over rows in Python-level loops: the NumPy arrays that Pandas uses are specifically designed for vectorised numerical computations:
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = df['price'].pow(2)
print(df)
   price  price_square
0  54.74     2996.4676
1  12.34      152.2756
2  35.45     1256.7025
3  51.31     2632.7161
Test on Pandas v0.19.2 with no warnings / errors:
import math
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = None
i = 0
for row in df.iterrows():
    df['price_square'].at[i] = math.pow(df['price'][i],2)
    i += 1
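If a row loop really is needed, a sketch of the same computation without chained indexing might look like this. df['price_square'].at[i] = ... first selects the column and then sets a value into that intermediate result, which is the pattern the warning is about; in recent pandas versions with Copy-on-Write enabled, such chained assignment does not update df at all. Writing through the DataFrame with a single df.at[row_label, column] call avoids that. The question iterates over data.iterrows() but indexes task2_df; the sketch assumes these are meant to be the same frame.

```python
import math

import pandas as pd

df = pd.DataFrame({"price": [54.74, 12.34, 35.45, 51.31]})

# Start with a float (NaN) column; assigning None would create an object column.
df["price_square"] = float("nan")

# One .at[row_label, column] call per row, on the DataFrame itself,
# using the labels that iterrows() yields instead of a manual counter.
for idx, row in df.iterrows():
    df.at[idx, "price_square"] = math.pow(row["price"], 2)

print(df)
```

The vectorised df['price'].pow(2) shown above remains the better choice; the loop is only worth keeping when each row needs logic that cannot be expressed column-wise.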

Replace year on pandas dataframe with variable of Timestamp format

I have created the following df with the following code:
df = pd.read_table('https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/06_Stats/Wind_Stats/wind.data', sep=r"\s+", parse_dates=[[0,1,2]])
If we run the following command:
type(df['Yr_Mo_Dy'][0])
We'll see that the observations under ['Yr_Mo_Dy'] are of pandas._libs.tslibs.timestamps.Timestamp format.
What I am trying to do is the following: whenever I see a year >= 2061 in ['Yr_Mo_Dy'], I want to subtract 100 from it; otherwise I keep the year and continue with the iteration.
I have tried the following code:
for i in list(range(df.shape[0])):
    # assign all the observations under df['Yr_Mo_Dy'] to ts
    ts = df['Yr_Mo_Dy'][i]
    if df['Yr_Mo_Dy'][i].year >=2061:
        # replace the year in ts by year - 100
        ts.replace(year=df['Yr_Mo_Dy'][i].year - 100)
    else:
        continue
But the loop does nothing. I feel it has something to do with the variable assignment ts = df['Yr_Mo_Dy'][i], yet I cannot figure out another way of getting this done.
I am trying to assign a variable after each loop iteration, following the answer I saw in this post.

You should aim to avoid manual loops for vectorisable operations.
In this case, you can use numpy.where to create a conditional series:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': pd.to_datetime(['2018-01-01', '2080-11-30',
                                         '1955-04-05', '2075-10-09'])})
df['B'] = np.where(df['A'].dt.year >= 2061,
                   df['A'] - pd.DateOffset(years=100), df['A'])
print(df)
A B
0 2018-01-01 2018-01-01
1 2080-11-30 1980-11-30
2 1955-04-05 1955-04-05
3 2075-10-09 1975-10-09
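The reason the loop in the question appears to do nothing is that Timestamp.replace returns a new Timestamp and leaves the original untouched, and the result is never stored anywhere. The sketch below shows that, then a variant of the vectorised fix that writes back into the same column with Series.mask, which keeps the datetime64 dtype. Using mask here instead of np.where is my substitution, not the answer's method.

```python
import pandas as pd

df = pd.DataFrame({"A": pd.to_datetime(["2018-01-01", "2080-11-30",
                                        "1955-04-05", "2075-10-09"])})

# Timestamp.replace returns a new object; ts itself is unchanged.
ts = df["A"].iloc[1]
shifted = ts.replace(year=ts.year - 100)
print(ts, shifted)  # ts is still in 2080, shifted is in 1980

# Replace values only where the condition holds, and assign the result back.
future = df["A"].dt.year >= 2061
df["A"] = df["A"].mask(future, df["A"] - pd.DateOffset(years=100))
print(df)
```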

Annoying pandas SettingWithCopyWarning, even after trying loc[:,]

code_null.loc[:,'code'] = code_null['blockname'].apply(__f,args=(code_name,))
def __f(x, df):
    #markets = ['A','B']
    markets = ['A']
    for market in markets:
        code = df.loc[df.name==x,'code'].tolist()
        if code:
            return ','.join(code)
        else:
            return np.nan
I always get a SettingWithCopyWarning:
.virtualenv/python3/lib/python3.6/site-packages/pandas/core/indexing.py:537: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Also tried:
code_null.loc[:,'code'] = code_null.loc[:,'blockname'].apply(__f,args=(code_name,))
But got same warning.

code_null.loc[:,'code'] = code_null['blockname'].apply(__f,args=(code_name,)).copy()

Try using:
code_null.loc[:,(code)] = code_null[(blockname)].apply(__f,args=(code_name,))
The copy method is not preferred:
the .copy() method is not guaranteed and should be avoided per the docs.
Preferred method from the docs:
dfc = pd.DataFrame({'A':['aaa','bbb','ccc'],'B':[1,2,3]})
dfc.loc[0,'A'] = 11
dfc
     A  B
0   11  1
1  bbb  2
2  ccc  3
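The question does not show how code_null is built, but the warning usually means it is a filtered slice of some larger frame. Under that assumption, the sketch below takes the copy where code_null is created, rather than copying the right-hand side of the assignment. The frames blocks and code_name here are hypothetical data invented for illustration, and lookup_codes is a renamed version of __f with its unused markets loop dropped. A groupby-plus-map alternative that avoids the per-row lookup is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the real frames are not shown in the question.
blocks = pd.DataFrame({"blockname": ["alpha", "beta", "gamma"],
                       "code": [np.nan, "X1", np.nan]})
code_name = pd.DataFrame({"name": ["alpha", "alpha", "gamma"],
                          "code": ["A1", "A2", "G1"]})

# Assumed cause of the warning: code_null is a filtered slice of a larger
# frame. Copying it here, where it is created, is what removes the ambiguity.
code_null = blocks[blocks["code"].isna()].copy()


def lookup_codes(name, table):
    """Join all codes listed for `name`, or return NaN if there are none."""
    codes = table.loc[table["name"] == name, "code"].tolist()
    return ",".join(codes) if codes else np.nan


code_null["code"] = code_null["blockname"].apply(lookup_codes, args=(code_name,))

# Alternative: build the joined codes once per name, then map them.
joined = code_name.groupby("name")["code"].agg(",".join)
mapped = code_null["blockname"].map(joined)
print(code_null)
```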

pandas standalone series and from dataframe different behavior

Here is my code and the warning message. If I change s to be a standalone Series by using s = pd.Series(np.random.randn(5)), there is no such warning. I am using Python 2.7 on Windows.
Do a standalone Series and a Series taken from a DataFrame column behave differently? Thanks.
My purpose is to change the values of the Series itself, rather than a copy.
Source code,
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
#s = pd.Series(np.random.randn(5))
for i in range(len(s)):
    if s.iloc[i] > 0:
        s.iloc[i] = s.iloc[i] + 1
    else:
        s.iloc[i] = s.iloc[i] - 1
Warning message,
C:\Python27\lib\site-packages\pandas\core\indexing.py:132: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Content of 123.csv,
c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
Edit 1: it seems the lambda solution does not work. I printed s before and after, and the values are the same:
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
                     dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
print(s)
s.apply(lambda x: x+1 if x>0 else x-1)
print(s)
0    0
1    1
2    0
3    1
4    0
Name: c_d, dtype: int64
0    0
1    1
2    0
3    1
4    0
regards,
Lin

By doing s = sample['c_d'], s may share its data with the DataFrame sample, so changing the values of s can also change sample. That's why you got the warning.
You can do s = sample['c_d'].copy() instead, so that changing the values of s doesn't change the c_d column of the DataFrame sample.

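I suggest you use the apply function instead. Note that apply returns a new Series rather than modifying s in place, so assign the result:
s = s.apply(lambda x: x+1 if x>0 else x-1)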
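Both answers can be combined into a short sketch of the two usual intents: changing the column inside sample itself, or working on an independent Series. It embeds 123.csv with io.StringIO and, as a simplification, reads the header row directly instead of using header=None with skiprows=1. Option 1 uses np.where as a vectorised replacement for the if/else loop; that choice is mine, not from the answers above.

```python
import io

import numpy as np
import pandas as pd

# Contents of 123.csv, embedded so the sketch runs without the file.
CSV = """c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
"""

sample = pd.read_csv(io.StringIO(CSV))
sample["c_d"] = sample["c_d"].astype("int64")

# Option 1: change the column inside the DataFrame in one vectorised step
# (+1 where positive, -1 otherwise, as in the question's loop).
sample["c_d"] = np.where(sample["c_d"] > 0, sample["c_d"] + 1, sample["c_d"] - 1)

# Option 2: an independent Series; edits to s never reach sample.
s = sample["c_d"].copy()
s.iloc[0] = 0
print(sample["c_d"].tolist(), s.tolist())
```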

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pandas SettingWithCopyWarning for unclear reason - python

Related

Getting errors while running in Jupyter notebook

Warning - value is trying to be set on a copy of a slice

Replace year on pandas dataframe with variable of Timestamp format

Annoy pandas SettingWithCopyWarning, even tried loc[;,]

pandas standalone series and from dataframe different behavior

Categories

Resources