I'm having trouble running this code in Python. This is my code:
import pandas as pd
import numpy as np
stars_with_planet = pd.read_csv(r'C:\Users\Stars\starswithplanet.csv')
df1 = pd.DataFrame(stars_with_planet)
stars_without_planet = pd.read_csv(r'C:\Users\Stars\starswithoutplanet.csv')
df2 = pd.DataFrame(stars_without_planet)
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)]
df4 = df2.loc[(df2['TeffK'] >= 3500) & (df2['TeffK'] <= 5400)]
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1)\
.isin(df4[['[Fe/H]']].apply(tuple, axis=1))
It is showing the following error after the last line:
C:\Users\AG\AppData\Local\Temp/ipykernel_5940/3520898032.py:1:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead See the caveats in the documentation:
pandas.pydata.org/pandas-docs/stable/user_guide/… df3['check'] =
df3[['[Fe/H]']].apply(tuple, axis=1)\
Please help. I am using a Jupyter notebook.
The CSV files are linked below:
https://drive.google.com/file/d/1eDf2G969tdaxZrM9mQXk3mSKHrjABRUQ/view?usp=sharing
https://drive.google.com/file/d/1t8OZGgxaXbbp5X-9Ms8NJd4AZfYUMOGC/view?usp=sharing
What you are seeing on the line below is not an error:
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
It's a warning:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:16: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The pandas SettingWithCopyWarning warns you that you may be doing a chained assignment that may not work as expected.
Basically, df3 was created by filtering df1, so adding a column to df3 will not modify your original df1, and pandas warns because it cannot tell which of the two you meant to change.
If you don't care about keeping df1 updated, but you only care about df3, you could do this:
df3 = df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400)].copy()
...
df3['check'] = df3[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
Otherwise, you can do as suggested by the warning. I'm not entirely sure what your expected outcome is, but the code below updates df1 directly:
df1.loc[(df1['TeffK'] >= 3500) & (df1['TeffK'] <= 5400), 'check'] = df1[['[Fe/H]']].apply(tuple, axis=1).isin(df4[['[Fe/H]']].apply(tuple, axis=1))
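To make the .copy() approach concrete, here is a minimal sketch on made-up data (the values are invented for illustration and are not taken from the linked CSV files). It assumes that, because only one column is compared, calling isin on the column directly gives the same result as the tuple-based version for non-missing values.

```python
import pandas as pd

# Invented stand-ins for the two CSV files.
df1 = pd.DataFrame({"TeffK": [3000, 4000, 5000, 6000],
                    "[Fe/H]": [0.1, -0.2, 0.3, 0.0]})
df2 = pd.DataFrame({"TeffK": [3600, 5200, 5900],
                    "[Fe/H]": [-0.2, 0.5, 0.3]})

# between() is inclusive on both ends, like >= 3500 and <= 5400.
# .copy() makes df3 an independent frame, so adding a column is unambiguous.
df3 = df1.loc[df1["TeffK"].between(3500, 5400)].copy()
df4 = df2.loc[df2["TeffK"].between(3500, 5400)]

# True where the star's [Fe/H] value also appears in the filtered df4.
df3["check"] = df3["[Fe/H]"].isin(df4["[Fe/H]"])
print(df3)
```

With this approach df1 is left untouched; only df3 gains the check column.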
In Pandas I'm trying to filter my DataFrame by date, followed by extracting a reportId string (i.e. 6 digits between dashes) from a longer string; however, when I run the below code I get the warning:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
list_date = [1632309961, 1632310980, 1632311134, 1632411137,
1632411139, 1632411142, 1632411144, 1632411146,
1632413166, 1632413427]
list_id = ['se-84c735-hg5675', 'se-5f73s9-hg3465', 'se-1f34g6-hg3455', 'se-09f67s-hg5123',
'se-5g63g9-hg1235', 'se-47h8h0-hg5555', 'se-h901h3-hg6755', 'se-287n54-hg5321',
'se-g357a8-hg6675', 'se-56q89r-hg5767']
df = pd.DataFrame([list_date, list_id], index=['date_unix','id']).T
def test_extract(df):
    df['date'] = pd.to_datetime(df['date_unix'], unit='s')
    df = df[df['date'] >= pd.to_datetime('2021-09-23')]
    df['reportId'] = df['id'].str.extract(r"([a-zA-Z0-9]{6})")
    return df
test_extract(df)
I've tried a few different fixes like making my date filter using .loc[row_indexer,col_indexer] or throwing .copy() after everything; however, I get the same issue:
def test_extract(df):
    df['date'] = pd.to_datetime(df['date_unix'], unit='s')
    df = df.loc[df['date'] >= pd.to_datetime('2021-09-23'),:]
    df['reportId'] = df['id'].str.extract(r"([a-zA-Z0-9]{6})")
    return df
Strangely, when I run this same code outside of a function I no longer receive the warning. Can anyone provide me with a solution for avoiding this warning while the code is in the function?
Info:
Pandas - 0.23.4 :: Python 3.7.10 ::
OS - Linux (Ubuntu 16.04.7 LTS)
I found a fix to this issue; however I'm still unsure of why this solution works but not others. I simply moved the Pandas regex extraction before the date filter. It makes sense that df['reportId'] is no longer being created from a copy of a slice, but I still don't know why formatting the date filter with .loc didn't solve this. If anyone has insight I welcome your comment.
def test_extract(df):
    df['reportId'] = df['id'].str.extract(r"([a-zA-Z0-9]{6})")
    df['date'] = pd.to_datetime(df['date_unix'], unit='s')
    df = df[df['date'] >= pd.to_datetime('2021-09-23')]
    return df
test_extract(df)
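Another pattern that usually avoids the warning inside a function is to take an explicit copy right after the filter, so the later column assignment targets a frame that pandas knows is independent. The sketch below is an assumption about what would suit this case, not a confirmed explanation of the behavior on pandas 0.23.4. The function name extract_report_ids and the cutoff parameter are made up for illustration, the sample is shortened to three rows, and the regex is tightened to require the surrounding dashes.

```python
import pandas as pd

def extract_report_ids(df, cutoff="2021-09-23"):
    # assign() returns a new frame, so the caller's df is not modified.
    out = df.assign(date=pd.to_datetime(df["date_unix"], unit="s"))
    # Filter, then copy explicitly before adding any new column.
    out = out[out["date"] >= pd.to_datetime(cutoff)].copy()
    # expand=False returns a Series holding the single capture group.
    out["reportId"] = out["id"].str.extract(r"-([a-zA-Z0-9]{6})-", expand=False)
    return out

df = pd.DataFrame({
    "date_unix": [1632309961, 1632411137, 1632413427],
    "id": ["se-84c735-hg5675", "se-09f67s-hg5123", "se-56q89r-hg5767"],
})
result = extract_report_ids(df)
print(result)
```

The first row falls on 2021-09-22 and is filtered out; the remaining two rows keep their six-character report IDs.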
I am a complete Python and Pandas novice. I am following a tutorial, and so far have the following code:
import numpy as np
import pandas as pd
import plotly as pyplot
import datetime
df = pd.read_csv("GlobalLandTemperaturesByCountry.csv")
df = df.drop("AverageTemperatureUncertainty", axis=1)
df = df.rename(columns={"dt": "Date"})
df = df.rename(columns={"AverageTemperature": "AvTemp"})
df = df.dropna()
df_countries = df.groupby(["Country", "Date"]).sum().reset_index().sort_values("Date", ascending=False)
start_date = "2001-01-01"
end_date = "2002-01-01"
mask = (df_countries["Date"] > start_date) & (df_countries["Date"] <= end_date)
df_mask = df_countries.loc(mask)
When I try and run the code, I get an error on the last line, i.e. df_mask = df_countries.loc(mask), the error being:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
I have already found several StackOverflow answers for this error, but none seem to match my scenario enough to help. Why am I getting this error?
In the example above, df_countries is a DataFrame and mask is a boolean Series holding the condition to apply to it.
.loc is an indexer, not an ordinary method, so it has to be used with square brackets. Writing df_countries.loc(mask) calls it with the Series as an argument, and pandas then tries to hash that argument. A Series is mutable: its contents can change without the variable being reassigned, so it has no stable hash value and cannot be hashed. That is why the TypeError is raised.
Try:
df_mask = df_countries.loc[mask]
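As a minimal sketch of the square-bracket form, here is the same masking step on a tiny made-up frame (the rows are invented; the real data comes from GlobalLandTemperaturesByCountry.csv). It assumes the dates are ISO-formatted strings, which compare correctly as text.

```python
import pandas as pd

# Invented rows standing in for the grouped temperature data.
df_countries = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Date": ["2000-12-01", "2001-06-01", "2002-01-01"],
    "AvTemp": [1.5, 12.0, -3.2],
})

start_date = "2001-01-01"
end_date = "2002-01-01"
mask = (df_countries["Date"] > start_date) & (df_countries["Date"] <= end_date)

# Square brackets: .loc selects the rows where mask is True.
df_mask = df_countries.loc[mask]
print(df_mask)
```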
Consider the following example code
import pandas as pd
import numpy as np
pd.set_option('display.expand_frame_repr', False)
foo = pd.read_csv("foo2.csv", skipinitialspace=True, index_col='Index')
foo.loc[:, 'Date'] = pd.to_datetime(foo.Date)
for i in range(0, len(foo)-1):
    if foo.at[i, 'Type'] == 'Reservation':
        for j in range(i+1, len(foo)):
            if foo.at[j, 'Type'] == 'Payout':
                foo.at[j, 'Nights'] = foo.at[i, 'Nights']
                break
mask = (foo['Date'] >= '2018-03-31') & (foo['Date'] <= '2019-03-31')
foo2019 = foo.loc[mask]
foopayouts2019 = foo2019.loc[foo2019['Type'] == 'Payout']
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
with foo2.csv as:
Index,Date,Type,Nights,Amount,Payout
0,03/07/2018,Reservation,2.0,1000.00,
1,03/07/2018,Payout,,,1000.00
2,09/11/2018,Reservation,3.0,1500.00,
3,09/11/2018,Payout,,,1500.00
4,02/16/2019,Reservation,2.0,2000.00,
5,02/16/2019,Payout,,,2000.00
6,04/25/2019,Reservation,7.0,1200.00,
7,04/25/2019,Payout,,,1200.00
This gives the following warning:
/usr/lib/python2.7/dist-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
The warning does not mention a line number, but appears to be coming from the line:
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
At least, if I comment that line out, the warning goes away. So, I have two questions.
1. What is causing that warning? I've been trying to use .loc where appropriate, including in the line where the warning is (possibly) coming from. If the problem is actually earlier, where is it?
2. Which is the better choice, .apply or astype, as used in the following lines of code?
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64)
# foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].astype(np.int64, copy=False)
It seems that both of them work, except for that warning.
I would change a few things in the code:
Instead of the nested loops, check whether the current row is a Reservation and the next row is a Payout by using shift(), then fill in the missing Nights values by combining ffill() with np.where():
foo.Date=pd.to_datetime(foo.Date) #convert to datetime
c=foo.Type.eq('Reservation')&foo.Type.shift(-1).eq('Payout')
foo.Nights=np.where(~c,foo.Nights.ffill(),foo.Nights) #replace if else with np.where
Or:
c=foo.Type.shift().eq('Reservation')&foo.Type.eq('Payout')
foo.Nights=np.where(c,foo.Nights.ffill(),foo.Nights)
Then use series.between() to check if dates fall between 2 dates:
foo2019 = foo[foo.Date.between('2018-03-31','2019-03-31')].copy() #changes
foopayouts2019 = foo2019[foo2019['Type'] == 'Payout'].copy() #changes .copy()
Or directly:
foopayouts2019=foo[foo.Date.between('2018-03-31','2019-03-31')&foo.Type.eq('Payout')].copy()
foopayouts2019.loc[:, 'Nights'] = foopayouts2019['Nights'].apply(np.int64) #.astype(int)
   Index       Date    Type  Nights  Amount  Payout
3      3 2018-09-11  Payout       3     NaN  1500.0
5      5 2019-02-16  Payout       2     NaN  2000.0
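On the second question (.apply versus astype), astype is the vectorized way to convert a column's dtype, so it is usually the more idiomatic choice. Here is a small sketch on invented rows, assuming the Nights values have already been filled in. It assigns with payouts["Nights"] = ... rather than .loc[:, "Nights"] = ..., since in recent pandas versions the .loc form may write into the existing float column and keep its dtype.

```python
import pandas as pd

# Invented payout rows with Nights already filled in as floats.
foo = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-07", "2018-09-11", "2019-02-16", "2019-04-25"]),
    "Type": ["Payout"] * 4,
    "Nights": [2.0, 3.0, 2.0, 7.0],
})

in_range = foo["Date"].between("2018-03-31", "2019-03-31")
# Copy the filtered rows so the assignment below targets an independent frame.
payouts = foo[in_range & foo["Type"].eq("Payout")].copy()

# Replace the whole column so the new integer dtype is kept.
payouts["Nights"] = payouts["Nights"].astype("int64")
print(payouts.dtypes)
```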
I get the warning below when I run my code. I have tried every solution I can think of, but cannot get rid of it. Kindly help!
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
import math
task2_df['price_square'] = None
i = 0
for row in data.iterrows():
    task2_df['price_square'].at[i] = math.pow(task2_df['price'][i],2)
    i += 1
For starters, I don't see your error on Pandas v0.19.2 (tested with code at the bottom of this answer). But that's probably irrelevant to solving your issue. You should avoid iterating rows in Python-level loops. NumPy arrays which are used by Pandas are specifically designed for numerical computations:
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = df['price'].pow(2)
print(df)
price price_square
0 54.74 2996.4676
1 12.34 152.2756
2 35.45 1256.7025
3 51.31 2632.7161
Test on Pandas v0.19.2 with no warnings / errors:
import math
df = pd.DataFrame({'price': [54.74, 12.34, 35.45, 51.31]})
df['price_square'] = None
i = 0
for row in df.iterrows():
    df['price_square'].at[i] = math.pow(df['price'][i],2)
    i += 1
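If a loop really is needed, one option is to avoid the chained form df['price_square'].at[i] = ... and set each value with a single .loc[row, column] call on the DataFrame, which is what the warning text suggests. This is only a sketch on made-up prices; the vectorized pow(2) version is still the preferable one.

```python
import pandas as pd

# Made-up prices for illustration.
df = pd.DataFrame({"price": [2.0, 3.0, 4.0]})
df["price_square"] = 0.0

# One indexing call per assignment: row label and column name together.
for i in df.index:
    df.loc[i, "price_square"] = df.loc[i, "price"] ** 2

print(df)
```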
Here is my code and the warning message. If I make s a standalone Series by using s = pd.Series(np.random.randn(5)), there is no such warning. I am using Python 2.7 on Windows.
It seems that a standalone Series and a Series taken from a column of a DataFrame behave differently? Thanks.
My goal is to change the values of the Series itself, rather than changing a copy.
Source code:
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
#s = pd.Series(np.random.randn(5))
for i in range(len(s)):
    if s.iloc[i] > 0:
        s.iloc[i] = s.iloc[i] + 1
    else:
        s.iloc[i] = s.iloc[i] - 1
Warning message,
C:\Python27\lib\site-packages\pandas\core\indexing.py:132: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Content of 123.csv,
c_a,c_b,c_c,c_d
hello,python,numpy,0.0
hi,python,pandas,1.0
ho,c++,vector,0.0
ho,c++,std,1.0
go,c++,std,0.0
Edit 1: the lambda solution does not seem to work. I printed s before and after, and the values are the same:
import pandas as pd
sample = pd.read_csv('123.csv', header=None, skiprows=1,
dtype={0:str, 1:str, 2:str, 3:float})
sample.columns = pd.Index(data=['c_a', 'c_b', 'c_c', 'c_d'])
sample['c_d'] = sample['c_d'].astype('int64')
s = sample['c_d']
print s
s.apply(lambda x:x+1 if x>0 else x-1)
print s
0 0
1 1
2 0
3 1
4 0
Name: c_d, dtype: int64
Backend TkAgg is interactive backend. Turning interactive mode on.
0 0
1 1
2 0
3 1
4 0
By doing s = sample['c_d'], s refers to a column of the DataFrame sample, so if you change the values of s then your original DataFrame sample may also change. That's why you got the warning.
You can do s = sample['c_d'].copy() instead, so that changing the values of s doesn't change the c_d column of sample.
I suggest you use the apply function instead. Note that apply returns a new Series rather than modifying s in place, so assign the result back:
sample['c_d'] = sample['c_d'].apply(lambda x: x+1 if x>0 else x-1)
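Since the goal is to update the DataFrame column itself, one option is to compute the new values in a single vectorized step and assign them back to sample['c_d']. The sketch below swaps the lambda for np.where, which is my substitution rather than part of the answer above, and it builds the frame inline from the c_a and c_d columns of 123.csv instead of reading the file.

```python
import numpy as np
import pandas as pd

# Two of the columns from 123.csv, typed in directly.
sample = pd.DataFrame({
    "c_a": ["hello", "hi", "ho", "ho", "go"],
    "c_d": [0, 1, 0, 1, 0],
})

# Add 1 where the value is positive, subtract 1 otherwise,
# then write the result back to the column.
sample["c_d"] = np.where(sample["c_d"] > 0, sample["c_d"] + 1, sample["c_d"] - 1)
print(sample)
```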