pandas dataframe .loc copyerror - python

I'm trying to perform a basic operation on a pandas DataFrame and I get this warning:
my_script.py:22: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
df['col_new'] = df['col2'] - df['col1'] + 1
changed it to:
df['col_new'] = df.loc['end'] - df.loc['start'] + 1
For which I get this error:
KeyError: 'the label [end] is not in the [index]'
What am I doing wrong?
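For what it's worth, the KeyError is consistent with how .loc reads a single label: it is looked up among the row labels, not the column names. Below is a minimal sketch, assuming the frame has columns named 'start' and 'end' (the sample values are made up) and that df is not itself a slice of another frame:

```python
import pandas as pd

# Made-up sample data; only the column names 'start' and 'end' come from the question.
df = pd.DataFrame({"start": [1, 5, 10], "end": [4, 9, 12]})

# df.loc['end'] looks for a ROW labelled 'end', which does not exist here.
try:
    df.loc["end"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

# Column labels go in the second position of .loc (plain df['end'] also works).
df["col_new"] = df.loc[:, "end"] - df.loc[:, "start"] + 1
print(df)
```

If the original SettingWithCopyWarning still shows up after this change, the likely cause is that df was created by slicing another DataFrame earlier in the script.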

Related

Returning a copy versus a view warning when using Python pandas dataframe

My goal is to convert the date column in DataFrame df from object type to datetime type, but I keep running into view-versus-copy warnings when running the program.
I found some useful information at this link: https://stackoverflow.com/a/25254087/3849539
I tested the following three solutions. All of them work as expected, but they produce different warning messages. Could anyone explain their differences and point out why I still get the warning about returning a view versus a copy? Thanks.
Solution 1: df['date'] = df['date'].astype('datetime64')
test.py:85: SettingWithCopyWarning: A value is trying to be set on a
copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['date'] = df['date'].astype('datetime64')
Solution 2: df['date'] = pd.to_datetime(df['date'])
~/report/lib/python3.8/site-packages/pandas/core/frame.py:3188:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
test.py:85: SettingWithCopyWarning: A value is
trying to be set on a copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Solution 3: df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])
~/report/lib/python3.8/site-packages/pandas/core/indexing.py:1676:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
Changing how you do the datetime conversion will not fix the SettingWithCopyWarning. You get it because the df you are working with is already a slice of some larger data frame. Pandas is simply warning you that you are modifying the slice and not the full data. Try creating a new column in df instead: you'll still get the warning, but the column will exist in your slice. It won't exist in the original data set.
You can turn off these warnings, if you know what you are doing, by using pd.options.mode.chained_assignment = None # default='warn'
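To make the slice-versus-original point concrete, here is a small sketch. The frame `full` and its values are made up for illustration, and the explicit .copy() is one common way to make the subset independent, not necessarily what the asker's code does:

```python
import pandas as pd

# Hypothetical data standing in for the larger frame that df was sliced from.
full = pd.DataFrame({
    "date": ["2021-01-01", "2021-02-01", "2021-03-01"],
    "value": [1, 2, 3],
})

# Taking an explicit copy makes the subset independent of `full`.
df = full[full["value"] > 1].copy()
df["date"] = pd.to_datetime(df["date"])

# The conversion is visible in the subset only.
subset_is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
original_is_datetime = pd.api.types.is_datetime64_any_dtype(full["date"])
print(subset_is_datetime, original_is_datetime)
```

Either way, the change lands in the subset and not in the larger frame, so if the original also needs the converted column, the conversion has to be applied there as well.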
I got similar warnings recently. After several tries, at least in my case, the problem is not related to your 3 solutions. It might be your 'df'.
If your df was a slice of another pandas df, such as:
df = dfOrigin[slice,:] or
df = dfOrigin[[some columns]] or
df = dfOrigin[one column]
Then, if you assign anything to df, that warning will appear. Try using df = dfOrigin[[some columns]].copy() instead.
Code to reproduce this:
import numpy as np
import pandas as pd
np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list('ABC'))
print("Original dfOrigin")
print(dfOrigin)
# A B C
# 0 4 5 9
# 1 0 6 5
# 2 8 6 6
# 3 6 6 1
df = dfOrigin[['B', 'C']] # Subset that pandas tracks as derived from dfOrigin
df.loc[:,'B'] = df['B'].astype(str) # Triggers SettingWithCopyWarning
df2 = dfOrigin[['B', 'C']].copy() # Independent copy
df2['B'] = df2['B'].astype(str) # OK, no warning

SettingWithCopyWarning when trying to get elements not equal to list

I'm trying to remove everything in a dataframe not equal to elements in a list, but I'm getting the following warning:
C:/Users/jalco/PycharmProjects/project/main.py:119: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = ''
C:/Users/jalco/PycharmProjects/project/main.py:120: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = np.where((df['num'] > 0) &
Here is my code causing the warning:
if not config_dict['admin']:
    df = df[~df['transtype'].isin(transtype['admin'])]
if 'sample' in config_dict['links']:
    df['sample'] = ''
    df['sample'] = np.where((df['num'] > 0) &
                            (df['transtype'] == df['coll']),
                            df['num'], df['sample'])
My question is "is there a better way to drop the rows I don't need or do I just silence the warning manually?"
Thanks
I would add .copy() when actually creating df because that seems to be the root of the problem, and then you can try assigning the column with .loc[]. Also you can save a line of code, by simply using:
df.loc[:,'sample'] = np.where((df['num'] > 0) &
                              (df['transtype'] == df['coll']),
                              df['num'], '')
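Putting the .copy() suggestion together with the filter from the question might look like the sketch below. The data, the `raw` frame and the `admin_types` list are made-up stand-ins for the asker's df, config_dict and transtype['admin']. It keeps the question's two-step form, since mixing an integer column with a bare '' inside np.where can turn the numbers into strings:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's data; only the column names come from the post.
raw = pd.DataFrame({
    "transtype": ["admin_fee", "buy", "sell"],
    "coll": ["admin_fee", "buy", "buy"],
    "num": [3, 5, 0],
})
admin_types = ["admin_fee"]  # hypothetical contents of transtype['admin']

# Filter, then copy, so later assignments target an independent frame.
df = raw[~raw["transtype"].isin(admin_types)].copy()

# Start from an empty-string column, then fill in 'num' where both conditions hold.
df["sample"] = ""
df["sample"] = np.where((df["num"] > 0) & (df["transtype"] == df["coll"]),
                        df["num"], df["sample"])
print(df)
```

Because the filtered frame is copied once, the later column assignments no longer act on something pandas considers a slice of `raw`.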

How to use .loc while creating a new column

df.loc[:,'C'] = df.apply(lambda row: min(row['A'],row['B']) if row['A'] > 0 else max(row['B'],0),axis=1)
I'm creating a new column 'C' in the DataFrame df. I'm getting the SettingWithCopyWarning in spite of using .loc. How do I fix it?
/opt/python/python35/lib/python3.5/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
/opt/python/python35/lib/python3.5/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Link to docs loc
df.loc[:,'C']=df.apply(lambda row: min(row['A'],row['B']) if row['A'] > 0 else max(row['B'],0),axis=1)
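Since .loc alone does not silence the warning when df is itself a slice, one possible approach is to copy df where it is created and then assign the column. The sketch below assumes df was produced by filtering another frame (called `source` here, a made-up name with made-up values) and also swaps the row-wise apply for a vectorized np.where, which is an optional change and not part of the question:

```python
import numpy as np
import pandas as pd

# Made-up values; only the column names 'A', 'B' and 'C' come from the question.
source = pd.DataFrame({"A": [3, -1, 2, 0], "B": [5, 4, -7, -2]})

# If df comes from filtering another frame, copy it at that point.
df = source[source["B"] != 0].copy()

# Same rule as the lambda: min(A, B) when A > 0, otherwise max(B, 0).
df["C"] = np.where(df["A"] > 0,
                   np.minimum(df["A"], df["B"]),
                   np.maximum(df["B"], 0))

# The original row-wise version, kept for comparison.
expected = df.apply(
    lambda row: min(row["A"], row["B"]) if row["A"] > 0 else max(row["B"], 0),
    axis=1,
)
print(df)
```

The apply version from the question works just as well once df is an independent copy; the vectorized form is only a convenience for larger frames.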

Dataframe Warning : SettingWithCopyWarning in python [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
Processing file from
http://portal.amfiindia.com/spages/NAV0.txt
to get output as follows:
31012017,1,1,135765,12,10.8536000,
31012017,1,1,135762,12,10.8543000,
31012017,1,1,135760,12,10.6599000,
31012017,1,1,135759,12,10.6554000,
31012017,1,1,135763,12,10.8536000,
..
..
..
I have tried the code below, but I get the warning shown after it.
CODE:
import pandas
import numpy as np
#Sample file NAV0.txt can be downloaded from: http://portal.amfiindia.com/spages/NAV0.txt
#create a DataFrame with selected columns
df=pandas.read_table('NAV0.txt',sep=';',usecols=['Date','Scheme Code','Net Asset Value'])
#keep only the rows whose 'Scheme Code' is all digits, dropping the text rows
fil_df=df[df['Scheme Code'].apply(lambda x : str(x).isdigit())]
#convert column 'Net Asset Value' to numeric and format each value with 7 decimal places
fil_df['Net Asset Value']=pandas.to_numeric(fil_df['Net Asset Value'],errors='coerce')
fil_df['Net Asset Value']=fil_df['Net Asset Value'].map(lambda x: '%2.7f' % x)
#format the Date column as DDMMYYYY
fil_df['Date']=pandas.to_datetime(fil_df['Date']).dt.strftime('%d%m%Y')
#adding extra column in dataframe
fil_df['ser1']=1
fil_df['ser2']=1
fil_df['period']=12
fil_df['lcol']=''
fil_df=fil_df[['Date','ser1','ser2','Scheme Code','period','Net Asset Value','lcol']]
#Converting datafile to csv
fil_df.to_csv('NAV_1.csv',index=False,header=None)
fil_df.dtypes
ERROR:
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:12:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:13:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:17:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:20:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:21:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:22:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:23:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
The CSV file is generated as expected, but how can I get rid of this warning?
I have tried using
fil_df.loc[ pandas.to_numeric(fil_df['Net Asset Value'],errors='coerce').map(lambda x: '%2.7f' % x)]
but it didn't help.
Any help would be appreciated.
I think you need to add .copy():
fil_df=df[df['Scheme Code'].apply(lambda x : str(x).isdigit())].copy()
If you modify values in fil_df later, you will find that the modifications do not propagate back to the original data (df), and that is what pandas is warning about.
If you know what your code is doing, you can use
pd.options.mode.chained_assignment = None # default='warn'
in your code to disable this warning.
You'll get to the heart of the matter of adding new columns to a DataFrame in the 2017 edit to this answer. Basically, the route is to use .assign(newCol=enumerableValues).
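As a rough illustration of the .assign route applied to this question: the rows below are made up and only mimic the shape of the NAV file, and the Date handling is left out. Column names containing spaces cannot be written as keyword arguments, so they are passed through a dict:

```python
import pandas as pd

# Made-up rows shaped like the NAV file; the real file has more rows and columns.
df = pd.DataFrame({
    "Scheme Code": ["135765", "Open Ended Schemes", "135762"],
    "Net Asset Value": ["10.8536", "N.A.", "10.8543"],
})

is_code = df["Scheme Code"].apply(lambda x: str(x).isdigit())

# .assign returns a NEW frame, so nothing is set on a slice of df.
fil_df = df[is_code].assign(
    **{
        "Net Asset Value": lambda d: pd.to_numeric(
            d["Net Asset Value"], errors="coerce"
        ).map(lambda x: "%2.7f" % x)
    },
    ser1=1,
    ser2=1,
    period=12,
    lcol="",
)
print(fil_df)
```

The callable passed for 'Net Asset Value' receives the filtered frame, so the conversion only ever sees the rows that passed the digit check.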

Concatenate columns in Pandas

How do you concatenate columns and add brackets?
Using Jupyter I have tried the following:
df['xxx (yyy)'] = df['xxx'] + ' (' + df['yyy'] + ')'
This results in:
|XXXYYY| not |XXX (YYY)
Is there an escape character required?
Also is this warning applicable?
C:\Users\xxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexing.py:461: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Since you want to create an additional column, you should use df['xxx (yyy)'] instead of df.loc['xxx (yyy)'], which creates an additional row.
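A small sketch of the difference, using made-up values with the question's placeholder column names 'xxx' and 'yyy'. It assumes both columns hold strings; numeric columns would need .astype(str) first:

```python
import pandas as pd

# Made-up values; 'xxx' and 'yyy' are the question's placeholder column names.
df = pd.DataFrame({"xxx": ["alpha", "beta"], "yyy": ["one", "two"]})

# Plain string concatenation; the parentheses need no escaping.
df["xxx (yyy)"] = df["xxx"] + " (" + df["yyy"] + ")"

# For contrast: a single label passed to .loc is treated as a ROW label,
# so assigning through it enlarges the frame by one row.
with_extra_row = df.copy()
with_extra_row.loc["xxx (yyy)"] = "?"

print(df)
print(with_extra_row)
```

The first frame gains a column, while the second gains a row whose index label happens to be 'xxx (yyy)'.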
