SettingWithCopyWarning when trying to get elements not equal to list - python

I'm trying to remove everything in a dataframe not equal to elements in a list, but I'm getting the following warning:
C:/Users/jalco/PycharmProjects/project/main.py:119: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = ''
C:/Users/jalco/PycharmProjects/project/main.py:120: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = np.where((df['num'] > 0) &
Here is my code causing the warning:
if not config_dict['admin']:
    df = df[~df['transtype'].isin(transtype['admin'])]
if 'sample' in config_dict['links']:
    df['sample'] = ''
    df['sample'] = np.where((df['num'] > 0) &
                            (df['transtype'] == df['coll']),
                            df['num'], df['sample'])
My question is "is there a better way to drop the rows I don't need or do I just silence the warning manually?"
Thanks

I would add .copy() when actually creating df, because that seems to be the root of the problem; then you can assign the column with .loc[]. You can also save a line of code by passing '' directly as the fallback value:
df.loc[:, 'sample'] = np.where((df['num'] > 0) &
                               (df['transtype'] == df['coll']),
                               df['num'], '')
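To make the suggestion concrete, here is a minimal sketch of the filter-then-copy pattern. The `raw` frame, its values, and the `admin_types` list are made-up stand-ins for the question's `df`, `config_dict` and `transtype`; only the `.copy()` call and the single `np.where` assignment come from the answer above.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's data and configuration.
raw = pd.DataFrame({
    "transtype": ["buy", "admin_fee", "sell", "buy"],
    "coll": ["buy", "admin_fee", "buy", "buy"],
    "num": [3, 5, 2, 0],
})
admin_types = ["admin_fee"]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    # .copy() makes the filtered frame independent of `raw`.
    df = raw[~raw["transtype"].isin(admin_types)].copy()

    # Build the column in one step; '' is used where the condition fails.
    # NumPy picks one common dtype for both branches, so the numbers
    # come back as strings here.
    matches = (df["num"] > 0) & (df["transtype"] == df["coll"])
    df["sample"] = np.where(matches, df["num"], "")

# Collect any SettingWithCopyWarning by name (none is expected).
copy_warnings = [w for w in caught
                 if w.category.__name__ == "SettingWithCopyWarning"]
print(df)
print("SettingWithCopyWarning count:", len(copy_warnings))
```

Because `df` is an explicit copy, the new column lives only in `df`; `raw` is left untouched.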

Related

Returning a copy versus a view warning when using Python pandas dataframe

My goal is to convert the date column in dataframe df from object type to datetime type, but I keep running into view-versus-copy warnings when running the program.
I found some useful information here: https://stackoverflow.com/a/25254087/3849539
I tested the following three solutions. All of them work as expected, but each gives different warning messages. Could anyone explain their differences and point out why I still get the warning about returning a view versus a copy? Thanks.
Solution 1: df['date'] = df['date'].astype('datetime64')
test.py:85: SettingWithCopyWarning: A value is trying to be set on a
copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['date'] = df['date'].astype('datetime64')
Solution 2: df['date'] = pd.to_datetime(df['date'])
~/report/lib/python3.8/site-packages/pandas/core/frame.py:3188:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self[k1] = value[k2]
test.py:85: SettingWithCopyWarning: A value is
trying to be set on a copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Solution 3: df.loc[:, 'date'] = pd.to_datetime(df.loc[:, 'date'])
~/report/lib/python3.8/site-packages/pandas/core/indexing.py:1676:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value
instead
See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
Changing how you do the datetime conversion will not fix the SettingWithCopyWarning. You get it because the df you are working with is already a slice of some larger data frame. Pandas is simply warning you that you are working with the slice and not the full data. If you create a new column in df instead, you'll still get the warning, but the column will exist in your slice and not in the original data set.
You can turn off these warnings if you know what you are doing by using pd.options.mode.chained_assignment = None # default='warn'
I got similar warnings recently. After several tries, at least in my case, the problem is not related to your 3 solutions. It might be your 'df'.
If your df was a slice of another pandas df, such as:
df = dfOrigin[slice,:] or
df = dfOrigin[[some columns]] or
df = dfOrigin[one column]
Then any assignment to df will trigger that warning. Try using df = dfOrigin[[some columns]].copy() instead.
Code to reproduce this:
import numpy as np
import pandas as pd
np.random.seed(2021)
dfOrigin = pd.DataFrame(np.random.choice(10, (4, 3)), columns=list('ABC'))
print("Original dfOrigin")
print(dfOrigin)
# A B C
# 0 4 5 9
# 1 0 6 5
# 2 8 6 6
# 3 6 6 1
df = dfOrigin[['B', 'C']]  # Selection from dfOrigin; pandas tracks it as a possible copy
df.loc[:, 'B'] = df['B'].astype(str)  # Raises SettingWithCopyWarning
df2 = dfOrigin[['B', 'C']].copy()  # Explicit, independent copy
df2['B'] = df2['B'].astype(str)  # OK
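Applied to the date conversion from the question, the same idea might look like the sketch below. The `df_origin` frame and its values are invented for illustration, since the question does not show where `df` comes from; `pd.to_datetime` is used as in Solution 2.

```python
import pandas as pd

# Hypothetical parent frame; in the question this would be the larger
# DataFrame that df was sliced from.
df_origin = pd.DataFrame({
    "date": ["2021-01-05", "2021-02-10", "2021-03-15"],
    "value": [1, 2, 3],
    "other": ["a", "b", "c"],
})

# Take the columns of interest and copy them so df owns its data.
df = df_origin[["date", "value"]].copy()

# Convert the column on the independent copy.
df["date"] = pd.to_datetime(df["date"])

print(df.dtypes)
print(df_origin["date"].iloc[0])  # the parent frame still holds the string
```

The conversion only affects `df`; the `date` column in `df_origin` keeps its original strings.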

How to use .loc while creating a new column

df.loc[:,'C'] = df.apply(lambda row: min(row['A'],row['B']) if row['A'] > 0 else max(row['B'],0),axis=1)
I'm creating a new column 'C' in the dataframe df. I'm getting the slicing warning in spite of using the .loc function. How do I fix it?
/opt/python/python35/lib/python3.5/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
/opt/python/python35/lib/python3.5/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Link to docs loc
df.loc[:,'C']=df.apply(lambda row: min(row['A'],row['B']) if row['A'] > 0 else max(row['B'],0),axis=1)
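As the other answers on this page suggest, the warning usually points at how df was created rather than at the line that assigns 'C'. The sketch below assumes df was filtered out of a larger frame (the `parent` frame and the filter are hypothetical, since the question does not show them), copies at that point, and also shows a vectorized alternative to the row-wise lambda using `np.where`, `np.minimum` and `np.maximum`.

```python
import numpy as np
import pandas as pd

# Hypothetical parent frame and filter; the question does not show
# how df was built.
parent = pd.DataFrame({"A": [1, -2, 5, 0, 7], "B": [3, 4, 2, -1, 9]})

# Copy at the point where df is derived from the parent.
df = parent[parent["B"] < 9].copy()

# Row-wise version, as in the question.
df["C"] = df.apply(
    lambda row: min(row["A"], row["B"]) if row["A"] > 0 else max(row["B"], 0),
    axis=1,
)

# Vectorized equivalent: min(A, B) where A > 0, otherwise max(B, 0).
df["C_vec"] = np.where(
    df["A"] > 0,
    np.minimum(df["A"], df["B"]),
    np.maximum(df["B"], 0),
)

print(df)
```

Both columns hold the same values; the vectorized form avoids calling a Python function once per row.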

Dataframe Warning : SettingWithCopyWarning in python [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
Processing file from
http://portal.amfiindia.com/spages/NAV0.txt
to get output as follows:
31012017,1,1,135765,12,10.8536000,
31012017,1,1,135762,12,10.8543000,
31012017,1,1,135760,12,10.6599000,
31012017,1,1,135759,12,10.6554000,
31012017,1,1,135763,12,10.8536000,
..
..
..
I have tried using below code but getting below warning.
CODE:
import pandas
import numpy as np
#Sample file for NAV0.txt can be downloaded from url: http://portal.amfiindia.com/spages/NAV0.txt
#creating pandas with selected columns
df=pandas.read_table('NAV0.txt',sep=';',usecols=['Date','Scheme Code','Net Asset Value'])
#keeping only rows whose 'Scheme Code' is all digits, to drop the text rows
fil_df=df[df['Scheme Code'].apply(lambda x : str(x).isdigit())]
#converting column 'Net Asset Value' to numeric and formatting each value with 7 decimal places
fil_df['Net Asset Value']=pandas.to_numeric(fil_df['Net Asset Value'],errors='coerce')
fil_df['Net Asset Value']=fil_df['Net Asset Value'].map(lambda x: '%2.7f' % x)
#Formatting Date column as DDMMYYYY
fil_df['Date']=pandas.to_datetime(fil_df['Date']).dt.strftime('%d%m%Y')
#adding extra column in dataframe
fil_df['ser1']=1
fil_df['ser2']=1
fil_df['period']=12
fil_df['lcol']=''
fil_df=fil_df[['Date','ser1','ser2','Scheme Code','period','Net Asset Value','lcol']]
#Converting datafile to csv
fil_df.to_csv('NAV_1.csv',index=False,header=None)
fil_df.dtypes
ERROR:
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:12:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:13:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:17:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:20:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:21:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:22:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
c:\users\administrator\appdata\local\programs\python\python35-32\lib\site-packages\ipykernel__main__.py:23:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
Csv file is getting generated as expected but how can I overcome this warning?
I have tried using
fil_df.loc[ pandas.to_numeric(fil_df['Net Asset Value'],errors='coerce').map(lambda x: '%2.7f' % x]
but it didn't help.
Help would be appreciated.
I think you need to add .copy():
fil_df=df[df['Scheme Code'].apply(lambda x : str(x).isdigit())].copy()
If you modify values in fil_df later, you will find that the modifications do not propagate back to the original data (df); that is what pandas is warning about.
If you know what your code is doing, you can use
pandas.options.mode.chained_assignment = None # default='warn'
in your code to disable this warning.
The heart of the matter, when adding new columns to a DataFrame, is covered in a 2017 edit to this answer. Basically, the route is to use .assign(newCol=enumerableValues), which returns a new DataFrame with the column added.
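A rough sketch of the `.assign` route for this question follows. The three-row frame is a made-up stand-in for NAV0.txt (its values and date format are assumptions), and the dict unpacking is only needed because two column names contain spaces or must be overwritten by name.

```python
import pandas as pd

# Hypothetical sample standing in for the contents of NAV0.txt.
df = pd.DataFrame({
    "Scheme Code": ["135765", "Open Ended Schemes", "135762"],
    "Net Asset Value": ["10.8536", "N.A.", "10.8543"],
    "Date": ["2017-01-31", "2017-01-31", "2017-01-31"],
})

# Same digit filter as in the question.
is_code = df["Scheme Code"].apply(lambda x: str(x).isdigit())

# assign() returns a new DataFrame, so nothing is set on a slice of df.
fil_df = df[is_code].assign(
    ser1=1,
    ser2=1,
    period=12,
    lcol="",
    **{
        "Net Asset Value": lambda d: pd.to_numeric(
            d["Net Asset Value"], errors="coerce"
        ).map("{:.7f}".format),
        "Date": lambda d: pd.to_datetime(d["Date"]).dt.strftime("%d%m%Y"),
    },
)

# Put the columns in the order the question writes to CSV.
fil_df = fil_df[["Date", "ser1", "ser2", "Scheme Code",
                 "period", "Net Asset Value", "lcol"]]
print(fil_df)
```

Each callable passed to `assign` receives the frame being built, so the conversions run on the filtered rows only and `df` itself is not modified.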

Concatenate columns in Pandas

How do you concatenate columns and add brackets?
Using Jupyter I have tried the following:
df['xxx (yyy)'] = df['xxx'] + ' (' + df['yyy'] + ')'
this results:
|XXXYYY| not |XXX (YYY)
Is there an escape character required?
Also is this warning applicable?
C:\Users\xxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexing.py:461: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Since you want to create an additional column, you should use df['xxx (yyy)'] instead of df.loc['xxx (yyy)'], which creates an additional row.
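For reference, a small sketch of the column-based concatenation. The data is invented, and the `astype(str)` calls are an assumption added here in case one of the columns is not already a string column.

```python
import pandas as pd

# Hypothetical data; the question does not show the columns' contents.
df = pd.DataFrame({"xxx": ["alpha", "beta"], "yyy": [1, 2]})

# No escape character is needed for the brackets. astype(str) guards
# against non-string columns, which cannot be added to a string directly.
df["xxx (yyy)"] = df["xxx"].astype(str) + " (" + df["yyy"].astype(str) + ")"

print(df["xxx (yyy)"].tolist())  # ['alpha (1)', 'beta (2)']
```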

A value is trying to be set on a copy of a slice from a DataFrame-warning even after using .loc

I am getting a warning "
C:\Python27\lib\site-packages\pandas\core\indexing.py:411: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s"
even though I am using df.loc, as the documentation suggests:
import nltk
def sentenceInReview(df):
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print("size of df: " + str(df.size))
    df.loc[:, 'review_text'] = df.review_text.map(lambda x: tokenizer.tokenize(x))
    print(df[:3])
I ran into this problem earlier today. It is related to the way Python passes 'object references' around between functions, when assigning variables, etc.
Unlike in, say, R, assigning an existing dataframe to a new variable in Python doesn't make a copy, so any operation on the 'new' dataframe still refers to the original underlying data.
The way to get around this is to make a deep copy (see docs) whenever you're trying to return a copy of something. See:
import copy
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, columns=['num'])
dfh = df.head(3) # This assignment doesn't actually make an independent copy
dfh.loc[:,'num'] = dfh['num'].apply(lambda x: x + 1)
# This triggers the warning
# Use the deepcopy function from the standard-library module 'copy'
df_copy = copy.deepcopy(df.head(3))
df_copy.loc[:,'num'] = df_copy['num'].apply(lambda x: x + 1)
# Making a deep copy breaks the reference to the original df. Hence, no more warnings.
Here's a bit more on this topic that might explain the way Python does it better.
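As a side note, pandas has its own `DataFrame.copy()` method, which defaults to a deep copy and may be more convenient than `copy.deepcopy`. A minimal sketch, reusing the same toy data:

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3, 4, 5]})

# DataFrame.copy() defaults to deep=True, so df_copy owns its data.
df_copy = df.head(3).copy()
df_copy.loc[:, "num"] = df_copy["num"] + 1

print(df_copy["num"].tolist())  # [2, 3, 4]
print(df["num"].tolist())       # [1, 2, 3, 4, 5], the original is unchanged
```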
A common reason for the warning message "A value is trying to be set on a copy of a slice from a DataFrame" is a slice over another slice!
For example:
dfA=dfB[['x','y','z']]
dfC=dfA[['x','z']]
"""
For the above code, you may get such a message since dfC is a slice of dfA while dfA is a slice of dfB. In other words, dfC is a slice over another slice, dfA, and both are linked to dfB. In that situation, it does not help whether you use .copy() or deepcopy or other similar ways :-(
"""
Solution:
dfA=dfB[['x','y','z']]
dfC=dfB[['x','z']]
Hopefully the above explanation helps :-)
Try inserting the values using pd.Series(data,index= index_list)
