I have two columns newlabels and newlabels_tobeReplaced . If newlabels contains the word 'trifurcation' in its sentence, newlabels_tobeReplaced should be replaced to 'trifurcation'
I have the following code
df_new.loc[df_new.newlabels.str.contains('trifurcation'),'newlabels_tobeReplaced'] = 'trifurcation'
But , I get this error:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Any idea, how to get the correct result.
The problem is that the newlabels has got values like : "Waveforms suggest trifucation disease with mild distal ischemia of left lower extremity at rest at level of ankle."
You can get around that warning by reassigning to df_new with the copy produced from assign
df_new = df_new.assign(
newlabels_tobeReplaced=
lambda d: d['newlabels_tobeReplaced'].mask(
d.newlabels.str.contains('trifurcation'), 'trifurcation'
)
)
Related
I'm trying to remove everything in a dataframe not equal to elements in a list, but I'm getting the following warning:
C:/Users/jalco/PycharmProjects/project/main.py:119: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df[sample'] = ''
C:/Users/jalco/PycharmProjects/project/main.py:120: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = np.where((df['num'] > 0) &
Here is my code causing the warning:
if not config_dict['admin']:
df = df[~df['transtype'].isin(transtype['admin'])]
if 'sample' in config_dict['links']:
df['sample'] = ''
df['sample'] = np.where((df['num'] > 0) &
(df['transtype'] == df['coll']),
df['num'], df['sample'])
My question is "is there a better way to drop the rows I don't need or do I just silence the warning manually?"
Thanks
I would add .copy() when actually creating df because that seems to be the root of the problem, and then you can try assigning the column with .loc[]. Also you can save a line of code, by simply using:
df.loc[:,'sample'] = np.where((df['num'] > 0) &
(df['transtype'] == df['coll']),
df['num'], ''])
I have a certain data to clean, it's some keys where the keys have six leading zeros that I want to get rid of, and if the keys are not ending with "ABC" or it's not ending with "DEFG", then I need to clean the currency code in the last 3 indexes. If the key doesn't start with leading zeros, then just return the key as it is.
To achieve this I wrote a function that deals with string as below:
def cleanAttainKey(dirtyAttainKey):
if dirtyAttainKey[0] != "0":
return dirtyAttainKey
else:
dirtyAttainKey = dirtyAttainKey.strip("0")
if dirtyAttainKey[-3:] != "ABC" and dirtyAttainKey[-3:] != "DEFG":
dirtyAttainKey = dirtyAttainKey[:-3]
cleanAttainKey = dirtyAttainKey
return cleanAttainKey
Now I build a dummy data frame to test it but it's reporting errors:
data frame
df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
columns=["dirtyKey","amount"])
I need to get a new column called "cleanAttainKey" in the df, then modify each value in the "dirtyKey" using the "cleanAttainKey" function, then assign the cleaned key to the new column "cleanAttainKey", however it seems pandas doesn't support this type of modification.
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
df['cleanAttainKey'][i] = cleanAttainKey(vpAttainKeyList[i])
I am getting the below error message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The result should be the same as the df2 below:
df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
columns=["dirtyKey","cleanAttainKey","amount"])
df2
Is there any better way to modify the dirty keys and get a new column with the clean keys in Pandas?
Thanks
Here is the culprit:
df['cleanAttainKey'][i] = cleanAttainKey(vpAttainKeyList[i])
When you use extract of the dataframe, Pandas reserves the ability to choose to make a copy or a view. It does not matter if you are just reading the data, but it means that you should never modify it.
The idiomatic way is to use loc (or iloc or [i]at):
df.loc[i, 'cleanAttainKey'] = cleanAttainKey(vpAttainKeyList[i])
(above assumes a natural range index...)
I would like to have query return a view so that I can modify fields without generating this error.
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have a multiline (long) query q that I abbreviated here:
df1 = pd.Dataframe(...)
q = "tcpstream==1 and ipsrc=='10.0.0.1' and sport==5201"
df = df1.query(q)
df['dest'] = "toto" # <--- this generates the warning/error
Apparently I could do a df1.update(df) but it seems like a waste, I am looking for something more efficient.
pd.DataFrame.query is designed for querying, not setting values. If you want to use string queries as masks, you can calculate the index and feed into pd.DataFrame.loc:
df.loc[df.query(q).index, 'dest'] = 'toto'
How do you concatenate columns and add brackets?
Using Jupyter I have tried the following:
df['xxx (yyy)'] = df['xxx'] + ' (' + df['yyy'] + ')'
this results:
|XXXYYY| not |XXX (YYY)
Is there an escape character required?
Also is this warning applicable?
C:\Users\xxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexing.py:461: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Since you want to create an additional column, you should use df['xxx (yyy)'] instead of df.loc['xxx (yyy)'], which creates an additional row.
I am getting a warning "
C:\Python27\lib\site-packages\pandas\core\indexing.py:411: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s"
Although as suggested in document I am using df.loc ?
def sentenceInReview(df):
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
print "size of df: " + str(df.size)
df.loc[: ,'review_text'] = df.review_text.map(lambda x: tokenizer.tokenize(x))
print df[:3]
I ran into this problem earlier today, this problem is related to the way Python passes 'object references' around between functions/assigning variables etc.
Unlike in say, R, in python assigning an existing dataframe to a new variable doesn't make a copy, so any operations on the 'new' dataframe is still a reference to the original underlying data.
The way to get around this is to make a deep copy (see docs) whenever you're trying to return a copy of something. See:
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, columns = {'num'})
dfh = df.head(3) # This assignment doesn't actually make a copy
dfh.loc[:,'num'] = dfh['num'].apply(lambda x: x + 1)
# This will throw you the error
# Use deepcopy function provided in the default package 'copy'
import copy
df_copy = copy.deepcopy(df.head(3))
df_copy.loc[:,'num'] = df_copy['num'].apply(lambda x: x + 1)
# Making a deep copy breaks the reference to the original df. Hence, no more errors.
Here's a bit more on this topic that might explain the way Python does it better.
The common reason for the warning message "A value is trying to be set on a copy of a slice from a DataFrame": A slice over another slice!
For example:
dfA=dfB['x','y','z']
dfC=dfA['x','z']
"""
For the above codes, you may get such a message since dfC is a slice of dfA while dfA is a slice of dfB. Aka, dfC is a slice over another slice dfA and both are linked to dfB. Under such situation, it does not work whether you use .copy() or deepcopy or other similar ways:-(
"""
Solution:
dfA=dfB['x','y','z']
dfC=dfB['x','z']
Hopefully the above explanation helps:-)
Try inserting the values using pd.Series(data,index= index_list)