Replace a column in pandas if another column contains a string


I have two columns, newlabels and newlabels_tobeReplaced. If newlabels contains the word 'trifurcation' anywhere in its sentence, newlabels_tobeReplaced should be set to 'trifurcation'.
I have the following code:
df_new.loc[df_new.newlabels.str.contains('trifurcation'),'newlabels_tobeReplaced'] = 'trifurcation'
But I get this warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Any idea how to get the correct result?
The problem is that newlabels has values like: "Waveforms suggest trifucation disease with mild distal ischemia of left lower extremity at rest at level of ankle."

You can get around that warning by reassigning df_new to the copy produced by assign:
df_new = df_new.assign(
    newlabels_tobeReplaced=lambda d: d['newlabels_tobeReplaced'].mask(
        d.newlabels.str.contains('trifurcation'), 'trifurcation'
    )
)
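If you would rather keep the original .loc assignment, a common cause of this warning is that df_new was itself created by slicing another DataFrame. Here is a minimal sketch under that assumption; the sample data and the filter that produces df_new are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "newlabels": [
        "Waveforms suggest trifurcation disease at rest.",
        "Normal study.",
        None,
    ],
    "newlabels_tobeReplaced": ["a", "b", "c"],
})

# Assumed origin of df_new: a filtered slice of df. The explicit .copy()
# makes it an independent frame, so the .loc assignment below no longer
# targets a slice of df.
df_new = df[df["newlabels_tobeReplaced"] != "zzz"].copy()

# na=False treats missing text as "no match", so the mask is purely boolean.
mask = df_new["newlabels"].str.contains("trifurcation", na=False)
df_new.loc[mask, "newlabels_tobeReplaced"] = "trifurcation"
```

Note that str.contains only matches the exact substring, so a misspelled value in the data would not be replaced.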

Related

SettingWithCopyWarning when trying to get elements not equal to list

I'm trying to remove everything in a dataframe not equal to elements in a list, but I'm getting the following warning:
C:/Users/jalco/PycharmProjects/project/main.py:119: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = ''
C:/Users/jalco/PycharmProjects/project/main.py:120: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['sample'] = np.where((df['num'] > 0) &
Here is my code causing the warning:
if not config_dict['admin']:
    df = df[~df['transtype'].isin(transtype['admin'])]
if 'sample' in config_dict['links']:
    df['sample'] = ''
    df['sample'] = np.where((df['num'] > 0) &
                            (df['transtype'] == df['coll']),
                            df['num'], df['sample'])
My question is "is there a better way to drop the rows I don't need or do I just silence the warning manually?"
Thanks

I would add .copy() at the point where df is actually created, because that seems to be the root of the problem, and then assign the column with .loc[]. You can also save a line of code by simply using:
df.loc[:, 'sample'] = np.where((df['num'] > 0) &
                               (df['transtype'] == df['coll']),
                               df['num'], '')
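To make the .copy() suggestion concrete, here is a small sketch with made-up data. The list admin_types is a hypothetical stand-in for transtype['admin'], and the sketch swaps np.where for Series.where so that the kept numbers stay numeric rather than being converted to strings:

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "transtype": ["a", "b", "admin1", "c"],
    "coll": ["a", "x", "admin1", "c"],
    "num": [5, 3, 7, 0],
})
admin_types = ["admin1"]  # hypothetical stand-in for transtype['admin']

# Filter the rows and take an explicit copy, so later column
# assignments operate on an independent DataFrame.
df = df[~df["transtype"].isin(admin_types)].copy()

# Keep num where the condition holds, use '' everywhere else.
cond = (df["num"] > 0) & (df["transtype"] == df["coll"])
df["sample"] = df["num"].where(cond, "")
```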

Python Pandas dataframe modify column value based on function that cleans string value and assign to new column

I have some data to clean. It contains keys with six leading zeros that I want to get rid of, and if a key does not end with "ABC" or "DEFG", I also need to remove the currency code in its last 3 characters. If a key doesn't start with leading zeros, it should be returned as it is.
To achieve this I wrote a function that deals with string as below:
def cleanAttainKey(dirtyAttainKey):
    if dirtyAttainKey[0] != "0":
        return dirtyAttainKey
    else:
        # lstrip removes only the leading zeros; strip would also remove trailing ones
        dirtyAttainKey = dirtyAttainKey.lstrip("0")
    # a 3-character slice can never equal "DEFG", so compare suffixes with endswith
    if not dirtyAttainKey.endswith(("ABC", "DEFG")):
        dirtyAttainKey = dirtyAttainKey[:-3]
    cleanAttainKey = dirtyAttainKey
    return cleanAttainKey
Now I build a dummy data frame to test it, but it's reporting errors:
df = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102]},
                  columns=["dirtyKey","amount"])
I need a new column called "cleanAttainKey" in the df: each value in "dirtyKey" should be cleaned with the "cleanAttainKey" function and the result assigned to the new column. However, it seems pandas doesn't support this type of modification.
# add a new column in df called cleanAttainKey
df['cleanAttainKey'] = ""
# I want to clean the keys and get into the new column of cleanAttainKey
dirtyAttainKeyList = df['dirtyKey'].tolist()
for i in range(len(df['cleanAttainKey'])):
    df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])
I am getting the below error message:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
The result should be the same as the df2 below:
df2 = pd.DataFrame({'dirtyKey':["00000012345ABC","0000012345DEFG","0000023456DEFGUSD"],'amount':[100,101,102],
'cleanAttainKey':["12345ABC","12345DEFG","23456DEFG"]},
columns=["dirtyKey","cleanAttainKey","amount"])
df2
Is there any better way to modify the dirty keys and get a new column with the clean keys in Pandas?
Thanks

Here is the culprit:
df['cleanAttainKey'][i] = cleanAttainKey(dirtyAttainKeyList[i])
When you take an extract of a DataFrame, as df['cleanAttainKey'] does here, pandas reserves the right to return either a copy or a view. That does not matter if you are only reading the data, but it means you should never assign through it.
The idiomatic way is to use loc (or iloc or [i]at):
df.loc[i, 'cleanAttainKey'] = cleanAttainKey(dirtyAttainKeyList[i])
(The above assumes a natural range index.)
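The loop can also be avoided altogether by applying the cleaning function to the whole column, which creates the new column in a single assignment. This is only a sketch: clean_attain_key is a hypothetical rewrite of the question's function that uses lstrip and endswith, not the asker's exact code.

```python
import pandas as pd


def clean_attain_key(key):
    """Hypothetical rewrite of the question's cleaning function."""
    # Keys without a leading zero are returned unchanged.
    if not key.startswith("0"):
        return key
    # Remove only the leading zeros.
    key = key.lstrip("0")
    # Drop the trailing 3-character currency code unless the key
    # already ends with one of the accepted suffixes.
    if not key.endswith(("ABC", "DEFG")):
        key = key[:-3]
    return key


df = pd.DataFrame(
    {
        "dirtyKey": ["00000012345ABC", "0000012345DEFG", "0000023456DEFGUSD"],
        "amount": [100, 101, 102],
    },
    columns=["dirtyKey", "amount"],
)

# One assignment creates the column; no element-wise chained indexing.
df["cleanAttainKey"] = df["dirtyKey"].apply(clean_attain_key)
```

Because the whole column is assigned at once, there is no chained indexing and therefore nothing for pandas to warn about.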

update a dataframe at indices returned by query

I would like to have query return a view so that I can modify fields without generating this error.
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I have a multiline (long) query q that I abbreviated here:
df1 = pd.DataFrame(...)
q = "tcpstream==1 and ipsrc=='10.0.0.1' and sport==5201"
df = df1.query(q)
df['dest'] = "toto" # <--- this generates the warning/error
Apparently I could do a df1.update(df), but that seems like a waste. I am looking for something more efficient.

pd.DataFrame.query is designed for querying, not for setting values. If you want to use string queries as masks, you can compute the index of the matching rows and feed it into pd.DataFrame.loc on the original frame (df1 in the question):
df1.loc[df1.query(q).index, 'dest'] = 'toto'
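A self-contained sketch of that pattern, with made-up rows standing in for the question's data. It assumes the index labels of df1 are unique, since the labels returned by query are used to select rows in loc:

```python
import pandas as pd

# Hypothetical rows; the column names and the query come from the question.
df1 = pd.DataFrame({
    "tcpstream": [1, 1, 2],
    "ipsrc": ["10.0.0.1", "10.0.0.2", "10.0.0.1"],
    "sport": [5201, 5201, 5201],
    "dest": ["", "", ""],
})
q = "tcpstream==1 and ipsrc=='10.0.0.1' and sport==5201"

# query() is used only to find the matching row labels ...
matching = df1.query(q).index
# ... and the assignment goes through .loc on the original frame.
df1.loc[matching, "dest"] = "toto"
```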

Concatenate columns in Pandas

How do you concatenate columns and add brackets?
Using Jupyter I have tried the following:
df['xxx (yyy)'] = df['xxx'] + ' (' + df['yyy'] + ')'
This results in:
|XXXYYY| not |XXX (YYY)
Is an escape character required?
Also, is this warning applicable?
C:\Users\xxxxxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexing.py:461: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s

Since you want to create an additional column, you should use df['xxx (yyy)'] instead of df.loc['xxx (yyy)'], which creates an additional row.
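For illustration, a minimal sketch of the column assignment with made-up values. No escape character is needed for the brackets. The astype(str) call is an assumption for the case where one of the columns is not already text:

```python
import pandas as pd

# Hypothetical values; the column names come from the question.
df = pd.DataFrame({"xxx": ["A", "B"], "yyy": [1, 2]})

# Plain string concatenation; yyy is converted to text first.
df["xxx (yyy)"] = df["xxx"] + " (" + df["yyy"].astype(str) + ")"
```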

A value is trying to be set on a copy of a slice from a DataFrame-warning even after using .loc

I am getting a warning "
C:\Python27\lib\site-packages\pandas\core\indexing.py:411: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s"
Why do I get it even though I am using df.loc, as the documentation suggests?
def sentenceInReview(df):
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    print "size of df: " + str(df.size)
    df.loc[:, 'review_text'] = df.review_text.map(lambda x: tokenizer.tokenize(x))
    print df[:3]

I ran into this problem earlier today. It is related to the way Python passes object references around when calling functions, assigning variables and so on.
Unlike in, say, R, assigning an existing dataframe to a new variable in Python doesn't make a copy, so any operation on the 'new' dataframe still refers to the original underlying data.
The way to get around this is to make a deep copy (see docs) whenever you're trying to return a copy of something. See:
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data, columns=['num'])
dfh = df.head(3) # This assignment doesn't actually make a copy
dfh.loc[:,'num'] = dfh['num'].apply(lambda x: x + 1)
# This is the line that triggers the warning
# Use deepcopy function provided in the default package 'copy'
import copy
df_copy = copy.deepcopy(df.head(3))
df_copy.loc[:,'num'] = df_copy['num'].apply(lambda x: x + 1)
# Making a deep copy breaks the reference to the original df. Hence, no more errors.
Here's a bit more on this topic that might explain the way Python does it better.
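As a hedged alternative to copy.deepcopy, pandas has its own DataFrame.copy method, which makes a deep copy by default. A minimal sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({"num": [1, 2, 3, 4, 5]})

# DataFrame.copy() returns an independent copy of the first three rows.
dfh = df.head(3).copy()

# Modifying the copy leaves the original frame untouched.
dfh["num"] = dfh["num"] + 1
```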

The common reason for the warning message "A value is trying to be set on a copy of a slice from a DataFrame": A slice over another slice!
For example:
dfA = dfB[['x', 'y', 'z']]
dfC = dfA[['x', 'z']]
"""
With the code above you may get this message, since dfC is a slice of dfA while dfA is a slice of dfB. In other words, dfC is a slice over another slice, dfA, and both are linked to dfB. In that situation it does not help whether you use .copy(), deepcopy or other similar approaches :-(
"""
Solution:
dfA = dfB[['x', 'y', 'z']]
dfC = dfB[['x', 'z']]
Hopefully the above explanation helps:-)

Try inserting the values using pd.Series(data,index= index_list)
