How to replace symbols from object in dataframe - python

I'm trying to replace symbols from object in python, I used
df_summary.replace('\(|\)!,"-', '', regex=True)
but it didn't change anything.

The replace method does not operate in place by default. Your dataframe is left unchanged, and the result is returned as the return value of replace.
You can use the inplace parameter of replace:
df_summary.replace('\(|\)!,"-', '', regex=True, inplace=True)
Most pandas methods are not in place; to keep the result you need either the inplace argument (where the method offers one) or to assign the result to a dataframe.
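A minimal sketch of the behaviour described above, assuming a small made-up dataframe (the "label" column and its values are hypothetical, not the asker's data). It shows that the original is untouched unless the result is kept:

```python
import pandas as pd

# Hypothetical sample data standing in for df_summary
df_summary = pd.DataFrame({"label": ["(a)", "b-c"]})

# Without inplace=True, replace returns a new DataFrame
cleaned = df_summary.replace(r"[()-]", "", regex=True)

original_values = df_summary["label"].tolist()  # still ['(a)', 'b-c']
cleaned_values = cleaned["label"].tolist()      # ['a', 'bc']
print(original_values, cleaned_values)
```

Assigning the result back (df_summary = cleaned) has the same effect as inplace=True.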

You can either do
df_summary.replace('\(|\)!,"-', '', regex=True, inplace=True)
or
df_summary = df_summary.replace('\(|\)!,"-', '', regex=True)
If you only call df_summary.replace(...), the call returns a new DataFrame, and you forgot to save it. Leave a comment if you need further help.

Apart from adding inplace=True or assigning the result back to df_summary, note the pattern you are using:
`\(|\)!,"-`
It matches either a literal ( OR the literal string )!,"-
Since you are referring to symbols and want to remove each of the separate characters ( ) ! , " - you can use a repeated character class [()!,"-]+ to replace multiple consecutive matches at once.
df_summary.replace('[()!,"-]+', '', regex=True, inplace=True)
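The difference between the two patterns can be checked with the standard re module alone. This is only a sketch; the sample text is made up for illustration:

```python
import re

text = 'f(x)! "ok", a-b'  # hypothetical sample string

broken = r'\(|\)!,"-'   # "(" OR the literal sequence )!,"-
fixed = r'[()!,"-]+'    # one or more of the listed characters

broken_result = re.sub(broken, '', text)
fixed_result = re.sub(fixed, '', text)

print(broken_result)  # only the "(" is removed
print(fixed_result)   # every listed symbol is removed
```

The hyphen is placed last inside the character class so that it is read as a literal character rather than as a range.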

Related

How can I replace substring from string by a list in a column dataframe?

I need to replace substrings in a column value in dataframe
Example: I have this column 'code' in a dataframe (in reality, the dataframe is very large)
3816R(motor) #I need '3816R'
97224(Eletro)
502812(Defletor)
97252(Defletor)
97525(Eletro)
5725 ( 56)
And I have this list to replace the values:
list = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']
I've tried a lot of methods, like:
df['code'] = df['code'].str.replace(list, '')
I also tried regex=True, but none of the methods removed the substrings.
How can I do that?
You can try a regex replace combined with regex alternation (the "or" operator, |): https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/
l = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']
l = [s.replace('(', r'\(').replace(')', r'\)') for s in l]  # raw strings avoid invalid-escape warnings
regex_str = f"({'|'.join(l)})"
df['code'] = df['code'].str.replace(regex_str, '', regex=True)
The regex_str will end up with something like
"(\(motor\)|\(Eletro\)|\(Defletor\)|\( 56\))"
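As an alternative to escaping the parentheses by hand, the pattern can be built with re.escape. This is a sketch rather than the answerer's method; the variable names and the three sample codes are chosen for illustration:

```python
import re

substrings = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']

# re.escape makes every substring match literally; "|" joins them as alternatives
pattern = '|'.join(re.escape(s) for s in substrings)

codes = ['3816R(motor)', '97224(Eletro)', '5725 ( 56)']
cleaned = [re.sub(pattern, '', code).strip() for code in codes]
print(cleaned)  # ['3816R', '97224', '5725']
```

The same pattern string can be passed to df['code'].str.replace(pattern, '', regex=True).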
If you are certain any and all rows follow the format provided, you could attempt the following by using a lambda function:
df['code_clean'] = df['code'].apply(lambda x: x.split('(')[0])
You can try the regular expression match method:
https://docs.python.org/3/library/re.html#re.Pattern.match
df['code'] = df['code'].apply(lambda x: re.match(r'^(\w+)\(\w+\)',x).group(1))
The first part of the regular expression, ^(\w+), creates a capturing group of the word characters (letters, digits, underscore) at the start of the string, before the opening parenthesis. group(1) then extracts that text.
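Note that the pattern above requires "(" directly after the code, so a value such as "5725 ( 56)" does not match and re.match returns None, which makes .group(1) raise an AttributeError. A more defensive sketch follows; leading_code is a hypothetical helper name, not part of any library:

```python
import re

def leading_code(value):
    """Return the leading run of word characters, or the value unchanged if there is none."""
    match = re.match(r'^(\w+)', value)
    return match.group(1) if match else value

results = [leading_code(v) for v in ['3816R(motor)', '5725 ( 56)', '(x)']]
print(results)  # ['3816R', '5725', '(x)']
```

With pandas this could be applied as df['code'].apply(leading_code).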
str.replace works with a single string, not a list of strings. You could loop through the list instead:
rmlist = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']
for repl in rmlist:
    # regex=False treats the parentheses literally (older pandas versions defaulted to regex=True)
    df['code'] = df['code'].str.replace(repl, '', regex=False)
Alternatively, if the bracketed substring is always at the end, split at "(" and discard the extra column that is generated. This will likely be faster:
df["code"]=df["code"].str.split(pat="(",n=1,expand=True)[0]
str.split is reasonably fast
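A small sketch of the split approach on made-up rows taken from the question's format. It uses the .str[0] accessor instead of expand=True and adds a strip for values with a space before the parenthesis; both are choices made here, not part of the answer above:

```python
import pandas as pd

# Hypothetical sample in the question's format
df = pd.DataFrame({"code": ["3816R(motor)", "97224(Eletro)", "5725 ( 56)"]})

# Split once at the first "(", keep the left part, trim trailing whitespace
df["code"] = df["code"].str.split("(", n=1).str[0].str.strip()

codes = df["code"].tolist()
print(codes)  # ['3816R', '97224', '5725']
```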

How to replace a character with \ in excel file [duplicate]

I have a column in my dataframe like this:
range
"(2,30)"
"(50,290)"
"(400,1000)"
...
and I want to replace the , comma with - dash. I'm currently using this method but nothing is changed.
org_info_exc['range'].replace(',', '-', inplace=True)
Can anybody help?
Use the vectorised str method replace:
df['range'] = df['range'].str.replace(',','-')
df
range
0 (2-30)
1 (50-290)
EDIT: so if we look at what you tried and why it didn't work:
df['range'].replace(',','-',inplace=True)
from the docs we see this description:
str or regex: str: string exactly matching to_replace will be replaced
with value
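Without regex=True, Series.replace only replaces values that match to_replace exactly, i.e. the whole cell. None of your values is exactly ',', so no replacement occurs. Compare with the following: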
df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)
df['range']
0 (2,30)
1 -
Name: range, dtype: object
here we get an exact match on the second row and the replacement occurs.
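The three variants discussed in this thread can be compared side by side. A minimal sketch on the same two sample values, with variable names chosen for illustration:

```python
import pandas as pd

s = pd.Series(["(2,30)", ","])

# Series.replace without regex: only whole-cell matches are replaced
exact = s.replace(",", "-").tolist()

# Series.str.replace: substring replacement inside each string
substring = s.str.replace(",", "-", regex=False).tolist()

# Series.replace with regex=True: also replaces inside each string
with_regex = s.replace(",", "-", regex=True).tolist()

print(exact)       # ['(2,30)', '-']
print(substring)   # ['(2-30)', '-']
print(with_regex)  # ['(2-30)', '-']
```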
For anyone else arriving here from Google search on how to do a string replacement on all columns (for example, if one has multiple columns like the OP's 'range' column):
Pandas has a built in replace method available on a dataframe object.
df.replace(',', '-', regex=True)
Source: Docs
If you only need to replace characters in one specific column and both regex=True and inplace=True failed for you, this approach should work:
data["column_name"] = data["column_name"].apply(lambda x: x.replace("characters_need_to_replace", "new_characters"))
Here the lambda is an anonymous function that apply calls once per entry, much like the body of a for loop.
x represents each entry in the current column in turn.
The only things you need to change are "column_name", "characters_need_to_replace" and "new_characters".
Replace all spaces with underscores in the column names:
data.columns= data.columns.str.replace(' ','_',regex=True)
In addition, for those looking to replace more than one character in a column, you can do it using regular expressions:
import re
chars_to_remove = ['.', '-', '(', ')']
# Build a character class such as [\.\-\(\)]; re.escape keeps every symbol literal
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'
df['string_col'] = df['string_col'].str.replace(regular_expression, '', regex=True)
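Similar to the answer by Nancy K, but note that x.str is only available when x is a whole column (a Series). That is the case when apply is called on a DataFrame selection (double brackets), not on a single column:
data[["column_name"]] = data[["column_name"]].apply(lambda x: x.str.replace("characters_need_to_replace", "new_characters"))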
If you want to remove two or more characters from a string, for example '$' and ',':
Column_Name
===========
$100,000
$1,100,000
... then use:
data.Column_Name.str.replace("[$,]", "", regex=True)
=> [ "100000", "1100000" ] (the values are still strings)

Replacing a word in a psycopg2.sql.Composed object

I have a rather complex psycopg2.sql.Composed object in which I simply need to replace a word for backward-compatibility reasons.
Before using such an object, I had an f-string on which this snippet worked like a charm:
if v4:
    sql_update_query = re.sub(
        'word_to_replace,',
        'new_replacement_word',
        sql_update_query
    )
I naively tried to do the same on the psycopg2.sql.Composed object:
if v4:
    sql_update_query = re.sub(
        'word_to_replace,',
        'new_replacement_word',
        sql_update_query.as_string(conn)  # conversion to a string for re.sub() to work
    )
This works, but then how do I get back to a true psycopg2.sql.Composed object?
Never mind: I noticed that the replacement only affected column identifiers.
Therefore, I extracted them into a list and did the replacement within the list itself, e.g.:
columns_list = [
    re.sub(
        '^word_to_replace$',
        'new_replacement_word',
        col
    ) for col in columns
]
Please note that I had a comma at the end of the word to replace in the original post. It was a trick, because some columns start with the same name followed by some _suffixes. Therefore, I had to anchor the pattern, at least with the dollar sign in the first argument of re.sub(), so that only this particular word is replaced and all suffixed versions are left untouched.
Then I added the columns as sql.Identifier:
sql.SQL(', ').join(map(sql.Identifier, columns_list)),
and voila.
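The effect of the ^ and $ anchors can be checked without a database connection. A minimal sketch, where the contents of columns are made up for illustration:

```python
import re

# Hypothetical column names; only the first one should be renamed
columns = ["word_to_replace", "word_to_replace_suffix", "other_column"]

columns_list = [
    re.sub(r"^word_to_replace$", "new_replacement_word", col)
    for col in columns
]
print(columns_list)
# ['new_replacement_word', 'word_to_replace_suffix', 'other_column']
```

The resulting list is what then gets wrapped with sql.Identifier and joined, as in the line above.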

Remove a bullet character in a Pandas column

Consider:
users_df['LASTNAME_TEST'] = users_df['LASTNAME'].replace(u'•','')
for item in users_df['LASTNAME_TEST']:
    if u'•' in item:
        print('yay')
I'm trying to remove the special bullet character in this column and using replace. This is still returning yay in the result. What am I missing?
Per SeaBean's comment, add regex=True to the call to .replace():
users_df['LASTNAME_TEST'] = users_df['LASTNAME'].replace(u'•', '', regex=True)
                                                                   ^^^^^^^^^^

How to slice all of the elements of pandas dataframe at once?

I have the following data stored in my Pandas dataframe:
Factor SimTime RealTime SimStatus
0 Factor[0.48] SimTime[83.01] RealTime[166.95] Paused[F]
1 Factor[0.48] SimTime[83.11] RealTime[167.15] Paused[F]
2 Factor[0.49] SimTime[83.21] RealTime[167.36] Paused[F]
3 Factor[0.48] SimTime[83.31] RealTime[167.57] Paused[F]
I want to create a new dataframe with only everything within [].
I am attempting to use the following code:
df = dataframe.apply(lambda x: x.str.slice(start=x.str.find('[')+1, stop=x.str.find(']')))
However, all I see in df is NaN. Why? What's going on? What should I do to achieve the desired behavior?
You can use regex to replace the contents.
df.replace(r'\w+\[([\S]+)\]', r'\1', regex=True)
Edit
The replace method of a pandas DataFrame replaces the values given in to_replace with value.
The target (to_replace) can be a regular expression, and the replacement (value) can refer to its captured groups, such as \1. For that you need to pass regex=True to replace.
https://regex101.com/r/7KCs6q/1
See the link above for a detailed explanation of the regular expression.
In short, the target is any run of word characters followed by square brackets containing non-whitespace characters, and the replacement is the non-whitespace content captured inside the brackets.
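The pattern can be tried on individual cell values with the standard re module. This is only a sketch; the cells list reuses a few values from the question:

```python
import re

pattern = r'\w+\[([\S]+)\]'  # word characters, then [non-whitespace content]

cells = ['Factor[0.48]', 'SimTime[83.01]', 'Paused[F]']

# \1 refers to the text captured inside the square brackets
extracted = [re.sub(pattern, r'\1', cell) for cell in cells]
print(extracted)  # ['0.48', '83.01', 'F']
```

df.replace(pattern, r'\1', regex=True) applies the same substitution to every cell of the dataframe.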
