I have a column in my dataframe like this:
range
"(2,30)"
"(50,290)"
"(400,1000)"
...
and I want to replace the comma , with a dash -. I'm currently using this method, but nothing changes.
org_info_exc['range'].replace(',', '-', inplace=True)
Can anybody help?
Use the vectorised str method replace:
df['range'] = df['range'].str.replace(',','-')
df
range
0 (2-30)
1 (50-290)
EDIT: so if we look at what you tried and why it didn't work:
df['range'].replace(',','-',inplace=True)
from the docs we see this description:
str or regex: str: string exactly matching to_replace will be replaced
with value
So because the str values do not match, no replacement occurs, compare with the following:
df = pd.DataFrame({'range':['(2,30)',',']})
df['range'].replace(',','-', inplace=True)
df['range']
0 (2,30)
1 -
Name: range, dtype: object
here we get an exact match on the second row and the replacement occurs.
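For completeness: if you would rather stay with Series.replace, passing regex=True makes it do substring replacement instead of requiring an exact cell match (a minimal sketch using data like the OP's):

```python
import pandas as pd

df = pd.DataFrame({'range': ['(2,30)', '(50,290)', '(400,1000)']})

# regex=True treats ',' as a pattern, so it matches inside each string
df['range'] = df['range'].replace(',', '-', regex=True)
print(df['range'].tolist())  # ['(2-30)', '(50-290)', '(400-1000)']
```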
For anyone else arriving here from Google search on how to do a string replacement on all columns (for example, if one has multiple columns like the OP's 'range' column):
Pandas has a built-in replace method available on a dataframe object.
df.replace(',', '-', regex=True)
Source: Docs
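A quick sketch of that in action (the column names here are made up for illustration):

```python
import pandas as pd

# hypothetical frame with commas scattered across several string columns
df = pd.DataFrame({'a': ['1,2', '3,4'], 'b': ['x,y', 'z']})

# regex=True applies the substring replacement to every cell at once
cleaned = df.replace(',', '-', regex=True)
print(cleaned['a'].tolist())  # ['1-2', '3-4']
print(cleaned['b'].tolist())  # ['x-y', 'z']
```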
If you only need to replace characters in one specific column, and somehow both regex=True and inplace=True failed for you, this way will work:
data["column_name"] = data["column_name"].apply(lambda x: x.replace("characters_need_to_replace", "new_characters"))
The lambda here acts like a small function applied in a loop: x stands for each entry in the column in turn.
The only thing you need to do is change "column_name", "characters_need_to_replace" and "new_characters".
Replace all spaces with underscores in the column names:
data.columns = data.columns.str.replace(' ', '_', regex=True)
In addition, for those looking to replace more than one character in a column, you can do it using regular expressions:
import re
chars_to_remove = ['.', '-', '(', ')']
regular_expression = '[' + re.escape(''.join(chars_to_remove)) + ']'
df['string_col'].str.replace(regular_expression, '', regex=True)
Almost similar to the answer by Nancy K, this works for me (inside a Series apply, x is a plain Python string, so the .str accessor is not needed):
data["column_name"] = data["column_name"].apply(lambda x: x.replace("characters_need_to_replace", "new_characters"))
If you want to remove two or more elements from a string, example the characters '$' and ',' :
Column_Name
===========
$100,000
$1,100,000
... then use:
data.Column_Name.str.replace("[$,]", "", regex=True)
=> ['100000', '1100000'] (note: the values are still strings)
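Since str.replace returns strings, a follow-up conversion with pd.to_numeric finishes the job when numbers are the goal (a sketch of that step):

```python
import pandas as pd

data = pd.DataFrame({'Column_Name': ['$100,000', '$1,100,000']})

# strip the currency symbol and thousands separators...
stripped = data.Column_Name.str.replace("[$,]", "", regex=True)
# ...then convert the remaining digit strings to integers
data['Column_Name'] = pd.to_numeric(stripped)
print(data['Column_Name'].tolist())  # [100000, 1100000]
```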
Related
I need to replace substrings in a column value in dataframe
Example: I have this column 'code' in a dataframe (in really, the dataframe is very large)
3816R(motor) #I need '3816R'
97224(Eletro)
502812(Defletor)
97252(Defletor)
97525(Eletro)
5725 ( 56)
And I have this list to replace the values:
list = ['(motor)', '(Eletro)', '(Defletor)', '(Eletro)', '( 56)']
I've tried a lot of methods, like:
df['code'] = df['code'].str.replace(list, '')
and regex=True, but no method worked to remove the substrings.
How can I do that?
You can try str.replace with a regex OR (alternation) condition: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
https://www.ocpsoft.org/tutorials/regular-expressions/or-in-regex/
l = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']
l = [s.replace('(', r'\(').replace(')', r'\)') for s in l]
regex_str = f"({'|'.join(l)})"
df['code'] = df['code'].str.replace(regex_str, '', regex=True)
The regex_str will end up with something like
"(\(motor\)|\(Eletro\)|\(Defletor\)|\( 56\))"
If you are certain any and all rows follow the format provided, you could attempt the following by using a lambda function:
df['code_clean'] = df['code'].apply(lambda x: x.split('(')[0])
You can try the regular expression match method:
https://docs.python.org/3/library/re.html#re.Pattern.match
import re
df['code'] = df['code'].apply(lambda x: re.match(r'^(\w+)\s*\(.*\)', x).group(1))
The first part of the regular expression, ^(\w+), creates a capturing group of any letters or numbers before encountering optional whitespace and an opening parenthesis. The group(1) call then extracts that text.
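One caveat: re.match returns None for rows that do not fit the pattern at all, and .group(1) then raises AttributeError. A guarded sketch that also tolerates whitespace before the parenthesis (as in the '5725 ( 56)' row):

```python
import re
import pandas as pd

df = pd.DataFrame({'code': ['3816R(motor)', '97224(Eletro)', '5725 ( 56)']})

def clean(x):
    # allow optional whitespace before '('; fall back to the original on no match
    m = re.match(r'^(\w+)\s*\(', x)
    return m.group(1) if m else x

df['code'] = df['code'].apply(clean)
print(df['code'].tolist())  # ['3816R', '97224', '5725']
```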
str.replace works with one string, not a list of strings, but you can loop through the list:
rmlist = ['(motor)', '(Eletro)', '(Defletor)', '( 56)']
for repl in rmlist:
    df['code'] = df['code'].str.replace(repl, '', regex=False)
Alternatively, if your bracketed substring is at the end, split at "("
and discard the additional column generated; this will be faster for sure:
df["code"] = df["code"].str.split(pat="(", n=1, expand=True)[0]
str.split is reasonably fast
I'm trying to replace symbols in an object column in Python. I used
df_summary.replace('\(|\)!,"-', '', regex=True)
but it didn't change anything.
The replace function does not operate in place by default. This means that your dataframe is left unchanged, and the result is returned as the return value of the replace function.
You can use the inplace parameter of replace:
df_summary.replace(r'\(|\)!,"-', '', regex=True, inplace=True)
Most pandas functions are not in-place and require either the inplace argument or assignment of the result to a new dataframe.
You can either do
df_summary.replace(r'\(|\)!,"-', '', regex=True, inplace=True)
or
df_summary = df_summary.replace(r'\(|\)!,"-', '', regex=True)
When you only do df_summary.replace(...), the call returns a new DataFrame; you forgot to save it.
Apart from adding inplace=True or assigning the result back to df_summary, you are using a pattern:
`\(|\)!,"-`
That matches either ( OR the literal string )!,"-
As you are referring to symbols, and you want to replace all separate chars ( ) ! , " - you can use a repeated character class [()!,"-]+ to replace multiple consecutive matches at once.
df_summary.replace('[()!,"-]+', '', regex=True, inplace=True)
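A quick sketch of that pattern in action (the sample strings are invented for illustration):

```python
import pandas as pd

df_summary = pd.DataFrame({'col': ['(a)!,"-b', 'plain']})

# each run of the listed symbols is removed in a single pass
out = df_summary.replace('[()!,"-]+', '', regex=True)
print(out['col'].tolist())  # ['ab', 'plain']
```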
I have a sample dataframe below.
df = pd.DataFrame({'col1': ['The IO operation at logical block address 0x0 for Disk1 (PDO name: \\Device00024', 'fddasfsa'], 'col2': [1, 2]})
I like to replace the characters between 'Device' and ')' to 'xxxxxx'. Is it possible to do such replacement in pandas?
I thought I can do the following. The code ran with no issue but the replacement never happen.
df['col1'] = df['col1'].replace(r'\\Device(.*)', 'xxxxxx,regex=True)
You could use str.replace here:
df["col1"] = df["col1"].str.replace(r'\bDevice\d+', 'Devicexxxxxx', regex=True)
The code sample you gave above won't even compile, but it actually looks on the right track. You made the same mistake I initially made here. You need to include Device in the replacement, not just xxxxxx, as your regex match will consume the device string along with the numbers.
Just replace the digits immediately to the right of Device. Code below:
df['col1'].str.replace(r'(?<=Device)\d+', 'xxxxx', regex=True)
Another solution, if you want to have the same number of x as digits:
df["col1"] = df["col1"].str.replace(
r"(?<=Device)(\d+)", lambda g: "x" * len(g.group(1)), regex=True
)
print(df)
Prints:
col1 col2
0 adbsdfklj (\Devicexxxxxx) 1
1 fddasfsa 2
I have a pandas dataframe that consists of strings. I would like to remove the n-th character from the end of the strings. I have the following code:
DF = pandas.DataFrame({'col': ['stri0ng']})
DF['col'] = DF['col'].str.replace('(.)..$','')
Instead of removing the third to the last character (0 in this case), it removes 0ng. The result should be string but it outputs stri. Where am I wrong?
You may rather want to replace a single character followed by n-1 characters at the end of the string (here n=3, so two characters remain after the one you drop):
DF['col'] = DF['col'].str.replace('.(?=.{2}$)', '', regex=True)
col
0 string
If you want to make sure you're only removing digits (so that 'string' in one special row doesn't get changed to 'strng'), then use something like '[0-9](?=.{2}$)' as pattern.
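A sketch of that digit-only variant, showing that a plain word is left alone:

```python
import pandas as pd

DF = pd.DataFrame({'col': ['stri0ng', 'string']})

# only remove a digit sitting in the third-to-last position
DF['col'] = DF['col'].str.replace('[0-9](?=.{2}$)', '', regex=True)
print(DF['col'].tolist())  # ['string', 'string']
```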
Another way using pd.Series.str.slice_replace:
df['col'].str.slice_replace(4,5,'')
Output:
0 string
Name: col, dtype: object
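Since the question counts from the end, negative positions also work with slice_replace, which avoids hard-coding the string length (a sketch):

```python
import pandas as pd

DF = pd.DataFrame({'col': ['stri0ng', 'another0ng']})

# drop the third-to-last character, whatever the string length
DF['col'] = DF['col'].str.slice_replace(-3, -2, '')
print(DF['col'].tolist())  # ['string', 'anotherng']
```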
I have a DataFrame of 3 columns. 2 of the columns I wish to manipulate with are Dog_Summary and Dog_Description. These columns are strings and I wish to remove any punctuation they may have.
I have tried the following:
df[['Dog_Summary', 'Dog_Description']] = df[['Dog_Summary', 'Dog_Description']].apply(lambda x: x.str.translate(None, string.punctuation))
For the above I get an error saying:
ValueError: ('deletechars is not a valid argument for str.translate in python 3. You should simply specify character deletions in the table argument', 'occurred at index Summary')
The second way I tried was:
df[['Dog_Summary', 'Dog_Description']] = df[['Dog_Summary', 'Dog_Description']].apply(lambda x: x.replace(string.punctuation, ' '))
However, it still does not work!
Can anyone give me suggestions or advice?
Thanks! :)
I wish to remove any punctuation it may have.
You can use a regular expression and string.punctuation for this:
>>> import pandas as pd
>>> from string import punctuation
>>> s = pd.Series(['abcd$*%&efg', ' xyz#)$(#rst'])
>>> s.str.replace(rf'[{punctuation}]', '', regex=True)
0 abcdefg
1 xyzrst
dtype: object
The first argument to .str.replace() can be a regular expression. In this case, you can use f-strings and a character class to catch any of the punctuation characters:
>>> rf'[{punctuation}]'
'[!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~]' # ' and \ are escaped
If you want to apply this to a DataFrame, just follow what you're doing now:
df.loc[:, cols] = df[cols].apply(lambda s: s.str.replace(rf'[{punctuation}]', '', regex=True))
Alternatively, you could use s.replace(rf'[{punctuation}]', '', regex=True) (no .str accessor).
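The original str.translate attempt can also be repaired for Python 3: str.translate now takes a mapping table, and str.maketrans builds a deletion table from its third argument (a sketch, with made-up sample values):

```python
import string

import pandas as pd

df = pd.DataFrame({'Dog_Summary': ['abcd$*%&efg'],
                   'Dog_Description': [' xyz#)$(#rst']})

# maketrans('', '', chars) maps every char in chars to None (i.e. deletes it)
table = str.maketrans('', '', string.punctuation)
cols = ['Dog_Summary', 'Dog_Description']
df[cols] = df[cols].apply(lambda s: s.str.translate(table))
print(df['Dog_Summary'].tolist())      # ['abcdefg']
print(df['Dog_Description'].tolist())  # [' xyzrst']
```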