I have a DataFrame 'df' with a string column, and I am trying to remove a list of specific substrings from that column.
For example, if the value in column 'number' is onE1, I want it changed to 1; if the value is FOur4, I want it changed to 4.
I used the following code:
for i in ['onE','TwO','ThRee', 'FOur']:
    print(i)
    df['new_number'] = df['number'].str.replace(i,'')
Although print(i) shows that i goes through every string in the list, only 'FOur' is removed in the column 'new_number'. The other strings ('onE', 'TwO', 'ThRee') are still there: onE1 is still onE1, while FOur4 has become 4.
So what is wrong with this piece of code?
To keep only the digits from each string in the DataFrame column, you can use this:
df['new_number'] = df['number'].apply(lambda s: ''.join(ch for ch in s if ch.isdigit()))
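As a runnable sketch of keeping only the digits, here are two ways to do it side by side. The sample frame is made up from the values mentioned in the question, and the column names digits_join and digits_regex are hypothetical.

```python
import pandas as pd

# Hypothetical sample data built from the values mentioned in the question.
df = pd.DataFrame({"number": ["onE1", "TwO2", "ThRee3", "FOur4"]})

# Option 1: keep only the characters for which str.isdigit() is true.
df["digits_join"] = df["number"].apply(
    lambda s: "".join(ch for ch in s if ch.isdigit())
)

# Option 2: delete every non-digit character with a regex (\D means "not a digit").
df["digits_regex"] = df["number"].str.replace(r"\D", "", regex=True)

print(df)
```

Both columns hold strings, so a further .astype(int) would be needed if numeric values are wanted.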
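I found a similar post to this question:
pandas replace (erase) different characters from strings
We can use a regex to solve this issue (regex=True is needed in pandas 2.0 and later, where str.replace matches literally by default):
df['new_number'] = df['number'].str.replace('onE|TwO|ThRee|FOur', '', regex=True)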
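The loop in the question fails because every iteration starts again from df['number'] and overwrites 'new_number', so only the last replacement ('FOur') survives. A minimal sketch of a repaired loop next to the regex version, using made-up sample data and a hypothetical via_regex column:

```python
import pandas as pd

# Hypothetical sample data built from the values mentioned in the question.
df = pd.DataFrame({"number": ["onE1", "TwO2", "ThRee3", "FOur4"]})
words = ["onE", "TwO", "ThRee", "FOur"]

# Copy the source column once, outside the loop.
df["new_number"] = df["number"]
for word in words:
    # Read from and write to 'new_number' so earlier removals are kept.
    df["new_number"] = df["new_number"].str.replace(word, "", regex=False)

# Equivalent single pass: join the words into one regex alternation.
pattern = "|".join(words)
df["via_regex"] = df["number"].str.replace(pattern, "", regex=True)

print(df)
```

If the words could contain regex metacharacters, passing each one through re.escape before joining would keep the pattern literal.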
Related
I carefully read the post Select by partial string from a pandas DataFrame, but I think it doesn't address my problem. I need to keep the rows of a dataframe whose value can be found within a given string.
For example, my table is:
Part_Number
A1127
A1347
I want to filter records if column value is within the string ZA1127B.48. The filtered dataframe should contain row 1. (All the posts show how to check if row value contains a string.)
You can use .apply with the in operator:
s = "ZA1127B.48"
print(df[df.apply(lambda x: x.Part_Number in s, axis=1)])
Prints:
Part_Number
0 A1127
I think using the apply function will help you to do what you want.
Try this line:
df[df["Part_Number"].apply(lambda x: x in "ZA1127B.48")]
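For reference, a self-contained sketch of the same filter, rebuilding the two-row table from the question. The names mask and filtered are only illustrative.

```python
import pandas as pd

# The table from the question.
df = pd.DataFrame({"Part_Number": ["A1127", "A1347"]})
s = "ZA1127B.48"

# True for rows whose Part_Number occurs as a substring of s.
mask = df["Part_Number"].apply(lambda part: part in s)

# Boolean indexing keeps only the matching rows.
filtered = df[mask]
print(filtered)
```

Note the direction of the test: part in s asks whether the cell value is inside the long string, which is the reverse of what str.contains checks.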
I have a df with one column (SKUID) from which I want to remove all the characters that are not numerical. Here is a sample of the column:
Essentially, I want to remove the underscore and the letter from each row. I have tried using the following code:
sku_data.split('_', 1)[0]
This gives me the error 'DataFrame' object has no attribute 'split'. Where am I going wrong?
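This should do for number extraction:
sku_data.SKUID = sku_data.SKUID.str.extract(r'(\d+)', expand=False)
Note: don't forget the .str accessor when you want to perform string operations on a DataFrame column. split is a string method, so it has to be called through .str on the column rather than on the DataFrame itself.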
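A small sketch of both routes through the .str accessor. The SKUID values below are hypothetical, since the original sample was not preserved; they only assume the "digits, underscore, letter" shape described in the question.

```python
import pandas as pd

# Hypothetical SKUID values in the "digits_letter" shape described above.
sku_data = pd.DataFrame({"SKUID": ["12345_A", "678_B", "90123_C"]})

# Route 1: capture the first run of digits; expand=False returns a Series.
digits = sku_data["SKUID"].str.extract(r"(\d+)", expand=False)

# Route 2: the split the question attempted, applied per element via .str.
before_underscore = sku_data["SKUID"].str.split("_", n=1).str[0]

sku_data["SKUID"] = digits
print(sku_data)
```

The two routes agree only when the digits come first and are followed by an underscore. The regex route is the more general of the two.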
I have the following DataFrame
If you look at the first row, the string contains duplicate values, e.g. GM0001, GMM003 and so on.
Is it possible to remove those duplicates within each cell of the SITE_ID column?
You can turn the tuples into sets:
df['SITE_ID_UNIQUE'] = df.SITE_ID.apply(set)
Vinura Perera's answer works just fine, provided you are okay with braces (sets) instead of tuples. It also adds another column to your dataframe. If you want the result to look like a tuple and don't want to create another column, try this (note that it stores each cell as a string):
df['SITE_ID'] = [str(set(i)).replace('{', '(').replace('}', ')') for i in df['SITE_ID']]
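Sets do not keep the original order of the IDs. If order matters, one possible alternative is dict.fromkeys, which keeps the first occurrence of each value. This is a sketch that assumes each cell holds a tuple of site IDs; the data is hypothetical because the original table was not preserved.

```python
import pandas as pd

# Hypothetical data: each cell is assumed to be a tuple of site IDs.
df = pd.DataFrame(
    {"SITE_ID": [("GM0001", "GMM003", "GM0001", "GMM003"), ("GM0002",)]}
)

# dict.fromkeys keeps the first occurrence of each ID in its original order;
# tuple(...) turns the de-duplicated keys back into a real tuple.
df["SITE_ID"] = df["SITE_ID"].apply(lambda ids: tuple(dict.fromkeys(ids)))

print(df)
```

Unlike the string-replacement approach, the cells remain real tuples, so they can still be indexed and iterated.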
My pandas dataframe has a column whose values are either a string such as https://acbedfgid=123456 (which is supposed to be a hyperlink but is stored as a string) or "No Proposal", depending on the other values in the row.
In the former case, I want to replace the entire string with just its last 6 digits (the length is always constant); in the latter case, it should stay as it is ("No Proposal").
How do I achieve this in pandas?
Thank you.
You could try
df.loc[df['ColName'].ne('No Proposal'), 'ColName'] = df.loc[df['ColName'].ne('No Proposal'), 'ColName'].str[-6:]
or
df['ColName'] = df['ColName'].where(df['ColName'].eq('No Proposal'), df['ColName'].str[-6:])
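A self-contained sketch of the where-based approach, using made-up URLs in the shape given in the question. The name is_placeholder is only illustrative.

```python
import pandas as pd

# Hypothetical values in the shape described in the question.
df = pd.DataFrame(
    {
        "ColName": [
            "https://acbedfgid=123456",
            "No Proposal",
            "https://acbedfgid=654321",
        ]
    }
)

is_placeholder = df["ColName"].eq("No Proposal")

# where() keeps values where the condition is True ('No Proposal' rows)
# and takes the replacement (the last six characters) everywhere else.
df["ColName"] = df["ColName"].where(is_placeholder, df["ColName"].str[-6:])

print(df)
```

The .str[-6:] slice works on the characters of each string, whereas a plain [-6:] on the Series would select the last six rows.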
I'm trying to remove the percent sign after a value in a pandas dataframe, relevant code:
for i in loansdata:
    if i.endswith('%'):
        i = i[:-1]
I was thinking that i = i[:-1] would set the new value, but it doesn't. How do I go about it? For clarity: if I print i inside the for loop, it prints without the percent sign. But if I print the whole dataframe, it has not changed.
Use str.replace to remove a specific character from a column:
df[col] = df[col].str.replace('%','')
Depending on what loansdata actually is, your loop iterates over either the column names (for a DataFrame) or the values of a column (for a Series).
Either way, assigning to i only rebinds the loop variable; it does not modify the underlying data. Even if it did, you should avoid loops where a vectorised solution exists.
If % appears in multiple columns, you can call the above for each column, but the .str methods only work on string columns.
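As a sketch of applying this to several columns at once, assuming a made-up loansdata frame: the column names and values below are hypothetical, and the conversion to float is an optional extra step.

```python
import pandas as pd

# Hypothetical loan data; column names and values are made up.
loansdata = pd.DataFrame(
    {
        "Interest.Rate": ["8.90%", "12.12%"],
        "Debt.Ratio": ["14.3%", "28%"],
        "Amount": [20000, 19200],
    }
)

# Only the string columns are listed; 'Amount' is numeric and has no .str methods.
percent_cols = ["Interest.Rate", "Debt.Ratio"]

for col in percent_cols:
    # Strip the trailing '%' and convert the remaining text to a number.
    loansdata[col] = loansdata[col].str.rstrip("%").astype(float)

print(loansdata)
```

The loop here runs over a handful of column names, not over individual values, so each column is still processed in a vectorised way.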