How to replace a string in pandas column based on a condition? - python

I am trying to check whether the values in a column are numbers and, if so, replace them with another string. I am using this code:
df["recs"] = ["45", "emp1", "12", "emp3", "emp4", "emp5"]
recomm = df["recs"]
# check if the value in each row of this column is a number
recomm = recomm.str.replace(recomm.isdigit(), 'Number')
But it generates an error:
AttributeError: 'Series' object has no attribute 'isdigit'

You could use str.replace here with the regex pattern ^\d+$:
df["recs"] = df["recs"].str.replace(r'^\d+$', 'Number', regex=True)
The pattern ^\d+$ matches only strings that consist entirely of digits. (regex=True is required in pandas 2.0+, where str.replace defaults to literal matching.)

I'd prefer to avoid regex and use mask instead:
df['recs'] = df['recs'].mask(df['recs'].str.isdigit(), 'Number')
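As a quick check, here is a minimal, self-contained sketch (sample data taken from the question) showing the mask approach end to end:
import pandas as pd

df = pd.DataFrame({"recs": ["45", "emp1", "12", "emp3", "emp4", "emp5"]})

# str.isdigit flags all-digit strings; mask swaps those for 'Number'
df["recs"] = df["recs"].mask(df["recs"].str.isdigit(), "Number")

print(df["recs"].tolist())
# ['Number', 'emp1', 'Number', 'emp3', 'emp4', 'emp5']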

Related

How do I convert DataFrame column of type "string" to "float" using .replace?

In my DataFrame, the "Value_String" column consists of strings that are either:
number-like strings that start with a dollar sign and use commas as thousands separators (e.g. $1,000)
"None"
Therefore, I tried to create a new column and convert the strings to float with the following lambda function:
to_replace = '$,'
df['Value_Float'] = df[df['Value_String'].apply(lambda x: 0 if x == 'None'
else float(x.replace(y, '')) for y in to_replace)]
This actually generates a "TypeError: 'generator' object is not callable".
How can I solve this?
The numpy where method is very helpful for conditionally updating values. Here, wherever the value is not 'None', we apply the replace. Since the pattern is a regex, the dollar sign has to be escaped; the pattern below matches a literal dollar sign or a comma (pass regex=True explicitly on pandas 2.0+):
import pandas as pd
import numpy as np
df = pd.DataFrame({'Value_String':["$1,000","None"]})
df['Value_String'] = np.where(df['Value_String'] != 'None',
                              df['Value_String'].str.replace(r'\$|,', '', regex=True),
                              df['Value_String'])
print(df)
Output
  Value_String
0         1000
1         None
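Since the question ultimately wants a float column, one way to finish the job (a sketch; it assumes NaN is an acceptable stand-in for the 'None' rows, and uses the 'Value_Float' name from the question) is pd.to_numeric:
# 'None' strings cannot be parsed as numbers, so errors='coerce' turns them into NaN
df['Value_Float'] = pd.to_numeric(df['Value_String'], errors='coerce')
print(df)
#   Value_String  Value_Float
# 0         1000       1000.0
# 1         None          NaN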

Python Pandas - 'DataFrame' object has no attribute 'str' - .str.replace error

I am trying to replace "," with "" in 80 columns of a pandas DataFrame.
I have created a list of these headers to iterate over:
headers = ['h1', 'h2', 'h3'... 'h80']
and I then use this list of headers to replace the string values in multiple columns, as below:
dataFrame[headers] = dataFrame[headers].str.replace(',','')
which gave me this error: AttributeError: 'DataFrame' object has no attribute 'str'
When I try the same on a single header it works fine, and I need str.replace because the plain replace method sadly does not replace the ",".
Thank you
Using df.apply
pd.Series.str.replace is a Series method, not a DataFrame method. You can instead use apply to run it on each column Series:
dataFrame[headers] = dataFrame[headers].apply(lambda x: x.str.replace(',',''))
Using df.applymap
Or, you can use applymap, treating each cell as a string and calling replace on it directly (in pandas 2.1+, DataFrame.map is the preferred name for applymap):
dataFrame[headers] = dataFrame[headers].applymap(lambda x: x.replace(',',''))
Using df.replace
You can also use df.replace, which replaces values directly across all selected columns. For substring (rather than whole-cell) replacement you must set regex=True:
dataFrame[headers] = dataFrame[headers].replace(',','',regex=True)
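For illustration, a small self-contained example of the df.replace variant (the column names h1/h2 and the values are made up here):
import pandas as pd

df = pd.DataFrame({'h1': ['1,000', '2,500'], 'h2': ['3,141', '10']})
headers = ['h1', 'h2']

# regex=True makes replace match ',' as a substring inside each cell
df[headers] = df[headers].replace(',', '', regex=True)
print(df)
#      h1    h2
# 0  1000  3141
# 1  2500    10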

Using regex to remove rows from column in pandas with loc operator

I have a big DataFrame and I want to remove all rows where the word 'test' appears in the Source column (object dtype).
However, the word can appear in many forms, e.g.:
'test'
'Test'
'TESTE'
How can I do a case-insensitive regex match to remove these rows from my DataFrame?
I've tried the following:
mask = df.iloc[:,'Source'].str.contains('/test/ig', regex = True)
df = df.loc[~mask]
iloc is for integer-location based indexing; use loc for label-based and boolean selection.
And as #sushanth said, use case = False:
mask = df.loc[:,'Source'].str.contains('test', case = False)
df = df.loc[~mask]
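A minimal sketch of the fix on made-up data (na=False is an extra safeguard so missing values don't produce NaN in the mask):
import pandas as pd

df = pd.DataFrame({'Source': ['test', 'Test', 'TESTE', 'prod']})

# case=False makes the substring match case-insensitive
mask = df['Source'].str.contains('test', case=False, na=False)
df = df[~mask]
print(df)
#   Source
# 3   prod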

Python pd using a variable with column name in groupby dot notation

I am trying to use a list that holds column names in my groupby notation. My end goal is to loop through multiple columns and run the calculation without re-writing the same line several times. Is this possible?
a_list = list(['','BTC_','ETH_'])
a_variable = ('{}ClosePrice'.format(a_list[0]))
proccessing_data['RSI'] = proccessing_data.groupby('Symbol').a_variable.transform(lambda x: talib.RSI(x, timeperiod=14))
This is the error I currently get, because pandas looks for a column literally named 'a_variable', which doesn't exist:
AttributeError: 'DataFrameGroupBy' object has no attribute 'a_variable'
It turns out the bracket notation below works:
proccessing_data['RSI'] = proccessing_data.groupby('Symbol')[('{}ClosePrice'.format(a_list[0]))].transform(lambda x: talib.RSI(x, timeperiod=14))
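With bracket notation in place, the loop the question was aiming for becomes straightforward. A sketch, assuming the columns ClosePrice, BTC_ClosePrice and ETH_ClosePrice all exist and TA-Lib is installed (the prefixed output column names are my own guess):
a_list = ['', 'BTC_', 'ETH_']
for prefix in a_list:
    col = '{}ClosePrice'.format(prefix)
    # e.g. 'RSI', 'BTC_RSI', 'ETH_RSI'
    proccessing_data['{}RSI'.format(prefix)] = (
        proccessing_data.groupby('Symbol')[col]
        .transform(lambda x: talib.RSI(x, timeperiod=14))
    )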

Applying uppercase to a column in pandas dataframe

I'm having trouble applying uppercase to a column in my DataFrame.
The DataFrame is df, and 1/2 ID is the column header that needs to be uppercased.
The values are made up of three letters followed by three numbers, for example rrr123.
df['1/2 ID'] = map(str.upper, df['1/2 ID'])
I got an error:
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'
How can I apply upper case to the first three letters in the column of the DataFrame df?
If your version of pandas is recent enough, you can just use the vectorised string method upper:
df['1/2 ID'] = df['1/2 ID'].str.upper()
This method does not work inplace, so the result must be assigned back.
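For example, on values like the ones described in the question (the data here is made up):
import pandas as pd

df = pd.DataFrame({'1/2 ID': ['rrr123', 'abc456']})
df['1/2 ID'] = df['1/2 ID'].str.upper()
print(df['1/2 ID'].tolist())
# ['RRR123', 'ABC456']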
This should work:
df['1/2 ID'] = map(lambda x: str(x).upper(), df['1/2 ID'])
and should you want all the columns names to be in uppercase format:
df.columns = map(lambda x: str(x).upper(), df.columns)
str.upper() wants a plain old Python 2 str.
unicode.upper() wants a unicode, not a str (otherwise you get TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str').
So I'd suggest making use of duck typing and calling .upper() on each of your elements, e.g.
df['1/2 ID'] = df['1/2 ID'].apply(lambda x: x.upper())
(Series.apply has no inplace argument, so assign the result back.)
