I am trying to replace "," with "" across 80 columns of a pandas DataFrame.
I have created a list of the headers to iterate over:
headers = ['h1', 'h2', 'h3'... 'h80']
and then I am using the list of headers to replace the string values in multiple columns at once, as below:
dataFrame[headers] = dataFrame[headers].str.replace(',','')
This gives me the error: AttributeError: 'DataFrame' object has no attribute 'str'
When I try the same on a single column it works fine, and I need to use str.replace because the plain replace method (which matches whole cell values by default) sadly does not replace the ",".
Thank you
Using df.apply
pd.Series.str.replace is a Series method, not a DataFrame method. You can use apply to run it on each column Series instead.
dataFrame[headers] = dataFrame[headers].apply(lambda x: x.str.replace(',',''))
Using df.applymap
Or, you can use applymap to treat each cell as a string and call the built-in replace on it directly:
dataFrame[headers] = dataFrame[headers].applymap(lambda x: x.replace(',',''))
Using df.replace
You can also use df.replace, a DataFrame method that replaces values directly across all selected columns. For substring replacement you have to set regex=True:
dataFrame[headers] = dataFrame[headers].replace(',','',regex=True)
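A minimal sketch comparing the apply and df.replace routes on a toy frame (h1/h2 and the values are placeholders standing in for the 80 real columns):

```python
import pandas as pd

# tiny stand-in frame; h1/h2 are placeholder column names
df = pd.DataFrame({'h1': ['1,000', '2,500'], 'h2': ['3,141', '10,000']})
headers = ['h1', 'h2']

# apply runs the Series-level .str.replace on each column in turn
via_apply = df[headers].apply(lambda col: col.str.replace(',', '', regex=False))

# df.replace with regex=True works on the selected columns directly
via_replace = df[headers].replace(',', '', regex=True)

print(via_apply.equals(via_replace))  # True: both yield the same cleaned frame
```

Either result can be assigned back with df[headers] = ... as in the answers above.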
Related
I am trying to check whether the values in a column are numbers and replace them with another string. I am using the code
df["recs"] = ["45", "emp1", "12", "emp3", "emp4", "emp5"]
recomm = df["recs"]
# check whether the values in each row of this column are numbers
recomm = recomm.str.replace(recomm.isdigit(), 'Number')
But it is generating an error
AttributeError: 'Series' object has no attribute 'isdigit'
You could use str.replace here with the regex pattern ^\d+$ (recent pandas versions require regex=True for pattern matching):
df["recs"] = df["recs"].str.replace(r'^\d+$', 'Number', regex=True)
The pattern ^\d+$ matches only values that consist entirely of digits.
I'd prefer without regex and with mask:
df['recs'] = df['recs'].mask(df['recs'].str.isdigit(), 'Number')
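A quick sketch showing that both routes agree on the question's data (regex=True is required for pattern replacement in pandas 2.x):

```python
import pandas as pd

s = pd.Series(["45", "emp1", "12", "emp3"])

# regex route: replace values that are pure digit runs
via_regex = s.str.replace(r'^\d+$', 'Number', regex=True)

# mask route: overwrite wherever str.isdigit() is True
via_mask = s.mask(s.str.isdigit(), 'Number')

print(via_regex.tolist())  # ['Number', 'emp1', 'Number', 'emp3']
```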
I am aware that I can find the lengths of the strings in a pd.Series using pd.Series.str.len(), but is there a method to strip the last two characters? I know plain Python can accomplish this, but I was curious whether it can be done in pandas.
For example:
$1000.0000
1..0009
456.2233
Would end up as:
$1000.00
1..00
456.22
Any insight would be greatly appreciated.
Just do:
import pandas as pd
s = pd.Series(['$1000.0000', '1..0009', '456.2233'])
res = s.str[:-2]
print(res)
Output
0 $1000.00
1 1..00
2 456.22
dtype: object
Pandas supports the built-in string methods through the accessor str, from the documentation:
These are accessed via the str attribute and generally have names
matching the equivalent (scalar) built-in string methods
Try with
df_new = df.astype(str).applymap(lambda x: x[:-2])
Or, for a single column (the .str accessor only exists on a Series, and 'col' here stands in for its name):
df_new = df['col'].astype(str).str[:-2]
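A runnable sketch of the DataFrame variant; the column names and values are made up, and apply with col.str[:-2] is used in place of applymap, which recent pandas versions deprecate in favor of DataFrame.map:

```python
import pandas as pd

# made-up data; astype(str) makes every cell safe to slice
df = pd.DataFrame({'price': ['$1000.0000', '456.2233'], 'code': ['AB12', 'CD34']})

# drop the last two characters of every cell, column by column
trimmed = df.astype(str).apply(lambda col: col.str[:-2])
print(trimmed['price'].tolist())  # ['$1000.00', '456.22']
```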
Given the following data in an Excel sheet (read into a DataFrame):
Name Number Date
AA '9988779911' '01-JAN-18'
'BB' '8779912044' '01-FEB-18'
I have used the following code to clean the DataFrame and remove the unnecessary apostrophes:
for name in list(df):
    df[name] = df[name].str.split("'").str[1]
And I want the following output :
Name Number Date
AA 9988779911 01-JAN-18
BB 8779912044 01-FEB-18
I am getting the following error :
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
Thanks in advance for your help :)
Try this:
for name in list(df):
    df[name] = df[name].str.replace("'", "")
This replaces ' with an empty string.
A simpler approach (applymap does not operate in place, so assign the result back):
df = df.applymap(lambda x: x.replace("'", ""))
strip is probably the shortest way here. The other answers are elegant too.
df[name] = df[name].str.strip("'")
Moshevi has said the same in one of the comments.
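A minimal sketch of the strip route on data shaped like the question's (values made up to match):

```python
import pandas as pd

df = pd.DataFrame({'Name': ["AA", "'BB'"],
                   'Number': ["'9988779911'", "'8779912044'"]})

# strip only removes quotes at the ends of each value,
# so unquoted cells like "AA" pass through unchanged
cleaned = df.apply(lambda col: col.str.strip("'"))
print(cleaned['Number'].tolist())  # ['9988779911', '8779912044']
```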
I am trying to use a list that holds column names in my groupby expression. My end goal is to loop through multiple columns and run the calculation without having to rewrite the same line multiple times. Is this possible?
a_list = list(['','BTC_','ETH_'])
a_variable = ('{}ClosePrice'.format(a_list[0]))
proccessing_data['RSI'] = proccessing_data.groupby('Symbol').a_variable.transform(lambda x: talib.RSI(x, timeperiod=14))
This is the error I currently get, because pandas looks for a column literally named a_variable, which doesn't exist:
AttributeError: 'DataFrameGroupBy' object has no attribute 'a_variable'
Using bracket notation instead of attribute access works:
proccessing_data['RSI'] = proccessing_data.groupby('Symbol')[('{}ClosePrice'.format(a_list[0]))].transform(lambda x: talib.RSI(x, timeperiod=14))
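A runnable sketch of the bracket-notation fix; the frame and values are made up, and a plain mean stands in for talib.RSI (which requires the TA-Lib package):

```python
import pandas as pd

# toy data standing in for the real price frame
df = pd.DataFrame({'Symbol': ['BTC', 'BTC', 'ETH', 'ETH'],
                   'ClosePrice': [100.0, 110.0, 10.0, 14.0]})

a_list = ['', 'BTC_', 'ETH_']
a_variable = '{}ClosePrice'.format(a_list[0])  # -> 'ClosePrice'

# bracket notation accepts the string held in a_variable
df['Mean'] = df.groupby('Symbol')[a_variable].transform('mean')
print(df['Mean'].tolist())  # [105.0, 105.0, 12.0, 12.0]
```

Looping over a_list then just swaps in each prefix to build the column name.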
I'm having trouble applying upper case to a column in my DataFrame.
The DataFrame is df.
1/2 ID is the column header that needs to be converted to UPPERCASE.
The problem is that the values are made up of three letters and three numbers. For example rrr123 is one of the values.
df['1/2 ID'] = map(str.upper, df['1/2 ID'])
I got an error:
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'
How can I apply upper case to the first three letters in the column of the DataFrame df?
If your version of pandas is a recent version then you can just use the vectorised string method upper:
df['1/2 ID'] = df['1/2 ID'].str.upper()
This method does not operate in place, so the result must be assigned back.
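A short demo with values matching the question's three-letters-three-digits pattern (upper leaves the digits alone):

```python
import pandas as pd

df = pd.DataFrame({'1/2 ID': ['rrr123', 'abc456']})

# vectorised uppercase; assign back since it returns a new Series
df['1/2 ID'] = df['1/2 ID'].str.upper()
print(df['1/2 ID'].tolist())  # ['RRR123', 'ABC456']
```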
This should work:
df['1/2 ID'] = map(lambda x: str(x).upper(), df['1/2 ID'])
and should you want all the columns names to be in uppercase format:
df.columns = map(lambda x: str(x).upper(), df.columns)
str.upper() wants a plain old Python 2 str.
unicode.upper() wants a unicode object, not a str (or you get TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str').
So I'd suggest making use of duck typing and calling .upper() on each of your elements, e.g.
df['1/2 ID'] = df['1/2 ID'].apply(lambda x: x.upper())
(Series.apply has no inplace argument, so assign the result back.)