I'm having trouble applying upper case to a column in my DataFrame.
The DataFrame is df.
'1/2 ID' is the column header that needs UPPERCASE applied.
The problem is that the values are made up of three letters and three numbers; for example, rrr123 is one of the values.
df['1/2 ID'] = map(str.upper, df['1/2 ID'])
I got an error:
TypeError: descriptor 'upper' requires a 'str' object but received a 'unicode'
How can I apply upper case to the first three letters in the column of the DataFrame df?
If your version of pandas is recent enough, you can just use the vectorised string method upper:
df['1/2 ID'] = df['1/2 ID'].str.upper()
This method does not work inplace, so the result must be assigned back.
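For example, on a small made-up frame mirroring the question:

import pandas as pd

df = pd.DataFrame({'1/2 ID': ['rrr123', 'abc456']})
df['1/2 ID'] = df['1/2 ID'].str.upper()  # upper-cases the letters, leaves the digits alone
print(df)
#    1/2 ID
# 0  RRR123
# 1  ABC456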
This should work:
df['1/2 ID'] = list(map(lambda x: str(x).upper(), df['1/2 ID']))
(Under Python 3, map returns an iterator, so materialise it with list() before assigning.)
and should you want all the column names to be in uppercase format:
df.columns = list(map(lambda x: str(x).upper(), df.columns))
str.upper() wants a plain old Python 2 string
unicode.upper() will want a unicode not a string (or you get TypeError: descriptor 'upper' requires a 'unicode' object but received a 'str')
So I'd suggest making use of duck typing and call .upper() on each of your elements, e.g.
df['1/2 ID'] = df['1/2 ID'].apply(lambda x: x.upper())
Note that Series.apply has no inplace argument, so the result must be assigned back.
Related
I am using the groupby method to combine the cell values (strings) of a column:
dfz = dfy.groupby('Type ').agg({'Initial ':lambda x: '/'.join(x.unique()),2021:'sum',2022:'sum'}).reset_index()
The following error message appears:
TypeError: Sequence item 0: expected str instance, float found
The join method expects an iterable of strings, but your groups contain a float (NaN). Cast each item to str first:
dfz = dfy.groupby('Type ').agg({'Initial ':lambda x: '/'.join([str(r) for r in x.unique()]),2021:'sum',2022:'sum'}).reset_index()
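A minimal reproduction with made-up data (note the trailing spaces in 'Type ' and 'Initial ', copied from the question's column names):

import pandas as pd
import numpy as np

dfy = pd.DataFrame({
    'Type ': ['A', 'A', 'B'],
    'Initial ': ['x', np.nan, 'y'],  # the NaN is the float that triggered the error
    2021: [1, 2, 3],
    2022: [4, 5, 6],
})
dfz = dfy.groupby('Type ').agg({'Initial ': lambda x: '/'.join([str(r) for r in x.unique()]),
                                2021: 'sum', 2022: 'sum'}).reset_index()
print(dfz)  # group A joins to 'x/nan', group B to 'y'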
In my DataFrame, the "Value_String" column consists of strings that are either:
number-like strings starting with dollar sign, and thousands are separated by a comma, [e.g. $1,000]
"None"
Therefore, I tried to create a new column and convert the string to float with the following lambda function:
to_replace = '$,'
df['Value_Float'] = df[df['Value_String'].apply(lambda x: 0 if x == 'None'
else float(x.replace(y, '')) for y in to_replace)]
This actually generates a "TypeError: 'generator' object is not callable".
How can I solve this?
The numpy where function is very helpful for conditionally updating values. In this case, where the value is not 'None', we use str.replace. Since we want a single pattern matching either a literal dollar sign or a comma, we pass a regular expression and set regex=True (recent pandas versions no longer treat the pattern as a regex by default):
import pandas as pd
import numpy as np
df = pd.DataFrame({'Value_String':["$1,000","None"]})
df['Value_String'] = np.where(df['Value_String'] != 'None', df['Value_String'].str.replace(r'\$|,', '', regex=True), df['Value_String'])
print(df)
Output
  Value_String
0         1000
1         None
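To finish the original goal of a numeric Value_Float column, one further step should work: pd.to_numeric with errors='coerce' turns the leftover 'None' strings into NaN, which fillna(0) then zeroes out, matching the 0 the question intended:

# continuing from the frame above
df['Value_Float'] = pd.to_numeric(df['Value_String'], errors='coerce').fillna(0)
print(df)
#   Value_String  Value_Float
# 0         1000       1000.0
# 1         None          0.0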
I've done some searching and can't figure out how to filter a dataframe by
df["col"].str.contains(word)
however I'm wondering if there is a way to do the reverse: filter a dataframe by that set's complement, e.g. to the effect of
!(df["col"].str.contains(word))
Can this be done through a DataFrame method?
You can use the invert (~) operator (which acts like a not for boolean data):
new_df = df[~df["col"].str.contains(word)]
where new_df is the copy returned by the right-hand side.
contains also accepts a regular expression...
If the above throws a ValueError or TypeError, it is likely because you have mixed datatypes or missing values, so use na=False:
new_df = df[~df["col"].str.contains(word, na=False)]
Or,
new_df = df[df["col"].str.contains(word) == False]
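A quick sketch of the negation on invented data:

import pandas as pd

df = pd.DataFrame({'col': ['this word here', 'no match', 'word again']})
word = 'word'
new_df = df[~df['col'].str.contains(word)]
print(new_df)
#         col
# 1  no match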
I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:
df[df["col"].str.contains('this|that')==False]
You can use apply and a lambda:
df[df["col"].apply(lambda x: word not in x)]
Or, if you want to define a more complex rule, you can combine conditions with and:
df[df["col"].apply(lambda x: word_1 not in x and word_2 not in x)]
The single-word case is covered above; here is a framework for finding multiple words and negating those rows from the DataFrame.
Here df is the DataFrame, column_a is a column name from df, and 'word1', 'word2', 'word3', 'word4' are the patterns to search for:
values_to_remove = ['word1','word2','word3','word4']
pattern = '|'.join(values_to_remove)
result = df.loc[~df['column_a'].str.contains(pattern, case=False)]
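For instance, with an invented frame:

import pandas as pd

df = pd.DataFrame({'column_a': ['has word1 inside', 'clean row', 'WORD3 here']})
values_to_remove = ['word1', 'word2', 'word3', 'word4']
pattern = '|'.join(values_to_remove)
result = df.loc[~df['column_a'].str.contains(pattern, case=False)]
print(result)  # only 'clean row' survives; case=False also catches 'WORD3'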
I had to get rid of the NULL values before using the command recommended by Andy above. An example:
df = pd.DataFrame(index = [0, 1, 2], columns=['first', 'second', 'third'])
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df
first second third
0 myword myword NaN
1 myword NaN myword
2 myword myword NaN
Now running the command:
~df["second"].str.contains(word)
I get the following error:
TypeError: bad operand type for unary ~: 'float'
I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
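A sketch of both workarounds on the frame above:

# Option 1: fill the missing values first, then negate
mask = ~df['second'].fillna('').str.contains('myword')
# Option 2: have contains treat NaN as False directly
mask = ~df['second'].str.contains('myword', na=False)
print(df[mask])
#     first second   third
# 1  myword    NaN  myword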
To negate your query use ~. Using query has the advantage of returning the valid rows of df directly. String methods need the Python engine:
df.query('~col.str.contains("word")', engine='python')
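For example, assuming a frame with a col column:

import pandas as pd

df = pd.DataFrame({'col': ['a word', 'nothing']})
print(df.query('~col.str.contains("word")', engine='python'))
#        col
# 1  nothing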
In addition to nanselm2's answer, you can use 0 instead of False:
df["col"].str.contains(word)==0
Somehow .contains didn't work for me, but .isin, as mentioned by @kenan in this answer (How to drop rows from pandas data frame that contains a particular string in a particular column?), did. Adding further: if you want to scan the entire DataFrame and remove the rows that contain a specific word (or set of words), just use the loop below
for col in df.columns:
    df = df[~df[col].isin(['string or string list separated by comma'])]
Just remove the ~ to instead keep only the rows that contain the word.
To complement the above: if someone wants to remove all the rows whose values are strings, one could do:
df_new=df[~df['col_name'].apply(lambda x: isinstance(x, str))]
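A small sketch with an invented mixed-type column:

import pandas as pd

df = pd.DataFrame({'col_name': [1.5, 'text', 2.0, 'more text']})
df_new = df[~df['col_name'].apply(lambda x: isinstance(x, str))]
print(df_new)
#    col_name
# 0       1.5
# 2       2.0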
I am trying to replace "," with "" in 80 columns of a pandas DataFrame.
I have created a list of these headers to iterate over:
headers = ['h1', 'h2', 'h3'... 'h80']
and then I am using the list of headers to replace the string values in multiple columns, as below:
dataFrame[headers] = dataFrame[headers].str.replace(',','')
Which gave me this error: AttributeError: 'DataFrame' object has no attribute 'str'
When I try the same on only one header it works well. I need str.replace because the plain replace method sadly does not replace the ",".
Thank you
Using df.apply
pd.Series.str.replace is a Series method, not a DataFrame one. You can apply it to each column (a Series) instead:
dataFrame[headers] = dataFrame[headers].apply(lambda x: x.str.replace(',',''))
Using df.applymap
Or, you can use applymap to treat each cell as a string and call replace on it directly (in pandas 2.1+, DataFrame.map is the preferred name for applymap):
dataFrame[headers] = dataFrame[headers].applymap(lambda x: x.replace(',',''))
Using df.replace
You can also use df.replace, which replaces values directly across all the selected columns. For substring replacement, you have to set regex=True:
dataFrame[headers] = dataFrame[headers].replace(',','',regex=True)
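A quick check of the last form on invented data (any of the three approaches gives the same result):

import pandas as pd

dataFrame = pd.DataFrame({'h1': ['1,000', '2,000'], 'h2': ['3,000', '4,000']})
headers = ['h1', 'h2']
dataFrame[headers] = dataFrame[headers].replace(',', '', regex=True)
print(dataFrame)
#      h1    h2
# 0  1000  3000
# 1  2000  4000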
I want to strip white spaces from all values in lists in the column 'Terms' in my dataframe:
df['Terms'] = df['Terms'].map(lambda x : x.strip())
This throws an error because each value in df['Terms'] is a list, not a string. Any help is appreciated.
AttributeError: 'list' object has no attribute 'values'
ANSWER:
I created a function and then apply it to the column of the dataframe:
def strip_element(my_list):
    return [x.strip() for x in my_list]
df['Terms']=df['Terms'].apply(strip_element)
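A quick check with invented data, reusing strip_element from above:

import pandas as pd

df = pd.DataFrame({'Terms': [[' a ', 'b '], [' c ']]})
df['Terms'] = df['Terms'].apply(strip_element)
print(df['Terms'].tolist())
# [['a', 'b'], ['c']]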
Agree with Rakesh that a vectorised call (df['Terms'] = df['Terms'].str.strip()) would be the simplest solution if the cells were plain strings, but since each cell here is a list, the callable has to strip each element, whether you use map or apply:
df['Terms'] = df['Terms'].apply(lambda x: [s.strip() for s in x])