I am just starting with pandas, so please forgive me if this is something stupid.
I am trying to apply a function to a column, but it's not working, and I don't see any errors either.
capitalizer = lambda x: x.upper()
for df in pd.read_csv(downloaded_file, chunksize=2, compression='gzip', low_memory=False):
    df['level1'].apply(capitalizer)
    print(df)
    exit(1)
The print shows the level1 column values unchanged from the original CSV; they are not uppercased. Am I missing something here?
Thanks
apply is not an inplace function - it does not modify values in the original object, so you need to assign it back:
df['level1'] = df['level1'].apply(capitalizer)
Alternatively, you can use str.upper, which should be much faster:
df['level1'] = df['level1'].str.upper()
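The difference between calling apply and assigning its result back can be checked on a tiny frame. This is a minimal sketch: the sample values are made up for illustration, and only the level1 column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'level1' is from the question
df = pd.DataFrame({"level1": ["abc", "def"]})

# apply returns a new Series; df itself is left untouched
result = df["level1"].apply(lambda x: x.upper())
unchanged = df["level1"].tolist()

# Assigning the result back is what actually updates the column
df["level1"] = df["level1"].str.upper()
updated = df["level1"].tolist()

print(unchanged)
print(updated)
```

When reading the file in chunks, the same assignment would go inside the loop body, once per chunk.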
df['level1'] = list(map(lambda x: x.upper(), df['level1']))
You can use the code above to make your column uppercase. (In Python 3, map returns an iterator, so it is wrapped in list() here.)
Related
I'm trying to decode html characters within a pandas dataframe.
I don't know why, but my apply function won't work.
# requirements
import html
import pandas as pd
# This code works fine.
df = df.apply(lambda x: x + "TESTSTRING")
print(df) # "TESTSTRING" is appended to all values.
# This code also works fine. html.unescape() is working well.
fn = lambda x: html.unescape(x)
str = "Something wrong with <b>E&amp;S</b>"
print(fn(str)) # returns "Something wrong with <b>E&S</b>"
# However, the code below doesn't work. The "&amp;" within the values doesn't get decoded.
df2 = df.apply(fn)
print(df2) # The html characters aren't decoded!
It's really frustrating: the apply function and html.unescape() each work well separately, but I don't know why they don't work together.
I've also tried axis=1.
I'd really appreciate your help. Thanks in advance.
The problem is that html.unescape() is not vectorized, i.e. it accepts only a single string, whereas DataFrame.apply passes each whole column (a Series) to the function.
If your df is not very large, using applymap should still be sufficiently fast:
df2 = df.applymap(lambda x: html.unescape(x))
print(df2)
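As a variation on the answer above, here is a minimal sketch that decodes every cell by mapping html.unescape over each column. The column names and values are hypothetical. It uses Series.map per column rather than applymap, since DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map; the per-column form should behave the same way on older and newer versions.

```python
import html
import pandas as pd

# Hypothetical data containing HTML entities
df = pd.DataFrame({
    "text": ["E&amp;S", "a &lt; b"],
    "note": ["x &gt; y", "plain"],
})

# apply hands each column (a Series) to the lambda;
# Series.map then calls html.unescape on every individual string
df2 = df.apply(lambda col: col.map(html.unescape))

print(df2)
```

This assumes every cell is a string; columns with missing or numeric values would need to be skipped or converted first.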
I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that in column "FBgn", there is a mix of FBgn and FBtr string values. I would like to replace the FBtr-containing values with the FBgn values provided in the adjacent column called "## FlyBase_FBgn", while keeping the existing FBgn values in column "FBgn". Keep in mind that I am showing only a portion of the dataframe (in reality it has 1432 rows). How would I do that? I tried the replace() method from pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
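To see what that line does, here is a small self-contained sketch. The identifiers in the sample data are invented; only the two column names come from the question.

```python
import pandas as pd

# Made-up IDs for illustration; column names follow the question
df = pd.DataFrame({
    "FBgn": ["FBtr0000001", "FBgn0000002", "FBtr0000003"],
    "## FlyBase_FBgn": ["FBgn0000011", "FBgn0000002", "FBgn0000033"],
})

# Boolean mask marking the rows whose "FBgn" entry is really an FBtr ID
mask = df["FBgn"].str.contains("FBtr")

# Only the masked rows are overwritten; pandas aligns the right-hand
# Series on the index, so each row receives its own replacement value
df.loc[mask, "FBgn"] = df["## FlyBase_FBgn"]

print(df["FBgn"].tolist())
```

Rows that already hold an FBgn value are not touched, because the mask is False for them.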
Welcome to Stack Overflow. Next time, please provide more information, including your code; it is always helpful.
Please see the code below; I think you need something similar.
import pandas as pd
# Ignore dict1; it only recreates your df
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'], "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe
# print df
print(df)
# function to replace the values
def replace_values(df):
    # walk the rows and overwrite any FBtr entry in 'FBgn'
    for i in range(len(df)):
        if 'tr' in df['FBgn'][i]:
            # .loc avoids chained assignment, which may not write back to df
            df.loc[i, 'FBgn'] = df.loc[i, 'FBtr']
    return df
df = replace_values(df)
# print new df
print(df)
The top table is what I have and the bottom is what I want. I'm doing this in a Pandas dataframe. Any help would be appreciated.
Thanks!
It would have been nice if you provided a code snippet for this since we are unable to easily test your case.
The following lines should do the job:
df['label'] = df['sentiment'].apply(lambda x: x[0]['label'])
df['score'] = df['sentiment'].apply(lambda x: x[0]['score'])
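Since the original tables were posted as images, the sketch below assumes a shape for the data: each cell of the sentiment column holds a one-element list containing a dict with label and score keys. The sample rows are hypothetical.

```python
import pandas as pd

# Assumed structure: a list with a single dict per row (hypothetical values)
df = pd.DataFrame({
    "text": ["great", "awful"],
    "sentiment": [
        [{"label": "POSITIVE", "score": 0.98}],
        [{"label": "NEGATIVE", "score": 0.91}],
    ],
})

# Take the first list element, then pull out each key into its own column
df["label"] = df["sentiment"].apply(lambda x: x[0]["label"])
df["score"] = df["sentiment"].apply(lambda x: x[0]["score"])

print(df[["text", "label", "score"]])
```

If some rows hold an empty list, x[0] would raise an IndexError, so those rows would need handling first.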
Is there a more conventional or readable way of achieving the same result?
The below code works, but it feels clunky.
df[df.columns[~df.columns.isin(['name'])]]
Use DataFrame.loc; here, : means select all rows:
df.loc[:, ~df.columns.isin(['name'])]
Another idea is to use DataFrame.drop with errors='ignore' to avoid an error if the name column does not exist (it works the same as the solution above):
df.drop('name', axis=1, errors='ignore')
I am new to Python. Can you not use:
df.loc[:, df.columns != 'name']
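The three suggestions can be compared side by side on a toy frame. This is only a sketch with made-up data; the column name name is the one from the question.

```python
import pandas as pd

# Hypothetical frame with a 'name' column to exclude
df = pd.DataFrame({"name": ["a", "b"], "x": [1, 2], "y": [3, 4]})

# Boolean mask over the column labels, negated with ~
via_isin = df.loc[:, ~df.columns.isin(["name"])]

# drop returns a new frame; errors='ignore' tolerates a missing column
via_drop = df.drop("name", axis=1, errors="ignore")

# Element-wise comparison of the column labels
via_ne = df.loc[:, df.columns != "name"]

print(list(via_isin.columns))
```

The isin form extends naturally to excluding several columns at once, while the != form is the shortest for a single column.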
This has been killing me!
Any idea how to convert this to a list comprehension?
for x in dataframe:
    if dataframe[x].value_counts().sum()<=1:
        dataframe.drop(x, axis=1, inplace=True)
[dataframe.drop(x, axis=1, inplace=True) for x in dataframe if dataframe[x].value_counts().sum() <= 1]
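I have not used pandas yet, but according to the documentation, dataframe.drop with inplace=True modifies the frame in place and returns None, so the comprehension runs only for its side effects and the resulting list of None values can be discarded.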
I would probably suggest going the other way and filtering instead. I don't know your dataframe, but something like this should work:
counts_valid = df.T.apply(pd.Series.value_counts).sum() > 1
df = df[counts_valid]
Or, if I see what you are doing, you may be better with
counts_valid = df.T.nunique() > 1
df = df[counts_valid]
That will just keep rows that have more than one unique value.
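For the column-dropping loop in the question, a filter along the column axis may be closer to the original intent. The sketch below rests on an assumption: value_counts() ignores missing values by default, so value_counts().sum() is the number of non-null entries in a column, which is what DataFrame.count() reports. The sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'b' has one non-null value, 'c' has none
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [np.nan, np.nan, 5.0],
    "c": [np.nan, np.nan, np.nan],
})

# count() gives the number of non-null values per column
keep = df.count() > 1

# Select the surviving columns instead of dropping inside a loop
filtered = df.loc[:, keep]

print(list(filtered.columns))
```

This builds a new frame rather than mutating the original while iterating over it.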