How to drop rows with empty string values in certain columns? - python

df.dropna(subset=[column_name], inplace=True)
I have a dataframe with some missing values in one column (column_name). For example, one value is the empty string, ''. But the code above doesn't drop the rows with such empty values.
Isn't this the right way to do it?

The code does not drop those rows because the values are not NA; they are empty strings, and dropna only removes missing values such as NaN or None. For empty strings you would have to do something like:
rows_to_drop = df[df[column_name]==''].index
df.drop(rows_to_drop, inplace=True)
Alternatively, filtering with a boolean mask would also work:
df = df.loc[df[column_name]!='',:]
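To make the difference concrete, here is a minimal sketch on made-up data (the column names name and score are assumptions for the example, not from the question). It shows that dropna leaves the empty string in place, and that converting '' to NaN first lets dropna remove both kinds of rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one empty string and one real missing value in "name"
df = pd.DataFrame({"name": ["a", "", None, "d"], "score": [1, 2, 3, 4]})

# dropna only removes the real missing value; the '' row survives
only_na_dropped = df.dropna(subset=["name"])

# Treat '' as missing, then drop both kinds of rows in one step
cleaned = df.assign(name=df["name"].replace("", np.nan)).dropna(subset=["name"])
```

Whether '' should count as missing depends on the data; if it should not, the mask-based filters above are the more direct choice.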

Related

Combining Rows in a Single Dataframe

I have a dataframe that looks like this, where there is a new row per ID if one of the following columns has a value. I'm trying to combine on the ID, and just consolidate all of the remaining columns. I've tried every groupby/agg combination and can't get the right output. There are no conflicting column values. So for instance if ID "1" has an email value in row 0, the remaining rows will be empty in the column. So I just need it to sum/consolidate, not concatenate or anything.
my current dataframe:
the output i'm looking to achieve:
# fill Nones in string columns with empty string
df[['email', 'status']] = df[['email', 'status']].fillna('')
df = df.groupby('id').agg('max')
If you still want the index as shown in your desired output, with id back as a regular column:
df = df.reset_index(drop=False)
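Since the question's dataframes were posted as images, here is a sketch of the same approach on invented data (the sample values are assumptions; only the column names id, email and status come from the answer). Because '' sorts before any non-empty string, max picks the one real value per group:

```python
import pandas as pd

# Hypothetical frame: each id is spread over several rows, one value per row
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "email": ["a@x.com", None, None, "b@x.com"],
    "status": [None, "active", "inactive", None],
})

# Fill missing strings with '' so that max can compare the values
df[["email", "status"]] = df[["email", "status"]].fillna("")

# One row per id; max keeps the single non-empty value in each column
combined = df.groupby("id").agg("max").reset_index()
```

This relies on the question's statement that there are no conflicting values per id; with conflicts, max would silently keep the alphabetically largest one.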

Remove a row based on two empty columns in python pandas

I want to be able to remove rows that are empty in column NymexPlus and NymexMinus
right now the code I have is
df.dropna(subset=['NymexPlus'], inplace=True)
The thing about this code is that it will also delete rows in the column NymexMinus which I don't want to happen.
Is there an If/AND statement that will work in terms of only getting rid of empty cells for both of the columns?
Use a list as subset parameter and how='all':
df.dropna(subset=['NymexPlus', 'NymexMinus'], how='all', inplace=True)
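A short sketch of what how='all' does, using invented numbers for the two columns: only the row where both NymexPlus and NymexMinus are missing is removed.

```python
import numpy as np
import pandas as pd

# Hypothetical values; row 2 is the only one missing both columns
df = pd.DataFrame({
    "NymexPlus": [1.0, np.nan, np.nan, 4.0],
    "NymexMinus": [np.nan, 2.0, np.nan, 5.0],
})

# how='all' drops a row only when every column in subset is missing
result = df.dropna(subset=["NymexPlus", "NymexMinus"], how="all")
```

With the default how='any', rows 0 and 1 would be dropped as well, which is the behaviour the question wanted to avoid.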

How to drop all strings in a column using a wildcard?

I have some data that changes regularly, but the column headers need to stay consistent (so I can't drop the headers), and I need to clear out the strings in a given column.
This is what I have now, but it only seems to work when I know what the string is, and only for one string at a time.
df1= pd.read_csv(r'C:\Users\Test.csv')
df2 = df1.drop(df1[~(df1['column'] != 'String1')].index)
You can use the DataFrame.drop method, which removes the rows with the given index labels from a dataframe.
for i in df.index:
    # drop the row when the cell in this column holds a string
    if isinstance(df.loc[i, 'Aborted Reason'], str):
        df.drop(i, inplace=True)
df.drop removes each row whose value in the relevant column is a string.
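The same result can usually be had without a Python-level loop. The sketch below is one possible vectorized variant, not the answerer's method; the sample values are invented, and only the column name 'Aborted Reason' is taken from the answer:

```python
import pandas as pd

# Hypothetical mixed-type column: strings, numbers and a missing value
df = pd.DataFrame({"Aborted Reason": ["timeout", 3, None, "cancelled", 7.5]})

# True where the cell holds a Python str
is_string = df["Aborted Reason"].map(lambda value: isinstance(value, str))

# Keep only the rows whose cell is not a string
without_strings = df[~is_string]
```

Building a mask and filtering once avoids calling drop repeatedly, which copies the frame on every call.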

Empty cells on dataframe after use explode()

So I'm new to pandas and this is my first notebook. I needed to join some columns of my dataframe and after that, I wanted to separate the values so it would be better to visualize them.
To join the columns I used:
df['Q7'] = df[['Q7_Part_1', 'Q7_Part_2', 'Q7_Part_3', 'Q7_Part_4', 'Q7_Part_5','Q7_Part_6','Q7_OTHER']].apply(lambda x : '_'.join(x.dropna().astype(str)), axis=1)
That worked well, but I still needed to separate the values, and for that I used explode() like this:
df.Q7 = df.Q7.str.split('_').explode('Q7')
That gave me some empty cells in the dataframe:
[image: Dataframe]
When I try to visualize the values, they just come up empty:
[image: sum of empty cells]
What could I do to not show these empty cells in the viz?
Edit 1: By the way, they do not appear as null or NaN cells when I do df.isnull().sum() or df.isna().sum().
c = ['Q7_Part_1', 'Q7_Part_2', 'Q7_Part_3', 'Q7_Part_4', \
'Q7_Part_5','Q7_Part_6','Q7_OTHER']
df['Q7'] = df[c].apply(lambda x : '_'.join(x.astype(str)), axis=1)
I am not able to replicate your issue, but my best guess is this: if you do the above, without dropna(), the length of each joined list stays the same and you get the string 'nan' instead of empty strings.
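One possible source of the empty cells, which the thread does not confirm, is a row where every part is missing: joining nothing yields '', and splitting '' yields [''], which explode turns into an empty-string cell that isna() does not count. The sketch below reproduces that on invented data with only two of the Q7 columns and filters the empty strings out afterwards:

```python
import pandas as pd

# Hypothetical answers; the last respondent left every part blank
df = pd.DataFrame({
    "Q7_Part_1": ["Python", None, None],
    "Q7_Part_2": ["R", "SQL", None],
})
cols = ["Q7_Part_1", "Q7_Part_2"]

# Same join as in the question: missing parts are dropped before joining
df["Q7"] = df[cols].apply(lambda row: "_".join(row.dropna().astype(str)), axis=1)

# Explode on the frame so every split value gets its own row
exploded = df.assign(Q7=df["Q7"].str.split("_")).explode("Q7")

# '' is not NaN, so filter it explicitly before plotting or counting
non_empty = exploded[exploded["Q7"] != ""]
```

Exploding the whole frame, rather than assigning an exploded Series back to a column, also keeps the row count and the values aligned.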

Replace values in column based on condition, then return dataframe

I'd like to replace some values in the first column of a dataframe by a dummy.
df[[0]].replace(["x"], ["dummy"])
The problem here is that the values in the first column are replaced, but not as part of the dataframe.
print(df)
yields the dataframe with the original data in column 1. I've tried
df[(df[[0]].replace(["x"], ["dummy"]))]
which doesn't work either.
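replace returns a new object by default, so the result has to be assigned back to the dataframe. Passing inplace=True to df[[0]].replace(...) does not help, because df[[0]] is itself a copy of the column. Either assign the result back:
df[0] = df[0].replace("x", "dummy")
or call replace on the whole dataframe and restrict it to column 0 with a nested dict:
df.replace({0: {"x": "dummy"}}, inplace=True)
See the pandas documentation for DataFrame.replace.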
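A small sketch on made-up data of the difference between discarding the result of replace and assigning it back (the frame contents are assumptions; the column label 0 matches the question):

```python
import pandas as pd

# Hypothetical frame with default integer column labels 0 and 1
df = pd.DataFrame([["x", "x"], ["y", "x"]])

# Calling replace on a selection and discarding the result changes nothing
unchanged = df.copy()
unchanged[[0]].replace(["x"], ["dummy"])

# Assigning the result back updates column 0 and leaves column 1 alone
df[0] = df[0].replace("x", "dummy")
```

Only column 0 is touched here, so any "x" in other columns stays as it is.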
