How to select rows with no missing data in Python?

I can only find questions on here that are for selecting rows with missing data using pandas in python.
How can I select rows that are complete and have no missing values?
I am trying to use:
data.notnull(), which gives me True or False per value, but I don't know how to actually select only the rows where all values are True (i.e. not NA). I'm also unsure whether notnull() treats zeros as False, whereas I would accept a zero in a row as a valid value. I am just looking for rows with no NAs.

Without seeing your data: if it's in a dataframe df and you want to drop rows with any missing values, try
newdf = df.dropna(how='any')
This is what pandas does by default, so it should be the same as
newdf = df.dropna()
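Since the asker started from data.notnull(), it may help to see that the boolean-mask route gives the same result as dropna(). A minimal sketch on a toy frame:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# Keep only rows where every value is non-null; equivalent to df.dropna()
complete = df[df.notnull().all(axis=1)]
print(complete)
```

Note that notnull() treats zeros as True (present), so rows containing zeros are kept.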

Related

Drop rows with no values after checking all columns in Pandas Python

I have a dataframe like below.
I would like to check all columns and delete a row if it has no values in any column.
You can do this with dropna:
df = df.dropna(how='all')
Check the pandas docs for more info.
Use the dropna() function for your dataframe.
df.dropna(axis=0, how="all")
axis=0 applies the deletion to rows.
how="all" deletes a row only if all of its columns are empty. Use how="any" if you want the row deleted when any column is missing. You can also use the thresh=<int> parameter, which keeps a row only if it contains at least that many non-NA values.
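The three options above behave quite differently; a small sketch on a toy frame (the values are invented for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, np.nan, np.nan],
})

print(df.dropna(how="all"))  # drops only the all-NaN third row
print(df.dropna(how="any"))  # keeps only the fully complete first row
print(df.dropna(thresh=2))   # keeps rows with at least 2 non-NA values
```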

why does pandas drop all of the rows of the dataFrame in this case?

I was trying to drop rows with missing values (NaN) from a 1400 row dataframe with the following code:
df.dropna(axis=0)
and although the dataframe had only 600 missing values in total, the resulting dataframe had no rows at all!
To my knowledge, dropna() drops rows with at least one missing value. How is it possible that all of the rows are dropped with only 600 missing values in total?
dropna(axis=0) drops every row that contains at least one missing value, so an empty result means every row had at least one NaN. Note that 600 NaNs cannot cover 1400 rows on their own, so it's worth re-checking how you counted: use df.isna() to verify which values pandas actually treats as missing.
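A quick way to verify the diagnosis is to count the rows that contain at least one NaN before dropping anything. A sketch on a toy frame (here 3 NaNs happen to touch all 3 rows):

```python
import pandas as pd
import numpy as np

# Toy frame: only 3 NaNs, but every row contains at least one
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0], "b": [1.0, np.nan, np.nan]})

rows_with_nan = df.isna().any(axis=1).sum()
print(rows_with_nan)            # number of rows dropna() would remove
print(df.dropna(axis=0).shape)  # what remains
```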

Select pandas dataframe column that has NaN or NULL values and fill it with 0's

I have a dataframe that has some missing data in a few different columns. How do I write a function that identifies the columns with missing (i.e. NaN or NULL values) data and fills them with 0's?
I currently have this for inputting specific columns where I already know there is missing data; however I'm trying to come up with a function that finds columns with missing data on its own.
def fill_blanks(dataframe, column):
    dataframe[column] = dataframe[column].fillna(0)
You can just use .fillna():
df = df.fillna(0)
or
df.fillna(0, inplace=True)
You can use fillna(0) on the entire dataframe:
dataframe = dataframe.fillna(0)
or:
dataframe.fillna(0, inplace=True)
Is this what you are trying to do?
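If you really do want a function that finds the affected columns on its own, as the question asked, one sketch is below (fill_missing_columns is a hypothetical helper name, not a pandas API):

```python
import pandas as pd
import numpy as np

def fill_missing_columns(df):
    """Fill NaNs with 0, but only in columns that actually contain missing data."""
    cols_with_nan = df.columns[df.isna().any()]
    df[cols_with_nan] = df[cols_with_nan].fillna(0)
    return df

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
df = fill_missing_columns(df)
```

In practice this is equivalent to df.fillna(0), since filling a column with no NaNs is a no-op.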

Pandas - Filter DF to contain only rows in which each column is FALSE

I have a dataframe which contains only boolean values. I would like to split this into two dataframes; the first containing rows in which every column's value is False, and the second containing rows in which 1 or more column's value is True.
I know this is a very solvable problem in pandas; I'm just having trouble putting the solution together.
Thank you
df[~df.any(axis=1)] # all columns False; not any one is True
df[ df.any(axis=1)] # at least one column True
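A self-contained sketch of the split on invented data, using ~ for negation (recent pandas versions reject the unary - operator on boolean frames):

```python
import pandas as pd

df = pd.DataFrame({"x": [False, True, False], "y": [False, False, True]})

all_false = df[~df.any(axis=1)]  # rows where every column is False
any_true  = df[df.any(axis=1)]   # rows where at least one column is True
```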

Find dropped rows in Pandas

I have a Pandas DataFrame of roughly 64,000 rows. It looks roughly like this:
values
asn country
12345 US ...
12345 MX ...
I was running into an error saying that the MultiIndex could not contain non-unique values. This led me to suspect that I had some NaN value in my index. So I tried the following to verify:
df = # my data frame
rows = df.shape[0]
df = df.reindex(df.index.dropna())
if df.shape[0] < rows:
    print("Dropped %s NaN rows!" % (rows - df.shape[0]))
As expected, this printed out "Dropped 10 NaN rows!"... although now I'd like to find out which rows were dropped so I can investigate how they got into my DataFrame in the first place.
How can I do this? I've tried looking through the Pandas docs for something like df.index.isna() (no dice) and I've tried taking the "before" and "after" data frames and computing their difference, but wasn't sure how to do this and my attempts led to indexing errors.
You can use MultiIndex.to_frame to get a DataFrame equivalent to your index, then combine isna and any to determine the null rows:
idxr = df.index.to_frame().isna().any(axis=1)
You can now use this to filter your DataFrame via df[idxr] to restrict to rows with a null value in the MultiIndex.
Note: with older versions of pandas you will need to use isnull instead of isna.
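A self-contained sketch of the to_frame approach, assuming a two-level asn/country index like the asker's (the 67890/NaN entry is invented for illustration):

```python
import pandas as pd
import numpy as np

idx = pd.MultiIndex.from_tuples(
    [(12345, "US"), (12345, "MX"), (67890, np.nan)],
    names=["asn", "country"],
)
df = pd.DataFrame({"values": [1, 2, 3]}, index=idx)

# Boolean mask: rows whose index contains a null on any level
idxr = df.index.to_frame().isna().any(axis=1)
print(df[idxr])  # only the row with a NaN in its index
```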
