Drop rows with no values after checking all columns in pandas (Python)

I have a dataframe like below.
I would like to check all columns and delete the rows where all values are missing.

You can check with dropna:
df = df.dropna(how='all')

df.dropna(how='all')
See the pandas docs for dropna for more info.

Use the dropna() function on your dataframe:
df.dropna(axis=0, how="all")
axis=0 performs the deletion on rows.
how="all" deletes a row only if all of its columns are empty. Use "any" if you want the row deleted when any column is missing. You can also pass thresh=<int> to keep only rows with at least that many non-NA values, as in the sketch below.

Related

Transpose column/row, change column name and reset index

I have a Pandas DF and I need to:
Transpose my columns to rows,
Transform these rows to indexes,
Set the actual columns as titles for each column (and not as part of the rows)
How can I do that?
Here is my DF before the transposition:
Here is my DF after my failed transposition:
After transposing, use:
df.columns = df.iloc[0]
to set the column headers to the first row, then drop that row from the data with df = df.iloc[1:]. You can also use the set_axis() function to set the indices for your rows; see the pandas documentation for set_axis for an explanation.
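A sketch of the whole flow, with a made-up frame since the original DF was not shown:
import pandas as pd

df = pd.DataFrame({"metric": ["height", "weight"], "A": [1, 2], "B": [3, 4]})
t = df.T               # transpose: the old columns become the index
t.columns = t.iloc[0]  # promote the first row ("height", "weight") to headers
t = t.iloc[1:]         # drop that row from the data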

How to select rows with no missing data in python?

I can only find questions on here about selecting rows with missing data using pandas in Python.
How can I select rows that are complete and have no missing values?
I am trying to use:
data.notnull(), which gives me True or False per cell, but I don't know how to actually select only the rows where all values are True for not being NA. I am also unsure whether notnull() treats zeros as False; I would accept a zero in a row as a value, since I am just looking for rows with no NAs.
Without seeing your data: if it's in a dataframe df and you want to drop rows with any missing values, try
newdf = df.dropna(how='any')
This is what pandas does by default, so it should be the same as
newdf = df.dropna()
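If you want the boolean-mask version rather than dropna, a minimal sketch (data is a made-up frame):
import pandas as pd
import numpy as np

data = pd.DataFrame({"a": [1, 0, np.nan], "b": [2, 3, 4]})
complete = data[data.notnull().all(axis=1)]  # keep rows where every cell is non-NA
# 0 counts as a value, not as NA, so the second row is kept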

How to find if a value exists in all rows of a dataframe?

I have an array of unique elements and a dataframe.
I want to find out if the elements in the array exist in all the rows of the dataframe.
P.S. I am new to Python.
This is the piece of code I've written.
for i in uniqueArray:
    for index, row in newDF.iterrows():
        if i in row['MKT']:
            # do something to find out if the element i exists in all rows
Also, this way of iterating is quite expensive; is there a better way to do the same?
Thanks in advance.
Pandas allows you to filter a whole column, much as you would in Excel:
import pandas
df = pandas.DataFrame(tableData)
Imagine your column names are "Column1", "Column2", etc.
df2 = df[df["Column1"] == "ValueToFind"]
df2 now has only the rows that have "ValueToFind" in df["Column1"]. You can combine several filters with the & (AND) and | (OR) operators, as sketched below.
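A minimal sketch of combining filters (the column names are hypothetical):
import pandas as pd

df = pd.DataFrame({"Column1": ["x", "y", "x"], "Column2": [1, 2, 3]})
# parenthesize each condition and combine with & (AND) or | (OR)
df2 = df[(df["Column1"] == "x") & (df["Column2"] > 1)]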
You can try
for i in uniqueArray:
    if newDF['MKT'].str.contains(i).all():
        # do your task
Note that a plain Series has no contains() method; for string values use .str.contains(). Here .all() checks that i appears in every row, which is what the question asks; use .any() if one matching row is enough.
You can use the isin() method of a pd.Series object.
Assuming you have a dataframe named df, you can check whether your column 'MKT' includes any of the items in your uniqueArray:
new_df = df[df.MKT.isin(uniqueArray)].copy()
new_df will only contain the rows where the value of MKT is contained in uniqueArray.
Now do your thing on new_df, and join/merge/concat it to the former df as you wish.
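For the original question (does every element of uniqueArray appear in every row?), here is a vectorized sketch, assuming the MKT column holds strings (the sample data is made up):
import pandas as pd

newDF = pd.DataFrame({"MKT": ["us-east", "us-west", "eu-us"]})
uniqueArray = ["us"]

# one vectorized scan per element instead of one Python loop per row
present_in_all_rows = all(
    newDF["MKT"].str.contains(elem, regex=False).all() for elem in uniqueArray
)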

Replace values in column based on condition, then return dataframe

I'd like to replace some values in the first column of a dataframe with a dummy.
df[[0]].replace(["x"], ["dummy"])
The problem here is that the replacement happens on a returned copy, not on the dataframe itself.
print(df)
yields the dataframe with the original data in column 1. I've tried
df[(df[[0]].replace(["x"], ["dummy"]))]
which doesn't work either.
replace returns a copy of the data by default, so you need to assign the result back to the column:
df[0] = df[0].replace("x", "dummy")
Note that df[[0]].replace(["x"], ["dummy"], inplace=True) will not stick: df[[0]] returns a copy of the column, so the in-place replacement is lost.
See the pandas docs for replace.
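A minimal sketch of condition-based replacement with loc, which sidesteps the copy problem entirely (the frame is made up):
import pandas as pd

df = pd.DataFrame({0: ["x", "y", "x"], 1: [1, 2, 3]})
df.loc[df[0] == "x", 0] = "dummy"  # writes into df directly, no copy involved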

How to set in pandas the first column and row as index?

When I read in a CSV, I can say pd.read_csv('my.csv', index_col=3) and it sets the fourth column (position 3) as the index.
How can I do the same if I have a pandas dataframe in memory? And how can I use the first row as an index as well? The first column and row are strings; the rest of the matrix is integers.
You can try this regardless of the number of rows
df = pd.read_csv('data.csv', index_col=0)
Making the first (or n-th) column the index, in increasing order of verbosity:
df.set_index(list(df)[0])
df.set_index(df.columns[0])
df.set_index(df.columns.tolist()[0])
Making the first (or n-th) row the index:
df.set_index(df.iloc[0].values)
You can use both if you want a multi-level index:
df.set_index([df.iloc[0], df.columns[0]])
Observe that using a column as the index automatically drops it as a column. Using a row as the index is just a copy operation and won't drop the row from the DataFrame.
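Putting the two together for the question as asked (first column as the index, first row as the header) on an in-memory frame; a sketch assuming both are still stored as data:
import pandas as pd

df = pd.DataFrame([["", "c1", "c2"],
                   ["r1", 1, 2],
                   ["r2", 3, 4]])
df.columns = df.iloc[0]           # first row becomes the column labels
df = df.iloc[1:]                  # ...and is dropped from the data
df = df.set_index(df.columns[0])  # first column becomes the index (and is dropped)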
Maybe try set_index()?
df = df.set_index([2])
Maybe try df = pd.read_csv('my.csv', header=0); header=0 uses the first row as the column names.
