drop rows based on string in pandas dataframe - python

I want to drop rows where the value is "Bad Input". The datatype of those columns is object.
I have tried the df.drop method with a label input, but it doesn't work.
Please advise.

Use DataFrame.ne to flag cells that are not equal to 'Bad Input', DataFrame.all(axis=1) to require this for every column in a row, and boolean indexing to keep those rows:
df = df[df.ne('Bad Input').all(axis=1)]
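A minimal sketch of this filter on a made-up frame. The column names and values are assumptions for illustration, not taken from the question:

```python
import pandas as pd

# Hypothetical sample data; the column names are not from the question.
df = pd.DataFrame({
    "name": ["alice", "Bad Input", "carol", "dave"],
    "city": ["Paris", "Rome", "Bad Input", "Oslo"],
})

# True where a cell differs from the marker string.
not_bad = df.ne("Bad Input")

# Keep only rows in which every cell differs from "Bad Input".
cleaned = df[not_bad.all(axis=1)]
print(cleaned)
```

To test only some columns, apply ne() to that subset (for example df[["name"]]) before calling all(axis=1).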

Related

filter rows based on a series of columns

I have a dataframe with 50 columns and 7777 rows. The first 4 columns are object type and all the remaining ones are int type. I would like to filter the dataframe for rows where all the columns from column 3 to column 50 are zero. Kindly help me filter this using Python.
Tried:
df.apply(lambda row: row[df.iloc[:,3:].isin(['0'])])
error:
TypeError: Indexing a Series with DataFrame is not supported, use the appropriate DataFrame column
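Your filter condition is: rows where every value in the columns from position 3 onward equals 0. Build that boolean mask and pass it to DataFrame.loc. Because those columns are int type, compare against the integer 0 rather than the string '0'. Note that iloc positions are zero-based, so adjust the slice start if needed so that only the int columns are included:
df.loc[(df.iloc[:, 3:] == 0).all(axis=1)]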
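A small sketch of this kind of filter, assuming the trailing columns really hold integers, so the comparison is against the number 0. The frame and its column names are invented, and selecting the columns by dtype instead of by position is a swapped-in choice that sidesteps the question of where the slice should start:

```python
import pandas as pd

# Hypothetical frame: 4 object columns followed by integer columns.
df = pd.DataFrame({
    "a": ["x", "y", "z"],
    "b": ["p", "q", "r"],
    "c": ["k", "l", "m"],
    "d": ["u", "v", "w"],
    "n1": [0, 1, 0],
    "n2": [0, 0, 0],
})

# Pick the numeric columns by dtype rather than by position.
numeric = df.select_dtypes(include="number")

# True for rows where every numeric column equals the integer 0.
all_zero = numeric.eq(0).all(axis=1)

zero_rows = df.loc[all_zero]      # rows that are all zero
nonzero_rows = df.loc[~all_zero]  # rows with at least one non-zero value
```

Use zero_rows or nonzero_rows depending on whether the all-zero rows are the ones to keep or to discard.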
See the pandas documentation on indexing and selecting data.

Drop rows in a dataframe based on the data type of columns

We have a dataset called df with a column called 'team_id'. df['team_id'].dtype is object.
As you can see from the attached image, most of the data in the team_id column is strings, except for two values of 58 which are integers. I want to drop all the rows related to 58; that is, how can I delete all the rows in the dataframe whose team_id is not a string (is 58)? Thanks!
My idea is like:
df_clear = df.drop(df[df['team_id'].dtype != object].index)
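You can use the str.isnumeric() method. The .str methods return NaN for elements that are not strings (such as the integer 58), so convert the column to strings first:
df_clear = df[~df['team_id'].astype(str).str.isnumeric()]
Or use the str.isalpha() method, which keeps only values made up entirely of letters:
df_clear = df[df['team_id'].astype(str).str.isalpha()]
Or use pd.to_numeric() with notna():
df_clear = df[~pd.to_numeric(df['team_id'], errors='coerce').notna()]
Note: after astype(str), a missing value becomes the string 'nan', which both string checks keep; call dropna() on 'team_id' first if those rows should go too. pd.to_numeric() works on the mixed column directly, but it treats NaN as non-numeric, so rows with a missing team_id are kept as well.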
Use to_numeric with errors='coerce', which turns non-numeric values into missing values, then keep those rows with Series.isna in boolean indexing:
df_clear = df[pd.to_numeric(df['team_id'], errors='coerce').isna()]
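As a rough illustration of the to_numeric approach on a mixed object column (the data and the score column are made up), together with an alternative that checks each element's Python type directly:

```python
import pandas as pd

# Hypothetical data: team_id is an object column mixing strings
# with the integer 58.
df = pd.DataFrame({
    "team_id": ["abc", 58, "xyz", 58, "def"],
    "score": [1, 2, 3, 4, 5],
})

# Values that cannot be parsed as numbers become NaN.
as_number = pd.to_numeric(df["team_id"], errors="coerce")

# Keep rows whose team_id is NOT numeric.
df_clear = df[as_number.isna()]

# Alternative: test the Python type of each element.
is_str = df["team_id"].map(lambda v: isinstance(v, str))
df_clear_by_type = df[is_str]
```

The two versions differ for a string such as "58": to_numeric parses it as a number and drops the row, while the isinstance check keeps it because it is a string.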

Deleting columns from a csv if it contains a certain value

I have a csv from which I want to drop the columns that have only '-' values in them. These are the columns I want to drop:
How can I do this?
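DataFrame.ne tests each cell for being different from '-', and DataFrame.all then checks that this holds for every row of a column. Pass the result to DataFrame.loc, where the first : means all rows and the second argument is the column mask. This keeps only the columns that contain no '-' at all; to drop just the columns that consist entirely of '-', use .any() in place of .all():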
df = df.loc[:, df.ne('-').all()]
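A short sketch contrasting .all() and .any() for the column mask, using an invented frame in place of the CSV (the column names are assumptions):

```python
import pandas as pd

# Hypothetical data standing in for the CSV contents.
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "only_dash": ["-", "-", "-"],
    "some_dash": ["a", "-", "c"],
})

# True where a cell is something other than "-".
is_not_dash = df.ne("-")

# Keep columns with no "-" anywhere: drops both dash columns.
no_dash_at_all = df.loc[:, is_not_dash.all()]

# Keep columns with at least one non-dash value:
# drops only the column made up entirely of "-".
not_only_dash = df.loc[:, is_not_dash.any()]
```

Which variant fits depends on whether a single '-' should disqualify a column or only a column that is '-' throughout.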

How to select rows with no missing data in python?

I can only find questions on here that are for selecting rows with missing data using pandas in python.
How can I select rows that are complete and have no missing values?
I am trying to use:
data.notnull(), which gives me True or False values per row, but I don't know how to actually select only the rows where all values are True for not being NA. I am also unsure whether notnull() treats rows with zeros as False; I would accept a zero in a row as a value, I am just looking to find rows with no NAs.
Without seeing your data, if it's in a dataframe df, and you want to drop rows with any missing values, try
newdf = df.dropna(how = 'any')
This is what pandas does by default, so should actually be the same as
newdf = df.dropna()
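For comparison, a minimal sketch (with made-up data) of building the selection by hand from notnull(), which also shows that zeros count as present values:

```python
import numpy as np
import pandas as pd

# Hypothetical data containing both zeros and missing values.
df = pd.DataFrame({
    "x": [0, 1, np.nan, 3],
    "y": [5.0, np.nan, 7.0, 0.0],
})

# notnull() is True for every cell that is not NA; a zero is not NA.
complete_mask = df.notnull().all(axis=1)

# Rows with no missing values, selected with the boolean mask.
complete_rows = df[complete_mask]

# dropna() with its defaults gives the same rows.
same_result = df.dropna()
```

Here the rows containing 0 are kept, because only NaN (or None) is treated as missing.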

Drop rows with no values after checking all columns in Pandas Python

I have a dataframe like below.
I would like to check all columns and delete rows if no values.
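You can use dropna:
df = df.dropna(how='all')
Calling df.dropna() with no arguments would instead drop every row that has at least one missing value.
See the pandas documentation for DataFrame.dropna for more info.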
Use the dropna() function for your dataframe.
df.dropna(axis=0, how="all")
axis=0 performs deletion on rows
how="all" deletes a row only if all of its columns are empty. Use "any" if you want the row deleted when any column is missing. You can also use the thresh=<int> parameter, which keeps only rows that have at least that many non-missing values.
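The three options can be compared side by side in a small sketch. The frame below is invented for illustration; note that thresh counts non-missing values per row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely empty, row 3 is complete.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [2.0, np.nan, 3.0, 5.0],
    "c": [np.nan, np.nan, np.nan, 6.0],
})

# Drop rows where every column is missing.
drop_empty = df.dropna(axis=0, how="all")

# Drop rows where at least one column is missing.
drop_any = df.dropna(axis=0, how="any")

# Keep rows that have at least 2 non-missing values.
at_least_two = df.dropna(axis=0, thresh=2)
```

In recent pandas versions how and thresh cannot be combined in a single call, so pick one of them per call.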
