Deleting columns from a csv if they contain a certain value - python

I have a csv from which I want to drop any column that contains only '-' values. These are the columns I want to drop:
How can I do this?

Use DataFrame.ne to test for values not equal to '-', together with DataFrame.all to check that the condition holds in every row, and filter with DataFrame.loc: the first : selects all rows, and the second argument is the boolean mask that filters the columns:
df = df.loc[:, df.ne('-').all()]

Related

How to drop columns which contains specific characters except one column?

A pandas dataframe has 5 columns whose names contain 'verified'. I want to drop all columns containing 'verified' except the one named 'verified_90'. I am trying the following code, but it removes every column that contains the word.
Column names: verified_30, verified_60, verified_90, verified_365, logo.verified, verified.at, etc.
df = df[df.columns.drop(list(df.filter(regex='Test')))]
You might be able to use a regex approach here:
df = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
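For instance, with a few hypothetical column names from the question, the negative lookahead spares 'verified_90' while the rest of the pattern matches every other name containing 'verified':

```python
import pandas as pd

# Hypothetical frame using some of the column names from the question
cols = ['verified_30', 'verified_90', 'logo.verified', 'other']
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# Drop 'verified' columns except 'verified_90' via a negative lookahead
df = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
print(list(df.columns))  # ['verified_90', 'other']
```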
Filter the columns that do not contain 'verified', or that are exactly 'verified_90', using DataFrame.loc; here : selects all rows and the mask selects the columns:
df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
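A quick check of the mask approach with made-up column names:

```python
import pandas as pd

# Hypothetical sample frame with column names like those in the question
cols = ['verified_30', 'verified_60', 'verified_90', 'other']
df = pd.DataFrame([[1, 2, 3, 4]], columns=cols)

# Keep columns that do not contain 'verified', plus 'verified_90' itself
out = df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
print(list(out.columns))  # ['verified_90', 'other']
```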

drop rows based on string in pandas dataframe

I want to drop rows where the value in any column is "Bad Input". The datatype of those columns is object.
I have tried the df.drop method with a label input, but it doesn't work.
Please advise.
Keep only the rows that do not contain the value 'Bad Input', using DataFrame.ne with DataFrame.all and boolean indexing:
df = df[df.ne('Bad Input').all(axis=1)]
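For example, with hypothetical data containing a 'Bad Input' marker:

```python
import pandas as pd

# Hypothetical sample data: the middle row contains 'Bad Input'
df = pd.DataFrame({'a': ['ok', 'Bad Input', 'ok'], 'b': ['x', 'y', 'z']})

# Keep rows where no column equals 'Bad Input'
df = df[df.ne('Bad Input').all(axis=1)]
print(len(df))  # 2 - the offending row is dropped
```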

How to index a pandas DataFrame element in last column based on criteria being met in two other columns?

A pandas dataframe has 4 columns:
df.columns = ['col1', 'col2', 'question', 'answer']
How do I index a single entry of the 'answer' column, by filtering the dataframe on criteria for the first two columns?
i.e.:
df['col1'=='apple' and 'col2'=='guitar'].answer
You can select values after filtering, but this is not recommended: if you later set values the same way, you may get a SettingWithCopyWarning:
s = df.loc[(df['col1']=='apple') & (df['col2']=='guitar')].answer
Better way is use DataFrame.loc for filter by mask and by column name:
s = df.loc[(df['col1']=='apple') & (df['col2']=='guitar'), 'answer']
Or using DataFrame.query:
s = df.query("col1=='apple' and col2=='guitar'").answer
The output is a Series with one or more values; if you need the first one as a scalar:
first = s.iat[0]
If you need a solution that also works when there is no match:
first = next(iter(s), 'no match')
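A small made-up example showing both the scalar access and the no-match fallback:

```python
import pandas as pd

# Hypothetical sample data with the four columns from the question
df = pd.DataFrame({'col1': ['apple', 'pear'],
                   'col2': ['guitar', 'drum'],
                   'question': ['q1', 'q2'],
                   'answer': ['a1', 'a2']})

# Filter by mask and column name in one .loc call
s = df.loc[(df['col1'] == 'apple') & (df['col2'] == 'guitar'), 'answer']
print(s.iat[0])  # a1

# Fallback value when nothing matches
empty = df.loc[(df['col1'] == 'banana') & (df['col2'] == 'guitar'), 'answer']
print(next(iter(empty), 'no match'))  # no match
```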

Drop rows with no values after checking all columns in Pandas Python

I have a dataframe like below.
I would like to check all columns and delete the rows that have no values.
You can do this with dropna:
df = df.dropna(how = 'all')
df.dropna()
Check the Pandas Docs here for more info
Use the dropna() function for your dataframe.
df.dropna(axis=0, how="all")
axis=0 performs deletion on rows
how="all" deletes a row only if all of its columns are empty. Use "any" if you want the row deleted when any column is missing. You can also pass the thresh=<int> parameter, which keeps only the rows that have at least that many non-missing values.
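A short sketch with a made-up frame illustrating the three variants:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the middle row is entirely NaN, the last is partial
df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [4, np.nan, np.nan]})

print(len(df.dropna(how='all')))  # 2 - drops only the all-NaN row
print(len(df.dropna(how='any')))  # 1 - drops every row with any NaN
print(len(df.dropna(thresh=2)))   # 1 - keeps rows with at least 2 non-NaN values
```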

Get the column index as a value when value exists

I need do create some additional columns to my table or separate table based on following:
I have a table
and I need to create additional columns where the column indexes (names of columns) will be inserted as values. Like this:
How to do it in pandas? Any ideas?
Thank you
If you need the matched column names only for values equal to 1:
df = (df.set_index('name')
.eq(1)
.dot(df.columns[1:].astype(str) + ',')
.str.rstrip(',')
.str.split(',', expand=True)
.add_prefix('c')
.reset_index())
print(df)
Explanation:
The idea is to create a boolean mask with True for the values that should be replaced by column names: compare against 1 with DataFrame.eq, then use matrix multiplication via DataFrame.dot with all columns except the first, each with a separator appended. Then remove the trailing separator with Series.str.rstrip, split into new columns with Series.str.split, and rename the columns with DataFrame.add_prefix.
Another solution:
df1 = df.set_index('name').eq(1).apply(lambda x: x.index[x].tolist(), axis=1)
df = pd.DataFrame(df1.values.tolist(), index=df1.index).add_prefix('c').reset_index()
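A worked example of the dot-based solution with a hypothetical indicator table (the names and 0/1 values are made up):

```python
import pandas as pd

# Hypothetical indicator table: 1 marks membership in a column
df = pd.DataFrame({'name': ['n1', 'n2'],
                   'A': [1, 0],
                   'B': [1, 1],
                   'C': [0, 1]})

# Boolean mask dotted with "ColumnName," strings concatenates matched names
out = (df.set_index('name')
         .eq(1)
         .dot(df.columns[1:].astype(str) + ',')
         .str.rstrip(',')
         .str.split(',', expand=True)
         .add_prefix('c')
         .reset_index())
print(out)
#   name c0    c1
# 0   n1  A     B
# 1   n2  B     C
```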
