I have a csv from which I want to drop the columns which has only '-' values in it. These are the columns I want to drop:
How can I do this?
Use DataFrame.ne to test for values not equal to '-', DataFrame.all to check that the condition holds in every row, and filter with DataFrame.loc, where the first : selects all rows and the second argument is the boolean mask that selects columns:
df = df.loc[:, df.ne('-').all()]
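A minimal sketch on a hypothetical three-column frame. Note that the mask keeps only columns with no '-' anywhere, so a column with even a single '-' would also be dropped:

```python
import pandas as pd

# Hypothetical frame: column 'b' holds only '-' placeholders
df = pd.DataFrame({'a': [1, 2], 'b': ['-', '-'], 'c': ['x', 'y']})

# Keep only the columns where every value differs from '-'
df = df.loc[:, df.ne('-').all()]
print(list(df.columns))  # ['a', 'c']
```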
A pandas dataframe has 5 columns whose names contain 'verified'. I want to drop every column containing 'verified' except the column named 'verified_90'. I am trying the following code, but it removes all columns containing that word.
Column names: verified_30, verified_60, verified_90, verified_365, logo.verified., verified.at, etc.
df = df[df.columns.drop(list(df.filter(regex='verified')))]
You might be able to use a regex approach here:
df = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
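A quick sketch with the column names from the question (plus a hypothetical 'id' column): the negative lookahead spares 'verified_90', and every other column containing 'verified' is dropped:

```python
import pandas as pd

# Hypothetical frame with only column labels; the data itself does not matter here
df = pd.DataFrame(columns=['verified_30', 'verified_60', 'verified_90',
                           'verified_365', 'logo.verified.', 'verified.at', 'id'])

# (?!verified_90$) excludes exactly 'verified_90' from the match
out = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
print(list(out.columns))  # ['verified_90', 'id']
```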
Alternatively, keep columns that do not contain 'verified' OR are named 'verified_90' with DataFrame.loc, where : selects all rows and the mask selects columns:
df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
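The same sketch with the mask-based approach (again assuming a hypothetical 'id' column alongside the names from the question):

```python
import pandas as pd

df = pd.DataFrame(columns=['verified_30', 'verified_60', 'verified_90',
                           'verified_365', 'logo.verified.', 'verified.at', 'id'])

# Keep columns without 'verified', plus the single exception 'verified_90'
out = df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
print(list(out.columns))  # ['verified_90', 'id']
```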
I want to drop rows based on a condition where the value is "Bad Input". The datatype of those columns is object.
I have tried the df.drop method with label input, but it doesn't work.
Please advise.
Keep only the rows that do not contain the value 'Bad Input' in any column by combining DataFrame.ne with DataFrame.all, then filter with boolean indexing:
df = df[df.ne('Bad Input').all(axis=1)]
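A minimal sketch on a hypothetical two-column frame; a row is dropped as soon as any of its columns holds 'Bad Input':

```python
import pandas as pd

df = pd.DataFrame({'a': ['ok', 'Bad Input', 'fine'],
                   'b': ['x', 'y', 'Bad Input']})

# all(axis=1) is True only for rows where every column differs from 'Bad Input'
out = df[df.ne('Bad Input').all(axis=1)]
print(out)  # only the first row survives
```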
A pandas dataframe has 4 columns:
df.columns = ['col1', 'col2', 'question', 'answer']
How do I index a single entry of the 'answer' column, by indexing the dataframe based on criteria being met for the first two columns?
i.e.:
df['col1'=='apple' and 'col2'=='guitar'].answer
You can select values after filtering, but it is not recommended: if you later set values this way, you may trigger a SettingWithCopyWarning:
s = df.loc[(df['col1']=='apple') & (df['col2']=='guitar')].answer
Better way is use DataFrame.loc for filter by mask and by column name:
s = df.loc[(df['col1']=='apple') & (df['col2']=='guitar'), 'answer']
Or using DataFrame.query:
s = df.query("col1=='apple' and col2=='guitar'").answer
The output is a Series with one or more values; if you need the first one as a scalar:
first = s.iat[0]
If you need a solution that also works when there is no match:
first = next(iter(s), 'no match')
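A sketch on a hypothetical four-column frame, showing both the recommended .loc selection and the no-match fallback:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['apple', 'pear'],
                   'col2': ['guitar', 'drums'],
                   'question': ['q1', 'q2'],
                   'answer': ['a1', 'a2']})

# Filter by mask and select the 'answer' column in one .loc call
s = df.loc[(df['col1'] == 'apple') & (df['col2'] == 'guitar'), 'answer']
first = next(iter(s), 'no match')
print(first)  # 'a1'

# With no matching row, the iterator is empty and the default is returned
empty = df.loc[(df['col1'] == 'apple') & (df['col2'] == 'drums'), 'answer']
print(next(iter(empty), 'no match'))  # 'no match'
```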
I have a dataframe like below.
I would like to check all columns and delete rows that have no values at all.
You can check with dropna
df = df.dropna(how='all')
df.dropna()
Check the Pandas Docs here for more info
Use the dropna() function for your dataframe.
df.dropna(axis=0, how="all")
axis=0 performs deletion on rows
how="all" deletes a row only if every column in that row is empty. Use "any" if you want the row deleted when any column is missing. You can also use the thresh=<int> parameter to keep only rows that have at least that many non-NA values.
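A small sketch contrasting the three variants on a hypothetical frame with one fully empty row and one partially empty row:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, np.nan, np.nan],
                   'b': [2, np.nan, 3]})

print(len(df.dropna(how='all')))  # 2: drops only the all-NaN middle row
print(len(df.dropna(how='any')))  # 1: drops every row containing a NaN
print(len(df.dropna(thresh=1)))   # 2: keeps rows with at least 1 non-NaN value
```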
I need to create some additional columns in my table, or a separate table, based on the following:
I have a table
and I need to create additional columns where the column indexes (names of columns) will be inserted as values. Like this:
How to do it in pandas? Any ideas?
Thank you
If you need the matched column names only for values equal to 1:
df = (df.set_index('name')
.eq(1)
.dot(df.columns[1:].astype(str) + ',')
.str.rstrip(',')
.str.split(',', expand=True)
.add_prefix('c')
.reset_index())
print(df)
Explanation:
The idea is to create a boolean mask that is True for the values to be replaced by column names: compare against 1 with DataFrame.eq, then use matrix multiplication via DataFrame.dot with all columns except the first, each with a separator appended. Then remove the trailing separator with Series.str.rstrip, split into new columns with Series.str.split, and rename the columns with DataFrame.add_prefix.
Another solution:
df1 = df.set_index('name').eq(1).apply(lambda x: x.index[x].tolist(), axis=1)
df = pd.DataFrame(df1.values.tolist(), index=df1.index).add_prefix('c').reset_index()
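A worked sketch of the dot-product approach on a hypothetical two-row input, where a 1 marks membership in a column; rows with fewer matches are padded with missing values after the split:

```python
import pandas as pd

# Hypothetical input: 'a' belongs to x and y, 'b' only to y
df = pd.DataFrame({'name': ['a', 'b'],
                   'x': [1, 0],
                   'y': [1, 1]})

out = (df.set_index('name')
         .eq(1)
         .dot(df.columns[1:].astype(str) + ',')  # True picks up 'x,' / 'y,'
         .str.rstrip(',')                        # drop the trailing separator
         .str.split(',', expand=True)            # one matched name per column
         .add_prefix('c')
         .reset_index())
print(out)
```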