How to drop columns which contains specific characters except one column? - python

Pandas dataframe is having 5 columns which contains 'verifier' in it. I want to drop all those columns which contained 'verified in it except the column named 'verified_90'(Pandas dataframe). I am trying following code but it is removing all columns which contains that specific word.
Column names: verified_30 verified_60 verified_90 verified_365 logo.verified. verified.at etc
''''
df = df[df.columns.drop(list(df.filter(regex='Test')))]
''''

You might be able to use a regex approach here:
df = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]

Filter columns with not verified OR with verified_90 in DataFrame.loc, here : means select all rows and columns by mask:
df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]

Related

Deleting columns from a csv if it contains a certain value

I have a csv from which I want to drop the columns which has only '-' values in it. These are the columns I want to drop:
How can I do this?
Use DataFrame.ne for test not - value with DataFrame.all for test if not exist in all rows anf filter by DataFrame.loc - first : means al rows and second is mask for filter columns:
df = df.loc[:, df.ne('-').all()]

Collapsing values of a Pandas column based on Non-NA value of other column

I have a data like this in a csv file which I am importing to pandas df
I want to collapse the values of Type column by concatenating its strings to one sentence and keeping it at the first row next to date value while keeping rest all rows and values same.
As shown below.
Edit:
You can try ffill + transform
df1=df.copy()
df1[['Number', 'Date']]=df1[['Number', 'Date']].ffill()
df1.Type=df1.Type.fillna('')
s=df1.groupby(['Number', 'Date']).Type.transform(' '.join)
df.loc[df.Date.notnull(),'Type']=s
df.loc[df.Date.isnull(),'Type']=''

Is there a function that can remove multiple rows based on multiple specific column values in a pandas dataframe?

I have a particular Pandas dataframe that has multiple different string categories in a particular column - 'A'. I want to create a new dataframe with only rows that contain 7 separate categories from column A out of about 15.
I know that I can individually remove/add categories using:
df1 = df[df.Category != 'a']
but I also tried using a list to try and do it in a single line, like such:
df1 = df[df.Category = ['x','y','z']]
but that gave me a syntax error. Is there any way to perform this function?
try:
df1 = df[df.Category.isin(['x','y','z'])]

How to discard names with a single character from data frame?

Can anyone tell me how can I remove all 'A's and other data like this from the data frame? and I also want to remove XXXX rows from the data frame.
Use Series.str.len with Series.ne to performance a boolean indexing
if you want to delete the column where name is A :
df[df['name'].ne('A') & df['year'].ne('XXXX'))]
to detect when lenght of string in column name is greater than one.
df[df['name'].str.len().gt(1) & df['year'].ne('XXXX')]
In order to remove all the lines where in column name you have 1-character long string just do:
df = df.drop(df.index[df["name"].str.len().eq(1)], axis=0)
Similarly for the XXXX rows:
df = df.drop(df.index[df["year"].eq("XXXX")], axis=0)
And combined:
df = df.drop(df.index[df["name"].str.len().eq(1) | df["year"].eq("XXXX")],axis=0)

Get the column index as a value when value exists

I need do create some additional columns to my table or separate table based on following:
I have a table
and I need to create additional columns where the column indexes (names of columns) will be inserted as values. Like this:
How to do it in pandas? Any ideas?
Thank you
If need matched columns only for 1 values:
df = (df.set_index('name')
.eq(1)
.dot(df.columns[1:].astype(str) + ',')
.str.rstrip(',')
.str.split(',', expand=True)
.add_prefix('c')
.reset_index())
print (df)
Explanation:
Idea is create boolean mask with True for values which are replaced by columns names - so compare by DataFrame.eq by 1 and used matrix multiplication by DataFrame.dot by all columns without first with added separator. Then remove last traling separator by Series.str.rstrip and use Series.str.split for new column, changed columns names by DataFrame.add_prefix.
Another solution:
df1 = df.set_index('name').eq(1).apply(lambda x: x.index[x].tolist(), 1)
df = pd.DataFrame(df1.values.tolist(), index=df1.index).add_prefix('c').reset_index()

Categories

Resources