My pandas DataFrame has 5 columns whose names contain 'verified'. I want to drop every column whose name contains 'verified' except the column named 'verified_90'. I am trying the following code, but it removes all columns containing that word, including 'verified_90'.
Column names: verified_30, verified_60, verified_90, verified_365, logo.verified, verified.at, etc.
df = df[df.columns.drop(list(df.filter(regex='verified')))]
You might be able to use a regex approach here:
df = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
Alternatively, filter the columns that do not contain 'verified' OR that are exactly 'verified_90' with DataFrame.loc; here : means select all rows, and the boolean mask selects the columns:
df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
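For instance, with a toy frame built from the column names in the question (the real data isn't shown in the post, so the values here are invented), both approaches keep only verified_90:
import pandas as pd

# toy frame using the column names from the question; values are invented
df = pd.DataFrame(1, index=[0],
                  columns=['verified_30', 'verified_60', 'verified_90',
                           'verified_365', 'logo.verified', 'verified.at'])

# regex approach: drop every 'verified' column except 'verified_90'
out1 = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
# mask approach: keep columns without 'verified' or equal to 'verified_90'
out2 = df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]

print(out1.columns.tolist())  # ['verified_90']
print(out2.columns.tolist())  # ['verified_90']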
I have a CSV from which I want to drop the columns that contain only '-' values. These are the columns I want to drop:
How can I do this?
Use DataFrame.ne to test for values not equal to '-', with DataFrame.all to keep the columns where that holds in every row, and filter with DataFrame.loc: the first : means all rows, and the second argument is the mask that filters the columns:
df = df.loc[:, df.ne('-').all()]
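A minimal, self-contained sketch (the column names here are invented; b holds only '-'):
import pandas as pd

# invented sample; column 'b' holds only '-' values
df = pd.DataFrame({'a': [1, 2], 'b': ['-', '-'], 'c': ['x', 'y']})

df = df.loc[:, df.ne('-').all()]
print(df.columns.tolist())  # ['a', 'c']
Note that this mask keeps only the columns with no '-' anywhere; if a column may mix '-' with real values and should survive, df.loc[:, ~df.eq('-').all()] drops strictly the all-'-' columns.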
I have data like this in a CSV file which I am importing into a pandas DataFrame.
I want to collapse the values of the Type column by concatenating its strings into one sentence, placed in the first row next to the Date value, while keeping all the other rows and values the same.
As shown below.
You can try ffill + transform:
df1 = df.copy()
# forward-fill the keys so every row knows its Number/Date block
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()
df1.Type = df1.Type.fillna('')
# join each block's Type strings into one sentence, broadcast to all rows
s = df1.groupby(['Number', 'Date']).Type.transform(' '.join)
# keep the joined sentence on the block's first row, blank the rest
df.loc[df.Date.notnull(), 'Type'] = s
df.loc[df.Date.isnull(), 'Type'] = ''
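Here is a runnable sketch with invented data in the shape the question describes (Number and Date filled only on each block's first row):
import numpy as np
import pandas as pd

# invented sample data
df = pd.DataFrame({
    'Number': [1, np.nan, np.nan, 2, np.nan],
    'Date': ['2020-01-01', np.nan, np.nan, '2020-01-02', np.nan],
    'Type': ['foo', 'bar', 'baz', 'spam', 'eggs'],
})

df1 = df.copy()
df1[['Number', 'Date']] = df1[['Number', 'Date']].ffill()
df1.Type = df1.Type.fillna('')
s = df1.groupby(['Number', 'Date']).Type.transform(' '.join)
df.loc[df.Date.notnull(), 'Type'] = s
df.loc[df.Date.isnull(), 'Type'] = ''

# Type is now collapsed onto each block's first row:
#    Number        Date         Type
# 0     1.0  2020-01-01  foo bar baz
# 3     2.0  2020-01-02    spam eggs
# (the remaining rows keep NaN keys and an empty Type)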
I have a pandas DataFrame with multiple different string categories in a particular column, 'A'. I want to create a new DataFrame with only the rows matching 7 of the roughly 15 categories in that column.
I know that I can individually remove/add categories using:
df1 = df[df.Category != 'a']
but I also tried using a list to do it in a single line, like so:
df1 = df[df.Category = ['x','y','z']]
but that gave me a syntax error. Is there any way to do this?
Try Series.isin:
df1 = df[df.Category.isin(['x','y','z'])]
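A quick sketch with made-up categories:
import pandas as pd

# made-up categories standing in for the ~15 real ones
df = pd.DataFrame({'Category': ['x', 'a', 'y', 'b', 'z'], 'value': range(5)})

df1 = df[df.Category.isin(['x', 'y', 'z'])]
print(df1)
#   Category  value
# 0        x      0
# 2        y      2
# 4        z      4
To exclude a list of categories instead, negate the mask: df[~df.Category.isin(['x', 'y', 'z'])].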
Can anyone tell me how I can remove all the 'A's and other single-character names like this from the DataFrame? I also want to remove the 'XXXX' rows.
Use Series.str.len with Series.ne to perform boolean indexing.
If you want to delete the rows where name is 'A':
df[df['name'].ne('A') & df['year'].ne('XXXX')]
Or, to detect when the length of the string in column name is greater than one:
df[df['name'].str.len().gt(1) & df['year'].ne('XXXX')]
In order to remove all the rows where column name holds a 1-character string, just do:
df = df.drop(df.index[df["name"].str.len().eq(1)], axis=0)
Similarly for the XXXX rows:
df = df.drop(df.index[df["year"].eq("XXXX")], axis=0)
And combined:
df = df.drop(df.index[df["name"].str.len().eq(1) | df["year"].eq("XXXX")], axis=0)
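For example, with an invented name/year frame (single-character names and 'XXXX' years should go):
import pandas as pd

# invented sample
df = pd.DataFrame({'name': ['A', 'Alice', 'B', 'Bob'],
                   'year': ['2001', 'XXXX', '2003', '2004']})

df = df.drop(df.index[df["name"].str.len().eq(1) | df["year"].eq("XXXX")], axis=0)
print(df)
#   name  year
# 3  Bob  2004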
I need to create some additional columns in my table, or a separate table, based on the following:
I have a table
and I need to create additional columns where the column indexes (the names of the columns) are inserted as values, like this:
How to do it in pandas? Any ideas?
Thank you
If you need the matched column names only for the 1 values:
df = (df.set_index('name')
.eq(1)
.dot(df.columns[1:].astype(str) + ',')
.str.rstrip(',')
.str.split(',', expand=True)
.add_prefix('c')
.reset_index())
print(df)
Explanation:
The idea is to create a boolean mask with True for the values that should be replaced by column names: compare against 1 with DataFrame.eq, then use matrix multiplication via DataFrame.dot with all columns except the first, each with a separator appended. Then remove the trailing separator with Series.str.rstrip, expand into new columns with Series.str.split, and rename the new columns with DataFrame.add_prefix.
Another solution:
df1 = df.set_index('name').eq(1).apply(lambda x: x.index[x].tolist(), axis=1)
df = pd.DataFrame(df1.values.tolist(), index=df1.index).add_prefix('c').reset_index()
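To see it in action, here is a sketch with an invented frame of 0/1 indicator columns (the real table isn't shown in the post):
import pandas as pd

# invented 0/1 indicator table
df = pd.DataFrame({'name': ['a', 'b'],
                   '2017': [1, 0],
                   '2018': [1, 1],
                   '2019': [0, 1]})

out = (df.set_index('name')
         .eq(1)
         .dot(df.columns[1:].astype(str) + ',')
         .str.rstrip(',')
         .str.split(',', expand=True)
         .add_prefix('c')
         .reset_index())
print(out)
#   name    c0    c1
# 0    a  2017  2018
# 1    b  2018  2019
The apply-based alternative yields the same frame; rows with fewer matches are padded with None in the trailing columns.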