How to discard names with a single character from data frame? - python

Can anyone tell me how I can remove all 'A's and other single-character names like this from the data frame? I also want to remove the XXXX rows from the data frame.

Use Series.str.len with Series.ne to perform boolean indexing.
If you want to delete the rows where name is 'A':
df[df['name'].ne('A') & df['year'].ne('XXXX')]
Or, to detect when the length of the string in column name is greater than one:
df[df['name'].str.len().gt(1) & df['year'].ne('XXXX')]
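A minimal sketch with made-up data, using the question's name and year columns:
import pandas as pd

df = pd.DataFrame({'name': ['A', 'Anna', 'B', 'Bob'],
                   'year': ['2001', 'XXXX', '2003', '2004']})

# keep names longer than one character and years other than 'XXXX'
print(df[df['name'].str.len().gt(1) & df['year'].ne('XXXX')])
#   name  year
# 3  Bob  2004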

To remove all the rows where column name holds a 1-character string, just do:
df = df.drop(df.index[df["name"].str.len().eq(1)], axis=0)
Similarly for the XXXX rows:
df = df.drop(df.index[df["year"].eq("XXXX")], axis=0)
And combined:
df = df.drop(df.index[df["name"].str.len().eq(1) | df["year"].eq("XXXX")],axis=0)
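On the same sample frame as above, the combined drop gives the identical result:
df = df.drop(df.index[df["name"].str.len().eq(1) | df["year"].eq("XXXX")], axis=0)
print(df)
#   name  year
# 3  Bob  2004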

Related

Remove characters from string in column name just for a few columns

Hi, I have a dataframe whose column names start with 75 characters that I don't need for a matplotlib chart, so I was trying this line:
dfr = dfr.iloc[:,1:11].rename(columns=lambda x: x[75:], inplace=True)
But when I print it, it gives me None. I only need to remove the prefix from the first 10 columns; the rest are fine. I'm not sure what is wrong, help please.
You are slicing the dataframe, renaming the sliced dataset's columns, and re-assigning the result to the original dataset, which has a different shape. Also, rename with inplace=True returns None, which is why the print shows None.
Try renaming using a dict:
new_names = dfr.iloc[:,0:10].rename(columns=lambda x: x[75:]).columns #first 10 columns
dfr.rename(columns=dict(zip(dfr.columns,new_names)),inplace=True)
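A small sketch of the idea, with a 7-character 'prefix_' standing in for the question's 75 characters and 2 columns standing in for 10:
import pandas as pd

dfr = pd.DataFrame([[1, 2, 3]], columns=['prefix_a', 'prefix_b', 'c'])

# zip stops at the shorter sequence, so only the first 2 columns are renamed
new_names = dfr.iloc[:, 0:2].rename(columns=lambda x: x[7:]).columns
dfr.rename(columns=dict(zip(dfr.columns, new_names)), inplace=True)
print(dfr.columns.tolist())  # ['a', 'b', 'c']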

How to drop columns which contains specific characters except one column?

A pandas dataframe has 5 columns containing 'verified' in the name. I want to drop all the columns that contain 'verified' except the one named 'verified_90'. I am trying the following code, but it removes every column that contains the word.
Column names: verified_30, verified_60, verified_90, verified_365, logo.verified, verified.at, etc.
df = df[df.columns.drop(list(df.filter(regex='verified')))]
You might be able to use a regex approach here:
df = df[df.columns.drop(list(df.filter(regex='^(?!verified_90$).*verified.*$')))]
Filter columns whose name does not contain 'verified' OR equals 'verified_90' with DataFrame.loc; here : means select all rows, and the boolean mask selects the columns:
df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
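A quick check of the mask with the question's column names (plus a made-up other column):
import pandas as pd

df = pd.DataFrame(columns=['verified_30', 'verified_60', 'verified_90',
                           'verified_365', 'logo.verified', 'other'])

# keep columns without 'verified' in the name, plus 'verified_90' itself
out = df.loc[:, ~df.columns.str.contains('verified') | (df.columns == 'verified_90')]
print(out.columns.tolist())  # ['verified_90', 'other']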

How to drop/delete/filter rows in pandas dataframe based on string pattern condition?

Say there's a data frame with shape (4000, 13), and dataframe["str_labels"] may have the "|" character in its values.
How do I filter the pandas dataframe by removing any rows (across all 13 columns) whose string value contains "|"?
example:
list(dataframe["str_labels"])=["abcd","aaa","op|gg","iku | gv"]
filtered_out = ["abcd", "aaa"]
## example code
dataframe["|" not in dataframe["str_labels"]]
# or
dataframe[dataframe["str_Labels"].str.contains("|")]
# ........etc
You should make a list of the characters that trigger a drop:
chars = ['<character>', r'\|', ...]
and then filter your df by
df = df[~df['your column'].str.contains('|'.join(chars))]
Note the \| for the pipe character.
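A runnable sketch of this on the question's sample values ('|'.join turns the list into one regex of alternatives):
import pandas as pd

df = pd.DataFrame({'str_labels': ['abcd', 'aaa', 'op|gg', 'iku | gv']})

chars = [r'\|']  # patterns that trigger a drop; extend as needed
mask = df['str_labels'].str.contains('|'.join(chars))
print(list(df[~mask]['str_labels']))  # ['abcd', 'aaa']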

How to remove commas from ALL the column in pandas at once

I have a data frame where all the columns are supposed to be numbers. While reading it, some of them were read with commas. I know a single column can be fixed by
df['x']=df['x'].str.replace(',','')
However, this works only for Series objects and not for the entire data frame. Is there an elegant way to apply it to the entire data frame, since every single entry should be a number?
P.S: To ensure I can str.replace, I have first converted the data frame to str by using
df.astype('str')
So I understand, I will have to convert them all to numeric once the comma is removed.
Numeric columns contain no ',', so converting to strings is not necessary; just use DataFrame.replace with regex=True for substring replacement:
df = df.replace(',','', regex=True)
Or:
df.replace(',','', regex=True, inplace=True)
And last, convert the string columns to numeric (thanks to @anki_91):
c = df.select_dtypes(object).columns
df[c] = df[c].apply(pd.to_numeric,errors='coerce')
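A small end-to-end sketch with made-up data:
import pandas as pd

# strings with thousands separators next to an already-numeric column
df = pd.DataFrame({'x': ['1,000', '2,500'], 'y': [3, 4]})

df = df.replace(',', '', regex=True)          # strip commas in string cells only
c = df.select_dtypes(object).columns          # remaining string columns
df[c] = df[c].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)  # both columns end up numeric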
Well, you can simply do (this assumes every column holds strings, e.g. after the df.astype('str') step above):
df = df.apply(lambda x: x.str.replace(',', ''))
Hope it helps!
In case you want to manipulate just one column:
df.column_name = df.column_name.apply(lambda x : x.replace(',',''))

Extract text enclosed between a delimiter and store it as a list in a separate column

I have a pandas dataframe with a text column in the format below. Some values/text are enclosed between ## markers. I want to find the text between the ## markers and extract it into a separate column as a list.
##fare_curr.currency####based_fare_90d.price##
htt://www.abcd.lol/abcd-Search?from:##based_best_flight_fare_90d.air##,to:##mbased_90d.water##,departure:##mbased_90d.date_1##TANYT&pas=ch:0Y&mode=search
Consider the above two strings to be two rows of the same column. I want to get a new column with a list [fare_curr.currency, based_fare_90d.price] in the first row and [based_best_flight_fare_90d.air, mbased_90d.water, based_90d.date_1] in the second row.
Given this df
df = pd.DataFrame({'data':
    ['##fare_curr.currency####based_fare_90d.price##',
     'htt://www.abcd.lol/abcd-Search?from:##based_best_flight_fare_90d.air##,to:##mbased_90d.water##,departure:##mbased_90d.date_1##TANYT&pas=ch:0Y&mode=search']})
You can get the desired result in a new column using
df['new'] = pd.Series(df.data.str.extractall('##(.*?)##').unstack().values.tolist())
You get
data new
0 ##fare_curr.currency####based_fare_90d.price## [fare_curr.currency, based_fare_90d.price, None]
1 htt://www.abcd.lol/abcd-Search?from:##based_be... [based_best_flight_fare_90d.air, mbased_90d.wa...
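If the None padding from unstack is unwanted, Series.str.findall with the same pattern gives a plain per-row list of matches instead:
# each row's captures become a list, with no padding across rows
df['new'] = df['data'].str.findall('##(.*?)##')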
