I'm trying to permanently delete all rows that contain a given string. I tried this code; it runs, but calling df.head() afterwards shows that nothing was dropped.
df[df["column"].str.contains('text')==False]
Try assigning the result back to df, like this:
df = df[df["column"].str.contains('text')==False]
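The same idea as a minimal runnable sketch. The sample data is made up; "column" and 'text' are the placeholders from the question. It uses ~ to negate the mask instead of comparing with False, and na=False so that missing values do not break the negation.

```python
import pandas as pd

# Hypothetical sample data; "column" and "text" mirror the placeholders above.
df = pd.DataFrame({"column": ["some text here", "clean", None, "more text"],
                   "value": [1, 2, 3, 4]})

# Boolean mask of rows that contain the substring. na=False treats missing
# values as "does not contain", so ~mask works and those rows are kept.
mask = df["column"].str.contains("text", na=False)

# Filtering returns a new DataFrame, so assign it back to keep the result.
df = df[~mask].reset_index(drop=True)

print(df)
```

Without the assignment, the filtered frame is computed and then discarded, which is why df.head() still showed every row.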
This is probably an easy fix that just eludes me right now.
I have an Excel file with the following content.
From it I want to filter out the "Num-" prefix from the rest. For simplicity's sake I use .str here.
df_test = pd.read_excel(r'C:\...\test.xlsx')
df_test = df_test.filter(like='Order_Number', axis=1)
df_test = df_test['Order_Number'].str[4:]
df_test.head()
The output comes out without the title Order_Number though and I am not sure why. How can I preserve it without adding it manually back?
It appears you are assigning the new values of the 'Order_Number' column to the entire dataframe variable, instead of assigning them to the actual column. Try:
df_test['Order_Number'] = df_test['Order_Number'].str[4:]
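A small sketch of the difference, using made-up order numbers since the Excel sheet is not shown. Selecting a single column and slicing it yields a Series, and a Series prints its name at the bottom rather than as a header; assigning back into the column keeps the DataFrame and its header.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet, which is not shown in the post.
df_test = pd.DataFrame({"Order_Number": ["Num-1001", "Num-1002", "Num-1003"]})

# Assign the sliced strings back to the column, not to the whole variable,
# so df_test stays a DataFrame and keeps its column header.
df_test["Order_Number"] = df_test["Order_Number"].str[4:]

print(df_test.head())
```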
I am replacing values of an existing dataframe in Python during a for loop. My original dataframe is of this format:
After trying to update the items in the "slice_file_name" and "fsID" via these pandas replace commands:
df1['slice_file_name'] = df1['slice_file_name'].replace(to_replace=str(row["slice_file_name"]),value=f'{actual_filename}_{directory}.wav')
fsID_Name = str(row["fsID"])
df1['fsID'] = df1['fsID'].replace(to_replace=str(row["fsID"]), value=f'{fsID_Name}_{directory}')
Only the "slice_file_name" gets updated correctly:
The "fsID" does not get updated correctly. Can you tell me what I am doing wrong here?
I want to update the "fsID" column as follows: for the first data row, for example, "fsID" should be equal to 102305_TimeShift-10pct. I can see in my IDE that f'{fsID_Name}_{directory}' gives the correct string, but the fsID cell is not updated. How can I update the fsID cell accordingly?
Thanks!
If I understand correctly what you want for 'fsID', would this work (after 'slice_file_name' is updated)?
df1['fsID']=df1['fsID']+'_'+df1['slice_file_name'].str.split('_',expand=True)[1].str.replace('.wav','',regex=False)
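One possible cause, assuming "fsID" was read in as an integer column (the post does not show the dtypes): replace is given the string str(row["fsID"]), which does not equal the integer stored in the cell, so nothing matches. Under that assumption, a sketch that converts the column to strings and appends the suffix to every row at once could look like this. The file names are made up, and directory takes the value from the question's example.

```python
import pandas as pd

# Hypothetical data; the original dataframe is not shown in the post.
df1 = pd.DataFrame({
    "slice_file_name": ["102305-6-0-0.wav", "102842-3-1-0.wav"],
    "fsID": [102305, 102842],  # assumed to be stored as integers
})
directory = "TimeShift-10pct"  # value taken from the question's example

# Convert to string first, then append the suffix to every row at once.
df1["fsID"] = df1["fsID"].astype(str) + "_" + directory

print(df1)
```

This version needs no loop. If the suffix differs per row, it would have to come from another column instead of a single variable.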
I am using pandas for the first time.
df.groupby(np.arange(len(df))//10).mean()
I used the code above, which works to take the average of every 10 rows. I want to save this updated dataframe, but df.to_csv saves the original dataframe that I imported.
I also want to multiply one column from my df (df.groupby dataframe essentially) with a number and make a new column. How do I do that?
The operation:
df.groupby(np.arange(len(df))//10).mean()
might return the averaged dataframe you want, but it won't change the original dataframe. Instead you'll need to do:
df_new = df.groupby(np.arange(len(df))//10).mean()
You could assign it to the same name if you want. Alternatively, some operations that you might expect to modify the dataframe accept an inplace argument, which normally defaults to False. See this question on SO.
To create a new column that is an existing column multiplied by a number, you'd do:
df_new['new_col'] = df_new['existing_col']*a_number
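Putting the pieces together as a rough sketch: the data, the column names and the factor 2.5 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 25 rows, so the last group holds only 5 rows.
df = pd.DataFrame({"existing_col": np.arange(25, dtype=float)})

# Rows 0-9 form group 0, rows 10-19 form group 1, and so on.
df_new = df.groupby(np.arange(len(df)) // 10).mean()

# New column derived from the averaged data; 2.5 is an arbitrary factor.
df_new["new_col"] = df_new["existing_col"] * 2.5

# Serialize the averaged frame, not the original df. With no path given,
# to_csv returns the CSV text; pass a file path to write a file instead.
csv_text = df_new.to_csv(index=False)
print(csv_text)
```

The key point is that to_csv is called on df_new, the object that holds the group means, rather than on df.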
I am looking to delete a row in a dataframe that is imported into python by pandas.
If you look at the sheet below, the first column has the same name multiple times. So the condition is: if the first column's value reappears in a later row, delete that row. If not, keep that row in the dataframe.
My final output should look like the following:
Presently I am doing it by converting each column into a list and deleting entries by index. I am hoping there is an easier way than this workaround.
df = df.drop_duplicates(subset=[df.columns[0]])
should do the trick.
Try the following code:
df.drop_duplicates(subset='columnName', keep='first', inplace=True)
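A short sketch of both readings of the question, on made-up data. drop_duplicates removes every later repeat of a value in the first column. If the intent is only to drop a row that repeats the row directly above it, comparing against a shifted copy of the column is one way to do that.

```python
import pandas as pd

# Hypothetical data; the sheet in the question is not shown.
df = pd.DataFrame({"name": ["A", "A", "B", "A", "C", "C"],
                   "value": [1, 2, 3, 4, 5, 6]})
first_col = df.columns[0]

# Keep only the first row for each distinct value in the first column.
unique_rows = df.drop_duplicates(subset=first_col, keep="first")

# Alternative: drop a row only when it repeats the row directly above it.
same_as_previous = df[first_col].eq(df[first_col].shift())
no_consecutive = df[~same_as_previous]

print(unique_rows)
print(no_consecutive)
```

Both calls return new frames, so assign the result back to df (or use inplace=True with drop_duplicates) to keep it.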
I imported a .csv file with a single column of data into a dataframe that I am trying to clean up by splitting the column based on various string occurrences within the cells. I've tried numerous means to split the column, but can't seem to get it to work. My latest attempt was using the following:
df.loc[:,'DataCol'] = df.DataCol.str.split(pat=':\n',expand=True)
df
The result is a dataframe that is still one column and completely unchanged. What am I doing wrong? This is my first time doing anything like this so please forgive the simple question.
Assigning through df.loc[:, 'DataCol'] does not seem to take effect here. Try replacing the code below with df['DataCol'], which references the column in the original dataframe directly.
df.loc[:,'DataCol']
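Note also that with expand=True, str.split returns a DataFrame with one column per piece, which does not fit into a single existing column. A minimal sketch of assigning the pieces to new columns follows. It assumes every cell contains the ":\n" delimiter, and the names "key" and "value" are hypothetical.

```python
import pandas as pd

# Hypothetical data: each cell holds a label and a value separated by ":\n".
df = pd.DataFrame({"DataCol": ["name:\nAlice", "city:\nParis"]})

# expand=True returns a DataFrame with one column per piece, so assign it
# to as many new columns as there are pieces; n=1 caps the split at two.
df[["key", "value"]] = df["DataCol"].str.split(":\n", n=1, expand=True)

print(df)
```

If the frame still looks unchanged after this, it is worth checking whether the delimiter in the file is really ":\n" and not, for example, ": " or ":\r\n".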