Change Names of Column after iterating over every column names in pandas - python

I want to rename every column of my dataframe by iterating over the column names.
I can change the column names one by one, but I want to use a for loop to change all of them.
for i in range(0,len(flattened.columns)):
    flattened.rename(columns={flattened.columns[i]: "P" + str(i)})

You could build the dictionary for rename with a dict comprehension and apply it to all columns in a single step. Note that rename returns a new DataFrame rather than modifying flattened in place, so assign the result back:
flattened = flattened.rename(
    columns={
        column_name: 'P' + str(index) for index, column_name in enumerate(flattened.columns)
    }
)
Is this what you are looking for?
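A minimal, self-contained sketch of that approach follows. The sample frame and its column names are made up for illustration; only the rename call mirrors the answer.

```python
import pandas as pd

# Hypothetical sample data standing in for `flattened`.
flattened = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Build the old-name -> new-name mapping in one dict comprehension.
mapping = {name: "P" + str(i) for i, name in enumerate(flattened.columns)}

# rename() returns a new DataFrame, so the result is assigned back.
flattened = flattened.rename(columns=mapping)

print(list(flattened.columns))
```

If the original names are not needed at all, assigning a list directly to flattened.columns should give the same result and also copes with duplicate column names.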

Related

Pandas - Find a column with a specific value in the entire dataframe

I have a DataFrame which has a few columns. There is a column with a value that only appears once in the entire dataframe. I want to write a function that returns the column name of the column with that specific value. I can manually find which column it is with the usual data exploration, but since I have multiple dataframes with the same properties, I need to be able to find that column for multiple dataframes. So a somewhat generalized function would be of better use.
The problem is that I don't know beforehand which column is the one I am looking for since in every dataframe the position of that particular column with that particular value is different. Also the desired columns in different dataframes have different names, so I cannot use something like df['my_column'] to extract the column.
Thanks
You'll need to iterate columns and look for the value:
def find_col_with_value(df, value):
    for col in df:
        if (df[col] == value).any():
            return col
This will return the name of the first column that contains value. If value does not exist, it will return None.
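As a quick check of that behaviour, here is the function applied to a small frame. The frame, its column names and the searched values are invented for the example; the explicit return None only spells out what the function already does implicitly.

```python
import pandas as pd

def find_col_with_value(df, value):
    """Return the name of the first column containing `value`, else None."""
    for col in df:
        if (df[col] == value).any():
            return col
    return None  # made explicit; the loop falling through has the same effect

# Hypothetical frame with mixed dtypes.
df = pd.DataFrame({
    "x": [1, 2, 3],
    "y": ["a", "needle", "c"],
    "z": [0.1, 0.2, 0.3],
})

found = find_col_with_value(df, "needle")    # value present in column "y"
missing = find_col_with_value(df, "absent")  # value not present anywhere

print(found, missing)
```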
Check the entire DataFrame for the specific value, checking any to see if it ever appears in a column, then slice the columns (or the DataFrame if you want the Series)
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.normal(0, 5, (100, 200)),
                  columns=[chr(i+40) for i in range(200)])
df.loc[5, 'Y'] = 'secret_value' # Secret value in column 'Y'
df.eq('secret_value').any().loc[lambda x: x].index
# or
df.columns[df.eq('secret_value').any()]
Index(['Y'], dtype='object')
I have another solution:
names = ds.columns
for i in names:
    for j in ds[i]:
        if j == 'your_value':
            print(i)
            break
Here you collect all the column names and then iterate over each column's values until the value is found, then print the name of that column. The break only leaves the inner loop, so every column that contains the value gets printed.

Using the same name for multiples column in a large dataframe

I have created a large dataframe from 19 individual CSV files. All the CSV files have a similar data structure and types because they hold the same experimental data from multiple runs. After merging all the CSV files into one large dataframe, I want to change the column names. I have 40 columns. I want to use the same name for some columns: columns 2, 5, 8, ... should have "Counts" as the column name, columns 3, 6, 8, ... should have 'File name' as the column name, etc. Right now, all the column names are numbers. How can I change the column names?
I have tried this code
newDf.rename(columns = {'0':'Time',tuple(['2','5','8','11','14','17','20','23','26','29','32','35','38','41','44','47','50','53','56']):'File_Name' })
But it didn't work
My datafile looks like this ...
I'm not sure I understand correctly, but it seems you want to name the columns based on their content:
df.columns = [f"FileName_{v[0]}" if df[v[1]].dtype == "O" else f"Count_{v[0]}" for v in enumerate(df.columns)]
This checks each column's data type: if it is object, the column is named "FileName_" followed by its position; otherwise it is named "Count_" followed by its position.
Then rename the first column to "Time":
df = df.rename(columns={df.columns[0]: "Time"})
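If the names should instead follow the column positions, as the question describes, one option is to build the full list of names and assign it to the columns attribute, which accepts duplicates. The sketch below is only an illustration: the seven-column frame, the assumption that the layout repeats every three columns after the first, and the "Other" label for the columns the question does not name are all hypothetical.

```python
import pandas as pd

# Hypothetical merged frame: integer column labels 0..6, as produced
# when CSV files are read without headers.
newDf = pd.DataFrame([[0.0, 1.1, 10, "run1.csv", 2.2, 20, "run2.csv"]])

# Assumed repeating layout by position: i % 3 == 2 -> "Counts",
# i % 3 == 0 -> "File_Name". "Other" is a placeholder name.
by_remainder = {2: "Counts", 0: "File_Name", 1: "Other"}

# Column 0 is handled separately; the rest follow the repeating pattern.
new_names = ["Time"] + [by_remainder[i % 3] for i in range(1, newDf.shape[1])]

# Direct assignment allows the same name to appear more than once.
newDf.columns = new_names

print(list(newDf.columns))
```

Selecting a repeated name such as newDf["Counts"] then returns a DataFrame holding all columns with that name.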

adding row from one dataframe to another

I am trying to add rows from one dataframe to another dataframe. I go through the original dataframe looking for certain words in one column. When I find one of these terms, I want to add that row to a new dataframe.
I get the row by using:
entry = df.loc[df['A'] == item]
But when I try to add this row to another dataframe using .add, .insert, .update or other methods, I just get an empty dataframe.
I have also tried adding the column to a dictionary and turning that into a dataframe but it writes data for the entire row rather than just the column value. So is there a way to add one specific row to a new dataframe from my existing variable ?
So entry is a dataframe containing the rows you want to add?
You can simply concatenate the two dataframes using the concat function, provided both have the same column names:
import pandas as pd
entry = df.loc[df['A'] == item]
concat_df = pd.concat([new_df,entry])
pandas.concat reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
The append function expects a list of rows in this form:
[row_1, row_2, ..., row_N]
where each row is a list representing the values for each column.
So, assuming you're trying to add one row, you should use:
entry = df.loc[df['A'] == item]
df2 = df2.append([entry])
Note that, unlike Python's list.append, DataFrame.append returns a new object and does not modify the object it was called on. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on those versions use pd.concat([df2, entry]) instead.
Not sure how large your operations will be, but from an efficiency standpoint, you're better off adding all of the found rows to a list, and then concatenating them together at once using pandas.concat, and then using concat again to combine the found entries dataframe with the "insert into" dataframe. This will be much faster than using concat each time. If you're searching from a list of items search_keys, then something like:
entries = []
for item in search_keys:
    entry = df.loc[df['A'] == item]
    entries.append(entry)
found_df = pd.concat(entries)
result_df = pd.concat([old_df, found_df])
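A runnable version of the collect-then-concatenate pattern is sketched below. The frames, the search_keys values and the ignore_index choice are assumptions made for the example. The last line shows isin as a possible loop-free alternative; it returns the matches in their original row order rather than grouped by key.

```python
import pandas as pd

# Hypothetical source frame, target frame and search terms.
df = pd.DataFrame({"A": ["apple", "pear", "kiwi", "apple"], "B": [1, 2, 3, 4]})
old_df = pd.DataFrame({"A": ["plum"], "B": [0]})
search_keys = ["apple", "kiwi"]

# Collect the matching rows for each key, then concatenate once.
entries = [df.loc[df["A"] == item] for item in search_keys]
found_df = pd.concat(entries)

# Combine with the target frame; ignore_index renumbers the rows.
result_df = pd.concat([old_df, found_df], ignore_index=True)

# Loop-free alternative: one boolean mask for all keys at once.
found_isin = df.loc[df["A"].isin(search_keys)]

print(result_df)
```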

Split column into multiple column based on string condition

This is what the data set looks like: image
I have a column with roughly 4000 values, which contains different values such as those shown in the image.
I want to split the dataset based on a string comparison. My ultimate goal is to get all the values W_LD(1) to W_LD(57) into one column, and similarly the others, such as R_LD(1) to R_LD(32), into a different column, and so on.
I am creating a dataframe and trying to match the string; if the string matches a particular value, then all the matching values should go into a separate column.
df=pd.DataFrame(data)
str_x = df.Device_names[56]
def my_split(df):
    return pd.Series({'W_LD': [i for i in df.Device_names if str_x == "^W_LD(57)"] })
df.apply(my_split, axis=1)
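One possible way to get one column per name prefix is to extract the part before the parenthesis and group on it. This is only a sketch under assumptions: the real data is visible only in the image, so the five sample values below are invented, and the regular expression assumes every entry looks like PREFIX(number).

```python
import pandas as pd

# Hypothetical stand-in for the Device_names column.
df = pd.DataFrame(
    {"Device_names": ["W_LD(1)", "W_LD(2)", "R_LD(1)", "W_LD(3)", "R_LD(2)"]}
)

# Extract the prefix in front of "(number)", e.g. "W_LD" or "R_LD".
prefix = df["Device_names"].str.extract(r"^(\w+)\(\d+\)$", expand=False)

# One column per prefix; each group is renumbered from 0 so the columns
# line up, and shorter groups are padded with NaN.
wide = pd.concat(
    {key: grp.reset_index(drop=True)
     for key, grp in df["Device_names"].groupby(prefix)},
    axis=1,
)

print(wide)
```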

How to split a column in a dataframe and store each value as a new row (in pandas)?

One of the columns in my dataset has "keywords" values stored like this:
monster|dna|tyrannosaurus rex|velociraptor|island
I want to split each keyword on the pipe character (|) and store it as a new row, so I can later use groupby to look at correlations based on the keywords.
The furthest I got was:
dfn = df['keywords'].str.split('|',expand=True)
But this stores them as new columns, not new rows, and only in a new dataframe. I would still need to .append it back into the original dataframe, and then drop the original rows containing keyword clusters.
You can add stack after the split:
dfn = df['keywords'].str.split('|',expand=True).stack()
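The stack result is a Series of keywords only. If the other columns should stay attached to each keyword, DataFrame.explode (available since pandas 0.25) is an alternative worth considering. A minimal sketch, with a made-up two-row frame and a hypothetical title column:

```python
import pandas as pd

# Hypothetical data: one pipe-separated keyword string per row.
df = pd.DataFrame({
    "title": ["film_a", "film_b"],
    "keywords": ["monster|dna|island", "space|monster"],
})

# Split each string into a list, then give every list element its own row.
long_df = (
    df.assign(keywords=df["keywords"].str.split("|"))
      .explode("keywords")
      .reset_index(drop=True)
)

# The long format can be grouped by keyword directly.
counts = long_df.groupby("keywords")["title"].count()

print(long_df)
```

Because explode repeats the values of the other columns for each keyword, there is no need to merge the result back into the original dataframe.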
