I have created a large dataframe from 19 individual CSV files. All the CSV files have the same data structure/types because they are the same experimental data from multiple runs. After merging all the CSV files into one large dataframe, I want to change the column names. I have 40 columns, and I want to reuse the same name for some of them: columns 2, 5, 8, ... should be named "Counts", columns 3, 6, 9, ... should be named 'File name', and so on. Right now, all the column names are numbers. How can I change them?
I have tried this code:
newDf.rename(columns = {'0':'Time',tuple(['2','5','8','11','14','17','20','23','26','29','32','35','38','41','44','47','50','53','56']):'File_Name' })
but it didn't work.
My datafile looks like this ...
If I understand correctly, you wish to rename the columns based on their content:
df.columns = [f"FileName_{v[0]}" if df[v[1]].dtype == "O" else f"Count_{v[0]}" for v in enumerate(df.columns)]
This checks whether each column's dtype is object; if it is, that position gets a "FileName" label, otherwise a "Count" label.
Then set the first column's name to "Time" (a pandas Index does not support item assignment, so use rename rather than df.columns[0] = "Time"):
df = df.rename(columns={df.columns[0]: "Time"})
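For the positional pattern described in the question (every third column starting at 2 is "Counts", every third starting at 3 is 'File name'), here is a minimal sketch that assigns the labels by position. It assumes the merged frame is called newDf, as in the question, and keeps any label that doesn't fall into the pattern:
new_cols = []
for i, old in enumerate(newDf.columns):
    if i == 0:
        new_cols.append("Time")
    elif i % 3 == 2:              # positions 2, 5, 8, ...
        new_cols.append("Counts")
    elif i % 3 == 0:              # positions 3, 6, 9, ...
        new_cols.append("File name")
    else:                         # positions 1, 4, 7, ... keep their old label
        new_cols.append(old)
newDf.columns = new_cols
Assigning a full list to df.columns sidesteps rename entirely, which matters here because the original numeric labels may repeat after concatenating 19 files.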
I'm working on a football dataset which has a few columns. One column is called TimeUnder, and its datatype is int64. I want to append the unit 's' to all the values in the column and save it back to the dataset.
I converted the column to a string datatype, appended an 's' to each value in TimeUnder, and saved the modifications to a new CSV file:
import pandas as pd
football=pd.read_csv("Football_dataset.csv")
football1=football['TimeUnder'].astype(str) + 's'
football1.to_csv("football_modified.csv")
football_m=pd.read_csv("football_modified.csv")
football_m.head()
But the new modified CSV only has the modified column; I want all of the previous columns in the dataset along with the modified column.
Currently, you are creating a separate object from the modified column and writing only that to CSV.
Instead, you need to modify the column in the original dataframe and write the whole dataframe to CSV.
Change this football1=football['TimeUnder'].astype(str) + 's' to:
football['TimeUnder']=football['TimeUnder'].astype(str) + 's'
Then write to csv:
football.to_csv("football_modified.csv")
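Put together, a minimal end-to-end sketch (file names taken from the question; index=False is an extra suggestion so the row index isn't written as an additional column):
import pandas as pd

football = pd.read_csv("Football_dataset.csv")

# Modify the column on the original dataframe instead of a copy
football['TimeUnder'] = football['TimeUnder'].astype(str) + 's'

# Write the whole dataframe, not just the single column
football.to_csv("football_modified.csv", index=False)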
I am reading multiple CSV files from a folder into a dataframe. I loop over all the files in the folder and then concat the dataframes to obtain the final dataframe.
However, each CSV file has one summary row from which I want to extract the date and add it as a new column for all the rows in that csv/dataframe.
df=pd.read_csv(f,header=None,names=['Inverter',"Day Yield",'month Yield','Year Yield','SpecificYieldDay','SYMth','SYYear','Power'],sep=';', **kwargs)
df['date']=df.loc[[0],['Day Yield']]
df
I expect the ['date'] column to be filled with that file's date for all the rows of that particular csv, but it gets filled correctly only for the first row.
Refer to image of dataframe. I want all the rows of the 'date' column to be showing 7/25/2019 instead of only the first row.
I have also added an example of one of the csv files I am reading from
If I understood correctly, the value that you want to add as a new column for all rows is the one at df.loc[0, 'Day Yield'].
If that is correct, you can do the following; assigning a scalar broadcasts it to every row:
df = df.assign(date=df.loc[0, 'Day Yield'])
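In the multi-file loop from the question, that one assignment slots in like this. A sketch only: the folder name is hypothetical, and it assumes the summary row is row 0 and should be dropped from the data once the date has been extracted:
import glob
import pandas as pd

frames = []
for f in glob.glob("folder/*.csv"):   # hypothetical folder
    df = pd.read_csv(f, header=None, sep=';',
                     names=['Inverter', 'Day Yield', 'month Yield', 'Year Yield',
                            'SpecificYieldDay', 'SYMth', 'SYYear', 'Power'])
    df['date'] = df.loc[0, 'Day Yield']   # broadcast the summary-row date to every row
    df = df.iloc[1:]                      # assumption: remove the summary row itself
    frames.append(df)

final = pd.concat(frames, ignore_index=True)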
I want to change the names of every column of my dataframe by iterating over the column names.
I am able to change the column names one by one, but I want to use a for loop to change all of them:
for i in range(0, len(flattened.columns)):
    flattened.rename(columns={flattened.columns[i]: "P" + str(i)})
You could just build the dictionary for rename in a dict comprehension and apply it to all columns in a single step. Note that rename returns a new dataframe, so assign the result back (this is also why the loop in your question has no effect), like so:
flattened = flattened.rename(
    columns={
        column_name: 'P' + str(index)
        for index, column_name in enumerate(flattened.columns)
    }
)
Is this what you are looking for?
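As a side note, since the new names depend only on position, assigning the labels directly is an equally simple alternative:
flattened.columns = ['P' + str(i) for i in range(len(flattened.columns))]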
I am looking into searching a CSV file with 242000 rows and want to count the occurrences of each unique identifier in one of the columns. The column name is 'logid' and it has a number of different values, i.e. 1002, 3004, 5003. I want to search the CSV file using a pandas dataframe and count the occurrences of each unique identifier. If possible, I would then like to create a new CSV file that stores this information. For example, if I find there are 50 logids of 1004, I would like to create a CSV file that has column name 1004 with the count of 50 displayed below. I would do this for all unique identifiers and add them to the same CSV file. I am completely new at this and have done some searching but have no idea where to start.
Thanks!
As you haven't posted your code, I can only describe the general way it would work.
1. Load the CSV file into a pd.DataFrame using pandas.read_csv.
2. Save the first occurrence of each value in a separate df1 using pandas.DataFrame.drop_duplicates, like:
df1 = df.drop_duplicates(keep="first")
This will return a DataFrame which only contains the first occurrence of each duplicated value. E.g. if the value 1000 is in 5 rows, only the first of those rows is kept while the others are dropped.
df1.shape[0] then gives you the number of unique values in your df.
3. If you want to store all rows of df which contain a "duplicate value" in a separate CSV file, you have to do something like this:
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3, 0, 1, 2, 5, 5]})  # This should represent your original data set
print(df)

# I assume the column with the duplicate values is column "A";
# if you want to check the whole row instead, just omit the subset keyword.
df1 = df.drop_duplicates(subset="A", keep="first")
print(df1)

frames = []
for m in df1["A"]:
    frames.append(df[df["A"] == m])  # all rows holding this value

for i, dfx in enumerate(frames):
    name = "file{0}".format(i)
    dfx.to_csv(r"YOUR PATH\{0}".format(name))
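If the goal is just one CSV with each logid as a column name and its count below it, as described in the question, pandas.Series.value_counts gets there more directly. A minimal sketch, with a hypothetical input file name:
import pandas as pd

df = pd.read_csv("input.csv")          # hypothetical file name
counts = df['logid'].value_counts()    # occurrences of each unique logid

# Transpose so the logids become the column headers, with one row of counts
counts.to_frame().T.to_csv("logid_counts.csv", index=False)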
Suppose you have a bunch of Excel files with ID and company name. You have N Excel files in a directory and you read them all into a dataframe; however, in each file the company name is spelled slightly differently, and you end up with a dataframe with N + 1 columns.
Is there a way to create a mapping for the column names, for example:
col_mappings = {
    'company_name': ['name1', 'name2', ..., 'nameN'],
}
So that when you run read_excel you can map all the different spellings of company name to just one column? Also, could you do this with any type of data file, e.g. read_csv etc.?
Are you concatenating the files after you read them one by one? If yes, you can simply change the column name once you read the file. From your question, I assume your dataframe only contains two columns, Id and CompanyName, so you can simply change it by indexing (remember that rename returns a new dataframe, so assign the result back):
df = pd.read_csv(one_file)
df = df.rename(columns={df.columns[1]: 'company_name'})
then concatenate it to the original dataframe.
Otherwise, simply read with given column names,
df = pd.read_csv(one_file, names=['Id','company_name'])
then remove the first row from df, as it contains the original column names (or pass skiprows=1 to read_csv to skip it while reading).
This can be done with both .csv and .xlsx files, since read_excel accepts the same names argument.
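A minimal sketch of the mapping idea from the question (the directory and the variant spellings are hypothetical):
import glob
import pandas as pd

# Hypothetical variants of the company-name header across files
col_mappings = {
    'company_name': ['name1', 'name2', 'Company Name', 'company'],
}

# Invert to {variant: canonical} so it can be fed straight to rename
reverse_map = {variant: canonical
               for canonical, variants in col_mappings.items()
               for variant in variants}

frames = []
for path in glob.glob("files/*.xlsx"):     # hypothetical directory
    df = pd.read_excel(path)
    df = df.rename(columns=reverse_map)    # unknown labels are left untouched
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)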