I am trying to process an Excel file with 3 columns. I have imported the Excel file as 'excel' using pd.read_excel and that works fine. The issue I am having is writing a function in Python that checks whether a value in the first column ends with a certain letter and, based on that, copies it into the desired column.
Here is an example of the Excel file (ending with _X means it goes in the Working Column only, _Y means the Show Column only, and _XY means it goes in both the Show and Working Columns):
All Column Working Column Show Column
ID_XY
Name
Manager_Name_XY
Job_Title_X
Salary_XY
The code below is what I am using to try to decide which column each row should go into.
df.loc[excel['All Column'].str.endswith('_X'), df2] ='Working Column'
df.loc[excel['All Column'].str.endswith('_XY'), df2] = 'Show Column'
When I run this code I get:
unhashable type: 'Series'
What I want is something like this:
All Column         Working Column      Show Column
ID_XY              ID_XY               ID_XY
Name               Manager_Name_XY     Manager_Name_XY
Manager_Name_XY    Job_Title           Salary_Y
Salary_Y
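For reference, a minimal sketch of one way the suffix checks could be written, assuming the frame is called excel as above and the target column labels are plain strings (the error in the snippet above comes from passing df2, rather than a column label, as the second argument of .loc):

import pandas as pd

# hypothetical frame matching the example above
excel = pd.DataFrame({'All Column': ['ID_XY', 'Name', 'Manager_Name_XY', 'Job_Title_X', 'Salary_XY']})

# _X or _XY -> Working Column; _Y or _XY -> Show Column
is_working = excel['All Column'].str.endswith('_X') | excel['All Column'].str.endswith('_XY')
is_show = excel['All Column'].str.endswith('_Y') | excel['All Column'].str.endswith('_XY')

excel.loc[is_working, 'Working Column'] = excel['All Column']
excel.loc[is_show, 'Show Column'] = excel['All Column']
print(excel)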
I am trying to create a new df with certain columns from 2 others:
The first called visas_df:
And the second called cpdf:
I only need the highlighted columns. But when I try this:
df_joined = pd.merge(cpdf,visas_df["visas"],on="date")
The error appearing is: KeyError: 'date'
I imagine this is due to how I created cpdf. It was a "bad dataset" so I did some fidgeting. Line 12 of the code snippet below might have something to do with it, but I am clueless...
I even renamed the date columns of both dfs as "date" and checked that the dtypes and number of rows are the same.
Any feedback would be much appreciated. Thanks!
visas_df["visas"] in the merge call is not a dataframe; it is a Series and does not contain the date column. If you want a dataframe, you have to use double square brackets [[]], like this:
df_joined = pd.merge(cpdf,visas_df[["date","visas"]],on="date")
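For illustration, a tiny sketch with made-up data (the column values here are assumptions; only the single vs. double bracket pattern matters):

import pandas as pd

visas_df = pd.DataFrame({"date": ["2020-01", "2020-02"], "visas": [10, 20]})
cpdf = pd.DataFrame({"date": ["2020-01", "2020-02"], "cases": [5, 7]})

print(type(visas_df["visas"]))            # Series -- carries no 'date' column to merge on
print(type(visas_df[["date", "visas"]]))  # DataFrame with both 'date' and 'visas'

df_joined = pd.merge(cpdf, visas_df[["date", "visas"]], on="date")
print(df_joined)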
I have a dataframe, and I'm trying to encode all the categorical values within it. The following is the code I wrote to encode all categorical columns in one go:
for col in data.select_dtypes('object').columns:
    data[col] = data[col].astype('category').cat.codes
but this only works sometimes and often throws the following error saying the DataFrame has no attribute "cat":
AttributeError: 'DataFrame' object has no attribute 'cat'
Now I'm not able to understand how it works sometimes and fails other times. Also, I haven't applied the cat method to the whole dataframe but to one column (Series) at a time while going through the loop.
Does anyone know what's going wrong here?
The problem is that there are duplicated column names, so selecting one column by label returns all columns with that same label (a DataFrame, not a Series).
for col in data.select_dtypes('object').columns:
    print(col)
    # check what the selection returns
    print(data[col])
    data[col] = data[col].astype('category').cat.codes
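If that is what is happening, one way to get unstuck (a sketch with a made-up frame; the duplicated label 'city' is only for illustration) is to make the labels unique first and then run the original loop:

import pandas as pd

# hypothetical frame with a duplicated column label
data = pd.DataFrame([["a", "x", "b"], ["c", "y", "d"]], columns=["city", "code", "city"])

# which labels are duplicated?
print(data.columns[data.columns.duplicated()])  # Index(['city'], dtype='object')

# suffix the later duplicates so every label is unique, then each assignment touches one column
dup = data.columns.duplicated()
data.columns = [f"{c}.{i}" if dup[i] else c for i, c in enumerate(data.columns)]

for col in data.select_dtypes('object').columns:
    data[col] = data[col].astype('category').cat.codes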
I have created a large dataframe from 19 individual CSV files. All the CSV files have a similar data structure/type because they are the same experimental data from multiple runs. After merging all the CSV files into a large dataframe, I want to change the column names. I have 40 columns. I want to use the same name for some columns, such as columns 2,5,8,... which should have "Counts" as the column name, columns 3,6,8,... which should have "File name" as the column name, etc. Right now, all the column names are numbers. How can I change the column names?
I have tried this code
newDf.rename(columns = {'0':'Time',tuple(['2','5','8','11','14','17','20','23','26','29','32','35','38','41','44','47','50','53','56']):'File_Name' })
But it didn't work
My datafile looks like this ...
I'm not sure if I understand it correctly: you wish to rename the columns based on their content:
df.columns = [f"FileName_{v[0]}" if df[v[1]].dtype == "O" else f"Count_{v[0]}" for v in enumerate(df.columns)]
What this does is check whether each column's data type is object; if so it assigns a "FileName_" label for that element, otherwise a "Count_" label.
Then set the first column's name to "Time":
df = df.rename(columns={df.columns[0]: "Time"})
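A quick end-to-end check with a tiny made-up frame (the numeric labels and values are placeholders, just to show the relabeling):

import pandas as pd

# hypothetical frame: numeric labels, with column 2 holding file names as text
df = pd.DataFrame({0: [0.1, 0.2], 1: [5, 9], 2: ["run1.csv", "run1.csv"]})

df.columns = [f"FileName_{i}" if df[c].dtype == "O" else f"Count_{i}" for i, c in enumerate(df.columns)]
df = df.rename(columns={df.columns[0]: "Time"})
print(df.columns.tolist())  # ['Time', 'Count_1', 'FileName_2']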
I am reading multiple CSV files from a folder into a dataframe. I loop over all the files in the folder and then concat the dataframes to obtain the final dataframe.
However, each CSV file has one summary row from which I want to extract the date and then add it as a new column for all the rows in that csv/dataframe.
df=pd.read_csv(f,header=None,names=['Inverter',"Day Yield",'month Yield','Year Yield','SpecificYieldDay','SYMth','SYYear','Power'],sep=';', **kwargs)
df['date']=df.loc[[0],['Day Yield']]
df
I expect the ['date'] column to be filled with the date for that file for all rows of that particular csv, but it gets filled correctly only for the first row.
Refer to image of dataframe. I want all the rows of the 'date' column to be showing 7/25/2019 instead of only the first row.
I have also added an example of one of the csv files I am reading from
csv file
If I understood correctly, the value that you want to add as a new column for all rows is the one at df.loc[0, 'Day Yield'].
If that is correct you can do the following (assign broadcasts a scalar to every row):
df = df.assign(date=df.loc[0, 'Day Yield'])
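For context, a sketch of how that could look inside the per-file loop (the folder path is a placeholder; I am assuming the summary row is row 0 and that the date sits in its 'Day Yield' field, as in the snippet above):

import glob
import pandas as pd

frames = []
for f in glob.glob('csv_folder/*.csv'):  # hypothetical folder
    df = pd.read_csv(f, header=None, sep=';',
                     names=['Inverter', 'Day Yield', 'month Yield', 'Year Yield',
                            'SpecificYieldDay', 'SYMth', 'SYYear', 'Power'])
    # broadcast the single date value from the summary row to every row of this file
    df = df.assign(date=df.loc[0, 'Day Yield'])
    frames.append(df)

final_df = pd.concat(frames, ignore_index=True)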
I'm trying to add a new column to a pandas dataframe. I am also trying to give a name to the index so it is printed out in Excel when I export the data.
import pandas as pd
import csv
#read csv file
file='RALS-04.csv'
df=pd.read_csv(file)
#select the columns that I want
column1=df.iloc[:,0]
column2=df.iloc[:,2]
column3=df.iloc[:,3]
column1.index.name="items"
column2.index.name="march2012"
column3.index.name="march2011"
df=pd.concat([column1, column2, column3], axis=1)
#create a new column with 'RALS' as a default value
df['comps']='RALS'
#writing them back to a new CSV file
with open('test.csv','a') as f:
    df.to_csv(f, index=False, header=True)
The problem with the output is that the 'RALS' value I added to the dataframe goes down to row 2000, while the data stops at row 15. How can I constrain 'RALS' so that it doesn't go beyond the length of the data being exported? I would also prefer a more elegant, automated way rather than specifying the row at which the default value should stop.
The second question is that the labels I assigned to the columns using index.name do not appear in the output. Instead they are replaced by a 0 and a 1. Please advise.
Thanks so much for your input.
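For what it's worth, a sketch of one possible way to tighten this up, assuming the extra rows are blank rows read in from the CSV (hence the dropna) and that the column labels should come from the Series names rather than from index.name; the file and label names are taken from the snippet above:

import pandas as pd

df = pd.read_csv('RALS-04.csv')

# drop rows that are entirely empty so the added column stops where the data stops
df = df.dropna(how='all')

# name the Series themselves (not their index) so concat uses these names as column labels
column1 = df.iloc[:, 0].rename('items')
column2 = df.iloc[:, 2].rename('march2012')
column3 = df.iloc[:, 3].rename('march2011')

df = pd.concat([column1, column2, column3], axis=1)
df['comps'] = 'RALS'
df.to_csv('test.csv', index=False, header=True)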