I have an existing Excel file in my documents folder which I open as a DataFrame with df = pd.read_excel(filename, index_col='respID') and then read and treat as a normal DataFrame.
I process some data and then have two dictionaries whose keys are the same as the index values of the DataFrame.
I'm looking to insert these dictionaries into the Excel file after a specific column. How would I go about doing this? Would it be better if they were just Series?
I've seen the solutions How to write to an existing excel file without overwriting data (using pandas)? and Pandas Series to Excel, but I don't think either answers my question completely.
Worst comes to worst, I can add the dictionaries to the DataFrame where I want them and just overwrite the entire Excel file (or write a new one and replace it), but that seems inelegant.
Thank you.
(screenshot of the Excel data)
This is the data I have in an Excel file. There are 10 sheets containing different data, and I want to sort the data present in each sheet by the 'BA_Rank' column in descending order.
After sorting the data, I have to write the sorted data to an Excel file.
(For example, the data that was present in sheet1 of the unsorted workbook should be written to sheet1 of the sorted workbook, and so on...)
If I remove the heading from the first row, I can use the pandas sort_values() function to sort the data present in the first sheet and save it to another file,
like this:
import pandas as pd
import xlrd

doc = xlrd.open_workbook('without_sort.xlsx')   # only used here to count the sheets
xl = pd.read_excel('without_sort.xlsx')         # reads just the first sheet into a DataFrame
length = doc.nsheets
#print(length)
#for i in range(0,length):
#    sheet = xl.parse(i)
result = xl.sort_values('BA_Rank', ascending=False)   # sort by BA_Rank, highest first
result.to_excel('SortedData.xlsx')                    # write the sorted data to a new file
print(result)
So is there any way I can sort the data without removing the header from the first row?
And how can I iterate over the sheets so as to sort the data present in multiple sheets?
(Note: all the sheets contain the same columns, and I need to sort every sheet by 'BA_Rank' in descending order.)
First input: you don't need to call xlrd when using pandas, it's done under the hood.
Secondly, the read_excel method is REALLY smart. You can (and in my opinion should) define the sheet you're pulling data from. You can also set rows to skip, tell it where the header line is, or ignore it (and then set column names manually). Check the docs, they're quite extensive.
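For instance, a quick sketch of those options (the parameter values here are illustrative, not taken from your file):

import pandas as pd

# sheet_name can be an index, a sheet name, or None (None loads every sheet into a dict)
df = pd.read_excel('without_sort.xlsx', sheet_name=0, header=0)

# or skip the original header row and supply column names yourself (names below are made up)
# df = pd.read_excel('without_sort.xlsx', sheet_name=0, skiprows=1, header=None, names=['BA_Rank', 'other_col'])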
If this "10 sheets" count is merely anecdotal, you could use something like xlrd to extract the workbook's sheet count and work by index (or extract the names directly).
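pandas can also give you that without touching xlrd directly; a small sketch, assuming the file from your snippet:

import pandas as pd

# ExcelFile exposes the sheet names, so you can count them or iterate by name
book = pd.ExcelFile('without_sort.xlsx')
print(len(book.sheet_names))   # number of sheets
print(book.sheet_names)        # their names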
The sorting looks right to me.
Finally, if you want to save it all in the same workbook, I would use openpyxl or some similar library (there are many others, like pyexcelerate for large files).
This procedure pretty much always looks like:
Create/Open destination file (often it's the same method)
Write down data, sheet by sheet
Close/Save file
If the data is to be written all on the same sheet, pd.concat([all_dataframes]).to_excel("path_to_store") should get it done.
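For the sheet-by-sheet case, here is a rough sketch of that procedure using pandas alone (it assumes every sheet has a header row and a 'BA_Rank' column, as in your note):

import pandas as pd

# sheet_name=None reads every sheet into a dict of {sheet_name: DataFrame}
sheets = pd.read_excel('without_sort.xlsx', sheet_name=None)

with pd.ExcelWriter('SortedData.xlsx') as writer:        # create/open the destination file
    for name, df in sheets.items():                      # write the data, sheet by sheet
        df.sort_values('BA_Rank', ascending=False).to_excel(writer, sheet_name=name, index=False)
# the file is saved automatically when the with-block exits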
I've created a csv file with the column names and saved it using the pandas library. This file will be used as a historic record, where rows will be added one by one at different moments. What I'm doing to add rows to this previously created csv is to turn each record into a DataFrame and then call to_csv() with mode='a' as a parameter in order to append the record to the existing file. The problem is that I would like to see an index automatically generated in the file every time I add a new row. I already know that when I import this file as a DataFrame an index is generated automatically, but that only happens inside the IDLE interface; when I open the csv with Excel, for example, the file doesn't have an index.
While writing your file to csv, you can set index=True in the to_csv method. This ensures that the index of your DataFrame is written explicitly to the csv file.
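A minimal sketch of what that looks like when appending a new row (the file name and column are made up here):

import pandas as pd

# index=True (the default) writes the DataFrame's index as the first column of the csv
new_row = pd.DataFrame({'value': [42]})
new_row.to_csv('historic_record.csv', mode='a', header=False, index=True)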
So here is my situation: using Python, I want to copy specific columns from an Excel spreadsheet into specific columns of a csv file.
The pre-filled column header names are named differently in each spreadsheet, and I need to use a sublist as a parameter.
For example, in the first sublist, a data column needs to be copied from/to:
spreadsheet csv
"scan_date" => "date_of_scan"
There are two sublists as parameters: one of the names to copy from the Excel file, and one of the names of where to paste them into the csv.
I'm not sure if a dictionary would be better than two individual sublists?
Also, the csv column header names are in the second row (not the first row, as in the Excel file), which has complicated things such as setting up the data frames.
So, ideally, I would like to have the sublists converted to arrays, then:
iterate over the spreadsheet columns to find "scan_date",
copy the data,
iterate to find "date_of_scan" in the csv,
paste the data,
and move on to the second item in the sublists and repeat.
I've tried pandas and openpyxl and just can't seem to figure out the approach/syntax of how to do it.
Any help would be greatly appreciated.
Thank you.
Clarification edit:
The csv file has some preexisting data in it. Also, I cannot move the headers into different columns; so, if "date_of_scan" is in column "RF" then it must stay in column "RF". I was able to copy, say, 5 columns of data from the Excel file into a temporary spreadsheet and then concatenate that into the csv, but it always moved the pasted columns to the beginning of the csv document (columns A, B, C, D, E).
It is hard to know the answer without seeing your specific dataset, but it seems to me that a simpler approach might be to simply read your Excel sheet into a df, drop everything except the columns you want in the csv, then write the csv with pandas. Here's some pseudo-code.
import pandas as pd

df = pd.read_excel('your_file_name.xlsx')
drop_cols = []  # list of columns to get rid of
df = df.drop(drop_cols, axis='columns')  # drop() returns a new DataFrame, so assign it back
col_dict = {'a': 'x', 'b': 'y', 'c': 'z'}  # map old column names to new ones; here a/b/c are old and x/y/z are new
# this line will actually rename your columns with the dictionary
df = df.rename(columns=col_dict)
df.to_csv('new_file_name.csv')  # write the new file
And this will actually run in Python, but I created the df from dummy data instead of an Excel file.
# with dummy data
df = pd.DataFrame([0, 1, 2], index=['a', 'b', 'c']).T  # one row with columns 'a', 'b', 'c'
col_dict = {'a': 'x', 'b': 'y', 'c': 'z'}
df = df.rename(columns=col_dict)
df.to_csv('new_file_name.csv')  # write the new file
I made a program to save two arrays into a csv file using a pandas DataFrame in Python, so that I could record all the data.
I tried the code listed below.
import time
import pandas as pd

U_8 = []
start = []
U_8.append(d)                    # d is the value being recorded (defined elsewhere in the program)
start.append(str(time.time()))   # timestamp for this value
x = pd.DataFrame({'1st': U_8, 'Time Stamp': start})
export_csv = x.to_csv(r'/home/pi/Frames/q8.csv', index=None, header=True)
Every time the program is closed and run again, it overwrites the previous values stored in the csv file. I expected it to save the new values along with the previous ones. How can I store both the past and present values in this csv file?
In order to append to a csv instead of overwriting it, pass mode='a' to df.to_csv. The default mode is 'w', which overwrites any existing csv with the same filename. Plain appending, however, appends the column headers as well, and they will appear as values in your csv. To mitigate that, pass header=False in your subsequent runs: df.to_csv(path, mode='a', header=False).
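A minimal sketch of the append approach, reusing the path from your snippet (the value below is just a stand-in for your d):

import time
import pandas as pd

frame_value = 0  # stand-in for the value (d in your code) that the program records
new_row = pd.DataFrame({'1st': [frame_value], 'Time Stamp': [str(time.time())]})
# first run: write normally with headers; subsequent runs: append without them
new_row.to_csv('/home/pi/Frames/q8.csv', mode='a', header=False, index=None)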
Another way is to read your original csv into a DataFrame and use pd.concat to join it with your new results. The workflow is then as follows (sketched after the list):
Read the original csv into a DataFrame.
Create a DataFrame with new results.
Concatenate the two DataFrames.
Write the resulting DataFrame to csv.
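A rough sketch of that workflow, again with dummy values in place of your real data:

import time
import pandas as pd

path = '/home/pi/Frames/q8.csv'
old = pd.read_csv(path)                                              # 1. read the original csv
new = pd.DataFrame({'1st': [0], 'Time Stamp': [str(time.time())]})   # 2. new results (dummy value)
combined = pd.concat([old, new], ignore_index=True)                  # 3. concatenate the two DataFrames
combined.to_csv(path, index=None, header=True)                       # 4. write the result back to csv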
As a side note, assigning the return value of .to_csv to a variable is not necessary; df.to_csv(path) on its own will still export the csv.
I tried to come up with a way to copy data from a sheet in an Excel file, as follows:
import pandas as pd
origionalFile = pd.ExcelFile('AnnualReport-V5.0.xlsx')
Transfers = pd.read_excel(origionalFile, 'Sheet1')
I have another Excel file named 'AnnualReport-V6.0.xlsx', which has existing data in a sheet named 'Transfers'. I want to use the DataFrame I created above to replace the data in that 'Transfers' sheet from column B onward, leaving column A as it is.
I did a few searches; the closest to what I want is this:
Modifying an excel sheet in a excel book with pandas
but it does not allow me to keep column A in the original sheet (column A has some equations I do want to keep). Any idea how to do it? Thanks
Would reading column A and inserting it into the fresh data you want to write solve your problem?
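If it would, here is a rough sketch of that idea (caveats: pandas reads the computed values of the column A formulas, not the formulas themselves; it assumes both sheets have the same number of rows; and if_sheet_exists needs pandas >= 1.3 with the openpyxl engine):

import pandas as pd

new_data = pd.read_excel('AnnualReport-V5.0.xlsx', 'Sheet1')    # fresh data from V5.0
old = pd.read_excel('AnnualReport-V6.0.xlsx', 'Transfers')      # existing 'Transfers' sheet
new_data.insert(0, old.columns[0], old.iloc[:, 0])              # put the existing column A in front

with pd.ExcelWriter('AnnualReport-V6.0.xlsx', engine='openpyxl',
                    mode='a', if_sheet_exists='replace') as writer:
    new_data.to_excel(writer, sheet_name='Transfers', index=False)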