Creating a Master Excel file from dynamic CSV output using Python

I am trying to build a repository "Master" Excel file from a CSV that is regenerated and overwritten every couple of hours. The code below creates a new Excel file and writes the contents of "combo1.csv" to "master.xlsx". However, whenever combo1.csv is updated, the code overwrites the contents of "master.xlsx". I need to append the contents of "combo1.csv" to "master.xlsx" without the headers being inserted every time. Can someone help me with this?
import pandas as pd
writer = pd.ExcelWriter('master.xlsx', engine='xlsxwriter')
df = pd.read_csv('combo1.csv')
df.to_excel(writer, sheet_name='sheetname')
writer.save()

Refer to the "Append Data at the End of an Excel Sheet" section in this Medium article:
Using Python Pandas with Excel Sheets
(Credit to Nensi Trambadiya for the article)
Basically you'll have to first read the Excel file and find the number of rows before pushing the new data.
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer,index=False,header=False,startrow=len(reader)+1)
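The row-counting idea above can be sketched as a small function. This is only a sketch under assumptions: append_csv_to_master is a hypothetical helper name, the sheet is assumed to be called Sheet1, and it relies on pandas 1.4 or later with openpyxl installed, because it uses mode='a' together with if_sheet_exists='overlay'.

```python
from pathlib import Path

import pandas as pd


def append_csv_to_master(csv_path, master_path, sheet_name="Sheet1"):
    """Append the rows of csv_path below the existing rows of master_path.

    Hypothetical helper: the name and the default sheet name are assumptions.
    """
    df = pd.read_csv(csv_path)

    if not Path(master_path).exists():
        # First run: create the workbook and write the header exactly once.
        df.to_excel(master_path, sheet_name=sheet_name, index=False)
        return

    with pd.ExcelWriter(master_path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # max_row is the 1-based number of the last used row, which is also
        # the 0-based index of the first free row expected by startrow.
        start = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, index=False,
                    header=False, startrow=start)
```

Calling append_csv_to_master('combo1.csv', 'master.xlsx') each time the CSV is regenerated should add the new rows without repeating the header. The sheet named by sheet_name has to exist in the master file already, otherwise the lookup in writer.sheets raises a KeyError.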

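First read the existing Excel file, then write the new rows below the last used row:
import pandas as pd
from openpyxl import load_workbook  # load_workbook lives in openpyxl, not xlsxwriter
df = pd.DataFrame({'Name': ['abc', 'def', 'xyz', 'ysv'],
                   'Age': [8, 45, 32, 26]})  # 08 is not a valid integer literal in Python 3
# xlsxwriter cannot modify an existing file, so use the openpyxl engine
writer = pd.ExcelWriter('master.xlsx', engine='openpyxl')
# Note: assigning writer.book and writer.sheets worked on older pandas releases; newer releases may reject it
writer.book = load_workbook('master.xlsx')
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
reader = pd.read_excel(r'master.xlsx')
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()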

import pandas as pd
from openpyxl import load_workbook
# new dataframe with same columns
df = pd.read_csv('combo1.csv')
writer = pd.ExcelWriter('master.xlsx', engine='openpyxl')
# try to open an existing workbook
writer.book = load_workbook('master.xlsx')
# copy existing sheets
writer.sheets = dict((ws.title, ws) for ws in writer.book.worksheets)
# read existing file
reader = pd.read_excel(r'master.xlsx')
# write out the new sheet
df.to_excel(writer, index=False, header=False, startrow=len(reader) + 1)
writer.close()
Note that master.xlsx must already exist before running the script.
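If assigning to writer.book is a problem on your pandas version, the same append can be done without pandas at all. The following is a minimal sketch, assuming openpyxl is installed; append_csv_rows is a hypothetical helper name, and it writes to the active sheet of the workbook. It uses Worksheet.append, which adds each row after the last used row.

```python
import csv
from pathlib import Path

from openpyxl import Workbook, load_workbook


def append_csv_rows(csv_path, master_path):
    """Append CSV rows to the active sheet of master_path (hypothetical helper)."""
    master = Path(master_path)
    if master.exists():
        wb = load_workbook(master)
        skip_header = True   # the header is already in the master file
    else:
        wb = Workbook()
        skip_header = False  # first run: keep the header row
    ws = wb.active

    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            if i == 0 and skip_header:
                continue
            # csv.reader yields strings, so every cell is stored as text here
            ws.append(row)

    wb.save(master)
```

Because csv.reader returns strings, numeric columns end up as text cells; convert the values before calling ws.append if you need real numbers in Excel.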

Related

Write DataFrame to Excel Template and Save as new file

I have a DataFrame and an Excel template file. The template has a worksheet containing column headers and some formulas, plus pivot tables on other sheets.
I want to paste the data into it and then save the template as a new Excel file.
The first problem is that I cannot save the template as a new Excel file.
The second is that I cannot write the DataFrame to the existing worksheet; pandas creates a new sheet for the data instead.
I then found the if_sheet_exists='overlay' option for pd.ExcelWriter, but it gives me this error:
'overlay' is not valid for if_sheet_exists. Valid options are 'error', 'new' and 'replace'.
I'm using pandas version 1.5.1. Is it still possible to achieve this, or is there a better solution?
from datetime import datetime
import pandas as pd
from openpyxl import Workbook, load_workbook

def write_report(df):
    template_filename = 'Daily Quality Report Template.xlsx'
    today_str = datetime.strftime(datetime.now(), '%Y%m%d')
    result_filename = f'Report\\Daily Quality Report {today_str}.xlsx'
    result_sheetname = today_str
    # create new file
    xlresult = Workbook()
    xlresult.save(result_filename)  # was result_file_name, an undefined name
    # write
    writer = pd.ExcelWriter(result_filename, engine='openpyxl', mode='a', if_sheet_exists='overlay')
    writer.book = load_workbook(template_filename)
    writer.sheets = {ws.title: ws for ws in writer.book.worksheets}
    df.to_excel(writer, result_sheetname, startrow=1, header=False, index=False)
    writer.save()
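One possible way to do this is to copy the template file first and then write into the copy. This is a sketch under assumptions, not a confirmed fix: it needs pandas 1.4 or later, where 'overlay' is an accepted value (the error message quoted above lists only the three options that earlier releases knew), and fill_template_copy is a hypothetical helper name.

```python
import shutil

import pandas as pd


def fill_template_copy(df, template_path, result_path, sheet_name, startrow=1):
    """Copy the template, then write df into an existing sheet of the copy.

    Hypothetical helper; startrow=1 leaves the template's header row alone.
    """
    # The template itself is never opened for writing.
    shutil.copyfile(template_path, result_path)

    with pd.ExcelWriter(result_path, engine="openpyxl", mode="a",
                        if_sheet_exists="overlay") as writer:
        # overlay writes into the existing sheet instead of creating a new one
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    header=False, index=False)
```

The sheet named by sheet_name should already exist in the template. Whether formulas and pivot tables survive the round trip through openpyxl is worth checking on the real template before relying on this.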

Overwrite a sheet in excel using python

I'm trying to overwrite one sheet of my excel file with data from a .txt file. The excel file I'm bringing the data into has several sheets but I only want to overwrite the 'Previous Month' sheet. Every time I run this code and open the excel file only the previous month sheet is there and nothing else. Many solutions on here show how to add more sheets, I'm trying to update an already existing sheet in an excel with 8 sheets total.
How can I fix my code so that only the one sheet is edited but all of them stay there?
import pandas as pd
#importing previous month data#
writer = pd.ExcelWriter('file.xlsx')
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name='Previous Month', startrow=4, startcol=2)
writer.save()
writer.close()
Edited code: whatever is happening here keeps corrupting my original file.
import pandas as pd
import openpyxl
#importing previous month data#
writer= pd.ExcelWriter('file.xlsx', mode= 'a', engine="openpyxl", if_sheet_exists="replace")
df = pd.read_csv('file.txt', sep='\t')
df.to_excel(writer, sheet_name="Previous Month", startrow=2, startcol=4)
writer.save()
writer.close()
You can use openpyxl.load_workbook() to do what you are looking for. I tried the suggestions above, but they didn't work for me, whereas load_workbook() usually runs without issues, so I hope it works for you as well.
I open the output file with load_workbook(), delete the existing sheet (Sheet2 here) if it exists, then create the sheet again with create_sheet() and write the data using dataframe_to_rows (Ref). Let me know if you have questions or issues.
import pandas as pd
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

df = pd.read_csv('file.txt', sep='\t')
wb = openpyxl.load_workbook('output.xlsx')  # Open workbook
if "Sheet2" in wb.sheetnames:  # If sheet exists, delete it
    del wb['Sheet2']
ws = wb.create_sheet(title='Sheet2')  # Create new sheet
rows = dataframe_to_rows(df, index=False, header=True)  # Convert dataframe to rows
for r_idx, row in enumerate(rows, 1):
    for c_idx, value in enumerate(row, 1):
        # 2 and 4 are the offsets, similar to the startrow and startcol in your code
        ws.cell(row=r_idx + 2, column=c_idx + 4, value=value)
wb.save('output.xlsx')
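For comparison, here is a pandas-only sketch of the same "replace one sheet, keep the rest" idea. It assumes pandas 1.3 or later with openpyxl installed, and replace_sheet is a hypothetical helper name. The with block saves the workbook once on exit, so there are no separate save() and close() calls.

```python
import pandas as pd


def replace_sheet(df, path, sheet_name, startrow=0, startcol=0):
    """Rewrite one sheet of an existing workbook, leaving the others in place.

    Hypothetical helper; startrow and startcol are 0-based, as in to_excel.
    """
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        # "replace" drops the old sheet contents before writing the new data
        df.to_excel(writer, sheet_name=sheet_name, startrow=startrow,
                    startcol=startcol, index=False)
```

For the question above this would be called as replace_sheet(df, 'file.xlsx', 'Previous Month', startrow=4, startcol=2). As with any in-place rewrite, it is safer to try it on a copy of the workbook first.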

Pandas create a new sheet instead of adding the data in the active one

I am creating a spreadsheet with openpyxl and adding some data.
import pandas as pd
import numpy as np
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from openpyxl import load_workbook
from collections import OrderedDict
workbook = Workbook()
sheet = workbook.active
def fill_static_values():
    sheet["A1"] = "Run No."
    sheet["A2"] = "MLIDMLPA"
    sheet["A48"] = "Patients here"
    sheet["B1"] = "Patient"

fill_static_values()
output = "./Name_of_run.xlsx"
workbook.save(filename=output)
Then my application does some data management, and I want to add some of this data to the existing file.
book = load_workbook(output)
writer = pd.ExcelWriter(output, engine='openpyxl')
writer.book = book
## ExcelWriter for some reason uses writer.sheets to access the sheet.
## If you leave it empty it will not know that sheet Main is already there
## and will create a new sheet.
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
data_no_control.to_excel(writer, "sheet", startrow=2, startcol=3,
header=False,
index=False)
writer.save()
Solution found on this StackOverflow link
However, this adds the data in the correct position but in a new sheet called sheet2. What am I doing wrong?
The to_excel call has the wrong sheet name: the S should be capitalized. Change the line from
data_no_control.to_excel(writer, "sheet", startrow=2, startcol=3,
to
data_no_control.to_excel(writer, "Sheet", startrow=2, startcol=3,
As there is already a sheet in the Excel file, the data is being written to Sheet2.
EDIT
I noticed that you are using writer.sheets. If you want the program to pick up the first sheet from the Excel file automatically, you can use this as well:
data_no_control.to_excel(writer, sheet_name=list(writer.sheets.keys())[0], startrow=2, startcol=3,
This will pick up the first sheet (in your case the only sheet) as the worksheet to update
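To see where the name "Sheet" comes from, you can check the title that openpyxl gives the sheet of a new Workbook, and read the first sheet name back from a saved file. This is a small sketch; default_sheet_title and first_sheet_name are hypothetical helper names.

```python
from openpyxl import Workbook, load_workbook


def default_sheet_title():
    """Title openpyxl assigns to the sheet of a brand-new Workbook."""
    return Workbook().active.title


def first_sheet_name(path):
    """Name of the first sheet in an existing workbook file."""
    return load_workbook(path).sheetnames[0]
```

Passing the result of first_sheet_name(output) as sheet_name avoids hard-coding the title and any capitalization mistakes.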

Appending to a sheet in Excel creates a new sheet instead of appending

I am trying to use this code to append a dataframe to an existing sheet in Excel, but instead of appending the new data to it, it creates a new sheet. Here is the code:
import pandas as pd
import openpyxl as op
df = ['normal_dataframe']
with pd.ExcelWriter('test.xlsx', engine='openpyxl', mode='a') as writer:
    df.to_excel(writer, sheet_name='Sheet1', header=False, index=False)
'test.xlsx' has a 'Sheet1', but after the file is appended to, there are 2 sheets: 'Sheet1' and 'Sheet11'.
One approach with COM:
import win32com.client
xl = win32com.client.Dispatch("Excel.Application")
path = r'c:\Users\Alex20\Documents\test.xlsx'
wb = xl.Workbooks.Open(path)
ws = wb.Worksheets("Sheet1")
ws.Range("E9:F10").Value = [[9,9],[10,10]]
wb.Close(True)
xl.Quit()
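The COM approach needs Windows with Excel installed. A rough cross-platform equivalent can be sketched with openpyxl; this is an assumption on my part rather than something from the answer above, and write_block is a hypothetical helper name.

```python
from openpyxl import load_workbook


def write_block(path, sheet_name, top_row, left_col, values):
    """Write a 2D list into an existing sheet, starting at (top_row, left_col).

    Hypothetical helper; rows and columns are 1-based, as in Excel.
    """
    wb = load_workbook(path)
    ws = wb[sheet_name]
    for r, row in enumerate(values):
        for c, value in enumerate(row):
            ws.cell(row=top_row + r, column=left_col + c, value=value)
    wb.save(path)
```

write_block('test.xlsx', 'Sheet1', 9, 5, [[9, 9], [10, 10]]) fills the same E9:F10 range as the COM example, since E is column 5.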

Pandas write to different sheet

I have the following code:
import pandas
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter("C:/adhoc/test.xlsx", engine='xlsxwriter')
data.to_excel(writer, sheet_name='Test')
writer.save()
I have two sheets, Sheet1 and Test. When I run the code it is deleting Sheet1 and just writing the data onto Test. What am I doing wrong here? Expected output I want is to not write anything on Sheet1 and have the data written to Test.
You need to use append as the file mode in ExcelWriter, but append mode is not supported by xlsxwriter.
To append, specify openpyxl as the engine.
This will write the data to the Test sheet and leave Sheet1 as it is.
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
writer = pandas.ExcelWriter(file_path, engine='openpyxl', mode='a')
data.to_excel(writer, sheet_name='Test')
writer.close()  # close() also saves; ExcelWriter.save() was removed in pandas 2.0
Alternatively, you can use a context manager here:
import pandas
file_path = "C:/adhoc/test.xlsx"
data = pandas.DataFrame(dataset)
with pandas.ExcelWriter(file_path, engine='openpyxl', mode='a') as writer:
    data.to_excel(writer, sheet_name='Test')
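Append mode needs the file to exist already, and it complains if the target sheet is already there. A sketch that covers both cases could look like this, assuming pandas 1.3 or later with openpyxl installed; write_sheet is a hypothetical helper name.

```python
from pathlib import Path

import pandas as pd


def write_sheet(df, path, sheet_name):
    """Write df to sheet_name, keeping any other sheets already in the file.

    Hypothetical helper: appends when the file exists, creates it otherwise.
    """
    if Path(path).exists():
        # if_sheet_exists is only accepted in append mode
        kwargs = {"mode": "a", "if_sheet_exists": "replace"}
    else:
        kwargs = {"mode": "w"}

    with pd.ExcelWriter(path, engine="openpyxl", **kwargs) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

With this, write_sheet(data, "C:/adhoc/test.xlsx", "Test") can be run repeatedly: Sheet1 stays untouched and the Test sheet is rewritten each time.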
