Struggling to append dataframe to existing .xlsx file in Python - python

I am trying to append a dataframe to an existing excel spreadsheet, but I am having trouble appending it to an existing SHEET (my excel file only has one sheet, titled "Sheet1," that contains the existing dataset).
with pd.ExcelWriter(xlsx_path, mode="a", engine="openpyxl", sheet_name="Sheet1", if_sheet_exists="overlay") as writer:
    transfer.to_excel(writer, header=None, index=False)
When I use the aforementioned code, when I open the existing spreadsheet, the new data from the dataframe I requested to be appended via the to_excel function appears in a separate sheet, entitled "Sheet 11." Can someone elucidate why this is occurring? How can I just get the new data from the dataframe to appear at the bottom of the existing spreadsheet in Sheet1?
Thanks!
Refer to notes written above.

I don't know why the data is appended to 'Sheet11'. However, 'sheet_name' is not a parameter of ExcelWriter, so you should get a warning about that. It should be passed to 'to_excel' instead.
You also need to state which row to append from; otherwise the new data starts at row 1 and overwrites any existing data. You can get the max row for the sheet and use that.
sheet_to_update = 'Sheet1'
with pd.ExcelWriter(xlsx_path,
                    mode="a",
                    engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    transfer.to_excel(writer,
                      header=None,
                      index=False,
                      sheet_name=sheet_to_update,
                      startrow=writer.sheets[sheet_to_update].max_row)
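
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes pandas 1.4 or later (needed for if_sheet_exists="overlay") with openpyxl installed. The helper name append_to_sheet, the file name demo_append.xlsx and the sample data are made up for illustration.

```python
import os
import tempfile

import pandas as pd


def append_to_sheet(df, xlsx_path, sheet_name="Sheet1"):
    """Write df below the last used row of an existing sheet (hypothetical helper)."""
    with pd.ExcelWriter(xlsx_path, mode="a", engine="openpyxl",
                        if_sheet_exists="overlay") as writer:
        # max_row is 1-based and startrow is 0-based, so max_row is
        # already the index of the first free row.
        first_free_row = writer.sheets[sheet_name].max_row
        df.to_excel(writer, sheet_name=sheet_name, header=False,
                    index=False, startrow=first_free_row)


# Hypothetical demo data: an existing workbook plus one row to append.
xlsx_path = os.path.join(tempfile.mkdtemp(), "demo_append.xlsx")
pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_excel(xlsx_path, index=False)
transfer = pd.DataFrame({"a": [5], "b": [6]})

append_to_sheet(transfer, xlsx_path)
result = pd.read_excel(xlsx_path)
print(result)
```

One caveat: openpyxl reports max_row as 1 even for a completely empty sheet, so on an empty sheet the first write would land on the second row.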

Related

How to export python dataframe into existing excel sheet and retain formatting?

I am trying to export a dataframe I've generated in Pandas to an Excel Workbook. I have been able to get that part working, but unfortunately no matter what I try, the dataframe goes into the workbook as a brand new worksheet.
What I am ultimately trying to do here is create a program that pulls API data from a website and imports it in an existing Excel sheet in order to create some sort of "live updating excel workbook". This means that the worksheet already has proper formatting, vba, and other calculated columns applied, and all of this would ideally stay the same except for the basic data in the dataframe I'm importing.
Is there any way to go about this? Any direction at all would be quite helpful. Thanks.
Here is my current code:
file = 'testbook.xlsx'
writer = pd.ExcelWriter(file, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Sheet1")
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
writer.close()
If your existing Excel file and your DataFrame have the same columns, you can read the existing file into another DataFrame, concatenate the two, and then save the result to a new Excel file or back to the existing one.
df1 = pd.read_excel('testbook.xlsx')  # existing data
df2 = df  # your DataFrame
combined = pd.concat([df1, df2], ignore_index=True)
combined.to_excel('testbook.xlsx', index=False)
There are multiple ways of doing this; this one works if you want to do it entirely with the pandas library. Note that it rewrites the whole file, so formatting, VBA and formulas in the original workbook are not kept.
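
If the formatting has to survive, one option is to skip to_excel and write the values into the existing worksheet with openpyxl directly. The sketch below is only an illustration under assumptions: the file name testbook_demo.xlsx, the bold header and the sample DataFrame are invented so the example can run on its own, and the data is assumed to start in cell A2.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font
from openpyxl.utils.dataframe import dataframe_to_rows

# Hypothetical formatted workbook standing in for the existing file.
path = os.path.join(tempfile.mkdtemp(), "testbook_demo.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["name", "value"])
ws["A1"].font = Font(bold=True)
wb.save(path)

df = pd.DataFrame({"name": ["x", "y"], "value": [1, 2]})

# Load the existing workbook, write the values cell by cell, save it back.
# For .xlsm files, load_workbook(path, keep_vba=True) keeps the macros.
wb = load_workbook(path)
ws = wb["Sheet1"]
rows = dataframe_to_rows(df, index=False, header=False)
for r, row in enumerate(rows, start=2):        # row 2: just below the header
    for c, value in enumerate(row, start=1):   # column 1 is column A
        ws.cell(row=r, column=c, value=value)
wb.save(path)

# Reload to confirm the header formatting is still there.
check = load_workbook(path)["Sheet1"]
print(check["A1"].font.bold, check["A2"].value, check["B3"].value)
```

Only the cell values are touched, so styles already applied to those cells stay in place.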

Overwrite existing Excel data on existing sheet using Pandas dataframes?

Hope you can help me out. I have been searching for already a long time but cannot get this working.
I have loaded a dataset from an Excel file using Pandas, made some changes, and now I want to write the updated data back to the same Excel file.
My understanding is that pd.ExcelWriter should be able to do this according to the documentation. I also want the dataset written starting from a specific row and column position, leaving the rest of the Excel sheets intact.
The problem I have is that the code writes the dataset to Excel on a new blank sheet, instead of the specified sheetname: "SheetX". The new blank sheet is called "SheetX1".
I have searched Google and also many similar topics on this website, but I cannot find a solution that works.
In summary: I want to write into an existing worksheet of an existing Excel workbook, overwriting the data from the specified starting row and column.
Many thanks in advance if you can help me out with this one.
with pd.ExcelWriter("Excel1.xlsx", engine="openpyxl", mode="a") as writer:
    df1.to_excel(writer, sheet_name="SheetX", startrow=5, startcol=8)
Please let me know if you need any more clarification on this. Happy to answer.
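You could do it like this:
with pd.ExcelWriter("Excel1.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df1.to_excel(writer, sheet_name="SheetX", startrow=5, startcol=8)
Note that 'replace' deletes the existing "SheetX" and recreates it, so any other data on that sheet is lost. If the rest of the sheet has to stay intact, if_sheet_exists="overlay" (pandas 1.4 and later) writes into the existing sheet instead.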
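
To see what happens to the rest of the sheet, here is a small sketch using if_sheet_exists="overlay" rather than "replace". It assumes pandas 1.4 or later with openpyxl; the file overlay_demo.xlsx, the "keep me" cell and the sample DataFrame are hypothetical.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical workbook with one cell that must survive the write.
path = os.path.join(tempfile.mkdtemp(), "overlay_demo.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "SheetX"
ws["A1"] = "keep me"
wb.save(path)

df1 = pd.DataFrame({"x": [10, 20]})

with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    # startrow/startcol are 0-based, so (5, 8) is cell I6.
    df1.to_excel(writer, sheet_name="SheetX", index=False,
                 startrow=5, startcol=8)

book = load_workbook(path)
sheet = book["SheetX"]
print(book.sheetnames, sheet["A1"].value, sheet["I6"].value, sheet["I7"].value)
```

With "overlay" the header lands in I6 and the data below it, while A1 is left untouched and no "SheetX1" is created.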

Writing two worksheets into a new excel file

I am trying to find all the duplicates using pandas, and I have managed to do so. However, there are multiple worksheets in the Excel file, and I would like to repeat the process for every worksheet. The final Excel file should contain all the new data without duplicates, with each result placed on its own worksheet. I am currently stuck: my code loops through the sheets, but the result only contains the last looped worksheet. It would be great if anyone could enlighten me on this issue. Below is my code:
final_audited_filepath = '<file_path>\\test12.xlsx'
x1 = pd.ExcelFile(final_audited_filepath)
writer = pd.ExcelWriter("<file_path>\\test123.xlsx")
for sheet in x1.sheet_names:  # scan for the number of worksheets in the excel
    data = pd.read_excel(final_audited_filepath, sheet_name=sheet)
    data_first_record = data.drop_duplicates(subset=['Reference ID', 'Check Description'], keep="first")
    data_first_record.to_excel(writer, index=False, sheet_name=sheet)
To write more than one sheet to a workbook or Excel file using Pandas, you need to use an ExcelWriter object:
with pd.ExcelWriter('output.xlsx') as f:
    df1.to_excel(f, sheet_name='sheet 1')
    df2.to_excel(f, sheet_name='sheet 2')
After dropping the duplicates, create separate data frames (df1, df2 and so on) for the different worksheets and try the code above.
See the pandas to_excel documentation for details.
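
Applied to the loop from the question, the same idea might look like the sketch below. It is only an illustration: the temporary files test12_demo.xlsx and test123_demo.xlsx and the sample rows are made up so the example can run on its own. The column names are the ones from the question.

```python
import os
import tempfile

import pandas as pd

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "test12_demo.xlsx")    # hypothetical input file
dst = os.path.join(tmp, "test123_demo.xlsx")   # hypothetical output file

# Hypothetical input: two worksheets, each containing a duplicate row.
sample = pd.DataFrame({
    "Reference ID": [1, 1, 2],
    "Check Description": ["a", "a", "b"],
})
with pd.ExcelWriter(src) as writer:
    sample.to_excel(writer, index=False, sheet_name="first")
    sample.to_excel(writer, index=False, sheet_name="second")

# sheet_name=None reads every worksheet into a dict of {name: DataFrame}.
sheets = pd.read_excel(src, sheet_name=None)

# One writer for the whole loop; the with block saves the file on exit.
with pd.ExcelWriter(dst) as writer:
    for name, data in sheets.items():
        deduped = data.drop_duplicates(
            subset=["Reference ID", "Check Description"], keep="first")
        deduped.to_excel(writer, index=False, sheet_name=name)

result = pd.read_excel(dst, sheet_name=None)
print({name: len(frame) for name, frame in result.items()})
```

Keeping a single writer open for the whole loop and letting the with block close it is what puts every worksheet into the output file.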

using pandas.DataFrame.to_excel without creating a new sheet

I would like to know if there is a way to paste a dataframe into Excel using pandas.DataFrame.to_excel and keep the data in an existing sheet, without erasing the data that is already there. I need this because many dataframes are created in a loop, so I need to store this information together by appending the content. Here is some code:
lista = [1, 2, 3, 4, 5, 6]
with pd.ExcelWriter('teste.xlsx', mode='a') as writer:
    pd.DataFrame(lista).to_excel(writer, header=False, index=False, startrow=3, startcol=1,
                                 sheet_name="Hoja1")
The result is very annoying because even though I pass "Hoja1" as the sheet_name it ends up creating another sheet with a similar name.
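
A possible way to handle the loop case is sketched below, assuming pandas 1.4 or later (for if_sheet_exists="overlay") and openpyxl. The file teste_demo.xlsx, the "existing" cell and the two sample frames are hypothetical stand-ins for the real workbook and the dataframes produced in the loop.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Hypothetical existing workbook with some data already in "Hoja1".
path = os.path.join(tempfile.mkdtemp(), "teste_demo.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Hoja1"
ws["B1"] = "existing"
wb.save(path)

# Stand-ins for the dataframes created in the loop.
frames = [pd.DataFrame([1, 2, 3]), pd.DataFrame([4, 5, 6])]

with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    sheet = writer.sheets["Hoja1"]
    for frame in frames:
        # max_row grows after each write, so every frame lands
        # directly below the previous one.
        frame.to_excel(writer, sheet_name="Hoja1", header=False,
                       index=False, startrow=sheet.max_row, startcol=1)

column_b = [cell.value for cell in load_workbook(path)["Hoja1"]["B"]]
print(column_b)
```

Opening the writer once and reusing it for every frame avoids reloading and rewriting the workbook on each iteration.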

unable to append pandas Dataframe to existing excel sheet

I am quite new to Python/Pandas. I have a situation where I have to update an existing sheet with new data every week. This 'new' data is processed from raw csv files that are generated every week, and I have already written Python code to generate it as a pandas DataFrame. Now I want to append this DataFrame to an existing sheet in my Excel workbook. I am using the code below to write the DataFrame to a specific sheet in the workbook.
workbook_master = openpyxl.load_workbook(r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx')
writer = pandas.ExcelWriter(r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx', engine='openpyxl', mode='a')
df_latency.to_excel(writer, sheet_name='Latency', startrow=workbook_master['Latency'].max_row, startcol=0, header=False, index=False)
writer.save()
writer.close()
Now the problem is that when I run the code and open the Excel file, instead of writing the DataFrame to the existing sheet 'Latency', the code creates a new sheet 'Latency1' and writes the DataFrame to it. The contents and the positioning of the DataFrame are correct, but I do not understand why the code creates a new sheet 'Latency1' instead of writing into the existing sheet 'Latency'.
I would greatly appreciate any help here.
Thanks
Faheem
By default, when ExcelWriter is instantiated, it assumes a new empty workbook with no worksheets, so it does not know about the sheets that already exist in the file.
So when you try to write data into 'Latency', it creates a new blank worksheet instead. In addition, the openpyxl library performs a check before writing to "avoid duplicate names" (see the openpyxl docs), which numerically increments the sheet name, so the data ends up in 'Latency1' instead.
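To get around this problem, point the writer at the workbook you loaded and copy its existing worksheets into the ExcelWriter.sheets attribute after the writer is created.
Like this:
writer.book = workbook_master
writer.sheets = dict((ws.title, ws) for ws in workbook_master.worksheets)
This relies on writer.book and writer.sheets being assignable, which may no longer be the case in recent pandas releases. From pandas 1.4 on, passing if_sheet_exists="overlay" to ExcelWriter writes into the existing sheet without this workaround.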
