Overwrite existing Excel data on existing sheet using Pandas dataframes?


Hope you can help me out. I have been searching for a long time but cannot get this working.
I loaded a dataset from an Excel file with Pandas, made some changes, and now I want to write the updated data back to the same Excel file.
My understanding is that pd.ExcelWriter should be able to do this according to the documentation. I also want the dataset written starting at a specific row and column position, leaving the rest of the Excel sheets intact.
The problem is that the code writes the dataset to a new blank sheet instead of the specified sheet name, "SheetX". The new blank sheet is called "SheetX1".
I have searched Google and many similar topics on this website, but I cannot find a solution that works.
In summary: I want to overwrite data in an existing worksheet of an existing Excel workbook, starting at a specified row and column.
Many thanks in advance if you can help me out with this one.
with pd.ExcelWriter("Excel1.xlsx", engine="openpyxl", mode="a") as writer:
    df1.to_excel(writer, sheet_name="SheetX", startrow=5, startcol=8)
Please let me know if you need any more clarification on this. Happy to answer.

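You could do it like this:
with pd.ExcelWriter("Excel1.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df1.to_excel(writer, sheet_name="SheetX", startrow=5, startcol=8)
Note that "replace" deletes the existing contents of "SheetX" before writing. To keep the rest of that sheet intact, pass if_sheet_exists="overlay" instead (available since pandas 1.4).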
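For reference, here is a minimal sketch of the overlay variant, assuming pandas 1.4 or later with openpyxl installed. The temporary workbook, the "keep me" cell, and the contents of df1 are made up for illustration; only the sheet name and start position come from the question.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook

# Build a throwaway workbook that stands in for "Excel1.xlsx" (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "Excel1.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "SheetX"
ws["A1"] = "keep me"  # a cell outside the range that will be written
wb.save(path)

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# "overlay" writes into the existing sheet and leaves untouched cells alone.
with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                    if_sheet_exists="overlay") as writer:
    df1.to_excel(writer, sheet_name="SheetX", startrow=5, startcol=8,
                 index=False)

# Re-open the file to inspect the result.
check = load_workbook(path)
sheet_names = check.sheetnames
kept_value = check["SheetX"]["A1"].value
# startrow/startcol are 0-based, so the header lands in row 6, column 9 (I6).
header_cell = check["SheetX"].cell(row=6, column=9).value
first_value = check["SheetX"].cell(row=7, column=9).value
```

No "SheetX1" sheet should appear, and the cell written before the append should still hold its value.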

Related

Struggling to append dataframe to existing .xlsx file in Python

I am trying to append a dataframe to an existing Excel spreadsheet, but I am having trouble appending it to an existing sheet (my Excel file has only one sheet, titled "Sheet1", which contains the existing dataset).
with pd.ExcelWriter(xlsx_path, mode="a", engine="openpyxl", sheet_name="Sheet1", if_sheet_exists="overlay") as writer:
    transfer.to_excel(writer, header=None, index=False)
When I run this code and open the existing spreadsheet, the new data from the dataframe appears in a separate sheet titled "Sheet11". Can someone explain why this is occurring? How can I get the new data from the dataframe to appear at the bottom of the existing data in Sheet1?
Thanks!
Refer to notes written above.

I don't know why the data is appended to 'Sheet11'. However, 'sheet_name=' is not a parameter of ExcelWriter, so you should get a warning about that. It should be passed to 'to_excel' instead.
You also need to state which row to append from; otherwise the new data will start at row 1 and overwrite the existing data. You can get the sheet's max row and use that.
sheet_to_update = 'Sheet1'
with pd.ExcelWriter(xlsx_path,
                    mode="a",
                    engine="openpyxl",
                    if_sheet_exists="overlay") as writer:
    transfer.to_excel(writer,
                      header=None,
                      index=False,
                      sheet_name=sheet_to_update,
                      startrow=writer.sheets[sheet_to_update].max_row)

How to export python dataframe into existing excel sheet and retain formatting?

I am trying to export a dataframe I've generated in Pandas to an Excel Workbook. I have been able to get that part working, but unfortunately no matter what I try, the dataframe goes into the workbook as a brand new worksheet.
What I am ultimately trying to do here is create a program that pulls API data from a website and imports it into an existing Excel sheet, in order to create a sort of "live updating Excel workbook". This means that the worksheet already has proper formatting, VBA, and other calculated columns applied, and all of this would ideally stay the same except for the basic data in the dataframe I'm importing.
Any way to go about this? Any direction at all would be quite helpful. Thanks.
Here is my current code:
file = 'testbook.xlsx'
writer = pd.ExcelWriter(file, engine='xlsxwriter')
df.to_excel(writer, sheet_name="Sheet1")
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
writer.save()  # use writer.close() in pandas 2.0 and later

If the existing Excel file and your DataFrame share the same columns, you can read the existing file into another DataFrame, concatenate the two, and save the result to a new file or over the existing one.
import pandas as pd
df1 = pd.read_excel('testbook.xlsx')  # existing data
df2 = df  # your DataFrame
combined = pd.concat([df1, df2], ignore_index=True)
combined.to_excel('testbook.xlsx', index=False)
There are multiple ways of doing this; this one works if you want to stay entirely within the pandas library. Note that it rewrites the file from scratch, so formatting in the original workbook is not carried over.
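Since the question asks to keep the sheet's existing formatting, one possible alternative is to open the workbook with openpyxl and set only the cell values. The sketch below is not the answer's method. It assumes openpyxl is installed, and the workbook contents, column names, and bold header are invented for the demonstration.

```python
import os
import tempfile

import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

# Stand-in for an existing, formatted "testbook.xlsx" (hypothetical contents).
path = os.path.join(tempfile.mkdtemp(), "testbook.xlsx")
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws.append(["id", "price"])
ws["A1"].font = Font(bold=True)  # formatting we want to survive the update
wb.save(path)

df = pd.DataFrame({"id": [1, 2], "price": [9.5, 7.25]})

# Load the real workbook and overwrite values cell by cell, below the header.
wb = load_workbook(path)
ws = wb["Sheet1"]
for r, row in enumerate(df.itertuples(index=False), start=2):
    for c, value in enumerate(row, start=1):
        ws.cell(row=r, column=c, value=value)
wb.save(path)

# Re-open to confirm the data arrived and the header style is unchanged.
check = load_workbook(path)["Sheet1"]
header_is_bold = check["A1"].font.bold
second_row = [cell.value for cell in check[2]]
```

Setting a cell's value this way leaves its style untouched. For a macro-enabled workbook (.xlsm), load_workbook accepts keep_vba=True, though how well a particular workbook round-trips is something to test on a copy first.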

XlsxWriter: Generate multi-worksheet workbook from separate python files?

I am writing a Python script using XlsxWriter to generate an .xlsx file comprising multiple worksheets. Each worksheet will have multiple tables and lots of formatting, so my code is getting pretty long. I am therefore looking for a way to split the code up, e.g. Worksheet 1 corresponding to worksheet1.py, with a 'main' file that compiles the worksheets into a single workbook.
I have tried using a function to create a worksheet and calling that from another file to add to an existing workbook, but this method does not work. XlsxWriter requires you to add the worksheet to an existing workbook. (If I'm missing something and this is possible, please let me know.)
Alternatively, I thought of creating individual workbooks with a single worksheet inside and using a second package (openpyxl) to collate the worksheets. However, I think this will alter the formatting on the worksheets. (Again, please let me know if I am missing something.)
Any ideas on this subject would be gratefully received.
Thanks
Edit: example table

Pandas can be very helpful in this case.
First, create a writer for your Excel file:
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
Create your tables as DataFrames, then write one to a sheet:
df.to_excel(writer, sheet_name='Sheet 1', startrow=0, startcol=0)
You can place the table on any worksheet in the workbook; just pass the sheet name as an argument.
To put another table on the same sheet:
df_1.to_excel(writer, sheet_name='Sheet 1', startrow=20, startcol=0)
Change startrow to control where the table begins, or change the sheet name to write to a different sheet.
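To tie this back to splitting the code across files, here is a rough sketch in which each sheet is built by its own function that receives the shared writer. The function names, sheet names, and data are hypothetical. The engine is left to pandas' default so the example runs with either xlsxwriter or openpyxl as the writer, and openpyxl is assumed to be installed for the final check.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook


def write_sales_sheet(writer):
    """Hypothetical builder that could live in its own module, e.g. worksheet1.py."""
    top = pd.DataFrame({"region": ["N", "S"], "sales": [10, 20]})
    bottom = pd.DataFrame({"region": ["E", "W"], "sales": [30, 40]})
    # Two tables on the same sheet, separated by choosing different start rows.
    top.to_excel(writer, sheet_name="Sales", startrow=0, startcol=0, index=False)
    bottom.to_excel(writer, sheet_name="Sales", startrow=20, startcol=0, index=False)


def write_costs_sheet(writer):
    """Second hypothetical builder, e.g. worksheet2.py."""
    costs = pd.DataFrame({"item": ["rent"], "cost": [5]})
    costs.to_excel(writer, sheet_name="Costs", index=False)


# The "main" file creates one writer and hands it to each builder in turn.
path = os.path.join(tempfile.mkdtemp(), "test.xlsx")
with pd.ExcelWriter(path) as writer:
    write_sales_sheet(writer)
    write_costs_sheet(writer)

# Inspect the saved workbook.
check = load_workbook(path)
sheet_names = check.sheetnames
# startrow=20 is 0-based, so the second table's header is in Excel row 21.
second_header = [cell.value for cell in check["Sales"][21]]
second_first_row = [cell.value for cell in check["Sales"][22]]
```

In a real project the two builder functions would be imported from their own modules. Each one only needs the writer object, so the workbook is still created in a single place.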

unable to append pandas Dataframe to existing excel sheet

I am quite new to Python/Pandas. I have to update an existing sheet with new data every week. This 'new' data is processed from raw CSV files that are generated every week, and I have already written Python code that produces it as a pandas DataFrame. Now I want to append this DataFrame to an existing sheet in my Excel workbook. I am using the code below to write the DataFrame to a specific sheet in the workbook.
workbook_master = openpyxl.load_workbook(r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx')
writer = pandas.ExcelWriter(r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx', engine='openpyxl', mode='a')
df_latency.to_excel(writer, sheet_name='Latency', startrow=workbook_master['Latency'].max_row, startcol=0, header=False, index=False)
writer.save()
writer.close()
The problem is that when I run the code and open the Excel file, instead of writing the DataFrame to the existing sheet 'Latency', the code creates a new sheet 'Latency1' and writes the DataFrame to it. The contents and positioning of the DataFrame are correct, but I do not understand why the code creates a new sheet 'Latency1' instead of writing into the existing sheet 'Latency'.
I would greatly appreciate any help here.
Thanks
Faheem

By default, when ExcelWriter is instantiated, it assumes a new, empty workbook with no worksheets.
So when you try to write data into 'Latency', it creates a new blank worksheet instead. In addition, the openpyxl library performs a check before writing to "avoid duplicate names" (see the openpyxl docs, line 18), which numerically increments the sheet name, so the data is written to 'Latency1' instead.
To work around this problem, copy the existing worksheets into the ExcelWriter.sheets attribute after the writer is created.
Like this:
writer.sheets = dict((ws.title, ws) for ws in workbook_master.worksheets)
In recent pandas versions (1.4 and later), passing if_sheet_exists="overlay" to ExcelWriter achieves the same result without touching writer.sheets.

python : Get Active Sheet in xlrd? and help for reading and validating excel file in Python

2 Questions to ask:
Ques 1:
I just started studying about xlrd for reading excel file in python.
I was wondering if there is a method in xlrd similar to get_active_sheet() in openpyxl, or any other way to get the active sheet?
get_active_sheet() works like this in openpyxl:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
active_sheet = wb.get_active_sheet()
output : Worksheet "Sheet1"
I had found methods in xlrd for retrieving the names of sheets, but none of them could tell me the active sheet.
Ques 2:
Is xlrd the best package in Python for reading Excel files? I also came across a page with info about other Python packages (xlsxwriter, xlwt, xlutils) for reading and writing Excel files.
Which of these would be best for making an app that reads an Excel file and applies different validations to different columns?
For example: the column with header 'ID' should have unique values, and the column with header 'Country' should contain valid countries.

The "active sheet" you're referring to seems to be the last sheet selected when the workbook was saved/closed. You can get this sheet via the sheet_visible value.
import xlrd
xl = xlrd.open_workbook("example.xls")
for sht in xl.sheets():
    # sht.sheet_visible value of 1 is "active sheet"
    print(sht.name, sht.sheet_selected, sht.sheet_visible)
Usually only one sheet is selected at a time, so it may look like sheet_visible and sheet_selected are the same, but multiple sheets can be selected at a time (ctrl+click multiple sheet tabs, for example).
Another reason this may seem confusing is because Excel uses "visible" in terms of hidden/visible sheets. In xlrd, this is instead sheet.visibility (see https://stackoverflow.com/a/44583134/4258124)

Welcome to Stack Overflow.
I have been working with Excel files in Python for a while now, so I could help you with your question, I think.
openpyxl and xlrd solve different problems, one is for xlsx files (Excel 2007+), where the other one is for xls files (Excel 1997-2003), respectively.
Xenon said in his answer that Excel doesn't recognize the concept of an active sheet, which is not totally true. If you open an Excel document, go to some other sheet (that isn't the first one) and save and close the document, the next time you open it, Excel will open the document on the last sheet you were on.
However, xlrd does not support this kind of workflow, i.e. asking for the active sheet. If you know the sheet name, then you could use the method sheet_by_name, or if you know the sheet index, you could use the method sheet_by_index.
I don't know if the xlrd is the best package around, but it is pretty solid, and I have had nary a problem using it.
The example given could be solved by first iterating through the first row and building a dictionary that maps each header to its column. Then store all the values of the ID column in a list and compare the length of that list with the length of a set created from it, i.e. len(values) == len(set(values)). Following that, iterate through the column with the header Country and check whether each value is in a dictionary you previously built of all the valid countries.
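The steps described above could be sketched roughly as follows. This is only an illustration: validate_rows, the sample rows, and the set of valid countries are made up, and the sheet is represented as a plain list of rows so the example runs without any Excel library.

```python
def validate_rows(rows, valid_countries):
    """Return a list of error messages for a sheet given as a list of rows."""
    header, *data = rows
    # Map each header name to its column index.
    col = {name: idx for idx, name in enumerate(header)}
    errors = []

    # Uniqueness check: a set drops duplicates, so the lengths differ if any exist.
    ids = [row[col["ID"]] for row in data]
    if len(ids) != len(set(ids)):
        errors.append("ID column contains duplicate values")

    # Membership check for each country; row numbers are 1-based, header is row 1.
    for line_no, row in enumerate(data, start=2):
        country = row[col["Country"]]
        if country not in valid_countries:
            errors.append(f"row {line_no}: unknown country {country!r}")
    return errors


# Hypothetical sheet contents.
rows = [
    ["ID", "Country"],
    [1, "France"],
    [2, "Atlantis"],
    [2, "Japan"],
]
errors = validate_rows(rows, valid_countries={"France", "Japan"})
print(errors)
```

With xlrd, the rows list could be built from sheet.row_values(i) for each i in range(sheet.nrows).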
I hope this answer suits your needs.
Summary: Stick with xlrd because it is mature enough.

You can see all worksheets in a given workbook with the sheet_names() function. Excel has no concept of an "active sheet", but if my assumption that you are referring to the first sheet is correct, you can get the first element of sheet_names() to get the "active sheet."
With regards to your second question, it's not easy to say that a package is better than another package objectively. However, xlrd is widely used, and the most popular Python library for what it does.
I would recommend sticking with it.
