XlsxWriter: Generate multi-worksheet workbook from separate python files? - python

I am writing a Python script using XlsxWriter to generate an .xlsx file comprising multiple worksheets. Each worksheet will have multiple tables and lots of formatting, so my code is getting pretty long. I am therefore looking for a way to split the code up, e.g. Worksheet 1 corresponding to worksheet1.py, with a 'main' file to compile the worksheets into a single workbook.
I have tried using a function to create a worksheet and calling that from another file to add to an existing workbook - but this method does not work. XlsxWriter requires you to add the worksheet to an existing workbook. (If I'm missing something and this is possible please let me know).
Alternately, I thought of creating individual workbooks with a single worksheet inside and using a second package (openpyxl) to collate the worksheets. However, I think this will alter the formatting on the worksheets. (Again, please let me know if I am missing something).
Any ideas on this subject would be greatly received
Thanks
Edit: example table

Pandas can be very helpful in this case.
First, create a writer for your Excel file:
import pandas as pd
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
Create your tables as DataFrames (check the DataFrame basics if these are new to you), then write one to a sheet:
df.to_excel(writer, sheet_name='Sheet 1', startrow=0, startcol=0)
The table can go into any worksheet of the workbook; just pass the sheet name as an argument.
To put another table in the same sheet:
df_1.to_excel(writer, sheet_name='Sheet 1', startrow=20, startcol=0)
Change startrow to control where the table begins, or change sheet_name to write to a different sheet. Call writer.close() at the end so the file is actually written.
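As for the original idea of one file per worksheet: XlsxWriter needs the worksheet to be created from the workbook, but nothing stops you from passing the workbook object into a function that lives in another module. Below is a minimal sketch of that pattern. The function names, sheet names and cell contents are made up for illustration, and both builders are kept in one file here only so the example runs on its own; in practice each would sit in its own module (e.g. worksheet1.py) and be imported by the main script.

```python
import os
import tempfile

import xlsxwriter


# In a real project each builder would live in its own module,
# e.g. worksheet1.py and worksheet2.py (hypothetical file names).
def build_summary_sheet(workbook):
    """Add a 'Summary' worksheet to an already-open workbook."""
    worksheet = workbook.add_worksheet("Summary")
    bold = workbook.add_format({"bold": True})
    worksheet.write(0, 0, "Region", bold)
    worksheet.write(0, 1, "Sales", bold)
    worksheet.write(1, 0, "North")
    worksheet.write(1, 1, 1200)
    return worksheet


def build_detail_sheet(workbook):
    """Add a 'Detail' worksheet to the same workbook."""
    worksheet = workbook.add_worksheet("Detail")
    worksheet.write_row(0, 0, ["Item", "Qty"])
    worksheet.write_row(1, 0, ["Widget", 3])
    return worksheet


def build_workbook(path):
    """The 'main' script: create the workbook once, hand it to each builder."""
    workbook = xlsxwriter.Workbook(path)
    for builder in (build_summary_sheet, build_detail_sheet):
        builder(workbook)
    sheet_names = [ws.get_name() for ws in workbook.worksheets()]
    workbook.close()
    return sheet_names


output_path = os.path.join(tempfile.mkdtemp(), "report.xlsx")
sheet_names = build_workbook(output_path)
```

Formats are created from the workbook too, so each builder can define its own with workbook.add_format(), or the main script can create shared formats and pass them in.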

Related

How to export python dataframe into existing excel sheet and retain formatting?

I am trying to export a dataframe I've generated in Pandas to an Excel Workbook. I have been able to get that part working, but unfortunately no matter what I try, the dataframe goes into the workbook as a brand new worksheet.
What I am ultimately trying to do here is create a program that pulls API data from a website and imports it in an existing Excel sheet in order to create some sort of "live updating excel workbook". This means that the worksheet already has proper formatting, vba, and other calculated columns applied, and all of this would ideally stay the same except for the basic data in the dataframe I'm importing.
Is there any way to go about this? Any direction at all would be quite helpful. Thanks.
Here is my current code:
file='testbook.xlsx'
writer = pd.ExcelWriter(file, engine = 'xlsxwriter')
df.to_excel(writer, sheet_name="Sheet1")
workbook = writer.book
worksheet = writer.sheets["Sheet1"]
writer.close()
If the existing Excel file and your DataFrame have the same layout, you can simply read the existing file into another DataFrame, concatenate the two, and save the result to a new file or over the existing one.
df1 = pd.read_excel('testbook.xlsx')
df2 = df  # your DataFrame
combined = pd.concat([df1, df2], ignore_index=True)
combined.to_excel('testbook.xlsx', index=False)
There are multiple ways of doing this; if you want to stay entirely within pandas, this will work. Note that to_excel rewrites the whole file, so existing formatting and VBA are not carried over.
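If the existing formatting has to survive, one alternative is to open the workbook with openpyxl and overwrite only the cell values. This is a rough sketch, not a tested recipe for the asker's file: it assumes the data sits on 'Sheet1' starting at row 2, the new_rows values are invented, and the first half only builds a stand-in workbook so the example can run by itself.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook
from openpyxl.styles import Font

path = os.path.join(tempfile.mkdtemp(), "testbook.xlsx")

# Build a small stand-in for the existing, pre-formatted workbook.
template = Workbook()
sheet = template.active
sheet.title = "Sheet1"
sheet["A1"] = "price"
sheet["A1"].font = Font(bold=True)
sheet["A2"] = 1.0
template.save(path)

# Hypothetical new data; with pandas this could come from df.values.tolist().
new_rows = [[10.5], [11.25]]

# Reopen the file and overwrite values only; cell styles are left alone.
wb = load_workbook(path)  # pass keep_vba=True for a macro-enabled .xlsm file
ws = wb["Sheet1"]
for row_offset, row in enumerate(new_rows):
    for col_offset, value in enumerate(row):
        ws.cell(row=2 + row_offset, column=1 + col_offset, value=value)
wb.save(path)

# Read the file back to confirm the header style survived the update.
check = load_workbook(path)["Sheet1"]
```

openpyxl does not round-trip every Excel feature, so it is worth testing on a copy of the real workbook first.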

How to copy one excel sheet template into multiple excel files using python

I am trying to copy an Excel spreadsheet, which has some text, numbers, and a logo image, into multiple Excel files as a new sheet with source formatting using Python. Any help is greatly appreciated.
It's not clear what your end requirement and restrictions are.
The basic requirement;
"I am trying to copy an excel spreadsheet which has some text and numbers and a logo image into multiple excel files"
just indicates you want multiple copies of the same Excel file, the File Management app of your OS can do this, with the limitation perhaps being the resultant naming of each file. If this is your requirement then perhaps something like
Python - making copies of a file
may help to create the files with the necessary naming.
If it's a workbook with multiple sheets and you only want one sheet copied to new workbooks, then openpyxl can help. Copying sheets is easy enough; however, the formatting requires extra code.
If there is just a couple of sheets in the original it may be easier to just remove those sheets before saving your copy, see example code below.
Otherwise this link may help
How to copy worksheet from one workbook to another one using openpyxl?
Example:
The following code shows how to open an existing workbook, 'templateA.xlsx', which has two sheets, 'Sheet1' and 'Sheet2'. You only want 'Sheet1' saved as multiple copies (10) named 'template1.xlsx', 'template2.xlsx', 'template3.xlsx'...
The code opens the original workbook and deletes the sheet called 'Sheet2' before making 10 copies.
from openpyxl import load_workbook
# Load the original Excel workbook
wb = load_workbook("templateA.xlsx")
# Delete Sheet2, which is not required
del wb["Sheet2"]
# Save 10 copies of the workbook with Sheet1 only (template1.xlsx ... template10.xlsx)
for i in range(1, 11):
    wb.save("template" + str(i) + ".xlsx")
The first link python-making-copies-of-a-file can help if your required naming is more complex than the example given.

Is it possible to append data to an xls file in Python?

I am trying to add a large dataset to an existing xls spreadsheet.
I'm currently writing to it using a pandas DataFrame and the .to_excel() function; however, this erases the existing data in the (multi-sheet) workbook. The existing spreadsheet is very large and complex, and it also interacts with several other files, so I can't convert it to xlsx or read and rewrite all of the data, as I've seen suggested on other questions. I want the data that I am adding to be pasted starting from a set row in an existing sheet.
You can use the XlsxWriter library (https://xlsxwriter.readthedocs.io), though be aware that it only creates new .xlsx files and cannot open or modify an existing workbook, so it does not append in place.
Code example:
import xlsxwriter
name = "MyFile" + ".xlsx"
workbook = xlsxwriter.Workbook(name)
worksheet = workbook.add_worksheet()
# In Python 3, str is already Unicode, so no .decode("utf-8") is needed
worksheet.write("A1", "Incident category")
worksheet.write("B1", "Longitude")
worksheet.write("C1", "Latitude")
workbook.close()

unable to append pandas Dataframe to existing excel sheet

I am quite new to Python/Pandas. I have to update an existing sheet with new data every week. This 'new' data is processed from raw CSV files that are generated weekly, and I have already written Python code that produces it as a pandas DataFrame. Now I want to append this DataFrame to an existing sheet in my Excel workbook. I am currently using the code below to write the DataFrame to a specific sheet of the workbook.
workbook_master=openpyxl.load_workbook(r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx')
writer=pandas.ExcelWriter(r'C:\Claro\Pre-Sales\E2E Optimization\Transport\Transport Network Dashboard.xlsx',engine='openpyxl',mode='a')
df_latency.to_excel(writer,sheet_name='Latency',startrow=workbook_master['Latency'].max_row,startcol=0,header=False,index=False)
writer.save()
writer.close()
The problem is that when I run the code and open the Excel file, instead of writing the DataFrame to the existing sheet 'Latency', the code creates a new sheet 'Latency1' and writes the DataFrame to it. The contents and positioning of the DataFrame are correct, but I do not understand why the code creates a new sheet 'Latency1' instead of writing into the existing sheet 'Latency'.
I would greatly appreciate any help here.
Thanks
Faheem
By default, when ExcelWriter is instantiated, it assumes a new Empty Workbook with no Worksheets.
So when you try to write data into 'Latency', it creates a new blank worksheet instead. In addition, the openpyxl library performs a check before writing to "avoid duplicate names" (see openpyxl docs: line 18), which numerically increments the sheet name, so the data is written to 'Latency1' instead.
To get around this problem, copy the existing worksheets into the ExcelWriter.sheets attribute after the writer is created.
Like this:
writer.sheets = dict((ws.title, ws) for ws in workbook_master.worksheets)
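On newer pandas releases (1.4 or later, as far as I know) assigning to writer.sheets may no longer be allowed, and the if_sheet_exists="overlay" option of ExcelWriter is meant for this case. Here is a small sketch of that approach. It uses a throwaway file and invented column names rather than the dashboard from the question, so treat it as an illustration only.

```python
import os
import tempfile

import pandas as pd
from openpyxl import load_workbook

path = os.path.join(tempfile.mkdtemp(), "dashboard.xlsx")

# Stand-in for the existing workbook: a 'Latency' sheet with a header and one row.
pd.DataFrame({"site": ["A"], "latency_ms": [12]}).to_excel(
    path, sheet_name="Latency", index=False
)

# Hypothetical new weekly data to append below the existing rows.
df_latency = pd.DataFrame({"site": ["B", "C"], "latency_ms": [15, 9]})

# max_row is 1-based and startrow is 0-based, so max_row is the first free row.
next_row = load_workbook(path)["Latency"].max_row

# mode="a" keeps the existing sheets; "overlay" writes into the sheet in place.
with pd.ExcelWriter(
    path, engine="openpyxl", mode="a", if_sheet_exists="overlay"
) as writer:
    df_latency.to_excel(
        writer, sheet_name="Latency", startrow=next_row, header=False, index=False
    )

result = load_workbook(path)
```

Using the writer as a context manager saves and closes the file on exit, so no separate writer.save() or writer.close() call is needed.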

python : Get Active Sheet in xlrd? and help for reading and validating excel file in Python

2 Questions to ask:
Ques 1:
I just started studying about xlrd for reading excel file in python.
I was wondering if there is a method in xlrd similar to get_active_sheet() in openpyxl, or any other way to get the active sheet?
get_active_sheet() works like this in openpyxl:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
active_sheet = wb.get_active_sheet()
output : Worksheet "Sheet1"
I had found methods in xlrd for retrieving the names of sheets, but none of them could tell me the active sheet.
Ques 2:
Is xlrd the best package in Python for reading Excel files? I also came across this, which had info about other Python packages (xlsxwriter, xlwt, xlutils) for reading and writing Excel files.
Which of these would be best for making an app that reads an Excel file and applies different validations to different columns?
For eg: Column with Header 'ID' should have unique values and A column with Header 'Country' should have valid Countries.
By "active sheet" you seem to be referring to the last sheet selected when the workbook was saved/closed. You can get this sheet via the sheet_visible value.
import xlrd
xl = xlrd.open_workbook("example.xls")
for sht in xl.sheets():
    # sht.sheet_visible value of 1 is "active sheet"
    print(sht.name, sht.sheet_selected, sht.sheet_visible)
Usually only one sheet is selected at a time, so it may look like sheet_visible and sheet_selected are the same, but multiple sheets can be selected at a time (ctrl+click multiple sheet tabs, for example).
Another reason this may seem confusing is because Excel uses "visible" in terms of hidden/visible sheets. In xlrd, this is instead sheet.visibility (see https://stackoverflow.com/a/44583134/4258124)
Welcome to Stack Overflow.
I have been working with Excel files in Python for a while now, so I could help you with your question, I think.
openpyxl and xlrd solve different problems, one is for xlsx files (Excel 2007+), where the other one is for xls files (Excel 1997-2003), respectively.
Xenon said in his answer that Excel doesn't recognize the concept of an active sheet, which is not totally true. If you open an Excel document, go to some other sheet (that isn't the first one) and save and close the document, the next time you open it, Excel will open the document on the last sheet you were on.
However, xlrd does not support this kind of workflow, i.e. asking for the active sheet. If you know the sheet name, then you could use the method sheet_by_name, or if you know the sheet index, you could use the method sheet_by_index.
I don't know if the xlrd is the best package around, but it is pretty solid, and I have had nary a problem using it.
The example given could be solved by first iterating through the first row and keeping a dictionary that maps each header to its column. Then store all the values in the ID column in a list and compare the length of that list with the length of a set created from it, i.e. len(values) == len(set(values)). Following that, you could iterate through the column with the header Country and check whether each value is in a dictionary you previously made with all the valid countries.
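To make that concrete, here is a small sketch of the two checks on plain Python lists. The rows, the validate function and the VALID_COUNTRIES set are all invented for illustration; with xlrd the rows would come from the sheet instead (for example via sheet.row_values), and a set is used for the country lookup where a dictionary would work just as well.

```python
# Rows as they might come back from a sheet: the first row is the header.
rows = [
    ["ID", "Name", "Country"],
    [1, "Ana", "Brazil"],
    [2, "Ben", "Canada"],
    [2, "Cy", "Atlantis"],
]

# Hypothetical whitelist of valid countries.
VALID_COUNTRIES = {"Brazil", "Canada", "India"}


def validate(rows, valid_countries):
    """Return a list of human-readable problems found in the rows."""
    header, data = rows[0], rows[1:]
    # Map each header name to its column index.
    column_of = {name: index for index, name in enumerate(header)}

    problems = []

    # Uniqueness check: a set drops duplicates, so the lengths differ if any exist.
    ids = [row[column_of["ID"]] for row in data]
    if len(ids) != len(set(ids)):
        problems.append("ID column contains duplicate values")

    # Membership check; spreadsheet row numbers start at 2 because row 1 is the header.
    for row_number, row in enumerate(data, start=2):
        country = row[column_of["Country"]]
        if country not in valid_countries:
            problems.append(f"row {row_number}: invalid country {country!r}")
    return problems


problems = validate(rows, VALID_COUNTRIES)
```

Collecting the problems in a list rather than raising on the first one lets the app report every issue in the file in a single pass.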
I hope this answer suits your needs.
Summary: Stick with xlrd because it is mature enough.
You can see all worksheets in a given workbook with the sheet_names() function. Excel has no concept of an "active sheet", but if my assumption that you are referring to the first sheet is correct, you can get the first element of sheet_names() to get the "active sheet."
With regards to your second question, it's not easy to say that a package is better than another package objectively. However, xlrd is widely used, and the most popular Python library for what it does.
I would recommend sticking with it.
