VBA inclusion when creating xls file with xlwt

VBA inclusion when creating xls file with xlwt - python

I tried searching for the answer to my question but most existing questions were mentioning editing an existing excel file. My question is, given that i have the VBA code that i would like to include in my excel file (it does not change/depend on what is in the excel file itself) is it possible to include it in the excel file created with xlwt? If so, how?
I feel like the answer to this would be obvious but with my searching i couldn't find anything relevant.
I am basically trying to avoid creating an excel file and then manually copy-pasting the code into the file (same applied to google sheets files)

Related

Adding the "isExportable" attribute to an excel xmlMaps.xml file

So for a previous question I asked how to add a custom made xmlMap to an excel file in python and I was "successful" by opening the xlsx file as an archive and extracting the file structure, followed by adding the xmlMaps.xml file to the structure and including it in the "rels". I can now open the excel file and see that the xml source map is attached, but I cannot export it. It mentions that I need to set the attribute "xmlmap.isExportable" to True, but I have no clue about how to do this, preferably using python.
All I have found on google is this: https://learn.microsoft.com/en-us/office/vba/api/excel.xmlmap.isexportable
My old question regarding the case: Adding XML Source to xlsx file in python
Any help is greatly appreciated
Best regards
Martin

Turns out I just needed to understand how a xlsx file works and how xml connections are stored and referenced throughout multiple sub files

Obtain textbox value inside shape from Excel in Python

I'm developing a function that allows users to upload .xls or .xlsx files to the server and save data from those files into a database.
I'm using openpyxl and xlrd libraries for reading data from Excel, but for some Excel files which contain text in textbook inside shapes, I'm currently unable to read those values.
I know maybe my question is a duplicate of this: Obtain textbox value from Excel in Python but the solution of the asker of that question is not a general solution.
Does any anyone know how to achieve this?

How can I update workbook links when using pd.read_excel()?

The question is pretty simple, actually.
I'm reading an Excel file using Pandas. When I open it using Office's Excel in my Desktop I'm prompted to Enable Content and then Update Links [that is, update values in those cells importing information from cells in other workbooks and xslx files], so it reads other files in some other folders.
While using pd.read_excel('filename') however that option is not available, and I'm afraid it's importing the data previously contained in the spreadsheet without updating it. Is there a workaround?

Embed CSV in Excel and import the data

I wrote a tool that extracts data from a large DB and outputs it to an Excel file along with (conditional) formatting to improve readability. For this I use Python with openpyxl on a Linux machine. It works great, but this package is rather slow for writing Excel.
It seems to be a lot quicker to dump the table as (compressed) csv, import that into Excel and apply formatting there using a macro/vba.
To automate the process I'd like to create an empty Excel file pre-loaded with the required VBA to do the formatting; a template. For every data dump, the data is embedded (compressed using deflate) into the Excel file and loaded into the Workbook upon opening the document (or using a "LOAD" button to circumvent macro related security things).
However, just adding some file into the Excel file raises an error when opened:
We found a problem with some content in 'Werkmap1_test_embed.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
Clicking Yes opens the file and shows some tracing information as XML:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<recoveryLog xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
<logFileName>Repair Result to Werkmap1_OLE_Word0.xml</logFileName>
<summary>Errors were detected in file '/Users/joostk/mnt/cluster/Werkmap1_OLE_Word.xlsx'</summary>
<additionalInfo>
<info>Excel completed file level validation and repair. Some parts of this workbook may have been repaired or discarded.</info>
</additionalInfo>
</recoveryLog>
Is it possible to avoid this? How would I embed a file into the Excel ZIP? Do I need to update some file table (which I could not file easily).
When that's done, I'd like to import the data. Can I access files in the Excel ZIP from VBA? I guess not, and I need to extract the data to some temporary path and load it from there.
I have found these helpful answers elsewhere to load ZIP and plain text:
https://stackoverflow.com/a/35781621/4998990
https://stackoverflow.com/a/11267603/4998990
Many thanks for sharing your thoughts!

so my "Answer" here is that this is caused by using Named Ranges, or an underlying table, or an embedded Query/Connection. When you start manipulating this file you will get the error that you are talking about:
There is no harm to the file if you click "yes" and open. Excel will open this in Repaired Mode which will require you to re-save the file.
The way I've worked around this is to re-read the "repaired" file, in python, and save it as another file or replace it. Essentially just do an extra step of re-reading the data into memory, and write it to a new file. The error will go away. As always, test this method before deploying to production to ensure no records are lost. The way I solve it is with two lines of pandas.
import pandas as pd
repair = pd.read_excel('PATH_TO_REPAIR_FILE')
new_file = repair.to_excel('PATH_TO_WHERE_NEW_FILE_GOES')

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.