How can I update workbook links when using pd.read_excel()? - python

The question is pretty simple, actually.
I'm reading an Excel file using pandas. When I open it in desktop Excel, I'm prompted to Enable Content and then Update Links (that is, update the values in cells that import information from cells in other workbooks and .xlsx files), so it reads other files in some other folders.
With pd.read_excel('filename'), however, that option isn't available, and I'm afraid it's importing the data previously stored in the spreadsheet without updating it. Is there a workaround?
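Some background on why this happens: pandas does not evaluate formulas or follow links. With the openpyxl engine it reads the cached value that Excel last saved in each cell, so linked cells hold whatever values they had at the last save in Excel. The stdlib-only sketch below lists which external files a workbook's links point to, so you know what would need refreshing. It assumes the usual .xlsx package layout, in which each link's target is stored in xl/externalLinks/_rels/externalLinkN.xml.rels; list_external_link_targets is a hypothetical helper name, not a pandas or openpyxl API.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OOXML package relationship files (*.rels)
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def list_external_link_targets(xlsx_path):
    """Return the targets of the external workbook links stored in an .xlsx file.

    Assumes the standard layout: one relationship file per external link under
    xl/externalLinks/_rels/, whose Target attribute is the linked file's path.
    """
    targets = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in sorted(zf.namelist()):
            if not (name.startswith("xl/externalLinks/_rels/") and name.endswith(".rels")):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                # TargetMode="External" marks a path outside this package
                if rel.get("TargetMode") == "External":
                    targets.append(rel.get("Target"))
    return targets
```

If this returns an empty list, the workbook has no external workbook links and pd.read_excel is reading only in-file values.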

Related

Python: Issue Updating Data Links In Excel After Python Dataframe Export

Situation
I'm working on a data project integrating Python in Google Colab and Excel 365 on Win 8.1. My Python code collects new data updates on a regular schedule and then exports/writes them (i.e. overwrites rather than appends the data) to a report in an Excel spreadsheet.
I have no issue getting this to work when writing to a standalone spreadsheet.
I know I could potentially do all this in Python and not use Excel at all, but I'd rather not reinvent the wheel or spend hours hardcoding all the formulas and links that already exist in Excel.
Goal
My goal is to:
1. Use new data from my Python export to populate/overwrite a data table on Sheet A in an existing Excel workbook.
2. Have a separate Sheet B in the same workbook perform calculations via pre-existing links to the data table on Sheet A, and have those links update automatically each time my Python export updates the table on Sheet A.
Problem
The issue I'm running into is that when I export the data with df.to_excel, even when I pass the sheet_name parameter, the export overwrites the data table and names the tab correctly, but wipes out every other pre-existing sheet in the workbook.
So I attempted a workaround: exporting to a separate workbook and then trying to update the links in the second workbook automatically. The problem is that the links don't appear to update unless both the source data file and the second workbook are opened manually and the updated file is then saved manually.
I tried using openpyxl to control the excel files but it appeared to have no effect on the files and no data was updated. (See code block and result at the end of this post.)
Assistance
Does anybody know a way to use python to:
1. Overwrite a specific sheet within an Excel workbook without wiping out the other existing sheets, and then have the links on another sheet that point to the new data update automatically?
Or
2. Auto-update external links between separate Excel workbooks without opening the files?
Or
3. Control an instance of excel that can open both files to allow the links to auto update and then save and close the files automatically?
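For option 1, the reason other sheets disappear is that df.to_excel with a file path writes a brand-new workbook. pandas can instead append to the existing file: pd.ExcelWriter with mode="a" and if_sheet_exists="replace" rewrites only the named sheet. This is a minimal sketch assuming pandas 1.3 or later with openpyxl installed; overwrite_sheet is a hypothetical helper name.

```python
import pandas as pd


def overwrite_sheet(path, df, sheet_name):
    """Replace one sheet's contents in an existing .xlsx and keep the other sheets.

    Assumes pandas >= 1.3 (for if_sheet_exists) and the openpyxl engine.
    """
    # mode="a" opens the existing workbook instead of creating a new one;
    # if_sheet_exists="replace" clears only the target sheet before writing.
    with pd.ExcelWriter(path, engine="openpyxl", mode="a",
                        if_sheet_exists="replace") as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

Two caveats: openpyxl keeps formulas on the other sheets as text but does not compute them, so their values appear only after Excel recalculates the file (reading those cells with pandas before then yields empty values). Also, content openpyxl does not support (its documentation mentions shapes, for example) can be lost on save, so try this on a copy first.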
I found a post from some years ago describing a win32 package for Python that appeared to be able to control instances of Excel. When I tried a pip install in Colab, I got an error saying the package was unrecognized or doesn't exist.
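The win32 package in question is most likely pywin32 (imported as win32com.client). It only works on Windows with Excel installed, which would explain why installing it in Colab (a Linux machine) fails. For option 3, a sketch of the idea: open the source and report workbooks in one Excel instance with link updating on, recalculate, save, and close. The Excel object is passed in as a parameter so the logic can be tested without Excel; refresh_and_save and UPDATE_LINKS_ON_OPEN are hypothetical names, while Workbooks.Open, AskToUpdateLinks, CalculateFull, Save, Close and Quit are Excel object-model members.

```python
import os

# Per Excel's Workbooks.Open documentation, UpdateLinks=3 means external
# references are updated when the workbook opens (0 means they are not).
UPDATE_LINKS_ON_OPEN = 3


def refresh_and_save(excel, paths):
    """Open workbooks in one Excel instance so links refresh, then save and close.

    excel: an Excel.Application COM object, for example
        win32com.client.DispatchEx("Excel.Application") from pywin32 (Windows only).
    paths: source workbooks first, then the workbooks that link to them.
    The Excel instance is shut down with Quit() at the end.
    """
    excel.DisplayAlerts = False      # no "save changes?" dialogs
    excel.AskToUpdateLinks = False   # update links without prompting
    opened = []
    try:
        for path in paths:
            # Pass absolute paths to be safe; the 2nd positional argument is UpdateLinks
            wb = excel.Workbooks.Open(os.path.abspath(path), UPDATE_LINKS_ON_OPEN)
            opened.append(wb)
        excel.CalculateFull()        # recalculate all formulas in all open workbooks
        for wb in opened:
            wb.Save()
    finally:
        for wb in reversed(opened):
            wb.Close(False)          # SaveChanges=False: already saved above
        excel.Quit()
```

On a Windows machine you would call it with win32com.client.DispatchEx("Excel.Application") and your own file paths; afterwards the saved cached values are current, so pd.read_excel picks up the refreshed numbers. This cannot run inside Colab itself.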
Ideally, I would prefer not to use VB if at all possible to solve this.
Any solutions are much appreciated.
Thanks in advance.
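Sample code that isn't producing any results:
from openpyxl import load_workbook
path = '/content/drive/MyDrive/Data/Series/AC5M.xlsx'
# keep_links=True preserves the external-link parts when the file is saved
wb = load_workbook(path, keep_links=True)
ws = wb.active
print(ws)
# save/close must be called on the wb instance; writing Workbook.save or
# Workbook.close only references the unbound methods and does nothing
wb.save(path)
wb.close()
# openpyxl never recalculates formulas or refreshes links, so the linked
# values stay as they were even after a successful save
Result reported for the version that wrote Workbook.save and Workbook.close without calling them: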
"function openpyxl.workbook.workbook.Workbook.close"
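Even with save() called correctly, openpyxl will not refresh anything, because it stores formulas but never calculates them. The sketch below demonstrates this: it writes a formula with openpyxl and reads it back both as a formula and with data_only=True (the cached-value mode that pandas' openpyxl reader uses). cached_vs_formula is a hypothetical helper name.

```python
from openpyxl import Workbook, load_workbook


def cached_vs_formula(path):
    """Write a formula with openpyxl, then read it back as text and as a cached value."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 2
    ws["A2"] = "=A1*10"              # openpyxl stores only the formula text
    wb.save(path)
    wb.close()

    formulas = load_workbook(path)                 # default: formula strings
    values = load_workbook(path, data_only=True)   # cached values saved by Excel
    return formulas.active["A2"].value, values.active["A2"].value
```

The cached value is None because no spreadsheet application has calculated the formula yet; only Excel (or another calculating application) opening and saving the file fills it in.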

Obtain textbox value inside shape from Excel in Python

I'm developing a function that allows users to upload .xls or .xlsx files to the server and save data from those files into a database.
I'm using the openpyxl and xlrd libraries to read data from Excel, but for some Excel files that contain text in text boxes inside shapes, I'm currently unable to read those values.
I know my question may be a duplicate of this one: Obtain textbox value from Excel in Python, but the solution the asker of that question found is not a general solution.
Does anyone know how to achieve this?
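For .xlsx files, shape and text box text lives in the drawing parts (xl/drawings/drawingN.xml) as DrawingML, with the visible text in a:t runs inside each shape's text body, so it can be read with the standard library. This is a sketch under assumptions: it does not map drawings back to sheet names (that requires the worksheet relationship files), it ignores legacy VML form-control text boxes, and it does not handle binary .xls files. shape_texts is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespaces used in xl/drawings/drawingN.xml
XDR = "{http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"


def shape_texts(xlsx_path):
    """Return {drawing part name: [text of each shape]} for an .xlsx file."""
    result = {}
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in sorted(zf.namelist()):
            # drawing XML parts only (skips the _rels/*.rels files)
            if not (name.startswith("xl/drawings/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            texts = []
            for sp in root.iter(XDR + "sp"):        # each shape / text box
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(A + "t"))
                    for p in sp.iter(A + "p")       # paragraphs in its text body
                ]
                if any(paragraphs):
                    texts.append("\n".join(paragraphs))
            if texts:
                result[name] = texts
    return result
```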

How to append dataframe to xlsx file without loading workbook?

I'm working with fairly large data and I need to write it to an .xlsx file; sometimes these files can reach 15 GB. My Python code receives the data as DataFrames and writes it to Excel continuously, so I need to write to an existing Excel file and the existing sheet. I was using openpyxl.
There are two problems I faced while working with that library.
Firstly, to append to an existing Excel file it needs to load the whole workbook, which is impossible for me because of the data size; I must use as little RAM as possible.
Secondly, this library is only useful for writing to different sheets. When I try to write data to the same sheet, even if I pass startrow when saving, it deletes the old data and writes the new data starting from that row.
I already tried the solution available here to address my problem but it doesn't fit my requirements.
Do you have any idea how I can do this?
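openpyxl cannot append to an existing file without loading it, but its write-only mode can stream rows into a new workbook with low memory use. The catch is that a write-only workbook can be saved only once, so one workbook has to be fed for the whole run instead of being reopened per batch. This sketch assumes each chunk is an iterable of rows (for a DataFrame, df.itertuples(index=False)); stream_to_xlsx is a hypothetical helper name. It also enforces Excel's limit of 1,048,576 rows per worksheet, which data of this size may hit.

```python
from openpyxl import Workbook

EXCEL_MAX_ROWS = 1_048_576   # hard row limit of a single Excel worksheet


def stream_to_xlsx(path, header, chunks, sheet_title="data"):
    """Write many chunks of rows into one sheet using openpyxl's write-only mode.

    chunks: any iterable of row iterables, e.g. a generator yielding
    df.itertuples(index=False) for each incoming DataFrame.
    Returns the number of rows written, including the header.
    """
    wb = Workbook(write_only=True)   # write-only workbooks start with no sheets
    ws = wb.create_sheet(sheet_title)
    ws.append(list(header))
    written = 1
    for chunk in chunks:
        for row in chunk:
            if written >= EXCEL_MAX_ROWS:
                raise ValueError("an Excel worksheet holds at most 1,048,576 rows")
            ws.append(list(row))
            written += 1
    wb.save(path)                    # write-only workbooks can be saved only once
    return written
```

If the data outgrows one sheet, the same loop could start a new sheet with wb.create_sheet when the limit is reached; a CSV file is the other common choice for streaming output of this size.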

openpyxl corrupts spreadsheet if it contains a data source

I use openpyxl to interact with Excel files using Python 3.7. I open and save my .xlsx spreadsheets as follows:
from openpyxl import load_workbook
wb = load_workbook('file.xlsx', read_only=False)
wb.save('file.xlsx')
If file.xlsx contains no links to external data sources (such as SQL Server or PostgreSQL), then there is no problem with the saved file, and it opens fine in Excel after being processed by my Python script.
However, if file.xlsx does contain a link to external data, then upon executing the above script, the output file is now corrupted. When opening the file in Excel, the following error is reported and I have the option of attempting to recover it. When recovering, the data remains but all links to the data source are gone.
> We found a problem with some content in file.xlsx. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
It is easy to reproduce this error as follows:
1. Create a blank spreadsheet and save it as file.xlsx.
2. Run the above three lines of Python code to open and save the file. You will see this works fine and has no impact on the spreadsheet.
3. Now open file.xlsx in Excel and, from the Data tab, choose a data source. You can choose any data source (a link to a CSV file, a table within Excel, or an external data source; it doesn't matter).
4. Save the spreadsheet, then run the above Python script (which, again, simply opens and saves it).
5. Open file.xlsx in Excel. You will see that it is now corrupted.
My conclusion is that, at the moment, openpyxl doesn't support spreadsheets that contain links to external data. It would be useful to have this confirmed, or for a workaround to the above issue to be proposed.
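One way to check this conclusion is to compare the parts inside the .xlsx package before and after the openpyxl save. In the Office Open XML format, data connections are stored in xl/connections.xml and query tables under xl/queryTables/, so if those names show up as missing, openpyxl did not carry them over. This is a stdlib sketch; dropped_parts is a hypothetical helper name.

```python
import zipfile


def dropped_parts(original_path, saved_path):
    """Return package parts present in the original .xlsx but missing after a save."""
    with zipfile.ZipFile(original_path) as a, zipfile.ZipFile(saved_path) as b:
        return sorted(set(a.namelist()) - set(b.namelist()))
```

Run it on a copy saved before the script and the file saved by the script. If connection parts are listed, a possible workaround is to keep openpyxl away from that workbook, for example by writing data to a separate file or by driving Excel itself through COM.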
Thanks!!

save and close all currently open excel sheets (python)

I have another script that opens a bunch of excel sheets and exports a bunch of data to them. However, it is incapable of saving those documents automagically. Is there a way in python to grab all the currently open excel sheets, save them, and then close them?
What libraries are you using?
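On Windows with Excel installed, you can use pywin32's COM interface:
import win32com.client
xl = win32com.client.GetActiveObject("Excel.Application")  # attach to the running Excel
while xl.Workbooks.Count > 0:  # close from the front until none remain
    xl.Workbooks(1).Close(True)  # True = SaveChanges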
