Situation
I'm working on a data project that integrates Python in Google Colab with Excel 365 on Windows 8.1. My Python code collects new data on a fixed schedule and then writes it (overwriting, not appending) to a report in an Excel spreadsheet.
This works without issue when the target is a standalone spreadsheet.
I know I could potentially do all this in Python and not use Excel at all, but I prefer not to reinvent the wheel and not spend hours hardcoding all the formulas and links already existing in Excel.
Goal
My goal is to:
1. Use new data from my python export to populate/overwrite a data table on Sheet A in an existing Excel workbook.
2. Have the pre-existing links on a separate Sheet B in the same workbook, which perform calculations on the Sheet A data table, update automatically each time my Python export refreshes that table.
Problem
The issue I am running into is that when I export the data with df.to_excel, even with the sheet name parameter set, the export overwrites the data table and names the tab correctly but wipes out every other pre-existing sheet in the same workbook.
So I attempted a workaround: exporting to an external workbook and then trying to update the links in the second workbook automatically. The problem is that the links don't appear to update unless both the source data file and the second workbook containing the links are opened manually and the updated file is then saved manually.
I tried using openpyxl to control the Excel files, but it appeared to have no effect on the files and no data was updated. (See the code block and result at the end of this post.)
Assistance
Does anybody know a way to use python to:
1. Overwrite a specific sheet within an Excel workbook without wiping out the other existing sheets, and then have the links on another sheet that reference the new data update automatically?
Or
2. Auto update external links between separate Excel workbooks while the files are unopened?
Or
3. Control an instance of Excel that can open both files, let the links update, and then save and close the files automatically?
I found a post from some years ago that identified a win32 package for Python that appeared to be able to control instances of Excel. When I tried a pip install in Colab, I got an error that the package was unrecognized or doesn't exist.
Ideally, I would prefer not to use VB if at all possible to solve this.
Any solutions are much appreciated.
Thanks in advance.
Sample Code that isn't producing any results:
import openpyxl
# Example code
from openpyxl import load_workbook
from openpyxl import Workbook
wb = load_workbook('/content/drive/MyDrive/Data/Series/AC5M.xlsx', keep_links=True)
ws = wb.active
Workbook.save
Workbook.close
print(ws)
Result:
"function openpyxl.workbook.workbook.Workbook.close"
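In the sample above, `Workbook.save` and `Workbook.close` only name the methods on the class. Nothing is called, so the file is never written. The following is a minimal sketch, not a definitive fix, of one way to overwrite a single sheet with openpyxl while leaving the other sheets in place. The helper name `overwrite_sheet`, the file name, and the sheet names are illustrative assumptions. openpyxl does not evaluate formulas, so formulas on the other sheets are kept as formula text and should recalculate the next time Excel opens the file.

```python
from openpyxl import load_workbook


def overwrite_sheet(path, sheet_name, rows):
    """Replace the contents of one sheet; other sheets are left untouched."""
    wb = load_workbook(path)        # formulas are loaded as text, not results
    ws = wb[sheet_name]             # raises KeyError if the sheet is missing
    ws.delete_rows(1, ws.max_row)   # clear the old table
    for r, row in enumerate(rows, start=1):
        for c, value in enumerate(row, start=1):
            ws.cell(row=r, column=c, value=value)
    wb.save(path)                   # called on the instance, with parentheses
    wb.close()


# Hypothetical usage with a pandas DataFrame named df:
# rows = [list(df.columns)] + df.values.tolist()
# overwrite_sheet("report.xlsx", "Sheet A", rows)
```

If staying in pandas is preferred, `pd.ExcelWriter` opened with `mode="a"`, `engine="openpyxl"`, and the `if_sheet_exists` argument is another route to the same result in recent pandas versions.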
Related
Stackoverflow
Hi, Python noob here.
I have been learning Python for a couple of weeks, so I don't know if this is possible or even super easy.
I have an Excel file with a lot of sheets. I have managed to write Python code that makes all the changes I need on a single sheet and then saves it.
At the moment I type a sheet name, run it, then type another sheet name and run it again, and so on.
Is there a way to make the code process all the sheets one by one when I run it?
I was thinking about creating a list with all the sheet names and using a loop, but I'm not sure how. Thank you.
Use the openpyxl module.
The workbook object in openpyxl already holds the list of sheets, so you can loop over them as you would over any list.
You can find solutions to a similar problem here:
getting sheet names from openpyxl
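A short sketch of that loop, assuming the per-sheet changes can be wrapped in a function that takes a worksheet. The names `process_all_sheets` and `add_header_note` are hypothetical placeholders for the asker's own code.

```python
from openpyxl import load_workbook


def process_all_sheets(path, change_sheet):
    """Apply change_sheet(ws) to every worksheet, then save once."""
    wb = load_workbook(path)
    for ws in wb.worksheets:    # wb.sheetnames gives the names instead
        change_sheet(ws)
    wb.save(path)
    wb.close()


def add_header_note(ws):
    # Placeholder for the per-sheet changes described in the question
    ws["A1"] = f"Processed: {ws.title}"
```

Calling `process_all_sheets("book.xlsx", add_header_note)` would then run the same change on each sheet without typing the names by hand.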
Using Python 3.7. I have several .xlsx workbooks with 34 sheets each, most of which have conditional formatting and charts, but all I'm actually after is a cell with specified text that's somewhere on the first sheet of each book. The workbook is not protected but the sheet is, and I don't know the password, so I can't use pandas.read_excel; using openpyxl/load_workbook, it takes ages to load and I get lots of errors about it not being able to handle conditional formatting etc. I then have to search the sheet for the text.
Is there an easy, quick way of loading just the first sheet (or a named sheet)? The pandas code is very quick and easy, but I can't use it :(
I'm not completely sure about this, but I can recommend trying openpyxl's "read-only" mode:
https://openpyxl.readthedocs.io/en/stable/optimized.html
It does not load the full file but reads it in so-called "lazy" mode, so you can jump to the cell you need.
It also lets you start reading from a specific sheet.
Note that closing the file is mandatory.
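A minimal sketch of that suggestion, assuming the text sits in a string cell on the first sheet. The function name `find_text_on_first_sheet` is made up for illustration, and whether read-only mode avoids the conditional-formatting errors for a given file would need to be checked against that file.

```python
from openpyxl import load_workbook


def find_text_on_first_sheet(path, text):
    """Return the coordinate of the first cell containing text, or None."""
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[0]       # or wb["SheetName"] for a named sheet
        for row in ws.iter_rows():
            for cell in row:
                if isinstance(cell.value, str) and text in cell.value:
                    return cell.coordinate
        return None
    finally:
        wb.close()                  # read-only mode keeps the file handle open
```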
I use openpyxl to interact with Excel files using Python 3.7. I open and save my .xlsx spreadsheets as follows:
from openpyxl import load_workbook
wb = load_workbook('file.xlsx', read_only=False)
wb.save('file.xlsx')
If file.xlsx contains no links to external data sources (such as SQL Server or PostgreSQL), then there is no problem with the saved file and it opens fine in Excel after being processed by my Python script.
However, if file.xlsx does contain a link to external data, then upon executing the above script, the output file is now corrupted. When opening the file in Excel, the following error is reported and I have the option of attempting to recover it. When recovering, the data remains but all links to the data source are gone.
> We found a problem with some content in file.xlsx. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.
It is easy to reproduce this error as follows:
Create a blank spreadsheet and save it as file.xlsx.
Run the above three lines of Python code to open and save the file. You will see this works fine and has no impact on the spreadsheet.
Now open file.xlsx in Excel and, from the Data tab, choose a data source. You can choose any data source (link to a csv file, a table within Excel, or an external data source - it doesn't matter).
Save the spreadsheet, then run the above Python script (which again, simply opens and saves it).
Open file.xlsx in Excel. You will see that it is now corrupted.
My conclusion is that, at the moment, openpyxl doesn't support spreadsheets that contain links to external data. It would be useful to have this confirmed, or for a workaround to the above issue to be proposed.
Thanks!!
The question is pretty simple, actually.
I'm reading an Excel file using pandas. When I open it in Office Excel on my desktop, I'm prompted to Enable Content and then Update Links [that is, update the values in cells that import information from cells in other workbooks and xlsx files], so it reads other files in some other folders.
While using pd.read_excel('filename') however that option is not available, and I'm afraid it's importing the data previously contained in the spreadsheet without updating it. Is there a workaround?
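The concern is reasonable: Python readers take whatever result was last stored in the file and do not refresh links or recalculate. The sketch below illustrates this with openpyxl rather than pandas, as an assumption-labelled stand-in. `data_only=True` returns the stored result of a formula, and a workbook that Excel has never calculated has no stored result at all. The file name is a throwaway created for the demonstration.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "cached_demo.xlsx")  # throwaway file

wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = "=A1*10"     # a formula that Excel has never calculated
wb.save(path)

# Default load: the formula text itself
as_formula = load_workbook(path).active["A2"].value
# data_only load: the result stored in the file, if there is one
as_cached = load_workbook(path, data_only=True).active["A2"].value

print(as_formula)
print(as_cached)
```

Getting fresh values therefore means having Excel itself open the workbook, update the links, and save it before Python reads the file.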
Is there a way to update a spreadsheet in real time while it is open in Excel? I have a workbook called Example.xlsx which is open in Excel and I have the following python code which tries to update cell B1 with the string 'ID':
import openpyxl
wb = openpyxl.load_workbook('Example.xlsx')
sheet = wb['Sheet']
sheet['B1'] = 'ID'
wb.save('Example.xlsx')
On running the script I get this error:
PermissionError: [Errno 13] Permission denied: 'Example.xlsx'
I know it's because the file is currently open in Excel, but I was wondering if there is another way or module I can use to update a sheet while it's open.
I have actually figured this out, and it's quite simple using xlwings. The following code opens an existing Excel file called Example.xlsx and updates it in real time, in this case putting the value 45 in cell B2 as soon as you run the script.
import xlwings as xw
wb = xw.Book('Example.xlsx')
sht1 = wb.sheets['Sheet']
sht1.range('B2').value = 45
You've already worked out why you can't use openpyxl to write to the .xlsx file: it's locked while Excel has it open. You can't write to it directly, but you can use win32com to communicate with the copy of Excel that is running via its COM interface.
You can download win32com from https://github.com/mhammond/pywin32 .
Use it like this:
from win32com.client import Dispatch
xlApp = Dispatch("Excel.Application")
wb=xlApp.Workbooks.Item("MyExcelFile.xlsx")
ws=wb.Sheets("MyWorksheetName")
At this point, ws is a reference to a worksheet object that you can change. The objects you get back aren't Python objects but a thin Python wrapper around VBA objects that obey their own conventions, not Python's.
There is some useful if rather old Python-oriented documentation here: http://timgolden.me.uk/pywin32-docs/contents.html
There is full documentation for the object model here: https://msdn.microsoft.com/en-us/library/wss56bz7.aspx but bear in mind that it is addressed to VBA programmers.
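Building on the `Dispatch` call above, here is a hedged sketch of using the same COM interface to open a workbook with its external links updated, save it, and close it again. The function name `refresh_links_and_save` and the constant name are illustrative. The value 3 for `UpdateLinks` is taken from the Excel object model's `Workbooks.Open` documentation. pywin32 only installs on Windows and needs a local copy of Excel, so this would run on the Windows machine itself, not in a hosted Linux notebook.

```python
# Value of the UpdateLinks argument that asks Excel to update external links
XL_UPDATE_LINKS = 3


def refresh_links_and_save(path):
    """Open a workbook in Excel, let its links update, then save and close."""
    # Imported here so the module still loads where pywin32 is not installed
    from win32com.client import Dispatch

    xl = Dispatch("Excel.Application")
    xl.DisplayAlerts = False            # suppress prompts during automation
    wb = xl.Workbooks.Open(path, UpdateLinks=XL_UPDATE_LINKS)
    try:
        wb.Save()
    finally:
        wb.Close(SaveChanges=False)     # changes were already saved above
    # xl.Quit() would also close any other workbooks open in that instance
```

The path should be absolute, since Excel resolves relative paths against its own working directory and not the script's.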
If you want to stream real-time data into Excel from Python, you can use an RTD function. If you've ever used the Bloomberg add-in for accessing real-time market data in Excel, then you'll be familiar with RTD functions.
The easiest way to write an RTD function for Excel in Python is to use PyXLL. You can read how to do it in the docs here: https://www.pyxll.com/docs/userguide/rtd.html
There's also a blog post showing how to stream live tweets into Excel using Python here: https://www.pyxll.com/blog/a-real-time-twitter-feed-in-excel/
If you wanted to write an RTD server to run outside of Excel you have to register it as a COM server. The pywin32 package includes an example that shows how to do that, however it only works for Excel prior to 2007. For 2007 and later versions you will need this code https://github.com/pyxll/exceltypes to make that example work (see the modified example from pywin32 in exceltypes/demos in that repo).
You can't change an Excel file that's being used by another application because the file format does not support concurrent access.