Links break when copying Excel sheets with win32com

I am trying to use win32com to copy a worksheet from my workbook to a new workbook. The code is working fine but the cell formulas in the new book point back to the original book. I would like to break the links in the new book so that these formulas are replaced with raw numbers. This is trivial to do in Excel but I haven't been able to find out how to do it using the win32com client in Python.
Here is a snippet of my code:
import win32com.client
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
xl.Visible = True
#Open & Refresh Spreadsheet
wb = xl.Workbooks.Open(r"C:\Users\me\dummy.xlsx") #Dummy path
print("Refreshing data...")
wb.RefreshAll()
#Create new book and copy target sheet over
print("Opening new workbook")
nwb = xl.Workbooks.Add()
newfile = r"C:\Users\me\dummy2.xlsx"
wb.Worksheets(["Target Sheet"]).Copy(Before=nwb.Worksheets(1))
nwb.SaveAs(newfile)
This code works fine but in the saved "dummy2" file each of the cells containing formulas reference the original sheet. How can I break the links in the new book and/or copy values only from the original book?

Edit in response to @martineau's downvote of the answer and of the (admittedly unsatisfactory) Microsoft documentation.
I think you haven't been able to find out how to do this because you have been looking in the wrong place. Your question really has little to do with Python or with win32com.
This line
xl = win32com.client.gencache.EnsureDispatch('Excel.Application')
fires up a COM client called xl that talks to excel.exe. Your variable xl is a thin Python wrapper around a Microsoft COM object that can call Excel VBA functions. When you type xl., everything after the dot is expected to be a VBA object or method. Any value (other than strings and floats) that you get back from a call is a VBA object in a thin Python wrapper. Python conventions do not necessarily apply to such objects.
So to find out about what functions you need to call, you need to be looking at the Excel VBA documentation. One difficulty with that documentation is that it assumes you are writing VBA, not Python. The other is that it isn't all that well-written.
The VBA method you need is Workbook.BreakLink().
Call it after copying the sheet into the new workbook and before saving the copy, like this (I'm using your dummy filename here, so don't expect it to work without fixing that):
wb.Worksheets(["Target Sheet"]).Copy(Before=nwb.Worksheets(1))
nwb.BreakLink(Name=r"C:\Users\me\dummy.xlsx", Type=1)
nwb.SaveAs(newfile)
The name of the link is the filename it points to, and the type of the link is 1 (for a link to an Excel spreadsheet). In this case you know the name of the link source (since you just made a copy of it) so there is no need to ask what the filename is, but in the general case you need to call Workbook.LinkSources() to find out what they are, and break them one by one.
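For the general case, a minimal sketch might look like the following. The helper name break_all_links is hypothetical. It assumes that LinkSources comes back as None when the workbook has no links of the requested type, and that the value 1 stands for both xlExcelLinks and xlLinkTypeExcelLinks.

```python
XL_EXCEL_LINKS = 1  # assumed value of xlExcelLinks / xlLinkTypeExcelLinks


def break_all_links(workbook):
    """Replace every external Excel link in `workbook` with its current value.

    `workbook` is expected to be a COM Workbook object (for example the `nwb`
    from the question). Returns the list of link names that were broken.
    """
    sources = workbook.LinkSources(XL_EXCEL_LINKS)
    if not sources:  # assumed: None (or an empty sequence) when there are no links
        return []
    broken = []
    for name in sources:
        workbook.BreakLink(Name=name, Type=XL_EXCEL_LINKS)
        broken.append(name)
    return broken
```

With the question's code this would be called as break_all_links(nwb) between the Copy and SaveAs lines.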

Related

Change excel page headers with python

Forgive me if this is an idiotic question. I'm new to coding and wanted to automate part of my workflow.
I'm enjoying the puzzle, so I won't ask too many questions, but I'm stuck on this.
Every time an order comes in, I have to copy data from raw Excel files into a template.
I want to replace the three headers at the top of this page with variables I've already extracted from the raw Excel data:
[screenshot of the current page headers]
so that it would look like this on every page:
[screenshot of the desired page headers]
In every tutorial I see, the "header" is just row 1.
I think xlsxwriter can change those headers, but it looks like that only works on new worksheets.
df1.to_clipboard(index=False, header=False) #Copies df1 to clipboard (BOM Data)
ws.Range("A2").Select()
ws.PasteSpecial(Format='Unicode Text') # Paste as text in template
#So at this point I guess I'm using pywin32 to copy and paste, but have to switch back to xlsxwriter to change the header?
wb = xlsxwriter.Workbook(r'C:\Users\jfras\Desktop\Auto BOM\PARKER BOM TEMPLATE.xlsx')
ws = wb.Worksheets(1)
header1 = '&CTest Entry'

Your question is a little unclear; the screenshots you attached appear to be from Word. It seems you are trying to automate moving data from Excel into a Word document template. Is that correct?
If I understand correctly, you will need one Python package to read your Excel document and another to insert that data into a parameterized Word template. Here is an article explaining exactly that.
In a nutshell, using openpyxl (or presumably any Python Excel reader of your choosing) you would read the Excel sheet, then plug your data into a Word template using something like python-docx. The linked article contains code snippets explaining this process in more detail.

I hope I understood your question right. If so, something like this code below may work:
import xlsxwriter
workbook = xlsxwriter.Workbook('teste.xlsx')
worksheet = workbook.add_worksheet()
worksheet.set_header('&L P10853' + '&CTEST OBJECT' + '&RUN_28583')
workbook.close()
Of course, if you just run this code you will end up with an empty sheet that prints nothing until you fill at least one cell.
The key call here is set_header, which does what we want. In the string, &L sets the left header, &C the center header, and &R the right header. You can see more at https://xlsxwriter.readthedocs.io/example_headers_footers.html
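Since xlsxwriter only writes new files, an existing template opened through pywin32 would need a different route. One possibility, sketched here rather than taken from the answers, is to set the LeftHeader, CenterHeader and RightHeader properties of the worksheet's PageSetup object over COM. The helper name set_page_headers and the sample values are hypothetical.

```python
def set_page_headers(worksheet, left="", center="", right=""):
    """Set the three print headers of an Excel worksheet opened through COM.

    `worksheet` is expected to expose Excel's PageSetup object, e.g. the `ws`
    obtained through win32com in the question.
    """
    page_setup = worksheet.PageSetup
    page_setup.LeftHeader = left
    page_setup.CenterHeader = center
    page_setup.RightHeader = right
    return page_setup


# Hypothetical usage with a pywin32 worksheet:
#   set_page_headers(ws, left="P10853", center="TEST OBJECT", right="UN_28583")
```

Unlike the single xlsxwriter string, each section is set separately here, so no &L, &C or &R prefixes are needed.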

How to return the PrintArea from Excel in Python

I'm trying to create a Python script (I'm using Python 3.7.3 with UTF-8 encoding on Windows 10 64-bit with Microsoft Office 365) that exports user selected worksheets to PDF, after the user has selected the Excel-files.
The Excel-files contain a lot of different settings for page setup and each worksheet in each Excel-file has a different page setup.
The task is therefore that I need to read all current variables regarding page setup to be able to assign them to the related variables for export.
The problem is when I'm trying to get Excel to return the current print area of the worksheet, which I can't figure out.
As far as I understand I need to be able to read the current print area, to be able to set it for the export.
The Excel-files are a mixture of ".xlsx" and ".xlsm".
I've tried using all kind of different methods from the Excel VBA documentation, but nothing has worked so far e.g. by adding ".Range" and ".Address" etc.
I've also tried the ".UsedRange", but there is no significant difference in the cells that I can search for and I can't format them in a specific way so I can't use this.
I've also tried using the "IgnorePrintAreas = False" variable in the "ExportAsFixedFormat"-function, but that didn't work either.
#This is some of the script.
#I've left out irrelevant parts (dialogboxes etc.) just to make it shorter
#Import pywin32 and open Excel and selected workbook.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch("Excel.Application")
excel.Visible = False
wb = excel.Workbooks.Open(wb_path)
#Select the 1st worksheet in the workbook
#This is just used for testing
wb.Sheets([1]).Select()
#This is the line I can't get to work
ps_prar = wb.ActiveSheet.PageSetup.PrintArea
#This is just used to test if I get the print area
print(ps_prar)
#This is exporting the selected worksheet to PDF
wb.Sheets([1]).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, pdf_path, Quality = 0, IncludeDocProperties = True, IgnorePrintAreas = False, OpenAfterPublish = True)
#This closes the workbook and Excel (although Excel sometimes still exists in Task Manager)
wb.Close()
wb = None
excel.Quit()
excel = None
If I leave the code as above and try to open a test Excel-file (.xlsx) with a small PrintArea (A1:H8), the print function just gives me a blank line.
If I add something to .PrintArea (as mentioned above) I get 1 of 2 errors:
"TypeError: 'str' object is not callable".
or
"ps_prar = wb.ActiveSheet.PageSetup.PrintArea.Range
AttributeError: 'str' object has no attribute 'Range'"
I'm hoping someone can help me in this matter - thanks, in advance.

Try
wb = excel.Workbooks.OpenXML(wb_path)
instead of
wb = excel.Workbooks.Open(wb_path)
My problem was with a German version of MS Office. It works now. See https://social.msdn.microsoft.com/Forums/de-DE/3dce9f06-2262-4e22-a8ff-5c0d83166e73/excel-api-interne-namen?forum=officede
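The two errors quoted in the question suggest that PageSetup.PrintArea comes back as a plain string (an address such as "$A$1:$H$8", or an empty string when no print area is set) rather than as a Range object. A minimal sketch of turning that string into a range, assuming this behaviour and falling back to UsedRange when the string is empty, could look like this. The helper name get_print_range is hypothetical, and on localized Office installations the string may use local notation, so this may not work everywhere.

```python
def get_print_range(worksheet):
    """Return the worksheet's print area as a Range-like object.

    Assumes PageSetup.PrintArea is an address string and that an empty string
    means "no explicit print area", in which case UsedRange is returned.
    """
    address = worksheet.PageSetup.PrintArea
    if not address:
        return worksheet.UsedRange
    return worksheet.Range(address)
```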

Find if a value exists in a column in Excel using python

I have an Excel file with one worksheet that has sediment collection data. I am running a long Python script.
In the worksheet is a column titled “CollectionYear.” Say I want the year 2010. If the year 2010 exists in the “CollectionYear” column, I want the rest of the script to run, if not then I want the script to stop.
This seems like an easy enough task but for the life of me I cannot figure it out nor find any examples.
Any help would be greatly appreciated.

I use xlrd all the time and it works great for me. Something like this might be helpful
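from xlrd import open_workbook
#Note: xlrd 2.0 and later read only .xls files; use another reader for .xlsx
def main():
    book = open_workbook('example.xlsx')
    sheet = book.sheet_by_index(0)
    collection_year_col = 2 #Just an example
    test_year = 2010
    for row in range(sheet.nrows):
        if sheet.cell(row,collection_year_col).value == test_year:
            runCode()
            break #Run the code once, even if the year appears in several rows
def runCode():
    pass #your code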
I hope this points you in the right direction. More help could be given if the details of your problem were known.
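To make the "stop if the year is missing" requirement explicit, the check can be separated from the file reading. The sketch below works on any sequence of rows whose first row holds the column titles. The function name and the sample rows are made up for illustration, and how the rows are read from the worksheet is left to whichever Excel reader is in use.

```python
import sys


def column_has_value(rows, column_title, wanted):
    """Return True if `wanted` appears under `column_title`.

    `rows` is a sequence of rows (lists or tuples); the first row is treated
    as the header. Raises ValueError if the title is not in the header.
    """
    header, *data = rows
    index = list(header).index(column_title)
    return any(len(row) > index and row[index] == wanted for row in data)


if __name__ == "__main__":
    sample = [
        ("Site", "Depth", "CollectionYear"),
        ("A", 1.5, 2009),
        ("B", 2.0, 2010.0),  # spreadsheet readers often return numbers as floats
    ]
    if not column_has_value(sample, "CollectionYear", 2010):
        sys.exit("2010 not found in CollectionYear; stopping.")
    print("2010 found; continuing with the rest of the script.")
```

Calling sys.exit stops the script before any of the later work runs, which is the behaviour asked for in the question.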

Here is what I learned from tackling a needle-in-a-haystack problem for a gigantic pile of .xls files. There are some things xlrd and friends can't (or won't) do, such as getting the formula of a cell. For that, you'll need to use the Microsoft Component Object Model (COM) [1].
I recommend you find yourself a copy of Python Programming on Win32 by Mark Hammond. It's still useful 20 years later. Python Programming on Win32 covers the basics of the COM and how to access it using the pywin32 library (also from Mark Hammond).
In a nutshell, you can think of the COM as an API between a server (say, Excel) and a client (such as a Python script) [2].
import win32com.client
# Connect to Excel server
xl = win32com.client.Dispatch("Excel.Application")
The COM API is reasonably well documented. Once you get used to the terminology, things become straight-forward albeit tedious. For example, an Excel file is technically a "Workbook". The "Workbooks" COM object has the Open method which provides a handle for Python to interact with the "Workbook". (Did you notice the different 's' endings on those?)
import win32com.client
# Connect to Excel server
xl = win32com.client.Dispatch("Excel.Application")
myfile = r'C:\temp\myworkbook.xls'
wb = xl.Workbooks.Open(Filename=myfile)
A "Workbook" contains a "Sheet", accessed here through the "Sheets" COM object:
import win32com.client
# Connect to Excel server
xl = win32com.client.Dispatch("Excel.Application")
myfile = r'C:\temp\myworkbook.xls'
wb = xl.Workbooks.Open(Filename=myfile)
sht1 = wb.Sheets.Item(1)
Finally, the 'Cells' property of a worksheet "returns a Range object that represents all the cells on the worksheet". The Range object then has a Find method which will search within the range. The LookIn parameter allows for searching cell values, formulas, and comments.
import win32com.client
# Connect to Excel server
xl = win32com.client.Dispatch("Excel.Application")
myfile = r'C:\temp\myworkbook.xls'
wb = xl.Workbooks.Open(Filename=myfile)
sht1 = wb.Sheets.Item(1)
match = sht1.Cells.Find('search string')
The result of Find is a Range object which has many useful properties, like Formula, GetAddress, Value, and Text. You'll also find, as with anything Microsoft, that it's good enough for government work.
Finally, don't forget to close the workbook and to quit Excel!
import win32com.client
# Connect to Excel server
xl = win32com.client.Dispatch("Excel.Application")
myfile = r'C:\temp\myworkbook.xls'
wb = xl.Workbooks.Open(Filename=myfile)
sht1 = wb.Sheets.Item(1)
match = sht1.Cells.Find('search string')
print(match.Formula)
wb.Close(SaveChanges=False)
xl.Quit()
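One detail worth guarding against: when nothing matches, Find is expected to come back as None on the Python side (Nothing in VBA), in which case match.Formula would raise an AttributeError. A small sketch of a defensive wrapper follows; the helper name find_formula is hypothetical.

```python
def find_formula(sheet, text):
    """Search every cell of `sheet` for `text` and return the matching cell's formula.

    Returns None when Find reports no match (assumed to surface as None in Python).
    """
    match = sheet.Cells.Find(text)
    if match is None:
        return None
    return match.Formula
```

The same None check also answers the "does this value exist at all?" question: a result of None means the search string was not found.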
You can extend these ideas with Sheets.Item and Sheets.Count and iterate over all sheets in a workbook (or all workbooks in a directory). You can have lots of fun!
The headaches you may encounter include VBA macros and embedded objects, as well as the various alerts each can produce. Performance is also an issue. The following property settings silence notifications and can dramatically improve performance:
Application
xl.DisplayAlerts = False
xl.AutomationSecurity = 3 # msoAutomationSecurityForceDisable
xl.Interactive = False
xl.PrintCommunication = False
xl.ScreenUpdating = False
xl.StatusBar = False
Workbook
wb.DoNotPromptForConvert = True
wb.EnableAutoRecover = False
wb.KeepChangeHistory = False
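Because these are settings on a shared Excel instance, it can be helpful to restore them afterwards. The following is only a sketch of one way to do that with a context manager. The name quiet_excel is hypothetical, and it assumes the listed properties are plain read/write booleans on the Application object.

```python
from contextlib import contextmanager


@contextmanager
def quiet_excel(xl, settings=("DisplayAlerts", "ScreenUpdating")):
    """Temporarily set the named boolean Application properties to False."""
    saved = {name: getattr(xl, name) for name in settings}
    try:
        for name in settings:
            setattr(xl, name, False)
        yield xl
    finally:
        # Restore the previous values even if the body raised an exception.
        for name, value in saved.items():
            setattr(xl, name, value)


# Hypothetical usage:
#   with quiet_excel(xl):
#       wb = xl.Workbooks.Open(Filename=myfile)
```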
Another potential issue is late/early binding. Basically, does Python have information about the COM object? This affects things like introspection and how COM objects are referenced. The win32com.client package uses late-bound automation by default.
With late-bound automation, Python doesn't know much about the COM object:
>>> import win32com.client
>>> xl = win32com.client.Dispatch("Excel.Application")
>>> xl
<COMObject Excel.Application>
>>> len(dir(xl))
55
With early-bound automation, Python has full knowledge of the object:
>>> import win32com.client
>>> xl = win32com.client.Dispatch("Excel.Application")
>>> xl
<win32com.gen_py.Microsoft Excel 16.0 Object Library._Application instance at 0x2583562290680>
>>> len(dir(xl))
125
To enable early binding, you must run makepy.py which is included with pywin32. Running makepy.py will prompt for the library to bind with.
(venv) c:\temp\venv\Lib\site-packages\win32com\client>python makepy.py
The process creates a Python file (in Temp\) which maps the methods and properties of the COM object.
(venv) c:\temp\venv\Lib\site-packages\win32com\client>python makepy.py
Generating to C:\Users\Lorem\AppData\Local\Temp\gen_py\3.6\00020813-0000-0000-C000-000000000046x0x1x9.py
Building definitions from type library...
Generating...
Importing module
Early binding also provides access to COM constants, such as msoAutomationSecurityForceDisable and xlAscending and is case-sensitive (whereas late-binding is not).
That should be enough info to implement a Python-to-Excel library (like xlwings), overkill notwithstanding.
[1] Actually, xlwings works by utilizing the COM through pywin32. Here's to one less dependency!
[2] This example uses win32com.client.Dispatch, which requires that processing happen through a single Excel instance. Use win32com.client.DispatchEx to create separate instances of Excel.

Try using the xlwings library to interface with Excel from Python.
Example from their docs:
from xlwings import Workbook, Sheet, Range, Chart
wb = Workbook() # Creates a connection with a new workbook
Range('A1').value = 'Foo 1'
Range('A1').value
>>> 'Foo 1'
Range('A1').value = [['Foo 1', 'Foo 2', 'Foo 3'], [10.0, 20.0, 30.0]]

Using Python to read VBA from an Excel spreadsheet

I would like to write a VBA diff program in (preferably) Python. Is there a Python library that will allow me to read the VBA contained in an Excel spreadsheet?

Here's some quick and dirty boilerplate to get you started. It uses the Excel COM object (a Windows only solution):
from win32com.client import Dispatch
wbpath = 'C:\\example.xlsm'
xl = Dispatch("Excel.Application")
xl.Visible = 1
wb = xl.Workbooks.Open(wbpath)
vbcode = wb.VBProject.VBComponents(1).CodeModule
print(vbcode.Lines(1, vbcode.CountOfLines))
This prints the silly macro I recorded for this example:
Sub silly_macro()
'
' silly_macro Macro
'
'
Range("B2").Select
End Sub
Note that Lines and VBComponents use 1-based indexing. VBComponents also supports indexing by module name. Also note that Excel requires backslashes in paths.
To dive deeper see Pearson's Programming The VBA Editor. (The above example was cobbled together from what I skimmed from there.)
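Building on that boilerplate, a rough sketch of the diff itself might collect the source of every component into a dictionary and hand the two dictionaries to difflib. The function names are hypothetical. The sketch assumes that VBComponents can be iterated from Python, that asking for the lines of an empty module fails, and that Excel's "Trust access to the VBA project object model" option is enabled.

```python
import difflib


def extract_vba(workbook):
    """Return {component name: source code} for every VBA component."""
    modules = {}
    for component in workbook.VBProject.VBComponents:
        code = component.CodeModule
        count = code.CountOfLines
        # Assumed: requesting lines from an empty module is an error, so skip the call.
        modules[component.Name] = code.Lines(1, count) if count else ""
    return modules


def diff_vba(old, new):
    """Return unified-diff lines between two {name: source} dictionaries."""
    lines = []
    for name in sorted(set(old) | set(new)):
        before = old.get(name, "").splitlines()
        after = new.get(name, "").splitlines()
        lines.extend(
            difflib.unified_diff(before, after, "old/" + name, "new/" + name, lineterm="")
        )
    return lines
```

Keeping extraction and comparison separate means the diff step can be run on saved text dumps without Excel being installed.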

I have created an application that does this called VbaDiff. If you provide it two Excel files it will compare the VBA code in each. You can also run it from the command line, or use the version that comes with an API if you want to integrate it with your own programs.
You can find out more at http://www.technicana.com/vbadiff-information.html
Chris

Get formula from Excel cell with python xlrd

I have to port an algorithm from an Excel sheet to python code but I have to reverse engineer the algorithm from the Excel file.
The Excel sheet is quite complicated, it contains many cells in which there are formulas that refer to other cells (that can also contains a formula or a constant).
My idea is to analyze with a python script the sheet building a sort of table of dependencies between cells, that is:
A1 depends on B4,C5,E7 formula: "=sqrt(B4)+C5*E7"
A2 depends on B5,C6 formula: "=sin(B5)*C6"
...
The xlrd Python module can read an XLS workbook, but at the moment I can only access the value of a cell, not its formula.
For example, with the following code I can get simply the value of a cell:
import xlrd
#open the .xls file
xlsname="test.xls"
book = xlrd.open_workbook(xlsname)
#build a dictionary of the names->sheets of the book
sd={}
for s in book.sheets():
    sd[s.name]=s
#obtain Sheet "Foglio 1" from sheet names dictionary
sheet=sd["Foglio 1"]
#print value of the cell J141 (xlrd indices are 0-based: row 140, column 9)
print(sheet.cell(140,9))
However, there seems to be no way to get the formula from the Cell object returned by the .cell(...) method.
The documentation says that it is possible to get a string version of the formula (in English, because no information about function name translation is stored in the Excel file). It mentions formulas (expressions) in the Name and Operand classes, but I cannot work out how to get instances of these classes from the Cell instance that must contain them.
Could you suggest a code snippet that gets the formula text from a cell?

[Dis]claimer: I'm the author/maintainer of xlrd.
The documentation references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated sheet-wide or book-wide to a name; they are not associated with individual cells. Examples: PI maps to =22/7, SALES maps to =Mktng!$A$2:$Z$99. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.
Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.
Decompiling general formulas from bytecode to text is a "work-in-progress", slowly. Note that supposing it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.
Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. It may be useful to you to have info on shared formulas available; otherwise having parsed the following:
B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59
you would need to deduce the similarity between the B3:B60 formulas yourself.
In any case, none of the above is likely to be available any time soon -- xlrd priorities lie elsewhere.

Update: I have gone and implemented a little library to do exactly what you describe: extracting the cells & dependencies from an Excel spreadsheet and converting them to python code. Code is on github, patches welcome :)
Just to add that you can always interact with excel using win32com (not very fast but it works). This does allow you to get the formula. A tutorial can be found here [cached copy] and details can be found in this chapter [cached copy].
Essentially you just do:
app.ActiveWorkbook.ActiveSheet.Cells(r,c).Formula
As for building a table of cell dependencies, a tricky thing is parsing the excel expressions. If I remember correctly the Trace code you mentioned does not always do this correctly. The best I have seen is the algorithm by E. W. Bachtal, of which a python implementation is available which works well.
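As a first step toward such a table, the formulas themselves can be gathered through COM before any parsing is attempted. The sketch below is an illustration only: collect_formulas is a hypothetical helper, it assumes that iterating UsedRange yields one cell at a time, and it will be slow on large sheets because every cell costs several COM calls.

```python
def collect_formulas(sheet):
    """Return {(row, column): formula text} for every formula cell in the used range."""
    formulas = {}
    for cell in sheet.UsedRange:
        if cell.HasFormula:
            # Row and Column are 1-based, as in Excel itself.
            formulas[(cell.Row, cell.Column)] = cell.Formula
    return formulas
```

The resulting dictionary still has to be fed to a formula parser to extract the cell references.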

So I know this is a very old post, but I found a decent way of getting the formulas from all the sheets in a workbook as well as having the newly created workbook retain all the formatting.
First step is to save a copy of your .xlsx file as .xls
-- Use the .xls as the filename in the code below
Using Python 2.7
from lxml import etree
from StringIO import StringIO
import xlsxwriter
import subprocess
from xlrd import open_workbook
from xlutils.copy import copy
from xlsxwriter.utility import xl_cell_to_rowcol
import os
file_name = '<YOUR-FILE-HERE>'
dir_path = os.path.dirname(os.path.realpath(file_name))
subprocess.call(["unzip",str(file_name+"x"),"-d","file_xml"])
xml_sheet_names = dict()
with open_workbook(file_name,formatting_info=True) as rb:
    wb = copy(rb)
    workbook_names_list = rb.sheet_names()
    for i,name in enumerate(workbook_names_list):
        xml_sheet_names[name] = "sheet"+str(i+1)
sheet_formulas = dict()
for i, k in enumerate(workbook_names_list):
    xmlFile = os.path.join(dir_path,"file_xml/xl/worksheets/{}.xml".format(xml_sheet_names[k]))
    with open(xmlFile) as f:
        xml = f.read()
    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))
    sheet_formulas[k] = dict()
    for _, elem in context:
        if elem.tag.split("}")[1]=='f':
            cell_key = elem.getparent().get(key="r")
            cell_formula = elem.text
            sheet_formulas[k][cell_key] = str("="+cell_formula)
sheet_formulas
Structure of Dictionary 'sheet_formulas'
{'Worksheet_Name': {'A1_cell_reference':'cell_formula'}}
Example results:
{u'CY16': {'A1': '=Data!B5',
'B1': '=Data!B1',
'B10': '=IFERROR(Data!B12,"")',
'B11': '=IFERROR(SUM(B9:B10),"")',
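The same idea can be sketched for Python 3 with only the standard library, since an .xlsx file is a zip archive of XML parts. This is a simplified variant, not the code above: read_formulas is a hypothetical name, the result is keyed by worksheet part name (such as xl/worksheets/sheet1.xml) rather than by sheet title, and cells that merely reuse a shared formula are skipped because their f element carries no text.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_formulas(xlsx_file):
    """Return {worksheet part name: {cell reference: formula}} from an .xlsx file.

    `xlsx_file` may be a path or a binary file-like object.
    """
    result = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for part in archive.namelist():
            if not (part.startswith("xl/worksheets/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(part))
            formulas = {}
            for cell in root.iter(MAIN_NS + "c"):
                f = cell.find(MAIN_NS + "f")
                # Cells that reuse a shared formula carry an empty <f> element.
                if f is not None and f.text:
                    formulas[cell.get("r")] = "=" + f.text
            result[part] = formulas
    return result
```

Reading straight from the archive avoids both the external unzip call and the intermediate .xls copy.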

It seems that it is currently impossible to do what you want with xlrd. You can have a look at this post for a detailed description of why the functionality you need is so difficult to implement.
Note that the development team does a great job of providing support at the python-excel Google group.

I know this post is a little late but there's one suggestion that hasn't been covered here. Cut all the entries from the worksheet and paste using paste special (OpenOffice). This will convert the formulas to numbers so there's no need for additional programming and this is a reasonable solution for small workbooks.

Yes! With win32com it works for me.
import win32com.client
Excel = win32com.client.Dispatch("Excel.Application")
# python -m pip install pywin32
file=r'path Excel file'
wb = Excel.Workbooks.Open(file)
sheet = wb.ActiveSheet
#Get value
val = sheet.Cells(1,1).value
# Get Formula
sheet.Cells(6,2).Formula
