Visible excel mode with openpyxl<2.6.0> python<3.7.5> - python

Is there a way to make excel visible with openpyxl module in python 3.7.5?
I did some research in the documentation and other resources but did not find an answer.
https://openpyxl.readthedocs.io/en/2.6/api/openpyxl.workbook.properties.html
openpyxl.__version__
> '2.6.0'
My objective is to obtain the same result as with use of win32com.client
xlApp = win32com.client.Dispatch('Excel.Application')
xlApp.Application.Visible = True
Tried setting visibility = 'visible' and minimized=False within parameters of openpyxl.workbook.views.BookView object
Also tried setting different parameters within 'sheetview' as specified in this topic:
Set workbook view with openpyxl?
Yet with no success. I believe it is possible, but I could not find the answer.
Would appreciate getting some help with the package as documentation does not include detailed descriptions.

Related

wrap_text with openpyxl. How to use documentation to resolve deprecation warning?

I run the following openpyxl command to wrap text in all rows after row 9. It works fine but throws a deprecation warning. I'd love to figure out how to use documentation such as https://openpyxl.readthedocs.io/en/stable/ to determine the current, non-deprecated, way to wrap_text. But I always find the documentation confusing and unhelpful to me. For example, if I search for wrap_text I get this: https://openpyxl.readthedocs.io/en/stable/api/openpyxl.styles.alignment.html#openpyxl.styles.alignment.Alignment.wrapText
But that tells me nothing about how to wrap text. Do I simply not know how to use the documentation? Is there some great mystery I am to unravel so I don't have to endlessly google how to use openpyxl? How does one look at such documentation and figure out how to wrap_text in a cell?
Here is the code:
from openpyxl import load_workbook
from openpyxl.styles import Alignment
file1 = "C:\\folder\\inputFile1.xlsx"
wb=load_workbook(file1)
ws = wb.active
for rows in ws.iter_rows(min_row=10, max_row=None, min_col=None, max_col=None):
    for cell in rows:
        cell.alignment = cell.alignment.copy(wrapText=True)
wb.save('C:\\folder\\file1_wrap.xlsx')
here is the deprecation warning:
C:\Users\Jcurran\AppData\Local\Continuum\anaconda3\lib\site-packages\ipykernel_launcher.py:10: DeprecationWarning: Call to deprecated function copy (Use copy(obj) or cell.obj = cell.obj + other).
Remove the CWD from sys.path while we load stuff.
How might I figure out the way to find the information required to use the current (non-deprecated) approach to wrapping text in cells via the documentation at https://openpyxl.readthedocs.io/en/stable/?
I am using Jupyter for my environment. Shift tab or tab doesn't give me anything useful.
Any suggestions? I crave self sufficiency but can't grasp how to navigate the documentation for answer. There must be some clue somewhere? Some source code perhaps that I do not know how to locate?
I learned that I did not have the latest openpyxl version. pip install openpyxl installed 2.5. I upgraded it to 3.0.
Now when I look at https://openpyxl.readthedocs.io/en/stable/index.html it makes more sense :-)
I now know that the "Working with Styles" section of the openpyxl 3.0 documentation is the place to go for formatting data.
So I click that, and go to https://openpyxl.readthedocs.io/en/stable/styles.html
That page shows me this:
>>> alignment=Alignment(horizontal='general',
... vertical='bottom',
... text_rotation=0,
... wrap_text=False,
... shrink_to_fit=False,
... indent=0)
and I could use that info to wrap text with this line:
cell.alignment = Alignment(wrapText=True)
Now things are starting to make sense for me. :-) Thanks!
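One thing worth noting: `Alignment(wrapText=True)` builds a fresh alignment, so any horizontal or vertical setting already on the cell is reset, whereas the deprecated `copy(wrapText=True)` call kept them. Below is a minimal sketch of carrying the old settings over, assuming openpyxl 3.x. The helper name `wrap_rows` and the in-memory workbook are only for illustration.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment


def wrap_rows(ws, first_row=10):
    """Turn on text wrapping from first_row down, keeping horizontal/vertical alignment."""
    for row in ws.iter_rows(min_row=first_row):
        for cell in row:
            old = cell.alignment
            # Build a new Alignment instead of calling the deprecated .copy()
            cell.alignment = Alignment(
                horizontal=old.horizontal,
                vertical=old.vertical,
                wrap_text=True,
            )


# Illustration on an in-memory workbook (no file on disk is needed)
wb = Workbook()
ws = wb.active
ws["A10"] = "a long piece of text that should wrap"
ws["A10"].alignment = Alignment(horizontal="center")
wrap_rows(ws)
```

Only `horizontal` and `vertical` are carried over here; other alignment attributes such as rotation or indent would need to be passed along the same way.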

How to return the PrintArea from Excel in Python

I'm trying to create a Python script (I'm using Python 3.7.3 with UTF-8 encoding on Windows 10 64-bit with Microsoft Office 365) that exports user selected worksheets to PDF, after the user has selected the Excel-files.
The Excel-files contain a lot of different settings for page setup and each worksheet in each Excel-file has a different page setup.
The task is therefore that I need to read all current variables regarding page setup to be able to assign them to the related variables for export.
The problem is when I'm trying to get Excel to return the current print area of the worksheet, which I can't figure out.
As far as I understand I need to be able to read the current print area, to be able to set it for the export.
The Excel-files are a mixture of ".xlsx" and ".xlsm".
I've tried using all kind of different methods from the Excel VBA documentation, but nothing has worked so far e.g. by adding ".Range" and ".Address" etc.
I've also tried the ".UsedRange", but there is no significant difference in the cells that I can search for and I can't format them in a specific way so I can't use this.
I've also tried using the "IgnorePrintAreas = False" variable in the "ExportAsFixedFormat"-function, but that didn't work either.
#This is some of the script.
#I've left out irrelevant parts (dialogboxes etc.) just to make it shorter
#Import pywin32 and open Excel and selected workbook.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch("Excel.Application")
excel.Visible = False
wb = excel.Workbooks.Open(wb_path)
#Select the 1st worksheet in the workbook
#This is just used for testing
wb.Sheets([1]).Select()
#This is the line I can't get to work
ps_prar = wb.ActiveSheet.PageSetup.PrintArea
#This is just used to test if I get the print area
print(ps_prar)
#This is exporting the selected worksheet to PDF
wb.Sheets([1]).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, pdf_path, Quality = 0, IncludeDocProperties = True, IgnorePrintAreas = False, OpenAfterPublish = True)
#This closes the workbook and quits Excel (although Excel sometimes still exists in Task Manager)
wb.Close()
wb = None
excel.Quit()
excel = None
If I leave the code as above and try to open a test Excel-file (.xlsx) with a small PrintArea (A1:H8), the print function just gives me a blank line.
If I add something to .PrintArea (as mentioned above) I get 1 of 2 errors:
"TypeError: 'str' object is not callable".
or
"ps_prar = wb.ActiveSheet.PageSetup.PrintArea.Range
AttributeError: 'str' object has no attribute 'Range'"
I'm hoping someone can help me in this matter - thanks, in advance.
Try
wb = excel.Workbooks.OpenXML(wb_path)
instead of
wb = excel.Workbooks.Open(wb_path)
My problem was with a German version of MS Office. It works now. See https://social.msdn.microsoft.com/Forums/de-DE/3dce9f06-2262-4e22-a8ff-5c0d83166e73/excel-api-interne-namen?forum=officede

Excel pivot table filter in Python via win32com

I tried really hard to find how to do these simple lines of VBA code in Python via win32com but I couldn't find how to execute it properly :
ActiveSheet.PivotTables("PivotTable1").PivotFields("Quarters").ClearAllFilters
ActiveSheet.PivotTables("PivotTable1").PivotFields("Effective deadline"). _
PivotFilters.Add2 Type:=xlBefore, Value1:="10/10/2017"
When running these lines :
from win32com.client import DispatchEx
excel = DispatchEx('Excel.Application')
wb = excel.Workbooks.Open('myfile.xlsx')
ws = wb.Worksheets('MySheet')
ws.PivotTables(1).PivotFields("Quarters").PivotFilters('Add2', 'xlBefore', '10/10/2017')
I end up with an 'Invalid number of parameters' so I guess I'm quite close but can't find the documentation to complete my code
Has anyone ever managed to do this kind of work ?
You are calling the wrong method. You should call .Add2 after the PivotFilters property:
ws.PivotTables(1).PivotFields("Effective deadline").ClearAllFilters()
ws.PivotTables(1).PivotFields("Effective deadline").PivotFilters.Add2(31, None, '10/10/2017')
Also, notice that you need to specify the XlPivotFilterType Enumeration according to the type of filter you want to apply (in this case xlBefore = 31)
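Put together, the two calls can be wrapped in a small helper. This is only a sketch: the function name `filter_before` is hypothetical, `XL_BEFORE = 31` is the xlBefore value quoted above, and `pivot_field` is assumed to be the COM object returned by `PivotFields(...)`.

```python
XL_BEFORE = 31  # xlBefore from the XlPivotFilterType enumeration


def filter_before(pivot_field, date_text):
    """Clear existing filters on a pivot field, then keep only dates before date_text."""
    pivot_field.ClearAllFilters()
    # Add2(Type, DataField, Value1): DataField is passed as None, as in the answer above
    pivot_field.PivotFilters.Add2(XL_BEFORE, None, date_text)


# Usage with the worksheet from the question (requires Excel and pywin32):
# filter_before(ws.PivotTables(1).PivotFields("Effective deadline"), "10/10/2017")
```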

Can't get image into excel header using openpyxl or win32com

I need to get an image file into an Excel header in an automated fashion; I don't believe this is possible in openpyxl, but I thought it might be doable in win32com, though I am not able to get it working. Does anyone know a way to do this? I found an excel macro that successfully does it within Excel:
Sub InsertPicture()
    With ActiveSheet.PageSetup.LeftHeaderPicture
        .FileName = "C:/Users/bharris/Desktop/my_image_file.png"
        .Height = 45
        .Width = 30
        .CropTop = -30
    End With
    ActiveSheet.PageSetup.LeftHeader = "&G"
End Sub
So I tried implementing from within python using win32com:
from win32com.client import DispatchEx
excel = DispatchEx('Excel.Application')
excel.Visible = True
wb=excel.Workbooks.Open('my_excel_file.xlsx')
ws = wb.Sheets(1)
ws.PageSetup.LeftHeaderPicture.FileName ="my_image_file.png"
ws.PageSetup.LeftHeader = "&G"
wb.SaveAs('my_excel_file.xlsx')
wb.Close(True, 'my_excel_file.xlsx')
excel.Application.Quit()
but it gives me:
raise AttributeError("Property '%s.%s' can not be set." % (self._username_, attr))
AttributeError: Property '<unknown>.FileName' can not be set.
So it looks like FileName is not a property within win32com like it is in VBA within Excel. I've tried some different combinations of things and nothing seems to work. If anyone knows how to do this, with any type of openpyxl or win32com code, or any other code (though it has to be able to edit an existing spreadsheet, not just write a new one like xlsxwriter), your help is much appreciated!
P.S. I have found several solutions for how to insert an image into a cell, but this question is for inserting into the header specifically.
Thanks much,
Robert

Get formula from Excel cell with python xlrd

I have to port an algorithm from an Excel sheet to python code but I have to reverse engineer the algorithm from the Excel file.
The Excel sheet is quite complicated, it contains many cells in which there are formulas that refer to other cells (that can also contains a formula or a constant).
My idea is to analyze with a python script the sheet building a sort of table of dependencies between cells, that is:
A1 depends on B4,C5,E7 formula: "=sqrt(B4)+C5*E7"
A2 depends on B5,C6 formula: "=sin(B5)*C6"
...
The xlrd Python module can read an XLS workbook, but at the moment I can only access the value of a cell, not its formula.
For example, with the following code I can get simply the value of a cell:
import xlrd
#open the .xls file
xlsname="test.xls"
book = xlrd.open_workbook(xlsname)
#build a dictionary of the names->sheets of the book
sd={}
for s in book.sheets():
    sd[s.name]=s
#obtain Sheet "Foglio 1" from sheet names dictionary
sheet=sd["Foglio 1"]
#print the cell at row index 142, column index 9 (cell J143, since xlrd indices are 0-based)
print(sheet.cell(142,9))
However, there seems to be no way to get the formula from the Cell object returned by the .cell(...) method.
The documentation says that it is possible to get a string version of the formula (in English, because no information about function name translation is stored in the Excel file). It mentions formulas (expressions) in the Name and Operand classes, but I cannot work out how to get instances of these classes from the Cell instance that must contain them.
Could you suggest a code snippet that gets the formula text from a cell?
[Dis]claimer: I'm the author/maintainer of xlrd.
The documentation references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated sheet-wide or book-wide to a name; they are not associated with individual cells. Examples: PI maps to =22/7, SALES maps to =Mktng!$A$2:$Z$99. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.
Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.
Decompiling general formulas from bytecode to text is a "work-in-progress", slowly. Note that supposing it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.
Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. It may be useful to you to have info on shared formulas available; otherwise having parsed the following:
B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59
you would need to deduce the similarity between the B3:B60 formulas yourself.
In any case, none of the above is likely to be available any time soon -- xlrd priorities lie elsewhere.
Update: I have gone and implemented a little library to do exactly what you describe: extracting the cells & dependencies from an Excel spreadsheet and converting them to python code. Code is on github, patches welcome :)
Just to add that you can always interact with excel using win32com (not very fast but it works). This does allow you to get the formula. A tutorial can be found here [cached copy] and details can be found in this chapter [cached copy].
Essentially you just do:
app.ActiveWorkbook.ActiveSheet.Cells(r,c).Formula
As for building a table of cell dependencies, a tricky thing is parsing the excel expressions. If I remember correctly the Trace code you mentioned does not always do this correctly. The best I have seen is the algorithm by E. W. Bachtal, of which a python implementation is available which works well.
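For formulas as simple as the two in the question, the dependency table can be sketched with a regular expression. This is deliberately naive and not a substitute for a real parser: it assumes plain A1-style references and will be fooled by string literals and defined names, it drops sheet qualifiers, and it reports only the endpoints of a range. The names `CELL_REF`, `dependencies`, and `dependency_table` are made up for this example.

```python
import re

# Optional "$" markers, 1-3 column letters, then a row number; the lookarounds
# skip function names such as LOG10( and pieces of longer identifiers.
CELL_REF = re.compile(r"(?<![A-Za-z0-9_])\$?([A-Z]{1,3})\$?([0-9]+)(?![A-Za-z0-9_(])")


def dependencies(formula):
    """Return the distinct cell references in a formula, in order of appearance."""
    refs = []
    for column, row in CELL_REF.findall(formula):
        ref = column + row
        if ref not in refs:
            refs.append(ref)
    return refs


def dependency_table(formulas):
    """Map each cell to the cells its formula refers to."""
    return {cell: dependencies(text) for cell, text in formulas.items()}


table = dependency_table({
    "A1": "=sqrt(B4)+C5*E7",
    "A2": "=sin(B5)*C6",
})
for cell, refs in table.items():
    print(cell, "depends on", ",".join(refs))
```

The formula text itself would still have to come from somewhere else, for example the win32com `.Formula` property mentioned above.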
So I know this is a very old post, but I found a decent way of getting the formulas from all the sheets in a workbook as well as having the newly created workbook retain all the formatting.
First step is to save a copy of your .xlsx file as .xls
-- Use the .xls as the filename in the code below
Using Python 2.7
from lxml import etree
from StringIO import StringIO
import xlsxwriter
import subprocess
from xlrd import open_workbook
from xlutils.copy import copy
from xlsxwriter.utility import xl_cell_to_rowcol
import os
file_name = '<YOUR-FILE-HERE>'
dir_path = os.path.dirname(os.path.realpath(file_name))
subprocess.call(["unzip",str(file_name+"x"),"-d","file_xml"])
xml_sheet_names = dict()
with open_workbook(file_name,formatting_info=True) as rb:
    wb = copy(rb)
    workbook_names_list = rb.sheet_names()
    for i,name in enumerate(workbook_names_list):
        xml_sheet_names[name] = "sheet"+str(i+1)
sheet_formulas = dict()
for i, k in enumerate(workbook_names_list):
    xmlFile = os.path.join(dir_path,"file_xml/xl/worksheets/{}.xml".format(xml_sheet_names[k]))
    with open(xmlFile) as f:
        xml = f.read()
    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))
    sheet_formulas[k] = dict()
    for _, elem in context:
        if elem.tag.split("}")[1]=='f':
            cell_key = elem.getparent().get(key="r")
            cell_formula = elem.text
            sheet_formulas[k][cell_key] = str("="+cell_formula)
sheet_formulas
Structure of Dictionary 'sheet_formulas'
{'Worksheet_Name': {'A1_cell_reference':'cell_formula'}}
Example results:
{u'CY16': {'A1': '=Data!B5',
'B1': '=Data!B1',
'B10': '=IFERROR(Data!B12,"")',
'B11': '=IFERROR(SUM(B9:B10),"")',
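Since an .xlsx file is a zip archive of XML parts, the same idea can be sketched on Python 3 with only the standard library, skipping the .xls copy and the external unzip call. This is an illustrative variant, not the code above: `read_formulas` is a hypothetical name, and results are keyed by the worksheet part name inside the archive (for example xl/worksheets/sheet1.xml) rather than by the sheet's display name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_formulas(xlsx):
    """Return {worksheet part name: {cell reference: formula text}}.

    xlsx may be a path or a binary file-like object.
    """
    formulas = {}
    with zipfile.ZipFile(xlsx) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            cells = {}
            for cell in root.iter(NS + "c"):
                formula = cell.find(NS + "f")
                # Cells that reuse a shared formula carry an empty <f> and are skipped
                if formula is not None and formula.text:
                    cells[cell.get("r")] = "=" + formula.text
            formulas[name] = cells
    return formulas


# Usage (the path is a placeholder):
# formulas = read_formulas("workbook.xlsx")
```

Mapping part names back to display names would require reading xl/workbook.xml and its relationships, which this sketch leaves out.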
It seems that it is impossible now to do what you want with xlrd. You can have a look at this post for the detailed description of why it is so difficult to implement the functionality you need.
Note that the development team does a great job of providing support in the python-excel Google group.
I know this post is a little late but there's one suggestion that hasn't been covered here. Cut all the entries from the worksheet and paste using paste special (OpenOffice). This will convert the formulas to numbers so there's no need for additional programming and this is a reasonable solution for small workbooks.
Yes! With win32com it works for me.
import win32com.client
Excel = win32com.client.Dispatch("Excel.Application")
# python -m pip install pywin32
file=r'path Excel file'
wb = Excel.Workbooks.Open(file)
sheet = wb.ActiveSheet
#Get value
val = sheet.Cells(1,1).value
# Get Formula
sheet.Cells(6,2).Formula
