Add icon sets to an existing excel file with python - python

I'm trying to add some icon sets to an existing excel file using python.
The excel file is written using xlsxwriter. As xlsxwriter does not support icon sets, I close the file, reopen it with openpyxl, add the icon sets and save it again. The problem is that I lose all conditional formatting added previously. Opening the file in openpyxl with "keep_vba=True" results in an unreadable xlsx file.
Any ideas how to achieve this?
Thanks in advance!
P.S.: Missed some details. Sorry for that. I write xlsx files in both cases (xlsxwriter and openpyxl) and use python 2.7 and the latest versions of openpyxl and xlsxwriter on a windows machine with excel 2013. Icon sets are little symbols like arrows (up, down) which can be used in conditional formatting.

openpyxl supports conditional formatting, including icon sets.
See the official documentation: Conditional Formatting > IconSet
Here is an example:
>>> from openpyxl.formatting.rule import IconSet, FormatObject
>>> first = FormatObject(type='percent', val=0)
>>> second = FormatObject(type='percent', val=33)
>>> third = FormatObject(type='percent', val=67)
>>> iconset = IconSet(iconSet='3TrafficLights1', cfvo=[first, second, third], showValue=None, percent=None, reverse=None)
>>> # assign the icon set to a rule
>>> from openpyxl.formatting.rule import Rule
>>> rule = Rule(type='iconSet', iconSet=iconset)
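The example above stops before the rule is attached to a worksheet. A minimal sketch of that last step, assuming a recent openpyxl 3.x; the helper name add_traffic_lights, the cell range and the sample values are illustrative and not taken from the question:

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import FormatObject, IconSet, Rule


def add_traffic_lights(ws, cell_range):
    """Attach a three-icon traffic-light rule to cell_range and return it."""
    # Thresholds at 0 %, 33 % and 67 %, as in the example above.
    thresholds = [FormatObject(type="percent", val=v) for v in (0, 33, 67)]
    icon_set = IconSet(iconSet="3TrafficLights1", cfvo=thresholds)
    rule = Rule(type="iconSet", iconSet=icon_set)
    # conditional_formatting.add() binds the rule to a range on this sheet.
    ws.conditional_formatting.add(cell_range, rule)
    return rule


wb = Workbook()
ws = wb.active
for row in range(1, 11):
    ws.cell(row=row, column=1, value=row * 10)  # sample data only

rule = add_traffic_lights(ws, "A1:A10")
# wb.save("with_icons.xlsx")  # hypothetical output path
```

For an existing file, the same helper could be called on a sheet obtained from load_workbook(); whether conditional formatting written earlier by xlsxwriter survives that round trip is not something this sketch checks.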

Related

PyCharm autocomplete (hints?) not working for only one package

I'm working with Excel files and I'm using the openpyxl package. I'm also making a GUI so I'm using the PyQt5 package. When I type something like this:
label = QtWidgets.QLabel("Text", window)
as soon as I type QtWidgets. I get options for what to do next, and I know what I have available.
I want to do the same with the openpyxl package, but it's not working. I have tried stuff like this:
wb = openpyxl.load_workbook('test.xlsx')
ws = wb['Sheet1']
for table in ws.tables:
    print(table)
When I type ws., there's no "tables" option initially, unless I have already used it in my own code somewhere before. I want to know what options I have when I type in ws., but it's just not showing anything.
[Screenshots omitted: autocomplete shows no suggestions for openpyxl, but works for PyQt5.]

How to return the PrintArea from Excel in Python

I'm trying to create a Python script (I'm using Python 3.7.3 with UTF-8 encoding on Windows 10 64-bit with Microsoft Office 365) that exports user selected worksheets to PDF, after the user has selected the Excel-files.
The Excel-files contain a lot of different settings for page setup and each worksheet in each Excel-file has a different page setup.
The task is therefore that I need to read all current variables regarding page setup to be able to assign them to the related variables for export.
The problem is when I'm trying to get Excel to return the current print area of the worksheet, which I can't figure out.
As far as I understand I need to be able to read the current print area, to be able to set it for the export.
The Excel-files are a mixture of ".xlsx" and ".xlsm".
I've tried using all kind of different methods from the Excel VBA documentation, but nothing has worked so far e.g. by adding ".Range" and ".Address" etc.
I've also tried the ".UsedRange", but there is no significant difference in the cells that I can search for and I can't format them in a specific way so I can't use this.
I've also tried using the "IgnorePrintAreas = False" variable in the "ExportAsFixedFormat"-function, but that didn't work either.
#This is some of the script.
#I've left out irrelevant parts (dialogboxes etc.) just to make it shorter
#Import pywin32 and open Excel and selected workbook.
import win32com.client as win32
excel = win32.gencache.EnsureDispatch("Excel.Application")
excel.Visible = False
wb = excel.Workbooks.Open(wb_path)
#Select the 1st worksheet in the workbook
#This is just used for testing
wb.Sheets([1]).Select()
#This is the line I can't get to work
ps_prar = wb.ActiveSheet.PageSetup.PrintArea
#This is just used to test if I get the print area
print(ps_prar)
#This is exporting the selected worksheet to PDF
wb.Sheets([1]).Select()
wb.ActiveSheet.ExportAsFixedFormat(0, pdf_path, Quality = 0, IncludeDocProperties = True, IgnorePrintAreas = False, OpenAfterPublish = True)
#This closes the workbook and the Excel-file (although Excel sometimes still exists in Task Manager)
wb.Close()
wb = None
excel.Quit()
excel = None
If I leave the code as above and try to open a test Excel-file (.xlsx) with a small PrintArea (A1:H8), the print function just gives me a blank line.
If I add something to .PrintArea (as mentioned above) I get 1 of 2 errors:
"TypeError: 'str' object is not callable".
or
"ps_prar = wb.ActiveSheet.PageSetup.PrintArea.Range
AttributeError: 'str' object has no attribute 'Range'"
I'm hoping someone can help me in this matter - thanks, in advance.
Try
wb = excel.Workbooks.OpenXML(wb_path)
instead of
wb = excel.Workbooks.Open(wb_path)
My problem was with a German version of MS Office. It works now. Check here https://social.msdn.microsoft.com/Forums/de-DE/3dce9f06-2262-4e22-a8ff-5c0d83166e73/excel-api-interne-namen?forum=officede

Openpyxl 2.3.2 Change Tab Color/Fit To Page Properties?

I am trying to modify an excel spreadsheet to alter the colors of the tabs using Openpyxl 2.3.2 (using Anaconda), but can't seem to get the code to work. I am using the following code, where bdws is a worksheet already in the workbook, and bdws2 is a worksheet I added later.
I can't get either of the sheets to change color.
As well, I can't seem to adjust other page properties like fitToPage, using the same worksheets. Just wondering if anyone might know why that is.
bdwb = load_workbook(checkFileName(finalBDFileName))
bdws = bdwb[finalBDSheetName]
bdws.sheet_properties.tabcolor ='FFFF00'
bdws.sheet_properties.pageSetUpPr.fitToPage = True
bdws2.sheet_properties.tabcolor = 'FF00FF'
bdws2.sheet_properties.pageSetUpPr.fitToPage = True
bdwb.save("new bd.xlsx")
Thank you.
You just need to capitalize your Color :)
bdws.sheet_properties.tabColor ='FFFF00'
bdws.sheet_properties.pageSetUpPr.fitToPage = True
bdws2.sheet_properties.tabColor = 'FF00FF'
bdws2.sheet_properties.pageSetUpPr.fitToPage = True
should do it for you.

Python, openpyxl: I get the wrong value when running get_highest_column()

I am practicing with openpyxl and I'm working on an Excel file called 'test.xlsx'. The file only has 3 columns and 7 rows. The .xlsx file was created with LibreOffice.
When I run...
>>> #! python3
>>> import openpyxl
>>> wb = openpyxl.load_workbook('test.xlsx')
>>> sheet = wb.get_sheet_by_name('Sheet1')
>>> sheet.get_highest_column()
1025
The returned value should be 3.
A quick Google search suggested I run:
>>> sheet.calculate_dimension()
and got the return value:
'A1:AMK7'
This should only be 'A1:C7'.
I remember reading that LibreOffice could be part of the problem.
However, I can't switch to MSOffice, and I hate OpenOffice.
Is there a suggestion on how I could fix this, or work around it?
Thanks!
It sounds like you're using older versions of LibreOffice and openpyxl. LibreOffice used to set a default value of "A1:AMK7" for the dimensions, but version 5 doesn't seem to do that any more. openpyxl used to rely on the dimensions tag when reading files but hasn't done so for a while. Please try using openpyxl 2.3-b2.
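The two figures in the question describe the same thing: the dimension 'A1:AMK7' ends at column AMK, and AMK is the 1025th column, which is the number get_highest_column() reported. A small sketch of the letter-to-number conversion follows; the helper name column_index is illustrative and not part of openpyxl (openpyxl ships an equivalent helper, openpyxl.utils.column_index_from_string):

```python
def column_index(letters):
    """Convert spreadsheet column letters to a 1-based index (A -> 1, AA -> 27)."""
    index = 0
    for letter in letters.upper():
        # Column letters behave like a base-26 number without a zero digit.
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index


print(column_index("AMK"))  # 1025, the value reported in the question
print(column_index("C"))    # 3, the value the asker expected
```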

Get formula from Excel cell with python xlrd

I have to port an algorithm from an Excel sheet to python code but I have to reverse engineer the algorithm from the Excel file.
The Excel sheet is quite complicated, it contains many cells in which there are formulas that refer to other cells (that can also contains a formula or a constant).
My idea is to analyze with a python script the sheet building a sort of table of dependencies between cells, that is:
A1 depends on B4,C5,E7 formula: "=sqrt(B4)+C5*E7"
A2 depends on B5,C6 formula: "=sin(B5)*C6"
...
The xlrd Python module can read an XLS workbook, but at the moment I can only access the value of a cell, not the formula.
For example, with the following code I can get simply the value of a cell:
import xlrd
#open the .xls file
xlsname="test.xls"
book = xlrd.open_workbook(xlsname)
#build a dictionary of the names->sheets of the book
sd={}
for s in book.sheets():
    sd[s.name]=s
#obtain Sheet "Foglio 1" from sheet names dictionary
sheet=sd["Foglio 1"]
#print value of the cell J141
print sheet.cell(142,9)
However, there seems to be no way to get the formula from the Cell object returned by the .cell(...) method.
The documentation says it is possible to get a string version of the formula (in English, because no information about function name translation is stored in the Excel file). It mentions formulas (expressions) in the Name and Operand classes, but I cannot work out how to get instances of these classes from the Cell instance that must contain them.
Could you suggest a code snippet that gets the formula text from a cell?
[Dis]claimer: I'm the author/maintainer of xlrd.
The documentation references to formula text are about "name" formulas; read the section "Named references, constants, formulas, and macros" near the start of the docs. These formulas are associated sheet-wide or book-wide to a name; they are not associated with individual cells. Examples: PI maps to =22/7, SALES maps to =Mktng!$A$2:$Z$99. The name-formula decompiler was written to support inspection of the simpler and/or commonly found usages of defined names.
Formulas in general are of several kinds: cell, shared, and array (all associated with a cell, directly or indirectly), name, data validation, and conditional formatting.
Decompiling general formulas from bytecode to text is a "work-in-progress", slowly. Note that supposing it were available, you would then need to parse the text formula to extract the cell references. Parsing Excel formulas correctly is not an easy job; as with HTML, using regexes looks easy but doesn't work. It would be better to extract the references directly from the formula bytecode.
Also note that cell-based formulas can refer to names, and name formulas can refer both to cells and to other names. So it would be necessary to extract both cell and name references from both cell-based and name formulas. It may be useful to you to have info on shared formulas available; otherwise having parsed the following:
B2 =A2
B3 =A3+B2
B4 =A4+B3
B5 =A5+B4
...
B60 =A60+B59
you would need to deduce the similarity between the B3:B60 formulas yourself.
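As an illustration only (this is not part of xlrd), one way to deduce that similarity is to rewrite each reference as an offset from the cell that holds the formula, so that B3:B60 all collapse to the same text. The sketch below uses a deliberately naive regex and assumes plain A1-style references as in the list above; "$" anchors, sheet prefixes, defined names and string literals are not handled, and function names such as LOG10 would be misread. All function names here are hypothetical.

```python
import re
from collections import defaultdict

# Naive pattern for plain A1-style references such as B2 or AA10.
CELL_REF = re.compile(r"\b([A-Z]{1,3})([0-9]+)\b")


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, AA -> 27)."""
    index = 0
    for letter in letters:
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index


def relative_form(host, formula):
    """Rewrite every reference in formula as an offset from the host cell."""
    host_letters, host_row = CELL_REF.fullmatch(host).groups()
    host_col = column_index(host_letters)

    def to_offset(match):
        row_offset = int(match.group(2)) - int(host_row)
        col_offset = column_index(match.group(1)) - host_col
        return "R[{}]C[{}]".format(row_offset, col_offset)

    return CELL_REF.sub(to_offset, formula)


def group_by_shape(formulas):
    """Group cell addresses whose formulas share the same relative form."""
    groups = defaultdict(list)
    for cell, formula in formulas.items():
        groups[relative_form(cell, formula)].append(cell)
    return dict(groups)


formulas = {"B2": "=A2", "B3": "=A3+B2", "B4": "=A4+B3", "B5": "=A5+B4"}
groups = group_by_shape(formulas)
# B3, B4 and B5 end up in one group; B2 stays on its own.
```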
In any case, none of the above is likely to be available any time soon -- xlrd priorities lie elsewhere.
Update: I have gone and implemented a little library to do exactly what you describe: extracting the cells & dependencies from an Excel spreadsheet and converting them to python code. Code is on github, patches welcome :)
Just to add that you can always interact with excel using win32com (not very fast but it works). This does allow you to get the formula. A tutorial can be found here [cached copy] and details can be found in this chapter [cached copy].
Essentially you just do:
app.ActiveWorkbook.ActiveSheet.Cells(r,c).Formula
As for building a table of cell dependencies, a tricky thing is parsing the excel expressions. If I remember correctly the Trace code you mentioned does not always do this correctly. The best I have seen is the algorithm by E. W. Bachtal, of which a python implementation is available which works well.
So I know this is a very old post, but I found a decent way of getting the formulas from all the sheets in a workbook as well as having the newly created workbook retain all the formatting.
First step is to save a copy of your .xlsx file as .xls
-- Use the .xls as the filename in the code below
Using Python 2.7
from lxml import etree
from StringIO import StringIO
import xlsxwriter
import subprocess
from xlrd import open_workbook
from xlutils.copy import copy
from xlsxwriter.utility import xl_cell_to_rowcol
import os
file_name = '<YOUR-FILE-HERE>'
dir_path = os.path.dirname(os.path.realpath(file_name))
subprocess.call(["unzip",str(file_name+"x"),"-d","file_xml"])
xml_sheet_names = dict()
with open_workbook(file_name,formatting_info=True) as rb:
    wb = copy(rb)
    workbook_names_list = rb.sheet_names()
    for i,name in enumerate(workbook_names_list):
        xml_sheet_names[name] = "sheet"+str(i+1)
sheet_formulas = dict()
for i, k in enumerate(workbook_names_list):
    xmlFile = os.path.join(dir_path,"file_xml/xl/worksheets/{}.xml".format(xml_sheet_names[k]))
    with open(xmlFile) as f:
        xml = f.read()
    tree = etree.parse(StringIO(xml))
    context = etree.iterparse(StringIO(xml))
    sheet_formulas[k] = dict()
    for _, elem in context:
        if elem.tag.split("}")[1]=='f':
            cell_key = elem.getparent().get(key="r")
            cell_formula = elem.text
            sheet_formulas[k][cell_key] = str("="+cell_formula)
sheet_formulas
Structure of Dictionary 'sheet_formulas'
{'Worksheet_Name': {'A1_cell_reference':'cell_formula'}}
Example results:
{u'CY16': {'A1': '=Data!B5',
'B1': '=Data!B1',
'B10': '=IFERROR(Data!B12,"")',
'B11': '=IFERROR(SUM(B9:B10),"")',
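The same idea can be sketched in Python 3 with only the standard library, reading the worksheet XML straight from the .xlsx archive instead of calling an external unzip. This is a sketch under assumptions, not a drop-in replacement: the function name read_formulas is illustrative, results are keyed by the archive part name (for example xl/worksheets/sheet1.xml) rather than by the worksheet's display name, and cells whose f element has no text (as can happen for dependants of a shared formula) are skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_formulas(xlsx_file):
    """Return {worksheet part name: {cell reference: formula text}}."""
    formulas = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for part in archive.namelist():
            if not (part.startswith("xl/worksheets/") and part.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(part))
            cells = {}
            for cell in root.iter(NS + "c"):
                formula = cell.find(NS + "f")
                # Skip cells without a formula or with an empty <f> element.
                if formula is not None and formula.text:
                    cells[cell.get("r")] = "=" + formula.text
            formulas[part] = cells
    return formulas


# Usage (hypothetical path):
# sheet_formulas = read_formulas("workbook.xlsx")
```

Because zipfile.ZipFile also accepts file-like objects, the function works on an in-memory copy of the workbook as well as on a path.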
It seems that it is currently impossible to do what you want with xlrd. You can have a look at this post for a detailed description of why the functionality you need is so difficult to implement.
Note that the development team does a great job of support at the python-excel Google group.
I know this post is a little late but there's one suggestion that hasn't been covered here. Cut all the entries from the worksheet and paste using paste special (OpenOffice). This will convert the formulas to numbers so there's no need for additional programming and this is a reasonable solution for small workbooks.
Yes! With win32com it works for me.
import win32com.client
Excel = win32com.client.Dispatch("Excel.Application")
# python -m pip install pywin32
file=r'path Excel file'
wb = Excel.Workbooks.Open(file)
sheet = wb.ActiveSheet
#Get value
val = sheet.Cells(1,1).value
# Get Formula
sheet.Cells(6,2).Formula
