openpyxl: conditionally formatting with COUNTIF

I am using openpyxl to create an Excel sheet that I need to conditionally format based on whether a certain text string is found within a cell. For example, I want to check whether a cell begins with "ok:", so my formula is =COUNTIF(A1,"ok:*")>0. This works in Excel. However, the following Python code using openpyxl results in Excel saying the sheet is corrupted:
redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
ws.conditional_formatting.add('E1:E10', FormulaRule(formula=['=COUNTIF(A1,"ok:*")>0'],fill=redFill))
How do I properly add a COUNTIF condition to an Excel sheet with openpyxl?

Turns out you can't use COUNTIF. Here is the code that works:
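from openpyxl.styles import Font, PatternFill
from openpyxl.styles.differential import DifferentialStyle
from openpyxl.formatting.rule import Rule

red_text = Font(color="9C0006")
red_fill = PatternFill(bgColor="FFC7CE")
dxf = DifferentialStyle(font=red_text, fill=red_fill)
rule = Rule(type="containsText", operator="containsText", text="highlight", dxf=dxf)
# the formula is written without a leading "="
rule.formula = ['NOT(ISERROR(SEARCH("ok:*",A1)))']
# wsNonDebug is the worksheet being formatted
wsNonDebug.conditional_formatting.add('A1:F40', rule)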
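For what it's worth, a likely cause of the corruption in the question is the leading "=" rather than COUNTIF itself: the examples in the openpyxl documentation pass conditional-formatting formulas without it. Below is a minimal sketch under that assumption. The sample data and the in-memory buffer are only illustrative, and the result has not been checked against every Excel version.

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active
for value in ["ok: done", "failed", "ok: again"]:
    ws.append([value])

red_fill = PatternFill(start_color="EE1111", end_color="EE1111", fill_type="solid")

# Same COUNTIF condition as in the question, but without the leading "=".
rule = FormulaRule(formula=['COUNTIF(A1,"ok:*")>0'], fill=red_fill)
ws.conditional_formatting.add("A1:A10", rule)

# Saved to memory here; pass a filename instead to get a file for Excel.
buffer = BytesIO()
wb.save(buffer)
```

The formula is relative to the top-left cell of the range, so A1 in the rule is evaluated as A2 for the second row, and so on.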

Related

How to change row height for pandas export .to_excel() after wrapping texts in DataFrame?

I'm wrapping texts in a Pandas DataFrame with this code:
for column in dataframe:
    if column != '':
        dataframe[column] = dataframe[column].str.wrap(len(column) + 20)
and exporting the DataFrame to an Excel document with .to_excel('filename'). Opened in LibreOffice on Linux, the wrapped cell shows only part of its text (screenshot omitted).
How can I change the row height of the row with the wrapped text so that the entire text is visible in LibreOffice Calc?
I also want to mention that when I remove the above code and wrap the text manually in LibreOffice, it works. Maybe it's not possible from the code side?
The 'problem' you are experiencing comes from a wrong expectation: that the file exported from a pandas DataFrame with .to_excel() will automatically contain, besides the cell contents and the row/column names, the formatting information for the spreadsheet's rows and columns (width, height, font size, etc.) needed to display all of the content.
That expectation overlooks, among other things, that the export specified neither the font size used to display the cells nor the column widths and row heights. Without them, the optimal row heights and column widths cannot be inferred from the cell data and stored along with the content.
In other words, the exported file carries no data-dependent formatting information, so when it is loaded in LibreOffice Calc it is displayed with the standard formatting.
After loading the file you see only the last line of the wrapped text, because the standard row height with the standard font size can display only one line of the cell's string content.
When I remove above code and wrap text manually in LibreOffice - it works. Maybe it's not possible from code side?
Is it possible on the script side to specify what I achieved by manually change?
If, in addition to the cell values, you also specify the formatting of the rows, columns and cells, you can achieve any result you want from Python code.
See "Improving Pandas Excel Output" for more explanation than the comments in the following code, which accomplishes what you want:
import pandas as pd

row_number = 5    # row number of the cell as shown in the Calc spreadsheet
row_height = 100  # choose a value large enough to show all wrapped text
# get the writer object in order to be able to specify formatting:
writer = pd.ExcelWriter("wrapping_column.xlsx", engine='xlsxwriter')
dataframe.to_excel(writer, index=False, sheet_name='Cell With Wrapped Text')
# get the sheet of the spreadsheet to work on:
workbook = writer.book
worksheet = writer.sheets['Cell With Wrapped Text']
# adjust the height of the row (set_row uses a zero-based row index):
worksheet.set_row(row_number-1, row_height)
# save the data along with the formatting changes to file
# (writer.save() was removed in pandas 2.0; close() also writes the file):
writer.close()
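Instead of hard-coding one row number, the height can be derived from the number of wrapped lines in each row. The sketch below is one possible way to do that, not the answer's method: it uses the openpyxl engine rather than xlsxwriter, and the sample DataFrame, the sheet name "Wrapped" and the 15 points per line are all assumptions chosen for illustration.

```python
from io import BytesIO

import pandas as pd
from openpyxl.styles import Alignment

POINTS_PER_LINE = 15  # assumption: roughly one line of default-size text

dataframe = pd.DataFrame(
    {"note": ["short", "a much longer piece of text that needs wrapping onto several lines"]}
)
dataframe["note"] = dataframe["note"].str.wrap(20)  # inserts "\n" line breaks

buffer = BytesIO()  # pass a filename instead to write to disk
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    dataframe.to_excel(writer, index=False, sheet_name="Wrapped")
    worksheet = writer.sheets["Wrapped"]
    for row in worksheet.iter_rows(min_row=2):  # row 1 holds the header
        # the tallest cell in the row decides the row height
        line_count = max(str(cell.value).count("\n") + 1 for cell in row)
        worksheet.row_dimensions[row[0].row].height = line_count * POINTS_PER_LINE
        for cell in row:
            cell.alignment = Alignment(wrap_text=True)
```

The points-per-line value depends on the font actually used, so it may need tuning for a real sheet.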

openpyxl - Write string beginning with equals ('=')

I'm pulling some random text strings from a database and writing them to an xlsx file with openpyxl. Some of the strings happen to start with an equals sign (something like "=134lj9adsasf&^"). This leads to the problem of Excel trying to interpret such a string as a formula and showing "#NAME?" because of the error.
In Excel itself, I can avoid this problem by changing the cell's format from General to Text prior to writing the string. I tried to do this with openpyxl but it doesn't make a difference. When I open the generated spreadsheet it does show the cell as having text format, but it still shows the error. How can I get around this?
A working example is below. When I open the file in Excel, it shows #NAME? for the third cell. Yet if I simply select the cell and type "=abc?123" (without quotes), Excel accepts the text with no issue.
import openpyxl
from openpyxl.cell.cell import Cell
stringList = [("abc","123","=abc?123","ok")]
wb = openpyxl.Workbook()
ws = wb.create_sheet('Sheet1')
for row in stringList:
    ws.append(row)
    for idx, cell in enumerate(ws[ws.max_row]):
        cell.number_format = '#' # Set all cells to text format to avoid issue with =
        cell.value = str(row[idx]) # Re-write data
wb.save("filename.xlsx")
I figured it out. Just need to change the data_type rather than number_format.
The strings starting with equals had their data_type set to 'f'.
for row in stringList:
    ws.append(row)
    for cell in ws[ws.max_row]:
        if cell.data_type == 'f':
            cell.data_type = 's'
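A small self-contained check of what the fix above does, using the same sample row as the question (the variable names `before` and `after` are only for illustration):

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["abc", "123", "=abc?123", "ok"])

# openpyxl marks any string starting with "=" as a formula ('f')
before = [cell.data_type for cell in ws[1]]

for cell in ws[1]:
    if cell.data_type == "f":
        cell.data_type = "s"  # store it as a plain string instead

after = [cell.data_type for cell in ws[1]]
print(before, after)
```

The cell value itself is left untouched; only the type that openpyxl writes to the file changes.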

Unable to save formulas under excel file when it is saved using openpyxl lib

Formulas in the excel sheet are getting removed when it is saved through an openpyxl python script.
Is there any way to save an Excel file without removing formulas using a Python script?
Expected: Formulas should not be removed and data should be read through openpyxl lib
Actual: Data is read, but formulas are getting removed
If you read the file with the data_only=True argument, you get the values computed from the formulas, not the formulas themselves.
From the docs:
data_only controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet.
Using xlwings, this issue is resolved.
I was able to resolve this issue for my assignment.
First, do not use the data_only parameter; just load the workbook and the sheet, e.g.:
exl = openpyxl.load_workbook(exlFile)
sheet = exl["Sheet1"]
Now load the same workbook again, this time using data_only=True:
exl1 = openpyxl.load_workbook(exlFile, data_only=True)
sheet1 = exl1["Sheet1"]
Now, when reading data from the workbook, use sheet1; when writing back, use sheet.
Also, when saving the workbook, use exl.save(exlFile) instead of exl1.save(exlFile).
With this I was able to retain all the formulas and could also update the required cells.
Let me know if this is sufficient or if you need more info.
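The two-workbook approach can be sketched end to end as follows. The workbook is built in memory so the example runs on its own; the cell layout and the names `formulas_wb` and `values_wb` are assumptions for illustration only.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a small workbook with one formula so the sketch is self-contained.
source = Workbook()
source.active.append([1])
source.active.append([2])
source.active["A3"] = "=A1+A2"
buffer = BytesIO()
source.save(buffer)

buffer.seek(0)
formulas_wb = load_workbook(buffer)                # keeps formulas: write and save this one
buffer.seek(0)
values_wb = load_workbook(buffer, data_only=True)  # cached values: read from this one only

formulas_ws = formulas_wb.active
values_ws = values_wb.active

formula_text = formulas_ws["A3"].value
# openpyxl never calculates formulas, so a file that was never opened and
# saved by Excel has no cached result to return here.
cached_value = values_ws["A3"].value

formulas_ws["A1"] = 10  # update a cell through the formula-preserving workbook
output = BytesIO()
formulas_wb.save(output)  # never save values_wb, or the formulas are lost
```

With a file that Excel has saved at least once, the data_only workbook returns the last computed result instead of an empty value.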

Read table data from Excel file with python

I currently have an Excel workbook with some graphs (charts?). The graphs are plotted from numerical values. I can access the values in LibreOffice if I right click on the graph and select "Data table". These values are nowhere else in the file.
I would like to access these values programmatically with Python. I tried things like xlrd, but it seems xlrd ignores graphical elements. When I run it on my workbook I only get empty cells back.
Have you ever encountered this issue?
Sadly I cannot provide the file as it is confidential.
import pandas as pd
df = pd.read_excel('path/name_of_your_file.xlsx')
print(df.head())
You should have a dataframe (df) to play with in python!
I have never worked with Excel files containing charts, but I used to read normal Excel files with the following code. Have you tried this?
import xlrd
file = 'temp.xls'
book = xlrd.open_workbook(file)
for sheet in book.sheets():
    # skip sheets that have no columns
    if sheet.ncols:
        # print the values of every row
        for row_index in range(sheet.nrows):
            for value in sheet.row_values(row_index):
                print(value)
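Neither snippet above looks inside the charts themselves. If the workbook is an .xlsx file, the numbers a chart was drawn from are usually cached inside the chart's own XML part (under xl/charts/ in the zip archive, in c:numCache elements), so they can be read with the standard library alone. This is only a sketch under that assumption; the function name is hypothetical, the demo archive is a stripped-down stand-in for a real file, and it does not apply to the older .xls format.

```python
import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

CHART_NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def read_chart_number_caches(xlsx_file):
    """Return {chart part name: [one list of cached numbers per series]}."""
    caches = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/charts/chart") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            series = []
            for cache in root.iterfind(".//c:numCache", CHART_NS):
                series.append([float(v.text) for v in cache.iterfind("c:pt/c:v", CHART_NS)])
            caches[name] = series
    return caches


# Demo on a tiny hand-made archive that contains only a chart part.
demo_xml = (
    '<c:chartSpace xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart">'
    '<c:numCache><c:pt idx="0"><c:v>1.5</c:v></c:pt>'
    '<c:pt idx="1"><c:v>2.5</c:v></c:pt></c:numCache>'
    "</c:chartSpace>"
)
demo_file = BytesIO()
with zipfile.ZipFile(demo_file, "w") as archive:
    archive.writestr("xl/charts/chart1.xml", demo_xml)
demo_file.seek(0)

chart_values = read_chart_number_caches(demo_file)
print(chart_values)
```

For a real workbook, pass its path to read_chart_number_caches instead of the demo buffer.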

Unhide new cells with openpyxl?

I am trying to add a new sheet to an existing XLSX document with openpyxl and Python 2.7. Adding the cells works, but the cells are hidden; in fact, the whole column is hidden.
This is the code:
ws = wb.create_sheet(title='newsheet')
for i in range(0, len(items)-1):
    c = ws.cell(column=1, row=i+1)
    c.value = 'foo'
    c.style.protection = Protection(hidden=False)
wb.save('new_file.xlsx')
I don't see 'foo' in the resulting spreadsheet.
Unfortunately, Excel ignores row or column styles for existing cells so you have to apply them to individual cells. As a result, openpyxl only interprets styles applied to individual cells as relevant to those cells. See http://openpyxl.readthedocs.org/en/2.3.0-b1/styles.html#applying-styles for further information.
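If the column itself is hidden (as opposed to a cell protection setting), the flag lives on the column dimension rather than on the cells. The following is a minimal sketch under that assumption; the column is hidden first only to simulate the situation from the question.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.create_sheet(title="newsheet")

# Simulate the hidden column described in the question.
ws.column_dimensions["A"].hidden = True

for row_index in range(1, 4):
    ws.cell(column=1, row=row_index).value = "foo"

# Column visibility is controlled by the column dimension, not by cell styles.
ws.column_dimensions["A"].hidden = False
column_is_hidden = ws.column_dimensions["A"].hidden
```

Protection(hidden=...) on a cell only controls whether its formula is shown when the sheet is protected, so it does not affect whether the column is displayed.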
