openpyxl - Write string beginning with equals ('=') - python

I'm pulling some random text strings from a database and writing them to an xlsx file with openpyxl. Some of the strings happen to start with an equals sign (for example "=134lj9adsasf&^"). Excel then tries to interpret them as formulas and, because they are not valid formulas, shows "#NAME?".
In Excel itself, I can avoid this problem by changing the cell's format from General to Text prior to writing the string. I tried to do this with openpyxl but it doesn't make a difference. When I open the generated spreadsheet it does show the cell as having text format, but it still shows the error. How can I get around this?
A working example is below. When I open the file in Excel, it shows #NAME? for the third cell. Yet if I simply select the cell and type "=abc?123" (without quotes), Excel accepts the text with no issue.
import openpyxl

stringList = [("abc","123","=abc?123","ok")]
wb = openpyxl.Workbook()
ws = wb.create_sheet('Sheet1')
for row in stringList:
    ws.append(row)
    for idx, cell in enumerate(ws[ws.max_row]):
        cell.number_format = '@'  # Set all cells to text format to avoid issue with =
        cell.value = str(row[idx])  # Re-write data
wb.save("filename.xlsx")

I figured it out. I just needed to change the data_type rather than the number_format.
The strings starting with an equals sign had their data_type set to 'f' (formula); setting it to 's' (string) makes openpyxl write them as plain text.
for row in stringList:
    ws.append(row)
    for cell in ws[ws.max_row]:
        if cell.data_type == 'f':
            cell.data_type = 's'
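A minimal self-contained version of the fix, assuming the rows arrive as tuples as in the question (append_as_text is a hypothetical helper name). It writes to the workbook's default sheet instead of adding a second one:

```python
import openpyxl


def append_as_text(ws, row):
    """Append one row, keeping strings that start with '=' as literal text."""
    ws.append(row)
    for cell in ws[ws.max_row]:
        # openpyxl marks any str value starting with '=' as a formula ('f');
        # switching it to 's' makes openpyxl write it as an ordinary string.
        if cell.data_type == "f":
            cell.data_type = "s"


string_list = [("abc", "123", "=abc?123", "ok")]

wb = openpyxl.Workbook()
ws = wb.active  # default sheet, so no extra empty sheet is created
for row in string_list:
    append_as_text(ws, row)
wb.save("filename.xlsx")
```

Because only cells whose data_type is 'f' are touched, ordinary strings and numbers in the same row are left exactly as openpyxl typed them.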

Related

How to change row height for pandas export .to_excel() after wrapping texts in DataFrame?

I'm wrapping texts in a Pandas DataFrame with this code:
for column in dataframe:
    if column != '':
        dataframe[column] = dataframe[column].str.wrap(len(column) + 20)
and export the DataFrame to an Excel document with .to_excel('filename'). The result (LibreOffice on Linux) is shown in the image below:
How can I change the row height to get the following result?
I should also mention that when I remove the above code and wrap the text manually in LibreOffice, it works. Maybe it's not possible from the code side?
How can I change the height of the row with the wrapped text so that the entire text is visible in LibreOffice Calc, as shown in the image?
The 'problem' you are experiencing results from the mistaken expectation that a file exported from a pandas DataFrame with .to_excel() will automatically contain, besides the cell contents and the row/column names, the formatting data for the spreadsheet's columns and rows (width, height, font size, etc.) needed to make all of the content visible.
This expectation ignores, among other things, that along with the export you specified neither the font size for the cells nor the widths and heights of the columns and rows. Without that information, the optimal row heights and column widths cannot be inferred from the cell data and stored along with the content.
In other words, no data-dependent formatting information is stored in the exported file, so LibreOffice Calc displays it using the default formatting.
The image shows that after loading the file you see only the last line of the wrapped text, because the default row height at the default font size can display only one line of the cell's content.
When I remove the above code and wrap text manually in LibreOffice, it works. Maybe it's not possible from the code side?
Is it possible on the script side to specify what I achieved by changing it manually?
If, in addition to the cell values, you also specify the formatting of the rows, columns and cells, you can achieve any result you want from Python script code.
See Improving Pandas Excel Output for more explanation than the comments in the following code provide; the code accomplishes what you want to achieve:
import pandas as pd

row_number = 5  # row number of the cell as shown in the Calc spreadsheet
row_height = 100  # choose an appropriate value to show all wrapped text
# get the writer object in order to be able to specify formatting:
writer = pd.ExcelWriter("wrapping_column.xlsx", engine='xlsxwriter')
dataframe.to_excel(writer, index=False, sheet_name='Cell With Wrapped Text')
# get the sheet of the spreadsheet to work on:
workbook = writer.book
worksheet = writer.sheets['Cell With Wrapped Text']
# adjust the height of the row (set_row takes a zero-based row index):
worksheet.set_row(row_number-1, row_height)
# save the data along with the formatting changes to file
# (writer.save() was removed in pandas 2.0; close() also works in older versions):
writer.close()
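As an alternative sketch that uses openpyxl instead of the xlsxwriter engine, each row's height can be derived from the wrapped text itself. The pandas documentation states that str.wrap uses a textwrap.TextWrapper with default settings internally, so textwrap.fill should produce the same line breaks. The 15-points-per-line figure is an assumption for the default 11 pt font, and write_wrapped_rows is a hypothetical helper. The wrap_text alignment is what makes Excel and Calc render the embedded newlines as separate lines:

```python
import textwrap

from openpyxl import Workbook
from openpyxl.styles import Alignment

# Assumption: roughly 15 points per text line at the default 11 pt font.
LINE_HEIGHT_PT = 15


def write_wrapped_rows(ws, rows, width=30):
    """Write rows of text, wrapping each cell and sizing each row to fit."""
    wrap = Alignment(wrap_text=True, vertical="top")
    for r, row in enumerate(rows, start=1):
        max_lines = 1
        for c, text in enumerate(row, start=1):
            wrapped = textwrap.fill(str(text), width=width)
            cell = ws.cell(row=r, column=c, value=wrapped)
            cell.alignment = wrap  # display the "\n" breaks as separate lines
            max_lines = max(max_lines, wrapped.count("\n") + 1)
        # Row height is in points; size it for the tallest cell in the row.
        ws.row_dimensions[r].height = max_lines * LINE_HEIGHT_PT


wb = Workbook()
ws = wb.active
write_wrapped_rows(ws, [["short", "The quick brown fox jumps over the lazy dog"]], width=20)
wb.save("wrapped.xlsx")
```

Sizing per row avoids hard-coding a single row number and height, at the cost of relying on the per-line height estimate.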

openpyxl: conditionally formatting with COUNTIF

I am using openpyxl to create an Excel sheet that I need to conditionally format based on whether a certain text string is found within a cell. For example, I want to check whether a cell begins with "ok:", so my formula is =COUNTIF(A1,"ok:*")>0. This works in Excel. However, the following openpyxl code produces a file that Excel reports as corrupted:
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill

redFill = PatternFill(start_color='EE1111', end_color='EE1111', fill_type='solid')
ws.conditional_formatting.add('E1:E10', FormulaRule(formula=['=COUNTIF(A1,"ok:*")>0'],fill=redFill))
How do I properly add a COUNTIF condition to an excel sheet with openpyxl?
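The formula string above starts with '=', which openpyxl writes into the file verbatim; conditional-formatting formulas must be given without the leading '=', and that is the likely cause of the corruption. A containsText rule, with its formula written without '=', works: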
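from openpyxl.formatting.rule import Rule
from openpyxl.styles import Font, PatternFill
from openpyxl.styles.differential import DifferentialStyle

red_text = Font(color="9C0006")
red_fill = PatternFill(bgColor="FFC7CE")
dxf = DifferentialStyle(font=red_text, fill=red_fill)
rule = Rule(type="containsText", operator="containsText", text="ok:", dxf=dxf)
# SEARCH finds "ok:" anywhere in the cell, not only at the start
rule.formula = ['NOT(ISERROR(SEARCH("ok:*",A1)))']
wsNonDebug.conditional_formatting.add('A1:F40', rule)  # wsNonDebug: the target worksheet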
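For comparison, a minimal sketch of the question's original COUNTIF rule with the leading '=' removed, on a small hypothetical sheet. As far as I can tell the leading '=' (stored verbatim by openpyxl) is what Excel rejects, not COUNTIF itself. Conditional-formatting formulas are relative to the top-left cell of the range, so for E1:E10 the reference A1 checks column A of the same row, and COUNTIF with "ok:*" tests whether the cell begins with "ok:":

```python
from openpyxl import Workbook
from openpyxl.formatting.rule import FormulaRule
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active
# Hypothetical sample data in column A.
for i, text in enumerate(["ok: fine", "not ok", "ok:again"], start=1):
    ws.cell(row=i, column=1, value=text)

red_fill = PatternFill(start_color="EE1111", end_color="EE1111", fill_type="solid")
# No leading "=": the string is written as-is into the rule's <formula> element.
ws.conditional_formatting.add(
    "E1:E10",
    FormulaRule(formula=['COUNTIF(A1,"ok:*")>0'], fill=red_fill),
)
wb.save("countif_cf.xlsx")
```

Unlike the SEARCH-based rule, this keeps the "begins with" semantics of the original formula.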

Preserving Rich Text Formatting in Excel via Python

I'm trying to add rows to an Excel file via Python (this needs to run and refresh daily). The Excel file is essentially a template; at the top are some cells in which some of the words have specific formatting, e.g. the cell value "That cat is fluffy" with formatting applied to individual words.
I can't find a way to get Python and Excel to work together to preserve that formatting: the format of the first letter in the cell is applied to the whole cell.
From what I can tell, this is an issue with preserving rich text, but I haven't been able to find a package that can read and write Excel files while preserving rich text.
I followed this thread to come up with the code below: writing to existing workbook using xlwt
But, it looks like that copy step from the xlutils package isn't preserving the rich text formatting.
import xlwt
import xlrd
from xlutils.copy import copy

# templateFile, Infile and Outfile are file paths defined elsewhere
rb = xlrd.open_workbook(templateFile, formatting_info=True)
r_sheet = rb.sheet_by_index(0)
wb = copy(rb)
w_sheet = wb.get_sheet(0)
xlsfile = Infile
insheet = xlrd.open_workbook(xlsfile, formatting_info=True).sheets()[0]
outrow_idx = 10
for row_idx in range(insheet.nrows):  # range, not Python 2's xrange
    for col_idx in range(insheet.ncols):
        w_sheet.write(outrow_idx, col_idx,
                      insheet.cell_value(row_idx, col_idx))
    outrow_idx += 1
wb.save(Outfile)
Please refer to the question Preserving styles using python's xlrd,xlwt, and xlutils.copy, as it may help you keep the formatting, though that approach doesn't keep the cell comments.

Set_column not working until manually activating each cell

Been trying to get the set_column to work still. Having problems getting Pandas to work, so have been doing it just in xlsxwriter. Right now am using:
worksheet.set_column('D:D',None,format4), but this only seems to work when I go into the xlsx file and actually activate each cell in column D. Is there some way of activating each cell so that I wouldn't have to do it manually?
Thanks in advance.
import xlsxwriter,os,sys,datetime
now=datetime.datetime.now()
def main():
    platform=sys.platform
    if platform.find('win')>=0:
        TheSlash='\\'
    else:
        TheSlash='/'
    output = '%s-%s.xlsx' % ('XlsxSample',now.strftime("%m%d%Y-%H%M"))
    workbook = xlsxwriter.Workbook(output, {'strings_to_numbers':True,'default_date_format':'mm/dd/yy hh:mm'})
    worksheet = workbook.add_worksheet()
    count=0
    counter=0
    format=workbook.add_format({'font_size':'8','border':True})
    formatdict={'num_format':'mm/dd/yy hh:mm'}
    format4=workbook.add_format(formatdict)
    cur =('Pole1','33.62283963','-90.54639967','4/20/16 11:43','-90.54640226','33.62116957','5207069','25-04','50','3','PRIMARY','PGC')
    for name in cur:
        worksheet.write(counter, count, name,format)
        count+=1
    counter+=1
    worksheet.set_column('D:D',None,format4)
    workbook.close()
if __name__ == "__main__":
    main()
As stated above, the date format only seems to take effect if I go into the cell in column D with the cursor.
The reason that the column date format isn't showing up in the column cells is that the program is overwriting it with a cell format here:
for name in cur:
    worksheet.write(counter, count, name,format)
    count+=1
In XlsxWriter, as in Excel, a cell format overrides a column format.
If you want to have a cell or column format that is the result of 2 combined formats you will need to create a new format that combines those formats and apply it to the cells or the column.
Update: Also, I just noticed that you are writing a string in column D. Dates in Excel are formatted numbers. This is probably why you see the cell data change when you hit return. Excel is converting the date-like string into a formatted number displayed as a date. In XlsxWriter you will need to do the conversion. See the Working with Dates and Time section of the XlsxWriter docs.
You need to convert the string to a datetime using datetime.datetime.strptime() and then write it with a date format.
Example
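import datetime

# parse the date-like string into a real datetime ('%y' is a 2-digit year)
datetime_result = datetime.datetime.strptime('04/20/16 11:43', '%m/%d/%y %H:%M')
# workbook and worksheet are the objects from the question's code
format5 = workbook.add_format({'num_format':'mm/dd/yy hh:mm'})
worksheet.write('A5', datetime_result, format5)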
Refer to Working with Dates and Time in XlsxWriter docs.
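Combining the two answers above, a minimal sketch (write_row and DATE_COL are hypothetical names, and the date is assumed to always sit in column D as in the question's data) converts the date string with strptime and writes it with one format that merges the question's font/border properties with the date number format, so no cell has to be "activated" by hand:

```python
import datetime

import xlsxwriter

DATE_COL = 3  # hypothetical: zero-based index of column D in the question's data


def write_row(path, values):
    """Write one row; the date-like string in DATE_COL becomes a real Excel date."""
    workbook = xlsxwriter.Workbook(path, {"strings_to_numbers": True})
    worksheet = workbook.add_worksheet()
    base = {"font_size": 8, "border": 1}
    cell_format = workbook.add_format(base)
    # A cell format overrides a column format, so the date cell gets one format
    # that combines the font/border properties with the date number format.
    date_format = workbook.add_format(dict(base, num_format="mm/dd/yy hh:mm"))
    for col, value in enumerate(values):
        if col == DATE_COL:
            # Excel dates are formatted numbers: convert the string first.
            when = datetime.datetime.strptime(value, "%m/%d/%y %H:%M")
            worksheet.write_datetime(0, col, when, date_format)
        else:
            worksheet.write(0, col, value, cell_format)
    worksheet.set_column(DATE_COL, DATE_COL, 14)  # widen column D so the date fits
    workbook.close()


cur = ("Pole1", "33.62283963", "-90.54639967", "4/20/16 11:43", "-90.54640226",
       "33.62116957", "5207069", "25-04", "50", "3", "PRIMARY", "PGC")
write_row("XlsxSample.xlsx", cur)
```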
In VBA, Columns("D").Select does what you want. If you are running from an external script, you might be able to save a VBA macro and run it with a technique like this: How do I call an Excel macro from Python using xlwings?

Format csv cells as text with python

I am writing a row of data, mostly float numbers, to a CSV file. But when I open the file, the cells are in a custom format by default, so an input number like 3.25 shows up as "Mar 25". How can I avoid this?
This is the piece of code:
import csv

data = [0.21, 3.25, 25.9, 5.2]
with open('Boot.csv', 'w', newline='') as f:
    out = csv.writer(f, delimiter=';', quoting=csv.QUOTE_NONE)
    out.writerow(data)
The csv module is writing the data fine - I'm guessing that you're opening it in Excel to look at the results and that Excel is deciding to autoformat it as a date.
It's an Excel issue: you need to tell Excel not to reformat that field by changing its format to Text (or anything that isn't General).
If you're writing Excel data, you may want to look at the xlwt module (check out the very useful site http://www.python-excel.org/) - then your value types will not be so liable to fluctuate.
This is not an issue, just MS Excel trying to 'help'. If you are going to programmatically process the output csv file further, you'll have no issues.
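If you have to process or view the data in Excel, you may want to quote all data (by using csv.QUOTE_ALL rather than csv.QUOTE_NONE), in which case Excel may treat everything as text and not try to be 'helpful'. This depends on how the file is opened, though: when Excel opens a CSV file directly, it can still convert quoted values.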
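To see what QUOTE_ALL actually changes, a quick sketch that writes the question's row to an in-memory buffer (csv_line is a hypothetical helper) and compares the text produced by each quoting mode. The file content is all the csv module controls; how Excel interprets it is up to Excel:

```python
import csv
import io

data = [0.21, 3.25, 25.9, 5.2]


def csv_line(row, quoting):
    """Return the exact text csv.writer produces for one row."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=";", quoting=quoting).writerow(row)
    return buf.getvalue()


unquoted = csv_line(data, csv.QUOTE_NONE)  # 0.21;3.25;25.9;5.2
quoted = csv_line(data, csv.QUOTE_ALL)     # "0.21";"3.25";"25.9";"5.2"
print(repr(unquoted))
print(repr(quoted))
```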
This isn't part of CSV. CSV is nothing more than comma-separated values. If you open the file in Notepad, it'll look as you expect.
When you open it in Excel, Excel guesses what each value represents, since this information isn't and can't be encoded in the CSV file. For whatever reason, Excel decides 3.25 represents a date, not a number.
Try using a format that can't be misinterpreted as a date:
out.writerow(['%.12f' % item for item in data])
This will include trailing zeros so it should always be parsed by Excel as a number.
This is not a problem with the code you've written; it's with Excel (which you're likely using to open the CSV)--it's interpreting 3.25 as March 25. You can fix this by selecting the affected cells, right-clicking and pressing "Format Cells", and then in the "Number" tab selecting "Number" as your category, ensuring that you have the proper number of decimal places displayed.
If all your problem is Excel importing CSV strangely, then you should directly write XLSX files instead of CSV. This gives you full control over the interpretation of cell content.
The best package I have used so far for writing Excel files in Python is openpyxl (even recommended by the author of the more widespread xlwt package).
Some example code taken from the openpyxl docs:
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("sample.xlsx")
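Applied to the question's data, a minimal openpyxl sketch: the floats are stored as numeric cells, so Excel does not have to guess their type. The explicit "0.00" number format is an optional choice that only controls how they are displayed, and the file name Boot.xlsx is illustrative:

```python
from openpyxl import Workbook

data = [0.21, 3.25, 25.9, 5.2]

wb = Workbook()
ws = wb.active
ws.append(data)  # stored as real numbers, not text Excel has to interpret
for cell in ws[1]:
    cell.number_format = "0.00"  # explicit numeric display format
wb.save("Boot.xlsx")
```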
