Python xlsxWriter formula name error


I am trying to create an Excel workbook with two worksheets: I use xlsxwriter to enter data on the first worksheet, then rank that data on the second worksheet. When I open the workbook, the ranks show an Excel #NAME? error. If I click at the end of the formula in the edit bar and press Enter, it calculates correctly, so I don't think the formula itself is wrong. I suspect some sort of order-of-operations problem. My Excel is set to calculate formulas automatically. The only similar problem I could find on the web was "xlsxwriter: add formula with other sheet in it", but I cannot tell what the solution was there (or whether it turned out to be something other than a French-to-English issue).
Here is a simplified version of my code:
import xlsxwriter
wb = xlsxwriter.Workbook(r'C:\Python33\ScoreTry.xlsx')
ws1 = wb.add_worksheet('RawScores')
ws2 = wb.add_worksheet('RankScores')
ws1.write(0, 0, 32)
ws1.write(1, 0, 39)
ws1.write(2, 0, 15)
for i in range(0, 3):
    x = 'IF(isblank(RawScores!A'+str(i+1)+'),"",RANK.AVG(RawScores!A'+str(i+1)+',RawScores!A$1:A$100,0))'
    ws2.write_formula(i, 0, x)
wb.close()
My RankScores worksheet opens with three #NAME? errors instead of ranks until I press Enter on each cell. Any ideas much appreciated!

RANK.AVG() is a function that was added as an "extension" after the original XLSX file format specification. There is a list of these functions defined in the Microsoft documentation.
So, although the formula is displayed as RANK.AVG() it is stored in the file as _xlfn.RANK.AVG() (as listed in the previous doc).
If you change your formula to use the prefixed version of the function, it should work.
This is a kludgy but currently unavoidable workaround (short of some equally kludgy workaround in the module). For what it is worth, it is documented in the write_formula() section of the docs.
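A minimal sketch of the fix applied to the question's loop, assuming the same sheet names and a score range of A1:A100. The names rank_formula() and build_workbook() are illustrative, not part of XlsxWriter; xlsxwriter is imported inside build_workbook() so the formula builder can be used on its own.

```python
def rank_formula(row: int, last_row: int = 100) -> str:
    """Formula for RankScores row `row` (0-based), ranking RawScores column A."""
    cell = f"RawScores!A{row + 1}"
    score_range = f"RawScores!A$1:A${last_row}"
    # RANK.AVG is a post-2007 "future" function, so the file must store it
    # with the _xlfn. prefix; Excel still displays it as RANK.AVG().
    return f'IF(ISBLANK({cell}),"",_xlfn.RANK.AVG({cell},{score_range},0))'


def build_workbook(path: str, scores: list) -> None:
    import xlsxwriter  # imported here so rank_formula() has no dependencies

    wb = xlsxwriter.Workbook(path)
    ws1 = wb.add_worksheet("RawScores")
    ws2 = wb.add_worksheet("RankScores")
    for i, score in enumerate(scores):
        ws1.write(i, 0, score)
        ws2.write_formula(i, 0, rank_formula(i))
    wb.close()
```

Usage would be build_workbook(r"C:\Python33\ScoreTry.xlsx", [32, 39, 15]). Newer XlsxWriter releases also document a use_future_functions option for the Workbook() constructor that adds the prefix automatically; check the docs for your installed version before relying on it.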

Related

excel has problems opening a file created with openpyxl

Please, if you are not able to provide a constructive solution, do not mark it as duplicate, because I have not found any solution and it says very little about your interest in providing some help.
Excel rejects the formula, although the strings in other cells are accepted. I'm using the English function names and have tried both commas and semicolons as separators, with the same result.
The formula builds a Markdown template and has several nested conditions.
Part of the code is:
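import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
sheet = wb.active
l = str(sheet.max_row + 1)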
formula='=CONCATENATE("**"&{}&"**"&CHAR(10)&CHAR(10)&"- **Ponente:** "&{}&CHAR(10)&"- **Fuente:** "&{};IF(EXACT({};"");"";CHAR(10)&"- **ID:** "&{});CHAR(10)&"- **Web:** "&{}&CHAR(10)&"- **Idioma:** "&{}&CHAR(10)&"- **Etiquetas:** "&{};IF(EXACT({};"");;CHAR(10)&"- **Fecha:** "&{});IF(EXACT({};"");;CHAR(10)&"- **Notas:** "&{}))'.format("A"+l,"B"+l,"C"+l,"D"+l,"D"+l,"E"+l,"H"+l,"G"+l,"F"+l,"F"+l,"I"+l,"I"+l)
print (formula)
data={
"Título":[titulo],
"Autor":[profesor],
"Fuente":[plataforma],
"ID":[id],
"Web":[url],
"Fecha":[fecha_esp],
"Etiquetas":[etiquetas],
"Idioma":[idioma],
"Notas":[notas],
"Plantilla":[formula]
}
dataframe_pandas = pd.DataFrame(data)
for x in dataframe_to_rows(dataframe_pandas, index=False, header=False):
    sheet.append(x)
wb.save(filename)
The console output shows the following formula:
=CONCATENATE("**"&A2&"**"&CHAR(10)&CHAR(10)&"- **Ponente:** "&B2&CHAR(10)&"- **Fuente:** "&C2;IF(EXACT(D2;"");"";CHAR(10)&"- **ID:** "&D2);CHAR(10)&"- **Web:** "&E2&CHAR(10)&"- **Idioma:** "&H2&CHAR(10)&"- **Etiquetas:** "&G2;IF(EXACT(F2;"");;CHAR(10)&"- **Fecha:** "&F2);IF(EXACT(I2;"");;CHAR(10)&"- **Notas:** "&I2))
This formula is rejected by Excel, but if I copy and paste it into the cell in Excel, it works without any problem.
Recover workbook: https://i.stack.imgur.com/1OL7T.png
Plantilla field rejected: https://i.stack.imgur.com/Ztcpx.png
Paste formula in field and run: https://i.stack.imgur.com/whlOA.png
So, what is the problem?
Update
This issue was published previously, but it was marked as a duplicate without anyone even trying to help. Now, I am very grateful to the three people who responded.
The problem was the formula in Python. I had already tried replacing all the semicolons, but I must have had a typo that I corrected later without trying again. With the evidence provided, I tried once more and it worked.

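Try replacing the semicolons in the formula with commas.
I typed the formula manually in LibreOffice and compared it with the auto-generated one. They differ in the separators, because the spreadsheet automatically changed ; to ,. After I replaced ; with , in the Python file, the generated Excel file was fine. The semicolon is only a locale-specific display separator; in the file itself, which is what openpyxl writes, function arguments must be separated by commas.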
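If a formula was written with semicolons (for example copied from a localized Excel), a small helper can convert the separators before the formula is stored with openpyxl. This is a hypothetical helper, not an openpyxl function. It assumes the formula contains no array constants such as {1,2;3,4} (where ; is a legitimate row separator) and no quoted sheet names containing semicolons; semicolons inside double-quoted string literals are left alone.

```python
def to_file_syntax(formula: str) -> str:
    """Swap ';' argument separators for ',' outside of string literals."""
    out = []
    in_string = False
    for ch in formula:
        if ch == '"':
            # an escaped quote ("") toggles twice, so the state stays correct
            in_string = not in_string
        if ch == ";" and not in_string:
            ch = ","
        out.append(ch)
    return "".join(out)
```

In the question's code, applying formula = to_file_syntax(formula) before building the DataFrame would turn IF(EXACT(D2;"");... into IF(EXACT(D2,""),... while keeping the literal text unchanged.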

You can try with pandas' ExcelWriter:
import pandas as pd
# the context manager saves and closes the file on exit
# (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('output.xlsx') as writer:
    # write the dataframe to Excel
    df_marks.to_excel(writer)

openpyxl save function is destructively altering the workbook, preventing subsequent reads from working correctly

I'm currently running into an issue with the openpyxl library for Python 2.7. The version is 2.6.4 (i.e., the latest release for Python 2.7)
The issue I'm having is as follows:
I have an Excel workbook with several sheets in it. One of these sheets, MySheet, has some cells that contain formulas. For example, cell B1 has the formula =Start!$Z1 (which references cell Z1 in a different sheet named Start). When viewing this file in Excel, the formula works as expected, and cell B1 shows the same value as Z1 in the Start sheet.
So far so good.
Next, I load this workbook in python, read B1 to ensure that it has the correct value, and then save it as follows:
import openpyxl
f = "C:/my_workbook.xlsx"
wb = openpyxl.load_workbook(filename=f, data_only=True)
sheet = wb["MySheet"]  # get_sheet_by_name() is deprecated
# The following line returns the expected value output from the formula, because I set data_only=True above
# If I had instead set data_only=False above, then this would have returned "=Start!$Z1" (as expected)
value = sheet["B1"].value
wb.save(filename="output.xlsx")
wb.close()
This output.xlsx file looks the same as the original my_workbook.xlsx file that I loaded, and everything displays correctly in Excel, however its file size is slightly larger.
Now, when I run exactly the same code with f = "C:/output.xlsx", the value returned by sheet["B1"].value is None.
It's as if saving the original my_workbook.xlsx somehow corrupted the file, preventing openpyxl from being able to retrieve the value from cells containing formulas. For what it's worth, I can still read non-formula cells just fine, and they return the correct values. For whatever reason though, any cell that contains a formula simply returns None after the original save operation.
Has anyone observed this behaviour before? Am I missing something here? Any help would be appreciated!

Just wanted to post a follow up to this problem. I wasn't able to solve the initial problem with openpyxl, but I instead switched to xlwings and no longer experienced this problem.
My understanding is that xlwings actually establishes a connection to Excel via win32 (or equivalent on MacOS) whereas openpyxl deals with the raw data format (OOXML).
So, for anyone who may be reading this in the future and happens to be experiencing the same (or similar) issue - I'm sorry I don't have better advice to solve the problem directly, but I'd happily recommend xlwings as my experience has been pretty good using it thus far.
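For the openpyxl route, one plausible explanation (an assumption, not confirmed in this thread) is that the saved file no longer contains cached formula results: openpyxl never evaluates formulas, and data_only=True only returns the value Excel cached the last time it saved the file, so a formula cell without a cached value reads as None. Since an .xlsx file is a ZIP archive of XML parts, a stdlib-only sketch can check this directly. It assumes the sheet of interest is stored at xl/worksheets/sheet1.xml (the real part name depends on the workbook), and the function name is illustrative. Dependent cells of shared formulas have an empty <f> element and are reported with an empty formula string.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import IO, Dict, Optional, Tuple, Union

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def formula_cache_report(
    xlsx: Union[str, IO[bytes]], sheet_part: str = "xl/worksheets/sheet1.xml"
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each formula cell to (formula text, cached value or None)."""
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read(sheet_part))
    report = {}
    for cell in root.iter(f"{NS}c"):
        formula = cell.find(f"{NS}f")
        if formula is None:
            continue  # plain value cell, not a formula
        cached = cell.find(f"{NS}v")
        # a missing or empty <v> means no cached result is stored in the file
        value = cached.text if cached is not None and cached.text else None
        report[cell.get("r")] = (formula.text or "", value)
    return report
```

Seeing None values from formula_cache_report("output.xlsx") would support this explanation. Reading through a running Excel instance, as xlwings does, sidesteps it because Excel computes the values itself.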

Open and Fetch data from a ListObject of an Excel sheet with Python

The Problem:
Open a ListObject (Excel table) of an Excel file from my Python environment.
The why:
There are multiple solutions to open an excel file in python. Starting with pandas:
import pandas as pd
mysheetName="sheet1"
df = pd.read_excel(io=file_name, sheet_name=mysheetName)
This will pass the sheet1 into a pandas data frame.
So far so good.
A more detailed solution uses a dedicated library. This snippet comes from a Stack Overflow question:
from openpyxl import load_workbook
wb2 = load_workbook('test.xlsx')
print(wb2.sheetnames)
# ['Sheet2', 'New Title', 'Sheet1']
worksheet1 = wb2['Sheet1']  # one way to load a worksheet
worksheet2 = wb2.get_sheet_by_name('Sheet2')  # another way to load a worksheet (deprecated in newer openpyxl)
print(worksheet1['D18'].value)
So far so good as well.
BUT:
If you have a ListObject (excel table) in a sheet I did not find any way to access the data of the Listobject.
ListObjects are often used by more advanced Excel users, above all when programming macros in VBA. They are very convenient and could be seen as the Excel equivalent of a pandas DataFrame. A bridge between Excel ListObjects and pandas DataFrames seems very logical. Nevertheless, so far I have not found any solution, library, or workaround for doing that.
The question.
Does anyone know of a Python library or solution to extract ListObjects directly from Excel sheets?
NOTE1: Not nice solution
Of course, knowing the "placement" of the ListObject, it is possible to refer to its first and last cells, but this is a really bad solution because you cannot modify the ListObject in the Excel file without immediately modifying the Python code as well. As soon as the ListObject moves or grows, the Python code breaks.
NOTE2: My current solution:
I export the ListObject from Excel (with a macro) into a JSON file and read it from Python. But the extra work is obvious: VBA code, an extra file, etc.
Last comment: if you are interested in this issue but are not sure what a ListObject in Excel is, see here:

James is right:
https://openpyxl.readthedocs.io/en/stable/worksheet_tables.html
https://openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.table.html
There is a class in openpyxl for tables:
class openpyxl.worksheet.table.Table(id=1,...
The id argument is the table's numeric identifier in the file rather than a position; to read existing tables, use the worksheet's tables mapping, which is keyed by table name (for example ws.tables['Table1']).
Always remember that ListObjects in Excel are called Tables. That's odd (as often with VBA), and if you work with VBA you might forget that ListObject = Table.
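A minimal sketch of reading a table (ListObject) by name with openpyxl, assuming openpyxl 3.x (where each worksheet exposes a tables mapping and each Table carries its range in ref) and a table with a single header row. The function names are illustrative. Because the range comes from the table definition rather than hard-coded cells, the code keeps working if the table moves or grows, which is the concern in NOTE1 of the question.

```python
from typing import Any, Dict, List, Sequence


def rows_to_records(rows: Sequence[Sequence[Any]]) -> List[Dict[str, Any]]:
    """Turn [header, row, row, ...] into a list of {header: value} dicts."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def read_table(path: str, sheet_name: str, table_name: str) -> List[Dict[str, Any]]:
    from openpyxl import load_workbook  # imported lazily; only needed here

    wb = load_workbook(path, data_only=True)  # cached values, not formulas
    ws = wb[sheet_name]
    table = ws.tables[table_name]
    # table.ref is the table's current range, e.g. "B3:D20"
    rows = [[cell.value for cell in row] for row in ws[table.ref]]
    if table.totalsRowCount:  # drop the totals row(s) if the table has any
        rows = rows[: -table.totalsRowCount]
    return rows_to_records(rows)
```

With hypothetical names, pd.DataFrame(read_table("test.xlsx", "Sheet1", "Table1")) would then give a DataFrame with the table's columns.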
It is also possible with xlwings, whose API is a bit different:
import xlwings as xw
wb = xw.books.active  # older xlwings versions used xw.Workbook.active()
xw.Range('TableName[ColumnName]').value
Or to get the column including header and Total row, you could do:
xw.Range('TableName[[#All], [ColumnName]]').value

Python xlwings copy paste with format

Apologies for no coding provided, this is really a generic question.
I'm using Python xlwings library, and trying to copy a sheet from one workbook to another new workbook, then hard-code the sheet in the newly created workbook. Effectively same as "Copy / Paste Values and source formatting".
I wasn't able to find any documentation on this, and thank you in advance for your help!
Edit: someone mentioned that I should include an example. Here it is, though it's kind of hard to show the formatting of an Excel file. The following code copies "sht" into a new workbook, but the copy in the new workbook still contains formulas. I'm trying to hard-code all the values while preserving the number format (e.g. thousands separators, percentage signs, etc.).
import xlwings as xw
wb = xw.Book('example1.xlsx')
sht = wb.sheets['sheet1']
new_wb = xw.Book()
new_sht = new_wb.sheets[0]
sht.api.Copy(Before = new_sht.api)

Answering my own question, as I just figured out how to do what I wanted.
The following code hard-codes the values while preserving the formatting, since it essentially pastes values only into an already formatted area.
new_sht.range('A1:C10').value = new_sht.range('A1:C10').value
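A sketch that applies the same trick to the whole copied sheet rather than a fixed A1:C10 block, assuming Windows with Excel installed (the .api.Copy call goes through COM, as in the question). After sht.api.Copy(Before=new_sht.api) the copy sits in front of new_sht, so a Sheet object fetched before the copy still points at the original blank sheet; the sketch therefore re-fetches the first sheet. The function name is hypothetical.

```python
def copy_sheet_as_values(src_path: str, sheet_name: str):
    """Copy one sheet into a new workbook and replace its formulas with values."""
    import xlwings as xw  # needs a local Excel installation

    src = xw.Book(src_path).sheets[sheet_name]
    new_wb = xw.Book()
    placeholder = new_wb.sheets[0]
    src.api.Copy(Before=placeholder.api)  # copies formulas and formatting
    copied = new_wb.sheets[0]  # the copy is now the first sheet
    used = copied.used_range
    # reading .value returns the calculated results; writing them back over the
    # same cells replaces the formulas but leaves the number formats untouched
    used.value = used.value
    return new_wb
```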

python : Get Active Sheet in xlrd? and help for reading and validating excel file in Python

2 Questions to ask:
Ques 1:
I just started studying about xlrd for reading excel file in python.
I was wondering if there is a method in xlrd similar to get_active_sheet() in openpyxl, or any other way to get the active sheet.
In openpyxl, get_active_sheet() does this:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
active_sheet = wb.get_active_sheet()  # newer openpyxl versions use wb.active
print(active_sheet)
# output: <Worksheet "Sheet1">
I had found methods in xlrd for retrieving the names of sheets, but none of them could tell me the active sheet.
Ques 2:
Is xlrd the best package in Python for reading Excel files? I also came across this, which had information about other Python packages (xlsxwriter, xlwt, xlutils) for reading and writing Excel files.
Which of these would be best for building an app that reads an Excel file and applies different validations to different columns?
For example, the column with header 'ID' should have unique values, and the column with header 'Country' should contain valid countries.

By "active sheet" you seem to mean the last sheet selected when the workbook was saved/closed. You can find this sheet via the sheet_visible value.
import xlrd
xl = xlrd.open_workbook("example.xls")
for sht in xl.sheets():
    # sht.sheet_visible value of 1 is "active sheet"
    print(sht.name, sht.sheet_selected, sht.sheet_visible)
Usually only one sheet is selected at a time, so it may look like sheet_visible and sheet_selected are the same, but multiple sheets can be selected at a time (ctrl+click multiple sheet tabs, for example).
Another reason this may seem confusing is because Excel uses "visible" in terms of hidden/visible sheets. In xlrd, this is instead sheet.visibility (see https://stackoverflow.com/a/44583134/4258124)
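A small sketch wrapping the loop above into a helper that returns the active sheet's name, falling back to the first sheet when no flag is set (that fallback is an assumption, mirroring the other answer's suggestion). It only relies on the sheets(), name and sheet_visible attributes used above; the helper name is illustrative. Keep in mind that xlrd 2.0 and later only read .xls files.

```python
def active_sheet_name(book):
    """Return the name of the sheet flagged as displayed, else the first sheet."""
    sheets = book.sheets()
    for sht in sheets:
        if sht.sheet_visible:  # set for the sheet that was shown when saved
            return sht.name
    return sheets[0].name if sheets else None
```

Usage: active_sheet_name(xlrd.open_workbook("example.xls")).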

Welcome to Stack Overflow.
I have been working with Excel files in Python for a while now, so I think I can help with your question.
openpyxl and xlrd solve different problems: the first is for xlsx files (Excel 2007+), while the second is for xls files (Excel 1997-2003).
Xenon said in his answer that Excel doesn't recognize the concept of an active sheet, which is not totally true. If you open an Excel document, go to some other sheet (that isn't the first one) and save and close the document, the next time you open it, Excel will open the document on the last sheet you were on.
However, xlrd does not support this kind of workflow, i.e. asking for the active sheet. If you know the sheet name, then you could use the method sheet_by_name, or if you know the sheet index, you could use the method sheet_by_index.
I don't know if the xlrd is the best package around, but it is pretty solid, and I have had nary a problem using it.
The example you gave could be solved by first iterating through the first row and building a dictionary that maps each header to its column. Then store all the values in the ID column in a list and compare the length of that list with the length of a set created from it, i.e. len(values) == len(set(values)). After that, you could iterate through the column with the header Country and check whether each value is in a dictionary (or set) of all the valid countries that you prepared beforehand.
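A sketch of that validation, written against a plain list of rows so it works with any reader. It is a slight variation on the set-length check: a Counter is used so that the offending rows can be reported. The function name, the message format, and the assumption that the first row holds the headers 'ID' and 'Country' are illustrative.

```python
from collections import Counter
from typing import Any, List, Sequence, Set


def validate_rows(rows: Sequence[Sequence[Any]], valid_countries: Set[str]) -> List[str]:
    """Return error messages using Excel row numbers (header = row 1)."""
    header, *body = rows
    col = {name: idx for idx, name in enumerate(header)}  # header -> column index
    id_col, country_col = col["ID"], col["Country"]
    counts = Counter(row[id_col] for row in body)
    errors = []
    for row_no, row in enumerate(body, start=2):
        if counts[row[id_col]] > 1:
            errors.append(f"row {row_no}: duplicate ID {row[id_col]!r}")
        if row[country_col] not in valid_countries:
            errors.append(f"row {row_no}: unknown country {row[country_col]!r}")
    return errors
```

With xlrd, the rows can be collected with rows = [sheet.row_values(r) for r in range(sheet.nrows)]. Note that xlrd returns numeric cells as floats, so an ID of 1 comes back as 1.0.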
I hope this answer suits your needs.
Summary: stick with xlrd because it is mature enough.

You can see all worksheets in a given workbook with the sheet_names() function. Excel has no concept of an "active sheet", but if my assumption that you are referring to the first sheet is correct, you can get the first element of sheet_names() to get the "active sheet."
With regards to your second question, it's not easy to say that a package is better than another package objectively. However, xlrd is widely used, and the most popular Python library for what it does.
I would recommend sticking with it.
