Excel has problems opening a file created with openpyxl - Python

Please, if you are not able to provide a constructive solution, do not mark it as duplicate, because I have not found any solution and it says very little about your interest in providing some help.
Excel rejects the formula, but the other strings in other cells are allowed. I'm using the names of the formulas in English and have tried commas and semicolons with the same result.
The formula consists of a markdown template and has several nested conditions.
Part of the code is:
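import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows
wb = Workbook()
sheet = wb.active
l = str(sheet.max_row + 1)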
formula='=CONCATENATE("**"&{}&"**"&CHAR(10)&CHAR(10)&"- **Ponente:** "&{}&CHAR(10)&"- **Fuente:** "&{};IF(EXACT({};"");"";CHAR(10)&"- **ID:** "&{});CHAR(10)&"- **Web:** "&{}&CHAR(10)&"- **Idioma:** "&{}&CHAR(10)&"- **Etiquetas:** "&{};IF(EXACT({};"");;CHAR(10)&"- **Fecha:** "&{});IF(EXACT({};"");;CHAR(10)&"- **Notas:** "&{}))'.format("A"+l,"B"+l,"C"+l,"D"+l,"D"+l,"E"+l,"H"+l,"G"+l,"F"+l,"F"+l,"I"+l,"I"+l)
print(formula)
data={
"Título":[titulo],
"Autor":[profesor],
"Fuente":[plataforma],
"ID":[id],
"Web":[url],
"Fecha":[fecha_esp],
"Etiquetas":[etiquetas],
"Idioma":[idioma],
"Notas":[notas],
"Plantilla":[formula]
}
dataframe_pandas = pd.DataFrame(data)
for x in dataframe_to_rows(dataframe_pandas, index=False, header=False):
    sheet.append(x)
wb.save(filename)
The console output shows the following formula:
=CONCATENATE("**"&A2&"**"&CHAR(10)&CHAR(10)&"- **Ponente:** "&B2&CHAR(10)&"- **Fuente:** "&C2;IF(EXACT(D2;"");"";CHAR(10)&"- **ID:** "&D2);CHAR(10)&"- **Web:** "&E2&CHAR(10)&"- **Idioma:** "&H2&CHAR(10)&"- **Etiquetas:** "&G2;IF(EXACT(F2;"");;CHAR(10)&"- **Fecha:** "&F2);IF(EXACT(I2;"");;CHAR(10)&"- **Notas:** "&I2))
This formula is rejected by Excel when it opens the file, but if I copy and paste it into the cell in Excel, it works without any problem.
Recover workbook: https://i.stack.imgur.com/1OL7T.png
Plantilla field rejected: https://i.stack.imgur.com/Ztcpx.png
Paste formula in field and run: https://i.stack.imgur.com/whlOA.png
So, what is the problem?
Update
This issue was published previously, but it was marked as a duplicate without anyone even trying to help. Now, I am very grateful to the three people who responded.
The problem was the formula in the Python code. I had already tried replacing all the semicolons with commas, but I must have had a typo at the time that I corrected later without testing again. With the evidence provided in the answers, I tried once more and it worked.

Try replacing the semicolons in the formula with commas and check again.
I entered the formula manually in LibreOffice and compared it with the auto-generated one. The two differ only in the separator: LibreOffice automatically changed ; to ,. I then replaced ; with , in the Python file, and the generated Excel file opens fine.
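A minimal sketch of that fix, assuming openpyxl is installed. openpyxl writes the formula text into the file as given, and formulas stored in the file need English function names with comma separators, whatever separator Excel shows in the user interface. The helper name build_formula and the shortened template are illustrative only, not the asker's full formula; the workbook is saved to an in-memory buffer rather than to disk.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook


def build_formula(row: int) -> str:
    """Return a shortened version of the template, using commas as separators."""
    return (
        '=CONCATENATE("**"&A{r}&"**"&CHAR(10),'
        'IF(EXACT(D{r},""),"",CHAR(10)&"- **ID:** "&D{r}))'
    ).format(r=row)


wb = Workbook()
sheet = wb.active
sheet.append(["Título", "Autor", "Fuente", "ID", "Plantilla"])  # header in row 1
row = sheet.max_row + 1  # the data row that the next append() will fill
sheet.append(["Some talk", "Someone", "Somewhere", "abc123", build_formula(row)])

# Save to memory and read the file back to see what was actually stored.
buffer = BytesIO()
wb.save(buffer)
reloaded = load_workbook(BytesIO(buffer.getvalue()))
stored = reloaded.active["E2"].value
print(stored)
```

The same string with semicolons would be written just as happily by openpyxl, since it does not validate formulas; the error only shows up when Excel opens the file.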

You can try with ExcelWriter:
import pandas as pd
writer = pd.ExcelWriter('output.xlsx')
# write the DataFrame (df_marks here) to Excel
df_marks.to_excel(writer)
# close() writes the file to disk; writer.save() was removed in pandas 2.0
writer.close()

Related

Unable to save formulas under excel file when it is saved using openpyxl lib

Formulas in the Excel sheet are removed when it is saved through an openpyxl Python script.
Is there any way to save the Excel file from a Python script without removing the formulas?
Expected: Formulas are not removed, and data can be read through the openpyxl library.
Actual: Data is read, but the formulas are removed.
If you read the file with the data_only=True argument, you get the values computed from the formulas, not the formulas themselves.
From the docs:
data_only controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet.
Through xlwings, this issue was resolved.
I was able to resolve this issue for my assignment.
First, load the workbook and the sheet without the data_only parameter, e.g.:
exl = openpyxl.load_workbook(exlFile)
sheet = exl["Sheet1"]
Now load the same workbook again, this time with data_only=True:
exl1 = openpyxl.load_workbook(exlFile, data_only=True)
sheet1 = exl1["Sheet1"]
When reading data from Excel, use sheet1; when writing back to Excel, use sheet.
Also, when saving the workbook, use exl.save(exlFile) instead of exl1.save(exlFile).
With this I was able to retain all the formulas and also update the required cells.
Let me know if this is sufficient or if you need more info.
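The two-view approach can be sketched as follows, assuming openpyxl is installed. The workbook is built and saved in memory here purely for illustration. Because this file was never opened by Excel, it holds no cached results, so the data_only view returns None for the formula cell; a file last saved by Excel would return the computed value instead.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a small workbook with one formula and keep the bytes in memory.
wb = Workbook()
ws = wb.active
ws["A1"] = 2
ws["A2"] = 3
ws["A3"] = "=SUM(A1:A2)"
buffer = BytesIO()
wb.save(buffer)
data = buffer.getvalue()

# View 1: formulas preserved (the default). Write and save through this one.
formulas = load_workbook(BytesIO(data))
# View 2: cached values only. Read from this one and never save it.
values = load_workbook(BytesIO(data), data_only=True)

formula_cell = formulas.active["A3"].value
cached_cell = values.active["A3"].value  # None here: no cached result in the file

# Update a cell through the formula-preserving view and save that workbook.
formulas.active["A1"] = 10
out = BytesIO()
formulas.save(out)
kept = load_workbook(BytesIO(out.getvalue())).active["A3"].value
```

Saving the data_only workbook instead would write the cached values in place of the formulas, which is how formulas get lost.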

Is it possible to read excel comments with Pandas?

I have an excel file(.xlsm), from which I need to extract data, including data stored as comments in some cells. Is it possible to read such comments with Pandas? How to do it?
No. As far as I am aware it is not currently possible. If you know you will be making comments when designing your spreadsheet however, you can just specify a column that will contain these comments. Alternatively, you can use something like
pd.read_excel('tmp.xlsx', index_col=0, comment='#')
to specify that any cell that starts with # will be regarded as a comment. From the documentation regarding the comment argument of pandas:
Comments out remainder of line. Pass a character or characters to this argument to indicate comments in the input file. Any data between the comment string and the end of the current line is ignored.
update
I would like to say that I know openpyxl can read comments. An example script would look like:
from openpyxl import load_workbook
wb = load_workbook("test.xlsx")
ws = wb["Sheet1"]  # or whatever sheet name
for row in ws.rows:
    for cell in row:
        print(cell.comment)  # None for cells without a comment
Perhaps you could get this to interface with your data somehow!
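One way this could interface with the data, as a rough sketch assuming openpyxl is installed: collect the comment text per cell coordinate into a dictionary, which can then be matched against the values read by pandas. The workbook is created in memory here so the example is self-contained; with a real file you would start from load_workbook as above.

```python
from openpyxl import Workbook
from openpyxl.comments import Comment

# Illustrative workbook with one commented cell.
wb = Workbook()
ws = wb.active
ws["A1"] = 3.25
ws["A1"].comment = Comment("measured twice", "reviewer")
ws["B1"] = 5.2

# Map cell coordinates to comment text, skipping cells without a comment.
comments = {
    cell.coordinate: cell.comment.text
    for row in ws.iter_rows()
    for cell in row
    if cell.comment is not None
}
print(comments)
```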

Python xlwings copy paste with format

Apologies for no coding provided, this is really a generic question.
I'm using Python xlwings library, and trying to copy a sheet from one workbook to another new workbook, then hard-code the sheet in the newly created workbook. Effectively same as "Copy / Paste Values and source formatting".
I wasn't able to find any documentation on this, and thank you in advance for your help!
Edit: someone mentioned that I should include an example. Here it is, but it's kind of hard to show the format in an Excel file. The following code will copy/paste "sht" into a new workbook, but "new_sht" will contain formulas. I'm trying to hard-code all the values while preserving the number format (e.g. thousands separators, percentage signs, etc.).
import xlwings as xw
wb = xw.Book('example1.xlsx')
sht = wb.sheets['sheet1']
new_wb = xw.Book()
new_sht = new_wb.sheets[0]
sht.api.Copy(Before = new_sht.api)
Answering my own question, as I just figured out how to accomplish what I wanted.
The following code will hard-code the values while preserving the formatting, since it essentially pastes values only into an already formatted area.
new_sht.range('A1:C10').value = new_sht.range('A1:C10').value

Python xlsxWriter formula name error

I am trying to create an Excel workbook with two worksheets. I used xlsxwriter to enter data on the first worksheet and then rank that data on the second worksheet. When I open the workbook, the ranks show an Excel #NAME? error. If I click at the end of the formula in the edit bar and press Enter, it calculates correctly, so I don't think the formula is incorrect. I suspect it may be some sort of order-of-operations issue. My Excel sheet is set to calculate formulas automatically. The only similar problem I could find on the web was "xlsxwriter: add formula with other sheet in it", but I cannot tell what the solution was (or whether it actually turned out to be something other than a French-to-English issue).
Here is a simplified version of my code:
import xlsxwriter
wb = xlsxwriter.Workbook(r'C:\Python33\ScoreTry.xlsx')
ws1 = wb.add_worksheet('RawScores')
ws2 = wb.add_worksheet('RankScores')
ws1.write(0, 0, 32)
ws1.write(1, 0, 39)
ws1.write(2, 0, 15)
for i in range(0, 3):
    x = 'IF(isblank(RawScores!A'+str(i+1)+'),"",RANK.AVG(RawScores!A'+str(i+1)+',RawScores!A$1:A$100,0))'
    ws2.write_formula(i, 0, x)
wb.close()
My RankScores worksheet opens with three #NAME? errors instead of ranks until I press Enter on each. Any ideas much appreciated!
RANK.AVG() is a function that was added as an "extension" after the original XLSX file format specification. There is a list of these functions defined in the Microsoft documentation.
So, although the formula is displayed as RANK.AVG() it is stored in the file as _xlfn.RANK.AVG() (as listed in the previous doc).
If you change your formula to use the prefixed version of the function it should work.
This is a kludgy but currently unavoidable workaround (without some equally kludgy workaround in the module). For what it is worth, it is documented in the write_formula() section of the docs.
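A minimal sketch of the prefixed version, assuming xlsxwriter is installed. The helper name rank_formula is illustrative, and the workbook is written to an in-memory buffer instead of the path used in the question.

```python
from io import BytesIO

import xlsxwriter


def rank_formula(row: int) -> str:
    """Build the rank formula for a 1-based row, using the _xlfn. prefix."""
    cell = "RawScores!A{}".format(row)
    return (
        '=IF(ISBLANK({0}),"",_xlfn.RANK.AVG({0},RawScores!A$1:A$100,0))'
    ).format(cell)


buffer = BytesIO()
wb = xlsxwriter.Workbook(buffer, {"in_memory": True})
ws1 = wb.add_worksheet("RawScores")
ws2 = wb.add_worksheet("RankScores")
for i, score in enumerate([32, 39, 15]):
    ws1.write(i, 0, score)
    ws2.write_formula(i, 0, rank_formula(i + 1))
wb.close()
xlsx_bytes = buffer.getvalue()
```

Excel still displays the function as RANK.AVG() in the formula bar; the prefix only appears in the stored file.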

Format csv cells as text with python

I am giving a row of data to write to a csv file. They are mostly float type numbers. But when it writes to the csv file, the cells are default in custom format. So if I have an input number like 3.25, it prints as "Mar 25". How can I avoid this?
This is the piece of code:
import csv
data = [0.21, 3.25, 25.9, 5.2]
with open('Boot.csv', 'w', newline='') as f:
    out = csv.writer(f, delimiter=';', quoting=csv.QUOTE_NONE)
    out.writerow(data)
The csv module is writing the data fine - I'm guessing that you're opening it in Excel to look at the results and that Excel is deciding to autoformat it as a date.
It's an Excel issue: you need to tell it not to play around with that field by changing its format to Text (or anything that isn't General).
If you're writing Excel data, you may want to look at the xlwt module (check out the very useful site http://www.python-excel.org/) - then your value types will not be so liable to fluctuate.
This is not an issue, just MS Excel trying to 'help'. If you are going to programmatically process the output csv file further, you'll have no issues.
If you have to process/view the data in Excel, you may want to quote all data by using csv.QUOTE_ALL rather than csv.QUOTE_NONE, in which case Excel should treat everything as text and not try to be 'helpful'.
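A small sketch of the QUOTE_ALL variant, using only the standard library and writing to an in-memory buffer so the resulting text can be inspected. Whether Excel then leaves the values alone depends on how the file is opened or imported, so treat this as something to try rather than a guarantee.

```python
import csv
from io import StringIO

data = [0.21, 3.25, 25.9, 5.2]

buffer = StringIO()
# QUOTE_ALL wraps every field in double quotes, numbers included.
out = csv.writer(buffer, delimiter=";", quoting=csv.QUOTE_ALL)
out.writerow(data)

csv_text = buffer.getvalue()
print(csv_text)  # "0.21";"3.25";"25.9";"5.2"
```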
This isn't part of CSV. CSV is nothing more than comma-separated values. If you open the file in Notepad, it'll be as you expect.
When you open it in Excel, it makes a guess as to what each value represents, since this information isn't and can't be encoded in the CSV file. For whatever reason, Excel decides 3.25 represents a date, not a number.
Try using a format that can't be misinterpreted as a date:
out.writerow(['%.12f' % item for item in data])
This will include trailing zeros so it should always be parsed by Excel as a number.
This is not a problem with the code you've written; it's with Excel (which you're likely using to open the CSV)--it's interpreting 3.25 as March 25. You can fix this by selecting the affected cells, right-clicking and pressing "Format Cells", and then in the "Number" tab selecting "Number" as your category, ensuring that you have the proper number of decimal places displayed.
If all your problem is Excel importing CSV strangely, then you should directly write XLSX files instead of CSV. This gives you full control over the interpretation of cell content.
The best package I have used so far for writing Excel files in Python is openpyxl (even recommended by the author of the more widely used xlwt package).
Some example code taken from the openpyxl docs:
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("sample.xlsx")
