How to set formula value in openpyxl - python

I'm trying to write a simple tool to work on .csv and .xlsx files. The idea is to read in data as .csv, add some modifications, and export to .xlsx with the modifications added. I've got everything working, but when I write out to the .xlsx file (using openpyxl in Python) I'm having some trouble. When I open the document in either LibreOffice Calc or Gnumeric, some fields aren't showing. The fields I'm having trouble with contain formulas. When I click on them, the formulas are correctly typed in the field, but no value is displayed. If I retype a character in the formula, it's evaluated and the value shows up.
I had a poke around in the underlying .xml of the worksheet, and the cells obviously don't have a cached value. But I also noticed that they didn't always have the right data type. I tried to fix this by explicitly setting the type to formula, but no luck.
What I use now to set the cell is:
cell = ws.cell(column=i,row=j)
cell.set_explicit_value(value=val,data_type=cell.TYPE_FORMULA)
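For comparison, here is a minimal sketch of the usual way to write a formula with openpyxl: assign a string that starts with "=" and the library marks the cell as a formula on its own. The cell addresses and values are made up for illustration. Note that openpyxl stores only the formula text and never evaluates it, so the saved file has no cached result, and the application that opens it has to recalculate.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Hypothetical input values for the formula below.
ws["A1"] = 2
ws["A2"] = 3

# A string starting with "=" is stored as a formula; no explicit type is needed.
ws["A3"] = "=SUM(A1:A2)"

# openpyxl keeps the formula text only; it does not compute the result.
print(ws["A3"].data_type)  # "f" marks a formula cell
print(ws["A3"].value)      # the formula string itself
```

Whether the value appears when the file is opened then depends on the spreadsheet application recalculating on load, not on anything openpyxl writes.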

Related

Stop openpyxl from inserting curly brackets into excel sheet?

I'm currently working on a project that reads from an Excel sheet with the openpyxl library, edits some of the data in the file, and then recreates a new one with the edited data.
My main issue is that when I save the new file, a lot of the Excel functions I previously had are saved with curly brackets around them, like this:
{=INDEX(M98:O98, MATCH(FALSE, ISBLANK(M98:O98),0))}
This is completely breaking the functionality of the Excel sheet.
At first I wondered if this was an issue with how I was updating each individual cell. However, I commented out my entire function apart from opening the file and saving the new one, with no edits done, and I'm still getting the issue. The code now looks like this:
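from openpyxl import load_workbook

def update(filename):
    # Open the Excel sheet, keeping formulas rather than cached values
    e_workload = load_workbook(filename, data_only=False)
    # Save the results under a new name
    e_workload.save(filename.replace(".xlsx", "_EDITED.xlsx"))
    e_workload.close()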
I'm just wondering what is causing this issue and how to fix it. I'm wondering if it's an issue with the library, but I don't want to rewrite my entire program without determining what the issue is first.

How do I make my program independent of an excel sheet it currently relies on?

I'm designing a program in Python tkinter that displays information that's currently in an Excel spreadsheet. Ideally, I'd like to be able to share this program without needing to share the Excel book as well. Is it possible to import the Excel book into the program to make the program independent of that Excel file? Let me know if I can provide any more clarification. Thank you!
You would need to assign the Excel file's content to a variable in your program to do so. I hope it isn't very large, but to make it easier I recommend:
saving your Excel file in .csv format
reading it from Python
converting it to the data type you want (e.g. string, list, ...)
saving it back to a .txt file.
Now just copy the content of your .txt and assign that to a variable somewhere in your code.
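The steps above can be sketched roughly as follows. This version skips the intermediate .txt file and pastes the exported CSV text straight into a module-level string; the column names and rows are invented placeholders, not data from the question.

```python
import csv
import io

# Hypothetical data, pasted in from the exported .csv file.
EMBEDDED_CSV = """\
name,department,extension
Alice,Sales,0123
Bob,Support,0456
"""

def load_rows(text=EMBEDDED_CSV):
    """Parse the embedded CSV text into a list of dicts keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows()
print(rows[0]["name"])
```

Because the data lives in the source file, the program no longer needs the workbook at run time, at the cost of having to re-paste the text whenever the spreadsheet changes.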
Edit: what Alphy13 said is also a solution
Typically it's best to create an example workbook that you can share. Give it all of the right parameters and leave out the sensitive data, or fill it in with fake data.
You could also give every variable that comes from the Excel file a default value that only changes when the workbook is present. This can be the first step toward creating proper error handling for your program.
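A minimal sketch of the default-value idea, under some assumptions: the data is read from a two-column CSV export rather than the workbook itself (to keep the example standard-library only), and the setting names and the `load_settings` helper are hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical defaults used when the data file is not shipped with the program.
DEFAULTS = {"title": "Demo", "rows_per_page": "20"}

def load_settings(path):
    """Return the defaults, overridden by key,value rows if the file exists."""
    settings = dict(DEFAULTS)
    source = Path(path)
    if source.exists():
        with source.open(newline="") as handle:
            for row in csv.reader(handle):
                if len(row) == 2:  # ignore blank or malformed lines
                    key, value = row
                    settings[key] = value
    return settings

print(load_settings("settings_export.csv"))
```

If the file is missing, the program still starts with the defaults instead of raising an error.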

Load only a single sheet from a large workbook with Python

Using Python 3.7. I have several .xlsx workbooks with 34 sheets each, most of which have conditional formatting and charts, but all I'm actually after is a cell with specified text that's somewhere on the first sheet of each book. The workbook is not protected but the sheet is, and I don't know the password, so I can't use pandas.read_excel; using openpyxl/load_workbook, it takes ages to load and I get lots of errors about it not being able to handle conditional formatting etc. I then have to search the sheet for the text.
Is there an easy, quick way of loading just the first sheet (or a named sheet)? The pandas code is very quick and easy, but I can't use it :(
I'm not completely sure it will help here, but I recommend trying openpyxl's "read-only" mode:
https://openpyxl.readthedocs.io/en/stable/optimized.html
It does not load the full file up front but reads it in so-called "lazy" mode, so you can jump to the cell you need.
It also lets you start reading from a specific sheet.
Note that closing the workbook explicitly is mandatory in this mode.
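A rough sketch of what that could look like. The `find_text` helper and the small demo workbook are assumptions made up for illustration, and this has not been tried against a protected sheet like the one in the question.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

def find_text(path, needle, sheet_index=0):
    """Return the coordinate of the first cell containing `needle`, or None."""
    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[sheet_index]  # or wb["SheetName"] for a named sheet
        for row in ws.iter_rows():
            for cell in row:
                if isinstance(cell.value, str) and needle in cell.value:
                    return cell.coordinate
        return None
    finally:
        # Read-only workbooks keep the file open until closed explicitly.
        wb.close()

# Build a tiny demo file so the sketch can be run on its own.
demo = Workbook()
demo.active["B2"] = "target text"
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
demo.save(demo_path)

print(find_text(demo_path, "target"))
```

Returning as soon as the text is found means the rest of the sheet is never read.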

Create a csv file that Excel will not mutate the data of when opening

I am programmatically creating csv files using Python. Many end users open and interact with those files using Excel. The problem is that Excel by default mutates many of the string values within the file. For example, Excel converts 0123 to 123.
The values being written to the csv are correct and display correctly if I open them with some other program, such as Notepad. If I open a file with Excel, save it, then open it with Notepad, the file now contains incorrect values.
I know that there are ways for an end user to change their Excel settings to disable this behavior, but asking every single user to do so is not possible for my situation.
Is there a way to generate a csv file using Python that a default copy of Excel will NOT mutate the values of?
Edit: Although these files are often opened in Excel, they are not only opened in Excel and must be output as .csv, not .xlsx.
The short answer is no, it is not possible to generate a single CSV that will display (arbitrary) data the same way in Excel and in non-Excel programs.
There are convoluted ways to force strings to appear how you want when you open a CSV in Excel, but then non-Excel programs will almost certainly not display them the way you want.
Though you say you must stick to CSV due to non-Excel programs, you don't say which programs those are. If it is possible that they can open .xlsx files after all, then .xlsx would be the best choice.
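To illustrate the trade-off, here is a sketch of one commonly used workaround of that convoluted kind: writing the value as a text formula, `="0123"`, so that Excel shows it as text. The `excel_text` helper is a made-up name. As noted above, other programs will show the literal `="0123"`, so this only suits files meant for Excel.

```python
import csv
import io

def excel_text(value):
    """Wrap a value as ="..." so Excel reads it as a text formula."""
    escaped = str(value).replace('"', '""')
    return f'="{escaped}"'

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name"])
writer.writerow([excel_text("0123"), "widget"])

# The csv module quotes the field and doubles the inner quotes for us.
print(buffer.getvalue())

# Any ordinary CSV reader gets the wrapper back verbatim, not "0123".
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed[1][0])
```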
The solution is to declare the data type while writing the file. It seems like Excel is trying to be smart and converts the whole column to a numeric type. The output should be written directly into .xlsx format like so:
import pandas as pd

data = {'x': ['011', '012', '013'], 'y': ['022', '033', '041']}
df = pd.DataFrame(data=data)
# The context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('path/to/save.xlsx') as writer:
    df.to_excel(writer, sheet_name='Sheet1')
Source: https://stackoverflow.com/a/31136119/8819895
Have you tried expressly formatting the relevant column(s) to 'str' before exporting?
df['column_ex'] = df['column_ex'].astype('str')
df.to_csv('df_ex.csv')
Another workaround may be to open the Excel program (not the file), go to the Data menu, then Import from Text. Excel's import utility will give you options to define each column's data type. I believe LibreOffice keeps the leading 0s by default, but Excel doesn't.

Is there a way I can display data in Excel without saving a file first?

I'm using openpyxl to create a workbook in memory and fill it with data. Is there any way to display that data in Excel at the end of the Python script without saving the file? It would be left up to the user to decide whether they want to save the file or not. I'm guessing it's not possible, but I wanted to see if I could get a more definitive answer here. Thanks!
