Read and write a single cell in excel using Python - python

I am looking to replace the SQL database for my app (around 50,000 x 50, rows x columns) with Excel. I need to update a single cell in Excel without loading the whole workbook and saving it again (I am using openpyxl), because that is computationally very expensive. I need an alternative that will save execution time.
I have tried Excel APIs like xlwings, but I need an alternative to APIs.

I cannot comment yet, so I will "answer". Why would you replace a database with Excel? Sounds crazy to me. There are plenty of other persistent storage formats out there to use: pickle, HDF5, the pyarrow formats, CSV, etc. I used the Feather format for a while; it is super fast and pandas can use it natively.
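To make the comparison concrete: a single-value update is exactly what a database does cheaply. Below is a minimal sketch using Python's built-in sqlite3 module. SQLite is my own choice for illustration, and the table name `records`, its columns, and the helper names `update_cell` / `read_cell` are all hypothetical, not taken from the question.

```python
import sqlite3


def _check_column(conn, column):
    # Column names cannot be bound as SQL parameters, so validate the name
    # against the table schema before interpolating it into the statement.
    allowed = {row[1] for row in conn.execute("PRAGMA table_info(records)")}
    if column not in allowed:
        raise ValueError(f"unknown column: {column}")


def update_cell(conn, row_id, column, value):
    """Update one column of one row (the 'single cell')."""
    _check_column(conn, column)
    conn.execute(f"UPDATE records SET {column} = ? WHERE id = ?", (value, row_id))
    conn.commit()


def read_cell(conn, row_id, column):
    """Read one column of one row; returns None if the row does not exist."""
    _check_column(conn, column)
    row = conn.execute(
        f"SELECT {column} FROM records WHERE id = ?", (row_id,)
    ).fetchone()
    return None if row is None else row[0]


# Hypothetical table standing in for the app's data (in memory for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
conn.executemany(
    "INSERT INTO records (id, name, score) VALUES (?, ?, ?)",
    [(1, "a", 1.0), (2, "b", 2.0)],
)

update_cell(conn, 2, "score", 9.5)
print(read_cell(conn, 2, "score"))
```

The UPDATE addresses only the one row, so there is no load-everything, save-everything cycle as with a workbook.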

Related

How do I use python pandas to read an already opened excel sheet

Assuming I have an excel sheet already open, make some changes in the file and use pd.read_excel to create a dataframe based on that sheet, I understand that the dataframe will only reflect the data in the last saved version of the excel file. I would have to save the sheet first in order for pandas dataframe to take into account the change.
Is there any way for pandas or other Python packages to read an opened Excel file and refresh its data in real time (without saving or closing the file)?
Have you tried the mitosheet package? It doesn't answer your question directly, but it lets you work on pandas dataframes as you would on Excel sheets. That way, you can edit the data on the fly as in Excel and still get a pandas dataframe as a result (while it generates the code to perform the same operations in Python). Does this help?
There is no way to do this. The table is not saved to disk, so pandas can not read it from disk.
Be careful not to over-engineer. That being said:
Depending on your use case, if this is really needed, I could theoretically imagine a Robotic Process Automation tool (e.g. Blue Prism, UiPath or Power Automate) continuously loading live data from Excel into a Python environment as a pandas DataFrame and then changing it.
This would have to be a really important process though, otherwise licensing RPA is not worth it here.
df = pd.read_excel("path")
You can see the data in the Variable Explorer if you run the program in the Spyder IDE.

Any way to save format when importing an excel file in Python?

I'm doing some work on the data in an excel sheet using python pandas. When I write and save the data it seems that pandas only saves and cares about the raw data on the import. Meaning a lot of stuff I really want to keep such as cell colouring, font size, borders, etc get lost. Does anyone know of a way to make pandas save such things?
From what I've read so far, it doesn't appear to be possible. The best solution I've found so far is to use XlsxWriter to format the file in my code before exporting. This seems like a very tedious task that will involve a lot of testing to figure out how to achieve the various formats and aesthetic changes I need. I haven't found anything on this, but is XlsxWriter in any way able to keep the sheet's formatting on import?
Alternatively, what would you suggest I do to solve the problem that I have described?
Separate data from formatting. Have a sheet that contains only the data – that's the one you will be reading/writing to – and another that has formatting and reads the data from the first sheet.

XLRD vs Win32 COM performance comparison

I have this huge Excel (xls) file that I have to read data from. I tried using the xlrd library, but it is pretty slow. I then found out that manually converting the Excel file to CSV and reading the CSV file is orders of magnitude faster.
But I cannot ask my client to save the xls as csv manually every time before importing the file. So I thought of converting the file on the fly, before reading it.
Has anyone done any benchmarking as to which procedure is faster:
Open the Excel file with the xlrd library and save it as a CSV file, or
Open the Excel file with the win32com library and save it as a CSV file?
I am asking because the slowest part is opening the file, so if I can get a performance boost from using win32com I would gladly try it.
If you need to read the file frequently, I think it is better to save it as CSV. Otherwise, just read it on the fly.
For performance, I think win32com comes out ahead. However, for cross-platform compatibility, I think xlrd is better.
win32com is more powerful. With it, one can handle Excel in all ways (e.g. reading/writing cells or ranges).
However, if you are seeking a quick file conversion, I think pandas.read_excel also works.
I am using another package, xlwings, so I am also interested in a comparison among these packages.
In my opinion:
I would use pandas.read_excel for quick file conversion.
If more processing in Excel is needed, I would choose win32com.
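For the quick-conversion route mentioned above, a minimal sketch with pandas could look like this. The function name `excel_to_csv` and its parameters are made up for illustration. Note that pandas relies on an engine package to parse the file (xlrd for .xls, openpyxl for .xlsx), so for an .xls file this is not expected to open the file any faster than xlrd itself; it mainly saves you writing the conversion code.

```python
import pandas as pd


def excel_to_csv(excel_path, csv_path, sheet_name=0):
    """Read one sheet of an Excel file and write it out as CSV.

    Returns the number of data rows written (header excluded).
    """
    # sheet_name=0 means "the first sheet"; a sheet title also works.
    df = pd.read_excel(excel_path, sheet_name=sheet_name)
    # index=False keeps the pandas row index out of the CSV.
    df.to_csv(csv_path, index=False)
    return len(df)
```

If the same file is read many times, converting once and reading the CSV afterwards follows the first suggestion above.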

xlsx file extension not valid after saving with openpyxl and keep_vba=True. Which is the best way?

In our environment, we have an Excel file that includes raw data in one sheet and a pivot table and charts in another sheet.
I need to append rows to the raw data every day automatically using a Python job.
I am not sure, but there may be some VBA script running on the front end that refreshes the pivot tables.
I used openpyxl and, by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside so that pivoting keeps working. But after saving the workbook, the xlsx file no longer opens in MS Office, which says the format or the extension is not valid. I can see the data using Python, but with Office it is not working anymore. If I don't use keep_vba=True, then pivoting does not work and only the previous values are present (of course, as I understand it, the VBA script is needed for pivoting).
Could you explain to me what is happening? I am new to Python and don't know its concepts well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understand it, xlsx may need special modules to save the VBA script the same way MS Office would save it. If so, then what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very little time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thank you!
You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.
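A minimal sketch of the two consistent combinations described above: keep_vba=True only for a macro-enabled .xlsm file, and the default keep_vba=False for a plain .xlsx file, saving back under the same extension in both cases. The helper name `append_rows` is hypothetical, and choosing the flag from the file extension is my own assumption about how you would want to wire it up.

```python
from pathlib import Path

from openpyxl import load_workbook


def append_rows(path, sheet_name, rows):
    """Append rows to one sheet and save the workbook under the same name.

    Returns the number of rows in the sheet after appending.
    """
    path = Path(path)
    # Ask openpyxl to preserve VBA only when the file is macro-enabled,
    # so the saved content always matches the file extension.
    keep_vba = path.suffix.lower() == ".xlsm"
    wb = load_workbook(path, keep_vba=keep_vba)
    ws = wb[sheet_name]
    for row in rows:
        ws.append(row)  # writes the values into the next empty row
    wb.save(path)
    return ws.max_row


# Example call (file and sheet names are placeholders):
# append_rows("report.xlsx", "raw", [["some day", 42]])
```

Try it on a copy first: the openpyxl documentation warns that it does not read every item in an existing file, so things like images and charts can be lost when the file is saved again.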

Write data to excel template

I need to create some Excel tables, but these tables don't have a simple look.
There are some pictures, some special fonts, etc.
But the complicated parts are static, meaning they are always the same.
So my idea was to create an Excel template with these tricky parts and then just insert the dynamic data into this template from Python.
I am working with the pandas framework, but I haven't found a way to do that with or without it.
Any idea?
There isn't an easy way to do this with any of the usual "direct file manipulation" libraries in Python (xlrd, xlwt, XlsxWriter, OpenPyXL; these are what pandas uses). The reason is that the structure of a workbook file is such that it's impossible or prohibitively difficult (depending on whether you're talking about .xls or .xlsx) to do anything resembling "in-place" editing, short of re-implementing Excel itself.
So for what you're trying to do, your best option is to let Excel do the work. (I'm assuming you can run Excel, since you mention that you'd like to create Excel templates.) There are ways to automate Excel, the most straightforward probably being Microsoft's VBA or VBScript. But if you want to do it in Python, you can, using PyWin32 or pywinauto.
