Excel from Pandas creating a lot of empty rows when opening file - python

I do not have code specifics for this problem, unfortunately, but writing to xlsx with Python 2.7 and the openpyxl engine causes the Excel worksheet to initially open with a lot of blank rows, which then fix themselves after scrolling down and back up (screenshots omitted).
There isn't a problem with the written or read files themselves, because the resulting dataframe always has the expected number of rows when read back, so I believe this is happening when writing to Excel. I can't find a similar question online, so I'm wondering whether I need to go over my code again or whether someone else has experienced this problem and knows the cause. Thanks!
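For what it's worth, a minimal round-trip check along these lines (illustrative names, and assuming a recent pandas with the openpyxl engine) shows that the data itself survives intact and the blank rows are only a display quirk:

import pandas as pd

# Write a small frame with the openpyxl engine, then read it back.
df = pd.DataFrame({"a": range(1000), "b": range(1000)})
df.to_excel("check.xlsx", engine="openpyxl", index=False)

roundtrip = pd.read_excel("check.xlsx")
assert len(roundtrip) == len(df)  # row count matches, so the file contents are fine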

Related

Writing dataframe to Excel takes extremely long

I got an Excel file from work which I amended using pandas. It has 735719 rows × 31 columns. I made the necessary changes and allocated them to a new dataframe. Now I need this dataframe in Excel format. I have checked in Jupyter notebooks that ont_dub works and displays as a dataframe. So I use the following code, which I always use: ont_dub.to_excel("ont_dub 2019.xlsx")
However, this would normally take only a few seconds, but it has now been 40 minutes and it is still calculating. Side note: I am working in a OneDrive folder from work, but that hasn't caused issues before. Hopefully someone can see the problem.
Usually, if you want to save such a large amount of data to a local folder, you don't use Excel. Excel has a hard limit on sheet size (1,048,576 rows by 16,384 columns) and it wasn't built to display and query such massive amounts of data (you can use pandas for that). You can either use Feather files (a known fast-save alternative) or CSV files, which are built for this sole purpose.
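A minimal sketch of both alternatives, assuming the dataframe is called ont_dub as in the question (a small placeholder is used here):

import pandas as pd

# Placeholder standing in for the question's 735719 x 31 dataframe.
ont_dub = pd.DataFrame({"col": range(1000)})

# Feather (requires pyarrow) is a fast binary format for dataframe round trips.
ont_dub.reset_index(drop=True).to_feather("ont_dub_2019.feather")

# CSV is plain text and not tied to Excel's per-sheet limits.
ont_dub.to_csv("ont_dub_2019.csv", index=False)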

Stop openpyxl from inserting curly brackets into excel sheet?

I'm currently working on a project that reads from an Excel sheet with the openpyxl library, edits some of the data in the file, and then recreates a new one with the edited data.
My main issue is that when I save the new file, a lot of the Excel formulas I previously had are saved with curly brackets around them, like this:
{=INDEX(M98:O98, MATCH(FALSE, ISBLANK(M98:O98),0))}
This completely breaks the functionality of the Excel sheet.
At first I wondered if this was an issue with how I was updating each individual cell. However, I commented out my entire function except for opening the file and saving the new one, with no edits made, and I'm still getting the issue. The code now looks like this:
from openpyxl import load_workbook

def update(filename):
    # Open the Excel workbook, keeping formulas rather than cached values
    e_workload = load_workbook(filename, data_only=False)
    # Save the result under a new name
    e_workload.save(filename.replace(".xlsx", "_EDITED.xlsx"))
    e_workload.close()
I'm just wondering what is causing this issue and how to fix it. It may be a problem with the library, but I don't want to rewrite my entire program without determining what the issue is first.

Any way to save format when importing an excel file in Python?

I'm doing some work on the data in an Excel sheet using Python pandas. When I write and save the data, it seems that pandas only saves and cares about the raw data from the import, meaning a lot of things I really want to keep, such as cell colouring, font size, borders, etc., get lost. Does anyone know of a way to make pandas save such things?
From what I've read so far, it doesn't appear to be possible. The best solution I've found so far is to use xlsxwriter to format the file in my code before exporting. This seems like a very tedious task that will involve a lot of testing to figure out how to achieve the various formats and aesthetic changes I need. I haven't found anything, but would that writer be able to preserve the sheet's formatting on import in any way?
Alternatively, what would you suggest I do to solve the problem that I have described?
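For reference, a small sketch of the xlsxwriter approach described above, with hypothetical file and column names; formats are rebuilt in code and applied after pandas writes the data:

import pandas as pd

# Toy dataframe; in practice this would be the imported sheet's data.
df = pd.DataFrame({"item": ["a", "b"], "value": [1, 2]})

with pd.ExcelWriter("styled.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer, sheet_name="Data", index=False)
    workbook = writer.book
    worksheet = writer.sheets["Data"]
    # Recreate the header styling (bold, fill, border) by hand.
    header_fmt = workbook.add_format({"bold": True, "bg_color": "#DDEBF7", "border": 1})
    for col, name in enumerate(df.columns):
        worksheet.write(0, col, name, header_fmt)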
Separate data from formatting. Have a sheet that contains only the data – that's the one you will be reading/writing to – and another that has formatting and reads the data from the first sheet.
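A minimal sketch of that split, assuming a pre-existing workbook report.xlsx (hypothetical name) with a bare "Data" sheet plus a formatted sheet whose cells reference it; only the data sheet is replaced, so the formatting elsewhere survives (if_sheet_exists needs pandas 1.3 or newer):

import pandas as pd

# New data to drop into the bare "Data" sheet of the existing workbook.
df = pd.DataFrame({"item": ["a", "b"], "value": [1, 2]})

with pd.ExcelWriter("report.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    df.to_excel(writer, sheet_name="Data", index=False)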

Python XLWT: Excel generated by Python xlwt contains missing values

I'm quite new to Python and am trying to fetch data from HTML and save it to Excel files using xlwt.
So far the program seems to work well (all the output is correctly printed to the Python console when running it), except that when I open the Excel file, an error message says, 'We found a problem with some content in FILENAME. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' After I click Yes, I find that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem starts after that (around 15000 lines in total). The missing data fields are concentrated in several columns with relatively high data volume.
I'm wondering if it's related to some kind of cache-allocation mechanism in xlwt?
Thanks a lot for your help here.
This seems like a caching issue. Try calling sheet.flush_row_data() every 100 rows or so.
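A minimal sketch of that suggestion, with placeholder data and file names; flush_row_data() releases rows that have already been written, and flushed rows can no longer be rewritten afterwards:

import xlwt

# Placeholder rows standing in for the scraped HTML data.
rows = [("field_%d" % i, i) for i in range(15000)]

wb = xlwt.Workbook()
sheet = wb.add_sheet("data")

for i, (label, value) in enumerate(rows):
    sheet.write(i, 0, label)
    sheet.write(i, 1, value)
    if i and i % 100 == 0:
        sheet.flush_row_data()  # periodically release already-written rows

wb.save("output.xls")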

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In our environment, we have an Excel file which includes raw data in one sheet and a pivot table and charts in another sheet.
I need to append rows to the raw data every day, automatically, using a Python job.
I am not sure, but there may be some VB script running on the front end which refreshes the pivot tables.
I used openpyxl and, following its online documentation, I was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside and enable pivoting. But after saving the workbook, the xlsx can no longer be opened with MS Office; it says the format or the extension is not valid. I can still see the data using Python, but with Office it no longer works. If I don't use keep_vba=True, then pivoting doesn't work and only the previous values are present (of course, as I understood it, the VBA script is needed for pivoting).
Could you explain to me what's happening? I am new to Python and don't know its concepts very well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understand it, xlsx may need special modules to save the VB script the same way MS Office would save it. If so, then what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very little time to complete this task, I am looking for a quick answer here instead of going through all the concepts.
Thank you!
You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, I misread your question the first time around. The easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. It's hard to say more than that until you post the relevant section of your code.
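A minimal sketch of the .xlsm route from the first part of this answer, with hypothetical file and sheet names: load with keep_vba=True and save under an .xlsm extension so Excel accepts the embedded VBA project:

from openpyxl import load_workbook

# Hypothetical macro-enabled workbook with a "rawdata" sheet.
wb = load_workbook("report.xlsm", keep_vba=True)
ws = wb["rawdata"]

# Append the new day's row to the raw data.
ws.append(["2019-01-01", "some_value", 42])

# Keep the .xlsm extension so the saved file stays valid in Excel.
wb.save("report_updated.xlsm")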
