Reading excel file with pylightxl

My organization needs me to use the pylightxl library to read some bulky Excel xlsx files. I have never used this library before and I'm getting a strange error in PyCharm that I simply do not understand.
I've tried googling, but there isn't much support for pylightxl on the web. Does anyone know how to help?

For completeness of this post: it looks like early versions of pylightxl had a bug affecting files that were converted from xls to xlsx. This issue has been resolved with #31 in version 1.52+.

For those who might meet the same issue in the future, here's how I solved it, or rather worked around it.
The Excel files I had been given were initially in xls format, which pylightxl does not support.
I converted them to xlsx by clicking "Save As" in Excel and then tried to read them with pylightxl, which produced the strange error above. It must have something to do with the format.
So I ended up saving them as csv instead, and reading those was successful.
So if anyone meets this error, try a different format for the document you're trying to read.
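
The thread does not show how the CSV export was read. As a rough sketch of that workaround using only the standard library: the function name `read_csv_rows` is made up for illustration, and the `utf-8-sig` default assumes the file was saved with Excel's "CSV UTF-8" option.

```python
import csv

def read_csv_rows(path, encoding="utf-8-sig"):
    """Return every row of a CSV file as a list of strings."""
    # newline="" lets the csv module deal with line breaks inside quoted cells.
    # utf-8-sig (an assumption about the export) drops a leading byte-order mark.
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))
```

Every value comes back as a string, so numeric columns still need converting before use.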

Related

Error when using Writer.Close() function within my Pandas and Openpyxl code

I have written code which combines some CSV files into a single Excel file, and finished with the 'writer' using the code:
writer.save()
writer.close()
However, I get the following error when trying to open that file after the code has finished:
'We found a problem with some content in 'the file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.'
This seems to be purely related to the 'writer.close()' call, as without it I don't get the error. However, without it I cannot open the file at all, as it states that someone else is using it (i.e. openpyxl).
I'm not sure if it's relevant, but my file system runs on OneDrive, a cloud-based system.
My current plan after 'writer.close()' is to pause the script so that I can print the Excel file to PDF by hand (I found this to be unreliable via Python), and then 'hit continue' to carry on with sending the PDF by email.
Any ideas on how to resolve this error?

Without seeing more of your code, and maybe an example of the data you are writing, it's tough to make any assumptions. Based on the error you are experiencing, it is likely that the inputs/data going into the xlsx file are causing the issue, not the 'writer' itself. This is Excel saying that data in your file is 'corrupted' by its standards and needs to be fixed.
You should be able to do a 'recovery' of the file through Excel, and it will identify the problem spots in your file. You can then trace those back into your Python program and address them properly to eliminate the problem.
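
Independently of the data, it may also be worth ruling out the double save-and-close itself. Below is a minimal sketch, assuming pandas with openpyxl installed. The function name `combine_csvs` and the one-sheet-per-CSV layout are made up for illustration, since the original code was not posted.

```python
from pathlib import Path

import pandas as pd

def combine_csvs(csv_paths, out_path):
    """Write each CSV file to its own sheet of a single workbook."""
    # Leaving the with-block saves and closes the workbook exactly once,
    # so no explicit writer.save() / writer.close() calls are needed.
    with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
        for csv_path in csv_paths:
            frame = pd.read_csv(csv_path)
            # Excel limits sheet names to 31 characters.
            sheet_name = Path(csv_path).stem[:31]
            frame.to_excel(writer, sheet_name=sheet_name, index=False)
```

If the recovery prompt still appears with this pattern, the data itself remains the more likely cause, as described above.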

Xlwings: avoid to open the file

Is there any way to avoid opening the file while working with xlwings?
I read that an update was in progress a year ago, but I do not know whether the issue has been solved.

As of version 0.10.4, xlwings is purely manipulating Excel files via a running Excel instance. That means, yes, you need to have your file open.
You can set the Excel instance to visible=False, see here, but I doubt that this is what you want. To manipulate the files directly without Excel, you have to use xlrd/xlwt or xlsxwriter or openpyxl.

According to Felix Zumstein, the solution lies in that documentation. I also had this problem and solved it with the following lines of code:
import xlwings
# Pass visible=False when creating the app so the Excel window is never shown.
app = xlwings.App(visible=False)
Before this I had other, even broader problems, and this solution solved them as well.

Python XLWT: Excel generated by Python xlwt contains missing value

I'm quite new to Python and am trying to fetch data from HTML and save it to Excel files using xlwt.
So far the program seems to work well (all the output is printed correctly on the Python console when running the program), except that when I open the Excel file I get an error message saying 'We found a problem with some content in FILENAME. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' After I click Yes, I find that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem begins after that (there are around 15000 lines in total). The missing data fields are concentrated in several columns with relatively high data volume.
I'm wondering whether it's related to some sort of cache allocation mechanism in xlwt?
Thanks a lot for your help here.

Seems like a caching issue.
Try calling sheet.flush_row_data() every 100 rows or so?
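
A sketch of what periodic flushing could look like, assuming xlwt is installed. The function name `write_rows`, the sheet name and the `flush_every` parameter are made up for illustration. Whether this actually cures the missing values is not confirmed in the thread.

```python
import xlwt

def write_rows(rows, path, flush_every=100):
    """Write rows to an .xls file, flushing row data periodically.

    Returns the number of flushes performed.
    """
    book = xlwt.Workbook()
    sheet = book.add_sheet("data")  # hypothetical sheet name
    flushes = 0
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            sheet.write(row_index, col_index, value)
        # Flush after each complete batch of rows; as far as I know,
        # rows that have been flushed can no longer be modified.
        if (row_index + 1) % flush_every == 0:
            sheet.flush_row_data()
            flushes += 1
    book.save(path)
    return flushes
```

Because flushed rows are treated as final here, each row is written completely before the flush check runs.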

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In our environment, we have an Excel file which includes raw data in one sheet and a pivot table and charts in another sheet.
I need to append rows to the raw data every day, automatically, using a Python job.
I am not sure, but there may be some VB script running on the front end which refreshes the pivot tables.
I used openpyxl and, by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside so that pivoting still works. But after saving the workbook, MS Office no longer opens the xlsx and says the format or the extension is not valid. I can see the data using Python, but with Office it's not working anymore. If I don't use keep_vba=True, then pivoting does not work and only the previous values are present (of course, as I understand it, a VBA script is needed for pivoting).
Could you explain to me what's happening? I am new to Python and don't know its concepts well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understand it, xlsx may need special modules to save the VB script in the same way as MS Office would save it. If so, what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thank you!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.
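
Since xlsx and xlsm files are zip archives, one way to find out whether a workbook really carries VBA is to look for the `xl/vbaProject.bin` part inside it. The helpers below are a standard-library sketch; the names `has_vba_project` and `output_suffix` are made up for illustration.

```python
import zipfile

def has_vba_project(path):
    """Return True if the workbook archive contains a VBA project part."""
    with zipfile.ZipFile(path) as archive:
        return "xl/vbaProject.bin" in archive.namelist()

def output_suffix(path):
    """Pick the extension to save under: .xlsm only when VBA is present."""
    return ".xlsm" if has_vba_project(path) else ".xlsx"
```

The result could then decide both whether to pass keep_vba=True to openpyxl's load_workbook and which extension to use when saving.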

Which module has more option to read and write xlsx extension files using Python?

I have to read and write data in .xlsx files using Python, and I have to use cell formatting features like merging cells, bold, font size, color, etc. Which Python module is good to use?

xlrd and xlwt may help you. Have a look into http://www.python-excel.org/.

openpyxl is the only library I know of that can both read and write xlsx files. Its downside is that when you edit an existing file it doesn't preserve the original formatting or charts, a problem I'm dealing with right now. If anyone knows a workaround, please let me know.
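
For the formatting features asked about (merged cells, bold, font size, color), a minimal openpyxl sketch might look like this. It assumes openpyxl is installed; the function name `write_formatted_header` and the default range `A1:C1` are made up for illustration.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

def write_formatted_header(path, title, span="A1:C1"):
    """Create a workbook whose first row is a merged, styled title cell."""
    book = Workbook()
    sheet = book.active
    # Merge the range; the value and style live on its top-left cell.
    sheet.merge_cells(span)
    top_left = sheet[span.split(":")[0]]
    top_left.value = title
    # Bold, 14 pt, red text (the color is given as a hex RGB string).
    top_left.font = Font(bold=True, size=14, color="FF0000")
    book.save(path)
```

This creates a new file, so it sidesteps the formatting-loss issue mentioned above, which concerns editing existing workbooks.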
