Python XLWT: Excel generated by Python xlwt contains missing value - python

I'm quite new to Python and am trying to fetch data from HTML and save it to Excel files using xlwt.
So far the program seems to work well (all the output is printed correctly on the Python console when running the program), except that when I open the Excel file, an error message appears saying 'We found a problem with some content in FILENAME. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' After I click Yes, I find that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem begins after that (there are around 15000 lines in total). The missing data fields are concentrated in several columns with relatively high data volume.
I'm wondering whether it's related to some sort of cache allocation mechanism in xlwt?
Thanks a lot for your help here.

Seems like a caching issue.
Try calling sheet.flush_row_data() every 100 rows or so?
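A minimal sketch of that suggestion, assuming `sheet` behaves like an xlwt worksheet (it only needs `write` and `flush_row_data` methods) and `rows` is an iterable of row value lists. The function name `write_rows` and the `flush_every` parameter are made up for illustration. As far as I know, xlwt does not let you write to a row again once it has been flushed, so each row should be complete before the flush.

```python
def write_rows(sheet, rows, flush_every=100):
    """Write rows to an xlwt-style sheet, flushing buffered rows periodically.

    `sheet` is assumed to expose write(row, col, value) and flush_row_data(),
    as xlwt worksheets do. Returns the number of rows written.
    """
    count = 0
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            sheet.write(r, c, value)
        count += 1
        # Flush after every `flush_every` rows; finish a row before flushing it.
        if count % flush_every == 0:
            sheet.flush_row_data()
    return count
```

With xlwt this would be used roughly as book = xlwt.Workbook(), then write_rows(book.add_sheet("data"), rows), then book.save("out.xls"). Whether flushing actually cures the missing fields is not confirmed in the thread.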

Related

Writing dataframe to Excel takes extremely long

I have got an Excel file from work which I amended using pandas. It has 735719 rows × 31 columns. I made the necessary changes and assigned them to a new dataframe. Now I need this dataframe in Excel format. I have checked in Jupyter notebooks that ont_dub works and displays as a dataframe. So I use the following code, ont_dub.to_excel("ont_dub 2019.xlsx"), which I always use.
Normally this would only take a few seconds, but now it has been 40 minutes and it is still calculating. Side note: I am working in a OneDrive folder from work, but that hasn't caused issues before. Hopefully someone can see the problem.
Usually, if you want to save such a large amount of data in a local folder, you don't use Excel. If I am not mistaken, Excel has a known limit on the number of cells in a worksheet (1,048,576 rows by 16,384 columns), and it wasn't built to display and query such massive amounts of data (you can use pandas for that). You can either use Feather files (a known fast-save alternative) or CSV files, which are built for this sole purpose.
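A small sketch of the CSV route, using a tiny stand-in dataframe rather than the asker's ont_dub (the column names and the temporary file location are assumptions for illustration):

```python
import os
import tempfile

import pandas as pd

# Stand-in for the real dataframe; the column names are hypothetical.
df = pd.DataFrame({"id": range(5), "value": [0.5, 1.5, 2.5, 3.5, 4.5]})

path = os.path.join(tempfile.mkdtemp(), "ont_dub_2019.csv")

# CSV is plain text, so pandas does not have to build a workbook cell by cell.
df.to_csv(path, index=False)

# Reading it back gives the same rows and columns.
restored = pd.read_csv(path)
```

The Feather alternative mentioned above would be df.to_feather(path), which needs the pyarrow package installed.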

Error when using Writer.Close() function within my Pandas and Openpyxl code

I have written code which combines some CSV files into a single Excel file, and ended the 'writer' with the code:
writer.save()
writer.close()
However, I get the following error when trying to open that file after the code has finished:
'We found a problem with some content in 'the file.xlsx'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.'
This seems to be purely related to the 'writer.close()' call, as without it I don't get the error. However, in that case I cannot open the file at all, as it states that someone else is using it (i.e. openpyxl).
I'm not sure if relevant, but my file system runs on a OneDrive cloud based system.
My current plan beyond the 'writer.close()' is to pause the script to allow me to print the excel to PDF (I found this to be unreliable via Python), and then 'hit continue' to continue with exporting the PDF via Email.
Any ideas on how to resolve this error?
Without seeing more of your code, and maybe an example of the data you are writing, it's tough to make any assumptions. Based on the error you are experiencing, it is likely the inputs/data going into the xlsx file that are causing the issue, and not the 'writer' itself. This is Excel saying that data in your file is 'corrupted' by its standards and needs to be fixed.
You should be able to do a 'recovery' of the file through Excel, and it will identify the problem spots in your file, which you can then trace back into your Python program and address properly to eliminate the problem.
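One thing that may be worth ruling out, although it is not confirmed anywhere in this thread: in the pandas versions I am aware of, close() already saves the workbook, so calling save() and then close() may write the file twice. A hedged sketch that closes the writer exactly once by using it as a context manager is below; the file names and the sample data are hypothetical.

```python
import os
import tempfile

import pandas as pd

workdir = tempfile.mkdtemp()

# Two small CSV files standing in for the real inputs (hypothetical names).
csv_paths = []
for name, frame in {
    "north": pd.DataFrame({"store": [1, 2], "sales": [10, 20]}),
    "south": pd.DataFrame({"store": [3], "sales": [30]}),
}.items():
    p = os.path.join(workdir, name + ".csv")
    frame.to_csv(p, index=False)
    csv_paths.append(p)

out_path = os.path.join(workdir, "combined.xlsx")

# Leaving the `with` block saves and closes the file exactly once.
with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
    for p in csv_paths:
        sheet_name = os.path.splitext(os.path.basename(p))[0]
        pd.read_csv(p).to_excel(writer, sheet_name=sheet_name, index=False)

# Read every sheet back as a dict of sheet name -> DataFrame.
sheets = pd.read_excel(out_path, sheet_name=None)
```

Once the block exits, the file handle is released, which should also avoid the "someone else is using it" message.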

Take data from an xls sheet and add them into python commands

I've been asked to create a Python script to automate a server deployment for 80 retail stores.
As part of this script, I have a secondary script that I call to change multiple values in 9 XML files, however, the values are unique for each store, so this script needs to be changed each time, but after I am gone, this is going to be done by semi / non-technical people, so we don't want them to change the Python scripts directly for fear of breaking them.
With this in mind, I would like to have these people enter the store details into an XLS sheet, and have a Python file read this sheet and put the data it finds into the existing Python script in place of the values to be changed.
The file will be 2 columns, with the required data in the 2nd one.
I'm sorry if this is a long explanation, but that is the gist of it. I'm using Python 2.6. Does anyone have a clue about how I can do this, or which language might be better for it? I also know Bash and JavaScript.
Thanks in advance
It depends on the complexity and the volume of your data:
for small data, openpyxl;
for large data, pandas.
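A minimal sketch of the reading side, assuming the first column holds a setting name and the second its value, as the question describes. The function name `rows_to_settings` and the sample keys are hypothetical; the rows can come from any library that yields the sheet's cell values row by row.

```python
def rows_to_settings(rows):
    """Build a {name: value} dict from two-column rows, skipping blank rows."""
    settings = {}
    for row in rows:
        # Ignore empty rows and rows without a name in the first column.
        if not row or len(row) < 2 or row[0] is None:
            continue
        key, value = row[0], row[1]
        settings[str(key).strip()] = value
    return settings


# Example rows as they might come out of a store sheet (made-up values).
example = rows_to_settings([("store_id", 17), ("hostname", "pos-17.example")])
```

With openpyxl, which reads .xlsx rather than legacy .xls files, the rows could be supplied by load_workbook(path).active.iter_rows(values_only=True). The deployment script can then look values up in the returned dict instead of being edited by hand.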

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In our environment, we have an Excel file which includes raw data in one sheet and a pivot table and charts in another sheet.
I need to append rows to the raw data automatically every day using a Python job.
I am not sure, but there may be some VB script running on the front end which refreshes the pivot tables.
I used openpyxl and, following its online documentation, was able to append rows and save the workbook. I used keep_vba=True while loading the workbook to keep the VBA modules inside so that pivoting keeps working. But after saving, the xlsx file can no longer be opened in MS Office, which says the format or the extension is not valid. I can read the data using Python, but with Office it no longer works. If I don't use keep_vba=True, then pivoting does not work; only the previous values are present (of course, as I understand it, the VBA script is needed for pivoting).
Could you explain to me what's happening? I am new to Python and don't know its concepts well.
How can I fix this in openpyxl, or is there a better alternative to openpyxl? Data connections in MS Office are not an option for me.
As I understand it, xlsx may need special modules to save the VB script in the same way as MS Office would save it. If so, then what is the purpose of keep_vba=True?
I would be grateful if you could explain in more detail. I would love to know.
As I have very little time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thank you!
You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.
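To illustrate the extension point from the first part of this answer, here is a small sketch of a helper that picks the output suffix to match the keep_vba setting. The name `output_path` is hypothetical and is not part of openpyxl.

```python
from pathlib import Path


def output_path(path, keep_vba):
    """Return `path` with the suffix Excel expects for the chosen mode."""
    # Workbooks saved with VBA preserved need .xlsm; plain workbooks use .xlsx.
    suffix = ".xlsm" if keep_vba else ".xlsx"
    return str(Path(path).with_suffix(suffix))
```

With openpyxl this would pair with wb = load_workbook(src, keep_vba=True) followed by wb.save(output_path(src, True)), so a workbook loaded with keep_vba=True is never written under an .xlsx name.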

Charts from Excel to PowerPoint with Python

I have an Excel workbook that is created using the excellent "xlsxwriter" module. In this workbook, there are about 200 embedded charts. I am now trying to export all those charts into several PowerPoint presentations. Ideally, I want to preserve the original format and embedded data without linking to an external Excel workbook.
I am sure there is a way to do this using VBA. But I was wondering if there is a way to do this using Python. Is there a way to put xlsxwriter chart objects into PowerPoint presentations?
I have looked at python-pptx and can't find anything about getting charts or data series from an Excel workbook.
Any help is appreciated !
After spending hours trying different things, I have found the solution to this problem. Hopefully, it will help someone save some time. The following code will copy all the charts from "workbook_with_charts.xlsx" to "Final_PowerPoint.pptx".
For some reason that I am yet to understand, it works better when running this Python program from a CMD terminal. It sometimes breaks down if you try to run it several times, even though the first run is usually OK.
Another issue is that in the fifth line, if you pass False, as in "presentation=PowerPoint.Presentations.Add(False)", it does not work with Microsoft Office 2013, even though both "True" and "False" still work with Microsoft Office 2010.
It would be great if someone could clarify these two issues.
# importing the necessary libraries
import win32com.client
from win32com.client import constants
PowerPoint = win32com.client.Dispatch("PowerPoint.Application")
Excel = win32com.client.Dispatch("Excel.Application")
presentation = PowerPoint.Presentations.Add(True)
workbook = Excel.Workbooks.Open(Filename="C:\\.........\\workbook_with_charts.xlsx", ReadOnly=1, UpdateLinks=False)
for ws in workbook.Worksheets:
    for chart in ws.ChartObjects():
        # Copying all the charts from Excel
        chart.Activate()
        chart.Copy()
        Slide = presentation.Slides.Add(presentation.Slides.Count + 1, constants.ppLayoutBlank)
        Slide.Shapes.PasteSpecial(constants.ppPasteShape)
        # We are going to make the title of the slide the same as the chart title
        # This is optional
        textbox = Slide.Shapes.AddTextbox(1, 100, 100, 200, 300)
        textbox.TextFrame.TextRange.Text = str(chart.Chart.ChartTitle.Text)
presentation.SaveAs("C:\\...........\\Final_PowerPoint.pptx")
presentation.Close()
workbook.Close()
print('Charts Finished Copying to Powerpoint Presentation')
Excel.Quit()
PowerPoint.Quit()
The approach I'd be inclined toward with the current python-pptx version is to read the Excel sheets for their data and recreate the charts in python-pptx. That of course would require knowing what the chart formatting is, etc., so I could see why you might not want to do that.
Importing charts directly from Excel has been done in the past, see the pull request here on GitHub: https://github.com/scanny/python-pptx/pull/65
But it involved a large amount of surgery on python-pptx, and many versions back now, so at most it might be a good guide to what strategies might work. You'd need to want it pretty bad I suppose to go that route :)
I don't have enough reputation to comment, but if you get the same issue as #R__raki__ then you can use the integer value defined by the VBA reference instead of the named constant. For ppLayoutBlank that value is 12.
So replace
Slide=presentation.Slides.Add(presentation.Slides.Count+1,constants.ppLayoutBlank)
with
Slide=presentation.Slides.Add(presentation.Slides.Count+1,12)
See here for more.
