Charts from Excel to PowerPoint with Python - python

I have an excel workbook that is created using an excellent "xlsxwriter" module. In this workbook, there about about 200 embedded charts. I am now trying to export all those charts into several power point presentations. Ideally, I want to preserve the original format and embedded data without linking to external excel work book.
I am sure there is a way to do this using VBA. But, I was wondering if there is a way to do this using Python. Is there a way to put xlsxwriter chart objects into powerpoints ?
I have looked at python-pptx and can't find anything about getting charts or data series from excel work book.
Any help is appreciated !

After spending hours of trying different things, I have found the solution to this problem. Hopefully,it will help someone save some time.The following code will copy all the charts from "workbook_with_charts.xlsx" to "Final_PowerPoint.pptx."
For some reason, that I am yet to understand, it works better when running this Python program from CMD terminal. It sometimes breaks down if you tried to run this several times, even though the first run is usually OK.
Another issue is that in the fifth line, if you make False using "presentation=PowerPoint.Presentations.Add(False)," it does not work with Microsoft Office 2013, even though both "True" and "False" will still work with Microsoft Office 2010.
It would be great if someone can clarify these about two issues.
# importing the necessary libraries
import win32com.client
from win32com.client import constants
PowerPoint=win32com.client.Dispatch("PowerPoint.Application")
Excel=win32com.client.Dispatch("Excel.Application")
presentation=PowerPoint.Presentations.Add(True)
workbook=Excel.Workbooks.Open(Filename="C:\\.........\\workbook_with_charts.xlsx",ReadOnly=1,UpdateLinks=False)
for ws in workbook.Worksheets:
for chart in ws.ChartObjects():
# Copying all the charts from excel
chart.Activate()
chart.Copy()
Slide=presentation.Slides.Add(presentation.Slides.Count+1,constants.ppLayoutBlank)
Slide.Shapes.PasteSpecial(constants.ppPasteShape)
# WE are going to make the title of slide the same chart title
# This is optional
textbox=Slide.Shapes.AddTextbox(1,100,100,200,300)
textbox.TextFrame.TextRange.Text=str(chart.Chart.ChartTitle.Text)
presentation.SaveAs("C:\\...........\\Final_PowerPoint.pptx")
presentation.Close()
workbook.Close()
print 'Charts Finished Copying to Powerpoint Presentation'
Excel.Quit()
PowerPoint.Quit()

The approach I'd be inclined toward with the current python-pptx version is to read the Excel sheets for their data and recreate the charts in python-pptx. That of course would require knowing what the chart formatting is, etc., so I could see why you might not want to do that.
Importing charts directly from Excel has been done in the past, see the pull request here on GitHub: https://github.com/scanny/python-pptx/pull/65
But it involved a large amount of surgery on python-pptx, and many versions back now, so at most it might be a good guide to what strategies might work. You'd need to want it pretty bad I suppose to go that route :)

I don't have enough reputation to comment but if you get the same issue as #R__raki__ then you can use the integer value defined by the VBA reference. For this case it would be 12.
So replace
Slide=presentation.Slides.Add(presentation.Slides.Count+1,constants.ppLayoutBlank)
with
Slide=presentation.Slides.Add(presentation.Slides.Count+1,12)
See here for more.

Related

I need a starting point to code an app to extract text from pdf to excel

To start I just want to state that I'm an Electrical Engineer with basic knowledge of programming.
My requirement is as follows:
I want to create an app where I can load and view PDF files that
contain tables.
These PDF files tables are of irregular shapes and in a different
position on every page. (that's why tools like tabular couldn't help
me)
Each table entry is multiline and of irregular dimensions (I cannot
select a whole row at a time it has to be each element alone. simply
copying the lines to excel won't work either because it will need a
lot of formatting)
So I want to be able to select each table entry individually from the
table (like a selection or cropping box over the required text),
delete new line if there is a new line in the text and just keep spaces.
The generated excel (or access database I do not really mind any)
should be reviewable and saveable (if those are even words XD).
I have a good knowledge of python and a very elementary knowledge of Django and I'm seeking some expert who can tell me what do I really need to learn (and if possible where to learn it) to execute my project.
Is it very much for me to execute and if I can dedicate 10 hours a week, how much would it take me to execute such a project.
Thanks all for your help in advance.
Don't use Python, use Word. Open the pdf, then step through the tables collection to collect the data and put it into excel. See this for an example
Here are the advises i can provide you :
first of all, ask internet for questions :
https://lmddgtfy.net/?q=python%20library%20tabular%20pdf
-> Camelot , which is mentioned multiple time seems to be relevant
For the use of excel sheet, i present you one of the most famous library for manipulating DataFrame: Pandas
You can use small courses on internet which will offer you a quick ability to manage your project easier.
for the application, you can easily find on youtube courses on a library made by someone who will explain you how to do a basic application. It could offer you the entry point you are talking about. Then, You can just wonder what else do you need or simply want for making it better.
for the time needed, it depends on how much time do you need to understand the basics, how much time you spend on having a deeper comprehension. I think in one week, working during your free time with a real interest, it could be working( not perfect, but working, which is a good beginning)
PS: I am not sure if your question is relevant for the aims of stackoverflow. I suggest you to read this file. ( https://stackoverflow.com/help/how-to-ask)

Python XLWT: Excel generated by Python xlwt contains missing value

I'm quite new to Python and trying to fetch data in HTML and saved to excels using xlwt.
So far the program seems work well (all the output are correctly printed on the python console when running the program) except that when I open the excel file, an error message saying 'We found a problem with some content in FILENAME, Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.' And after I click Yes, I found that a lot of data fields are missing.
It seems that roughly the first 150 lines are fine and the problem begins to rise after that (In total around 15000 lines). And missing data fields concentrate at several columns with relative high data volume.
I'm thinking if it's related to sort of cache allocating mechanism of xlwt?
Thanks a lot for your help here.
seems like a caching issue.
Try sheet.flush_row_data() every 100 rows or so ?

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!
You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

After modifying Excel Sheets via python, graphs and charts disappears

when I try to write data into an already existing excel Sheet, then the existing charts and pie charts in that excel sheet disappears.
Any code to do that via python is appreciated.
Thanks in Advance :)
This isn't currently possible using openpyxl (as of 2016-11-22), and based on my googling I don't think it's possible using anything else either, though I could be wrong.
Here is the bug for openpyxl. It is scheduled to be fixed in version 2.5 (they're currently on 2.4)
There's already some code written to read charts, I'm not sure if it works yet though. More info at the link.

Python Excel - OpenPyxl Limitation

I recently started to automate a report at work using Python. Since my data was provided to me in the form of an excel sheet, I felt the best way to do this was to use an excel python module. My module of choice was openpyxl. It worked great, I've used it to perform calculations and organise my data ready to plot charts. Now here's the problem...
I know that you cannot update existing charts using openpyxl so that option went out the window.
What I then tried to do was link the data in my openpyxl spreadsheet to another spreadsheet containing the charts (which is then linked to my word document where the charts are to be displayed). So after doing this I ran my script and to my annoyance, the data links between my openpyxl spreadsheet and charts spreadsheet had been severed. I guess this is because openpyxl creates a new spreadsheet when you save using the save function links are severed.
My question is.. are there any ways to maintain the data links?
It is currently not possible to maintain links between files. I think it would be possible to keep them metadata but, for fairly obvious reasons, it won't necessarily be possible to validate them. This best way for this to happen would be through a pull request.
If you're on Windows you might look at using the Python for Windows stuff which will allow you to remote control the applications.

Categories

Resources