Python Excel - OpenPyxl Limitation

Python Excel - OpenPyxl Limitation - python

I recently started to automate a report at work using Python. Since my data was provided to me in the form of an excel sheet, I felt the best way to do this was to use an excel python module. My module of choice was openpyxl. It worked great, I've used it to perform calculations and organise my data ready to plot charts. Now here's the problem...
I know that you cannot update existing charts using openpyxl so that option went out the window.
What I then tried to do was link the data in my openpyxl spreadsheet to another spreadsheet containing the charts (which is then linked to my word document where the charts are to be displayed). So after doing this I ran my script and to my annoyance, the data links between my openpyxl spreadsheet and charts spreadsheet had been severed. I guess this is because openpyxl creates a new spreadsheet when you save using the save function links are severed.
My question is.. are there any ways to maintain the data links?

It is currently not possible to maintain links between files. I think it would be possible to keep them metadata but, for fairly obvious reasons, it won't necessarily be possible to validate them. This best way for this to happen would be through a pull request.
If you're on Windows you might look at using the Python for Windows stuff which will allow you to remote control the applications.

Related

How to use Python to automate the movement of data between two Excel workbooks with specific parameters

Thanks for taking the time to read my question.
I am working on a personal project to learn python scripting for excel, and I want to learn how to move data from one workbook to another.
In this example, I am emulating a company employee ledger that has name, position, address, and more (The organizations is by row so every employee takes up one row). But the project is to have a selected number of people be transferred to a new ledger (another excel file). So I have a list of emails in a .txt file (it could even be another excel file but I thought .txt would be easier), and I would want the script to run through the .txt file, get the emails, and look for any rows that have a matching email address(all emails are in cell 'B'). And if any are found, then copy that entire row to the new excel file.
I tried a lot of ways to make this work, but I could not figure it out. I am really new to python so I am not even sure if this is possible. Would really appreciate some help!

You have essentially two packages that will allow manipulation of Excel files. For reading in data and performing analysis the standard package for use is pandas. You can save the files as .xlsx however you are only really working with base table data and not the file itself (IE, you are extracing data FROM the file, not working WITH the file)
However what you need is really to perform manipulation on Excel files directly which is better done with openpyxl
You can also read files (such as your text file) using with open function that is native to Python and is not a third party import like pandas or openpyxl.
Part of learning to program includes learning how to use documentation.
As such, here is the documentation you require with sufficient examples to learn openpyxl: https://openpyxl.readthedocs.io/en/stable/
And you can learn about pandas here: https://pandas.pydata.org/docs/user_guide/index.html
And you can learn about python with open here: https://docs.python.org/3/tutorial/inputoutput.html
Hope this helps.
EDIT: It's possible I or another person can give you a specific example using your data / code etc, but you would have to provide it fully. Since you're learning, I suggest using the documentation or youtube.

Creating a complex excel worksheet in python

I am working on a project that researches into book records as cataloged in libraries. For a part of the analysis, I need to create an excel sheet with many regions like the following:
The data for these structure are stored in a database, and I'm using python.
Being a Pandas-newbee, I tried to see if I can do this through dataFrames/mulitIndexes, but from the tutorials I have been following so far, I am doubtful this is the best way, because, as you can see from the screen-dump, the row/columns are not very regular.
Should I rather use openpyxl for that task? Or some other API/stack I have not yet encountered?

How to Embed Excel Files and Link Data into PowerPoint by python in specific place of slide

There is an excel file with different sheets which is included by different charts and data. I want to make a powerpoint for my daily presentation automatically by python(PPTx lib in python).
my problem is I have to copy the charts which exist in excel and past in my powerpoint which is created by python (pptx). I want to know is there any possibility to export charts from excel file to powerpoint by python?

There is no direct API support for this in python-pptx. However, there are other approaches that might work for you.
Perhaps the simplest would be to use a package like openpyxl to read the data from the spreadsheet and recreate the chart using python-pptx, based on the data read from Excel.
If you wanted to copy the chart exactly, this is also possible but would require detailed knowledge of the Open Packaging Convention (OPC) file format and XML schemas to accomplish. Essentially, you would copy the chart-part for the chart into the PowerPoint package (zip file) and connect it to a graphic-frame shape on a slide. You'd also need to embed the Excel worksheet into the PowerPoint, perhaps repeatedly (once for each chart) and make any format-specific adjustments (Excel and PowerPoint handle charts slightly differently in certain details).
This latter approach would be a big job, so I would recommend trying the simpler approach first and see if that will get it done for you.

xlsx file extension not valid after saving with openpyxl and keep_vba=true. Which is the best way?

In the environment, we have an excel file, which includes rawdata in one sheet and pivot table and charts in another sheet.
I need to append rows every day to raw data automatically using a python job.
I am not sure, but there may be some VB Script running on the front end which will refresh the pivot tables.
I used openpyxl and by following its online documentation, I was able to append rows and save the workbook. I used keep_vba=true while loading the workbook to keep the VBA modules inside to enable pivoting. But after saving the workbook, the xlsx is not being opened anymore using MS office and saying the format or the extension is not valid. I can see the data using python but with office, its not working anymore. If I don't use keep_vba=true, then pivoting is not working, only the previous values are present (ofcourse as I understood, as VBA script is needed for pivoting).
Could you explain me what's happening? I am new to python and don't know its concepts much.
How can I fix this in openpyxl or is there any better alternative other than openpyxl. Data connections in MS office is not an option for me.
As I understood, xlsx may need special modules to save the VB script to save in the same way as it may be saved using MS office. If it is, then what is the purpose of keep_vba=true ?
I would be grateful if you could explain in more detail. I would love to know.
As I have very short time to complete this task, I am looking for a quick answer here, instead of going through all the concepts.
Thankyou!

You have to save the files with the extension ".xlsm" rather than ".xlsx". The .xlsx format exists specifically to provide the user with assurance that there is no VBA code within the file. This is an Excel standard and not a problem with openpyxl. With that said, I haven't worked with openpyxl, so I'm not sure what you need to do to be sure your files are properly converted to .xlsm.
Edit: Sorry, misread your question first time around. Easiest step would be to set keep_vba=False. That might resolve your issue right there, since you're telling openpyxl to look for VBA code that can't possibly exist in an xlsx file. Hard to say more than that until you post the relevant section of your code.

Charts from Excel to PowerPoint with Python

I have an excel workbook that is created using an excellent "xlsxwriter" module. In this workbook, there about about 200 embedded charts. I am now trying to export all those charts into several power point presentations. Ideally, I want to preserve the original format and embedded data without linking to external excel work book.
I am sure there is a way to do this using VBA. But, I was wondering if there is a way to do this using Python. Is there a way to put xlsxwriter chart objects into powerpoints ?
I have looked at python-pptx and can't find anything about getting charts or data series from excel work book.
Any help is appreciated !

After spending hours of trying different things, I have found the solution to this problem. Hopefully,it will help someone save some time.The following code will copy all the charts from "workbook_with_charts.xlsx" to "Final_PowerPoint.pptx."
For some reason, that I am yet to understand, it works better when running this Python program from CMD terminal. It sometimes breaks down if you tried to run this several times, even though the first run is usually OK.
Another issue is that in the fifth line, if you make False using "presentation=PowerPoint.Presentations.Add(False)," it does not work with Microsoft Office 2013, even though both "True" and "False" will still work with Microsoft Office 2010.
It would be great if someone can clarify these about two issues.
# importing the necessary libraries
import win32com.client
from win32com.client import constants
PowerPoint=win32com.client.Dispatch("PowerPoint.Application")
Excel=win32com.client.Dispatch("Excel.Application")
presentation=PowerPoint.Presentations.Add(True)
workbook=Excel.Workbooks.Open(Filename="C:\\.........\\workbook_with_charts.xlsx",ReadOnly=1,UpdateLinks=False)
for ws in workbook.Worksheets:
for chart in ws.ChartObjects():
# Copying all the charts from excel
chart.Activate()
chart.Copy()
Slide=presentation.Slides.Add(presentation.Slides.Count+1,constants.ppLayoutBlank)
Slide.Shapes.PasteSpecial(constants.ppPasteShape)
# WE are going to make the title of slide the same chart title
# This is optional
textbox=Slide.Shapes.AddTextbox(1,100,100,200,300)
textbox.TextFrame.TextRange.Text=str(chart.Chart.ChartTitle.Text)
presentation.SaveAs("C:\\...........\\Final_PowerPoint.pptx")
presentation.Close()
workbook.Close()
print 'Charts Finished Copying to Powerpoint Presentation'
Excel.Quit()
PowerPoint.Quit()

The approach I'd be inclined toward with the current python-pptx version is to read the Excel sheets for their data and recreate the charts in python-pptx. That of course would require knowing what the chart formatting is, etc., so I could see why you might not want to do that.
Importing charts directly from Excel has been done in the past, see the pull request here on GitHub: https://github.com/scanny/python-pptx/pull/65
But it involved a large amount of surgery on python-pptx, and many versions back now, so at most it might be a good guide to what strategies might work. You'd need to want it pretty bad I suppose to go that route :)

I don't have enough reputation to comment but if you get the same issue as #R__raki__ then you can use the integer value defined by the VBA reference. For this case it would be 12.
So replace
Slide=presentation.Slides.Add(presentation.Slides.Count+1,constants.ppLayoutBlank)
with
Slide=presentation.Slides.Add(presentation.Slides.Count+1,12)
See here for more.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.