Python, how to insert value in Powerpoint template? - python

I want to use an existing powerpoint presentation to generate a series of reports:
In my imagination the powerpoint slides will have content in such or similar form:
Date of report: {{report_date}}
Number of Sales: {{no_sales}}
...
Then my python app opens the powerpoint, fills in the values for this report and saves the report with a new name.
I googled, but could not find a solution for this.
There is python-pptx out there, but this is all about creating a new presentation and not inserting values in a template.
Can anybody advice?

Ultimately, barring some other library which has additional functionality, you need some sort of brute force approach to iterate the Slides collection and each Slide's respective Shapes collection in order to identify the matching shape (unless there is some other library which has additional "Find" functionality in PPT). Here is brute force using only win32com:
from win32com import client
find_date = r'{{report_date}}'
find_sales = r'{{no_sales}}'
report_date = '01/01/2016' # Modify as needed
no_sales = '604' # Modify as needed
path = 'c:/path/to/file.pptx'
outpath = 'c:/path/to/output.pptx'
ppt = client.Dispatch("PowerPoint.Application")
pres = ppt.Presentations.Open(path, WithWindow=False)
for sld in pres.Slides:
for shp in sld.Shapes:
with shp.TextFrame.TextRange as tr:
if find_date in tr.Text
tr.Replace(find_date, report_date)
elif find_sales in shp.TextFrame.Characters.Text
tr.Replace(find_sales, no_sales)
pres.SaveAs(outpath)
pres.Close()
ppt.Quit()
If these strings are inside other strings with mixed text formatting, it gets trickier to preserve existing formatting, but it should still be possible.
If the template file is still in design and subject to your control, I would consider giving the shape a unique identifier like a CustomXMLPart or you could assign something to the shapes' AlternativeText property. The latter is easier to work with because it doesn't require well-formed XML, and also because it's able to be seen & manipulated via the native UI, whereas the CustomXMLPart is only accessible programmatically, and even that is kind of counterintuitive. You'll still need to do shape-by-shape iteration, but you can avoid the string comparisons just by checking the relevant property value.

I tried this on a ".ppx" file I had hanging around.
A microsoft office power point ".pptx" file is in ".zip" format.
When I unzipped my file, I got an ".xml" file and three directories.
My ".pptx" file has 116 slides comprised of 3,477 files and 22 directories/subdirectories.
Normally, I would say it is not workable, but since you have only two short changes you probably could figure out what to change and zip the files to make a new ".ppx" file.
A warning: there are some xml blobs of binary data in one or more of the ".xml" files.

You can definitely do what you want with python-pptx, just perhaps not as straightforwardly as you imagine.
You can read the objects in a presentation, including the slides and the shapes on the slides. So if you wanted to change the text of the second shape on the second slide, you could do it like this:
slide = prs.slides[1]
shape = slide.shapes[1]
shape.text = 'foobar'
The only real question is how you find the shape you're interested in. If you can make non-visual changes to the presentation (template), you can determine the shape id or shape name and use that. Or you could fetch the text for each shape and use regular expressions to find your keyword/replacement bits.
It's not without its challenges, and python-pptx doesn't have features specifically designed for this role, but based on the parameters of your question, this is definitely a doable thing.

Related

Python Generic Data Engine

I have been working on Python for about 1.5yrs and looking for some direction. This is the first time I can't find what I need after doing a lot of searching and must be missing something- most likely searching the wrong terms.
Problem: I am working on an app that has many processes (Could be hundreds or even thousands). Each process may have a unique input and output data format - could be multiline strings, comma separated strings, excel or csv with or without varying headers and many others. I need something that will format the input correctly and handle the output based upon the process. New processes also need to be easily added/defined. I am open to whatever is the best approach, but my thoughts are to use a database that stores the template/data definition and use that to know the format given a process. However, I'm struggling to come up with exactly how, if this is really the best approach, but it needs to be a solution that is scalable. Any direction would be appreciated. Thank you.
A couple simple examples of data
Process 1 example data (multi line string with Header)
Input of
[ABC123, XYZ453, CDE987]
and the resulting data input below would be created:
Barcode
ABC123
XYZ453
CDE987
This code below works, but is not reusable for the example 2.
list = [ABC123, XYZ453, CDE987]
input = "Barcode /r/n"
for l in list:
input = input + l + '/r/n'
Process 2 example input template (comma separated with Header):
Barcode,Location,Param1,Param2
Item1,L1,11,A
Item1,L1,22,B
Item2,L1,33,C
Item2,L2,44,F
Item3,L2,55,B
Item3,L2,66,P
Process 2 example resulting input data (comma separated with Header):
Input of
{'Barcode':['ABC123', 'XYZ453', 'CDE987', 'FGH487', 'YTR123'], 'Location':['Shelf1', 'Shelf2']}
and using the template to create the input data below:
Barcode,Location,Param1,Param2
ABC123,Shelf1,11,A
ABC123,Shelf1,22,B
XYZ453,Shelf1,33,C
XYZ453,Shelf2,44,F
CDE987,Shelf2,55,B
CDE987,Shelf2,66,P
FGH487,Shelf1,11,A
FGH487,Shelf1,22,B
YTR123,Shelf1,33,C
YTR123,Shelf2,44,F
I know how to handle each process with hardcoded loop/dataframe merge, etc. Ive done some abstraction in other cases with dicts. However, how to define/store each format that vary so much and create reusable abstracted code is where I am stuck.
Maybe you can do the output of the functions as a tuple with the keys "datatype" and "output" for the actual output

Dynamically update complex OrderedDict (based on yaml file)

I'm trying to build a piece of software that will rely on very dynamic configuration (or "ruleset", really). I've tried to capture it in the pseudocode:
"""
---
config:
item1:
thething: ${stages.current.variables.item1}
stages:
dev:
variables:
item1: stuff
prod:
variables:
item1: stuf2
"""
config_obj = yaml.load(config)
current_stage = 'dev'
#Insert artificial "current stage" to allow var matching
stages['current'] = stages[current_stage]
updated_config_obj = replace_vars(config_obj)
The goal is to have the updated_config_obj replace all variable-types with the actual value, so for this example it should replace ${stages.current.variables.item1} with stuff. The current part is easily solved by copying whatever's the current stage into a current item in the same OrderedDict, but I'm still stomped by how to actually perform the replace. The config yaml can be quite large and is totally depended on a plugin system so it must be dynamic.
Right now I'm looking at "walking" the entire object, checking for the existence of a $ on each "leaf" (indicating a variable) and performing a lookup backup to the current object to "resolve" the variable, but somehow that seems overly complex. Another alternative is (I guess) to ue Jinja2-templating on the "config string", with the parsed object as a lookup. Certainly doable but it somehow feels a little dirty.
I have the feeling that there should be a more elegant solution which can be done solely on the parsed object (without interacting with the string), but it escapes me.
Any pointers appreciated!
First, my two cents: try to avoid using any form of interpolation in your configuration file. This creates another layer of dependencies - one dependency for your program (the configuration file) and another dependency for your configuration file.
It's a slick solution at the moment, but consider that five years down the road some lowly developer might be staring down ${stages.current.variables.item1} for a month trying to figure out what this is, not understanding that this implicitly maps onto stages.dev. And then worse yet, some other developer comes along, and seeing that the floodgates of interpolation have been opened, they start using {{stages_dev}}, to mean that some value should interpolated from the system's environmental variables. And then some other developer starts using their own convention like {{!stagesdev!}}, which means that the value uses its own custom runtime interpolation, invoked in some obscure, downstream back-alley.
And then some consultant is hired to reverse-engineer the whole thing and now they are sailing the seas of spaghetti.
If you still want to do this, I'd recommend opening/parsing the configuration file into a dictionary (presumably using yaml.load()), then iterating through the whole thing, line-by-line, using regex to find instances of \$\{(.*)\}.
For each captured group, create an ordered list like:
# ["stages", "current", "variables", item1"]
yaml_references = ".".split("stages.current.variables.item1")
Then, you could do something like:
yaml_config_dict = "" # the parsed configuration file
interpolated_reference = None
for y in yaml_references:
interpolated_reference = yaml_config_dict[y]
i = interpolated_reference[0]
Now i should represent whatever ${stages.current.variables.item1} was pointing to in the context of the .yaml file and you should be able to do a string replace.

Using python to parse a large set of filenames concatenated from inconsistent object names

/tldr Looking to parse a large set of filenames that are a concatenation of two names (container + child) for the original two names where nomenclature is inconsistent. Python library suggestions or any other guidance appreciated.
I am looking for a way to parse strings for information where the nomenclature and formatting of information within those strings will likely be inconsistent to some degree.
Background
Industry: Automation controls
Problem to be solved:
Time series data is exported from an automation system with a single data point being saved to a single .csv file. (example: If the controls system were an environmental controls system the point might be the measured temperature of a room taken at 15 minute intervals.) It is possible to have an environment where there are a few dozen points that export to CSV files or several thousand points that export to CSV files. The structure that the points are normally stored in is as follows: points are contained within a controller, controllers are integrated under a management system and occasionally management systems could be integrated into another management system. The resulting structure is a simple hierarchical tree.
The filenames associated with the CSV files are assembled from the path structure of each point as follows: Directories are created for the management systems (nested if necessary) and under those directories are the CSV files where the filename is a concatenation of the controller name and the point name.
I have written a python script that processes a monthly export of the CSV files (currently about 5500 of them [growing]) into a structured data store and another that assembles spreadsheets for others to review. Currently, I am using some really ugly regular expressions and even uglier string.find()s with a list of static string values that I have hand entered to parse out control names and point names for each file so that they can be inserted into the structured data store.
Unfortunately, as mentioned above, the nomenclature used in these environments are rarely consistent. Point names vary widely. The point referenced above might be known as ROOMTEMP, RM_T, RM-T, ROOM-T, ZN_T, ZNT, RMT or several other possibilities. This applies to almost any point contained within a controller. Controller names are also somewhat inconsistent where they may be named for what type of device they are controlling, the geographic location of the device or even an asset number associated with the device.
I would very much like to get out of the business of hand writing regular expressions to parse file names every time a new location is added. I would like to write code that reads in filenames and looks for patterns across the filenames and then makes a recommendation for parsing the controller and point name out of each filename. I already have an interface where I can assign controller name and point name to each point object by hand so if there are errors with the parse I can modify the results. Ideally, the patterns created by the existing objects would influence the suggested names of new files being parsed.
Some examples of filenames are as follows:
UNIT1254_SAT.csv, UNIT1254_RMT.csv, UNIT1254_fil.csv, AHU_5311_CLG_O.csv, QE239-01_DISCH_STPT.csv, HX_E2_CHW_Return.csv, Plant_RM221_CHW_Sys_Enable.csv, TU_E7_Actual Clg Setpoint.csv, 1725_ROOMTEMP.csv, 1725_DA_T.csv, 1725_RA_T.csv
The order will always be consistent where it is a concatenation of controller name and then point name. There will most likely be a consistent character used to separate controller name from point name (normally an underscore, but occasionally a dash or some other character.)
Does anyone have any recommendations on how to get started with parsing these file names? I’ve thought through a few ideas, but keep shelving them before trying them prior to implementation because I keep finding the potential for performance issues or identifying failure points. The rest of my code is working pretty much the way I need it to, I just haven’t figured out an efficient or useful way to pull the correct names out of the filename. Unfortunately, It is not an option to modify the names on the control system side to be consistent.
I don't know if the following code will help you, but I hope it'll give you at least some idea.
Considering that a filename as "QE239-01_STPT_1725_ROOMTEMP_DA" can contain following names
'QE239-01'
'QE239-01_STPT'
'QE239-01_STPT_1725'
'QE239-01_STPT_1725_ROOMTEMP'
'QE239-01_STPT_1725_ROOMTEMP_DA'
'STPT'
'STPT_1725'
'STPT_1725_ROOMTEMP'
'STPT_1725_ROOMTEMP_DA'
'1725'
'1725_ROOMTEMP'
'1725_ROOMTEMP_DA'
'ROOMTEMP'
'ROOMTEMP_DA'
'DA'
as being possible elements (container name or point name) of the filename,
I defined the function treat() to return this list from the name.
Then the code treats all the filenames to find all the possible elements of filenames.
The function is based on the idea that in the chosen example the element ROOMTEMP can't follow the element STPT because STPT_ROOMTEMP isn't a possible container name in this example string since there is 1725 between these two elements.
And then, with the help of a function in difflib module, I try to discriminate elements that may have some similarity, in order to try to detect patterns under which several elements of names can be gathered.
You must play with the value passed as argument to cutoff parameter to choose what could be the best to give interesting results for you.
It's far from being good, certainly, but I didn't understood all aspects of your problem.
s =\
"""UNIT1254_SAT
UNIT1254_RMT
UNIT1254_fil
AHU_5311_CLG_O
QE239-01_DISCH_STPT,
HX_E2_CHW_Return
Plant_RM221_CHW_Sys_Enable
TU_E7_Actual Clg Setpoint
1725_ROOMTEMP
1725_DA_T
1725_RA_T
UNT147_ROOMTEMP
TRU_EZ_RM_T
HXX_V2_RM-T
RHXX_V2_ROOM-T
SIX8_ZN_T
Plint_RP228_ZNT
SOHO79_EZ_RMT"""
li = s.split('\n')
print(li)
print('- - - - - - - - - - - - - - - - - ')
import difflib
from pprint import pprint
def treat(name):
lu = name.split('_')
W = []
while lu:
W.extend('_'.join(lu[0:x]) for x in range(1,len(lu)+1))
lu.pop(0)
return W
if 0:
q = "QE239-01_STPT_1725_ROOMTEMP_DA"
pprint(treat(q))
print('==========================================')
WALL = []
for t in li:
WALL.extend(treat(t))
pprint(WALL)
for x in WALL:
j = set(difflib.get_close_matches(x, WALL, n=9000000, cutoff=0.7 ))
if len(j)>1:
print(j,'\n')

How to write back to a PDB file after doing Superimposer for atoms of a protein in PDB.BIO python

I read and extracted information of atoms from a PDB file and did a Superimposer() to align a mutation to wild-type. How can I write the aligned values of atoms back to PDB file? I tried to use PDBIO() library but it doesn't work since it doesn't accept a list as an input. Anyone has an idea how to do it?
mutantAtoms = []
mutantStructure = PDBParser().get_structure("name",pdbFile)
mutantChain = mutStructure[0]["B"]
# Extract information of atoms
for residues in mutantChain:
mutantAtoms.append(residues)
# Do alignment
si =Superimposer()
si.set_atoms(wildtypeAtoms, mutantAtoms)
si.apply(mutantAtoms)
Now mutantAtoms is the aligned atom to wild-type atom. I need to write this information to a PDB file. My question is how to convert from list of aligned atoms to a structure and use PDBIO() or some other ways to write to a PDB file.
As I see in an example in the PDBIO package documentation in Biopython documentation:
p = PDBParser()
s = p.get_structure("1fat", "1fat.pdb")
io = PDBIO()
io.set_structure(s)
io.save("out.pdb")
Seems like PDBIO module needs an object of class Structure to work, which is in principle what I understand Superimposer works with. When you say it does not accept a list do you mean you have a list of structures? In that case you could simply do it by iterating throught the structures as in:
for s in my_results_list:
io.set_structure(s)
io.save("out.pdb")
If what you have is a list of atoms, I guess you could create a Structure object with that and then pass it to PDBIO.
However, it is difficult to tell more without knowing more about your problem. You could put on your question the code lines where you get the problem.
Edit: Now I have better understood what you want to do. So I have seen in an interesting Biopython Structural Bioinformatics FAQ some information about the Structure class, which is a little complex apparently. At first sight, I do not see a very easy way to create Structure objects from scratch, but what you could do is modify the structure you get from PDBIO substituting the atoms list with the result you get from Superimposer and then write the .pdb file using the same modified structure. So you could try to put your mutantAtoms list into the mutantStructure object you already have.

how can i change the size of a table in word using python (pywin32)

ms word table with python
I am working with python on word tables, i am generating tables, but all of them are
auto fit window..
is it possible to change it to auto fit contents?
i had tried something like this:
table = location.Tables.Add(location,len(df)+1,len(df.columns)
table.AutoFit(AutoFitBehavior.AutoFitToContents)
but it keeps to raise errors
You want to change you table creation to use this:
//''#Add two ones after your columns
table = location.Tables.Add(location,len(df)+1,len(df.columns),1,1)
Information about why you need those variables can be read here:
http://msdn.microsoft.com/en-us/library/office/ff845710(v=office.15).aspx
But basically, the default behavior is to disable Cell Autofitting and Use Table Autofit to Window. The first "1" enables Cell Autofitting. From the link I posted above, the DefaultTableBehavior can either be wdWord8TableBehavior (Autofit disabled --default) or wdWord9TableBehavior (Autofit enabled). The number comes from opening up Word's VBA editor and typing in the Immediate Window:
?Word.wdWord9TableBehavior
Next, from the link, we see another option called AutoFitBehavior. This is defined as:
Sets the AutoFit rules for how Word sizes tables. Can be one of the WdAutoFitBehavior constants.
So now we have another term to look up. In the VBA editor's Immediate window again type:
?Word.wdAutoFitBehavior.
After the last dot, the possible options should appear. These will be:
wdAutoFitContent
wdAutoFitFixed
wdAutoFitWindow
AutoFitContent looks to be the option we want, so let's finish up that previous line with:
?Word.wdAutoFitBehavior.wdAutoFitContent
The result will be a "1".
Now you may ask, why do we have to go through all this trouble finding the numerical representations of the values. From my experience, with using pywin32 with Excel, is that you can't get the Built-in values, from the string, most of the time. But putting in the numerical representation works just the same.
Also, One more reason for why your code may be failing is that the table object may not have a function "Autofit".
I'm using Word 2007, and Table has the function, AutoFitBehavior.
So Change:
table.AutoFit(AutoFitBehaviour.AutoFitToContent)
To:
table.AutoFitBehavior(1)
//''Which we know the 1 means wd.wdAutoFitBehavior.wdAutoFitContent
Hope I got it right, and this helps you out.

Categories

Resources