My task is simple: I want to open a Word template, add some text and tables to the document, save it as a PDF, and exit Word.
Because of the nature of the beast I don't want to save the document in Word format, and for it all to work the PDF needs to be generated with no user interaction with Word.
There may be better ways of solving this task, but those are the parameters by which I am constrained. The other constraint is that it needs to run with Python 2.4...
My starting point was Mark Hammond's easyword.py sample script, and I have got it to do most of what I want, but there are two issues that I just cannot seem to figure out, and they are probably related.
When I run the test() function I get my output PDF and Word docs generated correctly, but
1) I don't seem to be able to 'close' the word session/document
2) I end up with an annoying dialog box asking me if I want to save changes.
The dialog box is a deal breaker.
In my Quit function Close() doesn't appear to be recognised, and none of the tools I've got are giving me any methods at all for 'self.wordApp', although self.wordApp.Quit() does appear to work (it doesn't cause a crash).
I've spent hours searching for an answer to this, both on the internet and by looking at similar code for Excel (formatting is why I can't use Excel), to no avail. Does anyone have any ideas?
The relevant sections of my test code are below:
import win32com.client
MYDIR = 'somevalidpath'
class WordWrap:
    ''' Wrapper around Word documents to make them easy to build.
    Has variables for the Applications, Document and Selection;
    most methods add things at the end of the document
    '''
    def __init__(self, templatefile=None):
        self.wordApp = win32com.client.gencache.EnsureDispatch('Word.Application')
        if templatefile == None:
            self.wordDoc = self.wordApp.Documents.Add()
        else:
            self.wordDoc = self.wordApp.Documents.Add(Template=templatefile)
        #set up the selection
        self.wordDoc.Range(0,0).Select()
        self.wordSel = self.wordApp.Selection
    def Quit(self):
        self.wordApp.Close(SaveChanges=1)
        self.wordApp.Quit()
def test():
    '''
    Test function for class
    '''
    outfilename = MYDIR + '\\pythonics_mgt_accounts.doc'
    w = WordWrap(MYDIR + '\\pythonics.dot')
    #w.show()
    w.addStyledPara('Accounts for April', 'Title')
    #first some text
    w.addStyledPara("Chairman's Introduction", 'Heading 1')
    w.addStyledPara(randomText(), 'Normal')
    # now a table sections
    w.addStyledPara("Sales Figures for Year To Date", 'Heading 1')
    data = randomData()
    w.addTable(data, 37) # style wdTableStyleProfessional
    w.addText('\n\n')
    # finally a chart, on the first page of a ready-made spreadsheet
    w.addStyledPara("Cash Flow Projections", 'Heading 1')
    w.addInlineExcelChart(MYDIR + '\\wordchart.xls', 'Cash Flow Forecast')
    w.saveAs(outfilename)
    print 'saved in', outfilename
    # save as PDF, saveAs handles the file conversion, based on the file extension
    # the file is not just being renamed and saved
    new_name = outfilename.replace(".doc", r".pdf")
    w.saveAs(new_name)
    print 'saved in', new_name
    w.Quit()
Doh, it would have helped if I'd tried to close the document and not the application. The code should read self.wordDoc.Close()
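For anyone landing here, the fix can be sketched roughly as below. The helper name close_word and the two constant names are my own, not part of easyword.py; the numeric values are the WdSaveOptions values from the Word object model (0 for wdDoNotSaveChanges, -1 for wdSaveChanges). The sketch assumes the objects passed in are the Word Application and Document COM objects created in WordWrap.__init__.

```python
# WdSaveOptions values from the Word object model.
WD_DO_NOT_SAVE_CHANGES = 0
WD_SAVE_CHANGES = -1


def close_word(word_app, word_doc, save_changes=False):
    """Close the document first, then quit Word.

    Passing an explicit SaveChanges value to Document.Close() is what
    avoids the "do you want to save changes?" prompt.
    """
    if save_changes:
        option = WD_SAVE_CHANGES
    else:
        option = WD_DO_NOT_SAVE_CHANGES
    word_doc.Close(SaveChanges=option)  # Close() belongs to the document
    word_app.Quit()                     # Quit() belongs to the application
```

Inside the wrapper class, Quit would then just call close_word(self.wordApp, self.wordDoc). Since the PDF has already been written by saveAs at that point, discarding the pending changes should be safe.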
I am fairly new to programming, and even newer to Tkinter.
I am setting up a GUI that works with an SQL Server to allow front end users to retrieve, update, and delete certain information.
Currently I have everything communicating and working correctly, but I have a function that exports a list of the results into an Excel file using Pandas. The export works fine, but it uses the static name and directory I give it inside the Pandas to_excel method.
I want to use a Tkinter asksaveasfilename dialog to let the user name the file and choose its export location, but I can't seem to figure out how this works with this dialog box (if it is even possible). Is there an option in the dialog box's code where I specify what information I want to save?
def exportFunc():
    pd.DataFrame(data).to_excel("TestList.xlsx", header=False, index = True)
    filedialog.asksaveasfilename(initialdir = "/", title = 'Save File', filetypes = ("Excel File", "*.xlsx"))
    pass
My code doesn't produce any errors; whatever I try, nothing simply gets saved through the dialog box. Right now I have the file dialog commented out in my actual code, but if someone could direct me towards a possible solution, I would be grateful!
10 months ago this was posted, but I hope this answer can help a fellow novice googling around for this answer as well.
How I solved this was noticing the asksaveasfile function outputs a value that contains the user specified file path and file name. For example:
< closed file u'E:Filepath/AnotherPath/work2.xlsx', mode 'w' at 2119x6710 >
I then used a regex and the replace method to strip away everything surrounding the file path; once that was done, the to_excel function accepted the result just as it would a hardcoded path.
Hope this helps someone out there!
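import re
import tkFileDialog
out = tkFileDialog.asksaveasfile(mode='w', defaultextension=".xlsx")
out.close()
# str(out) looks like: <closed file u'...path...', mode 'w' at ...>
restr = str(out)
RegexPrep = restr.replace("'w'", '')
# pull out whatever is left between single quotes, i.e. the path
outRegex = re.findall(r"'(.*?)'", RegexPrep)
ToExcelRegex = str(outRegex)
MorePrep = ToExcelRegex.replace("[",'')
MorePrep = MorePrep.replace("]",'')
MorePrep = MorePrep.replace("'",'')
Final = MorePrep.strip()
# find is the DataFrame being exported
find.to_excel(Final, index=False)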
asksaveasfile returns a file object, so we can use its name attribute to save the df.
from tkinter import filedialog, Tk
import pandas as pd
df = pd.DataFrame(
    {"Test": range(20)}
)
root = Tk() # this is to close the dialogue box later
try:
    # with block automatically closes file
    with filedialog.asksaveasfile(mode='w', defaultextension=".xlsx") as file:
        df.to_excel(file.name)
except (AttributeError, TypeError):
    # if user cancels save, filedialog returns None rather than a file object, and the 'with' will raise an error
    # (AttributeError before Python 3.11, TypeError from 3.11 on)
    print("The user cancelled save")
root.destroy() # close the dialogue box
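Since the original question asked about asksaveasfilename specifically, here is a rough sketch of that route. asksaveasfilename returns the chosen path as a string (empty if the user cancels), so nothing has to be opened or parsed. The helper export_with_dialog and the main function are names I made up for illustration; the DataFrame is the same dummy one as above, and to_excel is assumed to have an Excel writer such as openpyxl available.

```python
def export_with_dialog(write_file, ask_path):
    """Ask for a path, then call write_file(path) unless the user cancelled.

    ask_path is any zero-argument callable returning a path string;
    asksaveasfilename returns an empty value on cancel.
    """
    path = ask_path()
    if not path:
        return None
    write_file(path)
    return path


def main():
    # Imported here so the helper above can be reused without a GUI.
    import pandas as pd
    from tkinter import Tk, filedialog

    df = pd.DataFrame({"Test": range(20)})
    root = Tk()
    root.withdraw()  # hide the empty root window behind the dialog
    saved = export_with_dialog(
        lambda path: df.to_excel(path, header=False, index=True),
        lambda: filedialog.asksaveasfilename(
            title="Save File",
            defaultextension=".xlsx",
            # filetypes must be a sequence of (label, pattern) pairs
            filetypes=(("Excel File", "*.xlsx"),),
        ),
    )
    root.destroy()
    if saved:
        print("Saved to", saved)
    else:
        print("The user cancelled save")
```

Calling main() shows the dialog. Note that the question's code passed filetypes as a single flat tuple, whereas Tkinter expects a sequence of pairs.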
I'm trying to insert a picture into a Word document using python-docx but running into errors.
The code is simply:
document.add_picture("test.jpg", width = Cm(2.0))
From looking at the python-docx documentation I can see that the following XML should be generated:
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="1" name="python-powered.png"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId7"/>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr>
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="859536" cy="343814"/>
</a:xfrm>
<a:prstGeom prst="rect"/>
</pic:spPr>
</pic:pic>
This does in fact get generated in my document.xml file (as seen when unzipping the docx file). However, looking into the OOXML format, I can see that the image should also be saved under the media folder and the relationship should be mapped in word/_rels/document.xml.rels:
<Relationship Id="rId20"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image20.png"/>
None of this happens, however, and when I open the Word document I'm met with a "The picture can't be displayed" placeholder.
Can anyone help me understand what is going on?
It looks like the image is not embedded the way it should be, and that I would need to insert it in the media folder and add the mapping for it myself. However, as this is a well-documented feature, it should be working as expected.
UPDATE:
Testing it out with an empty docx file, the image does get added as expected, which leads me to believe it might have something to do with the python-docx-template library. (https://github.com/elapouya/python-docx-template)
It uses python-docx and jinja to allow templating capabilities but runs and works the same way python-docx should. I added the image to a subdoc which then gets inserted into a full document at a given place.
A sample code can be seen below (from https://github.com/elapouya/python-docx-template/blob/master/tests/subdoc.py):
from docxtpl import DocxTemplate
from docx.shared import Inches
tpl=DocxTemplate('test_files/subdoc_tpl.docx')
sd = tpl.new_subdoc()
sd.add_paragraph('A picture :')
sd.add_picture('test_files/python_logo.png', width=Inches(1.25))
context = {
'mysubdoc' : sd,
}
tpl.render(context)
tpl.save('test_files/subdoc.docx')
I'll keep this up in case anyone else manages to make the same mistake as I did :) I managed to debug it in the end.
The problem was in how I used the python-docx-template library. I opened up a DocxTemplate like so:
report_output = DocxTemplate(template_path)
DoThings(value,template_path)
report_output.render(dictionary)
report_output.save(output_path)
But I accidentally opened it up twice. Instead of passing the template to a function, when working with it, I passed a path to it and opened it again when creating subdocs and building them.
def DoThings(data,template_path):
    doc = DocxTemplate(template_path)
    temp_finding = doc.new_subdoc()
    #DO THINGS
Finally after I had the subdocs built, I rendered the first template which seemed to work fine for paragraphs and such but I'm guessing the images were added to the "second" opened template and not to the first one that I was actually rendering. After passing the template to the function it started working as expected!
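A minimal sketch of the corrected shape, assuming the subdoc only needs paragraphs: the template is opened once and the same object is handed to the helper. The function names build_findings and make_report, and the context key 'mysubdoc' (borrowed from the sample above), are placeholders for illustration rather than my real code.

```python
def build_findings(tpl, findings):
    """Build a subdocument on the template object that will be rendered."""
    subdoc = tpl.new_subdoc()  # created on the caller's template, not a copy
    for text in findings:
        subdoc.add_paragraph(text)
    return subdoc


def make_report(template_path, output_path, findings):
    # Third-party import kept local so build_findings has no dependencies.
    from docxtpl import DocxTemplate

    tpl = DocxTemplate(template_path)  # opened exactly once
    context = {'mysubdoc': build_findings(tpl, findings)}
    tpl.render(context)
    tpl.save(output_path)
```

The point of the design is simply that new_subdoc() and render() are called on the same DocxTemplate instance, so anything added to the subdoc, images included, belongs to the document that is actually saved.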
I came across this problem too, and it was solved by removing the width=(1.0) parameter from the add_picture call.
With width=(1.0) in place, I could not see the picture in test.docx,
so it MIGHT BE caused by an inappropriate size being set for the picture.
To add pictures, headings, and paragraphs to an existing document:
doc = Document(full_path) # open an existing document with existing styles
for row in tableData: # list from the json api ...
    print ('row {}'.format(row))
    level = row['level']
    levelStyle = 'Heading ' + str(level)
    title = row['title']
    heading = doc.add_heading( title , level)
    heading.style = doc.styles[levelStyle]
    p = doc.add_paragraph(row['description'])
    if row['img_http_path']:
        ip = doc.add_paragraph()
        r = ip.add_run()
        r.add_text(row['img_name'])
        r.add_text("\n")
        r.add_picture(row['img_http_path'], width = Cm(15.0))
doc.save(full_path)
I made a VBScript that opens an Excel document and then runs a Python program that pulls data from the document's tables and prints it to a text file. The script is supposed to wait until the Python program has finished creating the text file and then close the Excel document, but for whatever reason my Python program closes before it even has a chance to create that text file.
I even changed the Python code to just print a simple 'Hello World' into a new text document, in case pulling data from Excel was causing problems, but the text document still wasn't created.
This is the script that I'm running:
Set xl = CreateObject("Excel.application")
xl.Application.Workbooks.Open "C:\Users\V\Documents\_PROGRAMS_\TEST.xlsx"
xl.Application.Visible = True
Dim oshell
Set oshell = WScript.CreateObject("WScript.Shell")
oshell.CurrentDirectory = "C:\Users\V\Documents\_PROGRAMS_\"
windowStyle = 1
waitUntilFinished = True
oshell.run "python table.py", windowStyle, waitUntilFinished
xl.Application.Quit
I don't think adding the python program is important since that isn't really the problem. Although I will say that I tried putting a delay in the python program to see if that would change anything (it didn't).
I thought adding the two extra arguments to .run would make it wait until the process is finished, but I guess I must be missing something?
I'm just starting to learn how to use VBScript, so any explanations of code would be welcome!
Thanks!
EDIT: So after more testing, it seems that it does have something to do with accessing the Excel document, as just printing 'Hello World' to a file did actually work and the file was created (I made it in the wrong directory by accident, so I was looking in the wrong place). But when I try it with the data from the Excel document, no file is created; the program just ends.
So here's the python code I wrote:
#!/usr/bin/python27
import pandas as pd
table = pd.read_excel("TEST.xlsx") #Get excel doc
file = open("text.txt", "w") #Open new file
file.write(table.columns.values) #Print out column headers
file.write("Hello!")
file.close()
Hoping for some help, as I can't find a solution.
We currently have a lot of manual data input done by people reading PDF files, and I have been asked to find a way to cut this time down. My solution would be to transform each PDF into a much more easily readable format, then use grep to get rid of the standard fields (leaving just the data behind). This would then be uploaded into a template, and from there into SAP.
However, the main problem has come at the first hurdle: transforming the PDF into a txt file. The code I use is as follows -
import sys
import pyPdf
def getPDFContent(path):
    content = ""
    pdf = pyPdf.PdfFileReader(file(path, "rb"))
    for i in range(0, pdf.getNumPages()):
        content += pdf.getPage(i).extractText() + "\n"
    content = " ".join(content.replace(u"\xa0", " ").strip().split())
    return content
f = open('test.txt', 'w+')
f.write(getPDFContent("Adminform.pdf").encode("ascii", "ignore"))
f.close()
This works, however it ignores some data from the PDF files. To show you what I mean, this PDF page -
http://s23.postimg.org/6dqykomqj/error.png
From the first section (gender, title, name) produces the below -
*Title: *Legal First Name (s): *Your forename and second name (if applicable) as it appears on your passport or birth certificate. Address: *Legal Surname: *Your surname as it appears on your passport or birth certificate
Basically, the actual data that I want to capture is not being converted.
Anyone have a fix for this?
Thanks,
Generally speaking, converting PDFs to text is a bad idea. It is almost always messy.
There are Linux utilities that do what you have implemented, but I don't expect them to do any better.
I can suggest Tabula, which you can find at:
http://tabula.technology/
It is meant for extracting tables from PDFs by manually delineating the boundaries of each table, but running it on a PDF with no tables outputs the text with some formatting retained.
Some automation is available, although it is limited. See:
https://github.com/tabulapdf/tabula-extractor/wiki/Using-the-command-line-tabula-extractor-tool
Also, although it may not be entirely relevant here, you can use OpenRefine to manage messy data. See:
http://openrefine.org/
I'll try to give a brief background here. I recently received a large amount of data that was all digitized from paper maps. Each map was saved as an individual file that contains a number of records (polygons mostly). My goal is to merge all of these files into one shapefile or geodatabase, which is an easy enough task. However, other than spatial information, the records in the file do not have any distinguishing information so I would like to add a field and populate it with the original file name to track its provenance. For example, in the file "505_dmg.shp" I would like each record to have a "505_dmg" id in a column in the attribute table labeled "map_name". I am trying to automate this using Python and feel like I am very close. Here is the code I'm using:
# Import system module
import arcpy
from arcpy import env
from arcpy.sa import *
# Set overwrite on/off
arcpy.env.overwriteOutput = "TRUE"
# Define workspace
mywspace = "K:/Research/DATA/ADS_data/Historic/R2_ADS_Historical_Maps/Digitized Data/Arapahoe/test"
print mywspace
# Set the workspace for the ListFeatureClass function
arcpy.env.workspace = mywspace
try:
    for shp in arcpy.ListFeatureClasses("","POLYGON",""):
        print shp
        map_name = shp[0:-4]
        print map_name
        arcpy.AddField_management(shp, "map_name", "TEXT","","","20")
        arcpy.CalculateField_management(shp, "map_name","map_name", "PYTHON")
except:
    print "Fubar, It's not working"
    print arcpy.GetMessages()
else:
    print "You're a genius Aaron"
The output I receive from running this script:
>>>
K:/Research/DATA/ADS_data/Historic/R2_ADS_Historical_Maps/Digitized Data/Arapahoe/test
505_dmg.shp
505_dmg
506_dmg.shp
506_dmg
You're a genius Aaron
Appears successful, right? Well, it has been...almost: a field was added and populated for both files, and it is perfect for 505_dmg.shp file. Problem is, 506_dmg.shp has also been labeled "505_dmg" in the "map_name" column. Though the loop appears to be working partially, the map_name variable does not seem to be updating. Any thoughts or suggestions much appreciated.
Thanks,
Aaron
I received a solution from the ESRI discussion board:
https://geonet.esri.com/thread/114520
Basically, a small edit in the Calculate field function did the trick. Here is the new code that worked:
arcpy.CalculateField_management(shp, "map_name","\"" + map_name + "\"", "PYTHON")
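As I understand it, the third argument is an expression string that the PYTHON parser evaluates, so the value has to arrive wrapped in quotes to be treated as a string literal rather than as a bare name. The sketch below only illustrates how that expression string is built; quoted_literal is a helper name I invented, the file names are the two from my test folder, and the arcpy call is left as a comment so the snippet runs without ArcGIS.

```python
def quoted_literal(value):
    """Wrap value in double quotes so the parser sees a string literal."""
    return '"' + value + '"'


shapefiles = ["505_dmg.shp", "506_dmg.shp"]
for shp in shapefiles:
    map_name = shp[0:-4]                  # strip the ".shp" extension
    expression = quoted_literal(map_name)
    print(shp, "->", expression)
    # With arcpy available, the field would then be populated with:
    # arcpy.CalculateField_management(shp, "map_name", expression, "PYTHON")
```

This assumes the file names contain no double quotes themselves, which holds for my data.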