I tried to build a table of contents in ReportLab but failed (I did not insist too much, as it seems to be even more than what I need; I might give it another try in the future).
For now I'd be quite happy to have some simple text as a guide for the document. The document is mainly composed of some Pandas-generated numbered grids, and I'd simply like to have a text with the titles of the grids at the beginning of the ReportLab-generated .pdf.
My goal looked very simple: append two Platypus lists, one with the titles and one with the grids, but it did not work. So I moved to an even simpler goal and tried to append two Platypus lists of plain text... but that did not work either. :-(
My code is below:
# settings
from reportlab.pdfgen import canvas
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import *
styles = getSampleStyleSheet()
PATH_OUT = "C:\\"
titolo = 'Test.pdf'
doc = SimpleDocTemplate( PATH_OUT + titolo )
elements0 = []
elements1 = []
elements2 = []
# 1-st platypus
elements0.append(Paragraph("The Platypus0", styles['Heading1']))
elements0.append(Paragraph("Very <i>Special</i>!", styles['Normal']))
# 2-nd platypus
elements1.append(Paragraph("The Platypus1", styles['Heading1']))
elements1.append(Paragraph("Very <i>Special</i>!", styles['Normal']))
# append them
elements2 = elements0.append(elements1)
# Write the document
doc.build(elements2)
The issue I have is that this crashes miserably, apparently because the resulting object has no len().
Do you have any suggestion that might help with this? If I use elements0 or elements1 on its own, each works smoothly, but when I try to append one to the other it does not. Any suggestion?
Thank you so much :-) Fabio.
append on a list adds the item in place and does not return a new list. With the following:
elements2 = elements0.append(elements1)
elements2 is assigned the value None (which is what doc.build() then fails to take the len() of), and elements0 now contains a new item, which is the whole elements1 list:
elements0[0] -> Paragraph("The Platypus0", styles['Heading1'])
elements0[1] -> Paragraph("Very <i>Special</i>!", styles['Normal'])
elements0[2] -> [Paragraph("The Platypus1", styles['Heading1']), Paragraph("Very <i>Special</i>!", styles['Normal'])]
If you want to put the two lists together, use list concatenation:
elements2 = elements0 + elements1
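The difference between append(), + and extend() can be checked with plain lists. In this sketch, strings stand in for the Paragraph flowables (a list stores them the same way), and the helper names are only for illustration.

```python
def combine_by_append(first, second):
    first = list(first)              # work on a copy so the caller's list is untouched
    returned = first.append(second)  # mutates in place and returns None
    return returned, first

def combine_by_concat(first, second):
    # "+" builds a new, flat list and leaves both inputs unchanged
    return first + second

def combine_by_extend(first, second):
    combined = list(first)
    combined.extend(second)          # also in place, but adds the items one by one
    return combined

elements0 = ["Platypus0 heading", "Platypus0 text"]
elements1 = ["Platypus1 heading", "Platypus1 text"]

returned, mutated = combine_by_append(elements0, elements1)
print(returned)    # None
print(mutated[2])  # ['Platypus1 heading', 'Platypus1 text'] (one nested item)
print(combine_by_concat(elements0, elements1))
```

With ReportLab, either doc.build(elements0 + elements1) or elements0.extend(elements1) followed by doc.build(elements0) hands build() one flat list of flowables.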
I have searched high and low (on various forums) and simply can't find the answer. I have a table in a docx file and would like to use the docx Python module to modify it.
I need to add a column to the left side of the table. According to the documentation, using the add_column() function adds a column to the right side of the table.
I have also tried changing the directionality of the table to a RTL table with the following code:
import docx
from docx.enum.table import WD_TABLE_DIRECTION
file = 'test.docx'
doc = docx.Document(file)
tbls = doc.tables #this gives me 3 tables in a list of table objects
test = tbls[1]
test.table_direction = WD_TABLE_DIRECTION.RTL
test.add_column(1)
doc.save(file)
Upon opening the resulting file, I found that the code still adds the column only to the right side.
Does anyone know how to add a column to the left side of a table?
Many thanks in advance!
You can try setting the direction to LTR, and also use Inches to define the column width so that the added column is displayed correctly.
import docx
from docx.enum.table import WD_TABLE_DIRECTION
from docx.shared import Cm, Inches
file = 'test.docx'
doc = docx.Document(file)
tbls = doc.tables
test = tbls[1]
test.table_direction = WD_TABLE_DIRECTION.LTR
test.add_column(Inches(1.0))
doc.save(file)
I'm trying to insert a picture into a Word document using python-docx but running into errors.
The code is simply:
document.add_picture("test.jpg", width = Cm(2.0))
From looking at the python-docx documentation I can see that the following XML should be generated:
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
<pic:nvPicPr>
<pic:cNvPr id="1" name="python-powered.png"/>
<pic:cNvPicPr/>
</pic:nvPicPr>
<pic:blipFill>
<a:blip r:embed="rId7"/>
<a:stretch>
<a:fillRect/>
</a:stretch>
</pic:blipFill>
<pic:spPr>
<a:xfrm>
<a:off x="0" y="0"/>
<a:ext cx="859536" cy="343814"/>
</a:xfrm>
<a:prstGeom prst="rect"/>
</pic:spPr>
</pic:pic>
This does in fact get generated in my document.xml file (when unzipping the docx file). However, looking into the OOXML format, I can see that the image should also be saved under the media folder and the relationship should be mapped in word/_rels/document.xml.rels:
<Relationship Id="rId20"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image20.png"/>
None of this happens, however, and when I open the Word document I'm met with a "The picture can't be displayed" placeholder.
Can anyone help me understand what is going on?
It looks like the image is not embedded the way it should be, and I would need to insert it into the media folder and add the mapping for it myself; however, as this is a well-documented feature, it should work as expected.
UPDATE:
Testing it with an empty docx file, the image does get added as expected, which leads me to believe it might have something to do with the python-docx-template library (https://github.com/elapouya/python-docx-template).
It uses python-docx and Jinja2 to add templating capabilities, but otherwise runs and works the same way python-docx does. I added the image to a subdoc, which then gets inserted into the full document at a given place.
A sample code can be seen below (from https://github.com/elapouya/python-docx-template/blob/master/tests/subdoc.py):
from docxtpl import DocxTemplate
from docx.shared import Inches
tpl=DocxTemplate('test_files/subdoc_tpl.docx')
sd = tpl.new_subdoc()
sd.add_paragraph('A picture :')
sd.add_picture('test_files/python_logo.png', width=Inches(1.25))
context = {
'mysubdoc' : sd,
}
tpl.render(context)
tpl.save('test_files/subdoc.docx')
I'll keep this up in case anyone else manages to make the same mistake I did :) I managed to debug it in the end.
The problem was in how I used the python-docx-template library. I opened up a DocxTemplate like so:
report_output = DocxTemplate(template_path)
DoThings(value,template_path)
report_output.render(dictionary)
report_output.save(output_path)
But I accidentally opened it twice. Instead of passing the template object to the function that worked with it, I passed the path and opened the template again when creating and building the subdocs:
def DoThings(data, template_path):
    doc = DocxTemplate(template_path)
    temp_finding = doc.new_subdoc()
    # DO THINGS
Finally, after I had the subdocs built, I rendered the first template. That seemed to work fine for paragraphs and such, but I'm guessing the images were added to the second opened template and not to the first one that I was actually rendering. After passing the template object to the function instead, it started working as expected!
I came across this problem, and it was solved after I removed the parameter width=(1.0) from the add_picture call.
When width=(1.0) was passed, I could not see the picture in test.docx,
so it MIGHT be caused by an inappropriate size being set for the picture.
To add pictures, headings and paragraphs to an existing document:
from docx import Document
from docx.shared import Cm

doc = Document(full_path)  # open an existing document with existing styles
for row in tableData:  # list from the json api ...
    print('row {}'.format(row))
    level = row['level']
    levelStyle = 'Heading ' + str(level)
    title = row['title']
    heading = doc.add_heading(title, level)
    heading.style = doc.styles[levelStyle]
    p = doc.add_paragraph(row['description'])
    if row['img_http_path']:
        ip = doc.add_paragraph()
        r = ip.add_run()
        r.add_text(row['img_name'])
        r.add_break()  # line break between the image name and the picture
        # add_picture() needs a local file path or a file-like object, not a URL
        r.add_picture(row['img_http_path'], width=Cm(15.0))
doc.save(full_path)
I'll try to give a brief background here. I recently received a large amount of data that was all digitized from paper maps. Each map was saved as an individual file that contains a number of records (polygons mostly). My goal is to merge all of these files into one shapefile or geodatabase, which is an easy enough task. However, other than spatial information, the records in the file do not have any distinguishing information so I would like to add a field and populate it with the original file name to track its provenance. For example, in the file "505_dmg.shp" I would like each record to have a "505_dmg" id in a column in the attribute table labeled "map_name". I am trying to automate this using Python and feel like I am very close. Here is the code I'm using:
# Import system module
import arcpy
from arcpy import env
from arcpy.sa import *
# Set overwrite on/off
arcpy.env.overwriteOutput = "TRUE"
# Define workspace
mywspace = "K:/Research/DATA/ADS_data/Historic/R2_ADS_Historical_Maps/Digitized Data/Arapahoe/test"
print mywspace
# Set the workspace for the ListFeatureClass function
arcpy.env.workspace = mywspace
try:
    for shp in arcpy.ListFeatureClasses("","POLYGON",""):
        print shp
        map_name = shp[0:-4]
        print map_name
        arcpy.AddField_management(shp, "map_name", "TEXT","","","20")
        arcpy.CalculateField_management(shp, "map_name","map_name", "PYTHON")
except:
    print "Fubar, It's not working"
    print arcpy.GetMessages()
else:
    print "You're a genius Aaron"
The output I receive from running this script:
>>>
K:/Research/DATA/ADS_data/Historic/R2_ADS_Historical_Maps/Digitized Data/Arapahoe/test
505_dmg.shp
505_dmg
506_dmg.shp
506_dmg
You're a genius Aaron
Appears successful, right? Well, it has been... almost: a field was added and populated for both files, and it is perfect for the 505_dmg.shp file. The problem is that 506_dmg.shp has also been labeled "505_dmg" in the "map_name" column. Though the loop appears to be working partially, the map_name variable does not seem to be updating. Any thoughts or suggestions are much appreciated.
Thanks,
Aaron
I received a solution from the ESRI discussion board:
https://geonet.esri.com/thread/114520
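Basically, a small edit to the CalculateField_management call did the trick: the expression argument is evaluated by the tool, so the file name has to be passed in as a quoted string literal rather than as the name of a Python variable. Here is the new code that worked: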
arcpy.CalculateField_management(shp, "map_name","\"" + map_name + "\"", "PYTHON")
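A small variation on the same fix, offered as a sketch rather than the ESRI-board answer: repr() produces a valid Python string literal for the name, which also copes with a name that contains a quote character. os.path.splitext replaces the shp[0:-4] slice, and calculate_field_expression is a hypothetical helper; arcpy is not imported here, so the call is shown as a comment only.

```python
import os

def calculate_field_expression(shp_filename):
    """Return a Python string literal holding the file name without its extension."""
    map_name = os.path.splitext(shp_filename)[0]  # "505_dmg.shp" -> "505_dmg"
    # repr() adds the quotes (and escapes them if needed)
    return repr(map_name)

print(calculate_field_expression("505_dmg.shp"))  # '505_dmg'

# Hypothetical use inside the loop:
# arcpy.CalculateField_management(shp, "map_name",
#                                 calculate_field_expression(shp), "PYTHON")
```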
import shapefile
r = shapefile.Reader(r"C:\Users\Me\Desktop\py\mis.dbf")
That is as far as I get; it must be something simple I don't know about. I have already spent an embarrassing amount of time on this little thing. Could one of you more knowledgeable folks tell me what I missed?
It looks like you're good to go unless you're getting an error that you didn't mention.
First of all, you're looking at the dbf file, which contains the shapefile attributes (similar to a spreadsheet). But that doesn't matter, because the Reader ignores the extension and will also try to find the .shp and .shx files, which contain the geometry and the geometry record index.
If you're just interested in the attributes, try the following after your example above:
# Print the dbf field names
print [f[0] for f in r.fields]
# Print the first record:
print r.record(0)
# Loop through all the records using an iterator:
for rec in r.iterRecords(): print rec
I have created a series of PDF documents (maps) using data driven pages in ESRI ArcMap 10. There is a page 1 and a page 2 for each map, generated from separate *.mxd files. So I have one list of PDF documents containing page 1 of each map and one list of PDF documents containing page 2 of each map. For example: Map1_001.pdf, map1_002.pdf, map1_003.pdf... map2_001.pdf, map2_002.pdf, map2_003.pdf... and so on.
I would like to append these maps, pages 1 and 2, so that both pages end up together in one PDF per map. For example: mapboth_001.pdf, mapboth_002.pdf, mapboth_003.pdf... (they don't have to go into a new PDF file (mapboth); it's fine to append them to map1).
For each map1_*.pdf:
walk through the directory and append the map2_*.pdf whose number (where the * is) in the file name matches.
There must be a way to do it using python. Maybe with a combination of arcpy, os.walk or os.listdir, and pyPdf and a for loop?
for pdf in os.walk(datadirectory):
??
Any ideas? Thanks kindly for your help.
A PDF file is structured differently from a plain text file. Simply concatenating two PDF files wouldn't work, as the file's structure and contents could be overwritten or become corrupt. You could certainly write your own merger, but that would take a fair amount of time and intimate knowledge of how a PDF is internally structured.
That said, I would recommend that you look into pyPdf. It supports the merging feature that you're looking for.
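A minimal sketch of the whole task, assuming the file names follow the map1_NNN.pdf / map2_NNN.pdf pattern from the question (matched case-insensitively, since the example also shows Map1_001.pdf). The merge uses pypdf (pip install pypdf), the maintained successor of pyPdf, imported inside merge_pair so the pairing logic can be tried without it. pair_map_pages, merge_pair and merge_directory are hypothetical names.

```python
import os
import re
from collections import defaultdict

# map1_001.pdf -> page "1", number "001"; "mapboth_..." outputs do not match
PATTERN = re.compile(r"^map([12])_(\d+)\.pdf$", re.IGNORECASE)

def pair_map_pages(filenames):
    """Group file names by map number: {'001': [page1_file, page2_file], ...}.
    Maps that are missing either page are skipped."""
    groups = defaultdict(dict)
    for name in filenames:
        m = PATTERN.match(os.path.basename(name))
        if m:
            page, number = m.group(1), m.group(2)
            groups[number][page] = name
    return {number: [pages["1"], pages["2"]]
            for number, pages in sorted(groups.items())
            if "1" in pages and "2" in pages}

def merge_pair(paths, out_path):
    from pypdf import PdfReader, PdfWriter  # requires: pip install pypdf
    writer = PdfWriter()
    for path in paths:
        for page in PdfReader(path).pages:
            writer.add_page(page)
    with open(out_path, "wb") as f:
        writer.write(f)

def merge_directory(directory):
    names = [os.path.join(directory, n) for n in os.listdir(directory)]
    for number, paths in pair_map_pages(names).items():
        merge_pair(paths, os.path.join(directory, "mapboth_{}.pdf".format(number)))
```

Keeping the file matching separate from the PDF work makes it easy to print pair_map_pages(os.listdir(...)) first and check the pairs before anything is written.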
This should properly find and collate all the files to be merged; it still needs the actual .pdf-merging code.
Edit: I have added pdf-writing code based on the pyPdf example code. It is not tested, but should (as nearly as I can tell) work properly.
Edit2: realized I had the map-numbering crossways; rejigged it to merge the right sets of maps.
import collections
import glob
import re
# probably need to install this module -
# pip install pyPdf
from pyPdf import PdfFileWriter, PdfFileReader

def group_matched_files(filespec, reg, keyFn, dataFn):
    res = collections.defaultdict(list)
    reg = re.compile(reg)
    for fname in glob.glob(filespec):
        data = reg.match(fname)
        if data is not None:
            res[keyFn(data)].append(dataFn(data))
    return res

def merge_pdfs(fnames, newname):
    print("Merging {} to {}".format(",".join(fnames), newname))
    # create new output pdf
    newpdf = PdfFileWriter()
    # pyPdf reads page contents lazily, so the input files
    # must stay open until the output has been written
    infiles = []
    # for each file to merge
    for fname in fnames:
        inf = open(fname, "rb")
        infiles.append(inf)
        oldpdf = PdfFileReader(inf)
        # for each page in the file
        for pg in range(oldpdf.getNumPages()):
            # copy it to the output file
            newpdf.addPage(oldpdf.getPage(pg))
    # write finished output
    with open(newname, "wb") as outf:
        newpdf.write(outf)
    for inf in infiles:
        inf.close()

def main():
    matches = group_matched_files(
        "map*.pdf",
        r"map(\d+)_(\d+)\.pdf$",
        lambda d: d.group(2),                  # key: map number, e.g. "001"
        lambda d: "map{}_".format(d.group(1))  # value: page prefix, e.g. "map1_"
    )
    for num, prefixes in matches.items():
        # a list, not a generator: merge_pdfs iterates over fnames twice
        merge_pdfs([prefix + num + ".pdf" for prefix in sorted(prefixes)],
                   "merged{}.pdf".format(num))

if __name__ == "__main__":
    main()
I don't have any test PDFs to try to combine, but I tested this with a cat command on text files.
You can try this out (I'm assuming a Unix-based system): merge.py
import os, re

# gs is given bare file names, so work inside the map directory
os.chdir("/home/user/directory_with_maps/")
files = os.listdir(".")
files = [x for x in files if re.search("map1_", x)]
while len(files) > 0:
    current = files[0]
    search = re.search(r"_(\d+)\.pdf", current)
    if search:
        name = search.group(1)
        cmd = "gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=FULLMAP_%s.pdf %s map2_%s.pdf" % (name, current, name)
        os.system(cmd)
    files.remove(current)
Basically, it grabs the map1 list and goes through it, assuming that a matching map2 file exists for each number in the file names. (I can see using a counter to do this instead, padding with 0's to get a similar effect.)
Test the gs command first though, I just grabbed it from http://hints.macworld.com/article.php?story=2003083122212228.
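The os.system() call builds a shell string, which breaks if a file name contains spaces. A sketch of the same Ghostscript invocation through subprocess with an argument list, assuming Python 3.5+ and gs on the PATH; gs_merge_command and merge_map are hypothetical names, and the gs flags are copied unchanged from merge.py.

```python
import re
import subprocess

def gs_merge_command(first, second, output):
    """Return the Ghostscript call from merge.py as an argument list."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            "-sOutputFile=" + output, first, second]

def merge_map(map1_name):
    """Merge map1_NNN.pdf with map2_NNN.pdf into FULLMAP_NNN.pdf."""
    match = re.search(r"_(\d+)\.pdf$", map1_name)
    if match is None:
        return None
    number = match.group(1)
    cmd = gs_merge_command(map1_name, "map2_%s.pdf" % number,
                           "FULLMAP_%s.pdf" % number)
    # no shell is involved, so names with spaces are passed through unchanged
    subprocess.run(cmd, check=True)
    return cmd
```

check=True raises CalledProcessError when gs fails, instead of silently continuing to the next map as os.system() would.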
There are examples of how to do this on the pdfrw project page at Google Code:
http://code.google.com/p/pdfrw/wiki/ExampleTools