python-docx does not add picture - python

I'm trying to insert a picture into a Word document using python-docx but running into errors.
The code is simply:
document.add_picture("test.jpg", width = Cm(2.0))
From looking at the python-docx documentation I can see that the following XML should be generated:
<pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">
  <pic:nvPicPr>
    <pic:cNvPr id="1" name="python-powered.png"/>
    <pic:cNvPicPr/>
  </pic:nvPicPr>
  <pic:blipFill>
    <a:blip r:embed="rId7"/>
    <a:stretch>
      <a:fillRect/>
    </a:stretch>
  </pic:blipFill>
  <pic:spPr>
    <a:xfrm>
      <a:off x="0" y="0"/>
      <a:ext cx="859536" cy="343814"/>
    </a:xfrm>
    <a:prstGeom prst="rect"/>
  </pic:spPr>
</pic:pic>
This XML does in fact get generated in my document.xml file (I checked by unzipping the .docx file). However, looking into the OOXML format, I can see that the image should also be saved under the word/media folder, and the relationship should be mapped in word/_rels/document.xml.rels:
<Relationship Id="rId20"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image20.png"/>
None of this happens, however, and when I open the Word document I'm met with a "The picture can't be displayed" placeholder.
Can anyone help me understand what is going on?
It looks like the image is not embedded the way it should be, and that I would need to put it in the media folder and add the relationship mapping myself. However, since this is a well-documented feature, it should be working as expected.
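To check whether a picture actually made it into the package, the .docx can be opened as a ZIP file with only the standard library. The sketch below (the helper name `inspect_images` is hypothetical) lists the image relationships in word/_rels/document.xml.rels and the files stored under word/media/. If document.xml has an `r:embed` id but neither list contains the image, the picture was never added to the part being saved, which would be consistent with the placeholder described above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of package relationship parts (*.rels) and the image relationship type.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
IMAGE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"


def inspect_images(docx_path):
    """Return (image targets in document.xml.rels, files stored under word/media/)."""
    with zipfile.ZipFile(docx_path) as zf:
        media = [n for n in zf.namelist() if n.startswith("word/media/")]
        root = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    targets = [
        rel.get("Target")
        for rel in root.iter(REL_NS + "Relationship")
        if rel.get("Type") == IMAGE_REL
    ]
    return targets, media


# Usage (hypothetical file name):
# targets, media = inspect_images("report.docx")
# print(targets, media)
```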
UPDATE:
Testing it with an empty docx file, the image does get added as expected, which leads me to believe it might have something to do with the python-docx-template library (https://github.com/elapouya/python-docx-template).
It uses python-docx and Jinja to provide templating capabilities, but otherwise runs and works the same way python-docx does. I added the image to a subdoc, which then gets inserted into the full document at a given place.
A sample code can be seen below (from https://github.com/elapouya/python-docx-template/blob/master/tests/subdoc.py):
from docxtpl import DocxTemplate
from docx.shared import Inches
tpl=DocxTemplate('test_files/subdoc_tpl.docx')
sd = tpl.new_subdoc()
sd.add_paragraph('A picture :')
sd.add_picture('test_files/python_logo.png', width=Inches(1.25))
context = {
'mysubdoc' : sd,
}
tpl.render(context)
tpl.save('test_files/subdoc.docx')

I'll keep this up in case anyone else manages to make the same mistake as I did :) I managed to debug it in the end.
The problem was in how I used the python-docx-template library. I opened up a DocxTemplate like so:
report_output = DocxTemplate(template_path)
DoThings(value,template_path)
report_output.render(dictionary)
report_output.save(output_path)
But I accidentally opened it twice: instead of passing the template object to the function that works with it, I passed the template path and opened the template again there to create and build the subdocs.
def DoThings(data, template_path):
    doc = DocxTemplate(template_path)  # opens a second, separate copy of the template
    temp_finding = doc.new_subdoc()
    # DO THINGS
Finally, after the subdocs were built, I rendered the first template. That seemed to work fine for paragraphs and such, but I'm guessing the images were added to the "second" opened template rather than to the first one that I was actually rendering. After passing the template object to the function instead, it started working as expected!

I came across this problem, and it was solved after I removed the parameter width=(1.0) from the add_picture call.
When width=(1.0) was passed, I could not see the picture in test.docx,
so it MIGHT be caused by an inappropriate size being set for the picture.

To add pictures, headings and paragraphs to an existing document:
from docx import Document
from docx.shared import Cm

# full_path and tableData are defined elsewhere in the script
doc = Document(full_path)  # open an existing document with existing styles
for row in tableData:  # list from the JSON API ...
    print('row {}'.format(row))
    level = row['level']
    levelStyle = 'Heading ' + str(level)
    title = row['title']
    heading = doc.add_heading(title, level)
    heading.style = doc.styles[levelStyle]
    p = doc.add_paragraph(row['description'])
    if row['img_http_path']:
        ip = doc.add_paragraph()
        r = ip.add_run()
        r.add_text(row['img_name'])
        r.add_break()  # add_text("\n") would not produce a line break
        r.add_picture(row['img_http_path'], width=Cm(15.0))
doc.save(full_path)

Related

Extracting text from MS Word Document uploaded through FileUpload from ipyWidgets in Jupyter Notebook

I am trying to allow the user to upload an MS Word file and then run a certain function that takes a string as its input argument. I am uploading the Word file through FileUpload; however, I am getting an encoded object. I am unable to decode it as UTF-8, and upload.value or upload.data just returns encoded data.
Any ideas how I can extract the content from the uploaded Word file?
> upload = widgets.FileUpload()
> upload
#I select the file I want to upload
> upload.value #Returns coded text
> upload.data #Returns coded text
> #Previously upload['content'] worked, but I read this no longer works in ipywidgets 8.0
Modern MS Word files (.docx) are actually ZIP files.
The body text (but not the page headers) is stored in an XML document called word/document.xml inside the ZIP file.
The python-docx module can be used to extract text from these documents. It is mainly used for creating documents, but it can also read existing ones. Example from here.
>>> import docx
>>> gkzDoc = docx.Document('grokonez.docx')
>>> fullText = []
>>> for paragraph in gkzDoc.paragraphs:
...     fullText.append(paragraph.text)
...
Note that this only extracts the text from paragraphs, not, e.g., the text from tables.
Edit:
I want to be able to upload the MS file through the FileUpload widget.
There are a couple of ways you can do that.
First, isolate the actual file data. upload.data is actually a list (one bytes object per uploaded file), see here. So do something like:
rawdata = upload.data[0]
(Note that this format has changed across different versions of ipywidgets. The above example is from the documentation of the latest version. Read the documentation for your version, or investigate the data in IPython, and adjust accordingly.)
You could write rawdata to e.g. foo.docx and open that. That would certainly work, but it seems somewhat inelegant.
Alternatively, docx.Document can work with file-like objects, so you can wrap the data in an io.BytesIO object and use that.
Like this:
foo = io.BytesIO(rawdata)
doc = docx.Document(foo)
After tweaking @Roland Smith's great suggestions, the following code finally worked:
import io
import ipywidgets as widgets
from docx import Document
upload = widgets.FileUpload()
upload
# in a later cell, after selecting a file:
rawdata = upload.data[0]
test = io.BytesIO(rawdata)
doc = Document(test)
for p in doc.paragraphs:
    print(p.text)

Not able to set "right to left" alignment style for text in python-docx

I am trying to write some text to a docx file using python-docx. I want to align the text from right to left and I have added a style for that, which is not working.
Here's the code:
from docx import Document
from docx.enum.style import WD_STYLE_TYPE
missingwords = Document()
styles = missingwords.styles
style = missingwords.styles.add_style('rtl', WD_STYLE_TYPE.PARAGRAPH)
style.font.rtl = True
paragraph = missingwords.add_paragraph("Hello world", style='rtl')
I haven't gotten around to playing with docx yet (I've mostly used Excel Python modules), but based on the documentation here, it looks like you're modifying the wrong property of the style. Per this definition of the rtl property, the Font property would only modify an added run (via myparagraph.add_run("Hello World", style="rtl")). As far as I can tell, the code you're looking for is:
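from docx import Document
from docx.enum.style import WD_STYLE_TYPE
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
missingwords = Document()
style = missingwords.styles.add_style('rtl', WD_STYLE_TYPE.PARAGRAPH)
style.paragraph_format.alignment = WD_PARAGRAPH_ALIGNMENT.RIGHT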
And then you can go ahead and add the paragraph as you were doing before:
paragraph = missingwords.add_paragraph("Hello world",style='rtl')
Again, just going off the documentation, so let me know if that works.

Lost formatting and image after search and replace using python-docx

Experts,
I have a template docx report, which has images and standard formatting inside it. Using docx, all I did was search for some tags and replace them with values from a config file.
Search & replace was working as expected, but the output file lost all the images and the formatting. Do you know what went wrong? All I did was modify example-makedocument.py to use my docx file.
I've searched the discussion on the python-docx librelist and their page on GitHub; there were a lot of questions like this, but they remained unanswered.
Thank you.
--- my script is a simple one like this ---
from docx import *
from ConfigParser import SafeConfigParser
filename = "template.docx"
document = opendocx(filename)
relationships = relationshiplist()
body = document.xpath('/w:document/w:body',namespaces=nsprefixes)[0]
####### get config file
parser = SafeConfigParser()
parser.read('../TESTING1-config.txt')
######## Search and replace
print 'Searching for something in a paragraph ...',
if search(body, ''):
    print 'found it!'
else:
    print 'nope.'
print 'Replacing ...',
body = advReplace(body, '', parser.get('ASD', 'ASD'))
print 'done.'
####### #Create our properties, contenttypes, and other support files
title = 'Python docx demo'
subject = 'A practical example of making docx from Python'
creator = 'Mike MacCana'
keywords = ['python', 'Office Open XML', 'Word']
coreprops = coreproperties(title=title, subject=subject, creator=creator,keywords=keywords)
appprops = appproperties()
contenttypes = contenttypes()
websettings = websettings()
wordrelationships = wordrelationships(relationships)
savedocx(document, coreprops, appprops, contenttypes, websettings, wordrelationships, 'Welcome to the Python docx module.docx')
Python-docx only copies over the document.xml file in the original Docx zip. Everything else is discarded and recreated either from a function or from a preexisting template file. This unfortunately includes the document.xml.rels file that is responsible for mapping images.
The oodocx module that I have developed copies over everything from the old Docx and, at least in my experience, plays nicely with images.
I have answered a similar question about python-docx. python-docx is not meant to preserve the images in a .docx and write them back out.
python-docx is not a templating engine for .docx files.

How change table contents and style using python-docx?

I found python-docx, and it looks very smart, but I have to do some tasks that are not well documented.
I need to open a .docx template that contains a table, and for all the instances in a previously created list, I have to format them into the table inside the template.
Probably I've found a solution.
It relies on document.xpath; one way to map the document's structure is to decompress the .docx and read the ./word/document.xml file.
PATH_CELL = 'the path you individuate in document.xml'
docbody = document.xpath('/w:document/w:body'+PATH_CELL,
namespaces=nsprefixes)[0]
print 'Replacing ...',
docbody = replace(docbody,'Welcome','Hello')
That's how I got it working. Does anyone have another approach?

Rendering dynamically generated HTML through pyramid Response

I am new to Python's Pyramid framework, so please help me.
I have an HTML page that is generated dynamically. It is generated by a Python script which extracts tags/tables from some 'xyz.html' [using BeautifulSoup] and writes them to another file, 'abc.html'.
Now I need to send this HTML page ('abc.html') back as a Response object from 'pyramid.response'.
How can I do this? I tried the following:
_resp = Response()
_resp.headerlist = [('Content-Type', 'text/html; charset=UTF-8')]
_resp.app_iter = open('abc.html', 'r')
return _resp
and also
with open('abc.html', 'r') as f:
    data = f.read()
    f.close()
return Response(data, content_type='text/html')
Neither worked.
PS: I cannot use renderer="package:subpack/abc.html" or any similar renderer, because the generated HTML is stored in a dynamically generated location every time, so I cannot know the final storage location of the file in advance.
Thanks in advance for you help.
I'm a little surprised your first example doesn't work. Check out this cookbook entry on it from the Pyramid docs and see if that helps.
http://docs.pylonsproject.org/projects/pyramid_cookbook/en/latest/static_assets/files.html#serving-file-content-dynamically
