Insert several images at once using macro scripting in LibreOffice

Insert several images at once using macro scripting in LibreOffice - python

I am writing a macro for LibreOffice Writer in Python.
I need to insert several images in one document, one after another with minimal space inbetween them.
The folowing code inserts all the images in the same area and all of them are overlapped.
I need to advance the cursor below the inserted image everytime a new image is inserted.
I have tried the cursor.gotoEnd(), cursor.goDown() and other such methods but none seem to work.
How do I make this work?
def InsertAll():
desktop = XSCRIPTCONTEXT.getDesktop()
doc=desktop.loadComponentFromURL('private:factory/swriter','_blank',0,())
text = doc.getText()
cursor = text.createTextCursor()
file_list = glob.glob('/path/of/your/dir/*.png')
for f in file_list:
img = doc.createInstance('com.sun.star.text.TextGraphicObject')
img.GraphicURL = 'file://' + f
text.insertTextContent(cursor, img, False)
cursor.gotoEnd(False) <- doesnt advance the cursor downwards
return None

Insert a paragraph break after each image:
from com.sun.star.text.ControlCharacter import PARAGRAPH_BREAK
text.insertTextContent(cursor, img, False)
text.insertControlCharacter(cursor, PARAGRAPH_BREAK, False)
cursor.gotoEnd(False)
This will separate the images... by a paragraph
Andrew's book is a basic source for solving many OpenOffice scripting problems: +1

Related

Cropping the Mediabox does not work for some pdfs

I wrote a little script which shall blank out the lower half of a PDF document. The document itself shall remain the same size, but the lower half shall be just white.
(This is to remove the "instructions" part from parcel labels of German parcel comanies like DHL and Hermes.)
To do this, I take the PDF page, adjust the Mediabox, and then merge this page onto a new, blank page.
Fortunately, this works as intended with the PDFs I need it for. However, I also tried a few other PDFs and for some, it just does not work. It copies over the complete PDF. This happens for example, when my code is given this file: https://www.veeam.com/veeam_backup_product_overview_ds.pdf
Here is the code:
import pypdf # PyPDF2, 3 and 4 are deprecated. PyPDF is currently in active development
reader = pypdf.PdfReader(source_filename)
writer = pypdf.PdfWriter()
# get first page
page = reader.pages[0]
# create new page
new_page = pypdf.PageObject.create_blank_page( None, width = page.mediabox.width, height = page.mediabox.height )
# crop original
page.mediabox.bottom = ( page.mediabox.top - page.mediabox.bottom ) / 2 + page.mediabox.bottom
# merge original into empty new page
new_page.merge_page( page )
writer.add_page(new_page)
with open(output_file, "wb") as fp:
writer.write(fp)
Can anyone explain why it does not work sometimes?

Replacing a word with another word, and replacing an image with another image in a PDF file through python, is this possible?

I need to replace a K words with K other words for every PDF file I have within a certain path file location and on top of this I need to replace every logo with another logo. I have around 1000 PDF files, and so I do not want to use Adobe Acrobat and edit 1 file at a time. How can I start this?
Replacing words seems at least doable as long as there is a decent PDF reader one can access through Python ( Note I want to do this task in Python ), however replacing an image might be more difficult. I will most likely have to find the dimension of the current image and resize the image being used to replace the current image dynamically, whilst the program runs through these PDF files.
Hi, so I've written down some code regarding this:
from pikepdf import Pdf, PdfImage, Name
import os
import glob
from PIL import Image
import zlib
example = Pdf.open(r'...\Likelihood.pdf')
PagesWithImages = []
ImageCodesForPages = []
# Grab all the pages and all the images in every page.
for i in example.pages:
if len(list(i.images.keys())) >= 1:
PagesWithImages.append(i)
ImageCodesForPages.append(list(i.images.keys()))
pdfImages = []
for i,j in zip(PagesWithImages, ImageCodesForPages):
for x in j:
pdfImages.append(i.images[x])
# Replace every single page using random image, ensure that the dimensions remain the same?
for i in pdfImages:
pdfimage = PdfImage(i)
rawimage = pdfimage.obj
im = Image.open(r'...\panda.jpg')
pillowimage = pdfimage.as_pil_image()
print(pillowimage.height)
print(pillowimage.width)
im = im.resize((pillowimage.width, pillowimage.height))
im.show()
rawimage.write(zlib.compress(im.tobytes()), filter=Name("/FlateDecode"))
rawimage.ColorSpace = Name("/DeviceRGB")
So just one problem, it doesn't actually replace anything. If you're wondering why and how I wrote this code I actually got it from this documentation:
https://buildmedia.readthedocs.org/media/pdf/pikepdf/latest/pikepdf.pdf
Start at Page 53
I essentially put all the pdfImages into a list, as 1 page can have multiple images. In conjunction with this, the last for loop essentially tries to replace all these images whilst maintaining the same width and height size. Also note, the file path names I changed here and it definitely is not the issue.
Again Thank You

I have figured out what I was doing wrong. So for anyone that wants to actually replace an image with another image in place on a PDF file what you do is:
from pikepdf import Pdf, PdfImage, Name
from PIL import Image
import zlib
example = Pdf.open(filepath, allow_overwriting_input=True)
PagesWithImages = []
ImageCodesForPages = []
# Grab all the pages and all the images in every page.
for i in example.pages:
imagelists = list(i.images.keys())
if len(imagelists) >= 1:
for x in imagelists:
rawimage = i.images[x]
pdfimage = PdfImage(rawimage)
rawimage = pdfimage.obj
pillowimage = pdfimage.as_pil_image()
im = Image.open(imagePath)
im = im.resize((pillowimage.width, pillowimage.height))
rawimage.write(zlib.compress(im.tobytes()), filter=Name("/FlateDecode"))
rawimage.ColorSpace = Name("/DeviceRGB")
rawimage.Width, rawimage.Height = pillowimage.width, pillowimage.height
example.save()
Essentially, I changed the arguements in the first line, such that I specify that I can overwrite. In conjunction, I also added the last line which actually allows me to save.

add two images in same line in python-docx

I am trying to add two images in docx file.
Images should be one left side one right side.
After using this below code the image position is working like left and right as I want but they are not on the same line I want.
One is up and others are under that image.
I have tried the WD_ALIGN_PARAGRAPH.RIGHT but I am not getting the result I want.
## Image for Left Side
my_img = document.add_picture(i,width=Inches(0.8),height=Inches(0.8))
last_paragraph = document.paragraphs[-1]
last_paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
## Image for Right Side
my_img2 = document.add_picture(i,width=Inches(0.8),height=Inches(0.8))
last_paragraph = document.paragraphs[-1]
last_paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
Please help me, I want both images on same line like two images together with little space between them.

Use Run.add_picture() instead of Paragraph.add_picture(). This will allow multiple images in the same paragraph, which, if they both fit within the page margins, will result in side-by-side images:
paragraph = document.add_paragraph()
run = paragraph.add_run()
run.add_picture(...)
run_2 = paragraph.add_run()
run_2.add_picture(...)
As far as alignment is concerned, when using paragraphs, inserting tabs is probably most reliable. The other alternative is to add a table and place the images in side-by-side cells.

How can I pull information from Excel into PowerPoint using Python and keep the format?

I've written a script with python's xlrd and pptx to read each workbook in a directory and pull information from each sheet into a table in a PowerPoint slide. It works okay if the excel table is small but I don't know what will be in these excel files. It becomes illegible when there is too many rows and columns. My main problem arose when an excel file had graphs instead of cells and the script couldn't read it. So I tried using pyscreenshot to open the document and take a screenshot but this seems slow and unnecessary. I'd like to make a slide in the PowerPoint look exactly as it would in excel but with the ability to add and change things.
import libraries and modules
import xlrd
from pptx import Presentation
from pptx.util import Inches, Pt
import time
import glob
import os
start = time.time()
prs = Presentation()
title_slide_layout = prs.slide_layouts[0]
slide = prs.slides.add_slide(title_slide_layout)
shapes = slide.shapes
title = slide.shapes.title
subtitle = slide.placeholders[1]
title.text = "Dashboard Generator"
subtitle.text = "made with Python-pptx and xlrd"
for filename in glob.glob(os.path.join("C:/Users/penelope/Desktop/PMO/myfiles/", '*.xlsx')):
print(filename)
file_location = filename
try:
workbook = xlrd.open_workbook(file_location)
nsheets = workbook.nsheets
for n in range(0, nsheets):
sheet = workbook.sheet_by_index(n)
print("sheet:", sheet)
rows = sheet.nrows
cols = sheet.ncols
c = cols
r = rows
if c > 0:
print(c, r)
slide = prs.slides.add_slide(prs.slide_layouts[5])
shapes = slide.shapes
title = slide.shapes.title
title.text = "Table testing"
left = Inches(0.0)
top = Inches(2.0)
width = Inches(6.0)
height = Inches(4.0)
num = 10.0/c
table = shapes.add_table(rows, cols, left, top, width, height).table
for i in range(0, c):
table.columns[i].width = Inches(num)
for i in range(0,r):
for e in range(0,c):
table.cell(i,e).text = str(sheet.cell_value(i,e))
cell = table.rows[i].cells[e]
paragraph = cell.text_frame.paragraphs[0]
paragraph.font.size = Pt(11)
except:
print("Error!")
pass
prs.save('powerpointfile1.pptx')
end = time.time()
print(end - start)
And this is my screenshot script:
import os
import time
import pyscreenshot as ImageGrab
from PIL import Image
if __name__ == "__main__":
os.system('start excel.exe "C:/Users/penelope/Desktop/PMO/TestCase.xlsx"')
time.sleep(3)
im=ImageGrab.grab(bbox=(24,210,1800,990))
im.save("image7.png")
img = Image.open('image7.png')
img.show()

Well, you've chosen a hard problem. Certainly all the times I've attempted this sort of thing I've ended up abandoning the effort.
The fundamental explanation I formed was that Excel (and Word) are "flowed" document environments. That is, when you run out of room on one page, it flows to the next. PowerPoint, on the other hand, is a page-by-page exhibit layout environment. Each slide is independent of the rest (evidenced by the ability to reorder slides freely), each meant to be shown all at once, and not scrolled. This leads to each slide being self-contained, which means constrained to a single "page".
There's a limit to how much information one can place on a slide and still have it communicate. Generally less is better. So, perhaps it's not a surprise all my early efforts there ended in frustration :) I also concluded that an effective "dashboard" slide would require very skillful layout, and extreme restraint on content length, probably requiring specific (human) summarization effort (not just copying from a "database").
Regarding the charts bit, those theoretically can be moved to PowerPoint and I've even seen it done, but it's technically quite challenging. There is no API support for it in python-pptx. This historical issue on the GitHub repo may give some idea what was involved. Not for the faint of heart I expect :)

Issue writing temp images to temp pdf in pyramid with reportlabs

I am using python 3, with pyramid and reportlabs to generate dynamic pdfs.
I am having a issue writing images in to a pdf. I am using Reportlab in a web to generate a pdf with images, by my images are not stored locally, they are on a remote server. I am downloading them locally into a temp directory ( they are saving, I have checked) When i go to add the images to the pdf, they space is allocating but image is not showing up.
Here is my relevant code (simplified):
# creates pdf in memory
doc = SimpleDocTemplate(pdfName, pagesize=A4)
elements = []
for item in model['items']:
# image goes here:
if item['IMAGENAME']:
response = getImageFromRemoteServer(item['IMAGENAME'])
dir_filename = directory + item['IMAGENAME']
if response.status_code == 200:
with open(dir_filename, 'wb') as f:
for chunk in response.iter_content():
f.write(chunk)
questions.append(Image(dir_filename, width=2*inch, height=2*inch))
# create and save the pdf
doc.build(elements,canvasmaker=NumberedCanvas)
I have followed the user guide here https://www.reportlab.com/docs/reportlab-userguide.pdf and have tried the above way, plus embedded images (as the user guide says in the paragraph section) and putting the image in the table.
I also looked here: and it did not help me.
My question is really, what is the right what to download an image and put in a pdf?
EDIT: fixed code indentation
EDIT 2:
Answered, I was finally about to get the images in the PDF. I am not sure what was the trigger to get it to work. The only thing that know I change was now I am using urllib to do the request and before i was not. Here is the my working code (simplified for the question only, this is more abstracted and encapsulated in my code.):
doc = SimpleDocTemplate(pdfName, pagesize=A4)
# array of elements in the pdf
elements = []
for question in model['questions']:
# image goes here:
if question['IMAGEFILE']:
filename = question['IMAGEFILE']
dir_filename = directory + filename
url = get_url(settings, filename)
response = urllib.request.urlopen(url)
raw_data = response.read()
f = open(dir_filename, 'wb')
f.write(raw_data)
f.close()
response.close()
myImage = Image(dir_filename)
myImage.drawHeight = 2* inch
myImage.drawWidth = 2* inch
myImage.hAlign = "LEFT"
elements.append(myImage)
# create and save the pdf
doc.build(elements)

Make your code independent from where the files come from. Separate file/resource retrieval from document generation. Ensure that your toolset is working with local files. Encapsulate the code to load files in a loader class or function. The encapsulation is what matters. Noticed this again this week while looking at thumbor loader classes.
If that works, you know reportlab, PIL and your application basically work.
Then make your code work with remote files using URI like http://path/to/remote/files.
Afterwards you can switch from using your fileloader or your httploader depending on environment or use case.
Another option to go would be to make your code work with local files using URI like file://path/to/file
This way the only thing that changes when switching from local to remote is the URL. Probably you need a python library supporting this. requests library is well suited for downloading things, most probably it supports URL scheme file:// as well.

Most probably the lazy parameter was responsible that your first code sample did not render the images. Triggering reportlab PDF rendering outside of the context managers for temporary files could have lead to this behaviour.
reportlab.platypus.flowables.py (using version 3.1.8)
class Image(Flowable):
"""an image (digital picture). Formats supported by PIL/Java 1.4 (the Python/Java Imaging Library
are supported. At the present time images as flowables are always centered horozontally
in the frame. We allow for two kinds of lazyness to allow for many images in a document
which could lead to file handle starvation.
lazy=1 don't open image until required.
lazy=2 open image when required then shut it.
"""
_fixedWidth = 1
_fixedHeight = 1
def __init__(self, filename, width=None, height=None, kind='direct', mask="auto", lazy=1):
"""If size to draw at not specified, get it from the image."""
self.hAlign = 'CENTER'
self._mask = mask
fp = hasattr(filename,'read')
if fp:
self._file = filename
self.filename = repr(filename)
...
The last three lines of the code example tell you that you can pass an object that has a read method. This is exactly what a call to urllib.request.urlopen(url) returns. Using that memory buffer you create an instance of Image. No need to have write access to filesystem, no need to delete these files after PDF rendering. Applying our new knowledge to improve code readability. Since your use-case includes retrieval of remote resources using memory buffers that support python file API could be a much cleaner approach to assemble your PDF files.
from contextlib import closing
import urllib.request
doc = SimpleDocTemplate(pdfName, pagesize=A4)
# array of elements in the pdf
elements = []
for question in model['questions']:
# download image and create Image from file-like object
if question['IMAGEFILE']:
filename = question['IMAGEFILE']
image_url = get_url(settings, filename)
with closing(urllib.request.urlopen(image_url)) as image_file:
myImage = Image(image_file, width=2*inch, height=2*inch)
myImage.hAlign = "LEFT"
elements.append(myImage)
# create and save the pdf
doc.build(elements)
References
Coding with context managers

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Insert several images at once using macro scripting in LibreOffice - python

Related

Cropping the Mediabox does not work for some pdfs

Replacing a word with another word, and replacing an image with another image in a PDF file through python, is this possible?

add two images in same line in python-docx

How can I pull information from Excel into PowerPoint using Python and keep the format?

Issue writing temp images to temp pdf in pyramid with reportlabs

Categories

Resources