Python: converting an xml file to an image


I am looking to convert an XML file to an image (ideally a PNG file) using a Python script. I have not found much in my online research. I am trying to use PIL. From this post on StackOverflow I found this code:
from PIL import Image, ImageFont, ImageDraw
image = Image.new("RGBA", (288, 432), (255, 255, 255))
usr_font = ImageFont.truetype("resources/HelveticaNeueLight.ttf", 25)
d_usr = ImageDraw.Draw(image)
d_usr.text((105, 280), "MYTEXT", (0, 0, 0), font=usr_font)
But I do not quite understand what's happening. I tried to replace "MYTEXT" with the actual xml file content and it did not work.
I am basically looking for any solution (ideally using PIL, but it can be another module for python). I came close using imgkit:
import imgkit
imgkit.from_file('example_IN.xml','example_OUT.png')
which returns a PNG file. The resolution of the image is terrible, though, and the content sits inside a very large white rectangle. I may be missing something. I know you can set options for imgkit, but even after checking the documentation I have no idea which ones to change. Any help would be deeply appreciated.
Thank you so much!
Best regards.

I had a go in pyvips:
#!/usr/bin/env python3
import sys
import pyvips
from xml.sax.saxutils import escape
# load first arg as a string
with open(sys.argv[1], "r") as f:
    txt = f.read()
# pyvips allows pango markup in strings -- you can write stuff like
# text("hello <i>sailor!</i>")
# so we need to escape < > & in the text file
txt = escape(txt)
img = pyvips.Image.text(txt)
# save to second arg
img.write_to_file(sys.argv[2])
You can run it like this:
./txt2img.py vari.ws x.png
This renders the text of vari.ws into x.png.
It's pretty quick -- that took 300ms to run on this modest laptop.
The text method has a lot of options if you want higher res, to change the alignment, wrap lines at some limit, change the font, etc. etc.
https://libvips.github.io/libvips/API/current/libvips-create.html#vips-text

The pyvips solution suggested above by jcupitt definitely works and is quick. For reference, I also found a way to make my earlier imgkit code work (it is slower): the output image had poor resolution, and this can be fixed by setting width and height in the options (an easy fix I had missed):
import imgkit
options = {
'width' : 600,
'height' : 600
}
imgkit.from_file('example_IN.xml','example_OUT.png', options=options)
That converts an XML file into a PNG file as well.

Related

I have a folder full of PDFs and want to write code that lists all the PDFs that contain the color blue

As the title says, I have a bunch of PDFs that need to be checked, and I need a list of the ones that contain the color blue.
I tried a snippet of code from a similar post to get the list of colors in one document. My idea was to loop over all the documents, export the output to Excel and filter for a specific color, but I can't even get it to work for a single PDF:
#!/usr/bin/env python
# -*- Encoding: UTF-8 -*-
import minecart

colors = set()
with open("F://Prints/0-25162.PDF", "rb") as file:
    document = minecart.Document(file)
    page = document.get_page(1)
    for shape in page.shapes:
        if shape.outline:
            colors.add(shape.outline.color.as_rgb())
for color in colors:
    print(color)
Any help or direction would be appreciated.

I would try to render the PDF into a PNG or a similar bitmap format, then load it as a Python pixel array (using Pillow or similar) and look for blue pixels. For the rasterizing, pdf2image (which wraps Poppler) should do the job; Pillow itself cannot render PDFs. Alternatively, you can do the rasterizing with ImageMagick before the Python processing.
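A minimal sketch of that approach, assuming pdf2image (which needs the Poppler utilities installed) for rasterizing and Pillow for the pixel check. What counts as "blue" is an assumption: the hypothetical `is_blueish()` treats a pixel as blue when its blue channel is at least `min_blue` and exceeds both red and green by `margin`, and those threshold values are made-up starting points you would need to tune. All function names here are hypothetical:

```python
import os

from PIL import Image


def is_blueish(r, g, b, min_blue=150, margin=60):
    # Assumed definition of "blue": strong blue channel that clearly dominates red and green.
    return b >= min_blue and b - max(r, g) >= margin


def image_has_blue(img, min_pixels=1):
    """Return True if at least min_pixels pixels of a Pillow image look blue."""
    rgb = img.convert("RGB")
    # getcolors() returns (count, (r, g, b)) for each distinct color;
    # maxcolors = pixel count guarantees it never returns None.
    colors = rgb.getcolors(maxcolors=rgb.width * rgb.height)
    blue = sum(count for count, (r, g, b) in colors if is_blueish(r, g, b))
    return blue >= min_pixels


def pdfs_with_blue(folder, dpi=72):
    """List the PDF files in folder whose rendered pages contain blue pixels."""
    from pdf2image import convert_from_path  # imported here; requires Poppler

    hits = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".pdf"):
            continue
        pages = convert_from_path(os.path.join(folder, name), dpi=dpi)
        if any(image_has_blue(page) for page in pages):
            hits.append(name)
    return hits
```

Usage would be something like `print(pdfs_with_blue("F:/Prints"))`. A low dpi keeps it fast; raise it (and `min_pixels`) if thin blue lines get lost or stray anti-aliased pixels cause false hits.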

OpenCV Python not opening images with imread()

I'm not entirely sure why this is happening, but I am writing a program and having lots of issues getting OpenCV to open images with imread. I keep getting errors saying that the image is 0px wide by 0px high. This doesn't make much sense to me; I searched around here but haven't found any answers on SO either.
I have taken about 20 pictures, all with the same device. About 8 of them open and work correctly; the rest don't. They aren't corrupted, because they open in other programs. I have triple-checked the paths, and they are full paths.
Is anyone else having issues like this? All of my files are .jpgs and I am not seeing any problems on my end. Is this a bug or am I doing something wrong?
Here is a snippet of the code that I am using that is reproducing the error on my end.
imgloc = "F:\Kyle\Desktop\Coinjar\Test images\ten.png"
img = cv2.imread(imgloc)
cv2.imshow('img',img)
When I switch files, I only change the file name; the rest of the path stays the same. It just refuses to accept some of my images, which are essentially the same as the others.
I am getting this error from a later part of the code where I try to use img.shape
Traceback (most recent call last):
File "F:\Kyle\Desktop\Coinjar\CoinJar Test2.py", line 14, in <module>
height, width, depth = img.shape
AttributeError: 'NoneType' object has no attribute 'shape'
and I am getting this error when I try to show a window from the code snippet above.
Traceback (most recent call last):
File "F:\Kyle\Desktop\Coinjar\CoinJar Test2.py", line 11, in <module>
cv2.imshow('img',img)
error: ..\..\..\..\opencv\modules\highgui\src\window.cpp:261: error: (-215) size.width>0 && size.height>0 in function cv::imshow

You probably have a problem with the special meaning of \ in string literals, as in \t or \n. In your path, the \t at the start of \ten.png is turned into a TAB character, so imread looks for a file that does not exist.
Use \\ in place of \
imgloc = "F:\\Kyle\\Desktop\\Coinjar\\Test images\\ten.png"
or use the r'' prefix (a raw string, in which backslashes have no special meaning):
imgloc = r"F:\Kyle\Desktop\Coinjar\Test images\ten.png"
EDIT:
Many modules also accept /, as in Linux paths:
imgloc = "F:/Kyle/Desktop/Coinjar/Test images/ten.png"
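A short sketch that makes the problem visible. Unknown sequences such as \K are left alone (recent Python versions warn about them), but \t silently becomes one TAB character. The `has_escape_damage` helper is hypothetical, just a quick check you could run on a path before calling imread:

```python
from pathlib import PureWindowsPath

# In an ordinary string literal "\t" is an escape sequence, so "\ten.png" starts with a TAB.
bad = "Test images\ten.png"
raw = r"Test images\ten.png"        # raw string: the backslash is kept as-is
escaped = "Test images\\ten.png"    # doubled backslash: same string as the raw one
forward = "Test images/ten.png"     # forward slash, also understood by Windows

# Control characters that unescaped backslashes commonly produce in Windows paths.
ESCAPE_DAMAGE = "\t\n\r\a\b\f\v"


def has_escape_damage(path):
    r"""Return True if path contains a character that probably came from an
    unintended escape sequence such as \t or \n."""
    return any(ch in path for ch in ESCAPE_DAMAGE)


print(len(bad), len(raw))                               # 18 19
print(raw == escaped)                                   # True
print(PureWindowsPath(forward).name)                    # ten.png
print(has_escape_damage(bad), has_escape_damage(raw))   # True False
```

The raw string, the doubled backslashes and the forward-slash version all name the same file; only the plain literal is one character short.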

In my experience, file paths that are too long (the limit is OS-dependent) can also cause cv2.imread() to fail.
Also, when it does fail, it usually fails silently: it returns None instead of raising an error, so it is hard to even notice, and the error typically shows up further down in the code.
Hope this helps.
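Because cv2.imread() returns None instead of raising, one way to surface the failure where it actually happens is a small check right after the call. A sketch; `require_image` is a hypothetical helper name, and the hint messages only list likely causes:

```python
import os


def require_image(img, path):
    """Raise immediately if cv2.imread() returned None for path.

    Without this, the failure only surfaces later, e.g. as
    "'NoneType' object has no attribute 'shape'" or the cv2.imshow assertion.
    """
    if img is None:
        if not os.path.isfile(path):
            hint = "no such file (check the path and any backslash escapes)"
        else:
            hint = "file exists but could not be decoded (format, path length, non-ASCII characters?)"
        raise OSError(f"could not read image {path!r}: {hint}")
    return img


# Typical use (requires OpenCV):
# img = require_image(cv2.imread(path), path)
# height, width = img.shape[:2]
```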

Faced the same problem on Windows: cv.imread returned None when reading jpg files from a subfolder. The same code and folder structure worked on Linux.
I found that cv.imread does read the same jpg files if they are in the same folder as the Python file.
My workaround:
copy the image file to the python file folder
use this file in cv.imread
remove redundant image file
import os
import shutil
import cv2 as cv
image_dir = os.path.join('path', 'to', 'image')
image_filename = 'image.jpg'
full_image_path = os.path.join(image_dir, image_filename)
image = cv.imread(full_image_path)
if image is None:
    shutil.copy(full_image_path, image_filename)
    image = cv.imread(image_filename)
    os.remove(image_filename)
...

I had a lot of trouble with cv.imread() not finding my image. I think I tried everything involving changing the path, and os.path.exists(file_path) also returned True.
I finally solved the problem by loading the images with imageio:
import imageio
img = imageio.imread(file_path)
This also loads the image into a NumPy array, so you can use functions like cv.matchTemplate() on it. But if you are working with multiple images, I would recommend reading all of them with imageio, because I found differences between the arrays produced by .imread() in the two libraries (OpenCV and imageio) for a file both of them could open.
I hope I could help someone.
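One likely source of those differences (an assumption about this particular case, but a common one): imageio returns pixels in RGB (or RGBA) order, while cv2.imread() with its default IMREAD_COLOR flag returns 3-channel BGR and also expands grayscale images to 3 channels. A sketch of a hypothetical helper that converts an imageio array to the layout OpenCV functions usually expect:

```python
import numpy as np


def imageio_to_opencv(img):
    """Convert an imageio array to the layout cv2.imread(path) returns by default:
    3 channels in BGR order, no alpha."""
    arr = np.asarray(img)
    if arr.ndim == 2:
        arr = np.stack([arr, arr, arr], axis=-1)  # grayscale -> 3 identical channels
    elif arr.ndim == 3:
        if arr.shape[2] == 4:
            arr = arr[:, :, :3]   # drop the alpha channel
        arr = arr[:, :, ::-1]     # RGB -> BGR
    # OpenCV functions prefer C-contiguous arrays; the slices above are views.
    return np.ascontiguousarray(arr)
```

Bit depth is not handled here: a 16-bit PNG stays 16-bit, whereas cv2.imread with default flags converts it to 8-bit.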

Take care to use:
a reliable picture when testing imread(), and
the correct path for your context (see Kyle772's answer). For me, either / or \\ works.
I lost a couple of hours testing with two images saved from a browser. As soon as I used a photo from my own camera, it worked fine.
# context: Windows 10 / Anaconda / OpenCV 3.2.0
import cv2
print(cv2.__version__) # 3.2.0
imgloc = "D:/violettes/Software/Central/test.jpg" #this path works fine.
# imgloc = "D:\\violettes\\Software\\Central\\test.jpg" this path works fine also.
# imgloc = "D:\violettes\Software\Central\test.jpg" # this path fails: \v and \t are escape sequences (vertical tab, tab)
img = cv2.imread(imgloc)
height, width, channels = img.shape
print (height, width, channels)

I know the question has already been answered, but in case anybody still cannot load images with imread: it may be because the path contains characters that imread does not accept, for example umlauts and other diacritical marks.

My suggestion for everyone facing the same problem is to try this:
cv2.imshow("image", img)
The first argument is the window name; the second, img, must be the array returned by cv2.imread(). Never forget it.

When you get an error like AttributeError: 'NoneType' object has no attribute 'shape'
try new_image = image.copy() (this only helps if image is not None; None means cv2.imread() could not read the file).

PIL - Open image is not able to be read

I'm trying to read a file's format so I can assign the correct new name to it and write it to disk, but after calling Image.open() on the image, I cannot write the image to disk. For example:
This works:
>>>file = open('708864.jpg')
>>> open('lala.jpeg', 'w').write(file.read())
But, this doesn't
>>>import Image
>>>im = Image.open('708864.jpg')
>>> im.format
>>> open('lala.jpeg', 'w').write(file.read())
It creates a corrupted file (lala.jpeg) which is unable to be opened by any software.
I suspect the culprit is Image.open(). I tried to find an Image.close() statement but was unable to. How would you "close" this image so that I can still write it to disk?

As suggested in my comment, im.save('lala.jpg') is the way to go.
For all the other fun methods on an Image object, you can look at the documentation.
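If the goal is only to rename the file according to its real format, here is a sketch that avoids re-encoding (im.save() on a JPEG recompresses it, which is lossy): detect the format with Pillow, then copy the original bytes. shutil.copyfile works in binary mode, whereas pushing image data through a text-mode file can corrupt it, especially on Windows. The function name and the format-to-extension table are hypothetical; extend the table as needed:

```python
import shutil

from PIL import Image

# Hypothetical table from Pillow format names to file extensions; extend as needed.
EXTENSIONS = {"JPEG": ".jpg", "PNG": ".png", "GIF": ".gif", "BMP": ".bmp",
              "TIFF": ".tif", "WEBP": ".webp"}


def copy_with_detected_extension(src, dst_stem):
    """Detect the image format with Pillow, then copy the original bytes
    unchanged to dst_stem + the matching extension. Returns the new path."""
    with Image.open(src) as im:  # the with-block closes the underlying file
        fmt = im.format          # e.g. "JPEG"; Image.open only reads the header
    ext = EXTENSIONS.get(fmt)
    if ext is None:
        raise ValueError(f"no extension configured for format {fmt!r}")
    dst = dst_stem + ext
    shutil.copyfile(src, dst)    # byte-for-byte copy, no re-encoding
    return dst
```

For the question's example this would be `copy_with_detected_extension('708864.jpg', 'lala')`, producing lala.jpg.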

A possible workaround (just an idea): read the bytes first, in binary mode, and write them back out unchanged:
from PIL import Image
import io
with open('/home/mrok/1.jpg', 'rb') as file:
    output = io.BytesIO(file.read())
im = Image.open('/home/mrok/1.jpg')
print(im.format)
with open('/home/mrok/2.jpg', 'wb') as out:
    out.write(output.getvalue())
output.close()

As suggested in a comment, I ended up using Image.save(), a function I had not known about before, which quickly solved my problem.

Convert SVG to PDF (svglib + reportlab not good enough)

I'm creating some SVGs in batches and need to convert those to a PDF document for printing. I've been trying to use svglib and its svg2rlg method but I've just discovered that it's absolutely appalling at preserving the vector graphics in my document. It can barely position text correctly.
My dynamically-generated SVG is well formed and I've tested svglib on the raw input to make sure it's not a problem I'm introducing.
So what are my options past svglib and ReportLab? It either has to be free or very cheap as we're already out of budget on the project this is part of. We can't afford the 1k/year fee for ReportLab Plus.
I'm using Python but at this stage, I'm happy as long as it runs on our Ubuntu server.
Edit: Tested Prince. Better but it's still ignoring half the document.

I use Inkscape for this. In your Django view, do something like:
from subprocess import Popen

def waitForResponse(x):
    out, err = x.communicate()
    # Any non-zero status means Inkscape failed (negative = killed by a signal)
    if x.returncode != 0:
        r = "Popen returncode: " + str(x.returncode)
        raise OSError(r)

def svg_to_pdf(your_svg_input, your_pdf_output):
    x = Popen(['/usr/bin/inkscape', your_svg_input,
               '--export-pdf=%s' % your_pdf_output])
    try:
        waitForResponse(x)
    except OSError:
        return False
    return True
If text does not appear correctly in the .pdf output, keep in mind that you may need to pass Inkscape all the font files your .svg refers to.
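For batch conversion, a sketch that builds the Inkscape command for each file and fails loudly on a non-zero exit status. It assumes `inkscape` is on the PATH. `--export-pdf` is the Inkscape 0.92 option used above; Inkscape 1.x replaced it with `--export-filename`, which infers the output type from the extension, so the hypothetical `legacy` flag selects between the two. Function names are illustrative:

```python
import subprocess
from pathlib import Path


def inkscape_pdf_command(svg_path, pdf_path, inkscape="inkscape", legacy=False):
    """Build an Inkscape command line that exports svg_path to pdf_path."""
    if legacy:  # Inkscape 0.92 syntax
        return [inkscape, str(svg_path), f"--export-pdf={pdf_path}"]
    # Inkscape 1.x syntax: output type is taken from the .pdf extension
    return [inkscape, str(svg_path), f"--export-filename={pdf_path}"]


def convert_folder(svg_dir, **kwargs):
    """Convert every .svg in svg_dir to a .pdf next to it.

    subprocess.run(check=True) raises CalledProcessError if Inkscape fails.
    """
    outputs = []
    for svg in sorted(Path(svg_dir).glob("*.svg")):
        pdf = svg.with_suffix(".pdf")
        subprocess.run(inkscape_pdf_command(svg, pdf, **kwargs), check=True)
        outputs.append(pdf)
    return outputs
```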

CairoSVG is the one I am using:
import cairosvg
cairosvg.svg2pdf(url='image.svg', write_to='image.pdf')

rst2pdf uses reportlab for generating PDFs. It can use inkscape and pdfrw for reading PDFs.
pdfrw itself has some examples that show reading PDFs and using reportlab to output.
Addressing the comment by Martin below (I can edit this answer, but do not have the reputation to comment on a comment on it...):
reportlab knows nothing about SVG files. Some tools, like svg2rlg, attempt to recreate an SVG image into a PDF by drawing them into the reportlab canvas. But you can do this a different way with pdfrw -- if you can use another tool to convert the SVG file into a PDF image, then pdfrw can take that converted PDF, and add it as a form XObject into the PDF that you are generating with reportlab. As far as reportlab is concerned, it is really no different than placing a JPEG image.
Some tools will do terrible things to your SVG files (rasterizing them, for example). In my experience, inkscape usually does a pretty good job, and leaves them in a vector format. You can even do this headless, e.g. "inkscape my.svg -A my.pdf".
The entire reason I wrote pdfrw in the first place was for this exact use-case -- being able to reuse vector images in new PDFs created by reportlab.

For future reference, I found a solution to this problem:
# I only installed svg2rlg, not svglib (svg2rlg is also included in svglib)
import svg2rlg
# Import of the canvas
from reportlab.pdfgen import canvas
# Import of the renderer (image part)
from reportlab.graphics import renderPDF
rlg = svg2rlg.svg2rlg("your_img.svg")
c = canvas.Canvas("example.pdf")
c.setTitle("my_title_we_dont_care")
# Generation of the first page
# renderPDF.draw() also accepts an optional showBoundary argument;
# you can leave it at its default.
renderPDF.draw(rlg, c, 80, 740 - rlg.height)
renderPDF.draw(rlg, c, 60, 540 - rlg.height)
c.showPage()
# Generation of the second page
renderPDF.draw(rlg, c, 50, 740 - rlg.height)
c.showPage()
# Save
c.save()
Experiment with the position (80, 740 - rlg.height); it only controls where the drawing is placed on the page.
If the code doesn't work, have a look at the renderPDF module in the reportlab library.
ReportLab also has a function that creates a PDF directly from your drawing:
renderPDF.drawToFile(rlg, "example.pdf", "title")
You can open its source and read it; it is not very complicated. The code above is based on that function.

Python get mac clipboard contents

How can I get the contents of the Mac clipboard using Python (2.7)? Is there a better way than writing a wrapper around pbpaste?
Thanks!

PyObjC is the way to go:
#!/usr/bin/python
from AppKit import NSPasteboard, NSStringPboardType
pb = NSPasteboard.generalPasteboard()
pbstring = pb.stringForType_(NSStringPboardType)
print u"Pasteboard string: %s".encode("utf-8") % repr(pbstring)
This only supports text and returns None otherwise. You can extend it to support other data types as well; see the NSPasteboard Class Reference.

Have you looked at the xerox module?
It is supposed to support Windows, OS X and Linux.
Usage is as follows:
import xerox
xerox.copy(u'some string')
And to paste:
>>> xerox.paste()
u'some string'

If you have pandas installed, you can use its clipboard function as follows:
from pandas.io.clipboard import clipboard_get
text = clipboard_get()

The problem with the xerox module and most code samples I've found for "get the contents of the Mac clipboard" is that they return plain text only. They don't support hyperlinks, styles, and such, so they're not really able to access the full contents provided by apps like Microsoft Word and Google Chrome.
Standing on the shoulders of others, I finally figured out how to do this. The resulting richxerox module is available on PyPI and Bitbucket.
Though this question is old, I'm leaving breadcrumbs here because I consistently re-found this page via Google while searching for the answer.

Do you know PyObjC? I guess you could use it to write a Py wrapper which interfaces with NSPasteboard. This might be more "elegant" than shelling out to pbpaste.

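You can grab the clipboard (and the screen) with PIL/Pillow on a Mac like this:
from PIL import ImageGrab, Image
# Grab clipboard and save to disk; grabclipboard() returns None
# (or, on some platforms, a list of file names) when it holds no image
clip = ImageGrab.grabclipboard()
if isinstance(clip, Image.Image):
    clip.save("clip.png")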
Just for completeness, you can grab the screen like this:
screen = ImageGrab.grab()
# That results in this:
# <PIL.PngImagePlugin.PngImageFile image mode=RGBA size=5120x2880 at 0x110BB7748>
# Save to disk
screen.save("screen.png")
