convert vti files to open it with python - python

I have several .vti files . How can I convert .vti files into a standard format of 2D images? I've tried ParaView, but didn't find any option to convert in one of above mentioned formats like JPEG or PNG.

In ParaView: load your vti file, then in the File menu click "Save Data..." and chose the image file format you prefer (I tested with PNG, it works).
In Python, this script reads test.vti and saves test.jpg:
import vtk
reader = vtk.vtkXMLImageDataReader()
reader.SetFileName("test.vti")
reader.Update()
image = reader.GetOutput()
writer = vtk.vtkJPEGWriter()
writer.SetInputData(image)
writer.SetFileName("test.jpg")
writer.Write()

Related

Python with pytesseract - How to get the same output for pytesseract.image_to_data in a searchable PDF?

I have this piece of code in Python that makes use of pytesseract (method pytesseract.image_to_data).
This gives me great text information and coordinates that are saved in a text file that is fed to a third party software. It works perfectly for PDF files that have been scanned
data = pytesseract.image_to_data(Image.open('file-001-page-001.png')))
The issue now is that I have a demand for output in the exact same structure for PDFs that already contain text. It's possible to keep the same code and continue as if the PDF had no text, extracting images and doing OCR, but it doesn't seem like the right solution...
Is it possible to achieve this with pytesseract?
Suggestions are welcome
You can use this:
import pytesseract
from PIL import Image
# Open the PDF file
with open('file.pdf', 'rb') as f:
# Extract text from the PDF file and save it to a variable
text = pytesseract.image_to_pdf_or_hocr(f, extension='hocr', lang='eng', config='--oem 3 --psm 6')
# Save the extracted text to a file in the desired format
with open('output.hocr', 'w')as f:
f.write(text)

PDF to IMG to PDF all done in memory

In order to remove sensitive content from a PDF, I am converting it to image and back to PDF again.
I am able to do this while saving the jpeg image, however I would eventually like to adapt my code so that the file is in memory the whole time. PDF in memory -> JPEG in memory -> PDF in memory. I'm having trouble with the intermediary step.
from pdf2image import convert_from_path, convert_from_bytes
import img2pdf
images = convert_from_path('testing.pdf', fmt='jpeg')
image = images[0]
# opening from filename
with open("output/output.pdf","wb") as f:
f.write(img2pdf.convert(image.tobytes()))
On the last line, I am getting the error:
ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO object at 0x1040cc8f0>
I'm not sure how to be converting this image to the string that img2pdf is looking for.
The pdf2image module will extract the images as Pillow images. And according the Pillow tobytes() documention: "This method returns the raw image data from the internal storage." Which is some bitmap representation.
To get your code working use BytesIO module like so:
# opening from filename
import io
with open("output/output.pdf","wb") as f, io.BytesIO() as output:
image.save(output, format='jpg')
f.write(img2pdf.convert(output.getvalue()))

Problems with Python Wand and paths to JXR imagery when attempting to convert JXR imagery to JPG format?

I need to be able to convert JPEG-XR images to JPG format, and have gotten this working through ImageMagick itself. However, I need to be able to do this from a python application, and have been looking at using Wand.
Wand does not seem to properly use paths to JXR imagery.
with open(os.path.join(args.save_location, img_name[0], result[0]+".jxr"), "wb") as output_file:
output_file.write(result[1])
with Image(filename=os.path.join(args.save_location, img_name[0], result[0]+".jxr")) as original:
with original.convert('jpeg') as converted:
print(converted.format)
pass
The first part of this - creating output_file and writing result[1] (blob of JXR imagery from a SQLite database) - works fine. However, when I attempt to then open that newly-saved file as an image using Python and Wand, I get an error that ultimately suggests Wand is not looking in the correct location for the image:
Extracting panorama 00000
FAILED: -102=pWS->Read(pWS, szSig, sizeof(szSig))
JXRGlueJxr.c:1806
FAILED: -102=ReadContainer(pID)
JXRGlueJxr.c:1846
FAILED: -102=pDecoder->Initialize(pDecoder, pStream)
JXRGlue.c:426
FAILED: -102=pCodecFactory->CreateDecoderFromFile(args.szInputFile, &pDecoder)
e:\coding\python\sqlite panoramic image extraction tool\jxrlib\jxrencoderdecoder\jxrdecapp.c:477
JPEG XR Decoder Utility
Copyright 2013 Microsoft Corporation - All Rights Reserved
... [it outputs its help page in case of errors; snipped]
The system cannot find the file specified.
Traceback (most recent call last):
File "E:\Coding\Python\SQLite Panoramic Image Extraction Tool\SQLitePanoramicImageExtractor\trunk\PanoramicImageExtractor.py", line 88, in <module>
with Image(filename=os.path.join(args.save_location, img_name[0], result[0]+".jxr")) as original:
File "C:\Python34\lib\site-packages\wand\image.py", line 1991, in __init__
self.read(filename=filename, resolution=resolution)
File "C:\Python34\lib\site-packages\wand\image.py", line 2048, in read
self.raise_exception()
File "C:\Python34\lib\site-packages\wand\resource.py", line 222, in raise_exception
raise e
wand.exceptions.BlobError: unable to open image `C:/Users/RPALIW~1/AppData/Local/Temp/magick-14988CnJoJDwMRL4t': No such file or directory # error/blob.c/OpenBlob/2674
As you can see at the very end, it seems to have attempted to run off to open a temporary file 'C:/Users/RPALIW~1/AppData/Local/Temp/magick-14988CnJoJDwMRL4'. The filename used at this point should be exactly the same as the filename used to save the imagery as a file just a few lines above, but Wand has substituted something else? This looks similar to the last issue I had with this in ImageMagick, which was fixed over the weekend (detailed here: http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=27027&p=119702#p119702).
Has anyone successfully gotten Wand to open JXR imagery as an Image in Python, and convert to another format? Am I doing something wrong here, or is the fault with ImageMagick or Wand?
Something very similar is happening to me. I'm getting an error:
wand.exceptions.BlobError: unable to open image `/var/tmp/magick-454874W--g1RQEK3H.ppm': No such file or directory # error/blob.c/OpenBlob/2701
The path given is not the file path I of the image I am trying to open.
From the docs:
A binary large object could not be allocated, read, or written.
And I am trying to open a large file. (18mb .cr). Could the file size be the problem?
For me:
from wand.image import Image as WImage
with open(file_name, 'r+') as f:
with WImage(file = f) as img:
print 'Opened large image'
Or:
with open(file_name, 'r+') as f:
image_binary = f.read()
with WImage(blob = image_binary) as img:
print 'Opened Large Image'
Did the trick
~Victor

Python PIL reading PNG from STDIN

I am having a problem reading png images from STDIN using PIL. When the image is written by PIL it is all scrambled, but if I write the file using simple file open, write and close the file is saved perfectly.
I have a program that dumps png files to stdout in a sequence, with no compression, and I read that stream using a python script which is suposed to read the data and do some routines on almost every png. The program that dumps the data writes a certain string to delimiter the PNGs files, the string is "{fim:FILE_NAME.png}"
The script is something like:
import sys
import re
from PIL import Image
png = None
for linha in sys.stdin:
if re.search('{fim:', linha):
fname = linha.replace('{fim:','')[:-2]
# writes data directly to file, works fine
#f = open("/tmp/%s" % fname , 'w')
#f.write(png)
#f.close()
# create a PIL Image from data and writes to disk, fails fine
im = Image.frombuffer("RGB",(640,480),png, "raw", "RGB", 0, 1)
#im = Image.fromstring("RGB",(640,480),png)
im.save("/tmp/%s" % fname)
png = None
else:
if png is None:
png = linha
else:
png+= linha
imagemagick identify from a wrong image:
/tmp/1349194042-24.png PNG 640x480 640x480+0+0 8-bit DirectClass 361KiB 0.010u 0:00.019
imagemagick identify from a working image:
/tmp/1349194586-01.png PNG 640x480 640x480+0+0 8-bit DirectClass 903KiB 0.010u 0:00.010
Does any one have an idea of what is happening? Is it something about little/big endians? I have tried Image.frombuffer, Image.fromstring, different modes, but nothing. It seems that there is more information on the buffer that the PIL expects.
Thanks,
If the png variable contains the binary data from a PNG file, you can't read it using frombuffer; that's used for reading raw pixel data. Instead, use io.StringIO and Image.open, i.e.:
import io
from PIL import Image
img = Image.open(io.StringIO(png))
png variable is uninitialized on the first call to Image.frombuffer(). You need to initialize it to something from stdin.
I'm not sure about your use of for linha in sys.stdin:. That gives you line-buffered input. You probably want to use block buffered input of size N, like sys.stdin.read(N). This will read a specific number of bytes and then you can parse the data, like cutting your filename delimiter out and filling the input buffer for Image.frombuffer().

Python determine file type from http form submission

I want to do an image form submission, and I want to validate that the image was submitted is an image server side, which is running python. Is there a simple way to do this in pure python?
A simple and naive way to do it would be with libmagic (for example the one at https://github.com/ahupp/python-magic). A better way, but it's not native Python and is a very extensive library, would be to use PIL http://www.pythonware.com/products/pil/.
Use PIL:
import sys
import Image
for infile in sys.argv[1:]:
try:
im = Image.open(infile)
print infile, im.format, "%dx%d" % im.size, im.mode
except IOError:
pass
From the docs:
The Python Imaging Library supports a wide variety of image file
formats. To read files from disk, use the open function in the Image
module. You don't have to know the file format to open a file. The
library automatically determines the format based on the contents of
the file.

Categories

Resources