Display TIFF Image On Juypyter Notebook using Python Without Saving / Downloading - python

I am working with web services using requests to get an image based on parameters passed. The first response I get is a XML schema with file reference URL.
<?xml version="1.0"?>
<Coverages schemaLocation="http://localhost/server/schemas/wcs/1.1.1/wcsCoverages.xsd">
<Coverage>
<Title>Filename</Title>
<Abstract/>
<Identifier>Filename</Identifier>
<Reference href="http://localhost/server/temp/filename.tif"/>
</Coverage>
</Coverages>
Next using xml.etree.ElementTree I extracted the URL. Next what I need is to dsiplay that tiff image (or any other image) on the Jupyter Notebook without downloading (as one image can be more than 50 or 100 MB sometimes)
Currently I am downloading and plotting the file after reading and converting file data into array ( as pyplot plot image array/matrix ).
import requests as req # request wcs urls
import xml.etree.ElementTree as ET # used for xml parsing
import matplotlib.pyplot as plt # display image
import gdal
# Download the File to local directory using Chunks
chunk_size=1024
local_filename = url.split('/')[-1] # Filename from url
r = req.get(url, stream=True)
with open(local_filename, 'wb') as f:
for chunk in r.iter_content(chunk_size):
if chunk:
f.write(chunk)
# Read File Raster Data as Array using Gdal
gtif = gdal.Open(local_filename)
georaster = gtif.ReadAsArray()
# Plot image using matplotlib
plt.imshow(georaster)
plt.title(local_filename)
plt.show()
So is there any way to convert the raw response from requests API for file directly into image array (in chunks or whole) and display it on notebook (without downloading and taking space on local directory)
The raw response from get request for tiff file is below
resp2 = req.get('tiffFielurl')
rawdata = resp2.content
rawdata[0:10]
Output: b'MM\x00*\x00\x00\x00\x08\x00\x10'
I tried searching for this question but not found any good answer on it so if there is any related question or duplicate provide me the link.

You can try plotting tiff images using ipyplot package
import ipyplot
ipyplot.plot_images(
['https://file-examples.com/wp-content/uploads/2017/10/file_example_TIFF_1MB.tiff',
'https://file-examples.com/wp-content/uploads/2017/10/file_example_TIFF_1MB.tiff',
'https://file-examples.com/wp-content/uploads/2017/10/file_example_TIFF_1MB.tiff'],
img_width=250)

After doing so much research and trying different solutions it seems to me the only way to do the above procedure i.e displaying Tiff file for now is downloading and reading the data using gdal and converting into array and display using matplotlib .
As the solution mentioned in following link is only accepting "PNG" files.
How to plot remote image (from http url)
whcih comes to conclusion we need PIL library that I also tried and fails.
from PIL import Image
resp2 = req.get('tiffFielurl')
resp2.raw.decode_content = True
im = Image.open(resp2.raw)
im
Gives Output:
<PIL.TiffImagePlugin.TiffImageFile image mode=I;16BS size=4800x4800 at 0x11CB7C50>
and converting the PIL object to numpy array or even getting data or pixels from PIL object gives error of unrecognized mode error.
im.getdata()
im.getpixels((0,0))
numpy.array(im)
All gives same error
257 if not self.im or\
258 self.im.mode != self.mode or self.im.size != self.size:
--> 259 self.im = Image.core.new(self.mode, self.size)
260 # create palette (optional)
261 if self.mode == "P":
ValueError: unrecognized mode
It comes out that PIL even don't support 16bit Signed integer pixel that is defined in Tiff object of PIL above.
https://pillow.readthedocs.io/en/4.0.x/handbook/concepts.html#concept-modes

Related

gif generating using imageio.mimwrite

I have the following code to generate GIFs from images, the code works fine and the gif is saved locally, what I want to do rather the saving the GIF locally, I want for example data URI that I could return to my project using a request. How can I generate the GIF and return it without saving it?
my code to generate the GIF
import os
import imageio as iio
import imageio
png_dir='./'
images=[]
for file_name in url:
images.append(file_name)
imageio.imwrite('movie.gif', images, format='gif')
I found I can save it as bytes with the following code
gif_encoded = iio.mimsave("<bytes>", images, format='gif')
it will save the GIF as bytes then you can encode it.
encoded_string = base64.b64encode(gif_encoded)
encoded_string = b'data:image/gif;base64,'+encoded_string
decoded_string = encoded_string.decode()
for more examples check this out
https://imageio.readthedocs.io/en/stable/examples.html#read-from-fancy-sources

Pass PIL Image to google cloud vision without saving and reading

UPDATE BELOW
Is there a way to pass a PIL Image to google cloud vision?
I tried to use io.Bytes, io.String and Image.tobytes() but I always get:
Traceback (most recent call last):
"C:\Users\...\vision_api.py", line 20, in get_text
image = vision.Image(content)
File "C:\...\venv\lib\site-packages\proto\message.py", line 494, in __init__
raise TypeError(
TypeError: Invalid constructor input for Image:b'Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x80Ma\x81La\x81Ma\x81Ma\x81Ma\x80Ma\x81Ma\x81Ma\x81Ma\x8 ...
or this if I pass the PIL-Image directly:
TypeError: Invalid constructor input for Image: <PIL.Image.Image image mode=RGB size=480x300 at 0x1D707131DC0>
This is my code:
image = Image.open(path).convert('RGB') # Opening the saved image
cropped_image = image.crop((30, 900, 510, 1200)) # Cropping the image
vision_image = vision.Image(# I passed the different options) # Here I need to pass the image, but I don't know how
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision_image) # Text detection using google-vision-api
FOR CLARITY:
I want google text detection to only analyse a certain part of an image saved on my disk. So my idea was to crop the image using PIL and then pass the cropped image to google-vision. But it is not possible to pass an PIL-Image to vision.Image, as I get the error above.
The documentation from Google.
This can be found in the vision.Image class:
Attributes:
content (bytes):
Image content, represented as a stream of bytes. Note: As
with all ``bytes`` fields, protobuffers use a pure binary
representation, whereas JSON representations use base64.
Currently, this field only works for BatchAnnotateImages
requests. It does not work for AsyncBatchAnnotateImages
requests.
A working option is to save the PIL-Image as a PNG/JPG on my disk and load it using:
with io.open(file_name, 'rb') as image_file:
content = image_file.read()
vision_image = vision.Image(content=content)
But this is slow and seems unnecessary. And the whole point for me behind using google-vision-api is the speed comaped to open-cv.
UPDATE as of 25/9/2021
from PIL import Image
from io import BytesIO
from google.cloud import vision
with open('images/screenshots/screenshot.png', 'rb') as image_file:
data = image_file.read()
try:
image = vision.Image(content=data)
print('worked')
except TypeError:
print('failed')
im = Image.open('images/screenshots/screenshot.png')
buffer = BytesIO()
im.save(buffer, format='PNG')
try:
image = vision.Image(buffer.getvalue())
print('worked')
except TypeError:
print('failed')
The first version works as expected, but I can't get the second one to work as #Mark Setchell recommended. The first few characters (~50) are the same, the rest is completely different.
UPDATE as of 26/9/2021
Both inputs are of type <class 'bytes'>. The complete error stack can be seen at the top of the question.
Using this code:
print(input_data[:200])
print(type(input_data))
i get the following output:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00 \x00IDATx\x9c\xec\xbdy\xd8-\xc7Y\x1f\xf8\xab\xea>\xe7\xdb\xef\xaa\xbbk\xb3%\xcb\x8b\x16[\x12\xc6\xc8\xbb,\x1b\x03\x06\xc6\x8111\x93#2y\xc2381\x8b1\x90\x10\x9e\xf18\x93\x10\x0811\x84\x192\x0c3\x9e\x1020\x03\x03\xc3\xb0\x04\xf0C0\xc6\x96m\xc9\x96m\xed\xb2dI\x96\xaetu\xf7\xed\xdb\xcf\xe9\xae\x9a?j\xe9\xea\xbd\xba\xbb\xbaO\x9f\xef\x9e\xd7\xd6\xfd\xfat\xbf\xf5Vu-o\xbd\xf5\xeb\xb7\xde"\xef\xff\xc7\'8\x1c\x13\x07\x00\xd2\x82\xcc6\xe5\xc6\xa8B&'
<class 'bytes'>
for the working input.
And:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x048\x00\x00\x07\x80\x08\x06\x00\x00\x00+a\xe7\n\x00\x01\x00\x00IDATx\x9c\xec\xbdw\x80$\xc7u\x1f\xfc\xab\xea\xeeI\x9bw/\'\x1cr\xce\x04#\x10\x04A\x82`\x84\x95%J"\x95,\xcb\x1f%\x91T\xb0$*}\x1fM\xd9\x96\x95EY\x94(\xc9\xb6\x92i+\x90\x12\x83(3)0\x82\x08$rN\x07\\\xce\xb7\xb7yBw\xd5\xf7G\x85\xaeN3\xdd=\xdd\xb3\xb3{\xfb\xc8\xc3\xceLW\xbd\xca\xaf\xde\xfb\xf5\xabW\xe4{\xdeu\x84\xa3`\xe2\x00#J\xe0Y&\xdf\x00e($\x94\x94\'p\xcc\xc3\xda\xe7Y\x0c\xf1Te\x13\xbf\xcc>\xfa:]Y=x\x84\x7f\xe8\xc23u\x1f\x91l\xfd\x99'
<class 'bytes'>
for the failing input.
As far as I can tell, you start off with a PIL Image and you want to obtain a PNG image in memory without going to disk. So you need this:
#!/usr/bin/env python3
from PIL import Image
from io import BytesIO
# Create PIL Image like you have - filled with red
im = Image.new('RGB', (320,240), (255,0,0))
# Create in-memory PNG - like you want for Google Cloud Vision
buffer = BytesIO()
im.save(buffer, format="PNG")
# Look at first few bytes
PNG = buffer.getvalue()
print(PNG[:20])
It prints this, which is exactly what you would get if you wrote the image to disk as a PNG and then read it back as binary - except this does it in memory without going to disk:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01#'
It would be good to have whole error stack and more accurate code snippet. But form presented information this seems to be confusion of two different "Images". Probably the some copy/paste error, as the tutorials have exactly the same line:
response = client.text_detection(image=image)
But mentioned tutorials image is created by vision.Image() so I think in presented code this should be:
response = client.text_detection(image=vision_image)
As, at least if I understand correctly the code snippet, image is PIL Image, while vision_image is Vision Image that should be passed to text_detection method. So whatever is done in vision.Image() does not have effect on the error massage.

PDF to IMG to PDF all done in memory

In order to remove sensitive content from a PDF, I am converting it to image and back to PDF again.
I am able to do this while saving the jpeg image, however I would eventually like to adapt my code so that the file is in memory the whole time. PDF in memory -> JPEG in memory -> PDF in memory. I'm having trouble with the intermediary step.
from pdf2image import convert_from_path, convert_from_bytes
import img2pdf
images = convert_from_path('testing.pdf', fmt='jpeg')
image = images[0]
# opening from filename
with open("output/output.pdf","wb") as f:
f.write(img2pdf.convert(image.tobytes()))
On the last line, I am getting the error:
ImageOpenError: cannot read input image (not jpeg2000). PIL: error reading image: cannot identify image file <_io.BytesIO object at 0x1040cc8f0>
I'm not sure how to be converting this image to the string that img2pdf is looking for.
The pdf2image module will extract the images as Pillow images. And according the Pillow tobytes() documention: "This method returns the raw image data from the internal storage." Which is some bitmap representation.
To get your code working use BytesIO module like so:
# opening from filename
import io
with open("output/output.pdf","wb") as f, io.BytesIO() as output:
image.save(output, format='jpg')
f.write(img2pdf.convert(output.getvalue()))

How to create high res JPEG with Wand from binary string

I'm trying to convert some PDFs to high res jpegs using imagemagick . I'm working on win 10, 64 with python 3.62 - 64 bit and wand 0.4.4. At the command line I have :
$ /e/ImageMagick-6.9.9-Q16-HDRI/convert.exe -density 400 myfile.pdf -scale 2000x1000 test3.jpg.
which is working well for me.
In python:
from wand.image import Image
file_path = os.path.dirname(os.path.abspath(__file__))+os.sep+"myfile.pdf"
with Image(filename=file_path, resolution=400) as image:
image.save()
image_jpeg = image.convert('jpeg')
Which is giving me low res JPEGs . How do I translate this into my wand code to do the same thing?
edit:
I realized that the problem is that the input pdf has to be read into the Image object as a binary string, so based on http://docs.wand-py.org/en/0.4.4/guide/read.html#read-blob I tried:
with open(file_path,'rb') as f:
image_binary = f.read()
f.close()
with Image(blob=image_binary,resolution=400) as img:
img.transform('2000x1000', '100%')
img.make_blob('jpeg')
img.save(filename='out.jpg')
This reads the file in ok, but the output is split into 10 files. Why? I need to get this into 1 high res jpeg.
EDIT:
I need to send the jpeg to an OCR api, so I was wondering if I could write the output to a file like object. Looking at https://www.imagemagick.org/api/magick-image.php#MagickWriteImageFile, I tried :
emptyFile = Image(width=1500, height=2000)
with Image(filename=file_path, resolution=400) as image:
library.MagickResetIterator(image.wand)
# Call C-API Append method.
resource_pointer = library.MagickAppendImages(image.wand,
True)
library.MagickWriteImagesFile(resource_pointer,emptyFile)
This gives:
File "E:/ENVS/r3/pdfminer.six/ocr_space.py", line 113, in <module>
test_file = ocr_stream(filename='test4.jpg')
File "E:/ENVS/r3/pdfminer.six/ocr_space.py", line 96, in ocr_stream
library.MagickWriteImagesFile(resource_pointer,emptyFile)
ctypes.ArgumentError: argument 2: <class 'TypeError'>: wrong type
How can I get this working?
Why? I need to get this into 1 high res jpeg.
The PDF contains pages that ImageMagick considers individual images in a "stack". The wand library provides a wand.image.Image.sequance to work with each page.
However, to append all images into a single JPEG. You can either iterate over each page & stitch them together, or call C-API's method MagickAppendImages.
from wand.image import Image
from wand.api import library
import ctypes
# Map C-API not provided by wand library.
library.MagickAppendImages.argtypes = [ctypes.c_void_p, ctypes.c_int]
library.MagickAppendImages.restype = ctypes.c_void_p
with Image(filename="path_to_document.pdf", resolution=400) as image:
# Do all your preprocessing first
# Ether word directly on the wand instance, or iterate over each page.
# ...
# To write all "pages" into a single image.
# Reset the stack iterator.
library.MagickResetIterator(image.wand)
# Call C-API Append method.
resource_pointer = library.MagickAppendImages(image.wand,
True)
# Write C resource directly to disk.
library.MagickWriteImages(resource_pointer,
"output.jpeg".encode("ASCII"),
False)
Update:
I need to send the jpeg to an OCR api ...
Assuming your using OpenCV's python API, you'll only need to iterate over each page, and pass the image-file data to the OCR via numpy buffers.
from wand.image import Image
import numpy
import cv2
def ocr_process(file_data_buffer):
""" Replace with whatever your OCR-API calls for """
mat_instance = cv2.imdecode(file_data_buffer)
# ... work ...
source_image="path_to_document.pdf"
with Image(filename=source_image, resolution=400) as img:
for page in img.sequence:
file_buffer = numpy.asarray(bytearray(page.make_blob("JPEG")),
dtype=numpy.uint8)
ocr_process(file_buffer)
so I was wondering if I could write the output to a file like object
Don't assume that python "image" objects (or underlining C structures) from different libraries are comparable with each other.
Without knowing the OCR api, I can't help you past the wand part, but I can suggest one of the following...
Use temporary intermediate files. (slower I/O, but easier to learn/develop/debug)
with Image(filename=INPUT_PATH) as img:
# work
img.save(filename=OUTPUT_PATH)
# OCR work on OUTPUT_PATH
Use file descriptors if the OCR API supports it. (Same as above)
with open(INPUT_PATH, 'rb') as fd:
with Image(file=fd) as img:
# work
# OCR work ???
Use blobs. (faster I/O but need a lot more memory)
buffer = None
with Image(filename=INPUT_PATH) as img:
# work
buffer = img.make_blob(FORMAT)
if buffer:
# OCR work ???
Even More Updates
Wrapping all the comments together, a solution might be...
from wand.image import Image
from wand.api import library
import ctypes
import requests
# Map C-API not provided by wand library.
library.MagickAppendImages.argtypes = [ctypes.c_void_p, ctypes.c_int]
library.MagickAppendImages.restype = ctypes.c_void_p
with Image(filename='path_to_document.pdf', resolution=400) as image:
# ... Do pre-processing ...
# Reset the stack iterator.
library.MagickResetIterator(image.wand)
# Call C-API Append method.
resource_pointer = library.MagickAppendImages(image.wand, True)
# Convert to JPEG.
library.MagickSetImageFormat(resource_pointer, b'JPEG')
# Create size sentinel.
length = ctypes.c_size_t()
# Write image blob to memory.
image_data_pointer = library.MagickGetImagesBlob(resource_pointer,
ctypes.byref(length))
# Ensure success
if image_data_pointer and length.value:
# Create buffer from memory address
payload = ctypes.string_at(image_data_pointer, length.value)
# Define local filename.
payload_filename = 'my_hires_image.jpg'
# Post payload as multipart encoded image file with filename.
requests.post(THE_URL, files={'file': (payload_filename, payload)})
What about something like:
ok = Image(filename=file_path, resolution=400)
with ok.transform('2000x1000', '100%') as image:
image.compression_quality = 100
image.save()
or:
with ok.resize(2000, 1000)
related:
https://github.com/dahlia/wand/blob/13c4f544bd271fe298ac8dde44fbf178b349361a/docs/guide/resizecrop.rst
Python 3 Wand How to make an unanimated gif from multiple PDF pages

Failed to GET matplotlib generated png in django

I want to serve matplotlib generated images with django.
If the image is a static png file, the following code works great:
from django.http import HttpResponse
def static_image_view(request):
response = HttpResponse(mimetype='image/png')
with open('test.png', 'rb') as f:
response.write(f.read())
return response
However, if the image is dynamically generated:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
def dynamic_image_view(request):
response = HttpResponse(mimetype='image/png')
fig = plt.figure()
plt.plot(np.random.rand(100))
plt.savefig(response, format='png')
plt.close(fig)
return response
When accessing the url in Chrome (v36.0), the image will show up for a few seconds, then disappear and turn to the alt text. It seems that the browser doesn't know the image has already finished loading and waits until timeout. Checking with Chrome > Tools > Developer tools > Network supports this hypothesis: although the image appears after only about 1 sec, the status of the corresponding http request becomes "failed" after about 5 sec.
Note again, this strange phenomenon occurs only with the dynamically generated image, so it shouldn't be Chrome's problem (though it doesn't happen with IE or FireFox, presumably due to different rules in dealing with timeout requests).
To make it more tricky (i.e., hard to reproduce), it seems to be network speed dependent. It happens if I access the url from an IP in China, but not if via a proxy in the US (which seems to be faster visiting the host on which django is running)...
According to #HSquirrel, I tested writing the png into temporary disk file. Strangely, saving file with matplotlib didn't work,
plt.savefig('MPL.png', format='png')
with open('MPL.png', 'rb') as f:
response.write(f.read())
while saving file with PIL worked:
import io
from PIL import Image
f = io.BytesIO()
plt.savefig(f, format='png')
f.seek(0)
im = Image.open(f)
im.save('PIL.png', 'PNG')
Attempt of getting rid of temp file failed:
im.save(response, 'PNG')
However, if I generate the image data stream with PIL rather than matplotlib, temporary disk file would be unnecessary. The following code works:
from PIL import Image, ImageDraw
im = Image.new('RGBA', (256,256), (0,255,0,255))
draw = ImageDraw.Draw(im)
draw.line((100,100, 150,200), fill=128, width=3)
im.save(response, 'PNG')
Finally, savefig(response, 'jepg') has no problem at all.
Have you tried saving the image to disk and then returning that? (you can periodically clear your disk of such generated images based on their time of creation)
If that gives the same problem, it might be a problem with the way the png is generated. Than you could use some kind of image library (like PIL) to make sure all your png's are (re)generated in a way that works with all browsers.
EDIT:
I've checked the png you've linked and I've played around with it a bit, opening and saving it with different programs and with PIL. I get different binary data every time. It seems each program decides which chunks to keep and which to remove. They all encode the png image data differently as well (as far as I can see, I am by no means a specialist in this, I just looked at the binary data based on the specs).
There are a few different paths you can take:
1.The quick and dirty one:
import io
from PIL import Image
f = io.BytesIO()
plt.savefig(f, format='png')
f.seek(0)
im = Image.open(f)
tempfilename = generatetempfilename()
im.save(tempfilename, 'PNG')
with open(tempfilename, 'rb') as f:
response.write(f.read())
2.Adapt how matplotlib makes PNG files (possibly by just using PIL for
it as well). See
http://matplotlib.org/users/customizing.html#customizing-matplotlib
3.If it's an option for you, use jpeg.
4.Figure out what's wrong with the PNG generated by matplotlib and fix
it binary (I don't recommend this). You can use xxd (linux command: xxd test.png) to figure out how the files look in binary and then see how things go using the png spec: overview chunk spec

Categories

Resources