Paste PDF image into Pyplot figure - python

How can I plot the image from a PDF file into a Pyplot figure (e.g. with plt.imshow, or inside some container I can add with ax.add_artist)?
Methods that do not work:
import matplotlib.pyplot as plt
im = plt.imread('file.pdf')
(Source: this question, where it works for PNG files.)
from PIL import Image
im = Image.open('file.pdf')
(Source: this doc, but again, it doesn't work for PDF files; the question links a library to read PDFs but the doc shows no obvious way to add them to a Pyplot figure.)
Also, this question exists, but the answers solve the problem without actually loading a PDF file.

There is a module called PyMuPDF that makes this job a lot easier.
Scraping PDF images into PIL Image
To extract the individual images embedded in each page, see the tutorials here and here, which also show how to convert them into PIL format.
If the intention is to grab an entire PDF page or pages, page.get_pixmap(), documented here, can do this.
The snippet below shows how to iterate through a PDF and grab each page as a PIL.Image:
import io
import fitz
from PIL import Image
file = 'myfile.pdf'
pdf_file = fitz.open(file)
# in case there is a need to loop through multiple PDF pages
for page_number in range(len(pdf_file)):
    page = pdf_file[page_number]
    rgb = page.get_pixmap()
    pil_image = Image.open(io.BytesIO(rgb.tobytes()))
    # display code or image manipulation here for each page #
Displaying scraped PDF Image
In either case, once there is a PIL.Image object, such as the pil_image variable above, the show() function can display it (and does so differently depending on the OS). However, if the preference is to use matplotlib.pyplot.imshow the PIL.Image must be converted to RGB first.
Snippet to display PIL.Image with pyplot.imshow
import matplotlib.pyplot as plt
plt.imshow(pil_image.convert('RGB'))
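If the goal is to place the page image inside an existing plot rather than fill the whole axes with it, one possible approach is matplotlib's OffsetImage and AnnotationBbox together with ax.add_artist. The following is a minimal sketch, assuming pil_image is any PIL.Image (for example one rendered from a PDF page); the helper name add_image_to_axes and its xy and zoom parameters are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, OffsetImage
from PIL import Image


def add_image_to_axes(ax, pil_image, xy=(0.5, 0.5), zoom=0.5):
    """Place a PIL image on `ax`, centred at the data coordinates `xy`.

    Hypothetical helper, not part of matplotlib or PyMuPDF.
    """
    # OffsetImage takes array-like pixel data; converting to RGB first
    # avoids surprises with palette or CMYK images.
    pixels = np.asarray(pil_image.convert("RGB"))
    box = OffsetImage(pixels, zoom=zoom)
    artist = AnnotationBbox(box, xy, frameon=False)
    ax.add_artist(artist)
    return artist


# Example (in a script or notebook, with `pil_image` already loaded):
# fig, ax = plt.subplots()
# add_image_to_axes(ax, pil_image, xy=(0.5, 0.5))
# plt.show()
```

The zoom factor scales the image relative to its pixel size, so a full PDF page usually needs a value well below 1 to fit next to other plot elements.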

Related

How to extract specific text from a pdf using python?

These are the items which are needed to be extracted from the pdf:
This is the link to the PDF.
Could anyone solve this problem using Python with proper comments to help me understand?
import pdf2image
from PIL import Image
import pytesseract
image = pdf2image.convert_from_path('/content/SRW1012022Y0002378_220216102321.PDF')
for pagenumber, page in enumerate(image):
    detected_text = pytesseract.image_to_string(page)
    print(detected_text)
I tried the above code snippet, and I can extract all the text from pdf, but I can't grab specific text to continue applying logic to it.
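Since the items to extract are not shown here, the following is only a sketch of one common approach: run regular expressions over the OCR text. The field names and patterns (invoice_no, date) are hypothetical and would need to be adapted to the labels that actually appear in the PDF.

```python
import re

# Hypothetical patterns; each has one capture group holding the value.
FIELD_PATTERNS = {
    "invoice_no": r"Invoice\s*No\.?\s*[:\-]?\s*(\S+)",
    "date": r"Date\s*[:\-]?\s*(\d{2}/\d{2}/\d{4})",
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return {field_name: first captured value or None} for OCR text."""
    results = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        # group(1) is the value captured right after the label
        results[name] = match.group(1).strip() if match else None
    return results


# Example: call extract_fields(detected_text) for each page in the loop above.
```

OCR output is often noisy, so patterns that tolerate optional punctuation and variable whitespace tend to hold up better than exact string matches.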

How do I add an image from a list in python using docx?

I wrote code that takes a screenshot, which I want to paste into a Word document using docx. So far I have to save the image as a PNG file first. The relevant part of my code is:
from docx import Document
import pyautogui
import docx
doc = Document()
images = []
img = pyautogui.screenshot(region = (some region))
images.append(img)
img.save('imagepath.png')
run = doc.add_picture('imagepath.png')
run
I would like to be able to add the image without saving it. Is it possible to do this using docx?
Yes, according to add_picture — Document objects — python-docx 0.8.10 documentation, add_picture can import data from a stream as well.
As per Screenshot Functions — PyAutoGUI 1.0.0 documentation, screenshot() produces a PIL/Pillow image object which can be save()'d with a BytesIO() as destination to produce a compressed image data stream in memory.
So that'll be:
import io
imdata = io.BytesIO()
img.save(imdata, format='png')
imdata.seek(0)
doc.add_picture(imdata)
del imdata # cannot reuse it for other pictures, you need a clean buffer each time
# can use .truncate(0) then .seek(0) instead but this is probably easier
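When several screenshots are collected in a list, as in the question, the buffer handling can be wrapped in a small helper so that each picture gets its own clean buffer. A minimal sketch; image_to_png_stream is a hypothetical name, and the commented loop assumes doc and images exist as in the question.

```python
import io
from PIL import Image


def image_to_png_stream(img):
    """Encode a PIL image as PNG into a fresh in-memory buffer."""
    stream = io.BytesIO()
    img.save(stream, format="PNG")
    stream.seek(0)  # rewind so the consumer reads from the start
    return stream


# Example with python-docx (assumes `doc` and a list `images` of PIL images):
# for img in images:
#     doc.add_picture(image_to_png_stream(img))
```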

How can I display a .png file in the Microsoft Azure Jupyter Notebook [duplicate]

I would like to use an IPython notebook as a way to interactively analyze some genome charts I am making with Biopython's GenomeDiagram module. While there is extensive documentation on how to use matplotlib to get graphs inline in IPython notebook, GenomeDiagram uses the ReportLab toolkit which I don't think is supported for inline graphing in IPython.
I was thinking, however, that a way around this would be to write out the plot/genome diagram to a file and then open the image inline which would have the same result with something like this:
gd_diagram.write("test.png", "PNG")
display(file="test.png")
However, I can't figure out how to do this - or know if it's possible. So does anyone know if images can be opened/displayed in IPython?
Courtesy of this post, you can do the following:
from IPython.display import Image
Image(filename='test.png')
(official docs)
If you are trying to display an Image in this way inside a loop, then you need to wrap the Image constructor in a display method.
from IPython.display import Image, display
listOfImageNames = ['/path/to/images/1.png',
                    '/path/to/images/2.png']
for imageName in listOfImageNames:
    display(Image(filename=imageName))
Note that the solutions posted so far only work for PNG and JPG!
If you want something even easier that needs no further libraries, or you want to display an animated or static GIF file in your IPython notebook, change the cell where you want to display it to Markdown and use this nice short hack:
![alt text](test.gif "Title")
This will import and display a .jpg image in Jupyter (tested with Python 2.7 in Anaconda environment)
from IPython.display import display
from PIL import Image
path="/path/to/image.jpg"
display(Image.open(path))
You may need to install PIL; in Anaconda this is done by typing:
conda install pillow
If you want to efficiently display a large number of images, I recommend using the IPyPlot package:
import ipyplot
ipyplot.plot_images(images_array, max_images=20, img_width=150)
There are some other useful functions in that package where you can display images in interactive tabs (separate tab for each label/class) which is very helpful for all the ML classification tasks.
You could use HTML code in a Markdown cell, for example:
<img src="https://www.tensorflow.org/images/colab_logo_32px.png" />
A cleaner Python 3 version that uses standard numpy, matplotlib and PIL, merged with the answer for opening from a URL.
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
pil_im = Image.open('image.png') #Take jpg + png
## Uncomment to open from URL
#import requests
#from io import BytesIO
#r = requests.get('https://www.vegvesen.no/public/webkamera/kamera?id=131206')
#pil_im = Image.open(BytesIO(r.content))
im_array = np.asarray(pil_im)
plt.imshow(im_array)
plt.show()
Courtesy of this page, I found this worked when the suggestions above didn't:
import PIL.Image
from io import BytesIO  # the original Python 2 version used cStringIO.StringIO
import IPython.display
import numpy as np
def showarray(a, fmt='png'):
    a = np.uint8(a)
    f = BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    IPython.display.display(IPython.display.Image(data=f.getvalue()))
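A variant of the same idea, as a sketch that separates the encoding step from the display step so it can also be used outside a notebook. The name array_to_png_bytes is hypothetical; the clipping is an added assumption, since a plain cast to uint8 wraps values outside 0 to 255 around instead of saturating them.

```python
import io
import numpy as np
from PIL import Image


def array_to_png_bytes(a):
    """Encode a numeric array (H x W or H x W x 3) as PNG bytes."""
    # Clip first so out-of-range values saturate instead of wrapping
    # around when cast to uint8.
    a = np.clip(np.asarray(a), 0, 255).astype(np.uint8)
    buffer = io.BytesIO()
    Image.fromarray(a).save(buffer, format="PNG")
    return buffer.getvalue()


# In a notebook (my_array is any image-like array):
# from IPython.display import Image as NotebookImage, display
# display(NotebookImage(data=array_to_png_bytes(my_array)))
```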
from IPython.display import Image
Image(filename =r'C:\user\path')
I've seen some solutions fail because the path is not a raw string. When using code like the above, remember to add 'r' before the path string. This avoids errors such as: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
If you are looking to embed your image into ipython notebook from the local host, you can do the following:
First: find the current local path:
# show current directory
import os
cwd = os.getcwd()
cwd
The result for example would be:
'C:\\Users\\lenovo\\Tutorials'
Next, embed your image as follows:
from IPython.display import display
from PIL import Image
path="C:\\Users\\lenovo\\Tutorials\\Data_Science\\DS images\\your_image.jpeg"
display(Image.open(path))
Make sure that you choose the right image type among jpg, jpeg or png.
Another option for plotting inline from an array of images could be:
import IPython.display
import PIL.Image
def showimg(a):
    IPython.display.display(PIL.Image.fromarray(a))
where a is an array, for example:
a.shape
(720, 1280, 3)
You can directly use this instead of importing PIL
from IPython.display import Image, display
display(Image(base_image_path))
Another option is:
from matplotlib import pyplot as plt
from io import BytesIO
import IPython.display
f = BytesIO()
plt.savefig(f, format='png')
IPython.display.display(IPython.display.Image(data=f.getvalue()))
f.close()
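The same in-memory approach can be written against an explicit figure object instead of pyplot's current figure, which avoids relying on global state. A minimal sketch; figure_to_png_bytes and its dpi default are hypothetical choices.

```python
import io
import matplotlib.pyplot as plt


def figure_to_png_bytes(fig, dpi=100):
    """Render a matplotlib figure to PNG bytes without touching the disk."""
    with io.BytesIO() as buffer:
        fig.savefig(buffer, format="png", dpi=dpi)
        return buffer.getvalue()


# In a notebook:
# from IPython.display import Image, display
# fig, ax = plt.subplots()
# ax.plot([0, 1], [0, 1])
# display(Image(data=figure_to_png_bytes(fig)))
```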
When using GenomeDiagram with Jupyter (iPython), the easiest way to display images is by converting the GenomeDiagram to a PNG image. This can be wrapped using an IPython.display.Image object to make it display in the notebook.
from Bio.Graphics import GenomeDiagram
from Bio.SeqFeature import SeqFeature, FeatureLocation
from IPython.display import display, Image
gd_diagram = GenomeDiagram.Diagram("Test diagram")
gd_track_for_features = gd_diagram.new_track(1, name="Annotated Features")
gd_feature_set = gd_track_for_features.new_set()
gd_feature_set.add_feature(SeqFeature(FeatureLocation(25, 75), strand=+1))
gd_diagram.draw(format="linear", orientation="landscape", pagesize='A4',
                fragments=1, start=0, end=100)
Image(gd_diagram.write_to_string("PNG"))
[See Notebook]
This is a solution using opencv-python, but it opens a new window and blocks while waiting for a key press:
import cv2 # pip install opencv-python
image = cv2.imread("foo.png")
cv2.imshow('test', image)
duration = 0 # in milliseconds; duration=0 means waiting forever
cv2.waitKey(duration)
cv2.destroyAllWindows()
If you don't want to display the image in another window, use matplotlib (or similar) instead of cv2.imshow():
import cv2
import matplotlib.pyplot as plt
image = cv2.imread("foo.png")
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.show()
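OpenCV loads colour images in BGR channel order while matplotlib expects RGB, which is why the conversion step is needed. For a plain 3-channel array the same reordering can be done with NumPy slicing alone. A minimal sketch; bgr_to_rgb is a hypothetical helper name.

```python
import numpy as np


def bgr_to_rgb(image):
    """Return an RGB view of a BGR image array of shape (H, W, 3)."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (H, W, 3)")
    # Reversing the last axis swaps the blue and red channels.
    return image[..., ::-1]
```

This returns a view rather than a copy, so it is cheap, but it does not handle grayscale or 4-channel (BGRA) images.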

Failed to GET matplotlib generated png in django

I want to serve matplotlib generated images with django.
If the image is a static png file, the following code works great:
from django.http import HttpResponse
def static_image_view(request):
    response = HttpResponse(mimetype='image/png')
    with open('test.png', 'rb') as f:
        response.write(f.read())
    return response
However, if the image is dynamically generated:
import numpy as np
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot as plt
def dynamic_image_view(request):
    response = HttpResponse(mimetype='image/png')
    fig = plt.figure()
    plt.plot(np.random.rand(100))
    plt.savefig(response, format='png')
    plt.close(fig)
    return response
When accessing the url in Chrome (v36.0), the image will show up for a few seconds, then disappear and turn to the alt text. It seems that the browser doesn't know the image has already finished loading and waits until timeout. Checking with Chrome > Tools > Developer tools > Network supports this hypothesis: although the image appears after only about 1 sec, the status of the corresponding http request becomes "failed" after about 5 sec.
Note again, this strange phenomenon occurs only with the dynamically generated image, so it shouldn't be Chrome's problem (though it doesn't happen with IE or FireFox, presumably due to different rules in dealing with timeout requests).
To make it more tricky (i.e., hard to reproduce), it seems to be network speed dependent. It happens if I access the url from an IP in China, but not if via a proxy in the US (which seems to be faster visiting the host on which django is running)...
Following @HSquirrel's suggestion, I tested writing the PNG to a temporary file on disk. Strangely, saving the file with matplotlib didn't work,
plt.savefig('MPL.png', format='png')
with open('MPL.png', 'rb') as f:
    response.write(f.read())
while saving file with PIL worked:
import io
from PIL import Image
f = io.BytesIO()
plt.savefig(f, format='png')
f.seek(0)
im = Image.open(f)
im.save('PIL.png', 'PNG')
An attempt to get rid of the temp file failed:
im.save(response, 'PNG')
However, if I generate the image data stream with PIL rather than matplotlib, the temporary disk file is unnecessary. The following code works:
from PIL import Image, ImageDraw
im = Image.new('RGBA', (256,256), (0,255,0,255))
draw = ImageDraw.Draw(im)
draw.line((100,100, 150,200), fill=128, width=3)
im.save(response, 'PNG')
Finally, savefig(response, format='jpeg') has no problem at all.
Have you tried saving the image to disk and then returning that? (you can periodically clear your disk of such generated images based on their time of creation)
If that gives the same problem, it might be a problem with the way the PNG is generated. Then you could use some kind of image library (like PIL) to make sure all your PNGs are (re)generated in a way that works with all browsers.
EDIT:
I've checked the png you've linked and I've played around with it a bit, opening and saving it with different programs and with PIL. I get different binary data every time. It seems each program decides which chunks to keep and which to remove. They all encode the png image data differently as well (as far as I can see, I am by no means a specialist in this, I just looked at the binary data based on the specs).
There are a few different paths you can take:
1. The quick and dirty one:
import io
from PIL import Image
f = io.BytesIO()
plt.savefig(f, format='png')
f.seek(0)
im = Image.open(f)
tempfilename = generatetempfilename()
im.save(tempfilename, 'PNG')
with open(tempfilename, 'rb') as f:
    response.write(f.read())
2. Adapt how matplotlib makes PNG files (possibly by just using PIL for it as well). See http://matplotlib.org/users/customizing.html#customizing-matplotlib
3. If it's an option for you, use JPEG.
4. Figure out what's wrong with the PNG generated by matplotlib and fix it at the binary level (I don't recommend this). You can use xxd (Linux command: xxd test.png) to see how the files look in binary and then compare them against the PNG spec: overview, chunk spec
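As a lighter alternative to reading a hex dump, the chunk layout of two PNG files can be compared with a few lines of Python. This is a minimal sketch based on the chunk structure in the PNG specification (length, type, data, CRC); list_png_chunks is a hypothetical helper, and it does not validate CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def list_png_chunks(data):
    """Return [(chunk_type, data_length), ...] for PNG file contents."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    chunks = []
    offset = 8
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        chunks.append((chunk_type.decode("ascii"), length))
        offset += 12 + length
        if chunk_type == b"IEND":
            break
    return chunks


# Example: compare the file written by matplotlib with the one re-saved by PIL.
# with open("MPL.png", "rb") as f:
#     print(list_png_chunks(f.read()))
```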

Grabbing animated gif with python script

I've been playing around with Pythonista on iOS to create some automation scripts.
I have a problem where I'm trying to grab an animated gif from a remote url. I've come up with the following script.
import Image
from urllib import urlopen
from io import BytesIO
url = "http://someurl.com/funny.gif"
img = Image.open(BytesIO(urlopen(url).read()))
I get the image but it only appears to be the first frame of the gif? I'm guessing it has something to do with the BytesIO not reading in the whole file but I'm not sure?
Hope I'm along the right lines.
You're almost there. You use img.seek to advance frames. So..
import Image
from urllib import urlopen
from io import BytesIO
url = 'http://upload.wikimedia.org/wikipedia/commons/2/2c/Rotating_earth_%28large%29.gif'
img = Image.open(BytesIO(urlopen(url).read()))
# Start with first frame
img.seek(0)
#img.show()
# Advance by one
img.seek(img.tell() + 1)
#img.show()
Here's a SO post showing how to save a gif using the Image class.
According to the Pillow manual:
To save all frames, the save_all parameter must be present and set to True.
So the opened image can be saved with:
image.save('filename.gif', save_all=True)
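Putting the pieces together, a sketch of reading every frame of a downloaded GIF and writing an animation back out with current Pillow. The helper names load_gif_frames and save_gif and the duration_ms default are hypothetical; gif_bytes stands for the raw bytes read from the URL.

```python
import io
from PIL import Image, ImageSequence


def load_gif_frames(gif_bytes):
    """Return every frame of an (animated) GIF as a list of RGBA images."""
    with Image.open(io.BytesIO(gif_bytes)) as img:
        # ImageSequence.Iterator seeks through frames of the same object,
        # so each frame is converted (and thereby copied) before moving on.
        return [frame.convert("RGBA") for frame in ImageSequence.Iterator(img)]


def save_gif(frames, target, duration_ms=100):
    """Write a list of PIL images to a path or file object as an animated GIF."""
    frames[0].save(target, format="GIF", save_all=True,
                   append_images=frames[1:], duration=duration_ms, loop=0)
```

ImageSequence.Iterator does the same job as calling img.seek() in a loop until EOFError is raised, without the manual bookkeeping.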
