What imaging modules for Python will allow you to take a screenshot of a specific size (not the whole screen)?
I have tried PIL, but can't seem to make ImageGrab.grab() select a small rectangle,
and I have tried PyGame, but I can't make it take a screenshot outside of its main display panel.
You can use the pyscreenshot module.
The pyscreenshot module can be used to copy the contents of the screen to a PIL image in memory or to a file.
You can install it using pip.
$ sudo pip install pyscreenshot
Usage:
import pyscreenshot as ImageGrab
# fullscreen
im = ImageGrab.grab()
im.show()
# part of the screen: bbox=(left, top, right, bottom)
im = ImageGrab.grab(bbox=(10, 10, 500, 500))
im.show()
# to file
ImageGrab.grab_to_file('im.png')
I have tried PIL, but can't seem to make ImageGrab.grab() select a small rectangle
What did you try?
As the documentation for ImageGrab clearly states, the function has a bbox parameter, and:
The pixels inside the bounding box are returned as an “RGB” image. If the bounding box is omitted, the entire screen is copied.
So, you only get the whole screen if you don't pass a bbox.
Note that, although I linked to the Pillow docs (and you should be using Pillow), old-school PIL's docs say the same thing:
The bounding box argument can be used to copy only a part of the screen.
So, unless you're using a really, really old version of PIL (before 1.1.3, which I believe is more than a decade out of date), it has this feature.
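For example, a minimal sketch using Pillow's ImageGrab directly (the coordinates are arbitrary example values):
from PIL import ImageGrab
# bbox is (left, top, right, bottom) in screen coordinates
im = ImageGrab.grab(bbox=(10, 10, 500, 500))
im.save('region.png')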
1) Use pyscreenshot; PIL's ImageGrab works, but only on Windows (and, in newer Pillow versions, macOS)
2) Grab the image with a bounding box, then save that image
3) Don't use ImageGrab.grab_to_file, it saves the full-size image
4) You don't need to show the image with im.show() if you just want to save a screenshot
import pyscreenshot as ImageGrab
im = ImageGrab.grab(bbox=(10, 10, 500, 500))
im.save('im.png')
You could use Python MSS.
From the documentation, to capture only a part of the screen:
import mss
import mss.tools
with mss.mss() as sct:
    # The screen part to capture
    monitor = {"top": 160, "left": 160, "width": 160, "height": 135}
    output = "sct-{top}x{left}_{width}x{height}.png".format(**monitor)
    # Grab the data
    sct_img = sct.grab(monitor)
    # Save to the picture file
    mss.tools.to_png(sct_img.rgb, sct_img.size, output=output)
    print(output)
You can use pyscreenshot on Linux or Windows. I am using Ubuntu and it works for me. You can control whether a subprocess is used; setting childprocess=False together with the mss backend gives the best performance.
import pyscreenshot as ImageGrab
import time
t1 = time.time()
imgScreen = ImageGrab.grab(backend="mss", childprocess=False)
img = imgScreen.resize((640,480))
img.save("screen.png")
t2 = time.time()
print("The passing time",(t2-t1))
I am trying to access my webcam on Pygame to take pictures and save them, but every time I run the code:
import pygame.camera
import pygame.image
import sys
pygame.camera.init()
cameras = pygame.camera.list_cameras()
print ("Using camera %s ..." % cameras[0])
webcam = pygame.camera.Camera(cameras[0])
webcam.start()
# grab first frame
img = webcam.get_image()
WIDTH = img.get_width()
HEIGHT = img.get_height()
screen = pygame.display.set_mode( ( WIDTH, HEIGHT ) )
pygame.display.set_caption("pyGame Camera View")
while True:
    for e in pygame.event.get():
        if e.type == pygame.QUIT:
            sys.exit()
    # draw frame
    screen.blit(img, (0,0))
    pygame.display.flip()
    # grab next frame
    img = webcam.get_image()
I get this in IDLE:
Traceback (most recent call last):
File "/Users/Victor/Documents/Python Related/Python Code for Class/blah.py", line 5, in <module>
pygame.camera.init()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pygame/camera.py", line 42, in init
from pygame import _camera
ImportError: cannot import name '_camera'
Is there something I am doing wrong, or is there another module I can install for Python that can take pictures through the device's internal or external webcam and store them or send them where they need to be?
Thank you
According to the documentation for pygame.camera:
Pygame currently supports only Linux and v4l2 cameras.
Seeing as you're using a Mac and not Linux, this won't work. For the future, it is very helpful to read the documentation for libraries and modules you're using, since common issues such as this are usually addressed.
You can use OpenCV, which is a robust tool for image acquisition and manipulation.
OpenCV has bindings for several languages, and Python is one of them, so you can just import the module and create a function that takes the picture and returns the image in a variable (or you can store the picture in a file and read it later).
Take a look at this link to get an idea of how to acquire images or videos with OpenCV 2 (and Python 2.7):
https://codeplasma.com/2012/11/02/getting-webcam-images-with-python-and-opencv/
Or if you want to use the latest 3.0-beta version of OpenCV:
http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_gui/py_video_display/py_video_display.html
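To give an idea of what the linked tutorials cover, here is a minimal sketch (not taken from them; the device index and filename are arbitrary) that grabs one frame from the default webcam with OpenCV and saves it:
import cv2

cap = cv2.VideoCapture(0)   # open the default webcam (device 0)
ret, frame = cap.read()     # grab a single frame
if ret:
    cv2.imwrite("webcam_shot.png", frame)   # save the frame to disk
cap.release()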
EDIT:
To install OpenCV on a Mac you can follow this link:
http://docs.opencv.org/2.4/doc/tutorials/introduction/ios_install/ios_install.html#ios-installation
PS: You can also install Linux on your Mac and join the open-source community :)
I am trying to take a screenshot of my screen(s).
I am aware of the function
pyautogui.screenshot()
The problem with this function is that it takes a screenshot of one screen only. I am trying to take a full screenshot of all available screens (typically two), but it does not seem to work in this regard.
Given that you want to use a Windows system, I would suggest Desktopmagic, a Python library.
Here's an example:
from __future__ import print_function
from desktopmagic.screengrab_win32 import (
    getDisplayRects, saveScreenToBmp, saveRectToBmp, getScreenAsImage,
    getRectAsImage, getDisplaysAsImages)
# Save the entire virtual screen as a BMP (no PIL required)
saveScreenToBmp('screencapture_entire.bmp')
# Save an arbitrary rectangle of the virtual screen as a BMP (no PIL required)
saveRectToBmp('screencapture_256_256.bmp', rect=(0, 0, 256, 256))
# Save the entire virtual screen as a PNG
entireScreen = getScreenAsImage()
entireScreen.save('screencapture_entire.png', format='png')
# Get bounding rectangles for all displays, in display order
print("Display rects are:", getDisplayRects())
# -> something like [(0, 0, 1280, 1024), (-1280, 0, 0, 1024), (1280, -176, 3200, 1024)]
# Capture an arbitrary rectangle of the virtual screen: (left, top, right, bottom)
rect256 = getRectAsImage((0, 0, 256, 256))
rect256.save('screencapture_256_256.png', format='png')
# Unsynchronized capture, one display at a time.
# If you need all displays, use getDisplaysAsImages() instead.
for displayNumber, rect in enumerate(getDisplayRects(), 1):
    imDisplay = getRectAsImage(rect)
    imDisplay.save('screencapture_unsync_display_%d.png' % (displayNumber,), format='png')
# Synchronized capture, entire virtual screen at once, cropped to one Image per display.
for displayNumber, im in enumerate(getDisplaysAsImages(), 1):
    im.save('screencapture_sync_display_%d.png' % (displayNumber,), format='png')
If you don't mind, I would suggest another module: MSS (you do not need PIL or any other module, just Python; and it is cross-platform):
from mss import mss
with mss() as sct:
    sct.shot(mon=-1, output="fullscreen.png")
The documentation tries to explain more things, if you are interested.
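If you want one file per display instead of one combined capture, a minimal sketch (the file names are arbitrary) could look like this:
from mss import mss

with mss() as sct:
    # sct.monitors[0] is the full virtual screen; 1, 2, ... are the individual displays
    for i in range(1, len(sct.monitors)):
        sct.shot(mon=i, output="monitor-{}.png".format(i))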
Using the pyscreenshot library worked for me; I took a screenshot of all screens.
Source: https://pypi.org/project/pyscreenshot/
#-- include('examples/showgrabfullscreen.py') --#
import pyscreenshot as ImageGrab
if __name__ == '__main__':
    # grab fullscreen
    im = ImageGrab.grab()
    # save image file
    im.save('screenshot.png')
    # show image in a window
    im.show()
#-#
If you don't want to open the GUI, just comment out the im.show() line.
The problem
I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and set it as a variable; for example, if I were playing a game and had the capture frame over a resource amount, I want it to print that amount and use it. A perfect example of this is a video by Michael Reeves,
where whenever he loses health in a game it detects it and sends it to his Bluetooth-enabled airsoft gun to shoot him. So far I have this:
# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))
while(True):
    x = 760
    y = 968
    ox = 50
    oy = 22
    # screen capture
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)
    cv2.imshow("Screen", frame)
    out.write(frame)
    if cv2.waitKey(1) == 0:
        break
out.release()
cv2.destroyAllWindows()
It captures in real time and displays it in a window, but I have no clue how to make it recognise the text every frame and output it.
Any help?
It's fairly simple to grab the screen and pass it to tesseract for OCRing.
The PIL (pillow) library can grab the frames easily on MacOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it).
Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.
#! /usr/bin/env python3

import sys
import pytesseract
from PIL import Image

# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib
    if ( sys.platform == 'linux' ):
        from Xlib import display, X
        use_grab = False
    else:
        raise ex

def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling back
        to using XLib """
    global use_grab
    x, y, width, height = rect

    if ( use_grab ):
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image

### Do some rudimentary command line argument handling
### So the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )

    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )

    # TODO - add error checking
    x = int( sys.argv[0] )
    y = int( sys.argv[1] )
    width = int( sys.argv[2] )
    height = int( sys.argv[3] )

    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]
    print( EXE + ": watching " + str( screen_rect ) )

    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ):
        image = screenGrab( screen_rect )            # Grab the area of the screen
        text = pytesseract.image_to_string( image )  # OCR the image
        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )
This answer was cobbled together from various other answers on SO.
If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
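A minimal sketch of that rate-limiting, reusing screenGrab and screen_rect from the script above (the half-second pause is just the value suggested here):
import time

while ( True ):
    image = screenGrab( screen_rect )                    # grab the watched area, as above
    text = pytesseract.image_to_string( image ).strip()  # OCR the image
    if ( len( text ) > 0 ):
        print( text )
    time.sleep( 0.5 )   # pause half a second each loop to save CPU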
Tesseract is a single-use command-line application using files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case—more pixels requires more time—and which parameters are provided to tune the OCR engine. Some experimentation may ultimately be required to tune the engine to the exact scenario, but also expect the time required to OCR for a frame may exceed the frame time and a reduction in the frequency of OCR execution may be required, i.e. performing OCR at 10-20 FPS rather than 60+ FPS the game may be running at.
In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5s to 2s using the english fast model with 4 cores (the default) on an aging CPU, however this "complex document" represents the worst-case scenario and makes no assumptions on the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:
Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image around just the health value.
Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --testdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
Use the --psm parameter to prevent page segmentation not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
pytesseract is the best way to get started with implementing Tesseract, and the library can handle image input directly (although it saves the image to a file before passing to Tesseract) and pass the resulting text back using image_to_string(...).
import pytesseract
# Capture frame...
# If the frame requires cropping:
frame = frame[y:y + h, x:x + w]
# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")
# Process the result
health = int(text)
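If you also restrict the character set as suggested above, the config string might look like this (a sketch assuming you only want digits):
# Single text line (--psm 7), digits only
text = pytesseract.image_to_string(
    frame,
    lang="eng_fast",
    config="--psm 7 -c tessedit_char_whitelist=1234567890",
)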
Alright, I was having the same issue as you, so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:
cv2
pytesseract
Pillow (PIL)
numpy
Installation:
To install cv2, simply use this in a command line/command prompt: pip install opencv-python
Installing pytesseract is a little bit harder, as you also need to pre-install Tesseract, which is the program that actually does the OCR. First, follow this tutorial on how to install Tesseract. After that, in a command line/command prompt just use the command: pip install pytesseract
If you don't install this correctly, you will get an error when using the OCR.
To install Pillow use the following command in a command-line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me.
To install NumPy, use the following command in a command-line/command prompt: pip install numpy. Though it's usually already installed in most Python distributions.
Code:
This code was made by me, and as of right now it works how I want it to, with an effect similar to what Michael had. It will take the top left of your screen, record an image of it, and show a window displaying the image it is currently reading with OCR. Then, in the console, it prints out the text that it read on the screen.
# OCR Screen Scanner
# By Dornu Inene
# Libraries that you should have installed
import cv2
import numpy as np
import pytesseract
# We only need the ImageGrab class from PIL
from PIL import ImageGrab
# Run forever unless you press Esc
while True:
    # This instance will generate an image from
    # the point (115, 143) to (569, 283), in the format of (x, y)
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))
    # For us to use cv2.imshow we need to convert the image into a numpy array
    cap_arr = np.array(cap)
    # This isn't really needed for getting the text from a window but
    # it will show the image that it is reading from
    # cv2.imshow() shows a window display and it is using the image that we got,
    # with the array as input
    cv2.imshow("", cap_arr)
    # Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
    # This is the main thing that will collect the text information from that specific area of the window
    text = pytesseract.image_to_string(cap)
    # This just removes spaces from the beginning and ends of the text
    # and makes what it reads cleaner
    text = text.strip()
    # If any text was translated from the image, print it
    if len(text) > 0:
        print(text)
    # This line will break the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break
# This will make sure all windows created by cv2 are destroyed
cv2.destroyAllWindows()
I hope this helped you with what you were looking for, it sure did help me!
I just want to display a Python image object in a wxPython panel.
What I am trying to accomplish is taking a screenshot using Python's ImageGrab or autopy and showing it in a wxPython panel. My screenshot program runs every second, so there is no point in saving the image to a file for wx.
import pyscreenshot as ImageGrab
imgobj = ImageGrab.grab()
print imgobj
# Convert into wximage
myWxImage = wx.EmptyImage( imgobj.size[0], imgobj.size[0] )
myWxImage.SetData( imgobj.convert( 'RGB' ).tostring())
Output
<Image.Image image mode=RGB size=1366x768 at 0x20B98F0>
ValueError:Invalid data buffer size.
It's not easy to tell without a full traceback, but I think the issue might be this typo:
myWxImage = wx.EmptyImage( imgobj.size[0], imgobj.size[0] )
Should be:
myWxImage = wx.EmptyImage( imgobj.size[0], imgobj.size[1] )
Also you could make your code simpler with:
myWxImage = wx.ImageFromBuffer(imgobj.size[0], imgobj.size[1], imgobj.convert('RGB').tostring())
I assume your problem is that you are passing the image width as new height to wx...
And, by the way, you could instead use
myWxImage = wx.ImageFromBuffer( imgobj.size[0], imgobj.size[1], imgobj.tostring() )
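To actually put the grabbed image in a window, here is a minimal sketch assuming wxPython Phoenix and a recent Pillow (hence tobytes() rather than tostring()); the frame title is arbitrary:
import wx
import pyscreenshot as ImageGrab

app = wx.App(False)
frame = wx.Frame(None, title="Screenshot")

# Grab the screen and convert the PIL image into a wx.Image
pil_img = ImageGrab.grab().convert('RGB')
width, height = pil_img.size
wx_img = wx.Image(width, height)
wx_img.SetData(pil_img.tobytes())

# Show the bitmap and run the event loop
wx.StaticBitmap(frame, wx.ID_ANY, wx_img.ConvertToBitmap())
frame.Show()
app.MainLoop()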
I am doing some image editing with the PIL library. The point is that I don't want to save the image to my HDD each time just to view it in Explorer. Is there a small module that simply enables me to set up a window and display the image?
From near the beginning of the PIL Tutorial:
Once you have an instance of the Image class, you can use the methods
defined by this class to process and manipulate the image. For
example, let's display the image we just loaded:
>>> im.show()
Update:
Nowadays the Image.show() method is formally documented in the Pillow fork of PIL along with an explanation of how it's implemented on different OSs.
I tested this and it works fine for me:
from PIL import Image
im = Image.open('image.jpg')
im.show()
You can use pyplot to show images:
from PIL import Image
import matplotlib.pyplot as plt
im = Image.open('image.jpg')
plt.imshow(im)
plt.show() # image will not be displayed without this
If you find that PIL has problems on some platforms, using a native image viewer may help.
img.save("tmp.png") #Save the image to a PNG file called tmp.png.
For MacOS:
import os
os.system("open tmp.png") #Will open in Preview.
For most GNU/Linux systems with X.Org and a desktop environment:
import os
os.system("xdg-open tmp.png")
For Windows:
import os
os.startfile("tmp.png") #Will open in the default image viewer.
Maybe you can use matplotlib for this, you can also plot normal images with it. If you call show() the image pops up in a window. Take a look at this:
http://matplotlib.org/users/image_tutorial.html
You can display an image in your own window using Tkinter, w/o depending on image viewers installed in your system:
import Tkinter as tk
from PIL import Image, ImageTk # Place this at the end (to avoid any conflicts/errors)
window = tk.Tk()
#window.geometry("500x500") # (optional)
imagefile = {path_to_your_image_file}
img = ImageTk.PhotoImage(Image.open(imagefile))
lbl = tk.Label(window, image = img)
lbl.pack()
window.mainloop()
For Python 3, replace import Tkinter as tk with import tkinter as tk.
Yes, PIL.Image.Image.show() is easy and convenient.
But if you want to put images together and do some comparing, then I suggest you use matplotlib. Below is an example:
import PIL
import PIL.IcoImagePlugin
import PIL.Image
import matplotlib.pyplot as plt
with PIL.Image.open("favicon.ico") as pil_img:
    pil_img: PIL.IcoImagePlugin.IcoImageFile  # You can omit this. It helps the IDE know what the object is, so it can hint at methods correctly.
    out_img = pil_img.resize((48, 48), PIL.Image.ANTIALIAS)
    plt.figure(figsize=(2, 1))  # overall figure size; the layout below is 2 rows and 1 column
    plt.subplots_adjust(hspace=1)  # or you can try: plt.tight_layout()
    plt.rc(('xtick', 'ytick'), color=(1, 1, 1, 0))  # set xtick and ytick to transparent
    plt.subplot(2, 1, 1), plt.imshow(pil_img)
    plt.subplot(2, 1, 2), plt.imshow(out_img)
    plt.show()
This is what worked for me:
roses = list(data_dir.glob('roses/*'))  # data_dir is assumed to be a pathlib.Path to your image folder
abc = PIL.Image.open(str(roses[0]))
PIL.Image._show(abc)  # _show is internal; abc.show() is the public equivalent