I want the ability to capture my screen (ideally from a specific application) using Python. Is it possible to do this?
If not, is it possible to build a GUI in Python that is transparent so I can drag it over a pre-existing application and visually capture what's behind it?
Can cv2 perform this? I'm not sure what to pass to "VideoCapture", though.
import cv2
video_capture = cv2.VideoCapture(....)
I have found solutions online that capture the entire screen. Is it possible to select a specific window/application for capturing?
Related
I am trying to stream live video from an external camera using cv2. I was able to write simple code to stitch the frames and stream them, but I am struggling to find out how to change the camera.
I tried running it after disabling the main webcam from the task manager, but it still did not work.
Any pointers on how to select a different camera would be a great help.
Cameras are numbered on Windows. You can try a few indices and check which index belongs to the camera you want.
capture = cv2.VideoCapture(index)
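One way to find the right index is to try a handful of them and keep the ones that open and actually deliver a frame. The following is only a sketch: `probe_cameras` and its `open_capture` parameter are hypothetical names introduced here, the upper bound of 5 is arbitrary, and it assumes OpenCV is installed when the default opener is used.

```python
def probe_cameras(max_index=5, open_capture=None):
    """Return the indices in range(max_index) that open and deliver a frame."""
    if open_capture is None:
        import cv2  # imported lazily; cv2.VideoCapture is the default opener
        open_capture = cv2.VideoCapture

    working = []
    for index in range(max_index):
        capture = open_capture(index)
        try:
            if capture.isOpened():
                # Some devices open successfully but never deliver a frame.
                ok, _frame = capture.read()
                if ok:
                    working.append(index)
        finally:
            # Always free the device before trying the next index.
            capture.release()
    return working
```

Calling `probe_cameras()` with the external camera plugged in should list the usable indices; the one that is not the built-in webcam is the value to pass to cv2.VideoCapture.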
I have been trying to make a program that notifies me when a number changes in an app. I have been using ImageGrab and then pytesseract, which works, but I can only figure out how to take the screenshot while the number is visible on screen. It would be very nice if there were a way to capture the app while it is minimized (not visible on the screen) so I could work on other things as it runs. I also only need a certain part of the app, so I need to specify a bounding box within the app's window for the capture.
This is what I am currently using which takes a certain part of the whole screen:
img = ImageGrab.grab(bbox=(1400,875,1445,905))
I think there might be a way to do it with Quartz, but I could not find out how to capture a region of a background app.
It is possible to take screenshots of windows that are running in the background with the screencapture command-line tool on macOS. You can see the documentation by typing man screencapture in the terminal.
For your use case it would look something like this:
screencapture -l <windowId> -R <x,y,w,h> <outputFile>
As you mentioned, you can use Quartz to find the desired windowId:
from Quartz import CGWindowListCopyWindowInfo, kCGNullWindowID, kCGWindowListOptionAll
windowName = 'Desktop'  # change this to the window you are looking for
def findWindowId():
    windowList = CGWindowListCopyWindowInfo(
        kCGWindowListOptionAll, kCGNullWindowID)
    for window in windowList:
        if windowName.lower() in window.get('kCGWindowName', '').lower():
            return window['kCGWindowNumber']
    return None
You can find a fully working example here
Note: From my testing, if you minimise (cmd + m) or hide (cmd + h) a window, taking a screenshot of it will only capture the moment before it was hidden. You would need to keep the window open for it to work, but it is fine to keep it behind other windows. Tested on macOS v10.15.7.
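To drive this from Python, the command can be assembled and run with subprocess. This is a rough sketch rather than the linked example: `build_screencapture_cmd` and `capture_window` are hypothetical helper names, the `-x` flag (no shutter sound) is an addition, and whether `-l` and `-R` combine the way you need on your macOS version is worth checking against man screencapture.

```python
import subprocess


def build_screencapture_cmd(window_id, out_path, region=None):
    """Build the argument list for the macOS screencapture tool."""
    cmd = ["screencapture", "-x", "-l", str(window_id)]  # -x: do not play sounds
    if region is not None:
        x, y, w, h = region  # same order as the -R <x,y,w,h> flag
        cmd += ["-R", f"{x},{y},{w},{h}"]
    cmd.append(out_path)  # the output file is the last argument
    return cmd


def capture_window(window_id, out_path, region=None):
    """Run screencapture; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_screencapture_cmd(window_id, out_path, region), check=True)
```

With the function above, a call such as `capture_window(findWindowId(), "number.png", region=(1400, 875, 45, 30))` would write a file that can then be opened with PIL and passed to pytesseract. Note that the region is given as x, y, width, height, whereas the ImageGrab bbox is left, top, right, bottom.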
What I want is for a user to click the close "X" button in an OpenCV window and have the program recognize it and close that window.
It seems that this is not easy. After four days of going round in circles and finding out how it can be done on a Windows machine, I am no closer to finding out how to do it on a Raspberry Pi using Python.
I think I need to get the handle of the OpenCV window ( how? ) and then use that to see if the window is still visible ( what call? ) and if it is not, bring proceedings to a halt ( I can do that bit ).
I have tried cvGetWindowHandle("window_name"), but I've downloaded the source and GetWindowHandle doesn't seem to be available from Python.
The code to capture the left button mouse click event and close a window is fairly simple:
if event == cv2.EVENT_LBUTTONDOWN:
    cv2.destroyWindow("window_name")
There is a tutorial on how to use the button click event here, which is where I took that code from; it provides a full working example in Python.
However, you are probably running a Unix-based system on your RPi, so you will want to read this answer, as you may need to combine it with waitKey(1) for it to work.
I may have a solution, but I'm not 100% sure, so you'll have to check it yourself :) I assume that OpenCV uses X11 underneath (if not, none of this makes sense). With X11 you can:
1) Find X11 window handle for your OpenCV window as described here
2) Use XSelectInput to hook into its event loop, somewhat similar to what was done here. I assume you should use StructureNotifyMask as the mask to get the XDestroyWindowEvent event. Run the X11 event loop, and as soon as you get the corresponding event you can call the OpenCV destroyWindow function.
This suggestion is based on assumptions and I can't give any guarantees it will work, but as far as I understand if OpenCV isn't built with some other specific window manager this should work. As far as I understand Raspbian was shipped with X11 up to some point and then it switched to Wayland. In case you have an image with Wayland then this probably will not work (and I'm sorry but my Linux skills do not contain a recipe on how to determine which one is used:D).
UPDATE
Actually, after more reading, it seems that GTK should be able to handle whatever is being used underneath (X11/Wayland). So if you install the GTK development libraries, you should also be able to connect to the window's deletion signal as described here. The only remaining question is how to obtain the window handle.
My personal advice - use Qt or some other GUI friendly framework to render the OpenCV images instead of doing it directly with OpenCV. OpenCV is an imaging framework but IMHO highgui is too unusable for anything serious.
"all I want to do is to have a user click the close X in an openCV window"
This is how I did it, in a capture loop (RPi stretch, opencv 4.0):
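import cv2 as cv
while True:
    # do your video capture
    # ...
    cv.imshow("video frame", frame)
    cv.waitKey(1)  # lets HighGUI process window events, including the close button
    # property 1 is WND_PROP_AUTOSIZE; the call returns -1 once the window is closed
    if cv.getWindowProperty("video frame", 1) < 0:
        break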
getWindowProperty isn't well documented, but as its name implies, it returns a property of a given window. Two of the flags of interest are WND_PROP_FULLSCREEN (or 0) and WND_PROP_AUTOSIZE (or 1). When the window is closed, the function returns -1. Use this to immediately break your loop (or close your window if not in a loop).
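A variation on the same idea is to query the visibility property instead of the autosize one. The helper below is a sketch under some assumptions: `window_closed` and its `get_property` parameter are hypothetical names, it relies on WND_PROP_VISIBLE (value 4) being available in your OpenCV build, and the exact return value after a close may differ between GUI backends.

```python
WND_PROP_VISIBLE = 4  # same numeric value as cv2.WND_PROP_VISIBLE


def window_closed(name, get_property=None):
    """Return True once the HighGUI window `name` is no longer shown."""
    if get_property is None:
        import cv2  # imported lazily; cv2.getWindowProperty is the default query
        get_property = cv2.getWindowProperty
    try:
        # A visible window reports 1; a closed one reports 0 or -1.
        return get_property(name, WND_PROP_VISIBLE) < 1
    except Exception:
        # Assumption: some backends may raise for a window that no longer exists.
        return True
```

Inside the capture loop this would replace the raw property check, for example `if window_closed("video frame"): break`, placed after the imshow and waitKey calls.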
References:
https://docs.opencv.org/3.1.0/d7/dfc/group__highgui.html#gaaf9504b8f9cf19024d9d44a14e461656
OpenCV Python: How to detect if a window is closed?
Poll with cv2.getWindowImageRect(windowName). It will return (-1, -1, -1, -1) when the user clicks the window close button.
# check if window was closed or image was resized
xPos, yPos, width, height = cv2.getWindowImageRect(windowName)
if xPos == -1:  # if user closed window
    pass  # do whatever you want here if the user clicked CLOSE
I haven't found this documented anywhere; discovered it by accident while handling window resizing. (Tested with OpenCV 4.1.0.)
I have a Python script that, when executed, waits until it gets input from the user. Is it possible to keep showing an image fullscreen until the user has given that input? I have searched for a solution, but all I can find are tools that rely on a window manager to show the picture, and no window manager is installed. It'll probably only run on Debian.
I'm kind of searching for the same idea as omxplayer, but instead of movies it has to display pictures.
Using pygame is probably the easiest way of displaying an image fullscreen on the Linux framebuffer or on the X Windows root window (i.e. without a window manager).
The answers to the question Frame buffer module of python have all the details on how to achieve this.
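As a rough illustration of the pygame approach, the sketch below shows an image fullscreen, scaled to fit, until a key is pressed. It assumes that the "input" can be a key press read by pygame and that pygame is able to open a display on the machine (X or the framebuffer); `fit_size` and `show_until_keypress` are hypothetical names introduced here.

```python
def fit_size(img_w, img_h, screen_w, screen_h):
    """Largest size with the image's aspect ratio that fits on the screen."""
    scale = min(screen_w / img_w, screen_h / img_h)
    return max(1, int(img_w * scale)), max(1, int(img_h * scale))


def show_until_keypress(path):
    """Show the image at `path` fullscreen until a key is pressed."""
    import pygame  # imported lazily so fit_size can be used without pygame

    pygame.init()
    try:
        # (0, 0) asks pygame to use the current display resolution.
        screen = pygame.display.set_mode((0, 0), pygame.FULLSCREEN)
        pygame.mouse.set_visible(False)
        image = pygame.image.load(path).convert()
        size = fit_size(*image.get_size(), *screen.get_size())
        image = pygame.transform.scale(image, size)
        # Centre the scaled image on a black background.
        rect = image.get_rect(center=screen.get_rect().center)
        screen.fill((0, 0, 0))
        screen.blit(image, rect)
        pygame.display.flip()
        # Block until the user presses a key or the display is closed.
        while True:
            event = pygame.event.wait()
            if event.type in (pygame.KEYDOWN, pygame.QUIT):
                break
    finally:
        pygame.quit()
```

If the script reads its input some other way (for example from a serial port or a socket), the waiting loop at the end would have to be replaced by that check.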
What's the simplest way in Ubuntu 11.10 to programmatically guide (either from Bash or Python) the user to capture a webcam photo of themselves?
I can launch a simple app like Cheese, but I don't see an easy way to immediately detect or retrieve the photo it captures. I can also access and record the webcam stream directly via OpenCV, but I'd have to reinvent the GUI to communicate with the user.
Is there any kind of script that's a happy medium, where I can launch it, and it prints on stdout the filename of the image the user took?
I like using pygame for that. It does not require you to open a pygame SDL window, unlike when you want to use it to capture keyboard events, for example.
import pygame.camera
import pygame.image
pygame.camera.init()
# use the first camera that pygame reports
cam = pygame.camera.Camera(pygame.camera.list_cameras()[0])
cam.start()
img = cam.get_image()  # grab one frame as a pygame Surface
cam.stop()
pygame.image.save(img, "photo.bmp")
pygame.camera.quit()
Note that "bmp" output is uncompressed. pygame.image.save picks the format from the file extension, and pygame 1.8 or later can also write PNG and JPEG; for other formats you may want to combine it with PIL.
If you want to do this via Python, it looks like you have a few options. The Pygame library has the ability to access cameras.
If that's unsatisfactory, you can go much lower level and access the Video4Linux2 (V4L2) API directly through ioctl calls made with Python's fcntl module.