I've been struggling to come up with a script that allows me to take screenshots of my desktop more than once per every second. I'm using Win10.
PIL:
from PIL import ImageGrab
import time
while True:
im = ImageGrab.grab()
fname = "dropfolder/%s.png" %int(time.time())
im.save(fname,'PNG')
Results 1.01 seconds per image.
PyScreeze (https://github.com/asweigart/pyscreeze):
import pyscreeze
import time
while True:
fname = "dropfolder/%s.png" %int(time.time())
x = pyscreeze.screenshot(fname)
Results 1.00 seconds per image.
Win32:
import win32gui
import win32ui
import win32con
import time
w=1920 #res
h=1080 #res
while True:
wDC = win32gui.GetWindowDC(0)
dcObj=win32ui.CreateDCFromHandle(wDC)
cDC=dcObj.CreateCompatibleDC()
dataBitMap = win32ui.CreateBitmap()
dataBitMap.CreateCompatibleBitmap(dcObj, w, h)
cDC.SelectObject(dataBitMap)
cDC.BitBlt((0,0),(w, h) , dcObj, (0,0), win32con.SRCCOPY)
fname = "dropfolder/%s.png" %int(time.time())
dataBitMap.SaveBitmapFile(cDC, fname)
dcObj.DeleteDC()
cDC.DeleteDC()
win32gui.ReleaseDC(0, wDC)
win32gui.DeleteObject(dataBitMap.GetHandle())
Results 1.01 seconds per image.
Then I stumbled into thread (Fastest way to take a screenshot with python on windows) where it was suggested that gtk would yield phenomenal results.
However using gtk:
import gtk
import time
img_width = gtk.gdk.screen_width()
img_height = gtk.gdk.screen_height()
while True:
screengrab = gtk.gdk.Pixbuf(
gtk.gdk.COLORSPACE_RGB,
False,
8,
img_width,
img_height
)
fname = "dropfolder/%s.png" %int(time.time())
screengrab.get_from_drawable(
gtk.gdk.get_default_root_window(),
gtk.gdk.colormap_get_system(),
0, 0, 0, 0,
img_width,
img_height
).save(fname, 'png')
Results 2.34 seconds per image.
It seems to me like I'm doing something wrong, because people have been getting great results with gtk.
Any advices how to speed up the process?
Thanks!
Your first solution should be giving you more than one picture per second. The problem though is that you will be overwriting any pictures that occur within the same second, i.e. they will all have the same filename. To get around this you could create filenames that include 10ths of a second as follows:
from PIL import ImageGrab
from datetime import datetime
while True:
im = ImageGrab.grab()
dt = datetime.now()
fname = "pic_{}.{}.png".format(dt.strftime("%H%M_%S"), dt.microsecond // 100000)
im.save(fname, 'png')
On my machine, this gave the following output:
pic_1143_24.5.png
pic_1143_24.9.png
pic_1143_25.3.png
pic_1143_25.7.png
pic_1143_26.0.png
pic_1143_26.4.png
pic_1143_26.8.png
pic_1143_27.2.png
In case anyone cares in 2022: You can try my newly created project DXcam: I think for raw speed it's the fastest out there (in python, and without going too deep into the rabbit hole). It's originally created for a deep learning pipeline for FPS games where the higher FPS you get the better. Plus I (am trying to) design it to be user-friendly:
For a screenshot just do
import dxcam
camera = dxcam.create()
frame = camera.grab() # full screen
frame = camera.grab(region=(left, top, right, bottom)) # region
For screen capturing:
camera.start(target_fps=60) # threaded
for i in range(1000):
image = camera.get_latest_frame() # Will block until new frame available
camera.stop()
I copied the part of the benchmarks section from the readme:
DXcam
python-mss
D3DShot
Average FPS
238.79
75.87
118.36
Std Dev
1.25
0.5447
0.3224
The benchmarks is conducted through 5 trials on my 240hz monitor with a constant 240hz rendering rate synced w/the monitor (using blurbuster ufo test).
You can read more about the details here: https://github.com/ra1nty/DXcam
This solution uses d3dshot.
def d3dgrab(rect=(0, 0, 0, 0), spath=r".\\pictures\\cache\\", sname="", title=""):
""" take a screenshot by rect. """
sname = sname if sname else time.strftime("%Y%m%d%H%M%S000.jpg", time.localtime())
while os.path.isfile("%s%s" % (spath, sname)):
sname = "%s%03d%s" % (sname[:-7], int(sname[-7:-4]) + 1, sname[-4:])
xlen = win32api.GetSystemMetrics(win32con.SM_CXSCREEN)
ylen = win32api.GetSystemMetrics(win32con.SM_CYSCREEN)
assert 0 <= rect[0] <= xlen and 0 <= rect[2] <= xlen, ValueError("Illegal value of X coordination in rect: %s" % rect)
assert 0 <= rect[1] <= ylen and 0 <= rect[3] <= ylen, ValueError("Illegal value of Y coordinatoin in rect: %s" % rect)
if title:
hdl = win32gui.FindWindow(None, title)
if hdl != win32gui.GetForegroundWindow():
win32gui.SetForegroundWindow(hdl)
rect = win32gui.GetWindowRect(hdl)
elif not sum(rect):
rect = (0, 0, xlen, ylen)
d = d3dshot.create(capture_output="numpy")
return d.screenshot_to_disk(directory=spath, file_name=sname, region=rect)
I think it can be helped
sname = sname if sname else time.strftime("%Y%m%d%H%M%S000.jpg", time.localtime())
while os.path.isfile("%s%s" % (spath, sname)):
sname = "%s%03d%s" % (sname[:-7], int(sname[-7:-4]) + 1, sname[-4:])
And it's fastest way to take screenshot I found.
Related
I'm trying to add a high FPS screen recorder to my application.
I use Python 3.7 on Windows.
The modules and methods I've tried are mss (python-mss) and d3dshot, but I'm still only achieving 15-19 FPS for a long video (more than 20 seconds).
The resolution I'm recording at is 1920 x 1080.
What is the best way to optimize screen recording? I've tried to use the multiprocessing library, but it seems like it's still not fast enough. I'm not sure I'm using it in the optimal way, what are some ways I could use it to improve processing performance?
Using OBS Studio, I'm able to get 30 FPS, no matter how long the video is. My objective is to achieve the same results with my own code.
Here is what I've written so far:
from multiprocessing import Process, Queue
from time import sleep, time
import cv2
import d3dshot
import numpy as np
def grab(queue):
d = d3dshot.create(capture_output="numpy", frame_buffer_size=500)
d.capture()
sleep(0.1)
c=0
begin = time()
while time() - begin < 30:
starter = time()
frame = d.get_latest_frame()
queue.put(frame)
c+=1
ender = time()
sleep(max(0, 1/60 - (ender -starter)))
# Tell the other worker to stop
queue.put(None)
final=time()
print(c/(final-begin))
d.stop()
def save(queue):
SCREEN_SIZE = 1920, 1080
# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'DIVX') # In Windows: DIVX
out = cv2.VideoWriter(r"output.avi",fourcc, 30.0, (SCREEN_SIZE))
# type: (Queue) -> None
last_img = None
while "there are screenshots":
img = queue.get()
if img is None:
break
if img is last_img:
continue
out.write(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
last_img = img
if __name__ == "__main__":
# The screenshots queue
queue = Queue() # type: Queue
# 2 processes: one for grabing and one for saving PNG files
Process(target=grab, args=(queue,)).start()
Process(target=save, args=(queue,)).start()
The goal is to capture a game, while performing automated keyboard and mouse actions.
I have faced the same problem in trying to get high speed recording for games. This was the fastest solution I was able to find for Windows. The code is using raw buffer objects and leads to around ~27 FPS. I cannot find the original post on which this code is based, but if someone finds it I will add the reference.
Note that the framerate will significantly increase if you make the region smaller than 1920x1080.
"""
Alternative screen capture device, when there is no camera of webcam connected
to the desktop.
"""
import logging
import sys
import time
import cv2
import numpy as np
if sys.platform == 'win32':
import win32gui, win32ui, win32con, win32api
else:
logging.warning(f"Screen capture is not supported on platform: `{sys.platform}`")
from collections import namedtuple
class ScreenCapture:
"""
Captures a fixed region of the total screen. If no region is given
it will take the full screen size.
region_ltrb: Tuple[int, int, int, int]
Specific region that has to be taken from the screen using
the top left `x` and `y`, bottom right `x` and `y` (ltrb coordinates).
"""
__region = namedtuple('region', ('x', 'y', 'width', 'height'))
def __init__(self, region_ltrb=None):
self.region = region_ltrb
self.hwin = win32gui.GetDesktopWindow()
# Time management
self._time_start = time.time()
self._time_taken = 0
self._time_average = 0.04
def __getitem__(self, item):
return self.screenshot()
def __next__(self):
return self.screenshot()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
if exc_type and isinstance(exc_val, StopIteration):
return True
return False
#staticmethod
def screen_dimensions():
""" Retrieve total screen dimensions. """
left = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)
top = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)
height = win32api.GetSystemMetrics(win32con.SM_CYVIRTUALSCREEN)
width = win32api.GetSystemMetrics(win32con.SM_CXVIRTUALSCREEN)
return left, top, height, width
#property
def fps(self):
return int(1 / self._time_average) * (self._time_average > 0)
#property
def region(self):
return self._region
#property
def size(self):
return self._region.width, self._region.height
#region.setter
def region(self, value):
if value is None:
self._region = self.__region(*self.screen_dimensions())
else:
assert len(value) == 4, f"Region requires 4 input, x, y of left top, and x, y of right bottom."
left, top, x2, y2 = value
width = x2 - left + 1
height = y2 - top + 1
self._region = self.__region(*list(map(int, (left, top, width, height))))
def screenshot(self, color=None):
"""
Takes a part of the screen, defined by the region.
:param color: cv2.COLOR_....2...
Converts the created BGRA image to the requested image output.
:return: np.ndarray
An image of the region in BGRA values.
"""
left, top, width, height = self._region
hwindc = win32gui.GetWindowDC(self.hwin)
srcdc = win32ui.CreateDCFromHandle(hwindc)
memdc = srcdc.CreateCompatibleDC()
bmp = win32ui.CreateBitmap()
bmp.CreateCompatibleBitmap(srcdc, width, height)
memdc.SelectObject(bmp)
memdc.BitBlt((0, 0), (width, height), srcdc, (left, top), win32con.SRCCOPY)
signed_ints_array = bmp.GetBitmapBits(True)
img = np.frombuffer(signed_ints_array, dtype='uint8')
img.shape = (height, width, 4)
srcdc.DeleteDC()
memdc.DeleteDC()
win32gui.ReleaseDC(self.hwin, hwindc)
win32gui.DeleteObject(bmp.GetHandle())
# This makes sure that the FPS are taken in comparison to screenshots rates and vary only slightly.
self._time_taken, self._time_start = time.time() - self._time_start, time.time()
self._time_average = self._time_average * 0.95 + self._time_taken * 0.05
if color is not None:
return cv2.cvtColor(img, color)
return img
def show(self, screenshot=None):
""" Displays an image to the screen. """
image = screenshot if screenshot is not None else self.screenshot()
cv2.imshow('Screenshot', image)
if cv2.waitKey(1) & 0xff == ord('q'):
raise StopIteration
return image
def close(self):
""" Needs to be called before exiting when `show` is used, otherwise an error will occur. """
cv2.destroyWindow('Screenshot')
def scale(self, src: np.ndarray, size: tuple):
return cv2.resize(src, size, interpolation=cv2.INTER_LINEAR_EXACT)
def save(self, path, screenshot=None):
""" Store the current screenshot in the provided path. Full path, with img name is required.) """
image = screenshot if screenshot is not None else self.screenshot
cv2.imwrite(filename=path, img=image)
if __name__ == '__main__':
# Example usage when displaying.
with ScreenCapture((0, 0, 1920, 1080)) as capture:
for _ in range(100):
capture.show()
print(f"\rCapture framerate: {capture.fps}", end='')
# Example usage as generator.
start_time = time.perf_counter()
for frame, screenshot in enumerate(ScreenCapture((0, 0, 1920, 1080)), start=1):
print(f"\rFPS: {frame / (time.perf_counter() - start_time):3.0f}", end='')
Edit
I noticed some small mistake in the window show function, and the self.screenshot calls in the __getitem__ and __next__ method. These have been resolved.
Next to the for example using the ScreenCapture as a context manager, I added an example of using it as a generator.
I try to extract each frame from GIF file.
I found two ways to deal with this problem.
1 Find an online tool to solve it.
https://ezgif.com/split
It is an excellent tool. It can redraw the details. Frames extracted from the GIF are in high quality.
2 Try to use python library to solve this problem.
I use PIL, but it comes with a knotty problem. The frame extracted lost many details with white edges.
So I want to ask what is the algorithm EZGif take, and how to implement it with python?
Got a useful reference of your problem.
import os
from PIL import Image
'''
I searched high and low for solutions to the "extract animated GIF frames in Python"
problem, and after much trial and error came up with the following solution based
on several partial examples around the web (mostly Stack Overflow).
There are two pitfalls that aren't often mentioned when dealing with animated GIFs -
firstly that some files feature per-frame local palettes while some have one global
palette for all frames, and secondly that some GIFs replace the entire image with
each new frame ('full' mode in the code below), and some only update a specific
region ('partial').
This code deals with both those cases by examining the palette and redraw
instructions of each frame. In the latter case this requires a preliminary (usually
partial) iteration of the frames before processing, since the redraw mode needs to
be consistently applied across all frames. I found a couple of examples of
partial-mode GIFs containing the occasional full-frame redraw, which would result
in bad renders of those frames if the mode assessment was only done on a
single-frame basis.
Nov 2012
'''
def analyseImage(path):
'''
Pre-process pass over the image to determine the mode (full or additive).
Necessary as assessing single frames isn't reliable. Need to know the mode
before processing all frames.
'''
im = Image.open(path)
results = {
'size': im.size,
'mode': 'full',
}
try:
while True:
if im.tile:
tile = im.tile[0]
update_region = tile[1]
update_region_dimensions = update_region[2:]
if update_region_dimensions != im.size:
results['mode'] = 'partial'
break
im.seek(im.tell() + 1)
except EOFError:
pass
return results
def processImage(path):
'''
Iterate the GIF, extracting each frame.
'''
mode = analyseImage(path)['mode']
im = Image.open(path)
i = 0
p = im.getpalette()
last_frame = im.convert('RGBA')
try:
while True:
print "saving %s (%s) frame %d, %s %s" % (path, mode, i, im.size, im.tile)
'''
If the GIF uses local colour tables, each frame will have its own palette.
If not, we need to apply the global palette to the new frame.
'''
if not im.getpalette():
im.putpalette(p)
new_frame = Image.new('RGBA', im.size)
'''
Is this file a "partial"-mode GIF where frames update a region of a different size to the entire image?
If so, we need to construct the new frame by pasting it on top of the preceding frames.
'''
if mode == 'partial':
new_frame.paste(last_frame)
new_frame.paste(im, (0,0), im.convert('RGBA'))
new_frame.save('%s-%d.png' % (''.join(os.path.basename(path).split('.')[:-1]), i), 'PNG')
i += 1
last_frame = new_frame
im.seek(im.tell() + 1)
except EOFError:
pass
def main():
processImage('foo.gif')
processImage('bar.gif')
if __name__ == "__main__":
main()
GIF transparency in PIL is broken. Not sure why it is that way, but it`s a fact.
You can try using my GIF library instead, for which I`ve just made a Python frontend:
from PIL import Image
def GIF_Load(file):
from platform import system
from ctypes import string_at, Structure, c_long as cl, c_ubyte, \
py_object, pointer, POINTER as PT, CFUNCTYPE, CDLL
class GIF_WHDR(Structure): _fields_ = \
[("xdim", cl), ("ydim", cl), ("clrs", cl), ("bkgd", cl),
("tran", cl), ("intr", cl), ("mode", cl), ("frxd", cl), ("fryd", cl),
("frxo", cl), ("fryo", cl), ("time", cl), ("ifrm", cl), ("nfrm", cl),
("bptr", PT(c_ubyte)), ("cpal", PT(c_ubyte))]
def intr(y, x, w, base, tran): base.paste(tran.crop((0, y, x, y + 1)), w)
def skew(i, r): return r >> ((7 - (i & 2)) >> (1 + (i & 1)))
def WriteFunc(d, w):
cpal = string_at(w[0].cpal, w[0].clrs * 3)
list = d.contents.value
if (len(list) == 0):
list.append(Image.new("RGBA", (w[0].xdim, w[0].ydim)))
tail = len(list) - 1
base = Image.frombytes("L", (w[0].frxd, w[0].fryd),
string_at(w[0].bptr, w[0].frxd * w[0].fryd))
if (w[0].intr != 0):
tran = base.copy()
[intr(skew(y, y) + (skew(y, w[0].fryd - 1) + 1, 0)[(y & 7) == 0],
w[0].frxd, (0, y), base, tran) for y in range(w[0].fryd)]
tran = Image.eval(base, lambda indx: (255, 0)[indx == w[0].tran])
base.putpalette(cpal)
list[tail].paste(base, (w[0].frxo, w[0].fryo), tran)
list[tail].info = {"delay" : w[0].time}
if (w[0].ifrm != (w[0].nfrm - 1)):
list.append(list[max(0, tail - int(w[0].mode == 3))].copy())
if (w[0].mode == 2):
base = Image.new("L", (w[0].frxd, w[0].fryd), w[0].bkgd)
base.putpalette(cpal)
list[tail + 1].paste(base, (w[0].frxo, w[0].fryo))
try: file = open(file, "rb")
except IOError: return []
file.seek(0, 2)
size = file.tell()
file.seek(0, 0)
list = []
CDLL(("%s.so", "%s.dll")[system() == "Windows"] % "gif_load"). \
GIF_Load(file.read(), size,
CFUNCTYPE(None, PT(py_object), PT(GIF_WHDR))(WriteFunc),
None, pointer(py_object(list)), 0)
file.close()
return list
def GIF_Save(file, fext):
list = GIF_Load("%s.gif" % file)
[pict.save("%s_f%d.%s" % (file, indx, fext))
for (indx, pict) in enumerate(list)]
GIF_Save("insert_gif_name_here_without_extension", "png")
I trying to put gradient on image - and that works.CPU and GPU programs should do the same. I have problem with output images because code for GPU giving me diffrent image than code for CPU and I don't know where is mistake. I think that CPU code it's fine but GPU not. Output images - orginal, cpu, gpu - Please check my code. Thanks.
import pyopencl as cl
import sys
import Image
import numpy
from time import time
def gpu_gradient():
if len(sys.argv) != 3:
print "USAGE: " + sys.argv[0] + " <inputImageFile> <outputImageFile>"
return 1
# create context and command queue
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
# load image
im = Image.open(sys.argv[1])
if im.mode != "RGBA":
im = im.convert("RGBA")
imgSize = im.size
buffer = im.tostring() # len(buffer) = imgSize[0] * imgSize[1] * 4
# Create ouput image object
clImageFormat = cl.ImageFormat(cl.channel_order.RGBA,
cl.channel_type.UNSIGNED_INT8)
input_image = cl.Image(ctx,
cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
clImageFormat,
imgSize,
None,
buffer)
output_image = cl.Image(ctx,
cl.mem_flags.WRITE_ONLY,
clImageFormat,
imgSize)
# load the kernel source code
kernelFile = open("gradient.cl", "r")
kernelSrc = kernelFile.read()
# Create OpenCL program
program = cl.Program(ctx, kernelSrc).build()
# Call the kernel directly
globalWorkSize = ( imgSize[0],imgSize[1] )
gpu_start_time = time()
program.gradientcover(queue,
globalWorkSize,
None,
input_image,
output_image)
# Read the output buffer back to the Host
buffer = numpy.zeros(imgSize[0] * imgSize[1] * 4, numpy.uint8)
origin = ( 0, 0, 0 )
region = ( imgSize[0], imgSize[1], 1 )
cl.enqueue_read_image(queue, output_image,
origin, region, buffer).wait()
# Save the image to disk
gsim = Image.fromstring("RGBA", imgSize, buffer.tostring())
gsim.save("GPU_"+sys.argv[2])
gpu_end_time = time()
print("GPU Time: {0} s".format(gpu_end_time - gpu_start_time))
def cpu_gradient():
if len(sys.argv) != 3:
print "USAGE: " + sys.argv[0] + " <inputImageFile> <outputImageFile>"
return 1
gpu_start_time = time()
im = Image.open(sys.argv[1])
if im.mode != "RGBA":
im = im.convert("RGBA")
pixels = im.load()
for i in range(im.size[0]):
for j in range(im.size[1]):
RGBA= pixels[i,j]
RGBA2=RGBA[0],RGBA[1],0,0
pixel=RGBA[0]+RGBA2[0],RGBA[1]+RGBA2[1],RGBA[2],RGBA[3]
final_pixels=list(pixel)
if final_pixels[0]>255:
final_pixels[0]=255
elif final_pixels[1]>255:
final_pixels[1]=255
pixel=tuple(final_pixels)
pixels[i,j]=pixel
im.save("CPU_"+sys.argv[2])
gpu_end_time = time()
print("CPU Time: {0} s".format(gpu_end_time - gpu_start_time))
cpu_gradient()
gpu_gradient()
Kernel code:
const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE |
CLK_ADDRESS_CLAMP |
CLK_FILTER_NEAREST;
__kernel void gradientcover(read_only image2d_t srcImg,
write_only image2d_t dstImg)
{
int2 coord = (int2) (get_global_id(0), get_global_id(1));
uint4 pixel = read_imageui(srcImg, sampler, coord);
uint4 pixel2 = (uint4)(coord.x, coord.y,0,0);
pixel=pixel + pixel2;
if(pixel.x > 255) pixel.x=255;
if(pixel.y > 255) pixel.y=255;
// Write the output value to image
write_imageui(dstImg, coord, pixel);
}
Your CL and Python code do not do the same thing!
RGBA= pixels[i,j]
RGBA2=RGBA[0],RGBA[1],0,0
pixel=RGBA[0]+RGBA2[0],RGBA[1]+RGBA2[1],RGBA[2],RGBA[3]
adds the RG component to the pixel.
uint4 pixel = read_imageui(srcImg, sampler, coord);
uint4 pixel2 = (uint4)(coord.x, coord.y,0,0);
pixel=pixel + pixel2;
adds the X, Y from the coordinates to the pixel.
It is highly likely that this is the cause of difference between your results.
Assuming (from the description) that you want to darkenlighten the image by coordinates, I'd sugest the python code should be:
RGBA= pixels[i,j]
RGBA2=i,j,0,0
instead.
I'm using following code to add watermark to animated GIF images. My problem is that all GIF frames except the first one have incorrect colors in result. Would you know how to fix the color of frames? Thank you.
def add_watermark(in_file, watermark_file, watermark_position, watermark_ratio, out_file, quality=85):
img = Image.open(in_file)
watermark_layer = Image.new('RGBA', img.size, (0,0,0,0))
watermark_img = Image.open(watermark_file).convert('RGBA')
watermark_img.thumbnail((img.size[0]/watermark_ratio, 1000), Image.ANTIALIAS)
alpha = watermark_img.split()[3]
alpha = ImageEnhance.Brightness(alpha).enhance(0.95)
watermark_img.putalpha(alpha)
watermark_layer.paste(watermark_img, count_watermark_position(img, watermark_img, watermark_position))
frames = images2gif.readGifFromPIL(img, False)
frames_out = []
for frame in frames:
frames_out.append(Image.composite(watermark_layer, frame, watermark_layer))
images2gif.writeGif(out_file, frames_out, duration=0.5)
To complete example, i provide also code of helper function:
def count_watermark_position(img, watermark, position):
if position == 'right_bottom':
return img.size[0] - watermark.size[0], img.size[1] - watermark.size[1]
if position == 'center':
return (img.size[0] - watermark.size[0])/2, (img.size[1] - watermark.size[1])/2
if position == 'left_bottom':
return 0, img.size[1] - watermark.size[1]
if position == 'left_top':
return 0, 0
if position == 'right_top':
return img.size[0] - watermark.size[0], 0
raise AttributeError('Invalid position')
Source code of images2gif I 've used - I modified it a little bit to make it work with pillow. See comment at the begining of source code.
I've gotten OpenCV working with Python and I can even detect a face through my webcam. What I really want to do though, is see movement and find the point in the middle of the blob of movement. The camshift sample is close to what I want, but I don't want to have to select which portion of the video to track. Bonus points for being able to predict the next frame.
Here's the code I have currently:
#!/usr/bin/env python
import cv
def is_rect_nonzero(r):
(_,_,w,h) = r
return (w > 0) and (h > 0)
class CamShiftDemo:
def __init__(self):
self.capture = cv.CaptureFromCAM(0)
cv.NamedWindow( "CamShiftDemo", 1 )
self.storage = cv.CreateMemStorage(0)
self.cascade = cv.Load("/usr/local/share/opencv/haarcascades/haarcascade_mcs_upperbody.xml")
self.last_rect = ((0, 0), (0, 0))
def run(self):
hist = cv.CreateHist([180], cv.CV_HIST_ARRAY, [(0,180)], 1 )
backproject_mode = False
i = 0
while True:
i = (i + 1) % 12
frame = cv.QueryFrame( self.capture )
if i == 0:
found = cv.HaarDetectObjects(frame, self.cascade, self.storage, 1.2, 2, 0, (20, 20))
for p in found:
# print p
self.last_rect = (p[0][0], p[0][1]), (p[0][2], p[0][3])
print self.last_rect
cv.Rectangle( frame, self.last_rect[0], self.last_rect[1], cv.CV_RGB(255,0,0), 3, cv.CV_AA, 0 )
cv.ShowImage( "CamShiftDemo", frame )
c = cv.WaitKey(7) % 0x100
if c == 27:
break
if __name__=="__main__":
demo = CamShiftDemo()
demo.run()
Found a solution at How do I track motion using OpenCV in Python?