I am trying to acquire images from my webcam using Python code that imports OpenCV. The code is the following:
import sys
sys.path.append("C:\\opencv\\build\\python\\2.7")
import cv2
import cv2.cv as cv
import time
# Set resolution
cap = cv2.VideoCapture(0)
print "Frame default resolution: (" + str(cap.get(cv.CV_CAP_PROP_FRAME_WIDTH)) + "; " + str(cap.get(cv.CV_CAP_PROP_FRAME_HEIGHT)) + ")"
cap.set(cv.CV_CAP_PROP_FRAME_WIDTH, 800)
cap.set(cv.CV_CAP_PROP_FRAME_HEIGHT, 600)
print "Frame resolution set to: (" + str(cap.get(cv.CV_CAP_PROP_FRAME_WIDTH)) + "; " + str(cap.get(cv.CV_CAP_PROP_FRAME_HEIGHT)) + ")"
# Acquire frame
capture = cv.CreateCameraCapture(0)
img = cv.QueryFrame(capture)
The code works fine, except that the camera's default resolution is 640x480, and my code seems to be able to set only resolution values lower than that. For example, I can set the image size to 320x240, but I can't change it to 800x600. No error appears: the resolution simply stays at the default (640x480) when I try to set a higher value.
The camera I am using (no other webcam is connected to the computer) is the QuickCam V-UBK45: with the software provided by Logitech, I am able to take pictures at full resolution (1280x960) and at all intermediate ones (e.g. 800x600).
Therefore, those frame sizes are supported by the hardware, but my code can't access them.
Does anyone know what I can do?
The problem mentioned above is caused by the camera driver. I was able to fix it by using DirectShow as the backend. I read (sorry, I do not remember where) that almost all cameras provide a driver that allows them to be used through DirectShow. Therefore, I used DirectShow on Windows to interact with the cameras, and I was able to configure the resolution as I wanted and also get the native aspect ratio of my camera (16:9).
You can try this code to see if this works for you:
import cv2
cap = cv2.VideoCapture(0, cv2.CAP_DSHOW) # this is the magic!
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
r, frame = cap.read()
...
print('Resolution: ' + str(frame.shape[1]) + ' x ' + str(frame.shape[0]))  # frame.shape is (height, width, channels)
In the OpenCV documentation, I found the following information for those who want to know more about OpenCV backends (OpenCV docs)
I hope this can help you!
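Because a driver can silently fall back to another size, it may help to check the result after requesting a resolution. Below is a minimal sketch, assuming `cap` is an already opened `cv2.VideoCapture` (for example `cv2.VideoCapture(0, cv2.CAP_DSHOW)` as above). The helper name `request_resolution` is hypothetical, and 3 and 4 are the numeric values of `cv2.CAP_PROP_FRAME_WIDTH` and `cv2.CAP_PROP_FRAME_HEIGHT`, repeated here so the helper can be tried without a camera.

```python
# Numeric values of cv2.CAP_PROP_FRAME_WIDTH / cv2.CAP_PROP_FRAME_HEIGHT,
# repeated so this helper does not need OpenCV imported to be exercised.
CAP_PROP_FRAME_WIDTH = 3
CAP_PROP_FRAME_HEIGHT = 4


def request_resolution(cap, width, height):
    """Ask the capture for width x height and report what was actually applied.

    Returns (reported, actual): `reported` is what cap.get() says after the
    request, `actual` is the (width, height) of a frame read afterwards, or
    None if no frame could be read.
    """
    cap.set(CAP_PROP_FRAME_WIDTH, width)
    cap.set(CAP_PROP_FRAME_HEIGHT, height)
    reported = (int(cap.get(CAP_PROP_FRAME_WIDTH)),
                int(cap.get(CAP_PROP_FRAME_HEIGHT)))
    ok, frame = cap.read()
    if not ok or frame is None:
        return reported, None
    # frame.shape is (rows, columns, channels), i.e. (height, width, channels)
    actual = (frame.shape[1], frame.shape[0])
    return reported, actual
```

For example, `reported, actual = request_resolution(cv2.VideoCapture(0, cv2.CAP_DSHOW), 1280, 720)`; if `actual` is still `(640, 480)`, the backend or driver did not accept the request.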
I looped over the resolutions from the List of common resolutions, setting each one with this function:
def set_res(cap, x,y):
    cap.set(cv.CV_CAP_PROP_FRAME_WIDTH, int(x))
    cap.set(cv.CV_CAP_PROP_FRAME_HEIGHT, int(y))
    return str(cap.get(cv.CV_CAP_PROP_FRAME_WIDTH)),str(cap.get(cv.CV_CAP_PROP_FRAME_HEIGHT))
It seems that OpenCV or my camera allows only certain resolutions.
160.0 x 120.0
176.0 x 144.0
320.0 x 240.0
352.0 x 288.0
640.0 x 480.0
1024.0 x 768.0
1280.0 x 1024.0
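The same loop can be written as a function that returns only the sizes the capture really accepted. This is a sketch: `probe_resolutions` and the `COMMON_RESOLUTIONS` list are illustrative assumptions, and 3 and 4 are the values of `cv2.CAP_PROP_FRAME_WIDTH` and `cv2.CAP_PROP_FRAME_HEIGHT`.

```python
CAP_PROP_FRAME_WIDTH = 3   # value of cv2.CAP_PROP_FRAME_WIDTH
CAP_PROP_FRAME_HEIGHT = 4  # value of cv2.CAP_PROP_FRAME_HEIGHT

# Illustrative list of common resolutions to try; extend as needed.
COMMON_RESOLUTIONS = [
    (160, 120), (176, 144), (320, 240), (352, 288), (640, 480),
    (800, 600), (1024, 768), (1280, 720), (1280, 960), (1280, 1024),
    (1920, 1080),
]


def probe_resolutions(cap, candidates=COMMON_RESOLUTIONS):
    """Try each (width, height) and return the ones the capture accepted.

    A candidate counts as supported only if cap.get() reports exactly the
    requested size after both cap.set() calls.
    """
    supported = []
    for width, height in candidates:
        cap.set(CAP_PROP_FRAME_WIDTH, width)
        cap.set(CAP_PROP_FRAME_HEIGHT, height)
        got = (int(cap.get(CAP_PROP_FRAME_WIDTH)),
               int(cap.get(CAP_PROP_FRAME_HEIGHT)))
        if got == (width, height):
            supported.append(got)
    return supported
```

Call it as `probe_resolutions(cv2.VideoCapture(0))`. Because some drivers snap a request to a nearby mode instead of rejecting it, only exact matches are counted.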
I got it to work, so this post is for others experiencing the same problem:
I am running on the Logitech C270 as well. For some reason it would only show 640x480 even though the webcam supports 1280x720. Same issue persists with the built-in webcam in my laptop.
If I set it to 800x600 in the code it shows 640x480. However, if I set it to 1024x768 it becomes 800x600. And if I set it to something silly like 2000x2000 it becomes 1280x720.
This is in C++ on OpenCV 3.0, but perhaps it applies to Python as well.
Try the following code to obtain the maximum camera resolution; you can then capture photos or video at that resolution:
import cv2
HIGH_VALUE = 10000
WIDTH = HIGH_VALUE
HEIGHT = HIGH_VALUE
capture = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'XVID')
capture.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
print(width,height)
For cv2 just change to this.
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 800)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 600)
OpenCV only accepted these resolutions:
'320.0x240.0': 'OK'
'640.0x480.0': 'OK'
'1280.0x720.0': 'OK'
Source: https://www.learnpythonwithrune.org/find-all-possible-webcam-resolutions-with-opencv-in-python/
These are common resolutions; your camera may support others, so it is worth checking.
If you cannot find a supported resolution, you can also resize the frame with imutils:
import imutils
frame = imutils.resize(frame, width=720)
This rescales the captured frame to the requested width while keeping its aspect ratio. Note that it happens in software and does not change the resolution the camera delivers.
Note: use your required value for the width, then check the resulting size with:
print(frame.shape)
This imutils approach is based purely on my own experience and testing.
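`imutils.resize` only rescales the captured image; as far as I know it computes the new size by fixing one side and scaling the other by the same ratio. A sketch of that arithmetic, using a hypothetical function name `resize_dims`:

```python
def resize_dims(width, height, target_width=None, target_height=None):
    """Return (new_width, new_height) keeping the aspect ratio of width x height.

    Give one target side; the other side is scaled by the same ratio and
    truncated to an int. With no target, the original size is returned.
    """
    if target_width is None and target_height is None:
        return width, height
    if target_width is None:
        ratio = target_height / float(height)
        return int(width * ratio), target_height
    ratio = target_width / float(width)
    return target_width, int(height * ratio)
```

To apply it with OpenCV: `h, w = frame.shape[:2]`, then `frame = cv2.resize(frame, resize_dims(w, h, target_width=720), interpolation=cv2.INTER_AREA)`; `cv2.resize` takes the size as `(width, height)`.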
Related
I am navigating the SDK provided by a microscope camera supplier and am having trouble understanding how the image is processed. For those interested, the camera is this one: http://www.touptek.com/product/showproduct.php?id=285&lang=en.
In the only Python example provided by the supplier, here is how the image is generated and displayed in a Qt GUI:
First, the script finds the camera using the uvcham.py file provided in the SDK.
a = uvcham.Uvcham.enum()
self.hcam = uvcham.Uvcham.open(a[0].id)
Then, the script extracts the resolution, width, height, and bufsize (I don't know what this is):
self.hcam.put(uvcham.UVCHAM_FORMAT, 2) # format: RGB888
res = self.hcam.get(uvcham.UVCHAM_RES)
self.w = self.hcam.get(uvcham.UVCHAM_WIDTH | res)
self.h = self.hcam.get(uvcham.UVCHAM_HEIGHT | res)
bufsize = ((self.w * 24 + 31) // 32 * 4) * self.h
self.buf = bytes(bufsize)
self.hcam.start(self.buf, self.cameraCallback, self)
(see the following chunk of code for self.cameraCallback)
It then emits the image (eventImage is a pyqtSignal())
@staticmethod
def cameraCallback(nEvent, ctx):
    if nEvent == uvcham.UVCHAM_EVENT_IMAGE:
        ctx.eventImage.emit()
and lastly it displays the image in the Qt GUI using the following code:
@pyqtSlot()
def eventImageSignal(self):
    if self.hcam is not None:
        self.total += 1
        self.setWindowTitle('{}: {}'.format(self.camname, self.total))
        img = QImage(self.buf, self.w, self.h, (self.w * 24 + 31) // 32 * 4, QImage.Format_RGB888)
        self.label.setPixmap(QPixmap.fromImage(img))
So, my question is: what if I want to save a video? How can I adapt the different parts of this script to store, properly, the multiple frames generated by the camera (which are displayed here in a Qt GUI) and later turn these frames into a video?
I know how to do such operations with OpenCV, but I think it's different here: the image is generated from its buf, width, and height, plus some calculations that I don't understand.
I tried using OpenCV directly to handle this camera using the following classical function:
video = cv2.VideoCapture(1+cv2.CAP_ANY)
and the problem is that the camera is not stable when handled that way through OpenCV and DirectShow (a few frames sometimes show an X/Y offset and/or color issues). In contrast, the camera and the images it produces are very stable when using the method described in the first part of my post (with the Qt GUI).
Has anyone here worked with this way of generating images from a camera (using resolution, width, height, and buf) and could help me navigate it? My final objective is to record videos with this camera through an automated method, i.e. from code, which is why I need to understand these lines of code rather than use the manual software provided by the supplier.
Thank you in advance for your help
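About the `bufsize` line: `(self.w * 24 + 31) // 32 * 4` is the length of one image row in bytes (24 bits per RGB888 pixel), rounded up to a multiple of 4 bytes, the same row alignment Windows bitmaps use; multiplying by the height gives the size of the whole buffer. The same row length is passed to `QImage` as its bytes-per-line argument. To hand frames to something like `cv2.VideoWriter`, the padding bytes at the end of each row have to be skipped. A minimal sketch in plain Python; the helper names are hypothetical:

```python
def row_stride(width, bits_per_pixel=24):
    """Bytes per image row, rounded up to a multiple of 4 bytes.

    Same expression as the SDK sample: (width * 24 + 31) // 32 * 4,
    i.e. the row size in bits, rounded up to whole 32-bit words, in bytes.
    """
    return (width * bits_per_pixel + 31) // 32 * 4


def unpad_rgb(buf, width, height):
    """Copy the pixel bytes out of a row-padded RGB888 buffer.

    Returns width * height * 3 bytes with the per-row padding removed.
    """
    stride = row_stride(width)
    row_bytes = width * 3
    if len(buf) < stride * height:
        raise ValueError("buffer is smaller than stride * height")
    return b"".join(bytes(buf[r * stride:r * stride + row_bytes])
                    for r in range(height))
```

With NumPy, `np.frombuffer(unpad_rgb(self.buf, self.w, self.h), np.uint8).reshape(self.h, self.w, 3)` gives an RGB array; `cv2.VideoWriter` expects BGR, so convert it with `cv2.cvtColor(..., cv2.COLOR_RGB2BGR)` before calling `write()`. Calling this from `eventImageSignal` would copy each frame as it arrives; whether the SDK can overwrite `self.buf` before the slot runs is an assumption worth checking.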
I am trying to use pygame.camera to take a picture with my computer's camera, but for some reason it always comes out completely black.
After some research I found out the brightness of the camera (camera.get_controls()) is set to -1 and can't be changed with camera.set_controls()
Code:
import pygame
import pygame.camera
# initializing the camera
pygame.camera.init()
# make the list of all available cameras
camlist = pygame.camera.list_cameras()
# if camera is detected or not
if camlist:
    # initializing the cam variable with default camera
    cam = pygame.camera.Camera(camlist[3], (2952, 1944), "RGB")
    # opening the camera
    cam.start()
    # sets the brightness to 1
    cam.set_controls(True, False, 1)
    # capturing the single image
    image = cam.get_image()
    # saving the image
    pygame.image.save(image, str(cam.get_controls()) + ".png")
else:
    print("No camera on current device")
pygame.camera is meant to be a non-blocking API, so that it works well in a game loop.
The problem is that you open the camera and immediately call get_image(). But no image has been generated yet, so it returns a fully black Surface. Maybe it should be changed on pygame’s end to be a blocking call for the first get_image()
On my system, it works if I put a one second delay before get_image().
You can also use the query_image() function to return if a frame is ready.
Note: this may differ between operating systems, as pygame.camera uses different backends on different systems.
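Instead of a fixed delay, one could poll `query_image()` until a frame is ready. A sketch, assuming `cam` is a started `pygame.camera.Camera`; `wait_for_frame` and its timeout values are hypothetical, and its behaviour may differ between backends.

```python
import time


def wait_for_frame(cam, timeout=2.0, poll_interval=0.05):
    """Poll cam.query_image() until a frame is ready, then return get_image().

    Raises TimeoutError if no frame becomes ready within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while not cam.query_image():
        if time.monotonic() >= deadline:
            raise TimeoutError("camera produced no frame in %.1f s" % timeout)
        time.sleep(poll_interval)
    return cam.get_image()
```

`image = wait_for_frame(cam)` would replace the direct `cam.get_image()` call right after `cam.start()`.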
The problem
I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and store it in a variable. For example, if I were playing a game with the capture frame over a resource amount, I want it to print that amount and use it. A perfect example of this is a video by Michael Reeves where, whenever he loses health in a game, the program detects it and signals his Bluetooth-enabled airsoft gun to shoot him. So far I have this:
# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))
while(True):
    x = 760
    y = 968
    ox = 50
    oy = 22
    # screen capture
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)
    cv2.imshow("Screen", frame)
    out.write(frame)
    if cv2.waitKey(1) == 0:
        break
out.release()
cv2.destroyAllWindows()
It captures in real time and displays the result in a window, but I have no clue how to make it recognize the text in every frame and output it.
Any help?
It's fairly simple to grab the screen and pass it to tesseract for OCRing.
The PIL (pillow) library can grab the frames easily on MacOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it).
Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.
#! /usr/bin/env python3
import sys
import pytesseract
from PIL import Image
# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of Pillow don't support ImageGrab on Linux,
    # in which case we will use Xlib
    if ( sys.platform == 'linux' ):
        from Xlib import display, X
        use_grab = False
    else:
        raise ex
def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
    Handles a Linux installation with an older Pillow by falling back
    to using Xlib """
    global use_grab
    x, y, width, height = rect
    if ( use_grab ):
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image
### Do some rudimentary command line argument handling
### so the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )
    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )
    # TODO - add error checking
    x = int( sys.argv[0] )
    y = int( sys.argv[1] )
    width = int( sys.argv[2] )
    height = int( sys.argv[3] )
    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]
    print( EXE + ": watching " + str( screen_rect ) )
    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ):
        image = screenGrab( screen_rect )              # Grab the area of the screen
        text = pytesseract.image_to_string( image )   # OCR the image
        # If the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )
This answer was cobbled together from various other answers on SO.
If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
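A rate limiter for that loop could look like the sketch below. `run_rate_limited` is a hypothetical helper, not part of pytesseract; it sleeps only for the part of the interval the OCR call did not already use, so a slow OCR pass is not delayed further.

```python
import time


def run_rate_limited(task, min_interval=0.5, iterations=None):
    """Call task() repeatedly, starting calls at most once per min_interval seconds.

    iterations=None loops forever; otherwise returns the number of calls made.
    """
    count = 0
    while iterations is None or count < iterations:
        started = time.monotonic()
        task()
        count += 1
        elapsed = time.monotonic() - started
        if elapsed < min_interval:
            # Only sleep for the remainder of the interval
            time.sleep(min_interval - elapsed)
    return count
```

In the script above, the body of the `while ( True )` loop could move into a function (say `ocr_once()`, hypothetical) and be run with `run_rate_limited(ocr_once, 0.5)`.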
Tesseract is a single-use command-line application that uses files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case (more pixels require more time) and on which parameters are provided to tune the OCR engine. Some experimentation may be required to tune the engine to the exact scenario, but also expect that the time required to OCR a frame may exceed the frame time, so the frequency of OCR execution may need to be reduced, i.e. performing OCR at 10-20 FPS rather than the 60+ FPS the game may be running at.
In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5s to 2s using the English fast model with 4 cores (the default) on an aging CPU; however, this "complex document" represents the worst-case scenario and makes no assumptions about the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:
Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image around just the health value.
Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --tessdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
Use the --psm parameter to prevent page segmentation not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
pytesseract is the best way to get started with implementing Tesseract, and the library can handle image input directly (although it saves the image to a file before passing to Tesseract) and pass the resulting text back using image_to_string(...).
import pytesseract
# Capture frame...
# If the frame requires cropping:
frame = frame[y:y + h, x:x + w]
# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")
# Process the result
health = int(text.strip())
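The `--psm` and whitelist options can be assembled into the `config` string, and the OCR output parsed defensively, since `int(text)` raises `ValueError` when Tesseract returns nothing or stray characters. A sketch; `tesseract_config` and `parse_number` are hypothetical helpers:

```python
import re


def tesseract_config(psm=7, whitelist=None):
    """Build a pytesseract config string from a page segmentation mode and
    an optional character whitelist."""
    parts = ["--psm %d" % psm]
    if whitelist:
        parts.append("-c tessedit_char_whitelist=%s" % whitelist)
    return " ".join(parts)


def parse_number(text):
    """Return the first integer found in OCR output, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None
```

For example, `health = parse_number(pytesseract.image_to_string(frame, config=tesseract_config(7, "0123456789")))` yields `None` instead of raising when no digits are recognized.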
Alright, I was having the same issue as you so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:
cv2
pytesseract
Pillow(PIL)
numpy
Installation:
To install cv2, simply use this in a command line/command prompt: pip install opencv-python
Installing pytesseract is a little harder because you also need to install Tesseract first, which is the program that actually does the OCR. First, follow this tutorial on how to install Tesseract. After that, run this command in a command line/command prompt: pip install pytesseract
If Tesseract is not installed correctly, you will get an error when running the OCR.
To install Pillow, use the following command in a command line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me.
To install NumPy, use the following command in a command line/command prompt: pip install numpy. It is often already installed in most Python environments.
Code:
I wrote this code, and right now it works the way I want it to, with an effect similar to Michael's. It grabs a region near the top left of your screen, shows a window with the image it is currently reading with OCR, and prints the text it read to the console.
# OCR Screen Scanner
# By Dornu Inene
# Libraries that you should have installed
import cv2
import numpy as np
import pytesseract
# We only need the ImageGrab class from PIL
from PIL import ImageGrab
# Run forever unless you press Esc
while True:
    # This instance will generate an image from
    # the point of (115, 143) and (569, 283) in format of (x, y)
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))
    # For us to use cv2.imshow we need to convert the image into a numpy array;
    # ImageGrab returns RGB while OpenCV displays BGR, so swap the channels too
    cap_arr = cv2.cvtColor(np.array(cap), cv2.COLOR_RGB2BGR)
    # This isn't really needed for getting the text from a window, but
    # it shows the image that the text is being read from
    cv2.imshow("", cap_arr)
    # Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
    # This is the main thing that will collect the text information from that specific area of the window
    text = pytesseract.image_to_string(cap)
    # Remove whitespace from the beginning and end of the text
    # so that the output is cleaner
    text = text.strip()
    # If any text was recognized in the image, print it
    if len(text) > 0:
        print(text)
    # This line will break the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break
# This will make sure all windows created by cv2 are destroyed
cv2.destroyAllWindows()
I hope this helped you with what you were looking for, it sure did help me!
I just got a high-end 1080p webcam. Opening it in the Windows 10 "Camera" app displays it flawlessly at 25 or 30 fps; however, when using OpenCV it's very laggy. I put a timer in the loop while disabling the display, and I get around 200 ms between frames.
Why?
import numpy as np
import cv2
import time
def getAvailableCameraIds(max_to_test):
    available_ids = []
    for i in range(max_to_test):
        temp_camera = cv2.VideoCapture(i)
        if temp_camera.isOpened():
            temp_camera.release()
            print "found camera with id {}".format(i)
            available_ids.append(i)
    return available_ids
def displayCameraFeed(cameraId, width, height):
    cap = cv2.VideoCapture(cameraId)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    while(True):
        start = time.time()
        ret, frame = cap.read()
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        end = time.time()
        print "time to read a frame : {} seconds".format(end-start)
        #DISABLED
        #cv2.imshow('frame', frame)
        #if cv2.waitKey(1) & 0xFF == ord('q'):
        #    break
    cap.release()
    cv2.destroyAllWindows()
#print getAvailableCameraIds(100)
displayCameraFeed(0, 1920, 1080)
Thanks
Opencv 3.1 on a windows 10 x64, with python 2.7 x64
I've faced the same problem on my linux system where I had 150ms delay between frames. In my case, the problem was that the Auto Exposure feature of the camera was ON, which increased exposure times, causing the delay.
Turning OFF auto exposure reduced delay to 49~51 ms
Here is a link from OBSProject that talks about it https://obsproject.com/forum/threads/getting-the-most-out-of-your-webcam.1036/
I'm not sure how you'd do this on a Windows machine; a Google search revealed that changing it in Skype's settings changes it globally. (If your camera came with bundled software, you could probably change it there as well.)
On a Linux machine, running v4l2-ctl --list-ctrls lists the features of your camera that you can modify.
I set exposure_auto_priority (bool) to 0, which stops auto exposure from lowering the frame rate to get longer exposure times.
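On Linux the same change can be scripted by calling `v4l2-ctl` from Python. This is a sketch under assumptions: `v4l-utils` is installed, the camera is `/dev/video0`, and the control is named `exposure_auto_priority` as in this answer; control names vary between drivers and kernel versions, so check `v4l2-ctl --list-ctrls` first. Both helper names are hypothetical.

```python
import subprocess


def v4l2_set_ctrl_cmd(device, **controls):
    """Build a v4l2-ctl command line that sets one or more camera controls."""
    settings = ",".join("%s=%s" % (name, value) for name, value in controls.items())
    return ["v4l2-ctl", "-d", device, "--set-ctrl=" + settings]


def set_camera_controls(device, **controls):
    """Run v4l2-ctl (Linux only); raises CalledProcessError if it fails."""
    subprocess.run(v4l2_set_ctrl_cmd(device, **controls), check=True)
```

Usage: `set_camera_controls("/dev/video0", exposure_auto_priority=0)`, run before opening the camera with OpenCV.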
For me, this did the trick on Windows 10 with a Logitech C922.
The order in which the methods are called seems to have an impact.
(Here cv is the cv2 module, imported with import cv2 as cv.)
cap = cv.VideoCapture(camera_index + cv.CAP_DSHOW)
cap.set(cv.CAP_PROP_FRAME_WIDTH, 1920)
cap.set(cv.CAP_PROP_FRAME_HEIGHT, 1080)
cap.set(cv.CAP_PROP_FPS, 30)
cap.set(cv.CAP_PROP_FOURCC, cv.VideoWriter_fourcc(*'MJPG'))
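To confirm that the settings took effect, one can apply them as an ordered list and read them back; `cap.get(cv.CAP_PROP_FOURCC)` returns the codec as a packed integer, which can be decoded to text. A sketch: `configure`, `fourcc_code` and `fourcc_text` are hypothetical helpers, and the numeric ids are the values of the corresponding `cv2.CAP_PROP_*` constants, repeated so the code runs without OpenCV installed.

```python
# Values of cv2.CAP_PROP_FRAME_WIDTH, _FRAME_HEIGHT, _FPS and _FOURCC
CAP_PROP_FRAME_WIDTH = 3
CAP_PROP_FRAME_HEIGHT = 4
CAP_PROP_FPS = 5
CAP_PROP_FOURCC = 6


def fourcc_code(text):
    """Pack a 4-character code the same way cv2.VideoWriter_fourcc does."""
    a, b, c, d = (ord(ch) for ch in text)
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_text(code):
    """Unpack a FOURCC value (e.g. from cap.get(CAP_PROP_FOURCC)) to text."""
    code = int(code)
    return "".join(chr((code >> (8 * i)) & 0xFF) for i in range(4))


def configure(cap, settings):
    """Apply (prop, value) pairs in the given order; return what cap.get reports."""
    for prop, value in settings:
        cap.set(prop, value)
    return [cap.get(prop) for prop, _ in settings]
```

`configure(cap, [(CAP_PROP_FRAME_WIDTH, 1920), (CAP_PROP_FRAME_HEIGHT, 1080), (CAP_PROP_FPS, 30), (CAP_PROP_FOURCC, fourcc_code("MJPG"))])` mirrors the order used above; `fourcc_text(cap.get(CAP_PROP_FOURCC))` should then read `"MJPG"` if the driver accepted it.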
I've shown various ways to take images with a webcam in Python (see How can I take camera images with Python?). You can see that the images taken with Python are considerably darker than images taken with JavaScript. What is wrong?
Image example
The image on the left was taken with http://martin-thoma.com/html5/webcam/, the one on the right with the following Python code. Both were taken under the same (controlled) lighting conditions (it was dark outside and I only had some electric lights on) and with the same webcam.
Code example
import cv2
camera_port = 0
camera = cv2.VideoCapture(camera_port)
return_value, image = camera.read()
cv2.imwrite("opencv.png", image)
del(camera) # so that others can use the camera as soon as possible
Question
Why is the image taken with Python image considerably darker than the one taken with JavaScript and how do I fix it?
(Getting a similar image quality; simply making it brighter will probably not fix it.)
Note to the "how do I fix it": It does not need to be opencv. If you know a possibility to take webcam images with Python with another package (or without a package) that is also ok.
Faced the same problem. I tried this and it works.
import cv2
camera_port = 0
# Number of frames to discard while the camera adjusts to the light
ramp_frames = 30
camera = cv2.VideoCapture(camera_port)
def get_image():
    retval, im = camera.read()
    return im
# Discard the first frames so the camera can adjust its exposure
for i in range(ramp_frames):
    temp = camera.read()
camera_capture = get_image()
filename = "image.jpg"
cv2.imwrite(filename,camera_capture)
del(camera)
I think it's about the camera adjusting to the light.
[Image: former and later images]
I think that you have to wait for the camera to be ready.
This code works for me:
from SimpleCV import Camera
import time
cam = Camera()
time.sleep(3)
img = cam.getImage()
img.save("simplecv.png")
I took the idea from this answer and this is the most convincing explanation I found:
The first few frames are dark on some devices because it's the first
frame after initializing the camera and it may be required to pull a
few frames so that the camera has time to adjust brightness
automatically.
reference
So, IMHO, to be sure about the quality of the image, regardless of the programming language, it is necessary at the startup of a camera device to wait a few seconds and/or discard a few frames before taking an image.
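Both approaches (waiting a few seconds and discarding frames) can be combined. A minimal sketch, assuming `cap` is an opened `cv2.VideoCapture`; `warm_up` and its default of 30 frames (taken from the `ramp_frames` value used in these answers) are illustrative, not values verified for any particular camera:

```python
import time


def warm_up(cap, frames=30, min_seconds=0.0):
    """Discard frames until at least `frames` reads and `min_seconds` have passed.

    Returns the number of frames discarded; stops early if a read fails.
    """
    start = time.monotonic()
    discarded = 0
    while discarded < frames or time.monotonic() - start < min_seconds:
        ok, _ = cap.read()
        if not ok:
            break
        discarded += 1
    return discarded
```

`warm_up(camera, frames=30, min_seconds=2.0)` followed by `camera.read()` gives the frame to save.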
Tidying up Keerthana's answer results in my code looking like this
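import cv2
def main():
    capture = capture_write()
def capture_write(filename="image.jpeg", port=0, ramp_frames=30, x=1280, y=720):
    camera = cv2.VideoCapture(port)
    # Set resolution (CAP_PROP_FRAME_WIDTH / HEIGHT are properties 3 and 4)
    camera.set(cv2.CAP_PROP_FRAME_WIDTH, x)
    camera.set(cv2.CAP_PROP_FRAME_HEIGHT, y)
    # Adjust camera lighting by discarding the first frames
    for i in range(ramp_frames):
        camera.read()
    retval, im = camera.read()
    # Release the camera so that others can use it as soon as possible
    camera.release()
    if not retval:
        return False
    cv2.imwrite(filename, im)
    return True
if __name__ == '__main__':
    main()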