OpenCV fails to create video from images - python

This is my first attempt at making a video file and I seem to be very clumsy.
Inspired by these instructions to put several images in a single video, I modified the code by creating a function that can loop through the folder with the images. But it is taking too long. I thought it was because there are many images, but even if I only use two images to do it, it still runs forever.
I get no error message, the script just never stops.
Could anybody please explain what is wrong with my code? There must be something silly which I didn't spot and is making it an infinite loop or something...
import cv2
import os

forexample = "C:/Users/me/Pictures/eg/"
eg = cv2.imread(forexample + 'figure.jpg')
height, width, layers = eg.shape
print "ok, got that"

def makeVideo(imgPath, videodir, videoname, width, height):
    for img in os.listdir(imgPath):
        video = cv2.VideoWriter(videodir + videoname, -1, 1, (width, height))
        shot = cv2.imread(img)
        video.write(shot)
    print "one video done"

myexample = makeVideo(forexample, forexample, "example.avi", width, height)
cv2.destroyAllWindows()
myexample.release()
Running on a Windows machine, Python 2.7.12, cv2 3.3.0
UPDATE
Eventually created the video using FFmpeg.

Inside your for-loop you create a new VideoWriter for every frame, each time with the same filename. Every iteration therefore overwrites the file with a single new frame.
So you have to create the VideoWriter object before entering the for-loop.
But that alone will not make your code work; there are some other errors caused by misused commands.
First, os.listdir(path) returns a list of file names, not file paths. You therefore need to prepend the folder path to each file name when you call the file-read function (cv2.imread(imgPath+img)).
cv2.VideoWriter() creates the video file in that same folder, so it will also show up in os.listdir(path). You need to skip any files other than the image files you want, which can be done by checking the file extension.
After writing all the frames to the video, you need to call the release() function to release the file handle.
And finally, the makeVideo() function does not return anything, so there is no need to assign its result to a variable. (What you have to release() is the file handle, not the function's return value, as mentioned above.)
Try the following code:
import cv2
import os

forexample = "C:/Users/me/Pictures/eg/"
eg = cv2.imread(forexample + 'figure.jpg')
height, width, layers = eg.shape
print("ok, got that ", height, " ", width, " ", layers)

def makeVideo(imgPath, videodir, videoname, width, height):
    video = cv2.VideoWriter(videodir + videoname, -1, 1, (width, height))
    for img in os.listdir(imgPath):
        if not img.endswith('.jpg'):
            continue
        shot = cv2.imread(imgPath + img)
        video.write(shot)
    video.release()
    print("one video done")

makeVideo(forexample, forexample, "example.avi", width, height)
cv2.destroyAllWindows()
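Note that the -1 fourcc argument asks OpenCV to pick a codec (on Windows it typically pops up a codec-selection dialog, and elsewhere it may fail silently). If that gives you trouble, here is a sketch using an explicit FourCC instead; XVID is an assumption, any codec actually installed on your system will do:

fourcc = cv2.VideoWriter_fourcc(*'XVID')  # or 'MJPG', 'mp4v', ... whatever is installed
video = cv2.VideoWriter(videodir + videoname, fourcc, 1, (width, height))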

Related

Why is cv2.namedWindow not working?

I tried to start my code, but the process stops at the cv2.namedWindow call (without any errors).
Do you have any suggestions as to why this could be?
import cv2

image_cv2 = cv2.imread('/home/spartak/PycharmProjects/python_base/lesson_016/python_snippets/external_data/girl.jpg')

def viewImage(image, name_of_window):
    print('step_1')
    cv2.namedWindow(name_of_window, cv2.WINDOW_NORMAL)
    print('step_2')
    cv2.imshow(name_of_window, image)
    print('step_3')
    cv2.waitKey(0)
    print('step_4')
    cv2.destroyAllWindows()

cropped = image_cv2
viewImage(cropped, 'Cropped version')
P.S.:
I also erased Ubuntu and installed Fedora, and checked the program in VS Code instead of PyCharm, but nothing changed.
I also moved the picture (girl.jpg) to the same directory as the Python file, but the program still stops at step_1 and waits for something.
I found out the problem: I was running this code in a virtual environment. Apparently, OpenCV in a virtual environment on Ubuntu/Fedora has restrictions.
The code completes all 4 steps for me
I think there is a problem with the image path you used.
The function cv2.namedWindow creates a window that can be used as a placeholder for images and trackbars. Created windows are referred to by their names. If a window with the same name already exists, the function does nothing.
I think the problem is the function cv2.destroyAllWindows: it deallocates some needed GUI elements, I would guess, so calling it multiple times (once per viewed image) will produce some trouble.
import cv2

image_cv2 = cv2.imread('foo.jpg')

def viewImage(image, name_of_window):
    print('step_1')
    cv2.namedWindow(name_of_window)
    print('step_2')
    cv2.imshow(name_of_window, image)
    print('step_3')
    cv2.waitKey(0)
    # print('step_4')
    # cv2.destroyAllWindows()

cropped = image_cv2
viewImage(cropped, 'Cropped version')

# e.g. better usage: once at the end of the code or section
cv2.destroyAllWindows()

Python 3.7 OpenCV - Slow processing

I'm trying to take pictures with OpenCV 4.4.0.40 and only save them if a switch, read by an Arduino, is pressed.
So far everything works, but it's super slow: it takes about 15 seconds for the switch value to change.
Arduino = SerialObject()
if os.path.exists(PathCouleur) and os.path.exists(PathGris):
    Images = cv2.VideoCapture(Camera)
    Images.set(3, 1920)
    Images.set(4, 1080)
    Images.set(10, Brightness)
    Compte = 0
    SwitchNumero = 0
    while True:
        Sucess, Video = Images.read()
        cv2.imshow("Camera", Video)
        Video = cv2.resize(Video, (Largeur, Hauteur))
        Switch = Arduino.getData()
        try:
            if Switch[0] == "1":
                blur = cv2.Laplacian(Video, cv2.CV_64F).var()
                if blur < MinBlur:
                    cv2.imwrite(PathCouleur + ".png", Video)
                    cv2.imwrite(PathGris + ".png", cv2.cvtColor(Video, cv2.COLOR_BGR2GRAY))
                    Compte += 1
        except IndexError as err:
            pass
        if cv2.waitKey(40) == 27:
            break
    Images.release()
    cv2.destroyAllWindows()
else:
    print("Erreur, aucun Path")
The saved images are 640 wide and 480 high, and the displayed image is 1920x1080, but it is slow even without the imshow call.
Can someone help me optimize this code, please?
I would say that this snippet of code is responsible for the delay, since you have two major calls that depend on external devices to respond (the OpenCV calls and Arduino.getData()).
Sucess, Video = Images.read()
cv2.imshow("Camera", Video)
Video = cv2.resize(Video, (Largeur, Hauteur))
Switch = Arduino.getData()
One solution that comes to mind is to use the multithreading lib and separate the Arduino.getData() call from the main loop, running it in the background in a separate class (you should sleep on the secondary thread, something like a 50 or 100 ms delay); a minimal sketch follows below.
That way you should get a better response on the switch event when the main loop reads the value.
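A minimal sketch of that background reader, assuming the SerialObject/getData() interface from the question (the class and attribute names here are hypothetical):

import threading
import time

class ArduinoReader:
    """Polls the Arduino in a daemon thread so the main OpenCV loop
    never blocks waiting on serial I/O."""
    def __init__(self, arduino, poll_delay=0.05):
        self.arduino = arduino          # the SerialObject from the question
        self.poll_delay = poll_delay    # 50 ms between reads
        self.switch = None              # last value seen
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            self.switch = self.arduino.getData()  # assumed to return e.g. ["1"]
            time.sleep(self.poll_delay)

In the main loop, reader = ArduinoReader(Arduino) is created once before the while, and Switch = Arduino.getData() becomes the non-blocking Switch = reader.switch.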
cam = cv2.VideoCapture(0, cv2.CAP_DSHOW)

Use cv2.CAP_DSHOW; it will improve the camera's loading time because it gives you the video feed directly.

PyQt5 - fromIccProfile: failed minimal tag size sanity error

I'm using latest PyQt5 5.12.2 and I'm getting a weird message for every JPG picture that I'm showing in my script using QPixmap or QIcon.
qt.gui.icc: fromIccProfile: failed minimal tag size sanity
It isn't breaking anything, and the script otherwise works as it should. The problem is that I'm trying to display a huge number of jpg pictures at the same time (as a photo gallery), so the window becomes unresponsive until all the messages have been printed, one per photo.
I tried for hours to find something useful online but, unfortunately, it seems hardly anyone has had the same issue. I'm using some PNG files too and they don't raise this error, so I'm assuming the problem is with the jpg format. I tried using older PyQt5 versions, but the only difference is that they don't print the message; the problem remains.
Meanwhile, I tried to use this handler to mute those messages, since they are of no use, but the window still becomes unresponsive for a few seconds even when nothing is printed to the console.
from PyQt5.QtCore import qInstallMessageHandler

def handler(*args):
    pass

qInstallMessageHandler(handler)
EDIT: I tried converting these images to PNG but the error remains. So the JPG format wasn't the problem
I dug more deeply into ICC profiles and colour spaces, and it seems the colour space your pictures use is somehow non-standard for PyQt.
My solution is to convert these pictures to a classical ICC profile such as sRGB.
Here's an example function:
import io
from PIL import Image, ImageCms

def convert_to_srgb(file_path):
    '''Convert PIL image to sRGB color space (if possible)'''
    img = Image.open(file_path)
    icc = img.info.get('icc_profile', '')
    if icc:
        io_handle = io.BytesIO(icc)     # virtual file
        src_profile = ImageCms.ImageCmsProfile(io_handle)
        dst_profile = ImageCms.createProfile('sRGB')
        img_conv = ImageCms.profileToProfile(img, src_profile, dst_profile)
        icc_conv = img_conv.info.get('icc_profile', '')
        if icc != icc_conv:
            # ICC profile was changed -> save converted file
            img_conv.save(file_path,
                          format='JPEG',
                          quality=50,
                          icc_profile=icc_conv)
Using the PIL library is a fast and effective way to properly resolve that error.
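For a photo gallery, the function can be run once over the whole folder before the pictures are loaded into the GUI. A minimal usage sketch, assuming the convert_to_srgb function above and a hypothetical folder path:

from pathlib import Path

gallery = Path("C:/Users/me/Pictures/gallery")  # hypothetical folder
for jpg in gallery.glob("*.jpg"):
    convert_to_srgb(str(jpg))  # rewrites a file in place only when its profile changes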
I am making a GUI image viewer with PySide2 and was having a similar issue.
The images were loading fine and, in my case, there were no performance issues, but I kept getting these ICC warnings.
And I didn't want to fix the original files, because my app is supposed to be only a viewer.
I don't know if it will help in your case, but my solution is to first load the image with Pillow's ImageQt module:
from pathlib import Path
from PIL.ImageQt import ImageQt

def load_image(path):
    if Path(path).is_file():
        return ImageQt(path)
Then, in my Qt widget class that displays the image, I load this image into an empty QPixmap:
def on_change(self, path):
    pixmap = QtGui.QPixmap()
    image = load_image(path)
    if image:
        pixmap.convertFromImage(image)
    if pixmap.isNull():
        self.display_area_label.setText('No Image')
    else:
        self.display_area_label.setPixmap(pixmap)

'NoneType' object is not subscriptable

I'm getting the error below in my code:
TypeError: 'NoneType' object is not subscriptable
line: crop_img = img[70:300, 70:300]
Can anyone please help me with this?
Thanks a lot
img_dofh = cv2.imread("D.png", 0)
ret, img = cap.read()

cv2.rectangle(img, (60, 60), (300, 300), (255, 255, 2), 4)  # outermost rectangle
crop_img = img[70:300, 70:300]
crop_img_2 = img[70:300, 70:300]
grey = cv2.cvtColor(crop_img, cv2.COLOR_BGR2GRAY)
You don't show where your img variable comes from. But somehow it is None instead of containing image data.
Often this happens when you write a function that is supposed to return a valid object for img, but you forget to include a return statement in the function, so it automatically returns None instead.
Check the code that creates img.
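For example, a function like this (a hypothetical illustration, not your code) silently returns None and reproduces exactly that error:

import cv2

def load_image(path):
    img = cv2.imread(path)
    # forgot: return img

img = load_image("D.png")    # img is None
crop = img[70:300, 70:300]   # TypeError: 'NoneType' object is not subscriptable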
UPDATE
Responding to your code posting:
It would be helpful if you could provide a minimal, reproducible example. That might look something like this:
import cv2

cap = cv2.VideoCapture(0)
if cap.isOpened():
    ret, img = cap.read()
    if img is None:
        print("img was not returned.")
    else:
        crop_img = img[70:300, 70:300]
        print(crop_img)  # should show an array of image data
Looking at the documentation, it appears that your camera may not have grabbed any frames by the time you reach this point in your code. The documentation says "If no frames has been grabbed (camera has been disconnected, or there are no more frames in video file), the methods return false and the functions return NULL pointer." I would bet that the .read() function is returning a NULL pointer, which gets converted to None when it is sent back to Python.
Unfortunately, since no one else has your particular camera setup, other people may have trouble reproducing your problem.
The code above works fine on my MacBook Pro, but I have to give Terminal permission to use the camera the first time I try it. Have you tried restarting your terminal app? Does your program have access to the camera?

Real time OCR in python

The problem
I'm trying to capture my desktop with OpenCV and have Tesseract OCR find text and set it as a variable. For example, if I were playing a game with the capturing frame over a resource amount, I'd want it to print that amount and use it. A perfect example of this is a video by Michael Reeves, where whenever he loses health in a game, it shows it and sends it to his Bluetooth-enabled airsoft gun to shoot him. So far I have this:
# imports
from PIL import ImageGrab
from PIL import Image
import numpy as np
import pytesseract
import argparse
import cv2
import os

fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter("output.avi", fourcc, 5.0, (1366, 768))

while(True):
    x = 760
    y = 968
    ox = 50
    oy = 22

    # screen capture
    img = ImageGrab.grab(bbox=(x, y, x + ox, y + oy))
    img_np = np.array(img)
    frame = cv2.cvtColor(img_np, cv2.COLOR_BGR2RGB)
    cv2.imshow("Screen", frame)
    out.write(frame)

    if cv2.waitKey(1) == 0:
        break

out.release()
cv2.destroyAllWindows()
It captures in real time and displays it in a window, but I have no clue how to make it recognize the text every frame and output it.
Any help?
It's fairly simple to grab the screen and pass it to tesseract for OCRing.
The PIL (pillow) library can grab the frames easily on MacOS and Windows. However, this feature has only recently been added for Linux, so the code below works around it not existing. (I'm on Ubuntu 19.10 and my Pillow does not support it).
Essentially the user starts the program with screen-region rectangle co-ordinates. The main loop continually grabs this area of the screen, feeding it to Tesseract. If Tesseract finds any non-whitespace text in that image, it is written to stdout.
Note that this is not a proper real-time system. There is no guarantee of timeliness; each frame takes as long as it takes. Your machine might get 60 FPS or it might get 6. This will also be greatly influenced by the size of the rectangle you ask it to monitor.
#! /usr/bin/env python3

import sys
import pytesseract
from PIL import Image

# Import ImageGrab if possible, might fail on Linux
try:
    from PIL import ImageGrab
    use_grab = True
except Exception as ex:
    # Some older versions of pillow don't support ImageGrab on Linux
    # In which case we will use XLib
    if ( sys.platform == 'linux' ):
        from Xlib import display, X
        use_grab = False
    else:
        raise ex

def screenGrab( rect ):
    """ Given a rectangle, return a PIL Image of that part of the screen.
        Handles a Linux installation with an older Pillow by falling back
        to using XLib """
    global use_grab
    x, y, width, height = rect

    if ( use_grab ):
        image = ImageGrab.grab( bbox=[ x, y, x+width, y+height ] )
    else:
        # ImageGrab can be missing under Linux
        dsp = display.Display()
        root = dsp.screen().root
        raw_image = root.get_image( x, y, width, height, X.ZPixmap, 0xffffffff )
        image = Image.frombuffer( "RGB", ( width, height ), raw_image.data, "raw", "BGRX", 0, 1 )
        # DEBUG image.save( '/tmp/screen_grab.png', 'PNG' )
    return image

### Do some rudimentary command line argument handling
### So the user can specify the area of the screen to watch
if ( __name__ == "__main__" ):
    EXE = sys.argv[0]
    del( sys.argv[0] )

    # EDIT: catch zero-args
    if ( len( sys.argv ) != 4 or sys.argv[0] in ( '--help', '-h', '-?', '/?' ) ):  # some minor help
        sys.stderr.write( EXE + ": monitors section of screen for text\n" )
        sys.stderr.write( EXE + ": Give x, y, width, height as arguments\n" )
        sys.exit( 1 )

    # TODO - add error checking
    x = int( sys.argv[0] )
    y = int( sys.argv[1] )
    width = int( sys.argv[2] )
    height = int( sys.argv[3] )

    # Area of screen to monitor
    screen_rect = [ x, y, width, height ]
    print( EXE + ": watching " + str( screen_rect ) )

    ### Loop forever, monitoring the user-specified rectangle of the screen
    while ( True ):
        image = screenGrab( screen_rect )             # Grab the area of the screen
        text = pytesseract.image_to_string( image )   # OCR the image
        # IF the OCR found anything, write it to stdout.
        text = text.strip()
        if ( len( text ) > 0 ):
            print( text )
This answer was cobbled together from various other answers on SO.
If you use this answer for anything regularly, it would be worth adding a rate-limiter to save some CPU. It could probably sleep for half a second every loop.
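A minimal sketch of that rate-limiting, assuming the main loop of the script above:

import time

while ( True ):
    image = screenGrab( screen_rect )             # Grab the area of the screen
    text = pytesseract.image_to_string( image )   # OCR the image
    text = text.strip()
    if ( len( text ) > 0 ):
        print( text )
    time.sleep( 0.5 )                             # rate-limit to spare the CPU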
Tesseract is a single-use command-line application that uses files for input and output, meaning every OCR call creates a new process and initializes a new Tesseract engine, which includes reading multi-megabyte data files from disk. Its suitability as a real-time OCR engine will depend on the exact use case (more pixels require more time) and which parameters are provided to tune the OCR engine. Some experimentation may ultimately be required to tune the engine to the exact scenario, but also expect that the time required to OCR a frame may exceed the frame time, so a reduction in the frequency of OCR execution may be required, i.e. performing OCR at 10-20 FPS rather than the 60+ FPS the game may be running at.
In my experience, a reasonably complex document in a 2200x1700px image can take anywhere from 0.5s to 2s using the english fast model with 4 cores (the default) on an aging CPU, however this "complex document" represents the worst-case scenario and makes no assumptions on the structure of the text being recognized. For many scenarios, such as extracting data from a game screen, assumptions can be made to implement a few optimizations and speed up OCR:
Reduce the size of the input image. When extracting specific information from the screen, crop the grabbed screen image as much as possible to only that information. If you're trying to extract a value like health, crop the image around just the health value.
Use the "fast" trained models to improve speed at the cost of accuracy. You can use the -l option to specify different models and the --testdata-dir option to specify the directory containing your model files. You can download multiple models and rename the files to "eng_fast.traineddata", "eng_best.traineddata", etc.
Use the --psm parameter to prevent page segmentation not required for your scenario. --psm 7 may be the best option for singular pieces of information, but play around with different values and find which works best.
Restrict the allowed character set if you know which characters will be used, such as if you're only looking for numerics, by changing the whitelist configuration value: -c tessedit_char_whitelist='1234567890'.
pytesseract is the best way to get started with implementing Tesseract; the library can handle image input directly (although it saves the image to a file before passing it to Tesseract) and passes the resulting text back using image_to_string(...).
import pytesseract

# Capture frame...

# If the frame requires cropping:
frame = frame[y:y + h, x:x + w]

# Perform OCR
text = pytesseract.image_to_string(frame, lang="eng_fast", config="--psm 7")

# Process the result
health = int(text)
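To combine this with the character whitelist mentioned above, here is a sketch, assuming a numeric value such as health and the frame from the snippet above:

# Restrict recognition to digits; --psm 7 treats the image as a single text line.
config = "--psm 7 -c tessedit_char_whitelist=1234567890"
text = pytesseract.image_to_string(frame, lang="eng_fast", config=config).strip()
if text.isdigit():
    health = int(text)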
Alright, I was having the same issue as you so I did some research into it and I'm sure that I found the solution! First, you will need these libraries:
cv2
pytesseract
Pillow(PIL)
numpy
Installation:
To install cv2, simply use this in a command line/command prompt: pip install opencv-python
Installing pytesseract is a little bit harder, as you also need to pre-install Tesseract, which is the program that actually does the OCR reading. First, follow this tutorial on how to install Tesseract. After that, in a command line/command prompt just use the command: pip install pytesseract
If you don't install this correctly, you will get an error when using the OCR.
To install Pillow, use the following command in a command line/command prompt: python -m pip install --upgrade Pillow or python3 -m pip install --upgrade Pillow. The one that uses python works for me.
To install NumPy, use the following command in a command line/command prompt: pip install numpy. Though it's usually already installed with most Python distributions.
Code:
This code was made by me and, as of right now, it works how I want it to, with an effect similar to the one Michael Reeves got. It takes the top left of your screen, takes a recorded image of it, and shows a window displaying the image it's currently reading with OCR. Then it prints out, in the console, the text that it read on the screen.
# OCR Screen Scanner
# By Dornu Inene

# Libraries that you should have all installed
import cv2
import numpy as np
import pytesseract

# We only need the ImageGrab class from PIL
from PIL import ImageGrab

# Run forever unless you press Esc
while True:
    # This instance will generate an image from
    # the point of (115, 143) and (569, 283) in format of (x, y)
    cap = ImageGrab.grab(bbox=(115, 143, 569, 283))

    # For us to use cv2.imshow we need to convert the image into a numpy array
    cap_arr = np.array(cap)

    # This isn't really needed for getting the text from a window, but
    # cv2.imshow() shows a window display of the image (array) that we
    # grabbed, so you can see what is being read
    cv2.imshow("", cap_arr)

    # Read the image that was grabbed from ImageGrab.grab using pytesseract.image_to_string
    # This is the main thing that will collect the text information from that specific area of the window
    text = pytesseract.image_to_string(cap)

    # This just removes spaces from the beginning and end of the text
    # and makes what it reads cleaner
    text = text.strip()

    # If any text was translated from the image, print it
    if len(text) > 0:
        print(text)

    # This line will break the while loop when you press Esc
    if cv2.waitKey(1) == 27:
        break

# This will make sure all windows created from cv2 are destroyed
cv2.destroyAllWindows()
I hope this helped you with what you were looking for, it sure did help me!
