I am new to deep learning and machine learning, as well as working with data. I am currently trying to work with an annotated video dataset, and I have been looking for a simple example of how to get started. I am aware that to work with a video dataset, we first need to extract the images from the videos and then do the image processing. However, as I am new, the steps are still difficult for me to understand. I came across this link; it is great, but the data is really large and cannot be downloaded on my computer.
https://www.analyticsvidhya.com/blog/2019/09/step-by-step-deep-learning-tutorial-video-classification-python/
Any suggestions for walk-through examples I can use to build my understanding and learn how to deal with these datasets?
Here is a way to create a synthetic video dataset quickly:
import numpy as np
import skvideo.io as sk
# creating sample video data (Here object is moving towards left)
num_vids = 5
num_imgs = 50
img_size = 50
min_object_size = 1
max_object_size = 5
for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    vid_name = "vid" + str(i_vid) + ".mp4"
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while x > 0:
        imgs[i_img, y : y + h, x : x + w] = 255  # set rectangle as foreground
        x = x - 1
        i_img = i_img + 1
    sk.vwrite(vid_name, imgs.astype(np.uint8))
from IPython.display import Video
Video("vid3.mp4") # the script & video generated should be in same folder
Similarly, you can create videos where the object moves in other directions, as in the sketch below.
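A right-moving variant, for instance (a sketch reusing the parameters above; the output file name is an assumption):

import numpy as np
import skvideo.io as sk

for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    vid_name = "vid_right" + str(i_vid) + ".mp4"  # hypothetical file name
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while x < img_size:  # slide right until the object leaves the frame
        imgs[i_img, y : y + h, x : x + w] = 255
        x = x + 1
        i_img = i_img + 1
    sk.vwrite(vid_name, imgs.astype(np.uint8))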
I am currently working on a school project where I am trying to simulate an LSD trip using a webcam and video processing effects. I am using Python and OpenCV to accomplish this, but I am having trouble figuring out how to create/apply a certain effect that acts like a "drift/morph/melt/flow" on the webcam footage.
I have attached examples of the effect I am trying to achieve. It looks like the image is slowly melting and distorting, almost as if it is being pulled in multiple directions at once.
Example 1
Example 2
Example 3
Example 4
I have looked into various image processing techniques such as warping, affine transformations, and image blending, but I am not sure which method would be best for creating this specific effect. Below are some of the snippets I have tried (I am very new to coding, so I have mostly been playing around with code I found on the internet):
import cv2
import numpy as np

# Capture video from webcam
cap = cv2.VideoCapture(0)

while True:
    # Read frame from webcam
    ret, frame = cap.read()

    # Apply swirling effect
    rows, cols = frame.shape[:2]
    for i in range(rows):
        for j in range(cols):
            dx = i - rows // 2
            dy = j - cols // 2
            distance = np.sqrt(dx**2 + dy**2)
            angle = np.arctan2(dy, dx) + distance * 0.1
            x = int(rows // 2 + distance * np.cos(angle))
            y = int(cols // 2 + distance * np.sin(angle))
            if 0 <= x < rows and 0 <= y < cols:
                frame[i, j] = frame[x, y]

    # Display the resulting frame
    cv2.imshow('Video', frame)

    # Break the loop if the user hits 'q'
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the capture and destroy the window
cap.release()
cv2.destroyAllWindows()
import cv2
from skimage.transform import swirl

# Create a VideoCapture object to access the webcam
cap = cv2.VideoCapture(0)

while True:
    # Read a frame from the webcam
    _, frame = cap.read()

    # Convert the frame to grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Apply the swirl effect to the frame
    swirled = swirl(gray, rotation=0, strength=10, radius=120)

    # Display the swirled frame in a window
    cv2.imshow('Swirled', swirled)

    # Wait for the user to press a key
    key = cv2.waitKey(1) & 0xFF
    if key == ord('q'):
        break

# Release the VideoCapture object and destroy all windows
cap.release()
cv2.destroyAllWindows()
I have also found a link where someone achieved a similar effect using a program called TouchDesigner.
Any advice or guidance on how to accomplish this using Python, OpenCV, and any other libraries I might need would be greatly appreciated.
Here is one way to create an animated GIF from a single image (or video frame) in Python/OpenCV/PIL.
Read the input image
Set parameters
Create X and Y ramps
Loop over the input, creating sinusoids and incrementing the phase
Use remap to warp the input according to the sinusoids in X and Y
Convert the image to PIL format and save the frames in a list
When the loop is finished, save the frames from the list to an animated GIF using PIL
Input:
import numpy as np
import cv2
from PIL import Image

img = cv2.imread("bluecar_sm.jpg")

# get dimensions
h, w = img.shape[:2]

# set wavelength
wave_x = 2 * w
wave_y = h

# set amount, number of frames and delay
amount_x = 10
amount_y = 5
num_frames = 100
delay = 50
border_color = (128, 128, 128)

# create X and Y ramps
x = np.arange(w, dtype=np.float32)
y = np.arange(h, dtype=np.float32)

frames = []
# loop and change phase
for i in range(0, num_frames):
    # compute phase to increment over 360 degrees for the number of frames specified, so it makes a full cycle
    phase_x = i * 360 / num_frames
    phase_y = phase_x

    # create sinusoids in X and Y, add to ramps and tile out to fill to size of image
    x_sin = amount_x * np.sin(2 * np.pi * (x / wave_x + phase_x / 360)) + x
    map_x = np.tile(x_sin, (h, 1))
    y_sin = amount_y * np.sin(2 * np.pi * (y / wave_y + phase_y / 360)) + y
    map_y = np.tile(y_sin, (w, 1)).transpose()

    # do the warping using remap
    result = cv2.remap(img.copy(), map_x, map_y, cv2.INTER_CUBIC, borderMode=cv2.BORDER_CONSTANT, borderValue=border_color)

    # show result
    cv2.imshow('result', result)
    cv2.waitKey(delay)

    # convert to PIL format and save frames
    result = cv2.cvtColor(result, cv2.COLOR_BGR2RGB)
    pil_result = Image.fromarray(result)
    frames.append(pil_result)

# write animated gif from frames using PIL
frames[0].save('bluecar_sm_animation.gif', save_all=True, append_images=frames[1:], optimize=False, duration=delay, loop=0)
Full Sized Animated GIF
Here is a reduced version that has a small enough file size so that it can be displayed directly.
I am building an OCR model where I have performed object detection on the images. I call the detection function to detect bounding boxes, then crop the images based on those boxes. The challenge I am facing is that the cropped images are too small for Tesseract to extract data from, and this is hurting accuracy.
import tensorflow as tf
from PIL import Image, ImageOps

# Crop image to the detected bounding box
cropped_image = tf.image.crop_to_bounding_box(image, y_min, x_min, y_max - y_min, x_max - x_min)

# write jpg with pillow
img_pil = Image.fromarray(cropped_image.numpy())
score = bscores[idx] * 100
file_name = OUTPUT_PATH + "somefilename"
img_pil = ImageOps.grayscale(img_pil)
img_pil.save(file_name, quality=95, subsampling=0)
I am running a super-resolution algorithm over the cropped images to improve their quality before passing them to Tesseract, but I am still not able to achieve good accuracy.
import os
import cv2
from cv2 import dnn_superres

# Create an SR object
sr = dnn_superres.DnnSuperResImpl_create()

# Define model path
model_path = os.path.join(base_path, model + ".pb")

# Extract model name, get the text between '/' and '_'
model_name = model_path.split('\\')[-1].split('_')[0].lower()

# Extract model scale
model_scale = int(model_path.split('\\')[-1].split('_')[1].split('.')[0][1])

# Read the desired model
sr.readModel(model_path)
sr.setModel(model_name, model_scale)

# Upscale a cropped image before OCR ('cropped' is the cropped image as a NumPy array)
upscaled = sr.upsample(cropped)
How can I fix this cropped-image issue so that data extraction is more accurate?
Have you tried OCRing and then cropping, rather than the reverse? It may take longer but it is likely going to be more accurate.
I have a lot of experience using ocrmypdf with PDFPlumber and Regex to parse PDF documents into spreadsheets and this is the process I generally follow:
import pandas as pd
import os
import pdfplumber
import re

# OCR the PDF in place
os.system('ocrmypdf --force-ocr --deskew path/to/file.pdf path/to/file.pdf')

pdf_text = ''
with pdfplumber.open('path/to/file.pdf') as pdf:
    for i in range(0, len(pdf.pages)):
        page = pdf.pages[i]
        text = page.extract_text()
        pdf_text = pdf_text + '\n' + text

ids = re.findall('id: (.*)', pdf_text)
y = pdf_text.split('\n')

ds = []
for i, j in enumerate(ids):
    d = {}
    try:
        id1 = ids[i]
        idx1 = [idx for idx, s in enumerate(y) if id1 in s][0]
        try:
            id2 = ids[i + 1]
            idx2 = [idx for idx, s in enumerate(y) if id2 in s][0]
            z = y[idx1:idx2]
        except IndexError:
            z = y[idx1:]
    except IndexError:
        continue
    chunk = '\n'.join(z)  # the lines belonging to this id, joined into one searchable chunk
    # may need to add if/else or try/except here
    d['value'] = re.findall('Model name: (.*)', chunk)[0]
    # rinse and repeat for the other fields
    ds.append(d)
df = pd.DataFrame(ds)
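If you need region-level text after OCR rather than whole-page text, pdfplumber can also crop a page to a bounding box; a sketch with placeholder coordinates:

with pdfplumber.open('path/to/file.pdf') as pdf:
    page = pdf.pages[0]
    region = page.crop((0, 0, 300, 200))  # (x0, top, x1, bottom) in PDF points; placeholder values
    region_text = region.extract_text()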
Not sure how helpful that will be, but it may give you some inspiration.
My goal is to transform a video into a 2D matrix X, where each column vector represents a frame. So the matrix has the dimensions: X.shape ---> (# features of a frame, # total number of frames)
I need this form because I want to apply different ML algorithms on X. To get X I proceed as follows:
upload the video in python with the OpenCV library and save all frames.
Loop{
a) Frame (= 3D array with dimensions height, width, depth=3 RGB) is converted into a 1D vector x
b) Append vector x to Matrix X
}
For step 2 b) I use
video_matrix = np.column_stack((video_matrix, frame_vector))
This operation takes about 0.5 s for a 640x320 frame. For a short 3-minute video (8000 frames), the computation of X takes almost 150 minutes. Is there a way to make it faster?
Code for the first part:
import os
import cv2

video = cv2.VideoCapture('path/video.mp4')

if not os.path.exists('data'):
    os.makedirs('data')

counter = 0
while True:
    # reading from frame
    ret, frame = video.read()
    if ret:
        # if video is still left, continue creating images
        name = './data/frame' + str(counter) + '.jpg'
        # print('Creating...' + name)
        # writing the extracted images
        cv2.imwrite(name, frame)
        # increase the counter so that it shows how many frames have been created
        counter += 1
    else:
        break

# Release all space and windows once done
video.release()
cv2.destroyAllWindows()
And the second part, which is too slow:
video_matrix = np.zeros(width * height * 3)  # initialize a 1D array which will become the 2D array; the first column will be deleted at the end
for i in range(counter):  # loop over the total number of frames
    current_frame = np.asarray(Image.open('./data/frame' + str(i) + '.jpg'))  # 3D array = current frame
    frame_vector = image_to_vector(current_frame)  # convert the frame into a 1D array
    video_matrix = np.column_stack((video_matrix, frame_vector))  # append frame vector to the matrix X that will represent the video
video_matrix = np.delete(video_matrix, 0, 1)  # delete the initialized zero column
Do not repeatedly append single frames to your accumulated data. That costs O(n^2) time, i.e. the program runs ever slower the more it has read. NumPy can't enlarge arrays in place; it has to create a copy every time, and the copying effort increases with every additional frame.
Append each frame to a Python list instead. When you're done reading the video, convert the whole list into a NumPy array once.
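A minimal sketch of that approach, assuming OpenCV and the same 'path/video.mp4' input as in the question:

import cv2
import numpy as np

video = cv2.VideoCapture('path/video.mp4')
frames = []
while True:
    ret, frame = video.read()
    if not ret:
        break
    frames.append(frame.reshape(-1))  # flatten each frame to a 1D feature vector
video.release()

# a single conversion at the end; transpose so each column is one frame
video_matrix = np.array(frames).T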
Here is the Python (Keras) code for video data generation and preprocessing before passing it through a DL-based classification model:
import numpy as np

# preparing dataset
X_train = []
Y_train = []
labels = enumerate(['left', 'right', 'up', 'down'])  # 4 classes

num_vids = 30
num_imgs = 30
img_size = 20
min_object_size = 1
max_object_size = 5

# video frames with left-moving object
for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    # vid_name = 'vid' + str(i_vid) + '.mp4'
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while x > 0:
        imgs[i_img, y:y + h, x:x + w] = 255  # set rectangle as foreground
        x = x - 1
        i_img = i_img + 1
    X_train.append(imgs)
for i in range(0, num_imgs):
    Y_train.append(0)

# video frames with right-moving object
for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    # vid_name = 'vid' + str(i_vid) + '.mp4'
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while x < img_size:
        imgs[i_img, y:y + h, x:x + w] = 255  # set rectangle as foreground
        x = x + 1
        i_img = i_img + 1
    X_train.append(imgs)
for i in range(0, num_imgs):
    Y_train.append(1)

# video frames with up-moving object
for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    # vid_name = 'vid' + str(i_vid) + '.mp4'
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while y > 0:
        imgs[i_img, y:y + h, x:x + w] = 255  # set rectangle as foreground
        y = y - 1
        i_img = i_img + 1
    X_train.append(imgs)
for i in range(0, num_imgs):
    Y_train.append(2)

# video frames with down-moving object
for i_vid in range(num_vids):
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    # vid_name = 'vid' + str(i_vid) + '.mp4'
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    i_img = 0
    while y < img_size:
        imgs[i_img, y:y + h, x:x + w] = 255  # set rectangle as foreground
        y = y + 1
        i_img = i_img + 1
    X_train.append(imgs)
for i in range(0, num_imgs):
    Y_train.append(3)

# data pre-processing
from keras.utils import np_utils

X_train = np.array(X_train, dtype=np.float32) / 255
X_train = X_train.reshape(X_train.shape[0], num_imgs, img_size, img_size, 1)
print(X_train.shape)

Y_train = np.array(Y_train, dtype=np.uint8)
Y_train = Y_train.reshape(X_train.shape[0], 1)
print(Y_train.shape)

Y_train = np_utils.to_categorical(Y_train, 4)
It should help you.
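The snippet stops at preprocessing. For completeness, a minimal sketch of a classifier that could consume this (num_imgs, img_size, img_size, 1) input; the architecture and hyperparameters here are illustrative assumptions, not part of the original:

from keras.models import Sequential
from keras.layers import Conv3D, MaxPooling3D, Flatten, Dense

model = Sequential([
    Conv3D(16, kernel_size=(3, 3, 3), activation='relu',
           input_shape=(num_imgs, img_size, img_size, 1)),
    MaxPooling3D(pool_size=(2, 2, 2)),
    Flatten(),
    Dense(32, activation='relu'),
    Dense(4, activation='softmax'),  # one output per direction class
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=10, batch_size=8)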
I'm trying to create a simple Eigenfaces face recognition app using Python and OpenCV. Unfortunately, when I run the app I get this result:
(-1, '\n', 1.7976931348623157e+308), where -1 stands for "not found" and the confidence... is quite high...
Could someone post the most basic OpenCV implementation of Eigenfaces?
Here is my approach to the problem. I use Python 2, as suggested in the official documentation (due to some problems with Python 3).
import cv2 as cv
import numpy as np
import os
num_components = 10
threshold = 10.0
faceRecognizer = cv.face_EigenFaceRecognizer.create(num_components, threshold)
images = []
labels = []
textLabels = ["Person1", "Person2", "Person3"]
destinedIm = cv.imread("images/set1/1.jpg", cv.IMREAD_GRAYSCALE)
destinedSize = destinedIm.shape
#Person1
img = cv.imread("images/set1/1.jpg", cv.IMREAD_GRAYSCALE)
imResized = cv.resize(img, destinedSize)
images.append(imResized)
labels.append(0)
#In a similar way I read a total of 8 images from set1 and 6 images from set2 (2 different people, with labels 0 and 1 respectively)
cv.imwrite("images/set2/resized.jpg", imResized) #this doesn't work
numpyImages = np.array(images)
numpyLabels = np.array(labels)
# cv.face_FaceRecognizer.train(self=faceRecognizer, src=images, labels=labels)
faceRecognizer.train(src=images, labels=numpyLabels)
testImage = cv.imread("images/set1/testIm.jpg", cv.IMREAD_GRAYSCALE)
# cv.face_FaceRecognizer.predict()
resultLabel, resultConfidence = faceRecognizer.predict(testImage)
print (resultLabel, "\n" ,resultConfidence)
testImage is another image of the person with label = 0.
I would look at the sizing of the testImage. Also, I used a different sizing method than you used and got it working.
face_resized = cv2.resize(img, (299, 299))
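Building on that, a minimal end-to-end sketch of the basic Eigenfaces flow the question asks for; it assumes opencv-contrib-python is installed, the image paths are placeholders, and every image is grayscale and resized to the same dimensions:

import cv2
import numpy as np

size = (100, 100)  # assumed common size for all face images
train_paths = ["images/set1/1.jpg", "images/set1/2.jpg", "images/set2/1.jpg"]
train_labels = [0, 0, 1]

# read, grayscale, and resize every training image identically
images = [cv2.resize(cv2.imread(p, cv2.IMREAD_GRAYSCALE), size) for p in train_paths]

recognizer = cv2.face.EigenFaceRecognizer_create()
recognizer.train(images, np.array(train_labels))

# the test image must go through the exact same resizing
test = cv2.resize(cv2.imread("images/set1/testIm.jpg", cv2.IMREAD_GRAYSCALE), size)
label, confidence = recognizer.predict(test)
print(label, confidence)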
I'm working on a little problem in my spare time involving analysis of some images obtained through a microscope. It is a wafer with some stuff here and there, and ultimately I want to make a program to detect when certain materials show up.
Anyways, the first step is to normalize the intensity across the image, since the lens does not give uniform lighting. Currently I use an image with no stuff on it, only the substrate, as a background, or reference, image. I find the maximum of the three (intensity) values for RGB.
from PIL import Image
from PIL import ImageDraw

rmax = 0; gmax = 0; bmax = 0; rmin = 300; gmin = 300; bmin = 300

im_old = Image.open("test_image.png")
im_back = Image.open("background.png")

maxx = im_old.size[0]  # import the size of the image
maxy = im_old.size[1]
im_new = Image.new("RGB", (maxx, maxy))

pixback = im_back.load()
for x in range(maxx):
    for y in range(maxy):
        if pixback[x, y][0] > rmax:
            rmax = pixback[x, y][0]
        if pixback[x, y][1] > gmax:
            gmax = pixback[x, y][1]
        if pixback[x, y][2] > bmax:
            bmax = pixback[x, y][2]

pixnew = im_new.load()
pixold = im_old.load()
for x in range(maxx):
    for y in range(maxy):
        r = float(pixold[x, y][0]) / (float(pixback[x, y][0]) * rmax)
        g = float(pixold[x, y][1]) / (float(pixback[x, y][1]) * gmax)
        b = float(pixold[x, y][2]) / (float(pixback[x, y][2]) * bmax)
        pixnew[x, y] = (r, g, b)
The first part of the code determines the maximum intensity of the RED, GREEN and BLUE channels, pixel by pixel, of the background image, but needs only be done once.
The second part takes the "real" image (with stuff on it) and normalizes the RED, GREEN and BLUE channels, pixel by pixel, according to the background. This takes some time, 5-10 seconds for a 1280x960 image, which is way too slow if I need to do this to several images.
What can I do to improve the speed? I thought of moving all the images to numpy arrays, but I can't seem to find a fast way to do that for RGB images.
I'd rather not move away from Python, since my C++ is quite low-level, and getting a working FORTRAN code would probably take longer than I could ever save in terms of speed :P
import numpy as np
from PIL import Image

def normalize(arr):
    """
    Linear normalization
    http://en.wikipedia.org/wiki/Normalization_%28image_processing%29
    """
    arr = arr.astype('float')
    # Do not touch the alpha channel
    for i in range(3):
        minval = arr[..., i].min()
        maxval = arr[..., i].max()
        if minval != maxval:
            arr[..., i] -= minval
            arr[..., i] *= (255.0 / (maxval - minval))
    return arr

def demo_normalize():
    img = Image.open(FILENAME).convert('RGBA')
    arr = np.array(img)
    new_img = Image.fromarray(normalize(arr).astype('uint8'), 'RGBA')
    new_img.save('/tmp/normalized.png')
See http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.fromimage.html#scipy.misc.fromimage
You can say
databack = scipy.misc.fromimage(im_back)
rmax = numpy.max(databack[:, :, 0])
gmax = numpy.max(databack[:, :, 1])
bmax = numpy.max(databack[:, :, 2])
which should be much faster than looping over all (r,g,b) triplets of your image.
Then you can do
dataold = scipy.misc.fromimage(im_old)
r = dataold[:, :, 0] / (databack[:, :, 0] * rmax)
g = dataold[:, :, 1] / (databack[:, :, 1] * gmax)
b = dataold[:, :, 2] / (databack[:, :, 2] * bmax)
datanew = numpy.array((r, g, b))
imnew = scipy.misc.toimage(datanew)
The code is not tested, but should work somehow with minor modifications.
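Note that scipy.misc.fromimage has since been removed from SciPy, so on recent versions the same conversion can be done with NumPy directly:

databack = numpy.asarray(im_back).astype('float')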
This is partially from the FolksTalk webpage:
from PIL import Image
import numpy as np
# Read image file
in_file = "my_image.png"
# convert('RGB') for PNG file type
image = Image.open(in_file).convert('RGB')
pixels = np.asarray(image)
# Convert from integers to floats
pixels = pixels.astype('float32')
# Normalize to the range 0-1
pixels /= 255.0
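If the normalized array then needs to be written back out as an image, it has to be rescaled and cast first; a short sketch (the output file name is an assumption):

out = Image.fromarray((pixels * 255.0).astype('uint8'))
out.save("my_image_normalized.png")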