Until a couple of days ago I had never used OpenCV or done any video processing. I've been asked to programmatically overlay a video based on some user inputs and build a new video, with the overlays incorporated, for download in AVI format. Essentially, the goal is to have a form that takes as input 3 images (icon, screenshot #1, screenshot #2) and 3 text inputs and overlays the original video with them. Here is a link to the video. When the video is running you'll notice that the icon in the center of the iPhone at the beginning is stretched and pulled. I've been iteratively testing OpenCV methods by breaking the video down frame by frame, editing each frame, and then rebuilding (this is probably the only way to rebuild a video with edits in OpenCV anyway). This video is one where I overlaid a colored circle that moves back and forth.
# the method I've been using
import cv2 as cv
import numpy as np
cap = cv.VideoCapture('the_vid.avi')
# read the first frame just to get the dimensions
flag, frame = cap.read()
height, width = frame.shape[:2]
# filename, fourcc, fps, frame size, isColor
writer = cv.VideoWriter('output.avi', cv.VideoWriter_fourcc('I', '4', '2', '0'), 35, (width, height), True)
while True:
    flag, frame = cap.read()
    if not flag:
        break
    # draw a filled circle (radius 20) at the center of the frame
    x = width // 2
    y = height // 2
    cv.circle(frame, (x, y), 20, (0, 0, 255), -1)
    # write our new frame
    writer.write(frame)
cap.release()
writer.release()
This produces a very large, uncompressed AVI file, which can then be compressed with ffmpeg:
ffmpeg -i output.avi -vcodec msmpeg4v2 compressed_output.avi
OK, so that's the method I've been using to rebuild this video, and with that method I don't see how it's possible to take a static image and stretch it around the way it is in the first 90 or so frames. The only other possibility I saw was something like the pseudo-code below. If you can tell me whether there is even a way to implement it, that would be awesome; I'm thinking it will be extremely difficult:
# example for the first image and first few seconds of video only
first_image = cv.imread('user_uploaded_icon.png')
flag, first_frame = cap.read()
# section of the frame that contains the original icon (placeholder function)
the_section = algorithm_to_get_array_of_original_icon_in_first_frame(first_frame)
rows, cols = the_section.shape[:2]
# somehow find the array within the first image that is the same size as the_section,
# containing JUST the icon (placeholder function)
icon = array_of_icon(first_image)
# build a blank image with the size of the original icon in the current frame
# and copy the icon into it pixel by pixel
blank_image = np.zeros((rows, cols, 3), np.uint8)
for i in range(rows):
    for j in range(cols):
        blank_image[i, j] = icon[i, j]
What seems like it might not work about this is that the_section in first_frame will be stretched to different dimensions than the static image... so I'm not sure there is ANY viable way to handle this. I appreciate all the time-saving help in advance.
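One possibility I have not tried yet: OpenCV's perspective warp can stretch a flat image onto an arbitrary quadrilateral, which looks like what happens to the icon in those first frames. This is only a rough sketch, not working code for my video - the destination corner coordinates are made up and would have to change every frame:
import cv2 as cv
import numpy as np
icon = cv.imread('user_uploaded_icon.png')
flag, frame = cap.read()
h, w = frame.shape[:2]
ih, iw = icon.shape[:2]
# source corners = the full flat icon; destination corners = where the stretched
# icon should sit in this frame (invented numbers, different for every frame)
src = np.float32([[0, 0], [iw, 0], [iw, ih], [0, ih]])
dst = np.float32([[100, 80], [260, 70], [270, 240], [95, 250]])
M = cv.getPerspectiveTransform(src, dst)
warped = cv.warpPerspective(icon, M, (w, h))
# paste the warped icon over the frame wherever the warp produced pixels
mask = cv.warpPerspective(np.full((ih, iw), 255, np.uint8), M, (w, h))
frame[mask > 0] = warped[mask > 0]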
I recently discovered the awesome pyvips package and would like to use it to analyze data that was taken on a homebuilt slide scanner (not built by me). I scan about 4000 tiles of 1024x1024 pixels each along the edges of a square-shaped sample (the center part of the sample is not recorded). All tiles are saved as a single binary file. I have written a python class that returns a desired tile as a numpy array from the binary file and which also gives the (x, y) coordinates of the specific tile. Unfortunately, the tiles are not arranged on a grid.
I first determine the total width and height of the full image and initialize a black image of the correct size and subsequently place the tiles at the correct locations using the insert function. The composite image is about 120k x 120k pixels, but most of the image is empty. Finally, I plot the resulting image using matplotlib.
import pyvips
import numpy as np
import matplotlib.pyplot as plt
# class to read tiles from data file
sr = TileReader("path_to_scan_file")
# some stuff to determine the width and height of the total image...
# create empty image for inserting tiles
im = pyvips.Image.black(width, height)
# loop over all tiles and place tile at correct position
for i in range(sr.num_tiles()):
    frame, coord = sr.ReadFrame(i)
    tile = pyvips.Image.new_from_array(frame)
    im = im.insert(tile, coord[0], coord[1])
# plot result
plt.imshow(im.numpy())
plt.show()
# save file
im.write_to_file('full_image.tiff')
Generating the full image in the loop seems to be very fast. However, plotting or saving the data is not. (Obviously,) the plotting only works for a small number of tiles (~10). I also tried saving the data to a pyramidal tiff. However, writing the image took several hours and the generated file seems to be corrupted or too large to be opened. Unfortunately I could not get nip2 installed without admin rights.
I would like to be able to manually select regions of interest of the composite image that I can use for further processing. What is the best/fastest way to interact with the generated image to enable this?
You can use crop to cut out a chunk of the image and pass that on to something else. It won't make the whole thing, it'll just render the bit you need, so it'll be quick.
Something like:
# loop over all tiles and place at correct position
# do this once on startup
for i in range(sr.num_tiles()):
    frame, coord = sr.ReadFrame(i)
    tile = pyvips.Image.new_from_array(frame)
    im = im.insert(tile, coord[0], coord[1])
# left, top, width, height
# hook these numbers up to eg. a scrollbar
# do the crop again for each scrollbar movement
tile = im.crop(0, 0, 1000, 1000)
# plot result
plt.imshow(tile.numpy())
plt.show()
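For example, to hand one region of interest on to further processing or save just that region (the crop coordinates here are made up):
# assumed region of interest: left, top, width, height in mosaic pixels
roi = im.crop(30000, 45000, 2048, 2048)
# only the tiles that intersect this crop are actually computed
roi_np = roi.numpy()
# or write just that region to disk for later use
roi.write_to_file('roi.tiff')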
If you want to get fancy, the best solution is probably vips_sink_screen():
https://www.libvips.org/API/current/libvips-generate.html#vips-sink-screen
That'll let you generate pixels from any pipeline asynchronously as you pan and zoom, but it needs C, sadly. There's an example image viewer using this API here:
https://github.com/jcupitt/vipsdisp
That's running vips_sink_screen() in the background to generate GPU textures at various scales, then using that set of textures to paint the screen at 60 fps (ish) as you pan and zoom around. It can display huge dynamically computed images very quickly.
I am trying to create a GIF file. The file is created successfully, but it is pixelated, so if anyone can help me with how to increase the resolution that would be great.
Here is the code:
import PIL
from PIL import Image
import numpy as np
image_frames = []
days = np.arange(0, 12)
for i in days:
    new_frame = PIL.Image.open(
        r"C:\Users\Harsh Kotecha\PycharmProjects\pythonProject1\totalprecipplot" + "//" + str(i) + ".jpg"
    )
    image_frames.append(new_frame)
image_frames[0].save(
    "precipitation.gif",
    format="GIF",
    append_images=image_frames[1:],
    save_all=True,
    duration=800,
    loop=0,
    quality=100,
)
Here is the GIF file:
Here are the original images:
image1
image2
image3
Updated Answer
Now that you have provided some images I had a go at disabling the dithering:
#!/usr/bin/env python3
from PIL import Image
# User editable values
method = Image.FASTOCTREE
colors = 250
# Load images precip-01.jpg through precip-12.jpg and quantize each one with dithering disabled
imgs = []
for i in range(1, 13):
    filename = f'precip-{i:02d}.jpg'
    print(f'Loading: {filename}')
    try:
        im = Image.open(filename)
        pImage = im.quantize(colors=colors, method=method, dither=0)
        imgs.append(pImage)
    except OSError:
        print(f'ERROR: Unable to open {filename}')
imgs[0].save(
    "precipitation.gif",
    format="GIF",
    append_images=imgs[1:],
    save_all=True,
    duration=800,
    loop=0,
)
Original Answer
Your original images are JPEGs, which means they likely have many thousands of colours [2]. When you make an animated GIF (or even a static GIF), each frame can only have 256 colours in its palette.
This can create several problems:
each frame gets a new, distinct palette stored with it, thereby increasing the size of the GIF (each palette is 0.75kB)
colours get dithered in an attempt to make the image look as close as possible to the original colours
different colours can get chosen for frames that are nearly identical, which means colours flicker between distinct shades on successive frames - this can cause "twinkling", like stars
If you want to learn about GIFs, you can learn 3,872 times as much as I will ever know by reading Anthony Thyssen's excellent notes here, here and here.
Your image is suffering from the first problem because it has 12 "per frame" local colour tables as well as a global colour table [3]. It is also suffering from the second problem, dithering.
To avoid the dithering, you probably want to do some of the following:
load all the images and append them together into a 12x1 monster image, then find the best palette for all the colours. As all your images are very similar, I think you'll get away with generating a palette just from the first image without needing to montage all 12 - that'll be quicker
now palettise each image, with dithering disabled and using that single common palette
save your animated sequence of the palettised images, pushing in the single common palette from the first step above (a rough sketch of this is below)
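A rough sketch of that recipe, reusing the precip-NN.jpg names and the 250-colour FASTOCTREE settings from the code above, and generating the palette from the first frame only (the shortcut version, not a montage of all 12):
from PIL import Image
# build one shared palette from the first frame, since all the frames are very similar
frames = [Image.open(f'precip-{i:02d}.jpg') for i in range(1, 13)]
palette_source = frames[0].quantize(colors=250, method=Image.FASTOCTREE, dither=0)
# remap every frame to that single common palette, with dithering disabled
quantized = [f.quantize(palette=palette_source, dither=0) for f in frames]
quantized[0].save(
    'precipitation.gif',
    save_all=True,
    append_images=quantized[1:],
    duration=800,
    loop=0,
)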
[2]: You can count the number of colours in an image with ImageMagick, using:
magick YOURIMAGE -format %k info:
[3]: You can see the colour tables in a GIF with gifsicle, using:
gifsicle -I YOURIMAGE.GIF
I am currently working on building a dataset for a computer vision problem and wanted to add some data to what I already had, so I wanted to grab around ~3000 frames from 2 different videos.
I used OpenCV because I knew its capture feature, but I'm not sure about this approach because my memory usage is really exploding. I was using a pickle file for the previous, already-processed dataset and had no problem holding that much information in memory. Maybe my code is horrible without me noticing it...
Here is my code to get around 3000 frames from the videos:
import cv2
video_name1 = "videosDataset/AMAZExNHORMS2019_Lo-res.mp4"
video_name2 = "videosDataset/CAMILLATHULINS2019_Lo-res.mp4"
def getAllFrames(videoName):
    cap = cv2.VideoCapture(videoName)
    frames = []
    # total number of frames in the video
    length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    print(length)
    # only keep the part of the video between 1/2 and 2/3 of its length
    minL = int(length / 2)
    maxL = int(2 * length / 3)
    print(minL, maxL)
    for i in range(minL, maxL):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i)  # jump to the frame we want
        ret, frame = cap.read()  # read the frame
        frames.append(frame)
        print(str(round((i - minL) / (maxL - minL) * 100, 2)) + '%')
    return frames
frames1 = getAllFrames(video_name1)
I would like to know if there is a better way to do this. Thank you
The problem here is the compression - when read, each frame is stored as a numpy array, which is rather expensive. For example, one RGB frame of 1280 x 720 pixels is about 200 kB in jpg format, 1.2 MB in png format, 2.7 MB when stored in a numpy uint8 array, and 22 MB when stored in a numpy float64 array.
The easiest solution is to store each frame to disk as a jpg image (e.g. with cv2.imwrite) instead of accumulating all the frames in one array.
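A minimal sketch of that approach, reusing the video path and middle-of-video frame range from the question (the framesDataset output folder is an assumption and must already exist):
import cv2
cap = cv2.VideoCapture("videosDataset/AMAZExNHORMS2019_Lo-res.mp4")
length = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
minL, maxL = length // 2, 2 * length // 3
# jump once to the start of the range, then read sequentially
cap.set(cv2.CAP_PROP_POS_FRAMES, minL)
for i in range(minL, maxL):
    ret, frame = cap.read()
    if not ret:
        break
    # each frame goes straight to disk instead of piling up in memory
    cv2.imwrite(f"framesDataset/frame_{i:06d}.jpg", frame)
cap.release()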
Assuming that by making a dataset you mean that you want to save all the frames individually for use in the dataset, the easiest option would probably be to use a tool like ffmpeg. See here for an example. ffmpeg supports a number of image file formats, probably including the format you want to save the images in.
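For instance, a command along these lines (the paths and the JPEG quality setting are assumptions) dumps every frame as a JPEG:
ffmpeg -i videosDataset/AMAZExNHORMS2019_Lo-res.mp4 -qscale:v 2 framesDataset/frame_%06d.jpg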
I just bought a FLIR BlackFly S USB3.0 camera. I can grab frames from the camera, but I am not able to use those frames with OpenCV without saving them first. Does anyone know how to convert them for use in OpenCV?
I searched the internet for everything that includes the word "PySpin" and found this book.
I have tried to use PySpinCapture, which is mentioned in that book, but I couldn't figure it out.
capture = PySpinCapture.PySpinCapture(0, roi=(0, 0, 960, 600),binningRadius=2,isMonochrome=True)
ret, frame = capture.read()
cv2.imshow("image",frame)
cv2.waitKey(0)
I expect to see the image, but it throws an error:
_PySpin.SpinnakerException: Spinnaker: GenICam::AccessException= Node is not writable. : AccessException thrown in node 'PixelFormat' while calling 'PixelFormat.SetIntValue()' (file 'EnumerationT.h', line 83) [-2006]
terminate called after throwing an instance of 'Spinnaker::Exception'
One year later, and I'm not sure if my response will help, but I figured out that you can just get the RGB numpy array from a PySpin Image by using the GetData() function.
So you could do without the PySpinCapture module and just do something like the following.
import PySpin
import cv2
serial = '18475994' #Probably different for you although I also use a BlackFly USB3.0
system = PySpin.System.GetInstance()
blackFly_list = system.GetCameras()
blackFly = blackFly_list.GetBySerial(serial)
height = blackFly.Height()
width = blackFly.Width()
channels = 1
fourcc = cv2.VideoWriter_fourcc(*'XVID')
out = cv2.VideoWriter('test_vid.avi',fourcc, blackFly.AcquisitionFrameRate(), (blackFly.Width(), blackFly.Height()), False) #The last argument should be True if you are recording in color.
blackFly.Init()
blackFly.AcquisitionMode.SetValue(PySpin.AcquisitionMode_Continuous)
blackFly.BeginAcquisition()
nFrames = 1000
for _ in range(nFrames):
    im = blackFly.GetNextImage()
    # GetData() returns a flat array; reshape it to height x width x channels for OpenCV
    im_cv2_format = im.GetData().reshape(height, width, channels)
    # Here I am writing the image to a video, but at this point you can do whatever you want with it
    out.write(im_cv2_format)
    im.Release()
out.release()
# stop streaming and release the camera
blackFly.EndAcquisition()
blackFly.DeInit()
In this code example I create an AVI video file from 1000 grabbed frames. im.GetData() returns a 1-D numpy array, which can then be converted into the correct dimensions with reshape. I have seen some talk about using the UMat class, but it does not seem to be necessary to make this work. Perhaps it helps performance, but I am not sure :)
I am trying to do something very simple: subtract a background image from a video for object tracking. I understood that images can simply be subtracted from one another as follows: img3 = img2 - img1. However, even when I start simple with one image, add a black line to it and store it as img2, img3 will not just show the line. When I run the following code
import cv2
img1 = cv2.imread("img1.png")
img2 = cv2.imread("img2.png")
img3 = img2 - img1
cv2.imwrite("img3.png",img3)
with the img1 and img2 below:
I get the image on the left below, instead of the image on the right:
I want to use this method for background extraction in a video, e.g. where I have a background image file that shows an empty scene and a video that shows the same scene, with objects sometimes moving in and out of view. I use the following code, but similarly get a B/W image instead of just the object visible without the scene...
import cv2
import numpy as np
from PIL import Image
capture = cv2.VideoCapture("video.mov")
# background image of the empty scene (file name assumed here)
bg = cv2.imread("bg.png")
bg = cv2.GaussianBlur(bg, (15, 15), 0)
while True:
    f, frame = capture.read()
    if not f:
        break
    frame = cv2.GaussianBlur(frame, (15, 15), 0)
    frame = frame - bg
    cv2.imshow("window", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
ps: I know about automatic background subtraction, but I have very good background files and very clear empty scenes with very obvious objects, so I thought this should easily work!
Update: I have just found out about the PIL ImageChops difference function, which gets me what I want with two images, but it seems impossible to use with a video opened with OpenCV. Also, would it be possible to do ImageChops.difference(img1, img2) manually with numpy arrays?
The closest to the expected result you can get is with this code:
img3 = 255 - cv2.absdiff(img1,img2)
This code will give you this:
Note that using only cv2.absdiff(img1, img2) will give the opposite of this result, because basically this operation tells you what the difference between the 2 images is - if at some position there is no difference, the result (at this position) is 0.
To achieve the "perfect result" (exactly what you expect) you need to apply some thresholding (or some other kind of filter which will erase the left part of the image).
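A minimal sketch of that thresholding step (the threshold value of 30 is an assumption; tune it for your images):
import cv2
img1 = cv2.imread("img1.png")
img2 = cv2.imread("img2.png")
# absolute per-pixel difference, then keep only the pixels that differ noticeably
diff = cv2.absdiff(img1, img2)
gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 30, 255, cv2.THRESH_BINARY)
# white where the images differ, black everywhere else
cv2.imwrite("img3.png", mask)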