Numpy averaging a series - python

numpy_frames_original holds the frames of a video. For the problem I am trying to tackle, I thought it would be a good idea to find the average of all of these frames and subtract it from each frame, giving numpy_frames. To do this I wrote the code below:
arr = np.zeros((height, width), float)  # np.float is deprecated; the builtin float works
for i in range(0, number_frames):
    imarr = numpy_frames_original[i, :, :].astype(float)
    arr = arr + imarr / number_frames
img_avg = np.array(np.round(arr), dtype=np.uint8)
numpy_frames = np.array(np.absolute(np.round(numpy_frames_original.astype(float) - img_avg.astype(float))), dtype=np.uint8)
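As an aside, the running-sum loop above can be replaced by a single vectorized call. A minimal sketch, assuming numpy_frames_original has shape (number_frames, height, width):

import numpy as np

# Mean over the frame axis in one call; equivalent to the loop above.
img_avg = np.round(numpy_frames_original.astype(float).mean(axis=0)).astype(np.uint8)
numpy_frames = np.abs(
    numpy_frames_original.astype(float) - img_avg.astype(float)
).round().astype(np.uint8)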
Now I have decided it would be better not to subtract the average of all frames, but instead, for each frame, to subtract the average of the 100 frames closest to it.
I'm not sure how to write this code.
For example, frame 0 would have the average of frames 0-99 subtracted; frame 3 would also have the average of frames 0-99 subtracted; and frame 62 would have the average of frames 12-112 subtracted.
Thanks

I think this does what you need.
import numpy

# Create some fake data
frame_count = 10
frame_width = 2
frame_height = 3
frames = numpy.random.randn(frame_count, frame_width, frame_height).astype(numpy.float32)
print('Original frames\n', frames)

# Compute the modified frames over a specified range
mean_range_size = 2
assert frame_count >= mean_range_size * 2
start_mean = frames[:2 * mean_range_size + 1].mean(axis=0)
start_frames = frames[:mean_range_size] - start_mean
middle_frames = numpy.empty((frames.shape[0] - 2 * mean_range_size,
                             frames.shape[1], frames.shape[2]),
                            dtype=frames.dtype)
for index in range(middle_frames.shape[0]):
    middle_frames[index] = frames[mean_range_size + index] - \
        frames[index:index + 2 * mean_range_size + 1].mean(axis=0)
end_mean = frames[-2 * mean_range_size - 1:].mean(axis=0)
end_frames = frames[-mean_range_size:] - end_mean
modified_frames = numpy.concatenate([start_frames, middle_frames, end_frames])
print('Modified frames\n', modified_frames)
Note the assert: the code will need to be modified if your shortest sequence is shorter than the total range size (e.g. 100 frames).
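If scipy is available, the explicit middle-frame loop can also be replaced by a 1-D uniform filter along the time axis. A sketch, assuming the same centered window of 2 * mean_range_size + 1 frames; note the edge handling differs slightly from the code above (the filter pads by repeating the edge frames rather than reusing the first/last full window):

import numpy
from scipy.ndimage import uniform_filter1d

window = 2 * mean_range_size + 1
# Centered moving average over the frame (time) axis.
running_mean = uniform_filter1d(frames, size=window, axis=0, mode='nearest')
modified_frames = frames - running_mean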

Related

How to use phase correlation to find the similar part between two 3D matrices in Python?

Phase Correlation is calculated as follows:
f = IFFT( F(Bnm) F(R)* / |F(Bnm) F(R)*| )
The task is to detect duplicated content in the 3D domain by cross-correlating small 3D blocks.
R: residual matrix (209 × 64 × 48)
Split R into non-overlapping 3D blocks B of size 30 × 16 × 16.
Find the phase correlation between R and B.
Then define the maximum correlation value obtained for each time position t as c_t^(Bnm):
c_t^(Bnm) = max over (x, y) of f(t, x, y)
The most prominent peak is due to auto-correlation, i.e., it is located in the exact time position n from which Bnm starts.
The following is my demo:
import cv2
import numpy as np
import matplotlib.pyplot as plt

# read the video
cap = cv2.VideoCapture('01_forged.mp4')
# number of video frames
total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# initializing video residual matrix
r = np.zeros((total - 1, 64, 48), dtype=np.int64)
# initializing time
t = 0
ret, frame = cap.read()
# calculating residual frames
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
while True:
    ret, frame = cap.read()
    if ret == False:
        break
    next = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for i in range(64):
        for j in range(48):
            r[t][i][j] = int(prev[j * 5][i * 5]) - int(next[j * 5][i * 5])
    t += 1
    prev = next
# initializing Bnm (30,16,16)
b1 = np.zeros((30, 16, 16), dtype=np.int64)
# n = 60: the block starts at frame 60
for i in range(60, 90):
    b1[i - 60] = r[i][0:16, 0:16]
# calculating the phase correlation matrix
# The Hadamard product a*b requires the two matrices to have the same
# time x height x width shape, so b1 (30,16,16) is zero-padded into a
# (209,64,48) array.
ba = np.zeros((209, 64, 48), dtype=np.int64)
ba[60:90, 0:16, 32:48] = b1
# F(Bnm)
FB = np.fft.fftn(ba)
# F(R)*
FR = np.conjugate(np.fft.fftn(r))
# F(Bnm)F(R)*
FX = FB * FR
# |F(Bnm)F(R)*|
FX_Ab = np.absolute(FX)
# F(Bnm)F(R)* / |F(Bnm)F(R)*|
e = np.zeros((209, 64, 48), dtype=np.cdouble)
np.true_divide(FX, FX_Ab, out=e, where=FX != 0)
# f is the phase correlation matrix
f = np.absolute(np.fft.ifftn(e).real)
# the maximum value for each time position: ctBnm
final = np.zeros(209, dtype=np.double)
for i in range(209):
    final[i] = np.max(f[i])
# values of ctBnm normalized between 0 and 1
normalfinal = (final - np.min(final)) / (np.max(final) - np.min(final))
plt.plot(range(len(normalfinal)), normalfinal)
plt.show()
cap.release()
cv2.destroyAllWindows()
I have the following doubts about what I have done:
I expect that the most prominent peak is due to auto-correlation, i.e. that it is located at the exact time position n from which Bnm starts. If my code is right, the most prominent peak should be located at time 60, because Bnm starts at frame 60. But in my result the most prominent peak is located at time 0.
Is it wrong to use np.fft.fftn for the 3D Fourier calculation?
[Plot of the normalized c_t^(Bnm) values; the most prominent peak appears at time position 0.]
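As a sanity check on the np.fft usage (my addition, not part of the original post): the same recipe applied to a 1-D signal and a shifted copy of it puts the peak exactly at the shift offset, which suggests fftn itself is not the problem:

import numpy as np

rng = np.random.default_rng(0)
sig = rng.standard_normal(209)
shifted = np.roll(sig, 60)  # a 'block' that starts 60 samples in

FB = np.fft.fft(shifted)
FR = np.conjugate(np.fft.fft(sig))
FX = FB * FR
corr = np.abs(np.fft.ifft(FX / np.abs(FX)).real)
print(np.argmax(corr))  # prints 60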

Slicing audio given video frames

I have audio from a video that I've loaded with PyTorch. Given a starting index and ending index corresponding to the video segment of interest, along with the video FPS and audio sampling rate, how would I go about extracting the slice of audio that matches the segment of interest of the video?
My intuition is to convert frames to time via:
start_time = frame_start / fps
end_time = frame_end / fps
then convert time to sample position with:
start_sample = int(math.floor(start_time * sr))
end_sample = int(math.floor(end_time * sr))
Is this correct? Or is there something I'm missing? I'm worried that there will be loss of information since I'm converting the samples into ints with floor.
Let's say you have
fs = 44100 # audio sampling frequency
vfr = 24 # video frame rate
frame_start = 10 # index of first frame
frame_end = 10 # index of last frame
audio = np.arange(44100) # audio in form of ndarray
you can calculate at which points in time you want to slice the audio
time_start = frame_start / vfr
time_end = frame_end / vfr # or (frame_end + 1) / vfr for inclusive cut
and then to which samples those points in time correspond:
sample_start_idx = int(time_start * fs)
sample_end_idx = int(time_end * fs)
It's up to you whether you want to be super precise and take into account the fact that the audio corresponding to a given frame should really start half a frame before the frame and end half a frame after it.
In such a case use:
time_start = np.clip((frame_start - 0.5) / vfr, 0, np.inf)
time_end = (frame_end + 0.5) / vfr
Your solution is just fine. Assuming your sample rate is 16000, the flooring will cause a video/audio desync on the order of 4.166e-05 seconds, which is orders of magnitude below what human ears can discern.
import math
fps = 60
frame_start = 121
frame_end = 181
sr=16000
start_time = frame_start / fps
end_time = frame_end / fps
start_sample = int(math.floor(start_time * sr))
end_sample = int(math.floor(end_time * sr))
print(end_time-end_sample/sr) # 4.166666666671759e-05

Eliminating Certain Values in Dataframe

Initial Data
import numpy as np
import pandas as pd

d = {'RedVal':[1,1.1,2,1.5,1.7,2,1,1.1,2,1,1.1,2,2.6,2.5,2.4,2.5], 'GreenVal':[1,1.1,1.1,1,1.1,1.7,1,1.1,1.5,1,1.9,3,2.8,2.7,2.6,2.5],'Frame':[0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3],'Particle':[0,0,0,0,2,2,2,2,3,3,3,3,4,4,4,4] }
testframe = pd.DataFrame(data=d)
testframe
framenot = 2  # set how many frames you would like to get the initial ratio for
ratarray = []  # initialize blank ratio array
testframe = testframe.sort_values(by=['Particle', 'Frame'])  # sort_values returns a copy, so assign it back
for particle in range(0, 5):
    if (testframe['Particle'] == particle).any() == False:
        continue  # skip particle numbers that never occur
    newframe = testframe.loc[(testframe['Frame'] <= framenot) & (testframe['Particle'] == particle)]
    for i in range(framenot):
        GVal = newframe['GreenVal'].values[i]
        RVal = newframe['RedVal'].values[i]
        ratio = RVal / GVal
        ratarray.append(ratio)
ratarray = np.array(ratarray)
avgRatios = np.average(ratarray.reshape(-1, framenot), axis=1)
stdRatios = np.std(ratarray.reshape(-1, framenot), axis=1)
print(avgRatios)  # array with average ratios over the set frames, starting from the initial particle
print(stdRatios)
So far I have code that gives the average and standard deviation of each particle's Red/Green ratio over frames 0 and 1. Now I want to compare this average ratio to the ratios over the next x frames and eliminate particles whose subsequent ratios fall outside avg ± 2·std. Not quite sure how to do this. Any help is appreciated.
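No answer was posted for this one, but here is a minimal sketch of one way to do the filtering with groupby. It assumes the avg ± 2·std band described above, uses frames 0 and 1 as the baseline, and any variable name beyond those in the question is hypothetical:

# Per-particle ratio for every frame.
testframe['Ratio'] = testframe['RedVal'] / testframe['GreenVal']

# Baseline band from the first framenot frames of each particle.
baseline = (testframe[testframe['Frame'] < framenot]
            .groupby('Particle')['Ratio']
            .agg(['mean', 'std']))
lower = baseline['mean'] - 2 * baseline['std']
upper = baseline['mean'] + 2 * baseline['std']

# A particle survives only if all of its later frames stay inside the band.
later = testframe[testframe['Frame'] >= framenot]
inside = later['Ratio'].between(later['Particle'].map(lower),
                                later['Particle'].map(upper))
keep = inside.groupby(later['Particle']).all()
filtered = testframe[testframe['Particle'].map(keep).fillna(False).astype(bool)]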

Scanning lists more efficiently in python

I have some code which works as intended, but takes about four and a half hours to run. I understand there are about 50 billion calculations my poor PC needs to do, but I thought it would be worth asking!
This code takes an image, finds every possible 331*331-pixel region in it, and counts how many black pixels each region contains. I will use this data to create a heatmap of black-pixel density, as well as a list of all the values found:
image = Image.open(self.selectedFile)
pixels = list(image.getdata())
width, height = image.size
pixels = [pixels[i * width:(i+1) * width] for i in range(height)]
rightShifts = width - 331
downShifts = height - 331
self.totalRegionsLabel['text'] = f'Total Regions: {rightShifts * downShifts}'
self.blackList = [0 for i in range(0, rightShifts*downShifts)]
self.heatMap = [[] for i in range(0, downShifts)]
for x in range(len(self.heatMap)):
    self.heatMap[x] = [0 for i in range(0, rightShifts)]
for x in range(rightShifts):
    for y in range(downShifts):
        blackCount = 0
        for z in range(x + 331):
            for w in range(y + 331):
                if pixels[z][w] == 0:
                    blackCount += 1
        self.blackList[x+1*y] = blackCount
        self.heatMap[x][y] = blackCount
print(self.blackList)
You have several problems here, as I pointed out. Your z/w loops are always starting at the upper left, so by the time you get towards the end, you're summing the entire image, not just a 331x331 subset. You also have much confusion in your axes. In an image, [y] is first, [x] is second. An image is rows of columns. You need to remember that.
Here's an implementation as I suggested above. For each column, I do a full sum on the top 331x331 block. Then, for every row below, I just subtract the top row and add the next row below.
self.heatMap = [[0]*rightShifts for i in range(downShifts)]
for x in range(rightShifts):
    # Sum up the block at the top.
    blackCount = 0
    for row in range(331):
        for col in range(331):
            if pixels[row][x+col] == 0:
                blackCount += 1
    self.heatMap[0][x] = blackCount
    for y in range(1, downShifts):
        # To do the next block down, we subtract the top row's black-pixel
        # count and add the next row's (comparing against 0 rather than
        # adding raw pixel values, so the count stays a count).
        for col in range(331):
            blackCount += (pixels[y+330][x+col] == 0) - (pixels[y-1][x+col] == 0)
        self.heatMap[y][x] = blackCount
You could tweak this even more by alternating the columns. So, at the bottom of the first column, scoot to the right by subtracting the first column and adding the next new column. Then scoot back up to the top. That's a lot more trouble.
The two innermost for-loops seem to be transformable to some numpy code if using this package is not an issue. It would give something like:
pixels = np.asarray(image)  # convert the PIL image to a 2D numpy array (rows first)
# Get an array filled with either True or False, with True wherever the pixel is black:
pixel_is_black = (pixels[y:(y+331), x:(x+331)] == 0)
self.blackList[y * rightShifts + x] = pixel_is_black.sum()  # True counts as 1 when summing
This is the simplest optimization I can think of, you probably can do much better with clever numpy tricks.
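For the record (my addition, not from the original answers), the "clever numpy trick" here would be a 2D prefix sum, i.e. an integral image, which yields every 331x331 black count with no Python-level loops at all. A sketch, assuming pixels is the 2D numpy array from the snippet above:

import numpy as np

black = (pixels == 0).astype(np.int64)  # 1 where black, 0 elsewhere
# Integral image, padded with a leading row and column of zeros.
I = np.zeros((black.shape[0] + 1, black.shape[1] + 1), dtype=np.int64)
I[1:, 1:] = black.cumsum(axis=0).cumsum(axis=1)
n = 331
# Black count of every n x n window; counts[y, x] is the window whose
# top-left corner is at (y, x).
counts = I[n:, n:] - I[:-n, n:] - I[n:, :-n] + I[:-n, :-n]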
I would recommend using efficient vector computations through the numpy and opencv libraries.
First, binarize your image so that black pixels are set to zero and any other pixels (gray to white) are set to 1. Then apply a 2D filter of shape 331 x 331 where each value in the filter kernel is 1 / (331 x 331); this takes the average of all the values in each 331x331 area and assigns it to the center pixel.
This gives you a heatmap where each pixel value is the proportion of non-black pixels in the surrounding 331 x 331 region. A darker pixel (value closer to zero) means more pixels in that region are black.
For some background, this approach uses the image-processing techniques known as image binarization and box blur.
Example code:
import cv2
import numpy as np
# setting up a fake image, with some white spaces, gray spaces, and black spaces
img_dim = 10000
fake_img = np.full(shape=(img_dim, img_dim), fill_value=255, dtype=np.uint8) # white
fake_img[: img_dim // 3, : img_dim // 3] = 0 # top left black
fake_img[2 * img_dim // 3 :, 2 * img_dim // 3 :] = 0 # bottom right black
fake_img[img_dim // 3 : 2 * img_dim // 3, img_dim // 3 : 2 * img_dim // 3] = 127 # center gray
# show the fake image
cv2.imshow("", fake_img)
cv2.waitKey()
cv2.destroyAllWindows()
# solution to your problem
binarized = np.where(fake_img == 0, 0, 1).astype(np.float32)  # 0 where black, 1 elsewhere
my_filter = np.full(shape=(331, 331), fill_value=(1 / (331 * 331)))  # set up averaging (box) filter
heatmap = cv2.filter2D(binarized, -1, my_filter)  # apply the filter to the binarized image, not the raw one; ddepth -1 keeps the source depth
# show the heatmap
cv2.imshow("", heatmap)
cv2.waitKey()
cv2.destroyAllWindows()
I ran this on my laptop, with a huge (fake) image of 10000 x 10000 pixels, almost instantly.
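If you need the raw black-pixel counts rather than proportions (my note, not in the original answer), they can be recovered from the heatmap values:

# heatmap holds the proportion of non-black pixels per 331x331 region,
# so the black count is the complement scaled by the window area.
black_counts = ((1.0 - heatmap) * (331 * 331)).round().astype(np.int64)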
Sorry, I should have deleted this post before you all put the effort in. However, some of these workarounds are really smart and interesting. I ended up independently coming up with a solution that is the same as what Tim Robbers first suggested: I took the array I had and built a second one in which every item in a row is the number of black cells preceding it. Then, for each row in a region, instead of scanning every item, you just read the first and last values and you are done:
image = Image.open(self.selectedFile).convert('L')  # convert to luminance mode as RGB information is irrelevant
pixels = list(image.getdata())  # get the value of every pixel in the image
width, height = image.size
pixels = [pixels[i * width:(i+1) * width] for i in range(height)]  # split the pixel list into a 2D array whose dimensions match the image
# This program scans every possible 331*331 square starting from the top left,
# so it will move right width - 331 pixels and down height - 331 pixels
rightShifts = width - 331
downShifts = height - 331
self.totalRegionsLabel['text'] = f'Total Regions: {rightShifts * downShifts}'  # this won't update until the function has finished running
# Assigning values into prefilled arrays is faster than appending, which is why the arrays are prefilled:
self.heatMap = [[0 for i in range(0, rightShifts)] for x in range(0, downShifts)]
cumulativeMatrix = []  # the cumulative matrix replaces each value in each row with how many zeros precede it
for y in range(len(pixels)):
    cumulativeMatrix.append([0])
    count = 0
    for x in range(len(pixels[y])):
        if pixels[y][x] == 0:
            count += 1
        cumulativeMatrix[y].append(count)
regionCount = 0
maxValue = 0  # the lowest possible maximum value
minValue = 109561  # the largest possible minimum value (331 * 331)
self.blackList = []
# loop through all possible regions
for y in range(downShifts):
    for x in range(rightShifts):
        blackPixels = 0
        for regionY in range(y, y + 331):
            lowerLimit = cumulativeMatrix[regionY][x]
            upperLimit = cumulativeMatrix[regionY][x + 331]  # x + 331, not x + 332: the window spans columns x .. x + 330
            blackPixels += (upperLimit - lowerLimit)
        if blackPixels > maxValue:
            maxValue = blackPixels
        if blackPixels < minValue:
            minValue = blackPixels
        self.blackList.append(blackPixels)
        self.heatMap[y][x] = blackPixels
        regionCount += 1
This brought the run time to under a minute and thus solved my problem. Thank you for your contributions; I have learned a lot from reading them!
Try looking into the map() function. It pushes the iteration down into C.
You can speed up your for loop like this:
pixels = list(map(lambda i: pixels[i*width:(i+1)*width], range(height)))

How to find the noise points of a .wav file in Python (not removing the noise, just finding when it occurs)

How can I find the noise points of a .wav file? I mean, not removing the noise, just finding when the noise occurred.
I checked this site that classifies dog and cat sounds:
https://www.kaggle.com/nadir89/classification-logistic-regression-svm-on-mfccs/notebook?select=utils.py
but it didn't work properly...
Can you give me some advice, or another way to find the noise points of a .wav file?
Is it possible to find (not remove) noise in a sound using logistic regression (machine learning)?
Is there any way to find the noise points?
Try this
import noisereduce
temp = noisereduce.reduce_noise(noise_clip=noise_clip, audio_clip=temp, verbose=True)
noise_clip: a small part of the signal (a sample of the noise, maybe 1 s long, i.e. one frame_duration)
audio_clip: the actual audio
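A fuller usage sketch (my addition; it assumes the noisereduce 1.x keyword API used above, librosa for loading, and a hypothetical input.wav whose first second is background noise):

import librosa
import noisereduce

signal, fs = librosa.load('input.wav')  # hypothetical file name
noise_clip = signal[:fs]  # first second taken as the noise sample
cleaned = noisereduce.reduce_noise(noise_clip=noise_clip, audio_clip=signal, verbose=True)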
Alternatively, a simple energy-based approach can flag low-energy (silence/noise) frames directly:
import librosa
import numpy as np

signal, fs = librosa.load(path)
signln = len(signal)
avg_energy = np.sum(signal ** 2) / float(signln)  # average energy of the whole signal
f_d = 0.02  # frame duration in seconds
f_length = fs * f_d  # frame length = samples per second (fs) * frame duration (f_d)
flag = True
j = 0
retsig = []
noise = signal[0:441]  # just considering the first little part as noise
while j < signln:
    subsig = signal[int(j): int(j) + int(f_length)]
    average_energy = np.sum(subsig ** 2) / float(len(subsig))  # average energy of the current frame
    if average_energy <= avg_energy:
        # if the energy of the current frame is lower than that of the whole
        # signal, we can treat this frame as silence or noise
        if flag:  # to get the first noise/silence frame appearing in the signal
            noise = subsig  # to collect every noise frame instead, append to a list and drop the flag condition
            flag = False
    else:
        # if the frame's average energy is greater than the whole signal's,
        # this frame contains actual data
        retsig.append(subsig)  # so add that frame to the new variable
    j += f_length
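Since the question asks when the noise occurs, a small extension of the loop above (my sketch, not part of the original answer) records the start time of every low-energy frame instead of keeping only the first one; it reuses signal, signln, f_length, avg_energy and fs from the code above:

noise_times = []  # start times (in seconds) of low-energy frames
j = 0
while j < signln:
    subsig = signal[int(j): int(j) + int(f_length)]
    if np.sum(subsig ** 2) / float(len(subsig)) <= avg_energy:
        noise_times.append(j / fs)  # sample index -> seconds
    j += f_length
print(noise_times)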
