Average of elements in a 2d list - python

I have a list like the following:
[[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
And I'm trying to have a new list which has the same number of lists inside, but changes the value of the elements by calculating the average of an element with the element after and before.
What do I mean by that ?:
Let's say I have the sub-list sub = [7, 7] at index 1. I want this list to become [9, 9], because the element itself plus the corresponding elements of the sub-lists before and after it give 7 + 1 + 20 = 28, and 28 // 3 = 9 (I want integer division).
The ideal output would be:
[[4, 4], [9, 9], [12, 12], [5, 5], [-1, -1]]
I have currently this code:
copy_l = copy.deepcopy(audio_data)
sub_list = []
for i in range(0, len(audio_data)-1):
    sub_data = []
    for j in range(2):
        if i == 0:
            audio_data[i][j] += int(audio_data[i+1][j] / 2)
            sub_data.append(audio_data[i][j])
        elif audio_data[i+1] == audio_data[-1]:
            audio_data[i+1][j] = int((audio_data[i+1][j]+audio_data[i][j])/2)
            sub_data.append(audio_data[i+1][j])
        else:
            audio_data = copy_l
            audio_data[i][j] = int((audio_data[i-1][j] + audio_data[i][j] + audio_data[i+1][j])/3)
            sub_data.append(audio_data[i][j])
    sub_list.append(sub_data)
print(sub_list)
where audio_data is the list [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]] that I passed in.
(I have separated the average calculation in three cases:
- First element of the list: [1,1] so the average is just (1 + 7) // 2 (no element before [1,1])
- Last element of the list: [-12,-12] so the average is just (-12 + 9) // 2 (no element after [-12,-12])
- All the elements in between
)
Problem is, my output (sub_list) is:
[[4, 4], [9, 9], [12, 12], [-1, -1]]
And it seems that [9,9] never turns into [5,5]
Does someone have any idea how to achieve what I want, or even an idea to make it simpler ? I hope I was clear enough, if not feel free to ask me more details, thank you!
EDIT:
I'm seeking a solution without numpy, list comprehension or zip.

Here is a way to do it:
data = [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
out = []
for i in range(len(data)):
    first = max(i - 1, 0)  # don't let the start of the slice go below 0
    last = min(i + 2, len(data))  # nor beyond the end of the list
    mean = [sum(col) // (last-first) for col in zip(*data[first:last])]
    out.append(mean)
print(out)
# [[4, 4], [9, 9], [12, 12], [5, 5], [-2, -2]]
We take slices of data around the current item.
Then, we zip the sublists, and we calculate the result on the first (resp. second) values of the sublists.
Also, note that using Python's integer division, we get -2 for -3//2, not -1 as you got by rounding to the closest to 0. If you really want to do that, you'll have to use a custom function for the division.
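Since the question's edit asks for a solution without NumPy, list comprehensions, or zip, the same windowed average can be sketched with plain loops. The helper name neighbour_average is hypothetical, and it assumes that int(total / count), which truncates toward zero, is the rounding you want (it gives -1 for the last element instead of the -2 that // produces):

```python
def neighbour_average(data):
    """Average each element with its neighbours, truncating toward zero."""
    out = []
    for i in range(len(data)):
        first = max(i - 1, 0)           # clamp the window at the start
        last = min(i + 2, len(data))    # and at the end of the list
        row = []
        for j in range(len(data[i])):
            total = 0
            for k in range(first, last):
                total += data[k][j]
            # int() truncates toward zero, unlike // which floors
            row.append(int(total / (last - first)))
        out.append(row)
    return out


data = [[1, 1], [7, 7], [20, 20], [9, 9], [-12, -12]]
result = neighbour_average(data)
print(result)
# [[4, 4], [9, 9], [12, 12], [5, 5], [-1, -1]]
```

Because the function only reads from data and builds a new list, every average is computed from the original values, which is what the asker's in-place version was missing.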

Here's a NumPy solution:
import numpy as np
def mean3(data):
    return np.convolve(np.r_[data[:2].mean(), data, data[-2:].mean()], np.ones(3), 'valid')//3
>>> np.apply_along_axis(mean3, 0, audio_data)
array([[ 4., 4.],
[ 9., 9.],
[12., 12.],
[ 5., 5.],
[-2., -2.]])
Or, if you prefer the int(x/y) definition of integer division:
import numpy as np
def mean3(data):
    return (np.convolve(np.r_[data[:2].mean(), data, data[-2:].mean()], np.ones(3), 'valid')/3).astype(int)
>>> np.apply_along_axis(mean3, 0, audio_data)
array([[ 4, 4],
[ 9, 9],
[12, 12],
[ 5, 5],
[-1, -1]])

Related

Get indices of element of one array using indices in another array

Suppose I have an array a of shape (2, 2, 2):
a = np.array([[[7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
and an array b that is the max of a: b=a.max(-1) (row-wise):
b = np.array([[9, 19],
[24, 18]])
I'd like to obtain the index of elements in b using index in flattened a, i.e. a.reshape(-1):
array([ 7, 9, 19, 18, 24, 5, 18, 11])
The result should be an array that is the same shape with b with indices of b in flattened a:
array([[1, 2],
[4, 6]])
Basically this is the result of maxpool2d when return_indices=True in PyTorch, but I'm looking for an implementation in NumPy. I've tried where, but it doesn't seem to work. Also, is it possible to combine finding the max and the indices in one go, to be more efficient? Thanks for any help!
I have a solution similar to that of Andras based on np.argmax and np.arange. Instead of "indexing the index" I propose to add a piecewise offset to the result of np.argmax:
import numpy as np
a = np.array([[[7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
off = np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
>>> off
array([[0, 2],
[4, 6]])
This results in:
>>> a.argmax(-1) + off
array([[1, 2],
[4, 6]])
Or as a one-liner:
>>> a.argmax(-1) + np.arange(0, a.size, a.shape[2]).reshape(a.shape[0], a.shape[1])
array([[1, 2],
[4, 6]])
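The offset idea above can be wrapped into a small helper that works for any number of leading dimensions. This is only a sketch: the name flat_argmax_last_axis is hypothetical, and it assumes the flattening is the default C-order one (as produced by a.reshape(-1)):

```python
import numpy as np


def flat_argmax_last_axis(a):
    """Indices into a.reshape(-1) of the maxima along the last axis."""
    # Flat index at which each "row" along the last axis starts.
    offsets = np.arange(0, a.size, a.shape[-1]).reshape(a.shape[:-1])
    return a.argmax(-1) + offsets


a = np.array([[[7, 9],
               [19, 18]],
              [[24, 5],
               [18, 11]]])
flat_idx = flat_argmax_last_axis(a)
print(flat_idx)
# [[1 2]
#  [4 6]]

# The maxima can be recovered from the flat indices without calling max again.
b = a.reshape(-1)[flat_idx]
```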
The only solution I could think of right now is generating a 2d (or 3d, see below) range that indexes your flat array, and indexing into that with the maximum indices that define b (i.e. a.argmax(-1)):
import numpy as np
a = np.array([[[ 7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
multi_inds = a.argmax(-1)
b_shape = a.shape[:-1]
b_size = np.prod(b_shape)
flat_inds = np.arange(a.size).reshape(b_size, -1)
flat_max_inds = flat_inds[range(b_size), multi_inds.ravel()]
max_inds = flat_max_inds.reshape(b_shape)
I separated the steps with some meaningful variable names, which should hopefully explain what's going on.
multi_inds tells you which "column" to choose in each "row" in a to get the maximum:
>>> multi_inds
array([[1, 0],
[0, 0]])
flat_inds is a list of indices, from which one value is to be chosen in each row:
>>> flat_inds
array([[0, 1],
[2, 3],
[4, 5],
[6, 7]])
This is indexed into exactly according to the maximum indices in each row. flat_max_inds are the values you're looking for, but in a flat array:
>>> flat_max_inds
array([1, 2, 4, 6])
So we need to reshape that back to match b.shape:
>>> max_inds
array([[1, 2],
[4, 6]])
A slightly more obscure but also more elegant solution is to use a 3d index array and use broadcasted indexing into it:
import numpy as np
a = np.array([[[ 7, 9],
[19, 18]],
[[24, 5],
[18, 11]]])
multi_inds = a.argmax(-1)
i, j = np.indices(a.shape[:-1])
max_inds = np.arange(a.size).reshape(a.shape)[i, j, multi_inds]
This does the same thing without an intermediate flattening into 2d.
The last part is also how you can get b from multi_inds, i.e. without having to call a *max function a second time:
b = a[i, j, multi_inds]
This is a long one-liner
new = np.array([np.where(a.reshape(-1)==x)[0][0] for x in a.max(-1).reshape(-1)]).reshape(2,2)
print(new)
array([[1, 2],
[4, 3]])
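However, the value 18 appears twice in a, so np.where returns its first occurrence (flat index 3) rather than the index 6 expected in the question. With duplicate values it is ambiguous which index is the target.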

Modifying certain array indexes in numpy

I have a NumPy array of shape (5, x, y).
I want to modify every element in the first three channels only by an equation
element = (element - a)/b.
I want the other two channels to remain the same. How would you index the array to achieve this?
Since shape is (channels, x, y) you can use
x = np.random.rand(5,300,400)
a,b = 10,15
x[0:3] = (x[0:3] - a)/b
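One caveat worth illustrating: the slice assignment stores its result in the existing array, so the array's dtype decides what is kept. The sketch below uses a small made-up (5, 2, 2) array; the names ints and floats are only for illustration:

```python
import numpy as np

a, b = 10, 15
ints = np.arange(5 * 2 * 2).reshape(5, 2, 2)   # integer dtype
floats = ints.astype(float)                    # same values as floats

# The right-hand side is float, but it is cast back to the target dtype.
ints[:3] = (ints[:3] - a) / b      # fractional parts are lost
floats[:3] = (floats[:3] - a) / b  # values are kept as computed

print(ints[0])    # all zeros: values between -1 and 1 truncate to 0
print(floats[0])
print(floats[3])  # channels 3 and 4 are untouched
```

If the array holds integers and fractional results matter, converting it with astype(float) first, as above, avoids the silent truncation.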
Generally one would use indices to get a slice of the array with only relevant values.
>>> desired_shape = (5, 2, 2) # xy = (2,2) in this example
>>> ary = np.array(range(5 * 2 * 2))
>>> ary.shape = desired_shape
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> channels_view = ary[:3, ...] # up to 3 in 1st axis, preserve the others
>>> channels_view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
>>> ary[:3, ...] = (ary[:3, ...] - a) / b
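It is also possible to use ndarray.view() (there is no np.view() function), so we can chain more operations without having to slice the array every time. Note that basic slicing such as ary[:3, ...] already returns a view, so the explicit .view() call is optional.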
>>> view = ary.view()
>>> view = view[:3, ...]
>>> view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
For this example, let's suppose we want to halve all the values in the first three channels for now:
>>> view //= 2 # use //= rather than /=: no dtype was specified, so NumPy inferred an integer dtype, and an in-place true division cannot store floats in an integer array
>>> view
array([[[0, 0],
[1, 1]],
[[2, 2],
[3, 3]],
[[4, 4],
[5, 5]]])
>>> ary
array([[[ 0, 0],
[ 1, 1]],
[[ 2, 2],
[ 3, 3]],
[[ 4, 4],
[ 5, 5]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
But uh oh! Turns out we actually had to multiply this by several factors of a number!
>>> factors_of_420
[2, 2, 3, 5, 7]
Kind of a dumb example, I know, but just assume we can't know what the number is ahead of time. Like, just pretend we're getting the factors from a TCP server or something.
We could write it like this:
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> for fac in factors_of_420:
... ary[:3, ...] = ary[:3, ...] * fac
...
>>> ary
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]],
[[ 12, 13],
[ 14, 15]],
[[ 16, 17],
[ 18, 19]]])
But that's a bit ugly, isn't it? Also, I bet running the slicing operation twice (once for setting, and once for getting) for every factor in the list can be a bit of a performance hit.
This is where a view shines. We don't need to sacrifice anything: we can make one view and operate on that, and NumPy applies the operations to the underlying array for us, giving nicer and faster code at the same time:
>>> ary
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15]],
[[16, 17],
[18, 19]]])
>>> view = ary.view()[:3, ...] # make our pre-sliced view, yum!
>>> view
array([[[ 0, 1],
[ 2, 3]],
[[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11]]])
>>> for fac in factors_of_420:
... view *= fac # use the *= (in place) operator, because 'view =' sets view to something else and does not apply to ary
...
>>> view
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]]])
>>> ary
array([[[ 0, 420],
[ 840, 1260]],
[[1680, 2100],
[2520, 2940]],
[[3360, 3780],
[4200, 4620]],
[[ 12, 13],
[ 14, 15]],
[[ 16, 17],
[ 18, 19]]])
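As a compact, self-contained check of the behaviour shown in the session above, the following sketch confirms with np.shares_memory that the slice is a view of ary and that in-place operators on it change ary. It relies on basic slicing returning a view, so no explicit .view() call is used here:

```python
import numpy as np

ary = np.arange(5 * 2 * 2).reshape(5, 2, 2)
first_three = ary[:3]            # basic slicing returns a view, not a copy

shares = np.shares_memory(first_three, ary)
print(shares)                    # True

for fac in [2, 2, 3, 5, 7]:      # the factors of 420 from the example
    first_three *= fac           # in place, so ary is updated too

print(ary[0])
# [[   0  420]
#  [ 840 1260]]
print(ary[3])                    # untouched channel
# [[12 13]
#  [14 15]]
```

Rebinding the name (for example first_three = first_three * fac) would create a new array and leave ary unchanged, which is why the in-place operator matters.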
Let's see what timing tells us.
>>> import timeit
>>> class WeirdSliceMultiplier:
...     def __init__(self):
...         self.factors = [2, 2, 3, 5, 7]
...     def setup(self):
...         self.ary = np.reshape(range(5 * 2 * 2), (5, 2, 2))
...     def setup_with_view(self):
...         self.setup()
...         self.view = self.ary.view()[:3, ...]
...     def multiply_every_slice(self):
...         for fac in self.factors:
...             self.ary[:3, ...] = self.ary[:3, ...] * fac
...     def multiply_view(self):
...         for fac in self.factors:
...             self.view *= fac
...
>>> multiplier = WeirdSliceMultiplier()
>>> timeit.timeit(multiplier.multiply_every_slice, multiplier.setup, number=50000) # 'slice for every factor' version
0.9404756519943476
>>> timeit.timeit(multiplier.multiply_view, multiplier.setup_with_view, number=50000) # 'slice view ahead of time' version
0.8748960520024411
Note in the second timeit call that view is set in the setup (precisely, in setup_with_view), rather than in the function being timed. This is because creating the view shouldn't count toward the final time: it is supposed to happen ahead of time, and we're only timing the multiplication itself, not any other operations on view that may be chained before or after.
Edit: Also, as @MadPhysicist pointed out in @mujiga's answer, we may actually prefer using the inplace operators. In fact, we already use them in the multiply_view function, so using inplace operators for both is a fairer comparison:
>>> class WeirdSliceMultiplier:
...     def __init__(self):
...         self.factors = [2, 2, 3, 5, 7]
...     def setup(self):
...         self.ary = np.reshape(range(5 * 2 * 2), (5, 2, 2))
...     def setup_with_view(self):
...         self.setup()
...         self.view = self.ary.view()[:3, ...]
...     def multiply_every_slice_inplace(self):
...         for fac in self.factors:
...             self.ary[:3, ...] *= fac
...     def multiply_view(self):
...         for fac in self.factors:
...             self.view *= fac
...
>>> multiplier = WeirdSliceMultiplier()
>>> timeit.timeit(multiplier.multiply_every_slice_inplace, multiplier.setup, number=50000) # 'slice for every factor' version, but with inplace operators
1.0672136489883997
>>> timeit.timeit(multiplier.multiply_view, multiplier.setup_with_view, number=50000) # 'slice view ahead of time' version again for comparison
0.9300520950055216
The time for multiply_view differs between the two runs, possibly because of CPU load or similar noise. As a rough correction, we can apply a normalizing factor:
>>> old_viewslice_time = 0.8748960520024411
>>> new_viewslice_time = 0.9300520950055216
>>> norm_fac = old_viewslice_time / new_viewslice_time
>>> norm_fac
0.9406957488733435
>>> new_viewslice_time * norm_fac # should be very similar to old_viewslice_time
0.8748960520024411
>>> new_everyslice_inplace_time = 1.0672136489883997
>>> new_everyslice_inplace_time * norm_fac
1.003923342742996

Indexing numpy 2D array that wraps around

How do you index a numpy array so that it wraps around when the index is out of bounds?
For example, I have a 3x5 array:
import numpy as np
matrix = np.array([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15]])
##
[[ 1 2 3 4 5]
[ 6 7 8 9 10]
[11 12 13 14 15]]
Say I would like to index the values around index (2,4) where value 15 is located. I would like to get back the array with values:
[[9, 10, 6]
[14, 15, 11]
[4, 5, 1]]
Basically, all the values around 15 are returned, assuming the array wraps around.
A fairly standard idiom to find the neighboring elements in a numpy array is arr[x-1:x+2, y-1:y+2]. However, since you want to wrap, you can pad your array using wrap mode, and offset your x and y coordinates to account for this padding.
This answer assumes that you want the neighbors of the first occurrence of your desired element.
First, find the indices of your element (m is the array, called matrix in the question), and offset them to account for the padding:
x, y = np.unravel_index((m==15).argmax(), m.shape)
x += 1; y += 1
Now pad, and index your array to get your neighbors:
t = np.pad(m, 1, mode='wrap')
out = t[x-1:x+2, y-1:y+2]
array([[ 9, 10, 6],
[14, 15, 11],
[ 4, 5, 1]])
Here's how you can do it without padding. This can generalize easily to when you want more than just one neighbor and without the overhead of padding the array.
def get_wrapped(matrix, i, j):
    m, n = matrix.shape
    rows = [(i-1) % m, i, (i+1) % m]
    cols = [(j-1) % n, j, (j+1) % n]
    return matrix[rows][:, cols]
res = get_wrapped(matrix, 2, 4)
Let me explain what's happening in return matrix[rows][:, cols]. This is really two operations.
The first is matrix[rows] which is short hand for matrix[rows, :] which means give me the selected rows, and all columns for those rows.
Then next we do [:, cols] which means give me all the rows and the selected cols.
np.take with mode='wrap' handles the wrap-around for you. It returns a new array; a itself is not modified.
>>> a = np.arange(1, 16).reshape(3,5)
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> b = np.take(a, [3,4,5], axis=1, mode='wrap')
array([[ 4, 5, 1],
[ 9, 10, 6],
[14, 15, 11]])
>>> np.take(b, [1,2,3], mode='wrap', axis=0)
array([[ 9, 10, 6],
[14, 15, 11],
[ 4, 5, 1]])
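The two take calls above can be combined into one helper that works for any centre position. This is a sketch under assumptions: the name neighbors_wrapped and the radius parameter are hypothetical, and it relies on mode='wrap' mapping both negative and too-large indices back into range:

```python
import numpy as np


def neighbors_wrapped(arr, i, j, radius=1):
    """Return the (2*radius+1)-square block around (i, j), wrapping at the edges."""
    rows = np.arange(i - radius, i + radius + 1)
    cols = np.arange(j - radius, j + radius + 1)
    # Out-of-range indices are wrapped around by mode='wrap'.
    return arr.take(rows, axis=0, mode='wrap').take(cols, axis=1, mode='wrap')


matrix = np.arange(1, 16).reshape(3, 5)
around_15 = neighbors_wrapped(matrix, 2, 4)
print(around_15)
# [[ 9 10  6]
#  [14 15 11]
#  [ 4  5  1]]

around_1 = neighbors_wrapped(matrix, 0, 0)
print(around_1)
# [[15 11 12]
#  [ 5  1  2]
#  [10  6  7]]
```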

Vectorization - Adding numpy arrays without loops?

So I have the following numpy arrays:
c = array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
X = array([[10, 15, 20, 5],
[ 1, 2, 6, 23]])
y = array([1, 1])
I am trying to add each 1x4 row in the X array to one of the columns in c. The y array specifies which column. The above example, means that we are adding both rows in the X array to column 1 of c. That is, we should expect the result of:
c = array([[ 1, 2+10+1, 3], = array([[ 1, 13, 3],
[ 4, 5+15+2, 6], [ 4, 22, 6],
[ 7, 8+20+6, 9], [ 7, 34, 9],
[10, 11+5+23, 12]]) [10, 39, 12]])
Does anyone know how I can do this without any loops? I tried c[:,y] += X but it seems like this only adds the second row of X to column 1 of c once. With that being said, it should be noted that y does not necessarily have to be [1,1], it can also be [0,1]. In this case, we would add the first row of X to column 0 of c and the second row of X to column 1 of c.
My first thought when I saw your desired calculation, was to just sum the 2 rows of X, and add that to the 2nd column of c:
In [636]: c = array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
In [637]: c[:,1]+=X.sum(axis=0)
In [638]: c
Out[638]:
array([[ 1, 13, 3],
[ 4, 22, 6],
[ 7, 34, 9],
[10, 39, 12]])
But if we want to work from a general index like y, we need a special bufferless operation - that is if there are duplicates in y:
In [639]: c = array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
In [641]: np.add.at(c,(slice(None),y),X.T)
In [642]: c
Out[642]:
array([[ 1, 13, 3],
[ 4, 22, 6],
[ 7, 34, 9],
[10, 39, 12]])
You need to look up .at in the NumPy docs.
In IPython, add.at? shows me the doc, which includes:
Performs unbuffered in place operation on operand 'a' for elements
specified by 'indices'. For addition ufunc, this method is equivalent to
a[indices] += b, except that results are accumulated for elements that
are indexed more than once. For example, a[[0,0]] += 1 will only
increment the first element once because of buffering, whereas
add.at(a, [0,0], 1) will increment the first element twice.
With a different y it still works
In [645]: np.add.at(c,(slice(None),[0,2]),X.T)
In [646]: c
Out[646]:
array([[11, 2, 4],
[19, 5, 8],
[27, 8, 15],
[15, 11, 35]])
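To see the difference between the buffered and unbuffered forms side by side, here is a self-contained sketch using the arrays from the question. The variable names buffered and unbuffered are only for illustration:

```python
import numpy as np

c = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
X = np.array([[10, 15, 20, 5],
              [1, 2, 6, 23]])
y = np.array([1, 1])            # both rows of X target column 1

buffered = c.copy()
buffered[:, y] += X.T           # duplicate column: only one addition survives

unbuffered = c.copy()
np.add.at(unbuffered, (slice(None), y), X.T)   # both additions accumulate

print(unbuffered)
# [[ 1 13  3]
#  [ 4 22  6]
#  [ 7 34  9]
#  [10 39 12]]
```

The index tuple (slice(None), y) is the explicit form of [:, y], which is needed because np.add.at takes the indices as an argument rather than through subscript syntax.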
Firstly, your code seems to work in general if you transpose X. For example:
c = array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
X = array([[10, 15, 20, 5],
[ 1, 2, 6, 23]]).transpose()
y = array([1, 2])
c[:,y] += X
print(c)
#OUTPUT:
#[[ 1 12 4]
# [ 4 20 8]
# [ 7 28 15]
# [10 16 35]]
However, it doesn't work when there are any duplicate columns in y, like in your specific example. I believe this is because c[:, [1,1]] will generate an array with two columns, each having the slice c[:, 1]. Both of these slices point to the same part of c, and so when the addition happens on each, they are both read, then the corresponding part of X is added to each, then they are written back, meaning the last one to be written back is the final value. I don't believe numpy will let you vectorize an operation like this with plain indexing, because it would require editing one column, saving back its value, and then editing it again later.
You might have to settle for no duplicates, or otherwise implement something like an accumulator.
This is the solution I came up with:
def my_func(c, X, y):
    cc = np.zeros((len(y), c.shape[0], c.shape[1]))
    cc[range(len(y)), :, y] = X
    return c + np.sum(cc, 0)
The following interactive session demonstrates how it works:
>>> my_func(c, X, y)
array([[ 1., 13., 3.],
[ 4., 22., 6.],
[ 7., 34., 9.],
[ 10., 39., 12.]])
>>> y2 = np.array([0, 2])
>>> my_func(c, X, y2)
array([[ 11., 2., 4.],
[ 19., 5., 8.],
[ 27., 8., 15.],
[ 15., 11., 35.]])

Multiplying two 2D numpy arrays to a 3D array

I've got two 2D numpy arrays called A and B, where A is M x N and B is M x n. My problem is that I wish to multiply each element of each row of B with corresponding row of A and create a 3D matrix C which is of size M x n x N, without using for-loops.
As an example, if A is:
A = np.array([[1, 2, 3],
[4, 5, 6]])
and B is
B = np.array([[1, 2],
[3, 4]])
Then the resulting multiplication C = A x B would look something like
C = [
[[1, 2],
[12, 16]],
[[2, 4],
[15, 20]],
[[3, 6],
[18, 24]]
]
Is it clear what I'm trying to achieve, and is it possible to do this without any for-loops? Best, tingis
C=np.einsum('ij,ik->jik',A,B)
It is possible by creating a new axis in each array and transposing the modified A:
A[np.newaxis,...].T * B[np.newaxis,...]
giving:
array([[[ 1, 2],
[12, 16]],
[[ 2, 4],
[15, 20]],
[[ 3, 6],
[18, 24]]])
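As a quick check that the einsum and broadcasting answers agree, here is a small sketch using the arrays from the question. Note that the example output has shape N x M x n, that is (3, 2, 2); the 'ij,ik->ikj' variant shown at the end is an assumption about what an M x n x N layout would look like:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # M x N = 2 x 3
B = np.array([[1, 2],
              [3, 4]])         # M x n = 2 x 2

# C[j, i, k] = A[i, j] * B[i, k]
C_einsum = np.einsum('ij,ik->jik', A, B)

# Same product through broadcasting: (N, M, 1) * (1, M, n) -> (N, M, n)
C_bcast = A.T[:, :, np.newaxis] * B[np.newaxis, :, :]

print(C_einsum.shape)   # (3, 2, 2)

# If an M x n x N result is wanted instead, only the output subscripts change.
C_mnN = np.einsum('ij,ik->ikj', A, B)
print(C_mnN.shape)      # (2, 2, 3)
```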
