I would like to use a NumPy function for this in a daily report, because my data is quite large.
Let's say I have the NumPy 2D array
A = array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
I want to do something like this
abs(array([0, 1, 2]) - array([[3, 4, 5], [4, 5, 6], ..., [7, 8, 9]])).sum()
abs(array([1, 2, 3]) - array([[4, 5, 6], [5, 6, 7], ..., [7, 8, 9]])).sum()
...
abs(array([3, 4, 5]) - array([[0, 1, 2], [6, 7, 8], [7, 8, 9]])).sum()
abs(array([4, 5, 6]) - array([[0, 1, 2], [1, 2, 3], [7, 8, 9]])).sum()
...
abs(array([7, 8, 9]) - array([[0, 1, 2], [1, 2, 3], ..., [4, 5, 6]])).sum()
I have tried the following, but it cannot skip rows on the right side that contain any of the elements of the row on the left side.
for i in range(len(A)):
    temp = np.roll(A, -i, axis=0)
    print(abs(temp[0] - temp[3:]).sum())
These are the expected results:
results = [75, 54, ..., 30, 30, ..., 75]
Sorry for my poor English explanation; thank you.
If you wish to have a simple one-liner solution involving only NumPy functionality, I propose this:
import numpy as np
results = np.apply_along_axis(
    arr=A,
    axis=1,
    func1d=lambda x: np.abs(x - A[~np.isin(A, x).any(axis=1), :]).sum(),
)
The result is as expected:
array([75, 54, 36, 30, 30, 36, 54, 75])
Here you go:
import numpy as np
A = np.array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
def sum_data(select_row):
    # roll the data so the selected row comes first
    rolled_data = np.roll(A, -select_row, axis=0)
    drop_numbers = list(rolled_data[0])
    # find rows that contain any of the selected row's numbers
    drop_rows = []
    for item in drop_numbers:
        rows = np.unique(np.where(rolled_data == item)[0])
        drop_rows.extend(rows)
    # get unique row numbers and remove row 0, the selected row itself
    unique_rows = sorted(set(drop_rows))
    unique_rows.remove(0)
    # delete the overlapping rows
    rolled_data = np.delete(rolled_data, unique_rows, axis=0)
    # sum the absolute differences against the remaining rows
    difference_value = 0
    for i in range(1, len(rolled_data)):
        difference_value += abs(rolled_data[0] - rolled_data[i]).sum()
    return difference_value

# loop over each row
collect_values = []
for j in range(len(A)):
    collect_values.append(sum_data(j))
Output:
[75, 54, 36, 30, 30, 36, 54, 75]
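For large inputs, a fully vectorized alternative is possible with plain broadcasting. This is a minimal sketch (note that it builds O(n^2) intermediate arrays, so it trades memory for speed):

import numpy as np

A = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, 5],
              [4, 5, 6], [5, 6, 7], [6, 7, 8], [7, 8, 9]])

# pairwise sums of absolute differences, shape (n, n)
diffs = np.abs(A[:, None, :] - A[None, :, :]).sum(axis=2)

# shares[i, j] is True when rows i and j have any element in common
shares = (A[:, None, :, None] == A[None, :, None, :]).any(axis=(2, 3))

# zero out pairs that share an element, then sum per row
results = np.where(shares, 0, diffs).sum(axis=1)
# array([75, 54, 36, 30, 30, 36, 54, 75])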
my_function must expand a 1D NumPy array into a 2D NumPy array whose rows are the consecutive slices of the given length, from the first index to the end. Example:
import numpy as np
a = np.arange(10)
print (my_function(a, length=3))
Expected output
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
I can achieve this using a for loop, but I was wondering if there is a numpy vectorization technique for this.
def my_function(a, length):
    # preallocate with the input's dtype so integer input stays integer
    b = np.zeros((len(a) - (length - 1), length), dtype=a.dtype)
    for i in range(len(b)):
        b[i] = a[i:i+length]
    return b
If you're careful with the math and heed the warning in the docs, you can use np.lib.stride_tricks.as_strided(). You need to calculate the correct dimensions for your array so you don't overflow. Also note that as_strided() shares memory, so you will have multiple references to the same memory in the final output. (You can, of course, copy this to a new array.)
>>> import numpy as np
>>> def my_function(a, length):
...     stride = a.strides[0]
...     l = len(a) - length + 1
...     return np.lib.stride_tricks.as_strided(a, (l, length), (stride, stride))
...
>>> np.array(my_function(np.arange(10), 3))
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7],
[6, 7, 8],
[7, 8, 9]])
>>> np.array(my_function(np.arange(15), 7))
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 1, 2, 3, 4, 5, 6, 7],
[ 2, 3, 4, 5, 6, 7, 8],
[ 3, 4, 5, 6, 7, 8, 9],
[ 4, 5, 6, 7, 8, 9, 10],
[ 5, 6, 7, 8, 9, 10, 11],
[ 6, 7, 8, 9, 10, 11, 12],
[ 7, 8, 9, 10, 11, 12, 13],
[ 8, 9, 10, 11, 12, 13, 14]])
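As a side note, NumPy 1.20+ ships np.lib.stride_tricks.sliding_window_view, which produces the same windows without the manual stride arithmetic; a short sketch (like as_strided, it returns a view, read-only in this case, so copy it if you need to write to it):

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def my_function(a, length):
    # read-only view into a; use .copy() for an independent array
    return sliding_window_view(a, length)

my_function(np.arange(10), 3)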
How about this function?
import numpy as np
def my_function(a, length):
    # collect the shifted slices, then transpose so each row is one window
    result = []
    for i in range(length):
        result.append(a[i:len(a) - length + 1 + i])
    return np.vstack(result).T
a = np.arange(10)
length = 3
my_function(a, length)
Let's say we have a 2D array like this:
>>> a
array([[1, 1, 2],
[0, 2, 2],
[2, 2, 0],
[0, 2, 0]])
For each row, I want to replace each element by the maximum of the two other elements in the same row.
I've found how to do it for each column separately, using numpy.amax and an identity array, like this:
>>> np.amax(a*(1-np.eye(3)[0]), axis=1)
array([ 2., 2., 2., 2.])
>>> np.amax(a*(1-np.eye(3)[1]), axis=1)
array([ 2., 2., 2., 0.])
>>> np.amax(a*(1-np.eye(3)[2]), axis=1)
array([ 1., 2., 2., 2.])
But I would like to know if there is a way to avoid a for loop and get directly the result which in this case should look like this:
>>> numpy_magic(a)
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
Edit: after a few hours playing in the console, I've finally come up with the solution I was looking for. Be ready for some mind-blowing one-line code:
np.amax(a[[range(a.shape[0])]*a.shape[1],:][(np.eye(a.shape[1]) == 0)[:,[range(a.shape[1])*a.shape[0]]].reshape(a.shape[1],a.shape[0],a.shape[1])].reshape((a.shape[1],a.shape[0],a.shape[1]-1)),axis=2).transpose()
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
Edit 2: Paul has suggested a much more readable and faster alternative:
np.max(a[:, np.where(~np.identity(a.shape[1], dtype=bool))[1].reshape(a.shape[1], -1)], axis=-1)
After timing these three alternatives, both of Paul's solutions are about 4 times faster in every context (I benchmarked with 2, 3 and 4 columns and 200 rows). Congratulations on these amazing pieces of code!
Last edit (sorry): after replacing np.identity with np.eye, which is faster, we now have the fastest and most concise solution:
np.max(a[:, np.where(~np.eye(a.shape[1], dtype=bool))[1].reshape(a.shape[1], -1)], axis=-1)
Here are two solutions, one that is specifically designed for max and a more general one that works for other operations as well.
Using the fact that all entries of the result except possibly one per row equal the maximum of the entire row, we can use argpartition to cheaply find the indices of the two largest elements. Then, in the position of the largest we put the value of the second largest, and everywhere else the largest value. This also works for more than 3 columns.
>>> a
array([[6, 0, 8, 8, 0, 4, 4, 5],
[3, 1, 5, 0, 9, 0, 3, 6],
[1, 6, 8, 3, 4, 7, 3, 7],
[2, 1, 6, 2, 9, 1, 8, 9],
[7, 3, 9, 5, 3, 7, 4, 3],
[3, 4, 3, 5, 8, 2, 2, 4],
[4, 1, 7, 9, 2, 5, 9, 6],
[5, 6, 8, 5, 5, 3, 3, 3]])
>>>
>>> M, N = a.shape
>>> result = np.empty_like(a)
>>> largest_two = np.argpartition(a, N-2, axis=-1)
>>> rng = np.arange(M)
>>> result[...] = a[rng, largest_two[:, -1], None]
>>> result[rng, largest_two[:, -1]] = a[rng, largest_two[:, -2]]
>>> result
array([[8, 8, 8, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 6, 9, 9, 9],
[8, 8, 7, 8, 8, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9],
[9, 9, 7, 9, 9, 9, 9, 9],
[8, 8, 8, 8, 5, 8, 8, 8],
[9, 9, 9, 9, 9, 9, 9, 9],
[8, 8, 6, 8, 8, 8, 8, 8]])
This solution depends on specific properties of max.
A more general solution, which for example also works for sum instead of max, is the following. Glue two copies of a together, side by side (not on top of each other), so the rows look something like a0 a1 a2 a3 a0 a1 a2 a3. For an index x we can then get everything but ax by slicing [x+1:x+4]. To vectorize this we use stride_tricks:
>>> a
array([[2, 6, 0],
[5, 0, 0],
[5, 0, 9],
[6, 4, 4],
[5, 0, 8],
[1, 7, 5],
[9, 7, 7],
[4, 4, 3]])
>>> M, N = a.shape
>>> aa = np.c_[a, a]
>>> ast = np.lib.stride_tricks.as_strided(aa, (M, N+1, N-1), aa.strides + aa.strides[1:])
>>> result = np.max(ast[:, 1:, :], axis=-1)
>>> result
array([[6, 2, 6],
[0, 5, 5],
[9, 9, 5],
[4, 6, 6],
[8, 8, 5],
[7, 5, 7],
[7, 9, 9],
[4, 4, 4]])
# use sum instead of max
>>> result = np.sum(ast[:, 1:, :], axis=-1)
>>> result
array([[ 6, 2, 8],
[ 0, 5, 5],
[ 9, 14, 5],
[ 8, 10, 10],
[ 8, 13, 5],
[12, 6, 8],
[14, 16, 16],
[ 7, 7, 8]])
List comprehension solution.
np.array([np.amax(a * (1 - np.eye(a.shape[1])[j]), axis=1) for j in range(a.shape[1])]).T
Similar to @Ethan's answer but with np.delete(), np.max(), and np.dstack():
np.dstack([np.max(np.delete(a, i, 1), axis=1) for i in range(a.shape[1])])
array([[2, 2, 1],
[2, 2, 2],
[2, 2, 2],
[2, 0, 2]])
delete() "filters" out each column successively;
max() finds the row-wise maximum of the remaining two columns
dstack() stacks the resulting 1d arrays
If you have more than 3 columns, note that this will find the maximum of "all other" columns rather than the maximum of the "two greatest" columns per row. For example:
a2 = np.arange(25).reshape(5,5)
np.dstack([np.max(np.delete(a2, i, 1), axis=1) for i in range(a2.shape[1])])
array([[[ 4, 4, 4, 4, 3],
[ 9, 9, 9, 9, 8],
[14, 14, 14, 14, 13],
[19, 19, 19, 19, 18],
[24, 24, 24, 24, 23]]])
I want to apply tf.gather() to all the rows of a given parameters tensor and an indices tensor.
I can apply tf.gather() on two 1D tensors to extract a 1D tensor:
# params == array([3, 8, 9, 7, 6])
# inds == array([1, 2, 3])
>>> tf.gather(params, inds).eval()
array([8, 9, 7])
Now what if I have two 2D tensors, and want to apply tf.gather() on them row-wise? I want something like this:
# params == array([[3, 8, 9, 7, 6],
# [6, 1, 7, 0, 7],
# [7, 4, 4, 5, 8]])
# inds == array([[1, 2, 3],
# [2, 3, 4],
# [0, 1, 2]])
>>> row_wise_gather(params, inds)
array([[8, 9, 7],
[7, 0, 7],
[7, 4, 4]]
The closest I've come so far is using tf.gather() with axis=1, which yields a 3D tensor, and then indexing the result with gather_nd():
>>> gathered3d = tf.gather(params, inds, axis=1)
# gathered3d == array([[[8, 9, 7],
# [9, 7, 6],
# [3, 8, 9]],
#
# [[1, 7, 0],
# [7, 0, 7],
# [6, 1, 7]],
#
# [[4, 4, 5],
# [4, 5, 8],
# [7, 4, 4]]])
>>> tf.gather_nd(gathered3d, [[0, 0], [1, 1], [2, 2]]).eval()
array([[8, 9, 7],
[7, 0, 7],
[7, 4, 4]])
(I'd call other functions instead of giving literal values, but that's beside the point and not an issue)
This is very clumsy. Is there a more efficient way to do this?
By the way, the indices I use are always consecutive, increasing by one; each row just has a different start and end value. That might make the problem easier.
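One approach, sketched below, uses the batch_dims argument of tf.gather (available in recent TensorFlow versions) to pair row i of params with row i of inds:

import tensorflow as tf

params = tf.constant([[3, 8, 9, 7, 6],
                      [6, 1, 7, 0, 7],
                      [7, 4, 4, 5, 8]])
inds = tf.constant([[1, 2, 3],
                    [2, 3, 4],
                    [0, 1, 2]])

# gather along axis 1, one index row per batch element
result = tf.gather(params, inds, axis=1, batch_dims=1)
# [[8, 9, 7], [7, 0, 7], [7, 4, 4]]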
I want to use this code on a very large array, but it takes a long time to execute and is not efficient.
Is there any way to remove the loop and optimize this code?
>>> import numpy as np
>>> x=np.random.randint(10, size=(4,5,3))
>>> x
array([[[3, 2, 6],
[4, 6, 6],
[3, 7, 9],
[6, 4, 2],
[9, 0, 1]],
[[9, 0, 4],
[1, 8, 9],
[6, 8, 1],
[9, 4, 5],
[1, 5, 2]],
[[6, 1, 6],
[1, 8, 8],
[3, 8, 3],
[7, 1, 0],
[7, 7, 0]],
[[5, 6, 6],
[8, 3, 1],
[0, 5, 4],
[6, 1, 2],
[5, 6, 1]]])
>>> y = []
>>> for i in range(x.shape[1]):
...     for j in range(x.shape[2]):
...         y.append(x[:, i, j].tolist())
...
>>> y
[[3, 9, 6, 5], [2, 0, 1, 6], [6, 4, 6, 6], [4, 1, 1, 8], [6, 8, 8, 3], [6, 9, 8, 1], [3, 6, 3, 0], [7, 8, 8, 5], [9, 1, 3, 4], [6, 9, 7, 6], [4, 4, 1, 1], [2, 5, 0, 2], [9, 1, 7, 5], [0, 5, 7, 6], [1, 2, 0, 1]]
You could permute axes with np.transpose and then reshape to 2D:
y = x.transpose(1,2,0).reshape(-1,x.shape[0])
Append with .tolist() for list output.
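A quick sketch to check that this reproduces the loop output:

import numpy as np

x = np.random.randint(10, size=(4, 5, 3))

# original double loop
y_loop = [x[:, i, j].tolist()
          for i in range(x.shape[1])
          for j in range(x.shape[2])]

# vectorized version
y_vec = x.transpose(1, 2, 0).reshape(-1, x.shape[0]).tolist()

assert y_loop == y_vec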
Yes: either use np.reshape(x, shape) or try np.ndarray.flatten(x, order='F') ('F' for Fortran style, column-first, matching your example).
Read the documentation to find out which parameters fit best. IMHO, ndarray.flatten is the more elegant option for you here; however, depending on your exact desired output, you might have to reshape or transpose the array first.
I have a 4D array of training images whose dimensions correspond to (image_number, channels, width, height). I also have a 2D array of target labels whose dimensions correspond to (image_number, class_number). When training, I want to randomly shuffle the data using random.shuffle, but how can I keep the labels shuffled in the same order as my images? Thanks!
from sklearn.utils import shuffle
import numpy as np
X = np.array([[0, 0, 0], [1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
y = np.array([0, 1, 2, 3, 4])
X, y = shuffle(X, y)
print(X)
print(y)
[[1 1 1]
[3 3 3]
[0 0 0]
[2 2 2]
[4 4 4]]
[1 3 0 2 4]
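If you need reproducibility, sklearn.utils.shuffle also accepts a random_state argument; for example:

X, y = shuffle(X, y, random_state=0)  # same permutation on every run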
There is another easy way to do this. Suppose there are N images in total. Then we can do the following:
from random import shuffle
ind_list = [i for i in range(N)]
shuffle(ind_list)
train_new = train[ind_list, :, :, :]
target_new = target[ind_list,]
If you want a numpy-only solution, you can just reindex the second array on the first, assuming you've got the same image numbers in both:
In [67]: train = np.arange(20).reshape(4,5).T
In [68]: target = np.hstack([np.arange(5).reshape(5,1), np.arange(100, 105).reshape(5,1)])
In [69]: train
Out[69]:
array([[ 0, 5, 10, 15],
[ 1, 6, 11, 16],
[ 2, 7, 12, 17],
[ 3, 8, 13, 18],
[ 4, 9, 14, 19]])
In [70]: target
Out[70]:
array([[ 0, 100],
[ 1, 101],
[ 2, 102],
[ 3, 103],
[ 4, 104]])
In [71]: np.random.shuffle(train)
In [72]: target[train[:,0]]
Out[72]:
array([[ 2, 102],
[ 3, 103],
[ 1, 101],
[ 4, 104],
[ 0, 100]])
In [73]: train
Out[73]:
array([[ 2, 7, 12, 17],
[ 3, 8, 13, 18],
[ 1, 6, 11, 16],
[ 4, 9, 14, 19],
[ 0, 5, 10, 15]])
If you're looking for a synchronized (unison) shuffle, you can use the following function.
import numpy as np

def unisonShuffleDataset(a, b):
    assert len(a) == len(b)
    p = np.random.permutation(len(a))
    return a[p], b[p]
The function above handles only two NumPy arrays; it can be extended to more than two by adding input variables to the function and to its return statement, as sketched below.
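A minimal sketch of that extension, using a hypothetical unison_shuffle helper that takes any number of arrays:

import numpy as np

def unison_shuffle(*arrays):
    # all arrays must have the same length along the first axis
    assert all(len(a) == len(arrays[0]) for a in arrays)
    p = np.random.permutation(len(arrays[0]))
    return tuple(a[p] for a in arrays)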
Depending on what you want to do, you could also randomly generate an index for each dimension of your array with

random.randint(a, b)  # a and b are the lowest and highest valid indices

which would select randomly among your objects.
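For example, drawing one image/label pair with a shared random index (a minimal sketch, assuming train and target as defined in the earlier answers):

import random

i = random.randint(0, len(train) - 1)  # random position along the first axis
sample, label = train[i], target[i]    # image and label stay paired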
Use the same seed to build the random generator multiple times to shuffle different arrays:
>>> seed = np.random.SeedSequence()
>>> arrays = [np.arange(10).repeat(i).reshape(10, -1) for i in range(1, 4)]
>>> for ar in arrays:
... np.random.default_rng(seed).shuffle(ar)
...
>>> arrays
[array([[1],
[2],
[7],
[8],
[0],
[4],
[3],
[6],
[9],
[5]]),
array([[1, 1],
[2, 2],
[7, 7],
[8, 8],
[0, 0],
[4, 4],
[3, 3],
[6, 6],
[9, 9],
[5, 5]]),
array([[1, 1, 1],
[2, 2, 2],
[7, 7, 7],
[8, 8, 8],
[0, 0, 0],
[4, 4, 4],
[3, 3, 3],
[6, 6, 6],
[9, 9, 9],
[5, 5, 5]])]