In Python, I have an array of shape n*2 (where n is a positive integer). Essentially, this is an array of pairs. I wish to remove all mirror pairs from this array. For example, the following array A is of shape 10*2. The pairs [0, 55] and [55, 0] constitute one such mirror pair in A, and I wish to keep only one of those two.
A = np.array([[ 0, 55], [ 5, 25], [12, 62], [27, 32], [25, 73],
[55, 0], [25, 5], [62, 12], [32, 27], [99, 95]])
For the aforementioned example, I would want the result array to look like:
B = np.array([[ 0, 55], [ 5, 25], [12, 62], [27, 32], [25, 73], [99, 95]])
since there are 6 unique pairs (after 4 mirror pairs are excluded).
I realize that I can achieve this using two nested for loops, but I want the fastest possible method, since for the actual problem at hand I will be dealing with huge arrays. Any help is appreciated.
A cryptic one-liner:
In [301]: A
Out[301]:
array([[ 0, 55],
[ 5, 25],
[12, 62],
[27, 32],
[25, 73],
[55, 0],
[25, 5],
[62, 12],
[32, 27],
[99, 95]])
In [302]: np.unique(np.sort(A, axis=1).view(','.join([A.dtype.char]*2))).view(A.dtype).reshape(-1, 2)
Out[302]:
array([[ 0, 55],
[ 5, 25],
[12, 62],
[25, 73],
[27, 32],
[95, 99]])
Break it down into steps...
First, create a copy that is sorted along the second axis. In the sorted array, we want to remove duplicate rows.
In [303]: a = np.sort(A, axis=1)
In [304]: a
Out[304]:
array([[ 0, 55],
[ 5, 25],
[12, 62],
[27, 32],
[25, 73],
[ 0, 55],
[ 5, 25],
[12, 62],
[27, 32],
[95, 99]])
numpy.unique() can be used to find the unique elements of an array, but by itself it treats the input as one-dimensional. So we'll create a view of a, call it b, in which each row becomes a single structure with two fields. One way to define the new data type that we want is as a string:
In [305]: dt = ','.join([A.dtype.char]*2)
In [306]: dt
Out[306]: 'l,l'
b is a structured array; it is a view of a with one two-field structure per row:
In [307]: b = a.view(dt)
In [308]: b
Out[308]:
array([[( 0, 55)],
[( 5, 25)],
[(12, 62)],
[(27, 32)],
[(25, 73)],
[( 0, 55)],
[( 5, 25)],
[(12, 62)],
[(27, 32)],
[(95, 99)]],
dtype=[('f0', '<i8'), ('f1', '<i8')])
Now we use numpy.unique() to find the unique elements of b:
In [309]: u = np.unique(b)
In [310]: u
Out[310]:
array([( 0, 55), ( 5, 25), (12, 62), (25, 73), (27, 32), (95, 99)],
dtype=[('f0', '<i8'), ('f1', '<i8')])
Next, create a view of u, using the data type of the original array A. This will be one-dimensional:
In [311]: v = u.view(A.dtype)
In [312]: v
Out[312]: array([ 0, 55, 5, 25, 12, 62, 25, 73, 27, 32, 95, 99])
Finally, reshape v to restore the two-dimensional array:
In [313]: w = v.reshape(-1, 2)
In [314]: w
Out[314]:
array([[ 0, 55],
[ 5, 25],
[12, 62],
[25, 73],
[27, 32],
[95, 99]])
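As an aside, on NumPy 1.13 or newer the structured-dtype trick is not needed, because np.unique accepts an axis argument. A shorter equivalent (a sketch; same rows, same lexicographic order):
np.unique(np.sort(A, axis=1), axis=0)
# array([[ 0, 55],
#        [ 5, 25],
#        [12, 62],
#        [25, 73],
#        [27, 32],
#        [95, 99]])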
If you are working with a pure Python list, try the following code.
>>> list(set([tuple(i) for i in map(sorted, A)]))
[(27, 32), (5, 25), (12, 62), (95, 99), (25, 73), (0, 55)]
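If you then need a NumPy array back (a sketch, assuming A is the array from the question; the set makes the row order arbitrary, so sort it if you care):
>>> import numpy as np
>>> np.array(sorted(set(tuple(sorted(pair)) for pair in A.tolist())))
array([[ 0, 55],
       [ 5, 25],
       [12, 62],
       [25, 73],
       [27, 32],
       [95, 99]])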
I'm assuming the order within each pair doesn't matter (for example, [1,2] is the same as [2,1]). If this is the case, you can flip all pairs so that the first number is always smaller than the second number.
[[1,2], [4,3], [1,7], [10,2]]
becomes
[[1,2], [3,4], [1,7], [2,10]]
Then you could sort all the pairs by the first, then second number:
[[1,2], [1,7], [2,10], [3,4]]
Finally, you could loop through the list and remove any duplicate pairs.
If you use an efficient sorting algorithm, like mergesort, this entire process takes O(n*log(n)) work, which is much better than the O(n^2) work you get with nested for loops. A sketch of this approach follows.
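A minimal NumPy sketch of the flip-sort-deduplicate idea (variable names are illustrative; np.lexsort does the sort by first, then second number):
import numpy as np

pairs = np.array([[1, 2], [4, 3], [1, 7], [10, 2], [2, 1]])
flipped = np.sort(pairs, axis=1)                    # smaller number first in each pair
order = np.lexsort((flipped[:, 1], flipped[:, 0]))  # sort by first, then second number
s = flipped[order]
keep = np.ones(len(s), dtype=bool)
keep[1:] = np.any(s[1:] != s[:-1], axis=1)          # drop rows equal to their predecessor
print(s[keep])
# [[ 1  2]
#  [ 1  7]
#  [ 2 10]
#  [ 3  4]]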
I will show my way (because it uses one of my favourite tricks for converting a 1D list to an nD list), even though there are probably easier ways:
A = [[ 0, 55], [ 5, 25], [12, 62], [27, 32], [25, 73],
[55, 0], [25, 5], [62, 12], [32, 27], [99, 95]]
B = []
half = len(A) // 2   # note: this assumes every mirror pair sits exactly
                     # len(A)//2 positions apart, as in the example above
for i in range(half):
    if A[i][0] == A[i + half][1] and A[i][1] == A[i + half][0]:
        # mirror pair: keep only the first of the two
        B.append(A[i][0])
        B.append(A[i][1])
    else:
        # not a mirror pair: keep both
        B.append(A[i][0])
        B.append(A[i][1])
        B.append(A[i + half][0])
        B.append(A[i + half][1])
# Now we have created a 1D list, and we convert it to 2D!
B = [B[i:i+2] for i in range(0, len(B), 2)]
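Running it on the question's A gives the expected result (this works because, as noted in the comment above, every mirror pair in that example sits exactly len(A)//2 positions apart):
print(B)
# [[0, 55], [5, 25], [12, 62], [27, 32], [25, 73], [99, 95]]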
I have a matrix mat of size (3, 5, 4) and a vector vec of size (4,) with indices corresponding to the first dimension of the matrix (i.e. between 0 and 2). I would like to extract an array of size (4, 5), which can be done via mat[vec, :, [True] * len(vec)], but I was wondering if there is a more elegant solution using numpy functions without the need to create a new list of boolean values.
In [15]: mat = np.arange(3 * 5 * 4).reshape(3, 5, 4)
In [16]: idx = np.array([0, 2, 1, 1])
In [18]: mat[idx, :, [True] * len(idx)]
Out[18]:
array([[ 0, 4, 8, 12, 16],
[41, 45, 49, 53, 57],
[22, 26, 30, 34, 38],
[23, 27, 31, 35, 39]])
An equivalent - whether it's more elegant is debatable:
In [19]: mat[idx, :, np.arange(4)]
Out[19]:
array([[ 0, 4, 8, 12, 16],
[41, 45, 49, 53, 57],
[22, 26, 30, 34, 38],
[23, 27, 31, 35, 39]])
Unless you want a (4,5,4) result, you will have to provide index arrays of matching (broadcastable) size for the 1st and 3rd dimensions. There's no way around that.
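If you would rather not build an index array at all, the same values can be read off as the diagonal of mat[idx] over its first and last axes (a sketch; not necessarily faster):
out = mat[idx].diagonal(axis1=0, axis2=2).T      # shape (4, 5)
np.array_equal(out, mat[idx, :, np.arange(4)])   # True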
I've got a multidimensional array of shape (1000000,3,2). Essentially it is 1 million sets of samples, each sample being 3 coordinates (each coordinate has an x and y component). I would like to sort each of the samples by the y component of each coordinate. For example, if vecCoords is the full array and I have one of the samples being vecCoords[0,:,:] = [[1,3],[2,1],[4,2]], I want it sorted to give me [[2,1],[4,2],[1,3]]. I want this vectorized so it is done for each of the 1 million samples. So the output shape is still (1000000,3,2). I tried doing it iteratively, but my code isn't giving me the correct results, and it is also slower than I would like.
Make a small sample array:
In [153]: arr = np.random.randint(0,100,(4,3,2))
In [154]: arr
Out[154]:
array([[[21, 12],
[15, 31],
[17, 88]],
[[35, 81],
[99, 58],
[39, 46]],
[[54, 54],
[85, 71],
[ 9, 19]],
[[25, 46],
[62, 61],
[74, 69]]])
The values you want to sort on:
In [155]: arr[:,:,1]
Out[155]:
array([[12, 31, 88],
[81, 58, 46],
[54, 71, 19],
[46, 61, 69]])
In [156]: idx=np.argsort(arr[:,:,1], axis=1)
In [157]: idx
Out[157]:
array([[0, 1, 2],
[2, 1, 0],
[2, 0, 1],
[0, 1, 2]])
Test this sort on one plane:
In [159]: arr[1,idx[1],:]
Out[159]:
array([[39, 46],
[99, 58],
[35, 81]])
Apply it to all planes:
In [161]: arr[np.arange(arr.shape[0])[:,None], idx,:]
Out[161]:
array([[[21, 12],
[15, 31],
[17, 88]],
[[39, 46],
[99, 58],
[35, 81]],
[[ 9, 19],
[54, 54],
[85, 71]],
[[25, 46],
[62, 61],
[74, 69]]])
While I had a general idea where I was heading with this, I still had to experiment a bit.
A newer function, np.take_along_axis, is supposed to make this easier - though even here I had to try a couple of things:
In [168]: np.take_along_axis(arr,idx[:,:,None], axis=1)
Out[168]:
array([[[21, 12],
[15, 31],
[17, 88]],
[[39, 46],
[99, 58],
[35, 81]],
[[ 9, 19],
[54, 54],
[85, 71]],
[[25, 46],
[62, 61],
[74, 69]]])
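Putting the pieces together for the full-sized problem (a sketch, using the vecCoords name from the question):
import numpy as np

def sort_by_y(vecCoords):
    # argsort the y components of each sample, then reorder the coordinate pairs
    idx = np.argsort(vecCoords[:, :, 1], axis=1)
    return np.take_along_axis(vecCoords, idx[:, :, None], axis=1)

vecCoords = np.random.randint(0, 100, (1_000_000, 3, 2))
out = sort_by_y(vecCoords)
out.shape   # (1000000, 3, 2)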
I have two matrices (numpy arrays), mu and nu. From these I would like to create a third array as follows:
new_array_{j, k, l} = mu_{l, k} nu_{j, k}
I can do it naively using list comprehensions:
[[[mu[l, k] * nu[j, k] for l in np.arange(N)] for k in np.arange(N)] for j in np.arange(N)]
but it quickly becomes slow.
How can I create new_array using numpy functions which should be faster?
Two quick solutions (without my usual proofs and explanations):
res = np.einsum('lk,jk->jkl', mu, nu)
res = mu.T[None,:,:] * nu[:,:,None] # axes in same order as result
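A quick sanity check that the two one-liners agree (small random arrays with illustrative shapes):
import numpy as np

rng = np.random.default_rng(0)
mu = rng.standard_normal((4, 5))   # shape (L, K)
nu = rng.standard_normal((3, 5))   # shape (J, K)

res1 = np.einsum('lk,jk->jkl', mu, nu)
res2 = mu.T[None, :, :] * nu[:, :, None]
np.allclose(res1, res2)   # True; both have shape (J, K, L) = (3, 5, 4)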
#!/usr/bin/env python
import numpy as np
# example data
mu = np.arange(10).reshape(2,5)
nu = np.arange(15).reshape(3,5) + 20
# get array sizes
nl, nk = mu.shape
nj, nk_ = nu.shape
assert(nk == nk_)
# get arrays with dimensions (nj, nk, nl)
# in the case of mu3d, we need to add a slowest varying dimension
# so (after transposing) this can be done by cycling through the data
# nj times along the slowest existing axis and then reshaping
mu3d = np.concatenate((mu.transpose(),) * nj).reshape(nj, nk, nl)
# in the case of nu3d, we need to add a new fastest varying dimension
# so this can be done by repeating each element nl times, and again it
# needs reshaping
nu3d = nu.repeat(nl).reshape(nj, nk, nl)
# now just multiply element by element
new_array = mu3d * nu3d
print(new_array)
Gives:
>>> mu
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> nu
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34]])
>>> nj, nk, nl
(3, 5, 2)
>>> mu3d
array([[[0, 5],
[1, 6],
[2, 7],
[3, 8],
[4, 9]],
[[0, 5],
[1, 6],
[2, 7],
[3, 8],
[4, 9]],
[[0, 5],
[1, 6],
[2, 7],
[3, 8],
[4, 9]]])
>>> nu3d
array([[[20, 20],
[21, 21],
[22, 22],
[23, 23],
[24, 24]],
[[25, 25],
[26, 26],
[27, 27],
[28, 28],
[29, 29]],
[[30, 30],
[31, 31],
[32, 32],
[33, 33],
[34, 34]]])
>>> new_array
array([[[ 0, 100],
[ 21, 126],
[ 44, 154],
[ 69, 184],
[ 96, 216]],
[[ 0, 125],
[ 26, 156],
[ 54, 189],
[ 84, 224],
[116, 261]],
[[ 0, 150],
[ 31, 186],
[ 64, 224],
[ 99, 264],
[136, 306]]])
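And a quick check that this matches the einsum one-liner from the other answer (using mu, nu and new_array from the script above):
assert np.array_equal(new_array, np.einsum('lk,jk->jkl', mu, nu))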
I'm new to numpy and am having trouble understanding how the shapes of arrays are determined.
An array of the form
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4,3]]
has a shape of (2,) while one of the form
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4]]
has a shape of (2,3). Moreover,
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [[1,2,4], [3,4,2]]]
has a shape of (2,) but adding another vector as
[[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [[1,2,4], [3,4,2], [1,2,4]]]
changes the shape to (2,3,3). Intuitively, I feel that all the arrays should be 3-dimensional. Could anyone help me understand what's happening exactly?
The underlying idea is that np.array tries to create as high a dimensional array as it can. When the sublists have matching numbers of elements the result is easy to see. When they mix lists of differing lengths the result can be confusing.
In your first case you have 2 sublists, one of length 3, the other of length 4. So it makes a 2 element object array, and doesn't try to parse the contents of the 1st sublist:
In [1]: arr = np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4,3]])
In [2]: arr
Out[2]: array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]],
[1, 2, 4, 3]
], dtype=object) # adjusted format
In [3]: arr.dtype
Out[3]: dtype('O')
In [4]: arr.shape
Out[4]: (2,)
In [5]: arr[0]
Out[5]: [[5, 10, 15], [20, 25, 30], [35, 40, 45]] # 3 element list of lists
In [6]: arr[1]
Out[6]: [1, 2, 4, 3] # 4 element list of numbers
In the 2nd case you have two sublists, both of length 3. So it makes a 2x3 array. But one sublist contains lists, the other numbers - so the result is again an object array:
In [7]: arr = np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4]] )
In [8]: arr
Out[8]:
array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]],
[1, 2, 4]
], dtype=object)
In [9]: arr.shape
Out[9]: (2, 3)
In [10]: arr[0,0]
Out[10]: [5, 10, 15]
Finally, 2 lists, each with 3 elements, each of which is also a 3 element list - a 3d array.
In [11]: arr = np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [[1,2,4], [3,4,2], [1,2,4]]] )
In [12]: arr
Out[12]:
array([[[ 5, 10, 15],
[20, 25, 30],
[35, 40, 45]],
[[ 1, 2, 4],
[ 3, 4, 2],
[ 1, 2, 4]]])
In [13]: arr.shape
Out[13]: (2, 3, 3)
There are also mixes of sublist lengths that can raise an error.
In general don't mix sublists of differing size and content type casually. np.array behaves most predictably when given lists that will produce a nice multidimensional array. Mixing list lengths leads to confusion.
Updated numpy:
In [1]: np.__version__
Out[1]: '1.13.1'
In [2]: np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4,3]])
Out[2]: array([list([[5, 10, 15], [20, 25, 30], [35, 40, 45]]), list([1, 2, 4, 3])], dtype=object)
In [3]: np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1,2,4]] )
Out[3]:
array([[list([5, 10, 15]), list([20, 25, 30]), list([35, 40, 45])],
[1, 2, 4]], dtype=object)
It now identifies the list elements explicitly in the display.
This last example is still a (2,3) object array. As such, each of those 6 elements could be a different Python type, e.g.:
In [11]: np.array([[[5, 10, 15], np.array([20, 25, 30]), (35, 40, 45)], [None,2,'astr']] )
Out[11]:
array([[list([5, 10, 15]), array([20, 25, 30]), (35, 40, 45)],
[None, 2, 'astr']], dtype=object)
In [12]: [type(x) for x in _.flat]
Out[12]: [list, numpy.ndarray, tuple, NoneType, int, str]
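One caveat for recent NumPy versions: since 1.24 the ragged cases above raise an error instead of quietly producing an object array (behaviour hedged; check your version's release notes). You have to ask for the object dtype explicitly, e.g.:
np.array([[[5, 10, 15], [20, 25, 30], [35, 40, 45]], [1, 2, 4, 3]], dtype=object)
# array([list([[5, 10, 15], [20, 25, 30], [35, 40, 45]]), list([1, 2, 4, 3])],
#       dtype=object)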
I'm new to Python and now I have a problem with my assignment.
There is an original dataset. In this dataset, there are 400 cells. In each cell, there is a 64*64 matrix.
array([[ array([[ 75, 89, 101, ..., 90, 80, 74],
[ 83, 98, 106, ..., 90, 82, 76],
[ 83, 101, 109, ..., 92, 82, 72],
...,
[ 52, 50, 54, ..., 37, 40, 42],
[ 49, 51, 51, ..., 36, 39, 40],
[ 49, 50, 49, ..., 37, 39, 38]], dtype=uint8),
array([[110, 114, 124, ..., 46, 45, 45],
[108, 117, 126, ..., 52, 51, 51],
[120, 125, 129, ..., 49, 50, 50],
...,
[187, 189, 192, ..., 35, 35, 35],
[187, 188, 191, ..., 33, 33, 33],
[185, 191, 189, ..., 37, 37, 37]], dtype=uint8),
array([[ 77, 97, 119, ..., 97, 86, 75],
[ 75, 96, 116, ..., 98, 91, 73],
[ 65, 84, 110, ..., 96, 90, 75],
...,
[ 32, 24, 20, ..., 33, 36, 37],
[ 28, 23, 19, ..., 35, 35, 38],
[ 27, 22, 19, ..., 34, 36, 37]], dtype=uint8),
(400 of them)
I want to stretch the data into column vectors and form one matrix. In the new matrix, each column vector consists of the 64*64 = 4096 values from one cell of the original data, and thus I get a 4096*400 matrix.
Thanks...
The answer to your question would be to reshape your array and then transpose it:
In [28]: a = np.arange(3*4*2).reshape(3,1,4,2)
In [29]: a
Out[29]:
array([[[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7]]],
[[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]]],
[[[16, 17],
[18, 19],
[20, 21],
[22, 23]]]])
In [30]: a.reshape(a.shape[0], a.shape[-2]*a.shape[-1]).T
Out[30]:
array([[ 0, 8, 16],
[ 1, 9, 17],
[ 2, 10, 18],
[ 3, 11, 19],
[ 4, 12, 20],
[ 5, 13, 21],
[ 6, 14, 22],
[ 7, 15, 23]])
As David pointed out in the comments, if you wanted to unroll the 64x64 matrices column-wise, use the Fortran ordering in reshape:
In [31]: a.reshape((a.shape[0], a.shape[-2]*a.shape[-1]), order='F').T
Out[31]:
array([[ 0, 8, 16],
[ 2, 10, 18],
[ 4, 12, 20],
[ 6, 14, 22],
[ 1, 9, 17],
[ 3, 11, 19],
[ 5, 13, 21],
[ 7, 15, 23]])
However, the fact that your data prints in that form, with array([[ array([[, is unusual - it means you have an object array whose elements are themselves arrays, and it's not obvious how you produced it. You might want to consider changing the way you generate that dataset: if the number of 64x64 "matrices" is known upfront, you could preallocate an array (np.empty((400, 64, 64), dtype=np.uint8)) and then fill it. Otherwise, an ordinary Python list would suffice, to which you could append each new 64x64 matrix and at the end convert the list of arrays to a full numpy array (np.array(list_of_matrices)).
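For the dataset in the question, a hedged end-to-end sketch (data stands in for your length-400 object array of 64x64 blocks; use the order='F' reshape shown above instead if you want the column-wise unroll):
import numpy as np

# stand-in for the question's dataset: a (400,) object array of 64x64 uint8 arrays
data = np.empty(400, dtype=object)
for i in range(400):
    data[i] = np.random.randint(0, 256, (64, 64)).astype(np.uint8)

stacked = np.stack(data)             # (400, 64, 64) homogeneous array
result = stacked.reshape(400, -1).T  # row-wise unroll of each block -> (4096, 400)
result.shape   # (4096, 400)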
Wow, how did you end up with that?
I can think of no good reason for nesting numeric arrays within an np.object array. You would have a much easier time if you were starting out with a (400, 64, 64) homogeneous array of dtype np.uint8, but that's a separate issue for you to think about.
I had to think for a minute how to even construct an array like yours. Here's a toy example - a (3,) object array containing (2, 2) np.uint8 arrays:
arr = np.empty(3, dtype=object)
for ii in range(3):
    arr[ii] = np.arange(4, dtype=np.uint8).reshape(2, 2)
Here's what that looks like:
print(repr(arr))
# array([array([[0, 1],
# [2, 3]], dtype=uint8),
# array([[0, 1],
# [2, 3]], dtype=uint8),
# array([[0, 1],
# [2, 3]], dtype=uint8)], dtype=object)
Eeew. Let's say that we wanted to reshape this into a (3, 4) array. It's not possible to do this directly, because the shape of the outer object array 'container' is still only (3,):
print(arr.shape)
# (3,)
arr.reshape(3, 4)
# ValueError: total size of new array must be unchanged
To turn that into something sensible we'll first stack the subarrays in the first dimension in order to make a homogeneous array of dtype np.uint8:
arr = np.vstack([subarr[None, ...] for subarr in arr])
print(repr(arr))
# array([[[0, 1],
# [2, 3]],
# [[0, 1],
# [2, 3]],
# [[0, 1],
# [2, 3]]], dtype=uint8)
The None index is used here to insert a new (first) dimension that we stack the arrays over.
Now that we have a (3, 2, 2) homogeneous array, we can just flatten out the last two dimensions to make a (3, 4) array. A trick for this is to use the .reshape() method with a -1 for the size of the last dimension, meaning that its size will be determined automatically based on the total number of elements in the array:
arr = arr.reshape(3, -1)
print(repr(arr))
# array([[0, 1, 2, 3],
# [0, 1, 2, 3],
# [0, 1, 2, 3]], dtype=uint8)
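As a side note, on any reasonably recent NumPy the vstack-with-a-new-axis step can be replaced by np.stack, which inserts the new leading dimension for you (same toy object array as above, rebuilt here as obj):
obj = np.empty(3, dtype=object)
for ii in range(3):
    obj[ii] = np.arange(4, dtype=np.uint8).reshape(2, 2)

flat = np.stack(obj).reshape(len(obj), -1)   # (3, 4), same values as arr above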