I have multiple 5x5 arrays contained within one large array; the overall shape is 5 x 5 x 29. I want to sum all of the 5 x 5 arrays to produce one single array, instead of 29 separate arrays.
I know that you can do something along the lines of:
new_data = data1[:,:,0] + data1[:,:,1] + ... + data1[:,:,28]
However, this gets very cumbersome for large arrays. Is there an easier way to do this?
Assuming you are using NumPy, you should be able to do this with:
In [13]: data1 = np.arange(100).reshape(5, 5, 4) # For example
In [14]: data1[:,:,0] + data1[:,:,1] + data1[:,:,2] + data1[:,:,3] # Bad way
Out[14]:
array([[ 6, 22, 38, 54, 70],
[ 86, 102, 118, 134, 150],
[166, 182, 198, 214, 230],
[246, 262, 278, 294, 310],
[326, 342, 358, 374, 390]])
In [15]: data1.sum(axis=2) # Good way
Out[15]:
array([[ 6, 22, 38, 54, 70],
[ 86, 102, 118, 134, 150],
[166, 182, 198, 214, 230],
[246, 262, 278, 294, 310],
[326, 342, 358, 374, 390]])
If you are saying you have a list of arrays, then use a for loop with an initialised accumulator:

new_data = np.zeros((5, 5))
for i in range(29):
    new_data += data1[:,:,i]
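If it really is a plain Python list of 5x5 arrays rather than one 3D array, stacking avoids indexing a third axis. A minimal sketch; list_of_arrays is a hypothetical name standing in for your data:

import numpy as np

# Hypothetical: a plain Python list of 29 arrays, each 5x5
list_of_arrays = [np.ones((5, 5)) for _ in range(29)]

# Stack into a (29, 5, 5) array, then sum over the first axis
new_data = np.sum(np.stack(list_of_arrays), axis=0)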
If you are saying you have a tensor or some other N-D array, you should review NumPy's N-D array docs.
You can use a for loop, like this:

import numpy as np

new_data = np.zeros((5, 5))
for i in range(29):
    new_data += data1[:,:,i]
TL;DR: np.apply_along_axis works for a certain array with shape (1561, 338) which is a subset of another array with shape (351225, 338) for which it fails.
I am trying to apply the following function:
def add_min(a):
    return a + abs(a.min()) if a.min() < 0 else a
x_train has shape (1561, 15, 15, 338) (n * height * width * channels) and I need to shift all values to positive to be able to log-normalize my data. I want to do that per channel, for obvious reasons.
Now if I reshape x_train: x_train = x_train.reshape(-1, 338) and get shape (351225, 338)
I should be able to perform:
x_train = np.apply_along_axis(add_min, 0, x_train)
However...
Before:
x_train.min()
>> -2147483648
After:
x_train.min()
>> -2147370103
In other words, it does not work. On the other hand, if I only keep the center pixel:
# Keep the center value of axes (1, 2)
x_train = x_train[:, x_train.shape[1]//2, x_train.shape[2]//2, :]
x_train.shape
>> (1561, 338)
x_train.min()
>> -32768 # strange coincidence that this location in the image has a much smaller value range
x_train = np.apply_along_axis(add_min, 0, x_train)
x_train.min()
>> 0
I think it has something to do with the large negative values, because if I select random indices in the 2 center axes (i.e. 1 and 8) instead of the middle (7, 7) I again get x_train.min() of -2147483648 and -2147369934, before and after np.apply_along_axis, respectively.
So what am I doing wrong? Is there a better way I can achieve my goal?
The overflow on int32 is a good guess if the problem looks "random". But using apply_along_axis may be making it harder to diagnose the issue, since it wraps your function in an (obscure) iteration. It should be easier to diagnose things with whole-array calculations.
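As a quick check of that hypothesis: the absolute value of the int32 minimum is not representable in int32, so abs wraps around instead of producing a positive number (a minimal sketch):

import numpy as np

a = np.array([-2147483648], dtype=np.int32)
print(abs(a))  # [-2147483648] -- abs wraps around at the int32 minimum

This is why adding abs(a.min()) cannot shift such data to non-negative values.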
Make a modest array with a mix of column minimums:
In [77]: A = np.random.randint(-50,1000,(4,8))
In [78]: A
Out[78]:
array([[151, 531, 765, 379, 89, 499, 818, 848],
[873, -12, -45, 900, 416, 838, 603, 849],
[540, 0, 1, 589, 297, 566, 688, 556],
[ 53, 170, 461, -16, -6, 480, 321, 392]])
Your function:
In [79]: np.apply_along_axis(add_min, 0, A)
Out[79]:
array([[151, 543, 810, 395, 95, 499, 818, 848],
[873, 0, 0, 916, 422, 838, 603, 849],
[540, 12, 46, 605, 303, 566, 688, 556],
[ 53, 182, 506, 0, 0, 480, 321, 392]])
Let's create a whole-array equivalent. First find the min with an axis specification:
In [80]: am = np.min(A, axis=0, keepdims=True)
In [81]: am
Out[81]: array([[ 53, -12, -45, -16, -6, 480, 321, 392]])
Now create a shift array that imitates your function (without the if, which only works with scalars):
In [82]: shift=np.abs(am)
In [83]: shift[am>=0]=0
In [84]: shift
Out[84]: array([[ 0, 12, 45, 16, 6, 0, 0, 0]])
In [85]: A+shift
Out[85]:
array([[151, 543, 810, 395, 95, 499, 818, 848],
[873, 0, 0, 916, 422, 838, 603, 849],
[540, 12, 46, 605, 303, 566, 688, 556],
[ 53, 182, 506, 0, 0, 480, 321, 392]])
There are other ways of getting that shift, but the same basic idea applies: use the am<0 mask to determine which columns get the shift. This will also be faster than apply_along_axis.
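For instance, np.where builds the same shift in one step. A sketch, with one caveat: if the data really contains the int32 minimum, negating it overflows as well, so widen the dtype before shifting.

shift = np.where(am < 0, -am, 0)  # same per-column shift, in one step
A + shift

# For the int32 x_train from the question, widen first so that negating
# the minimum cannot itself wrap around:
x64 = x_train.reshape(-1, 338).astype(np.int64)
am64 = x64.min(axis=0, keepdims=True)
x_shifted = x64 + np.where(am64 < 0, -am64, 0)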
I have a 2d numpy array:
dataset_tr = 'data/20news_clean/train.txt.npy'
data_tr = np.load(dataset_tr)
data_tr looks like this; it is a 3*10 numpy array:
[array([ 700, 152, 572, 572, 619, 724, 326, 1571, 572, 99])
array([ 331, 152, 397, 1273, 89, 228, 0, 0, 0, 0])
array([ 6, 1, 26, 174, 216, 135, 1060, 259, 75, 7])]
Each row here is a representation for a document in the 20newsgroup dataset.
All I want to do is create a key out of this 2d array. The result will be 1 * 3 because I have 3 rows in my 2d array.
Actually what I am doing here is that I am trying to assign a name to each row of that array. So the result will look like this:
['doc1', 'doc2', 'doc3']
I am able to get this, but only by looping through the 2d array.
Is there any better numpy way of doing this?
You can get the desired result with a list comprehension:
result = ['doc%i' % (i + 1) for i in range(len(data_tr))]
docs = ['doc' + str(i + 1) for i in range(len(data_tr))]
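If you would rather stay in NumPy than build a Python list, np.char.add broadcasts string concatenation over an array (a minimal sketch using the data_tr from the question):

import numpy as np

# 'doc' is broadcast against the stringified indices 1..n
docs = np.char.add('doc', np.arange(1, len(data_tr) + 1).astype(str))
# array(['doc1', 'doc2', 'doc3'], ...)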
I have an image processing task and we're prohibited from using NumPy, so we need to code it from scratch. I've done the log image transformation, but now I'm stuck on creating an array without numpy.
So here's my last output:

new_log = [[236, 232, 226, ..., 198, 204]]
I need to convert this to an array so I can write the image like this (with Numpy)
new_log =
array([[236, 232, 226, ..., 208, 209, 212],
[202, 197, 187, ..., 198, 200, 203],
[192, 188, 180, ..., 205, 206, 207],
...,
[233, 226, 227, ..., 172, 189, 199],
[235, 233, 228, ..., 175, 182, 192],
[235, 232, 228, ..., 195, 198, 204]], dtype=uint8)
cv.imwrite('log_transformed.jpg', new_log)
# new_log must be shaped like the second output
You can make a straightforward function to take your list and reshape it in a similar way to NumPy's np.reshape(). But it's not going to be fast, and it doesn't know anything about data types (NumPy's dtype) so... my advice is to challenge whoever it is that doesn't like NumPy. Especially if you're using OpenCV — it depends on NumPy!
Here's an example of what you could do in pure Python:
def reshape(l, shape):
    """Reshape a list.

    Example
    -------
    >>> l = [1,2,3,4,5,6,7,8,9]
    >>> reshape(l, shape=(3, -1))
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    """
    nrows, ncols = shape
    if ncols == -1:
        ncols = len(l) // nrows
    if nrows == -1:
        nrows = len(l) // ncols
    array = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            row.append(l[ncols*r + c])
        array.append(row)
    return array
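Hypothetical usage for the image case, assuming new_log is the flat single-row list from the question and the image height is known (256 here is a made-up value, not taken from the question):

flat = new_log[0]                     # the single flat row of pixel values
img_rows = reshape(flat, (256, -1))   # 256 rows, width inferred from length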
How to change this array into a 5*2 matrix?
This is my array:
[[ ([[315, 327, 333, 334, 339]], [[146, 143, 145, 145, 146]])]]
I'm really not sure what you meant by that, but if you want to transpose it (2*5 --> 5*2) you can try this:
arrays = [[315, 327, 333, 334, 339], [146, 143, 145, 145, 146]]
newArrays = [[] for _ in range(len(arrays[0]))]  # initialise this list first
for arr in arrays:
    for i, item in enumerate(arr):
        newArrays[i].append(item)
print(newArrays)
# [[315, 146], [327, 143], [333, 145], [334, 145], [339, 146]]
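The same transpose fits in one line with zip (pure Python, reusing the arrays list above):

transposed = [list(col) for col in zip(*arrays)]
# [[315, 146], [327, 143], [333, 145], [334, 145], [339, 146]]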
NumPy provides a reshape method that reshapes an array into an array of any dimensions with the same number of elements. You can reshape an array of any shape into another shape as long as the product of the original array's dimensions equals the product of the new array's dimensions.
import numpy as np

a = [[([[315, 327, 333, 334, 339]], [[146, 143, 145, 145, 146]])]]
b = np.array(a).reshape((5, 2))
list_b = b.tolist()
print(list_b)
# [[315, 327], [333, 334], [339, 146], [143, 145], [145, 146]]
I have a large 2xn array A and a smaller 2xm array B. All columns in B can be found in A. I'm looking to find the indices of the columns of A that match the columns of B. For example,
import numpy
A = numpy.array([
[101, 101, 101, 102, 102, 103, 103, 104, 105, 106, 107, 108, 108, 109, 109, 110, 110, 211],
[102, 103, 105, 104, 106, 109, 224, 109, 110, 110, 108, 109, 110, 211, 212, 211, 212, 213]
])
B = numpy.array([
[101, 103, 109],
[102, 224, 212]
])
The answer that I'm looking for is [0,6,14]. Interested to know if there is an efficient way rather than looping. Thanks!
There is hardly a good answer for your problem: numpy is not very well suited to this type of problem, although it can be done. For subarray searches, if your dtype is not floating point, the void-view method used below is probably your best bet. You would start with something like:
AA = np.ascontiguousarray(A.T)
BB = np.ascontiguousarray(B.T)
dt = np.dtype((np.void, AA.dtype.itemsize * AA.shape[1]))
AA = AA.view(dt).ravel()
BB = BB.view(dt).ravel()
And now it is just a matter of searching for the items of one 1D array in another 1D array, which is pretty straightforward, assuming there are no repeated columns in the original A array.
If either of your arrays is really small, as in your example, it is going to be hard to beat something like:
indices = np.argmax(AA == BB[:, None], axis = 1)
But for larger datasets, it is going to be hard to beat a sorting approach:
sorter = np.argsort(AA)
sorted_indices = np.searchsorted(AA, BB, sorter=sorter)
indices = sorter[sorted_indices]
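Putting those pieces together into a runnable sketch (assuming A and B are the arrays from the question, and, as noted above, that A has no repeated columns; the void view makes each column comparable as a single item):

import numpy as np

AA = np.ascontiguousarray(A.T)
BB = np.ascontiguousarray(B.T)
dt = np.dtype((np.void, AA.dtype.itemsize * AA.shape[1]))
AA = AA.view(dt).ravel()
BB = BB.view(dt).ravel()

sorter = np.argsort(AA)                       # indirect sort of the void items
pos = np.searchsorted(AA, BB, sorter=sorter)  # positions in the sorted order
print(sorter[pos])                            # [ 0  6 14]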
Here's a way, given the arrays are pre-sorted:
import numpy
A = numpy.array([
[101, 101, 101, 102, 102, 103, 103, 104, 105, 106, 107, 108, 108, 109, 109, 110, 110, 211],
[102, 103, 105, 104, 106, 109, 224, 109, 110, 110, 108, 109, 110, 211, 212, 211, 212, 213]
])
B = numpy.array([
[101, 103, 109],
[102, 224, 212]
])
def search2D(A, B):
    to_find_and_bounds = zip(
        B[1],
        numpy.searchsorted(A[0], B[0], side="left"),
        numpy.searchsorted(A[0], B[0], side="right")
    )
    for to_find, left, right in to_find_and_bounds:
        offset = numpy.searchsorted(A[1, left:right], to_find)
        yield offset + left

list(search2D(A, B))
#>>> [0, 6, 14]
This is O(len B · log len A).
For unsorted arrays, you can perform an indirect sort:
sorter = numpy.lexsort(A[::-1])
sorted_copy = A.T[sorter].T
sorter[list(search2D(sorted_copy, B))]
#>>> array([ 3, 6, 14])
If you need multiple results for one search (when A contains duplicate columns), try

for to_find, left, right in to_find_and_bounds:
    offset_left = numpy.searchsorted(A[1, left:right], to_find, side="left")
    offset_right = numpy.searchsorted(A[1, left:right], to_find, side="right")
    yield from range(offset_left + left, offset_right + left)
You could use a string-based comparison such as this one, using np.char.array:
ca = np.char.array(a)[0,:] + np.char.array(a)[1,:]
cb = np.char.array(b)[0,:] + np.char.array(b)[1,:]
np.where(np.in1d(ca, cb))[0]
#array([ 0, 6, 14], dtype=int64)
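One caveat with plain concatenation: different columns can collide once stringified, e.g. the pairs (1, 11) and (11, 1) both become "111". A separator avoids this; a minimal sketch of the same approach with a delimiter:

ca = np.char.array(a)[0,:] + '_' + np.char.array(a)[1,:]
cb = np.char.array(b)[0,:] + '_' + np.char.array(b)[1,:]
np.where(np.in1d(ca, cb))[0]
#array([ 0, 6, 14], dtype=int64)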
EDIT:
You can also manipulate the array dtype in order to transform the a array into one with shape=(18,), where each element contains the data of the two elements of the corresponding column. The same idea can be applied to array b, obtaining shape=(3,). Then you use np.where(np.in1d()) to get the indices:
nrows = a.shape[0]
ta = np.ascontiguousarray(a.T).view(np.dtype((np.void, a.itemsize*nrows))).flatten()
tb = np.ascontiguousarray(b.T).view(np.dtype((np.void, b.itemsize*nrows))).flatten()
np.where(np.in1d(ta, tb))[0]
#array([ 0, 6, 14], dtype=int64)
The idea is similar to the string-based approach.
NumPy has all you need. I assume the arrays are not sorted; if they are, you can improve the following code as you prefer:
import numpy as np
a = np.array([[101, 101, 101, 102, 102, 103, 103, 104, 105, 106, 107, 108, 108, 109, 109, 110, 110, 211],
[102, 103, 105, 104, 106, 109, 224, 109, 110, 110, 108, 109, 110, 211, 212, 211, 212, 213]])
b = np.array([[101, 103, 109],
[102, 224, 212]])
idxs = []
for i in range(np.shape(b)[1]):
    for j in range(np.shape(a)[1]):
        if np.array_equal(b[:, i], a[:, j]):
            idxs.append(j)
print(idxs)
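A broadcasting-based alternative that avoids the Python-level double loop (a sketch; it materialises an n_a x n_b comparison array, so it trades memory for speed):

# compare every column of a against every column of b at once
matches = (a.T[:, None, :] == b.T[None, :, :]).all(axis=-1)  # shape (18, 3)
idxs = np.nonzero(matches.any(axis=1))[0]
print(idxs)  # [ 0  6 14]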