I want to get slices from a numpy array and assign them to a larger array.
The slices should be 64 long and should be taken out evenly from the source array.
I tried the following:
r = np.arange(0,magnitude.shape[0],step)
magnitudes[counter:counter+len(r),ch] = magnitude[r:r+64]
I get the following error when I tried the above code:
TypeError: only integer arrays with one element can be converted to an index
What is the most pythonic way to achieve the slicing?
magnitude[r:r+64] is invalid when r is an array. Slice bounds must be scalars: magnitude[3:67] works, magnitude[[1,2,3]:[5,6,7]] does not.
If you want to collect multiple slices you have to do something like:
In [345]: x=np.arange(10)
In [346]: [x[i:i+3] for i in range(4)]
Out[346]: [array([0, 1, 2]), array([1, 2, 3]), array([2, 3, 4]), array([3, 4, 5])]
In [347]: np.array([x[i:i+3] for i in range(4)])
Out[347]:
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
Other SO questions have explored variations on this, trying to find the fastest, but it's hard to get around some sort of loop or list comprehension.
I'd suggest working with this answer and coming back with a new question, and a small working example, if you think you need more speed.
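If the slices all have the same length, one possible alternative to the list comprehension is to build a 2D index array with broadcasting and use it for fancy indexing. This is only a sketch: magnitude here is a stand-in array, and the step value of 100 is an assumption, since the question does not state it.

```python
import numpy as np

magnitude = np.arange(1000)   # stand-in for the 1-D source array
width = 64                    # slice length from the question
step = 100                    # assumed spacing between slice starts

# Keep only start positions where a full window still fits.
starts = np.arange(0, magnitude.shape[0] - width + 1, step)

# starts[:, None] has shape (n, 1); adding a (width,) range broadcasts
# to an (n, width) grid of indices, one row per slice.
idx = starts[:, None] + np.arange(width)
windows = magnitude[idx]      # shape (10, 64), a copy of the data
```

Each row of windows equals magnitude[s:s+64] for the corresponding start s, so the result can be assigned to a block of a larger array in one step.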
Suppose I have the following array:
import numpy as np
x = np.array([1,2,3,4,5,
1,2,3,4,5,
1,2,3,4,5])
How can I manipulate it to remove the term in equally spaced intervals and adapt the new length for it? For example, I'd like to have:
x = [1,2,3,4,
1,2,3,4,
1,2,3,4]
Where the terms from positions 4, 9, and 14 were excluded (so every 5 terms, one gets excluded). If possible, I'd like to have a code that I could use for an array with length N. Thank you in advance!
In your case, you can simply run the code below after initializing the x array (as you did in your question):
x.reshape(3,5)[:,:4]
Output
array([[1, 2, 3, 4],
[1, 2, 3, 4],
[1, 2, 3, 4]])
If you are interested in getting a vector and not a matrix (such as the output above), you can call the flatten function on the code above:
x.reshape(3,5)[:,:4].flatten()
Output
array([1, 2, 3, 4,
1, 2, 3, 4,
1, 2, 3, 4])
Explanation
Since x is a NumPy array, we can use NumPy built-in functions such as reshape. This function, which has a self-explanatory name, reshapes the array into the desired format. x was a vector of 15 elements, so running x.reshape(3,5) gives us a matrix with 3 rows and 5 columns. [:, :4] then selects the first four columns, and flatten turns the matrix back into a vector.
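The same idea can be written for a general block length N by letting reshape infer the number of rows with -1. This is a sketch under the assumption that len(x) is an exact multiple of N; otherwise reshape raises a ValueError.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5] * 3)
N = 5  # block length; the last element of every block is dropped

# -1 lets NumPy work out the number of rows (len(x) // N).
trimmed = x.reshape(-1, N)[:, :N - 1].ravel()
```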
IIUC, you can use a boolean mask generated with the modulo (%) operator:
N = 5
mask = np.arange(len(x))%N != N-1
x[mask]
output: array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
This works even if the size of your array is not a multiple of N.
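As a quick check of that claim, here is the mask wrapped in a small helper and applied to an array of length 13. The name drop_every_nth is made up for this illustration.

```python
import numpy as np

def drop_every_nth(x, n):
    """Return x without the last element of each block of n (hypothetical helper)."""
    x = np.asarray(x)
    mask = np.arange(x.size) % n != n - 1
    return x[mask]

x = np.array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3])  # length 13
result = drop_every_nth(x, 5)
# result -> array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3])
```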
Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array of shape (3,). If you just want to loop over it, that's fine, but if you want to hstack it with another array of shape 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector of shape (3, 1), so that you can use it in a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
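For reference, a short sketch comparing the shapes produced by the different ways of picking one column. The variable names are only for illustration.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]                  # integer index drops the axis: shape (3,)
col_list = test[:, [0]]              # list index keeps the axis: shape (3, 1), a copy
col_slice = test[:, 0:1]             # slice keeps the axis: shape (3, 1), a view
col_newaxis = test[:, 0][:, np.newaxis]  # add the axis back explicitly: shape (3, 1)
```

The slice form avoids the copy that the list form makes, which may matter for large arrays.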
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in the column with index 1 (the second column) of the array
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it in the form of "row vector" (array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of column 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of column 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect the arr), the number of byte-steps for traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes (with a 4-byte integer dtype such as int32)
arr_col1_copy.strides[0] # 4 bytes (with a 4-byte integer dtype such as int32)
see strides and this answer.
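The stride values depend on the item size, and the default integer dtype differs between platforms. The sketch below fixes the dtype to int32 explicitly (an assumption made here so the numbers are reproducible) and checks the view/copy behaviour.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)  # 4-byte items

view = arr[:, 1]          # shares memory with arr
copy = arr[:, 1].copy()   # owns its own contiguous buffer

view_stride = view.strides[0]   # 8: jump over a full row of two int32 values
copy_stride = copy.strides[0]   # 4: neighbouring elements are adjacent

view[0] = 99              # writing through the view changes arr as well
```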
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of column 1, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different stride lengths mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, this is not always true, because making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and are OK with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
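A small sketch of the stride bookkeeping, using a 100 x 50 int32 array (sizes chosen only to keep the numbers easy to read):

```python
import numpy as np

A = np.zeros((100, 50), dtype=np.int32)   # C order, 4-byte items

c_strides = A.strides            # (200, 4): one row spans 50 * 4 bytes
t_strides = A.T.strides          # (4, 200): same buffer, axes swapped
same_walk = A[:, 1].strides == A.T[1, :].strides   # both are (200,)

F = np.asfortranarray(A)         # column-major copy of A
f_col_strides = F[:, 1].strides  # (4,): column elements are now adjacent
```

The transpose never moves data; only np.asfortranarray (or a copy) changes the memory layout.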
To get several independent columns, just use:
>>> test[:,[0,2]]
You will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is just a 2-dimensional array, so you can access the columns you want with a column slice:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b
Suppose I have
[[array([x1, y1]), z1]
[array([x2, y2]), z2]
......
[array([xn, yn]), zn]
]
And I want to find the index of array([x5, y5]). How can I find it efficiently using NumPy?
To start off, owing to the mixed data format, I don't think you can extract the arrays in a vectorized manner. Thus, you can use a list comprehension to extract the first element (the array) from each list element and stack them into a 2D array. So, let's say A is the input list, we would have -
arr = np.vstack([a[0] for a in A])
Then, simply do the comparison in a vectorized fashion using NumPy's broadcasting feature, which broadcasts the comparison along all the rows, and look for fully matching rows with .all(axis=1). Finally, use np.flatnonzero to get the final indices. Thus, the final piece of the puzzle would be -
idx = np.flatnonzero((arr == search1D).all(1))
You can read up on the answers to this post to see other alternatives to get indices in such a 1D array searching in 2D array problem.
Sample run -
In [140]: A
Out[140]:
[[array([3, 4]), 11],
[array([2, 1]), 12],
[array([4, 2]), 16],
[array([2, 1]), 21]]
In [141]: search1D = [2,1]
In [142]: arr = np.vstack([a[0] for a in A]) # Extract 2D array
In [143]: arr
Out[143]:
array([[3, 4],
[2, 1],
[4, 2],
[2, 1]])
In [144]: np.flatnonzero((arr == search1D).all(1)) # Finally get indices
Out[144]: array([1, 3])
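The two steps can be wrapped in a small function. The name find_rows is made up for this sketch, and it assumes every inner array has the same length as the search target.

```python
import numpy as np

def find_rows(A, target):
    """Indices of entries in A whose leading array equals target (hypothetical helper)."""
    arr = np.vstack([a[0] for a in A])          # (n, k) array of the inner arrays
    return np.flatnonzero((arr == np.asarray(target)).all(axis=1))

A = [[np.array([3, 4]), 11],
     [np.array([2, 1]), 12],
     [np.array([4, 2]), 16],
     [np.array([2, 1]), 21]]

hits = find_rows(A, [2, 1])     # array([1, 3])
misses = find_rows(A, [9, 9])   # empty array
```

For floating-point data, exact equality may be too strict; replacing the == comparison with np.isclose is one option.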
I tried to use numpy.apply_along_axis, but this seems to work only when the applied function collapses the dimension and not when it expands it.
Example:
def dup(x):
    return np.array([x, x])
a = np.array([1,2,3])
np.apply_along_axis(dup, axis=0, arr=a) # This doesn't work
I was expecting the matrix below (notice how its dimension has expanded from the input matrix a):
np.array([[1, 1], [2, 2], [3, 3]])
In R, this would be accomplished by the **ply set of functions from the plyr package. How to do it with numpy?
If you just want to repeat the elements you can use np.repeat :
>>> np.repeat(a,2).reshape(3,2)
array([[1, 1],
[2, 2],
[3, 3]])
To apply an arbitrary function, use np.frompyfunc, and to convert the resulting object array into a regular 2D array, use np.vstack:
>>> def dup(x):
...     return np.array([x, x])
>>> oct_array = np.frompyfunc(dup, 1, 1)
>>> oct_array(a)
array([array([1, 1]), array([2, 2]), array([3, 3])], dtype=object)
>>> np.vstack(oct_array(a))
array([[1, 1],
[2, 2],
[3, 3]])
For someone used to general Python code, a list comprehension may be the simplest approach:
In [20]: np.array([dup(x) for x in a])
Out[20]:
array([[1, 1],
[2, 2],
[3, 3]])
The comprehension (a loop or mapping that applies dup to each element of a) returns [array([1, 1]), array([2, 2]), array([3, 3])], which is easily turned into a 2d array with np.array().
At least for this small a, it is also faster than the np.frompyfunc approach. The np.frompyfunc function will give full access to broadcasting, but evidently it doesn't apply any fast iteration tricks.
apply_along_axis can help keep indices straight when dealing with many dimensions, but it is still just an iteration method. It's written in Python, so you can study its code yourself. It is much more complicated than needed for this simple case.
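A sketch of two variations on the list-comprehension approach: np.stack makes the stacking axis explicit, and for this particular dup the loop can be avoided altogether by stacking a with itself along a new last axis.

```python
import numpy as np

def dup(x):
    return np.array([x, x])

a = np.array([1, 2, 3])

# General case: apply dup per element, then stack the results as rows.
looped = np.stack([dup(x) for x in a])      # shape (3, 2)

# Special case for this dup: no Python-level loop needed.
vectorized = np.stack([a, a], axis=1)       # shape (3, 2)
```

The second form only works because dup has a simple array equivalent; an arbitrary Python function still needs the loop.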
In order for your example to work as expected, a should be 2-dimensional:
def dup(x):
    # x is now an array of size 1
    return np.array([ x[0], x[0] ])
a = np.array([[1,2,3]]) # 2dim
np.apply_along_axis(dup, axis=0, arr=a)
=>
array([[1, 2, 3],
[1, 2, 3]])
Of course, you probably want to transpose the result.
Is there any built-in numpy function that would get:
a=np.asarray([[[1,2],[3,4]],[[1,2],[3,4]]])
And would return:
b=[[1,2],[3,4],[1,2],[3,4]]
? Something like one-layer flattening.
P.S. I am looking for a vectorized option otherwise this dumb code is available:
def flat1D(a):
    b = np.array([])
    for item in a:
        b = np.append(b, item)
    return b
You can simply reshape the array.
>>> a.reshape(-1,a.shape[-1])
array([[1, 2],
[3, 4],
[1, 2],
[3, 4]])
The code shown in the question actually returns a 1D array. To get that instead:
>>> a.ravel()
array([1, 2, 3, 4, 1, 2, 3, 4])
Or, if you are sure you want to copy the array:
>>> a.flatten()
array([1, 2, 3, 4, 1, 2, 3, 4])
The difference between ravel and flatten primarily comes from the fact that flatten will always return a copy and ravel will return a view if possible and a copy if not.
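The view/copy difference can be checked with np.shares_memory. A brief sketch:

```python
import numpy as np

a = np.asarray([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])

r = a.ravel()      # a is contiguous, so this is a view
f = a.flatten()    # always a copy
t = a.T.ravel()    # a.T is not C-contiguous, so ravel has to copy here

r_is_view = np.shares_memory(a, r)   # True
f_is_view = np.shares_memory(a, f)   # False
t_is_view = np.shares_memory(a, t)   # False
```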
If you know the dimensions of the new array, you can specify them as a tuple, e.g. (4,2), and use .reshape():
a.reshape((4,2))
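When the sizes are not known in advance, -1 lets reshape infer the merged dimension. A sketch with a made-up 2 x 3 x 4 array, merging the first two axes:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Merge all leading axes into one and keep the last axis as it is.
b = a.reshape(-1, a.shape[-1])      # shape (6, 4)

is_view = np.shares_memory(a, b)    # True for a contiguous input
```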