Duplicate numpy array rows where the number of duplications varies - python

Given a numpy array in which each value should be duplicated a specified number of times:
np.array([1,2,3,4])
and a second array defining the number of duplications desired for each corresponding index position in the original array:
np.array([3,3,2,2])
How does one produce:
[1,1,1,2,2,2,3,3,4,4]
Obviously, it is possible to use iteration to produce the new array, but I'm curious if there is a more elegant numpy-based solution.

Use numpy.repeat:
>>> numpy.repeat([1,2,3,4], [3,3,2,2])
array([1, 1, 1, 2, 2, 2, 3, 3, 4, 4])
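Because the title asks about duplicating rows, here is a minimal sketch showing the same call on a 2D array with axis=0, which repeats whole rows instead of individual values. The arrays rows and row_counts are made-up illustration data.

```python
import numpy as np

# 1D case from the question: repeat each value by its matching count
values = np.array([1, 2, 3, 4])
counts = np.array([3, 3, 2, 2])
flat = np.repeat(values, counts)
print(flat.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]

# 2D case (hypothetical data): axis=0 repeats entire rows
rows = np.array([[1, 10], [2, 20], [3, 30]])
row_counts = np.array([2, 1, 3])
dup_rows = np.repeat(rows, row_counts, axis=0)
print(dup_rows.tolist())
# [[1, 10], [1, 10], [2, 20], [3, 30], [3, 30], [3, 30]]
```

Without axis, a 2D input is flattened before repeating, so the counts would then apply to individual elements rather than rows.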

Flipping variable length subvectors inside a numpy array efficiently

I have a problem that requires me to re-order elements in subvectors within a long vector in a specific way such that the first element of the subvector remains in place, and the remaining elements are flipped.
For example, if vector = [0, 1, 2, 3, 4, 5, 6, 7] and the subvectors have lengths 3 and 5, then the flipped version would be:
vector = [0, 2, 1, 3, 7, 6, 5, 4]
A naive way of doing this would be:
import numpy as np
vector = np.array([0, 1, 2, 3, 4, 5, 6, 7])  # the vector to flip
subVecStartIdxs = [0, 3]  # start position of each subvector
# append the end of the vector so the last subvector also has an explicit stop index
bounds = subVecStartIdxs + [len(vector)]
for jj in range(len(subVecStartIdxs)):
    start, stop = bounds[jj] + 1, bounds[jj + 1]
    # keep the first element in place and reverse the rest
    vector[start:stop] = np.flipud(vector[start:stop])
Can you think of a faster way to do this? I cannot find a way to vectorise this. The speed is OK for small vectors, but for vectors with a million or more elements it becomes very slow.
My solution in the end was to determine the unique lengths of the subvectors and, for each length, create a 2D array grouping those subvectors. Each 2D array has nSubVectors rows and holds zeros where a subvector's length differs from the current length.
From there, the entire 2D array can be flipped from left to right, excluding the first column, without a Python loop. Creating the reversed slice (e.g. arr[:, :0:-1]) is O(1) in numpy because it is only a view; copying the flipped values back is still O(n), but it runs in compiled code.
Then, we just loop over each unique subvector length. In my case, this is very efficient because there are only ~10 of these, but there are millions of subvectors.
There is a bit of data management in rearranging this all to the original data structure, but it's really just some admin.
This results in a > 100x speedup from the naive loop presented in my original question.
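The grouping idea described above can be sketched roughly as follows. This is not the author's code: flip_subvectors is a hypothetical helper that fancy-indexes the rows of each length group instead of zero-padding, and it assumes starts is sorted and begins at 0. A second hypothetical helper, flip_subvectors_loopfree, swaps in a different technique that avoids the Python loop entirely by computing a source index for every position with np.repeat.

```python
import numpy as np


def flip_subvectors(vector, starts):
    """Keep the first element of every subvector and reverse the rest.

    Hypothetical helper sketching the group-by-length idea: subvectors that
    share a length are handled together as rows of a 2D index array.
    Assumes `starts` is sorted and starts[0] == 0.
    """
    vector = np.asarray(vector)
    starts = np.asarray(starts)
    lengths = np.diff(np.append(starts, len(vector)))
    out = vector.copy()
    # Only a handful of distinct lengths, so this loop is short
    for length in np.unique(lengths):
        group_starts = starts[lengths == length]
        # Row i holds the positions of one subvector of this length
        idx = group_starts[:, None] + np.arange(length)
        # Columns 1..length-1 receive columns length-1..1 (first column untouched)
        out[idx[:, 1:]] = vector[idx[:, :0:-1]]
    return out


def flip_subvectors_loopfree(vector, starts):
    """Same result without a Python loop (a different technique, not the author's).

    Assumes `starts` is sorted and starts[0] == 0.
    """
    vector = np.asarray(vector)
    starts = np.asarray(starts)
    lengths = np.diff(np.append(starts, len(vector)))
    # seg[p] is the number of the subvector that position p belongs to
    seg = np.repeat(np.arange(len(starts)), lengths)
    s = starts[seg]           # start of the subvector containing p
    e = s + lengths[seg]      # exclusive end of that subvector
    p = np.arange(len(vector))
    # The first element stays put; any later position p reads from s + e - p
    src = np.where(p == s, p, s + e - p)
    return vector[src]


print(flip_subvectors([0, 1, 2, 3, 4, 5, 6, 7], [0, 3]).tolist())
# [0, 2, 1, 3, 7, 6, 5, 4]
```

The loop-free version trades the per-length loop for a few temporary index arrays the size of the input, so its memory use grows with the vector length.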

Insert elements from a set to an array (numpy)

I have a set (S) of numbers and I want to put these numbers in an array (Arr). I tried this code:
Arr = np.array(S)
but I can't access the array's elements. For example, if I try
Arr[0]
I get this error:
IndexError: too many indices for array
Can anyone explain what the problem with this approach is, and is there another way to put the elements of the set in an array and access them?
Thanks
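You first need to convert your set of numbers to a list. np.array does not treat a set as a sequence, so np.array(S) returns a zero-dimensional object array that holds the whole set as a single element; indexing it with Arr[0] therefore fails.
>>> S = {1, 2, 3}
>>> np.array(S)
array({1, 2, 3}, dtype=object)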
>>> np.array(list(S))
array([1, 2, 3])
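A minimal sketch, using a made-up set, that confirms the zero-dimensional result and shows two ways to get a usable 1D array. Sets have no defined order, so sorting first is an assumption about the order you want; np.fromiter keeps whatever order the set happens to iterate in.

```python
import numpy as np

S = {3, 1, 2}  # hypothetical set

# np.array wraps the set itself: a 0-d array with dtype=object
wrapped = np.array(S)
print(wrapped.ndim)  # 0

# Sort into a list first if a predictable element order matters
arr = np.array(sorted(S))
print(arr[0])  # 1

# np.fromiter consumes any iterable directly (order follows set iteration)
arr2 = np.fromiter(S, dtype=int, count=len(S))
```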

Custom slicing in numpy arrays (get specific elements, then every n-th) possible?

I'm in need of a more customized way to extract given elements from a numpy array than the general indexing seems to allow me. In particular, I want to get a number of arbitrary, predefined elements, then every n-th, starting at a given point.
Say, e.g., I want the second (as in index number 2) and fourth (index 4) elements of an array, and then every third element, beginning from the sixth one (index 6). So far, I'm doing:
newArray = np.concatenate((myArray[[2, 4]], myArray[6::3]))
Is there a more convenient way to achieve this?
It's effectively identical to what you're doing, but you might find it a bit more convenient to do:
new_array = my_array[np.r_[2, 4, 6:len(my_array):3]]
np.r_ is basically concatenation + arange-like slicing.
For example:
In [1]: import numpy as np
In [2]: np.r_[np.arange(5), np.arange(1, 4)]
Out[2]: array([0, 1, 2, 3, 4, 1, 2, 3])
In [3]: np.r_[1, 2, :5]
Out[3]: array([1, 2, 0, 1, 2, 3, 4])
In [4]: np.r_[:5]
Out[4]: array([0, 1, 2, 3, 4])
The downside to this approach is that you're building up a (potentially very large) additional indexing array. In either case you wind up creating a copy, but if my_array is very large, your original approach is more efficient because it avoids that extra index array.
np.r_ is a bit unreadable (meant for interactive use), but it can be a very handy way of building up arbitrary indexing arrays.
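A quick check, on a made-up my_array, that the np.r_ index and the concatenation approach select the same elements. The concatenation version needs a list [2, 4] (not a tuple) as the index, and np.concatenate takes its arrays as a single tuple.

```python
import numpy as np

my_array = np.arange(20) * 10  # hypothetical data: 0, 10, ..., 190

# original approach: fancy-index two elements, slice the rest, then join
via_concat = np.concatenate((my_array[[2, 4]], my_array[6::3]))

# np.r_ builds one index array: [2, 4, 6, 9, 12, 15, 18]
idx = np.r_[2, 4, 6:len(my_array):3]
via_r_ = my_array[idx]

print(via_r_.tolist())  # [20, 40, 60, 90, 120, 150, 180]
```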

How do I remove the first and last rows and columns from a 2D numpy array?

I'd like to know how to remove the first and last rows and columns from a 2D array in numpy. For example, say we have a (N+1) x (N+1) matrix called H then in MATLAB/Octave, the code I'd use would be:
Hsub = H(2:N,2:N);
What's the equivalent code in Numpy? I thought that np.reshape might do what I want but I'm not sure how to get it to remove just the target rows as I think if I reshape to a (N-1) x (N-1) matrix, it'll remove the last two rows and columns.
How about this?
Hsub = H[1:-1, 1:-1]
The 1:-1 range means that we start at index 1 (the second element) and stop just before index -1, so the last element we keep is the second-to-last one. We do this for each dimension independently, and the result is the intersection of the two selections, which chops off the first row, first column, last row and last column.
Remember, the ending index is exclusive, so if we did 0:3 for example, we only get the first three elements of a dimension, not four.
Also, negative indices count from the end of the array: -1 refers to the last element in a particular dimension, but because the stop index is exclusive, the slice goes up to the second-to-last element, not the last one. Essentially, this is the same as doing:
Hsub = H[1:H.shape[0]-1, 1:H.shape[1]-1]
... but using negative indices is much more elegant. You also don't need to know the number of rows and columns to extract what you need; the above syntax works for any matrix size. However, the matrix should be at least 3 x 3: for a smaller one, slicing does not raise an error but simply returns an empty array.
Small bonus
In MATLAB / Octave, you can achieve the same thing without using the dimensions by:
Hsub = H(2:end-1, 2:end-1);
The end keyword, when used inside an index, refers to the last element of that dimension.
Example use
Here's an example (using IPython):
In [1]: import numpy as np
In [2]: H = np.meshgrid(np.arange(5), np.arange(5))[0]
In [3]: H
Out[3]:
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
In [4]: Hsub = H[1:-1,1:-1]
In [5]: Hsub
Out[5]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])
As you can see, the first row, first column, last row and last column have been removed from the source matrix H and the remainder has been placed in the output matrix Hsub.
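A short sketch with a made-up H showing two practical details of this slicing: the result is a view that shares memory with H (so call .copy() if you need an independent matrix), and what happens with a matrix smaller than 3 x 3.

```python
import numpy as np

H = np.arange(25).reshape(5, 5)  # hypothetical 5 x 5 matrix

Hsub = H[1:-1, 1:-1]  # drop first/last row and column
print(Hsub.shape)     # (3, 3)

# Basic slicing returns a view: writing to Hsub also changes H
print(np.shares_memory(H, Hsub))  # True

# Take an explicit copy when the trimmed matrix must be independent
Hsub_copy = H[1:-1, 1:-1].copy()

# With a 2 x 2 input there is nothing left between the borders
small = np.ones((2, 2))
trimmed = small[1:-1, 1:-1]
print(trimmed.shape)  # (0, 0): an empty array, not an exception
```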

Finding the minimum value in a numpy array and the corresponding values for the rest of that array's row

Consider the following NumPy array:
a = np.array([[1,4], [2,1],(3,10),(4,8)])
This gives an array that looks like the following:
array([[ 1,  4],
       [ 2,  1],
       [ 3, 10],
       [ 4,  8]])
What I'm trying to do is find the minimum value of the second column (which in this case is 1), and then report the other value of that pair (in this case 2). I've tried using something like argmin, but that gets tripped up by the 1 in the first column.
Is there a way to do this easily? I've also considered sorting the array, but I can't seem to get that to work in a way that keeps the pairs together. The data is being generated by a loop like the following, so if there's an easier way to do this that isn't a numpy array, I'd take that as an answer too:
results = np.zeros((100,2))
# Loop over search range, change kappa each time
for i in range(100):
    results[i,0] = function1(x)
    results[i,1] = function2(y)
How about
a[np.argmin(a[:, 1]), 0]
Break-down
a. Grab the second column
>>> a[:, 1]
array([ 4, 1, 10, 8])
b. Get the index of the minimum element in the second column
>>> np.argmin(a[:, 1])
1
c. Index a with that to get the corresponding row
>>> a[np.argmin(a[:, 1])]
array([2, 1])
d. And take the first element
>>> a[np.argmin(a[:, 1]), 0]
2
Using np.argmin is probably the best way to tackle this. To do it in pure Python, you could use the following, which returns the whole row (as a tuple) whose second value is smallest:
min(tuple(r[::-1]) for r in a)[::-1]
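For comparison, a minimal sketch using the array from the question: the numpy route with np.argmin (which returns the first index on ties), and a pure-Python alternative that swaps in min with a key function instead of reversing tuples. The rows list is just the question's data as plain tuples.

```python
import numpy as np

a = np.array([[1, 4], [2, 1], [3, 10], [4, 8]])

# numpy: index of the smallest value in column 1, then read column 0 of that row
idx = np.argmin(a[:, 1])
partner = a[idx, 0]
print(partner)  # 2

# pure Python: pick the row with the smallest second value via a key function
rows = [(1, 4), (2, 1), (3, 10), (4, 8)]
best = min(rows, key=lambda r: r[1])
print(best)     # (2, 1)
print(best[0])  # 2
```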
