I have an array of 2d indices.
indices = [[2,4], [6,77], [102,554]]
Now, I have a different 4-dimensional array, arr, and for each index pair in indices I want to extract the corresponding sub-array (it is a sub-array, since arr is 4-dimensional). It is equivalent to the following code.
for i in range(len(indices)):
    output[i] = arr[indices[i][0], indices[i][1]]
However, I realized that using an explicit for-loop is slow. Is there any built-in numpy API that I can use? So far I have tried np.choose, np.put, and np.take, but none of them gave me what I wanted. Thank you!
We need to index into the first two axes with the two columns from indices (thinking of it as an array).
Thus, simply convert to array and index, like so -
indices_arr = np.array(indices)
out = arr[indices_arr[:,0], indices_arr[:,1]]
Or we could extract those directly without converting to array and then index -
d0,d1 = [i[0] for i in indices], [i[1] for i in indices]
out = arr[d0,d1]
Another way to extract the elements would be with conversion to tuple, like so -
out = arr[tuple(indices_arr.T)]
If indices is already an array, skip the conversion process and use indices in places where we had indices_arr.
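As a quick check, the sketch below compares the indexing forms above against the explicit loop from the question. The small array shape and the index pairs are made up for illustration.

```python
import numpy as np

# Hypothetical data: a small 4-D array and a list of (i, j) index pairs.
arr = np.arange(3 * 4 * 2 * 2).reshape(3, 4, 2, 2)
indices = [[0, 1], [2, 3], [1, 0]]

# Reference result, built the slow way as in the question.
expected = np.stack([arr[i, j] for i, j in indices])

indices_arr = np.array(indices)
out_cols = arr[indices_arr[:, 0], indices_arr[:, 1]]  # index with the two columns
out_tuple = arr[tuple(indices_arr.T)]                 # index with a tuple of arrays

print(out_cols.shape)  # (3, 2, 2)
print(np.array_equal(out_cols, expected), np.array_equal(out_tuple, expected))
```

Each index pair selects one sub-array of shape arr.shape[2:], so the result has one leading axis of length len(indices).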
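Try using numpy's take function. Note that np.take(arr, indices) without an axis argument indexes into the flattened array, so the index pairs first have to be converted to flat indices over the first two axes. Your code should be something like:
flat = np.ravel_multi_index(tuple(np.array(indices).T), arr.shape[:2])
outputarray = np.take(arr.reshape(-1, *arr.shape[2:]), flat, axis=0)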
I have a dictionary of 2D arrays and I would like to normalize each row of each 2D array by its mean.
I have:
for key, value in sorted(baseline.items()):
    for i in baseline[str(key)]:
        i = i / np.mean(i)
Where:
baseline is a dict
baseline[str(key)] is a 2D numpy array
i is a 1D array
print(i) results in the appropriately updated values, however the individual rows across baseline.items() do not get updated.
What am I missing?
First of all, here is a solution:
for i in baseline.values():
    i /= i.mean(axis=1, keepdims=True)
Now as to why. The loop for i in baseline[key]: binds a view into the row of a 2D array to the name i at each iteration. You don't need str(key) because the outer loop ensures that the keys are correct. In fact, avoid transforming the keys unnecessarily to avoid surprises, like if you accidentally get an integer key.
The line i = i / np.mean(i) does not do in-place division of the array by its mean. It computes the array i / np.mean(i), then rebinds the name i to the new array. The new array is then discarded on the next iteration.
You can fix this by re-assigning into the slice that i represents:
i[:] = i / np.mean(i)
Alternatively, you can perform the division in-place using the correct operator:
i /= np.mean(i)
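The difference between rebinding the name and writing into the row can be seen in a small sketch. The array values here are made up, and the loop variable is called row instead of i for readability.

```python
import numpy as np

a = np.array([[1.0, 3.0], [2.0, 6.0]])

# Rebinding: a new array is created each time and then thrown away.
rebound = a.copy()
for row in rebound:
    row = row / np.mean(row)

# Slice assignment: writes the result back into the row view.
sliced = a.copy()
for row in sliced:
    row[:] = row / np.mean(row)

# In-place operator: divides the row view directly.
inplace = a.copy()
for row in inplace:
    row /= np.mean(row)

print(np.array_equal(rebound, a))  # True, nothing changed
print(sliced)                      # rows are now [0.5, 1.5]
```

Note that the in-place forms assume a floating-point array. On an integer array, /= raises a casting error because the result of true division cannot be stored back as integers.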
As you can see in my solution, there is no need to iterate over the rows at all. np.mean is a vectorized function that can operate along any axis of an array. Setting keepdims=True keeps the reduced axis with length 1, so the result has the right shape to be broadcast back over the original when you divide.
A less flexible alternative to i.mean(axis=1, keepdims=True) specific for 2D arrays is
i.mean(axis=1)[:, None]
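A minimal sketch of the shapes involved, using a made-up baseline dict with float arrays:

```python
import numpy as np

m = np.array([[1.0, 3.0], [2.0, 6.0]])
print(m.mean(axis=1).shape)                 # (2,)   one mean per row
print(m.mean(axis=1, keepdims=True).shape)  # (2, 1) broadcasts across columns
print(m.mean(axis=1)[:, None].shape)        # (2, 1) same shape, 2-D input only

# Hypothetical dict of 2-D arrays, normalized row by row without an inner loop.
baseline = {
    "a": np.array([[1.0, 3.0], [2.0, 6.0]]),
    "b": np.array([[4.0, 4.0, 4.0]]),
}
for value in baseline.values():
    value /= value.mean(axis=1, keepdims=True)

print(baseline["a"])  # rows are now [0.5, 1.5]
```

After the loop every row has mean 1, and the arrays stored in the dict are the ones that were modified.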
I have a little question about python Numpy. What I want to do is the following:
having two numpy arrays arr1 = [1,2,3] and arr2 = [3,4,5] I would like to obtain a new array arr3 = [[1,2,3],[3,4,5]], but in an iterative way. For a single instance, this is just obtained by typing arr3 = np.array([arr1,arr2]).
What I have instead are several arrays, e.g. [4,3,1 ..], [4,3,5, ...],[1,2,1,...], and I would like to end up with [[4,3,1 ..], [4,3,5, ...],[1,2,1,...]], potentially using a for loop. How should I do this?
EDIT:
Ok I'm trying to add more details to the overall problem. First, I have a list of strings list_strings=['A', 'B','C', 'D', ...]. I'm using a specific method to obtain informative numbers out of a single string, so for example I have method(list_strings[0]) = [1,2,3,...], and I can do this for each single string I have in the initial list.
What I would like to come up with is a for loop that collects the numbers extracted from each string in turn, in the way I've described at the beginning, i.e. a single array containing all the numeric sub-arrays extracted from each string. I hope this makes more sense now, and sorry if I haven't explained it correctly; I'm really new to programming and trying to figure things out.
Well if your strings are in a list, we want to put the arrays that result from calling method in a list as well. Python's list comprehension is a great way to achieve that.
list_strings = ['A', ...]
list_of_converted_strings = [method(item) for item in list_strings]
arr = np.array(list_of_converted_strings)
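To make this concrete, here is a sketch with a stand-in for method. The real method from the question is not shown, so the function below is purely hypothetical and only needs to return the same number of values for every string.

```python
import numpy as np

def method(s):
    # Hypothetical stand-in: derive three numbers from the first character.
    code = ord(s[0])
    return [code, code + 1, code + 2]

list_strings = ["A", "B", "C"]
list_of_converted_strings = [method(item) for item in list_strings]
arr = np.array(list_of_converted_strings)

print(arr.shape)  # (3, 3): one row per string
print(arr[0])     # [65 66 67]
```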
Numpy arrays have a fixed, rectangular shape: for example, a 2D numpy array of shape n x m has n rows and m columns. If you want to convert a list of lists into a 2D numpy array, all the sublists in the main list should be of the same length. If the sublists are of varying size, you do not get a regular 2D array.
For example, the code below does not produce a 2D array (recent numpy versions raise an error, older ones build an object array of lists)
np.array([[1], [3,4]])
so if all the sublists are the same size, you can use
np.array([method(x) for x in strings])
I have a function predictions like
def predictions(degree):
    ...  # some magic
    # returns an np.ndarray([0..100])
I want to call this function for a few values of degree and use it to populate a larger np.ndarray (n=2), filling each row with the outcome of the function predictions. It seems like a simple task, but somehow I can't get it working. I tried with
for deg in [1,2,4,8,10]:
    np.append(result, predictions(deg),axis=1)
with result being an np.empty(100). But that failed with Singleton array array(1) cannot be considered a valid collection.
I could not get fromfunction to work either: it only works on a coordinate tuple, and an irregular list of degrees is not covered in the docs.
Don't use np.ndarray until you are older and wiser! I couldn't even use it without rereading the docs.
arr1d = np.array([1,2,3,4,5])
is the correct way to construct a 1d array from a list of numbers.
Also don't use np.append. I won't even add the 'older and wiser' qualification. It doesn't work in-place; and is slow when used in a loop.
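A short sketch of why the loop in the question leaves result unchanged: np.append returns a new array and does not touch its argument.

```python
import numpy as np

result = np.empty(0)
returned = np.append(result, [1.0, 2.0, 3.0])

print(result.size)    # 0, the original array is untouched
print(returned.size)  # 3, the data lives in the returned copy
```

Because every call copies all the existing data into a new array, calling it repeatedly in a loop gets slower as the array grows.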
A good way of building a 2d array from 1d arrays is:
alist = []
for i in ....:
    alist.append(<a list or 1d array>)
arr = np.array(alist)
provided all the sublists have the same size, arr should be a 2d array.
This is equivalent to building a 2d array from
np.array([[1,2,3], [4,5,6]])
that is a list of lists.
Or a list comprehension:
np.array([predictions(i) for i in range(10)])
Again, predictions must all return the same length arrays or lists.
np.append is in the boring section of numpy. Here you know the shape in advance, so you can preallocate the result and fill it row by row:
len_predictions = 100
def predictions(degree):
    return np.ones((len_predictions,))
degrees = [1,2,4,8,10]
result = np.empty((len(degrees), len_predictions))
for i, deg in enumerate(degrees):
    result[i] = predictions(deg)
If you want to store the degree alongside each row, you can use custom dtypes.
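One possible way to keep the degree next to its predictions is a structured dtype. This is only a sketch: the field names degree and values are made up, and predictions is a placeholder that returns the degree repeated 100 times.

```python
import numpy as np

len_predictions = 100

def predictions(degree):
    # Placeholder: a real model would go here.
    return np.ones((len_predictions,)) * degree

degrees = [1, 2, 4, 8, 10]

# One record per degree: an integer field plus a fixed-length float sub-array.
record_type = np.dtype([
    ("degree", np.int64),
    ("values", np.float64, (len_predictions,)),
])
result = np.empty(len(degrees), dtype=record_type)

for i, deg in enumerate(degrees):
    result["degree"][i] = deg
    result["values"][i] = predictions(deg)

print(result["values"].shape)  # (5, 100)
print(result["degree"])        # [ 1  2  4  8 10]
```

Accessing result["values"] gives the plain 2d array of predictions, so the rest of the code can ignore the extra field when it is not needed.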
Let's say I have a function (called numpyarrayfunction) that outputs an array every time I run it. I would like to run the function multiple times and store the resulting arrays. Obviously, the current method that I am using to do this -
numpyarray = np.zeros((5))
for i in range(5):
    numpyarray[i] = numpyarrayfunction
generates an error message since I am trying to store an array within an array.
Eventually, what I would like to do is to take the average of the numbers that are in the arrays, and then take the average of these averages. But for the moment, it would be useful to just know how to store the arrays!
Thank you for your help!
As comments and other answers have already laid out, a good way to do this is to store the arrays being returned by numpyarrayfunction in a normal Python list.
If you want everything to be in a single numpy array (for, say, memory efficiency or computation speed), and the arrays returned by numpyarrayfunction are of a fixed length n, you could make numpyarray multidimensional:
numpyarray = np.empty((5, n))
for i in range(5):
    numpyarray[i, :] = numpyarrayfunction()
Then you could do np.average(numpyarray, axis = 1) to average over the second axis, which would give you back a one-dimensional array with the average of each array you got from numpyarrayfunction. np.average(numpyarray) would be the average over all the elements, or np.average(np.average(numpyarray, axis = 1)) if you really want the average value of the averages.
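A small sketch of those averages, with a hypothetical numpyarrayfunction that returns n random numbers (the real function is not shown in the question):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)

def numpyarrayfunction():
    # Hypothetical stand-in returning a fixed-length array.
    return rng.random(n)

numpyarray = np.empty((5, n))
for i in range(5):
    numpyarray[i, :] = numpyarrayfunction()

row_averages = np.average(numpyarray, axis=1)  # one average per stored array
overall = np.average(numpyarray)               # average over every element

print(row_averages.shape)                            # (5,)
print(np.isclose(np.average(row_averages), overall)) # True
```

The average of the averages matches the overall average here only because every row has the same length n.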
More on numpy array indexing.
I initially misread what was going on inside the for loop there. The reason you're getting an error is that each element of a 1-D numeric numpy array holds a single number, and numpyarrayfunction is returning something that is not a single number (from the name, probably another numpy array). If that function already returns a full numpy array, then you can do something more like this:
arrays = []
for i in range(5):
    arrays.append(numpyarrayfunction(args))
Then, you can take the average like so:
avgarray = np.zeros((len(arrays[0])))
for array in arrays:
    avgarray += array
avgarray = avgarray/len(arrays)
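Assuming all the arrays in the list have the same length, the accumulation loop can also be written as a single np.mean call along the first axis. The sample arrays below are made up.

```python
import numpy as np

arrays = [
    np.array([1.0, 2.0, 3.0]),
    np.array([3.0, 4.0, 5.0]),
    np.array([5.0, 6.0, 7.0]),
]

# Loop version, as above.
avgarray = np.zeros(len(arrays[0]))
for array in arrays:
    avgarray += array
avgarray = avgarray / len(arrays)

# Vectorized version: the list is stacked into a (3, 3) array first.
avg_vectorized = np.mean(arrays, axis=0)

print(avgarray)        # [3. 4. 5.]
print(avg_vectorized)  # [3. 4. 5.]
```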
I'm confused by what the results of numpy.where mean, and how to use it to index into an array.
Have a look at the code sample below:
import numpy as np
a = np.random.randn(20,20,2)
indices = np.where(a[:,:,0] > 0.5)
I expect the indices array to be 2-dim and contain the indices where the condition is true. We can see that by
indices = np.array(indices)
indices.shape # (2,120)
So it looks like indices is acting on the flattened array of some sort, but I'm not able to figure out exactly how. More confusingly,
a.shape # (20,20,2)
a[indices].shape # (2,120,20,2)
Question:
How does indexing my array with the output of np.where actually grow the size of the array? What is going on here?
You are basing your indexing on a wrong assumption: np.where returns something that can be used immediately for advanced indexing (it's a tuple of np.ndarrays, one per dimension). But you convert it to a numpy array (so it's now a single 2-dimensional integer np.ndarray).
So
import numpy as np
a = np.random.randn(10,10,2)
indices = np.where(a[:,:,0] > 0.5)
a[:,:,0][indices]
# If you do a[indices] the result would be different, I'm not sure what
# you intended.
gives you the elements that are found by np.where. If you convert indices to a np.array, it triggers another form of indexing (see this section of the numpy docs), and the warning message in the docs becomes very important: the whole integer array is treated as one index into the first axis, so every entry in it selects an entire 2-D slice a[k]. That's the reason why it increases the total size of your array.
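The shape difference can be reproduced with a small deterministic array. This is a sketch: the 4x4x2 array and the threshold are chosen only so that the shapes are easy to follow.

```python
import numpy as np

a = np.arange(4 * 4 * 2).reshape(4, 4, 2)
indices = np.where(a[:, :, 0] > 10)  # tuple of two index arrays, 10 matches

as_tuple = a[indices]            # pairs up (row, col) for each match
as_array = a[np.array(indices)]  # every integer indexes axis 0 on its own

print(a[:, :, 0][indices].shape)  # (10,)
print(as_tuple.shape)             # (10, 2)
print(as_array.shape)             # (2, 10, 4, 2)
```

With the tuple, the two arrays index the first two axes together. With the converted array, its shape (2, 10) is prepended to the shape of one a[k] slice, which is (4, 2).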
Some additional information about what np.where returns: you get a tuple containing n arrays, where n is the number of dimensions of the input array. The coordinates of each match are spread across those arrays, so the first element that satisfies the condition has the coordinates indices[0][0], indices[1][0], ... indices[n-1][0] and not indices[0][0], indices[0][1], ... indices[0][n-1]. So in your case you have (2, 120), meaning you have 2 dimensions and 120 found points.
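A minimal sketch of that layout on a hand-written boolean array, showing how to pair the arrays up into coordinates:

```python
import numpy as np

cond = np.array([[False, True, False],
                 [True, False, True]])

rows, cols = np.where(cond)  # one array per dimension
print(rows)  # [0 1 1]
print(cols)  # [1 0 2]

# The k-th match is (rows[k], cols[k]).
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)  # [(0, 1), (1, 0), (1, 2)]

# np.argwhere returns the same coordinates, one row per match.
print(np.argwhere(cond).tolist())  # [[0, 1], [1, 0], [1, 2]]
```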