I'm interested in the multi-dimensional case of "Increment Numpy array with repeated indices".
I have an N-dimensional array and a set of N index arrays that select the elements I want to increment. The index arrays might have repeated entries.
Without repeats, the solution is
from numpy import arange, array
a = arange(24).reshape(2,3,4)
i = array([0,0,1])
j = array([0,1,1])
k = array([0,0,3])
a[i,j,k] += 1
With repeats (e.g. j = array([0,0,2])), I'm unable to make numpy increment the repeated entries more than once.
How about this:
import numpy as np
a = np.zeros((2,3,4))
i = np.array([0,0,1])
j = np.array([0,0,1])
k = np.array([0,0,3])
ijk = np.vstack((i,j,k)).T
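# pin each axis to the range [0, n) so that integer index v falls in bin v
H, edges = np.histogramdd(ijk, bins=a.shape, range=[(0, n) for n in a.shape])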
a += H
I don't know if there is an easier solution with direct array indexing, but this works:
for x,y,z in zip(i,j,k):
    a[x,y,z] += 1
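For comparison, here is a minimal sketch of the same increment done with np.add.at, NumPy's unbuffered in-place add. It reuses the array shape and index values from the answer above; it is offered as one possible alternative to the loop, not as the only way to do it.

```python
import numpy as np

a = np.zeros((2, 3, 4))
i = np.array([0, 0, 1])
j = np.array([0, 0, 1])
k = np.array([0, 0, 3])

# Unbuffered in-place add: every occurrence of a repeated index is applied,
# unlike a[i, j, k] += 1, which applies each distinct index only once.
np.add.at(a, (i, j, k), 1)

print(a[0, 0, 0])  # 2.0 (the index (0, 0, 0) appears twice)
print(a[1, 1, 3])  # 1.0
```

The tuple (i, j, k) plays the same role as the subscripts in a[i, j, k], so this should extend to any number of dimensions.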
Related
I have a large 2D numpy array and want to find the indices of the 1D arrays (rows) inside it that meet a condition, e.g. that have at least one value greater than a given threshold x.
I can already do it the following way, but is there a shorter, more efficient way to do it?
import numpy
a = numpy.array([[1,2,3,4,5], [1,2,3,4,20], [1,2,2,4,5]])
indices = []
i = 0
x = 10
for item in a:
    if any(j > x for j in item):
        indices.append(i)
    i += 1
print(indices) # gives [1]
You could use numpy's built-in boolean operations:
import numpy as np
a = np.array([[1,2,3,4,5], [1,2,3,4,20], [1,2,2,4,5]])
indices = np.argwhere(np.any(a > 10, axis=1))
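Note that np.argwhere returns a 2-D array (here [[1]]). If a flat list of row numbers is wanted, as in the question's loop, one possible variant is np.flatnonzero. This is a small sketch using the question's data; the name row_mask is introduced here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [1, 2, 3, 4, 20], [1, 2, 2, 4, 5]])
x = 10

row_mask = np.any(a > x, axis=1)     # one boolean per row
indices = np.flatnonzero(row_mask)   # 1-D array of the matching row numbers

print(indices.tolist())  # [1]
```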
I want to know how to index a numpy array efficiently. At the moment, I step through the array elements by repeatedly adding offsets. For example, I have to loop over an array A like this:
import numpy as np
A = np.arange(0,100)
M = 10
for i in range(A.shape[0]-M):
    B = []
    for j in range(M):
        value = A[i+j]
        B.append(value)
Is there a way for me to get the values without repeatedly doing the i+j addition?
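One possible approach, assuming NumPy 1.20 or newer is available, is sliding_window_view from numpy.lib.stride_tricks, which exposes every length-M window as a row of a 2-D view. The sketch below uses the A and M from the question; the name windows is introduced here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

A = np.arange(0, 100)
M = 10

# windows[i] is a read-only view of A[i:i+M]; no data is copied
windows = sliding_window_view(A, M)

# The original loop stops at i = A.shape[0] - M - 1, so check that many rows
n = A.shape[0] - M
for i in range(n):
    B = windows[i]
    assert B.tolist() == [A[i + j] for j in range(M)]
```

The view has one more row than the loop visits (it also includes the final window A[90:100]), so slice it with windows[:n] if the exact loop range matters.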
I am building a function that creates a random 20x20 array consisting of the values 0, 1 and 2. I would then like to iterate through the array and keep a count of how many of each number are in the array. Here is my code:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
def my_array():
    rand_array = np.random.randint(0,3,(20,20))
    zeros = 0
    ones = 0
    twos = 0
    for element in rand_array:
        if element == 0:
            zeros += 1
        elif element == 1:
            ones += 1
        else:
            twos += 1
    return rand_array,zeros,ones,twos
print(my_array())
When I remove the for loop, the function works fine and prints the array. As written, however, the code gives this error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
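When you iterate over a multi-dimensional numpy array, you're only iterating over the first dimension. In your example, each element is itself a 1-dimensional array (one row of the 20x20 array), so element == 0 produces an array of booleans rather than a single True or False.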
You could solve the issue with another for loop over the values of the 1-dimensional array, but in numpy code, using for loops is very often a bad idea. You usually want to be using vector operations and operations broadcast across the whole array instead.
In your example, you could do:
rand_array = np.random.randint(0,3,(20,20))
# no loop needed
zeros = np.sum(rand_array == 0)
ones = np.sum(rand_array == 1)
twos = np.sum(rand_array == 2)
The == operator is broadcast over the whole array, producing a boolean array. Then the sum adds up the True values (True is equal to 1 in Python) to get a count.
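To make the counting step concrete, here is a small sketch on a fixed 2x3 array (chosen only for illustration, not taken from the question). It shows the intermediate boolean array and, as an assumption about what reads best, np.count_nonzero as an equivalent way to count the True values.

```python
import numpy as np

arr = np.array([[0, 1, 2],
                [2, 2, 0]])   # small fixed array, for illustration only

mask = (arr == 2)                    # boolean array with the same shape as arr
twos = np.sum(mask)                  # each True counts as 1
also_twos = np.count_nonzero(mask)   # same count, without summing booleans

print(mask.tolist())  # [[False, False, True], [True, True, False]]
print(twos)           # 3
```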
As already pointed out, you are iterating over the rows, not the elements. And numpy refuses to evaluate the truth value of an array unless the array contains only one element.
Iteration over all elements
If you want to iterate over each element I would suggest using np.nditer. That way you access every element regardless of how many dimensions your array has. You just need to alter this line:
for element in np.nditer(rand_array):
# instead of "for element in rand_array:"
An alternative using a histogram
But I think there is an even better approach: if you have an array containing discrete values (such as integers), you could use np.histogram to get your counts.
You need to set up the bins so that every integer will have its own bin:
bins = np.arange(np.min(rand_array)-0.5, np.max(rand_array)+1.5)
# in your case this will give an array containing [-0.5, 0.5, 1.5, 2.5]
This way the histogram will fill the first bin with every value between -0.5 and 0.5 (so every 0 of your array), the second bin with all values between 0.5 and 1.5 (every 1), and so on. Then you call the histogram function to get the counts:
counts, _ = np.histogram(rand_array, bins=bins)
print(counts) # [130 145 125] # So 130 zeros, 145 ones, 125 twos
This approach has the advantage that you don't need to hardcode your values (because they will be calculated within the bins).
As indicated in the comments, you don't need to set up the bins as floats. You could use simple integer bins:
bins = np.arange(np.min(rand_array), np.max(rand_array)+2)
# [0 1 2 3]
counts, _ = np.histogram(rand_array, bins=bins)
print(counts) # [130 145 125]
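For non-negative integers specifically, two other NumPy functions give the same counts without building bin edges: np.bincount and np.unique with return_counts=True. The sketch below is an alternative under that assumption; the seeded generator is only there to make the example repeatable and is not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
rand_array = rng.integers(0, 3, (20, 20))

# counts[v] is the number of times the integer v occurs (input must be 1-D)
counts = np.bincount(rand_array.ravel(), minlength=3)

# np.unique reports which values are present, together with their counts
values, unique_counts = np.unique(rand_array, return_counts=True)

print(counts.sum())  # 400, one count per element of the 20x20 array
```

np.bincount only works for non-negative integers, while np.unique also handles negative or non-integer values.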
The for loop iterates through the rows, so you have to insert another loop for every row:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
def my_array():
    rand_array = np.random.randint(0,3,(20,20))
    zeros = 0
    ones = 0
    twos = 0
    for element in rand_array:
        for el in element:
            if el == 0:
                zeros += 1
            elif el == 1:
                ones += 1
            else:
                twos += 1
    return rand_array,zeros,ones,twos
print(my_array())
I have an array r with 4 values in it, created with numpy's array function.
from numpy import array, amax, amin
r = array([r1,r2,r3,r4])
I need to sum the max and the min of this array:
g_1 = amax(r)+amin(r)
Now I need to compare this value (g_1) with the sum of the other two elements of the array. I don't know in advance which elements are the max and the min when I write this part of the code, so I don't know how to do that.
from numpy import sum
g_2 = sum(r) - g_1
comp = g_1 <= g_2
The sum of the other two elements of the array is simply the sum of all elements of the array, minus the max and min values: sum(r) - g_1
You could also sort the array, which will probably require fewer comparisons overall:
import numpy as np
r_sort = np.sort(r)
g_1 = r_sort[0] + r_sort[-1]
g_2 = r_sort[1] + r_sort[2]
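As a quick check that the two approaches agree, here is a sketch with four made-up values standing in for r1..r4 (the numbers are hypothetical, since the question does not give any). The names g_1_sorted and g_2_sorted are introduced here for illustration.

```python
import numpy as np

r = np.array([3.0, 9.0, 1.0, 4.0])   # hypothetical values standing in for r1..r4

# Approach 1: total minus (max + min)
g_1 = np.amax(r) + np.amin(r)
g_2 = np.sum(r) - g_1

# Approach 2: sort once, then pick elements by position
r_sort = np.sort(r)
g_1_sorted = r_sort[0] + r_sort[-1]
g_2_sorted = r_sort[1] + r_sort[2]

comp = g_1 <= g_2
print(g_1, g_2, comp)  # 10.0 7.0 False
```

Both approaches also behave the same way when the max or min value is repeated, because neither depends on comparing individual elements against the extremes.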
I tried using a for loop and an if statement, and the code I wrote seems to work:
from numpy import array, amax, amin, size
r=array([r1,r2,r3,r4])
g_1=amax(r)+amin(r)
g_2=0  # start the running sum at zero
for j in range(size(r)):
    if r[j] != amax(r) and r[j] != amin(r):
        g_2+=r[j]
This code seems to return the g_2 I was looking for. It is probably not the best solution; what do you think about it?
Is there a more efficient way to update the values of a multidimensional numpy array?
For example, I have a loop
for i in np.arange(5):
    for j in np.arange(5):
        if (i + j) % 2 == 0:
            v[i,j] = v[i,j] + v[i, j + 1]
I was thinking of parallelizing this process later (with multiprocessing and Pool), but I can't imagine how. Maybe by defining a function and using map, but this is a 2D array and the operations depend on the element indexes.
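Basically, you are adding each element's right-hand neighbour to it wherever i + j is even: the even columns in even rows and the odd columns in odd rows.
You can do this in two lines using slice indexing: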
v[0:5:2,0:5:2] += v[0:5:2,1:6:2] # even rows
v[1:5:2,1:5:2] += v[1:5:2,2:6:2] # odd rows
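The equivalence can be checked directly. The sketch below assumes v is a 5x6 integer array (the question does not state its shape; six columns are needed so that v[i, j + 1] stays in bounds for j = 4) and compares the loop against the two slice assignments. The names v_loop and v_vec are introduced here for illustration.

```python
import numpy as np

# 5 x 6 so that v[i, j + 1] stays in bounds for j = 4 (the shape is an assumption)
v_loop = np.arange(30).reshape(5, 6)
v_vec = v_loop.copy()

# Original element-by-element version
for i in range(5):
    for j in range(5):
        if (i + j) % 2 == 0:
            v_loop[i, j] = v_loop[i, j] + v_loop[i, j + 1]

# Slice version from the answer
v_vec[0:5:2, 0:5:2] += v_vec[0:5:2, 1:6:2]  # even rows
v_vec[1:5:2, 1:5:2] += v_vec[1:5:2, 2:6:2]  # odd rows

print(np.array_equal(v_loop, v_vec))  # True
```

The two versions match because every element that is read (the right-hand neighbour) sits at a position where i + j is odd, so it is never modified by the update itself.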