I am building a function that creates a random 20x20 array consisting of the values 0, 1 and 2. I would then like to iterate through the array and keep a count of how many of each number it contains. Here is my code:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
def my_array():
    rand_array = np.random.randint(0,3,(20,20))
    zeros = 0
    ones = 0
    twos = 0
    for element in rand_array:
        if element == 0:
            zeros += 1
        elif element == 1:
            ones += 1
        else:
            twos += 1
    return rand_array,zeros,ones,twos
print(my_array())
When I remove the for loop, the function works fine and prints the array. As written, however, the code gives this error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
When you iterate on a multi-dimensional numpy array, you're only iterating over the first dimension. In your example, your element values will be 1-dimensional arrays too!
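A minimal sketch of what the loop variable actually holds, assuming the same 20x20 array as in the question (the names first_row and comparison are only for illustration):

```python
import numpy as np

rand_array = np.random.randint(0, 3, (20, 20))

# Iterating over a 2-D array yields its rows, each a 1-D array of length 20.
first_row = next(iter(rand_array))
print(first_row.shape)        # (20,)

# Comparing a row with 0 gives 20 booleans, which `if` cannot reduce to one.
comparison = first_row == 0
print(comparison.shape)       # (20,)

try:
    if comparison:
        pass
except ValueError as exc:
    print("ValueError:", exc)
```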
You could solve the issue with another for loop over the values of the 1-dimensional array, but in numpy code, using for loops is very often a bad idea. You usually want to be using vector operations and operations broadcast across the whole array instead.
In your example, you could do:
rand_array = np.random.randint(0,3,(20,20))
# no loop needed
zeros = np.sum(rand_array == 0)
ones = np.sum(rand_array == 1)
twos = np.sum(rand_array == 2)
The == operator is broadcast over the whole array, producing a boolean array. The sum then adds up the True values (True is equal to 1 in Python) to get a count.
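To see the intermediate boolean array and sanity-check the counts, something like the following should work. The name mask is introduced here only for illustration, and np.count_nonzero is used as an independent cross-check:

```python
import numpy as np

rand_array = np.random.randint(0, 3, (20, 20))

mask = rand_array == 0            # boolean array, same shape as rand_array
zeros = np.sum(mask)              # True counts as 1, False as 0
ones = np.sum(rand_array == 1)
twos = np.sum(rand_array == 2)

# np.count_nonzero counts the True entries directly and should agree.
print(zeros == np.count_nonzero(mask))
# Every cell holds 0, 1 or 2, so the three counts should add up to 400.
print(zeros + ones + twos == rand_array.size)
```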
As already pointed out, you are iterating over the rows, not the elements. NumPy refuses to evaluate the truth value of an array unless the array contains exactly one element.
Iteration over all elements
If you want to iterate over each element I would suggest using np.nditer. That way you access every element regardless of how many dimensions your array has. You just need to alter this line:
for element in np.nditer(rand_array):
# instead of "for element in rand_array:"
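Put together, the original function with that one line changed might look like this (a sketch, not the only way to write it; the counters are initialised on one line for brevity):

```python
import numpy as np

def my_array():
    rand_array = np.random.randint(0, 3, (20, 20))
    zeros = ones = twos = 0
    # np.nditer visits every element, whatever the number of dimensions.
    for element in np.nditer(rand_array):
        if element == 0:
            zeros += 1
        elif element == 1:
            ones += 1
        else:
            twos += 1
    return rand_array, zeros, ones, twos

arr, zeros, ones, twos = my_array()
print(zeros, ones, twos)
```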
An alternative using a histogram
But I think there is an even better approach: if you have an array containing discrete values (like integers), you could use np.histogram to get your counts.
You need to set up the bins so that every integer has its own bin:
bins = np.arange(np.min(rand_array)-0.5, np.max(rand_array)+1.5)
# in your case this will give an array containing [-0.5, 0.5, 1.5, 2.5]
This way the histogram will fill the first bin with every value between -0.5 and 0.5 (so every 0 of your array), the second bin with all values between 0.5 and 1.5 (every 1), and so on. Then you call the histogram function to get the counts:
counts, _ = np.histogram(rand_array, bins=bins)
print(counts) # [130 145 125] # So 130 zeros, 145 ones, 125 twos
This approach has the advantage that you don't need to hardcode your values (because they will be calculated within the bins).
As indicated in the comments, you don't need to set up the bins as floats. You could use simple integer bins:
bins = np.arange(np.min(rand_array), np.max(rand_array)+2)
# [0 1 2 3]
counts, _ = np.histogram(rand_array, bins=bins)
print(counts) # [130 145 125]
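As a quick check that both bin layouts give the same counts, here is a sketch on a small fixed array (chosen so the output is reproducible; it is not the random array from the question). np.unique with return_counts=True is used as an independent comparison:

```python
import numpy as np

sample = np.array([[0, 1, 2],
                   [2, 2, 0]])

float_bins = np.arange(sample.min() - 0.5, sample.max() + 1.5)  # [-0.5 0.5 1.5 2.5]
int_bins = np.arange(sample.min(), sample.max() + 2)            # [0 1 2 3]

counts_float, _ = np.histogram(sample, bins=float_bins)
counts_int, _ = np.histogram(sample, bins=int_bins)
values, counts_unique = np.unique(sample, return_counts=True)

print(counts_float.tolist())    # [2, 1, 3]
print(counts_int.tolist())      # [2, 1, 3]
print(counts_unique.tolist())   # [2, 1, 3]
```

The integer bins work because the last histogram bin is closed on both sides, so the value 2 falls into the bin from 2 to 3.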
The for loop iterates through the rows, so you have to insert another loop for every row:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
def my_array():
    rand_array = np.random.randint(0,3,(20,20))
    zeros = 0
    ones = 0
    twos = 0
    for element in rand_array:
        # element is a whole row, so loop over its values as well
        for el in element:
            if el == 0:
                zeros += 1
            elif el == 1:
                ones += 1
            else:
                twos += 1
    return rand_array,zeros,ones,twos
print(my_array())
Related
I work in the field of image processing, and I have converted an image into a matrix. The matrix contains only two values: 0 and 255. I want to find the rows and columns in which the value 0 occurs within this matrix. Please help.
This is what I wrote:
array = np.array(binary_img)
print("array",array)
for i in array:
    a = np.where(i==0)
    print(a)
    continue
The == operator on an array returns a boolean array of True/False values for each element. The argwhere function returns the coordinates of the nonzero points in its parameter:
array = np.array(binary_img)
for pt in np.argwhere(array == 0):
    print(pt)
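Here is a small worked example, assuming a hypothetical 3x4 image in place of binary_img. It also counts the zeros per row and per column, in case that is what "repeated" means in the question:

```python
import numpy as np

# Hypothetical 3x4 binary image standing in for `binary_img`.
binary_img = [[255,   0, 255, 255],
              [  0,   0, 255, 255],
              [255, 255, 255,   0]]
array = np.array(binary_img)

coords = np.argwhere(array == 0)    # one (row, column) pair per zero
print(coords.tolist())              # [[0, 1], [1, 0], [1, 1], [2, 3]]

zeros_per_row = np.sum(array == 0, axis=1)
zeros_per_col = np.sum(array == 0, axis=0)
print(zeros_per_row.tolist())       # [1, 2, 1]
print(zeros_per_col.tolist())       # [1, 2, 0, 1]
```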
I have a 4 x 4 numpy array. I would like to determine whether the maximum of each row is unique, i.e. there is only one occurrence of the maximum value. I'm new to Python and numpy and wondered if there is a pythonic way (method) of doing this rather than running a for loop.
You could, for example, try this:
import numpy as np
x = np.random.randint(0, 10, (4, 4))
res = np.sum(x == x.max(axis=1, keepdims=True), axis=1) > 1
This gives you a boolean array. Its nth entry is True if the maximum of the nth row of the input array occurs more than once in that row.
x.max(axis=1, keepdims=True) computes the maximum of each row and ensures that the result has the same number of dimensions as the input. The comparison with x then marks every occurrence of the maximum in the corresponding row; the result is a boolean array of the same shape as the input array. In Python, booleans are effectively integer values, so you can sum them up. If the sum for a row is greater than 1, that row's maximum is not unique.
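A worked example on a fixed array (chosen for illustration instead of random values) makes the intermediate steps visible:

```python
import numpy as np

x = np.array([[1, 9, 3, 9],
              [4, 2, 8, 1],
              [7, 7, 7, 7],
              [0, 5, 2, 3]])

row_max = x.max(axis=1, keepdims=True)   # shape (4, 1): [[9], [8], [7], [5]]
is_max = x == row_max                    # shape (4, 4), compared row by row
repeated = np.sum(is_max, axis=1) > 1
print(repeated.tolist())                 # [True, False, True, False]

# Invert the result to answer "is the maximum of each row unique?"
unique_max = ~repeated
print(unique_max.tolist())               # [False, True, False, True]
```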
I have a 4x1 array that I want to search for the minimum non zero value and find its index. For example:
theta = array([0,1,2,3]).reshape(4,1)
It was suggested in a similar thread to use nonzero() or where(), but when I tried to use that in the way that was suggested, it creates a new array that doesn't have the same indices as the original:
np.argmin(theta[np.nonzero(theta)])
gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first. I am only interested in the first minimum value if there are duplicates.
np.nonzero(theta) returns the indices of the values that are non-zero, as a tuple with one index array per dimension. In your case, it returns
(array([1, 2, 3]), array([0, 0, 0]))
Then, theta[np.nonzero(theta)] returns the values
[1,2,3]
When you do np.argmin(theta[np.nonzero(theta)]) on the previous output, it returns the index of the value 1 which is 0.
Hence, the correct approach would be:
i,j = np.where( theta==np.min(theta[np.nonzero(theta)]))
where i,j are the indices of the minimum non-zero element of the original numpy array.
theta[i,j] or theta[i] gives the respective value at that index.
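Applied to the 4x1 array from the question, this looks as follows. The names smallest and first are introduced here for illustration only:

```python
import numpy as np

theta = np.array([0, 1, 2, 3]).reshape(4, 1)

smallest = np.min(theta[np.nonzero(theta)])   # smallest non-zero value: 1
i, j = np.where(theta == smallest)            # indices in the ORIGINAL array
print(i.tolist(), j.tolist())                 # [1] [0]
print(theta[i, j].tolist())                   # [1]

# np.where lists matches in row-major order, so the first pair is the
# first occurrence if the minimum appears more than once.
first = (int(i[0]), int(j[0]))
print(first)                                  # (1, 0)
```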
#!/usr/bin/env python
# Solution utilizing numpy masking of zero value in array
import numpy as np
import numpy.ma as ma
a = [0,1,2,3]
a = np.array(a)
print("your array: ", a)
# mask the zeros so that min/argmin ignore them
masked = ma.masked_where(a==0, a)
# the non-zero minimum value
minval = masked.min()
print("non-zero minimum: ", minval)
# the position/index of non-zero minimum value in the array
minvalpos = masked.argmin()
print("index of non-zero minimum: ", minvalpos)
I think you, @Emily, were very close to the correct answer. You said:
np.argmin(theta[np.nonzero(theta)]) gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first.
The last sentence is correct, but the first one is wrong: the call is expected to give the index in the new (filtered) array, so 0 is the right result for that array.
Let's now extract the correct index in the old (original) array:
nztheta_ind = np.nonzero(theta)
k = np.argmin(theta[nztheta_ind])
i = nztheta_ind[0][k]
j = nztheta_ind[1][k]
or:
[i[k] for i in nztheta_ind]
for arbitrary dimensionality of original array.
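A short sketch on a 2-D array (example values picked for illustration) shows how the position in the filtered values maps back to the original array:

```python
import numpy as np

theta = np.array([[0, 5, 3],
                  [2, 0, 2]])

nztheta_ind = np.nonzero(theta)       # (rows, cols) of the non-zero entries
k = np.argmin(theta[nztheta_ind])     # position within the filtered values
index = tuple(int(axis_idx[k]) for axis_idx in nztheta_ind)
print(index, theta[index])            # (1, 0) 2
```

Because np.argmin returns the first minimum, ties resolve to the first occurrence in row-major order.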
ndim Solution
i = np.unravel_index(np.where(theta!=0, theta, theta.max()+1).argmin(), theta.shape)
Explanation
1. Masking the zeros out creates t0. There are other ways to do this; see the variants listed after the example.
2. Finding the minimum location returns the flattened (1D) index.
3. unravel_index converts that flat index back into an N-dimensional index, and hasn't been suggested yet.
theta = np.triu(np.random.rand(4,4), 1) # example array
t0 = np.where(theta!=0, theta, np.nan) # 1
i0 = np.nanargmin(t0) # 2
i = np.unravel_index(i0, theta.shape) # 3
print(theta, i, theta[i])
mask: i = np.unravel_index(np.ma.masked_where(a==0, a).argmin(), a.shape)
nan: i = np.unravel_index(np.nanargmin(np.where(a!=0, a, np.nan)), a.shape)
max: i = np.unravel_index(np.where(a!=0, a, a.max()+1).argmin(), a.shape)
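The three variants can be compared on a small integer array (values chosen for illustration). The names i_mask, i_nan and i_max are hypothetical labels for the three results:

```python
import numpy as np

a = np.array([[0, 5, 3],
              [2, 0, 2]])

# mask: hide the zeros in a masked array
i_mask = np.unravel_index(np.ma.masked_where(a == 0, a).argmin(), a.shape)
# nan: replace zeros with NaN (this converts the data to float)
i_nan = np.unravel_index(np.nanargmin(np.where(a != 0, a, np.nan)), a.shape)
# max: replace zeros with a value larger than anything in the array
i_max = np.unravel_index(np.where(a != 0, a, a.max() + 1).argmin(), a.shape)

for idx in (i_mask, i_nan, i_max):
    print(tuple(int(v) for v in idx), a[idx])   # (1, 0) 2 each time
```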
I have a program that takes some large NumPy arrays and, based on some outside data, grows them by adding one to randomly selected cells until the array's sum is equal to the outside data. A simplified and smaller version looks like:
import numpy as np
my_array = np.random.random_integers(0, 100, [100, 100])
## Just creating a sample version of the array, then getting its sum:
np.sum(my_array)
499097
So, supposing I want to grow the array until its sum is 1,000,000, and that I want to do so by repeatedly selecting a random cell and adding 1 to it until we hit that sum, I'm doing something like:
import random
diff = 1000000 - np.sum(my_array)
counter = 0
while counter < diff:
    row = random.randrange(0,99)
    col = random.randrange(0,99)
    coord = [row, col]
    my_array[coord] += 1
    counter += 1
Where row/col combine to return a random cell in the array, and then that cell is grown by 1. It repeats until the number of times by which it has added 1 to a random cell == the difference between the original array's sum and the target sum (1,000,000).
However, when I check the result after running this - the sum is always off. In this case after running it with the same numbers as above:
np.sum(my_array)
99667203
I can't figure out what is accounting for this massive difference. And is there a more pythonic way to go about this?
my_array[coord] does not do what you expect when coord is a list. It selects multiple rows and adds 1 to all of their entries. You could simply use my_array[row, col] instead.
You could simply write something like:
for _ in range(1000000 - np.sum(my_array)):
    # randrange excludes the stop value, so use 100 to cover indices 0..99
    my_array[random.randrange(0, 100), random.randrange(0, 100)] += 1
(or xrange instead of range if using Python 2.x)
Replace my_array[coord] with my_array[row][col]. Your method chose two random integers and added 1 to every entry in the rows corresponding to both integers.
Basically you had a minor misunderstanding of how numpy indexes arrays.
Edit: To make this clearer.
The code posted chose two numbers, say 30 and 45, and added 1 to all 100 entries of row 30 and all 100 entries of row 45.
From this you would expect the total sum to be 100,679,697 = 200*(1,000,000 - 499,097) + 499,097
However, when the two random integers are identical (say, 45 and 45), 1 rather than 2 is added to every entry of row 45, so in that case the sum only increases by 100.
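The difference between indexing with a list and indexing a single cell can be reproduced on a small array. This is an illustrative sketch; the 4x4 array small is not part of the original code:

```python
import numpy as np

small = np.zeros((4, 4), dtype=int)

small[[1, 2]] += 1          # a list selects the whole rows 1 and 2
print(small.sum())          # 8

small[[3, 3]] += 1          # a repeated row is incremented only once
print(small[3].tolist())    # [1, 1, 1, 1]

small[(0, 0)] += 1          # a tuple addresses the single cell (0, 0)
print(small[0].tolist())    # [1, 0, 0, 0]
```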
The problem with your original approach is that you are indexing your array with a list, which is interpreted as a sequence of indices into the row dimension, rather than as separate indices into the row/column dimensions.
Try passing a tuple instead of a list:
coord = row, col
my_array[coord] += 1
A much faster approach would be to find the difference between the sum over the input array and the target value, then generate an array containing the same number of random indices into the array and increment them all in one go, thus avoiding looping in Python:
import numpy as np
def grow_to_target(A, target=1000000, inplace=False):
    if not inplace:
        A = A.copy()
    # how many times do we need to increment A?
    n = target - A.sum()
    # pick n random indices into the flattened array
    # (randint excludes the upper bound, so this covers 0 .. A.size - 1)
    idx = np.random.randint(0, A.size, n)
    # how many times did we sample each unique index?
    uidx, counts = np.unique(idx, return_counts=True)
    # increment the array counts times at each unique index
    A.flat[uidx] += counts
    return A
For example:
a = np.zeros((100, 100), dtype=int)
b = grow_to_target(a)
print(b.sum())
# 1000000
%timeit grow_to_target(a)
# 10 loops, best of 3: 91.5 ms per loop
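A possible variant of the same idea, sketched here with np.bincount and the newer np.random.default_rng generator. The function name grow_with_bincount and the seed parameter are assumptions made for this example, and no timing is claimed for it:

```python
import numpy as np

def grow_with_bincount(A, target=1_000_000, seed=None):
    """Return a copy of A with random cells incremented until A.sum() == target."""
    rng = np.random.default_rng(seed)
    n = int(target - A.sum())
    if n < 0:
        raise ValueError("array sum already exceeds the target")
    # pick n random positions in the flattened array (upper bound excluded)
    idx = rng.integers(0, A.size, size=n)
    # count how often each flat position was drawn, then restore the shape
    increments = np.bincount(idx, minlength=A.size).reshape(A.shape)
    return A + increments.astype(A.dtype)

a = np.zeros((100, 100), dtype=int)
b = grow_with_bincount(a, seed=0)
print(b.sum())   # 1000000
```

Unlike the inplace option above, this sketch always returns a new array and leaves the input untouched.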
I have a multi-dimensional array such as:
a = [[1,1,5,12,0,4,0],
[0,1,2,11,0,4,2],
[0,4,3,17,0,4,9],
[1,3,5,74,0,8,16]]
How can I delete a column if all entries within that column are equal to zero? In the array a that would mean deleting the 5th column (index 4), resulting in:
a = [[1,1,5,12,4,0],
[0,1,2,11,4,2],
[0,4,3,17,4,9],
[1,3,5,74,8,16]]
N.b I've written a as a nested list but only to make it clear. I also don't know a priori where the zero column will be in the array.
My attempt so far only finds the index of the column in which all elements are equal to zero:
a = np.array([[1,1,5,12,0,4,0],[0,1,2,11,0,4,2],[0,4,3,17,0,4,9],[1,3,5,74,0,8,16]])
b = np.vstack(a)
ind = []
for n,m in zip(b.T,range(len(b.T))):
    if sum(n) == 0:
        ind.append(m)
Is there any way to achieve this?
With the code you already have, you can just do:
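# a must be the nested list here, not the NumPy array;
# delete from the right so the remaining indices stay valid
for place in sorted(ind, reverse=True):
    for sublist in a:
        del sublist[place]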
Which gets the job done but is not very satisfactory...
Edit: numpy is strong
import numpy as np
a = np.array(a)
a = a[:, np.sum(a, axis=0)!=0]
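One caveat with testing the column sum is that entries which cancel out (for example 1 and -1) would also give a sum of zero. A sketch that tests the entries themselves instead, using the array from the question (the names keep and trimmed are introduced for illustration):

```python
import numpy as np

a = np.array([[1, 1, 5, 12, 0, 4, 0],
              [0, 1, 2, 11, 0, 4, 2],
              [0, 4, 3, 17, 0, 4, 9],
              [1, 3, 5, 74, 0, 8, 16]])

keep = (a != 0).any(axis=0)             # True for columns with a non-zero entry
trimmed = a[:, keep]

print(np.flatnonzero(~keep).tolist())   # [4]  (index of the all-zero column)
print(trimmed.shape)                    # (4, 6)
print(trimmed[0].tolist())              # [1, 1, 5, 12, 4, 0]
```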