import numpy as np
array = np.random.uniform(0.0, 1.0, (100, 2))
array[(array[:, 1:2] < 3.0).flatten()][:][1:2] = 4.0
I want to change the second value of each row whose second value is less than 3.0 to 4.0, but the above code does not work. From a little searching, it appears that fancy indexing always operates on a copy of the original array. Is that true? How do I do the assignment correctly in this case?
Just use a multi-index (a boolean index for the first dimension and an integer index for the second) to mutate the array in place:
array[array[:, 1] < 3, 1] = 4
array
# array([[2.59733369e-01, 4.00000000e+00],
# [5.08406931e-01, 4.00000000e+00],
# ...
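To see why the original attempt fails: indexing with a boolean mask returns a copy, so any assignment into the result of that first indexing step never reaches the original array, whereas the combined boolean/integer index above is a single assignment into the original. A minimal sketch (using a 0.5 threshold as an illustrative assumption, since every value drawn from uniform(0, 1) is already below 3.0):

import numpy as np

arr = np.random.uniform(0.0, 1.0, (100, 2))

# chained indexing: the boolean index returns a copy, so this
# assignment modifies the copy and the original stays unchanged
arr[arr[:, 1] < 0.5][:, 1] = 4.0
print((arr[:, 1] == 4.0).any())  # False

# combined indexing: a single __setitem__ call on the original,
# so the assignment happens in place
arr[arr[:, 1] < 0.5, 1] = 4.0
print((arr[:, 1] == 4.0).any())  # True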
I have a numpy array and I want to add n elements with the same value until the length of the array reaches 100.
For example
my_array = numpy.array([3, 4, 5])
Note that I do not know the length of the array beforehand. It may be anything 3 <= x <= 100
I want to add (100 - x) more elements, all with the value 9.
How can I do it?
There are two ways to approach this: concatenating arrays or assigning them.
You can use np.concatenate and generate an appropriately sized array:
my_array = np.array([3, 4, 5])  # as you defined it
remainder = np.array([9] * (100 - len(my_array)))
a100 = np.concatenate((my_array, remainder))
Alternatively, you can construct an array with np.full and then overwrite some of the values using slice notation:
a100 = np.full(100, 9)
my_array = np.array([3, 4, 5])  # as you defined it
a100[0:len(my_array)] = my_array
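Both approaches give the same 100-element result; a quick check (using the example array from the question):

import numpy as np

my_array = np.array([3, 4, 5])
a = np.concatenate((my_array, np.array([9] * (100 - len(my_array)))))
b = np.full(100, 9)
b[0:len(my_array)] = my_array
print(len(a), np.array_equal(a, b))  # 100 True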
It's important to remember that with numpy arrays you can't append elements the way you can with lists, so adding numbers to an array one by one is not really the best thing to do.
Far better is to start with an array, and replace the elements with new data as it comes in. For example:
import numpy as np
MY_SPECIAL_NUMBER = 9  # the fill value from the question
my_array = np.array([3, 4, 5])
my_new_array = np.ones(100) * MY_SPECIAL_NUMBER
my_new_array[:my_array.size] = my_array
my_new_array is now what you want.
If you really cannot know the size of your mysterious array in advance:
fillvalue = 9
padding = numpy.ones(100) * fillvalue
newarray = numpy.append(myarray, padding)
newarray = newarray[:100]
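A quick check of this approach; note that numpy.ones defaults to float64, so numpy.append upcasts an integer input, and you can pass dtype=myarray.dtype to the padding if you want to keep integers:

import numpy

myarray = numpy.array([3, 4, 5])
fillvalue = 9
padding = numpy.ones(100, dtype=myarray.dtype) * fillvalue
newarray = numpy.append(myarray, padding)[:100]
print(newarray.shape, newarray[:5])  # (100,) [3 4 5 9 9]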
I have a 4x1 array in which I want to search for the minimum non-zero value and find its index. For example:
theta = np.array([0, 1, 2, 3]).reshape(4, 1)
It was suggested in a similar thread to use nonzero() or where(), but when I tried to use that in the way that was suggested, it creates a new array that doesn't have the same indices as the original:
np.argmin(theta[np.nonzero(theta)])
gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non-zero elements first. I am only interested in the first minimum value if there are duplicates.
np.nonzero(theta) returns the indices of the values that are non-zero. Since theta is a 4x1 array, it actually returns a tuple of row and column indices; the row indices are
[1, 2, 3]
Then, theta[np.nonzero(theta)] returns the values
[1, 2, 3]
When you apply np.argmin to that output, it returns the position of the value 1 within the new array, which is 0: an index into the new array, not into theta.
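A quick demonstration of that chain of results:

import numpy as np

theta = np.array([0, 1, 2, 3]).reshape(4, 1)
print(np.nonzero(theta))         # (array([1, 2, 3]), array([0, 0, 0]))
print(theta[np.nonzero(theta)])  # [1 2 3]
print(np.argmin(theta[np.nonzero(theta)]))  # 0, an index into the new array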
Hence, the correct approach would be:
i, j = np.where(theta == np.min(theta[np.nonzero(theta)]))
where i, j are the indices of the minimum non-zero element of the original array. theta[i, j] (or theta[i]) then gives the value at that index.
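Putting that together on the example array from the question:

import numpy as np

theta = np.array([0, 1, 2, 3]).reshape(4, 1)
i, j = np.where(theta == np.min(theta[np.nonzero(theta)]))
print(i, j)         # [1] [0], i.e. row 1, column 0
print(theta[i, j])  # [1], the minimum non-zero value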
#!/usr/bin/env python
# Solution utilizing numpy masking of the zero values in the array
import numpy as np
import numpy.ma as ma

a = np.array([0, 1, 2, 3])
print("your array:", a)

# the non-zero minimum value
minval = np.min(ma.masked_where(a == 0, a))
print("non-zero minimum:", minval)

# the position/index of the non-zero minimum value in the array
minvalpos = np.argmin(ma.masked_where(a == 0, a))
print("index of non-zero minimum:", minvalpos)
I think you, @Emily, were very close to the correct answer. You said:
np.argmin(theta[np.nonzero(theta)]) gives an index of zero, which clearly isn't right. I think this is because it creates a new array of non zero elements first.
The last sentence is correct, and that is exactly why the first claim is wrong: the result is expected to be an index into the new array, not into the original.
Let's now extract the correct index in the old (original) array:
nztheta_ind = np.nonzero(theta)
k = np.argmin(theta[nztheta_ind])
i = nztheta_ind[0][k]
j = nztheta_ind[1][k]
or:
[i[k] for i in nztheta_ind]
for arbitrary dimensionality of original array.
ndim Solution
i = np.unravel_index(np.where(theta!=0, theta, theta.max()+1).argmin(), theta.shape)
Explanation
1. Masking the zeros out creates t0 (there are other ways; see the alternative one-liners after the code).
2. Finding the minimum location returns the flattened (1D) index.
3. np.unravel_index converts that flat index back into a multi-dimensional one, and hasn't been suggested yet.
theta = np.triu(np.random.rand(4,4), 1) # example array
t0 = np.where(theta!=0, theta, np.nan) # 1
i0 = np.nanargmin(t0) # 2
i = np.unravel_index(i0, theta.shape) # 3
print(theta, i, theta[i]) #
mask: i = np.unravel_index(np.ma.masked_where(a==0, a).argmin(), a.shape)
nan: i = np.unravel_index(np.nanargmin(np.where(a!=0, a, np.nan)), a.shape)
max: i = np.unravel_index(np.where(a!=0, a, a.max()+1).argmin(), a.shape)
Simple Version:
If I do this:
import numpy as np
a = np.zeros(2)
a[[1, 1]] += np.array([1, 1])
I get [0, 1] as the output, but I would like [0, 2]. Is that possible somehow, using implicit numpy looping instead of looping over it myself?
What-I-actually-need-to-do version:
I have a structured array that contains an index, a value, and some boolean value. I would like to sum those values at those indices, based on the boolean. Clearly that can be done with a simple loop, but it seems like it should be possible with clever numpy indexing (as above).
For example, I have an output array with 5 elements that I want to populate from a record array of values, indices, and conditions:
import numpy as np
size = 5
nvalues = 10
np.random.seed(1)
a = np.zeros(nvalues, dtype=[('val', float), ('ix', int), ('cond', bool)])
a = np.rec.array(a)
a.val = np.random.rand(nvalues)
a.cond = (np.random.rand(nvalues) > 0.3)
a.ix = np.random.randint(size, size=nvalues)
# obvious solution
obvssum = np.zeros(size)
for i in a:
    if i.cond:
        obvssum[i.ix] += i.val
# is something like this possible?
doesntwork = np.zeros(size)
doesntwork[a[a.cond].ix] += a[a.cond].val
print(doesntwork)
print(obvssum)
Output:
[ 0. 0. 0.61927097 0.02592623 0.29965467]
[ 0. 0. 1.05459336 0.02592623 1.27063303]
I think what's happening here is that if a[a.cond].ix were guaranteed to be unique, my method would work just fine, as noted in the simple example.
This is what the at method of NumPy ufuncs is for:
output = numpy.zeros(size)
numpy.add.at(output, a[a.cond].ix, a[a.cond].val)
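Unlike output[ix] += vals, which buffers the updates so repeated indices only receive the last one, np.add.at performs unbuffered in-place addition, so duplicate indices accumulate. On the simple version from the question:

import numpy as np

a = np.zeros(2)
np.add.at(a, [1, 1], [1, 1])
print(a)  # [0. 2.]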
For example, let's consider this toy code
import numpy as np
import numpy.random as rnd
a = rnd.randint(0,10,(10,10))
k = (1,2)
b = a[:,k]
for col in np.arange(np.size(b, 1)):
    b[:, col] = b[:, col] + col*100
This code works when k contains more than one index. However, when k is a single scalar index, the sub-matrix extracted from a collapses to a 1-D vector, and the operation in the for loop throws an error.
Of course, I could fix this by checking the dimension of b and reshaping:
if np.ndim(b) == 1:
    b = np.reshape(b, (np.size(b), 1))
in order to obtain a column vector, but this is expensive.
So, the question is: what is the best way to handle this situation?
This seems like something that would arise quite often and I wonder what is the best strategy to deal with it.
If you index with a list or tuple, the 2d shape is preserved:
In [638]: a=np.random.randint(0,10,(10,10))
In [639]: a[:,(1,2)].shape
Out[639]: (10, 2)
In [640]: a[:,(1,)].shape
Out[640]: (10, 1)
And I think the b iteration can be simplified to:
a[:,k] += np.arange(len(k))*100
This sort of calculation will also be easier if k is always a list or tuple, and never a scalar (a scalar does not have a len).
np.column_stack ensures its inputs are 2d (and expands at the end if not) with:
if arr.ndim < 2:
    arr = array(arr, copy=False, subok=True, ndmin=2).T
np.atleast_2d does:
elif len(ary.shape) == 1:
    result = ary[newaxis, :]
which of course could be changed in this case to:
if b.ndim == 1:
    b = b[:, None]
Anyway, I think it is better to ensure that k is a tuple rather than to adjust the shape of b afterwards. But keep both options in your toolbox.
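One way to follow that advice is to normalize k before indexing; a minimal sketch, with as_tuple as a hypothetical helper name:

import numpy as np

def as_tuple(k):
    # wrap a scalar index in a tuple so a[:, k] keeps its 2d shape
    return (k,) if np.isscalar(k) else tuple(k)

a = np.random.randint(0, 10, (10, 10))
print(a[:, as_tuple(1)].shape)       # (10, 1)
print(a[:, as_tuple((1,))].shape)    # (10, 1)
print(a[:, as_tuple((1, 2))].shape)  # (10, 2)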
I am building a function that creates a random 20x20 array consisting of the values 0, 1 and 2. I would then like to iterate through the array and keep a count of how many of each number are in it. Here is my code:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
def my_array():
    rand_array = np.random.randint(0,3,(20,20))
    zeros = 0
    ones = 0
    twos = 0
    for element in rand_array:
        if element == 0:
            zeros += 1
        elif element == 1:
            ones += 1
        else:
            twos += 1
    return rand_array,zeros,ones,twos
print(my_array())
When I eliminate the for loop that iterates over the array, it works fine and prints the array; however, as is, the code gives this error message:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
When you iterate over a multi-dimensional numpy array, you only iterate over the first dimension, so in your example the element values are themselves 1-dimensional arrays (the rows)!
You could solve the issue with another for loop over the values of the 1-dimensional array, but in numpy code, using for loops is very often a bad idea. You usually want to be using vector operations and operations broadcast across the whole array instead.
In your example, you could do:
rand_array = np.random.randint(0,3,(20,20))
# no loop needed
zeros = np.sum(rand_array == 0)
ones = np.sum(rand_array == 1)
twos = np.sum(rand_array == 2)
The == operator is broadcast over the whole array, producing a boolean array. The sum then adds up the True values (True is equal to 1 in Python) to get the count.
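A tiny illustration of that broadcasting:

import numpy as np

small = np.array([0, 1, 2, 0])
print(small == 0)          # [ True False False  True]
print(np.sum(small == 0))  # 2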
As already pointed out, you are iterating over the rows, not the elements, and numpy refuses to evaluate the truth value of an array unless it contains exactly one element.
Iteration over all elements
If you want to iterate over each element I would suggest using np.nditer. That way you access every element regardless of how many dimensions your array has. You just need to alter this line:
for element in np.nditer(rand_array):
# instead of "for element in rand_array:"
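For completeness, a sketch of the question's counting loop with that one change applied (np.nditer yields the scalar elements one by one, so the comparisons are unambiguous):

import numpy as np

rand_array = np.random.randint(0, 3, (20, 20))
zeros = ones = twos = 0
for element in np.nditer(rand_array):
    if element == 0:
        zeros += 1
    elif element == 1:
        ones += 1
    else:
        twos += 1
print(zeros, ones, twos)  # sums to 400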
An alternative using a histogram
But I think there is an even better approach: If you have an array containing discrete values (like integer) you could use np.histogram to get your counts.
You need to set up the bins so that every integer gets its own bin:
bins = np.arange(np.min(rand_array)-0.5, np.max(rand_array)+1.5)
# in your case this will give an array containing [-0.5, 0.5, 1.5, 2.5]
This way the histogram fills the first bin with every value between -0.5 and 0.5 (every 0 in your array), the second bin with all values between 0.5 and 1.5 (every 1), and so on. Then you call the histogram function to get the counts:
counts, _ = np.histogram(rand_array, bins=bins)
print(counts) # [130 145 125] # So 130 zeros, 145 ones, 125 twos
This approach has the advantage that you don't need to hardcode the values, because the bin edges are computed from the array's minimum and maximum.
As indicated in the comments, you don't need to set up the bins as floats. You can use simple integer bins:
bins = np.arange(np.min(rand_array), np.max(rand_array)+2)
# [0 1 2 3]
counts, _ = np.histogram(rand_array, bins=bins)
print(counts) # [130 145 125]
The for loop iterates through the rows, so you have to insert another loop for every row:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import random
def my_array():
    rand_array = np.random.randint(0,3,(20,20))
    zeros = 0
    ones = 0
    twos = 0
    for element in rand_array:
        for el in element:
            if el == 0:
                zeros += 1
            elif el == 1:
                ones += 1
            else:
                twos += 1
    return rand_array,zeros,ones,twos
print(my_array())