Simple Version:
If I do this:
import numpy as np
a = np.zeros(2)
a[[1, 1]] += np.array([1, 1])
I get [0, 1] as output, but I would like [0, 2]. Is that possible using implicit NumPy looping instead of looping over it myself?
What-I-actually-need-to-do version:
I have a structured array that contains an index, a value, and some boolean value. I would like to sum those values at those indices, based on the boolean. Clearly that can be done with a simple loop, but it seems like it should be possible with clever numpy indexing (as above).
For example, I have an array with 5 elements that I want to populate from the array with values, indices, and conditions:
import numpy as np
size = 5
nvalues = 10
np.random.seed(1)
a = np.zeros(nvalues, dtype=[('val', float), ('ix', int), ('cond', bool)])
a = np.rec.array(a)
a.val = np.random.rand(nvalues)
a.cond = (np.random.rand(nvalues) > 0.3)
a.ix = np.random.randint(size, size=nvalues)
# obvious solution
obvssum = np.zeros(size)
for i in a:
    if i.cond:
        obvssum[i.ix] += i.val
# is something like this possible?
doesntwork = np.zeros(size)
doesntwork[a[a.cond].ix] += a[a.cond].val
print(doesntwork)
print(obvssum)
Output:
[ 0. 0. 0.61927097 0.02592623 0.29965467]
[ 0. 0. 1.05459336 0.02592623 1.27063303]
I think my method would work just fine if a[a.cond].ix were guaranteed to be unique. With repeated indices only one of the increments is applied, as in the simple example.
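This is what the at method of NumPy ufuncs is for. It applies the operation unbuffered and in place, so an index that appears more than once is accumulated once per occurrence:
output = np.zeros(size)
np.add.at(output, a[a.cond].ix, a[a.cond].val)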
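To make the difference concrete, here is a minimal sketch of np.add.at on the simple example, followed by a small structured case. The arrays vals, ix and cond are made-up stand-ins for the fields in the question, not the random data shown there.

```python
import numpy as np

# Simple version: index 1 appears twice, so it receives both increments.
simple = np.zeros(2)
np.add.at(simple, [1, 1], np.array([1, 1]))
print(simple)  # [0. 2.]

# Structured version with illustrative data (assumed, not from the question).
vals = np.array([0.5, 0.25, 1.0, 2.0])
ix = np.array([2, 2, 4, 2])
cond = np.array([True, True, True, False])

summed = np.zeros(5)
# Only rows where cond is True contribute:
# index 2 gets 0.5 + 0.25, index 4 gets 1.0, the last row is skipped.
np.add.at(summed, ix[cond], vals[cond])
print(summed)
```

The plain summed[ix[cond]] += vals[cond] form would keep only one of the two contributions to index 2.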
I am trying to make the transition from Excel to Python, but I have run into trouble computing the running sum of the elements of a one-dimensional array.
In Excel this can be done easily, as in the attached image.
From Excel I can clearly see the mathematical pattern for achieving this. My approach was to create a for loop that indexes the array A.
This is my code:
import numpy as np
A = np.array([0.520094,0.850895E-1,-0.108374e1])
B = np.array([0]) #initialize array
B[0] = A[0]
A is equivalent to column A in Excel, and B to column B.
Using a for loop to sum every element/row:
for i in range(len(A)):
    i = i+1
    B.append([B[i-1]+A[i]])
print(B)
This strategy doesn't work and I keep getting errors. Any suggestions as to how I could make this work, or is there a more elegant way of doing this?
Just use np.cumsum:
import numpy as np
A = np.array([0.520094,0.850895E-1,-0.108374e1])
cumsum = np.cumsum(A)
print(cumsum)
Output:
[ 0.520094 0.6051835 -0.4785565]
A manual approach would look like this:
A = np.array([0.520094,0.850895E-1,-0.108374e1])
B = [] # Create B as a list and not a numpy array, because it's faster to append
for i in range(len(A)):
    cumulated = A[i]
    if i > 0:
        cumulated += B[i-1]
    B.append(cumulated)
B = np.array(B) # Convert B from list to a numpy array
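As a quick sanity check, the manual loop can be compared against np.cumsum. This is only a sketch; the names running, total and manual are chosen here for illustration.

```python
import numpy as np

A = np.array([0.520094, 0.850895e-1, -0.108374e1])

# Manual running total, collected in a list and converted at the end.
running = []
total = 0.0
for x in A:
    total += x
    running.append(total)
manual = np.array(running)

# Both approaches should agree up to floating-point rounding.
print(np.allclose(manual, np.cumsum(A)))
```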
I have this function:
if elem < 0:
    elem = 0
else:
    elem = 1
I want to apply this function to every element in a NumPy array. For a fixed number of dimensions this could be done with nested for loops, but I need it to work regardless of the array's dimensions and shape. Is there a way to achieve this in Python with NumPy?
Or is there a general way to apply an arbitrary function to every element of an n-dimensional NumPy array?
Isn't it just this?
arr = (arr >= 0).astype(int)
Use np.where:
np.where(arr < 0, 0, 1)
You can use a boolean mask to define an array of decisions. Let's work through a concrete example. You have an array of positive and negative numbers and you want to take the square root only at non-negative locations:
arr = np.random.normal(size=100)
You compute a mask like
mask = arr >= 0
The most straightforward way to apply the mask is to create an output array, and fill in the required elements:
result = np.empty(arr.shape)
result[mask] = np.sqrt(arr[mask])
result[~mask] = arr[~mask]
This is not super efficient because you have to compute the inverse of the mask and apply the mask multiple times. For this specific example, you can take advantage of the fact that np.sqrt is a ufunc and use its where keyword:
result = arr.copy()
np.sqrt(arr, where=mask, out=result)
One popular way to apply the mask would be to use np.where but I specifically constructed this example to show the caveats. The simplistic approach would be to compute
result = np.where(mask, np.sqrt(arr), arr)
np.where chooses the value from either np.sqrt(arr) or arr depending on whether mask is truthy. This is a very good method in many cases, but both branches have to be computed in full beforehand, which is exactly what you want to avoid with a square root of negative numbers.
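The two approaches can be put side by side in a small sketch. The input values are made up for illustration, and np.errstate is used here only to silence the warning that np.where triggers by evaluating np.sqrt on the negative entries as well.

```python
import numpy as np

arr = np.array([4.0, -9.0, 16.0, -1.0])  # illustrative data
mask = arr >= 0

# ufunc `where`: sqrt is evaluated only where mask is True;
# the other positions keep the values already stored in `result`.
result = arr.copy()
np.sqrt(arr, where=mask, out=result)

# np.where: np.sqrt(arr) is computed for every element first (NaN for the
# negatives), and only afterwards are the values selected by the mask.
with np.errstate(invalid="ignore"):
    via_where = np.where(mask, np.sqrt(arr), arr)

print(result)
print(via_where)
```

Both give the same array; the difference is only in how much work is done and whether invalid-value warnings are raised along the way.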
TL;DR
Your specific example is looking for a representation of the mask itself. If you don't care about the type:
result = arr >= 0
If you do care about the type:
result = (arr >= 0).astype(int)
OR, for integer arrays only:
result = np.clip(arr, -1, 0) + 1
These solutions create a different array from the input. If you want to replace values in the same buffer,
mask = arr >= 0
arr[mask] = 1
arr[~mask] = 0
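Since the comparison is applied elementwise, the same expression works for any number of dimensions. A minimal sketch, where to_binary is a hypothetical helper name introduced only for this example:

```python
import numpy as np

def to_binary(arr):
    """Hypothetical helper: 0 where arr < 0, 1 elsewhere, same shape as arr."""
    return (np.asarray(arr) >= 0).astype(int)

one_d = to_binary([-1.5, 0.0, 2.0])
# arange(-4, 4) holds -4..3, reshaped into a 2x2x2 block.
three_d = to_binary(np.arange(-4, 4).reshape(2, 2, 2))

print(one_d)
print(three_d.shape)
```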
You can do something like this:
import numpy as np
a=np.array([-2,-1,0,1,2])
a[a>=0]=1
a[a<0]=0
>>> a
array([0, 0, 1, 1, 1])
An alternative to the above solutions is to combine a list comprehension with a ternary operator.
my_array = np.array([-1.2, 3.0, -10.11, 5.2])
sol = np.asarray([0 if val < 0 else 1 for val in my_array])
Take a look at these sources:
https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions
https://book.pythontips.com/en/latest/ternary_operators.html
Use numpy.vectorize():
import numpy as np
def unit(elem):
    # Return 0 for negative values and 1 otherwise
    if elem < 0:
        return 0
    return 1
a = np.array([[1, 2, -0.5], [0.5, 2, 3]])
vfunc = np.vectorize(unit)
vfunc(a)
# array([[1, 1, 0], [1, 1, 1]])
I have a numpy array and I want to add n elements with the same value until the length of the array reaches 100.
For example
my_array = numpy.array([3, 4, 5])
Note that I do not know the length of the array beforehand. It may be anything 3 <= x <= 100
I want to add (100 - x) more elements, all with the value 9.
How can I do it?
There are two ways to approach this: concatenating arrays or assigning them.
You can use np.concatenate and generate an appropriately sized array:
my_array = np.array([3, 4, 5])  # as you defined it
remainder = np.full(100 - len(my_array), 9)
a100 = np.concatenate((my_array, remainder))
Alternatively, you can construct a np.full array, and then overwrite some of the values using slice notation:
a100 = np.full(100, 9)
my_array = np.array([3, 4, 5])  # as you defined it
a100[:len(my_array)] = my_array
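A third option, not mentioned above, is np.pad, which appends a constant value on one side in a single call. This is a sketch under the assumption that the array is one-dimensional and no longer than the target; target_len is a name chosen for this example.

```python
import numpy as np

my_array = np.array([3, 4, 5])
target_len = 100  # assumed target length from the question

# Pad 0 elements on the left and (target_len - len) on the right with 9.
padded = np.pad(
    my_array,
    (0, target_len - len(my_array)),
    mode="constant",
    constant_values=9,
)
print(padded.shape)
```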
It's important to remember that NumPy arrays have a fixed size, so you can't append elements in place as you can with lists; functions such as np.append build a new array every time. Growing an array one element at a time is therefore not really the best thing to do.
It is far better to start with an array of the final size and replace its elements with new data as it comes in. For example:
import numpy as np
MY_SPECIAL_NUMBER = 9  # fill value requested in the question
my_array = np.array([3, 4, 5])
my_new_array = np.ones(100) * MY_SPECIAL_NUMBER
my_new_array[:my_array.size] = my_array
my_new_array is now what you want.
If you "cannot" know the size of your mysterious array:
fillvalue = 9
padding = numpy.ones(100) * fillvalue
newarray = numpy.append(my_array, padding)
newarray = newarray[:100]
For example, let's consider this toy code
import numpy as np
import numpy.random as rnd
a = rnd.randint(0,10,(10,10))
k = (1,2)
b = a[:,k]
for col in np.arange(np.size(b,1)):
    b[:,col] = b[:,col]+col*100
This code will work when the size of k is bigger than 1. However, with the size equal to 1, the extracted sub-matrix from a is transformed into a row vector, and applying the function in the for loop throws an error.
Of course, I could fix this by checking the dimension of b and reshaping:
if np.ndim(b) == 1:
    b = np.reshape(b, (np.size(b), 1))
in order to obtain a column vector, but this is expensive.
So, the question is: what is the best way to handle this situation?
This seems like something that would arise quite often and I wonder what is the best strategy to deal with it.
If you index with a list or tuple, the 2d shape is preserved:
In [638]: a=np.random.randint(0,10,(10,10))
In [639]: a[:,(1,2)].shape
Out[639]: (10, 2)
In [640]: a[:,(1,)].shape
Out[640]: (10, 1)
And I think b iteration can be simplified to:
a[:,k] += np.arange(len(k))*100
This sort of calculation will also be easier if k is always a list or tuple, and never a scalar (a scalar does not have a len).
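One way to guarantee that k always has a length is to normalize it before indexing. The sketch below assumes np.atleast_1d is acceptable for that purpose; add_offsets is a hypothetical helper name used only for illustration.

```python
import numpy as np

a = np.zeros((4, 5), dtype=int)

def add_offsets(arr, k):
    """Hypothetical helper: add 0, 100, 200, ... to the columns listed in k, in place."""
    k = np.atleast_1d(k)  # a scalar column index becomes a length-1 array
    arr[:, k] += np.arange(len(k)) * 100
    return arr

add_offsets(a, (1, 2))  # column 1 gets +0, column 2 gets +100
add_offsets(a, 3)       # single scalar column: gets +0, no shape error
print(a)
```

Because k is always one-dimensional after the conversion, arr[:, k] stays two-dimensional and the broadcast against np.arange(len(k)) works for one column as well as for many.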
np.column_stack ensures its inputs are 2d (and expands at the end if not) with:
if arr.ndim < 2:
    arr = array(arr, copy=False, subok=True, ndmin=2).T
np.atleast_2d does
elif len(ary.shape) == 1:
    result = ary[newaxis,:]
which of course could be changed in this case to
if b.ndim==1:
    b = b[:,None]
Anyway, I think it is better to ensure that k is a tuple than to adjust the shape of b afterwards. But keep both options in your toolbox.
I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop through all values without knowing the number of dimensions or the shape in advance? If I knew a variable had exactly 2 dimensions, I would do
shp=myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case
for value in np.nditer(a):
    do_something_with(value)
# This is similar to above
for value in a.ravel():
    do_something_with(value)
# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
On an unrelated note, NumPy arrays are normally indexed as a[i, j] rather than a[i][j]. In Python, a[i, j] is equivalent to indexing with a tuple, i.e. a[(i, j)].
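As a sketch of the np.ndindex variant on arrays of different dimensionality, the following applies one in-place rule to a 2-D and a 3-D array with the same loop. The rule itself (halving negative values) and the name halve_negatives are invented for illustration.

```python
import numpy as np

def halve_negatives(arr):
    """Hypothetical example rule: halve every negative value, in place."""
    # np.ndindex yields one index tuple per element, whatever arr.ndim is.
    for idx in np.ndindex(arr.shape):
        if arr[idx] < 0:
            arr[idx] = arr[idx] / 2
    return arr

data_2d = halve_negatives(np.array([[1.0, -2.0], [-4.0, 3.0]]))
data_3d = halve_negatives(np.full((2, 3, 4), -8.0))
print(data_2d)
print(data_3d.shape)
```

For a rule this simple, a boolean mask such as arr[arr < 0] /= 2 would avoid the Python-level loop entirely; the explicit loop is mainly useful when the per-element logic cannot be vectorized.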
You can use the flat attribute of NumPy arrays, which returns an iterator (a numpy.flatiter) over all values, no matter the shape.
For instance:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> for x in A.flat:
...     print(x)
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x / 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
       [2, 5, 3]])
The order in which flat returns the elements is well defined: it iterates in row-major (C-style) order, with the last index varying fastest, regardless of how the array is laid out in memory.
And this will work for any number of dimensions.
** -- Edit -- **
To clarify the point about ordering: because the order is fixed, something like row1 = A.flat[:N] does return the first row of a 2-D array with N columns, although indexing the array directly (A[0]) states the intent more clearly.
This might be the easiest with recursion:
import numpy
a = numpy.array(range(30)).reshape(5, 3, 2)
def recursive_do_something(array):
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj)
    else:
        for subarray in array:
            recursive_do_something(subarray)
recursive_do_something(a)
In case you want the indices:
a = numpy.array(range(30)).reshape(5, 3, 2)
def do_something(x, indices):
    print(indices, x)
def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        for i, obj in enumerate(array):
            do_something(obj, indices + [i])  # include the index along the last axis
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])
recursive_do_something(a)
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of
from itertools import product
for idx in product(*(range(n) for n in myarray.shape)):
    do_something(myarray[idx])
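A self-contained sketch of the itertools.product approach, using a made-up 3-D array and collecting the visited values in a list (visited is a name chosen for this example) instead of calling do_something:

```python
from itertools import product

import numpy as np

myarray = np.arange(24).reshape(2, 3, 4)  # illustrative data

visited = []
# One range per axis; product yields every index tuple,
# with the last axis varying fastest.
for idx in product(*(range(n) for n in myarray.shape)):
    visited.append(int(myarray[idx]))

print(visited[:5])
```

np.ndindex(myarray.shape) produces the same sequence of index tuples and is probably the more direct choice when NumPy is already in use.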