I need to implement the following function with NumPy:
F_l(x) = (G(x) / A_l) * prod_{j != l} [ (G(x) - A_l) / (G(x) + A_j) ] * [ (A_l - A_j) / (A_l + A_j) ]
where the F_l(x) are N arrays that I need to calculate, which depend on an array G(x) that I am given, and the A_j are N coefficients that are also given. I would like to implement it in NumPy because I have to recalculate F_l(x) on every iteration of my program. The naive way to do this is with for loops and ifs:
import numpy as np

A = np.arange(1.,5.,1)
G = np.array([[1.,2.],[3.,4.]])

def calcF(G,A):
    N = A.size
    print A
    print N
    F = []
    for l in range(N):
        F.append(G/A[l])
        print F[l]
        for j in range(N):
            if j != l:
                F[l] *= ((G - A[l])/(G + A[j]))*((A[l] - A[j])/(A[l] + A[j]))
    return F

F = calcF(G,A)
print F
Since for loops and if statements are relatively slow, I am looking for a clever NumPy way to do the same thing. Does anyone have an idea?
Listed in this post is a vectorized solution that makes heavy use of NumPy's powerful broadcasting feature, after extending the input arrays to 3D and 4D with np.newaxis/None at various places according to the computation involved. Here's the implementation -
def vectorized_calcF(G, A):
    # Get size of A
    N = A.size
    # Perform "(G - A[l])/(G + A[j])" in a vectorized manner
    p1 = (G - A[:,None,None,None])/(G + A[:,None,None])
    # Perform "(A[l] - A[j])/(A[l] + A[j])" in a vectorized manner
    p2 = (A[:,None] - A)/(A[:,None] + A)
    # Elementwise multiplication between the previously calculated parts
    p3 = p1*p2[...,None,None]
    # For j == l (the case skipped by "if j != l"), substitute "G/A[l]" so that
    # the initial factor from the question is included in the product
    p3[np.eye(N,dtype=bool)] = G/A[:,None,None]
    # Multiply along the j axis
    Fout = p3.prod(1)
    # If you need separate arrays just like in the question, split it
    Fout_split = np.array_split(Fout,N)
    return Fout_split
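If you want each piece to match the 2-D arrays from the question exactly (the split keeps a leading length-1 axis, as visible in the sample run below), you can drop that axis afterwards; a minimal sketch:

F_list = [f[0] for f in vectorized_calcF(G, A)]   # each entry now has the same shape as G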
Sample run -
In [284]: # Original inputs
...: A = np.arange(1.,5.,1)
...: G = np.array([[1.,2.],[3.,4.]])
...:
In [285]: calcF(G,A)
Out[285]:
[array([[-0. , -0.00166667],
[-0.01142857, -0.03214286]]), array([[-0.00027778, 0. ],
[ 0.00019841, 0.00126984]]), array([[ 1.26984127e-03, 1.32275132e-04],
[ -0.00000000e+00, -7.93650794e-05]]), array([[-0.00803571, -0.00190476],
[-0.00017857, 0. ]])]
In [286]: vectorized_calcF(G,A) # Posted solution
Out[286]:
[array([[[-0. , -0.00166667],
[-0.01142857, -0.03214286]]]), array([[[-0.00027778, 0. ],
[ 0.00019841, 0.00126984]]]), array([[[ 1.26984127e-03, 1.32275132e-04],
[ -0.00000000e+00, -7.93650794e-05]]]), array([[[-0.00803571, -0.00190476],
[-0.00017857, 0. ]]])]
Runtime test -
In [289]: # Larger inputs
...: A = np.random.randint(1,500,(400))
...: G = np.random.randint(1,400,(20,20))
...:
In [290]: %timeit calcF(G,A)
1 loops, best of 3: 4.46 s per loop
In [291]: %timeit vectorized_calcF(G,A) # Posted solution
1 loops, best of 3: 1.87 s per loop
Vectorization with NumPy/MATLAB : General approach
Felt like I could throw in my two cents on my general approach; I would think others follow similar strategies when trying to vectorize code, especially on a high-level platform like NumPy or MATLAB. So, here's a quick checklist of things to consider for vectorization -
Idea about extending the dimensions: extend the dimensions of the input arrays such that the new dimensions hold results that would otherwise be generated iteratively within the nested loops (a small standalone sketch of this idea follows the list).
Where to start vectorizing? Start from the deepest stage of the computation (the loop the code iterates over most) and see how the inputs could be extended and the relevant computation brought in. Take good care to trace the iterators involved and extend dimensions accordingly. Move outwards to the outer loops until you are satisfied with the vectorization done.
How to take care of conditional statements? For simple cases, brute-force compute everything and see how the IF/ELSE parts could be taken care of later on. This would be highly context specific.
Are there dependencies? If so, see whether the dependencies could be traced and handled accordingly. This could form another topic for discussion, but here are a few examples I got myself involved with.
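As a small standalone illustration of the dimension-extension point above (not tied to the original question; x and y are made-up names), here is a double loop over pairwise differences replaced by a single broadcast operation:

import numpy as np

x = np.array([1., 2., 3.])
y = np.array([10., 20.])

# Loopy version: out[i, j] = x[i] - y[j]
out_loop = np.empty((x.size, y.size))
for i in range(x.size):
    for j in range(y.size):
        out_loop[i, j] = x[i] - y[j]

# Vectorized version: extend x with a new axis so that axis holds the j results
out_vec = x[:, None] - y

assert np.allclose(out_loop, out_vec)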
Related
Given the following 2-column array, I want to select items from the second column that correspond to "edges" in the first column. This is just an example, as in reality my a has potentially millions of rows. So, ideally I'd like to do this as fast as possible, and without creating intermediate results.
import numpy as np
a = np.array([[1,4],[1,2],[1,3],[2,6],[2,1],[2,8],[2,3],[2,1],
[3,6],[3,7],[5,4],[5,9],[5,1],[5,3],[5,2],[8,2],
[8,6],[8,8]])
i.e. I want to find the result,
desired = np.array([4,6,6,4,2])
which is entries in a[:,1] corresponding to where a[:,0] changes.
One solution is,
b = a[(a[1:,0]-a[:-1,0]).nonzero()[0]+1, 1]
which gives np.array([6,6,4,2]); I could simply prepend the first item, no problem. However, this creates an intermediate array of the indexes of the first items. I could avoid the intermediate by using a list comprehension:
c = [a[i+1,1] for i,(x,y) in enumerate(zip(a[1:,0],a[:-1,0])) if x!=y]
This also gives [6,6,4,2]. Assuming a generator-based zip (true in Python 3), this doesn't need to create an intermediate representation and should be very memory efficient. However, the inner loop is not numpy, and it necessitates generating a list which must be subsequently turned back into a numpy array.
Can you come up with a numpy-only version with the memory efficiency of c but the speed efficiency of b? Ideally only one pass over a is needed.
(Note that measuring the speed won't help much here, unless a is very big, so I wouldn't bother with benchmarking this, I just want something that is theoretically fast and memory efficient. For example, you can assume rows in a are streamed from a file and are slow to access -- another reason to avoid the b solution, as it requires a second random-access pass over a.)
Edit: a way to generate a large a matrix for testing:
from itertools import repeat
N, M = 100000, 100
# list() is needed on Python 3, where zip is lazy
a = np.array(list(zip([x for y in zip(*repeat(np.arange(N),M)) for x in y], np.random.random(N*M))))
I am afraid that if you are looking to do this in a vectorized way, you can't avoid an intermediate array, as there's no built-in for it.
Now, let's look for vectorized approaches other than nonzero(), which might be more performant. Going by the same idea of differencing as in the original code, (a[1:,0]-a[:-1,0]), we can use boolean indexing after looking for the non-zero differences that correspond to "edges" or shifts.
Thus, we would have a vectorized approach like so -
a[np.append(True,np.diff(a[:,0])!=0),1]
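For the sample a in the question, this one-liner returns array([4, 6, 6, 4, 2]), matching the desired output including the first row.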
Runtime test
The original solution a[(a[1:,0]-a[:-1,0]).nonzero()[0]+1,1] skips the first row. But for the sake of timing, let's treat it as a valid result. Here are the runtimes for it against the solution proposed in this post -
In [118]: from itertools import repeat
...: N, M = 100000, 2
...: a = np.array(zip([x for y in zip(*repeat(np.arange(N),M))\
for x in y ], np.random.random(N*M)))
...:
In [119]: %timeit a[(a[1:,0]-a[:-1,0]).nonzero()[0]+1,1]
100 loops, best of 3: 6.31 ms per loop
In [120]: %timeit a[1:][np.diff(a[:,0])!=0,1]
100 loops, best of 3: 4.51 ms per loop
Now, let's say you want to include the first row too. The updated runtimes would look something like this -
In [123]: from itertools import repeat
...: N, M = 100000, 2
...: a = np.array(zip([x for y in zip(*repeat(np.arange(N),M))\
for x in y ], np.random.random(N*M)))
...:
In [124]: %timeit a[np.append(0,(a[1:,0]-a[:-1,0]).nonzero()[0]+1),1]
100 loops, best of 3: 6.8 ms per loop
In [125]: %timeit a[np.append(True,np.diff(a[:,0])!=0),1]
100 loops, best of 3: 5 ms per loop
OK, actually I found a solution; I just learned about np.fromiter, which can build a numpy array from a generator:
d = np.fromiter((a[i+1,1] for i,(x,y) in enumerate(zip(a[1:,0],a[:-1,0])) if x!=y), int)
I think this does it: it generates a numpy array without any intermediate arrays. The caveat, however, is that it does not seem to be all that efficient! Forgetting what I said in the question about not benchmarking:
t = [lambda a: a[(a[1:,0]-a[:-1,0]).nonzero()[0]+1, 1],
     lambda a: np.array([a[i+1,1] for i,(x,y) in enumerate(zip(a[1:,0],a[:-1,0])) if x!=y]),
     lambda a: np.fromiter((a[i+1,1] for i,(x,y) in enumerate(zip(a[1:,0],a[:-1,0])) if x!=y), int)]
from timeit import Timer
# each entry is wrapped in a zero-argument callable for Timer
[Timer(lambda f=f: f(a)).timeit(number=10) for f in t]
[0.16596235800034265, 1.811289312000099, 2.1662971739997374]
It seems the first solution is drastically faster! I assume this is because even though it generates intermediate data, it is able to perform the inner loop completely in numpy, while the others run Python code for each item in the array.
Like I said, this is why I'm not sure this kind of benchmarking makes sense here -- if accesses to a were much slower, the benchmark wouldn't be CPU-bound. Thoughts?
Not "accepting" this answer since I am hoping someone can come up with something faster.
If memory efficiency is your concern, that can be addressed: the only intermediate of the same order of size as the input data can be made of type bool (a[1:,0] != a[:-1,0]); if your input data is int32, that is 8 times smaller than a itself. You can also count the nonzeros of that boolean array to preallocate the output array, though that should not be very significant either if the output of the != is as sparse as your example suggests. A small sketch of this idea follows.
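A minimal sketch of that suggestion (my own wiring, using the 2-column integer array a from the question):

import numpy as np

a = np.array([[1,4],[1,2],[1,3],[2,6],[2,1],[2,8],[2,3],[2,1],
              [3,6],[3,7],[5,4],[5,9],[5,1],[5,3],[5,2],[8,2],
              [8,6],[8,8]])

mask = a[1:,0] != a[:-1,0]                                   # bool intermediate, much smaller than a
out = np.empty(np.count_nonzero(mask) + 1, dtype=a.dtype)    # preallocate; +1 for the first row
out[0] = a[0,1]
out[1:] = a[1:,1][mask]
# out is now array([4, 6, 6, 4, 2])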
I'm looking to remove values that fall within a constant range around values held in a second array. I.e., I have one large NumPy array and I want to remove values within +-3 of the entries of another array of specific values, say [20,50,90,210]. So if my large array were [14,21,48,54,92,215], I would want [14,54,215] returned. The values are double precision, so rather than building a large mask array that matches specific values exactly, I want to use a range.
You mentioned that you wanted to avoid a large mask array. Unless both your "large array" and your "specific values" array are very large, I wouldn't try to avoid this. Often, with numpy it's best to allow relatively large temporary arrays to be created.
However, if you do need to control memory usage more tightly, you have several options. A typical trick is to only vectorize one part of the operation and iterate over the shorter input (this is shown in the second example below). It saves having nested loops in Python, and can significantly decrease the memory usage involved.
I'll show three different approaches. There are several others (including dropping down to C or Cython if you really need tight control and performance), but hopefully this gives you some ideas.
On a side note, for these small inputs, the overhead of array creation will overwhelm the differences. The speed and memory usage I'm referring to is only for large (>~1e6 elements) arrays.
Fully vectorized, but most memory usage
The easiest way is to calculate all of the distances at once and then reduce them down to a mask with the same length as the initial array. For example:
import numpy as np
vals = np.array([14,21,48,54,92,215])
other = np.array([20,50,90,210])
dist = np.abs(vals[:,None] - other[None,:])
mask = np.all(dist > 3, axis=1)
result = vals[mask]
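With the sample vals and other above, result comes out as array([ 14,  54, 215]).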
Partially vectorized, intermediate memory usage
Another option is to build up the mask iteratively for each element in the "specific values" array. This iterates over all elements of the shorter "specific values" array (a.k.a. other in this case):
import numpy as np
vals = np.array([14,21,48,54,92,215])
other = np.array([20,50,90,210])

mask = np.ones(len(vals), dtype=bool)
for num in other:
    dist = np.abs(vals - num)
    mask &= dist > 3
result = vals[mask]
Slowest, but lowest memory usage
Finally, if you really want to reduce memory usage, you could iterate over every item in your large array:
import numpy as np
vals = np.array([14,21,48,54,92,215])
other = np.array([20,50,90,210])

result = []
for num in vals:
    if np.all(np.abs(num - other) > 3):
        result.append(num)
The temporary list in that case is likely to take up more memory than the mask in the previous version. However, you could avoid the temporary list by using np.fromiter if you wanted. The timing comparison below shows an example of this.
Timing Comparisons
Let's compare the speed of these functions. We'll use 10,000,000 elements in the "large array" and 4 values in the "specific values" array. The relative speed and memory usage of these functions depend strongly on the sizes of the two arrays, so you should only consider this as a vague guideline.
import numpy as np

vals = np.random.random(10**7)   # np.random.random expects an integer size
other = np.array([0.1, 0.5, 0.8, 0.95])
tolerance = 0.05

def basic(vals, other, tolerance):
    dist = np.abs(vals[:,None] - other[None,:])
    mask = np.all(dist > tolerance, axis=1)
    return vals[mask]

def intermediate(vals, other, tolerance):
    mask = np.ones(len(vals), dtype=bool)
    for num in other:
        dist = np.abs(vals - num)
        mask &= dist > tolerance
    return vals[mask]

def slow(vals, other, tolerance):
    def func(vals, other, tolerance):
        for num in vals:
            if np.all(np.abs(num - other) > tolerance):
                yield num
    return np.fromiter(func(vals, other, tolerance), dtype=vals.dtype)
And in this case, the partially vectorized version wins out. That's to be expected in most cases where vals is significantly longer than other. However, the first example (basic) is almost as fast, and is arguably simpler.
In [7]: %timeit basic(vals, other, tolerance)
1 loops, best of 3: 1.45 s per loop
In [8]: %timeit intermediate(vals, other, tolerance)
1 loops, best of 3: 917 ms per loop
In [9]: %timeit slow(vals, other, tolerance)
1 loops, best of 3: 2min 30s per loop
Whichever way you choose to implement things, these are common vectorization "tricks" that show up in many problems. In high-level languages like Python, Matlab, R, etc., it's often useful to try fully vectorizing first, then mix vectorization and explicit loops if memory usage is an issue. Which one is best usually depends on the relative sizes of the inputs, but this is a common pattern to try when trading off speed against memory usage in high-level scientific programming.
You can try:
def closestmatch(x, y):
    val = np.abs(x - y)
    return val.min() >= 3
Then:
b[np.array([closestmatch(a, x) for x in b])]
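For what it's worth, this assumes b is the large array and a holds the specific values (the names are swapped relative to the question); with the question's data it keeps [14, 54, 215]. Note that the >= 3 test also keeps values exactly 3 away from a target, so use > 3 if those should be removed as well.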
I have come across the following issue when multiplying numpy arrays. In the example below (which is slightly simplified from the real version I am dealing with), I start with a nearly empty array A and a full array C. I then use a recursive algorithm to fill in A.
Below, I perform this algorithm in two different ways. The first method involves the operations
n_array = np.arange(0,c-1)
temp_vec= C[c-n_array] * A[n_array]
A[c] += temp_vec.sum(axis=0)
while the second method involves the for loop
for m in range(0, c - 1):
    B[c] += C[c-m] * B[m]
Note that the arrays A and B are identical, but they are filled in using the two different methods.
In the example below I time how long it takes to perform the computation using each method. I find that, for example, with n_pix=2 and max_counts = 400, the first method is much faster than the second (that is, time_np is much smaller than time_for). However, when I then switch to, for example, n_pix=1000 and max_counts = 400, instead I find method 2 is much faster (time_for is much smaller than time_np). I would have thought that method 1 would always be faster since method 2 explicitly runs over a loop while method 1 uses np.multiply.
So, I have two questions:
Why does the timing behave this way as a function of n_pix for a fixed max_counts?
What is the optimal method for writing this code so that it behaves quickly for all n_pix?
That is, can anyone suggest a method 3? In my project, it is very important for this piece of code to perform quickly over a range of large and small n_pix.
import numpy as np
import time

def return_timing(n_pix, max_counts):
    A = np.zeros((max_counts+1, n_pix))
    A[0] = np.random.random(n_pix)*1.8
    A[1] = np.random.random(n_pix)*2.3
    B = np.zeros((max_counts+1, n_pix))
    B[0] = A[0]
    B[1] = A[1]
    C = np.outer(np.random.random(max_counts+1), np.random.random(n_pix))*3.24

    time_np = 0
    time_for = 0
    for c in range(2, max_counts + 1):
        t0 = time.time()
        n_array = np.arange(0, c-1)
        temp_vec = C[c-n_array] * A[n_array]
        A[c] += temp_vec.sum(axis=0)
        time_np += time.time() - t0

        t0 = time.time()
        for m in range(0, c - 1):
            B[c] += C[c-m] * B[m]
        time_for += time.time() - t0

    return time_np, time_for
First of all, you can easily replace:
n_array = np.arange(0,c-1)
temp_vec= C[c-n_array] * A[n_array]
A[c] += temp_vec.sum(axis=0)
with:
A[c] += (C[c:1:-1] * A[:c-1]).sum(0)
This is much faster because indexing with an array is much slower than slicing. But the temp_vec is still hidden in there, created before summing is done. This leads to the idea of using einsum, which is the fastest because it doesn't make the temp array.
A[c] = np.einsum('ij,ij->j', C[c:1:-1], A[:c-1])
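For concreteness, here is a hedged sketch of how that einsum line would slot into the recursion from the question (a possible "method 3", reusing the A, C and max_counts names from return_timing):

for c in range(2, max_counts + 1):
    # elementwise multiply C[c:1:-1] and A[:c-1], then sum over the first axis,
    # all in one call and without materialising the temporary product
    A[c] = np.einsum('ij,ij->j', C[c:1:-1], A[:c-1])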
Timing. For small arrays:
>>> return_timing(10,10)
numpy OP 0.000525951385498
loop OP 0.000250101089478
numpy slice 0.000246047973633
einsum 0.000170946121216
For large:
>>> return_timing(1000,100)
numpy OP 0.185983896255
loop OP 0.0458009243011
numpy slice 0.038364648819
einsum 0.0167834758759
It is probably because your numpy-only version requires creation/allocation of new ndarrays (temp_vec and n_array), while your other method does not.
Creation of new ndarrays is very slow, and if you can modify your code in such a way that it no longer has to continuously create them, I would expect you could get better performance out of that method. One possible way of doing that is sketched below.
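As one illustration of that advice (my own sketch, reusing the names from the question; buf is a made-up scratch array): preallocate a buffer once and reuse it via the out= argument of the ufunc, so the inner update no longer allocates a fresh temporary on every iteration.

buf = np.empty((max_counts, n_pix))                   # scratch buffer, allocated once
for c in range(2, max_counts + 1):
    np.multiply(C[c:1:-1], A[:c-1], out=buf[:c-1])    # write the products into the buffer
    A[c] = buf[:c-1].sum(axis=0)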
I have a large csr_matrix and I am interested in the top ten values and their indices each row. But I did not find a decent way to manipulate the matrix.
Here is my current solution and the main idea is to process them row by row:
row = csr_matrix.getrow(row_number).toarray()[0].ravel()
top_ten_indicies = row.argsort()[-10:]
top_ten_values = row[row.argsort()[-10:]]
By doing this, the advantages of csr_matrix are not fully used. It's more like a brute-force solution.
I don't see what the advantages of csr format are in this case. Sure, all the nonzero values are collected in one .data array, with the corresponding column indexes in .indices. But they are in blocks of varying length. And that means they can't be processed in parallel or with numpy array strides.
One solution is to pad those blocks into common-length blocks. That's what .toarray() does. Then you can find the maximum values with argsort(axis=1) or with argpartition.
Another is to break them into row-sized blocks and process each of those. That's what you are doing with .getrow. Yet another way of breaking them up is to convert to lil format and process the sublists of the .data and .rows arrays.
A possible third option is to use the ufunc reduceat method. This lets you apply ufunc reduction methods to sequential blocks of an array. There are established ufuncs like np.add that take advantage of this. argsort is not such a function. But there is a way of constructing a ufunc from a Python function, and gaining some modest speed over regular Python iteration. [I need to look up a recent SO question that illustrates this.]
I'll illustrate some of this with a simpler function, sum over rows.
If A2 is a csr matrix.
A2.sum(axis=1) # the fastest compiled csr method
A2.A.sum(axis=1) # same, but with a dense intermediary
[np.sum(l.data) for l in A2] # iterate over the rows of A2
[np.sum(A2.getrow(i).data) for i in range(A2.shape[0])] # iterate with index
[np.sum(l) for l in A2.tolil().data] # sum the sublists of lil format
np.add.reduceat(A2.data, A2.indptr[:-1]) # with reduceat
A2.sum(axis=1) is implemented as a matrix multiplication. That's not relevant to the sort problem, but still an interesting way of looking at the summation problem. Remember csr format was developed for efficient multiplication.
For my current sample matrix (created for another SO sparse question)
<8x47752 sparse matrix of type '<class 'numpy.float32'>'
with 32 stored elements in Compressed Sparse Row format>
some comparative times are
In [694]: timeit np.add.reduceat(A2.data, A2.indptr[:-1])
100000 loops, best of 3: 7.41 µs per loop
In [695]: timeit A2.sum(axis=1)
10000 loops, best of 3: 71.6 µs per loop
In [696]: timeit [np.sum(l) for l in A2.tolil().data]
1000 loops, best of 3: 280 µs per loop
Everything else is 1ms or more.
I suggest focusing on developing your one-row function, something like:
def max_n(row_data, row_indices, n):
    i = row_data.argsort()[-n:]
    # i = row_data.argpartition(-n)[-n:]
    top_values = row_data[i]
    top_indices = row_indices[i]  # do the sparse indices matter?
    return top_values, top_indices, i
Then see how it fits into one of these iteration methods. tolil() looks most promising; a small sketch follows.
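A hedged sketch of that combination (my own wiring, assuming A2 is the csr matrix and n=10; the lil sublists are plain Python lists, so they are wrapped in arrays first):

Al = A2.tolil()
top = [max_n(np.array(data), np.array(rows), 10)
       for data, rows in zip(Al.data, Al.rows)]
# each entry is (top_values, top_indices, i) for one row;
# rows with fewer than 10 stored values simply return fewer entries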
I haven't addressed the question of how to collect these results. Should they be lists of lists, array with 10 columns, another sparse matrix with 10 values per row, etc.?
sorting each row of a large sparse & saving top K values & column index - Similar question from several years back, but unanswered.
Argmax of each row or column in scipy sparse matrix - Recent question seeking argmax for rows of csr. I discuss some of the same issues.
how to speed up loop in numpy? - example of how to use np.frompyfunc to create a ufunc. I don't know if the resulting function has the .reduceat method.
Increasing value of top k elements in sparse matrix - get the top k elements of csr (not by row). Case for argpartition.
The row summation implemented with np.frompyfunc:
In [741]: def foo(a,b):
     ...:     return a+b
In [742]: vfoo=np.frompyfunc(foo,2,1)
In [743]: timeit vfoo.reduceat(A2.data,A2.indptr[:-1],dtype=object).astype(float)
10000 loops, best of 3: 26.2 µs per loop
That's respectable speed. But I can't think of a way of writing a binary function (one that takes 2 arguments) that would implement argsort via reduction. So this is probably a dead end for this problem.
Just to answer the original question (for people like me who found this question looking for copy-pasta), here's a solution using multiprocessing, based on @hpaulj's suggestion of converting to lil_matrix and iterating over rows:
from multiprocessing import Pool
from functools import partial

def _top_k(args, k):
    """
    Helper function to process a single row of top_k
    """
    data, row = args
    data, row = zip(*sorted(zip(data, row), reverse=True)[:k])
    return data, row

def top_k(m, k):
    """
    Keep only the top k elements of each row in a csr_matrix
    """
    ml = m.tolil()
    with Pool() as p:
        # bind k explicitly; the original relied on k being visible as a global in the workers
        ms = p.map(partial(_top_k, k=k), zip(ml.data, ml.rows))
    ml.data, ml.rows = zip(*ms)
    return ml.tocsr()
You would need to iterate over the rows and get the top indices for each row separately. But this loop can be JIT-compiled (and parallelized) with numba to get an extremely fast function.
import numba as nb
import numpy as np

@nb.njit(cache=True)  # pass parallel=True as well if you want nb.prange to actually run in parallel
def row_topk_csr(data, indices, indptr, K):
    # assumes every row has at least K stored values
    m = indptr.shape[0] - 1
    max_indices = np.zeros((m, K), dtype=indices.dtype)
    max_values = np.zeros((m, K), dtype=data.dtype)
    for i in nb.prange(m):
        top_inds = np.argsort(data[indptr[i] : indptr[i + 1]])[::-1][:K]
        max_indices[i] = indices[indptr[i] : indptr[i + 1]][top_inds]
        max_values[i] = data[indptr[i] : indptr[i + 1]][top_inds]
    return max_indices, max_values
Call it like this:
top_pred_indices, _ = row_topk_csr(csr_mat.data, csr_mat.indices, csr_mat.indptr, K)
I need to perform this operation frequently, and this function is fast enough for me; it executes in under 1 s on a 1M x 400K sparse matrix.
HTH.
I have written some code in which, for a range of years (e.g. 15 years), ndimage.filters.convolve is used to convolve an array (e.g. array1); then, where the resulting array (e.g. array2) is above a randomly generated number, another array (e.g. array3) is given a value of 1. Once array3 has been assigned a value of one, it counts up every year, and when it eventually reaches a certain value (e.g. 5), array1 is updated at that location.
Sorry if this is a little confusing. I've actually got the script working by using numpy.where(boolean expression, value, value), but where I needed multiple expressions (e.g. where array2 == 1 and array3 == 0), I used a for loop to iterate through each value in the arrays. This works great in the example here, but when I substitute larger arrays (the full script imports GIS grids and converts them into arrays), this for loop takes a few minutes to process each year. As we have to run the model over 60 years, 1000 times, I need to find a much more efficient way to process these arrays.
I've tried to use multiple expressions within numpy.where but couldn't work out how to make it work. I also tried zip(array) to zip the arrays together, but I couldn't update them, I think because this creates tuples of the array elements.
I've attached a copy of the script; as mentioned earlier, it works exactly as I need it to. However, it needs to do this more efficiently. If anyone has any suggestions, that would be great. This is my first post regarding Python, so I still consider myself a novice.
import numpy as np
from scipy import ndimage
import random
from pylab import *

###################### FUNCTIONS ###########################
def convolveArray1(array1, kern1):
    newArray = ndimage.filters.convolve(array1, kern1, mode='constant')
    return newArray

######################## MAIN ##############################
## Set the number of years
nYears = range(1,16)

## Create array1
array1 = np.zeros((10,10), dtype=np.int) # vegThreshMask
# Add some values to array1
array1[[4,4],[4,5]] = 8
array1[5,4] = 8
array1[5,5] = 8

## Create kernel array
kernal = np.ones((3,3), dtype=np.float32)

## Create an empty array to be used as counter
array3 = np.zeros((10,10), dtype=np.int)

## Iterate through nYears
for y, yea in enumerate(nYears):
    # Create a random number for the year
    randNum = randint(7, 40)
    print 'The random number for year %i is %i' % (yea, randNum)
    print
    # Call the convolveArray function
    convArray = convolveArray1(array1, kernal)
    # Update array2 where it is greater than the random number
    array2 = np.where(convArray > randNum, 1, 0)
    print 'Where convArray > randNum in year %i' % (yea)
    print array2
    print
    # Iterate through array2
    for a, ar in enumerate(array2):
        for b, arr in enumerate(ar):
            if all(arr == 1 and array3[a][b] == 0):
                array3[a][b] = 1
            else:
                if array3[a][b] > 0:
                    array3[a][b] = array3[a][b] + 1
                    if array3[a][b] == 5:
                        array1[a][b] = 8
    # Remove the initial array (array1) from the updated array3
    array3 = np.where(array1 > 0, 0, array3)
    print 'New array3 after %i years' % (yea)
    print '(Excluding initial array)'
    print array3
    print

print 'The final output of the initial array'
print array1
I suspect you could gain a substantial speedup if you start using broadcasting. For example, starting from your line # Iterate through array2 we can remove the explicit loop and simply broadcast over the variables we want to change. Note I'm using AX instead of arrayX for clarity:
# Iterate through A2
idx = (A2==1) & (A3==0)
idx2 = (~idx) & (A3>0)
A3[idx ] = 1
A3[idx2] += 1
A1[A3==5] = 8
In addition, this greatly improves code clarity once you get used to this style as you aren't explicitly dealing with the indices (your a and b here).
Is it worth the trouble?
I asked the OP to do a speed test after trying the code above:
If you do implement loop change, please let me know the speed-up on your real-world code.
It would be useful to know if the advice given is simply glorified syntactic sugar, or has a notable effect.
After testing, the response was a substantial 40x speedup! When dealing with large arrays of contiguous data where simple masks are being applied, numpy is a far better alternative to native Python lists.
It sounds like you were trying to use multiple conditions in np.where using expressions like array1 > 0 and array2 < 0. This doesn't work because of the way boolean operations work in Python, as documented here. First, array1 > 0 is evaluated, then it is converted to a boolean value using the __nonzero__ method (renamed to __bool__ in Python 3). There isn't a unique useful way of converting an array into a bool, and there is currently no way of overriding the behaviour of the boolean operators (though I believe this is being discussed for future versions), so in numpy, ndarray.__nonzero__ is defined to raise an exception. Instead, you can use np.logical_and, np.logical_or, and np.logical_not, which have the behaviour you would expect.
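For example (my own illustration, using hypothetical placeholder values for array1 and array2): both the logical functions and the bitwise operators work element-wise on boolean arrays, so either form below can express a combined condition inside np.where.

import numpy as np

array1 = np.array([ 1, -2,  3, 0])
array2 = np.array([-1,  5, -4, 2])

# using the logical functions
out1 = np.where(np.logical_and(array1 > 0, array2 < 0), 1, 0)

# equivalent: the bitwise operators &, |, ~ also work element-wise on boolean arrays,
# but the comparisons must be parenthesised because of operator precedence
out2 = np.where((array1 > 0) & (array2 < 0), 1, 0)

assert np.array_equal(out1, out2)   # both give [1, 0, 1, 0]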
I don't know how much of a speedup this will give you, though. If you do end up performing lots of array indexing operations in loops, it might be worth looking into cython, with which you can easily speed up array operations by moving them into a C extension.