Numpy Array index problems - python

I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me an array of zeros that will be 5X5. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
for j in xrange(0,10,2):
.........
"do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10] is a function of 2 and 10 but instead the index for a function of 2 and 10 would be a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I could not in Cython. Further I used Joblib to Parallelize the operation. I stored the results in a list because an array was not filling right when running in Parallel. I then used Itertools to split the list into individual results and Pandas to organize the results.
Thank you for all the help

Some tips for your to get the things done keeping a good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys)
return x**2 + y**2 + x*y
where you can pass xs and ys as arrays and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding about how the arrays's operations work. This will help you to design a function that can handle properly the arrays.

First, you lack some parenthesis with zeros, the first argument should be a tuple :
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2 :
for i in xrange(0,10,2):
for j in xrange(0,10,2):
# do function and fill the array corresponding to (i,j)
a[i/2, j/2] = 1
But I second Saullo Castro, you should try to vectorize your computations.

Related

List comprehension for np matrixes

I have two np.matrixes, one of which I'm trying to normalize. I know, in general, list comprehensions are faster than for loops, so I'm trying to convert my double for loop into a list expression.
# normalize the rows and columns of A by B
for i in range(1,q+1):
for j in range(1,q+1):
A[i-1,j-1] = A[i-1,j-1] / (B[i-1] / B[j-1])
This is what I have gotten so far:
A = np.asarray([A/(B[i-1]/B[j-1]) for i, j in zip(range(1,q+1), range(1,q+1))])
but I think I'm taking the wrong approach because I'm not seeing any significant time difference.
Any help would be appreciated.
First, if you really do mean np.matrix, stop using np.matrix. It has all sorts of nasty incompatibilities, and its role is obsolete now that # for matrix multiplication exists. Even if you're stuck on a Python version without #, using the dot method with normal ndarrays is still better than dealing with np.matrix.
You shouldn't use any sort of Python-level iteration construct with NumPy arrays, whether for loops or list comprehensions, unless you're sure you have no better options. Assuming A is 2D and B is 1D with shapes (q, q) and (q,) respectively, what you should instead do for this case is
A *= B
A /= B[:, np.newaxis]
broadcasting the operation over A. This will allow NumPy to perform the iteration at C level directly over the arrays' underlying data buffers, without having to create wrapper objects and perform dynamic dispatch on every operation.

trying to sum two arrays

I'm trying to code something like this:
where x and y are two different numpy arrays and the j is an index for the array. I don't know the length of the array because it will be entered by the user and I cannot use loops to code this.
My main problem is finding a way to move between indexes since i would need to go from
x[2]-x[1] ... x[3]-x[2]
and so on.
I'm stumped but I would appreciate any clues.
A numpy-ic solution would be:
np.square(np.diff(x)).sum() + np.square(np.diff(y)).sum()
A list comprehension approach would be:
sum([(x[k]-x[k-1])**2+(y[k]-y[k-1])**2 for k in range(1,len(x))])
will give you the result you want, even if your data appears as list.
x[2]-x[1] ... x[3]-x[2] can be generalized to:
x[[1,2,3,...]-x[[0,1,2,...]]
x[1:]-x[:-1] # ie. (1 to the end)-(0 to almost the end)
numpy can take the difference between two arrays of the same shape
In list terms this would be
[i-j for i,j in zip(x[1:], x[:-1])]
np.diff does essentially this, a[slice1]-a[slice2], where the slices are as above.
The full answer squares, sums and squareroots.

Array operations using multiple indices of same array

I am very new to Python, and I am trying to get used to performing Python's array operations rather than looping through arrays. Below is an example of the kind of looping operation I am doing, but am unable to work out a suitable pure array operation that does not rely on loops:
import numpy as np
def f(arg1, arg2):
# an arbitrary function
def myFunction(a1DNumpyArray):
A = a1DNumpyArray
# Create a square array with each dimension the size of the argument array.
B = np.zeros((A.size, A.size))
# Function f is a function of two elements of the 1D array. For each
# element, i, I want to perform the function on it and every element
# before it, and store the result in the square array, multiplied by
# the difference between the ith and (i-1)th element.
for i in range(A.size):
B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
# Sum through j and return full sums as 1D array.
return np.sum(B, axis=0)
In short, I am integrating a function which takes two elements of the same array as arguments, returning an array of results of the integral.
Is there a more compact way to do this, without using loops?
The use of an arbitrary f function, and this [i, :i] business complicates by passing a loop.
Most of the fast compiled numpy operations work on the whole array, or whole rows and/or columns, and effectively do so in parallel. Loops that are inherently sequential (value from one loop depends on the previous) don't fit well. And different size lists or arrays in each loop are also a good indicator that 'vectorizing' will be difficult.
for i in range(A.size):
B[i,:i] = f(A[i], A[:i])*(A[i]-A[i-1])
With a sample A and known f (as simple as arg1*arg2), I'd generate a B array, and look for patterns that treat B as a whole. At first glance it looks like your B is a lower triangle. There are functions to help index those. But that final sum might change the picture.
Sometimes I tackle these problems with a bottom up approach, trying to remove inner loops first. But in this case, I think some sort of big-picture approach is needed.

Python - Difficulty Calculating Partition Function using Large NumPy arrays

I had a pretty compact way of computing the partition function of an Ising-like model using itertools, lambda functions, and large NumPy arrays. Given a network consisting of N nodes and Q "states"/node, I have two arrays, h-fields and J-couplings, of sizes (N,Q) and (N,N,Q,Q) respectively. J is upper-triangular, however. Using these arrays, I have been computing the partition function Z using the following method:
# Set up lambda functions and iteration tuples of the form (A_1, A_2, ..., A_n)
iters = itertools.product(range(Q),repeat=N)
hf = lambda s: h[range(N),s]
jf = lambda s: np.array([J[fi,fj,s[fi],s[fj]] \
for fi,fj in itertools.combinations(range(N),2)]).flatten()
# Initialize and populate partition function array
pf = np.zeros(tuple([Q for i in range(N)]))
for it in iters:
hterms = np.exp(hf(it)).prod()
jterms = np.exp(-jf(it)).prod()
pf[it] = jterms * hterms
# Calculates partition function
Z = pf.sum()
This method works quickly for small N and Q, say (N,Q) = (5,2). However, for larger systems (N,Q) = (18,3), this method cannot even create the pf array due to memory issues because it has Q^N nontrivial elements. Any ideas on how to either overcome this memory issue or how to alter the code to work on subarrays?
Edit: Made a small mistake in the definition of jf. It has been corrected.
You can avoid the large array just by initializing Z to 0, and incrementing it by jterms * iterms in each iteration. This still won't get you out of calculating and summing Q^N numbers, however. To do that, you probably need to figure out a way to simplify the partition function algebraically.
Not sure what you are trying to compute but I tested your code with ChrisB suggestion and jf will not work for Q=3.
Perhaps you shouldn't use a dense numpy array to encode your function? You could try sparse arrays or just straight Python with Numba compilation. This blogpost shows using Numba on the simple Ising model with good performance.

numpy: efficient execution of a complex reshape of an array

I am reading a vendor-provided large binary array into a 2D numpy array tempfid(M, N)
# load data
data=numpy.fromfile(file=dirname+'/fid', dtype=numpy.dtype('i4'))
# convert to complex data
fid=data[::2]+1j*data[1::2]
tempfid=fid.reshape(I*J*K, N)
and then I need to reshape it into a 4D array useful4d(N,I,J,K) using non-trivial mappings for the indices. I do this with a for loop along the following lines:
for idx in range(M):
i=f1(idx) # f1, f2, and f3 are functions involving / and % as well as some lookups
j=f2(idx)
k=f3(idx)
newfid[:,i,j,k] = tempfid[idx,:] #SLOW! CAN WE IMPROVE THIS?
Converting to complex takes 33% of the time while the copying of these slices M slices takes the remaining 66%. Calculating the indices is fast irrespective of whether I do this one by one in a loop as shown or by numpy.vectorizing the operation and applying it to an arange(M).
Is there a way to speed this up? Any help on more efficient slicing, copying (or not) etc appreciated.
EDIT:
As learned in the answer to question "What's the fastest way to convert an interleaved NumPy integer array to complex64?" the conversion to complex can be sped up by a factor of 6 if a view is used instead:
fid = data.astype(numpy.float32).view(numpy.complex64)
idx = numpy.arange(M)
i = numpy.vectorize(f1)(idx)
j = numpy.vectorize(f2)(idx)
k = numpy.vectorize(f3)(idx)
# you can index arrays with other arrays
# that lets you specify this operation in one line.
newfid[:, i,j,k] = tempfid.T
I've never used numpy's vectorize. Vectorize just means that numpy will call your python function multiple times. In order to get speed, you need use array operations like the one I showed here and you used to get complex numbers.
EDIT
The problem is that the dimension of size 128 was first in newfid, but last in tempfid. This is easily by using .T which takes the transpose.
How about this. Set us your indicies using the vectorized versions of f1,f2,f3 (not necessarily using np.vectorize, but perhaps just writing a function that takes an array and returns an array), then use np.ix_:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ix_.html
to get the index arrays. Then reshape tempfid to the same shape as newfid and then use the results of np.ix_ to set the values. For example:
tempfid = np.arange(10)
i = f1(idx) # i = [4,3,2,1,0]
j = f2(idx) # j = [1,0]
ii = np.ix_(i,j)
newfid = tempfid.reshape((5,2))[ii]
This maps the elements of tempfid onto a new shape with a different ordering.

Categories

Resources