iterate through number of columns, with variable columns - python

For example, let's consider this toy code
import numpy as np
import numpy.random as rnd
a = rnd.randint(0,10,(10,10))
k = (1,2)
b = a[:,k]
for col in np.arange(np.size(b,1)):
    b[:,col] = b[:,col]+col*100
This code works when k contains more than one index. However, when k is a single index (e.g. k = 1), the sub-matrix extracted from a becomes a 1-D array (a row vector), and the for loop throws an error.
Of course, I could fix this by checking the dimension of b and reshaping:
if np.ndim(b) == 1:
    b = np.reshape(b, (np.size(b), 1))
in order to obtain a column vector, but this is expensive.
So, the question is: what is the best way to handle this situation?
This seems like something that would arise quite often and I wonder what is the best strategy to deal with it.

If you index with a list or tuple, the 2d shape is preserved:
In [638]: a=np.random.randint(0,10,(10,10))
In [639]: a[:,(1,2)].shape
Out[639]: (10, 2)
In [640]: a[:,(1,)].shape
Out[640]: (10, 1)
And I think b iteration can be simplified to:
a[:,k] += np.arange(len(k))*100
This sort of calculation will also be easier if k is always a list or tuple, and never a scalar (a scalar does not have a len).
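A quick check, using the toy setup from the question, that the one-line update gives the same columns as the original loop. Note that the one-liner modifies a itself, while the loop in the question only changes b, which is a copy (fancy indexing copies). The seeded generator is only there to make the check repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the comparison is repeatable
a = rng.integers(0, 10, size=(10, 10))
k = (1, 2)

# Original approach: extract the columns (a copy), then loop over them
b = a[:, k]
for col in range(b.shape[1]):
    b[:, col] = b[:, col] + col * 100

# Vectorized approach: [0, 100, ...] broadcasts down every row of the selection
a2 = a.copy()
a2[:, k] += np.arange(len(k)) * 100

print(np.array_equal(a2[:, k], b))      # True
```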
np.column_stack ensures its inputs are 2d (and expands at the end if not) with:
if arr.ndim < 2:
    arr = array(arr, copy=False, subok=True, ndmin=2).T
and np.atleast_2d does:
elif len(ary.shape) == 1:
    result = ary[newaxis,:]
which could, of course, be changed in this case to:
if b.ndim == 1:
    b = b[:,None]
Anyway, I think it is better to ensure that k is a tuple rather than adjust the shape of b afterwards. But keep both options in your toolbox.
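A minimal sketch of the "make sure k is a sequence" strategy, assuming k may arrive either as a single integer or as a list/tuple. np.atleast_1d turns a scalar into a length-1 array, so the column selection always stays 2-D. The helper name select_columns is hypothetical; the last lines show the after-the-fact b[:, None] fix for comparison.

```python
import numpy as np

def select_columns(a, k):
    """Return a[:, k] as a 2-D array whether k is an int or a sequence (hypothetical helper)."""
    k = np.atleast_1d(k)          # 1 -> array([1]); (1, 2) -> array([1, 2])
    return a[:, k]

a = np.arange(20).reshape(4, 5)
print(select_columns(a, 1).shape)       # (4, 1)
print(select_columns(a, (1, 2)).shape)  # (4, 2)

# Fixing the shape afterwards instead:
b = a[:, 1]                   # scalar index -> shape (4,)
if b.ndim == 1:
    b = b[:, None]            # same data viewed with shape (4, 1)
print(b.shape)                # (4, 1)
```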

Related

Problem with python while loop going over every element in 2d array

I am having problems with while loops in python right now:
while(j < len(firstx)):
    trainingset[j][0] = firstx[j]
    trainingset[j][1] = firsty[j]
    trainingset[j][2] = 1 #top
    print(trainingset)
    j += 1
    print("j is " + str(j))
i = 0
while(i < len(firsty)):
    trainingset[j+i][0] = twox[i]
    trainingset[j+i][1] = twoy[i]
    trainingset[j+i][2] = 0 #bottom
    i += 1
Here trainingset = [[0,0,0]]*points*2, where points is an integer. Also, firstx, firsty, twox and twoy are all NumPy arrays.
I want the training set to have 2*points array entries which go [firstx[0], firsty[0], 1] all the way to [twox[points-1], twoy[points-1], 0].
After some debugging, I realized that on each iteration every single value in the training set is being changed, so that when j = 0 all the training set values are replaced with firstx[0], firsty[0], and 1.
What am I doing wrong?
In this case, I would recommend a for loop instead of a while loop; for loops are great for when you want the index to increment and you know what the last increment value should be, which is true here.
I've had to make some assumptions about the shape of your arrays based on your code. I'm assuming that:
firstx, firsty, twox, and twoy are 1D NumPy arrays with either shape (length,) or (length, 1).
trainingset is a 2D NumPy array with at least len(firstx) + len(firsty) rows and at least 3 columns.
j starts at 0 before the while loop begins.
Given these assumptions, here's some code that gives you the output you want:
len_firstx = len(firstx)
# Replace each row with index j with the desired values
for j in range(len(firstx)):
    trainingset[j][0:3] = [firstx[j], firsty[j], 1]
# Because i starts at 0, len_firstx needs to be added to the trainingset row index
for i in range(len(firsty)):
    trainingset[i + len_firstx][0:3] = [twox[i], twoy[i], 0]
Let me know if you have any questions.
EDIT: Alright, looks like the above doesn't work correctly on rare occasions, not sure why, so if it's being fickle, you can change trainingset[j][0:3] and trainingset[i + len_firstx][0:3] to trainingset[j, 0:3] and trainingset[i + len_firstx, 0:3].
I think it has something to do with the shape of the trainingset array, but I'm not quite sure.
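The symptom in the question (every row changes at once) matches what happens when trainingset is built with list multiplication: [[0,0,0]]*n creates n references to one and the same inner list, so writing to one "row" writes to all of them. A small demonstration with a hypothetical points value, plus two ways to get independent rows:

```python
import numpy as np

points = 2
shared = [[0, 0, 0]] * points * 2       # four references to ONE inner list
shared[0][0] = 99
print(shared)       # [[99, 0, 0], [99, 0, 0], [99, 0, 0], [99, 0, 0]]

# Independent rows: the comprehension builds a new inner list each time
separate = [[0, 0, 0] for _ in range(points * 2)]
separate[0][0] = 99
print(separate)     # [[99, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]

# Or use a NumPy array, whose rows are distinct parts of one buffer
trainingset = np.zeros((points * 2, 3))
trainingset[0, 0] = 99
print(trainingset[:, 0])
```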
EDIT 2: There's also a more Pythonic way to do what you want instead of using loops. It standardizes the shapes of the four arrays assumed to be 1D (firstx, twox, etc. -- also, if you could let me know exactly what shape these arrays have, that would be super helpful and I could simplify the code!) and then makes the appropriate rows and columns in trainingset have the corresponding values.
# Function to reshape the 1D arrays.
# Works for shapes (length,), (length, 1), and (1, length).
def reshape_1d_array(arr):
    shape = arr.shape
    if len(shape) == 1:
        return arr[:, None]
    elif shape[0] >= shape[1]:
        return arr
    else:
        return arr.T
# Reshape the 1D arrays
firstx = reshape_1d_array(firstx)
firsty = reshape_1d_array(firsty)
twox = reshape_1d_array(twox)
twoy = reshape_1d_array(twoy)
len_firstx = len(firstx)
# The following 2 lines do what the loops did, but in 1 step.
arr1 = np.concatenate((firstx, firsty[0:len_firstx], np.array([[1]*len_firstx]).T), axis=1)
arr2 = np.concatenate((twox, twoy[0:len_firstx], np.array([[0]*len_firstx]).T), axis=1)
# Now put arr1 and arr2 where they're supposed to go in trainingset.
trainingset[:len_firstx, 0:3] = arr1
trainingset[len_firstx:len_firstx + len(firsty), 0:3] = arr2
This gives the same result as the two for loops I wrote before, but is faster if firstx has more than ~50 elements.
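For comparison, a shorter vectorized sketch using np.column_stack and np.vstack instead of reshaping and concatenating. It assumes all four inputs are 1-D arrays and that firstx/firsty and twox/twoy have matching lengths; the sample values are made up.

```python
import numpy as np

# Hypothetical sample inputs (1-D arrays)
firstx = np.array([1.0, 2.0, 3.0])
firsty = np.array([4.0, 5.0, 6.0])
twox = np.array([7.0, 8.0])
twoy = np.array([9.0, 10.0])

# column_stack treats each 1-D input as a column; ones/zeros supply the label column
top = np.column_stack((firstx, firsty, np.ones(len(firstx))))
bottom = np.column_stack((twox, twoy, np.zeros(len(twox))))

# Top rows first, then bottom rows
trainingset = np.vstack((top, bottom))
print(trainingset)
```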

Selecting numpy columns based on values in a row

Suppose I have a numpy array with 2 rows and 10 columns. I want to select the columns whose value in the first row is even. The outcome I want can be obtained as follows:
import numpy as np
a = list(range(10))
b = list(reversed(range(10)))
c = np.concatenate([a, b]).reshape(2, 10).T
c[c[:, 0] % 2 == 0].T
However, this method transposes twice and I don't suppose it's very pythonic. Is there a way to do the same job cleaner?
Numpy allows you to select along each dimension separately. You pass in a tuple of indices whose length is the number of dimensions.
Say your array is
a = np.random.randint(10, size=(2, 10))
The even elements in the first row are given by the mask
m = (a[0, :] % 2 == 0)
You can use a[0] to get the first row instead of a[0, :] because missing indices are synonymous with the slice : (take everything).
Now you can apply the mask to just the second dimension:
result = a[:, m]
You can also convert the mask to indices first. There are subtle differences between the two approaches, which you won't see in this simple case. The biggest difference is usually that linear indices are a little faster, especially if applied more than once:
i = np.flatnonzero(m)
result = a[:, i]
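Applied to the data from the question (built here directly in its 2 x 10 orientation), both the mask and the index version reproduce the double-transpose result:

```python
import numpy as np

a = list(range(10))
b = list(reversed(range(10)))
c = np.array([a, b])                    # shape (2, 10): row 0 = 0..9, row 1 = 9..0

# Double-transpose approach from the question, for reference
expected = c.T[c.T[:, 0] % 2 == 0].T

m = c[0] % 2 == 0                       # boolean mask over the columns
by_mask = c[:, m]
by_index = c[:, np.flatnonzero(m)]      # same columns, as integer positions

print(by_mask)
# [[0 2 4 6 8]
#  [9 7 5 3 1]]
```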

Is it possible to access the current indices during a Numpy vectorized broadcasting operation?

I would like to speed up a function on a single array in Numpy using fancy indexing, vectorization, and/or broadcasting. For each value in my array, I need to do a calculation that involves adjacent values. Therefore, in my vectorized operation, I need to have access to the current index so that I can grab indices around it. Consider the following simple array operation:
x = np.arange(36).reshape(6, 6)
y = np.zeros((6, 6))
y[:] = x + 1
I'd like to use similar syntax, but rather than a simple increment, I'd like to do something like adding all values at adjacent indices to the current value in the vectorized loop. For instance, if the region around the element at index [i, j] (whose value is 7) looks like
3 2 5
2 7 6
5 5 5
I'd like the calculated value for [i, j] to be 3 + 2 + 5 + 2 + 7 + 6 + 5 + 5 + 5, and I want to do that for all indices [i, j].
This is a straightforward nested for loop (or a single for loop using np.sum for each index)... but I want to use broadcasting and/or fancy indexing if possible. This may be too complex of a problem for the Numpy syntax, but I feel like it should be possible.
Essentially, it comes down to this: how do I reference the current index during a broadcasting operation?
Start with a 1D example:
x = np.arange(10)
There is a choice you have to make: do you discard the edges or not, since they don't have two neighbors? If you do, you can create your output array in essentially one step:
result = x[:-2] + x[1:-1] + x[2:]
Notice that all three addends are views because they use simple indexing. You want to avoid fancy indexing as much as you can because it generally involves making copies.
If you prefer to retain the edges, you can pre-allocate the output buffer and add directly into it:
result = x.copy()
result[:-1] += x[1:]
result[1:] += x[:-1]
The fundamental idea in both cases is that to apply an operation to all neighboring elements, you just shift the array by +/-1. You don't need to know any indices, or do anything fancy. The simpler the better.
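Running both 1-D variants above on x = np.arange(10) makes the difference visible: the no-edge version is two elements shorter, while the edge-retaining version sums only the neighbors that exist at each end.

```python
import numpy as np

x = np.arange(10)

# Discard edges: three shifted views, output has len(x) - 2 elements
no_edge = x[:-2] + x[1:-1] + x[2:]
print(no_edge)    # [ 3  6  9 12 15 18 21 24]

# Retain edges: start from x itself, then add the right and left neighbors
with_edge = x.copy()
with_edge[:-1] += x[1:]   # every element except the last gets its right neighbor
with_edge[1:] += x[:-1]   # every element except the first gets its left neighbor
print(with_edge)  # [ 1  3  6  9 12 15 18 21 24 17]
```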
Hopefully you can see how to generalize this to the 2D case. Rather than a single index shifting between -1, 0 and 1, you have two indices in every possible combination of -1, 0 and 1.
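Spelled out for 2-D without edges, the nine (row shift, column shift) combinations look like this. The explicit window-sum loop at the end is only there to check the result on the array from the question.

```python
import numpy as np

x = np.arange(36).reshape(6, 6)
n, m = x.shape

# Sum of the 3x3 neighborhood of every interior element: nine shifted views
result = np.zeros((n - 2, m - 2), dtype=x.dtype)
for di in (0, 1, 2):
    for dj in (0, 1, 2):
        result += x[di:n - 2 + di, dj:m - 2 + dj]

# Reference: explicit loop over interior centers (i, j)
check = np.array([[x[i - 1:i + 2, j - 1:j + 2].sum() for j in range(1, m - 1)]
                  for i in range(1, n - 1)])
print(np.array_equal(result, check))   # True
```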
Appendix
Here's the generalized approach for a no-edge result:
from itertools import product
def sum_shifted(a):
    result = np.zeros(tuple(x - 2 for x in a.shape), dtype=a.dtype)
    for index in product([slice(0, -2), slice(1, -1), slice(2, None)], repeat=a.ndim):
        result += a[index]
    return result
This implementation is somewhat rudimentary because it doesn't check for inputs with no dimensions or shapes < 2, but it does work for arbitrary numbers of dimensions.
Notice that for a 1D case the loop will run exactly three times, for 2D nine times, and for ND 3^N times. This is one case where I find an explicit for loop to be appropriate with numpy. The loop is very small compared to the work done on a large array, fast enough for a small array, and certainly better than writing all 27 possibilities out by hand for the 3D case.
One more thing to pay attention to is how the successive indices are generated. In Python, an index with colons, like x[1:2:3], is converted to the relatively unknown slice object slice(1, 2, 3). Since (almost) everything with commas gets interpreted as a tuple, the index in the expression x[1:2, ::-1, :2] is exactly equivalent to the tuple (slice(1, 2), slice(None, None, -1), slice(None, 2)). The loop generates exactly such a tuple, with one element for each dimension, so the result is simple (basic) indexing across all dimensions.
A similar approach is possible if you want to retain edges. The only significant difference is that you need to index both the input and the output arrays:
from itertools import product
def sum_shifted(a):
    result = np.zeros_like(a)
    for r_index, a_index in zip(product([slice(0, -1), slice(None), slice(1, None)], repeat=a.ndim),
                                product([slice(1, None), slice(None), slice(0, -1)], repeat=a.ndim)):
        result[r_index] += a[a_index]
    return result
This works because itertools.product guarantees the order of the iteration, so the two zipped iterators will stay in lockstep.
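One way to cross-check the edge-retaining version: zero-padding the input by one element on every side and then applying the no-edge version should give the same result, since the padded zeros add nothing to the border sums. A sketch, with both functions copied from this answer under distinct (hypothetical) names so they can coexist:

```python
import numpy as np
from itertools import product

def sum_shifted_no_edge(a):
    result = np.zeros(tuple(x - 2 for x in a.shape), dtype=a.dtype)
    for index in product([slice(0, -2), slice(1, -1), slice(2, None)], repeat=a.ndim):
        result += a[index]
    return result

def sum_shifted_with_edge(a):
    result = np.zeros_like(a)
    for r_index, a_index in zip(product([slice(0, -1), slice(None), slice(1, None)], repeat=a.ndim),
                                product([slice(1, None), slice(None), slice(0, -1)], repeat=a.ndim)):
        result[r_index] += a[a_index]
    return result

x = np.arange(36).reshape(6, 6)
padded = np.pad(x, 1)                   # default mode pads with zeros on every side
print(np.array_equal(sum_shifted_no_edge(padded), sum_shifted_with_edge(x)))  # True
```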
try this:
import numpy as np
x = np.arange(36).reshape(6, 6)
y = np.zeros((6, 6))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        # interior elements: all eight neighbors exist
        if i>0 and i<x.shape[0]-1 and j>0 and j<x.shape[1]-1:
            y[i,j]=x[i,j]+x[i-1,j]+x[i,j-1]+x[i-1,j-1]+x[i+1,j]+x[i,j+1]+x[i+1,j+1]+x[i-1,j+1]+x[i+1,j-1]
        # left column (including its two corners)
        if j==0:
            if i==0:
                y[i,j]=x[i,j]+x[i,j+1]+x[i+1,j+1]+x[i+1,j]
            elif i==x.shape[0]-1:
                y[i,j]=x[i,j]+x[i,j+1]+x[i-1,j+1]+x[i-1,j]
            else:
                y[i,j]=x[i,j]+x[i,j+1]+x[i+1,j+1]+x[i+1,j]+x[i-1,j]+x[i-1,j+1]
        # right column (including its two corners)
        if j==x.shape[1]-1:
            if i==0:
                y[i,j]=x[i,j]+x[i,j-1]+x[i+1,j-1]+x[i+1,j]
            elif i==x.shape[0]-1:
                y[i,j]=x[i,j]+x[i,j-1]+x[i-1,j-1]+x[i-1,j]
            else:
                y[i,j]=x[i,j]+x[i,j-1]+x[i-1,j-1]+x[i+1,j]+x[i-1,j]+x[i+1,j-1]
        # top and bottom rows, excluding the corners
        if i==0 and j in range(1,x.shape[1]-1):
            y[i,j]=x[i,j]+x[i,j-1]+x[i+1,j-1]+x[i+1,j]+x[i+1,j+1]+x[i,j+1]
        if i==x.shape[0]-1 and j in range(1,x.shape[1]-1):
            y[i,j]=x[i,j]+x[i,j-1]+x[i-1,j-1]+x[i-1,j]+x[i-1,j+1]+x[i,j+1]
print(y)
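The boundary cases in this loop can be collapsed by letting slicing clip the 3x3 window at the edges. This is still an explicit per-element loop with np.sum (the approach the question mentions), not a vectorized solution:

```python
import numpy as np

x = np.arange(36).reshape(6, 6)
y = np.zeros((6, 6))
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        # max(..., 0) stops a negative start index (which would wrap around);
        # an end index past the array edge is simply clipped by slicing
        y[i, j] = x[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2].sum()
print(y)
```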

numpy: How do I pick rows from two 2D arrays based on conditions in 1D arrays?

I have two arrays of length n, namely old_fitness and new_fitness, and two matrices of dimension nxm, namely old_values and new_values.
What is the best way to create an nxm matrix best_values that contains row new_values[i] when new_fitness[i] > old_fitness[i], and row old_values[i] otherwise?
Something like:
best_values = np.where(new_fitness > old_fitness, new_values, old_values)
but that works on rows of the last two matrices, instead of individual elements? I'm sure there's an easy answer, but I am a complete newbie to numpy.
Edit: new_values and old_values contain rows that represent possible solutions to a problem, and new_fitness and old_fitness contain a numeric measure of fitness for each possible solution / row in new_values and old_values respectively.
This should work as long as the comparison has shape (n, 1), not (n,):
import numpy as np
old_fitness = np.asarray([0,1])
new_fitness = np.asarray([1,0])
old_value = np.asarray([[1,2], [3,4]])
new_value = np.asarray([[5,6], [7,8]])
np.where((new_fitness>old_fitness).reshape(old_fitness.shape[0],1), new_value, old_value)
returns
array([[5, 6],
[3, 4]])
Another possible solution, working on numpy arrays:
best_values = numpy.copy(old_values)
best_values[new_fitness > old_fitness, :] = new_values[new_fitness > old_fitness, :]
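Both answers rely on the same idea: the 1-D comparison has to line up with the rows. With np.where that means giving the mask a trailing axis of length 1 (mask[:, None] does the same as the reshape above); with boolean indexing, a 1-D mask already selects whole rows. Using the sample data from the first answer:

```python
import numpy as np

old_fitness = np.asarray([0, 1])
new_fitness = np.asarray([1, 0])
old_values = np.asarray([[1, 2], [3, 4]])
new_values = np.asarray([[5, 6], [7, 8]])

better = new_fitness > old_fitness           # shape (n,): one flag per row

# np.where: the (n, 1) mask broadcasts across the m columns
best_where = np.where(better[:, None], new_values, old_values)

# Boolean row indexing: copy, then overwrite only the improved rows
best_mask = old_values.copy()
best_mask[better] = new_values[better]

print(best_where)
# [[5 6]
#  [3 4]]
```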
Are the arrays of equal length? If so zip them and then use a map function to return the desired output.
For example, something like:
bests = [new_val if new_val > old_val else old_val for (old_val, new_val) in zip(old_fitness, new_fitness)]
Edit: this is probably better
bests = list(map(lambda n, o: n if n > o else o, new_fitness, old_fitness))
Here's another one that works too!
bests = [np.max(pair) for pair in zip(new_fitness, old_fitness)]
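These one-liners compare the fitness values themselves, so they return the larger fitness at each position rather than the corresponding rows of new_values/old_values. For NumPy inputs, the vectorized equivalent of what they compute is np.maximum (sample values below are made up):

```python
import numpy as np

old_fitness = np.asarray([0, 1, 5])
new_fitness = np.asarray([1, 0, 7])

loop_version = [n if n > o else o for n, o in zip(new_fitness, old_fitness)]
vectorized = np.maximum(new_fitness, old_fitness)   # element-wise larger value

print(vectorized)   # [1 1 7]
```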

Dealing with multi-dimensional arrays when ndims not known in advance

I am working with data from netcdf files, with multi-dimensional variables, read into numpy arrays. I need to scan all values in all dimensions (axes in numpy) and alter some values. But, I don't know in advance the dimension of any given variable. At runtime I can, of course, get the ndims and shapes of the numpy array.
How can I program a loop through all values without knowing the number of dimensions or the shapes in advance? If I knew a variable was exactly 2-dimensional, I would do:
shp=myarray.shape
for i in range(shp[0]):
    for j in range(shp[1]):
        do_something(myarray[i][j])
You should look into ravel, nditer and ndindex.
# For the simple case
for value in np.nditer(a):
    do_something_with(value)
# This is similar to above
for value in a.ravel():
    do_something_with(value)
# Or if you need the index
for idx in np.ndindex(a.shape):
    a[idx] = do_something_with(a[idx])
On an unrelated note, numpy arrays are usually indexed as a[i, j] rather than a[i][j]. In Python, a[i, j] is equivalent to indexing with a tuple, i.e. a[(i, j)].
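Since the question is about altering values in place, note that iterating over np.nditer(a) is read-only by default, and a.ravel() may return a copy, so assigning to value in those loops does not change a. A sketch of two in-place options for an array of any dimensionality; do_something_with here is a hypothetical transformation (clip negatives to zero):

```python
import numpy as np

def do_something_with(value):
    # Hypothetical transformation: replace negative values with 0
    return value if value >= 0 else 0

a = np.arange(-6, 6).reshape(2, 3, 2)   # works the same for any a.ndim

# Option 1: ndindex yields every index tuple, whatever the number of dimensions
b = a.copy()
for idx in np.ndindex(b.shape):
    b[idx] = do_something_with(b[idx])

# Option 2: nditer with readwrite access; x is a 0-d view, so assign via x[...]
c = a.copy()
with np.nditer(c, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = do_something_with(x)

print(b)
```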
You can use the flat property of numpy arrays, which returns an iterator (a numpy.flatiter) over all values, whatever the shape.
For instance:
>>> A = np.array([[1,2,3],[4,5,6]])
>>> for x in A.flat:
...     print(x)
1
2
3
4
5
6
You can also set the values in the same order they're returned, e.g. like this:
>>> A.flat[:] = [x // 2 if x % 2 == 0 else x for x in A.flat]
>>> A
array([[1, 1, 3],
[2, 5, 3]])
I am not sure the order in which flat returns the elements is guaranteed in any way. (I believe it iterates through the elements as they are laid out in memory, so with a consistent array convention you will likely always get the same order unless you deliberately change it, but be careful.)
And this will work for any dimension.
EDIT:
To clarify what I meant by 'order not guaranteed', the order of elements returned by flat does not change, but I think it would be unwise to count on it for things like row1 = A.flat[:N], although it will work most of the time.
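On the ordering question: the NumPy documentation for numpy.flatiter describes iteration in row-major (C-style) order, with the last index varying fastest, independent of how the array is stored. A quick check with a Fortran-ordered array, whose memory layout is column-major:

```python
import numpy as np

A = np.asfortranarray(np.array([[1, 2, 3], [4, 5, 6]]))
print(A.flags['F_CONTIGUOUS'])    # True: stored column by column
print([int(v) for v in A.flat])   # [1, 2, 3, 4, 5, 6]: still row-major order
```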
This might be the easiest with recursion:
import numpy
a = numpy.array(range(30)).reshape(5, 3, 2)
def recursive_do_something(array):
    if len(array.shape) == 1:
        for obj in array:
            do_something(obj)
    else:
        for subarray in array:
            recursive_do_something(subarray)
recursive_do_something(a)
In case you want the indices:
a = numpy.array(range(30)).reshape(5, 3, 2)
def do_something(x, indices):
    print(indices, x)
def recursive_do_something(array, indices=None):
    indices = indices or []
    if len(array.shape) == 1:
        for k, obj in enumerate(array):
            # include the position along the last axis too
            do_something(obj, indices + [k])
    else:
        for i, subarray in enumerate(array):
            recursive_do_something(subarray, indices + [i])
recursive_do_something(a)
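NumPy also provides np.ndenumerate, which yields the full index tuple together with each value, so the indexed variant can be written without recursion (do_something is the same printing stand-in as above):

```python
import numpy as np

a = np.arange(30).reshape(5, 3, 2)

def do_something(x, indices):
    print(indices, x)

for indices, x in np.ndenumerate(a):   # indices is a tuple such as (0, 0, 1)
    do_something(x, indices)
```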
Look into Python's itertools module.
Python 2: http://docs.python.org/2/library/itertools.html#itertools.product
Python 3: http://docs.python.org/3.3/library/itertools.html#itertools.product
This will allow you to do something along the lines of
from itertools import product
for idx in product(*(range(n) for n in shp)):
    do_something(myarray[idx])
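A small check, on an example array, that building one range per axis with itertools.product visits the same index tuples, in the same row-major order, as np.ndindex; each tuple can be used directly as an index:

```python
import numpy as np
from itertools import product

myarray = np.arange(24).reshape(2, 3, 4)
shp = myarray.shape

indices = list(product(*(range(n) for n in shp)))   # one range per dimension
print(indices == list(np.ndindex(shp)))             # True

for idx in indices:
    myarray[idx] *= 10          # a full index tuple selects a single element
```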
