Build an ever growing 3D numpy array - python

I have a function (MyFunct(X)) that, depending on the value of X, will return either a 3D numpy array (e.g. np.ones((5,2,3))) or an empty array (np.array([])).
RetVal = MyFunct(X) # RetVal can be np.array([]) or np.ones((5,2,3))
NB I'm using np.ones((5,2,3)) as a way to generate fake data - in reality the content of RetVal is all integers.
MyFunct is called with a range of different X values, some of which will lead to an empty array being returned while others don't.
I'd like to create a new 3D numpy array (OUT) which is an n by 2 by 3 concatenation of all the values returned from MyFunct(). The issue is that trying to concatenate a 3D array and an empty array raises an exception (understandably!) rather than silently doing nothing. There are various ways around this:
Explicitly checking if the RetVal is empty or not and then use np.concatenate()
Using a try/except block and catching exceptions
Adding each value to a list and then post-processing by removing empty entries
But these all feel ugly. Is there an efficient/fast way to do this 'correctly'?

You can reshape the arrays to a compatible shape:
concatenate([MyFunct(X).reshape((-1,2,3)) for X in values])
Example :
In [2]: def MyFunc(X): return ones((5,2,3)) if X%2 else array([])
In [3]: concatenate([MyFunc(X).reshape((-1,2,3)) for X in range(6)]).shape
Out[3]: (15, 2, 3)
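Put together as a self-contained sketch (the stand-in function below just mimics the behaviour described in the question), the reason this works is that the empty array reshaped to (-1, 2, 3) becomes a (0, 2, 3) array, which concatenates without contributing any rows:
import numpy as np

def my_funct(x):
    # stand-in for MyFunct: fake data for odd x, an empty array otherwise
    return np.ones((5, 2, 3), dtype=int) if x % 2 else np.array([])

out = np.concatenate([my_funct(x).reshape((-1, 2, 3)) for x in range(6)])
print(out.shape)  # (15, 2, 3)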

Related

What Is the Function of Less Than (<) Operator in numpy Array?

I'm learning Python right now and I'm stuck with this line of code I found on the internet. I cannot understand what this line of code actually does.
Suppose I have this array:
import numpy as np
x = np.array([[1,5],[8,1],[10,0.5]])
y = x[np.sqrt(x[:,0]**2+x[:,1]**2) < 1]
print (y)
The result is an empty array. What I want to know is: what does y actually do? I've never encountered this kind of code before. It seems like the square brackets act like an if-conditional statement. Instead of that code, if I write this line of code:
import numpy as np
x = np.array([[1,5],[8,1],[10,0.5]])
y = x[0 < 1]
print (y)
It will return exactly what x is (because zero IS less than one).
Assuming that it is a way to write an if-conditional statement, I find it really absurd because I'm comparing an array with an integer.
Thank you for your answer!
In NumPy:
np.array([1,1,2,3,4]) < 2
is (very roughly) equivalent to the list comprehension
[x < 2 for x in [1,1,2,3,4]]
over a vanilla Python list. In both cases, the result would be:
[True, True, False, False, False]
The same holds true for other operations, like addition, multiplication and so on. Broadcasting a scalar across an array like this is actually a major selling point of NumPy.
Now, another thing you can do in NumPy is boolean indexing: you provide an array of bools that is interpreted as 'keep this value Y/N?'. So:
arr = np.array([1,1,2,3,4])
res = arr[arr < 2]
# res is now array([1, 1])
NumPy behaves differently when you index an array with a boolean than with an int.
From the docs:
This advanced indexing occurs when obj is an array object of Boolean type, such as may be returned from comparison operators. A single boolean index array is practically identical to x[obj.nonzero()] where, as described above, obj.nonzero() returns a tuple (of length obj.ndim) of integer index arrays showing the True elements of obj. However, it is faster when obj.shape == x.shape.
If obj.ndim == x.ndim, x[obj] returns a 1-dimensional array filled with the elements of x corresponding to the True values of obj. The search order will be row-major, C-style. If obj has True values at entries that are outside of the bounds of x, then an index error will be raised. If obj is smaller than x it is identical to filling it with False.
When you index an array using booleans, you are telling numpy to select the data corresponding to True; therefore array[True] is not the same as array[1]. In the first case, numpy interprets it as a zero-dimensional boolean array which, based on how masks work, is the same as selecting all data.
Therefore:
x[True]
will return the full array, just as
x[False]
will return an empty array.
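To tie this back to the original snippet: the expression inside the brackets builds a boolean mask with one entry per row, and that mask then selects the rows where the condition holds. A minimal sketch, with values chosen so that one row actually passes the test:
import numpy as np

x = np.array([[1, 5], [8, 1], [0.1, 0.5]])
mask = np.sqrt(x[:, 0]**2 + x[:, 1]**2) < 1   # boolean array, one entry per row
print(mask)      # [False False  True]
print(x[mask])   # [[0.1 0.5]] - only the rows where the mask is True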

how to populate a numpy 2D array with function returning a numpy 1D array

I have a function predictions like:
def predictions(degree):
    # some magic,
    return <an np.ndarray of 100 values>
I want to call this function for a few values of degree and use it to populate a larger np.ndarray (n=2), filling each row with the outcome of the function predictions. It seems like a simple task but somehow I can't get it working. I tried with
for deg in [1,2,4,8,10]:
    np.append(result, predictions(deg), axis=1)
with result being an np.empty(100). But that failed with Singleton array array(1) cannot be considered a valid collection.
I could not get fromfunction to work; it only operates on a coordinate tuple, and an irregular list of degrees is not covered in the docs.
Don't use np.ndarray until you are older and wiser! I couldn't even use it without rereading the docs.
arr1d = np.array([1,2,3,4,5])
is the correct way to construct a 1d array from a list of numbers.
Also don't use np.append. I won't even add the 'older and wiser' qualification. It doesn't work in-place; and is slow when used in a loop.
A good way of building a 2d array from 1d arrays is:
alist = []
for i in ....:
    alist.append(<a list or 1d array>)
arr = np.array(alist)
Provided all the sublists have the same size, arr will be a 2d array.
This is equivalent to building a 2d array from a list of lists, e.g.
np.array([[1,2,3], [4,5,6]])
Or a list comprehension:
np.array([predictions(i) for i in range(10)])
Again, predictions must all return the same length arrays or lists.
np.append is in the boring section of numpy. Here you know the shape in advance:
len_predictions = 100
def predictions(degree):
    return np.ones((len_predictions,))

degrees = [1,2,4,8,10]
result = np.empty((len(degrees), len_predictions))
for i, deg in enumerate(degrees):
    result[i] = predictions(deg)
If you want to store the degree somehow, you can use custom dtypes.
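A rough sketch of what that could look like, assuming each row should carry its degree alongside the 100 predicted values (the field names 'degree' and 'values' are purely illustrative):
import numpy as np

len_predictions = 100
degrees = [1, 2, 4, 8, 10]

# structured dtype: one integer field plus one fixed-size float field per row
dt = np.dtype([('degree', int), ('values', float, (len_predictions,))])
result = np.empty(len(degrees), dtype=dt)

for i, deg in enumerate(degrees):
    result[i]['degree'] = deg
    result[i]['values'] = np.ones(len_predictions)  # stand-in for predictions(deg)

print(result['degree'])        # [ 1  2  4  8 10]
print(result['values'].shape)  # (5, 100)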

Numpy array of multiple indices replace with a different matrix

I have an array of 2d indices.
indices = [[2,4], [6,77], [102,554]]
Now, I have a different 4-dimensional array, arr, and I want to extract only the sub-arrays (they are arrays, since arr is 4-dimensional) at the corresponding indices in the indices array. It is equivalent to the following code:
for i in range(len(indices)):
    output[i] = arr[indices[i][0], indices[i][1]]
However, I realized that using an explicit for-loop is slow. Is there any built-in numpy API that I can utilize? At this point, I have tried np.choose, np.put and np.take, but did not succeed in getting what I wanted. Thank you!
We need to index into the first two axes with the two columns from indices (thinking of it as an array).
Thus, simply convert to array and index, like so -
indices_arr = np.array(indices)
out = arr[indices_arr[:,0], indices_arr[:,1]]
Or we could extract those directly without converting to array and then index -
d0,d1 = [i[0] for i in indices], [i[1] for i in indices]
out = arr[d0,d1]
Another way to extract the elements would be with conversion to tuple, like so -
out = arr[tuple(indices_arr.T)]
If indices is already an array, skip the conversion process and use indices in places where we had indices_arr.
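As a quick sanity check, here is a small self-contained sketch (the 4D shape and data are made up for illustration) showing that the fancy-indexing form matches the original loop:
import numpy as np

arr = np.arange(200 * 600 * 3 * 2).reshape(200, 600, 3, 2)  # arbitrary 4D data
indices = [[2, 4], [6, 77], [102, 554]]
indices_arr = np.array(indices)

out = arr[indices_arr[:, 0], indices_arr[:, 1]]   # shape (3, 3, 2)

# same thing with the explicit loop
ref = np.empty_like(out)
for i in range(len(indices)):
    ref[i] = arr[indices[i][0], indices[i][1]]

print(np.array_equal(out, ref))  # True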
Try using the take function of numpy arrays. Your code should be something like:
outputarray = np.take(arr, indices)

Adding coordinates to an array in Python 3

So I have image data which I am iterating through in order to find the pixels which have useful data in them. I then need to find the coordinates of these pixels, subject to a conditional statement, and put them into an array or DataFrame. The code I have so far is:
pix_coor = np.empty((0,2))
for (x,y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor.append([x,y])
where data_int is just an image array (129, 129). All the pixels that have a value larger than sigma3 are useful and the other ones I don't need.
Creating an empty array works fine but when I append this it doesn't seem to work, I need to end up with an array which has two columns of x and y values for the useful pixels. Any ideas?
You could simply use np.argwhere for a vectorized solution -
pix_coor = np.argwhere(data_int >= sigma3)
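For illustration, a tiny self-contained sketch of what np.argwhere returns (the data and threshold below are made up):
import numpy as np

data_int = np.array([[0, 5, 1],
                     [7, 0, 9]])
sigma3 = 5

pix_coor = np.argwhere(data_int >= sigma3)
print(pix_coor)
# [[0 1]
#  [1 0]
#  [1 2]]  -> one (row, column) pair per useful pixel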
In numpy, np.append is not an in-place operation; instead it copies the entire array into newly allocated memory (big enough to hold it along with the new values) and returns the new array. Therefore it should be used as such:
new_arr = np.append(arr, values)
Obviously, this is not an efficient way to add elements one by one.
You should probably use a regular Python list for this.
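A minimal sketch of that list-based approach (data_int and sigma3 below are stand-ins for the question's image data and threshold):
import numpy as np

data_int = np.random.randint(0, 10, size=(129, 129))  # stand-in for the image data
sigma3 = 7                                             # stand-in threshold

coords = []
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        coords.append([x, y])

pix_coor = np.array(coords)   # shape (n, 2): one (x, y) row per useful pixel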
Alternatively, preallocate the numpy array with room for all values and then resize it:
pix_coor = np.empty((data_int.size, 2), int)
c = 0
for (x, y), value in np.ndenumerate(data_int):
    if value >= sigma3:
        pix_coor[c] = (x, y)
        c += 1
pix_coor = np.resize(pix_coor, (c, 2))
Note that I used np.empty((data_int.size, 2), int), since your coordinates are integral, while numpy defaults to floats.

How to construct an np.array with fromiter

I'm trying to construct an np.array by sampling from a python generator, that yields one row of the array per invocation of next. Here is some sample code:
import numpy as np
from numpy.random import shuffle, randint

data = np.eye(9)
labels = np.array([0,0,0,1,1,1,2,2,2])

def extract_one_class(X, labels, y):
    """ Take an array of data X, a column vector array of labels, and one particular label y.
    Return an array of all instances in X that have label y """
    return X[np.nonzero(labels[:] == y)[0], :]

def generate_points(data, labels, size):
    """ Generate and return 'size' pairs of points drawn from different classes """
    label_alphabet = np.unique(labels)
    assert(label_alphabet.size > 1)
    for useless in xrange(size):
        shuffle(label_alphabet)
        first_class = extract_one_class(data, labels, label_alphabet[0])
        second_class = extract_one_class(data, labels, label_alphabet[1])
        pair = np.hstack((first_class[randint(0, first_class.shape[0]), :],
                          second_class[randint(0, second_class.shape[0]), :]))
        yield pair

points = np.fromiter(generate_points(data, labels, 5),
                     dtype=np.dtype('f8', (2*data.shape[1], 1)))
The extract_one_class function returns a subset of data: all data points belonging to one class label. I would like to have points be an np.array with shape = (size,data.shape[1]). Currently the code snippet above returns an error:
ValueError: setting an array element with a sequence.
The documentation of fromiter claims to return a one-dimensional array. Yet others have used fromiter to construct record arrays in numpy before (e.g http://iam.al/post/21116450281/numpy-is-my-homeboy).
Am I off the mark in assuming I can generate an array in this fashion? Or is my numpy just not quite right?
As you've noticed, the documentation of np.fromiter explains that the function creates a 1D array. You won't be able to create a 2D array that way, and @unutbu's method of returning a 1D array that you reshape afterwards is a sure way to go.
However, you can indeed create structured arrays using fromiter, as illustrated by:
>>> import itertools
>>> a = itertools.izip((1,2,3),(10,20,30))
>>> r = np.fromiter(a, dtype=[('',int),('',int)])
>>> r
array([(1, 10), (2, 20), (3, 30)],
      dtype=[('f0', '<i8'), ('f1', '<i8')])
but look, r.shape=(3,), that is, r is really nothing but a 1D array of records, each record being composed of two integers. Because all the fields have the same dtype, we can take a view of r as a 2D array:
>>> r.view((int,2))
array([[ 1, 10],
       [ 2, 20],
       [ 3, 30]])
So, yes, you could try to use np.fromiter with a dtype like [('',int)]*data.shape[1]: you'll get a 1D array of length size, that you can then view this array as ((int, data.shape[1])). You can use floats instead of ints, the important part is that all fields have the same dtype.
If you really want it, you can use some fairly complex dtype. Consider for example
r = np.fromiter(((_,) for _ in a),dtype=[('',(int,2))])
Here, you get a 1D structured array with 1 field, the field consisting of an array of 2 integers. Note the use of (_,) to make sure that each record is passed as a tuple (else np.fromiter chokes). But do you need that complexity?
Note also that as you know the length of the array beforehand (it's size), you should use the count optional argument of np.fromiter for more efficiency.
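For instance, a tiny sketch of the count argument on a plain generator (values made up for illustration):
import numpy as np

# telling fromiter how many items to expect avoids repeated reallocation
squares = np.fromiter((i * i for i in range(5)), dtype=float, count=5)
print(squares)  # [ 0.  1.  4.  9. 16.]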
You could modify generate_points to yield single floats instead of np.arrays, use np.fromiter to form a 1D array, and then use .reshape(size, -1) to make it a 2D array:
points = np.fromiter(generate_points(data, labels, 5), dtype=float).reshape(size, -1)
Following some suggestions here, I came up with a fairly general drop-in replacement for numpy.fromiter() that satisfies the requirements of the OP:
import numpy as np

def fromiter(iterator, dtype, *shape):
    """Generalises `numpy.fromiter()` to multi-dimensional arrays.

    Instead of the number of elements, the parameter `shape` has to be given,
    which contains the shape of the output array. The first dimension may be
    `-1`, in which case it is inferred from the iterator.
    """
    res_shape = shape[1:]
    if not res_shape:  # Fallback to the "normal" fromiter in the 1-D case
        return np.fromiter(iterator, dtype, shape[0])

    # This wrapping of the iterator is necessary because when used with the
    # field trick, np.fromiter does not enforce consistency of the shapes
    # returned with the '_' field and silently cuts additional elements.
    def shape_checker(iterator, res_shape):
        for value in iterator:
            if value.shape != res_shape:
                raise ValueError("shape of returned object %s does not match"
                                 " given shape %s" % (value.shape, res_shape))
            yield value,

    return np.fromiter(shape_checker(iterator, res_shape),
                       [("_", dtype, res_shape)], shape[0])["_"]
