How to construct an np.array with fromiter - python

I'm trying to construct an np.array by sampling from a python generator, that yields one row of the array per invocation of next. Here is some sample code:
import numpy as np
from numpy.random import shuffle, randint

data = np.eye(9)
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

def extract_one_class(X, labels, y):
    """Take an array of data X, a column vector array of labels, and one
    particular label y. Return an array of all instances in X that have
    label y."""
    return X[np.nonzero(labels[:] == y)[0], :]

def generate_points(data, labels, size):
    """Generate and return 'size' pairs of points drawn from different
    classes."""
    label_alphabet = np.unique(labels)
    assert label_alphabet.size > 1
    for useless in xrange(size):
        shuffle(label_alphabet)
        first_class = extract_one_class(data, labels, label_alphabet[0])
        second_class = extract_one_class(data, labels, label_alphabet[1])
        pair = np.hstack((first_class[randint(0, first_class.shape[0]), :],
                          second_class[randint(0, second_class.shape[0]), :]))
        yield pair

points = np.fromiter(generate_points(data, labels, 5),
                     dtype=np.dtype('f8', (2 * data.shape[1], 1)))
The extract_one_class function returns a subset of data: all data points belonging to one class label. I would like points to be an np.array with shape = (size, 2*data.shape[1]). Currently the code snippet above raises:
ValueError: setting an array element with a sequence.
The documentation of fromiter claims to return a one-dimensional array. Yet others have used fromiter to construct record arrays in numpy before (e.g. http://iam.al/post/21116450281/numpy-is-my-homeboy).
Am I off the mark in assuming I can generate an array in this fashion? Or is my numpy just not quite right?

As you've noticed, the documentation of np.fromiter explains that the function creates a 1D array. You won't be able to create a 2D array that way, and @unutbu's method of returning a 1D array that you reshape afterwards is a sure bet.
However, you can indeed create structured arrays using fromiter, as illustrated by:
>>> import itertools
>>> a = itertools.izip((1,2,3),(10,20,30))
>>> r = np.fromiter(a, dtype=[('',int),('',int)])
>>> r
array([(1, 10), (2, 20), (3, 30)],
      dtype=[('f0', '<i8'), ('f1', '<i8')])
But look: r.shape == (3,), that is, r is really nothing but a 1D array of records, each record being composed of two integers. Because all the fields have the same dtype, we can take a view of r as a 2D array:
>>> r.view((int,2))
array([[ 1, 10],
       [ 2, 20],
       [ 3, 30]])
So, yes, you could try to use np.fromiter with a dtype like [('',int)]*data.shape[1]: you'll get a 1D array of length size, which you can then view as (int, data.shape[1]). You can use floats instead of ints; the important part is that all fields have the same dtype.
If you really want to, you can use a fairly complex dtype. Consider for example:
r = np.fromiter(((_,) for _ in a), dtype=[('', (int, 2))])
Here, you get a 1D structured array with 1 field, the field consisting of an array of 2 integers. Note the use of (_,) to make sure that each record is passed as a tuple (else np.fromiter chokes). But do you need that complexity?
Note also that as you know the length of the array beforehand (it's size), you should use the count optional argument of np.fromiter for more efficiency.
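Putting the pieces together, the field trick plus count looks like this on a made-up row generator (a Python 3 sketch; row_generator is a hypothetical stand-in for the OP's generate_points):

```python
import numpy as np

def row_generator(n, width):
    """Stand-in for generate_points: yield one 1-D row per next() call."""
    for i in range(n):
        yield np.arange(width, dtype=np.float64) + i

n, width = 5, 3
# Wrap each row in a 1-tuple so np.fromiter sees one record per item,
# and pass count since the length is known in advance.
packed = np.fromiter(((row,) for row in row_generator(n, width)),
                     dtype=[('_', np.float64, (width,))], count=n)
rows = packed['_']
print(rows.shape)  # (5, 3)
```

Indexing the single field out of the structured result gives an ordinary 2D float array with one generated row per line.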

You could modify generate_points to yield single floats instead of np.arrays, use np.fromiter to form a 1D array, and then use .reshape(size, -1) to make it a 2D array:
points = np.fromiter(generate_points(data, labels, 5),
                     dtype=float).reshape(size, -1)
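A minimal sketch of that pattern with a made-up scalar generator (flat_points is hypothetical, standing in for the modified generate_points):

```python
import numpy as np

size, width = 4, 3

def flat_points(size, width):
    """Yield the array's values one float at a time, row by row."""
    for i in range(size):
        for j in range(width):
            yield float(i * width + j)

# fromiter builds the flat 1D array; reshape recovers the rows.
points = np.fromiter(flat_points(size, width),
                     dtype=float, count=size * width).reshape(size, -1)
print(points.shape)  # (4, 3)
```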

Following some suggestions here, I came up with a fairly general drop-in replacement for numpy.fromiter() that satisfies the requirements of the OP:
import numpy as np
def fromiter(iterator, dtype, *shape):
    """Generalises `numpy.fromiter()` to multi-dimensional arrays.

    Instead of the number of elements, the parameter `shape` has to be given,
    which contains the shape of the output array. The first dimension may be
    `-1`, in which case it is inferred from the iterator.
    """
    res_shape = shape[1:]
    if not res_shape:  # Fall back to the "normal" fromiter in the 1-D case
        return np.fromiter(iterator, dtype, shape[0])

    # This wrapping of the iterator is necessary because when used with the
    # field trick, np.fromiter does not enforce consistency of the shapes
    # returned with the '_' field and silently cuts additional elements.
    def shape_checker(iterator, res_shape):
        for value in iterator:
            if value.shape != res_shape:
                raise ValueError("shape of returned object %s does not match"
                                 " given shape %s" % (value.shape, res_shape))
            yield value,

    return np.fromiter(shape_checker(iterator, res_shape),
                       [("_", dtype, res_shape)], shape[0])["_"]

Related

Why is the length of the array appended in a loop more than the number of iterations?

I ran this code and expected an array size of 10000, as time is a numpy array of length 10000.
freq = np.empty([])
for i, t in enumerate(time):
    freq = np.append(freq, np.sin(t))
print(time.shape)
print(freq.shape)
But this is the output I got
(10000,)
(10001,)
Can someone explain why I am getting this disparity?
It turns out that np.empty() returns an uninitialized array of the given shape. Hence, when you do np.empty([]), it returns a 0-dimensional array holding one arbitrary, uninitialized value, e.g. array(0.14112001). It's like having a slot "ready to be used", but without a meaningful value in it. You can check this out by printing the variable freq before the loop starts.
So, on the first pass through freq = np.append(freq, np.sin(t)), you append a second value after that leftover one, which is where the extra element comes from.
Also, if you just need to create an empty array, do x = np.array([]) or x = [].
You can read more about this numpy.empty function here:
https://numpy.org/doc/1.18/reference/generated/numpy.empty.html
And more about initializing arrays here:
https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/language_ref/aryin.html
I'm not sure if I was clear enough. It's not a straightforward concept, so please let me know.
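A quick sketch of the difference (the value inside np.empty's output is arbitrary, so only shapes and sizes are shown):

```python
import numpy as np

a = np.empty([])   # 0-d array holding one leftover, uninitialized value
b = np.array([])   # genuinely empty 1-d array

print(a.shape, a.size)  # () 1
print(b.shape, b.size)  # (0,) 0

# Starting the accumulator from np.array([]) gives the expected length:
time = np.linspace(0, 1, 100)
freq = np.array([])
for t in time:
    freq = np.append(freq, np.sin(t))
print(freq.shape)  # (100,)
```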
You should use np.empty(0) instead.
I looked at the numpy source code; this version of empty (from numpy/matlib.py) reads:
def empty(shape, dtype=None, order='C'):
    """Return a new matrix of given shape and type, without initializing entries.

    Parameters
    ----------
    shape : int or tuple of int
        Shape of the empty matrix.
    dtype : data-type, optional
        Desired output data-type.
    order : {'C', 'F'}, optional
        Whether to store multi-dimensional data in row-major
        (C-style) or column-major (Fortran-style) order in memory.

    See Also
    --------
    empty_like, zeros

    Notes
    -----
    `empty`, unlike `zeros`, does not set the matrix values to zero,
    and may therefore be marginally faster. On the other hand, it requires
    the user to manually set all the values in the array, and should be
    used with caution.

    Examples
    --------
    >>> import numpy.matlib
    >>> np.matlib.empty((2, 2))    # filled with random data
    matrix([[ 6.76425276e-320,  9.79033856e-307],        # random
            [ 7.39337286e-309,  3.22135945e-309]])
    >>> np.matlib.empty((2, 2), dtype=int)
    matrix([[ 6600475,        0],        # random
            [ 6586976, 22740995]])
    """
    return ndarray.__new__(matrix, shape, dtype, order=order)
It passes the first argument, shape, straight through to ndarray, so np.empty(0) initializes a genuinely empty array like [].
You can print np.empty(0) and np.empty([]) to see how they differ.
I think you are trying to replicate a list operation:
freq = []
for i, t in enumerate(time):
    freq.append(np.sin(t))
But neither np.empty nor np.append is an exact clone; the names are similar but the differences are significant.
First:
In [75]: np.empty([])
Out[75]: array(1.)
In [77]: np.empty([]).shape
Out[77]: ()
This is a 1 element, 0d array.
If you look at the code for np.append you'll see that if the 1st argument is not 1d (and axis is not provided) it flattens it (that's documented as well):
In [78]: np.append??
In [82]: np.empty([]).ravel()
Out[82]: array([1.])
In [83]: np.empty([]).ravel().shape
Out[83]: (1,)
Now it is a 1d, 1-element array. Append another array to that:
In [84]: np.append(np.empty([]), np.sin(2))
Out[84]: array([1. , 0.90929743])
The result has 2 elements. Repeat that 10000 times and you end up with 10001 values.
np.empty, despite its name, does not produce a [] list equivalent. As others show, np.array([]) sort of does, as would np.empty(0).
np.append is not a list append clone. It is just a cover function for np.concatenate. It's ok for occasionally adding an element to a larger array, but beyond that it has too many pitfalls to be useful. It's especially bad in a loop like this. Getting a correct start array is tricky. And it is slow (compared to list append). Actually these problems apply to all uses of concatenate and stack... in a loop.
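The list-append idiom recommended above, sketched against the loop from the question:

```python
import numpy as np

time = np.linspace(0, 1, 10000)

# Accumulate in a plain Python list; convert to an array once at the end.
freq = []
for t in time:
    freq.append(np.sin(t))
freq = np.array(freq)
print(freq.shape)  # (10000,)
# (For this particular computation, np.sin(time) does it in one vectorized call.)
```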

how to populate a numpy 2D array with function returning a numpy 1D array

I have a function predictions like
def predictions(degree):
    # some magic,
    return an np.ndarray([0..100])
I want to call this function for a few values of degree and use it to populate a larger 2-D np.ndarray, filling each row with the outcome of the function predictions. It seems like a simple task but somehow I can't get it working. I tried with
for deg in [1, 2, 4, 8, 10]:
    np.append(result, predictions(deg), axis=1)
with result being an np.empty(100). But that failed with "Singleton array array(1) cannot be considered a valid collection."
I could not get fromfunction to do it: it only works on a coordinate grid, and an irregular list of degrees is not covered in the docs.
Don't use np.ndarray until you are older and wiser! I couldn't even use it without rereading the docs.
arr1d = np.array([1,2,3,4,5])
is the correct way to construct a 1d array from a list of numbers.
Also don't use np.append. I won't even add the 'older and wiser' qualification. It doesn't work in-place; and is slow when used in a loop.
A good way of building a 2d array from 1d arrays is:
alist = []
for i in ...:
    alist.append(<a list or 1d array>)
arr = np.array(alist)
Provided all the sublists have the same size, arr should be a 2d array.
This is equivalent to building a 2d array from
np.array([[1,2,3], [4,5,6]])
that is a list of lists.
Or a list comprehension:
np.array([predictions(i) for i in range(10)])
Again, predictions must all return the same length arrays or lists.
np.append is best avoided here. You know the shape in advance:
len_predictions = 100
def predictions(degree):
    return np.ones((len_predictions,))

degrees = [1, 2, 4, 8, 10]
result = np.empty((len(degrees), len_predictions))
for i, deg in enumerate(degrees):
    result[i] = predictions(deg)
If you want to store the degree as well, you can use custom dtypes.
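One way to carry the degree along with each row, using a structured dtype (a sketch; this predictions stub is adapted from the answer and merely returns constant rows):

```python
import numpy as np

len_predictions = 100

def predictions(degree):
    # stub adapted from the answer: a 1-d array of 100 values
    return np.ones((len_predictions,)) * degree

degrees = [1, 2, 4, 8, 10]
# One record per degree: an integer label plus the 100 predicted values.
dt = np.dtype([('degree', np.int64), ('pred', np.float64, (len_predictions,))])
result = np.empty(len(degrees), dtype=dt)
for i, deg in enumerate(degrees):
    result[i] = (deg, predictions(deg))

print(result['degree'])      # [ 1  2  4  8 10]
print(result['pred'].shape)  # (5, 100)
```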

array of unknown size

I am pulling data from a .CSV into an array, as follows:
from numpy import genfromtxt
import numpy as np

my_data = genfromtxt('nice.csv', delimiter='')
a = np.array(my_data)
I then attempt to establish the size and shape of the array, thus:
size_array = np.size(a)
shape_array = np.shape(a)
Now, I want to generate an array of identical shape and size, and then carry out some multiplications. The trouble I am having is generating the correctly sized array. I have tried this:
D = np.empty([shape_array,])
I receive the error:
"'tuple' object cannot be interpreted as an index"
After investigation my array has a shape of (248L,). Please...how do I get this array in a sensible format?
Thanks.
The line shape_array=np.shape(a) creates a shape tuple, which is the expected input to np.empty.
The expression [shape_array,] is that tuple, wrapped in a list, which seems superfluous. Use shape_array directly:
d = np.empty(shape_array)
On a related note, you can use the function np.empty_like to get an array of the same shape and type as the original more directly:
d = np.empty_like(a)
If you want to use just the shape and size, there is really no need to store them in separate variables after calling np.size and np.shape. It is more idiomatic to use the corresponding properties of np.ndarray directly:
d = np.empty(a.shape)

Build an ever growing 3D numpy array

I have a function (MyFunct(X)) that, depending on the value of X, will return either a 3D numpy array (e.g. np.ones((5,2,3))) or an empty array (np.array([])).
RetVal = MyFunct(X) # RetVal can be np.array([]) or np.ones((5,2,3))
NB I'm using np.ones((5,2,3)) as a way to generate fake data - in reality the content of the RetVal is all integers.
MyFunct is called with a range of different X values, some of which will lead to an empty array being returned while others don't.
I'd like to create a new 3D numpy array (OUT) which is an n-by-2-by-3 concatenation of all the returned values from MyFunct(). The issue is that trying to concatenate a 3D array and an empty array raises an exception (understandably!) rather than silently doing nothing. There are various ways around this:
Explicitly checking if the RetVal is empty or not and then use np.concatenate()
Using a try/except block and catching exceptions
Adding each value to a list and then post-processing by removing empty entries
But these all feel ugly. Is there an efficient/fast way to do this 'correctly'?
You can reshape the arrays to a compatible shape:
concatenate([MyFunct(X).reshape((-1,2,3)) for X in values])
Example :
In [2]: def MyFunc(X): return ones((5,2,3)) if X%2 else array([])
In [3]: concatenate([MyFunc(X).reshape((-1,2,3)) for X in range(6)]).shape
Out[3]: (15, 2, 3)

Interpreting numpy.where results

I'm confused by what the results of numpy.where mean, and how to use it to index into an array.
Have a look at the code sample below:
import numpy as np
a = np.random.randn(20, 20, 2)
indices = np.where(a[:, :, 0] > 0.5)
I expect the indices array to be 2-dim and contain the indices where the condition is true. We can see that by
indices = np.array(indices)
indices.shape  # (2, 120)
So it looks like indices is acting on the flattened array in some way, but I'm not able to figure out exactly how. More confusingly,
a.shape  # (20, 20, 2)
a[indices].shape  # (2, 120, 20, 2)
Question:
How does indexing my array with the output of np.where actually grow the size of the array? What is going on here?
You are basing your indexing on a wrong assumption: np.where returns something that can be immediately used for advanced indexing (it's a tuple of np.ndarrays). But you convert it to a numpy array (so it's now a single 2-D np.ndarray).
So
import numpy as np
a = np.random.randn(20, 20, 2)
indices = np.where(a[:, :, 0] > 0.5)
a[:, :, 0][indices]
# If you do a[indices] the result would be different; I'm not sure what
# you intended.
gives you the elements that are found by np.where. If you convert indices to a np.array it triggers another form of indexing (see this section of the numpy docs) and the warning message in the docs gets very important. That's the reason why it increases the total size of your array.
Some additional information about what np.where returns: you get a tuple containing n arrays, where n is the number of dimensions of the input array. So the coordinates of the first element that satisfies the condition are [0][0], [1][0], ..., [n-1][0], not [0][0], [0][1], ..., [0][n-1]. So in your case (2, 120) means you have 2 dimensions and 120 found points.
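A small sketch of the difference (default_rng seeding is not in the original question; it's used here only to make the run reproducible):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((20, 20, 2))

mask = a[:, :, 0] > 0.5
indices = np.where(mask)      # tuple of two 1-d index arrays

# Used as-is, the tuple picks out exactly the matching elements:
hits = a[:, :, 0][indices]
print(hits.shape)             # (number of found points,)

# Converted to an array, it fancy-indexes axis 0 only, so the
# remaining axes of `a` are appended to the index array's shape:
arr = np.array(indices)
print(a[arr].shape)           # (2, number of found points, 20, 2)
```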
