numpy fromfile and structured arrays

I'm trying to use numpy.fromfile to read a structured array (a file header) by passing in a user-defined data type. For some reason, my structured array elements are coming back as 2-D arrays instead of flat 1-D arrays:
import numpy as np
headerfmt = '20i,20f,a80'
dt = np.dtype(headerfmt)
header = np.fromfile(fobj, dtype=dt, count=1)  # fobj: a file opened in binary mode
ints, floats, chars = header['f0'][0], header['f1'][0], header['f2'][0]
# ^? ^? ^?
How do I modify headerfmt so that it will read them as flat 1D arrays?

If the count will always be 1, just do:
header = np.fromfile(fobj, dtype=dt, count=1)[0]
You'll still be able to index by field name, though the repr of the array won't show the field names.
For example:
import numpy as np
headerfmt='20i,20f,a80'
dt = np.dtype(headerfmt)
# Note the 0-index!
x = np.zeros(1, dtype=dt)[0]
print(x['f0'], x['f1'], x['f2'])
ints, floats, chars = x
It may or may not be ideal for your purposes, but it's simple, at any rate.
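A round trip through a temporary file makes the difference visible. This is a sketch that assumes a header with the same layout as the question ('20i,20f' plus an 80-byte string, spelled 'S80' here since 'a' is an older alias for the bytes type); the header contents are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Header layout from the question; 'S80' is the bytes spelling of 'a80'.
dt = np.dtype('20i,20f,S80')

# Write one made-up header record to a temporary binary file.
rec = np.zeros(1, dtype=dt)
rec['f0'] = np.arange(20)
rec['f1'] = np.linspace(0.0, 1.0, 20)
rec['f2'] = b'example header'

fd, path = tempfile.mkstemp()
os.close(fd)
rec.tofile(path)

with open(path, 'rb') as fobj:
    raw = np.fromfile(fobj, dtype=dt, count=1)        # array of shape (1,)
with open(path, 'rb') as fobj:
    header = np.fromfile(fobj, dtype=dt, count=1)[0]  # a single record
os.remove(path)

# Without [0], every field keeps the leading length-1 axis: shape (1, 20).
print(raw['f0'].shape)

# With [0], the fields come back as flat 1-D arrays (and a bytes value).
ints, floats, chars = header['f0'], header['f1'], header['f2']
print(ints.shape, floats.shape, chars)
```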

Related

Efficient way of creating numpy array of fixed size tuples from bytes

I'm trying to convert bytes to a NumPy array of fixed-size tuples (2 or 3 doubles), and it must be a 1-D array.
What I managed to get is:
values = np.fromstring(data, (np.double, (n,)))
This gives me a 2-D array with shape (105107, 2):
array([[0.03171165, 0.03171165],
[0.03171165, 0.03171165],
[0.03020949, 0.03020949],
...,
[0.05559354, 0.16173067],
[0.12667986, 0.04522982],
[0.14062567, 0.11422881]])
values = np.fromstring(data, [('dt', np.double, (n,))])
This gives me a 1-D array with shape (105107,), but each element is a tuple containing an array of two doubles:
array([([0.03171165, 0.03171165],), ([0.03171165, 0.03171165],),
([0.03020949, 0.03020949],), ..., ([0.05559354, 0.16173067],),
([0.12667986, 0.04522982],), ([0.14062567, 0.11422881],)],
dtype=[('dt', '<f8', (2,))])
Is there an efficient way to get a 1-D array like this?
array([(0.03171165, 0.03171165),
(0.03171165, 0.03171165),
(0.03020949, 0.03020949),
...,
(0.05559354, 0.16173067),
(0.12667986, 0.04522982),
(0.14062567, 0.11422881)])

No, I don't know an efficient way, but as nobody has so far posted any answer at all, here is a way that at least gets you the desired output. However, efficient it is not.
values = np.fromstring(data, (np.double, (n,)))
x = np.empty(values.shape[0], dtype=object)  # plain object; the np.object alias was removed in NumPy 1.24
for i, a in enumerate(values):
    x[i] = tuple(a)
I would add that an array of objects negates so much of the benefit of NumPy's vectorisation that you might as well just use a list instead:
values = np.fromstring(data, (np.double, (n,)))
x = [tuple(a) for a in values]
A possible alternative approach to generating the array of tuples (I'm not sure whether it is any faster) would be to go via such a list and convert it back into an array in a way that deliberately prevents NumPy from building the ordinary 2-D array it would otherwise produce:
values = np.fromstring(data, (np.double, (n,)))
x = [tuple(a) for a in values]
x.append(None)  # the trailing None makes the input ragged
y = np.array(x, dtype=object)[:-1]  # dtype=object is required for ragged input in NumPy >= 1.24

I already solved the problem using this code:
names = ['d{i}'.format(i=i) for i in range(n)]
value = np.fromstring(data, {
'names': names,
'formats': [np.double] * n
})
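The self-answer's trick (one double field per tuple position) can also be written with a list-of-tuples dtype and np.frombuffer, which NumPy's deprecation warning for the binary mode of np.fromstring points to as the replacement. A sketch with made-up input bytes; the field names d0, d1, ... are arbitrary.

```python
import numpy as np

n = 2  # doubles per tuple, as in the question (2 or 3)

# Made-up input: 4 pairs of doubles serialised to raw bytes.
source = np.arange(8, dtype=np.double).reshape(-1, n)
data = source.tobytes()

# One double field per tuple position: 'd0', 'd1', ...
dt = np.dtype([('d{}'.format(i), np.double) for i in range(n)])

# np.frombuffer reads the bytes without copying (the result is read-only).
values = np.frombuffer(data, dtype=dt)

print(values.shape)     # one record per tuple, so a 1-D array
print(values[1])        # a single record, shown like a tuple
print(values.tolist())  # plain Python list of tuples
```

Because each record is a fixed-size struct of doubles, the array stays a compact 1-D NumPy array rather than an array of Python objects.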

Manipulating copied numpy array without changing the original

I am trying to manipulate a numpy array that contains data stored in another array. So far, whenever I change a value in my array, the values in both arrays seem to change:
import numpy as np
from astropy.io import fits
image = fits.getdata("randomImage.fits")
fft = np.fft.fft2(image)
fftMod = np.copy(fft)
fftMod = fftMod*2
if fftMod.all() == fft.all():
    print("shit same same same ")
# -> shit same same same
Why is that?

You misunderstood the usage of the .all() method.
It returns True if all elements of an array are non-zero. That is the case either in both of your arrays or in neither of them.
Since one is double the other, they definitely give the same result for .all() (both True or both False).
Edit, as requested in the comments:
To compare the contents of both arrays, use an element-wise comparison first and then check that all elements are True with .all():
(fftMod == fft).all()
Or, probably better for floats, compare within a tolerance:
np.allclose(fftMod, fft)
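A small self-contained check of both points: the copy is independent, and comparing the two .all() results says nothing about whether the arrays match. This sketch assumes a random 4x4 array in place of the FITS image, since randomImage.fits is not available here.

```python
import numpy as np

# Stand-in for the FITS image (assumption: random values offset from zero).
rng = np.random.default_rng(0)
image = rng.random((4, 4)) + 1.0

fft = np.fft.fft2(image)
fftMod = np.copy(fft) * 2  # a new array; fft itself is not modified

# Misleading check: each side is a single bool ("no element is zero?").
by_all = fftMod.all() == fft.all()

# Element-wise checks: exact equality, and equality within a tolerance.
exact = (fftMod == fft).all()
close = np.allclose(fftMod, fft)
print(by_all, exact, close)

# Changing the copy leaves the original untouched.
before = fft[0, 0]
fftMod[0, 0] = 0
print(fft[0, 0] == before)
```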

Build an ever growing 3D numpy array

I have a function (MyFunct(X)) that, depending on the value of X, will return either a 3D numpy array (e.g. np.ones((5,2,3))) or an empty array (np.array([])).
RetVal = MyFunct(X) # RetVal can be np.array([]) or np.ones((5,2,3))
NB: I'm using np.ones((5,2,3)) only to generate fake data; in reality, RetVal contains only integers.
MyFunct is called with a range of different X values, some of which will lead to an empty array being returned while others don't.
I'd like to create a new 3D numpy array (OUT) which is an n by 2 by 3 concatenation of all the values returned by MyFunct(). The issue is that trying to concatenate a 3D array and an empty array raises an exception (understandably!) rather than silently doing nothing. There are various ways around this:
Explicitly checking whether RetVal is empty and then using np.concatenate()
Using a try/except block and catching exceptions
Adding each value to a list and then post-processing by removing empty entries
But these all feel ugly. Is there an efficient/fast way to do this 'correctly'?

You can reshape the arrays to a compatible shape:
np.concatenate([MyFunct(X).reshape((-1,2,3)) for X in values])
Example:
In [2]: def MyFunc(X): return np.ones((5,2,3)) if X%2 else np.array([])
In [3]: np.concatenate([MyFunc(X).reshape((-1,2,3)) for X in range(6)]).shape
Out[3]: (15, 2, 3)
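The reshape trick works because an empty array has size 0, so reshape((-1, 2, 3)) turns it into a (0, 2, 3) block that contributes nothing along axis 0. One detail matters for integer data like the question's: np.array([]) defaults to float64, so it upcasts the concatenated result. A sketch with a hypothetical my_funct standing in for MyFunct:

```python
import numpy as np

def my_funct(x):
    # Hypothetical stand-in for MyFunct: integer data for odd x, empty otherwise.
    if x % 2:
        return np.full((5, 2, 3), x, dtype=np.int64)
    return np.array([])

# Size 0 reshapes cleanly to (0, 2, 3).
print(np.array([]).reshape((-1, 2, 3)).shape)

# Works, but the float64 empty arrays upcast the whole result to float64.
out = np.concatenate([my_funct(x).reshape((-1, 2, 3)) for x in range(6)])
print(out.shape, out.dtype)

# Casting each piece keeps integers (returning np.empty((0, 2, 3), dtype=int)
# from the function would have the same effect).
out_int = np.concatenate(
    [my_funct(x).reshape((-1, 2, 3)).astype(np.int64) for x in range(6)]
)
print(out_int.shape, out_int.dtype)
```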

Get dimensions of numpy structured (i.e. record) array?

Say I have a numpy structured array (a.k.a. record array):
record_types = np.dtype([
('date',object), #00 - Timestamp
('lats',float), #01 - Latitude
('lons',float), #02 - Longitude
('vals',float), #03 - Value
])
data = np.zeros(10, dtype=record_types)
If I check the shape attribute, I get (10,).
How can I do something like the following:
y, x = data.shape
To get y = 10 and x = 4
Thanks!

This is one of the confusing things about structured arrays.
You basically have an (n-1)-D array where each item is a C-like struct: here, a 1-D array of 10 records rather than a 10 x 4 table.
This type of structure allows for all kinds of useful things (easy file IO with binary formats, for example), but it's quite confusing for a lot of other use cases. For what you're doing, you'd likely be better served by using pandas than directly using a structured array.
That having been said, here's how you'd get what you're asking:
def structured_shape(x):
    # Structured dtypes have len() == number of fields; plain dtypes have len() == 0.
    if len(x.dtype) > 0:
        return x.shape + (len(x.dtype),)
    else:
        return x.shape
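If the goal is just the two numbers, they can be read off directly: the record count from the shape and the field count from the dtype. When a genuine 2-D array is needed, the same-typed float fields can be stacked as columns. A sketch using the question's record_types; leaving out the object 'date' field is a choice made here so the result stays float64.

```python
import numpy as np

record_types = np.dtype([
    ('date', object),  # Timestamp
    ('lats', float),   # Latitude
    ('lons', float),   # Longitude
    ('vals', float),   # Value
])
data = np.zeros(10, dtype=record_types)

# Number of records from the shape, number of fields from the dtype.
y, x = data.shape[0], len(data.dtype.names)
print(y, x)

# A real 2-D float array: one column per numeric field (this copies the data).
numeric = np.column_stack([data[name] for name in ('lats', 'lons', 'vals')])
print(numeric.shape, numeric.dtype)
```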

How to construct an np.array with fromiter

I'm trying to construct an np.array by sampling from a Python generator that yields one row of the array per invocation of next. Here is some sample code:
import numpy as np
from numpy.random import randint, shuffle  # assumed source of shuffle/randint
data = np.eye(9)
labels = np.array([0,0,0,1,1,1,2,2,2])
def extract_one_class(X, labels, y):
    """ Take an array of data X, a column vector array of labels, and one particular label y. Return an array of all instances in X that have label y """
    return X[np.nonzero(labels[:] == y)[0], :]
def generate_points(data, labels, size):
    """ Generate and return 'size' pairs of points drawn from different classes """
    label_alphabet = np.unique(labels)
    assert(label_alphabet.size > 1)
    for useless in range(size):
        shuffle(label_alphabet)
        first_class = extract_one_class(data, labels, label_alphabet[0])
        second_class = extract_one_class(data, labels, label_alphabet[1])
        pair = np.hstack((first_class[randint(0, first_class.shape[0]), :],
                          second_class[randint(0, second_class.shape[0]), :]))
        yield pair
points = np.fromiter(generate_points(data, labels, 5), dtype=np.dtype('f8', (2*data.shape[1], 1)))
The extract_one_class function returns a subset of data: all data points belonging to one class label. I would like points to be an np.array with shape (size, 2*data.shape[1]), i.e. one concatenated pair per row. Currently the code snippet above returns an error:
ValueError: setting an array element with a sequence.
The documentation of fromiter says it returns a one-dimensional array. Yet others have used fromiter to construct record arrays in numpy before (e.g. http://iam.al/post/21116450281/numpy-is-my-homeboy).
Am I off the mark in assuming I can generate an array in this fashion? Or is my numpy just not quite right?

As you've noticed, the documentation of np.fromiter explains that the function creates a 1-D array. You won't be able to create a 2-D array that way, and @unutbu's method of returning a 1-D array that you reshape afterwards is a safe bet.
However, you can indeed create structured arrays using fromiter, as illustrated by:
>>> a = zip((1,2,3),(10,20,30))
>>> r = np.fromiter(a, dtype=[('',int),('',int)])
>>> r
array([(1, 10), (2, 20), (3, 30)],
      dtype=[('f0', '<i8'), ('f1', '<i8')])
But look: r.shape == (3,), that is, r is really nothing but a 1-D array of records, each record being composed of two integers. Because all the fields have the same dtype, we can take a view of r as a 2-D array:
>>> r.view((int,2))
array([[ 1, 10],
[ 2, 20],
[ 3, 30]])
So, yes, you could try to use np.fromiter with a dtype like [('',int)]*data.shape[1]: you'll get a 1-D array of length size, which you can then view as a 2-D array with .view((int, data.shape[1])). You can use floats instead of ints; the important part is that all fields have the same dtype.
If you really want it, you can use some fairly complex dtype. Consider for example
a = zip((1,2,3),(10,20,30))  # re-create: the first zip was already consumed
r = np.fromiter(((_,) for _ in a), dtype=[('',(int,2))])
Here, you get a 1D structured array with 1 field, the field consisting of an array of 2 integers. Note the use of (_,) to make sure that each record is passed as a tuple (else np.fromiter chokes). But do you need that complexity?
Note also that since you know the length of the array beforehand (it is size), you should pass it as the optional count argument of np.fromiter for more efficiency.
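The field-per-column recipe from this answer can be sketched end to end. Assumptions: generate_rows is a hypothetical stand-in for generate_points that yields 1-D float rows; each row is converted to a tuple so np.fromiter treats it as one record, and count is passed because the length is known.

```python
import numpy as np

def generate_rows(n_rows, n_cols):
    # Hypothetical generator: one row of n_cols floats per call to next().
    for i in range(n_rows):
        yield np.arange(n_cols, dtype=float) + i

size, n_cols = 5, 3

# One float field per column; all fields share a dtype, so the records
# can afterwards be viewed as a plain 2-D float array.
dt = np.dtype([('', float)] * n_cols)

records = np.fromiter((tuple(row) for row in generate_rows(size, n_cols)),
                      dtype=dt, count=size)

points = records.view((float, n_cols))  # no copy, just a different view
print(records.shape, points.shape)
```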

You could modify generate_points to yield single floats instead of np.arrays, use np.fromiter to form a 1D array, and then use .reshape(size, -1) to make it a 2D array.
size = 5
points = np.fromiter(
    generate_points(data, labels, size), dtype=float).reshape(size, -1)
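If you would rather not change the generator itself, itertools.chain.from_iterable can flatten its rows into single values on the fly. A sketch, assuming a hypothetical generate_rows that yields 1-D float rows the way generate_points does:

```python
import itertools

import numpy as np

def generate_rows(n_rows, n_cols):
    # Hypothetical stand-in for generate_points: yields one 1-D row at a time.
    for i in range(n_rows):
        yield np.arange(n_cols, dtype=float) + 10 * i

size, n_cols = 5, 4

# chain.from_iterable walks through each row element by element, so
# np.fromiter only ever sees single floats; count lets it preallocate.
flat_values = itertools.chain.from_iterable(generate_rows(size, n_cols))
points = np.fromiter(flat_values, dtype=float, count=size * n_cols).reshape(size, -1)
print(points.shape)
```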

Following some suggestions here, I came up with a fairly general drop-in replacement for numpy.fromiter() that satisfies the requirements of the OP:
import numpy as np
def fromiter(iterator, dtype, *shape):
    """Generalises `numpy.fromiter()` to multi-dimensional arrays.
    Instead of the number of elements, the parameter `shape` has to be given,
    which contains the shape of the output array. The first dimension may be
    `-1`, in which case it is inferred from the iterator.
    """
    res_shape = shape[1:]
    if not res_shape:  # Fallback to the "normal" fromiter in the 1-D case
        return np.fromiter(iterator, dtype, shape[0])
    # This wrapping of the iterator is necessary because when used with the
    # field trick, np.fromiter does not enforce consistency of the shapes
    # returned with the '_' field and silently cuts additional elements.
    def shape_checker(iterator, res_shape):
        for value in iterator:
            if value.shape != res_shape:
                raise ValueError("shape of returned object %s does not match"
                                 " given shape %s" % (value.shape, res_shape))
            yield value,
    return np.fromiter(shape_checker(iterator, res_shape),
                       [("_", dtype, res_shape)], shape[0])["_"]
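Newer NumPy releases make the field trick optional: per the NumPy 1.23 release notes, np.fromiter accepts subarray dtypes, so each item can be a whole row and the result comes out 2-D directly. A sketch assuming NumPy 1.23 or later; generate_rows is a hypothetical generator, and the row length of 18 mirrors 2*data.shape[1] from the question.

```python
import numpy as np

def generate_rows(n_rows, n_cols):
    # Hypothetical generator yielding one 1-D float row per call to next().
    for i in range(n_rows):
        yield np.full(n_cols, float(i))

size, n_cols = 5, 18

# A subarray dtype: every item fills one row of n_cols float64 values.
row_dtype = np.dtype((np.float64, n_cols))
points = np.fromiter(generate_rows(size, n_cols), dtype=row_dtype, count=size)
print(points.shape, points.dtype)
```

On older NumPy versions this raises an error, and the structured-field approach above remains the portable option.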
