Say I have a numpy structured array (a.k.a. record array):
record_types = np.dtype([
    ('date', object),  # 00 - Timestamp
    ('lats', float),   # 01 - Latitude
    ('lons', float),   # 02 - Longitude
    ('vals', float),   # 03 - Value
])
data = np.zeros(10, dtype=record_types)
If I try to call the shape attribute, I get (10,)
How can I do something like the following:
y, x = data.shape
To get y = 10 and x = 4
Thanks!
This is one of the confusing things about structured arrays.
You basically have a 1D array (in general, an array with one fewer dimension than the "full" table would have) where each item is a C-like struct.
This type of structure allows for all kinds of useful things (easy file IO with binary formats, for example), but it's quite confusing for a lot of other use cases. For what you're doing, you'd likely be better served by using pandas than directly using a structured array.
That having been said, here's how you'd get what you're asking:
def structured_shape(x):
    if len(x.dtype) > 0:
        # Structured dtype: append the number of fields as an extra axis
        return list(x.shape) + [len(x.dtype)]
    else:
        return x.shape
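For example, with the data array from the question (the second call just shows the plain-array fallback):
print(structured_shape(data))            # [10, 4]
print(structured_shape(np.zeros(10)))    # (10,)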
Apologies if the title is not correct; I didn't know how to describe exactly what I am looking for.
Coming from Matlab, I want to be able to do the following in Python, in one line if possible.
Say I have an index array:
index_array= np.array([0,1,1,0,0,0,1])
and a data array:
data_array = np.zeros([len(index_array),2])
I want to place a value (e.g. 100) into data_array[:, 0] wherever index_array == 0, and into data_array[:, 1] wherever index_array == 1.
In Matlab you could do it in one line, something like:
data_array(index_array)=100
The best I could figure out in Python is this:
data_array[index_array==0, 0] = 100
data_array[index_array==1, 1] = 100
Is it possible to do this more efficiently (w.r.t. lines of code)? It would also be nice if it scaled to additional dimensions in data_array (beyond 2D).
If I understand your question correctly, please try this:
import numpy as np
index_array = np.array([0, 1, 1, 0, 0, 0, 1], dtype=bool)
data_array = np.zeros([len(index_array), 2])
data_array[index_array, :] = 100
print(data_array)
Here, index_array is created as a boolean array. Alternatively, you can convert an integer index_array to boolean with a comparison such as index_array == 0.
Use data_array[index_array,...]=100 if there are additional dimensions.
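For instance, with a 3D data_array and the same boolean index_array as above:
data_array = np.zeros([len(index_array), 2, 4])
data_array[index_array, ...] = 100  # fills the whole 2x4 slab for each True row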
------ CORRECTED VERSION ------
Hopefully I now understand your question. Try this:
import numpy as np
index_array = np.array([0, 1, 1, 0, 2, 0, 1, 2])
data_array = np.zeros([len(index_array), 3])
data_array[np.arange(len(index_array)), index_array] = 100
print(data_array)
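Here np.arange(len(index_array)) supplies the row numbers and index_array picks the matching column for each row, so this scales to any number of columns. The print shows:
[[100.   0.   0.]
 [  0. 100.   0.]
 [  0. 100.   0.]
 [100.   0.   0.]
 [  0.   0. 100.]
 [100.   0.   0.]
 [  0. 100.   0.]
 [  0.   0. 100.]]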
I have a function that accepts multiple 2D arrays and creates two new arrays with the same shape. It was originally written to only support numpy arrays, but was "hacked" to support dask arrays if a "chunks" attribute was seen. A user who was using xarray DataArrays pointed out that this function now returns dask arrays because DataArray's have a "chunks" attribute.
I'm wondering if the dask/xarray experts can tell me what the cleanest way to support all 3 (4?) object types might be without having to duplicate code for each type (numpy array, dask array, xarray with numpy, xarray with dask). Keep in mind the inputs are 2D arrays so the masking operations involved aren't supported out of the box. The related pull request for fixing this is here. Here is what we have so far while trying to avoid adding xarray and dask as required dependencies:
if hasattr(az_, 'chunks') and not hasattr(az_, 'loc'):
    # dask array, but not xarray
    import dask.array as da
    az_ = da.where(top_s > 0, az_ + np.pi, az_)
    az_ = da.where(az_ < 0, az_ + 2 * np.pi, az_)
elif hasattr(az_, 'loc'):
    # xarray
    az_.data[top_s > 0] += np.pi
    az_.data[az_.data < 0] += 2 * np.pi
else:
    az_[top_s > 0] += np.pi
    az_[az_ < 0] += 2 * np.pi
Edit: Is there an attribute that is semi-unique to xarray objects?
OK. You may want to avoid unnecessary dependencies.
I often define a has_dask variable:
try:
    import dask.array as da
    has_dask = True
except ImportError:
    has_dask = False
and then
if has_dask and isinstance(az_, da.Array):
    # ... do something ...
else:
    # ... do something else ...
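Putting that together with the question's code, a minimal sketch (wrap_azimuth is a made-up name; the xarray guard just mirrors the dask one, and xr.where keeps dims/coords on DataArrays):
import numpy as np

try:
    import dask.array as da
    has_dask = True
except ImportError:
    has_dask = False

try:
    import xarray as xr
    has_xarray = True
except ImportError:
    has_xarray = False

def wrap_azimuth(az_, top_s):
    # Hypothetical helper; assumes top_s aligns with az_'s shape/dims.
    if has_xarray and isinstance(az_, xr.DataArray):
        az_ = xr.where(top_s > 0, az_ + np.pi, az_)
        az_ = xr.where(az_ < 0, az_ + 2 * np.pi, az_)
    elif has_dask and isinstance(az_, da.Array):
        az_ = da.where(top_s > 0, az_ + np.pi, az_)
        az_ = da.where(az_ < 0, az_ + 2 * np.pi, az_)
    else:
        az_ = np.where(top_s > 0, az_ + np.pi, az_)
        az_ = np.where(az_ < 0, az_ + 2 * np.pi, az_)
    return az_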
I'm a little late to the party here, but if this is something you're doing a lot, then you might consider a function decorator that will coerce your input array down to an ndarray (or whatever the case may be), run the wrapped function, and maybe even rewrap the result to match the input type before returning it. It's something I've played around with a couple of times, but I kept deciding that I'd rather be able to leverage and support xarray objects when possible. I spent some time looking at xr-scipy when I first started playing with xarray. You might find some patterns in there that are generic enough (or could easily be made so) while adding a little extra something for xarray objects when appropriate.
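A minimal sketch of that decorator idea (the name and the 'loc' duck-check are just for illustration, not an established API):
import functools
import numpy as np

def numpy_in_same_type_out(func):
    # Hypothetical decorator: coerce the first argument down to a plain
    # ndarray, call the wrapped function, then rewrap the result so it
    # matches the input type (here only xarray-like inputs are rewrapped).
    @functools.wraps(func)
    def wrapper(arr, *args, **kwargs):
        original = arr
        if hasattr(arr, 'loc'):  # crude "is this xarray?" check
            arr = arr.data
        result = func(np.asarray(arr), *args, **kwargs)
        if hasattr(original, 'loc'):
            result = original.copy(data=result)  # keep dims/coords
        return result
    return wrapper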
If I have a numpy array x, I can get its data type by using dtype like this:
t = x.dtype
However, that obviously won't work for things like lists. I wonder if there is a standard way of retrieving types for lists and numpy arrays. In the case of lists, I guess this would mean the largest type which fits all of the data. For instance, if
x = [ 1, 2.2 ]
I would want such a method to return float, or better yet numpy.float64.
Intuitively, I thought that this was the purpose of numpy.dtype. However, that is not the case: it is a constructor for creating a dtype, not for extracting one.
The only way I know of to get the type is to wrap whatever object is passed in with numpy.asarray and then read off the dtype:
def dtype(x):
    return numpy.asarray(x).dtype
The issue with this approach, however, is that it will copy the array if it is not already a numpy array. In this circumstance, that is extremely heavy for such a simple operation.
So is there a numpy method that I can use which won't require me to do any list copies?
EDIT
I am designing a library for doing some geometric manipulations... Conversions between rotation matrices, rotation vectors, quaternions, Euler angles, etc.
It can easily happen that the user is simply working with a single rotation vector (which has 3 elements). In that case, they might write something like
q = vectorToQuaternion([ .1, 0, 0 ])
In this case, I would want the output quaternion to be a numpy array of type numpy.float64. However, sometimes to speed up the calculations, the user might want to use a numpy array of float32's:
q = vectorToQuaternion(numpy.float32([ .1, 0, 0 ]))
In which case, I think it is natural to expect that the output is the same type.
The issue is that I cannot use the zeros_like function (or empty_like, etc) because a quaternion has 4 components, while a vector has 3. So internally, I have to do something like
def vectorToQuaternion(v):
    q = empty((4,), dtype=asarray(v).dtype)
    ...
If there was a way of using empty_like which extracts all of the properties of the input, but lets me specify the shape of the output, then that would be the ideal function for me. However, to my knowledge, you cannot specify the shape in the call to empty_like.
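(Note: newer NumPy releases do support exactly this. np.empty_like accepts a shape keyword since NumPy 1.17, so there one can write:)
q = np.empty_like(v, shape=(4,))  # dtype and other properties taken from v, shape overridden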
EDIT
Here are some gists for the class I am talking about, and a test class (so that you can see how I intend to use it).
Class: https://gist.github.com/mholzel/c3af45562a56f2210270d9d1f292943a
Tests: https://gist.github.com/mholzel/1d59eecf1e77f21be7b8aadb37cc67f2
If you really want to do it that way you will probably have to use np.asarray, but I'm not sure that's the most solid way of dealing with the problem. If the user forgets the decimal point and gives [1, 0, 0], then you will be creating integer outputs, which most definitely does not make sense for quaternions. I would default to np.float64, using the dtype of the input if it is an array of some float type, and maybe also giving the option to explicitly pass a dtype:
import numpy as np
def vectorToQuaternion(v, dtype=None):
    if dtype is None:
        # Or, if you prefer:
        # if np.issubdtype(getattr(v, 'dtype', np.int64), np.floating):
        if isinstance(v, np.ndarray) and np.issubdtype(v.dtype, np.floating):
            dtype = v.dtype
        else:
            dtype = np.float64
    q = np.empty((4,), dtype=dtype)
    # ...
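Assuming the elided body fills in the quaternion and returns q, the dtype selection then behaves like this:
q = vectorToQuaternion([.1, 0, 0])                   # q.dtype -> float64 (default)
q = vectorToQuaternion(np.float32([.1, 0, 0]))       # q.dtype -> float32 (preserved)
q = vectorToQuaternion([1, 0, 0], dtype=np.float32)  # explicit override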
I have a function (MyFunct(X)) that, depending on the value of X, will return either a 3D numpy array (e.g. np.ones((5,2,3))) or an empty array (np.array([])).
RetVal = MyFunct(X) # RetVal can be np.array([]) or np.ones((5,2,3))
NB I'm using np.ones((5,2,3)) as a way to generate fake data - in reality the contents of RetVal are all integers.
MyFunct is called with a range of different X values, some of which will lead to an empty array being returned while others don't.
I'd like to create a new 3D numpy array (OUT) which is an n-by-2-by-3 concatenation of all the values returned from MyFunct(). The issue is that trying to concatenate a 3D array and an empty array raises an exception (understandably!) rather than silently doing nothing. There are various ways around this:
Explicitly checking if the RetVal is empty or not and then use np.concatenate()
Using a try/except block and catching exceptions
Adding each value to a list and then post-processing by removing empty entries
But these all feel ugly. Is there an efficient/fast way to do this 'correctly'?
You can reshape the arrays to a compatible shape:
concatenate([MyFunct(X).reshape((-1,2,3)) for X in values])
Example :
In [2]: def MyFunc(X): return ones((5,2,3)) if X%2 else array([])
In [3]: concatenate([MyFunc(X).reshape((-1,2,3)) for X in range(6)]).shape
Out[3]: (15, 2, 3)
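This works because an empty array reshaped with (-1, 2, 3) becomes a (0, 2, 3) array, which contributes nothing to the concatenation:
In [4]: array([]).reshape((-1,2,3)).shape
Out[4]: (0, 2, 3)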
I'm trying to use numpy.fromfile to read a structured array (file header) by passing in a user-defined data-type. For some reason, my structured array elements are coming back as 2D arrays instead of flat 1D arrays:
headerfmt='20i,20f,a80'
dt = np.dtype(headerfmt)
header = np.fromfile(fobj,dtype=dt,count=1)
ints,floats,chars = header['f0'][0], header['f1'][0], header['f2'][0]
# ^? ^? ^?
How do I modify headerfmt so that it will read them as flat 1D arrays?
If the count will always be 1, just do:
header = np.fromfile(fobj, dtype=dt, count=1)[0]
You'll still be able to index by field name, though the repr of the array won't show the field names.
For example:
import numpy as np
headerfmt='20i,20f,a80'
dt = np.dtype(headerfmt)
# Note the 0-index!
x = np.zeros(1, dtype=dt)[0]
print(x['f0'], x['f1'], x['f2'])
ints, floats, chars = x
It may or may not be ideal for your purposes, but it's simple, at any rate.
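A quick demonstration that indexing with [0] is what flattens the fields (same zeros-based stand-in as above):
x = np.zeros(1, dtype=dt)
print(x['f0'].shape)     # (1, 20) - 2D, the problem from the question
print(x[0]['f0'].shape)  # (20,)   - flat 1D after taking element 0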