We all know that "a NumPy array is a multidimensional array of objects, all of the same type."
However, I can create a NumPy array that contains different data types, as in the example below. Can anyone explain how this is possible?
import numpy as np
a = np.array([('a',1),('b',2)],dtype=[('alpha','U11'),('num','i8')])
print(a[0][1]+1)
print(len(a[0][0]))
Output:
2
1
Those are numpy records:
https://numpy.org/doc/stable/user/basics.rec.html
NumPy provides two data structures: homogeneous arrays and structured (a.k.a. record) arrays. The latter, which you just stumbled across, not only lets you mix different data types (float, int, str, etc.) but also provides handy ways to access them, for instance through field labels.
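For instance, here is a short sketch reusing the array a from the question, showing how each field can be pulled out by its label as an ordinary homogeneous array:
import numpy as np

a = np.array([('a', 1), ('b', 2)], dtype=[('alpha', 'U11'), ('num', 'i8')])

# each field comes back as a homogeneous array when accessed by its label
print(a['alpha'])    # ['a' 'b']   (dtype <U11)
print(a['num'] + 1)  # [2 3]       (dtype int64)

# a single record mixes the two types, but each field keeps its own dtype
print(a[0])          # ('a', 1)
print(a.dtype)       # [('alpha', '<U11'), ('num', '<i8')]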
In NumPy these are called structured arrays.
Please read more here:
https://numpy.org/doc/stable/user/basics.rec.html
P.S.: thanks, Brandt.
I am trying to generate an array by applying different functions, all stored in a NumPy array, to the same parameter. Is there an efficient way to code this using NumPy?
# func_array - a numpy array of different functions that take the same parameter
# X - the parameter passed to every function in func_array
def apply_all(func_array, X):
    return func_array(X)  # desired behaviour (does not actually work):
                          # index i of the result should hold func_array[i](X)
The only solution I have thought of is iterating through func_array, and I wonder if there is a faster way of doing it.
I once had the exact same question, and this is what I was told:
The vectorization speed-up that NumPy array operations provide comes from the fixed base data type defined for the array (an array of floats, for instance).
When the array elements are objects, this advantage is mostly nullified. Since functions are objects, func_array is an array of objects. Thus any other method will hardly provide any speedup over iteration.
This is what I've learnt. I'm open to more experienced advice.
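To make that concrete, here is a minimal sketch (np.sin, np.cos and np.tan are just made-up stand-ins for your functions) comparing plain iteration with np.vectorize; both run a Python-level loop under the hood, so neither gives a real vectorization speedup:
import numpy as np

# hypothetical example: an object array of functions and one shared parameter
func_array = np.array([np.sin, np.cos, np.tan], dtype=object)
X = 0.5

# plain iteration: index i of the result holds func_array[i](X)
result_loop = np.array([f(X) for f in func_array])

# np.vectorize is only a convenience wrapper around the same kind of loop
apply_all = np.vectorize(lambda f: f(X))
result_vec = apply_all(func_array)

print(result_loop)
print(result_vec)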
I have a Python program that needs to pass an array to a .dll that is expecting an array of C doubles. This is currently done with the following code, which is the fastest conversion method I could find:
from array import array
from ctypes import *
import numpy as np
python_array = np.array(some_python_array)
temp = array('d', python_array.astype('float'))
c_double_array = (c_double * len(temp)).from_buffer(temp)
...where np.array is just there to show that in my case python_array is a NumPy array. Let's say I now have two c_double arrays, c_double_array_a and c_double_array_b. The issue I'm having is that I would like to append c_double_array_b to c_double_array_a without converting back to/from whatever Python typically uses for arrays. Is there a way to do this with the ctypes library?
I've been reading through the docs here, but nothing seems to cover combining two ctypes arrays after creation. It is very important in my program that they can be combined after creation; of course, it would be trivial to just append python_array_b to python_array_a and then convert, but that won't work in my case.
Thanks!
P.S. If anyone knows a way to speed up the conversion code, that would also be greatly appreciated; it takes on the order of 150 ms per million elements, and my program typically handles 1-10 million elements at a time.
Leaving aside the construction of the ctypes arrays (for which Mark's comment is surely relevant), the issue is that C arrays are not resizable: you can't append or extend them. (There do exist wrappers that provide these features, which may be useful references in constructing this.) What you can do is make a new array of the size of the two existing arrays together and then ctypes.memmove them into it. It might be possible to improve the performance by using realloc, but you'd have to go even lower than normal ctypes memory management to use it.
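A minimal sketch of that memmove approach, with two small placeholder arrays standing in for c_double_array_a and c_double_array_b:
import ctypes

# placeholder arrays standing in for c_double_array_a / c_double_array_b
a = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
b = (ctypes.c_double * 2)(4.0, 5.0)

# allocate a new array big enough for both, then copy each block into place
combined = (ctypes.c_double * (len(a) + len(b)))()
ctypes.memmove(combined, a, ctypes.sizeof(a))
ctypes.memmove(ctypes.byref(combined, ctypes.sizeof(a)), b, ctypes.sizeof(b))

print(list(combined))  # [1.0, 2.0, 3.0, 4.0, 5.0]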
I came across an oddity when loading a .mat file created in Matlab into Python with scipy.io.loadmat. I found similar 'array structures' alluded to in other posts, but none that explained them. I also found ways to work around this oddity, but I would like to understand why Python (or scipy.io.loadmat) handles files this way.
Let's say I create a cell in Matlab and save it:
my_data = cell(dim1, dim2);
% Fill my_data with strings and floats...
save('my_data.mat','my_data')
Now I load it into Python:
import scipy.io as sio
data = sio.loadmat('my_data.mat')['my_data']
Now data has type numpy.ndarray and dtype object. When I look at a slice, it might look something like this:
data[0]
>>> array([array(['Some descriptive string'], dtype='<U13'),
array([[3.141592]]), array([[2.71828]]), array([[4.66920]]), etc.
], dtype=object).
Why is this happening? Why does Python/sio.loadmat create an array of single-element arrays, rather than an array of floats (assuming I remove the first column, which contains strings)?
I'm sorry if my question is basic, but I'd really like to understand what seems like an unnecessary complication.
As said in the comments:
This behaviour arises because you are saving a cell, an "array" that can contain anything inside. You filled it with 1x1 matrices (floats).
That is exactly what Python is giving you: an ndarray of dtype=object that contains 1x1 arrays.
Python is doing exactly what MATLAB was doing. For this example, you should just avoid using cells in MATLAB.
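If the goal is just to get plain scalars out, two common options are the squeeze_me flag of loadmat and unwrapping the 1x1 arrays yourself; a sketch, assuming my_data.mat holds the cell from the question with the strings in the first column:
import numpy as np
import scipy.io as sio

# option 1: ask loadmat to squeeze away unit dimensions (including inside cells)
data = sio.loadmat('my_data.mat', squeeze_me=True)['my_data']

# option 2: keep the original load and unwrap the 1x1 numeric arrays yourself,
# skipping the first column, which holds the descriptive strings
raw = sio.loadmat('my_data.mat')['my_data']
numeric = np.array([[cell.item() for cell in row[1:]] for row in raw])
print(numeric.dtype)  # float64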
I would like to know whether numbers bigger than what int64 or float128 can hold can be correctly processed by NumPy functions.
EDIT: I mean NumPy functions applied to numbers/Python objects outside of any NumPy array, like using a np function inside a list comprehension over the contents of a list of int128-sized integers.
I can't find anything about this in the docs, and I don't know what to think or expect. From tests it seems to work, but I want to be sure, and a few trivial tests won't settle it. So I come here for knowledge:
If the NumPy framework does not handle such big numbers natively, are its functions able to deal with them anyway?
EDIT: sorry, I wasn't clear. Please see the edit above
Thanks in advance.
See the Extended Precision heading in the NumPy documentation here. For very large numbers, you can also create an array with its dtype set to 'object', which essentially lets you use the NumPy framework on the large numbers, but with lower performance than native types. As has been pointed out, though, this will break when you call a function that is not supported by the particular objects stored in the array.
import numpy as np
arr = np.array([10**105, 10**106], dtype='object')
But the short answer is that you can and will get unexpected behavior when using these large numbers unless you take special care to account for them.
When storing a number into a NumPy array whose dtype is not sufficient to hold it, you will get either truncation or an error:
arr = np.empty(1, dtype=np.int64)
arr[0] = 2**65
arr
Gives OverflowError: Python int too large to convert to C long.
arr = np.empty(1, dtype=np.float16)
arr[0] = 2**64
arr
Gives inf (and no error)
arr[0] = 2**15 + 2
arr
Gives [ 32768.] (i.e., 2**15), so truncation occurred. It would be harder for this to happen with float128...
You can have NumPy arrays of Python objects, including Python integers too big to fit in np.int64. Some of NumPy's functionality will work on such arrays, but many functions call underlying C code that will not. Here is an example:
import numpy as np
a = np.array([123456789012345678901234567890]) # a has dtype object now
print((a*2)[0]) # Works and gives the right result
print(np.exp(a)) # Does not work, because "'int' object has no attribute 'exp'"
Generally, most functionality will probably be lost for your extremely large numbers. Also, as has been pointed out, when an array has a dtype of np.int64 or similar, you will run into overflow problems as soon as elements exceed that type's limit. With NumPy, you have to be careful about what your array's dtype is!
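A small sketch of that boundary, using an explicit dtype=object so the behaviour does not depend on how your NumPy version infers the dtype:
import numpy as np

# integers far larger than int64 can hold, stored as Python objects
a = np.array([10**30, 10**31], dtype=object)

print(a + 1)    # works: falls back to Python's arbitrary-precision int addition
print(a.sum())  # works: the reduction also goes element by element in Python

try:
    np.exp(a)   # fails: there is no object loop, so the ufunc looks for an exp() method
except (AttributeError, TypeError) as e:
    print(e)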
I have two datasets that I need to correlate in Python. One comes from a .mat file and the other from a list of .bin files. From these datasets I have created two 3D arrays with the same extent (120x112x244). While familiar with Python, I have not worked with such datasets before, and thus am seeking advice on how to correlate these arrays. I attempted numpy.correlate and received:
"ValueError: object too deep for desired array"
Any suggestions would be greatly appreciated
One idea I would try is to flatten the 3D matrices first, then use correlate, since correlate only takes 1D vectors.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.correlate.html.
Let's say your two matrices are called A and B.
>>> import numpy
>>> array_a = numpy.ndarray.flatten(A)
>>> array_b = numpy.ndarray.flatten(B)
>>> results = numpy.correlate(array_a, array_b)
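If what you actually want is a single measure of similarity between the two datasets rather than a sliding cross-correlation, np.corrcoef on the flattened arrays is another common choice; a sketch with random stand-ins for the two 120x112x244 arrays:
import numpy as np

# random stand-ins for the two 120x112x244 arrays from the question
A = np.random.rand(120, 112, 244)
B = np.random.rand(120, 112, 244)

a = A.ravel()
b = B.ravel()

# np.correlate in its default 'valid' mode returns a single value here:
# the dot product of the two flattened arrays
print(np.correlate(a, b))

# Pearson correlation coefficient between the two flattened datasets
print(np.corrcoef(a, b)[0, 1])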