If I construct a numpy matrix like this:
A = array([[1,2,3],[4,5,6]])
and then type A.shape I get the result:
(2L, 3L)
Why am I getting a shape with the format long?
I can restart everything and I still have the same problem. As far as I can see, it only happens when I construct arrays; otherwise I get short (regular) integers.
As @CédricJulien puts it in the comments, there is no problem with the long numbers in this case; it should be treated as an implementation detail. The most likely cause is Python 2 on 64-bit Windows, where a C long is only 32 bits: numpy's 64-bit dimensions do not fit in a plain int, so they are returned as Python longs. Either way, the fact that the dimensions are long should not matter for any use you have for the arrays or these indexes.
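For what it's worth, the long values behave exactly like plain ints; a small sketch of a Python 2 session (the array values are arbitrary):
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
print(A.shape)            # (2L, 3L) on a 64-bit Windows build of Python 2
print(A.shape[0] == 2)    # True -- long and int compare equal
print(A[A.shape[0] - 1])  # [4 5 6] -- perfectly usable as an index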
I came across an oddity when loading a .mat file created in MATLAB into Python with scipy.io.loadmat. I found similar 'array structures' alluded to in other posts, but none that explained them. I also found ways to work around this oddity, but I would like to understand why Python (or scipy.io.loadmat) handles files this way.
Let's say I create a cell in Matlab and save it:
my_data = cell(dim1, dim2);
% Fill my_data with strings and floats...
save('my_data.mat','my_data')
Now I load it into Python:
import scipy.io as sio
data = sio.loadmat('my_data.mat')['my_data']
Now data has type numpy.ndarray and dtype object. When I look at a slice, it might look something like this:
data[0]
>>> array([array(['Some descriptive string'], dtype='<U13'),
array([[3.141592]]), array([[2.71828]]), array([[4.66920]]), etc.
], dtype=object)
Why is this happening? Why does Python/sio.loadmat create an array of single-element arrays, rather than an array of floats (assuming I remove the first column, which contains strings)?
I'm sorry if my question is basic, but I'd really like to understand what seems like an unnecessary complication.
As said in the comments:
This behaviour arises because you are saving a cell, an "array" that can contain anything. You fill it with 1x1 matrices (floats), and that is exactly what Python gives you back: an ndarray of dtype=object that contains 1x1 arrays.
Python is doing exactly what MATLAB was doing. For this example, you should simply avoid using cells in MATLAB.
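If you want plain floats on the Python side, you can unwrap the 1x1 cells yourself; a minimal sketch, assuming the layout from the question (strings in the first column, numbers after it):
import numpy as np
import scipy.io as sio

data = sio.loadmat('my_data.mat')['my_data']  # object array of 1x1 cells

# .item() unwraps each 1x1 array into its scalar; skip the string column:
row = data[0]
floats = np.array([cell.item() for cell in row[1:]])
print(floats.dtype)  # float64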
I am interested in sorting a matrix of type
a = array([[1,2,3],[4,5,6],[0,0,1]])
by some column, as discussed here. One straightforward answer given there is
a[a[:,1].argsort()]
However, this seems to break the array in some cases as also commented there. In my case I start with a np.array of .shape (a,b). After the above code, I end up with an array of .shape (a,1,b). What are potential reasons for this behaviour?
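One plausible cause (a sketch, not the only possibility): if the index array used for fancy indexing is 2-D, for example because the column was sliced as a[:, 1:2] instead of a[:, 1], or because a is a np.matrix, it is broadcast against the rows and an extra dimension appears:
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 1]])

idx = a[:, 1].argsort()           # 1-D index, shape (3,)
print(a[idx].shape)               # (3, 3) -- rows sorted as expected

idx2 = a[:, 1:2].argsort(axis=0)  # 2-D index, shape (3, 1)
print(a[idx2].shape)              # (3, 1, 3) -- the extra dimension from the question

print(a[idx2.ravel()].shape)      # (3, 3) -- flattening the index fixes it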
I would like to know whether numbers bigger than what int64 or float128 can hold can be correctly processed by numpy functions.
EDIT: numpy functions applied to numbers/python objects outside of any numpy array. Like using a np function in a list comprehension that applies to the content of a list of int128?
I can't find anything about that in their docs, but I really don't know what to think or expect. From tests, it seems to work, but I want to be sure, and a few trivial tests won't settle it. So I come here for knowledge:
If np framework is not handling such big numbers, are its functions able to deal with these anyway?
EDIT: sorry, I wasn't clear. Please see the edit above
Thanks in advance.
See the Extended Precision heading in the NumPy documentation. For very large numbers, you can also create an array with dtype set to object, which essentially lets you use the NumPy framework on the large numbers, but with lower performance than native types. As has been pointed out, though, this will break when you try to call a function not supported by the particular object stored in the array.
import numpy as np
arr = np.array([10**105, 10**106], dtype=object)  # elements stay arbitrary-precision Python ints
But the short answer is that you can, although you will get unexpected behavior when using these large numbers unless you take special care to account for them.
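A quick illustration of both halves of that answer (the values are arbitrary):
import numpy as np

big = np.array([10**105, 10**106], dtype=object)
print(big + 1)        # exact arbitrary-precision arithmetic, no overflow
print(big * big)      # still exact

as_float = np.array([10**105, 10**106], dtype=np.float64)
print(as_float ** 4)  # [inf inf] -- float64 overflows past ~1.8e308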
When storing a number into a numpy array whose dtype cannot represent it, you will get truncation or an error:
import numpy as np

arr = np.empty(1, dtype=np.int64)
arr[0] = 2**65  # does not fit in 64 bits
arr
Gives OverflowError: Python int too large to convert to C long.
arr = np.empty(1, dtype=np.float16)
arr[0] = 2**64
arr
Gives inf (and no error)
arr[0] = 2**15 + 2
arr
Gives [ 32768.] (i.e., 2**15), so truncation occurred. It would be harder for this to happen with float128...
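If you want to check a dtype's limits up front, np.iinfo and np.finfo report them; a small sketch:
import numpy as np

print(np.iinfo(np.int64).max)    # 9223372036854775807
print(np.finfo(np.float16).max)  # 65504.0 -- why 2**64 became inf above
# float16 spacing grows with magnitude, which is why 2**15 + 2 rounded to 2**15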
You can have numpy arrays of Python objects, which could be Python integers too big to fit in np.int64. Some of numpy's functionality will work, but many functions call underlying C code that will not. Here is an example:
import numpy as np
a = np.array([123456789012345678901234567890]) # a has dtype object now
print((a*2)[0]) # Works and gives the right result
print(np.exp(a)) # Does not work, because "'int' object has no attribute 'exp'"
Generally, most functionality will probably be lost for your extremely large numbers. Also, as has been pointed out, when you have an array with a dtype of np.int64 or similar, you will get overflow problems when the array elements grow past that type's limit. With numpy, you have to be careful about what your array's dtype is!
I have a function that I want to have quickly access the first (aka zeroth) element of a given Numpy array, which itself might have any number of dimensions. What's the quickest way to do that?
I'm currently using the following:
a.reshape(-1)[0]
This reshapes the perhaps-multi-dimensional array into a 1D array and grabs the zeroth element, which is short, sweet and often fast. However, I think this would work poorly with some arrays, e.g., an array that is a transposed view of a large array, as I worry this would end up needing to create a copy rather than just another view of the original array, in order to get everything in the right order. (Is that right? Or am I worrying needlessly?) Regardless, it feels like this is doing more work than what I really need, so I imagine some of you may know a generally faster way of doing this?
Other options I've considered are creating an iterator over the whole array and drawing just one element from it, or creating a vector of zeroes containing one zero for each dimension and using that to fancy-index into the array. But neither of these seems all that great either.
a.flat[0]
This should be pretty fast and never require a copy. (Note that a.flat is an instance of numpy.flatiter, not an array, which is why this operation can be done without a copy.)
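A small sketch of the transposed-view case the question worries about (the numbers are arbitrary):
import numpy as np

a = np.arange(12).reshape(3, 4).T  # a non-contiguous, transposed view

print(a.flat[0])         # 0 -- read through the iterator, no copy needed
print(a.reshape(-1)[0])  # 0 too, but reshape must copy here, because the
                         # transposed data is not contiguous in memory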
You can use a.item(0); see the documentation at numpy.ndarray.item.
A possible disadvantage of this approach is that the return value is a Python data type, not a numpy object. For example, if a has data type numpy.uint8, a.item(0) will be a Python integer. If that is a problem, a.flat[0] is better; see @user2357112's answer.
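To see the difference in return types, a minimal sketch:
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)

print(type(a.item(0)))  # <class 'int'> -- plain Python int
print(type(a.flat[0]))  # <class 'numpy.uint8'> -- numpy scalar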
np.hsplit(x, 2)[0]
Note that np.hsplit(x, 2) splits the array into two halves along its columns, so this returns the whole first half, not a single element.
Source: https://numpy.org/doc/stable/reference/generated/numpy.hsplit.html
## y -- numpy array of shape (1, Ty)
If you want the first element of the shape tuple (here, 1), use y.shape[0]; for the second (Ty), use y.shape[1]. Note that this gives you the array's dimensions, not its data elements.
You can also use np.take for more complicated extraction (to get a few elements):
numpy.take(a, indices, axis=None, out=None, mode='raise')
Take elements from an array along an axis.
Source: https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
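For example (the values are arbitrary):
import numpy as np

a = np.array([[10, 20], [30, 40]])

print(np.take(a, 0))       # 10 -- with axis=None the array is treated as flat
print(np.take(a, [0, 3]))  # [10 40] -- several elements at once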