I'm trying to fill an array with integers, but it seems like NumPy arrays keep turning the integers into floats. Why is this happening, and how do I stop it?
import numpy as np

arr = np.empty(9)
arr[3] = 7
print(arr[3])   # 7.0
NumPy arrays, unlike Python lists, can contain only a single type, which (as far as I know) is set at creation time. Everything you put into the array gets converted to that type.
By default, the data type is float64. To set another type, pass dtype to np.empty like this:
>>> arr = np.empty(9, dtype=int)
>>> arr[3] = 7
>>> arr[3]
7
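If the array already exists as floats, another option (a minimal sketch, not part of the original answer) is to convert it afterwards with astype, which returns a new integer copy:

import numpy as np

arr = np.zeros(9)            # float64 by default
arr_int = arr.astype(int)    # returns a new integer copy; `arr` is unchanged
print(arr_int.dtype)         # e.g. int64 (platform dependent)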
Related
I am creating a ndarray using:
import numpy as np
arr=np.array({1,2})
print(arr, type(arr))
which outputs
{1, 2} <class 'numpy.ndarray'>
If its type is numpy.ndarray, shouldn't the output be in square brackets, like [1, 2]?
Yes, but that's because you passed np.array a set, not a list.
If you try this:
import numpy as np
arr=np.array([1,2])
print(arr, type(arr))
you get:
[1 2] <class 'numpy.ndarray'>
This does something slightly different from what you might imagine. Instead of constructing an array with the data you specify (the numbers 1 and 2), you're actually building an array of dtype object. See below:
>>> np.array({1, 2}).dtype
dtype('O')
This is because sets are not "array-like", in NumPy's terminology, in particular they are not ordered. Thus the array construction does not build an array with the contents of the set, but with the set itself as a single object.
If you really want to build an array from the set's contents you could do the following:
>>> x = np.fromiter(iter({1, 2}), dtype=int)
>>> x.dtype
dtype('int64')
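One caveat (my addition, not from the original answer): set iteration order is arbitrary, so the element order of the resulting array is not guaranteed. Sorting first gives a deterministic result:

import numpy as np

s = {3, 1, 2}
x = np.array(sorted(s))   # sorted() returns a list, which is array-like
print(x)                  # [1 2 3]
print(x.dtype)            # e.g. int64 (platform dependent)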
Edit: This answer helps explain how various types are used to build an array in NumPy.
It returns a numpy array object with no dimensions. A set is an object. It is similar to passing numpy.array a number (without brackets). See the difference here:
arr = np.array([1])
arr.shape   # (1,)
arr = np.array(1)
arr.shape   # ()
arr = np.array({1, 2})
arr.shape   # ()
Therefore, NumPy treats your entire set as a single object and creates an array with no dimensions that only holds the set. Sets are not array-like and have no order, so, per the numpy array docs, they are not converted to arrays the way you expect. If you wish to create a numpy array from a set and you do not care about its order, use:
arr = np.fromiter({1, 2}, int)
arr.shape   # (2,)
The repr display of ipython may make this clearer:
In [162]: arr=np.array({1,2})
In [163]: arr
Out[163]: array({1, 2}, dtype=object)
arr is a 0d array, object dtype, containing one item: the set.
But if we first turn the set into a list:
In [164]: arr=np.array(list({1,2}))
In [165]: arr
Out[165]: array([1, 2])
now we have a 1d (2,) integer dtype array.
np.array(...) converts list (and list-like) arguments into a multidimensional array. A set is not sufficiently list-like.
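To illustrate the 0d case further (a small sketch of my own), the wrapped object can be recovered with .item():

import numpy as np

arr = np.array({1, 2})   # 0d object array wrapping the set
s = arr.item()           # pull the stored object back out
print(type(s), s)        # <class 'set'> {1, 2}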
I'm trying to use prange in order to process multiple strings.
As it is not possible to do this with a python list, I'm using a numpy array.
With an array of floats, this function works:
from cython.parallel import prange
cimport numpy as np
from numpy cimport ndarray as ar

cpdef func_float(ar[np.float64_t, cast=True] x, double alpha):
    cdef int i
    for i in prange(x.shape[0], nogil=True):
        x[i] = alpha * x[i]
    return x
When I try this simple one:
cpdef func_string(ar[np.str, cast=True] x):
    cdef int i
    for i in prange(x.shape[0], nogil=True):
        x[i] = x[i] + str(i)
    return x
I'm getting this
>> func_string(x = np.array(["apple","pear"],dtype=np.str))
File "processing.pyx", line 8, in processing.func_string
cpdef func_string(ar[np.str,cast=True] x):
ValueError: Item size of buffer (20 bytes) does not match size of 'str object' (8 bytes)
I'm probably missing something, and I can't find an alternative to str.
Is there a way to properly use prange with an array of strings?
Besides the fact that your code should fail to cythonize, because you try to create a Python object (i.e. str(i)) without the GIL, your code isn't doing what you think it does.
In order to analyse what is going on, let's take a look at a much simpler Cython version:
%%cython -2
cimport numpy as np
from numpy cimport ndarray as ar

cpdef func_string(ar[np.str, cast=True] x):
    print(len(x))
From your error message, one can deduce that you are using Python 3 and that the Cython extension is built with language_level=2 (still the default), thus I'm using -2 in the %%cython magic cell.
And now:
>>> x = np.array(["apple", "pear"], dtype=np.str)
>>> func_string(x)
ValueError: Item size of buffer (20 bytes) does not match size of 'str object' (8 bytes)
What is going on?
x is not what you think it is
First, let's take a look at x:
>>> x.dtype
dtype('<U5')
So x isn't a collection of unicode objects. An element of x consists of 5 unicode characters, and those elements are stored contiguously in memory, one after another. What is important: it is the same information as in unicode objects, but stored in a different memory layout.
This is one of numpy's quirks and how np.array works: every element in the list is converted to a unicode object, then the maximal size of the elements is calculated, and the dtype (in this case <U5) is derived and used.
np.str is interpreted differently in cython code (ar[np.str] x) (twice!)
First difference: in your Python 3 code np.str stands for unicode, but in your Cython code, which is cythonized with language_level=2, np.str stands for bytes (see the docs).
Second difference: seeing np.str, Cython will interpret it as an array of Python objects (maybe this should be seen as a Cython bug); it is almost the same as if the dtype were np.object, the only actual difference being slightly different error messages.
With this information we can understand the error message. At runtime, the input array is checked (before the first line of the function is executed!):
expected: an array of Python objects, i.e. 8-byte pointers, i.e. an array with an element size of 8 bytes
received: an array with an element size of 5*4 = 20 bytes (one unicode character takes 4 bytes)
Thus the cast cannot be done, and the observed exception is thrown.
you cannot change the size of an element in an <U..-numpy-array:
Now let's take a look at the following:
>>> x = np.array(["apple", b"pear"], dtype=np.str)
>>> x[0] = x[0]+str(0)
>>> x[0]
'apple'
The element didn't change, because the string x[0]+str(0) was truncated when written back into the x array: there is only room for 5 characters! It would work (to some degree, as long as the resulting string has no more than 5 characters) with "pear" though:
>>> x[1] = x[1]+str(1)
>>> x[1]
'pear0'
Where does this all leave you?
you probably want to use bytes and not unicode (i.e. dtype=np.bytes_)
given that you don't know the element size of your numpy array at compile time, you should declare the input array x as ar x in the signature and roll out the runtime checks, similarly to what is done in Cython's "deprecated" numpy tutorial.
if the changes should be done in place, the elements of the input array must be big enough for the resulting strings.
None of the above has anything to do with prange, though. To use prange you cannot use str(i), because it operates on Python objects.
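If the goal is simply to append the index to each string and parallelism is not essential, here is a plain-NumPy sketch (my suggestion, not part of the original answer) that sidesteps the fixed element size, because np.char.add allocates a new array with a wide-enough dtype:

import numpy as np

x = np.array(["apple", "pear"])          # dtype <U5
idx = np.arange(x.shape[0]).astype(str)  # ['0', '1']
result = np.char.add(x, idx)             # element-wise concatenation into a new, wider array
print(result)                            # ['apple0' 'pear1']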
I have a function which works on arrays with more than one item, but fails if the array contains only one item. Let's consider this example
import numpy as np

def checker(a):
    a[a > 5] = np.nan
    return a

a = np.arange(10, dtype=float)  # float dtype, so the array can hold np.nan
a = checker(a)
Works, but
a = 1
a = checker(a) # fails
and gives
Traceback (most recent call last):
a[a>5] = np.nan
TypeError: 'int' object does not support item assignment
I'd like to handle it like MATLAB, and NOT like this version of checker(), which has 4x more lines than the version above.
def checker(a):
    try:
        a[a > 5] = np.nan
    except TypeError:
        if a > 5: a = np.nan
    return a
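For reference, one uniform alternative is a sketch using np.asarray and np.where (my addition, not from the original post):

import numpy as np

def checker(a):
    a = np.asarray(a, dtype=float)     # accepts scalars, lists, and arrays alike
    return np.where(a > 5, np.nan, a)  # no item assignment needed

print(checker(1))                # 1.0
print(checker(np.arange(10.)))   # [ 0.  1.  2.  3.  4.  5. nan nan nan nan]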
In MATLAB everything has at least 2 dimensions; in numpy, indexing can reduce the number of dimensions.
np.shape(1)
is (). This is the same as np.array(1).shape, i.e. the shape (size in MATLAB terms) of a single-element array. It is 0d, as opposed to 2d in MATLAB.
a = np.empty(np.shape(1))*np.nan
# a = np.array(np.nan) does the same thing
is nan, a single element array with value nan.
a[False]
displays as array([], dtype=float), with shape (0,); it's now 1d, but without any elements.
With a 0d array, the only meaningful indexing is a[()] which returns the element, nan, a np.float64. a.item() does the same.
And for assignment purposes I can't find a way to change the value of that item
a[???] = 0
Correction: an ellipsis can be used, since it stands in for any number of : (including none).
a[...] = 0
# array(0.)
(You don't want a = 0, since that just reassigns the variable.)
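Putting those pieces together, a small runnable sketch (my own):

import numpy as np

a = np.array(np.nan)   # 0d array
print(a[()])           # nan, as a numpy scalar (np.float64)
print(a.item())        # nan, as a plain Python float
a[...] = 0             # in-place assignment without rebinding the name
print(a)               # array(0.)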
In general 0d arrays like this are possible, but they are rarely useful.
I'm not entirely sure what you are trying to do (I don't have a working Octave session at the moment). But this difference in how dimensions change with indexing is the key to your problems.
a = np.array([np.nan])
a[np.array([False])] = 0 # no change
a[np.array([True])] = 0 # change
Note that I made the boolean index an array, not just a scalar or a list. This is closer to your MATLAB boolean index.
To create an empty array filled with nans, you can use ndarray.fill:
a = np.empty(np.shape(1))
a.fill(np.nan)
b = False
a[b] = 10
You were getting an error because a wasn't an array, it was a float.
When you multiply a scalar array by a scalar, numpy coerces the result into a scalar:
>>> a = np.array(10.0)
>>> a
array(10.0)
>>> a * 2
20.0
If you need to keep the scalar as an array, you can use np.asarray(a * 2)
(note: the original question was a bit different, to which the other answer applies; see the revision history for the original question.)
Is there a uniform way to index numpy arrays, when these arrays could be scalar as well?
I'm trying to write a function that deals with a float, a list of floats, or a 0/1D numpy array. To deal with that uniformly, I use numpy.asarray(), which works fine overall (I don't mind returning a numpy.float64 when the input is a standard Python float).
Problems arise when I need to deal with conditional operations and an intermediate array function, something like:
value = np.asarray(5.5)
mask = value > 5
tmpvalue = np.asarray(np.cos(value))
tmpvalue[mask] = value
This will throw an exception:
Traceback (most recent call last):
File "testscalars.py", line 27, in <module>
tmpvalue[mask] = value
IndexError: 0-d arrays can't be indexed
Is there any elegant solution to this?
It turns out this problem pertains to numpy 1.8 and before; upgrading to numpy 1.9(.2) fixes this.
The numpy 1.9 release notes have this to say:
Boolean indexing into scalar arrays will always return a new 1-d array. This means that array(1)[array(True)] gives array([1]) and not the original array.
which conveniently turns tmpvalue[mask] temporarily into a 1D array, so that value can be assigned to it:
tmpvalue[mask] = value
While not the actual answer to the question asked, the following is essentially what bit me and caused (Type)errors:
value = numpy.asarray(5.5)
mask = value > 5
tmpvalue = numpy.cos(value)
tmpvalue[mask] = value[mask]
The problem here is that value is of type numpy.ndarray, but since it's a 0-d array, numpy.cos returns a NumPy scalar, which can't be indexed.
I think this numpy issue is directly related to this problem.
For now, it appears that the simplest solution is to wrap the numpy ufuncs with numpy.asarray:
value = numpy.asarray(5.5)
mask = value > 5
tmpvalue = numpy.asarray(numpy.cos(value))
tmpvalue[mask] = value[mask]
which I've tested successfully with inputs 5.5, 4.5, [5.5], [4.5] and [4.5, 5.5].
Note that this behaviour also applies to even more common operations, like addition:
>>> x = numpy.asarray(5)
>>> y = numpy.asarray(6)
>>> z = x + y
>>> type(x), type(y), type(z)
(<class 'numpy.ndarray'>, <class 'numpy.ndarray'>, <class 'numpy.int64'>)
I came across the following oddity in numpy which may or may not be a bug:
import numpy as np
dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)
type(a['tuple'][0]) # ndarray
type(a[0]['tuple']) # ndarray
a['tuple'][0] = (1,2) # ok
a[0]['tuple'] = (1,2) # ValueError: shape-mismatch on array construction
I would have expected both of the options above to work.
Opinions?
I asked that on the numpy-discussion list. Travis Oliphant answered here.
Citing his answer:
The short answer is that this is not really a "normal" bug, but it could be considered a "design" bug (although the issues may not be straightforward to resolve). What that means is that it may not be changed in the short term, and you should just use the first spelling.
Structured arrays can be a confusing area of NumPy for several reasons. You've constructed an example that touches on several of them. You have a data type that is a "structure" array with one member ("tuple"). That member contains a 2-vector of integers.
First of all, it is important to remember that with Python, doing
a['tuple'][0] = (1,2)
is equivalent to
b = a['tuple']; b[0] = (1,2)
In like manner,
a[0]['tuple'] = (1,2)
is equivalent to
b = a[0]; b['tuple'] = (1,2)
To understand the behavior, we need to dissect both code paths and what happens. You built a (3,) array of those elements in 'a'. When you write b = a['tuple'] you should probably be getting a (3,) array of (2,)-integers, but as there is currently no formal dtype support for (n,)-integers as a general dtype in NumPy, you get back a (3,2) array of integers which is the closest thing that NumPy can give you. Setting the [0] row of this object via
a['tuple'][0] = (1,2)
works just fine and does what you would expect.
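For reference, a minimal runnable sketch of this first (working) spelling:

import numpy as np

dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)
b = a['tuple']          # a (3, 2) integer view of the field
b[0] = (1, 2)           # writes through to `a`
print(a['tuple'][0])    # [1 2]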
On the other hand, when you type:
b = a[0]
you are getting back an array-scalar which is a particularly interesting kind of array scalar that can hold records. This new object is formally of type numpy.void and it holds a "scalar representation" of anything that fits under the "VOID" basic dtype.
For some reason:
b['tuple'] = [1,2]
is not working. On my system I'm getting a different error: TypeError: object of type 'int' has no len()
I think this should be filed as a bug on the issue tracker which is for the time being here: http://projects.scipy.org/numpy
The problem is ultimately the void->copyswap function being called in voidtype_setfields if someone wants to investigate. I think this behavior should work.
An explanation for this is given in a numpy bug report.
I get a different error than you do (using numpy 1.7.0.dev):
ValueError: setting an array element with a sequence.
so the explanation below may not be correct for your system (or it could even be the wrong explanation for what I see).
First, notice that indexing a row of a structured array gives you a numpy.void object (see data type docs)
import numpy as np
dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)
print(type(a[0]))  # numpy.void
From what I understand, void is sort of like a Python list since it can hold objects of different data types, which makes sense since the columns in a structured array can be different data types.
If, instead of indexing, you slice out the first row, you get an ndarray:
print(type(a[:1]))  # numpy.ndarray
This is analogous to how Python lists work:
b = [1, 2, 3]
print(b[0])   # 1
print(b[:1])  # [1]
Slicing returns a shortened version of the original sequence, but indexing returns an element (here, an int; above, a void type).
So when you slice into the rows of the structured array, you should expect it to behave just like your original array (only with fewer rows). Continuing with your example, you can now assign to the 'tuple' columns of the first row:
a[:1]['tuple'] = (1, 2)
So,... why doesn't a[0]['tuple'] = (1, 2) work?
Well, recall that a[0] returns a void object. So, when you call
a[0]['tuple'] = (1, 2) # this line fails
you're assigning a tuple to the 'tuple' element of that void object. Note: despite the fact you've called this index 'tuple', it was stored as an ndarray:
print(type(a[0]['tuple']))  # numpy.ndarray
So, this means the tuple needs to be cast into an ndarray. But, the void object can't cast assignments (this is just a guess) because it can contain arbitrary data types so it doesn't know what type to cast to. To get around this you can cast the input yourself:
a[0]['tuple'] = np.array((1, 2))
The fact that we get different errors suggests that the above line might not work for you, since casting addresses the error I received, not the one you received.
Addendum:
So why does the following work?
a[0]['tuple'][:] = (1, 2)
Here, you're indexing into the array when you add [:]; without it, you're indexing into the void object. In other words, a[0]['tuple'][:] says "replace the elements of the stored array" (which is handled by the array), while a[0]['tuple'] says "replace the stored array" (which is handled by void).
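A quick sketch (mine) contrasting the two spellings:

import numpy as np

dt = np.dtype([('tuple', (int, 2))])
a = np.zeros(3, dt)
a[0]['tuple'][:] = (1, 2)   # handled by the stored array: works
print(a['tuple'][0])        # [1 2]
# a[0]['tuple'] = (1, 2)    # handled by the void scalar: fails on the affected NumPy versions discussed above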
Epilogue:
Strangely enough, accessing the row (i.e. indexing with 0) seems to drop the base array, but it still allows you to assign to the base array.
print(a['tuple'].base is a)  # True
print(a[0].base is a)        # False
a[0] = ((1, 2),)             # `a` is changed
Maybe void is not really an array so it doesn't have a base array,... but then why does it have a base attribute?
This was an upstream bug, fixed as of NumPy PR #5947, with a fix in 1.9.3.