Apologies in advance - I seem to be having a very fundamental misunderstanding that I can't clear up. I have a FourVector class with variables for ct and the position vector. I'm writing code to perform an x-direction Lorentz boost. The problem I'm running into is that, as the code is written below, ct comes back as a proper float value, but x does not. Messing around, I find that tempx is a float, but assigning tempx to r[0] does not make that element a float; instead it gets rounded down to an int. I have previously posted a question on mutability vs immutability, and I suspect this is the issue. If so, I clearly have a deeper misunderstanding than expected. Regardless, there are a couple of questions I have:
1a) If I instantiate a with a = FourVector(ct=5,r=[55,2.,3]), then type(a._r[0]) returns numpy.float64 as opposed to numpy.int32. What is going on here? I expected just a._r[1] to be a float; instead it changes the type of the whole list?
1b) How do I get the above behaviour (the whole list being floats) without having to instantiate the variables as floats? I read the documentation and have tried various methods, like using astype(float), but everything I do seems to keep it as an int. Again, I think this is the mutable/immutable problem I'm having.
2) I had thought that, in the tempx=... line, multiplying by 1.0 would convert it to a float, as that appears to be the reason ct converts to a float, but for some reason it doesn't. Perhaps for the same reason as the others?
import numpy as np

class FourVector():
    def __init__(self, ct=0, x=0, y=0, z=0, r=[]):
        self._ct = ct
        self._r = np.array(r)
        if r == []:
            self._r = np.array([x, y, z])

    def boost(self, beta):
        gamma = 1 / np.sqrt(1 - (beta ** 2))
        tempct = (self._ct * gamma - beta * gamma * self._r[0])
        tempx = (-1.0 * self._ct * beta * gamma + self._r[0] * gamma)
        self._ct = tempct
        print(type(self._r[0]))
        self._r[0] = tempx.astype(float)
        print(type(self._r[0]))

a = FourVector(ct=5, r=[55, 2, 3])
b = FourVector(ct=1, r=[4, 5, 6])
print(a._r)
a.boost(.5)
print(a._r)
All your problems are indeed related.
A numpy array is an array that holds objects efficiently. It does this by having these objects be of the same type, like strings (of equal length) or integers or floats. It can then easily calculate just how much space each element needs and how many bytes it must "jump" to access the next element (we call these the "strides").
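As a quick aside (an illustrative sketch, not part of the original explanation), you can inspect this bookkeeping directly through the itemsize and strides attributes:
>>> import numpy as np
>>> a = np.array([1, 2, 3], dtype=np.int64)
>>> a.itemsize    # bytes occupied by each element
8
>>> a.strides     # bytes to jump to reach the next element
(8,)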
When you create an array from a list, numpy will try to determine a suitable data type ("dtype") from that list, to ensure all elements can be represented well. Only when you specify the dtype explicitly, will it not make an educated guess.
Consider the following example:
>>> import numpy as np
>>> integer_array = np.array([1,2,3]) # pass in a list of integers
>>> integer_array
array([1, 2, 3])
>>> integer_array.dtype
dtype('int64')
As you can see, on my system it returns a data type of int64, which is a representation of integers using 8 bytes. It chooses this because:
- numpy recognizes all elements of the list are integers
- my system is a 64-bit system
Now consider an attempt at changing that array:
>>> integer_array[0] = 2.4 # attempt to put a float in an array with dtype int
>>> integer_array # it is automatically converted to an int!
array([2, 2, 3])
As you can see, once the datatype of an array has been set, values assigned to its elements are automatically cast to that datatype.
Let's now consider what happens when you pass in a list that has at least one float:
>>> float_array = np.array([1., 2,3])
>>> float_array
array([ 1., 2., 3.])
>>> float_array.dtype
dtype('float64')
Once again, numpy determines a suitable datatype for this array.
Blindly attempting to change the datatype of an array is not wise:
>>> integer_array.dtype = np.float32
>>> integer_array
array([ 2.80259693e-45, 0.00000000e+00, 2.80259693e-45,
0.00000000e+00, 4.20389539e-45, 0.00000000e+00], dtype=float32)
Those numbers are gibberish, you might say. That's because numpy reinterprets the raw memory of that array as 4-byte floats (a skilled reader could convert the numbers back to their binary representation and recover the original integer values from there).
If you want to cast, you'll have to do it explicitly and numpy will return a new array:
>>> integer_array.dtype = np.int64 # go back to the previous interpretation
>>> integer_array
array([2, 2, 3])
>>> integer_array.astype(np.float32)
array([ 2., 2., 3.], dtype=float32)
Now, to address your specific questions:
1a) If I instantiate a with a = FourVector(ct=5,r=[55,2.,3]), then type(a._r[0]) returns numpy.float64 as opposed to numpy.int32. What is going on here? I expected just a._r[1] to be a float; instead it changes the type of the whole list?
That's because numpy has to determine a datatype for the entire array (unless you use a structured array), ensuring all elements fit in that datatype. Only then can numpy iterate over the elements of that array efficiently.
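As an aside (an illustrative sketch, not something you need for your FourVector), a structured array is how numpy stores mixed element types, by giving each named field its own dtype:
>>> rec = np.array([(55, 2.0)], dtype=[('i', np.int64), ('x', np.float64)])
>>> rec['i'].dtype
dtype('int64')
>>> rec['x'].dtype
dtype('float64')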
1b) How do I get the above behaviour (the whole list being floats) without having to instantiate the variables as floats? I read the documentation and have tried various methods, like using astype(float), but everything I do seems to keep it as an int. Again, I think this is the mutable/immutable problem I'm having.
Specify the dtype when you are creating the array. In your code, that would be:
self._r = np.array(r, dtype=float)
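Note that the old np.float alias has been removed from recent NumPy releases, so plain float (or np.float64) is the safe spelling. As a quick check (a small sketch, not from the original answer), an array created this way keeps float values when you assign to its elements:
>>> r = np.array([55, 2, 3], dtype=float)
>>> r.dtype
dtype('float64')
>>> r[0] = 26.8468      # the float value is preserved, not truncated
>>> r
array([26.8468,  2.    ,  3.    ])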
2) I had thought that, in the tempx=... line, multiplying by 1.0 would convert it to a float, as that appears to be the reason ct converts to a float, but for some reason it doesn't. Perhaps for the same reason as the others?
That is true: try printing the datatype of tempx; it should be a float. However, later on you are reinserting that value into the array self._r, which has an integer dtype. As you saw above, that casts the float back to an integer.
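To see the round trip that bites you here (a minimal sketch mirroring your boost code), assign a plain Python float into an integer array and watch it get truncated:
>>> r = np.array([55, 2, 3])   # dtype inferred as int
>>> tempx = 60.62177826491071  # a plain float, like your boosted x value
>>> r[0] = tempx               # silently cast down to the array's int dtype
>>> r
array([60,  2,  3])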
Related
Why does Python not cast long numbers to numpy floats when doing something like
a = np.array([10.0, 56.0]) + long(10**47)
The dtype of the variable a is object. I did not expect this when, during a maximum likelihood optimization, one fit parameter B was an integer and 10**B therefore became a long.
Is this due to fear of precision loss?
I suspect this is because Python is able to store arbitrarily long integers, so numpy realizes that it can't safely cast the result to a known data type. Therefore, it falls back to treating the array as an array of Python objects and performs the operation elementwise using Python's rules (which produce a float here).
You can see what the result type is by using np.result_type:
>>> np.result_type(np.array([10.0, 56.0]), long(10**47))
dtype('O')
Based on the documentation for np.result_type what happens is:
First, np.min_scalar_type() is called on each of the inputs:
>>> np.min_scalar_type(np.array([10.0, 56.0]))
dtype('float64')
>>> np.min_scalar_type(long(10**47))
dtype('O')
Second, the result is determined by combining these types using np.promote_types:
>>> np.promote_types(np.float64,np.dtype('O'))
dtype('O')
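For comparison (an illustrative aside, not from the original answer), promote_types applies the usual "smallest safe type" rules when both inputs are ordinary NumPy dtypes:
>>> np.promote_types(np.uint8, np.int8)
dtype('int16')
>>> np.promote_types(np.int32, np.float64)
dtype('float64')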
I am new to programming and numpy... While reading tutorials and experimenting in a Jupyter notebook, I thought of converting the dtype of a numpy array as follows:
import numpy as np
c = np.random.rand(4)*10
print(c)
#Output1: [ 0.12757225 5.48992242 7.63139022 2.92746857]
c.dtype = int
print(c)
#Output2: [4593764294844833304 4617867121563982285 4620278199966380988 4613774491979221856]
I know the proper way of changing it is:
c = c.astype(int)
But I want to know the reason behind those strange numbers in Output2. What are they and what do they signify?
Floats and integers (numpy.float64s and numpy.int64s) are represented differently in memory. The value 42 stored in these different types corresponds to a different bit pattern in memory.
When you're reassigning the dtype attribute of an array, you keep the underlying data unchanged, and you're telling numpy to interpret that pattern of bits in a new way. Since the interpretation now doesn't match the original definition of the data, you end up with gibberish (meaningless numbers).
Here is that reinterpretation again, step by step:
>>> import numpy as np
>>> arr = np.random.rand(3)
>>> arr.dtype
dtype('float64')
>>> arr
array([ 0.7258989 , 0.56473195, 0.20885672])
>>> arr.data
<memory at 0x7f10d7061288>
>>> arr.dtype = np.int64
>>> arr.data
<memory at 0x7f10d7061348>
>>> arr
array([4604713535589390862, 4603261872765946451, 4596692876638008676])
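If you want to see that the underlying bytes really are untouched, here is a deterministic variant (a small sketch, not from the original answer) using tobytes():
>>> x = np.array([1.5])
>>> x.tobytes()            # the raw 8 bytes of the float64 value 1.5
b'\x00\x00\x00\x00\x00\x00\xf8?'
>>> x.dtype = np.int64     # reinterpret, don't convert
>>> x
array([4609434218613702656])
>>> x.tobytes()            # exactly the same bytes as before
b'\x00\x00\x00\x00\x00\x00\xf8?'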
On the other hand, converting your array via .astype() will actually convert the data in memory and return a new array. Proper conversion:
>>> arr = np.random.rand(3)*10
>>> arr
array([ 3.59591191, 1.21786042, 6.42272461])
>>> arr.astype(np.int64)
array([3, 1, 6])
As you can see, astype meaningfully converts the original values of the array (in this case truncating them to their integer part) and returns a new array with the corresponding values and dtype.
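A tiny illustration (not from the original answer) that this conversion truncates toward zero rather than rounding:
>>> np.array([3.7, -1.2]).astype(np.int64)
array([ 3, -1])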
Note that assigning a new dtype doesn't trigger any checks, so you can do very weird stuff with your array. In the above example, 64 bits of floats were reinterpreted as 64 bits of integers. But you can also change the bit size:
>>> arr = np.random.rand(3)
>>> arr.shape
(3,)
>>> arr.dtype
dtype('float64')
>>> arr.dtype = np.float32
>>> arr.shape
(6,)
>>> arr
array([ 4.00690371e+35, 1.87285304e+00, 8.62005305e+13,
1.33751166e+00, 7.17894062e+30, 1.81315207e+00], dtype=float32)
By telling numpy that your data occupies half the space it originally did, numpy deduces that your array has twice as many elements! Clearly not something you should ever want to do.
Another example: consider the 8-bit unsigned integer 255==2**8-1: it corresponds to 11111111 in binary. Now, try to reinterpret two of these numbers as a single 16-bit unsigned integer:
>>> arr = np.array([255,255],dtype=np.uint8)
>>> arr.dtype = np.uint16
>>> arr
array([65535], dtype=uint16)
As you can see, the result is the single number 65535. If that doesn't ring a bell, it's exactly 2**16-1, with 16 ones in its binary pattern. The two all-ones patterns were reinterpreted as a single 16-bit number, and the result changed accordingly. The reason you often see weirder numbers is that reinterpreting floats as ints, or vice versa, mangles the data much more strongly, due to how floating-point numbers are represented in memory.
As hpaulj noted, you can directly perform this reinterpretation of the data by constructing a new view of the array with a modified dtype. This is probably more useful than having to reassign the dtype of a given array, but then again changing the dtype is only useful in fairly rare, very specific use cases.
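A short sketch of that view-based approach (illustrative, using values whose float64 bit patterns are well known):
>>> arr = np.array([1.0, 2.0])
>>> arr.view(np.int64)    # a new view onto the same bytes, reinterpreted as int64
array([4607182418800017408, 4611686018427387904])
>>> arr                   # the original array is untouched
array([1., 2.])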
Right now numpy throws an error if I try to feed it objects when dtype='float'.
However, floats are also objects. How can I make numpy treat my object like a float?
Edit: I have an object that returns a float upon multiplication and addition. I want to treat it as a float, since it becomes a float after a linear transformation.
So my object:
class some_class():
    def __init__(self, a):
        self.a = a
    def __mul__(self, x):
        return float(self.a * x)
    ...
    def __radd__(self, x):
        return float(self.a + x)
How do I get it to interpret this as a float?
Edit 2: Am I fundamentally misinterpreting dtype? My logic is that since it behaves like a float, it should be treated as such. Is this just wrong? I also ask because np.exp() and np.dot() don't seem to work for object dtypes, even if the objects behave like floats.
Edit3: thanks everyone for teaching me about dtypes and astype functions :)
If you want to typecast your resulting numpy array to float values, use astype(float):
>>> x = np.array([1, 2, 2.5])
>>> x
array([ 1. , 2. , 2.5])
>>>
>>> x.astype(float)
array([ 1. , 2. , 2.5])
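For completeness, here is a hedged sketch of one way to make astype(float) work on an object array (the Scalarish class and its __float__ method are hypothetical illustrations, not from the question): if each element knows how to convert itself to a float, the conversion happens element-wise:
>>> import numpy as np
>>> class Scalarish:
...     def __init__(self, a):
...         self.a = a
...     def __float__(self):          # lets float(obj), and hence astype(float), work
...         return float(self.a)
...
>>> obj_arr = np.array([Scalarish(1), Scalarish(2.5)], dtype=object)
>>> obj_arr.dtype
dtype('O')
>>> obj_arr.astype(float)
array([1. , 2.5])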
Suppose I enter:
a = uint8(200)
a*2
Then the result is 400, and it is recast to be of type uint16.
However:
a = array([200],dtype=uint8)
a*2
and the result is
array([144], dtype=uint8)
The multiplication has been performed modulo 256, to ensure that the result stays in one byte.
I'm confused about "types" and "dtypes" and where one is used in preference to another. And as you see, the type may make a significant difference in the output.
Can I, for example, create a single number of dtype uint8, so that operations on that number will be performed modulo 256? Alternatively, can I create an array of type (not dtype) uint8 so that operations on it will produce values outside the range 0-255?
The simple, high-level answer is that NumPy layers a second type system atop Python's type system.
When you ask for the type of a NumPy object, you get the type of the container--something like numpy.ndarray. But when you ask for the dtype, you get the (numpy-managed) type of the elements.
>>> from numpy import *
>>> arr = array([1.0, 4.0, 3.14])
>>> type(arr)
<type 'numpy.ndarray'>
>>> arr.dtype
dtype('float64')
Sometimes, as when using the default float type, the element data type (dtype) is equivalent to a Python type. But that's equivalent, not identical:
>>> arr.dtype == float
True
>>> arr.dtype is float
False
In other cases there is no equivalent Python type; uint8, which you used, is one example. Such values can be managed by Python, but unlike in C, Rust, and other "systems languages," working with values that map directly onto machine data types (uint8 corresponds closely to an unsigned byte) is not the common use case for Python.
So the big story is that NumPy provides containers like arrays and matrices that operate under its own type system. And it provides a bunch of highly useful, well-optimized routines to operate on those containers (and their elements). You can mix-and-match NumPy and normal Python computations, if you use care.
There is no Python type uint8. There is a constructor function named uint8, which when called returns a NumPy type:
>>> u = uint8(44)
>>> u
44
>>> u.dtype
dtype('uint8')
>>> type(u)
<type 'numpy.uint8'>
So "can I create an array of type (not dtype) uint8...?" No. You can't. There is no such animal.
You can, however, do computations constrained to uint8 rules without using NumPy arrays, by working with NumPy scalar values. E.g.:
>>> uint8(44 + 1000)
20
>>> uint8(44) + uint8(1000)
20
But if you want to compute values mod 256, it's probably easier to use Python's mod operator:
>>> (44 + 1000) % 256
20
Driving data values larger than 255 into uint8 data types and then doing arithmetic is a rather backdoor way to get mod-256 arithmetic. If you're not careful, you'll either cause Python to "upgrade" your values to full integers (killing your mod-256 scheme), or trigger overflow exceptions (because tricks that work great in C and machine language are often flagged by higher level languages).
The type of a NumPy array is numpy.ndarray; this is just the type of Python object it is (similar to how type("hello") is str for example).
dtype just defines how bytes in memory will be interpreted by a scalar (i.e. a single number) or an array and the way in which the bytes will be treated (e.g. int/float). For that reason you don't change the type of an array or scalar, just its dtype.
As you observe, if you multiply two scalars, the resulting datatype is the smallest "safe" type to which both values can be cast. However, multiplying an array and a scalar will simply return an array of the same datatype. The documentation for the function np.result_type is clear about how the resulting dtype is determined:
Type promotion in NumPy works similarly to the rules in languages like C++, with some slight differences. When both scalars and arrays are used, the array's type takes precedence and the actual value of the scalar is taken into account.
The documentation continues:
If there are only scalars or the maximum category of the scalars is higher than the maximum category of the arrays, the data types are combined with promote_types to produce the return value.
So for np.uint8(200) * 2, two scalars, the resulting datatype will be the type returned by np.promote_types:
>>> np.promote_types(np.uint8, int)
dtype('int32')
For np.array([200], dtype=np.uint8) * 2 the array's datatype takes precedence over the scalar int and a np.uint8 datatype is returned.
To address your final question about retaining the dtype of a scalar during operations, you'll have to restrict the datatypes of any other scalars you use to avoid NumPy's automatic dtype promotion:
>>> np.uint8(200) * np.uint8(2)
144
The alternative, of course, is to simply wrap the single value in a NumPy array (and then NumPy won't cast it in operations with scalars of different dtype).
To promote the type of an array during an operation, you could wrap any scalars in an array first:
>>> np.array([200], dtype=np.uint8) * np.array([2])
array([400])
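To double-check that promotion (a small aside; note that the default integer dtype of np.array([2]) is platform-dependent, typically int64 on 64-bit Linux and macOS), you can ask promote_types directly:
>>> np.promote_types(np.uint8, np.array([2]).dtype)
dtype('int64')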
A numpy array contains elements of the same type, so np.array([200], dtype=uint8) is an array with one value of type uint8. When you do np.uint8(200), you don't have an array, only a single value. This makes a huge difference.
When performing an operation on the array, the dtype stays the same, irrespective of whether an individual value overflows or not. Automatic upcasting of arrays is forbidden, because it would require changing the size of the whole array; that is only done if the user explicitly asks for it. A single value, on the other hand, can be upcast easily without influencing other values.