I want to convert an int64 numpy array to a uint64 numpy array, adding 2**63 to the values in the process so that they still fall within the range the target dtype allows. For example, if I start from
a = np.array([-2**63,2**63-1], dtype=np.int64)
I want to end up with
np.array([0.,2**64], dtype=np.uint64)
Sounds simple at first, but how would you actually do it?
Use astype() to convert the values to another dtype:
import numpy as np
(a+2**63).astype(np.uint64)
# array([ 0, 18446744073709551615], dtype=uint64)
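Note that depending on the numpy version, adding a plain Python int that doesn't fit in int64 can promote the intermediate result to float64 (which can't represent values near 2**64 exactly), and numpy 2.x raises an OverflowError here instead. A sketch that stays in integer arithmetic by reinterpreting the bits:
import numpy as np

a = np.array([-2**63, 2**63 - 1], dtype=np.int64)

# Reinterpret the two's-complement bits as uint64, then add the offset
# with modular uint64 arithmetic; no float intermediate is involved.
b = a.view(np.uint64) + np.uint64(2**63)
print(b)  # [0 18446744073709551615]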
I'm not a real numpy expert, but this:
>>> a = np.array([-2**63,2**63-1], dtype=np.int64)
>>> b = np.array([x+2**63 for x in a], dtype=np.uint64)
>>> b
array([ 0, 18446744073709551615], dtype=uint64)
works for me with Python 2.6 and numpy 1.3.0.
I assume you meant 2**64-1, not 2**64, in your expected output, since 2**64 won't fit in a uint64 (18446744073709551615 is 2**64-1).
Consider the following Python code that multiplies two complex numbers:
import numpy as np
a = np.matrix('28534314.10478439+28534314.10478436j').astype(np.complex128)
b = np.matrix('-1.39818115e+09+1.39818115e+09j').astype(np.complex128)
#Verify values
print(a)
print(b)
c = np.dot(a.getT(), b)
#Verify product
print(c)
Now the product should be -7.979228021897728000e+16 + 48j, which is what I correctly get when I run this in Spyder. However, if the values a and b arrive from a sender at a receiver via MPI in an mpi4py program (I verify that they have been received correctly), the product is wrong, specifically -7.97922801e+16+28534416.j. In both cases I am using numpy 1.14.3 and Python 2.7.14. The only difference in the latter case is that, prior to receiving the values, I initialize the matrices with:
a = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
b = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
and then the function MPI::Comm::Irecv() fills them with the correct values.
What is going wrong in the latter case if a and b are correct but c is wrong? Is numpy arbitrarily setting the imaginary part since it is much smaller than the real part of the product?
First, this doesn't address the MPI side, but since it was raised in the comments:
np.matrix can take a string argument and produce a numeric matrix from it. Also notice that the shape is (1, 1):
In [145]: a = np.matrix('28534314.10478439+28534314.10478436j')
In [146]: a
Out[146]: matrix([[28534314.10478439+28534314.10478436j]])
In [147]: a.dtype
Out[147]: dtype('complex128')
String input to np.array produces a string array instead:
In [148]: a = np.array('28534314.10478439+28534314.10478436j')
In [149]: a
Out[149]: array('28534314.10478439+28534314.10478436j', dtype='<U36')
But omit the quotes and we get the complex array, with shape () (0d):
In [151]: a = np.array(28534314.10478439+28534314.10478436j)
In [152]: a
Out[152]: array(28534314.10478439+28534314.10478436j)
In [153]: a.dtype
Out[153]: dtype('complex128')
And the product of these values:
In [154]: b = np.array(-1.39818115e+09+1.39818115e+09j)
In [155]: a*b # a.dot(b) same thing
Out[155]: (-7.979228021897728e+16+48j)
Without using MPI, I assume the initialization and assignment look something like this:
In [179]: x=np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
In [180]: x[:]=a
In [181]: x
Out[181]: matrix([[28534314.10478439+28534314.10478436j]])
In [182]: y=np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
In [183]: y[:]=b
In [184]: y
Out[184]: matrix([[-1.39818115e+09+1.39818115e+09j]])
In [185]: x*y
Out[185]: matrix([[-7.97922802e+16+48.j]])
It may be worth trying np.zeros_like instead of np.empty_like. That will ensure the imaginary part is 0 instead of something random. Then, if the MPI process is only setting the real part, you should get a noticeably different result.
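As a minimal sketch of that suggestion (same 1x1 shape as the question):
import numpy as np

# Zero-initialized receive buffers: the imaginary parts start as a
# well-defined 0 rather than whatever bytes np.empty_like leaves behind.
x = np.zeros_like(np.matrix([[0]]), dtype=np.complex128)
y = np.zeros_like(np.matrix([[0]]), dtype=np.complex128)

# If the MPI transfer were only filling in the real parts, the product
# x*y would now show a clean 0j component instead of a garbage one.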
I'd like to plot points:
points = np.random.multivariate_normal(mean=(0,0), cov=[[0.4,9],[9,10]],size=int(1e4))
print(points)
[[-2.50584156 2.77190372]
[ 2.68192136 -3.83203819]
...,
[-1.10738221 -1.72058301]
[ 3.75168017 5.6905342 ]]
print(type(points))
<class 'numpy.ndarray'>
But my actual data comes from a file read with astropy:
data = ascii.read(datafile)
type(data['ra'])
astropy.table.column.Column
type(data['dec'])
astropy.table.column.Column
and then I try:
points = np.array([data['ra']], [data['dec']])
and get a
TypeError: data type not understood
Thoughts?
An astropy Table Column object can be converted to a numpy array using the data attribute:
In [6]: from astropy.table import Column
In [7]: c = Column([1, 2, 3])
In [8]: c.data
Out[8]: array([1, 2, 3])
You can also convert an entire table to a numpy structured array with the as_array() Table method (e.g. data.as_array() in your example).
BTW, I think the actual problem is not the astropy Column but your numpy array creation statement. It should probably be:
arr = np.array([data['ra'], data['dec']])
This works with Column objects.
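For example, a self-contained sketch (the small table below is a hypothetical stand-in for your ascii.read result):
import numpy as np
from astropy.table import Table

# Hypothetical stand-in for data = ascii.read(datafile).
data = Table({'ra': [10.5, 11.2, 12.0], 'dec': [-3.1, 0.4, 2.2]})

arr = np.array([data['ra'], data['dec']])  # plain (2, 3) float array
structured = data.as_array()               # structured array with named fields
print(arr.shape)         # (2, 3)
print(structured['ra'])  # [10.5 11.2 12. ]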
The signature of numpy.array is numpy.array(object, dtype=None, ...).
Hence, when calling np.array([data['ra']], [data['dec']]), [data['ra']] is the object to convert to a numpy array and [data['dec']] is the data type, which is not understood (as the error says).
It's not actually clear from the question what you are trying to achieve; possibly something like:
points = np.array([data['ra'], data['dec']])
Keep in mind, though, that if what you actually want is to plot the points, you don't need to convert to arrays at all. The following will work just fine:
from matplotlib import pyplot as plt
plt.scatter(data['ra'], data['dec'])
Column is an ndarray subclass, so matplotlib handles it directly.
I am trying to use a function (preprocessing.scale) on a list of data. I am new to map/reduce and parallelism in Python; I would like to apply this to a large list of data to improve performance.
Example:
X = [1,2,3,4]
Using the syntax:
list(map(preprocessing.scale, X))
I get this error:
TypeError: Singleton array array(1.0) cannot be considered a valid collection.
I think that is because of the return type of the function, but I am not sure how to fix this. Any help would be greatly appreciated!
You don't need (or want) to use the map function here, since it just runs a Python-level loop under the hood.
Almost all sklearn methods are vectorized: they accept list-like objects (lists, numpy arrays, etc.), and this is much, much faster than the map(...) approach.
Demo:
In [121]: from sklearn.preprocessing import scale
In [122]: X = [1,2,3,4]
In [123]: scale(X)
Out[123]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
the same demo using numpy array:
In [39]: x = np.array(X)
In [40]: x
Out[40]: array([1, 2, 3, 4])
In [41]: scale(x)
DataConversionWarning: Data with input dtype int32 was converted to float64 by the scale function.
warnings.warn(msg, _DataConversionWarning)
Out[41]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
It expects a float dtype, so we can easily convert our numpy array to float64 on the fly:
In [42]: scale(x.astype('float64'))
Out[42]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
Executing list(map(preprocessing.scale, X)) is equivalent to executing [preprocessing.scale(a) for a in X].
Given this, what you are currently doing is scaling a singleton (one observation). You cannot scale a single item, and that is where the function breaks. Try preprocessing.scale(X[0]) and you will get the same error.
Why are you trying to run it like that instead of just passing the whole array: preprocessing.scale(X)?
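To make that equivalence concrete, a small sketch:
from sklearn.preprocessing import scale

X = [1, 2, 3, 4]

# scale(X[0]) hands a single number to scale(), which expects a
# collection -- the same "Singleton array" TypeError as map(scale, X).
# scale(X[0])  # raises TypeError

# Scaling is defined over the whole collection at once:
print(scale(X))  # [-1.34164079 -0.4472136   0.4472136   1.34164079]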
Is there any way to do a "reinterpret_cast" with numpy arrays? Here's an example:
>>> import numpy as np
>>> x=np.array([105,79,196,53,151,176,59,202,249,0,207,6], dtype=np.uint8)
>>> np.fromstring(x.tostring(),'<h')
array([ 20329, 13764, -20329, -13765, 249, 1743], dtype=int16)
I can call tostring() and then fromstring() to convert from an array to raw bytes and then back to another array. I'm just wondering if there's a way to skip the intermediate step. (Not that it's a big deal; I would just like to understand.)
Yes. When you view an array with a different dtype, you are reinterpreting the underlying data (zeros and ones) according to the different dtype.
In [85]: x.view('<i2')
Out[85]: array([ 20329, 13764, -20329, -13765, 249, 1743], dtype=int16)
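One caveat worth knowing: view is a zero-copy reinterpretation, so the result shares memory with the original array. A quick sketch:
import numpy as np

x = np.array([105, 79, 196, 53, 151, 176, 59, 202, 249, 0, 207, 6],
             dtype=np.uint8)
y = x.view('<i2')  # reinterpret the 12 bytes as 6 little-endian int16s

# No data is copied: writing through the view changes the original bytes.
y[0] = 0
print(x[:2])  # [0 0]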
So, this should be a really straightforward thing, but for whatever reason nothing I do to convert an array of strings to an array of floats is working.
I have a two column array, like so:
Name Value
Bob 4.56
Sam 5.22
Amy 1.22
I try this:
for row in myarray[1:,]:
row[1]=float(row[1])
And this:
for row in myarray[1:,]:
row[1]=row[1].astype(1)
And this:
myarray[1:,1] = map(float, myarray[1:,1])
And they all seem to do something, but when I double check:
type(myarray[9,1])
I get
<type 'numpy.string_'>
A numpy array has a single dtype unless it is structured. Since you have some strings in the array, every element is stored as a string.
If you want a compound (structured) dtype, you can do so:
import numpy as np
a = np.array([('Bob','4.56'), ('Sam','5.22'),('Amy', '1.22')], dtype = [('name','S3'),('val',float)])
Note that a is now a 1d structured array, where each element is a record whose fields are given by the dtype.
You can access the values using their field name:
In [21]: a = np.array([('Bob','4.56'), ('Sam','5.22'),('Amy', '1.22')],
...: dtype = [('name','S3'),('val',float)])
In [22]: a
Out[22]:
array([('Bob', 4.56), ('Sam', 5.22), ('Amy', 1.22)],
dtype=[('name', 'S3'), ('val', '<f8')])
In [23]: a['val']
Out[23]: array([ 4.56, 5.22, 1.22])
In [24]: a['name']
Out[24]:
array(['Bob', 'Sam', 'Amy'],
dtype='|S3')
The type of the objects in a numpy array is determined at the initialization of that array. If you want to change that later, you must cast the array, not the objects within that array:
myNewArray = myArray.astype(float)
Note: some conversions happen implicitly through upcasting, but a conversion like string to float needs an explicit astype call.
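Applied to the question's array, a sketch (the array below mirrors the header-plus-data layout described above):
import numpy as np

# Mirrors the question's layout: a header row followed by string data.
myarray = np.array([['Name', 'Value'],
                    ['Bob', '4.56'],
                    ['Sam', '5.22'],
                    ['Amy', '1.22']])

# Casting the sliced column creates a new float array; assigning floats
# back into the original string array just re-casts them to strings.
values = myarray[1:, 1].astype(float)
print(values)        # [4.56 5.22 1.22]
print(values.dtype)  # float64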
For further information see:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.chararray.astype.html