Converting numpy array values into integers - python

My values are currently showing as 1.00e+09 in an array (type float64). I would like them to show 1000000000 instead. Is this possible?

Make a sample array
In [206]: x=np.array([1e9, 2e10, 1e6])
In [207]: x
Out[207]: array([ 1.00000000e+09, 2.00000000e+10, 1.00000000e+06])
We can convert to ints - except notice that the largest one is too large for the default int32, so it overflows:
In [208]: x.astype(int)
Out[208]: array([ 1000000000, -2147483648, 1000000])
In [212]: x.astype(np.int64)
Out[212]: array([ 1000000000, 20000000000, 1000000], dtype=int64)
Writing a csv with the default format (scientific-notation float; this is the default regardless of the array dtype):
In [213]: np.savetxt('text.txt',x)
In [214]: cat text.txt
1.000000000000000000e+09
2.000000000000000000e+10
1.000000000000000000e+06
We can specify a format:
In [215]: np.savetxt('text.txt',x, fmt='%d')
In [216]: cat text.txt
1000000000
20000000000
1000000
Potentially there are 3 issues:
integer vs float in the array itself, i.e. its dtype
display or print of the array
writing the array to a csv file
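A minimal sketch addressing all three issues at once, using the same sample values as above:

```python
import numpy as np

x = np.array([1e9, 2e10, 1e6])

# 1. dtype: convert to 64-bit integers so the largest value fits
xi = x.astype(np.int64)

# 2. display: suppress scientific notation when printing floats
np.set_printoptions(suppress=True)

# 3. file output: write plain integers, one per line
np.savetxt('text.txt', xi, fmt='%d')

print(xi)
```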

It is a printing option; see the documentation for np.set_printoptions. Briefly stated: you need to use the suppress option when printing:
np.set_printoptions(suppress=True) # for small floating point.
np.set_printoptions(suppress=True, formatter={'all':lambda x: str(x)})
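A quick sketch of the effect, with sample values assumed from the question:

```python
import numpy as np

x = np.array([1e9, 2e10, 1e6])
print(x)   # default display: scientific notation

# suppress=True forces fixed-point notation
np.set_printoptions(suppress=True)
print(x)

# for a fully plain integer display, supply an explicit element formatter
np.set_printoptions(formatter={'all': lambda v: str(int(v))})
print(x)
```

Note that set_printoptions only changes how the array is displayed; the dtype is still float64.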

Inconsistent numpy complex multiplication results

Consider the following Python code that multiplies two complex numbers:
import numpy as np
a = np.matrix('28534314.10478439+28534314.10478436j').astype(np.complex128)
b = np.matrix('-1.39818115e+09+1.39818115e+09j').astype(np.complex128)
#Verify values
print(a)
print(b)
c=np.dot(a.getT(),b)
#Verify product
print(c)
Now the product should be -7.979228021897728000e+16 + 48j, which is what I get when I run on Spyder. However, if I receive the values a and b from a sender via MPI in an mpi4py program (I verify that they have been received correctly), the product is wrong: specifically -7.97922801e+16+28534416.j. In both cases I am using numpy 1.14.3 and Python 2.7.14. The only difference in the latter case is that prior to receiving the values I initialize the matrices with:
a = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
b = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
and then the function MPI::Comm::Irecv() gives them the correct values.
What is going wrong in the latter case if the a and b are correct but c is wrong? Is numpy arbitrarily setting the imaginary part since it's quite smaller than the real part of the product?
First, this doesn't address the MPI issue, but since it was raised in comments:
np.matrix can take a string argument and produce a numeric matrix from it. Also notice that the shape is (1,1):
In [145]: a = np.matrix('28534314.10478439+28534314.10478436j')
In [146]: a
Out[146]: matrix([[28534314.10478439+28534314.10478436j]])
In [147]: a.dtype
Out[147]: dtype('complex128')
String input to np.array produces a string:
In [148]: a = np.array('28534314.10478439+28534314.10478436j')
In [149]: a
Out[149]: array('28534314.10478439+28534314.10478436j', dtype='<U36')
But omit the quotes and we get the complex array, with shape () (0d):
In [151]: a = np.array(28534314.10478439+28534314.10478436j)
In [152]: a
Out[152]: array(28534314.10478439+28534314.10478436j)
In [153]: a.dtype
Out[153]: dtype('complex128')
And the product of these values:
In [154]: b = np.array(-1.39818115e+09+1.39818115e+09j)
In [155]: a*b # a.dot(b) same thing
Out[155]: (-7.979228021897728e+16+48j)
Without using MPI, I assume the initialization and setting is something like this:
In [179]: x=np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
In [180]: x[:]=a
In [181]: x
Out[181]: matrix([[28534314.10478439+28534314.10478436j]])
In [182]: y=np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
In [183]: y[:]=b
In [184]: y
Out[184]: matrix([[-1.39818115e+09+1.39818115e+09j]])
In [185]: x*y
Out[185]: matrix([[-7.97922802e+16+48.j]])
It may be worth trying np.zeros_like instead of np.empty_like. That will ensure the imaginary part is 0, instead of something random. Then if the MPI process is just setting the real part, you should get something different.
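A sketch of that suggestion, mirroring the In[179]-In[185] session above but with zeros_like, so the buffer is guaranteed to start at 0+0j rather than leftover memory contents:

```python
import numpy as np

# zeros_like guarantees the initial contents are 0+0j
x = np.zeros_like(np.matrix([[0]])).astype(np.complex128)
y = np.zeros_like(np.matrix([[0]])).astype(np.complex128)

x[:] = 28534314.10478439 + 28534314.10478436j
y[:] = -1.39818115e+09 + 1.39818115e+09j

print(x * y)   # approx. -7.97922802e+16+48j, as in Out[185]
```

If the MPI receive were only filling the real parts, the imaginary parts would now stay at exactly 0 instead of picking up garbage, which would make the symptom easier to diagnose.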

How do I use Python's map function with sklearn's preprocessing.scale?

I am trying to use a function (preprocessing.scale) on a list of data. I am new to mapreduce/parallelism in Python - I would like to process this on a large list of data to improve performance.
Example:
X = [1,2,3,4]
Using the syntax:
list(map(preprocessing.scale, X))
I get this error:
TypeError: Singleton array array(1.0) cannot be considered a valid collection.
I think that is because of the return type of the function, but I am not sure how to fix this. Any help would be greatly appreciated!
You don't need/want to use the map function here, as it runs a Python-level loop under the hood.
Almost all sklearn methods are vectorized: they accept list-like objects (lists, numpy arrays, etc.), and this works much faster than the map(...) approach.
Demo:
In [121]: from sklearn.preprocessing import scale
In [122]: X = [1,2,3,4]
In [123]: scale(X)
Out[123]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
the same demo using numpy array:
In [39]: x = np.array(X)
In [40]: x
Out[40]: array([1, 2, 3, 4])
In [41]: scale(x)
DataConversionWarning: Data with input dtype int32 was converted to float64 by the scale function.
warnings.warn(msg, _DataConversionWarning)
Out[41]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
it expects float dtype, so we can easily convert our numpy array to float dtype on the fly:
In [42]: scale(x.astype('float64'))
Out[42]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
Executing list(map(preprocessing.scale, X)) is equivalent to executing [preprocessing.scale(a) for a in X].
Given this, what you are currently doing is scaling a singleton (one observation). You cannot scale a single item, and that is where the function breaks. Try doing preprocessing.scale(X[0]) and you will get the same error.
Why run it like that rather than just passing the whole array: preprocessing.scale(X)?
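For intuition, a minimal numpy-only sketch of the standardization scale performs on 1-D input (zero mean, unit variance using the population standard deviation; this is not the library's actual implementation):

```python
import numpy as np

def standardize(x):
    # subtract the mean, divide by the (population) standard deviation
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

print(standardize([1, 2, 3, 4]))
# [-1.34164079 -0.4472136   0.4472136   1.34164079]
```

This makes it clear why mapping over individual elements fails: the mean and standard deviation are properties of the whole collection, not of one value.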

numpy.column_stack with numeric and string arrays

I have several arrays, some of them holding float numbers and others holding string characters, all of the same length. When I try to use numpy.column_stack on these arrays, this function converts the float numbers to strings, for example:
a = np.array([3.4,3.4,6.4])
b = np.array(['holi','xlo','xlo'])
B = np.column_stack((a,b))
print B
>>> [['3.4' 'holi']
 ['3.4' 'xlo']
 ['6.4' 'xlo']]
type(B[0,0])
>>> numpy.string_
Why? It's possible to avoid it?
Thanks a lot for your time.
The easiest structured array approach is with the rec.fromarrays function:
In [1411]: a=np.array([3.4,3.4,6.4]); b=np.array(['holi','xlo','xlo'])
In [1412]: B = np.rec.fromarrays([a,b],names=['a','b'])
In [1413]: B
Out[1413]:
rec.array([(3.4, 'holi'), (3.4, 'xlo'), (6.4, 'xlo')],
dtype=[('a', '<f8'), ('b', '<U4')])
In [1414]: B['a']
Out[1414]: array([ 3.4, 3.4, 6.4])
In [1415]: B['b']
Out[1415]:
array(['holi', 'xlo', 'xlo'],
dtype='<U4')
Check its docs for more parameters. But it basically constructs an empty array of the correct compound dtype, and copies your arrays to the respective fields.
To store such mixed-type data, you would most likely use Object dtype arrays or structured arrays. Going with Object dtype arrays: convert either of the input arrays to Object dtype upfront and then stack it alongside the rest of the arrays. The remaining arrays are converted to Object dtype automatically, giving a stacked array of that type. Thus, we would have an implementation like so:
np.column_stack((a.astype(np.object),b))
Sample run to show how to construct a stacked array and retrieve the individual arrays back -
In [88]: a
Out[88]: array([ 3.4, 3.4, 6.4])
In [89]: b
Out[89]:
array(['holi', 'xlo', 'xlo'],
dtype='|S4')
In [90]: out = np.column_stack((a.astype(np.object),b))
In [91]: out
Out[91]:
array([[3.4, 'holi'],
[3.4, 'xlo'],
[6.4, 'xlo']], dtype=object)
In [92]: out[:,0].astype(float)
Out[92]: array([ 3.4, 3.4, 6.4])
In [93]: out[:,1].astype(str)
Out[93]:
array(['holi', 'xlo', 'xlo'],
dtype='|S4')
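Both sessions above predate Python 3 and current numpy (strings now come out as unicode '<U4' rather than bytes '|S4', and the np.object alias was removed in numpy 1.24). A self-contained version of the two approaches:

```python
import numpy as np

a = np.array([3.4, 3.4, 6.4])
b = np.array(['holi', 'xlo', 'xlo'])

# Structured approach: each field keeps its own dtype
B = np.rec.fromarrays([a, b], names=['a', 'b'])
print(B['a'])                         # still float64

# Object-dtype approach: column_stack without converting floats to strings
# (plain builtin `object`, since np.object is gone in current numpy)
out = np.column_stack((a.astype(object), b))
print(out[:, 0].astype(float))
```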

Converting numpy string array to float: Bizarre?

So, this should be a really straightforward thing but for whatever reason, nothing I'm doing to convert an array of strings to an array of floats is working.
I have a two column array, like so:
Name Value
Bob 4.56
Sam 5.22
Amy 1.22
I try this:
for row in myarray[1:,]:
    row[1]=float(row[1])
And this:
for row in myarray[1:,]:
    row[1]=row[1].astype(1)
And this:
myarray[1:,1] = map(float, myarray[1:,1])
And they all seem to do something, but when I double check:
type(myarray[9,1])
I get
<type 'numpy.string_'>
Numpy arrays must have one dtype unless it is structured. Since you have some strings in the array, they must all be strings.
If you wish to have a complex dtype, you may do so:
import numpy as np
a = np.array([('Bob','4.56'), ('Sam','5.22'),('Amy', '1.22')], dtype = [('name','S3'),('val',float)])
Note that a is now a 1d structured array, where each element is a tuple matching the compound dtype.
You can access the values using their field name:
In [21]: a = np.array([('Bob','4.56'), ('Sam','5.22'),('Amy', '1.22')],
...: dtype = [('name','S3'),('val',float)])
In [22]: a
Out[22]:
array([('Bob', 4.56), ('Sam', 5.22), ('Amy', 1.22)],
dtype=[('name', 'S3'), ('val', '<f8')])
In [23]: a['val']
Out[23]: array([ 4.56, 5.22, 1.22])
In [24]: a['name']
Out[24]:
array(['Bob', 'Sam', 'Amy'],
dtype='|S3')
The type of the objects in a numpy array is determined at the initialisation of that array. If you want to change that later, you must cast the array, not the objects within it:
myNewArray = myArray.astype(float)
Note: assigning values into an existing array will not change its dtype; you need the astype method to get a converted copy.
For further information see:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.chararray.astype.html
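Concretely, with a two-column string array like the one in the question (a hypothetical reconstruction including the header row), casting the sliced value column produces the float array; the cast cannot be done in place:

```python
import numpy as np

myarray = np.array([['Name', 'Value'],
                    ['Bob', '4.56'],
                    ['Sam', '5.22'],
                    ['Amy', '1.22']])

# astype on the slice returns a NEW float array; the original stays string
vals = myarray[1:, 1].astype(float)
print(vals)   # [4.56 5.22 1.22]
```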

Convert int64 to uint64

I want to convert an int64 numpy array to a uint64 numpy array, adding 2**63 to the values in the process so that they are still within the valid range allowed by the arrays. So for example if I start from
a = np.array([-2**63,2**63-1], dtype=np.int64)
I want to end up with
np.array([0.,2**64], dtype=np.uint64)
Sounds simple at first, but how would you actually do it?
Use astype() to convert the values to another dtype:
import numpy as np
(a+2**63).astype(np.uint64)
# array([ 0, 18446744073709551615], dtype=uint64)
I'm not a real numpy expert, but this:
>>> a = np.array([-2**63,2**63-1], dtype=np.int64)
>>> b = np.array([x+2**63 for x in a], dtype=np.uint64)
>>> b
array([ 0, 18446744073709551615], dtype=uint64)
works for me with Python 2.6 and numpy 1.3.0
I assume you meant 2**64-1, not 2**64, in your expected output, since 2**64 won't fit in a uint64. (18446744073709551615 is 2**64-1)
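On newer numpy (NEP 50 promotion rules), a + 2**63 with an int64 array can raise an overflow error, since 2**63 does not fit in int64. A sketch that sidesteps the intermediate overflow by doing all arithmetic in uint64, where wraparound is well defined:

```python
import numpy as np

a = np.array([-2**63, 2**63 - 1], dtype=np.int64)

# astype wraps -2**63 around to 2**63; adding uint64(2**63)
# then wraps modulo 2**64, landing each value in range
b = a.astype(np.uint64) + np.uint64(2**63)
print(b)   # [0 18446744073709551615]
```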
