Consider the following Python code that multiplies two complex numbers:
import numpy as np
a = np.matrix('28534314.10478439+28534314.10478436j').astype(np.complex128)
b = np.matrix('-1.39818115e+09+1.39818115e+09j').astype(np.complex128)
#Verify values
print(a)
print(b)
c=np.dot(a.getT(),b)
#Verify product
print(c)
Now the product should be -7.979228021897728000e+16 + 48j, which is what I get when I run this in Spyder. However, if the values a and b are sent from a sender to a receiver via MPI in an MPI4py program (I verify that they have been received correctly), the product is wrong, specifically -7.97922801e+16+28534416.j. In both cases I am using numpy 1.14.3 and Python 2.7.14. The only difference in the latter case is that, prior to receiving the values, I initialize the matrices with:
a = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
b = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
and then the function MPI::Comm::Irecv() gives them the correct values.
What is going wrong in the latter case if a and b are correct but c is wrong? Is numpy arbitrarily setting the imaginary part, since it is much smaller than the real part of the product?
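For reference, the exchange is roughly of this form (a minimal sketch, not my exact program; the ranks, tag, and shapes are illustrative):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # sender: a preallocated complex128 matrix
    a = np.matrix('28534314.10478439+28534314.10478436j').astype(np.complex128)
    comm.Send(a, dest=1, tag=0)
else:
    # receiver: buffer initialized as in the question
    a = np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
    req = comm.Irecv(a, source=0, tag=0)
    req.Wait()
    print(a)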
First, this doesn't address the MPI side of things, but since it was raised in the comments:
np.matrix can take a string argument and produce a numeric matrix from it. Also notice that the shape is (1,1):
In [145]: a = np.matrix('28534314.10478439+28534314.10478436j')
In [146]: a
Out[146]: matrix([[28534314.10478439+28534314.10478436j]])
In [147]: a.dtype
Out[147]: dtype('complex128')
String input to np.array produces a string:
In [148]: a = np.array('28534314.10478439+28534314.10478436j')
In [149]: a
Out[149]: array('28534314.10478439+28534314.10478436j', dtype='<U36')
But omit the quotes and we get the complex array, with shape () (0d):
In [151]: a = np.array(28534314.10478439+28534314.10478436j)
In [152]: a
Out[152]: array(28534314.10478439+28534314.10478436j)
In [153]: a.dtype
Out[153]: dtype('complex128')
And the product of these values:
In [154]: b = np.array(-1.39818115e+09+1.39818115e+09j)
In [155]: a*b # a.dot(b) same thing
Out[155]: (-7.979228021897728e+16+48j)
Without using MPI, I assume the initialization and setting go something like this:
In [179]: x=np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
In [180]: x[:]=a
In [181]: x
Out[181]: matrix([[28534314.10478439+28534314.10478436j]])
In [182]: y=np.empty_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
In [183]: y[:]=b
In [184]: y
Out[184]: matrix([[-1.39818115e+09+1.39818115e+09j]])
In [185]: x*y
Out[185]: matrix([[-7.97922802e+16+48.j]])
It may be worth trying np.zeros_like instead of np.empty_like. That will ensure the imaginary part is 0 instead of something random. Then, if the MPI process is only setting the real part, you should get something different.
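A quick sketch of that diagnostic; if the receive only fills the real part, the imaginary part will stay exactly 0 rather than garbage:

x = np.zeros_like(np.matrix([[0]*(1) for i in range(1)])).astype(np.complex128)
print(x)   # matrix([[0.+0.j]]) - imaginary part guaranteed to start at 0
# ... receive into x via MPI ...
# if x.imag is still 0 afterwards, the transfer never wrote the imaginary part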
I have a program that outputs numpy arrays that look like, for example:
[[a1, a2],
[b1],
[c1, c2, c3]]
Is there an elegant, Pythonic way to turn this into the following?
[[a1, b1, c1],
[a2, c2],
[c3]]
The purpose of this is to get the sum/average over the columns in a way that does not complain if some values are missing, so I am happy with anything that can do this directly. Here is an example for you to copy-paste:
import numpy
test = numpy.array([
numpy.array([3, 5]),
numpy.array([3.4]),
numpy.array([2.8, 5.3, 7.1])
])
Since you don't have a matrix, you can't benefit from Numpy's vectorized functionality. Instead you can use itertools.zip_longest and filter as follows to get what you want:
In [13]: import numpy as np
In [14]: import numpy
...: test = np.array(
...: [np.array([3 , 5]),
...: np.array([3.4]),
...: np.array([2.8,5.3,7.1])])
...:
In [15]: from itertools import zip_longest
In [16]: [np.fromiter(filter(bool, i), dtype=np.float) for i in zip_longest(*test)]
Out[16]: [array([3. , 3.4, 2.8]), array([5. , 5.3]), array([7.1])]
Note that using bool as the filtering function will eliminate items like 0 or the empty string, whose bool value is False.
If such items might be present in your array, use a list comprehension or a lambda function with filter instead:
[np.array([i for i in sub if i is not None]) for sub in zip_longest(*test)]
You might also want to take a look at zip_longest's roughly equivalent implementation so that, if possible, you can generate the desired result in the first place instead of post-processing the returned list.
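A sketch of that idea, adapting the zip_longest pattern with a private sentinel so neither None nor falsy values like 0 are ever a problem (the names here are illustrative):

from itertools import zip_longest
import numpy as np

_SENTINEL = object()

def transposed(arrays):
    # yield one array per "column", skipping positions where a row ran out
    for column in zip_longest(*arrays, fillvalue=_SENTINEL):
        yield np.fromiter((v for v in column if v is not _SENTINEL), dtype=float)

# list(transposed(test)) -> [array([3. , 3.4, 2.8]), array([5. , 5.3]), array([7.1])]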
You lose all the benefits of numpy arrays when you start treating them as ragged lists. An alternative is to set empty/missing elements to NaN and use the "nan"-prefixed functions in the numpy suite to compute your statistics: mean maps to nanmean, sum maps to nansum, and so on. This has the additional advantage that the order of the gaps does not matter.
If at all possible, have your program create a single array that looks like this:
test = np.array([
[3.0, 5.0, np.nan],
[3.4, np.nan, np.nan],
[2.8, 5.3, 7.1]])
If not, here is a primitive attempt at converting the input:
def to_full(a):
    output = np.full((len(a), max(map(len, a))), np.nan)
    for i, row in enumerate(a):
        output[i, :len(row)] = row
    return output
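Applied to the test array from the question, this gives:

to_full(test)
# array([[3. , 5. , nan],
#        [3.4, nan, nan],
#        [2.8, 5.3, 7.1]])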
Now computing the mean is trivial:
mean = np.nanmean(test, axis=0)
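The nan-functions simply skip the NaN gaps. With the rectangular array above (or to_full(test) for the ragged one):

np.nanmean(to_full(test), axis=0)   # array([3.06666667, 5.15, 7.1])
np.nansum(to_full(test), axis=0)    # array([ 9.2, 10.3,  7.1])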
I am trying to use a function (preprocessing.scale) on a list of data. I am new to mapreduce/parallelism in Python - I would like to process this on a large list of data to improve performance.
Example:
X = [1,2,3,4]
Using the syntax:
list(map(preprocessing.scale, X))
I get this error:
TypeError: Singleton array array(1.0) cannot be considered a valid collection.
I think that is because of the return type of the function, but I am not sure how to fix this. Any help would be greatly appreciated!
You don't need/want to use the map function here, as it effectively runs a Python-level for loop under the hood.
Almost all sklearn methods are vectorized; they accept list-like objects (lists, numpy arrays, etc.), and this works much faster than the map(...) approach.
Demo:
In [121]: from sklearn.preprocessing import scale
In [122]: X = [1,2,3,4]
In [123]: scale(X)
Out[123]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
the same demo using numpy array:
In [39]: x = np.array(X)
In [40]: x
Out[40]: array([1, 2, 3, 4])
In [41]: scale(x)
DataConversionWarning: Data with input dtype int32 was converted to float64 by the scale function.
warnings.warn(msg, _DataConversionWarning)
Out[41]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
it expects float dtype, so we can easily convert our numpy array to float dtype on the fly:
In [42]: scale(x.astype('float64'))
Out[42]: array([-1.34164079, -0.4472136 , 0.4472136 , 1.34164079])
Executing list(map(preprocessing.scale, X)) is equivalent to executing [preprocessing.scale(a) for a in X].
Given this, what you are currently doing is scaling a singleton (one observation). You cannot scale a single item, and that is where the function breaks. Try doing preprocessing.scale(X[0]) and you will get the same error.
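That is, something like this (the exact traceback text can vary between sklearn versions):

from sklearn import preprocessing
X = [1, 2, 3, 4]
preprocessing.scale(X[0])
# TypeError: Singleton array array(1.0) cannot be considered a valid collection.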
Why are you trying to run it like that instead of just passing the whole array: preprocessing.scale(X)?
My values are currently showing as 1.00e+09 in an array (type float64). I would like them to show as 1000000000 instead. Is this possible?
Make a sample array
In [206]: x=np.array([1e9, 2e10, 1e6])
In [207]: x
Out[207]: array([ 1.00000000e+09, 2.00000000e+10, 1.00000000e+06])
We can convert to ints - except notice that the largest one is too large for the default int32:
In [208]: x.astype(int)
Out[208]: array([ 1000000000, -2147483648, 1000000])
In [212]: x.astype(np.int64)
Out[212]: array([ 1000000000, 20000000000, 1000000], dtype=int64)
Writing a csv with the default format, which is float regardless of the array dtype:
In [213]: np.savetxt('text.txt',x)
In [214]: cat text.txt
1.000000000000000000e+09
2.000000000000000000e+10
1.000000000000000000e+06
We can specify a format:
In [215]: np.savetxt('text.txt',x, fmt='%d')
In [216]: cat text.txt
1000000000
20000000000
1000000
Potentially there are 3 issues:
integer vs. float in the array itself (its dtype)
display or print of the array
writing the array to a csv file
It is a printing option; see the documentation on printing options. Briefly stated: you need to use the suppress option when printing:
np.set_printoptions(suppress=True) # for small floating point.
np.set_printoptions(suppress=True, formatter={'all':lambda x: str(x)})
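For example, with the sample array from above (exact spacing varies by numpy version):

x = np.array([1e9, 2e10, 1e6])
print(x)                             # [1.e+09 2.e+10 1.e+06] - scientific notation
np.set_printoptions(suppress=True)
print(x)                             # [ 1000000000. 20000000000.     1000000.]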
I have several arrays; some of them hold float numbers and others hold string characters, and all the arrays have the same length. When I try to use numpy.column_stack on these arrays, it converts the float numbers to strings, for example:
a = np.array([3.4,3.4,6.4])
b = np.array(['holi','xlo','xlo'])
B = np.column_stack((a,b))
print B
>>> [['3.4' 'holi']
 ['3.4' 'xlo']
 ['6.4' 'xlo']]
type(B[0,0])
>>> numpy.string_
Why? Is it possible to avoid it?
Thanks a lot for your time.
The easiest structured array approach is with the rec.fromarrays function:
In [1411]: a=np.array([3.4,3.4,6.4]); b=np.array(['holi','xlo','xlo'])
In [1412]: B = np.rec.fromarrays([a,b],names=['a','b'])
In [1413]: B
Out[1413]:
rec.array([(3.4, 'holi'), (3.4, 'xlo'), (6.4, 'xlo')],
dtype=[('a', '<f8'), ('b', '<U4')])
In [1414]: B['a']
Out[1414]: array([ 3.4, 3.4, 6.4])
In [1415]: B['b']
Out[1415]:
array(['holi', 'xlo', 'xlo'],
dtype='<U4')
Check its docs for more parameters. But it basically constructs an empty array of the correct compound dtype, and copies your arrays to the respective fields.
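Roughly, it is doing the equivalent of this by hand (a simplified sketch of the construction, not the actual implementation):

B = np.empty(len(a), dtype=[('a', '<f8'), ('b', '<U4')])
B['a'] = a   # copy the float column into field 'a'
B['b'] = b   # copy the string column into field 'b'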
To store such mixed-type data, you would most probably need either Object dtype arrays or structured arrays. Going with Object dtype arrays, we can convert either of the input arrays to Object dtype upfront and then stack it alongside the rest of the arrays. The remaining arrays are converted to Object dtype automatically, giving us a stacked array of that type. Thus, we would have an implementation like so:
np.column_stack((a.astype(np.object),b))
Sample run showing how to construct a stacked array and retrieve the individual arrays back:
In [88]: a
Out[88]: array([ 3.4, 3.4, 6.4])
In [89]: b
Out[89]:
array(['holi', 'xlo', 'xlo'],
dtype='|S4')
In [90]: out = np.column_stack((a.astype(np.object),b))
In [91]: out
Out[91]:
array([[3.4, 'holi'],
[3.4, 'xlo'],
[6.4, 'xlo']], dtype=object)
In [92]: out[:,0].astype(float)
Out[92]: array([ 3.4, 3.4, 6.4])
In [93]: out[:,1].astype(str)
Out[93]:
array(['holi', 'xlo', 'xlo'],
dtype='|S4')
I want to convert an int64 numpy array to a uint64 numpy array, adding 2**63 to the values in the process so that they are still within the valid range allowed by the arrays. So for example if I start from
a = np.array([-2**63,2**63-1], dtype=np.int64)
I want to end up with
np.array([0.,2**64], dtype=np.uint64)
Sounds simple at first, but how would you actually do it?
Use astype() to convert the values to another dtype:
import numpy as np
(a+2**63).astype(np.uint64)
# array([ 0, 18446744073709551615], dtype=uint64)
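A bitwise alternative worth noting (a sketch): adding 2**63 modulo 2**64 is the same as flipping the sign bit, so you can reinterpret the int64 bits as uint64 and XOR, staying entirely in integer arithmetic with no intermediate float conversion:

b = a.view(np.uint64) ^ np.uint64(2**63)   # flip the sign bit instead of adding 2**63
# array([ 0, 18446744073709551615], dtype=uint64)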
I'm not a real numpy expert, but this:
>>> a = np.array([-2**63,2**63-1], dtype=np.int64)
>>> b = np.array([x+2**63 for x in a], dtype=np.uint64)
>>> b
array([ 0, 18446744073709551615], dtype=uint64)
works for me with Python 2.6 and numpy 1.3.0
I assume you meant 2**64-1, not 2**64, in your expected output, since 2**64 won't fit in a uint64. (18446744073709551615 is 2**64-1)