numpy's complex128 conversion

I have a list of np.complex128 numbers but for all the numbers the complex part is equal to zero. How can I extract the real part of the number (which is pretty much the only part of the number)?
As a side note, I want to do scipy integration, but I am not sure whether its integration methods can handle y samples with dtype np.complex128.

If your list is a NumPy array, you can simply refer to its real attribute:
In [59]: a = np.array([1+0j, 2+0j, -1+0j])
In [60]: a
Out[60]: array([ 1.+0.j, 2.+0.j, -1.+0.j])
In [61]: a.real
Out[61]: array([ 1., 2., -1.])
If your list is a Python list, perhaps the following list comprehension would be the simplest way to get the real parts you want:
In [64]: l
Out[64]: [(1+0j), (2+0j), (-1+0j)]
In [67]: [c.real for c in l]
Out[67]: [1.0, 2.0, -1.0]
You would need to do this conversion if you want to integrate a function returning a np.complex128 with quad: scipy.integrate.quad expects the function to return some kind of float.
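For example, a minimal sketch (the integrand f here is hypothetical) that hands quad only the real part:

import numpy as np
from scipy.integrate import quad

# hypothetical integrand returning np.complex128 with zero imaginary part
def f(t):
    return np.complex128(t ** 2 + 0j)

# quad expects a real-valued function, so wrap f and pass only the real part
result, abserr = quad(lambda t: f(t).real, 0, 1)
print(result)  # approximately 1/3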

Related

Using `numpy.vectorize` to create multidimensional array results in ValueError: setting an array element with a sequence

This problem only seems to arise when my dummy function returns an array, so that a multidimensional array is being created.
I reduced the issue to the following example:
def dummy(x):
    y = np.array([np.sin(x), np.cos(x)])
    return y

x = np.array([0, np.pi/2, np.pi])
The code I want to optimize looks like this:
y = []
for x_i in x:
    y_i = dummy(x_i)
    y.append(y_i)
y = np.array(y)
So I thought, I could use vectorize to get rid of the slow loop:
y = np.vectorize(dummy)(x)
But this results in
ValueError: setting an array element with a sequence.
Where even is the sequence the error is talking about?!
Your function returns an array when given a scalar:
In [233]: def dummy(x):
     ...:     y = np.array([np.sin(x), np.cos(x)])
     ...:     return y
     ...:
In [234]: dummy(1)
Out[234]: array([0.84147098, 0.54030231])
In [235]: f = np.vectorize(dummy)
In [236]: f([0,1,2])
...
ValueError: setting an array element with a sequence.
vectorize constructs an empty result array and tries to put the result of each calculation in it. But a cell of the target array cannot accept an array.
If we specify an otypes parameter, it does work:
In [237]: f = np.vectorize(dummy, otypes=[object])
In [238]: f([0,1,2])
Out[238]:
array([array([0., 1.]), array([0.84147098, 0.54030231]),
array([ 0.90929743, -0.41614684])], dtype=object)
That is, each dummy array is put in an element of a shape (3,) result array.
Since the component arrays all have the same shape, we can stack them:
In [239]: np.stack(_)
Out[239]:
array([[ 0. , 1. ],
[ 0.84147098, 0.54030231],
[ 0.90929743, -0.41614684]])
But as noted, vectorize does not promise a speedup. I suspect we could also use the newer signature parameter, but that's even slower.
vectorize makes some sense if your function takes several scalar arguments, and you'd like to take advantage of numpy broadcasting when feeding sets of values. But as replacement for a simple iteration over a 1d array, it isn't an improvement.
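For reference, a sketch of that signature approach, which declares that dummy maps a scalar to a length-2 vector so the outputs get stacked for us:

import numpy as np

def dummy(x):
    return np.array([np.sin(x), np.cos(x)])

# '()->(n)' means: scalar in, 1d vector out
f = np.vectorize(dummy, signature='()->(n)')
print(f(np.array([0, np.pi/2, np.pi])).shape)  # (3, 2)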
I don't really understand the error either, but with python 3.6.3 you can just write:
y = dummy(x)
so it is automatically vectorized.
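One caveat worth checking (a minimal sketch): calling dummy on the whole array stacks sin and cos along the first axis, so the result is the transpose of what the original loop builds:

import numpy as np

def dummy(x):
    return np.array([np.sin(x), np.cos(x)])

x = np.array([0, np.pi/2, np.pi])
print(dummy(x).shape)    # (2, 3)
print(dummy(x).T.shape)  # (3, 2), the shape the loop version produces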
Also in the official documentation there is written the following:
The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.
I hope this was at least a little help.

Adding arrays which may contain 'None'-entries

I have a question regarding the addition of numpy arrays.
Let's assume I have defined a function
def foo(a,b):
    return a+b
that takes two arrays of the same shape and simply returns their sum.
Now, I have to deal with the cases that some of the entries may be None.
I would like to treat those entries as if they were float(0), such that
[1.0,None,2.0] + [1.0,2.0,2.0]
would add up to
[2.0,2.0,4.0]
Can you provide me with an already-implemented solution?
TIA
I suggest numpy.nan_to_num:
>>> np.nan_to_num(np.array([1.0,None,2.0], dtype=float))
array([ 1., 0., 2.])
Then,
>>> def foo(a,b):
...     return np.nan_to_num(a) + np.nan_to_num(b)
...
>>> foo(np.array([1.0,None,2.0], dtype=float), np.array([1.0,2.0,2.0], dtype=float))
array([ 2., 2., 4.])
Usually, the answer to this is to use an array of floats, rather than an array of arbitrary objects, and then use np.nan instead of None. NaN has well-defined semantics for arithmetic. (Also, using an array of floats instead of objects will make your code significantly more time and space efficient.)
Notice that you don't have to manually convert None to np.nan if you build the array with an explicit dtype of float or np.float64. Both of these are equivalent:
>>> a = np.array([1.0,np.nan,2.0])
>>> a = np.array([1.0,None,2.0],dtype=float)
Which means that if, for some reason, you really needed arrays of arbitrary objects with actual None in them, you could do that, and then convert it to an array of floats on the fly to get the benefits of NaN:
>>> a.astype(float) + b.astype(float)
At any rate, in this case, just using NaN isn't sufficient:
>>> a = np.array([1.0,np.nan,2.0])
>>> b = np.array([1.0,2.0,2.0])
>>> a + b
array([ 2., nan, 4.])
That's because the semantics of NaN are that the result of any operation with NaN returns NaN. But you want to treat it as 0.
But it does make the problem easy to solve. The simplest way to solve it is with the function nan_to_num, which replaces NaN with 0 by default:
>>> np.nan_to_num(a)
array([1., 0., 2.])
>>> np.nan_to_num(a) + np.nan_to_num(b)
array([2., 2., 4.])
You can use column_stack to concatenate both arrays along the second axis, then use np.nansum() to sum the items over that axis.
In [15]: a = np.array([1.0,None,2.0], dtype=float)
# Using dtype here is necessary to convert None to np.nan
In [16]: b = np.array([1.0,2.0,2.0])
In [17]: np.nansum(np.column_stack((a, b)), 1)
Out[17]: array([2., 2., 4.])
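If you'd rather not rely on nan_to_num or nansum, masking the NaNs explicitly with np.where gives the same treat-as-zero behavior; a minimal sketch:

import numpy as np

a = np.array([1.0, None, 2.0], dtype=float)  # None becomes nan
b = np.array([1.0, 2.0, 2.0])

# replace NaN with 0 in each operand before adding
print(np.where(np.isnan(a), 0.0, a) + np.where(np.isnan(b), 0.0, b))
# [2. 2. 4.]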

Comparing NumPy object references

I want to understand this NumPy behavior.
When I try to get a reference to an inner array of a NumPy array, and then compare it to the object itself, the returned value is False.
Here is the example:
In [198]: x = np.array([[1,2,3], [4,5,6]])
In [201]: x0 = x[0]
In [202]: x0 is x[0]
Out[202]: False
While on the other hand, with native Python objects, the returned value is True.
In [205]: c = [[1,2,3],[1]]
In [206]: c0 = c[0]
In [207]: c0 is c[0]
Out[207]: True
My question: is that the intended behavior of NumPy? If so, what should I do if I want to create a reference to an inner object of a NumPy array?
2d slicing
When I first wrote this I constructed and indexed a 1d array. But the OP is working with a 2d array, so x[0] is a 'row', a slice of the original.
In [81]: arr = np.array([[1,2,3], [4,5,6]])
In [82]: arr.__array_interface__['data']
Out[82]: (181595128, False)
In [83]: x0 = arr[0,:]
In [84]: x0.__array_interface__['data']
Out[84]: (181595128, False) # same databuffer pointer
In [85]: id(x0)
Out[85]: 2886887088
In [86]: x1 = arr[0,:] # another slice, different id
In [87]: x1.__array_interface__['data']
Out[87]: (181595128, False)
In [88]: id(x1)
Out[88]: 2886888888
What I wrote earlier about slices still applies. Indexing an individual element, as with arr[0,0], works the same as with a 1d array.
This 2d arr has the same databuffer as the 1d arr.ravel(); the shape and strides are different. And the distinction between view, copy and item still applies.
A common way of implementing 2d arrays in C is to have an array of pointers to other arrays. numpy takes a different, strided approach, with just one flat array of data, and uses shape and strides parameters to implement the traversal. So a subarray requires its own shape and strides as well as a pointer to the shared databuffer.
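If the goal is just to confirm that a slice refers back into the original array, an is test won't work (a fresh view object is built on every indexing), but the view's base attribute or np.shares_memory will; a minimal sketch:

import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
x0 = x[0]                        # a view into x's data buffer

print(x0 is x[0])                # False: each x[0] creates a new view object
print(x0.base is x)              # True: the view records its parent array
print(np.shares_memory(x0, x))   # True: both use the same data buffer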
1d array indexing
I'll try to illustrate what is going on when you index an array:
In [51]: arr = np.arange(4)
The array is an object with various attributes such as shape, and a data buffer. The buffer stores the data as bytes (in a C array), not as Python numeric objects. You can see information on the array with:
In [52]: np.info(arr)
class: ndarray
shape: (4,)
strides: (4,)
itemsize: 4
aligned: True
contiguous: True
fortran: True
data pointer: 0xa84f8d8
byteorder: little
byteswap: False
type: int32
or
In [53]: arr.__array_interface__
Out[53]:
{'data': (176486616, False),
'descr': [('', '<i4')],
'shape': (4,),
'strides': None,
'typestr': '<i4',
'version': 3}
One has the data pointer in hex, the other decimal. We usually don't reference it directly.
If I index an element, I get a new object:
In [54]: x1 = arr[1]
In [55]: type(x1)
Out[55]: numpy.int32
In [56]: x1.__array_interface__
Out[56]:
{'__ref': array(1),
'data': (181158400, False),
....}
In [57]: id(x1)
Out[57]: 2946170352
It has some properties of an array, but not all. For example you can't assign to it. Notice also that its `data` value is totally different.
Make another selection from the same place - different id and different data:
In [58]: x2 = arr[1]
In [59]: id(x2)
Out[59]: 2946170336
In [60]: x2.__array_interface__['data']
Out[60]: (181143288, False)
Also if I change the array at this point, it does not affect the earlier selections:
In [61]: arr[1] = 10
In [62]: arr
Out[62]: array([ 0, 10, 2, 3])
In [63]: x1
Out[63]: 1
x1 and x2 don't have the same id, and thus won't match with is, and they don't use the arr data buffer either. There's no record that either variable was derived from arr.
With slicing it is possible to get a view of the original array:
In [64]: y = arr[1:2]
In [65]: y.__array_interface__
Out[65]:
{'data': (176486620, False),
'descr': [('', '<i4')],
'shape': (1,),
....}
In [66]: y
Out[66]: array([10])
In [67]: y[0]=4
In [68]: arr
Out[68]: array([0, 4, 2, 3])
In [69]: x1
Out[69]: 1
Its data pointer is 4 bytes larger than arr's - that is, it points to the same buffer, just a different spot. And changing y does change arr (but not the independent x1).
I could even make a 0d view of this item
In [71]: z = y.reshape(())
In [72]: z
Out[72]: array(4)
In [73]: z[...]=0
In [74]: arr
Out[74]: array([0, 0, 2, 3])
In Python code we normally don't work with objects like this. When we use the c-api or cython, it is possible to access the data buffer directly. nditer is an iteration mechanism that works with 0d objects like this (either in Python or the c-api). In cython, typed memoryviews are particularly useful for low-level access.
http://cython.readthedocs.io/en/latest/src/userguide/memoryviews.html
https://docs.scipy.org/doc/numpy/reference/arrays.nditer.html
https://docs.scipy.org/doc/numpy/reference/c-api.iterator.html#c.NpyIter
elementwise ==
In response to a comment on Comparing NumPy object references:
np.array([1]) == np.array([2]) will return array([False], dtype=bool)
== is defined for arrays as an elementwise operation. It compares the values of the respective elements and returns a matching boolean array.
If such a comparison needs to be used in a scalar context (such as an if) it needs to be reduced to a single value, as with np.all or np.any.
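A minimal sketch of that reduction:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])

print(a == b)        # [ True  True False], elementwise
if np.all(a == b):   # reduce to a single bool for use in an if
    print('all values match')
else:
    print('some values differ')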
The is test compares object ids (not just for numpy objects). It has limited value in practical coding. I use it most often in expressions like is None, where None is an object with a unique id that does not play nicely with equality tests.
I think you have a misunderstanding about NumPy arrays. You think that subarrays in a multidimensional NumPy array are separate objects, as in Python lists; well, they're not.
A NumPy array, regardless of its dimension, is just one object. That's because NumPy creates the array at the C level, and when it is exposed as a Python object it can't be broken down into multiple objects. Instead, Python creates a new object to hold each new part whenever you use operations like split(), __getitem__, take(), etc.; as a matter of fact, that's just how Python provides the list-like behavior for NumPy arrays.
You can also check this in real time as follows:
In [7]: x
Out[7]:
array([[1, 2, 3],
[4, 5, 6]])
In [8]: x[0] is x[0]
Out[8]: False
So as soon as you have an array, or any mutable object that can hold other objects, you have a Python mutable object, and therefore you lose the performance and all the other cool features of NumPy arrays.
Also, as @Imanol mentioned in the comments, you may want to use NumPy view objects if you need memory-efficient, flexible operations that modify an array through a reference. View objects can be constructed in the following two ways:
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array's memory with a different data-type. This can cause a reinterpretation of the bytes of memory.

a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.). This does not cause a reinterpretation of the memory.

For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore if a is C-ordered versus fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
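As a small illustration of the dtype-reinterpreting form (a sketch; the printed values depend on the machine's byte order):

import numpy as np

a = np.arange(4, dtype=np.int32)
v = a.view(np.int16)   # same buffer, reinterpreted as 16-bit integers
print(v)               # [0 0 1 0 2 0 3 0] on a little-endian machine
v[0] = 99
print(a[0])            # 99: writing through the view changes the original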
Not sure if it's useful at this point, but numpy.ndarray.ctypes seems to have useful bits:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.ctypes.html
Used something like this (missing dtype, but meh):
def is_same_array(a, b):
    return (a.shape == b.shape) and (a == b).all() and a.ctypes.data == b.ctypes.data
here:
https://github.com/EricCousineau-TRI/repro/blob/a60daf899e9726daf2ca1259bb80ad2c7c9b3e3f/python/namedlist_alt.py#L111

Python, numpy.array slicing, altering array values with slices

I have a task for numerical integration in which we approximate an integral with a quadrature formula. My problem is that the task requires me to avoid loops and use a vectorized variant, which would be a slice?!
I have an np.array object with n values, and I have to alter each value of this array using a specific formula. The problem is that the value of the array at point i is used in the formula for that very position. With a for loop it would be easy:
x = np.array([...])
for i in range(0, n):
    x[i] = f(x[i] + a) * b
(a, b are some other variables)
How do I do this with slices? I have to do this for all elements of the array, so it would be something like:
x[:]=f(x[???]+a)*b
And how do I get the right position from my array into the formula? A slicing instruction like x[:] just runs through my whole object. Is there a way to somehow save the index I am currently at?
I tried to search but found nothing. The other problem is that I do not even know how to properly phrase the search request...
You may be confusing two issues
modifying all elements of an array
calculating values for all elements of an array
In
x = np.array([...])
for i in range(0, n):
    x[i] = f(x[i] + a) * b
you change elements of x one by one, and also pass them one by one to f.
x[:] = ... lets you change all elements of x at once, but the source (the right hand side of the equation) has to generate all those values. But usually you don't need to assign values. Instead just use x = .... It's just as fast and memory efficient.
Using x[:] on the RHS does nothing for you. If x is a list, this makes a copy; if x is an array, it just returns a view, an array with the same values.
The key question is, what does your f(...) function accept? If it uses operations like +, * and functions like np.sin, you can give it an array, and it will return an array.
But if it only works with scalars (that includes using functions like math.sin), then you have to feed it scalars, i.e. x[i].
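Concretely, when f is built from array-aware operations, the whole loop collapses into one line; a sketch with placeholder values standing in for f, a and b:

import numpy as np

f = np.sin               # stand-in for the question's array-aware f
a, b = 0.5, 2.0          # placeholder constants

x = np.array([0.0, 1.0, 2.0])
x[:] = f(x + a) * b      # updates every element at once, no Python loop
print(x)
# rebinding with x = f(x + a) * b works just as well here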
Let's try to unpack that comment (which might be better as an edit to the original question):
I have an interval which has to be cut into pieces.
x = np.linspace(start,end,pieceAmount)
function f
quadrature formula
b (weights or factors)
c (function values)
b1*f(x[i]+c1)+...+bn*f(x[i]+cn)
For example
In [1]: x = np.arange(5)
In [2]: b = np.arange(3)
In [6]: c = np.arange(4,7)*.1
We can do the x[i]+c for all x and c with broadcasting
In [7]: xc = x + c[:,None]
In [8]: xc
Out[8]:
array([[ 0.4, 1.4, 2.4, 3.4, 4.4],
[ 0.5, 1.5, 2.5, 3.5, 4.5],
[ 0.6, 1.6, 2.6, 3.6, 4.6]])
If f is a function like np.sin that takes any array, we can pass xc to it, getting back a like-sized array.
Again with broadcasting we can do the b[n]*f(x[i]+c[n]) calculation
In [9]: b[:,None]* np.sin(xc)
Out[9]:
array([[ 0. , 0. , 0. , -0. , -0. ],
[ 0.47942554, 0.99749499, 0.59847214, -0.35078323, -0.97753012],
[ 1.12928495, 1.99914721, 1.03100274, -0.88504089, -1.98738201]])
and then we can sum, getting back an array shaped just like x:
In [10]: np.sum(_, axis=0)
Out[10]: array([ 1.60871049, 2.99664219, 1.62947489, -1.23582411, -2.96491212])
That's the dot or matrix product:
In [11]: b.dot(np.sin(xc))
Out[11]: array([ 1.60871049, 2.99664219, 1.62947489, -1.23582411, -2.96491212])
And as I noted earlier we can complete the action with
x = b.dot(f(x + c[:,None]))
The key to a simple expression like this is f taking an array.
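Putting the pieces together as a check (a sketch using np.sin as a stand-in for f, comparing the loop against the broadcast/dot version):

import numpy as np

f = np.sin                      # stand-in for the question's f
x = np.arange(5, dtype=float)
b = np.arange(3, dtype=float)   # weights b1..bn
c = np.arange(4, 7) * 0.1       # offsets c1..cn

# loop version: b1*f(x[i]+c1) + ... + bn*f(x[i]+cn) for each i
loop = np.array([sum(bn * f(xi + cn) for bn, cn in zip(b, c)) for xi in x])

# vectorized version: broadcasting plus a dot product
vec = b.dot(f(x + c[:, None]))

print(np.allclose(loop, vec))   # True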

Numpy concatenating ints to string

I have a rather big numpy array of M*N ints, and I want to end up with an array of M strings that each have all N corresponding ints concatenated.
I tried using a view but this is probably not the way to go with numpy.
Hope this is what you want
numpy.apply_along_axis(numpy.array_str, 1, array)
See the documentation of apply_along_axis
http://docs.scipy.org/doc/numpy/reference/generated/numpy.apply_along_axis.html
and array_str
http://docs.scipy.org/doc/numpy/reference/generated/numpy.array_str.html
for a deeper understanding.
If you concatenate ints into strings, it won't be a numpy array any longer! You will have a list of strings.
This is an example that concatenates 'a' to a numpy zeros array:
>>> np.zeros(10)
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
>>> [str(i)+'a' for i in np.zeros(10)]
['0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a', '0.0a']
Without an example, your request is unclear. But here's one way of understanding it
In [13]: X=np.arange(12).reshape(3,4)
In [14]: np.array([''.join([str(i) for i in x]) for x in X])
Out[14]:
array(['0123', '4567', '891011'],
dtype='|S6')
I have a 3x4 array; I convert each element to a string using str(i), and use join to concatenate the strings of a row into one longer string.
That's not a very satisfying answer, especially when joining '9' to '10'. Of course it could be refined by elaborating on the 'int' to 'string' formatting (i.e. fixed width), maybe adding delimiters in the 'join', etc.
In [21]: np.array([','.join(['{:*^8}'.format(i) for i in x]) for x in X])
Out[21]:
array(['***0****,***1****,***2****,***3****',
'***4****,***5****,***6****,***7****',
'***8****,***9****,***10***,***11***'],
dtype='|S35')
A view would only work if what you want is some sort of string to bytes representation, or str to char.
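To illustrate that last point, a minimal sketch of a bytes-level view: an array of single characters can be reinterpreted as fixed-width strings without copying:

import numpy as np

# a (3, 4) array of single bytes, viewed as three 4-byte strings
chars = np.array([list('0123'), list('4567'), list('89ab')], dtype='S1')
print(chars.view('S4').ravel())  # [b'0123' b'4567' b'89ab']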
