Say I have a matrix W of shape MxN and an array of indices z with shape Mx1.
Now, assume I'd like to sum up the elements of each row in W, excluding the index that appears for that row in z.
1-d example:
import numpy as np
W = np.array([1.0, 2.0, 8.0])
z = 2
np.sum(np.delete(W,z))
MxN example and desired output:
import numpy as np
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2]).reshape(2,1)
# desired output
# [10. 20.]
I tried to use np.delete with axis=1, with no success.
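For reference, a minimal sketch of why np.delete with axis=1 doesn't help here - given an array of indices, it deletes those columns from every row, not a different index per row:
import numpy as np
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2])
# removes columns 0 AND 2 from both rows, leaving a single column:
print(np.delete(W, z, axis=1))
# [[ 2.]
#  [15.]]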
I managed to get around it using tricks like:
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2])
W[np.arange(z.shape[0]), z]=0
print(np.sum(W, axis=1))
# [10. 20.]
but I'm wondering if there's a more elegant way.
Using broadcasting to get the mask to simulate deletion and then sum-reduce -
(W*(z != np.arange(W.shape[-1]))).sum(-1)
Sample runs -
For the 2D case:
In [61]: W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
...: z = np.array([0,2]).reshape(2,1)
In [62]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[62]: array([10., 20.])
Works just as well for the 1D case:
In [59]: W = np.array([1.0, 2.0, 8.0])
...: z = 2
In [60]: (W*(z != np.arange(W.shape[-1]))).sum(-1)
Out[60]: 3.0
For the 2D case, with np.einsum for the sum-reduction -
In [53]: np.einsum('ij,ij->i',W,z != np.arange(W.shape[1]))
Out[53]: array([10., 20.])
Summing and then subtracting the z-indexed values, for the 2D case -
In [134]: W.sum(1) - np.take_along_axis(W,z,axis=1).squeeze(1)
Out[134]: array([10., 20.])
Extended to handle both the 2D and 1D cases -
W.sum(-1)-np.take_along_axis(W,np.atleast_1d(z),axis=-1).squeeze(-1)
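A quick check of that combined expression for the 1D case, reusing the earlier sample data:
W = np.array([1.0, 2.0, 8.0])
z = 2
print(W.sum(-1) - np.take_along_axis(W, np.atleast_1d(z), axis=-1).squeeze(-1))
# 3.0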
Divakar's answers are pretty good. I just give another perspective on your question. If you need masking to ignore certain indices and to do multiple operations on the array, you should use a numpy masked array, np.ma.array, instead of a regular np.array. Masked arrays exist precisely for the purpose of ignoring certain indices.
See the masked array documentation for more info.
z = np.array([0,2]).reshape(2,1)
W_ma = np.ma.array(W, mask=z == np.arange(W.shape[-1]))
In [36]: W_ma
Out[36]:
masked_array(
data=[[--, 2.0, 8.0],
[5.0, 15.0, --]],
mask=[[ True, False, False],
[False, False, True]],
fill_value=1e+20)
From this W_ma masked array, you may do almost all operations the same as with an np.array. For sum:
W_ma.sum(1)
Out[44]:
masked_array(data=[10.0, 20.0],
mask=[False, False],
fill_value=1e+20)
To turn the masked array back into a regular array, you may use compressed, filled, or compress_rowcols:
In [46]: W_ma.sum(1).compressed()
Out[46]: array([10., 20.])
Note: I emphasize that a masked array is useful when you do multiple operations while ignoring certain indices. If you only need to do one or two such operations, there is no point in using a masked array.
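To illustrate that multiple-operations point, a small sketch reusing the example above - several statistics over the same masked data, with no extra bookkeeping:
import numpy as np
W = np.array([[1.0,2.0,8.0], [5.0,15.0,3.0]])
z = np.array([0,2]).reshape(2,1)
W_ma = np.ma.array(W, mask=z == np.arange(W.shape[-1]))
print(W_ma.sum(1).compressed())   # [10. 20.]
print(W_ma.mean(1).compressed())  # [ 5. 10.]
print(W_ma.max(1).compressed())   # [ 8. 15.]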
I'm working on a tight-binding model for graphene using pythTB. I want to incorporate spinful elements in the calculation. The Hamiltonian for the Rashba hopping terms has the Pauli spin matrix vector crossed with the site hopping vector.
Initially I created a list of matrices and crossed that with the vector; unfortunately this did not yield the correct result (I think that after the vector cross product was taken, the cross product of the matrices was taken as well).
Next, I declared 3 symbols 's_x', 's_y', and 's_z' and used those instead of the matrices in my Pauli spin matrix vector. After taking the cross product I received the correct result. The problem I am having is that I cannot substitute a matrix into the symbol variables I added. Is it possible to do this? Or will I need to take the cross product manually?
Here is some of my code:
from __future__ import print_function
from pythtb import * # import TB model class
from sympy import symbols
import numpy as np
import matplotlib.pyplot as plt
# create list of pauli spin matrices
sx = [[0., 1.],[1., 0.]]
sy = [[0., -1.j],[1.j, 0.]]
sz = [[1., 0.],[0., -1.]]
Id = [[1., 0.], [0., 1.]]
s_pauli = np.zeros((4, 2, 2), dtype=complex)
s_pauli = [Id, sx, sy, sz]
# create s_pau without the identity matrix, using the declared symbols
s_x, s_y, s_z = symbols('s_x s_y s_z')
s_pau = np.zeros((3, 2, 2), dtype=complex)
s_pau = [ s_x, s_y, s_z]
ab00 = [ 0.5, 0.28867513, 0.]
sig_x_ab00 = np.cross( s_pau, ab00)
If I print sig_x_ab00[2] (which is the only one I'm currently interested in), then I get:
0.288675134594813*s_x - 0.5*s_y
After obtaining that, I wanted to substitute s_pauli[1] for s_x and s_pauli[2] for s_y by doing the following command:
sig_x_ab00_ = sig_x_ab00.subs(s_x, s_pauli[1])
And I get the following error output:
AttributeError: 'numpy.ndarray' object has no attribute 'subs'
Is what I am doing at all valid? Or is there a better way to go about this?
Any input is much appreciated!
Thanks!
Let's run your code, but look at each step. Don't make assumptions.
I'm using an isympy interactive session; that's ipython with sympy enhancements. I also imported numpy as np.
In [4]: ab00 = [ 0.5, 0.28867513, 0.]
In [5]: s_pauli
Out[5]:
[[[1.0, 0.0], [0.0, 1.0]],
[[0.0, 1.0], [1.0, 0.0]],
[[0.0, (-0-1j)], [1j, 0.0]],
[[1.0, 0.0], [0.0, -1.0]]]
This is a list. The previous np.zeros(...) expression does nothing. In Python we don't set the 'type' of a variable.
We can make an array from this list:
In [6]: np.array(s_pauli)
s_pauli[1] works because it is just list indexing.
And the added symbols:
In [11]: s_x, s_y, s_z = symbols('s_x s_y s_z')
In [12]: s_x
Out[12]: sₓ
In [13]: s_pau = [ s_x, s_y, s_z]
Again, s_pau is a list, not an array. When used in cross it will be turned into an array:
In [14]: np.array(s_pau)
Out[14]: array([s_x, s_y, s_z], dtype=object)
Note that this is an object dtype array, which is still very much like a list. Some basic math works, because operations like multiply and add are defined for the symbols. But transcendentals like np.log and np.sin don't work on such arrays.
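For instance, a quick demonstration of that limitation (on an object array a ufunc falls back to calling the corresponding method on each element, and sympy's Symbol has no sin method):
np.sin(np.array(s_pau))   # raises, since Symbol has no 'sin' method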
cross just uses multiply and addition, so it works with these object arrays:
In [15]: sig = np.cross( s_pau, ab00)
In [16]: sig
Out[16]: array([-0.28867513*s_z, 0.5*s_z, 0.28867513*s_x - 0.5*s_y], dtype=object)
sig is a numpy array. It is not a sympy expression, and does not have a subs method. Again, it pays to look closely at what each step actually produces.
The elements of the array are sympy expressions:
In [17]: sig[2]
Out[17]: 0.28867513⋅sₓ - 0.5⋅s_y
In [20]: s2 = sig[2]
subs with a scalar value works:
In [22]: s2.subs(s_x, 1)
Out[22]: 0.28867513 - 0.5⋅s_y
but not with a list (the substitution silently does nothing):
In [23]: s2.subs(s_x, s_pauli[1])
Out[23]: 0.28867513⋅sₓ - 0.5⋅s_y
However, if I make a sympy Matrix from it:
In [24]: s_pauli[1]
Out[24]: [[0.0, 1.0], [1.0, 0.0]]
In [25]: Matrix(s_pauli[1])
Out[25]:
⎡0.0 1.0⎤
⎢ ⎥
⎣1.0 0.0⎦
In [26]: s2.subs(s_x, Out[25])
Out[26]:
⎡ 0 0.28867513⎤
-0.5⋅s_y + ⎢ ⎥
⎣0.28867513 0 ⎦
The substitution does work.
In general, mixing sympy and numpy is hit-or-miss; some things work, almost more by accident than by design, and others don't. sympy.lambdify is the most reliable way of making a function that will work with numpy arrays.
In this case I suspect you'd be better off using a sympy version of cross and doing the sympy.Matrix substitutions.
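For completeness, a minimal sketch of the lambdify route, reusing the symbols and data from above (the 'numpy' argument tells lambdify to generate plain numpy code, so the Pauli matrices can be passed in directly):
import numpy as np
from sympy import symbols, lambdify
s_x, s_y, s_z = symbols('s_x s_y s_z')
sig = np.cross([s_x, s_y, s_z], [0.5, 0.28867513, 0.])
f = lambdify((s_x, s_y), sig[2], 'numpy')   # sig[2] is 0.28867513*s_x - 0.5*s_y
sx = np.array([[0., 1.], [1., 0.]])
sy = np.array([[0., -1.j], [1.j, 0.]])
print(f(sx, sy))   # the 2x2 matrix 0.28867513*sx - 0.5*sy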
I am porting some Matlab code to python and I have the following statement in Matlab:
cross([pt1,1]-[pp,0],[pt2,1]-[pp,0]);
pt1, pt2 and pp are 2D points.
So, my corresponding python code looks as follows:
np.cross(np.c_[pt1 - pp, 1], np.c_[pt2 - pp, 1])
The points are defined as:
pt1 = np.asarray((440.0, 59.0))
pt2 = np.asarray((-2546.23, 591.03))
pp = np.asarray([563., 456.5])
When I execute the statement with the cross product, I get the following error:
ValueError: all the input array dimensions except for the concatenation axis must match exactly
So, looking at some other posts, I thought I would try np.column_stack, but I get the same error:
np.cross(np.column_stack((pt1 - pp, 1)), np.column_stack((pt2 - pp, 1)))
This might be what you are looking for:
np.cross(np.append(pt1-pp, 1), np.append(pt2-pp, 1))
If you use np.r_ instead it works:
In [40]: np.cross(np.r_[pt1 - pp, 1], np.r_[pt2 - pp, 1])
Out[40]: array([-5.32030000e+02, -2.98623000e+03, -1.25246611e+06])
Your pt1 and pp are (2,) arrays. To append a 1 to them you need a 1d concatenation: np.r_, 'r' for 'row', as opposed to np.c_ for columns.
There are lots of ways of constructing a 3 element array:
In [43]: np.r_[pt1 - pp, 1]
Out[43]: array([-123. , -397.5, 1. ])
In [44]: np.append(pt1 - pp, 1)
Out[44]: array([-123. , -397.5, 1. ])
In [45]: np.concatenate((pt1 - pp, [1]))
Out[45]: array([-123. , -397.5, 1. ])
concatenate is the base operation. The others tweak the 1 to produce a 1d array that can be joined with the (2,) shape array to make a (3,).
Concatenate turns all of its inputs into arrays, if they aren't already: np.concatenate((pt1 - pp, np.array([1]))).
Note that np.c_ docs say it is the equivalent of
np.r_['-1,2,0', index expression]
That initial string expression is a bit complicated. The key point is it tries to concatenate 2d arrays (whereas your pt1 is 1d).
It is like column_stack, joining n (2,1) arrays to make a (2,n) array.
In [48]: np.c_[pt1, pt2]
Out[48]:
array([[ 440. , -2546.23],
[ 59. , 591.03]])
In [50]: np.column_stack((pt1, pt2))
Out[50]:
array([[ 440. , -2546.23],
[ 59. , 591.03]])
In MATLAB everything has at least 2 dimensions, and because it is Fortran based, the outer dimensions are last. So in a sense its most natural 'vector' shape is n x 1, a column matrix. numpy is built on Python, with a natural interface to its scalars and nested lists. Order is C based; the initial dimensions are outermost. So numpy code can have true scalars (Python numbers without shape or size), or arrays with 0 or more dimensions. A 'vector' most naturally has shape (n,) (a 1 element tuple). It can easily be reshaped to (1,n) or (n,1) if needed.
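A quick illustration of those shapes:
import numpy as np
v = np.array([-123., -397.5, 1.])   # natural numpy 'vector'
print(v.shape)           # (3,)
print(v[None, :].shape)  # (1, 3) - a row
print(v[:, None].shape)  # (3, 1) - a column, MATLAB's natural shape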
If you want a (3,1) array (instead of (3,) shaped), you'd need to use some sort of 'vertical' concatenation, joining a (2,1) array with a (1,1):
In [51]: np.r_['0,2,0', pt1-pp, 1]
Out[51]:
array([[-123. ],
[-397.5],
[ 1. ]])
In [53]: np.vstack([(pt1-pp)[:,None], 1])
Out[53]:
array([[-123. ],
[-397.5],
[ 1. ]])
(But np.cross wants (n,3) or (3,) arrays, not (3,1)!)
In [58]: np.cross(np.r_['0,2,0', pt1-pp, 1], np.r_['0,2,0', pt2-pp, 1])
...
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)
To get around this specify an axis:
In [59]: np.cross(np.r_['0,2,0', pt1-pp, 1], np.r_['0,2,0', pt2-pp, 1], axis=0)
Out[59]:
array([[-5.32030000e+02],
[-2.98623000e+03],
[-1.25246611e+06]])
Study np.cross if you want an example of manipulating dimensions. In this axis=0 case it transposes the arrays so they are (1,3) and then does the calculation.
Let's say I have a standard 2d numpy array, let's call it my2darray, with values. In this array there are two major sections. Let's say for each column, there is a specific row which separates "scenario1" and "scenario2". How can I create 2 masked arrays that represent the top section of my2darray and the bottom of my2darray? For example, I am interested in calculating the mean of the top half and the mean of the second half. One idea is to have a mask that is of the same shape as my2darray, but that seems like a waste of memory. Is there a better idea? Let's say I have a vector whose length is equal to the number of rows in my2darray (in this case 6), i.e. I have
myvector=np.array([9, 15, 5,7,11,11])
I am using python 2.6 with numpy 1.5.0
Using NumPy's broadcasted comparison, we can create such a 2D mask in a vectorized manner. The rest of the work is sum-reduction along the first axis, for which we can take help from np.einsum. Thus, we would have an implementation like so -
N = my2darray.shape[0]
mask = myvector <= np.arange(N)[:,None]
uout = np.true_divide(np.einsum('ij,ij->j',my2darray,~mask),myvector)
lout = np.true_divide(np.einsum('ij,ij->j',my2darray,mask),N-myvector)
Sample run to verify results -
In [184]: N = my2darray.shape[0]
...: mask = myvector <= np.arange(N)[:,None]
...: uout = np.true_divide(np.einsum('ij,ij->j',my2darray,~mask),myvector)
...: lout = np.true_divide(np.einsum('ij,ij->j',my2darray,mask),N-myvector)
...:
In [185]: uout
Out[185]: array([ 6. , 4.6, 4. , 0. ])
In [186]: [my2darray[:item,i].mean() for i,item in enumerate(myvector)]
Out[186]: [6.0, 4.5999999999999996, 4.0, 0.0] # Loopy version results
In [187]: lout
Out[187]: array([ 5.2 , 4. , 2.66666667, 2. ])
In [188]: [my2darray[item:,i].mean() for i,item in enumerate(myvector)]
Out[188]: [5.2000000000000002, 4.0, 2.6666666666666665, 2.0] # Loopy version
Another potentially faster way would be to calculate the summation for the upper mask, store it, and subtract it from the sum along the first axis over the entire length of the 2D input array. That difference could then be used for the calculation of the lower-part average. Thus, after we store N and calculate mask, we would have -
usum = np.einsum('ij,ij->j',my2darray,~mask)
uout = np.true_divide(usum,myvector)
lout = np.true_divide(my2darray.sum(0) - usum,N-myvector)
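Putting the pieces together as a self-contained sketch - the data here are made up for illustration, not the OP's:
import numpy as np
my2darray = np.array([[8., 1., 5., 9.],
                      [2., 6., 4., 0.],
                      [3., 7., 1., 2.],
                      [5., 2., 8., 6.],
                      [9., 4., 0., 3.],
                      [1., 0., 7., 4.]])
myvector = np.array([2, 3, 1, 4])   # split row for each column
N = my2darray.shape[0]
mask = myvector <= np.arange(N)[:,None]   # True in the lower section
uout = np.true_divide(np.einsum('ij,ij->j', my2darray, ~mask), myvector)
lout = np.true_divide(np.einsum('ij,ij->j', my2darray, mask), N - myvector)
# verify against an explicit per-column loop
assert np.allclose(uout, [my2darray[:k,i].mean() for i,k in enumerate(myvector)])
assert np.allclose(lout, [my2darray[k:,i].mean() for i,k in enumerate(myvector)])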
Say I have a numpy array that has some float('nan') values. I don't want to impute those data now; I want to first normalize the array while keeping the NaNs in their original positions. Is there any way I can do that?
Previously I used the normalize function in sklearn.preprocessing, but that function doesn't seem to accept an array containing NaN as input.
You can mask your array using the numpy.ma.array function and subsequently apply any numpy operation:
import numpy as np
a = np.random.rand(10) # Generate random data.
a = np.where(a > 0.8, np.nan, a) # Set all data larger than 0.8 to NaN
a = np.ma.array(a, mask=np.isnan(a)) # Use a mask to mark the NaNs
a_norm = a / np.sum(a) # The sum function ignores the masked values.
a_norm2 = a / np.std(a) # The std function ignores the masked values.
You can still access your raw data:
print(a.data)
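And to get the normalized values back as a plain ndarray with the NaNs restored in their original positions, you can fill the masked entries (a small follow-up sketch):
print(a_norm.filled(np.nan))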
You can use numpy.nansum to compute the norm and ignore nan:
In [54]: x
Out[54]: array([ 1., 2., nan, 3.])
Here's the norm with nan ignored:
In [55]: np.sqrt(np.nansum(np.square(x)))
Out[55]: 3.7416573867739413
y is the normalized array:
In [56]: y = x / np.sqrt(np.nansum(np.square(x)))
In [57]: y
Out[57]: array([ 0.26726124, 0.53452248, nan, 0.80178373])
In [58]: np.linalg.norm(y[~np.isnan(y)])
Out[58]: 1.0
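Extending that idea to a per-row version for 2D data, which is closer to what sklearn's normalize does (a sketch; nan_normalize is a hypothetical helper, not part of sklearn):
import numpy as np
def nan_normalize(X):
    # L2-normalize each row, ignoring NaNs; the NaNs stay in place
    X = np.asarray(X, dtype=float)
    norms = np.sqrt(np.nansum(np.square(X), axis=1, keepdims=True))
    return X / norms
X = np.array([[1., 2., np.nan, 3.],
              [np.nan, 4., 3., 0.]])
print(nan_normalize(X))
# [[0.26726124 0.53452248        nan 0.80178373]
#  [       nan 0.8        0.6        0.       ]]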
The nansum and np.ma.array answers are good options; however, those functions are not as commonly used or as explicit (IMHO) as the following:
import numpy as np
def rms(arr):
    arr = np.array(arr)  # sanitize the input
    # root-mean-square over the finite entries only
    return np.sqrt(np.mean(np.square(arr[np.isfinite(arr)])))
print(rms([np.nan, -1, 0, 1]))
I have two one-dimensional numpy matrices:
[[ 0.69 0.41]] and [[ 0.81818182 0.18181818]]
I want to multiply these two to get the result
[[0.883, 0.117]] (the result is normalized)
If I use np.dot I get ValueError: matrices are not aligned
Does anybody have an idea what I am doing wrong?
EDIT
I solved it in a kind of hacky way, but it worked for me, regardless of whether there is a better solution.
new_matrix = np.matrix([ a[0,0] * b[0,0], a[0,1] * b[0,1] ])
It seems you want to do element-wise math. Numpy arrays do this by default.
In [1]: import numpy as np
In [2]: a = np.matrix([.69,.41])
In [3]: b = np.matrix([ 0.81818182, 0.18181818])
In [4]: np.asarray(a) * np.asarray(b)
Out[4]: array([[ 0.56454546, 0.07454545]])
In [5]: np.matrix(_)
Out[5]: matrix([[ 0.56454546, 0.07454545]])
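To get the normalized result from the question, divide the element-wise product by its sum (assuming "normalized" here means the entries sum to 1, which matches the expected output):
import numpy as np
a = np.array([0.69, 0.41])
b = np.array([0.81818182, 0.18181818])
c = a * b
c /= c.sum()
print(c)   # approximately [0.883 0.117]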