I have a numpy array, array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]]). I want to add a vector (say [1,2,3]) to every row of this array and append the result to the end of it, i.e. add another three rows. I want to repeat this process about 5 times, so that each time the vector is added only to the last three rows, which are the result of the previous addition. Any suggestions?
Just use np.append along the first axis:
import numpy as np
a = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
v = np.array([1, 2, 3])
new_a = np.append(a, a+v, axis=0)
For the addition part, just write something like a[0]+[1,2,3] (where a is your array); numpy will perform the addition element-wise as expected.
For appending, a=np.append(a, [line], axis=0) is what you're looking for, where line is the new row you want to add, for example the result of the previous sum. (axis=0 appends rows; axis=1 would try to append columns and fail here because the shapes do not match.)
The iteration can easily be repeated by selecting the last three rows with negative indexing: a[-1], a[-2] and a[-3] are always the last three rows (or use a[-3:] to take them as one block).
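Putting the pieces above together, here is a minimal sketch of the repeated add-and-append loop. The names result and last_block are only illustrative, and the loop assumes the vector should be added exactly 5 times, as in the question.

```python
import numpy as np

a = np.array([[-1.228, 0.709, 0.0],
              [0.0, 2.836, 0.0],
              [1.228, 0.709, 0.0]])
v = np.array([1, 2, 3])

result = a.copy()
for _ in range(5):
    # Add the vector only to the three most recently appended rows
    last_block = result[-3:] + v
    # axis=0 appends the new block as extra rows
    result = np.append(result, last_block, axis=0)

print(result.shape)  # (18, 3): the 3 original rows plus 5 blocks of 3
```

Since np.append copies the whole array on every call, this is fine for a handful of iterations but becomes slow for many.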
If you really need to keep the results within a single array, a better option is to create the full array at the beginning and perform the operations you need on it. In the example below the N copies are laid out side by side along the columns.
arr = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
vector = np.array([1,2,3])
N = 5
multiarr = np.tile(arr, (1,N))
>>> multiarr
array([[-1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. ],
[ 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. ],
[ 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. ]])
multivector = (vector * np.arange(N)[:, None]).ravel()
>>> multivector
array([ 0, 0, 0, 1, 2, 3, 2, 4, 6, 3, 6, 9, 4, 8, 12])
>>> multiarr + multivector
array([[-1.228, 0.709, 0. , -0.228, 2.709, 3. , 0.772, 4.709, 6. , 1.772, 6.709, 9. , 2.772, 8.709, 12. ],
[ 0. , 2.836, 0. , 1. , 4.836, 3. , 2. , 6.836, 6. , 3. , 8.836, 9. , 4. , 10.836, 12. ],
[ 1.228, 0.709, 0. , 2.228, 2.709, 3. , 3.228, 4.709, 6. , 4.228, 6.709, 9. , 5.228, 8.709, 12. ]])
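If the blocks should instead be stacked as extra rows, as the question describes, the same broadcasting idea can be sketched as follows. This is a variation on the answer above, not the answer's own code: the name offsets is illustrative, and it assumes N + 1 blocks are wanted (the original rows plus N additions).

```python
import numpy as np

a = np.array([[-1.228, 0.709, 0.0],
              [0.0, 2.836, 0.0],
              [1.228, 0.709, 0.0]])
v = np.array([1, 2, 3])
N = 5  # number of times the vector is added

# offsets[k] is k * v; shape (N + 1, 1, 3) broadcasts against a's (3, 3)
offsets = np.arange(N + 1)[:, None, None] * v
# (N + 1, 3, 3) -> (3 * (N + 1), 3): block k holds a + k * v
stacked = (a + offsets).reshape(-1, a.shape[1])

print(stacked.shape)  # (18, 3)
```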
The following code prints the row numbers (solution1) that have at least one non-zero element. For each of these rows, how do I also print the locations of the non-zero elements (solution2), as shown in the expected output? For instance, row 1 has non-zero elements at locations [1,3,4,6] and row 2 has non-zero elements at locations [0,2,3,5].
import numpy as np
A=np.array([[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 212.13245959, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 216.08166277, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])
solution1 = []
for idx, e in enumerate(A):
    if any(e):
        solution1.append(idx)
print("solution 1 =",solution1)
The current output is
solution 1 = [1,2]
The expected output is
solution 1 = [1,2]
solution 2 = [[1,3,4,6],[0,2,3,5]]
Use np.where to find all coordinates for non zero values first, and then split y index by rows:
idx, idy = np.where(A)
np.split(idy, np.flatnonzero(np.diff(idx) != 0) + 1)
# [array([1, 3, 4, 6], dtype=int32), array([0, 2, 3, 5], dtype=int32)]
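As a self-contained sketch of the same idea, the following produces both solution1 and solution2 as plain lists. It uses a smaller stand-in array with the same non-zero pattern as the question, and np.unique with return_index in place of np.diff to find where each row's block of column indices starts; that substitution is a choice made here, not part of the answer above.

```python
import numpy as np

# Stand-in data with the same non-zero pattern as the question's A
A = np.array([[0, 0, 0, 0, 0, 0, 0, 0],
              [0, 5, 0, 7, 9, 0, 2, 0],
              [4, 0, 6, 7, 0, 8, 0, 0]], dtype=float)

# np.nonzero returns indices in row-major order, so rows is sorted
rows, cols = np.nonzero(A)
# Unique row numbers and the position where each row's entries begin
unique_rows, first = np.unique(rows, return_index=True)

solution1 = unique_rows.tolist()
solution2 = [c.tolist() for c in np.split(cols, first[1:])]

print("solution 1 =", solution1)  # solution 1 = [1, 2]
print("solution 2 =", solution2)  # solution 2 = [[1, 3, 4, 6], [0, 2, 3, 5]]
```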
I want to construct an array of lists of arrays, but my method breaks down if the arrays inside the lists have equal dimensions.
I make a list of lists of arrays and then convert that list into an array:
import numpy as np
foo = [[np.array([4. , 0. , 0.1]), np.array([5. , 0. , 0.1])],
[np.array([6. , 0. , 0.5])],
[],
[],
[]]
foo = np.array(foo)
foo
which results in an array of list of arrays:
Out[40]:
array([list([array([4. , 0. , 0.1]), array([5. , 0. , 0.1])]),
list([array([6. , 0. , 0.5])]), list([]), list([]), list([])],
dtype=object)
perfect.
Now consider the case where array dimensions are equal:
bar = [[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])]]
bar = np.array(bar)
print(bar)
The same method results in merely nested arrays.
Out[1]:
array([[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]]])
Is there a way to make bar an array of list of arrays?
(I would like to have this format because I am appending more arrays to the individual lists)
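One possible approach, sketched here under the assumption that a 1-D object array holding the lists is what is wanted, is to allocate the object array first and fill it element by element, so NumPy never tries to merge the equal-length inner arrays into extra dimensions. The name out is illustrative.

```python
import numpy as np

bar = [[np.array([4.0, 0.0, 0.1])] for _ in range(5)]

# Allocate a 1-D container of Python objects, then store each list as-is
out = np.empty(len(bar), dtype=object)
for i, item in enumerate(bar):
    out[i] = item

# Each element is still an ordinary list, so appending works
out[0].append(np.array([5.0, 0.0, 0.1]))

print(out.shape)                  # (5,)
print(len(out[0]), len(out[1]))   # 2 1
```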
I recently posted a question here which was answered exactly as I asked. However, I think I overestimated my ability to manipulate the answer further. I read the broadcasting doc, and followed a few links that led me way back to 2002 about numpy broadcasting.
I've used the second method of array creation using broadcasting:
N = 10
out = np.zeros((N**3,4),dtype=int)
out[:,:3] = (np.arange(N**3)[:,None]/[N**2,N,1])%N
which outputs:
[[0,0,0,0]
[0,0,1,0]
...
[0,1,0,0]
[0,1,1,0]
...
[9,9,8,0]
[9,9,9,0]]
but I do not understand from the docs how to manipulate that. Ideally, I would like to be able to set the increment by which each individual column changes.
For example, column A changes by 0.5 up to 2, column B changes by 0.2 up to 1, and column C changes by 1 up to 10.
[[0,0,0,0]
[0,0,1,0]
...
[0,0,9,0]
[0,0.2,0,0]
...
[0,0.8,9,0]
[0.5,0,0,0]
...
[1.5,0.8,9,0]]
Thanks for any help.
You can adjust your current code just a little bit to make it work.
>>> out = np.zeros((4*5*10,4))
>>> out[:,:3] = (np.arange(4*5*10)[:,None]//(5*10, 10, 1)*(0.5, 0.2, 1)%(2, 1, 10))
>>> out
array([[ 0. , 0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. ],
[ 0. , 0. , 2. , 0. ],
...
[ 0. , 0. , 8. , 0. ],
[ 0. , 0. , 9. , 0. ],
[ 0. , 0.2, 0. , 0. ],
...
[ 0. , 0.8, 9. , 0. ],
[ 0.5, 0. , 0. , 0. ],
...
[ 1.5, 0.8, 9. , 0. ]])
The changes are:
No int dtype on the array, since we need it to hold floats in some columns. You could specify a float dtype if you want (or even something more complicated that only allows floats in the first two columns).
Rather than N**3 total values, figure out the number of distinct values for each column, and multiply them together to get our total size. This is used for both zeros and arange.
Use the floor division // operator in the first broadcast operation because we want integers at this point, but later we'll want floats.
The values to divide by are again based on the number of values for the later columns (e.g. for A,B,C numbers of values, divide by B*C, C, 1).
Add a new broadcast operation to multiply by various scale factors (how much each value increases at once).
Change the values in the broadcast mod % operation to match the bounds on each column.
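The changes listed above can also be sketched in a slightly different order: do the floor division and the mod on integer counters first, and apply the float step sizes last. This is a variation rather than the answer's exact expression; it assumes the same step sizes and value counts, and the names steps, counts and divisors are illustrative.

```python
import numpy as np

steps = np.array([0.5, 0.2, 1.0])  # increment for columns A, B, C
counts = np.array([4, 5, 10])      # number of distinct values per column

total = counts.prod()              # 4 * 5 * 10 = 200 rows
# Divisor for each column is the product of the counts of the later columns
divisors = np.append(np.cumprod(counts[::-1])[::-1][1:], 1)  # [50, 10, 1]

# Integer "odometer" digits for every row, then scale to the real values
digits = (np.arange(total)[:, None] // divisors) % counts
out = np.zeros((total, 4))
out[:, :3] = digits * steps

print(out[10])   # first row where column B has advanced to 0.2
print(out[-1])   # last row, approximately [1.5, 0.8, 9, 0]
```

Keeping the mod in integer arithmetic avoids relying on floating-point remainders such as (k * 0.2) % 1 landing exactly on the expected value.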
This small example helps me understand what is going on:
In [123]: N=2
In [124]: np.arange(N**3)[:,None]/[N**2, N, 1]
Out[124]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 2. ],
[ 0.75, 1.5 , 3. ],
[ 1. , 2. , 4. ],
[ 1.25, 2.5 , 5. ],
[ 1.5 , 3. , 6. ],
[ 1.75, 3.5 , 7. ]])
So we generate a range of numbers (0 to 7) and divide them by 4,2, and 1.
The rest of the calculation just changes each value, without further broadcasting.
Apply %N to each element:
In [126]: np.arange(N**3)[:,None]/[N**2, N, 1]%N
Out[126]:
array([[ 0. , 0. , 0. ],
[ 0.25, 0.5 , 1. ],
[ 0.5 , 1. , 0. ],
[ 0.75, 1.5 , 1. ],
[ 1. , 0. , 0. ],
[ 1.25, 0.5 , 1. ],
[ 1.5 , 1. , 0. ],
[ 1.75, 1.5 , 1. ]])
Assigning to an int array is the same as converting the floats to integers:
In [127]: (np.arange(N**3)[:,None]/[N**2, N, 1]%N).astype(int)
Out[127]:
array([[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]])
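As a quick check of the walkthrough above, this sketch compares the float-then-truncate route with an all-integer route using //, and with np.indices, which builds the same counter table directly. The variable names are illustrative.

```python
import numpy as np

N = 2
divisors = [N**2, N, 1]
counter = np.arange(N**3)[:, None]   # shape (8, 1)

via_float = (counter / divisors % N).astype(int)  # route shown above
via_floor = counter // divisors % N               # integers throughout
# np.indices gives one index grid per axis; flatten and transpose to rows
via_indices = np.indices((N, N, N)).reshape(3, -1).T

print(np.array_equal(via_float, via_floor))    # True
print(np.array_equal(via_floor, via_indices))  # True
```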
Given a 2D NumPy array, I would like to build a diagonal matrix from each row as fast as possible. Right now I'm using a list comprehension, but I'm wondering whether it can be vectorised somehow.
For example using the following M array:
M = np.random.rand(5, 3)
[[ 0.25891593 0.07299478 0.36586996]
[ 0.30851087 0.37131459 0.16274825]
[ 0.71061831 0.67718718 0.09562581]
[ 0.71588836 0.76772047 0.15476079]
[ 0.92985142 0.22263399 0.88027331]]
I would like to compute the following array:
np.array([np.diag(row) for row in M])
array([[[ 0.25891593, 0. , 0. ],
[ 0. , 0.07299478, 0. ],
[ 0. , 0. , 0.36586996]],
[[ 0.30851087, 0. , 0. ],
[ 0. , 0.37131459, 0. ],
[ 0. , 0. , 0.16274825]],
[[ 0.71061831, 0. , 0. ],
[ 0. , 0.67718718, 0. ],
[ 0. , 0. , 0.09562581]],
[[ 0.71588836, 0. , 0. ],
[ 0. , 0.76772047, 0. ],
[ 0. , 0. , 0.15476079]],
[[ 0.92985142, 0. , 0. ],
[ 0. , 0.22263399, 0. ],
[ 0. , 0. , 0.88027331]]])
Here's one way using element-wise multiplication of np.eye(3) (the 3x3 identity array) and a slightly re-shaped M:
>>> M = np.random.rand(5, 3)
>>> np.eye(3) * M[:,np.newaxis,:]
array([[[ 0.42527357, 0. , 0. ],
[ 0. , 0.17557419, 0. ],
[ 0. , 0. , 0.61920924]],
[[ 0.04991268, 0. , 0. ],
[ 0. , 0.74000307, 0. ],
[ 0. , 0. , 0.34541354]],
[[ 0.71464307, 0. , 0. ],
[ 0. , 0.11878955, 0. ],
[ 0. , 0. , 0.65411844]],
[[ 0.01699954, 0. , 0. ],
[ 0. , 0.39927673, 0. ],
[ 0. , 0. , 0.14378892]],
[[ 0.5209439 , 0. , 0. ],
[ 0. , 0.34520876, 0. ],
[ 0. , 0. , 0.53862677]]])
(By "re-shaped M" I mean that the rows of M are made to face out along the z-axis rather than across the y-axis, giving M the shape (5, 1, 3).)
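To make the shapes in that explanation concrete, here is a small sketch that prints them and checks the broadcast result against the list comprehension from the question. It uses a fixed array instead of random data so the run is reproducible; the names expanded, diags and reference are illustrative.

```python
import numpy as np

M = np.arange(15, dtype=float).reshape(5, 3)

expanded = M[:, np.newaxis, :]   # shape (5, 1, 3)
# (3, 3) * (5, 1, 3) broadcasts to (5, 3, 3); the identity zeroes
# everything off the diagonal of each 3x3 block
diags = np.eye(3) * expanded

reference = np.array([np.diag(row) for row in M])

print(expanded.shape, diags.shape)        # (5, 1, 3) (5, 3, 3)
print(np.array_equal(diags, reference))   # True
```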
Despite the good answer of @ajcr, a much faster alternative can be achieved with fancy indexing (tested in NumPy 1.9.0):
import numpy as np
def sol0(M):
    return np.eye(M.shape[1]) * M[:,np.newaxis,:]
def sol1(M):
    b = np.zeros((M.shape[0], M.shape[1], M.shape[1]))
    diag = np.arange(M.shape[1])
    b[:, diag, diag] = M
    return b
where the timing shows this is approximately 4X faster:
M = np.random.random((1000, 3))
%timeit sol0(M)
#10000 loops, best of 3: 111 µs per loop
%timeit sol1(M)
#10000 loops, best of 3: 23.8 µs per loop
I am trying to save a large numpy array and reload it. Using numpy.save and numpy.load, the array values get corrupted. The shape and data type of the array are the same before saving and after loading, but the loaded array has the vast majority of its values zeroed.
The array is (22195, 22195), the values are float64, it takes 3.94 GB as a .npy file, and the data entries average about 0.1 (not tiny floats that might reasonably get converted to zeroes). I am using numpy 1.5.1.
Any help on why this corruption is occurring would be greatly appreciated because I am at a loss. Below is some code providing evidence of the claims above.
In [7]: m
Out[7]:
array([[ 0. , 0.02023, 0.00703, ..., 0.02362, 0.02939, 0.03656],
[ 0.02023, 0. , 0.0135 , ..., 0.04357, 0.04934, 0.05651],
[ 0.00703, 0.0135 , 0. , ..., 0.03037, 0.03614, 0.04331],
...,
[ 0.02362, 0.04357, 0.03037, ..., 0. , 0.01797, 0.02514],
[ 0.02939, 0.04934, 0.03614, ..., 0.01797, 0. , 0.01919],
[ 0.03656, 0.05651, 0.04331, ..., 0.02514, 0.01919, 0. ]])
In [8]: m.shape
Out[8]: (22195, 22195)
In [12]: save('/Users/will/Desktop/m.npy',m)
In [14]: lm = load('/Users/will/Desktop/m.npy')
In [15]: lm
Out[15]:
array([[ 0. , 0.02023, 0.00703, ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ]])
In [17]: type(lm[0][0])
Out[17]: numpy.float64
In [18]: type(m[0][0])
Out[18]: numpy.float64
In [19]: lm.shape
Out[19]: (22195, 22195)
This is a known issue (note that the linked report was filed against numpy 1.4). If you really can't upgrade, my advice would be to try saving in a different way (savez, savetxt). If getbuffer is available, you can try to write the bytes directly. If all else fails (and you can't upgrade), you can write your own save function pretty easily.
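As a rough sketch of what such a hand-written save function might look like, the following writes the raw bytes in row chunks with ndarray.tofile and reads them back with np.fromfile. The helper names save_raw and load_raw and the chunk size are hypothetical, the shape and dtype are not stored in the file and must be supplied when loading, and whether this sidesteps the bug on numpy 1.5.1 is an untested assumption.

```python
import os
import tempfile
import numpy as np

def save_raw(path, arr, rows_per_chunk=1024):
    """Write arr's bytes in C order, a block of rows at a time."""
    with open(path, "wb") as f:
        for start in range(0, arr.shape[0], rows_per_chunk):
            block = np.ascontiguousarray(arr[start:start + rows_per_chunk])
            block.tofile(f)

def load_raw(path, shape, dtype):
    """Read the bytes back; shape and dtype must be known by the caller."""
    return np.fromfile(path, dtype=dtype).reshape(shape)

# Small demonstration array; the real one would be (22195, 22195)
m = np.random.rand(50, 50)
path = os.path.join(tempfile.mkdtemp(), "m.raw")
save_raw(path, m, rows_per_chunk=16)
lm = load_raw(path, m.shape, m.dtype)

print(np.array_equal(m, lm))  # True
```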