numpy save/load corrupting an array - python

I am trying to save a large numpy array and reload it. With numpy.save and numpy.load, the array values come back corrupted. The shape and data type of the array are the same before saving and after loading, but the loaded array has the vast majority of its values zeroed.
The array is (22195, 22195), the values are float64, it takes 3.94 GB as a .npy file, and the entries average about 0.1 (not tiny floats that might reasonably get converted to zeroes). I am using numpy 1.5.1.
Any help on why this corruption is occurring would be greatly appreciated because I am at a loss. Below is some code providing evidence of the claims above.
In [7]: m
Out[7]:
array([[ 0. , 0.02023, 0.00703, ..., 0.02362, 0.02939, 0.03656],
[ 0.02023, 0. , 0.0135 , ..., 0.04357, 0.04934, 0.05651],
[ 0.00703, 0.0135 , 0. , ..., 0.03037, 0.03614, 0.04331],
...,
[ 0.02362, 0.04357, 0.03037, ..., 0. , 0.01797, 0.02514],
[ 0.02939, 0.04934, 0.03614, ..., 0.01797, 0. , 0.01919],
[ 0.03656, 0.05651, 0.04331, ..., 0.02514, 0.01919, 0. ]])
In [8]: m.shape
Out[8]: (22195, 22195)
In [12]: save('/Users/will/Desktop/m.npy',m)
In [14]: lm = load('/Users/will/Desktop/m.npy')
In [15]: lm
Out[15]:
array([[ 0. , 0.02023, 0.00703, ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
...,
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ],
[ 0. , 0. , 0. , ..., 0. , 0. , 0. ]])
In [17]: type(lm[0][0])
Out[17]: numpy.float64
In [18]: type(m[0][0])
Out[18]: numpy.float64
In [19]: lm.shape
Out[19]: (22195, 22195)

This is a known issue (note that the linked report is against numpy 1.4). If you really can't upgrade, my advice would be to save in a different way (savez, savetxt). If getbuffer is available, you can try to write the bytes directly. If all else fails (and you can't upgrade), you can write your own save function fairly easily.
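As one possible shape for such a hand-written save function, here is a minimal sketch that writes the raw bytes in blocks of rows and keeps the shape and dtype in a JSON sidecar file. The names save_chunked and load_chunked, the sidecar convention, and the chunk size are all hypothetical, and the code assumes a recent NumPy (tobytes was called tostring in old releases). Whether chunked writes sidestep the bug on an affected build is an assumption, not something verified here.

```python
import json
import os
import tempfile

import numpy as np


def save_chunked(path, arr, rows_per_chunk=1024):
    """Write arr (at least 1-D) as raw bytes, a block of rows at a time."""
    arr = np.ascontiguousarray(arr)
    # Shape and dtype go into a small sidecar file next to the data.
    with open(path + ".json", "w") as fh:
        json.dump({"shape": list(arr.shape), "dtype": arr.dtype.str}, fh)
    with open(path, "wb") as fh:
        for start in range(0, arr.shape[0], rows_per_chunk):
            fh.write(arr[start:start + rows_per_chunk].tobytes())


def load_chunked(path):
    """Read back an array written by save_chunked."""
    with open(path + ".json") as fh:
        meta = json.load(fh)
    data = np.fromfile(path, dtype=np.dtype(meta["dtype"]))
    return data.reshape(tuple(meta["shape"]))


# Round trip on a small array; always compare after reloading.
m = np.random.rand(50, 7)
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "m.raw")
    save_chunked(target, m, rows_per_chunk=8)
    restored = load_chunked(target)
print(np.array_equal(m, restored))
```

Whichever route you take, comparing the reloaded array against the original (as in the last lines) is a cheap way to confirm the data survived the round trip.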

Related

Printing locations containing non-zero elements in Python

The following code prints the row numbers (solution1) that have at least one non-zero element. For those rows, how do I also print the locations of the non-zero elements (solution2), as shown in the expected output? For instance, row 1 has non-zero elements at locations [1,3,4,6] and row 2 has non-zero elements at locations [0,2,3,5].
import numpy as np
A=np.array([[ 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. ],
[ 0. , 423.81345923, 0. , 407.01354328,
419.14952534, 0. , 212.13245959, 0. ,
0. , 0. , 0. , 0. ],
[402.93473651, 0. , 216.08166277, 407.01354328,
0. , 414.17017965, 0. , 0. ,
0. , 0. , 0. , 0. ]])
solution1 = []
for idx, e in enumerate(A):
    if any(e):
        solution1.append(idx)
print("solution 1 =", solution1)
The current output is
solution 1 = [1,2]
The expected output is
solution 1 = [1,2]
solution 2 = [[1,3,4,6],[0,2,3,5]]
Use np.where to find the coordinates of all non-zero values first, and then split the column indices by row:
idx, idy = np.where(A)
np.split(idy, np.flatnonzero(np.diff(idx) != 0) + 1)
# [array([1, 3, 4, 6], dtype=int32), array([0, 2, 3, 5], dtype=int32)]
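If plain Python lists matching the expected output are preferred, a per-row variant is sketched below. It swaps in np.flatnonzero on each row instead of the split shown above, and the small array here is a made-up stand-in for A, not the data from the question.

```python
import numpy as np

# Hypothetical stand-in for A: row 0 is all zeros, rows 1 and 2 are not.
A = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 1.5, 0.0, 2.0],
              [3.0, 0.0, 4.0, 0.0]])

# Rows that contain at least one non-zero element.
rows = np.flatnonzero(A.any(axis=1))
solution1 = rows.tolist()

# Column positions of the non-zero elements, one list per such row.
solution2 = [np.flatnonzero(A[r]).tolist() for r in rows]

print("solution 1 =", solution1)
print("solution 2 =", solution2)
```

This loops over the non-empty rows in Python, so for very large arrays the np.where plus np.split version is likely the better fit.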

Iterate over rows, and perform addition

I have a numpy array, array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]]). My plan is to add a vector (say [1,2,3]) to every row of this array and then append the result to the end of it, i.e. add another three rows. I want to repeat this process, say 5 times, so that each time the vector is added only to the last three rows, which were the result of the previous addition. Any suggestions?
Just use np.append along the first axis:
import numpy as np
a = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
v = np.array([1, 2, 3])
new_a = np.append(a, a+v, axis=0)
For the addition part, just write something like a[0]+[1,2,3] (where a is your array); numpy will perform the addition element-wise as expected.
For appending, a=np.append(a, [line], axis=0) is what you're looking for, where line is the new row you want to add, for example the result of the previous sum.
The iteration can easily be repeated by selecting the last three rows with negative indexing: if you use a[-1], a[-2] and a[-3] (or the slice a[-3:]), you'll be sure to pick the last three rows.
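Putting the pieces together, the repeated add-and-append can be sketched as a short loop. Collecting the blocks in a list and stacking once at the end is a choice made here for the sketch (it avoids re-copying the array on every np.append); the names blocks and result are illustrative.

```python
import numpy as np

a = np.array([[-1.228, 0.709, 0.0],
              [0.0, 2.836, 0.0],
              [1.228, 0.709, 0.0]])
v = np.array([1, 2, 3])
N = 5

# Each new block is the previous block (the "last three rows") plus v.
blocks = [a]
for _ in range(N):
    blocks.append(blocks[-1] + v)

# Stack everything once: the original 3 rows followed by N new blocks.
result = np.vstack(blocks)
print(result.shape)
```

After the loop, result[-3:] holds the original rows with v added N times.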
If you really need to keep results within a single array, a better option is to create it at the beginning and perform operations you need on it.
arr = np.array([[-1.228, 0.709, 0. ], [ 0. , 2.836, 0. ], [ 1.228, 0.709, 0. ]])
vector = np.array([1,2,3])
N = 5
multiarr = np.tile(arr, (1,N))
>>> multiarr
array([[-1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. , -1.228, 0.709, 0. ],
[ 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. , 0. , 2.836, 0. ],
[ 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. , 1.228, 0.709, 0. ]])
multivector = (vector * np.arange(N)[:, None]).ravel()
>>> multivector
array([ 0, 0, 0, 1, 2, 3, 2, 4, 6, 3, 6, 9, 4, 8, 12])
>>> multiarr + multivector
array([[-1.228, 0.709, 0. , -0.228, 2.709, 3. , 0.772, 4.709, 6. , 1.772, 6.709, 9. , 2.772, 8.709, 12. ],
[ 0. , 2.836, 0. , 1. , 4.836, 3. , 2. , 6.836, 6. , 3. , 8.836, 9. , 4. , 10.836, 12. ],
[ 1.228, 0.709, 0. , 2.228, 2.709, 3. , 3.228, 4.709, 6. , 4.228, 6.709, 9. , 5.228, 8.709, 12. ]])

How to find minimum value in each row while keeping array dimensions same using numpy?

I've the following array:
np.array([[0.07704314, 0.46752589, 0.39533099, 0.35752864],
[0.45813299, 0.02914078, 0.65307364, 0.58732429],
[0.32757561, 0.32946822, 0.59821108, 0.45585825],
[0.49054429, 0.68553148, 0.26657932, 0.38495586]])
I want to find the minimum value in each row of the array. How can I achieve this?
Expected answer:
[[0.07704314 0. 0. 0. ]
[0. 0.02914078 0. 0. ]
[0.32757561 0 0. 0. ]
[0. 0. 0.26657932 0. ]]
You can use np.where like so:
np.where(a.argmin(1)[:,None]==np.arange(a.shape[1]), a, 0)
Or (more lines but potentially more efficient):
out = np.zeros_like(a)
idx = a.argmin(1)[:, None]
np.put_along_axis(out, idx, np.take_along_axis(a, idx, 1), 1)
IIUC: first find the minimum value of each row, then build a boolean mask that is True wherever the original array equals its row minimum, and multiply the array by that mask to get the result:
np.multiply(a,a==np.min(a,1)[:,None])
Out[225]:
array([[0.07704314, 0. , 0. , 0. ],
[0. , 0.02914078, 0. , 0. ],
[0.32757561, 0. , 0. , 0. ],
[0. , 0. , 0.26657932, 0. ]])
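One difference between the argmin-based and the mask-based answers shows up when a row contains its minimum more than once: argmin keeps only the first occurrence, while the equality mask keeps every tied position. A small sketch with a made-up array (not the one from the question):

```python
import numpy as np

# Hypothetical array: the first row has a tied minimum (1.0 appears twice).
a = np.array([[1.0, 1.0, 2.0],
              [3.0, 0.5, 4.0]])

# argmin picks the first minimum in each row only.
by_argmin = np.where(a.argmin(1)[:, None] == np.arange(a.shape[1]), a, 0)

# The equality mask keeps every element equal to the row minimum.
by_mask = a * (a == a.min(axis=1, keepdims=True))

print(by_argmin)
print(by_mask)
```

Which behaviour is wanted depends on how ties should be treated; for the array in the question the two give the same result because each row has a unique minimum.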
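np.amin(a, axis=1), where a is your numpy array, returns the minimum of each row (as a 1-D array, without the zero-filled layout shown in the expected answer).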

How to construct *array of lists of arrays* when the nested arrays have the same length?

I want to construct an array of lists of arrays, but my method breaks down if the arrays inside the lists have equal dimensions.
I make a list of lists of arrays and then convert that list into an array:
import numpy as np
foo = [[np.array([4. , 0. , 0.1]), np.array([5. , 0. , 0.1])],
[np.array([6. , 0. , 0.5])],
[],
[],
[]]
foo = np.array(foo)
foo
which results in an array of list of arrays:
Out[40]:
array([list([array([4. , 0. , 0.1]), array([5. , 0. , 0.1])]),
list([array([6. , 0. , 0.5])]), list([]), list([]), list([])],
dtype=object)
perfect.
Now consider the case where array dimensions are equal:
bar = [[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])],
[np.array([4. , 0. , 0.1])]]
bar = np.array(bar)
print(bar)
The same method results in merely nested arrays.
Out[1]:
array([[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]],
[[4. , 0. , 0.1]]])
Is there a way to make bar an array of list of arrays?
(I would like to have this format because I am appending more arrays to the individual lists)
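One way this could be approached, as a sketch rather than a definitive answer: create an empty 1-D array with dtype=object first and fill it element by element, so that NumPy never gets the chance to merge the equal-length inner arrays into extra dimensions. The name out is illustrative. (Recent NumPy releases also require dtype=object explicitly for the ragged foo case above.)

```python
import numpy as np

# Five separate one-element lists, each holding an equal-length array.
bar = [[np.array([4.0, 0.0, 0.1])] for _ in range(5)]

# Pre-allocate a 1-D object array, then store each list as a single element.
out = np.empty(len(bar), dtype=object)
for i, item in enumerate(bar):
    out[i] = item

# Each element is still an ordinary Python list, so appending works.
out[0].append(np.array([5.0, 0.0, 0.1]))

print(out.shape)
print(len(out[0]), len(out[1]))
```

Because the elements are plain lists, the array only stores references to them; appending to out[0] changes that list in place and leaves the others untouched.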

Numpy: Fastest way of computing diagonal for each row of a 2d array

Given a 2d Numpy array, I would like to build the diagonal matrix for each row as fast as possible. I'm currently using a list comprehension, but I'm wondering whether it can be vectorised somehow.
For example using the following M array:
M = np.random.rand(5, 3)
[[ 0.25891593 0.07299478 0.36586996]
[ 0.30851087 0.37131459 0.16274825]
[ 0.71061831 0.67718718 0.09562581]
[ 0.71588836 0.76772047 0.15476079]
[ 0.92985142 0.22263399 0.88027331]]
I would like to compute the following array:
np.array([np.diag(row) for row in M])
array([[[ 0.25891593, 0. , 0. ],
[ 0. , 0.07299478, 0. ],
[ 0. , 0. , 0.36586996]],
[[ 0.30851087, 0. , 0. ],
[ 0. , 0.37131459, 0. ],
[ 0. , 0. , 0.16274825]],
[[ 0.71061831, 0. , 0. ],
[ 0. , 0.67718718, 0. ],
[ 0. , 0. , 0.09562581]],
[[ 0.71588836, 0. , 0. ],
[ 0. , 0.76772047, 0. ],
[ 0. , 0. , 0.15476079]],
[[ 0.92985142, 0. , 0. ],
[ 0. , 0.22263399, 0. ],
[ 0. , 0. , 0.88027331]]])
Here's one way using element-wise multiplication of np.eye(3) (the 3x3 identity array) and a slightly re-shaped M:
>>> M = np.random.rand(5, 3)
>>> np.eye(3) * M[:,np.newaxis,:]
array([[[ 0.42527357, 0. , 0. ],
[ 0. , 0.17557419, 0. ],
[ 0. , 0. , 0.61920924]],
[[ 0.04991268, 0. , 0. ],
[ 0. , 0.74000307, 0. ],
[ 0. , 0. , 0.34541354]],
[[ 0.71464307, 0. , 0. ],
[ 0. , 0.11878955, 0. ],
[ 0. , 0. , 0.65411844]],
[[ 0.01699954, 0. , 0. ],
[ 0. , 0.39927673, 0. ],
[ 0. , 0. , 0.14378892]],
[[ 0.5209439 , 0. , 0. ],
[ 0. , 0.34520876, 0. ],
[ 0. , 0. , 0.53862677]]])
(By "re-shaped M" I mean that the rows of M are made to face out along the z-axis rather than across the y-axis, giving M the shape (5, 1, 3).)
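To make the shapes involved concrete, here is a small sketch that prints them and checks the broadcast product against the list comprehension from the question. It uses a deterministic M built with np.arange instead of random values so the result is easy to inspect; the names reshaped, stacked and expected are illustrative.

```python
import numpy as np

# Deterministic stand-in for M so the output is reproducible.
M = np.arange(15, dtype=float).reshape(5, 3)

reshaped = M[:, np.newaxis, :]          # shape (5, 1, 3)
stacked = np.eye(3) * reshaped          # (3, 3) broadcast against (5, 1, 3)

# Reference result from the list comprehension in the question.
expected = np.array([np.diag(row) for row in M])

print(reshaped.shape, stacked.shape)
print(np.array_equal(stacked, expected))
```

Broadcasting repeats each row of M down the middle axis, and multiplying by the identity zeroes everything off the diagonal.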
Despite the good answer of @ajcr, a much faster alternative can be achieved with fancy indexing (tested in NumPy 1.9.0):
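import numpy as np

def sol0(M):
    # Broadcast the identity matrix against each row of M.
    return np.eye(M.shape[1]) * M[:, np.newaxis, :]

def sol1(M):
    # Write each row of M straight onto the diagonals of a zero array.
    b = np.zeros((M.shape[0], M.shape[1], M.shape[1]))
    diag = np.arange(M.shape[1])
    b[:, diag, diag] = M
    return b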
where the timing shows this is approximately 4X faster:
M = np.random.random((1000, 3))
%timeit sol0(M)
#10000 loops, best of 3: 111 µs per loop
%timeit sol1(M)
#10000 loops, best of 3: 23.8 µs per loop
