how to rearrange value order in numpy? - python

I have a folder with many images (ordered by their creation time) that I can read into numpy float32 arrays. I want to write these arrays to the filesystem in a single file, in two different formats that a C program (which I cannot modify) will access.
The first format is easy:
The values of each array are written left to right, top to bottom, and the arrays follow one another in the file. I can do that trivially with np.tofile.
The second format is more complicated:
For every pixel coordinate (x, y) I want to write the corresponding pixels of all images one after another into the file. I tried to stack the arrays and then transpose the result. But when I write that to the filesystem using np.tofile, the file contains the same arrangement of data as with the first format.
How can I tell numpy to rearrange the data?

For the second format, you could use np.column_stack followed by ravel:
In [8]: img1 = np.arange(5, dtype='float32')
In [9]: img2 = np.arange(5, dtype='float32')
In [10]: np.column_stack((img1,img2)).ravel()
Out[10]: array([ 0., 0., 1., 1., 2., 2., 3., 3., 4., 4.], dtype=float32)
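The same idea can be sketched for 2D images. This is only an illustration, not part of the original answer: the image sizes, the `images` list and the temporary output path are assumptions. Stacking along a new last axis puts the image index in the fastest-varying position, so writing the result in C order places the values of one pixel from all images next to each other.

```python
import os
import tempfile

import numpy as np

# Two small stand-in "images" (hypothetical data, 2 rows x 3 columns each).
img1 = np.arange(6, dtype="float32").reshape(2, 3)
img2 = img1 + 100
images = [img1, img2]

# Shape (rows, cols, n_images): the image index varies fastest in C order.
interleaved = np.stack(images, axis=-1)

# Write to a temporary file (assumed location) and read it back to inspect the order.
path = os.path.join(tempfile.mkdtemp(), "interleaved.bin")
interleaved.tofile(path)
read_back = np.fromfile(path, dtype="float32")

# The file starts with pixel (0, 0) of img1 and img2, then pixel (0, 1) of both, and so on.
print(read_back)
```

If the C program expects x to vary slowest instead of y, the pixel axes would have to be swapped before writing; which order is needed depends on that program.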

Related

Operation between ndarray and heterogeneous ndarray

I've been trying to come up with a way to add these two ndarrays, one of which has a different number of elements in each row:
a = np.array([np.array([0, 1]), np.array([4, 5, 6])], dtype=object)  # recent NumPy versions require dtype=object for ragged input
z = np.zeros((2,3))
Expected output:
array([[0., 1., 0.],
[4., 5., 6.]])
Can anyone think of a way to do this using numpy?
I don't think there is a 'numpy-fast' solution for this. I think you will need to loop over a with a for loop and add every line individually.
for i in range(len(a)):
    z[i,:len(a[i])] = z[i,:len(a[i])] + a[i]
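A self-contained version of that loop might look like the sketch below. It keeps the ragged rows in a plain Python list (an assumption made here to avoid building a ragged ndarray) and sizes z as (2, 3) so that it matches the expected output shown in the question.

```python
import numpy as np

# Ragged rows kept in a plain list; each row can have a different length.
rows = [np.array([0, 1]), np.array([4, 5, 6])]

# Target array, sized to match the expected output in the question.
z = np.zeros((2, 3))

# Add each row into the leading columns of the matching row of z.
for i, row in enumerate(rows):
    z[i, :len(row)] += row

print(z)
```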

Create 2D matrices from several csv files

I'm working with Python 3 and I would like to load data from several CSV files.
Each CSV (one measurement) has 3 columns (3 different physical quantities). I want to load each quantity into its own variable. For one CSV file this is quite simple; I used:
TIME,CH1,CH2 = loadtxt(file_path,usecols=(3,4,5),delimiter=',',skiprows=2,unpack=True)
and it worked fine. Now I would like to extend this procedure so I can load several CSV files. Each array would be 2D, each column representing one CSV file. Instead of having several CSV with three variables, I will have 3 2D arrays, which is much more convenient for data analysis.
I thought I could try something like this :
TIME = matrix(zeros((20480,len(file_path)))) # 20480 length of each column
CH1 = matrix(zeros((20480,len(file_path)))) # len(file_path) number of CSV files
CH2 = matrix(zeros((20480,len(file_path))))
for k in range(0,len(file_path)): # reading each CSV file
    TIME[:,k],CH1[:,k],CH2[:,k] = loadtxt(file_path[k],usecols=(3,4,5),delimiter=',',skiprows=2,unpack=True)
But it's telling me :
ValueError: could not broadcast input array from shape (20480) into shape (20480,1)
In the end I would like variables looking like this :
TIME = matrix([[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.],
...,
[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]])
Each column is from one different CSV file.
I think this is quite a common problem, but I don't really get how arrays work in Python. I got this idea from Matlab, where it is quite straightforward, but here I don't know why indexing arrays with TIME[:][:] doesn't work.
Do you have any idea how I could do this?
Thanks.
Use np.array, not np.matrix
I can't emphasize this enough. np.matrix exists only for legacy reasons. See this answer for an explanation of the difference. An np.matrix is always 2-dimensional, so indexing one of its columns gives a result of shape (m, 1), while indexing a column of an np.array gives a 1-dimensional result of shape (m,). This seems to be the source of your error.
Here's a minimal example exhibiting the behaviour you are seeing:
A = np.array([[1, 2, 3], [4, 5, 6]])
B = np.matrix(A)
print(A[:, 0].shape) # (2,)
print(B[:, 0].shape) # (2, 1)
Therefore, define your resultant arrays as np.array objects:
m = 20480
n = len(file_path)
shape = (m, n)
TIME = np.zeros(shape)
CH1 = np.zeros(shape)
CH2 = np.zeros(shape)
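With plain arrays, the loop from the question should then work unchanged. The sketch below illustrates this with tiny in-memory stand-ins for the CSV files; the helper `make_csv`, the 4-row length and the values are all hypothetical and only serve to make the example runnable.

```python
import io

import numpy as np


def make_csv(offset):
    """Build a small in-memory CSV (hypothetical data): 2 header lines, 6 columns."""
    lines = ["header line 1", "header line 2"]
    for i in range(4):
        lines.append(f"0,0,0,{i},{i + offset},{i + 2 * offset}")
    return io.StringIO("\n".join(lines))


# Stand-ins for the real file paths; np.loadtxt also accepts file-like objects.
file_path = [make_csv(10), make_csv(20), make_csv(30)]

m = 4                # rows per file (20480 in the question)
n = len(file_path)   # number of CSV files
TIME = np.zeros((m, n))
CH1 = np.zeros((m, n))
CH2 = np.zeros((m, n))

for k in range(n):
    # Each unpacked column has shape (m,), which fits a column slice of a 2-D array.
    TIME[:, k], CH1[:, k], CH2[:, k] = np.loadtxt(
        file_path[k], usecols=(3, 4, 5), delimiter=",", skiprows=2, unpack=True
    )

print(CH1)
```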

Numpy: signed values of element-wise absolute maximum of a 2D array

Let us assume that I have a 2D array named arr of shape (4, 3) as follows:
>>> arr
array([[ nan, 1., -18.],
[ -1., -1., -1.],
[ 1., 1., 5.],
[ 1., -1., 0.]])
Say I would like to take the signed value of the element-wise absolute maximum of (1.0, 1.0, -15.0) and the rows arr[[0, 2], :], and assign it back to arr. That is, I am looking for the output:
>>> arr
array([[ 1., 1., -18.],
[ -1., -1., -1.],
[ 1., 1., -15.],
[ 1., -1., 0.]])
The closest thing I found in the API reference for this is numpy.fmax but it doesn't do the absolute value. If I used:
arr[index_list, :] = np.fmax(arr[index_list, :], new_tuple)
my array would finally look like:
>>> arr
array([[ 1., 1., -15.],
[ -1., -1., -1.],
[ 1., 1., 5.],
[ 1., -1., 0.]])
Now, the API says that this function is
equivalent to np.where(x1 >= x2, x1, x2) when neither x1 nor x2 are NaNs, but it is faster and does proper broadcasting
I tried using the following:
arr[index_list, :] = np.where(np.absolute(arr[index_list, :]) >= np.absolute(new_tuple),
arr[index_list, :], new_tuple)
Although this produced the desired output, I got the warning:
/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevconsole.py:1: RuntimeWarning: invalid value encountered in greater_equal
I believe this warning is because of the NaN which is not handled gracefully here, unlike the np.fmax function. In addition, the API docs mention that np.fmax is faster and does broadcasting correctly (not sure what part of broadcasting is missing in the np.where version)
In conclusion, what I am looking for is something similar to:
arr[index_list, :] = np.fmax(arr[index_list, :], new_tuple, key=abs)
There is no such key attribute available to this function, unfortunately.
Just for context, I am interested in the fastest possible solution because arr typically has a shape of about (100000, 50) and I am looping through almost 1000 new_tuple tuples (each with as many elements as arr has columns, of course). The index_list changes for each new_tuple.
Edit 1:
One possible solution is, to begin with replacing all NaN in arr with 0. i.e. arr[np.isnan(arr)] = 0. After this, I can use the np.where with np.absolute trick mentioned in my original text. However, this is probably a lot slower than np.fmax, as suggested by the API.
Edit 2:
The index_list may have repeated indexes in subsequent loops. Every new_tuple comes with a corresponding rule and the index_list is selected based on that rule. There is nothing stopping different rules from matching overlapping indexes. @Divakar has an excellent answer for the case where index_list has no repeats. Other solutions covering both cases are welcome, however.
Assuming that list of all index_list has no repeated indexes:
Approach #1
I would propose a more vectorized solution once we have all of the index_lists and new_tuples stored in one place, preferably as lists. This could be the preferred approach if we are dealing with lots of such tuples and lists.
So, let's say we have them stored as the following :
new_tuples = [(1.0, 1.0, -15.0), (6.0, 3.0, -4.0)] # list of all new_tuple
index_lists =[[0,2],[4,1,6]] # list of all index_list
The solution is then to repeat each new_tuple manually, once per index in its index_list (replacing the broadcasting), and then use np.where as shown in the question. The warning that np.where raised there can be ignored as long as the new_tuples contain no NaN values. Thus, the solution would be:
idx = np.concatenate(index_lists)
lens = list(map(len,index_lists))
a = arr[idx]
b = np.repeat(new_tuples,lens,axis=0)
arr[idx] = np.where(np.abs(a) > np.abs(b), a, b)
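As a runnable sketch of Approach #1, the snippet below applies it to the 4-row arr from the question. The second index list is changed to [1, 3] here (an assumption, so that every index exists in the small example), and np.errstate is used only to silence the NaN comparison warning on NumPy versions that emit it.

```python
import numpy as np

arr = np.array([[np.nan, 1., -18.],
                [-1., -1., -1.],
                [1., 1., 5.],
                [1., -1., 0.]])

new_tuples = [(1.0, 1.0, -15.0), (6.0, 3.0, -4.0)]  # list of all new_tuple
index_lists = [[0, 2], [1, 3]]                      # hypothetical, no repeated indexes

idx = np.concatenate(index_lists)
lens = list(map(len, index_lists))
a = arr[idx]                               # rows to update (a copy)
b = np.repeat(new_tuples, lens, axis=0)    # one tuple per selected row

# NaN compares as False, so NaN entries in arr are replaced by the tuple value.
with np.errstate(invalid="ignore"):
    arr[idx] = np.where(np.abs(a) > np.abs(b), a, b)

print(arr)
```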
Approach #2
Another approach would be to store the absolute values of arr beforehand, abs_arr = np.abs(arr), and use those within np.where. This should save a lot of time within the loop. Thus, the relevant computation would reduce to:
arr[index_list, :] = np.where(abs_arr[index_list, :] > np.abs(new_tuple), arr[index_list, :], new_tuple)
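One way the loop for Approach #2 could be written is sketched below. The `updates` list is hypothetical, and the step that refreshes abs_arr after each assignment is an addition made here: without it the cached absolute values would go stale for rows that appear in more than one index_list.

```python
import numpy as np

arr = np.array([[np.nan, 1., -18.],
                [-1., -1., -1.],
                [1., 1., 5.],
                [1., -1., 0.]])
abs_arr = np.abs(arr)  # computed once, before the loop

# Hypothetical (new_tuple, index_list) pairs; row 2 appears in both lists.
updates = [((1.0, 1.0, -15.0), [0, 2]),
           ((6.0, 3.0, -4.0), [2, 3])]

with np.errstate(invalid="ignore"):  # NaN comparisons may warn on some versions
    for new_tuple, index_list in updates:
        new_row = np.asarray(new_tuple)
        keep = abs_arr[index_list, :] > np.abs(new_row)
        arr[index_list, :] = np.where(keep, arr[index_list, :], new_row)
        # Keep the cached absolute values in sync with the rows just written.
        abs_arr[index_list, :] = np.abs(arr[index_list, :])

print(arr)
```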

Place output of numpy function into diagonal of array

I want to take the row sums of one array and place the output into the diagonals of another array. For performance reasons, I want to use the out argument of the np.sum function.
mat1 = np.array([[0.5, 0.5],[0.6, 0.4]])
mat2 = np.zeros([2,2])
mat3 = np.zeros([2,2])
If I want to place the row sums of mat1 into the first row of mat2, I can do it like this:
np.sum(mat1, axis=1, out = mat2[0])
mat2
#array([[ 1., 1.],
# [ 0., 0.]])
However, if I want to place the sums into the diagonal indices of mat3, I can't seem to do so.
np.sum(mat1, axis=1, out = mat3[np.diag_indices(2)])
mat3
#array([[ 0., 0.],
# [ 0., 0.]])
Of course, the following works, but I would like to use the out argument of np.sum
mat3[np.diag_indices(2)] = np.sum(mat1, axis=1)
mat3
#array([[ 1., 0.],
# [ 0., 1.]])
Can someone explain this behavior of the out argument not accepting the diagonal indices of an array as a valid output?
NumPy has two types of indexing: basic indexing and advanced indexing.
Basic indexing is what happens when your index expression uses only integers, slices, ..., and None (a.k.a. np.newaxis). This can be implemented entirely through simple manipulation of offsets and strides, so when basic indexing returns an array, the resulting array is always a view of the original data. Writing to the view writes to the original array.
When you index with an array, as in mat3[np.diag_indices(2)], you get advanced indexing. Advanced indexing cannot be done in a way that returns a view of the original data; it always copies data from the original array. That means that when you try to use the copy as an out parameter:
np.sum(mat1, axis=1, out = mat3[np.diag_indices(2)])
The data is placed into the copy, but the original array is unaffected.
We were supposed to have the ability to use np.diagonal for this by now, but even though the documentation says np.diagonal's output is writeable in NumPy 1.10, the relevant feature for making it writable is still in limbo. It's probably best to just not use the out parameter for this:
mat3[np.diag_indices(2)] = np.sum(mat1, axis=1)
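The copy-versus-view difference can be checked with np.shares_memory. The sketch below also shows one possible workaround that is not part of the answer above: for a C-contiguous square array, a strided slice of the flattened array is a view of the diagonal, and such a view can be passed as out. The names `n` and `diag_view` are illustrative.

```python
import numpy as np

mat1 = np.array([[0.5, 0.5], [0.6, 0.4]])
mat3 = np.zeros((2, 2))
n = mat3.shape[0]

# Advanced indexing gives a copy; basic indexing gives a view.
print(np.shares_memory(mat3, mat3[np.diag_indices(n)]))  # copy: no shared memory
print(np.shares_memory(mat3, mat3[0]))                   # view: shared memory

# Assumes mat3 is C-contiguous, so reshape(-1) is a view and every
# (n + 1)-th element of it lies on the diagonal.
diag_view = mat3.reshape(-1)[::n + 1]
np.sum(mat1, axis=1, out=diag_view)

print(mat3)
```

Whether this is actually faster than the plain assignment would need to be measured; for small arrays the difference is likely negligible.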

Create a numpy array according to another array along with indices array

I have a numpy array (e.g., a = np.array([ 8., 2.])) and another array that stores the indices I would like to take from the first array (e.g., b = np.array([ 0., 1., 1., 0., 0.])).
What I would like to do is create another array from these two arrays; in this case it should be: array([ 8., 2., 2., 8., 8.])
Of course, I can always use a for loop to achieve this:
for i in range(5):
    c[i] = a[b[i]]
I wonder if there is a more elegant method to create this array. Something like c = a[b[0:5]] (well, this apparently doesn't work)
Only integer arrays can be used for indexing, and you've created b as a float64 array. You can get what you're looking for if you explicitly convert to integer:
bi = np.array(b, dtype=int)
c = a[bi[0:5]]
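Put together as a runnable sketch, using astype(int) as an equivalent way to do the conversion (note that it truncates toward zero, so it assumes the floats in b hold exact integer values):

```python
import numpy as np

a = np.array([8., 2.])
b = np.array([0., 1., 1., 0., 0.])  # indices stored as floats

# Convert the float indices to integers, then index a with the whole array at once.
bi = b.astype(int)
c = a[bi]

print(c)
```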
