Iterating and selecting a specific array from a multidimensional array in Python

Imagine I have something like this:
import numpy as np
arra = np.arange(16).reshape(2, 2, 4)
which gives
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])
and I want to write a loop that runs along specific subarrays (in this case, along each 'column' of each 'matrix') and sums them: selecting 0 and 4 and summing them (4), selecting 1 and 5 and summing them (6), ..., selecting 3 and 7 and summing them (10), selecting 8 and 12 and summing them (20), ..., selecting 11 and 15 and summing them (26).
I had thought of doing that with the apparently natural:
for i in arra[i, j, k]:
    for j in arra[i, j, k]:
        for k in arra[i, j, k]:
            sum...
The problem is that Python certainly doesn't let me do it this way. If it were a 2D array it would be easier, since I know that the iterator runs through the rows first, so you can transpose to run along the columns. But for a multidimensional (here 3D) array of shape (N, M, P) with N, M, P >> 1, I was wondering how it could be done.
EDIT: This question has a continuation here: Choosing and iterating specific sub-arrays in multidimensional arrays in Python

You can use map to get this done:
import numpy as np
arra = np.arange(16).reshape(2, 2, 4)
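Then the command (in Python 3, map returns a lazy iterator, so wrap it in list to see the values):
list(map(sum, arra))
gives you the desired output, because the built-in sum adds up the rows of each 2x4 matrix element by element: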
[array([ 4, 6, 8, 10]), array([20, 22, 24, 26])]
Alternatively, you can also use a list comprehension:
res = [sum(ai) for ai in arra]
Then res looks like this:
[array([ 4, 6, 8, 10]), array([20, 22, 24, 26])]
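For a NumPy array, the same per-matrix column sums can also be taken without a Python-level loop by summing over axis 1. A minimal sketch, assuming the (N, M, P) layout from the question; the explicit index loop is only there to make the iteration order visible and is not meant for large arrays:

```python
import numpy as np

arra = np.arange(16).reshape(2, 2, 4)

# Vectorized: sum over axis 1 (the rows inside each 2x4 matrix),
# leaving one column-sum vector per matrix.
vectorized = arra.sum(axis=1)

# Explicit loops over matrix index i and column index k; the inner
# sum runs down column k of matrix i (over the row index j).
N, M, P = arra.shape
looped = np.zeros((N, P), dtype=arra.dtype)
for i in range(N):
    for k in range(P):
        looped[i, k] = sum(arra[i, j, k] for j in range(M))

print(vectorized)
# [[ 4  6  8 10]
#  [20 22 24 26]]
```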
EDIT:
If you want to add the rows at the same position in each matrix - as you mentioned in the comments below this answer - you can do (using zip):
list(map(sum, zip(*arra)))
which gives you the desired output:
[array([ 8, 10, 12, 14]), array([16, 18, 20, 22])]
For the sake of completeness also the list comprehension:
[sum(ai) for ai in zip(*arra)]
which gives you the same output.
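The zip(*arra) version adds corresponding rows across the matrices, which for a NumPy array is the same as summing over axis 0. A short check, with arra redefined so the snippet runs on its own:

```python
import numpy as np

arra = np.arange(16).reshape(2, 2, 4)

# zip(*arra) pairs row 0 of every matrix, then row 1 of every matrix, ...
zipped = [sum(rows) for rows in zip(*arra)]

# Summing over axis 0 (across the matrices) gives the same rows.
by_axis = arra.sum(axis=0)

print(by_axis)
# [[ 8 10 12 14]
#  [16 18 20 22]]
```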

Related

array split with overlap [duplicate]

Let's say I have a list A
A = [1,2,3,4,5,6,7,8,9,10]
I would like to create a new list (say B) using the above list in the following order.
B = [[1,2,3], [3,4,5], [5,6,7], [7,8,9], [9,10,]]
i.e. the first 3 numbers are A[0], A[1], A[2], the second 3 numbers are A[2], A[3], A[4], and so on.
I believe there is a function in numpy for such a kind of operation.
Simply use Python's built-in list comprehension with list-slicing to do this:
>>> A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> size = 3
>>> step = 2
>>> A = [A[i : i + size] for i in range(0, len(A), step)]
This gives you what you're looking for:
>>> A
[[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9], [9, 10]]
But you'll have to write a couple of extra lines to make sure that your code doesn't break for unexpected values of size/step (for example, a step of 0 makes range raise a ValueError).
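One way to add those checks is to wrap the comprehension in a small function. This is a sketch: the name overlapping_chunks is hypothetical, and rejecting non-positive size/step is an assumption about what "breaking" should mean. The last chunk may be shorter than size, exactly as in the comprehension above.

```python
def overlapping_chunks(seq, size, step):
    """Split seq into chunks of `size` items that start every `step` items.

    Hypothetical helper; the final chunk may be shorter than `size`.
    """
    # Assumed validation: a step of 0 would make range() raise, and a
    # negative step or size would silently return nothing useful.
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive integers")
    return [seq[i:i + size] for i in range(0, len(seq), step)]


print(overlapping_chunks([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3, 2))
# [[1, 2, 3], [3, 4, 5], [5, 6, 7], [7, 8, 9], [9, 10]]
```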
The 'duplicate' Partition array into N chunks with Numpy suggests np.split - that's fine for non-overlapping splits. The example (added after the close?) overlaps, one element across each subarray. Plus it pads with a 0.
How do you split a list into evenly sized chunks? has some good list answers, with various forms of generator or list comprehension, but at first glance I didn't see any that allow for overlaps - though with a clever use of iterators (such as itertools.tee) that should be possible.
We can blame this on poor question wording, but it is not a duplicate.
Working from the example and the comment:
Here my window size is 3, i.e. each split list should have 3 elements (the first split is [1,2,3]), and the step size is 2, so the second split should start from the 3rd element, i.e. the 2nd split is [3,4,5].
Here is an advanced solution using as_strided
In [64]: ast=np.lib.stride_tricks.as_strided # shorthand
In [65]: A=np.arange(1,12)
In [66]: ast(A,shape=[5,3],strides=(8,4))
Out[66]:
array([[ 1, 2, 3],
[ 3, 4, 5],
[ 5, 6, 7],
[ 7, 8, 9],
[ 9, 10, 11]])
I increased the range of A because I didn't want to deal with the 0 pad.
Choosing the target shape is easy, 5 sets of 3. Choosing the strides requires more knowledge about striding.
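In [69]: A.strides
Out[69]: (4,)
The 1d stride, i.e. the step from one element to the next, is 4 bytes (the length of one element; A here has a 4-byte integer dtype). The step from one row to the next is 2 elements of the original, or 2*4 = 8 bytes. With 8-byte int64 elements the strides would be (16, 8) instead.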
as_strided produces a view. Thus changing an element in it will affect the original, and may change overlapping values. Add .copy() to make a copy; math with the strided array will also produce a copy.
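Because the hard-coded byte counts only fit a 4-byte dtype, a safer pattern is to derive the strides from the array itself and to compute the number of full windows so the view never reads past the buffer. A sketch under those assumptions (size and step as in the question):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

A = np.arange(1, 12)            # dtype is platform-dependent (int32 or int64)
size, step = 3, 2
n_windows = (A.size - size) // step + 1   # full windows only, no reads past the end

item = A.strides[0]             # bytes from one element to the next
windows = as_strided(A, shape=(n_windows, size), strides=(step * item, item))

# as_strided returns a view into A's buffer; copy it before writing to it.
safe = windows.copy()
print(safe)
# [[ 1  2  3]
#  [ 3  4  5]
#  [ 5  6  7]
#  [ 7  8  9]
#  [ 9 10 11]]
```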
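Changing the strides can give non-overlapping rows, but be careful about the shape: it is possible to access values outside of the original data buffer. In the first example below, the last row reads one element past the end of A, so the 17 is whatever happened to be in memory there.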
In [82]: ast(A,shape=[4,3],strides=(12,4))
Out[82]:
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 17]])
In [84]: ast(A,shape=[3,3],strides=(16,4))
Out[84]:
array([[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]])
edit
A newer function, sliding_window_view (available since NumPy 1.20), gives a safer version of as_strided:
np.lib.stride_tricks.sliding_window_view(np.arange(1,10),3)[::2]
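Applied to the question's list, a sketch of the sliding_window_view route (assuming NumPy 1.20 or later): take every length-3 window, then keep every second one to get a step of 2. Only full windows are produced, so the trailing [9, 10] from the question is dropped rather than padded, and the result is a read-only view by default.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

A = np.arange(1, 11)                  # the question's list [1, ..., 10]
windows = sliding_window_view(A, 3)   # every length-3 window, step 1
B = windows[::2]                      # keep every second window -> step 2
print(B)
# [[1 2 3]
#  [3 4 5]
#  [5 6 7]
#  [7 8 9]]
```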
This function that I wrote may help you, although it only outputs full chunks with a length of len_chunk:
import numpy as np

def overlap(array, len_chunk, len_sep=1):
    """Returns a matrix of all full overlapping chunks of the input `array`, with a chunk
    length of `len_chunk` and a separation length of `len_sep`. Begins with the first full
    chunk in the array. """
    # np.int was removed in NumPy 1.24; the built-in int does the same job here
    n_arrays = int(np.ceil((array.size - len_chunk + 1) / len_sep))
    array_matrix = np.tile(array, n_arrays).reshape(n_arrays, -1)
    columns = np.array(((len_sep*np.arange(0, n_arrays)).reshape(n_arrays, -1) + np.tile(
        np.arange(0, len_chunk), n_arrays).reshape(n_arrays, -1)), dtype=np.intp)
    rows = np.array((np.arange(n_arrays).reshape(n_arrays, -1) + np.tile(
        np.zeros(len_chunk), n_arrays).reshape(n_arrays, -1)), dtype=np.intp)
    return array_matrix[rows, columns]

Adding up formatted indexes with Numpy arrays Python

How can I write code that adds up all the numbers in between the indices? The arrays numbers and indices are correlated. The first two index values are 0-3 so the numbers between indices 0-3 are being added up 1 + 5 + 6 = 12. The expected value is what I am trying to find. I am trying to get the results without using a for loop.
numbers = np.array([1, 5, 6, 7, 4, 3, 6, 7, 11, 3, 4, 6, 2, 20])
indices = np.array([0, 3 , 7, 11])
Expected output:
[12, 41, 22]
I'm not sure how you got the expected output - from my calculation, the sum between the indices should be [12, 20, 25]. The following code calculates this:
numbers = np.array([1, 5, 6, 7, 4, 3, 6, 7, 11, 3, 4, 6, 2, 20])
indexes = np.array([0, 3, 7, 11])
tmp = np.zeros(len(numbers) + 1, dtype=numbers.dtype)  # same dtype as numbers, so the sums stay integers
np.cumsum(numbers, out=tmp[1:])
result = np.diff(tmp[indexes])
The output of this is [12, 20, 25]
How does this work? It creates an array that is one element longer than the numbers array, so that its first element is zero. It then writes the cumulative sum of numbers into tmp starting at index 1, so tmp[k] is the sum of numbers[:k]. Finally, it takes the differences of tmp at the given indices. For example, for the segment from index 3 to index 7 it subtracts tmp[3] = 12 from tmp[7] = 32, giving 32 - 12 = 20.
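The intermediate values can be inspected directly. This sketch builds the same prefix-sum array with np.concatenate instead of the out= argument (an equivalent construction, swapped in only for brevity):

```python
import numpy as np

numbers = np.array([1, 5, 6, 7, 4, 3, 6, 7, 11, 3, 4, 6, 2, 20])
indexes = np.array([0, 3, 7, 11])

# tmp[k] holds the sum of numbers[:k], so tmp[0] == 0.
tmp = np.concatenate(([0], np.cumsum(numbers)))
print(tmp[indexes])           # [ 0 12 32 57]

# sum(numbers[i:j]) == tmp[j] - tmp[i]; np.diff applies that to consecutive indexes.
print(np.diff(tmp[indexes]))  # [12 20 25]
```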
You are likely looking for np.add.reduceat:
>>> np.add.reduceat(numbers, indices)
array([12, 20, 25, 28], dtype=int32)
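np.add.reduceat also sums the tail segment from the last index to the end of the array, which is where the fourth value comes from (28 = 6 + 2 + 20); the dtype shown in the output depends on the platform. A quick cross-check against the cumsum/diff result, assuming the question's arrays:

```python
import numpy as np

numbers = np.array([1, 5, 6, 7, 4, 3, 6, 7, 11, 3, 4, 6, 2, 20])
indices = np.array([0, 3, 7, 11])

# Output k is numbers[indices[k]:indices[k+1]].sum(); the last one runs to the end.
segment_sums = np.add.reduceat(numbers, indices)

# Dropping the tail segment leaves only the sums *between* the given indices.
between = segment_sums[:-1]
print(segment_sums)   # [12 20 25 28]
print(between)        # [12 20 25]
```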

Iterating Over Rows in Python Array to Extract Column Data

I am using Python and looking to iterate through each row of an Nx9 array and extract certain values from the row to form another matrix with them. The N value can change depending on the file I am reading but I have used N=3 in my example. I only require the 0th, 1st, 3rd and 4th values of each row to form into an array which I need to store. E.g:
result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[11, 12, 13, 14, 15, 16, 17, 18, 19],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
#Output matrix of first row should be: ([[1,2],[4,5]])
#Output matrix of second row should be: ([[11,12],[14,15]])
#Output matrix of third row should be: ([[21,22],[24,25]])
I should then end up with N number of matrices formed with the extracted values - a 2D matrix for each row. However, the matrices formed appear 3D so when transposed and subtracted I receive the error ValueError: operands could not be broadcast together with shapes (2,2,3) (3,2,2). I am aware that a (3,2,2) matrix cannot be subtracted from a (2,2,3) so how do I obtain a 2D matrix N number of times? Would a loop be better suited? Any suggestions?
import numpy as np
result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
[11, 12, 13, 14, 15, 16, 17, 18, 19],
[21, 22, 23, 24, 25, 26, 27, 28, 29]])
a = result[:, 0]
b = result[:, 1]
c = result[:, 2]
d = result[:, 3]
e = result[:, 4]
f = result[:, 5]
g = result[:, 6]
h = result[:, 7]
i = result[:, 8]
output = [[a, b], [d, e]]
output = np.array(output)
output_transpose = output.transpose()
result = 0.5 * (output - output_transpose)
In [276]: result = np.array(
...: [
...: [1, 2, 3, 4, 5, 6, 7, 8, 9],
...: [11, 12, 13, 14, 15, 16, 17, 18, 19],
...: [21, 22, 23, 24, 25, 26, 27, 28, 29],
...: ]
...: )
...:
...: a = result[:, 0]
...
...: i = result[:, 8]
...: output = [[a, b], [d, e]]
In [277]: output
Out[277]:
[[array([ 1, 11, 21]), array([ 2, 12, 22])],
[array([ 4, 14, 24]), array([ 5, 15, 25])]]
In [278]: arr = np.array(output)
In [279]: arr
Out[279]:
array([[[ 1, 11, 21],
[ 2, 12, 22]],
[[ 4, 14, 24],
[ 5, 15, 25]]])
In [280]: arr.shape
Out[280]: (2, 2, 3)
In [281]: arr.T.shape
Out[281]: (3, 2, 2)
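Called without arguments, transpose reverses the order of all axes; for a 3-d array that exchanges the 1st and last dimensions, turning (2, 2, 3) into (3, 2, 2).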
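The difference between reversing all axes and swapping only the last two is easy to see from the shapes alone. A minimal sketch on a dummy array of the same shape:

```python
import numpy as np

arr = np.zeros((2, 2, 3))

print(arr.transpose().shape)         # (3, 2, 2): all axes reversed
print(arr.transpose(0, 2, 1).shape)  # (2, 3, 2): only the last two axes swapped
print(np.swapaxes(arr, 1, 2).shape)  # (2, 3, 2): same as transpose(0, 2, 1)
```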
A cleaner way to make a (N,2,2) array from selected columns is:
In [282]: arr = result[:,[0,1,3,4]].reshape(3,2,2)
In [283]: arr.shape
Out[283]: (3, 2, 2)
In [284]: arr
Out[284]:
array([[[ 1, 2],
[ 4, 5]],
[[11, 12],
[14, 15]],
[[21, 22],
[24, 25]]])
Since the last 2 dimensions are 2, you could transpose them, and take the difference:
In [285]: arr-arr.transpose(0,2,1)
Out[285]:
array([[[ 0, -2],
[ 2, 0]],
[[ 0, -2],
[ 2, 0]],
[[ 0, -2],
[ 2, 0]]])
Another way to get the (N,2,2) array is with a matrix index:
In [286]: result[:,[[0,1],[3,4]]]
Out[286]:
array([[[ 1, 2],
[ 4, 5]],
[[11, 12],
[14, 15]],
[[21, 22],
[24, 25]]])
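Putting the pieces together for any number of rows N: a sketch assuming, as in the question's code, that the goal is 0.5 * (M - M.T) for each 2x2 matrix M built from columns 0, 1, 3 and 4. Using -1 in reshape means N does not have to be hard-coded.

```python
import numpy as np

result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [11, 12, 13, 14, 15, 16, 17, 18, 19],
                   [21, 22, 23, 24, 25, 26, 27, 28, 29]])

# (N, 2, 2): one 2x2 matrix per row, built from columns 0, 1, 3, 4.
mats = result[:, [0, 1, 3, 4]].reshape(-1, 2, 2)

# Transpose each 2x2 matrix (swap the last two axes), not the whole stack.
antisym = 0.5 * (mats - np.swapaxes(mats, 1, 2))

print(antisym[0])
# [[ 0. -1.]
#  [ 1.  0.]]
```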
Ok, this is not a coding problem, but a math problem. I wrote some code for you, since it's pretty obvious you're a beginner, so there will be some unfamiliar syntax in there that you should look into so you can avoid problems like this in the future. You might not use them all that often, but it's good to know how to do it, because it expands your understanding of python syntax in general.
First up, the complete code for easy copy and pasting:
import numpy as np
result=np.array([[1,2,3,4,5,6,7,8,9],
                 [11,12,13,14,15,16,17,18,19],
                 [21,22,23,24,25,26,27,28,29]])
output = np.array(tuple(result[:,i] for i in (0,1,3)))
def Matrix_Operation(Matrix,Coefficient):
    if (Matrix.shape == Matrix.shape[::-1]
            and isinstance(Matrix,np.ndarray)
            and isinstance(Coefficient,(float,int))):
        return Coefficient*(Matrix-Matrix.transpose())
    else:
        print('The shape of your Matrix is not palindromic')
        print('You cannot subtract matrices of unequal shape')
        print('Your shape: %s'%str(Matrix.shape))
print(Matrix_Operation(output,0.5))
Now let's talk about a step by step explanation of what's happening here:
import numpy as np
result=np.array([[1,2,3,4,5,6,7,8,9],
                 [11,12,13,14,15,16,17,18,19],
                 [21,22,23,24,25,26,27,28,29]])
Python uses indentation (alignment of whitespace) as an integral part of its syntax. However, inside brackets you often don't need aligned indentation for the interpreter to understand your code. If you type a large array of values manually, it is usually advisable to start new lines at the commas (here, the commas separating the sublists). It's just more readable, and that way your data isn't off screen in your editor.
output = np.array(tuple(result[:,i] for i in (0,1,3)))
List comprehensions and their close relatives, generator expressions, are a big deal in Python and really handy for quick one-liners. Other languages (Haskell, for example) have them too, but they are one of the things that make Python so pleasant to use. I basically created a sequence of columns, where each element is result[:,i] for every i in (0,1,3). The expression inside tuple(...) is a generator expression (Python has no separate "tuple comprehension"), and tuple() collects its results. Finally I wrapped it in np.array, since that is the type required for our mathematical operations later on.
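The same (3, 3) array can also be built with a single fancy-indexing step, without any Python-level loop. A quick equivalence check, with result redefined so the snippet runs on its own:

```python
import numpy as np

result = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [11, 12, 13, 14, 15, 16, 17, 18, 19],
                   [21, 22, 23, 24, 25, 26, 27, 28, 29]])

# Generator expression consumed by tuple(): one row per selected column.
from_generator = np.array(tuple(result[:, i] for i in (0, 1, 3)))

# Fancy indexing picks columns 0, 1, 3; .T turns those columns into rows.
from_indexing = result[:, [0, 1, 3]].T

print(from_indexing)
# [[ 1 11 21]
#  [ 2 12 22]
#  [ 4 14 24]]
```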
def Matrix_Operation(Matrix,Coefficient):
    if (Matrix.shape == Matrix.shape[::-1]
            and isinstance(Matrix,np.ndarray)
            and isinstance(Coefficient,(float,int))):
        return Coefficient*(Matrix-Matrix.transpose())
    else:
        print('The shape of your Matrix is not palindromic')
        print('You cannot subtract matrices of unequal shape')
        print('Your shape: %s'%str(Matrix.shape))
print(Matrix_Operation(output,0.5))
If you're gonna create a complex formula in python code, why not wrap it inside an abstractable function? You can incorporate a lot of "quality control" into a function as well, to check if it is given the correct input for the task it is supposed to do.
Your code failed because you were trying to subtract a (3,2,2) shaped matrix from a (2,2,3) matrix. So we'll need a code snippet to check whether our provided matrix has a palindromic shape. You can reverse the order of items in a container with Container[::-1], and so we ask whether Matrix.shape == Matrix.shape[::-1]. Further, it is necessary that our Matrix is an np.ndarray and that our coefficient is a number. That's what I'm doing with the isinstance() function. You can check for multiple types at once, which is why isinstance(Coefficient,(float,int)) contains a tuple with both float and int in it.
Now that we have ensured that our input makes sense, we can perform our Matrix_Operation.
So in conclusion: Check if your math is solid before asking SO for help, because people here can get pretty annoyed at that sort of thing. You probably noticed by now that someone has already downvoted your question. Personally, I believe it's necessary to let newbies ask a couple stupid questions before they get into the groove, but that's what the voting button is for, I guess.

Slicing arrays with lists

So, I create a numpy array:
a = np.arange(25).reshape(5,5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
A conventional slice a[1:3,1:3] returns
array([[ 6, 7],
[11, 12]])
as does using a list in the second a[1:3,[1,2]]
array([[ 6, 7],
[11, 12]])
However, a[[1,2],[1,2]] returns
array([ 6, 12])
Obviously I am not understanding something here. That said, slicing with a list might on occasion be very useful.
Cheers,
keng
You observed the effect of so-called advanced indexing. Consider this example from the NumPy documentation on advanced indexing:
import numpy as np
x = np.array([[1, 2], [3, 4], [5, 6]])
print(x)
[[1 2]
[3 4]
[5 6]]
print(x[[0, 1, 2], [0, 1, 0]]) # [1 4 5]
You might think about this as providing lists of (Cartesian) coordinates of grid, as
print(x[0,0]) # 1
print(x[1,1]) # 4
print(x[2,0]) # 5
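The pairing can be checked directly: advanced indexing with two equal-length lists gives the same values as looking up each (row, column) pair one at a time. A minimal sketch using the same x:

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
rows = [0, 1, 2]
cols = [0, 1, 0]

# Advanced indexing pairs rows[k] with cols[k] ...
fancy = x[rows, cols]
# ... which matches looking up each (row, col) pair separately.
by_hand = np.array([x[r, c] for r, c in zip(rows, cols)])

print(fancy)   # [1 4 5]
```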
In the last case, the two lists are not expanded into a block of rows and columns; instead, each list supplies one coordinate of every element you get back.
Numpy sees two lists of two integers and decides that you are therefore asking for two values. The row index of each value comes from the first list, while the column index of each value comes from the second list. Therefore, you get a[1,1] and a[2,2]. The : notation not only expands to the list you've accurately deduced, but also tells numpy that you want all the rows/columns in that range.
If you provide manually curated list indices, they have to be of the same size, because the size of each/any list is the number of elements you'll get back. For example, if you wanted the elements in columns 1 and 2 of rows 1,2,3:
>>> a[1:4,[1,2]]
array([[ 6, 7],
[11, 12],
[16, 17]])
But
>>> a[[1,2,3],[1,2]]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (3,) (2,)
The former tells numpy that you want a range of rows and specific columns, while the latter says "get me the elements at (1,1), (2,2), and (3, hey! what the?! where's the other index?)"
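Strictly, the index arrays only need to broadcast together, not have the same length. As a sketch: making the row index a column vector of shape (3, 1) lets it broadcast against the (2,) column index, so every requested row is paired with every requested column and the result matches the a[1:4,[1,2]] block.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

rows = np.array([[1], [2], [3]])   # shape (3, 1)
cols = np.array([1, 2])            # shape (2,)

# (3, 1) and (2,) broadcast to (3, 2): element [i, j] is a[rows[i, 0], cols[j]].
block = a[rows, cols]
print(block)
# [[ 6  7]
#  [11 12]
#  [16 17]]
```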
a[[1,2],[1,2]] is reading this as, I want a[1,1] and a[2,2]. There are a few ways around this and I likely don't even have the best ways but you could try
a[[1,1,2,2],[1,2,1,2]]
This will give you a flattened version of the block above: array([ 6, 7, 11, 12])
a[[1,2]][:,[1,2]]
This will give you the correct block; it works by taking rows [1,2] first and then columns [1,2] of that result.
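NumPy also provides np.ix_, which builds broadcastable index arrays from plain lists, so the sub-block can be selected in one indexing step instead of two chained ones. A minimal sketch:

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

# np.ix_ turns the two lists into index arrays of shapes (2, 1) and (1, 2),
# so the result contains every combination of the chosen rows and columns.
sub = a[np.ix_([1, 2], [1, 2])]
print(sub)
# [[ 6  7]
#  [11 12]]
```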
It triggers advanced indexing, so the first list is the row index and the second is the column index. For each row, it selects the corresponding column:
a[[1,2], [1,2]] -> [a[1, 1], a[2, 2]] -> [6, 12]
