Indexing a 2D array with 2D matrices that specify rows and columns - python

I have a 6×6 matrix A, and I'm trying to index it using two 2×2 matrices B and C. Each row of B and C specifies a pair of indices for rows and columns in A: each row of B specifies the rows to select, and the corresponding row of C specifies the columns.
For example,
A = np.arange(0,36).reshape(6,6)
B = np.array([[0,1],
[2,4]])
C = np.array([[1,2],
[3,4]])
I need to get a 2×2×2 matrix like this:
results =
[[[ 1 2]
[7 8]]
[[15 16]
[27 28]]]
To get just one matrix, using indices like B=[0,1] and C=[1,2], it can be done with:
d = A[B,:]
results = d[:,C]
But things are different when I need two 2×2 matrices (2×2×2), where each matrix is indexed using one row of B and the corresponding row of C.
p.s. Please change the title of this question if you can think of a more precise one.
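One possible way to get the 2×2×2 result described above is to let NumPy broadcast the two index arrays against each other. This is only a sketch using the A, B and C from the question; the names rows and cols are made up for illustration. B gets a trailing axis and C gets a middle axis, so that results[i, j, k] is A[B[i, j], C[i, k]].

```python
import numpy as np

A = np.arange(36).reshape(6, 6)
B = np.array([[0, 1],
              [2, 4]])
C = np.array([[1, 2],
              [3, 4]])

# rows has shape (2, 2, 1) and cols has shape (2, 1, 2);
# indexing with both broadcasts them to a common shape (2, 2, 2).
rows = B[:, :, None]
cols = C[:, None, :]

results = A[rows, cols]
print(results.shape)
print(results)
```

The same broadcasting idea should carry over to more index pairs, as long as B and C have the same number of rows.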

Related

1-D arrays in NumPy

As far as I know, 1-D arrays are arrays which have either just 1 column and any number of rows, or vice versa.
If I run this code:
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
It returns that both are 2-D arrays.
Why? I know the computer is working fine. But can you please tell me what a 1-D array is?
A 1-D array is an array with just a single dimension. There are no columns or rows. It is a number of values in a line, say a=[1,2,3,4,5,6]. The very concept of two separate dimensions, rows and columns, does not apply to a 1-D array. Hence when you defined your first array with .reshape(1,10), you gave it two dimensions: 1 and 10. Thus, you actually defined a 2-D array of shape 1x10.
If you execute this code:
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
print(a)
print(b)
You will get this output:
2 2
[[0 1 2 3 4 5 6 7 8 9]]
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
This clearly shows that the array a has 2 dimensions (rows and columns), and hence is a 2-D array.
This .reshape(10,1) reshapes the array to a 2-D array with 10 rows and 1 column. However, if you use .reshape(10) instead, you will get a 1-D array.
The problem is the reshape: you call reshape(1,10), which means "reshape the array into a 2-D matrix with 1 row and 10 columns". What you want is a 1-D array, so you need reshape(10).
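To make the difference concrete, here is a small sketch comparing ndim and shape for the 1-D and 2-D variants. The variable names are only for illustration.

```python
import numpy as np

a = np.arange(10)        # already 1-D: shape (10,)
b = a.reshape(1, 10)     # 2-D: 1 row, 10 columns
c = a.reshape(10, 1)     # 2-D: 10 rows, 1 column
d = b.reshape(10)        # back to 1-D
e = c.ravel()            # ravel() also flattens to 1-D

for name, arr in [("a", a), ("b", b), ("c", c), ("d", d), ("e", e)]:
    print(name, arr.ndim, arr.shape)
```

Note that the shape of a 1-D array is a one-element tuple such as (10,), which is how you can tell it apart from (1, 10) or (10, 1).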

Is pandas / numpy's axis the opposite of R's MARGIN?

Is it correct to think about these two things as being opposite? This has been a major source of confusion for me.
Below is an example where I find the column means of a data frame in R and Python. Notice the opposite values for MARGIN and axis.
In R (using MARGIN=2, i.e. the column margin):
m <- matrix(1:6, nrow=2)
apply(m, MARGIN=2, mean)
[1] 1.5 3.5 5.5
In Python (using axis=0, i.e. the row axis):
In [25]: m = pd.DataFrame(np.array([[1, 3, 5], [2, 4, 6]]))
In [26]: m.apply(np.mean, axis=0)
Out[26]:
0 1.5
1 3.5
2 5.5
dtype: float64
Confusion arises because apply() talks both about which dimension the apply is "over", as well as which dimension is retained. In other words, when you apply() over rows, the result is a vector whose length is the number of columns in the input. This particular confusion is highlighted by Pandas' documentation (but not R's):
axis : {0 or ‘index’, 1 or ‘columns’}
0 or ‘index’: apply function to each column
1 or ‘columns’: apply function to each row
As you can see, axis=0 names the index (row) dimension, but the function is applied to each column: the row dimension is the one that is "applied over" (thus eliminated), and the column dimension is retained.
Put another way, applying a function to each column is axis=0 or MARGIN=2, and applying it to each row is axis=1 or MARGIN=1. The 1 values appear to match, but that's spurious: 1 in Python is the second dimension, because Python is 0-based.
You are correct, the "margin" concept in R's apply is opposite to the "axis" concept in numpy/pandas' apply function.
Say we are applying the function f to a 2-dimensional array arr. The function f takes a vector input.
R: The MARGIN argument indicates which array index of arr will be held fixed within each call to f. So if MARGIN=1 each call to f applies to all of the data with same first array index. This means the function is applied once to each row.
So, f is applied to arr[1,], arr[2,], ..., arr[n,] in turn, where n is the number of rows in arr.
numpy/pandas: The axis argument indicates which array index of arr will be varied within each call to f. So if axis=0, for each call to f, the first array index is varied to generate an input vector. This means the function is applied once to each column.
So, f is applied to arr[:,0], arr[:,1], ..., arr[:,m-1] in turn, where m is the number of columns in arr.
The difference in indexing (0-based for Python, 1-based for R) can be confusing but is not the cause of the discrepancy. I have used the appropriate syntax for each language above.
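The numpy side of this can be checked with a short sketch. It uses np.apply_along_axis and compares it with explicit loops over arr[:, j] and arr[i, :]; the array values are the ones from the question, and f is just a placeholder function.

```python
import numpy as np

arr = np.array([[1, 3, 5],
                [2, 4, 6]])

def f(v):
    # v is a 1-D vector: a column when axis=0, a row when axis=1
    return v.mean()

# axis=0: the first index is varied inside each call, so f sees columns
col_means = np.apply_along_axis(f, 0, arr)
col_loop = np.array([f(arr[:, j]) for j in range(arr.shape[1])])

# axis=1: the second index is varied inside each call, so f sees rows
row_means = np.apply_along_axis(f, 1, arr)
row_loop = np.array([f(arr[i, :]) for i in range(arr.shape[0])])

print(col_means, col_loop)
print(row_means, row_loop)
```

The column means here correspond to what apply(m, MARGIN=2, mean) returns in R for the same matrix.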
Alternative Explanation
R asks "along which dimensions should the function be applied?". So, indicating rows to R means that you want the function applied to each row. Meanwhile numpy/pandas think of their "axes" as indicating directions, like the axes of a graph. So when you tell apply to work along the row axis, it figures the row axis is vertical, and it works vertically, applying the function to each column.
In both Pandas and R, 'axis' and 'margin' are pretty much synonyms: a data frame has a 'columns' axis or margin going down, and a 'rows' axis or margin going to the right.
Pandas and R's apply implementations differ in what they do with the axis/margin keyword, as follows.
In R, calling Rows <- 1; apply(df, Rows, sum) means
R: "'Row' is the shape of the inputs. Each invocation of f gets passed one row as an argument."
Rows <- 1
Columns <- 2
df <- data.frame(c1 = 1:2, c2 = 3:4, c3 = 5:6, row.names=c('r1', 'r2'))
df
# c1 c2 c3
# r1 1 3 5
# r2 2 4 6
apply(df, Rows, sum)
# r1 9
# r2 12
In Python, calling Rows = 0; df.apply(sum, axis=Rows) means
Pandas: "'Row' is the shape of the output. Every invocation of f gets passed one column as an argument."
import pandas as pd
Rows = 0
Columns = 1
df = pd.DataFrame(
{'c1': [1, 2], 'c2': [3, 4], 'c3': [5, 6]},
index=['r1', 'r2']
)
df
# c1 c2 c3
# r1 1 3 5
# r2 2 4 6
df.apply(sum, axis=Rows)
# c1 c2 c3
# 3 7 11

How do I reverse the first four elements of the 1st axis and reverse the 2nd axis of a numpy array in a single operation?

I have a numpy array M of shape (n, 1000, 6). This can be thought of as n matrices with 1000 rows and 6 columns. For each matrix I would like to reverse the order of the rows (i.e. the top row is now at the bottom and vice versa) and then reverse the order of just the first 4 columns (so column 0 is now column 3, column 1 is column 2, column 2 is column 1 and column 3 is column 0 but column 4 is still column 4 and column 5 is still column 5). I would like to do this in a single operation, without doing indexing on the left side of the expression, so this would not be acceptable:
M[:,0:4,:] = M[:,0:4,:][:,::-1,:]
M[:,:,:] = M[:,:,::-1]
The operation needs to be achievable using the Keras backend, which disallows this. It must be of the form
M = M[indexing here that solves the task]
If I wanted to reverse the order of all the columns instead of just the first 4, this could easily be achieved with M = M[:,::-1,::-1], so I've been trying to modify this to achieve my goal but unfortunately can't work out how. Is this even possible?
M[:, ::-1, [3, 2, 1, 0, 4, 5]]
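A quick way to convince yourself that this one-liner does what the question describes is to try it in plain NumPy on a small stand-in array. The sketch below assumes a shape of (2, 5, 6) instead of (n, 1000, 6) and builds the expected result in two pieces for comparison; it says nothing about how a particular Keras backend handles list indices.

```python
import numpy as np

n = 2
# Small stand-in for the (n, 1000, 6) array from the question
M = np.arange(n * 5 * 6).reshape(n, 5, 6)

# Reverse the rows, and reorder the columns so the first four are reversed
R = M[:, ::-1, [3, 2, 1, 0, 4, 5]]

# Build the expected result piecewise for comparison:
# reversed rows with columns 3,2,1,0, followed by columns 4 and 5 unchanged
expected = np.concatenate([M[:, ::-1, 3::-1], M[:, ::-1, 4:]], axis=2)

print(R.shape)
print(np.array_equal(R, expected))
```

The list [3, 2, 1, 0, 4, 5] is simply the new column order written out, so the same pattern works for any fixed permutation of the last axis.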

Perform function on multiple columns in python

I have a data array of 30 trials (columns), each of 256 data points (rows), and would like to run a wavelet transform (which requires a 1D array) on each column, with the eventual aim of obtaining the mean coefficients of the 30 trials.
Can someone point me in the right direction please?
If you have a multidimensional numpy array then you can use a for loop:
import numpy as np
A = np.array([[1,2,3], [4,5,6]])
# A is the matrix: 1 2 3
#                  4 5 6
for col in A.transpose():
    print("Column:", col)
    # Perform your wavelet transform here, you can save the
    # results to another multidimensional array.
This gives you access to each column as a 1D array.
Output:
Column: [1 4]
Column: [2 5]
Column: [3 6]
If you want to access the rows rather than the columns then loop through A rather than A.transpose().
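As a rough sketch of the whole pipeline (transform every column, then average over trials), the loop can also be written with np.apply_along_axis. The function haar_step below is a hypothetical stand-in, a single hand-written Haar averaging/differencing step, and not the wavelet transform from any particular library; the random data is made up to match the 256×30 shape in the question. Swap in your real 1D transform where haar_step is called.

```python
import numpy as np

def haar_step(x):
    """Stand-in 1D transform: one level of an orthonormal Haar step."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # pairwise averages
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # pairwise differences
    return np.concatenate([approx, detail])

# Made-up data: 256 data points (rows) for each of 30 trials (columns)
rng = np.random.default_rng(0)
data = rng.standard_normal((256, 30))

# Apply the transform to every column (axis=0 passes columns to the function)
coeffs = np.apply_along_axis(haar_step, 0, data)

# Average the coefficients across the 30 trials
mean_coeffs = coeffs.mean(axis=1)

print(coeffs.shape, mean_coeffs.shape)
```

If your transform returns something other than a single fixed-length 1D array per column, the explicit for loop above is probably the easier route.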

Numpy Summation for Index in Boolean Array

In Numpy I have a boolean array whose length equals the number of rows of a matrix. I want to run a calculation on the matrix rows that correspond to True in the boolean array. How do I do this?
a: [true, false, true]
b: [[1,1,1],[2,2,2],[3,3,3]]
Say the function was to sum the elements of the sub arrays
index 0 is True: thus I add 3 to the summation (Starts at zero)
index 1 is False: thus summation remains at 3
index 2 is True: thus I add 9 to the summation for a total of 12
How do I do this (the boolean and summation part; I don't need how to add up each individual sub array)?
You can simply use your boolean array a to index into the rows of b, then take the sum of the resulting (2, 3) array:
import numpy as np
a = np.array([True, False, True])
b = np.array([[1,1,1],[2,2,2],[3,3,3]])
# index rows of b where a is True (i.e. the first and third row of b)
print(b[a])
# [[1 1 1]
# [3 3 3]]
# take the sum over all elements in these rows
print(b[a].sum())
# 12
It sounds like you would benefit from reading the numpy documentation on array indexing.
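If you want to mirror the step-by-step description in the question (sum each sub-array first, then add up only the selected ones), a sketch along these lines should give the same total. The names row_sums, selected and alt are only for illustration.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

# Sum each sub-array (row) first
row_sums = b.sum(axis=1)

# Keep only the sums whose position is True in a, then add them up
selected = row_sums[a]
total = selected.sum()

# Equivalent: treat the mask as 0/1 weights and take a dot product
alt = a.astype(int) @ row_sums

print(row_sums, selected, total, alt)
```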
