Perform function on multiple columns in Python

I have a data array of 30 trials (columns), each of 256 data points (rows), and would like to run a wavelet transform (which requires a 1D array) on each column, with the eventual aim of obtaining the mean coefficients across the 30 trials.
Can someone point me in the right direction please?

If you have a multidimensional numpy array then you can use a for loop:
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
# A is the matrix: 1 2 3
#                  4 5 6

for col in A.transpose():
    print("Column:", col)
    # Perform your wavelet transform here, you can save the
    # results to another multidimensional array.
This gives you access to each column as a 1D array.
Output:
Column: [1 4]
Column: [2 5]
Column: [3 6]
If you want to access the rows rather than the columns then loop through A rather than A.transpose().
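For the concrete case in the question (256 samples x 30 trials, mean coefficients over trials), a minimal sketch could look like this. It assumes PyWavelets (pywt) and a single-level DWT purely for illustration; substitute whichever 1D wavelet transform you actually use.
import numpy as np
import pywt  # PyWavelets -- an assumption; any function that takes a 1D array fits the same pattern

data = np.random.rand(256, 30)        # stand-in for your 256 x 30 data array
approx, detail = [], []
for col in data.transpose():          # each col is a 1D array of 256 points
    cA, cD = pywt.dwt(col, 'db1')     # single-level DWT; the wavelet choice is arbitrary here
    approx.append(cA)
    detail.append(cD)

mean_cA = np.mean(approx, axis=0)     # mean approximation coefficients over the 30 trials
mean_cD = np.mean(detail, axis=0)     # mean detail coefficients over the 30 trials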

Related

Indexing a 2D array with 2D matrices that specify rows and columns

I have a 6×6 matrix A and I'm trying to index it using two 2×2 matrices B and C. Each row of B and C specifies a set of indices into A: each row of B gives the rows to be selected and the corresponding row of C gives the columns.
For example,
A = np.arange(0,36).reshape(6,6)
B = np.array([[0, 1],
              [2, 4]])
C = np.array([[1, 2],
              [3, 4]])
I need to get a 2×2×2 matrix like this:
results =
[[[ 1  2]
  [ 7  8]]

 [[15 16]
  [27 28]]]
If I only need one matrix, using indices like B=[0,1] and C=[1,2], it can be done with:
d = A[B,:]
results = d[:,C]
But things are different when I need to get two 2×2 matrices (a 2×2×2 result), where each matrix is indexed using one row of B and the corresponding row of C.
p.s. Please change the title of this question if you can think of a more precise one.
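One vectorized possibility (a sketch using NumPy broadcasting, not taken from the original thread) is to insert axes so that each row of B selects rows and the matching row of C selects columns:
import numpy as np

A = np.arange(0, 36).reshape(6, 6)
B = np.array([[0, 1],
              [2, 4]])
C = np.array([[1, 2],
              [3, 4]])

# results[i, j, k] = A[B[i, j], C[i, k]] -- row indices broadcast against column indices
results = A[B[:, :, None], C[:, None, :]]
print(results.shape)   # (2, 2, 2)
print(results)
For the example values above this reproduces the 2×2×2 array shown in the question.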

Faster 3D Matrix Operation - Python

I am working with a 3D matrix in Python. For example, given a matrix like this with size 2x3x4:
[[[1 2 1 4]
  [3 2 1 1]
  [4 3 1 4]]

 [[2 1 3 3]
  [1 4 2 1]
  [3 2 3 3]]]
My task is to find the entropy of each row in each slice of the matrix. For example, for row 1 of slice 1 above, [1,2,1,4], the normalized values (so that they sum to 1) are [0.125, 0.25, 0.125, 0.5], and the entropy is calculated with the formula -sum(i*log(i)), where i ranges over the normalized values. The result is a 2x3 matrix: each slice contributes 3 entropy values (because it has 3 rows).
Here is the working example of my code using random matrix each time:
from scipy.stats import entropy
import numpy as np

matrix = np.random.randint(low=1, high=5, size=(2,3,4))  # what if the size were (200,50,1000)?
entropy_matrix = np.zeros((matrix.shape[0], matrix.shape[1]))
for i in range(matrix.shape[0]):
    normalized = np.array([float(k)/np.sum(j) for j in matrix[i] for k in j]).reshape(matrix.shape[1], matrix.shape[2])
    entropy_matrix[i] = np.array([entropy(m) for m in normalized])
My question is: how do I scale this program up to work with a very large 3D matrix (for example of size 200x50x1000)?
I am using Python on Windows 10 (with the Anaconda distribution).
With a 3D matrix of size 200x50x1000, I get a running time of 290 s on my computer.
Using the definition of entropy for the second part and broadcasted operation on the first part, one vectorized solution would be -
p1 = matrix/matrix.sum(-1,keepdims=True).astype(float)
entropy_matrix_out = -np.sum(p1 * np.log(p1), axis=-1)
Alternatively, we can use einsum for the second part for further perf. boost -
entropy_matrix_out = -np.einsum('ijk,ijk->ij',p1,np.log(p1),optimize=True)
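As a quick sanity check (a sketch, not part of the original answer), the vectorized result can be compared against a loop-based reference on a small random matrix; scipy.stats.entropy normalizes its input and uses the natural log by default, so the two should agree:
from scipy.stats import entropy
import numpy as np

matrix = np.random.randint(low=1, high=5, size=(2, 3, 4))

# loop-based reference
ref = np.array([[entropy(row) for row in slab] for slab in matrix.astype(float)])

# vectorized version from the answer above
p1 = matrix / matrix.sum(-1, keepdims=True).astype(float)
vec = -np.sum(p1 * np.log(p1), axis=-1)

print(np.allclose(ref, vec))   # expected: True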

Deleting entries from 3D arrays corresponding to entries removed from another 3D array

I am dealing with radar reflectivity data of this shape, and my eventual goal is to plot it, but before that I have this problem -
(2500,50,200) # (scans, rays, altitude)
I also have three 3D numpy arrays, corresponding to the latitude, longitude and altitude, that have the same shape as the radar reflectivity, i.e.
(2500,50,200)
The bulk of the data consists of zeros, so I thought it would make sense to remove the zero entries before plotting. After removing the zero entries from the reflectivity data, I also need to go back and remove the corresponding entries from the latitude, longitude and altitude arrays.
Here is an attempt at a toy example
import numpy as np
arr=np.arange(27).reshape((3,3,3))
arrNZ = arr[np.nonzero(arr)]
print(arr.shape,arrNZ)
indx = np.where(arr == 0)
arr1 = np.arange(27).reshape((3,3,3))
n_arr1 = np.delete(arr1,indx)
print(n_arr1.shape)
But this does not seem to work; the element does not appear to have been deleted. Where am I going wrong?
Is this what you want, or did I misunderstand your problem ?
import numpy as np
ref = np.array([1,0,3,4,0,6,7,8,9]).reshape(3,3) # reflectivity
lat = np.arange(9).reshape(3,3) # latitude
print("ref =\n", ref)
print("lat =\n", lat)
ref2 = ref[np.nonzero(ref)] # keep only non-zero
lat2 = lat[np.nonzero(ref)] # keep the items where 'ref' is non-zero
print("ref2 =", ref2)
print("lat2 =", lat2)
Result:
ref =
 [[1 0 3]
 [4 0 6]
 [7 8 9]]
lat =
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
ref2 = [1 3 4 6 7 8 9]
lat2 = [0 2 3 5 6 7 8]
In your example, indx is the tuple (array([0]), array([0]), array([0])):
In [12]: indx
Out[12]: (array([0]), array([0]), array([0]))
So np.delete removed the first element of your original array and returned a flattened (1-D) copy, because no axis argument was passed to np.delete (the default, axis=None, operates on the flattened array).
As sciroccorics explained, the best way to trim off elements of an axis is to use an index array to mask them off.
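Putting that together for the radar arrays (a sketch; the array names and the random placeholder data are mine, and a smaller shape is used to keep the example light):
import numpy as np

shape = (250, 50, 200)          # stand-in for the real (2500, 50, 200) arrays
refl = np.random.rand(*shape)   # reflectivity
refl[refl < 0.7] = 0.0          # pretend most entries are zero
lat = np.random.rand(*shape)
lon = np.random.rand(*shape)
alt = np.random.rand(*shape)

mask = refl != 0                # boolean mask of the entries to keep
refl_nz = refl[mask]            # 1D array of the surviving reflectivity values
lat_nz, lon_nz, alt_nz = lat[mask], lon[mask], alt[mask]
print(refl_nz.shape, lat_nz.shape, lon_nz.shape, alt_nz.shape)   # all the same length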

1-D arrays in NumPy

As far as I know 1-D arrays are those arrays which either have just 1 column and any number of rows or vice versa.
If I run this code:
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
It returns that both are 2-D arrays.
Why? I know the computer is working fine, but can you please tell me what a 1-D array is?
A 1-D array is an array with just a single dimension. There are no columns or rows; it is simply a sequence of values, like a = [1, 2, 3, 4, 5, 6]. The very concept of two separate dimensions, rows and columns, does not apply to a 1-D array. Hence, when you defined your first array with .reshape(1,10), you gave it the dimensions 1 and 10, so you actually defined a 2-D array of size 1x10.
If you execute this code-
import numpy as np
a = np.arange(10).reshape(1,10)
b = np.arange(10).reshape(10,1)
print(a.ndim, b.ndim)
print(a)
print(b)
You will get this output-
2 2
[[0 1 2 3 4 5 6 7 8 9]]
[[0]
 [1]
 [2]
 [3]
 [4]
 [5]
 [6]
 [7]
 [8]
 [9]]
This clearly shows that the array a has 2 dimensions (a row axis and a column axis), and hence is a 2-D array.
The call .reshape(10,1) reshapes the array into a 2-D array with 10 rows and 1 column. If you use .reshape(10) instead, you will get a 1-D array.
The problem is the reshape: reshape(1,10) means "reshape the array into a 2-D matrix with 1 row and 10 columns". What you want is a 1-D array, so you need reshape(10).
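A quick illustration of the difference (just a sketch):
import numpy as np

a = np.arange(10)                 # shape (10,)   -> 1-D
b = np.arange(10).reshape(1, 10)  # shape (1, 10) -> 2-D row vector
c = np.arange(10).reshape(10, 1)  # shape (10, 1) -> 2-D column vector

print(a.ndim, b.ndim, c.ndim)     # 1 2 2
print(b.ravel().ndim)             # 1 -- flattening gives back a true 1-D array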

Transform Pandas DataFrame with n-level hierarchical index into n-D Numpy array

Question
Is there a good way to transform a DataFrame with an n-level index into an n-D Numpy array (a.k.a n-tensor)?
Example
Suppose I set up a DataFrame like
from pandas import DataFrame, MultiIndex
index = range(2), range(3)
value = range(2 * 3)
frame = DataFrame(value, columns=['value'],
                  index=MultiIndex.from_product(index)).drop((1, 0))
print(frame)
which outputs
     value
0 0      0
  1      1
  2      2
1 1      4
  2      5
The index is a 2-level hierarchical index. I can extract a 2-D Numpy array from the data using
print(frame.unstack().values)
which outputs
[[  0.   1.   2.]
 [ nan   4.   5.]]
How does this generalize to an n-level index?
Playing with unstack(), it seems that it can only be used to massage the 2-D shape of the DataFrame, but not to add an axis.
I cannot use e.g. frame.values.reshape(x, y, z), since this would require that the frame contains exactly x * y * z rows, which cannot be guaranteed. This is what I tried to demonstrate by drop()ing a row in the above example.
Any suggestions are highly appreciated.
Edit. This approach is much more elegant (and two orders of magnitude faster) than the one I gave below.
import numpy as np

# create an empty array of NaN of the right dimensions
shape = [len(lvl) for lvl in frame.index.levels]   # a list, so it also works under Python 3
arr = np.full(shape, np.nan)
# fill it using NumPy's advanced indexing (pass the codes as a tuple)
arr[tuple(frame.index.codes)] = frame.values.flat
# ...or in pandas < 0.24.0, use
# arr[tuple(frame.index.labels)] = frame.values.flat
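For the 2-level frame from the top of the question, this reproduces the array obtained via unstack() (shown here only as a quick check of the sketch above):
import numpy as np
from pandas import DataFrame, MultiIndex

index = range(2), range(3)
frame = DataFrame(range(2 * 3), columns=['value'],
                  index=MultiIndex.from_product(index)).drop((1, 0))

shape = [len(lvl) for lvl in frame.index.levels]
arr = np.full(shape, np.nan)
arr[tuple(frame.index.codes)] = frame.values.flat
print(arr)
# [[ 0.  1.  2.]
#  [nan  4.  5.]]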
Original solution. Given a setup similar to above, but in 3-D,
from pandas import DataFrame, MultiIndex
from itertools import product
index = range(2), range(2), range(2)
value = range(2 * 2 * 2)
frame = DataFrame(value, columns=['value'],
                  index=MultiIndex.from_product(index)).drop((1, 0, 1))
print(frame)
we have
       value
0 0 0      0
    1      1
  1 0      2
    1      3
1 0 0      4
  1 0      6
    1      7
Now, we proceed using the reshape() route, but with some preprocessing to ensure that the length along each dimension will be consistent.
First, reindex the data frame with the full Cartesian product of all dimensions. NaN values will be inserted as needed. This operation can be slow and can consume a lot of memory, depending on the number of dimensions and on the size of the data frame.
levels = map(tuple, frame.index.levels)
index = list(product(*levels))
frame = frame.reindex(index)
print(frame)
which outputs
        value
0 0 0       0
    1       1
  1 0       2
    1       3
1 0 0       4
    1     NaN
  1 0       6
    1       7
Now, reshape() will work as intended.
shape = [len(lvl) for lvl in frame.index.levels]   # a list, so reshape() gets a concrete sequence
print(frame.values.reshape(shape))
which outputs
[[[  0.   1.]
  [  2.   3.]]

 [[  4.  nan]
  [  6.   7.]]]
The (rather ugly) one-liner is
frame.reindex(list(product(*map(tuple, frame.index.levels)))).values\
     .reshape([len(lvl) for lvl in frame.index.levels])
This can be done quite nicely using the Python xarray package which can be found here: http://xarray.pydata.org/en/stable/. It has great integration with Pandas and is quite intuitive once you get to grips with it.
If you have a multiindex series you can call the built-in method multiindex_series.to_xarray() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_xarray.html). This will generate a DataArray object, which is essentially a name-indexed numpy array, using the index values and names as coordinates. Following this you can call .values on the DataArray object to get the underlying numpy array.
If you need your tensor to conform to a set of keys in a specific order, you can also call .reindex(index_name = index_values_in_order) (http://xarray.pydata.org/en/stable/generated/xarray.DataArray.reindex.html) on the DataArray. This can be extremely useful and makes working with the newly generated tensor much easier!
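A minimal sketch of that route (assuming xarray is installed; the data here is a small placeholder):
import pandas as pd

index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
series = pd.Series(range(8), index=index).drop([(1, 0, 1)])

da = series.to_xarray()   # xarray.DataArray indexed by the three levels
tensor = da.values        # 2x2x2 ndarray with NaN where rows were missing
print(tensor.shape)       # (2, 2, 2)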
