Numpy Group Reshaping / Indexing - python

The situation is I'd like to take the following Python / NumPy code:
import numpy as np

# Procure some data:
step = 2
z = np.zeros((32, 32))
chunks = []
for i in range(0, 32, step):
    for j in range(0, 32, step):
        chunks.append(z[i:i+step, j:j+step])
chunks = np.array(chunks)
chunks.shape  # (256, 2, 2)
And vectorize it / remove the for loops. Is this possible? I don't mind much about ordering of the final array, e.g. 256,2,2 vs 2,2,256, as long as the spatial structure remains the same. That is, blocks of 2x2 from the original array.
Perhaps some magic using :: in addition to regular indexing can do this? Any NumPy masters here?

You may need transpose:
a = np.arange(1024).reshape(32, 32)
# Split each axis into (block, within-block), pair up the two block
# axes, then merge them into a single leading axis:
a.reshape(16, 2, 16, 2).transpose((0, 2, 1, 3)).reshape(-1, 2, 2)
Output:
array([[[   0,    1],
        [  32,   33]],

       [[   2,    3],
        [  34,   35]],

       [[   4,    5],
        [  36,   37]],

       ...,

       [[ 986,  987],
        [1018, 1019]],

       [[ 988,  989],
        [1020, 1021]],

       [[ 990,  991],
        [1022, 1023]]])

Related

Numpy reshape the matrix

Can anyone tell me how to use NumPy to reshape the matrix
[1,2,3,4]
[5,6,7,8]
[9,10,11,12]
[13,14,15,16]
to
[16,15,14,13]
[12,11,10,9]
[8,7,6,5]
[4,3,2,1]
Thanks:)
python 3.8
numpy 1.21.5
an example of my matrices:
[[ 1.92982258e+00  1.96782439e+00  2.00233048e-01  3.95128552e-01
   4.21665915e-01 -1.10885581e-01  3.15967524e-01  1.86851601e-01]
 [ 5.82581567e-01  3.85242821e-01  6.52345512e-01  6.96774921e-01
   4.46925274e-01  1.10208991e-01 -1.78544580e-02  2.63118328e-01]
 [ 1.18591189e-01 -8.87084649e-02  3.35701398e-01  3.81145692e-01
   2.11622312e-02  3.10028567e-01  2.04480529e-01  4.45985566e-01]
 [ 5.59840625e-01  2.01962111e-01  5.34994738e-01  2.48421290e-01
   2.42632687e-01  2.13238611e-01  3.96632085e-01  4.94549692e-01]
 [-7.69809051e-02 -3.00706661e-04  1.44790257e-01  3.49158021e-01
   1.10096226e-01  2.03164938e-01 -3.45361600e-01 -3.33408510e-02]
 [ 2.33273192e-01  4.39144490e-01 -6.11938054e-02 -6.93128853e-02
  -9.55871298e-02 -1.97338746e-02 -6.54788754e-02  2.81574199e-01]
 [ 6.61742595e-01  4.04149752e-01  2.33536310e-01  8.86332882e-02
  -2.21808751e-01 -5.48789656e-03  5.49503834e-01 -1.22011728e-01]
 [-9.58502481e-03  2.36994437e-01 -1.28777627e-01  3.99751917e-01
  -1.92452263e-02 -2.58119080e-01  3.40399940e-01 -2.20455571e-01]]
You can rotate the matrix with numpy.rot90(). To get two rotations, as in your example, pass in k=2:
import numpy as np
a = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
])
np.rot90(a, k=2)
returning:
array([[16, 15, 14, 13],
       [12, 11, 10,  9],
       [ 8,  7,  6,  5],
       [ 4,  3,  2,  1]])
Note that the docs say it returns a view of the original. This means the rotated matrix shares data with the original; mutating one affects the other.
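A quick check of the view behaviour (my own illustration, not from the original answer):
b = np.rot90(a, k=2)
b[0, 0] = 99   # write through the view
a[3, 3]        # 99 -- the original array sees the change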

How to multiply two lists to matrices to a tensor?

I have the two lists of arrays
splocations = [array([1,2,3]),array([4,5,6]),array([7,8,9])]
eviddisp = [array([10,11,12]), array([13,14,15])]
which I would like to multiply pairwise, so that each element of the first list (an array) is multiplied elementwise with each element of the second list. This would give a 3x2 matrix where each element is a vector. So the matrix element [0,0] would be
array([10, 22, 36]) = array([1,2,3]) * array([10,11,12])
So this matrix would be in fact a tensor of shape 3x2x3. How can I get this tensor/matrix?
I gather that I need to use array(splocations) and array(eviddisp) somehow. I was looking for a solution using numpy's tensordot, but I can't get it right. How do I proceed?
I think this is what you want, taking automatic broadcasting into account:
from numpy import array
splocations = [array([1,2,3]),array([4,5,6]),array([7,8,9])]
eviddisp = [array([10,11,12]), array([13,14,15])]
splocations = array(splocations)
eviddisp = array(eviddisp)
result = splocations[:, None, :]*eviddisp
result
array([[[ 10,  22,  36],
        [ 13,  28,  45]],

       [[ 40,  55,  72],
        [ 52,  70,  90]],

       [[ 70,  88, 108],
        [ 91, 112, 135]]])
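Since the question mentions tensordot: tensordot contracts (sums over) axes, so it cannot produce this elementwise result directly. np.einsum can spell out the same broadcasted product explicitly; a small sketch:
from numpy import einsum
# result[i, j, k] = splocations[i, k] * eviddisp[j, k]; keeping all three
# indices in the output subscript means nothing is summed.
result = einsum('ik,jk->ijk', splocations, eviddisp)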

In python, read binary file as floats at once [duplicate]

Okay so I have a datafile from an EEG scan (a binary file, data.eeg), and in matlab the code to read the file and plot a section of the data goes like this:
sr=400; % Sample Rate
Nyq_freq=sr/2; % Nyquist Frequency
fneeg=input('Filename (with path and extension) :', 's');
t=input('How many seconds in total of EEG ? : ');
ch=input('How many channels of EEG ? : ');
le=t*sr; % Length of the Recording
fid=fopen(fneeg, 'r', 'l'); % Open the file to read
EEG=fread(fid,[ch,le],'int16'); % Read Data -> EEG Matrix
fclose ('all');
plot(EEG(:,3))
Here is my attempt to "translate"
from numpy import *
from matplotlib.pylab import *
sample_rate = 400
Nyquist = sample_rate/2.
fneeg = raw_input("Filename (full path & extension): ")
t = int(raw_input("How many secs in total of EEG?: "))
ch = int(raw_input("How many channels of EEG?: "))
le = t*sample_rate
fid = open(fneeg, 'r')
EEG = fromfile(fneeg, int16)
Here's where things get confusing to me. According to the documentation, matlab's fread reads binary files via fread(loaded_file, size, data_type). The alternative in python is numpy's fromfile, followed by reshaping (according to this thread here: MATLAB to Python fread) with the built-in reshape function. I'm not sure how this works, or how it even relates to the matlab method. I'm sorry if my question is confusing; matlab is still very new to me.
Edit: If you wanna look have a look at the file here it is: https://www.dropbox.com/s/zzm6uvjfm9gpamk/data.eeg
Edit2: The answers to the raw inputs are t=10, ch=32. In fact I'm not sure why I'm even asking for user input now that I think about it..
As discussed in the comments by yourself and @JoeKington, this should work (I removed the input stuff for testing):
import numpy as np
sample_rate = 400
Nyquist = sample_rate/2.0
fneeg = 'data.eeg'
t = 10
ch = 32
le = t*sample_rate
# order='F' fills column-major, matching MATLAB's fread(fid, [ch, le], 'int16')
EEG = np.fromfile(fneeg, 'int16').reshape(ch, le, order='F')
Without the reshape, you get:
In [45]: EEG
Out[45]: array([ -39, -25, -22, ..., -168, -586, -46], dtype=int16)
In [46]: EEG.shape
Out[46]: (128000,)
With reshaping:
In [47]: EEG.reshape(ch, le, order='F')
Out[47]:
array([[ -39,  -37,  -12, ...,    5,   19,   21],
       [ -25,  -20,    7, ...,   20,   36,   36],
       [ -22,  -20,    0, ...,   18,   34,   36],
       ...,
       [ 104,  164,   44, ...,   60,  -67, -168],
       [ 531,  582,   88, ...,   29, -420, -586],
       [ -60,  -63,  -92, ...,  -17,  -44,  -46]], dtype=int16)
In [48]: EEG.reshape(ch, le, order='F').shape
Out[48]: (32, 4000)
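As a side note (my own addition, not part of the original answer): MATLAB's fread fills the [ch, le] matrix column by column, which is exactly what order='F' reproduces. An equivalent formulation without order='F' is to fill row-major with the axes swapped, then transpose:
# Same result: fill (le, ch) row-major, then transpose to (ch, le).
EEG_alt = np.fromfile(fneeg, 'int16').reshape(le, ch).T
assert (EEG_alt == EEG).all()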

Python: Why are eigenvectors not the same as first PCA weights?

Let's generate an array:
import numpy as np
data = np.arange(30).reshape(10,3)
data=data*data
array([[  0,   1,   4],
       [  9,  16,  25],
       [ 36,  49,  64],
       [ 81, 100, 121],
       [144, 169, 196],
       [225, 256, 289],
       [324, 361, 400],
       [441, 484, 529],
       [576, 625, 676],
       [729, 784, 841]])
Then find the eigenvalues of the covariance matrix:
import numpy.linalg as la
mn = np.mean(data, axis=0)
data = data - mn  # use a float copy; in-place -= would truncate or error on the int array
C = np.cov(data.T)
evals, evecs = la.eig(C)
idx = np.argsort(evals)[::-1]
evecs = evecs[:,idx]
print evecs
array([[-0.53926461, -0.73656433,  0.40824829],
       [-0.5765472 , -0.03044111, -0.81649658],
       [-0.61382979,  0.67568211,  0.40824829]])
Now let's run the matplotlib.mlab.PCA function on the data:
import matplotlib.mlab as mlab
mpca=mlab.PCA(data)
print mpca.Wt
[[ 0.57731894  0.57740574  0.57732612]
 [ 0.72184459 -0.03044628 -0.69138514]
 [ 0.38163232 -0.81588947  0.43437443]]
Why are the two matrices different? I thought that in finding the PCA, first one had to find the eigenvectors of the covariance matrix, and that this would be exactly equal to the weights.
You need to normalize your data, not just center it, and the output of np.linalg.eig has to be transposed to match that of mlab.PCA:
>>> n_data = (data - data.mean(axis=0)) / data.std(axis=0)
>>> evals, evecs = np.linalg.eig(np.cov(n_data.T))
>>> evecs = evecs[:, np.argsort(evals)[::-1]].T
>>> mlab.PCA(data).Wt
array([[ 0.57731905,  0.57740556,  0.5773262 ],
       [ 0.72182079, -0.03039546, -0.69141222],
       [ 0.38167716, -0.8158915 ,  0.43433121]])
>>> evecs
array([[-0.57731905, -0.57740556, -0.5773262 ],
       [-0.72182079,  0.03039546,  0.69141222],
       [ 0.38167716, -0.8158915 ,  0.43433121]])
The remaining sign differences are expected: eigenvectors are only determined up to sign, so a row may come out multiplied by -1.
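For what it's worth (my addition, not from the original answer): matplotlib.mlab.PCA was later removed from Matplotlib, so a rough modern equivalent, assuming scikit-learn is available, is to run its PCA on the standardized data:
>>> from sklearn.decomposition import PCA
>>> PCA(n_components=3).fit(n_data).components_  # rows match evecs up to sign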

NumPy: Execute function over each ndarray element

I have a three dimensional ndarray of 2D coordinates, for example:
[[[1704 1240]
  [1745 1244]
  [1972 1290]
  [2129 1395]
  [1989 1332]]

 [[1712 1246]
  [1750 1246]
  [1964 1286]
  [2138 1399]
  [1989 1333]]

 [[1721 1249]
  [1756 1249]
  [1955 1283]
  [2145 1399]
  [1990 1333]]]
The ultimate goal is to remove the point closest to a given point ([1989 1332]) from each "group" of 5 coordinates. My thought was to produce a similarly shaped array of distances, and then using argmin to determine the indices of the values to be removed. However, I am not certain how to go about applying a function, like one to calculate a distance to a given point, to every element in an ndarray, at least in a NumPythonic way.
List comprehensions are a very inefficient way to deal with numpy arrays. They're an especially poor choice for the distance calculation.
To find the difference between your data and a point, you'd just do data - point. You can then calculate the distance using np.hypot, or if you'd prefer, square it, sum it, and take the square root.
It's a bit easier if you make it an Nx2 array for the purposes of the calculation though.
Basically, you want something like this:
import numpy as np
data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],
                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],
                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
print dist
This yields:
array([[[ 299.48121811],
        [ 259.38388539],
        [  45.31004304],
        [ 153.5219854 ],
        [   0.        ]],

       [[ 290.04310025],
        [ 254.0019685 ],
        [  52.35456045],
        [ 163.37074401],
        [   1.        ]],

       [[ 280.55837182],
        [ 247.34186868],
        [  59.6405902 ],
        [ 169.77926846],
        [   1.41421356]]])
Now, removing the closest element is a bit harder than simply getting the closest element.
With numpy, you can use boolean indexing to do this fairly easily.
However, you'll need to worry a bit about the alignment of your axes.
The key is to understand that numpy "broadcasts" operations along the last axis. In this case, we want to broadcast along the middle axis. Also, -1 can be used as a placeholder for the size of an axis; numpy will infer the permissible size from the total number of elements.
What we'd need to do would look a bit like this:
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
You could make that a single line, I'm just breaking it down for readability. The key is that dist != something yields a boolean array which you can then use to index the original array.
So, putting it all together:
import numpy as np
data = np.array([[[1704, 1240],
                  [1745, 1244],
                  [1972, 1290],
                  [2129, 1395],
                  [1989, 1332]],
                 [[1712, 1246],
                  [1750, 1246],
                  [1964, 1286],
                  [2138, 1399],
                  [1989, 1333]],
                 [[1721, 1249],
                  [1756, 1249],
                  [1955, 1283],
                  [2145, 1399],
                  [1990, 1333]]])
point = [1989, 1332]
#-- Calculate distance ------------
# The reshape is to make it a single, Nx2 array to make calling `hypot` easier
dist = data.reshape((-1,2)) - point
dist = np.hypot(*dist.T)
# We can then reshape it back to AxBx1 array, similar to the original shape
dist = dist.reshape(data.shape[0], data.shape[1], 1)
#-- Remove closest point ---------------------
mask = np.squeeze(dist) != dist.min(axis=1)
filtered = data[mask]
# Once again, let's reshape things back to the original shape...
filtered = filtered.reshape(data.shape[0], -1, data.shape[2])
print filtered
Yields:
array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])
On a side note, if more than one point is equally close, this won't work. Numpy arrays have to have the same number of elements along each dimension, so you'll need to re-do your grouping in that case.
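If ties are a concern, one workaround (my own sketch, not from the original answer) is to drop exactly one index per group with argmin, which removes a single point even when distances repeat:
# Drop exactly one point per group: the first occurrence of the minimum.
idx = dist.squeeze(-1).argmin(axis=1)             # closest point in each group
mask = np.ones(dist.shape[:2], dtype=bool)
mask[np.arange(data.shape[0]), idx] = False
filtered = data[mask].reshape(data.shape[0], -1, data.shape[2])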
If I understand your question correctly, I think you're looking for apply_along_axis. Using numpy's built-in broadcasting, we can simply subtract the point from the array:
>>> a - numpy.array([1989, 1332])
array([[[-285,  -92],
        [-244,  -88],
        [ -17,  -42],
        [ 140,   63],
        [   0,    0]],

       [[-277,  -86],
        [-239,  -86],
        [ -25,  -46],
        [ 149,   67],
        [   0,    1]],

       [[-268,  -83],
        [-233,  -83],
        [ -34,  -49],
        [ 156,   67],
        [   1,    1]]])
Then we can apply numpy.linalg.norm to it:
>>> dist = a - numpy.array([1989, 1332])
>>> normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
>>> normed
array([[ 299.48121811,  259.38388539,   45.31004304,  153.5219854 ,    0.        ],
       [ 290.04310025,  254.0019685 ,   52.35456045,  163.37074401,    1.        ],
       [ 280.55837182,  247.34186868,   59.6405902 ,  169.77926846,    1.41421356]])
Finally, some boolean mask trickery, along with a couple of reshape calls:
>>> a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
array([[[1704, 1240],
        [1745, 1244],
        [1972, 1290],
        [2129, 1395]],

       [[1712, 1246],
        [1750, 1246],
        [1964, 1286],
        [2138, 1399]],

       [[1721, 1249],
        [1756, 1249],
        [1955, 1283],
        [2145, 1399]]])
Joe Kington's answer is faster though. Oh well. I'll leave this for posterity.
def joes(data, point):
    dist = data.reshape((-1, 2)) - point
    dist = np.hypot(*dist.T)
    dist = dist.reshape(data.shape[0], data.shape[1], 1)
    mask = np.squeeze(dist) != dist.min(axis=1)
    return data[mask].reshape((3, 4, 2))

def mine(a, point):
    dist = a - point
    normed = numpy.apply_along_axis(numpy.linalg.norm, 2, dist)
    return a[normed != normed.min(axis=1).reshape((-1, 1))].reshape((3, 4, 2))
>>> %timeit mine(data, point)
1000 loops, best of 3: 586 us per loop
>>> %timeit joes(data, point)
10000 loops, best of 3: 48.9 us per loop
There are multiple ways to do this, but here is one using list comprehensions:
Distance function:
In [35]: from numpy.linalg import norm
In [36]: dist = lambda x,y:norm(x-y)
Input data:
In [38]: import scipy
In [39]: GivenMatrix = scipy.rand(3, 5, 2)
In [40]: GivenMatrix
Out[40]:
array([[[ 0.83798666,  0.90294439],
        [ 0.8706959 ,  0.88397176],
        [ 0.91879085,  0.93512921],
        [ 0.15989245,  0.57311869],
        [ 0.82896003,  0.53589968]],

       [[ 0.0207089 ,  0.9521768 ],
        [ 0.94523963,  0.31079109],
        [ 0.41929482,  0.88559614],
        [ 0.87885236,  0.45227422],
        [ 0.58365369,  0.62095507]],

       [[ 0.14757177,  0.86101539],
        [ 0.58081214,  0.12632764],
        [ 0.89958321,  0.73660852],
        [ 0.3408943 ,  0.45420989],
        [ 0.42656333,  0.42770216]]])
In [41]: q = scipy.rand(2)
In [42]: q
Out[42]: array([ 0.03280889, 0.71057403])
Compute output distances:
In [44]: distances = [[dist(x, q) for x in SubMatrix]
                      for SubMatrix in GivenMatrix]
In [45]: distances
Out[45]:
[[0.82783910695733931,
0.85564093542511577,
0.91399620574915652,
0.18720096539588818,
0.81508758596405939],
[0.24190557184498068,
0.99617079746515047,
0.42426891258164884,
0.88459501973012633,
0.55808740166908177],
[0.18921712490174292,
0.80103146210692744,
0.86716521557255788,
0.40079819635686459,
0.48482888965287363]]
To rank the results for each submatrix:
In [46]: scipy.argsort(distances)
Out[46]:
array([[3, 4, 0, 1, 2],
       [0, 2, 4, 3, 1],
       [0, 3, 4, 1, 2]])
As for the deletion, I personally think that's easiest by converting GivenMatrix to a list, then using del:
>>> GivenList = GivenMatrix.tolist()
>>> del GivenList[1][2] # delete third row from the second 5-by-2 submatrix
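A NumPy alternative for a single deletion (my addition, not from the original answer): numpy.delete returns a copy with the given row removed, though deleting a different row from each submatrix would make the result ragged, which is why the list route above is convenient:
>>> import numpy
>>> numpy.delete(GivenMatrix[1], 2, axis=0)  # second submatrix without its third row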
