In Python, read binary file as floats at once

Okay so I have a datafile from an EEG scan (a binary file, data.eeg), and in matlab the code to read the file and plot a section of the data goes like this:
sr=400; % Sample Rate
Nyq_freq=sr/2; % Nyquist Frequency
fneeg=input('Filename (with path and extension) :', 's');
t=input('How many seconds in total of EEG ? : ');
ch=input('How many channels of EEG ? : ');
le=t*sr; % Length of the Recording
fid=fopen(fneeg, 'r', 'l'); % Open the file to read
EEG=fread(fid,[ch,le],'int16'); % Read Data -> EEG Matrix
fclose ('all');
plot(EEG(:,3))
Here is my attempt to "translate"
from numpy import *
from matplotlib.pylab import *
sample_rate = 400
Nyquist = sample_rate/2.
fneeg = raw_input("Filename (full path & extension): ")
t = int(raw_input("How many secs in total of EEG?: "))
ch = int(raw_input("How many channels of EEG?: "))
le = t*sample_rate
fid = open(fneeg, 'r')
EEG = fromfile(fneeg, int16)
Here's where things get confusing to me. According to the documentation, MATLAB's fread reads binary files via fread(loaded_file, size, data_type). The alternative in Python is to use numpy's fromfile and then reshape the result with the built-in reshape function (according to this thread: MATLAB to Python fread). I'm not sure how this works, or how it relates to the MATLAB call. I'm sorry if my question is confusing; MATLAB is still very new to me.
Edit: If you want to have a look at the file, here it is: https://www.dropbox.com/s/zzm6uvjfm9gpamk/data.eeg
Edit2: The answers to the raw inputs are t=10, ch=32. In fact, I'm not sure why I'm even asking for user input now that I think about it.

As discussed in the comments by yourself and @JoeKington, this should work (I removed the input stuff for testing):
import numpy as np
sample_rate = 400
Nyquist = sample_rate/2.0
fneeg = 'data.eeg'
t = 10
ch = 32
le = t*sample_rate
EEG = np.fromfile(fneeg, 'int16').reshape(ch, le, order='F')
Without the reshape, you get:
In [45]: EEG
Out[45]: array([ -39, -25, -22, ..., -168, -586, -46], dtype=int16)
In [46]: EEG.shape
Out[46]: (128000,)
With reshaping:
In [47]: EEG.reshape(ch, le, order='F')
Out[47]:
array([[ -39, -37, -12, ..., 5, 19, 21],
[ -25, -20, 7, ..., 20, 36, 36],
[ -22, -20, 0, ..., 18, 34, 36],
...,
[ 104, 164, 44, ..., 60, -67, -168],
[ 531, 582, 88, ..., 29, -420, -586],
[ -60, -63, -92, ..., -17, -44, -46]], dtype=int16)
In [48]: EEG.reshape(ch, le, order='F').shape
Out[48]: (32, 4000)
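To see why order='F' reproduces MATLAB's fread(fid,[ch,le],'int16'): MATLAB fills the matrix column by column, so consecutive values in the file are the channels of a single time sample. Below is a minimal sketch using a small synthetic file instead of data.eeg. The file name, sizes and values are made up for illustration, and it assumes little-endian int16 data, as the 'l' in the fopen call suggests.

```python
import os
import tempfile

import numpy as np

ch, le = 4, 6  # hypothetical sizes, much smaller than the real recording

# Synthetic "recording": sample k of channel c has the value 100*c + k.
expected = (100 * np.arange(ch)[:, None] + np.arange(le)[None, :]).astype('<i2')

# Write it the way the file is laid out: all channels of sample 0,
# then all channels of sample 1, and so on.
path = os.path.join(tempfile.mkdtemp(), 'synthetic.eeg')
expected.T.tofile(path)

# '<i2' means little-endian 16-bit integers.
raw = np.fromfile(path, dtype='<i2')

# Column-major fill, like MATLAB's fread(fid, [ch, le], 'int16').
eeg = raw.reshape(ch, le, order='F')

# The same result via a C-order reshape followed by a transpose.
eeg_alt = raw.reshape(le, ch).T
```

With this layout, eeg[:, 2] corresponds to MATLAB's EEG(:,3) (the third time sample across all channels), and eeg[c] is the time series of channel c.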

Related

Why python np.random.choice does not match with matlab randsample?

I'm converting a MATLAB program to Python.
I'm trying to sample randomly from an array using np.random.choice, but the result doesn't match MATLAB's randsample.
For example,
I did this in Python:
np.random.seed(100)
a = np.arange(10, 110, 10)
np.random.choice(a, 2, True)
>> Output: array([90, 90])
And the following is Matlab,
rng(100)
a = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
randsample(a, 2, true)
>> Output: 60 30
Values in both arrays are different.
Am I doing something wrong?
Any help will be appreciated,
Thanks!

Numpy Group Reshaping / Indexing

The situation is I'd like to take the following Python / NumPy code:
import numpy as np
# Procure some data:
step = 2  # block size; implied by the (256, 2, 2) shape below
z = np.zeros((32,32))
chunks = []
for i in range(0,32,step):
    for j in range(0,32,step):
        chunks.append( z[i:i+step,j:j+step] )
chunks = np.array(chunks)
chunks.shape # (256, 2, 2)
And vectorize it / remove the for loops. Is this possible? I don't mind much about ordering of the final array, e.g. 256,2,2 vs 2,2,256, as long as the spatial structure remains the same. That is, blocks of 2x2 from the original array.
Perhaps some magic using :: in addition to regular indexing can do this? Any NumPy masters here?
You may need transpose:
a = np.arange(1024).reshape(32,32)
a.reshape(16,2,16,2).transpose((0,2,1,3)).reshape(-1,2,2)
Output:
array([[[ 0, 1],
[ 32, 33]],
[[ 2, 3],
[ 34, 35]],
[[ 4, 5],
[ 36, 37]],
...,
[[ 986, 987],
[1018, 1019]],
[[ 988, 989],
[1020, 1021]],
[[ 990, 991],
[1022, 1023]]])
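The reshape/transpose trick above can be wrapped up for any block size and checked against the original double loop. This is only a sketch: split_blocks is a hypothetical helper name, and it assumes both array dimensions are divisible by step.

```python
import numpy as np

def split_blocks(arr, step):
    """Split a 2-D array into non-overlapping step x step blocks (hypothetical helper)."""
    rows, cols = arr.shape
    if rows % step or cols % step:
        raise ValueError("array dimensions must be divisible by step")
    # Axes after the first reshape: (block row, row in block, block col, col in block).
    # The transpose groups the two block axes first, then the two in-block axes.
    return (arr.reshape(rows // step, step, cols // step, step)
               .transpose(0, 2, 1, 3)
               .reshape(-1, step, step))

a = np.arange(1024).reshape(32, 32)
step = 2
blocks = split_blocks(a, step)

# Reference result from the original double loop.
loop_blocks = np.array([a[i:i + step, j:j + step]
                        for i in range(0, 32, step)
                        for j in range(0, 32, step)])
```

Both versions return the blocks in the same order (row by row over the grid of blocks), so the loop can be swapped out without changing downstream code.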

How to multiply two lists to matrices to a tensor?

I have the two lists of arrays
splocations = [array([1,2,3]),array([4,5,6]),array([7,8,9])]
eviddisp = [array([10,11,12]), array([13,14,15])]
which I would like to multiply with each other such that I multiply each list element (which is an array) with each other list element. Here I would get a 3x2 matrix where each element is a vector. So the matrix element [0,0] would be
array([10, 22, 36]) = array([1,2,3]) * array([10,11,12])
So this matrix would be in fact a tensor of shape 3x2x3. How can I get this tensor/matrix?
I get that I need to use array(splocations) and array(eviddisp) somehow. I think I am looking for a solution with numpy's tensordot, but I can't get it right. How do I proceed?
I think this is what you want, taking automatic broadcasting into account:
from numpy import array
splocations = [array([1,2,3]),array([4,5,6]),array([7,8,9])]
eviddisp = [array([10,11,12]), array([13,14,15])]
splocations = array(splocations)
eviddisp = array(eviddisp)
result = splocations[:, None, :]*eviddisp
result
array([[[ 10, 22, 36],
[ 13, 28, 45]],
[[ 40, 55, 72],
[ 52, 70, 90]],
[[ 70, 88, 108],
[ 91, 112, 135]]])
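For reference, the shapes involved in the broadcast can be spelled out, and the same product can be written with np.einsum, which labels the indices explicitly. This is a sketch of an equivalent formulation rather than what the answer itself uses. np.tensordot contracts (sums over) the axes it is given, so it is probably not the right tool when no sum is wanted.

```python
import numpy as np

splocations = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # shape (3, 3)
eviddisp = np.array([[10, 11, 12], [13, 14, 15]])          # shape (2, 3)

# (3, 1, 3) * (2, 3) broadcasts to (3, 2, 3):
# result[i, j, k] = splocations[i, k] * eviddisp[j, k]
result = splocations[:, None, :] * eviddisp

# The same product with explicit index labels and no summation.
result_einsum = np.einsum('ik,jk->ijk', splocations, eviddisp)
```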

Find index of min value in a matrix

I've a 2-Dim array containing the residual sum of squares of a given fit (unimportant here).
RSS[i,j] = np.sum((spectrum_theo - sp_exp_int) ** 2)
I would like to find the matrix element with the minimum value AND its position (i,j) in the matrix. Finding the minimum element is OK:
RSS_min = RSS[RSS != 0].min()
but for the index, I've tried:
ij_min = np.where(RSS == RSS_min)
which gives me:
ij_min = (array([3]), array([20]))
I would like to obtain instead:
ij_min = (3,20)
If I try :
ij_min = RSS.argmin()
I obtain:
ij_min = 0,
which is a wrong result.
Is there a function, in Scipy or elsewhere, that can do this? I've searched the web, but I've only found answers dealing with 1-Dim arrays, not 2- or N-Dim.
Thanks!
The easiest fix based on what you have right now would just be to extract the elements from the array as a final step:
# ij_min = (array([3]), array([20]))
ij_min = np.where(RSS == RSS_min)
ij_min = tuple([i.item() for i in ij_min])
Does this work for you?
import numpy as np
array = np.random.rand(1000).reshape(10,10,10)
print(np.array(np.where(array == array.min())).flatten())
In the case of multiple minima, you could try something like:
import numpy as np
array = np.array([[1,1,2,3],[1,1,4,5]])
print(list(zip(*np.where(array == array.min()))))
You can combine argmin with unravel_index.
For example, here's an array RSS:
In [123]: np.random.seed(123456)
In [124]: RSS = np.random.randint(0, 99, size=(5, 8))
In [125]: RSS
Out[125]:
array([[65, 49, 56, 43, 43, 91, 32, 87],
[36, 8, 74, 10, 12, 75, 20, 47],
[50, 86, 34, 14, 70, 42, 66, 47],
[68, 94, 45, 87, 84, 84, 45, 69],
[87, 36, 75, 35, 93, 39, 16, 60]])
Use argmin (which returns an integer that is the index in the flattened array), and then pass that to unravel_index along with the shape of RSS to convert the index of the flattened array into the indices of the 2D array:
In [126]: ij_min = np.unravel_index(RSS.argmin(), RSS.shape)
In [127]: ij_min
Out[127]: (1, 1)
ij_min itself can be used as an index into RSS to get the minimum value:
In [128]: RSS_min = RSS[ij_min]
In [129]: RSS_min
Out[129]: 8
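The question takes the minimum over RSS[RSS != 0], which suggests that zero entries should be ignored. A zero at position [0,0] would also explain why RSS.argmin() returned 0, although that is a guess. A minimal sketch of combining that exclusion with argmin and unravel_index follows. The RSS values below are made up, and replacing zeros with infinity is one possible approach, not something taken from the answers above.

```python
import numpy as np

# Made-up values; zeros stand for entries that should be ignored.
RSS = np.array([[0.0, 5.0, 3.0],
                [4.0, 0.0, 1.5],
                [2.0, 6.0, 0.0]])

# Replace zeros by +inf so they can never be selected as the minimum.
masked = np.where(RSS != 0, RSS, np.inf)

# Flat index of the smallest non-zero entry, converted to (row, column).
ij_min = np.unravel_index(masked.argmin(), masked.shape)
ij_min = tuple(int(i) for i in ij_min)  # plain Python ints

RSS_min = RSS[ij_min]
```

On this example, RSS.argmin() alone returns 0 because of the zero at [0,0], while the masked version points at the smallest non-zero entry.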

iterating through float - variable input can change to float

I am calculating protein capacities (steric mass action model) within several loops (I know, filling up a numpy array can be quite slow and there are faster methods, but it works for now):
import numpy as np
a = [10,20,30] # salt concentrations tested
b = [4,5,6] # measured data points
c = 2 # number of components
q = np.empty((c,len(a),len(b)))
for ii,cs in enumerate(a):
    for iii,cp in enumerate(b):
        for i in range(c):
            q[i,ii,iii] = cs*cp
Basically, q contains the measured data points for each component at each salt concentration and has the shape (number of components, number of salt concentrations, number of measurements). The code works fine. However, if I use only one salt concentration, so that a is a single float instead of a list, the line for ii,cs in enumerate(a): does not work anymore (float object is not iterable).
I can use if statements. But is there a better way (less confusing code)?
When you use a single salt concentration, instead of writing
a = 2
write
a = [2]
This way you'll keep it as a list and your code will still work.
By the way, you can compute q using the following NumPy one-liner:
In [39]: np.tile(np.outer(a, b), (c, 1, 1))
Out[39]:
array([[[ 40, 50, 60],
[ 80, 100, 120],
[120, 150, 180]],
[[ 40, 50, 60],
[ 80, 100, 120],
[120, 150, 180]]])
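If the input may arrive either as a single number or as a list, another option is to normalise it with np.atleast_1d before looping, so no if statements are needed. A minimal sketch, where capacities is a hypothetical wrapper around the loop from the question:

```python
import numpy as np

def capacities(a, b, c):
    """Hypothetical wrapper around the loop from the question."""
    a = np.atleast_1d(a)  # scalar -> 1-element array; lists and arrays keep their length
    b = np.atleast_1d(b)
    q = np.empty((c, len(a), len(b)))
    for ii, cs in enumerate(a):
        for iii, cp in enumerate(b):
            for i in range(c):
                q[i, ii, iii] = cs * cp
    return q

q_many = capacities([10, 20, 30], [4, 5, 6], 2)  # shape (2, 3, 3)
q_one = capacities(10.0, [4, 5, 6], 2)           # shape (2, 1, 3)
```

The salt-concentration axis is kept even for a single value, so code that indexes q[i, ii, iii] continues to work.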
