Handling large binary files in Python [closed]

I have a binary file (>1GB in size) containing single-precision data, created in MATLAB.
I am new to Python and would like to read the same file structure in Python.
Any help would be much appreciated.
From MATLAB, I can load the file as follows:
fid = fopen('file.dat','r');
my_data = fread(fid,[117276,1794],'single');
Many thanks
InP

Using numpy is easiest, with fromfile (https://docs.scipy.org/doc/numpy/reference/generated/numpy.fromfile.html):
np.fromfile('file.dat', dtype=np.dtype('single')).reshape((117276, 1794))
where np.dtype('single') is the same as np.dtype('float32').
Note that the result may be transposed from what you expect, since MATLAB stores data in column-major order while numpy reshapes in row-major order by default.
Also, I'm assuming that using numpy is OK: since you're coming from MATLAB, you'll probably end up using it anyway if you want to keep MATLAB-like functions rather than deal with pure-Python approaches like the ones in Reading binary file and looping over each byte.
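For completeness, here is a minimal sketch of reading the file so that it matches the MATLAB layout. The shape (117276, 1794) comes from the question's fread call; the order='F' reshape is the part that accounts for MATLAB's column-major storage:

import numpy as np

# Read the raw single-precision floats written from MATLAB.
raw = np.fromfile('file.dat', dtype=np.float32)

# MATLAB stores arrays column-major, so reshape in Fortran order to get the
# same [117276 x 1794] matrix that fread(fid, [117276, 1794], 'single') returns.
my_data = raw.reshape((117276, 1794), order='F')

print(my_data.shape)  # (117276, 1794)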

Related

Why does numpy use row-based data as opposed to column-based data? [closed]

My problem is conceptual rather than practical:
What are the reasons numpy uses row-major data instead of column-major data?
I know that row-major data can be accessed faster by the CPU when iterating across rows, which increases performance. Column-major data, on the other hand, would be more "mathematically correct".
The performance would be reason enough to justify the convention, but I wanted to know whether there are any other reasons it is used. (I am aware that numpy is not the only library to use this convention; it is the general one, so I suppose another reason is consistency with other libraries.)
Note that I have already asked this question on the numpy GitHub, but I wanted to see if I can reach different people with different knowledge here.
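(Not from the original thread, but a small illustration of what the row-major convention means in practice: the same 2-D array can be laid out in C order, which is row-major, or Fortran order, which is column-major, and the strides show which axis is contiguous in memory.)

import numpy as np

a_c = np.arange(6, dtype=np.float64).reshape(2, 3)  # default C (row-major) layout
a_f = np.asfortranarray(a_c)                        # column-major copy of the same values

# In C order, moving along a row steps 8 bytes; moving down a column steps 24.
print(a_c.strides)  # (24, 8)
# In Fortran order it is the columns that are contiguous.
print(a_f.strides)  # (8, 16)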

Using python to convert wav file to csv file before feed the data into FFT for audio spectrum analyzer [closed]

I am working on a simple audio spectrum analyzer using an FPGA. For the preprocessing part, my idea is to use Python to convert a wav file to a csv file, and then feed the data to a fast Fourier transform module. Is it possible to get this to work?
There are plenty of open source modules available that do this:
A GitHub repository for the same.
Just open GitHub, search for "wav to csv", and you'll find quite a lot of them.
Or even google a bit and you'll find a lot of answers on the same topic.
One small query though: you basically want to convert the .wav file into time-series data, right?
In that case, I'd highly recommend going through:
KDnuggets' post about the same.
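As a rough sketch of the preprocessing idea (not from the linked posts), assuming scipy and numpy are available; the file names audio.wav and samples.csv are placeholders:

import numpy as np
from scipy.io import wavfile

# Read the wav file: rate is the sample rate in Hz, data holds the raw samples.
rate, data = wavfile.read('audio.wav')

# If the file is stereo, keep only the first channel for a simple analyzer.
if data.ndim > 1:
    data = data[:, 0]

# Dump the time-series samples to CSV (assuming integer PCM, e.g. a 16-bit wav).
np.savetxt('samples.csv', data, delimiter=',', fmt='%d')

# Quick sanity check in Python before moving to the FPGA: magnitude spectrum via a real FFT.
spectrum = np.abs(np.fft.rfft(data))
freqs = np.fft.rfftfreq(len(data), d=1.0 / rate)
print(freqs[np.argmax(spectrum)], "Hz is the dominant frequency")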

Assign multiple dataset as one variable [closed]

I am extracting multiple datasets into one csv file.
data = Dataset(r'C:/path/2011.daily_rain.nc', 'r')
I successfully assigned one dataset, but I still have ten more to work with in the same way. Are there any methods or functions that allow me to assign or combine multiple datasets as one variable?
From what you've described, it sounds like you want to perform the same task on each dataset. If that is the case, consider storing your dataset paths in a list, then using a for ... in loop to iterate through each path.
Consider the following sample code:
from netCDF4 import Dataset  # assuming the netCDF4 package, as suggested by the .nc files in the question

dataset_paths = [
    "C:/path/some_data_file-0.nc",
    "C:/path/some_data_file-1.nc",
    "C:/path/some_data_file-2.nc",
    "C:/path/some_data_file-3.nc",
    # ... and the rest of your dataset file paths
]

for path in dataset_paths:
    data = Dataset(path, 'r')
    # Code that uses the data here
Everything in the for ... in block will be run once for each path defined in the dataset_paths list, so you can work with each dataset in the same way.
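If you also want to keep all of the opened datasets in a single variable, which is how I read the question, here is a small sketch along the same lines (again assuming netCDF4's Dataset and placeholder file paths):

from netCDF4 import Dataset

dataset_paths = [
    "C:/path/some_data_file-0.nc",
    "C:/path/some_data_file-1.nc",
    # ... and the rest of your dataset file paths
]

# One variable holding every dataset: a list of open Dataset objects.
all_data = [Dataset(path, 'r') for path in dataset_paths]

# Work with each dataset exactly as you did with the single one.
for data in all_data:
    print(data.variables.keys())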

What is the best format to store the EMG signals [closed]

I am working with the Myo armband, and I am getting live EMG with Python and MATLAB. Now I want to store it so that I can use it to classify the signal when I get the same EMG again.
Raw EMG returns values from -128 to 127 (8-bit values).
My question is: what is a good format to save the EMG signal in, and why?
If you save them in .dat, you can use a simple load in MATLAB or Python.
Physionet uses this file format:
https://physionet.org/physiobank/database/emgdb/
If you have a large amount of data, I would suggest HDF5, because it is designed for exactly that, can be loaded by a variety of software, and can store metadata alongside the signal (https://portal.hdfgroup.org).
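A minimal sketch of the HDF5 suggestion using h5py; the file name, dataset name, attribute names, and the fake samples below are placeholders, not from the question:

import numpy as np
import h5py

# Stand-in EMG: 8 channels of int8 samples in the -128..127 range, as described in the question.
emg = np.random.randint(-128, 128, size=(8, 2000), dtype=np.int8)

with h5py.File('emg_session.h5', 'w') as f:
    dset = f.create_dataset('emg', data=emg, compression='gzip')
    dset.attrs['sample_rate_hz'] = 200   # Myo's nominal EMG rate; adjust to your setup
    dset.attrs['device'] = 'Myo Armband'

# Reading it back later, e.g. before classification:
with h5py.File('emg_session.h5', 'r') as f:
    emg_loaded = f['emg'][:]
    print(emg_loaded.shape, f['emg'].attrs['sample_rate_hz'])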

Python, efficiently and easily substitute parts on very long string [closed]

I have a big file in some format. I want to substitute some items in the string with others; the string is long (like a big XML, but with no formatting).
I know where they are; I could locate each of them with a regular expression, but I wonder which method is best: the easiest, and better still if it is also the most efficient.
str.format / % already search the string for parameter placeholders internally. Since they're implemented in C, you're not going to beat their performance with Python code, even if your search-and-replace workload is somewhat simpler. See Faster alternatives to numpy.argmax/argmin which is slow for a glance at C-versus-Python relative performance.
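A small sketch of the placeholder idea this answer hints at, plus a single-pass regex alternative for fixed literal substrings; the template, placeholder names, and values are made up for illustration:

import re

# Placeholder approach: if you control the template, one format() call
# replaces every placeholder in a single C-level pass.
template = "<root><user>{username}</user><id>{user_id}</id></root>"
print(template.format(username="alice", user_id=42))

# Literal-substring approach: one compiled regex pass with a lookup table.
big_string = "<cfg><host>OLD_HOST</host><port>OLD_PORT</port></cfg>"
replacements = {"OLD_HOST": "new-host.example.com", "OLD_PORT": "8443"}
pattern = re.compile("|".join(map(re.escape, replacements)))
print(pattern.sub(lambda m: replacements[m.group(0)], big_string))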
