Converting a string to a wav file in Python

I'm new to Python, and to programming in general, so please don't be too hard on me.
I am currently trying to figure out how to write a new wav file using a string (which was derived from another wav file's data).
I performed a Fourier transform on that file's data, so now I'm trying to get the values from the Fourier transform written into a new wav file.
I can only use NumPy and the standard Python library, not SciPy.
According to the documentation, I have to use wave_write(), but I have no idea what the code is supposed to look like for this function.
I think I'm supposed to do something pertaining to
wave_write.writeframesraw(data)
Then again, I'm not totally sure what to do.
Any help is greatly appreciated!

Two functions in NumPy can help you with this: astype and tostring.
If you have an array of sound samples, say X, then you can convert it to the right format using astype. The right format depends on the data type used in the wav file and on the library you are using to save it, but let us say for this example that you want to store it as 16-bit integers. You'll need to scale X to the range of the selected data type: a signed 16-bit int covers -32768 to 32767, so if your samples go from -1.0 to 1.0 you can simply multiply by 32767.
The next part is simply to convert it to a string of bytes using tostring; it could look something like the following:
scaled = X * 32767                      # scale [-1.0, 1.0] samples to the 16-bit range
data = scaled.astype('<i2').tostring()  # little-endian 16-bit ints, then raw bytes
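To actually get those bytes into a file with the standard-library wave module (the wave_write object the question mentions), a minimal sketch could look like this; the 440 Hz test signal, mono channel count, and 44100 Hz sample rate are assumptions you'd replace with your own data:

import wave

import numpy as np

rate = 44100                       # assumed sample rate; match your source file
t = np.linspace(0, 1, rate, endpoint=False)
X = np.sin(2 * np.pi * 440 * t)    # stand-in signal in [-1.0, 1.0]

scaled = X * 32767
data = scaled.astype('<i2').tobytes()  # tobytes() is the newer name for tostring()

w = wave.open('out.wav', 'wb')
w.setnchannels(1)     # mono
w.setsampwidth(2)     # 2 bytes per sample = 16-bit
w.setframerate(rate)
w.writeframes(data)   # unlike writeframesraw, this also patches the header size
w.close()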
You can find the documentation for the functions here:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.tostring.html

Related

How to pass complex arguments in netcdf python

I would like to read a netCDF file using Python. This file contains a netCDF variable stored as doubles.
I know that this quantity should be complex, and that the last dimension always holds 2 numbers (the real and imaginary parts).
I would like to read the netCDF variable IN AN EFFICIENT WAY and allocate it to a complex Python/numpy variable.
For the moment I have the following INEFFICIENT program that works:
import numpy as N

self.EIG2D = N.zeros((self.nkpt, self.nband, 3, self.natom, 3, self.natom), dtype=complex)
EIG2Dtmp = root.variables['second_derivative_eigenenergies'][:, :, :, :, :, :, :]
# Axes: number_of_atoms, number_of_cartesian_directions, number_of_atoms,
# number_of_cartesian_directions, number_of_kpoints, product_mband_nsppol, cplex

for ikpt in N.arange(nkpt):
    for iband in N.arange(nband):
        for icart in N.arange(3):
            for iatom in N.arange(natom):
                for jcart in N.arange(3):
                    for jatom in N.arange(natom):
                        self.EIG2D[ikpt, iband, icart, iatom, jcart, jatom] = \
                            complex(EIG2Dtmp[iatom, icart, jatom, jcart, ikpt, iband, 0],
                                    EIG2Dtmp[iatom, icart, jatom, jcart, ikpt, iband, 1])
How can I make this more efficient?
Thank you in advance,
Samuel.
Thanks to Spencer Hill, the solution for me was
self.EIG2D = numpy.vectorize(complex)(EIG2Dtmp[...,0], EIG2Dtmp[...,1])
You can also refer to Numpy: Creating a complex array from 2 real ones?
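As the linked question points out, you can also build the complex array with plain arithmetic, which avoids vectorize's per-element Python call; a sketch using the question's array names. Note that, like the vectorize solution, this keeps EIG2Dtmp's axis order, so a transpose is needed to match the loop version:

# EIG2Dtmp[..., 0] holds the real parts, EIG2Dtmp[..., 1] the imaginary parts.
EIG2D = EIG2Dtmp[..., 0] + 1j * EIG2Dtmp[..., 1]

# Recover the loop version's axis order (ikpt, iband, icart, iatom, jcart, jatom)
# from the file's order (iatom, icart, jatom, jcart, ikpt, iband):
EIG2D = EIG2D.transpose(4, 5, 1, 0, 3, 2)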

Saving mat files in different numerical data formats in scipy.io.savemat

I know this is very simple and I can't believe I haven't found anything on this anywhere, but here goes:
I have a large, high-dimensional matrix in Python that I want to save in .mat format (for MATLAB).
I'm using the scipy.io.savemat method to save this matrix, but it's always saved as double. I would like to save it with lower precision, like single or 16-bit float.
I convert the array to a low-precision data type before saving, but it's always saved as double. Is there really no way of saving mat files in a lower-precision float type?
.savemat does not seem to take a dtype argument.
import numpy as np
import scipy.io

scipy.io.savemat('test.mat', {'test': np.array([0.001, 1, 1.004], dtype='float16')})
Apparently it needs to be either single or double: scipy.io.savemat() does not support other float precisions, and it silently falls back to double if it doesn't like your dtype, without any warning.
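If single precision is acceptable, casting to float32 before the call does survive the round trip, since savemat writes float32 arrays as MATLAB singles; a minimal sketch:

import numpy as np
import scipy.io

test = np.array([0.001, 1, 1.004])
# float32 maps to MATLAB's 'single'; anything lower gets promoted to double.
scipy.io.savemat('test.mat', {'test': test.astype(np.float32)})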

How to read NetCDF variable float data into a Numpy array with the same precision and scale as the original NetCDF float values?

I have a NetCDF file which contains a variable with float values with precision/scale == 7/2, i.e. possible values range from -99999.99 to 99999.99.
When I take a slice of the values from the NetCDF variable and look at them in my debugger, I see that the values now in my array have more digits after the decimal point than what I see in the original NetCDF file. For example, in the ToolsUI/ncdump viewer the values display as '-99999.99' or '12.45', but in the slice array they look like '-99999.9921875'. So if I'm using '-99999.99' as the expected value to indicate a missing data point, I won't get a match against what gets pulled into the slice array, since those values carry extra digits that are not just zero padding.
For example I see this if I do a ncdump on a point within the NetCDF dataset:
Variable: precipitation(0:0:1, 40:40:1, 150:150:1)
float precipitation(time=1348, lat=180, lon=360);
  :units = "mm/month";
  :long_name = "precipitation totals";
data:
{
  {
    {-99999.99}
  }
}
However if I get a slice of the data from the variable like so:
value = precipitationVariable[0:1:1, 40:41:1, 150:151:1]
then I see it like this in my debugger (Eclipse/PyDev):
value == ndarray: [[[-99999.9921875]]]
So it seems as if the NetCDF dataset values that I read into a Numpy array are not being read with the same precision/scale as the original values in the NetCDF file. Or perhaps the values within the NetCDF file are actually the same as what I'm seeing when I read them, and what's shown via ncdump is truncated by format settings in the ncdump program itself.
Can anyone advise as to what's happening here? Thanks in advance for your help.
BTW I'm developing this code using Python 2.7.3 on a Windows XP machine and using the Python module for the NetCDF4 API provided here: https://code.google.com/p/netcdf4-python/
There is no simple way of doing what you want, because the file stores the values as single-precision floats: -99999.99 is not exactly representable in single precision, so you will always see trailing digits such as -99999.9921875 when the value is printed at full precision.
However, netCDF already provides a mechanism for missing data (see the best practices guide). How was the netCDF file written in the first place? missing_value is a special variable attribute that holds the sentinel value used to mark missing points. In the C and Fortran interfaces, when the file is created, all variable values are initialized to the fill value, so any point you never write stays marked as missing. See more about fill values in the C and Fortran interfaces. This is the recommended approach. The Python netCDF4 module plays well with these attributes, and such variables are read as masked arrays in numpy.
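For illustration, a minimal sketch of how the netCDF4 module surfaces this; the file and variable names are hypothetical, and the masking only happens if the variable actually carries a missing_value or _FillValue attribute:

import numpy as np
import netCDF4

ds = netCDF4.Dataset('precip.nc')          # hypothetical file name
precip = ds.variables['precipitation'][:]  # comes back as a masked array when
                                           # missing_value/_FillValue is set
print(np.ma.count_masked(precip), 'missing points')
ds.close()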
If you must work with the file you currently have, then I'd suggest creating a mask to cover values around your missing value:
import numpy as np
value = precipitationVariable[:]
mask = (value < -99999.98) & (value > -100000.00)
value = np.ma.MaskedArray(value, mask=mask)

reading a binary file in python

I have to read a binary file in python. This is first written by a Fortran 90 program in this way:
open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)
I can easily read this binary file with the following IDL code:
n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu =dblarr(n1,n2)
n =dblarr(n1)
T =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1
What I want to do is read this binary file with Python. But there are some problems.
First of all, here is my attempt to read the file:
import numpy
from numpy import *
import struct
file='name_of_my_file'
with open(file, mode='rb') as lines:
    c = lines.read()
I try to read the first two variables:
dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])
But as you can see, I had to add two dummy variables because, somehow, the Fortran program adds the integer 8 in those positions.
The problem comes when trying to read the remaining bytes: I don't get the same result as the IDL program.
Here is my attempt to read the array n
double = 8
end = 16+n1*double
nH = struct.unpack('d'*n1,c[16:end])
However, when I print this array I get nonsense values. I mean, I can read the file with the IDL code above, so I know what to expect. So my question is: how can I read this file when I don't know its exact structure? Why is it so simple to read with IDL? I need to read this data set with Python.
What you're looking for is the struct module.
This module allows you to unpack data from strings, treating it like binary data.
You supply a format string and your file's contents, and it will consume the data, returning you Python objects.
For example, using your variables:
import struct

with open(file, mode='rb') as f:
    content = f.read()  # read(), not readlines(), for binary data; if this is
                        # too much data, you can pass a size to read()

n, T, Teq, cool = struct.unpack("dddd", content[:32])
This will make n, T, Teq, and cool hold the first four doubles in your binary file. Of course, this is just a demonstration. Your example looks like it wants lists of doubles, and conveniently struct.unpack returns a tuple, which I expect will still work fine for your case (if not, you can convert it to a list). Keep in mind that struct.unpack needs to consume exactly the string passed to it, otherwise you'll get a struct.error. So either slice your input string, or read only the number of bytes you'll use, as I said above.
For example,
n_content = f.read(8*number_of_ns) #8, because doubles are 8 bytes
n = struct.unpack("d"*number_of_ns,n_content)
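One detail worth spelling out: a file written with form='unformatted' consists of sequential records, each framed by a length marker before and after the payload. With the common 4-byte markers, that is exactly where the mysterious 8 around n1 and n2 comes from: two 4-byte integers make an 8-byte record. A sketch of walking those records, assuming 4-byte markers and native endianness:

import struct

def read_record(f):
    """Return the payload of one Fortran unformatted sequential record."""
    (length,) = struct.unpack('i', f.read(4))   # leading length marker, in bytes
    payload = f.read(length)
    (trailer,) = struct.unpack('i', f.read(4))  # trailing marker repeats the length
    assert length == trailer, 'corrupt record'
    return payload

with open('name_of_my_file', 'rb') as f:
    n1, n2 = struct.unpack('ii', read_record(f))    # first record: two integers
    nH = struct.unpack('%dd' % n1, read_record(f))  # next record: n1 doubles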
Did you give scipy.io.readsav a try?
Simply read your file like this:
mydict = scipy.io.readsav('name_of_file')
It looks like you are trying to read the cooling_0000x.out file generated by RAMSES.
Note that the first two integers (n1, n2) give the dimensions of the two-dimensional tables (arrays) that follow in the body of the file, so you need to process those two integers first before you know how much real*8 data is in the rest of the file.
scipy should be of help -- it lets you read arbitrarily dimensioned binary data:
http://wiki.scipy.org/Cookbook/InputOutput#head-e35c7736718209eea00ebf37a7e1dfb91df696e1
If you already have this python code, please let me know as I was going to write it today (17Sep2014).
Rick

How to save double to file in python?

Let's say I need to save a matrix (each line corresponds to one row) that could be loaded from Fortran later. What method should I prefer? Is converting everything to strings the only approach?
You can save them in binary format as well. Please see the documentation for the struct standard module; it has a pack function for converting Python objects into binary data.
For example:
import struct

value = 3.141592654
data = struct.pack('d', value)
with open('file.ext', 'wb') as f:
    f.write(data)
You can convert each element of your matrix and write it to a file. Fortran should be able to load that binary data. You can speed up the process by converting a whole row at a time, like this:
row_data = struct.pack('d' * len(matrix_row), *matrix_row)
Please note that 'd' * len(matrix_row) is constant for your matrix size, so you only need to build that format string once.
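Putting that together, a minimal sketch of writing a whole matrix this way (the file name and matrix are illustrative, and the Fortran side would need to read with a matching layout, e.g. access='stream'):

import struct

matrix = [[2.3452452435, 3.34134], [4.5, 7.9]]
fmt = 'd' * len(matrix[0])   # computed once, since every row has the same width

with open('matrix.bin', 'wb') as f:
    for row in matrix:
        f.write(struct.pack(fmt, *row))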
I don't know Fortran, so it's hard to tell what is easy for you to parse on that side.
It sounds like your options are either saving the doubles in plaintext (meaning, 'converting' them to strings), or in binary (using struct and the like). Which one is better depends on your situation.
I would go with the plaintext solution, as it means the files will be easily readable and you won't have to mess with low-level details (endianness, default double sizes).
But there are cases where binary is better (for example, if you have a really big list of doubles and space matters, or if binary is easier to parse on the Fortran side and you need the optimization), though this is likely not your case.
You can use JSON
import json

matrix = [[2.3452452435, 3.34134], [4.5, 7.9]]
data = json.dumps(matrix)
with open('file.ext', 'w') as f:  # text mode: json.dumps returns a str
    f.write(data)
File content will look like:
[[2.3452452435, 3.3413400000000002], [4.5, 7.9000000000000004]]
If legibility and ease of access are important (and file size is reasonable), Fortran can easily parse a simple array of numbers, at least if it knows the size of the matrix beforehand (with something like READ(FILE_ID, '2(F)'), I think):
1.234 5.6789e4
3.1415 9.265358978
42 ...
Two nested for loops in your Python code can easily write your matrix in this form.
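A minimal sketch of those two loops, with an illustrative file name and matrix:

matrix = [[1.234, 5.6789e4], [3.1415, 9.265358978]]

with open('matrix.txt', 'w') as f:
    for row in matrix:
        for value in row:
            f.write(repr(value) + ' ')  # repr keeps full double precision
        f.write('\n')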
