reading .wav files in python

I'm trying (for a course) to read a .wav sound file via IPython. When I try the 'normal' code to read a file:
from scipy.io.wavfile import read
(fs, x) = read('/Users/joehigham/Desktop/Audio_1.wav')
I get the well known traceback call of
ValueError: string size must be a multiple of element size
Can anyone point me in the right direction as to why this happens, and of course how I can fix the problem?
Thanks in advance - I did look around SO for the solution, but nothing (that I found) seems to match this problem with sound files.

Your wav file probably has 24 bit data. You can check with:
import wave
w = wave.open("filename.wav")
print(w.getsampwidth())
If the value printed is 3, your data is 24 bit. If that is the case, scipy.io.wavfile won't work. I wrote a reader that handles 24 bit data; see https://github.com/WarrenWeckesser/wavio (which replaced the gist at https://gist.github.com/WarrenWeckesser/7461781). The reader is also on PyPI.
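As a minimal sketch of reading a 24 bit file with wavio (assuming its current API, where read returns an object with data, rate, and sampwidth attributes):
import wavio

w = wavio.read("Audio_1.wav")
print(w.rate, w.sampwidth)  # sample rate in Hz and bytes per sample
x = w.data                  # numpy array of shape (nframes, nchannels)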

Related

Is there any feasible solution to read WOT battle results .dat files?

I am new here and trying to solve one of my interesting questions about World of Tanks. I have heard that every battle's data is stored on the client's disk in the Wargaming.net folder, and I want to use it for a batch analysis of our clan's battle performances.
It is said that these .dat files are a kind of JSON file, so I tried a couple of lines of Python code to read one, but failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and obviously fails. Well, could anyone tell me what is actually in these files?
Added on Feb. 9th, 2022
After trying another snippet via Jupyter Notebook, it seems that something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of integers, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# To convert pickle items from python2 to python3 you need to use the "bytes" or "latin1" encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The response contains a mixture of more binary files as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
First you can look at the replay file itself in a text editor, but it won't show the code at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out, but it is the stats for each player in the game. Only then does it come to the part that has to do with the actual replay. You don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that they are simply lists of values, and without knowing what each value refers to, it's hard to make sense of them. The identifiers for the values can be extracted from your game installation:
import zipfile

WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"

archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (e.g. with uncompyle6) and go through the code to see the identifiers for the values.
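For instance, a sketch of driving the decompiler from Python (assuming uncompyle6 is installed and supports an -o output directory, as its usual CLI does):
import glob
import subprocess

# Decompile each extracted .pyc into a 'decompiled' directory.
for pyc in glob.glob("scripts/common/battle_results/*.pyc"):
    subprocess.run(["uncompyle6", "-o", "decompiled", pyc])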
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.

How to get what is stored in 'data' field of las point using liblas?

I am working with multipulse lidar data that collects points along a number of lines within the flight path. I am trying to determine the name and number of individual lines within the las file. I am using the liblas module in Python.
I found this documentation that explains the different fields stored in an las file. It mentions a data field (get_data and set_data) at the very bottom of the page.
The 'point data format' and 'point data record length' in the header set aside space for this 'data' field. My header says I have 28 bytes set aside for the data field, and there are 28 values stored in the data field. The 19th value (at least in two datasets from two different sensors) refers to the line number. I have a single value in single pulse data and 4 in multi-pulse data.
I was wondering if there is a standard for what is stored in this field or if it is proprietary.
Also, as a way to get the name of each scan line, I wrote the following code:
import liblas
from liblas import file as lasfile

# Get parameters
las_file = r"E:\Testing\00101.las"
f = lasfile.File(las_file, mode='r')
line_list = []
counter = 0
for p in f:
    line_num = p.data[18]
    if line_num not in line_list:
        line_list.append(line_num)
    counter += 1
print line_list
It results in the following error:
Traceback (most recent call last):
  File "D:\Tools\Python_Scripts\point_info.py", line 46, in <module>
    line_num = p.data[18]
  File "C:\Python27\ArcGIS10.1\lib\site-packages\liblas\point.py", line 560, in get_data
    length = self.header.data_record_length
  File "C:\Python27\ArcGIS10.1\lib\site-packages\liblas\point.py", line 546, in get_header
    return header.Header(handle=core.las.LASPoint_GetHeader(self.handle))
WindowsError: [Error -529697949] Windows Error 0xE06D7363
Does anyone know more about the line numbers stored in the las point/header? Can anyone explain the error? It seems to allocate nearly 2gb of ram before I get the error. I am on Win XP, so I'm guessing it's a memory error, but I don't understand why accessing this 'data' field hogs memory. Any help is greatly appreciated.
I don't pretend to be an expert in any of this, but I'm intrigued by GIS data so this caught my interest. I installed liblas and its dependencies on my Fedora 19 system and played with the example data files that came with liblas.
Using your code I ran into the same problem of watching all my memory get eaten up. I don't know why that should happen - perhaps unwanted references hanging around preventing the garbage collector from working as we'd hope. This could probably be fixed, but I won't try it.
I did notice some interesting features of the liblas module and decided to try them. I believe you can get the data you seek.
After opening your file, have a look at the XML description from the header.
h = f.get_header()
print(h.get_xml())
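The result is one long string that is hard to scan; as a quick sketch, the standard library's xml.dom.minidom can pretty-print it (lxml.etree would work too):
from xml.dom.minidom import parseString

# Pretty-print the header XML so the per-point byte layout is easier to read.
print(parseString(h.get_xml()).toprettyxml(indent="  "))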
In my example files the XML showed the byte layout of the point data (mine had 28 bytes too). In mine, offset 18 was a single short (2 bytes, i.e. bytes 18-19) assigned to Point Source ID. You should be able to retrieve this with p.data[18:20], p.get_data()[18:20], p.point_source_id, or p.get_point_source_id(). Unfortunately the data references chew up memory and p.point_source_id has a bug (bug fix pull request submitted to the developers). If we change your code to use the last access method, everything seems to work fine. So, try this in your for loop instead:
for p in f:
    line_num = p.get_point_source_id()
    if line_num not in line_list:
        line_list.append(line_num)
    counter += 1
Note that
counter == h.get_count()
If you just want the set of unique Point Source ID values ...
line_set = set(p.get_point_source_id() for p in f)
Hopefully your data value is also available as p.get_point_source_id(). Let me know how it works for you in the comments. Cheers!

Linux and python: Combining multiple wave files to one wave file

I am looking for a way to combine multiple wave files into one wave file using Python and run it on Linux. I don't want to use any add-on other than the default shell command line and default Python modules.
For example, if I have a.wav and b.wav, I want to create a c.wav which starts with the content from a.wav followed by the content from b.wav.
I've found the wave module, so I can open a wave file and write into a new file. Since I'm really new to this audio world, I still can't figure out how to do it. Below is my code:
import struct, wave

waveFileA = wave.open('./a.wav', 'r')
waveFileB = wave.open('./b.wav', 'r')
waveFileC = wave.open('./c.wav', 'w')

lengthA = waveFileA.getnframes()
for i in range(0, lengthA):
    waveFileC.writeframes(waveFileA.readframes(1))

lengthB = waveFileB.getnframes()
for i in range(0, lengthB):
    waveFileC.writeframes(waveFileB.readframes(1))

waveFileA.close()
waveFileB.close()
waveFileC.close()
When I run this code, I get this error:
wave.Error: # channels not specified
Can anyone help me, please?
You need to set the number of channels, sample width, and frame rate:
waveFileC.setnchannels(waveFileA.getnchannels())
waveFileC.setsampwidth(waveFileA.getsampwidth())
waveFileC.setframerate(waveFileA.getframerate())
If you want to handle a.wav and b.wav having different settings, you'll want to use something like pysox to convert them to the same settings, or for nchannels and sampwidth you may be able to tough through it yourself.
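Putting it together, a minimal sketch (assuming a.wav and b.wav already share the same channel count, sample width, and frame rate):
import wave

waveFileA = wave.open('./a.wav', 'rb')
waveFileB = wave.open('./b.wav', 'rb')
waveFileC = wave.open('./c.wav', 'wb')

# Copy the parameters from the first input before writing any frames.
waveFileC.setnchannels(waveFileA.getnchannels())
waveFileC.setsampwidth(waveFileA.getsampwidth())
waveFileC.setframerate(waveFileA.getframerate())

# Copy all frames in one call per file instead of one frame at a time.
waveFileC.writeframes(waveFileA.readframes(waveFileA.getnframes()))
waveFileC.writeframes(waveFileB.readframes(waveFileB.getnframes()))

for w in (waveFileA, waveFileB, waveFileC):
    w.close()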
Looks like you need to call n=waveFileA.getnchannels() to find out how many channels the first input file uses, likewise for waveFileB, then you'll need to use waveFileC.setnchannels(n) to tell it how many channels to put in the outgoing file. I don't know how it will handle input files with different numbers of channels...
Here is the answer I was looking for:
How to join two wav files using python?
(look for the answer by Tom 10)
It's in another thread; someone already solved this problem.

Read matlab file (*.mat) from zipped file without extracting to directory in Python

This specific question stems from the attempt to handle large data sets produced by a MATLAB algorithm so that I can process them with Python algorithms.
Background: I have large arrays in MATLAB (typically 20x20x40x15000 [i,j,k,frame]) and I want to use them in python. So I save the array to a *.mat file and use scipy.io.loadmat(fname) to read the *.mat file into a numpy array. However, a problem arises: if I try to load the entire *.mat file in python, a memory error occurs. To get around this, I slice the *.mat file into pieces, so that I can load the pieces one at a time into a python array. If I divide up the *.mat file by frame, I now have 15,000 *.mat files, which quickly becomes a pain to work with (at least in windows). So my solution is to use zipped files.
Question: Can I use scipy to directly read a *.mat file from a zipped file without first unzipping the file to the current working directory?
Specs: Python 2.7, windows xp
Current code:
import scipy.io
import zipfile
import numpy as np

def readZip(zfilename, dim, frames):
    data = np.zeros((dim[0], dim[1], dim[2], frames), dtype=np.float32)
    zfile = zipfile.ZipFile(zfilename, "r")
    i = 0
    for info in zfile.infolist():
        fname = info.filename
        zfile.extract(fname)
        mat = scipy.io.loadmat(fname)
        data[:, :, :, i] = mat['export']
        mat.clear()
        i = i + 1
    return data
Tried code:
mat = scipy.io.loadmat(zfile.read(fname))
produces this error:
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
mat = scipy.io.loadmat(zfile.open(fname))
produces this error:
fileobj.seek(0)
UnsupportedOperation: seek
Any other suggestions on handling the data are appreciated.
Thanks!
I am pretty sure that the answer to my question is NO and there are better ways to accomplish what I am trying to do.
Regardless, with the suggestion from J.F. Sebastian, I have devised a solution.
Solution: Save the data in MATLAB in the HDF5 format, namely hdf5write(fname, '/data', data_variable). This produces a *.h5 file which then can be read into python via h5py.
python code:
import h5py
r = h5py.File(fname, 'r+')
data = r['data']
I can now index directly into the data; however, it stays on the hard drive.
print data[:,:,:,1]
Or I can load it into memory.
data_mem = data[:]
However, this once again gives memory errors. So, to get it into memory I can loop through each frame and add it to a numpy array.
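For example, a minimal sketch of that frame-by-frame loop (reusing the data dataset from the snippet above; h5py reads only the requested slice from disk on each iteration):
for i in range(data.shape[3]):
    frame = data[:, :, :, i]  # pulls just this frame off the hard drive
    # ... process the frame here, then let it go out of scope ...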
h5py FTW!
In one of my frozen applications we bundle some files into the .bin file that py2exe creates, then pull them out like this:
z = zipfile.ZipFile(os.path.join(myDir, 'common.bin'))
data = z.read('schema-new.sql')
I am not certain if that would feed your .mat files into scipy, but I'd consider it worth a try.
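If feeding the raw string fails as it did above, wrapping the bytes in a seekable buffer may help (a sketch, assuming scipy.io.loadmat accepts seekable file-like objects, which recent versions do; 'data_001.mat' stands in for whatever member name your archive actually contains):
import io
import scipy.io

# z.read() returns the member's bytes; BytesIO makes them seekable for loadmat.
mat = scipy.io.loadmat(io.BytesIO(z.read('data_001.mat')))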

Python open raw audio data file

I have these files with the extension ".adc". They are simply raw data files. I can open them with Audacity using File->Import->Raw data with encoding "Signed 16 bit" and sample rate "16000 Hz".
I would like to do the same with Python. I think the audioop module is what I need, but I can't seem to find examples of how to use it for something that simple.
The main goal is to open the file and play a certain location in it, for example from second 10 to second 20. Is there something out there for my task?
Thanx in advance.
For opening the file, you just need file().
For finding a location, you don't need audioop: you just need to convert seconds to bytes and get the required bytes of the file. For instance, if your file is 16 kHz 16bit mono, each second is 32,000 bytes of data. So the 10th second is 320kB into the file. Just seek to the appropriate place in the file and then read the appropriate number of bytes.
And audioop can't help you with the hardest part: namely, playing the audio. The correct way to do this very much depends on your OS.
EDIT: Sorry, I just noticed your username is "thelinuxer". Consider pyAO for playing audio from Python on Linux. You will probably need to change the sample format to play the audio; audioop will help you with this (see ratecv, tomono/tostereo, lin2lin, and bias).
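For instance, a sketch of a sample rate conversion with audioop (assuming 16-bit mono input data at 16 kHz being resampled to 44.1 kHz):
import audioop

# ratecv(fragment, width, nchannels, inrate, outrate, state) returns the
# converted fragment plus a state object to carry into the next chunk.
converted, state = audioop.ratecv(data, 2, 1, 16000, 44100, None)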
Thanx a lot! I was able to do the following:
def play_data(filename, first_sec, second_sec):
    import ao
    from ao import AudioDevice
    dev = AudioDevice(2, bits=16, rate=16000, channels=1)
    f = open(filename, 'rb')  # binary mode, since this is raw sample data
    data_len = int((second_sec - first_sec) * 32000)
    f.seek(int(32000 * first_sec))
    data = f.read(data_len)
    dev.play(data)
    f.close()

play_data('AR001_3.adc', 2.5, 5)
You can use PySoundFile to open the file as a NumPy array and play it with python-sounddevice.
import soundfile as sf
import sounddevice as sd
sig, fs = sf.read('myfile.adc', channels=2, samplerate=16000,
                  format='RAW', subtype='PCM_16')
sd.play(sig, fs)
You can use indexing on the NumPy array to select a certain part of the audio data.
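For example, to play only seconds 10 to 20 from the question (a sketch reusing sig and fs from the snippet above):
# Rows of sig are frames, so slicing by frame index selects a time range.
sd.play(sig[10 * fs:20 * fs], fs)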
