Between each word in the wav file I have full silence (I checked with Hex workshop and silence is represented with 0's).
How can I cut out the non-silence sound, i.e. split the file at the silent gaps?
I'm programming in Python.
Thanks!
Python has a wave module in its standard library. You can use it to open a wav file for reading and use the readframes(1) method to walk through the file frame by frame.
import wave
w = wave.open('beeps.wav', 'r')
for i in range(w.getnframes()):
    frame = w.readframes(1)
The frame returned will be a byte string with hex values in it. If the file is stereo the result will look something like this (4 bytes):
'\xe2\xff\xe2\xff'
If it's mono, it will have half the data (2 bytes):
'\xe2\xff'
Each channel is 2 bytes long because the audio is 16-bit. If it is 8-bit, each channel will only be one byte. You can use the getsampwidth() method to determine this. Also, getnchannels() will tell you whether it's mono or stereo.
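For example, a quick way to inspect these parameters (a minimal sketch, using the same example file name as above):

import wave

w = wave.open('beeps.wav', 'r')
print(w.getnchannels())   # 1 for mono, 2 for stereo
print(w.getsampwidth())   # bytes per sample: 1 for 8-bit, 2 for 16-bit audio
print(w.getframerate())   # frames (sample points) per second
print(w.getnframes())     # total number of frames in the file
w.close()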
You can loop over these bytes to see if they all equal zero, meaning both channels are silent. In the following example I use the ord() function to convert the '\xe2' hex values to integers.
import wave
w = wave.open('beeps.wav', 'r')
for i in range(w.getnframes()):
    # read 1 frame; the file position advances automatically
    frame = w.readframes(1)
    all_zero = True
    for j in range(len(frame)):
        # check if the byte is non-zero
        if ord(frame[j]) > 0:
            all_zero = False
            break
    if all_zero:
        # perform your cut here
        print 'silence found at frame %s' % w.tell()
        print 'silence found at second %s' % (w.tell() / w.getframerate())
It is worth noting that a single frame of silence doesn't necessarily denote empty space, since the amplitude may cross the zero mark at normal frequencies. Therefore, it is recommended that a certain number of consecutive frames at 0 be observed before deciding whether the region is, in fact, silent.
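A minimal sketch of that idea, building on the loop above (Python 2, as in the code above; the 1000-frame threshold is an arbitrary value, roughly 0.023 s at 44.1 kHz, that you would tune to your frame rate):

import wave

w = wave.open('beeps.wav', 'r')
min_silence_frames = 1000   # arbitrary; tune to your frame rate
run_start = None
run_length = 0

for i in range(w.getnframes()):
    frame = w.readframes(1)
    # in Python 2 a frame is a str, so iterating over it yields 1-byte strings
    if all(ord(b) == 0 for b in frame):
        if run_length == 0:
            run_start = i
        run_length += 1
    else:
        if run_length >= min_silence_frames:
            print 'silent region: frames %d to %d' % (run_start, i - 1)
        run_length = 0

if run_length >= min_silence_frames:
    print 'silent region: frames %d to %d' % (run_start, w.getnframes() - 1)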
I have been doing some research on this topic for a project I'm working on and I came across a few problems with the solution provided, namely the method for determining silence is incorrect. A "more correct" implementation would be:
import struct
import wave

wave_file = wave.open("sound_file.wav", "r")
for i in range(wave_file.getnframes()):
    # read a single frame and advance to the next frame
    current_frame = wave_file.readframes(1)
    # check for silence
    silent = True
    # wave frame samples are stored in little endian**
    # this example works for a single channel, 16-bit-per-sample encoding
    unpacked_signed_value = struct.unpack("<h", current_frame) # *
    if abs(unpacked_signed_value[0]) > 500:
        silent = False
    if silent:
        print "Frame %s is silent." % wave_file.tell()
    else:
        print "Frame %s is not silent." % wave_file.tell()
References and Useful Links
*Struct Unpacking will be useful here: https://docs.python.org/2/library/struct.html
**A good reference I found explaining the format of wave files for dealing with different size bit-encodings and multiple channels is: http://www.piclist.com/techref/io/serial/midi/wave.html
Using the built-in ord() function in Python on the first element of the string object returned by the readframes(x) method will not work correctly, because it compares individual bytes rather than the signed 16-bit sample values.
Another key point is that multiple channel audio is interleaved and thus a little extra logic is needed for dealing with channels. Again, the link above goes into detail about this.
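For example, a sketch of how the unpacking above could be generalized to interleaved multi-channel 16-bit audio (the format string grows with the channel count; Python 2, and the 500 cutoff is the same arbitrary threshold used above):

import struct
import wave

wave_file = wave.open("sound_file.wav", "r")
n_channels = wave_file.getnchannels()
# assumes 16-bit samples (getsampwidth() == 2)
fmt = "<" + "h" * n_channels   # e.g. "<hh" for stereo: one signed short per channel

for i in range(wave_file.getnframes()):
    current_frame = wave_file.readframes(1)
    samples = struct.unpack(fmt, current_frame)   # one value per channel, in interleaved order
    # treat the frame as silent only if every channel is near zero
    if all(abs(sample) <= 500 for sample in samples):
        print "Frame %s is silent." % wave_file.tell()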
Hopefully this helps someone in the future.
Here are some of the more important points from the link, and what I found helpful.
Data Organization
All data is stored in 8-bit bytes, arranged in Intel 80x86 (ie, little endian) format. The bytes of multiple-byte values are stored with the low-order (ie, least significant) bytes first. Data bits are as follows (ie, shown with bit numbers on top):
7 6 5 4 3 2 1 0
+-----------------------+
char: | lsb msb |
+-----------------------+
7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8
+-----------------------+-----------------------+
short: | lsb byte 0 | byte 1 msb |
+-----------------------+-----------------------+
7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 23 22 21 20 19 18 17 16 31 30 29 28 27 26 25 24
+-----------------------+-----------------------+-----------------------+-----------------------+
long: | lsb byte 0 | byte 1 | byte 2 | byte 3 msb |
+-----------------------+-----------------------+-----------------------+-----------------------+
Interleaving
For multichannel sounds (for example, a stereo waveform), single sample points from each channel are interleaved. For example, assume a stereo (ie, 2 channel) waveform. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "mix" the two channels' sample points together. You would store the first sample point of the left channel. Next, you would store the first sample point of the right channel. Next, you would store the second sample point of the left channel. Next, you would store the second sample point of the right channel, and so on, alternating between storing the next sample point of each channel. This is what is meant by interleaved data; you store the next sample point of each of the channels in turn, so that the sample points that are meant to be "played" (ie, sent to a DAC) simultaneously are stored contiguously.
See also How to edit raw PCM audio data without an audio library?
I have no experience with this, but have a look at the wave module in the standard library. That may do what you want. Otherwise you'll have to read the file as a byte stream and cut out sequences of 0-bytes (but you cannot just cut out all 0-bytes, as that would invalidate the file...)
You might want to try using sox, a command-line sound processing tool. It has many modes, one of them is silence:
silence: Removes silence from the beginning, middle, or end of a sound file. Silence is anything below a specified threshold.
It supports multiple sound formats and it's quite fast, so parsing large files shouldn't be a problem.
To remove silence from the middle of a file, specify a below_periods that is negative. This value is then treated as a positive value and is also used to indicate the effect should restart processing as specified by the above_periods, making it suitable for removing periods of silence in the middle of the sound file.
I haven't found any Python bindings for libsox, though, but you can call it the way you call any command-line program from Python (or you can reimplement it, using the sox sources for guidance).
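For example, one way to call it from Python is via subprocess (a sketch; the threshold and duration values are placeholders to tune, and the exact arguments of the silence effect are documented in the sox man page):

import subprocess

# trim silence from the start, and (via the negative below_periods) also
# remove stretches of silence longer than 0.5 s elsewhere in the file;
# '1%' is the silence threshold, 0.1 and 0.5 are durations in seconds
subprocess.check_call([
    'sox', 'input.wav', 'output.wav',
    'silence', '1', '0.1', '1%',
    '-1', '0.5', '1%',
])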
You will need to come up with some threshold value for a minimum number of consecutive zeros before you cut them. Otherwise you'll be removing perfectly valid zeros from the middle of normal audio data. You can iterate through the wave file, copying any non-zero values and buffering up zero values. When you're buffering zeros and eventually come across the next non-zero value, copy the buffer over if it has fewer samples than the threshold, otherwise discard it.
Python is not a great tool for this sort of task though. :(
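Still, a rough sketch of the buffering approach described above (Python 2; the threshold of 4410 frames, about 0.1 s at 44.1 kHz, is an arbitrary value to tune):

import wave

threshold_frames = 4410   # ~0.1 s at 44.1 kHz
src = wave.open('input.wav', 'r')
dst = wave.open('output.wav', 'w')
dst.setparams(src.getparams())   # same channels, sample width and frame rate

frame_size = src.getsampwidth() * src.getnchannels()
silent_frame = '\x00' * frame_size
zero_buffer = []

for i in range(src.getnframes()):
    frame = src.readframes(1)
    if frame == silent_frame:
        zero_buffer.append(frame)
    else:
        # short runs of zeros are just normal audio crossing zero: keep them
        if len(zero_buffer) < threshold_frames:
            dst.writeframes(''.join(zero_buffer))
        zero_buffer = []
        dst.writeframes(frame)

src.close()
dst.close()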
Related
I am trying to debug why something is not quite working and observed that b64encode does not seem to work quite as I imagined:
import base64
base64.b64encode( bytes("the cat sat on the mat", "utf-8") )
>> b'dGhlIGNhdCBzYXQgb24gdGhlIG1hdA=='
base64.b64encode( bytes("cat sat on the mat", "utf-8") )
>> b'Y2F0IHNhdCBvbiB0aGUgbWF0'
The second input string differs only slightly at the start, so why is it that the outputs for these two strings have virtually nothing in common? I would have expected only the start of each output to differ.
Base64 maps 3 input bytes to 4 output bytes.
Since you added 4 input bytes, the means all of the remaining bytes "shifted" into different locations in the output.
Notice the == (padding) on the first example which went away on the second.
Try adding or removing multiples of 3 input bytes:
cat sat on the mat
my cat sat on the mat
Base64 is a fully deterministic, reversible transformation, but it does not operate on a per-character basis (as you can also observe from the output length not being a simple multiple of the input length).
Rather, groups of three bytes (24 bits) are encoded at a time by turning them into four 6-bit numbers (hence base 64 = 2^6). If the input length is not a multiple of three, it is padded and indicated as such by putting = characters at the end of the output.
Therefore, common substrings in different inputs will only show up as a common substring in the output if they are aligned on this three-byte frame, and grouped into the same triples.
the cat sat on the mat
dGhlIGNhdCBzYXQgb24gdGhlIG1hdA==
he cat sat on the mat
aGUgY2F0IHNhdCBvbiB0aGUgbWF0
e cat sat on the mat
ZSBjYXQgc2F0IG9uIHRoZSBtYXQ=
cat sat on the mat
IGNhdCBzYXQgb24gdGhlIG1hdA==
Observe that if you truncate exactly three characters ("the", leaving the space), the output becomes recognizable again.
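You can reproduce these encodings directly in Python:

import base64

for text in ("the cat sat on the mat",
             "he cat sat on the mat",
             "e cat sat on the mat",
             " cat sat on the mat"):
    # only the last input, which drops exactly 3 bytes ("the"), stays aligned
    # to the 3-byte frame, so its output is a suffix of the first one
    print(base64.b64encode(text.encode("utf-8")))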
I am reading in a series of CAN BUS frames from python-can represented as hex strings, e.g. '9819961F9FFF7FC1' and I know the values in each frame are laid out as follows:
Signal Startbit Length
A 0 8
B 8 4
C 12 4
D 16 12
E 28 12
F 40 16
G 56 4
With each value being an unsigned integer, with little endian byte order. Where I am struggling is how to deal with the 12 bit signals, and how to do it fast as this will be running in real time. As far as I understand struct.unpack only supports 1,2,4, and 8 byte integers. The Bitstring package also only supports whole-byte bitstrings when you specify the endianness.
I clearly don't understand binary well enough to do it by manipulating the bits directly because I have been tearing my hair out trying to get sensible values...
I was able to decode the frame successfully and reasonably quickly with the bitstruct library, which can handle values with any number of bits, as in the code below.
However, I found I also had to swap the positions of the hex characters when two signals share the same byte, as in the CAN frame layout above. I'm still not sure why, but it does work.
import bitstruct

# frame is the hex string read from the bus, e.g. '9819961F9FFF7FC1'
swapped_frame = frame[0:2] + frame[3] + frame[2] + frame[4:6] + frame[7] + \
                frame[6] + frame[8:]
ba = bytearray(swapped_frame.decode('hex'))   # Python 2: hex string to raw bytes
A, B, C, D, E, F, G = bitstruct.unpack('<u8u4u4u12u12u16u4', ba)
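If you would rather stay with the standard library, an alternative sketch is to view the whole 8-byte frame as one little-endian 64-bit integer and mask out each signal with shifts. This assumes Intel-style (little-endian) signal placement with start bits counted from bit 0 of byte 0; if your layout uses a different convention you will need to adjust it, which is probably the same issue as the character swapping above.

import struct

def extract_unsigned(frame_hex, start_bit, length):
    # interpret the 8 data bytes as one little-endian 64-bit unsigned integer
    value, = struct.unpack('<Q', bytes(bytearray.fromhex(frame_hex)))
    return (value >> start_bit) & ((1 << length) - 1)

layout = [('A', 0, 8), ('B', 8, 4), ('C', 12, 4), ('D', 16, 12),
          ('E', 28, 12), ('F', 40, 16), ('G', 56, 4)]
frame = '9819961F9FFF7FC1'   # example frame from the question
signals = dict((name, extract_unsigned(frame, start, length))
               for name, start, length in layout)
print(signals)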
I am working on an exercise for foo.bar and the basic idea is to take a list of integers and do some things to it to derive a specific subset of that list, then XOR (for a checksum) those values by means of this:
result = 0^1^2^3^4^6
Which equals 2
Another example:
result2 = 17^18^19^20^21^22^23^25^26^29
Which equals 14
I am not quite sure what is going on here and how these values (2, 14) are arrived at.
Actual Description of Problem from Foo.Bar
> Queue To Do
You're almost ready to make your move to destroy the LAMBCHOP doomsday device, but the security checkpoints that guard the underlying systems of the LAMBCHOP are going to be a problem. You were able to take one down without tripping any alarms, which is great! Except that as Commander Lambda's assistant, you've learned that the checkpoints are about to come under automated review, which means that your sabotage will be discovered and your cover blown - unless you can trick the automated review system.
To trick the system, you'll need to write a program to return the same security checksum that the guards would have after they would have checked all the workers through. Fortunately, Commander Lambda's desire for efficiency won't allow for hours-long lines, so the checkpoint guards have found ways to quicken the pass-through rate. Instead of checking each and every worker coming through, the guards instead go over everyone in line while noting their security IDs, then allow the line to fill back up. Once they've done that they go over the line again, this time leaving off the last worker. They continue doing this, leaving off one more worker from the line each time but recording the security IDs of those they do check, until they skip the entire line, at which point they XOR the IDs of all the workers they noted into a checksum and then take off for lunch. Fortunately, the workers' orderly nature causes them to always line up in numerical order without any gaps.
For example, if the first worker in line has ID 0 and the security checkpoint line holds three workers, the process would look like this:
0 1 2 /
3 4 / 5
6 / 7 8
where the guards' XOR (^) checksum is 0^1^2^3^4^6 == 2.
Likewise, if the first worker has ID 17 and the checkpoint holds four workers, the process would look like:
17 18 19 20 /
21 22 23 / 24
25 26 / 27 28
29 / 30 31 32
which produces the checksum 17^18^19^20^21^22^23^25^26^29 == 14.
All worker IDs (including the first worker) are between 0 and 2000000000 inclusive, and the checkpoint line will always be at least 1 worker long.
With this information, write a function answer(start, length) that will cover for the missing security checkpoint by outputting the same checksum the guards would normally submit before lunch. You have just enough time to find out the ID of the first worker to be checked (start) and the length of the line (length) before the automatic review occurs, so your program must generate the proper checksum with just those two values.
Test cases
Inputs:
(int) start = 0
(int) length = 3
Output:
(int) 2
Inputs:
(int) start = 17
(int) length = 4
Output:
(int) 14
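For reference, both test cases can be reproduced by simulating the procedure directly. This brute-force sketch is fine for understanding where the XOR values come from, although with IDs up to 2000000000 the exercise presumably expects a faster closed-form XOR-of-a-range approach:

def answer(start, length):
    checksum = 0
    for row in range(length):
        row_start = start + row * length   # ID of the first worker in this pass
        for i in range(length - row):      # the guards check one fewer worker each pass
            checksum ^= row_start + i
    return checksum

print(answer(0, 3))    # 2
print(answer(17, 4))   # 14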
You want to look at bitwise operations, this seems to be a somewhat reasonable article about it: https://www.programiz.com/c-programming/bitwise-operators
The basic idea behind bitwise operations is that each number is represented in base 2 (binary) inside the computer, and the bitwise operators work directly on that representation. Operators such as XOR (^), AND (&) and OR (|) all use it.
A binary number is expressed like this, and can be converted into a decimal representation (and vice versa):
0b1101 = 1 + 4 + 8 = 13
Where every bit represents a power of two.
When XORing two numbers, say 0b1100 and 0b1010, you create a new number in which only those bits are set that differ between the two arguments:
0b1100 ^ 0b1010 = 0b0110
From your concrete example: 0^1^2^3^4^6 == 2
0^1 = 0b0000^0b0001 = 1
1^2 = 0b0001^0b0010 = 3
3^3 = 0b0011^0b0011 = 0
0^4 = 0b0000^0b0100 = 4
4^6 = 0b0100^0b0110 = 2
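You can check both results directly in the interpreter:

from functools import reduce
from operator import xor

print(0 ^ 1 ^ 2 ^ 3 ^ 4 ^ 6)                                   # 2
print(reduce(xor, [17, 18, 19, 20, 21, 22, 23, 25, 26, 29]))   # 14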
Let's look at the values in the first example. Keep in mind that the bitwise operators work on the binary representations of these values. The result of the expression you posted is indeed 2, as you can check in the Python interpreter.
XOR acts as a parity operator on its inputs: in each bit position, if the number of 1 bits is even the result bit is 0; if it's odd, the result bit is 1. Let's count how many 1 bits are in each binary column:
0000
0001
0010
0011
0100
0110
----
0232 -- decimal count
0010 -- 1 for odd, 0 for even
... and this is the decimal number 2.
If you do the same thing with the longer sequence of larger numbers, XOR's parity comes out to 1110, or 14 decimal.
Note that this will detect several classes of simple errors, but its main weakness is that it can't detect when two items are swapped. For instance, [1, 2, 3] and [2, 1, 3] have the same checksum. There's a simple upgrade called the cyclic redundancy check (CRC) that does an XOR but rotates the input one more place for each item: the first item is rotated 1 bit, the second item 2 bits, and so on.
From a simulation tool I get a binary file containing some measurement points. What I need to do is: parse the measurement values and store them in a list.
According to the documentation of the tool, the data structure of the file looks like this:
First 16 bytes are always the same:
Bytes 0 - 7 char[8] Header
Byte 8 u. char Version
Byte 9 u. char Byte-order (0 for little endian)
Bytes 10 - 11 u. short Record size
Bytes 12 - 15 char[4] Reserved
The quantities are following: (for example one double and one float):
Bytes 16 - 23 double Value of quantity one
Bytes 24 - 27 float Value of quantity two
Bytes 28 - 35 double Next value of quantity one
Bytes 36 - 39 float Next value of quantity two
I also know, that the encoding is little endian.
In my use case there are two quantities, and both of them are floats.
My code so far looks like this:
def parse(self, filePath):
    infoFilePath = filePath + '.info'
    quantityList = self.getQuantityList(infoFilePath)

    blockSize = 0
    for quantity in quantityList:
        blockSize += quantity.bytes

    with open(filePath, 'r') as ergFile:
        # read the first 16 bytes, as they are not needed now
        ergFile.read(16)

        # now read the rest of the file block-wise
        block = ergFile.read(blockSize)
        while len(block) == blockSize:
            for q in quantityList:
                q.values.append(np.fromstring(block[:q.bytes], q.dataType)[0])
                block = block[q.bytes:]
            block = ergFile.read(blockSize)
    return quantityList
QuantityList comes from a previous function and contains the quantity structure. Each quantity has a name, dataType, lenOfBytes called bytes and a prepared list for the values called values.
So in my use case there are two quantities with:
dataType = "<f"
bytes = 4
values=[]
After the parse function has finished, I plot the first quantity with matplotlib. As you can see from the attached images (my parsed values versus the reference), something went wrong during the parsing, but I am not able to find my fault.
I was able to solve my problem this morning.
The solution couldn't be any easier.
I changed
...
with open(ergFilePath, 'r') as ergFile:
...
to:
...
with open(ergFilePath, 'rb') as ergFile:
...
Notice the change of mode from 'r' to 'rb'.
The Python documentation made things clear for me:
Thus, when opening a binary file, you should append 'b' to the mode
value to open the file in binary mode, which will improve portability.
(Appending 'b' is useful even on systems that don’t treat binary and
text files differently, where it serves as documentation.)
With this change the final parsed values match the reference.
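For completeness, a minimal sketch of the parsing for the layout described above, assuming the use case of two little-endian float quantities per record (for the documented double-plus-float example you would use '<df' and a 12-byte record instead):

import struct

def parse_two_floats(file_path):
    values_q1 = []
    values_q2 = []
    with open(file_path, 'rb') as erg_file:   # binary mode is the crucial part
        erg_file.read(16)                     # skip header: char[8], version, byte order, record size, reserved
        record = erg_file.read(8)             # two 4-byte little-endian floats per record
        while len(record) == 8:
            q1, q2 = struct.unpack('<ff', record)
            values_q1.append(q1)
            values_q2.append(q2)
            record = erg_file.read(8)
    return values_q1, values_q2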
Using Python (3.1 or 2.6), I'm trying to read data from binary data files produced by a GPS receiver. Data for each hour is stored in a separate file, each of which is about 18 MiB. The data files have multiple variable-length records, but for now I need to extract data from just one of the records.
I've got as far as being able to decode, somewhat, the header. I say somewhat because some of the numbers don't make sense, but most do. After spending a few days on this (I've started learning to program using Python), I'm not making progress, so it's time to ask for help.
The reference guide gives me the message header structure and the record structure. Headers can be variable length but are usually 28 bytes.
Header
Field # Field Name Field Type Desc Bytes Offset
1 Sync char Hex 0xAA 1 0
2 Sync char Hex 0x44 1 1
3 Sync char Hex 0x12 1 2
4 Header Lgth uchar Length of header 1 3
5 Message ID ushort Message ID of log 2 4
8 Message Lgth ushort length of message 2 8
11 Time Status enum Quality of GPS time 1 13
12 Week ushort GPS week number 2 14
13 Milliseconds GPSec Time in ms 4 16
Record
Field # Data Bytes Format Units Offset
1 Header 0
2 Number of SV Observations 4 integer n/a H
*For first SV Observation*
3 PRN 4 integer n/a H+4
4 SV Azimuth angle 4 float degrees H+8
5 SV Elevation angle 4 float degrees H+12
6 C/N0 8 double db-Hz H+16
7 Total S4 8 double n/a H+24
...
27 L2 C/N0 8 double db-Hz H+148
28 *For next SV Observation*
SV Observation is satellite - there could be anywhere from 8 to 13
in view.
Here's my code for trying to make sense of the header:
import struct
filename = "100301_110000.nvd"
f = open(filename, "rb")
s = f.read(28)
x, y, z, lgth, msg_id, mtype, port, mlgth, seq, idletime, timestatus, week, millis, recstatus, reserved, version = struct.unpack("<cccBHcBHHBcHLLHH", s)
print(x, y, z, lgth, msg_id, mtype, port, mlgth, seq, idletime, timestatus, week, millis, recstatus, reserved, version)
It outputs:
b'\xaa' b'D' b'\x12' 28 274 b'\x02' 32 1524 0 78 b'\xa0' 1573 126060000 10485760 3545 35358
The 3 sync fields should return xAA x44 x12. (D is the ascii equiv of x44 - I assume.)
The record ID for which I'm looking is 274 - that seems correct.
GPS week is returned as 1573 - that seems correct.
Milliseconds is returned as 126060000 - I was expecting 126015000.
How do I go about finding the records identified as 274 and extracting them? (I'm learning Python, and programming, so keep in mind the answer you give an experienced coder might be over my head.)
You have to read in pieces. Not because of memory constraints, but because of the parsing requirements. 18MiB fits in memory easily. On a 4Gb machine it fits in memory 200 times over.
Here's the usual design pattern.
Read the first 4 bytes only. Use struct to unpack just those bytes.
Confirm the sync bytes and get the header length.
If you want the rest of the header, you know the length, read the rest of the bytes.
If you don't want the header, use seek to skip past it.
Read the first four bytes of a record to get the number of SV Observations. Use struct to unpack it.
Do the math and read the indicated number of bytes to get all the SV Observations in the record.
Unpack them and do whatever it is you're doing.
I strongly suggest building namedtuple objects from the data before doing anything else with it.
If you want all the data, you have to actually read all the data.
"and without reading an 18 MiB file one byte at a time)?" I don't understand this constraint. You have to read all the bytes to get all the bytes.
You can use the length information to read the bytes in meaningful chunks. But you can't avoid reading all the bytes.
Also, lots of reads (and seeks) are often fast enough. Your OS buffers for you, so don't worry about trying to micro-optimize the number of reads.
Just follow the "read length -- read data" pattern.
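A sketch of that pattern, with the field offsets taken from the header table in the question (this assumes the "Message Lgth" field gives the exact number of bytes between the end of the header and the next sync pattern; if the format appends a CRC or padding after each message, add that to the amount read or skipped):

import struct

def read_messages(path, wanted_id=274):
    """Yield the raw body of every log whose message ID matches wanted_id."""
    with open(path, 'rb') as f:
        while True:
            start = f.read(4)                 # 3 sync bytes plus the header length byte
            if len(start) < 4:
                break                         # end of file
            if start[:3] != b'\xaa\x44\x12':
                raise ValueError('lost sync at offset %d' % (f.tell() - 4))
            hdr_len, = struct.unpack('<B', start[3:4])
            rest = f.read(hdr_len - 4)        # the remainder of the header
            msg_id, = struct.unpack_from('<H', rest, 0)   # header bytes 4-5
            msg_len, = struct.unpack_from('<H', rest, 4)  # header bytes 8-9
            body = f.read(msg_len)
            if msg_id == wanted_id:
                yield body

for body in read_messages('100301_110000.nvd'):
    n_obs, = struct.unpack_from('<i', body, 0)   # record field 2: number of SV observations
    print(n_obs)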
18 MB should fit comfortably in memory, so I'd just gulp the whole thing into one big string of bytes with a single with open(thefile, 'rb') as f: data = f.read() and then perform all the "parsing" on slices to advance record by record. It's more convenient, and may well be faster than doing many small reads from here and there in the file (though it doesn't affect the logic below, because in either case the "current point of interest in the data" is always moving [[always forward, as it happens]] by amounts computed based on the struct-unpacking of a few bytes at a time, to find the lengths of headers and records).
Given the "start of a record" offset, you can determine its header's length by looking at just one byte ("field four", offset 3 from start of header that's the same as start of record) and look at message ID (next field, 2 bytes) to see if it's the record you care about (so a struct unpack of just those 3 bytes should suffice for that).
Whether it's the record you want or not, you next need to compute the record's length (either to skip it or to get it all); for that, you compute the start of the actual record data (start of record plus length of header plus the next field of the record (the 4 bytes right after the header) times the length of an observation (32 bytes if I read you correctly).
This way you either isolate the substring to be given to struct.unpack (when you've finally reached the record you want), or just add the total length of header + record to the "start of record" offset, to get the offset for the start of the next record.
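A sketch of that slicing approach, reading the whole file once and walking forward by computed offsets (the same caveat about a possible trailing CRC applies, and the '<iffdd' format covers just the first five fields of an observation, per the record table above):

import struct

with open('100301_110000.nvd', 'rb') as f:
    data = f.read()          # ~18 MiB, comfortably held in memory

pos = 0
while pos + 10 <= len(data):
    # field 4 (header length) and field 5 (message ID) sit at offsets 3 and 4
    hdr_len, msg_id = struct.unpack_from('<BH', data, pos + 3)
    msg_len, = struct.unpack_from('<H', data, pos + 8)   # the 'Message Lgth' field
    if msg_id == 274:
        rec = pos + hdr_len                              # start of the record data
        n_obs, = struct.unpack_from('<i', data, rec)
        # first observation: PRN, azimuth, elevation, C/N0, Total S4 (offsets H+4 onwards)
        prn, azimuth, elevation, cn0, s4 = struct.unpack_from('<iffdd', data, rec + 4)
        print(prn, azimuth, elevation, cn0, s4)
    pos += hdr_len + msg_len                             # jump to the next header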
Apart from writing a parser that correctly reads the file, you may try a somewhat brute-force approach: read the whole file into memory and split it on the 'Sync' sentinel. Warning - you might get some false positives. But...
f = open('filename', 'rb')
data = f.read()
messages = data.split(b'\xaa\x44\x12')
# after the 3 sync bytes, byte 0 of each piece is the header length and
# bytes 1-2 are the little-endian message ID (274 == 0x0112 == b'\x12\x01')
mymessages = [msg for msg in messages if len(msg) > 5 and msg[1:3] == b'\x12\x01']
But it is a rather nasty hack...