Get the duration of an MP3 on a microcontroller - Python

Good day
I have been asked to do a project which consists of an STM32 and a VS1003: a FAT32 USB-host MP3 player.
All the parts are done, but now I need to get the duration of a song.
Unfortunately the TLEN tag is not available on all the songs, so I can't count on it.
My understanding is that an MP3 is made of frames, each frame lasts 0.026 seconds, and each frame starts with 0xFF 0xFx (x can be anything). So I need to search for 0xFF 0xFx in two consecutive bytes and count the matches, then multiply by 0.026 to get the duration.
Since the microcontroller has limited SRAM, the file needs to be read from USB 2048 bytes at a time. I decided to test this theory on a computer in Python first, then port it to C on the microcontroller (for ease of testing the algorithm), but the numbers I'm getting are a lot higher than expected.
For example, one MP3 gives me 25300 occurrences of 0xFF 0xFx, which translates to 657.5 seconds, but I know it is in fact 187 seconds.
It seems that 0xFF 0xFx appears in the middle of the audio data too.
Is there any reliable way to count the headers? Or is there any other way to get the length without counting headers?
Any notes or basic code (in Python, C, or JS) are appreciated in advance.

The frame sync marker is not 0xFFFx where x is any four bits; it's 0xFFFx or 0xFFEx (eleven set bits in a row). Because the same byte patterns can appear in the audio data, a brute-force search for them won't work -- you'll have to find the first instance of the sync marker, and then calculate the byte length of each frame from the bitrate in the frame header, so you can skip straight to the next header. There's a post on that calculation already, here:
Formula from mp3 Frame Length
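Here is a minimal Python sketch of that approach, assuming MPEG-1 Layer III throughout (the usual case for 44.1/48 kHz files) and ignoring ID3 tags and VBR (Xing) headers; a robust version would also check the version and layer bits. It finds a sync word, computes the frame length from the header, and jumps whole frames instead of scanning byte by byte:

# A sketch, not a full parser: assumes MPEG-1 Layer III frames only.
BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]  # kbit/s
SAMPLE_RATES = [44100, 48000, 32000]  # MPEG-1 sample-rate indices 0..2

def mp3_duration(path):
    with open(path, "rb") as f:
        data = f.read()  # on the MCU you would stream 2048-byte chunks instead
    i, frames, seconds = 0, 0, 0.0
    while i + 4 <= len(data):
        b1, b2, b3 = data[i], data[i + 1], data[i + 2]
        # 11-bit frame sync: 0xFF followed by a byte whose top 3 bits are set
        if b1 == 0xFF and (b2 & 0xE0) == 0xE0:
            bitrate_idx = (b3 >> 4) & 0x0F
            sr_idx = (b3 >> 2) & 0x03
            padding = (b3 >> 1) & 0x01
            if 0 < bitrate_idx < 15 and sr_idx < 3:
                sr = SAMPLE_RATES[sr_idx]
                frame_len = 144 * BITRATES[bitrate_idx] * 1000 // sr + padding
                seconds += 1152.0 / sr  # MPEG-1 Layer III: 1152 samples per frame
                frames += 1
                i += frame_len  # skip the whole frame body
                continue
        i += 1  # false sync, resynchronise one byte later
    return frames, seconds

print(mp3_duration("song.mp3"))

Summing per-frame durations like this also handles VBR files; for a pure CBR file you could read one header and divide the audio size by the frame length instead.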

Related

Decode serial strings from recorded voltage signal

I used an Arduino (a Teensyduino) to intermittently print strings over Serial. These strings are basically integers ranging from 1 to 1000,
e.g.
Serial1.print("456");
delay(1000);
Serial1.print("999");
At the same time, I directly record the voltage output from the serial transmission pin, using some data acquisition system sampling at 30000 Hz. The voltage recording occurs over the span of an hour, where multiple strings are printed at random times during the hour. These strings are printed typically several seconds apart.
I have a recording of the entire voltage signal across an hour, which I will analyse offline in Python. Given this vector of 0-5V values across an hour, how do I detect all occurrences of strings printed, and also decode all the strings from the voltage values? e.g. retrieve '456' and '999' from the example above
Okay, if you want to do it from scratch, you're doing this wrong.
The first thing you need to know is the transmission protocol. If you can transmit whatever you want from the Teensy, then you've got yourself what is called an oracle, and you're already halfway to the goal: start transmitting different bit sequences (0xFF, 0xF0, 0x0F, 0x00) and see what gets transmitted along the line, and how. Since the Teensy is almost certainly using straight 9600 8N1, you are now at exactly this stage (you could reproduce the oscilloscope picture from the voltage data if you wanted).
Read those answers, and you'll get the rest of the way to working Python code that translates voltage levels to bits and then to characters.
If you don't have an oracle, it gets more complicated. My own preference in that case would be to get a pliable Teensy all for myself and do the first part there. Otherwise, you have to first read the post above, then work it backwards by looking at the data recordings, which will be much more difficult.
In a pinch, in the oracle scenario, you could even shoot yourself all codes from '0' to '9' -- or from 0x00 to 0xFF, or from '0000' to '9999' if that's what it takes -- then use a convolution to match the codes against whatever's on the wire, and that would get you the decoded signal without even knowing what protocol was used. (I did it once, and can guarantee that it can be done. It was back in the middle ages and the decoder ran on an 80286, so it took about four or five seconds to decode each character's millisecond-burst using a C program. Nowadays you could do it in real time, I guess.)
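For the 9600 8N1 case, the decoding itself can be quite short. Below is a sketch under stated assumptions: idle-high 0-5 V TTL levels, a NumPy array v of voltage samples at 30000 Hz, and a mid-scale threshold; all names are placeholders. Note that 30000 Hz gives only about 3.1 samples per bit, so mid-bit sampling is tight, and a serious decoder would track the edge timing more carefully:

import numpy as np

FS = 30000.0          # sample rate of the voltage recording
BAUD = 9600.0
SPB = FS / BAUD       # ~3.125 samples per bit

def decode_uart(v, threshold=2.5):
    bits = (np.asarray(v) > threshold).astype(np.uint8)  # 1 = idle/high
    out = []
    i = 1
    while i < len(bits) - int(10 * SPB):
        if bits[i - 1] == 1 and bits[i] == 0:      # falling edge = start bit
            byte = 0
            for n in range(8):                     # 8 data bits, LSB first
                pos = int(i + (n + 1.5) * SPB)     # middle of data bit n
                byte |= int(bits[pos]) << n
            out.append(byte)
            i += int(10 * SPB)                     # start + 8 data + stop bit
        else:
            i += 1
    return bytes(out)

For the example above, decode_uart(v) would return something like b'456999', and the positions of the bursts tell you when each string was printed.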

Why are random video seeks with OpenCV slow?

Seeking to random points in a video file with OpenCV seems to be much slower than in media players like Windows Media Player or VLC. I am trying to seek to different positions in a video file encoded in H.264 (MPEG-4 AVC, Part 10) using VideoCapture, and the time taken to seek seems to be proportional to the frame number queried. Here's a small code example of what I'm trying to do:
import cv2

cap = cv2.VideoCapture('example_file')
frame_positions = [200, 400, 8000, 200000]
for frame_position in frame_positions:
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_position)  # cv2.cv.CV_CAP_PROP_POS_FRAMES on 2.4.x
    ret, img = cap.read()  # read() returns (status, frame)
    cv2.imshow('window', img)
    cv2.waitKey(0)
The perceived times until the images are displayed are proportional to the frame number. That is, frame numbers 200 and 400 barely have any delay, 8000 has some noticeable lag, but 200000 takes almost half a minute.
Why isn't OpenCV able to seek as "quickly" as say Windows Media Player? Could it be that OpenCV is not using the FFMPEG codecs correctly while seeking? Would building OpenCV from sources with some alternate configuration for codecs help? If so, could someone tell me what the configuration could be?
I have only tested this on Windows 7 and 10 PCs, with OpenCV binaries as is, with relevant FFMPEG DLLs in system path.
Another observation: with OpenCV (binary) versions greater than 2.4.9 (e.g. 2.4.11, 3.3.0), the first seek works, but not the subsequent ones. That is, it can seek to frame 200 from the above example, but not to 400 and the rest; the video just jumps back to frame 0. But since it works for me with 2.4.9, I'm happy for now.
GPU acceleration should not matter for seeking, because you are not decoding frames. In addition, even if you were decoding frames, doing so on the GPU would be slower than on the CPU, because modern CPUs have video codecs "soldered" into the chip, which makes video decoding very fast, and there would have to be some book-keeping to shovel data from main memory into the GPU.
It sounds like OpenCV implements a "safe" way of seeking: video files can contain stream offsets. For example, your audio stream may be offset against your video stream. As another example, you might have cut away the beginning of a video and saved the result. If your cut did not happen precisely at a key frame, video editing software like ffmpeg will include a small number of frames before your cut in the output file, in order to allow the frame at which your cut happened to be decoded properly (the previous frames might be necessary for that). In this case, too, there will be a stream offset.
In order to make sure that such offsets are interpreted the right way, that is, to really hit exactly the desired frame relative to "time 0", the only "easy" but expensive way is to actually read and decode all the video frames. And that's apparently what OpenCV is doing here. Your video players don't bother with this, because everyday users don't notice, and the controls in the GUI are much too imprecise anyway.
I might be wrong about this, but answers to other questions, and some experiments I conducted to evaluate them, showed that only the "slow" way of counting the frames in a video gave accurate results.
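A quick sanity check for this explanation is to time the seeks: if OpenCV really decodes its way to the target, the time per seek should grow roughly linearly with the frame number. A small sketch (the file name is a placeholder):

import time
import cv2

cap = cv2.VideoCapture("example_file.mp4")
for target in (200, 400, 8000, 200000):
    t0 = time.perf_counter()
    cap.set(cv2.CAP_PROP_POS_FRAMES, target)  # seek
    ok, img = cap.read()                      # decode one frame at the target
    print(target, ok, round(time.perf_counter() - t0, 3), "s")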
It's likely because that is a very basic code example and the mentioned applications are doing something more clever.
A few points:
Windows Media Player has hardware acceleration
Windows Media Player almost definitely uses your GPU; you could try disabling this to see what difference it makes
VLC is an open source project, so you could check out its code to see how it does video seeking
VLC probably also uses your GPU
OpenCV provides GPU functions that will most likely make your code much quicker
If seeking speed is important, you almost definitely want to work with the GPU when doing video operations:
https://github.com/opencv/opencv/blob/master/samples/gpu/video_reader.cpp
Here are some related github issues:
https://github.com/opencv/opencv/issues/4890
https://github.com/opencv/opencv/issues/9053
Re-encode your video with ffmpeg. It works for me -- presumably because the re-encode produces a clean index and regular keyframes to seek to.

Why does my laptop get stuck when working with a Python list?

I have a video file, and all I want for now is to put all of the video's frames into a Python list. I am using Python's OpenCV library to do it, but my laptop can never finish: it just gets stuck and I have to cut the power to restart it. My guess is that the Python list is unable to handle all the frames due to a memory shortage. Below is the code, and I believe the syntax is the right way to do what I want. What I need to know is why the laptop gets stuck, and any solution other than using a list.
import cv2

video = cv2.VideoCapture("myvideo.mp4")
all_frames = []
while True:
    ret, frame = video.read()
    if ret:
        all_frames.append(frame)
        continue
    break
Below is some data about the video that might help you:
The video contains 7000 frames.
Every frame has dimensions (1080, 1920).
You can't afford to do it that way.
When reading, the frames are decompressed from the .mp4 to raw output, something like 3 bytes per pixel.
So you want to store 7000*3*1080*1920 bytes in total, which is roughly 43 GB!
Not to mention that the repeated resizing of the list as append outgrows its capacity creates even more copying, so even if you had the memory available, this would take very long.
The idea behind this program is probably to analyse the frames, so you don't actually need all the frames in memory at the same time.
In that case, read a small number of them (into a revolving buffer), perform your shape-detection analysis or whatever, store the analysed data (much smaller), drop the raw data, and repeat. (Programs performing real-time analysis cannot store all the data, because they run forever.)
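A minimal sketch of that revolving-buffer idea; the per-frame mean here is just a stand-in for whatever analysis you actually run:

from collections import deque
import cv2

video = cv2.VideoCapture("myvideo.mp4")
buffer = deque(maxlen=30)   # holds at most 30 raw frames; older ones are dropped
results = []                # small derived data, one entry per frame

while True:
    ret, frame = video.read()
    if not ret:
        break
    buffer.append(frame)            # raw frames cycle through the bounded buffer
    results.append(frame.mean())    # keep a tiny statistic, drop the pixels
video.release()

Memory use is now bounded by the buffer size (30 frames is about 187 MB at this resolution) instead of the whole video.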

How do you read this .wav file byte data?

I used the Python wave module and read the first frame from a .wav file, and it returned this:
b'\x00\x00\x00\x00\x00\x00'
What does each byte mean, and will it be the same for every frame or just for some?
I've done some research into the subject and found that there are bytes in front of the sound data that give information about the .wav file. Does Python skip past this information straight to the sound data, or do I have to separate it out manually?
There are 2 channels and a sample width of 3, according to Python.
UPDATE
I have successfully created the waveform for the wav file; it wasn't as difficult as I first thought. Now to show it whilst the song is playing....
The wave module reads the header for you, which is why it can tell you how many channels there are, and what the sample width is.
Reading frames gives you direct access to the raw sample data, but because the WAV format is a bit of a mixed, confused beast, how you need to interpret each frame depends on the sample width and channel count. See this article for a good in-depth discussion of that.
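For the asker's case -- 2 channels with a sample width of 3, i.e. 24-bit little-endian signed PCM -- each 6-byte frame splits into two 3-byte samples. A sketch (the file name is a placeholder):

import wave

with wave.open("song.wav", "rb") as w:
    n_channels = w.getnchannels()   # 2
    sampwidth = w.getsampwidth()    # 3 bytes per sample
    raw = w.readframes(1)           # one frame = one sample per channel
    samples = [
        int.from_bytes(raw[ch * sampwidth:(ch + 1) * sampwidth],
                       "little", signed=True)
        for ch in range(n_channels)
    ]
    print(samples)  # [0, 0] for b'\x00\x00\x00\x00\x00\x00'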

.wav questions and python wave

The module "wave" of python gives me a list of hexadecimal bytes, that I can read like numbers. Let's say the frequency of my sample is 11025. Is there a 'header' in those bytes that specify this? I know I can use the wave method to get the frequency, but I wanna talk about the .wav file structure. It has a header? If I get those bytes, how do I know wich ones are the music and the ones that are information? If I could play these numbers in a speaker 11025 times per second with the intensity from 0 to 255, could I play the sound just like it is in the file?
Thanks!
.wav files are actually RIFF files under the hood. The WAVE section contains both the format information and the waveform data. Reading the codec, sample rate, sample size, and sample polarity from the format information will allow you to play the waveform data assuming you support the codec used.
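A minimal sketch of walking that structure by hand with the struct module, assuming a well-formed file; real files can contain extra chunks (LIST, fact, ...), which is why the loop skips anything it doesn't recognise:

import struct

with open("song.wav", "rb") as f:   # file name is a placeholder
    riff, riff_size, wave_id = struct.unpack("<4sI4s", f.read(12))
    assert riff == b"RIFF" and wave_id == b"WAVE"
    while True:
        header = f.read(8)
        if len(header) < 8:
            break
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        if chunk_id == b"fmt ":     # the format information
            fmt = f.read(chunk_size)
            (codec, channels, sample_rate,
             byte_rate, block_align, bits) = struct.unpack("<HHIIHH", fmt[:16])
            print(codec, channels, sample_rate, bits)
        elif chunk_id == b"data":   # the waveform data starts here
            audio = f.read(chunk_size)
            break
        else:
            f.seek(chunk_size + (chunk_size & 1), 1)  # skip unknown chunk (word-aligned)

For an 8-bit mono PCM file at 11025 Hz, the data chunk really is just one unsigned byte per sample, so playing those 0-255 values back 11025 times per second would reproduce the sound; for other codecs and sample widths you have to interpret the bytes accordingly.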
