Detecting digital clipping in audio signal - python

Given an audio byte array data in Python, obtained like so:
inp = alsaaudio.PCM(alsaaudio.PCM_CAPTURE, alsaaudio.PCM_NONBLOCK, card)
# Set attributes: Mono, 48000 Hz, 16 bit little endian samples
inp.setchannels(1)
inp.setrate(48000)
inp.setformat(alsaaudio.PCM_FORMAT_S16_LE)
l, data = inp.read()
How do I detect digital clipping? Which value does a sample in data have to exceed for me to be sure that it was digitally clipped?

Overdrive is basically gain distortion. It raises the voltage to the point where the driver simply cuts the top off and thus distorts the signal. In the digital domain this is hard clipping, so you need to search for values that hit the maximum threshold. With 16-bit audio the clip is going to sit at 0 dBFS by the nature of the format: if there are no bits left to store the value, the software chops it off at the maximum a 16-bit integer can hold (32767, or -32768 on the negative side). Unfortunately, if the track had previously been distorted and then had its volume lowered to blend into the mix better, you're probably not going to find it. Unless, that is, what you're examining is the only sound source on the track, in which case just find the maximum for the track and set that as your threshold. I can say, though, that hard clipping shows up as a flat-topped, square-like wave, so you could also search for runs of consecutive identical values lasting longer than a common audible wave period (so as to ignore legitimate square-wave tones). That's about the best I can do for you.
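A minimal numpy sketch of that check, assuming data is the raw S16_LE buffer returned by inp.read(); the run length RUN is an arbitrary choice for illustration, not something prescribed above:

import numpy as np

samples = np.frombuffer(data, dtype='<i2')   # interpret raw bytes as 16-bit little-endian

# Hard clipping: samples pinned at the int16 rails.
at_rail = (samples >= 32767) | (samples <= -32768)
print("samples at full scale:", int(at_rail.sum()))

# Stricter test: require RUN consecutive rail samples (a flat top),
# so a single full-scale peak does not count as clipping.
RUN = 4
runs = np.convolve(at_rail.astype(int), np.ones(RUN, dtype=int), mode='valid') == RUN
print("clipped region found:", bool(runs.any()))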

Related

What algorithms are used in audio limiters?

I'd like to recreate one in numpy or another Python library.
I mean a function that doesn't just clip all the samples above the threshold level or normalize the whole audio, but one that takes an audio waveform in the range (-1, 1), an attack time, a decay time and a threshold level in dB, reduces the volume of the samples above the threshold without distortion, and outputs a new sound.
All the solutions I've found so far either add distortion, like ffmpeg, or don't use 64-bit floating point calculations, like SoX.
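A minimal sketch of one way such a function could be built: a feed-forward limiter with a one-pole envelope follower. The parameter names and the time-constant mapping below are my own assumptions, not taken from any particular library, and the sample-by-sample loop is written for clarity rather than speed:

import numpy as np

def limiter(x, sr, threshold_db=-6.0, attack_ms=5.0, decay_ms=50.0):
    # Reduce gain only while the signal envelope exceeds the threshold.
    thresh = 10.0 ** (threshold_db / 20.0)
    att = np.exp(-1.0 / (sr * attack_ms / 1000.0))    # smoothing while the level rises
    dec = np.exp(-1.0 / (sr * decay_ms / 1000.0))     # smoothing while the level falls
    env = 0.0
    gain = np.ones(len(x))
    for n, s in enumerate(np.abs(x)):
        coeff = att if s > env else dec
        env = coeff * env + (1.0 - coeff) * s          # envelope follower
        if env > thresh:
            gain[n] = thresh / env                     # bring the envelope back down to the threshold
    return x * gain

A real limiter would also add a short look-ahead delay so the gain can come down before a peak arrives; this sketch omits that.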

Using Fourier Transform to Transform Image into Sound (I don't think it's working)

Backstory
I started messing with electronics and realized I need an oscilloscope. I went to buy an oscilloscope (for about $40) online and watched tutorials on how to use them. I stumbled upon a video using the "X-Y" function of the oscilloscope to draw images; I thought that was cool. I tried searching how to do this from scratch and learned that you need to convert the image into the frequency domain, somehow convert that into an audio signal, and send the signal to the two channels of the oscilloscope from the left and right channels of the audio output. So now I am trying to do the image processing part.
What I Got So Far
Choosing an Image
First thing I did was to create an nxn image using some drawing software. I've read online that the total number of pixels of the image should be a power of two. I don't know why, but I created 256x256 pixel images to minimize calculation time. Here is the image I used for this example.
I kept the image simple, so I can vividly see the symmetry when it is transformed. Therefore, if there is no symmetry, then there must be something wrong.
The MATLAB Code
The first thing I did was read the image, convert it to gray scale, change the data type, and grab the size of the image (so the code works for other sizes later).
%Read image
img = imread('tets.jpg');
%Convert image to gray scale
grayImage = rgb2gray(img);
%Incompatibility of data types: uint8 vs double
grayImage = double(grayImage);
%Grab size of image
[nx, ny, nz] = size(grayImage);
The Algorithm
This is where things get a bit hazy. I am somewhat familiar with the Fourier Transform from some Mechanical Engineering classes, but the topic was only broadly introduced and never really a fundamental part of the course. It was more like, "Hey, check out this thing; but use the Laplace Transform instead."
So somehow you have to incorporate spatial position, amplitude, frequency, and time when doing the calculation. I understand that the spatial coordinates are just the location of each pixel on the image in a matrix or bitmap. I also understand that the amplitude is just the gray scale value, from 0 to 255, of a certain pixel. However, I don't necessarily know how to incorporate frequency and time based on the pixel itself. I think I read somewhere that the frequency increases as the y location of the pixel increases, and the time variable increases with the x location. Here's the link (read the first part of Part II).
So I tried following the formula as well as other formulas online and this is what I got for the MATLAB code.
if nx ~= ny
    error('Image size must be NxN.'); %for some reason
else
    %prepare transformation matrix
    DFT = zeros(nx,ny);
    %compute transformation for each pixel
    for ii = 1:1:nx
        for jj = 1:1:ny
            amplitude = grayImage(ii,jj);
            DFT(ii,jj) = amplitude * exp(-1i * 2 * pi * ((ii*ii/nx) + (jj*jj/ny)));
        end
    end
    %plot of complex numbers
    plot(DFT, '*');
    %calculate magnitude and phase
    magnitudeAverage = abs(DFT)/nx;
    phase = angle(DFT);
    %plot magnitudes and phase
    figure;
    plot(magnitudeAverage);
    figure;
    plot(phase);
end
This code simply tries to follow this discrete Fourier transform example video that I found on YouTube. After the calculation I plotted the complex numbers in the complex plane. The plot appears to be in polar coordinates; I don't know why. As discussed in the video regarding the Nyquist limit, I plotted the average magnitude too, as well as the phase angles of the complex numbers. I'll just show you the plots!
The Plots
Complex Numbers
This is the complex plot; I believe it's in polar form instead of Cartesian, but I don't know. It appears symmetric too.
Average Amplitude Vs. Sample
The vertical axis is amplitude, and the horizontal axis is the sample number. This looks like the deconstruction of the signal, but then again I don't really know what I am looking at.
Phase Angle Vs. Sample
The vertical axis is the phase angle, and the horizontal axis is the sample number. This looks the most promising because it resembles a plot in the frequency domain, but this isn't supposed to be a plot in the frequency domain; rather, it's a plot in the sample domain? Again, I don't know what I am looking at.
I Need Help Understanding
I need to somehow understand these plots, so I know I am getting the right ones. I believe there may be something very wrong in the algorithm because it doesn't really implement the frequency and time components. So maybe you can tell me how that is done? Or at least guide me?
TLDR;
I am trying to convert images into sound files to display on an oscilloscope. I am stuck on the image processing part. I believe there is something wrong with the MATLAB code (see above) because it doesn't really include the frequency and time component of each pixel. I need help with the code and with understanding how to interpret the result, so I know the transformations are correct-ish.
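One thing worth checking: the loop above assigns a single complex value to each pixel, whereas the standard 2D DFT sums over every pixel for each output bin (u, v); in MATLAB, fft2(grayImage) computes exactly that. Here is a small numpy sketch of the definition, useful only as a reference to compare against, not a fix of the MATLAB code above:

import numpy as np

def dft2_naive(img):
    # F[u, v] = sum over x, y of img[x, y] * exp(-2j*pi*(u*x + v*y)/N)
    n = img.shape[0]
    k = np.arange(n)
    w = np.exp(-2j * np.pi * np.outer(k, k) / n)   # w[u, x] = exp(-2j*pi*u*x/n)
    return w @ img @ w.T

# For any square float array img, np.allclose(dft2_naive(img), np.fft.fft2(img)) holds.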

What are the advantages / disadvantages between the different predefined ArUco dictionaries?

I want to use ArUco markers to detect objects and use a predefined dictionary.
I only need a small number of different markers, about 10. I am now wondering what the advantages and disadvantages are between the different predefined dictionaries.
The dictionaries differ in the number of markers and in bit size.
My thoughts so far:
Having a lower number of markers increases the inter-marker distance and thus reduces the chance of faulty marker ID classification. However, the maximum number of available unique markers is lower.
Having a lower bit size helps to identify the markers better when their pixel size in the captured image is small (markers printed small or far away in the image). However, the maximum number of available unique markers is lower.
Is my thought process so far correct? Did I miss anything?
So for me, only needing 10 different markers, should I stick to the DICT_4X4_50 dictionary to achieve the best marker detection results?
Or would it even be better to create my own dictionary with even less markers to increase inter marker distance?
I am the main ArUco developer. I personally recommend the first 10 markers of the ARUCO_MIP_36h12 dictionary. Unless you are working at an extremely low resolution, there is no real improvement in working with small markers such as 4x4 or 3x3. This is because internally the library reduces the detected marker to a small size (around 50x50 bits, regardless of its dimensions in the actual image), and it is at this resolution that the code is analyzed.
The full pipeline of the ArUco library is described in Sect. 3.2 of the latest paper:
https://www.researchgate.net/publication/325787310_Speeded_Up_Detection_of_Squared_Fiducial_Markers
You can also find more information in the documentation at
https://docs.google.com/document/d/1QU9KoBtjSM2kF6ITOjQ76xqL7H0TEtXriJX5kwi9Kgc
Complementing Rafael's answer on the bit size, here is the relevant quote from the docs:
Markers are comprised of an external black border and an inner region that encodes a binary pattern. The binary pattern is unique and identifies each marker. Depending on the dictionary, there are markers with more or fewer bits. The more bits, the more words in the dictionary, and the smaller the chance of a confusion. However, more bits means that more resolution is required for correct detection.
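For completeness, a minimal detection sketch using one of the predefined dictionaries discussed above. It assumes OpenCV 4.7 or newer (where the ArUco API is exposed as cv2.aruco.ArucoDetector) and an arbitrary input image name:

import cv2

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

frame = cv2.imread("scene.png")                    # hypothetical input image
corners, ids, rejected = detector.detectMarkers(frame)
print("detected marker IDs:", None if ids is None else ids.ravel().tolist())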

Python: ultrasonic to audio range

I'm using Python 2.7.3 and I have a question relating to ultrasonic frequencies:
Sampling at 40 MHz, I measure an ultrasonic signal that's a convolution of a 1 MHz resonant frequency and an envelope, where the envelope depends on the medium through which the ultrasonic signal travels. I would like to listen to this received signal. My question is:
How may I map the received signal into the range of human hearing? Or put another way,
How may I down-sample and convert this signal to an audio frequency (keeping the envelope shape, and maybe even stretching the time so it's longer)?
The signal is simulated here, but it's typically like this in any case:
import numpy as np
import matplotlib.pylab as plt
# resonant frequency is 1MHz
f = 1e6
Omega = 2*np.pi*f
# sample at 40 MHz, i.e. ts = 25 ns, for about 1000 samples:
t = np.arange(0,25e-6,25e-9)
y = np.sin(Omega*t) * (t**2) * np.exp(-t/3e-6)
y /= max(y)
plt.plot(y)
plt.grid()
plt.xlabel('sample')
plt.ylabel('value')
plt.show()
There are two common answers to your question:
Just play it back at a fraction of the sampling frequency. If you play your signal back at, e.g., a 44.1 kHz sampling rate, you will get an audible tone of approximately 1000 Hz and a signal length of roughly 20 ms. (I picked 44.1 kHz as it is certainly one of the frequencies any hardware can play back.) This is probably easiest to accomplish by saving your signal into a WAV file (see the wave module; a minimal example is sketched below) and then playing it back with anything that plays WAV files.
The standard method would be to mix the resonant frequency down to audible frequencies. This is the fundamental thing in radios. Mathematically it involves multiplying by a carrier frequency which is close to the resonant frequency, and then low-pass filtering the result. The operation can also be viewed as shifting the frequency spectrum closer to 0. However, as your signal envelope is very fast (0.25 ms), this would only result in a short click and thus not be useful here.
Other solutions can be figured out, if there are further requirements. The envelope frequency and the resonant frequency seem to be relatively close to each other, which limits the options. If you need to do this for a real time signal, then the challenge will be elongating the envelope, because then the envelope has to be detected. Otherwise it is not possible to stretch the time.
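A minimal sketch of the first option: write the simulated y array from the question to a 16-bit mono WAV at 44.1 kHz (the file name is arbitrary):

import wave
import numpy as np

# y is the simulated signal from the question, normalized to roughly [-1, 1]
pcm = (np.clip(y, -1.0, 1.0) * 32767).astype(np.int16)

wf = wave.open("ultrasound_slowed.wav", "wb")
wf.setnchannels(1)        # mono
wf.setsampwidth(2)        # 16-bit samples
wf.setframerate(44100)    # play the 40 MHz capture back at 44.1 kHz
wf.writeframes(pcm.tobytes())
wf.close()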
I wanted to make this a comment, but I have some examples.
There would be many ways to represent this. You could use sound as an encoding medium.
If your original waveform has only a few properties, like a (constant) frequency and a (variable, but approximable) envelope, you could for example encode the frequency in binary form as a short sequence of sounds and silences (1 = generate sound, 0 = generate silence), and then represent the amplitude with a tone of variable frequency (e.g. a 100 Hz sound would represent zero amplitude and a 10000 Hz sound would represent maximum amplitude). To rebuild the original envelope, you could use interpolation.
I hope you see my point.

Efficient 2D edge detection in Python

I know that this problem has been solved before, but I've had great difficulty finding any literature describing the algorithms used to process this sort of data. I'm essentially doing some edge finding on a set of 2D data. I want to be able to find a couple of points on an eye diagram (generally used to qualify high speed communications systems), and as I have no experience with image processing I am struggling to write efficient methods.
As you can probably see, these diagrams are so called because they resemble the human eye. They can vary a great deal in the thickness, slope, and noise, depending on the signal and the system under test. The measurements that are normally taken are jitter (the horizontal thickness of the crossing region) and eye height (measured at either some specified percentage of the width or the maximum possible point). I know this can best be done with image processing instead of a more linear approach, as my attempts so far take several seconds just to find the left side of the first crossing. Any ideas of how I should go about this in Python? I'm already using NumPy to do some of the processing.
Here's some example data; it is formatted as a 1D array with associated x-axis data. For this particular example, it should be split up every 666 points (2 * int((1.0 / 2.5e9) / 1.2e-12)), since the rate of the signal was 2.5 Gb/s, and the time between points was 1.2 ps.
Thanks!
Have you tried OpenCV (Open Source Computer Vision)? It's widely used and has Python bindings.
Not to be a PITA, but are you sure you wouldn't be better off with a numerical approach? All the tools I've seen for eye-diagram analysis go the numerical route; I haven't seen a single one that analyzes the image itself.
You say your algorithm is painfully slow on that dataset -- my next question would be why. Are you looking at an oversampled dataset? (I'm guessing you are.) And if so, have you tried decimating the signal first? That would at the very least give you fewer samples for your algorithm to wade through.
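For instance, if the record really does fold cleanly every 666 samples as described in the question, a rough eye-height estimate takes only a few numpy lines. Splitting the two rails at the overall mean is a simplification of my own, so treat this as a sketch rather than a proper measurement:

import numpy as np

ui = 666                                           # samples per eye trace, from the question
traces = np.asarray(data, dtype=float)
traces = traces[:traces.size // ui * ui].reshape(-1, ui)

# Split samples into upper/lower rails around the overall mean, then take the
# vertical gap between the rails in every column of the eye.
mid = traces.mean()
high = np.where(traces > mid, traces, np.nan)
low = np.where(traces <= mid, traces, np.nan)
opening = np.nanmin(high, axis=0) - np.nanmax(low, axis=0)

best = int(np.nanargmax(opening))
print("eye height ~ %.3g at column %d" % (opening[best], best))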
Just going down your route for a moment: if you read those images into memory as they are, wouldn't it be pretty easy to do two flood fills (starting at the centre and at the middle of the left edge) that include all the "white" data? If the fill routine recorded the maximum and minimum height at each column, and the maximum horizontal extent, then you would have all you need.
In other words, I think you're over-thinking this. Edge detection is used in complex "natural" scenes where the edges are unclear. Here your edges are so completely obvious that you don't need to enhance them.
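A small sketch of that idea: a breadth-first flood fill over a boolean "white" mask that records, per column, the lowest and highest row it reaches. The thresholding and the seed point are assumptions for illustration:

import numpy as np
from collections import deque

def flood_extent(mask, seed):
    # BFS flood fill over True pixels; returns per-column min/max row reached (-1 where unvisited).
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    lo = np.full(w, -1)
    hi = np.full(w, -1)
    q = deque([seed])
    seen[seed] = True
    while q:
        r, c = q.popleft()
        lo[c] = r if lo[c] < 0 else min(lo[c], r)
        hi[c] = max(hi[c], r)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and mask[rr, cc] and not seen[rr, cc]:
                seen[rr, cc] = True
                q.append((rr, cc))
    return lo, hi

# image = ...                              # 2D array of the rendered eye diagram
# mask = image > image.mean()              # "white" pixels (threshold is illustrative)
# lo, hi = flood_extent(mask, (mask.shape[0] // 2, 0))   # seed: middle of the left edge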
