I would like to get a plot of how much each spatial frequency is present in a grayscale image.
I have been told to try np.fft.fft2, but apparently this is not what I need (according to this question). I was then told to look into np.fft.fftfreq, and while that sounds like what I need, it only takes an integer as input, so
np.fft.fftfreq(np.fft.fft2(image))
won't work. Nor does:
np.fft.fftfreq(np.abs(np.fft.fft2(image)))
How else could I try to do this? It seems like a rather trivial task for a Fourier transform; in fact, it is exactly what the Fourier transform does. I don't understand why np.fft.fft2 doesn't have a flag to make the frequency analysis orientation-agnostic.
Maybe you should reread the comments in the linked question, as well as the documentation also posted in the last comment. You are supposed to pass the image shape to np.fft.fftfreq:
freqx = np.fft.fftfreq(image.shape[0])
for x-direction and
freqy = np.fft.fftfreq(image.shape[1])
for y-direction.
The results will be the centers of the frequency bins returned by fft2, for example:
image_fft = np.fft.fft2(image)
Then the frequency corresponding to the amplitude image_fft[i,j] is freqx[i] in the x-direction and freqy[j] in the y-direction.
Your last sentence indicates that you want to do something completely different, though. The Fourier transform of a two-dimensional input is by common definition also two-dimensional. What deviating definition do you want to use?
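If "orientation-agnostic" means collapsing the 2D spectrum into one curve per radial spatial frequency, that can be done by combining the two fftfreq axes into a radial frequency and binning the power. A minimal sketch (the bin count and the random test image are assumptions, not from the question):

```python
import numpy as np

def radial_power_spectrum(image, nbins=50):
    """Average the 2D FFT power into bins of radial spatial frequency."""
    fft = np.fft.fft2(image)
    power = np.abs(fft) ** 2
    # Bin-center frequencies along each axis, in cycles per pixel
    freq0 = np.fft.fftfreq(image.shape[0])
    freq1 = np.fft.fftfreq(image.shape[1])
    # Radial spatial frequency of every FFT bin
    fr = np.sqrt(freq0[:, None] ** 2 + freq1[None, :] ** 2)
    bins = np.linspace(0.0, fr.max(), nbins + 1)
    which = np.clip(np.digitize(fr.ravel(), bins), 1, nbins)
    flat = power.ravel()
    spectrum = np.zeros(nbins)
    for i in range(1, nbins + 1):
        mask = which == i
        if mask.any():
            spectrum[i - 1] = flat[mask].mean()
    return bins[:-1], spectrum

image = np.random.rand(64, 64)
freqs, spec = radial_power_spectrum(image)
```

Plotting `spec` against `freqs` then gives "how much of each spatial frequency is present", regardless of orientation.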
Intuitively, this transformation stretches a spectrogram (where the x-axis is time and the y-axis is frequency) along the y-axis by different amounts depending on alpha, while the top (maximum frequency) and the bottom (zero frequency) remain fixed.
But now I don't really have an idea of how to implement it.
First, at which step should I do this frequency warping? I'm using Librosa to extract features and convert audio to log-mel spectrograms. Should the warping be done before converting to a mel spectrogram, or before/after the STFT?
Second, how can I map each frequency according to the formula? The author mentioned they used OpenCV's Geometric Image Transformations, but the only ones I found that seem related are the affine and perspective transformations, and I didn't manage to achieve this mapping with them.
Any suggestion and comment are welcome, thank you so much!
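For what it's worth, this kind of warp can be done without OpenCV by resampling each spectrogram column along the frequency axis. The sketch below uses a power-law warp purely as a stand-in for the paper's formula (the warp function, shapes, and data are all assumptions); the key property is that both endpoints stay fixed, as the question describes:

```python
import numpy as np

def warp_frequency_axis(spec, alpha):
    """Warp the frequency (row) axis of a (n_freq, n_time) spectrogram,
    keeping the bottom (zero frequency) and top (max frequency) fixed."""
    n_freq, n_time = spec.shape
    f = np.linspace(0.0, 1.0, n_freq)   # normalized frequency axis
    # Stand-in warp: fixes the endpoints (0 -> 0, 1 -> 1), stretches in between.
    # Substitute the paper's formula here.
    src = f ** alpha
    warped = np.empty_like(spec)
    for t in range(n_time):
        # Sample each column at the warped source frequencies
        warped[:, t] = np.interp(src, f, spec[:, t])
    return warped

spec = np.random.rand(80, 10)
warped = warp_frequency_axis(spec, 0.9)
```

With OpenCV the same idea would be expressed via cv2.remap (an arbitrary per-pixel mapping), not the affine/perspective functions, since this warp is nonlinear along one axis.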
I have some code that generates random noise using numpy's normal distribution function, which I then add to a numpy array containing an image of my chosen object. I then have to clip the array to values between -1 and 1.
I am trying to get my head around whether I should be adding the noise to the array and clipping, or multiplying the array by the noise and clipping.
I can't conceptually work out which I should be doing. Could someone please help?
Thanks
It depends what sort of physical model you are trying to represent; additive and multiplicative noise do not correspond to the same phenomenon. Your image can be considered a variable that changes through time. Noise is an extra term that varies randomly as time passes. If this noise term depends on the state of the image in time, then the image and the noise are correlated and noise is multiplicative. If the two terms are uncorrelated, noise is additive.
Well, as you have said yourself, the problem is that you don't know what you want.
Both methods will increase the entropy of the original data.
What is the purpose of your task?
If you want to simulate something like sensor noise, the addition will do just fine.
You can try both and observe what happens to the distribution of your original data set after the application.
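As a concrete sketch of the two options (the [-1, 1] clipping range is from the question; the noise parameters and image are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# An image-like array already scaled to [-1, 1]
image = rng.uniform(-1.0, 1.0, size=(64, 64))
noise = rng.normal(0.0, 0.1, size=image.shape)

# Additive: noise independent of the signal (e.g. sensor/read noise)
additive = np.clip(image + noise, -1.0, 1.0)

# Multiplicative: noise scales with the signal (e.g. speckle)
multiplicative = np.clip(image * (1.0 + noise), -1.0, 1.0)
```

Comparing the histograms of `additive` and `multiplicative` against `image` is a quick way to see the difference in practice.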
Backstory
I started messing with electronics and realized I need an oscilloscope. I went to buy one (for like $40) online and watched tutorials on how to use them. I stumbled upon a video using the "X-Y" function of the oscilloscope to draw images; I thought that was cool. I tried searching how to do this from scratch and learned that you need to convert the image into the frequency domain, somehow convert that to an audio signal, and send the signal to the oscilloscope's two channels from the left and right channels of the audio output. So now I am trying to do the image processing part.
What I Got So Far
Choosing an Image
First thing I did was to create an nxn image using some drawing software. I've read online that the total number of pixels of the image should be a power of two. I don't know why, but I created 256x256 pixel images to minimize calculation time. Here is the image I used for this example.
I kept the image simple, so I can vividly see the symmetry when it is transformed. Therefore, if there is no symmetry, then there must be something wrong.
The MATLAB Code
The first thing I did was read the image, convert to gray scale, change data type, and grab the size of the image (for size variability for later use).
%Read image
img = imread('tets.jpg');
%Convert image to gray scale
grayImage = rgb2gray(img);
%Convert from uint8 to double for the arithmetic below
grayImage = double(grayImage);
%Grab size of image
[nx, ny, nz] = size(grayImage);
The Algorithm
This is where things get a bit hazy. I am somewhat familiar with the Fourier Transform due to some Mechanical Engineering classes, but the topic was broadly introduced and never really fundamentally part of the course. It was more like, "Hey, check out this thing; but use the Laplace Transformation instead."
So somehow you have to incorporate spatial position, amplitude, frequency, and time when doing the calculation. I understand that the spatial coordinates are just the location of each pixel on the image in a matrix or bitmap. I also understand that the amplitude is just the gray scale value from 0-255 of a certain pixel. However, I don't necessarily know how to incorporate frequency and time based on the pixel itself. I think I read somewhere that the frequency increases as the y location of the pixel increases, and the time variable increases with the x location. Here's the link (read the first part of Part II).
So I tried following the formula as well as other formulas online and this is what I got for the MATLAB code.
if nx ~= ny
    error('Image size must be NxN.'); %for some reason
else
    %prepare transformation matrix
    DFT = zeros(nx,ny);

    %compute transformation for each pixel
    for ii = 1:1:nx
        for jj = 1:1:ny
            amplitude = grayImage(ii,jj);
            DFT(ii,jj) = amplitude * exp(-1i * 2 * pi * ((ii*ii/nx) + (jj*jj/ny)));
        end
    end

    %plot of complex numbers
    plot(DFT, '*');

    %calculate magnitude and phase
    magnitudeAverage = abs(DFT)/nx;
    phase = angle(DFT);

    %plot magnitudes and phase
    figure;
    plot(magnitudeAverage);
    figure;
    plot(phase);
end
This code simply tries to follow this discrete Fourier transform example video that I found on YouTube. After the calculation I plotted the complex numbers in the complex plane. This appears to be in polar coordinates; I don't know why. As stated in the video about the Nyquist limit, I plotted the average magnitude too, as well as the phase angles of the complex numbers. I'll just show you the plots!
The Plots
Complex Numbers
This is the complex plot; I believe it's in polar form instead of Cartesian, but I don't know. It appears symmetric too.
Average Amplitude Vs. Sample
The vertical axis is amplitude, and the horizontal axis is the sample number. This looks like the deconstruction of the signal, but then again I don't really know what I am looking at.
Phase Angle Vs. Sample
The vertical axis is the phase angle, and the horizontal axis is the sample number. This looks the most promising because it looks like a plot in the frequency domain, but it isn't supposed to be a plot in the frequency domain; rather, it's a plot in the sample domain? Again, I don't know what I am looking at.
I Need Help Understanding
I need to somehow understand these plots, so I know I am getting the right plot. I believe there may be something very wrong in the algorithm because it doesn't necessarily implement the frequency and time component. So maybe you can tell me how that is done? Or at least guide me?
TLDR;
I am trying to convert images into sound files to display on an oscilloscope. I am stuck on the image processing part. I believe there is something wrong with the MATLAB code (check above) because it doesn't necessarily include the frequency and time component of each pixel. I need help with the code and with understanding how to interpret the result, so I know the transformations are correct-ish.
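For reference, the quantity the hand-rolled loop is aiming for is the full 2D DFT, which is a double sum over all pixels for each frequency pair, not a single term per pixel; MATLAB's fft2 (or NumPy's np.fft.fft2) computes exactly that. A small NumPy sketch of the sanity check one might run (the random stand-in image is an assumption):

```python
import numpy as np

# The 2D DFT: F(u, v) = sum_x sum_y f(x, y) * exp(-2j*pi*(u*x/nx + v*y/ny)).
# MATLAB equivalent of the line below: DFT = fft2(grayImage);
gray = np.random.rand(256, 256)
dft = np.fft.fft2(gray)
magnitude = np.abs(dft)
phase = np.angle(dft)

# Sanity check: the DFT of a real-valued image is conjugate-symmetric,
# F[u, v] == conj(F[-u, -v]) -- this is the symmetry you expect to see.
assert np.isclose(dft[10, 20], np.conj(dft[-10, -20]))
```

If this symmetry does not hold in your own transform, the transform is wrong, which matches the "no symmetry means something is wrong" test you set up with the simple image.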
I have a python assignment related to image processing.
One of the functions I need to implement is called compute_cubic_interpolation_coefficients(coefficient[][][]), and the description of it is:
This function computes the 4x4 = 16 cubic coefficients for each
quarter of a pixel using a cubic approximation of a Gaussian function.
I must admit I don't fully understand how that should be calculated. I guess (just guess) that the first two dimensions of the array represent the 2D pixel map, while the third dimension should contain 16 values per pixel (or something like that).
Note: this is my first assignment for this course so it would be really valuable if anyone may help me to understand how to implement that correctly.
My software should judge spectrum bands: given the location of the bands, find each band's peak point and width.
I learned to take the projection of the image and to find width of each peak.
But I need a better way to find the projection.
The method I used reduces a 1600-pixel-wide image (e.g. 1600×40) to a 1600-point sequence. Ideally I would want to reduce the same image to a 10000-point sequence.
I want a longer sequence because 1600 points provide too low a resolution: a single point makes a large difference to the measurement (4% if a band is judged at 18 versus 19).
How do I get a longer projection from the same image?
Code I used: https://stackoverflow.com/a/9771560/604511
import numpy as np
from PIL import Image
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("band2.png"))
# Average over the color channels
pic_avg = pic.mean(axis=2)
# Sum the pixel values along the vertical axis
projection = pic_avg.sum(axis=0)
# Normalize, and set the min value to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()
What you want to do is called interpolation. Scipy has an interpolate module, with a whole bunch of different functions for differing situations, take a look here, or specifically for images here.
Here is a recently asked question that has some example code, and a graph that shows what happens.
But it is really important to realise that interpolating will not make your data more accurate, so it will not help you in this situation.
If you want more accurate results, you need more accurate data; there is no other way. You need to start with a higher-resolution image. (If you resample or interpolate, your results will actually be less accurate!)
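To see what interpolation does mechanically (and why it adds points, not information), here is a one-liner with np.interp; the sinusoid is just a stand-in for the 1600-point projection:

```python
import numpy as np

# Stand-in for the 1600-point projection from the question
projection = np.sin(np.linspace(0, 3 * np.pi, 1600))

x_old = np.arange(1600)
x_new = np.linspace(0, 1599, 10000)
# 10000 points now, but every new point is just a blend of its neighbours
upsampled = np.interp(x_new, x_old, projection)
```

The upsampled curve passes exactly through the original 1600 samples; everything in between is invented by the interpolant, which is why it cannot reduce the 4% judgement error described in the question.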
Update - as the question has changed
@Hooked has made a nice point. Another way to think about it: instead of immediately averaging (which throws away the variance in the data), you can produce 40 graphs (like the lower one in your posted image), one from each horizontal row in your spectrum image. All these graphs will be pretty similar, but with some variation in peak position, height, and width. Calculate the position, height, and width of each peak in each of the 40 graphs, then combine this data (matching peaks across the 40 graphs) and use the appropriate variance as an error estimate (for peak position, height, and width), via the central limit theorem. That way you get the most out of your data. However, this assumes some independence between the rows of the spectrogram, which may or may not be the case.
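A minimal numpy sketch of this idea, using a synthetic 40-row band image and argmax as a crude stand-in for a proper per-row peak fit (the synthetic data, peak location, and noise level are all assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic stand-in for the 1600x40 band image: every row is the same
# Gaussian band centred at pixel 800, plus independent noise
x = np.arange(1600)
rows = np.exp(-0.5 * ((x - 800) / 30.0) ** 2) + rng.normal(0, 0.05, (40, 1600))

# Estimate the peak position in every row (a real analysis would fit
# a peak model per row, and do the same for height and width)...
peaks = rows.argmax(axis=1)

# ...then combine the 40 estimates and quote a standard error
position = peaks.mean()
stderr = peaks.std(ddof=1) / np.sqrt(len(peaks))
```

The spread of the 40 per-row estimates is what gives you the error bar that a single averaged projection hides.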
I'd like to offer some more detail on @fraxel's answer (too long for a comment). He's right that you can't get any more information than what you put in, but I think it needs some elaboration...
You are projecting your data from 1600x40 -> 1600, which seems like you are throwing some data away. While technically correct, the whole point of a projection is to bring higher-dimensional data down to a lower dimension. This only makes sense if...
Your data can be adequately represented in the lower dimension. Correct me if I'm wrong, but it looks like your data is indeed one-dimensional, the vertical axis is a measure of the variability of that particular point on the x-axis (wavelength?).
Given that the projection makes sense, how can we best summarize the data for each particular wavelength point? In my previous answer, you can see I took the average for each point. In the absence of other information about the particular properties of the system, this is a reasonable first-order approximation.
You can keep more of the information if you like. Below I've plotted the variance along the y-axis. This tells me that your measurements have more variability when the signal is higher, and low variability when the signal is lower (which seems useful!):
What you need to do then, is decide what you are going to do with those extra 40 pixels of data before the projection. They mean something physically, and your job as a researcher is to interpret and project that data in a meaningful way!
The code to produce the image is below, the spec. data was taken from the screencap of your original post:
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Load the picture with PIL, process if needed
pic = np.asarray(Image.open("spec2.png"))
# Average over the color channels
pic_avg = pic.mean(axis=2)
# Sum the pixel values along the vertical axis
projection = pic_avg.sum(axis=0)
# Compute the variance along the vertical axis
variance = pic_avg.var(axis=0)

scale = 1 / 40.
X_val = range(projection.shape[0])
plt.errorbar(X_val, projection * scale, yerr=variance * scale)
plt.imshow(pic, origin='lower', alpha=.8)
plt.axis('tight')
plt.show()