Problem
I have a spectrum that can be downloaded here: https://www.dropbox.com/s/ax1b32aotuzx9f1/example_spectrum.npy?dl=0
Using Python, I am trying to use zero padding to increase the number of points in the frequency domain. To do so I rely on the fft and ifft functions from scipy.fft. I do not obtain the desired result, and would be grateful to anyone who could explain why.
Code
Here is the code I have tried:
import numpy as np
from scipy.fft import fft, ifft
import matplotlib.pyplot as plt
spectrum = np.load('example_spectrum.npy')
spectrum_time = ifft(spectrum) # In time domain
spectrum_oversampled = fft(spectrum_time, len(spectrum)+1000) # FFT of zero padded spectrum
xaxis = np.linspace(0, len(spectrum)-1, len(spectrum_oversampled)) # to plot oversampled spectrum
fig, (ax1, ax2) = plt.subplots(2,1)
ax1.plot(spectrum, '.-')
ax1.plot(xaxis, spectrum_oversampled)
ax1.set_xlim(500, 1000)
ax1.set_xlabel('Arbitrary units')
ax1.set_ylabel('Normalized flux')
ax1.set_title('Frequency domain')
ax2.plot(spectrum_time)
ax2.set_ylim(-0.02, 0.02)
ax2.set_title('Time domain')
ax2.set_xlabel('bin number')
plt.tight_layout()
plt.show()
Results
Added figure to show results. Blue is original spectrum, orange is zero padded spectrum.
Expected behavior
I would expect the zero padding to result in a sort of sinc interpolation of the original spectrum. However, the orange curve does not go through the points of the original spectrum.
Does anyone have any idea why I obtain this behavior and/or how to fix this?
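One possible explanation, offered as a sketch rather than a definitive diagnosis: ifft returns the time-domain signal with time zero at index 0 and the negative-time samples at the end of the array, so fft(spectrum_time, n) appends the zeros after the negative-time samples instead of between the positive and negative halves. Placing the zeros in the middle of the array gives an interpolated spectrum that passes through the original points. The helper name oversample_spectrum, its integer factor argument and the synthetic test spectrum are assumptions made for illustration. numpy.fft is used to keep the example dependency-free; fft and ifft from scipy.fft should behave the same way.

```python
import numpy as np

def oversample_spectrum(spectrum, factor):
    """Interpolate a spectrum by zero padding the middle of its inverse FFT.

    Hypothetical helper for illustration; `factor` is an integer
    oversampling factor.
    """
    n = len(spectrum)
    m = n * factor
    time = np.fft.ifft(spectrum)
    half = (n + 1) // 2                     # samples at non-negative times
    padded = np.zeros(m, dtype=complex)
    padded[:half] = time[:half]             # t >= 0 stays at the start
    padded[m - (n - half):] = time[half:]   # t < 0 stays at the end
    return np.fft.fft(padded)

# Synthetic stand-in for example_spectrum.npy (an assumption for this sketch)
rng = np.random.default_rng(0)
spectrum = 1.0 - 0.5 * np.exp(-0.5 * ((np.arange(64) - 30) / 3.0) ** 2)
spectrum += 0.01 * rng.standard_normal(64)

factor = 4
oversampled = oversample_spectrum(spectrum, factor)

# x positions of the oversampled points, in units of the original bins
xaxis = np.arange(len(oversampled)) / factor

# Every `factor`-th oversampled point coincides with an original sample
print(np.allclose(oversampled[::factor].real, spectrum))
```

The result is complex, so the real part is what would be plotted. Between the original points a small imaginary part may remain for even-length input, because the sample at the most negative time is kept on one side only. Note also that the x axis is built from the bin spacing (1 / factor) rather than with np.linspace(0, len(spectrum)-1, ...), which would slightly compress the curve.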
Related
I have the following code:
Frequency = df['x [Hz]']
Spectrum = df['test_spec']
x = Spectrum
peaks, _ = find_peaks(x, distance=20)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.show()
The variable "Frequency" contains the frequencies of a third-octave band spectrum from 5 to 315 Hz. "Spectrum" contains the associated noise pressure levels. I want to find the peaks in that spectrum; the value I need is the frequency at which each peak is located.
The problem is that the plot shows an x-axis with the steps 0, 5, 10, 15, but I want the x-axis to be scaled with the frequencies stored in the variable "Frequency".
Hope you can help me.
Thank you for your support.
The documentation of find_peaks() can be a bit confusing, as it calls its input x while in most situations that input would be drawn on the y-axis. find_peaks() doesn't care about the x-axis, supposing it is just the same as an array index (0,1,2,...).
To draw your curve, you need to plot using Frequency on the x-axis, and Spectrum on the y-axis. You can visualize the peaks by using them as an index in both arrays:
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
import numpy as np
Frequency = np.linspace(5, 315, 200)
Spectrum = np.random.randn(200).cumsum()
Spectrum += 1 - Spectrum.min()
peaks, _ = find_peaks(Spectrum, distance=20)
plt.plot(Frequency[peaks], Spectrum[peaks], "xr")
plt.plot(Frequency, Spectrum)
plt.legend(['distance'])
plt.tight_layout()
plt.show()
plt.hist's density argument does not work.
I tried to use the density argument in the plt.hist function to normalize stock returns in my plot, but it didn't work.
The following code worked fine for me and gave me the probability density function I wanted.
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
num_bins = 50
plt.hist(x, num_bins, density=1)
plt.show()
But when I tried it with stock data, it simply didn't work. The result gave the unnormalized data. I didn't find any abnormal data in my data array.
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
plt.hist(returns, 50,density = True)
plt.show()
# "returns" is a np array consisting of 360 days of stock returns
This is a known issue in Matplotlib.
As stated in Bug Report: The density flag in pyplot.hist() does not work correctly
When density = False, the histogram plot would have counts on the Y-axis. But when density = True, the Y-axis does not mean anything useful. I think a better implementation would plot the PDF as the histogram when density = True.
The developers view this as a feature, not a bug, since it maintains compatibility with numpy. They have already closed several bug reports about it, since it is working as intended. Adding to the confusion, the example on the matplotlib site appears to show this feature working, with the y-axis being assigned a meaningful value.
What you want to do with matplotlib is reasonable but matplotlib will not let you do it that way.
It is not a bug.
The total area of the bars is equal to 1.
The numbers only seem strange because your bin widths are small.
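As a quick check of this claim, the sketch below uses np.histogram, which applies the same density normalization as plt.hist. The synthetic returns array is an assumption standing in for the real stock data.

```python
import numpy as np

rng = np.random.default_rng(19680801)
returns = 0.01 * rng.standard_normal(360)   # synthetic stand-in for 360 daily returns

heights, edges = np.histogram(returns, bins=50, density=True)
widths = np.diff(edges)

print(heights.max())               # far above 1 because the bins are narrow
print(np.sum(heights * widths))    # total area of the bars
```

The bar heights are densities (probability per unit of x), so they can exceed 1 whenever the bin width is well below 1; only the heights multiplied by the widths sum to 1.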
Since this isn't resolved: @user14518925's response is actually correct. The density normalization treats the bin width as a real quantity, whereas, as I understand it, you want each bin to count as having a width of 1, so that the sum of the frequencies is 1. More succinctly, what you're seeing is:
\sum_{i}y_{i}\times\text{bin size} =1
Whereas what you want is:
\sum_{i}y_{i} =1
Therefore, all you really need to change is the tick labels on the y-axis. One way to do this is to disable the density option:
density = False
and instead divide by the total sample size (shown here with your example):
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(19680801)
# example data
mu = 0 # mean of distribution
sigma = 0.0000625 # standard deviation of distribution
x = mu + sigma * np.random.randn(437)
fig = plt.figure()
plt.hist(x, 50, density=False)
locs, _ = plt.yticks()
print(locs)
plt.yticks(locs,np.round(locs/len(x),3))
plt.show()
Another approach, besides that of tvbc, is to change the yticks on the plot.
import matplotlib.pyplot as plt
import numpy as np
steps = 10
bins = np.arange(0, 101, steps)
data = np.random.random(100000) * 100
plt.hist(data, bins=bins, density=True)
yticks = plt.gca().get_yticks()
plt.yticks(yticks, np.round(yticks * steps, 2))
plt.show()
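A third option, sketched here under the assumption that bar heights summing to 1 are what is wanted, is to give every sample a weight of 1/N. np.histogram and plt.hist both accept a weights argument; the data below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(19680801)
x = 0.0000625 * rng.standard_normal(437)   # synthetic data with a very small spread

# Each sample contributes 1/N, so each bar is the fraction of samples in its bin
weights = np.full(len(x), 1.0 / len(x))
fractions, edges = np.histogram(x, bins=50, weights=weights)

print(fractions.sum())
```

Passing the same array as plt.hist(x, 50, weights=weights) should draw the bars at these heights without any relabelling of the y ticks.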
I have plotted a histogram and now I want a curve that represents the trend of the histogram. I want the histogram binning to be logarithmic (as in the code below; Mass is a predefined variable ranging from 10^43 to 10^45 grams).
I have looked at many code examples but could not adapt any of them to my case. Do you know how I can make this curve? I just want to modify my code so that it also plots this curve on top of the histogram.
Thanks,
Salome
See the attached image
import matplotlib.pyplot as plt
import numpy as np
x=Mass
hist, bins = np.histogram(x, bins=10)
logbins = np.logspace(np.log10(bins[0]),np.log10(bins[-1]),len(bins))
n, bins, patches = plt.hist(x=Mass, bins=logbins, color='#0504aa', alpha=0.8, rwidth=0.85)
plt.xscale('log')
plt.xlabel('Mass $(g)$ ')
plt.ylabel('Number of halos')
plt.show()
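One simple way to get such a curve, sketched here as an assumption about what "trend" means, is to connect the bin counts at the geometric centers of the logarithmic bins. The Mass array below is synthetic, and the names counts and centers are introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
Mass = 10 ** rng.uniform(43, 45, size=500)   # synthetic stand-in for the real masses

# Same construction as in the question: 10 logarithmic bins over the data range
_, bins = np.histogram(Mass, bins=10)
logbins = np.logspace(np.log10(bins[0]), np.log10(bins[-1]), len(bins))

counts, edges = np.histogram(Mass, bins=logbins)

# Geometric mean of the bin edges: the midpoint of each bin on a log axis
centers = np.sqrt(edges[:-1] * edges[1:])

for c, n in zip(centers, counts):
    print(f"{c:.3e}  {n}")
```

In the plotting code, calling plt.plot(centers, n, '-o') after plt.hist and before plt.show() would draw the curve over the bars, with n being the counts that plt.hist already returns. A smoothed curve (for example a kernel density estimate in log space) is another option if the raw counts are too jagged.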
I have a question regarding the scipy.fft package, and how I can use this to generate a Fourier transform of a pulse.
I am trying to do this for an arbitrary pulse in the future, but I wanted to make it as simple as possible so I have been attempting to FFT a time domain rectangular pulse, which should produce a frequency domain Sinc function. You can see more information here: https://en.wikipedia.org/wiki/Rectangular_function
From my understanding of FFTs, a signal needs to be repeating and periodic, so in a situation like a rectangular pulse, I will need to shift this in order for the FFT algorithm to 'see' it as a symmetric pulse.
My problem arises when I observe the real and imaginary components of my Fourier transform. Since a rectangular pulse is a real and even function, I expect its transform to be real and symmetrical. However, I notice that there are imaginary components, even though I don't use complex numbers in the input.
My approach to this has been the following:
Define my input pulse
Shift my input pulse so that the function is symmetric around the origin
Fourier transform this and shift it so negative frequencies are shown first
Separate imaginary and real components
Plot amplitude and phase of my frequencies
I have attached graphs showing what I have attempted and outlining these steps.
This is my first question on stack overflow so I am unable to post images, but a link to the imgur album is here: https://imgur.com/a/geufY
I am having trouble with the phase information of my frequencies. In the images in the imgur album there is a linearly increasing phase, which in the ideal case should be flat.
I suspect it is a problem with how I am shifting my input pulse, and I have tried several other methods (I can post them if that would help).
Any help with this would be much appreciated. I have been poring over examples, but these mostly refer to infinite sinusoidal functions rather than pulses.
My Code is shown below:
import numpy as np
import scipy.fftpack as fft
import matplotlib.pyplot as plt
'''Numerical code starts here'''
#Define number of points and time/freq arrays
npts = 2**12
time_array = np.linspace(-1, 1, npts)
freq_array = fft.fftshift(fft.fftfreq(len(time_array), time_array[1]-time_array[0]))
#Define a rectangular pulse
pulse = np.zeros(npts)
pulse_width = 100
pulse[npts//2 - pulse_width//2:npts//2 + pulse_width//2] = 1  # integer division: slice indices must be ints in Python 3
#Shift the pulse so that the function is symmetrical about origin
shifted_pulse = fft.fftshift(pulse)
#Calculate the fourier transform of the shifted pulse
pulse_frequencies = fft.fftshift(fft.fft(shifted_pulse))
'''Plotting code starts here'''
#Plot the pulse in the time domain
fig, ax = plt.subplots()
ax.plot(time_array, pulse)
ax.set_title('Time domain pulse', fontsize=22)
ax.set_ylabel('Field Strength', fontsize=22)
ax.set_xlabel('Time', fontsize=22)
#Plot the shifted pulse in the time domain
fig, ax = plt.subplots()
ax.plot(time_array, shifted_pulse)
ax.set_title('Shifted Time domain pulse', fontsize=22)
ax.set_ylabel('Field Strength', fontsize=22)
ax.set_xlabel('Time', fontsize=22)
#Plot the frequency components in the frequency domain
fig, ax = plt.subplots()
ax.plot(freq_array, np.real(pulse_frequencies), 'b-', label='real')
ax.plot(freq_array, np.imag(pulse_frequencies), 'r-', label='imaginary')
ax.set_title('Pulse Frequencies real and imaginary', fontsize=22)
ax.set_ylabel('Spectral Density', fontsize=22)
ax.set_xlabel('Frequency', fontsize=22)
ax.legend()
#Plot the amplitude and phase of the frequency components in the frequency domain
fig, ax = plt.subplots()
ax.plot(freq_array, np.abs(pulse_frequencies), 'b-', label='amplitude')
ax.plot(freq_array, np.angle(pulse_frequencies), 'r-', label='phase')
ax.set_title('Pulse Frequencies intensity and phase', fontsize=22)
ax.set_ylabel('Spectral Density', fontsize=22)
ax.set_xlabel('Frequency', fontsize=22)
ax.legend()
plt.show()
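One likely cause of the residual imaginary part, offered as a sketch rather than a definitive answer: the slice in the code above covers offsets -50 to +49 around the center sample, so the pulse is not exactly even and its center sits half a sample off the origin, which shows up as a linear phase. Using an odd number of samples placed symmetrically about the center index removes it. The names center and half_width are assumptions introduced for this example, and numpy.fft is used in place of scipy.fftpack to keep it dependency-free.

```python
import numpy as np

npts = 2**12
center = npts // 2
half_width = 50

# Pulse covering offsets -50..+50 around the center sample (101 samples, even function)
pulse = np.zeros(npts)
pulse[center - half_width:center + half_width + 1] = 1

# ifftshift moves the center sample to index 0, where the FFT expects t = 0
shifted_pulse = np.fft.ifftshift(pulse)

pulse_frequencies = np.fft.fftshift(np.fft.fft(shifted_pulse))

# Largest imaginary component; only floating-point round-off should remain
print(np.max(np.abs(pulse_frequencies.imag)))
```

With a purely real spectrum the phase returned by np.angle is either 0 or pi (where the sinc-like curve goes negative), instead of a ramp.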
I think this is a simple question, but I just still can't seem to think of a simple solution. I have a set of data of molecular abundances, with values ranging many orders of magnitude. I want to represent these abundances with boxplots (box-and-whiskers plots), and I want the boxes to be calculated on log scale because of the wide range of values.
I know I can just calculate the log10 of the data and send it to matplotlib's boxplot, but this does not retain the logarithmic scale in plots later.
So my question is basically this:
When I have calculated a boxplot based on the log10 of my values, how do I convert the plot afterward to be shown on a logarithmic scale instead of linear with the log10 values?
I can change tick labels to partly fix this, but I have no clue how I get logarithmic scales back to the plot.
Or is there another, more direct way of plotting this? Perhaps a different package that already includes this option?
Many thanks for the help.
I'd advise against doing the boxplot on the raw values and setting the y-axis to logarithmic, because the boxplot function is not designed to work across orders of magnitude and you may get too many outliers (depending on your data, of course).
Instead, you can plot the logarithm of the data and manually adjust the y-labels.
Here is a very crude example:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator, FormatStrFormatter
np.random.seed(42)
values = 10 ** np.random.uniform(-3, 3, size=100)
fig = plt.figure(figsize=(9, 3))
ax = plt.subplot(1, 3, 1)
ax.boxplot(np.log10(values))
ax.set_yticks(np.arange(-3, 4))
ax.set_yticklabels(10.0**np.arange(-3, 4))
ax.set_title('log')
ax = plt.subplot(1, 3, 2)
ax.boxplot(values)
ax.set_yscale('log')
ax.set_title('raw')
ax = plt.subplot(1, 3, 3)
ax.boxplot(values, whis=[5, 95])
ax.set_yscale('log')
ax.set_title('5%')
plt.show()
The panel titled 'raw' shows the box plot on the raw values. This leads to many outliers, because the maximum whisker length is computed as a multiple (default: 1.5) of the interquartile range (the box height), which does not scale across orders of magnitude.
Alternatively, you could specify to draw the whiskers for a given percentile range:
ax.boxplot(values, whis=[5, 95])
In this case you get a fixed fraction of outliers (5%) above and below.
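To make the whisker argument above concrete, the sketch below counts the points falling outside the 1.5 x IQR fences for the raw values and for their logarithms. The helper count_outliers is hypothetical and only mirrors the default whisker rule; it is not a matplotlib function.

```python
import numpy as np

rng = np.random.default_rng(42)
values = 10 ** rng.uniform(-3, 3, size=100)   # spans six orders of magnitude

def count_outliers(data, whis=1.5):
    """Count points beyond whis * IQR from the quartiles (hypothetical helper)."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return int(np.sum((data < lower) | (data > upper)))

n_raw = count_outliers(values)
n_log = count_outliers(np.log10(values))
print(n_raw, n_log)
```

On the raw values the upper fence is set by a box that only reflects the middle of the distribution, so the largest values are all flagged; on the log values the same rule flags far fewer points.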
You can use plt.yscale:
plt.boxplot(data); plt.yscale('log')