Why do I get rows of zeros in my 2D fft? - python

I am trying to replicate the results from a paper.
"Two-dimensional Fourier Transform (2D-FT) in space and time along sections of constant latitude (east-west) and longitude (north-south) were used to characterize the spectrum of the simulated flux variability south of 40degS." - Lenton et al(2006)
The figures published show "the log of the variance of the 2D-FT".
I have tried to create an array consisting of the seasonal cycle of similar data as well as the noise. I have defined the noise as the original array minus the signal array.
Here is the code that I used to plot the 2D-FT of the signal array averaged in latitude:
import numpy as np
from numpy import ma
from matplotlib import pyplot as plt
from Scientific.IO.NetCDF import NetCDFFile
### input directory
indir = '/home/nicholas/data/'
### get the flux data which is in
### [time(5day ave for 10 years),latitude,longitude]
nc = NetCDFFile(indir + 'CFLX_2000_2009.nc','r')
cflux_southern_ocean = nc.variables['Cflx'][:,10:50,:]
cflux_southern_ocean = ma.masked_values(cflux_southern_ocean,1e+20) # mask land
nc.close()
cflux = cflux_southern_ocean*1e08 # change units of data from mmol/m^2/s
### create an array that consists of the seasonal signal for each pixel
year_stack = np.split(cflux, 10, axis=0)
year_stack = np.array(year_stack)
signal_array = np.tile(np.mean(year_stack, axis=0), (10, 1, 1))
signal_array = ma.masked_where(signal_array > 1e20, signal_array) # need to mask
### average the array over latitude(or longitude)
signal_time_lon = ma.mean(signal_array, axis=1)
### do a 2D Fourier Transform of the time/space image
ft = np.fft.fft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log(ps)
log_mgft = np.log(mgft)
Every second row of ft consists entirely of zeros. Why is this?
Would it be acceptable to add a small random number to the signal to avoid this?
signal_time_lon = signal_time_lon + np.random.randint(0,9,size=(730, 182))*1e-05
EDIT: adding images and clarifying what I mean
The output of rfft2 still appears to be a complex array. Using fftshift shifts the edges of the image to the centre; I still have a power spectrum regardless. I expect that the reason I get rows of zeros is that I have re-created the time series for each pixel. The ft[0, 0] element contains the sum of the signal (i.e. the mean, up to a factor of the number of points). So ft[1, 0] corresponds to a sinusoid with one cycle over the entire signal along the rows of the starting image.
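A quick sketch (with a random image of the same shape as mine) to sanity-check that interpretation of ft[0, 0]:

import numpy as np

img = np.random.rand(730, 182)   # same shape as signal_time_lon
ft = np.fft.fft2(img)

# ft[0, 0] is the unnormalized sum of the image, i.e. the mean
# multiplied by the number of samples.
print(np.allclose(ft[0, 0], img.sum()))              # True
print(np.allclose(ft[0, 0] / img.size, img.mean()))  # True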
Here is the starting image, produced with the following code:
plt.pcolormesh(signal_time_lon); plt.colorbar(); plt.axis('tight')
Here is the result using the following code:
ft = np.fft.rfft2(signal_time_lon)
mgft = np.abs(ft)
ps = mgft**2
log_ps = np.log1p(mgft)
plt.pcolormesh(log_ps); plt.colorbar(); plt.axis('tight')
It may not be clear in the image, but it is only every second row that consists entirely of zeros. Every tenth pixel (log_ps[10, 0]) has a high value; the other pixels (log_ps[2, 0], log_ps[4, 0], etc.) have very low values.

Consider the following example:
In [59]: from scipy import absolute, fft
In [60]: absolute(fft([1, 2, 3, 4]))
Out[60]: array([ 10.        ,   2.82842712,   2.        ,   2.82842712])
In [61]: absolute(fft([1, 2, 3, 4, 1, 2, 3, 4]))
Out[61]:
array([ 20.        ,   0.        ,   5.65685425,   0.        ,
         4.        ,   0.        ,   5.65685425,   0.        ])
In [62]: absolute(fft([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]))
Out[62]:
array([ 30.        ,   0.        ,   0.        ,   8.48528137,
         0.        ,   0.        ,   6.        ,   0.        ,
         0.        ,   8.48528137,   0.        ,   0.        ])
If X = fft(x) with len(x) = N, and Y = fft([x, x]) (the signal tiled twice), then Y[2k] = 2*X[k] for k in {0, 1, ..., N-1}, and Y is zero at all odd indices.
Therefore, I would look into how your signal_time_lon is being tiled. That may be where the problem lies.
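The same effect shows up in 2D with shapes like those in the question: tiling an image T times along the time axis forces every output row that is not a multiple of T to be zero. A minimal sketch with hypothetical dimensions:

import numpy as np

one_year = np.random.rand(73, 182)   # hypothetical single seasonal cycle
tiled = np.tile(one_year, (10, 1))   # shape (730, 182), like signal_time_lon

ft = np.fft.rfft2(tiled)
row_power = np.abs(ft).sum(axis=1)

# Only rows at multiples of 10 carry any power; all other rows are zero
# up to floating-point noise.
print(np.nonzero(row_power > 1e-6)[0][:5])   # [ 0 10 20 30 40]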

Related

optimize function that reads and mirrors a half numpy matrix

I have a text file that holds the values of a matrix, but only half of them (the lower triangle), like this:
1. 1. 0.01
2. 1. 0.052145
2. 2. 0.045
3. 1. 0.054521
3. 2. 0.05424
3. 3. 0.05459898
The first two columns refer to the matrix (x, y) position, and the last one is the value at that position. The positions are one-based, so the actual array indices are the values minus 1.
I made a function that reads the file and mirrors these values to a full matrix:
import numpy as np

def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    m = np.zeros(shape)
    for d in data:
        x, y, z = int(d[0]), int(d[1]), d[2]
        m[x-1, y-1] = z
        m[shape[0]-x, shape[1]-y] = z
    return m
But it does some unnecessary work, like the first and the last iterations, and the iteration that writes the centre value of the matrix twice.
Is there a way of optimizing it? The file actually has thousands of lines, so it would be great to cut down the loop's execution time.
I believe this does what you want, at least without the mirroring:
import numpy as np

def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    xs = data[:, 0].astype(int) - 1  # NumPy uses zero-based indexing.
    ys = data[:, 1].astype(int) - 1
    m = np.zeros(shape)
    m[(xs, ys)] = data[:, 2]
    return m
For your example file above this returns:
array([[0.01      , 0.        , 0.        ],
       [0.052145  , 0.045     , 0.        ],
       [0.054521  , 0.05424   , 0.05459898]])
If you wish to mirror it you probably want to edit the above function with the following:
m[(xs, ys)] = data[:,2]
m[(ys, xs)] = data[:,2] # Mirrored.
The result of that is:
array([[0.01      , 0.052145  , 0.054521  ],
       [0.052145  , 0.045     , 0.05424   ],
       [0.054521  , 0.05424   , 0.05459898]])
Note that this assumes the matrix is square.
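For a quick end-to-end check, here is a sketch that writes the example data above to data.txt and runs the mirrored version:

import numpy as np

with open('data.txt', 'w') as f:
    f.write("1. 1. 0.01\n"
            "2. 1. 0.052145\n"
            "2. 2. 0.045\n"
            "3. 1. 0.054521\n"
            "3. 2. 0.05424\n"
            "3. 3. 0.05459898\n")

def expand_mirror_matrix(matrix_path='data.txt'):
    data = np.loadtxt(matrix_path)
    shape = (int(data[-1][0]), int(data[-1][1]))
    xs = data[:, 0].astype(int) - 1
    ys = data[:, 1].astype(int) - 1
    m = np.zeros(shape)
    m[(xs, ys)] = data[:, 2]
    m[(ys, xs)] = data[:, 2]  # mirror across the diagonal
    return m

print(expand_mirror_matrix())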

I would like to trigger a count after every peak in a generated 1D numpy array

I have some code where, after every loop, an image is analyzed and the average pixel intensity value from the image is appended to a 1D array (result_array), so the array grows by one value per loop. When graphed over time, the array shows a periodic pattern, and I would like to count the frames between each 'peak' so that I can use that value to calculate a frequency per minute.
For example, when I print(result_array) I get this after 28 loops:
[255.   3.   1.   0.  16.  26.   3.   0.   0.   0.   0.   0.   0.   0.
   2.  11.   1.   0.   0.   0.   0.   0.   0.   0.   4.  12.   1.   0.]
By eye the peaks are 255, 26, 11, and 12, and each number in between is a frame that I would like counted and turned into a value for a frequency equation, then reset and repeated after each peak. How do I detect the position of each peak and then start the count? I have very little programming knowledge, so the more basic the explanation the better.
Here is my loop:
while True:
    ret, frame = cap.read()
    fgmask = fgbg.apply(frame)
    cv2.imshow('Original', frame)
    cv2.imshow('Masked', fgmask)
    average = np.average(fgmask)
    average_int = int(average)
    result_array = np.append(result_array, average_int)
    print(result_array)
Hope it was clear, let me know if you need more information.
Use find_peaks_cwt from scipy.signal to get the positions of all peaks, and then calculate the differences between consecutive positions:
>>> from scipy import signal
>>> peaks_pos = signal.find_peaks_cwt(result_array, range(1, 5))
>>> peaks_pos
array([ 1,  5, 15, 25])
>>> peaks_pos[1:] - peaks_pos[:-1]
array([ 4, 10, 10])
Piggybacking on @c2huc2hu's suggestion in the comments:
from scipy.signal import find_peaks
peaks, _ = find_peaks(result_array)
peaks is now
array([ 5, 15, 25])
which are the indices of the peaks in your array. That itself should serve as your counter: there are five elements before the first peak, then ten (15-5), then ten (25-15).
You can verify this output with
result_array[peaks]
which will yield
array([26., 11., 12.])
the local maxima of your array.
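To turn the spacing between peaks into a frequency per minute (what the question ultimately asks for), divide the frame rate by the spacing. A minimal sketch, assuming a hypothetical capture rate of 30 frames per second; in practice you can query it with cap.get(cv2.CAP_PROP_FPS):

import numpy as np
from scipy.signal import find_peaks

fps = 30  # hypothetical frame rate; query your capture device for the real value
peaks, _ = find_peaks(result_array)
frames_between_peaks = np.diff(peaks)        # e.g. array([10, 10])

# frames per cycle -> cycles per second -> cycles per minute
freq_per_minute = 60.0 * fps / frames_between_peaks
print(freq_per_minute)                       # e.g. [180. 180.]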

Generating random sparse data in the range 0-1

I am trying to generate sparse, 3-dimensional, nonparametric datasets in the range 0-1, where the dataset should contain zeros as well. I tried to generate this using:
training_matrix = numpy.random.rand(3000, 3)
but none of the rows print as 0.00000.
We start by creating an array of zeros of nrows rows by 3 columns:
import numpy as np
nrows = 3000 # total number of rows
training_matrix = np.zeros((nrows, 3))
Then we randomly draw (without replacement) nz integers from range(nrows). These numbers are the indices of the rows with nonzero data. The sparsity of training_matrix is determined by nz. You can adjust its value to fit your needs (in this example sparsity is set to 50%):
nz = 1500 # number of rows with nonzero data
indices = np.random.choice(nrows, nz, replace=False)
And finally, we populate the selected rows with random numbers through advanced indexing:
training_matrix[indices, :] = np.random.rand(nz, 3)
This is what you get by running the code above:
>>> print(training_matrix)
[[ 0.96088615  0.81550102  0.21647398]
 [ 0.          0.          0.        ]
 [ 0.55381338  0.66734065  0.66437689]
 ...,
 [ 0.          0.          0.        ]
 [ 0.03182902  0.85349965  0.54315029]
 [ 0.71628805  0.2242126   0.02481218]]
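If you would rather scatter the zeros element-wise instead of zeroing whole rows, a random mask works too (a variant sketch, not part of the answer above):

import numpy as np

nrows = 3000
sparsity = 0.5  # fraction of entries to zero out; adjust to taste

training_matrix = np.random.rand(nrows, 3)
mask = np.random.rand(nrows, 3) < sparsity  # True where an entry is zeroed
training_matrix[mask] = 0.0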
Since you want all 5 decimal digits to be zero, the probability of a given value printing as 0.00000 is 1/10^5 = 0.00001. Even with 3000*3 = 9000 values, the chance of seeing one is still small. Something else you can try, for your peace of mind, is to generate random numbers and truncate them at a certain point, i.e. 5 decimal places if you want.
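A sketch of that last idea, rounding to 5 decimal places so that sufficiently small draws print as exact zeros:

import numpy as np

training_matrix = np.round(np.random.rand(3000, 3), 5)  # keep 5 decimal places
# Any draw below 0.000005 now shows up as exactly 0.0
print((training_matrix == 0).sum(), "exact zeros")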

cmap to rgba in Matplotlib

I have a specific implementation question about taking data mapped through a colormap (cmap) and converting it to RGBA values. Essentially, I have a bunch of data for which I would like to create an errorbar() plot where the points as well as the error bars themselves are colored by the size of some other value (for concreteness, let's say its contribution to the chi-square of the fit of some model). Let's say I have an (N, 4) array called D, where the first two columns are the X and Y data, the third column is the value of the errorbar, and the last column is its contribution to the chi-square function.
How would I go about 1) mapping the range of chi-square contribution values to a cmap, and 2) getting RGBA values from that mapping so I can loop over the errorbar() function to produce the plot I'm after?
This may actually be helpful (http://matplotlib.org/api/cm_api.html), but I'm unable to find any examples or additional information about how to use ScalarMappable() (which does have a to_rgba() method).
Thanks!
You can map scalar values to a colormap by calling the colormap objects in matplotlib.cm on the values. The values should lie between 0 and 1. So, to get RGBA values for some chi-square distributed data (which I'll generate randomly), I would do:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

chisq = np.random.chisquare(4, 8)
chisq -= chisq.min()
chisq /= chisq.max()
errorbar_colors = cm.winter(chisq)
Instead of having the color scale start and end at the minimum and maximum actual values, you could subtract off the minimum and divide by the maximum you want.
Now errorbar_colors will be an (8, 4) array of RGBA values from the winter colormap:
array([[ 0.        ,  0.7372549 ,  0.63137255,  1.        ],
       [ 0.        ,  0.7372549 ,  0.63137255,  1.        ],
       [ 0.        ,  0.4745098 ,  0.7627451 ,  1.        ],
       [ 0.        ,  1.        ,  0.5       ,  1.        ],
       [ 0.        ,  0.36078431,  0.81960784,  1.        ],
       [ 0.        ,  0.47843137,  0.76078431,  1.        ],
       [ 0.        ,  0.        ,  1.        ,  1.        ],
       [ 0.        ,  0.48627451,  0.75686275,  1.        ]])
To plot this, you can just iterate over the colors and the datapoints and draw errorbars:
heights = np.random.randn(8)
sem = .4
for i, (height, color) in enumerate(zip(heights, errorbar_colors)):
    plt.plot([i, i], [height - sem, height + sem], c=color, lw=3)
plt.plot(heights, marker="o", ms=12, color=".3")
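To address the ScalarMappable part of the question directly: Normalize plus ScalarMappable.to_rgba does the min/max scaling for you, so the by-hand rescaling of chisq above isn't needed. A minimal sketch:

import numpy as np
from matplotlib import cm
from matplotlib.colors import Normalize

chisq = np.random.chisquare(4, 8)

# Normalize maps [vmin, vmax] onto [0, 1]; to_rgba then looks the
# normalized values up in the colormap.
mappable = cm.ScalarMappable(norm=Normalize(vmin=chisq.min(), vmax=chisq.max()),
                             cmap=cm.winter)
errorbar_colors = mappable.to_rgba(chisq)  # (8, 4) array of RGBA values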
However, none of the built-in matplotlib colormaps are all that well-suited to this task. For some improvement, you could use seaborn to generate a sequential color palette that can be used to color lines:
import numpy as np
import seaborn
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
chisq = np.random.chisquare(4, 8)
chisq -= chisq.min()
chisq /= chisq.max()
cmap = ListedColormap(seaborn.color_palette("GnBu_d"))
errorbar_colors = cmap(chisq)
heights = np.random.randn(8)
sem = .4
for i, (height, color) in enumerate(zip(heights, errorbar_colors)):
    plt.plot([i, i], [height - sem, height + sem], c=color, lw=3)
plt.plot(heights, marker="o", ms=12, color=".3")
But even here, I have doubts that this is going to be the best way to get your point across. I don't know exactly what your data look like, but I would advise making two plots, one with the dependent variable you would be plotting here, and a second with the chi square statistic as the dependent variable. Alternatively, if you're interested in the relationship between the size of the error bars and the chi square value, I would plot that directly with a scatterplot.

Using a python array for calculations

I have a numpy array named distance.
It is the distance from the center of a circle, divided into equal intervals of 0.1262755:
array([ 0.        ,  0.12627551,  0.25255103,  0.37882654,  0.50510206,
        0.63137757,  0.75765309,  0.8839286 ,  1.01020411,  1.13647963,
        1.26275514])
I need to use this to find the area of each annulus of the circle. The formula is:
math.pi*(R**2-r**2)
wherein "R" denotes the large radii and "r" the small radii. Example for area of second annuli is math.pi(0.25255103^2-0.12627551^2)
I need to repeat this for the entire distance array and I would like to know how?
>>> import math
>>> import numpy as np
>>> a = np.array([ 0.        ,  0.12627551,  0.25255103,  0.37882654,  0.50510206,
...                0.63137757,  0.75765309,  0.8839286 ,  1.01020411,  1.13647963,
...                1.26275514])
>>> [math.pi*(R**2 - r**2) for R, r in zip(a[1:], a)]
[0.050094279561751477, 0.15028285455350326, 0.25047140574288157, 0.35065999660288272, 0.45084853192401186, 0.55103713865226189, 0.65122565810514155, 0.75141421722864576, 0.85160284775926787, 0.95179134340977567]
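With a as defined above, the same result also falls out of a vectorized one-liner, since np.diff of the squared radii gives R**2 - r**2 for each consecutive pair:

import numpy as np

# with `a` as above: element-wise R**2 - r**2 for consecutive radii
areas = np.pi * np.diff(a**2)

This matches the output of the list comprehension.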
