Python plot frequency of fft.rfft

This is my first question here on Stack Overflow and I hope I will not make huge mistakes.
I am analyzing a set of time series with a sampling rate of 1 Hz. I need to plot their Fourier transform in order to study their spectra.
Here is my piece of code:
from obspy.core import read
import numpy as np
import matplotlib.pyplot as plt

st = read('../SC_noise/*HEC_109C*_s', format='SAC')
stp = st.copy()
stp.detrend('linear')
stp.taper('cosine')

for tr in stp:
    dataonly = tr.data
    spec = np.fft.rfft(dataonly)
    plt.plot(abs(spec))
    plt.show()
This works just fine: the plot is the same one I get using SAC. But the x-axis does not show frequencies. I've looked around a little bit and found different ideas: none of them is working.
For example, in the case of an fft (here I am using an rfft) this should do the job:
samp_rate=1
freq = np.fft.fftfreq(len(spec), d=1./samp_rate)
But if I use it, it gives me negative frequencies.
Does anybody have an idea?
Thank you very much in advance for all the help!
Piero

If your NumPy version is new enough (1.8 or better), use numpy.fft.rfftfreq. Otherwise, here is the definition:
import numpy as np

def rfftfreq(n, d=1.0):
    """
    Return the Discrete Fourier Transform sample frequencies
    (for usage with rfft, irfft).

    The returned float array `f` contains the frequency bin centers in cycles
    per unit of the sample spacing (with zero at the start). For instance, if
    the sample spacing is in seconds, then the frequency unit is cycles/second.

    Given a window length `n` and a sample spacing `d`::

        f = [0, 1, ..., n/2-1, n/2] / (d*n)          if n is even
        f = [0, 1, ..., (n-1)/2-1, (n-1)/2] / (d*n)  if n is odd

    Unlike `fftfreq` (but like `scipy.fftpack.rfftfreq`)
    the Nyquist frequency component is considered to be positive.

    Parameters
    ----------
    n : int
        Window length.
    d : scalar, optional
        Sample spacing (inverse of the sampling rate). Defaults to 1.

    Returns
    -------
    f : ndarray
        Array of length ``n//2 + 1`` containing the sample frequencies.

    Examples
    --------
    >>> signal = np.array([-2, 8, 6, 4, 1, 0, 3, 5, -3, 4], dtype=float)
    >>> fourier = np.fft.rfft(signal)
    >>> n = signal.size
    >>> sample_rate = 100
    >>> freq = np.fft.fftfreq(n, d=1./sample_rate)
    >>> freq
    array([  0.,  10.,  20.,  30.,  40., -50., -40., -30., -20., -10.])
    >>> freq = np.fft.rfftfreq(n, d=1./sample_rate)
    >>> freq
    array([  0.,  10.,  20.,  30.,  40.,  50.])
    """
    if not isinstance(n, (int, np.integer)):
        raise ValueError("n should be an integer")
    val = 1.0 / (n * d)
    N = n // 2 + 1
    results = np.arange(0, N, dtype=int)
    return results * val
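With rfftfreq available (from NumPy or from the definition above), the frequencies can go straight onto the x-axis. A minimal sketch, assuming the 1 Hz sampling rate from the question; note that the length of the time series, not of spec, is what rfftfreq needs:

samp_rate = 1.0
for tr in stp:
    dataonly = tr.data
    spec = np.fft.rfft(dataonly)
    freq = np.fft.rfftfreq(len(dataonly), d=1. / samp_rate)
    plt.plot(freq, abs(spec))
    plt.xlabel('Frequency (Hz)')
    plt.show()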

Related

generating random numbers between two numbers

Is it possible to generate random numbers that are almost equally spaced but not exactly the same as the numpy.linspace output?
I looked into the numpy.random.uniform function but it does not give the required results.
Moreover, the summation of the values generated by the function should be the same as the summation of the values generated by the numpy.linspace function.
Code:
import numpy as np

np.random.seed(42)  # seed NumPy's generator; random.seed() would not affect np.random
data = np.random.uniform(2, 4, 10)
print(data)
You might consider drawing random samples around the output of numpy.linspace. Setting these numbers as the means of a normal distribution, with a variance that is not too high, would generate numbers close to the output of numpy.linspace. For example:
>>> import numpy as np
>>> exact_numbers = np.linspace(2.0, 10.0, num=5)
>>> exact_numbers
array([ 2., 4., 6., 8., 10.])
>>> approximate_numbers = np.random.normal(exact_numbers, np.ones(5) * 0.1)
>>> approximate_numbers
array([2.12950013, 3.9804745 , 5.80670316, 8.07868932, 9.85288221])
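Note that the normal draws do not preserve the sum exactly. If that constraint matters, one option (my own addition, not part of the original answer) is to spread the residual evenly so the total matches the numpy.linspace sum:

>>> approximate_numbers += (exact_numbers.sum() - approximate_numbers.sum()) / approximate_numbers.size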
Maybe this trick helps you: combine numpy.linspace and numpy.random.uniform, then randomly choose pairs of indexes and increase one element while decreasing the other by the same amount, so the total is preserved.
(You can change size=10 and threshold=0.1 to control how much the random numbers deviate.)
import numpy as np

size = 10
threshold = 0.1

r = np.linspace(2, 4, size)  # r.sum() = 30
# array([2.        , 2.22222222, 2.44444444, 2.66666667, 2.88888889,
#        3.11111111, 3.33333333, 3.55555556, 3.77777778, 4.        ])
c = np.random.uniform(0, threshold, size)
# array([0.02246768, 0.08661081, 0.0932445 , 0.00360563, 0.06539992,
#        0.0107167 , 0.06490493, 0.0558159 , 0.00268924, 0.00070247])
s = np.random.choice(range(size), size + 1)
# array([5, 5, 8, 3, 6, 4, 1, 8, 7, 1, 7])

for idx, (i, j) in enumerate(zip(s, s[1:])):
    r[i] += c[idx]
    r[j] -= c[idx]

print(r)
print(r.sum())
Output:
[2. 2.27442369 2.44444444 2.5770278 2.83420567 3.19772192
3.39512762 3.50172642 3.77532244 4. ]
30

Improving distribution array element distribution

I am trying to write a function in Python that would calculate the distribution.
For example, the call:

n = 10
x = [1]*3                    # [1, 1, 1]
y = [s/sum(x)*n for s in x]  # [3.33, 3.33, 3.33]

The problem is that this way, the program I am using would round the values down to 3, and the sum of the array would be 9 instead of 10. How could I improve the call so that I get whole values like [4.0, 3.0, 3.0]?
The problem is to find which elements to round up while rounding all the rest down. One way to decide is that the element with the largest fractional part gets rounded up but no element is rounded up more than once.
import numpy as np

def distribute(weights, total):
    # calculate the distribution as you have done
    dist = np.array(weights) / sum(weights) * total
    # separate the fractional parts from the integral parts
    fracs, dist = np.modf(dist)
    # adjust if necessary
    while dist.sum() != total:
        # find the index of the maximum value of the fractional parts
        max_idx = np.argmax(fracs)
        # increment just that one value
        dist[max_idx] += 1
        # zero out that fractional part so we don't use it twice
        fracs[max_idx] = 0
    return dist
>>> distribute([1]*3, 10)
array([4., 3., 3.])
>>> distribute([1, 2, 3], 16)
array([3., 5., 8.])
>>> distribute([1, 2, 3, 1, 1], 21)
array([3., 5., 8., 3., 2.])

bin one column and sum the other of (2,N) array

Question:
I have a dataset like the following:
import numpy as np
x = np.arange(0,10000,0.5)
y = np.arange(x.size)/x.size
Plotting in log-log space, it looks like this:
import matplotlib.pyplot as plt
plt.loglog(x, y)
plt.show()
Obviously there is a lot of redundant information in this log-log plot.
I don't need 10,000 points to represent this trend.
My question is this: how can I bin this data so that it displays a uniform number of points in each order of magnitude of the logarithmic scale? At each order of magnitude I'd like to get about ten points. Hence I need to bin 'x' with an exponentially growing bin size, and then take the average of all elements of y corresponding to each bin.
Attempt:
First I generate the bins I want to use for x.
# need a nicer way to do this.
# what if I want more than 10 bins per order of magnitude?
bins = 10**np.arange(1,int(round(np.log10(x.max()))))
bins = np.unique((bins.reshape(-1,1)*np.arange(0,11)).flatten())
#array([ 0, 10, 20, 30, 40, 50, 60, 70, 80,
# 90, 100, 200, 300, 400, 500, 600, 700, 800,
# 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000,
# 9000, 10000])
Second, I find the index of the bin to which each element of x corresponds:
digits = np.digitize(x, bins)
Now the part I can really use help with. I want to take the average of every element in y corresponding to each bin, and then plot these averages versus the bin midpoints:
# need a nicer way to do this.. is there an np.searchsorted() solution?
# this way is quick and dirty, but it does not scale with acceptable speed
averages = []
for d in np.unique(digits):
    mask = digits == d
    y_mean = np.mean(y[mask])
    averages.append(y_mean)
del mask, y_mean, d

# now plot the averages within each bin against the center of each bin
plt.loglog((bins[1:]+bins[:-1])/2.0, averages)
plt.show()
Summary:
Is there a smoother way to do this? How can I generate an arbitrary n points per order of magnitude instead of 10?
I will answer two of your several questions: how to create the bins in an alternative way, and how to generate an arbitrary n points per order of magnitude instead of 10.
You can make use of np.logspace and np.outer to create your bins for any arbitrary value of n as follows. The default base in logspace is 10; it generates logarithmically spaced points, analogous to linspace, which generates a linearly spaced mesh.
For n=10
n = 10
bins = np.unique(np.outer(np.logspace(0, 3, 4), np.arange(0, n+1)))
# array([0.e+00, 1.e+00, 2.e+00, 3.e+00, 4.e+00, 5.e+00, 6.e+00, 7.e+00,
# 8.e+00, 9.e+00, 1.e+01, 2.e+01, 3.e+01, 4.e+01, 5.e+01, 6.e+01,
# 7.e+01, 8.e+01, 9.e+01, 1.e+02, 2.e+02, 3.e+02, 4.e+02, 5.e+02,
# 6.e+02, 7.e+02, 8.e+02, 9.e+02, 1.e+03, 2.e+03, 3.e+03, 4.e+03,
# 5.e+03, 6.e+03, 7.e+03, 8.e+03, 9.e+03, 1.e+04])
For n=20
n = 20
bins = np.unique(np.outer(np.logspace(0, 3, 4), np.arange(0, n+1)))
# array([0.0e+00, 1.0e+00, 2.0e+00, 3.0e+00, 4.0e+00, 5.0e+00, 6.0e+00,
#        7.0e+00, 8.0e+00, 9.0e+00, 1.0e+01, 1.1e+01, 1.2e+01, 1.3e+01,
#        1.4e+01, 1.5e+01, 1.6e+01, 1.7e+01, 1.8e+01, 1.9e+01, 2.0e+01,
#        3.0e+01, 4.0e+01, 5.0e+01, 6.0e+01, 7.0e+01, 8.0e+01, 9.0e+01,
#        1.0e+02, 1.1e+02, 1.2e+02, 1.3e+02, 1.4e+02, 1.5e+02, 1.6e+02,
#        1.7e+02, 1.8e+02, 1.9e+02, 2.0e+02, 3.0e+02, 4.0e+02, 5.0e+02,
#        6.0e+02, 7.0e+02, 8.0e+02, 9.0e+02, 1.0e+03, 1.1e+03, 1.2e+03,
#        1.3e+03, 1.4e+03, 1.5e+03, 1.6e+03, 1.7e+03, 1.8e+03, 1.9e+03,
#        2.0e+03, 3.0e+03, 4.0e+03, 5.0e+03, 6.0e+03, 7.0e+03, 8.0e+03,
#        9.0e+03, 1.0e+04, 1.1e+04, 1.2e+04, 1.3e+04, 1.4e+04, 1.5e+04,
#        1.6e+04, 1.7e+04, 1.8e+04, 1.9e+04, 2.0e+04])
EDIT
If you want 0, 10, 20, 30...90, 100, 200, 300... you can do the following
n = 10
bins = np.unique(np.outer(np.logspace(1, 3, 3), np.arange(0, n+1)))
# array([ 0., 10., 20., 30., 40., 50., 60., 70.,
# 80., 90., 100., 200., 300., 400., 500., 600.,
# 700., 800., 900., 1000., 2000., 3000., 4000., 5000.,
# 6000., 7000., 8000., 9000., 10000.])
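As for the averaging loop in the question (the part where an np.searchsorted-style solution was hoped for), here is a vectorized sketch using np.bincount; this is my own suggestion, not part of the original answer, and it reuses x, y, and bins from the question:

digits = np.digitize(x, bins)

# A weighted and a plain bincount give per-bin sums and counts in one pass each
sums = np.bincount(digits, weights=y, minlength=len(bins) + 1)
counts = np.bincount(digits, minlength=len(bins) + 1)

# Average only the bins that actually received elements
nonempty = counts > 0
averages = sums[nonempty] / counts[nonempty]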

Square root of all values in numpy array, preserving sign

I'd like to take the square root of every value in a numpy array, while preserving the sign of the value (and not returning complex numbers when negative) - a signed square root.
The code below demonstrates the desired functionality w/ lists, but is not taking advantage of numpy's optimized array manipulating superpowers.
import math

def signed_sqrt(values):
    new_list = []
    for v in values:
        sign = 1
        if v < 0:
            sign = -1
        sqrt = math.sqrt(abs(v))  # abs() keeps the argument non-negative, so math.sqrt suffices
        new_list.append(sqrt * sign)
    return new_list

values = [1., 81., -7., 4., -16.]
values = signed_sqrt(values)
# [1.0, 9.0, -2.6457..., 2.0, -4.0]
For some context, I'm computing the Hellinger Kernel for [thousands of] image comparisons.
Any smooth way to do this with numpy? Thanks.
You can try using the numpy.sign function to capture the sign, and just take the square root of the absolute value.
import numpy as np
x = np.array([-1, 1, 100, 16, -100, -16])
y = np.sqrt(np.abs(x)) * np.sign(x)
# [-1, 1, 10, 4, -10, -4]
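As an aside (not part of the original answer), np.copysign expresses the same idea in a single call by transferring the sign of its second argument onto the first:

y = np.copysign(np.sqrt(np.abs(x)), x)
# [-1., 1., 10., 4., -10., -4.]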

Python get average of neighbours in matrix with na value

I have a very large matrix, so I don't want to sum by going through each row and column.
import numpy as np

a = [[1,2,3],[3,4,5],[5,6,7]]

def neighbors(i, j, a):
    return [a[i][j-1], a[i][(j+1)%len(a[0])], a[i-1][j], a[(i+1)%len(a)][j]]

[[np.mean(neighbors(i,j,a)) for j in range(len(a[0]))] for i in range(len(a))]
This code works well for a 3x3 or similarly small matrix, but for a large matrix like 2k x 2k it is not feasible. It also does not work if any value in the matrix is missing (na). If any of the neighbour values is na, that neighbour should be skipped when computing the average.
Shot #1
This assumes you are looking to get sliding windowed average values in an input array with a window of 3 x 3 and considering only the north-west-east-south neighborhood elements.
For such a case, signal.convolve2d with an appropriate kernel could be used. At the end, you need to divide those summations by the number of ones in the kernel, i.e. kernel.sum(), as only those elements contributed to the summations. Here's the implementation -
import numpy as np
from scipy import signal

# Inputs
a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]

# Convert to numpy array
arr = np.asarray(a, float)

# Define kernel for convolution
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]])

# Perform 2D convolution with input data and kernel
out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
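As a quick sanity check (my addition, reusing neighbors() and a from the question), the convolution output matches the nested-loop version, since boundary='wrap' reproduces the same wrap-around indexing:

check = [[np.mean(neighbors(i, j, a)) for j in range(len(a[0]))] for i in range(len(a))]
print(np.allclose(out, check))  # True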
Shot #2
This makes the same assumptions as in Shot #1, except that we are looking to find average values in the neighborhood of only the zero elements, with the intention of replacing them with those average values.
Approach #1: Here's one way to do it using a manual selective convolution approach -
import numpy as np

# Convert to numpy array
arr = np.asarray(a, float)

# Pad around the input array to take care of boundary conditions
arr_pad = np.lib.pad(arr, (1,1), 'wrap')

R, C = np.where(arr == 0)  # row, column indices of zero elements in input array
N = arr_pad.shape[1]       # number of columns in the padded array
offset = np.array([-N, -1, 1, N])  # flattened-index offsets to the four neighbours
idx = np.ravel_multi_index((R+1, C+1), arr_pad.shape)[:, None] + offset

arr_out = arr.copy()
arr_out[R, C] = arr_pad.ravel()[idx].sum(1) / 4
Sample input, output -
In [587]: arr
Out[587]:
array([[ 4., 0., 3., 3., 3., 1., 3.],
[ 2., 4., 0., 0., 4., 2., 1.],
[ 0., 1., 1., 0., 1., 4., 3.],
[ 0., 3., 0., 2., 3., 0., 1.]])
In [588]: arr_out
Out[588]:
array([[ 4. , 3.5 , 3. , 3. , 3. , 1. , 3. ],
[ 2. , 4. , 2. , 1.75, 4. , 2. , 1. ],
[ 1.5 , 1. , 1. , 1. , 1. , 4. , 3. ],
[ 2. , 3. , 2.25, 2. , 3. , 2.25, 1. ]])
To take care of the boundary conditions, there are other options for padding. Look at numpy.pad for more info.
Approach #2: This is a modified version of the convolution-based approach listed earlier in Shot #1. It is the same as that earlier approach, except that at the end we selectively replace the zero elements with the convolution output. Here's the code -
import numpy as np
from scipy import signal

# Inputs
a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]

# Convert to numpy array
arr = np.asarray(a, float)

# Define kernel for convolution
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]])

# Perform 2D convolution with input data and kernel
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()

# Initialize output array as a copy of input array
arr_out = arr.copy()

# Set up a mask of zero elements in the input array and
# replace those in the output array with the convolution output
mask = arr == 0
arr_out[mask] = conv_out[mask]
Remarks: Approach #1 would be the preferred way when you have a small number of zero elements in the input array; otherwise go with Approach #2.
This is an appendix to the comments under @Divakar's answer (rather than an independent answer).
Out of curiosity I tried different 'pseudo' convolutions against the scipy convolution. The fastest one was the % (modulus) wrapping one, which surprised me: numpy evidently does something clever with its indexing, and not having to pad saves time as well.
fn3 -> 9.5ms, fn1 -> 21ms, fn2 -> 232ms
import timeit
setup = """
import numpy as np
from scipy import signal
N = 1000
M = 750
P = 5 # i.e. small number -> bigger proportion of zeros
a = np.random.randint(0, P, M * N).reshape(M, N)
arr = np.asarray(a,float)"""
fn1 = """
arr_pad = np.lib.pad(arr, (1,1), 'wrap')
R,C = np.where(arr==0)
N = arr_pad.shape[1]
offset = np.array([-N, -1, 1, N])
idx = np.ravel_multi_index((R+1,C+1),arr_pad.shape)[:,None] + offset
arr[R,C] = arr_pad.ravel()[idx].sum(1)/4"""
fn2 = """
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]])
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
mask = arr == 0.0
arr[mask] = conv_out[mask]"""
fn3 = """
R,C = np.where(arr == 0.0)
arr[R, C] = (arr[(R-1)%M,C] + arr[R,(C-1)%N] + arr[R,(C+1)%N] + arr[(R+1)%M,C]) / 4.0
"""
print(timeit.timeit(fn1, setup, number = 100))
print(timeit.timeit(fn2, setup, number = 100))
print(timeit.timeit(fn3, setup, number = 100))
Using numpy and scipy.ndimage, you can apply a "footprint" that defines where you look for the neighbours of each element and apply a function to those neighbours:
import numpy as np
import scipy.ndimage as ndimage

# Getting neighbours horizontally and vertically,
# not diagonally
footprint = np.array([[0,1,0],
                      [1,0,1],
                      [0,1,0]])

a = [[1,2,3],[3,4,5],[5,6,7]]

# Need to make sure that dtype is float or the
# mean won't be calculated correctly
a_array = np.array(a, dtype=float)

# Can specify that you want neighbour selection to
# wrap around at the borders
ndimage.generic_filter(a_array, np.mean,
                       footprint=footprint, mode='wrap')

Out[36]:
array([[ 3.25,  3.5 ,  3.75],
       [ 3.75,  4.  ,  4.25],
       [ 4.25,  4.5 ,  4.75]])
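To handle the missing (na) values the question asks about, the same filter can be run with np.nanmean in place of np.mean, so that NaN neighbours are simply skipped. This is my extension of the answer above (reusing its imports and footprint), not part of it:

# A variant of the example above with one missing value
a_nan = np.array([[1, 2, 3],
                  [3, np.nan, 5],
                  [5, 6, 7]], dtype=float)

# np.nanmean ignores NaN neighbours; a cell whose neighbours are all NaN stays NaN
ndimage.generic_filter(a_nan, np.nanmean,
                       footprint=footprint, mode='wrap')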
