I need to do a Fourier transform of a map in Python. Fast Fourier transforms expect periodic boundary conditions, but my input map is not periodic, so I need to apply an input filter/weight that slowly tapers the map toward zero at the edges. Are there libraries for doing this in Python?
My favorite function for apodizing a map is the generalized Gaussian (also called a 'super-Gaussian'), which is a Gaussian whose exponent is raised to a power P. By setting P to, say, 4 or 6, you get a flat-top pulse that falls off smoothly, which is good for FFT applications, where sharp edges always create ripples in conjugate space.
The generalized Gaussian is available in SciPy. Here is minimal code (Python 3) to apodize a 2D array with a generalized Gaussian. As noted in previous comments, there are dozens of functions that would work just as well.
import numpy as np
from scipy.signal.windows import general_gaussian  # scipy.signal.general_gaussian in older SciPy

# A 128x128 array to be apodized
array = np.random.rand(128, 128)

# Define a generalized Gaussian in 2D as the outer product of the 1D
# window with itself: general_gaussian(M, p, sig), where p=6 gives a
# flat top with smooth edges and sig=50 sets the width
window = np.outer(general_gaussian(128, 6, 50), general_gaussian(128, 6, 50))

# Multiply to taper the array toward zero at the edges
ap_array = window * array
Such tapering is often referred to as a "window", and SciPy has many window functions. You can use numpy.expand_dims and broadcasting to create the 2D window you want from a 1D one, as sketched below.
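As a minimal sketch (the 'hann' window and the size 128 are just example choices), building a 2D window from any of SciPy's 1D windows looks like this:

import numpy as np
from scipy.signal import get_window

# Any 1D window from SciPy's collection; 'hann' is just an example
w = get_window('hann', 128)

# Same result two ways: outer product, or expand_dims plus broadcasting
window_2d = np.outer(w, w)
window_2d_alt = np.expand_dims(w, 1) * np.expand_dims(w, 0)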
Regarding Stefan's comment, the NumPy team apparently considers including anything beyond arrays to have been a mistake, so I would stick to SciPy for signal processing. Watch out, though: they moved quite a few functions around in their 1.0 release, so older documentation is, well, quite old.
As a final note: the term "filter" is typically reserved for multiplications applied in the frequency domain, not the spatial domain.
In an effort to improve my procedural map generation, I've been learning more about how generating noise actually works.
With that in mind, I've been doing a Python adaptation of this tutorial series about noise and noise derivatives, and I thought I'd build a node/module system based on ANL and libnoise that might at least be useful to someone else when I'm done.
I've been translating this JavaScript version of libnoise (which I've used before and am familiar with) into Python, adapting it for 1D and 4D noise (in addition to the 2D and 3D it already supported) and for derivatives.
The derivative adding, subtracting, and multiplying used in the original tutorial covers a lot of the module functions, but I've now come to the more complicated ones and I'm struggling to figure out how I should be treating the derivatives.
I'm at the Blend module, which takes three different noise inputs and interpolates two of them, with the third used as the alpha/time value in the lerp function, as follows:
def get1D(self, x):
    a = self.sourceModules[0].get1D(x)
    b = self.sourceModules[1].get1D(x)
    alpha = self.sourceModules[2].get1D(x)
    return lerp(a, b, alpha)
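(For reference, the lerp used here is presumably the standard linear interpolation, something like this minimal sketch of my own:)

def lerp(a, b, t):
    # Linear interpolation: returns a at t == 0 and b at t == 1
    return a + t * (b - a)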
And I'm a bit lost as to what to do with that. Should I just discard the derivatives of the noise and calculate a new one based on the interpolation? And how would this work in higher dimensions, where I have a derivative for each axis?
Or do I interpolate the old noise derivatives into a new one?
The original also performs some kind of easing on the alpha noise variable. From working out the noise derivatives, my understanding is that this easing would definitely need a derivative version too, but should the derivative of that alpha noise play any role in the final mix?
Hello everyone,
I am a newbie in data science and would like to know the significance of using the abs() function and squaring the values returned by the fft() function of Python's scipy.fftpack library when plotting a power spectral density for a dataset. I have found that many code examples for plotting a power spectral density apply abs() and then square the resulting values. Can anyone please give me a reason for doing so? Can't we just directly plot the values returned by the fft() function in scipy.fftpack?
Here is the code I have written so far to plot a power spectral density, based on some of those code examples:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.fftpack import fft, fftfreq

df = pd.read_csv("denoised.csv")
data = df.values
x = data[:, 0]           # first column of the dataset

dft = fft(x)             # transform the extracted column, not the whole 2D array
PSD = np.abs(dft) ** 2   # squared magnitude = power spectral density
The general-purpose FFT consumes complex-valued data (i.e., real and imaginary parts) and returns complex-valued data. Even if your input is real-only, all the FFT routines I'm familiar with (FFTW, NumPy's FFT, SciPy's FFTPACK, Matlab, etc.) have an fft() that returns complex-valued data.
So, to plot a complex-valued vector, we have to somehow convert it to real values. One option is to plot the real and imaginary components separately, but that's usually not as interesting as the magnitude/abs (the square root of real-squared plus imag-squared): real versus imaginary tells us about the phase of the signal, which for real signals is usually random and uninteresting, whereas the magnitude combines the real and imaginary components and tells us in a straightforward way the amount of energy in a given frequency bin, which is useful!
If the magnitude of a complex number is its energy, then the magnitude-squared is its power. Engineers often like to see magnitude-squared because they can cross-reference that number with, say, the power ratings of the hardware they're working with. It's just a convention.
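As a small illustrative sketch (the signal and variable names are my own), the squaring and the common decibel scaling look like this:

import numpy as np

x = np.random.randn(1024)          # example real-valued signal
X = np.fft.fft(x)

magnitude = np.abs(X)              # sqrt(real**2 + imag**2)
power = magnitude ** 2             # magnitude squared

# The two common dB conventions agree because log(m**2) == 2*log(m):
power_db = 10 * np.log10(power)
same_db = 20 * np.log10(magnitude)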
Some side notes: if your data is real, a real-to-complex FFT will run faster. It's called rfft, but scipy.fftpack's output is a little confusing: it returns the complex output formatted as [real, imag, real, imag, …]. (The community has raised concerns about this unusual and non-standard convention of FFTPACK in this SciPy issue.) If possible, I usually try to use numpy.fft.rfft, because it returns complex-valued data as one would expect. (This real-to-complex rfft returns roughly half as many complex-valued outputs as the complex-to-complex fft; that's where the runtime improvement comes from.)
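A quick sketch of the difference (output lengths shown are for NumPy; the n//2 + 1 length is what I mean by "half as many outputs"):

import numpy as np

x = np.random.randn(1024)

X_full = np.fft.fft(x)    # complex-to-complex: 1024 complex values
X_real = np.fft.rfft(x)   # real-to-complex: 513 complex values (n//2 + 1)

print(X_full.shape, X_real.shape)  # (1024,) (513,)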
Another side note: this question isn't really related to data science, just digital signal processing. Consider asking such questions on http://dsp.stackexchange.com next time (no big deal that you asked it here, though).
I’m exploring 3D interactive volume convolution with some simple stencils using dask right now.
Let me explain what I mean:
Assume that you have 3D data which you would like to process with a Sobel transform (for example, to get an L1 or L2 gradient).
Then you divide your input 3D image into subvolumes, with some overlapping boundaries; for a 3x3x3 Sobel stencil this demands an overlap/padding of +2 samples per axis.
Now let's assume that you create a delayed computation of the 3D Sobel transform on the entire volume, but do not execute it yet.
And now the most important part:
I want to write a function which will extract some particular 2D section from the virtually transformed data.
And then finally let dask compute everything.
But what I need from dask is not to compute the entire transform and then hand me a section.
I need it to execute only those tasks needed to compute that particular 2D slice of the transformed image.
Do you think it's possible?
To explain it with an image, please consider this 3D domain decomposition (this is from DWT, but good for illustration, from here):
[illustration of 3D domain decomposition]
And assume that there is a function which computes the 3D transform of the entire volume using dask.
But what I would like to get, for example, is a 2D image of the transformed 3D data consisting of the LLL1, LLH1, HLH1, and HLL1 planes (essentially a single slice).
The important part is not to compute the whole subcubes, but to let dask automatically track the dependencies in the compute graph and evaluate only those tasks.
Please don’t worry about compute v.s. copy time.
Assume that it has perfect ratio.
Let me know if more clarification is needed!
Thanks for your help!
I'm hearing a few questions here, so I'll answer each individually.
Can Dask track which tasks are required for a subset of outputs and only compute those?
Yes. Lazy Dask operations produce a dependency graph. In the case of dask.array this graph is per-chunk. If your output depends on only a subset of the graph, then Dask will remove tasks that are not necessary. The in-depth docs for this are here; see the cull optimization in particular.
As an example consider this 100,000 by 100,000 array
>>> x = da.random.random((100000, 100000), chunks=(1000, 1000))
And let's say that I add a couple of 1D slices from it
>>> y = x[5000, :] + x[:, 5000].T
The resulting optimized graph is only as large as needed to compute the output
>>> graph = y._optimize(y.dask, y._keys()) # you don't need to do this
>>> len(graph) # it happens automatically
301
And we can compute the result quite quickly:
In [8]: %time y.compute()
CPU times: user 3.18 s, sys: 120 ms, total: 3.3 s
Wall time: 936 ms
Out[8]:
array([ 1.59069994, 0.84731881, 1.86923216, ..., 0.45040813,
0.86290539, 0.91143427])
Now, this wasn't perfect. It did have to produce all of the 1000x1000 chunks that our two slices touched. But you can control the granularity there.
Short answer: Dask will automatically inspect the graph and only run those tasks that are necessary to evaluate the output. You don't need to do anything special to do this.
Is it a good idea to do overlapping array computations with dask.array?
Maybe. The relevant doc page is here, on Overlapping Blocks with Ghost Cells. Dask.array has convenience functions that make this easy to write down. However, it will create in-memory copies, and many people in your position find memcpy too slow. Dask generally doesn't support in-place computation, so we can't be as efficient as proper MPI code. I'll leave the performance question to you, though.
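As a minimal sketch of those convenience functions (the exact namespace has moved between Dask versions, so check your version's docs; the sizes here are arbitrary), an overlapping computation can look like this:

import dask.array as da
import scipy.ndimage

x = da.random.random((100, 100, 100), chunks=(50, 50, 50))

# map_overlap adds a 1-sample halo around each chunk, applies the
# function per chunk, and trims the halo from the result
y = x.map_overlap(
    lambda block: scipy.ndimage.sobel(block, axis=0),
    depth=1,
    boundary="reflect",
)
result = y.compute()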
Not to detract from the nicely laid-out answer from @MRocklin, but more to add to it.
I also regularly find myself needing to do things like edge detection and other image-processing techniques on large-scale array data. As Dask is a very nice library for constructing and exploring such computational workflows on large array data, I have put together some utility libraries for common image-processing techniques in a GitHub organization called dask-image. They have largely been designed to mimic SciPy's ndimage API.
As for using a Sobel operator with Dask, one can use the sobel function from dask-ndfilters (permissively licensed) to perform this operation on a Dask array. It handles proper haloing of the blocks under the hood and returns a new Dask array.
As SciPy's sobel function (and dask-ndfilters' sobel as well) operates on one dimension at a time, one needs to map over each axis and stack the results to get the full Sobel operator. That said, this is quite straightforward to do. Below is a brief snippet showing how to do this on a random Dask array, including taking a slice along the XZ-plane, though one could just as easily take any other slice or perform additional operations on the data.
Hope this helps. :)
import dask.array as da
import dask_ndfilters as da_ndfilt

# Random 3D volume split into chunks
d = da.random.random((100, 120, 140), chunks=(25, 30, 35))

# Apply the Sobel filter along each axis and stack the results
ds = da.stack([da_ndfilt.sobel(d, axis=i) for i in range(d.ndim)])

# Take a slice along the XZ-plane (y == 0) and compute it
dsp = ds[:, :, 0, :]
asp = dsp.compute()
I have a simple Python 3 Tkinter image editor using OpenCV 3 and NumPy.
I wanted to implement a Fourier transform and used the first example from here with NumPy:
f = np.fft.fft2(img)                            # 2D FFT of the image
fshift = np.fft.fftshift(f)                     # shift zero frequency to the center
magnitude_spectrum = 20*np.log(np.abs(fshift))  # log-scaled magnitude
I can't just use fshift as the result and display it. Unfortunately, the line with 20*np.log(...) isn't explained on the site, and reading other explanations of the Fourier transform didn't explain it to me either.
So far I have no parameter for this manipulation, but I tested that the output differs if I change the 20 to something else. Should this be user-configurable, or why does this code example do this?
As explained in another answer, fshift is complex and cannot be displayed directly. If you want to see the spectrum you need to take the absolute value.
However, there is a reason for taking the logarithm. The values of the spectrum vary over an enormous range, so often you only see a single peak (or, in 2D, a bright spot) at the center. The logarithm compresses the range of values: larger peaks are scaled down more than smaller peaks. This is useful for visualization because it allows you to see details at all amplitudes.
[illustration: the same spectrum displayed with linear and logarithmic scaling]
Note that the scaling factor of 20 has a physical meaning for signals:
20 * log(abs(f)) = 10 * log(abs(f)^2)
The factor 10 is arbitrary, but the factor 2 (making 20 = 2 * 10) is equivalent to squaring the spectrum before taking the logarithm. If you only want to visualize the FFT, this factor does not matter; only the logarithm is important.
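As a small sketch of the effect (using matplotlib; the random image is a stand-in, and the epsilon guards against log(0)):

import numpy as np
import matplotlib.pyplot as plt

img = np.random.rand(256, 256)             # stand-in for the real image
fshift = np.fft.fftshift(np.fft.fft2(img))

linear = np.abs(fshift)
log_scaled = 20 * np.log(np.abs(fshift) + 1e-12)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(linear, cmap="gray")      # usually just one bright spot
ax2.imshow(log_scaled, cmap="gray")  # structure visible at all amplitudes
plt.show()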
You cannot display fshift because it is a complex array; you can only plot real-valued arrays.
In [26]: fshift.dtype
Out[26]: dtype('complex128')
np.abs returns the modulus, or magnitude, of those complex numbers.
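For example, the modulus of 3 + 4j is 5:
In [27]: np.abs(3 + 4j)
Out[27]: 5.0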
You can see the parameters of fft in the NumPy documentation. Note that SciPy also has an FFT implementation, a wrapper of fftpack. Finally, you may find the SciPy tutorial on Fourier transforms useful.
I have seen (from research) convolution being done via NumPy, but if I wish to convolve two standard distributions (specifically a normal with a uniform), which are readily available in the scipy library, is there a direct way of doing it rather than creating two arrays via NumPy and convolving them?
In general, computing convolutions of distributions requires solving integrals. I worked on this problem as part of my dissertation [1] and wrote some (rather idiosyncratic) Java to carry out the operations. Basically, my approach was to make a catalog of distributions for which there are known results, and fall back on a numerical method (convolution via discretization and FFT) when there is no known result.
For the combination of a Gaussian and a uniform, the result looks like a Gaussian bump split in two, with one half pasted onto each end of the uniform distribution, when the uniform is wide enough; otherwise it just looks like a bump. I can try to find formulas for that if you are interested.
You can try to compute the integrals via a symbolic computation system such as Maxima [2]. For example, Maxima says the convolution of a unit Gaussian with a unit uniform is:
-(erf((sqrt(2)*s-sqrt(2))/2)-erf(s/sqrt(2)))/2
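As a sketch under my own assumptions (a unit Gaussian meaning N(0, 1) and a unit uniform on [0, 1]; the erf expression above simplifies to a difference of normal CDFs), you can evaluate the closed form and cross-check it numerically with SciPy:

import numpy as np
from scipy import stats

# Closed form: the density of X + U, with X ~ N(0, 1) and U ~ Uniform(0, 1),
# is Phi(s) - Phi(s - 1), which matches the erf expression above
s = np.linspace(-5.0, 6.0, 111)
closed_form = stats.norm.cdf(s) - stats.norm.cdf(s - 1)

# Numerical check: discretize both densities and convolve
grid = np.linspace(-10.0, 10.0, 2001)
dx = grid[1] - grid[0]
gaussian = stats.norm.pdf(grid)
uniform = stats.uniform.pdf(grid)  # Uniform(0, 1) by default
numeric = np.convolve(gaussian, uniform, mode="same") * dx

# The numeric curve is sampled on `grid`; compare against the closed form
numeric_on_s = np.interp(s, grid, numeric)
print(np.max(np.abs(numeric_on_s - closed_form)))  # small discretization error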
[1] http://riso.sourceforge.net/docs/dodier-dissertation.pdf (in particular section C.3.17)
[2] http://sourceforge.net/p/maxima