Finding FFT Coefficients With Python

I am a new user of Python. I have a signal that contains 16 data points,
for example:
a = [1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 2, 3, 4, 1, 1]
I tried numpy.fft.fft, but I cannot figure out how to sum these frequencies and calculate the Fourier coefficients.
Thank you.

The numpy docs include a helpful example at the end of the page for np.fft.fft (https://numpy.org/doc/stable/reference/generated/numpy.fft.fft.html)
Basically, you want to use np.fft.fft(a) to transform your data, in tandem with np.fft.fftfreq(np.shape(a)[-1]) to figure out which frequency each coefficient of the transform corresponds to.
Check out the docs for np.fft.fftfreq as well (https://numpy.org/doc/stable/reference/generated/numpy.fft.fftfreq.html#numpy.fft.fftfreq)
See here (https://dsp.stackexchange.com/questions/26927/what-is-a-frequency-bin) for a discussion on frequency bins and here (https://realpython.com/python-scipy-fft/) for a solid tutorial on scipy/numpy fft.
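To make that concrete, here is a minimal sketch for the array in the question (the frequencies come out in cycles per sample, since no sample rate was given):
import numpy as np
a = [1, 2, 3, 4, 1, 1, 1, 1, 1, 1, 2, 3, 4, 1, 1]
coeffs = np.fft.fft(a)                   # complex Fourier coefficients
freqs = np.fft.fftfreq(np.shape(a)[-1])  # frequency of each coefficient, in cycles/sample
for f, c in zip(freqs, coeffs):
    print(f"freq {f:+.3f}: coefficient {c:.3f}")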

Related

Automatically detecting clusters in a 2d array/heatmap

I have run object detection on a video file and summed the seconds each pixel is activated, to find how long an object is shown in each area of the frame. This gives me a 2D array of time values. Since the objects sit in the same position of the video most of the time, some areas of the screen have much higher activation than others. Now I would like to find a way to automatically detect "clusters" without knowing the number of clusters beforehand. I have considered something like k-means, and I have also read a little about finding local maxima, but I can't quite figure out how to put it all together or which method is best. Also, the objects vary in size, so I'm not sure the local-maximum method applies.
The final result would be a list of ids and maximum time value for each cluster.
[[3, 3, 3, 0, 0, 0, 0, 0, 0],
 [3, 3, 3, 0, 0, 0, 2, 2, 2],
 [3, 3, 3, 0, 0, 0, 2, 2, 2],
 [0, 0, 0, 0, 0, 0, 2, 2, 2]]
From this example array I would end up with a list:
id | Seconds
1 | 3
2 | 2
I haven't tried much since I have no clue where to start. Any recommendations of methods, with code examples or links to where I can find them, would be greatly appreciated! :)
You could look at different methods for clustering in: https://scikit-learn.org/stable/modules/clustering.html
If you do not know the number of clusters beforehand, you might want to use a different algorithm than k-means (one that does not depend on the number of clusters). I would suggest reading about DBSCAN and HDBSCAN for this task; a sketch using DBSCAN follows below. Good luck :)
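As an illustration (my own sketch, not from the original answer), here is one way DBSCAN could be applied to the example array: cluster the coordinates of the non-zero pixels, then take the maximum time value per cluster. The eps and min_samples values are assumptions you would tune to your pixel grid and object sizes.
import numpy as np
from sklearn.cluster import DBSCAN
heatmap = np.array([[3, 3, 3, 0, 0, 0, 0, 0, 0],
                    [3, 3, 3, 0, 0, 0, 2, 2, 2],
                    [3, 3, 3, 0, 0, 0, 2, 2, 2],
                    [0, 0, 0, 0, 0, 0, 2, 2, 2]])
coords = np.argwhere(heatmap > 0)              # (row, col) of every active pixel
labels = DBSCAN(eps=1.5, min_samples=3).fit_predict(coords)
for cluster_id in sorted(set(labels) - {-1}):  # -1 marks noise points
    members = coords[labels == cluster_id]
    peak = heatmap[members[:, 0], members[:, 1]].max()
    print(cluster_id, peak)                    # cluster id and its maximum seconds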

argrelextrema and flat extrema

Function argrelextrema from scipy.signal does not detect flat extrema.
Example:
import numpy as np
from scipy.signal import argrelextrema
data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 1, 0 ])
argrelextrema(data, np.greater)
(array([2]),)
The first maximum (2) is detected; the second, flat maximum (3, 3) is not.
Is there any workaround for this behaviour?
Thanks.
Short answer: argrelextrema will probably not be flexible enough for your task. Consider writing your own function matching your needs.
Longer answer: Are you bound to use argrelextrema? If yes, you can play around with the comparator and order arguments of argrelextrema (see the reference).
For your simple example, it would be enough to choose np.greater_equal as the comparator.
>>> data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 1, 0 ])
>>> print(argrelextrema(data, np.greater_equal,order=1))
(array([2, 6, 7]),)
Note however that in this way
>>> data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 4, 1, 0 ])
>>> print(argrelextrema(data, np.greater_equal,order=1))
(array([2, 6, 8]),)
behaves differently than you would probably like, finding the first 3 and the 4 as maxima, since argrelextrema now treats everything as a maximum that is greater than or equal to its two nearest neighbors. You can use the order argument to decide to how many neighbors this comparison must hold; choosing order=2 changes the example above to find only the 4 as a maximum.
>>> print(argrelextrema(data, np.greater_equal,order=2))
(array([2, 8]),)
There is, however, a downside to this - let's change the data once more:
>>> data = np.array([ 0, 1, 2, 1, 0, 1, 3, 3, 4, 1, 5 ])
>>> print(argrelextrema(data, np.greater_equal,order=2))
(array([ 2, 10]),)
Adding another peak as the last value keeps you from finding the peak at 4, as argrelextrema now sees a second neighbor that is greater than 4 (which can be useful for noisy data, but not necessarily the behavior expected in all cases).
Using argrelextrema, you will always be limited to binary operations between a fixed number of neighbors. Note, however, that all argrelextrema does in your example above is return n if data[n] > data[n-1] and data[n] > data[n+1]. You could easily implement this yourself and then refine the rules, for example by checking the second neighbor in case the first neighbor has the same value; a sketch of that idea follows.
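As an illustration (my own sketch, not part of the original answer), a hypothetical helper flat_argrelmax that treats a run of equal values as one candidate and compares that run against the samples on either side:
import numpy as np

def flat_argrelmax(data):
    # Return the first index of each interior maximum, counting a plateau
    # of equal values as a single candidate.
    peaks = []
    n = len(data)
    i = 1
    while i < n - 1:
        j = i
        while j + 1 < n and data[j + 1] == data[i]:  # extend over the plateau
            j += 1
        if j < n - 1 and data[i - 1] < data[i] and data[j + 1] < data[i]:
            peaks.append(i)
        i = j + 1
    return np.array(peaks)

data = np.array([0, 1, 2, 1, 0, 1, 3, 3, 1, 0])
print(flat_argrelmax(data))  # [2 6] -- both the sharp and the flat maximum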
For the sake of completeness, there seems to be a more elaborate function in scipy.signal, find_peaks_cwt. I have no experience using it, however, and can therefore not give you more details about it.
I'm really surprised that no one figured out an answer to this. All you need to do is preprocess the array to remove duplicates that are located next to each other, and then you can run argrelextrema like so:
import numpy as np
from scipy.signal import argrelextrema
data = np.array([0, 1, 2, 1, 0, 1, 3, 3, 1, 0])
# Mark every element that equals its left-hand neighbor...
filter_table = [False] + list(np.equal(data[:-1], data[1:]))
# ...and drop those duplicates, collapsing each plateau to a single sample.
data = np.array([x for idx, x in enumerate(data) if not filter_table[idx]])
argrelextrema(data, np.greater)  # (array([2, 6]),) -- indices refer to the deduplicated array

How to plot histogram of multiple lists?

I have a dataset with 13k Kickstarter projects and their tweets over the duration of each project. Each project contains a list with the number of tweets for each day,
e.g. [10, 2, 4, 7, 2, 4, 3, 0, 4, 0, 1, 3, 0, 3, 4, 0, 0, 2, 3, 2, 0, 4, 5, 1, 0, 2, 0, 2, 1, 2, 0].
I've taken a subset of the data by setting the duration of the projects to 31 days, so that each list has the same length, containing 31 values.
This piece of code prints each list of tweets:
for project in data:
    print(data[project]["tweets"])
What is the easiest way to plot a histogram with matplotlib? I need a frequency distribution of the total number of tweets for each day. How do I count the values at each index? Is there an easy way to do this with Pandas?
The lists are also accessible in a Pandas data frame:
df = pd.DataFrame.from_dict(data, orient='index')
df1 = df[['tweets']]
A histogram is probably not what you need. It's a good solution if you have a list of numbers (for example, IQs of people) and you want to assign each number to a category (e.g. 79 and below, 80-99, 100+). There would be 3 bins, and the height of each bin would represent the quantity of numbers that fall into the corresponding category.
In your case, you already have the height of each bin, so (as I understand it) what you want is a plot that looks like a histogram. plt.hist does not support this directly, since it expects the raw values rather than precomputed bin heights.
If you're OK with using plots instead of histograms, that's what you can do.
import matplotlib.pyplot as plt
lists = [data[project]["tweets"] for project in data]  # Collect the per-project lists
sum_list = [sum(x) for x in zip(*lists)]  # Sum the tweets for each day across all projects
plt.plot(sum_list)  # Plot the daily totals
plt.show()  # Show the plot
If you want the plot to look like a histogram, use a bar chart:
plt.bar(range(len(sum_list)), sum_list)
instead of plt.plot.
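Since the question also asks about Pandas, here is a hedged sketch of the same aggregation starting from the df1 frame in the question, assuming each cell of the tweets column holds one 31-value list:
import pandas as pd
import matplotlib.pyplot as plt
daily = pd.DataFrame(df1["tweets"].tolist())  # one row per project, one column per day
daily.sum(axis=0).plot.bar()                  # total tweets per day across all projects
plt.xlabel("Day")
plt.ylabel("Total tweets")
plt.show()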

Poisson Point Process in Python 3 with numpy, without scipy

I need to write a function in Python 3 which returns an array of positions (x, y) on a rectangular field (e.g. 100x100 points) that are scattered according to a homogeneous spatial Poisson process.
So far I have found this resource with Python code, but unfortunately, I'm unable to find/install scipy for Python 3:
http://connor-johnson.com/2014/02/25/spatial-point-processes/
It has helped me understand what a Poisson point process actually is and how it works, though.
I have been playing around with numpy.random.poisson for a while now, but I am having a tough time interpreting what it returns.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.poisson.html
>>> import numpy as np
>>> np.random.poisson(1, (1, 5, 5))
array([[[0, 2, 0, 1, 0],
[3, 2, 0, 2, 1],
[0, 1, 3, 3, 2],
[0, 1, 2, 0, 2],
[1, 2, 1, 0, 3]]])
What I think that command does is create one 5x5 field (the shape (1, 5, 5)) and scatter objects with a rate of lambda = 1 over that field. The numbers displayed in the resulting matrix are the probability of an object lying at that specific position.
How can I scatter, say, ten objects over that 5x5 field according to a homogeneous spatial Poisson process? My first guess would be to iterate over the whole array and insert an object at every position with a "3", then one at every other position with a "2", and so on, but I'm unsure of the actual probability I should use to decide whether an object should be inserted or not.
According to another resource, I can simulate 10 objects being scattered over a field with a rate of 1 by simply multiplying the rate and the object count (10 * 1 = 10) and using that value as my lambda, i.e.
>>> np.random.poisson(10, (1, 5, 5))
array([[[12, 12, 10, 16, 16],
[ 8, 6, 8, 12, 9],
[12, 4, 10, 3, 8],
[15, 10, 10, 15, 7],
[ 8, 13, 12, 9, 7]]])
However, I don't see how that makes things easier. That way, I only increase the rate at which objects appear by a factor of 10.
(See also: Poisson point process in matlab.)
To sum it up, my primary question is: How can I use numpy.random.poisson(lam, size) to model a number n of objects being scattered over a 2-dimensional field dx*dy?
It seems I've looked at the problem in the wrong way. After more offline research I found out that it is actually sufficient to draw a random Poisson value that represents the number of objects, for example
n = np.random.poisson(100)
and create the same number of random values between 0 and 1:
x = np.random.rand(n)
y = np.random.rand(n)
Now I just need to join the two arrays of x and y values into an array of (x, y) tuples. Those are the random positions I was looking for. I can multiply every x and y value by the side length of my field, e.g. 100, to scale the values to the 100x100 field I want to display; a short sketch of this follows.
I thought that the "randomness" of those positions had to be determined by a Poisson process, but it seems that only the number of positions needs to be drawn from it, not the actual positional values.
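For completeness, a minimal sketch of that joining and scaling step (the side length of 100 is the example value from the question):
import numpy as np
n = np.random.poisson(100)               # Poisson-distributed number of points
x = np.random.rand(n)
y = np.random.rand(n)
side = 100                               # side length of the square field
points = np.column_stack((x, y)) * side  # (n, 2) array of (x, y) positions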
That's all correct. You definitely don't need SciPy, though when I first simulated a Poisson point process in Python I also used SciPy. I presented the original code, with details of the simulation process, in this post:
https://hpaulkeeler.com/poisson-point-process-simulation/
I just use NumPy in the more recent code:
import numpy as np               # NumPy for arrays, random number generation, etc.
import matplotlib.pyplot as plt  # for plotting
# Simulation window parameters
xMin, xMax = 0, 1
yMin, yMax = 0, 1
xDelta = xMax - xMin
yDelta = yMax - yMin  # rectangle dimensions
areaTotal = xDelta * yDelta
# Point process parameters
lambda0 = 100  # intensity (i.e. mean density) of the Poisson process
# Simulate a Poisson point process
numbPoints = np.random.poisson(lambda0 * areaTotal)       # Poisson number of points
xx = xDelta * np.random.uniform(0, 1, numbPoints) + xMin  # x coordinates of Poisson points
yy = yDelta * np.random.uniform(0, 1, numbPoints) + yMin  # y coordinates of Poisson points
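Since matplotlib is already imported above, a small addition of my own (not part of the quoted snippet) displays the realization:
plt.scatter(xx, yy, edgecolors='b', facecolors='none')
plt.show()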
The code can also be found here:
https://github.com/hpaulkeeler/posts/tree/master/PoissonRectangle
I've also uploaded more Python (as well as MATLAB and Julia) code there for simulating several point processes, including Poisson point processes on various shapes and cluster point processes.
https://github.com/hpaulkeeler/posts

How to convert Euclidean distances between vectors to similarity scores

Below is my code for calculating Euclidean distance between vectors, and a snippet of my transformed data set (vectors).
import itertools
import numpy as np
vect = [[2, 1, 1, 1, 1, 3, 4, 2, 5, 1],
        [1, 5, 2, 1, 1, 1, 1, 1, 1, 2],
        [2, 1, 1, 1, 2, 1, 1, 1, 1, 1]]
for u1, u2 in itertools.combinations(vect, 2):
    x = np.array(u1)
    y = np.array(u2)
    space = np.linalg.norm(y - x)
    print(space)  # Python 3 print() (the original used the Python 2 print statement)
The Euclidean distances between the vectors are:
7.0
5.56776436283
4.472135955
My goal is to compute the similarity between the vectors and output a similarity score for each comparison. A typical similarity score lies between 0 and 1, with 0 being not similar and 1 being exactly similar. The question here is: how can I convert the Euclidean distances to similarity scores?
Someone suggested this formula: 1/(1 + d(P1, P2)), i.e. taking the inverse of one plus the Euclidean distance as the similarity score. Any suggestions? Thanks.
There are plenty of similarity measures out there. As user2357112 says, the best one depends on your application. I suggest taking a look at some of the kernels listed here:
http://crsouza.blogspot.co.uk/2010/03/kernel-functions-for-machine-learning.html
I have found the chi-square kernel to be a good default choice in my applications, particularly if the vectors are histograms.
If you have a subset of data for which you already know which samples should be similar to each other, I would suggest trying a few different kernels and plotting the resulting similarity matrix over those samples (if you had 100 test samples, you would get a 100x100 similarity matrix that you can plot as a heat map using the imshow method in matplotlib.pyplot); a sketch follows.
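As an illustration (my own sketch, combining the 1/(1 + d) mapping from the question with the heat-map idea above):
import numpy as np
import matplotlib.pyplot as plt
vect = np.array([[2, 1, 1, 1, 1, 3, 4, 2, 5, 1],
                 [1, 5, 2, 1, 1, 1, 1, 1, 1, 2],
                 [2, 1, 1, 1, 2, 1, 1, 1, 1, 1]])
dists = np.linalg.norm(vect[:, None, :] - vect[None, :, :], axis=-1)  # pairwise Euclidean distances
sims = 1.0 / (1.0 + dists)  # 1 / (1 + d) maps [0, inf) onto (0, 1]
plt.imshow(sims)  # similarity matrix as a heat map
plt.colorbar()
plt.show()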
