How to extract specific parts of a numpy array? - python

I have a correlation function that looks like the following.
I want to extract only the main peak of the function into a separate array. The central peak has the form of a Gaussian. I want to cut out the peak with a window around it of approximately four times the FWHM of the Gaussian peak. I have the correlation function stored in a numpy array. Any tips/ideas on how to approach this?

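NumPy's argmax function (Docs) returns the index of the maximum value of an array. With that index you can then slice out the values around it.
Example:
import numpy
m = numpy.argmax(arr)
# clamp the lower bound: a negative start index would wrap around to the end of the array
values = arr[max(m - width, 0):m + width + 1]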
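To get a window of roughly four times the FWHM, the width itself has to be estimated first. Here is a minimal sketch of one way to do that, assuming the baseline of the correlation function is close to zero, the main peak is the global maximum, and the samples are evenly spaced. The helper name extract_peak and its factor parameter are made up for illustration, and the data is synthetic.

```python
import numpy as np

def extract_peak(arr, factor=4.0):
    """Return (start, stop, segment) for a window of about factor * FWHM around the global maximum."""
    arr = np.asarray(arr, dtype=float)
    m = int(np.argmax(arr))
    half = arr[m] / 2.0  # half maximum, assuming the baseline is close to zero

    # walk outwards from the peak until the curve drops to half maximum
    left = m
    while left > 0 and arr[left] > half:
        left -= 1
    right = m
    while right < len(arr) - 1 and arr[right] > half:
        right += 1

    fwhm = right - left  # approximate FWHM in samples
    half_window = int(round(factor * fwhm / 2.0))
    start = max(m - half_window, 0)
    stop = min(m + half_window + 1, len(arr))
    return start, stop, arr[start:stop]

# synthetic stand-in: Gaussian main peak plus small side lobes
x = np.arange(1000)
sigma = 10.0
corr = np.exp(-0.5 * ((x - 500) / sigma) ** 2) + 0.05 * np.cos(x / 15.0)
start, stop, peak = extract_peak(corr)
print(start, stop, len(peak))
```

If the baseline is not near zero, subtracting an estimate of it before computing the half maximum would probably be needed; fitting a Gaussian to the peak is another option when the data is noisy.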

Related

Vector and RMS averaging in FFT

I have a data array on which I have performed an FFT. This is the code that I have applied.
import numpy as np
# "data" is a column vector on which FFT needs to be performed
# N = No. of points in "data"
# dt = time interval between two corresponding data points
FFT_data = np.fft.fft(data) # Complex values
FFT_data_real = 2/N*abs(FFT_data) # Absolute values
However, I went through following link: https://www.dsprelated.com/showarticle/1159.php
Here it says, to enhance the SNR we can apply "RMS-averaged FFT" and "Vector Averaged FFT".
Can somebody please let me know how to go about implementing these two methods in Python, or point me to any documentation/links we can refer to?
As your reference indicates:
If you take the square root of the average of the squares of your sample spectra, you are doing RMS Averaging. Another alternative is Vector Averaging in which you average the real and complex components separately.
Obviously to be able to perform either averaging you'd need to have more than a single data set to average. In your example code, you have a single column vector data. Let's assume you have multiple such column vectors arranged as a 2D NxM matrix, where N is the number of points per dataset and M is the number of datasets. Since the datasets are stored in columns, when computing the FFT you will need to specify the parameter axis=0 to compute the FFT along columns.
RMS-averaged FFT
As the name suggests, for this method you need to take the square-root of the mean of the squared amplitudes. Since the different sets are stored in columns, you'd need to do the average along the axis 1 (the other axis than the one used for the FFT).
FFT_data = np.fft.fft(data, axis=0) # Complex values
FFT_data_real = 2/N*abs(FFT_data) # Absolute values
rms_averaged = np.sqrt(np.mean(FFT_data_real**2, axis=1))
Vector Averaged FFT
In this case you need to obtain the real and imaginary components of the FFT data, then compute the average on each separately:
FFT_data = np.fft.fft(data, axis=0) # Complex values
real_part_avg = 2/N*np.mean(np.real(FFT_data),axis=1)
imag_part_avg = 2/N*np.mean(np.imag(FFT_data),axis=1)
vector_averaged = np.abs(real_part_avg+1j*imag_part_avg)
Note that I've kept the 2/N scaling you had for the absolute values.
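To see the difference between the two methods, here is a small self-contained sketch on synthetic data. Everything in it is an assumption for illustration: the tone frequency, the noise level, and the number of datasets. Note that vector averaging only lowers the noise floor when the signal has the same phase in every dataset (for example with triggered acquisition); with a random phase per dataset the signal would average out as well.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1024, 50            # points per dataset, number of datasets
dt = 1e-3                  # sampling interval in seconds (assumed)
t = np.arange(N) * dt
f0 = 100 / (N * dt)        # tone placed exactly on frequency bin 100

# each column: the same phase-locked sine plus independent noise
tone = np.sin(2 * np.pi * f0 * t)[:, None]
data = tone + rng.normal(scale=2.0, size=(N, M))

FFT_data = np.fft.fft(data, axis=0)

# RMS averaging: average the squared amplitudes, then take the square root
rms_averaged = np.sqrt(np.mean((2 / N * np.abs(FFT_data)) ** 2, axis=1))

# vector averaging: average the complex spectra, then take the magnitude
# (same result as averaging the real and imaginary parts separately)
vector_averaged = 2 / N * np.abs(np.mean(FFT_data, axis=1))

noise_bins = np.r_[1:90, 110:N // 2]   # bins away from the tone
rms_floor = rms_averaged[noise_bins].mean()
vector_floor = vector_averaged[noise_bins].mean()
peak_bin = int(np.argmax(vector_averaged[:N // 2]))
print("RMS noise floor:   ", rms_floor)
print("vector noise floor:", vector_floor)
print("peak bin:", peak_bin)
```

RMS averaging smooths the noise floor without lowering it, while vector averaging lowers it because the noise has a random phase in each dataset and partially cancels.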
But what can I do if I really only have one dataset?
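If that dataset happens to be stationary and sufficiently large, then you could break it down into smaller blocks. This can be done by reshaping your vector of N*M samples into an NxM matrix whose columns are consecutive blocks of N samples:
data = data.reshape(M, N).T  # reshape(N, M) would interleave the samples instead of keeping each block contiguous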
...
Then you could perform the averaging with either method.
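As a sketch of the block splitting, assuming the record length is not necessarily a multiple of the block size (the helper name split_into_blocks is hypothetical):

```python
import numpy as np

def split_into_blocks(data, N):
    """Reshape a 1-D record into an N x M matrix of contiguous blocks, one block per column."""
    data = np.asarray(data).ravel()
    M = data.size // N                 # number of complete blocks
    trimmed = data[:M * N]             # drop the incomplete tail, if any
    return trimmed.reshape(M, N).T     # column j holds samples j*N .. (j+1)*N - 1

record = np.arange(10)
blocks = split_into_blocks(record, N=4)
print(blocks)
```

The resulting matrix can be passed to np.fft.fft(..., axis=0) as in the examples above. Shorter blocks give more datasets to average but coarser frequency resolution, so the block size is a trade-off.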

How to find peaks in 1d array

I am reading a csv file in python and preparing a dataframe out of it. I have a Microsoft Kinect which is recording Arm Abduction exercise and generating this CSV file.
I have this array of Y-coordinates of the ElbowLeft joint. You can visualize this here. Now, I want to come up with a solution which can count the number of peaks (local maxima) in this array.
Can someone please help me to solve this problem?
You can use the find_peaks_cwt function from the scipy.signal module to find peaks within 1-D arrays:
from scipy import signal
import numpy as np
y_coordinates = np.array(y_coordinates) # convert your 1-D array to a numpy array if it's not, otherwise omit this line
peak_widths = np.arange(1, max_peak_width) # max_peak_width: the largest peak width you expect, in samples (you have to choose this)
peak_indices = signal.find_peaks_cwt(y_coordinates, peak_widths)
peak_count = len(peak_indices) # the number of peaks in the array
More information here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html
It's easy: put the data in a 1-D array and compare each value with its neighbours. A point n is a local maximum when the values at n-1 and n+1 are both smaller than the value at n.
Read the data as Robert Valencia suggests.
max_local = 0
for u in range(1, len(data) - 1):
    if data[u] > data[u-1] and data[u] > data[u+1]:
        max_local = max_local + 1
You could try to smooth the data with a smoothing filter and then find all values where the value before and the value after are both less than the current value. This assumes you want all peaks in the sequence. The reason you need the smoothing filter is to avoid counting spurious local maxima caused by noise. The level of smoothing required will depend on the noise present in your data.
A simple smoothing filter sets the current value to the average of the N values before and N values after the current value in your sequence along with the current value being analyzed.
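A minimal sketch of that smoothing-then-counting idea with NumPy only. The function name count_peaks and the half_window parameter are illustrative, and the test signal is synthetic: a deterministic ripple stands in for noise so the example is reproducible. With real noise the window size will need tuning.

```python
import numpy as np

def count_peaks(y, half_window=5):
    """Smooth with a centred moving average, then count strict local maxima."""
    y = np.asarray(y, dtype=float)
    width = 2 * half_window + 1
    kernel = np.ones(width) / width
    # "valid" keeps only positions where the full window fits, avoiding edge artefacts
    smooth = np.convolve(y, kernel, mode="valid")
    is_peak = (smooth[1:-1] > smooth[:-2]) & (smooth[1:-1] > smooth[2:])
    # shift back to indices of the original array
    peak_indices = np.flatnonzero(is_peak) + 1 + half_window
    return len(peak_indices), peak_indices

# synthetic stand-in: five slow repetitions plus a fast ripple acting as "noise"
n = np.arange(1000)
y = np.sin(2 * np.pi * n / 200) + 0.05 * np.sin(2 * np.pi * n / 7)

raw_count = int(np.sum((y[1:-1] > y[:-2]) & (y[1:-1] > y[2:])))
n_peaks, idx = count_peaks(y, half_window=10)
print("local maxima without smoothing:", raw_count)
print("local maxima after smoothing:  ", n_peaks, idx)
```

Without smoothing, every ripple registers as its own local maximum; after smoothing only the slow repetitions remain. Flat-topped peaks are not counted by the strict comparison, which may or may not matter for your data.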

Get median value in each bin in a 2D grid

I have a 2-D array of coordinates, and each coordinate corresponds to a value z (like z=f(x,y)). Now I want to divide this whole 2-D coordinate set into, for example, 100 even bins and calculate the median value of z in each bin, then use the scipy.interpolate.griddata function to create an interpolated z surface. How can I achieve this in Python? I was thinking of using np.histogram2d, but I think there is no median function in it, and I am having a hard time understanding how scipy.stats.binned_statistic works. Can someone help me, please? Thanks.
With numpy.histogram2d you can both count the data points in each bin and sum their z values, which lets you compute the per-bin average (the mean, not the median).
I would try something like this:
import numpy as np
coo=np.array([np.arange(1000),np.arange(1000)]).T #your array coordinates
def func(x, y): return x*(1-x)*np.sin(np.pi*x) / (1.5+np.sin(2*np.pi*y**2)**2)
z = func(coo[:,0], coo[:,1])
(n,ex,ey)=np.histogram2d(coo[:,0], coo[:,1],bins=100) # here we get counting
(tot,ex,ey)=np.histogram2d(coo[:,0], coo[:,1],bins=100,weights=z) # here we get total over z
average=tot/n
average=np.nan_to_num(average) #cure 0/0
print(average)
You'll need one function or a few, depending on how you want to structure things:
A function to create the bins should take in your data, determine how big each bin is, and return an array or an array of arrays (a list of lists in plain Python).
Happy to help with this, but I would need more information about the data.
To get the median of the bins:
NumPy (which SciPy builds on) has a median function
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.median.html
Essentially, the median of an array called
"bin_values"
would be:
numpy.median(bin_values)
Note: numpy.median also accepts a 2-D array. If all your bins hold the same number of values and are stacked as rows, numpy.median(bins, axis=1) returns an array with the median of each bin; without the axis argument it returns a single median over all values.
Updated
Not 100% on your example code, so here goes:
import numpy as np
# added some parentheses as I wasn't sure of the math; also removed the ;'s
def bincalc(x, y):
    return x*(1-x)*(np.sin(np.pi*x))/(1.5+np.sin(2*(np.pi*y)**2)**2)
coo = np.random.rand(1000, 2)
# evaluate z at every (x, y) pair, then take the median over all values
z = bincalc(coo[:, 0], coo[:, 1])
z_med = np.median(z)
print(z_med)
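Neither answer above actually computes a median per bin, so here is a rough sketch of one way to do it with NumPy only. It assumes an even grid spanning the data range; the helper name binned_median is made up, and the double loop favours clarity over speed.

```python
import numpy as np

def binned_median(x, y, z, bins=10):
    """Median of z in each cell of a bins x bins grid over (x, y); NaN for empty cells."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    x_edges = np.linspace(x.min(), x.max(), bins + 1)
    y_edges = np.linspace(y.min(), y.max(), bins + 1)
    # digitizing against the interior edges yields cell indices 0 .. bins-1
    ix = np.digitize(x, x_edges[1:-1])
    iy = np.digitize(y, y_edges[1:-1])
    medians = np.full((bins, bins), np.nan)
    for i in range(bins):
        for j in range(bins):
            in_cell = (ix == i) & (iy == j)
            if in_cell.any():
                medians[i, j] = np.median(z[in_cell])
    return medians, x_edges, y_edges

# synthetic stand-in for the coordinates and z values
rng = np.random.default_rng(0)
coo = rng.random((1000, 2))
z = coo[:, 0] * (1 - coo[:, 0]) * np.sin(np.pi * coo[:, 0])
medians, x_edges, y_edges = binned_median(coo[:, 0], coo[:, 1], z, bins=10)  # 10 x 10 = 100 bins
x_centers = 0.5 * (x_edges[:-1] + x_edges[1:])
y_centers = 0.5 * (y_edges[:-1] + y_edges[1:])
print(medians.shape)
```

The bin centres together with the non-NaN medians could then serve as the input points and values for scipy.interpolate.griddata. scipy.stats.binned_statistic_2d with statistic='median' should give an equivalent result without the explicit loop.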

Visualization of Standard Deviation in an array

As a python newbie I need a little help.
I have an array with 100 rows and 100 columns. Each position stands for a temperature value. I now want to calculate the mean of the whole array (I have that so far) and then create a new array with the same dimensions as the first one, holding the standard deviation at each position. In the end I want to get an array with the deviation from the mean at each position, so I want to know how far each value spreads from the mean. I hope you understand what I mean? For better understanding: the array is an infrared thermography image of a house. By calculating the standard deviation I want to find the most reactive/sensitive pixels in the image. Maybe someone has done something like this before. In the end I want to export the file, so that I get an image that looks similar to the infrared image, but showing the standard deviation temperatures instead of the raw temperatures.
Importing the file and calculating the mean like this:
data_mean = []
my_array = np.genfromtxt((line.replace(',','.') for line in data),skip_header=9,delimiter=";")
data_mean.append(np.nanmean(my_array))
Then I need to calculate the standard deviation at each position in the array.
Thank you so much in advance for any help!
data_mean = np.nanmean(my_array) # gets you the mean of the whole array, ignoring NaNs as in your own code
# build an array where every value is the mean of your data
meanArray = np.ones(my_array.shape)*data_mean
variationFromMean = my_array - meanArray
Is this what you were looking for?
If you are keeping the data in an array format, here is a solution:
import numpy as np
# find the mean of the array values (NaN-aware, as in the question)
mean_value = np.nanmean(my_array)
# find the standard deviation of the array values
standard_deviation = np.nanstd(my_array)
# create an array of each value's deviation from the mean, in units of standard deviations
array = (my_array - mean_value)/standard_deviation
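A small worked example of the deviation map, using a 3 x 3 stand-in for the 100 x 100 temperature array (the values are invented). Broadcasting makes the intermediate array of means unnecessary.

```python
import numpy as np

# small stand-in for the thermography array (values assumed, in deg C)
my_array = np.array([[20.0, 21.0, 19.0],
                     [20.0, 30.0, 20.0],
                     [np.nan, 20.0, 20.0]])

mean_value = np.nanmean(my_array)      # scalar mean, ignoring NaNs
std_value = np.nanstd(my_array)        # scalar standard deviation, ignoring NaNs
deviation = my_array - mean_value      # broadcasting: same shape as my_array
z_scores = deviation / std_value       # deviation in units of standard deviations

# position of the pixel that deviates most from the mean
hottest = np.unravel_index(np.nanargmax(np.abs(z_scores)), z_scores.shape)
print(mean_value, hottest)
```

Both deviation and z_scores keep the shape of the input, so either can be written out with np.savetxt or displayed as an image in the same way as the raw temperatures.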

pandas rolling_std only perform every Nth calculation

I am working on some code optimization. Currently I use the pandas rolling_mean and rolling_std to compute normalized cross correlations of time series data from seismic instruments. For non-pertinent technical reasons I am only interested in every Nth value of the output of these pandas rolling mean and rolling std calls, so I am looking for a way to only compute every Nth value. I may have to write some Cython code to do this, but I would prefer not to. Here is an example:
import pandas as pd
import numpy as np
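arr_size=5000 # array size
win=150 # moving window size ("as" is a reserved word in Python and cannot be used as a variable name)
N=3 # only interested in every Nth value of the output array
ar=np.random.rand(arr_size) # generate generic random array
# pd.rolling_std has been removed from newer pandas versions; the equivalent there is pd.Series(ar).rolling(win).std()
RSTD=pd.rolling_std(ar,win)[win-1:] # don't return the NaNs before the windows overlap
foo=RSTD[::N] # use array indexing to decimate RSTD to only return every Nth value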
Is there a good pandas way to only calculate every Nth value of RSTD rather than calculate all the values and decimate?
Thanks
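One possible approach without Cython, sketched with NumPy rather than pandas: build a strided view of all windows, keep every Nth one, and reduce only those. This assumes NumPy 1.20 or newer (for sliding_window_view) and uses ddof=1 to match the sample standard deviation that the pandas rolling functions compute by default. The function name rolling_std_every_nth is illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_std_every_nth(ar, window, step):
    """Sample standard deviation (ddof=1) of every step-th full window."""
    windows = sliding_window_view(ar, window)   # a view of all windows, no data copied
    return windows[::step].std(axis=1, ddof=1)  # only the selected windows are reduced

rng = np.random.default_rng(0)
ar = rng.random(5000)
foo = rolling_std_every_nth(ar, window=150, step=3)
print(foo.shape)
```

This does roughly 1/N of the work of reducing every window, but each selected window is reduced from scratch, so for small N a full rolling computation followed by decimation may still be faster. It is worth timing both on your data.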
