As a Python newbie, I need a little help.
I have an array with 100 rows and 100 columns. Each position stands for a temperature value. I now want to calculate the mean of the whole array (I have that so far) and then create a new array with the same dimensions as the first one, holding the standard deviation at each position. In the end I want an array with the deviation from the mean at each position, so that I know how far each value spreads from the mean. I hope you understand what I mean. For better understanding: the array is an infrared thermography image of a house. By calculating the standard deviation I want to find the most reactive/sensitive pixels in the image. Maybe someone has done something like this before. Finally, I want to export the result so that I get an image that looks similar to the infrared image, but showing the standard deviation temperatures instead of the raw temperatures.
I import the file and calculate the mean like this:
data_mean = []
my_array = np.genfromtxt((line.replace(',','.') for line in data),skip_header=9,delimiter=";")
data_mean.append(np.nanmean(my_array))
Then I need to calculate the standard deviation of each position in the array.
Thank you so much in advance for any help!
data_mean = np.mean(my_array) #gets you the mean of the whole array
# build an array where every value is the mean of your data
meanArray = np.ones(my_array.shape)*data_mean
variationFromMean = my_array - meanArray
Is this what you were looking for?
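A minimal sketch of the same idea, relying on NumPy broadcasting instead of building meanArray, and on the NaN-aware functions used in the question. The 3x3 array and the names data_std and z_scores are assumptions made for illustration; they are not part of the original code.

```python
import numpy as np

# Hypothetical 3x3 "thermal image" with one missing reading (NaN)
my_array = np.array([[20.0, 21.0, 19.0],
                     [22.0, np.nan, 20.0],
                     [18.0, 20.0, 20.0]])

data_mean = np.nanmean(my_array)   # mean over all valid pixels
data_std = np.nanstd(my_array)     # a single standard deviation for the whole image

deviation = my_array - data_mean   # broadcasting: signed deviation of every pixel
z_scores = deviation / data_std    # the same deviation in units of standard deviations

print(deviation)
```

The deviation array has the same shape as the input, so it can be exported or displayed the same way as the raw temperatures. Note that one image yields only one standard deviation; a separate standard deviation per pixel would need several frames of the same scene, stacked and reduced along the frame axis.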
If you are keeping the data in an array format, here is a solution:
import numpy as np
# Find the mean of the array data values
mean_value = np.mean(my_array)
# Find the standard deviation of the array data values
standard_deviation = np.std(my_array)
# Create an array holding each value's deviation from the mean, in units of standard deviations
array = (my_array - mean_value)/standard_deviation
I have a data array on which I have performed an FFT. This is the code that I have applied.
import numpy as np
# "data" is a column vector on which FFT needs to be performed
# N = No. of points in "data"
# dt = time interval between two corresponding data points
FFT_data = np.fft.fft(data) # Complex values
FFT_data_real = 2/N*abs(FFT_data) # Absolute values
However, I went through following link: https://www.dsprelated.com/showarticle/1159.php
Here it says, to enhance the SNR we can apply "RMS-averaged FFT" and "Vector Averaged FFT".
Can somebody please let me know how to go about applying these two methods in Python, or is there any documentation/links to which we can refer?
As your reference indicates:
If you take the square root of the average of the squares of your sample spectra, you are doing RMS Averaging. Another alternative is Vector Averaging in which you average the real and complex components separately.
Obviously to be able to perform either averaging you'd need to have more than a single data set to average. In your example code, you have a single column vector data. Let's assume you have multiple such column vectors arranged as a 2D NxM matrix, where N is the number of points per dataset and M is the number of datasets. Since the datasets are stored in columns, when computing the FFT you will need to specify the parameter axis=0 to compute the FFT along columns.
RMS-averaged FFT
As the name suggests, for this method you need to take the square-root of the mean of the squared amplitudes. Since the different sets are stored in columns, you'd need to do the average along the axis 1 (the other axis than the one used for the FFT).
FFT_data = np.fft.fft(data, axis=0) # Complex values
FFT_data_real = 2/N*abs(FFT_data) # Absolute values
rms_averaged = np.sqrt(np.mean(FFT_data_real**2, axis=1))
Vector Averaged FFT
In this case you need to obtain the real and imaginary components of the FFT data, then compute the average on each separately:
FFT_data = np.fft.fft(data, axis=0) # Complex values
real_part_avg = 2/N*np.mean(np.real(FFT_data),axis=1)
imag_part_avg = 2/N*np.mean(np.imag(FFT_data),axis=1)
vector_averaged = np.abs(real_part_avg+1j*imag_part_avg)
Note that I've kept the 2/N scaling you had for the absolute values.
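To see how the two methods differ, here is a small sketch on synthetic data. The test signal, noise level, seed, and the names rng, signal and noise_bins are all assumptions chosen for illustration. The vector average is written as the magnitude of the complex mean, which is equivalent to averaging the real and imaginary parts separately.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 64                                # points per dataset, number of datasets
t = np.arange(N)
signal = np.sin(2 * np.pi * 16 * t / N)       # 16 cycles per record -> peak in bin 16

# Same signal (same phase) in every column, independent noise per column
data = signal[:, None] + rng.normal(0.0, 1.0, size=(N, M))

FFT_data = np.fft.fft(data, axis=0)
rms_averaged = np.sqrt(np.mean((2 / N * np.abs(FFT_data)) ** 2, axis=1))
vector_averaged = 2 / N * np.abs(np.mean(FFT_data, axis=1))

# Positive-frequency bins that contain only noise
noise_bins = np.r_[1:16, 17:N // 2]
print("peak:       ", rms_averaged[16], vector_averaged[16])
print("noise floor:", rms_averaged[noise_bins].mean(), vector_averaged[noise_bins].mean())
```

Vector averaging lowers the noise floor only when the signal has the same phase in every dataset (for example, triggered acquisitions). If the phase varies from one dataset to the next, the signal averages away together with the noise, and RMS averaging is the safer choice.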
But what can I do if I really only have one dataset?
If that dataset happens to be stationary and sufficiently large, then you could break it down into smaller blocks. This can be done by reshaping your vector (of length N*M) into an NxM matrix whose columns are consecutive blocks of N samples:
data = data.reshape(M, N).T
...
Then you could perform the averaging with either method.
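NumPy fills arrays row by row by default, so it is worth checking that each column really holds one contiguous block of samples. A minimal sketch, assuming a hypothetical record of 12 samples split into M = 3 blocks of N = 4:

```python
import numpy as np

data = np.arange(12.0)      # hypothetical single record: samples 0, 1, ..., 11
N, M = 4, 3                 # M blocks of N consecutive samples each

# Drop any leftover samples, put each block in a row, then transpose
# so that every column is one contiguous block
blocks = data[:N * M].reshape(M, N).T

print(blocks.shape)         # (N, M)
print(blocks[:, 0])         # first block of N consecutive samples
```

With the blocks stored as columns, the axis=0 FFT and the axis=1 averaging from the two methods above apply unchanged.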
So I have a dataset that I want to normalize. The dataset consists of a bunch of numbers, so I'm just going to post one line of it:
1,1,22,22,22,19,18,14,49.895756,17.775994,5.27092,0.771761,0.018632,0.006864,0.003923,0.003923,0.486903,0.100025,1,0
Does anyone know how to do it? I'm not allowed to use Scikit-Learn.
Normalization takes all your values and transforms them so that they lie in between 0 and 1.
To perform this:
First find the minimum value (call it a) and the maximum value (call it b)
Take every value in your data set (call it d) and find (d-a)/(b-a).
(d-a) makes sure that the range goes from [a,b] to [0,b-a] and then dividing by (b-a) makes the range [0,1].
In Python you would first convert your dataset to a numpy array (a much more efficient data structure)
import numpy as np
d = np.array(your_dataset)
Then find the max and min
a = d.min()
b = d.max()
Finally you perform the operation
d = (d-a)/(b-a)
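The steps above use one global minimum and maximum. If the columns of the dataset are separate features on different scales, as the sample line in the question suggests, it may make more sense to scale each column on its own. A minimal sketch under that assumption; the small example array and the guard for constant columns are illustrative additions:

```python
import numpy as np

# Hypothetical dataset: rows are samples, columns are features
d = np.array([[1.0, 22.0, 0.5],
              [3.0, 19.0, 0.5],
              [2.0, 14.0, 0.5]])

col_min = d.min(axis=0)             # minimum of each column
col_max = d.max(axis=0)             # maximum of each column
col_range = col_max - col_min
col_range[col_range == 0] = 1.0     # constant column: avoid 0/0, the result is 0

normalized = (d - col_min) / col_range
print(normalized)
```

Every non-constant column now spans exactly [0, 1], regardless of its original units.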
In order to normalize a dataset you simply calculate the average df['column_name'].mean() and standard deviation df['column_name'].std() for the dataset and subsequently subtract the average from every value in your dataset and divide the result by the standard deviation.
So the result would look something like this:
avg = df['column_name'].mean()
std = df['column_name'].std()
normalized = (df['column_name'] - avg) / std
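The same standardization can be sketched with plain NumPy. One detail to watch: pandas' std() uses the sample standard deviation (ddof=1) by default, while NumPy's std() defaults to the population version (ddof=0). The example values and variable names below are assumptions for illustration.

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

avg = values.mean()
std_population = values.std()       # NumPy default: ddof=0
std_sample = values.std(ddof=1)     # matches the pandas default

standardized = (values - avg) / std_sample
print(avg, std_population, std_sample)
```

After this transformation the values have mean 0 and (sample) standard deviation 1, but they are not confined to the range [0, 1].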
Say I have a data set of 100 items. The interesting part about this data set is that each item is a 4x3 matrix. My question is: how should I calculate the standard deviation of this data set? I tried the following code, but I don't know if the result is correct. If it is correct, I want to know how it works. I know the standard deviation equation for 1-D data, but I don't know the definition of std for a collection of m x n data. There is only an explanation for 1-D data in the docstring of np.std.
import numpy as np
datalist = []
for _ in range(100):
    data = np.random.random((4,3))
    datalist.append(data)
std = np.std(np.asarray(datalist))
print(std)
Seems like you're having unnecessary steps. To begin with, you can get 100 matrices of 4x3 like this:
x = np.random.rand(100, 4, 3)
Then just call np.std on it:
np.std(x)
0.2827262559096299
That's if you want the standard deviation of all values. If you want it per matrix cell, specify the axis argument:
np.std(x, axis=0)
array([[0.27863211, 0.2670126 , 0.28752064],
[0.28540484, 0.25365294, 0.28905531],
[0.28848584, 0.27695767, 0.26886147],
[0.27138472, 0.3135065 , 0.29361115]])
axis=0 means that it's going to collapse the axis 0 (the one with size 100), which will return a matrix of 4x3.
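To make the definition explicit, both results can be reproduced from the 1-D formula. This is a sketch with a seeded generator; the names flat, cell_mean, std_all and std_cells are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 4, 3))

# No axis given: all 1200 values are treated as one flat 1-D sample
flat = x.ravel()
std_all = np.sqrt(np.mean((flat - flat.mean()) ** 2))

# axis=0: for each of the 12 cells, the 1-D formula over the 100 matrices
cell_mean = x.mean(axis=0)                                   # shape (4, 3)
std_cells = np.sqrt(np.mean((x - cell_mean) ** 2, axis=0))   # shape (4, 3)

print(np.isclose(std_all, np.std(x)))
print(np.allclose(std_cells, np.std(x, axis=0)))
```

So np.std on the whole array answers "how much do all values vary?", while axis=0 answers "how much does each matrix cell vary across the 100 matrices?".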
I am reading a csv file in python and preparing a dataframe out of it. I have a Microsoft Kinect which is recording Arm Abduction exercise and generating this CSV file.
I have this array of Y-Coordinates of ElbowLeft joint. You can visualize this here. Now, I want to come up with a solution which can count number of peaks or local maximum in this array.
Can someone please help me to solve this problem?
You can use the find_peaks_cwt function from the scipy.signal module to find peaks within 1-D arrays:
from scipy import signal
import numpy as np
y_coordinates = np.array(y_coordinates) # convert your 1-D array to a numpy array if it's not, otherwise omit this line
peak_widths = np.arange(1, max_peak_width) # max_peak_width: the largest peak width (in samples) you expect; you need to choose this yourself
peak_indices = signal.find_peaks_cwt(y_coordinates, peak_widths)
peak_count = len(peak_indices) # the number of peaks in the array
More information here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html
It's easy: put the data in a 1-D array and compare each value with its neighbours. The value at index n is a local maximum when the values at n-1 and n+1 are both smaller.
Read data as Robert Valencia suggests
max_local = 0
for u in range(1, len(data) - 1):
    if data[u] > data[u-1] and data[u] > data[u+1]:
        max_local = max_local + 1
You could try to smooth the data with a smoothing filter and then find all values where the values before and after are less than the current value. This assumes you want all peaks in the sequence. The reason you need the smoothing filter is to avoid counting spurious local maxima caused by noise. The level of smoothing required will depend on the noise present in your data.
A simple smoothing filter sets the current value to the average of the N values before and N values after the current value in your sequence along with the current value being analyzed.
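A minimal sketch of that approach, using a moving average built with np.convolve followed by a neighbour comparison. The function name count_peaks, the half_window parameter and the synthetic test signal are assumptions for illustration; the right window size depends on your data.

```python
import numpy as np

def count_peaks(values, half_window=2):
    """Smooth with a centred moving average, then count strict local maxima."""
    values = np.asarray(values, dtype=float)
    width = 2 * half_window + 1                 # N values before + current + N after
    kernel = np.ones(width) / width
    # mode="valid" keeps only positions where the full window fits
    smoothed = np.convolve(values, kernel, mode="valid")
    is_peak = (smoothed[1:-1] > smoothed[:-2]) & (smoothed[1:-1] > smoothed[2:])
    return int(np.count_nonzero(is_peak))

# Synthetic test: three slow repetitions plus a fast ripple standing in for noise
t = np.arange(300) * 0.01
y = np.sin(2 * np.pi * t) + 0.2 * np.sin(2 * np.pi * 20 * t)

print(count_peaks(y, half_window=0))   # no smoothing: the ripple adds many extra maxima
print(count_peaks(y, half_window=2))   # 5-sample window spans one full ripple period
```

Because mode="valid" shortens the array, peaks within half_window samples of either end are not detected. With real, random noise a wider window or an additional minimum-height check may be needed.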
I have a 2-D array of coordinates, and each coordinate corresponds to a value z (like z=f(x,y)). Now I want to divide this whole 2-D coordinate set into, for example, 100 even bins and calculate the median value of z in each bin, then use the scipy.interpolate.griddata function to create an interpolated z surface. How can I achieve this in Python? I was thinking of using np.histogram2d, but I think there is no median function in it, and I am having a hard time understanding how scipy.stats.binned_statistic works. Can someone help me, please? Thanks.
With numpy.histogram2d you can both count the number of data and sum it, thus it gives you the possibility to compute the average.
I would try something like this:
import numpy as np
coo=np.array([np.arange(1000),np.arange(1000)]).T #your array coordinates
def func(x, y): return x*(1-x)*np.sin(np.pi*x) / (1.5+np.sin(2*np.pi*y**2)**2)
z = func(coo[:,0], coo[:,1])
(n,ex,ey)=np.histogram2d(coo[:,0], coo[:,1],bins=100) # here we get counting
(tot,ex,ey)=np.histogram2d(coo[:,0], coo[:,1],bins=100,weights=z) # here we get total over z
average=tot/n
average=np.nan_to_num(average) #cure 0/0
print(average)
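The histogram approach above yields the mean of each bin. For the median the question asks for, one option is to assign every point to a bin with np.digitize and take np.median per bin. This is a sketch under assumed random coordinates and an assumed test function z = x + y; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1000)
y = rng.random(1000)
z = x + y                                  # hypothetical z = f(x, y)

nbins = 10                                 # 10 x 10 = 100 bins
x_edges = np.linspace(x.min(), x.max(), nbins + 1)
y_edges = np.linspace(y.min(), y.max(), nbins + 1)

# Bin index of every point; clip so the maximum value falls in the last bin
ix = np.clip(np.digitize(x, x_edges) - 1, 0, nbins - 1)
iy = np.clip(np.digitize(y, y_edges) - 1, 0, nbins - 1)

median = np.full((nbins, nbins), np.nan)   # NaN marks empty bins
for i in range(nbins):
    for j in range(nbins):
        in_bin = (ix == i) & (iy == j)
        if in_bin.any():
            median[i, j] = np.median(z[in_bin])

print(median.shape)
```

The bin centres (midpoints of x_edges and y_edges) together with the non-NaN medians can then serve as the input points for scipy.interpolate.griddata. SciPy's scipy.stats.binned_statistic_2d with statistic='median' should produce the same per-bin medians without the explicit loop.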
You'll need one or more functions, depending on how you want to structure things:
A function to create the bins. It should take in your data, determine how big each bin is, and return an array or an array of arrays (a list of lists in plain Python).
Happy to help with this but would need more information about the data.
A function to get the median of the bins:
NumPy (the array library that SciPy builds on) has a median function
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.median.html
Essentially, the median of an array called "bin" would be:
numpy.median(bin)
Note: numpy.median accepts an axis argument, so if all your bins hold the same number of values and are stacked as the rows of a 2-D array, numpy.median(bins, axis=1) returns an array with the median of each bin. Without axis, it returns a single median over all values.
Updated
Not 100% on your example code, so here goes:
import numpy as np
# added some parentheses as I wasn't sure of the math. also removed ;'s
def bincalc(x, y):
    return x*(1-x)*(np.sin(np.pi*x))/(1.5+np.sin(2*(np.pi*y)**2)**2)
coo = np.random.rand(1000,2)
# evaluate z for every (x, y) pair; columns 0 and 1 hold x and y
z = bincalc(coo[:,0], coo[:,1])
z_med = np.median(z)
print(z_med)