Python: plotting across time with a single column file

I must write a function that finds the local maxima and minima from a series of values.
The input to the function is the x, y data of the signal.
The output is 4 vectors containing the x, y values of the max and min "peaks".
To find max peaks, I must "stand" on each data point and check whether it is greater or smaller than its neighbors on both sides in order to decide if it is a peak (and save it as a max/min peak).
Points at both ends only have 1 neighbor; do not consider those in this analysis.
Then write a program that reads a data file and invokes the function to calculate the peaks. The program must generate a graph showing the entered data with the calculated peaks.
The first file is an array of float64 with shape (2001,). All data is in column 0. The file represents the amplitude of a signal in time; the sampling frequency is 200 Hz. Assume the initial time is 0.
The graph should look like this
The program must also generate an .xls file containing 2 tables: one with the min peaks and another with the max peaks. Each table must be titled and consist of 2 columns, one with the time at which each peak occurs and the other with the amplitude of the peak.
No Pandas allowed.
The first file is a .txt file with a single column, 2001 rows total:
0
0.0188425
0.0376428
0.0563589
0.0749497
0.0933749
0.111596
0.129575
0.147277
0.164669
0.18172
...
Current attempt:
import numpy as np
import matplotlib.pyplot as plt
filename = 'location/file_name.txt'
T = np.loadtxt(filename,comments='#',delimiter='\n')
x = T[::1] # all the rows of column 0 are x values
a = np.empty(x, dtype=array)
y = np.linspace[::1/200]
X, Y = np.meshgrid(x,y)

This does what you ask. I had to generate random data, since you didn't share yours. You can build your spreadsheet from the minima and maxima arrays; a sketch with xlwt is included after the plotting code.
import numpy as np
import matplotlib.pyplot as plt
#filename = 'location/file_name.txt'
#T = np.loadtxt(filename,comments='#',delimiter='\n')
#
#y = T[::1] # all the rows of column 0 are amplitude values
y = np.random.random(200) * 2.0
minima = []
maxima = []
# skip the first and last points, which only have one neighbor
for i in range(1, y.shape[0] - 1):
    if y[i-1] < y[i] and y[i+1] < y[i]:
        maxima.append((i / 200, y[i]))
    if y[i-1] > y[i] and y[i+1] > y[i]:
        minima.append((i / 200, y[i]))
minima = np.array(minima)
maxima = np.array(maxima)
print(minima)
print(maxima)
x = np.arange(y.shape[0]) / 200  # time axis for a 200 Hz sampling rate
plt.plot(x, y)
plt.scatter(maxima[:, 0], maxima[:, 1])
plt.scatter(minima[:, 0], minima[:, 1])
plt.show()
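Since pandas is not allowed, one way to build the required .xls file is the xlwt package. This is only a minimal sketch, assuming xlwt is installed; the sheet name and the side-by-side layout of the two tables are arbitrary choices:
import xlwt
wb = xlwt.Workbook()
ws = wb.add_sheet('peaks')
# table of min peaks in columns A-B
ws.write(0, 0, 'Min peaks')
ws.write(1, 0, 'time (s)')
ws.write(1, 1, 'amplitude')
for r, (t, a) in enumerate(minima, start=2):
    ws.write(r, 0, float(t))
    ws.write(r, 1, float(a))
# table of max peaks in columns D-E
ws.write(0, 3, 'Max peaks')
ws.write(1, 3, 'time (s)')
ws.write(1, 4, 'amplitude')
for r, (t, a) in enumerate(maxima, start=2):
    ws.write(r, 3, float(t))
    ws.write(r, 4, float(a))
wb.save('peaks.xls')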

Related

Interpolate: spectra (wavelength, counts) at a given temperature, to create grid of temperature and counts

I have a number of spectra: wavelength/counts at a given temperature. The wavelength range is the same for each spectrum.
I would like to interpolate between temperature and counts to create a large grid of spectra (temperature and counts at a given wavelength range).
The code below is my current progress. When I try to get a spectrum for a given temperature I only get one value of counts when I need a range of counts representing the spectrum (I already know the wavelengths).
I think I am confused about arrays and interpolation. What am I doing wrong?
import pandas as pd
import numpy as np
from scipy import interpolate
image_template_one = pd.read_excel("mr_image_one.xlsx")
counts = np.array(image_template_one['counts'])
temp = np.array(image_template_one['temp'])
inter = interpolate.interp1d(temp, counts, kind='linear')
temp_new = np.linspace(30,50,0.5)
counts_new = inter(temp_new)
I now think that I have two arrays: [wavelength, counts] and [wavelength, temperature]. Is this correct, and do I need to interpolate between the arrays?
Example data
I think what you want to achieve can be done with interp2d:
import numpy as np
import pandas as pd
from scipy import interpolate
# dummy data
data = pd.DataFrame({
    'temp': [30]*6 + [40]*6 + [50]*6,
    'wave': 3 * [a for a in range(400, 460, 10)],
    'counts': np.random.uniform(.93, .95, 18),
})
# make the interpolator
inter = interpolate.interp2d(data['temp'], data['wave'], data['counts'])
# scipy's interpolators return functions,
# which you need to call with the values you want interpolated.
new_x, new_y = np.linspace(30, 50, 100), np.linspace(400, 450, 100)
interpolated_values = inter(new_x, new_y)
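To pull out the spectrum at a single temperature, call the interpolator with a scalar temperature and the full wavelength grid. A minimal sketch (35 is an arbitrary example temperature; ravel() just flattens interp2d's 2-D output). Note that interp2d is deprecated in recent SciPy releases, so on new installs you may need RegularGridInterpolator instead:
# counts vs. wavelength at an interpolated temperature of 35
spectrum_35 = inter(35.0, new_y).ravel()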

Filter frequency using python

I am new to Python so please pardon me if this question is very basic.
I have an accelerometer vector magnitude (acc_VM) signal with a sampling frequency of 100 Hz. I have to compute the Fourier transform of this signal and find the fundamental frequency within the range Df.
Df is the family of frequencies corresponding to walking. Here we use Df = [1.2, 4] Hz. How can I select the frequency range Df = [1.2, 4] Hz in Python? Should I implement filters, or is combFunction() the correct approach?
def combFunction(n):
    combSignal = []
    for element in n:
        if element > 1.2 and element < 4:
            combSignal.append(element)
        else:
            combSignal.append(0)
    return np.maximum(combSignal)

def hann(total_data):
    hann_array = np.zeros(total_data)
    for i in range(total_data):
        hann_array[i] = 0.5 - 0.5 * np.cos((2 * np.pi * i) / (total_data - 1))
    return hann_array

def calculate_FT(x):
    hann_weight = hann(len(x))
    x_multiplied_hann = x * hann_weight
    X = np.abs(np.fft.rfft(x_multiplied_hann))
    combSignal = combFunction(X)

calculate_FT(acc_VM)
The FFT does not return frequencies, but rather an array of amplitudes for a fixed set of evenly spaced frequencies.
As a result your combFunction, as implemented, would pick the components which have a spectrum amplitude between 1.2 and 4.
To be able to select frequencies, you would need the corresponding array of those evenly spaced frequencies, which you can get
from np.fft.rfftfreq.
Note that you will need the sampling rate (and if your data isn't uniformly sampled, you will need to resample it).
In the code that follows I'll use the variable sampling_rate for that. Then the frequencies will be given by:
freqs = np.fft.rfftfreq(len(data), d=1.0/sampling_rate)  # d is the sample spacing, i.e. 1/sampling_rate
Now let's extract the array indices corresponding to those frequencies that are within the frequency band of interest:
in_band = np.where([f >= 1.2 and f <= 4 for f in freqs])[0]
Then you may get the location within this band where the original spectrum X has a peak:
peak_location = np.argmax(X[in_band])
which gives you a peak spectrum amplitude X[in_band[peak_location]] at a frequency freqs[in_band[peak_location]].
Putting it all together should give you something like the following:
def find_peak_in_frequency_range(X, freqs, fmin, fmax):
    in_band = np.where([f >= fmin and f <= fmax for f in freqs])[0]
    peak_location = np.argmax(X[in_band])
    return freqs[in_band[peak_location]], X[in_band[peak_location]]

def calculate_FT(x, sampling_rate):
    hann_weight = hann(len(x))
    x_multiplied_hann = x * hann_weight
    X = np.abs(np.fft.rfft(x_multiplied_hann))
    freqs = np.fft.rfftfreq(len(x), d=1.0/sampling_rate)
    peakFreq, peakAmp = find_peak_in_frequency_range(X, freqs, 1.2, 4)
    return peakFreq, peakAmp
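A small aside: the list comprehension used for in_band can also be written as a boolean mask, which is the more common NumPy idiom and avoids the Python-level loop:
in_band = np.where((freqs >= fmin) & (freqs <= fmax))[0]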
Note that you may get better results by using a spectrum estimation method such as scipy.signal.welch instead of simply taking the FFT.
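If you want to try that route, here is a minimal sketch using scipy.signal.welch, assuming acc_VM is your signal and sampling_rate is 100 (nperseg is an arbitrary choice):
from scipy.signal import welch
freqs, psd = welch(acc_VM, fs=sampling_rate, nperseg=1024)
in_band = (freqs >= 1.2) & (freqs <= 4.0)
peak_freq = freqs[in_band][np.argmax(psd[in_band])]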
For the sake of illustration, I've run the above on a sample data set (file 1.csv with some resampling):

Xarray: Loading several CSV files into a dataset

I have several comma-separated data files that I want to load into an xarray dataset. Each row in each file represents a different spatial value of a field in a fixed grid, and every file represents a different point in time. The grid spacing is fixed and unchanging in time. The spacing of the grid is not uniform. The ultimate goal is to compute max_{x,y}{ std_t[ value(x,y,t) * sqrt(x**2 + y**2) ] }, where sqrt is the square root, std_t is the standard deviation with respect to time, and max_{x,y} is the maximum across all space.
I am having trouble loading the data. It is not clear to me how one is supposed to load several CSV files into an xarray dataset. There is an open_mfdataset function, which is designed for loading several data files into a dataset, but seems to expect hdf5 or netcdf files.
It seems like there is no way to load regular CSV files into an xarray dataset, and that preprocessing the data is necessary. In my example, I decided to preprocess the csv files to hdf5 files beforehand, to make use of the h5netcdf engine. This has created what appears to be an hdf5-specific problem for me.
Below is my best attempt at loading the data so far. Unfortunately, it results in an empty xarray dataset. I tried several options for the open_mfdataset function; the code below is only one of several attempts at using it.
How can I load these csv files into a single xarray dataset, to set myself up to find the maximum across space of the standard deviation in time of the value of interest?
import xarray as xr
import numpy as np
import pandas as pd
'''
Create example files
- Each file contains a spatial-dependent value, f(x, y)
- Each file represents a different point in time, f(x, y, t)
'''
for ii in range(7):
    # create csv file
    fl = open('exampleFile%i.dat' % ii, 'w')
    fl.write('time x1 x2 value\n')
    for xx in range(10):
        for yy in range(10):
            fl.write('%i %i %i %i\n' %
                     (ii, xx, yy, (xx - yy) * np.exp(ii)))
    fl.close()
    # convert csv to hdf5
    dat = pd.read_csv('exampleFile%i.dat' % ii)
    dat.to_hdf('exampleFile%i.hdf5' % ii, 'data', mode='w')
'''
Read all files into xarray dataframe
(the ultimate goal is to find the
maximum across time of
the standard deviation across space
of the "value" column)
'''
result = xr.open_mfdataset('exampleFile*.hdf5', engine='h5netcdf', combine='nested')
... When I run the code, the result variable does not appear to contain the desired data:
In: result
Out:
<xarray.Dataset>
Dimensions: ()
Data variables:
*empty*
Attributes:
PYTABLES_FORMAT_VERSION: 2.1
TITLE: Empty(dtype=dtype('S1'))
VERSION: 1.0
Edit
An answer was posted that assumes a uniformly spaced spatial grid. Here is a slightly modified example that does not assume an evenly-spaced grid of spatial points.
The example also assumes three spatial dimensions. That is more true to my real problem, and I realized that might be an important detail in this simple example.
import xarray as xr
import numpy as np
import pandas as pd
'''
Create example files
- Each file contains a spatial-dependent value, f(x, y)
- Each file represents a different point in time, f(x, y, t)
'''
for ii in range(7):
    # create csv file
    fl = open('exampleFile%i.dat' % ii, 'w')
    fl.write('time x y z value\n')
    for xx in range(10):
        for yy in range(int(10 + xx // 2)):
            for zz in range(int(10 + xx // 3 + yy // 3)):
                fl.write('%i %f %f %f %f\n' %
                         (ii, xx * np.exp(-1 * yy * zz), yy * np.exp(xx - zz), zz * np.exp(xx * yy), (xx - yy) * np.exp(ii)))
    fl.close()
    # convert csv to hdf5
    dat = pd.read_csv('exampleFile%i.dat' % ii)
    dat.to_hdf('exampleFile%i.hdf5' % ii, 'data', mode='w')
'''
Read all files into xarray dataframe
(the ultimate goal is to find the
maximum across time of
the standard deviation across space
of the "value" column)
'''
result = xr.open_mfdataset('exampleFile*.hdf5', engine='h5netcdf', combine='nested')
My approach would be to create a parsing function that converts the CSVs into xarray.Datasets.
This way you can use xarray.concat to combine them to a final dataset, on which you can perform your computations.
The following works with your example data:
from glob import glob

def csv2xr(csv, sep=" "):
    df = pd.read_csv(csv, sep=sep)
    x = df.x1.unique()
    y = df.x2.unique()
    pix = df.value.values.reshape(1, x.size, y.size)
    ds = xr.Dataset({
        "value": xr.DataArray(
            pix,
            dims=['time', 'x', 'y'],
            coords={"time": df.time.unique(), "x": x, "y": y})
    })
    return ds

csvs = glob("*dat")
ds_full = xr.concat([csv2xr(x) for x in csvs], dim="time")
print(ds_full)
#<xarray.Dataset>
# Dimensions: (time: 7, x: 10, y: 10)
# Coordinates:
# * time (time) int64 4 3 2 0 1 6 5
# * x (x) int64 0 1 2 3 4 5 6 7 8 9
# * y (y) int64 0 1 2 3 4 5 6 7 8 9
# Data variables:
# value (time, x, y) int64 0 -54 -109 -163 -218 -272 ... 593 445 296 148 0
Then to get the max of the std over time:
ds_full.std("time").max()
I hope I understood your problem. See if this works for you.
When defining the keyword arguments for read_csv, note that it is better to use delim_whitespace=True instead of sep=" ". This avoids creating spurious columns if you have double spaces somewhere.
I tell read_csv that time, x, y, and z are all index columns (they will become coordinates) and then convert each DataFrame to xarray. It will automatically structure your unstructured data and fill the holes with NaN. Then I concatenate all the xarray objects into a single object along time.
from glob import glob
fnames = glob('*.dat')
fnames.sort()
kw = dict(delim_whitespace=True,index_col=['time','x','y','z'])
ds = xr.concat([pd.read_csv(fname,**kw).to_xarray() for fname in fnames],'time')
The final result is an xarray object like this:
Now you can do everything with this object.
ds.max(['x','y','z']).std('time') will return the standard deviation in time of the spatial maximum value for all variables (in this case there is only the value column). Beware that sometimes you may have to pass skipna=True to avoid getting NaN outputs from your analysis.
Please let me know if that solves your problem; I would be glad to adapt it if it does not tackle some specific issue you are having with your data.
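If instead you want the quantity written in the question (the maximum across space of the standard deviation in time, with the sqrt(x**2 + y**2) weighting), the reductions are simply applied in the opposite order. A small sketch, assuming ds is the concatenated object from above and that the weighting only involves the x and y coordinates, as in the original formula:
weighted = ds['value'] * np.sqrt(ds['x']**2 + ds['y']**2)
result = weighted.std('time').max(['x', 'y', 'z'], skipna=True)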

Filling a 3D Array and Plotting the Values

I would like to write code in Python that evaluates the time evolution of a density distribution, p(x,y). The initial condition is p(t=0,x,y) = exp[-((x-500)^2)/500], and the update formula is in the code below, with t the time index, i the space index (x-direction), j the space index (y-direction), and v = 0.8.
My goal is to run the scheme for 10 iterations and plot the results at the final time step (t=9). What I'm getting is a big array just filled with zeros. I think it's because I am not using the 3D arrays correctly; does anyone have any suggestions? Thank you.
My attempt:
import numpy as np
import matplotlib.pyplot as plt

# Input Parameters
Nx = 1000  # number of grid points in x-direction
Ny = 500   # number of grid points in y-direction
T = 10     # number of time steps
v = 0.8

p = np.zeros((T, Nx, Ny))
P = np.zeros((T, Nx, Ny))

for t in range(0, T-1):
    for i in range(0, Nx-1):
        for j in range(0, Ny-1):
            P[t, i, j] = p[t, i, j] - ((v/2) * (p[t, i+1, j] - p[t, i, j]))
            p[0, i, j] = np.exp(((-1*(i-500))**2)/500)

x = P[9, i]
y = P[9, j]
print(x)
plt.plot(x, y)
plt.xlim([0, 1000])
plt.ylim([0, 500])
plt.xlabel('x-direction')
plt.ylabel('y-direction')
plt.title("Density Distribution After 10 Iterations")
Looks like you only fill the values for t in range(0, T-1), which stops at t = 8, while you are trying to read x = P[9, i]. Those entries never get filled, so of course they are all 0.
Try using range(0, T); it loops over 0, 1, 2, ..., T-1. The same applies to range(0, Nx) and range(0, Ny).
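A quick way to see the off-by-one, as a minimal sketch:
T = 10
print(list(range(0, T - 1))[-1])  # 8  -> P[9] is never written inside the loop
print(list(range(0, T))[-1])      # 9  -> the last time step is reached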

find peaks location in a spectrum numpy

I have a TOF spectrum and I would like to implement an algorithm using python (numpy) that finds all the maxima of the spectrum and returns the corresponding x values.
I have looked up online and I found the algorithm reported below.
The assumption here is that near the maximum the difference between the value before and the value at the maximum is bigger than a number DELTA. The problem is that my spectrum is composed of points equally distributed, even near the maximum, so that DELTA is never exceeded and the function peakdet returns an empty array.
Do you have any idea how to overcome this problem? I would really appreciate comments to better understand the code, since I am quite new to Python.
Thanks!
import sys
from numpy import NaN, Inf, arange, isscalar, asarray, array
def peakdet(v, delta, x=None):
    maxtab = []
    mintab = []
    if x is None:
        x = arange(len(v))
    v = asarray(v)
    if len(v) != len(x):
        sys.exit('Input vectors v and x must have same length')
    if not isscalar(delta):
        sys.exit('Input argument delta must be a scalar')
    if delta <= 0:
        sys.exit('Input argument delta must be positive')
    mn, mx = Inf, -Inf
    mnpos, mxpos = NaN, NaN
    lookformax = True
    for i in arange(len(v)):
        this = v[i]
        if this > mx:
            mx = this
            mxpos = x[i]
        if this < mn:
            mn = this
            mnpos = x[i]
        if lookformax:
            if this < mx - delta:
                maxtab.append((mxpos, mx))
                mn = this
                mnpos = x[i]
                lookformax = False
        else:
            if this > mn + delta:
                mintab.append((mnpos, mn))
                mx = this
                mxpos = x[i]
                lookformax = True
    return array(maxtab), array(mintab)
Below is shown part of the spectrum. I actually have more peaks than those shown here.
This, I think, could work as a starting point. I'm not a signal-processing expert, but I tried this on a generated signal Y that looks quite like yours and on one with much more noise:
from scipy.signal import convolve
import numpy as np
from matplotlib import pyplot as plt
#Obtaining derivative
kernel = [1, 0, -1]
dY = convolve(Y, kernel, 'valid')
#Checking for sign-flipping
S = np.sign(dY)
ddS = convolve(S, kernel, 'valid')
#These candidates are basically all negative slope positions
#Add one since using 'valid' shrinks the arrays
candidates = np.where(dY < 0)[0] + (len(kernel) - 1)
#Here they are filtered on actually being the final such position in a run of
#negative slopes
peaks = sorted(set(candidates).intersection(np.where(ddS == 2)[0] + 1))
plt.plot(Y)
#If you need a simple filter on peak size you could use:
alpha = -0.0025
peaks = np.array(peaks)[Y[peaks] < alpha]
plt.scatter(peaks, Y[peaks], marker='x', color='g', s=40)
The sample outcomes:
For the noisy one, I filtered peaks with alpha:
If the alpha needs more sophistication, you could try dynamically setting alpha from the peaks discovered, using e.g. assumptions about them being a mixed Gaussian (my favourite being the Otsu threshold, available in OpenCV and skimage) or some sort of clustering (k-means could work).
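A rough sketch of the Otsu variant, assuming scikit-image is installed and peaks is the array of candidate indices from above:
from skimage.filters import threshold_otsu

peaks = np.array(peaks)
# derive the cut-off from the detected peak values instead of a fixed alpha
alpha = threshold_otsu(Y[peaks])
peaks = peaks[Y[peaks] < alpha]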
And for reference, this I used to generate the signal:
Y = np.zeros(1000)

def peaker(Y, alpha=0.01, df=2, loc=-0.005, size=-.0015, threshold=0.001, decay=0.5):
    peaking = False
    for i, v in enumerate(Y):
        if not peaking:
            peaking = np.random.random() < alpha
            if peaking:
                Y[i] = loc + size * np.random.chisquare(df=2)
                continue
        elif Y[i - 1] < threshold:
            peaking = False
        if i > 0:
            Y[i] = Y[i - 1] * decay

peaker(Y)
EDIT: Support for degrading base-line
I simulated a slanting base-line by doing this:
Z = np.log2(np.arange(Y.size) + 100) * 0.001
Y = Y + Z[::-1] - Z[-1]
Then to detect with a fixed alpha (note that I changed sign on alpha):
from scipy.signal import medfilt
alpha = 0.0025
Ybase = medfilt(Y, 51) # 51 should be large in comparison to your peak X-axis lengths and an odd number.
peaks = np.array(peaks)[Ybase[peaks] - Y[peaks] > alpha]
Resulting in the following outcome (the base-line is plotted as dashed black line):
EDIT 2: Simplification and a comment
I simplified the code to use one kernel for both convolves as #skymandr commented. This also removed the magic number in adjusting the shrinkage so that any size of the kernel should do.
As for the choice of "valid" as the option to convolve: it would probably have worked just as well with "same", but I chose "valid" so I didn't have to think about the edge conditions and whether the algorithm could detect spurious peaks there.
As of SciPy version 1.1, you can also use find_peaks:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
np.random.seed(0)
Y = np.zeros(1000)
# insert #deinonychusaur's peaker function here
peaker(Y)
# make data noisy
Y = Y + 10e-4 * np.random.randn(len(Y))
# find_peaks gets the maxima, so we multiply our signal by -1
Y *= -1
# get the actual peaks
peaks, _ = find_peaks(Y, height=0.002)
# multiply back for plotting purposes
Y *= -1
plt.plot(Y)
plt.plot(peaks, Y[peaks], "x")
plt.show()
This will plot (note that we use height=0.002 which will only find peaks higher than 0.002):
In addition to height, we can also set the minimal distance between two peaks. If you use distance=100, the plot then looks as follows:
You can use
peaks, _ = find_peaks(Y, height=0.002, distance=100)
in the code above.
After looking at the answers and suggestions I decided to offer a solution I often use because it is straightforward and easier to tweak.
It uses a sliding window and counts how many times a local peak appears as a maximum as the window shifts along the x-axis. As #DrV suggested, no universal definition of "local maximum" exists, meaning that some tuning parameters are unavoidable. This function uses "window size" and "frequency" to fine-tune the outcome. Window size is measured in number of data points of the independent variable (x), and frequency controls how sensitive the peak detection should be (also expressed as a number of data points; lower values of frequency produce more peaks and vice versa). The main function is here:
def peak_finder(x0, y0, window_size, peak_threshold):
    # extend x, y using window size
    y = numpy.concatenate([y0, numpy.repeat(y0[-1], window_size)])
    x = numpy.concatenate([x0, numpy.arange(x0[-1], x0[-1] + window_size)])
    local_max = numpy.zeros(len(x0))
    for ii in range(len(x0)):
        local_max[ii] = x[y[ii:(ii + window_size)].argmax() + ii]
    u, c = numpy.unique(local_max, return_counts=True)
    i_return = numpy.where(c >= peak_threshold)[0]
    return list(zip(u[i_return], c[i_return]))
along with a snippet used to produce the figure shown below:
import numpy
from matplotlib import pyplot

def plot_case(axx, w_f):
    p = peak_finder(numpy.arange(0, len(Y)), -Y, w_f[0], w_f[1])
    r = .9 * min(Y) / 10
    axx.plot(Y)
    for ip in p:
        axx.text(ip[0], r + Y[int(ip[0])], int(ip[0]),
                 rotation=90, horizontalalignment='center')
    yL = pyplot.gca().get_ylim()
    axx.set_ylim([1.15 * min(Y), yL[1]])
    axx.set_xlim([-50, 1100])
    axx.set_title(f'window: {w_f[0]}, count: {w_f[1]}', loc='left', fontsize=10)
    return None

window_frequency = {1: (15, 15), 2: (100, 100), 3: (100, 5)}
f, ax = pyplot.subplots(1, 3, sharey='row', figsize=(9, 4),
                        gridspec_kw={'hspace': 0, 'wspace': 0, 'left': .08,
                                     'right': .99, 'top': .93, 'bottom': .06})
for k, v in window_frequency.items():
    plot_case(ax[k-1], v)
pyplot.show()
Three cases show parameter values that render (from left to right panel):
(1) too many, (2) too few, and (3) an intermediate amount of peaks.
To generate Y data, I used the function #deinonychusaur gave above, and added some noise to it from #Cleb's answer.
I hope some might find this useful, but its efficiency primarily depends on the actual peak shapes and distances.
Finding a minimum or a maximum is not that simple, because there is no universal definition for "local maximum".
Your code seems to look for a maximum and then accept it as a maximum if the signal afterwards falls below the maximum minus some delta value. After that it starts to look for a minimum with similar criteria. It does not really matter whether your data falls or rises slowly, as the maximum is recorded when it is reached and appended to the list of maxima once the level falls below the hysteresis threshold.
This is a possible way to find local minima and maxima, but it has several shortcomings. One of them is that the method is not symmetric, i.e. if the same data is run backwards, the results are not necessarily the same.
Unfortunately, I cannot help much more, because the correct method really depends on the data you are looking at, its shape and its noisiness. If you have some samples, then we might be able to come up with some suggestions.
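One quick way to check the asymmetry mentioned above is to run peakdet on the data and on the reversed data and compare the reported positions; a small sketch, assuming y is the 1-D spectrum and delta is your chosen threshold:
max_fwd, min_fwd = peakdet(y, delta)
max_rev, min_rev = peakdet(y[::-1], delta)
# map the reversed x-positions back to the original coordinates before comparing
if len(max_rev):
    max_rev[:, 0] = len(y) - 1 - max_rev[:, 0]
print(max_fwd)
print(max_rev)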
