I have a spectrum (of an oil sample) as a 2D array in a CSV file, and I want to find the peaks in the 600-1800 cm-1 range. I've tried scipy.signal.find_peaks, but that takes a 1D array, and I have a 2D array with the wavenumbers and the corresponding intensity values.
Any help would be appreciated since I'm very much a beginner at Python.
Edit: I also tried doing the following:
from detecta import detect_peaks
ind = detect_peaks(df)
where df is the name of my array (which has two columns) and an error pops up:
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
scipy.signal.find_peaks() only accepts a one-dimensional array: the signal in which to look for peaks. So you should be able to just select the column in your DataFrame that contains the intensity (peak) values, like so:
import scipy.signal

# note that find_peaks returns an array of peak indices and a dictionary of properties
ind, properties = scipy.signal.find_peaks(df["name of column with peaks"])
Then if you only want the peaks, select the rows using the ind array you just created:
peak_df = df[df.index.isin(ind)]
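Putting it together for the 600-1800 cm-1 range, a minimal sketch could look like this (the file name and column names are only placeholders, adjust them to your CSV):

import pandas as pd
from scipy.signal import find_peaks

df = pd.read_csv("spectrum.csv")  # hypothetical file name

# keep only the 600-1800 cm-1 region
region = df[(df["wavenumber"] >= 600) & (df["wavenumber"] <= 1800)]

# find_peaks works on the 1D intensity column
ind, properties = find_peaks(region["intensity"])

# pick the peak rows by position, since ind indexes into the selected region
peak_df = region.iloc[ind]
print(peak_df)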
I get confused by this example.
A = np.random.random((6, 4, 5))
A
A.min(axis=0)
A.min(axis=1)
A.min(axis=2)
What mins are we really computing here?
I know I can think of this array as a 6x4x5 parallelepiped in 3D space, and I know A.min(axis=0) means we go along the 0-th axis. OK, but as we go along that 0-th axis all we get is 6 "layers", which are basically rectangles of size 4x5 filled with numbers. So what min am I computing when saying A.min(axis=0), for example? I am just trying to visualize it in my head.
From A.min(axis=0) I get back a 4x5 2D matrix. Why? Shouldn't I get just 6 values in a 1D array? I am walking along the 0-th axis, so shouldn't I get 6 values back, one for each of these 4x5 rectangles?
I always find this notation confusing and just don't get it, sorry.
You calculate the min across one particular axis when you are interested in maintaining the structure of the remaining axes.
The gif below may help you understand.
In that example the result has shape (3, 2): you take the smallest value along axis 0, which squeezes that dimension into a single value, so the dimension is no longer needed.
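To make the collapsing concrete for the (6, 4, 5) array from the question, checking the shapes is a quick sanity test (a small sketch, nothing beyond plain NumPy):

import numpy as np

A = np.random.random((6, 4, 5))

# the axis you pass is the one that disappears
print(A.min(axis=0).shape)  # (4, 5): min over the 6 layers for each of the 4x5 positions
print(A.min(axis=1).shape)  # (6, 5)
print(A.min(axis=2).shape)  # (6, 4)

# the same value, computed by hand for one position
print(A.min(axis=0)[2, 3] == min(A[i, 2, 3] for i in range(6)))  # True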
Case description
I have a set of spectral maps (intensity dependent on time and frequency) for a set of detectors, which fits into a 3D array BlockDataset of size M x N x K (here M is the number of frequencies, N the number of time steps and K the number of detectors).
The M frequencies are log-spaced and the K detectors are normally indexed by a tuple consisting of 2 angles, but for brevity I'm considering only one angle. The N time values are equidistant.
Creating a HoloViews dataset from BlockDataset with appropriate value arrays for all of the dimensions is possible, but requires me to switch from a simple hv.Image display to a hv.QuadMesh display.
Problem description
If the dataset is created with actual angle values, instead of just detector numbers, a conversion to a HoloMap fails with the following error:
DataError: The shape of the intensity value array does not match the expected dimensionality indicated by the key dimensions. Expected 2-D array, found 3-D array.
If detector numbers (integers) are used instead of angles (floating point numbers) there's no problem.
Code
timeDim = hv.Dimension("time", label="Time", unit="sec", values=times)
freqDim = hv.Dimension("frequency", label="Angular frequency", unit=r"$\frac{rad}{s}$", values=omega)
polarAngleDim = hv.Dimension("angle", label=r"$\varphi$", unit="rad", values=angles[:,0])
intensityDim = hv.Dimension("intensity", label=r"Intensity $\frac{d^2 W}{d\Omega d\omega}(t,\vartheta,\varphi)$", unit="J/(s srad)")
hvDatasetNatural = hv.Dataset((times, angles[:,0], omega, BlockDataset.transpose()), [timeDim, polarAngleDim, freqDim], intensityDim)
subset = hvDatasetNatural.select( angle=list(angles[selectedIndices,0]) )
img = subset.to( new_type=hv.QuadMesh, kdims=[timeDim, freqDim])
The selection of a subset appears to work properly, but neither the conversion of the subset, nor of the entire dataset to QuadMesh works.
Note again: the times are linearly spaced floats, the angles are nonlinearly spaced floats, and the omega values are log-spaced floats.
Query
What may be the problem here? That is, why doesn't .to() work on the dataset when 2 of the 3 dimensions are non-equidistant, non-integer values, while it works fine when only omega is non-equidistant?
I can construct a QuadMesh for a specific angle using hv.QuadMesh( (...), kdims=[..]), essentially unwrapping the original object by hand.
(An extra question) Why does an aggregation along, e.g., the time dimension using subset.reduce(timeDim, np.sum) work, but subset.reduce(timeDim, np.trapz) fail with:
DataError: None of the available storage backends were able to support the supplied data format. GridInterface raised following error:
GridInterface interface requires at least one value dimension.
We know that for the axis parameter, 0 and 1 mean the column-wise and row-wise maximum element index, but what do 2, 3 and so on indicate? An example code is given here. What is the significance of its output?
When you have an array of higher dimensions you will also have more axes. For example, in a 3-dimensional array (e.g. a cube) you have 3 axes (row, column, depth).
When you pass axis to np.argmax you are telling NumPy along which axis you want the index of the maximum. axis=3 will throw an error because your array only has 3 axes (0, 1, 2).
Here is an article about NumPy array axes.
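A small, made-up example of np.argmax on a 3D array may make the role of axis clearer (the shape is arbitrary, not from the linked code):

import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # 2 layers, 3 rows, 4 columns

print(np.argmax(cube, axis=0).shape)  # (3, 4): for each (row, column), the layer holding the maximum
print(np.argmax(cube, axis=1).shape)  # (2, 4): for each (layer, column), the row holding the maximum
print(np.argmax(cube, axis=2).shape)  # (2, 3): for each (layer, row), the column holding the maximum

# np.argmax(cube, axis=3) raises an AxisError because only axes 0, 1 and 2 exist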
I am trying to do spatial correlations between a rainfall time series and SST (sea surface temperature). My code is as follows:
# 'OND_rainfall_index_list' is a list of 27 (1990-2016) values of spatially averaged rainfall in eastern Africa
# trial_x is a (27, 48, 80) multidimensional array of SST (time, lat, lon) in the Indian Ocean
corr = np.zeros((27, 48, 80))
corr.shape

for m in range(48):
    for n in range(80):
        corr[m, n] = stats.pearsonr(OND_rainfall_index_list, trial_x[:, m, n])[0]
OND_rainfall_index_list is a list of 27 values of spatially averaged rainfall in eastern Africa whereas trial_x is a multidimensional array of SST.
I am trying to initialize a matrix which I can then fill with correlation values. I would like to end up with a matrix of the same size as trial_x but with correlation values instead. How can I go about this? At the moment I get the following error when I run the loop.
> IndexError: index 48 is out of bounds for axis 1 with size 48
Your matrix corr has the shape (27, 48, 80), which is a 3D array.
However, you have the line:
corr[m,n]=stats.pearsonr(OND_rainfall_index_list, trial_x[:,m,n])[0]
What do you expect corr[m, n] to relate to? The 1st and 2nd dimension or the 2nd and 3rd?
I expect the 2nd and 3rd.
So, if the output of this:
stats.pearsonr(OND_rainfall_index_list, trial_x[:,m,n])[0]
has a size of (27,), then use:
corr[:, m,n]=stats.pearsonr(OND_rainfall_index_list, trial_x[:,m,n])[0]
but if it has a size of (1,), then use:
corr = np.zeros((48,80))
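Since stats.pearsonr(...)[0] is a single correlation coefficient per grid cell, the second variant is probably what you want; a sketch using the names from the question:

import numpy as np
from scipy import stats

corr = np.zeros((48, 80))  # one correlation value per (lat, lon) cell
for m in range(48):
    for n in range(80):
        corr[m, n] = stats.pearsonr(OND_rainfall_index_list, trial_x[:, m, n])[0]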
A 3D label map is a matrix in which every pixel (voxel) has an integer label. These values are expected to be contiguous, meaning that a segment with label k will not be fragmented.
Given such a label map (segmentation), what is the fastest way to obtain the coordinates of a minimum bounding box around each segment in Python?
I have tried the following:
Iterate through the matrix using multiindex iterator (from numpy.nditer) and construct a reverse index dictionary. This means that for every label you get the 3 coordinates of every voxel where the label is present.
For every label get the max and min of each coordinate.
The good thing is that you get all the location information in one O(N) pass. The bad thing is that I don't need this detailed information. I just need the extremities, so there might be a faster way to do this, using some numpy functions that are faster than so many list appends. Any suggestions?
The one pass through the matrix takes about 8 seconds on my machine, so it would be great to get rid of it. To give an idea of the data, there are a few hundred labels in a label map. Sizes of the label map can be 700x300x30 or 300x300x200 or something similar.
Edit: Now storing only updated max and min per coordinate for every label. This removes the need to maintain and store all these large lists (append).
If I understood your problem correctly, you have groups of voxels, and you would like to have the extremes of each group along each axis.
Let's define:
arr: 3D array of integer labels
labels: list of labels (integers 0..labmax)
The code:
import numpy as np

# number of the highest label:
labmax = np.max(labels)

# first and last occurrence of each label along each axis
# (b_first starts very high and b_last very low, so the updates below work)
b_first = np.iinfo('int32').max * np.ones((3, labmax + 1), dtype='int32')
b_last = np.iinfo('int32').min * np.ones((3, labmax + 1), dtype='int32')

# run through all three dimensions, taking 2D slices and marking the labels present in each
for dim in range(3):
    # create a generic slice object to make the slices
    sl = [slice(None), slice(None), slice(None)]
    bf = b_first[dim]
    bl = b_last[dim]
    # go through all slices in this dimension
    for k in range(arr.shape[dim]):
        # create the slice object
        sl[dim] = k
        ind = arr[tuple(sl)].flatten()
        # first "seen" slice index: keep the smallest value seen so far
        bf[ind] = np.minimum(bf[ind], k)
        # last "seen" slice index: k only grows, so plain assignment keeps the largest
        bl[ind] = k
After this operation we have six vectors giving the smallest and largest indices for each axis. For example, the bounding values along the second axis of label 13 are b_first[1][13] and b_last[1][13]. If some label is missing, its b_first entries stay at the maximum int32 value and its b_last entries at the minimum int32 value.
I tried this with my computer, and for a (300,300,200) array it takes approximately 1 sec to find the values.
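For completeness, a small usage sketch showing how to cut out the bounding box of one label (say 13) once b_first and b_last have been filled by the loop above:

lab = 13
bbox = tuple(slice(b_first[d, lab], b_last[d, lab] + 1) for d in range(3))
segment = arr[bbox]  # smallest sub-volume containing every voxel with label 13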