I have a vector of values vals, a same-dimension vector of frequencies freqs, and a set of frequency values pins.
I need to find the max values of vals within the corresponding interval around each pin (from pin-1 to pin+1). However, the intervals merge if they overlap (e.g., [1,2] and [0.5,1.5] become [0.5,2]).
I have code that (I think) works, but I feel it is not optimal at all:
import numpy as np
np.random.seed(666)
freqs = np.linspace(0, 20, 50)
vals = np.random.randint(100, size=(len(freqs), 1)).flatten()
print(freqs)
print(vals)
pins = [2, 6, 10, 11, 15, 15.2]
# find one interval for every pin and then sum to find final ones
islands = np.zeros((len(freqs), 1)).flatten()
for pin in pins:
    island = np.zeros((len(freqs), 1)).flatten()
    island[(freqs >= pin-1) * (freqs <= pin+1)] = 1
    islands += island
islands = np.array([1 if x>0 else 0 for x in islands])
print(islands)
maxs = []
k = 0
idxs = []
for i, x in enumerate(islands):
    if (x > 0) and (k == 0):    # island begins
        k += 1
        idxs.append(i)
    elif (x > 0) and (k > 0):   # island continues
        pass
    elif (x == 0) and (k > 0):  # island finishes
        idxs.append(i)
        maxs.append(np.max(vals[idxs[0]:idxs[1]]))
        k = 0
        idxs = []
        continue
print(maxs)
Which gives maxs=[73, 97, 79, 77].
Here are some optimizations for your code. There are many numpy functions that make your life easier; get to know them and use them ;). I tried to comment my code to make it as understandable as possible, but let me know if anything is unclear!
import numpy as np
np.random.seed(666)
freqs = np.linspace(0, 20, 50)
vals = np.random.randint(100, size=(len(freqs), 1)).flatten()
print(freqs)
print(vals)
pins = [2, 6, 10, 11, 15, 15.2]
# find one interval for every pin and then sum to find final ones
islands = np.zeros_like(freqs)  # instead of: np.zeros((len(freqs), 1)).flatten()
for pin in pins:
    island = np.zeros_like(freqs)  # see above comment
    island[(freqs >= pin-1) & (freqs <= pin+1)] = 1  # "&" makes it more readable
    islands += island
# instead of np.array([1 if x>0 else 0 for x in islands])
islands = np.where(islands > 0, 1, 0)  # read as: where "islands > 0" put a '1', else put a '0'
# compare each value with the next to get island/sea transitions (islands are 1's, seas are 0's)
island_edges = islands[:-1] != islands[1:]
# split at the edges (+1 to account for starting at index 1 with the comparisons)
# islands_and_seas is a list of 'seas' and 'islands'
islands_and_seas = np.split(islands, np.where(island_edges)[0]+1)
# do the same as above but on the 'vals' array
islands_and_seas_vals = np.split(vals, np.where(island_edges)[0]+1)
# get the max values for the seas and islands
max_vals = np.array([np.max(arr) for arr in islands_and_seas_vals])
# create an array where the islands -> True, and seas -> False
islands_and_seas_bool = [np.all(arr) for arr in islands_and_seas]
# select only the max values of the islands
maxs = max_vals[islands_and_seas_bool]
print(maxs)
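For what it's worth, here is an even more compact sketch along the same lines (untested against your exact data, so treat it as an illustration): it broadcasts all pins at once and reads the island edges off np.diff, reusing freqs, vals and pins from above.
pins_arr = np.asarray(pins)
islands = np.any((freqs >= pins_arr[:, None] - 1) & (freqs <= pins_arr[:, None] + 1), axis=0)
# pad with False so every island produces a matching start/stop edge
edges = np.flatnonzero(np.diff(np.concatenate(([False], islands, [False])).astype(int)))
starts, stops = edges[0::2], edges[1::2]
maxs = [vals[a:b].max() for a, b in zip(starts, stops)]
print(maxs)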
I have tried to simulate some event onsets and predictors for an experiment. I have two predictors (circles and squares). The stimuli ('events') last 1 second and the ISI (interstimulus interval) is 8 seconds. I am also interested in both contrasts against baseline (circles against baseline; squares against baseline). In the end, I am trying to run the function I have defined (simulate_data_fixed; n=420 is a parameter that is fixed) 1000 times, calculate an efficiency score at each iteration, and store the efficiency scores in a list.
def simulate_data_fixed_ISI(N=420):
    dg_hrf = glover_hrf(tr=1, oversampling=1)
    # Create indices in regularly spaced intervals (9 seconds, i.e. 1 sec stim + 8 ISI)
    stim_onsets = np.arange(10, N - 15, 9)
    stimcodes = np.repeat([1, 2], stim_onsets.size / 2)  # create codes for two conditions
    np.random.shuffle(stimcodes)  # random shuffle
    stim = np.zeros((N, 1))
    c = np.array([[0, 1, 0], [0, 0, 1]])
    # Fill stim array with codes at onsets
    for i, stim_onset in enumerate(stim_onsets):
        stim[stim_onset] = 1 if stimcodes[i] == 1 else 2
    stims_A = (stim == 1).astype(int)
    stims_B = (stim == 2).astype(int)
    reg_A = np.convolve(stims_A.squeeze(), dg_hrf)[:N]
    reg_B = np.convolve(stims_B.squeeze(), dg_hrf)[:N]
    X = np.hstack((np.ones((reg_B.size, 1)), reg_A[:, np.newaxis], reg_B[:, np.newaxis]))
    dvars = [(c[i, :].dot(np.linalg.inv(X.T.dot(X))).dot(c[i, :].T))
             for i in range(c.shape[0])]
    eff = c.shape[0] / np.sum(dvars)
    return eff
However, I want to run this entire chunk 1000 times and store the 'eff' values in an array, etc., so that later on I can display them as a histogram. How can I do this?
If I understand you correctly, you should be able to just run
EFF = [simulate_data_fixed_ISI() for i in range(1000)] #1000 repeats
As @theonlygusti clarified, this line runs your function simulate_data_fixed_ISI() 1000 times and puts each return value in the list EFF.
Test
import numpy as np
def simulate_data_fixed_ISI(n=1):
    """
    Returns 'n' random numbers
    """
    return np.random.rand(n)
EFF = [simulate_data_fixed_ISI() for i in range(5)]
EFF
#[array([0.19585137]),
# array([0.91692933]),
# array([0.49294667]),
# array([0.79751017]),
# array([0.58294512])]
Your question seems to boil down to:
I am trying to run the function that I have defined 1000 times; at each iteration I would like to calculate an efficiency score and store the efficiency scores in a list
I guess "the function that I have defined" is the simulate_data_fixed_ISI in your question?
Then you can simply run it 1000 times using a basic for loop, and add the results into a list:
def simulate_data_fixed_ISI(N=420):
    dg_hrf = glover_hrf(tr=1, oversampling=1)
    # Create indices in regularly spaced intervals (9 seconds, i.e. 1 sec stim + 8 ISI)
    stim_onsets = np.arange(10, N - 15, 9)
    stimcodes = np.repeat([1, 2], stim_onsets.size / 2)  # create codes for two conditions
    np.random.shuffle(stimcodes)  # random shuffle
    stim = np.zeros((N, 1))
    c = np.array([[0, 1, 0], [0, 0, 1]])
    # Fill stim array with codes at onsets
    for i, stim_onset in enumerate(stim_onsets):
        stim[stim_onset] = 1 if stimcodes[i] == 1 else 2
    stims_A = (stim == 1).astype(int)
    stims_B = (stim == 2).astype(int)
    reg_A = np.convolve(stims_A.squeeze(), dg_hrf)[:N]
    reg_B = np.convolve(stims_B.squeeze(), dg_hrf)[:N]
    X = np.hstack((np.ones((reg_B.size, 1)), reg_A[:, np.newaxis], reg_B[:, np.newaxis]))
    dvars = [(c[i, :].dot(np.linalg.inv(X.T.dot(X))).dot(c[i, :].T))
             for i in range(c.shape[0])]
    eff = c.shape[0] / np.sum(dvars)
    return eff
eff_results = []
for _ in range(1000):
    efficiency_score = simulate_data_fixed_ISI()
    eff_results.append(efficiency_score)
Now eff_results contains 1000 entries, each of which is the result of a call to your function simulate_data_fixed_ISI.
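And since you mentioned displaying the scores as a histogram, a minimal sketch with matplotlib (assuming eff_results from the loop above) could look like this:
import numpy as np
import matplotlib.pyplot as plt

eff_array = np.asarray(eff_results)  # the 1000 efficiency scores collected above
plt.hist(eff_array, bins=30)
plt.xlabel('efficiency score')
plt.ylabel('count')
plt.show()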
I have a huge text file which contains the position (x,y,z) and velocity components (vx,vy,vz) of a million stars. After doing some rotations and projections, I obtain new position and velocity components (x',y',z',vx',vy',vz') of the stars.
My final step is to compute the velocity along the line of sight; essentially I have to "average" the vz component, and to do this I try to create a FITS file in which every pixel contains the mean value of the vz component.
Here is part of my code:
mod = np.genfromtxt('data_bar_region.txt')
x = list(mod[:,0])
y = list(mod[:,1])
vz = mod[:,5]
x_rang_1 = np.arange(-40, 41, 1)
y_rang_1 = np.arange(-40, 41, 1)
fake_data_1 = np.array((len(x_rang_1),len(x_rang_1)))
for i in range(len(x_rang_1)-1):
    for j in range(len(y_rang_1)-1):
        vel_tmp = []
        for index in range(len(x)):
            if x_rang_1[i] <= x[index] <= x_rang_1[i+1]:
                if y_rang_1[j] <= y[index] <= y_rang_1[j+1]:
                    vel_tmp.append(vz[index])
        fake_data_1[j, i] = np.mean(vel_tmp)
hdu1 = fits.PrimaryHDU(fake_data_1)
hdu1.writeto('TEST.fits')
This code is much too slow (it took about 8 hours on my laptop) and I don't know how to speed it up.
Do you have any suggestions or other ways to compute the v_LOS in a better and faster way?
EDIT: Before performing the "averaging", I have to divide the image into portions of various shapes and sizes (such portions are called "bins").
Here is an image of the bins (the right panel shows the same image of the bins, zoomed in to better show what the bins are).
So, I have another FITS file (called bins.fits) with the same dimensions as fake_data_1, and I just want to find the correspondence between these 2 files, because I want to calculate the mean and the std of the distribution of stars in the several bins.
Alternatively, I have a text file which contains the info on which pixel belongs to a specific bin, for example:
x y bin
1 1 34
1 2 34
1 3 34
. . .
34 56 37
34 57 37
34 58 37
and so on. The bins.fits file has size (564, 585), and so does fake_data_1, after changing the start and stop of the x and y ranges appropriately. I have attached the whole script:
mod = np.genfromtxt('data_new_bar_scaled.txt')
# to match the correct position and size of the observation,
# I have to multiply by a factor equal to the semi-size
x = mod[:, 0]*(585-1)/200
y = mod[:, 1]*(564-1)/200
vz = mod[:,5]
A = fits.open('NGC4277_TESIkinematic.fits')
bins = A[7].data.T
start_x = -(585-1)/2
stop_x = (585-1)/2
step_x = step # step in x_rang_1
x_rang = np.arange(start_x, stop_x + step_x, step_x)
start_y = -(564-1)/2
stop_y = (564-1)/2
step_y = step # step in y_rang_1
y_rang = np.arange(start_y, stop_y + step_y, step_y)
fake_data_1 = np.empty((len(x_rang), len(y_rang)))
fake_data_1[:] = np.NaN # initialize with NaN
print(fake_data_1.shape)
print(bins.shape)
d = {}
for i in range(len(x)):
    index_for_x = math.floor((x[i] - start_x) / step_x)
    index_for_y = math.floor((y[i] - start_y) / step_y)
    if 0 <= index_for_x < len(x_rang) and 0 <= index_for_y < len(y_rang):
        key = (x_rang[index_for_x], y_rang[index_for_y])
        if key in d:
            d[key].append(vz[i])
        else:
            d[key] = [vz[i]]
bb = np.unique(bins)
print(len(bb))
for i, x in enumerate(x_rang):
    for j, y in enumerate(y_rang):
        key = (x, y)
        for z in range(len(bb)):
            j, k = np.where(bb[z] == bins)
            print('index :', z)
            if key in d:
                fake_data_1[j, k] = np.mean(d[key])
Your code is so slow because the nested loops iterate over a million stars 1600 (80*80) times. You can improve the performance by using a dictionary and iterating over the million stars just once.
You can try the following code, which is about 1600 times faster:
import numpy as np
import math
mod = np.genfromtxt('data_bar_region.txt')
x = list(mod[:, 0])
y = list(mod[:, 1])
vz = mod[:, 5]
x_rang_1 = np.arange(-40, 41, 1)
y_rang_1 = np.arange(-40, 41, 1)
fake_data_1 = np.empty((len(x_rang_1), len(y_rang_1)))
fake_data_1[:] = np.NaN # initialize with NaN
d = {}
for i in range(len(x)):
    key = (math.floor(x[i]), math.floor(y[i]))
    if key in d:
        d[key].append(vz[i])
    else:
        d[key] = [vz[i]]
for i, x in enumerate(x_rang_1):
    for j, y in enumerate(y_rang_1):
        key = (x, y)
        if key in d:
            fake_data_1[i, j] = np.mean(d[key])
hdu1 = fits.PrimaryHDU(fake_data_1)
hdu1.writeto('TEST.fits')
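As a side note (a sketch, not tested on your data), scipy can do the whole binning-and-averaging step in one call with scipy.stats.binned_statistic_2d, which returns the mean vz per pixel directly; empty pixels come out as NaN:
import numpy as np
from scipy import stats
from astropy.io import fits  # assuming astropy's fits, as in your script

mod = np.genfromtxt('data_bar_region.txt')
x, y, vz = mod[:, 0], mod[:, 1], mod[:, 5]

edges = np.arange(-40, 41, 1)  # 80 pixels covering [-40, 40] in each coordinate
stat, x_edges, y_edges, _ = stats.binned_statistic_2d(x, y, vz, statistic='mean',
                                                      bins=[edges, edges])

hdu1 = fits.PrimaryHDU(stat)  # check whether you need stat or stat.T for your axis convention
hdu1.writeto('TEST.fits')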
UPDATE
For a generalized version for step in x_rang_1 (or y_rang_1), you can try the following code:
import numpy as np
import math
mod = np.genfromtxt('data_bar_region.txt')
x = list(mod[:, 0])
y = list(mod[:, 1])
vz = mod[:, 5]
start_x_rang_1 = -40
stop_x_rang_1 = 40
step_x_rang_1 = 0.5 # step in x_rang_1
x_rang_1 = np.arange(start_x_rang_1, stop_x_rang_1 + step_x_rang_1, step_x_rang_1)
start_y_rang_1 = -40
stop_y_rang_1 = 40
step_y_rang_1 = 1 # step in y_rang_1
y_rang_1 = np.arange(start_y_rang_1, stop_y_rang_1 + step_y_rang_1, step_y_rang_1)
fake_data_1 = np.empty((len(x_rang_1), len(y_rang_1)))
fake_data_1[:] = np.NaN # initialize with NaN
d = {}
for i in range(len(x)):
    index_for_x_rang_1 = math.floor((x[i] - start_x_rang_1) / step_x_rang_1)
    index_for_y_rang_1 = math.floor((y[i] - start_y_rang_1) / step_y_rang_1)
    if 0 <= index_for_x_rang_1 < len(x_rang_1) and 0 <= index_for_y_rang_1 < len(y_rang_1):
        key = (x_rang_1[index_for_x_rang_1], y_rang_1[index_for_y_rang_1])
        if key in d:
            d[key].append(vz[i])
        else:
            d[key] = [vz[i]]
for i, x in enumerate(x_rang_1):
    for j, y in enumerate(y_rang_1):
        key = (x, y)
        if key in d:
            fake_data_1[i, j] = np.mean(d[key])
hdu1 = fits.PrimaryHDU(fake_data_1)
hdu1.writeto('TEST.fits')
UPDATE 2
Maybe like the following?
Suppose the inputs are
x y vz
0 0.1 10
1.8 0 4
1.2 1.9 5.2
bins = np.array(
    [[34, 35, 34, 34, 36],
     [37, 36, 34, 35, 36],
     [34, 35, 37, 36, 34]])  # shape: (3, 5)
Do you want something like the following code?
import numpy as np
import math
x = np.array([0, 1.8, 1.2, ])
y = np.array([0.1, 0, 1.9, ])
vz = np.array([10, 4, 5.2])
start_x_rang_1 = 0
stop_x_rang_1 = 2
step_x_rang_1 = 1 # step in x_rang_1
x_rang_1 = np.arange(start_x_rang_1, stop_x_rang_1 + step_x_rang_1, step_x_rang_1)
start_y_rang_1 = 0
stop_y_rang_1 = 0.5
step_y_rang_1 = 2 # step in y_rang_1
y_rang_1 = np.arange(start_y_rang_1, stop_y_rang_1 + step_y_rang_1, step_y_rang_1)
fake_data_1 = np.empty((len(x_rang_1), len(y_rang_1)))  # shape: (3, 2)
fake_data_1[:] = np.NaN # initialize with NaN
bins = np.array(
    [[34, 35, 34, 34, 36],
     [37, 36, 34, 35, 36],
     [34, 35, 37, 36, 34]])  # shape: (3, 5)
d_bins = {}
for i in range(len(x)):
    index_for_x_rang_1 = math.floor((x[i] - start_x_rang_1) / step_x_rang_1)
    index_for_y_rang_1 = math.floor((y[i] - start_y_rang_1) / step_y_rang_1)
    if 0 <= index_for_x_rang_1 < len(x_rang_1) and 0 <= index_for_y_rang_1 < len(y_rang_1):
        key = bins[index_for_x_rang_1, index_for_y_rang_1]
        if key in d_bins:
            d_bins[key].append(vz[i])
        else:
            d_bins[key] = [vz[i]]
d_bins_mean = {}
for bin in d_bins:
    d_bins_mean[bin] = np.mean(d_bins[bin])
get_corresponding_mean = np.vectorize(lambda x: d_bins_mean.get(x, np.NaN))
result = get_corresponding_mean(bins)
print(result)
which prints
[[10. nan 10. 10. nan]
[ 4.6 nan 10. nan nan]
[10. nan 4.6 nan 10. ]]
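If bins really just holds an integer label per pixel, you can also get all bin means at once with np.bincount instead of a Python dictionary. A rough sketch (px and py are hypothetical names for the per-star pixel indices, computed exactly like index_for_x_rang_1 and index_for_y_rang_1 above):
import numpy as np

# hypothetical: px, py are integer arrays with the pixel indices of each star
labels = bins[px, py]                                   # bin label of the pixel each star falls in
sums = np.bincount(labels, weights=vz, minlength=bins.max() + 1)
counts = np.bincount(labels, minlength=bins.max() + 1)
with np.errstate(invalid='ignore'):
    bin_means = sums / counts                           # NaN where a bin received no stars
result = bin_means[bins]                                # paint the per-bin mean back onto the image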
I'm trying to apply the findpeaks method offered by Matlab in a Python project in order to achieve the same results.
On the Internet, I found many algorithms to find peaks in Python, but the best source I found is the following one -> https://github.com/MonsieurV/py-findpeaks
However, this didn't solve my problem.
In Matlab, I have this line of code:
[pks, locs] = findpeaks(a, 'MINPEAKDISTANCE', 72)
Hence, I initially tried the method offered by peakutils.indexes, in the following way:
locs = peakutils.indexes(y=a, thres=0, min_dist=72)
pks = []
for val in locs:
    pks.append(a[val])
I am not really sure about thres=0, but in Matlab the default value of the threshold is 0, even though it seems to be intended in a different way than in peakutils.indexes.
The problem is that in the Matlab case I got 6635 peaks while with peakutils.indexes I got 6630 peaks (I am working on signal 108 from the MIT-BIH Arrhythmia Database offered by PhysioNet). Moreover, some of the peaks are not equal; that is, in Matlab a peak may be located at 155 while in Python it is located at 158, and even though this is a small difference, it causes problems in my algorithm.
I am actually working on this version of the pan and tompkins algorithm for ecg signal analysis-> https://it.mathworks.com/matlabcentral/fileexchange/45840-complete-pan-tompkins-implementation-ecg-qrs-detector
Some time ago I was facing the same problem and I found this function, which worked just fine. It's a Matlab equivalent; try it out and let us know if it works for you. The code is not mine.
# %load ./../functions/detect_peaks.py
"""Detect peaks in data based on their amplitude and other features."""
from __future__ import division, print_function
import numpy as np
__author__ = "Marcos Duarte, https://github.com/demotu/BMC"
__version__ = "1.0.4"
__license__ = "MIT"
def detect_peaks(x, mph=None, mpd=1, threshold=0, edge='rising',
                 kpsh=False, valley=False, show=False, ax=None):
    """Detect peaks in data based on their amplitude and other features.

    Parameters
    ----------
    x : 1D array_like
        data.
    mph : {None, number}, optional (default = None)
        detect peaks that are greater than minimum peak height.
    mpd : positive integer, optional (default = 1)
        detect peaks that are at least separated by minimum peak distance (in
        number of data).
    threshold : positive number, optional (default = 0)
        detect peaks (valleys) that are greater (smaller) than `threshold`
        in relation to their immediate neighbors.
    edge : {None, 'rising', 'falling', 'both'}, optional (default = 'rising')
        for a flat peak, keep only the rising edge ('rising'), only the
        falling edge ('falling'), both edges ('both'), or don't detect a
        flat peak (None).
    kpsh : bool, optional (default = False)
        keep peaks with same height even if they are closer than `mpd`.
    valley : bool, optional (default = False)
        if True (1), detect valleys (local minima) instead of peaks.
    show : bool, optional (default = False)
        if True (1), plot data in matplotlib figure.
    ax : a matplotlib.axes.Axes instance, optional (default = None).

    Returns
    -------
    ind : 1D array_like
        indices of the peaks in `x`.

    Notes
    -----
    The detection of valleys instead of peaks is performed internally by simply
    negating the data: `ind_valleys = detect_peaks(-x)`
    The function can handle NaN's
    See this IPython Notebook [1]_.

    References
    ----------
    .. [1] http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb

    Examples
    --------
    >>> from detect_peaks import detect_peaks
    >>> x = np.random.randn(100)
    >>> x[60:81] = np.nan
    >>> # detect all peaks and plot data
    >>> ind = detect_peaks(x, show=True)
    >>> print(ind)
    >>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
    >>> # set minimum peak height = 0 and minimum peak distance = 20
    >>> detect_peaks(x, mph=0, mpd=20, show=True)
    >>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
    >>> # set minimum peak distance = 2
    >>> detect_peaks(x, mpd=2, show=True)
    >>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
    >>> # detection of valleys instead of peaks
    >>> detect_peaks(x, mph=0, mpd=20, valley=True, show=True)
    >>> x = [0, 1, 1, 0, 1, 1, 0]
    >>> # detect both edges
    >>> detect_peaks(x, edge='both', show=True)
    >>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
    >>> # set threshold = 2
    >>> detect_peaks(x, threshold = 2, show=True)
    """
    x = np.atleast_1d(x).astype('float64')
    if x.size < 3:
        return np.array([], dtype=int)
    if valley:
        x = -x
    # find indices of all peaks
    dx = x[1:] - x[:-1]
    # handle NaN's
    indnan = np.where(np.isnan(x))[0]
    if indnan.size:
        x[indnan] = np.inf
        dx[np.where(np.isnan(dx))[0]] = np.inf
    ine, ire, ife = np.array([[], [], []], dtype=int)
    if not edge:
        ine = np.where((np.hstack((dx, 0)) < 0) & (np.hstack((0, dx)) > 0))[0]
    else:
        if edge.lower() in ['rising', 'both']:
            ire = np.where((np.hstack((dx, 0)) <= 0) & (np.hstack((0, dx)) > 0))[0]
        if edge.lower() in ['falling', 'both']:
            ife = np.where((np.hstack((dx, 0)) < 0) & (np.hstack((0, dx)) >= 0))[0]
    ind = np.unique(np.hstack((ine, ire, ife)))
    # handle NaN's
    if ind.size and indnan.size:
        # NaN's and values close to NaN's cannot be peaks
        ind = ind[np.in1d(ind, np.unique(np.hstack((indnan, indnan-1, indnan+1))), invert=True)]
    # first and last values of x cannot be peaks
    if ind.size and ind[0] == 0:
        ind = ind[1:]
    if ind.size and ind[-1] == x.size-1:
        ind = ind[:-1]
    # remove peaks < minimum peak height
    if ind.size and mph is not None:
        ind = ind[x[ind] >= mph]
    # remove peaks - neighbors < threshold
    if ind.size and threshold > 0:
        dx = np.min(np.vstack([x[ind]-x[ind-1], x[ind]-x[ind+1]]), axis=0)
        ind = np.delete(ind, np.where(dx < threshold)[0])
    # detect small peaks closer than minimum peak distance
    if ind.size and mpd > 1:
        ind = ind[np.argsort(x[ind])][::-1]  # sort ind by peak height
        idel = np.zeros(ind.size, dtype=bool)
        for i in range(ind.size):
            if not idel[i]:
                # keep peaks with the same height if kpsh is True
                idel = idel | (ind >= ind[i] - mpd) & (ind <= ind[i] + mpd) \
                    & (x[ind[i]] > x[ind] if kpsh else True)
                idel[i] = 0  # Keep current peak
        # remove the small peaks and sort back the indices by their occurrence
        ind = np.sort(ind[~idel])
    if show:
        if indnan.size:
            x[indnan] = np.nan
        if valley:
            x = -x
        _plot(x, mph, mpd, threshold, edge, valley, ax, ind)
    return ind
def _plot(x, mph, mpd, threshold, edge, valley, ax, ind):
    """Plot results of the detect_peaks function, see its help."""
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        print('matplotlib is not available.')
    else:
        if ax is None:
            _, ax = plt.subplots(1, 1, figsize=(8, 4))
        ax.plot(x, 'b', lw=1)
        if ind.size:
            label = 'valley' if valley else 'peak'
            label = label + 's' if ind.size > 1 else label
            ax.plot(ind, x[ind], '+', mfc=None, mec='r', mew=2, ms=8,
                    label='%d %s' % (ind.size, label))
            ax.legend(loc='best', framealpha=.5, numpoints=1)
        ax.set_xlim(-.02*x.size, x.size*1.02-1)
        ymin, ymax = x[np.isfinite(x)].min(), x[np.isfinite(x)].max()
        yrange = ymax - ymin if ymax > ymin else 1
        ax.set_ylim(ymin - 0.1*yrange, ymax + 0.1*yrange)
        ax.set_xlabel('Data #', fontsize=14)
        ax.set_ylabel('Amplitude', fontsize=14)
        mode = 'Valley detection' if valley else 'Peak detection'
        ax.set_title("%s (mph=%s, mpd=%d, threshold=%s, edge='%s')"
                     % (mode, str(mph), mpd, str(threshold), edge))
        # plt.grid()
        plt.show()
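To mirror your Matlab call findpeaks(a, 'MINPEAKDISTANCE', 72), the usage would be roughly the following (a sketch, assuming a is your 1-D signal as a numpy array):
import numpy as np

# a = ...  # your 1-D ECG signal as a numpy array
locs = detect_peaks(a, mpd=72)  # mpd plays the role of MINPEAKDISTANCE
pks = a[locs]                   # peak amplitudes at the detected indices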
Just pass your data without the for loop. It should find all the peaks. The following should work:
peaks = peakutils.indexes(data, thres=10/max(data), min_dist=20)
where data is an array of float64. Maybe try to play with the threshold. You should also make sure min_dist is smaller than the distance between peaks.
Good luck.
I have the following code, where points is a many-rows-by-3-columns list of lists, coorRadius is a radius within which I want to find the local coordinate maxima, and localCoordinateMaxima is an array where I store the i's of these maxima:
for i, x in enumerate(points):
    check = 1
    for j, y in enumerate(points):
        if linalg.norm(x-y) <= coorRadius and x[2] < y[2]:
            check = 0
    if check == 1:
        localCoordinateMaxima.append(i)
print localCoordinateMaxima
Unfortunately, this takes forever when I have several thousand points, and I am looking for a way to speed it up. I tried to do it with an if all() condition; however, I didn't manage it, and I am not even sure it would be more efficient. Could you guys propose a way to make it faster?
Best!
Your search for neighbors is best done using a KDTree.
from scipy.spatial import cKDTree
tree = cKDTree(points)
pairs = tree.query_pairs(coorRadius)
Now pairs is a set of two-item tuples (i, j), where i < j and points[i] and points[j] are within coorRadius of each other. You can now simply iterate over these, which will likely be a much smaller set than the len(points)**2 pairs you are currently iterating over:
is_maximum = [True] * len(points)
for i, j in pairs:
    if points[i][2] < points[j][2]:
        is_maximum[i] = False
    elif points[j][2] < points[i][2]:
        is_maximum[j] = False
localCoordinateMaxima, = np.nonzero(is_maximum)
This can be further sped up by vectorizing it:
pairs = np.array(list(pairs))
pairs = np.vstack((pairs, pairs[:, ::-1]))
pairs = pairs[np.argsort(pairs[:, 0])]
is_z_smaller = points[pairs[:, 0], 2] < points[pairs[:, 1], 2]
bins, = np.nonzero(pairs[:-1, 0] != pairs[1:, 0])
bins = np.concatenate(([0], bins+1))
is_maximum = np.logical_and.reduceat(is_z_smaller, bins)
localCoordinateMaxima, = np.nonzero(is_maximum)
The above code assumes that every point has at least one neighbor within coorRadius. If that is not the case, you need to slightly complicate things:
pairs = np.array(list(pairs))
pairs = np.vstack((pairs, pairs[:, ::-1]))
pairs = pairs[np.argsort(pairs[:, 0])]
is_z_smaller = points[pairs[:, 0], 2] < points[pairs[:, 1], 2]
bins, = np.nonzero(pairs[:-1, 0] != pairs[1:, 0])
has_neighbors = pairs[np.concatenate(([True], bins)), 0]
bins = np.concatenate(([0], bins+1))
is_maximum = np.ones((len(points),), bool)
is_maximum[has_neighbors] &= np.logical_and.reduceat(is_z_smaller, bins)
localCoordinateMaxima, = np.nonzero(is_maximum)
Here is the version of your code just tightened-up a bit:
for i, x in enumerate(points):
    x2 = x[2]
    for y in points:
        if linalg.norm(x-y) <= coorRadius and x2 < y[2]:
            break
    else:
        localCoordinateMaxima.append(i)
print localCoordinateMaxima
Changes:
Factor-out the x[2] lookup.
The j variable was unused.
Add a break for an early-out
Use a for-else construct instead of a flag variable
With numpy this is not too hard. You can do it with a single (long) expression, if you want:
import numpy as np
points = np.array(points)
localCoordinateMaxima, = np.where(np.all((np.linalg.norm(points[:, None] - points, axis=-1) >
                                          coorRadius) |
                                         (points[:, None, 2] >= points[:, 2]),
                                         axis=-1))
The algorithm your current code implements is essentially where(not(any(w <= x and y < z))). If you distribute the not through the logical operations inside it (using De Morgan's laws), you can avoid one level of nesting by flipping the inequalities, getting where(all(w > x or y >= z)).
w is a matrix of norms applied to the differences of the points broadcast together. x is a constant. y and z are both arrays with the third coordinates of the points, shaped so that they broadcast together into the same shape as w.
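To make those shapes concrete, here is a small sketch with illustrative names (w, y, z as described above, on random toy data):
import numpy as np

points = np.random.rand(6, 3)
coorRadius = 0.5
w = np.linalg.norm(points[:, None] - points, axis=-1)  # (6, 6) pairwise distances
y = points[:, None, 2]                                 # (6, 1): the outer point's z
z = points[:, 2]                                       # (6,):   the inner point's z, broadcasts to (6, 6)
mask = (w > coorRadius) | (y >= z)                     # the "all(w > x or y >= z)" matrix
localCoordinateMaxima, = np.nonzero(np.all(mask, axis=-1))
print(localCoordinateMaxima)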
I have some code.
It takes in a value N and does a quantum walk for that many steps and gives an array that shows the probability at each position.
It's quite a complex calculation and N must be a single integer.
What I want to do is repeat this calculation for 100 values of N and build a large 2D array.
Any idea how I would do this?
Here's my code:
from numpy import (array, outer, sqrt, roll, eye, kron,
                   zeros, empty, arange, linalg)  # imports assumed for this snippet
import numpy as np

N = 100  # number of random steps
P = 2*N+1 # number of positions
#defining a quantum coin
coin0 = array([1, 0]) # |0>
coin1 = array([0, 1]) # |1>
#defining the coin operator
C00 = outer(coin0, coin0) # |0><0|
C01 = outer(coin0, coin1) # |0><1|
C10 = outer(coin1, coin0) # |1><0|
C11 = outer(coin1, coin1) # |1><1|
C_hat = (C00 + C01 + C10 - C11)/sqrt(2.)
#step operator
ShiftPlus = roll(eye(P), 1, axis=0)
ShiftMinus = roll(eye(P), -1, axis=0)
S_hat = kron(ShiftPlus, C00) + kron(ShiftMinus, C11)
#walk operator
U = S_hat.dot(kron(eye(P), C_hat))
#defining the initial state
posn0 = zeros(P)
posn0[N] = 1 # array indexing starts from 0, so index N is the central posn
psi0 = kron(posn0,(coin0+coin1*1j)/sqrt(2.))
#the state after N steps
psiN = linalg.matrix_power(U, N).dot(psi0)
# finding the probability operator
prob = empty(P)
for k in range(P):
    posn = zeros(P)
    posn[k] = 1
    M_hat_k = kron(outer(posn, posn), eye(2))
    proj = M_hat_k.dot(psiN)
    prob[k] = proj.dot(proj.conjugate()).real
prob[prob==0] = np.nan
nanmask = np.isfinite(prob)
prob_masked=prob[nanmask] #this is the final probability to be plotted
P_masked=arange(P)[nanmask] #these are the possible positions
Rather than writing out the array I get (as it is 100 units long), here is a graph of the position and probability at N = 100.
I eventually want to make a 3D plot of position against N against probability.
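One possible way to get the 2D array you describe (a sketch, not a drop-in solution): wrap the code above in a hypothetical function quantum_walk(N) that returns the length-(2*N+1) prob array, then pad each walk into a common-width grid so the rows can be stacked; the third axis for the 3D plot is then just the position index.
import numpy as np

def build_probability_grid(n_values):
    """Stack quantum_walk(N) results for several N into one 2D array (rows padded with NaN)."""
    P_max = 2 * max(n_values) + 1
    grid = np.full((len(n_values), P_max), np.nan)
    for row, n in enumerate(n_values):
        prob = quantum_walk(n)               # hypothetical wrapper around the code above
        offset = (P_max - prob.size) // 2    # centre each walk on the widest grid
        grid[row, offset:offset + prob.size] = prob
    return grid

# e.g. one row per N from 1 to 100:
# grid = build_probability_grid(range(1, 101))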