Is there a numpy alternative to this for loop problem? - python

I have 3 arrays of the same length:
import numpy as np
weights = np.array([10, 14, 18, 22, 26, 30, 32, 34, 36, 38, 40])
resistances = np.array([15, 16.5, 18, 19.5, 21, 24, 27, 30, 33, 36, 39])
depths = np.array([0,1,2,3,4,5,6,7,8,9,10])
I want to take each item in weights, find the nearest match in resistances that is >= that item, and then use the index of this nearest match to return the corresponding value from depths, i.e. depths[index].
BUT, with the additional condition that if nothing in resistances is >= the weight, just return the last value in depths. I then want to populate a list with the results.
Is there a better way than the for loop approach below? I would like to avoid the loop.
SWP = []
for w in weights:
    if len(depths[w <= resistances]) == 0:
        swp = depths[-1]
    else:
        swp = np.min(depths[w <= resistances])
    SWP.append(swp)
SWP

You can .clip the indices that np.searchsorted produces at len(resistances) - 1:
depths[
    np.searchsorted(resistances, weights).clip(max=len(resistances) - 1)
]
So any index larger than the last one will become the last one.
Alternative idea (note that np.searchsorted assumes resistances is sorted in either variant): clip the weights at the maximum of resistances:
depths[
    np.searchsorted(resistances, weights.clip(max=resistances.max()))
]
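For the sample arrays above, a quick sanity check (a minimal sketch) confirms the vectorized version matches the loop's output:
result = depths[np.searchsorted(resistances, weights).clip(max=len(resistances) - 1)]
print(result.tolist())  # [0, 0, 2, 5, 6, 7, 8, 9, 9, 10, 10], same as SWP from the loop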

Usually to do what you're talking about you want to create a function that can be mapped over a list.
import numpy as np
weights = np.array([10, 14, 18, 22, 26, 30, 32, 34, 36, 38, 40])
resistances = np.array([15, 16.5, 18, 19.5, 21, 24, 27, 30, 33, 36, 39])
depth = np.array([0,1,2,3,4,5,6,7,8,9,10])
def evaluate_weight(w):
    # depths where the resistance is at least w; fall back to the last depth
    matches = depth[resistances >= w]
    return np.min(matches) if len(matches) else depth[-1]
SWP = list(map(evaluate_weight, weights))

Related

Numpy: given a set of ranges, is there an efficient way to find the set of ranges that are disjoint with all other ranges?

Is there an elegant way to find the set of disjoint ranges from a set of ranges in numpy?
ranges = [[0, 3], [2, 4], [5, 10]]  # there are about 50 000 elements
disjoint_ranges = []  # these are all disjoint
adjoint_ranges = []   # these do not all have to be mutually adjoint
for index, range_1 in enumerate(ranges):
    i, j = range_1  # all ranges are ordered s.t. i < j
    for range_2 in ranges[index + 1:]:  # the list of ranges is ordered by increasing i
        a, b = range_2
        if i < a < j:
            adjoint_ranges.append(range_1)
            adjoint_ranges.append(range_2)
        else:
            if range_1 not in adjoint_ranges:
                disjoint_ranges.append(range_1)
print(adjoint_ranges)
print(disjoint_ranges)
Looping on a numpy array kinda defeats the purpose of using numpy. You can detect disjoint ranges by leveraging np.maximum.accumulate.
With your ranges sorted in order of their lower bound, you can accumulate the maximum of the upper bounds to determine the coverage of previous ranges over subsequent ones. Then compare the lower bound of each range to the reach of the previous ones to know if there is a forward overlap. Then you only need to compare the upper bound of each range with the next one's lower bound to detect backward overlaps. The combination of forward and backward overlaps will allow you to flag all overlapping ranges and, by elimination, find the ones that are completely disjoint from others:
import numpy as np
ranges = np.array([[1, 8], [10, 15], [2, 5], [18, 24], [7, 10]])
ranges = ranges[ranges[:, 0].argsort()]  # sort rows by lower bound, keeping each pair intact
overlaps = np.zeros(ranges.shape[0], dtype=bool)
overlaps[1:] = ranges[1:, 0] < np.maximum.accumulate(ranges[:-1, 1])  # forward overlaps
overlaps[:-1] |= ranges[1:, 0] < ranges[:-1, 1]  # backward overlaps
disjoints = ranges[~overlaps]
print(disjoints)
[[10 15]
 [18 24]]
I'm not sure about numpy, but here is an approach with pandas:
from functools import reduce
import pandas as pd
ranges = [
    pd.RangeIndex(10, 20),
    pd.RangeIndex(15, 25),
    pd.RangeIndex(30, 50),
    pd.RangeIndex(40, 60),
]
disjoints = reduce(lambda x, y: x.symmetric_difference(y), ranges)
disjoints
Int64Index([10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 35, 36,
37, 38, 39, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
dtype='int64')
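Note that this yields the individual integer positions rather than [start, stop) pairs. If you need ranges back, one way (a small sketch, assuming runs of consecutive integers should be folded together) is to split on the gaps:
import numpy as np
idx = np.asarray(disjoints)
breaks = np.where(np.diff(idx) != 1)[0] + 1  # positions where a run of consecutive ints ends
runs = np.split(idx, breaks)
print([[r[0], r[-1] + 1] for r in runs])  # [[10, 15], [20, 25], [30, 40], [50, 60]]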

How to detect peak values using Python SciPy, getting index error "arrays used as indices must be of integer (or boolean) type"

I have speed data in which I need to detect the peaks where the value is greater than 20, searching from the points where the speed is 0. I used this code for peak detection but I am getting an index error:
import numpy as np
from scipy.signal import find_peaks, find_peaks_cwt
import matplotlib.pyplot as plt
import pandas as pd
import sys
np.set_printoptions(threshold=sys.maxsize)
x = np.array([1, 9, 18, 24, 26, 5, 26, 25, 26, 16, 20, 16, 23, 5, 1, 27,
              22, 26, 27, 26, 25, 24, 25, 26, 3, 25, 26, 24, 23, 12, 22, 11, 15, 24, 11,
              26, 26, 26, 24, 25, 24, 24, 22, 22, 22, 23, 24])
zero_locs = np.where(x == 0)
search_lims = np.append(zero_locs, len(x))  # limits for search area
diff_x = np.diff(x)
diff_x_mapped = diff_x > 0
peak_locs = []
for i in range(len(search_lims) - 1):
    peak_loc = search_lims[i] + np.where(diff_x_mapped[search_lims[i]:search_lims[i + 1]] == 0)[0][0]
    if x[peak_loc] > 20:
        peak_locs.append(peak_loc)
fig = plt.figure(figsize=(10, 4))
plt.plot(x)
plt.plot(np.array(peak_locs), x[np.array(peak_locs)], "x", color='r')
I tried using a peak detection algorithm but it is not detecting the peaks whose value is above 20. I need to detect the peaks where the x values start from 0 and the peak value is above 20.
expected output: the marked peaks have to be detected
By running the above script I am getting this error:
IndexError: arrays used as indices must be of integer (or boolean) type
How do I get rid of this error? Any suggestions? Thanks in advance.
You found no peaks.
That is, len(peak_locs) is zero.
So you wind up with this array, whose type defaulted to float:
>>> np.array(peak_locs)
array([], dtype=float64)
To fix it?
Find more peaks!
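In this MWE the loop never runs, because x contains no zeros and so zero_locs is empty. If you want the plot to survive the no-peak case anyway, one option (a small sketch) is to force an integer dtype before indexing:
peak_idx = np.array(peak_locs, dtype=int)  # an empty array defaults to float64, so force int
plt.plot(peak_idx, x[peak_idx], "x", color='r')  # simply plots no markers when no peaks were found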

Percent from a list

I have a list with points from a test:
points = [0, 0, 0, 0, 0, 0, 8, 8, 8, 9, 10, 11, 11, 12, 12, 13, 14, 14, 15, 15, 16,
          16, 17, 17, 18, 19, 21, 21, 23, 23, 24, 24, 24, 25, 25, 25,
          26, 27, 27, 28, 29, 29, 29, 29, 30, 30, 30, 31, 31, 32,
          34, 35, 36, 36, 37, 38]
If we assume all participants get full points on the next two tests (80 in total, 40 for each test), what percentage of participants can still attain the mark "A"? The function shall return the percentage in the mathematical sense, so between 0 and 1.
You can get an A with 88 points or more.
That's my code till now and I don't know what to do next.
The answer should look like this:
Potential Top Marks: 89.285714%
Here is a simple solution without numpy:
sum(i + 80 >= 88 for i in points) / len(points) * 100
This returns (in Python 3):
89.28571428571429
edit: simplified thanks to h4z3s' tip.
Use this:
Without numpy:
potential = [p + 80 for p in points]
percentage = sum([1 for i in potential if i >= 88]) / float(len(potential)) * 100
89.28571428571429
Using numpy:
import numpy as np
potential = np.asarray(points) + 80
percentage = np.sum(potential >= 88) / float(len(potential)) * 100
89.28571428571429
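Since the task statement asks for a value between 0 and 1, a fully vectorized variant (a minimal sketch) drops the * 100 and takes the mean of the boolean mask directly:
import numpy as np
fraction = np.mean(np.asarray(points) + 80 >= 88)  # mean of a boolean mask = fraction of True
print(fraction)  # 0.8928571428571429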

Receiving error message when trying to calculate p_value

Getting a, "TypeError: unsupported operand type(s) for /: 'generator' and 'int'",
Problem is when I try and calculate the p_value, not sure what I am doing wrong. Forgive me if my question is a bit vague
import numpy as np
import random
beer = [27, 19, 20, 20, 23, 17, 21, 24, 31, 26, 28, 20, 27, 19, 25, 31, 24, 28, 24, 29, 21, 21, 18, 27, 20]
water = [21, 19, 13, 22, 15, 22, 15, 22, 20, 12, 24, 24, 21, 19, 18, 16, 23, 20]
# running a permutation test
def permutation_test():
    combined = beer + water
    random.shuffle(combined)
    # slice to create 2 groups, the first with the same length as the beer test group
    split = len(beer)
    group_one, group_two = combined[:split], combined[split:]  # first 25, last 18
    return np.mean(group_one) - np.mean(group_two)
# monte carlo method to run the permutation test 100 000 times
iterate = [permutation_test() for _ in range(100000)]
# calculating effect size, standard score
effect_size = np.median(beer) - np.median(water)
standard_score = (effect_size - np.mean(iterate)) / np.std(iterate)
# calculating p-value to assess whether the observed effect size is an anomaly
p_value = np.mean(test >= effect_size for test in iterate)
print(standard_score, p_value)
You are passing a generator expression to np.mean, which it cannot average (hence the 'generator' / 'int' TypeError). Use a list comprehension instead:
p_value = np.mean([(test >= effect_size) for test in iterate])
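Equivalently, since iterate is already a plain list of floats, you can compare against the whole array at once (a small sketch):
p_value = np.mean(np.asarray(iterate) >= effect_size)  # mean of a boolean array = fraction of True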

numpy/scipy, loop over subarrays

Lately I've been doing a lot of processing on 8x8 blocks of image-data.
Standard approach has been to use nested for-loops to extract the blocks, e.g.
for y in range(0, height, 8):
    for x in range(0, width, 8):
        d = image_data[y:y+8, x:x+8]
        # further processing on the 8x8-block
I can't help but wonder if there is a way to vectorize this operation or another approach using numpy/scipy that I can use instead? An iterator of some kind?
A MWE [1]:
#!/usr/bin/env python
import sys
import numpy as np
from scipy.fftpack import dct, idct
import scipy.misc
import matplotlib.pyplot as plt

def dctdemo(coeffs=1):
    # zig-zag ordering of the 64 DCT coefficients
    unzig = np.array([
        0, 1, 8, 16, 9, 2, 3, 10,
        17, 24, 32, 25, 18, 11, 4, 5,
        12, 19, 26, 33, 40, 48, 41, 34,
        27, 20, 13, 6, 7, 14, 21, 28,
        35, 42, 49, 56, 57, 50, 43, 36,
        29, 22, 15, 23, 30, 37, 44, 51,
        58, 59, 52, 45, 38, 31, 39, 46,
        53, 60, 61, 54, 47, 55, 62, 63])
    lena = scipy.misc.lena()  # removed in SciPy 1.0; scipy.datasets.ascent() is another 512x512 grayscale test image
    height, width = lena.shape
    # reconstructed
    rec = np.zeros(lena.shape, dtype=np.int64)
    # Can this part be vectorized?
    for y in range(0, height, 8):
        for x in range(0, width, 8):
            d = lena[y:y+8, x:x+8].astype(float)
            D = dct(dct(d.T, norm='ortho').T, norm='ortho').reshape(64)
            Q = np.zeros(64, dtype=float)
            Q[unzig[:coeffs]] = D[unzig[:coeffs]]
            Q = Q.reshape([8, 8])
            q = np.round(idct(idct(Q.T, norm='ortho').T, norm='ortho'))
            rec[y:y+8, x:x+8] = q.astype(np.int64)
    plt.imshow(rec, cmap='gray')
    plt.show()

if __name__ == '__main__':
    try:
        c = int(sys.argv[1])
    except (IndexError, ValueError):
        sys.exit()
    if 1 <= c <= 64:
        dctdemo(c)
Footnotes:
[1] Actual application: https://github.com/figgis/dctdemo
There's a function view_as_windows for this in Scikit Image
http://scikit-image.org/docs/dev/api/skimage.util.html#view-as-windows
Unfortunately I will have to finish this answer another time, but you can grab the windows in a form that you can pass to dct with:
from skimage.util import view_as_windows
# your code...
# step=8 gives the non-overlapping 8x8 blocks from the question
# (the default step=1 would give densely overlapping windows)
d = view_as_windows(lena.astype(float), (8, 8), step=8).reshape(-1, 8, 8)
D = dct(dct(d, axis=1, norm='ortho'), axis=2, norm='ortho')  # 2D DCT applied per block
There is a function called extract_patches in the scikit-learn feature extraction routines (note that it has been removed from recent scikit-learn releases). You need to specify a patch_shape and an extraction_step. The result will be a view on your image as patches, which may overlap. The resulting array is 4D, the first 2 axes index the patch and the last two index the pixels of the patch. Try this:
from sklearn.feature_extraction.image import extract_patches
patches = extract_patches(image_data, patch_shape=(8, 8), extraction_step=(4, 4))
This gives (8, 8) size patches that overlap by half.
Note that up until now this uses no extra memory, because it is implemented using stride tricks. You can force a copy by reshaping
patches = patches.reshape(-1, 8, 8)
which will basically yield a list of patches.
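If you only need non-overlapping 8x8 blocks and want to stay in plain numpy, a reshape/swapaxes round trip also works (a minimal sketch, assuming the image dimensions are multiples of 8; image_data is a hypothetical stand-in for your array):
import numpy as np
image_data = np.arange(64 * 64).reshape(64, 64)  # hypothetical example image
h, w = image_data.shape
blocks = image_data.reshape(h // 8, 8, w // 8, 8).swapaxes(1, 2).reshape(-1, 8, 8)
print(blocks.shape)  # (64, 8, 8): one entry per block, in row-major block order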
