numpy/scipy, loop over subarrays - python

Lately I've been doing a lot of processing on 8x8 blocks of image data.
My standard approach has been to use nested for loops to extract the blocks, e.g.

for y in xrange(0, height, 8):
    for x in xrange(0, width, 8):
        d = image_data[y:y+8, x:x+8]
        # further processing on the 8x8 block

I can't help but wonder if there is a way to vectorize this operation, or another approach using numpy/scipy that I could use instead? An iterator of some kind?
A MWE¹:
#!/usr/bin/env python
import sys
import numpy as np
from scipy.fftpack import dct, idct
import scipy.misc
import matplotlib.pyplot as plt

def dctdemo(coeffs=1):
    unzig = np.array([
         0,  1,  8, 16,  9,  2,  3, 10,
        17, 24, 32, 25, 18, 11,  4,  5,
        12, 19, 26, 33, 40, 48, 41, 34,
        27, 20, 13,  6,  7, 14, 21, 28,
        35, 42, 49, 56, 57, 50, 43, 36,
        29, 22, 15, 23, 30, 37, 44, 51,
        58, 59, 52, 45, 38, 31, 39, 46,
        53, 60, 61, 54, 47, 55, 62, 63])

    lena = scipy.misc.lena()
    height, width = lena.shape

    # reconstructed
    rec = np.zeros(lena.shape, dtype=np.int64)

    # Can this part be vectorized?
    for y in xrange(0, height, 8):
        for x in xrange(0, width, 8):
            d = lena[y:y+8, x:x+8].astype(np.float)
            D = dct(dct(d.T, norm='ortho').T, norm='ortho').reshape(64)
            Q = np.zeros(64, dtype=np.float)
            Q[unzig[:coeffs]] = D[unzig[:coeffs]]
            Q = Q.reshape([8, 8])
            q = np.round(idct(idct(Q.T, norm='ortho').T, norm='ortho'))
            rec[y:y+8, x:x+8] = q.astype(np.int64)

    plt.imshow(rec, cmap='gray')
    plt.show()

if __name__ == '__main__':
    try:
        c = int(sys.argv[1])
    except (IndexError, ValueError):
        sys.exit()
    else:
        if 1 <= c <= 64:
            dctdemo(c)
Footnotes:
¹ Actual application: https://github.com/figgis/dctdemo

There's a function view_as_windows for this in scikit-image:
http://scikit-image.org/docs/dev/api/skimage.util.html#view-as-windows
Unfortunately I will have to finish this answer another time, but you can grab the windows in a form that you can pass to dct with:
from skimage.util import view_as_windows
# your code...
# step=8 yields the non-overlapping 8x8 blocks from the question; the default
# step of 1 would give densely overlapping windows instead
d = view_as_windows(lena.astype(np.float), (8, 8), step=8).reshape(-1, 8, 8)
D = dct(dct(d, axis=-1, norm='ortho'), axis=-2, norm='ortho')
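To sketch the rest of the idea (untested against the original demo, so treat it as an outline rather than the finished answer): keeping the windows in their 4D (64, 64, 8, 8) shape makes it easy to fold the blocks back into an image afterwards. unzig and coeffs are the names from the question.

d = view_as_windows(lena.astype(np.float), (8, 8), step=8)  # (64, 64, 8, 8)
D = dct(dct(d, axis=-1, norm='ortho'), axis=-2, norm='ortho').reshape(64, 64, 64)
# zero all but the first `coeffs` zig-zag coefficients of every block at once
Q = np.zeros_like(D)
Q[:, :, unzig[:coeffs]] = D[:, :, unzig[:coeffs]]
q = np.round(idct(idct(Q.reshape(64, 64, 8, 8), axis=-1, norm='ortho'), axis=-2, norm='ortho'))
# fold the (64, 64, 8, 8) stack of blocks back into a 512x512 image
rec = q.transpose(0, 2, 1, 3).reshape(512, 512).astype(np.int64)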

There is a function called extract_patches in the scikit-learn feature extraction routines. You need to specify a patch_size and an extraction_step. The result is a view on your image as patches, which may overlap. The resulting array is 4D: the first two axes index the patch, and the last two index the pixels of the patch. Try this:
from sklearn.feature_extraction.image import extract_patches
patches = extract_patches(image_data, patch_size=(8, 8), extraction_step=(4, 4))
This gives (8, 8) size patches that overlap by half.
Note that up until now this uses no extra memory, because it is implemented using stride tricks. You can force a copy by reshaping
patches = patches.reshape(-1, 8, 8)
which will basically yield a list of patches.
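For the non-overlapping 8x8 blocks from the original question, the extraction step would simply match the patch size (a sketch, with the arguments passed positionally):

# step == patch size: disjoint 8x8 blocks, still a zero-copy strided view
blocks = extract_patches(image_data, (8, 8), (8, 8))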

Related

How to detect peak values using Python SciPy, getting IndexError "arrays used as indices must be of integer (or boolean) type"

I have speed data in which I need to detect the peaks where the value is above a threshold of 20 and the surrounding valleys are above 0. I used this code for peak detection, but I am getting an IndexError:
import numpy as np
from scipy.signal import find_peaks, find_peaks_cwt
import matplotlib.pyplot as plt
import pandas as pd
import sys

np.set_printoptions(threshold=sys.maxsize)

x = np.array([1, 9, 18, 24, 26, 5, 26, 25, 26, 16, 20, 16, 23, 5, 1, 27,
              22, 26, 27, 26, 25, 24, 25, 26, 3, 25, 26, 24, 23, 12, 22, 11, 15, 24, 11,
              26, 26, 26, 24, 25, 24, 24, 22, 22, 22, 23, 24])

zero_locs = np.where(x == 0)
search_lims = np.append(zero_locs, len(x))  # limits for search area
diff_x = np.diff(x)
diff_x_mapped = diff_x > 0

peak_locs = []
for i in range(len(search_lims) - 1):
    peak_loc = search_lims[i] + np.where(diff_x_mapped[search_lims[i]:search_lims[i+1]] == 0)[0][0]
    if x[peak_loc] > 20:
        peak_locs.append(peak_loc)

fig = plt.figure(figsize=(10, 4))
plt.plot(x)
plt.plot(np.array(peak_locs), x[np.array(peak_locs)], "x", color='r')
I tried using a peak detection algorithm, but it is not detecting the peaks whose value is above 20. I need to detect the peaks between the locations where x is 0, with peak values above 20.
Expected output: the marked peaks should be detected.
Running the above script gives this error:
IndexError: arrays used as indices must be of integer (or boolean) type
How do I get rid of this error? Any suggestions? Thanks in advance.
You found no peaks.
That is, len(peak_locs) is zero.
So you wind up with this array, whose type defaulted to float:
>>> np.array(peak_locs)
array([], dtype=float64)
To fix it?
Find more peaks!
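If the goal is simply "peaks above 20", a possible shortcut (a sketch using the find_peaks that the question already imports): let SciPy do the thresholding, and build index arrays with an explicit integer dtype so the empty case can't crash the plotting call.

import numpy as np
from scipy.signal import find_peaks
import matplotlib.pyplot as plt

# x is the speed array from the question
peak_locs, _ = find_peaks(x, height=20)        # peaks whose value is at least 20
peak_locs = np.asarray(peak_locs, dtype=int)   # explicit dtype: safe even when empty
plt.plot(x)
plt.plot(peak_locs, x[peak_locs], "x", color='r')
plt.show()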

Is there a numpy alternative to this for loop problem?

I have 3 arrays of the same length:
import numpy as np
weights = np.array([10, 14, 18, 22, 26, 30, 32, 34, 36, 38, 40])
resistances = np.array([15, 16.5, 18, 19.5, 21, 24, 27, 30, 33, 36, 39])
depths = np.array([0,1,2,3,4,5,6,7,8,9,10])
I want to take each item in weights, find the nearest match in resistances that is >= this item, and then use the index of this nearest match to return the corresponding value from depths, i.e. depths[index].
BUT, with the additional condition that if nothing in resistances is >= the item, then just return the last value in depths. I then want to populate a list with the results.
Is there a better way than the for loop approach below? I would like to avoid the loop.
SWP = []
for w in weights:
    if len(depths[w <= resistances]) == 0:
        swp = depths[-1]
    else:
        swp = np.min(depths[w <= resistances])
    SWP.append(swp)
SWP
You can .clip the indices that np.searchsorted produces with len(resistances) - 1:

depths[np.searchsorted(resistances, weights).clip(max=len(resistances) - 1)]

So any index larger than the last one becomes the last one.
An alternative idea (but only if your resistances are sorted): clip the weights with the maximum of resistances:

depths[np.searchsorted(resistances, weights.clip(max=resistances.max()))]
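A quick check with the arrays from the question, matching the output of the original loop:

import numpy as np

weights = np.array([10, 14, 18, 22, 26, 30, 32, 34, 36, 38, 40])
resistances = np.array([15, 16.5, 18, 19.5, 21, 24, 27, 30, 33, 36, 39])
depths = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

idx = np.searchsorted(resistances, weights).clip(max=len(resistances) - 1)
print(depths[idx])  # [ 0  0  2  5  6  7  8  9  9 10 10]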
Usually, to do what you're describing, you want to create a function that can be mapped over a list.
import numpy as np

weights = np.array([10, 14, 18, 22, 26, 30, 32, 34, 36, 38, 40])
resistances = np.array([15, 16.5, 18, 19.5, 21, 24, 27, 30, 33, 36, 39])
depth = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

def evaluate_weight(w):
    # depths at which the resistance is at least w; fall back to the last
    # depth when no resistance matches, as the question requires
    candidates = depth[resistances >= w]
    return np.min(candidates) if len(candidates) else depth[-1]

SWP = list(map(evaluate_weight, weights))

Plotly Sankey Diagram, aligning nodes

I created a Sankey diagram using plotly (python) and it looks like this:
As you can see, some links overlap, but this plot can be easily changed (manually) to this:
I think the overlapping result comes from the 3rd column of nodes being centered on Y. Is there a way for me to align the 3rd column to the top (or bottom) to fix this problem? (or any other fix is also welcome of course)
The only thing I've found is setting x and y for the nodes manually, but I don't seem to be able to set only the y, and this would also involve calculating all those coordinates.
Thank you for the help!
Edit: My code
import plotly.graph_objects as go

sources = [23, 23, 23, 23, 23, 23, 23, 24, 8, 23, 23, 23, 30, 17, 5, 12, 20, 20, 23, 18, 18, 18, 18, 23, 33, 33, 33, 33, 33, 23, 16, 16, 23]
targets = [7, 13, 6, 21, 1, 2, 15, 23, 23, 32, 25, 19, 23, 23, 23, 23, 27, 22, 20, 31, 4, 0, 3, 18, 11, 26, 9, 14, 28, 33, 29, 10, 16]
values = [50.0, 1542.78, 287.44, 2619.76, 1583.26, 722.1, 5133.69, 6544.0, 2563.35, 6476.59, 4314.0, 82.87, 650.0, 1773.68, 16723.0, 32297.7, 81.64, 266.92, 348.56, 388.57, 743.2, 5403.24, 5821.52, 12356.53, 12905.68, 316.12, 497.68, 354.42, 3830.44, 17904.34, 175.95, 1224.46, 1400.41]

fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 5,
        thickness = 10,
        line = dict(color = "black", width = 0.5),
        label = list(range(len(values))),
        color = "blue"
    ),
    link = dict(
        source = sources,
        target = targets,
        value = values
    ))])
fig.update_layout(title_text="Basic Sankey Diagram", font_size=8)
fig.write_html("test.html")
There's an open issue on github that both x and y positions have to be set in order for manual positioning to work. Does manually adding y coordinates along with x coordinates address your problem?
In general there are other issues with sankey sorting as well.
I have been working with problems in this area only in plotly.R so I'm afraid I can't offer specific python suggestions to modify your code.
If you're also looking for suggestions about calculating the coordinates manually, you can calculate this as
1 - (cumulative_sum_of_higher_nodes + current_node_size/2)
or
1 - (cumulative_sum_of_all_nodes_including_current_node - current_node_size/2)
assuming y = 0 is at the bottom of the plot area.
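Here is a minimal sketch of that calculation in python (column_y and its gap argument are hypothetical helpers, not plotly API; node heights are assumed proportional to each node's total link value):

import numpy as np

def column_y(node_totals, gap=0.02):
    # Stack one column's nodes from the top, following the formula above:
    # y = 1 - (cumulative sum of higher nodes + current node size / 2),
    # with y = 0 assumed at the bottom of the plot area.
    sizes = np.asarray(node_totals, dtype=float)
    sizes = sizes / sizes.sum()
    higher = np.concatenate(([0.0], np.cumsum(sizes[:-1] + gap)))
    return 1 - (higher + sizes / 2)

The resulting values, together with matching x values, would then go into node = dict(..., x=..., y=...) in the go.Sankey call.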

Using curve_fit with a function defined by an indefinite integral in Python

I'm trying to write code to fit two curves with 5 parameters to real data. They are shown here:
The first curve only depends on a, b and gamma. So I decided to use curve_fit once for these 3 (which works), and then use it again on the second curve to adjust the last two, alpha and k_0.
The problem is that the second curve is defined by this indefinite integral, and I can't code it properly.
I have tried treating x as a symbol and integrating with sym.integrate, and also integrating numerically with quad. Neither worked. In the second case, I get "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" in the mortes function.
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.integrate as integrate
import numpy as np
import sympy as sym

# Experimental x and y data points
# SP (São Paulo) data
xData = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])
ycasos = np.array([2, 13, 65, 459, 1406, 4466, 8419, 13894, 20004, 31174, 44411, 61183, 80558, 107142, 140549, 172875, 215793, 265581, 312530, 366890, 412027, 479481, 552318, 621731, 697530, 749244, 801422, 853085, 890690, 931673, 970888, 1003429, 1034816, 1062634, 1089255])
ymortes = np.array([0, 0, 15, 84, 260, 560, 991, 1667, 2586, 3608, 4688, 6045, 7532, 9058, 10581, 12494, 14263, 15996, 17702, 19647, 21517, 23236, 25016, 26780, 28392, 29944, 31313, 32567, 33927, 35063, 36136, 37223, 37992, 38726, 39311])

# Brazil data
#xData = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45])
#ycasos = np.array([2,9,121,1128,3912,10298,20818,36739,58973,96559,155939,233142, 347398, 498440, 672846, 850514, 1067579, 1313667, 1577004, 1839850, 2074860, 2394513, 2707877, 3012412, 3317096, 3582362, 3846153, 4123000, 4315687, 4528240, 4717991, 4906833, 5082637, 5224362, 5380635, 5535605, 5653561, 5848959, 6052786, 6290272, 6577177, 6880127, 7213155, 7465806, 7716405, 8013708])
#ymortes = np.array([])

u0 = ycasos[0]
v0 = ymortes[0]

# u(t)
def casos(x, a, b, gama):
    return u0 * (a ** (1/gama)) * np.exp(a*x) * ((a + b * (u0 ** gama) * (np.exp(a*gama*x) - 1)) ** (-1/gama))

# Plot experimental data points
plt.plot(xData, ycasos, 'bo', label='actual')

# Initial guess for the parameters
#initialGuess = [3.0, 1.5, 0.05]

# First fit
copt, ccov = curve_fit(casos, xData, ycasos, bounds=(0, [1., 1., np.inf]), maxfev=100000)
a_opt = copt[0]
b_opt = copt[1]
gama_opt = copt[2]
print('First stage \n')
print('Parameters found: a=%.9f, b=%.9f, gama=%.9f \n' % tuple(copt))

def integrand(t, alpha):
    return np.exp((a_opt - alpha)*t) * ((a_opt + b_opt * (u0 ** gama_opt) * (np.exp(a_opt*gama_opt*t) - 1)) ** (-1/gama_opt))

def mortes(x, k0, alpha):
    return u0 * (a_opt ** (1/gama_opt)) * k0 * integrate.quad(integrand, 0, x, args=(alpha)) + v0

# Second fit
mopt, mcov = curve_fit(mortes, xData, ymortes, bounds=(0, [np.inf, a_opt]), maxfev=100000)
print('Second stage \n')
print('Parameters found: k0=%.9f, alpha=%.9f \n' % tuple(mopt))

# x values for the fitted function
xFit = np.arange(0.0, float(len(xData)), 0.01)

# Plot the fitted function
plt.plot(xFit, casos(xFit, *copt), 'r', label='fitted')
plt.xlabel('t')
plt.ylabel('casos')
plt.legend()
plt.show()
The upper bound of an integral (integrate.quad) has to be a float, not an array like your x (the argument of mortes()).
It should work this way:
def mortes(x, k0, alpha):
    integralRes = []
    for upBound in x:
        integralRes.append(integrate.quad(integrand, 0, upBound, args=(alpha))[0])
    return u0 * (a_opt ** (1/gama_opt)) * k0 * np.array(integralRes) + v0
P.S. Elegant edits to the code style are more than welcome (like allowing an array to be passed as the upper and lower bounds of integrate.quad).
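In that spirit, one possible tidying (a sketch reusing integrand, u0, v0, a_opt and gama_opt from the question's script): np.vectorize wraps the scalar-bound quad call, so curve_fit can keep passing the whole xData array.

def mortes_scalar(x, k0, alpha):
    # quad needs a scalar upper bound; [0] drops quad's error estimate
    val = integrate.quad(integrand, 0, x, args=(alpha,))[0]
    return u0 * (a_opt ** (1 / gama_opt)) * k0 * val + v0

mortes = np.vectorize(mortes_scalar)  # mortes(xData, k0, alpha) now evaluates elementwise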

Numpy Array Slicing

I have a 1D numpy array, and some offset/length values. I would like to extract from this array all entries which fall within [offset, offset + length), and use them to build up a new 'reduced' array consisting only of the values picked out by the offset/length pairs.
For a single offset/length pair this is trivial with standard array slicing [offset:offset+length]. But how can I do this efficiently (i.e. without any loops) for many offset/length values?
Thanks,
Mark
>>> import numpy as np
>>> a = np.arange(100)
>>> ind = np.concatenate((np.arange(5),np.arange(10,15),np.arange(20,30,2),np.array([8])))
>>> a[ind]
array([ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28, 8])
There is the naive method; just doing the slices:
>>> import numpy as np
>>> a = np.arange(100)
>>>
>>> offset_length = [(3,10),(50,3),(60,20),(95,1)]
>>>
>>> np.concatenate([a[offset:offset+length] for offset,length in offset_length])
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])
The following might be faster, but you would have to test/benchmark.
It works by constructing a list of the desired indices, which is a valid method of indexing a numpy array.
>>> indices = [offset + i for offset,length in offset_length for i in xrange(length)]
>>> a[indices]
array([ 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])
It's not clear if this would actually be faster than the naive method but it might be if you have a lot of very short intervals. But I don't know.
(This last method is basically the same as #fraxel's solution, just using a different method of making the index list.)
Performance testing
I've tested a few different cases: a few short intervals, a few long intervals, lots of short intervals. I used the following script:
import timeit

setup = 'import numpy as np; a = np.arange(1000); offset_length = %s'

for title, ol in [('few short', '[(3,10),(50,3),(60,10),(95,1)]'),
                  ('few long', '[(3,100),(200,200),(600,300)]'),
                  ('many short', '[(2*x,1) for x in range(400)]')]:
    print '**', title, '**'
    print 'dbaupp 1st:', timeit.timeit('np.concatenate([a[offset:offset+length] for offset,length in offset_length])', setup % ol, number=10000)
    print 'dbaupp 2nd:', timeit.timeit('a[[offset + i for offset,length in offset_length for i in xrange(length)]]', setup % ol, number=10000)
    print '    fraxel:', timeit.timeit('a[np.concatenate([np.arange(offset,offset+length) for offset,length in offset_length])]', setup % ol, number=10000)
This outputs:
** few short **
dbaupp 1st: 0.0474979877472
dbaupp 2nd: 0.190793991089
fraxel: 0.128381967545
** few long **
dbaupp 1st: 0.0416231155396
dbaupp 2nd: 1.58000087738
fraxel: 0.228138923645
** many short **
dbaupp 1st: 3.97210478783
dbaupp 2nd: 2.73584890366
fraxel: 7.34302687645
This suggests that my first method is the fastest when you have a few intervals (and it is significantly faster), and my second is the fastest when you have lots of intervals.
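For completeness, the index array itself can also be built without a Python-level loop (a sketch using a cumsum-based ramp; offsets and lengths are the offset_length pairs split into two arrays):

import numpy as np

a = np.arange(100)
offsets = np.array([3, 50, 60, 95])
lengths = np.array([10, 3, 20, 1])

starts = np.repeat(offsets, lengths)  # each offset repeated `length` times
# per-interval ramp 0..length-1, built by subtracting each group's start slot
ramp = np.arange(lengths.sum()) - np.repeat(np.cumsum(lengths) - lengths, lengths)
reduced = a[starts + ramp]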
