Related
I have the speed data in that I need to detect the values where threshold is greater than 20 and valley greater than 0. I used this code for peak detection but I am getting index error
import numpy as np
from scipy.signal import find_peaks, find_peaks_cwt
import matplotlib.pyplot as plt
import pandas as pd
import sys
np.set_printoptions(threshold=sys.maxsize)
zero_locs = np.where(x==0)
search_lims = np.append(zero_locs, len(x)) # limits for search area
diff_x = np.diff(x)
diff_x_mapped = diff_x > 0
peak_locs = []
x = np.array([1, 9, 18, 24, 26, 5, 26, 25, 26, 16, 20, 16, 23, 5, 1, 27,
22, 26, 27, 26, 25, 24, 25, 26, 3, 25, 26, 24, 23, 12, 22, 11, 15, 24, 11,
26, 26, 26, 24, 25, 24, 24, 22, 22, 22, 23, 24])
for i in range(len(search_lims)-1):
peak_loc = search_lims[i] + np.where(diff_x_mapped[search_lims[i]:search_lims[i+1]]==0)[0][0]
if x[peak_loc] > 20:
peak_locs.append(peak_loc)
fig= plt.figure(figsize=(10,4))
plt.plot(x)
plt.plot(np.array(peak_locs), x[np.array(peak_locs)], "x", color = 'r')
I tried using peak detection algorithm where it is not detecting peaks where the peak value is above 20 i need to detect the peaks where x values is 0 and peak values is 20
expected output: the marked peaks has to be detected
by running the above script i am getting this error
IndexError: arrays used as indices must be of integer (or boolean) type
how to get ride of this error any suggestions thanks in regards
You found no peaks.
That is, len(peak_locs) is zero.
So you wind up with this array, whose type defaulted to float:
>>> np.array(peak_locs)
array([], dtype=float64)
To fix it?
Find more peaks!
I came across a line of code used to reduce a 3D Tensor to a 2D Tensor in PyTorch. The 3D tensor x is of size torch.Size([500, 50, 1]) and this line of code:
x = x[lengths - 1, range(len(lengths))]
was used to reduce x to a 2D tensor of size torch.Size([50, 1]). lengths is also a tensor of shape torch.Size([50]) containing values.
Please can anyone explain how this works? Thank you.
After being quite stumped by the behavior, I did some more digging into this, and found that it is consistent behavior with the indexing of multi-dimensional NumPy arrays. What makes this counter-intuitive is the less obvious fact that both arrays have to have the same length, i.e. in this case len(lengths).
In fact, it works as the following:
* lengths is determining the order in which you access the first dimension. I.e., if you have a 1D array a = [0, 1, 2, ...., 500], and access it with the list b = [300, 200, 100], then the result a[b] = [301, 201, 101] (This also explains the lengths - 1 operator, which simply causes the accessed values to be the same as the index used in b, or lengths, respectively).
* range(len(lengths)) then *simply chooses the i-th element in the i-th row. If you have a square matrix, you can interpret this as the diagonal of the matrix. Since you only access a single element for each position along the first two dimensions, this can be stored in a single dimension (thus reducing your 3D tensor to 2D). The latter dimension is simply kept "as is".
If you want to play around with this, I strongly recommend to change the range() value to something longer/shorter, which will result in the following error:
IndexError: shape mismatch: indexing arrays could not be broadcast
together with shapes (x,) (y,)
where x and y are your specific length values.
To write this accessing method out in the long form to understand what happens "under the hood", also consider the below example:
import torch
x = torch.randint(500, 50, 1)
lengths = torch.tensor([2, 30, 1, 4]) # random examples to explore
diag = list(range(len(lengths))) # [0, 1, 2, 3]
result = []
for i, row in enumerate(lengths):
temp_tensor = x[row, :, :] # temp_tensor.shape = [1, 50, 1]
temp_tensor = temp_tensor.squeeze(0)[diag[i]] # temp_tensor.shape = [1, 1]
result.append(temp.tensor)
# back to pytorch
result = torch.tensor(result)
result.shape # [4, 1]
The key feature here is passing values of a tensor lengths as indices for x.
Here simplified example, I swaped dimensions of container, so index dimenson goes first:
container = torch.arange(0, 50 )
container = f.reshape((5, 10))
>>>tensor([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]])
indices = torch.arange( 2, 7, dtype=torch.long )
>>>tensor([2, 3, 4, 5, 6])
print( container[ range( len(indices) ), indices] )
>>>tensor([ 2, 13, 24, 35, 46])
Note: we got one thing from a row ( range( len(indices) ) makes sequential row numbers), with column number given by indices[ row_number ]
I have a list:
code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n', 'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that', 'the', 'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']
And I want to convert to one hot encoding. I tried:
to_categorical(code)
And I get an error: ValueError: invalid literal for int() with base 10: '<s>'
What am I doing wrong?
keras only supports one-hot-encoding for data that has already been integer-encoded. You can manually integer-encode your strings like so:
Manual encoding
# this integer encoding is purely based on position, you can do this in other ways
integer_mapping = {x: i for i,x in enumerate(code)}
vec = [integer_mapping[word] for word in code]
# vec is
# [0, 1, 2, 3, 16, 5, 6, 22, 8, 22, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
Using scikit-learn
from sklearn.preprocessing import LabelEncoder
import numpy as np
code = np.array(code)
label_encoder = LabelEncoder()
vec = label_encoder.fit_transform(code)
# array([ 2, 6, 7, 9, 19, 1, 16, 0, 17, 0, 3, 10, 5, 21, 11, 18, 19,
# 4, 22, 14, 13, 12, 0, 20, 8, 15])
You can now feed this into keras.utils.to_categorical:
from keras.utils import to_categorical
to_categorical(vec)
instead use
pandas.get_dummies(y_train)
tf.keras.layers.CategoryEncoding
In TF 2.6.0, One Hot Encoding (OHE) or Multi Hot Encoding (MHE) can be implemented using tf.keras.layers.CategoryEncoding , tf.keras.layers.StringLookup, and tf.keras.layers.IntegerLookup.
I think this way is not plausible in TF 2.4.x so it must have been implemented after.
See Classify structured data using Keras preprocessing layers for the actual implementation.
def get_category_encoding_layer(name, dataset, dtype, max_tokens=None):
# Create a layer that turns strings into integer indices.
if dtype == 'string':
index = layers.StringLookup(max_tokens=max_tokens)
# Otherwise, create a layer that turns integer values into integer indices.
else:
index = layers.IntegerLookup(max_tokens=max_tokens)
# Prepare a `tf.data.Dataset` that only yields the feature.
feature_ds = dataset.map(lambda x, y: x[name])
# Learn the set of possible values and assign them a fixed integer index.
index.adapt(feature_ds)
# Encode the integer indices.
encoder = layers.CategoryEncoding(num_tokens=index.vocabulary_size())
# Apply multi-hot encoding to the indices. The lambda function captures the
# layer, so you can use them, or include them in the Keras Functional model later.
return lambda feature: encoder(index(feature))
Try converting it to a numpy array first:
from numpy import array
and then:
to_categorical(array(code))
Lately I've been doing a lot of processing on 8x8 blocks of image-data.
Standard approach has been to use nested for-loops to extract the blocks, e.g.
for y in xrange(0,height,8):
for x in xrange(0,width,8):
d = image_data[y:y+8,x:x+8]
# further processing on the 8x8-block
I can't help to wonder if there is a way to vectorize this operation or another approach using numpy/scipy that I can use instead? An iterator of some kind?
A MWE1:
#!/usr/bin/env python
import sys
import numpy as np
from scipy.fftpack import dct, idct
import scipy.misc
import matplotlib.pyplot as plt
def dctdemo(coeffs=1):
unzig = np.array([
0, 1, 8, 16, 9, 2, 3, 10,
17, 24, 32, 25, 18, 11, 4, 5,
12, 19, 26, 33, 40, 48, 41, 34,
27, 20, 13, 6, 7, 14, 21, 28,
35, 42, 49, 56, 57, 50, 43, 36,
29, 22, 15, 23, 30, 37, 44, 51,
58, 59, 52, 45, 38, 31, 39, 46,
53, 60, 61, 54, 47, 55, 62, 63])
lena = scipy.misc.lena()
width, height = lena.shape
# reconstructed
rec = np.zeros(lena.shape, dtype=np.int64)
# Can this part be vectorized?
for y in xrange(0,height,8):
for x in xrange(0,width,8):
d = lena[y:y+8,x:x+8].astype(np.float)
D = dct(dct(d.T, norm='ortho').T, norm='ortho').reshape(64)
Q = np.zeros(64, dtype=np.float)
Q[unzig[:coeffs]] = D[unzig[:coeffs]]
Q = Q.reshape([8,8])
q = np.round(idct(idct(Q.T, norm='ortho').T, norm='ortho'))
rec[y:y+8,x:x+8] = q.astype(np.int64)
plt.imshow(rec, cmap='gray')
plt.show()
if __name__ == '__main__':
try:
c = int(sys.argv[1])
except ValueError:
sys.exit()
else:
if 1 <= int(sys.argv[1]) <= 64:
dctdemo(int(sys.argv[1]))
Footnotes:
Actual application: https://github.com/figgis/dctdemo
There's a function view_as_windows for this in Scikit Image
http://scikit-image.org/docs/dev/api/skimage.util.html#view-as-windows
Unfortunately I will have to finish this answer another time, but you can grab the windows in a form that you can pass to dct with:
from skimage.util import view_as_windows
# your code...
d = view_as_windows(lena.astype(np.float), (8, 8)).reshape(-1, 8, 8)
dct(d, axis=0)
There is a function called extract_patches in the scikit-learn feature extraction routines. You need to specify a patch_size and an extraction_step. The result will be a view on your image as patches, which may overlap. The resulting array is 4D, the first 2 index the patch, and the last two index the pixels of the patch. Try this
from sklearn.feature_extraction.image import extract_patches
patches = extract_patches(image_data, patch_size=(8, 8), extraction_step=(4, 4))
This gives (8, 8) size patches that overlap by half.
Note that up until now this uses no extra memory, because it is implemented using stride tricks. You can force a copy by reshaping
patches = patches.reshape(-1, 8, 8)
which will basically yield a list of patches.
If I have an ndarray like this:
>>> a = np.arange(27).reshape(3,3,3)
>>> a
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I know I can get the maximum along a certain axis using np.max(axis=...):
>>> a.max(axis=2)
array([[ 2, 5, 8],
[11, 14, 17],
[20, 23, 26]])
Alternatively, I could get the indices along that axis which correspond to the maximum values from:
>>> indices = a.argmax(axis=2)
>>> indices
array([[2, 2, 2],
[2, 2, 2],
[2, 2, 2]])
My question -- Given the array indices and the array a, is there an elegant way to reproduce the array the array returned by a.max(axis=2)?
This would probably work:
import itertools as it
import numpy as np
def apply_mask(field,indices):
data = np.empty(indices.shape)
#It seems highly likely that there is a more numpy-approved way to do this.
idx = [range(i) for i in indices.shape]
for idx_tup,zidx in zip(it.product(*idx),indices.flat):
data[idx_tup] = field[idx_tup+(zidx,)]
return data
But, it seems pretty hacky/inefficient. It also doesn't allow for me to use this with any axis other than the "last" axis. Is there a numpy function (or some use of magical numpy indexing) to make this work? The naive a[:,:,a.argmax(axis=2)] doesn't work.
UPDATE:
It seems the following also works (and is a little nicer):
import numpy as np
def apply_mask(field,indices):
data = np.empty(indices.shape)
for idx_tup,zidx in np.ndenumerate(indices):
data[idx_tup] = field[idx_tup+(zidx,)]
return data
I would like to do this because I would like to extract the indices based on the data in 1 array (typically using argmax(axis=...)) and use those indices to pull data out of a bunch of other (equivalently shaped) arrays. I'm open to alternative ways to accomplish this (e.g. using boolean masked arrays). However, I like the "safety" that I get using these "index" arrays. With this I am guaranteed to have the right number of elements to create a new array which looks like a 2d "slice" through the 3d field.
Here is some magic numpy indexing that will do what you want, but unfortunately it's pretty unreadable.
def apply_mask(a, indices, axis):
magic_index = [np.arange(i) for i in indices.shape]
magic_index = np.ix_(*magic_index)
magic_index = magic_index[:axis] + (indices,) + magic_index[axis:]
return a[magic_index]
or equally unreadable:
def apply_mask(a, indices, axis):
magic_index = np.ogrid[tuple(slice(i) for i in indices.shape)]
magic_index.insert(axis, indices)
return a[magic_index]
I use index_at() to create the full index:
import numpy as np
def index_at(idx, shape, axis=-1):
if axis<0:
axis += len(shape)
shape = shape[:axis] + shape[axis+1:]
index = list(np.ix_(*[np.arange(n) for n in shape]))
index.insert(axis, idx)
return tuple(index)
a = np.random.randint(0, 10, (3, 4, 5))
axis = 1
idx = np.argmax(a, axis=axis)
print a[index_at(idx, a.shape, axis=axis)]
print np.max(a, axis=axis)