Conditional Numpy slicing and append to another array - python

I have two numpy arrays in Python.
vec_1 = np.array([2.3, 1.4, 7.3, 1.8, 0, 0, 0])
vec_2 = np.array([29, 7, 5.8, 2.4, 6.7, 5, 8])
I want a slice from vec_1 containing all of the 0s (except the last one) plus the preceding non-zero value, so the slice from vec_1 would be:
slice = ([1.8, 0, 0])
The slice should replace the last x elements of vec_2, so that it looks like this:
vec_2 = ([29, 7, 5.8, 2.4, 1.8, 0, 0])
vec_2's last 3 elements in this example are replaced by the slice from vec_1. Lastly, how could this be made dynamic, so that the slice length is determined in step 1 and then replaces the last x elements of vec_2? Once a 0 is observed in vec_1, it stays 0 from that point to the end of the array.

import numpy as np
vec_1 = np.array([2.3, 1.4, 7.3, 1.8, 0, 0, 0])
vec_2 = np.array([29, 7, 5.8, 2.4, 6.7, 5, 8])
## Start one position before the first 0 in vec_1; the :-1 drops the last 0
vec_1_slice = vec_1[np.where(vec_1 == 0)[0][0] - 1:-1]
## Drop the last len(vec_1_slice) elements of vec_2, then append vec_1_slice
vec_2 = np.append(vec_2[:-len(vec_1_slice)], vec_1_slice)
Output
vec_2
Out[237]: array([29. , 7. , 5.8, 2.4, 1.8, 0. , 0. ])
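A minimal sketch of the same approach wrapped in a small helper (the function name and the in-place overwrite are my own choices; it assumes vec_1 contains a 0 and that the first 0 is not at index 0):

import numpy as np

def replace_tail(vec_1, vec_2):
    # Sketch only: assumes vec_1 has at least one non-zero value before its first 0.
    zeros = np.flatnonzero(vec_1 == 0)
    if zeros.size == 0:
        return vec_2.copy()
    tail = vec_1[zeros[0] - 1:-1]       # preceding value plus all 0s except the last
    out = vec_2.copy()
    out[-len(tail):] = tail             # overwrite the last len(tail) elements
    return out

vec_1 = np.array([2.3, 1.4, 7.3, 1.8, 0, 0, 0])
vec_2 = np.array([29, 7, 5.8, 2.4, 6.7, 5, 8])
print(replace_tail(vec_1, vec_2))       # [29.   7.   5.8  2.4  1.8  0.   0. ]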

Related

Find numpy array coordinates of neighboring maximum

I used the accepted answer in this question to obtain local maxima in a numpy array of 2 or more dimensions so I could assign labels to them. Now I would like to also assign these labels to neighboring cells in the array, depending on gradient – i.e. a cell gets the same label as the neighboring cell with the highest value. This way I can iteratively assign labels to my entire array.
Assume I have an array A like
>>> A = np.array([[ 1. , 2. , 2.2, 3.5],
[ 2.1, 2.4, 3. , 3.3],
[ 1. , 3. , 3.2, 3. ],
[ 2. , 4.1, 4. , 2. ]])
Applying the maximum_filter I get
>>> scipy.ndimage.filters.maximum_filter(A, size=3)
array([[ 2.4, 3. , 3.5, 3.5],
[ 3. , 3.2, 3.5, 3.5],
[ 4.1, 4.1, 4.1, 4. ],
[ 4.1, 4.1, 4.1, 4. ]])
Now, for every cell in this array I would like to have the coordinates of the maximum found by the filter, i.e.
array([[[1,1],[1,2],[0,3],[0,3]],
[[2,1],[2,2],[0,3],[0,3]],
[[3,1],[3,1],[3,1],[3,2]],
[[3,1],[3,1],[3,1],[3,2]]])
I would then use these coordinates to assign my labels iteratively.
I can do it for two dimensions using loops, ignoring borders
highest_neighbor_coordinates = np.array([[(argmax2D(A[i-1:i+2, j-1:j+2])+np.array([i-1, j-1])) for j in range(1, A.shape[1]-1)] for i in range(1, A.shape[0]-1)])
but after seeing the many filter functions in scipy.ndimage I was hoping there would be a more elegant and extensible (to >=3 dimensions) solution.
We can pad with reflected elements to simulate the max-filter operation, get sliding windows on the padded array with scikit-image's view_as_windows, compute the flattened argmax indices, and offset those with ranged values to translate them to the global scale -
from skimage.util import view_as_windows as viewW

def window_argmax_global2D(A, size):
    hsize = (size-1)//2 # expects size as odd number
    m,n = A.shape
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, (size,size)).reshape(-1,size**2).argmax(-1).reshape(m,n)
    r,c = np.unravel_index(idx, (size,size))
    rows = np.abs(r + np.arange(-hsize,m-hsize)[:,None])
    cols = np.abs(c + np.arange(-hsize,n-hsize))
    return rows, cols
Sample run -
In [201]: A
Out[201]:
array([[1. , 2. , 2.2, 3.5],
[2.1, 2.4, 3. , 3.3],
[1. , 3. , 3.2, 3. ],
[2. , 4.1, 4. , 2. ]])
In [202]: rows, cols = window_argmax_global2D(A, size=3)
In [203]: rows
Out[203]:
array([[1, 1, 0, 0],
[2, 2, 0, 0],
[3, 3, 3, 3],
[3, 3, 3, 3]])
In [204]: cols
Out[204]:
array([[1, 2, 3, 3],
[1, 2, 3, 3],
[1, 1, 1, 2],
[1, 1, 1, 2]])
Extending to n-dim
We would use np.ogrid for this extension:
def window_argmax_global(A, size):
    hsize = (size-1)//2 # expects size as odd number
    shp = A.shape
    N = A.ndim
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, ([size]*N)).reshape(-1,size**N).argmax(-1).reshape(shp)
    offsets = np.ogrid[tuple(map(slice, shp))]
    out = np.unravel_index(idx, ([size]*N))
    return [np.abs(i+j-hsize) for i,j in zip(out,offsets)]
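As a quick sanity check (my own usage sketch, not part of the original answer), the n-dim version reproduces the 2D sample run above and can be stacked into the coordinate array the question asks for:

import numpy as np

A = np.array([[1. , 2. , 2.2, 3.5],
              [2.1, 2.4, 3. , 3.3],
              [1. , 3. , 3.2, 3. ],
              [2. , 4.1, 4. , 2. ]])

rows, cols = window_argmax_global(A, size=3)   # function defined above
coords = np.stack([rows, cols], axis=-1)       # shape (4, 4, 2) of [row, col] pairs
print(coords[0, 0])                            # [1 1], the position of 2.4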

irregular slicing in numpy

Consider the NumPy array below
a = np.arange(20)
and the slicing requirement given below
b = np.array([[0, 4],
              [4, 9],
              [9, 15],
              [15, 19]])
How can I slice a based on the irregular slicing information in b? For example, something like:
np.mean(a[b[:,0]:b[:,1]])
I know how to achieve this with a loop, like
[np.mean(a[b[_][0]:b[_][1]]) for _ in range(len(b))]
but is there a way in which I can avoid using loops?
You can use np.add.reduceat with flattened b as the indices:
np.add.reduceat(a, np.ravel(b))[::2]/np.diff(b, axis=1).ravel()
# array([ 1.5, 6. , 11.5, 16.5])
with for loop:
[np.mean(a[b[_][0]:b[_][1]]) for _ in range(len(b))]
# [1.5, 6.0, 11.5, 16.5]
For more, you can see the first example in help(np.add.reduceat):
Examples
--------
To take the running sum of four successive values:
>>> np.add.reduceat(np.arange(8),[0,4, 1,5, 2,6, 3,7])[::2]
array([ 6, 10, 14, 18])
Let's try np.split.
>>> list(map(np.mean, np.split(a, b[:, 1])))
[1.5, 6.0, 11.5, 16.5, 19.0]
Using a list comprehension:
>>> [np.mean(x) for x in np.split(a, b[:, 1])]
[1.5, 6.0, 11.5, 16.5, 19.0]
Using cumsum and np.diff
c = b[:, 1]
np.diff(np.append(0, a.cumsum()[c - 1])) / np.diff(np.append(0, c))
array([ 1.5, 6. , 11.5, 16.5])
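For reuse, the reduceat trick can be packaged as a small helper (my own wrapper, not from the answers above; note that np.add.reduceat needs every index to lie strictly inside the array):

import numpy as np

def segment_means(a, bounds):
    # Mean of a[s:e] for each (s, e) row in bounds, without a Python loop.
    # Sketch only: assumes 0 <= s < e < len(a) for every pair.
    bounds = np.asarray(bounds)
    sums = np.add.reduceat(a, bounds.ravel())[::2]
    lengths = np.diff(bounds, axis=1).ravel()
    return sums / lengths

a = np.arange(20)
b = np.array([[0, 4], [4, 9], [9, 15], [15, 19]])
print(segment_means(a, b))   # [ 1.5   6.   11.5  16.5]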

Need to collate information from array in python, by finding match values that lie in different rows

I'm looking to collate 2 entries into one for many columns in a data array, by checking to see if several values in the two entries are the same.
0 A [[0.0, 0.5, 2.5, 2.5]
1 B [0.5, 1.0, 2.0, 2.0]
2 M [2.5, 2.5, 0.5, 0.0]
3 N [2.0, 2.0, 1.0, 0.5]
4 R [14.3, 13.8, 13.9, 14.2]]
Above shows the format the array takes, with the numbering and annotation of the rows on the left. Each column in the array is one distinct measurement.
Rows 0-3 are the x-locations along a straight line of 2 pairs of electrodes used to make a measurement (pair 1 = A & B, pair 2 = M & N); R is the measured resistivity when the four electrodes above it are used. As can be seen, in the 1st and 4th measurement, pair AB of measurement 1 = pair MN of measurement 4, and vice versa. The same is true of the 2nd and 3rd reading.
What I'm trying to do is to search through the array to find each pair of measurements, then collate that into one entry. This entry would take the first measurement's electrode locations (A,B,M &N), together with the first measurement's R value, but would also contain an extra row with the second measurement's R value. The result from the example above can be seen below.
0 A [[0.0, 0.5]
1 B [0.5, 1.0]
2 M [2.5, 2.5]
3 N [2.0, 2.0]
4 R1 [14.3, 13.8]
5 R2 [13.9, 14.2]]
Some information that may be useful:
The numbers are floats
The first set of measurements (i.e. before there are any pairs) is in the first half of the dataset. What I mean by that is, if there were an array with 100 columns (equalling 100 measurements), columns 51-100 would be the pairs of columns 1-50. Columns 51-100 do not follow the same order as columns 1-50, though (i.e. column 1 wouldn't always pair with column 51 in that example).
The electrodes do always follow the same pattern in the pair of measurements; "A" in measurement 1 will always = "M" in measurement 2 of the pair, and equally B = N, M = A and N = B.
I've been thinking about how to do it, and I've thought that some kind of if statement such as the one below may be a start, but really I'm a complete novice, and this is quite a complex problem to search for an answer to.
if all([A1 == M2, B1 == N2, M1 == A2, N1 == B2]):
Any help would be really appreciated, even if it's just a pointer to wherever would be a good starting point to search for more information.
Thanks in advance!
Edit
Just to clarify, the order of R2 is liable to change for each dataset, and isn't the same as the order of R1. What I'm after doing is querying the A, B, M & N values to find the pairs of readings, then adding the paired R2 reading under its corresponding R1 reading.
Here is an example dataset that is a little larger:
#Input array
Arr1 =
[[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1],
 [0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5],
 [1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5],
 [1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0],
 [14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1, 17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
#Output array - extra R row and half the columns
Arr2 =
[[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5],
 [0, 0, 0.5, 0.5, 1, 1, 0.5],
 [1, 3.5, 2.5, 5, 2, 4.5, 4.5],
 [1.5, 4, 3, 5.5, 2.5, 5, 5.5],
 [14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1],
 [14.4, 13.4, 25.1, 17.1, 19.8, 15.3, 16.1]]
Here's a way to find the index of each R2 value that you're after and create the final transformation to your specifications, edited based on our earlier dialogue in the comments below:
#Input array
Arr1 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1],
[0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5],
[1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5],
[1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0],
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1,
17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
#Output array - extra R row and half the columns
Arr2 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5],
[0, 0, 0.5, 0.5, 1, 1, 0.5],
[1, 3.5, 2.5, 5, 2, 4.5, 4.5],
[1.5, 4, 3, 5.5, 2.5, 5, 5.5],
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1],
[14.4, 13.4, 25.1, 17.1, 19.8, 15.3, 16.1]]
# get the first half of each list in Arr1
half_1 = [i[:len(i)//2] for i in Arr1[:-1]]
# 'flip' the arrays so that there's a list for each element 0, 1, ...
half_1_flip = [[i[j] for i in half_1] for j in range(len(half_1[0]))]
# get the second half of each list in Arr1
half_2 = [i[len(i)//2:] for i in Arr1[:-1]]
# 'rotate' the arrays so that A / B and M / N switch places
half_2_rotate = half_2[len(half_2)//2:] + half_2[:len(half_2)//2]
# 'flip' the arrays so that there's a list for each element 0, 1, ...
half_2_flip = [[i[j] for i in half_2_rotate]
for j in range(len(half_2_rotate[0]))]
# find each matching index of the first flipped list in the second list
seek_indices = [half_2_flip.index(a) for a in half_1_flip]
# pull out original R1 and R2
r1 = Arr1[-1][:len(Arr1[-1])//2]
r2 = Arr1[-1][len(Arr1[-1])//2:]
# reorder R2 based on indices
ordered_r2 = [r2[i] for i in seek_indices]
# get final transform
transform = half_1 + [r1] + [ordered_r2]
assert transform == Arr2
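For a NumPy-only variant of the same pairing logic (my own sketch, not part of the answer above; the helper name, the row-rotation step, and the exact-equality match are my assumptions), the repeats can be matched by broadcasting instead of list manipulation, using the Arr1 defined above:

import numpy as np

def collate_pairs(arr):
    # arr has rows A, B, M, N, R and an even number of columns; the second
    # half of the columns are the repeated measurements with A/B and M/N swapped.
    arr = np.asarray(arr, dtype=float)
    n = arr.shape[1] // 2
    first = arr[:4, :n]                      # A, B, M, N of measurements 1..n
    second = arr[:4, n:]                     # A, B, M, N of the repeats
    # Swap the electrode pairs of the repeats so they line up with `first`.
    second_rot = np.vstack([second[2:], second[:2]])
    # match[j, k] is True when repeat k pairs with measurement j. Exact float
    # comparison is fine here because the positions are exact multiples of 0.5;
    # use np.isclose for noisier coordinates.
    match = (first[:, :, None] == second_rot[:, None, :]).all(axis=0)
    pair_idx = match.argmax(axis=1)          # assumes exactly one match per column
    return np.vstack([first, arr[4, :n], arr[4, n:][pair_idx]])

print(collate_pairs(np.array(Arr1))[-1])
# [14.4 13.4 25.1 17.1 19.8 15.3 16.1], i.e. the R2 row of Arr2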
Another approach to the problem might be slicing the data using the following function:
import numpy as np
def transform(arr):
    arr1 = arr[:,0:2]
    arr1 = np.append(arr1,[arr[-1,2:]],axis=0)
    return arr1
Using the given data:
arr = np.array([[0.0, 0.5, 2.5, 2.5],
[0.5, 1.0, 2.0, 2.0],
[2.5, 2.5, 0.5, 0.0],
[2.0, 2.0, 1.0, 0.5],
[14.3, 13.8, 13.9, 14.2]])
transform(arr) returns:
array([[ 0. , 0.5],
[ 0.5, 1. ],
[ 2.5, 2.5],
[ 2. , 2. ],
[ 14.3, 13.8],
[ 13.9, 14.2]])

Numpy: use bins with infinite range

In my Python script I have floats that I want to bin. Right now I'm doing:
min_val = 0.0
max_val = 1.0
num_bins = 20
my_bins = numpy.linspace(min_val, max_val, num_bins)
hist,my_bins = numpy.histogram(myValues, bins=my_bins)
But now I want to add two more bins to account for values that are < 0.0 and for those that are > 1.0. One bin should thus include all values in ( -inf, 0), the other one all in [1, inf)
Is there any straightforward way to do this while still using numpy's histogram function?
The function numpy.histogram() happily accepts infinite values in the bins argument:
numpy.histogram(my_values, bins=numpy.r_[-numpy.inf, my_bins, numpy.inf])
Alternatively, you could use a combination of numpy.searchsorted() and numpy.bincount(), though I don't see much advantage to that approach.
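A minimal end-to-end example of that call (the sample values here are my own illustration):

import numpy as np

my_values = np.array([-0.3, 0.05, 0.2, 0.5, 0.95, 1.0, 1.7])
my_bins = np.linspace(0.0, 1.0, 20)

hist, edges = np.histogram(my_values, bins=np.r_[-np.inf, my_bins, np.inf])
print(hist[0], hist[-1])   # 1 value below 0.0, 2 values at or above 1.0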
You can specify numpy.inf as the upper and -numpy.inf as the lower bin limits.
Since NumPy 1.16 there is histogram_bin_edges. With this, today's solution calls histogram_bin_edges to get the bins, concatenates -inf and +inf, and passes the result as bins to histogram:
a=[1,2,3,4,2,3,4,7,4,6,7,5,4,3,2,3]
np.histogram(a, bins=np.concatenate(([np.NINF], np.histogram_bin_edges(a), [np.PINF])))
Results in:
(array([0, 1, 3, 0, 4, 0, 4, 1, 0, 1, 0, 2]),
array([-inf, 1. , 1.6, 2.2, 2.8, 3.4, 4. , 4.6, 5.2, 5.8, 6.4, 7. , inf]))
If you prefer to have the last bin empty (as I do), you can use the range parameter and add a small number to the max:
a=[1,2,3,4,2,3,4,7,4,6,7,5,4,3,2,3]
np.histogram(a, bins=np.concatenate(([np.NINF], np.histogram_bin_edges(a, range=(np.min(a), np.max(a)+.1)), [np.PINF])))
Results in:
(array([0, 1, 3, 0, 4, 4, 0, 1, 0, 1, 2, 0]),
array([-inf, 1. , 1.61, 2.22, 2.83, 3.44, 4.05, 4.66, 5.27, 5.88, 6.49, 7.1 , inf]))

How to improve my performance in filling gaps in time series and data lists with Python

I have time series data sets comprising 10 Hz data over several years. For one year my data has around 3.1*10^8 rows (each row has a time stamp and 8 float values). My data has gaps which I need to identify and fill with 'NaN'. My Python code below is capable of doing so, but the performance is far too poor for this kind of problem. I cannot get through my data set in anything even close to a reasonable time.
Below is a minimal working example.
For example, I have series (time series data) and data as lists of the same length:
series = [1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1]
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]
I would like series to advance in intervals of 1, hence the gaps in series are 4.1, 5.1, 6.1, 11.1, 12.1, 13.1, 17.1, 18.1, 19.1. The data_a and data_b lists shall be filled with float('nan') values at these positions.
So data_b, for example, should become:
[1.2, 1.2, 1.2, nan, nan, nan, 2.2, 2.2, 2.2, 2.2, nan, nan, nan, 3.2, 3.2, 3.2, nan, nan, nan, 4.2]
I achieved this using:
d_max = 1.0 # Normal increment in series where no gaps shall be filled
shift = 0
for i in range(len(series)-1):
    diff = series[i+1] - series[i]
    if diff > d_max:
        num_fills = round(diff/d_max)-1 # Number of fills within one gap
        for it in range(num_fills):
            data_a.insert(i+1+it+shift, float('nan'))
            data_b.insert(i+1+it+shift, float('nan'))
        shift = int(shift + num_fills) # Shift the index by the number of inserts from the previous gap filling
I searched for other solutions to this problem but only came across the use of the find() function to get the indices of the gaps. Is find() faster than my solution? And how would I insert NaNs into data_a and data_b in a more efficient way?
First, realize that your innermost loop is not necessary:
for it in range(num_fills):
    data_a.insert(i+1+it+shift, float('nan'))
is the same as
data_a[i+1+shift:i+1+shift] = [float('nan')] * int(num_fills)
That might make it slightly faster because there is less allocation and less shifting of items going on.
Then, for large numerical problems, always use NumPy. It may take some effort to learn, but the performance is likely to go up orders of magnitude. Start with something like:
import numpy as np
series = np.array([1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1])
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]
d_max = 1.0 # Normal increment in series where no gaps shall be filled
shift = 0
# the following two statements use NumPy's broadcasting
# to implicitly run the loop at the C level
diff = series[1:] - series[:-1]
num_fills = np.round(diff / d_max) - 1
for i in np.where(diff > d_max)[0]:
    nf = int(num_fills[i])   # cast to int so the list multiplication below works
    nans = [np.nan] * nf
    data_a[i+1+shift:i+1+shift] = nans
    data_b[i+1+shift:i+1+shift] = nans
    shift = shift + nf
Inserts into Python lists are expensive; the cost grows with the size of the list.
I'd recommend not loading your huge data sets into memory, but iterating through them with a generator function, something like:
series = [1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1]
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]
def fillGaps(series, data_a, data_b, d_max=1.0):
    prev = None
    for s, a, b in zip(series, data_a, data_b):
        if prev is not None:
            diff = s - prev
            if diff > d_max:
                for x in range(int(round(diff/d_max))-1):
                    yield (float('nan'), float('nan'))
        prev = s
        yield (a, b)
newA = []
newB = []
for a, b in fillGaps(series, data_a, data_b):
    newA.append(a)
    newB.append(b)
E.g. stream the data through the generator and write the output out as you go instead of appending to lists.
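For completeness, a fully vectorised sketch of the same gap filling (my own addition, not part of either answer): instead of inserting, scatter each column into a NaN-filled array that already has the final length. It assumes the time stamps sit (close to) a regular d_max grid, as in the example:

import numpy as np

def fill_gaps(series, *columns, d_max=1.0):
    series = np.asarray(series, dtype=float)
    # Position of each sample on the regular grid starting at series[0].
    pos = np.rint((series - series[0]) / d_max).astype(int)
    n = pos[-1] + 1
    out = []
    for col in columns:
        filled = np.full(n, np.nan)
        filled[pos] = col                 # scatter the existing samples
        out.append(filled)
    return out

series = [1.1, 2.1, 3.1, 7.1, 8.1, 9.1, 10.1, 14.1, 15.1, 16.1, 20.1]
data_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
data_b = [1.2, 1.2, 1.2, 2.2, 2.2, 2.2, 2.2, 3.2, 3.2, 3.2, 4.2]
new_a, new_b = fill_gaps(series, data_a, data_b)
# new_b -> [1.2 1.2 1.2 nan nan nan 2.2 2.2 2.2 2.2 nan nan nan 3.2 3.2 3.2 nan nan nan 4.2]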
