NumPy: use bins with infinite range

In my Python script I have floats that I want to bin. Right now I'm doing:
min_val = 0.0
max_val = 1.0
num_bins = 20
my_bins = numpy.linspace(min_val, max_val, num_bins)
hist,my_bins = numpy.histogram(myValues, bins=my_bins)
But now I want to add two more bins to account for values that are < 0.0 and for those that are >= 1.0. One bin should thus include all values in (-inf, 0), the other one all values in [1, inf).
Is there any straightforward way to do this while still using numpy's histogram function?

The function numpy.histogram() happily accepts infinite values in the bins argument:
numpy.histogram(my_values, bins=numpy.r_[-numpy.inf, my_bins, numpy.inf])
Alternatively, you could use a combination of numpy.searchsorted() and numpy.bincount(), though I don't see much advantage to that approach.
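For comparison, here is a minimal sketch of the searchsorted/bincount alternative. The sample values and the four inner bins are made up for illustration; side='right' is an assumption chosen so that each bin is closed on the left, which matches the (-inf, 0) and [1, inf) bins asked for in the question.

```python
import numpy as np

values = np.array([-0.5, 0.0, 0.2, 0.99, 1.0, 3.0])  # made-up sample data
edges = np.linspace(0.0, 1.0, 5)                      # inner edges: 0, 0.25, 0.5, 0.75, 1

# Index 0 means "below the first edge", index len(edges) means "at or above the last edge".
idx = np.searchsorted(edges, values, side='right')
counts = np.bincount(idx, minlength=len(edges) + 1)
print(counts)  # [1 2 0 0 1 2]

# Cross-check against numpy.histogram with infinite outer edges.
hist, _ = np.histogram(values, bins=np.r_[-np.inf, edges, np.inf])
print(np.array_equal(counts, hist))
```

The extra index bookkeeping is the main cost of this route, which is why it offers little over passing infinite edges to numpy.histogram directly.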

You can specify numpy.inf as the upper and -numpy.inf as the lower bin limits.
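As a small worked sketch of this, using the variable names from the question and some invented sample values: note that N bins need N + 1 edges, so linspace is called with num_bins + 1 here (an adjustment to the question's code, which would otherwise produce 19 inner bins).

```python
import numpy as np

min_val, max_val, num_bins = 0.0, 1.0, 20

# 21 edges give 20 inner bins; the infinite edges add one overflow bin per side.
inner_edges = np.linspace(min_val, max_val, num_bins + 1)
edges = np.concatenate(([-np.inf], inner_edges, [np.inf]))

my_values = np.array([-2.0, -0.1, 0.0, 0.52, 0.999, 1.0, 7.5])  # invented data
hist, _ = np.histogram(my_values, bins=edges)

print(len(hist))   # 22 bins in total
print(hist[0])     # values below 0.0
print(hist[-1])    # values at or above 1.0
```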

Since NumPy 1.15 you can use histogram_bin_edges. Call it to get the bin edges, concatenate -inf and +inf onto them, and pass the result as bins to histogram. The constants np.NINF and np.PINF were removed in NumPy 2.0, so -np.inf and np.inf are used here:
import numpy as np
a=[1,2,3,4,2,3,4,7,4,6,7,5,4,3,2,3]
np.histogram(a, bins=np.concatenate(([-np.inf], np.histogram_bin_edges(a), [np.inf])))
Results in:
(array([0, 1, 3, 0, 4, 0, 4, 1, 0, 1, 0, 2]),
array([-inf, 1. , 1.6, 2.2, 2.8, 3.4, 4. , 4.6, 5.2, 5.8, 6.4, 7. , inf]))
If you prefer to have the last bin empty (as I do), you can use the range parameter and add a small number to the maximum:
a=[1,2,3,4,2,3,4,7,4,6,7,5,4,3,2,3]
np.histogram(a, bins=np.concatenate(([-np.inf], np.histogram_bin_edges(a, range=(np.min(a), np.max(a)+.1)), [np.inf])))
Results in:
(array([0, 1, 3, 0, 4, 4, 0, 1, 0, 1, 2, 0]),
array([-inf, 1. , 1.61, 2.22, 2.83, 3.44, 4.05, 4.66, 5.27, 5.88, 6.49, 7.1 , inf]))

Related

Numpy BinCount for Float Values

I previously used numpy.bincount for integers and it worked. However, according to the documentation, this method only works for integers. How can I produce a similar count for float values (between 0 and 1)?
Current Code:
import numpy as np
np.bincount(np.array([0, 1, 1, 3, 2, 1, 7]))
>>> array([1, 3, 1, 1, 0, 0, 0, 1])
np.bincount(np.array([0.91, 0.74, 1.0, 0.89, 0.91, 0.74]))
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'
Use np.histogram
arr = np.array([0.91, 0.74, 1.0, 0.89, 0.91, 0.74])
counts, edges = np.histogram(arr, bins=5)
print(counts)
# array([2, 0, 1, 2, 1], dtype=int64)
print(edges)
# array([0.74 , 0.792, 0.844, 0.896, 0.948, 1. ])
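If the goal is a count per distinct value, which is closer to what bincount does for integers, np.unique with return_counts=True is another option. This is a different technique from the histogram above, shown here only as a sketch; it compares floats exactly, so values that differ by rounding noise are counted separately.

```python
import numpy as np

arr = np.array([0.91, 0.74, 1.0, 0.89, 0.91, 0.74])

# values holds the sorted distinct entries, counts how often each one occurs.
values, counts = np.unique(arr, return_counts=True)
print(values)  # [0.74 0.89 0.91 1.  ]
print(counts)  # [2 1 2 1]
```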

Conditional Numpy slicing and append to another array

I have two numpy arrays in Python.
vec_1 = np.array([2.3, 1.4, 7.3, 1.8, 0, 0, 0])
vec_2 = np.array([29, 7, 5.8, 2.4, 6.7, 5, 8])
I want a slice from vec_1 containing all the 0's (except the last one) plus the preceding (non-zero) value, so that the slice from vec_1 would be:
slice = ([1.8,0,0])
The slice would replace the last x elements of vec_2 so that it would look like so:
vec_2 = ([29, 7, 5.8, 2.4, 1.8, 0, 0])
vec_2's last 3 elements in this example are replaced by the slice from vec_1. Lastly, how could this be made dynamic so that slice lengths are determined in step 1 and then replace the last x elements in vec_2. When a 0 is observed in vec_1, it will be 0 from that point to the end of the array.
import numpy as np
vec_1 = np.array([2.3, 1.4, 7.3, 1.8, 0, 0, 0])
vec_2 = np.array([29, 7, 5.8, 2.4, 6.7, 5, 8])
##Take the lowest value where 0 appears in vec_1 and subtract 1. :-1 to remove the last 0
vec_1_slice = vec_1[np.where(vec_1 == 0)[0][0] - 1:-1]
##Remove the last however many digits are in vec_1_slice then add vec_1_slice
vec_2 = np.append(vec_2[:-len(vec_1_slice)], vec_1_slice)
Output
vec_2
Out[237]: array([29. , 7. , 5.8, 2.4, 1.8, 0. , 0. ])
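To make this reusable, the two steps can be wrapped in a function. The sketch below uses a hypothetical name, replace_tail, and adds one assumption that is not in the original answer: if vec_1 has no zero, or starts with a zero (so there is no preceding value), vec_2 is returned unchanged.

```python
import numpy as np

def replace_tail(vec_1, vec_2):
    """Overwrite the end of vec_2 with the 'last non-zero + zeros' slice of vec_1."""
    zeros = np.flatnonzero(vec_1 == 0)
    if zeros.size == 0 or zeros[0] == 0:
        # Assumption: nothing sensible to copy, so leave vec_2 as it is.
        return vec_2.copy()
    # Start one element before the first zero and drop the final zero.
    tail = vec_1[zeros[0] - 1:-1]
    out = vec_2.copy()
    out[len(out) - len(tail):] = tail
    return out

vec_1 = np.array([2.3, 1.4, 7.3, 1.8, 0, 0, 0])
vec_2 = np.array([29, 7, 5.8, 2.4, 6.7, 5, 8])
result = replace_tail(vec_1, vec_2)
print(result)
```

Assigning into a copy avoids building a new array with np.append and keeps the input vec_2 untouched.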

Find numpy array coordinates of neighboring maximum

I used the accepted answer in this question to obtain local maxima in a numpy array of 2 or more dimensions so I could assign labels to them. Now I would like to also assign these labels to neighboring cells in the array, depending on gradient – i.e. a cell gets the same label as the neighboring cell with the highest value. This way I can iteratively assign labels to my entire array.
Assume I have an array A like
>>> A = np.array([[ 1. , 2. , 2.2, 3.5],
[ 2.1, 2.4, 3. , 3.3],
[ 1. , 3. , 3.2, 3. ],
[ 2. , 4.1, 4. , 2. ]])
Applying the maximum_filter I get
>>> scipy.ndimage.filters.maximum_filter(A, size=3)
array([[ 2.4, 3. , 3.5, 3.5],
[ 3. , 3.2, 3.5, 3.5],
[ 4.1, 4.1, 4.1, 4. ],
[ 4.1, 4.1, 4.1, 4. ]])
Now, for every cell in this array I would like to have the coordinates of the maximum found by the filter, i.e.
array([[[1,1],[1,2],[0,3],[0,3]],
[[2,1],[2,2],[0,3],[0,3]],
[[3,1],[3,1],[3,1],[3,2]],
[[3,1],[3,1],[3,1],[3,2]]])
I would then use these coordinates to assign my labels iteratively.
I can do it for two dimensions using loops, ignoring borders
highest_neighbor_coordinates = np.array([[(argmax2D(A[i-1:i+2, j-1:j+2])+np.array([i-1, j-1])) for j in range(1, A.shape[1]-1)] for i in range(1, A.shape[0]-1)])
but after seeing the many filter functions in scipy.ndimage I was hoping there would be a more elegant and extensible (to >=3 dimensions) solution.
We can pad the array with reflected elements to simulate the max-filter operation, take sliding windows over the padded array with scikit-image's view_as_windows, compute the flattened argmax index within each window, and offset those indices with ranged values to translate them to global coordinates:
import numpy as np
from skimage.util import view_as_windows as viewW
def window_argmax_global2D(A, size):
    hsize = (size-1)//2 # expects size as odd number
    m,n = A.shape
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, (size,size)).reshape(-1,size**2).argmax(-1).reshape(m,n)
    r,c = np.unravel_index(idx, (size,size))
    rows = np.abs(r + np.arange(-hsize,m-hsize)[:,None])
    cols = np.abs(c + np.arange(-hsize,n-hsize))
    return rows, cols
Sample run -
In [201]: A
Out[201]:
array([[1. , 2. , 2.2, 3.5],
[2.1, 2.4, 3. , 3.3],
[1. , 3. , 3.2, 3. ],
[2. , 4.1, 4. , 2. ]])
In [202]: rows, cols = window_argmax_global2D(A, size=3)
In [203]: rows
Out[203]:
array([[1, 1, 0, 0],
[2, 2, 0, 0],
[3, 3, 3, 3],
[3, 3, 3, 3]])
In [204]: cols
Out[204]:
array([[1, 2, 3, 3],
[1, 2, 3, 3],
[1, 1, 1, 2],
[1, 1, 1, 2]])
Extending to n-dim
We would use np.ogrid for this extension part :
def window_argmax_global(A, size):
    hsize = (size-1)//2 # expects size as odd number
    shp = A.shape
    N = A.ndim
    A1 = np.pad(A, (hsize,hsize), mode='reflect')
    idx = viewW(A1, ([size]*N)).reshape(-1,size**N).argmax(-1).reshape(shp)
    offsets = np.ogrid[tuple(map(slice, shp))]
    out = np.unravel_index(idx, ([size]*N))
    return [np.abs(i+j-hsize) for i,j in zip(out,offsets)]
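If scikit-image is not available, the same idea can be sketched with NumPy alone, assuming NumPy 1.20 or later for sliding_window_view. The function name window_argmax_global_np is made up for this example, and it is only checked here against the 2D sample array from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_argmax_global_np(A, size):
    """Per-cell coordinates of the maximum in a size**N window (size must be odd)."""
    hsize = (size - 1) // 2
    N = A.ndim
    A1 = np.pad(A, hsize, mode='reflect')
    # Result has shape A.shape + (size,)*N: one window per cell of A.
    windows = sliding_window_view(A1, (size,) * N)
    # Flatten each window and take the position of its first maximum.
    idx = windows.reshape(A.shape + (-1,)).argmax(-1)
    local = np.unravel_index(idx, (size,) * N)
    offsets = np.ogrid[tuple(map(slice, A.shape))]
    # Shift window-local indices to global ones; abs() folds the reflected low border back.
    return [np.abs(i + j - hsize) for i, j in zip(local, offsets)]

A = np.array([[1. , 2. , 2.2, 3.5],
              [2.1, 2.4, 3. , 3.3],
              [1. , 3. , 3.2, 3. ],
              [2. , 4.1, 4. , 2. ]])
rows, cols = window_argmax_global_np(A, size=3)
print(rows)
print(cols)
```

Because argmax returns the first maximum in each flattened window, a maximum that also appears in the reflected padding on the high side is found at its real position first, so only the low border needs the abs() correction.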

Need to collate information from array in python, by finding match values that lie in different rows

I'm looking to collate 2 entries into one for many columns in a data array, by checking to see if several values in the two entries are the same.
0 A [[0.0, 0.5, 2.5, 2.5]
1 B [0.5, 1.0, 2.0, 2.0]
2 M [2.5, 2.5, 0.5, 0.0]
3 N [2.0, 2.0, 1.0, 0.5]
4 R [14.3, 13.8, 13.9, 14.2]]
Above shows the format the array takes, with the numbering and annotation of the rows on the left. Each column in the array is one distinct measurement.
Rows 0-3 are the x-locations along a straight line of 2 pairs of electrodes used to make a measurement (pair 1 = A & B, pair 2 = M & N); R is the measured resistivity when the four electrodes above it are used. As can be seen, in the 1st and 4th measurement, pair AB of measurement 1 = pair MN of measurement 4, and vice versa. The same is true of the 2nd and 3rd reading.
What I'm trying to do is to search through the array to find each pair of measurements, then collate that into one entry. This entry would take the first measurement's electrode locations (A,B,M &N), together with the first measurement's R value, but would also contain an extra row with the second measurement's R value. The result from the example above can be seen below.
0 A [[0.0, 0.5]
1 B [0.5, 1.0]
2 M [2.5, 2.5]
3 N [2.0, 2.0]
4 R1 [14.3, 13.8]
5 R2 [13.9, 14.2]]
Some information that may be useful:
The numbers are floats
The first set of measurements (i.e. before there will be any pairs) are in the first half of the dataset. What I mean by that is if there was an array with 100 columns(equalling 100 measurements), the columns 51-100 would be the pairs of the columns 1-50. The columns 51-100 do not follow the same pattern as the columns 1-50 though (i.e. column 1 wouldn't always equal column 51 in that example).
The electrodes always follow the same pattern in the pair of measurements: "A" in measurement 1 will always equal "M" in measurement 2 of the pair, and equally B = N, M = A and N = B.
I've been thinking about how to do it, and I thought that some kind of if statement such as the one below may be a start, but really I'm a complete novice, and this is quite a complex problem to search for an answer to.
if all([A1 == M2, B1 == N2, M1 == A2, N1 == B2]):
Any help would be really appreciated, even if it's just a pointer to wherever would be a good starting point to search for more information.
Thanks in advance!
Edit
Just to clarify, the order of R2 is liable to change for each dataset, and isn't the same as the order of R1. What I'm after doing is querying the A, B, M & N values to find the pairs of readings, then adding the paired R2 reading under its corresponding R1 reading.
Here is an example dataset that is a little larger:
#Input array
Arr1 =
[[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1]
[0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5]
[1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5]
[1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0]
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1, 17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
#Output array - extra R row and half the columns
Arr2 =
[[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5]
[0, 0, 0.5, 0.5, 1, 1, 0.5]
[1, 3.5, 2.5, 5, 2, 4.5, 4.5]
[1.5, 4, 3, 5.5, 2.5, 5, 5.5]
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1]
[14.4, 13.4, 25.1, 17.1, 19.8, 15.3, 16.1]]
Here's a way to find the index of each R2 value that you're after and build the final array to your specifications:
#Input array
Arr1 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1],
[0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5],
[1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5],
[1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0],
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1,
17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
#Output array - extra R row and half the columns
Arr2 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5],
[0, 0, 0.5, 0.5, 1, 1, 0.5],
[1, 3.5, 2.5, 5, 2, 4.5, 4.5],
[1.5, 4, 3, 5.5, 2.5, 5, 5.5],
[14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1],
[14.4, 13.4, 25.1, 17.1, 19.8, 15.3, 16.1]]
# get the first half of each list in Arr1
half_1 = [i[:len(i)//2] for i in Arr1[:-1]]
# 'flip' the arrays so that there's a list for each element 0, 1, ...
half_1_flip = [[i[j] for i in half_1] for j in range(len(half_1[0]))]
# get the second half of each list in Arr1
half_2 = [i[len(i)//2:] for i in Arr1[:-1]]
# 'rotate' the arrays so that A / B and M / N switch places
half_2_rotate = half_2[len(half_2)//2:] + half_2[:len(half_2)//2]
# 'flip' the arrays so that there's a list for each element 0, 1, ...
half_2_flip = [[i[j] for i in half_2_rotate]
               for j in range(len(half_2_rotate[0]))]
# find each matching index of the first flipped list in the second list
seek_indices = [half_2_flip.index(a) for a in half_1_flip]
# pull out original R1 and R2
r1 = Arr1[-1][:len(Arr1[-1])//2]
r2 = Arr1[-1][len(Arr1[-1])//2:]
# reorder R2 based on indices
ordered_r2 = [r2[i] for i in seek_indices]
# get final transform
transform = half_1 + [r1] + [ordered_r2]
assert transform == Arr2
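The same matching can be sketched with NumPy arrays and a dictionary lookup. The function name collate_pairs is hypothetical, and the sketch assumes what the question states: the pairs sit in the second half of the columns, and the electrode positions match exactly (no rounding noise). A column without a partner raises a KeyError.

```python
import numpy as np

def collate_pairs(arr):
    """Append, under each first-half column, the R value of its swapped partner."""
    arr = np.asarray(arr, dtype=float)
    half = arr.shape[1] // 2
    first, second = arr[:, :half], arr[:, half:]
    # Key each second-half column by (M, N, A, B) so it lines up with (A, B, M, N).
    lookup = {tuple(col[[2, 3, 0, 1]]): col[4] for col in second.T}
    r2 = [lookup[tuple(col[:4])] for col in first.T]
    return np.vstack([first, r2])

Arr1 = [[0.5, 0.5, 1, 1, 1.5, 1.5, 1.5, 5, 4.5, 4.5, 3.5, 2.5, 2, 1],
        [0, 0, 0.5, 0.5, 1, 1, 0.5, 5.5, 5, 5.5, 4, 3, 2.5, 1.5],
        [1, 3.5, 2.5, 5, 2, 4.5, 4.5, 1, 1.5, 1.5, 0.5, 1, 1.5, 0.5],
        [1.5, 4, 3, 5.5, 2.5, 5, 5.5, 0.5, 1, 0.5, 0, 0.5, 1, 0],
        [14.3, 13.3, 25.1, 17.2, 19.9, 15.4, 16.1,
         17.1, 15.3, 16.1, 13.4, 25.1, 19.8, 14.4]]
result = collate_pairs(Arr1)
print(result[-1])  # the reordered R2 row
```

The dictionary makes each lookup constant-time, whereas list.index scans the second half once per column.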
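Another approach to the problem might be slicing the data using the following function. Note that it does no matching: it hard-codes the four-column example, keeps the first two columns and appends the R values of the remaining columns in their stored order, so it only fits inputs whose second half is already arranged as wanted:
import numpy as np
def transform(arr):
    arr1 = arr[:,0:2]
    arr1 = np.append(arr1,[arr[-1,2:]],axis=0)
    return arr1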
using given data:
arr = np.array([[0.0, 0.5, 2.5, 2.5],
[0.5, 1.0, 2.0, 2.0],
[2.5, 2.5, 0.5, 0.0],
[2.0, 2.0, 1.0, 0.5],
[14.3, 13.8, 13.9, 14.2]])
transform(arr) returns:
array([[ 0. , 0.5],
[ 0.5, 1. ],
[ 2.5, 2.5],
[ 2. , 2. ],
[ 14.3, 13.8],
[ 13.9, 14.2]])

numpy: how interpolate between two arrays for various timesteps?

I'm looking for a way to do a simple linear interpolation between two numpy arrays that represent a start and endpoint in time.
The two arrays have the same length:
fst = np.random.randint(1, 6, size=10)
>>> array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.random.randint(1, 6, size=10)
>>> array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
Between my start and endpoint there are 3 timesteps. How can I interpolate between fst and snd? I want to be able, taking the first entry of fst and snd as an example, to retrieve the value of each timestep like
np.interp(1, [1,5], [4,1])
np.interp(2, [1,5], [4,1])
...
# that is
np.interp([1,2,3,4,5], [1,5], [4,1])
>>> array([ 4. , 3.25, 2.5 , 1.75, 1. ])
But then not just for the first entry, but over the whole array.
Obviously, this won't do it:
np.interp(1, [1,5], [fst,snd])
Well I know I get there in a loop, e.g.
[np.interp(2, [1,5], [item,snd[idx]]) for idx,item in enumerate(fst)]
>>> [3.25, 3.25, 1.5, 3.25, 1.0, 4.25, 3.5, 2.75, 4.75, 2.25]
but I believe that when you are looping over numpy arrays you are doing something fundamentally wrong.
The facilities in scipy.interpolate.interp1d allow this to be done quite easily if you form your samples into a 2D matrix. In your case, you can construct a 2xN array, and construct an interpolation function that operates down the columns:
import numpy as np
from scipy.interpolate import interp1d
fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])
linfit = interp1d([1,5], np.vstack([fst, snd]), axis=0)
You can then generate an interpolated vector at any time of interest. For example linfit(2) produces:
array([ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25])
or you can invoke linfit() with a vector of time values, e.g. linfit([1,2,3]) gives:
array([[ 4. , 4. , 1. , 3. , 1. , 4. , 3. , 2. , 5. , 2. ],
[ 3.25, 3.25, 1.5 , 3.25, 1. , 4.25, 3.5 , 2.75, 4.75, 2.25],
[ 2.5 , 2.5 , 2. , 3.5 , 1. , 4.5 , 4. , 3.5 , 4.5 , 2.5 ]])
If you're only doing linear interpolation, you could also just do something like:
((5-t)/(5-1)) * fst + ((t-1)/(5-1)) * snd
to directly compute the interpolated vector at any time t.
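The same formula can be evaluated for all timesteps at once through broadcasting. This is only a sketch; the names t and w are introduced here for illustration, and the endpoints 1 and 5 are taken from the question.

```python
import numpy as np

fst = np.array([4, 4, 1, 3, 1, 4, 3, 2, 5, 2])
snd = np.array([1, 1, 3, 4, 1, 5, 5, 5, 4, 3])

t = np.array([1, 2, 3, 4, 5])          # start, three intermediate steps, end
w = ((t - 1) / (5 - 1))[:, None]       # weight of snd, as a column for broadcasting

# Each row is the interpolated vector at one timestep: shape (len(t), len(fst)).
interp = (1 - w) * fst + w * snd
print(interp[1])  # same as linfit(2)
```

The first and last rows reproduce fst and snd, and the rows in between are the three intermediate timesteps.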
