I am trying to define a function that finds the minimum value of an array and slices it around that value (plus or minus 5 positions). My array looks something like this:
[[ 0. 9.57705087]
[ 0.0433 9.58249315]
[ 0.0866 9.59745942]
[ 0.1299 9.62194967]
[ 0.1732 9.65324278]
[ 0.2165 9.68725702]
[ 0.2598 9.72263184]
[ 0.3031 9.75256437]
[ 0.3464 9.77025178]
[ 0.3897 9.76889121]
[ 0.433 9.74167982]
[ 0.4763 9.68589645]
[ 0.5196 9.59881999]
[ 0.5629 9.48861383]
[ 0.6062 9.3593597 ]]
However, I am dealing with much larger sets and need a function that can do it automatically, without me having to manually find the minimum and then slice the array around it. I want to find the minimum of the array[:,1] values and then apply the slicing to the whole array.
Use np.argmin() to get the index of the minimum value. This will do it using the second column only (you haven't specified if it's the minimum value across columns or not).
your_array[:np.argmin(your_array[:, 1]), :]
To slice it 5 values further than the minimum, use:
your_array[:np.argmin(your_array[:, 1]) + 5, :]
Given your objective array:
import numpy as np
anarray = np.array([[ 0., 9.57705087],
[ 0.0433, 9.58249315],
[ 0.0866, 9.59745942],
[ 0.1299, 9.62194967],
[ 0.1732, 9.65324278],
[ 0.2165, 9.68725702],
[ 0.2598, 9.72263184],
[ 0.3031, 9.75256437],
[ 0.3464, 9.77025178],
[ 0.3897, 9.76889121],
[ 0.433, 9.74167982],
[ 0.4763, 9.68589645],
[ 0.5196, 9.59881999],
[ 0.5629, 9.48861383],
[ 0.6062, 9.3593597]])
This function will do the job:
def slice_by_five(array):
    argmin = np.argmin(array[:,1])
    if argmin < 5:
        return array[:argmin+6,:]
    return array[argmin-5:argmin+6,:]
check = slice_by_five(anarray)
print(check)
Output:
[[0.3897 9.76889121]
[0.433 9.74167982]
[0.4763 9.68589645]
[0.5196 9.59881999]
[0.5629 9.48861383]
[0.6062 9.3593597 ]]
The function can certainly be generalized to account for any neighborhood of size n:
def slice_by_n(array, n):
    argmin = np.argmin(array[:,1])
    if argmin < n:
        return array[:argmin+n+1,:]
    return array[argmin-n:argmin+n+1,:]
check = slice_by_n(anarray, 2)
print(check)
Output:
[[0.5196 9.59881999]
[0.5629 9.48861383]
[0.6062 9.3593597 ]]
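A possible variant, sketched here under the assumption that the window should simply be truncated at either edge of the array: clamp the start index with max so it never goes negative. The name slice_around_min, the column parameter, and the demo array are illustrative and not part of the answer above.

```python
import numpy as np

def slice_around_min(array, n, column=1):
    """Return the rows within n positions of the minimum of `column`."""
    idx = int(np.argmin(array[:, column]))
    start = max(idx - n, 0)   # a negative start would wrap around to the end
    stop = idx + n + 1        # a stop past the end is clipped by NumPy slicing
    return array[start:stop, :]

# Small made-up array; the minimum of column 1 is at row 1.
demo = np.array([[0.0, 3.0], [0.1, 1.0], [0.2, 2.0], [0.3, 5.0], [0.4, 4.0]])
near = slice_around_min(demo, 1)   # rows 0..2
wide = slice_around_min(demo, 3)   # window clipped at both ends: all 5 rows
print(near)
print(wide)
```

Because only the start needs clamping, the separate if branch becomes unnecessary in this version.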
Let's say that I have this numpy array:
import numpy as np
np.random.seed(0)
data = np.random.normal(size=(5,5))
which results in a 5x5 array of random values.
I would like to select all pairs of elements separated by a specific index distance along each row.
For example, if I choose an index distance of 4 along each row, I expect to have:
res[0,0]=1.76,res[0,1]=2.24
res[1,0]=0.40,res[1,1]=1.86
res[2,0]=-0.97,res[2,1]=-0.10
res[3,0]=0.95,res[3,1]=0.41
...
....
I know that I could do that with a for loop, but I would like to have something smarter. I was thinking of creating two lists of indexes and then filling res, but for this I also need a loop.
Best
hstack
I guess that something along the lines of
win=3 # Size of window. You say 4, but what you describe is 3 in my view. But you know how to add 1 if needed :D
np.hstack((data[:, :data.shape[1]-win].reshape(-1,1), data[:, win:].reshape(-1,1)))
should do
Result is
array([[ 1.76405235, 2.2408932 ],
[ 0.40015721, 1.86755799],
[-0.97727788, -0.10321885],
[ 0.95008842, 0.4105985 ],
[ 0.14404357, 0.12167502],
[ 1.45427351, 0.44386323],
[ 0.33367433, 0.3130677 ],
[ 1.49407907, -0.85409574],
[-2.55298982, -0.74216502],
[ 0.6536186 , 2.26975462]])
Explanation:
data[:,:data.shape[1]-win] is
array([[ 1.76405235, 0.40015721],
[-0.97727788, 0.95008842],
[ 0.14404357, 1.45427351],
[ 0.33367433, 1.49407907],
[-2.55298982, 0.6536186 ]])
So, just the first columns of data. The number of columns kept, data.shape[1]-win, is the number of pairs that fit in a row given data's width and the window size win.
Likewise, data[:, win:] is
array([[ 2.2408932 , 1.86755799],
[-0.10321885, 0.4105985 ],
[ 0.12167502, 0.44386323],
[ 0.3130677 , -0.85409574],
[-0.74216502, 2.26975462]])
These are the last columns this time (the same number of columns), each offset by win indexes from the corresponding first column.
.reshape(-1,1) flattens those data vertically, if I may use this "flatten vertically" description. For example, data[:,:data.shape[1]-win].reshape(-1,1) holds the same values, but as 10 rows of 1 column instead of 5 rows of 2 columns:
array([[ 1.76405235],
[ 0.40015721],
[-0.97727788],
[ 0.95008842],
[ 0.14404357],
[ 1.45427351],
[ 0.33367433],
[ 1.49407907],
[-2.55298982],
[ 0.6536186 ]])
hstack puts those two together.
Indexation
Another method, maybe closer to the index-list approach you were apparently about to build, would be
W=data.shape[1]-win # number of pair per row
iy=np.arange(len(data)*2*W)//W//2
ix=np.array([[i,i+win] for i in range(W)]*len(data)).flatten()
data[iy,ix].reshape(-1,2)
That takes about twice as long in terms of CPU time. But it is worth noting that most of the CPU time is spent creating the indexes ix and iy. So if you have many data sets of the same shape, this option could be faster, since you compute ix and iy once and for all.
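To illustrate the reuse idea, here is a minimal sketch that builds the index arrays once and applies them to several arrays of the same shape. The helper name pair_indices and the use of np.repeat/np.tile are my own choices for the illustration, not the answer's code; the loop checks the result against the hstack approach.

```python
import numpy as np

def pair_indices(shape, win):
    """Build row/column index arrays for all pairs `win` columns apart."""
    n_rows, n_cols = shape
    w = n_cols - win                                # number of pairs per row
    cols = np.arange(w)
    ix = np.stack((cols, cols + win), axis=1)       # shape (w, 2): [i, i + win]
    ix = np.tile(ix, (n_rows, 1))                   # repeated for every row
    iy = np.repeat(np.arange(n_rows), w)[:, None]   # row index of each pair
    return iy, ix

win = 3
iy, ix = pair_indices((5, 5), win)   # computed once

rng = np.random.default_rng(0)
for _ in range(3):                   # reused for several same-shaped arrays
    data = rng.normal(size=(5, 5))
    pairs = data[iy, ix]             # fancy indexing, result has shape (10, 2)
    expected = np.hstack((data[:, :-win].reshape(-1, 1),
                          data[:, win:].reshape(-1, 1)))
    assert np.array_equal(pairs, expected)
```

Here iy has shape (10, 1) and broadcasts against ix of shape (10, 2), so no final reshape is needed.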
You can take elements by pairs of indices with numpy.take:
np.take(data, [[0, 3], [1, 4]], axis=1).reshape(data.shape[0] * 2, 2)
array([[ 1.76405235, 2.2408932 ],
[ 0.40015721, 1.86755799],
[-0.97727788, -0.10321885],
[ 0.95008842, 0.4105985 ],
[ 0.14404357, 0.12167502],
[ 1.45427351, 0.44386323],
[ 0.33367433, 0.3130677 ],
[ 1.49407907, -0.85409574],
[-2.55298982, -0.74216502],
[ 0.6536186 , 2.26975462]])
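The index pairs [[0, 3], [1, 4]] are hard-coded above. As a hedged sketch, they could be derived from the distance instead; the variable names win, first and idx are assumptions made for this example.

```python
import numpy as np

np.random.seed(0)
data = np.random.normal(size=(5, 5))

win = 3                                        # index distance between the two elements
first = np.arange(data.shape[1] - win)         # [0, 1]
idx = np.stack((first, first + win), axis=1)   # [[0, 3], [1, 4]]

# np.take with a 2-D index along axis=1 yields shape (5, 2, 2): row, pair, element
res = np.take(data, idx, axis=1).reshape(-1, 2)
print(res.shape)  # (10, 2)
```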
I am new to Python. I would like to create a new array, that contains all values from an existing array with the step.
I tried to implement it but I think there is another way to have better performance. Any try or recommendation is highly appreciated.
Ex: Currently, I have:
An array: 115,200 values (2D)
Step: 10,000
....
array([[ 0.2735, -0.308 ],
[ 0.287 , -0.3235],
[ 0.2925, -0.324 ],
[ 0.312 , -0.329 ],
[ 0.3275, -0.345 ],
[ 0.3305, -0.352 ],
[ 0.332 , -0.3465],
...
[ 0.3535, -0.353 ],
[ 0.361 , -0.3445],
[ 0.3545, -0.329 ]])
Expectation: a new array containing the array above sliced in steps of 10,000.
Below is my code:
for x in ecg_data:
    number_samples_by_duration_exp_temp = 10000
    # len(ecg_property.sample) = 115200
    times = len(ecg_property.sample) / number_samples_by_duration_exp_temp
    index_by_time = [int(y)*number_samples_by_duration_exp_temp for y in np.arange(1, times, 1)]
    list = []
    temp = 0
    for z in index_by_time:
        arr_samples_by_duration = ecg_property.sample[temp:z]
        list.append(arr_samples_by_duration)
        temp = z
A single NumPy array cannot hold the result, because len(ecg_property.sample) (115,200) is not evenly divisible by number_samples_by_duration_exp_temp (10,000) and NumPy arrays cannot have rows of varying length :)
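You can try a list comprehension:
result_list = [ecg_property.sample[start:start + step] for start in range(0, len(ecg_property.sample), step)]
where
step = 10000. Using range keeps the slice bounds as integers (in Python 3, len(ecg_property.sample) / step is a float, and float values are not valid slice indices), and the final chunk is simply shorter than the others.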
It can be further modified if needed and as per requirement.
(You can try out each step in above line of code in this answer and see the output to understand each step )
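For experimenting with the idea, here is a small self-contained sketch. The samples array is a made-up stand-in for ecg_property.sample, and step is shrunk to 10 so the output is easy to read; np.split with explicit cut points is shown as an alternative that also returns a list of unequal pieces rather than one array.

```python
import numpy as np

samples = np.arange(25 * 2).reshape(25, 2)  # stand-in for ecg_property.sample
step = 10

# Plain slicing: the last chunk is shorter when the length is not a multiple of step.
chunks = [samples[start:start + step] for start in range(0, len(samples), step)]
print([len(c) for c in chunks])  # [10, 10, 5]

# np.split with explicit cut points [10, 20] gives the same pieces as a list.
pieces = np.split(samples, np.arange(step, len(samples), step))
print([len(p) for p in pieces])  # [10, 10, 5]
```

Both variants return views into samples for a contiguous array, so no sample data is copied.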
Hope this works out.
ty!
Assume I have a multidimensional Numpy Array. Now I want to:
Slice out a certain row range defined by startIndex and endIndex.
Get an array with the original array minus the slice (so the leftover).
The code below does the trick, but is it the most performant approach?
Because my array is very big, can I slice out of the original array in a memory-neutral way, so that afterwards the original array is the leftover? In other words, apart from some overhead for the header of the new array, can this cost no additional memory?
Is my snippet below (which creates new arrays) the most efficient solution if we retain the original array?
Example:
import numpy as np
X = np.random.random((6, 2))
print('Orig',X)
startIndex = 2
endIndex = 4
print('Slice ',X[startIndex:endIndex])
print('LeftOver ',np.concatenate((X[:startIndex],X[endIndex:])))
Output:
Orig [[ 0.94661646 0.3911347 ]
[ 0.6807441 0.676658 ]
[ 0.81109554 0.18089991]
[ 0.6161699 0.19907537]
[ 0.12859196 0.34866049]
[ 0.22283545 0.04949782]]
Slice [[ 0.81109554 0.18089991]
[ 0.6161699 0.19907537]]
LeftOver [[ 0.94661646 0.3911347 ]
[ 0.6807441 0.676658 ]
[ 0.12859196 0.34866049]
[ 0.22283545 0.04949782]]
Concatenate makes a copy, and you need it if order matters.
But if your slices are slim and order doesn't matter, a more economical way can be:
import numpy as np
size=6
X = np.random.random((size, 2))
print('Orig\n',X)
startIndex = 3
endIndex = 5
Slice=X[startIndex:endIndex].copy()
length = min(endIndex-startIndex,size-endIndex) # to check overlap
X[startIndex:startIndex+length]=X[-length:]
Left=X[:size-len(Slice)]
print('Slice\n',Slice)
print('LeftOver\n',Left)
This is cheaper because at most 2x the size of the slice is copied, not the whole array.
It gives:
Orig
[[ 0.39351322 0.42100711]
[ 0.14793363 0.12149344]
[ 0.94524844 0.22004186]
[ 0.816418 0.35630767]
[ 0.37781821 0.12336287]
[ 0.65995888 0.23812275]]
Slice
[[ 0.816418 0.35630767]
[ 0.37781821 0.12336287]]
LeftOver
[[ 0.39351322 0.42100711]
[ 0.14793363 0.12149344]
[ 0.94524844 0.22004186]
[ 0.65995888 0.23812275]]
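To make the view-versus-copy distinction concrete, here is a short sketch using np.shares_memory. The array contents and the names start, end, view, left and left2 are placeholders for this example; np.delete is shown only as another spelling of the copying approach, not as a memory-neutral one.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(6, 2)
start, end = 2, 4

view = X[start:end]                           # basic slice: a view, no data copied
left = np.concatenate((X[:start], X[end:]))   # new array: the leftover rows are copied

print(np.shares_memory(X, view))  # True
print(np.shares_memory(X, left))  # False

# np.delete produces the same leftover and also returns a new array.
left2 = np.delete(X, np.s_[start:end], axis=0)
```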
I am trying to find mirror images in a numpy array. In particular, (x,y) == (y,x), but I want to rule out tuples with identical values (x,x).
Given a numpy array pckLst of shape (198, 3) containing floats.
I have the following code:
np.sum([x==pckLst[:,2] for x in pckLst[:,1]])
Which returns a given number, let's say 73
np.sum([x==pckLst[:,2] for x in pckLst[:,1]] and [x==pckLst[:,1] for x in pckLst[:,1]])
Returns a larger number, let's say 266.
Can someone please explain how this comes about?
I thought the first line returns True, when seen as tuples (x,y) == (any,y) and the second line returns only true when (x,y) == (y,x).
Is this correct?
EDIT:
Further explanation:
pckLst=[[ 112.066, 6.946, 6.938],
[ 111.979, 6.882, 7.634],
[ 112.014, 6.879, 7.587],
[ 112.005, 6.887, 7.554],
[ 111.995, 6.88, 6.88 ],
[ 112.048, 6.774, 6.88 ],
[ 111.808, 7.791, 7.566],
[ 111.802, 6.88, 6.774]]
Now I would like to find [ 112.048, 6.774, 6.88 ], since (6.88, 6.774) == (6.774, 6.88). However, [ 111.995, 6.88, 6.88 ] should not be considered a match.
Rather than commenting on your code, here is a simpler implementation:
a=np.array([[1,1,10],[1,2,20],[2,1,30],[1,3,40],[2,3,50]])
xy= a[:,:2].tolist()
[[x,y,z] for [x,y,z] in a if [y,x] in xy and x!=y]
[[1, 2, 20], [2, 1, 30]]
The arguments to "and" in your example are Python lists. The truth value of a list is True if it is not empty. That's why you get a bigger sum in the latter case.
This will return the number of elements with (x,y) == (y,x). It obviously only works if you're just interested in the count and not in the particular indices:
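A tiny sketch of that behaviour, with made-up lists and arrays (a, b, x, y are placeholder names): "and" between two non-empty lists just returns the second list, whereas an element-wise test needs arrays and the & operator.

```python
import numpy as np

a = [False, False]
b = [True, True, True]

print(a and b)          # [True, True, True]  (a is non-empty, so `and` yields b)
print(np.sum(a and b))  # 3

# Element-wise "and" needs arrays of equal shape and the & operator.
x = np.array([True, False, True])
y = np.array([True, True, False])
both = x & y
print(np.sum(both))     # 1
```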
import numpy
pckLst = numpy.array([[ 112.066, 6.946, 6.938],
[ 111.979, 6.882, 7.634],
[ 112.014, 6.879, 7.587],
[ 112.005, 6.887, 7.554],
[ 111.995, 6.88, 6.88 ],
[ 112.048, 6.774, 6.88 ],
[ 111.808, 7.791, 7.566],
[ 111.802, 6.88, 6.774]])
coords = pckLst[:,1:]
equal_ids = numpy.ravel(coords[:,:1] != coords[:,1:])
unequal_coords = coords[equal_ids]
flipped = numpy.fliplr(unequal_coords)
coords_tuple_set = set(tuple(map(tuple, unequal_coords)))
flipped_tuple_set = set(tuple(map(tuple, flipped)))
print(coords_tuple_set)
print(flipped_tuple_set)
# need to divide by two, because the intersection contains both (x,y) and (y,x)
print("number of mirrored points:",
      len(coords_tuple_set.intersection(flipped_tuple_set)) // 2)
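If the row indices are needed rather than just a count, one option is a broadcast comparison. This is a sketch of my own, not the answer's method, and it assumes exact float equality is acceptable (for measured data, np.isclose might be more appropriate); the names mirror and rows are illustrative.

```python
import numpy as np

pckLst = np.array([[112.066, 6.946, 6.938],
                   [111.979, 6.882, 7.634],
                   [112.014, 6.879, 7.587],
                   [112.005, 6.887, 7.554],
                   [111.995, 6.88, 6.88],
                   [112.048, 6.774, 6.88],
                   [111.808, 7.791, 7.566],
                   [111.802, 6.88, 6.774]])
x, y = pckLst[:, 1], pckLst[:, 2]

# mirror[i, j] is True when row i holds (x, y) and row j holds (y, x)
mirror = (x[:, None] == y[None, :]) & (y[:, None] == x[None, :])
mirror &= (x != y)[:, None]          # rule out rows with identical values

rows = np.flatnonzero(mirror.any(axis=1))
print(rows)           # [5 7]
print(pckLst[rows])
```

The comparison matrix has N x N entries, so this is only practical for moderately sized arrays such as the 198 rows mentioned in the question.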
I have an array called phases, let's say it looks like this:
phases = numpy.random.uniform(0,1,10)
I now want to populate a matrix where every row is some function f applied to a successive index of phases, and every column is a multiple of it, looking something like this:
[[ f(phases[0]) f(2*phases[0]) f(3*phases[0]) ]
[ f(phases[1]) f(2*phases[1]) f(3*phases[1]) ]
... ... ...
[ f(phases[9]) f(2*phases[9]) f(3*phases[9]) ]]
We can say f is something simple for the sake of example, like f(x) = x+1.
So I figured I would just use numpy.fromfunction as follows:
numpy.fromfunction(lambda i,j: (j+1)*phases[i]+1,
(phases.size, 3), dtype=float)
but this gives me an error:
IndexError: arrays used as indices must be of integer (or boolean) type
How can I access the ith element of phases within fromfunction?
Or is this the wrong approach to take?
numpy.fromfunction does not work the way you might expect, and its documentation is misleading on this point.
The function is not called for each cell, but once with all indices.
def fromfunction(function, shape, **kwargs):
    dtype = kwargs.pop('dtype', float)
    args = indices(shape, dtype=dtype)
    return function(*args,**kwargs)
So now, to get your result, you can do the following :
In [57]: vf = numpy.vectorize(f)
In [58]: vf(numpy.outer(phases, numpy.arange(1,4)))
Out[58]:
array([[ 1.87176928, 2.74353857, 3.61530785],
[ 1.23090955, 1.4618191 , 1.69272866],
[ 1.29294723, 1.58589445, 1.87884168],
[ 1.05863891, 1.11727783, 1.17591674],
[ 1.28370397, 1.56740794, 1.85111191],
[ 1.87210286, 2.74420573, 3.61630859],
[ 1.08652975, 1.1730595 , 1.25958925],
[ 1.33835545, 1.6767109 , 2.01506634],
[ 1.74479635, 2.48959269, 3.23438904],
[ 1.76381301, 2.52762602, 3.29143903]])
outer performs the outer product of two vectors, which is exactly what you want apart from applying the function.
Your function must be able to handle arrays. For non-trivial operations, you will have to vectorize the function so that it is applied cell by cell. In your example (f(x) = x+1), that is not necessary.
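Since fromfunction passes whole index arrays to the callable, the original attempt can also be made to work by asking for integer indices. The sketch below assumes f(x) = x + 1 as in the question and compares the result with a broadcasting version; the variable names via_fromfunction and via_broadcast are made up for the example.

```python
import numpy as np

phases = np.random.uniform(0, 1, 10)

def f(x):
    return x + 1

# fromfunction calls the lambda once with index arrays i and j of shape (10, 3);
# dtype=int makes them valid indices for phases[i].
via_fromfunction = np.fromfunction(
    lambda i, j: f((j + 1) * phases[i]), (phases.size, 3), dtype=int)

# Broadcasting builds the same table of multiples without index arrays.
via_broadcast = f(phases[:, None] * np.arange(1, 4))

print(np.allclose(via_fromfunction, via_broadcast))  # True
```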
I think the easiest approach that follows NumPy idioms (and therefore vectorizes well) is to make the matrix you want first, and then apply your function f to it.
>>> phases = numpy.random.uniform(0,1,10)
>>> phases = phases.reshape((10, 1))
>>> phases = phases * numpy.arange(1, 4)
This gives you a matrix (actually an ndarray) of the form
[[ phases[0] 2*phases[0] 3*phases[0] ]
[ phases[1] 2*phases[1] 3*phases[1] ]
... ... ...
[ phases[9] 2*phases[9] 3*phases[9] ]]
which you can then apply your function to.
>>> def f(x):
... return numpy.sin(x)
>>> f(phases)
array([[ 0.56551297, 0.93280166, 0.97312359],
[ 0.38704365, 0.71375602, 0.92921009],
[ 0.62778184, 0.97731738, 0.89368501],
[ 0.0806512 , 0.16077695, 0.23985519],
[ 0.4140241 , 0.75374405, 0.95819095],
[ 0.25929821, 0.50085902, 0.70815838],
[ 0.25399811, 0.49133634, 0.69644753],
[ 0.7754078 , 0.97927926, 0.46134512],
[ 0.53301912, 0.90197836, 0.99331443],
[ 0.44019133, 0.79049912, 0.9793933 ]])
This only works if your function, f, is "vectorized", which is to say that it accepts an ndarray and operates element-wise on that array. If that's not the case, then you can use numpy.vectorize to get a version of that function that does so.
>>> import math
>>> def f(x):
... return math.sin(x)
>>> f(phases)
TypeError: only length-1 arrays can be converted to Python scalars
>>> f = numpy.vectorize(f)
>>> f(phases)
array([[ 0.56551297, 0.93280166, 0.97312359],
[ 0.38704365, 0.71375602, 0.92921009],
[ 0.62778184, 0.97731738, 0.89368501],
[ 0.0806512 , 0.16077695, 0.23985519],
[ 0.4140241 , 0.75374405, 0.95819095],
[ 0.25929821, 0.50085902, 0.70815838],
[ 0.25399811, 0.49133634, 0.69644753],
[ 0.7754078 , 0.97927926, 0.46134512],
[ 0.53301912, 0.90197836, 0.99331443],
[ 0.44019133, 0.79049912, 0.9793933 ]])
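As a small check of that last point: numpy.vectorize is a convenience wrapper that calls the Python function once per element, so when a native ufunc such as numpy.sin exists it gives the same values and is usually the better choice. The name slow_sin and the random input below are assumptions for this sketch.

```python
import math
import numpy as np

phases = np.random.uniform(0, 1, (10, 3))

slow_sin = np.vectorize(math.sin)   # calls math.sin once per element
wrapped = slow_sin(phases)
native = np.sin(phases)             # native ufunc, operates on the whole array

print(wrapped.shape)                # (10, 3)
print(np.allclose(wrapped, native)) # True
```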