I am looking for a quick way to find the start and end indexes of each "block" of consecutive trues in a Vector.
Either Julia or Python would do the job for me. I'll write my example in Julia syntax:
Say I have a vector
a = [false, true, true, true, false, true, false, true, true, false]
what I want to get is something like this (with 1-based indexing):
[[2, 4], [6, 6], [8, 9]]
The exact form/type of the returned value does not matter, I am mostly looking for a quick and syntactically easy solution. Single trues surrounded by falses should also be detected, as given in my example.
My use case is that I want to find intervals in a Vector of data where the values are below a certain threshold. So I get a boolean array from my data where this is true. Ultimately I want to shade these intervals in a plot, for which I need the start and end indices of each interval.
function intervals(a)
    jumps = diff([false; a; false])
    zip(findall(jumps .== 1), findall(jumps .== -1) .- 1)
end
Quick in terms of keystrokes, maybe not in performance or readability :)
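Since the question says Python would also do, here is a rough NumPy sketch of the same pad-and-diff idea. The function name true_runs is made up for illustration, and the result is 1-based and inclusive only to match the example in the question:

```python
import numpy as np

def true_runs(mask):
    """Return (start, end) pairs, 1-based and inclusive, for each run of True."""
    # Pad with False on both sides so runs touching either edge are detected.
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    # Cast to a signed integer type so the difference can be +1 or -1.
    jumps = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(jumps == 1) + 1   # 1-based index of the first True of a run
    ends = np.flatnonzero(jumps == -1)        # 1-based index of the last True of a run
    return list(zip(starts.tolist(), ends.tolist()))

a = [False, True, True, True, False, True, False, True, True, False]
print(true_runs(a))  # [(2, 4), (6, 6), (8, 9)]
```

For 0-based, half-open intervals (handy for slicing) you would drop the + 1 on starts and keep ends as the exclusive end.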
My use-case with this is that I want to find intervals in a Vector of data where the values are below a certain threshold.
Let's say your vector is v and your threshold is 7:
julia> println(v); threshold
[9, 6, 1, 9, 5, 9, 4, 5, 6, 1]
7
You can use findall to get the indices where the value is below the threshold, and get the boundaries from that:
julia> let start = 1, f = findall(<(threshold), v), intervals = Tuple{Int, Int}[]
           for i in Iterators.drop(eachindex(f), 1)
               if f[i] - f[i - 1] > 1
                   push!(intervals, (f[start], f[i - 1]))
                   start = i
               end
           end
           push!(intervals, (f[start], last(f)))
       end
3-element Vector{Tuple{Int64, Int64}}:
(2, 3)
(5, 5)
(7, 10)
Here's a version that avoids running findall first, and is a bit faster as a consequence:
function intervals(v)
    ints = UnitRange{Int}[]
    i = firstindex(v)
    while i <= lastindex(v)
        j = findnext(v, i) # find next true
        isnothing(j) && break
        k = findnext(!, v, j+1) # find next false
        isnothing(k) && (k = lastindex(v)+1)
        push!(ints, j:k-1)
        i = k+1
    end
    return ints
end
It also returns a vector of UnitRanges, since that seemed a bit more natural to me.
Try this:
a = [False, True, True, True, False, True, False, True, True, False]
index = 0
foundTrue = False
booleanList = []
sublist = []
for i in a:
    index += 1
    if foundTrue:
        if i == False:
            foundTrue = False
            sublist.append(index-1)
            booleanList.append(sublist)
            sublist = []
    else:
        if i == True:
            foundTrue = True
            sublist.append(index)
if foundTrue:  # close a run that reaches the end of the list
    sublist.append(index)
    booleanList.append(sublist)
print(booleanList)
The output should be: [[2, 4], [6, 6], [8, 9]]
This iterates over the list a. When it finds a True, it sets a flag (foundTrue) and stores the index in sublist. With the flag set, when it finds a False, it stores the index just before that False in sublist, appends sublist to booleanList and resets sublist. A run that is still open when the loop finishes is closed with the last index.
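The same scan can probably be written more compactly with itertools.groupby, which groups consecutive equal values. This is a different technique from the flag-based loop above, not a rewrite of it, and the name true_runs_groupby is only illustrative:

```python
from itertools import groupby

def true_runs_groupby(values):
    """Return [start, end] pairs (1-based, inclusive) for each run of truthy values."""
    runs = []
    position = 1  # 1-based index of the first element of the current group
    for is_true, group in groupby(values, key=bool):
        length = sum(1 for _ in group)  # consume the group to count its elements
        if is_true:
            runs.append([position, position + length - 1])
        position += length
    return runs

a = [False, True, True, True, False, True, False, True, True, False]
print(true_runs_groupby(a))  # [[2, 4], [6, 6], [8, 9]]
```

Because every group is closed by groupby itself, a run of True at the very end of the list needs no special handling.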
This is not the shortest but very fast without using any find functions.
function find_intervals(v)
    i = 0
    res = Tuple{Int64, Int64}[]
    while (i+=1) <= length(v)
        v[i] || continue
        s = f = i
        while i < length(v) && v[i+=1]
            f = i
        end
        push!(res, (s,f))
    end
    res
end
For a = [false, true, true, true, false, true, false, true, true, false], it gives:
find_intervals(a)
3-element Vector{Tuple{Int64, Int64}}:
(2, 4)
(6, 6)
(8, 9)
I'm looking for a vectorized function that returns a mask with values of True if the value in the array has been seen before and False otherwise.
I'm looking for the fastest solution possible as speed is very important.
For example this is what I would like to see:
array = [1, 2, 1, 2, 3]
mask = [False, False, True, True, False]
So is_duplicate = array[mask] should return [1, 2].
Is there a fast, vectorized way to do this? Thanks!
Approach #1 : With sorting
import numpy as np

def mask_firstocc(a):
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    out = np.r_[False,b[:-1] == b[1:]][sidx.argsort()]
    return out
We can use array-assignment to boost perf. further -
def mask_firstocc_v2(a):
    sidx = a.argsort(kind='stable')
    b = a[sidx]
    mask = np.r_[False,b[:-1] == b[1:]]
    out = np.empty(len(a), dtype=bool)
    out[sidx] = mask
    return out
Sample run -
In [166]: a
Out[166]: array([2, 1, 1, 0, 0, 4, 0, 3])
In [167]: mask_firstocc(a)
Out[167]: array([False, False, True, False, True, False, True, False])
Approach #2 : With np.unique(..., return_index)
We can leverage np.unique with its return_index argument, which returns the index of the first occurrence of each unique element, hence a simple array-assignment and then indexing works -
def mask_firstocc_with_unique(a):
    mask = np.ones(len(a), dtype=bool)
    mask[np.unique(a, return_index=True)[1]] = False
    return mask
Use np.unique
a = np.array([1, 2, 1, 2, 3])
_, ix = np.unique(a, return_index=True)
b = np.full(a.shape, True)
b[ix] = False
In [45]: b
Out[45]: array([False, False, True, True, False])
You can achieve that using the enumerate function, which lets you loop over index and value together:
array = [1, 2, 1, 2, 3]
mask = []
for i,v in enumerate(array):
    if array.index(v) == i:
        mask.append(False)
    else:
        mask.append(True)
print(mask)
Output:
[False, False, True, True, False]
Almost by definition, this can't be vectorized. The value of mask at any index depends on every value of array between 0 and that index. There may be some algorithm where you expand array into an NxN matrix and do fancy tests, but you're still going to have an O(n^2) algorithm. The straightforward set algorithm is O(n) on average, since each membership test is a hash lookup.
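For reference, the straightforward set algorithm mentioned above might look like the following plain-Python sketch. The name seen_before_mask is made up, and it assumes the values are hashable:

```python
def seen_before_mask(values):
    """Return a list of bools: True where the value already occurred earlier."""
    seen = set()
    mask = []
    for v in values:
        mask.append(v in seen)  # hash lookup against everything seen so far
        seen.add(v)
    return mask

array = [1, 2, 1, 2, 3]
print(seen_before_mask(array))  # [False, False, True, True, False]
```

It makes a single pass over the data, but the loop runs in the interpreter, so for large NumPy arrays the sort-based or np.unique approaches may still be faster in practice.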
Is there any quick way to find out whether two points in a 2D boolean area are connected, if you can only move up, down, left and right onto a square with value True?
Let's say you have the following 6x6 2D list:
In code, that would be:
bool2DList = [[True, True, False, False, True, True],
              [False, True, True, False, True, True],
              [False, True, True, False, False, True],
              [False, False, False, False, False, True],
              [False, False, False, False, True, True],
              [False, True, True, True, True, True]]
Green squares have value True and blue ones False. I was thinking about a function (it would probably need to be recursive) that takes the 2D list as an argument, along with a list of tuples (coordinates) of several points and finally one tuple for a special point. It could have a header like this:
def FindWay( bool2DList,listOfPoints,specialPointCoord )
In this example the special point would be the point P with coordinates 5;1. Let's imagine you start walking from that special point. Which points could you reach without stepping on the blue squares? In this example, only points P4 and P5 (the output could be, say, the coordinates of those points, so 0;5 and 5;3). It would probably need to be recursive, but I have no idea what the body should look like.
Thank you.
I'm afraid there is no trivial way to do this. It's a graph traversal problem, and Python doesn't have built-in functions supporting that. I expect that you'll want a simple implementation of a breadth-first graph search.
Very briefly, you keep a list of nodes you've visited, but not handled; another list of nodes you've handled. The steps look like this:
handled = []
visited = [P]
while visited is not empty:
    remove a node A from the visited list
    for each node B you can reach directly from A:
        if B is new (not in visited or handled list):
            put B on the visited list
    put A on the handled list
This will find all nodes you can reach. If you're worried about a particular node, then inside the loop, check to see whether B is your target node. When you put B on the visited list, put it on the front for depth-first, on the back for breadth-first.
In this application, "all the nodes you can reach" consists of the bordering ones with the same Boolean label.
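A minimal Python sketch of that breadth-first search on the 6x6 grid from the question could look like this. The function name reachable_points and the (row, column) coordinate order are assumptions, and the extra points (0, 0) and (2, 2) are made up just to show unreachable candidates:

```python
from collections import deque

def reachable_points(grid, points, start):
    """Return the points reachable from start by moving over True cells only."""
    rows, cols = len(grid), len(grid[0])
    if not grid[start[0]][start[1]]:
        return []
    seen = {start}          # nodes already visited or queued
    queue = deque([start])  # nodes visited but not yet handled
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))  # append to the back: breadth-first
    return [p for p in points if p in seen and p != start]

grid = [[True, True, False, False, True, True],
        [False, True, True, False, True, True],
        [False, True, True, False, False, True],
        [False, False, False, False, False, True],
        [False, False, False, False, True, True],
        [False, True, True, True, True, True]]

print(reachable_points(grid, [(0, 0), (0, 5), (5, 3), (2, 2)], (5, 1)))
# [(0, 5), (5, 3)]
```

A set replaces the two lists from the pseudocode so the "is B new" test is a constant-time lookup, and no recursion is needed.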
Here is an option how you can code it:
import numpy as np
A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 0]]).astype(bool)
print(A)
[[False True True False]
[ True False True True]
[ True False True False]
[False True False False]]
We can upgrade standard dfs function for our needs:
def dfs_area(A, start):
    graph = {}
    res = np.zeros(A.shape, dtype=int)  # integer mask: 1 marks cells connected to start
    for index, x in np.ndenumerate(A):
        graph[index] = x
    visited, stack = set(), [start]
    while stack:
        vertex = stack.pop()
        x, y = vertex[0], vertex[1]
        if vertex not in visited and graph[vertex] == True:
            visited.add(vertex)
            # push the four neighbours that lie inside the array
            if x < A.shape[0]-1:
                stack.append((x+1, y))
            if x > 0:
                stack.append((x-1, y))
            if y < A.shape[1]-1:
                stack.append((x, y+1))
            if y > 0:
                stack.append((x, y-1))
    if visited:  # nothing to mark when start itself is False
        res[tuple(np.array(list(visited)).T)] = 1
    return res
Assume that we want points connected to (1, 2) - second row, third value:
mask = dfs_area(A, (1,2))
>> mask
array([[0, 1, 1, 0],
[0, 0, 1, 1],
[0, 0, 1, 0],
[0, 0, 0, 0]])
I have an N x M numpy array (matrix). Here is an example with a 3 x 6 array:
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
I'd like to scan all the columns of x and replace the values of each column if they are equal to a specific value.
This code, for example, aims to replace every value that equals the negative of its column number with 100:
for i in range(1,6):
    x[:,i == -(i)] = 100
This code obtains this warning:
DeprecationWarning: using a boolean instead of an integer will result in an error in the future
I'm using numpy 1.8.2. How can I avoid this warning without downgrading numpy?
I don't follow what your code is trying to do:
the i == -(i)
will evaluate to something like this:
x[:, True]
x[:, False]
I don't think this is what you want. You should try something like this:
for i in range(1, 6):
    mask = x[:, i] == -i
    x[:, i][mask] = 100
Create a mask over the whole column, and use that to change the values.
Even without the warning, the code you have there will not do what you want. i is the loop index and will equal minus itself only if i == 0, which is never. Your test will always return false, which is cast to 0. In other words your code will replace the first element of each row with 100.
To get this to work I would do
for i in range(1, 6):
    col = x[:,i]
    col[col == -i] = 100
Notice that you use the array itself (not the loop index) to build the mask, and that you need to separate the conventional indexing from the masking.
If you are worried about the warning spewing out text, then ignore it as a Warning/Exception:
import numpy
import warnings
warnings.simplefilter('default') # this enables DeprecationWarnings to be thrown
x = numpy.array([[0,1,2,3,4,5],[0,-1,2,3,-4,-5],[0,-1,-2,-3,4,5]])
with warnings.catch_warnings():
    warnings.simplefilter("ignore") # and this ignores them
    for i in range(1,6):
        x[:,i == -(i)] = 100
print(x) # just to show that you are actually changing the content
As you can see in the comments, some people are not getting the DeprecationWarning. That is probably because Python has ignored DeprecationWarning by default since 2.7.
As others have said, your loop isn't doing what you think it is doing. I would propose you change your code to use numpy's fancy indexing.
# First, create the "test values" (column index):
>>> test_values = numpy.arange(6)
# test_values is array([0, 1, 2, 3, 4, 5])
#
# Now, we want to check which columns have value == -test_values:
#
>>> mask = (x == -test_values) & (x < 0)
# mask is True wherever a value in the i-th column of x is negative i
>>> mask
array([[False, False, False, False, False, False],
[False, True, False, False, True, True],
[False, True, True, True, False, False]], dtype=bool)
#
# Now, set those values to 100
>>> x[mask] = 100
>>> x
array([[ 0, 1, 2, 3, 4, 5],
[ 0, 100, 2, 3, 100, 100],
[ 0, 100, 100, 100, 4, 5]])
I'm often met with an analog of the following problem, and have had trouble writing clean code to solve it. Usually, I have something involving a temporary variable and a for loop, but is there a more elegant way?
Suppose I have a list of booleans or values which evaluate to booleans:
[True, False, True, False, False, True]
How would I map this to a list of values, with the index of the previous True, inclusive?
[0, 0, 2, 2, 2, 5]
[EDIT] Have tried something along the lines of:
def example(lst):
    rst, tmp = [], None
    for i in range(len(lst)):
        if lst[i]:
            tmp = i
        rst.append(tmp)
    return rst
Assuming the first element of the list is always True.
While it still uses a for loop and a temporary variable, it's still relatively clean, I think. If you want, you could replace the yield and append to a list and return that.
def get_indexes(booleans):
    previous = 0
    for index, b in enumerate(booleans):
        if b:
            previous = index
        yield previous
>>> b = [True, False, True, False, False, True]
>>> list(get_indexes(b))
[0, 0, 2, 2, 2, 5]
This is even shorter (although potentially less readable):
def get_indexes(booleans):
    previous = 0
    for index, b in enumerate(booleans):
        previous = index if b else previous
        yield previous
Try this:
index = 0
bools = [True, False, True, False, False, True]
result = []
for i in range(len(bools)):
    index = i if bools[i] else index
    result.append(index)
Not tested, but should work.
[i if b else i-lst[i::-1].index(True) for i,b in enumerate(lst)]
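If NumPy is available, a loop-free variant is possible with a running maximum. This is only a sketch with a made-up function name, and like the answers above it assumes positions before the first True should map to index 0:

```python
import numpy as np

def previous_true_index(flags):
    """For each position, return the index of the most recent True (inclusive)."""
    flags = np.asarray(flags, dtype=bool)
    # Keep the index where the flag is True, 0 elsewhere ...
    candidates = np.where(flags, np.arange(len(flags)), 0)
    # ... then carry the largest index seen so far forward.
    return np.maximum.accumulate(candidates)

b = [True, False, True, False, False, True]
print(previous_true_index(b).tolist())  # [0, 0, 2, 2, 2, 5]
```

This works because the indices are increasing, so the running maximum at each position is always the index of the latest True.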
The title might be ambiguous, didn't know how else to word it.
I have gotten fairly far with my particle simulator in Python using numpy and matplotlib. I have managed to implement Coulomb, gravity and wind forces; now I just want to add temperature and pressure, but I have a pre-optimization question (root of all evil). I want to see when particles crash:
Q: Is it in numpy possible to take the difference of an array with each of its own element based on a bool condition? I want to avoid looping.
Eg: (x - any element in x) < a
Should return something like
[True, True, False, True]
If element 0,1 and 3 in x meets the condition.
Edit:
The loop equivalent would be:
for i in range(len(x)):
    for j in range(len(x)):
        #!= not so important
        ##earlier question I asked lets me figure that one out
        if i!=j:
            if x[j] - x[i] < a:
                True
I notice numpy operations are far faster than if tests and this has helped me speed things up a lot.
Here is a sample code if anyone wants to play with it.
#Simple circular box simulator, part of part_sim
#Restructure to import into gravity() or coloumb () or wind() or pressure()
#Or to use all forces: sim_full()
#Note: Implement crashing as backbone to all forces
import numpy as np
import matplotlib.pyplot as plt
N = 1000 #Number of particles
R = 8000 #Radius of box
r = np.random.randint(0,R/2,2*N).reshape(N,2)
v = np.random.randint(-200,200,r.shape)
v_limit = 10000 #Speedlimit
plt.ion()
line, = plt.plot([],'o')
plt.axis([-10000,10000,-10000,10000])
while True:
    r_hit = np.sqrt(np.sum(r**2,axis=1))>R #Who let the dogs out, who, who?
    r_nhit = ~r_hit
    N_rhit = r_hit[r_hit].shape[0]
    r[r_hit] = r[r_hit] - 0.1*v[r_hit] #Get the dogs back inside
    r[r_nhit] = r[r_nhit] +0.1*v[r_nhit]
    #Dogs should turn tail before they crash!
    #---
    #---crash code here....
    #---crash end
    #---
    vmin, vmax = np.min(v), np.max(v)
    #Give the particles a random kick when they hit the wall
    v[r_hit] = -v[r_hit] + np.random.randint(vmin, vmax, (N_rhit,2))
    #Slow down honey
    v_abs = np.abs(v) > v_limit
    #Hit the wall at too high v honey? You are getting a speed reduction
    v[v_abs] *=0.5
    line.set_ydata(r[:,1])
    line.set_xdata(r[:,0])
    plt.draw()
I plan to add colors to the datapoints above once I figure out how...such that high velocity particles can easily be distinguished in larger boxes.
Eg: x - any element in x < a Should return something like
[True, True, False, True]
If element 0,1 and 3 in x meets the condition. I notice numpy operations are far faster than if tests and this has helped me speed up things ALOT.
Yes, it's just m < a. For example:
>>> m = np.array((1, 3, 10, 5))
>>> a = 6
>>> m2 = m < a
>>> m2
array([ True, True, False, True], dtype=bool)
Now, to the question:
Q: Is it in numpy possible to take the difference of an array with each of its own element based on a bool condition? I want to avoid looping.
I'm not sure what you're asking for here, but it doesn't seem to match the example directly below it. Are you trying to, e.g., subtract 1 from each element that satisfies the predicate? In that case, you can rely on the fact that False==0 and True==1 and just subtract the boolean array:
>>> m3 = m - m2
>>> m3
array([ 0, 2, 10, 4])
From your clarification, you want the equivalent of this pseudocode loop:
for i in len(x):
    for j in in len(x):
        #!= not so important
        ##earlier question I asked lets me figure that one out
        if i!=j:
            if x[j] - x[i] < a:
                True
I think the confusion here is that this is the exact opposite of what you said: you don't want "the difference of an array with each of its own element based on a bool condition", but "a bool condition based on the difference of an array with each of its own elements". And even that only really gets you to a square matrix of len(m)*len(m) bools, but I think the part left over is the "any".
At any rate, you're asking for an implicit cartesian product, comparing each element of m to each element of m.
You can easily reduce this from two loops to one (or, rather, implicitly vectorize one of them, gaining the usual numpy performance benefits). For each value, create a new array by subtracting that value from each element and comparing the result with a, and then join those up:
>>> a = -2
>>> comparisons = np.array([m - x < a for x in m])
>>> flattened = np.any(comparisons, 0)
>>> flattened
array([ True, True, False, True], dtype=bool)
But you can also turn this into a simple matrix operation pretty easily. Subtracting every element of m from every other element of m is just m - m.T. (You can make the product more explicit, but the way numpy handles adding row and column vectors, it isn't necessary.) And then you just compare every element of that to the scalar a, and reduce with any, and you're done:
>>> a = -2
>>> m = np.matrix((1, 3, 10, 5))
>>> subtractions = m - m.T
>>> subtractions
matrix([[ 0, 2, 9, 4],
[-2, 0, 7, 2],
[-9, -7, 0, -5],
[-4, -2, 5, 0]])
>>> comparisons = subtractions < a
>>> comparisons
matrix([[False, False, False, False],
[False, False, False, False],
[ True, True, False, True],
[ True, False, False, False]], dtype=bool)
>>> np.any(comparisons, 0)
matrix([[ True, True, False, True]], dtype=bool)
Or, putting it all together in one line:
>>> np.any((m - m.T) < a, 0)
matrix([[ True, True, False, True]], dtype=bool)
If you need m to be an array rather than a matrix, you can replace the subtraction line with m - np.matrix(m).T.
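With plain arrays, the same row-minus-column trick can be written with explicit broadcasting instead of np.matrix. A small sketch, reusing the m and a values from above (the names differences and result are just for illustration):

```python
import numpy as np

m = np.array([1, 3, 10, 5])
a = -2

# differences[i, j] = m[j] - m[i], built by broadcasting a row against a column
differences = m[np.newaxis, :] - m[:, np.newaxis]

# For each column j: is there any i with m[j] - m[i] < a?
result = np.any(differences < a, axis=0)
print(result)  # [ True  True False  True]
```

The result is an ordinary 1D boolean array rather than a 1xN matrix, so it can be used directly as a mask on m.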
For higher dimensions, you actually do need to work in arrays, because you're trying to cartesian-product a 2D array with itself to get a 4D array, and numpy doesn't do 4D matrices. So, you can't use the simple "row vector - column vector = matrix" trick. But you can do it manually:
>>> m = np.array([[1,2], [3,4]]) # 2x2
>>> m4d = m.reshape(1, 1, 2, 2) # 1x1x2x2
>>> m4d
array([[[[1, 2],
[3, 4]]]])
>>> mt4d = m4d.T # 2x2x1x1
>>> mt4d
array([[[[1]],
[[3]]],
[[[2]],
[[4]]]])
>>> subtractions = m - mt4d # 2x2x2x2
>>> subtractions
array([[[[ 0, 1],
[ 2, 3]],
[[-2, -1],
[ 0, 1]]],
[[[-1, 0],
[ 1, 2]],
[[-3, -2],
[-1, 0]]]])
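And from there, the remainder is the same as before, except that any now has to reduce over both leading axes (the two axes that index the subtracted element). Putting it together into one line:
>>> np.any((m - m.reshape(1, 1, 2, 2).T) < a, (0, 1))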
(If you remember my original answer, I'd somehow blanked on reshape and was doing the same thing by multiplying m by a column vector of 1s, which obviously is a much stupider way to proceed.)
One last quick thought: If your algorithm really is "the bool result of (for any element y of m, x - y < a) for each element x of m", you don't actually need "for any element y", you can just use "for the maximal element y". So you can simplify from O(N^2) to O(N):
>>> (m - m.max()) < a
Or, if a is positive, that's always true (no element exceeds the maximum, so x - max is never positive), so you can simplify to O(1):
>>> np.ones(m.shape, dtype=bool)
But I'm guessing your real algorithm is actually using abs(x - y), or something more complicated, which can't be simplified in this way.
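If the real test is on abs(x - y), a sketch along the following lines might work. The function name close_to_another is hypothetical, and the diagonal is cleared to implement the i != j condition from the question's loop:

```python
import numpy as np

def close_to_another(x, a):
    """True at i if some other element j != i satisfies |x[j] - x[i]| < a."""
    x = np.asarray(x)
    # distances[i, j] = |x[j] - x[i]| for every pair, via broadcasting
    distances = np.abs(x[np.newaxis, :] - x[:, np.newaxis])
    close = distances < a
    np.fill_diagonal(close, False)  # drop the i == j self-comparisons
    return close.any(axis=0)

x = np.array([1, 3, 10, 5])
print(close_to_another(x, 3))  # [ True  True False  True]
```

Note that this builds an N x N array, so memory grows quadratically with the number of particles; for N = 1000 that is still small, but it would not scale to very large N.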