How do I obtain a mask, reversing numpy.flatnonzero? - python

Given an arbitrary one-dimensional mask:
In [1]: import numpy as np
...: mask = np.array(np.random.randint(0, 2, 20), dtype=bool)  # random_integers was removed from NumPy
...: mask
Out[1]:
array([ True, False, True, False, False, True, False, True, True,
False, True, False, True, False, False, True, True, False,
True, True], dtype=bool)
We can obtain an array of the True elements of mask using np.flatnonzero:
In[2]: np.flatnonzero(mask)
Out[2]: array([ 0, 2, 5, 7, 8, 10, 12, 15, 16, 18, 19], dtype=int64)
But now how do I reverse this process and go from _2 to a mask?

Create an all-false mask and then use numpy's index array functionality to assign the True entries for the mask.
In[3]: new_mask = np.zeros(20, dtype=bool)
...: new_mask
Out[3]:
array([False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False], dtype=bool)
In[4]: new_mask[_2] = True
...: new_mask
Out[4]:
array([ True, False, True, False, False, True, False, True, True,
False, True, False, True, False, False, True, True, False,
True, True], dtype=bool)
As a check we see that:
In[5]: np.flatnonzero(new_mask)
Out[5]: array([ 0, 2, 5, 7, 8, 10, 12, 15, 16, 18, 19], dtype=int64)
As expected, _5 == _2:
In[6]: np.all(_5 == _2)
Out[6]: True
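The pattern above can be wrapped in a small reusable helper (a sketch; `indices_to_mask` is a name introduced here, not a NumPy function):

```python
import numpy as np

def indices_to_mask(indices, n):
    """Invert np.flatnonzero: return a length-n boolean mask
    that is True exactly at the given indices."""
    mask = np.zeros(n, dtype=bool)
    mask[indices] = True
    return mask

# Round trip: mask -> indices -> mask
mask = np.array([True, False, True, True, False])
idx = np.flatnonzero(mask)
assert np.array_equal(indices_to_mask(idx, mask.size), mask)
```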

You could use np.bincount:
In [304]: mask = np.random.binomial(1, 0.5, size=10).astype(bool); mask
Out[304]: array([ True, True, False, True, False, False, False, True, False, True], dtype=bool)
In [305]: idx = np.flatnonzero(mask); idx
Out[305]: array([0, 1, 3, 7, 9])
In [306]: np.bincount(idx, minlength=len(mask)).astype(bool)
Out[306]: array([ True, True, False, True, False, False, False, True, False, True], dtype=bool)

Related

Return a boolean array of values < 40

How can I get a boolean 1-dimensional output for values < 40 from the array given below? Since there are three values < 40, the output should be: array([ True, True, True])
x = np.array([[40, 37, 70],[62, 61, 98],[65, 89, 22],[95, 98, 81],[44, 32, 79]])
You can do it like this:
import numpy as np
x = np.array([[40, 37, 70],[62, 61, 98],[65, 89, 22],[95, 98, 81],[44, 32, 79]])
x<40
Output:
array([[False, True, False],
[False, False, False],
[False, False, True],
[False, False, False],
[False, True, False]])
Or if you want a 1d result, you can use .flatten():
y = x.flatten()
y<40
Output:
array([False, True, False, False, False, False, False, False, True,
False, False, False, False, True, False])
If you want a 1d list like [True]*n where n is the number of values <40, you can do:
np.array([i for i in x.flatten()<40 if i])
Output:
array([True, True, True])
This could be solved in many ways, one could be:
x[x<40]<40
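If all you need is the `[True]*n` form, you can also build it directly from the count, avoiding the Python-level loop (a sketch):

```python
import numpy as np

x = np.array([[40, 37, 70], [62, 61, 98], [65, 89, 22],
              [95, 98, 81], [44, 32, 79]])

# Count the values below 40, then build a boolean array of that length.
n = int(np.count_nonzero(x < 40))
result = np.ones(n, dtype=bool)
print(result)  # [ True  True  True]
```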

Subtract elements from one list from another list, using list comprehension. Returns incomplete list?

I have an array l1 of size (81x2), and another l2 of size (8x2). All elements of l2 are also contained in l1. I'm trying to generate an array l3 of size (73x2) containing all elements of l1 minus the ones in l2 ( ==> l3 = l1 - l2 ), but using list comprehension.
I found many similar questions on here, and almost all agree on a solution like this to generate l3:
n = 9
index = np.arange(n)
l1 = np.array([(i,j) for i in index for j in index])
l2 = np.array([(0, 3),(0, 5),(2, 4),(4, 4),(4, 2),(4, 6),(8, 3),(8, 5)])
l3 = [(i,j) for (i,j) in l1 if (i,j) not in l2]
print(l3)
However, the code above generates an array l3 that only contains 20 of the expected (81-8=) 73 elements. I don't understand how list comprehension operates here or why only those particular 20 elements are kept. Can anyone help?
NOTE: many people advise using set() instead of list comprehension for this problem, but I haven't tried that yet and I'd really like to understand why list comprehension is failing in the code above.
Let's test the first row of l1:
In [46]: i,j = l1[0]
In [47]: i,j
Out[47]: (0, 0)
In [48]: (i,j) in l2
Out[48]: True
It's True because 0 occurs in l2. It isn't testing by rows.
There isn't a 7 in l2, so this is False
In [49]: (7,7) in l2
Out[49]: False
Make sure your list comprehension test works.
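One fix for the original list comprehension is to convert `l2` to a set of tuples first, so that membership is tested against whole rows rather than individual elements (a sketch):

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

# A set of tuples compares whole (i, j) pairs, unlike `in` on a 2d array.
l2_set = {tuple(row) for row in l2}
l3 = [(i, j) for (i, j) in l1 if (i, j) not in l2_set]
print(len(l3))  # 73
```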
One way to test for matches is:
In [72]: x = (l1==l2[:,None,:]).all(axis=2).any(axis=0)
In [73]: x
Out[73]:
array([False, False, False, True, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, True, False, True, False, True, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, True, False, True, False, False, False])
This has 8 True values, the ones that exactly match l2:
In [74]: x.sum()
Out[74]: 8
In [75]: l1[x]
Out[75]:
array([[0, 3],
[0, 5],
[2, 4],
[4, 2],
[4, 4],
[4, 6],
[8, 3],
[8, 5]])
So the rest would be accessed with:
In [76]: l1[~x]
To work with sets, we need to convert the arrays to lists of tuples:
In [85]: s1 = set([tuple(x) for x in l1])
In [86]: s2 = set([tuple(x) for x in l2])
In [87]: len(s1.difference(s2))
Out[87]: 73
Another approach is to convert the arrays to structured arrays:
In [88]: import numpy.lib.recfunctions as rf
In [102]: r1 = rf.unstructured_to_structured(l1,dtype=np.dtype('i,i'))
In [103]: r2 = rf.unstructured_to_structured(l2,dtype=np.dtype('i,i'))
In [104]: r2
Out[104]:
array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)],
dtype=[('f0', '<i4'), ('f1', '<i4')])
Now isin works - the arrays are both 1d, as required by isin:
In [105]: np.isin(r1,r2)
Out[105]:
array([False, False, False, True, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, True, False, False, False, False,
...])

Python, the fastest way to find number indexes in an np.array

What is the fastest way to find the indexes of particular numbers in a NumPy array?
Suppose we have an array of the numbers 0 through 19, and we want to know the indexes of the values 2 and 5.
The canonical way would be to use numpy's where method:
a = np.array(range(20))
np.where((a == 2) | (a == 5))
Note that in order to combine the two terms (a == 2) and (a == 5) we need the bitwise or operator |. The reason is that both (a == 2) and (a == 5) return a numpy array of dtype('bool'):
>>> a == 2
array([False, False, True, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
>>> (a == 5)
array([False, False, False, False, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
>>> (a == 2) | (a==5)
array([False, False, True, False, False, True, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False])
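When there are more than a couple of target values, `np.isin` builds the combined mask in one call instead of chaining `|` comparisons (a sketch):

```python
import numpy as np

a = np.arange(20)

# np.isin tests each element of `a` against all targets at once,
# replacing a chain of (a == v1) | (a == v2) | ... comparisons.
idx = np.flatnonzero(np.isin(a, [2, 5]))
print(idx)  # [2 5]
```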

What is a faster solution for integer coordinates in circle of radius x

I'm trying to create a method that returns the count of integer coordinates within a circle of radius rad but I believe my current solution is too slow. What would you recommend for a better implementation? I'd like to code the solution myself as I'm still a beginner.
This is my current solution:
def points(rad):
    possiblePoints = []
    negX = -rad
    negY = -rad
    # creating a list of all possible points from (-rad, -rad) to (rad, rad)
    while negX <= rad:
        while negY <= rad:
            possiblePoints.append((negX, negY))
            negY += 1
        negY = -rad
        negX += 1
    count = 0
    index = 0
    distance = 0
    # counting the points whose distance from the origin is within rad
    for i in possiblePoints:
        for j in range(0, 2):
            distance += (possiblePoints[index][j]) ** 2
        distance = distance ** 0.5
        if distance <= rad:
            count += 1
        distance = 0
        index += 1
    return count
A point (x, y) lies within the circle when x² + y² ≤ r². For a fixed y-value, the number of integer x-coordinates satisfying this is:
n_y = 1 + 2·⌊√(r² − y²)⌋
If we then let y run from 1 to ⌊r⌋, counting each such n_y twice (once for y and once for −y) and adding the y = 0 row separately, we obtain the total, so we can calculate this as:
import numpy as np
from math import floor

def total(radius):
    r0 = floor(radius)
    ys = np.arange(1, r0 + 1)
    halves = 2 * np.sum(2 * np.floor(np.sqrt(radius * radius - ys * ys)) + 1)
    return halves + 1 + 2 * r0
We thus calculate for each "layer" the number of integral coordinates, and we double each layer, since there is a "co-layer" with the same number of integral coordinates in the opposite direction. We then add the number of coordinates with y=0 which is 1+2×⌊r⌋.
The above thus works in O(r) with r the radius of the circle.
Sample output:
>>> total(0)
1.0
>>> total(0.1)
1.0
>>> total(1)
5.0
>>> total(1.1)
5.0
>>> total(1.42)
9.0
>>> total(2)
13.0
>>> total(3)
29.0
A slower O(r²) alternative is to generate a grid and then perform the comparison in bulk, like:
import numpy as np
from math import floor

def total_slow(radius):
    r0 = floor(radius)
    xs = np.arange(-r0, r0 + 1)
    ys = xs[:, None]
    return np.sum(xs * xs + ys * ys <= radius * radius)
The above is however useful if we want to verify that our solution works. For example:
>>> total_slow(0)
1
>>> total_slow(1)
5
>>> total_slow(2)
13
>>> total_slow(3)
29
produces the same results, but we can use this to verify that it always produces the same result for a large number of radiuses, for example:
>>> all(total(r) == total_slow(r) for r in np.arange(0, 10, 0.03))
True
This thus means that we verified that the two generate the same result for 0, 0.03, 0.06, etc. up to 10. This is of course not a formal proof of correctness, but it gives us some empirical evidence.
Performance
We can test performance using the timeit module; we timed the algorithms with three radii, repeating each experiment 10,000 times:
>>> timeit(partial(total, 1.23), number=10000)
0.11901686200144468
>>> timeit(partial(total, 12.3), number=10000)
0.1255619800031127
>>> timeit(partial(total, 123), number=10000)
0.1318465179974737
>>> timeit(partial(total_slow, 1.23), number=10000)
0.11859546599953319
>>> timeit(partial(total_slow, 12.3), number=10000)
0.15540562200112618
>>> timeit(partial(total_slow, 123), number=10000)
1.3335393390007084
>>> timeit(partial(points, 1), number=10000)
0.1152820099996461
>>> timeit(partial(points, 12), number=10000)
3.7115225509987795
>>> timeit(partial(points, 123), number=10000)
433.3227958699972
Here points is the method of @ShlomiF; for it we used integral radii, since this method produces incorrect results for floating-point radii.
We thus obtain the following table:
radius    total    total_slow    points
  1.23    0.119    0.119           0.115*
 12.3     0.125    0.155           3.711*
123.      0.131    1.334         433.323*
* with integral radii
This is expected behavior if we take time complexity into account: eventually a linear approach will outperform a quadratic approach.
For something faster with the same logic, a quick numpy solution:
def count_in(radius):
    r = radius + 1
    X, Y = np.mgrid[-r:r, -r:r]
    return (X**2 + Y**2 <= radius**2).sum()
This generally speeds your algorithm up by roughly 30×. Detailed explanations below.
For the algorithm itself, I suggest a better way: just follow the blue line, evaluating the pink area (the illustration is omitted here). Have fun.
Explanation for radius=3:
In [203]: X
Out[203]:
array([[-4, -4, -4, -4, -4, -4, -4, -4],
[-3, -3, -3, -3, -3, -3, -3, -3],
[-2, -2, -2, -2, -2, -2, -2, -2],
[-1, -1, -1, -1, -1, -1, -1, -1],
[ 0, 0, 0, 0, 0, 0, 0, 0],
[ 1, 1, 1, 1, 1, 1, 1, 1],
[ 2, 2, 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3, 3, 3]])
In [204]: Y
Out[204]:
array([[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3],
[-4, -3, -2, -1, 0, 1, 2, 3]])
In [205]: X*X+Y*Y
Out[205]:
array([[32, 25, 20, 17, 16, 17, 20, 25],
[25, 18, 13, 10, 9, 10, 13, 18],
[20, 13, 8, 5, 4, 5, 8, 13],
[17, 10, 5, 2, 1, 2, 5, 10],
[16, 9, 4, 1, 0, 1, 4, 9],
[17, 10, 5, 2, 1, 2, 5, 10],
[20, 13, 8, 5, 4, 5, 8, 13],
[25, 18, 13, 10, 9, 10, 13, 18]])
And finally :
In [207]: (X**2 + Y**2 <= 3**2)
Out[207]:
array([[False, False, False, False, False, False, False, False],
[False, False, False, False, True, False, False, False],
[False, False, True, True, True, True, True, False],
[False, False, True, True, True, True, True, False],
[False, True, True, True, True, True, True, True],
[False, False, True, True, True, True, True, False],
[False, False, True, True, True, True, True, False],
[False, False, False, False, True, False, False, False]], dtype=bool)
In [208]: (X**2 + Y**2 <= 3**2).sum() # True is one
Out[208]: 29
Things can definitely be made faster than using lots of unnecessary loops.
Why not "build" and "check" in the same loop?
Also, you can make this (almost) a one-liner with the help of product from itertools:
import numpy as np
from itertools import product
def points(rad):
    x = np.arange(-rad, rad + 1)
    return len([(p1, p2) for p1, p2 in product(x, x)
                if p1 ** 2 + p2 ** 2 <= rad ** 2])
# Examples
print(points(5))
>> 81
print(points(1))
>> 5
print(points(2))
>> 13
The list comprehension gives the actual points, if that's of any use.
Good luck!
Some explanations in case needed:
x is just a list of integers between (and including) -rad and rad.
product(x, x) is the Cartesian product of x and itself, thus all the points in the square defined by both coordinates being between -rad and rad.
The list comprehension returns only points with a distance from the origin smaller or equal to rad.
The function returns the length of this list.

numpy mask array limiting the frequency of masked values

Starting from an array:
a = np.array([1,1,1,2,3,4,5,5])
and a filter:
m = np.array([1,5])
I am now building a mask with:
b = np.in1d(a,m)
that correctly returns:
array([ True, True, True, False, False, False, True, True], dtype=bool)
I would need to limit the number of True entries per unique value to a maximum of 2, so that 1 is masked only two times instead of three. The resulting mask would then look like this (no matter the order among the first True values):
array([ True, True, False, False, False, False, True, True], dtype=bool)
or
array([ True, False, True, False, False, False, True, True], dtype=bool)
or
array([ False, True, True, False, False, False, True, True], dtype=bool)
Ideally this is a kind of "random" masking with a capped frequency per value. So far I have tried randomly selecting among the original unique elements of the array, but the mask still selects the True values regardless of their frequency.
For a generic case with unsorted input array, here's one approach based on np.searchsorted -
N = 2 # Parameter to decide how many duplicates are allowed
sortidx = a.argsort()
idx = np.searchsorted(a,m,sorter=sortidx)[:,None] + np.arange(N)
lim_counts = (a[:,None] == m).sum(0).clip(max=N)
idx_clipped = idx[lim_counts[:,None] > np.arange(N)]
out = np.in1d(np.arange(a.size),idx_clipped)[sortidx.argsort()]
Sample run -
In [37]: a
Out[37]: array([5, 1, 4, 2, 1, 3, 5, 1])
In [38]: m
Out[38]: [1, 2, 5]
In [39]: N
Out[39]: 2
In [40]: out
Out[40]: array([ True, True, False, True, True, False, True, False], dtype=bool)
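For comparison, a simpler (if less vectorized) sketch caps the count per value with an explicit loop over the filter values; variable names follow the question:

```python
import numpy as np

a = np.array([1, 1, 1, 2, 3, 4, 5, 5])
m = np.array([1, 5])
N = 2  # allow at most N True entries per unique value

out = np.in1d(a, m)
for v in m:
    hits = np.flatnonzero(a == v)
    out[hits[N:]] = False  # drop all but the first N occurrences
print(out)  # [ True  True False False False False  True  True]
```

This always keeps the first N occurrences rather than a random subset; shuffling `hits` before slicing would restore the "random" flavour.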

Categories

Resources