How to apply conditional statement in numpy array? - python

I am trying to apply conditional statements to a numpy array and get an array of 1 and 0 values.
So far I have tried np.where(), but it only takes 3 arguments and my case needs more.
I first create the array randomly:
numbers = np.random.uniform(1,100,5)
Now, if the value is lower than 30, I would like to get a 0. If the value is greater than 70, I would like to get a 1. And if the value is between 30 and 70, I would like to draw a random number between 0 and 1: if this number is greater than 0.5, the value from the array should get 1, otherwise 0. I guess this is done with the np.random functions again, but I don't know how to combine all of the conditions.
If the input array is:
[10,40,50,60,90]
Then the expected output should be:
[0,1,0,1,1]
where the three values in the middle are randomly distributed so they can differ when making multiple tests.
Thank you in advance!

Use numpy.select; the 3rd condition can be simplified with numpy.random.choice:
numbers = np.array([10,40,50,60,90])
print (numbers)
[10 40 50 60 90]
a = np.select([numbers < 30, numbers > 70], [0, 1], np.random.choice([1,0], size=len(numbers)))
print (a)
[0 0 1 0 1]
If the 3rd condition needs the comparison with 0.5, you can convert the boolean mask to integers, mapping True, False to 1, 0:
b = (np.random.rand(len(numbers)) > .5).astype(int)
#alternative
#b = np.where(np.random.rand(len(numbers)) > .5, 1, 0)
a = np.select([numbers < 30, numbers > 70], [0, 1], b)
Or you can chain numpy.where 3 times:
a = np.where(numbers < 30, 0,
             np.where(numbers > 70, 1,
                      np.where(np.random.rand(len(numbers)) > .5, 1, 0)))
Or use np.select with the random mask as a 3rd condition:
a = np.select([numbers < 30, numbers > 70, np.random.rand(len(numbers)) > .5],
              [0, 1, 1], 0)
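For reference, the np.select approach can be wrapped into a small reusable function. This is only a sketch: the name threshold_with_coin_flip, the low/high parameters, and the use of np.random.default_rng for reproducible draws are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def threshold_with_coin_flip(numbers, low=30, high=70, rng=None):
    """Return 0 below `low`, 1 above `high`, and a random 0/1 in between.

    The function name and parameters are hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    numbers = np.asarray(numbers)
    # One uniform draw per element; > 0.5 maps to 1, otherwise 0
    coin = (rng.random(numbers.shape) > 0.5).astype(int)
    # The first matching condition wins; `coin` is used where none matches
    return np.select([numbers < low, numbers > high], [0, 1], default=coin)

result = threshold_with_coin_flip([10, 40, 50, 60, 90],
                                  rng=np.random.default_rng(0))
print(result)  # first entry is always 0, last is always 1
```

Passing a seeded generator makes the middle values repeatable between runs, which is convenient for tests.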

Related

How to get index of multiple, possibly different, elements in numpy?

I have a numpy array with many rows in it that look roughly as follows:
0, 50, 50, 2, 50, 1, 50, 99, 50, 50
50, 2, 1, 50, 50, 50, 98, 50, 50, 50
0, 50, 50, 98, 50, 1, 50, 50, 50, 50
0, 50, 50, 50, 50, 99, 50, 50, 2, 50
2, 50, 50, 0, 98, 1, 50, 50, 50, 50
I am given a variable n<50. Each row, of length 10, has the following in it:
Every number from 0 to n, with one possibly missing. In the example above, n=2.
Possibly a 98, which will be in the place of the missing number, if there is a number missing.
Possibly a 99, which will be in the place of the missing number, if there is a number missing, and there is not already a 98.
Many 50's.
What I want to get is an array with all the indices of the 0s in the first row, all the indices of the 1s in the second row, all the indices of the 2s in the third row, etc. For the above example, my desired output is this:
0, 6, 0, 0, 3
5, 2, 5, 5, 5
3, 1, 3, 8, 0
You may have noticed the catch: sometimes, exactly one of the numbers is replaced either by a 98, or a 99. It's pretty easy to write a for loop which determines which number, if any, was replaced, and uses that to get the array of indices.
Is there a way to do this with numpy?
The following numpy solution rather aggressively uses the assumptions listed in the question. If they are not 100% guaranteed, some more checks may be in order.
The mildly clever bit (even if I say so myself) here is to use the data array itself for finding the right destinations of their indices. For example, all the 2's need their indices stored in row 2 of the output array. Using this we can bulk store most of the indices in a single operation.
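Example input is in array data:
import numpy as np
# example rows copied from the question
data = np.array([
    [ 0, 50, 50,  2, 50,  1, 50, 99, 50, 50],
    [50,  2,  1, 50, 50, 50, 98, 50, 50, 50],
    [ 0, 50, 50, 98, 50,  1, 50, 50, 50, 50],
    [ 0, 50, 50, 50, 50, 99, 50, 50,  2, 50],
    [ 2, 50, 50,  0, 98,  1, 50, 50, 50, 50],
])
n = 2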
y,x = data.shape
out = np.empty((y,n+1),int)
# find 98 falling back to 99 if necessary
# and fill output array with their indices
# if neither exists some nonsense will be written but that does no harm
# most of this will be overwritten later
out.T[...] = ((data-98)&127).argmin(axis=1)
# find n+1 lowest values in each row
idx = data.argpartition(n,axis=1)[:,:n+1]
# construct auxiliary indexer
yr = np.arange(y)[:,None]
# put indices of low values where they belong
out[yr,data[yr,idx[:,:-1]]] = idx[:,:-1]
#      ^ ^ ^ ^ ^ ^ ^ ^ ^ ^
#         the clever bit
# rows with no missing number still need the last value
nomiss, = (data[range(y),idx[:,n]] == n).nonzero()
out[nomiss,n] = idx[nomiss,n]
# admire
print(out.T)
outputs:
[[0 6 0 0 3]
[5 2 5 5 5]
[3 1 3 8 0]]
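To cross-check a vectorized result like the one above, a plain loop version can serve as a reference. The following is a sketch under the question's assumptions (a 98 or 99 is present whenever a number is missing); the function name indices_by_value is hypothetical.

```python
import numpy as np

def indices_by_value(data, n):
    """Loop-based reference: out[k, r] is the column of value k in row r."""
    out = np.empty((n + 1, data.shape[0]), dtype=int)
    for r, row in enumerate(data):
        # Position of the placeholder: prefer 98, fall back to 99
        placeholder = None
        for marker in (98, 99):
            hits = np.flatnonzero(row == marker)
            if hits.size:
                placeholder = hits[0]
                break
        for k in range(n + 1):
            hits = np.flatnonzero(row == k)
            # Assumes a placeholder exists whenever k is missing
            out[k, r] = hits[0] if hits.size else placeholder
    return out

data = np.array([
    [ 0, 50, 50,  2, 50,  1, 50, 99, 50, 50],
    [50,  2,  1, 50, 50, 50, 98, 50, 50, 50],
    [ 0, 50, 50, 98, 50,  1, 50, 50, 50, 50],
    [ 0, 50, 50, 50, 50, 99, 50, 50,  2, 50],
    [ 2, 50, 50,  0, 98,  1, 50, 50, 50, 50],
])
reference_out = indices_by_value(data, 2)
print(reference_out)
```

It is slow for large inputs, but it states the intended semantics directly, so it is handy in a test.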
I don't think you're getting away without a for-loop here. But here's how you could go about it.
For each number from 0 to n, find all of the locations where it is known. Example:
locations = np.argwhere(data == 1)
print(locations)
[[0 5]
[1 2]
[2 5]
[4 5]]
You can then turn this into a map for easy lookup per number in n:
from pprint import pprint
known = {
    i: dict(np.argwhere(data == i))
    for i in range(n + 1)
}
pprint(known)
{0: {0: 0, 2: 0, 3: 0, 4: 3},
1: {0: 5, 1: 2, 2: 5, 4: 5},
2: {0: 3, 1: 1, 3: 8, 4: 0}}
Do the same for the unknown numbers:
unknown = dict(np.argwhere((data == 98) | (data == 99)))
pprint(unknown)
{0: 7, 1: 6, 2: 3, 3: 5, 4: 4}
And now, for each location in the result, you can look up the index in the known map and fall back to the unknown one.
result = np.array(
    [
        [known[i].get(j, unknown.get(j)) for j in range(len(data))]
        for i in range(n + 1)
    ]
)
print(result)
[[0 6 0 0 3]
[5 2 5 5 5]
[3 1 3 8 0]]
Bonus: Getting fancy with dictionary constructor and unpacking:
from collections import OrderedDict
unknown = np.argwhere((data == 98) | (data == 99))
results = np.array([
    [*OrderedDict((*unknown, *np.argwhere(data == i))).values()]
    for i in range(n + 1)
])
print(results)

How to replace the N smallest elements in each row of numpy array?

I would like to replace the N smallest elements in each row with 0, so that the resulting array keeps the same order and shape as the original array.
Specifically, if the original numpy array is:
import numpy as np
x = np.array([[0,50,20],[2,0,10],[1,1,0]])
And N = 2, I would like for the result to be the following:
x = np.array([[0,50,0],[0,0,10],[0,1,0]])
I tried the following, but in the last row it replaces 3 elements instead of 2 (because it replaces both 1s and not only one):
import numpy as np
N = 2
x = np.array([[0,50,20],[2,0,10],[1,1,0]])
x_sorted = np.sort(x , axis = 1)
x_sorted[:,N:] = 0
replace = x_sorted.copy()
final = np.where(np.isin(x,replace),0,x)
Note that this is a small example and I would like it to work for a much bigger matrix.
Thanks for your time!
One way using numpy.argsort:
N = 2
x[x.argsort().argsort() < N] = 0
Output:
array([[ 0, 50, 0],
[ 0, 0, 10],
[ 0, 1, 0]])
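Why the double argsort works: the first argsort gives the positions that would sort each row, and applying argsort to that result gives the rank of every element within its row. The sketch below spells this out; passing kind="stable" is an addition here (not in the answer above) so that ties are ranked left to right.

```python
import numpy as np

x = np.array([[0, 50, 20], [2, 0, 10], [1, 1, 0]])
N = 2

# Positions that would sort each row
order = x.argsort(axis=1, kind="stable")
# Rank of each element within its row (0 = smallest)
ranks = order.argsort(axis=1, kind="stable")
# Zero out every element whose rank is below N, leaving x untouched
result = np.where(ranks < N, 0, x)

print(ranks)
print(result)
```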
Use numpy.argpartition to find the index of N smallest elements, and then use the index to replace values:
N = 2
idy = np.argpartition(x, N, axis=1)[:, :N]
x[np.arange(len(x))[:,None], idy] = 0
x
array([[ 0, 50, 0],
[ 0, 0, 10],
[ 1, 0, 0]])
Note that if there are ties, which of the tied values gets replaced is undetermined and depends on the algorithm used.
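If deterministic tie handling matters, one option is to trade the partial sort for a full stable sort, so that the leftmost of the tied values is replaced first. This is a sketch of that variant, not the answer's method; it costs a full sort per row.

```python
import numpy as np

x = np.array([[0, 50, 20], [2, 0, 10], [1, 1, 0]])
N = 2

# A stable sort keeps tied values in their original left-to-right order
idx = np.argsort(x, axis=1, kind="stable")[:, :N]

result = x.copy()
# Write 0 at the selected column positions of every row
np.put_along_axis(result, idx, 0, axis=1)
print(result)
```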

Get index of numpy-array elements by comparing element-positions between arrays

Context
I have the following example-arrays in numpy:
import numpy as np
# All arrays in this example have the shape (15,)
# Note: All values > 0 are unique!
a = np.array([8,5,4,-1,-1, 7,-1,-1,12,11,-1,-1,14,-1,-1])
reference = np.array([0,1,2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14])
lookup = np.array([3,6,0,-2,-2,24,-2,-2,24,48,-2,-2,84,-2,-2])
My goal is to find the elements of reference inside a, then get their index in a and use it to extract the corresponding elements in lookup.
Finding the matching elements and their indices works with np.flatnonzero( np.isin() ).
I can also look up the corresponding values:
# Example how to find the index
np.flatnonzero( np.isin( reference, a) )
# -> array([ 4, 5, 7, 8, 11, 12, 14])
# Example how to find corresponding values:
lookup[ np.flatnonzero( np.isin( a, reference) ) ]
# -> array([ 3, 6, 0, 24, 24, 48, 84], dtype=int64)
Problem
I want to fill an array z with the values I looked up, following the reference.
This means that, e.g., the element of z at index 8 holds the lookup value for the element of reference at index 8 (= 8). This value would be 3 (reference[8] -> a[0], because a[0] == 8 -> lookup[0] -> 3).
z = np.zeros(reference.size)
z[np.flatnonzero(np.isin(reference, a))] = ? -> numpy-array of correctly ordered lookup_values
The expected outcome for z would be:
z = [ 0 0 0 0 0 6 0 24 3 0 0 48 24 0 84]
I cannot get my head around this; I have to avoid for-loops due to performance reasons and would need a pure numpy-solution (best without udfs).
How can I fill z with the lookup values at the correct positions?
Note: As stated in the code above, all values a > 0 are unique. Thus, there is no need to take care about the duplicated values for a < 0.
You say that you 'have to avoid for-loops due to performance reasons', so I assume that your real-world data structure a is going to be large (thousands or millions of elements?). If np.isin(reference, a) ends up performing a linear search in a for every element of reference, your runtime will be O(len(reference) * len(a)); which strategy np.isin actually uses can depend on the NumPy version and the input sizes, so it is worth measuring.
I would suggest trying a dict for a, which allows lookup in O(1) on average per element of reference, and looping in Python using for. For sufficiently large a this can outperform a linear search.
The most natural way I can think of is to just treat a and lookup as a dictionary:
In [82]: d = dict(zip(a, lookup))
In [83]: np.array([d.get(i, 0) for i in reference])
Out[83]: array([ 0, 0, 0, 0, 0, 6, 0, 24, 3, 0, 0, 48, 24, 0, 84])
This does have a bit of memory overhead but nothing crazy if reference isn't too large.
I actually figured it out myself.
# Initialize the result
# All non-indexed entries shall be 0
z = np.zeros(reference.size, dtype=np.int64)
Now evaluate which elements in a are relevant:
mask = np.flatnonzero(np.isin(a, reference))
# Short note: If we know that any positive element of a is a number
# Which has to be in the reference, we can also shorten this to
# a simple boolean mask. This will be significantly faster to process.
mask = (a > 0)
Now the following trick: all values a > 0 are unique. Additionally, their value corresponds to the position in reference (e.g. 8 in a shall correspond to the 8th position in reference). Thus, we can use the values themselves as indices:
z[ a[mask] ] = lookup[mask]
This results in the desired outcome:
z = [ 0 0 0 0 0 6 0 24 3 0 0 48 24 0 84]
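The index trick relies on reference being 0..len-1, so that a value in a is also a position in z. When that does not hold, a sort-based lookup is one possible pure-NumPy alternative. The sketch below assumes the matching values in a are unique; the function name lookup_by_value and the fill parameter are hypothetical.

```python
import numpy as np

def lookup_by_value(a, lookup, reference, fill=0):
    """For each value in `reference`, return lookup[i] where a[i] equals it."""
    order = np.argsort(a, kind="stable")
    a_sorted = a[order]
    # Candidate position of each reference value in the sorted copy of a
    pos = np.searchsorted(a_sorted, reference)
    pos = np.clip(pos, 0, len(a_sorted) - 1)
    # Only keep candidates that are exact matches
    found = a_sorted[pos] == reference
    return np.where(found, lookup[order[pos]], fill)

a = np.array([8, 5, 4, -1, -1, 7, -1, -1, 12, 11, -1, -1, 14, -1, -1])
reference = np.arange(15)
lookup = np.array([3, 6, 0, -2, -2, 24, -2, -2, 24, 48, -2, -2, 84, -2, -2])

z = lookup_by_value(a, lookup, reference)
print(z)
```

Sorting once and binary-searching each reference value avoids any Python-level loop.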

Count the number of times values appear within a range of values

How do I output a list which counts and displays the number of times different values fit into a range?
Based on the below example, the output would be x = [0, 3, 2, 1, 0] as there are 3 Pro scores (11, 24, 44), 2 Champion scores (101, 888), and 1 King score (1234).
- P1 = 11
- P2 = 24
- P3 = 44
- P4 = 101
- P5 = 1234
- P6 = 888
totalsales = [11, 24, 44, 101, 1234, 888]
Here is the ranking corresponding to the sales:
Sales___________________Ranking
0-10____________________Noob
11-100__________________Pro
101-1000________________Champion
1001-10000______________King
10001-200000____________Lord
This is one way, assuming your values are integers and ranges do not overlap.
from collections import Counter
# Ranges go to end + 1
score_ranges = [
range(0, 11), # Noob
range(11, 101), # Pro
range(101, 1001), # Champion
range(1001, 10001), # King
range(10001, 200001) # Lord
]
total_sales = [11, 24, 44, 101, 1234, 888]
# This counter counts how many values fall into each score range (by index).
# It works by taking the index of the first range containing each value (or -1 if none found).
c = Counter(next((i for i, r in enumerate(score_ranges) if s in r), -1) for s in total_sales)
# This converts the above counter into a list, taking the count for each index.
result = [c[i] for i in range(len(score_ranges))]
print(result)
# [0, 3, 2, 1, 0]
As a general rule homework should not be posted on stackoverflow. As such, just a pointer on how to solve this, implementation is up to you.
Iterate over the totalsales list and check if each number is in range(start,stop). Then, for each match, increment the count for that category in your result list (however, using a dict to store the result might be more apt).
Here is a possible solution that uses no modules such as numpy or collections:
totalsales = [11, 24, 44, 101, 1234, 888]
bins = [10, 100, 1000, 10000, 200000]  # inclusive upper bound of each rank
output = [0]*len(bins)
for s in totalsales:
    slot = next(i for i, x in enumerate(bins) if s <= x)
    output[slot] += 1
output
>>> [0, 3, 2, 1, 0]
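Because the upper bounds are sorted, the linear scan for the slot can be replaced by a binary search from the standard library. This is a sketch only: the function name count_by_rank is hypothetical, and silently ignoring values above the last bound is an assumption.

```python
from bisect import bisect_left

def count_by_rank(sales, upper_bounds):
    """Count how many sales fall into each rank, given inclusive upper bounds."""
    counts = [0] * len(upper_bounds)
    for s in sales:
        # Index of the first bound that is >= s
        slot = bisect_left(upper_bounds, s)
        if slot < len(counts):  # assumption: values above the last bound are ignored
            counts[slot] += 1
    return counts

result = count_by_rank([11, 24, 44, 101, 1234, 888],
                       [10, 100, 1000, 10000, 200000])
print(result)  # [0, 3, 2, 1, 0]
```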
If your sales-to-ranking mapping always follows a logarithmic curve, the desired output can be calculated in linear time using math.log10 with collections.Counter. Use an offset of 0.5 and the abs function to handle sales of 0 and 1:
from collections import Counter
from math import log10
counts = Counter(int(abs(log10(abs(s - .5)))) for s in totalsales)
[counts.get(i, 0) for i in range(5)]
This returns:
[0, 3, 2, 1, 0]
Here, I have used a pandas DataFrame to store the values, then bins and pd.cut to group the values into the right categories, and finally extracted the value counts into a list.
Let me know if it is okay.
import pandas as pd
import numpy
df = pd.DataFrame([11, 24, 44, 101, 1234, 888], columns=['P'])# Create dataframe
bins = [0, 10, 100, 1000, 10000, 200000]
labels = ['Noob','Pro', 'Champion', 'King', 'Lord']
df['range'] = pd.cut(df.P, bins, labels = labels)
df
outputs:
P range
0 11 Pro
1 24 Pro
2 44 Pro
3 101 Champion
4 1234 King
5 888 Champion
Finally, to get the value count. Use:
my = df['range'].value_counts().sort_index()  # counts the number of occurrences per category
output = list(map(int, my.tolist()))  # we want the output to be a list of integers
output
The result below:
[0, 3, 2, 1, 0]
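If pandas is not otherwise needed, the same binning can be sketched with plain NumPy using np.digitize and np.bincount. The variable names are hypothetical, and the bounds are taken as the inclusive upper limit of each rank from the question's table.

```python
import numpy as np

totalsales = np.array([11, 24, 44, 101, 1234, 888])
upper_bounds = np.array([10, 100, 1000, 10000, 200000])  # inclusive upper limits

# right=True assigns x to slot i when upper_bounds[i-1] < x <= upper_bounds[i]
slots = np.digitize(totalsales, upper_bounds, right=True)

# Count occurrences per slot; drop anything above the last bound (assumption)
counts = np.bincount(slots, minlength=len(upper_bounds))[:len(upper_bounds)]
print(counts.tolist())  # [0, 3, 2, 1, 0]
```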
You can use collections.Counter and a dict:
from collections import Counter
totalsales = [11, 24, 44, 101, 1234, 888]
ranking = {
0: 'noob',
10: 'pro',
100: 'champion',
1000: 'king',
10000: 'lord'
}
c = Counter()
for sale in totalsales:
    for k in sorted(ranking.keys(), reverse=True):
        if sale > k:
            c[ranking[k]] += 1
            break
Or as a two-liner (credits to @jdehesa for the idea):
thresholds = sorted(ranking.keys(), reverse=True)
c = Counter(next((ranking[t] for t in thresholds if s > t)) for s in totalsales)

Sorting by another matrix works in one case but fails for another

I need to sort matrices according to the descending order of the values in another matrix.
E.g. in a first step I would have the following matrix A:
1 0 1 0 1
0 1 0 1 0
0 1 0 1 1
1 0 1 0 0
Then for the procedure I am following I need to take the rows of the matrix as binary numbers and sort them in descending order of their binary value.
I am doing this the following way:
for i in range(0,num_rows):
    for j in range(0,num_cols):
        row_val[i] = row_val[i] + A[i][j] * (2 ** (num_cols - 1 - j))
This gets me a 4x1 vector row_val with the following values:
21
10
11
20
Now I am sorting the rows of the matrix according to row_val by
A = [x for _,x in sorted(zip(row_val,A),reverse=True)]
This works perfectly fine and I get the matrix A:
1 0 1 0 1
1 0 1 0 0
0 1 0 1 1
0 1 0 1 0
However, now I need to apply the same procedure to the columns. So I calculate the col_val vector with the binary values of the columns:
12
3
12
3
3
To sort the matrix A according to the vector col_val I thought I could just transpose matrix A and then do the same as before:
At = np.transpose(A)
At = [y for _,y in sorted(zip(col_val,At),reverse=True)]
Unfortunately, this fails with the error message
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I suspect that this might be because there are several entries with the same value in vector col_val. However, in an example shown in another question the sorting seems to work for a case with several equal entries.
Your suspicion is correct. You can't sort multidimensional numpy arrays using the Python builtin sorted, because the comparison of two rows, say, yields a row of truth values instead of a single one:
A[0] < A[1]
# array([False, True, False, True, False])
so sorted can't tell which should go before the other.
In your first example this is masked by the lexicographic ordering of tuples: because tuples are compared left to right and row_val has unique entries, the comparison never looks at the second elements.
But in your second example, because some col_val entries are equal, the comparison looks at At for a tie-breaker, which is where the exception occurs.
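One way to keep the original sorted-based approach is to give sorted an explicit key, so that only the col_val entry of each pair is compared and the arrays are never used as a tie-breaker. A minimal sketch, with col_val copied from the question as given:

```python
import numpy as np

# Matrix A after the row sort, as shown in the question
A = np.array([[1, 0, 1, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 1, 0, 1, 0]])
col_val = [12, 3, 12, 3, 3]  # values as stated in the question

At = np.transpose(A)
# key=... restricts the comparison to the first element of each pair,
# so tied col_val entries never trigger an array comparison
At_sorted = [y for _, y in sorted(zip(col_val, At),
                                  key=lambda pair: pair[0],
                                  reverse=True)]
A_sorted = np.transpose(np.array(At_sorted))
print(A_sorted)
```

Python's sort is stable, also with reverse=True, so columns with equal col_val keep their original relative order.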
Here is a working method which uses numpy methods:
A[np.argsort(np.packbits(A, axis=1).ravel())[::-1]]
# array([[1, 0, 1, 0, 1],
# [1, 0, 1, 0, 0],
# [0, 1, 0, 1, 1],
# [0, 1, 0, 1, 0]])
A[:, np.argsort(np.packbits(A, axis=0).ravel())[::-1]]
# array([[1, 1, 1, 0, 0],
# [0, 0, 0, 1, 1],
# [1, 0, 0, 1, 1],
# [0, 1, 1, 0, 0]])
Explanation:
np.packbits, as the name suggests, packs binary vectors into a bit field. It is almost equivalent to your hand-written code; the one small difference is that packbits operates on chunks of 8 and pads with zeros on the right, so for example [1, 1] will go to 192, not 3.
np.argsort does an indirect sort: it doesn't actually move the elements of its operand A but just writes down the sequence of indices I into A which would sort it, so that A[I] == np.sort(A). This is useful when we want to sort something based on the order of something else, as in this case.
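The padding behaviour and the indirect sort can be checked on the question's matrix. The sketch below also shows a weighted-sum variant that reproduces the hand-written row_val loop; it is an alternative under the assumption that the values fit in the integer dtype, and it may be preferable for more than 8 columns, where packbits yields more than one byte per row.

```python
import numpy as np

A = np.array([[1, 0, 1, 0, 1],
              [0, 1, 0, 1, 0],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 0, 0]])

# packbits pads on the right to a full byte: [1, 1] -> 0b11000000
padded = np.packbits(np.array([1, 1], dtype=np.uint8))

# Weighted sum over the columns, most significant bit first
weights = 2 ** np.arange(A.shape[1] - 1, -1, -1)
row_val = A @ weights

# Indirect sort: indices that order the rows by descending row_val
order = np.argsort(row_val, kind="stable")[::-1]
A_sorted = A[order]

print(padded, row_val, order)
print(A_sorted)
```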
