I have an array of length approximately 12000, something like array([0.3, 0.6, 0.3, 0.5, 0.1, 0.9, 0.4, ...]). I also have a dataframe column with values like 2, 3, 7, 3, 2, 7, .... The column has length 48 and its values sum to 36.
I want to split the array into consecutive chunks whose sizes are proportional to those values. For example, the first value in the column (= 2) gets its own chunk of 12000 * (2/36) elements (e.g. [0.3, 0.6, 0.3]), the second value (= 3) gets a chunk of 12000 * (3/36) elements that continues where the first chunk left off (e.g. [0.5, 0.1, 0.9, 0.4]), and so on.
import pandas as pd
import numpy as np
# mock some data
a = np.random.random(12000)
df = pd.DataFrame({'col': np.random.randint(1, 5, 48)})
indices = (len(a) * df.col.to_numpy() / sum(df.col)).cumsum()
indices = np.concatenate(([0], indices)).round().astype(int)
res = []
for s, e in zip(indices[:-1], indices[1:]):
    res.append(a[s:e])
# some tests
target_pcts = df.col.to_numpy() / sum(df.col)
realized_pcts = np.array([len(sl) / len(a) for sl in res])
ratios = target_pcts / realized_pcts
assert 0.99 < np.min(ratios) and np.max(ratios) < 1.01
assert (np.concatenate(res) == a).all()
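As a side note, once indices is computed, the per-chunk slicing can be done in a single call with np.split; a minimal sketch, equivalent to the loop above:
# np.split wants only the interior split points, so drop the leading 0
# and the final cumulative total from indices
res = np.split(a, indices[1:-1])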
I need to perform something similar to the built-in torch.argmax() function on a one-dimensional tensor, but instead of picking the index of the first of the maximum values, I want to be able to pick a random index of one of the maximum values. For example:
my_tensor = torch.tensor([0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.1])
index_1 = random_max_val_index_fn(my_tensor)
index_2 = random_max_val_index_fn(my_tensor)
print(f"{index_1}, {index_2}")
> 5, 1
You can get the indexes of all the maximums first and then choose randomly from them:
import numpy as np
import torch

def rand_argmax(tens):
    max_inds, = torch.where(tens == tens.max())
    return np.random.choice(max_inds)
sample runs:
>>> my_tensor = torch.tensor([0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.1])
>>> rand_argmax(my_tensor)
2
>>> rand_argmax(my_tensor)
5
>>> rand_argmax(my_tensor)
2
>>> rand_argmax(my_tensor)
1
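If you need to stay entirely in PyTorch (for example, for tensors on a GPU, where passing to np.random.choice would require a round-trip to the CPU), a pure-torch variant might look like this; a sketch, with rand_argmax_torch as a hypothetical name:
def rand_argmax_torch(tens):
    # indices of every position holding the maximum value
    max_inds, = torch.where(tens == tens.max())
    # pick one of them uniformly at random
    return max_inds[torch.randint(len(max_inds), (1,))].item()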
I think this should work:
import numpy as np
import torch
your_tensor = torch.tensor([0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.1])
argmaxes = np.argwhere(your_tensor==torch.max(your_tensor)).flatten()
rand_argmax = np.random.choice(argmaxes)
print(rand_argmax)
Note that np.random.choice samples with replacement by default; pass replace=False if you draw more than one index at a time and need them to be distinct.
For a machine learning task, I'm applying the Parzen window algorithm.
I have an array of shape (m, n). I would like to check on each row whether any of the values is > 0.5, and if each of them is, I would return 0, otherwise 1.
I would like to know if there is a way to do this without a loop, using numpy.
You can use np.all with axis=1 on a boolean array.
import numpy as np
arr = np.array([[0.8, 0.9], [0.1, 0.6], [0.2, 0.3]])
print(np.all(arr>0.5, axis=1))
>> [True False False]
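To map that boolean mask onto the 0/1 values the question asks for (0 where every element of the row exceeds 0.5, 1 otherwise), one option is the following sketch:
result = np.where(np.all(arr > 0.5, axis=1), 0, 1)
print(result)
# [0 1 1]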
import numpy as np
# Value Initialization
a = np.array([0.75, 0.25, 0.50])
# If a value is greater than 0.5 the result is 1, otherwise 0
y_predict = (a > 0.5).astype(float)
I have an array of shape (m, n). I would like to check on each row whether any of the values is > 0.5
That will be stored in b:
import numpy as np
a = # some np.array of shape (m,n)
b = np.any(a > 0.5, axis=1)
and if each of them is, I would return 0, otherwise 1.
I'm assuming you mean 'and if this is the case for all rows'. In this case:
c = 1 - 1 * np.all(b)  # multiplying by 1 turns the boolean into an int
c contains your return value, either 0 or 1.
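Equivalently, a slightly more readable sketch:
c = int(not np.all(b))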
I want to generate a random 0-1 sequence. Currently I generate the numbers one by one. My code is as follows:
p_arr = [0.1, 0.5, 0.3, 0.8]
seq = []
for pb in p_arr:
    seq.append(np.random.choice(2, 1, p=[1-pb, pb]))
This is very time-consuming when p_arr is long (e.g. 10000 elements). I wonder if there is a faster way to do this.
If you want to use Numpy, do:
import numpy as np
p_arr = [0.1, 0.5, 0.3, 0.8]
seq = (np.random.rand(len(p_arr)) > p_arr).astype(np.uint8)
print(seq)
If you don't want to use Numpy then you can still make your code simpler:
import random
p_arr = [0.1, 0.5, 0.3, 0.8]
seq = []
for pb in p_arr:
    seq.append(1 if random.random() > pb else 0)
print(seq)
Also note that in both cases I assumed your p_arr contains the probability of drawing 0 (not 1). If you want the inverse logic, replace > with < in both snippets.
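For completeness, a sketch of the NumPy version matching the original code's semantics, where each entry of p_arr is the probability of drawing 1:
import numpy as np

p_arr = [0.1, 0.5, 0.3, 0.8]
# entry i is 1 with probability p_arr[i]
seq = (np.random.rand(len(p_arr)) < p_arr).astype(np.uint8)
print(seq)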
I am trying to write a function that returns the smallest integer by which a list of floats must be multiplied so that they all become integers. I tried implementing something with the least common multiple, but I'm not sure the math checks out...
Say I have the following list (or list-like object) of float values:
example = [0.5, 0.4, 0.2, 0.1]
How could I write a function such that func(example) returns 10?
Another example would be...
example = [0.05, 0.1, 0.7, 0.8]
> func(example)
20
Since...
> 20 * np.array(example)
np.array([1, 2, 14, 16])
And all are integers.
Find the largest number of decimal places, scale the list by that power of 10, take the GCD of the resulting integers, and divide to get the minimum integer multiplier.
import numpy as np
import decimal
from math import gcd
from functools import reduce
def find_gcd(lst):
    return reduce(gcd, lst)
example = [0.05, 0.1, 0.7, 0.8, 0.9]
# the most negative exponent corresponds to the largest number of decimal places
decimal_places = min(decimal.Decimal(str(val)).as_tuple().exponent for val in example)
x1 = np.array(example)
multiplier = 10 ** -decimal_places
# round before casting to int to guard against float error (e.g. 0.29 * 100 == 28.999999999999996)
gcd_val = find_gcd(map(int, np.round(x1 * multiplier)))
min_multiplier = int(multiplier / gcd_val)
print('Minimum Integer Multiplier:', min_multiplier)
If you don't like Decimal, you can count the decimal places from the string representation instead:
example = [0.05, 0.1, 0.7, 0.8, 0.9]
n_places = max(len(str(val).split('.')[1]) for val in example)
multiplier = 10 ** n_places
x1 = np.array(example)
gcd_val = find_gcd(map(int, np.round(x1 * multiplier)))
min_multiplier = int(multiplier / gcd_val)
print('Minimum Integer Multiplier:', min_multiplier)
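Both variants print Minimum Integer Multiplier: 20 for this example: the scaled values 5, 10, 70, 80, 90 have a GCD of 5, and 100 / 5 = 20.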
If you have an upper bound max_den on plausible denominators, the fractions.Fraction class has a handy limit_denominator method.
For example:
import fractions
max_den = 1000
fractions.Fraction(1/3)
# probably not what we want
# Fraction(6004799503160661, 18014398509481984)
fractions.Fraction(1/3).limit_denominator(max_den)
# better
# Fraction(1, 3)
import sympy
example = [0.5, 0.4, 0.2, 0.1]
sympy.lcm([fractions.Fraction(x).limit_denominator(max_den).denominator for x in example])
# 10
example = [0.05, 0.1, 0.7, 0.8]
sympy.lcm([fractions.Fraction(x).limit_denominator(max_den).denominator for x in example])
# 20
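Putting this together, a sketch of the requested func (assuming all true denominators stay below max_den):
import fractions
import sympy

def func(values, max_den=1000):
    # the LCM of the (limited) denominators is the smallest
    # integer that clears every fraction
    return int(sympy.lcm([fractions.Fraction(v).limit_denominator(max_den).denominator
                          for v in values]))

print(func([0.5, 0.4, 0.2, 0.1]))   # 10
print(func([0.05, 0.1, 0.7, 0.8]))  # 20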
I have a Numpy array, and I need to find the N maximum product subarrays of M elements. For example, I have the array p = [0.1, 0.2, 0.8, 0.5, 0.7, 0.9, 0.3, 0.5] and I want to find the 5 highest product subarrays of 3 elements. Is there a "fast" way to do that?
Here is another quick way to do it:
import numpy as np
p = [0.1, 0.2, 0.8, 0.5, 0.7, 0.9, 0.3, 0.5]
n = 5
m = 3
# Cumulative product (starting with 1)
pc = np.cumprod(np.r_[1, p])
# Cumulative product of each window
w = pc[m:] / pc[:-m]
# Indices of the first element of the top n windows
# (note the kth argument must be -n so that the n largest land at the end)
idx = np.argpartition(w, -n)[-n:]
print(idx)
# e.g. [1 2 5 4 3] (order within the top n is arbitrary)
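One caveat: dividing cumulative products can underflow or lose precision for long arrays of values below 1. A sketch of a more stable variant that works in log space (assuming all entries are strictly positive):
# the sum of logs over each window equals the log of the window product
log_w = np.convolve(np.log(p), np.ones(m), mode='valid')
idx = np.argpartition(log_w, -n)[-n:]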
Approach #1
We can create sliding windows, perform a prod reduction along each window, and finally use np.argpartition to get the top N among them -
from skimage.util.shape import view_as_windows
def topN_windowed_prod(a, W, N):
    w = view_as_windows(a, W)
    return w[w.prod(1).argpartition(-N)[-N:]]
Sample run -
In [2]: p = np.array([0.1, 0.2, 0.8, 0.5, 0.7, 0.9, 0.3, 0.5])
In [3]: topN_windowed_prod(p, W=3, N=2)
Out[3]:
array([[0.8, 0.5, 0.7],
       [0.5, 0.7, 0.9]])
Note that the order is not maintained by np.argpartition. So, if we need the top N in descending order of prod values, pass range(-N, 0) as the kth argument and reverse the selected indices.
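For instance, an order-preserving sketch:
def topN_windowed_prod_sorted(a, W, N):
    w = view_as_windows(a, W)
    # kth=range(-N, 0) puts the last N positions into sorted (ascending) order
    idx = w.prod(1).argpartition(range(-N, 0))[-N:][::-1]
    return w[idx]  # rows in descending order of product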
Approach #2
For smaller window lengths, we can simply slice and get our desired result, like so -
def topN_windowed_prod_with_slicing(a, W, N):
    w = view_as_windows(a, W)
    L = len(a) - W + 1
    # build each window's product by multiplying W shifted slices
    acc = a[:L].copy()
    for i in range(1, W):
        acc *= a[i:i+L]
    idx = acc.argpartition(-N)[-N:]
    return w[idx]
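Usage mirrors Approach #1: topN_windowed_prod_with_slicing(p, W=3, N=2) selects the same two windows, again in no particular order.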