Changing groupby keys on the fly - python

I need to split a sorted list of probabilities into groups: the first group contains probabilities in (0.5, 1), the second in (0.25, 0.5), and so on.
I've produced some code that splits a list of powers of two less than 1 into two lists: one of members greater than or equal to 0.5, the other of members below 0.5.
from itertools import groupby
from operator import itemgetter
import doctest
N = 10
twos = [2**(-(i+1)) for i in range(N)]
def split_by_prob(items, cutoff):
    """
    (list of double) -> list of (lists) of double
    Splits a set into subsets based on probability

    >>> split_by_prob(twos, 0.5)
    [[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
    """
    groups = []
    for k, g in groupby(enumerate(items), lambda (j, x): x < cutoff):
        groups.append(map(itemgetter(1), g))
    return groups
Calling this code from the command line does exactly this:
>>> g = split_by_prob(twos, 0.5)
>>> g
[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
My question: how can I change the cutoff on each iteration? That is, if I passed the function a list of cutoffs (e.g. cutoffs = [0.5, 0.125, 0.0625]), I'd get a list of lists, each with the respective members of the original list grouped into the correct category. In this case the groups returned would be something like [[0.5], [0.25, 0.125], [0.0625], [0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]].

If I understand you correctly, you can just iterate over a list of cutoffs, testing x < i for each i in cutoffs.
cutoffs = [0.5, 0.125, 0.0625]

def split_by_prob(items, cutoffs):
    """
    (list of double) -> list of (lists) of double
    Splits a set into subsets based on probability
    """
    groups = []
    for i in cutoffs:
        for k, g in groupby(enumerate(items), lambda (j, x): x < i):
            groups.append(map(itemgetter(1), g))
    return groups

print split_by_prob(twos, cutoffs)
[[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625], [0.5, 0.25, 0.125], [0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625], [0.5, 0.25, 0.125, 0.0625], [0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
Note that each cutoff re-splits the full list rather than the remainder of the previous split, which is why the groups overlap.

I've figured out what I needed to do, and the full code is below. I'm not sure how efficient or Pythonic it is, however:
from itertools import groupby
from operator import itemgetter
import doctest

N = 10
twos = [2**(-(i+1)) for i in range(N)]
cutoffs = [0.5, 0.125, 0.03125]

def split_by_prob(items, cutoff, groups):
    """
    (list of double) -> list of (lists) of double
    Splits a set into subsets based on probability

    >>> split_by_prob(twos, 0.5, [])
    [[0.5], [0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
    """
    for k, g in groupby(enumerate(items), lambda (j, x): x < cutoff):
        groups.append(map(itemgetter(1), g))
    return groups

def split_into_groups(items, cutoffs):
    """
    (list of double) -> list of (lists) of double
    Splits a set into subsets based on probability

    >>> split_into_groups(twos, cutoffs)
    [[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]
    """
    groups = items
    final = []
    for i in cutoffs:
        groups = split_by_prob(groups, i, [])
        final.append(groups[0])  # keep the sublist above the current cutoff
        groups = groups.pop()    # keep splitting the remainder
    final.append(groups)
    return final
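For reference, here is a shorter alternative (my own sketch, not part of the original thread): since both the items and the cutoffs are sorted in descending order, each item can be placed directly into its bucket with bisect instead of repeated groupby passes. The helper name split_desc is mine.
from bisect import bisect_right

def split_desc(items, cutoffs):
    # One bucket per cutoff plus one for everything below the last cutoff.
    # An item equal to a cutoff lands in the bucket above it, matching the
    # x < cutoff test used above.
    asc = sorted(cutoffs)  # bisect needs ascending order
    buckets = [[] for _ in range(len(cutoffs) + 1)]
    for x in items:
        buckets[len(cutoffs) - bisect_right(asc, x)].append(x)
    return buckets

print(split_desc(twos, [0.5, 0.125, 0.03125]))
# [[0.5], [0.25, 0.125], [0.0625, 0.03125], [0.015625, 0.0078125, 0.00390625, 0.001953125, 0.0009765625]]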

Related

Getting values from a CDF

Good morning, everyone. I have a set of values.
Arr = np.array([0.11, 0.14, 0.22, 0.26, 0.31, 0.36, 0.44, 0.69, 0.70, 0.70, 0.70, 0.75, 0.98, 1.40])
I have constructed the CDF function in this way:
import numpy as np
import matplotlib.pyplot as plt

def ecdf(a):
    x, counts = np.unique(a, return_counts=True)
    cusum = np.cumsum(counts)
    return x, cusum / cusum[-1]

def plot_ecdf(a):
    x, y = ecdf(a)
    x = np.insert(x, 0, x[0])
    y = np.insert(y, 0, 0.)
    plt.plot(x, y, drawstyle='steps-post')
    plt.grid(True)

plot_ecdf(Arr)
Obtaining this figure (a step plot of the empirical CDF; the image is not included here):
Now I want to divide the space (y-axis) into 5 parts. To do this I am using the following function:
from scipy.stats.qmc import LatinHypercube
engine = LatinHypercube(d=1)
sample = engine.random(n=5) #Array of float64
For example, obtaining 5 randomly generated values:
0.0886183
0.450613
0.808077
0.753524
0.343108
At this point I would like to recover the x-values on the CDF that correspond to these sampled y-values (as in the picture).
I also observed that the CDF constructed this way takes a discrete set of values, which may not be optimal for my purpose.
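This question has no answer in the thread, so here is a minimal sketch of one common approach, assuming the goal is to invert the empirical CDF: np.searchsorted maps each sampled probability to the smallest x whose cumulative probability reaches it. The function name inverse_ecdf is mine.
import numpy as np

def inverse_ecdf(a, probs):
    # map each probability p to the smallest x with ECDF(x) >= p
    x, counts = np.unique(a, return_counts=True)
    y = np.cumsum(counts) / counts.sum()
    idx = np.searchsorted(y, probs, side='left')
    return x[np.clip(idx, 0, len(x) - 1)]

print(inverse_ecdf(Arr, [0.0886183, 0.450613, 0.808077]))
# [0.14 0.44 0.75]
Because the ECDF is a step function, several sampled probabilities can map to the same x; interpolating between the step corners (e.g. with np.interp) gives a continuous approximation if the discreteness is a problem.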

Distribute values based on sum and list of provided values

I need to generate a list of values from the provided ones that satisfies these requirements:
The sum of all generated values should equal total; only providedValues may be used to reach that sum; providedValues and total can be any doubles.
For example:
total = 1.0
providedValues = [0.5, 0.25]
Values in the output list should be randomly ordered; for example, the output can be [0.5, 0.25, 0.25], [0.25, 0.5, 0.25] or [0.25, 0.25, 0.5].
If the sum can't equal total:
total = 1.0
providedValues = [0.3]
the algorithm should throw an error.
The implementation language doesn't matter much; I'll try to read any.
This algorithm will return all the possible combinations that sum to total (it returns an empty list, rather than throwing an error, when no combination exists).
import itertools
import numpy as np

def find_combination(total, providedValues):
    i = 1
    rv = []
    while True:
        combs = list(itertools.combinations_with_replacement(providedValues, i))
        validCombs = [comb for comb in combs if np.isclose(sum(comb), total)]
        if validCombs:
            rv.extend(validCombs)
        elif not [comb for comb in combs if sum(comb) <= total]:
            # every length-i combination already exceeds total: stop searching
            return rv
        i += 1
Output:
>>> find_combination(1.0, [0.5, 0.25])
[(0.5, 0.5), (0.5, 0.25, 0.25), (0.25, 0.25, 0.25, 0.25)]
>>> find_combination(1.0, [0.3])
[]
If you want to get all permutations of the results, you can use
>>> set(itertools.permutations((0.5, 0.25, 0.25)))
{(0.25, 0.25, 0.5), (0.25, 0.5, 0.25), (0.5, 0.25, 0.25)}
For example:
>>> set(y for x in find_combination(1.0, [0.5, 0.25]) for y in itertools.permutations(x))
{(0.25, 0.25, 0.25, 0.25),
(0.25, 0.25, 0.5),
(0.25, 0.5, 0.25),
(0.5, 0.25, 0.25),
(0.5, 0.5)}
Here is my solution, assuming two provided values; you may want to adapt it to your needs.
from itertools import permutations

def get_scala(x, y, t):
    # get list of scala combinations:
    # find all (a, b) such that a*x + b*y == total
    scala_list = []
    amax = int(t // x)  # largest possible count of x
    bmax = int(t // y)  # largest possible count of y
    for i in range(1, amax + 1):
        for j in range(1, bmax + 1):
            if i*x + j*y == t:  # found a count combination that == total
                scala_list.append((i, j))
    if scala_list:
        return scala_list
    else:
        print("Warning: cannot add up to the total")

def dist(x, y, scala):
    a, b = scala
    # base list with a copies of x and b copies of y, e.g. [x, x, y, y, y]
    bl = [x]*a + [y]*b
    # permutations, with set() to get rid of duplicate orderings
    return set(permutations(bl))

for l in get_scala(0.3, 0.2, 1):
    for d in dist(0.3, 0.2, l):
        print(d)
The output would look like:
(0.2, 0.3, 0.2, 0.3)
(0.2, 0.2, 0.3, 0.3)
(0.3, 0.2, 0.2, 0.3)
(0.3, 0.2, 0.3, 0.2)
(0.3, 0.3, 0.2, 0.2)
(0.2, 0.3, 0.3, 0.2)
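One caveat about the sketch above (my note, not from the original answer): i*x + j*y == t compares floats exactly, which happens to work for 0.3 and 0.2 but will miss valid combinations for many other doubles. A tolerant variant using math.isclose:
import math

def get_scala_safe(x, y, t):
    # same search as get_scala, but tolerant of floating-point error
    return [(i, j)
            for i in range(1, int(t // x) + 1)
            for j in range(1, int(t // y) + 1)
            if math.isclose(i*x + j*y, t)]

print(get_scala_safe(0.3, 0.2, 1))  # [(2, 2)]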

How to generate list of floats in descending order that sum to 1?

I want to generate a list of floats of size M, where each item is greater than the items that follow it (i.e. descending order), and the list must sum to 1. Also, for the same size M, can I generate more than one list that obeys these constraints?
I'm thinking of an equation of the following form:
Xi+1 = compute([Xi, Xi-1, ..., X0], M, Random)
But I am not able to figure out what this function should look like. Thank you in advance.
Okay, so let's pick 10 random numbers from 0 to 10 and sort them. Then compute the sum and rebuild a new list with each element divided by that sum:
import random
# create a non-normalized ascending list of numbers
lst = sorted(random.uniform(0,10) for _ in range(10))
# compute the sum
temp_sum = sum(lst)
# now divide each member by the sum to normalize the list
lst = [i/temp_sum for i in lst]
print(lst,sum(lst))
one output could be:
[0.0340212528820301, 0.05665995400192079, 0.07733861892990018,
0.07752841352220373, 0.08556431469182045, 0.11628857362899164,
0.11706017358757258, 0.12523809404875455, 0.14272942597136748,
0.16757117873543856] 1.0
The sum may not be exactly 1 because of floating-point inaccuracy, but it will be very close. Note that the list above is ascending; see the tweak below for descending order.
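Since the question asks for descending order, a small tweak to the snippet above (my adjustment, not part of the original answer) sorts in reverse before normalizing:
import random

# create a non-normalized descending list of numbers
lst = sorted((random.uniform(0, 10) for _ in range(10)), reverse=True)
# divide each member by the sum to normalize the list
total = sum(lst)
lst = [i / total for i in lst]
print(lst, sum(lst))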
If you want something that is mathematically predictable...
def makeDescendingUnitArray(length: int):
    if (not isinstance(length, int)) or (length < 1):
        raise ValueError("Array Length must be an int with a value of at least 1")
    if length == 1:
        return [1]
    else:
        constant = 1
        output = list()
        for x in range(length - 2):
            constant /= 2
            output.append(constant)
        return output + [2*constant/3, constant/3]

for arrayLength in range(1, 10):
    array = makeDescendingUnitArray(arrayLength)
    print(array)
Produces the following arrays. (The first length-2 entries are the geometric series 1/2, 1/4, ..., summing to 1 - 2^-(length-2), and the final two entries split the remaining 2^-(length-2) in a 2:1 ratio, so each array sums to exactly 1 while staying descending.)
[1]
[0.6666666666666666, 0.3333333333333333]
[0.5, 0.3333333333333333, 0.16666666666666666]
[0.5, 0.25, 0.16666666666666666, 0.08333333333333333]
[0.5, 0.25, 0.125, 0.08333333333333333, 0.041666666666666664]
[0.5, 0.25, 0.125, 0.0625, 0.041666666666666664, 0.020833333333333332]
[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.020833333333333332, 0.010416666666666666]
[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.010416666666666666, 0.005208333333333333]
[0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.005208333333333333, 0.0026041666666666665]
If you want a mathematically predictable one-liner, then there's this...
(loop to show you what it looks like)
for length in range(1, 10):
    array = [2*x/(length * (length + 1)) for x in range(length, 0, -1)]
    print('Sum:', sum(array), 'Array:', array)
This produces the following output. The weights are x/T for x = length down to 1, where T = length*(length+1)/2 is the sum 1 + 2 + ... + length, so in exact arithmetic they sum to 1. Note that this is just as susceptible to floating-point rounding errors as all of the other algorithms; some algorithms are better and some worse, but at some point they'll all have some error.
Sum: 1.0 Array: [1.0]
Sum: 1.0 Array: [0.6666666666666666, 0.3333333333333333]
Sum: 0.9999999999999999 Array: [0.5, 0.3333333333333333, 0.16666666666666666]
Sum: 0.9999999999999999 Array: [0.4, 0.3, 0.2, 0.1]
Sum: 1.0 Array: [0.3333333333333333, 0.26666666666666666, 0.2, 0.13333333333333333, 0.06666666666666667]
Sum: 0.9999999999999998 Array: [0.2857142857142857, 0.23809523809523808, 0.19047619047619047, 0.14285714285714285, 0.09523809523809523, 0.047619047619047616]
Sum: 1.0 Array: [0.25, 0.21428571428571427, 0.17857142857142858, 0.14285714285714285, 0.10714285714285714, 0.07142857142857142, 0.03571428571428571]
Sum: 1.0 Array: [0.2222222222222222, 0.19444444444444445, 0.16666666666666666, 0.1388888888888889, 0.1111111111111111, 0.08333333333333333, 0.05555555555555555, 0.027777777777777776]
Sum: 0.9999999999999999 Array: [0.2, 0.17777777777777778, 0.15555555555555556, 0.13333333333333333, 0.1111111111111111, 0.08888888888888889, 0.06666666666666667, 0.044444444444444446, 0.022222222222222223]
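If exactness matters, here is a sketch (my addition, not from the original answer) of the same triangular-number weights built with fractions.Fraction, which sum to exactly 1 and are only converted to float at the end:
from fractions import Fraction

def descending_unit_fractions(n):
    # weights x / (n*(n+1)/2) for x = n, n-1, ..., 1, kept exact
    t = n * (n + 1) // 2
    return [Fraction(x, t) for x in range(n, 0, -1)]

fracs = descending_unit_fractions(4)
print(sum(fracs))                 # 1 (exactly)
print([float(f) for f in fracs])  # [0.4, 0.3, 0.2, 0.1]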

Python: for a list of lists, get mean value in each position

I have a list of lists:
list_of_lists = []
list_1 = [-1, 0.67, 0.23, 0.11]
list_2 = [-1]
list_3 = [0.54, 0.24, -1]
list_4 = [0.2, 0.85, 0.8, 0.1, 0.9]
list_of_lists.append(list_1)
list_of_lists.append(list_2)
list_of_lists.append(list_3)
list_of_lists.append(list_4)
The position is meaningful. I want to return a list that contains the mean per position, excluding -1. That is, I want:
[(0.54+0.2)/2, (0.67+0.24+0.85)/3, (0.23+0.8)/2, (0.11+0.1)/2, 0.9/1]
which is actually:
[0.37, 0.5866666666666667, 0.515, 0.10500000000000001, 0.9]
How can I do this in a pythonic way?
EDIT:
I am working with Python 2.7, and I am not looking for the mean of each list; instead, I'm looking for the mean of 'all list elements at position 0 excluding -1', and the mean of 'all list elements at position 1 excluding -1', etc.
The reason I had:
[(0.54+0.2)/2, (0.67+0.24+0.85)/3, (0.23+0.8)/2, (0.11+0.1)/2, 0.9/1]
is that the values in position 0 are -1, -1, 0.54, and 0.2, and I want to exclude -1; position 1 has 0.67, 0.24, and 0.85; position 2 has 0.23, -1, and 0.8; etc.
A solution without third-party libraries:
from itertools import zip_longest
from statistics import mean

def f(lst):
    return [mean(x for x in t if x != -1) for t in zip_longest(*lst, fillvalue=-1)]
>>> f(list_of_lists)
[0.37, 0.5866666666666667, 0.515, 0.10500000000000001, 0.9]
It uses itertools.zip_longest with fillvalue set to -1 to "transpose" the list and pad missing values with -1 (ignored at the next step). Then a generator expression and statistics.mean filter out the -1s and compute the average. (Note for the asker: the statistics module is Python 3.4+, and the function is named izip_longest in Python 2.7, so this needs adapting for 2.7.)
Here is a vectorised numpy-based solution.
import numpy as np

a = [[-1, 0.67, 0.23, 0.11],
     [-1],
     [0.54, 0.24, -1],
     [0.2, 0.85, 0.8, 0.1, 0.9]]

# first create a non-jagged numpy array, padded with -1
b = -np.ones([len(a), max(map(len, a))])
for i, j in enumerate(a):
    b[i][0:len(j)] = j

# count the -1 entries per column (for use later)
neg_count = [np.sum(b[:, i] == -1) for i in range(b.shape[1])]

# set the -1 entries to 0
b[b == -1] = 0

# calculate the means
means = [np.sum(b[:, i]) / (b.shape[0] - neg_count[i])
         if (b.shape[0] - neg_count[i]) != 0 else 0
         for i in range(b.shape[1])]

# [0.37,
#  0.58666666666666667,
#  0.51500000000000001,
#  0.10500000000000001,
#  0.90000000000000002]
You can use the pandas module for this. The code would look like this:
import numpy as np
import pandas as pd

list_1 = [-1, 0.67, 0.23, 0.11, np.nan]
list_2 = [-1, np.nan, np.nan, np.nan, np.nan]
list_3 = [0.54, 0.24, -1, np.nan, np.nan]
list_4 = [0.2, 0.85, 0.8, 0.1, 0.9]

df = pd.DataFrame({"list_1": list_1, "list_2": list_2, "list_3": list_3, "list_4": list_4})
df = df.replace(-1, np.nan)
print(list(df.mean(axis=1)))
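A slightly more direct pandas variant (my sketch, not from the thread): building the DataFrame from the jagged list of lists directly pads the short rows with NaN, so no manual padding or transposition is needed:
import numpy as np
import pandas as pd

list_of_lists = [[-1, 0.67, 0.23, 0.11],
                 [-1],
                 [0.54, 0.24, -1],
                 [0.2, 0.85, 0.8, 0.1, 0.9]]

df = pd.DataFrame(list_of_lists)             # short rows padded with NaN
means = df.replace(-1, np.nan).mean(axis=0)  # NaN entries are skipped
print(means.tolist())
# [0.37, 0.5866666666666667, 0.515, 0.10500000000000001, 0.9]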

Back-and-Forth Linspace Generator

I'm looking for a generator function that returns points along a line. For a fixed number of points k, this is simple enough and can be done with numpy as follows:
points = np.linspace(start, end, k)
However, I would like to generate the points as a sort of "increasing resolution", so that on a line from 0 to 1, the generator would yield:
1/2, 1/4, 3/4, 1/8, 3/8, 5/8, ...
Again, this is easy enough to do recursively (just take the endpoints and recurse on each half), but I'd like a generator that achieves the same thing without having to fill an array with everything from the start, and without duplicate points.
What would be the best way to do this?
A way to achieve this is by using:
def infinite_linspace():
    den = 2
    while True:
        for i in range(1, den, 2):
            yield i / den
        den <<= 1
Here we iterate with the odd numerator from 1 to den-1 inclusive, then double the denominator.
The first 15 numbers are then:
>>> from itertools import islice
>>> list(islice(infinite_linspace(), 15))
[0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875, 0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]
>>> [1/2, 1/4, 3/4, 1/8, 3/8, 5/8, 7/8, 1/16, 3/16, 5/16, 7/16, 9/16, 11/16, 13/16, 15/16]
[0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875, 0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]
We can even put more intelligence into it to obtain the i-th element relatively fast as well:
class Linspace:
    def __iter__(self):
        den = 2
        while True:
            for i in range(1, den, 2):
                yield i / den
            den <<= 1

    def __getitem__(self, idx):
        if not isinstance(idx, int):
            raise TypeError('idx should be an integer')
        if idx < 0:
            raise ValueError('idx should be positive')
        # smear the low bits of idx+1, then add 1: this yields the smallest
        # power of two strictly greater than idx+1, which is the denominator
        den = denn = idx + 1
        denn |= den >> 1
        while den != denn:
            den = denn
            denn |= denn >> 1
        denn += 1
        return (2*idx + 3 - denn) / denn
So now we can access, for instance, the 10th, 15th and 123,456th elements in logarithmic time:
>>> l = Linspace()
>>> l[9]
0.3125
>>> l[14]
0.9375
>>> l[123455]
0.8837966918945312
Here is a shorter, pseudo-O(1) way of directly computing the i-th element: with m = 2i+3 and p the largest power of two not exceeding m, the i-th element is (m - p)/p.
def jumpy(i):
    i = (i << 1) + 3
    return i / (1 << i.bit_length() - 1) - 1

list(map(jumpy, range(15)))
# [0.5, 0.25, 0.75, 0.125, 0.375, 0.625, 0.875, 0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]
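As a quick sanity check (my addition, assuming both definitions above are in scope), the closed form agrees with the generator:
from itertools import islice

# jumpy(i) and the generator should produce identical prefixes
assert list(map(jumpy, range(1000))) == list(islice(infinite_linspace(), 1000))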
