I have a problem that I'm not sure how to solve properly.
Suppose we have to generate 1 <= n <= 40 numbers: X[1], X[2], ..., X[n].
For each number, we have some discrete space we can draw a number from. This space is not always a range and can be quite large (thousands/millions of numbers).
Another constraint is that the resulting array of numbers should be sorted in ascending order: X[1] <= X[2] <= ... <= X[n].
As an example for three numbers:
X[1] in {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}
X[2] in {10, 20, 30, 50}
X[3] in {1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003}
Examples of valid outputs for this test: [9, 20, 2001], [18, 30, 1995]
Example of an invalid output for this test: [25, 10, 1998] (not in ascending order)
I have already tried several methods, but none of them yields uniformly distributed results: every solution has a strong bias, and some samples are underrepresented.
One method is to generate the numbers one by one and, at each step, shrink the space for the remaining numbers so that the ordering condition still holds. This is bad because it always biases the later numbers towards the higher end of their possible range.
I have given up on finding an exact solution that samples uniformly. I would really appreciate any reasonable solution (preferably in Python, but anything will do, really).
I won't code it for you, but here's the logic for the non-brute-force approach:
Let N(i,x) be the number of possible samples of X[1],...,X[i] where X[i]=x, and let S(i) be the set of possible values for X[i]. The base case is N(1,x) = 1 for every x in S(1), and for i > 1 you have the recursion N(i,x) = Sum over y in S(i-1) with y<=x of N(i-1,y). This allows you to compute all N(i,x) very quickly. It is then easy to build up your sample from the end:
Knowing all N(n,x), you draw X[n] from S(n) with probability N(n,X[n]) / (Sum over x in S(n) of N(n,x))
And then you keep building down: given that you have already drawn X[n],X[n-1],...,X[i+1], you draw X[i] from S(i) with X[i]<=X[i+1] with probability N(i,X[i]) / (Sum over x in S(i) with x<=X[i+1] of N(i,x))
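The recursion and the backward draw described above can be sketched in Python as follows. This is only one possible reading of the answer, not the answerer's own code: the function name sample_nondecreasing, the rng parameter and the use of prefix sums with bisect are assumptions made for illustration.

```python
import bisect
import random
from itertools import accumulate

def sample_nondecreasing(sets, rng=random):
    """Draw X[1] <= ... <= X[n] uniformly among all valid tuples (sketch)."""
    values = [sorted(s) for s in sets]
    # prefix[i][k] = sum of N(i, x) over the k smallest values x of set i,
    # so prefix[i][0] == 0 and prefix[i][-1] is the number of valid prefixes.
    prefix = []
    for i, vals in enumerate(values):
        if i == 0:
            counts = [1] * len(vals)  # base case: N(1, x) = 1
        else:
            prev_vals, prev_prefix = values[i - 1], prefix[i - 1]
            # N(i, x) = sum of N(i-1, y) over y <= x, read off the prefix sums
            counts = [prev_prefix[bisect.bisect_right(prev_vals, x)] for x in vals]
        prefix.append([0] + list(accumulate(counts)))
    if not prefix or prefix[-1][-1] == 0:
        raise ValueError("no non-decreasing tuple exists for these sets")

    result = []
    upper = None  # value already drawn for the next position, if any
    for i in range(len(values) - 1, -1, -1):
        vals, pre = values[i], prefix[i]
        # k = how many values of this set are still allowed (x <= upper)
        k = len(vals) if upper is None else bisect.bisect_right(vals, upper)
        # Pick one of the pre[k] admissible prefixes uniformly, then locate
        # the value it ends with; this weights each x by N(i, x).
        r = rng.randrange(pre[k])
        idx = bisect.bisect_right(pre, r) - 1
        upper = vals[idx]
        result.append(upper)
    result.reverse()
    return result

example = [set(range(8, 32)), {10, 20, 30, 50}, set(range(1995, 2004))]
print(sample_nondecreasing(example))
```

Multiplying the conditional probabilities along a tuple makes the N terms cancel, so every valid tuple comes out with probability 1 / (total number of valid tuples). Python integers do not overflow, so the counts stay exact even for n = 40.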
Here is an implementation of the heuristic I suggested in the comments:
import random

def rand_increasing(sets):
    #assume: sets is list of sets
    sets = [s.copy() for s in sets]
    n = len(sets)
    indices = list(range(n))
    random.shuffle(indices)
    chosen = [0]*n
    for i,k in enumerate(indices):
        chosen[k] = random.choice(list(sets[k]))
        for j in indices[(i+1):]:
            if j > k:
                sets[j] = {x for x in sets[j] if x > chosen[k]}
            else:
                sets[j] = {x for x in sets[j] if x < chosen[k]}
    return chosen

#test:
sets = [{8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31},
        {10, 20, 30, 50},
        {1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003}]
for _ in range(10):
    print(rand_increasing(sets))
Typical output:
[24, 50, 1996]
[26, 30, 2001]
[17, 30, 1995]
[11, 20, 2000]
[12, 20, 1996]
[11, 50, 2003]
[14, 20, 2002]
[9, 10, 2001]
[8, 30, 1999]
[8, 10, 1998]
Of course, if you can get uniform sampling with Julien's approach, that is preferable. (This heuristic might give a uniform distribution, but that would require proof.) Note that the code uses strict comparisons, so it never returns equal neighbours even though the question allows them. Also note that poor choices in the earlier stages might leave some of the later sets in the permutation empty, in which case random.choice raises an error. The function could be called in a loop with proper error trapping, yielding a hit-or-miss approach.
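The hit-or-miss loop mentioned above might look like the sketch below. It restates the heuristic so that the snippet runs on its own; the wrapper name rand_increasing_retry and the max_attempts parameter are hypothetical. It relies on random.choice raising IndexError for an empty sequence. Retrying only avoids the crash and is not claimed to make the distribution uniform.

```python
import random

def rand_increasing(sets):
    # Same heuristic as above: fix positions in random order and prune
    # the remaining sets after each choice.
    sets = [set(s) for s in sets]
    order = list(range(len(sets)))
    random.shuffle(order)
    chosen = [None] * len(sets)
    for i, k in enumerate(order):
        # Raises IndexError when earlier choices emptied sets[k]
        chosen[k] = random.choice(list(sets[k]))
        for j in order[i + 1:]:
            if j > k:
                sets[j] = {x for x in sets[j] if x > chosen[k]}
            else:
                sets[j] = {x for x in sets[j] if x < chosen[k]}
    return chosen

def rand_increasing_retry(sets, max_attempts=1000):
    # Hypothetical wrapper: retry until the heuristic succeeds or give up.
    for _ in range(max_attempts):
        try:
            return rand_increasing(sets)
        except IndexError:
            continue  # a dead end was reached; try a fresh permutation
    raise ValueError("no valid sample found in %d attempts" % max_attempts)

print(rand_increasing_retry([{1, 5}, {2, 3}]))
```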
I am tasked with finding the exact coordinate of a maximum value in a list of lists in python. This list of lists is referred to as a grid to emulate topographical coordinates.
Here is the grid, along with my code to find the maximum:
grid = [[15, 16, 18, 19, 12, 11],
        [13, 19, 23, 21, 16, 12],
        [12, 15, 17, 19, 22, 10],
        [10, 14, 16, 13, 9, 6]]
maxi = 0
for i in grid:
    for j in i:
        if j > maxi:
            maxi = j
This code finds the maximum; however, I am stuck on finding the coordinates. The output should be:
global max: (1,2) 23
Because the maximum (23) is at row index 1 and column index 2 (counting from zero).
I have tried using index and find, but they either do not work or do not accept my value as input. Any tips or help are appreciated, thank you in advance.
You can use the builtin function enumerate.
Update your code to this:
grid = [[15, 16, 18, 19, 12, 11],
        [13, 19, 23, 21, 16, 12],
        [12, 15, 17, 19, 22, 10],
        [10, 14, 16, 13, 9, 6]]
maxi = -float('inf')
maxCoord = None
for i, row in enumerate(grid):
    for j, col in enumerate(row):
        if col > maxi:
            maxi = col
            maxCoord = (i, j)
print(maxCoord, maxi) #(1, 2) 23
Enumerate could be an option, as already proposed. If you want to keep your original loop for finding the max value, you can look up its coordinates afterwards using:
for sublist in grid:
    if maxi in sublist:
        print(grid.index(sublist), sublist.index(maxi))
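For comparison, the same lookup can be written as a single max() call over all coordinates. This is a minimal sketch, and the helper name argmax_2d is made up for illustration. On ties, max() returns the first maximal coordinate in row-major order.

```python
def argmax_2d(grid):
    # Generate every (row, column) pair, then let max() compare them
    # by the grid value stored at that position.
    coords = ((i, j) for i, row in enumerate(grid) for j, _ in enumerate(row))
    return max(coords, key=lambda ij: grid[ij[0]][ij[1]])

grid = [[15, 16, 18, 19, 12, 11],
        [13, 19, 23, 21, 16, 12],
        [12, 15, 17, 19, 22, 10],
        [10, 14, 16, 13, 9, 6]]

i, j = argmax_2d(grid)
print("global max:", (i, j), grid[i][j])
```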
So I am trying to create a function that, given a list of numbers, tells me how many pairs of socks there are. E.g. for the list [10, 20, 20, 10, 10, 30, 50, 10, 20], it should tell me there are 3 pairs: 10 with 10, 20 with 20, and 10 with 10 again, with 30, 50 and 20 left over. It is enough for the answer to simply be '3'!
So this is my code so far, where
n: the number of socks in the pile
ar: the colors of each sock
def sockMerchant(n, ar):
    ar = [10, 20, 20, 10, 10, 30, 50, 10, 20]
    n = len(ar)
    pair = []
    for i in ar:
        if ar.count >= 2 and (ar.count % 2) == 0:
            pair.append(i)
        if ar.count < 0:
            return False
    return (n,ar)
print(len(pair))
However, the code is not quite there yet. Am I making a mistake in how I call the function? And how is my approach of first testing whether the number appears at least twice and an even number of times, to check for pairs? Please do advise me!
A simple approach would be to count the occurrences of each number in a dictionary, and then sum the number of pairs per number, where each pair uses up two occurrences.
More specifically, you can sum() up the pairs from a collections.Counter() object. Remember that we use // for floor division to round down to the correct number of pairs.
Sample Implementation
from collections import Counter

def sum_pairs(lst):
    return sum(v // 2 for v in Counter(lst).values())
Tests
>>> sum_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20, 20])
4
>>> sum_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20, 20, 20])
4
>>> sum_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20])
3
>>> sum_pairs([10, 20, 30, 50])
0
>>> sum_pairs([10, 20, 30, 50, 10])
1
Note: Just for clarity, Counter is a subclass of dict. It's the simplest way to count items from a list.
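To make the dictionary-counting idea concrete without Counter, here is a minimal sketch using a plain dict. The function name count_pairs is only illustrative.

```python
def count_pairs(socks):
    # Tally how often each colour occurs
    counts = {}
    for color in socks:
        counts[color] = counts.get(color, 0) + 1
    # Each colour contributes one pair per two socks; leftovers are dropped
    return sum(count // 2 for count in counts.values())

print(count_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20]))
```

For the list in the question, 10 occurs four times (two pairs) and 20 occurs three times (one pair), which gives the expected 3.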
I have a list:
list_of_sets = [{0, 1, 2}, {0}]
I want to calculate the intersection between the elements of the list. I have thought about this solution:
a = list_of_sets[0]
b = list_of_sets[1]
c = set.intersection(a,b)
This solution works because I know the number of elements in the list, so I can declare as many variables as I need (a, b, etc.).
My problem is that I can't figure out a solution for the other case, where the number of elements in the list is unknown.
N.B.: I have already considered counting the elements of the list with a loop and then creating variables according to the result. However, I have to keep my code in a function (whose argument is list_of_sets), so I need a more general solution that works for a list of any length.
Edit 1:
I need a solution for all the elements of the list (not pairwise or for 3 or 4 elements).
If you want the intersection of all the sets in all_sets:
intersection = set.intersection(*all_sets)
Here all_sets is a list of sets, and set is the built-in set type.
For pairwise calculations,
the following computes the intersections of all unordered pairs of sets from the list all_sets. Should you need triples, use 3 as the second argument to combinations.
from itertools import combinations, starmap
all_intersections = starmap(set.intersection, combinations(all_sets, 2))
If you need the sets a, b themselves for calculations, then:
for a, b in combinations(all_sets, 2):
    pass  # do whatever with a, b
You want the intersection of all the sets. Then:
list_of_sets[0].intersection(*list_of_sets[1:])
Should work.
Take the first set from the list and then intersect it with the rest (unpack the list with the *).
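Both one-liners above fail on an empty list: set.intersection(*[]) raises TypeError and list_of_sets[0] raises IndexError. A minimal sketch of a wrapper function follows. The name intersect_all and the choice to return an empty set for an empty list are assumptions, since the question does not say what should happen in that case.

```python
def intersect_all(list_of_sets):
    # Assumed convention: the intersection of no sets is the empty set
    if not list_of_sets:
        return set()
    # Unpack the list so every set becomes one argument
    return set.intersection(*list_of_sets)

print(intersect_all([{0, 1, 2}, {0}]))
print(intersect_all([]))
```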
You can use reduce for this. If you're using Python 3 you will have to import it from functools. Here's a short demo:
#!/usr/bin/env python
n = 30
m = 5
#Find sets of numbers i: 1 <= i <= n that are not divisible by j, for each j: 2 <= j <= m
list_of_sets = [set(i for i in range(1, n+1) if i % j) for j in range(2, m+1)]
print 'Sets in list_of_sets:'
for s in list_of_sets:
    print s
print
#Get intersection of all the sets
print 'Numbers less than or equal to %d that are coprime to it:' % n
print reduce(set.intersection, list_of_sets)
output
Sets in list_of_sets:
set([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])
set([1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29])
set([1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30])
set([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29])
Numbers less than or equal to 30 that are coprime to it:
set([1, 7, 11, 13, 17, 19, 23, 29])
Actually, we don't even need reduce() for this, we can simply do
set.intersection(*list_of_sets)
I need to create a list of groups of items, grouped so that the sum of the negative logarithms of the probabilities is roughly 1.
So far I've come up with
probs = np.random.dirichlet(np.ones(50)*100.,size=1).tolist()
logs = [-1 * math.log(1-x,2) for x in probs[0]]
zipped = zip(range(0,50), logs)
for key, igroup in iter.groupby(zipped, lambda x: x[1] < 1):
    print(list(igroup))
I.e. I create a list of random numbers, take their negative logarithms, then zip these probabilities together with the item number.
I then want to create groups by adding together the numbers in the second column of the tuple until the sum is 1 (or slightly above it).
I've tried:
for key, igroup in iter.groupby(zipped, lambda x: x[1]):
    for thing in igroup:
        print(list(iter.takewhile(lambda x: x < 1, iter.accumulate(igroup))))
and various other variations on using itertools.accumulate, but I can't get it to work.
Does anyone have an idea of what could be going wrong? (I think I'm doing too much work.)
Ideally, the output should be something like
groups = [[1,2,3], [4,5], [6,7,8,9]]
etc., i.e. these are the groups which satisfy this property.
Using numpy.ufunc.accumulate and a simple loop:
import numpy as np

def group(xs, start=1):
    last_sum = 0
    for stop, acc in enumerate(np.add.accumulate(xs), start):
        if acc - last_sum >= 1:
            yield list(range(start, stop))
            last_sum = acc
            start = stop
    # stop is the index of the last item, so include it in the trailing group
    if start <= stop:
        yield list(range(start, stop + 1))

probs = np.random.dirichlet(np.ones(50) * 100, size=1)
logs = -np.log2(1 - probs[0])
print(list(group(logs)))
Sample output:
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]]
ALTERNATIVE
Using numpy.searchsorted:
def group(xs, idx_start=1):
    xs = np.add.accumulate(xs)
    idxs = np.searchsorted(xs, np.arange(xs[-1]) + 1, side='left').tolist()
    return [list(range(i+idx_start, j+idx_start)) for i, j in zip([0] + idxs, idxs)]
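The question asks for groups whose sum reaches 1 "or slightly above it". Read literally, that suggests closing a group as soon as its own running sum reaches the threshold, with the item that crosses it included. The plain-Python sketch below follows that reading, which is an assumption. It differs from the two NumPy versions above, where the crossing item opens the next group. The name group_until_threshold and its parameters are hypothetical.

```python
def group_until_threshold(values, threshold=1.0, first_index=1):
    groups = []
    current = []   # item numbers collected for the group being built
    running = 0.0  # sum of the values in the current group only
    for index, value in enumerate(values, first_index):
        current.append(index)
        running += value
        if running >= threshold:
            # The crossing item stays in this group; start a fresh one
            groups.append(current)
            current, running = [], 0.0
    if current:
        groups.append(current)  # leftover items that never reached the threshold
    return groups

print(group_until_threshold([0.5, 0.6, 0.3, 0.3, 0.5, 0.2]))
```

With these six values the first two items sum to 1.1, the next three to 1.1, and the last item is left over in a group of its own.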
range(5, 15) [1, 1, 5, 6, 10, 10, 10, 11, 17, 28]
range(6, 24) [4, 10, 10, 10, 15, 16, 18, 20, 24, 30]
range(7, 41) [9, 18, 19, 23, 23, 26, 28, 40, 42, 44]
range(11, 49) [9, 23, 24, 27, 29, 31, 43, 44, 45, 45]
range(38, 50) [1, 40, 41, 42, 44, 48, 49, 49, 49, 50]
I get the above output from a print call in a function. What I really want is a combined list of the range and the other numbers, for example for the top line 5,6,7...15,1,1,5,6 etc.
The output range comes from
range_draws=range(int(lower),int(upper))
which I naively thought would give a range. The other numbers come from a sliced list.
Could someone help me to get the desired result?
In Python 3, the range() function returns a special range object to save on memory (no need to keep all the numbers in memory when only the start, stop and step size will do). Convert it to a list to 'expand' it:
list(yourrange) + otherlist
To quote the documentation:
The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).
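Applied to the first line of the output in the question, the conversion might look like this. The variable names and the concrete bounds are only illustrative. Note that range(5, 15) stops at 14, because the upper bound is exclusive.

```python
lower, upper = 5, 15  # illustrative bounds, taken from the first output line
other_numbers = [1, 1, 5, 6, 10, 10, 10, 11, 17, 28]

# Expand the range object into a list before concatenating
combined = list(range(int(lower), int(upper))) + other_numbers
print(combined)

# Equivalent spelling with unpacking
combined_alt = [*range(int(lower), int(upper)), *other_numbers]
```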