Community of Stackoverflow:
I'm trying to build a list of sublists in a loop by randomly sampling values from another list. No sublist may contain a duplicate, or a value that was already added to an earlier sublist.
Let's say (example) I have a main list:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
#I get:
[[1,13],[4,1],[8,13]]
#I WANT:
[[1,13],[4,9],[8,14]] #(no duplicates when checking previous sublists)
The real code that I thought would work is the following (as a draft):
matrixvals=list(matrix.index.values) #list where values are obtained
lists=[[]for e in range(0,3)] #list of sublists that I want to feed
vls=[] #stores the values that have been added to prevent adding them again
for e in lists: #initiate main loop
    for i in range(0,5): #each sublist will contain 5 different random samples
        x=random.sample(matrixvals,1) #it doesn't matter if the samples are 1 or 2
        if any(x) not in vls: #if the sample isn't in the evaluation list
            vls.extend(x)
            e.append(x)
        else: #if it IS, then do a sample but without those already added values (line below)
            x=random.sample([matrixvals[:].remove(x) for x in vls],1)
            vls.extend(x)
            e.append(x)
print(lists)
print(vls)
It didn't work as I get the following:
[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [13]], [[11], [7], [13], [17], [25]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 13, 11, 7, 13, 17, 25]
As you can see, number 13 is repeated 3 times, and I don't understand why
I would want:
[[[25], [16], [15], [31], [17]], [[4], [2], [13], [42], [70]], [[11], [7], [100], [18], [27]]]
[25, 16, 15, 31, 17, 4, 2, 13, 42, 70, 11, 7, 100, 18, 27] #no dups
In addition, is there a way to get the random.sample results as plain values instead of lists? (to obtain):
[[25, 16, 15, 31, 17], [4, 2, 13, 42, 70], [11, 7, 100, 18, 27]]
Also, the final result isn't really a list of sublists but a dictionary (the code above is a draft attempt to solve the dict problem). Is there a way to apply the same method to a dict? With my present code I get the following results:
{'1stkey': {'1stsubkey': {'list1': [41,
40,
22,
28,
26,
14,
41,
15,
40,
33],
'list2': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33],
'list3': [41, 40, 22, 28, 26, 14, 41, 15, 40, 33]},
'2ndsubkey': {'list1': [21,
7,
31,
12,
8,
22,
27,...}
Instead of that result, I would want the following:
{'1stkey': {'1stsubkey': {'list1': [41,40,22],
'list2': [28, 26, 14],
'list3': [41, 15, 40, 33]},
'2ndsubkey': {'list1': [21,7,31],
'list2':[12,8,22],
'list3':[27...,...}#and so on
Is there a way to solve both the list and the dict problem? Any help will be much appreciated; even a solution to the list problem alone would let me make some progress.
Thanks to all
I realize you may be more interested in finding out why your particular approach isn't working. However, if I've understood your desired behavior, I may be able to offer an alternative solution. After posting my answer, I will take a look at your attempt.
random.sample lets you sample k number of items from a population (collection, list, whatever.) If there are no repeated elements in the collection, then you're guaranteed to have no repeats in your random sample:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_samples = 4
print(sample(pool, k=num_samples))
Possible output:
[9, 11, 8, 7]
>>>
It doesn't matter how many times you run this snippet, you will never have repeated elements in your random sample. This is because random.sample doesn't generate random objects, it just randomly picks items which already exist in a collection. This is the same approach you would take when drawing random cards from a deck of cards, or drawing lottery numbers, for example.
In your case, pool is the pool of possible unique numbers to choose your sample from. Your desired output seems to be a list of three lists, where each sublist has two samples in it. Rather than calling random.sample three times, once for each sublist, we should call it once with k=num_sublists * num_samples_per_sublist:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_sublists = 3
samples_per_sublist = 2
num_samples = num_sublists * samples_per_sublist
assert num_samples <= len(pool)
print(sample(pool, k=num_samples))
Possible output:
[14, 10, 1, 8, 6, 3]
>>>
OK, so we have six samples rather than four. No sublists yet. Now you can simply chop this list of six samples up into three sublists of two samples each:
from random import sample
pool = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
num_sublists = 3
samples_per_sublist = 2
num_samples = num_sublists * samples_per_sublist
assert num_samples <= len(pool)
def pairwise(iterable):
    yield from zip(*[iter(iterable)]*samples_per_sublist)
print(list(pairwise(sample(pool, num_samples))))
Possible output:
[(4, 11), (12, 13), (8, 15)]
>>>
Or if you really want sublists, rather than tuples:
def pairwise(iterable):
    yield from map(list, zip(*[iter(iterable)]*samples_per_sublist))
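If you'd rather keep a loop that stays close to your original attempt, here is a minimal sketch. In your draft, `any(x) not in vls` does not test whether the sampled value is in vls: x is a one-element list, so any(x) is just True for a non-zero value, and `True not in vls` is only false once 1 has been added. The else branch also builds a list of None values, because list.remove returns None. The sketch below avoids both problems by removing each drawn value from a working copy of the pool. The names sample_sublists, remaining and seed are my own, not from your code, and it assumes the input values are already unique.

```python
import random


def sample_sublists(values, num_sublists, per_sublist, seed=None):
    """Build sublists of random values with no value used twice overall."""
    rng = random.Random(seed)      # seed is optional, only for repeatable runs
    remaining = list(values)       # working copy; assumed to hold unique values
    if num_sublists * per_sublist > len(remaining):
        raise ValueError("not enough values to fill every sublist")

    sublists = []
    for _ in range(num_sublists):
        sublist = []
        for _ in range(per_sublist):
            # Pop a random position so the value cannot be drawn again.
            index = rng.randrange(len(remaining))
            sublist.append(remaining.pop(index))
        sublists.append(sublist)
    return sublists


result = sample_sublists(range(1, 16), num_sublists=3, per_sublist=2)
print(result)
```

Because each value is appended directly (not wrapped in a list), the sublists hold plain values, which also covers the "values instead of lists" part of the question.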
EDIT - just realized that you don't actually want a list of lists, but a dictionary. Something more like this? Sorry I'm obsessed with generators, and this isn't really easy to read:
keys = ["1stkey"]
subkeys = ["1stsubkey", "2ndsubkey"]
num_lists_per_subkey = 3
num_samples_per_list = 5
num_samples = num_lists_per_subkey * num_samples_per_list
min_sample = 1
max_sample = 50
pool = list(range(min_sample, max_sample + 1))
def generate_items():

    def generate_sub_items():
        from random import sample
        samples = sample(pool, k=num_samples)

        def generate_sub_sub_items():

            def chunkwise(iterable, n=num_samples_per_list):
                yield from map(list, zip(*[iter(iterable)]*n))

            for list_num, chunk in enumerate(chunkwise(samples), start=1):
                key = f"list{list_num}"
                yield key, chunk

        for subkey in subkeys:
            yield subkey, dict(generate_sub_sub_items())

    for key in keys:
        yield key, dict(generate_sub_items())

print(dict(generate_items()))
Possible output:
{'1stkey': {'1stsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}, '2ndsubkey': {'list1': [43, 20, 4, 27, 2], 'list2': [49, 44, 18, 8, 37], 'list3': [19, 40, 9, 17, 6]}}}
>>>
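Note that in the output above both subkeys end up with the same three lists, because the samples are drawn once per key and reused for every subkey. If no value may repeat anywhere in the dictionary, one option is to draw all the samples in a single call and hand them out in order. This is only a sketch; build_nested is a hypothetical helper name, and the pool is assumed to contain unique values.

```python
from random import sample


def build_nested(keys, subkeys, lists_per_subkey, samples_per_list, pool):
    """Return {key: {subkey: {"listN": [...]}}} with no value used twice."""
    total = len(keys) * len(subkeys) * lists_per_subkey * samples_per_list
    if total > len(pool):
        raise ValueError("pool is too small for the requested structure")

    # One draw without replacement covers the whole dictionary.
    drawn = iter(sample(pool, k=total))
    return {
        key: {
            subkey: {
                f"list{n}": [next(drawn) for _ in range(samples_per_list)]
                for n in range(1, lists_per_subkey + 1)
            }
            for subkey in subkeys
        }
        for key in keys
    }


nested = build_nested(
    keys=["1stkey"],
    subkeys=["1stsubkey", "2ndsubkey"],
    lists_per_subkey=3,
    samples_per_list=5,
    pool=list(range(1, 51)),
)
print(nested)
```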
I need to find the longest contiguous subsequence in a rising sequence in Python.
For example if I have A = [1, 2, 3, 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 25, 27, 28, 29, 30]
The answer would be [17, 18, 19, 20, 21] because it's the longest contiguous subsequence with 5 numbers (whereas [1, 2, 3] is 3 numbers long and [27, 28, 29, 30] is 4 numbers long.)
My code is stuck in an endless loop
num_list = [1, 2, 3, 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 23, 25, 26, 27]
longest_sequence = {}
longest_sequence_length = 1
for num in num_list:
    sequence_length = 1
    while True:
        if (num + sequence_length) in num_list:
            sequence_length += 1
        else:
            if sequence_length > longest_sequence_length:
                longest_sequence_length_length = sequence_length
                longest_sequence = {"start": num, "end": num + (sequence_length - 1)}
            break
print(f"The longest sequence is {longest_sequence_length} numbers long"
      f" and it's between {longest_sequence['start']} and {longest_sequence['end']}")
You can use numpy to solve it in one line:
import numpy as np
A = [1, 2, 3, 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 25, 27, 28, 29, 30]
out = max(np.split(A, np.where(np.diff(A) != 1)[0] + 1), key=len).tolist()
You can also get the same result in plain Python in three steps.
(i) First, find the differences between consecutive elements in A; these are stored in diff (zip(A, A[1:]) gives you each pair of consecutive elements).
(ii) Then split A at the indices where the difference is not 1; this is the second loop. If a difference is 1, append the value from A to the running sublist; if not, start a new sublist containing that value.
(iii) Finally, use max() with key=len to find the longest sublist.
The numpy code above does exactly the same job.
diff = [j-i for i,j in zip(A, A[1:])]
splits = [[A[0]]]
for x,d in zip(A[1:], diff):
    if d == 1:
        splits[-1].append(x)
    else:
        splits.append([x])
out = max(splits, key=len)
Output:
[17, 18, 19, 20, 21]
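Another way to do the same split, sketched here with itertools.groupby: in a strictly increasing list, every element of a run of consecutive integers has the same value-minus-index difference, so grouping on that difference yields the runs directly. The function name longest_consecutive_run is my own, and the sketch assumes the input is sorted with no repeated values.

```python
from itertools import groupby


def longest_consecutive_run(values):
    """Return the longest run of consecutive integers in a sorted list."""
    runs = (
        [value for _, value in group]
        # value - index is constant within a run of consecutive integers
        for _, group in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0])
    )
    # default=[] covers an empty input; on ties the first run wins
    return max(runs, key=len, default=[])


A = [1, 2, 3, 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 25, 27, 28, 29, 30]
print(longest_consecutive_run(A))
```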
In line 13 you need a break instead of a continue statement.
Also, line 11 has a small mistake: an extra "_length" was added to your variable name (longest_sequence_length_length).
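For reference, here is a sketch of the loop with both fixes applied and wrapped in a function. Two things go beyond the fixes and are my own additions: membership is tested against a set rather than the list, and numbers that are not the start of a run are skipped. The name longest_run is hypothetical.

```python
def longest_run(nums):
    """Return (start, length) of the longest run of consecutive integers."""
    present = set(nums)            # fast membership tests
    best_start, best_length = None, 0
    for num in nums:
        if num - 1 in present:
            continue               # not the first number of its run
        length = 1
        while num + length in present:
            length += 1            # the loop ends at the first missing number
        if length > best_length:
            best_start, best_length = num, length
    return best_start, best_length


num_list = [1, 2, 3, 5, 8, 9, 11, 13, 17, 18, 19, 20, 21, 23, 25, 26, 27]
start, length = longest_run(num_list)
print(f"The longest sequence is {length} numbers long"
      f" and it's between {start} and {start + length - 1}")
```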
What I want to do is reference several different ranges from within a list, i.e. I want the 4-6th elements, the 12 - 18th elements, etc. This was my initial attempt:
test = theList[4:7, 12:18]
which I would expect to do the same thing as:
test = theList[4,5,6,12,13,14,15,16,17]
But I got a syntax error. What is the best/easiest way to do this?
You can add the two lists.
>>> theList = list(range(20))
>>> theList[4:7] + theList[12:18]
[4, 5, 6, 12, 13, 14, 15, 16, 17]
You can also use the itertools module:
>>> from itertools import islice,chain
>>> theList=range(20)
>>> list(chain.from_iterable(islice(theList,*t) for t in [(4,7),(12,18)]))
[4, 5, 6, 12, 13, 14, 15, 16, 17]
Note that since islice returns an iterator on each iteration, it uses less memory than list slicing, which copies every slice into a new list.
You can also wrap this in a function that accepts any number of index ranges:
>>> def slicer(iterable,*args):
... return chain.from_iterable(islice(iterable,*i) for i in args)
...
>>> list(slicer(range(40),(2,8),(10,16),(30,38)))
[2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14, 15, 30, 31, 32, 33, 34, 35, 36, 37]
Note: if you only want to loop over the result, you don't need to convert it to a list!
You can add the two lists as @Bhargav_Rao stated. More generically, you can also use a list comprehension (the upper bounds are exclusive, to match the slices 4:7 and 12:18):
test = [theList[i] for i in range(len(theList)) if 4 <= i < 7 or 12 <= i < 18]
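If the ranges are only known at run time, a small helper that concatenates ordinary slices may be easier to read than the index test. This is a sketch only; take_ranges is a hypothetical name, and each range is a (start, stop) pair with the usual exclusive stop.

```python
def take_ranges(seq, *ranges):
    """Concatenate seq[start:stop] for every (start, stop) pair given."""
    result = []
    for start, stop in ranges:
        result.extend(seq[start:stop])   # ordinary slice, stop is exclusive
    return result


theList = list(range(20))
test = take_ranges(theList, (4, 7), (12, 18))
print(test)
```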
I have a list-
list_of_sets = [{0, 1, 2}, {0}]
I want to calculate the intersection between the elements of the list. I have thought about this solution:
a = list_of_sets[0]
b = list_of_sets[1]
c = set.intersection(a,b)
This solution works because I know the number of elements in the list (so I can declare as many variables as I need, like a, b, etc.).
My problem is that I can't figure out a solution for the other case, where the number of elements in the list is unknown.
N.B.: I have already considered counting the elements of the list with a loop and then creating variables accordingly. Since I have to keep my code in a function (whose argument is list_of_sets), I need a more general solution that works for a list of any length.
Edit 1:
I need a solution for all the elements of the list. (not pairwise or for 3/4 elements)
If you want the intersection of all the elements of all_sets:
intersection = set.intersection(*all_sets)
Here all_sets is a list of sets, and set is the built-in set type.
For pairwise calculations, the following computes the intersections of all unordered pairs of sets from the list all_sets. Should you need triples instead, pass 3 as the second argument to combinations.
from itertools import combinations, starmap
all_intersections = starmap(set.intersection, combinations(all_sets, 2))
If you did need the sets a, b for calculations, then:
for a, b in combinations(all_sets, 2):
    # do whatever with a, b
    pass
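As a small illustration of the pairwise case, the sketch below keeps track of which two sets each intersection came from by pairing every set with its position in the list. The sample data and the name pairwise are made up for the example.

```python
from itertools import combinations

all_sets = [{0, 1, 2}, {0, 2}, {2, 3}]   # example data, not from the question

# Map each unordered pair of list positions to the intersection of those sets.
pairwise = {
    (i, j): a & b
    for (i, a), (j, b) in combinations(enumerate(all_sets), 2)
}
print(pairwise)
```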
You want the intersection of all the sets. Then:
list_of_sets[0].intersection(*list_of_sets[1:])
Should work.
Take the first set from the list and then intersect it with the rest (unpack the list with the *).
You can use reduce for this. If you're using Python 3 you will have to import it from functools. Here's a short demo (written for Python 2):
#!/usr/bin/env python
n = 30
m = 5
#Find sets of numbers i: 1 <= i <= n that are not divisible by each number j: 2 <= j <= m
list_of_sets = [set(i for i in range(1, n+1) if i % j) for j in range(2, m+1)]
print 'Sets in list_of_sets:'
for s in list_of_sets:
    print s
print
#Get intersection of all the sets
print 'Numbers less than or equal to %d that are coprime to it:' % n
print reduce(set.intersection, list_of_sets)
output
Sets in list_of_sets:
set([1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29])
set([1, 2, 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 20, 22, 23, 25, 26, 28, 29])
set([1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15, 17, 18, 19, 21, 22, 23, 25, 26, 27, 29, 30])
set([1, 2, 3, 4, 6, 7, 8, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22, 23, 24, 26, 27, 28, 29])
Numbers less than or equal to 30 that are coprime to it:
set([1, 7, 11, 13, 17, 19, 23, 29])
Actually, we don't even need reduce() for this, we can simply do
set.intersection(*list_of_sets)
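Since the question asks for a function that takes list_of_sets, here is a minimal Python 3 sketch of both approaches. One caveat: set.intersection(*list_of_sets) raises a TypeError when the list is empty, so the sketch returns an empty set in that case. That return value is my own choice, and intersect_all is a hypothetical name.

```python
from functools import reduce


def intersect_all(list_of_sets):
    """Return the intersection of every set in list_of_sets."""
    if not list_of_sets:
        return set()   # assumption: an empty input yields an empty set
    return set.intersection(*list_of_sets)


list_of_sets = [{0, 1, 2}, {0}]
print(intersect_all(list_of_sets))

# The reduce form gives the same result for a non-empty list.
print(reduce(set.intersection, list_of_sets))
```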
I will assign 8 days off to a crew randomly in a calendar month.
I would like to randomly choose 8 days, and the days off distribution should be as even as possible. I mean all 8 days-off shouldn't be gathered in first 8 days of the month, for example.
For example: [1, 5, 8, 14, 18, 24, 27, 30] is a good distribution.
[1,2,3,4,26,27,28,29] is not a good distribution.
Actually, a crew can't work 7 consecutive days. In every 7 days, there must be 1 day-off.
All days are treated equally, ie Sundays are not days-off by themselves. Crew may work on weekends as well.
I want to choose days-off one by one. Not 8 of them together at once.
Could you recommend an algorithm using python to achieve this?
Not all days in the month may be available to be days off.
Best Regards
Use random.sample() to get a random set from a sequence. List the days that are available, then pass that to the .sample() function:
import random
daysoff = [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20]
picked = random.sample(daysoff, 8)
In the above example I used the day of the month, and the list omits certain days (say, Sundays and the last 10 days of the month); we then pick 8 random days from that population.
This is the key here:
Actually, a crew can't work 7 consecutive days. In every 7 days, there must be 1 day-off.
Reword the problem to say a random 2 days in every 7 days (or divide the month into four lengths of time as appropriate). You are then guaranteed an even-ish distribution. Use random.sample() as Martijn Pieters suggests.
You can generate two values using this technique from the first week, then yield them in sequence if you want them one by one.
edit:
As observed by tcaswell, there are still some cases where you end up with ten days in a row on duty. To combat this, you can assign a day off every three days, create a list of ten, and remove two days at random from the subset of days that don't invalidate the 7-continuous-day criteria.
Alternatively, you could just keep generating lists using the original algorithm until it fits the criteria, since you're very likely to get a valid solution anyway. You'd have to write a validation function of some kind, but it would be very easy to do since you're just counting the longest continuous string of days on.
CODE:
An implementation of the second option.
import random
from itertools import chain
from itertools import count

def candidate(m):
    ''' Returns 2 days per week, in m days, where m is the length of the month. '''
    weeks = weeksmaker(m)
    return sorted(list(chain(*[random.sample(week, 2) for week in weeks])))

def weeksmaker(m):
    ''' Divides a month up into four weeks, randomly assigning extra days to weeks. '''
    # Start from four 7-day weeks (lists, so extra days can be appended).
    weeks = [list(range(i, i+7)) for i in range(1, 29, 7)]
    for i in range(m - 28):
        weeks[random.randint(1, len(weeks))-1].append(i)
    # Renumber the days consecutively so each week holds its real day numbers.
    c = count(1)
    return [[next(c) for day in week] for week in weeks]

def valid(days, c):
    ''' Validity check. Cant work more than c consecutive days. '''
    for i in range(1, len(days)):
        if days[i] - days[i-1] > c:
            return False
    else:
        return True

def daysoff(m, n, c):
    ''' In month length m, need n days off, cant work more than c consecutive days. '''
    # Note: candidate() always returns 2 days for each of 4 weeks, so n is
    # effectively fixed at 8 here.
    while True:
        days = candidate(m)  # pass the month length, not the number of days off
        if valid(days, c):
            return days
>>> for i in range(28, 32):
... daysoff(i, 8, 7)
...
[6, 7, 10, 14, 18, 20, 27, 28]
[4, 7, 10, 13, 19, 21, 23, 24]
[2, 4, 9, 13, 15, 20, 25, 27]
[1, 3, 9, 12, 18, 19, 24, 28]
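The validity check above only compares neighbouring days off, so it does not look at the stretch before the first day off or after the last one. A sketch of a check that also counts those edges, by adding a sentinel before day 1 and after the last day of the month, could look like this. The name longest_work_stretch is my own, and the two schedules are the "good" and "not good" examples from the question.

```python
def longest_work_stretch(days_off, month_length):
    """Return the longest run of consecutive working days in the month."""
    # Sentinels 0 and month_length + 1 make the edges count as stretches too.
    marks = [0] + sorted(days_off) + [month_length + 1]
    return max(later - earlier - 1 for earlier, later in zip(marks, marks[1:]))


good = [1, 5, 8, 14, 18, 24, 27, 30]
bad = [1, 2, 3, 4, 26, 27, 28, 29]
print(longest_work_stretch(good, 30))
print(longest_work_stretch(bad, 30))
```

A schedule would then be accepted when the returned value is below 7, if "can't work 7 consecutive days" is read strictly.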
You should just split the total number of days.
This code works regardless of the amount of days off needed, and regardless of the days there are in total.
from random import randint

def foo(l, n):
    dist = round(len(l)/n)
    return [randint(l[i*dist], l[(i+1)*dist-1]) for i in range(n)]
In [1]: days = [i for i in range(1,31)]
In [2]: foo(days, 8)
Out[2]: [1, 4, 6, 9, 13, 16, 20, 27]
In [3]: mylist = [i for i in range(500)]
In [4]: foo(mylist, 5)
Out[4]: [80, 147, 250, 346, 448]
Some problems will occur with the rounding, though: the list index might go out of range.
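One way to sidestep the rounding problem, sketched under my own assumptions: compute the chunk boundaries with integer arithmetic so they always stay inside the list, and pick one item from each chunk with random.choice. Picking from the list itself (instead of calling randint on the endpoint values) also works when some days are missing from the list of available days. The name spread_pick is hypothetical.

```python
import random


def spread_pick(items, n):
    """Pick one random item from each of n nearly equal consecutive chunks."""
    if not 0 < n <= len(items):
        raise ValueError("n must be between 1 and len(items)")
    picks = []
    for i in range(n):
        # Integer boundaries never exceed len(items), so no IndexError.
        start = i * len(items) // n
        stop = (i + 1) * len(items) // n
        picks.append(random.choice(items[start:stop]))
    return picks


days = list(range(1, 31))
print(spread_pick(days, 8))
```

For 30 days and 8 picks the chunks hold 3 or 4 days each, so two neighbouring picks are never more than 7 days apart.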
This (I think) does what @Martijn did, with the extra benefit of limiting consecutive days off (e.g. if you don't want 8 off-days in a row):
#Day selector
import random

Ndays = 8
daysoff = range(1,25)
concurrent_tol = 3

while True:
    cntr = 0
    sample = random.sample(daysoff, Ndays)
    sample.sort()
    #count how often a picked day is directly next to another picked day
    for i in range(1,Ndays-1):
        if abs(sample[i]-sample[i-1]) == 1:
            cntr +=1
        if abs(sample[i]-sample[i+1]) == 1:
            cntr +=1
    if cntr<concurrent_tol:
        print("Found a good set of off-days :")
        print(sample)
        break
    else:
        print("Didn't find a good set, trying again")
        print(sample)
Output example:
Didn't find a good set, trying again
[3, 4, 5, 6, 7, 8, 9, 11]
Didn't find a good set, trying again
[1, 5, 6, 7, 12, 14, 19, 20]
Didn't find a good set, trying again
[4, 5, 7, 9, 11, 15, 16, 20]
Didn't find a good set, trying again
[3, 4, 6, 7, 12, 13, 14, 23]
Didn't find a good set, trying again
[1, 7, 10, 12, 15, 16, 17, 22]
Didn't find a good set, trying again
[5, 7, 8, 11, 17, 18, 19, 23]
Didn't find a good set, trying again
[3, 8, 11, 12, 13, 15, 17, 21]
Didn't find a good set, trying again
[2, 5, 7, 8, 9, 12, 13, 21]
Found a good set of off-days :
[1, 2, 5, 12, 15, 17, 19, 20]
This also has the added benefit of looking ugly. Note that possible days are 1-24, as defined in daysoff.
Generate (and store) a list of all valid work schedules (via brute force...there are only 30C8 ways to do it). You can then safely and quickly pick from that list later.
import itertools
import numpy as np
good_lst = []
for days_off in itertools.combinations(range(30),8):
    if np.max(np.diff( (0,) + days_off + (30,))) < 7:
        good_lst.append(days_off)
(there may be some off-by-one bugs in there someplace)
This ran on a decent machine in ~5min. You will probably want to do more pruning as (0, 1, 2, 3, 6, 12, 18, 24) is a valid work schedule, but involves 4 sections of 6 work days.
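The same brute-force idea can be sketched without numpy as a generator, which avoids holding every schedule in memory at once. The day numbering (1-based, with sentinels at 0 and num_days + 1) and the meaning of max_work (longest allowed stretch of working days) are my own assumptions, chosen to sidestep the off-by-one question; valid_schedules is a hypothetical name. The demo uses a tiny 10-day "month", since the full 30-choose-8 search is slow.

```python
from itertools import combinations


def valid_schedules(num_days, num_off, max_work):
    """Yield each tuple of days off with no work stretch longer than max_work."""
    for days_off in combinations(range(1, num_days + 1), num_off):
        # Sentinels before day 1 and after the last day cover the month edges.
        marks = (0,) + days_off + (num_days + 1,)
        longest = max(b - a - 1 for a, b in zip(marks, marks[1:]))
        if longest <= max_work:
            yield days_off


# Small demo: 10 days, 2 days off, at most 3 working days in a row.
small = list(valid_schedules(num_days=10, num_off=2, max_work=3))
print(small)
```

For the real problem the call would be something like valid_schedules(30, 8, 6), and random.choice on the stored result would then pick a schedule.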