I have a list of sublists of random positive integers. This list is controlled by 3 parameters:
max_num: the maximum integer allowed in each sublist, e.g. if max_num = 3, the list will look like [[1,3], [3], [1,2,3], [1], ...];
max_length: the maximum number of integers in each sublist;
n_gen: the number of sublists generated, i.e., the length of the list.
You can generate such a list using the following code:
import random
random.seed(10)
def random_numbers(length, max_num):
    return [random.randint(1, max_num) for _ in range(length)]
max_num = 3
max_length = 3 # I want max_length=10
n_gen = 10 # I want n_gen=200
lst = [random_numbers(random.randint(1, max_length), max_num) for _ in range(n_gen)]
Now I want to split the list into two partitions such that each partition has the same amount of each number. For example, if lst = [[1,2,3], [2,3], [1,3], [3]], one solution would be bipartition = [[[1,2,3], [3]], [[2,3], [1,3]]].
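The balance condition described above can be checked directly. This is only a minimal sketch; the helper name is_balanced is hypothetical and does not appear in the original code:

```python
from collections import Counter

def is_balanced(left, right):
    """Return True if both partitions contain each number equally often."""
    count_left = Counter(x for sublist in left for x in sublist)
    count_right = Counter(x for sublist in right for x in sublist)
    return count_left == count_right

# The example bipartition from the text.
bipartition = [[[1, 2, 3], [3]], [[2, 3], [1, 3]]]
print(is_balanced(*bipartition))  # True
```

Comparing Counter objects avoids sorting the flattened lists, which may matter a little when the sublists get long.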
I managed to write the following brute-force enumeration for all possible bipartitions, which works fine for small parameters.
from itertools import product
lst1 = []
lst2 = []
for pattern in product([True, False], repeat=len(lst)):
    lst1.append([x[1] for x in zip(pattern, lst) if x[0]])
    lst2.append([x[1] for x in zip(pattern, lst) if not x[0]])
bipartitions = []
for l1, l2 in zip(lst1, lst2):
    flat1 = [i for l in l1 for i in l]
    flat2 = [i for l in l2 for i in l]
    if sorted(flat1) == sorted(flat2):
        bipartitions.append([l1, l2])
for bipartition in bipartitions:
    print(bipartition)
Output:
[[[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]], [[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]], [[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]], [[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]]]
[[[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]], [[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]]]
[[[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]], [[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]]]
[[[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]], [[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]]]
[[[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]], [[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]]]
[[[1, 1, 2], [2, 3], [1], [2, 2], [3, 2]], [[1, 2, 2], [2, 2, 1], [3], [1, 2], [3]]]
[[[1, 1, 2], [2, 3], [2, 2, 1], [3, 2]], [[1, 2, 2], [1], [3], [1, 2], [3], [2, 2]]]
[[[1, 1, 2], [2, 3], [3], [1, 2], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1], [2, 2, 1], [3], [3, 2]]]
[[[1, 1, 2], [2, 2, 1], [3], [3], [2, 2]], [[1, 2, 2], [2, 3], [1], [1, 2], [3, 2]]]
[[[1, 1, 2], [3], [1, 2], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[1, 1, 2], [1, 2], [3], [2, 2], [3, 2]], [[1, 2, 2], [2, 3], [1], [2, 2, 1], [3]]]
[[[2, 3], [1], [2, 2, 1], [1, 2], [3, 2]], [[1, 2, 2], [1, 1, 2], [3], [3], [2, 2]]]
[[[1], [2, 2, 1], [3], [1, 2], [3], [2, 2]], [[1, 2, 2], [1, 1, 2], [2, 3], [3, 2]]]
However, when the parameters become larger, this becomes infeasible. Now I would like to generate random bipartitions that have the same amount of each number; I guess a greedy algorithm will do. For my current task, I need to use
max_num = 3
max_length = 10
n_gen = 200
Any suggestions?
Edit: I am aware that there will be cases where such a bipartition is not possible at all. My thought is that when the greedy algorithm has not found a bipartition after a maximum number of attempts (e.g. 1000 if fast enough), we should believe there is no such bipartition. When the parameters are large, even a check of whether such a bipartition exists will be infeasible.
Holy heck this one was a doozy. First off, let me state the obvious: a greedy algorithm is deterministic, since it always takes the choice that looks best at the current step. Second, the odds of a randomly generated list being bipartitionable at all are very, very low. I also suggest that if you want to generate bipartitions, trying to find them from random sets like this is not a good idea.
Anyhow, on to the code. First, let me say that the code is not pretty, nor is it completely optimized. Towards the end there I wasn't even being Pythonic in some areas, but they are all easily fixable. I've been at this for hours, but it was a fun project. The copying of the list stands out as the prime suspect. You can re-write it and optimize it in your own time. I also can't guarantee that it's bug-free, but I'm pretty sure it is. Only exception being that you need to make sure that it at least does one "careful" search if you want any results. That brings me to the next point, the algorithm itself.
We start off by doing a pretty standard greedy algorithm. We pick an index from our partitionee and, WLOG, assign it to the left bipartition. Next we look at all possible ways of inserting all remaining lists. We choose the one that brings us closest to 0. We repeat until we hit some breakpoint, after which we switch to your exhaustive algorithm.
Now, odds are we don't find a partition for high values of your constants. I believe this is just a statistical thing, and not a problem with the algorithm, but I could be wrong.
I also implemented a rough feasibility test, and you'll see quite quickly that ~90% of all randomly generated nested lists can immediately be discarded as impossible to bipartition.
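That figure is plausible: with MAX_NUM = 3, each of the three counts has to be even, which for random data happens roughly one time in eight. Here is a quick simulation sketch to check it; the names passes_parity_check and rejection_rate are hypothetical and not part of the script below, and the parity check is only a necessary condition, not a sufficient one:

```python
import random
from collections import Counter

def passes_parity_check(lst):
    """Necessary condition only: every number must occur an even number of times."""
    counts = Counter(x for sublist in lst for x in sublist)
    return all(c % 2 == 0 for c in counts.values())

def rejection_rate(max_num, max_len, n_gen, trials, seed=0):
    """Estimate the fraction of random lists that fail the parity check."""
    rng = random.Random(seed)  # local generator, so the global seed is untouched
    failures = 0
    for _ in range(trials):
        lst = [[rng.randint(1, max_num) for _ in range(rng.randint(1, max_len))]
               for _ in range(n_gen)]
        if not passes_parity_check(lst):
            failures += 1
    return failures / trials

print(rejection_rate(max_num=3, max_len=3, n_gen=50, trials=1000))
```

A list that passes this check can still be impossible to bipartition, so the true share of infeasible lists is at least as high as the estimate.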
However, the addition of the greedy algorithm now allows me, on my machine, to go from testing lists of ~15 sublists to lists of ~30, with a good chance of finding a bipartition. It also runs in less than a tenth of a second with e.g. 3, 3, 40, 12 as its constants.
Finally, here is the code. Note that I only made it generate one list to partition per run, so you might need to run it a few times before you even get a feasible one:
from itertools import product
import random
import datetime
import time
import sys
MAX_NUM = 3
MAX_LEN = 3
NUM_GEN = 200
NSWITCH = 12
random.seed(time.time())
def feasible(partitionee):
    '''Does a rough test to see if it is feasible to find a partition'''
    counts = [0 for _ in range(MAX_NUM)]
    flat = [x for sublist in partitionee for x in sublist]
    for n in flat:
        counts[n-1] += 1
    for n in counts:
        if n % 2 != 0:
            return False
    return True
def random_numbers(length, max_num, n_lists):
    '''Create a random list of lists of numbers.'''
    lst = []
    for i in range(n_lists):
        sublist_length = random.randint(1, length)
        lst.append([random.randint(1, max_num) for _ in range(sublist_length)])
    return lst
def diff(lcounts, rcounts):
    '''Calculate the difference between the counts in our dictionaries.'''
    difference = 0
    for i in range(MAX_NUM):
        difference += abs(lcounts[i] - rcounts[i])
    return difference
def assign(partition, d, sublist):
    '''Assign a sublist to a partition, and update its dictionary.'''
    partition.append(sublist)
    for n in sublist:
        d[n-1] += 1
def assign_value(d1, d2, sublist):
    '''Calculates the loss of assigning sublist.'''
    for n in sublist:
        d1[n-1] += 1
    left_score = diff(d1, d2)
    for n in sublist:
        d1[n-1] -= 1
        d2[n-1] += 1
    right_score = diff(d1, d2)
    for n in sublist:
        d2[n-1] -= 1
    return (left_score, right_score)
def greedy_partition(left, right, lcounts, rcounts, i, partitionee):
    # Assign the i:th sublist to the left partition.
    assign(left, lcounts, partitionee[i])
    del partitionee[i]
    for _ in range(NUM_GEN - NSWITCH):
        # Go through all unassigned sublists and get their loss.
        value_for_index = {}
        for i, sublist in enumerate(partitionee):
            value = assign_value(lcounts, rcounts, sublist)
            value_for_index[i] = value
        # Find which choice would be closest to 0 difference.
        min_value = 100000000000 # BIG NUMBER
        best_index = -1
        choose_left = True
        for key, value in value_for_index.items():
            if min(value) < min_value:
                min_value = min(value)
                choose_left = value[0] < value[1]
                best_index = key
        # Assign it to the proper list.
        if choose_left:
            assign(left, lcounts, partitionee[best_index])
        else:
            assign(right, rcounts, partitionee[best_index])
        del partitionee[best_index]
    return diff(lcounts, rcounts)
# Create our list to partition.
partition_me = random_numbers(MAX_LEN, MAX_NUM, NUM_GEN)
start_time = datetime.datetime.now()
# Start by seeing if it's even feasible to partition.
if not feasible(partition_me):
    print('No bipartition possible!')
    sys.exit()
# Go through all possible starting arrangements.
min_score_seen = 100000000000 # BIG NUMBER
best_bipartition = []
best_remaining = []
for i in range(NUM_GEN):
    # Create left and right partitions, as well as maps to count how many of each
    # number each partition has accumulated.
    left = []
    right = []
    lcounts = [0 for i in range(MAX_NUM)]
    rcounts = [0 for i in range(MAX_NUM)]
    # Copy partitionee since it will be consumed.
    partition = partition_me.copy()
    # Do greedy partition.
    score = greedy_partition(left, right, lcounts, rcounts, i, partition)
    if score < min_score_seen:
        min_score_seen = score
        best_bipartition = [left] + [right]
        # Remember the unassigned sublists that belong to this best run.
        best_remaining = partition
# Now that we've been greedy and fast, we will be careful and slow.
# Consider all possible remaining arrangements.
print('Done with greedy search, starting careful search.')
left = best_bipartition[0]
right = best_bipartition[1]
# Use the leftovers of the best greedy run, not those of the last run.
partition = best_remaining
for pattern in product([True, False], repeat=len(partition)):
    lst1 = left + ([x[1] for x in zip(pattern, partition) if x[0]])
    lst2 = right + ([x[1] for x in zip(pattern, partition) if not x[0]])
    left_flat = [x for sublist in lst1 for x in sublist]
    right_flat = [x for sublist in lst2 for x in sublist]
    if sorted(left_flat) == sorted(right_flat):
        print('Found bipartition by careful search:')
        print([lst1] + [lst2])
        break
end_time = datetime.datetime.now()
print('Time taken: ', end='')
print(end_time - start_time)
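As an alternative to the greedy-plus-exhaustive script above, an exact search over count-difference vectors might also be worth trying. This is a different technique (a dynamic-programming sketch), not the method used in the script, and the function name find_bipartition is hypothetical. It assumes every value lies in 1..max_num, and I have not tested it at the full parameter sizes, where the number of reachable states could get large:

```python
from collections import Counter

def find_bipartition(lst, max_num):
    """Return one balanced bipartition [left, right], or None if none exists."""
    # A state is the vector (count on the left - count on the right) per number.
    start = (0,) * max_num
    # layers[i] maps each state reachable after i sublists to
    # (previous state, went_left) so that one solution can be rebuilt.
    layers = [{start: None}]
    for sub in lst:
        c = Counter(sub)
        delta = tuple(c.get(k, 0) for k in range(1, max_num + 1))
        nxt = {}
        for state in layers[-1]:
            for sign, went_left in ((1, True), (-1, False)):
                new = tuple(s + sign * d for s, d in zip(state, delta))
                if new not in nxt:
                    nxt[new] = (state, went_left)
        layers.append(nxt)
    if start not in layers[-1]:
        return None  # the all-zero difference is unreachable
    # Walk back from the balanced end state to recover the assignment.
    left, right = [], []
    state = start
    for i in range(len(lst), 0, -1):
        prev, went_left = layers[i][state]
        (left if went_left else right).append(lst[i - 1])
        state = prev
    return [left[::-1], right[::-1]]

print(find_bipartition([[1, 2, 3], [2, 3], [1, 3], [3]], 3))
```

Unlike the greedy search, this returns None only when no bipartition exists, at the cost of keeping every layer of states in memory.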
This is quite different from what I found in many threads - I don't mean to flatten the list, but to unnest levels as follows:
[[[3, 3]]] should be [3, 3]
[[[3, 4], [3, 3]]] should be [[3, 4], [3, 3]] but not [3, 4], [3, 3] nor [3, 4, 3, 3] because this changes the structure completely.
Basically, I wanted to reduce levels to get the same len(a_list) in first and second iteration before loop break. But my idea is somewhat wrong:
This code works for anything but [[3], [4]]. Dunno what's wrong today, because it worked yesterday. I need some help to correct this function. For [[3], [4]] it now returns [3], but the list should be left unchanged.
# Unlevel list - reduce unnecessary nesting without changing nested lists structure
def unlevelList(l):
    if len(l) > 0 and isinstance(l, list):
        done = True
        while done == True:
            if isinstance(l[0], list):
                if len(l) == len(l[0]):
                    l = l[0]
                else:
                    l = l[0]
                    done = False
            else:
                done = False
        return l
    else:
        return l
I'd be inclined to do this with recursion: if the object is a list of length 1, strip off the outer layer; then, recursively unlevel all of its children.
def unlevel(obj):
    while isinstance(obj, list) and len(obj) == 1:
        obj = obj[0]
    if isinstance(obj, list):
        return [unlevel(item) for item in obj]
    else:
        return obj
test_cases = [
    [[[3, 3]]],
    [[[3, 4], [3, 3]]],
    [[3], [4]],
    [[[3]]],
    [[[3], [3, 3]]]
]
for x in test_cases:
    print("When {} is unleveled, it becomes {}".format(x, unlevel(x)))
Result:
When [[[3, 3]]] is unleveled, it becomes [3, 3]
When [[[3, 4], [3, 3]]] is unleveled, it becomes [[3, 4], [3, 3]]
When [[3], [4]] is unleveled, it becomes [3, 4]
When [[[3]]] is unleveled, it becomes 3
When [[[3], [3, 3]]] is unleveled, it becomes [3, [3, 3]]
Edit: reading your question again, I think perhaps you want [[3], [4]] to remain [[3], [4]]. If that is the case, then I interpret the requirements to be "only strip off excess brackets from the top layer; leave inner one-element lists unaffected". In which case you don't need recursion. Just strip off the top list until you can't any more, then return it.
def unlevel(obj):
    while isinstance(obj, list) and len(obj) == 1:
        obj = obj[0]
    return obj
test_cases = [
    [[[3, 3]]],
    [[[3, 4], [3, 3]]],
    [[3], [4]],
    [[[3]]],
    [[[3], [3, 3]]]
]
for x in test_cases:
    print("When {} is unleveled, it becomes {}".format(x, unlevel(x)))
Result:
When [[[3, 3]]] is unleveled, it becomes [3, 3]
When [[[3, 4], [3, 3]]] is unleveled, it becomes [[3, 4], [3, 3]]
When [[3], [4]] is unleveled, it becomes [[3], [4]]
When [[[3]]] is unleveled, it becomes 3
When [[[3], [3, 3]]] is unleveled, it becomes [[3], [3, 3]]
I'd recommend a recursive solution as well:
def unnest(l):
    if isinstance(l, list) and len(l) == 1 and isinstance(l[0], list):
        return unnest(l[0])
    return l
Some test cases
test_cases = [
    [[[3], [3, 3]]],
    [[[3, 3]]],
    [[[3, 4], [3, 3]]],
    [[3], [4]],
    [[[3]]]
]
for i in test_cases:
    print(unnest(i))
gives
[[3], [3, 3]]
[3, 3]
[[3, 4], [3, 3]]
[[3], [4]]
[3]
This code seems to do exactly what you want: it keeps the inner lists as lists, but flattened.
import itertools
a = [[[[1, 2]]], [[2, 3, 4, 5]], [[[[[[134, 56]]]]]], 9, 8, 0]
res = []
for element in a:
    if isinstance(element, list):
        while len(element) == 1:
            element = list(itertools.chain(*element))
        res.append(element)
    else:
        res.append(element)
print(res)
With the result res being [[1, 2], [2, 3, 4, 5], [134, 56], 9, 8, 0]
I've been trying to convert my list
alist = [[1,[1,2]],[2,[3,4,5]],[3,[1,2]],[4,[3,4,5]],[5,[5,6,7]],[6,[1,2]]]
into this, since the second items of those sublists are the same:
[[[1,3,6],[1,2]],[[2,4],[3,4,5]]]
This is my code
alist = [[1,[1,2]],[2,[3,4,5]],[3,[1,2]],[4,[3,4,5]],[5,[5,6,7]],[6,[1,2]]]
lst=[]
for i in range(len(alist)):
    inner = []
    inner1=[]
    for j in range(i+1,len(alist)):
        if i+1 < len(alist):
            if alist[i][1] == alist[j][1]:
                inner1.append(alist[i][0])
                inner1.append(alist[j][0])
                inner.append(inner1)
                inner.append(alist[i][1])
                lst.append(inner)
print(lst)
but it gives this instead
[[[1, 3, 1, 6], [1, 2], [1, 3, 1, 6], [1, 2]], [[1, 3, 1, 6], [1, 2], [1, 3, 1, 6], [1, 2]], [[2, 4], [3, 4, 5]], [[3, 6], [1, 2]]]
It works when there are only 2 elements that are the same, but when there are 3 it doesn't work.
Example
[2,4],[3,4,5] #2 of the same elements from alist works
[1,3,1,6],[1,2] #3 of the same elements from alist doesn't work
Can anyone please offer a solution?
You can use a dict (an Ordered one since you have to maintain the order) to group "heads" by "tails":
alist = [[1,[1,2]],[2,[3,4,5]],[3,[1,2]],[4,[3,4,5]],[5,[5,6,7]],[6,[1,2]]]
from collections import OrderedDict
c = OrderedDict()
for head, tail in alist:
    c.setdefault(tuple(tail), []).append(head)
res = [[heads, list(tail)] for tail, heads in c.items()]
print(res)
prints
[[[1, 3, 6], [1, 2]], [[2, 4], [3, 4, 5]], [[5], [5, 6, 7]]]
If you want to omit 5 (a group with a single "head"), add a condition to the res= line:
res = [[heads, list(tail)] for tail, heads in c.items() if len(heads) > 1]
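On Python 3.7 or newer, plain dicts preserve insertion order, so the same grouping should also work without OrderedDict. A minimal sketch under that assumption (the variable name groups is mine):

```python
alist = [[1, [1, 2]], [2, [3, 4, 5]], [3, [1, 2]], [4, [3, 4, 5]], [5, [5, 6, 7]], [6, [1, 2]]]

groups = {}  # plain dicts keep insertion order in Python 3.7+
for head, tail in alist:
    # Lists are unhashable, so the tail is converted to a tuple to serve as key.
    groups.setdefault(tuple(tail), []).append(head)

# Keep only tails shared by more than one head.
res = [[heads, list(tail)] for tail, heads in groups.items() if len(heads) > 1]
print(res)  # [[[1, 3, 6], [1, 2]], [[2, 4], [3, 4, 5]]]
```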
I'm finding some weird behavior in python that I can't explain. I've written the following function:
def process_data(data_in, unique_recs_in):
    recs = unique_recs_in
    for x, dat in enumerate(recs):
        recs[x].append(data_in.count(dat))
    return recs
where data_in and unique_recs_in are lists of lists. 'data_in' represents receptors, with a list being stored for each receptor each time it fails a criterion. 'unique_recs_in' is a list of all the unique receptor locations.
What I can't figure out is when I call this function, my output 'recs' returns properly. However, 'unique_recs_in' changes when I run the function and is identical to 'recs'. I have bug tested the code and can confirm that it's in this function that that happens. Anyone have any ideas?
Edit: sample input below
data_in
[['631507.40000', '4833767.20000', '60.00'], ['631507.40000', '4833767.20000', '63.00'], ['631507.40000', '4833767.20000', '66.00']]
unique_recs_in:
[['631552.90000', '4833781.00000', '24.00'], ['631569.50000', '4833798.80000', '48.00'], ['631589.20000', '4833745.50000', '12.00']]
recs = unique_recs_in simply creates a new reference to the same list object. To get a completely new copy of a list of lists, use copy.deepcopy.
>>> lis = [[1, 2], [3, 4]]
>>> a = lis
>>> a.append(4) #changes both `a` and `lis`
>>> a, lis
([[1, 2], [3, 4], 4], [[1, 2], [3, 4], 4])
Even a shallow copy is not enough for a list of lists:
>>> a = lis[:]
>>> a[0].append(100) #Inner lists are still the same objects, only the outer list is new.
>>> a, lis
([[1, 2, 100], [3, 4], 4], [[1, 2, 100], [3, 4], 4])
copy.deepcopy returns a completely new copy:
>>> import copy
>>> a = copy.deepcopy(lis)
>>> lis
[[1, 2, 100], [3, 4], 4]
>>> a.append(999)
>>> a, lis
([[1, 2, 100], [3, 4], 4, 999], [[1, 2, 100], [3, 4], 4])
>>> a[0].append(1000)
>>> a, lis
([[1, 2, 100, 1000], [3, 4], 4, 999], [[1, 2, 100], [3, 4], 4])
If the list contains only immutable objects, then a shallow copy is enough:
recs = unique_recs_in[:]
You might find this helpful as well: Python list([]) and []
You will probably want to store a copy of unique_recs_in in recs so it won't be modified. Try this:
recs = [l[:] for l in unique_recs_in] # although Ashwini's `deepcopy` is more elegant
By assigning one list to another (l1 = l2) you're just establishing an alias between them (i.e. both variables refer to the same list in memory), so modifying one will modify the other (because they are the same).
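Another option might be to avoid mutating the input at all and build new inner lists instead. A minimal sketch of that idea, using made-up sample data rather than the receptor data from the question:

```python
def process_data(data_in, unique_recs_in):
    """Return new records with the count appended; the inputs are not modified."""
    # rec + [...] creates a fresh list, so unique_recs_in keeps its original rows.
    return [rec + [data_in.count(rec)] for rec in unique_recs_in]

# Made-up sample data for illustration.
data_in = [['a', 'b'], ['a', 'b'], ['c', 'd']]
unique_recs_in = [['a', 'b'], ['c', 'd']]
recs = process_data(data_in, unique_recs_in)
print(recs)            # [['a', 'b', 2], ['c', 'd', 1]]
print(unique_recs_in)  # [['a', 'b'], ['c', 'd']]
```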
Hope this helps!