How to update a list during a combination? - python

I would like to compare the elements of a list pairwise and, at the end of each loop iteration, have the combinations resume from the updated list.
    from itertools import combinations
    aListe = ['a', 'b', 'c', 'd']
    for first, second in combinations(aListe, 2):
        # do something
        aListe.remove(first)
ValueError: list.remove(x): x not in list
A more specific example of the # do something step.
Imagine my list contains shapely polygons, such as
    aListe = [polygon1, polygon2, polygon3, polygon4]
    for first, second in combinations(aListe, 2):
        if area(first) > area(second):
            aListe.remove(first)
If the area of the first polygon in the list is already greater than that of the second, I don't want it to be compared to the others. I would like the next loop iteration to start on an updated list without the first polygon.

As you have seen, your code failed because itertools.combinations visits each list item multiple times and it can only be removed once (assuming that it is only contained once).
But even if you had added a check like if first in aListe to circumvent this error, you might have gotten unexpected results, as is generally the case when removing items from a list while iterating over it; see "Removing from a list while iterating over it".
In your case, instead of actually removing a polygon from the list, I would mark it as "removed" by adding it to a "removed" set, and skip an iteration if an already removed polygon is encountered again.
(Note, in order to add polygons to a set, if I understand https://shapely.readthedocs.io/en/stable/manual.html correctly, "[…] use the geometry ids as keys since the shapely geometries themselves are not hashable.")
    from itertools import combinations
    removed = set()
    aListe = [polygon1, polygon2, polygon3, polygon4]
    for first, second in combinations(aListe, 2):
        if id(first) in removed:
            continue
        if area(first) > area(second):
            removed.add(id(first))
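A minimal, self-contained sketch of this idea. It assumes hypothetical stand-ins for the polygons ((name, area) tuples and a made-up area() helper) rather than real shapely geometries. The final list comprehension is not part of the answer above; it is one way to build the filtered list once the loop has finished.

```python
from itertools import combinations

# Hypothetical stand-ins for shapely polygons: (name, area) tuples.
shapes = [("p1", 4.0), ("p2", 1.0), ("p3", 9.0), ("p4", 2.0)]


def area(shape):
    # Made-up helper for this sketch; real code would use the geometry's area.
    return shape[1]


removed = set()
for first, second in combinations(shapes, 2):
    if id(first) in removed:
        continue  # first was already marked as removed, skip this pair
    if area(first) > area(second):
        removed.add(id(first))

# Build the updated list only after iteration has finished.
remaining = [shape for shape in shapes if id(shape) not in removed]
print(remaining)  # [('p2', 1.0), ('p4', 2.0)]
```

Like the loop above, this only skips pairs whose first element was removed; a removed polygon can still appear as second.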

Related

Shuffling with constraints on pairs

I have n lists, each of length m. Assume n*m is even. I want to get a randomly shuffled list of all elements, under the constraint that the elements at positions i, i+1, where i = 0, 2, ..., n*m-2, never come from the same list. Edit: other than this constraint, I do not want to bias the distribution of random lists. That is, the solution should be equivalent to a completely random shuffle that is repeated until the constraint holds.
Example:
list1: a1,a2
list2: b1,b2
list3: c1,c2
allowed: b1,c1,c2,a2,a1,b2
disallowed: b1,c1,c2,b2,a1,a2
A possible solution is to think of your result as m chunks of items, each chunk having length n. If you randomly select for each chunk exactly one item from each list, then you will never hit dead ends. Just make sure that the first item in each chunk (except the first chunk) comes from a different list than the last element of the previous chunk.
You can also pick items at random one at a time, always making sure you pick from a different list than the previous item, but then you can hit some dead ends.
Finally, another possible solution is to fill each position sequentially, choosing only from those items that "can be put there", that is, items whose placement violates no constraint and still leaves at least one possible solution.
A variation of b above that avoids dead ends: at each step you choose twice. First, randomly choose an item. Second, randomly choose where to place it. At the kth step there are k possible places to put the item (the new item can be injected between two existing items). Naturally, you only choose from allowed places.
Money!
Arrange your lists into a list of lists.
Save each item in the list as a tuple with the list index in the list of lists.
Loop n*m times:
On even turns - flatten into one list and just rand pop - yield the item and the item group.
On odd turns - temporarily remove the last item group and pop as before - in the end add the removed group back.
Important - how to avoid deadlocks?
A deadlock can occur if all the remaining items are from one group only.
To avoid that, check in each iteration the lengths of all the lists
and check if the longest list is longer than the sum of all the others.
If true - pull from that list.
That way you are never left with only one list full.
Here's a gist with an attempt to solve this in Python:
https://gist.github.com/YontiLevin/bd32815a0ec62b920bed214921a96c9d
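The steps above might be sketched as follows. This is one possible reading of the description, not the code from the linked gist; the function name constrained_shuffle and the rng parameter are hypothetical. It guarantees the pair constraint for n lists of equal length, but makes no claim that the result has the same distribution as reshuffling until the constraint holds.

```python
import random


def constrained_shuffle(lists, rng=random):
    """Shuffle all items so positions (0,1), (2,3), ... never share a source list."""
    groups = [list(lst) for lst in lists]  # work on copies of the input lists
    result = []
    last_group = None
    total = sum(len(group) for group in groups)
    for turn in range(total):
        sizes = [len(group) for group in groups]
        longest = max(range(len(groups)), key=lambda k: sizes[k])
        if sizes[longest] > sum(sizes) - sizes[longest]:
            # Deadlock guard: one list outweighs all the others combined.
            candidates = [longest]
        else:
            candidates = [k for k, size in enumerate(sizes) if size]
        if turn % 2 == 1:
            # Second slot of a pair: exclude the group used for the first slot.
            candidates = [k for k in candidates if k != last_group]
        if not candidates:
            raise ValueError("no valid arrangement for these lists")
        # Weighting groups by size matches a random pop from the flattened list.
        weights = [sizes[k] for k in candidates]
        chosen = rng.choices(candidates, weights=weights)[0]
        result.append(groups[chosen].pop(rng.randrange(sizes[chosen])))
        last_group = chosen
    return result


print(constrained_shuffle([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]))
```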
A very quick and simple method I am trying is:
    random shuffle
    loop over the pairs in the list:
        if pair is bad:
            loop over the pairs in the list:
                if both elements of the new pair are different than the bad pair:
                    swap the second elements
                    break
Will this always find a solution? Will the solutions have the same distribution as naive shuffling until finding a legit solution?
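For reference, the naive baseline that the question uses as its target distribution (reshuffle until the constraint holds) could look like the sketch below. The function name shuffle_until_valid and the rng parameter are hypothetical. Every valid arrangement is equally likely with this approach, at the cost of an unpredictable number of retries.

```python
import random


def shuffle_until_valid(lists, rng=random):
    """Reshuffle until no pair (0,1), (2,3), ... comes from the same list."""
    # Tag each item with the index of the list it came from.
    tagged = [(k, item) for k, lst in enumerate(lists) for item in lst]
    while True:  # never terminates if no valid arrangement exists
        rng.shuffle(tagged)
        pairs = zip(tagged[0::2], tagged[1::2])
        if all(left[0] != right[0] for left, right in pairs):
            return [item for _, item in tagged]


print(shuffle_until_valid([["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]))
```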

Get unique entries in list of lists by an item

This seems like a fairly straightforward problem but I can't seem to find an efficient way to do it. I have a list of lists like this:
    list = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
I want to get a list of unique entries by the third item in each child list (the 'id'), i.e. the end result should be
    unique_entries = [['abc','def','123'],['ghi','jqk','456']]
What is the most efficient way to do this? I know I can use set to get the unique ids, and then loop through the whole list again. However, there are more than 2 million entries in my list and this is taking too long. Appreciate any pointers you can offer! Thanks.
How about this: create a set that keeps track of the ids already seen, and only append sublists whose id has not been seen yet.
    l = [['abc','def','123'],['abc','xyz','123'],['ghi','jqk','456']]
    seen = set()
    new_list = []
    for sl in l:
        if sl[2] not in seen:
            new_list.append(sl)
            seen.add(sl[2])
    print(new_list)
Result:
[['abc', 'def', '123'], ['ghi', 'jqk', '456']]
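A variation on the same single-pass idea, offered as a sketch rather than a benchmarked improvement: a dict keyed by the id keeps the first sublist seen for each id. It assumes Python 3.7 or later, where dicts preserve insertion order; the names rows and first_by_id are made up for this example.

```python
rows = [['abc', 'def', '123'], ['abc', 'xyz', '123'], ['ghi', 'jqk', '456']]

first_by_id = {}
for row in rows:
    # setdefault stores the row only if this id has not been seen before.
    first_by_id.setdefault(row[2], row)

unique_entries = list(first_by_id.values())
print(unique_entries)  # [['abc', 'def', '123'], ['ghi', 'jqk', '456']]
```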
One approach would be to use a nested loop. Before the outer loop, create a result list and add the first element to it. Then iterate over the outer list starting from index 1. In the inner loop, starting from index 0, check whether the third element of the current sublist matches the third element of any sublist already held in the result list. If it is not found, add the sublist to the result list (which must be defined outside the outer loop); otherwise use the "continue" keyword. Finally, print the result list.

Python - Grab Random Names

Alright, so I have a question. I am working on a script that grabs random names from a list of provided names and outputs them as a list of 5. I know that you can use
    import random
    items = ['names','go','here']
    rand_item = items[random.randrange(len(items))]
This, if I am not mistaken, should grab one random item from the list (correct me if I am wrong). My question is: how would I get it to generate, say, a list of 5 names, going down like below?
random
names
generated
using
code
Also is there a way to make it where if I run this 5 days in a row, it doesn't repeat the names in the same order?
I appreciate any help you can give, or any errors in my existing code.
Edit:
The general use for my script will be to generate task assignments for a group of users every day, 5 days a week. What I am looking for is a way to generate these names in 5 different rotations.
I apologize for any confusion. Though some of the returned answers will be helpful.
Edit2:
Alright so I think I have mostly what I want, thank you Markus Meskanen & mescalinum, I used some of the code from both of you to resolve most of this issue. I appreciate it greatly. Below is the code I am using now.
    import random
    items = ['items', 'go', 'in', 'this', 'string']
    rand_item = random.sample(items, 5)
    for item in random.sample(items, 5):
        print(item)
random.choice() is good for selecting one element at random.
However, if you want to select multiple elements at random without repetition, you could use random.sample():
    for item in random.sample(items, 5):
        print(item)
For the last question, you should trust the (pseudo-)random generator not to give the same sequence on two consecutive days. By default the generator is seeded from an operating-system randomness source (or from the current time if none is available), so it is unlikely, although not impossible, to observe the same sequence on two consecutive days, especially if the number of items is small.
If you absolutely need to avoid this, save the last sequence to a file, and load it before shuffling, and keep shuffling until it gives you a different order.
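A rough sketch of that save-and-compare approach. The function name next_order, the JSON file format, and the history path are assumptions made for this example, not something specified above.

```python
import json
import os
import random


def next_order(items, history_path, rng=random):
    """Return a shuffled copy of items that differs from the order saved last time."""
    previous = None
    if os.path.exists(history_path):
        with open(history_path) as handle:
            previous = json.load(handle)
    order = list(items)
    rng.shuffle(order)
    # Reshuffle only while a different order is actually possible.
    while order == previous and len(set(order)) > 1:
        rng.shuffle(order)
    with open(history_path, "w") as handle:
        json.dump(order, handle)
    return order
```

Calling next_order(items, "last_order.json") once per day would then never print the same order on two consecutive runs, as long as the list contains at least two different names.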
You could use random.choice() to get one item only:
    items = ['names','go','here']
    rand_item = random.choice(items)
Now just repeat this 5 times (a for loop!)
If you want the names just in a random order, use random.shuffle() to get a different result every time.
It is not clear from your question whether you simply want to shuffle the items or choose a subset. From what I understand, you want the second case.
You can use random.sample to get a given number of random items from a list in Python. If I wanted to get 3 random items from a list of five letters, I would do:
>>> import random
>>> random.sample(['a', 'b', 'c', 'd', 'e'], 3)
['b', 'a', 'e']
Note that the letters are not necessarily returned in the same order - 'b' is returned before 'a', although that wasn't the case in the original list.
Regarding the second part of your question, preventing it from generating the same letters in the same order: you can append every newly generated sublist to a file, read this file during your script's execution, and keep generating a new sublist until it differs from every previously generated one.
random.shuffle(items) will handle the random order generation:
    In [15]: print(items)
    ['names', 'go', 'here']
    In [16]: for item in items: print(item)
    names
    go
    here
    In [17]: random.shuffle(items)
    In [18]: for item in items: print(item)
    here
    names
    go
For completeness, I agree with the above poster on random.choice().
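For the "5 different rotations" mentioned in the question's edit, one possible sketch is to draw five distinct orderings at once for the whole week. This assumes the names are all different and the list is short, since it materialises every permutation (120 for five names); the variable names are made up for this example.

```python
import random
from itertools import permutations

items = ['items', 'go', 'in', 'this', 'string']

# Every possible ordering of the names; only practical for short lists.
all_orders = list(permutations(items))

# random.sample never picks the same ordering twice.
week = random.sample(all_orders, 5)
for day, order in enumerate(week, start=1):
    print(day, list(order))
```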

Compare items in list with nested for-loop

I have a list of URLs in an open CSV which I have ordered alphabetically, and now I would like to iterate through the list and check for duplicate URLs. In a second step, the duplicate should then be removed from the list, but I am currently stuck on the checking part which I have tried to solve with a nested for-loop as follows:
    for i in short_urls:
        first_url = i
        for s in short_urls:
            second_url = s
        if i == s:
            print("duplicate")
        else:
            print("all good")
The print statements will obviously be replaced once the nested for-loop is working. Currently, the list contains a few duplicates, but my nested loop does not seem to work correctly as it does not recognise any of the duplicates.
My question is: are there better ways to perform this exercise, and what is the problem with the current nested for-loop?
Many thanks :)
By construction, your method is faulty, even if you indent the if/else block correctly. For instance, imagine you had [1, 2, 3] as short_urls for the sake of argument. The outer for loop will pick out 1 to compare against the whole list. It will think it has found a duplicate when the inner for loop reaches the first element, which is also 1. Essentially, every element will be tagged as a duplicate, and if you plan on removing duplicates, you'll end up with an empty list.
The better solution is to call set(short_urls) to get a set of your urls with the duplicates removed. If you want a list (as opposed to a set) of urls with the duplicates removed, you can convert the set back into a list with list(set(short_urls)).
In other words:
    short_urls = ['google.com', 'twitter.com', 'google.com']
    duplicates_removed_list = list(set(short_urls))
    print(duplicates_removed_list) # Prints the two unique URLs; a set does not guarantee their order
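If the original order of the URLs matters, a hedged alternative to set() is dict.fromkeys, which keeps the first occurrence of each URL. This relies on dicts preserving insertion order (Python 3.7 and later); the variable names are made up for this example.

```python
short_urls = ['google.com', 'twitter.com', 'google.com']

# dict keys are unique and, in Python 3.7+, keep their insertion order.
ordered_unique = list(dict.fromkeys(short_urls))
print(ordered_unique)  # ['google.com', 'twitter.com']
```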
The line if i == s: is not inside the second for loop. You missed a level of indentation:
    for i in short_urls:
        first_url = i
        for s in short_urls:
            second_url = s
            if i == s:
                print("duplicate")
            else:
                print("all good")
EDIT: Also, you are comparing every element of an array with every element of the same array. This means the element at position 0 is compared with the element at position 0, which is obviously the same.
What you need to do is start the second for loop at the position after the one reached in the first for loop.
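A small sketch of that idea, using enumerate and a slice so that the inner loop only looks at the elements after the current one. The sample URLs and the duplicates list are made up for this example.

```python
short_urls = ['google.com', 'google.com', 'twitter.com']

duplicates = []
for position, first_url in enumerate(short_urls):
    # Only compare with the elements that come after the current one.
    for second_url in short_urls[position + 1:]:
        if first_url == second_url:
            duplicates.append(first_url)
            break  # one later match is enough to flag this element

print(duplicates)  # ['google.com']
```

A URL that occurs three times would be reported twice here, once for each occurrence that has a later copy.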

Nested for loop index out of range

I've run into a rather trivial problem, but since I'm quite new to Python, I've been banging my head against my desk for a while. (Hurts.) Though I believe it's more of a logic problem to solve...
First I have to say that I'm using the Python SDK for Cinema 4D so I had to change the following code a bit. But here is what I was trying to do and struggling with:
I'm trying to group some polygon selections, which are dynamically generated (based on some rules, not that important).
Here's how it works the mathematical way:
Those selections are based on islands (means, that there are several polygons connected).
Then, those selections have to be grouped and put into a list that I can work with.
Any polygon has its own index, so this one should be rather simple, but like I said before, I'm quite struggling there.
The main problem is easy to explain: I'm trying to access a non existent index in the first loop, resulting in an index out of range error. I tried evaluating the validity first, but no luck. For those who are familiar with Cinema 4D + Python, I will provide some of the original code if anybody wants that. So far, so bad. Here's the simplified and adapted code.
Edit: I forgot to mention that the check which causes the error should actually only check for duplicates, so that the currently selected number is skipped if it has already been processed. This is necessary because the calculations are computationally heavy.
I really hope somebody can point me in the right direction and that this code makes sense so far. :)
    def myFunc():
        sel = [0,1,5,12] # changes with every call of "myFunc", for example to [2,8,4,10,9,1], etc. - list always differs in count of elements, can even be empty, groups are being built from these values
        all = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # the whole set
        groups = [] # list to store indices-lists into
        indices = [] # list to store selected indices
        count = 0 # number of groups
        tmp = [] # temporary list to copy the indices list into before resetting
        for i in range(len(all)): # loop through values
            if i not in groups[count]: # that's the problematic one; this one actually should check whether "i" is already inside of any list inside the group list, error is simply that I'm trying to check a non existent value
                for index, selected in enumerate(sel): # loop through "sel" and return actual indices. "selected" determines, if "index" is selected. boolean.
                    if not selected: continue # pretty much self-explanatory
                    indices.append(index) # push selected indices to the list
                tmp = indices[:] # clone list
                groups.append(tmp) # push the previous generated list to another list to store groups into
                indices = [] # empty/reset indices-list
                count += 1 # increment count
        print(groups) # debug
    myFunc()
edit:
After adding a second list that acts as a counter and is filled with extend rather than append, everything worked as expected! That list is a basic flat list, pretty simple ;)
groups[count]
When this is first evaluated, groups is an empty list and count is 0. You can't access the element at index 0 of groups, because there is nothing there!
Try changing
groups = [] to groups = [[]] (i.e. instead of an empty list, a list that contains only an empty list).
I'm not sure why you'd want to add the empty list to groups. Perhaps it is better to change
    if i not in groups[count]:
to
    if not groups or i not in groups[count]:
You also don't need to copy the list if you're not going to use it for anything else. So you can replace
    tmp = indices[:] # clone list
    groups.append(tmp) # push the previous generated list to another list to store groups into
    indices = [] # empty/reset indices-list
with
    groups.append(indices) # push the previous generated list to another list to store groups into
    indices = [] # empty/reset indices-list
You may even be able to drop count altogether (you can always use len(groups)). You can also replace the inner loop with a list comprehension:
    def myFunc():
        sel = [0,1,5,12] # changes with every call of "myFunc", for example to [2,8,4,10,9,1], etc. - list always differs in count of elements, can even be empty, groups are being built from these values
        all = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15] # the whole set
        groups = [] # list to store indices-lists into
        for i in range(len(all)): # loop through values
            if not groups or i not in groups[-1]: # look in the latest group
                indices = [idx for idx, selected in enumerate(sel) if selected]
                groups.append(indices) # push the previous generated list to another list to store groups into
        print(groups) # debug
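The question's comment says the check should really test whether i is already inside any of the stored groups, not just the latest one. A hedged sketch of such a check follows; the helper name already_grouped and the seen set are made up for this example, and the set variant mirrors the flat "second list" the asker describes in the edit.

```python
def already_grouped(value, groups):
    """Return True if value appears in any of the sub-lists of groups."""
    return any(value in group for group in groups)


groups = []
print(already_grouped(3, groups))  # False, and no IndexError on the empty list

groups.append([0, 1, 5])
print(already_grouped(5, groups))  # True
print(already_grouped(3, groups))  # False

# Cheaper for many lookups: keep one flat set of everything grouped so far.
seen = set()
for group in groups:
    seen.update(group)
print(3 in seen, 5 in seen)  # False True
```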
correct line 11 from:
if i not in groups[count]
to:
if i not in groups:
