Manual product with unknown number of arguments - python

The following examples give the same result:
A.
product = []
for a in "abcd":
    for b in "xy":
        product.append((a,b))
B.
from itertools import product
list(product("abcd","xy"))
How can I calculate the cartesian product like in example A when I don't know the number of arguments n?
REASON I'm asking this:
Consider this piece of code:
allocations = list(product(*strategies.values()))
for alloc in allocations:
    PWC[alloc] = [a for (a,b) in zip(help,alloc) if coalitions[a] >= sum(b)]
The values of the strategies dictionary are lists of tuples, help is an auxiliary variable (a list with the same length as every alloc) and coalitions is another dictionary that assigns some numeric value to the tuples in help.
Since strategies values are sorted, I know that the if statement won't be true anymore after a certain alloc. Since allocations is a pretty big list, I would avoid tons of comparisons and tons of sums if I could use the example algorithm A.

You can do:
items = ["abcd","xy"]
from itertools import product
list(product(*items))
The list items can contain an arbitrary number of strings, and the call to product will provide you with the Cartesian product of those strings.
Note that you don't have to turn it into a list - you can iterate over it and stop when you no longer wish to continue:
for item in product(*items):
    print(item)
    if condition:
        break

If you just want to abort the allocations after you hit a certain condition, and you want to avoid generating all the elements from the cartesian product for those, then simply don’t make a list of all combinations in the first place.
itertools.product is lazy, which means that it generates only a single value of the cartesian product at a time. So you never need to generate all elements, and you never need to run the comparisons for the elements you skip. Just don’t call list() on the result, as that would iterate the whole sequence and store all possible combinations in memory:
allocations = product(*strategies.values())
for alloc in allocations:
    PWC[alloc] = [a for (a,b) in zip(help,alloc) if coalitions[a] >= sum(b)]
    # check whether you can stop looking at more values from the cartesian product
    if someCondition(alloc):
        break
It’s just important to note how itertools.product generates the values, what pattern it follows. It’s basically equivalent to the following:
for a in firstIterable:
    for b in secondIterable:
        for c in thirdIterable:
            …
                for n in nthIterable:
                    yield (a, b, c, …, n)
So the rightmost iterable advances fastest and the leftmost one slowest, like an odometer. Make sure that you order the iterables in a way that lets you specify a correct break condition.
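If a single break is not enough and you want to skip whole branches, as in the nested loops of example A, one option is a recursive generator. The following is only a sketch: manual_product and its prune parameter are hypothetical names that are not part of itertools, and prune is assumed to be a callable that receives the partial tuple built so far and returns True when everything starting with that prefix can be skipped.

```python
from itertools import product


def manual_product(*iterables, prune=None):
    """Yield the Cartesian product of any number of iterables.

    prune (hypothetical hook) is called with each partial tuple; when it
    returns True, every result starting with that prefix is skipped.
    """
    pools = [tuple(it) for it in iterables]

    def recurse(depth, prefix):
        if depth == len(pools):
            yield tuple(prefix)
            return
        for value in pools[depth]:
            prefix.append(value)
            # Descend only if this prefix has not been ruled out.
            if prune is None or not prune(tuple(prefix)):
                yield from recurse(depth + 1, prefix)
            prefix.pop()

    yield from recurse(0, [])


# Without pruning, the order matches itertools.product.
same_as_itertools = list(manual_product("abcd", "xy")) == list(product("abcd", "xy"))

# Skip every tuple whose first element is "b".
pruned = list(manual_product("abc", "xy", prune=lambda p: p[0] == "b"))
```

The recursion depth equals the number of iterables, so this stays practical for a modest number of arguments.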

Related

Selecting multiple random sequences of N elements from M different-length lists in Python

I have X lists of elements, each list containing a different number of elements (without repetitions inside a single list). I want to generate (if possible, 500) sequences of 3 elements, where each element belongs to a different list, and sequences do not repeat. So something like:
X (in this case 4) lists of elements: [A1,A2], [B1,B2,B3,B4], [C1], [D1,D2,D3,D4,D5]
possible results: [A1,B2,D2], [B3,C1,D2], [A1,B2,C1]... (here 500 sequences are impossible, so would be less)
I think I know how to do it with a nasty loop: join all the lists, random.sample(len(l),3) from the joint list, if 2 indices belong to the same list repeat, if not, check if the sequence was not found before. But that would be very slow. I am looking for a more pythonic or more mathematically clever way.
Perhaps a better way would be to use random.sample([A,B,C,D], 3, p=[len(A), len(B), len(C), len(D)]), then for each sequence from it randomly select an element from each group in the sequence, then check if a new sequence generated in this way hasn't been generated before. But again, a lot of looping.
Any better ideas?
Check the itertools module (combinations and permutations in particular).
You can call random.choice() on the permutations of 3 of the X lists (thus selecting 3 lists), and then call random.choice() on each selected list to pick one element from it (random.choice() is in the random module).
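When the total number of possible sequences is small enough to enumerate, a simpler route is to build them all and draw without replacement. This is a sketch under that assumption; sample_sequences is a hypothetical helper, and it keeps the elements of each sequence in list order, as in the question's examples.

```python
import random
from itertools import combinations, product


def sample_sequences(lists, k=3, limit=500, seed=None):
    """Return up to `limit` distinct sequences with one element from each of k different lists."""
    rng = random.Random(seed)
    # Choose k of the lists, then one element from each chosen list.
    all_sequences = [
        seq
        for chosen in combinations(lists, k)
        for seq in product(*chosen)
    ]
    # random.sample draws without replacement, so no sequence repeats.
    return rng.sample(all_sequences, min(limit, len(all_sequences)))


lists = [
    ["A1", "A2"],
    ["B1", "B2", "B3", "B4"],
    ["C1"],
    ["D1", "D2", "D3", "D4", "D5"],
]
sequences = sample_sequences(lists, k=3, limit=500, seed=1)
```

For the four example lists only 78 sequences exist, so fewer than 500 are returned. For inputs too large to enumerate, this approach would have to be replaced by sampling with a set of already-seen sequences.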

How to count amount of combinations in a python list

I need some help to count the amount of combinations in a list array in python.
I need to count the amount of possible combinations between three letters in all of the elements and then find the most repeated one. eg, ABC, CDA, CCA, etc...
I have created a for loop to look in each element of the list, then I have another loop to check each combo of three letters and add it to a new list. I am not sure about how to count the amount of times a combination is repeated, and then to find the mode, I think I might use the max() function.
This is part of the code I have, but it does not work as I expect, because it just adds each item of the list to an independent list.
lst = ["ABCDABCD", "ABDCABD", "ACCACABB", "BACDABC"]
for combo in lst:
    for i in range (0,3):
        combolst = []
        combolst.append(lst[i].split())
        print(combolst)
I am new to coding so that's why I'm here. Thanks!
(Assuming my math memory isn't garbage)
So okay, we are interested in combinations. Your code simply splits the list and creates a new one (as you said). Instead, we would use the combination formula: n!/(z!(n-z)!).
Where:
n is the number of elements, in this case the length of our string in question
z would be how many objects we wish to choose
Thus you would get:
import math

for combo in lst:
    n = math.factorial(len(combo))
    r = math.factorial(3)  # z! with z = 3
    nMinR = math.factorial(len(combo) - 3)
    result = n // (r * nMinR)  # integer division: the count is a whole number
    print(result)
That is for combinations. If we want permutations (where order does matter):
import math

for combo in lst:
    n = math.factorial(len(combo))
    nMinR = math.factorial(len(combo) - 3)
    result = n // nMinR  # integer division: the count is a whole number
    print(result)
I hope I understood your question correctly. Here is some reading about combinations vs permutations (https://medium.com/i-math/combinations-permutations-fa7ac680f0ac). Keep in mind, the above code will only print out how many combinations or permutations are possible; it won't actually try to construct the possible values.
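If the goal is instead to tally the three-letter runs that actually occur in the strings and find the most frequent one (another reading of the question), collections.Counter can do the counting. This is a sketch under that interpretation; count_runs is a hypothetical helper name.

```python
from collections import Counter

lst = ["ABCDABCD", "ABDCABD", "ACCACABB", "BACDABC"]


def count_runs(strings, size=3):
    """Count every run of `size` consecutive letters across all strings."""
    counts = Counter()
    for s in strings:
        # Slide a window of `size` letters along the string.
        for i in range(len(s) - size + 1):
            counts[s[i:i + size]] += 1
    return counts


counts = count_runs(lst)
# most_common(1) returns a list with the single most frequent (run, count) pair.
mode, occurrences = counts.most_common(1)[0]
print(mode, occurrences)  # ABC 3
```

Counter.most_common() takes the place of the max() call mentioned in the question.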

Shuffling with constraints on pairs

I have n lists, each of length m. Assume n*m is even. I want to get a randomly shuffled list of all elements, under the constraint that the elements in locations i, i+1, where i=0,2,...,n*m-2, never come from the same list. Edit: other than this constraint, I do not want to bias the distribution of random lists. That is, the solution should be equivalent to a completely random shuffle that is repeated until the constraint holds.
example:
list1: a1,a2
list2: b1,b2
list3: c1,c2
allowed: b1,c1,c2,a2,a1,b2
disallowed: b1,c1,c2,b2,a1,a2
A possible solution is to think of the output as m chunks of items, each chunk having length n. If for each chunk you randomly select exactly one item from each list, then you will never hit dead ends. Just make sure that the first item in each chunk (except the first chunk) comes from a different list than the last element of the previous chunk.
You can also iteratively randomize numbers, always making sure you pick from a different list than the previous number, but then you can hit some dead ends.
Finally, another possible solution is to randomize a number on each position sequentially, but only from those which "can be put there", that is, if you put a number, none of the constraints will be violated, that is, you will have at least a possible solution.
A variation of b above that avoids dead ends: at each step you choose twice. First, randomly choose an item. Second, randomly choose where to place it. At the kth step there are k optional places to put the item (the new item can be injected between two existing items). Naturally, you only choose from allowed places.
Money!
arrange your lists into a list of lists
save each item in the list as a tuple with the list index in the list of lists
loop n*m times
on even turns - flatten into one list and just rand pop - yield the item and the item group
on odd turns - temporarily remove the last item group and pop as before - in the end add the removed group back
important - how to avoid deadlocks?
a deadlock can occur if all the remaining items are from one group only.
to avoid that, check in each iteration the lengths of all the lists
and check if the longest list is longer than the sum of all the others.
if true - pull from that list
that way you are never left with only one list full
here's a gist with an attempt to solve this in python
https://gist.github.com/YontiLevin/bd32815a0ec62b920bed214921a96c9d
A very quick and simple method I am trying is:
random shuffle
loop over the pairs in the list:
    if pair is bad:
        loop over the pairs in the list:
            if both elements of the new pair are different than the bad pair:
                swap the second elements
                break
Will this always find a solution? Will the solutions have the same distribution as naive shuffling until finding a legit solution?
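For comparison, the naive baseline that the question treats as the reference distribution (shuffle, check, repeat) can be sketched as follows. constrained_shuffle and its max_tries cap are hypothetical names chosen for illustration; the cap only guards against inputs for which no valid arrangement exists.

```python
import random


def constrained_shuffle(lists, seed=None, max_tries=100_000):
    """Shuffle all items until no pair at positions (i, i+1), i even, shares a source list."""
    rng = random.Random(seed)
    # Tag each item with the index of the list it came from.
    tagged = [(idx, item) for idx, group in enumerate(lists) for item in group]
    if len(tagged) % 2:
        raise ValueError("total number of elements must be even")
    for _ in range(max_tries):
        rng.shuffle(tagged)
        pairs_ok = all(
            tagged[i][0] != tagged[i + 1][0]
            for i in range(0, len(tagged), 2)
        )
        if pairs_ok:
            return [item for _, item in tagged]
    raise RuntimeError("no valid shuffle found within max_tries")


lists = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
shuffled = constrained_shuffle(lists, seed=0)
```

Because every accepted result is an unmodified uniform shuffle, this is unbiased by construction, which makes it a useful yardstick for testing faster methods empirically.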

Automatic dictionary generation - Python

I am trying to create a program which outputs all permutations of a string of length n whilst avoiding a defined substring, of length k. For example:
Derive all possible strings, up to a length of 5 characters, that can be generated from an initial empty set, which can either go to A or B, but the string cannot contain the substring "AAB" which is not allowed.
i.e. base case of [""] is the empty set.
The dictionary would be - A:{A}, B:{A,B}
From the empty set we can go to A, and we can go to B. We can not go to a B after an A but we can go to an A after a B. And both A and B can access themselves
example output: a,b,aa,bb,ba,aaa,bbb,baa,bba ... etc
How would I go about prompting a user to define a substring to avoid, and from that generate a dictionary which abides to these rules?
Any help or clarification would be greatly received.
Regards,
rkhad
The itertools module has a useful method called permutations():
(from http://docs.python.org/library/itertools.html#itertools.permutations)
itertools.permutations(iterable[, r])
Return successive r length permutations of elements in the iterable.
If r is not specified or is None, then r defaults to the length of the
iterable and all possible full-length permutations are generated.
Permutations are emitted in lexicographic sort order. So, if the input
iterable is sorted, the permutation tuples will be produced in sorted
order.
List comprehensions provide an easy way to filter generated permutations like this, but beware that if you are storing permutations of a large string that you will quickly get a very large list. You may want to therefore use a set to whittle down your list to non-duplicates. Also, you may find the function sorted to be useful if you intend to iterate through your "paths" in lexicographic order. Lastly, the in operator, when applied to strings, checks for a substring (x in y checks if x is a substring of y).
>>> from itertools import permutations
>>> perms = [''.join(p) for p in permutations('AAAABBBB', 4)]
>>> len(perms)
1680
>>> len(set(perms))
16
>>> filtered = [p for p in sorted(set(perms)) if 'AB' not in p]
>>> filtered
['AAAA', 'BAAA', 'BBAA', 'BBBA', 'BBBB']
I'm working on my dissertation right now too, in the area of Formal Languages. The concept of substring membership can be represented by a very simple regular grammar which corresponds to a deterministic finite automaton. To jog your memory:
http://en.wikipedia.org/wiki/Regular_grammar
http://en.wikipedia.org/wiki/Finite-state_machine
When you look into these you will find that you need to somehow keep track of the current "state" of your computation if you want it to have different "dictionaries" at different phases. I encourage you to read the wikipedia articles, and ask me some follow-up questions as I'd be happy to help you work through this.
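As a starting point before moving to an automaton, the strings in the question can also be generated by brute force. Since letters may repeat freely, this sketch uses itertools.product with repeat rather than permutations; strings_avoiding is a hypothetical helper name, and the forbidden substring could come from input() if it should be user-defined.

```python
from itertools import product


def strings_avoiding(alphabet, forbidden, max_len):
    """List all non-empty strings over `alphabet` up to `max_len` that do not contain `forbidden`."""
    result = []
    for length in range(1, max_len + 1):
        # product(..., repeat=length) yields every string of this length.
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if forbidden not in candidate:
                result.append(candidate)
    return result


allowed = strings_avoiding("AB", "AAB", 5)
```

This checks every candidate, so the work grows exponentially with max_len; tracking the automaton state while extending strings would avoid generating the rejected ones.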

Comparing massive lists of dictionaries in python

I never actually thought I'd run into speed-issues with python, but I have. I'm trying to compare really big lists of dictionaries to each other based on the dictionary values. I compare two lists, with the first like so
biglist1=[{'transaction':'somevalue', 'id':'somevalue', 'date':'somevalue' ...}, {'transaction':'somevalue', 'id':'somevalue', 'date':'somevalue' ...}, ...]
With 'somevalue' standing for a user-generated string, int or decimal. Now, the second list is pretty similar, except the id-values are always empty, as they have not been assigned yet.
biglist2=[{'transaction':'somevalue', 'id':'', 'date':'somevalue' ...}, {'transaction':'somevalue', 'id':'', 'date':'somevalue' ...}, ...]
So I want to get a list of the dictionaries in biglist2 that match the dictionaries in biglist1 for all other keys except id.
I've been doing
for item in biglist2:
    for transaction in biglist1:
        if item['transaction'] == transaction['transaction']:
            list_transactionnamematches.append(transaction)
for item in biglist2:
    for transaction in list_transactionnamematches:
        if item['date'] == transaction['date']:
            list_transactionnamematches.append(transaction)
... and so on, not comparing id values, until I get a final list of matches. Since the lists can be really big (around 3000+ items each), this takes quite some time for python to loop through.
I'm guessing this isn't really how this kind of comparison should be done. Any ideas?
Index on the fields you want to use for lookup. O(n+m)
matches = []
biglist1_indexed = {}
for item in biglist1:
    biglist1_indexed[(item["transaction"], item["date"])] = item
for item in biglist2:
    if (item["transaction"], item["date"]) in biglist1_indexed:
        matches.append(item)
This is probably thousands of times faster than what you're doing now.
What you want to do is to use correct data structures:
1. Create a dictionary of mappings of tuples of other values in the first dictionary to their id.
2. Create two sets of tuples of values in both dictionaries. Then use set operations to get the tuple set you want.
3. Use the dictionary from point 1 to assign ids to those tuples.
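These steps might be sketched as follows, assuming every value in the records is hashable (strings, ints and decimals are). record_key and match_and_assign_ids are hypothetical helper names, and the sample records are made up for illustration.

```python
def record_key(record):
    """Build a hashable key from every field except 'id'."""
    return tuple(sorted((k, v) for k, v in record.items() if k != "id"))


def match_and_assign_ids(biglist1, biglist2):
    # Step 1: map each key tuple from the first list to its id.
    ids_by_key = {record_key(r): r["id"] for r in biglist1}
    # Step 2: build both key sets and intersect them.
    keys1 = set(ids_by_key)
    keys2 = {record_key(r) for r in biglist2}
    common = keys1 & keys2
    # Step 3: copy each matching record from the second list and fill in its id.
    return [
        dict(r, id=ids_by_key[record_key(r)])
        for r in biglist2
        if record_key(r) in common
    ]


biglist1 = [
    {"transaction": "t1", "id": "7", "date": "d1"},
    {"transaction": "t2", "id": "8", "date": "d2"},
]
biglist2 = [
    {"transaction": "t1", "id": "", "date": "d1"},
    {"transaction": "t3", "id": "", "date": "d3"},
]
matches = match_and_assign_ids(biglist1, biglist2)
```

If two records in the first list share all non-id fields, the later one's id wins in this sketch.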
Forgive my rusty python syntax, it's been a while, so consider this partially pseudocode
import operator

# Sort both lists into the same order, by (date, transaction).
sort_key = operator.itemgetter('date', 'transaction')
biglist1.sort(key=sort_key)
biglist2.sort(key=sort_key)
biglist3 = []
i1 = 0
i2 = 0
while i1 < len(biglist1) and i2 < len(biglist2):
    key1 = sort_key(biglist1[i1])
    key2 = sort_key(biglist2[i2])
    if key1 == key2:
        biglist3.append(biglist1[i1])
        i1 += 1
        i2 += 1
    elif key1 < key2:
        i1 += 1
    elif key1 > key2:
        i2 += 1
    else:
        print("this won't happen if I did the tuple comparison correctly")
This sorts both lists into the same order, by (date,transaction). Then it walks through them side by side, stepping through each looking for relatively adjacent matches. It assumes that (date,transaction) is unique, and that I am not completely off my rocker with regards to tuple sorting and comparison.
In O(m*n)...
for item in biglist2:
    for transaction in biglist1:
        if (item['transaction'] == transaction['transaction'] and
                item['date'] == transaction['date'] and
                item['foo'] == transaction['foo']):
            list_transactionnamematches.append(transaction)
The approach I would probably take to this is to make a very, very lightweight class with one instance variable and one method. The instance variable is a pointer to a dictionary; the method overrides the built-in special method __hash__(self), returning a value calculated from all the values in the dictionary except id.
From there the solution seems fairly obvious: Create two initially empty dictionaries: N and M (for no-matches and matches.) Loop over each list exactly once, and for each of these dictionaries representing a transaction (let's call it a Tx_dict), create an instance of the new class (a Tx_ptr). Then test for an item matching this Tx_ptr in N and M: if there is no matching item in N, insert the current Tx_ptr into N; if there is a matching item in N but no matching item in M, insert the current Tx_ptr into M with the Tx_ptr itself as a key and a list containing the Tx_ptr as the value; if there is a matching item in N and in M, append the current Tx_ptr to the value associated with that key in M.
After you've gone through every item once, your dictionary M will contain pointers to all the transactions which match other transactions, all neatly grouped together into lists for you.
Edit: Oops! Obviously, the correct action if there is a matching Tx_ptr in N but not in M is to insert a key-value pair into M with the current Tx_ptr as the key and as the value, a list of the current Tx_ptr and the Tx_ptr that was already in N.
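A rough sketch of this wrapper idea follows. Note one addition that the description above does not mention: for dictionary lookups to treat two wrappers as the same key, the class needs __eq__ as well as __hash__. TxPtr and group_matches are hypothetical names, and all values are assumed to be hashable.

```python
class TxPtr:
    """Lightweight wrapper that hashes and compares a transaction on every field except 'id'."""

    def __init__(self, tx):
        self.tx = tx
        self._key = tuple(sorted((k, v) for k, v in tx.items() if k != "id"))

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        # Added so that equal keys collide as dictionary keys.
        return isinstance(other, TxPtr) and self._key == other._key


def group_matches(*transaction_lists):
    seen = {}     # N: first occurrence of each key
    matches = {}  # M: key -> all wrappers sharing that key
    for transactions in transaction_lists:
        for tx in transactions:
            ptr = TxPtr(tx)
            if ptr not in seen:
                seen[ptr] = ptr
            elif ptr not in matches:
                # Second occurrence: store the earlier wrapper and this one.
                matches[ptr] = [seen[ptr], ptr]
            else:
                matches[ptr].append(ptr)
    return matches


first = {"transaction": "t1", "id": "7", "date": "d1"}
second = {"transaction": "t1", "id": "", "date": "d1"}
third = {"transaction": "t2", "id": "", "date": "d2"}
groups = group_matches([first], [second, third])
```

Each value in groups is a list of wrappers whose .tx attributes are the original dictionaries.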
Have a look at Psyco. It's a Python compiler that can create very fast, optimized machine code from your source.
http://sourceforge.net/projects/psyco/
While this isn't a direct solution to your code's efficiency issues, it could still help speed things up without needing to write any new code. That said, I'd still highly recommend optimizing your code as much as possible AND use Psyco to squeeze as much speed out of it as possible.
Part of their guide specifically talks about using it to speed up list, string, and numeric computation heavy functions.
http://psyco.sourceforge.net/psycoguide/node8.html
I'm also a newbie. My code is structured in much the same way as his.
for A in biglist:
    for B in biglist:
        if ( A.get('somekey') != B.get('somekey') and #don't match to itself
             len( set(A.get('list')) - set(B.get('list')) ) > 10 ):
            [do stuff...]
This takes hours to run through a list of 10000 dictionaries. Each dictionary contains lots of stuff but I could potentially pull out just the ids ('somekey') and lists ('list') and rewrite as a single dictionary of 10000 key:value pairs.
Question: how much faster would that be? And I assume this is faster than using a list of lists, right?
