Correct generation of words in python


I think the intentions of this code are clear. I want to have in X all possible input words, with each digit being an element in a list. The following code works for 4 digits, but it gets unsustainable for bigger words. How can I make it more scalable? Let's assume I want the words of n digits instead of four.
d = [0,1]
X = [[x1,x2,x3,x4] for x1 in d for x2 in d for x3 in d for x4 in d]

You can use itertools.product for that:
from itertools import product
d = [0,1]
x = [list(t) for t in product(d,repeat=4)]
This gives:
>>> x
[[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 1, 1]]
Changing repeat= to, for instance, 5 gives all possible 5-digit lists, so repeat=n handles words of any length n.
If you do not need lists (that is, the elements are not supposed to be modified), you can use tuples instead and drop the list(..) construction:
# list of tuples
from itertools import product
d = [0,1]
x = list(product(d,repeat=4))
This generates:
>>> x
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 0, 0, 1), (1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)]
Note that product(..) itself returns a lazy iterator: it produces the elements one at a time. This is useful if the number of resulting elements is huge and you can process them one at a time; in that case it is better not to construct a list. For instance, you can use:
for tup in product(d, repeat=4):
    print(tup)
to print all tuples. The effect is the same as with for tup in x:, but memory usage can be lower because not all tuples have to be in memory at the same time: once you are done with a tuple (and do not store it in a list, etc.), its memory can be reclaimed and possibly reused for the next one. How much this helps depends on the interpreter's memory management.
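A minimal sketch of the laziness described above: with repeat=40 there are 2**40 words, yet taking the first few with itertools.islice returns immediately because product() never builds the full sequence. The word length 40 is an arbitrary illustrative choice.

```python
from itertools import islice, product

d = [0, 1]
# product() returns a lazy iterator: the 2**40 words are never materialised.
words = product(d, repeat=40)
first_three = list(islice(words, 3))
for word in first_three:
    print(word[-4:])  # only the last four digits, to keep the output short
# The iterator resumes exactly where islice stopped.
fourth = next(words)
```

Only four tuples are ever created here, which is why the same pattern works for word lengths where a list would not fit in memory.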

Related

Python3 - Permutations for 7 digit number that totals to a number

I need to find a solution for the problem below in Python 3. I tried itertools.combinations but I am not clear on how to do it.
Generate every 7-digit number whose digits sum to 5. Each digit can only be between 0 and 4, and digits can repeat. Valid example numbers are:
[ [2,1,1,0,0,1,0], [3,0,1,0,0,1,0], [0,0,0,4,0,0,1], [1,0,0,3,0,1,0], [1,1,1,1,0,1,0], ...... ]
As you can see, a digit may appear more than once in a number.
How can I create a list of all combinations meeting the criteria above?

You can get all tuples that sum to 5 with:
import itertools
[p for p in itertools.product(range(5), repeat=7) if sum(p) == 5]
This yields 455 solutions.
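The count of 455 can be cross-checked without enumerating anything, using stars and bars with inclusion-exclusion: subtract the solutions in which some digit exceeds the maximum. This is a sketch; count_bounded is a hypothetical helper name, and math.comb assumes Python 3.8 or later.

```python
from itertools import product
from math import comb


def count_bounded(total, parts, max_digit):
    """Count tuples of `parts` digits in 0..max_digit that sum to `total`
    (stars and bars with inclusion-exclusion; assumes parts >= 1)."""
    count = 0
    for j in range(parts + 1):
        # j digits are forced to be at least max_digit + 1
        remaining = total - j * (max_digit + 1)
        if remaining < 0:
            break
        count += (-1) ** j * comb(parts, j) * comb(remaining + parts - 1, parts - 1)
    return count


brute = sum(1 for p in product(range(5), repeat=7) if sum(p) == 5)
print(count_bounded(5, 7, 4), brute)  # 455 455
```

For this case the formula is C(11, 6) - 7 * C(6, 6) = 462 - 7 = 455: the 7 removed tuples are those with a single 5 and six zeros.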

This function finds every ordered tuple (repeated digits allowed) whose digits sum to total:
from itertools import product
from typing import List, Tuple
def perm_n_digit_total(n_digits, total, choices) -> List[Tuple]:
    # Keep only the tuples of the Cartesian product whose digits add up to total
    return list(filter(
        lambda x: sum(x) == total,
        product(choices, repeat=n_digits)
    ))
Example:
>>> perm_n_digit_total(3, 1, range(4))
[(0, 0, 1), (0, 1, 0), (1, 0, 0)]
>>> perm_n_digit_total(7, 5, range(6))[::50]
[(0, 0, 0, 0, 0, 0, 5),
(0, 0, 0, 3, 1, 1, 0),
(0, 0, 2, 0, 3, 0, 0),
(0, 1, 0, 1, 3, 0, 0),
(0, 2, 0, 0, 1, 0, 2),
(0, 4, 1, 0, 0, 0, 0),
(1, 0, 1, 1, 1, 0, 1),
(1, 1, 1, 1, 1, 0, 0),
(2, 0, 1, 0, 0, 2, 0),
(3, 1, 0, 0, 0, 1, 0)]

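Here's a recursive solution that does not use itertools:
def find_solutions(target, numbers, depth, potential_solution=None):
    # Avoid a mutable default argument; start from an empty partial solution
    if potential_solution is None:
        potential_solution = []
    if depth == 0:
        # All positions are filled: report the candidate if it hits the target
        if sum(potential_solution) == target:
            print(potential_solution)
        return
    current_sum = sum(potential_solution)
    for n in numbers:
        new_sum = current_sum + n
        # Prune: with non-negative numbers the sum can only grow
        if new_sum > target:
            continue
        find_solutions(target, numbers, depth - 1, potential_solution + [n])
find_solutions(target=5, numbers=[0, 1, 2, 3, 4], depth=7)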
Output
[0, 0, 0, 0, 0, 1, 4]
[0, 0, 0, 0, 0, 2, 3]
[0, 0, 0, 0, 0, 3, 2]
[0, 0, 0, 0, 0, 4, 1]
[0, 0, 0, 0, 1, 0, 4]
[0, 0, 0, 0, 1, 1, 3]
...
[3, 1, 1, 0, 0, 0, 0]
[3, 2, 0, 0, 0, 0, 0]
[4, 0, 0, 0, 0, 0, 1]
[4, 0, 0, 0, 0, 1, 0]
[4, 0, 0, 0, 1, 0, 0]
[4, 0, 0, 1, 0, 0, 0]
[4, 0, 1, 0, 0, 0, 0]
[4, 1, 0, 0, 0, 0, 0]
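Printing inside the recursion makes the results hard to reuse. A minimal sketch of a generator variant with the same pruning idea as find_solutions, yielding tuples so they can be collected or counted; iter_solutions is a hypothetical name, and the pruning assumes all numbers are non-negative.

```python
def iter_solutions(target, numbers, depth, prefix=()):
    """Yield every tuple of `depth` values from `numbers` that sums to `target`.

    Assumes all numbers are non-negative, which is what makes the pruning valid.
    """
    current = sum(prefix)
    if depth == 0:
        if current == target:
            yield prefix
        return
    for n in numbers:
        # Skip branches whose partial sum already exceeds the target
        if current + n > target:
            continue
        yield from iter_solutions(target, numbers, depth - 1, prefix + (n,))


solutions = list(iter_solutions(5, [0, 1, 2, 3, 4], 7))
print(len(solutions))  # 455
print(solutions[0], solutions[-1])  # (0, 0, 0, 0, 0, 1, 4) (4, 1, 0, 0, 0, 0, 0)
```

The first and last tuples match the printed output above, and the total agrees with the 455 found by filtering itertools.product.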

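If I understood correctly, you need something like this:
import itertools
value = [0, 1, 2, 3, 4]
p = itertools.product(value, repeat=7)
# Iterate over the lazy iterator directly; wrapping it in list() is unnecessary.
# Note: this prints all 5**7 = 78125 tuples, without the sum-to-5 filter.
for j in p:
    print(j)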

As each digit can only take 5 distinct values, you can use itertools.combinations_with_replacement:
from itertools import combinations_with_replacement
zero_four = list(range(5))
for c in combinations_with_replacement(zero_four, 7):
    if sum(c) == 5:
        print(c)
This gives all combinations that sum to 5, but not all of their permutations:
Output
(0, 0, 0, 0, 0, 1, 4)
(0, 0, 0, 0, 0, 2, 3)
(0, 0, 0, 0, 1, 1, 3)
(0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 1, 1, 1, 2)
(0, 0, 1, 1, 1, 1, 1)
To get all permutations, you can use itertools.permutations; since a combination can contain repeated elements, you need a set to keep only the unique permutations:
from itertools import permutations
for c in combinations_with_replacement(zero_four, 7):
    if sum(c) == 5:
        print(set(permutations(c)))
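As a sanity check on the combinations-then-permutations approach, a short sketch that merges the unique permutations of every qualifying combination into one set and compares it with the brute-force filter over itertools.product. Note that each 7-element combination has 5040 raw permutations, so this route does more work than the direct filter; it is shown only to confirm both give the same 455 numbers.

```python
from itertools import combinations_with_replacement, permutations, product

zero_four = range(5)
all_numbers = set()
for c in combinations_with_replacement(zero_four, 7):
    if sum(c) == 5:
        # The set discards duplicate orderings caused by repeated digits
        all_numbers.update(permutations(c))

print(len(all_numbers))  # 455
# Same result as filtering the full Cartesian product
assert all_numbers == {p for p in product(zero_four, repeat=7) if sum(p) == 5}
```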

enumerating strings of length K from an alphabet

I have two elements, let's say [0,1], and I want to construct all possible combinations of length 2k that contain the same number of each element. For example, with 2k=6 the output should include (0,0,0,1,1,1), (0,0,1,0,1,1), (1,1,1,0,0,0), etc.
I tried something like [x for x in itertools.product([1,0], repeat=6)], but it gives me all possible sequences (the numbers of ones and zeros may differ). Is it possible to directly create a list with the given property?

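Try itertools.permutations, as below (here k is the total length):
import itertools
def perms(k):
    # k//2 ones followed by k//2 zeros
    l = k//2 * [1] + k//2 * [0]
    m = [i for i in itertools.permutations(l)]
    # permutations() treats equal elements as distinct, so remove duplicates
    return list(set(m))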
Output for k=6:
>>> perms(6)
[(0, 1, 1, 0, 0, 1), (0, 1, 0, 1, 1, 0), (0, 1, 0, 0, 1, 1), (1, 1, 0, 0, 1, 0), (1, 1, 1, 0, 0, 0), (1, 0, 1, 0, 1, 0), (0, 0, 0, 1, 1, 1), (1, 0, 0, 1, 0, 1), (0, 1, 1, 0, 1, 0), (0, 0, 1, 1, 0, 1), (1, 1, 0, 1, 0, 0), (0, 1, 0, 1, 0, 1), (1, 1, 0, 0, 0, 1), (1, 0, 1, 1, 0, 0), (1, 0, 0, 0, 1, 1), (1, 0, 1, 0, 0, 1), (0, 0, 1, 0, 1, 1), (1, 0, 0, 1, 1, 0), (0, 1, 1, 1, 0, 0), (0, 0, 1, 1, 1, 0)]
The code can be adjusted to work with more general structures (more elements, elements other than 0 and 1, etc.). Let me know if you need any help with that.

Saying a list of length 2k contains an equal number of the elements a and b is equivalent to saying you want a list with a in k positions and b in the other k. So we can start from a list full of a's or b's and find all possible ways to place k copies of the other element in it. This set of possibilities is the same as selecting a set of k indices from range(2k), i.e. from [0, ..., 2k-1]. For the mathematics around this see [1]. In the code below, k denotes the full length, so k // 2 indices are chosen.
from itertools import combinations
k = 6  # full length of each list (2k in the question's notation)
assert k % 2 == 0, 'k must be even'
a = 0
b = 1
l = [
    [a if i in combination else b for i in range(k)]
    for combination in combinations(range(k), k // 2)
]
[1] https://en.wikipedia.org/wiki/Combination
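A minimal sketch of the same index-selection idea as a reusable generator, here using k for half the length to match the question's 2k notation. balanced_sequences is a hypothetical name; the number of results should equal C(2k, k).

```python
from itertools import combinations
from math import comb


def balanced_sequences(k, a=0, b=1):
    """Yield every length-2k tuple with exactly k copies of a and k copies of b."""
    n = 2 * k
    for positions in combinations(range(n), k):
        chosen = set(positions)  # indices that receive `a`
        yield tuple(a if i in chosen else b for i in range(n))


seqs = list(balanced_sequences(3))
print(len(seqs), comb(6, 3))  # 20 20
print(seqs[0], seqs[-1])  # (0, 0, 0, 1, 1, 1) (1, 1, 1, 0, 0, 0)
```

Unlike filtering permutations through a set, this produces each sequence exactly once, so the work is proportional to C(2k, k) rather than (2k)!.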

Python - Get all unique combinations with replacement from lists of list with unequal length

Note: this is not a duplicate question, despite what the title might suggest.
I have a list of lists and need to get all combinations from it, with replacement.
import itertools
l = [[1,2,3], [1,2,3], [1,2,3]]
n = []
for i in itertools.product(*l):
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]
[2, 2, 2]
[2, 2, 3]
[2, 3, 3]
[3, 3, 3]
Thanks to @RoadRunner and @Idlehands.
The code above works, with two problems:
For a large list, itertools.product throws a MemoryError, e.g. when l has 18 sublists of length 3, giving ~400 million combinations.
Order matters, so sorted does not work for my problem. Since this may be confusing, here is an example:
l = [[1,2,3], [1], [1,2,3]]
Here I have 2 unique groups:
Group 1: elements 0 and 2, which have the same value [1,2,3]
Group 2: element 1, which has the value [1]
So the solutions I need are:
[1,1,1]
[1,1,2]
[1,1,3]
[2,1,2]
[2,1,3]
[3,1,3]
So position 1 is fixed to 1.
Hope this example helps.

What about grouping sequences with the same elements in different order with a collections.defaultdict, then picking the first element from each key:
from itertools import product
from collections import defaultdict
l = [[1] ,[1,2,3], [1,2,3]]
d = defaultdict(list)
for x in product(*l):
    d[tuple(sorted(x))].append(x)
print([x[0] for x in d.values()])
Which gives:
[(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
Alternatively, this can also be done by keeping a set of what has been added:
from itertools import product
l = [[1] ,[1,2,3], [1,2,3]]
seen = set()
combs = []
for x in product(*l):
    curr = tuple(sorted(x))
    if curr not in seen:
        combs.append(x)
        seen.add(curr)
print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
If you don't want to sort, consider using a frozenset with collections.Counter():
from collections import Counter
from itertools import product
l = [[1] ,[1,2,3], [1,2,3]]
seen = set()
combs = []
for x in product(*l):
    curr = frozenset(Counter(x).items())
    if curr not in seen:
        seen.add(curr)
        combs.append(x)
print(combs)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
Note: You can also use setdefault() for the first approach, if you don't want to use a defaultdict().
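A minimal sketch of that setdefault() variant of the first approach, using a plain dict; it relies on dicts preserving insertion order (Python 3.7+) so the output order matches the defaultdict version.

```python
from itertools import product

l = [[1], [1, 2, 3], [1, 2, 3]]
d = {}
for x in product(*l):
    # setdefault inserts an empty list the first time a sorted key is seen
    d.setdefault(tuple(sorted(x)), []).append(x)
result = [group[0] for group in d.values()]
print(result)
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (1, 2, 2), (1, 2, 3), (1, 3, 3)]
```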

Edited Answer:
Based on the new information, to avoid holding the huge number of combinations from itertools.product() in memory, we can pull them in small batches:
from itertools import product
l = [list(range(3))]*18
prods = product(*l)
uniques = set()
results = []
totals = 0
def run_batch(n=1000000):
    # Consume at most n tuples from the shared product iterator
    global totals
    for _ in range(n):
        try:
            result = next(prods)
        except StopIteration:
            break
        totals += 1
        # The sorted tuple identifies the multiset; keep the first ordering seen
        unique = tuple(sorted(result))
        if unique not in uniques:
            uniques.add(unique)
            results.append(result)
run_batch()
print('Total iterations so far: {0}'.format(totals))
print('Number of unique tuples: {0}'.format(len(uniques)))
print('Number of wanted combos: {0}'.format(len(results)))
print('First 10 results:')
for r in results[:10]:
    print(r)
Output:
Total iterations so far: 1000000
Number of unique tuples: 103
Number of wanted combos: 103
First 10 results:
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 2)
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2)
Here we can control the batch size through the n argument of run_batch(), and call it again to continue as you see fit. The uniques are sorted tuples kept in a set as a reference point, and the results are in the order you wanted. Both sizes should be the same, and they were surprisingly small when I ran over all 3^18 combinations. I'm not well acquainted with memory allocation, but this way the program doesn't store all the unwanted results in memory, so you should have more wiggle room. Otherwise, you can always export the results to a file to make room. This sample only shows the lengths of the collections, but you can easily display or save the results for your own purpose.
I can't argue this is the best or most optimized approach, but it seems to work for me. Maybe it'll work for you? This batch took approximately 10 s to run 5 times (about 2 s per batch). The entire set of prods took me 15 minutes to run:
Total iterations: 387420489
Number of unique tuples: 190
Number of wanted combos: 190
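The manual next()/StopIteration loop can also be expressed with itertools.islice, which pulls a fixed-size chunk from the iterator and returns an empty chunk once it is exhausted. This is a sketch of that swapped-in technique, not the answer's own code; dedupe_in_batches is a hypothetical name, and the small 3^6 input is chosen only so the example runs quickly.

```python
from itertools import islice, product


def dedupe_in_batches(pools, batch_size=1_000_000):
    """Keep the first tuple seen for each sorted key, consuming product()
    in fixed-size chunks via islice instead of calling next() manually."""
    prods = product(*pools)
    uniques = set()
    results = []
    total = 0
    while True:
        batch = list(islice(prods, batch_size))
        if not batch:  # the iterator is exhausted
            break
        total += len(batch)
        for result in batch:
            key = tuple(sorted(result))
            if key not in uniques:
                uniques.add(key)
                results.append(result)
    return total, results


total, results = dedupe_in_batches([range(3)] * 6, batch_size=100)
print(total, len(results))  # 729 28
```

Only one batch of tuples is held at a time; memory is otherwise dominated by uniques and results, which grow with the number of distinct multisets (C(20, 2) = 190 for the 18-position case).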
Original Answer:
@RoadRunner had a neat solution with sorted() and defaultdict, but I feel the latter was not needed. I used his sorted() suggestion and implemented a modified version here.
From this answer:
import itertools
l = [[1] ,[1,2,3], [1,2,3]]
n = []
for i in itertools.product(*l):
    if sorted(i) not in n:
        n.append(sorted(i))
for i in n:
    print(i)
Output:
[1, 1, 1]
[1, 1, 2]
[1, 1, 3]
[1, 2, 2]
[1, 2, 3]
[1, 3, 3]

For short input sequences, this can be done by filtering the output of itertools.product down to the unique values. One unoptimized way is set(tuple(sorted(t)) for t in itertools.product(*l)), converted to a list if you like.
If your Cartesian product fans out so much that this is too inefficient, and if you can rely on the sublists being sorted as in your example, you could borrow an idea from the docs' discussion of permutations and filter out non-sorted values:
The code for permutations() can be also expressed as a subsequence of product(), filtered to exclude entries with repeated elements (those from the same position in the input pool)
So you'd want a quick test for whether a value is sorted, something like this answer:
https://stackoverflow.com/a/3755410/2337736
And then list(t for t in itertools.product(*l) if is_sorted(t))
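A minimal sketch of that filter, with is_sorted written as a hypothetical helper using one common pairwise comparison (not necessarily the linked answer's version). It still walks the whole product, but it avoids the quadratic `not in n` list lookups and never stores duplicates.

```python
from itertools import product


def is_sorted(t):
    """Return True if t is in non-decreasing order (hypothetical helper)."""
    return all(x <= y for x, y in zip(t, t[1:]))


l = [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
sorted_only = [t for t in product(*l) if is_sorted(t)]
print(len(sorted_only))  # 10
print(sorted_only[:3])  # [(1, 1, 1), (1, 1, 2), (1, 1, 3)]
```

The ten results are the same multisets as the ten lines printed by the question's code, as tuples instead of lists.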
Beyond that, I think you'd have to get into recursion or a fixed length of l.
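One non-recursive alternative, not taken from the answers above and offered only as a sketch: group the positions whose option lists are identical, choose a sorted multiset of values for each group with combinations_with_replacement, and take the product over groups only. This never enumerates the full 3^18 product and respects the "order matters" requirement, because positions with different option lists are never merged. group_combinations is a hypothetical name; it assumes each sublist has no repeated values and treats [1,2,3] and [3,2,1] as different groups.

```python
from collections import defaultdict
from itertools import combinations_with_replacement, product


def group_combinations(pools):
    """Yield one tuple per distinct assignment, treating positions that share
    an identical option list as interchangeable (hypothetical helper)."""
    # Map each distinct option list to the positions that use it
    groups = defaultdict(list)
    for pos, pool in enumerate(pools):
        groups[tuple(pool)].append(pos)
    keys = list(groups)
    # For each group, pick a multiset of values for its positions
    per_group = [combinations_with_replacement(key, len(groups[key])) for key in keys]
    for choice in product(*per_group):
        out = [None] * len(pools)
        # Scatter each group's values back to that group's positions
        for key, values in zip(keys, choice):
            for pos, value in zip(groups[key], values):
                out[pos] = value
        yield tuple(out)


print(list(group_combinations([[1, 2, 3], [1], [1, 2, 3]])))
# [(1, 1, 1), (1, 1, 2), (1, 1, 3), (2, 1, 2), (2, 1, 3), (3, 1, 3)]
print(sum(1 for _ in group_combinations([[0, 1, 2]] * 18)))  # 190
```

The first call reproduces the six solutions listed in the question, and the 18-position case yields the 190 combinations directly.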

binary list contains all possible options by knowing its length in python

I am trying to get a binary list containing all possibilities, given the length of these lists. I found a solution, but it is not very handy to use in other functions.
Example: I want a list of lists, each one representing one binary option of four digits.
If the length is 4, the result should be the following:
[[0, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 1], [0, 1, 1, 0], [0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 1], [1, 1, 1, 0], [1, 1, 1, 1]]
What I have done is the following code:
>>> [[a, b, c, d] for a in [0,1] for b in [0,1] for c in [0,1] for d in [0,1]]
Now I am looking for a way to generate the big list from just the length of each member list, without typing [a, b, c, d] manually. Is it possible to generate the list with a function, let's say L_set(4), that returns the list above? Typing L_set(3) would then give the following:
[[0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
and typing L_set(2) gives:
[[0, 0], [0, 1], [1, 0], [1, 1]]
and so on.
After spending a few hours I feel stuck at this point; I hope some of you can help.
Thanks

Looks like a job for itertools.product:
>>> import itertools
>>> n = 4
>>> list(itertools.product((0,1), repeat=n))
[(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1), (0, 1, 0, 0), (0, 1, 0, 1), (0, 1, 1, 0), (0, 1, 1, 1), (1, 0, 0, 0), (1, 0, 0, 1), (1, 0, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 1, 0), (1, 1, 1, 1)]
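A minimal sketch wrapping that call in the L_set function the question asks for, converting each tuple to a list to match the requested output. The name L_set comes from the question; the body is an assumption built on itertools.product.

```python
from itertools import product


def L_set(n):
    """Return every binary list of length n, in counting order."""
    return [list(t) for t in product((0, 1), repeat=n)]


print(L_set(2))  # [[0, 0], [0, 1], [1, 0], [1, 1]]
print(len(L_set(4)))  # 16
```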

I think the itertools module in the standard library can help, in particular the product function.
http://docs.python.org/2/library/itertools.html#itertools.product
import itertools
for x in itertools.product([0, 1], repeat=3):
    print(x)
gives
(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(0, 1, 1)
(1, 0, 0)
(1, 0, 1)
(1, 1, 0)
(1, 1, 1)
The repeat parameter is the length of each combination in the output.

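If you know the length to be n, you can count from 0 to 2**n - 1 and split the zero-padded binary representation of each number into digits:
[[int(c) for c in format(i, '0{}b'.format(n))] for i in range(2**n)]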
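A string-free variant of the counting idea, offered as a sketch: extract each bit of i with shifts, most significant bit first. binary_words is a hypothetical name. Counting upward visits the words in the same order as itertools.product, which the assertion checks.

```python
from itertools import product


def binary_words(n):
    """Build each n-bit word from the bits of i, most significant bit first."""
    return [[(i >> (n - 1 - j)) & 1 for j in range(n)] for i in range(2 ** n)]


# Counting from 0 to 2**n - 1 visits the words in the same order as product()
assert binary_words(3) == [list(t) for t in product((0, 1), repeat=3)]
print(binary_words(2))  # [[0, 0], [0, 1], [1, 0], [1, 1]]
```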

>>> from itertools import product
>>> list(product(range(2), repeat=2))
[(0, 0), (0, 1), (1, 0), (1, 1)]

generating list for joint distribution

I'm pretty sure this is an easy problem, but I am completely blanking on how to solve it. I am working my way through the PGM class on Coursera, and it starts off with joint probability distributions. So I am trying to generate a list of all possible distributions given n variables, where each variable can take on some discrete value between 0...z.
So, for instance, say we have 3 variables and each can take on only the values 0 and 1; I want to generate this:
[[0, 0, 1]
[0, 1, 0]
[1, 0, 0]
[1, 1, 0]
[0, 1, 1]
[1, 1, 1]
[1, 0, 1]
[0, 0, 0]]
I am working in Python and am drawing a blank on how to generate this dynamically.

If you prefer list comprehension:
[[a, b, c] for a in range(2) for b in range(2) for c in range(2)]
And I forgot to mention that you can use pprint to get the effect you want:
>>> import pprint
>>> pprint.pprint([[a, b, c] for a in range(2) for b in range(2) for c in range(2)])
[[0, 0, 0],
[0, 0, 1],
[0, 1, 0],
[0, 1, 1],
[1, 0, 0],
[1, 0, 1],
[1, 1, 0],
[1, 1, 1]]
>>>

It sounds like you want the Cartesian product:
from itertools import product
for x in product([0,1], [0,1], [0,1]):
    print(x)
(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(0, 1, 1)
(1, 0, 0)
(1, 0, 1)
(1, 1, 0)
(1, 1, 1)
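Passing one iterable per variable also covers variables with different numbers of values, which is common in joint distribution tables. A minimal sketch, assuming the i-th variable takes values 0 .. cardinalities[i] - 1; assignments is a hypothetical helper name. For the question's uniform case (every variable in 0...z), product(range(z + 1), repeat=n) is equivalent.

```python
from itertools import product


def assignments(cardinalities):
    """Return an iterator over every joint assignment, where variable i
    takes values 0 .. cardinalities[i] - 1 (hypothetical helper)."""
    return product(*(range(c) for c in cardinalities))


# Two binary variables and one three-valued variable: 2 * 2 * 3 = 12 rows
rows = list(assignments([2, 2, 3]))
print(len(rows))  # 12
print(rows[:4])  # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
```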

Slight improvement over Nathan's method:
>>> import itertools
>>> list(itertools.product([0, 1], repeat=3))
[(0, 0, 0),
(0, 0, 1),
(0, 1, 0),
(0, 1, 1),
(1, 0, 0),
(1, 0, 1),
(1, 1, 0),
(1, 1, 1)]
