I need to generate every possible combination of characters from a given charset, up to a given length.
Like,
charset=list(map(str,"abcdefghijklmnopqrstuvwxyz"))
range=10
And the output should be:
[a,b,c,d..................,zzzzzzzzzy,zzzzzzzzzz]
I know I can do this using libraries that already exist, but I need to know how they really work. If anyone can give me commented code for this kind of algorithm, in Python or any other readable programming language, I would be very grateful.
Use itertools.product, combined with itertools.chain to put the various lengths together:
from itertools import chain, product
def bruteforce(charset, maxlength):
    return (''.join(candidate)
            for candidate in chain.from_iterable(product(charset, repeat=i)
                                                 for i in range(1, maxlength + 1)))
Demonstration:
>>> list(bruteforce('abcde', 2))
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc', 'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee']
This will efficiently produce progressively larger words with the input sets, up to length maxlength.
Do not attempt to build an in-memory list of every string of up to 10 characters drawn from 26 letters; instead, iterate over the results as they are produced:
import string
password = 'abc'  # hypothetical target to search for
for attempt in bruteforce(string.ascii_lowercase, 10):
    # match it against your password, or whatever
    if attempt == password:
        break
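The question asks how such generators work internally. Below is one common way to enumerate every string without itertools, written as a commented sketch: an "odometer" of indices in which the last position advances fastest and carries into the position to its left when it rolls over. `odometer_words` is a hypothetical name, and this is not a description of how CPython implements `itertools.product` (that is written in C); it simply produces the same shortest-first order.

```python
def odometer_words(charset, maxlength):
    """Yield every string over charset of length 1..maxlength, shortest first.

    Works like a car odometer: the last position spins fastest, and when it
    rolls over past the last character it resets and carries into the
    position to its left.
    """
    charset = list(charset)
    n = len(charset)
    if n == 0:
        return
    for length in range(1, maxlength + 1):
        digits = [0] * length              # index into charset for each position
        while True:
            yield ''.join(charset[d] for d in digits)
            # increment the rightmost position, carrying leftwards on rollover
            pos = length - 1
            while pos >= 0:
                digits[pos] += 1
                if digits[pos] < n:
                    break                  # no carry needed
                digits[pos] = 0            # roll over and carry to the left
                pos -= 1
            if pos < 0:                    # every position rolled over: length done
                break


print(list(odometer_words('ab', 2)))  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Only the current list of indices is kept in memory, so memory use stays proportional to maxlength no matter how many strings are produced.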
If you REALLY want to brute force it, try this, but it will take you a ridiculous amount of time:
your_list = 'abcdefghijklmnopqrstuvwxyz'
complete_list = []
for current in range(10):
    a = [i for i in your_list]
    for y in range(current):
        a = [x+i for i in your_list for x in a]
    complete_list = complete_list+a
On a smaller example, where your_list = 'ab' and we only go up to 5 (range(5)), complete_list contains the following:
['a', 'b', 'aa', 'ba', 'ab', 'bb', 'aaa', 'baa', 'aba', 'bba', 'aab', 'bab', 'abb', 'bbb', 'aaaa', 'baaa', 'abaa', 'bbaa', 'aaba', 'baba', 'abba', 'bbba', 'aaab', 'baab', 'abab', 'bbab', 'aabb', 'babb', 'abbb', 'bbbb', 'aaaaa', 'baaaa', 'abaaa', 'bbaaa', 'aabaa', 'babaa', 'abbaa', 'bbbaa', 'aaaba','baaba', 'ababa', 'bbaba', 'aabba', 'babba', 'abbba', 'bbbba', 'aaaab', 'baaab', 'abaab', 'bbaab', 'aabab', 'babab', 'abbab', 'bbbab', 'aaabb', 'baabb', 'ababb', 'bbabb', 'aabbb', 'babbb', 'abbbb', 'bbbbb']
I found another very easy way to create password dictionaries (wordlists) using itertools.
import itertools
generator = itertools.combinations_with_replacement('abcd', 4)
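This iterates through the combinations of 'a', 'b', 'c' and 'd' of length exactly 4, each with its letters in non-decreasing order: aaaa, aaab, ..., cddd, dddd. Be aware that it yields only these 35 sorted combinations, not all 4**4 = 256 strings and not the shorter ones, so words like 'baaa' or 'ab' never appear; for a full brute force use itertools.product('abcd', repeat=k) for each length k from 1 to 4. generator is an itertools object, and you can loop through it normally, like this: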
for password in generator:
    word = ''.join(password)  # join the tuple of characters into a string
Each password is in fact a tuple, and you can work with it as you normally would.
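A quick way to see what `combinations_with_replacement` actually yields, compared with the full set of strings that `product` gives, is to count both for the 'abcd', length-4 case. This is a small comparison sketch, not part of the original answer:

```python
from itertools import combinations_with_replacement, product

# combinations_with_replacement: length-4 tuples whose letters never decrease
cwr = [''.join(t) for t in combinations_with_replacement('abcd', 4)]
# product with repeat=4: every length-4 string over 'abcd'
prod = [''.join(t) for t in product('abcd', repeat=4)]

print(len(cwr), len(prod))            # 35 256
print('aaab' in cwr, 'baaa' in cwr)   # True False
print('baaa' in prod)                 # True
```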
If you really want a brute-force algorithm, don't keep any big list in your computer's memory, unless you want a slow algorithm that crashes with a MemoryError.
You could try to use itertools.product like this:
from string import ascii_lowercase
from itertools import product
charset = ascii_lowercase # abcdefghijklmnopqrstuvwxyz
maxrange = 10
def solve_password(password, maxrange):
    for i in range(maxrange+1):
        for attempt in product(charset, repeat=i):
            if ''.join(attempt) == password:
                return ''.join(attempt)
solved = solve_password('solve', maxrange) # This worked for me in 2.51 sec
itertools.product(*iterables) returns the cartesian products of the iterables you entered.
[i for i in product('bar', (42,))] returns e.g. [('b', 42), ('a', 42), ('r', 42)]
The repeat parameter lets you do exactly what you asked:
[i for i in product('abc', repeat=2)]
Returns
[('a', 'a'),
('a', 'b'),
('a', 'c'),
('b', 'a'),
('b', 'b'),
('b', 'c'),
('c', 'a'),
('c', 'b'),
('c', 'c')]
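The itertools documentation gives "roughly equivalent" pure-Python code for `product`. A sketch along the same lines (the name `product_sketch` is hypothetical; the real function is implemented in C and need not build the whole result list first) shows that `repeat` simply repeats the list of input pools:

```python
def product_sketch(*iterables, repeat=1):
    # materialize each input so it can be reused, then repeat the list of pools
    pools = [tuple(pool) for pool in iterables] * repeat
    result = [()]                      # one empty tuple: the product of zero pools
    for pool in pools:
        # extend every partial tuple by every element of the next pool
        result = [prefix + (item,) for prefix in result for item in pool]
    yield from result


print(list(product_sketch('bar', (42,))))  # [('b', 42), ('a', 42), ('r', 42)]
```

Calling `list(product_sketch('abc', repeat=2))` gives the same nine tuples as listed above.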
Note:
You wanted a brute-force algorithm, so I gave you one. However, it becomes very slow as the password gets longer, because the number of candidates grows exponentially with the length (it took 62 sec to find the word 'solved').
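To put numbers on "grows exponentially": with an alphabet of n characters there are n + n**2 + ... + n**L strings of length 1 to L, and a shortest-first search reaches a word only after trying every shorter string. A small arithmetic sketch, with hypothetical helper names `search_space_size` and `attempts_until`, assuming candidates are tried in the same order as `product` and ignoring the empty string that `solve_password` also tries:

```python
import string


def search_space_size(n_chars, max_len):
    # strings of length 1..max_len over n_chars symbols: n + n**2 + ... + n**max_len
    return sum(n_chars ** k for k in range(1, max_len + 1))


def attempts_until(word, charset):
    """1-based position of word when strings are tried shortest first,
    and in product() order within each length."""
    n = len(charset)
    shorter = search_space_size(n, len(word) - 1)   # all shorter strings come first
    rank = 0
    for ch in word:                                 # read the word as a base-n number
        rank = rank * n + charset.index(ch)
    return shorter + rank + 1


print(search_space_size(26, 10))                       # 146813779479510
print(attempts_until('solve', string.ascii_lowercase))  # 8954873
```

Under those assumptions 'solved' sits about 26 times further into the sequence than 'solve', roughly in line with the jump from 2.51 sec to 62 sec reported above.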
itertools is ideally suited for this:
import itertools
charset = 'abc'  # example charset
maxlen = 3       # example maximum length
words = itertools.chain.from_iterable((''.join(l)
                                       for l in itertools.product(charset, repeat=i))
                                      for i in range(1, maxlen + 1))
A solution using recursion:
def brute(string, length, charset):
    if len(string) == length:
        return
    for char in charset:
        temp = string + char
        print(temp)
        brute(temp, length, charset)
Usage:
brute("", 4, "rce")
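The recursive version visits strings depth-first: it prints a prefix and then immediately extends it, so its order differs from the shortest-first order of `product`. A generator variant (hypothetical name `brute_gen`) that yields instead of printing makes the order easy to inspect and lets the caller stop early:

```python
def brute_gen(prefix, length, charset):
    # stop extending once the prefix has reached the maximum length
    if len(prefix) == length:
        return
    for char in charset:
        candidate = prefix + char
        yield candidate                                   # emit this string...
        yield from brute_gen(candidate, length, charset)  # ...then all its extensions


print(list(brute_gen('', 2, 'ab')))  # ['a', 'aa', 'ab', 'b', 'ba', 'bb']
```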
import string, itertools
#password = input("Enter password: ")
password = "abc"
characters = string.printable
def iter_all_strings():
    length = 1
    while True:
        for s in itertools.product(characters, repeat=length):
            yield "".join(s)
        length += 1
for s in iter_all_strings():
    print(s)
    if s == password:
        print('Password is {}'.format(s))
        break
Simple solution using the itertools and string modules
# modules to easily set characters and iterate over them
import itertools, string
# character limit: every generated string has exactly this many characters
maxChar = int(input('Character limit for password: '))
# file to save output to, so you can look over the output without keeping it all in RAM
with open('insert filepath here', 'a+') as output_file:
    # product(..., repeat=maxChar) yields every string of maxChar characters, including
    # ones with repeated letters (permutations would skip strings such as 'aa')
    for candidate in itertools.product(string.ascii_lowercase, repeat=maxChar):
        # write one candidate per line as it is produced instead of building a list first
        output_file.write(''.join(candidate) + '\n')
I wrote the output to a file to save RAM, and used the input function so you can set the character limit, for example to the length of a password like "hiiworld". Below is the same script but with a broader character set using letters, numbers, symbols, and whitespace.
import itertools, string
maxChar = int(input('Character limit for password: '))
with open('insert filepath here', 'a+') as output_file:
    # string.printable = letters, digits, punctuation and whitespace
    for candidate in itertools.product(string.printable, repeat=maxChar):
        output_file.write(''.join(candidate) + '\n')
from random import choice
sl = 4  # start length
ml = 8  # max length
ls = '9876543210qwertyuiopasdfghjklzxcvbnm'  # list
g = 0
tries = 0
file = open("file.txt", 'w')  # your file
# note: choice() picks characters at random, so this writes random samples
# (possibly with duplicates), not every possible string
for j in range(0, len(ls)**4):
    while sl <= ml:
        i = 0
        while i < sl:
            file.write(choice(ls))
            i += 1
        sl += 1
        file.write('\n')
        g += 1
    sl -= g
    g = 0
    print(tries)
    tries += 1
file.close()
Try this:
import os
import sys
Zeichen=["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]  # Zeichen = characters
def start(): input("Enter to start")
def Gen(stellen):  # stellen = number of positions
    if stellen==1:
        for i in Zeichen:
            print(i)
    elif stellen==2:
        for i in Zeichen:
            for r in Zeichen:
                print(i+r)
    elif stellen==3:
        for i in Zeichen:
            for r in Zeichen:
                for t in Zeichen:
                    print(i+r+t)
    elif stellen==4:
        for i in Zeichen:
            for r in Zeichen:
                for t in Zeichen:
                    for u in Zeichen:
                        print(i+r+t+u)
    elif stellen==5:
        for i in Zeichen:
            for r in Zeichen:
                for t in Zeichen:
                    for u in Zeichen:
                        for o in Zeichen:
                            print(i+r+t+u+o)
    else:
        print("done")
#*********************
start()
Gen(1)
Gen(2)
Gen(3)
Gen(4)
Gen(5)
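Each `stellen` (German for "positions") branch in `Gen` adds one more nested loop over `Zeichen` ("characters"), which is what `itertools.product(Zeichen, repeat=stellen)` does for any depth. A sketch that replaces the hand-written branches (the name `gen_any` is hypothetical), returning a generator so the caller decides whether to print or compare:

```python
from itertools import product
import string

# Zeichen = the 26 lowercase letters, as in the list above
Zeichen = list(string.ascii_lowercase)


def gen_any(stellen):
    # one "loop" per position: product nests them for us, for any stellen
    return (''.join(combo) for combo in product(Zeichen, repeat=stellen))


for word in gen_any(2):
    print(word)
```

With this, `for word in gen_any(3): print(word)` prints the same 17576 lines as `Gen(3)`, and lengths beyond 5 need no extra branches.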
Related
Is there a simple way to count using letters in Python? That is, 'A' would stand for 1, 'B' for 2 and so on, and after 'Z' would come 'AA', 'AB' and so on. So the code below would generate:
def get_next_letter(last_letter):
    return last_letter += 1  # pseudo
>>> get_next_letter('a')
'b'
>>> get_next_letter('b')
'c'
>>> get_next_letter('c')
'd'
...
>>> get_next_letter('z')
'aa'
>>> get_next_letter('aa')
'ab'
>>> get_next_letter('ab')
'ac'
...
>>> get_next_letter('az')
'ba'
>>> get_next_letter('ba')
'bb'
...
>>> get_next_letter('zz')
'aaa'
Based on @Charlie Clark's implementation of the openpyxl utility get_column_letter, we can have:
def get_number_letter(n):
    letters = []
    while n > 0:
        n, remainder = divmod(n, 26)
        # check for exact division and borrow if needed
        if remainder == 0:
            remainder = 26
            n -= 1
        letters.append(chr(remainder+64))
    return ''.join(reversed(letters))
This gives the letter representation of a number. Now, to increment, we need the reverse. Based on that logic (and the general number base logic), I wrote:
def number_from_string(letters):
    n = 0
    for i, c in enumerate(reversed(letters)):
        n += (ord(c)-64)*26**i
    return n
And now we can combine them to:
def get_next_letter(letters):
    return get_number_letter(number_from_string(letters)+1)
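The `remainder == 0` borrow is needed because this numbering is "bijective base 26": the digits run from 1 to 26 (A to Z) and there is no zero digit. Subtracting 1 before each `divmod` does the same job as the borrow. A sketch of both conversions generalized to any alphabet, e.g. lowercase letters (the names `index_to_word`, `word_to_index` and `increment_word` are hypothetical):

```python
import string


def index_to_word(n, alphabet=string.ascii_uppercase):
    # bijective base-len(alphabet): digits are 1..base, there is no zero digit
    base = len(alphabet)
    letters = []
    while n > 0:
        n, remainder = divmod(n - 1, base)   # shift to 0..base-1 before dividing
        letters.append(alphabet[remainder])
    return ''.join(reversed(letters))


def word_to_index(word, alphabet=string.ascii_uppercase):
    base = len(alphabet)
    n = 0
    for ch in word:
        n = n * base + alphabet.index(ch) + 1  # digit values are 1..base
    return n


def increment_word(word, alphabet=string.ascii_uppercase):
    return index_to_word(word_to_index(word, alphabet) + 1, alphabet)


print(increment_word('Z'))                            # AA
print(increment_word('zz', string.ascii_lowercase))   # aaa
```

Unlike the Excel-based route, this sketch has no upper limit on the length of the result.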
Original answer:
This kind of "counting" is very similar to how Excel indexes its columns. Therefore it is possible to take advantage of the openpyxl package, which has two utility functions: get_column_letter and column_index_from_string:
from openpyxl.utils import get_column_letter, column_index_from_string
def get_next_letter(letters):
    return get_column_letter(column_index_from_string(letters)+1)
NOTE: as this is based on Excel, it can only count up to 'ZZZ', i.e. calling the function with 'ZZZ' will raise an exception.
Output example for both implementations:
>>> get_next_letter('A')
'B'
>>> get_next_letter('Z')
'AA'
>>> get_next_letter('BD')
'BE'
Let's start with the simple special case of getting just the single-character strings.
from string import ascii_lowercase
def population():
    yield from ascii_lowercase
Then
>>> x = population()
>>> list(x)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
>>> x = population()
>>> next(x)
'a'
>>> next(x)
'b'
So we'd like to add the two-character sequences next:
from string import ascii_lowercase
from itertools import product
def population():
    yield from ascii_lowercase
    yield from map(''.join, product(ascii_lowercase, repeat=2))
Note that the single-character strings are just a special case of the product with repeat=1, so we could have written
from string import ascii_lowercase
from itertools import product
def population():
    yield from map(''.join, product(ascii_lowercase, repeat=1))
    yield from map(''.join, product(ascii_lowercase, repeat=2))
We can write this with a loop:
def population():
    for k in range(1, 3):
        yield from map(''.join, product(ascii_lowercase, repeat=k))
but we don't necessarily want an artificial upper limit on what strings we can produce; we want, in theory, to produce all of them. For that, we replace range with itertools.count.
from string import ascii_lowercase
from itertools import product, count
def population():
    for k in count(1):
        yield from map(''.join, product(ascii_lowercase, repeat=k))
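Because `population()` built on `count(1)` never ends, calling `list()` on it would never return. One way to take a finite piece is `itertools.islice`; the sketch below repeats the definition so it runs on its own:

```python
from itertools import count, islice, product
from string import ascii_lowercase


def population():
    for k in count(1):
        yield from map(''.join, product(ascii_lowercase, repeat=k))


# take only the first 30 strings: 'a'..'z' followed by 'aa', 'ab', 'ac', 'ad'
first_30 = list(islice(population(), 30))
print(first_30[-4:])   # ['aa', 'ab', 'ac', 'ad']
```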
All the proposed solutions are way too complicated; I came up with the following, using a recursive call. This is it!
def getNextLetter(previous_letter):
    """
    'increments' the provided string to the next letter recursively
    raises TypeError if previous_letter is not a string
    returns "a" if provided previous_letter was empty string
    """
    if not isinstance(previous_letter, str):
        raise TypeError("the previous letter should be a letter, doh")
    if previous_letter == '':
        return "a"
    # only the last letter changes directly; a trailing 'z' becomes 'a' and carries left
    if previous_letter[-1] == "z":
        return getNextLetter(previous_letter[:-1])+"a"
    else:
        return previous_letter[:-1]+chr(ord(previous_letter[-1])+1)
# EOF
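The same "increment with carry" idea can also be written without recursion, by walking left from the last letter and turning trailing 'z's into 'a's. A sketch (hypothetical name `next_letter_iter`, assuming only the lowercase letters a to z):

```python
def next_letter_iter(word):
    # the empty string counts as "just before 'a'", as with getNextLetter above
    chars = list(word)
    i = len(chars) - 1
    # every trailing 'z' rolls over to 'a' and carries one position left
    while i >= 0 and chars[i] == 'z':
        chars[i] = 'a'
        i -= 1
    if i < 0:
        # carried past the first letter (e.g. 'zz'): prepend a new 'a'
        return 'a' + ''.join(chars)
    chars[i] = chr(ord(chars[i]) + 1)   # bump the first non-'z' letter from the right
    return ''.join(chars)


print(next_letter_iter('az'))   # ba
print(next_letter_iter('zz'))   # aaa
```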
I have a list of tuples:
my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
I need to split it into sublists of tuples, each starting with a tuple whose first item is 1:
[(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i')]
[(1,'j','k','l'), (2,'m','n','o')]
[(1,'p','q','r'), (2,'s','t','u')]
You're effectively computing some kind of a "groupwhile" function -- you want to split at every tuple you find starting in a 1. This looks an awful lot like itertools.groupby, and if we keep a tiny bit of global state (the one_count variable in our example) we can re-use the grouping/aggregation logic already built-in to the language to get your desired result.
import itertools
# The inner function is just so that one_count will be initialized only
# as many times as we want to call this rather than exactly once via
# some kind of global variable.
def gen_count():
    def _cnt(t, one_count=[0]):
        if t[0] == 1:
            one_count[0] += 1
        return one_count[0]
    return _cnt
result = [list(g[1]) for g in itertools.groupby(my_list, key=gen_count())]
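The stateful key function can also be replaced by a running count computed up front with `itertools.accumulate`: each tuple whose first item is 1 bumps the counter, and `groupby` then splits wherever the counter changes, with no hidden state to reset. A sketch (the name `split_on_ones` is hypothetical):

```python
from itertools import accumulate, groupby
from operator import itemgetter


def split_on_ones(items):
    # running count of tuples whose first item is 1: 1,1,1,2,2,3,3 for the example
    keys = accumulate(1 if t[0] == 1 else 0 for t in items)
    # consecutive items sharing the same count belong to the same sublist
    return [[t for _, t in group]
            for _, group in groupby(zip(keys, items), key=itemgetter(0))]


my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'),
           (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
for sub in split_on_ones(my_list):
    print(sub)
```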
A more traditional solution would be to iterate through your example and append intermediate outputs to a result set.
result = []
for i, *x in my_list:
    if i==1:
        result.append([(i, *x)])
    else:
        result[-1].append((i, *x))
Try this code. I assume a new group starts whenever the first element of the first tuple (1) is found again. I also assume the output should be a list.
my_list = [(1,'a','b','c'), (2,'d','e','f'), (3,'g','h','i'), (1,'j','k','l'), (2,'m','n','o'), (1,'p','q','r'), (2,'s','t','u')]
ch = my_list[0][0]  # first element of the first tuple marks the start of a group
groups = []
st = 0  # start index of the current group
for i, t in enumerate(my_list):
    if t[0] == ch:
        if i != 0:
            groups.append(my_list[st:i])
        st = i
groups.append(my_list[st:])  # the final group runs to the end of the list
print(groups)
Output
[
[(1, 'a', 'b', 'c'), (2, 'd', 'e', 'f'), (3, 'g', 'h', 'i')],
[(1, 'j', 'k', 'l'), (2, 'm', 'n', 'o')],
[(1, 'p', 'q', 'r'), (2, 's', 't', 'u')]
]
I want to list all possible words with n letters where the first letter can be a1 or a2, the second can be b1, b2 or b3, the third can be c1 or c2, ... Here's a simple example input-output for n=2 with each letter having 2 alternatives:
input = [["a","b"],["c","d"]]
output = ["ac", "ad", "bc", "bd"]
I tried doing this recursively by first creating all possible words with the first 2 letters, something like this:
def go(l):
    if len(l) > 2:
        head = go(l[0:2])
        tail = l[2:]
        tail.insert(0, head)
        return go(tail)
    elif len(l) == 2:
        res = []
        for i in l[0]:
            for j in l[1]:
                res.append(i+j)
        return res
    elif len(l) == 1:
        return l[0]
    else:
        return None
However, this becomes incredibly slow for large n or many alternatives per letter. What would be a more efficient way to solve this?
Thanks
I think you just want itertools.product here:
>>> from itertools import product
>>> lst = ['ab', 'c', 'de']
>>> words = product(*lst)
>>> list(words)
[('a', 'c', 'd'), ('a', 'c', 'e'), ('b', 'c', 'd'), ('b', 'c', 'e')]
Or, if you wanted them joined into words:
>>> [''.join(word) for word in product(*lst)]
['acd', 'ace', 'bcd', 'bce']
Or, with your example:
>>> lst = [["a","b"],["c","d"]]
>>> [''.join(word) for word in product(*lst)]
['ac', 'ad', 'bc', 'bd']
Of course for very large n or very large sets of letters (size m), this will be slow. If you want to generate an exponentially large set of outputs (O(m**n)), that will take exponential time. But at least it has constant rather than exponential space (it generates one product at a time, instead of a giant list of all of them), and will be faster than what you were on your way to by a decent constant factor, and it's a whole lot simpler and harder to get wrong.
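Because `product` is lazy, you can count the words in advance (the count is the product of the number of alternatives at each position) and pull just the first few without building the rest. A sketch using `math.prod`, which assumes Python 3.8 or later:

```python
import math
from itertools import islice, product

lst = [["a", "b"], ["c", "d"], ["e", "f", "g"]]

# number of words = product of the alternatives per position: 2 * 2 * 3
total = math.prod(len(options) for options in lst)
print(total)         # 12

# only the first three words are ever built
first_three = [''.join(w) for w in islice(product(*lst), 3)]
print(first_three)   # ['ace', 'acf', 'acg']
```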
You can use the permutations from the built-in itertools module to achieve this, like so
>>> from itertools import permutations
>>> [''.join(word) for word in permutations('abc', 2)]
['ab', 'ac', 'ba', 'bc', 'ca', 'cb']
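Note that `permutations` never reuses an element, so strings with a repeated letter such as 'aa' are missing, and it draws every position from a single iterable rather than from a separate set of alternatives per position. The difference from `product` is easy to check:

```python
from itertools import permutations, product

perm = [''.join(w) for w in permutations('abc', 2)]
prod = [''.join(w) for w in product('abc', repeat=2)]

print(len(perm), len(prod))            # 6 9
print(sorted(set(prod) - set(perm)))   # ['aa', 'bb', 'cc']
```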
Generating all strings of some length with a given alphabet:
test.py:
def generate_random_list(alphabet, length):
    if length == 0: return []
    c = [[a] for a in alphabet[:]]
    if length == 1: return c
    c = [[x,y] for x in alphabet for y in alphabet]
    if length == 2: return c
    for l in range(2, length):
        c = [[x]+y for x in alphabet for y in c]
    return c
if __name__ == "__main__":
    for p in generate_random_list(['h','i'],2):
        print(p)
$ python2 test.py
['h', 'h']
['h', 'i']
['i', 'h']
['i', 'i']
Next way:
def generate_random_list(alphabet, length):
    c = []
    for i in range(length):
        c = [[x]+y for x in alphabet for y in c or [[]]]
    return c
if __name__ == "__main__":
    for p in generate_random_list(['h','i'],2):
        print(p)
Next way:
import itertools
if __name__ == "__main__":
    chars = "hi"
    count = 2
    for item in itertools.product(chars, repeat=count):
        print("".join(item))
import itertools
print([''.join(x) for x in itertools.product('hi',repeat=2)])
Next way:
from itertools import product
#from string import ascii_letters, digits
#for i in product(ascii_letters + digits, repeat=2):
for i in product("hi",repeat=2):
    print(''.join(i))
I am trying to build a function that returns or yields a list of k-mers from a list of letters (DNA bases). k represents the length (or order) of the k-mers.
I have made this function, which prints the desired result to the screen. The problem is that I cannot get the function to return those values.
def function(k,y=''):
    letters=['A','C','T','G']
    if k==0:
        print(y)
    else:
        for m in letters:
            kmer=m+y
            function(k-1,kmer)
I have thought about returning a list or yielding the k-mers, but neither option works. When I change the print to yield or return, the function returns None.
It may be a conceptual error; I am just starting to understand recursive functions, as I come from a biological background.
Thanks in advance.
The trick is to use yield from in your recursive call (needs Python 3.3+):
def function(k, y=''):
    if k==0:
        yield y
    else:
        for m in ['A','C','T','G']:
            yield from function(k-1, m+y)
Testing:
>>> [x for x in function(2)]
['AA', 'CA', 'TA', 'GA', 'AC', 'CC', 'TC', 'GC', 'AT', 'CT', 'TT', 'GT', 'AG', 'CG', 'TG', 'GG']
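Because `m + y` puts the new base in front, the output above varies the first character fastest. If you prefer the usual order (last character varying fastest), one option is to build the k-mers with `itertools.product`. A sketch, where the name `kmers` and the alphabetical default 'ACGT' are assumptions (pass 'ACTG' to use the original letter order):

```python
from itertools import product


def kmers(k, bases='ACGT'):
    # each k-mer is one choice of base per position; product yields them lazily
    for combo in product(bases, repeat=k):
        yield ''.join(combo)


print(list(kmers(2))[:5])   # ['AA', 'AC', 'AG', 'AT', 'CA']
```

It produces the same set of k-mers as the recursive version, only in a different order.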
People have already shown how to use yield; I will show how to return all the k-mers together (which might be undesirable if k is large, since there will be too many of them):
def giveKmers(k):
    def function(k, y=''):
        letters = ['A', 'C', 'T', 'G']
        if k:
            for m in letters:
                function(k - 1, m + y)
        else:
            arr.append(y)
    arr = []
    function(k)
    return arr
print(giveKmers(2))
In Perl, to get a list of all strings from "a" to "azc", the only thing to do is use the range operator:
perl -le 'print "a".."azc"'
What I want is a list of strings:
["a", "b", ..., "z", "aa", ..., "az" ,"ba", ..., "azc"]
I suppose I could use ord and chr, looping over and over; this is simple for "a" to "z", e.g.:
>>> [chr(c) for c in range(ord("a"), ord("z") + 1)]
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
But it is a bit more complex in my case.
Thanks for any help!
Generator version:
from string import ascii_lowercase
from itertools import product
def letterrange(last):
    for k in range(len(last)):
        for x in product(ascii_lowercase, repeat=k+1):
            result = ''.join(x)
            yield result
            if result == last:
                return
EDIT: @ihightower asks in the comments:
I have no idea what I should do if I want to print from 'b' to 'azc'.
So you want to start with something other than 'a'. Just discard anything before the start value:
def letterrange(first, last):
    for k in range(len(last)):
        for x in product(ascii_lowercase, repeat=k+1):
            result = ''.join(x)
            if first:
                if first != result:
                    continue
                else:
                    first = None
            yield result
            if result == last:
                return
A suggestion purely based on iterators:
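import string
import itertools
def string_range(letters=string.ascii_lowercase, start="a", end="z"):
    # all strings over letters, shortest first (map replaces Python 2's itertools.imap)
    all_strings = (x for i in itertools.count(1)
                   for x in map("".join, itertools.product(letters, repeat=i)))
    # skip to start, stop before end, then add end so the range is inclusive
    body = itertools.takewhile(end.__ne__, itertools.dropwhile(start.__ne__, all_strings))
    return itertools.chain(body, [end])
print(list(string_range(end="azc")))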
Use the product call in itertools, and ascii_letters from string.
from string import ascii_letters
from itertools import product
if __name__ == '__main__':
    values = []
    for i in range(1, 4):
        values += [''.join(x) for x in product(ascii_letters[:26], repeat=i)]
    print(values)
Here's a better way to do it, though you need a conversion function:
for i in range(int('a', 36), int('azd', 36)):
    if base36encode(i).isalpha():
        print(base36encode(i, lower=True))
And here's your function (thank you Wikipedia):
def base36encode(number, alphabet='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', lower=False):
    '''
    Convert positive integer to a base36 string.
    '''
    if lower:
        alphabet = alphabet.lower()
    if not isinstance(number, int):
        raise TypeError('number must be an integer')
    if number < 0:
        raise ValueError('number must be positive')
    # Special case for small numbers
    if number < 36:
        return alphabet[number]
    base36 = ''
    while number != 0:
        number, i = divmod(number, 36)
        base36 = alphabet[i] + base36
    return base36
I tacked on the lowercase conversion option, just in case you wanted that.
I generalized the accepted answer to be able to start middle and to use other than lowercase:
from string import ascii_lowercase, ascii_uppercase
from itertools import product
def letter_range(first, last, letters=ascii_lowercase):
    # start at strings of length len(first) (repeat=k+1), not len(first)+1
    for k in range(len(first) - 1, len(last)):
        for x in product(letters, repeat=k+1):
            result = ''.join(x)
            if len(x) != len(first) or result >= first:
                yield result
            if result == last:
                return
print(list(letter_range('a', 'zzz')))
print(list(letter_range('BA', 'DZA', ascii_uppercase)))
from string import ascii_lowercase
from itertools import product
def strrange(end):
    values = []
    for i in range(1, len(end) + 1):
        values += [''.join(x) for x in product(ascii_lowercase, repeat=i)]
    return values[:values.index(end) + 1]