Strings of two letters and fixed length - python

I'm wondering how to generate a list of all possible two-letter strings of length 10 in Python. For example, the list would go:
aaaaaaaaaa
aaaaaaaaab
aaaaaaaaba
aaaaaaaabb
...
...
...
bbbbbbbbab
bbbbbbbbba
bbbbbbbbbb
Also, I'm aware of how naive my question might be; I'm still in the learning process.

from itertools import product
prod = [''.join(p) for p in product('ab', repeat=10)]
or if you just want to print it like in your example:
from itertools import product
for p in product('ab', repeat=10):
    print(''.join(p))
See the documentation for itertools.product

You can count from 0 to 2**10-1, convert those numbers using bin and replace 0/1 with a/b. Just pad the left side with 0's to the right length.
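A minimal sketch of that counting idea, assuming the letters are 'a' and 'b' as in the question. The function name binary_strings and its parameters are made up for illustration:

```python
def binary_strings(length, zero="a", one="b"):
    """Yield every string of `length` characters built from two letters."""
    # Translation table that maps the digit '0' to `zero` and '1' to `one`.
    table = str.maketrans("01", zero + one)
    for number in range(2 ** length):
        # bin() returns e.g. '0b101'; drop the '0b' prefix and left-pad with zeros.
        bits = bin(number)[2:].zfill(length)
        yield bits.translate(table)

strings = list(binary_strings(10))
print(len(strings))   # 1024
print(strings[:4])    # ['aaaaaaaaaa', 'aaaaaaaaab', 'aaaaaaaaba', 'aaaaaaaabb']
```

Because the numbers are visited in increasing order, the strings come out in the same order as in the question.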

def s(d):
    if d:
        # Prefix every shorter string with each letter in turn.
        for c in 'ab':
            for rest in s(d-1):
                yield c + rest
    else:
        yield ''
print(list(s(10)))
or
def x(d):
    return ([ 'a' + q for q in x(d-1) ] +
            [ 'b' + q for q in x(d-1) ]) if d else [ '' ]
print(x(10))

Related

n length combinations from two or more digits with repetition limit

I have the letters a and b, and from them I want to build all strings of length n in which a and b each occur a limited number of times.
For example, if n = 7, a = 4 and b = 3, the desired outcomes starting with 'b' would be:
bbbaaaa
bbabaaa
bbaabaa
bbaaaba
bbaaaab
babbaaa
bababaa
babaaba
babaaab
baabbaa
baababa
baabaab
baaabba
baaabab
baaaabb
I've looked into a lot of python & c functions, but none do exactly what I'm asking, and I don't know how to alter/use them into doing so.
What I initially thought of was storing all possible combinations and then picking them where a=(length) of them. However, that easily runs into memory issues...
Thanks
Use itertools.permutations() on the string 'aaaabbb'. It's not "efficient" and you'd need to remove duplicates.
from itertools import permutations
for l in set(permutations('a'*4 + 'b'*3, 7)):
    print(*l, sep='')
babbaaa
abaabba
bbaaaab
aaabbba
ababaab
abaaabb
baaaabb
babaaba
aababba
baaabba
aabbaab
abbbaaa
abbaaba
baababa
bababaa
aabaabb
aaabbab
abaabab
bbabaaa
baaabab
aaaabbb
aabbbaa
bbbaaaa
baabbaa
babaaab
aababab
abbabaa
bbaaaba
abababa
baabaab
aaababb
abbaaab
bbaabaa
ababbaa
aabbaba
Generalised into a function:
from itertools import permutations
def f(**kwargs):
    population = ''.join(s*n for s,n in kwargs.items())
    return (''.join(l) for l in set(permutations(population, len(population))))
>>> f(a=3, b=4)
<generator object f.<locals>.<genexpr> at 0x7fc1ec51fd60>
>>> list(f(a=3, b=4))
['aabbabb', 'bbaaabb', 'bbbbaaa', 'aaabbbb', 'bbbaaab', 'abaabbb', 'bbbaaba', 'baabbab', 'babbaab', 'bbabbaa', 'babaabb', 'babbaba', 'baaabbb', 'aabbbab', 'aabbbba', 'baabbba', 'bbaabab', 'baababb', 'bbabaab', 'aababbb', 'abbbbaa', 'bbaabba', 'bbababa', 'abbabab', 'abababb', 'bababab', 'abbabba', 'bababba', 'abbbaab', 'abbbaba', 'abbaabb', 'babbbaa', 'bbbabaa', 'ababbab', 'ababbba']
>>> print(*(f(a=3, b=4)))
aabbabb bbaaabb bbbbaaa aaabbbb bbbaaab abaabbb bbbaaba baabbab babbaab bbabbaa babaabb babbaba baaabbb aabbbab aabbbba baabbba bbaabab baababb bbabaab aababbb abbbbaa bbaabba bbababa abbabab abababb bababab abbabba bababba abbbaab abbbaba abbaabb babbbaa bbbabaa ababbab ababbba
>>> list(f(a=1,b=1,c=1))
['cab', 'bac', 'abc', 'acb', 'bca', 'cba']
You are looking for permutations, not combinations. Then you cast it as a set to get rid of identical permutations.
import itertools as it
def find_combos(n,a,b):
    lst = ["a"]*a + ["b"]*b
    return set(it.permutations(lst))
for p in find_combos(7,4,3):
    print(p)
I believe the most efficient method is to use the combinations of values from a range() as the positions at which to insert the new characters. With a recursive function, this can also be extended to an alphabet of any size.
from itertools import combinations
letters = 'abcdefghijklmnop'
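def combos(*sizes,level=[]):
    a = sum(sizes[len(level):])
    b = sizes[len(level)]
    if a!=b:
        # Choose positions for the current letter, then recurse for the next one.
        for i in combinations(range(a),b):
            for r in combos(*sizes, level=level + [i]):
                yield r
    else:
        # Last letter fills the string; insert the earlier letters at their positions.
        r = [letters[len(sizes)-1]]*sizes[-1]
        for l,c in reversed(list(zip(level,letters))):
            for i in l:
                r.insert(i,c)
        yield ''.join(r)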
print(list(combos(3,4)))
print(list(combos(2,2)))
print(list(combos(2,1,2)))
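For the two-letter case in the question, the same position-based idea can be sketched without recursion. This is only an illustration; the name two_letter_arrangements is hypothetical. It picks the positions of the 'b's with itertools.combinations, so each string is produced exactly once and no duplicates have to be filtered out:

```python
from itertools import combinations

def two_letter_arrangements(count_a, count_b):
    """Yield each string with count_a 'a's and count_b 'b's exactly once."""
    total = count_a + count_b
    # Every choice of count_b positions out of `total` gives one distinct string.
    for b_positions in combinations(range(total), count_b):
        chars = ["a"] * total
        for index in b_positions:
            chars[index] = "b"
        yield "".join(chars)

results = list(two_letter_arrangements(4, 3))
print(len(results))  # 35, i.e. C(7, 3)
print(results[0])    # bbbaaaa
```

By comparison, permutations('aaaabbb') walks through 7! = 5040 tuples to arrive at the same 35 strings.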

Remove adjacent duplicates given a condition

I'm trying to write a function that will take a string, and given an integer, will remove all the adjacent duplicates larger than the integer and output the remaining string. I have this function right now that removes all the duplicates in a string, and I'm not sure how to put the integer constraint into it:
def remove_duplicates(string):
    s = set()
    list = []
    for i in string:
        if i not in s:
            s.add(i)
            list.append(i)
    return ''.join(list)
string = "abbbccaaadddd"
print(remove_duplicates(string))
This outputs
abc
What I would want is a function like
def remove_duplicates(string, int):
    .....
where, if I input int=2 for the same string, each run of adjacent duplicates is cut down to at most n characters instead of all duplicates being removed. The output should be
abbccaadd
I'm also concerned about run time and complexity for very large strings, so if my initial approach is bad, please suggest a different approach. Any help is appreciated!
Not sure I understand your question correctly. I think that, given m repetitions of a character, you want to remove up to k*n duplicates such that k*n < m.
You could try this, using groupby:
>>> from itertools import groupby
>>> string = "abbbccaaadddd"
>>> n = 2
>>> ''.join(c for k, g in groupby(string) for c in k * (len(list(g)) % n or n))
'abccadd'
Here, k * (len(list(g)) % n or n) means len(g) % n repetitions, or n if that number is 0.
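To make the `% n or n` idiom concrete, here is a small sketch (the variable names are illustrative) that tabulates how many characters are kept for runs of length 1 to 6 when n = 2:

```python
n = 2
# `m % n or n` evaluates to the remainder, unless the remainder is 0 (falsy),
# in which case `or` falls through to n.
kept = {m: m % n or n for m in range(1, 7)}
print(kept)  # {1: 1, 2: 2, 3: 1, 4: 2, 5: 1, 6: 2}
```

For "abbbccaaadddd" the run lengths are 1, 3, 2, 3, 4, which gives 1, 1, 2, 1, 2 kept characters and hence 'abccadd'.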
Oh, you changed it... now my original answer with my "interpretation" of your output actually works. You can use groupby together with islice to get at most n characters from each group of duplicates.
>>> from itertools import groupby, islice
>>> string = "abbbccaaadddd"
>>> n = 2
>>> ''.join(c for _, g in groupby(string) for c in islice(g, n))
'abbccaadd'
Create group of letters, but compute the length of the groups, maxed out by your parameter.
Then rebuild the groups and join:
import itertools
def remove_duplicates(string,maxnb):
    groups = ((k,min(len(list(v)),maxnb)) for k,v in itertools.groupby(string))
    return "".join(itertools.chain.from_iterable(v*k for k,v in groups))
string = "abbbccaaadddd"
print(remove_duplicates(string,2))
this prints:
abbccaadd
can be a one-liner as well (cover your eyes!)
return "".join(itertools.chain.from_iterable(v*k for k,v in ((k,min(len(list(v)),maxnb)) for k,v in itertools.groupby(string))))
not sure about the min(len(list(v)),maxnb) repeat value which can be adapted to suit your needs with a modulo (like len(list(v)) % maxnb), etc...
You should avoid using int as a variable name, as it shadows the built-in type int.
Here is a vanilla function that does the job:
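def deduplicate(string: str, threshold: int) -> str:
    res = ""
    last = ""
    count = 0
    for c in string:
        if c != last:
            # New run: keep the character and count it as the first copy.
            count = 1
            res += c
            last = c
        elif count < threshold:
            # Same run: keep the character only while under the limit.
            res += c
            count += 1
    return res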

Find values in list which differ from reference list by up to N characters

I have a list like the following:
Test = ['ASDFGH', 'QWERTYU', 'ZXCVB']
And a reference list like this:
Ref = ['ASDFGY', 'QWERTYI', 'ZXCAA']
I want to extract the values from Test if they are N or less characters different from any one of the items in Ref.
For example, if N = 1, only the first two elements of Test should be output. If N = 2, all three elements fit this criteria and should be returned.
It should be noted that I am only looking for values of the same character length (ASDFGY -> ASDFG matching doesn't work for N = 1), so I want something more efficient than Levenshtein distance.
I have over 1000 values in ref and a couple hundred million in Test so efficiency is key.
Using a generator expression with sum:
Test = ['ASDFGH', 'QWERTYU', 'ZXCVB']
Ref = ['ASDFGY', 'QWERTYI', 'ZXCAA']
def comparer(x, y, n):
    # Equal length, and at most n positions where the characters differ.
    return (len(x) == len(y)) and (sum(i != j for i, j in zip(x, y)) <= n)
res = [a for a, b in zip(Ref, Test) if comparer(a, b, 1)]
print(res)
['ASDFGY', 'QWERTYI']
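The question asks for values from Test that are close to any item in Ref, not only to the item at the same index. A minimal sketch of that variant follows; the helper names hamming_within and close_to_any are made up for illustration, and this is a plain nested scan rather than anything tuned for hundreds of millions of values:

```python
def hamming_within(x, y, n):
    """True if x and y have equal length and differ in at most n positions."""
    return len(x) == len(y) and sum(a != b for a, b in zip(x, y)) <= n

def close_to_any(test_values, ref_values, n):
    """Return the test values within n substitutions of at least one reference."""
    return [t for t in test_values
            if any(hamming_within(t, r, n) for r in ref_values)]

Test = ['ASDFGH', 'QWERTYU', 'ZXCVB']
Ref = ['ASDFGY', 'QWERTYI', 'ZXCAA']
print(close_to_any(Test, Ref, 1))  # ['ASDFGH', 'QWERTYU']
print(close_to_any(Test, Ref, 2))  # ['ASDFGH', 'QWERTYU', 'ZXCVB']
```

any() stops at the first reference that matches, so values with an early match do not scan the whole reference list.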
Using difflib
Demo:
import difflib
N = 1
Test = ['ASDFGH', 'QWERTYU', 'ZXCVB']
Ref = ['ASDFGY', 'QWERTYI', 'ZXCAA']
result = []
for i,v in zip(Test, Ref):
    c = 0
    for j,s in enumerate(difflib.ndiff(i, v)):
        if s.startswith("-"):
            c += 1
    if c <= N:
        result.append( i )
print(result)
Output:
['ASDFGH', 'QWERTYU']
The newer regex module offers a "fuzzy" match possibility:
import regex as re
Test = ['ASDFGH', 'QWERTYU', 'ZXCVB']
Ref = ['ASDFGY', 'QWERTYI', 'ZXCAA', 'ASDFGI', 'ASDFGX']
for item in Test:
    rx = re.compile('(' + item + '){s<=3}')
    for r in Ref:
        if rx.search(r):
            print(rf'{item} is similar to {r}')
This yields
ASDFGH is similar to ASDFGY
ASDFGH is similar to ASDFGI
ASDFGH is similar to ASDFGX
QWERTYU is similar to QWERTYI
ZXCVB is similar to ZXCAA
You can control it via the {s<=3} part, which allows three or fewer substitutions.
To have pairs, you could write
pairs = [(origin, difference)
         for origin in Test
         for rx in [re.compile(rf"({origin}){{s<=3}}")]
         for difference in Ref
         if rx.search(difference)]
Which would yield for
Test = ['ASDFGH', 'QWERTYU', 'ZXCVB']
Ref = ['ASDFGY', 'QWERTYI', 'ZXCAA', 'ASDFGI', 'ASDFGX']
the following output:
[('ASDFGH', 'ASDFGY'), ('ASDFGH', 'ASDFGI'),
('ASDFGH', 'ASDFGX'), ('QWERTYU', 'QWERTYI'),
('ZXCVB', 'ZXCAA')]

find all possible rotation of a given string using python

Given the string "abc", it should print out "abc", "bca", "cab".
My approach: find length of the given string and rotate them till length
def possible_rotation():
    a = "abc"
    b = len(a)
    for i in range (b-1):
        c = a[:i] + a[i:]
        print(c)
The above code simply prints abc, abc. Any idea what I am missing here?
def possible_rotation():
    a = "abc"
    b = len(a)
    for i in range (b):
        c = a[i:]+a[:i]
        print(c)
possible_rotation()
Output:
abc
bca
cab
You have two issues: the range and the rotation logic. The rotation should be a[i:]+a[:i], not the other way round, and range(b-1) should be range(b).
You have two errors:
range(b-1) should be range(b);
a[:i] + a[i:] should be a[i:] + a[:i].
This is what I did. I used a deque, a class in collections, and then used its rotate method like this:
from collections import deque
string = 'abc'
for i in range(len(string)):
    c = deque(string)
    c.rotate(i)
    print(''.join(list(c)))
And gives me this output.
abc
cab
bca
What it does: it creates a deque (double-ended queue) object, which has a rotate method. rotate takes the number of steps and shifts the elements to the right by that many positions in place (it returns None), a bit like a circular right shift in binary operations. On each pass through the loop, a fresh deque is rotated by i steps, converted to a list, and finally joined into a string.
Hope this helps
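A short sketch of how deque.rotate behaves, for illustration only: it modifies the deque in place and returns None, positive steps rotate to the right, and negative steps rotate to the left.

```python
from collections import deque

d = deque("abc")
result = d.rotate(1)      # rotate one step to the right, in place
after_right = "".join(d)
print(result)             # None
print(after_right)        # cab

d.rotate(-1)              # a negative step rotates to the left
after_left = "".join(d)
print(after_left)         # abc
```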
for i in range(b):
    print(a[i:] + a[:i])
0 - [a,b,c] + []
1 - [b,c] + [a]
2 - [c ] + [a,b]
swap the lists
There is no need for (b-1). You can simply do it like this:
def possible_rotation():
    a = "abc"
    for i in range(0,len(a)):
        strng = a[i:]+a[:i]
        print(strng)
possible_rotation()
This looks to be homework, but here's a solution using the built-in collections.deque:
from collections import deque
def possible_rotations(string):
    rotated = deque(string)
    joined = None
    # Keep rotating right by one until the original string comes back around.
    while joined != string:
        rotated.rotate(1)
        joined = ''.join(rotated)
        print(joined)
Test it out:
>>> possible_rotations('abc')
cab
bca
abc
Two things:
Firstly, as already pointed out in the comments, you should iterate over range(b) instead of range(b-1). In general, range(b) is equal to [0, 1, ..., b-1], so in your example that would be [0, 1, 2].
Secondly, you switched around the two terms, it should be: a[i:] + a[:i].
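Putting both fixes together, a minimal sketch that returns the rotations as a list instead of printing them (the function name rotations is illustrative):

```python
def rotations(text):
    """Return every left rotation of text, starting with text itself."""
    # range(len(text)) covers each split point 0 .. len(text)-1 exactly once.
    return [text[i:] + text[:i] for i in range(len(text))]

print(rotations("abc"))  # ['abc', 'bca', 'cab']
```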

Mapping two list of lists based on its items into list pairs in Python

I have two lists of lists which need to be mapped to each other based on their matching items (lists). The output is a list of the pairs that were mapped. When the list to be mapped has length one, we can look for direct matches in the other list. The problem arises when the list to be mapped has length > 1, where I need to find whether the list in A is a subset of a list in B.
Input:
A = [['point'], ['point', 'floating']]
B = [['floating', 'undefined', 'point'], ['point']]
My failed Code:
C = []
for a in A:
    for b in B:
        if a == b:
            C.append([a, b])
        else:
            if set(a).intersection(b):
                C.append([a, b])
print(C)
Expected Output:
C = [
[['point'], ['point']],
[['point', 'floating'], ['floating', 'undefined', 'point']]
]
Just add a length condition to the elif statement:
import pprint
A = [['point'], ['point', 'floating']]
B = [['floating', 'undefined', 'point'], ['point']]
C = []
for a in A:
    for b in B:
        if a==b:
            C.append([a,b])
        elif all (len(x)>=2 for x in [a,b]) and not set(a).isdisjoint(b):
            C.append([a,b])
pprint.pprint(C)
output:
[[['point'], ['point']],
[['point', 'floating'], ['floating', 'undefined', 'point']]]
Just for interest's sake, here's a "one line" implementation using itertools.ifilter (Python 2 only; in Python 3 the built-in filter is already lazy and can be used instead).
from itertools import ifilter
C = list(ifilter(
    lambda x: x[0] == x[1] if len(x[0]) == 1 else set(x[0]).issubset(x[1]),
    ([a,b] for a in A for b in B)
))
EDIT:
Having read the most recent comments on the question, I think I may have misinterpreted what exactly is considered to be a match. In that case, something like this may be more appropriate.
C = list(ifilter(
    lambda x: x[0] == x[1] if len(x[0])<2 or len(x[1])<2 else set(x[0]).intersection(x[1]),
    ([a,b] for a in A for b in B)
))
Either way, the basic concept is the same. Just change the condition in the lambda to match exactly what you want to match.
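For Python 3, where itertools.ifilter no longer exists, the same idea can be sketched with a named predicate and a list comprehension. The function is_match is a hypothetical helper, and the rule it encodes (exact match for single-item lists, subset test otherwise) is just one reading of the question:

```python
def is_match(a, b):
    """Assumed rule: single-item lists must match exactly; longer lists must be a subset of b."""
    if len(a) == 1:
        return a == b
    return set(a).issubset(b)

A = [['point'], ['point', 'floating']]
B = [['floating', 'undefined', 'point'], ['point']]

# Test every (a, b) pair and keep the ones the predicate accepts.
C = [[a, b] for a in A for b in B if is_match(a, b)]
print(C)
# [[['point'], ['point']], [['point', 'floating'], ['floating', 'undefined', 'point']]]
```

Swapping in a different matching rule only requires changing the body of is_match.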
