Create string combination based on replacement - python

Given a word and a dictionary of replacement characters, I need to generate every combination of the word in which each replaceable character is either kept or substituted.
Input
word = 'accompanying'
substitutions={'c':['$'], 'a': ['4'], 'g': ['9']}
Output
{'a$$ompanyin9', 'ac$ompanyin9','a$companyin9','4ccomp4nying', '4$$omp4nying',
'4$comp4nying','4c$omp4nying', '4ccomp4nyin9', 'a$$ompanying', 'a$companying', 'ac$ompanying',
'accompanyin9', 'accompanying', '4$$omp4nyin9', '4$comp4nyin9', '4c$omp4nyin9','etc.,'}
I wrote some code, but it does not produce all the combinations I am expecting.
Sample Code
from itertools import product
substitutions={'c':['$'], 'a': ['4'], 'g': ['9']}
for key in substitutions.keys():
    if key not in substitutions[key]:
        substitutions[key].append(key)
wordPossibilities = []
word = 'accompanying'
for substitute in [zip(substitutions.keys(),ch) for ch in product(*substitutions.values())]:
    temp=word
    for replacement in substitute:
        temp=temp.replace(*replacement)
    wordPossibilities.append(temp)
print(set(wordPossibilities))
My Output
{'4$$omp4nyin9', 'a$$ompanyin9', 'a$$ompanying', 'accompanyin9',
'accompanying', '4ccomp4nyin9', '4$$omp4nying', '4ccomp4nying'}
My code replaces every occurrence of a character whenever a replacement exists for it. How do I make replacements by index so that I get all possible combinations?

It is clean and straightforward to use a generator with recursion:
word = 'accompanying'
subs={'c':['$'], 'a': ['4'], 'g': ['9']}
def get_subs(d, c = []):
    if not d:
        yield ''.join(c)
    else:
        for i in [d[0], *subs.get(d[0], [])]:
            yield from get_subs(d[1:], c+[i])
print(list(get_subs(word)))
Output:
['accompanying', 'accompanyin9', 'accomp4nying', 'accomp4nyin9', 'ac$ompanying', 'ac$ompanyin9', 'ac$omp4nying', 'ac$omp4nyin9', 'a$companying', 'a$companyin9', 'a$comp4nying', 'a$comp4nyin9', 'a$$ompanying', 'a$$ompanyin9', 'a$$omp4nying', 'a$$omp4nyin9', '4ccompanying', '4ccompanyin9', '4ccomp4nying', '4ccomp4nyin9', '4c$ompanying', '4c$ompanyin9', '4c$omp4nying', '4c$omp4nyin9', '4$companying', '4$companyin9', '4$comp4nying', '4$comp4nyin9', '4$$ompanying', '4$$ompanyin9', '4$$omp4nying', '4$$omp4nyin9']
However, itertools.product can be used for a shorter solution:
from itertools import product as prod
s = ''.join('{}' if i in subs else i for i in word)
result = [s.format(*i) for i in prod(*[[i, *subs[i]] for i in word if i in subs])]
Output:
['accompanying', 'accompanyin9', 'accomp4nying', 'accomp4nyin9', 'ac$ompanying', 'ac$ompanyin9', 'ac$omp4nying', 'ac$omp4nyin9', 'a$companying', 'a$companyin9', 'a$comp4nying', 'a$comp4nyin9', 'a$$ompanying', 'a$$ompanyin9', 'a$$omp4nying', 'a$$omp4nyin9', '4ccompanying', '4ccompanyin9', '4ccomp4nying', '4ccomp4nyin9', '4c$ompanying', '4c$ompanyin9', '4c$omp4nying', '4c$omp4nyin9', '4$companying', '4$companyin9', '4$comp4nying', '4$comp4nyin9', '4$$ompanying', '4$$ompanyin9', '4$$omp4nying', '4$$omp4nyin9']
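The same per-position idea can also be sketched without str.format, which would misbehave if the word itself contained braces. This is only a minimal sketch: the function name substitution_variants is hypothetical (it does not come from the answers above), and it assumes every key in the substitution dictionary is a single character.

```python
from itertools import product

def substitution_variants(word, subs):
    """Yield every variant of `word` where each character is kept or replaced.

    Hypothetical helper: `subs` maps a single character to a list of
    replacement characters.
    """
    # One list of choices per position: the original character first,
    # then any replacements listed for it.
    choices = [[ch, *subs.get(ch, [])] for ch in word]
    for combo in product(*choices):
        yield "".join(combo)

variants = list(substitution_variants(
    "accompanying", {"c": ["$"], "a": ["4"], "g": ["9"]}))
print(len(variants))  # 32, i.e. 2 ** 5 for the five replaceable positions
```

Because every position gets its own list of choices, repeated letters such as the two c's are varied independently, which is what the question's str.replace approach could not do.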

Obviously, you need to rewrite your logic to consider individual instances of the desired letters, rather than each unique letter. Find all occurrences of desired letters; use itertools to get the power set; make the indicated substitutions for each element of the power set. power_set comes from this SO answer. I've left the code "exploded" in some places to show the logic more readily. You will likely want to wrap the final loop into a one-line return expression.
from itertools import chain, combinations
def power_set(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
substitutions={'c':['$'], 'a': ['4', 'a'], 'g': ['9']}
word = 'accordingly'
# Get index of each desired letter and its possible substitutions
sub_idx = [(pos, letter, sub_letter) for pos, letter in enumerate(word)
           if letter in list(substitutions.keys()) for sub_letter in substitutions[letter]]
print("Replacement set", sub_idx)
for possibility in power_set(sub_idx):
    # Make each of the substitutions indicated in the power set
    new_word = list(word)
    for pos, _, sub_letter in possibility:
        new_word[pos] = sub_letter
    print(''.join(new_word))
Output:
Replacement set [(0, 'a', '4'), (0, 'a', 'a'), (1, 'c', '$'), (2, 'c', '$'), (8, 'g', '9')]
accordingly
4ccordingly
accordingly
a$cordingly
ac$ordingly
accordin9ly
accordingly
4$cordingly
4c$ordingly
4ccordin9ly
a$cordingly
ac$ordingly
accordin9ly
a$$ordingly
a$cordin9ly
ac$ordin9ly
a$cordingly
ac$ordingly
accordin9ly
4$$ordingly
4$cordin9ly
4c$ordin9ly
a$$ordingly
a$cordin9ly
ac$ordin9ly
a$$ordin9ly
a$$ordingly
a$cordin9ly
ac$ordin9ly
4$$ordin9ly
a$$ordin9ly
a$$ordin9ly

Related

Combinations by changing 3 or more places in a string

The code below takes a string; p holds, for every index, the characters that may replace the character at that index. For example, d1 is at p[0], so the character a (at string[0]) can be replaced by d or 1. Exactly 3 characters have to change at a time.
from itertools import combinations, product
string = "abc123"
p = ["d1", "c3", "", "", "0", "56"]
d = {idx: (v if string[idx] in v else string[idx]+v) for idx, v in enumerate(p)}
all_of_em = (''.join(whatever) for whatever in product(*d.values()))
fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, string)) == 3]
with open("list.txt","w") as f:
    for w in fewer:
        f.write(w+"\n")
As a result of the above code, we find all possible combinations if we change 3 places in a string with the specified alternative characters in p.
acc105
acc106
a3c105
a3c106
dbc105
dbc106
dcc125
dcc126
dcc103
d3c125
d3c126
d3c103
1bc105
1bc106
1cc125
1cc126
1cc103
13c125
13c126
13c103
The goal is to print the results faster. For example, I think these lines should be changed:
with open("list.txt","w") as f:
    for w in fewer:
        f.write(w+"\n")
so that the output can be saved with python3 py.py >> list.txt
I would enjoy learning from your solution.
Your solution is based on a brute force approach. You are generating all possible alternative strings and then filtering out the ones that do not meet the criteria of only 3 changes. A better approach would be to look only at those combinations that will meet the criteria. I will ignore the part of saving to a file, since it will be the same for both solutions. A faster solution would be:
from itertools import combinations, product

def change_string(input_string, mapping, replace=3):
    input_string = list(input_string)
    to_replace = dict()
    for idx, replacement in enumerate(mapping):
        if not replacement: continue
        to_replace[idx] = replacement
        if input_string[idx] in replacement:
            # Drop the original character so every chosen index really changes
            to_replace[idx] = [char for char in replacement if char != input_string[idx]]
    for indices in combinations(to_replace, r=replace):
        for chars in product(*[to_replace[index] for index in indices]):
            temp = input_string[:]
            for index, char in zip(indices, chars):
                temp[index] = char
            yield ''.join(temp)
Explanation
I change the input string to a list, so I can do the replacement faster, since lists are mutable and strings are not.
Then I filter the mapping (p) to represent only indices that are going to be changed. This removes all empty strings and provides me with the indices that I have to look at.
to_replace = dict()
for idx, replacement in enumerate(mapping):
    if not replacement: continue
    to_replace[idx] = replacement
    if input_string[idx] in replacement:
        to_replace[idx] = [char for char in replacement if char != input_string[idx]]
Note: I also make sure that the values in mapping are unequal to the original string values, which might not be what you want.
Then I create all possible combinations of indices with the required length (replace=3).
for indices in combinations(to_replace, r=replace):
Using your example this will contain the following group of indices:
(0, 1, 4)
(0, 1, 5)
(0, 4, 5)
(1, 4, 5)
Then I create all possible character combinations from those indices:
for chars in product(*[to_replace[index] for index in indices]):
For example, with indices (0, 1, 4), i.e. the values ('d1', 'c3', '0'), the following character combinations are produced:
('d', 'c', '0')
('d', '3', '0')
('1', 'c', '0')
('1', '3', '0')
Then I create a copy of the input string (note it is a list, so we can perform fast replacements) and replace the characters at the correct indices.
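The two itertools steps described above can be checked in isolation. This is a small sketch, assuming the filtered mapping for "abc123" has already been built; the dictionary literal below is written out by hand for illustration rather than produced by the answer's function.

```python
from itertools import combinations, product

# Filtered mapping for "abc123" with p = ["d1", "c3", "", "", "0", "56"]
# (written out by hand; indices 2 and 3 have no replacements).
to_replace = {0: "d1", 1: "c3", 4: "0", 5: "56"}

# Step 1: every group of exactly 3 indices that will be changed.
index_groups = list(combinations(to_replace, r=3))
print(index_groups)  # [(0, 1, 4), (0, 1, 5), (0, 4, 5), (1, 4, 5)]

# Step 2: every choice of replacement characters for one group.
chars_for_014 = list(product(*[to_replace[i] for i in (0, 1, 4)]))
print(chars_for_014)

# Total number of strings generated across all groups.
total = sum(len(list(product(*[to_replace[i] for i in group])))
            for group in index_groups)
print(total)  # 20, the number of lines in the expected output
```

Only 20 candidates are built here, whereas the brute-force version builds every string in the full product and then filters.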
Comparison
Your function
def OP(input_string, replace=3):
    p = ["d1", "c3", "", "", "0", "56"]
    d = {idx: (v if input_string[idx] in v else input_string[idx] + v) for idx, v in enumerate(p)}
    all_of_em = (''.join(whatever) for whatever in product(*d.values()))
    fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, input_string)) == replace]
    return fewer
Replace is 3
import timeit
print(timeit.timeit("OP('abc123')", setup="from __main__ import OP", number=100_000))
# 5.6281933 seconds
print(timeit.timeit("list(change_string('abc123', ['d1', 'c3', '', '', '0', '56']))",
                    setup="from __main__ import change_string", number=100_000))
# 1.3682368 seconds
Which is about 4 times as fast. The interesting part is to see what happens if we increase the replace value to 4.
Replace is 4
print(timeit.timeit("OP('abc123', replace=4)", setup="from __main__ import OP", number=100_000))
# 5.5450302 seconds
print(timeit.timeit("list(change_string('abc123', ['d1', 'c3', '', '', '0', '56'], replace=4))",
                    setup="from __main__ import change_string", number=100_000))
# 0.6179974 seconds
A whopping 9 times faster, since my solution only has to check a few combinations.
A similar speed-up can be seen when replace is 2 or 1.
Using a generator function will avoid creation and manipulation of large lists in memory. You can write it to the file as a single block of text using join.
def replace(S,R,N):
    if not N: yield S; return
    for i,chars in enumerate(R[:(1-N) or None]):
        for c in chars:
            yield from (S[:i]+c+s for s in replace(S[i+1:],R[i+1:],N-1))
def writeReplace(S,R,N):
    with open("list.txt","w") as f:
        # pass N through instead of a hard-coded 3
        f.write("\n".join(replace(S,R,N)))
S = "abc123"
R = ["d1", "c3", "", "", "0", "56"]
writeReplace(S,R,3)
dcc103
dcc125
dcc126
d3c103
d3c125
d3c126
dbc105
dbc106
1cc103
1cc125
1cc126
13c103
13c125
13c126
1bc105
1bc106
acc105
acc106
a3c105
a3c106
This is roughly 2.5x faster.
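If joining everything into one string ever becomes a memory concern, the generator can also be streamed to the file line by line. The sketch below is only an illustration: write_lines is a hypothetical helper, an in-memory io.StringIO stands in for the open file, and no timing was measured for it.

```python
import io

def write_lines(lines, stream):
    """Write each string from `lines` to `stream`, one per line.

    Hypothetical helper: `lines` may be any iterable, including a
    generator, so the full output never has to exist as one string.
    """
    stream.writelines(line + "\n" for line in lines)

# An in-memory buffer stands in for open("list.txt", "w") here.
buffer = io.StringIO()
write_lines(iter(["dcc103", "dcc125", "dcc126"]), buffer)
print(buffer.getvalue(), end="")
```

Unlike the join version, this leaves a trailing newline after the last line.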

Find all substrings in a string using recursion Python 3

How would you make a list of all the possible substrings in a string using recursion? (no loops) I know that you can recurse using s[1:] to cut off the first position and s[:-1] to cut off the last position. So far I have come up with this:
def lst_substrings(s):
    lst = []
    if s == "":
        return lst
    else:
        lst.append(s)
        return lst_substrings(s[1:])
but this would only make a list of all the substrings that are sliced by the first position if it worked
Fun problem, here's my solution - feedback appreciated.
Output
In [73]: lstSubStrings("Hey")
Out[73]: ['', 'y', 'H', 'Hey', 'He', 'e', 'ey']
Solution
def lstSubStrings(s):
    # BASE CASE: when s is empty return the empty string
    if len(s) == 0:
        return [s]
    substrs = []
    # a string is a substring of itself - by the definition of subset in math
    substrs.append(s)
    # extend the list of substrings by all substrings with the first
    # character cut out
    substrs.extend(lstSubStrings(s[1:]))
    # extend the list of substrings by all substrings with the last
    # character cut out
    substrs.extend(lstSubStrings(s[:-1]))
    # convert the list to `set`, removing all duplicates, and convert
    # back to a list
    substrs = list(set(substrs))
    return substrs
EDIT: Duh. Just realized now that practically the same solution has been posted by someone who was quicker than me. Vote for his answer. I'll leave this up, as it is a bit more concise and shows how to sort the resulting list by substring length. Use (len(item), item) as the key, i.e. leave out the - sign, to sort in ascending order.
This will do:
def lst_substrings(s):
    lst = [s]
    if len(s) > 0:
        lst.extend(lst_substrings(s[1:]))
        lst.extend(lst_substrings(s[:-1]))
    return list(set(lst))
sub = lst_substrings("boby")
sub.sort(key=lambda item: (-len(item), item))
print(sub)
Output is:
['boby', 'bob', 'oby', 'bo', 'by', 'ob', 'b', 'o', 'y', '']
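Both recursive answers call themselves twice per level, so the number of calls roughly doubles with every extra character. A possible refinement, sketched here under the assumption that caching is acceptable for the exercise, is to memoize the function with functools.lru_cache; the name substrings and the frozenset return type are illustrative choices, not part of the answers above.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def substrings(s):
    """Return a frozenset of all substrings of s, including the empty string."""
    if not s:
        return frozenset({""})
    # s itself, plus everything reachable by dropping the first or last character.
    # The cache means each distinct substring is expanded only once.
    return frozenset({s}) | substrings(s[1:]) | substrings(s[:-1])

result = sorted(substrings("boby"), key=lambda item: (-len(item), item))
print(result)
# ['boby', 'bob', 'oby', 'bo', 'by', 'ob', 'b', 'o', 'y', '']
```

The function still uses no explicit loops, and returning an immutable frozenset keeps the cached values safe from accidental modification.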

python unique string creation

I've looked at several other SO questions (and google'd tons) that are 'similar'-ish to this, but none of them seem to fit my question right.
I am trying to make a non fixed length, unique text string, only containing characters in a string I specify. E.g. made up of capital and lower case a-zA-Z characters. (for this example I use only a, b, and c lower case)
Something like this (broken code below)
def next(index, validCharacters = 'abc'):
    return uniqueShortAsPossibleString
The index argument would be an index (integer) that relates to a text string, for instance:
next(1) == 'a'
next(2) == 'b'
next(3) == 'c'
next(4) == 'aa'
next(5) == 'ab'
next(6) == 'ac'
next(7) == 'ba'
next(8) == 'bb'
next(9) == 'bc'
next(10) == 'ca'
next(11) == 'cb'
next(12) == 'cc'
And so forth. The string:
Must be unique, I'll be using it as an identifier, and it can only be a-zA-Z chars
As short as possible, with lower index numbers being shortest (see above examples)
Contain only the characters specified in the given argument string validCharacters
In conclusion, how could I write the next() function to relate an integer index value to a unique short string with the characters specified?
P.S. I'm new to SO, this site has helped me tons throughout the years, and while I've never made an account or asked a question (till now), I really hope I've done an okay job explaining what I'm trying to accomplish with this.
What you are trying to do is write the parameter of the next function in another base.
Let's suppose validCharacters contains k characters: then the job of the next function will be to transform parameter p into base k by using the characters in validCharacters.
In your example, you can write the numbers in base 3 and then associate each digit with one letter:
next(1) -> 1 -> 'a'
next(2) -> 2 -> 'b'
next(4) -> 11 -> 'aa'
next(7) -> 21 -> 'ba'
And so forth.
With this method, you can call next(x) without knowing or computing any next(x-i), which you can't do with iterative methods.
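The mapping in the examples above (4 -> 11 -> 'aa') uses digits 1 to k with no zero digit, which is usually called bijective base-k numeration. Here is a minimal Python 3 sketch of that conversion; the name index_to_name is hypothetical and was chosen to avoid shadowing the built-in next.

```python
def index_to_name(index, valid_characters="abc"):
    """Map a 1-based index to its bijective base-k string.

    Hypothetical helper: k is len(valid_characters), and the digit
    values 1..k correspond to valid_characters[0..k-1].
    """
    if index < 1:
        raise ValueError("index must be a positive integer")
    base = len(valid_characters)
    digits = []
    while index > 0:
        # Subtracting 1 first shifts the digit range from 1..k to 0..k-1.
        index, remainder = divmod(index - 1, base)
        digits.append(valid_characters[remainder])
    return "".join(reversed(digits))

print([index_to_name(i) for i in range(1, 13)])
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

Each index is converted on its own, so any position can be computed without generating the earlier ones.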
You're trying to convert a number to a number in another base, but using arbitrary characters for the digits of that base.
import string
chars = string.lowercase + string.uppercase
def identifier(x, chars):
    output = []
    base = len(chars)
    while x:
        output.append(chars[x % base])
        x /= base
    return ''.join(reversed(output))
print identifier(1, chars)
This lets you jump to any position. Because you are simply counting, the identifiers are unique; it is easy to use any character set of two or more characters; and lower numbers give shorter identifiers.
itertools can always give you obfuscated one-liner iterators:
from itertools import combinations_with_replacement, chain
chars = 'abc'
a = chain(*(combinations_with_replacement(chars, i) for i in range(1, len(chars) + 1)))
Basically, this code creates an iterator that combines all combinations of chars of lengths 1, 2, ..., len(chars).
The output of for x in a: print x is:
('a',)
('b',)
('c',)
('a', 'a')
('a', 'b')
('a', 'c')
('b', 'b')
('b', 'c')
('c', 'c')
('a', 'a', 'a')
('a', 'a', 'b')
('a', 'a', 'c')
('a', 'b', 'b')
('a', 'b', 'c')
('a', 'c', 'c')
('b', 'b', 'b')
('b', 'b', 'c')
('b', 'c', 'c')
('c', 'c', 'c')
You can't really "associate" the index with anything, but the following is a generator that will yield the output you're asking for:
from itertools import combinations_with_replacement
def uniquenames(chars):
    for i in range(1, len(chars)):
        for j in combinations_with_replacement(chars, i):
            yield ''.join(j)
print list(uniquenames('abc'))
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'bb', 'bc', 'cc']
As far as I understood we shouldn't specify maximum length of output string. So range is not enough:
>>> from itertools import combinations_with_replacement, count
>>> def u(chars):
...     for i in count(1):
...         for k in combinations_with_replacement(chars, i):
...             yield "".join(k)
...
>>> g = u("abc")
>>> next(g)
'a'
>>> next(g)
'b'
>>> next(g)
'c'
>>> next(g)
'aa'
>>> next(g)
'ab'
>>> next(g)
'ac'
>>> next(g)
'bb'
>>> next(g)
'bc'
So it seems like you are trying to enumerate through all the strings generated by the language {'a','b','c'}. This can be done using finite state automata (though you don't want to do that). One simple way to enumerate through the language is to start with a list and append all the strings of length 1 in order (so a then b then c). Then append each letter in the alphabet to each string of length n-1. This will keep it in order as long as you append all the letters in the alphabet to a given string before moving on to the lexicographically next string.
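The append-every-letter procedure described above can be sketched as a breadth-first generator in Python 3. The name enumerate_language is hypothetical, and the queue grows without bound, so this suits taking the first few names rather than jumping to a large index.

```python
from collections import deque
from itertools import islice

def enumerate_language(alphabet):
    """Yield all non-empty strings over `alphabet`, shortest first.

    Hypothetical helper: strings of equal length come out in the
    order given by `alphabet`.
    """
    queue = deque([""])  # start from the empty prefix
    while True:
        prefix = queue.popleft()
        # Append every letter to this prefix before moving to the next one.
        for letter in alphabet:
            word = prefix + letter
            yield word
            queue.append(word)

first_twelve = list(islice(enumerate_language("abc"), 12))
print(first_twelve)
# ['a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```

Unlike combinations_with_replacement, this also produces strings such as 'ba' and 'ca', matching the list in the question.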

introducing mutations in a DNA string in python

Given a DNA string, for example AGC, I am trying to generate all possible unique strings allowing up to n (a user-defined number) mismatches in the given string.
I am able to do this for one mismatch in the following way, but I am not able to implement a recursive solution that generates all the possible combinations based on the n mismatches, the DNA string and the mutation set (AGCTN).
temp_dict = {}
sequence = 'AGC'
for x in xrange(len(sequence)):
    prefix = sequence[:x]
    suffix = sequence[x+1:]
    temp_dict.update([ (prefix+base+suffix,1) for base in 'ACGTN'])
print temp_dict
An example:
For a given sample string, ACG, the following are the 13 unique sequences allowing up to one mismatch:
{'ACC': 1, 'ATG': 1, 'AAG': 1, 'ANG': 1, 'ACG': 1, 'GCG': 1, 'AGG': 1,
'ACA': 1, 'ACN': 1, 'ACT': 1, 'TCG': 1, 'CCG': 1, 'NCG': 1}
I want to generalize this so that the program can take a 100-character DNA string and return a list/dict of unique strings allowing a user-defined number of mismatches.
Thanks!
-Abhi
Assuming I understand you, I think you can use the itertools module. The basic idea is to choose locations where there's going to be a mismatch using combinations and then construct all satisfying lists using product:
import itertools
def mismatch(word, letters, num_mismatches):
    for locs in itertools.combinations(range(len(word)), num_mismatches):
        this_word = [[char] for char in word]
        for loc in locs:
            orig_char = word[loc]
            this_word[loc] = [l for l in letters if l != orig_char]
        for poss in itertools.product(*this_word):
            yield ''.join(poss)
For your example case:
>>> mismatch("ACG", "ACGTN", 0)
<generator object mismatch at 0x1004bfaa0>
>>> list(mismatch("ACG", "ACGTN", 0))
['ACG']
>>> list(mismatch("ACG", "ACGTN", 1))
['CCG', 'GCG', 'TCG', 'NCG', 'AAG', 'AGG', 'ATG', 'ANG', 'ACA', 'ACC', 'ACT', 'ACN']
I believe the accepted answer only gives N mismatches, not up to N. A slight modification to the accepted answer should correct this I think:
from itertools import combinations,product
def mismatch(word, i = 2):
    for d in range(i+1):
        for locs in combinations(range(len(word)), d):
            thisWord = [[char] for char in word]
            for loc in locs:
                origChar = word[loc]
                thisWord[loc] = [l for l in "ACGT" if l != origChar]
            for poss in product(*thisWord):
                yield "".join(poss)
kMerList = list(mismatch("AAAA",3))
print kMerList
I am completely new to programming, so please correct me if I'm wrong.
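The modification above hard-codes the alphabet "ACGT", while the question's mutation set also includes N. A Python 3 sketch that keeps the alphabet as a parameter and can be checked against the 13 sequences listed in the question might look as follows; the name within_mismatches is hypothetical.

```python
from itertools import combinations, product

def within_mismatches(word, letters, max_mismatches):
    """Return the set of strings differing from `word` in at most
    `max_mismatches` positions, using replacement characters from `letters`.
    """
    results = set()
    for d in range(max_mismatches + 1):
        # Choose exactly d positions to mutate.
        for locs in combinations(range(len(word)), d):
            choices = [[char] for char in word]
            for loc in locs:
                # At a mutated position, allow every letter except the original.
                choices[loc] = [l for l in letters if l != word[loc]]
            for poss in product(*choices):
                results.add("".join(poss))
    return results

print(len(within_mismatches("ACG", "ACGTN", 1)))  # 13
```

The result size grows quickly with the string length and the number of mismatches, so for a 100-character sequence it may be preferable to iterate over the candidates rather than store them all.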

Python Function to return a list of common letters in first and last names

Question: DO NOT USE SETS IN YOUR FUNCTION: Uses lists to return a list of the common letters in the first and last names (the intersection) Prompt user for first and last name and call the function with the first and last names as arguments and print the returned list.
I can't figure out why my program is just printing "No matches" even if there are letter matches. Anything helps! Thanks a bunch!
Code so far:
import string
def getCommonLetters(text1, text2):
    """ Take two strings and return a list of letters common to
    both strings."""
    text1List = text1.split()
    text2List = text2.split()
    for i in range(0, len(text1List)):
        text1List[i] = getCleanText(text1List[i])
    for i in range(0, len(text2List)):
        text2List[i] = getCleanText(text2List[i])
    outList = []
    for letter in text1List:
        if letter in text2List and letter not in outList:
            outList.append(letter)
    return outList
def getCleanText(text):
    """Return letter in lower case stripped of whitespace and
    punctuation characters"""
    text = text.lower()
    badCharacters = string.whitespace + string.punctuation
    for character in badCharacters:
        text = text.replace(character, "")
    return text
userText1 = raw_input("Enter your first name: ")
userText2 = raw_input("Enter your last name: ")
result = getCommonLetters(userText1, userText2)
numMatches = len(result)
if numMatches == 0:
    print "No matches."
else:
    print "Number of matches:", numMatches
    for letter in result:
        print letter
Try this:
def CommonLetters(s1, s2):
    l1=list(''.join(s1.split()))
    l2=list(''.join(s2.split()))
    return [x for x in l1 if x in l2]
print CommonLetters('Tom','Dom de Tommaso')
Output:
>>> ['T', 'o', 'm']
for letter in text1List:
Here's your problem. text1List is a list, not a string. You iterate on a list of strings (['Bobby', 'Tables'] for instance) and you check if 'Bobby' is in the list text2List.
You want to iterate on every character of your string text1 and check if it is present in the string text2.
There are a few non-pythonic idioms in your code, but you'll learn that in time.
Follow-up: What happens if I type my first name in lowercase and my last name in uppercase? Will your code find any match?
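One way to apply that advice is to iterate over the characters of the cleaned strings instead of over the lists returned by split(). The following is a Python 3 sketch rather than a drop-in replacement for the Python 2 program above; get_common_letters is a hypothetical name, and lower-casing both names is an assumption about the desired behaviour.

```python
import string

def get_common_letters(text1, text2):
    """Return the letters common to both strings, without using sets.

    Both strings are lower-cased, and whitespace and punctuation
    are ignored.
    """
    bad_characters = string.whitespace + string.punctuation
    letters2 = [c for c in text2.lower() if c not in bad_characters]
    out_list = []
    for letter in text1.lower():  # iterate character by character
        if letter in bad_characters:
            continue
        if letter in letters2 and letter not in out_list:
            out_list.append(letter)
    return out_list

print(get_common_letters("Bobby", "Tables"))       # ['b']
print(get_common_letters("Tom", "Dom de Tommaso"))  # ['t', 'o', 'm']
```

Because both inputs are lower-cased first, a first name typed in lowercase and a last name typed in uppercase still match.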
Prior to set() being the common idiom for duplicate removal in Python 2.5, you could use the conversion of a list to a dictionary to remove duplicates.
Here is an example:
def CommonLetters(s1, s2):
    d={}
    for l in s1:
        if l in s2 and l.isalpha():
            d[l]=d.get(l,0)+1
    return d
print CommonLetters('matteo', 'dom de tommaso')
This prints the count of the common letters like so:
{'a': 1, 'e': 1, 'm': 1, 't': 2, 'o': 1}
If you want to have a list of those common letters, just use the keys() method of the dictionary:
print CommonLetters('matteo', 'dom de tommaso').keys()
Which prints just the keys:
['a', 'e', 'm', 't', 'o']
If you want upper and lower case letters to match, add the logic to this line:
if l in s2 and l.isalpha():
