Replace symbols in a string with conflicting keys in a dictionary


I need a translator that uses a dictionary with keys like 's': 'd' and 'sch': 'b'.
That's a rough example, but the point is that for an input word like "schto" it should substitute 'sch' with 'b' and produce "bkr". But because the dictionary also has the key 's', the word is translated as "dnokr": every character is looked up on its own, so 'sch' is never looked up as a whole. How can I make the key 'sch' take precedence over the separate 's', 'c', and 'h'?
Here is an example of the code:
newdict = {'sch': 'b', 'sh': 'q', 'ch': 'w', 's': 'd', 'c': 'n', 'h': 'o', 't': 'k', 'o': 'r'}
code = input("Type: ")
code = "".join([newdict[w] for w in code])
print(code)

In a regular expression, alternatives separated by | are tried from left to right, and the first one that matches wins. If you're using a version of Python in which the insertion order of key-value pairs in a dictionary is guaranteed (3.7 and later), and you insert the key-value pairs so that the longer keys come first, something like this should work for you. re.sub takes either a replacement string or a callable (function, lambda, etc.) that accepts the current match object as an argument and returns the replacement string:
import re
lookup = {
"sch": "b",
"sh": "q",
"s": "d"
}
def replace(match):
    return lookup[match.group()]
pattern = "|".join(lookup)
print(re.sub(pattern, replace, "schush swim"))
Output:
buq dwim
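Relying on the order in which the dictionary was written is fragile if the keys come from elsewhere. A minimal sketch of the same regex idea that sorts the keys by length itself and escapes them with re.escape, so keys containing regex metacharacters are matched literally. The helper name translate is hypothetical and not part of the answer above:

```python
import re

def translate(text, mapping):
    # Hypothetical helper: try longer keys first so 'sch' wins over 'sh' and 's'.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps keys such as '.' or '+' from being read as regex syntax.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is replaced in a single pass; characters without a key are kept.
    return pattern.sub(lambda m: mapping[m.group()], text)

# The short keys are deliberately listed first to show that order no longer matters.
newdict = {'s': 'd', 'c': 'n', 'h': 'o', 't': 'k', 'o': 'r',
           'sh': 'q', 'ch': 'w', 'sch': 'b'}

print(translate("schto", newdict))
print(translate("schush swim", newdict))
```

Unlike the question's list comprehension, this leaves characters that have no key (such as the space or 'u') unchanged instead of raising KeyError.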

Dictionaries are guaranteed to keep the insertion order of their keys in Python 3.7+ (in CPython 3.6 this was an implementation detail). Hence you can achieve this using str.replace() while iterating over dict.items().
Note that the replacements cascade: each rule is applied to the output of the previous ones. For example, if 'h' is replaced by 'o', that 'o' will later be replaced by 'r'.
newdict = {'sch': 'b', 'sh': 'q', 'ch': 'w', 's': 'd', 'c': 'n', 'h': 'o', 't': 'k', 'o': 'r'}
my_word = "schto"
for k, v in newdict.items():
    my_word = my_word.replace(k, v)
After the loop, my_word holds the desired string 'bkr'.
Because dict.items() preserves insertion order, rules that are defined first are applied first during the iteration. You can therefore set the priority of your rules by declaring the keys that should take precedence before the other keys.
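The cascading behaviour can give surprising results when a replacement value is itself a key. A rough sketch comparing the loop above with a single left-to-right scan that always takes the longest matching key; both function names are hypothetical:

```python
newdict = {'sch': 'b', 'sh': 'q', 'ch': 'w', 's': 'd',
           'c': 'n', 'h': 'o', 't': 'k', 'o': 'r'}

def replace_cascading(word, mapping):
    # Same loop as in the answer: later rules see the output of earlier ones.
    for k, v in mapping.items():
        word = word.replace(k, v)
    return word

def replace_single_pass(word, mapping):
    # Scan once; at each position take the longest key that matches.
    max_len = max(map(len, mapping))
    out = []
    i = 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += size
                break
        else:
            # No key starts here: keep the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)

print(replace_cascading("schto", newdict), replace_single_pass("schto", newdict))
print(replace_cascading("hot", newdict), replace_single_pass("hot", newdict))
```

For "schto" both give 'bkr', but for "hot" the cascading loop turns the freshly produced 'o' into 'r' as well, while the single pass translates each input character exactly once.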

Related

Replace numbers with letters and offer all permutations

I need to determine all possible letter combinations of a string that has numbers when converting numbers into possible visually similar letters.
Using the dictionary:
number_appearance = {
'1': ['l', 'i'],
'2': ['r', 'z'],
'3': ['e', 'b'],
'4': ['a'],
'5': ['s'],
'6': ['b', 'g'] ,
'7': ['t'],
'8': ['b'],
'9': ['g', 'p'],
'0': ['o', 'q']}
I want to write a function that takes an input and creates all possible letter combinations. For example:
import re
text = 'l4t32'
def convert_numbers(text):
    return re.sub('[0-9]', lambda x: number_appearance[x[0]][0], text)
I want the output to be a list with all possible permutations:
['later', 'latbr', 'latbz', 'latez']
The function above works if you are just grabbing the first letter in each list from number_appearance, but I'm trying to figure out the best way to iterate through all possible combinations. Any help would be much appreciated!

As an upgrade from your own answer, I suggest the following:
import itertools
def convert_numbers(text):
    all_items = [number_appearance.get(char, [char]) for char in text]
    return [''.join(elem) for elem in itertools.product(*all_items)]
The improvements are that:
it doesn't convert text to a list (there is no need for that)
you don't need regex
it will still work if you decide instead that you also want to add other characters on top of numbers
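For reference, a self-contained version of the suggestion above, using the dictionary from the question, so it can be run as-is:

```python
import itertools

number_appearance = {
    '1': ['l', 'i'], '2': ['r', 'z'], '3': ['e', 'b'], '4': ['a'],
    '5': ['s'], '6': ['b', 'g'], '7': ['t'], '8': ['b'],
    '9': ['g', 'p'], '0': ['o', 'q'],
}

def convert_numbers(text):
    # Each character becomes a list of candidates; non-digits map to themselves.
    all_items = [number_appearance.get(char, [char]) for char in text]
    # product picks one candidate per position in every possible way.
    return [''.join(elem) for elem in itertools.product(*all_items)]

print(convert_numbers('l4t32'))
# ['later', 'latez', 'latbr', 'latbz']
```

The number of results is the product of the candidate-list lengths, here 1 * 1 * 1 * 2 * 2 = 4.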

import itertools
import re
def convert_num_appearance(text):
    string_characters = [character for character in text]
    all_items = []
    for item in string_characters:
        if re.search('[a-zA-Z]', item):
            all_items.append([item])
        elif re.search(r'\d', item):
            all_items.append(number_appearance[item])
    return [''.join(elem) for elem in itertools.product(*all_items)]

I would break down the problem like so:
First, create a function that can do the replacement for a given set of replacement letters. My input specification is a sequence of letters, where the first letter is the replacement for the '0' character, next for 1 etc. This allows me to use the index in that sequence to determine the character being replaced, while generating a plain sequence rather than a dict or other complex structure. To do the replacement, I will use the built-in translate method of the original string. That requires a dictionary as described in the documentation, which I can easily build with a dict comprehension, or with the provided helper method str.maketrans (a static method of the str type).
Use itertools.product to generate those sequences.
Use a list comprehension to apply the replacement for each sequence.
Thus:
from itertools import product
def replace_digits(original, replacement):
    # translation = {ord(str(i)): c for i, c in enumerate(replacement)}
    translation = str.maketrans('0123456789', ''.join(replacement))
    print(translation)
    return original.translate(translation)
replacements = product(
    ['o', 'q'], ['l', 'i'], ['r', 'z'], ['e', 'b'], ['a'],
    ['s'], ['b', 'g'], ['t'], ['b'], ['g', 'p']
)
[replace_digits('14732', r) for r in replacements]
(You will notice there are duplicates in the result; this is because of variant replacements for symbols that don't appear in the input.)
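One possible way to avoid those duplicates, sketched under the assumption that the candidates live in a dict like the question's number_appearance: build the product only over the digits that actually occur in the input. The function name variants and the parameter name table are hypothetical:

```python
from itertools import product

number_appearance = {
    '1': ['l', 'i'], '2': ['r', 'z'], '3': ['e', 'b'], '4': ['a'],
    '5': ['s'], '6': ['b', 'g'], '7': ['t'], '8': ['b'],
    '9': ['g', 'p'], '0': ['o', 'q'],
}

def variants(original, table):
    # Only digits present in the input contribute to the product.
    digits = sorted({ch for ch in original if ch in table})
    results = []
    for combo in product(*(table[d] for d in digits)):
        # str.maketrans accepts a dict mapping single characters to strings.
        translation = str.maketrans(dict(zip(digits, combo)))
        results.append(original.translate(translation))
    return results

result = variants('14732', number_appearance)
print(len(result), len(set(result)))
print(result)
```

Note that a digit appearing twice in the input receives the same letter in both places within one result, just as in the translate-based answer above.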

Adding two strings to a dictionary

I'm trying to check whether the key and value in a dictionary are the same. If they are, I print the count of correct words; if not, I check how many letters match exactly.
e.g. {'KEY':'KET'}
the output should be 1 mismatch, for Y != T.
I tried using zip to add the key and value to a new dictionary, but it doesn't add repeated letters to the dictionary, as shown below.
word_dict={'PRETTY': 'PRESEN'}
count_correct = 0
for key,value in word_dict.items():
    if key==value:
        count_correct+=1
    elif key!=value and len(key)==len(value):
        new_dict=dict(zip(key,value))
        print (new_dict)
The output of the above code is:
{'P': 'P', 'T': 'E', 'E': 'E', 'Y': 'N', 'R': 'R'}
which is missing one pair, 'T':'S'.
I know I could convert the key and value to separate lists and compare them index by index, but I would also like to know whether a dictionary can hold all the letter pairs from both strings.
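A dictionary cannot hold both 'T': 'S' and 'T': 'E', because each key maps to exactly one value and the later pair overwrites the earlier one. A minimal sketch that keeps the pairs from zip in a list instead and counts the mismatches; the function name compare is hypothetical:

```python
def compare(key, value):
    # A list of tuples keeps every position, including repeated letters.
    pairs = list(zip(key, value))
    mismatches = [(a, b) for a, b in pairs if a != b]
    return pairs, mismatches

pairs, mismatches = compare('PRETTY', 'PRESEN')
print(pairs)
print(len(mismatches), "mismatches:", mismatches)

# The same pairs pushed through dict() lose one entry for the repeated 'T'.
print(len(pairs), len(dict(pairs)))
```

For {'KEY': 'KET'} the same function reports a single mismatch, ('Y', 'T').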

"TypeError: unhashable type: 'list'" yet I'm trying to only slice the value of the list, not use the list itself

I've been having issues trying to create a dictionary by using the values from a list.
alphabetList = list(string.ascii_lowercase)
alphabetList.append(list(string.ascii_lowercase))
alphabetDict = {}
def makeAlphabetDict (Dict, x):
    count = 0
    while count <= len(alphabetList):
        item1 = x[(count + (len(alphabetList) / 2))]
        item2 = item1
        Dict[item1] = item2
        count += 1
makeAlphabetDict(alphabetDict , alphabetList)
Which returns:
TypeError: unhashable type: 'list'
I tried here and other similar questions yet I still can't see why Python thinks I'm trying to use the list, rather than just a slice from a list.

Your list contains a nested list:
alphabetList.append(list(string.ascii_lowercase))
You now have a list with ['a', 'b', ..., 'z', ['a', 'b', ..., 'z']]. It is that last element in the outer list that causes your problem.
You would normally use list.extend() to add additional elements:
alphabetList.extend(string.ascii_lowercase)
You are using string.ascii_lowercase twice there; perhaps you meant to use ascii_uppercase for one of those strings instead? Even so, your code always uses the same character for both key and value so it wouldn't really matter here.
If you are trying to map lowercase to uppercase or vice-versa, just use zip() and dict():
alphabetDict = dict(zip(string.ascii_lowercase, string.ascii_uppercase))
where zip() produces pairs of characters, and dict() takes those pairs as key-value pairs. The above produces a dictionary mapping lowercase ASCII characters to uppercase:
>>> import string
>>> dict(zip(string.ascii_lowercase, string.ascii_uppercase))
{'u': 'U', 'v': 'V', 'o': 'O', 'k': 'K', 'n': 'N', 'm': 'M', 't': 'T', 'l': 'L', 'h': 'H', 'e': 'E', 'p': 'P', 'i': 'I', 'b': 'B', 'x': 'X', 'q': 'Q', 'g': 'G', 'd': 'D', 'r': 'R', 'z': 'Z', 'c': 'C', 'w': 'W', 'a': 'A', 'y': 'Y', 'j': 'J', 'f': 'F', 's': 'S'}

As Martijn Pieters noted, the problem is that list.append adds a list inside your other list. You can join two lists in either of the following ways:
alphabetList = list(string.ascii_lowercase)
alphabetList += list(string.ascii_lowercase)
# Concatenates the two lists; same effect as alphabetList.extend(string.ascii_lowercase)
alphabetList = list(string.ascii_lowercase) * 2
# For your use case: repeats the alphabet twice
In either case, your alphabetDict will have only 26 keys, not 52, because a dict cannot have repeated keys.
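The difference between append and extend, and the effect of repeated keys, can be checked with a short sketch (the variable names nested and flat are only for illustration):

```python
import string

nested = list(string.ascii_lowercase)
nested.append(list(string.ascii_lowercase))   # adds ONE element: a whole list

flat = list(string.ascii_lowercase)
flat.extend(string.ascii_lowercase)           # adds 26 separate characters

print(len(nested), type(nested[-1]).__name__)  # 27 list
print(len(flat), type(flat[-1]).__name__)      # 52 str

# Using the nested list as a dictionary key reproduces the error.
try:
    {nested[-1]: nested[-1]}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Repeated keys collapse, so 52 characters still give 26 entries.
alphabet_dict = {ch: ch for ch in flat}
print(len(alphabet_dict))
```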

KeyError: '\n' python 2.7.5

I want to compare my string to my dictionary character by character and, wherever a character matches a key, replace it with that key's value. For example, an A in the string matches A in the dictionary and is replaced with T, and the result is written to line2_u_rev_comp. However, the error KeyError: '\n' occurs instead. What does this error mean and how can I get rid of it?
REV_COMP = {
'A': 'T',
'T': 'A',
'C': 'G',
'G': 'C',
'N': 'N',
'U': 'A'
}
tbl = REV_COMP
line2_u_rev_comp = [tbl[k] for k in line2_u_rev[::-1]]
''.join(line2_u_rev_comp)

'\n' is the newline character, and you can get rid of it (and other leading or trailing whitespace) using str.strip, e.g.:
line2_u_rev_comp = [tbl[k] for k in line2_u_rev.strip()[::-1]]

line2_u_rev_comp = [tbl.get(k,k) ... ]
This will either get the value from the dictionary or, if the key is missing, return the character itself.
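Both suggestions can be combined. A small sketch, assuming the line was read from a file and still ends with a newline; the function name reverse_complement and the sample string are made up for illustration:

```python
REV_COMP = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N', 'U': 'A'}

def reverse_complement(line, table=REV_COMP):
    # strip() drops the trailing newline; get(k, k) passes unknown characters through.
    return ''.join(table.get(k, k) for k in reversed(line.strip()))

sample = "ATGCN\n"   # hypothetical line as returned by readline()

# Direct indexing fails on the newline, which is not a key of the table.
try:
    [REV_COMP[k] for k in sample[::-1]]
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError for", repr(missing_key))

print(reverse_complement(sample))
```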

The problem is that tbl[k] is evaluated without checking whether the key exists in the dict; if it doesn't, you need to return k itself.
You also need to reverse the list again, since your for statement iterates over the reversed string.
Try this code:
line2_u_rev = "MY TEST IS THIS"
REV_COMP = {
'A': 'T',
'T': 'A',
'C': 'G',
'G': 'C',
'N': 'N',
'U': 'A'
}
tbl = REV_COMP
line2_u_rev_comp = [tbl[k] if k in tbl else k for k in line2_u_rev[::-1]][::-1]
print ''.join(line2_u_rev_comp)
Output:
MY AESA IS AHIS

Making a dictionary from a list of lists

I have been unable to figure this out, I think the problem might be in the way I am making the list of lists. Can anyone help out? Thanks!
My desired outcome is
codondict = {'A': ['GCT','GCC','GCA','GCG'], 'C': ['TGT','TGC'], &c
but what I get is:
{'A': 'A', 'C': 'C', &c.
Here's my terminal:
A=['GCT','GCC','GCA','GCG']
C=['TGT','TGC']
D=['GAT','GAC']
E=['GAA','GAG']
F=['TTT','TTC']
G=['GGT','GGC','GGA','GGG']
H=['CAT','CAC']
I=['ATT','ATC','ATA']
K=['AAA','AAG']
L=['TTA','TTG','CTT','CTC','CTA','CTG']
M=['ATG']
N=['AAT','AAC']
P=['CCT','CCC','CCA','CCG']
Q=['CAA','CAG']
R=['CGT','CGC','CGA','CGG','AGA','AGG']
S=['TCT','TCC','TCA','TCG','AGT','AGC']
T=['ACT','ACC','ACA','ACG']
V=['GTT','GTC','GTA','GTG']
W=['TGG']
Y=['TAT','TAC']
aminoacids=['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
from collections import defaultdict
codondict=defaultdict(list)
for i in aminoacids:
... for j in i:(ALSO TRIED for j in list(i))
... ... codondict[i]=j
...
codondict
defaultdict(<type 'list'>, {'A': 'A', 'C': 'C', 'E': 'E', 'D': 'D', 'G': 'G', 'F': 'F', 'I': 'I', 'H': 'H', 'K': 'K', 'M': 'M', 'L': 'L', 'N': 'N', 'Q': 'Q', 'P': 'P', 'S': 'S', 'R': 'R', 'T': 'T', 'W': 'W', 'V': 'V', 'Y': 'Y'})

You can try this:
condondict= dict(A=['GCT','GCC','GCA','GCG'],
C=['TGT','TGC'],
D=['GAT','GAC'],
E=['GAA','GAG'],
F=['TTT','TTC'],
G=['GGT','GGC','GGA','GGG'],
H=['CAT','CAC'],
I=['ATT','ATC','ATA'],
K=['AAA','AAG'],
L=['TTA','TTG','CTT','CTC','CTA','CTG'],
M=['ATG'],
N=['AAT','AAC'],
P=['CCT','CCC','CCA','CCG'],
Q=['CAA','CAG'],
R=['CGT','CGC','CGA','CGG','AGA','AGG'],
S=['TCT','TCC','TCA','TCG','AGT','AGC'],
T=['ACT','ACC','ACA','ACG'],
V=['GTT','GTC','GTA','GTG'],
W=['TGG'],
Y=['TAT','TAC'])
The reason to use defaultdict() is to allow access to or creation of dictionary values without raising a KeyError, that is, to avoid having to write:
if key not in mydict.keys():
    mydict[key] = []
mydict[key].append(something)
If you're not creating new keys dynamically, you don't really need defaultdict().
Also, if your keys already represent the amino acids, you can just iterate over the dictionary itself:
for aminoacid, sequence in condondict.iteritems():  # use .items() in Python 3
    # do stuff with the data...
    pass
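For comparison, a case where defaultdict(list) does pay off is when the dictionary is built up from individual (amino acid, codon) pairs. A minimal sketch; the pairs list is a made-up subset of the question's data:

```python
from collections import defaultdict

# Hypothetical input: one (amino acid, codon) pair per entry.
pairs = [('A', 'GCT'), ('A', 'GCC'), ('C', 'TGT'), ('C', 'TGC'), ('M', 'ATG')]

codondict = defaultdict(list)
for aminoacid, codon in pairs:
    # A missing key is created with an empty list automatically, then appended to.
    codondict[aminoacid].append(codon)

print(dict(codondict))
```

The loop in the question instead assigned with codondict[i] = j, which overwrites the value on every pass rather than appending to the list, and it iterated over the one-character string 'A' rather than over the list named A.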

Another way to do what you need is to use the locals() function, which returns a dictionary of all variables in the local scope, with the variable names as keys and their contents as values.
for i in aminoacids:
    codondict[i] = locals()[i]
So you could get the A list, for example, using locals()['A'].

That's kind of verbose, and is confusing the name of a variable 'A' with its value A. Keeping to what you've got:
aminoacids = { 'A': A, 'C': C, 'D': D ... }
should get you the dictionary you ask for:
{ 'A' : ['GCT', 'GCC', 'GCA', 'GCG'], 'C' : ['TGT', 'TGC'], ... }
where the order of the keys 'A' and 'C' may differ from the order you wrote them in, because dictionaries are not ordered in Python versions before 3.7.

You can use globals() built-in too, and dict comprehension:
codondict = {k:globals()[k] for k in aminoacids}
It's better to rely on locals() than on globals(), as in stummjr's solution, but you can't do so with a dict comprehension directly:
codondict = dict([(k,locals()[k]) for k in aminoacids])
However you can do this:
loc = locals()
codondict = {k:loc[k] for k in aminoacids}
If you change your aminoacids list or the amino acid assignments dynamically, it's better to use something lazier, like:
codondict = lambda: {k:globals()[k] for k in aminoacids}
With this last version you always get the up-to-date dictionary, but codondict is now a callable, so use codondict()[x] instead of codondict[x]; calling codondict() returns an actual dict. This way you can store a snapshot, like hist = codondict(), in case you need to compare different historical versions of codondict. It's small enough to be useful in interactive sessions, but not recommended in larger programs.
