I need to generate all possible letter combinations for a string containing digits, where each digit is replaced by a visually similar letter.
Using the dictionary:
number_appearance = {
    '1': ['l', 'i'],
    '2': ['r', 'z'],
    '3': ['e', 'b'],
    '4': ['a'],
    '5': ['s'],
    '6': ['b', 'g'],
    '7': ['t'],
    '8': ['b'],
    '9': ['g', 'p'],
    '0': ['o', 'q'],
}
I want to write a function that takes an input and creates all possible letter combinations. For example:
import re
text = 'l4t32'
def convert_numbers(text):
    return re.sub('[0-9]', lambda x: number_appearance[x[0]][0], text)
I want the output to be a list with all possible combinations:
['later', 'latbr', 'latbz', 'latez']
The function above works if you are just grabbing the first letter in each list from number_appearance, but I'm trying to figure out the best way to iterate through all possible combinations. Any help would be much appreciated!
As an upgrade from your own answer, I suggest the following:
import itertools

def convert_numbers(text):
    all_items = [number_appearance.get(char, [char]) for char in text]
    return [''.join(elem) for elem in itertools.product(*all_items)]
The improvements are that:
it doesn't convert text to a list (there is no need for that)
you don't need regex
it will still work if you decide instead that you also want to add other characters on top of numbers
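To see what itertools.product contributes here, a minimal self-contained sketch may help. The dictionary is trimmed to the digits in the example input, and `candidates` is a name chosen only for illustration:

```python
import itertools

# Trimmed copy of the question's mapping; only the digits used below.
number_appearance = {'4': ['a'], '3': ['e', 'b'], '2': ['r', 'z']}

def convert_numbers(text):
    # One list of candidates per character; non-digits map to themselves.
    candidates = [number_appearance.get(char, [char]) for char in text]
    # product(*candidates) picks one candidate per position, in every combination.
    return [''.join(combo) for combo in itertools.product(*candidates)]

result = convert_numbers('l4t32')
print(result)  # ['later', 'latez', 'latbr', 'latbz']
```

The rightmost position varies fastest, which is why 'latez' comes before 'latbr'.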
import itertools
import re

def convert_num_appearance(text):
    string_characters = [character for character in text]
    all_items = []
    for item in string_characters:
        if re.search('[a-zA-Z]', item):
            all_items.append([item])
        elif re.search(r'\d', item):
            all_items.append(number_appearance[item])
        # Note: any other character (space, punctuation) is silently dropped.
    return [''.join(elem) for elem in itertools.product(*all_items)]
I would break down the problem like so:
First, create a function that can do the replacement for a given set of replacement letters. My input specification is a sequence of letters, where the first letter is the replacement for the '0' character, next for 1 etc. This allows me to use the index in that sequence to determine the character being replaced, while generating a plain sequence rather than a dict or other complex structure. To do the replacement, I will use the built-in translate method of the original string. That requires a dictionary as described in the documentation, which I can easily build with a dict comprehension, or with the provided helper method str.maketrans (a static method of the str type).
Use itertools.product to generate those sequences.
Use a list comprehension to apply the replacement for each sequence.
Thus:
from itertools import product

def replace_digits(original, replacement):
    # translation = {ord(str(i)): c for i, c in enumerate(replacement)}
    translation = str.maketrans('0123456789', ''.join(replacement))
    print(translation)
    return original.translate(translation)

replacements = product(
    ['o', 'q'], ['l', 'i'], ['r', 'z'], ['e', 'b'], ['a'],
    ['s'], ['b', 'g'], ['t'], ['b'], ['g', 'p']
)

[replace_digits('14732', r) for r in replacements]
(You will notice there are duplicates in the result; this is because of variant replacements for symbols that don't appear in the input.)
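One way to avoid those duplicates is to build the product only over the digits that actually occur in the input. The following is a minimal sketch, assuming the number_appearance dictionary from the question; `replace_present_digits` is a name invented here. Like any translate-based approach, it gives every occurrence of a digit the same letter within one result.

```python
from itertools import product

number_appearance = {
    '1': ['l', 'i'], '2': ['r', 'z'], '3': ['e', 'b'], '4': ['a'], '5': ['s'],
    '6': ['b', 'g'], '7': ['t'], '8': ['b'], '9': ['g', 'p'], '0': ['o', 'q'],
}

def replace_present_digits(original):
    # Distinct digits that actually occur in the input, in sorted order.
    digits = sorted(set(original) & set(number_appearance))
    results = []
    for choice in product(*(number_appearance[d] for d in digits)):
        # Map each present digit to the letter chosen for it in this round.
        table = str.maketrans(dict(zip(digits, choice)))
        results.append(original.translate(table))
    return results

variants = replace_present_digits('14732')
print(len(variants))  # 8
```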
I've been having issues trying to create a dictionary by using the values from a list.
import string

alphabetList = list(string.ascii_lowercase)
alphabetList.append(list(string.ascii_lowercase))
alphabetDict = {}

def makeAlphabetDict (Dict, x):
    count = 0
    while count <= len(alphabetList):
        item1 = x[(count + (len(alphabetList) / 2))]
        item2 = item1
        Dict[item1] = item2
        count += 1

makeAlphabetDict(alphabetDict , alphabetList)
Which returns:
TypeError: unhashable type: 'list'
I tried here and other similar questions yet I still can't see why Python thinks I'm trying to use the list, rather than just a slice from a list.
Your list contains a nested list:
alphabetList.append(list(string.ascii_lowercase))
You now have a list with ['a', 'b', ..., 'z', ['a', 'b', ..., 'z']]. It is that last element in the outer list that causes your problem.
You would normally use list.extend() to add additional elements:
alphabetList.extend(string.ascii_lowercase)
You are using string.ascii_lowercase twice there; perhaps you meant to use ascii_uppercase for one of those strings instead? Even so, your code always uses the same character for both key and value so it wouldn't really matter here.
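The difference between append() and extend(), and why the nested list triggers the error, can be seen in a small sketch. The two-letter lists are shortened stand-ins for the alphabet, and the variable names are illustrative only:

```python
appended = list('ab')
appended.append(list('ab'))   # adds ONE element: the whole list
extended = list('ab')
extended.extend('ab')         # adds each character separately

print(appended)  # ['a', 'b', ['a', 'b']]
print(extended)  # ['a', 'b', 'a', 'b']

# Lists are mutable and therefore unhashable, so they cannot be dict keys.
try:
    {appended[-1]: 1}
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```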
If you are trying to map lowercase to uppercase or vice-versa, just use zip() and dict():
alphabetDict = dict(zip(string.ascii_lowercase, string.ascii_uppercase))
where zip() produces pairs of characters, and dict() takes those pairs as key-value pairs. The above produces a dictionary mapping lowercase ASCII characters to uppercase:
>>> import string
>>> dict(zip(string.ascii_lowercase, string.ascii_uppercase))
{'u': 'U', 'v': 'V', 'o': 'O', 'k': 'K', 'n': 'N', 'm': 'M', 't': 'T', 'l': 'L', 'h': 'H', 'e': 'E', 'p': 'P', 'i': 'I', 'b': 'B', 'x': 'X', 'q': 'Q', 'g': 'G', 'd': 'D', 'r': 'R', 'z': 'Z', 'c': 'C', 'w': 'W', 'a': 'A', 'y': 'Y', 'j': 'J', 'f': 'F', 's': 'S'}
As Martijn Pieters noted, the problem is that list.append() adds the second list as a single nested element of the first. To concatenate two lists, you can use either of the following:
alphabetList = list(string.ascii_lowercase)
alphabetList += list(string.ascii_lowercase)
# Concatenates the two lists; same as alphabetList.extend(string.ascii_lowercase)
alphabetList = list(string.ascii_lowercase) * 2
# Repeats the list; enough for your use case of iterating over the alphabet twice
In either case, alphabetDict will have only 26 keys, not 52, because a dict cannot hold repeated keys.
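A quick way to confirm this is to build a dict from the doubled list and compare the lengths. This is only a sketch; the uppercase values are an arbitrary choice for illustration:

```python
import string

doubled = list(string.ascii_lowercase) * 2
# Each letter appears twice as a key; the second assignment overwrites the first.
mapping = {letter: letter.upper() for letter in doubled}

print(len(doubled))  # 52
print(len(mapping))  # 26
```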
I have a dictionary that I want to compare against my string: for each key in the dictionary that matches a character in the string, I want to convert that character to the dictionary's value.
I want to compare my dictionary to my string character by character and, when they match, replace the string's character with the value of the dictionary's match. For example, if A is in the string, it will match A in the dictionary and be replaced with T, which is written to line2_u_rev_comp. However, the error KeyError: '\n' occurs instead. What does this mean, and how can I get rid of it?
REV_COMP = {
    'A': 'T',
    'T': 'A',
    'C': 'G',
    'G': 'C',
    'N': 'N',
    'U': 'A'
}
tbl = REV_COMP
line2_u_rev_comp = [tbl[k] for k in line2_u_rev[::-1]]
''.join(line2_u_rev_comp)
'\n' is the newline character. The KeyError means it appears in your string but is not a key in the dictionary. You can get rid of it (and other extraneous whitespace) using str.strip, e.g.:
line2_u_rev_comp = [tbl[k] for k in line2_u_rev.strip()[::-1]]
line2_u_rev_comp = [tbl.get(k,k) ... ]
This will either get the value from the dictionary or, if the key is missing, return the character itself.
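Spelled out in full, the dict.get() approach might look like the sketch below. The function name `reverse_complement` and the sample input line are assumptions made for illustration; only REV_COMP comes from the question.

```python
REV_COMP = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N', 'U': 'A'}

def reverse_complement(sequence, table=REV_COMP):
    # Walk the sequence backwards; unknown characters (such as '\n') pass through.
    return ''.join(table.get(base, base) for base in reversed(sequence))

line = 'ATGCN\n'   # hypothetical input line with a trailing newline
result = reverse_complement(line.strip())
print(result)  # NGCAT
```

Without the strip() call, the newline would be kept and would end up at the front of the reversed result.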
The problem is that tbl[k] is evaluated without checking whether the key exists in the dict; if it does not, you need to return k itself.
You also need to reverse the list again, since your for statement iterates over the string in reverse.
Try this code:
line2_u_rev = "MY TEST IS THIS"
REV_COMP = {
    'A': 'T',
    'T': 'A',
    'C': 'G',
    'G': 'C',
    'N': 'N',
    'U': 'A'
}
tbl = REV_COMP
line2_u_rev_comp = [tbl[k] if k in tbl else k for k in line2_u_rev[::-1]][::-1]
print(''.join(line2_u_rev_comp))
Output:
MY AESA IS AHIS
I have been unable to figure this out. I think the problem might be in the way I am making the list of lists. Can anyone help out? Thanks!
My desired outcome is
codondict = {'A': ['GCT','GCC','GCA','GCG'], 'C': ['TGT','TGC'], &c
but what I get is:
{'A': 'A', 'C': 'C', &c.
Here's my terminal:
A=['GCT','GCC','GCA','GCG']
C=['TGT','TGC']
D=['GAT','GAC']
E=['GAA','GAG']
F=['TTT','TTC']
G=['GGT','GGC','GGA','GGG']
H=['CAT','CAC']
I=['ATT','ATC','ATA']
K=['AAA','AAG']
L=['TTA','TTG','CTT','CTC','CTA','CTG']
M=['ATG']
N=['AAT','AAC']
P=['CCT','CCC','CCA','CCG']
Q=['CAA','CAG']
R=['CGT','CGC','CGA','CGG','AGA','AGG']
S=['TCT','TCC','TCA','TCG','AGT','AGC']
T=['ACT','ACC','ACA','ACG']
V=['GTT','GTC','GTA','GTG']
W=['TGG']
Y=['TAT','TAC']
aminoacids=['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
from collections import defaultdict
codondict=defaultdict(list)
for i in aminoacids:
...     for j in i:  # also tried: for j in list(i)
...         codondict[i] = j
...
codondict
defaultdict(<type 'list'>, {'A': 'A', 'C': 'C', 'E': 'E', 'D': 'D', 'G': 'G', 'F': 'F', 'I': 'I', 'H': 'H', 'K': 'K', 'M': 'M', 'L': 'L', 'N': 'N', 'Q': 'Q', 'P': 'P', 'S': 'S', 'R': 'R', 'T': 'T', 'W': 'W', 'V': 'V', 'Y': 'Y'})
You can try this:
condondict = dict(A=['GCT','GCC','GCA','GCG'],
                  C=['TGT','TGC'],
                  D=['GAT','GAC'],
                  E=['GAA','GAG'],
                  F=['TTT','TTC'],
                  G=['GGT','GGC','GGA','GGG'],
                  H=['CAT','CAC'],
                  I=['ATT','ATC','ATA'],
                  K=['AAA','AAG'],
                  L=['TTA','TTG','CTT','CTC','CTA','CTG'],
                  M=['ATG'],
                  N=['AAT','AAC'],
                  P=['CCT','CCC','CCA','CCG'],
                  Q=['CAA','CAG'],
                  R=['CGT','CGC','CGA','CGG','AGA','AGG'],
                  S=['TCT','TCC','TCA','TCG','AGT','AGC'],
                  T=['ACT','ACC','ACA','ACG'],
                  V=['GTT','GTC','GTA','GTG'],
                  W=['TGG'],
                  Y=['TAT','TAC'])
The reason to use defaultdict() is to allow access to, and creation of, dictionary values without causing a KeyError, which lets you bypass this form:
if key not in mydict:
    mydict[key] = []
mydict[key].append(something)
If you're not creating new keys dynamically, you don't really need to use defaultdict().
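For comparison, this is the kind of situation where defaultdict(list) does pay off: keys are discovered one at a time while the data is being read. A minimal sketch, with made-up (amino acid, codon) pairs and an illustrative variable name:

```python
from collections import defaultdict

# Hypothetical (amino acid, codon) pairs arriving one at a time.
pairs = [('A', 'GCT'), ('C', 'TGT'), ('A', 'GCC'), ('C', 'TGC')]

codons_by_acid = defaultdict(list)
for acid, codon in pairs:
    # A missing key is created with an empty list on first access.
    codons_by_acid[acid].append(codon)

print(dict(codons_by_acid))
```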
Also, if your keys already represent the amino acids, you can just iterate over the keys themselves.
for aminoacid, sequence in condondict.items():
    # do stuff with the data...
Another way to do what you need is using the locals() function, which returns a dictionary containing the whole set of variables of the local scope, with the variable names as the keys and its contents as values.
for i in aminoacids:
    codondict[i] = locals()[i]
So, you could get the A list, for example, using: locals()['A'].
That's kind of verbose, and is confusing the name of a variable 'A' with its value A. Keeping to what you've got:
aminoacids = { 'A': A, 'C': C, 'D': D ... }
should get you the dictionary you ask for:
{ 'A' : ['GCT', 'GCC', 'GCA', 'GCG'], 'C' : ['TGT', 'TGC'], ... }
where the order of the keys 'A' and 'C' may not be what you get back, because dictionaries are not ordered (before Python 3.7).
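The name-versus-value confusion is also what produced {'A': 'A', ...} in the question: looping over the string 'A' yields the character 'A', not the list bound to the variable A. A small sketch with two of the lists (the dict name `looped` is illustrative):

```python
A = ['GCT', 'GCC', 'GCA', 'GCG']
C = ['TGT', 'TGC']

# Iterating over the string 'A' yields the character 'A', not the list A.
looped = {}
for name in ['A', 'C']:
    for j in name:
        looped[name] = j

# Pairing each name with its list explicitly gives the intended mapping.
codondict = {'A': A, 'C': C}

print(looped)     # {'A': 'A', 'C': 'C'}
print(codondict)
```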
You can use the globals() built-in too, with a dict comprehension:
codondict = {k:globals()[k] for k in aminoacids}
It's better to rely on locals() instead of globals(), as in stummjr's solution, but you can't do so directly inside a dict comprehension, which has its own local scope. In Python 2, a list comprehension passed to dict() works:
codondict = dict([(k,locals()[k]) for k in aminoacids])
However you can do this:
loc = locals()
codondict = {k:loc[k] for k in aminoacids}
If you dynamically change your aminoacids list or the amino acid assignments, it's better to use something lazier, like:
codondict = lambda: {k:globals()[k] for k in aminoacids}
With this last form you always get the up-to-date dictionary, but codondict is now a callable: calling codondict() returns an actual dict, so use codondict()[x] instead of codondict[x]. This way you can store a snapshot, such as hist = codondict(), in case you need to compare different historical versions of codondict. That's small enough to be useful in interactive sessions, but it isn't recommended in larger programs.