How do I convert this iterative function to a recursive one? - python

This function maps each character of an input string through a dictionary and outputs the result. Any idea how this can be approached recursively?
def dna(seq):
    hashtable = {'A': 'U', 'G': 'C', 'T': 'A', 'C': 'G'}
    ans = ''
    for i in range(len(seq)):
        ans += hashtable[seq[i]]
    return ans

print(dna('AGCTGACGTA'))
Thanks.

You could do:
def dna(seq):
    if not seq:
        return ''
    return {'A': 'U', 'G': 'C', 'T': 'A', 'C': 'G'}[seq[0]] + dna(seq[1:])
However, this is almost certainly slower, uses more memory, and will hit Python's recursion limit on long sequences. The recommended approach for almost all use cases would be iterative; modify your code to use Python's built-in string join:
def dna(seq):
    hashtable = {'A': 'U', 'G': 'C', 'T': 'A', 'C': 'G'}
    ans = []
    for elem in seq:
        ans.append(hashtable[elem])
    return ''.join(ans)

You should understand that recursion is not always the answer.
There is a maximum recursion depth in Python, which you can change, but you will still have a limit. See: https://stackoverflow.com/a/3323013/2681662
To see the maximum recursion depth allowed:
import sys
print(sys.getrecursionlimit())
So the iterative approach is better in your case.
Still, let's see what the recursive version would look like.
For a recursive function you have to follow two simple rules:
Create an exit condition.
Call yourself (the function) again.
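def dna_r(seq):
    hashy = {'A': 'U', 'G': 'C', 'T': 'A', 'C': 'G'}
    if not seq:  # exit condition: an empty string has nothing to map
        return ''
    if len(seq) == 1:  # exit condition: a single base
        return hashy[seq]
    return dna_r(seq[0]) + dna_r(seq[1:])  # call yourself on the first base and on the rest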
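To make the recursion-limit caveat concrete, here is a minimal sketch that runs a recursive and a join-based version side by side. The names dna_recursive, dna_iterative and TABLE are made up for this illustration, and the long input is sized only to exceed whatever limit sys.getrecursionlimit() reports.

```python
import sys

TABLE = {'A': 'U', 'G': 'C', 'T': 'A', 'C': 'G'}

def dna_recursive(seq):
    # Exit condition: nothing left to map.
    if not seq:
        return ''
    # Map the first base, then recurse on the rest of the string.
    return TABLE[seq[0]] + dna_recursive(seq[1:])

def dna_iterative(seq):
    # One pass over the string, joined at the end.
    return ''.join(TABLE[base] for base in seq)

short = 'AGCTGACGTA'
print(dna_recursive(short) == dna_iterative(short))

# One stack frame per character: a long enough input exceeds the limit.
long_seq = 'A' * (sys.getrecursionlimit() + 100)
try:
    dna_recursive(long_seq)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

print(recursion_failed)
print(len(dna_iterative(long_seq)))  # the iterative version handles it
```

Raising the limit with sys.setrecursionlimit() only moves the point of failure; the iterative version has no such ceiling.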

Related

"TypeError: unhashable type: 'list'" yet I'm trying to only slice the value of the list, not use the list itself

I've been having issues trying to create a dictionary by using the values from a list.
alphabetList = list(string.ascii_lowercase)
alphabetList.append(list(string.ascii_lowercase))
alphabetDict = {}
def makeAlphabetDict (Dict, x):
    count = 0
    while count <= len(alphabetList):
        item1 = x[(count + (len(alphabetList) / 2))]
        item2 = item1
        Dict[item1] = item2
        count += 1
makeAlphabetDict(alphabetDict , alphabetList)
Which returns:
TypeError: unhashable type: 'list'
I tried here and other similar questions yet I still can't see why Python thinks I'm trying to use the list, rather than just a slice from a list.
Your list contains a nested list:
alphabetList.append(list(string.ascii_lowercase))
You now have a list with ['a', 'b', ..., 'z', ['a', 'b', ..., 'z']]. It is that last element in the outer list that causes your problem.
You'd normally use list.extend() to add additional elements:
alphabetList.extend(string.ascii_lowercase)
You are using string.ascii_lowercase twice there; perhaps you meant to use ascii_uppercase for one of those strings instead? Even so, your code always uses the same character for both key and value so it wouldn't really matter here.
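The difference between the two calls can be checked directly. This is a small sketch; the variable names nested and flat are only for illustration.

```python
import string

nested = list(string.ascii_lowercase)
nested.append(list(string.ascii_lowercase))  # adds ONE element: the whole list

flat = list(string.ascii_lowercase)
flat.extend(string.ascii_lowercase)          # adds 26 separate characters

print(len(nested))  # 27
print(len(flat))    # 52

# Using the nested list as a dictionary key reproduces the error.
try:
    {nested[-1]: 'x'}
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```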
If you are trying to map lowercase to uppercase or vice-versa, just use zip() and dict():
alphabetDict = dict(zip(string.ascii_lowercase, string.ascii_uppercase))
where zip() produces pairs of characters, and dict() takes those pairs as key-value pairs. The above produces a dictionary mapping lowercase ASCII characters to uppercase:
>>> import string
>>> dict(zip(string.ascii_lowercase, string.ascii_uppercase))
{'u': 'U', 'v': 'V', 'o': 'O', 'k': 'K', 'n': 'N', 'm': 'M', 't': 'T', 'l': 'L', 'h': 'H', 'e': 'E', 'p': 'P', 'i': 'I', 'b': 'B', 'x': 'X', 'q': 'Q', 'g': 'G', 'd': 'D', 'r': 'R', 'z': 'Z', 'c': 'C', 'w': 'W', 'a': 'A', 'y': 'Y', 'j': 'J', 'f': 'F', 's': 'S'}
As Martijn Pieters noted, the problem is that list.append() adds the second list as a single nested element of the first. You can combine the two lists in either of the following ways:
# Option 1: add the two lists; same effect as using alphabetList.extend()
alphabetList = list(string.ascii_lowercase)
alphabetList += list(string.ascii_lowercase)
# Option 2: repeat the list, which is enough if you only need to go over the alphabet twice
alphabetList = list(string.ascii_lowercase) * 2
In either case, your alphabetDict will have only 26 keys, not 52, because a dict cannot hold repeated keys.
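A quick check of the key count, as a sketch (the names doubled and mapping are illustrative):

```python
import string

doubled = list(string.ascii_lowercase) * 2

mapping = {}
for ch in doubled:
    mapping[ch] = ch  # the second pass overwrites the same 26 keys

print(len(doubled), len(mapping))  # 52 26
```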

encoding using a random cipher

I'm trying to write a program that takes a long string of letters and characters, and creates a dictionary of {original character:random character}. It should remove characters that have already been assigned a random value.
This is what I have:
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    all_chars=list(all_chars)
    encoder = {}
    for c in range (0,len(all_chars)):
        e = random.choice(all_chars)
        all_chars.remove(e)
        key = all_chars[c]
        encoder[key] = e
    return encoder
I keep getting index out of range: 33 on line 10 key = all_chars[c]
Here's my whole code, with the first problem fixed:
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    list_chars= list(all_chars)
    all_chars= list(all_chars)
    encoder = {}
    i=0
    while len(encoder) < len(all_chars):
        e = random.choice(all_chars)
        key = all_chars[i]
        if key not in encoder.keys():
            encoder[key] = e
            i += 1
    return encoder
def encode_message(encoder,msg):
    encoded_msg = ""
    for x in msg:
        c = encoder[x]
        encoded_msg = encoded_msg + c
def make_decoder(encoder):
    decoder = {}
    for k in encoder:
        v = encoder[k]
        decoder[v] = k
    return decoder
def decode_message(decoder,msg):
    decoded_msg = ""
    for x in msg:
        c = decoder[x]
        decoded_msg = decoded_msg + c
def main():
    alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,.!?"
    e = make_encoder(alphabet)
    d = make_decoder(e)
    print(e)
    print(d)
    phrase = input("enter a phrase")
    print(phrase)
    encoded = encode_message(e,phrase)
    print(encoded)
    decoded = decode_message(d,encoded)
    print(decoded)
I now get TypeError: iteration over non-sequence of type NoneType for the line for x in msg:
You are altering the list while looping over its indices. As a rule, never alter a list while iterating over it.
The line for c in range (0,len(all_chars)): iterates up to the original length of the list, but at the same time you are removing elements, so the list shrinks. That is why you get the index out of range error.
Try it like this:
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    all_char = list(all_chars)
    encoder = {}
    i=0
    while len(encoder) < len(all_char):
        e = random.choice(all_char)
        key = all_char[i]
        if key not in encoder.keys():
            encoder[key] = e
            i += 1
    return encoder
output:
>>> make_encoder(all_chars)
{'!': '3', ',': 'l', '.': 'J', '1': 'y', '0': 'l', '3': 'G', '2': ',', '5': '6', '4': 'f', '7': 'f', '6': 'C', '9': 'F', '8': 'y', '?': 'S', 'A': 'm', 'C': 'z', 'B': 'b', 'E': 'J', 'D': '0', 'G': 'S', 'F': 'v', 'I': 'v', 'H': '?', 'K': 'd', 'J': 'X', 'M': 'o', 'L': 'O', 'O': 'Q', 'N': 'P', 'Q': 'Z', 'P': '8', 'S': 'r', 'R': 'h', 'U': 'o', 'T': 'M', 'W': 'l', 'V': '.', 'Y': 'R', 'X': 'C', 'Z': 'a', 'a': 's', 'c': 'Y', 'b': 'X', 'e': 's', 'd': 'd', 'g': 'L', 'f': 'G', 'i': 'm', 'h': 'k', 'k': 'f', 'j': '1', 'm': 'J', 'l': 'L', 'o': '2', 'n': 'N', 'q': 'n', 'p': 'l', 's': 'W', 'r': '7', 'u': 'y', 't': 'S', 'w': 'J', 'v': 'E', 'y': 'r', 'x': 'C', 'z': 'i'}
You're modifying the list as you iterate over it:
for c in range(0,len(all_chars)):
    e = random.choice(all_chars)
    all_chars.remove(e)
The range item range(0,len(all_chars)) is only generated when the for loop starts. That means it will always assume its length is what it started as.
After you remove a character, all_chars.remove(e), now the list is one item shorter than when the for loop started, leading to the eventual over-run.
How about this instead:
while all_chars:  # While there are chars left in the list
    ...
You should never modify an iterable while you are iterating over it.
Think about it: you told Python to loop from 0 to the length of the list all_chars, which is 66 in the beginning. But you are constantly shrinking this length with all_chars.remove(e). So, the loop still loops 66 times, but all_chars only has 66 items for the first iteration. Afterwards, it has 65, then 64, then 63, etc.
Eventually, you will run into an IndexError when c equals the length of the list (which happens at c==33). Note that it is not when c is greater than the length because Python indexes start at 0:
>>> [1, 2, 3][3] # There is no index 3 because 0 is the first index
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> [1, 2, 3][2] # 2 is the greatest index
3
>>>
To fix the problem, you can either:
Stop removing elements from all_chars inside the loop. That way, its length will always be 66.
Use a while True: loop and break when all_chars is empty (you run out of characters).
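The second option can be sketched as follows. The function name make_encoder_pool and the separate pool list are assumptions made for this illustration: keys are read from the untouched original string, and only the pool of still-available values shrinks.

```python
import random

def make_encoder_pool(all_chars):
    pool = list(all_chars)   # values that have not been handed out yet
    keys = iter(all_chars)   # keys come from the original, which is never modified
    encoder = {}
    while True:
        if not pool:         # ran out of characters: stop
            break
        value = random.choice(pool)
        pool.remove(value)   # safe: we are not looping over pool by index
        encoder[next(keys)] = value
    return encoder

all_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
encoder = make_encoder_pool(all_chars)
print(len(encoder))  # 66
```

Because every value is removed from the pool once it is used, each character appears exactly once as a value, so the mapping can be inverted later.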
I would recommend making two lists, or at least keeping the keys and the pool of available values separate.
import random
all_chars='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
def make_encoder(all_chars):
    list_chars= list(all_chars)
    all_chars= list(all_chars)  # <--- EDIT
    encoder = {}
    for c in all_chars:
        e = random.choice(list_chars)
        list_chars.remove(e)
        key = c  # <--- EDIT
        encoder[key] = e
    return encoder  # <--- EDIT, unindented this line.
That is your issue: you were removing items from the list you were iterating through. Making two lists, although a little messy, is a simple way around it.
You don't have to remove anything from the initial string (it's bad practice to change an object while iterating over it).
Just check if the item isn't already in the dictionary.
import random
all_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.!?'
encoder = {}
n = 0
while len(all_chars) != len(encoder):
    rand = random.choice(all_chars)
    if rand not in encoder.values():  # skip characters already used as a value
        encoder[all_chars[n]] = rand
        n += 1
for k, v in sorted(encoder.items()):
    print(k, v)
By the way, your encoder may work fine this way, but because the mapping is random you cannot decode a message later unless you keep the encoder (or its inverse) around. One way to make the mapping reproducible is to call random.seed('KEY') before building it.
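Putting the advice from this thread together, here is a minimal sketch of a full round trip. It swaps in random.sample to draw a permutation in one call instead of the remove-as-you-go loops shown above, and it uses a private random.Random instance for the seed. The key parameter and the translate helper are hypothetical names introduced for this example.

```python
import random

ALPHABET = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ,.!?'

def make_encoder(chars, key=None):
    rng = random.Random(key)                       # same key -> same mapping
    shuffled = rng.sample(list(chars), len(chars)) # a permutation: no repeated values
    return dict(zip(chars, shuffled))

def make_decoder(encoder):
    # Invert the mapping; valid because every value is unique.
    return {v: k for k, v in encoder.items()}

def translate(mapping, msg):
    # Returns the result explicitly; a function without return yields None.
    return ''.join(mapping[ch] for ch in msg)

encoder = make_encoder(ALPHABET, key='KEY')
decoder = make_decoder(encoder)
encoded = translate(encoder, 'Hello, world!')
decoded = translate(decoder, encoded)
print(decoded)  # Hello, world!
```

The explicit return in translate also addresses the NoneType error from the question: encode_message and decode_message there build a string but never return it.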

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells with something other than A, T, G and C. I was able to get reverse complement with this piece of code:
def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)
def reverse_complement(s):
    return complement(s[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCC"))
However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.
def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    for element in bases:
        if element not in complement:
            print element
        letters = [complement[base] for base in element]
    return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCCCX"))
The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest using Biopython. What if you encounter a character like "-", "*" or an ambiguity code? What if you want to do further manipulations of your sequences? Do you want to create a parser for each file format out there?
The code you ask for is as easy as:
from Bio.Seq import Seq
seq = Seq("TCGGGCCC")
print seq.reverse_complement()
# GGGCCCGA
Now, if you want to do other transformations:
print seq.complement()
print seq.transcribe()
print seq.translate()
Outputs
AGCCCGGG
UCGGGCCC
SG
And if you run into strange chars, no need to keep adding code to your program. Biopython deals with it:
seq = Seq("TCGGGCCCX")
print seq.reverse_complement()
# XGGGCCCGA
In general, a generator expression is simpler than the original code and avoids creating extra list objects. If there can be multiple-character insertions go with the other answers.
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
import string
old_chars = "ACGT"
replace_chars = "TGCA"
tab = string.maketrans(old_chars,replace_chars)
print "AAAACCCGGT".translate(tab)[::-1]
That will give you the reverse complement: ACCGGGTTTT
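The snippet above is Python 2. In Python 3 the equivalent table is built with str.maketrans; a minimal sketch (the function name reverse_complement is chosen here only for illustration):

```python
# Translation table mapping each base to its complement.
table = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    # Characters that are not in the table are left unchanged by translate().
    return seq.translate(table)[::-1]

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```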
The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preconditioning step, I would map all your non-'ATGC' bases to single letters (or punctuation, or numbers, or anything that won't show up in your sequence), then reverse the sequence, then replace the single-letter alternates with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.
alt_map = {'ins':'0'}
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
def reverse_complement(seq):
    for k,v in alt_map.iteritems():
        seq = seq.replace(k,v)
    bases = list(seq)
    bases = reversed([complement.get(base,base) for base in bases])
    bases = ''.join(bases)
    for k,v in alt_map.iteritems():
        bases = bases.replace(v,k)
    return bases
>>> seq = "TCGGinsGCCC"
>>> print "Reverse Complement:"
>>> print(reverse_complement(seq))
GGGCinsCCGA
A short, fast one-liner for the reverse complement is the following:
def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
# this reverses the sequence
def ReverseComplement(Pattern):
    revcomp = []
    x = len(Pattern)
    for i in Pattern:
        x = x - 1
        revcomp.append(Pattern[x])
    return ''.join(revcomp)
# this is for the complement
def compliment(Nucleotide):
    comp = []
    for i in Nucleotide:
        if i == "T":
            comp.append("A")
        if i == "A":
            comp.append("T")
        if i == "G":
            comp.append("C")
        if i == "C":
            comp.append("G")
    return ''.join(comp)
Try the code below:
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
Considering also degenerate bases:
def rev_compl(seq):
    BASES ='NRWSMBDACGTHVKSWY'
    return ''.join([BASES[-j] for j in [BASES.find(i) for i in seq][::-1]])
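The index trick above depends on the layout of the BASES string: the code found at index j is paired with the code at index -j. A more explicit sketch of the same idea spells out the IUPAC pairs in a translation table. The names IUPAC_TABLE and rev_compl_iupac are made up for this example, and input is assumed to be upper case.

```python
# Each code in the first string maps to the code at the same position in the second:
# A<->T, C<->G, R<->Y, M<->K, B<->V, D<->H; S, W and N are their own complements.
IUPAC_TABLE = str.maketrans("ACGTRYMKBVDHSWN", "TGCAYRKMVBHDSWN")

def rev_compl_iupac(seq):
    # Complement every code, then reverse; unknown characters pass through unchanged.
    return seq.translate(IUPAC_TABLE)[::-1]

print(rev_compl_iupac("ACGTRN"))  # NYACGT
```

Unlike the index version, a character that is not in the table is passed through as-is rather than being mapped to an unrelated code.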
This may be the quickest way to compute a reverse complement:
def complement(seq):
    complementary = { 'A':'T', 'T':'A', 'G':'C','C':'G' }
    return ''.join(reversed([complementary[i] for i in seq]))
Using the timeit module for speed profiling, this is the fastest algorithm I came up with with my coworkers for sequences < 200 nucs:
(sequence
    .replace('A', '*')  # Temporary symbol
    .replace('T', 'A')
    .replace('*', 'T')
    .replace('C', '&')  # Temporary symbol
    .replace('G', 'C')
    .replace('&', 'G'))[::-1]
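Speed claims like the ones in this thread are easy to check on your own machine. The following is a rough sketch of such a comparison with timeit; the function names rc_dict, rc_translate and rc_replace, the test sequence and the repetition count are all arbitrary choices for this example, and the timings will vary by interpreter and sequence length.

```python
import timeit

COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
TABLE = str.maketrans('ACGT', 'TGCA')

def rc_dict(seq):
    # Dictionary lookup per base, walking the sequence backwards.
    return ''.join(COMPLEMENT[base] for base in reversed(seq))

def rc_translate(seq):
    # Translation table, then reverse with a slice.
    return seq.translate(TABLE)[::-1]

def rc_replace(seq):
    # Chained replace with temporary symbols, as in the answer above.
    return (seq.replace('A', '*').replace('T', 'A').replace('*', 'T')
               .replace('C', '&').replace('G', 'C').replace('&', 'G'))[::-1]

sequence = 'TCGGGCCCAT' * 15  # 150 bases, under the 200 mentioned above

# All three must agree before their timings mean anything.
assert rc_dict(sequence) == rc_translate(sequence) == rc_replace(sequence)

for fn in (rc_dict, rc_translate, rc_replace):
    seconds = timeit.timeit(lambda: fn(sequence), number=1000)
    print(fn.__name__, seconds)
```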

Making a dictionary from a list of lists

I have been unable to figure this out; I think the problem might be in the way I am making the list of lists. Can anyone help out? Thanks!
My desired outcome is
codondict = {'A': ['GCT','GCC','GCA','GCG'], 'C': ['TGT','TGC'], &c
but what I get is:
{'A': 'A', 'C': 'C', &c.
Here's my terminal:
A=['GCT','GCC','GCA','GCG']
C=['TGT','TGC']
D=['GAT','GAC']
E=['GAA','GAG']
F=['TTT','TTC']
G=['GGT','GGC','GGA','GGG']
H=['CAT','CAC']
I=['ATT','ATC','ATA']
K=['AAA','AAG']
L=['TTA','TTG','CTT','CTC','CTA','CTG']
M=['ATG']
N=['AAT','AAC']
P=['CCT','CCC','CCA','CCG']
Q=['CAA','CAG']
R=['CGT','CGC','CGA','CGG','AGA','AGG']
S=['TCT','TCC','TCA','TCG','AGT','AGC']
T=['ACT','ACC','ACA','ACG']
V=['GTT','GTC','GTA','GTG']
W=['TGG']
Y=['TAT','TAC']
aminoacids=['A','C','D','E','F','G','H','I','K','L','M','N','P','Q','R','S','T','V','W','Y']
from collections import defaultdict
codondict=defaultdict(list)
for i in aminoacids:
...     for j in i:  # (ALSO TRIED for j in list(i))
...         codondict[i]=j
...
codondict
defaultdict(<type 'list'>, {'A': 'A', 'C': 'C', 'E': 'E', 'D': 'D', 'G': 'G', 'F': 'F', 'I': 'I', 'H': 'H', 'K': 'K', 'M': 'M', 'L': 'L', 'N': 'N', 'Q': 'Q', 'P': 'P', 'S': 'S', 'R': 'R', 'T': 'T', 'W': 'W', 'V': 'V', 'Y': 'Y'})
You can try this:
condondict= dict(A=['GCT','GCC','GCA','GCG'],
C=['TGT','TGC'],
D=['GAT','GAC'],
E=['GAA','GAG'],
F=['TTT','TTC'],
G=['GGT','GGC','GGA','GGG'],
H=['CAT','CAC'],
I=['ATT','ATC','ATA'],
K=['AAA','AAG'],
L=['TTA','TTG','CTT','CTC','CTA','CTG'],
M=['ATG'],
N=['AAT','AAC'],
P=['CCT','CCC','CCA','CCG'],
Q=['CAA','CAG'],
R=['CGT','CGC','CGA','CGG','AGA','AGG'],
S=['TCT','TCC','TCA','TCG','AGT','AGC'],
T=['ACT','ACC','ACA','ACG'],
V=['GTT','GTC','GTA','GTG'],
W=['TGG'],
Y=['TAT','TAC'])
The reason to use defaultdict() is to allow access to, or creation of, dictionary values without raising a KeyError, so that you can skip the form:
if key not in mydict.keys():
    mydict[key] = []
mydict[key].append(something)
If you're not creating new keys dynamically, you don't really need to use defaultdict().
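For contrast, here is a minimal sketch of the case where defaultdict(list) does pay off: the keys are discovered while looping. The pairs list is a hypothetical input built from a few of the codons in the question.

```python
from collections import defaultdict

# Hypothetical input: (codon, amino acid) pairs in arbitrary order.
pairs = [('GCT', 'A'), ('TGT', 'C'), ('GCC', 'A'), ('TGC', 'C'), ('ATG', 'M')]

codondict = defaultdict(list)
for codon, aminoacid in pairs:
    # A missing key is created with an empty list on first access.
    codondict[aminoacid].append(codon)

print(dict(codondict))
# {'A': ['GCT', 'GCC'], 'C': ['TGT', 'TGC'], 'M': ['ATG']}
```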
Also, if your keys already represent the amino acids, you can just iterate over the dictionary's items.
for aminoacid, sequence in condondict.iteritems():
    # do stuff with the data...
Another way to do what you need is using the locals() function, which returns a dictionary containing the whole set of variables of the local scope, with the variable names as the keys and its contents as values.
for i in aminoacids:
    codondict[i] = locals()[i]
So, you could get the A list, for example, using: locals()['A'].
That's kind of verbose, and is confusing the name of a variable 'A' with its value A. Keeping to what you've got:
aminoacids = { 'A': A, 'C': C, 'D': D ... }
should get you the dictionary you ask for:
{ 'A' : ['GCT', 'GCC', 'GCA', 'GCG'], 'C' : ['TGT', 'TGC'], ... }
where the order of keys 'A' and 'C' may not be what you get back because dictionaries are not ordered.
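The name-versus-value confusion is what produces {'A': 'A', ...} in the question: the aminoacids list holds the strings 'A', 'C', ..., so the inner loop walks over the characters of each one-letter string, not over the list bound to the variable A. A short sketch with two of the lists (the names wrong and right are illustrative):

```python
A = ['GCT', 'GCC', 'GCA', 'GCG']
C = ['TGT', 'TGC']
aminoacids = ['A', 'C']      # strings, not the lists above

wrong = {}
for name in aminoacids:
    for j in name:           # iterates over the characters of 'A', i.e. just 'A'
        wrong[name] = j
print(wrong)                 # {'A': 'A', 'C': 'C'}

right = {'A': A, 'C': C}     # refer to the list objects explicitly
print(right['C'])            # ['TGT', 'TGC']
```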
You can use globals() built-in too, and dict comprehension:
codondict = {k:globals()[k] for k in aminoacids}
it's better to rely on locals() instead of globals(), like stummjr's solution, but you can't do so with dict comprehension directly
codondict = dict([(k,locals()[k]) for k in aminoacids])
However you can do this:
loc = locals()
codondict = {k:loc[k] for k in aminoacids}
If you change your aminoacids list or the amino acid assignments dynamically, it's better to use something lazier, like:
codondict = lambda: {k:globals()[k] for k in aminoacids}
With this last version you always get the up-to-date dictionary, but codondict is now a callable, so use codondict()[x] instead of codondict[x]. You can also store a snapshot, e.g. hist = codondict(), in case you need to compare different historical versions of the dictionary. That's compact enough to be useful in interactive sessions, but not recommended in larger programs.

How to use list comprehension to add an element to copies of a dictionary?

Given:
template = {'a': 'b', 'c': 'd'}
add = ['e', 'f']
k = 'z'
I want to use list comprehension to generate
[{'a': 'b', 'c': 'd', 'z': 'e'},
{'a': 'b', 'c': 'd', 'z': 'f'}]
I know I can do this:
out = []
for v in add:
    t = template.copy()
    t[k] = v
    out.append(t)
but it is a little verbose and has no advantage over what I'm trying to replace.
This slightly more general question on merging dictionaries is somewhat related but more or less says don't.
[dict(template,z=value) for value in add]
or (to use k):
[dict(template,**{k:value}) for value in add]
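A runnable sketch of the second form, with the data from the question. It also shows dictionary unpacking (Python 3.5+) as an alternative that is not part of the answer above; unlike the dict(template, **{...}) form, it does not require the key to be a string. The name out_unpack is illustrative.

```python
template = {'a': 'b', 'c': 'd'}
add = ['e', 'f']
k = 'z'

# dict(mapping, **kwargs) copies template and adds one key per copy.
out = [dict(template, **{k: value}) for value in add]
print(out)
# [{'a': 'b', 'c': 'd', 'z': 'e'}, {'a': 'b', 'c': 'd', 'z': 'f'}]

# Alternative (an assumption beyond the answer): unpacking into a new dict literal.
out_unpack = [{**template, k: value} for value in add]
print(out_unpack == out)  # True

# The template itself is left untouched by both forms.
print(template)           # {'a': 'b', 'c': 'd'}
```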
