I wrote a simple program to translate DNA to RNA. You input a string; the program splits it into characters stored in a list, swaps each letter, and returns a string built from the resulting list. It correctly translates a to u and t to a, but does not change g to c or c to g.
This is the program:
def trad(x):
    h=[]
    for letter in x:
        h.append(letter)
    for letter in h:
        if letter=="a":
            h[h.index(letter)]="u"
            continue
        if letter=="t":
            h[h.index(letter)]="a"
            continue
        if letter=="g":
            h[h.index(letter)]="c"
            continue
        if letter=="c":
            h[h.index(letter)]="g"
            continue
    ret=""
    for letter in h:
        ret+=letter
    return ret
while True:
    stry=raw_input("String?")
    print trad(stry)
If I change the program to iterate over positions instead of elements, it works as expected. This is the resulting code:
def trad(x):
    h=[]
    for letter in x:
        h.append(letter)
    for letter in xrange (0, len(h)):
        if h[letter]=="a":
            h[letter]="u"
            continue
        if h[letter]=="t":
            h[letter]="a"
            continue
        if h[letter]=="g":
            h[letter]="c"
            continue
        if h[letter]=="c":
            h[letter]="g"
            continue
    ret=""
    for letter in h:
        ret+=letter
    return ret
while True:
    stry=raw_input("String?")
    print trad(stry)
Why does this strange behaviour occur, and how can I resolve it?
You are going about this in a much harder way than necessary. This can easily be done with str.translate(), a method on str instances that maps each character to another, which is exactly what you want:
import string
replacements = string.maketrans("atgc", "uacg")
while True:
    stry=raw_input("String?")
    print stry.translate(replacements)
This answer is for Python 2.x; in 3.x, use str.maketrans() instead of string.maketrans().
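For Python 3, a minimal sketch of the same approach might look like the following. The function name trad is borrowed from the question, and the input loop is left out to keep the example short.

```python
# Python 3 sketch: str.maketrans builds the table, str.translate applies it.
REPLACEMENTS = str.maketrans("atgc", "uacg")

def trad(dna):
    """Return the lowercase DNA string with a->u, t->a, g->c, c->g."""
    return dna.translate(REPLACEMENTS)

print(trad("gattaca"))  # cuaaugu
```

Characters that are not in the table are passed through unchanged.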
I'm not sure what type of issue you are having, but here's a simple way to do it, using a dictionary.
def trad(coding_strand):
    mRNA_parts = {'a': 'u', 't': 'a', 'g': 'c', 'c': 'g'}
    mRNA = ''
    for nucleotide in coding_strand:
        mRNA += mRNA_parts[nucleotide.lower()]  # look up the lowercase form
    return mRNA.upper()  # returns it as uppercase
I have it returned as uppercase because, generally, nucleotides in DNA/RNA are written in uppercase.
I also revised your method... It's better to iterate through the indices themselves; then you don't have to do l.index(elem).
def trad(coding_strand):
    mRNA = []
    for index in range(len(coding_strand)):
        nucleotide = coding_strand[index].upper()
        if nucleotide == 'A':
            mRNA.append('U')
        elif nucleotide == 'T':
            mRNA.append('A')
        elif nucleotide == 'C':
            mRNA.append('G')
        elif nucleotide == 'G':
            mRNA.append('C')
    ret = ''
    for letter in mRNA:
        ret += letter  # append each translated letter, not the whole list
    return ret
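To see why working with positions matters, here is a small sketch (Python 3; the helper names are made up for illustration) of what goes wrong with h.index(letter). list.index always returns the position of the first match, so once an earlier element has been rewritten to the same letter, the wrong slot gets updated.

```python
SWAP = {"a": "u", "t": "a", "g": "c", "c": "g"}

def trad_with_index(dna):
    """Mimics the question's first version: looks positions up with list.index."""
    h = list(dna)
    for letter in h:
        if letter in SWAP:
            # index() finds the FIRST occurrence, not the current position
            h[h.index(letter)] = SWAP[letter]
    return "".join(h)

def trad_with_enumerate(dna):
    """Tracks the real position, so every element is changed exactly once."""
    h = list(dna)
    for position, letter in enumerate(h):
        h[position] = SWAP.get(letter, letter)
    return "".join(h)

print(trad_with_index("gc"))      # gc  (position 0 is changed twice)
print(trad_with_enumerate("gc"))  # cg
```

With "gc", the first pass turns position 0 into "c"; the second pass then looks up "c", finds it at position 0 again, and turns it back into "g".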
I don't recommend building the result by repeatedly adding to a string or appending to a list; a list comprehension combined with str.join is more concise and usually more efficient.
Here's a semi-one-liner, courtesy of BurhanKhalid:
def trad(coding_strand):
    mRNA_parts = {'A': 'U', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join([mRNA_parts[nucleotide] for nucleotide in coding_strand.upper()])
A complete one-liner:
def trad(coding_strand, key={'A': 'U', 'T': 'A', 'G': 'C', 'C': 'G'}): return ''.join(key[i] for i in coding_strand.upper())
Some references:
Dictionaries
List Comprehensions
I have a dictionary that looks like this:
Letters = {'a': 'z', 'b': 'q', 'm':'j'}
I also have a block of random text.
How do I replace the letters in my text with the corresponding letters from the dictionary, so that 'a' becomes 'z' and 'b' becomes 'q'?
What I've done so far is returning the wrong output and I can't think of any other way:
splitter = text.split()
res = " ".join(Letters.get(i,i) for i in text.split())
print(res)
What we can do is loop through each character in the string and add either the new character or the original character if the new character is not found.
text = "my text ba"
letters = {'a': 'z', 'b': 'q', 'm': 'j'}
result = ""
for letter in text:  # For each letter in our string,
    result += letters.get(letter, letter)  # Either get a new letter, or use the original letter if it is not found.
If you want, you can also do this with a one-liner!
text = "my text ba"
letters = {'a': 'z', 'b': 'q', 'm': 'j'}
result = "".join(letters.get(letter, letter) for letter in text)
This is a bit confusing, so to break it down, for letter in text, we get either the new letter, or we use the (default) original letter.
You can see the documentation for the dict.get method here.
translation_table = str.maketrans({'a': 'z', 'b': 'q', 'm':'j'})
translated = your_text.translate(translation_table)
This is concise and fast
You're iterating over the words in text, not over the letters in the words.
There's no need to use text.split(). Just iterate over text itself to get the letters. And then join them using an empty string.
res = "".join(Letters.get(i,i) for i in text)
I need to write a program where I have a list of names of counties and I need to find how many of each of the 5 vowels is in that list and put the 5 numbers in a dictionary.
I wanted to make a for-loop to go through each vowel, and each time it goes through the loop, add a new entry in a dictionary, with the vowel as the key and the count as the value.
It should print: {'a':4, 'e':4, 'i':4, 'o':4, 'u':4}. I don't know how many of the vowels there are so I just wrote 4 for all the values in the example.
The list of counties is really long so I just pasted a shortened version here.
counties = ['Autauga','Baldwin','Barbour','Bibb','Blount','Bullock','Butler','Calhoun','Chambers','Cherokee','Chilton','Choctaw','Clarke','Clay','Cleburne','Coffee','Colbert','Conecuh','Coosa','Covington','Crenshaw','Cullman','Dale','Dallas']
letter = ('a', 'e', 'i', 'o', 'u')
counter = 0
d={}
for it in clist:
def func(clist, letterlist, count):
count += clist.count(letterlist)
print("the number of vowels:" count)
return count
func(counties, letter, counter)
As you can see, I am very new to Python and have no idea what I am doing. I can't get it to work and definitely can't get it in a dictionary.
Any help would be appreciated!
You can use nested for loops to iterate over the counties list and the characters of each county, and keep incrementing the output dict with the character as the key if the character is in the vowels list:
vowels = 'aeiou'
d = {}
for county in counties:
    for character in county.lower():
        if character in vowels:
            d[character] = d.get(character, 0) + 1
d becomes:
{'a': 16, 'u': 10, 'i': 4, 'o': 14, 'e': 14}
Alternatively, you can use collections.Counter with a generator expression that extracts the vowel characters from the list of strings:
from collections import Counter
Counter(c for county in counties for c in county.lower() if c in vowels)
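If every vowel should appear in the result even when its count is zero, a dict comprehension is one option. This is only a sketch, using a shortened sample of the list (the name sample_counties is made up for the example):

```python
sample_counties = ['Autauga', 'Bibb', 'Coffee']  # shortened sample for illustration

# One entry per vowel; str.count tallies occurrences in each lowercased name.
vowel_counts = {
    vowel: sum(name.lower().count(vowel) for name in sample_counties)
    for vowel in 'aeiou'
}
print(vowel_counts)  # {'a': 3, 'e': 2, 'i': 1, 'o': 1, 'u': 2}
```

This scans the list once per vowel, which is fine for five vowels and a list of county names.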
I believe you're trying to do what I've put below (I've omitted your list of counties). I've added comments, and you can add print lines to see what each piece of code is doing.
vowels = ('a', 'e', 'i','o', 'u')
d={}
## select counties 1 at a time
for county in counties:
    # convert each county to lowercase, to match list of vowels, otherwise, you need to deal with upper and lowercase
    county = county.lower()
    # select letter in county 1 at a time
    for i in range(len(county)):
        # check to see whether the letter selected in the county, the ith letter, is a vowel
        if county[i] in vowels:
            # if the ith letter is a vowel, but is not yet in the dictionary, add it
            if (county[i]) not in d:
                d[county[i]] = 1
            # if the ith letter is a vowel, and is already in the dictionary, then increase the counter by 1
            else:
                d[county[i]] += 1
print(d) # {'a': 16, 'u': 10, 'i': 4, 'o': 14, 'e': 14}
I'm trying to create a Python function that uses the Caesar cipher to encrypt a message.
So far, the code I have is
letter = input("Enter a letter: ")
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1, 'C':2, 'c':2, 'D':3,
                    'd':3, 'E':4, 'e':4, 'F':5, 'f':5, 'G':6, 'g':6,
                    'H':7, 'h':7, 'I':8, 'i':8, 'J':9, 'j':9, 'K':10,
                    'k':10, 'L':11, 'l':11, 'M':12, 'm':12, 'N': 13,
                    'n':13, 'O':14, 'o':14, 'P':15, 'p':15, 'Q':16,
                    'q':16, 'R':17, 'r':17, 'S':18, 's':18, 'T':19,
                    't':19, 'U':20, 'u':20, 'V':21, 'v':21, 'W':22,
                    'w':22, 'X':23, 'x':23, 'Y':24, 'y':24, 'Z':25, 'z':25 }
    pos = alphabet_pos[letter]
    return pos
When I try to run my code, it will ask for the letter but it doesn't return anything after that
Please help if you have any suggestions.
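You can access your dictionary with dict.get, which returns None instead of raising a KeyError when the key is missing:
    pos = alphabet_pos.get(letter)
    return pos
Then call the function and print its result; the code in the question defines the function but never calls it:
print(alphabet_position(letter))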
You can define two dictionaries, one the reverse of the other. You need to be careful on a few aspects:
Whether case is important. If it's not, use str.casefold as below.
What happens when you roll off the end of the alphabet, e.g. 13th letter after "z". Below we assume you start from the beginning again.
Don't type out the alphabet manually. You can use the string module.
Here's a demo:
letter = input("Enter a letter: ")
from string import ascii_lowercase
def get_next(letter, n):
    pos_alpha = dict(enumerate(ascii_lowercase))
    alpha_pos = {v: k for k, v in pos_alpha.items()}
    # Parenthesize the sum so the modulo wraps the final position past "z"
    return pos_alpha[(alpha_pos[letter.casefold()] + n) % 26]
get_next(letter, 13)
Enter a letter: a
'n'
If you need an entirely new encoded dict:
import string
letters = string.ascii_uppercase
d=dict(zip(list(letters),range(0,len(letters))))
encoded_dic={}
def get_caesar_value(v, by=13):
    return (v+by)%26
for k,v in d.items():
    encoded_dic[k]=chr(65+get_caesar_value(v))
print(encoded_dic)
Output:
{'A': 'N', 'C': 'P', 'B': 'O', 'E': 'R', 'D': 'Q', 'G': 'T', 'F': 'S', 'I': 'V', 'H': 'U', 'K': 'X', 'J': 'W', 'M': 'Z', 'L': 'Y', 'O': 'B', 'N': 'A', 'Q': 'D', 'P': 'C', 'S': 'F', 'R': 'E', 'U': 'H', 'T': 'G', 'W': 'J', 'V': 'I', 'Y': 'L', 'X': 'K', 'Z': 'M'}
The code you have only maps letters to a position. We'll rewrite it and make a rotate function.
Code
import string
import itertools as it
LOOKUP = {
    **{x:i for i, x in enumerate(string.ascii_lowercase)},
    **{x:i for i, x in enumerate(string.ascii_uppercase)}
}
def abc_position(letter):
    """Return the alpha position of a letter."""
    return LOOKUP[letter]
def rotate(letter, shift=13):
    """Return a letter shifted some positions to the right; recycle at the end."""
    iterable = it.cycle(string.ascii_lowercase)
    start = it.dropwhile(lambda x: x != letter.casefold(), iterable)
    # Advance the iterator
    for i, x in zip(range(shift+1), start):
        res = x
    if letter.isupper():
        return res.upper()
    return res
Tests
func = abc_position
assert func("a") == 0
assert func("A") == 0
assert func("c") == 2
assert func("z") == 25
func = rotate
assert func("h") == "u"
assert func("a", 0) == "a"
assert func("A", 0) == "A"
assert func("a", 2) == "c"
assert func("c", 3) == "f"
assert func("A", 2) == "C"
assert func("a", 26) == "a"
# Restart after "z"
assert func("z", 1) == "a"
assert func("Z", 1) == "A"
Demo
>>> letter = input("Enter a letter: ")
Enter a letter: h
>>> rot = rotate(letter, 13)
>>> rot
'u'
>>> abc_position(rot)
20
Here we rotated the letter "h" 13 positions, got a letter and then determined the position of this resultant letter in the normal string of abc's.
Details
abc_position()
This function was rewritten to look up the position of a letter. It merges two dictionaries:
one that enumerates the lowercase ASCII letters
one that enumerates the uppercase ASCII letters
The string module already provides these letters.
rotate()
This function only rotates lowercase letters; uppercase letters are translated from the lowercase position. The string of letters is rotated by making an infinite cycle (an iterator) of lowercase letters.
The cycle is first advanced to start at the desired letter. This is done by dropping all letters that don't look like the one passed in.
Then it is advanced in a loop some number of times equal to shift. The loop is just one way to consume or move the iterator ahead. We only care about the last letter, not the ones in between. This letter is returned, either lower or uppercase.
Since a letter is returned (not a position), you can now use your abc_position() function to find its normal position.
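As noted, the loop is only one way to move the iterator ahead. A possible variant, sketched here under the name rotate_islice (a made-up name, not part of the code above), lets itertools.islice do the skipping:

```python
import itertools as it
import string

def rotate_islice(letter, shift=13):
    """Variant of rotate() that skips ahead with islice instead of a loop."""
    cycle = it.cycle(string.ascii_lowercase)
    # Drop letters until the cycle is positioned at the input letter
    start = it.dropwhile(lambda x: x != letter.casefold(), cycle)
    # islice(start, shift, None) discards `shift` items; next() takes the following one
    res = next(it.islice(start, shift, None))
    return res.upper() if letter.isupper() else res

print(rotate_islice("h"))     # u
print(rotate_islice("Z", 1))  # A
```

Like rotate(), this assumes the input is an ASCII letter; for any other character dropwhile never finds a match in the infinite cycle and the call does not return.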
Alternatives
Other rotation functions can substitute rotate():
import codecs
def rot13(letter):
    return codecs.encode(letter, "rot13")
def rot13(letter):
    table = str.maketrans(
        "ABCDEFGHIJKLMabcdefghijklmNOPQRSTUVWXYZnopqrstuvwxyz",
        "NOPQRSTUVWXYZnopqrstuvwxyzABCDEFGHIJKLMabcdefghijklm")
    return str.translate(letter, table)
However, these options are constrained to rot13, while rotate() can be shifted by any number. Note: rot26 will cycle back to the beginning, e.g. rotate("a", 26) -> a.
See also this post on how to make true rot13 cipher.
See also docs on itertools.cycle and itertools.dropwhile.
You can do it with quick calculations from ord and chr functions instead:
def encrypt(letter):
    return chr((ord(letter.lower()) - ord('a') + 13) % 26 + ord('a'))
so that:
print(encrypt('a'))
print(encrypt('o'))
outputs:
n
b
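Since the goal in the question is to encrypt a whole message, the same ord/chr arithmetic can be applied character by character. This is only a sketch: the function name caesar and the choice to leave non-letters untouched are assumptions, not something the question specifies.

```python
def caesar(message, shift=13):
    """Shift ASCII letters by `shift` positions; leave other characters alone."""
    result = []
    for ch in message:
        if 'a' <= ch <= 'z':
            base = ord('a')
        elif 'A' <= ch <= 'Z':
            base = ord('A')
        else:
            result.append(ch)  # digits, spaces, punctuation pass through
            continue
        # Position within the alphabet, shifted and wrapped around at 26
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return ''.join(result)

print(caesar("Hello, World!"))          # Uryyb, Jbeyq!
print(caesar(caesar("Hello, World!")))  # Hello, World!
```

With the default shift of 13, applying the function twice returns the original text; for other shifts, decrypt with the negative shift.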
I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells with something other than A, T, G and C. I was able to get reverse complement with this piece of code:
def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)
def reverse_complement(s):
    return complement(s[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCC"))
However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.
def complement(seq):
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
bases = list(seq)
for element in bases:
if element not in complement:
print element
letters = [complement[base] for base in element]
return ''.join(letters)
def reverse_complement(seq):
return complement(seq[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCCCX"))
The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest using Biopython. What if you encounter a character like "-" or "*", or an ambiguous base? What if you want to do further manipulations of your sequences? Do you want to create a parser for each file format out there?
The code you ask for is as easy as:
from Bio.Seq import Seq
seq = Seq("TCGGGCCC")
print seq.reverse_complement()
# GGGCCCGA
Now, if you want to do other transformations:
print seq.complement()
print seq.transcribe()
print seq.translate()
Outputs
AGCCCGGG
UCGGGCCC
SG
And if you run into strange chars, no need to keep adding code to your program. Biopython deals with it:
seq = Seq("TCGGGCCCX")
print seq.reverse_complement()
# XGGGCCCGA
In general, a generator expression is simpler than the original code and avoids creating extra list objects. If there can be multiple-character insertions go with the other answers.
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
import string
old_chars = "ACGT"
replace_chars = "TGCA"
tab = string.maketrans(old_chars,replace_chars)
print "AAAACCCGGT".translate(tab)[::-1]
That will give you the reverse complement: ACCGGGTTTT
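string.maketrans only exists in Python 2. A rough Python 3 equivalent, assuming uppercase input, could look like this:

```python
# Python 3 sketch: str.maketrans replaces string.maketrans
TABLE = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base via the table, then reverse the string."""
    return seq.translate(TABLE)[::-1]

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
print(reverse_complement("TCGGGCCCX"))   # XGGGCCCGA
```

Characters that are not in the table, such as the X above, are kept as they are instead of raising an error.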
The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preconditioning step I would map all your non-'ATGC' bases to single letters (or punctuation or numbers or anything that won't show up in your sequence), then reverse the sequence, then replace the single-letter alternates with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.
alt_map = {'ins':'0'}
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
def reverse_complement(seq):
    for k,v in alt_map.iteritems():
        seq = seq.replace(k,v)
    bases = list(seq)
    bases = reversed([complement.get(base,base) for base in bases])
    bases = ''.join(bases)
    for k,v in alt_map.iteritems():
        bases = bases.replace(v,k)
    return bases
>>> seq = "TCGGinsGCCC"
>>> print "Reverse Complement:"
>>> print(reverse_complement(seq))
GGGCinsCCGA
A concise one-liner for the reverse complement is the following:
def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
def ReverseComplement(Pattern):
    revcomp = []
    x = len(Pattern)
    for i in Pattern:
        x = x - 1
        revcomp.append(Pattern[x])
    return ''.join(revcomp)
# this is for the complement
def compliment(Nucleotide):
    comp = []
    for i in Nucleotide:
        if i == "T":
            comp.append("A")
        if i == "A":
            comp.append("T")
        if i == "G":
            comp.append("C")
        if i == "C":
            comp.append("G")
    return ''.join(comp)
Give the code below a try:
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
Considering also degenerate bases:
def rev_compl(seq):
    BASES ='NRWSMBDACGTHVKSWY'
    return ''.join([BASES[-j] for j in [BASES.find(i) for i in seq][::-1]])
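The string above is laid out so that each code's complement sits at the mirrored position. One caveat is that BASES.find returns -1 for an unknown character, so such a character silently becomes 'R'. A more explicit sketch with a translation table of the standard IUPAC nucleotide codes (the name rev_compl_iupac is made up for the example) might be:

```python
# IUPAC nucleotide codes and their complements, position by position
IUPAC_TABLE = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def rev_compl_iupac(seq):
    """Reverse complement that also handles degenerate (ambiguous) bases."""
    return seq.translate(IUPAC_TABLE)[::-1]

print(rev_compl_iupac("ACGTN"))  # NACGT
print(rev_compl_iupac("RYKM"))   # KMRY
```

Here any character outside the table is left unchanged, which makes unexpected input easier to spot.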
This may be the quickest way to compute a reverse complement:
def complement(seq):
    complementary = { 'A':'T', 'T':'A', 'G':'C','C':'G' }
    return ''.join(reversed([complementary[i] for i in seq]))
Using the timeit module for speed profiling, this is the fastest algorithm my coworkers and I came up with for sequences shorter than 200 nucleotides:
reverse_complement = (
    sequence
    .replace('A', '*')  # Temporary symbol
    .replace('T', 'A')
    .replace('*', 'T')
    .replace('C', '&')  # Temporary symbol
    .replace('G', 'C')
    .replace('&', 'G')[::-1]
)
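Timings depend on the machine, the Python version and the sequence length, so it is worth measuring for yourself. A minimal timeit sketch comparing three of the approaches from this thread could look like this (the function names and the test sequence are made up for the example):

```python
import timeit

COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
TABLE = str.maketrans("ACGT", "TGCA")

def with_dict(seq):
    return ''.join(COMPLEMENT[base] for base in reversed(seq))

def with_replace(seq):
    # '*' and '&' are temporary symbols, as in the chained-replace answer
    return (seq.replace('A', '*').replace('T', 'A').replace('*', 'T')
               .replace('C', '&').replace('G', 'C').replace('&', 'G'))[::-1]

def with_translate(seq):
    return seq.translate(TABLE)[::-1]

sequence = "TCGGGCCC" * 20  # 160 bases, an arbitrary test input

for func in (with_dict, with_replace, with_translate):
    seconds = timeit.timeit(lambda: func(sequence), number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s for 10,000 calls")
```

No expected timings are shown here because they vary from run to run; only the relative ordering on your own data is meaningful.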
Why doesn't the code below work to get the complement of the character entered? It seems like the loop never ends, but if, say, I enter 'Z' as dna, why wouldn't it break and quit? Did I use break or if wrongly? What about elif?
def get_complement(dna):
''' (ch) -> ch
Reverse the 'A' to 'T' or vice versa and 'C' to 'G' and vice versa too.
>>> get_complement('A')
'C'
>>> get_complement('G')
'T'
'''
if dna == 'A':
print ('C')
if dna == 'C':
print ('A')
if dna == 'T':
print ('G')
if dna == 'G' :
print ('T')
while {'A', 'C', 'G', 'T'}.isnotsubset(set(dna)) :
break
return ('')
You should set up a map, using a dictionary
complement = {'A': 'C', 'C': 'A', 'T': 'G', 'G': 'T'}
Then for some string you can do
original = "ATCGTCA"
"".join(complement[letter] for letter in original)
Output
'CGATGAC'
For just a single character:
complement['A']
Output
'C'
As your example is written (and as Cyber has written his answer based on your example) you are not getting the complement. You're getting A -> C (instead of the complement T), T -> G instead of A, etc.
Using a dictionary as Cyber has done, it should look like this:
complement = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
And in code, including a check for non-DNA characters:
original = "ATCGTCA"
bad_original = "ATCGTCAZ"
complement = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
for dna in (original, bad_original):
    try:
        output = "".join([complement[x] for x in dna])
    except KeyError:
        output = "Contains non-DNA characters"
    print output
Where "original" yields "TAGCAGT" and "bad_original" yields "Contains non-DNA characters".
Note that this is complement, not the reverse complement, which is usually of more interest.
More generally, if you are planning on using this for sequences of DNA, you should probably look into the BioPython module (http://biopython.org/wiki/Seq#Complement_and_reverse_complement), which will get you complement (and reverse complement) with more versatility, error checking, etc.