I have a dictionary that looks like this:
Letters = {'a': 'z', 'b': 'q', 'm':'j'}
I also have a block of random text.
How do I replace the letters in my text with the letters from the dictionary? For example, switch 'a' to 'z' and 'b' to 'q'.
What I've done so far is returning the wrong output and I can't think of any other way:
splitter = text.split()
res = " ".join(Letters.get(i,i) for i in text.split())
print(res)
What we can do is loop through each character in the string and add either the new character or the original character if the new character is not found.
text = "my text ba"
letters = {'a': 'z', 'b': 'q', 'm': 'j'}
result = ""
for letter in text:  # For each letter in our string,
    result += letters.get(letter, letter)  # Either get a new letter, or use the original letter if it is not found.
If you want, you can also do this with a one-liner!
text = "my text ba"
letters = {'a': 'z', 'b': 'q', 'm': 'j'}
result = "".join(letters.get(letter, letter) for letter in text)
This is a bit confusing, so to break it down, for letter in text, we get either the new letter, or we use the (default) original letter.
You can see the documentation for the dict.get method here.
translation_table = str.maketrans({'a': 'z', 'b': 'q', 'm':'j'})
translated = your_text.translate(translation_table)
This is concise and fast
You're iterating over the words in text, not over the letters in the words.
There's no need to use text.split(). Just iterate over text itself to get the letters. And then join them using an empty string.
res = "".join(Letters.get(i,i) for i in text)
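To see why the original attempt left the text unchanged, here is a small sketch that runs both loops side by side. The sample string is made up for illustration, and the variable names by_word and by_char are hypothetical.

```python
letters = {'a': 'z', 'b': 'q', 'm': 'j'}
text = "my text ba"  # sample input, assumed for illustration

# Word-wise: the keys are single letters, so whole words are never found
# and every word falls back to itself.
by_word = " ".join(letters.get(word, word) for word in text.split())

# Character-wise: every character is looked up individually.
by_char = "".join(letters.get(ch, ch) for ch in text)

print(by_word)  # my text ba
print(by_char)  # jy text qz
```

Iterating over the string itself also keeps the original whitespace, which split() followed by " ".join() would normalize.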
I'm trying to create a Python function that uses the Caesar cipher to encrypt a message.
So far, the code I have is
letter = input("Enter a letter: ")
def alphabet_position(letter):
    alphabet_pos = {'A':0, 'a':0, 'B':1, 'b':1, 'C':2, 'c':2, 'D':3,
                    'd':3, 'E':4, 'e':4, 'F':5, 'f':5, 'G':6, 'g':6,
                    'H':7, 'h':7, 'I':8, 'i':8, 'J':9, 'j':9, 'K':10,
                    'k':10, 'L':11, 'l':11, 'M':12, 'm':12, 'N': 13,
                    'n':13, 'O':14, 'o':14, 'P':15, 'p':15, 'Q':16,
                    'q':16, 'R':17, 'r':17, 'S':18, 's':18, 'T':19,
                    't':19, 'U':20, 'u':20, 'V':21, 'v':21, 'W':22,
                    'w':22, 'X':23, 'x':23, 'Y':24, 'y':24, 'Z':25, 'z':25 }
    pos = alphabet_pos[letter]
    return pos
When I try to run my code, it will ask for the letter but it doesn't return anything after that
Please help if you have any suggestions.
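The dictionary lookup itself works; the function is simply never called, so nothing is computed or shown. Inside the function you can also use dict.get, which returns None instead of raising a KeyError when the input is not a letter:
    pos = alphabet_pos.get(letter)
    return pos
Then you can finally call the function and print its result:
print(alphabet_position(letter))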
You can define two dictionaries, one the reverse of the other. You need to be careful on a few aspects:
Whether case is important. If it's not, use str.casefold as below.
What happens when you roll off the end of the alphabet, e.g. 13th letter after "z". Below we assume you start from the beginning again.
Don't type out the alphabet manually. You can use the string module.
Here's a demo:
from string import ascii_lowercase
letter = input("Enter a letter: ")
def get_next(letter, n):
    pos_alpha = dict(enumerate(ascii_lowercase))
    alpha_pos = {v: k for k, v in pos_alpha.items()}
    # Add first, then take the modulus, so the position wraps around after "z".
    return pos_alpha[(alpha_pos[letter.casefold()] + n) % 26]
get_next(letter, 13)
Enter a letter: a
'n'
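The demo shifts a single letter. A Caesar cipher for a whole message could be sketched along the same lines with a translation table built from the string module. The function name caesar and its default shift are assumptions made for this illustration, not part of the answer above.

```python
from string import ascii_lowercase, ascii_uppercase

def caesar(message, shift=13):
    """Shift ASCII letters by `shift` positions, wrapping around.

    Characters that are not ASCII letters are left unchanged.
    """
    shift %= 26  # also turns negative shifts into the equivalent positive one
    table = str.maketrans(
        ascii_lowercase + ascii_uppercase,
        ascii_lowercase[shift:] + ascii_lowercase[:shift]
        + ascii_uppercase[shift:] + ascii_uppercase[:shift],
    )
    return message.translate(table)

print(caesar("Hello, World!"))  # Uryyb, Jbeyq!
```

Decryption is the same call with the negated shift, e.g. caesar(caesar(text, 3), -3) returns text.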
If you need an entirely new encoded dict:
import string
letters = string.ascii_uppercase
d = dict(zip(letters, range(len(letters))))
encoded_dic = {}
def get_caesar_value(v, by=13):
    return (v + by) % 26
for k, v in d.items():
    encoded_dic[k] = chr(65 + get_caesar_value(v))
print(encoded_dic)
Output:
{'A': 'N', 'C': 'P', 'B': 'O', 'E': 'R', 'D': 'Q', 'G': 'T', 'F': 'S', 'I': 'V', 'H': 'U', 'K': 'X', 'J': 'W', 'M': 'Z', 'L': 'Y', 'O': 'B', 'N': 'A', 'Q': 'D', 'P': 'C', 'S': 'F', 'R': 'E', 'U': 'H', 'T': 'G', 'W': 'J', 'V': 'I', 'Y': 'L', 'X': 'K', 'Z': 'M'}
The code you have only maps letters to a position. We'll rewrite it and make a rotate function.
Code
import string
import itertools as it
LOOKUP = {
    **{x: i for i, x in enumerate(string.ascii_lowercase)},
    **{x: i for i, x in enumerate(string.ascii_uppercase)}
}
def abc_position(letter):
    """Return the alpha position of a letter."""
    return LOOKUP[letter]
def rotate(letter, shift=13):
    """Return a letter shifted some positions to the right; recycle at the end."""
    iterable = it.cycle(string.ascii_lowercase)
    start = it.dropwhile(lambda x: x != letter.casefold(), iterable)
    # Advance the iterator
    for i, x in zip(range(shift+1), start):
        res = x
    if letter.isupper():
        return res.upper()
    return res
Tests
func = abc_position
assert func("a") == 0
assert func("A") == 0
assert func("c") == 2
assert func("z") == 25
func = rotate
assert func("h") == "u"
assert func("a", 0) == "a"
assert func("A", 0) == "A"
assert func("a", 2) == "c"
assert func("c", 3) == "f"
assert func("A", 2) == "C"
assert func("a", 26) == "a"
# Restart after "z"
assert func("z", 1) == "a"
assert func("Z", 1) == "A"
Demo
>>> letter = input("Enter a letter: ")
Enter a letter: h
>>> rot = rotate(letter, 13)
>>> rot
'u'
>>> abc_position(rot)
20
Here we rotated the letter "h" 13 positions, got a letter and then determined the position of this resultant letter in the normal string of abc's.
Details
abc_position()
This function was rewritten to look up the position of a letter. It merges two dictionaries:
one that enumerates the lowercase ASCII letters
one that enumerates the uppercase ASCII letters
The string module already provides these letters.
rotate()
This function only rotates lowercase letters; uppercase letters are translated from the lowercase position. The string of letters is rotated by making an infinite cycle (an iterator) of lowercase letters.
The cycle is first advanced to start at the desired letter. This is done by dropping all letters that don't look like the one passed in.
Then it is advanced in a loop some number of times equal to shift. The loop is just one way to consume or move the iterator ahead. We only care about the last letter, not the ones in between. This letter is returned, either lower or uppercase.
Since a letter is returned (not a position), you can now use your abc_position() function to find its normal position.
Alternatives
Other rotation functions can substitute rotate():
import codecs
def rot13(letter):
    return codecs.encode(letter, "rot13")
def rot13(letter):
    table = str.maketrans(
        "ABCDEFGHIJKLMabcdefghijklmNOPQRSTUVWXYZnopqrstuvwxyz",
        "NOPQRSTUVWXYZnopqrstuvwxyzABCDEFGHIJKLMabcdefghijklm")
    return str.translate(letter, table)
However, these options are constrained to rot13, while rotate() can be shifted by any number. Note: rot26 will cycle back to the beginning, e.g. rotate("a", 26) -> a.
See also this post on how to make true rot13 cipher.
See also docs on itertools.cycle and itertools.dropwhile.
You can do it with quick calculations from ord and chr functions instead:
def encrypt(letter):
    return chr((ord(letter.lower()) - ord('a') + 13) % 26 + ord('a'))
so that:
print(encrypt('a'))
print(encrypt('o'))
outputs:
n
b
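Note that encrypt() lowercases its input and would also shift characters that are not letters. A possible variant that keeps case, takes any shift, and passes other characters through might look like the sketch below. The name shift_letter is hypothetical.

```python
from string import ascii_letters

def shift_letter(ch, shift=13):
    """Shift one ASCII letter by `shift` positions, keeping its case.

    Anything that is not a single ASCII letter is returned unchanged.
    """
    if len(ch) != 1 or ch not in ascii_letters:
        return ch
    base = ord('A') if ch.isupper() else ord('a')
    return chr((ord(ch) - base + shift) % 26 + base)

print(shift_letter('a'))     # n
print(shift_letter('O'))     # B
print(shift_letter('!'))     # !
print(shift_letter('z', 1))  # a
```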
I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells with something other than A, T, G and C. I was able to get reverse complement with this piece of code:
def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)
def reverse_complement(s):
    return complement(s[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCC"))
However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.
def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    for element in bases:
        if element not in complement:
            print element
        letters = [complement[base] for base in element]
    return ''.join(letters)
def reverse_complement(seq):
    return complement(seq[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCCCX"))
The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest using Biopython. What if you encounter a character like "-", "*" or an ambiguity code? What if you want to do further manipulations of your sequences? Do you want to create a parser for each file format out there?
The code you ask for is as easy as:
from Bio.Seq import Seq
seq = Seq("TCGGGCCC")
print seq.reverse_complement()
# GGGCCCGA
Now if you want to do other transformations:
print seq.complement()
print seq.transcribe()
print seq.translate()
Outputs
AGCCCGGG
UCGGGCCC
SG
And if you run into strange chars, no need to keep adding code to your program. Biopython deals with it:
seq = Seq("TCGGGCCCX")
print seq.reverse_complement()
# XGGGCCCGA
In general, a generator expression is simpler than the original code and avoids creating extra list objects. If there can be multiple-character insertions go with the other answers.
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
import string
old_chars = "ACGT"
replace_chars = "TGCA"
tab = string.maketrans(old_chars,replace_chars)
print "AAAACCCGGT".translate(tab)[::-1]
That will give you the reverse complement: ACCGGGTTTT
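string.maketrans exists only in Python 2. Under Python 3 the same idea could be sketched with str.maketrans, as below. The names COMPLEMENT and reverse_complement are chosen for this illustration.

```python
# Translation table: A<->T and C<->G. Characters not listed are left as-is.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
print(reverse_complement("TCGGGCCCX"))   # XGGGCCCGA
```

Because translate() skips characters missing from the table, cells containing something other than A, T, G and C do not raise an error.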
The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preconditioning step I would map all your non-'ATGC' bases to single letters (or punctuation or numbers or anything that won't show up in your sequence), then reverse the sequence, then replace the single-letter alternates with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.
alt_map = {'ins':'0'}
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
def reverse_complement(seq):
    for k,v in alt_map.iteritems():
        seq = seq.replace(k,v)
    bases = list(seq)
    bases = reversed([complement.get(base,base) for base in bases])
    bases = ''.join(bases)
    for k,v in alt_map.iteritems():
        bases = bases.replace(v,k)
    return bases
>>> seq = "TCGGinsGCCC"
>>> print "Reverse Complement:"
>>> print(reverse_complement(seq))
GGGCinsCCGA
A fast one-liner for the reverse complement is the following (it raises a KeyError for any base other than A, C, G or T):
def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
# this only reverses the pattern
def ReverseComplement(Pattern):
    revcomp = []
    x = len(Pattern)
    for i in Pattern:
        x = x - 1
        revcomp.append(Pattern[x])
    return ''.join(revcomp)
# this is for the complement
def compliment(Nucleotide):
    comp = []
    for i in Nucleotide:
        if i == "T":
            comp.append("A")
        if i == "A":
            comp.append("T")
        if i == "G":
            comp.append("C")
        if i == "C":
            comp.append("G")
    return ''.join(comp)
Try the code below:
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
Considering also degenerate bases:
def rev_compl(seq):
    BASES = 'NRWSMBDACGTHVKSWY'
    return ''.join([BASES[-j] for j in [BASES.find(i) for i in seq][::-1]])
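The BASES string works because each code sits opposite its complement when the string is indexed from the other end. The same mapping can be spelled out with an explicit dictionary, which may be easier to audit. This is only a sketch: the names IUPAC_COMPLEMENT and rev_compl_iupac are hypothetical, and unlike the version above it leaves unknown characters unchanged.

```python
# Complements of the IUPAC nucleotide codes (uppercase only).
IUPAC_COMPLEMENT = {
    'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A',
    'R': 'Y', 'Y': 'R', 'S': 'S', 'W': 'W',
    'K': 'M', 'M': 'K', 'B': 'V', 'V': 'B',
    'D': 'H', 'H': 'D', 'N': 'N',
}

def rev_compl_iupac(seq):
    """Reverse complement that also handles degenerate bases."""
    return "".join(IUPAC_COMPLEMENT.get(base, base) for base in reversed(seq))

print(rev_compl_iupac("TCGGGCCC"))       # GGGCCCGA
print(rev_compl_iupac("ACGTRYKMBVDHN"))  # NDHBVKMRYACGT
```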
This may be the quickest way to compute a reverse complement:
def complement(seq):
    complementary = { 'A':'T', 'T':'A', 'G':'C','C':'G' }
    return ''.join(reversed([complementary[i] for i in seq]))
Using the timeit module for speed profiling, this is the fastest algorithm my coworkers and I came up with for sequences shorter than 200 nucleotides. The chain is wrapped in parentheses because a comment cannot follow a backslash line continuation:
(sequence
    .replace('A', '*')  # Temporary symbol
    .replace('T', 'A')
    .replace('*', 'T')
    .replace('C', '&')  # Temporary symbol
    .replace('G', 'C')
    .replace('&', 'G')[::-1])
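Speed claims like this depend on the Python version and the sequence length, so it is worth measuring on your own data. A rough timeit harness could look like the sketch below. The function names, the test sequence and the repeat count are all assumptions made for illustration, and no particular ranking is implied.

```python
import timeit

COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
TABLE = str.maketrans("ACGT", "TGCA")

def rc_replace(seq):
    # Chained str.replace with two temporary symbols.
    return (seq.replace('A', '*').replace('T', 'A').replace('*', 'T')
               .replace('C', '&').replace('G', 'C').replace('&', 'G')[::-1])

def rc_join(seq):
    # Dictionary lookup per base.
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def rc_translate(seq):
    # Translation table.
    return seq.translate(TABLE)[::-1]

sequence = "TCGGGCCCAT" * 15  # 150 nucleotides, arbitrary test data

# All three variants must agree before their timings mean anything.
assert rc_replace(sequence) == rc_join(sequence) == rc_translate(sequence)

for func in (rc_replace, rc_join, rc_translate):
    seconds = timeit.timeit(lambda: func(sequence), number=2000)
    print(f"{func.__name__}: {seconds:.4f} s for 2000 calls")
```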
I wrote a simple program to translate DNA to RNA. Basically, you input a string, it separates the string into characters and sends them to a list, shifts the letter and returns a string from the resulting list. This program correctly translates a to u, and t to a, but does not change g to c and c to g.
This is the program:
def trad(x):
    h=[]
    for letter in x:
        h.append(letter)
    for letter in h:
        if letter=="a":
            h[h.index(letter)]="u"
            continue
        if letter=="t":
            h[h.index(letter)]="a"
            continue
        if letter=="g":
            h[h.index(letter)]="c"
            continue
        if letter=="c":
            h[h.index(letter)]="g"
            continue
    ret=""
    for letter in h:
        ret+=letter
    return ret
while True:
    stry=raw_input("String?")
    print trad(stry)
Now, just altering the program by not iterating over elements, but on positions, it works as expected. This is the resulting code:
def trad(x):
    h=[]
    for letter in x:
        h.append(letter)
    for letter in xrange (0, len(h)):
        if h[letter]=="a":
            h[letter]="u"
            continue
        if h[letter]=="t":
            h[letter]="a"
            continue
        if h[letter]=="g":
            h[letter]="c"
            continue
        if h[letter]=="c":
            h[letter]="g"
            continue
    ret=""
    for letter in h:
        ret+=letter
    return ret
while True:
    stry=raw_input("String?")
    print trad(stry)
Why does this strange behaviour occur, and how can I resolve it?
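A likely explanation is that list.index(value) returns the position of the first matching element, not the position the loop is currently at. Once an earlier element has been changed to 'c' or 'g', a later lookup can land on it and flip it back. The sketch below reproduces this on a two-letter input. The names trad_buggy and trad_fixed are hypothetical.

```python
def trad_buggy(x):
    h = list(x)
    for letter in h:
        if letter == "g":
            h[h.index(letter)] = "c"   # index() finds the FIRST "g"
        elif letter == "c":
            h[h.index(letter)] = "g"   # index() finds the FIRST "c"
    return "".join(h)

def trad_fixed(x):
    h = list(x)
    for position, letter in enumerate(h):
        if letter == "g":
            h[position] = "c"          # write to the current position
        elif letter == "c":
            h[position] = "g"
    return "".join(h)

# "gc": the first step turns h into ['c', 'c']; the second step then finds
# the first 'c' at position 0 and turns it back into 'g'.
print(trad_buggy("gc"))  # gc
print(trad_fixed("gc"))  # cg
```

The a/t pair can show the same problem (for example on the input "ta"), because 't' becomes 'a' and 'a' is itself a letter that gets replaced.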
You are going about this in a much harder way than is necessary. It can easily be done using str.translate(), a method on str instances that translates instances of one character to another, which is exactly what you want:
import string
replacements = string.maketrans("atgc", "uacg")
while True:
    stry=raw_input("String?")
    print stry.translate(replacements)
This answer is for Python 2.x; in 3.x, use str.maketrans() instead.
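For reference, a Python 3 version of the same idea might look like this minimal sketch. The sample strand "gattaca" is made up for illustration, and the input loop is left out.

```python
# Python 3: the table is built with the static method str.maketrans.
replacements = str.maketrans("atgc", "uacg")

strand = "gattaca"  # sample input, assumed for illustration
print(strand.translate(replacements))  # cuaaugu
```

Every character is mapped in a single pass from the original string, so an earlier replacement can never be picked up again by a later one.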
I'm not sure what type of issue you are having, but here's a simple way to do it, using a dictionary.
def trad(coding_strand):
    mRNA_parts = {'a': 'u', 't': 'a', 'g': 'c', 'c': 'g'}
    mRNA = ''
    for nucleotide in coding_strand:
        mRNA += mRNA_parts[nucleotide.lower()]  # look up the lowercase form
    return mRNA.upper()  # returns it as uppercase
I have it returned as uppercase because, generally, nucleotides in DNA/RNA are written in uppercase.
I also revised your method. It's better to iterate through the indices themselves; then you don't have to do l.index(elem).
def trad(coding_strand):
    mRNA = []
    for index in range(len(coding_strand)):
        nucleotide = coding_strand[index].upper()
        if nucleotide == 'A':
            mRNA.append('U')
        elif nucleotide == 'T':
            mRNA.append('A')
        elif nucleotide == 'C':
            mRNA.append('G')
        elif nucleotide == 'G':
            mRNA.append('C')
    ret = ''
    for letter in mRNA:
        ret += letter  # append each letter, not the whole list
    return ret
I don't suggest using a string and adding on to it nor using a list; a list comprehension is much more effective.
Here's a semi-one-liner, courtesy of BurhanKhalid:
def trad(coding_strand):
    mRNA_parts = {'A': 'U', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join([mRNA_parts[nucleotide] for nucleotide in coding_strand.upper()])
A complete one-liner:
def trade(coding_strand, key={'A': 'U', 'T': 'A', 'G': 'C', 'C': 'G'}): return ''.join([key[i] for i in coding_strand.upper()])
Some references:
Dictionaries
List Comprehensions
Is there a way to range over characters? Something like this:
for c in xrange( 'a', 'z' ):
    print c
I hope you guys can help.
This is a great use for a custom generator:
Python 2:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    for c in xrange(ord(c1), ord(c2)+1):
        yield chr(c)
then:
for c in char_range('a', 'z'):
    print c
Python 3:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
then:
for c in char_range('a', 'z'):
    print(c)
import string
for char in string.ascii_lowercase:
    print char
See string constants for the other possibilities, including uppercase, numbers, locale-dependent characters, all of which you can join together like string.ascii_uppercase + string.ascii_lowercase if you want all of the characters in multiple sets.
You have to convert the characters to numbers and back again.
for c in xrange(ord('a'), ord('z')+1):
    print chr(c) # resp. print unichr(c)
For the sake of beauty and readability, you can wrap this in a generator:
def character_range(a, b, inclusive=False):
    back = chr
    if isinstance(a,unicode) or isinstance(b,unicode):
        back = unichr
    for c in xrange(ord(a), ord(b) + int(bool(inclusive))):
        yield back(c)
for c in character_range('a', 'z', inclusive=True):
    print(c)
This generator can be called with inclusive=False (default) to imitate Python's usual behaviour of excluding the end element, or with inclusive=True to include it. So with the default inclusive=False, 'a', 'z' would just span the range from a to y, excluding z.
If any of a, b are unicode, it returns the result in unicode, otherwise it uses chr.
It currently (probably) only works in Py2.
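A Python 3 counterpart could be sketched as follows. Since every str is Unicode there, the unicode check is no longer needed. The function keeps the name and the inclusive flag from the code above.

```python
def character_range(a, b, inclusive=False):
    """Yield the characters from `a` up to `b`.

    `b` itself is only included when `inclusive` is true, mirroring range().
    """
    for code in range(ord(a), ord(b) + int(bool(inclusive))):
        yield chr(code)

print("".join(character_range('a', 'e')))                  # abcd
print("".join(character_range('a', 'e', inclusive=True)))  # abcde
```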
There are other good answers here (personally I'd probably use string.lowercase), but for the sake of completeness, you could use map() and chr() on the lower case ascii values:
for c in map(chr, xrange(97, 123)):
    print c
If you have a short fixed list of characters, just use Python's treatment of strings as sequences.
for x in 'abcd':
    print x
or
[x for x in 'abcd']
I like an approach which looks like this:
base64chars = list(chars('AZ', 'az', '09', '++', '//'))
It certainly can be implemented with a lot of more comfort, but it is quick and easy and very readable.
Python 3
Generator version:
def chars(*args):
    for a in args:
        for i in range(ord(a[0]), ord(a[1])+1):
            yield chr(i)
Or, if you like list comprehensions:
def chars(*args):
    return [chr(i) for a in args for i in range(ord(a[0]), ord(a[1])+1)]
The first yields:
print(chars('ĀĈ'))
<generator object chars at 0x7efcb4e72308>
print(list(chars('ĀĈ')))
['Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ']
while the second yields:
print(chars('ĀĈ'))
['Ā', 'ā', 'Ă', 'ă', 'Ą', 'ą', 'Ć', 'ć', 'Ĉ']
It is really convenient:
base64chars = list(chars('AZ', 'az', '09', '++', '//'))
for a in base64chars:
    print(repr(a),end='')
print('')
for a in base64chars:
    print(repr(a),end=' ')
outputs
'A''B''C''D''E''F''G''H''I''J''K''L''M''N''O''P''Q''R''S''T''U''V''W''X''Y''Z''a''b''c''d''e''f''g''h''i''j''k''l''m''n''o''p''q''r''s''t''u''v''w''x''y''z''0''1''2''3''4''5''6''7''8''9''+''/'
'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'W' 'X' 'Y' 'Z' 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' 'k' 'l' 'm' 'n' 'o' 'p' 'q' 'r' 's' 't' 'u' 'v' 'w' 'x' 'y' 'z' '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' '+' '/'
Why the list()? Without it, base64chars might be a generator (depending on the implementation you chose) and thus could only be used in the very first loop.
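The point about the first loop can be checked with a small sketch: a generator is exhausted after one pass, so a second pass yields nothing. It reuses the generator version of chars from above. The names gen, first_pass and second_pass are hypothetical.

```python
def chars(*args):
    for a in args:
        for i in range(ord(a[0]), ord(a[1]) + 1):
            yield chr(i)

gen = chars('ac')        # a generator object, nothing computed yet
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing left to yield

print(first_pass)   # ['a', 'b', 'c']
print(second_pass)  # []
```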
Python 2
Something similar can be achieved with Python 2, but it is far more complex if you want to support Unicode, too. To encourage you to stop using Python 2 in favor of Python 3, I do not bother to provide a Python 2 solution here ;)
Try to avoid Python 2 today for new projects. Also try to port old projects to Python 3 first before extending them - in the long run it will be worth the effort!
Proper handling of Unicode in Python 2 is extremely complex, and it is nearly impossible to add Unicode support to Python 2 projects if this support was not built in from the beginning.
Hints how to backport this to Python 2:
Use xrange instead of range
Create a 2nd function (unicodes?) for handling of Unicode:
Use unichr instead of chr to return unicode instead of str
Never forget to feed unicode strings as args to make ord and array subscript work properly
for character in map( chr, xrange( ord('a'), ord('c')+1 ) ):
    print character
prints:
a
b
c
# generating 'a to z' small_chars.
small_chars = [chr(item) for item in range(ord('a'), ord('z')+1)]
# generating 'A to Z' upper chars.
upper_chars = [chr(item).upper() for item in range(ord('a'), ord('z')+1)]
For uppercase letters:
for i in range(ord('A'), ord('Z')+1):
    print(chr(i))
For lowercase letters:
for i in range(ord('a'), ord('z')+1):
    print(chr(i))
Inspired from the top post above, I came up with this :
map(chr,range(ord('a'),ord('z')+1))
Using @ned-batchelder's answer here, I'm amending it a bit for Python 3:
def char_range(c1, c2):
    """Generates the characters from `c1` to `c2`, inclusive."""
    # Uses range instead of xrange, as xrange does not exist in Python 3.
    for c in range(ord(c1), ord(c2)+1):
        yield chr(c)
Then same thing as in Ned's answer:
for c in char_range('a', 'z'):
    print(c)
Thanks Ned!
I had the same need and I used this:
import string
chars = string.ascii_lowercase
# named char_range so the built-in range is not shadowed
char_range = list(chars)[chars.find('a'):chars.find('k')+1]
Hope this helps someone.
Use "for count in range" with chr and ord (the +1 makes 'z' part of the result):
print [chr(ord('a')+i) for i in range(ord('z')-ord('a')+1)]
Use list comprehension:
for c in [chr(x) for x in range(ord('a'), ord('z')+1)]:
    print c
Another option (operates like range - add 1 to stop if you want stop to be inclusive)
>>> import string
>>> def crange(arg, *args):
...     """character range, crange(stop) or crange(start, stop[, step])"""
...     if len(args):
...         start = string.ascii_letters.index(arg)
...         stop = string.ascii_letters.index(args[0])
...     else:
...         start = string.ascii_letters.index('a')
...         stop = string.ascii_letters.index(arg)
...     step = 1 if len(args) < 2 else args[1]
...     for index in range(start, stop, step):
...         yield string.ascii_letters[index]
...
>>> [_ for _ in crange('d')]
['a', 'b', 'c']
>>>
>>> [_ for _ in crange('d', 'g')]
['d', 'e', 'f']
>>>
>>> [_ for _ in crange('d', 'v', 3)]
['d', 'g', 'j', 'm', 'p', 's']
>>>
>>> [_ for _ in crange('A', 'G')]
['A', 'B', 'C', 'D', 'E', 'F']
Depending on how complex the range of characters is, a regular expression may be convenient:
import re
import string
re.findall("[a-f]", string.printable)
# --> ['a', 'b', 'c', 'd', 'e', 'f']
re.findall("[n-qN-Q]", string.printable)
# --> ['n', 'o', 'p', 'q', 'N', 'O', 'P', 'Q']
This works around the pesky issue of accidentally including the punctuation characters in between numbers, uppercase and lowercase letters in the ASCII table.