Replacing Multiple Letters in a String with Each Other in Python [duplicate] - python

So I understand how to use str.replace() to replace single letters in a string, and I also know how to use the following replace_all function:
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i,j)
    return text
But I am trying to replace letters with each other. For example replace each A with T and each T with A, each C with G and each G with C, but I end up getting a string composed of only two letters, either A and G or C and T, for example, and I know the output should be composed of four letters. Here is the code I have tried (I'd rather avoid built in functions):
d={'A': 'T', 'C': 'G', 'A': 'T', 'G': 'C'}
DNA_String = open('rosalind_rna.txt', 'r')
DNA_String = DNA_String.read()
reverse = str(DNA_String[::-1])
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i,j)
    return text
complement = replace_all(reverse, d)
print complement
I also tried using:
complement = str.replace(reverse, 'A', 'T')
complement = str.replace(reverse, 'T', 'A')
complement = str.replace(reverse, 'G', 'C')
complement = str.replace(reverse, 'C', 'G')
But I end up getting a string that is four times as long as it should be.
I've also tried:
complement = str.replace(reverse, 'A', 'T').replace(reverse, 'T', 'A').replace(reverse, 'G', 'C')str.replace(reverse, 'C', 'G')
But I get an error message that an integer input is needed.

You can map each letter to another letter.
>>> M = {'A':'T', 'T':'A', 'C':'G', 'G':'C'}
>>> STR = 'CGAATT'
>>> S = "".join([M.get(c,c) for c in STR])
>>> S
'GCTTAA'

You should probably use str.translate for this. Use string.maketrans to create the corresponding translation table.
>>> import string
>>> d ={'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
>>> s = "ACTG"
>>> _from, to = map(lambda t: ''.join(t), zip(*d.items()))
>>> t = string.maketrans(_from, to)
>>> s.translate(t)
'TGAC'
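Note that string.maketrans exists only in Python 2. In Python 3 the same table can be built with str.maketrans. The following is a minimal sketch of that variant, with variable names chosen here for illustration:

```python
# Python 3 sketch: str.maketrans replaces the Python 2 string.maketrans
d = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

# A dict with single-character keys can be passed directly
table = str.maketrans(d)

s = "ACTG"
print(s.translate(table))  # TGAC
```

Because every character is looked up once in the table, a letter that has already been swapped is never swapped back.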
By the way, the error you get with this line
complement = str.replace(reverse, 'A', 'T').replace(reverse, 'T', 'A')...
is that you are explicitly passing self when it is already passed implicitly. str.replace(reverse, 'A', 'T') is equivalent to reverse.replace('A', 'T'). Accordingly, str.replace(...).replace(reverse, 'T', 'A') is equivalent to str.replace(str.replace(...), reverse, 'T', 'A'): the result of the first replace becomes self for the second call, the remaining arguments shift by one position, and 'A' ends up as the count parameter, which has to be an int.
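The argument shift described above can be reproduced with a short sketch. It is written for Python 3, and the sample string is an assumption made for illustration:

```python
reverse = "GATTACA"  # hypothetical sample input

# Calling through the class passes the string explicitly as self
assert str.replace(reverse, 'A', 'T') == reverse.replace('A', 'T')

# Chaining while repeating the string shifts every argument by one:
# old=reverse, new='T', count='A', and count must be an integer
try:
    reverse.replace('A', 'T').replace(reverse, 'T', 'A')
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```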

I think this is happening because you're replacing all the As with Ts and then replacing all those Ts (as well as those in the original string) with As. Try replacing with lower-case letters and then converting the whole string with upper():
dic = {'A': 't', 'T': 'a', 'C': 'g', 'G': 'c'}
text = 'GATTCCACCGT'
for i, j in dic.iteritems():
    text = text.replace(i,j)
text = text.upper()
gives:
'CTAAGGTGGCA'
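To see the difference side by side, here is a small Python 3 sketch (so it uses items() rather than iteritems()). It relies on dictionaries preserving insertion order, which holds from Python 3.7 on, and the parameter names are chosen here for illustration:

```python
def replace_all(text, mapping):
    """Apply each replacement in turn, in the mapping's insertion order."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

text = 'GATTCCACCGT'

# Same-case targets: later passes overwrite the results of earlier ones
clobbered = replace_all(text, {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'})
print(clobbered)  # CAAACCACCCA: only two distinct letters survive

# Lower-case placeholders keep converted letters out of later passes
swapped = replace_all(text, {'A': 't', 'T': 'a', 'C': 'g', 'G': 'c'}).upper()
print(swapped)  # CTAAGGTGGCA
```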

Related

Loop over letters in a string that contains the alphabet to determine which are missing from a dictionary

I am very new to python and trying to find the solution to this for a class.
I need the function missing_letters to take a list, check the letters using histogram and then loop over the letters in alphabet to determine which are missing from the input parameter. Finally I need to print the letters that are missing, in a string.
alphabet = "abcdefghijklmnopqrstuvwxyz"
test = ["one","two","three"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def missing_letter(s):
    for i in s:
        checked = (histogram(i))
As you can see I haven't gotten very far, at the moment missing_letters returns
{'o': 1, 'n': 1, 'e': 1}
{'t': 1, 'w': 1, 'o': 1}
{'t': 1, 'h': 1, 'r': 1, 'e': 2}
I now need to loop over alphabet to check which characters are missing and print. Any help and direction will be much appreciated. Many thanks!
You can use set functions in python, which is very fast and efficient:
alphabet = set('abcdefghijklmnopqrstuvwxyz')
s1 = 'one'
s2 = 'two'
s3 = 'three'
list_of_missing_letters = set(alphabet) - set(s1) - set(s2) - set(s3)
print(list_of_missing_letters)
Or like this:
from functools import reduce
alphabet = set('abcdefghijklmnopqrstuvwxyz')
list_of_strings = ['one', 'two', 'three']
list_of_missing_letters = set(alphabet) - \
reduce(lambda x, y: set(x).union(set(y)), list_of_strings)
print(list_of_missing_letters)
Or using your own histogram function:
alphabet = "abcdefghijklmnopqrstuvwxyz"
test = ["one", "two", "three"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def missing_letter(t):
    test_string = ''.join(t)
    result = []
    for l in alphabet:
        if l not in histogram(test_string).keys():
            result.append(l)
    return result
print(missing_letter(test))
Output:
['a', 'b', 'c', 'd', 'f', 'g', 'i', 'j', 'k', 'l', 'm', 'p', 'q', 's', 'u', 'v', 'x', 'y', 'z']
from string import ascii_lowercase
words = ["one","two","three"]
letters = [l.lower() for w in words for l in w]
# alphabet letters that do not appear in any of the words
letter_str = "".join(x for x in ascii_lowercase if x not in letters)
Output:
'abcdfgijklmpqsuvxyz'
It is not the easiest question to understand, but from what I can gather you want all the letters of the alphabet that are not in the input to be printed to the console.
So a loop, as opposed to the functions that have already been shown, would be:
def output():
    result = ""
    for i in alphabet:
        # keep the letter only if it is not a key of the histogram
        if i not in checked:
            result += i
    print(result)
Side note: checked must be visible to this function, so either make it a global variable or define it outside the function.
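One way to avoid the global variable altogether is to build the histogram inside the function. This is only a sketch: the name missing_letters follows the question's wording, and joining all words into one string before counting is an assumption about what is wanted:

```python
from string import ascii_lowercase

def histogram(s):
    """Count how often each character occurs in s."""
    d = {}
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

def missing_letters(words):
    """Return the alphabet letters that appear in none of the words."""
    counts = histogram(''.join(words))  # built once, kept local
    return ''.join(c for c in ascii_lowercase if c not in counts)

print(missing_letters(["one", "two", "three"]))  # abcdfgijklmpqsuvxyz
```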

prints results only when string item index == list item index

I've been working on a little project of mine lately, but I've run into a problem that I'm stuck at. I've already checked various places, but I couldn't really find what I'm looking for. This is my code:
special_alphabet = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u,
v, w, x, y, z]
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k','l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
name = input('Please insert your name: ')
item_alphabet = -1
item_special_alphabet = -1
index = -1
for item in name:
    item_alphabet = item_alphabet + 1
    item_special_alphabet = item_special_alphabet + 1
    index = index + 1
    if alphabet[item_alphabet] == name[index]:
        print(special_alphabet[item_special_alphabet])
The special_alphabet list contains the special characters that I have in variables. I didn't display them because they're too long, but they're there.
The problem I'm having right now is that when I run this code and type in my input, it does actually check the 'name' (string) I've inserted, it just does it in order of the list(alphabet basically). So when I enter: Amine, it only returns the special character for A (because it's the first (0) in both the string and the list) and E (same reason, just it's fifth.)
What I'm looking for is how to make it go through the whole list without any order whatsoever and check all the items in it before running the if statement and printing out the special characters.
Thank you in advance.
You can use str.maketrans() and str.translate() for these kinds of translation jobs:
trans_tab = str.maketrans(dict(zip(alphabet, special_alphabet)))
name = input('Please insert your name: ')
translated_name = name.translate(trans_tab)
print(translated_name)
If you pass str.maketrans() a single dictionary argument that consists of strings of length 1 as keys and arbitrary length strings as values, it'll build you a translation table usable with str.translate(), which creates a new copy of the string where each character has been mapped through the given translation table.
For example:
In [15]: trans = str.maketrans({
...: 'A': 'A ',
...: 'm': 'M ',
...: 'i': 'I ',
...: 'n': 'N ',
...: 'e': 'E '
...: })
In [16]: input("> ").translate(trans)
> Amine
Out[16]: 'A M I N E '
A dictionary mapping alphabet to special_alphabet may be the best design.
In your case, try:
for item in name:
    # alphabet is a list, so test membership and use index() (lists have no find())
    if item in alphabet:
        ind = alphabet.index(item)
        print(special_alphabet[ind])
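The dictionary design mentioned above could look roughly like this. The contents of special_alphabet are hypothetical stand-ins, since the real special characters are not shown in the question, and lower-casing the name is an assumption:

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical stand-ins for the asker's special characters
special_alphabet = ["<" + ch.upper() + ">" for ch in alphabet]

# One lookup table instead of two parallel lists
lookup = dict(zip(alphabet, special_alphabet))

def to_special(name):
    """Return the special form of every letter in name, in input order."""
    return [lookup[ch] for ch in name.lower() if ch in lookup]

for symbol in to_special("Amine"):
    print(symbol)
```

Each character of the name is looked up on its own, so its position in the name no longer has to match its position in the alphabet.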

Returning the value of an index in a python list based on other values

I have put the letters a-z in a list. How would I find the value of an item in the list depending on what the user typed?
For example if they type the letter a it would return c, f would return h and x would return z.
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
newletters = []
offset = 2
userInput = input('type a string')
newvalue = chr(ord(userInput)+offset)
split = list(newvalue)
print split
the above works for a character but not for a string..help?!
You can try this:
>>> offset = 2
>>> aString = raw_input("type a letter: ")
>>> aString
'a'
>>> chr(ord(aString)+offset)
'c'
documentation:
https://docs.python.org/2/library/functions.html#chr
https://docs.python.org/2/library/functions.html#ord
If you want to iterate over an entire string, a simple way is using a for loop. I assume the input string is always lowercase.
EDIT2: I improved the solution to handle the letters 'y' and 'z', which without "rotation" would map to a non-alphabetic character, e.g.:
# adding only the offset returns a non-alphabetic character
>>> chr(ord('z')+2)
'|'
# with rotation, 'z' becomes 'b'
>>> letter = "z"
>>> ord_letter = ord(letter)+offset
>>> ord_letter_rotated = ((ord_letter - 97) % 26) + 97
>>> chr(ord_letter_rotated)
'b'
The code solution:
offset = 2
aString = raw_input("type the string to convert: ")
#aString = "abz"
newString = ""
for letter in aString:
    ord_letter = ord(letter)+offset
    ord_letter_rotated = ((ord_letter - 97) % 26) + 97
    newString += chr(ord_letter_rotated)
print newString
The output of this code for the entire lowercase alphabet:
cdefghijklmnopqrstuvwxyzab
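The same modular rotation can be wrapped in a function. The sketch below targets Python 3 and passes any character outside a-z through unchanged, which is an assumption that goes beyond the lower-case-only input assumed above. The function name is made up for this example:

```python
def shift_letters(text, offset=2):
    """Rotate lower-case ASCII letters by offset, wrapping from 'z' back to 'a'."""
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            rotated = (ord(ch) - ord('a') + offset) % 26 + ord('a')
            result.append(chr(rotated))
        else:
            result.append(ch)  # leave spaces, digits, etc. untouched
    return ''.join(result)

print(shift_letters("abcdefghijklmnopqrstuvwxyz"))  # cdefghijklmnopqrstuvwxyzab
```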
Note: you can obtain the lowercase alphabet for free also this way:
>>> import string
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'
See the wikipedia page to learn something about ROT13:
https://en.wikipedia.org/wiki/ROT13
What should happen for z? Should it become b?
You can use Python's maketrans and translate functions to do this as follows:
import string
def rotate(text, by):
    s_from = string.ascii_lowercase
    s_to = string.ascii_lowercase[by:] + string.ascii_lowercase[:by]
    cypher_table = string.maketrans(s_from, s_to)
    return text.translate(cypher_table)
user_input = raw_input('type a string: ').lower()
print rotate(user_input, 2)
This works on the whole string as follows:
type a string: abcxyz
cdezab
How does it work?
If you print s_from and s_to they look as follows:
abcdefghijklmnopqrstuvwxyz
cdefghijklmnopqrstuvwxyzab
maketrans creates a mapping table to map characters in s_from to s_to. translate then applies this mapping to your string.
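The code above is Python 2 (raw_input, print statement, string.maketrans). For Python 3, a roughly equivalent sketch uses str.maketrans instead. The input is hard-coded here rather than read from the user:

```python
import string

def rotate(text, by):
    """Rotate lower-case letters by `by` positions using a translation table."""
    s_from = string.ascii_lowercase
    s_to = s_from[by:] + s_from[:by]
    table = str.maketrans(s_from, s_to)  # Python 3 replacement for string.maketrans
    return text.translate(table)

print(rotate("abcxyz", 2))  # cdezab
```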

Reverse complement of DNA strand using Python

I have a DNA sequence and would like to get reverse complement of it using Python. It is in one of the columns of a CSV file and I'd like to write the reverse complement to another column in the same file. The tricky part is, there are a few cells with something other than A, T, G and C. I was able to get reverse complement with this piece of code:
def complement(seq):
    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    bases = list(seq)
    bases = [complement[base] for base in bases]
    return ''.join(bases)
def reverse_complement(s):
    return complement(s[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCC"))
However, when I try to find the item which is not present in the complement dictionary, using the code below, I just get the complement of the last base. It doesn't iterate. I'd like to know how I can fix it.
def complement(seq):
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
bases = list(seq)
for element in bases:
if element not in complement:
print element
letters = [complement[base] for base in element]
return ''.join(letters)
def reverse_complement(seq):
return complement(seq[::-1])
print "Reverse Complement:"
print(reverse_complement("TCGGGCCCCX"))
The other answers are perfectly fine, but if you plan to deal with real DNA sequences I suggest using Biopython. What if you encounter a character like "-" or "*", or an ambiguous base? What if you want to do further manipulations of your sequences? Do you want to create a parser for each file format out there?
The code you ask for is as easy as:
from Bio.Seq import Seq
seq = Seq("TCGGGCCC")
print seq.reverse_complement()
# GGGCCCGA
Now if you want to do other transformations:
print seq.complement()
print seq.transcribe()
print seq.translate()
Outputs
AGCCCGGG
UCGGGCCC
SG
And if you run into strange chars, no need to keep adding code to your program. Biopython deals with it:
seq = Seq("TCGGGCCCX")
print seq.reverse_complement()
# XGGGCCCGA
In general, a generator expression is simpler than the original code and avoids creating extra list objects. If there can be multiple-character insertions go with the other answers.
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
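If the unexpected characters should also be reported, as the question asks, the same expression can be wrapped in a function that collects them. This is only a sketch. The function name and the choice to return the unknown characters as a sorted list are assumptions:

```python
COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def reverse_complement_with_report(seq):
    """Return (reverse complement, sorted list of characters with no complement)."""
    unknown = sorted(set(seq) - set(COMPLEMENT))
    # Unknown characters are passed through unchanged by get(base, base)
    result = "".join(COMPLEMENT.get(base, base) for base in reversed(seq))
    return result, unknown

result, unknown = reverse_complement_with_report("TCGGGCCCCX")
print(result)   # XGGGGCCCGA
print(unknown)  # ['X']
```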
import string
old_chars = "ACGT"
replace_chars = "TGCA"
tab = string.maketrans(old_chars,replace_chars)
print "AAAACCCGGT".translate(tab)[::-1]
That gives you the reverse complement: ACCGGGTTTT
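string.maketrans and the print statement are Python 2 only. A Python 3 sketch of the same idea, with a function name chosen here for illustration, might be:

```python
# Python 3: build the table with str.maketrans instead of string.maketrans
TABLE = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base via the table, then reverse the string."""
    return seq.translate(TABLE)[::-1]

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

Characters that are not in the table, such as N or X, pass through translate unchanged.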
The get method of a dictionary allows you to specify a default value if the key is not in the dictionary. As a preconditioning step I would map all your non-'ATGC' bases to single letters (or punctuation or numbers or anything that won't show up in your sequence), then reverse the sequence, then replace the single-letter alternates with their originals. Alternatively, you could reverse it first and then search and replace things like sni with ins.
alt_map = {'ins':'0'}
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
def reverse_complement(seq):
    for k,v in alt_map.iteritems():
        seq = seq.replace(k,v)
    bases = list(seq)
    bases = reversed([complement.get(base,base) for base in bases])
    bases = ''.join(bases)
    for k,v in alt_map.iteritems():
        bases = bases.replace(v,k)
    return bases
>>> seq = "TCGGinsGCCC"
>>> print "Reverse Complement:"
>>> print(reverse_complement(seq))
GGGCinsCCGA
A compact one-liner for the reverse complement is the following:
def rev_compl(st):
    nn = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
    return "".join(nn[n] for n in reversed(st))
def ReverseComplement(Pattern):
    # note: this only reverses Pattern; pass the result to compliment() below
    revcomp = []
    x = len(Pattern)
    for i in Pattern:
        x = x - 1
        revcomp.append(Pattern[x])
    return ''.join(revcomp)
# this is for the complement
def compliment(Nucleotide):
    comp = []
    for i in Nucleotide:
        if i == "T":
            comp.append("A")
        if i == "A":
            comp.append("T")
        if i == "G":
            comp.append("C")
        if i == "C":
            comp.append("G")
    return ''.join(comp)
Try the code below:
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
seq = "TCGGGCCC"
reverse_complement = "".join(complement.get(base, base) for base in reversed(seq))
Considering also degenerate bases:
def rev_compl(seq):
    BASES ='NRWSMBDACGTHVKSWY'
    return ''.join([BASES[-j] for j in [BASES.find(i) for i in seq][::-1]])
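One caveat with the index trick: str.find returns -1 for a character that is not in BASES, so such a character is silently turned into 'R'. A more explicit alternative, sketched here for Python 3 with the standard IUPAC complement pairs written out, leaves unknown characters unchanged. The name rev_compl_iupac is made up for this example:

```python
def rev_compl(seq):
    """The index trick from above, repeated so the comparison is runnable."""
    BASES = 'NRWSMBDACGTHVKSWY'
    return ''.join([BASES[-j] for j in [BASES.find(i) for i in seq][::-1]])

# Explicit IUPAC pairs: A-T, C-G, R-Y, K-M, B-V, D-H; S, W and N map to themselves
IUPAC_TABLE = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

def rev_compl_iupac(seq):
    """Reverse complement that passes characters outside the table through."""
    return seq.translate(IUPAC_TABLE)[::-1]

print(rev_compl("ACGTN"), rev_compl_iupac("ACGTN"))  # NACGT NACGT
print(rev_compl("X"), rev_compl_iupac("X"))          # R X
```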
This may be the quickest way to compute a reverse complement:
def complement(seq):
    complementary = { 'A':'T', 'T':'A', 'G':'C','C':'G' }
    return ''.join(reversed([complementary[i] for i in seq]))
Using the timeit module for speed profiling, this is the fastest algorithm my coworkers and I came up with for sequences shorter than 200 nucleotides:
(sequence
    .replace('A', '*')  # temporary symbol
    .replace('T', 'A')
    .replace('*', 'T')
    .replace('C', '&')  # temporary symbol
    .replace('G', 'C')
    .replace('&', 'G')[::-1])
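Speed claims like this are easy to check on your own machine. The sketch below is for Python 3. It confirms that the replace chain and a translation table give the same result, then times both with timeit. The test sequence and repeat count are arbitrary assumptions, and the timings will vary between machines and Python versions:

```python
import timeit

TABLE = str.maketrans("ACGT", "TGCA")

def revcomp_translate(seq):
    return seq.translate(TABLE)[::-1]

def revcomp_replace(seq):
    # '*' and '&' are temporary symbols that must not occur in the input
    return (seq.replace('A', '*').replace('T', 'A').replace('*', 'T')
               .replace('C', '&').replace('G', 'C').replace('&', 'G')[::-1])

sequence = "ACGT" * 40  # hypothetical 160-nucleotide test sequence
assert revcomp_translate(sequence) == revcomp_replace(sequence)

for func in (revcomp_translate, revcomp_replace):
    elapsed = timeit.timeit(lambda: func(sequence), number=10000)
    print(func.__name__, round(elapsed, 4), "seconds")
```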

How could I print out the nth letter of the alphabet in Python?

ASCII math doesn't seem to work in Python:
'a' + 5
DOESN'T WORK
How could I quickly print out the nth letter of the alphabet without having an array of letters?
My naive solution is this:
letters = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
print letters[5]
chr and ord convert characters from and to integers, respectively. So:
chr(ord('a') + 5)
is the letter 'f'.
ASCII math aside, you don't have to type your letters table by hand.
The string constants in the string module provide what you were looking for.
>>> import string
>>> string.ascii_uppercase[5]
'F'
>>>
chr(ord('a')+5)
If you want to go really out of your way (probably not a great idea), you could create a new class CharMath:
class CharMath:
    def __init__(self,char):
        if len(char) > 1: raise IndexError("Not a single character provided")
        else: self.char = char
    def __add__(self,num):
        # chr() only accepts integers, so floats are rejected as well
        if isinstance(num, int): return chr(ord(self.char) + num)
        raise TypeError("Integer not provided")
The above can be used:
>>> CharMath("a") + 5
'f'
import string
print string.letters[n + is_upper*26]
For example:
>>> n = 5
>>> is_upper = False
>>> string.letters[n+is_upper*26]
'f'
>>> is_upper = True
>>> string.letters[n+is_upper*26]
'F'
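string.letters exists only in Python 2. In Python 3 the same indexing works with string.ascii_letters, which is the lower-case letters followed by the upper-case ones. A minimal sketch, with a function name chosen here for illustration:

```python
import string

def nth_letter(n, is_upper=False):
    """Return the letter at zero-based position n, optionally upper-case."""
    # ascii_letters holds 26 lower-case letters, then 26 upper-case ones
    return string.ascii_letters[n + is_upper * 26]

print(nth_letter(5))        # f
print(nth_letter(5, True))  # F
```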
You need to use the ord function together with chr, like
print(chr(ord('a')+5))
Edit: gah, I was too slow :)
If you like it short, try the lambda notation:
>>> A = lambda c: chr(ord('a')+c)
>>> A(5)
'f'
