I have been working on the Rosalind Bioinformatics Stronghold exercise on RNA splicing, using Python 3.6. Python doesn't report any error, so I assumed my code was fine. However, it produces no output at all, not even a warning. Below is my code:
DNA_CODON_TABLE = {
'TTT': 'F', 'CTT': 'L', 'ATT': 'I', 'GTT': 'V',
'TTC': 'F', 'CTC': 'L', 'ATC': 'I', 'GTC': 'V',
'TTA': 'L', 'CTA': 'L', 'ATA': 'I', 'GTA': 'V',
'TTG': 'L', 'CTG': 'L', 'ATG': 'M', 'GTG': 'V',
'TCT': 'S', 'CCT': 'P', 'ACT': 'T', 'GCT': 'A',
'TCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',
'TCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',
'TCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',
'TAT': 'Y', 'CAT': 'H', 'AAT': 'N', 'GAT': 'D',
'TAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',
'TAA': '-', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',
'TAG': '-', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',
'TGT': 'C', 'CGT': 'R', 'AGT': 'S', 'GGT': 'G',
'TGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',
'TGA': '-', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',
'TGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'
}
def result(s):
    result = ''
    lines = s.split()
    dna = lines[0]
    introns = lines[1:]
    for intron in introns:
        dna = dna.replace(intron, '')
    for i in range(0, len(dna), 3):
        codon = dna[i:i+3]
        protein = None
        if codon in DNA_CODON_TABLE:
            protein = DNA_CODON_TABLE[codon]
        if protein == '-':
            break
        if protein:
            result += protein
    return ''.join(list(result))

if __name__ == "__main__":
    """small_dataset = ' '"""
    large_dataset = open('rosalind_splc.txt').read().strip()
    print (result(large_dataset))
This is the content of the rosalind_splc.txt file:
>Rosalind_3363
ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG
CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG
AACATTCGCAACTTACAGCTCTCGAGAGGGTACAGCTGGACGGTGTTTGTTTGGTCTAAG
TCTGAGTCCAAAGTCGTTGAATGTCGAGCTAGGTTGACGTCATTCTTCGAGTTACGTCTT
CATTGATTCGCGGCGGCCGCCAGCATTTGATTGTACACATCCGACGTCTTTGGCAATCTA
CATAATTATATTGAGAGGGGCGCCATTACTCGAACCCATAACAAACAACTGTCCGTTTAC
AAGGTTATATTATCATGACCTAATGGTTGAGCTACGGAGTGGGGGGCCCTCGGCTACAGG
TGTTAAACTATCCTGCGGATGCGGATCTTAGCCCGATTTGCATGGCCCAGTAAGGCGCTG
ATTGTAAACCGCCTAGCATACATGTGCTTCTTACTCCAGGGTCCATTGCTACCAGTTCGC
TTCTGACGCCTCAATTGTACCTTCCTTTTTTGAATGGCAACCTGCAATAGCAGTCGACTG
ATGGGGCGTTACAGTATGAAGGCTATATTTACATTATCTCTAAACACACTGCTACCGCGA
AACCCCAACTCGGACCGGTCAGAGCGCTCGTGCTTTGTTCTTGGTCGCTAGCGACCAACA
GTGGATAGGTGGGCGCGGGCCTTGCACCTCCTAGAGCATCACGTGGAGTGGATGCAAACA
GTCTATGGTCCCCCGCTTCGGCTCACGGGTAACGTCTCTTGTGGTACTAGACCATAGGCA
TCCAGGTGAGGGCTACATCCGTATTTAATGAAACTGAGTTCCTCCAAAGCTCCTCGGGAC
GCAGGCAGGTTCATCCGCAGTCAGTAAGGGAGGGAAGAGCTTTCCCCGTTCCACCCAGAT
GCCCTGTGCACGGGAGAGAGATCCAGGTGGTAG
>Rosalind_0423
TCGCAACTTACAGCTCTCGAGAGGG
>Rosalind_5768
GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT
>Rosalind_6780
GTCTTCATTGATTCGCGGCGGCCGCCAGCA
>Rosalind_6441
GCAAACAGTCT
>Rosalind_3315
TTGGTCGCTAGCGACCAACAGTGGATAGGTGGGCGCGGGCCTTGCACCT
>Rosalind_7467
TTATCTCTAAACACACTGC
>Rosalind_3159
CGCAGTCAGTAAGGGAGG
>Rosalind_6420
TCTAAGTCTGAGTCCAAAGTCGTTGAATGTCGAGCTAGGTTGACGT
>Rosalind_8344
GGGGCGCCATTACTCGAACCCATAACAAACAACT
>Rosalind_2993
CCAGGTGAGGGCTACATCCGTAT
>Rosalind_0536
ATTATCATGACCTAATG
>Rosalind_3774
TCGCAACCATGTTCCAT
>Rosalind_7168
GGGCCCTCGGCTACAGGTGTTAAACTAT
>Rosalind_8059
CAATTGTACCTTCCTTTTTTGAATG
Since no output is produced, I would like to know which part of my code needs to be fixed to get the output. Thanks.
To understand which part of your code you need to change, it helps to understand what goes wrong in your code. If you have a code editor with a debugger, it helps to step through the code. If you don't have one, you can use the online tool http://pythontutor.com. Here is a direct link to your code with the first few lines of your input.
Click the forward button under the code. At step 20 you jump into your function result(). After step 24, your input has been split on whitespace (here, the newlines). You can see that lines is now:
lines = ['>Rosalind_3363',
'ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG',
'CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG',
'>Rosalind_0423',
'TCGCAACTTACAGCTCTCGAGAGGG',
'>Rosalind_5768',
'GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT']
In step 25, you assign the first item of lines to the variable dna. So dna is now equal to >Rosalind_3363. You assign the rest of the items in the list to the variable introns in the next step. So now we have
dna = '>Rosalind_3363'
introns = ['ATGGGGCTGAGCCCATGTCTAAATGATATCTTGGTGCATTGCAATCTAACTATTTTTTCG',
'CAACCATGTTCCATCTGGCGCAAAATGGGCGTGTAGGGAGCTTCGCTATAGTCACTGAAG',
'>Rosalind_0423',
'TCGCAACTTACAGCTCTCGAGAGGG',
'>Rosalind_5768',
'GCCCAGTAAGGCGCTGATTGTAAACCGCCTAGCATACAT']
Here the first signs of trouble are already apparent. You probably expect dna to contain a DNA sequence, but it contains the sequence header of the FASTA file. Similarly, introns should contain only DNA sequences, but here it also contains FASTA sequence headers (>Rosalind_0423, >Rosalind_5768).
So what happens in the next lines no longer makes sense with the data you have now.
In the lines
for intron in introns:
    dna = dna.replace(intron, '')
you want to remove the introns from the DNA, but dna doesn't contain a DNA sequence, and introns contains things other than substrings of dna. So after this loop, dna still equals >Rosalind_3363. None of the three-letter chunks of dna (>Ro, sal, ind, ...) is a valid codon, so none of them is found in DNA_CODON_TABLE. Hence, result() returns an empty string.
Here is my guess as to what happened: you copied the code verbatim from the internet (it is exactly the same as the code here) without understanding what it does, and without realizing that the original author had already preprocessed the input data.
So, what do you need to do to fix the code?
1. Parse the FASTA file, for example using Bio.SeqIO.parse().
2. If necessary, concatenate the DNA lines of the first sequence into one string. This is what should end up in your dna variable.
3. The sequence strings of the following records are what should end up in your introns variable.
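A minimal sketch of those three steps, written without Biopython: parse_fasta() and splice_and_translate() are hypothetical helpers, with parse_fasta() standing in for Bio.SeqIO.parse(). The codon table is generated compactly instead of written out; it is intended to be the same mapping as DNA_CODON_TABLE, with '-' for stop codons.

```python
from itertools import product

# Residues in the order product("TCAG", repeat=3) yields codons:
# TTT, TTC, TTA, TTG, TCT, ... ('-' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY--CC-W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def parse_fasta(text):
    """Hypothetical helper: return (header, sequence) pairs from FASTA text.

    Sequence lines of one record are concatenated, so a DNA string wrapped
    over several lines comes back as a single string.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def splice_and_translate(fasta_text):
    records = parse_fasta(fasta_text)
    dna = records[0][1]                        # first record: the full DNA string
    introns = [seq for _, seq in records[1:]]  # remaining records: the introns
    for intron in introns:
        dna = dna.replace(intron, "")
    protein = []
    # Translate complete codons only and stop at the first stop codon.
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "-":
            break
        protein.append(aa)
    return "".join(protein)
```

With the file from the question, something like print(splice_and_translate(open('rosalind_splc.txt').read())) should print the spliced protein string.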
Related
I am trying to write code that breaks a password I create. At first I had it generate random guesses, and eventually it would hit the right one.
But I realized it would be better to go through the letters systematically: cycle one position through every letter, and once all the letters have been tried, change the next position.
For example: AA, AB, AC, ..., AY, AZ, BA, BB, BC.
I understand that I could write a loop to print every single letter, but how would I change the first letter after I have gone through every letter?
It also needs to break a password of any length, so the number of letters the loop changes would have to vary. I also need to get rid of the brackets and quotes in the output.
lower_upper_alphabet = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
while done == 0:
    for i in range (int(passwordlen)):
        for i in range(52):
            for i in range(len(lower_upper_alphabet)):
                characters2 = []
                characters2.append(str(lower_upper_alphabet[next1]))
                next1 += 1
                print(characters2)
Output:
["A"]
["B"]
["C"]
["D"]
["E"]
["F"]
["G"]
["H"]
["I"]
["J"]
["K"]
["L"]
["M"]
["N"]
["O"]
["P"]
["Q"]
["R"]
["S"]
["T"]
["U"]
["V"]
["W"]
["X"]
["Y"]
["Z"]
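A common way to get the AA, AB, AC, ... ordering for any length, without writing one loop per character, is itertools.product with repeat set to the password length. A minimal sketch: crack() and the target "Hi" are hypothetical, and string.ascii_letters stands in for lower_upper_alphabet (the same 52 letters, lowercase first). Joining each tuple with "".join() also removes the brackets and quotes from the output.

```python
import string
from itertools import product

ALPHABET = string.ascii_letters  # 'abc...zABC...Z', same order as lower_upper_alphabet


def crack(password, max_len=4):
    """Hypothetical brute force: try every string up to max_len, shortest first."""
    for length in range(1, max_len + 1):
        # product() varies the last position fastest: aa, ab, ..., aZ, ba, bb, ...
        for combo in product(ALPHABET, repeat=length):
            guess = "".join(combo)  # tuple of characters -> plain string
            if guess == password:
                return guess
    return None


print(crack("Hi"))
```

The number of guesses grows as 52 to the power of the length, so this is only practical for short passwords.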
I am trying to print a Python list using join after it has randomly selected a specified number of characters. I want it to print all the characters next to each other instead of each character on a separate line. Everything works fine up until my for statement; if I print out password_letters, it prints the specified number of letters based on nr_letters (on separate lines). All I want is to join the selected letters onto one line. I have followed the documentation here and some results on Google, but I still can't find where I have gone wrong.
Please help me find where I have gone wrong in the below code:
import random
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
nr_letters= int(input("How many letters would you like in your password?\n"))
password_letters = random.sample(letters, nr_letters )
for letter in password_letters:
    print("".join(letter))
No need for a loop, just join the list.
print("".join(password_letters))
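One detail worth knowing: random.sample() picks without replacement, so no letter can appear twice and nr_letters cannot exceed the 52 available letters (it raises ValueError otherwise). If repeated letters should be allowed, random.choices() with k (Python 3.6+) samples with replacement. A small sketch; string.ascii_letters replaces the hand-written list here, assuming the same 52 characters are intended.

```python
import random
import string

letters = string.ascii_letters  # the same 52 characters as the letters list
nr_letters = 8

# Without replacement: every character in the password is distinct.
unique_password = "".join(random.sample(letters, nr_letters))

# With replacement: characters may repeat, and nr_letters may exceed 52.
repeat_password = "".join(random.choices(letters, k=nr_letters))

print(unique_password)
print(repeat_password)
```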
I want to write a really short script that will help me generate a random/nonsense word with the following qualities:
-Has 8 letters
-First letter is "A"
-Second and Fourth letters are random letters
-Fifth letter is a vowel
-Sixth and Seventh letters are random letters and are the same
-Eighth letter is a vowel that's not "a"
This is what I have tried so far (using all the info I could find and understand online):
firsts = 'A'
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
print [''.join(first, second, third, fourth, fifth)
for first in firsts
for second in seconds
for third in thirds
for fourth in fourths
for fifth in fifths
for sixth in sixths
for seventh in sevenths
for eighth in eighths]
However, it keeps raising SyntaxError: invalid syntax after the for, and now I have absolutely no idea how to make this work. If possible, please look into this for me. Thank you so much!
The function you need to know about to pick a random letter is random.choice. You can pass it a list and it will return a random element of that list. It also works with strings, because random.choice accepts any non-empty sequence and a string is a sequence of characters. To make your life easier, also use the string module: string.ascii_lowercase is a string of all the letters from a to z, so you don't have to type them out. Lastly, you don't need loops to join a handful of strings. Keep it simple: you can just add them together with +.
import string
from random import choice
first = 'A'
second = choice(string.ascii_lowercase)
third = choice(string.ascii_lowercase)
fourth = choice(string.ascii_lowercase)
fifth = choice("aeiou")
sixthSeventh = choice(string.ascii_lowercase)
eighth = choice("eiou")
word = first + second + third + fourth + fifth + sixthSeventh + sixthSeventh + eighth
print(word)
Try this (it reuses the lists defined in the question):
import random
sixth=random.choice(sixths)
s='A'+random.choice(seconds)+random.choice(thirds)+random.choice(fourths)+random.choice(fifths)+sixth+sixth+random.choice(eighths)
print(s)
Output:
Awixonno
Ahiwojjy
etc
There are several things to consider. First, the str.join() method takes in an iterable (e.g. a list), not a bunch of individual elements. Doing
''.join([first, second, third, fourth, fifth])
fixes the program in this respect. If you are using Python 3, print() is a function, and so you should add parentheses around the entire list comprehension.
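A minimal illustration of the difference: str.join() accepts exactly one argument, an iterable of strings, so passing the letters as separate arguments raises a TypeError, while wrapping them in a list (or a tuple, or a generator) works.

```python
first, second, third = "A", "b", "c"

# str.join() takes ONE argument: an iterable of strings.
joined = "".join([first, second, third])
print(joined)  # Abc

# Passing the strings as separate arguments fails with a TypeError.
error = None
try:
    "".join(first, second, third)
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```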
With the syntax out of the way, let's get to a more interesting problem: your program constructs every possible word (82,255,680 of them!). This takes a lot of time and memory. What you want is probably to pick just one. You could of course construct them all first and then pick one at random, but it's far cheaper to pick one letter at random from each of firsts, seconds, etc. and then join these. All together then:
import random
firsts = ['A']
seconds = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
thirds = ['a', 'e', 'i', 'o', 'u', 'y']
fourths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
fifths = ['a', 'e', 'i', 'o', 'u', 'y']
sixths = sevenths = ['a','b','c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
eighths = ['e', 'i', 'o', 'u', 'y']
result = ''.join([
random.choice(firsts),
random.choice(seconds),
random.choice(thirds),
random.choice(fourths),
random.choice(fifths),
random.choice(sixths),
random.choice(sevenths),
random.choice(eighths),
])
print(result)
To improve the code from here, try to:
Find a way to generate the "data" in a neater way than writing it out explicitly. As an example:
import string
seconds = list(string.ascii_lowercase) # you don't even need list()!
Instead of having a separate variable firsts, seconds, etc., collect these into a single variable, e.g. a single list containing each original list as a single str with all characters included.
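Putting both suggestions together, a minimal sketch might keep the options for each of the eight positions in a single list of strings built from string.ascii_lowercase. The same list makes the 82,255,680 figure easy to check, since the number of possible words is the product of the option counts. Like the code above, this draws the sixth and seventh letters independently, so they are not forced to match.

```python
import random
import string

lower = string.ascii_lowercase  # 'abcdefghijklmnopqrstuvwxyz'
# One string of allowed characters per position, in order.
positions = ["A", lower, "aeiouy", lower, "aeiouy", lower, lower, "eiouy"]

# Number of distinct words the original list comprehension would build.
total = 1
for options in positions:
    total *= len(options)
print(total)  # 82255680

# Pick one character per position instead of building every word.
word = "".join(random.choice(options) for options in positions)
print(word)
```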
This will implement what you describe. You can make the code neater by putting the choices into one overall list rather than having several different variables, but you will have to explicitly deal with the fact that the sixth and seventh letters are the same: offering the same choices for both positions does not guarantee that the same letter is picked.
The list choices_list could contain sub-lists as in your original code, but since you are choosing single characters, random.choice works equally well with strings, and this also makes the code a bit neater.
import random
choices_list = [
'A',
'abcdefghijklmnopqrstuvwxyz',
'aeiouy',
'abcdefghijklmnopqrstuvwxyz',
'aeiouy',
'abcdefghijklmnopqrstuvwxyz',
'eiouy'
]
letters = [random.choice(choices) for choices in choices_list]
word = ''.join(letters[:6] + letters[5:]) # here the 6th letter gets repeated
print(word)
Some example outputs:
Alaeovve
Aievellu
Ategiwwo
Aeuzykko
Here's the syntax fix:
print(["".join([first, second, third])
for first in firsts
for second in seconds
for third in thirds])
This method might take up a lot of memory.
Here is the code:
test = "\n".join(["gym", "meetup", "Christian associations"])
print(sorted(test, reverse=True))
Can someone please explain why I'm getting the output below instead of the reversed list? I don't understand where this long list of gibberish comes from.
['y', 'u', 't', 't', 't', 's', 's', 's', 's', 'r', 'p', 'o', 'o', 'n', 'n', 'm', 'm', 'i', 'i', 'i', 'i', 'h', 'g', 'e', 'e', 'c', 'a', 'a', 'a', 'C', ' ', '\n', '\n']
If you want to reverse the list, you can try this code:
test = ["gym", "meetup", "Christian associations"]
test.reverse()
print(test)
In your code, you joined the list into a single string and then passed that string to sorted(), which sorts the characters of the string, not the elements of the list.
Explanation of your code:
Your first line produces the following string: "gym\nmeetup\nChristian associations"
Your second line treats that string as a sequence of characters, sorts the characters by their Unicode code point (in descending order, because of reverse=True), and returns a list of characters.
Hope that's clear.
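A short sketch of the behaviour described above: sorted() accepts any iterable, so given a string it works character by character, and characters compare by Unicode code point. That is why '\n' (10) and ' ' (32) sort before 'C' (67), and 'C' before every lowercase letter (97 and up). Passing the list itself compares whole strings instead.

```python
words = ["gym", "meetup", "Christian associations"]

# Sorting a string sorts its individual characters by code point.
chars = sorted("Ca \n")
print(chars)                     # ['\n', ' ', 'C', 'a']
print([ord(c) for c in chars])   # [10, 32, 67, 97]

# Sorting the list compares whole strings ('C' < 'g' < 'm').
by_text = sorted(words, reverse=True)
print(by_text)    # ['meetup', 'gym', 'Christian associations']

# Reversing keeps the original order, just backwards.
backwards = words[::-1]
print(backwards)  # ['Christian associations', 'meetup', 'gym']
```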
Are you trying to achieve something like this:
print('\n'.join(sorted(["gym", "meetup", "Christian associations"], reverse=True)))
Output:
meetup
gym
Christian associations
I am working through the Rosalind problems and I've become stuck on what the issue with my code is. The problem is:
Either strand of a DNA double helix can serve as the coding strand for RNA transcription. Hence, a given DNA string implies six total reading frames, or ways in which the same region of DNA can be translated into amino acids: three reading frames result from reading the string itself, whereas three more result from reading its reverse complement.
An open reading frame (ORF) is one which starts from the start codon and ends by stop codon, without any other stop codons in between. Thus, a candidate protein string is derived by translating an open reading frame into amino acids until a stop codon is reached.
Given: A DNA string s of length at most 1 kbp in FASTA format.
Return: Every distinct candidate protein string that can be translated from ORFs of s. Strings can be returned in any order.
Here is my code (Python):
DNA_Codons = {
'TTT': 'F', 'CTT': 'L', 'ATT': 'I', 'GTT': 'V',
'TTC': 'F', 'CTC': 'L', 'ATC': 'I', 'GTC': 'V',
'TTA': 'L', 'CTA': 'L', 'ATA': 'I', 'GTA': 'V',
'TTG': 'L', 'CTG': 'L', 'ATG': 'M', 'GTG': 'V',
'TCT': 'S', 'CCT': 'P', 'ACT': 'T', 'GCT': 'A',
'TCC': 'S', 'CCC': 'P', 'ACC': 'T', 'GCC': 'A',
'TCA': 'S', 'CCA': 'P', 'ACA': 'T', 'GCA': 'A',
'TCG': 'S', 'CCG': 'P', 'ACG': 'T', 'GCG': 'A',
'TAT': 'Y', 'CAT': 'H', 'AAT': 'N', 'GAT': 'D',
'TAC': 'Y', 'CAC': 'H', 'AAC': 'N', 'GAC': 'D',
'TAA': '-', 'CAA': 'Q', 'AAA': 'K', 'GAA': 'E',
'TAG': '-', 'CAG': 'Q', 'AAG': 'K', 'GAG': 'E',
'TGT': 'C', 'CGT': 'R', 'AGT': 'S', 'GGT': 'G',
'TGC': 'C', 'CGC': 'R', 'AGC': 'S', 'GGC': 'G',
'TGA': '-', 'CGA': 'R', 'AGA': 'R', 'GGA': 'G',
'TGG': 'W', 'CGG': 'R', 'AGG': 'R', 'GGG': 'G'
}
bases={"A":"T",
"T":"A",
"G":"C",
"C":"G"}
def Pro(DNA, start, Rev):
    #Calculates the reverse complement if Rev is True
    if Rev == True:
        reverse=DNA[::-1]
        compliment=[]
        for base in reverse:
            compliment+=bases[base]
        Seq="".join(compliment)
    elif Rev== False:
        Seq=DNA
    Protein=[]
    #Finds a start codon
    for i in range(start, len(Seq),3):
        codon=Seq[i:i+3]
        if codon=="ATG":
            #Starting from that start codon, returns a protein, breaks if stop codon
            #-2 included so that it's always in blocks of 3
            for j in range(i,len(Seq)-2,3):
                new_codon=Seq[j:j+3]
                if DNA_Codons[new_codon]!="-":
                    Protein+=[DNA_Codons[new_codon]]
                else:
                    #Adds in the '-' to split proteins that start within the same Reading Frame
                    Protein+=[DNA_Codons[new_codon]]
                    break
    return Protein
f = open('rosalind_orf.txt','r').read()
#Puts each FASTA string into an array
strings=f.split(">")
#removes the FASTA ID and newline characters from each string in the array
for i in range(len(strings)):
    strings[i]=strings[i].strip("Rosalind_0123456789")
    strings[i]=strings[i].replace("\n","")
DNA=strings[1]
#Adds proteins from all Open Reading Frames
Proteins=[]
for i in range(len(DNA)):
    Proteins+="".join(Pro(DNA,i,False)).split('-')
    Proteins+="".join(Pro(DNA,i,True)).split('-')
#Makes a list of unique proteins and prints them
Unique_Proteins=[]
for p in Proteins:
    if (p not in Unique_Proteins and p!=""):
        Unique_Proteins+=[p]
        print(p)
Using the sample data:
>Rosalind_99
AGCCATGTAGCTAACTCAGGTTACATGGGGATGACCCCGCGACTTGGATTAGAGTCTCTTTTGGAATAAGCCTGAATGATCCGAGTAGCATCTCAG
My code works fine on the sample data; however, it fails on every actual dataset I've been given.
Here is one of the question datasets that I've failed on:
>Rosalind_1485
GACCAGAATGCGTTAGTCGGCCTCAGAGCGCACAAAAACCAGTATTTACAAAGTGGGACG
TAGCGCCCCGCGGCGTCCTTTTGCCCTATCGAAAGTATAGGCATCAGCTTTTTACCACCT
TGTCATAGGTAAACTGCCCGACCCAGGTCCGGCCCTCAGCCCAACGCAGATAAACCAAGG
TTATAGATGTGGCCTGTAGGCATATTGCTCTTAATGTTATAAAGAGCGAAGCGTGGTCTC
GGTTTGTAAACATTAATCAAATTCCCAGGCACTAAGCCATGGTCGCCCCGGATTGGTTTT
CCGGTGTACGCATCGGTGGCAGCTGGAGGGGACAGTTTAGGTGCTGCAATTGAACATGAA
ACTGCACGAAAGGTGGGGTGGGCCGGATCTTGCGGGCCTCGAAAGGGTAGTGTTCCTCTG
CTATCTAGTCCAATTACCTGTAGTATATATGATCAGGCCGTCGGTTACTTAGCTAAGTAA
CCGACGGCCTGATCATCTCCTAGGAAATGGTCCTGAATGCGAACTAGGTTCCGTGGAATG
ATGGGGCCCAGAGGAAACCTGTACGCAATGGATCCCGGACAGATAGACCGGGAGGTCTTG
CAACCTCTTGTGGGAGTTACAGGCCGTACCTGAATTGCCCTCGTACCATTTGAAATGGTG
CGACGCCTGTACGCAACAATCGTTCGCCTGGATAATACAGACGGCCATTTCTGTAGGAAC
GATACCGTAACGCGACGTCAGGCATGACGTTAACTGCGTCACGTTTCATACCACTATGTG
AGGTACCCACTCCTTCATTTACCGCGAGATAAAGAGCCACCACCACCTTCTCTTGGTTTC
CATGCGCCGATCGGCTAAACGTGCATCACATTCAGGCGAAGAGTCAAATGGAAGCTCGCA
ATTTTAGGCCTTTATGGCGAATATCCCGCAAGCCTTAGGCGCGT
Obviously this code is nowhere near efficient and there's a lot that could be improved; I'm just curious why it's not working.
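One thing worth checking against the problem statement: it defines an ORF as running from a start codon to a stop codon. In Pro(), when the inner loop reaches the end of Seq without meeting a stop codon, the residues collected so far stay in Protein, and after split('-') they are reported as a candidate protein even though that reading frame never terminates. The sample data may simply not exercise this case. Below is a minimal sketch of an ORF search that keeps only translations ending at a stop codon; it is an illustration under that reading of the problem, not a verified Rosalind solution, and orfs() and reverse_complement() are hypothetical names.

```python
from itertools import product

# Standard codon table ('-' = stop), in the order product("TCAG", repeat=3) yields codons.
AMINO_ACIDS = (
    "FFLLSSSSYY--CC-W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def orfs(dna):
    """Return the distinct proteins of all ORFs (ATG ... stop) on both strands."""
    proteins = set()
    for seq in (dna, reverse_complement(dna)):
        for start in range(len(seq) - 2):
            if seq[start:start + 3] != "ATG":
                continue
            residues = []
            for i in range(start, len(seq) - 2, 3):
                aa = CODON_TABLE[seq[i:i + 3]]
                if aa == "-":
                    # Only a frame that actually reaches a stop codon counts.
                    proteins.add("".join(residues))
                    break
                residues.append(aa)
            # If the loop ends without a stop codon, nothing is added.
    return proteins
```

For the Rosalind input, the sequence would first be extracted from the FASTA file (for example with the question's own parsing code), and each element of orfs(DNA) printed on its own line.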