Consensus string & profile matrix - python

I have written the following python code to solve one of the Rosalind problems (http://rosalind.info/problems/cons/) and for some reason, Rosalind says the answer is wrong but I did some spot-checking and it appears right.
The problem is as follows:
Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
A sample dataset is:
>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
A sample solution is:
ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
My attempt to solve this:
from Bio import SeqIO
A,C,G,T = [],[],[],[]
consensus=""
for i in range(0,len(record.seq)):
    countA,countC,countG,countT=0,0,0,0
    for record in SeqIO.parse("fasta.txt", "fasta"):
        if record.seq[i]=="A":
            countA=countA+1
        if record.seq[i]=="C":
            countC=countC+1
        if record.seq[i]=="G":
            countG=countG+1
        if record.seq[i]=="T":
            countT=countT+1
    A.append(countA)
    C.append(countC)
    G.append(countG)
    T.append(countT)
    if countA >= max(countC,countG,countT):
        consensus=consensus+"A"
    elif countC >= max(countA,countG,countT):
        consensus=consensus+"C"
    elif countG >= max(countA,countC,countT):
        consensus=consensus+"G"
    elif countT >= max(countA,countC,countG):
        consensus=consensus+"T"
print("A: "+" ".join([str(i) for i in A]))
print("C: "+" ".join([str(i) for i in C]))
print("G: "+" ".join([str(i) for i in G]))
print("T: "+" ".join([str(i) for i in T]))
print(consensus)
Would be great if someone can take a look and suggest what I am doing wrong? Many thanks!

For your consensus string, your code is not handling the case in which you have a tie, i.e., two nucleotides in a given position are equally frequent. The way your code is written now, this case will result in nothing being printed at that position in the consensus string

In this part:
if countA >= max(countC,countG,countT):
    consensus=consensus+"A"
elif countC >= max(countA,countG,countT):
    consensus=consensus+"C"
elif countG >= max(countA,countC,countT):
    consensus=consensus+"G"
elif countT >= max(countA,countC,countG):
    consensus=consensus+"T"
Use this instead and you will get your consensus sequence correctly:
if countA[i] >= max(countC[i],countG[i],countT[i]):
    consensus+="A"
if countC[i] >= max(countA[i],countG[i],countT[i]):
    consensus+="C"
if countG[i] >= max(countA[i],countC[i],countT[i]):
    consensus+="G"
if countT[i] >= max(countA[i],countC[i],countG[i]):
    consensus+="T"
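For comparison, here is a minimal sketch of the whole task that does not depend on Biopython. It takes the sequences as a plain list of strings (the function name `profile_and_consensus` is made up for this illustration) and uses `max` with a key so that exactly one nucleotide is chosen per column; on a tie the earliest letter in "ACGT" wins, which the problem statement allows. Note that independent `if` tests would append more than one letter when counts tie, whereas an `if`/`elif` chain or `max` appends exactly one. It may also be worth comparing the output order with the sample solution, which lists the consensus string before the profile matrix.

```python
def profile_and_consensus(sequences):
    """Return (consensus, profile) for equal-length DNA strings."""
    length = len(sequences[0])
    # One list of per-column counts for each nucleotide.
    profile = {base: [0] * length for base in "ACGT"}
    for seq in sequences:
        for i, base in enumerate(seq):
            profile[base][i] += 1
    # max() returns the first maximal item, so ties go to the earliest of A, C, G, T.
    consensus = "".join(
        max("ACGT", key=lambda base: profile[base][i]) for i in range(length)
    )
    return consensus, profile


# Sample dataset from the problem statement.
sample = ["ATCCAGCT", "GGGCAACT", "ATGGATCT", "AAGCAACC",
          "TTGGAACT", "ATGCCATT", "ATGGCACT"]
consensus, profile = profile_and_consensus(sample)
print(consensus)
for base in "ACGT":
    print(base + ": " + " ".join(str(n) for n in profile[base]))
```

With Biopython, the list could presumably be built as `[str(r.seq) for r in SeqIO.parse("fasta.txt", "fasta")]` before calling the function.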

Related

What does N[1][i] do if N is an array and N[1][i] is inside a for loop? I am having trouble understanding

So I was looking at some python code and came across this piece of code: It supposedly scans for all instances of a number in a string, but I don't really get it. Here is my code:
inp = open("socdist1.in").read().strip().split()
print(inp)
n = int(inp[0])
cow_places = []
for i in range(n):
    if (inp[1][i] == "1"):
        cow_places.append(i)
print(cow_places)
Thanks Poke, Navaneeth Reddy and Hamza for answering my question
For anyone interested, the answer is that N[1][i] is two indexing steps in a row: N[1] selects the element at index 1 of the list (its second item, here a string), and [i] then selects the character at position i of that string.
inp = [14, '10001001000010'] #14 is the length of the string '10001001000010'
print(inp[1]) #outputs '10001001000010'
print(inp[1][0]) #outputs '1'
print(inp[1][2]) #outputs '0'
for i in range(inp[0]): #inp[0] is the length of the string
    print(inp[1][i]) #outputs the i-th character in '10001001000010'
The for loop output is as follows:
1
0
0
0
1
0
0
1
0
0
0
0
1
0
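The same double indexing can be written without an explicit index, as a small sketch. It assumes the file content has already been split into two strings, as `read().strip().split()` would produce; the literal values below simply reuse the example string from above.

```python
# After .split(), both items are strings: the length and the 0/1 pattern.
inp = ["14", "10001001000010"]
n = int(inp[0])

# enumerate() yields (position, character) pairs for the string inp[1],
# so inp[1][i] never has to be written out by hand.
cow_places = [i for i, ch in enumerate(inp[1][:n]) if ch == "1"]
print(cow_places)
```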

Why is it giving runtime error on codeforces for python?

So I am very new to code and have learnt the basics of python. I am testing out my skills on codeforces by first solving some of their easier problems. I am trying to do 158A on codeforces. I think I have got it because it passed a few tests I assigned. So, I tried submitting it and it told me something about a runtime error. I don't really know what it is so I would like it if someone could tell me what it is and how to fix it in my code. Here is the link to the problem: https://codeforces.com/problemset/problem/158/A
n = int(input())
k = int(input())
b = []
for i in range(n):
    a = int(input())
    b.append(a)
c = 0
for i in b:
    if i >= b[k]:
        c = c+1
    else:
        pass
print(c)
The input you are given is "8 5" on the first line and "10 9 8 7 7 7 5 5" on the second. That doesn't mean "8" and "5" arrive as two separate inputs. Each line is a single string of numbers separated by spaces, so int(input()) fails on it, and you should split the line into a list instead.
a = input()
n = int(a.split(" ")[0])
k = int(a.split(" ")[1])
a should equal "8 5". We then turn the string into a list using a.split(" "). This will produce ["8", "5"].
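Putting the parsing idea together, a minimal sketch might look like the following. The function name `count_advancers` and its two string parameters are made up for this illustration so the logic can be tried without a judge; it also assumes the problem's rule that only a positive score can advance, and it uses index `k - 1` because Python lists start at 0.

```python
def count_advancers(first_line, second_line):
    """Count participants whose score is positive and at least the k-th place score."""
    n, k = map(int, first_line.split())           # "8 5" -> n = 8, k = 5
    scores = list(map(int, second_line.split()))  # "10 9 8 ..." -> [10, 9, 8, ...]
    threshold = scores[k - 1]                     # k-th place finisher, 0-based index
    return sum(1 for s in scores[:n] if s >= threshold and s > 0)


print(count_advancers("8 5", "10 9 8 7 7 7 5 5"))
```

On the judge, the two arguments would simply be `input()` and `input()`.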
In problem 158A the expected input is:
1. Two integers n and k separated by a single space, where 1 ≤ k ≤ n ≤ 50
2. n space-separated integers a_1 ... a_n, where a_i ≥ a_(i+1)
There is also a condition: a score must be positive (score > 0) for its participant to advance.
This is all you need. I tested it and got the expected output every time:
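a = input()  # first line, e.g. "8 5" (no prompt text: it would end up in the judged output)
n = int(a.split(" ")[0])
k = int(a.split(" ")[1])
b = input()  # second line: n space-separated scores
willAdvance = 0
scores = b.split()
for element in scores:
    # advance only with a positive score that is at least the k-th place score
    if int(element) >= int(scores[k-1]) and int(element) > 0:
        willAdvance += 1
print(willAdvance)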
TEST
Input:
8 5
10 9 8 7 7 7 5 5
Output:
6
Input:
4 2
0 0 0 0
Output:
0

problem calculating minimum amount of coins in change using python

I have a homework assignment in which we have to write a program that outputs the change to be given by a vending machine using the lowest number of coins. E.g. £3.67 can be dispensed as 1x£2 + 1x£1 + 1x50p + 1x10p + 1x5p + 1x2p.
However, my program is outputting the wrong numbers. I know there will probably be rounding issues, but I think the current issue is to do with my method of coding this.
change=float(input("Input change"))
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
while change!=0:
    if change-2>-1:
        change=change-2
        twocount+=1
    else:
        if change-1>-1:
            change=change-1
            onecount+=1
        else:
            if change-0.5>-1:
                change=change-0.5
                halfcount+=1
            else:
                if change-0.2>-1:
                    change=change-0.2
                    pttwocount+=1
                else:
                    if change-0.1>-1:
                        change=change-0.1
                        ptonecount+=1
                    else:
                        break
print(twocount,onecount,halfcount,pttwocount,ptonecount)
RESULTS:
Input: 2.3
Output: 11010
i.e. 3.2
Input: 3.2
Output:20010
i.e. 4.2
Input: 2
Output: 10001
i.e. 2.1
All your comparisons use >-1, so you give out change as long as you have more than -1 balance.
This would be correct if you were only dealing with integers, since there >-1 is equal to >=0.
For floating point numbers however, we have for example -0.5>-1, so we will give out change for negative balance (which we do not want).
So the correct way would be to replace all >-1 comparisons by >=0 (larger or equal to 0) comparisons.
The problem is how your if/else statements calculate the change. Walk through the first example: change-2>-1 is true, so 2 is subtracted and change becomes 0.3. On the next pass you expect change-1>-1 to be false, but 0.3-1 is -0.7, which is greater than -1, so a £1 coin is given out as well. One of the best ways to do this is with Python's floor division (//) and modulo (%) operators. Some of the results have to be rounded because of the way floating-point numbers behave:
change=float(input("Input change"))
twocount=0
onecount=0
halfcount=0
pttwocount=0
ptonecount=0
twocount = int(change//2)
change = round(change%2,1)
if change//1 > 0:
    onecount = int(change//1)
    change = round(change%1,1)
if change//0.5 > 0:
    halfcount = int(change//0.5)
    change = round(change%0.5, 1)
if change//0.2 > 0:
    pttwocount = int(change//0.2)
    change = round(change%0.2, 1)
if change//0.1 > 0:
    ptonecount = int(change//0.1)
    change = round(change%0.1,1)
print(twocount,onecount,halfcount,pttwocount,ptonecount)
With the inputs from the question, this produces:
Input: 2.3
Output: 1 0 0 1 1
Input: 3.2
Output: 1 1 0 1 0
Input: 2
Output: 1 0 0 0 0
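Another way to sidestep the rounding issues altogether is to work in whole pence, so that every step is integer arithmetic. The sketch below is only an illustration under a few assumptions: the amount is read as text and converted with `decimal.Decimal`, the coin list covers £2 down to 1p (the 1p coin is not mentioned in the question), and the names `COINS_IN_PENCE` and `make_change` are made up here. Taking the largest coin first gives the fewest coins for this particular set of denominations.

```python
from decimal import Decimal

# Coin values in pence, largest first (assumed set: £2, £1, 50p, 20p, 10p, 5p, 2p, 1p).
COINS_IN_PENCE = [200, 100, 50, 20, 10, 5, 2, 1]


def make_change(amount_text):
    """Return {coin_in_pence: count} for an amount given as text, e.g. "3.67"."""
    # Decimal parses "3.67" exactly, so the conversion to pence has no float error.
    pence = int(Decimal(amount_text) * 100)
    counts = {}
    for coin in COINS_IN_PENCE:
        # divmod gives how many of this coin fit and what is left over.
        counts[coin], pence = divmod(pence, coin)
    return counts


print(make_change("3.67"))
```

For "3.67" this reproduces the breakdown given in the question: one each of £2, £1, 50p, 10p, 5p and 2p.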

Dumb error in my if/and statement - not seeing it

I have a data set with float values:
dog-auto        dog-bird        dog-cat         dog-dog         Result
41.9579761457   41.7538647304   36.4196077068   33.4773590373   0
46.0021331807   41.33958925     38.8353268874   32.8458495684   0
42.9462290692   38.6157590853   36.9763410854   35.0397073189   0
41.6866060048   37.0892269954   34.575072914    33.9010327697   0
39.2269664935   38.272288694    34.778824791    37.4849250909   0
40.5845117698   39.1462089236   35.1171578292   34.945165344    0
45.1067352961   40.523040106    40.6095830913   39.0957278345   0
41.3221140974   38.1947918393   39.9036867306   37.7696131032   0
41.8244654995   40.1567131661   38.0674700168   35.1089144603   0
45.4976929401   45.5597962603   42.7258732951   43.2422832585   0
This is an SFrame. I have attempted to write a function that uses an if/and statement to determine if the value for dog-dog is less than the values for dog-cat AND dog-auto AND dog-bird.
I've gone through this for the better part of 4 hours. Admittedly I'm a newbie to Python - I'm making a silly mistake and just not seeing it.
If statement:
def is_dog_correct(row):
    if (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-cat']]) & (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-bird']]) & (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-auto']]):
        dog_distances['Result'] = 1
    else:
        dog_distances['Result'] = 0
then I call the function with:
dog_distances.apply(is_dog_correct)
If this was working correctly, I would see "0" in every row but the fifth record. What is wrong with my if statement?
Full disclosure - this is coursework, but after spending 4 hours on this, I'm reaching for help!
Change & to and, as indicated by the previous comments. Also, I recommend you break up such long if statements into multiple lines so they are clearer and easier to read.
def is_dog_correct(row):
    # the outer parentheses let the condition continue over several lines
    if ((dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-cat']]) and
            (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-bird']]) and
            (dog_distances[dog_distances['dog-dog']] < dog_distances[dog_distances['dog-auto']])):
        dog_distances['Result'] = 1
    else:
        dog_distances['Result'] = 0
Make your first if statement cleaner by finding the min (minimum) of all of the other values. This makes sure that 'dog-dog' is less than all of the rest:
def is_dog_correct(row):
    if dog_distances[dog_distances['dog-dog']] < min([dog_distances[dog_distances['dog-'+x]] for x in ['cat','bird','auto']]):
        dog_distances['Result'] = 0
    else:
        dog_distances['Result'] = 1
EDIT: For debugging purposes use the following:
def is_dog_correct(row):
    print 'dog is {}'.format(dog_distances[dog_distances['dog-dog']])
    print 'everyone else is {}'.format([dog_distances[dog_distances['dog-'+x]] for x in ['cat','bird','auto']])
    if dog_distances[dog_distances['dog-dog']] < min([dog_distances[dog_distances['dog-'+x]] for x in ['cat','bird','auto']]):
        print 'Yay dog is faster'
        dog_distances['Result'] = 0
    else:
        print 'Awww, dog is not faster'
        dog_distances['Result'] = 1
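One thing the versions above have in common is that they index the whole dog_distances table and never use the row argument. A hedged sketch of a row-based variant is shown below, written in Python 3 and tested on plain dictionaries so it runs without any SFrame library. It assumes that apply passes each row to the function as a dict keyed by column name and collects the returned values, and it returns 1 when dog-dog is the smallest distance, which is the encoding described in the question.

```python
def is_dog_correct(row):
    """Return 1 if 'dog-dog' is strictly smaller than the other three distances, else 0."""
    others = [row['dog-cat'], row['dog-bird'], row['dog-auto']]
    return 1 if row['dog-dog'] < min(others) else 0


# First and fifth records from the table in the question, as plain dicts.
rows = [
    {'dog-auto': 41.9579761457, 'dog-bird': 41.7538647304,
     'dog-cat': 36.4196077068, 'dog-dog': 33.4773590373},
    {'dog-auto': 39.2269664935, 'dog-bird': 38.272288694,
     'dog-cat': 34.778824791, 'dog-dog': 37.4849250909},
]
results = [is_dog_correct(r) for r in rows]
print(results)
```

Under that assumption, the column would be filled with something like `dog_distances['Result'] = dog_distances.apply(is_dog_correct)` rather than by assigning inside the function.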

Python If Function for Ranged Field

This is Python used in GIS: I have a table in ArcGIS and I am trying to classify a field into categories.
I have a field named Elevation.
The data contain ranges stored as text, for example:
1 - 2
3 - 6
2 - 3
8.5 - 12
11 - 12
I need to categorize it using this rule: if Elevation < 1 then Index = 0.3; if Elevation = 2 - 3 then Index = 0.6; if Elevation > 3 then Index = 1.
I have this code:
def Reclass( Elevation ):
    r_min, r_max = (float(s.strip()) for s in Elevation.split('-'))
    print "r_min: {0}, r_max: {1}".format(r_min,r_max)
    if r_min < 1 and r_max < 1:
        return 0.333
    elif r_min >= 1 and r_max >= 1 and r_min <= 3 and r_max <= 3:
        return 0.666
    elif r_min > 3 and r_max > 3:
        return 1
    elif r_min <= 3 and r_max > 3:
        return 1
    else:
        return 999
My question is how to strip the values and categorize them using the rule above.
Thanks in advance.
Based on comments, your field is a string that contains ranges of the form you describe above.
Firstly, this is horrible database design. The minimum and maximum should be separate columns of integer types. (Shakes fist at ESRI some more for discouraging good database design.)
Furthermore, your rule is insufficient for dealing with a range. A range check needs to compare against either one end of the range or both ends, so you will have to clarify exactly what you want for your "indexing" rule.
Given that you have strings representing ranges, your only option is to parse the range into its minimum and maximum and work with those. That's not too hard in Python:
>>> r = "3 - 6"
>>> r_min, r_max = (int(s.strip()) for s in r.split('-'))
>>> r_min
3
>>> r_max
6
What does this do?
It's pretty simple, actually. It splits the string by the -. Then it loops over the resulting list, and each element has its leading and trailing whitespace removed and is then converted into an int. Finally, Python unpacks the generator on the right to fill in the variables on the left.
Be aware that malformed data will cause errors.
Once you've clarified your "index" rule, you can figure out how to use this minimum and maximum to get your "index".
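Since malformed data will raise an exception, a defensive variant of the parsing step may be useful. The sketch below is only an illustration: the helper name `parse_range` is made up, it returns None instead of raising, and it uses float rather than int because the sample data in the question includes a value like 8.5. It is written so that it runs in Python 3.

```python
def parse_range(text):
    """Return (minimum, maximum) as floats, or None if the text is not 'a - b'."""
    parts = text.split('-')
    if len(parts) != 2:
        return None  # no hyphen, or more than one
    try:
        low, high = (float(p.strip()) for p in parts)
    except ValueError:
        return None  # one side is empty or not a number
    return low, high


print(parse_range("3 - 6"))
print(parse_range("8.5 - 12"))
print(parse_range("n/a"))
```

Note that this simple split cannot handle negative values, because their minus sign would be taken for the range separator.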
I have borrowed code from you and @jpmc26 below. This code (minus the print statements, which are just there for testing) should work for you in the Field Calculator of ArcMap, but it is simply Python code. The problem is that you have not told us what you want to do when the two ends of a range fall into different categories, so for now I have used an else statement to return 999.
def Reclass( Elevation ):
    r_min, r_max = (float(s.strip()) for s in Elevation.split('-'))
    print "r_min: {0}, r_max: {1}".format(r_min,r_max)
    if r_min < 1 and r_max < 1:
        return 0.333
    elif r_min >= 1 and r_max >= 1 and r_min <= 3 and r_max <= 3:
        return 0.666
    elif r_min > 3 and r_max > 3:
        return 1
    else:
        return 999
print Reclass("0 - 1.1")
print Reclass("5.2 - 10")
print Reclass("2 - 3")
print Reclass("0 - 0")
