I'm writing a program that reads in two proteins of the same (string) length and returns how many of the amino acid letters are different. I managed to write some of it but couldn't complete it, so can anyone please guide me through this by having a look at my code:
a = raw_input("Cheetah protein: ")
b = raw_input("Domestic cat protein: ")
u=zip(a,b)
d=dict(u)
x = 1
for i,j in d.items():
    if i == j:
        x = x + 1
print x
This is the output I want to produce:
Cheetah protein: IGADKYFHARGNYDAA
Domestic cat protein: KGADKYFHARGNYEAA
2 difference(s).
I think you should describe better what you are trying to achieve. I don't understand this check:
if i == j:
If you want to check the differences, you should write instead:
if i != j:
After this fix your code gives me 3 differences for your example with cat and cheetah - are you sure the example is correct?
EDIT: OK, I see you're counting differences starting from one. Change the line
x = 1
to
x = 0
I don't think you want to assume that there is always at least one difference ;-)
a="IGADKYFHARGNYDAA"
b="KGADKYFHARGNYEAA"
u=zip(a,b)
x = 0 # not 1
for i,j in u: # you don't need a dict here
    print i,j,
    if i != j: # they differ iff they are not equal to each other
        x = x + 1
        print " neq"
    else:
        print " eq"
print x
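To illustrate the "you don't need a dict here" comment, here is a minimal sketch of why building a dict from the zipped pairs loses information. It is written for Python 3 (an assumption; the code above is Python 2), and the names pairs and as_dict are only illustrative. Because dict keys are unique, a letter that occurs several times in the first protein keeps only its last pairing, so the count can come out wrong for other inputs even when it happens to be right for this one.

```python
a = "IGADKYFHARGNYDAA"
b = "KGADKYFHARGNYEAA"

pairs = list(zip(a, b))   # one (cheetah, cat) pair per position
as_dict = dict(pairs)     # keys are unique, so repeated letters overwrite each other

print(len(pairs))    # number of positions compared
print(len(as_dict))  # number of distinct letters in a, which is smaller

# Counting over the pairs themselves keeps every position.
differences = sum(1 for i, j in pairs if i != j)
print(differences, "difference(s).")
```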
func=lambda x,y: ((x!=y) and 1) or 0
print sum(map(func, a,b)), 'difference(s)'
Different proteins have different amino acid sequences ("letters"), but they can also have different lengths. None of the answers handles that case because you didn't ask for it. To answer your question:
>>> a="IGADKYFHARGNYDAA"
>>> b="KGADKYFHARGNYEAA"
>>> sum(1 for x, y in zip(a,b) if x!=y)
2
I used a generator expression to produce a 1 for each pair of differing amino acids and summed them up. If you also want to spot the changed amino acids, a similar method works:
>>> diff = ''.join('-' if x==y else y for x, y in zip(a,b))
>>> print 'A:', a, '\nB:', diff
A: IGADKYFHARGNYDAA
B: K------------E--
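If the two sequences can have different lengths, one possible approach is sketched below. It assumes Python 3 and assumes that every unmatched trailing position should count as a difference; that is a simplification, not a biological alignment. The function name count_differences is hypothetical.

```python
from itertools import zip_longest

def count_differences(a, b):
    """Count positions where a and b differ.

    Assumption: positions present in only one sequence count as differences.
    zip_longest pads the shorter sequence with None, which never equals a letter.
    """
    return sum(1 for x, y in zip_longest(a, b) if x != y)

print(count_differences("IGADKYFHARGNYDAA", "KGADKYFHARGNYEAA"))
print(count_differences("ABC", "ABCDE"))
```

For equal-length inputs this gives the same count as the zip version above.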
I need to loop through n lines of a file and, for any i between 1 and n-1, get the difference between the words of line(n-1) and line(n) (e.g. line[i]word[j] - line[i+1]word[j], etc.).
Input :
Hey there !
Hey thre !
What a suprise.
What a uprise.
I don't know what to do.
I don't know wt to do.
Output:
e
s
ha
The goal is to extract the missing character(s) between two consecutive line words only.
I'm new to python so if you can guide me through writing the code, I would be more than thankful.
Without any lib :
def extract_missing_chars(s1, s2):
    if len(s1) < len(s2):
        return extract_missing_chars(s2, s1)
    i = 0
    to_return = []
    for c in s1:
        # the check on i avoids an IndexError when s2 runs out before s1
        if i >= len(s2) or s2[i] != c:
            to_return.append(c)
        else:
            i += 1
    return to_return
f = open('testfile')
l1 = f.readline()
while l1:
    l2 = f.readline()
    print(''.join(extract_missing_chars(l1, l2)))
    l1 = f.readline()
f.close()
Your example indicates that you want the comparisons between pairs of lines. This is different from defining it as line(n-1)-line(n) which would give you 5 results, not 3.
The result also depends on what you consider to be differences. Is it positional, is it simply based on letters missing from the odd lines, or do the differences apply in both directions
(e.g. "boat"-"tub" = "boat", "oa" or "oau")?
You also have to decide if you want the differences to be case sensitive or not.
Here's an example where computation of the differences is centralized in a function so that you can change the rules more easily. It assumes that "boat"-"tub" = "oau".
lines = """Hey there !
Hey thre !
What a suprise.
What a uprise.
I don't know what to do.
I don't know wt to do.
""".split('\n')
def differences(word1,word2):
    if isinstance(word1,list):
        return "".join( differences(w1,w2) for w1,w2 in zip(word1+[""]*len(word2),word2+[""]*len(word1)) )
    return "".join( c*abs(word1.count(c)-word2.count(c)) for c in set(word1+word2) )
result = [ differences(line1.split(),line2.split()) for line1,line2 in zip(lines[::2],lines[1::2]) ]
# ['e', 's', 'ha']
Note that line processing for result is based on your example (not on your definition).
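As a variation on the same rule ("boat"-"tub" = "oau"), here is a sketch using collections.Counter instead of a set, assuming Python 3.7+ so that letters come out in first-seen order rather than arbitrary set order. The helper name line_differences is hypothetical, and words are compared position by position as in the function above.

```python
from collections import Counter
from itertools import zip_longest

def differences(word1, word2):
    """Letters whose counts differ between the two words, in first-seen order."""
    c1, c2 = Counter(word1), Counter(word2)
    # (c1 - c2) keeps the surplus letters of word1, (c2 - c1) those of word2
    return "".join(((c1 - c2) + (c2 - c1)).elements())

def line_differences(line1, line2):
    """Apply differences() word by word; a missing word is treated as ''."""
    pairs = zip_longest(line1.split(), line2.split(), fillvalue="")
    return "".join(differences(w1, w2) for w1, w2 in pairs)

lines = [
    "Hey there !", "Hey thre !",
    "What a suprise.", "What a uprise.",
    "I don't know what to do.", "I don't know wt to do.",
]
result = [line_differences(l1, l2) for l1, l2 in zip(lines[::2], lines[1::2])]
print(result)
```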
I've been searching for a simpler way to do this, but I'm not sure what search terms to use. I have a floating point number that I would like to round, convert to a string, and then format in a custom way. I've read through the .format docs, but can't see whether it's possible to do this using normal string formatting.
The output I want is just a normal string with a space every three chars, except for the last group, which should have a space four chars before the end.
For example, I wrote this convoluted function that does what I want in an inefficient way:
def my_formatter(value):
    final = []
    # round float and convert to list of strings of individual chars
    c = [i for i in '{:.0f}'.format(value)]
    if len(c) > 3:
        final.append(''.join(c[-4:]))
        c = c[:-4]
    else:
        return ''.join(c)
    for i in range(0, len(c) // 3 + 1, 1):
        if len(c) > 2:
            final.insert(0, ''.join(c[-3:]))
            c = c[:-3]
        elif len(c) > 0:
            final.insert(0, ''.join(c))
    return(' '.join(final))
e.g.
>>> my_formatter(123456789.12)
'12 345 6789'
>>> my_formatter(12345678912.34)
'1 234 567 8912'
Would really appreciate guidance on doing this in a simpler / more efficient way.
I took a slightly different angle, using the third-party function partition_all from toolz. In short, I use it to split the string into groups of 3, plus a final group if fewer than 3 chars remain. You may prefer this as there are no for loops or conditionals, but the differences are basically cosmetic.
from toolz.itertoolz import partition_all
def simpleformat(x):
    x = str(round(x))
    a, b = x[:-4], x[-4:]
    strings = [''.join(x[::-1]) for x in reversed(list(partition_all(3, a[::-1])))]
    return ' '.join(strings + [b])
Try this:
def my_formatter(x):
    # round it as text
    txt = "{:.0f}".format(x)
    # find split indices
    splits = [None] + list(range(-4, -len(txt), -3)) + [None]
    # slice and rejoin
    return " ".join(
        reversed([txt[i:j] for i, j in zip(splits[1:], splits[:-1])]))
Then
>>> my_formatter(123456789.1)
'12 345 6789'
>>> my_formatter(1123456789.1)
'112 345 6789'
>>> my_formatter(11123456789.1)
'1 112 345 6789'
Here is a pretty simple solution using a loop over the elements in the reverse order such that counting the indices is easier:
num = 12345678912.34
temp = []
for ix, c in enumerate(reversed(str(round(num)))):
    if ix%3 == 0 and ix !=0: temp.extend([c, ' '])
    else: temp.extend(c)
''.join(list(reversed(temp)))
Output:
'1 234 567 8912'
Using list comprehensions we can do this in a single very confusing line as
num = 12345678912.34
''.join(list(reversed(list(''.join([c+' ' if(ix%3 == 0 and ix!=0) else c for ix, c in enumerate(reversed(str(round(num))))])))))
'1 234 567 8912'
Another approach is to use locale, if it is available on your system of course, together with format.
import locale
for v in ('fr_FR.UTF-8', 'en_GB.UTF-8'):
    locale.setlocale(locale.LC_NUMERIC, v)
    print(v, '>> {:n}'.format(111222333999))
May as well share another slightly different variant, but I still can't shake the feeling that there's some sublime way that we just can't see. I haven't marked any answer as correct yet, because I'm convinced Python can do this in a simpler way, somehow. What's also driving me crazy is that, if I remember correctly, VB's format command can handle this (with a pattern like "### ####0"). Maybe it's just a case of not understanding how to use Python's .format correctly.
The below accepts a float or decimal and a list indicating split positions. If there are still digits in the string after consuming the last split position, it re-applies that until it reaches the start of the string.
def format_number(num, pat, sep=' '):
    fmt = []
    pat = list(pat)  # work on a copy so the caller's list is not emptied by pop()
    strn = "{:.0f}".format(num)
    while strn:
        p = pat.pop() if pat else p
        fmt.append(strn[-p:])
        strn = strn[:-p] if len(strn) > p else ''
    return sep.join(fmt[::-1])
>>> format_number(123456789, [3, 4])
'12 345 6789'
>>> format_number(1234567890, [3])
'1 234 567 890'
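One more possibility, offered as a sketch rather than the definitive answer: the format mini-language has a "," option that groups digits in threes, so the last four digits can be split off first and the remaining head formatted with "," and the commas swapped for spaces. The name format_grouped is hypothetical, and the handling of negative numbers is an assumption that the question does not cover.

```python
def format_grouped(value):
    """Round value and group digits as '... ddd ddd dddd'."""
    txt = "{:.0f}".format(value)
    sign = "-" if txt.startswith("-") else ""   # assumption: keep the sign in front
    digits = txt.lstrip("-")
    if len(digits) <= 4:
        return sign + digits
    head, tail = digits[:-4], digits[-4:]
    # "{:,}" inserts a comma every three digits; swap commas for spaces
    return sign + "{:,}".format(int(head)).replace(",", " ") + " " + tail

print(format_grouped(123456789.12))
print(format_grouped(12345678912.34))
```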
So I've got a list of questions as a dictionary, e.g.
{"Question1": 3, "Question2": 5 ... }
That means the "Question1" has 3 points, the second one has 5, etc.
I'm trying to create all subsets of questions whose number of questions and total points both fall within given ranges.
I've tried something like
questions = {"Q1":1, "Q2":2, "Q3": 1, "Q4" : 3, "Q5" : 1, "Q6" : 2}
u = 3 #
v = 5 # between u and v questions
x = 5 #
y = 10 #between x and y points
solution = []
n = 0
def main(n_):
    global n
    n = n_
    global solution
    solution = []
    finalSolution = []
    for x in questions.keys():
        solution.append("_")
    finalSolution.extend(Backtracking(0))
    return finalSolution
def Backtracking(k):
    finalSolution = []
    for c in questions.keys():
        solution[k] = c
        print ("candidate: ", solution)
        if not reject(k):
            print ("not rejected: ", solution)
            if accept(k):
                finalSolution.append(list(solution))
            else:
                finalSolution.extend(Backtracking(k+1))
    return finalSolution
def reject(k):
    if solution[k] in solution: #if the question already exists
        return True
    if k > v: #too many questions
        return True
    points = 0
    for x in solution:
        if x in questions.keys():
            points = points + questions[x]
    if points > y: #too many points
        return True
    return False
def accept(k):
    points = 0
    for x in solution:
        if x in questions.keys():
            points = points + questions[x]
    if points in range (x, y+1) and k in range (u, v+1):
        return True
    return False
print(main(len(questions.keys())))
but it's not trying all possibilities; it only puts each question at the first index.
I have no idea what I'm doing wrong.
There are three problems with your code.
The first issue is that the first check in your reject function is always True. You can fix that in a variety of ways (you commented that you're now using solution.count(solution[k]) != 1).
The second issue is that your accept function uses the variable name x for what it intends to be two different things (a question from solution in the for loop and the global x that is the minimum number of points). That doesn't work, and you'll get a TypeError when trying to pass it to range. A simple fix is to rename the loop variable (I suggest q since it's a key into questions). Checking if a value is in a range is also a bit awkward. It's usually much nicer to use chained comparisons: if x <= points <= y and u <= k <= v
The third issue is that you're not backtracking at all. The backtracking step needs to reset the global solution list to the same state it had before Backtracking was called. You can do this at the end of the function, just before you return, using solution[k] = "_" (you commented that you've added this line, but I think you put it in the wrong place).
Anyway, here's a fixed version of your functions:
def Backtracking(k):
    finalSolution = []
    for c in questions.keys():
        solution[k] = c
        print ("candidate: ", solution)
        if not reject(k):
            print ("not rejected: ", solution)
            if accept(k):
                finalSolution.append(list(solution))
            else:
                finalSolution.extend(Backtracking(k+1))
    solution[k] = "_" # backtracking step here!
    return finalSolution
def reject(k):
    if solution.count(solution[k]) != 1: # fix this condition
        return True
    if k > v:
        return True
    points = 0
    for q in solution:
        if q in questions:
            points = points + questions[q]
    if points > y: #too many points
        return True
    return False
def accept(k):
    points = 0
    for q in solution: # change this loop variable (also done above, for symmetry)
        if q in questions:
            points = points + questions[q]
    if x <= points <= y and u <= k <= v: # chained comparisons are much nicer than range
        return True
    return False
There are still things that could probably be improved in there. I think having solution be a fixed-size global list with dummy values is especially unpythonic (a dynamically growing list that you pass as an argument would be much more natural). I'd also suggest using sum to add up the points rather than using an explicit loop of your own.
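A rough sketch of what those suggestions could look like, assuming Python 3. This is not the code above but a reworked variant: the chosen questions are passed as an argument instead of living in a global list, and it enumerates subsets (each combination once) rather than orderings. The name find_subsets and its parameter names are hypothetical, and the pruning step assumes that point values are never negative.

```python
def find_subsets(questions, min_count, max_count, min_points, max_points):
    """Return every subset of questions within the count and point limits."""
    items = list(questions.items())
    results = []

    def backtrack(start, chosen, points):
        # Record the current selection if it satisfies both ranges.
        if min_count <= len(chosen) <= max_count and min_points <= points <= max_points:
            results.append(list(chosen))
        if len(chosen) == max_count:
            return
        for i in range(start, len(items)):
            name, value = items[i]
            if points + value > max_points:
                continue  # prune: assumes points are non-negative
            chosen.append(name)
            backtrack(i + 1, chosen, points + value)
            chosen.pop()  # backtracking step: undo the choice

    backtrack(0, [], 0)
    return results

questions = {"Q1": 1, "Q2": 2, "Q3": 1, "Q4": 3, "Q5": 1, "Q6": 2}
subsets = find_subsets(questions, 3, 5, 5, 10)
print(len(subsets), "subsets found")
```

Starting each recursive call at index i + 1 is what prevents the same question from being picked twice, so no separate duplicate check is needed.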
Given the string "abc", it should print out "abc", "bca", "cab".
My approach: find length of the given string and rotate them till length
def possible_rotation():
    a = "abc"
    b = len(a)
    for i in range (b-1):
        c = a[:i] + a[i:]
        print c
The above code simply prints abc, abc. Any idea what I am missing here?
def possible_rotation():
    a = "abc"
    b = len(a)
    for i in range (b):
        c = a[i:]+a[:i]
        print c
possible_rotation()
Output:
abc
bca
cab
You have 2 issues: the range and the rotation logic. The rotation should be a[i:]+a[:i], not the other way round, and range(b-1) should be range(b).
You have two errors:
range(b-1) should be range(b);
a[:i] + a[i:] should be a[i:] + a[:i].
This is what I did. I used a deque, a class in collections, and then used its rotate method like this:
from collections import deque
string = 'abc'
for i in range(len(string)):
    c = deque(string)
    c.rotate(i)
    print ''.join(list(c))
And gives me this output.
abc
cab
bca
What it does: it creates a deque (double-ended queue) object, which has a rotate method. rotate takes the number of steps and shifts the elements to the right by that many steps, in place, a bit like a circular right shift in binary operations. On each pass through the loop I rotate a fresh deque, convert it to a list and finally join it into a string.
Hope this helps
for i in range(b):
    print(a[i:] + a[:i])
0 - [a,b,c] + []
1 - [b,c] + [a]
2 - [c ] + [a,b]
swap the lists
There is no need for (b-1). You can simply do it like this:
def possible_rotation():
    a = "abc"
    for i in range(0,len(a)):
        strng = a[i:]+a[:i]
        print strng
possible_rotation()
This looks to be homework, but here's a solution using the built-in collections.deque:
from collections import deque
def possible_rotations(string):
    rotated = deque(string)
    joined = None
    while joined != string:
        rotated.rotate(1)
        joined = ''.join(x for x in rotated)
        print(joined)
Test it out:
>>> possible_rotations('abc')
cab
bca
abc
Two things:
Firstly, as already pointed out in the comments, you should iterate over range(b) instead of range(b-1). In general, range(b) is equal to [0, 1, ..., b-1], so in your example that would be [0, 1, 2].
Secondly, you switched around the two terms, it should be: a[i:] + a[:i].
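Both fixes can be put together in a small sketch that returns the rotations instead of printing them, assuming Python 3. The function names rotations and rotations_deque are hypothetical. The second function shows the deque route for comparison: a negative argument to rotate shifts to the left, which gives the same order as the slicing version.

```python
from collections import deque

def rotations(s):
    """All left rotations of s, built by slicing."""
    return [s[i:] + s[:i] for i in range(len(s))]

def rotations_deque(s):
    """The same rotations using deque.rotate, which works in place."""
    d = deque(s)
    result = []
    for _ in range(len(s)):
        result.append("".join(d))
        d.rotate(-1)  # a negative step rotates to the left
    return result

print(rotations("abc"))
print(rotations_deque("abc"))
```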
I am writing a code snippet for a random algebraic equation generator for a larger project. Up to this point, everything has worked well. The main issue is simple. I combined the contents of a dictionary in sequential order. For the sake of argument, say the dictionary is exdict = {a:1 , b:2 , c:3 , d:4}; I append its keys and then its values to a list: exlist = [a, b, c, d, 1, 2, 3, 4]. The length of my list is 8, half of which is 4. The algorithm is quite simple: whatever random number is generated between 1 and 4 (indices 0-3 in Python), adding half the length of the list to that index gives the index of the matching value.
I have done research online and on stackoverflow but cannot find any answer that I can apply to my situation...
Below is the bug check version of my code. It prints out each variable as it happens. The issue I am having is towards the bottom, under the ### ITERATIONS & SETUP comment. The rest of the code is there so it can be run properly. The primary issue is that a + x should be m, but a + x never equals m; m is always lower.
Bug check code:
from random import randint as ri
from random import shuffle as sh
#def randomassortment():
letterss = ['a','b','x','d','x','f','u','h','i','x','k','l','m','z','y','x']
rndmletters = letterss[ri(1,15)]
global newdict
newdict = {}
numberss = []
for x in range(1,20):
    #range defines max number in equation
    numberss.append(ri(1,20))
for x in range(1,20):
    rndmnumber = numberss[ri(1,18)]
    rndmletters = letterss[ri(1,15)]
    newdict[rndmletters] = rndmnumber
#x = randomassortment()
#print x[]
z = []
# set variable letter : values in list
for a in newdict.keys():
    z.append(a)
for b in newdict.values():
    z.append(b)
x = len(z)/2
test = len(z)
print 'x is value %d' % (x)
### ITERATIONS & SETUP
iteration = ri(2,6)
for x in range(1,iteration):
    a = ri(1,x)
    m = a + x
    print 'a is value: %d' % (a)
    print 'm is value %d' %(m)
    print
    variableletter = z[a]
    variablevalue = z[m]
    # variableletter , variablevalue
Edit: my question is ultimately, why is a + x returning a value that isn't a + x? If you run this code, it will print x, a, and m. m is supposed to be the value of a + x, but for some reason it isn't.
The reason this isn't working as you expect is that your variable x originally holds half the length of the list, but it is then overwritten by your for x in range loop, and inside the loop you still expect it to hold the original value. You could just change the line to
for i in range(iteration):
instead.
Also note that you could replace all the code in the for loop with
variableletter, variablevalue = random.choice(newdict.items())
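A small sketch of that one-liner for Python 3, where dict.items() returns a view that random.choice cannot index, so it is converted to a list first. The sample dictionary here is hypothetical and simply stands in for newdict from the question.

```python
import random

# Hypothetical sample data standing in for newdict from the question.
newdict = {"a": 1, "b": 2, "c": 3, "d": 4}

# In Python 3, dict.items() is a view, so turn it into a list before choosing.
variableletter, variablevalue = random.choice(list(newdict.items()))
print(variableletter, variablevalue)
```

Picking the pair directly also removes the need for the combined list and the index arithmetic.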
Your problem is scope.
Which x are you looking for here?
x = len(z)/2 # This is the first x
print 'x is value %d' % (x)
### ITERATIONS & SETUP
iteration = ri(2,6)
# x in the for loop is referencing the x in range...
for x in range(1,iteration):
    a = ri(1,x)
    m = a + x
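A minimal sketch of the loop with the name clash removed, assuming Python 3. The list z and the name half are illustrative stand-ins for the question's data: the offset keeps its own name, and the loop variable is a throwaway, so a + half always points at the value that belongs to the chosen key.

```python
import random

z = ["a", "b", "c", "d", 1, 2, 3, 4]   # keys first, then their values
half = len(z) // 2                      # offset from a key to its value

for _ in range(3):                      # loop variable no longer reuses the name
    a = random.randrange(half)          # index of a key: 0 .. half-1
    m = a + half                        # index of the matching value
    print(z[a], z[m])
```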