Init method; len() of self object - python

    def __init__(self,emps=str(""),l=[">"]):
        self.str=emps
        self.bl=l
    def fromFile(self,seqfile):
        opf=open(seqfile,'r')
        s=opf.read()
        opf.close()
        lisst=s.split(">")
        if s[0]==">":
            lisst.pop(0)
        nlist=[]
        for x in lisst:
            splitenter=x.split('\n')
            splitenter.pop(0)
            splitenter.pop()
            splitstring="".join(splitenter)
            nlist.append(splitstring)
        nstr=">".join(nlist)
        nstr=nstr.split()
        nstr="".join(nstr)
        for i in nstr:
            self.bl.append(i)
        self.str=nstr
        return nstr
    def getSequence(self):
        print self.str
        print self.bl
        return self.str
    def GpCratio(self):
        pgenes=[]
        nGC=[]
        for x in range(len(self.lb)):
            if x==">":
                pgenes.append(x)
        for i in range(len(pgenes)):
            if i!=len(pgenes)-1:
                c=krebscyclus[pgenes[i]:pgenes[i+1]].count('c')+0.000
                g=krebscyclus[pgenes[i]:pgenes[i+1]].count('g')+0.000
                ratio=(c+g)/(len(range(pgenes[i]+1,pgenes[i+1])))
                nGC.append(ratio)
        return nGC
s = Sequence()
s.fromFile('D:\Documents\Bioinformatics\sequenceB.txt')
print 'Sequence:\n', s.getSequence(), '\n'
print "G+C ratio:\n", s.GpCratio(), '\n'
I don't understand why it gives the error:
in GpCratio for x in range(len(self.lb)): AttributeError: Sequence instance has no attribute 'lb'.
When I print the list in getSequence, it prints the correct list of the DNA sequence, but I cannot use the list to search for nucleotides. My university only allows me to read in one file, and the methods may not take any arguments other than "self".
By the way, all of this is inside a class called Sequence, but the post was refused when I included the class line.

Looks like a typo. You define self.bl in your __init__() routine, then try to access self.lb.
(Also, emps=str("") is redundant - emps="" works just as well.)
But even if you correct that typo, the loop won't work:
for x in range(len(self.bl)): # This iterates over a list like [0, 1, 2, 3, ...]
    if x==">": # This condition will never be True
        pgenes.append(x)
You probably need to do something like
pgenes=[]
for x in self.bl:
    if x==">": # Shouldn't this be != ?
        pgenes.append(x)
which can also be written as a list comprehension:
pgenes = [x for x in self.bl if x==">"]
In Python, you rarely need len(x) or for n in range(...); instead, you iterate directly over the sequence or iterable.
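If what you actually need are the positions of the ">" separators (which the slicing in GpCratio seems to assume), enumerate() gives you the index and the item together. A minimal sketch, using a made-up list as a stand-in for self.bl:

```python
# Hypothetical stand-in for self.bl: one character per list element.
bl = list(">ATGC>GGCC>AT")

# enumerate() yields (index, item) pairs, so no range(len(...)) is needed.
separator_positions = [i for i, ch in enumerate(bl) if ch == ">"]
print(separator_positions)  # [0, 5, 10]
```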
Since your program is incomplete and lacking sample data, I can't run it here to find all its other deficiencies. Perhaps the following can point you in the right direction. Assuming a string that contains the characters ATCG and >:
>>> gene = ">ATGAATCCGGTAATTGGCATACTGTAG>ATGATAGGAGGCTAG"
>>> pgene = ''.join(x for x in gene if x!=">")
>>> pgene
'ATGAATCCGGTAATTGGCATACTGTAGATGATAGGAGGCTAG'
>>> ratio = float(pgene.count("G") + pgene.count("C")) / (pgene.count("A") + pgene.count("T"))
>>> ratio
0.75
If, however, you don't want to look at the entire string but at separate genes (where > is the separator), use something like this:
>>> gene = ">ATGAATCCGGTAATTGGCATACTGTAG>ATGATAGGAGGCTAG"
>>> genes = [g for g in gene.split(">") if g !=""]
>>> genes
['ATGAATCCGGTAATTGGCATACTGTAG', 'ATGATAGGAGGCTAG']
>>> nGC = [float(g.count("G")+g.count("C"))/(g.count("A")+g.count("T")) for g in genes]
>>> nGC
[0.6875, 0.875]
However, if you want to calculate GC content, then of course you don't want (G+C)/(A+T) but (G+C)/(A+T+G+C) --> nGC = [float(g.count("G")+g.count("C"))/len(g) for g in genes].
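Put together, the GC-content variant could look roughly like this. It is only a sketch written for Python 3 (where / is true division); the function name gc_content is made up for illustration, and it assumes the sequences contain only A, T, G and C:

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

gene = ">ATGAATCCGGTAATTGGCATACTGTAG>ATGATAGGAGGCTAG"
genes = [g for g in gene.split(">") if g]   # drop the empty piece before the first ">"
n_gc = [gc_content(g) for g in genes]
print([round(r, 3) for r in n_gc])  # [0.407, 0.467]
```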


Problems with 'lambda' expression in python

I have the following problem solving question
Please write a program using generator to print the even numbers
between 0 and n in comma separated form while n is input by console.
Example: If the following n is given as input to the program: 10 Then,
the output of the program should be: 0,2,4,6,8,10.
And below is my answer
n=int(input("enter the number of even numbers needed:"))
eve=''
st=(lambda x:(for i in range(0,x))[(str(i)) if i%2==0 else (",")])(n)
However, I have a problem with the third line, the one with the lambda.
Take full advantage of Python 3's features: create the generator with generator expression syntax, and do the even-number stepping with range()'s third parameter.
This would be much briefer:
>>> n = 12
>>>
>>> fn = lambda x: (f"{i}," for i in range(0, x + 1, 2))
>>>
>>> ''.join(list(fn(n)))[:-1] + '.'
'0,2,4,6,8,10,12.'
>>>
>>> fn(10)
<generator object <lambda>.<locals>.<genexpr> at 0x107f67660>
What looks like a tuple comprehension is actually called a "generator expression". Note that in the last line above the interpreter indicates that the object returned by the lambda is indeed a generator.
Even briefer, you could do it this way:
>>> fn = lambda x: ','.join( (f"{i}" for i in range(0, x + 1, 2)) ) + '.'
>>>
>>> fn(n)
'0,2,4,6,8,10,12.'
>>>
Looks like you might have been on the right track in your question post.
A function that uses the yield keyword also creates a generator. So the other poster is correct in that regard.
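For comparison, the yield-based version might look like the sketch below. The name even_numbers is just an illustrative choice, and the output follows the comma-separated form the exercise asks for:

```python
def even_numbers(n):
    """Yield the even numbers from 0 up to and including n."""
    for i in range(0, n + 1, 2):
        yield i

# The generator produces ints, so convert each one before joining.
result = ",".join(str(i) for i in even_numbers(10))
print(result)  # 0,2,4,6,8,10
```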

Numpy atleast_1d, but for lists

I have a function that takes as an argument either a list of objects or a single object. I then want to loop through the elements of the list or operate on the single object if it is not a list.
Below, I use numpy.atleast_1d().tolist() to ensure that a loop works whether or not the argument is a list or a single object. However, I am not sure if converting the object to a numpy array and then to a list may cause any unintended changes to the object.
Is there a way to ensure the argument is transformed into a list if it is not a list? I have two possible solutions in a simple example below, but wanted to know if there are any better ones.
import numpy as np
def printer1(x):
    for xi in np.atleast_1d(x).tolist():
        print(xi)
def printer2(x):
    if type(x) != list:
        x = [x]
    for xi in x:
        print(xi)
x1 = 'a'
x2 = ['a','b','c']
printer1(x1)
printer1(x2)
printer2(x1)
printer2(x2)
I'm using Python 2.7
In your function you can add a check for whether the argument is a list. I think this is one way to do it. You don't even need numpy for this.
def foo(x):
    x = [x] if not isinstance(x, list) else x
    print x # or do whatever you want to do
    # or
    for value in x:
        print value
foo('a')
foo(['a','b'])
output:
['a']
a
['a', 'b']
a
b
To ensure that the value is a list even when it has only one element, write it inside square brackets:
foo = ['stringexample']
foo2 = ['a','b']
for foos in foo:
    print (foos)
for foos2 in foo2:
    print (foos2)
This way, even though 'foo' holds only a single string, it still behaves as a list with one element.
Also, you could try this:
declare an empty list
use youremptylist.extend(incoming value)
You can then iterate over that list for each incoming value, even if it holds only a single one
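As Roni is saying, you can use this:
def printer(x):
    finalList = []
    finalList.extend(x)
    print finalList
If x is a list, its elements are appended to finalList and you can iterate through it. Be aware that extend() needs an iterable: a string is added one character at a time, and a single non-iterable value such as an integer raises a TypeError, so wrap single values in a list first.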
If you want loopable things mostly untouched and non loopables behave like a 1-element list you could do something like:
def forceiter(x):
    return getattr(x,"__iter__",lambda:(x,))()
Demo:
for x in [1,[2],range(3),"abc",(),{3:3,4:"x"}, np.logspace(0,3,4)]:
    print(x,end=" --> ")
    for i in forceiter(x):
        print(i,end=" ")
    print()
# 1 --> 1
# [2] --> 2
# range(0, 3) --> 0 1 2
# abc --> a b c
# () -->
# {3: 3, 4: 'x'} --> 3 4
# [ 1. 10. 100. 1000.] --> 1.0 10.0 100.0 1000.0
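Note that forceiter walks through a string character by character. If a string should count as a single item, as in the question, a stricter helper is one option. This is only a sketch; the name ensure_list and the choice to treat tuples like lists are assumptions, not something from the question:

```python
def ensure_list(x):
    """Return x as a list; wrap anything that is not a list or tuple."""
    if isinstance(x, (list, tuple)):
        return list(x)
    return [x]  # strings, numbers and other objects become one-element lists

for arg in ("a", ["a", "b", "c"], 42, ("x", "y")):
    for item in ensure_list(arg):
        print(item)
```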

Python - making a function that would add "-" between letters

I'm trying to make a function, f(x), that would add a "-" between each letter:
For example:
f("James")
should output as:
J-a-m-e-s-
I would love it if you could use simple python functions as I am new to programming. Thanks in advance. Also, please use the "for" function because it is what I'm trying to learn.
Edit:
yes, I do want the "-" after the "s".
Can I try like this:
>>> def f(n):
... return '-'.join(n)
...
>>> f('james')
'j-a-m-e-s'
>>>
Not really sure if you require the last 'hyphen'.
Edit:
Even if you want suffixed '-', then can do like
def f(n):
    return '-'.join(n) + '-'
As a learner, it is important to understand that the better way to concatenate more than two strings in Python is str.join(iterable), whereas the + operator is fine for appending one string to another.
Please read the following posts to explore further:
Any reason not to use + to concatenate two strings?
which is better to concat string in python?
How slow is Python's string concatenation vs. str.join?
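To make the comparison concrete, here is a small sketch of both approaches producing the same string. The variable names are made up for illustration; it does not measure speed, it only shows that the results match:

```python
parts = ["J", "a", "m", "e", "s"]

# Repeated + creates a new string object on every pass through the loop.
joined_with_plus = ""
for p in parts:
    joined_with_plus = joined_with_plus + p + "-"

# str.join builds the whole result in a single call.
joined_with_join = "-".join(parts) + "-"

print(joined_with_plus)  # J-a-m-e-s-
print(joined_with_join)  # J-a-m-e-s-
```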
Also, please use the "for" function because it is what I'm trying to learn
>>> def f(s):
        m = s[0]
        for i in s[1:]:
            m += '-' + i
        return m
>>> f("James")
'J-a-m-e-s'
m = s[0] assigns the character at index 0 to the variable m
for i in s[1:]: iterates from the second character onward
m += '-' + i appends - plus the current character to m
Finally, the value of m is returned.
If you want a - at the end as well, you could do it like this.
>>> def f(s):
        m = ""
        for i in s:
            m += i + '-'
        return m
>>> f("James")
'J-a-m-e-s-'
text_list = [c+"-" for c in text]
text_strung = "".join(text_list)
As a function that takes a string as input:
def dashify(input):
    output = ""
    for ch in input:
        output = output + ch + "-"
    return output
Given you asked for a solution that uses for and a final -, simply iterate over the message and add the character and '-' to an intermediate list, then join it up. This avoids the use of string concatenations:
>>> def f(message):
        l = []
        for c in message:
            l.append(c)
            l.append('-')
        return "".join(l)
>>> print(f('James'))
J-a-m-e-s-
I'm sorry, but I just have to take Alexander Ravikovich's answer a step further:
f = lambda text: "".join([c+"-" for c in text])
print(f('James')) # J-a-m-e-s-
It is never too early to learn about list comprehension.
"".join(a_list) is self-explanatory: glueing elements of a list together with a string (empty string in this example).
lambda... well that's just a way to define a function in a line. Think
square = lambda x: x**2
square(2) # returns 4
square(3) # returns 9
Python is fun, it's not {enter-a-boring-programming-language-here}.

compute gc content for each sequence and return a dictionary

I have a list called "self.__sequences" with some DNA sequences; the following is part of that list:
['AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG\n', 'TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA\n', 'CTGAGTAAATCATATACTCAATGATTTTTTTATGTGTGTGCATGTGTGCTGTTGATATTCTTCAGTACCAAAACCCATCATCTTATTTGCATAGGGAAGT\n', 'CTGCCAGCACGCTGTCACCTCTCAATAACAGTGAGTGTAATGGCCATACTCTTGATTTGGTTTTTGCCTTATGAATCAGTGGCTAAAAATATTATTTAAT\n', 'ACTTATATTATGTTGACACTCAAAAATTTCAGAATTTGGAGTATTTTGAATTTCAGATTTTCTGATTAGGGATGTACCTGTACTTTTTTTTTTTTTTTTT\n', 'TTTGTTCTTTTTGTAATGGGGCCAGATGTCACTCATTCCACATGTAGTATCCAGATTGAAATGAAATGAGGTAGAACTGACCCAGGCTGGACAAGGAAGG\n', 'AAGAGGTAAAGGAAACAGACTGATGGCTGGAGAATTTGACAACGTATAAGAGAATCTGAGAATTCTTTTGAAAAATACTCAAATTTCCAGCCAAGATAGA\n', 'ACACTTGAGCATTAAGAGGAAACACCAAGGAAACAGATTTTAGGTCAAGAAAAAGAAGAGCTCTCTCATGTCAGAGCAGCCTAGAGCAGGAAAGTGCTGT\n', 'ACATCTATGCCCACCACACCTNGGTATGCANTGATGCTCATGAGATGGGAGGTGGCTACAGATTGCTCCATATAGAAATGTTACCTAGCATGTTAAAGAT\n']
I want to compute the GC content for each DNA sequence and return a dictionary that maps each DNA sequence to its GC content. For example, something like this:
{(AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG:0.5), (TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA:0.33)}
gc content= (Count(G) + Count(C)) / (Count(A) + Count(T) + Count(G) + Count(C))
I wrote the following code, but it gives me nothing!
def get_gc_content(self):
    for i in range (len(self.__sequence)):
        if seq[i] in self.__sequence:
            return (seq.count('G')+seq.count('C'))/float(seq.count('G')+seq.count('C')+seq.count('T')+seq.count('A'))
Can anyone help me to improve my code?
Assuming you analyze DNA (not RNA, etc) and strip() newlines and spaces from your sequences, seq.count('A') + seq.count('G') + seq.count('C') + seq.count('T') would always equal len(seq).
Note that seq.some_method_name operates on the whole sequence. You don't need the for loop that iterates over sequence elements at all.
The i in self.__sequence is always False (you pick an integer and check whether it belongs to a sequence of four possible letters), so it does nothing.
The first return inside the loop will break the loop.
Here's a piece of code that seems to work:
def getContentOf(target_list, seq):
    # add a 1 for each nucleotide in target_list
    target_count = sum(1 for x in seq if x in target_list)
    return float(target_count) / len(seq)
Answers look sensible:
>>> getContentOf(['G', 'C'], 'AGCT')
0.5
>>> getContentOf(['G', 'C'], 'AGCTATAT')
0.25
>>> _
So what you need is something like {seq: getContentOf(['G', 'C'], seq)}
BTW the sequences you gave in your post seem to have different G+C content than your examples state.
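A dictionary comprehension ties this together. The sketch below is written for Python 3 and uses short made-up sequences in place of self.__sequences; the helper name gc_content is hypothetical. It follows the formula from the question, so characters such as N are left out of the denominator:

```python
def gc_content(seq):
    """Fraction of G and C among the A, C, G and T bases of seq."""
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0

# Shortened, made-up sequences standing in for self.__sequences.
sequences = ["AGCT\n", "AGCTATAT\n", "ACNT\n"]

# strip() removes the trailing newline before counting and before use as a key.
gc_by_sequence = {s.strip(): gc_content(s.strip()) for s in sequences}
print(gc_by_sequence)
```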
What about this:
self.myDict = {}
def create_dna_dict(self):
    for i in seq:
        if i in self.__sequence:
            self.myDict[i] = (seq.count('G') + seq.count('C')) / float(seq.count('G') + seq.count('C') + seq.count('T') + seq.count('A'))
A few things, though:
Are you sure seq shouldn't be self.seq?
__sequence is a very odd variable name. It seems unconventional.
In the example of the dict you want, the dict syntax is incorrect.
I am quite sure that your dict, with its tuples and lack of strings:
{(AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG:0.5), (TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA:0.33)}
should look like this, with the parentheses removed and the keys written as strings:
{"AAAACATCAGTATCCATCAGGATCAGTTTGGAAAGGGAGAGGCAATTTTTCCTAAACATGTGTTCAAATGGTCTGAGACAGACGTTAAAATGAAAAGGGG":0.5, "TTAGAAACTATGGGATTATTCACTCCCTAGGTACTGAGAATGGAAACTTTCTTTGCCTTAATCGTTGACATCCCCTCTTTTAGGTTCTTGCTTCCTAACA":0.33}

Multiple-Target Assignments

I am reading a book about Python and there is a special part in the book about Multiple-Target Assignments. Now the book explains it like this:
but I don't see the use of this. It makes no sense to me. Why would you use more variables?
Is there a reason to do this? What makes this so different from using a = 'spam' and then printing a three times?
I can only think of using it to empty several variables in one line.
A very good use for multiple assignment is setting a bunch of variables to the same number.
Below is a demonstration:
>>> vowels = consonants = total = 0
>>> mystr = "abcdefghi"
>>> for char in mystr:
...     if char in "aeiou":
...         vowels += 1
...     elif char in "bcdfghjklmnpqrstvwxyz":
...         consonants += 1
...     total += 1
...
>>> print "Vowels: {}\nConsonants: {}\nTotal: {}".format(vowels, consonants, total)
Vowels: 3
Consonants: 6
Total: 9
>>>
Without multiple assignment, I'd have to do this:
>>> vowels = 0
>>> consonants = 0
>>> total = 0
As you can see, this is a lot more long-winded.
Summed up, multiple assignment is just Python syntax sugar to make things easier/cleaner.
It's mainly just for convenience. If you want to initialize a bunch of variables, it's more convenient to do them all on one line than several. The book even mentions that at the end of the snippet that you quoted: "for example, when initializing a set of counters to zero".
Besides that, though, the book is actually wrong. The example shown
a = b = c = 'spam'
is NOT equivalent to
c = 'spam'
b = c
a = b
What it REALLY does is basically
tmp = 'spam'
a = tmp
b = tmp
c = tmp
del tmp
Notice the order of the assignments! This makes a difference when some of the targets depend on each other. For example,
>>> x = [3, 5, 7]
>>> a = 1
>>> a = x[a] = 2
>>> a
2
>>> x
[3, 5, 2]
According to the book, x[1] would become 2, but clearly this is not the case.
For further reading, see these previous Stack Overflow questions:
How do chained assignments work?
What is this kind of assignment in Python called? a = b = True
Python - are there advantages/disadvantages to assignment statements with multiple (target list "=") groups?
And probably several others (check out the links on the right sidebar).
You might need to initialize several variables with the same value, but then use them differently.
It could be for something like this:
def fibonacci(n):
    a = b = 1
    while a < n:
        c = a
        a = a + b
        b = c
    return a
(variable swapping with tuple unpacking omitted to avoid confusion, as with the downvoted answer)
An important note:
>>> a = b = []
is dangerous. It probably doesn't do what you think it does.
>>> b.append(7)
>>> print(b)
[7]
>>> print(a)
[7] # ???????
This is because variables in Python work as names, or labels, for objects rather than as containers of values as in some other languages. See this answer for a full explanation.
Presumably you go on to do something else with the different variables.
a = do_something_with(a)
b = do_something_else_with(b)
#c is still 'spam'
This is a trivial example, and the initialization step arguably didn't save you any work, but it's a valid way to code. There are certainly places where a significant number of variables need initializing, and as long as the values are immutable this idiom can save space.
Though as the book pointed out, you should only use this kind of syntax with immutable types. For mutable types you need to explicitly create multiple objects:
a,b,c = [mutable_type() for _ in range(3)]
Otherwise you end up with surprising results since you have three references to the same object rather than three objects.
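The difference is easy to check with the is operator. A short sketch with lists (the variable names are only for illustration):

```python
# Chained assignment: both names refer to the same list object.
a = b = []
b.append(7)
print(a is b, a)   # True [7]

# Separate objects: each name gets its own list.
c, d = [[] for _ in range(2)]
d.append(7)
print(c is d, c)   # False []
```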
