Python - making counters, making loops?

I am having some trouble with a piece of code below:
Input: li is a nested list as below:
li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'], ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
Using the function below, my desired output is simply the 2nd to the 9th digits following '>' under the condition that the number of '/' present in the entire sublist is > 1.
Instead, my code gives the digits for all entries, and it gives them multiple times. I therefore assume something is wrong with my counter and my for loop, but I can't quite figure it out.
Any help, greatly appreciated.
import os
cwd = os.getcwd()
def func_one():
    outp = open('something.txt', 'w') #output file
    li = []
    for i in os.listdir(cwd):
        if i.endswith('.ext'):
            inp = open(i, 'r').readlines()
            li.append(inp)
    count = 0
    lis = []
    for i in li:
        for j in i:
            for k in j[1:]: #ignore first entry in sublist
                if k == '/':
                    count += 1
                if count > 1:
                    lis.append(i[0][1:10])
                    next_func(lis, outp)
Thanks,
S :-)

Your indentation is possibly wrong: you should check count > 1 within the for j in i loop, not within the one that checks every single character in j[1:].
Also, here's a much easier way to do the same thing:
def count_slashes(items):
    return sum(item.count('/') for item in items)

for item in li:
    if count_slashes(item[1:]) > 1:
        print item[0][1:10]
Or, if you need the IDs in a list:
result = [item[0][1:10] for item in li if count_slashes(item[1:]) > 1]
Python list comprehensions and generator expressions are really powerful tools. Try to learn how to use them, as they make your life much simpler. The count_slashes function above uses a generator expression, and my last code snippet uses a list comprehension to construct the result list in a nice and concise way.

Tamás has suggested a good solution, although it uses a very different style of coding than you do. Still, since your question was "I am having some trouble with a piece of code below", I think something more is called for.
How to avoid these problems in the future
You've made several mistakes in your approach to getting from "I think I know how to write this code" to having actual working code.
You are using meaningless names for your variables which makes it nearly impossible to understand your code, including for yourself. The thought "but I know what each variable means" is obviously wrong, otherwise you would have managed to solve this yourself. Notice below, where I fix your code, how difficult it is to describe and discuss your code.
You are trying to solve the whole problem at once instead of breaking it down into pieces. Write small functions or pieces of code that do just one thing, one piece at a time. For each piece you work on, get it right and test it to make sure it is right. Then go on writing other pieces which perhaps use pieces you've already got. I'm saying "pieces" but usually this means functions, methods or classes.
Fixing your code
That is what you asked for and nobody else has done so.
You need to move the count = 0 line to after the for i in li: line (indented appropriately). This will reset the counter for every sub-list. Second, once you have appended to lis and run your next_func, you need to break out of the for k in j[1:] loop and the encompassing for j in i: loop.
Here's a working code example (without the next_func but you can add that next to the append):
>>> li = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'], ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
>>> lis = []
>>> for i in li:
...     count = 0
...     for j in i:
...         break_out = False
...         for k in j[1:]:
...             if k == '/':
...                 count += 1
...             if count > 1:
...                 lis.append(i[0][1:10])
...                 break_out = True
...                 break
...         if break_out:
...             break
...
>>> lis
['012345678']
Re-writing your code to make it readable
This is so you see what I meant in the beginning of my answer.
>>> def count_slashes(gene):
...     "count the number of '/' characters in the DNA sequences of the gene."
...     count = 0
...     dna_sequences = gene[1:]
...     for sequence in dna_sequences:
...         count += sequence.count('/')
...     return count
...
>>> def get_gene_name(gene):
...     "get the name of the gene"
...     gene_title_line = gene[0]
...     gene_name = gene_title_line[1:10]
...     return gene_name
...
>>> genes = [['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'], ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n']]
>>> results = []
>>> for gene in genes:
...     if count_slashes(gene) > 1:
...         results.append(get_gene_name(gene))
...
>>> results
['012345678']
>>>

import glob

lis = []
with open('output.txt', 'w') as outfile:
    for file in glob.iglob('*.ext'):
        content = open(file).read()
        # everything after the first line is sequence data
        if content.partition('\n')[2].count('/') > 1:
            lis.append(content[1:10])
    next_func(lis, outfile)
The reason you get digits for all entries is that you're not resetting the counter.
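To see the effect of the missing reset in isolation, here is a small sketch (Python 3 syntax) run on the sample data from the question. The two function names are made up for this illustration, and counting with str.count instead of a character loop is a simplification, not the asker's original approach.

```python
records = [
    ['>0123456789 mouse gene 1\n', 'ATGTTGGGTT/CTTAGTTG\n', 'ATGGGGTTCCT/A\n'],
    ['>9876543210 mouse gene 2\n', 'ATTTGGTTTCCT\n', 'ATTCAATTTTAAGGGGGGGG\n'],
]

def ids_with_shared_counter(records):
    """Buggy variant: the counter keeps growing across records."""
    count = 0
    ids = []
    for record in records:
        for line in record[1:]:
            count += line.count('/')
        if count > 1:
            ids.append(record[0][1:10])
    return ids

def ids_with_reset_counter(records):
    """Fixed variant: the counter starts from zero for every record."""
    ids = []
    for record in records:
        count = 0
        for line in record[1:]:
            count += line.count('/')
        if count > 1:
            ids.append(record[0][1:10])
    return ids

print(ids_with_shared_counter(records))  # ['012345678', '987654321']
print(ids_with_reset_counter(records))   # ['012345678']
```

The second record contains no slashes at all, yet the shared counter still reports it because the total left over from the first record is already above 1.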

Related

Using Multiple If Statements with While Loop

I'm trying to write a while loop that goes through a certain list looking for certain substrings. It will find words with those substrings and print out those strings.
Here is a example that works perfectly but for only one word:
lista = ['applepie','appleseed','bananacake','bananabread']
i = 0
z = len(lista)
while i < z:
    if ('pie' in lista[0+i]) == True:
        print(lista[0+i])
        break
    i = i + 1
else:
    print('Not There Yet')
This prints out applepie (which is what is desired!).
How do I go about fixing this while loop to add in multiple constraints?
I'm trying to do this:
lista = ['applepie','appleseed','bananacake','bananabread']
i = 0
z = len(lista)
while i < z:
    if ('pie' in lista[0+i]) == True:
        print(lista[0+i])
    if ('cake' in lista[0+1]) == True:
        print(lista[0+i])
    i = i + 1
else:
    print('Not There Yet')
This prints out:
applepie
Not There Yet
When I want this to print out:
applepie
bananacake
I used multiple 'if' statements, because I know if I want to use an 'elif', it will only run if the first 'if' statement is false.
Any help is appreciated!
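You have two smallish issues and, I think, one larger one. The first of the small ones is what Nick and Brenden noted above. The second is the else clause rather than the loop condition: i < z is correct, since valid indices run from 0 to z - 1 and <= would raise an IndexError, but because the loop never reaches a break, the else branch always runs and prints 'Not There Yet'.
The larger issue seems to be that you're having a problem conceptualizing the actual workings. For that, let me suggest you step through it here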
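For reference, a minimal corrected version of the while loop from the question might look like the sketch below; it is one possible fix, not the only one. It indexes with i in both tests and drops the else clause, which otherwise runs whenever the loop finishes without a break. The list named matches is an addition for this example so that the result can be inspected.

```python
lista = ['applepie', 'appleseed', 'bananacake', 'bananabread']
matches = []  # hypothetical name, added for this illustration

i = 0
while i < len(lista):          # valid indices are 0 .. len(lista) - 1
    if 'pie' in lista[i]:
        matches.append(lista[i])
    elif 'cake' in lista[i]:   # elif avoids recording the same word twice
        matches.append(lista[i])
    i += 1

for word in matches:
    print(word)
# applepie
# bananacake
```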
You can use any() to go through a list of conditions. It will evaluate to True as soon as a condition is met and False if none ever is. If you combine that with a regular Python for loop, it will be nice and succinct:
lista = ['applepie','appleseed','bananacake','bananabread']
words = ['pie', 'cake']
for food in lista:
    if any(word in food for word in words):
        print(food)
It prints:
applepie
bananacake
You can also do the same thing as a list comprehension to get a list of words that match:
lista = ['applepie','appleseed','bananacake','bananabread']
words = ['pie', 'cake']
found = [food for food in lista if any(word in food for word in words)]
# ['applepie', 'bananacake']
Generally speaking, Python discourages you from using indices in loops unless you really need to. It tends to be error prone and harder to read.
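When the position is genuinely needed, the built-in enumerate supplies it without manual bookkeeping. A short sketch using the same list (the name found_at is made up for this example):

```python
lista = ['applepie', 'appleseed', 'bananacake', 'bananabread']
words = ['pie', 'cake']

# enumerate yields (index, item) pairs, so no counter variable is needed.
found_at = [(index, food) for index, food in enumerate(lista)
            if any(word in food for word in words)]

print(found_at)  # [(0, 'applepie'), (2, 'bananacake')]
```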

Function to reverse text characters

I am trying to make a reverse function which takes an input (text) and outputs the reversed version. So "Polar" would print raloP.
def reverse(text):
    list = []
    text = str(text)
    x = len(text) - 1
    list.append("T" * x)
    for i in text:
        list.insert(x, i)
        x -= 1
    print "".join(list)
reverse("Something")
As others have mentioned, Python already provides a couple of ways to reverse a string. The simple way is to use extended slicing: s[::-1] creates a reversed version of string s. Another way is to use the reversed function: ''.join(reversed(s)). But I guess it can be instructive to try implementing it for yourself.
There are several problems with your code.
Firstly,
list = []
You shouldn't use list as a variable name because that shadows the built-in list type. It won't hurt here, but it makes the code confusing, and if you did try to use list() later on in the function it would raise an exception with a cryptic error message.
text = str(text)
is redundant. text is already a string. str(text) returns the original string object, so it doesn't hurt anything, but it's still pointless.
x = len(text) - 1
list.append("T" * x)
You have an off-by-one error here. You really want to fill the list with as many items as are in the original string; this is short by one. Also, this code appends the string as a single item to the list, not as x separate items of one char each.
list.insert(x, i)
The .insert method inserts new items into a list; the subsequent items after the insertion point get moved up to make room. We don't want that. We just want to overwrite the current item at the x position, and we can do that by indexing.
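The difference between inserting and overwriting is easy to check on a tiny list. This is only an illustrative sketch; the variable names are made up.

```python
letters = ['a', 'b', 'c']

inserted = list(letters)
inserted.insert(1, 'X')   # shifts 'b' and 'c' one place to the right
print(inserted)           # ['a', 'X', 'b', 'c']

overwritten = list(letters)
overwritten[1] = 'X'      # replaces 'b' in place; the length is unchanged
print(overwritten)        # ['a', 'X', 'c']
```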
When your code doesn't behave the way you expect it to, it's a Good Idea to add print statements at strategic places to make sure that variables have the value that they're supposed to have. That makes it much easier to find where things are going wrong.
Anyway, here's a repaired version of your code.
def reverse(text):
    lst = []
    x = len(text)
    lst.extend("T" * x)
    for i in text:
        x -= 1
        lst[x] = i
    print "".join(lst)
reverse("Something")
output
gnihtemoS
Here's an alternative approach, showing how to do it with .insert:
def reverse(text):
    lst = []
    for i in text:
        lst.insert(0, i)
    print "".join(lst)
Finally, instead of using a list we could use string concatenation. This approach is less efficient, especially with huge strings, although in modern versions of Python it's not as inefficient as it once was, as the str type has been optimised to handle this fairly common operation.
def reverse(text):
    s = ''
    for i in text:
        s = i + s
    print s
BTW, you really should be learning Python 3, Python 2 reaches its official End Of Life in 2020.
You can try :
def reverse(text):
    return text[::-1]
print(reverse("Something")) # python 3
print reverse("Something") # python 2
Easier way to do so:
def reverse(text):
    rev = ""
    i = len(text) - 1
    while i > -1:
        rev += text[i]
        i = i - 1
    return rev
print(reverse("Something"))
result: gnihtemoS
You could simply do
print "something"[::-1]

Searching for a character or string of characters in a list (like the find function on websites)

My first post here.
I'd like to create a search function, searching a list for any raw_input entered.
So far, I've been able to call a path on computer and append each item to a list.
I know I can list.index() for a complete file name, but I'd like it to search for
simply any character(s) one might want to input.
Here's what I've got so far:::
import os
list1 = []
x = "/Users/User/temp"
vec = os.listdir(x)
for p in vec:
    list1.append(p)
for line in list1:
    print line
o = raw_input("search>>> ")
print list1.index(o)
Now, with this code, the filename has to be typed in exactly...
So, it'll take my path(users/user/temp) and make a list from it, search
the list for the filename and return the index at which it lies.
How can I search for say.. (you) in the list and bring up a result that
might be (youarewonderful.txt).
Thanks, I'm very new to Python, so any insight or code improvements are welcome
as well.
-peer
This gives a list of indices:
[i for i, x in enumerate(vec) if "you" in x]
This is called a list comprehension, and it uses the enumerate function to keep track of the indices. If you aren't familiar with these, I recommend the official python tutorial here
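As a self-contained sketch of that comprehension (Python 3 syntax), the directory listing is replaced here by a made-up list of file names, since the real contents of the folder are not known. Keeping the name next to each index is an addition for readability.

```python
# Hypothetical file names standing in for the result of os.listdir(...)
vec = ['notes.txt', 'youarewonderful.txt', 'thankyou.doc', 'readme.md']

query = 'you'
# Pair each matching name with its position in the list.
matches = [(i, name) for i, name in enumerate(vec) if query in name]

for i, name in matches:
    print(i, name)
# 1 youarewonderful.txt
# 2 thankyou.doc
```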
I've got a lead!
Next, I added:::
for p in vec:
    if 'yo' in p:
        print p
        print list1.index(p)
Sorry Patrick, this is more along the lines of what I wanted to do. Thanks anyways! -although I don't quite get what you were getting at. I'd like to know, though.
Here is the completed program, don't know how I got here:::
import os
list1 = []
x = "/Users/User/temp"
vec = os.listdir(x)
for p in vec:
    list1.append(p)
for line in list1:
    print line
b = raw_input('search for item in list/path>>> ')
for p in vec:
    if str(b) in p:
        print p
        print list1.index(p)
Woo Hoo
-peer

Python if else statement in recursive function getting stuck:

So I'm making a function that takes a string and the number of gaps that I need to insert in it, and I want it to output a list of strings of all possible combinations where those gaps are inserted.
I have written a recursive function for that, but the stopping if condition is not being activated no matter what I do. Even printing the expression by itself gives the proper answer, but the if condition doesn't follow that expression.
I hope you guys can help with this, even though it's probably a very simple error on my part; I just can't seem to find it.
Thanks in advance.
f = open("bonusoutput.txt",'w')
sequence1 = raw_input("Sequence 1:")
sequence2 = raw_input("Sequence 2:")
l1 = int(len(sequence1))
l2 = int(len(sequence2))
#---------------Function that has problem-----------------------------
def insertBlanks(numGap,string):
    if (numGap <= 0):
        return [string]
    else:
        outSeq = []
        for cp in range(0,len(string)+1):
            outSeq.append(string[:cp] + "_" + string[cp:])
        for seq in outSeq:
            outSeq += (insertBlanks(numGap-1,seq))
        return outSeq
#-------------------------------------------------------------
nGap1 = l2
nGap2 = l1
outSeq2 = insertBlanks(nGap1,sequence2)
f.write(str(outSeq2))
print outSeq2
While looping with for seq in outSeq, you are appending items to outSeq. Each recursive call returns a list of at least one item (the base case returns [string]), so you add at least one item for each item you visit, and you have an infinite loop. Consider adding your output to a new list, or using a list comprehension like [insertBlanks(numGap - 1, seq) for seq in outSeq] (which gives a list of lists that you would then need to flatten).
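One way to apply the "new list" suggestion is sketched below (Python 3 syntax). It assumes the goal is only the strings that contain exactly the requested number of gaps, which is a reading of the question rather than something the asker confirmed; the snake_case names are introduced for this example.

```python
def insert_blanks(num_gaps, text):
    """Return every string made by inserting num_gaps '_' characters into text.

    Duplicates are kept; wrap the result in set() if only distinct strings matter.
    """
    if num_gaps <= 0:
        return [text]
    results = []  # a separate list, never the one being iterated over
    for cut in range(len(text) + 1):
        with_gap = text[:cut] + "_" + text[cut:]
        results.extend(insert_blanks(num_gaps - 1, with_gap))
    return results

print(insert_blanks(1, "AB"))            # ['_AB', 'A_B', 'AB_']
print(len(insert_blanks(2, "AB")))       # 12
print(len(set(insert_blanks(2, "AB"))))  # 6
```

Because the loop iterates over positions in the string and only ever appends to results, the list being traversed never grows, so the recursion terminates. Note that the number of results grows quickly with num_gaps.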

Writing shorter, readable, pythonic code

I'm trying to produce shorter, more pythonic, readable python. And I have this working solution for Project Euler's problem 8 (find the greatest product of 5 sequential digits in a 1000 digit number).
Suggestions for writing a more pythonic version of this script?
numstring = ''
for line in open('8.txt'):
    numstring += line.rstrip()
nums = [int(x) for x in numstring]
best=0
for i in range(len(nums)-4):
    subset = nums[i:i+5]
    product=1
    for x in subset:
        product *= x
    if product>best:
        best=product
        bestsubset=subset
print best
print bestsubset
For example: there's gotta be a one-liner for the below snippet. I'm sure there's a past topic on here but I'm not sure how to describe what I'm doing below.
numstring = ''
for line in open('8.txt'):
    numstring += line.rstrip()
Any suggestions? thanks guys!
I'm working on a full answer, but for now here's the one liner
numstring = ''.join(x.rstrip() for x in open('8.txt'))
Edit: Here you go! One liner for the search. List comprehensions are wonderful.
from operator import mul
def prod(list):
    return reduce(mul, list)
numstring = ''.join(x.rstrip() for x in open('8.txt'))
nums = [int(x) for x in numstring]
print max(prod(nums[i:i+5]) for i in range(len(nums)-4))
from operator import mul
def product(nums):
    return reduce(mul, nums)
nums = [int(c) for c in open('8.txt').read() if c.isdigit()]
# stop 4 short of the end so that every slice holds exactly 5 digits
result = max(product(nums[i:i+5]) for i in range(len(nums) - 4))
Here is my solution. I tried to write the most "Pythonic" code that I know how to write.
with open('8.txt') as f:
    numstring = f.read().replace('\n', '')
nums = [int(x) for x in numstring]
def sub_lists(lst, length):
    for i in range(len(lst) - (length - 1)):
        yield lst[i:i+length]
def prod(lst):
    p = 1
    for x in lst:
        p *= x
    return p
best = max(prod(lst) for lst in sub_lists(nums, 5))
print(best)
Arguably, this is one of the ideal cases to use reduce so maybe prod() should be:
# from functools import reduce # uncomment this line for Python 3.x
from operator import mul
def prod(lst):
    return reduce(mul, lst, 1)
I don't like to try to write one-liners where there is a reason to have more than one line. I really like the with statement, and it's my habit to use that for all I/O. For this small problem, you could just do the one-liner, and if you are using PyPy or something the file will get closed when your small program finishes executing and exits. But I like the two-liner using with so I wrote that.
I love the one-liner by @Steven Rumbalski:
nums = [int(c) for c in open('8.txt').read() if c.isdigit()]
Here's how I would probably write that:
with open("8.txt") as f:
    nums = [int(ch) for ch in f.read() if ch.isdigit()]
Again, for this kind of short program, your file will be closed when the program exits so you don't really need to worry about making sure the file gets closed; but I like to make a habit of using with.
As far as explaining what that last bit was, first you create an empty string called numstring:
numstring = ''
Then you loop over every line of text (or line of strings) in the txt file 8.txt:
for line in open('8.txt'):
And so for every line you find, you want to add the result of line.rstrip() to it. rstrip 'strips' the whitespace (newlines,spaces etc) from the string:
numstring += line.rstrip()
Say you had a file, 8.txt that contains the text: LineOne \nLyneDeux\t\nLionTree you'd get a result that looked something like this in the end:
>>>'LineOne' #loop first time
>>>'LineOneLyneDeux' # second time around the bush
>>>'LineOneLyneDeuxLionTree' #final answer, reggie
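The walkthrough above can be reproduced without creating a file. In this sketch (Python 3 syntax) io.StringIO stands in for 8.txt; that substitution is an assumption made only so the example runs anywhere.

```python
import io

# io.StringIO stands in for the file 8.txt so the example runs anywhere.
fake_file = io.StringIO('LineOne \nLyneDeux\t\nLionTree')

numstring = ''
for line in fake_file:
    numstring += line.rstrip()  # drop trailing spaces, tabs and newlines
    print(repr(numstring))
# 'LineOne'
# 'LineOneLyneDeux'
# 'LineOneLyneDeuxLionTree'
```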
Here's a full solution! First read out the number:
with open("8.txt") as infile:
    number = infile.read().replace("\n", "")
Then create a list of lists with 5 consecutive numbers:
cons_numbers = [list(map(int, number[i:i+5])) for i in range(len(number) - 4)]
Then find the largest and print it:
import operator
print(max(reduce(operator.mul, nums) for nums in cons_numbers))
If you're using Python 3.x you need to replace reduce with functools.reduce.
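Pulling the pieces from the answers together, a self-contained Python 3 sketch might look like the following. The function name greatest_window_product and the short ten-digit sample string are made up for illustration; the real input would be the contents of 8.txt.

```python
from functools import reduce
from operator import mul

def greatest_window_product(digits, width=5):
    """Return the largest product of `width` consecutive digits in a string."""
    nums = [int(ch) for ch in digits if ch.isdigit()]
    # Stop at len(nums) - width so every window has exactly `width` digits.
    windows = (nums[i:i + width] for i in range(len(nums) - width + 1))
    return max(reduce(mul, window, 1) for window in windows)

print(greatest_window_product('7316717653'))  # 1764
```

Filtering with isdigit() means newlines in the file need no separate handling. If the input holds fewer than width digits there are no windows and max raises a ValueError, which this sketch does not try to handle.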
