Why is my with statement not executing correctly?

I have the code shown below, which contains a with block.
alleles = []
pop = 1 + 4
snp = []
# read in input file
with open("input.txt", 'r') as f:
    for line in f:
        alleles.append(line.split()[2] + line.split()[1])  # risk first, then ref. keep risk as second allele
    snp = [{"riskall": line[1], "weight": float(line[4]), "freq": float(line[pop]),
            line[1]+line[1]: (2*float(line[4]), (float(line[pop])*float(line[pop]))),
            line[2]+line[1]: (float(line[4]), (2*(((1-float(line[pop]))*(float(line[pop])))))),
            line[2]+line[2]: (0, ((1-float(line[pop]))*((1-float(line[pop])))))} for line in map(lambda x: x.split(), f)]
Strangely enough, the snp assignment simply does not work, resulting in an empty list. The same occurs if I swap the for loop and the snp assignment: the latter then works fine, but the former never runs.
Does anyone know what is going on? I'm quite sure the indentation is correct...
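A likely cause, for readers hitting the same symptom (an editorial note, not from the original thread): the first for loop consumes the file iterator, so by the time the snp comprehension iterates over f again, the file is already at end-of-file and the comprehension produces an empty list. A minimal sketch of the behavior and one possible fix, rewinding with f.seek(0) (snp_lines below just stands in for the full dictionary comprehension):
with open("input.txt", 'r') as f:
    for line in f:
        pass                 # the first pass reads to end-of-file
    print(list(f))           # [] -- the iterator is exhausted

with open("input.txt", 'r') as f:
    alleles = [line.split()[2] + line.split()[1] for line in f]
    f.seek(0)                # rewind so the file can be iterated again
    snp_lines = [line.split() for line in f]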

Related

I can not add a loop to another Python loop, please help me

I want to add another loop to a Python loop that I have, but I cannot come up with the solution. I am new to Python, so I need help to make it work.
Here is what the code does:
1- It starts with the first CSV file and reads from the first row (there is a given range to use).
2- It uses the data to run the function. After a few seconds of delay, it goes to the next row and does the same, repeating until all rows in the given range are used.
3- After a few seconds of delay, it goes to the next CSV file and repeats the same process (with a delay between each row) until all CSV files are used.
The problem is, when I run the file a second time (and beyond), it uses the same rows.
I want to add another loop to it, so when I run it again, it iterates over the next rows that have not been used.
This is how it should work, assuming the given range is 2 rows:
1- The first run should read rows 1 and 2 of all CSV files and use the data to run the function.
2- The second run should read rows 3 and 4 of all CSV files and use the data to run the function, and so on each time the file is run.
I would appreciate any help to make this work.
I posted this problem some time ago but did not get a working solution, so I am trying again.
Here is code that works:
import time
from abc.zzz import xyz  # placeholder import from the original post

path_id_map = [
    {'path': 'file1.csv', 'id': '12345678'},
    {'path': 'file2.csv', 'id': '44556677'},
    {'path': 'file3.csv', 'id': '33377799'},
    {'path': 'file4.csv', 'id': '66221144'}]

s_id = None
for pair in path_id_map:
    with open(pair['path'], 'r') as f:
        next(f)  # skip first header line
        for _ in range(1, 3):
            line = next(f)
            img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
            zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                      link_1=link_1, B_id=pair['id'], s_id=s_id)
            time.sleep(25)
CSV file content looks like this:
img_url;title_1;desc_1;link_1
site.com/image22.jpg;someTitle;description1;site1.com
site.com/image32.jpg;someTitle;description2;site2.com
site.com/image44.jpg;someTitle;description3;site3.com
thanks.
Edited:
OK, I got some help with a posted solution, but unfortunately it did not work, and the person did not want to continue working on the provided code to solve the problem.
So I am back to asking for help again.
Can you put your processing logic in a separate function and call that?
All files should have the same number of rows, or you need to add safety checks so that nothing is processed once next(f) runs out of lines and raises StopIteration (which is a good thing to have anyway).
def process_n_lines_from_offset(file_set, processed, offset):
    for pair in file_set:
        with open(pair['path'], 'r') as f:
            next(f)  # skip first header line
            # skip already-processed lines
            for _ in range(processed):
                next(f)
            num_processed = 0
            for _ in range(offset):
                line = next(f)
                img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
                zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                          link_1=link_1, B_id=pair['id'], s_id=s_id)
                time.sleep(25)
                num_processed += 1
    # return the next starting offset (rows already skipped plus this batch)
    return processed + num_processed

if __name__ == '__main__':
    start = 0
    default_offset = 2
    num_processed = process_n_lines_from_offset(path_id_map, start, default_offset)
    num_processed = process_n_lines_from_offset(path_id_map, num_processed, default_offset)
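A caveat worth adding: the function above only advances the offset within a single process, while the original question asks for a second run of the script to continue where the first left off. For that, the processed count has to be persisted between runs. A minimal editorial sketch, assuming a small state file (the name offset.txt is invented here for illustration):
import os

STATE_FILE = 'offset.txt'  # hypothetical file that stores progress between runs

def load_offset():
    # Return the saved row offset, or 0 on the very first run.
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return int(f.read().strip() or 0)
    return 0

def save_offset(offset):
    with open(STATE_FILE, 'w') as f:
        f.write(str(offset))

start = load_offset()
next_offset = process_n_lines_from_offset(path_id_map, start, 2)
save_offset(next_offset)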

Python and CSV: find a row and give column value

The task consists of creating a function (taking two arguments) that searches for the kid's name in the CSV file and gives his age.
The CSV file is structured like this:
Nicholas,12
Matthew,6
Lorna,12
Michael,8
Sebastian,8
Joseph,10
Ahmed,15
while the code that I tried is this:
def fetchcolvalue(kids_agefile, kidname):
    import csv
    file = open(kids_agefile, 'r')
    ct = 0
    for row in csv.reader(file):
        while True:
            print(row[0])
            if row[ct] == kidname:
                break
The frustrating thing is that it doesn't give me any error, just an infinite loop: I think that's where I'm going wrong.
So far, what I have learnt from the book is only loops (while and for) and if-elif-else statements, besides basic CSV and file manipulation operations, so I can't really figure out how I can solve the problem with only those tools.
Please note that the function has to work with a generic two-column CSV file, not only the kids' one.
The while True in your loop is going to make you loop forever (no variables are changed within the loop). Just remove it:
for row in csv.reader(file):
    if row[ct] == kidname:
        break
else:
    print("{} not found".format(kidname))
The CSV file is iterated over, and as soon as row[ct] equals kidname the loop breaks.
I would add an else clause so you know if the file has been completely scanned without finding the kid's name (just to expose a little-known use of else after a for loop: if no break is encountered, execution goes into the else branch).
EDIT: you could do it in one line using any and a generator expression:
any(kidname == row[ct] for row in csv.reader(file))
will return True if any first cell matches, and is probably faster too.
This should work. In your example, the for loop sets row to the first row of the file, then starts the while loop. The while loop never updates row, so it is infinite. Just remove the while loop:
def fetchcolvalue(kids_agefile, kidname):
    import csv
    file = open(kids_agefile, 'r')
    ct = 0
    for row in csv.reader(file):
        if row[ct] == kidname:
            print(row[1])
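Both answers print rather than return the age, and neither closes the file. A minimal editorial sketch of a version that does both, using a with block and working for any two-column CSV (the filename kids.csv below is hypothetical):
import csv

def fetchcolvalue(kids_agefile, kidname):
    # Return the second column of the first row whose first column
    # matches kidname, or None if the name is not found.
    with open(kids_agefile, 'r') as f:
        for row in csv.reader(f):
            if row[0] == kidname:
                return row[1]
    return None

print(fetchcolvalue('kids.csv', 'Lorna'))  # prints 12 for the sample file above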

Splitting a list into a file without duplicates

I have a large data file like this:
133621 652.4 496.7 1993.0 ...
END SAMPLES EVENTS RES 271.0 2215.0 ...
ESACC 935.6 270.6 2215.0 ...
115133 936.7 270.3 2216.0 ...
115137 936.4 270.4 2219.0 ...
115141 936.1 271.0 2220.0 ...
ESACC L 114837 115141 308 938.5 273.3 2200
115145 936.3 271.8 2220.0 ...
END 115146 SAMPLES EVENTS RES 44.11 44.09
SFIX L 133477
133477 650.8 500.0 2013.0 ...
133481 650.2 499.9 2012.0 ...
ESACC 650.0 500.0 2009.0 ...
I want to grab only the ESACC data into trials: when END appears, the preceding ESACC data is aggregated into a trial. Right now I can get the first chunk of ESACC data into a file, but because the loop restarts from the beginning of the data, it keeps grabbing only the first chunk, so I end up with 80 trials containing the exact same data.
for i in range(num_trials):
    with open(fid) as testFile:
        for tline in testFile:
            if 'END' in tline:
                fid_temp_start.close()
                fid_temp_end.close()  # Close the files
                break
            elif 'ESACC' in tline:
                tline_snap = tline.split()
                sac_x_start = tline_snap[4]
                sac_y_start = tline_snap[5]
                sac_x_end = tline_snap[7]
                sac_y_end = tline_snap[8]
My question: How to iterate to the next chunk of data without grabbing the previous chunks?
Try rewriting your code something like this:
def data_parse(filepath):  # Make it a function
    try:
        with open(filepath) as testFile:
            tline = ''  # Initialize tline
            while True:  # Switch to an infinite while loop (I'll explain why)
                while 'ESACC' not in tline:  # Skip lines until one containing 'ESACC' is found
                    tline = next(testFile)   # (since it seems like you're doing that anyway)
                tline_snap = tline.split()
                trial = [tline_snap[4], '', '', '']  # Initialize list and assign first value
                trial[1] = tline_snap[5]
                trial[2] = tline_snap[7]
                trial[3] = tline_snap[8]
                while 'END' not in tline:  # Again, seems like you're skipping lines
                    tline = next(testFile)  # so I'll do the same
                yield trial  # Output list, save function state
    except StopIteration:
        fid_temp_start.close()  # I don't know where these enter the picture
        fid_temp_end.close()    # but you closed them so I will too
        testFile.close()

# Now, initialize a new list and call the function:
trials = list()
for trial in data_parse(fid):
    trials.append(trial)  # Creates a list of lists
What this creates is a generator function. By using yield instead of return, the function returns a value AND saves its state. The next time you call the function (as you will do repeatedly in the for loop at the end), it picks up where it left off. It starts at the line after the most recently executed yield statement (which in this case restarts the while loop) and, importantly, it remembers the values of any variables (like the value of tline and the point it stopped at in the data file).
When you reach the end of the file (and have thus recorded all of your trials), the next execution of tline = next(testFile) raises a StopIteration error. The try - except structure catches that error and uses it to exit the while loop and close your files. This is why we use an infinite loop; we want to continue looping until that error forces us out.
At the end of the whole thing, your data is stored in trials as a list of lists, where each item equals [sac_x_start, sac_y_start, sac_x_end, sac_y_end], as you defined them in your code, for one trial.
Note: it does seem to me like your code is skipping lines entirely when they don't contain ESACC or END. I've replicated that, but I'm not sure if that's what you want. If you want to get the lines in between, you can rewrite this fairly simply by adding to the 'END' loop as below:
while 'END' not in tline:
    tline = next(testFile)
    # (put assignment operations to be applied to each line here)
Of course, you'll have to adjust the variable you're using to store this data accordingly.
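One caveat worth adding as an editorial note (this is not from the original answer): since Python 3.7 (PEP 479), a StopIteration that escapes into a generator body is converted to a RuntimeError, so the try/except placement shown above no longer ends the generator cleanly. On modern Python, catch the exception around the next() calls and return instead, roughly:
def data_parse(filepath):
    with open(filepath) as testFile:
        tline = ''
        while True:
            try:
                while 'ESACC' not in tline:
                    tline = next(testFile)
                tline_snap = tline.split()
                trial = [tline_snap[4], tline_snap[5], tline_snap[7], tline_snap[8]]
                while 'END' not in tline:
                    tline = next(testFile)
            except StopIteration:
                return  # end of file: ends the generator cleanly on any Python version
            yield trial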
Edit: Oh dear lord, I just now noticed how old this question is.

Why is re not compiling 'if' when there is 'else'?

Hello, I'm facing a problem and I don't know how to fix it. All I know is that when I add an else statement to my if statement, execution always goes to the else branch, even when the if condition is true and the if branch could be entered.
Here is the script, without the else statement:
import re
f = open('C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
d = open('C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
w = open('C:\Users\Ziad\Desktop\Combination\combination.txt','w')
s=""
av =0
b=""
filtred=[]
Mlines=f.readlines()
Wlines=d.readlines()
for line in Wlines:
    Wspl=line.split()
    for line2 in Mlines:
        Mspl=line2.replace('\n','').split("\t")
        if ((Mspl[0]).lower()==(Wspl[0])):
            Wspl.append(Mspl[1])
            if(len(Mspl)>=3):
                Wspl.append(Mspl[2])
            s="\t".join(Wspl)+"\n"
            if s not in filtred:
                filtred.append(s)
            break
for x in filtred:
    w.write(x)
f.close()
d.close()
w.close()
And here it is with the else statement (I want the else to go with if ((Mspl[0]).lower()==(Wspl[0])):):
import re
f = open('C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
d = open('C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
w = open('C:\Users\Ziad\Desktop\Combination\combination.txt','w')
s=""
av =0
b=""
filtred=[]
Mlines=f.readlines()
Wlines=d.readlines()
for line in Wlines:
    Wspl=line.split()
    for line2 in Mlines:
        Mspl=line2.replace('\n','').split("\t")
        if ((Mspl[0]).lower()==(Wspl[0])):
            Wspl.append(Mspl[1])
            if(len(Mspl)>=3):
                Wspl.append(Mspl[2])
            s="\t".join(Wspl)+"\n"
            if s not in filtred:
                filtred.append(s)
            break
        else:
            b="\t".join(Wspl)+"\n"
            if b not in filtred:
                filtred.append(b)
            break
for x in filtred:
    w.write(x)
f.close()
d.close()
w.close()
First of all, you're not using re at all in your code besides importing it (maybe in some later part?), so the title is a bit misleading.
Secondly, you are doing a lot of work for what is basically a filtering operation on two files. Remember, simple is better than complex, so for starters you want to clean your code a bit:
You should use more indicative names than 'd' or 'w'. This goes for 'Wspl', 's' and 'av' as well. Those names don't mean anything and are hard to understand (why is d.readlines() assigned to Wlines when there's another file named 'w'? It's really confusing).
If you choose to use single letters, they should still make sense (if you iterate over a list named 'results', it makes sense to use 'r'; 'line1' and 'line2', however, are not recommended for anything).
You don't need parentheses around conditions.
You want to use as few variables as you can, so as not to get confused. There are too many different variables in your code; it's easy to get lost. You don't even use some of them.
You want to use strip rather than replace, and you want the whole 'cleaning' process to come first, and then have code that deals only with the filtering logic on the two lists. If you split each line according to some logic, and you don't use the original line anywhere in the iteration, then you can do the whole thing at the beginning.
Now, I'm really confused about what you're trying to achieve here, and while I don't understand why you're doing it that way, I can say that, looking at your logic, you are repeating yourself a lot. The check against the filtred list should only happen once, and since it happens regardless of whether the 'if' checks out or not, I see absolutely no reason to use an 'else' clause at all.
Cleaning up like I mentioned, and re-building the logic, the script looks something like this:
# PART I - read and analyze the lines
Wappresults = open('C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
Mikrofull = open('C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
Wapp = map(lambda x: x.strip().split(), Wappresults.readlines())
Mikro = map(lambda x: x.strip().split('\t'), Mikrofull.readlines())
Wappresults.close()
Mikrofull.close()

# PART II - filter using some logic
filtred = []
for w in Wapp:
    res = w[:]  # So as to copy the list instead of pointing to it
    for m in Mikro:
        if m[0].lower() == w[0]:
            res.append(m[1])
            if len(m) >= 3:
                res.append(m[2])
    string = '\t'.join(res) + '\n'  # this happens regardless of whether the 'if' statement changed 'res' or not
    if string not in filtred:
        filtred.append(string)

# PART III - write the filtered results into a file
combination = open('C:\Users\Ziad\Desktop\Combination\combination.txt', 'w')
for comb in filtred:
    combination.write(comb)
combination.close()
I can't promise it will work (because, again, I don't know what you're trying to achieve), but this should be a lot easier to work with.
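One more editorial note on the filtering step: string not in filtred scans the whole list every time, which gets slow for large files. A set gives constant-time membership checks while the list preserves output order; a minimal sketch (candidate_strings is a hypothetical stand-in for the strings built in PART II):
filtred = []
seen = set()                        # constant-time membership test
for string in candidate_strings:    # hypothetical iterable of built strings
    if string not in seen:
        seen.add(string)
        filtred.append(string)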

Why is my while loop sticking at raw_input? (python)

In the following code I am trying to make a "more" command (Unix) with a Python script by reading the file into a list, printing 10 lines at a time, and then asking the user whether to print the next 10 lines (Print More..).
The problem is that raw_input keeps asking for input again and again if I give 'y' or 'Y' and does not continue with the while loop, and if I give any other input the while loop breaks.
My code may not be the best, as I am learning Python.
import sys
import string
lines = open('/Users/abc/testfile.txt').readlines()
chunk = 10
start = 0
while 1:
    block = lines[start:chunk]
    for i in block:
        print i
    if raw_input('Print More..') not in ['y', 'Y']:
        break
    start = start + chunk
The output I am getting for this code is:
--
10 lines from file
Print More..y
Print More..y
Print More..y
Print More..a
You're constructing your slices wrong: The second parameter in a slice gives the stop position, not the chunk size:
chunk = 10
start = 0
stop = chunk
end = len(lines)
while True:
    block = lines[start:stop]  # use stop, not chunk!
    for i in block:
        print i
    if raw_input('Print More..') not in ['y', 'Y'] or stop >= end:
        break
    start += chunk
    stop += chunk
Instead of explaining why your code doesn't work and how to fix it (because Tim Pietzcker already did an admirable job of that), I'm going to explain how to write code so that issues like this don't come up in the first place.
Trying to write your own explicit loops, checks, and index variables is difficult and error-prone. That's why Python gives you nice tools that almost always make it unnecessary to do so. And that's why you're using Python instead of C.
For example, look at the following version of your program:
count = 10
with open('/Users/abc/testfile.txt', 'r') as testfile:
for i, line in enumerate(testfile):
print line
if (i + 1) % count == 0:
if raw_input('Print More..') not in ['y', 'Y']:
break
This is shorter than the original code, and it's also much more efficient (no need to read the whole file in and then build a huge list in advance), but those aren't very good reasons to use it.
One good reason is that it's much more robust. There's very little explicit loop logic here to get wrong. You don't even need to remember how slices work (sure, it's easy to learn that they're [start:stop] rather than [start:length]… but if you program in another language much more frequently than Python, and you're always writing s.sub(start, length), you're going to forget…). It also automatically takes care of ending when you get to the end of the file instead of continuing forever, closing the file for you (even on exceptions, which is painful to get right manually), and other stuff that you haven't written yet.
The other good reason is that it's much easier to read, because, as much as possible, the code tells you what it's doing, rather than the details of how it's doing it.
But it's still not perfect, because there's still one thing you could easily get wrong: that (i + 1) % count == 0 bit. In fact, I got it wrong in my first attempt (I forgot the +1, so it gave me a "More" prompt after lines 0, 10, 20, … instead of 9, 19, 29, …). If you have a grouper function, you can rewrite it even more simply and robustly:
with open('/Users/abc/testfile.txt', 'r') as testfile:
    for group in grouper(testfile, 10):
        for line in group:
            print line
        if raw_input('Print More..') not in ['y', 'Y']:
            break
Or, even better:
with open('/Users/abc/testfile.txt', 'r') as testfile:
    for group in grouper(testfile, 10):
        print '\n'.join(group)
        if raw_input('Print More..') not in ['y', 'Y']:
            break
Unfortunately, there's no such grouper function built into, say, the itertools module, but you can write one very easily:
import itertools

def grouper(iterator, size):
    return itertools.izip(*[iterator]*size)
(If efficiency matters, search around this site—there are a few questions where people do in-depth comparisons of different ways to achieve the same effect. But usually it doesn't matter. For that matter, if you want to understand why this groups things, search this site, because it's been explained at least twice.)
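To illustrate why this groups things (an editorial example, not from the original answer): every element of [iterator]*size is the same iterator object, so izip pulls size consecutive items for each tuple it builds:
import itertools

letters = iter('abcdef')
print list(itertools.izip(*[letters]*2))
# [('a', 'b'), ('c', 'd'), ('e', 'f')]
Note that izip silently drops leftover items that don't fill a complete group, so with this exact grouper a file whose line count isn't a multiple of 10 would lose its final partial chunk.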
As @Tim Pietzcker pointed out, there's no need to update chunk here; just use start+10 instead of chunk:
block = lines[start:start+10]
and update start using start += 10.
Another alternative solution using itertools.islice():
with open("data1.txt") as f:
slc=islice(f,5) #replace 5 by 10 in your case
for x in slc:
print x.strip()
while raw_input("wanna see more : ") in("y","Y"):
slc=islice(f,5) #replace 5 by 10 in your case
for x in slc:
print x.strip()
this outputs:
1
2
3
4
5
wanna see more : y
6
7
8
9
10
wanna see more : n
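As an editorial follow-up, the duplicated islice block can be folded into a single loop that also stops cleanly at end of file:
from itertools import islice

with open("data1.txt") as f:
    while True:
        block = list(islice(f, 5))  # next chunk; empty list at end of file
        if not block:
            break
        for x in block:
            print x.strip()
        if raw_input("wanna see more : ") not in ("y", "Y"):
            break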
