Python read file in while loop

I am currently in some trouble regarding Python and reading files. I have to open a file in a while loop and do some work with its values. The results are written into a new file, which is then read in the next run of the while loop. But in this second run I get no values out of the file... Here is a code snippet that hopefully clarifies what I mean.
while convergence == 0:
    run += 1
    prevrun = run - 1
    if os.path.isfile("./Output/temp/EmissionMat%d.txt" % prevrun) == True:
        matfile = open("./Output/temp/EmissionMat%d.txt" % prevrun, "r")
        EmissionMat = Aux_Functions.EmissionMat(matfile)
        matfile.close()
    else:
        matfile = open("./Input/EmissionMat.txt", "r")
        EmissionMat = Aux_Functions.EmissionMat(matfile)
        matfile.close()
    # now some valid operations, which produce a matrix
    emissionmat_file = open("./Output/temp/EmissionMat%d.txt" % run, "w")
    emissionmat_file.flush()
    emissionmat_file.write(str(matrix))
    emissionmat_file.close()
Solved it!
    matfile.seek(0)
This resets the pointer to the beginning of the file and allows me to read the file correctly in the next run.
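A minimal sketch of that behaviour (the file name is just illustrative): reading a handle exhausts it, and seek(0) rewinds it so the same handle can be read again.

```python
# Write a small file, then read it twice from the same handle.
with open("demo.txt", "w") as f:
    f.write("1 2 3\n")

f = open("demo.txt", "r")
first = f.read()   # pointer is now at the end of the file
second = f.read()  # returns "" because nothing is left to read
f.seek(0)          # rewind to the beginning
third = f.read()   # the full contents again
f.close()
```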

Why write to a file and then read it back at all? Moreover, you use flush, so you are doing potentially long I/O. I would do:
with open(originalpath) as f:
    mat = f.read()
while condition:
    run += 1
    write_mat_run(mat, run)
    mat = func(mat)
write_mat_run may be done in another thread. You should also check for I/O exceptions.
By the way, this will probably solve your bug, or at least make it obvious.
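A sketch of what that could look like with the write moved to a thread. `func` and `write_mat_run` are the placeholder names from above; their bodies here are illustrative stand-ins, not the asker's real computation.

```python
import threading

def func(mat):
    # stand-in transform; replace with the real update step
    return mat

def write_mat_run(mat, run):
    # hypothetical helper: persist the intermediate result for this iteration
    with open("EmissionMat%d.txt" % run, "w") as f:
        f.write(str(mat))

mat = "1 2 3"
run = 0
while run < 3:
    run += 1
    # hand the snapshot of mat to a worker thread so the loop
    # is not blocked by disk I/O
    t = threading.Thread(target=write_mat_run, args=(mat, run))
    t.start()
    mat = func(mat)
    t.join()  # joined immediately here only to keep the demo deterministic
```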

I can see nothing wrong with your code. The following concrete example worked on my Linux machine:
import os

run = 0
while run < 10:
    run += 1
    prevrun = run - 1
    if os.path.isfile("output%d.txt" % prevrun):
        matfile = open("output%d.txt" % prevrun, "r")
        data = matfile.readlines()
        matfile.close()
    else:
        matfile = open("input.txt", "r")
        data = matfile.readlines()
        matfile.close()
    data = [s[:-1] + "!\n" for s in data]
    emissionmat_file = open("output%d.txt" % run, "w")
    emissionmat_file.writelines(data)
    emissionmat_file.close()
It adds an exclamation mark to each line in the file input.txt.

I solved it.
Before closing the file I do
    matfile.seek(0)
This solved my problem; this method resets the file pointer to the beginning.

Related

Python Question - How to extract text between {textblock}{/textblock} of a .txt file?

I want to extract the text between {textblock_content} and {/textblock_content}.
With the script below, only the first line of the introtext.txt file is extracted and written to a newly created text file. I don't know why the script does not also extract the other lines of introtext.txt.
f = open("introtext.txt")
r = open("textcontent.txt", "w")
for l in f.readlines():
    if "{textblock_content}" in l:
        pos_text_begin = l.find("{textblock_content}") + 19
        pos_text_end = l.find("{/textblock_content}")
        text = l[pos_text_begin:pos_text_end]
        r.write(text)
f.close()
r.close()
How to solve this problem?
Your code actually works fine, assuming each line contains both a begin and an end tag. But I think this is not what you hoped for: you can't read multiple blocks on one line, and you can't read a block that starts and ends on different lines.
First of all, take a look at the object returned by the open function; its read method gives you access to the whole text. Also take a look at with statements, which make working with files easier and safer. To rewrite your code so it reads everything between {textblock_content} and {/textblock_content}, we could write something like this:
def get_all_tags_content(
    text: str,
    tag_begin: str = "{textblock_content}",
    tag_end: str = "{/textblock_content}"
) -> list[str]:
    useful_text = text
    ans = []
    # Heavy cycle, needs some optimization
    # Works in O(len(text) ** 2); we can do better
    while tag_begin in useful_text:
        useful_text = useful_text.split(tag_begin, 1)[1]
        if tag_end not in useful_text:
            break
        block_content, useful_text = useful_text.split(tag_end, 1)
        ans.append(block_content)
    return ans

with open("introtext.txt", "r") as f:
    with open("textcontent.txt", "w+") as r:
        r.write(str(get_all_tags_content(f.read())))
To make this function efficient enough to handle really big files, more work is needed. In this implementation the remaining text is copied every time a content block appears; that isn't necessary and it slows the program down. (Imagine millions of lines, each containing {textblock_content}"hello world"{/textblock_content}: on every line we would copy the whole remaining text just to continue.) We could use a plain for loop over the text to avoid the copying. Try to solve that yourself.
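As a different sketch (not the single-pass loop exercise above): the re module can extract every block in one call, including blocks that span lines, using a non-greedy pattern with re.DOTALL. The function name here is just illustrative.

```python
import re

def find_blocks(text):
    # non-greedy match between the literal tags;
    # re.DOTALL lets '.' also match newlines, so blocks may span lines
    pattern = r"\{textblock_content\}(.*?)\{/textblock_content\}"
    return re.findall(pattern, text, flags=re.DOTALL)
```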
When you call file.readlines(), the file pointer reaches the end of the file. Further calls return an empty list, so if you change your code to something like one of the snippets below, it should work properly:
f = open("introtext.txt")
r = open("textcontent.txt", "w")
f_lines = f.readlines()
for l in f_lines:
    if "{textblock_content}" in l:
        pos_text_begin = l.find("{textblock_content}") + 19
        pos_text_end = l.find("{/textblock_content}")
        text = l[pos_text_begin:pos_text_end]
        r.write(text)
f.close()
r.close()
Also, you can implement it with the with context manager, as in the snippet below:
with open("textcontent.txt", "w") as r:
    with open("introtext.txt") as f:
        for line in f:
            if "{textblock_content}" in line:
                pos_text_begin = line.find("{textblock_content}") + 19
                pos_text_end = line.find("{/textblock_content}")
                text = line[pos_text_begin:pos_text_end]
                r.write(text)
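The pointer-exhaustion behaviour described above is easy to demonstrate (the file name is illustrative): a second readlines() on the same handle returns an empty list.

```python
# Create a two-line file, then call readlines() twice on one handle.
with open("lines.txt", "w") as f:
    f.write("a\nb\n")

f = open("lines.txt")
first = f.readlines()   # both lines; the pointer is now at end of file
second = f.readlines()  # empty list, since nothing is left to read
f.close()
```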

For loop isn't working in python 3

I'm trying to save a file from a URL into a folder on my computer, but I have 732 URLs (that when saved, gives experimental data) in a list. I'm trying to run a for loop on all those URLs to save each data set into its own file. This is what I'm doing right now:
for i in ExperimentURLs:
    myurl123 = str(i)
    myreq = urllib.request.urlopen(myurl123)
    mydata = myreq.read()
    with open('/Users/lauren/Desktop/IDData/file', 'wb') as ofile:
        ofile.write(mydata)
ExperimentURLs is my list of URLs, but I don't know how to handle the for loop to save each data set into a new file. Currently, this code only writes a single experiment's data into a file and stops there. If I try to save it to a different file name, it takes a different experiment's data and saves that to the file. Help?
First, you need to automatically generate a new output file name every time through the loop. I'll give you the trivial version below. Also, note that the URLs are already strings; you don't have to convert them.
pos = 0
for myurl123 in ExperimentURLs:
    myreq = urllib.request.urlopen(myurl123)
    mydata = myreq.read()
    out_file = '/Users/lauren/Desktop/IDData/file' + str(pos)
    with open(out_file, 'wb') as ofile:
        ofile.write(mydata)
    pos += 1
Does that solve your problem?
BTW, you can do the two iterations in parallel with
for i, myurl123 in enumerate(ExperimentURLs):
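A minimal sketch of that enumerate pattern with the network call stubbed out (the URLs and the fake payload are placeholders; in the real script mydata would come from urllib.request.urlopen(myurl123).read()):

```python
ExperimentURLs = ["http://example.com/a", "http://example.com/b"]
saved = []
for pos, myurl123 in enumerate(ExperimentURLs):
    # stand-in for the downloaded bytes
    mydata = ("data from %s" % myurl123).encode()
    # enumerate supplies a fresh index, so every file name is unique
    out_file = '/Users/lauren/Desktop/IDData/file' + str(pos)
    saved.append((out_file, mydata))
```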
Your mistake is simply at the point of writing the files; it's not that the for loop isn't working. You are writing to the same file again and again. Here is a modified version using requests. All you need to do is change the file name when saving.
import os
import requests

ExperimentURLs = [
    "https://www.google.com",
    "https://www.yahoo.com"
]
os.makedirs("results", exist_ok=True)  # make sure the output folder exists
counter = 0
for myurl123 in ExperimentURLs:
    r = requests.get(myurl123)
    mydata = r.text.encode('utf-8').strip()
    # mydata is bytes, so the file must be opened in binary mode
    with open("results/" + str(counter) + ".html", 'wb') as ofile:
        ofile.write(mydata)
    counter += 1

Reading CSV file with python

filename = 'NTS.csv'
mycsv = open(filename, 'r')
mycsv.seek(0, os.SEEK_END)
while 1:
    time.sleep(1)
    where = mycsv.tell()
    line = mycsv.readline()
    if not line:
        mycsv.seek(where)
    else:
        arr_line = line.split(',')
        var3 = arr_line[3]
        print(var3)
I have this Python code, which reads values from a CSV file every time the external program prints a new line to it. My problem is that the CSV file is periodically rewritten completely, and then Python stops reading the new lines. My guess is that Python is stuck at some line number while the rewrite may add or remove around 50 lines: for example, Python is waiting for a new line at line 70 while the new line has arrived at line 95. I think the solution is to have mycsv.seek(0, os.SEEK_END) run again, but I'm not sure how to do that.
What you want to do is difficult to accomplish without rewinding the file every time to make sure that you are truly on the last line. If you know approximately how many characters each line has, there is a shortcut using mycsv.seek(-end_buf, os.SEEK_END), as outlined in this answer. So your code could work somewhat like this:
import os
import time

avg_len = 50  # use an appropriate number here
end_buf = 3 * avg_len // 2  # integer offset, large enough to cover a full line
filename = 'NTS.csv'
mycsv = open(filename, 'rb')  # binary mode is required to seek from the end
mycsv.seek(-end_buf, os.SEEK_END)
last = mycsv.readlines()[-1]
while 1:
    time.sleep(1)
    mycsv.seek(-end_buf, os.SEEK_END)
    line = mycsv.readlines()[-1]
    if line != last:
        arr_line = line.decode().split(',')
        var3 = arr_line[3]
        print(var3)
        last = line  # remember the line so it is only printed once
Here, in each iteration of the while loop, you seek to a position close to the end of the file, just far back enough that you know for sure the last line will be contained in what remains. Then you read in all the remaining lines (this will probably include a partial second- or third-to-last line) and check whether the last of these differs from what you had before.
You can read the lines in a simpler way. Instead of trying to use seek to get what you need, try using readlines on the file object mycsv.
You can do the following:
mycsv = open('NTS.csv', 'r')
csv_lines = mycsv.readlines()
for line in csv_lines:
    arr_line = line.split(',')
    var3 = arr_line[3]
    print(var3)

Include surrounding lines of text file match in output using Python 2.7.3

I've been working on a program which assists in log analysis. It finds error or fail messages using a regex and prints them to a new .txt file. However, it would be much more useful if the program also included the 4 lines above and below each match. I can't figure out how to do this! Here is part of the existing program:
def error_finder(filepath):
    source = open(filepath, "r").readlines()
    error_logs = set()
    my_data = []
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            error_logs.add(line)
I'm assuming something needs to be added to the very last line, but I've been working on this for a bit and either am not applying myself fully or just can't figure it out.
Any advice or help on this is appreciated.
Thank you!
Why Python?
grep -C4 '^your_regex$' logfile > outfile.txt
Some comments:
I'm not sure why error_logs is a set instead of a list.
Using readlines() will read the entire file in memory, which will be inefficient for large files. You should be able to just iterate over the file a line at a time.
exp (which you're using for re.search) isn't defined anywhere, but I assume that's elsewhere in your code.
Anyway, here's complete code that should do what you want without reading the whole file in memory. It will also preserve the order of input lines.
import re
from collections import deque

exp = r'\d'  # matches numbers, change to what you need

def error_finder(filepath, context_lines=4):
    source = open(filepath, 'r')
    error_logs = []
    buffer = deque(maxlen=context_lines)
    lines_after = 0
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            # add previous lines first
            for prev_line in buffer:
                error_logs.append(prev_line)
            # clear the buffer
            buffer.clear()
            # add current line
            error_logs.append(line)
            # schedule lines that follow to be added too
            lines_after = context_lines
        elif lines_after > 0:
            # a line that matched the regex came not so long ago
            lines_after -= 1
            error_logs.append(line)
        else:
            buffer.append(line)
    # maybe do something with error_logs? I'll just return it
    return error_logs
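The key trick in the answer above is the bounded deque: with maxlen set, appends silently discard the oldest entry, so the buffer always holds exactly the last few lines seen.

```python
from collections import deque

# a deque with maxlen=4 keeps only the four most recent appends
buffer = deque(maxlen=4)
for n in range(10):
    buffer.append(n)
# older values 0..5 have been silently dropped
```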
I suggest using an index loop instead of a for-each loop; try this:
error_logs = list()
for i in range(len(source)):
    line = source[i].strip()
    if re.search(exp, line):
        error_logs.append((line, i - 4, i + 4))
In this case your error log will contain ('line of error', line index - 4, line index + 4), so you can get these lines later from source.
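A sketch of that later lookup (the sample data is made up; the lower bound is clamped so a match near the top of the file doesn't produce a negative slice index):

```python
source = ["ok\n", "error 1\n", "ok\n", "ok\n", "ok\n", "ok\n", "ok\n"]
# one entry as the loop above would store it for a match at index 1:
# (line, i - 4, i + 4)
error_logs = [("error 1", 1 - 4, 1 + 4)]

contexts = []
for line, lo, hi in error_logs:
    # clamp lo to 0 so the slice doesn't wrap around the end of the list
    contexts.append(source[max(lo, 0):hi + 1])
```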

Downloading a file into memory

I am writing a Python script and I just need the second line of a series of very small text files. I would like to extract this without saving each file to my hard drive, as I currently do.
I have found a few threads that reference the tempfile and StringIO modules, but I was unable to make much sense of them.
Currently I download all of the files, name them sequentially like 1.txt, 2.txt, etc., and then go through all of them and extract the second line. I would like to open a file, grab the line, then move on to finding, opening and reading the next file.
Here is what I do currently with writing it to my HDD:
while (count4 <= num_files):
    file_p = [directory, str(count4), '.txt']
    file_path = ''.join(file_p)
    cand_summary = string.strip(linecache.getline(file_path, 2))
    linkFile = open('Summary.txt', 'a')
    linkFile.write(cand_summary)
    linkFile.write("\n")
    count4 = count4 + 1
linkFile.close()
Just replace the file writing with a call to append() on a list. For example:
summary = []
while (count4 <= num_files):
    file_p = [directory, str(count4), '.txt']
    file_path = ''.join(file_p)
    cand_summary = string.strip(linecache.getline(file_path, 2))
    summary.append(cand_summary)
    count4 = count4 + 1
As an aside, you would normally write count4 += 1. Also, it looks like count4 uses 1-based indexing, which is pretty unusual for Python.
You open and close the output file in every iteration.
Why not simply do
with open("Summary.txt", "w") as linkfile:
    while (count4 <= num_files):
        file_p = [directory, str(count4), '.txt']
        file_path = ''.join(file_p)
        cand_summary = linecache.getline(file_path, 2).strip()  # the string module is deprecated
        linkfile.write(cand_summary)
        linkfile.write("\n")
        count4 = count4 + 1
Also, linecache is probably not the right tool here since it's optimized for reading multiple lines from the same file, not the same line from multiple files.
Instead, better do
with open(file_path, "r") as infile:
    dummy = infile.readline()
    cand_summary = infile.readline().strip()
Also, if you drop the strip() method, you don't have to re-add the \n, but who knows why you have that in there. Perhaps .lstrip() would be better?
Finally, what's with the manual while loop? Why not use a for loop?
Lastly, after your comment, I understand you want to put the result in a list instead of a file. OK.
All in all:
summary = []
for count in xrange(num_files):
    file_p = [directory, str(count), '.txt']  # or count + 1, if you start at 1
    file_path = ''.join(file_p)
    with open(file_path, "r") as infile:
        dummy = infile.readline()
        cand_summary = infile.readline().strip()
        summary.append(cand_summary)
