Reading file continuously and appending new lines to list (python)

I am practicing file reading in Python. I have a text file, and I want to run a program that continuously reads this text file and appends the newly written lines to a list. To exit the program and print out the resulting list, the user should press "enter".
The code I have written so far looks like this:
import sys, select, os

data = []
i = 0

while True:
    os.system('cls' if os.name == 'nt' else 'clear')
    with open('test.txt', 'r') as f:
        for line in f:
            data.append(int(line))
    print(data)
    if sys.stdin in select.select([sys.stdin], [], [], 0)[0]:
        line_ = input()
        break
So to break out of the while loop, 'enter' should be pressed. To be fair, I just copy-pasted the solution for this from here: Exiting while loop by pressing enter without blocking. How can I improve this method?
But this code just appends all lines to my list again and again.
So, say my text file contains the lines:
1
2
3
So my list is going to look like data = [1,2,3,1,2,3,1,2,3...] and have a certain length until I press enter. When I add a line (e.g. 4) it becomes data = [1,2,3,1,2,3,1,2,3,1,2,3,4,1,2,3,4...].
So I am looking for some kind of if statement before my append command so that only the newly written lines get appended, but I can't think of something easy.
I already got some tips, i.e.
Continuously checking the file size and only reading the part between the old and the new size.
Keeping track of the line number and skipping to the not-yet-appended line in the next iteration.
At the moment, I can't think of a way to do this. I tried fiddling around with enumerate(f) and itertools.islice but can't make it work. Would appreciate some help, as I have not yet adopted the way programmers think.

Store the file position between iterations. This allows you to efficiently fast-forward the file when it is opened again:
data = []
file_position = 0

while True:
    with open('test.txt', 'r') as f:
        f.seek(file_position)        # fast-forward past content read previously
        for line in f:
            data.append(int(line))
        file_position = f.tell()     # store position at which to resume
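For completeness, a minimal sketch of how this could be combined with the question's select-based Enter check (the clear-screen and exit logic are copied from the question, are Unix-only, and are not part of this answer itself):

import sys, select, os

data = []
file_position = 0

while True:
    os.system('cls' if os.name == 'nt' else 'clear')
    with open('test.txt', 'r') as f:
        f.seek(file_position)        # skip everything read in earlier iterations
        for line in f:
            data.append(int(line))
        file_position = f.tell()     # remember where to resume next time
    print(data)
    if sys.stdin in select.select([sys.stdin], [], [], 0)[0]:
        input()                      # consume the Enter keypress
        break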

I could get it to work on Windows. First of all, exiting the while loop and continuously reading the file are two different questions. I assume that exiting the while loop is not the main problem, and because your select.select() statement doesn't work this way on Windows, I created an exit from the while loop with a try-except clause that triggers on Ctrl-C. (It's just one way of doing it.)
The second part of your question is how to continuously read the file. Don't reopen it again and again inside the while loop; open it once before the loop.
Then, as the file is changed, either a valid or an invalid line is read. I suppose invalid lines happen because the iteration over f may sometimes run before the file has been completely written (I'm not quite sure about that). Anyway, it is easy to check the line that was read. I again used a try-except clause, which catches the exception if int(line) fails.
Code:
import sys, select, os

data = []
with open('text.txt', 'r') as f:
    try:
        while True:
            os.system('cls' if os.name == 'nt' else 'clear')
            for line in f:
                try:
                    data.append(int(line))
                except ValueError:   # skip lines that are not (yet) complete integers
                    pass
            print(data)
    except KeyboardInterrupt:
        print('Quit loop')
        print(data)

Related

How do I properly read large text files in Python so I don't clog up memory?

So today, while buying BTC, I messed up and lost the decryption passphrase to the wallet that the ATM automatically sends by email.
I remember the last 4 characters of the passphrase, so I generated a wordlist and wanted to try to bruteforce my way into it. It was a 4MB file and the script checked all the possibilities with no luck. Then I realized that maybe the letters are wrong, but I still remember which numbers were in those 4 chars. Well, suddenly I have a 2GB file that gets SIGKILLed by Ubuntu.
Here is the whole code, it is very short.
#!/usr/bin/python
from zipfile import ZipFile
import sys

i = 0
found = False
with ZipFile("/home/kuskus/Desktop/wallet.zip") as zf:
    with open('/home/kuskus/Desktop/wl.txt') as wordlist:
        for line in wordlist.readlines():
            if(not found):
                try:
                    zf.extractall(pwd = str.encode(line))
                    print("password found: %s" % line)
                    found = True
                except:
                    print(i)
                    i += 1
            else: sys.exit()
I think the issue is that the text file fills up the memory, so the OS kills the script. I really don't know how I could read the file, maybe 1000 lines at a time, then clear them and do another 1000 lines. If anyone could help me I would be very grateful, thank you in advance :) Oh, and the text file has about 300 million lines, if it matters.
Usually the best thing to do is to iterate over the file directly. The file handle acts as a generator, producing lines one at a time rather than aggregating them all into a list in memory at once (as fh.readlines() does):
with open("somefile") as fh:
for line in fh:
# do something
Furthermore, file handles allow you to read specific amounts of data if you so choose:
with open("somefile") as fh:
number_of_chars = fh.read(15) # 15 is the number of characters in a StringIO style handler
while number_of_chars:
# do something with number_of_chars
number_of_chars = fh.read(15)
Or, if you want to read a specific number of lines:
with open('somefile') as fh:
    while True:
        chunk_of_lines = [fh.readline() for i in range(5)]  # this will read 5 lines at a time
        if not any(chunk_of_lines):  # readline() returns '' once the file is exhausted
            break
        # do something else here
Where fh.readline() is analogous to calling next(fh) in a for loop.
A while loop is used in the latter two examples because, once the file has been completely consumed, fh.readline() or fh.read(some_integer) returns an empty string, which is falsy and terminates the loop.
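Applied to the question's bruteforce script, a minimal sketch of the same idea (paths copied from the question; the strip() is an extra assumption that the real passphrase has no trailing newline, and the broad except mirrors the original bare except):

#!/usr/bin/python
from zipfile import ZipFile

with ZipFile("/home/kuskus/Desktop/wallet.zip") as zf:
    with open("/home/kuskus/Desktop/wl.txt") as wordlist:
        for i, line in enumerate(wordlist):    # lazy iteration: one line in memory at a time
            password = line.strip()            # drop the trailing newline before encoding
            try:
                zf.extractall(pwd=password.encode())
                print("password found: %s" % password)
                break                          # stop as soon as a password works
            except Exception:                  # wrong password: count the attempt and move on
                print(i)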

How to read lines in a file and unread them once they have been read in Python

So I am writing a program with a for loop which reads lines from a file.
The file's lines look like this:
Program: Python User: Cma Code: 1234
The program goes like this:
import time

while True:
    with open('file.txt', 'r') as fp:
        for i in fp:
            data = i.split()
            program = data[1]
            user = data[3]
            code = data[5]
            total = program + user + code
            print(total)
            file.seek(-len(i),1)
        else:
            print("Program Put to sleep!")
            time.sleep(5)
I believe my issue in this code is a logical error.
The program is supposed to keep running and check the file constantly. If there are 3 lines in the file, it should read each one of them, then 'unread' them, and when it finds there are no more lines in the file, the program goes to sleep and later resumes the loop from the newly written entries.
In the program I have coded, it keeps reading from the beginning. I had a look at some examples on this platform, but they weren't helpful, so I thought I'd ask. Cheers.
Can you try deleting the file.seek call and changing the loop inside the with statement like this:
for line in fp.readlines():
    data = line.split(" ")
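That alone still rereads the whole file every time it is reopened. A minimal sketch of one way to process only new lines, keeping the handle open and sleeping when nothing new has arrived (the field layout follows the question's sample line):

import time

with open('file.txt', 'r') as fp:
    while True:
        line = fp.readline()
        if not line:                      # no new complete line yet
            print("Program Put to sleep!")
            time.sleep(5)
            continue
        data = line.split()
        total = data[1] + data[3] + data[5]
        print(total)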

Python conditional statement based on text file string

Noob question here. I'm scheduling a cron job to run a Python script every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I'm recording the number of executions at the end of the script in a text file using a tally mark x, and opening the text file at the beginning of the script to only run if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
data = myfile.read()
finalrun = "xxxxx"
if data != finalrun:
[CURL CODE]
with open("curl-output.txt", "a") as text_file:
text_file.write("x")
text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
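For illustration, a minimal sketch of the two workarounds mentioned above, using the file name from the question:

# Option 1: open read-only just to check the count
# (raises FileNotFoundError if the file does not exist yet)
with open("curl-output.txt", "r") as myfile:
    data = myfile.read()

# Option 2: keep a+ but rewind first, since a+ positions the stream at the end of the file
with open("curl-output.txt", "a+") as myfile:
    myfile.seek(0)
    data = myfile.read()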
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle

try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0

if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
This sounds more like a job for scheduling with the at command.
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl, but you do append another 'x' to the file. On the next run, data will again not be equal to finalrun, so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
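A minimal sketch of that fix, keeping the question's a+ mode but rewinding before the read:

with open("curl-output.txt", "a+") as myfile:
    myfile.seek(0)            # a+ starts at the end of the file; rewind before reading
    data = myfile.read()

finalrun = "xxxxx"
if data != finalrun:
    # [CURL CODE] goes here
    with open("curl-output.txt", "a") as text_file:
        text_file.write("x")  # tally mark is now written only when curl actually ran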
There is probably a trailing end-of-line \n character, which means your file contains something like xx\n and not simply xx. That is probably why your condition does not work :)
EDIT
What happens if, at the Python interactive prompt, you type:
open('filename.txt', 'r').read()  # where filename.txt is the name of your file
You will be able to see whether there is a \n or not.
Try using this condition in the if clause instead:
if data.count('x') == 24
The data string may contain extraneous characters such as newlines. Check repr(data) to see whether it is actually just 24 x's.
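For example (data read as in the question; run_curl() is a hypothetical stand-in for the question's [CURL CODE], and the comparison is flipped so curl keeps running while the count is still below 24):

print(repr(data))             # reveals stray '\n' or other hidden characters
if data.count('x') < 24:      # 24 runs = one run every 2 hours for 48 hours
    run_curl()                # hypothetical stand-in for [CURL CODE]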

Check if a file is modified in Python

I am trying to create a box that tells me whether a text file has been modified or not; if it has been modified, it prints out the new text inside it. This should run in an infinite loop (the bot sleeps until the text file is modified).
I have tried this code but it doesn't work.
while True:
    tfile1 = open("most_recent_follower.txt", "r")
    SMRF1 = tfile1.readline()
    if tfile1.readline() == SMRF1:
        print(tfile1.readline())
But this is totally not working... I am new to Python, can anyone help me?
def read_file():
    with open("most_recent_follower.txt", "r") as f:
        SMRF1 = f.readlines()
    return SMRF1

initial = read_file()

while True:
    current = read_file()
    if initial != current:
        for line in current:
            if line not in initial:
                print(line)
        initial = current
Read the file in once to get its initial state. Then repeatedly re-read the file. When it changes, print out the lines that were added.
I don't know what bot you are referring to, but this code, and yours, will continuously read the file. It never seems to exit.
I might suggest copying the file to a safe duplicate location, and possibly using a diff program to determine whether the current file is different from the original copy, then printing the added lines. If you just want appended lines, you might try a utility like tail.
You can also use a library like pyinotify to trigger only when the filesystem detects that the file has been modified.
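A rough sketch of the pyinotify approach (Linux-only; the watched filename is taken from the question, and the exact calls should be checked against the pyinotify documentation):

import pyinotify

class ModifiedHandler(pyinotify.ProcessEvent):
    def process_IN_MODIFY(self, event):
        # called by the notifier whenever the watched file is written to
        print("%s was modified" % event.pathname)

wm = pyinotify.WatchManager()
wm.add_watch("most_recent_follower.txt", pyinotify.IN_MODIFY)
notifier = pyinotify.Notifier(wm, ModifiedHandler())
notifier.loop()    # blocks and dispatches events until interrupted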
This is the first result on Google for "check if a file is modified in python" so I'm gonna add an extra solution here.
If you're curious if a file is modified in the sense that its contents have changed, OR it was touched, then you can use os.stat:
import os

get_time = lambda f: os.stat(f).st_ctime
fn = 'file.name'
prev_time = get_time(fn)

while True:
    t = get_time(fn)
    if t != prev_time:
        do_stuff()
        prev_time = t

Open text file, print new lines only in python

I am opening a text file, which once created is constantly being written to, and printing any new lines out to a console, as I don't want to reprint the whole text file each time. I check whether the file has grown in size; if it has, I just print the next new lines. This is mostly working, but occasionally it gets a bit confused about where the next new line is, and new lines appear a few lines up, mixed in with the old lines.
Is there a better way to do this, below is my current code.
infile = "Null"
while not os.path.exists(self.logPath):
time.sleep(.1)
if os.path.isfile(self.logPath):
infile = codecs.open(self.logPath, encoding='utf8')
else:
raise ValueError("%s isn't a file!" % file_path)
lastSize = 0
lastLineIndex = 0
while True:
wx.Yield()
fileSize = os.path.getsize(self.logPath)
if fileSize > lastSize:
lines = infile.readlines()
newLines = 0
for line in lines[lastLineIndex:]:
newLines += 1
self.running_log.WriteText(line)
lastLineIndex += newLines
if "DBG-X: Returning 1" in line:
self.subject = "FAILED! - "
self.sendEmail(self)
break
if "DBG-X: Returning 0" in line:
self.subject = "PASSED! - "
self.sendEmail(self)
break
fileSize1 = fileSize
infile.flush()
infile.seek(0)
infile.close()
Also my application freezes whilst waiting for the text file to be created, as it takes a couple of seconds to appear, which isn't great.
Cheers.
This solution could help. You'd also have to do a bit of waiting until the file appears, using os.path.isfile and time.sleep.
Maybe you could:
open the file each time you need to read from it,
use lastSize as the argument to seek, so you jump directly to where you stopped at the last reading (a minimal sketch follows below).
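A rough sketch of that idea under simplified names (log_path stands in for self.logPath, and print for self.running_log.WriteText); it remembers f.tell() rather than the raw byte size, since text-mode seek only accepts offsets previously returned by tell():

import time

log_path = 'test.log'        # stand-in for self.logPath
last_position = 0            # file position after the previous read

while True:
    with open(log_path, 'r', encoding='utf8') as infile:
        infile.seek(last_position)       # skip everything printed already
        for line in infile:
            print(line, end='')          # or self.running_log.WriteText(line)
        last_position = infile.tell()    # remember where this read stopped
    time.sleep(0.1)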
Additional comment: I don't know if you need some protection, but I think you should not bother to test whether given filename is a file or not; just open it in a try...except block and catch problems if any.
As for the freezing of your application, you may want to use some kind of Threading, for instance: one thread, your main one, is handling the GUI, and a second one would wait for the file to be created. Once the file is created, the second thread sends signals to the GUI thread, containing the data to be displayed.
