For loop in Python: code issue

I was somehow able to write the code below (taking help from various sources):
langs = ['C', 'Java', 'Cobol', 'Python']
f1 = open('a.txt', 'r')
f2 = open('abc.txt', 'w')
for i in range(len(langs)):
    for line in f1:
        f2.write(line.replace('Frst languag', '{}'.format(langs[i])))
f1.close()
f2.close()
I don't know why the for loop is not running till the end, because every time I open the txt file only 'C' is stored in it. I want the script to run so that, at the end of its execution, the txt file contains the last value of the list (here Python).

After the first pass of your inner for loop, f1 is pointing to the end of the file. So the subsequent passes don't do anything.
The easiest fix is to move f1=open('a.txt','r') to just before for line in f1:. Then the file will be re-read for each of your languages. (Alternatively, you might be able to restructure your logic so that you can handle all of the languages at the same time in one pass of the file.)
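For instance, a minimal sketch of the first suggestion, assuming the same file names and placeholder text as in the question:

langs = ['C', 'Java', 'Cobol', 'Python']

f2 = open('abc.txt', 'w')
for lang in langs:
    # Re-open the input file so every language gets a fresh read from the start
    f1 = open('a.txt', 'r')
    for line in f1:
        f2.write(line.replace('Frst languag', '{}'.format(lang)))
    f1.close()
f2.close()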

Refresh variable when reading from a txt file

I have a file in my Python folder called data.txt, and another file read.py that tries to read text from data.txt, but when I change something in data.txt my read doesn't show anything new. What I put is below.
Something else I tried wasn't working; I found something that did read, but when I changed the file to something actually meaningful it didn't print the new text.
Can someone explain why it doesn't refresh, or what I need to do to fix it?
with open("data.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
First and foremost, strings are immutable in Python - once you call file.read(), the returned object cannot change.
That means you must re-read the file at any point where its contents may have changed.
For example
read.py
def get_contents(filepath):
    with open(filepath) as f:
        return f.read().rstrip("\n")
main.py
from read import get_contents
import time

print(get_contents("data.txt"))
time.sleep(30)
# .. change file somehow
print(get_contents("data.txt"))
Now, you could set up an infinite loop that watches the file's last modification timestamp from the OS, so you always have the latest changes, but that seems like a waste of resources unless you have a specific need for it (e.g. tailing a log file) - and there are arguably better tools for that anyway.
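If you do have such a need, a minimal polling sketch (the data.txt path and the 1-second interval are just placeholders) could look like this:

import os
import time

def watch(filepath, interval=1.0):
    last_mtime = None
    while True:
        # Re-read the file only when its modification timestamp changes
        mtime = os.path.getmtime(filepath)
        if mtime != last_mtime:
            last_mtime = mtime
            with open(filepath) as f:
                print(f.read().rstrip("\n"))
        time.sleep(interval)

# watch("data.txt")  # runs forever; stop with Ctrl+C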
It was unclear from your question whether you read the file once or multiple times, so here are the steps to check:
1. Make sure you call the read function repeatedly, with a certain interval.
2. Check that you actually save the file after modifying it.
3. Make sure there are no file usage conflicts.
A description of each step:
1. When you read a file the way you shared, it gets closed, meaning it is read only once. You need to read it multiple times if you want to see changes, so do it with some kind of interval, in another thread, with async, or whatever suits your application best (see the polling sketch after the retry example below).
2. This step is obvious: remember to hit ctrl+s.
3. It may happen that a single file is accessed by multiple processes, for example your editor and the script. To prevent errors, try the following code:
def read_file(file_name: str):
    while True:
        try:
            with open(file_name) as f:
                return f.read().rstrip("\n")
        except IOError:
            pass
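And for step 1, a rough sketch of calling the read function repeatedly with an interval (the 5-second interval and the data.txt name are arbitrary) might be:

import time

# Assumes read_file() from above is in scope
previous = None
while True:
    content = read_file("data.txt")
    if content != previous:
        print(content)
        previous = content
    time.sleep(5)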

Precurse with open() or .write()?

Is there a way to precurse a write function in Python? (I'm working with FASTA files, but any write function that works with text files should apply.)
The only way I could think of is to read the whole file in as an array, count the number of lines I want to start at, and re-write that array, from that value onward, to a text file.
I was just thinking there might be a write option or something somewhere.
I would add some code, but I'm writing it right now, and everyone on here seems to be pretty well versed and probably knows what I'm talking about. I'm an EE in the CS domain and just calling on the StackOverflow community to enlighten me.
From what I understand, you want to truncate a file from the start - i.e. remove the first n lines.
Then no - there is no way to do that without reading in the lines and ignoring them. This is what I would do:
import shutil

remove_to = 5  # Remove lines 0 to 5
try:
    with open('precurse_me.txt') as inp, open('temp.txt', 'w') as out:
        for index, line in enumerate(inp):
            if index <= remove_to:
                continue
            out.write(line)
    # If you don't want to replace the original file - delete this
    shutil.move('temp.txt', 'precurse_me.txt')
except Exception as e:
    raise e
Here I open a file for the output and then use shutil.move() to replace the input file only after the processing (the for loop) is complete. I do this so that I don't break the 'precurse_me.txt' file in case the processing fails. I wrap the whole thing in a try/except so that if anything fails it doesn't try to move the file by accident.
The key is the for loop - read the input file line by line; using the enumerate() function to count the lines as they come in.
Ignore those lines (by using continue) until the index says to not ignore the line - after that simply write each line to the out file.
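The same skip-then-write idea can also be expressed with itertools.islice, which drops the leading lines for you; this is just an alternative sketch using the same hypothetical file names:

import itertools
import shutil

skip = 6  # number of leading lines to drop (lines 0 to 5)

with open('precurse_me.txt') as inp, open('temp.txt', 'w') as out:
    # islice(inp, skip, None) yields every line from index `skip` onwards
    out.writelines(itertools.islice(inp, skip, None))

shutil.move('temp.txt', 'precurse_me.txt')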

Python: Delete certain lines from a file

I'm making a program which deletes certain lines from an existing file. It takes file1 as input (f1), looks for a certain pattern, and if it finds it, modifies the line (to make it compatible with the other file) and saves this modification in a variable 'mark'. It then opens another file f2 and searches for 'mark' in it. If it finds 'mark' in a certain line in f2, I have to delete that line and the three lines after it. The thing is that when I run it, the program deletes everything from f2, so I get an empty file as a result.
new = ''
pattern2 = '2:N:0:8'
i = 0
f1 = open('test_reverse.txt', 'r')
for line in f1:
    if pattern2 in line:
        mark = line.replace('2:N:0:8', '1:N:0:8')
f2 = open('test_OKforward2.txt', 'r')
lines = f2.readlines()
for i, line in enumerate(lines):
    if mark in lines[i]:
        e = lines[i]
        e1 = lines[i + 1]
        e2 = lines[i + 2]
        e3 = lines[i + 3]
        new = e + e1 + e2 + e3
    f3 = open('test_OKforward2.txt', 'w')
    if line != new:
        f3.write(line)
I tried with the next() function as well, but I got the same result and a 'stop iteration' error.
The thing is that when I run it, the program deletes everything from f2, so I get an empty file as a result.
Whenever you open a file for writing, everything in it is lost. You have to re-write everything you wish to preserve in the files and exclude what you wanted to delete in the first place.
Notice these lines:
f2=open('test_OKforward2.txt','r')
# ...
f3=open('test_OKforward2.txt','w')
The problem is that f3 is opening the same file as f2 for writing for every loop on the lines of file f2.
Basically, after you add lines, you re-open the file for writing, eliminating what you had previously.
First: You should remove the f3=open from within the loop iterating on each line of f2 (i.e. do this at some other location outside this loop). This is the main issue.
Second: Use a temporary file for the process instead and, at the end, rename the temporary file to the one you had.
Third: You're not closing the files. Consider using context managers. Your code would look more like this:
with open('something.txt') as f2:
    # do something with f2;
    # f2 will be automatically closed when it exits the ctx manager
Fourth: Follow the PEP-8 style standards for your code. Everyone reading your code will thank you.
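Putting the first three points together, a rough sketch of the temporary-file approach (file names taken from the question; the matching logic mirrors the original mark in line check) could look like this:

import os

pattern2 = '2:N:0:8'

# Collect the marks from the first file
marks = []
with open('test_reverse.txt') as f1:
    for line in f1:
        if pattern2 in line:
            marks.append(line.replace('2:N:0:8', '1:N:0:8'))

# Copy f2 to a temporary file, skipping each marked line and the 3 lines after it
with open('test_OKforward2.txt') as f2, open('test_OKforward2.tmp', 'w') as tmp:
    skip = 0
    for line in f2:
        if skip > 0:
            skip -= 1
            continue
        if any(mark in line for mark in marks):
            skip = 3  # also drop the next three lines
            continue
        tmp.write(line)

os.replace('test_OKforward2.tmp', 'test_OKforward2.txt')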
I got [...] a 'stop iteration' error.
This is normal; you said you were using the next() function. Iterators and next() raise StopIteration in order to signal that they cannot produce more elements from the collection being iterated and that this iteration process should stop.
Quoting the docs:
exception StopIteration
Raised by built-in function next() and an iterator's __next__() method to signal that there are no further items produced by the iterator.
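A tiny illustration of that behaviour:

it = iter(['a', 'b'])
print(next(it))  # 'a'
print(next(it))  # 'b'
next(it)         # raises StopIteration - no more items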

using for-loop on a function that opens a file

I'm very new to programming/python so I have some trouble understanding in which order different operations should be performed for optimal usage.
I wrote a script that takes a long list of words and searches different files for bits of text that contain these words, then returns the result, but it is not very fast at the moment.
What I think I first need to optimize is the code listed below.
Is there a more resource efficient way to write the following code:
ListofStuff = ["blabla", "singer", "dinger"]

def FindinFile(FindStuff):
    with open(File, encoding="utf-8") as TargetFile:
        for row in TargetFile:
            # search whole file for FindStuff and return chunkoftext as result

def EditText(result):
    # do some text editing to result
    print edited text

for key in ListofStuff:
    EditText(FindinFile(key))
Does (with open) here open the file each time I rerun the function FindinFile in the for-loop at the end? Or does (with-open) keep the file in the buffer until the script is finished?
A variable is only valid within the scope in which it was defined. TargetFile is defined in the with clause, so it ceases to exist once you exit that clause (and the function) - so yes, the file is reopened every time (unless there is some optimization going on, which is unlikely in this case).
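If reopening the file per keyword turns out to be the bottleneck, one option is to read the file once and check every line against all of the keywords in that single pass. A rough sketch reusing the names from the question (the path in File is hypothetical, and matching lines stand in for the question's chunk-of-text extraction):

ListofStuff = ["blabla", "singer", "dinger"]
File = "target.txt"  # hypothetical path

def FindAllInFile(findstuff, filename):
    # Map each search word to the lines that contain it, reading the file only once
    results = {word: [] for word in findstuff}
    with open(filename, encoding="utf-8") as target:
        for row in target:
            for word in findstuff:
                if word in row:
                    results[word].append(row)
    return results

results = FindAllInFile(ListofStuff, File)
for key in ListofStuff:
    print(results[key])  # EditText(results[key]) in the original structure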

"for line in file object" method to read files

I'm trying to find out the best way to read/process lines of a super large file.
Here I just try
for line in f:
Part of my script is as below:
o = gzip.open(file2, 'w')
LIST = []
f = gzip.open(file1, 'r')

for i, line in enumerate(f):
    if i % 4 != 3:
        LIST.append(line)
    else:
        LIST.append(line)
        b1 = [ord(x) for x in line]
        ave1 = (sum(b1) - 10) / float(len(line) - 1)
        if (ave1 < 84):
            del LIST[-4:]
output1 = o.writelines(LIST)
My file1 is around 10GB, and when I run the script the memory usage just keeps increasing, up to about 15GB, without any output. That means the computer is still trying to read the whole file into memory first, right? This really makes it no different from using readlines().
However in the post:
Different ways to read large data in python
Srika told me:
The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.
But obviously I still need to worry about large files... I'm really confused.
thx
edit:
Every 4 lines forms a kind of group in my data.
The purpose is to do some calculations on every 4th line and, based on that calculation, decide whether we need to append those 4 lines. So writing lines is my purpose.
The reason the memory keeps increasing even after you use enumerate is that you are using LIST.append(line). That basically accumulates all the lines of the file in a list, so it all ends up sitting in memory. You need to find a way to not accumulate lines like this: read, process, and move on to the next.
One more approach is to read your file in chunks (in fact, reading one line at a time qualifies: 1 chunk == 1 line), i.e. read a small part of the file, process it, then read the next chunk, and so on. I still maintain that this is the best way to read files in Python, large or small.
with open(...) as f:
    for line in f:
        <do something with line>
The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.
It looks like at the end of this function, you're taking all of the lines you've read into memory and then immediately writing them to a file. Maybe you can try this process instead:
1. Read the lines you need into memory (the first 3 lines).
2. On the 4th line, append the line and perform your calculation.
3. If your calculation is what you're looking for, flush the values in your collection to the file.
4. Regardless of what follows, create a new collection instance.
I haven't tried this out, but it could maybe look something like this:
o = gzip.open(file2, 'w')
f = gzip.open(file1, 'r')
LIST = []

for i, line in enumerate(f):
    if i % 4 != 3:
        LIST.append(line)
    else:
        LIST.append(line)
        b1 = [ord(x) for x in line]
        ave1 = (sum(b1) - 10) / float(len(line) - 1)
        # If we've found what we want, save them to the file
        if (ave1 >= 84):
            o.writelines(LIST)
        # Release the values in the list by starting a clean list to work with
        LIST = []
EDIT: As a thought though, since your file is so large, this may not be the best technique because of all the lines you would have to write to file, but it may be worth investigating regardless.
Since you add all the lines to the list LIST and only sometimes remove some of them, LIST will become longer and longer. All those lines that you store in LIST will take up memory. Don't keep all the lines around in a list if you don't want them to take up memory.
Also your script doesn't seem to produce any output anywhere, so the point of it all isn't very clear.
Ok, you know what your problem is already from the other comments/answers, but let me simply state it.
You are only reading a single line at a time into memory, but you are storing a significant portion of these in memory by appending to a list.
In order to avoid this you need to store something in the filesystem or a database (on the disk) for later look up if your algorithm is complicated enough.
From what I see, it seems you can easily write the output incrementally. I.e. you are currently using a list to store valid lines to write to output as well as temporary lines you may delete at some point. To be efficient with memory, you want to write the lines from your temporary list as soon as you know they are valid output.
In summary, use your list to store only temporary data you need to do your calculations based off of, and once you have some valid data ready for output you can simply write it to disk and delete it from your main memory (in python this would mean you should no longer have any references to it.)
If you do not use the with statement, you must close the file handles yourself:
o.close()
f.close()
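For completeness, a sketch of the grouped-lines script using with statements so both gzip handles are closed automatically. The file names are placeholders, the threshold logic follows the question, and text mode ('rt'/'wt') is used so ord() works on characters under Python 3 (the question's 'r'/'w' is the Python 2 equivalent):

import gzip

file1 = 'input.txt.gz'   # placeholder input path
file2 = 'output.txt.gz'  # placeholder output path

with gzip.open(file1, 'rt') as f, gzip.open(file2, 'wt') as o:
    group = []
    for i, line in enumerate(f):
        group.append(line)
        if i % 4 == 3:  # every 4th line closes a group
            b1 = [ord(x) for x in line]
            ave1 = (sum(b1) - 10) / float(len(line) - 1)
            if ave1 >= 84:
                o.writelines(group)
            group = []  # start a fresh group either way, keeping memory flat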
