using for-loop on a function that opens a file - python

I'm very new to programming/Python, so I have some trouble understanding in which order different operations should be performed for the best performance.
I wrote a script that takes a long list of words, searches different files for bits of text that contain these words, and returns the result, but it is not very fast at the moment.
What I think I first need to optimize is the code listed below.
Is there a more resource efficient way to write the following code:
ListofStuff = ["blabla", "singer", "dinger"]

def FindinFile(FindStuff):
    with open(File, encoding="utf-8") as TargetFile:
        for row in TargetFile:
            # search whole file for FindStuff and return chunkoftext as result

def EditText(result):
    # do some text editing to result
    # print the edited text

for key in ListofStuff:
    EditText(FindinFile(key))
Does with open here open the file each time I rerun the function FindinFile in the for loop at the end? Or does with open keep the file in a buffer until the script is finished?

A variable is only valid in the scope in which it was defined. TargetFile was defined in a with block inside the function, so it ceases to exist once you exit that block (and the function). So yes, the file is reopened every time you call FindinFile (unless there's some optimization going on, which is unlikely in this case).
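If the word list is long, a more efficient structure is to open and scan the file once, checking every row against all the words in a single pass, instead of re-reading the file for each word. A minimal sketch, assuming each matching row is itself the "chunk of text" you want (the question only hints at the extraction and editing logic, so those parts are placeholders, and the path is made up):

ListofStuff = ["blabla", "singer", "dinger"]

def find_all_in_file(path, words):
    # Scan the file once and collect the rows containing each word
    hits = {word: [] for word in words}
    with open(path, encoding="utf-8") as target_file:
        for row in target_file:
            for word in words:
                if word in row:
                    hits[word].append(row)
    return hits

# "somefile.txt" is a hypothetical path; the question calls this File
for word, rows in find_all_in_file("somefile.txt", ListofStuff).items():
    for row in rows:
        print(row)  # stand-in for the EditText() step

This way the file is read from disk once, no matter how long ListofStuff gets.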

Refresh variable when reading from a txt file

I have a file in my Python folder called data.txt, and another file, read.py, that tries to read text from data.txt. But when I change something in data.txt, my read doesn't show anything new.
Something else I tried wasn't working, and then I found something that did read the file, but when I changed the file's contents to something actually meaningful it didn't print the new text.
Can someone explain why it doesn't refresh, or what I need to do to fix it?
with open("data.txt") as f:
file_content = f.read().rstrip("\n")
print(file_content)
First and foremost, strings are immutable in Python: once file.read() has returned, that string object will never change.
That being said, you must re-read the file at any point where the file contents may have changed.
For example
read.py
def get_contents(filepath):
    with open(filepath) as f:
        return f.read().rstrip("\n")
main.py
from read import get_contents
import time

print(get_contents("data.txt"))
time.sleep(30)
# .. change file somehow
print(get_contents("data.txt"))
Now, you could set up an infinite loop that watches the file's last-modification timestamp from the OS, and then you would always have the latest changes, but that seems like a waste of resources unless you have a specific need for it (e.g. tailing a log file), and there are arguably better tools for that anyway.
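For completeness, a rough sketch of that watching idea, polling the file's modification time (the one-second interval and the print are my own choices, not from the question):

import os
import time

def watch(filepath, interval=1.0):
    # Re-read and print the file whenever its modification time changes
    last_mtime = None
    while True:
        mtime = os.path.getmtime(filepath)
        if mtime != last_mtime:
            last_mtime = mtime
            with open(filepath) as f:
                print(f.read().rstrip("\n"))
        time.sleep(interval)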
It was unclear from your question whether you read the file once or multiple times. So here are the steps to take:
Make sure you call the read function repeatedly with a certain interval
Check if you actually save file after modification
Make sure there are no file usage conflicts
So here is a description of each step:
When you read a file the way you showed, it gets closed afterwards, meaning it is read only once. You need to read it multiple times if you want to see changes, so do so at some interval, in another thread or async or whatever suits your application best.
This step is obvious: remember to hit Ctrl+S.
It may happen that a single file is accessed by multiple processes at once, for example your editor and the script. To prevent errors from that, try the following code:
def read_file(file_name: str):
    while True:
        try:
            with open(file_name) as f:
                return f.read().rstrip("\n")
        except IOError:
            pass

how to copy python method text to text files

I want to extract each method of a class and write it to a text file. A method can be any number of lines long, and the input file can contain any number of methods. I want to run a loop that starts copying lines to the output file when it hits the first def keyword and continues until the second def, and so on until all methods have been copied into individual files.
Input:
class Myclass:
    def one():
        pass
    def two(x, y):
        return x+y
Output File 1:
def one():
    pass
Output File 2:
def two(x, y):
    return x+y
Code:
with open('myfile.py', 'r') as f1:
    lines = f1.readlines()

for line in lines:
    if line.startswith('def'):  # or: if 'def' in line
        file = open('output.txt', 'w')
        file.writelines(line)
    else:
        file.writelines(line)
It's unclear what exactly your problem is or what you have actively tried so far.
Looking at the code you supplied, there are some things to point out:
Since you are iterating over lines obtained with readlines(), they already contain line breaks. Therefore, when writing to the file, you should simply use write() instead of writelines(), unless you want duplicated line breaks.
If the desired output is each function in a different file, you should create a new file each time you find an occurrence of "def". You could simply use a counter and increment it each time to create unique filenames.
Always make sure you are dealing with files correctly (opening and closing them, or using a with statement). In your code, there is no guarantee that file is open (or even defined) when your else branch runs.
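Putting those three points together, a corrected version might look something like this (a sketch; it assumes, as in your example, that everything you want starts at the first def, and that output1.txt, output2.txt, ... are acceptable file names):

with open('myfile.py', 'r') as f1:
    lines = f1.readlines()

out = None
count = 0
for line in lines:
    if line.lstrip().startswith('def '):
        if out:                # close the previous method's file
            out.close()
        count += 1
        out = open('output{}.txt'.format(count), 'w')
    if out:                    # skip anything before the first def
        out.write(line)
if out:
    out.close()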
Another possible solution, based on Extract functions from python file and write them to other files (not sure if it's a duplicate, as I am new here, but it's a very similar question), would be:
Instead of reading each line of the file, you could read the entire file and then split it by "def" keyword.
Read the file to a String.
Split the string by "def" into a list (make sure you don't lose the "def" keywords themselves in the process).
Ignore the first element, since it will be everything before the first function definition, and iterate over the remaining ones.
Write each of those Strings (as they will be the function defs you want) into a new file (you could use a counter and increment it to produce a different name for each of the files).
Follow these steps and you should achieve your goal; a rough sketch is shown below.
Let me know if you need extra clarification.
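Here is that split-based approach, with the same caveats (it assumes the only occurrences of "def " in the file are the method definitions):

with open('myfile.py') as f:
    source = f.read()

parts = source.split('def ')
# parts[0] is everything before the first def; skip it
for i, part in enumerate(parts[1:], start=1):
    with open('output{}.txt'.format(i), 'w') as out:
        out.write('def ' + part)   # put the removed keyword back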

Creating a program which counts words number in a row of a text file (Python)

I am trying to create a program which takes an input file, counts the number of words in each row, and writes each count, as a string, to another output file. I managed to develop this code:
in_file = "our_input.txt"
out_file = "output.txt"
f = open(in_file)
g = open(out_file, "w")
for line in f:
    if line == "\n":
        g.write("0\n")
    else:
        g.write(str(line.count(" ") + 1) + "\n")
Now, this works, but the problem is that only part of the output appears: if my input file has 8000 lines, only the first 6800 or so are written to the output; with 6000 input lines, the output is cut short in roughly the same proportion (all numbers are rounded).
I tried writing another program, which splits each line into a list and counts the list's length, but the problem remains exactly the same.
Any idea what could cause this?
You need to close each file after you're done with it. The safest way to do this is by using the with statement:
with open(in_file) as f, open(out_file, "w") as g:
    for line in f:
        if line == "\n":
            g.write("0\n")
        else:
            g.write(str(line.count(" ") + 1) + "\n")
When reaching the end of a with block, all files you opened in the with line will be closed.
The reason for the behavior you see is that for performance reasons, reading and writing to/from files is buffered. Because of the way hard drives are constructed, data is read/written in blocks rather than in individual bytes - so even if you attempt to read/write a single byte, you have to read/write an entire block. Therefore, most programming languages' built-in file IO functions actually read (at least) one block at a time into memory and feed you data from that in-memory block until it needs to read another block. Similarly, writing is performed by actually writing into a memory block first, and only writing the block to disk when it is full. If you don't close the file writer, whatever is in the last in-memory block won't be written.
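A quick illustration of that buffering (the file name is made up); flushing or closing the file is what actually pushes the buffered data to disk:

g = open("output.txt", "w")
g.write("some line\n")
# At this point the line may still sit in the in-memory buffer,
# so another program reading output.txt might not see it yet.
g.flush()   # force the buffered data out to the OS
g.close()   # also flushes, and releases the file handle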

For loop in python: Code issue

I somehow managed to write the code below (taking help from various sources):
langs = ['C', 'Java', 'Cobol', 'Python']
f1 = open('a.txt', 'r')
f2 = open('abc.txt', 'w')
for i in range(len(langs)):
    for line in f1:
        f2.write(line.replace('Frst languag', '{}'.format(langs[i])))
f1.close()
f2.close()
I don't know why the for loop doesn't run to the end: every time I open the output txt, only 'C' has been stored in it. I want the script to run so that at the end of its execution the txt contains the last value of the list (here, Python).
After the first pass of your inner for loop, f1 is pointing to the end of the file. So the subsequent passes don't do anything.
The easiest fix is to move f1=open('a.txt','r') to just before for line in f1:. Then the file will be re-read for each of your languages. (Alternatively, you might be able to restructure your logic so that you can handle all of the languages at the same time in one pass of the file.)
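A minimal sketch of that easiest fix, also using with so both files are closed automatically; note the output file then contains one full copy of a.txt per language, with the Python version last:

langs = ['C', 'Java', 'Cobol', 'Python']

with open('abc.txt', 'w') as f2:
    for lang in langs:
        with open('a.txt', 'r') as f1:   # re-open so each pass starts at the top
            for line in f1:
                f2.write(line.replace('Frst languag', lang))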

"for line in file object" method to read files

I'm trying to find out the best way to read/process lines for super large file.
Here I just try
for line in f:
Part of my script is as below:
o = gzip.open(file2, 'w')
LIST = []
f = gzip.open(file1, 'r')
for i, line in enumerate(f):
    if i % 4 != 3:
        LIST.append(line)
    else:
        LIST.append(line)
        b1 = [ord(x) for x in line]
        ave1 = (sum(b1) - 10) / float(len(line) - 1)
        if ave1 < 84:
            del LIST[-4:]
output1 = o.writelines(LIST)
My file1 is around 10GB, and when I run the script the memory usage keeps increasing to something like 15GB without any output. Does that mean the computer is still trying to read the whole file into memory first? If so, this really is no different from using readlines().
However in the post:
Different ways to read large data in python
Srika told me:
The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.
But obviously I still need to worry large files..I'm really confused.
thx
edit:
Every 4 lines form a kind of group in my data.
The purpose is to do some calculations on every 4th line and, based on that calculation, decide whether we need to append those 4 lines. So writing lines is my purpose.
The reason the memory keeps increasing even though you use enumerate is that you are using LIST.append(line). That basically accumulates all the lines of the file in a list, and obviously it's all sitting in memory. You need to find a way not to accumulate lines like this: read, process, and move on to the next.
Another way you could do this is to read your file in chunks (in fact, reading 1 line at a time qualifies: 1 chunk == 1 line), i.e. read a small part of the file, process it, then read the next chunk, and so on. I still maintain that this is the best way to read files in Python, large or small.
with open(...) as f:
    for line in f:
        <do something with line>
The with statement handles opening and closing the file, including if an exception is raised in the inner block. The for line in f treats the file object f as an iterable, which automatically uses buffered IO and memory management so you don't have to worry about large files.
It looks like at the end of this function, you're taking all of the lines you've read into memory, and then immediately writing them to a file. Maybe you can try this process:
Read the lines you need into memory (the first 3 lines).
On the 4th line, append the line & perform your calculation.
If your calculation is what you're looking for, flush the values in your collection to the file.
Regardless of what follows, create a new collection instance.
I haven't tried this out, but it could maybe look something like this:
o=gzip.open(file2,'w')
f=gzip.open(file1,'r'):
LIST=[]
for i,line in enumerate(f):
if i % 4 != 3:
LIST.append(line)
else:
LIST.append(line)
b1 = [ord(x) for x in line]
ave1 = (sum(b1) - 10) / float(len(line) - 1
# If we've found what we want, save them to the file
if (ave1 >= 84):
o.writelines(LIST)
# Release the values in the list by starting a clean list to work with
LIST = []
EDIT: As a thought, though, since your file is so large, this may not be the best technique because of all the lines you would have to write to the file, but it may be worth investigating regardless.
Since you add all the lines to the list LIST and only sometimes remove some lines from it, LIST becomes longer and longer. All the lines that you store in LIST take up memory. Don't keep lines around in a list if you don't want them to take up memory.
Also your script doesn't seem to produce any output anywhere, so the point of it all isn't very clear.
Ok, you know what your problem is already from the other comments/answers, but let me simply state it.
You are only reading a single line at a time into memory, but you are storing a significant portion of these in memory by appending to a list.
In order to avoid this you need to store something in the filesystem or a database (on the disk) for later look up if your algorithm is complicated enough.
From what I can see, it seems you can easily write the output incrementally. I.e., you are currently using a single list to store both valid lines to write to output and temporary lines you may delete at some point. To be efficient with memory, you want to write the lines from your temporary list as soon as you know they are valid output.
In summary, use your list to store only temporary data you need to do your calculations based off of, and once you have some valid data ready for output you can simply write it to disk and delete it from your main memory (in python this would mean you should no longer have any references to it.)
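Pulling those suggestions together, here is a sketch that never holds more than 4 lines in memory (it keeps the question's Python 2 style, its file1/file2 names, and its threshold of 84, and assumes every group is exactly 4 lines):

import gzip
from itertools import islice

with gzip.open(file1, 'r') as f, gzip.open(file2, 'w') as o:
    while True:
        group = list(islice(f, 4))   # read the next 4-line group
        if len(group) < 4:
            break                    # end of file (a trailing partial group is dropped)
        line = group[3]              # the 4th line decides the group's fate
        ave1 = (sum(ord(x) for x in line) - 10) / float(len(line) - 1)
        if ave1 >= 84:               # keep the group only if it passes the threshold
            o.writelines(group)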
If you do not use the with statement, you must close the file handles yourself:
o.close()
f.close()
