I want to extract each method of a class and write it to a text file. The method can be composed of any length of lines and the input file can contain any number of methods. I want to run a loop that will start copying lines to the output file when it hits the first def keyword and continue until the second def. Then it will continue until all methods have been copied into individual files.
Input:
class Myclass:
    def one():
        pass
    def two(x, y):
        return x+y
Output File 1:
def one():
    pass
Output File 2:
def two(x, y):
    return x+y
Code:
with open('myfile.py', 'r') as f1:
    lines = f1.readlines()
    for line in lines:
        if line.startswith('def'):  # or if 'def' in line
            file = open('output.txt', 'w')
            file.writelines(line)
        else:
            file.writelines(line)
It's unclear what exactly your problem is or what you have actively tried so far.
Looking at the code you supplied, there are some things to point out:
Since you are iterating over lines obtained with readlines(), they already contain line breaks. Therefore, when writing to the output file, you should simply use write() instead of writelines(), unless you want duplicated line breaks.
If the desired output is each function in a different file, you should create a new file each time you find an occurrence of "def". You could simply use a counter and increment it each time to produce unique filenames.
Always make sure you are dealing with files correctly (opening and closing them, or using a with statement). In your code, file may not even be defined yet when the else branch runs (if the first lines don't start with "def"), and it is never closed.
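Putting those three fixes together, a corrected version of your loop might look like the sketch below (the input filename myfile.py and the output names method_1.txt, method_2.txt, ... are just assumptions for illustration):

```python
# Create a small sample input so the sketch is self-contained
# (in your case myfile.py already exists).
with open('myfile.py', 'w') as f:
    f.write('class Myclass:\n'
            '    def one():\n'
            '        pass\n'
            '    def two(x, y):\n'
            '        return x+y\n')

count = 0
out = None
with open('myfile.py') as f1:
    for line in f1:
        if line.lstrip().startswith('def '):
            if out:
                out.close()              # finish the previous method's file
            count += 1
            out = open(f'method_{count}.txt', 'w')
        if out:
            out.write(line)              # write() -- the line already ends with "\n"
if out:
    out.close()
```

After running this, method_1.txt holds the first def and its body, and method_2.txt the second.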
Another possible solution, based on Extract functions from python file and write them to other files (not sure if this is a duplicate, as I am new here, but it is a very similar question), would be:
Instead of reading each line of the file, you could read the entire file and then split it by "def" keyword.
Read the file to a String.
Split the string by the "def" keyword into a list (note that split() removes the delimiter, so you will need to add "def" back to each piece).
Ignore the first element, since it will be everything before the first function definition, and iterate over the remaining ones.
Write each of those Strings (as they will be the function defs you want) into a new file (you could use a counter and increment it to produce a different name for each of the files).
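The steps above can be sketched roughly like this (assuming the source file is myfile.py, that output names function_1.txt, function_2.txt, ... are acceptable, and that "def " only appears as a method keyword in the input, not inside strings or comments):

```python
# Sample input so the sketch runs on its own
# (in your case myfile.py already exists).
with open('myfile.py', 'w') as f:
    f.write('class Myclass:\n'
            '    def one():\n'
            '        pass\n'
            '    def two(x, y):\n'
            '        return x+y\n')

with open('myfile.py') as f:
    source = f.read()

parts = source.split('def ')
# parts[0] is everything before the first function definition, so skip it.
for i, part in enumerate(parts[1:], start=1):
    with open(f'function_{i}.txt', 'w') as out:
        out.write('def ' + part)        # re-attach the keyword removed by split()
```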
Follow these steps and you should achieve your goal.
Let me know if you need extra clarification.
Related
Is there a way to precurse a write function in python (I'm working with fasta files but any write function that works with text files should work)?
The only way I could think is to read the whole file in as an array and count the number of lines I want to start at and just re-write that array, at that value, to a text file.
I was just thinking there might be a write an option or something somewhere.
I would add some code, but I'm writing it right now, and everyone on here seems to be pretty well versed, and probably know what I'm talking about. I'm an EE in the CS domain and just calling on the StackOverflow community to enlighten me.
From what I understand you want to truncate a file from the start - i.e remove the first n lines.
Then no - there is no way to do that without reading in the lines and ignoring the ones you don't want. This is what I would do:
import shutil

remove_to = 5  # Remove lines 0 to 5 (inclusive)

try:
    with open('precurse_me.txt') as inp, open('temp.txt', 'w') as out:
        for index, line in enumerate(inp):
            if index <= remove_to:
                continue
            out.write(line)
    # If you don't want to replace the original file - delete this
    shutil.move('temp.txt', 'precurse_me.txt')
except Exception as e:
    raise e
Here I open a file for the output and then use shutil.move() to replace the input file only after the processing (the for loop) is complete. I do this so that I don't break the 'precurse_me.txt' file in case the processing fails. I wrap the whole thing in a try/except so that if anything fails it doesn't try to move the file by accident.
The key is the for loop - read the input file line by line; using the enumerate() function to count the lines as they come in.
Ignore those lines (by using continue) until the index says to not ignore the line - after that simply write each line to the out file.
To read a file in Python, the file must first be opened, and then a read() call is needed. Why is it that when we use a for loop to read the lines of a file, no read() call is necessary?
filename = 'pi_digits.txt'

with open(filename) as file_object:
    for line in file_object:
        print(line)
I'm used to the code below, showing the read requirement.
for line in file_object.read():
This is because the file object class has an __iter__ method built in that defines how the file behaves in an iterative statement, like a for loop.
In other words, when you write for line in file_object, Python calls the file object's __iter__ method, which yields one line of the file at a time (it does not build a list of all the lines in memory).
Python file objects define special behavior when you iterate over them, in this case with the for loop. Every time you hit the top of the loop it implicitly calls readline(). That's all there is to it.
Note that the code you are "used to" will actually iterate character by character, not line by line! That's because you will be iterating over a string (the result of the read()), and when Python iterates over strings, it goes character by character.
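A quick way to see the difference (using a throwaway file name, sample.txt, just for illustration):

```python
# Contrast iterating the file object (lines) with iterating
# the string returned by read() (characters).
with open('sample.txt', 'w') as f:
    f.write('3.14\n15\n')

with open('sample.txt') as f:
    lines = [item for item in f]          # file object: yields lines

with open('sample.txt') as f:
    chars = [item for item in f.read()]   # string: yields characters

print(lines)   # ['3.14\n', '15\n']
print(chars)   # ['3', '.', '1', '4', '\n', '1', '5', '\n']
```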
The open command in your with statement handles the reading implicitly. It returns an iterator that yields the file one record at a time (the read is hidden inside the iterator). For a text file, each line is one record.
Note that the read command in your second example reads the entire file into a string; this consumes more memory than the line-at-a-time example.
I have a file called vegetables:
carrots
apples_
cucumbers
What I want to do is open the file in python, and modify it in-place, without overwriting large portions of the file. Specifically, I want to overwrite apples_ with lettuce, such that the file would look like this:
carrots
lettuce
cucumbers
To do this, I've been told to use 'r+' mode. However, I don't know how to overwrite that line in place. Is that possible? All the solutions I am familiar with involve caching the entire file, and then overwriting the entire file, for a small amendment. Is this really the best option?
Important note: the replacement line is always the same length as the original line.
For context: I'm not really concerned with a file on vegetables. Rather, I have a textfile of about 400 lines to which I need to make revisions roughly every two minutes. I have a script to do this, but I want to do it more efficiently.
An answer that works with your example:
with open("vegetables", "r+") as t:
    data = t.read()
    t.seek(data.index("apples_"))
    t.write("lettuce")
Although it might not be worth it to complicate things like this - it's fine to just read the entire file and then overwrite the entire file; you aren't going to save much by doing something like my example.
NOTE: this only works if the replacement has exactly the same length as the original text you are replacing.
edit1: a (possibly bad) example to replace all matches:
import re

with open("test", "r+") as t:
    data = t.read()
    for m in re.finditer("apples_", data):
        t.seek(m.start())
        t.write("lettuce")
edit2: something a little more complex, using a closure so that it can check for multiple words to replace:
import re

def get_find_and_replace(f):
    """f --> a file that is open with r+ mode"""
    data = f.read()
    def find_and_replace(old, new):
        for m in re.finditer(old, data):
            f.seek(m.start())
            f.write(new)
    return find_and_replace

with open("test", "r+") as f:
    find_and_replace = get_find_and_replace(f)
    find_and_replace("apples_", "lettuce")
    #find_and_replace(...,...)
    #find_and_replace(...,...)
If I understand you correctly, fileinput.input should work, provided the string is not a substring of another:
import fileinput

for line in fileinput.input("in.txt", inplace=True):
    print(line.rstrip().replace("apples_", "lettuce"))
With inplace=True, print() actually writes to the file in place; it does not print the line to the screen.
You can also check for multiple words to replace in one pass:
old = "apples_"

for line in fileinput.input("in.txt", inplace=True):
    if line.rstrip() == old:
        print(line.rstrip().replace(old, "lettuce"))
    elif ....
    elif ....
    else:
        print(line.rstrip())
I am simply iterating through an external file (which contains a phrase) and want to see if a line exists which has the word 'Dad' in it. If I find it, I want to replace it with 'Mum'. Here is the program I've built... but I am not sure why it isn't working!
message_file = open('test.txt', 'w')
message_file.write('Where\n')
message_file.write('is\n')
message_file.write('Dad\n')
message_file.close()

message_temp_file = open('testTEMP.txt', 'w')
message_file = open('test.txt', 'r')

for line in message_file:
    if line == 'Dad':  # look for the word
        message_temp_file.write('Mum')  # replace it with mum in temp file
    else:
        message_temp_file.write(line)  # else, just write the word

message_file.close()
message_temp_file.close()

import os
os.remove('test.txt')
os.rename('testTEMP.txt', 'test.txt')
This should be so simple...it's annoyed me! Thanks.
You don't have any lines that are "Dad". You have a line that is "Dad\n", but no "Dad". In addition, since you've called message_file.read(), the cursor is at the end of the file, so for line in message_file will stop immediately. You should call message_file.seek(0) just before your for loop.
print(message_file.read())
message_file.seek(0)

for line in message_file:
    if line.strip() == "Dad":
        ...
That should put the cursor back at the beginning of the file, and strip out the newline and get you what you need.
Note that this exercise is a great example of how not to do things in general! The better implementation would have been:
in_ = message_file.read()
out = in_.replace("Dad","Mum")
message_temp_file.write(out)
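Put together, a complete runnable version of that approach looks like this (recreating test.txt first so the sketch is self-contained, and skipping the temp file entirely since we rewrite in one go):

```python
# Recreate the input file from the question.
with open('test.txt', 'w') as f:
    f.write('Where\nis\nDad\n')

# Read everything, replace, and write the result back.
with open('test.txt') as f:
    text = f.read()

with open('test.txt', 'w') as f:
    f.write(text.replace('Dad', 'Mum'))
```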
print(message_file.read())
Here you have already read the whole file, so nothing is left for the for loop to check.
A file object always remembers where it stopped to read/write the last time you accessed it.
So if you call print(message_file.readline()), the first line of the file is read and printed. Next time you call the same command, the second line is read and printed and so on until you reach the end of the file. By using print(message_file.read()) you have read the whole file and any further call of read or readline will give you nothing
You can get the current position with message_file.tell() and set it to a certain value with message_file.seek(value), or simply reopen the file.
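A tiny demonstration of that position bookkeeping (using a throwaway file name, pos_demo.txt, for illustration):

```python
# A file object remembers where it stopped reading;
# seek(0) rewinds it to the beginning.
with open('pos_demo.txt', 'w') as f:
    f.write('first\nsecond\n')

f = open('pos_demo.txt')
first = f.readline()   # 'first\n'
pos = f.tell()         # cursor now sits after the first line
f.seek(0)              # rewind to the start
again = f.readline()   # 'first\n' once more
f.close()
```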
The problem most likely is due to the fact that your conditional will only match the string "Dad", when the string is actually "Dad\n". You could either update your conditional to:
if line == "Dad\n":
OR
if "Dad" in line:
Lastly, you also read the entire file when you call print(message_file.read()). You either need to remove that line, or you need to put a call to message_file.seek(0) in order for the loop that follows to actually do anything.
I'm very new to programming/python so I have some trouble understanding in which order different operations should be performed for optimal usage.
I wrote a script that takes a long list of words and searches different files for bits of text that contain these words and returns the result but it is not very fast at the moment.
What I think I first need to optimize is the code listed below.
Is there a more resource efficient way to write the following code:
ListofStuff = ["blabla", "singer", "dinger"]

def FindinFile(FindStuff):
    with open(File, encoding="utf-8") as TargetFile:
        for row in TargetFile:
            # search whole file for FindStuff and return chunkoftext as result

def EditText(result):
    # do some text editing to result
    print edited text

for key in ListofStuff:
    EditText(FindinFile(key))
Does (with open) here open the file each time I rerun the function FindinFile in the for-loop at the end? Or does (with-open) keep the file in the buffer until the script is finished?
A variable is valid only within the scope in which it was defined. TargetFile is defined in a with clause, so it ceases to exist once you exit that clause (and the function) - so yes, the file is reopened every time you call FindinFile (unless there's some optimization, which is unlikely in this case).
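If reopening the file per keyword becomes a bottleneck, one common pattern is to invert the loops: open the file once and test every keyword against each line in a single pass. A sketch reusing the names from the question (the sample file target.txt and the matches dictionary are assumptions for illustration):

```python
ListofStuff = ["blabla", "singer", "dinger"]

# Sample data so the sketch runs on its own.
with open('target.txt', 'w') as f:
    f.write('the singer sang\nnothing here\na dinger rang\n')

# One pass over the file, checking all keywords per line.
matches = {key: [] for key in ListofStuff}
with open('target.txt', encoding='utf-8') as TargetFile:
    for row in TargetFile:
        for key in ListofStuff:
            if key in row:
                matches[key].append(row.rstrip('\n'))

print(matches['singer'])   # ['the singer sang']
```

This reads the file exactly once no matter how long ListofStuff is.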