How can I make a python script change itself?
To boil it down, I would like to have a Python script (run.py) like this:
a = 0
b = 1
print a + b
# do something here such that the first line of this script reads a = 1
Such that the next time the script is run it would look like
a = 1
b = 1
print a + b
# do something here such that the first line of this script reads a = 2
Is this in any way possible? The script might use external resources; however, everything should work by just running the one run.py file.
EDIT:
It may not have been clear enough, but the script should update itself, not any other file. Sure, once you allow for a simple configuration file next to the script, this task is trivial.
For example (changing the value of a each time it's run):
a = 0
b = 1
print a + b
with open(__file__, 'r') as f:
    lines = f.read().split('\n')

val = int(lines[0].split(' = ')[-1])
new_line = 'a = {}'.format(val + 1)
new_file = '\n'.join([new_line] + lines[1:])

with open(__file__, 'w') as f:
    f.write(new_file)
What you're asking for would require you to manipulate files at the file-system level; basically, you'd read the current file in, modify it, overwrite it, and reload the current module. I played with this briefly because I was curious, but I ran into file-locking and file-permission issues. Those are probably solvable, but I suspect that this isn't really what you want here.
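If you do go down that road anyway, one way to reduce (though not eliminate) the risk of ending up with a partially written script is to write the new version to a temporary file in the same directory and then rename it over the original. A minimal sketch of that idea, not a fix for OS-level locking:

import os
import tempfile

# Sketch: rewrite this script atomically via a temp file.
# os.rename is atomic on POSIX; on Windows it fails if the target exists.
with open(__file__, 'r') as f:
    lines = f.read().split('\n')

val = int(lines[0].split(' = ')[-1])
lines[0] = 'a = {}'.format(val + 1)

dir_name = os.path.dirname(os.path.abspath(__file__))
fd, tmp_path = tempfile.mkstemp(dir=dir_name)
with os.fdopen(fd, 'w') as tmp:
    tmp.write('\n'.join(lines))
os.rename(tmp_path, __file__)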
First: realize that it's generally a good idea to maintain a separation between code and data. There are exceptions to this, but for most purposes, you'll want to make the parts of your program that can change at runtime read their configuration from a file, and write changes to that same file.
Second: idiomatically, many Python projects use YAML for configuration.
Here's a simple script that uses the yaml library to read from a file called 'config.yaml', and increments the value of 'a' each time the program runs:
#!/usr/bin/python
import yaml

config_vals = {}
with open("config.yaml", "r") as cr:
    config_vals = yaml.safe_load(cr)  # safe_load avoids executing arbitrary YAML tags

a = config_vals['a']
b = config_vals['b']
print a + b

config_vals['a'] = a + 1
with open("config.yaml", "w") as cw:
    yaml.dump(config_vals, cw, default_flow_style=True)
The runtime output looks like this:
$ ./run.py
3
$ ./run.py
4
$ ./run.py
5
The initial YAML configuration file looks like this:
a: 1
b: 2
Make a file a.txt that contains one character on one line:
0
Then in your script, open that file and retrieve the value, then immediately change it:
with open('a.txt') as f:
    a = int(f.read())
with open('a.txt', 'w') as output:
    output.write(str(a + 1))
b = 1
print a + b
On the first run of the program, a will be 0, and it will change the file to contain a 1. On subsequent runs, a will continue to be incremented by 1 each time.
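A slightly more defensive sketch of the same idea, assuming the counter file may not exist yet on the very first run:

import os

# Create the counter file on the first run so the open() below cannot fail
if not os.path.exists('a.txt'):
    with open('a.txt', 'w') as f:
        f.write('0')

with open('a.txt') as f:
    a = int(f.read())
with open('a.txt', 'w') as output:
    output.write(str(a + 1))

b = 1
print a + b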
Gerrat's code, but modified:
# some code here
a = 0
b = 1
print(a + b)

applyLine = 1  # which line to change (line 1 = 0, line 2 = 1)
with open(__file__, 'r') as f:
    lines = f.read().split('\n')  # make each line a str in a list called 'lines'

val = int(lines[applyLine].split(' = ')[-1])  # get the int after ' = ' on the applied line
new_line = 'a = {}'.format(val + 1)  # generate the new line
lines[applyLine] = new_line  # update 'lines' with the new line
write = "\n".join(lines)  # build the text to rewrite and store it in 'write' as a str
with open(__file__, 'w') as f:
    f.write(write)  # update the code
I want to extract the text between {textblock_content} and {/textblock_content}.
With the script below, only the first line of the introtext.txt file gets extracted and written to a newly created text file. I don't know why the script does not also extract the other lines of introtext.txt.
f = open("introtext.txt")
r = open("textcontent.txt", "w")
for l in f.readlines():
if "{textblock_content}" in l:
pos_text_begin = l.find("{textblock_content}") + 19
pos_text_end = l.find("{/textblock_content}")
text = l[pos_text_begin:pos_text_end]
r.write(text)
f.close()
r.close()
How to solve this problem?
Your code is actually working fine, assuming the begin and end markers are in the same line. But I think this is not what you dreamed of: you can't read multiple blocks in one line, and you can't read a block that starts and ends on different lines.
First of all, take a look at the object returned by the open function. You can use its read method to access the whole text. Also take a look at with statements; they make working with files easier and safer. To rewrite your code so it reads everything between {textblock_content} and {/textblock_content}, we could write something like this:
def get_all_tags_content(
    text: str,
    tag_begin: str = "{textblock_content}",
    tag_end: str = "{/textblock_content}"
) -> list[str]:
    useful_text = text
    ans = []
    # Heavy cycle, needs some optimization:
    # works in O(len(text) ** 2); we can do better
    while tag_begin in useful_text:
        useful_text = useful_text.split(tag_begin, 1)[1]
        if tag_end not in useful_text:
            break
        block_content, useful_text = useful_text.split(tag_end, 1)
        ans.append(block_content)
    return ans

with open("introtext.txt", "r") as f:
    with open("textcontent.txt", "w+") as r:
        r.write(str(get_all_tags_content(f.read())))
To make this function efficient enough for really big files, note that this implementation copies the remaining text every time a content block appears; that is unnecessary and slows the program down. (Imagine millions of lines, each containing {textblock_content}"hello world"{/textblock_content}: for every line we would copy the whole remaining text to continue.) We can walk the text with a plain index instead of copying. Try to solve it yourself first; one possible sketch follows.
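A minimal sketch of that index-based scan (the function name get_all_tags_content_fast is mine, not from the original answer); it runs in O(len(text)):

def get_all_tags_content_fast(
    text: str,
    tag_begin: str = "{textblock_content}",
    tag_end: str = "{/textblock_content}"
) -> list[str]:
    ans = []
    pos = 0
    # Advance an index through the text instead of re-copying the remainder
    while True:
        start = text.find(tag_begin, pos)
        if start == -1:
            break
        start += len(tag_begin)
        end = text.find(tag_end, start)
        if end == -1:
            break
        ans.append(text[start:end])
        pos = end + len(tag_end)
    return ans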
When you call file.readlines(), the file pointer reaches the end of the file. Any further call returns an empty list, so if you change your code to something like one of the snippets below, it should work properly:
f = open("introtext.txt")
r = open("textcontent.txt", "w")
f_lines = f.readlines()
for l in f_lines:
if "{textblock_content}" in l:
pos_text_begin = l.find("{textblock_content}") + 19
pos_text_end = l.find("{/textblock_content}")
text = l[pos_text_begin:pos_text_end]
r.write(text)
f.close()
r.close()
Also, you can implement it with a with context manager, as in the snippet below:
with open("textcontent.txt", "w") as r:
with open("introtext.txt") as f:
for line in f:
if "{textblock_content}" in l:
pos_text_begin = l.find("{textblock_content}") + 19
pos_text_end = l.find("{/textblock_content}")
text = l[pos_text_begin:pos_text_end]
r.write(text)
I have a huge text file that I need to split wherever a line contains only the value 'EKYC'. However, when other values with a similar pattern show up, my script fails.
I am new to Python and it is wearing me out.
import sys;
import os;

MASTER_TEXT_FILE=sys.argv[1];
OUTPUT_FILE=sys.argv[2];

L = file(MASTER_TEXT_FILE, "r").read().strip().split("EKYC")
i = 0
for l in L:
    i = i + 1
    f = file(OUTPUT_FILE+"-%d.ekyc" % i, "w")
    print >>f, "EKYC" + l
The script breaks when there is EKYCSMRT or EKYCVDA or EKYCTIGO, so how can I guard against splitting at those points?
This is the content of all of the messages:
EKYC
WIK 12
EKYC
WIK 12
EKYCTIGO
EKYC
WIK 13
TTL
EKYCVD
EKYC
WIK 14
TTL D
Thanks for the assistance.
If possible, you should avoid reading large files into memory all at once. Instead, stream chunks of them at a time.
The sensible chunks of text files are usually lines. This can be done with .readline(), but simply iterating over the file yields its lines too.
After reading a line (which includes the newline), you can .write() it directly to the current output file.
import sys

master_filename = sys.argv[1]
output_filebase = sys.argv[2]

output = None
output_number = 0

for line in open(master_filename):
    if line.strip() == 'EKYC':
        if output is not None:
            output.close()
            output = None
    else:
        if output is None:
            output_number += 1
            output_filename = '%s-%d.ekyc' % (output_filebase, output_number)
            output = open(output_filename, 'w')
        output.write(line)

if output is not None:
    output.close()
The output file is closed and reset upon encountering 'EKYC' on its own line.
Here, you'll notice that the output file isn't (re)opened until right before there is a line to write to it: this avoids creating an empty output file in case there are no further lines to write to it. You'll have to re-order this slightly if you want the 'EKYC' line to appear in the output file also.
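For instance, a sketch of one such re-ordering (untested against your full data), where each marker line starts a new output file and is written into it:

import sys

master_filename = sys.argv[1]
output_filebase = sys.argv[2]

output = None
output_number = 0

for line in open(master_filename):
    if line.strip() == 'EKYC':
        # a marker line starts a new output file and is kept in it
        if output is not None:
            output.close()
        output_number += 1
        output = open('%s-%d.ekyc' % (output_filebase, output_number), 'w')
    if output is not None:
        output.write(line)

if output is not None:
    output.close()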
Based on your sample input file, you need to split on '\nEKYC\n':
#!/usr/bin/env python
import sys

MASTER_TEXT_FILE = sys.argv[1]
OUTPUT_FILE = sys.argv[2]

with open(MASTER_TEXT_FILE) as f:
    fdata = f.read()

i = 0
for subset in fdata.split('\nEKYC\n'):
    i += 1
    with open(OUTPUT_FILE + "-%d.ekyc" % i, 'w') as output:
        output.write(subset)
Other comments:
Python doesn't use ;.
Your original code wasn't using os.
It's recommended to use with open(<filename>, <mode>) as f: ... since it handles possible errors and closes the file afterward.
I have the following code to compare two files. I would like this program to run when I point it to files as big as 4 or 5 MB. When I do that, the prompt cursor in the Python console just blinks, and no output is shown. Once, I ran it for the whole night and the next morning it was still blinking. What can I change in this code?
import difflib
file1 = open('/home/michel/Documents/first.csv', 'r')
file2 = open('/home/michel/Documents/second.csv', 'r')
diff = difflib.ndiff(file1.readlines(), file2.readlines())
delta = ''.join(diff)
print delta
If you use a Linux-based system, you can call the external diff command and use its result. I tried it on two files of 14 MB and 9.3 MB; it took 1.3 seconds.
real 0m1.295s
user 0m0.056s
sys 0m0.192s
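If you want to stay inside Python while still delegating to the external tool, a minimal sketch using the standard subprocess module might look like this (the file names are placeholders):

import subprocess

# Run the external diff and capture its output. Note that diff exits
# with status 1 when the files differ, so a non-zero code is expected.
proc = subprocess.Popen(['diff', 'first.csv', 'second.csv'],
                        stdout=subprocess.PIPE)
delta, _ = proc.communicate()
print delta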
When I tried to use difflib your way, I had the same issue, because for big files difflib buffers the whole file in memory and then compares them. As a solution, you can compare the two files piecewise. Here I am doing it for every 100 lines.
import difflib

file1 = open('1.csv', 'r')
file2 = open('2.csv', 'r')

lines_file1 = []
lines_file2 = []

# i: line number; line1, line2: the corresponding lines of the two files
for i, (line1, line2) in enumerate(zip(file1, file2)):
    lines_file1.append(line1)
    lines_file2.append(line2)
    # every 100 lines, show the diff for that chunk and reset the buffers
    if i % 100 == 99:
        diff = difflib.ndiff(lines_file1, lines_file2)
        print ''.join(diff)
        lines_file1 = []
        lines_file2 = []

# show the diff for any lines left over
diff = difflib.ndiff(lines_file1, lines_file2)
print ''.join(diff)

file1.close()
file2.close()
Hope it helps.
I am currently in some trouble regarding Python and reading files. I have to open a file in a while loop and do some stuff with the values of the file. The results are written into a new file. This new file is then read in the next run of the while loop. But in this second run I get no values out of this file... Here is a code snippet that hopefully clarifies what I mean.
while convergence == 0:
    run += 1
    prevrun = run - 1
    if os.path.isfile("./Output/temp/EmissionMat%d.txt" % prevrun) == True:
        matfile = open("./Output/temp/EmissionMat%d.txt" % prevrun, "r")
        EmissionMat = Aux_Functions.EmissionMat(matfile)
        matfile.close()
    else:
        matfile = open("./Input/EmissionMat.txt", "r")
        EmissionMat = Aux_Functions.EmissionMat(matfile)
        matfile.close()
    # now some valid operations, which produce a matrix
    emissionmat_file = open("./output/temp/EmissionMat%d.txt" % run, "w")
    emissionmat_file.flush()
    emissionmat_file.write(str(matrix))
    emissionmat_file.close()
Solved it!
matfile.seek(0)
This resets the pointer to the beginning of the file and allows me to read the file in the next run correctly.
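A minimal illustration of why the seek is needed when the same file object is read after writing (demo.txt is just a throwaway file):

f = open('demo.txt', 'w+')
f.write('hello\n')
f.seek(0)        # rewind: without this, read() would start at the end and return ''
print f.read()   # prints 'hello'
f.close()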
Why write to a file and then read it back? Moreover, you use flush, so you are doing potentially long I/O. I would do:
with open(originalpath) as f:
    mat = f.read()

while condition:
    run += 1
    write_mat_run(mat, run)
    mat = func(mat)
write_mat_run may be done in another thread. You should check for I/O exceptions.
BTW, this will probably solve your bug, or at least make it clear.
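For completeness, write_mat_run and func are left abstract in the answer above; a hypothetical sketch of what they might look like in this setting:

def write_mat_run(mat, run):
    # hypothetical: persist the matrix produced by this run
    with open("./Output/temp/EmissionMat%d.txt" % run, "w") as f:
        f.write(str(mat))

def func(mat):
    # hypothetical: whatever operation produces the next matrix
    return mat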
I can see nothing wrong with your code. The following concrete example worked on my Linux machine:
import os

run = 0
while run < 10:
    run += 1
    prevrun = run - 1
    if os.path.isfile("output%d.txt" % prevrun):
        matfile = open("output%d.txt" % prevrun, "r")
        data = matfile.readlines()
        matfile.close()
    else:
        matfile = open("input.txt", "r")
        data = matfile.readlines()
        matfile.close()
    data = [s[:-1] + "!\n" for s in data]
    emissionmat_file = open("output%d.txt" % run, "w")
    emissionmat_file.writelines(data)
    emissionmat_file.close()
It adds an exclamation mark to each line in the file input.txt.
I solved it. Before closing the file I do
matfile.seek(0)
This solved my problem. This method sets the pointer of the reader back to the beginning.
I am writing a Python script and I just need the second line of a series of very small text files. I would like to extract this without saving the file to my hard drive as I currently do.
I have found a few threads that reference the TempFile and StringIO modules but I was unable to make much sense of them.
Currently I download all of the files and name them sequentially, like 1.txt, 2.txt, etc., then go through all of them and extract the second line. I would like to open each file, grab the line, then move on to finding, opening, and reading the next file.
Here is what I do currently with writing it to my HDD:
while (count4 <= num_files):
    file_p = [directory, str(count4), '.txt']
    file_path = ''.join(file_p)
    cand_summary = string.strip(linecache.getline(file_path, 2))
    linkFile = open('Summary.txt', 'a')
    linkFile.write(cand_summary)
    linkFile.write("\n")
    count4 = count4 + 1
linkFile.close()
Just replace the file writing with a call to append() on a list. For example:
summary = []
while (count4 <= num_files):
    file_p = [directory, str(count4), '.txt']
    file_path = ''.join(file_p)
    cand_summary = string.strip(linecache.getline(file_path, 2))
    summary.append(cand_summary)
    count4 = count4 + 1
As an aside, you would normally write count4 += 1. Also, it looks like count4 uses 1-based indexing, which is pretty unusual for Python.
You open and close the output file in every iteration.
Why not simply do
with open("Summary.txt", "w") as linkfile:
while (count4 <= num_files):
file_p = [directory,str(count4),'.txt']
file_path = ''.join(file_p)
cand_summary = linecache.getline(file_path, 2).strip() # string module is deprecated
linkFile.write(cand_summary)
linkFile.write("\n")
count4 = count4 + 1
Also, linecache is probably not the right tool here since it's optimized for reading multiple lines from the same file, not the same line from multiple files.
Instead, better do
with open(file_path, "r") as infile:
    dummy = infile.readline()
    cand_summary = infile.readline().strip()
Also, if you drop the strip() method, you don't have to re-add the \n, but who knows why you have that in there. Perhaps .lstrip() would be better?
Finally, what's with the manual while loop? Why not use a for loop?
Lastly, after your comment, I understand you want to put the result in a list instead of a file. OK.
All in all:
summary = []
for count in xrange(num_files):
    file_p = [directory, str(count), '.txt']  # or count+1, if you start at 1
    file_path = ''.join(file_p)
    with open(file_path, "r") as infile:
        dummy = infile.readline()
        cand_summary = infile.readline().strip()
    summary.append(cand_summary)