I'm trying to set a variable to the last character of a file. I am using Python, and I'm fairly new to it. If it is of any importance, my code appends a random number between 2 and 9 to the end of an HTML file. In a separate function, I want to set the last character of the HTML file (the last character being the random number between 2 and 9) to a variable, then delete the last character (so as not to affect the function of the HTML). Does anyone know how I could do this? I can attach my code below if needed, but I chose not to as it is 50 lines long and all 50 lines are needed for full context.
Try this. Suppose the file "a.txt" contains the numbers 1, 3, 4, 5.
The code below reads the file and pulls out the last character:
with open('a.txt', 'r') as f:
    lines = f.read()
print(lines[-1])
=> 5
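One caveat: if the file happens to end with a newline, lines[-1] will be '\n' rather than the digit. A small tweak to the same idea (just a sketch, assuming the file isn't empty):

with open('a.txt') as f:
    text = f.read()

# Ignore any trailing newline/whitespace before taking the last character.
last_char = text.rstrip()[-1]
print(last_char)  # => 5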
Using @Jab's answer from the comment above, as well as some assumptions, we can produce a more efficient solution for finding the last character and replacing it.
The assumptions that are made are common and most likely will be valid:
You will know whether there is a newline character at the very end of the file, or whether the random number is truly the last character in the file (meaning accounting for whitespace).
You know the encoding of the file. This is valid since almost all HTML is UTF-8 (it can also be UTF-16), and since you are the one editing it, you will know. Most of the time the encoding won't even matter.
So, this is what we can do:
with open("test.txt", "rb+", encoding='utf-8') as f:
f.seek(-2, 2)
# -1 or -2, may change depending on whitespace characters at end of the file
var = f.read(1) # read one byte for a number
f.seek(-1,1)
print("last character:", str(var, 'utf-8'))
f.write(bytes('variable', 'utf-8')) # set whatever info here
f.write(bytes('\n', 'utf-8')) # you may want a newline character at the end of the file
f.truncate()
This is efficient because we don't have to read through the entire file. We touch only the last few bytes: once to read and once to write.
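If you're not sure whether the file ends with a newline, you could check the last byte first. A rough sketch, assuming the file is non-empty:

with open("test.txt", "rb") as f:
    f.seek(-1, 2)  # 2 == os.SEEK_END: jump to the last byte
    ends_with_newline = (f.read(1) == b"\n")

# The seek offset for the snippet above: the digit sits one byte further
# back when a trailing newline is present.
offset = -2 if ends_with_newline else -1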
You can do something like this:
# Open the file to read and the file to write
with open('file.txt') as f_in, open('new_file.txt', 'w+') as f_out:
    # Read all the lines to memory (you can't find the last line lazily)
    lines = f_in.readlines()
    # Iterate over every line
    for i, line in enumerate(lines):
        # If the current index is the last index (i.e. the last line)
        if i == len(lines) - 1:
            # Get the last character
            last_char = line[-1]
            # Write to the output file the line without the last character
            print(line[:-1], file=f_out, end='')
        else:
            # Write to the output file the line as it is
            print(line, file=f_out, end='')

# Print the removed char
print(last_char)
If you don't want to create a new file, you can load the whole file into memory, as we're already doing:
# Read all the lines into memory
with open('file.txt') as f:
    lines = f.readlines()

# Replace the lines inside the list using the previous logic
for i, line in enumerate(lines):
    if i == len(lines) - 1:
        last_char = line[-1]
        lines[i] = line[:-1]
    else:
        lines[i] = line

# Write the changed lines to the same file
with open('file.txt', 'w+') as f:
    print(''.join(lines), file=f, end='')

# Print the removed char
print(last_char)
I'm creating a program that allows users to remove users, which works; however, when it removes a user at the end of the file, a newline character is not removed, which breaks the program. The following is part of the function that removes the user.
with open("users.txt", "r") as input:
with open("temp.txt", "w") as output: # Iterate all lines from file
for line in input:
if not line.strip("\n").startswith(enteredUsername):
# If line doesn't start with the username entered, then write it in temp file.
output.write(line)
os.replace('temp.txt', 'users.txt') # Replace file with original name
This creates a temporary file to which every line that doesn't start with the given string is written; the file is then renamed back to "users.txt". I've looked at other threads on Stack Overflow as well as other websites and nothing has worked. Is there anything I should change about this solution?
EDIT --------------------
I managed to fix this with the following code (and thanks to everyone for your suggestions!):
count = 1  # Keeps count of the number of lines
removed = False  # Initially nothing has been removed
with open(r"users.txt", 'r') as fp:
    x = len(fp.readlines())  # Finds the number of lines in the file
if login(enteredUsername, enteredPassword) == True:  # Checks if the username and password combination is correct
    with open("users.txt", "r") as my_input:
        with open("temp.txt", "w") as output:  # Iterate all lines from file
            for line in my_input:
                if not line.strip("\n").startswith(enteredUsername):  # If line doesn't start with the username entered, then write it in temp file.
                    if count == x - 1 and removed == False:  # If nothing has been removed yet, get rid of the newline character
                        output.write(line[:-1])
                    else:
                        output.write(line)
                else:
                    removed = True  # This only becomes True if the previous statement is false, i.e. something has been 'removed'
                count += 1  # Increments the count for every line
    os.replace('temp.txt', 'users.txt')  # Replace file with original name
with open("users.txt", "r") as input:
with open("temp.txt", "w") as output: # Iterate all lines from file
for line in input:
if not line.strip("\n").startswith(enteredUsername):
# If line doesn't start with the username entered, then write it in temp file.
# output.write(line) # <-- you are writing the line that still have the new line character
output.write(line.strip("\n")) # try this?
os.replace('temp.txt', 'users.txt') # Replace file with original name
Also, as a general tip, I would recommend not using "input" as a variable name, since it shadows the built-in input() function in Python. Just letting you know, as it can potentially cause some wacky errors that are a pain to debug (speaking from personal experience here!)
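For example, a minimal illustration of the kind of error that shadowing can cause (the snippet is purely illustrative):

input = open("users.txt")  # shadows the built-in input()

# Later, any attempt to prompt the user fails,
# because input is now a file object rather than a function.
name = input("Enter a username: ")  # raises TypeError: the file object is not callable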
================================================================
EDIT:
I realize that doing this will likely leave no newline characters after the lines you write, which will put all the usernames on the same line. You would need to write a newline character after every name except the last one; writing one after the last name is what gives you the trailing newline character that is causing the problem.
with open("users.txt", "r") as my_input:
with open("temp.txt", "w") as output: # Iterate all lines from file
for line in my_input:
if not line.strip("\n").startswith(enteredUsername):
# If line doesn't start with the username entered, then write it in temp file.
output.write(line)
os.replace('temp.txt', 'users.txt') # Replace file with original name
# https://stackoverflow.com/questions/18857352/remove-very-last-character-in-file
# remove last new line character from the file
with open("users.txt", 'rb+') as filehandle:
filehandle.seek(-1, os.SEEK_END)
filehandle.truncate()
This is admittedly a hacky way to go about it, but it should work! The last section removes the final character of the file, which is the newline character.
You don't need to use a temporary file for this.
def remove_user(filename, enteredUsername):
    last = None
    with open(filename, 'r+') as users:
        lines = users.readlines()
        users.seek(0)
        for line in lines:
            if not line.startswith(enteredUsername):
                users.write(line)
                last = line
        # ensure that the last line is newline terminated
        if last and last[-1] != '\n':
            users.write('\n')
        users.truncate()
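A quick usage sketch, assuming each line of users.txt starts with the username (the file contents here are made up):

# users.txt before:
#   alice,password1
#   bob,password2
remove_user("users.txt", "bob")
# users.txt after:
#   alice,password1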
I'm a coding and Python lightweight :)
I've got to iterate through some log files and pick out the lines that say ERROR. Boom, done, got that. What I now need to figure out is how to grab the following 10 lines, which contain the details of the error. It's got to be some combination of an if statement and a for/while loop, I presume. Any help would be appreciated.
import os
import re
# Regex used to match
line_regex = re.compile(r"ERROR")
# Output file, where the matched loglines will be copied to
output_filename = os.path.normpath("NodeOut.log")
# Overwrites the file, ensure we're starting out with a blank file
#TODO Append this later
with open(output_filename, "w") as out_file:
    out_file.write("")
# Open output file in 'append' mode
with open(output_filename, "a") as out_file:
    # Open input file in 'read' mode
    with open("MXNode1.stdout", "r") as in_file:
        # Loop over each log line
        for line in in_file:
            # If log line matches our regex, print remove later, and write > file
            if (line_regex.search(line)):
                # for i in range():
                print(line)
                out_file.write(line)
There is no need for a regex to do this; you can just use the in operator ("ERROR" in line).
Also, to clear the content of the file without opening it in w mode, you can simply place the cursor at the beginning of the file and truncate.
import os
output_filename = os.path.normpath("NodeOut.log")
with open(output_filename, 'a') as out_file:
    out_file.seek(0, 0)
    out_file.truncate(0)
    with open("MXNode1.stdout", 'r') as in_file:
        line = in_file.readline()
        while line:
            if "ERROR" in line:
                out_file.write(line)
                for i in range(10):
                    out_file.write(in_file.readline())
            line = in_file.readline()
We use a while loop to read lines one by one with in_file.readline(). The advantage is that you can then easily read the following 10 lines with a for loop.
See the doc:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if f.readline() returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.
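A quick illustration of that behavior (demo.txt is a made-up two-line file):

# Suppose demo.txt contains exactly: "first\nsecond\n"
with open("demo.txt") as f:
    print(repr(f.readline()))  # 'first\n'
    print(repr(f.readline()))  # 'second\n'
    print(repr(f.readline()))  # ''  -> end of file reached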
Assuming you always want to grab just the next 10 lines, you could do something similar to:
with open("MXNode1.stdout", "r") as in_file:
# Loop over each log line
lineCount = 11
for line in in_file:
# If log line matches our regex, print remove later, and write > file
if (line_regex.search(line)):
# for i in range():
print(line)
lineCount = 0
if (lineCount < 11):
lineCount += 1
out_file.write(line)
The second if statement ensures you always grab the matching line as well. The magic number 11 is there so that you grab the next 10 lines after the initial line the ERROR was found on.
I have a file that contains millions of sequences. What I want to do is to get 5mers from each sequence in every line of my file.
My file looks like this:
CGATGCATAGGAA
GCAGGAGTGATCC
my code is:
with open('test.txt','r') as file:
    for line in file:
        for i in range(len(line)):
            kmer = str(line[i:i+5])
            if len(kmer) == 5:
                print(kmer)
            else:
                pass
With this code I should not get 4-mers, but I do, even though I have an if statement checking for a length of 5. Could anyone help me with this? Thanks
My output is:
CGATG
GATGC
ATGCA
TGCAT
GCATA
CATAG
ATAGG
TAGGA
AGGAA
GGAA
GCAGG
CAGGA
AGGAG
GGAGT
GAGTG
AGTGA
GTGAT
TGATC
GATCC
ATCC
but the ideal output should contain only the k-mers with a length of 5 (for each line separately):
CGATG
GATGC
ATGCA
TGCAT
GCATA
CATAG
ATAGG
TAGGA
AGGAA
GCAGG
CAGGA
AGGAG
GGAGT
GAGTG
AGTGA
GTGAT
TGATC
GATCC
When iterating through a file, every character in it ends up in the lines you get. In particular, the last character of each of those lines is a newline \n, which you're printing.
with open('test.txt') as f: data = list(f)
# data[0] == 'CGATGCATAGGAA\n'
# data[1] == 'GCAGGAGTGATCC\n'
So the very last substring you're trying to print from the first line is 'GGAA\n', which has a length of 5, but the extra whitespace gives it the appearance of a 4-mer. One of the comments proposed a satisfactory solution, but when you know the root of the problem you have lots of options:
with open('test.txt', 'r') as file:
    for line_no, line in enumerate(file):
        if line_no: print()  # for the space between chunks which you seem to want in your final output -- omit if not desired
        line = line.strip()  # remove surrounding whitespace, including the pesky newlines
        for i in range(len(line)):
            kmer = str(line[i:i+5])
            if len(kmer) == 5:
                print(kmer)
            else:
                pass
What I'm trying to do is match a phrase in a text file and then print that line (this works fine). I then need to move the cursor up 4 lines so I can do another match in that line, but I can't get the seek() method to move up 4 lines from the line that was matched so that I can do another regex search. All I can seem to do with seek() is search from the very end of the file or the beginning. It doesn't seem to let me just do seek(105, 1) from the line that was matched.
### This is the example test.txt
This is 1st line
This is 2nd line # Needs to seek() to this line from the 6th line. This needs to be dynamic as it won't always be 4 lines.
This is 3rd line
This is 4th line
This is 5th line
This is 6th line # Matches this line, now need to move it up 4 lines to the "2nd line"
This is 7 line
This is 8 line
This is 9 line
This is 10 line
#
def Findmatch():
    file = open("test.txt", "r")
    print file.tell()  # shows 0 which is the beginning of the file
    string = file.readlines()
    for line in string:
        if "This is 6th line" in line:
            print line
            print file.tell()  # shows 171 which is the end of the file. I need for it to be on the line that matches my search which should be around 108. seek() only lets me search from end or beginning of file, but not from the line that was matched.

Findmatch()
Since you've read the whole file into memory at once with file.readlines(), the tell() method does indeed correctly point to the end, and you already have all your lines in a list. If you still wanted to seek around, you'd have to read the file in line by line and record the position of each line start within the file, so that you could jump back four lines.
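A rough sketch of that position-recording idea, just for illustration (the function name is made up; the actual suggestion below uses a simpler list slice instead):

def line_four_above_match(fname, value, lines_back=4):
    # Remember the offset at which every line starts, so that after a
    # match we can seek() straight back to an earlier line.
    offsets = []
    with open(fname) as f:
        while True:
            offsets.append(f.tell())
            line = f.readline()
            if not line:
                return None  # reached end of file without a match
            if value in line:
                target = max(0, len(offsets) - 1 - lines_back)
                f.seek(offsets[target])
                return f.readline()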
For your described problem, though, you can simply find the index of the first matching line and then do the second operation on the list slice starting four items before it.
Here's a very rough example of that (the return None isn't really needed, it's just there for the sake of verbosity, clearly stating the intent/expected behavior; raising an exception might just as well be desirable, depending on what the overall plan is):
def relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            break  # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0  # Just return all lines up to now? Or was that broken input and fail?
        return lines[idx:]
    else:
        return None

with open("test.txt") as in_file:
    lines = in_file.readlines()
    print(''.join(relevant("This is 6th line", lines)))
Please also note: it's a bit confusing to name a list of lines string (one would probably expect a str there); go with lines or something else. It's also not advisable (especially since you indicate you're using 2.7) to reuse names that are already taken by built-ins, like file, for your variables. Use in_file, for instance.
EDIT: As requested in a comment, here is a printing-only example, added in parallel since the former seems potentially more useful for further extension. :)
def print_relevant(value, lines):
    found = False
    for (idx, line) in enumerate(lines):
        if value in line:
            found = True
            print(line.rstrip('\n'))
            break  # Stop iterating, last idx is a match.
    if found is True:
        idx = idx - 4
        if idx < 0:
            idx = 0  # Just return all lines up to now? Or was that broken input and fail?
        print(lines[idx].rstrip('\n'))

with open("test.txt") as in_file:
    lines = in_file.readlines()
    print_relevant("This is 6th line", lines)
Note: since lines are read in with trailing newlines and print would add one of its own, I've rstrip'ed each line before printing. Just be aware of it.
I am trying to parse some text files and need to extract blocks of text. Specifically, the line that starts with "1:" and the 19 lines after it. The "1:" does not start on the same row in each file, and there is only one instance of "1:". I would prefer to save the block of text and export it to a separate file. In addition, I need to preserve the formatting of the text in the original file.
Needless to say, I am new to Python. I generally work with R, but these files are not really compatible with R and I have about 100 to process. Any information would be appreciated.
The code that I have so far is:
tmp = open(files[0],"r")
lines = tmp.readlines()
tmp.close()
num = 0
a=0
for line in lines:
num += 1
if "1:" in line:
a = num
break
a = num is the line number where the block of text I want begins. I then want to save the next 19 lines to another file, but can't figure out how to do this. Any help would be appreciated.
Here is one option. Read all lines from your file, iterate till you find your line, and grab the next 19 lines. You would need to handle situations where your file doesn't contain a further 19 lines.
fh = open('yourfile.txt', 'r')
all_lines = fh.readlines()
fh.close()

block = None
for count, line in enumerate(all_lines):
    if "1:" in line:
        # Grab the 19 lines that follow the matching line.
        block = all_lines[count + 1:count + 20]
        break
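Building on the snippet above, a small guard for files that don't have a full 19 lines after the match could look like this (just a sketch; the messages are placeholders):

if block is None:
    print("no line containing '1:' was found")
elif len(block) < 19:
    print("warning: only %d lines follow the '1:' marker" % len(block))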
Could be done in a one-liner...
open(files[0]).read().split('1:', 1)[1].split('\n')[:19]
or more readable
txt = open(files[0]).read() # read the file into a big string
before, after = txt.split('1:', 1) # split the file on the first "1:"
after_lines = after.split('\n') # create lines from the after text
lines_to_save = after_lines[:19] # grab the first 19 lines after "1:"
then join the lines with a newline (and add a newline to the end) before writing it to a new file:
out_text = "1:" # add back "1:"
out_text += "\n".join(lines_to_save) # add all 19 lines with newlines between them
out_text += "\n" # add a newline at the end
open("outputfile.txt", "w").write(out_text)
To comply with best practice for reading and writing files, you should also be using the with statement to ensure that the file handles are closed as soon as possible. You can create convenience functions for it:
def read_file(fname):
    "Returns contents of file with name `fname`."
    with open(fname) as fp:
        return fp.read()

def write_file(fname, txt):
    "Writes `txt` to a file named `fname`."
    with open(fname, 'w') as fp:
        fp.write(txt)
then you can replace the first line above with:
txt = read_file(files[0])
and the last line with:
write_file("outputfile.txt", out_text)
I always prefer to read the file into memory first, but sometimes that's not possible. If you want to use iteration then this will work:
def process_file(fname):
    with open(fname) as fp:
        for line in fp:
            if line.startswith('1:'):
                break
        else:
            return  # no '1:' in file
        yield line  # yield line containing '1:'
        for i, line in enumerate(fp):
            if i >= 19:
                break
            yield line

if __name__ == "__main__":
    with open('output.txt', 'w') as fp:
        for line in process_file('intxt.txt'):
            fp.write(line)
It uses the else: clause on a for loop, which you don't see very often anymore but was created for just this purpose (the else clause is executed if the for loop doesn't break).
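A tiny standalone illustration of that for/else behavior, unrelated to the file-processing code above:

for n in [1, 3, 5]:
    if n % 2 == 0:
        print("found an even number")
        break
else:
    # Runs only because the loop finished without hitting break.
    print("no even number found")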