How to count the last character in a text file in Python

I have a text file and I want to count the last names ending with "E". This is the code I have so far. I know it is not correct, but I am stuck and do not know what else to do to make it work.
def ans9(file):
    infile = open(file)
    contents = infile.read().split()
    infile.close()
    return len(contents)
ans9.reverse()
for word in ans9:
    print(word[e])

From what I see in the file, the name and the float number are delimited by a tab. What you want to do is open the file and read it line by line. Then go through those lines one at a time, split each on the tab character (\t), take the first element of the resulting list (the name), and look at the last character of that name. In code, it would look like this:
with open(file, 'r') as f:
    lines = f.readlines()
cnt = 0
for i in lines:
    if i.split('\t')[0][-1] == 'e' or i.split('\t')[0][-1] == 'E':
        cnt += 1
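The loop above can also be wrapped in a function that skips blank lines, so that indexing the last character never fails. This is only a sketch: the function name count_names_ending_with and the sample data are hypothetical, and it assumes, as the answer does, one tab-separated record per line with the name in the first field.

```python
import os
import tempfile


def count_names_ending_with(path, letter="e"):
    """Count lines whose first tab-separated field ends with `letter` (case-insensitive)."""
    count = 0
    with open(path, "r") as f:
        for line in f:
            name = line.rstrip("\n").split("\t")[0].strip()
            # Skip blank lines so name[-1] can never raise IndexError.
            if name and name[-1].lower() == letter.lower():
                count += 1
    return count


# Hypothetical sample data: name, tab, float (one blank line on purpose).
sample = "Lane\t3.5\nSmith\t2.0\nMOORE\t4.1\n\nBlake\t1.2\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

result = count_names_ending_with(tmp.name)
os.remove(tmp.name)
print(result)
```

Lower-casing both sides replaces the two separate comparisons against 'e' and 'E'.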

Related

How to get the last character in a file from Python?

I'm trying to set a variable to the last character of a file. I am using Python, and I'm fairly new to it. If it is of any importance, my code appends a random number between 2 and 9 to the end of an HTML file. In a separate function, I want to set the last character of the HTML file (the random number between 2 and 9) to a variable, then delete that last character (so as not to affect the function of the HTML). Does anyone know how I could do this? I can attach my code below if needed, but I chose not to, as it is 50 lines long and all 50 lines are needed for full context.
Try this. The file "a.txt" contains the numbers 1, 3, 4, 5.
The code below reads the file and pulls out its last character.
with open('a.txt', 'r') as file:
    lines = file.read()
print(lines[-1])
=> 5
Using @Jab's answer from the comment above, together with a few assumptions, we can produce a more efficient way to find the last character and replace it.
The assumptions that are made are common and most likely will be valid:
You will know whether there is a newline character at the very end of the file, or whether the random number is truly the last character in the file (meaning accounting for whitespace).
You know the encoding of the file. This is valid since almost all HTML is utf-8, (can be utf-16), and since you are the one editing it, you will know. Most times the encoding won't even matter.
So, this is what we can do:
with open("test.txt", "rb+") as f:  # binary mode does not accept an encoding argument
    f.seek(-2, 2)
    # -1 or -2, may change depending on whitespace characters at end of the file
    var = f.read(1)  # read one byte for a number
    f.seek(-1, 1)
    print("last character:", str(var, 'utf-8'))
    f.write(bytes('variable', 'utf-8'))  # set whatever info here
    f.write(bytes('\n', 'utf-8'))  # you may want a newline character at the end of the file
    f.truncate()
This is efficient because we actually don't have to iterate through the entire file. We iterate through just the last character, once to read and once to write.
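For the question as asked (read the trailing digit, then delete it), the same seek-and-truncate idea can be sketched without writing anything back. The function name pop_last_char is hypothetical, and the sketch assumes the character is a single ASCII byte, optionally followed by one "\n"; a trailing newline is removed together with the character.

```python
import os
import tempfile


def pop_last_char(path):
    """Remove and return the last character of the file.

    Assumes a single-byte (ASCII) character, optionally followed by
    one trailing newline, which is removed as well.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        if size == 0:
            raise ValueError("file is empty")
        f.seek(-1, os.SEEK_END)
        last = f.read(1)
        if last == b"\n" and size > 1:
            # Step over the trailing newline and read the byte before it.
            f.seek(-2, os.SEEK_END)
            last = f.read(1)
            f.seek(-2, os.SEEK_END)
        else:
            f.seek(-1, os.SEEK_END)
        # Cut the file at the current position, dropping the character.
        f.truncate()
    return last.decode("ascii")


# Hypothetical HTML file that ends with the digit 7 and a newline.
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(b"<html></html>7\n")

digit = pop_last_char(tmp.name)
with open(tmp.name, "rb") as f:
    remaining = f.read()
os.remove(tmp.name)
print(digit, remaining)
```

Only the last one or two bytes are read, so the cost does not grow with the size of the file.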
You can do something like this:
# Open the file to read and the file to write
with open('file.txt') as f_in, open('new_file.txt', 'w+') as f_out:
    # Read all the lines to memory (you can't find the last line lazily)
    lines = f_in.readlines()
    # Iterate over every line
    for i, line in enumerate(lines):
        # If the current index is the last index (i.e. the last line)
        if i == len(lines) - 1:
            # Get the last character
            last_char = line[-1]
            # Write to the output file the line without the last character
            print(line[:-1], file=f_out, end='')
        else:
            # Write to the output file the line as it is
            print(line, file=f_out, end='')
# Print the removed char
print(last_char)
If you don't want to create a new file, you can load the whole file into memory, as we're currently doing:
# Read all the lines into memory
with open('file.txt') as f:
    lines = f.readlines()
# Replace the lines inside the list using the previous logic
for i, line in enumerate(lines):
    if i == len(lines) - 1:
        last_char = line[-1]
        lines[i] = line[:-1]
    else:
        lines[i] = line
# Write the changed lines to the same file
with open('file.txt', 'w+') as f:
    print(''.join(lines), file=f, end='')
# Print the removed char
print(last_char)

Return First Letter of Line in File

I am trying to pull the first letter of every line in a file, then print those letters to a new file. I am working step by step, so I first wrote code that pulls the first letter of every line. However, when I added the code to read a specific file, it appears that it is not properly iterating over the entire file's content. Does anyone know why my for loop is not iterating? Or is the issue perhaps that it is iterating but not properly adding the letters to 'lines'?
def secret2(m):
    infile = open(m, 'r')
    text = infile.read()
    for line in text:
        lines = text[0]
        for i in range(len(text)):
            if text[i] == '\n':
                lines += text[i+1]
            print(lines)
            return(lines)
    m.close()
Output:
>>> secret2('file.txt')
A
'A'
>>>
Proper output would be:
>>> secret2('file.txt')
'ALICE'
>>>
Your code is iterating over characters instead of lines. You could print the first character of each line with the following code:
def secret2(m):
    with open(m) as infile:
        # line.strip() skips blank lines, which would otherwise contribute '\n'
        print(''.join(line[0] for line in infile if line.strip()))
You want to treat each line as a single item, so use readlines() instead of read(). Your code should be:
def secret2(m):
    infile = open(m, 'r')
    text = infile.readlines()
    infile.close()
    for j in text:
        print(j[0])
You can use this:
def get_1st_chr(your_file, id_line):
    with open(your_file) as f:
        text_splitted = f.read().splitlines()
    # the with block has already closed the file here
    return text_splitted[id_line][0]
Or, if you want the first character of every line:
def get_1st_chr(your_file, nb_lines):
    with open(your_file) as f:
        text_splitted = f.read().splitlines()
    for i in range(nb_lines):
        print(text_splitted[i][0])
You could, of course, replace 0 with the index of the character you want to print.
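Since the question asks for the letters both as a return value and in a new file, the answers above can be combined into one sketch. The function name first_letters, the file names, and the sample lines are hypothetical; blank lines are skipped so they do not contribute a newline character.

```python
import os
import shutil
import tempfile


def first_letters(in_path, out_path):
    """Return the first character of every non-blank line and write it to out_path."""
    with open(in_path) as infile:
        # Iterating over the file object yields one line at a time.
        letters = "".join(line[0] for line in infile if line.strip())
    with open(out_path, "w") as outfile:
        outfile.write(letters + "\n")
    return letters


# Hypothetical input whose non-blank lines start with A, L, I, C, E.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file.txt")
dst = os.path.join(workdir, "letters.txt")
with open(src, "w") as f:
    f.write("Apple\nLemon\n\nIris\nCherry\nElder\n")

secret = first_letters(src, dst)
with open(dst) as f:
    written = f.read()
shutil.rmtree(workdir)
print(secret)
```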

Printing to a file via Python

Hopefully this is an easy fix. I'm trying to edit one field of a file we use for import, however when I run the following code it leaves the file blank and 0kb. Could anyone advise what I'm doing wrong?
import re #import regex so we can use the commands
name = raw_input("Enter filename:") #prompt for file name, press enter to just open test.nhi
if len(name) < 1 : name = "test.nhi"
count = 0
fhand = open(name, 'w+')
for line in fhand:
    words = line.split(',') #obtain individual words by using split
    words[34] = re.sub(r'\D', "", words[34]) #remove non-numeric chars from string using regex
    if len(words[34]) < 1 : continue # If the 34th field is blank go to the next line
    elif len(words[34]) == 2 : "{0:0>3}".format([words[34]]) #Add leading zeroes depending on the length of the field
    elif len(words[34]) == 3 : "{0:0>2}".format([words[34]])
    elif len(words[34]) == 4 : "{0:0>1}".format([words[34]])
    fhand.write(words) #write the line
fhand.close() # Close the file after the loop ends
I used the text below in 'a.txt' as input and modified your code. Please check whether it works for you.
#Initial content of a.txt
This,program,is,Java,program
This,program,is,12Python,programs
Modified code as follows:
import re
# Read from the file and update the values
fhand = open('a.txt', 'r')
tmp_list = []
for line in fhand:
    # Split the line using ','
    words = line.split(',')
    # Remove non-numeric chars from the field at index 3 using regex
    words[3] = re.sub(r'\D', "", words[3])
    # If the field is blank, leave it empty
    if len(words[3]) < 1:
        # No 'continue' here: we still need to rebuild the original line and write it to the file
        print("Field empty. Continue...")
    elif len(words[3]) >= 1 and len(words[3]) < 5:
        # The original format() calls discarded their result; zfill(5) adds the required number of leading zeros.
        words[3] = words[3].zfill(5)
    # After updating the field, rebuild the line from the words list
    tmp_str = ",".join(words)
    tmp_list.append(tmp_str)
fhand.close()
# Write to the same file
whand = open("a.txt", 'w')
for val in tmp_list:
    whand.write(val)
whand.close()
File content after running code
This,program,is,,program
This,program,is,00012,programs
The file mode 'w+' truncates your file to 0 bytes as soon as it is opened, so you'll only be able to read back lines that you've written since then.
See the question "Confused by python file mode "w+"" for more information.
One idea would be to read the whole file first, close it, and re-open it to write to it.
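The truncation is easy to observe on a throwaway file. The sketch below uses a hypothetical temporary file and compares 'r+' (read and write, no truncation) with 'w+' (read and write, truncated at open time).

```python
import os
import tempfile

# Create a small file with known content (hypothetical sample data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a,b,c\n")
path = tmp.name

# 'r+' opens for reading and writing without truncating.
with open(path, "r+") as f:
    lines_r_plus = f.readlines()

# 'w+' also allows reading, but the file is emptied when it is opened.
with open(path, "w+") as f:
    lines_w_plus = f.readlines()

size_after = os.path.getsize(path)
os.remove(path)
print(lines_r_plus, lines_w_plus, size_after)
```

The second read returns no lines and the file is left at 0 bytes, which matches the symptom described in the question.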
Not sure which OS you're on but I think reading and writing to the same file has undefined behaviour.
I guess internally the file object holds the position (try fhand.tell() to see where it is). You could probably adjust it back and forth as you went using fhand.seek(last_read_position) but really that's asking for trouble.
Also, I'm not sure how the script would ever end as it would end up reading the stuff it had just written (in a sort of infinite loop).
Best bet is to read the entire file first:
with open(name, 'r') as f:
    lines = f.read().splitlines()
with open(name, 'w') as f:
    for l in lines:
        # ....
        f.write(something)
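Filled in for the field-padding task from the question, the read-then-rewrite pattern might look like the sketch below. The function name pad_numeric_field, its parameters, and the sample file are hypothetical; the field index and width would need to match the real import file.

```python
import os
import re
import tempfile


def pad_numeric_field(path, index, width):
    """Strip non-digits from one comma-separated field and zero-pad it, in place."""
    # Phase 1: read everything, then let the with block close the file.
    with open(path, "r") as f:
        lines = f.read().splitlines()

    # Phase 2: reopen for writing (this truncates) and write the updated lines.
    with open(path, "w") as f:
        for line in lines:
            words = line.split(",")
            if index < len(words):
                digits = re.sub(r"\D", "", words[index])
                # Leave an empty field empty; otherwise pad with leading zeros.
                words[index] = digits.zfill(width) if digits else digits
            # splitlines() removed the newline, so add it back.
            f.write(",".join(words) + "\n")


# Hypothetical sample file, same shape as the a.txt example.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("This,program,is,Java,program\nThis,program,is,12Python,programs\n")

pad_numeric_field(tmp.name, index=3, width=5)
with open(tmp.name) as f:
    updated = f.read()
os.remove(tmp.name)
print(updated, end="")
```

Because the file is fully read and closed before it is reopened in 'w' mode, the truncation no longer destroys data that has not been read yet.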
To print to a file in Python, you can use:
with open("test.txt", "w") as ofile:  # the file must be opened for writing, not 'r'
    print("Some text...", file=ofile)

Find lines with a phrase and print another section of the line

I am trying to search through a long text file to locate sections where a phrase is located and then print the phrase in one column and the corresponding data in another in a new text file.
Phrase I am looking for is "Initialize All". The text file will have thousands of lines - the one I am looking for will look something like this:
14-09-23 13:47:46.053 -07 000000027 INF: Initialize All start
This is where I am at so far
Still trying to print three separate columns: Initialize All, Date, Time
with open ('Result.txt', 'w') as wFile:
    with open('Log.txt', 'r') as f:
        for line in f:
            if 'Initialize All' in line:
                date, time = line.split(" ",2)[:2]
                wFile.write(date)
with open('file.txt', 'r') as f:
    for line in f:
        if 'Initialize All' in line:
            # do stuff with line, e.g. print it
            print(line, end='')
You can use regex:
import re
with open('file.txt', 'r') as f:
    lines = f.readlines()
stamps = [re.search(r'\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}', line).group(0) for line in lines if 'Initialize All' in line]
s = "14-09-23 13:47:46.053 -07 000000027 INF: Initialize All start"
if "Initialize All" in s: # check for substring
    date, time = s.split(" ",2)[:2] # split on whitespace and get the first two elements
    print(date, time)
14-09-23 13:47:46.053
The 2 in s.split(" ",2) sets maxsplit to 2, so we split only twice rather than splitting the whole string. s.split()[:2] will also work, as it splits on whitespace by default, but since we only want the first two substrings there is no point in splitting the whole string.
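To get the three columns the question asks for (phrase, date, time) into a new file, the split shown above can be combined with a write per matching line. This is a sketch only: the function name extract_columns, the tab separator, and the sample log lines are assumptions, not part of the original thread.

```python
import os
import shutil
import tempfile

PHRASE = "Initialize All"


def extract_columns(log_path, result_path, phrase=PHRASE):
    """Write 'phrase<TAB>date<TAB>time' for every log line containing phrase."""
    rows = []
    with open(log_path, "r") as log, open(result_path, "w") as out:
        for line in log:
            if phrase in line:
                parts = line.split(" ", 2)
                if len(parts) < 3:
                    continue  # not in the expected "date time rest" layout
                date, time = parts[:2]
                out.write("\t".join((phrase, date, time)) + "\n")
                rows.append((phrase, date, time))
    return rows


# Hypothetical log: one matching line (from the question) and one that does not match.
workdir = tempfile.mkdtemp()
log_file = os.path.join(workdir, "Log.txt")
result_file = os.path.join(workdir, "Result.txt")
with open(log_file, "w") as f:
    f.write("14-09-23 13:47:46.053 -07 000000027 INF: Initialize All start\n")
    f.write("14-09-23 13:47:47.001 -07 000000028 INF: Something else\n")

rows = extract_columns(log_file, result_file)
with open(result_file) as f:
    result_text = f.read()
shutil.rmtree(workdir)
print(result_text, end="")
```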

parse blocks of text from text file using Python

I am trying to parse some text files and need to extract blocks of text. Specifically, the lines that start with "1:" and 19 lines after the text. The "1:" does not start on the same row in each file and there is only one instance of "1:". I would prefer to save the block of text and export it to a separate file. In addition, I need to preserve the formatting of the text in the original file.
Needless to say I am new to Python. I generally work with R but these files are not really compatible with R and I have about 100 to process. Any information would be appreciated.
The code that I have so far is:
tmp = open(files[0],"r")
lines = tmp.readlines()
tmp.close()
num = 0
a=0
for line in lines:
    num += 1
    if "1:" in line:
        a = num
        break
a = num is the line number for the block of text I want. I then want to save the next 19 lines to another file, but I can't figure out how to do this. Any help would be appreciated.
Here is one option. Read all lines from your file, iterate until you find your line, and return the next 19 lines. If the file does not contain 19 more lines, the slice simply returns fewer, so you may want to handle that case.
def next_19_lines(fname='yourfile.txt'):
    # wrapped in a function so that return is valid
    fh = open(fname, 'r')
    all_lines = fh.readlines()
    fh.close()
    for count, line in enumerate(all_lines):
        if "1:" in line:
            return all_lines[count+1:count+20]
It could be done in a one-liner. The first piece after the split is the rest of the "1:" line, so 20 pieces cover that line plus the 19 lines after it:
open(files[0]).read().split('1:', 1)[1].split('\n')[:20]
or, more readably:
txt = open(files[0]).read()  # read the file into a big string
before, after = txt.split('1:', 1)  # split the file on the first "1:"
after_lines = after.split('\n')  # create lines from the after text
lines_to_save = after_lines[:20]  # rest of the "1:" line plus the 19 lines after it
Then join the lines with a newline (and add a newline to the end) before writing it to a new file:
out_text = "1:"  # add back "1:"
out_text += "\n".join(lines_to_save)  # add the saved lines with newlines between them
out_text += "\n"  # add a newline at the end
open("outputfile.txt", "w").write(out_text)
To comply with best practice for reading and writing files, you should also use the with statement to ensure that the file handles are closed as soon as possible. You can create convenience functions for it:
def read_file(fname):
    "Returns contents of file with name `fname`."
    with open(fname) as fp:
        return fp.read()

def write_file(fname, txt):
    "Writes `txt` to a file named `fname`."
    with open(fname, 'w') as fp:
        fp.write(txt)
then you can replace the first line above with:
txt = read_file(files[0])
and the last line with:
write_file("outputfile.txt", out_text)
I always prefer to read the file into memory first, but sometimes that's not possible. If you want to use iteration then this will work:
def process_file(fname):
    with open(fname) as fp:
        for line in fp:
            if line.startswith('1:'):
                break
        else:
            return  # no '1:' in file
        yield line  # yield line containing '1:'
        for i, line in enumerate(fp):
            if i >= 19:
                break
            yield line

if __name__ == "__main__":
    with open('output.txt', 'w') as fp:
        for line in process_file('intxt.txt'):
            fp.write(line)
It uses the else: clause on a for-loop, which you don't see very often anymore but which was created for just this purpose (the else clause is executed if the for-loop doesn't break).
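The same lazy extraction can also be sketched with itertools instead of explicit loops. This is an alternative technique, not the approach from the answer above: dropwhile skips lines until the marker line, and islice takes that line plus the next 19. The function name block_from and the in-memory sample text are hypothetical.

```python
import io
from itertools import dropwhile, islice


def block_from(lines, marker="1:", extra=19):
    """Return the first line starting with marker plus up to `extra` following lines."""
    # dropwhile skips lines until one starts with the marker;
    # islice then takes that line and the next `extra` lines lazily.
    tail = dropwhile(lambda line: not line.startswith(marker), lines)
    return list(islice(tail, extra + 1))


# Hypothetical in-memory "file": two header lines, the marker line, 25 data lines.
text = "header\nmore header\n1: start\n" + "".join(f"row {n}\n" for n in range(25))
block = block_from(io.StringIO(text))
print(len(block), repr(block[0]), repr(block[-1]))
```

Each returned line keeps its original newline and spacing, so writing the list out with writelines() preserves the formatting of the source file. If the marker never appears, the result is an empty list.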
