Finding the last element (digit) on every line in a file (Python)

I'm reading from a text file and I need to find the last element (digit) on each line. I don't understand why this code isn't working: the same approach works when I try it on a regular string, but it doesn't seem to apply in this case.
f = open("file.txt", "r")
result = 0
for line in f:
    string = str(f.read())
    if string[-1:].isdigit() == True:
        result = int(string[-1:])
    else:
        result = 40
print(result)
f.close()
The file file.txt only contains the line
81 First line32
so the code should print out 2 as a result, but I only get 40, as the first condition never becomes true. What am I doing wrong?

This line is extraneous:
string = str(f.read())
You don't need to read from your file, and will actually move the file pointer by doing so, causing all sorts of issues. You're already reading with this:
for line in f:
Thus, what you want is:
for line in f:
    if line[-1:].isdigit() == True:
        result = int(line[-1:])
    else:
        result = 40
This is explained in the documentation.

You have an f.read() too many. This is all you need:
f = open("file.txt", "r")
result = 0
for line in f:
    if line[-1:].isdigit():
        result = int(line[-1:])
    else:
        result = 40
print(result)
f.close()
Also, the == True is redundant: if string[-1:].isdigit() == True: can be replaced with if line[-1:].isdigit():
You may also want to use line.strip() to remove the trailing newline; otherwise the last character of each line is "\n" and the isdigit() check will fail.
f = open("file.txt", "r")
result = 0
for line in f:
    l = line.strip()
    if l[-1:].isdigit():
        result = int(l[-1:])
    else:
        result = 40
print(result)
f.close()

The problem is that the last character in the line is the end-of-line character. Use .strip() to remove it (it will also remove extra spaces).
with open("file.txt", "r") as f:
    for line in f:
        # [-1:] rather than [-1], so a blank line gives "" instead of raising IndexError
        lastchar = line.strip()[-1:]
        if lastchar.isdigit():
            result = int(lastchar)
        else:
            result = 40
        print(result)
This prints 2 as you requested in your question with the one-line file.
81 First line32
It will also work for multiple lines, printing the result for each line.
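The same strip-based check can be wrapped in a small function so it can be tested without a file. This is a sketch: the name last_digit and the default of 40 (taken from the question) are illustrative, and it swaps isdigit() for isdecimal(), because isdigit() also accepts characters such as "²" that int() cannot convert.

```python
def last_digit(line, default=40):
    """Return the last character of line as an int, or default if it is not a digit."""
    # rstrip() drops the trailing newline that iterating over a file keeps;
    # slicing with [-1:] gives "" for a blank line instead of raising IndexError.
    last = line.rstrip()[-1:]
    # isdecimal() only accepts characters that int() can convert.
    return int(last) if last.isdecimal() else default
```

Inside a for line in f: loop, last_digit(line) returns 2 for the line 81 First line32 and 40 for a line that does not end in a digit.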

Related

Removing duplicates from text file using python

I have this text file and let's say it contains 10 lines.
Bye
Hi
2
3
4
5
Hi
Bye
7
Hi
Every time it says "Hi" and "Bye" I want it to be removed except for the first time it was said.
My current code is below (yes, filename actually points to a file; I just didn't include that part here):
text_file = open(filename)
for i, line in enumerate(text_file):
    if i == 0:
        var_Line1 = line
    if i == 1:
        var_Line2 = line
    if i > 1:
        if line == var_Line2:
            del line
text_file.close()
It does detect the duplicates, but it takes a very long time given the number of lines, and I'm not sure how to delete them and save the file.

You could use dict.fromkeys to remove duplicates and preserve order efficiently (dicts keep insertion order from Python 3.7 on):
with open(filename, "r") as f:
    lines = dict.fromkeys(f.readlines())
with open(filename, "w") as f:
    f.writelines(lines)
Idea from Raymond Hettinger
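Applied to the ten lines from the question, the dict.fromkeys idea looks like the sketch below. It works on an in-memory list so the result is easy to check; the helper name dedupe_lines is hypothetical, and the order guarantee assumes Python 3.7 or later.

```python
def dedupe_lines(lines):
    """Return lines with repeats removed, keeping each first occurrence in order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so dict.fromkeys() behaves like an ordered set here.
    return list(dict.fromkeys(lines))


sample = ["Bye\n", "Hi\n", "2\n", "3\n", "4\n", "5\n",
          "Hi\n", "Bye\n", "7\n", "Hi\n"]
deduped = dedupe_lines(sample)  # Bye, Hi, 2, 3, 4, 5, 7
```

One caveat: if the file's last line has no trailing newline, "Hi" and "Hi\n" are different keys and that last copy survives; stripping line endings first, as the set-based answer does, avoids this.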

Using a set & some basic filtering logic:
with open('test.txt') as f:
    seen = set()  # keep track of the lines already seen
    deduped = []
    for line in f:
        line = line.rstrip()
        if line not in seen:  # if not seen already, keep the line
            deduped.append(line)
            seen.add(line)
# re-write the file with the de-duplicated lines
with open('test.txt', 'w') as f:
    f.writelines([l + '\n' for l in deduped])

How to remove lines that start with the same letters (sequence) in a txt file?

#!/usr/bin/env python
FILE_NAME = "testprecomb.txt"
NR_MATCHING_CHARS = 5
lines = set()
with open(FILE_NAME, "r") as inF:
    for line in inF:
        line = line.strip()
        if line == "": continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if not (beginOfSequence in lines):
            print(line)
            lines.add(beginOfSequence)
This is the code I have right now, but it is not working. I have a file with lines of DNA that sometimes start with the same sequence (pattern of letters). I need to write code that finds all lines of DNA that start with the same letters (perhaps the same 10 characters) and deletes one of the lines.
Example (issue):
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT
What I need as result after one is taken out of file:
CCTGGATGGCTTATATAAGAT***GTTAT***
***GTTAT***ATAATATACCACCGGGCTGCTT
(no third line)

I think your set logic is correct. You are just missing the portion that will save the lines you want to write back into the file. I am guessing you tried this with a separate list that you forgot to add here since you are using append somewhere.
FILE_NAME = "sample_file.txt"
NR_MATCHING_CHARS = 5
lines = set()
output_lines = []  # keep track of lines you want to keep
with open(FILE_NAME, "r") as inF:
    for line in inF:
        line = line.strip()
        if line == "": continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if not (beginOfSequence in lines):
            output_lines.append(line + '\n')  # add line to list; newline needed since we will write to a file
            lines.add(beginOfSequence)
print(output_lines)
with open(FILE_NAME, 'w') as f:
    f.writelines(output_lines)  # write it out to the file

Your approach has a few problems. First, I would avoid naming file variables inF as this can be confused with inf. Descriptive names are better: testFile for instance. Also testing for empty strings using equality misses a few important edge cases (what if line is None for instance?); use the not keyword instead. As for your actual problem, you're not actually doing anything based on that set membership:
FILE_NAME = "testprecomb.txt"
NR_MATCHING_CHARS = 5
prefixCache = set()
data = []
with open(FILE_NAME, "r") as testFile:
    for line in testFile:
        line = line.strip()
        if not line:
            continue
        beginOfSequence = line[:NR_MATCHING_CHARS]
        if (beginOfSequence in prefixCache):
            continue
        else:
            print(line)
            data.append(line)
            prefixCache.add(beginOfSequence)
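Checked against the three DNA lines from the question, the prefix test behaves as intended. The sketch below is a hypothetical helper (dedupe_by_prefix) that keeps the first line seen for each prefix and drops later ones; it uses the same 5-character prefix as the code above and leaves reading and writing the file to the caller.

```python
def dedupe_by_prefix(lines, prefix_len=5):
    """Keep the first non-blank line for each distinct prefix of prefix_len characters."""
    seen = set()
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        prefix = line[:prefix_len]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


sequences = [
    "CCTGGATGGCTTATATAAGATGTTAT",
    "GTTATATAATATACCACCGGGCTGCTT",
    "GTTATATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT",
]
kept = dedupe_by_prefix(sequences)  # the third line shares the prefix "GTTAT" and is dropped
```

To update the file, the kept lines can be written back with a newline after each one, as the previous answer does with writelines.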

Unable to remove line breaks in a text file in python

At the risk of losing reputation, I did not know what else to do. My file is not showing any hidden characters, and I have tried every .replace and .strip I can think of. My file is UTF-8 encoded and I am using Python 3.6.1.
I have a file with the format:
>header1
AAAAAAAA
TTTTTTTT
CCCCCCCC
GGGGGGGG
>header2
CCCCCC
TTTTTT
GGGGGG
AAAAAA
I am trying to remove the line breaks at the end of each sequence line so that each record becomes one continuous string. (This file is actually thousands of lines long.)
My code is redundant in the sense that I typed in everything I could think of to remove line breaks:
fref = open(ref)
for line in fref:
    sequence = 0
    header = 0
    if line.startswith('>'):
        header = ''.join(line.splitlines())
        print(header)
    else:
        sequence = line.strip("\n").strip("\r")
        sequence = line.replace('\n', ' ').replace('\r', '').replace(' ', '').replace('\t', '')
        print(len(sequence))
output is:
>header1
8
8
8
8
>header2
6
6
6
6
But if I manually go in and delete the line endings to make it one continuous string, it is counted as a single continuous string.
Expected output:
>header1
32
>header2
24
Thanks in advance for any help,
Dennis

There are several approaches to parsing this kind of input. In all cases, I would recommend isolating the open and print side-effects outside of a function that you can unit test to convince yourself of the proper behavior.
You could iterate over each line and handle the case of empty lines and end-of-file separately. Here, I use yield statements to return the values:
def parse(infile):
    for line in infile:
        if line.startswith(">"):
            total = 0
            yield line.strip()
        elif not line.strip():
            yield total
        else:
            total += len(line.strip())
    if line.strip():
        yield total

def test_parse():
    with open("input.txt") as infile:
        assert list(parse(infile)) == [
            ">header1",
            32,
            ">header2",
            24,
        ]
Or, you could handle both empty lines and end-of-file at the same time. Here, I use an output array to which I append headers and totals:
def parse(infile):
    output = []
    while True:
        line = infile.readline()
        if line.startswith(">"):
            total = 0
            header = line.strip()
        elif line and line.strip():
            total += len(line.strip())
        else:
            output.append(header)
            output.append(total)
        if not line:
            break
    return output

def test_parse():
    with open("input.txt") as infile:
        assert parse(infile) == [
            ">header1",
            32,
            ">header2",
            24,
        ]
Or, you could also split the whole input file into empty-line-separated blocks and parse them independently. Here, I use an output stream to which I write the output; in production, you could pass the sys.stdout stream for example:
import re
from io import StringIO

def parse(infile, outfile):
    content = infile.read()
    for block in re.split(r"\r?\n\r?\n", content):
        header, *lines = re.split(r"\s+", block)
        total = sum(len(line) for line in lines)
        outfile.write("{header}\n{total}\n".format(
            header=header,
            total=total,
        ))

def test_parse():
    with open("input.txt") as infile:
        outfile = StringIO()
        parse(infile, outfile)
        outfile.seek(0)
        assert outfile.readlines() == [
            ">header1\n",
            "32\n",
            ">header2\n",
            "24\n",
        ]
Note that my tests use open("input.txt") for brevity but I would actually recommend passing a StringIO(...) instance instead to see the input being tested more easily, to avoid hitting the filesystem and to make the tests faster.
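Following that suggestion, here is a sketch of the first (generator) approach tested against an in-memory StringIO instead of input.txt. Like the approaches above, the sample text assumes records are separated by a blank line; the test name test_parse_from_string is illustrative.

```python
from io import StringIO


def parse(infile):
    """Yield each header followed by the total length of its sequence lines.

    Assumes records are separated by a blank line.
    """
    total = 0
    line = ""  # keeps the final check safe if infile is empty
    for line in infile:
        if line.startswith(">"):
            total = 0
            yield line.strip()
        elif not line.strip():
            yield total  # a blank line closes the current record
        else:
            total += len(line.strip())
    if line.strip():
        yield total  # the last record was not followed by a blank line


SAMPLE = (
    ">header1\nAAAAAAAA\nTTTTTTTT\nCCCCCCCC\nGGGGGGGG\n"
    "\n"
    ">header2\nCCCCCC\nTTTTTT\nGGGGGG\nAAAAAA\n"
)


def test_parse_from_string():
    assert list(parse(StringIO(SAMPLE))) == [">header1", 32, ">header2", 24]
```

The input now sits next to the expected output, and nothing touches the filesystem.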

From my understanding of your question you would like something like this:
Note how the sequence is built over multiple iterations of the loop, since you want to combine multiple lines.
with open(ref) as f:
    sequence = ""  # reset sequence
    header = None
    for line in f:
        if line.startswith('>'):
            if header:
                print(header)  # print last header
                print(len(sequence))  # print length of last sequence
            sequence = ""  # reset sequence
            header = line.rstrip()  # store header (keeps '>', drops the newline)
        else:
            sequence += line.rstrip()  # append line to sequence
    if header:
        print(header)  # the final record has no following header, so print it here
        print(len(sequence))
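The same build-as-you-go idea can be wrapped in a function that returns (header, length) pairs, which makes it easy to test with StringIO. This is a sketch; sequence_lengths is a hypothetical name, and unlike the blank-line-based parsers in the previous answer it relies only on the '>' headers to separate records.

```python
from io import StringIO


def sequence_lengths(infile):
    """Return a list of (header, total sequence length) pairs.

    A new '>' line closes the previous record, and the last record is
    added after the loop, so no blank separator lines are required.
    """
    records = []
    header, length = None, 0
    for line in infile:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, length))
            header, length = line, 0
        elif line:
            length += len(line)  # joining the lines only needs the summed lengths
    if header is not None:
        records.append((header, length))  # flush the final record
    return records


data = StringIO(
    ">header1\nAAAAAAAA\nTTTTTTTT\nCCCCCCCC\nGGGGGGGG\n"
    ">header2\nCCCCCC\nTTTTTT\nGGGGGG\nAAAAAA\n"
)
for header, length in sequence_lengths(data):
    print(header)
    print(length)
```

With a real file, pass the object returned by open(ref) instead of the StringIO.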

Locate a specific line in a file based on user input then delete a specific number of lines

I'm trying to delete specific lines in a text file. The way I need to go about it is by prompting the user to input a string (a phrase that should exist in the file); the file is then searched, and if the string is there, the data on that line and the line number are both stored.
After the phrase has been found, its line and the five following lines are printed out. Now I have to figure out how to delete those six lines without changing any other text in the file, which is my issue lol.
Any ideas as to how I can delete those six lines?
This was my latest attempt to delete the lines
file = open('C:\\test\\example.txt', 'a')
locate = "example string"
for i, line in enumerate(file):
    if locate in line:
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i + 1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        break

Usually I would not think it's desirable to overwrite the source file - what if the user does something by mistake? If your project allows, I would write the changes out to a new file.
with open('source.txt', 'r') as ifile:
    with open('output.txt', 'w') as ofile:
        locate = "example string"
        skip_next = 0
        for line in ifile:
            if locate in line:
                skip_next = 5  # drop the matching line plus the 5 lines after it
                print(line.rstrip('\n'))
            elif skip_next > 0:
                print(line.rstrip('\n'))
                skip_next -= 1
            else:
                ofile.write(line)
This is also robust to finding the phrase multiple times - it will just start counting lines to remove again.

You can find the occurrences, copy the list items between the occurrences to a new list and then save the new list into the file.
_newData = []
_linesToSkip = 6  # the matching line plus the five lines after it
with open('data.txt', 'r') as _file:
    data = _file.read().splitlines()
occurrences = [i for i, x in enumerate(data) if "example string" in x]
_lastOcurrence = 0
for ocurrence in occurrences:
    _newData.extend(data[_lastOcurrence : ocurrence])
    _lastOcurrence = ocurrence + _linesToSkip
_newData.extend(data[_lastOcurrence:])
# Save new data into the file
with open('data.txt', 'w') as _file:
    _file.write('\n'.join(_newData) + '\n')

There are a couple of points that you clearly misunderstand here:
.strip() removes whitespace or given characters:
>>> print(str.strip.__doc__)
S.strip([chars]) -> str
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
incrementing i doesn't actually do anything:
>>> for i, _ in enumerate('ignore me'):
...     print(i)
...     i += 10
...
0
1
2
3
4
5
6
7
8
You're assigning to the ith element of the line, which should raise an exception (that you neglected to tell us about)
>>> line = 'some text'
>>> line[i] = line.strip()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Ultimately...
You have to write to a file if you want to change its contents. Writing to a file that you're reading from is tricky business. Writing to an alternative file, or just storing the file in memory if it's small enough is a much healthier approach.
search_string = 'example'
lines = []
with open('/tmp/fnord.txt', 'r+') as f:  # `r+` so we can read *and* write to the file
    for line in f:
        line = line.strip()
        if search_string in line:
            print(line)
            for _ in range(5):
                print(next(f).strip())
        else:
            lines.append(line)
    f.seek(0)  # back to the beginning!
    f.truncate()  # goodbye, original lines
    for line in lines:
        print(line, file=f)  # python2 requires `from __future__ import print_function`
There is a fatal flaw in this approach, though: if the sought-after line has fewer than five lines after it, next(f) raises StopIteration. I'll leave that as an exercise for the reader.
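One possible answer to that exercise, sketched on a plain list of lines: itertools.islice(it, 5) pulls at most five more lines from the same iterator and simply stops at the end of the input, so a match near the end no longer raises StopIteration. The function name and its (kept, removed) return value are illustrative assumptions.

```python
from itertools import islice


def remove_match_and_following(lines, search_string, following=5):
    """Split lines into (kept, removed).

    Each line containing search_string is removed together with up to
    `following` lines after it.
    """
    kept, removed = [], []
    it = iter(lines)
    for line in it:
        if search_string in line:
            removed.append(line)
            # islice takes at most `following` more lines from the same
            # iterator and stops quietly at the end of the input.
            removed.extend(islice(it, following))
        else:
            kept.append(line)
    return kept, removed
```

With a file opened in 'r+' mode, the file object itself can be passed as lines, and the kept lines written back after f.seek(0) and f.truncate() as in the code above.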

You are opening your file in append mode ('a'), which does not let you read from it. You are also not closing the file (a bad habit). str.strip() does not delete the line; by default it removes leading and trailing whitespace. Also, the repeated statements would usually be written as a loop.
This should get you started:
locate = "example string"
n = 0
with open('example.txt', 'r+') as f:
    for i, line in enumerate(f):
        if locate in line:
            n = 6
        if n:
            print(line, end='')
            n -= 1
print("done")
Edit:
Read-modify-write solution:
locate = "example string"
filename = 'example.txt'
removelines = 5
with open(filename) as f:
    lines = f.readlines()
with open(filename, 'w') as f:
    n = 0
    for line in lines:
        if locate in line:
            n = removelines + 1
        if n:
            n -= 1
        else:
            f.write(line)

how to count empty lines in python file

I would like to print the total empty lines using python. I have been trying to print using:
f = open('file.txt','r')
for line in f:
    if (line.split()) == 0:
but I'm not able to get the proper output.
I have been trying to print it, but it prints the value as 0. I'm not sure what's wrong with the code:
print "\nblank lines are",(sum(line.isspace() for line in fname))
It prints:
blank lines are 0
There are 7 lines in the file.
There are 46 characters in the file.
There are 8 words in the file.

Since the empty string is a falsy value, you can use .strip():
for line in f:
    if not line.strip():
        ...
The above also treats lines that contain only whitespace as empty.
If you want only completely empty lines, you may want to use this instead:
if line in ['\r\n', '\n']:
    ...
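To see the difference between the two checks, the sketch below counts both kinds of line in one pass. The function name is hypothetical, and StringIO stands in for an opened file; with a real file, pass the object returned by open(). In text mode Python normally translates "\r\n" to "\n" when reading, so the "\r\n" case mainly matters for files opened with newline=''.

```python
from io import StringIO


def count_blank_lines(infile):
    """Return (whitespace_only, truly_empty) line counts for a text stream."""
    whitespace_only = truly_empty = 0
    for line in infile:
        if line.isspace():  # no visible characters at all, e.g. "   \n" or "\n"
            whitespace_only += 1
        if line in ("\n", "\r\n"):  # nothing but the line ending
            truly_empty += 1
    return whitespace_only, truly_empty


counts = count_blank_lines(StringIO("first\n\n   \nlast\n"))  # (2, 1)
```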

Please use a context manager (with statement) to open files:
with open('file.txt') as f:
    print(sum(line.isspace() for line in f))
line.isspace() returns True (== 1) if line doesn't have any non-whitespace characters, and False (== 0) otherwise. Therefore, sum(line.isspace() for line in f) returns the number of lines that are considered empty.
line.split() always returns a list. Both
if line.split() == []:
and
if not line.split():
would work.

FILE_NAME = 'file.txt'
empty_line_count = 0
with open(FILE_NAME, 'r') as fh:
    for line in fh:
        # split() breaks the line into a list of words; if the line is empty
        # (or whitespace only) it returns an empty list, and '== []' checks
        # whether that list is empty.
        if line.split() == []:
            empty_line_count += 1
print('Empty Line Count : ', empty_line_count)
