How to read text files in Python with a specified condition?

I have a text file and I want to extract the numbers of the lines that contain certain phrases (ATOMIC_POSITIONS (angstrom) and K_POINTS (automatic)).
n = -1
with open(filename) as f:
    for line in f:
        n += 1
        if line == "ATOMIC_POSITIONS (angstrom)":
            print('test1')
            start = n
        elif line == "K_POINTS (automatic)":
            print('test2')
            end = n
print(start, end)
My problem is that Python never enters the if branches (i.e. test1 and test2 are not printed).
But I am sure the file contains the phrases; this is a small part of the file:
0.000000613 0.000000613 1.022009120
ATOMIC_POSITIONS (angstrom)
C 1.696797551 1.714436737 -0.068349117

Simply put: your condition is never met. "==" checks for exact equality, and lines read from a file keep their trailing newline character (and possibly other surrounding whitespace), so the comparison fails.
When checking for a string in a line of a file I would try this:
n = -1
with open(filename) as f:
    for line in f:
        n += 1
        if "ATOMIC_POSITIONS (angstrom)" in line:
            print('test1')
            start = n
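Putting it together, here is a minimal sketch that records both line numbers with enumerate() (assuming each phrase appears only once in the file and that filename holds the path, as in the question):
start = end = None
with open(filename) as f:
    for n, line in enumerate(f):  # n is the 0-based line number
        if "ATOMIC_POSITIONS (angstrom)" in line:
            start = n
        elif "K_POINTS (automatic)" in line:
            end = n

print(start, end)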

Related


How can I read only the first symbol in each line, without reading the whole line, using Python?
For example, if I have a file like:
apple
pear
watermelon
In each iteration I must store only one (the first) letter of the line.
The result of the program should be ["a","p","w"]. I tried to use file.seek(), but how can I move it to the next line?
ti7's answer is great, but if the lines might be too long to hold in memory, you might wish to read char-by-char to avoid storing a whole line at once:
from pathlib import Path
from typing import Iterator

NEWLINE_CHAR = {'\n', '\r'}

def first_chars(file_path: Path) -> Iterator[str]:
    with open(file_path) as fh:
        new_line = True
        while c := fh.read(1):
            if c in NEWLINE_CHAR:
                new_line = True
            elif new_line:
                yield c
                new_line = False
Test:
path = Path('/some/path/a.py')
easy_first_chars = [l[0] for l in path.read_text().splitlines() if l]
smart_first_chars = list(first_chars(path))
assert smart_first_chars == easy_first_chars
file-like objects are iterable, so you can directly use them like this
collection = []
with open("input.txt") as fh:
    for line in fh:  # iterate by-lines over file-like
        try:
            collection.append(line[0])  # get the first char in the line
        except IndexError:  # line has no chars
            pass  # consider other handling

# work with collection
You may also consider enumerate() if you care about which line a particular value was on, or yielding line[0] to form a generator (which may allow a more efficient process if it can halt before reading the entire file):
def my_generator():
    with open("input.txt") as fh:
        for lineno, line in enumerate(fh, 1):  # lines are commonly 1-indexed
            try:
                yield lineno, line[0]  # first char in the line
            except IndexError:  # line has no chars
                pass  # consider other handling

for lineno, first_letter in my_generator():
    pass  # work with lineno and first_letter here and break when done
You can read one letter with file.read(1)
letters = []
with open(filepath, "r") as file:
    # Initialized to '\n' so the very first letter is stored
    previous = '\n'
    while True:
        # Read only one letter
        letter = file.read(1)
        if letter == '':
            break
        elif previous == '\n':
            # Store the letter that follows a newline '\n'
            letters.append(letter)
        previous = letter

Is there a way to read a line backwards in Python?

I'm trying to write a program that counts the number of N's at the end of a string.
I have a file containing many lines of unique sequences, and I want to measure how often a sequence ends with N and how long the series of N's are. For example, the input file will look like this:
NTGTGTAATAGATTTTACTTTTGCCTTTAAGCCCAAGGTCCTGGACTTGAAACATCCAAGGGATGGAAAATGCCGTATAACNN
NAAAGTCTACCAATTATACTTAGTGTGAAGAGGTGGGAGTTAAATATGACTTCCATTAATAGTTTCATTGTTTGGAAAACAGN
NTACGTTTAGTAGAGACAGTGTCTTGCTATGTTGCCCAGGCTGGTCTCAAACTCCTGAGCTCTAGCAAGCCTTCCACCTCNNN
NTAATCCAACTAACTAAAAATAAAAAGATTCAAATAGGTACAGAAAACAATGAAGGTGTAGAGGTGAGAAATCAACAGGANNN
Ideally, the code will read through the file, line by line and count how often a line ends with 'N'.
Then, if a line ends with N, it should read each character backwards to see how long the string of N's is. This information will be used to calculate the percentage of lines ending in N, as well as the mean, mode, median and range of N strings.
Here is what I have so far.
filename = 'N_strings_test.txt'
n_strings = 0
n_string_len = []

with open(filename, 'r') as in_f_obj:
    line_count = 0
    for line in in_f_obj:
        line_count += 1
        base_seq = line.rstrip()
        if base_seq[-1] == 'N':
            n_strings += 1
            if base_seq[-2] == 'N':
                n_string_len.append(int(2))
            else:
                n_string_len.append(int(1))

print(line_count)
print(n_strings)
print(n_string_len)
All I'm getting is an index out of range error, but I don't understand why. Also, what I have so far is limited to only 2 characters.
I want to try and write this for myself, so I don't want to import any modules.
Thanks.
You will probably get the IndexError because your file has empty lines! For an empty line, base_seq is an empty string after rstrip(), so base_seq[-1] raises IndexError.
Two sound approaches. First the generic one: iterate the line in reverse using reversed():
line = line.rstrip()
count = 0
for c in reversed(line):
    if c != 'N':
        break
    count += 1
# count will now contain the number of N characters from the end
Another, even easier approach is to rstrip() all whitespace, get the length, and then rstrip() all Ns as well; the number of trailing Ns is the difference in lengths:
without_whitespace = line.rstrip()
without_ns = without_whitespace.rstrip('N')
count = len(without_whitespace) - len(without_ns)
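Putting this approach to work on the original task, here is a minimal sketch (assuming the file is N_strings_test.txt as in the question and is non-empty) that skips blank lines and collects the lengths of the trailing runs of N:
n_string_len = []
line_count = 0

with open('N_strings_test.txt') as in_f_obj:
    for line in in_f_obj:
        line_count += 1
        base_seq = line.rstrip()
        if not base_seq:
            # skip empty lines -- these are what caused the IndexError
            continue
        count = len(base_seq) - len(base_seq.rstrip('N'))
        if count:
            n_string_len.append(count)

print(line_count)                              # total lines
print(len(n_string_len))                       # lines ending in N
print(100.0 * len(n_string_len) / line_count)  # percentage of lines ending in N
print(n_string_len)                            # lengths of the trailing runs of N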
This code:
Reads line by line.
Reverses the string and lstrip()s it. Reversing is not necessary, but it makes the logic read naturally.
Reads the last character; if it is N, increments the count.
Keeps reading that line while the run of N's continues.
n_string_count, n_string_len, line_count = 0, [], 0

with open('file.txt', 'r') as input_file:
    for line in input_file:
        line_count += 1
        line = line[::-1].lstrip()
        if line:
            if line[0] == 'N':
                n_string_count += 1
                consecutive_n = 1
                while consecutive_n < len(line) and line[consecutive_n] == 'N':
                    consecutive_n += 1
                n_string_len.append(consecutive_n)

print(line_count)
print(n_string_count)
print(n_string_len)

Python: reading integers in a file and adding them to a list

I am trying to create a function that takes an open file as an argument, reads the integers in the file which are all on their own line, then creates a list of those integers. The function should stop reading the file when there is an empty line. This is what I am stuck on.
def load_ints(file):
    lst = []
    x = 1
    while x == 1:
        for line in file:
            if len(line.strip()) != 0:
                load = line.split()
                load = [int(i) for i in load]
                lst = lst + load
            else:
                x = 2
        x = 2
    return lst
the file I am testing it with looks like this:
1
0
-12
53
1078
Should not be read by load_ints!
len(line.strip()) != 0
is not working; it currently gives me a ValueError: invalid literal for int() with base 10: 'Should'
You need to put a break after the x = 2
else:
    x = 2
    break
Otherwise, the for loop will keep iterating over the file. It has read the blank line, executed the else condition, then carried on processing lines. So it tries to process the 'Should...' line, and fails because 'Should...' is not an integer.
Also, I don't see why you have the while statement. The for loop should be enough to iterate over the file and process each line, and the break I've suggested will exit the loop when you hit the blank line.
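For reference, a minimal sketch of the function with the break added and the while loop removed, as described above:
def load_ints(file):
    lst = []
    for line in file:
        if len(line.strip()) != 0:
            lst += [int(i) for i in line.split()]
        else:
            break  # stop at the first empty line
    return lst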
Other answers already point out the issue: you have to stop parsing the integers when encountering the blank line.
Here's a one-liner using itertools.takewhile: it stops as soon as stripping a line yields an empty string, and each kept line is converted to an integer:
import itertools

def load_ints(file):
    return [int(x) for x in itertools.takewhile(str.strip, file)]
result:
[1, 0, -12, 53, 1078]
So itertools.takewhile iterates on the file lines, and applies strip on each line. If the result is an empty string, it stops the iteration. Otherwise it continues so the line is converted to integer and added to the list comprehension.
The fewer lines you write in cases like these, the fewer bugs you'll create with auxiliary variables and state.
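A quick usage example (assuming the numbers are in a file called ints.txt laid out as in the question, with a blank line before the trailing sentence):
with open("ints.txt") as f:
    print(load_ints(f))  # [1, 0, -12, 53, 1078]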
I think the while is not necessary.
def load_ints(file):
    lst = []
    for line in file:
        if len(line.strip()) != 0:
            load = line.split()
            load = [int(i) for i in load]
            lst.extend(load)  # extend, not append, keeps lst a flat list of ints
        else:
            break
    return lst
When you open a file you get an iterator over its lines. Instead of reading it all into memory, we can use a while loop with next() to fetch one row at a time and break when the condition is met (the row is blank or otherwise not an integer). This is also memory-efficient.
data = """\
1
2
-10
1241
Empty line above"""
with open("test.txt","w") as f:
f.write(data)
with open("test.txt") as f:
data = []
while True:
row = next(f).strip()
try:
data.append(int(row))
# Break if ValueError is raised (for instance blank line or string)
except ValueError:
break
data
Returns:
[1, 2, -10, 1241]
If you want a compact solution, we could use takewhile from itertools, but this won't handle any errors.
from itertools import takewhile

with open("test.txt") as f:
    data = list(map(int, takewhile(lambda x: x.strip(), f)))
If you want to stop reading the file when the line is empty, you have to break the for loop:
def load_ints(file):
    lst = []
    for line in file:
        if len(line.strip()) != 0:
            load = line.split()
            load = [int(i) for i in load]
            lst = lst + load
        else:
            break
    return lst
You can also use the re module. Note that this reads the whole file and extracts every number in it, so it does not stop at the blank line:
import re

def load_ints(my_file):
    return list(map(int, re.findall(r'-?\d+', my_file.read())))

Compare 2 files in Python

I am trying to compare two files, A and C, in Python and for some reason the double for loop doesn't seem to work properly:
with open(locationA + filenameC, 'r') as fileC, open(locationA + filenameA, 'r') as fileA:
    for lineC in fileC:
        fieldC = lineC.split('#')
        for lineA in fileA:
            fieldA = lineA.split('#')
            print 'UserID Clicks' + fieldC[0]
            print 'UserID Activities' + fieldA[0]
            if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
                print 'OK'
Here, only the first line of C seems to be compared; for the other lines, the "A loop" seems to be ignored.
Can anyone help me with this?
Your problem is that once you have iterated over fileA, you need to move the file pointer back to the beginning (e.g. with fileA.seek(0)) before you can iterate over it again.
So what you might do is create two lists from both files and iterate over them as many times as you want. For example:
fileC_list = fileC.readlines()
fileA_list = fileA.readlines()

for lineC in fileC_list:
    # do something with lineC
    for lineA in fileA_list:
        pass  # do something with lineA
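Alternatively, a minimal sketch of the pointer-reset idea mentioned above, rewinding fileA with seek(0) before every inner pass (file names and the '#' split are taken from the question):
with open(locationA + filenameC, 'r') as fileC, open(locationA + filenameA, 'r') as fileA:
    for lineC in fileC:
        fieldC = lineC.split('#')
        fileA.seek(0)  # rewind fileA so the inner loop runs again for this lineC
        for lineA in fileA:
            fieldA = lineA.split('#')
            if fieldC[0] == fieldA[0] and fieldC[2] == fieldA[2]:
                print('OK')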
The problem with nested loops (from the point of view of your current problem) is precisely that the inner loop runs to completion for each iteration of the outer loop. So instead, set lineA by calling for the next item from the fileA iterator explicitly:
with open(locationA + filenameC, 'r') as fileC, open(locationA + filenameA, 'r') as fileA:
    for lineC in fileC:
        fieldC = lineC.split('#')
        lineA = next(fileA)
        fieldA = lineA.split('#')
        print 'UserID Clicks' + fieldC[0]
        print 'UserID Activities' + fieldA[0]
        if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
            print 'OK'
This logic will ignore any extra lines from fileA once fileC is exhausted, and if fileC contains more lines than fileA, next(fileA) will raise StopIteration unless you add special checks.
A different approach might use itertools.izip() to collect lines from each file in pairs:
import itertools

with open(locationA + filenameC, 'r') as fileC, open(locationA + filenameA, 'r') as fileA:
    for lineC, lineA in itertools.izip(fileC, fileA):
        fieldC = lineC.split('#')
        fieldA = lineA.split('#')
        print 'UserID Clicks' + fieldC[0]
        print 'UserID Activities' + fieldA[0]
        if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
            print 'OK'
I can't think of any specific reason to use one instead of the other, but if the files are of any size at all, resist the temptation to use the builtin zip() function instead of itertools.izip() - the former returns a list, so memory usage depends on the file sizes, whereas the latter is a generator and creates values as they are required. (In Python 3, itertools.izip() is gone and the builtin zip() is already lazy.)
You are comparing all lines from fileA to each line from fileC. That means that for each line of fileC you would read the entire fileA and (provided you moved the pointer back to the beginning of fileA) read it again and again.
It is easier to read them both at the same time, while they both have lines:
if the lines are the same, do something, then read from both files;
if they are different, read from the file with the smaller line (line A < line C: read from file A only; line C < line A: read from file C only);
and make two last loops for the remaining lines (two loops, one for each file, as you do not know which one ran out of lines first). A sketch of this merge-style approach is shown below.
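A minimal sketch of that merge-style comparison, assuming both files are already sorted on the first '#'-separated field (the field layout is taken from the question; the prints are placeholders):
def compare_sorted(path_c, path_a):
    with open(path_c) as fileC, open(path_a) as fileA:
        lineC = fileC.readline()
        lineA = fileA.readline()
        # walk both files while each still has lines
        while lineC and lineA:
            keyC = lineC.split('#')[0]
            keyA = lineA.split('#')[0]
            if keyC == keyA:
                print('match: ' + keyC)   # do something with the matching pair
                lineC = fileC.readline()
                lineA = fileA.readline()
            elif keyA < keyC:
                lineA = fileA.readline()   # advance the file with the smaller key
            else:
                lineC = fileC.readline()
        # two last loops for whatever is left; only one of them will actually run
        while lineC:
            print('only in C: ' + lineC.split('#')[0])
            lineC = fileC.readline()
        while lineA:
            print('only in A: ' + lineA.split('#')[0])
            lineA = fileA.readline()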
I know this is an old thread, but it comes up on Google when someone is looking for a solution to compare 2 text files in Python.
This code worked for me.
You can update the code to use "with open" instead and fine-tune it as you like, but it does the job.
# Ask the user to enter the names of the files to compare
fname1 = input("Enter the first filename (text1.txt): ")
fname2 = input("Enter the second filename (text1.txt): ")

# Open the files for reading in text mode (default mode)
f1 = open(fname1)
f2 = open(fname2)

# Print confirmation
print("-----------------------------------")
print("Comparing files ", " > " + fname1, " < " + fname2, sep='\n')
print("-----------------------------------")

# Read the first line from each file
f1_line = f1.readline()
f2_line = f2.readline()

# Initialize counter for line number
line_no = 1

# Loop while either file1 or file2 has not reached EOF
while f1_line != '' or f2_line != '':

    # Strip the trailing whitespace
    f1_line = f1_line.rstrip()
    f2_line = f2_line.rstrip()

    # Compare the lines from both files
    if f1_line != f2_line:

        # If a line does not exist in file2 then mark the output with a + sign
        if f2_line == '' and f1_line != '':
            print(">+", "Line-%d" % line_no, f1_line)
        # otherwise output the line from file1 and mark it with a > sign
        elif f1_line != '':
            print(">", "Line-%d" % line_no, f1_line)

        # If a line does not exist in file1 then mark the output with a + sign
        if f1_line == '' and f2_line != '':
            print("<+", "Line-%d" % line_no, f2_line)
        # otherwise output the line from file2 and mark it with a < sign
        elif f2_line != '':
            print("<", "Line-%d" % line_no, f2_line)

        # Print a blank line
        print()

    # Read the next line from each file
    f1_line = f1.readline()
    f2_line = f2.readline()
    # Increment the line counter
    line_no += 1

# Close the files
f1.close()
f2.close()

skipping a line while reading a file with a for loop

I am trying to figure out a way to skip the next two lines in a file if a condition in the first line is true. Any ideas on a good way to do this? Here's what I have so far...
def main():
    file = open(r'C:\Users\test\Desktop\test2.txt', 'r+')
    ctr = 1
    for current_line in file:
        assert ctr < 3
        if current_line[0:6] == str("001IU"):
            pass
        else:
            if ctr == 1 and current_line[9:11] == str("00"):
                # do something...
                ctr += 1
            elif ctr == 1 and current_line[9:11] != str("00"):
                pass  # I want it to skip the next two lines in the loop
            elif ctr == 2:
                # do something...
                ctr = 1
            else:
                raise ValueError
In Python 2.6 or above, use
next(file)
next(file)
to skip two items of the iterator file, i.e. the next two lines.
Before Python 2.6, the equivalent is
file.next()
file.next()
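For example, a minimal sketch of how this could fit into the loop from the question (the slice condition is kept from the question; the processing is a placeholder):
with open(r'C:\Users\test\Desktop\test2.txt', 'r+') as file:
    for current_line in file:
        if current_line[9:11] != "00":
            # consume the next two lines so the for loop never sees them
            next(file, None)
            next(file, None)
            continue
        # process current_line here...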
