Compare 2 files in Python - python

I am trying to compare two files, A and C, in Python and for some reason the double for loop doesn't seem to work properly:
with open(locationA + filenameC,'r') as fileC, open(locationA + filenameA,'r') as fileA:
    for lineC in fileC:
        fieldC = lineC.split('#')
        for lineA in fileA:
            fieldA = lineA.split('#')
            print 'UserID Clicks' + fieldC[0]
            print 'UserID Activities' + fieldA[0]
            if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
                print 'OK'
Here, only the first line of C seems to be compared; for the other lines, the "A loop" seems to be ignored.
Can anyone help me with this?

Your problem is that once you have iterated over fileA it is exhausted; to read it again you need to move the file pointer back to the beginning of the file.
Alternatively, you can read both files into lists and iterate over them as many times as you want. For example:
fileC_list = fileC.readlines()
fileA_list = fileA.readlines()
for lineC in fileC_list:
    # do something
    for lineA in fileA_list:
        # do something
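If the files are too large to keep in memory as lists, the other option mentioned here, moving the file pointer back to the start, might look like the sketch below. It is written for Python 3; matching_pairs is a hypothetical helper name, and io.StringIO objects with made-up data stand in for the real files.

```python
import io

def matching_pairs(file_c, file_a, sep='#'):
    """Return (lineC, lineA) pairs whose first and third fields match."""
    matches = []
    for line_c in file_c:
        field_c = line_c.rstrip('\n').split(sep)
        file_a.seek(0)  # rewind so the inner loop starts at the first line again
        for line_a in file_a:
            field_a = line_a.rstrip('\n').split(sep)
            if field_c[0] == field_a[0] and field_c[2] == field_a[2]:
                matches.append((line_c.rstrip('\n'), line_a.rstrip('\n')))
    return matches

# In-memory stand-ins for the two files (made-up sample data).
file_c = io.StringIO("u1#x#10\nu2#y#20\n")
file_a = io.StringIO("u2#z#20\nu1#w#10\n")
print(matching_pairs(file_c, file_a))
```

This still rereads file A once per line of file C, so it saves memory rather than time.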

The problem with nested loops (from the point of view of your current problem) is precisely that the inner loop runs to completion for each iteration of the outer loop. So instead, set lineA by calling for the next item from the fileA iterator explicitly:
with open(locationA + filenameC,'r') as fileC, open(locationA + filenameA,'r') as fileA:
    for lineC in fileC:
        fieldC = lineC.split('#')
        lineA = next(fileA)
        fieldA = lineA.split('#')
        print 'UserID Clicks' + fieldC[0]
        print 'UserID Activities' + fieldA[0]
        if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
            print 'OK'
This logic ignores any extra lines in fileA once fileC is exhausted, and if fileC contains more lines than fileA, next(fileA) raises StopIteration unless you handle that case explicitly.
A different approach might use itertools.izip() to collect lines from each file in pairs:
import itertools
with open(locationA + filenameC,'r') as fileC, open(locationA + filenameA,'r') as fileA:
    for lineC, lineA in itertools.izip(fileC, fileA):
        fieldC = lineC.split('#')
        fieldA = lineA.split('#')
        print 'UserID Clicks' + fieldC[0]
        print 'UserID Activities' + fieldA[0]
        if (fieldC[0] == fieldA[0]) and (fieldC[2] == fieldA[2]):
            print 'OK'
I can't think of any specific reason to use one instead of the other, but if the files are of any size at all, resist the temptation to use the builtin zip() function instead of itertools.izip() in Python 2: the former returns a list, so memory usage depends on the file sizes, whereas the latter returns an iterator that produces pairs as they are required. (In Python 3 the builtin zip() is itself lazy and itertools.izip() no longer exists.)
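Pairing lines by position silently drops the tail of the longer file. In Python 3, one way to notice that is itertools.zip_longest, which pads the shorter input with None. The sketch below is only an illustration: compare_pairwise is a hypothetical name, and the sample data is made up.

```python
import io
from itertools import zip_longest

def compare_pairwise(file_c, file_a, sep='#'):
    """Yield (line_number, status) for lines paired by position."""
    for number, (line_c, line_a) in enumerate(zip_longest(file_c, file_a), 1):
        if line_c is None or line_a is None:
            # One file ran out of lines before the other.
            yield (number, 'missing')
            continue
        field_c = line_c.rstrip('\n').split(sep)
        field_a = line_a.rstrip('\n').split(sep)
        same = field_c[0] == field_a[0] and field_c[2] == field_a[2]
        yield (number, 'OK' if same else 'different')

# In-memory stand-ins for the two files (made-up sample data).
file_c = io.StringIO("u1#x#10\nu2#y#20\nu3#z#30\n")
file_a = io.StringIO("u1#w#10\nu2#y#99\n")
print(list(compare_pairwise(file_c, file_a)))
```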

You are comparing every line of file A to each line of file C. That means that for each line of file C you would read the whole of file A (provided you move the pointer back to the beginning of file A each time), again and again.
If both files are sorted, it is easier to read them both at the same time, while they both have lines:
if the current lines are the same, do something, then read from both
if they are different, read from the file with the smaller line (line A < line C: read from file A only; line C < line A: read from file C only)
and finish with two last loops for the remaining lines (two loops, one for each file, as you do not know which one ran out of lines first)
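The steps above can be sketched roughly as follows. This assumes both inputs are already sorted and that whole lines are compared; merge_compare is a hypothetical name, and plain lists stand in for the open files.

```python
def merge_compare(lines_a, lines_c):
    """Classify lines of two sorted iterables as common, only in A, or only in C."""
    iter_a, iter_c = iter(lines_a), iter(lines_c)
    line_a, line_c = next(iter_a, None), next(iter_c, None)
    common, only_a, only_c = [], [], []
    while line_a is not None and line_c is not None:
        if line_a == line_c:
            common.append(line_a)
            line_a, line_c = next(iter_a, None), next(iter_c, None)
        elif line_a < line_c:
            only_a.append(line_a)
            line_a = next(iter_a, None)
        else:
            only_c.append(line_c)
            line_c = next(iter_c, None)
    # At most one of the two loops below does any work.
    while line_a is not None:
        only_a.append(line_a)
        line_a = next(iter_a, None)
    while line_c is not None:
        only_c.append(line_c)
        line_c = next(iter_c, None)
    return common, only_a, only_c

print(merge_compare(["a", "b", "d"], ["b", "c", "d", "e"]))
```

Each file is read only once, so the cost grows with the sum of the two file lengths rather than their product.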

I know this is an old thread, but it comes up on Google when someone is looking for a way to compare 2 text files in Python.
This code worked for me.
You can update the code to use "with open" instead and fine-tune it as you like, but it does the job.
# Ask the user to enter the names of files to compare
fname1 = input("Enter the first filename (text1.txt): ")
fname2 = input("Enter the second filename (text2.txt): ")
# Open file for reading in text mode (default mode)
f1 = open(fname1)
f2 = open(fname2)
# Print confirmation
print("-----------------------------------")
print("Comparing files ", " > " + fname1, " < " +fname2, sep='\n')
print("-----------------------------------")
# Read the first line from the files
f1_line = f1.readline()
f2_line = f2.readline()
# Initialize counter for line number
line_no = 1
# Loop if either file1 or file2 has not reached EOF
while f1_line != '' or f2_line != '':
    # Strip the trailing whitespace (including the newline)
    f1_line = f1_line.rstrip()
    f2_line = f2_line.rstrip()
    # Compare the lines from both files
    if f1_line != f2_line:
        # If a line does not exist in file2 then mark the output with + sign
        if f2_line == '' and f1_line != '':
            print(">+", "Line-%d" % line_no, f1_line)
        # otherwise output the line from file1 and mark it with > sign
        elif f1_line != '':
            print(">", "Line-%d" % line_no, f1_line)
        # If a line does not exist in file1 then mark the output with + sign
        if f1_line == '' and f2_line != '':
            print("<+", "Line-%d" % line_no, f2_line)
        # otherwise output the line from file2 and mark it with < sign
        elif f2_line != '':
            print("<", "Line-%d" % line_no, f2_line)
        # Print a blank line
        print()
    # Read the next line from each file
    f1_line = f1.readline()
    f2_line = f2.readline()
    # Increment line counter
    line_no += 1
# Close the files
f1.close()
f2.close()
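For comparison, the standard library's difflib module can produce a similar report without a hand-written loop. The sketch below is a swapped-in alternative, not the code above rewritten; diff_lines is a hypothetical wrapper, and the two lists stand in for the result of calling readlines() on each file.

```python
import difflib

def diff_lines(lines1, lines2, name1='file1', name2='file2'):
    """Return unified-diff output for two lists of lines (each ending in a newline)."""
    return list(difflib.unified_diff(lines1, lines2, fromfile=name1, tofile=name2))

# Made-up sample content standing in for f1.readlines() and f2.readlines().
old = ["apple\n", "pear\n", "plum\n"]
new = ["apple\n", "peach\n", "plum\n"]
print(''.join(diff_lines(old, new)))
```

Unlike the line-by-line loop, difflib realigns the files after an inserted or deleted line, so one extra line does not mark everything after it as different.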

Related

Can I find a line in a text file, if I know its number in python?

word = "some string"
file1 = open("songs.txt", "r")
flag = 0
index = 0
for line in file1:
    index += 1
    if word in line:
        flag = 1
        break
if flag == 0:
    print(word + " not found")
else:
    #I would like to print not only the line that has the string, but also the previous and next lines
    print(?)
    print(line)
    print(?)
file1.close()
Use contents = file1.readlines(), which reads the file into a list of lines.
Then loop through contents with an index i and, if word is found, print contents[i-1], contents[i] and contents[i+1]. Make sure to handle the edges: if word is on the last line, contents[i+1] raises an IndexError, and if it is on the first line, contents[i-1] silently wraps around to the last line.
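A minimal sketch of that idea with both edge cases guarded. find_with_context is a hypothetical name, and the list stands in for the result of file1.readlines().

```python
def find_with_context(lines, word):
    """Return (previous, match, following) for the first line containing word.

    previous or following is None when the match is the first or last line.
    Returns None when no line contains word.
    """
    for i, line in enumerate(lines):
        if word in line:
            previous = lines[i - 1] if i > 0 else None
            following = lines[i + 1] if i + 1 < len(lines) else None
            return previous, line, following
    return None

# Made-up sample standing in for file1.readlines().
contents = ["first song\n", "some string here\n", "last song\n"]
print(find_with_context(contents, "some string"))
```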
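word = "some string"
flag = 0
index = 0
previousline = ''
finalline = ''
nextline = ''
with open("songs.txt", "r") as file1:
    for line in file1:
        index += 1
        if flag == 0 and word in line:
            # First match: remember where it is and flag it
            finalindex = index
            finalline = line
            flag = 1
        elif flag == 1:
            # The match was flagged on the previous iteration, so this is the next line
            nextline = line
            break
        else:
            previousline = line
if flag == 1:
    print(previousline + finalline + nextline)
    print(finalindex - 1, finalindex, finalindex + 1)
else:
    print(word + " not found")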
You basically already had the main ingredients:
you have line (the line you are currently evaluating)
you have the index (index)
What remains is to store the previous and next lines in variables and then print the results.
I have not tested it, but the code should be something like the above.
It splits into three cases: the word is found on the current line, the word was found and flagged on the previous iteration, or nothing has been flagged yet.
I believe the else-if branch shouldn't fire unless flag == 1.

How can I go to the next line in .txt file?

How can I read only the first character of each line, without reading the whole line, using Python?
For example, if I have file like:
apple
pear
watermelon
In each iteration I must store only one (the first) letter of the line.
The result of the program should be ["a","p","w"]. I tried to use file.seek(), but how can I move it to the next line?
ti7's answer is great, but if the lines might be too long to hold in memory, you might wish to read character by character so that a whole line is never stored:
from pathlib import Path
from typing import Iterator
NEWLINE_CHAR = {'\n', '\r'}
def first_chars(file_path: Path) -> Iterator[str]:
    with open(file_path) as fh:
        new_line = True
        while c := fh.read(1):
            if c in NEWLINE_CHAR:
                new_line = True
            elif new_line:
                yield c
                new_line = False
Test:
path = Path('/some/path/a.py')
easy_first_chars = [l[0] for l in path.read_text().splitlines() if l]
smart_first_chars = list(first_chars(path))
assert smart_first_chars == easy_first_chars
File-like objects are iterable, so you can use them directly like this:
collection = []
with open("input.txt") as fh:
    for line in fh:  # iterate by-lines over file-like
        try:
            # drop the newline first, so a blank line really has no chars
            collection.append(line.rstrip('\n')[0])  # get the first char in the line
        except IndexError:  # line has no chars
            pass  # consider other handling
# work with collection
You may also consider enumerate() if you care about which line a particular value was on, or yielding the first character to form a generator (which may allow a more efficient process if it can halt before reading the entire file):
def my_generator():
    with open("input.txt") as fh:
        for lineno, line in enumerate(fh, 1):  # lines are commonly 1-indexed
            try:
                yield lineno, line.rstrip('\n')[0]  # first char in the line
            except IndexError:  # line has no chars
                pass  # consider other handling
for lineno, first_letter in my_generator():
    # work with lineno and first_letter here and break when done
    print(lineno, first_letter)
You can read one character at a time with file.read(1):
file = open(filepath, "r")
letters = []
# Initialized to '\n' so that the first letter of the file is stored
previous = '\n'
while True:
    # Read only one letter
    letter = file.read(1)
    if letter == '':
        break
    elif previous == '\n' and letter != '\n':
        # Store the letter that follows a newline '\n' (blank lines are skipped)
        letters.append(letter)
    previous = letter
file.close()

Read next lines of a file to determine whether or not to print current line

I'm trying to write a program slicer in Python, but this little bit is taking me forever to figure out.
I want to read a certain file line by line and only print the lines that affect a certain variable, in this case num.
This is the text file I'm reading from:
a = 3
b = 4
num = 0
while a > b:
if a > b:
a = a +2
b = b + 1
if a > 2:
num-=1
My Python script has to read this file and isolate the variable num by printing the lines that contain num and the loop statements that num appears inside.
This is the output I expect:
num = 0
while a > b:
if a > b:
if a > 2:
num-=1
This is my code so far; the part that is in charge of reading the lines inside the loops is not working:
# Open file
file = open('text.py', 'r')
Lines = file.readlines()
count = 0
word = "num"
# Store all the loop statements that can appear in the file
keyword = ["while", "for", "if", "else", "elif", "def"]
for i in range(len(Lines)):
    count += 1
    # if line contains keyword
    if any(word in Lines[i] for word in keyword):
        # read lines inside the loop
        # count_indent is a function that counts the indent of the line to compare it and see if the line is inside the loop or not. I left it out to keep the question shorter
        line_indent = count_indent(Lines[i])+4
        while i <= len(Lines) and count_indent(Lines[i + 1]) == line_indent:
            # if word appears in the next lines that are inside the loop then we print the loop statement
            if word in Lines[i + 1]:
                print(str(count) + " : " + Lines[i])
            else:
                break
            # move to next line inside the loop
            i = i + 1
    # If line is not empty and contains word
    if Lines[i] != '\n' and word in Lines[i]:
        print(str(count) + " : " + Lines[i])
This code's output is similar to what you expect.
word = "num"
keywords = ["while", "for", "if", "else", "elif", "def"]
with open('text.py', 'r') as fin, open('out_text.py', 'wt') as fout:
    for line in fin:
        # Keep every block statement and every line that mentions the variable
        if any(kw in line for kw in keywords) or word in line:
            fout.write(line)  # line already ends with '\n'
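Keeping every keyword line also keeps blocks that never touch num. If only the enclosing blocks are wanted, one possible approach is to track the open block headers by indentation. The sketch below is untested against the asker's real file and makes several assumptions: indentation is consistent, a block header is any line ending in a colon, and the variable is matched by plain substring (so "number" would also match "num"). slice_for is a hypothetical name and the sample input is made up.

```python
def slice_for(lines, word):
    """Return the lines mentioning word plus the block headers that enclose them."""
    kept = []
    open_headers = []  # stack of [indent, line, already_kept] for enclosing blocks
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        # Leave every block whose header is indented at least as much as this line.
        while open_headers and open_headers[-1][0] >= indent:
            open_headers.pop()
        if word in stripped:
            # Keep the enclosing headers not kept yet, outermost first.
            for header in open_headers:
                if not header[2]:
                    kept.append(header[1])
                    header[2] = True
            kept.append(line)
        if stripped.endswith(':'):
            open_headers.append([indent, line, word in stripped])
    return kept

# Made-up sample input, not the file from the question.
source = [
    "a = 3",
    "num = 0",
    "while a > b:",
    "    a = a + 2",
    "    if a > 2:",
    "        num -= 1",
    "    if b > 9:",
    "        b = 0",
]
print('\n'.join(slice_for(source, "num")))
```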

Writing to two specific positions in each line of a text file (or rather creating a new file with the information added)

Aspiring Python newb (2 months) here. I am trying to create a program that inserts information into two specific places in each line of a .txt file, actually creating a new file in the process.
The information in the source file is something like this:
1,340.959,859.210,0.0010,VV53
18abc,34099.9590,85989.2100,0.0010,VV53
00y46646464,34.10,859487.2970,11.4210,RP27
Output would be:
1,7340.959,65859.210,0.0010,VV53
18abc,734099.9590,6585989.2100,0.0010,VV53
00y46646464,734.10,65859487.2970,11.4210,RP27
Each line is different, and there are hundreds of lines. The specific markers I'm looking for are the first and second occurrence of a comma (,). The new text needs to be added after the first and second comma. You'll know what I mean when you see the code.
I have gotten as far as this: the program finds the correct places and inserts what I need, but doesn't write more than 1 line to the new file. I tried debugging and seeing what's going on 'under the hood', all seemed good there.
Lots of scrapping code and chin-holding later I'm still stuck where I was a week ago.
tl;dr Code only outputs 1 line to new file, need hundreds.
f = open('test.txt', 'r')
new = open('new.txt', 'w')
first = ['7']
second = ['65']
line = f.readline()
templist = list(line)
counter = 0
while line != '':
    for i, j in enumerate(templist):
        if j == ',':
            place = i + 1
            templist1 = templist[:place]
            templist2 = templist[place:]
            counter += 1
            if counter == 1:
                for i, j in enumerate(templist2):
                    if j == ',':
                        place = i + 1
                        templist3 = templist2[:place]
                        templist4 = templist2[place:]
                        templist5 = templist1 + first + templist3 + second + templist4
                        templist6 = ''.join(templist5)
                        new.write(templist6)
                        counter += 1
                        break
            if counter == 2:
                break
            break
    line = f.readline()
    templist = list(line)
f.close()
new.close()
If I'm understanding your samples and code correctly, this might be a valid approach:
with open('test.txt', 'r') as infd, open('new.txt', 'w') as outfd:
    for line in infd:
        # Drop the trailing newline so it is not written twice
        fields = line.rstrip('\n').split(',')
        fields[1] = '7' + fields[1]
        fields[2] = '65' + fields[2]
        outfd.write('{}\n'.format(','.join(fields)))
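Since the question is phrased as "after the first and second comma", another way to sketch it is str.split with a maxsplit of 2, which cuts at those two commas only and leaves the rest of the line (including its newline) untouched. insert_after_commas is a hypothetical name; a line with fewer than two commas would raise a ValueError here.

```python
def insert_after_commas(line, first='7', second='65'):
    """Insert `first` after the first comma and `second` after the second one."""
    head, middle, rest = line.split(',', 2)  # split at the first two commas only
    return f"{head},{first}{middle},{second}{rest}"

# The sample rows from the question.
samples = [
    "1,340.959,859.210,0.0010,VV53\n",
    "18abc,34099.9590,85989.2100,0.0010,VV53\n",
    "00y46646464,34.10,859487.2970,11.4210,RP27\n",
]
for sample in samples:
    print(insert_after_commas(sample), end='')
```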

How to write every other line in a text file?

inputFile = open('original_text.txt','r')
outputFile = open('half_text.txt','w')
line = inputFile.readline()
count = 0
for line in inputFile:
    outputFile.write(line)
    count += 1
    if count % 2 == 0:
        print(line)
inputFile.close()
outputFile.close()
It keeps skipping the 1st line. For instance, the text file right now has 10 lines, and it prints the 3rd, 5th, 7th and 9th. So I'm just missing the first.
This skips the first line because you read it and throw it away before the loop. Delete this line:
line = inputFile.readline()
and change the count parity to odd with
if count % 2 == 1:
For a slightly better design, use a boolean that toggles:
count = False
for line in inputFile:
    outputFile.write(line)
    count = not count
    if count:
        print(line)
inputFile.close()
outputFile.close()
I tried running the program on itself:
inputFile = open('this_file.py', 'r')
count = False
outputFile.write(line)
if count:
outputFile.close()
Use next() to skip the following line. Watch for a StopIteration error on the call to next(fh) if the file has an odd number of lines.
outputFile = open('half_text.txt','w')
with open('original_text.txt') as fh:
    for line1 in fh:
        outputFile.write(line1)
        try:
            next(fh)
        except StopIteration:
            pass
outputFile.close()
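The for loop goes over the file line by line, and calling readline() inside the loop advances the file pointer by one more line. Therefore odd takes the odd-numbered lines and even takes the even-numbered lines. If the file has an odd number of lines, the last readline() returns an empty string. (This works in Python 3; Python 2 raises a ValueError when iteration and readline() are mixed on the same file.)
with open(path, 'r') as fi:
    for odd in fi:
        even = fi.readline()
        print('this line is odd! :' + odd)
        print('this line is even! :' + even)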
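Another option, sketched here as an assumption rather than taken from any of the answers, is itertools.islice with a step of 2, which needs neither a counter nor a manual call to next(). every_other_line is a hypothetical name and io.StringIO stands in for the open input file.

```python
import io
from itertools import islice

def every_other_line(lines, start=0):
    """Return every second line, beginning with the line at index `start`."""
    return list(islice(lines, start, None, 2))

# In-memory stand-in for the input file (made-up sample data).
text = io.StringIO("one\ntwo\nthree\nfour\nfive\n")
print(every_other_line(text))
```

Passing start=1 selects the 2nd, 4th, 6th lines instead, and an odd number of lines needs no special handling.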
