Replacing text in one file with text from another file - python

The f1.write(line2) call works, but it does not replace the text in the file; it just adds it to the file. I want file1 to be identical to file2: if they differ, the text in file1 should be overwritten with the text from file2.
Here is my code:
with open("file1.txt", "r+") as f1, open("file2.txt", "r") as f2:
    for line1 in f1:
        for line2 in f2:
            if line1 == line2:
                print("same")
            else:
                print("different")
                f1.write(line2)
            break
f1.close()
f2.close()

I would read both files, create a new list with the differing elements replaced, and then write the entire list back to the file:
with open('file2.txt', 'r') as f:
    content = [line.strip() for line in f]
with open('file1.txt', 'r') as j:
    content_a = [line.strip() for line in j]
# Compare position by position (assumes both files have the same number of lines)
for idx, item in enumerate(content_a):
    if content_a[idx] == content[idx]:
        print('same')
    else:
        print('different')
        content_a[idx] = content[idx]
with open('file1.txt', 'w') as k:
    k.write('\n'.join(content_a))
file1.txt before:
chrx#chrx:~/python/stackoverflow/9.28$ cat file1.txt
this
that
this
that
who #replacing
that
what
blah
code output:
same
same
same
same
different
same
same
same
file1.txt after:
chrx#chrx:~/python/stackoverflow/9.28$ cat file1.txt
this
that
this
that
vash #replaced who
that
what
blah

I want the file1 to be identical to file2
import shutil
with open('file2', 'rb') as f2, open('file1', 'wb') as f1:
    shutil.copyfileobj(f2, f1)
This will be faster as you don't have to read file1.
Your code is not working because you'd have to position the file's current pointer (with f1.seek()) at the correct position before writing the line.
In your code, you're reading a line first, and that positions the pointer after the line just read. When writing, the line data will be written in the file in that point, thus duplicating the line.
Since lines can have different sizes, making this work won't be easy, because even if you position the pointer correctly, if some line is modified to get bigger it would overwrite part of the next line inside the file when you write it. You would end up having to cache at least part of the file contents in memory anyway.
Better truncate the file (erase its contents) and write the other file data directly - then they will be identical. That's what the code in the answer does.
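If the copy should only happen when the files actually differ, the comparison can be delegated to the standard library as well. The following is a minimal sketch, not part of the original answer: the helper name sync_file is hypothetical, and it assumes both files already exist.

```python
import filecmp
import shutil


def sync_file(source_path, target_path):
    """Overwrite target_path with source_path's contents if they differ.

    Returns True when a copy was made, False when the files already matched.
    Hypothetical helper; assumes both paths point to existing files.
    """
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    if filecmp.cmp(source_path, target_path, shallow=False):
        return False
    shutil.copyfile(source_path, target_path)
    return True
```

Calling sync_file('file2.txt', 'file1.txt') would then leave file1.txt untouched when the two files are already identical.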

Related

Comparing contents of two txt.files for deleted lines or changes in python

I'm trying to compare two .txt files for changed or deleted lines. If a line was deleted, I want to output what the deleted line was, and if it was changed, I want to output the new line. I originally tried comparing line to line, but when something was deleted it wouldn't work for my purpose:
for line1 in f1:
    for line1 in f2:
        if line1==line1:
            print("SAME",file=x)
        else:
            print(f"Original:{line1} / New:{line1}", file=x)
Then I tried not comparing line to line so I could figure out if something was deleted but I'm not getting any output:
def check_diff(f1,f2):
    check = {}
    for file in [f1,f2]:
        with open(file,'r') as f:
            check[file] = []
            for line in f:
                check[file].append(line)
    diff = set(check[f1]) - set(check[f2])
    for line in diff:
        print(line.rstrip(),file=x)
I tried combining a lot of other questions previously asked similar to my problem to get this far, but I'm new to python so I need a little extra help. Thanks! Please let me know if I need to add any additional information.
The concept is simple. Let's say file1.txt is the original file, and file2.txt is the one we want to check for changes:
with open('file1.txt', 'r') as f:
    f1 = f.readlines()
with open('file2.txt', 'r') as f:
    f2 = f.readlines()
deleted = []
added = []
for l in f1:
    if l not in f2:
        deleted.append(l)
for l in f2:
    if l not in f1:
        added.append(l)
print('Deleted lines:')
print(''.join(deleted))
print('Added lines:')
print(''.join(added))
For every line in the original file, if that line isn't in the other file, then the line has been deleted. If it's the other way around, the line has been added.
I am not sure how you would quantify a changed line (since you could count it as one deleted plus one added line), but perhaps something like the below would be of some aid. Note that if your files are large, it might be faster to store the data in a set instead of a list, since the former has typically a search time complexity of O(1), while the latter has O(n):
with open('file1.txt', 'r') as f1, open('file2.txt', 'r') as f2:
    file1 = set(f1.read().splitlines())
    file2 = set(f2.read().splitlines())
changed_lines = [line for line in file1 if line not in file2]
deleted_lines = [line for line in file2 if line not in file1]
print('Changed lines:\n' + '\n'.join(changed_lines))
print('Deleted lines:\n' + '\n'.join(deleted_lines))
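One possible way to separate "changed" from "deleted" lines is to let difflib align the two files and report how each region differs. This is only a sketch under assumptions: the function name classify_changes is hypothetical, and treating a difflib "replace" region as a changed line is one reasonable convention rather than the only one.

```python
import difflib


def classify_changes(old_lines, new_lines):
    """Split the differences between two lists of lines into three lists.

    Returns (deleted, changed, added):
      deleted - lines only in old_lines with no replacement
      changed - the new text of regions that replaced old lines
      added   - lines only in new_lines that replaced nothing
    """
    deleted, changed, added = [], [], []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            deleted.extend(old_lines[i1:i2])
        elif tag == "insert":
            added.extend(new_lines[j1:j2])
        elif tag == "replace":
            changed.extend(new_lines[j1:j2])
        # tag == "equal": the lines are identical, nothing to report
    return deleted, changed, added
```

With real files, old_lines and new_lines could come from f.read().splitlines() as in the answer above. Unlike the set-based versions, this keeps line order and handles duplicate lines.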

Search Large file for text and write result to file

I have file one, which is 2.4 million lines (256 MB), and file two, which is 32 thousand lines (1.5 MB).
I need to go through file two line by line and print matching line in file one.
Pseudocode:
open file 1, read
open file 2, read
open results, write
for line2 in file 2:
    for line1 in file 1:
        if line2 in line1:
            write line1 to results
            stop inner loop
My Code:
p = open("file1.txt", "r")
d = open("file2.txt", "r")
o = open("results.txt", "w")
for hash1 in p:
    hash1 = hash1.strip('\n')
    for data in d:
        hash2 = data.split(',')[1].strip('\n')
        if hash1 in hash2:
            o.write(data)
o.close()
d.close()
p.close()
I am expecting 32k results.
Your file2 is not too large, so it is perfectly fine to load it into memory.
Load file2.txt into a set to speed up the search and remove duplicates;
Remove the empty line from the set;
Scan file1.txt line by line and write the matches found to results.txt.
with open("file2.txt","r") as f:
    lines = set(f.readlines())
lines.discard("\n")
with open("results.txt", "w") as o:
    with open("file1.txt","r") as f:
        for line in f:
            if line in lines:
                o.write(line)
If file2 were larger, we could split it into chunks and repeat the same for every chunk, but in that case it would be harder to combine the results.
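The question's own code extracts the second comma-separated field before comparing, and its nested loop finds almost nothing because the inner file object is exhausted after the first pass of the outer loop. A sketch that combines the set lookup with the field extraction might look like this; the function name matching_rows and its field and sep parameters are hypothetical, and it assumes an exact match on the key is wanted rather than the substring test used in the question.

```python
def matching_rows(wanted_keys, data_lines, field=1, sep=","):
    """Yield each data line whose `field`-th column is one of wanted_keys.

    wanted_keys and data_lines are any iterables of strings, e.g. open files.
    """
    # Build the lookup set once; blank lines are ignored.
    wanted = {key.strip() for key in wanted_keys if key.strip()}
    for line in data_lines:
        columns = line.rstrip("\n").split(sep)
        # Skip malformed rows that do not have enough columns.
        if len(columns) > field and columns[field].strip() in wanted:
            yield line
```

With files, this could be used as results.writelines(matching_rows(keys_file, data_file)), reading each file exactly once.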

How to compare 2 txt files in Python

I have written a program to compare file new1.txt with new2.txt; the lines that are in new1.txt but not in new2.txt have to be written to the difference.txt file.
Can someone please have a look and let me know what changes are required in the below given code. The code prints the same value multiple times.
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
for line1 in file1:
    for line2 in file2:
        if line2 != line1:
            NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
Here's an example using the with statement, assuming the files are small enough to fit in memory:
# Open 'new1.txt' as f1, 'new2.txt' as f2 and 'diff.txt' as outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:
    # Read the lines from 'new2.txt' and store them into a python set
    lines = set(f2.readlines())
    # Loop through each line in 'new1.txt'
    for line in f1:
        # If the line was not in 'new2.txt'
        if line not in lines:
            # Write the line to the output file
            outf.write(line)
The with statement closes the opened file(s) automatically, even if an exception is raised inside the block. These two pieces of code are roughly equivalent:
with open('temp.log', 'w') as temp:
    temp.write('Temporary logging.')
# roughly equal to:
temp = open('temp.log', 'w')
try:
    temp.write('Temporary logging.')
finally:
    temp.close()
Yet another way uses two sets, but this again isn't very memory efficient. If your files are big, this won't work:
# Again, open the three files as f1, f2 and outf
with open('new1.txt') as f1, open('new2.txt') as f2, open('diff.txt', 'w') as outf:
    # Read the lines in 'new1.txt' and 'new2.txt'
    s1, s2 = set(f1.readlines()), set(f2.readlines())
    # `s1 - s2 | s2 - s1` returns the differences between two sets
    # Now we simply loop through the different lines
    for line in s1 - s2 | s2 - s1:
        # And output all the different lines
        outf.write(line)
Keep in mind that this last piece of code might not keep the order of your lines.
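If the original order matters, the sets can be used only for membership tests while the output is built from the lists themselves. A rough sketch, with the hypothetical name ordered_symmetric_difference, assuming both inputs are lists already read into memory:

```python
def ordered_symmetric_difference(lines_a, lines_b):
    """Return lines found in only one of the two lists, in file order.

    Lines unique to lines_a come first, then lines unique to lines_b.
    Repeated lines are kept as often as they occur.
    """
    set_a, set_b = set(lines_a), set(lines_b)
    only_a = [line for line in lines_a if line not in set_b]
    only_b = [line for line in lines_b if line not in set_a]
    return only_a + only_b
```

The result can be passed straight to outf.writelines().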
For example you got
file1:
line1
line2
and file2:
line1
line3
line4
When you compare line1 and line3, you write a new line (line1) to your output file; then you go on to compare line1 and line4, which again are not equal, so again you write line1 to your output file... You need to break out of both for loops if your condition is true. You can use a helper variable to break out of the outer for loop.
If your file is a big one, you could use the for-else construct:
the else clause below the second for loop is executed only when that loop completes without hitting break, that is, when there is no match.
Modification:
with open('new1.txt') as file1, open('diff.txt', 'w') as NewFile:
    for line1 in file1:
        with open('new2.txt') as file2:
            for line2 in file2:
                if line2 == line1:
                    break
            else:
                NewFile.write(line1)
For more on the for-else construct, see the Stack Overflow question on for-else.
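To see the for-else behaviour in isolation, here is a small in-memory sketch of the same logic. The function name lines_missing_from is hypothetical; it works on lists instead of files so that it can be tried without any input files.

```python
def lines_missing_from(candidates, reference):
    """Return the candidates that never appear in reference."""
    missing = []
    for candidate in candidates:
        for line in reference:
            if line == candidate:
                break  # match found: the else clause below is skipped
        else:
            # Runs only when the inner loop finished without break,
            # i.e. no line in reference equalled the candidate.
            missing.append(candidate)
    return missing
```

For example, lines_missing_from(["a", "b", "c"], ["b", "x"]) keeps "a" and "c" and drops "b".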
It is because of your for loops.
If I understand correctly, you want to see which lines in file1 are not present in file2.
So for each line in file1, you have to check whether the same line appears in file2. But this is not what your code does: for each line in file1, you check every line in file2 (this is right), but each time the line in file2 is different from the line in file1, you print the line in file1! You should print the line in file1 only AFTER having checked ALL the lines in file2, to be sure the line does not appear even once.
It could look something like this:
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
# Read new2.txt once; testing "in file2" directly would exhaust the file object
lines2 = set(file2)
for line1 in file1:
    if line1 not in lines2:
        NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
I always find that working with sets makes comparing two collections easier, especially because "does this collection contain this" operations run in O(1), and most nested loops can be reduced to a single loop (easier to read, in my opinion).
with open('test1.txt') as file1, open('test2.txt') as file2, open('diff.txt', 'w') as diff:
    s1 = set(file1)
    s2 = set(file2)
    for e in s1:
        if e not in s2:
            diff.write(e)
Your loop is executed multiple times. To avoid that, walk both files in step, which compares them position by position:
file1 = open("new1.txt",'r')
file2 = open("new2.txt",'r')
NewFile = open("difference.txt",'w')
# zip pairs the files line by line (itertools.izip in Python 2)
for line1, line2 in zip(file1, file2):
    if line2 != line1:
        NewFile.write(line1)
file1.close()
file2.close()
NewFile.close()
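Write to NewFile only after comparing line1 with all the lines of file2:
for line1 in file1:
    present = False
    file2.seek(0)  # rewind file2 so every line1 is checked against the whole file
    for line2 in file2:
        if line2 == line1:
            present = True
    if not present:
        NewFile.write(line1)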
You can use basic set operations for this:
with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
    diffs.writelines(set(f1).difference(f2))
According to this reference, this will execute with O(n) where n is the number of lines in the first file. If you know that the second file is significantly smaller than the first you can optimise with set.difference_update(). This has complexity O(n) where n is the number of lines in the second file. For example:
with open('new1.txt') as f1, open('new2.txt') as f2, open('diffs.txt', 'w') as diffs:
    s = set(f1)
    s.difference_update(f2)
    diffs.writelines(s)

How to ignore if line already exists in writing file in python?

I have file1.txt which contains lines as
list 0
list 1
line 1
I want to write a line to file2.txt only if it does not already exist in file2.txt.
My code:
fo=open("file1.txt","r")
fin=open("file2.txt","a")
lines=fo.readlines()
for line in lines:
    if "list" in line:
        fin.write(line)
for line in lines:
    if "li" in line:
        fin.write(line)
Output: it is printing the lines twice. I want each line to be written only once, even if the same line is repeated.
list 0
list 1
list 0
list 1
line 1
My output should be
list 0
list 1
line 1
My suggestion would be to first read all lines of file2.txt and put them into a suitable data structure (i.e. a set).
Then reopen file2.txt in append mode, iterate over all lines of file1.txt and write only those that are not in the set (here, the in operator comes in handy...)
with open("file2.txt", "r") as f2:
    lineset = set(f2)
with open("file2.txt", "a") as f2:
    with open("file1.txt", "r") as f1:
        for line in f1:
            if line not in lineset:
                f2.write(line)
                # remember the line so a repeat in file1 is not written twice
                lineset.add(line)
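This will read all the lines in file2 and only write a line to file2 if it's not already there. The file is opened in "a+" mode rather than "w+", since "w+" would truncate file2 before it could be read. It will also close your files automatically by using the excellent "with" statement in Python. :)
with open("file1.txt","r") as file1, open("file2.txt", "a+") as file2:
    file2.seek(0)  # "a+" starts at the end of the file, so rewind before reading
    lines2 = file2.readlines()
    for line in file1:
        if line not in lines2:
            file2.write(line)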
If you want to use a generator expression, the same code is shorter, but I prefer the readability of the first version.
with open("file1.txt", "r") as file1, open("file2.txt", "a+") as file2:
    file2.seek(0)
    existing = set(file2)
    file2.writelines(line for line in file1 if line not in existing)
Use a set to track the collection of lines in the file2.txt file.
fo=open("file1.txt","r")
# "a+" allows reading as well as appending; plain "a" is write-only
fin=open("file2.txt","a+")
lines=fo.readlines()
# Rewind the file so that we can read its contents.
fin.seek(0)
existing_lines = set(fin)
for line in lines:
    if line not in existing_lines:
        fin.write(line)
        existing_lines.add(line)
You would want to do something like:
fo=open("file1.txt","r")
# "a+" so the file can be read as well as appended to
fin=open("file2.txt","a+")
fin.seek(0)
linesOut=fo.readlines()
linesIn=fin.readlines()
for lineOut in linesOut:
    #check each line in linesIn to see if it contains lineOut
    writeLine=True
    for lineIn in linesIn:
        if lineOut==lineIn:
            writeLine=False
    #if not add it
    if writeLine:
        fin.write(lineOut)
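The answers above can be folded into one reusable function. This is a sketch rather than a definitive version: the name append_missing_lines is hypothetical, and it assumes the target file ends with a newline (otherwise its last line will not compare equal to the same text followed by "\n").

```python
def append_missing_lines(source_path, target_path):
    """Append to target_path every line of source_path it does not contain.

    Returns the number of lines written. Each distinct line is written
    at most once, even if it is repeated in the source file.
    """
    written = 0
    # "a+" creates the file if needed, allows reading, and always
    # writes at the end of the file.
    with open(target_path, "a+") as target:
        target.seek(0)  # "a+" starts positioned at the end
        seen = set(target)
        with open(source_path) as source:
            for line in source:
                if line not in seen:
                    target.write(line)
                    seen.add(line)
                    written += 1
    return written
```

Calling append_missing_lines("file1.txt", "file2.txt") twice in a row writes nothing the second time, which is the behaviour the question asks for.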

How to delete a line in file1 that appeared once or multiple times in file2 in python?

I have two text files: file1 has 40 lines and file2 has 1.3 million lines
I would like to compare every line in file1 with file2.
If a line in file1 appeared once or multiple times in file2,
this line(lines) should be deleted from file2 and remaining lines of file2 return to a third file3.
I could painfully delete one line in file1 from file2 by manually copying the line,
indicated as "unwanted_line" in my code. Does anyone know how to do this in Python?
Thanks in advance for your assistance.
Here's my code:
fname = open(raw_input('Enter input filename: ')) #file2
outfile = open('Value.txt','w')
unwanted_line = "222" #This is in file1
for line in fname.readlines():
    if not unwanted_line in line:
        # now remove unwanted_line from fname
        data =line.strip("unwanted_line")
        # write it to the output file
        outfile.write(data)
print 'results written to:\n', os.getcwd()+'\Value.txt'
NOTE:
This is how I got it to work for me. I would like to thank everyone who contributed to the solution. I took your ideas and used set(): the intersection (common lines) of file1 and file2 is removed, and the lines unique to file2 are returned to file3. It might not be the most elegant way of doing it, but it works for me. I respect every one of your ideas; they are great and wonderful, and they make me feel Python is the only programming language in the whole world.
Thanks guys.
def diff_lines(filenameA,filenameB):
    fnameA = set(filenameA)
    fnameB = set(filenameB)
    data = []
    #identify lines not common to both files
    #diff_line = fnameB ^ fnameA
    diff_line = fnameA.symmetric_difference(fnameB)
    data = list(diff_line)
    data.sort()
    return data
Read file1; put the lines into a set or dict (it'll have to be a dict if you're using a really old version of Python); now go through file2 and say something like if line not in things_seen_in_file_1: outfile.write(line) for each line.
Incidentally, in recent Python versions you shouldn't bother calling readlines: an open file is an iterator and you can just say for line in open(filename2): ... to process each line of the file.
Here is my version, but be aware that minuscule variations can cause lines not to be considered the same (like one space before the newline).
file1, file2, file3 = 'verysmalldict.txt', 'uk.txt', 'not_small.txt'
drop_these = set(open(file1))
with open(file3, 'w') as outfile:
    outfile.write(''.join(line for line in open(file2) if line not in drop_these))
with open(path1) as f1:
    lines1 = set(f1)
with open(path2) as f2:
    lines2 = tuple(f2)
# list comprehensions need brackets; build both results before lines2 is rebound
lines3 = [x for x in lines2 if x in lines1]
lines2 = [x for x in lines2 if x not in lines1]
with open(path2, 'w') as f2:
    f2.writelines(lines2)
with open(path3, 'w') as f3:
    f3.writelines(lines3)
Closing f2 by using 2 separate with statements is a matter of personal preference/design choice.
What you can do is load file1 completely into memory (since it is small) and check whether each line in file2 matches a line in file1. If it doesn't, write it to file3. Sort of like this:
file1 = open('file1')
file2 = open('file2')
file3 = open('file3','w')
lines_from_file1 = []
# Read in all lines from file1
for line in file1:
    lines_from_file1.append(line)
file1.close()
# Now iterate over lines of file2
for line2 in file2:
    keep_this_line = True
    for line1 in lines_from_file1:
        if line1 == line2:
            keep_this_line = False
            break # break out of inner for loop
    if keep_this_line:
        # line from file2 is not in file1 so save it into file3
        file3.write(line2)
file2.close()
file3.close()
Maybe not the most elegant solution, but if you don't have to do it every 3 seconds, it should work.
EDIT: By the way, the question in the text somewhat differs from the title. I tried to answer the question in the text.
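Since one answer above warns that a stray space is enough to make two lines compare as different, a variant that normalizes lines before comparing may be useful. This is a sketch only: the name remove_listed_lines and its normalize parameter are hypothetical, and stripping surrounding whitespace is just one possible notion of "the same line".

```python
def remove_listed_lines(unwanted_lines, lines, normalize=str.strip):
    """Return the entries of `lines` whose normalized form is not unwanted.

    unwanted_lines - iterable of lines to drop (e.g. the small file1)
    lines          - iterable of lines to filter (e.g. the large file2)
    normalize      - function applied to both sides before comparing
    """
    unwanted = {normalize(line) for line in unwanted_lines}
    # The original text of each kept line is preserved unchanged.
    return [line for line in lines if normalize(line) not in unwanted]
```

With files, file3.writelines(remove_listed_lines(open_file1, open_file2)) would write the remaining lines of file2, however many times an unwanted line occurred.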
