Show different lines in files - python

I found a script that shows the lines in NEW.txt that do not exist in OLD.txt. It works, but the script scrambles the line order in the output. This is the script:
with open(r'C:\Users\AMB\NEW.txt') as f, open(r'C:\Users\AMB\OLD.txt') as f2:
    lines1 = set(map(str.rstrip, f))
    s = str(lines1.difference(map(str.rstrip, f2)))
    s = s.replace(',', '\n').replace("'", '').replace("{", '').replace("}", '')
    print(s)
So let's suppose that this is the OLD.txt content:
aaaaaaaaaaaa
cccccccccccc
eeeeeeeeeeee
And this is the NEW.txt content:
aaaaaaaaaaaa
bbbbbbbbbbbb
cccccccccccc
dddddddddddd
eeeeeeeeeeee
hhhhhhhhhhhh
I would like to get this output:
bbbbbbbbbbbb
dddddddddddd
hhhhhhhhhhhh
But I am getting a random line order, for example:
dddddddddddd
bbbbbbbbbbbb
hhhhhhhhhhhh
(the output is random, and not always the same)
Is there a way to keep the output lines in the order in which they appear in NEW.txt? Thanks in advance.

You can drop the set and do all the operations with lists, which keep their elements in order (a set does not).
Solution:
with open(r'C:\Users\AMB\NEW.txt') as f, open(r'C:\Users\AMB\OLD.txt') as f2:
    lines1 = list(map(str.rstrip, f))   # lines of NEW.txt, in file order
    lines2 = list(map(str.rstrip, f2))  # lines of OLD.txt
# keep the lines of NEW.txt that are missing from OLD.txt, preserving their order
diff = [line for line in lines1 if line not in lines2]
s = '\n'.join(diff)  # join the lines into a single string
print(s)
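For large files, a membership test against a list gets slow. A minimal sketch of the same idea, assuming it is acceptable to hold the lines of OLD.txt in a set used only for lookups while the order comes from iterating NEW.txt (the function name and the in-memory sample lists are hypothetical and stand in for the two files):

```python
def new_lines_in_order(new_lines, old_lines):
    """Return the lines of new_lines that are absent from old_lines, in their original order."""
    old = {line.rstrip() for line in old_lines}  # set used only for fast lookups
    return [line.rstrip() for line in new_lines if line.rstrip() not in old]

# Sample data from the question, standing in for OLD.txt and NEW.txt.
old_sample = ["aaaaaaaaaaaa\n", "cccccccccccc\n", "eeeeeeeeeeee\n"]
new_sample = ["aaaaaaaaaaaa\n", "bbbbbbbbbbbb\n", "cccccccccccc\n",
              "dddddddddddd\n", "eeeeeeeeeeee\n", "hhhhhhhhhhhh\n"]

result = new_lines_in_order(new_sample, old_sample)
print("\n".join(result))
```

Open file objects can be passed in place of the sample lists, since the function only iterates over its arguments.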

Related

How to remove the first 3 characters of every word in each line

I have a data file and I want to delete the first 3 characters of each word in each line.
Here is an example of my file:
input
"13X5106,18C2295,17C1462,17X4893,14X4215,16C3729,14C1026,END"
"17C2308,14C1030,15C904,20C1602,17C1017,18C1030,END"
"13C2369,20C1505,18X4245,15C1224,14C1031,12C885,17C936,END"
"11C3080,13C4123,16C1180,14C1141,15C932,18C1467,END"
output
"5106,2295,1462,4893,4215,3729,1026,END"
"2308,1030,904,1602,1017,1030,END"
"2369,1505,4245,1224,1031,885,936,END"
"3080,4123,1180,1141,932,1467,END"
I tried the following code, but the output is not what I want.
file1 = open(r'D:\pythonProject\block1.txt','r')  # raw string, so \b is not read as a backspace
data = file1.read()
remove_char = [sub[3:] for sub in data]
print(remove_char)
If you use file1.readlines(), you will need to split each line by comma. The only problem is the END string at the end of each line: slicing it leaves an unwanted leftover element at the end of each list. This is easy to get rid of by skipping the last element, as shown below:
Code:
with open(r'D:\pythonProject\block1.txt', 'r') as file1:
    # strip the newline and the surrounding quotes, split on commas, drop the first 3 characters
    remove_char = [[s[3:] for s in sub.strip().strip('"').split(',')] for sub in file1.readlines()]
for the_list in remove_char:
    print(the_list[0:-1])  # the last element is what is left of END, so skip it
Output:
['5106', '2295', '1462', '4893', '4215', '3729', '1026']
['2308', '1030', '904', '1602', '1017', '1030']
['2369', '1505', '4245', '1224', '1031', '885', '936']
['3080', '4123', '1180', '1141', '932', '1467']
I read the file with f.readlines and got rid of the " on each line.
Then each word is split by , and processed as word[3:].
with open("...", "r") as f:
    lines = f.readlines()
lines = map(lambda x: x.replace('"',"").strip("\n").split(","), lines)
res = []
for line in lines:
    new_line = []
    for word in line:
        if word != "END":
            word = word[3:]
        new_line.append(word)
    res.append(",".join(new_line))
res = "\n".join(res)
print(res)
# Output
"""
5106,2295,1462,4893,4215,3729,1026,END
2308,1030,904,1602,1017,1030,END
2369,1505,4245,1224,1031,885,936,END
3080,4123,1180,1141,932,1467,END
"""
You can try this to print each line in a for loop:
with open(r'D:\pythonProject\block1.txt') as file1:
    for sub in file1:
        # drop the newline and the surrounding quotes instead of calling eval on file content
        fields = sub.strip().strip('"').split(',')
        line = [j[3:] for j in fields[:-1]] + fields[-1:]  # the trailing END stays unchanged
        remove_char = '"' + ','.join(line) + '"'
        print(remove_char)
Or as a generator expression:
with open(r'D:\pythonProject\block1.txt') as file1:
    remove_char = '\n'.join(
        '"' + ','.join([j[3:] for j in fields[:-1]] + fields[-1:]) + '"'
        for fields in (s.strip().strip('"').split(',') for s in file1)
    )
print(remove_char)
Output:
"5106,2295,1462,4893,4215,3729,1026,END"
"2308,1030,904,1602,1017,1030,END"
"2369,1505,4245,1224,1031,885,936,END"
"3080,4123,1180,1141,932,1467,END"
Here is a quick solution using a list comprehension:
data = ["13X5106", "18C2295"] # this is a sample list of strings
print([code[3:] for code in data if code != "END"])
This prints the list with the first three characters of each string removed, skipping the "END" string:
['5106', '2295']
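The approaches above can be combined into one small helper. This is only a sketch, assuming every line is wrapped in double quotes and ends with the literal field END, as in the sample data; the function name and its parameters are hypothetical:

```python
def strip_prefix(line, n=3, sentinel="END"):
    """Drop the first n characters of every comma-separated field except the sentinel."""
    body = line.strip().strip('"')  # remove the newline and the surrounding quotes
    fields = [f if f == sentinel else f[n:] for f in body.split(",")]
    return '"' + ",".join(fields) + '"'

# First line of the sample input from the question.
sample = '"13X5106,18C2295,17C1462,17X4893,14X4215,16C3729,14C1026,END"'
converted = strip_prefix(sample)
print(converted)
```

Applied to each line of the file in turn, this keeps the quotes and the END marker, which matches the output format requested in the question.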

Comparing two txt files and generate a new txt file with the strings that are only in one

I have two txt files with a string per line. I want to compare txt file 1 with txt file 2 and generate a new file with all the strings that are in 2 but not in 1.
I tried something rather simple:
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
for word in file2:
    if word not in file1:
        print(word)
Instead of the missing strings, this code gives me all the strings from file2 if there is ANY string that is not in file1.
**file1:**
this
is
a
word
**file2:**
this
is
a
totally
different
word
What I would expect is only the strings "totally" and "different". But I get all the strings from file2.
The open() function returns a file object, not the text inside the file. To get the text, you need to read from the file after opening it. A membership test such as word not in file1 iterates over the file object and consumes it, so each later test starts where the previous one stopped. It is also a good habit to always close the files you open. That is why I prefer with open('file1.txt', 'r') as file1:, which closes the file when the block ends, so I don't have to call the close method explicitly. It would look like this:
with open('file1.txt', 'r', encoding='utf8') as file1:
    with open('file2.txt', 'r', encoding='utf8') as file2:
        text1 = file1.read()
        text2 = file2.read()
words1 = text1.splitlines()  # one entry per line, without the newline
words2 = text2.splitlines()
unique_words = list(filter(lambda w2: w2 not in words1, words2))
for word in unique_words:
    print(word)
You can use the .readlines() method, which returns a list with one element per line of the text file.
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
f1=file1.readlines()
f2=file2.readlines()
for word in f2:
    if word not in f1:
        print(word.rstrip("\n"))  # each line already ends with a newline
How about this:
FILES = ['file1.txt', 'file2.txt']
SETS = []
for _file in FILES:
    with open(_file) as infile:
        _lines = [line.strip() for line in infile.readlines()]
        SETS.append(set(_lines))
print(SETS[1] - SETS[0])
Try with this solution:
file1 = open("file1.txt", "r")
file2 = open("file2.txt", "r")
word1 = [x for x in file1]
word2 = [x for x in file2]
words = [x.strip("\n") for x in word2 if x not in word1]
print(words)
Output:
['totally', 'different']
I used x.strip("\n") because my example file contains one word per row.
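None of the answers above writes the result to a new file, which the question also asks for. A minimal sketch of that last step, assuming UTF-8 text files with one string per line; the function name and the output file name new.txt are hypothetical, and the temporary directory only exists to make the example self-contained:

```python
import os
import tempfile

def write_missing(path1, path2, out_path):
    """Write the lines of path2 that do not occur in path1 to out_path, keeping their order."""
    with open(path1, encoding="utf8") as f1:
        known = {line.rstrip("\n") for line in f1}
    with open(path2, encoding="utf8") as f2, open(out_path, "w", encoding="utf8") as out:
        missing = [line.rstrip("\n") for line in f2 if line.rstrip("\n") not in known]
        out.write("\n".join(missing) + ("\n" if missing else ""))
    return missing

# Demonstration with temporary files holding the sample data from the question.
with tempfile.TemporaryDirectory() as tmp:
    p1, p2, p3 = (os.path.join(tmp, name) for name in ("file1.txt", "file2.txt", "new.txt"))
    with open(p1, "w", encoding="utf8") as f:
        f.write("this\nis\na\nword\n")
    with open(p2, "w", encoding="utf8") as f:
        f.write("this\nis\na\ntotally\ndifferent\nword\n")
    result = write_missing(p1, p2, p3)
    with open(p3, encoding="utf8") as f:
        written = f.read()

print(result)
```

The set is used only for lookups, so the order of the written lines follows file2.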

remove specific lines if some specific word is found

Let's say I have a text file that contains the following words:
a
b
c
d
e>
f
g
h
I>
j
Whenever I find a word that contains >, I would like to remove that line and the two lines before it.
For example, the output would be this.
a
b
f
j
Is it possible to achieve this? For a simple replace, I can do this:
with open ('Final.txt', 'w') as f2:
    with open('initial.txt', 'r') as f1:
        for line in f1:
            f2.write(line.replace('>', ''))
But I am stuck on how to go back and delete the previous two lines as well as the line where the replacement happens.
This is one approach using a simple iteration and list slicing.
Ex:
res = []
with open('initial.txt') as infile:
    for line in infile:
        if ">" in line:
            res = res[:-2]
        else:
            res.append(line)
with open('Final.txt', "w") as f2:
    for line in res:
        f2.write(line)
Output:
a
b
f
j
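Use re.
Here I am assuming that your data is a flat list of lines, each ending with a newline.
import re
with open('initial.txt') as infile:
    data = infile.readlines()
# match two whole lines followed by a line that contains >
print(re.sub(r'^.*\n.*\n.*>.*\n?', '', ''.join(data), flags=re.MULTILINE))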
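The list-based approach can also be wrapped in a function that works on any list of lines. This is a sketch only; the function name and the marker and before parameters are hypothetical, and it assumes that a marked line near the top of the file should remove however many earlier lines exist:

```python
def drop_marked(lines, marker=">", before=2):
    """Remove every line containing marker together with up to `before` lines kept before it."""
    kept = []
    for line in lines:
        if marker in line:
            # discard the last `before` kept lines; the marked line itself is never added
            del kept[max(0, len(kept) - before):]
        else:
            kept.append(line)
    return kept

# Sample data from the question, one entry per line.
sample = ["a", "b", "c", "d", "e>", "f", "g", "h", "I>", "j"]
result = drop_marked(sample)
print(result)
```

Because removal works on the lines kept so far, two marked lines close together remove earlier surviving lines rather than lines that were already dropped.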

Python linking 2 strings

I am working on a Python program whose goal is to take the first word of each line in one file and put it beside a line from a different file.
This is the code snippet:
lines = open("x.txt", "r").readlines()
lines2 = open("c.txt", "r").readlines()
for line in lines:
    r = line.split()
    line1 = str(r[0])
    for line2 in lines2:
        l2 = line2
    rn = open("b.txt", "r").read()
    os = open("b.txt", "w").write(rn + line1+ "\t" + l2)
But it doesn't work correctly.
I want this tool to take the first word of each line in one file and put it beside the corresponding line from another file, for all lines in the file.
For example:
File: 1.txt :
hello there
hi there
File: 2.txt :
michal smith
takawa sama
I want the result to be :
Output:
hello michal smith
hi takawa sama
By using the zip function, you can loop through both files simultaneously. Then you can pull the first word from each greeting and put it in front of the name before writing to the file.
greetings = open("x.txt", "r").readlines()
names = open("c.txt", "r").readlines()
with open("b.txt", "w") as output_file:
    for greeting, name in zip(greetings, names):
        greeting = greeting.split(" ")[0]
        # name still ends with the newline read from c.txt, so strip it before adding one
        output = "{0} {1}\n".format(greeting, name.strip())
        output_file.write(output)
Yes, as Tigerhawk indicated, you want the zip function, which combines the elements at the same index of several iterables into tuples (the ith tuple holds the ith element of each iterable).
Example code:
lines = open("x.txt", "r").readlines()
lines2 = open("c.txt", "r").readlines()
newlines = ["{} {}".format(x.split()[0] , y) for x, y in zip(lines,lines2)]
with open("b.txt", "w") as opfile:
    opfile.writelines(newlines)  # write() takes a single string; each y keeps its own newline
with open('x.txt', 'r') as lines:
    with open('c.txt', 'r') as lines2:
        with open('b.txt', 'w') as result:
            # in Python 3 the built-in map and zip are lazy, like itertools.imap and izip in Python 2
            words = map(lambda x: str(x.split()[0]), lines)
            results = zip(words, lines2)
            for word, line in results:
                result_line = '{0} {1}'.format(word, line)
                result.write(result_line)
This code will work without loading files into memory.
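To see what zip does with the sample data, here is a small self-contained sketch. The function name is hypothetical, the lists stand in for the lines of the two files, and it assumes that no line in the first file is blank (split()[0] would raise an IndexError on a blank line):

```python
def join_first_words(left_lines, right_lines):
    """Pair lines by position and prefix each right line with the first word of the left line."""
    # zip stops at the shorter input, so unmatched trailing lines are dropped
    return ["{} {}".format(left.split()[0], right.strip())
            for left, right in zip(left_lines, right_lines)]

# Sample data from the question, standing in for the two input files.
greetings = ["hello there\n", "hi there\n"]
names = ["michal smith\n", "takawa sama\n"]

joined = join_first_words(greetings, names)
print(joined)
```

Open file objects can be passed directly instead of lists, which keeps the lazy, line-by-line behavior described above.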

Python comparing two files partially

I have two input files:
Input 1:
okay sentence
two runway
three runway
right runway
one pathway
four pathway
zero pathway
Input 2 :
okay sentence
two runway
three runway
right runway
zero pathway
one pathway
four pathway
I have used the following code:
def diff(a, b):
    y = []
    for x in a:
        if x not in b:
            y.append(x)
        else:
            b.remove(x)
    return y
with open('output_ref.txt', 'r') as file1:
    with open('output_ref1.txt', 'r') as file2:
        same = diff(list(file1), list(file2))
        print(same)
        print("\n")
if '\n' in same:
    same.remove('\n')
with open('some_output_file.txt', 'w') as FO:
    for line in same:
        FO.write(line)
And the expected output is:
one pathway
zero pathway
But I am getting empty output. The problem is that I don't know how to store part of each file's content in a list, compare the parts, and finally write the result out. Can someone help me with this?
If you just want the lines that are in one file but not in the other, sets are a good fit. Something like this:
content1 = set(open("file1", "r"))
content2 = set(open("file2", "r"))
diff_items = content1.difference(content2)
UPDATE: But is the question about a difference in the same sense as the diff utility, i.e. one where the order matters? The examples suggest so.
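An order-sensitive comparison of the kind mentioned in the update can be sketched with the standard library's difflib module. This is an illustration of the diff-utility sense of "difference", not of the five-line split the question describes; the two lists are the sample inputs from the question:

```python
import difflib

first = ["okay sentence", "two runway", "three runway", "right runway",
         "one pathway", "four pathway", "zero pathway"]
second = ["okay sentence", "two runway", "three runway", "right runway",
          "zero pathway", "one pathway", "four pathway"]

# unified_diff yields header lines first, then lines prefixed with ' ', '-' or '+'
changes = [line for line in difflib.unified_diff(first, second, lineterm="")
           if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
print(changes)
```

Lines prefixed with - occur only at that position in the first input, and lines prefixed with + only at that position in the second.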
Use sets
with open('output_ref.txt', 'r') as file1:
    with open('output_ref1.txt', 'r') as file2:
        f1 = [x.strip() for x in file1] # get all lines and strip whitespace
        f2 = [x.strip() for x in file2]
five_f1 = f1[0:5] # first five lines
two_f1 = f1[5:] # rest of lines
five_f2 = f2[0:5]
two_f2 = f2[5:]
s1 = set(five_f1) # make sets to compare
s2 = set(two_f1)
s1 = s1.difference(five_f2) # in a but not b
s2 = s2.difference(two_f2)
same = s1.union(s2)
with open('some_output_file.txt', 'w') as FO:
    for line in same:
        FO.write(line+"\n") # add new line to write each word on separate line
Without sets, using your own diff function:
with open('output_ref.txt', 'r') as file1:
    with open('output_ref1.txt', 'r') as file2:
        f1 = [x.strip() for x in file1]
        f2 = [x.strip() for x in file2]
five_f1 = f1[0:5]
two_f1 = f1[5:]
five_f2 = f2[0:5]
two_f2 = f2[5:]
same = diff(five_f1,five_f2) + diff(two_f1,two_f2)
print(same)
Output:
['one pathway', 'zero pathway']
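The split into a first block and a remainder can be made a parameter. A minimal sketch, assuming the split position of five lines used above; partial_diff and its split argument are hypothetical names, and this diff works on a copy so the caller's list is left untouched:

```python
def diff(a, b):
    """Return the items of a that have no remaining counterpart in b (b is not modified)."""
    remaining = list(b)
    missing = []
    for item in a:
        if item in remaining:
            remaining.remove(item)  # each item of b can cancel only one item of a
        else:
            missing.append(item)
    return missing

def partial_diff(lines1, lines2, split=5):
    """Compare the first `split` lines and the remaining lines of both inputs separately."""
    return diff(lines1[:split], lines2[:split]) + diff(lines1[split:], lines2[split:])

# Sample inputs from the question.
input1 = ["okay sentence", "two runway", "three runway", "right runway",
          "one pathway", "four pathway", "zero pathway"]
input2 = ["okay sentence", "two runway", "three runway", "right runway",
          "zero pathway", "one pathway", "four pathway"]

result = partial_diff(input1, input2)
print(result)
```

Unlike the set-based version, this keeps the result in the order of the first input.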
