Comparing 2 text files in python

I have 2 text files. I want to compare the 2 text files and return a list that has every line number that is different. Right now, I think my code returns the lines that are different, but how do I return the line number instead?
def diff(filename1, filename2):
    with open('./exercise-files/text_a.txt', 'r') as filename1:
        with open('./exercise-files/text_b.txt', 'r') as filename2:
            difference = set(filename1).difference(filename2)
    difference.discard('\n')
    with open('diff.txt', 'w') as file_out:
        for line in difference:
            file_out.write(line)
Testing on:
diff('./exercise-files/text_a.txt', './exercise-files/text_b.txt') == [3, 4, 6]
diff('./exercise-files/text_a.txt', './exercise-files/text_a.txt') == []

difference = [
    line_number + 1 for line_number, (line1, line2)
    in enumerate(zip(filename1, filename2))
    if line1 != line2
]
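zip takes two (or more) iterables and returns an iterator of tuples, where each tuple contains the corresponding entries of each iterable. enumerate takes this iterator and returns an iterator of tuples, where the first element is the index and the second is the value from the original iterator. Because the index starts at 0, the comprehension adds 1 to turn it into a line number. Here filename1 and filename2 are the open file objects from the question's with statements, so iterating over them yields lines.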
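Put together as a complete function, the comprehension might look like the sketch below. It assumes the function should open the two paths itself and return the list rather than write diff.txt; like zip, it stops at the end of the shorter file.

```python
def diff(filename1, filename2):
    """Return the 1-based numbers of the lines that differ between two files."""
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        # zip pairs up corresponding lines and stops at the shorter file;
        # enumerate(..., 1) numbers the pairs starting from 1.
        return [
            line_number
            for line_number, (line1, line2) in enumerate(zip(file1, file2), 1)
            if line1 != line2
        ]
```

Passing 1 as the second argument to enumerate avoids the separate line_number + 1.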

Here is an example which will ignore any surplus lines if one file has more lines than the other. The key is to use enumerate when iterating to get the line number as well as the contents. next can be used to get a line from the file iterator which is not used directly by the for loop.
def diff(filename1, filename2):
    difference_line_numbers = []
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        for line_number, contents1 in enumerate(file1, 1):
            try:
                contents2 = next(file2)
            except StopIteration:
                break
            if contents1 != contents2:
                difference_line_numbers.append(line_number)
    return difference_line_numbers
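If surplus lines should count as differences instead of being ignored, one possible variant uses itertools.zip_longest. This is only a sketch of that alternative; the name diff_with_surplus is made up for illustration and does not come from the answers above.

```python
from itertools import zip_longest


def diff_with_surplus(filename1, filename2):
    """Like diff, but also report line numbers that exist in only one file."""
    with open(filename1, "r") as file1, open(filename2, "r") as file2:
        # zip_longest pads the shorter file with None, which never equals a
        # line read from a file, so every surplus line is reported.
        return [
            line_number
            for line_number, (line1, line2) in enumerate(zip_longest(file1, file2), 1)
            if line1 != line2
        ]
```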

Related

Writing every two items in a list to a text file, then creating a new line

Full script essentially opens a txt file with three columns separated by a whitespace and saves it to a list words
with open(filename) as f:
    words = f.read().split()
Every third item is deleted from the list starting from position 2 using: del words[2::3]
The next step I would like to add is writing to the same filename with two of the three columns (or every two items in words), space them out and then create a new line in a text file:
('list_item1' + ' ' + 'list_item2' + '\n'), ('list_item3' + ' ' + 'list_item4' + '\n'), etc
Any help would be appreciated!
I've thus far only been able to split text from the text file into a list, but would like to take every two items from that list and write them to the text file using the format: ('list_item1/3/5etc' + ' ' + 'list_item2/4/6etc' + '\n')
with open(r'filename', 'w') as fp:
    fp.write(" ".join(str(item) for item in words))

You can create an iterator for the list, and use the next function inside the loop to get the next element from the iterator.
my_lst = ["a", "b", "c", "d", "e", "f", "g", "h"]
list_iter = iter(my_lst)
with open(file_name, "w") as fp:
    for item in list_iter:
        try:
            item2 = next(list_iter)
        except StopIteration:
            fp.write(f"{item}\n")
        else:
            fp.write(f"{item} {item2}\n")
Which writes a file with:
a b
c d
e f
g h
The item2 line is wrapped in a try..except to catch the StopIteration that is raised if the list iterator has been exhausted. In that case, only the first item is written to the file. If no StopIteration is raised, the else block is executed, which writes both items and a newline to the file.
A more general implementation of this would be to use a function that gives chunks of the desired size from your original list:
def chunks(collection, chunk_size):
    l = []  # An empty list to hold the current chunk
    for item in collection:
        l.append(item)  # Append item to the current chunk
        if len(l) == chunk_size:
            yield l  # If chunk is of required size, yield it
            l = []  # Set current chunk to new empty list
    # Yield the chunk containing any remaining items
    if l:
        yield l
See What does the "yield" keyword do? for more information on what yield does.
Iterating over this function yields lists of chunk_size elements. We can just take these lists, join them with a space, and write them to the file:
with open(file_name, "w") as fp:
    for chunk in chunks(my_lst, 2):
        fp.write(" ".join(chunk))
        fp.write("\n")
Or, in a single line (ugh):
with open(file_name, "w") as fp:
    fp.write("\n".join(" ".join(chunk) for chunk in chunks(my_lst, 2)))
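Applied to the original question, the same idea could look like the sketch below. The function name keep_first_two_columns is made up for illustration, and the code assumes every line of the file really has three whitespace-separated columns, as the question describes.

```python
def keep_first_two_columns(filename):
    """Rewrite filename in place, keeping only the first two of three columns."""
    with open(filename) as f:
        words = f.read().split()
    del words[2::3]  # drop every third item, as in the question
    with open(filename, "w") as fp:
        # Step through the remaining items two at a time.
        for i in range(0, len(words), 2):
            fp.write(" ".join(words[i:i + 2]) + "\n")
```

Slicing with words[i:i + 2] also copes with an odd number of leftover items, because a slice past the end of the list simply returns fewer elements.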

Remove partial column duplicates from a txt file

How can I export only 1st column partial duplicate lines? For example in.txt contains lines:
red,color,color
red,color,color
blue,color,color
blue,color,color
Desired outcome:
red,color,color
blue,color,color
with open(infile,'r', encoding="cp437",errors="ignore") as in_file, open(outfile,'w', encoding="cp437",errors="ignore") as out_file:
    seen = set()
    for line in in_file:
        if line.split(',')[0] == (str(x).split(',')[0] for x in seen):
            continue
        seen.add(line)
        out_file.write(line)

(str(x).split(',')[0] for x in seen) is a generator expression. A generator object never compares equal to a string such as line.split(',')[0], so the condition is always False and no line is ever skipped.
If you want to check if a string is equal to any string in an iterable, you could use any:
if any(line.split(',')[0] == str(x).split(',')[0] for x in seen):
or collect the results of the generator expression in a list and use the in operator for membership test:
if line.split(',')[0] in [str(x).split(',')[0] for x in seen]:
But why not store only the first part of the line (line.split(',')[0]) in seen instead of the whole line? Since seen is a set, a plain in test is then enough, which greatly simplifies your code:
seen = set()
for line in in_file:
    first_part = line.split(',')[0]
    if first_part in seen:
        continue
    seen.add(first_part)
    out_file.write(line)
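The same logic can be tried without any files by wrapping it in a small generator. This is a sketch; the name unique_by_first_field and the sep parameter are assumptions added for illustration.

```python
def unique_by_first_field(lines, sep=","):
    """Yield each line whose first field has not been seen before."""
    seen = set()
    for line in lines:
        first_part = line.split(sep)[0]
        if first_part not in seen:
            seen.add(first_part)
            yield line


sample = [
    "red,color,color\n",
    "red,color,color\n",
    "blue,color,color\n",
    "blue,color,color\n",
]
result = list(unique_by_first_field(sample))
print(result)  # ['red,color,color\n', 'blue,color,color\n']
```

Because an open file is itself an iterable of lines, out_file.writelines(unique_by_first_field(in_file)) would plug into the with block from the question.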

I want to return a list of length of lines in a file, but my code isn't working

I want to return the length on each line as an element in a list named lst but
my code is not working, the output always comes to be an empty list. Please tell
me what's wrong with my code.
# this is the file
f = open("abcd.txt", 'w')
f.write("Hello How Ar3 you?")
f.write("\nHope You're doing fine")
f.write("\nI'm doing okay too.")
f.write("\nSizole!")
f.close()
This is the code I wrote to return a list of length of lines in the file:
f = open("abcd.txt", 'r')
t = f.readlines()
print(t)
lst = []
for line in f.readlines():
    lst.append(len(line))
print(lst)
Output: lst == []

Keep it simple: read the lines once, then count the length of each one.
The code below uses a list comprehension.
texts = f.readlines()
lst = [len(line) for line in texts]
print(lst)
Here's the output of the above code (each length includes the trailing newline, except for the last line, which has none):
[19, 23, 20, 7]

When you read the file back in the second code snippet, the line:
t = f.readlines()
...
reads the entire file into a list and assigns it to the variable t.
You then try to read all the lines again with:
for line in f.readlines():
    ...
This returns nothing, because every line has already been read.
To fix it, just change the for loop to this:
for line in t:

You don't need to read the lines before the loop (that is the line t = ... is unnecessary).
In fact doing so is likely causing the problem - once you read the lines, the file pointer is at the end of the file so there's nothing left to read.

In your code, you are calling f.readlines() twice. You just need to call it once:
f = open("abcd.txt", 'r')
t = f.readlines()
print(t)
lst = []
# instead of for line in f.readlines(), we can simply use t
for line in t:
    lst.append(len(line))
print(lst)
or if the variable t is not necessary:
f = open("abcd.txt", 'r')
lst = []
for line in f.readlines():
    lst.append(len(line))
print(lst)
f.readlines() will move the file pointer to the end of file. Calling it again will return an empty list, which is not what we want.
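The effect is easy to reproduce. The sketch below writes a throwaway file (the temporary path is only there to keep the example self-contained) and shows that a second readlines() call returns an empty list until the file position is reset with seek(0).

```python
import os
import tempfile

# Write a small throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "abcd.txt")
with open(path, "w") as f:
    f.write("Hello\nWorld!")

with open(path) as f:
    first = f.readlines()   # reads everything; the position is now at the end
    second = f.readlines()  # nothing left to read, so this is []
    f.seek(0)               # rewind to the start of the file
    third = f.readlines()   # the same lines as the first call

print(second)  # []
lengths = [len(line) for line in first]
print(lengths)  # [6, 6]
```

Using with also closes the file automatically, which the snippets above leave out.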

Cannot write sorted list into new file.

I'm trying to write a sorted list onto a file. I have 1000 integers that I've sorted in ascending order, but cannot manage to write the new list of ascending numbers into my new file 'results'. I am new to programming and any help would be greatly appreciated.
This is my code so far:
def insertion_sort():
    f = open("integers.txt", "r")
    lines = f.read().splitlines()
    print(lines)
    print(type(lines[0]))
    results = list(map(int, lines))
    print(type(results[0]))
    results.sort()
    print(results)
f=open("integers.txt", "r")
lines = f.read().splitlines()
results = list(map(int,lines))
insertion_sort()
value = results.sort()
file_to_save_to = open("results.txt", "w")
file_to_save_to.write(str(value))
file_to_save_to.close()

Your problem comes from this line:
value = results.sort()
The sort method of a list doesn't return anything useful (it returns None); it modifies the list in place.
Instead, you should use sorted:
value = sorted(results)
or call sort on its own line and then use results directly, without storing the return value of its sort method:
results.sort()
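A short interactive-style sketch makes the difference visible; the variable names are chosen only for this illustration.

```python
numbers = [3, 1, 2]

in_place_result = numbers.sort()  # sorts numbers itself and returns None
print(in_place_result)  # None
print(numbers)          # [1, 2, 3]

original = [3, 1, 2]
new_list = sorted(original)  # returns a new sorted list
print(new_list)  # [1, 2, 3]
print(original)  # [3, 1, 2], unchanged
```

This is why str(value) in the question ends up writing the text None to results.txt.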

First of all, it would be better to change your function so that it takes a parameter (a list of ints) and returns the sorted list instead of None (as it does right now):
def insertion_sort(T):
    return sorted(T)
f=open("integers.txt", "r")
lines = f.read().splitlines()
results = list(map(int,lines))
value = insertion_sort(results)
file_to_save_to = open("results.txt", "w")
file_to_save_to.write(str(value))
file_to_save_to.close()

You should try using the sorted function:
value = sorted(results)
You also don't need list(map(int, lines)); you can pass the map object straight to sorted, so the code becomes:
results = sorted(map(int,lines))
And a full example looks like this:
f=open("integers.txt", "r")
lines = f.read().splitlines()
f.close()
sorted_lines = sorted(map(int,lines))
file_to_save_to = open("results.txt", "w")
for line in sorted_lines:
    # add a newline so each number ends up on its own line
    file_to_save_to.write(str(line) + "\n")
file_to_save_to.close()

You're calling insertion_sort() which is a function that performs the operation, but doesn't yet return anything. You'll want to return the results by simply adding the return statement at the end of your function:
def insertion_sort():
    f = open("integers.txt", "r")
    lines = f.read().splitlines()
    print(lines)
    print(type(lines[0]))
    results = list(map(int, lines))
    print(type(results[0]))
    results.sort()
    print(results)
    return results # <- Add this line
Then, use the returned list by changing:
insertion_sort() to value = insertion_sort()
and delete the later line value = results.sort(), which would otherwise overwrite value with None.
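Pulling the suggestions from these answers together, a version with with statements might look like the sketch below. The name sort_integer_file and its two path parameters are assumptions for illustration, and the code uses Python's built-in sorted rather than a hand-written insertion sort.

```python
def sort_integer_file(source_path, target_path):
    """Read one integer per line from source_path, write them sorted to target_path."""
    with open(source_path) as source:
        # int() tolerates the trailing newline; blank lines are skipped.
        numbers = sorted(int(line) for line in source if line.strip())
    with open(target_path, "w") as target:
        for number in numbers:
            target.write(f"{number}\n")
    return numbers
```

For the files in the question, the call would be sort_integer_file("integers.txt", "results.txt").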

String Object Not Callable When Using Tuples and Ints

I am utterly flustered. I've created a list of tuples from a text file and done all of the conversions to ints:
for line in f:
    if firstLine is True: #first line of file is the total knapsack size and # of items.
        knapsackSize, nItems = line.split()
        firstLine = False
    else:
        itemSize, itemValue = line.split()
        items.append((int(itemSize), int(itemValue)))
print(items)
knapsackSize, nItems = int(knapsackSize), int(nItems) #convert strings to ints
I have functions that access the tuples for more readable code:
def itemSize(item): return item[0]
def itemValue(item): return item[1]
Yet when I call these functions, i.e.,:
elif itemSize(items[nItems-1]) > sizeLimit
I get an inexplicable "'str' object is not callable" error, referencing the foregoing line of code. I have type checked everything that should be a tuple or an int using isinstance, and it all checks out. What gives?

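Because at this point:
itemSize, itemValue = line.split()
the name itemSize is rebound to a string, which replaces your itemSize function. You appended the int-converted values to items, but itemSize itself is still a string, so the later call itemSize(items[nItems-1]) tries to call a str.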
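The shadowing can be reproduced in a few lines. This is a stripped-down sketch of the question's code, with a hard-coded sample line standing in for the file.

```python
def itemSize(item):
    return item[0]


items = []
line = "12 4"

# This assignment rebinds the name itemSize from the function to the string "12".
itemSize, itemValue = line.split()
items.append((int(itemSize), int(itemValue)))

try:
    itemSize(items[0])  # itemSize is now a str, not the function
except TypeError as error:
    message = str(error)
    print(message)  # 'str' object is not callable
```

Renaming either the loop variables or the helper functions removes the clash.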
I would also change your logic slightly for handling first line:
with open('file') as fin:
    knapsackSize, nItems = next(fin).split() # take first line
    for other_lines in fin: # everything after
        pass # do stuff for rest of file
Or just change the whole lot (assuming it's a two-column file of ints):
with open('file') as fin:
    # tuple(...) so each row is a real tuple on Python 3, where map returns an iterator
    lines = (tuple(map(int, line.split())) for line in fin)
    knapsackSize, nItems = next(lines)
    items = list(lines)
And possibly instead of your functions to return items - use a dict or a namedtuple...
Or if you want to stay with functions, then go to the operator module and use:
import operator
itemSize = operator.itemgetter(0)
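As a sketch of the namedtuple suggestion: the names Item, size, value and parse_knapsack below are made up for illustration, and the code assumes a header line followed by lines that each hold exactly two integers.

```python
from collections import namedtuple

# Item is a hypothetical name; its fields replace the itemSize/itemValue helpers.
Item = namedtuple("Item", ["size", "value"])


def parse_knapsack(lines):
    """Parse a 'knapsackSize nItems' header followed by 'size value' lines."""
    lines = iter(lines)
    knapsack_size, n_items = map(int, next(lines).split())
    items = [Item(*map(int, line.split())) for line in lines]
    return knapsack_size, n_items, items


knapsack_size, n_items, items = parse_knapsack(["10 2", "5 60", "4 40"])
print(items[n_items - 1].size)  # 4
```

Attribute access such as item.size cannot be shadowed by a loop variable the way a module-level function name can.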
