Creating a program that compares two lists - python

I am trying to create a program that checks whether items from one list are not in another. It keeps returning lines saying that x value is not in the list. Any suggestions? Sorry about my code, it's quite sloppy.
# Searching Within an Array
# Putting .txt files into arrays
with open('Barcodes', 'r') as f:
    barcodes = [line.strip() for line in f]
with open('EAN Staging', 'r') as f:
    EAN_staging = [line.strip() for line in f]
# Arrays
list1 = barcodes
list2 = EAN_staging
# Main Code
fixed = -1
for x in list1:
    for variable in list1: # Moves along each variable in the list, in turn
        if list1[fixed] in list2: # If the term is in the list, then
            fixed = fixed + 1
            location = list2.index(list1[fixed]) # Finds the term in the list
            print ()
            print ("Found", variable ,"at location", location) # Prints location of terms

Instead of lists, read the files as sets:
with open('Barcodes', 'r') as f:
    barcodes = {line.strip() for line in f}
with open('EAN Staging', 'r') as f:
    EAN_staging = {line.strip() for line in f}
Then all you need to do is calculate the difference between them:
diff = barcodes - EAN_staging # or barcodes.difference(EAN_staging)
An abstract example:
a = {1, 2, 3}
b = {3, 4, 5}
print(a - b)
>> {1, 2} # 1 and 2 are in a but not in b
Note that if you are operating with sets, information about how many times an element is present will be lost. If you care about situations when an element is present in barcodes 3 times, but only 2 times in EAN_staging, you should use Counter from collections.
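As a rough sketch of the Counter approach mentioned above: subtracting one Counter from another keeps only the positive counts, so it shows how many more times each item occurs in the first collection. The sample values below are made up for illustration and stand in for the contents of the two files.

```python
from collections import Counter

# Hypothetical sample data standing in for the two barcode files.
barcodes = ["111", "111", "111", "222", "333"]
ean_staging = ["111", "111", "333", "444"]

# Counter subtraction keeps only positive counts: how many more times
# each item occurs in barcodes than in ean_staging.
surplus = Counter(barcodes) - Counter(ean_staging)

print(surplus)
```

Here "111" is reported once because it appears three times in barcodes but only twice in ean_staging, which a plain set difference would not show.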

Your code doesn't seem to quite answer your question. If all you want to do is see which elements aren't shared, I think sets are the way to go.
set1 = set(list1)
set2 = set(list2)
in_first_but_not_in_second = set1.difference(set2) # outputs a set
not_in_both = set1.symmetric_difference(set2) # outputs a set
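If the original order (or repeated entries) of the first list matters, one possible variation is to keep list1 as a list and use a set only for the membership test. This is a sketch with made-up values, not the asker's real data.

```python
# Hypothetical sample data; in the question these come from the two files.
list1 = ["5012345", "5099999", "5011111", "5099999"]
list2 = ["5011111", "5012345", "5022222"]

set2 = set(list2)  # build once so each membership test is O(1) on average

# Keep list1's order and duplicates, reporting only the missing items.
missing = [item for item in list1 if item not in set2]

print(missing)
```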

Related

Where to store data in Python after split?

I read each line and get array result:
arr = line.split("|"); // [1, 500, ABC]
Then I need to check whether there are duplicates across lines, based on the values [1, 500, ABC].
Which data structure should I use?
So, if I get two identical lines, or a line whose first parameter has already appeared, I should skip it:
[1, 500, ABC]
[1, 600, ABC] // Skip it
[2, 500, ABC]
[1, 500, ABC] // skip it
Ok, so you want to read a bunch of lines, split them on | character, and discard lines where the first field is the same as the first field of the first line. It could be:
line = next(file) # process first line
arr = line.split("|")
field0 = arr[0]
data = [arr[:]] # make sure to take a copy...
for line in file:
    arr = line.split("|")
    if arr[0] != field0:
        data.append(arr[:]) # again append a copy
Please correct me if I'm not understanding the question correctly. The part that is confusing me is that if two lines are the same, then it implies the first element is the same. This means that you only need to check for the weaker condition of the first element being the same / observed before?
seen_elements = set()
filtered_lines = []
for line in lines:
    arr = line.split("|")
    if arr[0] not in seen_elements:
        filtered_lines.append(line)
        seen_elements.add(arr[0])
Just wanted to add that if later you want to quickly identify the elements associated with the first number for some reason, you can also use dict.setdefault:
seen_items = dict()
for line in lines:
    arr = line.split('|')
    seen_items.setdefault(arr[0], arr[1:])
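To make the setdefault behavior concrete, here is a small self-contained sketch using the four example lines from the question as hypothetical input. The variable names are just for illustration.

```python
# Hypothetical input lines, mirroring the example in the question.
lines = ["1|500|ABC", "1|600|ABC", "2|500|ABC", "1|500|ABC"]

first_by_key = {}
for line in lines:
    key, *rest = line.split("|")
    # setdefault stores the value only if the key is new,
    # so later lines with the same first field are ignored.
    first_by_key.setdefault(key, rest)

print(first_by_key)
```

Only the first line seen for each first field survives, which matches the "skip it" cases in the question.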

Remove duplicates from large list but remove both if it does exist?

So I have a text file like this
123
1234
123
1234
12345
123456
You can see 123 appears twice, so both instances should be removed, but 12345 appears once, so it stays. My text file is about 70,000 lines.
Here is what I came up with.
file = open("test.txt",'r')
lines = file.read().splitlines() #to ignore the '\n' and turn to list structure
for appId in lines:
    if(lines.count(appId) > 1): #if element count is not unique remove both elements
        lines.remove(appId) #first instance removed
        lines.remove(appId) #second instance removed
writeFile = open("duplicatesRemoved.txt",'a') #output the left over unique elements to file
for element in lines:
    writeFile.write(element + "\n")
When I run this I feel like my logic is correct, but I know for a fact the output is supposed to be around 950 elements, and I'm still getting 23000 elements in my output, so a lot is not getting removed. Any ideas where the bug could reside?
Edit: I FORGOT TO MENTION. An element can only appear twice MAX.
Use Counter from the built-in collections module:
In [1]: from collections import Counter
In [2]: a = [123, 1234, 123, 1234, 12345, 123456]
In [3]: a = Counter(a)
In [4]: a
Out[4]: Counter({123: 2, 1234: 2, 12345: 1, 123456: 1})
In [5]: a = [k for k, v in a.items() if v == 1]
In [6]: a
Out[6]: [12345, 123456]
For your particular problem I would do it like this:
from collections import defaultdict
out = defaultdict(int)
with open('input.txt') as f:
    for line in f:
        out[line.strip()] += 1
with open('out.txt', 'w') as f:
    for k, v in out.items():
        if v == 1: #here you use logic suitable for what you want
            f.write(k + '\n')
Be careful about removing elements from a list while still iterating over that list. This changes the behavior of the list iterator and can make it skip over elements, which is likely part of your problem.
Instead, I suggest creating a filtered copy of the list using a list comprehension: rather than removing the elements that appear more than once, keep only the elements that appear exactly once:
with open("test.txt", "r") as file:
    lines = file.read().splitlines()
unique_lines = [line for line in lines if lines.count(line) == 1] # keep lines that appear exactly once
with open("duplicatesRemoved.txt", "w") as writefile:
    # splitlines() dropped the newlines, so add them back when writing
    writefile.writelines(line + "\n" for line in unique_lines)
You could also easily modify this condition to allow more occurrences (for example, if lines.count(line) <= 2).
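To illustrate the skipping behavior described above, here is a tiny sketch with made-up data in which every value appears exactly twice. The mutate-while-iterating loop from the question leaves a pair behind, while the filtered copy does not.

```python
# Hypothetical data: every value appears exactly twice, so the
# "remove both copies" loop should ideally leave nothing behind.
lines = ["a", "a", "b", "b", "c", "c"]

buggy = list(lines)
for item in buggy:  # iterating over and mutating the same list
    if buggy.count(item) > 1:
        buggy.remove(item)
        buggy.remove(item)

# Filtering into a new list never disturbs the list being iterated.
safe = [item for item in lines if lines.count(item) == 1]

print(buggy)  # ['c', 'c']
print(safe)   # []
```

The iterator keeps advancing its index while the list shrinks underneath it, so it reaches the end before it ever looks at "c".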
You can count all of the elements and store the counts in a dictionary:
dic = {a: lines.count(a) for a in lines}
Then remove all the duplicated ones from the list:
for k in dic:
    if dic[k] > 1:
        while k in lines:
            lines.remove(k)
NOTE: The while loop is needed because lines.remove(k) removes only the first occurrence of k from the list, so it must be repeated until no k remains.
If the for loop seems complicated, you can use the dictionary in another way to get rid of the duplicated values:
lines = [k for k, v in dic.items() if v == 1]

Need help deleting repeating lines in txt file

I need to have an output printed in which the list is split into lines with no duplicates. The list I am using has about 100k emails and 1000x repeats. I want to remove those.
I have tried some solutions I found online,
but nothing is written to my new file and PyCharm just freezes while running:
def uniquelines(lineslist):
    unique = {}
    result = []
    for item in lineslist:
        if item.strip() in unique: continue
        unique[item.strip()] = 1
        result.append(item)
    return result
file1 = open("wordlist.txt","r")
filelines = file1.readlines()
file1.close()
output = open("wordlist_unique.txt","w")
output.writelines(uniquelines(filelines))
output.close()
I expect it to just print all the emails with none repeating into a new text file
Before I get into the few ways to hopefully solve the issue, one thing I see off the bat is that you are using both a dictionary and a list within your function. This almost doubles the memory you will need to process things. I suggest using one or the other.
Using a set will provide you with a guaranteed "list" of unique items. The set.add() function will ignore duplicates.
s = {1, 2, 3}
print(s) #{1, 2, 3}
s.add(4)
print(s) #{1, 2, 3, 4}
s.add(4)
print(s) #{1, 2, 3, 4}
With that, you can modify your function to the following to achieve what you want. For my example, I have input.txt as a series of lines just containing a single integer value with plenty of duplicates.
def uniquelines(lineslist):
    unique = set()
    for line in lineslist:
        unique.add(str(line).strip())
    return list(unique)
with open('input.txt', 'r') as f:
    lines = f.readlines()
output = uniquelines(lines)
with open('output.txt', 'w') as f:
    f.write("\n".join(output))
output.txt is as follows without any duplicates!
2
0
4
5
3
1
9
6
You can accomplish the same thing by calling set() on a list comprehension, but the disadvantage here is that the comprehension builds a full stripped copy of every record in memory before the duplicates are dropped. The method above only accumulates the unique values, so depending on the size of your data, you probably want to use the function.
with open('input.txt', 'r') as f:
    lines = f.readlines()
output = set([l.strip() for l in lines])
with open('output.txt', 'w') as f:
    f.write("\n".join(output))
I couldn't quite tell if you were looking to maintain a running count of how many times each unique line occurred. If that's what you're going for, then you can use the in operator to check whether the line is already among the keys.
def uniquelines(lineslist):
    unique = {}
    for line in lineslist:
        line = line.strip()
        if line in unique:
            unique[line] += 1
        else:
            unique[line] = 1
    return unique
# {'9': 2, '0': 3, '4': 3, '1': 1, '3': 4, '2': 1, '6': 3, '5': 1}
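One thing the set-based versions above do not keep is the original order of the lines (note the shuffled output.txt). If order matters, a possible alternative is dict.fromkeys, since dict keys are unique and preserve insertion order in Python 3.7+. The email addresses below are invented sample data.

```python
# Hypothetical sample lines standing in for wordlist.txt.
raw_lines = [
    "b@example.com\n",
    "a@example.com\n",
    "b@example.com \n",  # same address with trailing whitespace
    "c@example.com\n",
    "a@example.com\n",
]

# dict keys are unique and (since Python 3.7) keep insertion order,
# so this drops repeats while preserving first-seen order.
unique = list(dict.fromkeys(line.strip() for line in raw_lines))

print(unique)
```

The result can then be written out with "\n".join(unique), as in the examples above.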

Change the display of a list took from text file

I have this code written in Python:
with open ('textfile.txt') as f:
    list=[]
    for line in f:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            list.append(line)
    print(list)
This reads integers from a text file and puts them in a list. But the result is:
[[10,20,34]]
However, I would like it to display like:
10 20 34
How can I do this? Thanks for your help!
You probably just want to extend the list with the items, rather than appending each sublist:
with open('textfile.txt') as f:
    list = []
    for line in f:
        line = line.split()
        if line:
            list += [int(i) for i in line]
print(" ".join(str(i) for i in list))
If you append a list to a list, you create a sublist:
a = [1]
a.append([2,3])
print(a) # [1, [2, 3]]
If you add it with +=, you get a flat list:
a = [1]
a += [2,3]
print(a) # [1, 2, 3]
with open('textfile.txt') as f:
    lines = [x.strip() for x in f.readlines()]
print(' '.join(lines))
With an input file 'textfile.txt' that contains:
10
20
30
this prints:
10 20 30
It sounds like you are trying to print a list of lists. The easiest way to do that is to iterate over it and print each list.
for line in list:
    print(" ".join(str(i) for i in line))
Also, list is the name of a built-in type in Python (not a keyword, but assigning to it hides the built-in), so try to avoid using it as a variable name.
If you know that the file is not extremely long and you want the list of integers, you can build it in two lines, one of which is the with open(...) statement. If you then want to print it your way, convert the elements to strings and join them via ' '.join(...), like this:
#!python3
# Load the content of the text file as one list of integers.
with open('textfile.txt') as f:
    lst = [int(element) for element in f.read().split()]
# Print the formatted result.
print(' '.join(str(element) for element in lst))
Do not use the list identifier for your variables as it masks the name of the list type.

Simple way to "mix" respectively two lists in python?

I have the following issue:
I need to "mix" respectively two lists in python...
I have this:
names = open('contactos.txt')
numbers = open('numeros.txt')
names1 = []
numbers1 = []
for line in numbers:
    numberdata = line.strip()
    numbers1.append(numberdata)
print(numbers1)
for line in names:
    data = line.strip()
    names1.append(data)
print(names1)
names.close()
numbers.close()
This prints about 300 numbers first, and the respective 300 names later. What I need to do is make a new file (txt) that has the names and the numbers on one line, separated by a comma (,), like this:
Name1,64673635
Name2,63513635
Name3,67867635
Name4,12312635
Name5,78679635
Name6,63457635
Name7,68568635
..... and so on...
I hope you can help me do this, I've tried with "for"s but I'm not sure on how to do it if I'm iterating two lists at once, thank you :)
Utilize zip:
for name, num in zip(names1, numbers1):
    print('{0},{1}'.format(name, num))
zip will combine the two lists together, letting you write them to a file:
In [1]: l1 = ['one', 'two', 'three']
In [2]: l2 = [1, 2, 3]
In [3]: list(zip(l1, l2))
Out[3]: [('one', 1), ('two', 2), ('three', 3)]
However you can save yourself a bit of time. In your code, you are iterating over each file separately, creating a list from each. You could also iterate over both at the same time, creating your list in one sweep:
results = []
with open('contactos.txt') as c:
    with open('numeros.txt') as n:
        for line in c:
            results.append([line.strip(), n.readline().strip()])
print(results)
This uses a with statement (context manager), that essentially handles the closing of files for you. This will iterate through contactos, reading a line from numeros and appending the pair to the list. You can even cut out the list step and write directly to your third file in the same loop:
with open('output.txt', 'w') as output:
    with open('contactos.txt', 'r') as c:
        with open('numeros.txt', 'r') as n:
            for line in c:
                output.write('{0}, {1}\n'.format(line.strip(), n.readline().strip()))
A "pythonic" way to mix two lists in this way is the zip function!
names = open('contactos.txt')
numbers = open('numeros.txt')
names1 = []
numbers1 = []
for line in numbers:
    numberdata = line.strip()
    numbers1.append(numberdata)
for line in names:
    data = line.strip()
    names1.append(data)
names.close()
numbers.close()
for name, number in zip(names1, numbers1):
    print('%s, %s' % (name, number))
There are other and better ways to print formatted text (e.g. Yuushi's answer). I also like to use the with statement and list comprehensions, e.g.
with open('contactos.txt') as f:
    names = [line.strip() for line in f]
with open('numeros.txt') as f:
    numbers = [line.strip() for line in f]
for name, number in zip(names, numbers):
    print('%s, %s' % (name, number))
Finally, I just want to comment on how you could do it without the zip function. I'm not sure what you want to do if there are a different number of numbers and names, but you can use a for loop like this for the last bit to access the values from both lists in a single for loop:
for i in range(len(numbers)):
    print('%s, %s' % (names[i], numbers[i]))
This code in particular will raise an IndexError if there are more numbers than names (and will silently ignore the extra names if there are more names than numbers), so you would probably want to add some code to handle that.
Everything at once:
with open('contactos.txt') as names, open('numeros.txt') as numbers:
    for name, num in zip(names, numbers):
        print('%s, %s' % (name.strip(), num.strip()))
This reads the two files in parallel, rather than reading each file completely into memory one at a time. (In Python 3 the built-in zip is lazy; in Python 2 use itertools.izip for the same effect.)
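For the case raised earlier where the two files have different lengths, one option is itertools.zip_longest, which pads the shorter input instead of stopping early the way zip does. The sketch below uses io.StringIO objects as hypothetical in-memory stand-ins for contactos.txt, numeros.txt and the output file, so it runs without touching the disk; with real files you would use open(...) in their place.

```python
import io
from itertools import zip_longest

# Hypothetical in-memory stand-ins for contactos.txt and numeros.txt;
# note there is one more name than there are numbers.
names = io.StringIO("Name1\nName2\nName3\n")
numbers = io.StringIO("64673635\n63513635\n")
output = io.StringIO()  # stand-in for the output file

# zip_longest pads the shorter input with fillvalue instead of
# silently stopping at the shorter one, as zip would.
for name, number in zip_longest(names, numbers, fillvalue=""):
    output.write("{0},{1}\n".format(name.strip(), number.strip()))

print(output.getvalue())
```

Whether a blank field is the right way to flag a missing number is a judgment call; raising an error on a length mismatch would be an equally reasonable choice.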
