I'm trying to loop through a concatenation of two lists that is essentially a bag of words. Example output: [('brexit', 11), ('say', 11), ('uk', 7), ('eu', 6), ('deal', 5), ('may', 5), ..., ('brexit', 35), ('say', 28), ('may', 5), ('uk', 1), ... ]
Having gathered all the text inputs from .txt files, I've removed the stop words and used stemming to merge duplicates that differ only by tense.
The next step I want to take is to loop through the list and find the differences in the number of appearances of a given word. I would want 'brexit', 'say' and 'uk' to be flagged as significant words, with either the two numbers of appearances or just the difference. My start of the code (partly Python, partly pseudocode) is below.
def findSimilarities (word, count):
    for (word, count) in biasDict:
        if word == word and count != count:
            print (word, count - count)
        elif word ==word and count == count:
            del (word, count)
        (word, count)++
Any advice on how to approach this and edit the code to work? If it would be better, I can have the words come from two separate lists (which is how they are created; I concatenated them after they were created).
Many thanks.
This would be an option. It is not efficient, but the output is as desired. That is, if you want to delete words with the same count (as shown in your code). If you want to keep the entries, just skip the biasDict.remove() part.
If you're just interested in the words that occur twice with a different count, you could append the tuples to a new list instead of printing the difference, and return the new list afterwards.
import numpy as np
def findSimilarities (biasDict):
    similarities = {}
    #remove_later = []
    for i in range(0, len(biasDict)):
        word, count = biasDict[i][0], biasDict[i][1]
        for c in range(0, len(biasDict)):
            word_compare, count_compare = biasDict[c][0], biasDict[c][1]
            if c==i:
                pass #Same entry
            elif word == word_compare and count != count_compare:
                delta = count - count_compare
                if word not in similarities and delta != 0:
                    similarities[word] = np.abs(delta)
            #elif word == word_compare and count == count_compare and (word, count) not in remove_later:
            #    remove_later.append((word, count))
    #for entry in remove_later:
    #    biasDict.remove(entry)
    return similarities
biasDict = [('brexit', 11), ('say', 11), ('uk', 7), ('eu', 6), ('deal', 5), ('may', 5), ('brexit', 35), ('say', 28), ('may', 5), ('uk', 1)]
print(findSimilarities(biasDict))
Output:
{'brexit': 24, 'say': 17, 'uk': 6}
The idea of combining occurrences seems fine for me. Here is my implementation. Any comment or optimization is appreciated.
def merge_list(words_count_list):
    updated_list = list()
    words_list = list()
    for i in range(len(words_count_list)):
        word = words_count_list[i][0]
        count = words_count_list[i][1]
        if word not in words_list:
            words_list.append(word)
            for j in range(i+1,len(words_count_list),1):
                if word == words_count_list[j][0]:
                    count += words_count_list[j][1]
            updated_list.append((word,count))
    return updated_list
print(merge_list([('brexit', 11), ('say', 11), ('uk', 7), ('eu', 6), ('deal', 5), ('may', 5),
                  ('brexit', 35), ('say', 28),('may', 5), ('uk', 1)]))
output:
[('brexit', 46), ('say', 39), ('uk', 8), ('eu', 6), ('deal', 5), ('may', 10)]
Now, you can specify a threshold on the word count, sort by the count, then remove the most significant words.
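A minimal sketch of that thresholding step, assuming the merged list produced by merge_list above. The helper name frequent_words and the threshold of 10 are made up for illustration:

```python
def frequent_words(word_counts, threshold):
    """Keep (word, count) pairs with count >= threshold, highest count first."""
    kept = [(word, count) for word, count in word_counts if count >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


merged = [('brexit', 46), ('say', 39), ('uk', 8), ('eu', 6), ('deal', 5), ('may', 10)]
print(frequent_words(merged, 10))
```

From here you can either report the kept words or filter them out of the bag of words, depending on what "significant" should mean for your data.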
Assuming you have two lists of the words, then you can do
# Converts a list of tuples to a dictionary.
# [('a', 1), ('b', 2)] => {'a': 1, 'b': 2}
def tupleListToDict(tuple_list):
    dictobj = {}
    for item in tuple_list:
        dictobj[item[0]] = item[1]
    return dictobj
def findSimilarities(list1, list2):
    dict1 = tupleListToDict(list1)
    dict2 = tupleListToDict(list2)
    dict3 = {}  # To store the difference
    # Find occurrence of key in 2nd dict; if found, calculate the difference
    for key, value in dict1.items():
        if key in dict2:
            dict3[key] = abs(value - dict2[key])
    return dict3
Example output
list1 = [('brexit', 11), ('say', 11), ('uk', 7), ('eu', 6), ('deal', 5), ('may', 5)]
list2 = [('brexit', 35), ('say', 28), ('may', 5), ('uk', 1)]
print(findSimilarities(list1, list2))
{'brexit': 24, 'say': 17, 'uk': 6, 'may': 0}
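If words with identical counts (like 'may') should not be reported at all, the same idea fits in a dict comprehension. This is only a sketch under that assumption; the name count_differences is hypothetical, and dict() is used here because a list of 2-tuples converts directly:

```python
def count_differences(counts_a, counts_b):
    """Return {word: |a - b|} for words present in both lists with different counts."""
    a, b = dict(counts_a), dict(counts_b)
    # Iterate over `a` so the result keeps the word order of the first list.
    return {word: abs(a[word] - b[word])
            for word in a
            if word in b and a[word] != b[word]}


list1 = [('brexit', 11), ('say', 11), ('uk', 7), ('eu', 6), ('deal', 5), ('may', 5)]
list2 = [('brexit', 35), ('say', 28), ('may', 5), ('uk', 1)]
print(count_differences(list1, list2))
```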
I am trying to get the highest 4 values in a list of tuples and put them into a new list. However, if there are two tuples with the same value I want to take the one with the lowest number.
The list originally looks like this:
[(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)...]
And I want the new list to look like this:
[(9,20), (3,16), (54, 13), (2,10)]
This is my current code. Any suggestions?
sorted_y = sorted(sorted_x, key=lambda t: t[1], reverse=True)[:5]
sorted_z = []
while n < 4:
    n = 0
    x = 0
    y = 0
    if sorted_y[x][y] > sorted_y[x+1][y]:
        sorted_z.append(sorted_y[x][y])
        print(sorted_z)
        print(n)
        n = n + 1
    elif sorted_y[x][y] == sorted_y[x+1][y]:
        a = sorted_y[x]
        b = sorted_y[x+1]
        if a > b:
            sorted_z.append(sorted_y[x+1][y])
        else:
            sorted_z.append(sorted_y[x][y])
        n = n + 1
print(sorted_z)
print(n)
Edit: To clarify, I want the tuples with the highest second values, and if two tuples have the same second value I want to take the one with the lowest first value.
How about groupby?
from itertools import groupby, islice
from operator import itemgetter
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
pre_sorted = sorted(data, key=itemgetter(1), reverse=True)
result = [sorted(group, key=itemgetter(0))[0] for key, group in islice(groupby(pre_sorted, key=itemgetter(1)), 4)]
print(result)
Output:
[(9, 20), (3, 16), (54, 13), (2, 10)]
Explanation:
This first sorts the data by the second element's value in descending order. groupby then puts them into groups where each tuple in the group has the same value for the second element.
Using islice, we take the top four groups and sort each by the value of the first element in ascending order. Taking the first value of each group, we arrive at our answer.
You can try this :
l = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
asv = set([i[1] for i in l]) # The set of unique second elements
new_l = [(min([i[0] for i in l if i[1]==k]),k) for k in asv]
OUTPUT :
[(3, 16), (2, 10), (9, 20), (54, 13)]
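The set-based version returns one tuple per distinct second value, in whatever order the set yields, and does not stop at four. A sketch that keeps the ordering and the limit, assuming the goal is "highest second value first, lowest first value on ties" (the name top_by_second is made up for illustration):

```python
def top_by_second(pairs, n=4):
    """Return up to n pairs with distinct second values, highest second value first.

    When several pairs share a second value, the one with the lowest
    first value is kept.
    """
    seen = set()
    result = []
    # Sort by second value descending, then by first value ascending.
    for first, second in sorted(pairs, key=lambda t: (-t[1], t[0])):
        if second not in seen:
            seen.add(second)
            result.append((first, second))
            if len(result) == n:
                break
    return result


data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
print(top_by_second(data))
```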
I must lowercase letters in a list of strings if they occupy certain positions, which I computed with a previous function. The function I must write is lower_words.
I'm having an issue: every time I lower an element the row is repeated.
I don't need to use the list "words" for this. Just left it there so you could understand better what the function does/must do. Can someone help me?
words= ["PATO", "GATO", "BOI", "CAO"]
grid1= ["PIGATOS",
        "ANRBKFD",
        "TMCAOXA",
        "OOBBYQU",
        "MACOUIV",
        "EEJMIWL"]
positions_words_occupy = ((0, 0), (1, 0), (2, 0), (3, 0), (0, 2), (0, 3), (0, 4), (0, 5), (3, 2), (4, 3), (5, 4), (2, 2), (2, 3), (2, 4)) #these are the positions the words occupy. I have determined these positions with a previous function. first is the line, second the column
def lower_words(grid, positions_words_occupy):
    new= []
    for position in positions_words_occupy:
        line= position[0]
        column= position[1]
        row= grid[line]
        element= row[column]
        new.append(row.replace(element, element.lower()))
    return new
Expected output:
['pIgatoS', 'aNRBKFD', 'tMcaoXA', 'oObBYQU', 'MACoUIV', 'EEJMiWL']
Actual output:
['pIGATOS', 'aNRBKFD', 'tMCAOXA', 'ooBBYQU', 'PIgATOS', 'PIGaTOS', 'PIGAtOS', 'PIGAToS', 'OObbYQU', 'MACoUIV', 'EEJMiWL', 'TMcAOXA', 'TMCaOXa', 'TMCAoXA']
Changing the perspective, you can see it lowers the words I have in the list words:
['pIgatoS',
'aNRBKFD',
'tMcaoXA',
'oObBYQU',
'MACoUIV',
'EEJMiWL']
You are very close! You're actually appending to your new list new every time you replace a letter. That is why you are getting so many values in your list. Note also that str.replace lowercases every occurrence of the letter in the row, not only the one at the given column.
One way to fix this is to start from a copy of the grid and overwrite the affected row each time you lowercase a letter, slicing the string so only the given column changes. Here is a new function implementing these small changes:
def lower_words(grid, positions_words_occupy):
    new = grid.copy()  # copy the argument, not the global grid1
    for position in positions_words_occupy:
        line= position[0]
        column= position[1]
        row= new[line]
        element= row[column]
        # rebuild the row with only the character at `column` lowercased
        new_word = row[:column] + element.lower() + row[column+1:]
        new[line] = new_word
    return new
Output running lower_words(grid1, positions_words_occupy):
['pIgatoS', 'aNRBKFD', 'tMcaoXA', 'oObBYQU', 'MACoUIV', 'EEJMiWL']
I would first collect your grid positions in a collections.defaultdict of sets, then rebuild the strings with lowercase letters wherever their positions exist in these sets.
Demo:
from collections import defaultdict
grid1 = ["PIGATOS", "ANRBKFD", "TMCAOXA", "OOBBYQU", "MACOUIV", "EEJMIWL"]
positions_words_occupy = (
    (0, 0),
    (1, 0),
    (2, 0),
    (3, 0),
    (0, 2),
    (0, 3),
    (0, 4),
    (0, 5),
    (3, 2),
    (4, 3),
    (5, 4),
    (2, 2),
    (2, 3),
    (2, 4),
)
d = defaultdict(set)
for grid, pos in positions_words_occupy:
    d[grid].add(pos)
result = []
for grid, pos in d.items():
    result.append(
        "".join(x.lower() if i in pos else x for i, x in enumerate(grid1[grid]))
    )
print(result)
Output:
['pIgatoS', 'aNRBKFD', 'tMcaoXA', 'oObBYQU', 'MACoUIV', 'EEJMiWL']
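One thing to watch with the demo: it only rebuilds rows that have at least one position, in the order those rows first appear. A sketch of the same idea wrapped in a function that walks every row of the grid instead, so untouched rows are kept as they are (the name lower_at is hypothetical):

```python
from collections import defaultdict


def lower_at(grid, positions):
    """Return a copy of grid with the characters at (row, column) positions lowercased."""
    cols_by_row = defaultdict(set)
    for row, col in positions:
        cols_by_row[row].add(col)
    return [
        # .get() avoids creating empty entries for rows without positions
        "".join(ch.lower() if c in cols_by_row.get(r, ()) else ch
                for c, ch in enumerate(line))
        for r, line in enumerate(grid)
    ]


grid1 = ["PIGATOS", "ANRBKFD", "TMCAOXA", "OOBBYQU", "MACOUIV", "EEJMIWL"]
positions = ((0, 0), (1, 0), (2, 0), (3, 0), (0, 2), (0, 3), (0, 4), (0, 5),
             (3, 2), (4, 3), (5, 4), (2, 2), (2, 3), (2, 4))
print(lower_at(grid1, positions))
```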
I have a list of tuples (let's name it yz_list) that contains N tuples, which have the start and end values like: (start, end), represented by the example below:
yz_list = [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12), (18, 24)]
I would like to remove every tuple that overlaps the interval of a previously saved tuple. For the list shown above, the expected output is:
result = [(0,6), (6,12), (18,24)]
How could I achieve this result using Python?
Edit #1
Below is the code that generates these tuples:
for i, a in enumerate(seq):
    if seq[i:i+multiplier] == "x"*multiplier:
        to_replace.append((i, i+multiplier))
for i, j in enumerate(to_replace):
    print(i,j)
    if i == 0:
        def_to_replace.append(j)
    else:
        ind = def_to_replace[i-1]
        print(j[0]+1, "\n", ind)
        if j[0]+1 not in range(ind[0], ind[1]):
            def_to_replace.append(j)
    # print(i, j)
print(def_to_replace)
for item in def_to_replace:
    frag = replacer(frame_calc(seq[:item[0]]), rep0, rep1, rep2)
    for k, v in enumerate(seq_dup[item[0]:item[1]]):
        seq_dup[int(item[0]) + int(k)] = list(frag)[k]
return "".join(seq_dup)
As I'm developing with TDD, I'm making step-by-step progress and am now thinking about how to implement the removal of overlapping tuples. I don't really know if it's a good idea to treat them as sets and look at the overlapping items.
The pseudocode for generating the result list is:
for item in yz_list:
    if item is not the first item of yz_list:
        get the item's first value
        see if the value is between the values of any tuple already added to the result list
This may work. No fancy stuff, just manually process each tuple to see if either value is within the range of the saved tuple's set bounds:
yz_list = [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12), (18, 24)]
result = [yz_list[0]]
bounds = yz_list[0][0], yz_list[0][1]
for tup in yz_list[1:]:
    if tup[0] in range(bounds[0], bounds[1]) or tup[1] in range(bounds[0], bounds[1]):
        pass
    else:
        result.append(tup)
        bounds = tup  # compare later tuples against the most recently saved one
print(result)  # [(0, 6), (6, 12), (18, 24)]
Here is a class that calculates the overlaps using efficient binary search, and code showing its use to solve your problem. Run with python3.
import bisect
import sys
class Overlap():
    def __init__(self):
        self._intervals = []
    def intervals(self):
        return self._intervals
    def put(self, interval):
        istart, iend = interval
        # Ignoring intervals that start after the window.
        i = bisect.bisect_right(self._intervals, (iend, sys.maxsize))
        # Look at remaining intervals to find overlap.
        for start, end in self._intervals[:i]:
            if end > istart:
                return False
        bisect.insort(self._intervals, interval)
        return True
yz_list = [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12), (18, 24)]
ov = Overlap()
for i in yz_list:
    ov.put(i)
print('Original:', yz_list)
print('Result:', ov.intervals())
OUTPUT:
Original: [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12), (18, 24)]
Result: [(0, 6), (6, 12), (18, 24)]
yz_list = [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12), (18, 24)]
result = []
for start, stop in yz_list:
    for low, high in result:
        if (low < start < high) or (low < stop < high):
            break
    else:
        result.append((start, stop))
This gives the desired output, and it's pretty easy to see how it works. The else clause basically just means "run this if we didn't break out of the loop".
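The same for/else pattern can be wrapped in a function. This sketch swaps in a different overlap test than the one above: it assumes the tuples are half-open intervals (start included, end excluded), so (0, 6) and (6, 12) only touch, while an exact duplicate or an interval fully containing a saved one counts as overlapping. The name drop_overlapping is made up for illustration:

```python
def drop_overlapping(intervals):
    """Keep each interval only if it overlaps none of the intervals kept so far."""
    kept = []
    for start, stop in intervals:
        for low, high in kept:
            # Two half-open intervals overlap when each starts before the other ends.
            if start < high and low < stop:
                break
        else:
            # Runs only when the inner loop finished without hitting break.
            kept.append((start, stop))
    return kept


yz_list = [(0, 6), (1, 7), (2, 8), (3, 9), (4, 10), (5, 11), (6, 12), (18, 24)]
print(drop_overlapping(yz_list))
```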
I would like to tokenize concatenated characters based on a given dictionary and output the tokenized words that were found. For example, I have the following:
dictionary = ['yak', 'kin', 'yakkin', 'khai', 'koo']
chars = 'yakkinpadthaikhaikoo'
Output should be something like the following:
[('yakkin', (0, 6), 6), ('padthai', (6, 13), 7), ('khai', (13, 17), 4), ('koo', (17, 20), 3)]
I would like to get a list of tuples as output. The first element of each tuple is the word found in the dictionary, the second element is the character offset, and the third element is the length of the word found. If characters are not found in the dictionary, we chunk them together into one word, e.g. padthai in the case above. If multiple words from the dictionary match, we select the longest one (yakkin instead of yak and kin).
My current implementation is below. It starts at index 0 and then loops through the characters (it doesn't work yet).
import numpy as np
def tokenize(chars, dictionary):
    n_chars = len(chars)
    start = 0
    char_found = []
    words = []
    for _ in range(int(n_chars/3)):
        for r in range(1, n_chars + 1):
            if chars[start:(start + r)] in dictionary:
                char_found.append((chars[start:(start + r)], (start, start + r), len(chars[start:start+r])))
        id_offset = np.argmax([t[1][1] for t in char_found])
        start = char_found[id_offset][2]
        if char_found[id_offset] not in words:
            words.append(char_found[id_offset])
    return words
tokenize(chars, dictionary) # give only [('yakkin', (0, 6), 6)]
I'm having a hard time wrapping my head around this problem. Please feel free to comment or suggest!
It can look a bit nasty, but it works:
def tokenize(string, dictionary):
    # sorting dictionary words by length
    # because we need to find longest word if its possible
    # like "yakkin" instead of "yak"
    sorted_dictionary = sorted(dictionary,
                               key=lambda word: len(word),
                               reverse=True)
    start = 0
    tokens = []
    while start < len(string):
        substring = string[start:]
        try:
            word = next(word
                        for word in sorted_dictionary
                        if substring.startswith(word))
            offset = len(word)
        except StopIteration:
            # no words from dictionary were found
            # at the beginning of substring,
            # looking for next appearance of dictionary words
            words_indexes = [substring.find(word)
                             for word in sorted_dictionary]
            # if word is not found, "str.find" method returns -1
            appeared_words_indexes = filter(lambda index: index > 0,
                                            words_indexes)
            try:
                offset = min(appeared_words_indexes)
            except ValueError:
                # an empty sequence was passed to "min" function
                # because there are no words from dictionary in substring
                offset = len(substring)
            word = substring[:offset]
        token = word, (start, start + offset), offset
        tokens.append(token)
        start += offset
    return tokens
gives output
>>>tokenize('yakkinpadthaikhaikoo', dictionary)
[('yakkin', (0, 6), 6),
('padthai', (6, 13), 7),
('khai', (13, 17), 4),
('koo', (17, 20), 3)]
>>>tokenize('lolyakhaiyakkinpadthaikhaikoolol', dictionary)
[('lol', (0, 3), 3),
('yak', (3, 6), 3),
('hai', (6, 9), 3),
('yakkin', (9, 15), 6),
('padthai', (15, 22), 7),
('khai', (22, 26), 4),
('koo', (26, 29), 3),
('lol', (29, 32), 3)]
You can use find() to find the starting index of the word, and the length of the word is known thanks to len(). Iterate through each word in the dictionary, and your list is complete!
def tokenize(chars, word_list):
    tokens = []
    for word in word_list:
        word_len = len(word)
        index = 0
        # skips words that appear in longer words
        skip = False
        for other_word in word_list:
            if word in other_word and len(other_word) > len(word):
                print("skipped word:", word)
                skip = True
        if skip:
            continue
        while index < len(chars):
            index = chars.find(word, index)  # start search from index
            if index == -1:  # find() returns -1 if not found
                break
            # Append the tuple and continue the search at the end of the word
            tokens.append((word, (index, word_len+index), word_len))
            index += word_len
    return tokens
Then we can run it for the following output:
>>>tokenize('yakkinpadthaikhaikoo', ['yak', 'kin', 'yakkin', 'khai', 'koo'])
skipped word: yak
skipped word: kin
[('yakkin', (0, 6), 6),
('khai', (13, 17), 4),
('koo', (17, 20), 3)]
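Note that this output has no entry for the unknown chunk 'padthai', which the question asks for. One possible way to get both, sketched here with the standard re module rather than str.find: build an alternation with the longest words first, then emit whatever lies between matches as its own token. The name tokenize_with_gaps is hypothetical, and the sketch assumes the dictionary contains no empty strings:

```python
import re


def tokenize_with_gaps(chars, dictionary):
    """Return (token, (start, end), length) tuples covering all of chars."""
    if not dictionary:
        return [(chars, (0, len(chars)), len(chars))] if chars else []
    # Longest words first, so the alternation tries "yakkin" before "yak".
    words = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in words))
    tokens = []
    pos = 0
    for match in pattern.finditer(chars):
        if match.start() > pos:
            # Characters between two dictionary words become one chunk.
            tokens.append((chars[pos:match.start()], (pos, match.start()), match.start() - pos))
        tokens.append((match.group(), match.span(), match.end() - match.start()))
        pos = match.end()
    if pos < len(chars):
        tokens.append((chars[pos:], (pos, len(chars)), len(chars) - pos))
    return tokens


dictionary = ['yak', 'kin', 'yakkin', 'khai', 'koo']
print(tokenize_with_gaps('yakkinpadthaikhaikoo', dictionary))
```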
I want to sort a list of tuples into consecutive order, so that the first element of each tuple is equal to the last element of the previous one.
For example:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
I have developed a search like this:
output=[]
given = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
t = given[0][0]
for i in range(len(given)):
    # search tuples starting with element t
    output += [e for e in given if e[0] == t]
    t = output[-1][-1]  # Get the next element to search
print(output)
Is there a Pythonic way to achieve such an order?
And is there a way to do it "in-place" (with only one list)?
In my problem, the input can always be arranged into a single cycle that uses all the tuples, so it does not matter which element is chosen first.
Assuming the tuples in your list form a cycle, you can use a dict to do this in O(n):
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
input_dict = dict(input)  # Convert list of `tuples` to dict
elem = input[0][0]  # start point in the new list
new_list = []  # List of tuples for holding the values in required order
for _ in range(len(input)):
    new_list.append((elem, input_dict[elem]))
    elem = input_dict[elem]
    if elem not in input_dict:
        # Raise exception in case list of tuples is not circular
        raise Exception('key {} not found in dict'.format(elem))
The final value held by new_list will be:
>>> new_list
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
If you are not afraid to waste some memory, you could create a dictionary start_dict containing the start integers as keys and the tuples as values and do something like this:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_dict = {item[0]: item for item in tpl}
start = tpl[0][0]
res = []
while start_dict:
    item = start_dict[start]
    del start_dict[start]
    res.append(item)
    start = item[-1]
print(res)
If two tuples start with the same number, you will lose one of them. If the tuples do not form a single cycle, the lookup start_dict[start] will raise a KeyError before all of them are used.
But maybe this is something to build on.
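Building on that, here is a sketch of the same dictionary idea that checks for both problem cases and raises a ValueError with a message instead. The function name order_chain and the error messages are made up for illustration:

```python
def order_chain(pairs):
    """Order pairs so that each pair starts with the previous pair's second value."""
    if not pairs:
        return []
    by_start = {}
    for pair in pairs:
        if pair[0] in by_start:
            raise ValueError(f"duplicate start value: {pair[0]!r}")
        by_start[pair[0]] = pair
    ordered = []
    start = pairs[0][0]
    while by_start:
        if start not in by_start:
            # Either a dead end or a cycle that closed before using every pair.
            raise ValueError(f"chain breaks at {start!r}")
        pair = by_start.pop(start)
        ordered.append(pair)
        start = pair[1]
    return ordered


tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
print(order_chain(tpl))
```

The input list is left untouched; only the temporary dictionary is consumed.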
There are actually many open questions about what output you expect and what should happen if the input list does not have a valid structure.
Assume the input is a list of pairs in which each number appears exactly twice. We can then treat the input as a graph where the numbers are nodes and each pair is an edge. As far as I understand your question, you assume this graph is a single cycle that looks like this:
10 - 7 - 13 - 4 - 9 - 10 (same 10 as at the beginning)
This shows you that you can reduce the list to store the graph to [10, 7, 13, 4, 9]. And here is the script that sorts the input list:
# input
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# sorting and archiving
first = input[0][0]
last = input[0][1]
output_in_place = [first, last]
while last != first:
    for item in input:
        if item[0] == last:
            last = item[1]
            if last != first:
                output_in_place.append(last)
print(output_in_place)
# output
output = []
for i in range(len(output_in_place) - 1):
    output.append((output_in_place[i], output_in_place[i+1]))
output.append((output_in_place[-1], output_in_place[0]))
print(output)
I would first create a dictionary of the form
{first_value: [list of tuples with that first value], ...}
Then work from there:
from collections import defaultdict
chosen_tuples = input[:1]  # Start from the first
first_values = defaultdict(list)  # first value -> list of tuples starting with it
for tup in input[1:]:
    first_values[tup[0]].append(tup)
while first_values:  # Loop will end when all lists are removed
    value = chosen_tuples[-1][1]  # Second item of last tuple
    tuples_with_that_value = first_values[value]
    chosen_tuples.append(tuples_with_that_value.pop())
    if not tuples_with_that_value:
        del first_values[value]  # List empty, remove it
You can try this:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
output = [input[0]] # output contains the first element of input
temp = input[1:] # temp contains the rest of elements in input
while temp:
    item = [i for i in temp if i[0] == output[-1][1]].pop()  # We compare each element with output[-1]
    output.append(item)  # We add the right item to output
    temp.remove(item)  # We remove each handled element from temp
Output:
>>> output
[(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
Here is a robust solution using the sorted function and a custom key function:
input = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
def consec_sort(lst):
    def key(x):
        nonlocal index
        if index <= lower_index:
            index += 1
            return -1
        return abs(x[0] - lst[index - 1][1])
    for lower_index in range(len(lst) - 2):
        index = 0
        lst = sorted(lst, key=key)
    return lst
output = consec_sort(input)
print(output)
The original list is not modified. Note that sorted is called 3 times for your input list of length 5. In each call, one additional tuple is placed correctly. The first tuple keeps its original position.
I have used the nonlocal keyword, meaning that this code is for Python 3 only (one could use global instead to make it legal Python 2 code).
My two cents:
def match_tuples(input):
    # making a copy so we don't mess with the original one
    tuples = input[:]  # [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
    last_elem = tuples.pop(0)  # (10,7)
    # { "first tuple's element": "index in list"}
    indexes = {tup[0]: i for i, tup in enumerate(tuples)}  # {9: 3, 4: 0, 13: 1, 7: 2}
    yield last_elem  # yields the first element
    for i in range(len(tuples)):
        # find the index of the tuple whose first element matches the second element of the last tuple
        list_index = indexes.get(last_elem[1])
        last_elem = tuples[list_index]  # just get that tuple
        yield last_elem
Output:
input = [(10,7), (4,9), (13, 4), (7, 13), (9, 10)]
print(list(match_tuples(input)))
# output: [(10, 7), (7, 13), (13, 4), (4, 9), (9, 10)]
This is a variant (less efficient than the dictionary version) where the list is changed in-place:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
for i in range(1, len(tpl)-1):  # iterate over the indices of the list
    item = tpl[i-1]  # the tuple already in place just before slot i
    for j, next_item in enumerate(tpl[i:]):  # find the next item
                                             # in the remaining list
        if next_item[0] == item[1]:
            next_index = i + j
            break
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]  # now swap the items
Here is a more efficient version of the same idea:
tpl = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
start_index = {item[0]: i for i, item in enumerate(tpl)}
item = tpl[0]
next_index = start_index[item[-1]]
for i in range(1, len(tpl)-1):
    tpl[i], tpl[next_index] = tpl[next_index], tpl[i]
    # need to update the start indices:
    start_index[tpl[next_index][0]] = next_index
    start_index[tpl[i][0]] = i
    next_index = start_index[tpl[i][-1]]
print(tpl)
The list is changed in-place; the dictionary only contains the starting values of the tuples and their index in the list.
To get an O(n) algorithm, one needs to make sure not to do a double loop over the array. One way to do this is to keep already processed values in some sort of lookup table (a dict would be a good choice).
For example something like this (I hope the inline comments explain the functionality well). This modifies the list in-place and should avoid unnecessary (even implicit) looping over the list:
inp = [(10, 7), (4, 9), (13, 4), (7, 13), (9, 10)]
# A dictionary containing processed elements, first element is
# the key and the value represents the tuple. This is used to
# avoid the double loop
seen = {}
# The second value of the first tuple. This must match the first
# item of the next tuple
current = inp[0][1]
# Iteration to insert the next element
for insert_idx in range(1, len(inp)):
    # print('insert', insert_idx, seen)
    # If the next value was already found no need to search, just
    # pop it from the seen dictionary and continue with the next loop
    if current in seen:
        item = seen.pop(current)
        inp[insert_idx] = item
        current = item[1]
        continue
    # Search the list until the next value is found saving all
    # other items in the dictionary so we avoid to do unnecessary iterations
    # over the list.
    for search_idx in range(insert_idx, len(inp)):
        # print('search', search_idx, inp[search_idx])
        item = inp[search_idx]
        first, second = item
        if first == current:
            # Found the next tuple, break out of the inner loop!
            inp[insert_idx] = item
            current = second
            break
        else:
            seen[first] = item