Remove duplicates from dictionary of tuples


I have the following dictionary, whose values are lists of tuples:
{0: [(1410.0, 21.77562368178597), (1400.0, 19.207664561646514), (1400.0, 0.0008002910625641847), (1390.0, 0.005384339399360756), (1390.0, 16.81119144715727), (1380.0, 0.006317587214078991), (1380.0, 14.581397723675186), (1370.0, 12.425676833176958), (1360.0, 10.157186391679849), (1350.0, 8.464056857473565), (1340.0, 6.743908971063571), (1330.0, 4.886783322196731), (1320.0, 3.712576730302521), (1310.0, 2.689847385668083), (1300.0, 1.6219146729959537), (1290.0, 0.41216337921204677)], ....etc)
In some cases, tuples share the same first element but have different second elements.
In the example above: (1400.0, 19.207664561646514) and (1400.0, 0.0008002910625641847).
What I want to do is combine such tuples into a single tuple: (1400.0, sum of the second elements).

Is this what you mean?
Suppose that d is the name of your dictionary.
d = {0: [(1410.0, 21.77562368178597), (1400.0, 19.207664561646514), (1400.0, 0.0008002910625641847), (1390.0, 0.005384339399360756), (1390.0, 16.81119144715727), (1380.0, 0.006317587214078991), (1380.0, 14.581397723675186), (1370.0, 12.425676833176958), (1360.0, 10.157186391679849), (1350.0, 8.464056857473565), (1340.0, 6.743908971063571), (1330.0, 4.886783322196731), (1320.0, 3.712576730302521), (1310.0, 2.689847385668083), (1300.0, 1.6219146729959537), (1290.0, 0.41216337921204677)]}
Now try this code:
new_d = {}
for item in d:
    # Sum the second elements per first element for this list
    summations = {}
    for key, value in d[item]:
        if key in summations:
            summations[key] += value
        else:
            summations[key] = value
    temp_list = []
    for key in summations:
        temp_list.append((key, summations[key]))
    new_d[item] = temp_list
print(new_d)

The first idea I had is two for loops. In the outer loop you take the first number of the current tuple, and in the inner loop you compare it to the first number of every other tuple in the list. If they match, you take the second number of that tuple and add it to the one from the outer loop, then you delete the tuple found in the inner loop. You could also save the positions of all tuples that have a duplicate first number and delete them in a later step. There might be a better solution for this, but this is what I would try if performance is not important.
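The two-loop idea described above could be sketched roughly as follows. This is only an illustration: the helper name merge_by_first and the shortened sample data are made up for the example, and instead of deleting tuples while looping it remembers the positions that were already merged.

```python
def merge_by_first(pairs):
    """Merge tuples that share a first element by summing their second elements."""
    merged = []
    used = set()  # positions already folded into an earlier tuple
    for i, (key, total) in enumerate(pairs):
        if i in used:
            continue
        # Inner loop: look for later tuples with the same first element
        for j in range(i + 1, len(pairs)):
            if j not in used and pairs[j][0] == key:
                total += pairs[j][1]
                used.add(j)
        merged.append((key, total))
    return merged

# Hypothetical, shortened sample data in the same shape as the question
d = {0: [(1400.0, 19.0), (1400.0, 1.0), (1390.0, 2.0), (1390.0, 3.0), (1380.0, 5.0)]}
new_d = {k: merge_by_first(v) for k, v in d.items()}
print(new_d)
```

This compares every tuple with every later one, so it is quadratic in the length of each list; the dictionary-based answer does the same job in a single pass.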

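If you only need to drop tuples that are exact duplicates of each other, collect them in a set:
s = set()
for tuples in dict_of_tuples.values():
    s.update(tuples)
Demo:
>>> dict_of_tuples = {0: [(1, 2), (3, 3), (3, 3)]}
>>> s = set()
>>> for tuples in dict_of_tuples.values():
...     s.update(tuples)
...
>>> s == {(1, 2), (3, 3)}
True
Note that this only removes identical tuples; it does not sum tuples that merely share a first element.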

Related

Adding an element in a nested list based on condition

I was wondering if you could help me simplify my code or find a more efficient method. I am given a nested list and I wish to sum the second elements, grouped by the first item of each pair.
[('dog','1'),('dog','2'),('cat','1'),('cat','2')]
This will result in:
[('dog','3'),('cat','3')]
I would want the numbers to be strings instead of int type. Here is my code below:
dddd=[]
dddd=result_1_ce+result_2_ce+result_3_ce+result_4_ce
#Sum all of the elements from a prior find; dddd stores [('dog','1'),('dog','2'),('cat','1'),('cat','2')]
newlist = [[int(element) if element.isdigit() else element for element in sub] for sub in dddd]
grouped = dict()
grouped.update((name,grouped.get(name,0)+value) for name,value in newlist)
dddd = [*map(list,grouped.items())]
#Of this manipulation display it in reverse order
dddd=sorted(dddd,key=lambda x:x[1],reverse=True)
X = [tuple(i) for i in dddd]
print("Findings:",X)

This code works.
I changed one line of your code and marked it with a comment.
dddd=result_1_ce+result_2_ce+result_3_ce+result_4_ce
#Sum all of the elements from a prior find; dddd stores [('dog','1'),('dog','2'),('cat','1'),('cat','2')]
newlist = [[int(element) if element.isdigit() else element for element in sub] for sub in dddd]
grouped = dict()
grouped.update((name,grouped.get(name,0)+value) for name,value in newlist)
dddd = [*map(list,grouped.items())]
#Of this manipulation display it in reverse order
dddd=sorted(dddd,key=lambda x:x[1],reverse=True)
X = [tuple([f,str(s)]) for f,s in dddd] # unpack both elements of each inner list and convert the second one (s) to str
print("Findings:",X)
OUTPUT
Findings: [('dog', '3'), ('cat', '3')]
Before that last step, your dddd list looks like this: [['dog', 3], ['cat', 3]].
# If I write this
dddd = [['dog', 3], ['cat', 3]]
for f,s in dddd: # f takes 'dog' and 'cat'; s takes 3 and 3
    print(f)
    print(s)

It seems to me that a very simple approach is to convert to a dictionary first, since it is a good data structure for grouping. Sum the numbers as integers. If you are unsure whether each value will be an int or a str, wrap it as int(str(value)) so both cases work. Then, to get the output as a list of tuples, convert back with a simple comprehension.
l = [("dog", "1"), ("dog", "2"), ("cat", 1), ("cat", "2")]
d = {}
for t in l:
    d[t[0]] = d.setdefault(t[0], 0) + int(str(t[1]))
print([(k, str(v)) for k, v in d.items()])
Output:
[('dog', '3'), ('cat', '3')]

Python - list + tuples length of elements

I'm having some issues trying to figure this out (I'm a complete beginner in Python).
I have a list of names:
names_2 = ["Lars", "Per", "Henrik"]
I need to convert it into a tuple that holds each element followed by its length.
I tried this:
namesTuple = tuple(names_2 + [len(name) for name in names_2])
The output of this is: ('Lars', 'Per', 'Henrik', 4, 3, 6)
The output I'm looking for is: ('Lars', 4, 'Per', 3, 'Henrik', 6)
Can anyone help?

You can use a nested generator expression in the tuple constructor, for instance:
names_tuple = tuple(x for name in names_2 for x in (name, len(name)))
# ('Lars', 4, 'Per', 3, 'Henrik', 6)
If you were to build it in a looping approach, it makes sense to build a list first (tuples are immutable):
names = []
for name in names_2:
    # extend (both at once)
    names.extend((name, len(name)))
    # OR append one by one (no spurious intermediate tuple)
    # names.append(name)
    # names.append(len(name))
names_tuple = tuple(names)

names_2 = ["Lars", "Per", "Henrik"]
names = []
for name in names_2:
    names.append(name)
    names.append(len(name))
names = tuple(names)
Iterate over the names, append the name itself and its length to a list, and convert the list to tuple.
Or as a one-liner (but you'll end up with a tuple of tuples):
names_2 = ["Lars", "Per", "Henrik"]
names = tuple((name, len(name)) for name in names_2)

Zip the list of names with the list of lengths, then flatten the resulting list and convert that to a tuple.
from itertools import chain
namesTuple = tuple(chain.from_iterable(zip(names_2, map(len, names_2))))
If you prefer something a little less "functional", you can use a generator expression.
namesTuple = tuple(chain.from_iterable((x, len(x)) for x in names_2))
or (repeating @schwobaseggl's answer)
namesTuple = tuple(value for name in names_2 for value in (name, len(name)))

First, create a tuple of tuples: ((name_1, length_1), (name_2, length_2), ...)
The zip function exists for exactly that.
Second, flatten this tuple of tuples.
[In]
names = ["Lars", "Per", "Henrik"]
[In]
zip_tupled = tuple(zip(names, [len(x) for x in names]))
[Out]
zip_tupled = (('Lars', 4), ('Per', 3), ('Henrik', 6))
[In]
final = tuple(item for subtuple in zip_tupled for item in subtuple)
[Out]
final = ('Lars', 4, 'Per', 3, 'Henrik', 6)
This solution is quite close to schwobaseggl's, but less direct.
See also the Stack Overflow question on how to flatten a list.

Make a list with the most frequent tuple of a dictionary according to the first element

I'm trying to make a list that contains the most frequent tuple of a dictionary for each first element. For example:
If d is my dictionary:
d = {('Hello', 'my'): 1, ('Hello', 'world'): 2, ('my', 'name'): 3, ('my', 'house'): 1}
I want to obtain a list like this:
L = [('Hello', 'world'), ('my', 'name')]
So I tried this:
L = [k for k,val in d.iteritems() if val == max(d.values())]
But that only gives me the max of all the tuples:
L = [('my', 'name')]
I was thinking that maybe I have to go through my dictionary, make a new one for every first word of each tuple, then find the most frequent tuple in each and put it in a list, but I'm having trouble translating that into code.

from itertools import groupby
# your input data
d = {('Hello', 'my'): 1,('Hello', 'world'):2, ('my', 'name'):3, ('my','house'):1}
key_fu = lambda x: x[0][0] # first element of first element,
# i.e. of ((a,b), c), return a
groups = groupby(sorted(d.items(), key=key_fu), key_fu)
l = [max(g, key=lambda x:x[1])[0] for _, g in groups]

This is achievable in O(n) if you just re-key the mapping off the first word:
>>> d = {('Hello','my'): 1, ('Hello','world'): 2, ('my','name'): 3, ('my','house'): 1}
>>> d_max = {}
>>> for (first, second), count in d.items():
...     if count >= d_max.get(first, (None, 0))[1]:
...         d_max[first] = (second, count)
...
>>> d_max
{'Hello': ('world', 2), 'my': ('name', 3)}
>>> output = [(first, second) for (first, (second, count)) in d_max.items()]
>>> output
[('my', 'name'), ('Hello', 'world')]

In my opinion, you should not take the max over all the values of d; that just gives you the biggest value in the whole dictionary, which is 3 in this case.
What I would do is create an intermediate list (this can probably be hidden inside a helper) that stores the counter as the first element of each entry and the parts of the key after it. That way you can sort the list and take the first entry for each first word to get the key with the real maximum.
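One way to read the intermediate-list idea above is sketched below. The names ranked, seen and result are just illustrative, and the result order follows the counts (highest first), not the order of the original dictionary.

```python
d = {('Hello', 'my'): 1, ('Hello', 'world'): 2, ('my', 'name'): 3, ('my', 'house'): 1}

# Intermediate list with the count first, so sorting orders entries by count
ranked = sorted(
    ((count, first, second) for (first, second), count in d.items()),
    reverse=True,
)

seen = set()
result = []
for count, first, second in ranked:
    if first not in seen:  # first hit is the highest count for this first word
        seen.add(first)
        result.append((first, second))

print(result)
```

Sorting makes this O(n log n); if that matters, re-keying the mapping on the first word in a single pass avoids the sort.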

You have pairs of words and a count associated to each of them. You could store your information in (or convert it to) 3-tuples:
d = [
('Hello', 'my', 1),
('Hello', 'world', 2),
('my', 'name', 3),
('my', 'house', 1)
]
For each word in the first position, you want to find the word in second position that occurs most frequently. Sort the data by the first word (any order, just to group them), then by the count (descending).
d.sort(key=lambda t: (t[0], -t[2]))
Finally, iterate through the resulting list, keeping track of the last word encountered, and append only when encountering a new word in first position.
L = []
last_word = ""
for word1, word2, count in d:
    if word1 != last_word:
        L.append((word1, word2))
        last_word = word1
print(L)
By running this code, you obtain [('Hello', 'world'), ('my', 'name')].

Loop through entries in a list and create new list

I have a list of strings that looks like that
name=['Jack','Sam','Terry','Sam','Henry',.......]
I want to create a newlist with the logic shown below. I want to go to every entry in name and assign it a number if the entry is seen for the first time. If it is being repeated(as in the case with 'Sam') I want to assign it the corresponding number, include it in my newlist and continue.
newlist = []
name[1] = 'Jack'
Jack = 1
newlist = ['Jack']
name[2] = 'Sam'
Sam = 2
newlist = ['Jack','Sam']
name[3] = 'Terry'
Terry = 3
newlist = ['Jack','Sam','Terry']
name[4] = 'Sam'
Sam = 2
newlist = ['Jack','Sam','Terry','Sam']
name[5] = 'Henry'
Henry = 5
newlist = ['Jack','Sam','Terry','Sam','Henry']
I know this can be done with something like
u,index = np.unique(name,return_inverse=True)
but for me it is important to loop through the individual entries of the list name and keep the logic above. Can someone help me with this?

Try using a dict and checking if keys are already paired to a value:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    if entry not in vals:
        vals[entry] = i + 1
    i += 1
print(vals)
Result:
{'Henry': 5, 'Jack': 1, 'Sam': 2, 'Terry': 3}
Elements can be accessed by "index" (read: key) just like you would do for a list, except the "index" is whatever the key is; in this case, the keys are names.
>>> vals['Henry']
5
EDIT: If order is important, you can enter the items into the dict using the number as the key: in this way, you will know which owner is which based on their number:
name = ['Jack','Sam','Terry','Sam','Henry']
vals = {}
i = 0
for entry in name:
    #Check if entry is a repeat
    if entry not in name[0:i]:
        vals[i + 1] = entry
    i += 1
print (vals)
print (vals[5])
This code uses the order in which they appear as the key. To make sure we don't overwrite or create duplicates, it checks if the current name has appeared before in the list (anywhere from 0 up to i, the current index in the name list).
In this way, it is still in the "sorted order" which you want. Instead of accessing items by the name of the owner you simply index by their number. This will give you the order you desire from your example.
Result:
>>> vals
{1: 'Jack', 2: 'Sam', 3: 'Terry', 5: 'Henry'}
>>> vals[5]
'Henry'

If you really want to create variables, you can do it through globals(), as shown here for global variables.
globals() returns the dictionary that serves as the lookup table from global variable names to their values, so adding a key and value to it creates a variable. locals() returns the analogous dictionary for the local scope, but changes to it are not guaranteed to create local variables inside a function.
nl = ['Jack','Sam','Terry','Sam','Henry']
var = globals()
for i,n in enumerate(nl,1):
    if n not in var:
        var[n] = i
print({n: var[n] for n in nl})
{'Jack': 1, 'Sam': 2, 'Terry': 3, 'Henry': 5}
print(Jack)
1

If the order of the original list is key, may I suggest two data structures: a dictionary and a new list.
nl = ['Jack','Sam','Terry','Sam','Henry']
d = {}
newlist = []
for i,n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
    newlist.append({n: d[n]})
newlist will return
[{'Jack': [1]}, {'Sam': [2]}, {'Terry': [3]}, {'Sam': [2]}, {'Henry': [5]}]
To walk it:
for names in newlist:
    for k, v in names.items():
        print('{} is number {}'.format(k, v))
NOTE: This does not make it easy to look up the number from the name, as the other answers above do; that would require an additional data structure. It does, however, preserve the original list order while keeping track of where each name was first found.
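If both things are wanted (lookup by name and the per-entry numbers in list order), a loop along these lines might do. It is only a sketch; first_seen and numbers are illustrative names, and nl is the same sample list used in the other answers.

```python
nl = ['Jack', 'Sam', 'Terry', 'Sam', 'Henry']

first_seen = {}  # name -> 1-based position of its first occurrence
numbers = []     # number assigned to each entry, in list order
for position, n in enumerate(nl, 1):
    if n not in first_seen:
        first_seen[n] = position
    numbers.append(first_seen[n])

print(first_seen)
print(numbers)
```

Here first_seen answers "which number does Sam have?", while numbers lines up one-to-one with nl, so a repeated name simply repeats its number.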

Edit: Since order is important to you, use OrderedDict from the collections module.
Use a dictionary. Iterate over your list with a for loop and check with an if statement whether the name is already in the dictionary. enumerate gives you the index of each name, but keep in mind that indices start at 0, so to match your question we add 1 to the index, which makes the numbering appear to start from 1.
import collections
nl = ['Jack','Sam','Terry','Sam','Henry']
d = collections.OrderedDict()
for i,n in enumerate(nl):
    if n not in d:
        d[n] = [i+1]
print(d)
Output:
OrderedDict([('Jack', [1]), ('Sam', [2]), ('Terry', [3]), ('Henry', [5])])
EDIT:
The ordered dict is still a dictionary, so you can use .items() to get the key-value pairs as tuples. Each number is stored in a one-element list, so you can do this:
for i in d.items():
    # i[1] is the list stored for that name; [0] takes its first element
    print('{} = {}'.format(i[0], i[1][0]))
Output:
Jack = 1
Sam = 2
Terry = 3
Henry = 5

Sort a list of tuples using index position of items in another list

I have a base list.
li = ['fca', 'fc_add', 'fca_2', 'fcadd_2', 'Red_Exis', 'G_Exis', 'P_Exis',
'fam_1']
and want to use the index position of items in the list to sort the following list of tuples.
tup= [('G_Exis','abc'), ('fca','210Y'), ('Red_Exis', 107),
('fc_add','999 Des ST.')]
I need the final sorted list to look like the following:
fin_tup = [('fca','210Y'), ('fc_add','999 Des ST.'), ('Red_Exis', 107),
('G_Exis','abc')]

index_dict = {item: index for index, item in enumerate(li)}
tup.sort(key=lambda t: index_dict[t[0]])
print(tup)
Output
[('fca', '210Y'), ('fc_add', '999 Des ST.'), ('Red_Exis', 107),
('G_Exis', 'abc')]
Or without sorting:
index_dict = {item: index for index, item in enumerate(li)}
fin_tup = [None]*len(li)
for t in tup:
    fin_tup[index_dict[t[0]]] = t
fin_tup = [t for t in fin_tup if t is not None]

An easy way to do this is to not actually sort it, but store it in a dictionary using the indices of the key on which you are sorting as the dictionary keys :
>>> d ={}
>>> for i in tup : d[li.index(i[0])] = i
...
>>> [d[k] for k in sorted(d)]
[('fca', '210Y'), ('fc_add', '999 Des ST.'), ('Red_Exis', 107), ('G_Exis', 'abc')]
This is the easy way, not the efficient one.

Use sort()'s key parameter. The sort value is the index in li, so pluck the first item out of each tuple and find its index in li. Done.
tup.sort(key=lambda pair: li.index(pair[0]))
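Since li.index scans the list on every call, a variant with a precomputed position lookup may be preferable for longer lists. The sketch below also assumes, purely for illustration, that keys missing from li should sort last; the ('unknown', 0) entry is a made-up example of such a key.

```python
li = ['fca', 'fc_add', 'fca_2', 'fcadd_2', 'Red_Exis', 'G_Exis', 'P_Exis', 'fam_1']
tup = [('G_Exis', 'abc'), ('fca', '210Y'), ('Red_Exis', 107),
       ('fc_add', '999 Des ST.'), ('unknown', 0)]  # 'unknown' is hypothetical

# Map each item of li to its position once, instead of calling li.index repeatedly
position = {item: index for index, item in enumerate(li)}

# Keys not found in li get len(li), which places them after every known key
fin_tup = sorted(tup, key=lambda pair: position.get(pair[0], len(li)))
print(fin_tup)
```

sorted returns a new list and leaves tup unchanged; use tup.sort with the same key to sort in place.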
