Combine two lists: aggregate values that have similar keys

I have two or more lists, something like this:
listX = [('A', 1, 10), ('B', 2, 20), ('C', 3, 30), ('D', 4, 30)]
listY = [('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20),
         ('A', 6, 60), ('D', 7, 70)]
I want the result to contain every element of listX + listY, but with duplicated keys merged.
For example, the keys of ('A', 1, 10) and ('D', 4, 30) from listX also exist in listY, so the result should look like this:
result = [('A', 7, 70), ('B', 2, 20), ('C', 3, 30), ('D', 11, 100),
          ('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20)]
('A', 7, 70) is obtained by adding ('A', 1, 10) and ('A', 6, 60) together.
Could anybody help me solve this problem?
Thanks.

This is pretty easy if you use a dictionary.
combined = {}
for item in listX + listY:
    key = item[0]
    if key in combined:
        combined[key][0] += item[1]
        combined[key][1] += item[2]
    else:
        combined[key] = [item[1], item[2]]
result = [(key, value[0], value[1]) for key, value in combined.items()]

You appear to be using lists like a dictionary. Is there a reason you're using lists instead of dictionaries?
My understanding of this garbled question is that you want to add up the values in tuples whose first element is the same.
I'd do something like this:
# store the running sums in lists, because tuples cannot be modified in place
counter = dict(
    (a[0], [a[1], a[2]])
    for a in listX
)
for key, v1, v2 in listY:
    if key not in counter:
        counter[key] = [0, 0]
    counter[key][0] += v1
    counter[key][1] += v2
result = [(key, value[0], value[1]) for key, value in counter.items()]

I'd say use a dictionary:
result = {}
for eachlist in (listX, listY):
    for item in eachlist:
        if item[0] not in result:
            result[item[0]] = item
        else:
            # key already seen: add the numeric fields together
            old = result[item[0]]
            result[item[0]] = (old[0], old[1] + item[1], old[2] + item[2])
It's always tricky to do data manipulation when your data is in a structure that doesn't represent it well. Consider using better data structures.

Use a dictionary and its 'get' method.
d = {}
for x in (listX + listY):
    y = d.get(x[0], (0, 0, 0))
    d[x[0]] = (x[0], x[1] + y[1], x[2] + y[2])
list(d.values())
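For more than two lists, the same idea can be wrapped in a small helper. This is only a sketch: the name combine_lists is made up for illustration, and it assumes every item is a (key, number, number) tuple and that keys are case-sensitive, as in the question.

```python
from collections import defaultdict
from itertools import chain

def combine_lists(*lists):
    """Sum the 2nd and 3rd fields of all tuples that share the same 1st field."""
    # hypothetical helper; key -> [sum of 2nd field, sum of 3rd field]
    totals = defaultdict(lambda: [0, 0])
    for key, v1, v2 in chain(*lists):
        totals[key][0] += v1
        totals[key][1] += v2
    # dicts keep insertion order on Python 3.7+, so keys come out in first-seen order
    return [(key, s1, s2) for key, (s1, s2) in totals.items()]

listX = [('A', 1, 10), ('B', 2, 20), ('C', 3, 30), ('D', 4, 30)]
listY = [('a', 5, 50), ('b', 4, 40), ('c', 3, 30), ('d', 1, 20),
         ('A', 6, 60), ('D', 7, 70)]
result = combine_lists(listX, listY)
print(result)
```

Because the helper takes any number of lists, a third list can simply be passed as another argument.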

Related

Create a nested list based on tree relationship

I have a datatype that consists of multiple tuples in a list. It represents the relationship of parent-child.
For example, [('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('C',)] where the tuples can have one, two, or three items in the format (letter, number, number). In the above example, ('B', 1) is the parent of ('B', 1, 1) and ('B', 1, 2), and so on until we reach just a letter.
My question is, how can I create a function that will receive something like the example above and create a nested list where the similar orders and letters/numbers will be grouped together.
For instance, how do I create a function that will take something like:
[('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('B', 2), ('B', 3), ('C',)]
and turn it into:
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)], ('B', 2), ('B', 3)], ('C',)]
Also note that the list will come presorted already in alphabetical and numerical order. Only the lowest order tuples are in the input list as well. (Parental tuples will not appear in the input list if their children are present)
Thanks!

We basically can iterate over the tuples, and for each tuple recursively "dive" into the data structure, and add that element. I think however that a list is, at least for an intermediate structure, not appropriate. A dictionary allows fast retrieval, hence it will boost updating.
def to_nested_list(tuples):
    data = {}
    for tup in tuples:
        elem = data
        for ti in tup:
            elem = elem.setdefault(ti, {})
    stck = []
    def to_list(source, dest):
        for k, v in source.items():
            stck.append(k)
            if v:
                dest.append(to_list(v, []))
            else:
                dest.append(tuple(stck))
            stck.pop()
        return dest
    return to_list(data, [])
For the given sample data, we thus first construct a dictionary that looks, before the stck = [] line, like:
{'A': {1: {}, 2: {1: {}, 2: {}}, 3: {}}, 'B': {1: {1: {}, 2: {}}}, 'C': {}}
Next we "harvest" the tuples of this structure by iterating over the dictionary recursively. Whenever the corresponding value is empty (a leaf), we append a tuple built from the "call path"; otherwise we append the sublist produced by the recursive call.
For example:
>>> to_nested_list([('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('C',)])
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)]], ('C',)]
This works for tuples of arbitrary length, as long as the elements of these tuples are all hashable (strings and integers are hashable, so we are safe here if the tuples contain only letters and numbers).
That being said, I'm not sure that using a nested list is a good idea anyway. With such a list it can take a lot of time to verify that it contains a certain tuple, since the elements of the list do not "hint" at the prefix of that tuple. I think the data dictionary is probably a better representation.
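To illustrate why the dictionary makes lookups cheap, here is a minimal sketch. The helper names build_tree and has_path are made up for this example and are not part of the answer above; has_path also returns True for a prefix such as ('A', 2), which may or may not be what you want.

```python
def build_tree(tuples):
    """Build the same nested dictionary as in to_nested_list (hypothetical helper)."""
    data = {}
    for tup in tuples:
        node = data
        for part in tup:
            node = node.setdefault(part, {})
    return data

def has_path(data, tup):
    """Walk one dictionary level per tuple element; cost depends on len(tup) only."""
    node = data
    for part in tup:
        if part not in node:
            return False
        node = node[part]
    return True

tree = build_tree([('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3),
                   ('B', 1, 1), ('B', 1, 2), ('C',)])
print(has_path(tree, ('A', 2, 2)))  # True
print(has_path(tree, ('B', 3)))     # False
```

Each lookup touches at most one dictionary per tuple element, whereas the nested list would have to be searched element by element.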

Set
a = [('A', 1), ('A', 2, 1), ('A', 2, 2), ('A', 3), ('B', 1), ('B', 1, 1), ('B', 1, 2), ('B', 2), ('B', 3), ('C',)]
The following solution works for trees with any depth:
First, a helper function that wraps each node with excess brackets if needed
def self_wrap(x, n):
    output = x
    for _ in range(n):
        output = [output]
    return output
Now, the main loop:
out_list = []
for i in range(len(a)):
    # add 0th element to out_list
    if i == 0:
        out_list.append(self_wrap(a[i], len(a[i])-1))
        continue
    # determine the appropriate bracket level to add a[i]
    prev_list = curr_list = out_list
    j = 0
    while min(len(a[i-1]), len(a[i])) > j and a[i-1][j] == a[i][j]:
        prev_list, curr_list = curr_list, curr_list[-1]
        j += 1
    left_over_len = len(a[i]) - j - 1
    # override if last item was parent
    if j == len(a[i-1]):
        prev_list[-1] = self_wrap(a[i], left_over_len + 1)
        continue
    # append a[i] to appropriate level and wrap with additional brackets if needed
    curr_list.append(self_wrap(a[i], left_over_len) if left_over_len > 0 else a[i])
print(out_list)
This prints
[[('A', 1), [('A', 2, 1), ('A', 2, 2)], ('A', 3)], [[('B', 1, 1), ('B', 1, 2)], ('B', 2), ('B', 3)], ('C',)]
as expected.
As people have pointed out, this structure is not very efficient, for two reasons: it stores redundant information, and lists are hard to manipulate and search.
That being said, this is probably the only way to represent paths.

Creating Python defaultdict using nested list of tuples

The scenario is that I have a 2-D list. Each item of the inner list is tuple (key, value pair). The key might repeat in the list. I want to create a default-dict on the fly, in such a way that finally, the dictionary stores the key, and the cumulative sum of all the values of that key from the 2-D list.
Here is the code:
from collections import defaultdict
listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
finalDict = defaultdict(int)
for eachItem in listOfItems:
    for key, val in eachItem:
        finalDict[key] += val
print(finalDict)
This gives me what I want: defaultdict(<class 'int'>, {'a': 7, 'b': 5, 'c': 0, 'd': 5}), but I am looking for a more 'Pythonic' way using comprehensions. So I tried the following:
finalDict = defaultdict(int)
finalDict = {key: finalDict[key] + val for eachItem in listOfItems for key, val in eachItem}
print(finalDict)
But the output is {'a': 6, 'b': 2, 'c': 0, 'd': 5}. What am I doing wrong? Or is it that, when using a comprehension, the dictionary is not created and modified on the fly?

Yes, a dictionary being built by a comprehension can't be read or updated on the fly. Anyway, this task might be better suited to collections.Counter() with .update() calls:
>>> from collections import Counter
>>> c = Counter()
>>> for eachItem in listOfItems:
...     c.update(dict(eachItem))
...
>>> c
Counter({'a': 7, 'b': 5, 'd': 5, 'c': 0})
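One caveat, added here as an aside: dict(eachItem) keeps only the last value when a key occurs twice inside the same inner list. A variant that adds each pair individually avoids that. This is a sketch, and the sample data with the extra ('a', 10) is made up to show the difference.

```python
from collections import Counter
from itertools import chain

# made-up data: 'a' appears twice in the first inner list
listOfItems = [[('a', 1), ('b', 3), ('a', 10)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]

c = Counter()
for key, val in chain.from_iterable(listOfItems):
    c[key] += val  # missing keys start at 0, like defaultdict(int)

print(dict(c))  # {'a': 17, 'b': 5, 'c': 0, 'd': 5}
```

With the dict(eachItem) version, the same data would give 16 for 'a', because ('a', 1) is overwritten by ('a', 10) before the update.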

This happens because nothing is ever assigned to finalDict while the comprehension runs: finalDict[key] always yields the default 0, so each entry is just val, and a repeated key overwrites the earlier entry.
The comprehension builds a brand-new plain dict, and finalDict is rebound to it (changing its type from defaultdict to dict) only after the comprehension has finished.
As far as I know, you cannot assign values to the dict under construction from inside a dict comprehension.
Here is a way to get the dictionary you want:
from functools import reduce
listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
list_dict = [{key: val} for eachItem in listOfItems for key, val in eachItem]
def sum_dict(x, y):
    return {k: x.get(k, 0) + y.get(k, 0) for k in set(x) | set(y)}
print(reduce(sum_dict, list_dict))

Simple solution without using additional modules:
inp_list = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
l = [item for sublist in inp_list for item in sublist] # flatten the list
sums = [(key, sum([b for (a,b) in l if a == key])) for key in dict(l)]
print(sums)

Here I try to use Python's built-in tools instead of coding the functionality myself.
The long, explained solution:
from itertools import chain, groupby
from operator import itemgetter
listOfItems = [[('a', 1), ('b', 3)], [('a', 6)], [('c', 0), ('d', 5), ('b', 2)]]
# flatten the list of lists into a single iterable
flatten_list = chain(*listOfItems)
# group all elements by key, e.g. 'a', 'b', etc. (groupby needs sorted input)
first = itemgetter(0)
groupedByKey = groupby(sorted(flatten_list, key=first), key=first)
# sum the values within each group
summed_by_key = ((k, sum(item[1] for item in tups_to_sum)) for k, tups_to_sum in groupedByKey)
# create a dict
d = dict(summed_by_key)
print(d) # {'a': 7, 'b': 5, 'c': 0, 'd': 5}
The (almost) one-line solution:
from itertools import chain, groupby
from operator import itemgetter
first = itemgetter(0)
d = dict((k, sum(item[1] for item in tups_to_sum)) for k, tups_to_sum in groupby(sorted(chain(*listOfItems), key=first), key=first))
print(d) # {'a': 7, 'b': 5, 'c': 0, 'd': 5}

Sum numbers by letter in list of tuples

I have a list of tuples:
[ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
I am trying to sum up all numbers that have the same letter. I.e. I want to output
[('A', 150), ('B', 70), ('C',10)]
I have tried using set to get the unique values but then when I try and compare the first elements to the set I get
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Any quick solutions to match the numbers by letter?

Here is a one(and a half?)-liner: group by letter (for which you need to sort before), then take the sum of the second entries of your tuples.
from itertools import groupby
from operator import itemgetter
data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
res = [(k, sum(map(itemgetter(1), g)))
       for k, g in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0))]
print(res)
# => [('A', 150), ('B', 70), ('C', 10)]
The above is O(n log n), because sorting is the most expensive operation. If your input list is truly large, you might be better served by the following O(n) approach:
from collections import defaultdict
data = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
d = defaultdict(int)
for letter, value in data:
    d[letter] += value
res = list(d.items())
print(res)
# => [('A', 150), ('B', 70), ('C', 10)] on Python 3.7+, where dicts keep insertion order; the order is arbitrary on older versions

>>> from collections import Counter
>>> items = [('A', 100), ('B', 50), ('A', 50), ('B', 20), ('C', 10)]
>>> c = Counter()
>>> for k, num in items:
...     c[k] += num
...
>>> sorted(c.items())
[('A', 150), ('B', 70), ('C', 10)]
A less efficient (but nicer looking) one-liner version, which only works for non-negative integer values:
>>> sorted(Counter(k for k, num in items for i in range(num)).items())
[('A', 150), ('B', 70), ('C', 10)]

How about this (assuming a is the name of the list of tuples you have provided):
letters_to_numbers = {}
for i in a:
    if i[0] in letters_to_numbers:
        letters_to_numbers[i[0]] += i[1]
    else:
        letters_to_numbers[i[0]] = i[1]
b = list(letters_to_numbers.items())
The elements of the resulting list b are in no guaranteed order on Python versions before 3.7.

To achieve this, first create a dictionary to store the sums, then convert the dict to a list of tuples using .items(). Here is sample code:
my_list = [ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
my_dict = {}
for key, val in my_list:
    if key in my_dict:
        my_dict[key] += val
    else:
        my_dict[key] = val
list(my_dict.items())
# Output: [('A', 150), ('B', 70), ('C', 10)]

What is generating the list of tuples? Is it you? If so, why not try a defaultdict(list) to append the values to the right letter at the time of making the list of tuples. Then you can simply sum them. See example below.
>>> from collections import defaultdict
>>> val_store = defaultdict(list)
>>> # next lines are me simulating the creation of the tuple
>>> val_store['A'].append(10)
>>> val_store['B'].append(20)
>>> val_store['C'].append(30)
>>> val_store
defaultdict(<class 'list'>, {'C': [30], 'A': [10], 'B': [20]})
>>> val_store['A'].append(10)
>>> val_store['C'].append(30)
>>> val_store['B'].append(20)
>>> val_store
defaultdict(<class 'list'>, {'C': [30, 30], 'A': [10, 10], 'B': [20, 20]})
>>> for val in val_store:
...     print(val, sum(val_store[val]))
...
C 60
A 20
B 40

Try this:
a = [('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
letters = set(s[0] for s in a)
new_a = []
for l in letters:
    nums = [s[1] for s in a if s[0] == l]
    new_a.append((l, sum(nums)))
print(new_a)
Results (the order may vary, because sets are unordered):
[('A', 150), ('C', 10), ('B', 70)]

A simpler approach:
x = [('A',100),('B',50),('A',50),('B',20),('C',10)]
y = {}
for _tuple in x:
    if _tuple[0] in y:
        y[_tuple[0]] += _tuple[1]
    else:
        y[_tuple[0]] = _tuple[1]
print(list(y.items()))

A one-liner:
>>> from functools import reduce  # needed on Python 3
>>> x = [ ('A',100), ('B',50), ('A',50), ('B',20), ('C',10) ]
>>> sorted({
...     k: reduce(lambda u, v: u + v, [y[1] for y in x if y[0] == k])
...     for k in [y[0] for y in x]
... }.items())
[('A', 150), ('B', 70), ('C', 10)]

Returning all keys that have the same corresponding value in a dictionary with python

I'm new to this site and have a problem I need some help with. I am trying to find the highest integer value in a dictionary and its corresponding key, and then check whether other keys have the same value. If there are duplicate values, I want to randomly select one of those keys and return it. As of now, the code finds the highest value in the dictionary and returns its key, but it returns the same key each time. I'm not able to check for other keys with the same value.
def lvl2():
    global aiMove2
    posValueD = {}
    for x in moveList():  # moveList returns a list of tuples
        m = aiFlip(x)  # aiFlip returns an integer
        posValueD[x] = m
    aiMove2 = max(posValueD, key=posValueD.get)
    return aiMove2

After getting the maximum, you can compare the value of every key against it. This list comprehension returns the keys whose value is the same as the value of aiMove2.
keys = [x for x,y in posValueD.items() if y == posValueD[aiMove2]]
Here's an example in Python shell:
>>> a = {'a':1, 'b':2, 'c':2}
>>> [x for x,y in a.items() if y == 2]
['c', 'b']
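Putting the pieces together with the random selection the question asks for might look like the sketch below. The function name pick_best_move and the sample scores are invented for illustration; only max(), the list comprehension, and random.choice() from the standard library are used.

```python
import random

def pick_best_move(scores):
    """Return a random key among those that share the highest value (hypothetical helper)."""
    if not scores:
        return None  # nothing to choose from
    best = max(scores.values())
    candidates = [key for key, value in scores.items() if value == best]
    return random.choice(candidates)

# made-up data: two moves tie for the highest score
scores = {(0, 1): 3, (2, 2): 5, (4, 0): 5, (1, 3): 1}
move = pick_best_move(scores)
print(move)  # either (2, 2) or (4, 0)
```

Calling the function repeatedly should return each tied key some of the time, instead of always the same one.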

You could write something like this:
import random
def pick_max_key(myDict):  # wrapper name chosen for illustration
    max_value = float('-inf')  # so that negative values work too
    max_keys = []
    for key, value in myDict.items():
        if value > max_value:
            max_value = value
            max_keys = [key]
        elif value == max_value:
            max_keys.append(key)
    if max_keys:
        return random.choice(max_keys)
    return None

You could use itertools groupby:
from itertools import groupby
di={'e': 0, 'd': 1, 'g': 2, 'f': 0, 'a': 1, 'c': 3, 'b': 2, 'l': 2, 'i': 1, 'h': 3, 'k': 0, 'j': 1}
groups=[]
for k, g in groupby(sorted(di.items(), key=lambda t: (-t[1], t[0])), lambda t: t[1]):
    groups.append(list(g))
print(groups)
# [[('c', 3), ('h', 3)],
#  [('b', 2), ('g', 2), ('l', 2)],
#  [('a', 1), ('d', 1), ('i', 1), ('j', 1)],
#  [('e', 0), ('f', 0), ('k', 0)]]
Or, more succinctly:
print([list(g) for k, g in groupby(
    sorted(di.items(), key=lambda t: (-t[1], t[0])),
    lambda t: t[1])])
Then just take the first list in the groups list of lists.
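If only the top group is needed, it is not necessary to materialize all groups. As a sketch (the variable names top_value and top_keys are made up), next() can pull just the first group from groupby; note that it raises StopIteration for an empty dictionary.

```python
from itertools import groupby

di = {'e': 0, 'd': 1, 'g': 2, 'f': 0, 'a': 1, 'c': 3,
      'b': 2, 'l': 2, 'i': 1, 'h': 3, 'k': 0, 'j': 1}

# sort by descending value, then by key, so the first group holds the maximum
ordered = sorted(di.items(), key=lambda t: (-t[1], t[0]))
grouped = groupby(ordered, key=lambda t: t[1])

# consume only the first group; the rest is never built
top_value, top_group = next(grouped)
top_keys = [k for k, _ in top_group]
print(top_value, top_keys)  # 3 ['c', 'h']
```

A random key among the ties could then be picked with random.choice(top_keys).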

how to write a function to add the integer of corresponding letter in python?

How can I write a function that adds up the integers belonging to each letter in Python?
For example:
L=[('a',3),('b',4),('c',5),('a',2),('c',2),('b',1)]
How can I solve it by just looping over the items in L?

I guess the clearest way is just to loop through and add them up.
>>> L=[('a',3),('b',4),('c',5),('a',2),('c',2),('b',1)]
>>> import collections
>>> d=collections.defaultdict(int)
>>> for key,n in L:
...     d[key] += n
...
>>> sorted(d.items())
[('a', 5), ('b', 5), ('c', 7)]

You can use a dictionary for this and add up the values of repeated keys, like this:
totals = {}  # avoid naming it dict, which would shadow the built-in
for i in L:
    if i[0] in totals:
        totals[i[0]] += i[1]
    else:
        totals[i[0]] = i[1]
list(totals.items())
The output will be: [('a', 5), ('b', 5), ('c', 7)]

You can try defining a function like this:
def sorting(L):
    dit = {}
    result = []
    for l in L:
        dit[l[0]] = 0
    for key, item in dit.items():
        for ll in L:
            if key == ll[0]:
                dit[key] += ll[1]
    for key, item in dit.items():
        result.append((key, item))
    return sorted(result)
You will see the result:
>>> sorting(L)
[('a', 5), ('b', 5), ('c', 7)]

Here's the obligatory one-line itertools solution:
>>> import itertools
>>> [
... (k, sum(g[1] for g in group))
... for k, group in itertools.groupby(sorted(L), key=lambda x: x[0])
... ]
[('a', 5), ('b', 5), ('c', 7)]
