How to add N OrderedDict() in Python

Assuming I have two OrderedDict objects, I can get the result of the (+) operation as follows:
from collections import OrderedDict

dict1 = OrderedDict([(52, 0),
(53, 0),
(1, 0),
(2, 0),
(3, 0),
(4, 0),
(5, 0),
(6, 0),
(7, 0),
(8, 0),
(9, 0),
(10, 0),
(11, 1)])
dict2 = OrderedDict([(52, 0),
(53, 0),
(1, 0),
(2, 5),
(3, 0),
(4, 0),
(5, 0),
(6, 1),
(7, 0),
(8, 0),
(9, 0),
(10, 1),
(11, 1)])
dict3 = OrderedDict((k, dict1[k] + dict2[k]) for k in dict1 if k in dict2)
print(dict3)
OrderedDict([(52, 0),
(53, 0),
(1, 0),
(2, 5),
(3, 0),
(4, 0),
(5, 0),
(6, 1),
(7, 0),
(8, 0),
(9, 0),
(10, 1),
(11, 2)])
My question is: how can I generalize the above so I can get the (+) result for N OrderedDicts?

By testing each key for membership in every other dict you're essentially performing a set intersection, but you can't use Python's built-in sets directly because they are unordered.
You can work around this limitation by installing the ordered-set package. Its OrderedSet.intersection method gives you the keys common to all the dicts, ordered as they appear in the first dict; you can then iterate over those keys and build a new OrderedDict whose value for each key is the sum of that key's values across all the dicts:
from ordered_set import OrderedSet
dicts = [dict1, dict2]
common_keys = OrderedSet.intersection(*dicts)
print(OrderedDict((k, sum(d[k] for d in dicts)) for k in common_keys))
Demo: https://replit.com/#blhsing/FlawlessGrowlingAccounting
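Since OrderedSet.intersection and sum both accept an arbitrary number of inputs, the same snippet generalizes to N dicts just by growing the dicts list. A small hedged sketch (extra_dict is a made-up third operand, added only for illustration):
from collections import OrderedDict
from ordered_set import OrderedSet

# extra_dict is hypothetical; it just reuses dict1's keys with value 1.
extra_dict = OrderedDict((k, 1) for k in dict1)

dicts = [dict1, dict2, extra_dict]
common_keys = OrderedSet.intersection(*dicts)  # keys common to all dicts, in dict1's order
print(OrderedDict((k, sum(d[k] for d in dicts)) for k in common_keys))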

A naive approach using map-reduce. Note that I didn't test the following code, so it might need some adjustments:
import operator
from functools import reduce
dicts = [dict1, dict2, dict3, dict4]
dicts_keys = map(lambda d: set(d.keys()), dicts)
common_keys = set.intersection(*dicts_keys)
sum_dict = OrderedDict(
    (k, reduce(operator.add, map(lambda d: d[k], dicts)))
    for k in common_keys)
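One caveat: set.intersection discards ordering, so sum_dict may not follow dict1's key order. A hedged stdlib-only variant that keeps dict1's order (assuming the same dicts list as above) could look like:
from collections import OrderedDict
from functools import reduce
import operator

# Intersect the keys of the remaining dicts, then iterate dict1 in order and
# keep only the keys present in every dict.
common_keys = set(dicts[0]).intersection(*(set(d) for d in dicts[1:]))
sum_dict = OrderedDict(
    (k, reduce(operator.add, (d[k] for d in dicts)))
    for k in dicts[0] if k in common_keys)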

If you don't want to install an external package, a similar result can be achieved with this function:
def add_dicts(*args):
    items_list = list()
    for k in args[0]:
        if all([k in arg for arg in args[1:]]):
            value = 0
            for arg in args:
                value += arg[k]
            items_list.append((k, value))
    return OrderedDict(items_list)
To call it:
dict3 = add_dicts(dict1, dict2)
dict4 = add_dicts(dict1, dict2, dict3)
If you want to call it with a list of dictionaries:
dict_list=[dict1, dict2]
dict5 = add_dicts(*dict_list)
More information about *args can be found in this answer

Related

How to find the maximum per group in an RDD?

I'm using PySpark and I have an RDD that looks like this:
[
("Moviex", [(1, 100), (2, 20), (3, 50)]),
("MovieY", [(1, 100), (2, 250), (3, 100), (4, 120)]),
("MovieZ", [(1, 1000), (2, 250)]),
("MovieX", [(4, 50), (5, 10), (6, 0)]),
("MovieY", [(3, 0), (4, 260)]),
("MovieZ", [(5, 180)]),
]
The first element in the tuple represents the week number and the second element represents the number of viewers. I want to find the week with the most views for each movie, but ignoring the first week.
I've tried some things but nothing worked, for example:
stats.reduceByKey(max).collect()
returns:
[('MovieX', [(4, 50), (5, 10), (6, 0)]),
('MovieY', [(5, 180)]),
('MovieC', [(3, 0), (4, 260)])]
so the entire second set.
Also this:
stats.groupByKey().reduce(max)
which returns just this:
('MovieZ', <pyspark.resultiterable.ResultIterable at 0x558f75eeb0>)
How can I solve this?
If you want the most views per movie, ignoring the first week, e.g. [('MovieA', 50), ('MovieC', 250), ('MovieB', 260)], then you'll want your own map function rather than a reduce.
movie_stats = spark.sparkContext.parallelize([
("MovieA", [(1, 100), (2, 20), (3, "50")]),
("MovieC", [(1, 100), (2, "250"), (3, 100), (4, "120")]),
("MovieB", [(1, 1000), (2, 250)]),
("MovieA", [(4, 50), (5, "10"), (6, 0)]),
("MovieB", [(3, 0), (4, "260")]),
("MovieC", [(5, "180")]),
])
def get_views_after_first_week(v):
    values = iter(v)  # iterator of tuples, grouped by key
    result = list()
    for x in values:
        result.extend([int(y[1]) for y in x if y[0] > 1])
    return result
mapped = movie_stats.groupByKey().mapValues(get_views_after_first_week).mapValues(max)
mapped.collect()
To include the week number, e.g. [('MovieA', (3, 50)), ('MovieC', (2, 250)), ('MovieB', (4, 260))]:
def get_max_weekly_views_after_first_week(v):
    values = iter(v)  # iterator of tuples, grouped by key
    max_views = float('-inf')
    max_week = None
    for x in values:
        for t in x:
            week, views = t
            views = int(views)
            if week > 1 and views > max_views:
                max_week = week
                max_views = views
    return (max_week, max_views)
mapped = movie_stats.groupByKey().mapValues(get_max_weekly_views_after_first_week)
Some code is needed to convert the strings into ints, and to apply a map function that 1) filters out the week 1 data and 2) gets the week with the max views.
def helper(arr: list):
    max_week = None
    for sub_arr in arr:
        for item in sub_arr:
            if item[0] == 1:
                continue
            count = int(item[1])
            if max_week is None or max_week[1] < count:
                max_week = [item[0], count]
    return max_week
movie_stats.groupByKey().map(lambda x: (x[0], helper(x[1]))).collect()
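For what it's worth, a hedged alternative sketch that avoids groupByKey() entirely is to flatten the weekly lists first and then reduce by key; movie_stats is assumed to be the RDD defined above:
flat = movie_stats.flatMap(
    lambda kv: [(kv[0], (week, int(views))) for week, views in kv[1]])
best = (flat
        .filter(lambda kv: kv[1][0] > 1)                      # drop week 1
        .reduceByKey(lambda a, b: a if a[1] >= b[1] else b))  # keep the (week, views) pair with more views
print(best.collect())
# e.g. [('MovieA', (3, 50)), ('MovieC', (2, 250)), ('MovieB', (4, 260))] (ordering and tie-breaking may vary)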

Converting keys from float to int in dict in dict in list in dict

I have a dict, data, as below. I want to convert the float keys into integers. How do I go about it? I tried a few ways but to no avail.
data:
data = {'ABC': {'2020-09-01': [{487.0: (0, 1), 488.0: (1, 2)}, {489.0: (0, 1), 481.0: (1, 2)}]},
        'CDE': {'2020-01-01': [{484.0: (0, 1), 483.0: (1, 2)}, {482.0: (0, 1), 481.0: (1, 2)}]}}
I want this:
{'ABC': {'2020-09-01': [{487: (0, 1), 488: (1, 2)}, {489: (0, 1), 481: (1, 2)}]},
'CDE': {'2020-01-01': [{484: (0, 1), 483: (1, 2)}, {482: (0, 1), 481: (1, 2)}]}}
I understand keys are immutable, so I googled and found that "pop" is an alternative solution, but with this code I get the error "RuntimeError: dictionary keys changed during iteration":
for i in data:
    for date in data[i]:
        for model in range(0, len(data[i][date])):
            for k, v in data[i][date][model].items():
                data[i][date][model][int(k)] = data[i][date][model].pop(k)
The problem is that you are trying to modify the dictionary while iterating over it in:
for k, v in data[i][date][model].items():
    data[i][date][model][int(k)] = data[i][date][model].pop(k)
You could consider using a dict comprehension (inside a list comprehension) instead:
for k_l1, v_l1 in data.items():        # iterate first level of dict
    for k_l2, v_l2 in v_l1.items():    # iterate second level of dict
        data[k_l1][k_l2] = [{int(key): val for key, val in elt.items()} for elt in v_l2]  # update the list
Output:
{'ABC': {'2020-09-01': [{487: (0, 1), 488: (1, 2)}, {489: (0, 1), 481: (1, 2)}]}, 'CDE': {'2020-01-01': [{484: (0, 1), 483: (1, 2)}, {482: (0, 1), 481: (1, 2)}]}}
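Alternatively, if you want to keep the pop-based approach from the question, a minimal sketch (assuming the same data structure) is to iterate over a snapshot of the keys, so the dict is never modified while it is being iterated:
for outer in data.values():              # 'ABC', 'CDE', ...
    for day_list in outer.values():      # list of dicts per date
        for model in day_list:
            for k in list(model.keys()): # snapshot of the float keys
                model[int(k)] = model.pop(k)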

Extract data from a file as a dictionary

Could you please advise how to read data from a file as a dict?
The file contains the following lines:
{'foo1': (0, 10), 'foo2': (0, 9), 'foo3': (0, 20)}
{'foo4': (0, 16), 'foo5': (0, 7), 'foo6': (0, 13), 'foo7': (0, 11)}
{'foo8': (0, 8), 'foo9': (0, 8), 'foo10': (0, 7)}
{'foo11': (0, 8)}
All data is in {'key': (value, value)} format. All keys in the file are different.
I'd like to get the following "dict":
{'foo1': (0, 1), 'foo2': (0, 0), 'foo3': (0, 1), 'foo4': (1, 0), 'foo5': (0, 0), 'foo6': (0, 5), 'foo7': (0, 2), 'foo8': (2, 2), 'foo9': (1, 1), 'foo10': (0, 7), 'foo11': (0, 1)}
Is it possible to extract the dicts from the file as one merged dict?
At the moment I only get a list from the file and am stuck at this step:
import ast
with open('filename') as f:
    content = [ast.literal_eval(l) for l in f.readlines()]
print(content)
Output:
[{'foo1': (0, 10), 'foo2': (0, 9), 'foo3': (0, 20)}, {'foo4': (0, 16), 'foo5': (0, 7), 'foo6': (0, 13), 'foo7': (0, 11)}, {'foo8': (0, 8), 'foo9': (0, 8), 'foo10': (0, 7)}, {'foo11': (0, 8)}]
Check out ast.literal_eval(): https://www.kite.com/python/answers/how-to-read-a-dictionary-from-a-file-in--python
If you are totally confident that this file is innocent and contains only a dictionary, then you can use the Python built-in function eval to get the job done, e.g.:
myfile = open("file.txt", "r")
mydict = eval(myfile.read())
If any user input can end up in the file, however, this could be used to run arbitrary code on your machine. There are precautions to be taken if this relies on user input; see the top answer on "Python: make eval safe" for some ideas.
You get a list of dictionaries in content because of the list comprehension, so I will modify it a little.
import ast
content = {}
with open('filename') as f:
    for l in f.readlines():
        content.update(ast.literal_eval(l))
print(content)
See if it works; I am a beginner. I learned how to merge dictionaries from the link below.
https://levelup.gitconnected.com/7-different-ways-to-merge-dictionaries-in-python-30148bf27add
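For completeness, once each line has been parsed with ast.literal_eval as in the snippets above, flattening the resulting list into a single dict is a one-liner (hedged: this assumes, as the question states, that all keys are distinct):
import ast

with open('filename') as f:
    dicts = [ast.literal_eval(line) for line in f if line.strip()]

merged = {k: v for d in dicts for k, v in d.items()}
print(merged)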

How to iterate over a dictionary of tuples

I have a list of tuples called possible_moves containing possible moves on a board in my game:
[(2, 1), (2, 2), (2, 3), (3, 1), (4, 5), (5, 2), (5, 3), (6, 0), (6, 2), (7, 1)]
Then, I have a dictionary that assigns a value to each cell on the game board:
{(0,0): 10000, (0,1): -3000, (0,2): 1000, (0,3): 800, etc.}
I want to iterate over all possible moves and find the move with the highest value.
my_value = 0
possible_moves = dict(possible_moves)
for move, value in moves_values:
    if move in possible_moves and possible_moves[move] > my_value:
        my_move = possible_moves[move]
        my_value = value
return my_move
The problem is in the for move, value part, because it creates two integer indexes, but I want move to be a tuple.
IIUC, you don't even need the list of possible moves. The moves and their scores you care about are already contained in the dictionary.
>>> from operator import itemgetter
>>>
>>> scores = {(0,0): 10000, (0,1): -3000, (0,2): 1000, (0,3): 800}
>>> max_move, max_score = max(scores.items(), key=itemgetter(1))
>>>
>>> max_move
(0, 0)
>>> max_score
10000
Edit: it turns out I did not understand quite correctly. Assuming that the list of moves, let's call it possible_moves, contains the moves possible right now, and that the dictionary scores contains the scores for all moves, even the impossible ones, you can issue:
max_score, max_move = max((scores[move], move) for move in possible_moves)
... or if you don't need the score:
max_move = max(possible_moves, key=scores.get)
You can use max with dict.get:
possible_moves = [(2, 1), (2, 2), (2, 3), (3, 1), (4, 5), (5, 2),
(5, 3), (6, 0), (6, 2), (7, 1), (0, 2), (0, 1)]
scores = {(0,0): 10000, (0,1): -3000, (0,2): 1000, (0,3): 800}
res = max(possible_moves, key=lambda x: scores.get(x, 0)) # (0, 2)
This assumes moves not found in your dictionary have a default score of 0. If you can guarantee that every move is included as a key in your scores dictionary, you can simplify somewhat:
res = max(possible_moves, key=scores.__getitem__)
Note that the [] syntax is syntactic sugar for __getitem__: if the key isn't found you'll get a KeyError.
If d is a dict, iterating over d yields its keys, while d.items() yields key-value pairs. So:
for move, value in moves_values.items():
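Spelled out against the loop from the question, a hedged sketch (assuming moves_values is the {(row, col): score} dict and possible_moves the original list of move tuples) would be:
my_move, my_value = None, float('-inf')
for move, value in moves_values.items():   # move is a (row, col) tuple, value its score
    if move in possible_moves and value > my_value:
        my_move = move                     # keep the move itself rather than a score lookup
        my_value = value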
possibleMoves = [(2, 1), (2, 2), (2, 3), (3, 1), (4, 5), (5, 2), (0, 3), (5, 3), (6, 0), (6, 2), (7, 1), (0, 2)]
movevalues = {(0, 0): 10000, (0, 1): -3000, (0, 2): 1000, (0, 3): 800}

def func():
    my_value = 0
    for i in range(len(possibleMoves)):
        for k, v in movevalues.items():
            if possibleMoves[i] == k and v > my_value:
                my_value = v
    return my_value

maxValue = func()
print(maxValue)

How to merge repeated elements in a list in Python?

I have a list of coordinates like:
list_coordinate =[(9,0),(9,1),(9,3) ... (53,0),(53,1),(53,3)...(54,0),(54,1)..]
value = []
for m in range(0, len(list_coordinate)):
    if m != len(list_coordinate) - 1:
        if list_coordinate[m][0] == list_coordinate[m+1][0]:
            value.append(list_coordinate[m][0])
Output of this code:
value = [9,9 ,9,...,53,53,53,...,54,54,54,54...]
I want to merge this value list for similar element and want output as:
Expected output:
[9,53,54]
If you prefer one-liners, you can do it like this:
list(set(map(lambda x: x[0], list_coordinate)))
It will output:
[9, 53, 54]
Note: As set is being used in the code, ordering of the elements is not guaranteed here.
You can use itertools.groupby:
from itertools import groupby
value = [9,9 ,9,53,53,53,54,54,54,54]
g = [k for k,_ in groupby(value)]
print(g)
which produces
[9, 53, 54]
and it is guaranteed to be in the same order as the input list (if it matters).
Basically
groupby(iterable[, keyfunc])
groups the elements in the iterable, starting a new group whenever the value of the key function changes.
If the key function is omitted, the identity function is assumed, and the key for the group will be each element encountered.
So as long as the elements in value stay the same, they will be grouped under the same key, which is the element itself.
Note: this works for contiguous repetitions only. If you want to get rid of non-adjacent duplicates as well, you should sort the list first (as the groupby docs explain).
As per your comment below, in case you want to operate on the coordinates directly:
list_coordinate = [(9,0), (9,1), (9,3), (53,0), (53,1), (53,3), (54,0), (54,1)]
g = [k for k,_ in groupby(list_coordinate, lambda x: x[0])]
print(g)
produces the same output
[9, 53, 54]
You could use an OrderedDict for both of your cases. Firstly for just the x coordinates:
from collections import OrderedDict

list_coords = [(9, 0), (9, 1), (9, 3), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
merged = OrderedDict()
for coord in list_coords:
    merged[coord[0]] = 1
print(list(merged.keys()))
Giving:
[9, 53, 54]
Note, if for example (9, 0) was repeated later on, it would not change the output.
Secondly, for whole coordinates. Note that the data has (10, 0) repeated 3 times:
list_coords = [(9, 0), (9, 1), (9, 3), (10, 0), (10, 0), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
merged = OrderedDict()
for coord in list_coords:
    merged[coord] = 1
print(list(merged.keys()))
Giving:
[(9, 0), (9, 1), (9, 3), (10, 0), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]
Why don't you use a set:
{ k[0] for k in list_coordinate }
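If you need the de-duplicated x coordinates but also care about their order, a hedged stdlib-only sketch (relying on insertion-ordered dicts, i.e. Python 3.7+) is dict.fromkeys:
list_coordinate = [(9, 0), (9, 1), (9, 3), (53, 0), (53, 1), (53, 3), (54, 0), (54, 1)]

# dict.fromkeys keeps the first occurrence of each x and preserves order
unique_x = list(dict.fromkeys(x for x, _ in list_coordinate))
print(unique_x)  # [9, 53, 54]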
