Python: comparison of two dict lists

Here is what I want to achieve:
I have got two lists of dictionaries. All the dictionaries have the following structure:
dictionary = {'name': 'MyName', 'state': 'MyState'}
I would like to go through all the elements of both lists and compare the states of the entries with the same name. Here is the best way that I can imagine:
for d in list1:
    name = d['name']
    for d2 in list2:
        if d2['name'] == name:
            if d['state'] != d2['state']:
                # Do something
While I think that this approach would work, I wonder whether there is a more efficient and/or elegant way to perform this operation. Thank you for your ideas!

Have a look at product from itertools:
import itertools
xs = range(1,10)
ys = range(11,20)
zs = itertools.product(xs,ys)
list(zs)
[(1, 11), (1, 12), (1, 13), (1, 14), (1, 15), (1, 16), (1, 17), (1, 18), (1, 19), (2, 11), (2, 12), (2, 13), (2, 14), (2, 15), (2, 16), (2, 17), (2, 18), (2, 19), (3, 11), (3, 12), (3, 13), (3, 14), (3, 15), (3, 16), (3, 17), (3, 18), (3, 19), (4, 11), (4, 12), (4, 13), (4, 14), (4, 15), (4, 16), (4, 17), (4, 18), (4, 19), (5, 11), (5, 12), (5, 13), (5, 14), (5, 15), (5, 16), (5, 17), (5, 18), (5, 19), (6, 11), (6, 12), (6, 13), (6, 14), (6, 15), (6, 16), (6, 17), (6, 18), (6, 19), (7, 11), (7, 12), (7, 13), (7, 14), (7, 15), (7, 16), (7, 17), (7, 18), (7, 19), (8, 11), (8, 12), (8, 13), (8, 14), (8, 15), (8, 16), (8, 17), (8, 18), (8, 19), (9, 11), (9, 12), (9, 13), (9, 14), (9, 15), (9, 16), (9, 17), (9, 18), (9, 19)]
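Applied to your two dict lists, a minimal sketch might look like this (product only replaces the explicit nesting, so the amount of work is the same as the nested loops):
from itertools import product

for d1, d2 in product(list1, list2):
    if d1['name'] == d2['name'] and d1['state'] != d2['state']:
        pass  # Do something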
A couple of other things -
When you are only representing two things, it is common to use a tuple (even a named tuple), so have a think about why they are dicts to begin with - you might have a great reason :)
[('name','state'),('name','state'),('name','state')...]
Another approach would be to compare elements directly; for example, you could check the intersection of setA (built from the first list of dicts) and setB (built from the second):
>>> listA = [('fred','A'), ('bob','B'), ('mary', 'D'), ('eve', 'E')]
>>> listB = [('fred','X'), ('clive', 'C'), ('mary', 'D'), ('ben','B')]
# your listA and listB could be sets to begin with
>>> set.intersection(set(listA),set(listB))
set([('mary', 'D')])
This approach, however, does not allow for duplicates...
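Applied to the original dict lists, a minimal sketch of the set idea might look like this (it assumes each name appears at most once per list, since sets discard duplicates):
set1 = {(d['name'], d['state']) for d in list1}
set2 = {(d['name'], d['state']) for d in list2}
same = set1 & set2                                     # (name, state) pairs identical in both lists
names_in_both = {n for n, _ in set1} & {n for n, _ in set2}
changed = names_in_both - {n for n, _ in same}         # names present in both lists with differing states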

The most elegant way I can think of is a list comprehension.
[[do_something() for d1 in list1 if d1["name"] == d2["name"] and d1["state"] != d2["state"]] for d2 in list2]
But that's kind of the same code.
You can also make your sample code a bit more elegant by collapsing the two conditions:
for d in list1:
    for d2 in list2:
        if d2['name'] == d['name'] and d['state'] != d2['state']:
            # Do something

The other answers are functional (they deliver the correct answer), but won't perform well for large lists because they use nested iteration -- for lists of length N, the number of steps they use grows like N^2. This isn't a concern if the lists are small; but if the lists are big, the number of iterations would explode.
An alternate approach that keeps time complexity linear with N goes like this (being pretty verbose):
##
## sample data
data = list()
data.append([
    dict(name='a', state='0'),
    dict(name='b', state='1'),
    dict(name='c', state='3'),
    dict(name='d', state='5'),
    dict(name='e', state='7'),
    dict(name='f', state='10'),
    dict(name='g', state='11'),
    dict(name='h', state='13'),
    dict(name='i', state='14'),
    dict(name='l', state='19'),
])
data.append([
    dict(name='a', state='0'),
    dict(name='b', state='1'),
    dict(name='c', state='4'),
    dict(name='d', state='6'),
    dict(name='e', state='8'),
    dict(name='f', state='10'),
    dict(name='g', state='12'),
    dict(name='j', state='16'),
    dict(name='k', state='17'),
    dict(name='m', state='20'),
])
##
## coalesce lists into a single flat dict for searching
dCombined = {}
for d in data:
    dCombined.update({i['name']: i['state'] for i in d})
##
## to record mismatches
names = []
##
## iterate over lists -- individually / not nested
for d in data:
    for i in d:
        if i['name'] in dCombined and i['state'] != dCombined[i['name']]:
            names.append(i['name'])
##
## see result
print(names)
Caveats:
The OP didn't say if there could be repeated names within a list; that would change this approach a bit.
Depending on the details of "do something" you might record something other than just the names -- you could store references to or copies of the individual dict objects, or whatever "do something" requires.
The trade-off for this approach is that it requires more memory than the previous answers; the lookup dict holds one entry per name and the mismatch list grows with the number of actual mismatches, so memory is O(N).
Notes:
This approach also works when you have more than 2 lists to compare -- e.g. if there were 5 lists, my alternative is still O(N) in time and memory, while nesting loops over all 5 lists would be O(N^5) in time!
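For just the two-list case the same idea fits in a few lines (a minimal sketch, assuming names are unique within each list):
states2 = {d['name']: d['state'] for d in list2}
mismatched = [d['name'] for d in list1
              if d['name'] in states2 and d['state'] != states2[d['name']]]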

Related

How to check if several lists have the "maximum" number of shared elements and return that list

In addition to a reference list, I have a list of lists of varying lengths, and I want to be able to choose the single list among them that has the maximum number of "shared elements" with that reference list and return it. Is there a better, more Pythonic way to do this?
My data structure and code attempt is as follows:
reference_list = [(0, 0), (3, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (15, 19), (15, 20)]
list_1 = [(7, 7), (8, 8), (9, 9), (10, 10)]
list_2 = [(6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (15, 19), (15, 20)]
list_3 = [(0, 0), (7, 7), (10, 10), (15, 19), (15, 20)]
list_4 = [(2, 2), (8, 8), (9, 9), (7, 7), (8, 8), (9, 9)]
#list_5 = [(5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (15, 19), (15, 20)]
#list_6 = [(0, 0), (3, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (15, 19), (15, 20)]
list_of_lists = (list_1, list_2, list_3, list_4)  #, list_5)#, list_6)
counts = []
#flag = 0
for list_element in list_of_lists:
    count = 0
    print(list_element)
    length_of_element = len(list_element)
    for i in range(length_of_element):
        if list_element[i] in reference_list:
            count = count + 1
    counts.append(count)
maximum_count = max(counts)
max_count_index = counts.index(maximum_count)
selected_list = list_of_lists[max_count_index]
print('the list with maximum number of shared elements with the reference list is: list_', max_count_index + 1)
You can just do
import pandas as pd

s = pd.DataFrame(list_of_lists).isin(reference_list).sum(1)
s
Out[351]:
0    4
1    7
2    5
3    5
dtype: int64
list_of_lists[s.idxmax()]
Out[352]: [(6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (15, 19), (15, 20)]
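If you would rather avoid the pandas dependency, the same selection can be done with max and a key function (a minimal sketch using the variables from the question):
reference_set = set(reference_list)
selected_list = max(list_of_lists, key=lambda lst: sum(t in reference_set for t in lst))
Like the original loop, this counts repeated shared elements more than once; ties are broken in favour of the earliest list.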

List of lists of tuples, sum element-wise

I have a list of lists of tuples. Each inner list contains 3 tuples, of 2 elements each:
[
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)]
]
I need to get a single list of 3 tuples, summing all these elements "vertically", like this:
 (3, 5),    (4, 5),    (4, 5)
   +  +       +  +       +  +
 (7, 13),   (9, 13),   (10, 13)
   +  +       +  +       +  +
 (5, 7),    (6, 7),    (7, 7)
   ||         ||         ||
[(15, 25), (19, 25), (21, 25)]
so, for example, the second tuple in the result list is given by the sums of the second tuples in the initial list
(4+9+6, 5+13+7) = (19, 25)
I'm trying with list/tuple comprehensions, but I'm getting a little lost with this.
You can use zip and sum for something a little longer, but without the heavyweight dependency on numpy if you aren't already using it.
>>> x = [[(3, 5), (4, 5), (4, 5)], [(7, 13), (9, 13), (10, 13)], [(5, 7), (6, 7), (7, 7)]]
>>> [tuple(sum(v) for v in zip(*t)) for t in zip(*x)]
[(15, 25), (19, 25), (21, 25)]
The outer zip pairs the corresponding tuples together; the inner zip pairs corresponding elements of those tuples together for addition.
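To see what each zip contributes, here is a sketch of the intermediate values for the sample data:
>>> list(zip(*x))                            # outer zip: groups the nth tuple of each row
[((3, 5), (7, 13), (5, 7)), ((4, 5), (9, 13), (6, 7)), ((4, 5), (10, 13), (7, 7))]
>>> list(zip(*((3, 5), (7, 13), (5, 7))))    # inner zip: groups matching positions for summing
[(3, 7, 5), (5, 13, 7)]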
You could do this pretty easily with numpy. Use sum on axis 0.
import numpy as np

l = [
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)]
]
[tuple(x) for x in np.sum(l, 0)]
Output
[(15, 25), (19, 25), (21, 25)]
You could do this with pure python code.
lst = [
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)]
]
lst2 = []
for a in range(len(lst[0])):
    l = []
    for i in range(len(lst)):
        l.append(lst[i][a])
    lst2.append(l)
output = []
for a in lst2:
    t = [0 for a in range(len(lst[0][0]))]
    for i in range(len(a)):
        for z in range(len(a[i])):
            t[z] += a[i][z]
    output.append(tuple(t))
print(output)
If you change the input list, it still works. For example:
IN:
lst = [
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)]
]
OUT:
[(15, 25), (19, 25), (21, 25)]
IN:
lst = [
    [(3, 5, 2), (4, 5, 3), (4, 5, 1)],
    [(7, 13, 1), (9, 13, 3), (10, 13, 3)],
    [(5, 7, 6), (6, 7, 3), (7, 7, 7)]
]
OUT:
[(15, 25, 9), (19, 25, 9), (21, 25, 11)]
data = [
    [(3, 5), (4, 5), (4, 5)],
    [(7, 13), (9, 13), (10, 13)],
    [(5, 7), (6, 7), (7, 7)]
]
result = [tuple(sum(x) for x in zip(*t)) for t in zip(*data)]
print(result)
This is a one-liner; I don't think you can get more Pythonic than this.

Create a list of elements from different combinations of list items

I have N lists. Here I am giving two examples.
List_1 = [5,6,7,8,9,10]
List_2 = [5,6,7,8,9,10]
I want to create a list of tuples from these N lists. For two lists the output should be:
[(5,),(6,),(7,),(8,),(9,),(10,),(5,5,),(5,6)....(5,10),(6,5,),(6,6)....(6,10),(7,5,),(7,6)....(7,10)
.............(10,10)]
The output contains tuples of length 1 to N, built from all combinations of the list elements.
List_1 = [5,6,7,8,9,10]
List_2 = [5,6,7,8,9,10]
List_3 = [5,6,7,8,9,10]
For three lists the output is:
[(5,),(6,),(7,),(8,),(9,),(10,),(5,5,),(5,6)....(5,10),(6,5,),(6,6)....(6,10),(7,5,),(7,6)....(7,10)
.............(10,10),(5,5,5)..(all combinations of 1, 2 and 3 elements from the three lists)...(10,10,10)]
Note: all lists have the same values.
This could be a possible solution:
from itertools import product

# since the lists have the same values, we only need to keep one of them and
# decide how many times to repeat the product
List_1 = [5, 6, 7, 8, 9, 10]
list_repetition = 2
result = []
for i in range(list_repetition):
    result.extend(tuple(product(List_1, repeat=i+1)))
print(result)
And the output will be:
[(5,), (6,), (7,), (8,), (9,), (10,), (5, 5), (5, 6), (5, 7), (5, 8), (5, 9), (5, 10), (6, 5), (6, 6), (6, 7), (6, 8), (6, 9), (6, 10), (7, 5), (7, 6), (7, 7), (7, 8), (7, 9), (7, 10), (8, 5), (8, 6), (8, 7), (8, 8), (8, 9), (8, 10), (9, 5), (9, 6), (9, 7), (9, 8), (9, 9), (9, 10), (10, 5), (10, 6), (10, 7), (10, 8), (10, 9), (10, 10)]
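If the lists were not all identical, the same idea still works; one possible reading (a sketch, assuming a length-i tuple should draw from the first i lists) is to take the product of a growing prefix of the lists:
from itertools import product

lists = [List_1, List_2]              # could be N different lists
result = []
for i in range(1, len(lists) + 1):
    result.extend(product(*lists[:i]))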

Python - list of tuples from file

I have completed some rather intensive calculations, and I was not able to save my results with pickle (recursion depth exceeded), so I was forced to print all the data and save it in a text file.
Is there any easy way to now convert my list of tuples in text form back to a list of tuples in Python? The output looks like this:
[(10, 5), (11, 6), (12, 5), (14, 5), (103360, 7), (16, 6), (102725, 7), (17, 6), (18, 5), (19, 9), (20, 6), ...(it continues for 60MB)]
You can use ast.literal_eval():
>>> import ast
>>> s = '[(10, 5), (11, 6), (12, 5), (14, 5)]'
>>> res = ast.literal_eval(s)
>>> res
[(10, 5), (11, 6), (12, 5), (14, 5)]
>>> res[0]
(10, 5)
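Reading the saved file back in and parsing it in one go might look like this (a sketch; the file name is hypothetical, and for a 60 MB file both the raw string and the resulting list are held in memory):
import ast

with open('results.txt') as f:           # hypothetical file name
    data = ast.literal_eval(f.read())    # -> list of tuples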
string = "[(10, 5), (11, 6), (12, 5), (14, 5), (103360, 7), (16, 6), (102725, 7), (17, 6), (18, 5), (19, 9), (20, 6)]" # Read it from the file however you want
values = []
for t in string[1:-1].replace("),", ");").split("; "):
values.append(tuple(map(int, t[1:-1].split(", "))))
First I remove the start and end square brackets with [1:-1]. I replace ), with ); so I can split on ; without splitting on the commas inside the tuples (they are not preceded by a closing parenthesis). Inside the loop I use [1:-1] again, this time to remove the parentheses, and split on the commas. The map converts the numeric strings into ints, and I append them as a tuple.

Build 2 lists in one go while reading from file, pythonically

I'm reading a big file with hundreds of thousands of number pairs representing the edges of a graph. I want to build 2 lists as I go: one with the forward edges and one with the reversed.
Currently I'm doing an explicit for loop, because I need to do some pre-processing on the lines I read. However, I'm wondering if there is a more pythonic approach to building those lists, like list comprehensions, etc.
But, as I have 2 lists, I don't see a way to populate them using comprehensions without reading the file twice.
My code right now is:
with open('SCC.txt') as data:
    for line in data:
        line = line.rstrip()
        if line:
            edge_list.append((int(line.rstrip().split()[0]), int(line.rstrip().split()[1])))
            reversed_edge_list.append((int(line.rstrip().split()[1]), int(line.rstrip().split()[0])))
I would keep your logic as it is the Pythonic approach; just don't split/rstrip the same line multiple times:
edge_list, reversed_edge_list = [], []
with open('SCC.txt') as data:
    for line in data:
        spl = line.split()
        if spl:
            i, j = map(int, spl)
            edge_list.append((i, j))
            reversed_edge_list.append((j, i))
Calling rstrip when you have already called it is redundant in itself, and even more so when you are splitting, as splitting already removes the whitespace; splitting just once saves a lot of unnecessary work.
You can also use csv.reader to read the data and filter out empty rows, as long as a single whitespace character delimits the fields:
from csv import reader

with open('SCC.txt') as data:
    edge_list, reversed_edge_list = [], []
    for i, j in filter(None, reader(data, delimiter=" ")):
        i, j = int(i), int(j)
        edge_list.append((i, j))
        reversed_edge_list.append((j, i))
Or, if multiple whitespace characters delimit the fields, you can use map(str.split, data):
for i, j in filter(None, map(str.split, data)):
    i, j = int(i), int(j)
Whatever you choose will be faster than going over the data twice or splitting the same lines multiple times.
You can't create two lists in one comprehension, so, instead of doing the same operations twice on the two lists, one viable option would be to initialize one of them and then create the second one by reversing each entry in the first one. That way you don't iterate over the file twice.
To that end, you could create the first list edge_list with a comprehension (not sure why you called rstrip again on it):
edge_list = [tuple(map(int, line.split())) for line in data]
And now go through each entry and reverse it with [::-1] in order to create its reversed sibling reverse_edge_list.
Using mock data for edge_list:
edge_list = [(1, 2), (3, 4), (5, 6)]
Reversing it could look like this:
reverse_edge_list = [t[::-1] for t in edge_list]
Which now looks like:
reverse_edge_list
[(2, 1), (4, 3), (6, 5)]
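Putting the two steps together on the file from the question (a sketch; the blank-line check is included since the file may contain empty lines):
with open('SCC.txt') as data:
    edge_list = [tuple(map(int, line.split())) for line in data if line.strip()]
reversed_edge_list = [t[::-1] for t in edge_list]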
Maybe not clearer, but shorter:
with open('SCC.txt') as data:
    process_line = lambda line, r: (int(line.rstrip().split()[r]), int(line.rstrip().split()[1-r]))
    edge_list, reversed_edge_list = map(list, zip(*[(process_line(line, 0), process_line(line, 1))
                                                    for line in data
                                                    if line.rstrip()]))
Here comes a solution
A test file:
In[19]: f = ["{} {}".format(i,j) for i,j in zip(xrange(10), xrange(10, 20))]
In[20]: f
Out[20]:
['0 10',
'1 11',
'2 12',
'3 13',
'4 14',
'5 15',
'6 16',
'7 17',
'8 18',
'9 19']
One-liner using a comprehension, zip and map:
In[27]: l, l2 = map(list,zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]))
In[28]: l
Out[28]:
[(0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)]
In[29]: l2
Out[29]:
[(10, 0),
(11, 1),
(12, 2),
(13, 3),
(14, 4),
(15, 5),
(16, 6),
(17, 7),
(18, 8),
(19, 9)]
Explaining: with [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f] we build a list of pairs, each containing a tuple and its reversed form:
In[24]: [(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f]
Out[24]:
[((0, 10), (10, 0)),
((1, 11), (11, 1)),
((2, 12), (12, 2)),
((3, 13), (13, 3)),
((4, 14), (14, 4)),
((5, 15), (15, 5)),
((6, 16), (16, 6)),
((7, 17), (17, 7)),
((8, 18), (18, 8)),
((9, 19), (19, 9))]
Applying zip to the unpacked form, we split the pairs apart, so we get two tuples: the first containing the original pairs and the second containing the reversed ones:
In[25]: zip(*[(tuple(map(int, x.split())), tuple(map(int, x.split()))[::-1]) for x in f])
Out[25]:
[((0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)),
((10, 0),
(11, 1),
(12, 2),
(13, 3),
(14, 4),
(15, 5),
(16, 6),
(17, 7),
(18, 8),
(19, 9))]
Almost there; we just use map to transform those tuples into lists.
EDIT:
As @PadraicCunningham pointed out, to filter out empty lines just add an if x to the comprehension: [ ... for x in f if x]
