Comparing lists of dictionaries - python

I have two lists of test results. The test results are represented as dictionaries:
list1 = [{'testclass': 'classname', 'testname': 'testname', 'testtime': '...'}, ...]
list2 = [{'testclass': 'classname', 'testname': 'testname', ...}, ...]
The dictionary representation is slightly different in both lists, because for one list I have some more information. But in all cases, every test dictionary in either list will have a classname and testname element which together effectively form a way of uniquely identifying the test and a way to compare it across lists.
I need to figure out all the tests that are in list1 but not in list2, as these represent new test failures.
To do this I do:
def get_new_failures(list1, list2):
    new_failures = []
    for test1 in list1:
        for test2 in list2:
            if (test1['classname'] == test2['classname'] and
                    test1['testname'] == test2['testname']):
                break  # Not new: break out of the inner loop
        else:
            # Inner loop finished without a match, so the test must be new
            new_failures.append(test1)
    return new_failures
I am wondering if there is a more Pythonic way of doing this. I looked at filter(). The function that filter uses would need access to both lists. One is easy, but I am not sure how it would get access to both. I do not know the contents of the lists until runtime.
Any help would be appreciated,
Thanks.

Try this:
def get_new_failures(list1, list2):
    check = set((d['classname'], d['testname']) for d in list2)
    return [d for d in list1 if (d['classname'], d['testname']) not in check]

To compare two dicts d1 and d2 on a subset of their keys, use:
all(d1[k] == d2[k] for k in ('testclass', 'testname'))
And if your two lists have the same length, you can use zip() to pair them.
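A minimal sketch of that key-subset comparison, assuming both dictionaries carry the classname and testname keys used in the question's code. The helper name same_test and the sample data are made up for illustration:

```python
# Keys that identify a test; assumed from the question's code.
ID_KEYS = ('classname', 'testname')


def same_test(d1, d2, keys=ID_KEYS):
    """Return True when d1 and d2 agree on every key in `keys`."""
    return all(d1[k] == d2[k] for k in keys)


# Illustrative data only; extra keys such as 'testtime' are ignored.
a = {'classname': 'TestFoo', 'testname': 'test_bar', 'testtime': '0.3'}
b = {'classname': 'TestFoo', 'testname': 'test_bar'}
c = {'classname': 'TestFoo', 'testname': 'test_baz'}

print(same_test(a, b))  # True
print(same_test(a, c))  # False
```

Note that zip() stops at the shorter input, so pairing only makes sense when both lists really are the same length and in the same order.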

If each combination of classname and testname is truly unique, then the more computationally efficient approach would be to use two dictionaries instead of two lists. As key to the dictionary, use a tuple like so: (classname, testname). Then you can simply say if (classname, testname) in d: ....
If you need to preserve insertion order, and are using Python 2.7 or above, you could use an OrderedDict from the collections module.
The code would look something like this:
tests1 = {('classname', 'testname'): {'testclass': 'classname',
                                      'testname': 'testname', ...},
          ...}
tests2 = {('classname', 'testname'): {'testclass': 'classname',
                                      'testname': 'testname', ...},
          ...}
new_failures = [t for t in tests1 if t not in tests2]
If you must use lists for some reason, you could iterate over list2 to generate a set, and then test for membership in that set:
test1_tuples = ((d['classname'], d['testname']) for d in list1)
test2_tuples = set((d['classname'], d['testname']) for d in list2)
new_failures = [t for t in test1_tuples if t not in test2_tuples]
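The last snippet yields (classname, testname) tuples rather than the original test dictionaries. If the full dictionaries are needed, one possible sketch is to index each list by its key tuple first. The helper index_by_key and the sample data below are hypothetical, and the code assumes every dictionary has both keys:

```python
def index_by_key(tests):
    """Map (classname, testname) -> the full test dictionary."""
    return {(d['classname'], d['testname']): d for d in tests}


# Illustrative data only.
list1 = [
    {'classname': 'TestA', 'testname': 'test_one', 'testtime': '0.1'},
    {'classname': 'TestA', 'testname': 'test_two', 'testtime': '0.2'},
    {'classname': 'TestB', 'testname': 'test_one', 'testtime': '0.5'},
]
list2 = [
    {'classname': 'TestA', 'testname': 'test_one'},
]

tests1 = index_by_key(list1)
tests2 = index_by_key(list2)

# Keys present in tests1 but missing from tests2 are the new failures;
# looking the key up again returns the complete dictionary.
new_failures = [tests1[key] for key in tests1 if key not in tests2]
print(new_failures)
```

Membership tests against a dict (or set) are hash lookups, so this avoids scanning list2 once per element of list1.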

Related

Given two list of words, than return as dictionary and set together

Hey (sorry for my bad English), I am going to try to make my question clearer. Say I have a function create_username_dict(name_list, username_list) that takes two lists: name_list, containing people's names, and username_list, containing usernames made from those names. What I want to do is take those two lists and combine them into a single dictionary,
like this:
>>> name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
>>> username_list = ["alejon", "carli", "hanri"]
>>> create_username_dict(name_list, username_list)
{
"Ola Nordmann": "alejon",
"Kari Olsen": "carli",
"Roger Jensen": "hanri"
}
I have tried looking around for how to combine two different lists into one dictionary, but I can't seem to find the right solution.
If both lists are in matching order, i.e. the i-th element of one list corresponds to the i-th element of the other, then you can use this
D = dict(zip(name_list, username_list))
Use zip to pair the lists.
d = {key: value for key,value in zip(name_list, username_list)}
print(d)
Output:
{'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
Assuming both lists have the same length and a one-to-one mapping:
name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
result_stackoverflow = dict()
for index, name in enumerate(name_list):
    result_stackoverflow[name] = username_list[index]
print(result_stackoverflow)
>>> {'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
The answer by #alex does the same thing but may be too compact for a beginner, so this is the verbose version.
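All of the answers above rely on the two lists having the same length. zip() silently stops at the shorter list, so a mismatch would drop entries without any error. A small sketch of create_username_dict that checks this up front; raising ValueError is an assumption here, not something the question asks for:

```python
def create_username_dict(name_list, username_list):
    """Pair each name with the username at the same position."""
    if len(name_list) != len(username_list):
        # Assumed behaviour: fail loudly instead of dropping entries.
        raise ValueError("name_list and username_list must have the same length")
    return dict(zip(name_list, username_list))


name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
print(create_username_dict(name_list, username_list))
```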

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial, I'm still learning, but I have a list of dictionaries that looks as follows:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
It contains ~89000 dictionaries. And I have a list containing 4473208 paths, for example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
What I want to do is group together the paths whose folder name contains a dictionary's key and whose file name contains one of the values listed under that key.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for file in ct_paths:
        for key, val in elem.items():
            if (file[16:20] == key) and (any(x in file[21:26] for x in val)):
                temp1.append(file)
    grpd_cts.append(temp1)
But this takes around 30 hours. Is there a way to make it more efficient? Any itertools function or something?
Thanks a lot!
ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What does make your problem complicated is that you're wanting to end up with the original list of filenames, so you need to construct a two-level dictionary where the values are lists of all originals grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
    ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for key, val in elem.items():
        d2 = ct_path_index.get(key)
        if d2:
            for v in val:
                v2 = d2.get(v)
                if v2:
                    temp1 += v2
    grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
                    '/****/**/******_1102/00575***...**1CT.csv',
                    '/****/**/******_1102/00575***...**2CT.csv',
                    '/****/**/******_1102/00575***...**3CT.csv',
                    '/****/**/******_1102/00575***...**4CT.csv'],
          '00578': ['/****/**/******_1102/00578***...**1CT.csv',
                    '/****/**/******_1102/00578***...**2CT.csv',
                    '/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
Now, you've only got two nested loops; the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.
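The same two-level index can also be written with collections.defaultdict instead of chained setdefault calls. The sketch below is self-contained and uses made-up paths, laid out so that the folder id sits at [16:20] and the file prefix at [21:26] as in the question. The real paths are masked in the question, so this layout is an assumption:

```python
from collections import defaultdict

# Hypothetical paths: folder id at [16:20], file prefix at [21:26].
ct_paths = [
    '/data/ab/study__1102/00576_0CT.csv',
    '/data/ab/study__1102/00576_1CT.csv',
    '/data/ab/study__1102/00582_0CT.csv',
    '/data/ab/study__1103/00576_0CT.csv',
]
dict_list = [
    {'1102': ['00576', '00577']},
    {'1102': ['00582', '00583']},
]

# Two-level index: folder id -> file prefix -> list of original paths.
ct_path_index = defaultdict(lambda: defaultdict(list))
for path in ct_paths:
    ct_path_index[path[16:20]][path[21:26]].append(path)

grpd_cts = []
for elem in dict_list:
    group = []
    for key, prefixes in elem.items():
        # .get() avoids creating empty entries for unknown keys.
        by_prefix = ct_path_index.get(key, {})
        for prefix in prefixes:
            group.extend(by_prefix.get(prefix, []))
    grpd_cts.append(group)

print(grpd_cts)
```

The list of paths is scanned once to build the index. After that, each group costs one dictionary lookup per listed prefix instead of a pass over every path.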

How to convert a list of tuples containing two lists into dictionary of key value pairs?

I have a list like this-
send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
I want something like
[ {['produce_send']:['consume_recv']},{['Send']:['Recv']},{['sender2']:['receiver2']} ]
How to do this?
You cannot use a list as a dictionary key.
This article explains the concept:
https://wiki.python.org/moin/DictionaryKeys
To be used as a dictionary key, an object must support the hash function (e.g. through __hash__), equality comparison (e.g. through __eq__ or __cmp__), and must satisfy the correctness condition above.
And
lists do not provide a valid __hash__ method.
>>> d = {['a']: 1}
TypeError: unhashable type: 'list'
If you specifically want to differentiate the key values, you can use tuples, as they are hashable:
{ (i[0][0], ): (i[1][0], ) for i in send_recv_pairs}
{('Send',): ('Recv',),
('produce_send',): ('consume_recv',),
('sender2',): ('receiver2',)}
You can't have lists as keys, only hashable types: strings, numbers, None and such.
If you still want to use a dictionary knowing that, then:
d = {}
for tup in send_recv_pairs:
    d[tup[0][0]] = tup[1]
If you want the value to be a string as well, use tup[1][0] instead of tup[1].
As a one-liner:
d = {tup[0][0]: tup[1] for tup in send_recv_pairs}  # tup[1][0] if you want values as strings
You can check it over here, in the second way of creating a dictionary.
https://developmentality.wordpress.com/2012/03/30/three-ways-of-creating-dictionaries-in-python/
A simple way of doing it:
First of all, your tuples contain lists, so it would be better to change them to tuples of strings (that makes more sense, I guess).
Anyway, a simple way of working with your current list of tuples could be:
mydict = {}
for i in send_recv_pairs:
    print(i)
    mydict[i[0][0]] = i[1][0]
As others pointed out, you cannot use a list as a dictionary key. So the expression i[0][0] first takes the first element of the tuple, which is a list, and then the first element of that list, which is the only element anyway in your case.
Do you mean like this?
send_recv_pairs = [(['produce_send'], ['consume_recv']),
(['Send'], ['Recv']),
(['sender2'], ['receiver2'])]
send_recv_dict = {e[0][0]: e[1][0] for e in send_recv_pairs}
Resulting in...
>>> {'produce_send': 'consume_recv', 'Send': 'Recv', 'sender2': 'receiver2'}
As mentioned in other answers, you cannot use a list as a dictionary key as it is not hashable (see links in other answers).
You can therefore just use the values in your lists (assuming they stay as simple as in your example) to create the following two possibilities:
send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
result1 = {}
for t in send_recv_pairs:
    result1[t[0][0]] = t[1]
# without any lists
result2 = {}
for t in send_recv_pairs:
    result2[t[0][0]] = t[1][0]
Which respectively gives:
>>> result1
{'produce_send': ['consume_recv'], 'Send': ['Recv'], 'sender2': ['receiver2']}
>>> result2
{'produce_send': 'consume_recv', 'Send': 'Recv', 'sender2': 'receiver2'}
Try it like this:
res = {x[0][0]: x[1][0] for x in send_recv_pairs}  # stores the inner values without the list wrapper; use tuple(x[0]): x[1] to keep the sequences
It works in Python 3 and when keys are unique. If you need to collect a list of values per key instead of a single value, then you may use something like itertools.groupby or map+reduce. I wrote about this in the comments and I'll provide an example.
And yes, a list cannot store key-value pairs, only dicts can, but maybe that is just a typo in the question.
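For the case where keys are not unique, one possible sketch collects every value under its key. It uses collections.defaultdict rather than the itertools.groupby or map+reduce mentioned above, simply because it needs no sorting step. The input with a repeated 'Send' key is hypothetical:

```python
from collections import defaultdict

# Hypothetical input in which the sender 'Send' appears twice.
pairs = [
    (['Send'], ['Recv']),
    (['Send'], ['Recv2']),
    (['sender2'], ['receiver2']),
]

grouped = defaultdict(list)
for senders, receivers in pairs:
    # senders[0] is the string inside the one-element list, so it is hashable.
    grouped[senders[0]].extend(receivers)

result = dict(grouped)
print(result)
```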
You cannot use a list as a dictionary key, but you can convert it to a tuple to create the dict object.
Below is an example using a dictionary comprehension:
>>> send_recv_pairs = [(['produce_send'], ['consume_recv']), (['Send'], ['Recv']), (['sender2'], ['receiver2'])]
>>> {tuple(k): v for k, v in send_recv_pairs}
{('sender2',): ['receiver2'], ('produce_send',): ['consume_recv'], ('Send',): ['Recv']}
For details, take a look at: Why can't I use a list as a dict key in python?
However, if your nested pairs were not lists but any other hashable objects, you could have passed the list directly to dict to get the desired result. For example:
>>> my_list = [('key1', 'value1'), ('key2', 'value2')]
>>> dict(my_list)
{'key1': 'value1', 'key2': 'value2'}

Elegantly Generalising Sorting into Dictionaries in Python?

The list comprehension is a great structure for generalising working with lists in such a way that the creation of lists can be managed elegantly. Is there a similar tool for managing Dictionaries in Python?
I have the following functions:
# takes in 3 lists of lists and a column specification by which to group
def custom_groupby(atts, zmat, zmat2, col):
    result = dict()
    for i in range(0, len(atts)):
        val = atts[i][col]
        row = (atts[i], zmat[i], zmat2[i])
        try:
            result[val].append(row)
        except KeyError:
            result[val] = list()
            result[val].append(row)
    return result
# organises samples into dictionaries using the groupby
def organise_samples(attributes, z_matrix, original_z_matrix):
    strucdict = custom_groupby(attributes, z_matrix, original_z_matrix, 'SecStruc')
    strucfrontdict = dict()
    for k, v in strucdict.items():
        strucfrontdict[k] = custom_groupby([x[0] for x in strucdict[k]],
            [x[1] for x in strucdict[k]], [x[2] for x in strucdict[k]], 'Front')
    samples = dict()
    for k in strucfrontdict:
        samples[k] = dict()
        for k2 in strucfrontdict[k]:
            samples[k][k2] = dict()
            samples[k][k2] = custom_groupby([x[0] for x in strucfrontdict[k][k2]],
                [x[1] for x in strucfrontdict[k][k2]], [x[2] for x in strucfrontdict[k][k2]], 'Back')
    return samples
It seems like this is unwieldy. There being elegant ways to do almost everything in Python, I'm inclined to think I'm using Python wrongly.
More importantly, I'd like to be able to generalise this function better so that I can specify how many "layers" should be in the dictionary (without using several lambdas and approaching the problem in a Lisp style). I would like a function:
# organises samples into a dictionary by specified columns
# number of layers could also be assumed by number of criterion
def organise_samples(number_layers, list_of_strings_for_column_ids)
Is this possible to do in Python?
Thank you! Even if there isn't a way to do it elegantly in Python, any suggestions towards making the above code more elegant would be really appreciated.
::EDIT::
For context, the attributes object, z_matrix, and original_zmatrix are all lists of Numpy arrays.
Attributes might look like this:
Type,Num,Phi,Psi,SecStruc,Front,Back
11,181,-123.815,65.4652,2,3,19
11,203,148.581,-89.9584,1,4,1
11,181,-123.815,65.4652,2,3,19
11,203,148.581,-89.9584,1,4,1
11,137,-20.2349,-129.396,2,0,1
11,163,-34.75,-59.1221,0,1,9
The Z-matrices might both look like this:
CA-1, CA-2, CA-CB-1, CA-CB-2, N-CA-CB-SG-1, N-CA-CB-SG-2
-16.801, 28.993, -1.189, -0.515, 118.093, 74.4629
-24.918, 27.398, -0.706, 0.989, 112.854, -175.458
-1.01, 37.855, 0.462, 1.442, 108.323, -72.2786
61.369, 113.576, 0.355, -1.127, 111.217, -69.8672
Samples is a dict{num => dict {num => dict {num => tuple(attributes, z_matrix)}}}, having one row of the z-matrix.
The list comprehension is a great structure for generalising working with lists in such a way that the creation of lists can be managed elegantly. Is there a similar tool for managing Dictionaries in Python?
Have you tried using dictionary comprehensions?
See this great question about dictionary comprehensions.
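As a rough sketch of how a dictionary comprehension could generalise the grouping to any number of layers: the function below recurses over a list of column names, so the number of layers is simply the length of that list. The signature differs from the one in the question, and the attributes are plain dicts rather than NumPy records to keep the example self-contained; both are assumptions made for illustration:

```python
def organise_samples(rows, columns):
    """Group `rows` into nested dicts, one level per name in `columns`.

    Each row is a tuple whose first item is the attribute record.
    """
    if not columns:
        return list(rows)
    first, rest = columns[0], columns[1:]
    groups = {}
    for row in rows:
        groups.setdefault(row[0][first], []).append(row)
    # Recurse into each group with the remaining columns.
    return {key: organise_samples(members, rest)
            for key, members in groups.items()}


# Illustrative data: attribute values taken from the question's table,
# z-matrix rows replaced by placeholder strings.
attributes = [
    {'SecStruc': 2, 'Front': 3, 'Back': 19},
    {'SecStruc': 1, 'Front': 4, 'Back': 1},
    {'SecStruc': 2, 'Front': 0, 'Back': 1},
]
z_matrix = ['z0', 'z1', 'z2']
original_z_matrix = ['oz0', 'oz1', 'oz2']

rows = list(zip(attributes, z_matrix, original_z_matrix))
samples = organise_samples(rows, ['SecStruc', 'Front', 'Back'])
print(samples[2][3][19])
```

Passing a shorter list such as ['SecStruc'] yields a single-level grouping from the same function.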

Pythonic way to parse list of dictionaries for a specific attribute?

I want to cross reference a dictionary and django queryset to determine which elements have unique dictionary['name'] and djangoModel.name values, respectively. The way I'm doing this now is to:
Create a list of the dictionary['name'] values
Create a list of djangoModel.name values
Generate the list of unique values by checking for inclusion in those lists
This looks as follows:
alldbTests = dbp.test_set.exclude(end_date__isnull=False) #django queryset
vctestNames = [vctest['name'] for vctest in vcdict['tests']] #from dictionary
dbtestNames = [dbtest.name for dbtest in alldbTests] #from django model
# Compare tests in protocol in fortytwo's db with protocol from vc
obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name not in vctestNames]
newTests = [vctest for vctest in vcdict if vctest['name'] not in dbtestNames]
It feels unpythonic to have to generate the intermediate list of names (lines 2 and 3 above), just to be able to check for inclusion immediately after. Am I missing anything? I suppose I could put two list comprehensions in one line like this:
obsoleteTests = [dbtest for dbtest in alldbTests if dbtest.name not in [vctest['name'] for vctest in vcdict['tests']]]
But that seems harder to follow.
Edit:
Think of the initial state like this:
# alldbTests is a list of django models where the following are all true
alldbTests[0].name == 'test1'
alldbTests[1].name == 'test2'
alldbTests[2].name == 'test4'
dict1 = {'name':'test1', 'status':'pass'}
dict2 = {'name':'test2', 'status':'pass'}
dict3 = {'name':'test5', 'status':'fail'}
vcdict = [dict1, dict2, dict3]
I can't convert to sets and take the difference unless I strip things down to just the name string, but then I lose access to the rest of the model/dictionary, right? Sets only would work here if I had the same type of object in both cases.
vctestNames = dict((vctest['name'], vctest) for vctest in vcdict['tests'])
dbtestNames = dict((dbtest.name, dbtest) for dbtest in alldbTests)
# in the database but no longer in vc
obsoleteTests = [dbtestNames[key]
                 for key in set(dbtestNames.keys()) - set(vctestNames.keys())]
# in vc but not yet in the database
newTests = [vctestNames[key]
            for key in set(vctestNames.keys()) - set(dbtestNames.keys())]
You're working with basic set operations here. You could convert your objects to sets and just take the difference (think Venn diagrams):
obsoleteTests = list(set([a.name for a in alldbTests]) - set(vctestNames))
Sets are really useful when comparing two lists of objects (pseudopython):
set(a) - set(b) = [c for c in a and not in b]
set(a) | set(b) = [c for c in a or in b]
set(a).intersection(set(b)) = [c for c in a and in b]
The intersection and difference operations of sets should help you solve your problem more elegantly.
But as you're originally dealing with dicts, these examples and discussion may provide some inspiration: http://code.activestate.com/recipes/59875-finding-the-intersection-of-two-dicts
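To address the concern about losing access to the full objects, here is a small sketch that keys both collections by name and applies the set difference to the keys only. A namedtuple stands in for the Django model so the example runs without Django. The stand-in and the helper names db_by_name and vc_by_name are assumptions, and vcdict is treated as a plain list as in the question's edit:

```python
from collections import namedtuple

# Stand-in for a Django model instance; only the `name` attribute matters.
DbTest = namedtuple('DbTest', ['name'])

alldbTests = [DbTest('test1'), DbTest('test2'), DbTest('test4')]
vcdict = [
    {'name': 'test1', 'status': 'pass'},
    {'name': 'test2', 'status': 'pass'},
    {'name': 'test5', 'status': 'fail'},
]

db_by_name = {t.name: t for t in alldbTests}
vc_by_name = {t['name']: t for t in vcdict}

# In Python 3, dict.keys() views support set operations directly.
obsoleteTests = [db_by_name[n] for n in db_by_name.keys() - vc_by_name.keys()]
newTests = [vc_by_name[n] for n in vc_by_name.keys() - db_by_name.keys()]

print(obsoleteTests)
print(newTests)
```

The difference is computed on names only, while the lookups return the original model or dictionary, so the two collections do not need to hold the same type of object.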
