Elegantly Generalising Sorting into Dictionaries in Python? - python

The list comprehension is a great structure for generalising working with lists in such a way that the creation of lists can be managed elegantly. Is there a similar tool for managing Dictionaries in Python?
I have the following functions:
# takes in 3 lists of lists and a column specification by which to group
def custom_groupby(atts, zmat, zmat2, col):
    result = dict()
    for i in range(len(atts)):
        val = atts[i][col]
        row = (atts[i], zmat[i], zmat2[i])
        try:
            result[val].append(row)
        except KeyError:
            result[val] = list()
            result[val].append(row)
    return result
# organises samples into dictionaries using the groupby
def organise_samples(attributes, z_matrix, original_z_matrix):
    strucdict = custom_groupby(attributes, z_matrix, original_z_matrix, 'SecStruc')
    strucfrontdict = dict()
    for k, v in strucdict.iteritems():
        strucfrontdict[k] = custom_groupby([x[0] for x in v],
                                           [x[1] for x in v],
                                           [x[2] for x in v], 'Front')
    samples = dict()
    for k in strucfrontdict:
        samples[k] = dict()
        for k2 in strucfrontdict[k]:
            samples[k][k2] = custom_groupby(
                [x[0] for x in strucfrontdict[k][k2]],
                [x[1] for x in strucfrontdict[k][k2]],
                [x[2] for x in strucfrontdict[k][k2]], 'Back')
    return samples
It seems unwieldy. Since there are elegant ways to do almost everything in Python, I'm inclined to think I'm using Python wrongly.
More importantly, I'd like to generalise this function so that I can specify how many "layers" the dictionary should have (without using several lambdas and approaching the problem in a Lisp style). I would like a function:
# organises samples into a dictionary by specified columns
# number of layers could also be assumed by number of criterion
def organise_samples(number_layers, list_of_strings_for_column_ids)
Is this possible to do in Python?
Thank you! Even if there isn't a way to do it elegantly in Python, any suggestions towards making the above code more elegant would be really appreciated.
::EDIT::
For context, the attributes object, z_matrix, and original_zmatrix are all lists of Numpy arrays.
Attributes might look like this:
Type,Num,Phi,Psi,SecStruc,Front,Back
11,181,-123.815,65.4652,2,3,19
11,203,148.581,-89.9584,1,4,1
11,181,-123.815,65.4652,2,3,19
11,203,148.581,-89.9584,1,4,1
11,137,-20.2349,-129.396,2,0,1
11,163,-34.75,-59.1221,0,1,9
The Z-matrices might both look like this:
CA-1, CA-2, CA-CB-1, CA-CB-2, N-CA-CB-SG-1, N-CA-CB-SG-2
-16.801, 28.993, -1.189, -0.515, 118.093, 74.4629
-24.918, 27.398, -0.706, 0.989, 112.854, -175.458
-1.01, 37.855, 0.462, 1.442, 108.323, -72.2786
61.369, 113.576, 0.355, -1.127, 111.217, -69.8672
Samples is a dict {num => dict {num => dict {num => tuple(attributes, z_matrix)}}}, each tuple holding one row of the z-matrix.

Have you tried using dictionary comprehensions?
See this great question about dictionary comprehensions.
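A recursive helper can generalise the layered grouping the question asks for. The sketch below is one possible approach, not the asker's actual code: the function name `group_by_columns` and the toy rows are my own, and it works on plain dicts rather than the question's tuples of NumPy rows. The number of layers follows from the number of column names.

```python
def group_by_columns(rows, columns):
    """Recursively group rows (dicts) into nested dicts keyed by each
    column in turn; the nesting depth equals len(columns)."""
    if not columns:
        return rows
    col, rest = columns[0], columns[1:]
    grouped = {}
    for row in rows:
        # setdefault creates the sub-list on first sight, then re-uses it
        grouped.setdefault(row[col], []).append(row)
    # Recurse into each group with the remaining columns
    return {key: group_by_columns(group, rest) for key, group in grouped.items()}

rows = [
    {'SecStruc': 2, 'Front': 3, 'Back': 19},
    {'SecStruc': 1, 'Front': 4, 'Back': 1},
    {'SecStruc': 2, 'Front': 3, 'Back': 19},
]
samples = group_by_columns(rows, ['SecStruc', 'Front', 'Back'])
```

This yields the dict-of-dict-of-dict shape described in the question's edit, with the leaf lists holding the grouped rows.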

Related

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial I'm still learning but I have a list of dictionaries that looks as follow:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
it contains ~89000 dictionaries. And I have a list containing 4473208 paths. example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
and what I want to do is: for each dict, group together the paths whose folder contains the key and whose filename contains one of that key's values.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for file in ct_paths:
        for key, val in elem.items():
            if (file[16:20] == key) and any(x in file[21:26] for x in val):
                temp1.append(file)
    grpd_cts.append(temp1)
but this takes around 30 hours. Is there a way to make it more efficient? Any itertools function or something?
Thanks a lot!
ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What makes your problem complicated is that you want to end up with the original list of filenames, so you need to construct a two-level dictionary where the values are lists of all originals grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
    ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)

grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for key, val in elem.items():
        d2 = ct_path_index.get(key)
        if d2:
            for v in val:
                v2 = d2.get(v)
                if v2:
                    temp1 += v2
    grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv'],
'00578': ['/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
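As a small illustration of that setdefault pattern (the toy word data is my own, not from the question):

```python
index = {}
for word in ["apple", "avocado", "banana"]:
    # Create the list for this first letter on first sight, then re-use it
    index.setdefault(word[0], []).append(word)
```

The first call for each key creates the empty list; subsequent calls return the existing one, so appends accumulate under the same key.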
Now, you've only got two nested loops; the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.
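That set conversion might look like the sketch below (the sample `dict_list` is made up, mirroring the question's shape):

```python
dict_list = [{'1102': ['00576', '00577', '00578']}]
# Convert each value list to a set for O(1) membership tests
dict_list = [{k: set(v) for k, v in elem.items()} for elem in dict_list]
```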

Python list comprehension: add values from a list into multiple separate lists

I want to add values into multiple lists from a single list with list comprehension. There are 4 values in the feature list, which should go into 4 separate lists.
my code:
features = features_time_domain(values, column_name)
feature_lists = [mean_list, std_list, max_list, min_list]
[x.append(y) for x,y in zip(feature_lists, features)]
The expected output should be something like this:
'x-axis_mean': [0.010896254499999957, -0.01702540899999986, 0.24993333400000006, -0.2805791479999999, -0.7675066368], 'x-axis_std': [4.100886585107956, 3.8269951001730607, 4.19064980513631, 3.7522815775487643, 3.5302154102620498], 'x-axis_max': [6.2789803, 7.668256, 11.604536, 9.384419, 7.5865335], 'x-axis_min': [-8.430995, -8.662541, -7.8861814, 7.6546354, -5.175732]
but I get:
'x-axis_mean': [[0.010896254499999957, 4.100886585107956, 6.2789803, -8.430995], [-0.01702540899999986, 3.8269951001730607, 7.668256, -8.662541], [0.24993333400000006, 4.19064980513631, 11.604536, -7.8861814], [-0.7675066368, 3.7522815775487643, 9.384419, -7.6546354], [-0.2805791479999999, 3.5302154102620498, 7.5865335, -5.175732]], 'x-axis_std': [], 'x-axis_max': [], 'x-axis_min': []
I have looked at other posts and tried to do something similar, but my answer is off. I could do something like this, but it looks very ugly:
mean_list.append(features[0])
std_list.append(features[1])
max_list.append(features[2])
min_list.append(features[3])
Apparently this works (with parentheses, since append is a method call):
mean_list.append(features[0])
std_list.append(features[1])
max_list.append(features[2])
min_list.append(features[3])
This is simple, declarative, and clear, although this would be even clearer:
mean, std, mx, mn = features
mean_list.append(mean)
std_list.append(std)
max_list.append(mx)
min_list.append(mn)
However if your data is in a dict, something like this would be clear:
results = dict(mean=[], std=[], mx=[], mn=[])
for features in get_features():
    for i, k in enumerate(results):
        results[k].append(features[i])
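The inner loop can also pair keys with feature values directly via zip, which avoids the index entirely. In this sketch `get_features()` is a toy stand-in for the real feature extractor, which the question does not show:

```python
def get_features():
    # Toy stand-in yielding (mean, std, max, min) tuples
    yield (0.1, 4.1, 6.2, -8.4)
    yield (-0.2, 3.8, 7.6, -8.6)

results = dict(mean=[], std=[], mx=[], mn=[])
for features in get_features():
    # dicts preserve insertion order (Python 3.7+), so keys line up
    # with the tuple positions
    for k, value in zip(results, features):
        results[k].append(value)
```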

Output a dictionary based on inputs from another dictionary and two lists

I have a dictionary and two lists, and would like to output another dictionary that has each list's name as the key and the sum of that list's contents as the value; however, I have no clue how to do this.
results = {'Apple':'14.0', 'Banana':'12.0', 'Orange':'2.0', 'Pineapple':'9.0'}
ListA = ['Apple','Pineapple']
ListB = ['Banana','Orange']
Output:
dicttotal = {'ListA':'23.0', 'ListB':'14.0'}
Edit: I have decided to use pandas to work with the above data as I find that the simplicity of pandas is more suited for my level of understanding. Thanks for the help everyone!
In Python you can use list comprehensions to make this easy to read:
items_for_a = [float(v) for k, v in results.items() if k in ListA]
total_a = sum(items_for_a)
The dicttotal you want to print is strange, though. I don't think you want to use variable names as dictionary keys.
In Python 2 you should use .iteritems() instead of .items().
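Building the whole dicttotal in one step with a dict comprehension might look like this (a sketch using the question's data; naming the lists in a dict sidesteps the variable-names-as-keys problem the answer mentions):

```python
results = {'Apple': '14.0', 'Banana': '12.0', 'Orange': '2.0', 'Pineapple': '9.0'}
lists = {'ListA': ['Apple', 'Pineapple'], 'ListB': ['Banana', 'Orange']}

# Sum each named list's values, formatting back to strings as in the question
dicttotal = {name: str(sum(float(results[fruit]) for fruit in fruits))
             for name, fruits in lists.items()}
```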
You can use the following code to get ListA's sum; the same approach works for ListB. Just try it yourself:
dicttotal = {}
ListASum = 0.0
ListBSum = 0.0
for item in ListA:
    if item in results:
        ListASum += float(results[item])
dicttotal['ListA'] = ListASum
reduce with an initial value to build the dictionary, followed by updating a single key. Had the key names not been variable names, a dict comprehension could perhaps have covered both. Note that reduce needs the 0.0 initializer here: without it the first list element would be passed through as the accumulator unconverted.
from functools import reduce
d_total = {'ListA': str(reduce(lambda acc, k: acc + float(results[k]), lista, 0.0))}
d_total['ListB'] = str(reduce(lambda acc, k: acc + float(results[k]), listb, 0.0))
{'ListA': '23.0', 'ListB': '14.0'}
Note: PEP-8 snake_case for naming
One-liner, using the (ugly) eval() function, but as Eduard said, it's better not to use variable names as keys:
{list_name: str(sum([float(results[item]) for item in eval(list_name)])) for list_name in ['ListA', 'ListB']}

Is this the fastest way to build dict?

I am reading an element list from an xml file and make the data into 2 dictionaries.
Is this the fastest way? (I don't think this is the best; you guys always surprise me. ;-)
ADict = {}
BDict = {}
for x in fields:
    key = x.get('key')
    ADict[key] = x.find('A').text
    BDict[key] = x.find('B').text
I think adding them one by one is a bad idea, but I'd like to write it in a single line, i.e. a more Pythonic way, like this:
ADict,BDict = [dict(k) for k in zip(*([(x.get('key'),x.find('A').text),(x.get('key'),x.find('B').text)] for x in fields))]
I don't think it's better, for two reasons:
first, x.get('key') is called twice;
second, it creates too many temporary tuples.
Not tested, but should work
ADict = dict((x.get('key'), x.find('A').text) for x in fields)
BDict = dict((x.get('key'), x.find('B').text) for x in fields)
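With dict comprehension syntax the same idea reads a little cleaner. This is a runnable sketch on a made-up XML snippet (the `key`/`A`/`B` names mirror the question; the actual file structure is not shown there):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<root>"
    "<field key='k1'><A>a1</A><B>b1</B></field>"
    "<field key='k2'><A>a2</A><B>b2</B></field>"
    "</root>"
)
fields = root.findall('field')

# One pass per dictionary, as in the answer, but with comprehension syntax
ADict = {x.get('key'): x.find('A').text for x in fields}
BDict = {x.get('key'): x.find('B').text for x in fields}
```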

Comparing lists of dictionaries

I have two lists of test results. The test results are represented as dictionaries:
list1 = [{'testclass': 'classname', 'testname': 'testname', 'testtime': ...}, ...]
list2 = [{'testclass': 'classname', 'testname': 'testname', ...}, ...]
The dictionary representation is slightly different in both lists, because for one list I have some
more information. But in all cases, every test dictionary in either list will have a classname and testname element which together effectively form a way of uniquely identifying the test and a way to compare it across lists.
I need to figure out all the tests that are in list1 but not in list2, as these represent new test failures.
To do this I do:
def get_new_failures(list1, list2):
    new_failures = []
    for test1 in list1:
        for test2 in list2:
            if test1['classname'] == test2['classname'] and \
               test1['testname'] == test2['testname']:
                break  # Not new, break out of inner loop
        else:
            # Doesn't match anything, must be new
            new_failures.append(test1)
    return new_failures
I am wondering if there is a more Pythonic way of doing this. I looked at filter(). The function the filter uses would need a handle to both lists. One is easy, but I am not sure how it would get a handle to both. I don't know the contents of the lists until runtime.
Any help would be appreciated.
Thanks.
Try this:
def get_new_failures(list1, list2):
    check = set((d['classname'], d['testname']) for d in list2)
    return [d for d in list1 if (d['classname'], d['testname']) not in check]
To compare two dicts d1 and d2 on a subset of their keys, use:
all(d1[k] == d2[k] for k in ('testclass', 'testname'))
And if your two lists have the same length, you can use zip() to pair them.
If each combination of classname and testname is truly unique, then the more computationally efficient approach would be to use two dictionaries instead of two lists. As key to the dictionary, use a tuple like so: (classname, testname). Then you can simply say if (classname, testname) in d: ....
If you need to preserve insertion order, and are using Python 2.7 or above, you could use an OrderedDict from the collections module.
The code would look something like this:
tests1 = {('classname', 'testname'):{'testclass':'classname',
'testname':'testname',...},
...}
tests2 = {('classname', 'testname'):{'testclass':'classname',
'testname':'testname',...},
...}
new_failures = [t for t in tests1 if t not in tests2]
If you must use lists for some reason, you could iterate over list2 to generate a set, and then test for membership in that set:
test1_tuples = ((d['classname'], d['testname']) for d in test1)
test2_tuples = set((d['classname'], d['testname']) for d in test2)
new_failures = [t for t in test1_tuples if t not in test2_tuples]
