Python sort nested dictionary - python

I want to sort this nested dictionary twice: first by time, and then by key. This is a dictionary nested two levels deep. The entries should be sorted by time first and then by the keys ("FileNameXXX") of the innermost dictionary.
data = {1: {"05:00:00": {"FileName123": "LineString1"}},
        2: {"16:00:00": {"FileName456": "LineString2"}},
        3: {"07:00:00": {"FileName789": "LineString3"}},
        4: {"07:00:00": {"FileName555": "LineString4"}}}
Expected Result:
1: {"05:00:00": {"FileName123": "LineString1"}}
3: {"07:00:00": {"FileName789": "LineString3"}}
4: {"07:00:00": {"FileName555": "LineString4"}}
2: {"16:00:00": {"FileName456": "LineString2"}}

You can achieve that by defining a comparable value for each entry in data. For example, the function below computes such a value. Note that it relies heavily on each second-level dict having exactly one key, which must be a time string formatted as HH:MM:SS.
from datetime import datetime

def get_comparable(key):
    raw_time = list(data[key].keys())[0]
    time = datetime.strptime(raw_time, "%H:%M:%S").time()
    # Seconds since midnight, plus the outer key as a small tie-breaker
    return time.hour * 3600 + time.minute * 60 + time.second + key * 0.001
Then you can just use:
for k in sorted(data, key=get_comparable):
    print(k, data[k])
Output:
1 {'05:00:00': {'FileName123': 'LineString1'}}
3 {'07:00:00': {'FileName789': 'LineString3'}}
4 {'07:00:00': {'FileName555': 'LineString4'}}
2 {'16:00:00': {'FileName456': 'LineString2'}}
Using
sorted(data, key=lambda x: list(data[x].keys())[0])
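will produce the same output here, but be careful: it ignores the first-level keys (the numbers), so entries with equal times simply keep their original order, and it sorts the times lexicographically as strings, which matches chronological order only because the times are zero-padded.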
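As a variation on the two approaches above, the sort key can be a tuple, so that the time is compared first and the outer key breaks ties without the `key * 0.001` arithmetic. This is only a sketch: the helper name `time_then_key` is made up for illustration, and it still assumes exactly one "HH:MM:SS" key per second-level dict.

```python
from datetime import datetime

data = {1: {"05:00:00": {"FileName123": "LineString1"}},
        2: {"16:00:00": {"FileName456": "LineString2"}},
        3: {"07:00:00": {"FileName789": "LineString3"}},
        4: {"07:00:00": {"FileName555": "LineString4"}}}

def time_then_key(key):
    # Hypothetical helper: assumes one "HH:MM:SS" key per second-level dict
    raw_time = next(iter(data[key]))
    # Tuples compare element by element: time first, then the outer key
    return (datetime.strptime(raw_time, "%H:%M:%S").time(), key)

ordered_keys = sorted(data, key=time_then_key)
for k in ordered_keys:
    print(k, data[k])
```

Because the tie-breaker is a separate tuple element, it keeps working when the outer keys are large numbers, where a scaled-down key could spill into the seconds.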

Related

Nested FOR cycle with a dictionary. How to start checking the second FOR from the value the first FOR uses

So I have a dictionary in which the key is the patient code or name (a string), and the associated value is the consult time (an int). For example I have this:
{'0001': 15, 'Charles': 20} The person '0001' takes 15 minutes in a consult, the person 'Charles' takes 20 minutes.
I do not know in advance whether a name or a number identifies a person (both are strings); that is, I don't know where Charles is located in the dictionary, or even if he exists at all.
I want to go through this dictionary to find the patients that have the same consult time, and for each such pair I want to run a certain function, which is not relevant here. For now I have this:
for pat_i in self.MD_dict:
    for pat_j in self.MD_dict:
        if(self.MD_dict[pat_i]==self.MD_dict[pat_j]):
            aux=aux+1
            funtion(pat_i, pat_j);
However, I want j to start from i+1. How do I do that?
I tried using the range function:
for pat_i in range(len(self.MD_dict)):
    for pat_j in range(i+1,len(self.MD_dict)):
        if(self.MD_dict[pat_i]==self.MD_dict[pat_j]):
            aux=aux+1
            funtion(pat_i, pat_j);
But because pat_i could be a string, this doesn't work. For example, with pat_i=1 and pat_j=2, the keys needed for the if to work would be '0001' and 'Charles'. But since pat_i != '0001' and pat_j != 'Charles', it doesn't work.
The idea is to build a dictionary for inverse lookup, combining the patient names/IDs that share the same time into a list.
In the original dictionary, the patient name/ID is the key and the time is the value. In the new dictionary, the time is the key and the list of patient names/IDs is the value.
def make_inverse_dict(x):
    # Time is key. List of patients is value
    time_dict = {}
    for name, time in x.items():
        if time in time_dict:
            time_dict[time].append(name)
        else:
            time_dict[time] = [name]
    return time_dict

def f(p1, p2):
    print(f" {p1}, {p2}")

def make_patient_pair(patient_list):
    # Triangular loop to make pairs of patients
    # without duplication
    n_patients = len(patient_list)
    out = []
    for i in range(n_patients):
        for j in range(i):
            pair = (patient_list[i], patient_list[j])
            out.append(pair)
    return out

def main():
    patient_dict = {'0001': 15,
                    '0002': 10,
                    'Charles': 20,
                    'Ann': 20,
                    '0003': 15}
    time_dict = make_inverse_dict(patient_dict)
    for time, patient_list in time_dict.items():
        n_patients = len(patient_list)
        print("time: ", time)
        if n_patients > 1:
            for p1, p2 in make_patient_pair(patient_list):
                f(p1, p2)

main()
result:
time: 15
0003, 0001
time: 10
time: 20
Ann, Charles
The issue with your code is that both loops iterate over the whole dictionary. The inner loop checks whether MD_dict[pat_i] is equal to MD_dict[pat_j] for every pat_j, including the entry that the outer loop is currently on. For example, when the outer loop reaches "Charles", the inner loop starts at the beginning and eventually reaches "Charles" as well. The value for "Charles" obviously equals itself, so you end up passing "Charles" and "Charles" into your function, which is not what you want.
Let's say MD_dict is:
MD_dict = {'0001': 15, 'Charles': 20, '0002':15, 'Alex': 20, 'Jack':15}
What we can do is use enumerate to make sure these duplicates don't happen:
pats = list(MD_dict.keys())
enum = list(enumerate(pats))
Above, enum pairs the patients' name and the index at which they appear:
[(0, '0001'), (1, 'Charles'), (2, '0002'), (3, 'Alex'), (4, 'Jack')]
Then, we can iterate over enum twice:
for x, pat_i in enum:
    for y,pat_j in enum:
        if x != y and MD_dict[pat_i] == MD_dict[pat_j]:
            function(pat_i, pat_j) # apply function here
Notice how x != y appears in the conditional statement. This is to avoid pairing the same person with itself. This ensures that the second loop will not consider the person that the first loop is on. The result is:
function(0001, 0002) # consult times == 15
function(0001, Jack) # consult times == 15
function(Charles, Alex) # consult times == 20
function(0002, 0001) # consult times == 15
function(0002, Jack) # consult times == 15
function(Alex, Charles) # consult times == 20
function(Jack, 0001) # consult times == 15
function(Jack, 0002) # consult times == 15
However, there is one issue with the function above, and I am not totally clear on what you were asking for in the question. This problem is that the function will be applied to "Charles" and "Alex" and then "Alex" and "Charles" later on. This is because we run through the dictionary twice, so it's going to pick up "Alex" as a match when the first loop hits "Charles" and it will pick up "Charles" as a match when the first loop hits "Alex." If this is not what you want, we can slice enum on the second loop:
pats = list(MD_dict.keys())
enum = list(enumerate(pats))
for x, pat_i in enum:
    for y,pat_j in enum[x+1:]:
        if MD_dict[pat_i] == MD_dict[pat_j]:
            function(pat_i, pat_j) # apply function here
Above, for y,pat_j in enum[x+1:] will only consider the people that follow the person that the first loop is on. We then do not have to check if x != y. The output is as follows:
function(0001, 0002) # consult times == 15
function(0001, Jack) # consult times == 15
function(Charles, Alex) # consult times == 20
function(0002, Jack) # consult times == 15
sorted_items = sorted(self.MD_dict.items(), key=lambda x: x[1])
for i, (pat_i, pat_i_val) in enumerate(sorted_items):
    for (pat_j, pat_j_val) in sorted_items[i+1:]:
        if pat_i_val == pat_j_val:
            aux=aux+1
            funtion(pat_i, pat_j)
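The "start the second loop at i+1" idea from the answers above can also be sketched with `itertools.combinations`, which yields each unordered pair of keys exactly once. The helper name `same_time_pairs` is made up for illustration, and the sample dictionary is the one used in the enumerate answer.

```python
from itertools import combinations

MD_dict = {'0001': 15, 'Charles': 20, '0002': 15, 'Alex': 20, 'Jack': 15}

def same_time_pairs(consults):
    # combinations() yields every unordered pair of keys once, in
    # insertion order, and never pairs a key with itself
    return [(a, b) for a, b in combinations(consults, 2)
            if consults[a] == consults[b]]

pairs = same_time_pairs(MD_dict)
for pat_i, pat_j in pairs:
    print(pat_i, pat_j)
```

This still compares every pair, so it is quadratic in the number of patients; grouping by consult time first, as in the inverse-dictionary answer, avoids comparing patients whose times differ.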

How to find minimum value in sorted dictionary with repeat elements?

I have a dictionary with keys as items and values as their prices. I have to print the cheapest item.
If the input is dict_1={'mobile1':11000, 'mobile2':11000, 'mobile3':11000}
then the output should be mobile1: 11000. In case of a tie in values, whichever item came first should be printed.
And if the input is {'mobile1':10000, 'mobile2':9000, 'mobile3':13000}
the output is mobile2: 9000.
My code works for the second input but fails for the first one, where the values are the same.
dict_1={'mobile1':11000, 'mobile2':11000, 'mobile3':11000}
mobile=list(dict_1.keys())
price=list(dict_1.values())
for key,val in dict_1.items():
    if dict_1[key]==min(price):
        print('{0}: {1}'.format(key, val))
Expected output:
mobile1: 11000
Actual result:
mobile1: 11000
mobile2: 11000
mobile3: 11000
You can use the min() function for this.
>>> dict_1={'mobile1':11000, 'mobile2':11000, 'mobile3':11000}
>>> min(dict_1, key=dict_1.get)
'mobile1'
>>>
You can try this:
dict_1 = {'mobile1':11000, 'mobile2':11000, 'mobile3':11000}
ans = sorted(dict_1.keys(), key = lambda x: dict_1[x])[0]
print(str(ans) + ': ' + str(dict_1[ans]))
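Both answers rely on the same property: `min()` returns the first minimal element it encounters, and `sorted()` is stable, so with an insertion-ordered dict (Python 3.7+) a tie goes to the item that was added first. A small sketch that wraps this up and prints in the question's format; the function name `cheapest` is hypothetical.

```python
def cheapest(prices):
    # min() keeps the first minimal key it meets, and dicts preserve
    # insertion order (Python 3.7+), so ties go to the earliest item
    name = min(prices, key=prices.get)
    return name, prices[name]

tied = {'mobile1': 11000, 'mobile2': 11000, 'mobile3': 11000}
mixed = {'mobile1': 10000, 'mobile2': 9000, 'mobile3': 13000}

for prices in (tied, mixed):
    name, price = cheapest(prices)
    print('{0}: {1}'.format(name, price))
```

Compared with the sorted() version, min() makes a single pass and does not build a sorted copy of the keys.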

Working with a pair in lists

I extract two values with a statement from a dataframe via:
date = data_audit.loc[(data_audit.Audit == audit) & (data_audit.Meilenstein == phase1), 'Planned_Date']
division = data_audit.loc[(data_audit.Audit == audit) & (data_audit.Meilenstein == phase1), 'Ber']
After extracting them, I transform these values...
x = date.tolist()
y = division.tolist()
... and append them to a list:
time.extend((x, y))
My result in PyCharm (after looping the .extend over several values) is:
[[100], [A], [200], [A], [100], [B]]
My first question: why is the result not like this:
[([100], [A]), ([200], [A]), ([100], [B])] ?
My second question: I want to calculate the average of all first items (the integers), and the average of the first items per exec (exec = A, B).
The result would be: All: 133.33 | A: 150 | B: 100
How can I access all the "first values" of the pairs in my list [(firstvalue, secondvalue), (...), ...]?
For example:
time= np.round(np.mean(timeCleaned[ACCESS_ALL_"FIRST"_VALUES_IN_MY_LIST]), 2)
Thank you!
edit: Variable names.
extend unpacks and appends each item of an iterable to your list. Use append instead:
time.append((x, y))
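To illustrate the difference, and the second part of the question, here is a minimal sketch in plain Python. The `samples` list is made-up stand-in data mirroring the question's example; in the real code, x and y come from `date.tolist()` and `division.tolist()`. It assumes each x and y is a one-element list, as in the output shown in the question.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stand-in for the values extracted in the loop
samples = [([100], ['A']), ([200], ['A']), ([100], ['B'])]

time = []
for x, y in samples:
    time.append((x, y))   # append keeps (x, y) together as one tuple

# First value of every pair (each x is a one-element list here)
firsts = [x[0] for x, _ in time]
overall = round(mean(firsts), 2)

# Group the first values by the second value of each pair
by_exec = defaultdict(list)
for x, y in time:
    by_exec[y[0]].append(x[0])
per_exec = {name: round(mean(values), 2) for name, values in by_exec.items()}

print(overall, per_exec)
```

With `extend`, the tuple `(x, y)` would be unpacked and x and y added as two separate list items, which is why the original result was a flat list.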

Speed up dictionary merging with soft conjunction logic

I have a look-up table which contains <word: dictionary> pairs.
Then, given a word list,
I can produce a list of dictionaries using this look-up table.
(The length of the word list varies from call to call.)
The values in these dictionaries represent the log probabilities of the keys.
Here is an example:
Given a word list
['fruit','animal','plant'],
we can check out the look-up table and have
dict_list = [{'apple':-1, 'flower':-2}, {'apple':-3, 'dog':-1}, {'apple':-2, 'flower':-1}].
We can see from the list that we have a set of keys: {'apple', 'flower', 'dog'}
For each key, I want the sum of its values across dict_list. If a key does not exist in one of the dictionaries, we add a small value, -10, instead (you can regard -10 as a very small log probability).
The result dictionary looks like:
dict_merge = {'apple':-6, 'flower':-13, 'dog':-21},
because 'apple' = (-1) + (-3) + (-2), 'flower' = (-2) + (-10) + (-1), 'dog' = (-10) + (-1) + (-10)
Here is my python3 code:
dict_list = [{'apple':-1, 'flower':-2}, {'apple':-3, 'dog':-1}, {'apple':-2, 'flower':-1}]
key_list = []
for dic in dict_list:
    key_list.extend(dic.keys())
dict_merge = dict.fromkeys(key_list, 0)
for key in dict_merge:
    for dic in dict_list:
        dict_merge[key] += dic.get(key, -10)
This code works, but if some dictionaries in dict_list are very large (for example, 100,000 entries), it can take over 200 ms, which is not acceptable in practice.
The main computation is in the for key in dict_merge loop; imagine it iterating over 100,000 keys.
Are there any ways to speed this up? Thanks! And thanks for reading.
P.S.
Only a few dictionaries in the look-up table are very large, so there may be some room for optimization here.
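As far as I understand, sum(len(d) for d in dict_list) is much smaller than len(key_list) * len(dict_list). So it is cheaper to start every key at len(dict_list) * (-10) and then, for each dictionary that actually contains the key, replace one -10 with the real value: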
from collections import defaultdict
dict_list = [{'apple':-1, 'flower':-2}, {'apple':-3, 'dog':-1}, {'apple':-2, 'flower':-1}]
default_value = len(dict_list) * (-10)
dict_merge = defaultdict(lambda: default_value)
for d in dict_list:
    for key, value in d.items():
        dict_merge[key] += value + 10
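One way to check that the single-pass version gives the same result as the nested loops from the question is to wrap both in functions and compare them on the example. The function names and the `MISSING` constant below are made up for this sketch.

```python
from collections import defaultdict

MISSING = -10  # penalty used when a key is absent from a dictionary

def merge_nested(dict_list):
    # Question's approach: every key is looked up in every dictionary
    keys = set().union(*dict_list)
    return {k: sum(d.get(k, MISSING) for d in dict_list) for k in keys}

def merge_single_pass(dict_list):
    # Start each key at "missing everywhere", then swap one penalty for
    # the real value per dictionary that actually contains the key
    merged = defaultdict(lambda: len(dict_list) * MISSING)
    for d in dict_list:
        for key, value in d.items():
            merged[key] += value - MISSING
    return dict(merged)

dict_list = [{'apple': -1, 'flower': -2}, {'apple': -3, 'dog': -1},
             {'apple': -2, 'flower': -1}]
print(merge_single_pass(dict_list))
```

The single-pass version touches each (key, value) pair once, so its cost grows with the total number of entries rather than with the number of distinct keys times the number of dictionaries.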

Calculating means of values for subgroups of keys in python dictionary

I have a dictionary which looks like this:
cq={'A1_B2M_01':2.04, 'A2_B2M_01':2.58, 'A3_B2M_01':2.80, 'B1_B2M_02':5.00,
    'B2_B2M_02':4.30, 'B3_B2M_02':2.40 etc.}
I need to calculate the mean of each triplet, where the key suffixes (key[2:]) agree. Ideally, I would like to get another dictionary like this:
new={'_B2M_01': 2.47, '_B2M_02': 3.9}
The data should come in triplets, so in theory I could just take the means of consecutive values. However, the data is in a dictionary, so the keys/values will likely get reordered. Besides, I'd rather stick to the names as a quality check for the triplets (I will later add an error message for groups with more than three entries).
I've tried creating a dictionary with the keys _B2M_01 and _B2M_02 and then looping through the original dictionary to first collect all the values that belong to each group, so that I could later calculate the averages. But I am getting errors even in the first step, and anyway I am not sure this is the most effective way to do it...
cq={'A1_B2M_01':2.4, 'A2_B2M_01':5, 'A3_B2M_01':4, 'B1_B2M_02':3, 'B2_B2M_02':7, 'B3_B2M_02':6}
trips=set([x[2:] for x in cq.keys()])
new={}
for each in trips:
    for k,v in cq.iteritems():
        if k[2:]==each:
            new[each].append(v)
Traceback (most recent call last):
  File "<pyshell#28>", line 4, in <module>
    new[each].append(v)
KeyError: '_B2M_01'
I would be very grateful for any suggestions. It seems like a fairly easy operation but I got stuck.
An alternative, even better result would be a dictionary that contains all the names used in cq, with each value being the mean of its group. So the end result would be:
final={'A1_B2M_01':2.47, 'A2_B2M_01':2.47, 'A3_B2M_01':2.47, 'B1_B2M_02':3.9,
       'B2_B2M_02':3.9, 'B3_B2M_02':3.9}
Something like this should work. You can probably make it a little more elegant.
cq = {'A1_B2M_01':2.04, 'A2_B2M_01':2.58, 'A3_B2M_01':2.80, 'B1_B2M_02':5.00, 'B2_B2M_02':4.30, 'B3_B2M_02':2.40 }
totals = {}  # named totals rather than sum, to avoid shadowing the built-in
count = {}
mean = {}
for k in cq:
    if k[2:] in totals:
        totals[k[2:]] += cq[k]
        count[k[2:]] += 1
    else:
        totals[k[2:]] = cq[k]
        count[k[2:]] = 1
for k in totals:
    mean[k] = totals[k] / count[k]
cq={'A1_B2M_01':2.4, 'A2_B2M_01':5, 'A3_B2M_01':4, 'B1_B2M_02':3, 'B2_B2M_02':7, 'B3_B2M_02':6}
sums = dict()
for k, v in cq.iteritems():
    _, p2 = k.split('_', 1)
    if p2 not in sums:
        sums[p2] = [0, 0]
    sums[p2][0] += v
    sums[p2][1] += 1
res = {}
for k, v in sums.iteritems():
    res[k] = v[0]/float(v[1])
print res
This could also be done in a single iteration.
Grouping:
SEPARATOR = '_'
cq={'A1_B2M_01':2.4, 'A2_B2M_01':5, 'A3_B2M_01':4, 'B1_B2M_02':3, 'B2_B2M_02':7, 'B3_B2M_02':6}
groups = {}
for key in cq:
    group_key = SEPARATOR.join(key.split(SEPARATOR)[1:])
    if group_key in groups:
        groups[group_key].append(cq[key])
    else:
        groups[group_key] = [cq[key]]
Generate means:
def means(groups):
    for group, group_vals in groups.iteritems():
        yield (group, float(sum(group_vals)) / len(group_vals),)
print list(means(groups))
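The answers above use Python 2 idioms (iteritems, the print statement). As a rough Python 3 sketch of the same grouping idea, using `collections.defaultdict` and `statistics.mean`: it assumes the third key of the second group is 'B3_B2M_02' (the question repeats 'B2_B2M_02', which a dict would collapse into one entry) and that the group name is everything after the first two characters of the key.

```python
from collections import defaultdict
from statistics import mean

# Assumption: the last key is 'B3_B2M_02', not a repeated 'B2_B2M_02'
cq = {'A1_B2M_01': 2.04, 'A2_B2M_01': 2.58, 'A3_B2M_01': 2.80,
      'B1_B2M_02': 5.00, 'B2_B2M_02': 4.30, 'B3_B2M_02': 2.40}

# Collect the values of each group; key[2:] drops the two-character prefix
groups = defaultdict(list)
for key, value in cq.items():
    groups[key[2:]].append(value)

# Mean per group, rounded to two decimals
new = {suffix: round(mean(values), 2) for suffix, values in groups.items()}

# Every original name mapped to the mean of its group
final = {key: new[key[2:]] for key in cq}

print(new)
print(final)
```

Because `groups` keeps the full list of values per suffix, the planned check for groups with more than three entries could be a simple `len(values)` test before taking the mean.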
