{'KRW-SOL': {'count': 3, 'tradeAmount': 437540},
'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768},
'KRW-ONT': {'count': 14, 'tradeAmount': 947009},
'KRW-FCT2': {'count': 1, 'tradeAmount': 491935},
'KRW-DKA': {'count': 30, 'tradeAmount': 12053758}}
I want to sort this dictionary by count or tradeAmount, in descending order.
The result should look like this (sorted by count):
{'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768},
'KRW-DKA': {'count': 30, 'tradeAmount': 12053758},
'KRW-ONT': {'count': 14, 'tradeAmount': 947009},
'KRW-SOL': {'count': 3, 'tradeAmount': 437540},
'KRW-FCT2': {'count': 1, 'tradeAmount': 491935}}
sorted(cdDict.items(), key=lambda item: item[1]['tradeAmount'], reverse=True)
You can do something like the following:
x = {'KRW-SOL': {'count': 3, 'tradeAmount': 437540},
'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768},
'KRW-ONT': {'count': 14, 'tradeAmount': 947009},
'KRW-FCT2': {'count': 1, 'tradeAmount': 491935},
'KRW-DKA': {'count': 30, 'tradeAmount': 12053758}} #input
# new dict sorted by count
y = {k: v for k, v in sorted(x.items(), key=lambda item: item[1]['count'], reverse=True)}
print(y)
If you want to sort by tradeAmount instead:
y = {k: v for k, v in sorted(x.items(), key=lambda item: item[1]['tradeAmount'], reverse=True)}
print(y)
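The two snippets above differ only in the field name, so they can be folded into one small helper. This is a minimal sketch, assuming Python 3.7+ (where dicts keep insertion order); the name sort_by_field is made up for illustration and is not part of the original answer.

```python
def sort_by_field(data, field, descending=True):
    """Return a new dict whose items are ordered by data[key][field]."""
    ordered = sorted(data.items(), key=lambda item: item[1][field], reverse=descending)
    return dict(ordered)  # dict() preserves the sorted order on Python 3.7+

x = {'KRW-SOL': {'count': 3, 'tradeAmount': 437540},
     'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768},
     'KRW-ONT': {'count': 14, 'tradeAmount': 947009},
     'KRW-FCT2': {'count': 1, 'tradeAmount': 491935},
     'KRW-DKA': {'count': 30, 'tradeAmount': 12053758}}

by_count = sort_by_field(x, 'count')
by_amount = sort_by_field(x, 'tradeAmount')
print(list(by_count))   # keys ordered by count, largest first
print(list(by_amount))  # keys ordered by tradeAmount, largest first
```

Note that KRW-SOL and KRW-FCT2 swap places between the two orderings, because KRW-FCT2 has the lower count but the higher tradeAmount.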
I have a list with barline ticks and midi notes that can overlap the barlines. So I made a list of 'barlineticks':
barlinepos = [0, 768.0, 1536.0, 2304.0, 3072.0, 3840.0, 4608.0, 5376.0, 6144.0, 6912.0, 0, 576.0, 1152.0, 1728.0, 2304.0, 2880.0, 3456.0, 4032.0, 4608.0, 5184.0, 5760.0, 6336.0, 6912.0, 7488.0]
And a MidiFile:
{'type': 'time_signature', 'numerator': 4, 'denominator': 4, 'time': 0, 'duration': 768, 'ID': 0}
{'type': 'set_tempo', 'tempo': 500000, 'time': 0, 'ID': 1}
{'type': 'track_name', 'name': 'Tempo Track', 'time': 0, 'ID': 2}
{'type': 'track_name', 'name': 'New Instrument', 'time': 0, 'ID': 3}
{'type': 'note_on', 'time': 0, 'channel': 0, 'note': 48, 'velocity': 100, 'ID': 4, 'duration': 956}
{'type': 'time_signature', 'numerator': 3, 'denominator': 4, 'time': 768, 'duration': 6911, 'ID': 5}
{'type': 'note_on', 'time': 768, 'channel': 0, 'note': 46, 'velocity': 100, 'ID': 6, 'duration': 575}
{'type': 'note_off', 'time': 956, 'channel': 0, 'note': 48, 'velocity': 0, 'ID': 7}
{'type': 'note_off', 'time': 1343, 'channel': 0, 'note': 46, 'velocity': 0, 'ID': 8}
{'type': 'end_of_track', 'time': 7679, 'ID': 9}
I want to check whether a MIDI note overlaps a barline. Every note_on message has a 'time' and a 'duration' value, so I have to check whether any of the barline ticks in the list falls inside the note's range (from 'time' to 'time' + 'duration'). I tried:
if barlinepos in range(0, 956):
    print(True)
Of course this doesn't work because barlinepos is a list. How can I check if one of the values in the list results in True?
Simple iteration to solve the requirement:
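for msg in midifile:
    # only note_on messages are notes with both 'time' and 'duration'
    if msg["type"] != "note_on":
        continue
    start, end = msg["time"], msg["time"] + msg["duration"]
    for tick in barlinepos:
        if start <= tick <= end:
            print(True)
            break
    else:
        # the inner loop finished without break: no barline inside this note
        print(False)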
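The same check can also be written with any(), which answers the "does one of the values result in True" part of the question directly. This is only a sketch: the function name crosses_barline and the short barline list are made up for illustration, and it assumes a barline sitting exactly on the note's start or end should not count as an overlap (use <= instead of < if it should).

```python
def crosses_barline(note, barlines):
    """Return True if any barline tick lies strictly inside the note."""
    start = note["time"]
    end = start + note["duration"]
    return any(start < tick < end for tick in barlines)

barlines = [0, 768, 1536]  # hypothetical barline ticks, for illustration only

long_note = {'type': 'note_on', 'time': 0, 'note': 48, 'duration': 956}
short_note = {'type': 'note_on', 'time': 768, 'note': 46, 'duration': 575}

print(crosses_barline(long_note, barlines))   # True: tick 768 lies between 0 and 956
print(crosses_barline(short_note, barlines))  # False: no tick lies between 768 and 1343
```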
Given the following dictionary created from df['statistics'].head().to_dict()
{0: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
1: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
2: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
3: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}},
4: {'executions': {'total': '1',
'passed': '1',
'failed': '0',
'skipped': '0'},
'defects': {'product_bug': {'total': 0, 'PB001': 0},
'automation_bug': {'AB001': 0, 'total': 0},
'system_issue': {'total': 0, 'SI001': 0},
'to_investigate': {'total': 0, 'TI001': 0},
'no_defect': {'ND001': 0, 'total': 0}}}}
Is there a way to expand the dictionary key/value pairs into their own columns and prefix these columns with the name of the original column, i.e. statistics.executions.total would become statistics_executions_total or even executions_total?
I have demonstrated that I can create the columns using the following:
pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1)
You will notice, however, that several of these newly created columns share the duplicate name "total".
I have not been able to find a way to prefix the newly created columns with the original column name, e.g. executions_total.
For additional insight, statistics will expand into executions and defects; executions will expand into passed | failed | skipped | total, and defects will expand into automation_bug | system_issue | to_investigate | product_bug | no_defect. The latter will then expand into total | **001 columns, where total is duplicated several times.
Any ideas are greatly appreciated. -Thanks!
.apply(pd.Series) is slow, don't use it.
See timing in Splitting dictionary/list inside a Pandas Column into Separate Columns
Create a DataFrame with a 'statistics' column from the dict in the OP.
This will create a DataFrame with a column of dictionaries.
Use pandas.json_normalize on the 'statistics' column.
The default sep is '.'.
Nested records will generate names separated by sep.
import pandas as pd
# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})
# display(df)
statistics
0 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4 {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
# normalize the statistics column
dfs = pd.json_normalize(df.statistics)
# display(dfs)
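executions.total executions.passed executions.failed executions.skipped defects.product_bug.total defects.product_bug.PB001 defects.automation_bug.AB001 defects.automation_bug.total defects.system_issue.total defects.system_issue.SI001 defects.to_investigate.total defects.to_investigate.TI001 defects.no_defect.ND001 defects.no_defect.total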
0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0 0 0 0 0 0 0 0 0
3 1 1 0 0 0 0 0 0 0 0 0 0 0 0
4 1 1 0 0 0 0 0 0 0 0 0 0 0 0
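To make the "names separated by sep" behaviour concrete, here is a plain-Python sketch of the same flattening idea, using '_' as the separator and the original column name as a prefix, which is what the question asks for. The flatten helper is hypothetical and written only for illustration; it is not how pandas implements json_normalize.

```python
def flatten(record, prefix="", sep="_"):
    """Flatten nested dicts into one level, joining the key path with sep."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # recurse, carrying the path so far as the new prefix
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat

# a shortened row from the question's data
row = {
    'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'},
    'defects': {'product_bug': {'total': 0, 'PB001': 0},
                'no_defect': {'ND001': 0, 'total': 0}},
}

flat_row = flatten(row, prefix="statistics")
print(list(flat_row))
```

Every key now carries its full path, so the repeated "total" names no longer collide. In pandas itself, json_normalize accepts a sep argument, so passing sep='_' should give underscore-joined names directly, and a list of rows flattened this way can be passed to pd.DataFrame.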
Taking this dictionary:
{'local': {'count': 7,
'dining-and-nightlife': {'count': 1,
'bar-clubs': {'count': 1}
},
'activities-events': {'count': 6,
'outdoor-adventures': {'count': 4},
'life-skill-classes': {'count': 2}
}
}}
How do I determine the most relevant match (within a 30% leeway)? For example, activities-events has a count of 6, so 6/7 = 85%, and its child outdoor-adventures has a count of 4 out of 6 (66%). So from this, the most relevant category is outdoor-adventures.
In this example:
{'local': {'count': 11,
'dining-and-nightlife': {'count': 4,
'bar-clubs': {'count': 4}
},
'activities-events': {'count': 6,
'outdoor-adventures': {'count': 4},
'life-skill-classes': {'count': 2}
}
}}
Take both dining-and-nightlife (36%) with bar-clubs (100%) and activities-events (54%) with outdoor-adventures (66%).
I was hoping the percentage cutoff to be determined by
cutoff = 0.3
The idea here is to determine which category is most relevant by removing the smaller results (those below a 30% match).
@F.J answered this question below, but now I also wish to update the counts in the tree.
Initial output:
{'local': {'activities-events': {'count': 6,
'life-skill-classes': {'count': 2},
'outdoor-adventures': {'count': 4}},
'count': 11,
'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
Post output:
{'local': {'activities-events': {'count': 6,
'life-skill-classes': {'count': 2},
'outdoor-adventures': {'count': 4}},
'count': 10,
'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
The following should work. Note that it modifies your input dictionary in place:
def keep_most_relevant(d, cutoff=0.3):
    for k, v in list(d.items()):
        if k == 'count':
            continue
        if 'count' in d and v['count'] < d['count'] * cutoff:
            # drop children that fall below the cutoff share of their parent
            del d[k]
        else:
            # pass cutoff along so a non-default value applies at every level
            keep_most_relevant(v, cutoff)
Examples:
>>> import pprint
>>> d1 = {'local': {'count': 7, 'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d1)
>>> pprint.pprint(d1)
{'local': {'activities-events': {'count': 6,
'life-skill-classes': {'count': 2},
'outdoor-adventures': {'count': 4}},
'count': 7}}
>>> d2 = {'local': {'count': 11, 'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d2)
>>> pprint.pprint(d2)
{'local': {'activities-events': {'count': 6,
'life-skill-classes': {'count': 2},
'outdoor-adventures': {'count': 4}},
'count': 11,
'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
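For the follow-up in the question (updating the counts in the tree, so that 'local' goes from 11 to 10), one possible approach is to recompute each parent's count from its children. This is a sketch under an assumption the question only implies: that a node's count should equal the sum of its remaining children's counts. The function name update_counts is made up for illustration.

```python
def update_counts(node):
    """Recompute parent counts bottom-up, in place, and return the node's total."""
    children = [v for k, v in node.items() if k != 'count']
    if not children:
        # leaf category: keep its own count as it is
        return node.get('count', 0)
    total = sum(update_counts(child) for child in children)
    if 'count' in node:
        # assumption: a parent's count is the sum of its children's counts
        node['count'] = total
    return total

# the "initial output" tree from the question
tree = {'local': {'activities-events': {'count': 6,
                                        'life-skill-classes': {'count': 2},
                                        'outdoor-adventures': {'count': 4}},
                  'count': 11,
                  'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}

update_counts(tree)
print(tree['local']['count'])  # 10, i.e. 6 + 4
```

Like the pruning function, this modifies the dictionary in place, so it can simply be called after the pruning step.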
def matches(match, cutoff):
    total = float(match['count'])
    for k in match:
        if k == 'count':
            continue
        score = match[k]['count'] / total
        if score >= cutoff:
            yield (k, score)
            # also yield the best-scoring match below this category, if any
            m = list(matches(match[k], cutoff))
            if m:
                yield max(m, key=lambda pair: pair[1])
def best_matches(d, cutoff):
    for k in d:
        for m in matches(d[k], cutoff):
            yield m
Test 1
>>> d = {'local': {'count': 7,
                   'dining-and-nightlife': {'count': 1,
                                            'bar-clubs': {'count': 1}
                                            },
                   'activities-events': {'count': 6,
                                         'outdoor-adventures': {'count': 4},
                                         'life-skill-classes': {'count': 2}
                                         }
                   }}
>>> print(list(best_matches(d, 0.3)))
[('activities-events', 0.8571428571428571), ('outdoor-adventures', 0.6666666666666666)]
Test 2
>>> d = {'local': {'count': 11,
                   'dining-and-nightlife': {'count': 4,
                                            'bar-clubs': {'count': 4}
                                            },
                   'activities-events': {'count': 6,
                                         'outdoor-adventures': {'count': 4},
                                         'life-skill-classes': {'count': 2}
                                         }
                   }}
>>> print(list(best_matches(d, 0.3)))
[('dining-and-nightlife', 0.36363636363636365), ('bar-clubs', 1.0), ('activities-events', 0.5454545454545454), ('outdoor-adventures', 0.6666666666666666)]
I see a lot of similarly worded questions here, but none has solved my problem.
I have a document like this:
{'_id': ObjectId('5006916af9cf0e7126000000'),'data': [{'count': 0,'alis':'statsministeren','avis':'Ekstrabladet'}, {'count': 0,'alis':'thorning','avis':'Ekstrabladet'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Ekstrabladet'}, {'count': 0,'alis':'lars barfod','avis':'Ekstrabladet'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Ekstrabladet'}, {'count': 0,'alis':'s\xf8vndal','avis':'Ekstrabladet'}, {'count': 0,'alis': u"sf's formand",'avis':'Ekstrabladet'}, {'count': 0,'alis':'m\xf6ger','avis':'Ekstrabladet'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Ekstrabladet'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Ekstrabladet'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Ekstrabladet'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Ekstrabladet'}, {'count': 0,'alis':'statsministeren','avis':'Information'}, {'count': 1,'alis':'thorning','avis':'Information'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Information'}, {'count': 0,'alis':'lars barfod','avis':'Information'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Information'}, {'count': 0,'alis':'s\xf8vndal','avis':'Information'}, {'count': 0,'alis': u"sf's formand",'avis':'Information'}, {'count': 0,'alis':'m\xf6ger','avis':'Information'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Information'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Information'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Information'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Information'}, {'count': 0,'alis':'statsministeren','avis':'Berlingske'}, {'count': 0,'alis':'thorning','avis':'Berlingske'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Berlingske'}, {'count': 0,'alis':'lars barfod','avis':'Berlingske'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Berlingske'}, {'count': 1,'alis':'s\xf8vndal','avis':'Berlingske'}, {'count': 0,'alis': u"sf's 
formand",'avis':'Berlingske'}, {'count': 0,'alis':'m\xf6ger','avis':'Berlingske'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Berlingske'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Berlingske'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Berlingske'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Berlingske'}, {'count': 0,'alis':'statsministeren','avis':'JP'}, {'count': 0,'alis':'thorning','avis':'JP'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'JP'}, {'count': 0,'alis':'lars barfod','avis':'JP'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'JP'}, {'count': 0,'alis':'s\xf8vndal','avis':'JP'}, {'count': 0,'alis': u"sf's formand",'avis':'JP'}, {'count': 1,'alis':'m\xf6ger','avis':'JP'}, {'count': 0,'alis':'lars l\xf8kke','avis':'JP'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'JP'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'JP'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'JP'}, {'count': 0,'alis':'statsministeren','avis':'BT'}, {'count': 0,'alis':'thorning','avis':'BT'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'BT'}, {'count': 0,'alis':'lars barfod','avis':'BT'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'BT'}, {'count': 0,'alis':'s\xf8vndal','avis':'BT'}, {'count': 0,'alis': u"sf's formand",'avis':'BT'}, {'count': 0,'alis':'m\xf6ger','avis':'BT'}, {'count': 0,'alis':'lars l\xf8kke','avis':'BT'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'BT'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'BT'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'BT'}, {'count': 0,'alis':'statsministeren','avis':'Politiken'}, {'count': 0,'alis':'thorning','avis':'Politiken'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Politiken'}, {'count': 0,'alis':'lars barfod','avis':'Politiken'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Politiken'}, {'count': 
0,'alis':'s\xf8vndal','avis':'Politiken'}, {'count': 0,'alis': u"sf's formand",'avis':'Politiken'}, {'count': 0,'alis':'m\xf6ger','avis':'Politiken'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Politiken'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Politiken'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Politiken'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Politiken'}],'time':'2012-07-18 12:35:22.241245'}
I.e.:
{_objectId : xxx, time: yyy, data :[ 72 similar dicts in this array ]}
I want to retrieve values from within one of the 72 dicts.
My first attempt was something along these lines:
db.observations.find({'data.avis':'Ekstrabladet', 'data.alis':'thorning'}, {'data.count':1})
That would retrieve all 72 count dicts, when what I really wanted was the count value for the single array element that satisfies both avis: Ekstrabladet and alis: thorning. Instead, mongo returns the whole document.
I have found $elemMatch, but I get the same output.
db.observations.find({'data' : {$elemMatch: {'alis':'thorning','avis':'Ekstrabladet'}}},{'data.count':1})
I guess I could iterate over the complete document in python (this is for a flask app), but it doesn't seem very elegant.
So my question is: How do I reach inside a document and grab values from a nested document of arrays?
Bonus: As I am new to all sorts of databases, I only chose mongodb because it seemed very nice and flexible, and because I don't work with critical data. But I have no need for scalability and could use e.g. sqlite instead. If you have strong opinions about me using the wrong tool for the job, then please abuse me.
You cannot return just the selected subdocument. You'll get all of them. So you'll have to filter on the client side.
$elemMatch is still essential, though. Without it, avis and alis are not matched against the same array entry: the document would match as soon as one entry satisfies avis and some other entry satisfies alis.
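The client-side filtering suggested above can be a short loop over the returned document. This is a minimal sketch in plain Python: the helper name find_count and the shortened sample document are made up for illustration, and the document is assumed to have already been fetched (for example with the $elemMatch query from the question).

```python
def find_count(document, avis, alis):
    """Return the count of the first 'data' entry matching both fields, or None."""
    for entry in document.get('data', []):
        if entry.get('avis') == avis and entry.get('alis') == alis:
            return entry['count']
    return None

# shortened stand-in for a document returned by the query
doc = {
    'time': '2012-07-18 12:35:22.241245',
    'data': [
        {'count': 0, 'alis': 'thorning', 'avis': 'Ekstrabladet'},
        {'count': 1, 'alis': 'thorning', 'avis': 'Information'},
        {'count': 0, 'alis': 'statsministeren', 'avis': 'Information'},
    ],
}

print(find_count(doc, 'Information', 'thorning'))  # 1
print(find_count(doc, 'BT', 'thorning'))           # None, no such entry here
```

Both conditions are checked against the same entry, which mirrors what $elemMatch does on the server side.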