field selection within mongodb query using dot notation - python
I see a lot of similarly worded questions here, but none has solved my problem.
I have a document like this:
{'_id': ObjectId('5006916af9cf0e7126000000'),'data': [{'count': 0,'alis':'statsministeren','avis':'Ekstrabladet'}, {'count': 0,'alis':'thorning','avis':'Ekstrabladet'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Ekstrabladet'}, {'count': 0,'alis':'lars barfod','avis':'Ekstrabladet'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Ekstrabladet'}, {'count': 0,'alis':'s\xf8vndal','avis':'Ekstrabladet'}, {'count': 0,'alis': u"sf's formand",'avis':'Ekstrabladet'}, {'count': 0,'alis':'m\xf6ger','avis':'Ekstrabladet'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Ekstrabladet'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Ekstrabladet'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Ekstrabladet'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Ekstrabladet'}, {'count': 0,'alis':'statsministeren','avis':'Information'}, {'count': 1,'alis':'thorning','avis':'Information'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Information'}, {'count': 0,'alis':'lars barfod','avis':'Information'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Information'}, {'count': 0,'alis':'s\xf8vndal','avis':'Information'}, {'count': 0,'alis': u"sf's formand",'avis':'Information'}, {'count': 0,'alis':'m\xf6ger','avis':'Information'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Information'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Information'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Information'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Information'}, {'count': 0,'alis':'statsministeren','avis':'Berlingske'}, {'count': 0,'alis':'thorning','avis':'Berlingske'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Berlingske'}, {'count': 0,'alis':'lars barfod','avis':'Berlingske'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Berlingske'}, {'count': 1,'alis':'s\xf8vndal','avis':'Berlingske'}, {'count': 0,'alis': u"sf's 
formand",'avis':'Berlingske'}, {'count': 0,'alis':'m\xf6ger','avis':'Berlingske'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Berlingske'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Berlingske'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Berlingske'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Berlingske'}, {'count': 0,'alis':'statsministeren','avis':'JP'}, {'count': 0,'alis':'thorning','avis':'JP'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'JP'}, {'count': 0,'alis':'lars barfod','avis':'JP'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'JP'}, {'count': 0,'alis':'s\xf8vndal','avis':'JP'}, {'count': 0,'alis': u"sf's formand",'avis':'JP'}, {'count': 1,'alis':'m\xf6ger','avis':'JP'}, {'count': 0,'alis':'lars l\xf8kke','avis':'JP'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'JP'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'JP'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'JP'}, {'count': 0,'alis':'statsministeren','avis':'BT'}, {'count': 0,'alis':'thorning','avis':'BT'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'BT'}, {'count': 0,'alis':'lars barfod','avis':'BT'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'BT'}, {'count': 0,'alis':'s\xf8vndal','avis':'BT'}, {'count': 0,'alis': u"sf's formand",'avis':'BT'}, {'count': 0,'alis':'m\xf6ger','avis':'BT'}, {'count': 0,'alis':'lars l\xf8kke','avis':'BT'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'BT'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'BT'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'BT'}, {'count': 0,'alis':'statsministeren','avis':'Politiken'}, {'count': 0,'alis':'thorning','avis':'Politiken'}, {'count': 0,'alis':'socialdemokratiets formand','avis':'Politiken'}, {'count': 0,'alis':'lars barfod','avis':'Politiken'}, {'count': 0,'alis':'formand for det konservative folkeparti','avis':'Politiken'}, {'count': 
0,'alis':'s\xf8vndal','avis':'Politiken'}, {'count': 0,'alis': u"sf's formand",'avis':'Politiken'}, {'count': 0,'alis':'m\xf6ger','avis':'Politiken'}, {'count': 0,'alis':'lars l\xf8kke','avis':'Politiken'}, {'count': 0,'alis':'l\xf8kke rasmussen','avis':'Politiken'}, {'count': 0,'alis':'lederen af danmarks st\xf8rste parti','avis':'Politiken'}, {'count': 0,'alis':'Pia Kj\xe6rsgaard','avis':'Politiken'}],'time':'2012-07-18 12:35:22.241245'}
I.e.:
{_objectId : xxx, time: yyy, data :[ 72 similar dicts in this array ]}
I want to retrieve values from within one of the 72 dicts.
My first attempt was something along these lines:
db.observations.find({'data.avis':'Ekstrabladet', 'data.alis':'thorning'}, {'data.count':1})
That would retrieve 72 count dicts, when what I really wanted was the count value for the one array element that satisfies both avis: Ekstrabladet and alis: thorning. But instead mongo returns the whole document.
I have found $elemMatch, but I get the same output.
db.observations.find({'data' : {$elemMatch: {'alis':'thorning','avis':'Ekstrabladet'}}},{'data.count':1})
I guess I could iterate over the complete document in python (this is for a flask app), but it doesn't seem very elegant.
So my question is: How do I reach inside a document and grab values from a nested array of documents?
Bonus: As I am new to all sorts of databases I only chose mongodb because it seemed very nice and flexible, and because I don't work with critical data. But I have no need for scalability and could use e.g. sqlite instead. If you have strong opinions about me using the wrong tool for the job - then please abuse me.
You cannot return just the selected subdocument. You'll get all of them. So you'll have to filter on the client side.
$elemMatch is still essential, though. Without it, avis and alis are not matched against the same array entry: a document qualifies as soon as one entry has the right avis and some other entry has the right alis, so the two conditions are not tied to a single element.
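The client-side filtering mentioned above could look roughly like this in Python. This is only a sketch: `find_count` is a hypothetical helper, and `doc` is a shortened stand-in for one document as returned by the driver, with three array entries instead of 72.

```python
def find_count(doc, avis, alis):
    """Return the count of the first 'data' entry matching both fields, or None."""
    for entry in doc.get('data', []):
        if entry.get('avis') == avis and entry.get('alis') == alis:
            return entry.get('count')
    return None


# Shortened stand-in for a document fetched from the observations collection.
doc = {
    'time': '2012-07-18 12:35:22.241245',
    'data': [
        {'count': 0, 'alis': 'thorning', 'avis': 'Ekstrabladet'},
        {'count': 1, 'alis': 'thorning', 'avis': 'Information'},
        {'count': 0, 'alis': 'statsministeren', 'avis': 'Information'},
    ],
}

print(find_count(doc, 'Information', 'thorning'))
```

Returning `None` for a missing combination keeps "no such entry" distinguishable from a count of 0.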
Related
How can I sort a dictionary by value? [duplicate]
{'KRW-SOL': {'count': 3, 'tradeAmount': 437540}, 'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768}, 'KRW-ONT': {'count': 14, 'tradeAmount': 947009}, 'KRW-FCT2': {'count': 1, 'tradeAmount': 491935}, 'KRW-DKA': {'count': 30, 'tradeAmount': 12053758}}
I want to sort by count or tradeAmount. I want a result like this:
{'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768}, 'KRW-DKA': {'count': 30, 'tradeAmount': 12053758}, 'KRW-ONT': {'count': 14, 'tradeAmount': 947009}, 'KRW-SOL': {'count': 3, 'tradeAmount': 437540}, 'KRW-FCT2': {'count': 1, 'tradeAmount': 491935}}
sorted(cdDict.items(), key=lambda item: item[1]['tradeAmount'], reverse=True)
You can do something like this:
x = {'KRW-SOL': {'count': 3, 'tradeAmount': 437540}, 'KRW-LOOM': {'count': 78, 'tradeAmount': 21030768}, 'KRW-ONT': {'count': 14, 'tradeAmount': 947009}, 'KRW-FCT2': {'count': 1, 'tradeAmount': 491935}, 'KRW-DKA': {'count': 30, 'tradeAmount': 12053758}}  # input
# new dict sorted by count
y = {k: v for k, v in sorted(x.items(), key=lambda item: item[1]['count'], reverse=True)}
print(y)
If you want to sort by tradeAmount instead:
y = {k: v for k, v in sorted(x.items(), key=lambda item: item[1]['tradeAmount'], reverse=True)}
print(y)
How to do error-handling of JSON Parser Loop
I found some elegant code that builds a list by iterating through each element of another JSON list:
results = [
    (
        t["vintage"]["wine"]["winery"]["name"],
        t["vintage"]["year"],
        t["vintage"]["wine"]["id"],
        f'{t["vintage"]["wine"]["name"]} {t["vintage"]["year"]}',
        t["vintage"]["wine"]["statistics"]["ratings_average"],
        t["vintage"]["wine"]["statistics"]["ratings_count"],
        t["price"]["amount"],
        t["vintage"]["wine"]["region"]["name"],
        t["vintage"]["wine"]["style"]["name"],  # <-------------- issue here
    )
    for t in r.json()["explore_vintage"]["matches"]
]
The problem is that sometimes the JSON doesn't have a "name" element because the "style" is null (which becomes None once parsed into Python). See the second-last line below for the JSON sample. Is there a simple way to handle this error?
Error:
matches[23]["vintage"]["wine"]["style"]["name"]
Traceback (most recent call last):
  File "<ipython-input-94-59447d0d4859>", line 1, in <module>
    matches[23]["vintage"]["wine"]["style"]["name"]
TypeError: 'NoneType' object is not subscriptable
Perhaps something like:
iferror(t["vintage"]["wine"]["style"]["name"], "DoesNotExist")
JSON:
{'id': 4026076, 'name': 'Shiraz - Petit Verdot', 'seo_name': 'shiraz-petit-verdot', 'type_id': 1, 'vintage_type': 0, 'is_natural': False, 'region': {'id': 685, 'name': 'South Eastern Australia', 'name_en': '', 'seo_name': 'south-eastern', 'country': {'code': 'au', 'name': 'Australia', 'native_name': 'Australia', 'seo_name': 'australia', 'sponsored': False, 'currency': {'code': 'AUD', 'name': 'Australian Dollars', 'prefix': '$', 'suffix': None}, 'regions_count': 120, 'users_count': 867353, 'wines_count': 108099, 'wineries_count': 13375, 'most_used_grapes': [{'id': 1, 'name': 'Shiraz/Syrah', 'seo_name': 'shiraz-syrah', 'has_detailed_info': True, 'wines_count': 536370}, {'id': 2, 'name': 'Cabernet Sauvignon', 'seo_name': 'cabernet-sauvignon', 'has_detailed_info': True, 'wines_count': 780931}, {'id': 5, 'name': 'Chardonnay', 'seo_name': 'chardonnay', 'has_detailed_info': True, 
'wines_count': 586874}], 'background_video': None}, 'class': {'typecast_map': {'background_image': {}, 'class': {}}}, 'background_image': {'location': '//images.vivino.com/regions/backgrounds/0iT8wuQXRWaAmEGpPjZckg.jpg', 'variations': {'large': '//thumbs.vivino.com/region_backgrounds/0iT8wuQXRWaAmEGpPjZckg_1280x760.jpg', 'medium': '//thumbs.vivino.com/region_backgrounds/0iT8wuQXRWaAmEGpPjZckg_600x356.jpg'}}}, 'winery': {'id': 74363, 'name': 'Barramundi', 'seo_name': 'barramundi', 'status': 0, 'background_image': None}, 'taste': {'structure': None, 'flavor': [{'group': 'black_fruit', 'stats': {'count': 16, 'score': 2987}}, {'group': 'oak', 'stats': {'count': 11, 'score': 1329}}, {'group': 'red_fruit', 'stats': {'count': 10, 'score': 1413}}, {'group': 'spices', 'stats': {'count': 6, 'score': 430}}, {'group': 'non_oak', 'stats': {'count': 5, 'score': 126}}, {'group': 'floral', 'stats': {'count': 3, 'score': 300}}, {'group': 'earth', 'stats': {'count': 3, 'score': 249}}, {'group': 'microbio', 'stats': {'count': 2, 'score': 66}}, {'group': 'vegetal', 'stats': {'count': 1, 'score': 100}}, {'group': 'dried_fruit', 'stats': {'count': 1, 'score': 100}}]}, 'statistics': {'status': 'Normal', 'ratings_count': 1002, 'ratings_average': 3.5, 'labels_count': 11180, 'vintages_count': 25}, 'style': None, 'has_valid_ratings': True}
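One possible way to approach the `iferror` idea from the question is a small helper that walks the keys and falls back to a default as soon as a level is missing or `None`. This is a sketch only: `dig` is a hypothetical name, not a standard-library function, and `match` below is a cut-down stand-in for one element of the real response.

```python
def dig(obj, *keys, default=None):
    """Follow keys into nested dicts/lists; return default if any step fails."""
    for key in keys:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # KeyError/IndexError: the key or index is missing.
            # TypeError: obj is None or otherwise not subscriptable.
            return default
    return obj


# Cut-down stand-in for one entry of r.json()["explore_vintage"]["matches"].
match = {"vintage": {"year": 2019, "wine": {"name": "Shiraz - Petit Verdot", "style": None}}}

print(dig(match, "vintage", "wine", "name"))
print(dig(match, "vintage", "wine", "style", "name", default="DoesNotExist"))
```

In the list comprehension, the last tuple entry would then become `dig(t, "vintage", "wine", "style", "name", default="DoesNotExist")`, while the other entries can stay as they are.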
printing into a text file from dictionary key and values
I need some help with code that writes to a log file. I have a dictionary whose keys are the event IDs and whose values hold the count. I want each event ID and its count written out one by one, but instead everything comes out all at once. I have used a nested dictionary to do this. This is an example of the dictionary which holds the count values and keys that need to be printed:
eventIDs = {1102: {'count': 0}, 4611: {'count': 0}, 4624: {'count': 0}, 4634: {'count': 0}, 4648: {'count': 0}, 4661: {'count': 0}, 4662: {'count': 0}, 4663: {'count': 0}, 4672: {'count': 0}, 4673: {'count': 0}, 4688: {'count': 0}, 4698: {'count': 0}, 4699: {'count': 0}, 4702: {'count': 0}, 4703: {'count': 0}, 4719: {'count': 0}, 4732: {'count': 0}, 4738: {'count': 0}, 4742: {'count': 0}, 4776: {'count': 0}, 4798: {'count': 0}, 4799: {'count': 0}, 4985: {'count': 0}, 5136: {'count': 0}, 5140: {'count': 0}, 5142: {'count': 0}, 5156: {'count': 0}, 5158: {'count': 0}}
This is the code I have tried:
def log_output():
    with open('path' + timeStamp + '.txt', 'a') as VisualiseLog:
        event_id = list{eventIDs.keys()}
        event_count = list(eventIDs.values)
        for item in eventIDs:
            print(f'Event ID: {event_id}')
            VisualiseLog.write('Event ID: {event_id}')
            print(f'Event Count: {event_count}')
            VisualiseLog.write(f'Event Count: {event_count}')
Try this code:
eventIDs = {
    1102: {'count': 0},
    4611: {'count': 0}
}
timeStamp = "1234"
def log_output():
    with open('path' + timeStamp + '.txt', 'a') as VisualiseLog:
        for id in eventIDs:
            count = eventIDs[id]['count']
            print(f'Event ID: {id}')
            VisualiseLog.write(f'Event ID: {id}\n')
            print(f'Event Count: {count}')
            VisualiseLog.write(f'Event Count: {count}\n\n')
log_output()
# Outputs:
# Event ID: 1102
# Event Count: 0
#
# Event ID: 4611
# Event Count: 0
Sorting nested dictionary in python by value
This is my Dictionary:
{{'count': 5, 'leftCount': 5, 'length': '5', 'submittedTime': 1526815239},
 {'count': 10, 'leftCount': 10, 'length': '5', 'submittedTime': 1526814198},
 {'count': 5, 'leftCount': 5, 'length': '25', 'submittedTime': 1526815326},
 {'count': 8, 'leftCount': 8, 'length': '25', 'submittedTime': 1526815326},
 {'count': 5, 'leftCount': 5, 'length': '30', 'submittedTime': 1526815239}}
I want to sort it by the value of the key "submittedTime". I have no idea how to make it work. I tried lambda but I think I'm doing something wrong because the result was exactly the same.
You might want to use OrderedDict:
from collections import OrderedDict
d = {0: {'count': 5, 'leftCount': 5, 'length': '5', 'submittedTime': 1526815239},
     1: {'count': 10, 'leftCount': 10, 'length': '5', 'submittedTime': 1526814198},
     2: {'count': 5, 'leftCount': 5, 'length': '25', 'submittedTime': 1526815326},
     3: {'count': 8, 'leftCount': 8, 'length': '25', 'submittedTime': 1526815326},
     4: {'count': 5, 'leftCount': 5, 'length': '30', 'submittedTime': 1526815239}}
print(OrderedDict(sorted(d.items(), key=lambda t: t[1]['submittedTime'])))
# OrderedDict([(1, {'count': 10, 'leftCount': 10, 'length': '5', 'submittedTime': 1526814198}),
#              (0, {'count': 5, 'leftCount': 5, 'length': '5', 'submittedTime': 1526815239}),
#              (4, {'count': 5, 'leftCount': 5, 'length': '30', 'submittedTime': 1526815239}),
#              (2, {'count': 5, 'leftCount': 5, 'length': '25', 'submittedTime': 1526815326}),
#              (3, {'count': 8, 'leftCount': 8, 'length': '25', 'submittedTime': 1526815326})])
Try using lambda:
d = {0: {'count': 5, 'leftCount': 5, 'length': '5', 'submittedTime': 1526815239},
     1: {'count': 10, 'leftCount': 10, 'length': '5', 'submittedTime': 1526814198},
     2: {'count': 5, 'leftCount': 5, 'length': '25', 'submittedTime': 1526815326},
     3: {'count': 8, 'leftCount': 8, 'length': '25', 'submittedTime': 1526815326},
     4: {'count': 5, 'leftCount': 5, 'length': '30', 'submittedTime': 1526815239}}
dd = sorted(d.items(), key=lambda x: x[1]['submittedTime'])
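The literal in the question, a set of dicts, is not valid Python because dicts are unhashable. If the data is really a list of dicts (an assumption; the question does not say), `sorted` with `operator.itemgetter` may be all that is needed. A minimal sketch with a shortened, hypothetical `jobs` list:

```python
from operator import itemgetter

# Hypothetical list-of-dicts version of the question's data (first three entries).
jobs = [
    {'count': 5, 'leftCount': 5, 'length': '5', 'submittedTime': 1526815239},
    {'count': 10, 'leftCount': 10, 'length': '5', 'submittedTime': 1526814198},
    {'count': 5, 'leftCount': 5, 'length': '25', 'submittedTime': 1526815326},
]

# sorted() returns a new list and leaves jobs untouched.
by_time = sorted(jobs, key=itemgetter('submittedTime'))

for job in by_time:
    print(job['submittedTime'], job['count'])
```

Calling `jobs.sort(key=itemgetter('submittedTime'))` instead would reorder the list in place, which may be what "the result was exactly the same" was missing if the sorted result was never assigned.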
Find Most Relevant Child in Dictionary
Taking this dictionary:
{'local': {'count': 7, 'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1} }, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2} } }}
How do I determine the most relevant match (within a 30% leeway)? For example, activities-events has a count of 6, so 6/7 = 85%, and its child outdoor-adventures has a count of 4 out of 6 (66%). So from this the most relevant category is outdoor-adventures.
In this example:
{'local': {'count': 11, 'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4} }, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2} } }}
Take both dining-and-nightlife (36%) with bar-clubs (100%) and activities-events (54%) with outdoor-adventures (66%).
I was hoping the percentage cutoff would be determined by
cutoff = 0.3
The idea here is to determine which category is most relevant by removing the smaller results (those below a 30% match).
F.J answered this question below, but now I wish to update the counts in the tree.
Initial output:
{'local': {'activities-events': {'count': 6, 'life-skill-classes': {'count': 2}, 'outdoor-adventures': {'count': 4}}, 'count': 11, 'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
Post output:
{'local': {'activities-events': {'count': 6, 'life-skill-classes': {'count': 2}, 'outdoor-adventures': {'count': 4}}, 'count': 10, 'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
The following should work. Note that this will modify your input dictionary in place:
def keep_most_relevant(d, cutoff=0.3):
    for k, v in list(d.items()):
        if k == 'count':
            continue
        if 'count' in d and v['count'] < d['count'] * cutoff:
            # child falls below the cutoff relative to its parent: drop it
            del d[k]
        else:
            # pass cutoff along so a non-default value applies at every level
            keep_most_relevant(v, cutoff)
Examples:
>>> import pprint
>>> d1 = {'local': {'count': 7, 'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d1)
>>> pprint.pprint(d1)
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 7}}
>>> d2 = {'local': {'count': 11, 'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}}, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2}}}}
>>> keep_most_relevant(d2)
>>> pprint.pprint(d2)
{'local': {'activities-events': {'count': 6,
                                 'life-skill-classes': {'count': 2},
                                 'outdoor-adventures': {'count': 4}},
           'count': 11,
           'dining-and-nightlife': {'bar-clubs': {'count': 4}, 'count': 4}}}
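The follow-up in the question, updating the counts in the tree, might be handled with a second pass over the dictionary. The sketch below assumes that a parent's count should equal the sum of its children's counts, which is inferred from the "Post output" in the question (11 becomes 10 = 6 + 4) and may not be the intended rule. `recompute_counts` is a hypothetical helper name.

```python
def recompute_counts(node):
    """Set each parent's 'count' to the sum of its children's counts, in place."""
    children = [v for k, v in node.items() if k != 'count']
    if not children:
        # Leaf category: keep its own count unchanged.
        return node.get('count', 0)
    total = sum(recompute_counts(child) for child in children)
    node['count'] = total
    return total


tree = {'local': {'count': 11,
                  'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4}},
                  'activities-events': {'count': 6,
                                        'outdoor-adventures': {'count': 4},
                                        'life-skill-classes': {'count': 2}}}}

recompute_counts(tree['local'])
print(tree['local']['count'])
```

Like the pruning function, this modifies the dictionary in place, so it can be called on the same tree after the low-scoring categories have been removed.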
def matches(match, cutoff):
    total = float(match['count'])
    for k in match:
        if k == 'count':
            continue
        score = match[k]['count'] / total
        if score >= cutoff:
            yield (k, score)
            # recurse into this child and keep only its best-scoring match
            m = list(matches(match[k], cutoff))
            if m:
                yield max(m, key=lambda cs: cs[1])
def best_matches(d, cutoff):
    for k in d:
        for m in matches(d[k], cutoff):
            yield m
Test 1
>>> d = {'local': {'count': 7, 'dining-and-nightlife': {'count': 1, 'bar-clubs': {'count': 1} }, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2} } }}
>>> print(list(best_matches(d, 0.3)))
[('activities-events', 0.8571428571428571), ('outdoor-adventures', 0.6666666666666666)]
Test 2
>>> d = {'local': {'count': 11, 'dining-and-nightlife': {'count': 4, 'bar-clubs': {'count': 4} }, 'activities-events': {'count': 6, 'outdoor-adventures': {'count': 4}, 'life-skill-classes': {'count': 2} } }}
>>> print(list(best_matches(d, 0.3)))
[('dining-and-nightlife', 0.36363636363636365), ('bar-clubs', 1.0), ('activities-events', 0.5454545454545454), ('outdoor-adventures', 0.6666666666666666)]