How can I get a subfield of dictionary in mongodb? - python

I have data structured as follows:
{
"_id" : ObjectId("61e810b946788906359966"),
"titles" : {
"full_name" : "full name",
"name" : "name"
},
"duration" : 161,
"work_ids" : {
"plasma_id" : "METRO_3423659",
"product_code" : "34324000",
}
}
I would like the query result to be:
{'full_name': 'full name', 'plasma_id': 'METRO_3423659'}
The query I run is:
.find({query}, {"_id": 0, "work_ids.plasma_id": 1, "titles.full_name": 1})
but the result I get is {'titles': {'full_name': 'full name'}, 'work_ids': {'plasma_id': 'METRO_3423659'}}
Is there any way to get the result I want directly? Thank you very much.

You can do this with an aggregation $project stage that uses field paths and gives each field a new top-level name:
aggregate(
  [{"$project":
     {"_id": 0,
      "full_name": "$titles.full_name",
      "plasma_id": "$work_ids.plasma_id"}}])

Related

Generalize algorithm for a loop comparing to last record?

I have a data set which I can represent by this toy example of a list of dictionaries:
data = [{
"_id" : "001",
"Location" : "NY",
"start_date" : "2022-01-01T00:00:00Z",
"Foo" : "fruits"
},
{
"_id" : "002",
"Location" : "NY",
"start_date" : "2022-01-02T00:00:00Z",
"Foo" : "fruits"
},
{
"_id" : "011",
"Location" : "NY",
"start_date" : "2022-02-01T00:00:00Z",
"Bar" : "vegetables"
},
{
"_id" : "012",
"Location" : "NY",
"Start_Date" : "2022-02-02T00:00:00Z",
"Bar" : "vegetables"
},
{
"_id" : "101",
"Location" : "NY",
"Start_Date" : "2022-03-01T00:00:00Z",
"Baz" : "pizza"
},
{
"_id" : "102",
"Location" : "NY",
"Start_Date" : "2022-03-02T00:00:00Z",
"Baz" : "pizza"
},
]
Here is an algorithm in Python which collects the keys of each 'collection' and, whenever the keys change, adds them to the output.
data_keys = []
for i, lst in enumerate(data):
    all_keys = []
    for k, v in lst.items():
        all_keys.append(k)
        if k.lower() == 'start_date':
            start_date = v
    this_coll = {'start_date': start_date, 'all_keys': all_keys}
    if i == 0:
        data_keys.append(this_coll)
    else:
        last_coll = data_keys[-1]
        if this_coll['all_keys'] == last_coll['all_keys']:
            continue
        else:
            data_keys.append(this_coll)
The correct output given here records each change of field name: Foo, Bar, Baz as well as the change of case in field start_date to Start_Date:
[{'start_date': '2022-01-01T00:00:00Z',
'all_keys': ['_id', 'Location', 'start_date', 'Foo']},
{'start_date': '2022-02-01T00:00:00Z',
'all_keys': ['_id', 'Location', 'start_date', 'Bar']},
{'start_date': '2022-02-02T00:00:00Z',
'all_keys': ['_id', 'Location', 'Start_Date', 'Bar']},
{'start_date': '2022-03-01T00:00:00Z',
'all_keys': ['_id', 'Location', 'Start_Date', 'Baz']}]
Is there a general algorithm which covers this pattern comparing current to previous item in a stack?
I need to generalize this algorithm and find a solution to do exactly the same thing with MongoDB documents in a collection. In order for me to discover if Mongo has an Aggregation Pipeline Operator which I could use, I must first understand if this basic algorithm has other common forms so I know what to look for.
Or someone who knows MongoDB aggregation pipelines really well could suggest operators which would produce the desired result?
EDIT: If you want to use a query for this, one option is something like the pipeline below:
$objectToArray turns the keys into values, and $ifNull lets us check both spellings of start_date.
$unwind lets us sort the keys.
The first $group undoes the $unwind, but now with sorted keys.
$reduce builds a string from all the keys, so we have something to compare.
The second $group uses that string as its key, so we only keep documents for changes.
db.collection.aggregate([
  {
    $project: {
      data: {$objectToArray: "$$ROOT"},
      start_date: {$ifNull: ["$start_date", "$Start_Date"]}
    }
  },
  {$unwind: "$data"},
  {$project: {start_date: 1, key: "$data.k", _id: 0}},
  {$sort: {start_date: 1, key: 1}},
  {$group: {_id: "$start_date", all_keys: {$push: "$key"}}},
  {
    $project: {
      all_keys: 1,
      all_keys_string: {
        $reduce: {
          input: "$all_keys",
          initialValue: "",
          in: {$concat: ["$$value", "$$this"]}
        }
      }
    }
  },
  {
    $group: {
      _id: "$all_keys_string",
      all_keys: {$first: "$all_keys"},
      start_date: {$first: "$_id"}
    }
  },
  {$unset: "_id"}
])
itertools.groupby starts a new sub-iterator each time the key value changes, so it does the work of tracking a changing key for you. In your case, the key is the set of keys of each dictionary. You can write a list comprehension that takes the first value from each of these sub-iterators.
import itertools

data = ...  # your data (the list of dictionaries from the question)
data_keys = [next(val)
             for _, val in itertools.groupby(data, lambda record: record.keys())]
for row in data_keys:
    print(row)
Result
{'_id': '001', 'Location': 'NY', 'start_date': '2022-01-01T00:00:00Z', 'Foo': 'fruits'}
{'_id': '011', 'Location': 'NY', 'start_date': '2022-02-01T00:00:00Z', 'Bar': 'vegetables'}
{'_id': '012', 'Location': 'NY', 'Start_Date': '2022-02-02T00:00:00Z', 'Bar': 'vegetables'}
{'_id': '101', 'Location': 'NY', 'Start_Date': '2022-03-01T00:00:00Z', 'Baz': 'pizza'}
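On the question of a general form: the pattern is "emit an item whenever its key differs from the previous item's key", which is what `itertools.groupby` does internally. Written out by hand it might look like the sketch below. The function name `first_of_each_run` and the shortened records are assumptions made for illustration. Using `key=list` compares key names in order, like the original loop, whereas comparing `record.keys()` views ignores order.

```python
def first_of_each_run(items, key):
    """Yield each item whose key differs from the previous item's key."""
    previous = object()  # sentinel that never equals a real key
    for item in items:
        current = key(item)
        if current != previous:
            yield item
        previous = current


# Shortened stand-in for the data in the question.
records = [
    {"_id": "001", "start_date": "2022-01-01T00:00:00Z", "Foo": "fruits"},
    {"_id": "002", "start_date": "2022-01-02T00:00:00Z", "Foo": "fruits"},
    {"_id": "011", "start_date": "2022-02-01T00:00:00Z", "Bar": "vegetables"},
    {"_id": "012", "Start_Date": "2022-02-02T00:00:00Z", "Bar": "vegetables"},
]

# list(record) is the ordered list of key names, so a change of case
# (start_date -> Start_Date) counts as a change.
changed_ids = [r["_id"] for r in first_of_each_run(records, key=list)]
print(changed_ids)
```

Because only the previous key is remembered, this works on any iterable, including a database cursor, without loading everything into memory.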

Find a subdocument in array PyMongo

I want to query which comments have been made by any user about the machine learning book between '2020-03-15' and '2020-04-25', ordered from the most recent to the least recent.
Here is my document.
lib_books = db.lib_books
document_book1 = ({
"bookid" : "99051fe9-6a9c-46c2-b949-38ef78858dd0",
"title" : "Machine learning",
"author" : "Tom Michael",
"date_of_first_publication" : "2000-10-02",
"number_of_pages" : 414,
"publisher" : "New York : McGraw-Hill",
"topics" : ["Machine learning", "Computer algorithms"],
"checkout_list" : [
{
"time_checked_out" : "2020-03-20 09:11:22",
"userid" : "ef1234",
"comments" : [
{
"comment1" : "I just finished it and it is worth learning!",
"time_commented" : "2020-04-01 10:35:13"
},
{
"comment2" : "Some cases are a little bit outdated.",
"time_commented" : "2020-03-25 13:19:13"
},
{
"comment3" : "Can't wait to learning it!!!",
"time_commented" : "2020-03-21 08:21:42"
}]
},
{
"time_checked_out" : "2020-03-04 16:18:02",
"userid" : "ab1234",
"comments" : [
{
"comment1" : "The book is a little bit difficult but worth reading.",
"time_commented" : "2020-03-20 12:18:02"
},
{
"comment2" : "It's hard and takes a lot of time to understand",
"time_commented" : "2020-03-15 11:22:42"
},
{
"comment3" : "I just start reading, the principle of model is well explained.",
"time_commented" : "2020-03-05 09:11:42"
}]
}]
})
I tried this code, but it returns nothing.
query_test = lib_books.find({"bookid": "99051fe9-6a9c-46c2-b949-38ef78858dd0", "checkout_list.comments.time_commented" : {"$gte" : "2020-03-20", "$lte" : "2020-04-20"}})
for x in query_test:
    print(x)
Can you try this:
pipeline = [{'$match': {'bookid': "99051fe9-6a9c-46c2-b949-38ef78858dd0"}},  # bookid filter
            {'$unwind': '$checkout_list'},
            {'$unwind': '$checkout_list.comments'},
            {'$match': {'checkout_list.comments.time_commented': {"$gte": "2020-03-20", "$lte": "2020-04-20"}}},
            {'$project': {'_id': 0, 'bookid': 1, 'title': 1, 'comment': '$checkout_list.comments'}},
            # after $project the comment lives under 'comment', so sort on that path
            {'$sort': {'comment.time_commented': -1}}]
query_test = lib_books.aggregate(pipeline)
for x in query_test:
    print(x)
I would recommend that you keep the comment text under a single field name rather than 'comment1', 'comment2', etc. If the field were named 'comment', it could be projected to the root directly.
The aggregation can then be modified as below:
pipeline = [{'$match': {'bookid': "99051fe9-6a9c-46c2-b949-38ef78858dd0"}},  # bookid filter
            {'$unwind': '$checkout_list'},
            {'$unwind': '$checkout_list.comments'},
            {'$match': {'checkout_list.comments.time_commented': {"$gte": "2020-03-20", "$lte": "2020-04-20"}}},
            {'$project': {'_id': 0, 'bookid': 1, 'title': 1, 'comment': '$checkout_list.comments.comment', 'time_commented': '$checkout_list.comments.time_commented'}},
            {'$sort': {'time_commented': -1}}]
The equivalent MongoDB shell query, in case it is required:
db.lib_books.aggregate([
{$match:{'bookid':"99051fe9-6a9c-46c2-b949-38ef78858dd0"}},//bookid filter
{$unwind:'$checkout_list'},
{$unwind:'$checkout_list.comments'},
{$match:{'checkout_list.comments.time_commented':{"$gte" : "2020-03-20", "$lte" : "2020-04-20"}}},
{$project:{_id:0,bookid:1,title:1,comment:'$checkout_list.comments.comment',time_commented:'$checkout_list.comments.time_commented'}},
{$sort:{'time_commented':-1}}
])
If you need to search multiple documents, you can use the $in operator in the first $match stage:
{$match:{'bookid':{$in:["99051fe9-6a9c-46c2-b949-38ef78858dd0","99051fe9-6a9c-46c2-b949-38ef78858dd1"]}}},//bookid filter
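To see what the unwind, match, and sort stages do, the same steps can be emulated on a plain Python dict. This is only a sketch: the function name `comments_between` is made up, the sample is a shortened copy of the document from the question, and the comment text is assumed to live under a single `comment` field as recommended above.

```python
# Shortened copy of the book document, with a single "comment" field name.
book = {
    "bookid": "99051fe9-6a9c-46c2-b949-38ef78858dd0",
    "title": "Machine learning",
    "checkout_list": [
        {"userid": "ef1234", "comments": [
            {"comment": "I just finished it and it is worth learning!",
             "time_commented": "2020-04-01 10:35:13"},
            {"comment": "Can't wait to learning it!!!",
             "time_commented": "2020-03-21 08:21:42"},
        ]},
        {"userid": "ab1234", "comments": [
            {"comment": "The book is a little bit difficult but worth reading.",
             "time_commented": "2020-03-20 12:18:02"},
            {"comment": "I just start reading, the principle of model is well explained.",
             "time_commented": "2020-03-05 09:11:42"},
        ]},
    ],
}


def comments_between(doc, start, end):
    """Client-side emulation of $unwind, $unwind, $match, $project, $sort."""
    rows = [
        {"bookid": doc["bookid"], "title": doc["title"],
         "comment": c["comment"], "time_commented": c["time_commented"]}
        for checkout in doc["checkout_list"]    # first $unwind
        for c in checkout["comments"]           # second $unwind
        if start <= c["time_commented"] <= end  # $match on the range
    ]
    # $sort with -1: most recent first
    return sorted(rows, key=lambda r: r["time_commented"], reverse=True)


result = comments_between(book, "2020-03-20", "2020-04-20")
times = [r["time_commented"] for r in result]
print(times)
```

Note that the timestamps are stored as strings, so the range is compared lexicographically. A bare upper bound such as "2020-04-20" excludes comments made during that day, because "2020-04-20 09:00:00" sorts after "2020-04-20"; use a bound like "2020-04-20 23:59:59" to include them.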

How to check in pymongo that item is not in list field?

So let's say I have records like this:
{ "name" : "Kobe Bryant",
"jersey_numbers" : [8,24]
}
{ "name" : "Michael Jordan",
"jersey_numbers" : [23]
}
How can I find all the records where the field "jersey_numbers" does not include the number 23?
You can use the $nin operator (https://docs.mongodb.com/manual/reference/operator/query/nin/):
players = db['players'].find({ 'jersey_numbers': { "$nin": [ 23 ] } })
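For a single value, `$ne` expresses the same condition as `$nin` with a one-element list: on an array field, both match documents whose array does not contain the value, as well as documents that lack the field. The sketch below shows both filter documents and a client-side predicate with the same meaning; the name `lacks_number` is an assumption for illustration.

```python
# Two filter documents that select the same players when passed to find().
nin_filter = {"jersey_numbers": {"$nin": [23]}}
ne_filter = {"jersey_numbers": {"$ne": 23}}


def lacks_number(doc, number, field="jersey_numbers"):
    """Hypothetical client-side equivalent: True if the array has no such value."""
    return number not in doc.get(field, [])


players = [
    {"name": "Kobe Bryant", "jersey_numbers": [8, 24]},
    {"name": "Michael Jordan", "jersey_numbers": [23]},
]

names = [p["name"] for p in players if lacks_number(p, 23)]
print(names)
```

`$nin` is the better fit when several numbers must all be absent, for example `{"$nin": [23, 45]}`.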

Extracting and updating a dictionary from array of dictionaries in MongoDB

I have a structure like this:
{
"id" : 1,
"user" : "somebody",
"players" : [
{
"name" : "lala",
"surname" : "baba",
"player_place" : "1",
"start_num" : "123",
"results" : {
"1" : { ... }
"2" : { ... },
...
}
},
...
]
}
I am pretty new to MongoDB and I just cannot figure out how to extract results for a specific user (in this case "somebody", but there are many other users and each has an array of players and each player has many results) for a specific player with start_num.
I am using pymongo and this is the code I came up with:
record = collection.find(
{'user' : name}, {'players' : {'$elemMatch' : {'start_num' : start_num}}, '_id' : False}
)
This extracts players with specific player for a given user. That is good, but now I need to get specific result from results, something like this:
{ 'results' : { '2' : { ... } } }.
I tried:
record = collection.find(
{'user' : name}, {'players' : {'$elemMatch' : {'start_num' : start_num}}, 'results' : result_num, '_id' : False}
)
but that, of course, doesn't work. I could just turn that to list in Python and extract what I need, but I would like to do that with query in Mongo.
Also, what would I need to do to replace specific result in results for specific player for specific user? Let's say I have a new result with key 2 and I want to replace existing result that has key 2. Can I do it with same query as for find() (just replacing method find with method replace or find_and_replace)?
You can replace a specific result with an update that uses the positional $ operator. Assuming you want to replace the result with key 1, the PyMongo call should look something like this:
collection.update_one(
    {"user": name, "players.start_num": start_num},
    {"$set": {"players.$.results.1": new_result}}
)
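The result key does not have to be hard-coded. A small helper can build the filter and update documents for any key, and a second one can pull a single result out of a fetched document on the client. Both helper names and the sample values below are made up for illustration; the sketch assumes `start_num` is unique within a user's `players` array.

```python
def build_result_update(name, start_num, result_key, new_result):
    """Hypothetical helper: arguments for collection.update_one(...)."""
    filter_doc = {"user": name, "players.start_num": start_num}
    # "$" refers to the first players element matched by the filter.
    update_doc = {"$set": {f"players.$.results.{result_key}": new_result}}
    return filter_doc, update_doc


def extract_result(doc, start_num, result_key):
    """Hypothetical helper: read one result from an already fetched document."""
    for player in doc.get("players", []):
        if player.get("start_num") == start_num:
            return player.get("results", {}).get(result_key)
    return None


# Made-up sample document in the shape described in the question.
sample = {
    "id": 1,
    "user": "somebody",
    "players": [
        {"name": "lala", "surname": "baba", "start_num": "123",
         "results": {"1": {"time": 10}, "2": {"time": 12}}},
    ],
}

filter_doc, update_doc = build_result_update("somebody", "123", "2", {"time": 11})
current = extract_result(sample, "123", "2")
print(filter_doc, update_doc, current)
```

The pair returned by `build_result_update` can be unpacked into the call, as in `collection.update_one(*build_result_update(...))`.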

MongoDB aggregate compare with previous document

I have this query in Motor:
history = yield self.db.stat.aggregate([
    {'$match': {'user_id': user.get('uid')}},
    {'$sort': {'date_time': -1}},
    {'$project': {'user_id': 1, 'cat_id': 1, 'doc_id': 1, 'date_time': 1}},
    {'$group': {
        '_id': '$user_id',
        'info': {'$push': {'doc': '$doc_id', 'date': '$date_time', 'cat': '$cat_id'}},
        'total': {'$sum': 1}
    }},
    {'$unwind': '$info'},
])
Documents in stat collection look like this:
{
"_id" : ObjectId("5788fa45bc54f428d8e77903"),
"vrr_id" : 2,
"date_time" : ISODate("2016-07-15T14:59:17.411Z"),
"ip" : "10.79.0.230",
"cat_id" : "rsl01",
"vrr_group" : ObjectId("55f6d1b5aaab934a00bae1a4"),
"col" : [
"dledu"
],
"vrr_type" : "TH",
"doc_type" : "local",
"user_id" : "696230",
"page" : null,
"method" : "OpenView",
"branch" : 9,
"sc" : 200,
"doc_id" : "004894802",
"spec" : 0
}
/* 40 */
{
"_id" : ObjectId("5788fa45bc54f428d8e77904"),
"vrr_id" : 2,
"date_time" : ISODate("2016-07-15T14:59:17.500Z"),
"ip" : "10.79.0.230",
"cat_id" : "rsl01",
"vrr_group" : ObjectId("55f6d1b5aaab934a00bae1a4"),
"col" : [
"autoref"
],
"vrr_type" : "TH",
"doc_type" : "open",
"user_id" : "696230",
"page" : null,
"method" : "OpenView",
"branch" : 9,
"sc" : 200,
"doc_id" : "000000002",
"spec" : "07"
}
I want to compare date_time field with date_time from previous document and if they are not equal (or not in timedelta within 5 seconds), include it in result.
Filtering this in Python was easy, is it possible in Mongo? How can I achieve this?
If you include some example documents from the "stat" collection I can give a more reliable answer. But with the information you've provided, I can guess. Add a stage something like:
{'$group': {'_id': '$info.date', 'info': {'$first': '$info'}}}
That gives you each document in the result list that has a distinct "date" from the previous document.
That said, if all you need is a distinct list of dates, this is simpler and faster:
db.stat.distinct("date_time")
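Neither option above handles the "within 5 seconds" condition. One possible approach, assuming MongoDB 5.0 or later, is to attach the previous document's `date_time` with `$setWindowFields` and `$shift`, then keep only documents whose gap exceeds the threshold. The stages below are an untested sketch, and the field name `prev_date_time` is an assumption. The function `keep_after_gap` is a hypothetical client-side version of the same logic, using the two timestamps from the question plus one made-up later timestamp.

```python
from datetime import datetime, timedelta

# Sketch of server-side stages (assumes MongoDB 5.0+, not run against a server).
window_stages = [
    {"$setWindowFields": {
        "partitionBy": "$user_id",
        "sortBy": {"date_time": 1},
        "output": {"prev_date_time": {
            "$shift": {"output": "$date_time", "by": -1, "default": None}}},
    }},
    {"$match": {"$expr": {"$or": [
        {"$eq": ["$prev_date_time", None]},  # first document has no predecessor
        # subtracting two dates yields milliseconds
        {"$gt": [{"$subtract": ["$date_time", "$prev_date_time"]}, 5000]},
    ]}}},
]


def keep_after_gap(docs, min_gap=timedelta(seconds=5)):
    """Keep documents more than min_gap after the previous document."""
    kept = []
    previous = None
    for doc in sorted(docs, key=lambda d: d["date_time"]):
        if previous is None or doc["date_time"] - previous > min_gap:
            kept.append(doc)
        previous = doc["date_time"]
    return kept


docs = [
    {"doc_id": "004894802", "date_time": datetime(2016, 7, 15, 14, 59, 17, 411000)},
    {"doc_id": "000000002", "date_time": datetime(2016, 7, 15, 14, 59, 17, 500000)},
    {"doc_id": "made-up", "date_time": datetime(2016, 7, 15, 15, 0, 0)},
]

kept_ids = [d["doc_id"] for d in keep_after_gap(docs)]
print(kept_ids)
```

Each document is compared with its immediate predecessor, not with the last document that was kept, so a long run of closely spaced documents is dropped entirely after its first member.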
