How to group nested fields in mongodb aggregation - python

My document is a little complicated; it looks like:
{
    '_id': 1,
    'words': {
        'a': {
            'positions': [1, 2, 3],
            'count': 3,
        },
        'and': {
            'positions': [4, 5],
            'count': 2,
        }
    }
}
I have many documents that contain many of the same words, and I want to aggregate all the fields in words, to get something like:
{
    'a': 5,   # sum of a's counts
    'and': 6  # sum of and's counts
}
I read this article, MongoDB - group composite key with nested fields, but unluckily the structure is different: what they do is group an array field, not a nested field.
Any advice? Thanks in advance.

You can try following aggregation:
db.col.aggregate([
    {
        $project: {
            wordsAsKeyValuePairs: {
                $objectToArray: "$words"
            }
        }
    },
    { $unwind: "$wordsAsKeyValuePairs" },
    {
        $group: {
            _id: "$wordsAsKeyValuePairs.k",
            count: { $sum: "$wordsAsKeyValuePairs.v.count" }
        }
    }
])
To aggregate over your fields you need $objectToArray to decompose the object into an array of key-value pairs. Then we just $unwind that array to be able to group by k, which is a single word, and sum all the counts.
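Since the question is tagged python, here is the same pipeline as a minimal PyMongo sketch; the connection string, database, and collection names are assumptions:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
col = client["mydb"]["col"]                        # assumed db/collection names

pipeline = [
    # turn the 'words' sub-document into an array of {k, v} pairs
    {"$project": {"wordsAsKeyValuePairs": {"$objectToArray": "$words"}}},
    # one output document per word
    {"$unwind": "$wordsAsKeyValuePairs"},
    # group by the word itself and sum its counts across all documents
    {"$group": {
        "_id": "$wordsAsKeyValuePairs.k",
        "count": {"$sum": "$wordsAsKeyValuePairs.v.count"},
    }},
]

for doc in col.aggregate(pipeline):
    print(doc["_id"], doc["count"])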


How do I update fields for multiple elements in an array with different values in MongoDB?

I have data of the form:
{
    '_id': asdf123b51234,
    'field2': 0,
    'array': [
        {
            'unique_array_elem_id': id,
            'nested_field': {
                'new_field_i_want_to_add': value
            }
        },
        ...
    ]
}
I have been trying to update like this:
for doc in update_dict:
    collection.find_one_and_update(
        {'_id': doc['_id']},
        {'$set': {
            'array.$[elem].nested_field.new_field_i_want_to_add': doc['new_field_value']
        }},
        array_filters=[{'elem.unique_array_elem_id': doc['unique_array_elem_id']}]
    )
But it is painfully slow. Updating all of my data will take several days running continuously. Is there a way to update this nested field for all array elements for a given document at once?
Thanks a lot
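A hedged sketch, not from the original thread: one common way to speed up per-document loops like this is to batch the same filtered updates with PyMongo's bulk_write, so each round trip to the server carries many operations. The update_dict iterable and field names come from the question; everything else is an assumption:
from pymongo import UpdateOne

# build one UpdateOne per document instead of issuing them one at a time
ops = [
    UpdateOne(
        {'_id': doc['_id']},
        {'$set': {
            'array.$[elem].nested_field.new_field_i_want_to_add': doc['new_field_value']
        }},
        array_filters=[{'elem.unique_array_elem_id': doc['unique_array_elem_id']}],
    )
    for doc in update_dict
]

# ordered=False lets the server keep applying the batch past individual errors
if ops:
    result = collection.bulk_write(ops, ordered=False)
    print(result.modified_count)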

Sorted aggregation with group stage returns useless $first

Data
My collection is un-nested transaction data. Once or twice a year the third party changes its schema. I'm creating what I'm calling a schema-change table listing transaction schemas and their effective_date, and I need to initialize this table from the pre-existing data in the Mongo collection.
The important features in the original data look like this:
[_id, ..., [...], ..., g_uploaded_at, g_unique_id]
([...] are transaction features: Time, Amount, etc. not in the aggregation)
Aggregation with Group stage
Based on another post I have an aggregation pipeline which actually returns a row for each change in schema, with three features in the results:
all_keys, a concatenation of all features/headers
g_unique_id, some record which has these features
g_uploaded_at, which should be the sorted date of the oldest/newest record
Problem
Results of this aggregation are not consistent. Recall, the goal is to define the boundary of the schema change using sorting and grouping.
result = coll.aggregate([
    {'$sort': {'g_uploaded_at': -1}},
    {
        '$project': {
            'data': {'$objectToArray': "$$ROOT"},
            'g_unique_id': 1,
            'g_uploaded_at': 1}
    },
    {'$unwind': "$data"},
    {'$project': {'g_uploaded_at': 1, 'g_unique_id': 1, 'key': "$data.k", '_id': 0}},
    {'$sort': {'key': 1}},
    {
        '$group': {
            '_id': "$g_unique_id",
            'all_keys': {'$push': "$key"},
            'g_uploaded_at': {'$first': "$g_uploaded_at"},
        }
    },
    {
        '$project': {
            'all_keys': 1,
            'g_uploaded_at': 1,
            'all_keys_string': {
                '$reduce': {
                    'input': "$all_keys",
                    'initialValue': "",
                    'in': {'$concat': ["$$value", "$$this"]}
                }
            }
        }
    },
    {
        '$group': {
            '_id': "$all_keys_string",
            'all_keys': {'$first': "$all_keys"},
            'g_unique_id': {'$first': "$_id"},
            'g_uploaded_at': {'$first': "$g_uploaded_at"},
        }
    },
    {'$unset': "_id"},
])
Playground example
If I run this multiple times, the all_keys feature with value BAZ will (eventually; keep trying) cycle through the different ids and dates in the toy data set (11 records).
I'm using $first as described here: A: mongo group query how to keep fields. I'm new to Mongo. Maybe a rookie mistake?
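A hedged guess at the cause, not from the original thread: $group does not guarantee the order of its output documents, so by the time the final $group runs, the earlier $sort on g_uploaded_at no longer applies and $first picks an arbitrary record per schema. Re-sorting immediately before the final $group should pin down which record $first sees. A sketch of the same pipeline with that one extra stage (descending picks the newest record per schema; use 1 for the oldest):
result = coll.aggregate([
    {'$sort': {'g_uploaded_at': -1}},
    {'$project': {'data': {'$objectToArray': "$$ROOT"},
                  'g_unique_id': 1, 'g_uploaded_at': 1}},
    {'$unwind': "$data"},
    {'$project': {'g_uploaded_at': 1, 'g_unique_id': 1, 'key': "$data.k", '_id': 0}},
    {'$sort': {'key': 1}},
    {'$group': {'_id': "$g_unique_id",
                'all_keys': {'$push': "$key"},
                'g_uploaded_at': {'$first': "$g_uploaded_at"}}},
    {'$project': {'all_keys': 1, 'g_uploaded_at': 1,
                  'all_keys_string': {'$reduce': {'input': "$all_keys",
                                                  'initialValue': "",
                                                  'in': {'$concat': ["$$value", "$$this"]}}}}},
    # added: restore date order, since $group output order is undefined
    {'$sort': {'g_uploaded_at': -1}},
    {'$group': {'_id': "$all_keys_string",
                'all_keys': {'$first': "$all_keys"},
                'g_unique_id': {'$first': "$_id"},
                'g_uploaded_at': {'$first': "$g_uploaded_at"}}},
    {'$unset': "_id"},
])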

how to add values using mongodb aggregation

Is there any way to add values via aggregation, like db.insert_one?
x = db.aggregate([{
    "$addFields": {
        "chat_id": -10013345566,
    }
}])
I tried this, but the code returns nothing and the values are not updated. I want to add the values via aggregation, because aggregation is way faster than the alternatives.
sample document:
{"_id": 123, "chat_id": 125}
{"_id": 234, "chat_id": 1325}
{"_id": 1323, "chat_id": 335}
expected output:
an alternative to db.insert_one() in mongodb aggregation
You have to make use of the $merge stage to save the output of the aggregation back to the collection.
Note: Be very careful when you use the $merge stage, as you can accidentally replace entire documents in your collection. Go through the complete documentation of this stage before using it.
db.collection.aggregate([
    {
        "$match": {
            "_id": 123
        }
    },
    {
        "$addFields": {
            "chat_id": -10013345566,
        }
    },
    {
        "$merge": {
            "into": "collection",   // <- Collection Name
            "on": "_id",            // <- Merge operation match key
            "whenMatched": "merge"  // <- Operation to perform when matched
        }
    },
])
Mongo Playground Sample Execution
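Since the question is written against PyMongo, here is the same pipeline as a minimal Python sketch; the connection string, database name, and collection name are assumptions:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
coll = client["mydb"]["collection"]                # assumed db/collection names

coll.aggregate([
    {"$match": {"_id": 123}},
    {"$addFields": {"chat_id": -10013345566}},
    # $merge writes the pipeline output back into the named collection;
    # without it the aggregation only returns a cursor and changes nothing.
    {"$merge": {"into": "collection", "on": "_id", "whenMatched": "merge"}},
])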

Multi-level Python Dict to Pandas DataFrame only processes one level out of many

I'm parsing some XML data, doing some logic on it, and trying to display the results in an HTML table. The dictionary, after filling, looks like this:
{
    "general_info": {
        "name": "xxx",
        "description": "xxx",
        "language": "xxx",
        "prefix": "xxx",
        "version": "xxx"
    },
    "element_count": {
        "folders": 23,
        "conditions": 72,
        "listeners": 1,
        "outputs": 47
    },
    "external_resource_count": {
        "total": 9,
        "extensions": {
            "jar": 8,
            "json": 1
        },
        "paths": {
            "/lib": 9
        }
    },
    "complexity": {
        "over_1_transition": {
            "number": 4,
            "percentage": 30.769
        },
        "over_1_trigger": {
            "number": 2,
            "percentage": 15.385
        },
        "over_1_output": {
            "number": 4,
            "percentage": 30.769
        }
    }
}
Then I'm using pandas to convert the dictionary into a table, like so:
data_frame = pandas.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame()
The result is a table that is mostly correct: [table image omitted]
While the first and second levels render correctly, categories with a sub-sub-category get written as a string in the cell rather than as a further column. I've also tried using stack(level=1), but it raises "IndexError: Too many levels: Index has only 1 level, not 2". I've also tried making it into a series, with no luck. It seems like it only renders "complete" columns. Is there a way of filling up the empty spaces in the dictionary before processing?
How can I get, for example, external_resource_count -> extensions to have two daughter rows jar and json, with an additional column for the values, so that the final table looks like this: [desired table image omitted]
Extra credit if anyone can tell me how to get rid of the first row with the index numbers. Thanks!
The way you load the dataframe is correct, but you should rename the 0 column to some meaningful column name.
import pandas as pd

# this function extracts all the keys from your nested dicts
def explode_and_filter(df, filterdict):
    return [df[col].apply(lambda x: x.get(k) if type(x) == dict else x).rename(f'{k}')
            for col, nested in filterdict.items()
            for k in nested]

data_frame = pd.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame(name='somecol')

# let's separate the rows where a dict is present & explode only those rows
mask = data_frame.somecol.apply(lambda x: type(x) == dict)
expp = explode_and_filter(data_frame[mask],
                          {'somecol': ['jar', 'json', '/lib', 'number', 'percentage']})

# here we concat the exploded series to a frame
exploded_df = (pd.concat(expp, axis=1).stack().to_frame(name='somecol2')
               .reset_index(level=2).rename(columns={'level_2': 'somecol'}))

# and now we concat the rows with dict elements with the rows with non-dict elements
out = pd.concat([data_frame[~mask], exploded_df])
The output dataframe looks like this: [table image omitted]
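A hedged alternative, not part of the original answer: pandas' json_normalize can flatten every level of the dict in one call, producing dotted column names that can then be transposed into a row-per-metric table. The extracted_metrics name comes from the question; the rest is a sketch:
import pandas as pd

# flatten all nesting levels into a single-row frame with dotted column
# names, e.g. 'external_resource_count.extensions.jar'
flat = pd.json_normalize(extracted_metrics, sep='.')

# transpose so each metric becomes a row; name the value column explicitly
table = flat.T.rename(columns={0: 'value'})
print(table)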

Mongodb aggregate query with condition

I have to perform an aggregation on MongoDB in Python and am unable to do so.
Below is the structure of mongodb document extracted:
{
    'Category': 'Male',
    'details': [
        {'name': 'Sachin', 'height': 6},
        {'name': 'Rohit', 'height': 5.6},
        {'name': 'Virat', 'height': 5}
    ]
}
I want to return the height where the name is Sachin using the aggregate function. Basically my idea is to extract data with $match, apply the condition, and aggregate, all at the same time. This can easily be done in 3 steps with if statements, but I'm looking to do it in 1 aggregate call.
Please note: there is no fixed length for the 'details' value.
Let me know if any more explanation is needed.
You can do a $filter to achieve this:
db.collection.aggregate([
    {
        $project: {
            details: {
                $filter: {
                    input: "$details",
                    cond: {
                        $eq: ["$$this.name", "Sachin"]
                    }
                }
            }
        }
    }
])
Working Mongo playground
If you use find instead, you need to be aware of the positional operator:
db.collection.find(
    {"details.name": "Sachin"},
    {"details.$": 1}
)
Working Mongo playground
If you need to get it as an object, you can simply use $arrayElemAt with $ifNull.
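A minimal sketch of that combination (mine, not from the original answer), written for PyMongo: $arrayElemAt takes the first element surviving the filter, and $ifNull supplies a default object when nothing matches. The db handle is assumed:
pipeline = [
    {'$project': {
        'detail': {
            '$ifNull': [
                # first (and only) element surviving the name filter
                {'$arrayElemAt': [
                    {'$filter': {
                        'input': '$details',
                        'cond': {'$eq': ['$$this.name', 'Sachin']},
                    }},
                    0,
                ]},
                # default object when no element matches
                {},
            ]
        }
    }}
]
result = list(db.collection.aggregate(pipeline))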
