Finding particular value from a list by matching in Python - python

I have a question about accessing values in a list.
I have a list of elements
"result":[{"_id": "55b8b9913f32df094c7ba922", "Total": "450"},
{"_id": "55b8a2083f32df1030b9ef16", "Total": "400"}]
Normally we get values from a list by indexing, e.g. list[0], based on the number of list elements.
I would like to know whether I can get a particular value from the list by matching on _id, since with a large database it would be impractical to access values by index with list[].
My actual code is:
id = self.body['_id']
test = yield db.Result.aggregate([
    {'$group': {
        '_id': "$StudentId",
        'Total': {'$max': "$Total"}
    }}
])
list = test.get('result')
print(list)
I would like to get the total of the provided id only.

Use $match first to just get the documents you want.
id = self.body['_id']
test = yield db.Result.aggregate([
    {'$match': {'_id': id}},
    {'$group': {
        '_id': "$StudentId",
        'Total': {'$max': "$Total"}
    }}
])
list = test.get('result')
print(list)
To match multiple values, use $in and declare an array of ids:
listOfIds = [id1, id2, id3]
And change the $match stage in the pipeline to:
{ '$match': { '_id': { '$in': listOfIds } } },
As I said earlier though:
Your "Total" field contains a "string". If you don't change that to be numeric you will get unexpected results, since strings sort differently to numbers, i.e. "8" is greater than "100".
So you really should change that in your data.
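To see why, here is a quick plain-Python illustration of lexicographic vs numeric comparison (the same thing happens server-side when $max compares strings; the values are illustrative):

```python
# "Total" values stored as strings compare lexicographically, not numerically
totals = ["450", "400", "1000"]

print(max(totals))           # "450" — lexicographic, wrong: '4' > '1'
print(max(totals, key=int))  # "1000" — numeric, the true maximum
```

The same mismatch applies to $max in the aggregation pipeline, which is why converting the field to a numeric type in the data is the real fix.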

Related

add the count of doc in list inside python code to a field in elasticsearch

I need to update a field of a doc in Elasticsearch, adding the count of that doc computed from a list inside Python code. The weight field contains the count of the doc in a dataset. The dataset needs to be updated from time to time, so the count of each document must be updated too. hashed_ids is a list of document ids in the new batch of data; the weight of each matched id must be increased by the count of that id in hashed_ids.
I tried the code below but it does not work.
hashed_ids = [hashlib.md5(doc.encode('utf-8')).hexdigest() for doc in shingles]
update_with_query_body = {
    "script": {
        "source": "ctx._source.content_completion.weight +=param.count",
        "lang": "painless",
        "param": {
            "count": hashed_ids.count("ctx.['_id']")
        }
    },
    "query": {
        "ids": {
            "values": hashed_ids
        }
    }
}
For example, say a doc with id=d1b145716ce1b04ea53d1ede9875e05a and weight=5 is already present in the index, and the string d1b145716ce1b04ea53d1ede9875e05a is repeated three times in hashed_ids. The update_by_query shown above will match the doc in the database; I need to add 3 to 5 and get 8 as the final weight.
I'm not familiar with Python, but here is an example-based solution with a few assumptions.
Let's say the following hashed_ids were extracted:
hashed_ids = ["id1", "id1", "id1", "id2"]
To use them in a terms query, we take just the unique list of ids:
hashed_ids_unique = ["id1", "id2"]
Let's assume the doc(s) are indexed with the structure below:
PUT test/_doc/1
{
    "id": "id1",
    "weight": 9
}
Now we can use update by query as below:
POST test/_update_by_query
{
    "query": {
        "terms": {
            "id": ["id1", "id2"]
        }
    },
    "script": {
        "source": "long weightToAdd = params.hashed_ids.stream().filter(idFromList -> ctx._source.id.equals(idFromList)).count(); ctx._source.weight += weightToAdd;",
        "params": {
            "hashed_ids": ["id1", "id1", "id1", "id2"]
        }
    }
}
Explanation for script:
The following gives the count of matching ids in the hashed_ids list for the id of the current matching doc.
long weightToAdd = params.hashed_ids.stream().filter(idFromList -> ctx._source.id.equals(idFromList)).count();
The following adds up the weightToAdd to the existing value of weight in the document.
ctx._source.weight += weightToAdd;
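The per-document counting the script performs can be sanity-checked in plain Python before running the query; `collections.Counter` mirrors what the Painless stream/filter/count does for each matched doc (the ids and weight here are illustrative):

```python
from collections import Counter

hashed_ids = ["id1", "id1", "id1", "id2"]
counts = Counter(hashed_ids)  # occurrences per unique id

# a doc indexed with id "id1" and weight 9 would end up with 9 + 3 = 12
weight = 9
weight += counts["id1"]
print(weight)  # 12
```

This is also a handy way to pre-compute the counts in Python and pass them as script params, instead of shipping the full duplicated list to Elasticsearch.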

How to push an item to an array and get the index of that item in MongoDB, Projection in find_one_and_update pymongo

I'd like to $push an item into an array and determine the index at which it was inserted. How can I do this with Mongo?
I need this to be atomic as multiple pushes can be happening in parallel on the document.
I'm using Python/PyMongo as the driver.
You can store the size of the array along with the array within the document and get that value after the update:
Sample input document: { '_id': 1, 'arr': [ "apple", "orange" ] }
The update operation uses a pipeline for the update (available from MongoDB version 4.2):
from pymongo import ReturnDocument

NEW_ITEM = 'pear'
r = collection.find_one_and_update(
    {'_id': 1},
    [
        {'$set': {
            'ix': {'$size': '$arr'},
            'arr': {'$concatArrays': ['$arr', [NEW_ITEM]]}
        }}
    ],
    projection={'_id': False, 'ix': True},
    return_document=ReturnDocument.AFTER
)
Another way is to set the index of the newly inserted element within the same update operation, using two consecutive $set stages so the second stage sees the appended array (this can be used if array elements are unique):
[
    {'$set': {'arr': {'$concatArrays': ['$arr', [NEW_ITEM]]}}},
    {'$set': {'ix': {'$indexOfArray': ['$arr', NEW_ITEM]}}}
]
Updates to a single document in MongoDB are atomic, so if one update operation is writing to a document, the next update operation has to wait until the first one finishes. You can therefore return the updated document and, in code, get the index of the newly pushed value (since $push appends to the end of the array).
When you use MongoDB's aggregation framework for reads, you can use the $indexOfArray operator in a $project stage to get the index of an element in an array. Projection in the aggregation framework accepts many more operators than projection in .find() or .find_one_and_update(). Getting the index of an element during an update operation may not be possible, so with the query below you can return the complete new array from the updated document and use Python to get the index of the element in the new array.
Sample Doc :
{
    _id: ObjectId("5eb773b8c4ec53c0626b167e"),
    name: "YesMe",
    ids: [1, 2, 3]
}
Query :
from pymongo import ReturnDocument

db.collection.find_one_and_update(
    {'name': 'YesMe'},
    {'$push': {'ids': 4}},
    projection={'ids': True, '_id': False},   # project only the `ids` field
    return_document=ReturnDocument.AFTER      # return the updated doc (i.e. the updated array)
)
Output :
{'ids': [1, 2, 3, 4]}
Ref : Collection.find_one_and_update
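Since $push appends to the end of the array, once the updated document comes back the new item's index can be read off in Python. A small sketch using the sample output shape above:

```python
# `r` stands in for the document returned with return_document=ReturnDocument.AFTER
r = {'ids': [1, 2, 3, 4]}   # sample shape from the output above

ix = len(r['ids']) - 1      # index of the item just pushed
print(ix)                   # 3
print(r['ids'][ix])         # 4 — the newly pushed value
```

This relies on the update and the read happening in one atomic find_one_and_update, as discussed above; a separate read could see other writers' pushes.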

Creating a list from data to include multiple entries then Iterate through list to return one string

What I have currently
I want to create a list from this data set to only include the name values, but have them grouped by each 'issue', so I then can iterate through this list and return one value based on priority.
{
    "issues": [
        {
            "fields": {
                "components": [],
                "customfield_1": null,
                "customfield_2": null
            }
        },
        {
            "fields": {
                "components": [
                    { "name": "Testing" }
                ],
                "customfield_1": null,
                "customfield_2": null
            }
        },
        {
            "key": "3",
            "fields": {
                "components": [
                    { "name": "Documentation" },
                    { "name": "Manufacturing" }
                ],
                "customfield_1": null,
                "customfield_2": null
            }
        }
    ]
}
I want the output to look something like this:
['null', 'Testing', ('Documentation', 'Manufacturing')]
I was able to accomplish this by the following code:
(sorry about the formatting, not sure how to make this look better without having it on one line)
list(
    ('null' if len(item['fields']['components']) == 0
     else item['fields']['components'][0]['name'] if len(item['fields']['components']) == 1
     else (item['fields']['components'][0]['name'],
           item['fields']['components'][1]['name']))
    for item in data['issues']
)
The Problem
Now I need the entry ("Documentation", "Manufacturing") from the above output to be reduced to a single component based on priority.
I think I need to iterate through something like ['Documentation', 'Testing', 'Manufacturing'],
so when it hits, let's say, 'Documentation', it stops and returns only 'Documentation' (this list is specific to priority and is ordered from highest to lowest).
I want the final list to be ['null', 'Testing', 'Documentation']
I do not need the others to be changed, just the entry with multiple values.
How about the code below? I'm basically indexing into the priority list and taking the minimum index (since the beginning of the list is the highest priority). We can switch this to max if the priority ordering ever changes.
Try this:
import json

with open("file.json") as f:
    data = json.load(f)

result = []
priority = ['Documentation', 'Testing', 'Manufacturing']
for issue_dict in data['issues']:
    if len(issue_dict["fields"]["components"]) == 0:
        result.append([('null', 0)])
    else:  # can support an arbitrary number of components
        result.append([
            (name_dict['name'], priority.index(name_dict['name']))
            for name_dict in issue_dict["fields"]["components"]
        ])
print(result)
# [[('null', 0)], [('Testing', 1)], [('Documentation', 0), ('Manufacturing', 2)]]

result = [min(item, key=lambda x: x[1])[0] for item in result]
print(result)
# ['null', 'Testing', 'Documentation']
For the nested lists: if the length is 1, then min simply takes the only choice. For the others, we find the minimum index, i.e. the highest priority.
I've included some print statements strictly for debugging and for you to see if it makes sense. Hope this helps.
So I ended up doing it this way:
I created a function that checks whether any of the priority components are in the given list, and returns (stopping the iteration) as soon as one is found:
def _define_component(multiple_component_list):
    for component in ['Documentation', 'Testing', 'Manufacturing']:
        if component in multiple_component_list:
            return component
    return 'Unknown'
and I call the function in my list comprehension, passing the expression I used when the length > 1 as the argument (same as the original code except after the last 'else'):
list(
    ('Unknown' if len(item['fields']['components']) == 0
     else item['fields']['components'][0]['name'] if len(item['fields']['components']) == 1
     else _define_component(
         [item['fields']['components'][0]['name'],
          item['fields']['components'][1]['name']]))
    for item in data['issues']
)

Updating a value of dictionary in list of dictionaries in a collection using PyMongo

The collection structure that I have is as follows:
defaultdict(
    lambda: {
        '_id': None,
        'stuff': [defaultdict(lambda: 0)]
    })
I am trying to initialise a list of dictionaries that I'll keep updating: if a dictionary's key already exists, I increase its value by 1; otherwise I extend the list with the new key-value pair. E.g. if the value of stuff is [] and I come across a value val, then stuff becomes [{'val' : 1}]; if I then get a value val2, stuff = [{'val' : 1}, {'val2' : 1}]; and if I get val again, stuff should be [{'val' : 2}, {'val2' : 1}].
I tried this:
table.update({'_id': data['_id']},
             {'$inc': {"stuff.$." + keyvalue: 1}})
where, data is a JSON object having an _id and a list of dictionaries stuff. Running this I received an OperationFailure: The positional operator did not find the match needed from the query.
I cannot figure out what to do? I am a Mongo newbie.
Quoting the documentation
When used with update operations, e.g. db.collection.update() and db.collection.findAndModify(),
the positional $ operator acts as a placeholder for the first element that matches the query document, and
the array field must appear as part of the query document.
So the stuff array must appear in your query. Since you are doing a conditional update, you need to check with $exists whether the array already has a subdocument with your keyvalue.
if not table.find_one({"_id": data['_id'],
                       "stuff": {"$elemMatch": {keyvalue: {"$exists": True}}}}):
    table.update({"_id": data['_id']},
                 {"$push": {"stuff": {keyvalue: 1}}})
else:
    table.update({"_id": data['_id'],
                  "stuff": {"$elemMatch": {keyvalue: {"$exists": True}}}},
                 {"$inc": {"stuff.$." + keyvalue: 1}})
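For reference, the intended behaviour from the question can be mirrored in plain Python, which is a handy way to sanity-check the two MongoDB branches above (increment vs. push):

```python
def bump(stuff, keyvalue):
    # increment if a dict with this key already exists, otherwise append a new one
    for d in stuff:
        if keyvalue in d:
            d[keyvalue] += 1
            return
    stuff.append({keyvalue: 1})

stuff = []
bump(stuff, 'val')   # -> [{'val': 1}]
bump(stuff, 'val2')  # -> [{'val': 1}, {'val2': 1}]
bump(stuff, 'val')   # -> [{'val': 2}, {'val2': 1}]
print(stuff)
```

Note that unlike this in-process sketch, the find_one/update pair above is not atomic: two concurrent writers could both take the $push branch, which is a known trade-off of the check-then-update pattern.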

Getting the smaller number inside a set of lists in MongoDB

I'm storing some data in mongoDB and one of the values I'm storing is stored in a list:
{ "_id" : ObjectId("53e69fa04250631b68443a6d"), "uts" : [ 1407623152, 1407623477 ] }
{ "_id" : ObjectId("53e69f684250631b3d9645af"), "uts" : [ 1407622961 ] }
...
How can I get the smallest uts number across all the lists?
This is what the .aggregate() method is for as you are effectively manipulating the returned data:
db.collection.aggregate([
{ "$unwind": "$uts" },
{ "$group": {
"_id": None,
"uts": { "$min": "$uts" }
}}
])
Returning the smallest value from all documents in the collection. Or to just get the smallest value per document, supply the original _id value for the grouping key:
db.collection.aggregate([
{ "$unwind": "$uts" },
{ "$group": {
"_id": "$_id",
"uts": { "$min": "$uts" }
}}
])
The $unwind operator "de-normalizes" the array contents so that each entry effectively becomes a new document with all the other values. The $group operator also does exactly what it says and "groups" documents by a given key. Here the $min operator is used on the "uts" field to find the smallest value.
The SQL to Aggregation Mapping document in the official documentation is a good place to start for an introduction to the concepts.
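For comparison, here are the same two results computed client-side in Python over the sample documents. This is fine for small collections, but the aggregation pipeline keeps the work on the server:

```python
docs = [
    {"uts": [1407623152, 1407623477]},
    {"uts": [1407622961]},
]

# smallest value across all documents (mirrors the _id: None grouping)
overall_min = min(u for d in docs for u in d["uts"])
print(overall_min)  # 1407622961

# smallest value per document (mirrors grouping by the original _id)
per_doc = [min(d["uts"]) for d in docs]
print(per_doc)  # [1407623152, 1407622961]
```

The nested generator expression plays the role of $unwind, flattening each document's array before min is applied.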
