Sorting by multiple params in pyes and elasticsearch

Sorting by multiple params in pyes and elasticsearch - python

I can pass a single sort parameter to the search query in pyes like this:
s = MatchAllQuery()
conn.search(query=Search(s), indexes=["test"], sort='_score')
But I need to pass an extra parameter to sort the docs with the same score, like this:
{
"sort": [
"_score",
{
"extra_param": {
"order": "asc"
}
}
],
"query": {
"term": {
"match_all": {}
}
}
}
How can I do this in pyes?
Thanks

If you'd like the results in the result set with the same score to be ordered by price, append price to the sort string:
s = MatchAllQuery()
conn.search(query=Search(s), indexes=["test"], sort='_score,price')
By default the sort order is ascending. To pass the sort order append :asc or :desc to the sort parameter
s = MatchAllQuery()
conn.search(query=Search(s), indexes=["test"], sort='_score,price:desc')

If you want to do more detailed sorting that what's available via es.search's sort keyword, you can just pass the search dict into the es.Search constructor.
s = Search({'term': {'foo.monkey': 'george'}},
sort=[{'_geo_distance': {'unit': 'mi',
'order': 'desc',
'monkey.location': '81,20'}}])
conn.search(s)

Related

Automatically entering next JSON level using Python in a similar way to JQ in bash

I am trying to use Python to extract pricePerUnit from JSON. There are many entries, and this is just 2 of them -
{
"terms": {
"OnDemand": {
"7Y9ZZ3FXWPC86CZY": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "7Y9ZZ3FXWPC86CZY",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7",
"description": "Processed translation request in AWS GovCloud (US)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
},
"CQNY8UFVUNQQYYV4": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF": {
"offerTermCode": "JRTCKXETXF",
"sku": "CQNY8UFVUNQQYYV4",
"effectiveDate": "2020-11-01T00:00:00Z",
"priceDimensions": {
"CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7": {
"rateCode": "CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7",
"description": "$0.000015 per Character for TextTranslationJob:TextTranslationJob in EU (London)",
"beginRange": "0",
"endRange": "Inf",
"unit": "Character",
"pricePerUnit": {
"USD": "0.0000150000"
},
"appliesTo": []
}
},
"termAttributes": {}
}
}
}
}
}
The issue I run into is that the keys, which in this sample, are 7Y9ZZ3FXWPC86CZY, CQNY8UFVUNQQYYV4.JRTCKXETXF, and CQNY8UFVUNQQYYV4.JRTCKXETXF.6YS6EN2CT7 are a changing string that I cannot just type out as I am parsing the dictionary.
I have python code that works for the first level of these random keys -
with open('index.json') as json_file:
data = json.load(json_file)
json_keys=list(data['terms']['OnDemand'].keys())
#Get the region
for i in json_keys:
print((data['terms']['OnDemand'][i]))
However, this is tedious, as I would need to run the same code three times to get the other keys like 7Y9ZZ3FXWPC86CZY.JRTCKXETXF and 7Y9ZZ3FXWPC86CZY.JRTCKXETXF.6YS6EN2CT7, since the string changes with each JSON entry.
Is there a way that I can just tell python to automatically enter the next level of the JSON object, without having to parse all keys, save them, and then iterate through them? Using JQ in bash I can do this quite easily with jq -r '.terms[][][]'.

If you are really sure, that there is exactly one key-value pair on each level, you can try the following:
def descend(x, depth):
for i in range(depth):
x = next(iter(x.values()))
return x

You can use dict.values() to iterate over the values of a dict. You can also use next(iter(dict.values())) to get a first (only) element of a dict.
for demand in data['terms']['OnDemand'].values():
next_level = next(iter(demand.values()))
print(next_level)
If you expect other number of children than 1 in the second level, you can just nest the fors:
for demand in data['terms']['OnDemand'].values():
for sub_demand in demand.values()
print(sub_demand)
If you are insterested in the keys too, you can use dict.items() method to iterate over dict keys and values at the same time:
for demand_key, demand in data['terms']['OnDemand'].items():
for sub_demand_key, sub_demand in demand.items()
print(demand_key, sub_demand_key, sub_demand)

Pymongo include only the fields which are starting with a name

For example, if this is my record
{
"_id":"123",
"name":"google",
"ip_1":"10.0.0.1",
"ip_2":"10.0.0.2",
"ip_3":"10.0.1",
"ip_4":"10.0.1",
"description":""}
I want to get only those fields starting with 'ip_'. Consider I have 500 fields & only 15 of them start with 'ip_'
Can we do something like this to get the output -
db.collection.find({id:"123"}, {'ip*':1})
Output -
{
"ip_1":"10.0.0.1",
"ip_2":"10.0.0.2",
"ip_3":"10.0.1",
"ip_4":"10.0.1"
}

The following aggregate query, using PyMongo, returns documents with the field names starting with "ip_".
Note the various aggregation operators used: $filter, $regexMatch, $objectToArray, $arrayToObject. The aggregation pipeline the two stages $project and $replaceWith.
pipeline = [
{
"$project": {
"ipFields": {
"$filter" : {
"input": { "$objectToArray": "$$ROOT" },
"cond": { "$regexMatch": { "input": "$$this.k" , "regex": "^ip" } }
}
}
}
},
{
"$replaceWith": { "$arrayToObject": "$ipFields" }
}
]
pprint.pprint(list(collection.aggregate(pipeline)))

I am unaware of a way to specify an expression that would decide which hash keys would be projected. MongoDB has projection operators but they deal with arrays and text search.
If you have a fixed possible set of ip fields, you can simply request all of them regardless of which fields are present in a particular document, e.g. project with
{ip_1: true, ip_2: true, ...}

elasticsearch query - two criteria

I have a list of JSON files in elasticsearch.
I have a list of strings, matching which I want to use as the criteria for a search.
Where, matching = ["223232_ds","dnjsnsd_22","2ee2i33","mkddsj2220","23e3efdjn"
I now need to find those records in elasticsearch where two keys contain values in this list, matching.
Without elasticsearch and simply loading the JSON as a python object I can do this like:
results= []
for record in JSON_list:
if record['key_1'] in matching and record['key_2'] in matching:
results.append(record)
Where the JSON_list looks like this:
[{'key_1' : "blahaksds",
'key_2' : "njasdnjkns"},
{'key_1' : "bladfgfdf",
'key_2' : "njasdsfsdrr"}]
How do I search for multiple criteria in es? Previously, I've used this setup to search for a record_id directly.
es = elasticsearch.Elasticsearch()
name = "so_sample"
# Formulate query
query = str("_id:"+'"'+ record_id +'"')
# Query
result = es.search(name,q=query)

You can use a bool query with two terms queries in the must clause, like this:
{
"query": {
"bool": {
"must": [
{
"terms": {
"key_1": ["223232_ds","dnjsnsd_22","2ee2i33","mkddsj2220","23e3efdjn"]
}
},
{
"terms": {
"key_2": ["223232_ds","dnjsnsd_22","2ee2i33","mkddsj2220","23e3efdjn"]
}
}
]
}
}
}

MongoDB count distinct items in an array

My actors collection contains an array-of-documents field, called acted_in. Instead of returning the size of acted_in.idmovies like so: {$size: $acted_in.idmovies}, I want to return the number of distinct values inside $acted_in.idmovies. How can I do that ?
c1 = actors.aggregate([{"$match": {'$and': [{'fname': f_name},
{'lname': l_name}]}},
{"$project": {'first_name': '$fname',
'last_name': '$lname',
'gender': '$gender',
'distinct_movies_played_in': {'$size': '$acted_in.idmovies'}}}])

You basically need to include $setDifference in there to obtain the "distinct" items. All "sets" are "distinct" by design and by obtaining the "difference" from the present array to an empty one [] you get the desired result. Then you can apply the $size.
You also have some common mistakes/misconceptions. Firstly when using $match or any MongoDB query expression you do not need to use $and unless there is an explicit case to do so. All query expression arguments are "already" AND conditions unless explicitly stated otherwise, as with $or. So don't explicitly use for this case.
Secondly your $project was using the explicit field path variables for every field. You do not need to do that just to return the field, and outside of usage in an "expression", you can simply use a 1 to notate you want it included:
c1 = actors.aggregate([
{ "$match": { "fname"': f_name, "lname": l_name } },
{ "$project": {
"first_name": 1,
"last_name": 1,
"gender": 1,
"distinct_movies_played_in": {
"$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
}
}}
])
In fact, if you are actually using MongoDB 3.4 or greater ( and your notation of an element within an array "$acted_in.idmovies" says you have at least MongoDB 3.2 ) which has support for $addFields then use that instead of specifying all other fields in the document.
c1 = actors.aggregate([
{ "$match": { "fname"': f_name, "lname": l_name } },
{ "$addFields": {
"distinct_movies_played_in": {
"$size": { "$setDifference": [ "$acted_in.idmovies", [] ] }
}
}}
])
Unless you explicitly need to just specify "some" other fields.
The basic case here is do not use $unwind for array operations unless you specifically need to perform a $group operation on with it's _id key pointing at a value obtained from "within" the array.
In all other cases, MongoDB has far more efficient operators for working with arrays that what $unwind does.

This should give you what you want:
actors.aggregate([
{
$match: {fname: f_name, lname: l_name}
},
{
$unwind: '$tags'
},
{
$group: {
_id: '$_id',
first_name: {$first: '$fname'},
last_name: {$last: '$lname'},
gender: {$first: '$gender'},
tags: {$addToSet: '$tags'}
}
},
{
$project: {
first_name: 1,
last_name: 1,
gender: 1,
distinct: {$size: '$tags'}
}
}
])
After the tags array is deconstructed and then put back into a set of itself, then you just need to get the number of items or length of that set.

get filtered embeded elements from all class on mongoengine

I got two class on Mongoengine:
class UserPoints(EmbeddedDocument):
user = ReferenceField(User, verbose_name='user')
points = IntField(verbose_name='points', required=True)
def __unicode__(self):
return self.points
And
class Local(Document):
token = StringField(max_length=250,verbose_name='token_identifier',unique=True)
points = ListField(EmbeddedDocumentField(UserPoints),required=False)
def __unicode__(self):
return self.name
If i do something like: "LP = Local.objects.filter(points__user=user)" I got all the locals with userpoints from my user. But i Want all the UserPoints from a User. How can i?
I try also: "lUs = UserPoints.objects.filter(user=user)" but i got an empty Array.
PD: I do something like this to solve the problem, but it's not efficient.
LDPoints = []
LP = Local.objects.filter(points__user=user)
print 'List P: '+str(len(LP))
for local in LP:
for points in local.points:
if points.user == user:
dPoints = parsePoints(points)
lDPoints.append(dPoints)

Adding to the original and getting venerable answer is that the aggregation framework has $filter now for some time, which is a lot cleaner that the $map and $setDifference method used in the original answer.
Local._get_collection().aggregate([
{ "$match": { "points.user": user } },
{ "$project": {
"token": 1,
"points": {
"$filter": {
"input": "$points",
"as": "el",
"cond": { "$eq": [ "$$el.user", user ] }
}
}
}}
])
The same principles apply though for obtaining "multiple" matches from an array in the collection you use the aggregate() method of the underlying driver, as called from _get_collection().
Original
The answer to avoid "filtering" your embedded documents for the selected "user" only is to use the aggregation framework. This allows you to manipulate the "array content" on the database server rather than filtering the results in your client code.
Aggregation is done with the raw pymongo driver methods, but since Mongoengine is built on top of this driver you access the raw collection object from your class with the ._get_collection() method:
Local._get_collection().aggregate([
# Match the documents that have the required user
{ "$match": {
"points.user": user
}},
# unwind the embedded array to de-normalize
{ "$unwind": "$points" },
# Matching now filters the elements
{ "$match": {
"points.user": user
}},
# Group back as an array
{ "$group": {
"_id": "$_id",
"token": { "$first": "$token" },
"points": { "$push": "$points" }
}}
])
If you have MongoDB 2.6 or greater on your server and your "user/points" combination is always unique you can alternately filter without the $unwind|$match|$group cycle using the $map and $setDifference operators available there:
Local._get_collection().aggregate([
# Match the documents that have the required user
{ "$match": {
"points.user": user
}},
# Filter the array in place
{ "$project": {
"token": 1,
"points": {
"$setDifference": [
{
"$map": {
"input": "$points",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.user", user ] },
"$$el",
false
]
}
}
},
[false]
]
}
}}
])
In the second case there the $cond is a ternary operator which takes a logical expression as it's first argument and the values to return when that expression is either true or false as it's other arguments. Inside the $map, each element is tested to see if the condition is true, in this case "is the user field equal to the selected user".
Either the content of that array position is returned or otherwise false. The $setDifference takes the resulting array and "filters" the false values out, so only the matching elements are returned.
In the legacy approach, the $unwind pipeline operator is used to effectively turn each array element into it's own document with all other parent properties. This allows you to apply the same $match condition, which unlike the initial query actually removes the documents which now as single elements no longer match your condition. You always want the first stage as there is no point processing this $unwind|$match combination on all of the documents that might not contain your matching condition.
The $group stage brings everything back into line per document. Using the $first option to return all other fields that were essentially duplicated by the $unwind and the $push operator to rebuild the array with the matching elements.
So while there no "built-in" methods to MongoEngine to do this sort of query, you can do this the MongoDB way by accessing the raw driver.
Also note that if you only expected one element to match in any array for your given "user" or other query, then you could alternately use the field projection form available to the raw driver as well. But the aggregation method is required for any more than one matching element of the array.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Sorting by multiple params in pyes and elasticsearch - python

Related

Automatically entering next JSON level using Python in a similar way to JQ in bash

Pymongo include only the fields which are starting with a name

elasticsearch query - two criteria

MongoDB count distinct items in an array

get filtered embeded elements from all class on mongoengine

Categories

Resources