Querying nested objects in Elasticsearch - python

I have a Product-Merchant mapping which looks like the following
catalog_map = {
"catalog": {
"properties": {
"merchant_id": {
"type": "string",
},
"products": {
"type": "object",
},
"merchant_name" :{
"type" : "string"
}
}
}
}
Each entry in "product" is an object with fields such as product_id, product_name, and product_price. Products and merchants are mapped as follows:
for merchant in Merchant.objects.all():
    products = [
        {"product_name": x.product.name, "product_price": x.price,
         "product_id": x.product.id, "product_category": x.product.category.name}
        for x in MerchantProductMapping.objects.filter(merchant=merchant)
    ]
    tab = {
        'merchant_id': merchant.id,
        'merchant_name': merchant.name,
        'product': products
    }
    res = es.index(index="my-index", doc_type='catalog', body=tab)
The data gets indexed smoothly, in the desired form. Now, when I query the data from the index, I do it in the following way:
GET /esearch-index/catalog/_search
{
"query": {
"bool" :{
"must": [
{"match": {
"merchant_name": {
"query": "Sir John"
}
}}],
"should": [
{"match": {
"product_name": {
"query": "Vanilla"
}
}}
]
}}
}
This query returns all the products in the index for the merchant named "Sir John". However, I want it to return only the details of the product "Vanilla" sold by "Sir John".
On someone's recommendation, I used "_source" while querying, but that doesn't help.
How can I single out the information of one object from the merchant's entire "catalog" document?

When a bool query has a must clause, every condition inside it is required. The conditions inside the should clause are then optional; they only boost the score of the documents that match them. (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html#query-dsl-bool-query)
So, going back to your query, it will retrieve all catalogs matching merchant_name "Sir John". This is the only required (must) condition. The name "Vanilla" will only boost results with the name "Vanilla" to the top, because it is not required.
If you want to retrieve "Vanilla" sold by "Sir John", put both conditions inside of the must clause and change your query to this:
{
"query": {
"bool": {
"must": [
{
"match": {
"merchant_name": {
"query": "Sir John"
}
}
},
{
"match": {
"product_name": {
"query": "Vanilla"
}
}
}
]
}
}
}
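A minimal Python sketch of the query above, built as a dict so it can be passed as the body of a search call. It assumes the product fields live under the "product" key used in the indexing loop, so the full field path would be "product.product_name"; adjust it if your documents differ. Keep in mind that Elasticsearch returns whole documents, so a matching merchant still comes back with all of its products. The second helper sketches one way to single out the matching product, assuming "product" is remapped with type "nested" (it is "object" in the mapping shown) and using inner_hits. Both function names are made up for illustration.

```python
def both_required_query(merchant_name, product_name):
    """Bool query where merchant name and product name must both match."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"merchant_name": {"query": merchant_name}}},
                    # Assumption: product fields are stored under "product".
                    {"match": {"product.product_name": {"query": product_name}}},
                ]
            }
        }
    }


def nested_product_query(merchant_name, product_name):
    """Variant that also asks for the matching products via inner_hits.

    Assumption: "product" is mapped with "type": "nested", which is not
    the case in the mapping from the question.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"merchant_name": {"query": merchant_name}}},
                    {
                        "nested": {
                            "path": "product",
                            "query": {
                                "match": {"product.product_name": product_name}
                            },
                            # Matching products are listed per hit under "inner_hits".
                            "inner_hits": {},
                        }
                    },
                ]
            }
        }
    }
```

Either dict could then be sent with something like es.search(index="my-index", body=both_required_query("Sir John", "Vanilla")).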

Related

How to filter ElasticSearch results without having it affect the document score?

I am trying to filter my results on the "publication_year" field without affecting the score of the document. However, whether I add the "range" clause to the query or to "filter", it seems to affect the score: documents whose "publication_year" is closer to "lte", the upper limit of the range, are scored higher.
My query:
query = {
'bool': {
'should': [
{
'match_phrase': {
"title": keywords
}
},
{
'match_phrase': {
"abstract": keywords
}
},
]
}
}
if publication_year_constraint:
    range_query = {"range": {"publication_year": {"gte": publication_year_constraint, "lte": datetime.datetime.today().year}}}
    query["bool"]["filter"] = [range_query]
I tried putting the "range" inside the "should" block as well, with similar results.
Try using filter context.
In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated.
Example:
{
"query": {
"bool": {
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
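Applied to the query from the question, this could look like the sketch below. The helper name build_query and its parameters are made up for illustration. One thing to watch: once a bool query has a filter clause, its should clauses become optional by default, so the sketch sets minimum_should_match to 1 on the assumption that at least one phrase match is wanted.

```python
import datetime


def build_query(keywords, publication_year_constraint=None, current_year=None):
    """Score on phrase matches only; restrict by year in filter context."""
    bool_query = {
        "should": [
            {"match_phrase": {"title": keywords}},
            {"match_phrase": {"abstract": keywords}},
        ],
        # Assumption: at least one of the phrase clauses has to match.
        "minimum_should_match": 1,
    }
    if publication_year_constraint:
        if current_year is None:
            current_year = datetime.datetime.today().year
        # Filter clauses are yes/no checks and add nothing to _score.
        bool_query["filter"] = [
            {
                "range": {
                    "publication_year": {
                        "gte": publication_year_constraint,
                        "lte": current_year,
                    }
                }
            }
        ]
    return {"query": {"bool": bool_query}}
```

If scores still differ between documents after this change, the difference comes from the match_phrase clauses, not from the range.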

Get field value in MongoDB without parent object name

I'm trying to find a way to retrieve some data from MongoDB through Python scripts,
but I got stuck on the following situation:
I have to retrieve some data, check a field value, and compare it with other data (MongoDB documents).
But the object's name may vary from module to module, see below:
Document 1
{
"_id": "001",
"promotion": {
"Avocado": {
"id": "01",
"timestamp": "202005181407",
},
"Banana": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "11"
}
}
Document 2
{
"_id": "002",
"promotion": {
"Grape": {
"id": "02",
"timestamp": "202005181407",
},
"Dragonfruit": {
"id": "02",
"timestamp": "202005181407",
}
},
"product" : {
"id" : "15"
}
}
I'll always have an object called promotion, but the child's name may vary; sometimes it's an ordered number, sometimes it is not. The field whose value I need is the id inside each child of promotion; it will always have the same name.
So if the document matches the criteria, I'll retrieve it with Python and get the rest of the work done.
PS: I'm not the one responsible for this kind of document structure.
I've already tried these operators from the docs, but couldn't get them to work the way I need:
$all
$elemMatch
Try this aggregation pipeline from Python:
[
{
'$addFields': {
'fruits': {
'$objectToArray': '$promotion'
}
}
}, {
'$addFields': {
'FruitIds': '$fruits.v.id'
}
}, {
'$project': {
'_id': 0,
'FruitIds': 1
}
}
]
Output produced:
{FruitIds:["01","02"]},
{FruitIds:["02","02"]}
Is this the desired output?
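For comparison, if the documents have already been fetched into Python dicts, the same ids can be collected without knowing the child names. This is a plain-Python sketch, not part of the pipeline above; promotion_ids is a hypothetical helper name, and the sample documents are the two from the question.

```python
def promotion_ids(document):
    """Return the id of every child object under "promotion"."""
    promotion = document.get("promotion", {})
    # The child keys (fruit names, numbers, ...) are ignored on purpose.
    return [child["id"] for child in promotion.values()]


doc1 = {
    "_id": "001",
    "promotion": {
        "Avocado": {"id": "01", "timestamp": "202005181407"},
        "Banana": {"id": "02", "timestamp": "202005181407"},
    },
    "product": {"id": "11"},
}
doc2 = {
    "_id": "002",
    "promotion": {
        "Grape": {"id": "02", "timestamp": "202005181407"},
        "Dragonfruit": {"id": "02", "timestamp": "202005181407"},
    },
    "product": {"id": "15"},
}

print(promotion_ids(doc1))  # ['01', '02']
print(promotion_ids(doc2))  # ['02', '02']
```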

How to Search in multiple OR conditions in Elastic search in Python

I have to search for all items in an array, along with a static detail, in Elasticsearch.
Fields in the Elasticsearch index: tech_id, detail, volume
tech_ids = ['qwe1', 'qwe2', 'qwe3', 'qwe4', 'qwe5', 'qwe6', 'qwe7']
The number of tech_ids in the array can differ.
The search has to match a combination of tech_id and detail, where tech_id varies while detail stays static. The tech_ids are combined with OR. In the end, I expect the search to return results that match any of the provided tech_ids together with the static detail.
tech_ids = ['qwe1', 'qwe2', 'qwe3', 'qwe4', 'qwe5', 'qwe6', 'qwe7']
"query": {
"bool": {
"must": [
{
"match": {
"detail": "calci"
}
},
{
"match_phrase": {
"tech_id": tech_ids[0]
}
}]
}
}
What you're after, I think, is a bool-should within a bool-must:
{
"query": {
"bool": {
"must": [
{
"match": {
"detail": "calci"
}
},
{
"bool": {
"should":
[{
"match_phrase": { "tech_id": tid }
} for tid in tech_ids]
}
}
]
}
}
}
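Wrapped in a function, the same structure can be built for any number of tech_ids. This is a sketch; build_tech_query is a hypothetical name. Because the inner bool contains only should clauses, at least one of them has to match, which gives the OR behaviour.

```python
def build_tech_query(tech_ids, detail):
    """detail must match AND at least one of the tech_ids must match."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"detail": detail}},
                    {
                        "bool": {
                            # One should clause per id in the list.
                            "should": [
                                {"match_phrase": {"tech_id": tid}}
                                for tid in tech_ids
                            ]
                        }
                    },
                ]
            }
        }
    }


tech_ids = ['qwe1', 'qwe2', 'qwe3']
query = build_tech_query(tech_ids, "calci")
```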

Multiple queries in one ElasticSearch Query

Here an example of an item indexed in ES :
{
"_id" : ..,
"class": "A",
"name": "item1"
}
I want a single query that returns all items of the same class as the item named "item1". So basically, I want all indexed items with class A, knowing only the name.
I can do it with two queries:
Query 1 :
SEARCH
{
"query": {
"query_string": {
"default_field": "name",
"query": "item1"
}
}
}
Then from this I get the class and I write this query:
SEARCH
{
"query": {
"query_string": {
"default_field": "class",
"query": "A"
}
}
}
Any idea? I know there's an easy way but I can't find it...
You can combine multiple queries as clauses of a bool query. In this case, two criteria must be satisfied, so both queries should be must clauses:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "item1",
"fields": [
"name"
]
}
},
{
"query_string": {
"query": "A",
"fields": [
"class"
]
}
}
]
}
}
}
If you don't need relevancy scores, which it doesn't appear that you do in this case, both queries could be filter clauses instead of must clauses.
If name and class are mapped as keyword datatypes, you may want to use a term-level query as opposed to a full-text query like the query_string query. Here's what that would look like, using filter clauses:
{
"query": {
"bool": {
"filter": [
{
"term": {
"name": {
"value": "item1"
}
}
},
{
"term": {
"class": {
"value": "A"
}
}
}
]
}
}
}
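Note that the queries above assume the class is already known. If only the name is known, the two-step lookup from the question can be wrapped in one Python function. This is a sketch under assumptions: es is an elasticsearch-py style client whose search method accepts index and body, the index name is supplied by the caller, name and class are keyword fields, and the response has the usual hits.hits[]._source layout.

```python
def items_in_same_class(es, index, name):
    """Look up the class of the item called `name`, then fetch that class."""
    first = es.search(index=index, body={
        "query": {"bool": {"filter": [{"term": {"name": {"value": name}}}]}}
    })
    hits = first["hits"]["hits"]
    if not hits:
        # No item with that name, so there is no class to search for.
        return []
    item_class = hits[0]["_source"]["class"]
    second = es.search(index=index, body={
        "query": {"bool": {"filter": [{"term": {"class": {"value": item_class}}}]}}
    })
    return [hit["_source"] for hit in second["hits"]["hits"]]
```

This still issues two requests; it only hides them behind one call.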

Elastic Search: including #/hashtags in search results

Using Elasticsearch's query DSL, this is how I am currently constructing my query:
elastic_sort = [
{ "timestamp": {"order": "desc" }},
"_score",
{ "name": { "order": "desc" }},
{ "channel": { "order": "desc" }},
]
elastic_query = {
"fuzzy_like_this" : {
"fields" : [ "msgs.channel", "msgs.msg", "msgs.name" ],
"like_text" : search_string,
"max_query_terms" : 10,
"fuzziness": 0.7,
}
}
res = self.es.search(index="chat", body={
"from" : from_result, "size" : results_per_page,
"track_scores": True,
"query": elastic_query,
"sort": elastic_sort,
})
I've been trying to implement a filter or an analyzer that will allow the inclusion of "#" in searches (I want a search for "#thing" to return results that include "#thing"), but I am coming up short. The error messages I am getting are not helpful; they just tell me that my query is malformed.
I attempted to incorporate the method found here : http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html but it doesn't make any sense to me in context.
Does anyone have a clue how I can do this?
Did you create a mapping for your index? You can specify within your mapping that certain fields should not be analyzed.
For example, a tweet mapping can be something like:
"tweet": {
"properties": {
"id": {
"type": "long"
},
"msg": {
"type": "string"
},
"hashtags": {
"type": "string",
"index": "not_analyzed"
}
}
}
You can then perform a term query on "hashtags" for an exact string match, including "#" character.
If you want "hashtags" to be tokenized as well, you can always create a multi-field for "hashtags".
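A sketch of how this could be used from Python, assuming the "hashtags" field from the mapping above is filled in at index time. The regex-based extract_hashtags helper and both function names are illustrative assumptions, not part of Elasticsearch.

```python
import re


def extract_hashtags(msg):
    """Collect "#word" tokens so they can be stored in "hashtags"."""
    return re.findall(r"#\w+", msg)


def hashtag_query(tag):
    """Exact-match term query against the non-analyzed "hashtags" field."""
    return {"query": {"term": {"hashtags": tag}}}


msg = "trying out #thing today"
# Document body to index: the raw message plus its extracted hashtags.
doc = {"msg": msg, "hashtags": extract_hashtags(msg)}
query = hashtag_query("#thing")
```

Since the field is not analyzed, the match is exact and case-sensitive: "#Thing" would not match "#thing" unless the tags are normalized before indexing and searching.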
