I'm trying to get the documents that surround a match in pyMongo. I would search for a string and get the matches plus the entries around each match (using the '_index' field as the ordering), so the user has some context on the result.
I'm trying to do it using $setWindowFields with no success, as I'm getting no results. Probably I'm using the wrong syntax? This is the aggregation I'm trying:
show_near = [
    {'$setWindowFields': {
        'partitionBy': None,
        'sortBy': {'_index': 1},
        'output': {
            'nearIds': {
                '$addToSet': '$_id',
                'window': {'documents': [-2, 2]}
            }
        }
    }},
    {'$match': {field: {'$regex': f'({s})'}}},
    {'$lookup': {
        'from': 'collection',
        'localField': 'nearIds',
        'foreignField': '_id',
        'as': 'nearDocs'
    }},
    {'$unwind': '$nearDocs'},
    {'$replaceRoot': {'newRoot': '$nearDocs'}}
]
cursor = self.collection.aggregate(show_near)
Where 's' is the string I want to match and '_index' is the order of the entries.
Any idea? Maybe there is another method to do this? This feature looks perfect for what I want, but maybe I'm mistaken and there is another way. I've tried going back and forth with $gte and $lte, but it is not feasible once results start to pile up.
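For context, a minimal sketch of that $gte/$lte approach (not the exact code; it assumes '_index' is a contiguous integer order and reuses the 'field' and 's' variables from above):

# Sketch of the $gte/$lte approach: fetch each match, then pull the two
# documents before and after it by '_index' to give some context.
for match in self.collection.find({field: {'$regex': f'({s})'}}):
    idx = match['_index']
    context = self.collection.find(
        {'_index': {'$gte': idx - 2, '$lte': idx + 2}}
    ).sort('_index', 1)
    for doc in context:
        print(doc)

This means one extra query per match, which is why it stops scaling once there are many matches.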
Thanks!
Must include all words and keep word order in Elasticsearch search results
How can I do that?
I mean, I have this Python query:
result = es.search(index="main_database", body={"query": {"match": {'Full_texts':es_query}}}, size=50)
The problem is: if at least one of the words is in 'Full_texts', it gives me results.
I want all the words to be included, and the word order to be kept.
For example: the top search results should fulfill all the requirements above, while the mid and bottom results can stay the same as the default Elasticsearch results.
You need to use an intervals query or span_near, with the ordered or in_order parameter respectively.
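For example, with an intervals query it could look roughly like this (a sketch only, assuming Elasticsearch 7.x or newer since intervals was added in 7.0, and reusing the 'es' client, 'main_database' index, 'Full_texts' field and 'es_query' string from the question):

body = {
    "query": {
        "intervals": {
            "Full_texts": {
                "match": {
                    "query": es_query,
                    "ordered": True,   # all words must appear in this order
                    "max_gaps": -1     # allow other words in between; set 0 to forbid gaps
                }
            }
        }
    }
}
result = es.search(index="main_database", body=body, size=50)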
Try using the "and" operator for your Elasticsearch match query.
The search query will be:
{
  "query": {
    "match": {
      "Full_texts": {
        "query": "<your_query_input_goes_here>",
        "operator": "and"
      }
    }
  }
}
However, the word order is not guaranteed. If you strictly need the words to follow the order, please also refer to the answer by #ExploZe.
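For completeness, the same body can be passed from the Python code in the question roughly like this (a sketch, reusing the 'es' client and 'es_query' string defined there):

body = {
    "query": {
        "match": {
            "Full_texts": {
                "query": es_query,
                "operator": "and"  # every word must be present
            }
        }
    }
}
result = es.search(index="main_database", body=body, size=50)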
Is there a way I can build an aggregation pipeline stage to do a $regexMatch on an object field nested within another object? I typically see examples using the $addFields aggregation pipeline stage, so I've been trying to do a match with this stage. I'm trying to do a search on the word "word" within the string_text field of each object.
Note: This needs to be done in the domain file when building the aggregation pipeline.
Here is an example of the data I'm trying to do a $regexMatch on, in this case on the string_text fields:
{
  array_of_data: [
    {
      textObjects: {
        text1: {
          string_text: "Matching word here"
        },
        text2: {
          string_text: "Looking for the matching word which is here"
        },
        text3: {
          string_text: "Won't find it here"
        }
      }
    }
  ]
}
I've been trying to solve this issue with a $filter operation, but I still don't get the results I'm looking for: it returns every document, not just the ones with the word "word" in a string_text field.
Any help would be appreciated, I've been struggling with this for a few days now :(
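In case it helps, here is a rough sketch (not your domain file) of one way such a stage could be written with $filter plus $regexMatch. It assumes the structure shown above, i.e. each element of array_of_data holds a textObjects object whose values each contain a string_text field, and it assumes MongoDB 4.2+ for $regexMatch; names like matching_data and collection are placeholders:

pipeline = [
    {'$addFields': {
        'matching_data': {
            '$filter': {
                'input': '$array_of_data',
                'as': 'elem',
                'cond': {
                    # turn textObjects into an array of {k, v} pairs and keep the
                    # element if any of its string_text values matches "word"
                    '$anyElementTrue': [{
                        '$map': {
                            'input': {'$objectToArray': '$$elem.textObjects'},
                            'as': 't',
                            'in': {
                                '$regexMatch': {
                                    'input': '$$t.v.string_text',
                                    'regex': 'word',
                                    'options': 'i'
                                }
                            }
                        }
                    }]
                }
            }
        }
    }},
    # keep only documents where at least one element survived the filter
    {'$match': {'$expr': {'$gt': [{'$size': '$matching_data'}, 0]}}}
]
cursor = collection.aggregate(pipeline)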
I have a set of news articles that I'm trying to index. Sometimes I get the same article with a tiny change (e.g. "Sep" vs "September"). Before loading an article to the database, I'd like to see if there is anything really similar before I load it.
so I tried this (using the python elasticsearch_dsl library)
search = elasticsearch_dsl.Search(index=INDEX, doc_type=DOC_TYPE)
search = search.filter("match", text=article_text)
and that works for a bit, until I get a very long article. Then I get an error message saying "maxClauseCount is set to 1024".
Okay, so maybe my text is too long. So I do this:
text_bits = article_text.split()
if len(text_bits) > 1024:
    article_text = " ".join(text_bits[:1023])
and that works for the first item with lots of text, but not the second. So maybe my original guess is off, or maybe I'm not doing this right.
(Incidentally, I see that there's a "more like this" query listed in the documentation, but when I try to use it through Sense, like so:
POST /myindex/article/_search
{
  "more_like_this": {
    "fields": ["text"],
    "like": "mary had a little lamb"
  }
}
I get "unknown search element 'more_like_this'".)
It is quite complicated with the nested documents, but please let me know if any of you have a solution, thanks.
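Regarding the "unknown search element" error: more_like_this is a query clause, so in a _search request it has to sit under "query" rather than at the top level. A minimal sketch of the same idea through elasticsearch_dsl (reusing the INDEX and DOC_TYPE constants from above and assuming a default connection is configured, as in the question's code; the DSL nests the clause under "query" for you):

import elasticsearch_dsl

# Build a more_like_this query instead of a raw body; the DSL wraps it
# in the required "query" element when the request is sent.
search = elasticsearch_dsl.Search(index=INDEX, doc_type=DOC_TYPE)
search = search.query(
    "more_like_this",
    fields=["text"],
    like="mary had a little lamb",
)
response = search.execute()
for hit in response:
    print(hit.meta.score, hit.meta.id)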
To summarize, I would like to:
Add a value to an array (without duplication), where the array is within a sub-document that is itself within an array of a main document (Document > Array > Subdoc > Array).
The sub-document itself might not exist; if it does not exist, it needs to be added, i.e. an upsert.
The command should be the same for both actions (i.e. adding a value to the sub-document's array, and adding the sub-document).
I have tried the following, but it doesn't work:
key = {'username': 'user1'}
update1 = {
    '$addToSet': {'clients': {
        '$set': {'fname': 'Jessica'},
        '$set': {'lname': 'Royce'},
        '$addToSet': {'cars': 'Toyota'}
    }}
}
# the document with 'Jessica' and 'Royce' does not exist in the clients array,
# so a new document should be created
update2 = {
    '$addToSet': {'clients': {
        '$set': {'fname': 'Jessica'},
        '$set': {'lname': 'Royce'},
        '$addToSet': {'cars': 'Honda'}
    }}
}
# now that the document with 'Jessica' and 'Royce' already exists in the clients
# array, only the value 'Honda' should be added to the cars array
mongo_collection.update(key, update1, upsert=True)
mongo_collection.update(key, update2, upsert=True)
error message: $set is not valid for storage
My intended outcome:
Before:
{
    'username': 'user1',
    'clients': [
        {'fname': 'John',
         'lname': 'Baker',
         'cars': ['Merc', 'Ferrari']}
    ]
}
1st After:
{
    'username': 'user1',
    'clients': [
        {'fname': 'John',
         'lname': 'Baker',
         'cars': ['Merc', 'Ferrari']},
        {'fname': 'Jessica',
         'lname': 'Royce',
         'cars': ['Toyota']}
    ]
}
2nd After:
{
    'username': 'user1',
    'clients': [
        {'fname': 'John',
         'lname': 'Baker',
         'cars': ['Merc', 'Ferrari']},
        {'fname': 'Jessica',
         'lname': 'Royce',
         'cars': ['Toyota', 'Honda']}
    ]
}
My understanding is that you won't be able to achieve the intended outcome completely with a single command. You can certainly do a nested update or upsert, but probably not the duplication check, as there is no direct way to test whether an item is already contained in an array sub-document.
For the upsert operation you can refer to the MongoDB update operation docs or bulk operations. For the duplication check you will probably need separate logic.
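Not a single command, but as a sketch of the kind of separate logic meant above (assuming pymongo 3.x and MongoDB 3.6+ for array_filters; field and value names are taken from the question), you could do it in two steps:

key = {'username': 'user1'}

# Step 1: if no client named Jessica Royce exists yet in the array,
# append a new sub-document for her (no-op when she is already there).
mongo_collection.update_one(
    {'username': 'user1',
     'clients': {'$not': {'$elemMatch': {'fname': 'Jessica', 'lname': 'Royce'}}}},
    {'$push': {'clients': {'fname': 'Jessica', 'lname': 'Royce', 'cars': []}}}
)

# Step 2: add the car to that client's cars array; $addToSet skips duplicates.
mongo_collection.update_one(
    key,
    {'$addToSet': {'clients.$[c].cars': 'Honda'}},
    array_filters=[{'c.fname': 'Jessica', 'c.lname': 'Royce'}]
)

Note the two calls are not atomic together, and this assumes the 'user1' document already exists, as in the "Before" example.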
The goal of this example code is to figure out how to create a query consisting of multiple filters and queries.
The example below is not working as expected.
I want to be able to execute my search only on documents which contain a certain "key". That's what I'm trying to achieve with the ExistsFilter, but when I enable it I don't get any results back.
Any pointers to clear up this question?
#!/usr/bin/python
import pyes
conn = pyes.ES('sandbox:9200')
conn.index('{"test":{"field1":"value1","field2":"value2"}}','2012.9.23','test')
filter = pyes.filters.BoolFilter()
filter.add_must(pyes.filters.LimitFilter(1))
filter.add_must(pyes.filters.ExistsFilter('test')) # commenting out this line returns the documents
query = pyes.query.BoolQuery()
query.add_must(pyes.query.TextQuery('test.field1','value1'))
query.add_must(pyes.query.TextQuery('test.field2','value2'))
search = pyes.query.FilteredQuery(query, filter)
for reference in conn.search(query=search, indices=['2012.9.23']):
    print reference
I don't use pyes (nor Python), but from what I can see, some information seems to be missing in the ExistsFilter compared to the ExistsFilter documentation:
{
  "constant_score": {
    "filter": {
      "exists": { "field": "user" }
    }
  }
}
Could it be your issue?
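If the problem is indeed the field passed to the filter (the documentation example above uses a leaf field such as "user"), the pyes side might look like this (a guess, not tested, reusing the filters from the question and assuming the indexed leaf field is 'test.field1'):

# Point ExistsFilter at an indexed leaf field rather than the top-level 'test' object.
filter = pyes.filters.BoolFilter()
filter.add_must(pyes.filters.LimitFilter(1))
filter.add_must(pyes.filters.ExistsFilter('test.field1'))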