Elastic search DSL python issue

I have been using the ElasticSearch DSL python package to query my elastic search database. The querying method is very intuitive but I'm having issues retrieving the documents. This is what I have tried:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

es = Elasticsearch(hosts=[{"host": "xyz", "port": 9200}], timeout=400)
s = Search(using=es, index="xyz-*").query("match_all")
response = s.execute()
for hit in response:
    print(hit.title)
The error I get:
AttributeError: 'Hit' object has no attribute 'title'
I googled the error and found another SO question: How to access the response object using elasticsearch DSL for python
The solution mentions:
for hit in response:
    print(hit.doc.firstColumnName)
Unfortunately, I had the same issue again with 'doc'. What is the correct way to access the fields of my documents?
Any help would really be appreciated!

I ran into the same issue and have found several different answers to it; the right one seems to depend on the version of the elasticsearch-dsl library you're using. You might explore the response object and its sub-objects. For instance, using version 5.3.0, I see the expected data with either of the loops below.
for hit in RESPONSE.hits._l_:
    print(hit)
or
for hit in RESPONSE.hits.hits:
    print(hit)
NOTE: these are limited to 10 elements. That is Elasticsearch's default page size (the size parameter), not a limit of the library.
print(len(RESPONSE.hits.hits))
10
print(len(RESPONSE.hits._l_))
10
This doesn't match the overall number of hits, which I print using print('Total {} hits found.\n'.format(RESPONSE.hits.total))
Good luck!
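
The gap between the two counts is easier to see on the raw response: hits.hits holds only the page that was returned, while hits.total counts every match. The sketch below uses a made-up response dict (index name, ids and titles are invented for illustration) in the layout Elasticsearch returns; total_hits and page_sources are hypothetical helper names, not part of elasticsearch-dsl.

```python
def total_hits(raw):
    """Return the number of documents that matched the query."""
    total = raw["hits"]["total"]
    # Elasticsearch 7+ reports {"value": N, "relation": ...};
    # older versions report a plain integer.
    return total["value"] if isinstance(total, dict) else total


def page_sources(raw):
    """Return the stored fields (_source) of the hits on this page only."""
    return [hit.get("_source", {}) for hit in raw["hits"]["hits"]]


# Made-up sample: three documents match, but only two are on the returned page.
sample = {
    "hits": {
        "total": 3,
        "hits": [
            {"_index": "xyz-1", "_id": "a", "_source": {"title": "first"}},
            {"_index": "xyz-1", "_id": "b", "_source": {"title": "second"}},
        ],
    }
}

print(total_hits(sample))         # number of matches overall
print(len(page_sources(sample)))  # number of hits actually returned
```

With elasticsearch-dsl itself, slicing the search before execute() (for example s[0:100]) should request a larger page, and s.scan() iterates over every match.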

From version 6 onwards the response does not return your populated Document class anymore, meaning that your fields are just an AttrDict which is basically a dictionary.
To solve this you need to have a Document class representing the document you want to parse. Then you need to parse the hit dictionary with your document class using the .from_es() method.
Like I answered here.
https://stackoverflow.com/a/64169419/5144029
Also have a look at the Document class here
https://elasticsearch-dsl.readthedocs.io/en/7.3.0/persistence.html

Related

How to get Document Name from DocumentReference in Firestore Python

I have a document reference that I am retrieving from a query on my Firestore database. I want to use the DocumentReference as a query parameter for another query. However, when I do that, it says
TypeError: sequence item 1: expected str instance, DocumentReference found
This makes sense, because I am trying to pass a DocumentReference in my update statement:
db.collection("Teams").document(team).update("Dictionary here") # team is a DocumentReference
Is there a way to get the document name from a DocumentReference? Now before you mark this as duplicate: I tried looking at the docs here, and the question here, although the docs were so confusing and the question had no answer.
Any help is appreciated, Thank You in advance!

Yes, split the .refPath. The document "name" is always the last element after the split; something like lodash _.last() can work, or any other technique that identifies the last element in the array.
Note, btw, the refPath is the full path to the document. This is extremely useful (as in: I use it a lot) when you find documents via collectionGroup() - it allows you to parse the path to find the parent document(s)/collection(s) a particular document came from.
Also note: there is a pseudo-field __name__ available (really an alias of documentID()). In spite of its name(s), it returns the FULL PATH (i.e. refPath) to the document, NOT the documentID by itself.

I think I figured it out - by doing team.path.split("/")[1] I could get the document name. This might not work for all Firestore databases, though (for example with subcollections, where the path has more segments), so if anyone has a better solution, please go ahead. Thanks!
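
A minimal sketch of a variant that also copes with subcollections: take the last path segment instead of the one at index 1. It works on the plain path string (what team.path returns); document_id_from_path is a hypothetical helper name and the sample paths are invented. As far as I know, the Python client's DocumentReference also exposes the same value directly as team.id, which avoids the string handling altogether.

```python
def document_id_from_path(path):
    """Return the last segment of a slash-separated Firestore document path."""
    return path.rsplit("/", 1)[-1]


# Invented sample paths: a top-level document and one in a subcollection.
print(document_id_from_path("Teams/red"))
print(document_id_from_path("Leagues/east/Teams/red"))
```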

Python mongodb/motor "'ObjectId' object is not iterable" error while trying to find item in collection

I know that there are similar questions, but I've tried everything that was advised and I'm still getting an error. I'm trying to fetch an item from a mongo collection by id, converting the string to an ObjectId, like this:
from bson import ObjectId

async def get_single_template(db, template_id):
    template = await db.templates.find_one({'_id': ObjectId(template_id)})
    return template
And I'm getting an error:
ValueError: [TypeError("'ObjectId' object is not iterable"), TypeError('vars() argument must have __dict__ attribute')]
"template_id" is a valid string, like "601401887ecf2f6153bbaaad", and the ObjectId created from it is valid too. It only fails inside the find_one() method; when I use find() with that id, it works well. I've tried from bson.objectid import ObjectId too, with no difference. I'm using the motor library to access mongo. Is there something that I'm missing?
P.S. Links to the corresponding docs:
https://pymongo.readthedocs.io/en/stable/tutorial.html#querying-by-objectid
Though I'm using the motor async library, I can't find direct examples in its docs. Basically, it wraps pymongo. I can only find examples in others' source code.

Well, I've found out what caused that issue. The problem was not in the way I've tried to query data, but in the way I've tried to return it. I've forgotten to convert ObjectId to string in the entity that I've retrieved from database and tried to return it 'as is'. My bad.
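
A minimal sketch of the conversion described above, under the assumption that the document is a plain dict as returned by find_one(). serialize_document is a hypothetical helper name, and FakeObjectId is only a stand-in for bson.ObjectId so the example runs without a database; str() on a real ObjectId likewise gives its 24-character hex form.

```python
def serialize_document(doc):
    """Return a copy of a MongoDB document with its _id turned into a string."""
    if doc is None:  # find_one() returns None when nothing matches
        return None
    out = dict(doc)  # shallow copy, so the original document is left untouched
    if "_id" in out:
        out["_id"] = str(out["_id"])
    return out


class FakeObjectId:
    """Stand-in for bson.ObjectId so the sketch runs without a database."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value


# Hypothetical document; only the id string comes from the question.
template = {"_id": FakeObjectId("601401887ecf2f6153bbaaad"), "name": "welcome"}
print(serialize_document(template))
```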

I encountered this problem as well while using Python motor for MongoDB. In your app.collection.find_one(...), pass {'_id': 0} as the projection (the second argument), alongside the dictionary holding the value you want to search for, so that the _id field is left out of the result.
So it should be like this:
await app.collection.find_one({"value": val},{'_id': 0})

After a lot of research I found one solution that works for me.
I tried many approaches; let me walk you through one of them.
DATABASE_URL = "mongodb://localhost:portname/yourdbname"  # placeholders for port and database name
client = mongo_client.MongoClient(
    settings.DATABASE_URL  # , ServerSelectionTimeoutMS=5000
)
db = client[settings.MONGO_INITDB_DATABASE]
Post = db.post

@router.get('/')
async def posts(user_id: str = Depends(oauth2.require_user)):
    items = []  # renamed from "list" to avoid shadowing the builtin
    for i in Post.find():
        items.append(i)
    print("list", items)
    return {'status': 'success', "list": items}
Everything works with print, but when I return the response I get the error mentioned in the post. I solved it by
serializing ObjectId with FastAPI's native methods.
At the top of my file I just import:
from bson.objectid import ObjectId
import pydantic
pydantic.json.ENCODERS_BY_TYPE[ObjectId]=str
Note:
I am not an expert in FastAPI with MongoDB; I started learning FastAPI 5 days ago and MongoDB 2 days ago. If you know a better practice, let me know.
I also tried serializing the data myself, but that did not work in my case, so try this way.
Thank you.
This GitHub comment helped me:
https://github.com/tiangolo/fastapi/issues/1515#issuecomment-782838556

how to use pyknackhq python library for getting whole objects/tables from my knack builder

I am trying to connect the Knack online database with my Python data handling scripts in order to renew objects/tables directly in my Knack app builder. I discovered that pyknackhq, a Python API for KnackHQ, can fetch objects and return JSON for an object's records. So far so good.
However, following the documentation (http://www.wbh-doc.com.s3.amazonaws.com/pyknackhq/quick%20start.html) I have tried to fetch all rows (records in knack) for my object-table (having in total 344 records).
My code was:
i = 0
for rec in undec_obj.find():
    print(rec)
    i = i + 1
print(i)
>> 25
The first 25 records were indeed returned, but the remaining ones, up to the 344th, never were. The documentation of the pyknackhq library is relatively small, so I couldn't find a way around my problem there. Is there a way to get all my records/rows? (I have also changed the setting in Knack so that all my records appear on the same page, page 1.)
The ultimate goal is to take all records and make them a pandas dataframe.
Thank you!

I haven't worked with that library, but I've written another python Knack API wrapper that should help:
https://github.com/cityofaustin/knackpy
The docs should get you where you want to go. Here's an example:
>>> from knackpy import Knack
# download data from knack object
# will fetch records in chunks of 1000 until all records have been downloaded
# optionally pass a rows_per_page and/or page_limit parameter to limit record count
>>> kn = Knack(
        obj='object_3',
        app_id='someappid',
        api_key='topsecretapikey',
        page_limit=10,  # not needed; this is the default
        rows_per_page=1000  # not needed; this is the default
    )
>>> for row in kn.data:
        print(row)
{'store_id': 30424, 'inspection_date': 1479448800000, 'id': '58598262bcb3437b51194040'},...
Hope that helps. Open a GitHub issue if you have any questions using the package.
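
The "fetch in chunks until all records have been downloaded" behaviour mentioned in the comments above can be sketched generically. This is not knackpy's implementation: fetch_page is a hypothetical callable that returns one page of records, and make_fake_source only simulates a 344-record object (the count from the question) so that the loop can run offline.

```python
def fetch_all(fetch_page, rows_per_page=1000, page_limit=10):
    """Collect records page by page until a short page or the page limit is hit."""
    records = []
    for page in range(1, page_limit + 1):
        batch = fetch_page(page, rows_per_page)
        records.extend(batch)
        if len(batch) < rows_per_page:  # a short (or empty) page means we are done
            break
    return records


def make_fake_source(total):
    """Build a fake page fetcher over `total` numbered records (illustration only)."""
    data = list(range(total))

    def fetch_page(page, rows_per_page):
        start = (page - 1) * rows_per_page
        return data[start:start + rows_per_page]

    return fetch_page


# 25 rows per page, as in the question; 20 pages is more than enough for 344 rows.
all_rows = fetch_all(make_fake_source(344), rows_per_page=25, page_limit=20)
print(len(all_rows))
```

Stopping after the first page is what produces the 25-record result described in the question; the loop keeps requesting pages until one comes back short.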

How to disable query cache?

First of all, sorry for the not entirely clear question title.
It is easier to explain with a few lines of code:
query = {...}
while True:
    elastic_response = elastic_client.search(elastic_index, body=query, request_cache=False)
    if elastic_response["hits"]["total"] == 0:
        break
    else:
        for doc in elastic_response["hits"]["hits"]:
            print("delete {}".format(doc["_id"]))
            elastic_client.delete(index=elastic_index, doc_type=doc["_type"], id=doc["_id"])
I make a search, then delete all the docs and then do the search again to get the next bunch.
BUT the search query gives me the same docs! This results in a 404 exception on delete. It has to be some kind of cache, but I did not find anything; "request_cache" doesn't help.
I can probably refactor this code to use batch delete, but I want to understand what is wrong here.
P.S. I'm using the official Python client.

If using a sleep() after the deletes makes the documents go away, then it's not about cache. It's about the refresh_interval and the near-real-time nature of Elasticsearch.
So, call _refresh after your code leaves the for loop. Also, don't delete document by document; instead create a _bulk request that deletes your documents in batches, sized according to how many there are.
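
A sketch of the batching step, assuming the raw hit dicts from the question's search response. build_delete_actions is a hypothetical helper; it only builds action dicts in the shape that elasticsearch.helpers.bulk expects (_op_type, _index, _id), so it runs without a cluster. The sample hits are invented.

```python
def build_delete_actions(hits):
    """Yield one bulk 'delete' action per search hit."""
    for hit in hits:
        action = {"_op_type": "delete", "_index": hit["_index"], "_id": hit["_id"]}
        # Older clusters with mapping types also need the document type.
        if "_type" in hit:
            action["_type"] = hit["_type"]
        yield action


# Invented sample standing in for elastic_response["hits"]["hits"].
sample_hits = [
    {"_index": "logs-1", "_type": "doc", "_id": "1"},
    {"_index": "logs-1", "_type": "doc", "_id": "2"},
]
actions = list(build_delete_actions(sample_hits))
print(actions)

# Against a real cluster (not executed here), something like:
#   from elasticsearch import helpers
#   helpers.bulk(elastic_client, actions)
#   elastic_client.indices.refresh(index=elastic_index)
```

Refreshing once after the whole batch, rather than after every delete, keeps the next search from returning documents that are already gone without forcing a refresh per document.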

Using Pattern.web to search all Wikipedia is raising "'NoneType' object is not iterable" error

I am trying to use Pattern.web to search all of Wikipedia for words and phrases that include an apostrophe. This is my latest attempt:
from pattern.web import Wikipedia, plaintext
from pattern.web import SEARCH

engine = Wikipedia(language="en")
q = "\"cat's\""
for i in range(1, 2):
    for result in engine.search(q, start=i, count=10, type=SEARCH, cached=True):
        print(plaintext(result.text))
        print(result.url)
        print(result.date)
        print()
But I get this error message:
for result in engine.search(q, start=i, count=10, type=SEARCH, cached=True):
TypeError: 'NoneType' object is not iterable
Question:
Is it even possible to do what I'm trying to do?
If it is, how do I fix this?

If you refer to the Wikipedia SearchEngine documentation, you'll notice that your attempt to iterate is misguided and your query may be erroneous as well:
Wikipedia.search() returns a single WikipediaArticle for the given (case-sensitive) query, which is the title of an article.
(Note that this means that start and count can only be 1.)
I would venture to guess, without downloading the pattern library and trying this myself, that since there is no Wikipedia article entitled "cat's", you get None back.
So, is it possible to do what you are trying to do? Yes. Refer again to the documentation:
Wikipedia.index() returns an iterator over all article titles on Wikipedia.
You might do something like this:
for title in engine.index():
    article = engine.search(title)
    # do your string pattern searching here
I answered this to the best of my ability without downloading pattern and trying it myself, so YMMV.
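
The "string pattern searching" step in the loop above could look like the sketch below. It is deliberately independent of pattern: titles_with_apostrophe is a hypothetical helper that accepts any iterable of titles (such as what engine.index() is described as yielding), and the sample titles are made up.

```python
def titles_with_apostrophe(titles):
    """Return the titles that contain an apostrophe, skipping empty values."""
    return [title for title in titles if title and "'" in title]


# Made-up titles, including a None to show that missing values are skipped.
sample_titles = ["Cat", "Cat's cradle", None, "Rosemary's Baby"]
print(titles_with_apostrophe(sample_titles))
```

Filtering on the titles first means engine.search() only has to be called for the articles that can actually match.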
