I am currently trying to run a query using a Python package for Elasticsearch. However, whenever I call es.search(), I only get 10 results, even though there should be more than 1M. Can anyone tell me how I can obtain all the hits?
Using the elasticsearch and elasticsearch-dsl libraries:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch(host="localhost")
s = Search(using=client, index="my_index")
for hit in s.scan():
    print(hit.title)
See the documentation about pagination.
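If you would rather page through results with the low-level client instead of scan(), the default size of es.search() is 10 and you can raise it or step through pages with from_/size. A minimal sketch, assuming an index named "my_index" (a placeholder) and keeping in mind that from+size is capped at 10,000 hits by default (index.max_result_window), beyond which scan() or search_after is the right tool:

```python
def page_params(total_hits, page_size=100):
    """Yield (from_, size) pairs that together cover total_hits results."""
    for start in range(0, total_hits, page_size):
        yield start, min(page_size, total_hits - start)

def fetch_all_pages(es, index, query, page_size=100):
    """Page through results with repeated es.search calls (needs a running cluster)."""
    first = es.search(index=index, body={"query": query}, size=page_size)
    total = first["hits"]["total"]["value"]
    hits = list(first["hits"]["hits"])
    for start, size in page_params(total, page_size):
        if start == 0:
            continue  # the first page was already fetched above
        page = es.search(index=index, body={"query": query},
                         from_=start, size=size)
        hits.extend(page["hits"]["hits"])
    return hits

# The paging arithmetic can be checked without a cluster:
pages = list(page_params(25, 10))
```

For anything approaching 1M hits, scan() (which wraps the scroll API) remains the better fit; the sketch above is only for modest result sets.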
In Kibana Dev Tools, when I use the API call GET /_data_stream/
I get the list of data streams.
I want to retrieve the same list using the Elasticsearch Python API, but I am not able to find a way. Can anyone please help me?
You need to use the Python code below, as described here:
from elasticsearch import Elasticsearch
es = Elasticsearch(hosts=["localhost:9200"])
resp = es.indices.get_data_stream(name="*")
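As a small follow-up sketch: the get_data_stream() response is a dict-like object with a "data_streams" list, so pulling out just the names looks like this (the sample names are made up):

```python
def data_stream_names(response):
    """Extract the name of every data stream in a get_data_stream() response."""
    return [ds["name"] for ds in response["data_streams"]]

# Example with a response-shaped dict (no cluster needed):
sample = {"data_streams": [{"name": "logs-app-prod"},
                           {"name": "metrics-app-prod"}]}
names = data_stream_names(sample)
```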
I'm brand new to using the Elastic Stack, so excuse my lack of knowledge on the subject. I'm running the Elastic Stack on a Windows 10 corporate work computer. I have Git Bash installed for a bash CLI, and I can successfully launch the entire Elastic Stack. My task is to take log data that is stored in one of our databases and display it on a Kibana dashboard.
From what my team and I have reasoned, I don't need to use Logstash, because the database that the logs are sent to is effectively our 'log stash', so using the Logstash service would be redundant. I found a nifty diagram on freeCodeCamp, and from what I gather, Logstash is just the intermediary for log retrieval from different services. So instead of using Logstash, since the log data is already in a database, I could just do something like this:
USER ---> KIBANA <---> ELASTICSEARCH <--- My Python Script <--- [DATABASE]
My Python script successfully calls our database and retrieves the data, and I have a function that molds the data into a dict object (as I understand it, Elasticsearch takes data in JSON format).
Now I want to insert all of that data into Elasticsearch - I've been reading the Elastic docs, and there's a lot of talk about indexing that isn't really indexing, and I haven't found any API calls I can use to plug the data right into Elasticsearch. All of the documentation I've found so far concerns the use of Logstash, but since I'm not using Logstash, I'm kind of at a loss here.
If there's anyone who can help me out and point me in the right direction I'd appreciate it. Thanks
-Dan
You ingest data into Elasticsearch using the Index API; it is basically a request using the PUT method.
To do that with Python you can use elasticsearch-py, the official Python client for Elasticsearch.
But sometimes what you need is easier to do with Logstash, since it can extract the data from your database, transform it using many filters, and send it to Elasticsearch.
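Since the data is already a list of dicts, the client's bulk helper is the usual way to push it all in at once. A minimal sketch, where the index name "my_logs" and the record fields are made-up placeholders for your own data:

```python
def to_bulk_actions(records, index="my_logs"):
    """Turn plain dicts into the action format that helpers.bulk() expects."""
    for record in records:
        yield {"_index": index, "_source": record}

def index_records(records, hosts="http://localhost:9200"):
    """Send records to Elasticsearch (requires a running cluster)."""
    # Deferred import: requires the elasticsearch-py package.
    from elasticsearch import Elasticsearch, helpers
    es = Elasticsearch(hosts)
    helpers.bulk(es, to_bulk_actions(records))

# The action dicts can be inspected without a cluster:
actions = list(to_bulk_actions([{"level": "INFO", "msg": "started"}]))
```

The generator keeps memory flat even for large exports, since helpers.bulk() consumes it lazily in chunks.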
I have inserted a large amount of data (1 million documents) into Elasticsearch. Now I want to create a REST API to fetch the data from Elasticsearch.
I want to use curl commands
(e.g. curl -i http://localhost:5000/todo/api/v1.0/tasks/2)
to get the JSON fields of the document having _id=2.
I found the following blog https://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask
that helped me understand how to create a REST API, but I am not able to understand how to extend this for Elasticsearch.
The Elasticsearch Python API is very convenient for any kind of operation (inserting or fetching). You can find the docs here:
https://elasticsearch-py.readthedocs.io/en/master/
Just one hint: in my experience the Python API tended to be slower than making direct curl requests from the command line. Anyhow, it is very convenient to work with. A query is as easy as the following snippet.
from elasticsearch import Elasticsearch
es = Elasticsearch()
res = es.search(index="index-logstash", body={"query": {"match_all": {}}})
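To wire this into the Flask tutorial you linked, each route handler just calls the client and reshapes the response. A hedged sketch, in which the index name "tasks", the route, and the field names are assumptions to adapt to your own mapping:

```python
def doc_to_task(doc):
    """Shape an Elasticsearch get() response into a flat API payload."""
    return {"id": doc["_id"], **doc["_source"]}

def make_app(es, index="tasks"):
    """Build a Flask app around an existing Elasticsearch client."""
    # Deferred import: requires the Flask package.
    from flask import Flask, jsonify
    app = Flask(__name__)

    @app.route("/todo/api/v1.0/tasks/<task_id>")
    def get_task(task_id):
        doc = es.get(index=index, id=task_id)  # lookup by _id
        return jsonify(doc_to_task(doc))

    return app

# The response-shaping helper can be checked without a cluster:
sample = {"_id": "2", "_source": {"title": "buy milk", "done": False}}
task = doc_to_task(sample)
```

With the app running on port 5000, the curl call from the question (curl -i http://localhost:5000/todo/api/v1.0/tasks/2) would hit get_task with task_id="2".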
I'm using the elasticsearch python api to communicate with my elasticsearch database. How can I make a specific GET request to get an overview of all the snapshots that have been created?
The Kibana command for this would be: GET /_snapshot/my_backup/_all.
It seems the Elasticsearch.get() function is only suited to retrieve documents.
I would rather not use the Requests module.
The snapshot helper functions I found only have the option to get an overview of snapshots that are currently running.
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.snapshot.get_repository('my_backup') # configuration information
es.snapshot.status('my_backup') # currently running snapshots
I finally realized you can use the _all keyword when needing all snapshots, in the following way:
all_snapshots = es.snapshot.get(repository='my_backup', snapshot='_all')
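The response of that call is a dict with a "snapshots" list, so a quick overview of name and state per snapshot can be built like this (the sample names and states below are illustrative):

```python
def summarize_snapshots(response):
    """Map each snapshot in a snapshot.get() response to (name, state)."""
    return [(s["snapshot"], s["state"]) for s in response["snapshots"]]

# Example with a response-shaped dict (no cluster needed):
sample = {"snapshots": [
    {"snapshot": "snap_1", "state": "SUCCESS"},
    {"snapshot": "snap_2", "state": "IN_PROGRESS"},
]}
overview = summarize_snapshots(sample)
```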
Just adding one of my own, as this got me on the right path.
If you need the general snapshot status, i.e. whether a snapshot is currently being run:
es_session.snapshot.status('_all')
I am trying to pull tweets matching a given search query. I'm using the following code:
import urllib2 as urllib
import json
response = urllib.urlopen("https://search.twitter.com/search.json?q=microsoft")
pyresponse = json.load(response)
print pyresponse
This was working a few days ago, but it suddenly stopped working. With some help from Google, I learned that this type of URL is not supported anymore.
How do I perform this search query? What URL should I use?
Twitter is deprecating non-authenticated searches. You should look into Tweepy or another Python library that interacts with the Twitter API: https://github.com/tweepy/tweepy
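A hedged sketch of what the authenticated equivalent looks like with Tweepy against the Twitter API v2. The bearer token is a placeholder, and search access depends on your API tier, so treat this as a starting point rather than a guaranteed call:

```python
def build_query(terms, lang=None):
    """Join search terms, optionally adding a lang: operator."""
    query = " ".join(terms)
    if lang:
        query += f" lang:{lang}"
    return query

def search_tweets(bearer_token, query, max_results=10):
    """Run a recent-tweet search (requires network access and valid credentials)."""
    import tweepy  # deferred: requires the tweepy package
    client = tweepy.Client(bearer_token=bearer_token)
    return client.search_recent_tweets(query=query, max_results=max_results)

# Query building works offline:
q = build_query(["microsoft"], lang="en")
```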