How to expose a Python API for Elasticsearch

I have inserted a large amount of data (1 million documents) into Elasticsearch. Now I want to create a REST API to fetch the data from Elasticsearch.
I want to be able to use curl commands
(e.g. curl -i http://localhost:5000/todo/api/v1.0/tasks/2)
to get the JSON document that has _id=2.
I found the following blog, https://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask,
which helped me create the REST API, but I am not able to understand how to extend this for Elasticsearch.

The elasticsearch Python API is very convenient for any kind of operation (inserting or fetching). You can find the docs here:
https://elasticsearch-py.readthedocs.io/en/master/
Just one hint: in my experience the Python API tended to be slower than issuing curl requests directly from the command line. Still, it is very convenient to work with. Fetching a document is as easy as the following snippet.
from elasticsearch import Elasticsearch

es = Elasticsearch()  # connects to localhost:9200 by default
res = es.get(index="index-logstash", id=2)  # fetch the document with _id=2
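To wire this into the Flask REST API from that blog post, call the client inside a route handler. Here is a minimal sketch, assuming Elasticsearch runs on localhost:9200 and your documents live in a hypothetical index named "tasks":

from flask import Flask, jsonify, abort
from elasticsearch import Elasticsearch, NotFoundError

app = Flask(__name__)
es = Elasticsearch()  # assumes localhost:9200

@app.route('/todo/api/v1.0/tasks/<task_id>', methods=['GET'])
def get_task(task_id):
    try:
        doc = es.get(index="tasks", id=task_id)  # look up the document by _id
    except NotFoundError:
        abort(404)
    return jsonify(doc['_source'])

if __name__ == '__main__':
    app.run(port=5000)

With this running, curl -i http://localhost:5000/todo/api/v1.0/tasks/2 returns the document with _id=2, matching the URL scheme from the question.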

Related

Retrieving DataStream list in Elasticsearch using python API

In Kibana Dev Tools, when I use the API call GET /_data_stream/,
I get the list of data streams.
I want to retrieve it the same way using the Elasticsearch Python API, but I am not able to find a way. Can anyone please help me?
You need to use the Python code below, as described in the elasticsearch-py documentation:
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=["localhost:9200"])
resp = es.indices.get_data_stream(name="*")
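The response is a plain dict; assuming the usual response shape, the stream names can be read out like this:

for stream in resp["data_streams"]:
    print(stream["name"])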

How to post a request to an API using only code?

I am developing a DAG to be scheduled on Apache Airflow whose main purpose is to post survey data (in JSON format) to an API and then get a response (the answers to the surveys). Since this whole process is going to be automated, every part of it has to be programmed in the DAG, so I can't use Postman or any similar app (unless there is a way to automate their usage, but I don't know if this is possible).
I was thinking of using the requests library for Python, and the function I've written for posting the json to the API looks like this:
import requests

def postFileToAPI(**context):
    print('uploadFileToAPI() ------ ')
    json_file = context['ti'].xcom_pull(task_ids='toJson')  # pulls the JSON from a previous task
    print('--------------- Posting survey request to API')
    r = requests.post('https://[request]', data=json_file)
(I haven't finished defining the HTTP link for the request because my source data is incomplete.)
However, since this is my first time working with APIs and the requests library, I don't know if this is enough. For example, I'm unsure whether I need to provide a token from the API to perform the request.
I also don't know if there are other libraries that are better suited for this or that could be a good support.
In short: I don't know if what I'm doing will work as intended, what other information I need to provide in my DAG, or if there are any libraries that would make my work easier.
The Python requests package that you're using is all you need, unless you're making a request that needs extra authorisation. In that case you should also import a suitable helper, for example requests_jwt (then from requests_jwt import JWTAuth) if you're using JSON Web Tokens, or whatever requests add-on corresponds to your authorisation style.
You make POST, GET, and all other individual requests separately.
Include the URL and data arguments as you have done and that should work!
You may also need headers and/or auth arguments to get through security,
e.g. for the GitLab API on a private repository you would include these extra arguments, where GITLAB_TOKEN is a GitLab web token:
headers={'PRIVATE-TOKEN': GITLAB_TOKEN},
auth=JWTAuth(GITLAB_TOKEN)
If you just try it, it should work; if it doesn't, test the API with curl requests directly in the terminal, or let us know :)
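Putting it together, a sketch of the full call (the endpoint URL and the token value are placeholders, and the auth line only applies if the API uses JWT-style authorisation):

import requests
from requests_jwt import JWTAuth  # only needed for JWT-based authorisation

GITLAB_TOKEN = 'your-token-here'  # hypothetical token

r = requests.post(
    'https://gitlab.example.com/api/v4/projects',  # hypothetical endpoint
    data=json_file,
    headers={'PRIVATE-TOKEN': GITLAB_TOKEN},
    auth=JWTAuth(GITLAB_TOKEN),
)
r.raise_for_status()  # raise an exception on 4xx/5xx responses
print(r.json())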

Elasticsearch Data Insertion with Python

I'm brand new to using the Elastic Stack, so excuse my lack of knowledge on the subject. I'm running the Elastic Stack on a Windows 10 corporate work computer. I have Git Bash installed for a bash CLI, and I can successfully launch the entire Elastic Stack. My task is to take log data that is stored in one of our databases and display it on a Kibana dashboard.
From what my team and I have reasoned, I don't need to use Logstash, because the database that the logs are sent to is effectively our "log stash", so using the Logstash service would be redundant. I found a nifty diagram
on freeCodeCamp, and from what I gather, Logstash is just the intermediary for log retrieval across different services. So instead of using Logstash, since the log data is already in a database, I could just do something like this
USER ---> KIBANA <---> ELASTICSEARCH <--- My Python Script <--- [DATABASE]
My Python script successfully calls our database and retrieves the data, and it has a function that molds the data into a dict object (as I understand it, Elasticsearch takes data in JSON format).
Now I want to insert all of that data into Elasticsearch. I've been reading the Elastic docs, and there's a lot of talk about indexing that isn't really about indexing, and I haven't found any API calls I can use to plug the data right into Elasticsearch. All of the documentation I've found so far concerns the use of Logstash, but since I'm not using Logstash, I'm kind of at a loss here.
If there's anyone who can help me out and point me in the right direction I'd appreciate it. Thanks
-Dan
You ingest data into Elasticsearch using the Index API; it is basically a request using the PUT method.
To do that with Python you can use elasticsearch-py, the official Python client for Elasticsearch.
But sometimes what you need is easier done with Logstash, since it can extract the data from your database, format it using many filters, and send it to Elasticsearch.
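Here is a minimal sketch with elasticsearch-py, assuming Elasticsearch runs on localhost:9200, with a hypothetical index named "logs" (records stands in for the list of dicts your script already builds):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch()  # assumes localhost:9200

# Index a single document (roughly equivalent to PUT logs/_doc/1)
es.index(index="logs", id=1, body={"message": "hello", "level": "INFO"})

# For a large number of documents, the bulk helper is much faster
actions = ({"_index": "logs", "_source": doc} for doc in records)
bulk(es, actions)

Once the documents are indexed, you can point a Kibana index pattern at "logs" and build your dashboard from there.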

How do I use the elasticsearch python api to get an overview of all snapshots?

I'm using the elasticsearch python api to communicate with my elasticsearch database. How can I make a specific GET request to get an overview of all the snapshots that have been created?
The Kibana command for this would be: GET /_snapshot/my_backup/_all.
It seems the Elasticsearch.get() function is only suited to retrieve documents.
I would rather not use the Requests module.
The snapshot helper functions I found only have the option to get an overview of snapshots that are currently running.
from elasticsearch import Elasticsearch
es = Elasticsearch()
es.snapshot.get_repository('my_backup') # configuration information
es.snapshot.status('my_backup') # currently running snapshots
I finally realized you can use the _all keyword when you need all snapshots, in the following way:
all_snapshots = es.snapshot.get(repository='my_backup', snapshot='_all')
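Assuming the standard response shape, the snapshot names and their states can then be listed like this:

for snap in all_snapshots['snapshots']:
    print(snap['snapshot'], snap['state'])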
Just adding one of my own, as this thread got me on the right path.
If you need the general snapshot status, i.e. whether a snapshot is currently being run:
es_session.snapshot.status('_all')

Python and curl question

I will be transmitting purchase info (like CC data) to a bank gateway and retrieving the result using Django, and thus via Python.
What would be an efficient and secure way of doing this?
I have read the documentation of this gateway for PHP; they seem to use this method:
$xml = ...;  // some XML holding the data of a purchase
$curl = `/usr/bin/curl -s -d 'DATA=$xml' "https://url of the virtual bank POS"`;
$data = explode("\n", $curl);  // the return value is also XML; they split it on each "\n"
Using $data, they then check whether the payment was accepted, rejected, etc.
I want to achieve this in Python. I have done some searching, and it seems there is a Python curl binding named pycurl, yet I have no experience using curl and do not know if this library is suitable for the task. Please keep in mind that, as this transfer requires security, I will be using SSL.
Any suggestion will be appreciated.
Use of the standard library urllib2 module should be enough:
import urllib
import urllib2
request_data = urllib.urlencode({"DATA": xml})
response = urllib2.urlopen("https://url of the virtual bank POS", request_data)
response_data = response.read()
data = response_data.split('\n')
I assume the xml variable holds the data to be sent.
Citing pycurl.sourceforge.net:
To sum up, PycURL is very fast (esp. for multiple concurrent operations) and very feature complete, but has a somewhat complex interface. If you need something simpler or prefer a pure Python module you might want to check out urllib2 and urlgrabber. There is also a good comparison of the various libraries.
Both pycurl and urllib2 can work with HTTPS, so it's up to you.
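For comparison, the same request through pycurl would look roughly like this (a sketch in Python 2 to match the urllib2 example above; the URL placeholder is kept as-is):

import urllib
import pycurl
from StringIO import StringIO

buf = StringIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, "https://url of the virtual bank POS")
c.setopt(pycurl.POSTFIELDS, urllib.urlencode({"DATA": xml}))
c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the response body
c.perform()
c.close()
data = buf.getvalue().split('\n')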
