I have a REST API built using Flask. It has GET and POST methods and returns a JSON response. I want Solr to use the REST API's URL to perform a search on this response based on the query and return relevant search results.
How can I achieve this? Does Solr only take a JSON file as input, in which case I would need to write the endpoint's response to a JSON file, place it in the example folder, and pass it into Solr?
I want Solr to use the REST API's URL to perform a search on the data (JSON response) based on the query and return relevant search results
You will have to invoke the REST API, get back the JSON response, and add it to the Solr index. Then you can use Solr to search that data.
Could you please let me know how to achieve this? Is it possible?
Please take a look at this documentation. It will help you index JSON documents.
https://lucene.apache.org/solr/guide/7_1/uploading-data-with-index-handlers.html#solr-style-json
Or does Solr only take a JSON file as input placed in the example folder, in which case I would need to write the response received from my REST API to a JSON file, place it in the example folder, and pass it into Solr
Your understanding is partially correct; however, you do not have to place it in the example folder. You need to post that JSON (or format the JSON in the form required by Solr) as shown in the reference provided above.
Hope this helps.
Thanks Amit! The link you shared uses curl commands, but I was trying to do this with Python. I found a solution that works like a charm. I have also embedded comments for further explanation.
import json

import requests
from flask import Flask

app = Flask(__name__)

@app.route('/solrInput', methods=['POST'])
def solr_input():
    """Get response data from the MongoDB REST API and feed it into Solr for indexing."""
    # Get MongoDB data from the REST API
    json_data = requests.get('http://127.0.0.1:5000/getAllRecords').json()
    # Loop through the response and post each record to Solr
    for key, val in json_data.items():
        payload = json.dumps(val)  # str(val) would produce single-quoted, invalid JSON
        requests.post('http://localhost:8983/solr/techs/update/json/docs?commit=true',
                      data=payload, headers={'Content-Type': 'application/json'})
    return 'Indexed into Solr'

if __name__ == "__main__":
    app.run(debug=True)
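For completeness, a minimal sketch of querying the indexed data back out of Solr with Python (the core name techs comes from the code above; the field name in q is an assumption, so use one from your own schema):

import requests

# Search the 'techs' core; 'q' takes a standard Lucene query
params = {'q': 'name:flask', 'wt': 'json', 'rows': 10}
r = requests.get('http://localhost:8983/solr/techs/select', params=params)
for doc in r.json()['response']['docs']:
    print(doc)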
Introduction
The challenge I bring to you today is: to implement a Real Time REST API (GET, POST, PUT, DELETE, etc.) to query and update any SPARQL endpoint using the Django REST Framework, for a frontend application (I am using React) to request and use the serialized data provided by the REST API.
Please note that I'm using Django because I would like to implement Web AND Mobile applications in the future, but for now I will just implement it on a React Web application.
Specifications
The REST API should be able to:
Perform (read or update) queries to a SPARQL endpoint via HTTP requests.
Serialize the response to a JSON RDF standardized table, or an RDF Graph, depending on the HTTP response.
Store the serialized response in a Python object.
Provide an endpoint with the serialized response to a frontend application (such as React).
Handle incoming requests from the frontend application, "translate" and execute as a SPARQL query.
Send back the response to the frontend application's request.
All of this while performing every query and update in real time.
What I mean by a Real Time API:
A SPARQL query is executed from the REST API to a SPARQL endpoint via an HTTP request.
The REST API reads the HTTP response generated from the request.
The REST API serializes the response to the corresponding format.
This serialized response is stored locally in a Python object for future use.
(Note: All the triples from the SPARQL endpoint in the query now exist both in the SPARQL endpoint as well as in a Python object, and are consistent both locally and remotely.)
The triples are then (hypothetically) modified or updated (Either locally or remotely).
Now the local triples are out of sync with the remote triples.
The REST API now becomes aware of this update (maybe through Listener/Observer objects? see the sketch after this list).
The REST API then automatically syncs the triples, either through an update query request (if the changes were made locally) or by updating the Python object with the response from a query request (if the update was made remotely).
Finally, both (the SPARQL endpoint and the Python object) should share the latest updated triples and, therefore, be in sync.
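To make the intended flow concrete, here is a minimal sketch of the Listener/Observer idea from the list above. All names are hypothetical, and this is not a working sync implementation; it only shows the shape of the idea, using the Store class defined further down:

class TripleStoreObserver:
    """Hypothetical observer that keeps local triples in sync with the endpoint."""

    def __init__(self, store):
        self.store = store  # a Store instance like the one defined below

    def on_local_change(self, update_query: str):
        # A local edit happened: push it to the SPARQL endpoint
        self.store.update_query(update_query)

    def on_remote_change(self, describe_query: str):
        # A remote edit happened: refresh the local Python object
        # (local_graph is a hypothetical attribute holding the serialized response)
        self.store.local_graph = self.store.graph_query(describe_query)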
Previous Attempts
I have currently been able to query a SPARQL endpoint using the SPARQLWrapper package (for executing the queries), and the RDFLib and JSON packages for serializing and instantiating Python objects from the response, like this:
import json
from rdflib import RDFS, Graph
from SPARQLWrapper import GET, JSON, JSONLD, POST, TURTLE, SPARQLWrapper


class Store(object):
    def __init__(self, query_endpoint, update_endpoint=None):
        self.query_endpoint = query_endpoint
        self.update_endpoint = update_endpoint
        self.sparql = SPARQLWrapper(query_endpoint, update_endpoint)

    def graph_query(self, query: str, format=JSONLD, only_conneg=True):
        # Run the query, serialize the graph result, and load it as JSON
        results = self.query(query, format, only_conneg)
        results_bytes = results.serialize(format=format)
        results_json = results_bytes.decode('utf8').replace("'", '"')
        data = json.loads(results_json)
        return data

    def query(self, query: str, format=JSON, only_conneg=True):
        # Read query via GET, converted to the requested return format
        self.sparql.resetQuery()
        self.sparql.setMethod(GET)
        self.sparql.setOnlyConneg(only_conneg)
        self.sparql.setQuery(query)
        self.sparql.setReturnFormat(format)
        return self.sparql.queryAndConvert()

    def update_query(self, query: str, only_conneg=True):
        # Update query via POST against the update endpoint
        self.sparql.resetQuery()
        self.sparql.setMethod(POST)
        self.sparql.setOnlyConneg(only_conneg)
        self.sparql.setQuery(query)
        self.sparql.query()


store = Store('http://www.example.com/sparql/Example')
print(store.query("""SELECT ?s WHERE {?s ?p ?o} LIMIT 1"""))
print(store.graph_query("""DESCRIBE <http://www.example.com/sparql/Example/>"""))
The Challenge
The previous code can already:
Perform (read or update) queries to a SPARQL endpoint via HTTP requests
Serialize the response to a JSON RDF standardized table, or an RDF Graph, depending on the HTTP response
Store the serialized response in a Python object.
But still fails to implement these other aspects:
Provide an endpoint with the serialized response to a frontend application (such as React).
Handle incoming requests from the frontend application, "translate" and execute them as a SPARQL query.
Send back the response to the frontend application's request.
And last, but not least, it completely fails to implement the real-time aspect of this challenge.
The Questions:
How would you implement this?
Is this really the best approach?
Can the already working code be optimized?
Is there something that already does this?
Thank you so much!
Sorry, but I don't know much about Django, so I can't answer here with Django specifics.
However, I can say this: SPARQL has a specification for HTTP interactions (https://www.w3.org/TR/sparql11-protocol/), and it tells you to use sparql?query=... and sparql?update=... style URIs for querying a store, so why define a new way of doing things with store.query, store.graph_query, etc.?
Is there a Django-specific reason?
You can already pose questions to a SPARQL Endpoint using React or whatever you want right now, just as it is.
You said what is missing is to "Provide an endpoint with the serialized response", but the SPARQL responses are this! SPARQL query response formats are defined in the spec (e.g. JSON: https://www.w3.org/TR/sparql11-results-json/), and SPARQLWrapper knows how to parse them into Python objects. Libraries in other languages, like rdflib.js in JavaScript, do too.
See YASGUI (https://triply.cc/docs/yasgui) for a stand-alone JS SPARQL client.
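For illustration, a minimal sketch of querying a SPARQL endpoint directly over HTTP, as the protocol describes (DBpedia is used here only as a public example endpoint):

import requests

endpoint = 'https://dbpedia.org/sparql'  # any SPARQL 1.1 endpoint works
query = 'SELECT ?s WHERE { ?s ?p ?o } LIMIT 1'
# Ask for the standard JSON results format defined by the spec
r = requests.get(endpoint, params={'query': query},
                 headers={'Accept': 'application/sparql-results+json'})
print(r.json()['results']['bindings'])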
Data is being automatically sent via a POST request, in the form of a JSON string, to the address I'm developing on.
How should I properly fetch this data in my function-based view so that I can manipulate it?
data = requests.get(url).json() did not work for me; it gives me back this error:
django.urls.exceptions.Resolver404: {'tried': [[<URLResolver <URLResolver list> (None:None) 'en/'>]], 'path': ''}
At least, how can I test whether any data is being sent to the URL?
I'm developing in a production environment.
Depending on how the data is posted, you will be able to find the data with
request.POST.get('url')
or, if it is sent as a raw JSON body instead, it will need to be decoded manually from
request.body
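For example, a minimal sketch of a function-based view covering both cases (the view name is an assumption):

import json

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt  # an external sender usually cannot supply a CSRF token
def receive_data(request):  # hypothetical view name
    if request.content_type == 'application/json':
        data = json.loads(request.body)  # raw JSON body, decoded manually
    else:
        data = request.POST.dict()  # traditional form-encoded POST
    return JsonResponse({'received': data})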
I am trying to create a chatbot. I want to create a variable (list/dictionary) to save my GET request data, then access the value of the variable and manipulate it. After some logical operations, I will use a new variable to save the result and print that result on the same page.
On the server side, you would want to have the logic of the chatbot; for example, when the user types 'hi', the chatbot replies 'yes?'. The chatbot will be exposed through an API. Let the response of the API be in JSON format. You may use Python's requests library to get the response from the API as a dictionary by writing:
receivedFromChatbox = response.json()
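For instance, a minimal sketch, assuming a hypothetical chatbot endpoint running locally (the URL and the 'message' parameter are assumptions):

import requests

response = requests.get('http://localhost:5000/chatbot', params={'message': 'hi'})
receivedFromChatbox = response.json()  # e.g. {'reply': 'yes?'}
print(receivedFromChatbox['reply'])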
I am writing a web service using Django that will be consumed from a MS SharePoint workflow. In the SP workflow, I created a dictionary with 2 items (id:1, text:'foo'), and used this dictionary as the request content. However, instead of using the dictionary to format a traditional POST parameter list, it sends it as a JSON object in the body of the POST request, so instead of the expected:
id=1&text=foo
in the body of the request, there is this:
{"id":1,"text":"foo"}
which of course, in turn, does not get parsed correctly by Python/Django (I am not sure who exactly does the parsing). How can I either get it to parse JSON, or get SharePoint to send traditionally encoded POST parameters?
EDIT
I saw other posts that explain how to get the raw body and parse the JSON. I was looking for a solution that would either:
Make SharePoint send normal data, or
Get Django to respect the Content-type header that states the data is JSON
There is no need for any parsing at the framework level. The body of the POST request is always available in request.body, so you can access it directly:
result = json.loads(request.body)
Maybe this will help you handle it a bit more easily:
import json
from urllib.parse import parse_qs  # this was the 'urlparse' module in Python 2

json.dumps(parse_qs("id=1&text=foo"))
# '{"id": ["1"], "text": ["foo"]}'
I am having a bit of trouble understanding API calls and the URLs I'm supposed to use for grabbing data from Imgur. I'm using the following URL to grab JSON data, but I'm receiving old data: http://imgur.com/r/wallpapers/top/day.json
But if I strip the .json from the end of the URL, I see the top pictures from today.
All I want is the JSON data from the top posts of today from Imgur, but I keep getting data that refers to Dec 18th, 2014.
I'm using the call in a Python script. I have a token from Imgur for authentication, and reading the API documentation, I see a lot of the examples start with https://api. instead of http://imgur.
Which one should I use?
It's probably due to caching; you can set Cache-Control to no-cache in your headers and send it along with your requests.
Sample (I'm using requests):
import requests
r = requests.get('http://imgur.com/r/wallpapers/top/day.json',
                 headers={'Cache-Control': 'no-cache'})
# ... your stuff here ...
Imgur updated their docs, so the new and correct form of the URL I used was:
r = requests.get("https://api.imgur.com/3/gallery/r/earthporn/top/")