Python Sparql Querying Local File

Python Sparql Querying Local File - python

I have the following Python code. It basically returns some elements of RDF from an online resource using SPARQL.
I want to query and return something from one of my local files. I tried to edit it but couldn't return anything.
What should I change in order to query within my local instead of http://dbpedia.org/resource?
from SPARQLWrapper import SPARQLWrapper, JSON
# wrap the dbpedia SPARQL end-point
endpoint = SPARQLWrapper("http://dbpedia.org/sparql")
# set the query string
endpoint.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpr: <http://dbpedia.org/resource/>
SELECT ?label
WHERE { dbpr:Asturias rdfs:label ?label }
""")
# select the retur format (e.g. XML, JSON etc...)
endpoint.setReturnFormat(JSON)
# execute the query and convert into Python objects
# Note: The JSON returned by the SPARQL endpoint is converted to nested Python dictionaries, so additional parsing is not required.
results = endpoint.query().convert()
# interpret the results:
for res in results["results"]["bindings"] :
print res['label']['value']
Thanks!

SPARQLWrapper is meant to be used only with remote or local SPARQL endpoints. You have two options:
(a) Put your local RDF file in a local triple store and point your code to localhost. (b) Or use rdflib and use the InMemory storage:
import rdflib.graph as g
graph = g.Graph()
graph.parse('filename.rdf', format='rdf')
print graph.serialize(format='pretty-xml')

You can query the rdflib.graph.Graph() with:
filename = "path/to/fileneme" #replace with something interesting
uri = "uri_of_interest" #replace with something interesting
import rdflib
import rdfextras
rdfextras.registerplugins() # so we can Graph.query()
g=rdflib.Graph()
g.parse(filename)
results = g.query("""
SELECT ?p ?o
WHERE {
<%s> ?p ?o.
}
ORDER BY (?p)
""" % uri) #get every predicate and object about the uri

Related

Expanding a Scribunto module that doesn't have a function

I want to get the return value of this Wikimedia Scribunto module in Python. Its source code is roughly like this:
local Languages = {}
Languages = {
["aa"] = {
name = "afarština",
dir = "ltr",
name_attr_gen_pl = "afarských"
},
-- More languages...
["zza"] = {
name = "zazaki",
dir = "ltr"
}
}
return Languages
In the Wiktextract library, there is already Python code to accomplish similar tasks:
def expand_template(sub_domain: str, text: str) -> str:
import requests
# https://www.mediawiki.org/wiki/API:Expandtemplates
params = {
"action": "expandtemplates",
"format": "json",
"text": text,
"prop": "wikitext",
"formatversion": "2",
}
r = requests.get(f"https://{sub_domain}.wiktionary.org/w/api.php",
params=params)
data = r.json()
return data["expandtemplates"]["wikitext"]
This works for languages like French because there the Scribunto module has a well-defined function that returns a value, as an example here:
Scribunto module:
p = {}
function p.affiche_langues_python(frame)
-- returns the needed stuff here
end
The associated Python function:
def get_fr_languages():
# https://fr.wiktionary.org/wiki/Module:langues/analyse
json_text = expand_template(
"fr", "{{#invoke:langues/analyse|affiche_langues_python}}"
)
json_text = json_text[json_text.index("{") : json_text.index("}") + 1]
json_text = json_text.replace(",\r\n}", "}") # remove tailing comma
data = json.loads(json_text)
lang_data = {}
for lang_code, lang_name in data.items():
lang_data[lang_code] = [lang_name[0].upper() + lang_name[1:]]
save_json_file(lang_data, "fr")
But in our case we don't have a function to call.
So if we try:
def get_cs_languages():
# https://cs.wiktionary.org/wiki/Modul:Languages
json_text = expand_template(
"cs", "{{#invoke:Languages}}"
)
print(json_text)
we get <strong class="error"><span class="scribunto-error" id="mw-scribunto-error-0">Chyba skriptu: Musíte uvést funkci, která se má zavolat.</span></strong> usage: get_languages.py [-h] sub_domain lang_code get_languages.py: error: the following arguments are required: sub_domain, lang_code. (Translated as "You have to specify a function you want to call. But when you enter a function name as a parameter like in the French example, it complains that that function does not exist.)
What could be a way to solve this?

The easiest and most general way is to get the return value of the module as JSON and parse it in Python.
Make another module that exports a function dump_as_json that takes the name of the first module as a frame argument and returns the first module as JSON. In Python, expand {{#invoke:json module|dump_as_json|Module:module to dump}} using the expandtemplates API and parse the return value of the module invocation as JSON with json.loads(data["expandtemplates"]["wikitext"]).
Text of Module:json module (call it what you want):
return {
dump_as_json = function(frame)
local module_name = frame.args[0]
local json_encode = mw.text.jsonEncode
-- json_encode = require "Module:JSON".toJSON
return json_encode(require(module_name))
end
}
With pywikibot:
from pywikibot import Site
site = Site(code="cs", fam="wiktionary")
languages = json.loads(site.expand_text("{{#invoke:json module|dump_as_json|Module:module to dump}}")
If you get the error Lua error: Cannot pass circular reference to PHP, this means that at least one of the tables in Module:module to dump is referenced by another table more than once, like if the module was
local t = {}
return { t, t }
To handle these tables, you will have to get a pure-Lua JSON encoder function to replace mw.text.jsonEncode, like the toJSON function from Module:JSON on English Wiktionary.
One warning about this method that is not relevant for the module you are trying to get: string values in the JSON will only be accurate if they were NFC-normalized valid UTF-8 with no special ASCII control codes (U+0000-U+001F excluding tab U+0009 and LF U+000A) when they were returned from Module:module to dump. As on a wiki page, the expandtemplates API will replace ASCII control codes and invalid UTF-8 with the U+FFFD character, and will NFC-normalize everything else. That is, "\1\128e" .. mw.ustring.char(0x0301) would be modified to the equivalent of mw.ustring.char(0xFFFD, 0xFFFD, 0x00E9). This doesn't matter in most cases (like if the table contains readable text), but if it did matter, the JSON-encoding module would have to output JSON escapes for non-NFC character sequences and ASCII control codes and find some way to encode invalid UTF-8.
If, like the module you are dumping, Module:module to dump is a pure table of literal values with no references to other modules or to Scribunto-only global values, you could also get its raw wikitext with the Revisions API and parse it in Lua on your machine and pass it to Python. I think there is a Python extension that allows you to directly use a Lua state in Python.
Running a module with dependencies on the local machine is not possible unless you go to the trouble of setting up the full Scribunto environment on your machine, and figuring out a way to download the module dependencies and make them available to the Lua state. I have sort of done this myself, but it isn't necessary for your use case.

Using prefixes to display entities from an rdflib graph

I'm using rdflib to load an RDF graph into a Python scrpit
I would like to print a list of subjects using the defined prefixes
I doesn't find any method to apply the prefixes.
My code
import rdflib
filepath = "... my file path ..."
gs = rdflib.Graph()
gs.bind('qs', "http://qs.org/")
gs.bind('foaf',"http://xmlns.com/foaf/0.1/")
gs.parse(filepath,format="nt")
mdstr = ""
for subject in gs.subjects():
mdstr += str(subject) +"\n"
print(mdstr)
I get, for example
http://qs.org/s12095
in place of
qs:s12095

Relevant rdflib documentation. The prefixes in a graph are stored in its NamespaceManager object. To get rdflib to print the prefixes instead of the complete IRI, you can call its method .n3(graph.namespace_manager). So in your case, you can do:
from rdflib import Namespace
from rdflib.namespace import NamespaceManager
# bind namespace to the graph or its namespace manager
graph.bind('qs', Namespace('http://qs.org/'))
# assuming entity is IRI http://qs.org/s12095
print(entity) # --> http://qs.org/s12095
print(entity.n3(graph.namespace_manager)) # --> qs:s12095

SPARQL - Unknown namespace prefix error

I have a python file with imported rdflib and some SPARQL query implemented
from rdflib import Graph
import html5lib
if __name__ == '__main__':
g = Graph()
g.parse('http://localhost:8085/weather-2.html', format='rdfa')
res1 = g.parse('http://localhost:8085/weather-2.html', format='rdfa')
print(res1.serialize(format='pretty-xml').decode("utf-8"))
print()
res2 = g.query("""SELECT ?obj
WHERE { <http://localhost:8085/weather-2.html> weather:region ?obj . }
""")
for row in res2:
print(row)
res1 has no trouble to print out but for res2 I get an error saying:
Exception: Unknown namespace prefix : weather
Apparently this is due to an error on line 15 according to pycharm, the editor I am using to implement this.
What am I missing that is causing this error?
Is there more to just calling weather:region in my SPARQL query?
If so how to do fix this problem?

As the error message suggests, the namespace weather: is not defined - so in the SPARQL you either need a PREFIX to define weather, like:
PREFIX weather: <weatheruri>
Or you should put the explicit weather URI instead of the weather:
The weather namespace URI (or is it called an IRI?) will be in the XML namespaces in the rdf document - it will end with / or # so if the URI is http://weather.com/ the prefix definition is PREFIX weather: <http://weather.com/>

How do i use SimpleQueryString function of elasticsearch?

I am trying to write a django app and use elasticsearch in it with elasticsearch-dsl library of python. I don't want to create all switch-case statements and then pass search queries and filters accordingly.
I want a function that does the parsing stuff by itself.
For e.g. If i pass "some text url:github.com tags:es,es-dsl,django",
the function should output corresponding query.
I searched for it in elasticsearch-dsl documentation and found a function that does the parsing.
https://github.com/elastic/elasticsearch-dsl-py/search?utf8=%E2%9C%93&q=simplequerystring&type=
However, I dont know how to use it.
I tried s = Search(using=client).query.SimpleQueryString("1st|ldnkjsdb"), but it is showing me parsing error.
Can anyone help me out?

You can just plug the SimpleQueryString in the Search object, instead of a dictionary send the elements as parameters of the object.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import SimpleQueryString
client = Elasticsearch()
_search = Search(using=client, index='INDEX_NAME')
_search = _search.filter( SimpleQueryString(
query = "this + (that | thus) -those",
fields= ["field_to_search"],
default_operator= "and"
))
A lot of elasticsearch_dsl simply change the dictionary representation to classes of functions that makes the code look pythonic, and avoid the use of hard-to-read elasticsearch JSONs.

Im guessing you are asking about the usage of elasticsearch-dsl with query string like you are making a request with json data to the elasticsearch api. If that's the case, this is how you are going to use elasticsearch-dsl:
assume you have the query in query variable like this:
{
"query": {
"query_string" : {
"default_field" : "content",
"query" : "this AND that OR thus"
}
}
}
and now do this:
es = Elasticsearch(
host=settings.ELASTICSEARCH_HOST_IP, # Put your ES host IP
port=settings.ELASTICSEARCH_HOST_PORT, # Put yor ES host port
)
index = settings.MY_INDEX # Put your index name here
result = es.search(index=index, body=query)

How to query dbpedia resource ontology 'wikiPageExternalLink'

Using sparql\sparqlwrapper in python, how will I be able to query for the values of a certain dbpedia resource? For example, how will I be able to get the dbpedia-owl:wikiPageExternalLink values of http://dbpedia.org/page/Asturias?
Here's a simple example on how will I be able to query for the rdfs:label of Asturias. But I don't know how to modify the query/query parameters to get values of property/ontology other than those included on rdfs schema. Here's the sample:
from SPARQLWrapper import SPARQLWrapper, JSON, XML, N3, RDF
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE { <http://dbpedia.org/resource/Asturias> rdfs:label ?label }
""")
print '\n\n*** JSON Example'
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
print result["label"]["value"]
Hoping to receive feedback. Thanks in advance!

Not sure where you're stuck—this is really easy:
SELECT ?label
WHERE { <http://dbpedia.org/resource/Asturias>
dbpedia-owl:wikiPageExternalLink ?label }
Usually you need to declare the namespace prefixes like rdfs: or dbpedia-owl: if you want to use them in the query, but on the DBpedia endpoint this works even without. If you want, you can declare them anyways:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?label
WHERE { <http://dbpedia.org/resource/Asturias>
dbpedia-owl:wikiPageExternalLink ?label }
You can find out the full URI corresponding to the prefix by going to http://dbpedia.org/sparql and clicking on “Namespace Prefixes” near the top right corner.
If you want to rename the variable (for example, from ?label to ?link) then do it like this:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?link
WHERE { <http://dbpedia.org/resource/Asturias>
dbpedia-owl:wikiPageExternalLink ?link }
and you also have to change "label" to "link" in the Python code that gets the value out of the JSON result.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Sparql Querying Local File - python

Related

Expanding a Scribunto module that doesn't have a function

Using prefixes to display entities from an rdflib graph

SPARQL - Unknown namespace prefix error

How do i use SimpleQueryString function of elasticsearch?

How to query dbpedia resource ontology 'wikiPageExternalLink'

Categories

Resources