SPARQL - Unknown namespace prefix error - python

I have a python file with imported rdflib and some SPARQL query implemented
from rdflib import Graph
import html5lib
if __name__ == '__main__':
g = Graph()
g.parse('http://localhost:8085/weather-2.html', format='rdfa')
res1 = g.parse('http://localhost:8085/weather-2.html', format='rdfa')
print(res1.serialize(format='pretty-xml').decode("utf-8"))
print()
res2 = g.query("""SELECT ?obj
WHERE { <http://localhost:8085/weather-2.html> weather:region ?obj . }
""")
for row in res2:
print(row)
res1 has no trouble to print out but for res2 I get an error saying:
Exception: Unknown namespace prefix : weather
Apparently this is due to an error on line 15 according to pycharm, the editor I am using to implement this.
What am I missing that is causing this error?
Is there more to just calling weather:region in my SPARQL query?
If so how to do fix this problem?

As the error message suggests, the namespace weather: is not defined - so in the SPARQL you either need a PREFIX to define weather, like:
PREFIX weather: <weatheruri>
Or you should put the explicit weather URI instead of the weather:
The weather namespace URI (or is it called an IRI?) will be in the XML namespaces in the rdf document - it will end with / or # so if the URI is http://weather.com/ the prefix definition is PREFIX weather: <http://weather.com/>

Related

Expanding a Scribunto module that doesn't have a function

I want to get the return value of this Wikimedia Scribunto module in Python. Its source code is roughly like this:
local Languages = {}
Languages = {
["aa"] = {
name = "afarština",
dir = "ltr",
name_attr_gen_pl = "afarských"
},
-- More languages...
["zza"] = {
name = "zazaki",
dir = "ltr"
}
}
return Languages
In the Wiktextract library, there is already Python code to accomplish similar tasks:
def expand_template(sub_domain: str, text: str) -> str:
import requests
# https://www.mediawiki.org/wiki/API:Expandtemplates
params = {
"action": "expandtemplates",
"format": "json",
"text": text,
"prop": "wikitext",
"formatversion": "2",
}
r = requests.get(f"https://{sub_domain}.wiktionary.org/w/api.php",
params=params)
data = r.json()
return data["expandtemplates"]["wikitext"]
This works for languages like French because there the Scribunto module has a well-defined function that returns a value, as an example here:
Scribunto module:
p = {}
function p.affiche_langues_python(frame)
-- returns the needed stuff here
end
The associated Python function:
def get_fr_languages():
# https://fr.wiktionary.org/wiki/Module:langues/analyse
json_text = expand_template(
"fr", "{{#invoke:langues/analyse|affiche_langues_python}}"
)
json_text = json_text[json_text.index("{") : json_text.index("}") + 1]
json_text = json_text.replace(",\r\n}", "}") # remove tailing comma
data = json.loads(json_text)
lang_data = {}
for lang_code, lang_name in data.items():
lang_data[lang_code] = [lang_name[0].upper() + lang_name[1:]]
save_json_file(lang_data, "fr")
But in our case we don't have a function to call.
So if we try:
def get_cs_languages():
# https://cs.wiktionary.org/wiki/Modul:Languages
json_text = expand_template(
"cs", "{{#invoke:Languages}}"
)
print(json_text)
we get <strong class="error"><span class="scribunto-error" id="mw-scribunto-error-0">Chyba skriptu: Musíte uvést funkci, která se má zavolat.</span></strong> usage: get_languages.py [-h] sub_domain lang_code get_languages.py: error: the following arguments are required: sub_domain, lang_code. (Translated as "You have to specify a function you want to call. But when you enter a function name as a parameter like in the French example, it complains that that function does not exist.)
What could be a way to solve this?
The easiest and most general way is to get the return value of the module as JSON and parse it in Python.
Make another module that exports a function dump_as_json that takes the name of the first module as a frame argument and returns the first module as JSON. In Python, expand {{#invoke:json module|dump_as_json|Module:module to dump}} using the expandtemplates API and parse the return value of the module invocation as JSON with json.loads(data["expandtemplates"]["wikitext"]).
Text of Module:json module (call it what you want):
return {
dump_as_json = function(frame)
local module_name = frame.args[0]
local json_encode = mw.text.jsonEncode
-- json_encode = require "Module:JSON".toJSON
return json_encode(require(module_name))
end
}
With pywikibot:
from pywikibot import Site
site = Site(code="cs", fam="wiktionary")
languages = json.loads(site.expand_text("{{#invoke:json module|dump_as_json|Module:module to dump}}")
If you get the error Lua error: Cannot pass circular reference to PHP, this means that at least one of the tables in Module:module to dump is referenced by another table more than once, like if the module was
local t = {}
return { t, t }
To handle these tables, you will have to get a pure-Lua JSON encoder function to replace mw.text.jsonEncode, like the toJSON function from Module:JSON on English Wiktionary.
One warning about this method that is not relevant for the module you are trying to get: string values in the JSON will only be accurate if they were NFC-normalized valid UTF-8 with no special ASCII control codes (U+0000-U+001F excluding tab U+0009 and LF U+000A) when they were returned from Module:module to dump. As on a wiki page, the expandtemplates API will replace ASCII control codes and invalid UTF-8 with the U+FFFD character, and will NFC-normalize everything else. That is, "\1\128e" .. mw.ustring.char(0x0301) would be modified to the equivalent of mw.ustring.char(0xFFFD, 0xFFFD, 0x00E9). This doesn't matter in most cases (like if the table contains readable text), but if it did matter, the JSON-encoding module would have to output JSON escapes for non-NFC character sequences and ASCII control codes and find some way to encode invalid UTF-8.
If, like the module you are dumping, Module:module to dump is a pure table of literal values with no references to other modules or to Scribunto-only global values, you could also get its raw wikitext with the Revisions API and parse it in Lua on your machine and pass it to Python. I think there is a Python extension that allows you to directly use a Lua state in Python.
Running a module with dependencies on the local machine is not possible unless you go to the trouble of setting up the full Scribunto environment on your machine, and figuring out a way to download the module dependencies and make them available to the Lua state. I have sort of done this myself, but it isn't necessary for your use case.

Using prefixes to display entities from an rdflib graph

I'm using rdflib to load an RDF graph into a Python scrpit
I would like to print a list of subjects using the defined prefixes
I doesn't find any method to apply the prefixes.
My code
import rdflib
filepath = "... my file path ..."
gs = rdflib.Graph()
gs.bind('qs', "http://qs.org/")
gs.bind('foaf',"http://xmlns.com/foaf/0.1/")
gs.parse(filepath,format="nt")
mdstr = ""
for subject in gs.subjects():
mdstr += str(subject) +"\n"
print(mdstr)
I get, for example
http://qs.org/s12095
in place of
qs:s12095
Relevant rdflib documentation. The prefixes in a graph are stored in its NamespaceManager object. To get rdflib to print the prefixes instead of the complete IRI, you can call its method .n3(graph.namespace_manager). So in your case, you can do:
from rdflib import Namespace
from rdflib.namespace import NamespaceManager
# bind namespace to the graph or its namespace manager
graph.bind('qs', Namespace('http://qs.org/'))
# assuming entity is IRI http://qs.org/s12095
print(entity) # --> http://qs.org/s12095
print(entity.n3(graph.namespace_manager)) # --> qs:s12095

Error in python code with SPARQL query

I'm writing a python code to retrieve all actors which is common to both DBpedia and Wikidata. And also getting some additional information like awards received from wikidata. But its throwing an error.
I'm not sure how to correct this error. Here is my python code:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("https://query.wikidata.org/")
sparql.setQuery("""
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?Actor ?award_received
WHERE {
SERVICE <http://dbpedia.org/sparql> {
?c rdf:type <http://umbel.org/umbel/rc/Actor> .
?c rdfs:label ?Actor.
FILTER (LANG(?Actor)="en").
?c owl:sameAs ?wikidata_actor .
FILTER (STRSTARTS(STR(?wikidata_actor), "http://www.wikidata.org"))}
?wikidata_actor wdt:P166 ?award_received.
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
if ("Actor" in result):
print(result["Actor"]["value"])
else:
url = 'NONE'
if ("award_received" in result):
print(result["award_received"]["value"])
else:
url = 'NONE'
Here is the error I'm getting:
/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7
"/Users/ashwinis/PycharmProjects/semantic web/club.py"
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-
packages/SPARQLWrapper/Wrapper.py:762: RuntimeWarning: unknown response
content type, returning raw response...
warnings.warn("unknown response content type, returning raw
response...", RuntimeWarning)
Traceback (most recent call last):
File "/Users/ashwinis/PycharmProjects/semantic web/club.py", line 27,
in <module>
for result in results["results"]["bindings"]:
TypeError: string indices must be integers, not str
Process finished with exit code 1
Wikidata SPARQL endpoint address is
http://www.wikidata.org/sparql, not http://www.wikidata.org.
Do not remove this optimization hint: hint:Query hint:optimizer "None". See documentation on how Blazegraph optimizer hints work.
Do not forget to define prefixes (including hint:).
There are minor indentation problems in your Python code.
Do not duplicate questions.

How to deal with globals in modules?

I try to make a non blocking api calls for OpenWeatherMap, but my problem is:
When i was doing tests on the file, and run it, the global api was taking effect, but when importing the function, global dont work anymore, and api dident change: api = ""?
Just after declaring the function i put global api, and then when I use print 'The API link is: ' + api I get the exact api, but global dident took effect!
Here is the code: https://github.com/abdelouahabb/tornadowm/blob/master/tornadowm.py#L62
What am I doing wrong?
When I import the file:
from tornadowm import *
forecast('daily', q='london', lang='fr')
The API link is: http://api.openweathermap.org/data/2.5/forecast/daily?lang=fr&q=london
api
Out[5]: ''
When executing the file instead of importing it:
runfile('C:/Python27/Lib/site-packages/tornadowm.py', wdir='C:/Python27/Lib/site-packages')
forecast('daily', q='london', lang='fr')
The API link is: http://api.openweathermap.org/data/2.5/forecast/daily?lang=fr&q=london
api
Out[8]: 'http://api.openweathermap.org/data/2.5/forecast/daily?lang=fr&q=london'
Edit: here is the code, if the Git got updated:
from tornado.httpclient import AsyncHTTPClient
import json
import xml.etree.ElementTree as ET
http_client = AsyncHTTPClient()
url = ''
response = ''
args = []
link = 'http://api.openweathermap.org/data/2.5/'
api = ''
result = {}
way = ''
def forecast(way, **kwargs):
global api
if way in ('weather', 'forecast', 'daily', 'find'):
if way == 'daily':
way = 'forecast/daily?'
else:
way += '?'
for i, j in kwargs.iteritems():
args.append('&{0}={1}'.format(i, j))
a = ''.join(set(args))
api = (link + way + a.replace(' ', '+')).replace('?&', '?')
print 'The API link is: ' + api
def handle_request(resp):
global response
if resp.error:
print "Error:", resp.error
else:
response = resp.body
http_client.fetch(api, handle_request)
else:
print "please put a way: 'weather', 'forecast', 'daily', 'find' "
def get_result():
global result
if response.startswith('{'):
print 'the result is JSON, stored in the variable result'
result = json.loads(response)
elif response.startswith('<'):
print 'the result is XML, parse the result variable to work on the nodes,'
print 'or, use response to see the raw result'
result = ET.fromstring(response)
else:
print '''Sorry, no valid response, or you used a parameter that is not compatible with the way!\n please check http://www.openweathermap.com/api for more informations''
It's the side effect of using global.
When you do from tornadowm import * your forecast() function is, we could say metaphorically, "on its own" and is not "hard-linked" to your global space anymore.
Why? Because any effect you make on your global api will "end" with your function, and the definition of api = "" in your global space will take precedence.
Also, as a side note, it's not considered a good practice to use from something import *. You should do from tornadowm import forecast or even better, import tornadown and then use tornadowm.forecast().
OR
Even better, I just noticed your forecast() function doesn't return anything. Which technically makes it not a function anymore, but a procedure (a procedure is like a function but it returns nothing, it just "does" stuff).
Instead of using a global, you should define api in this function and then return api from it. Like this:
def forecast(blablabla):
api = "something"
blablabla
return api
And then
import tornadowm
api = tornadown.forecast(something)
And you're done.
Globals are global only to the module they're defined in. So, normally, you would expect tornadowm.api to be changed when you call forecast, but not api in some other namespace.
The import * is contributing to your understanding of the problem. This imports api (among other names) into the importing namespace. This means that api and tornadowm.api initially point to the same object. But these two names are not linked in any way, and so calling forecast() changes only tornadowm.api and now the two names point to different objects.
To avoid this, don't use import *. It is bad practice anyway and this is just one of the reasons. Instead, import tornadowm and access the variable in the importing module as tornadowm.api.
I'm afraid this is because global is coupled within module, by the time you from tornadowm import * you have imported the api name, but the global api won't take any effects within another module.

Python Sparql Querying Local File

I have the following Python code. It basically returns some elements of RDF from an online resource using SPARQL.
I want to query and return something from one of my local files. I tried to edit it but couldn't return anything.
What should I change in order to query within my local instead of http://dbpedia.org/resource?
from SPARQLWrapper import SPARQLWrapper, JSON
# wrap the dbpedia SPARQL end-point
endpoint = SPARQLWrapper("http://dbpedia.org/sparql")
# set the query string
endpoint.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpr: <http://dbpedia.org/resource/>
SELECT ?label
WHERE { dbpr:Asturias rdfs:label ?label }
""")
# select the retur format (e.g. XML, JSON etc...)
endpoint.setReturnFormat(JSON)
# execute the query and convert into Python objects
# Note: The JSON returned by the SPARQL endpoint is converted to nested Python dictionaries, so additional parsing is not required.
results = endpoint.query().convert()
# interpret the results:
for res in results["results"]["bindings"] :
print res['label']['value']
Thanks!
SPARQLWrapper is meant to be used only with remote or local SPARQL endpoints. You have two options:
(a) Put your local RDF file in a local triple store and point your code to localhost. (b) Or use rdflib and use the InMemory storage:
import rdflib.graph as g
graph = g.Graph()
graph.parse('filename.rdf', format='rdf')
print graph.serialize(format='pretty-xml')
You can query the rdflib.graph.Graph() with:
filename = "path/to/fileneme" #replace with something interesting
uri = "uri_of_interest" #replace with something interesting
import rdflib
import rdfextras
rdfextras.registerplugins() # so we can Graph.query()
g=rdflib.Graph()
g.parse(filename)
results = g.query("""
SELECT ?p ?o
WHERE {
<%s> ?p ?o.
}
ORDER BY (?p)
""" % uri) #get every predicate and object about the uri

Categories

Resources