Sparql query JSON error from BNCF endpoint - python

I'm trying to retrieve results from the BNCF at this endpoint.
My query (with "ab" as example) is:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?source ?label ?content
WHERE {
?source a skos:Concept;
skos:prefLabel ?label;
skos:scopeNote ?content.
FILTER regex(str(?label), "ab", "i")
}
The query is correct: if you run it directly, it works.
But when I try to get the results from my Python code, I get this error:
SyntaxError: JSON Parse error: Unexpected EOF
This is my python code:
__3store = "http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query"
sparql = SPARQLUpdateStore(queryEndpoint=__3store)
sparql.setReturnFormat(JSON)
results = sparql.query(query_rdf).convert()
print json.dumps(results, separators=(',',':'))
I tried the code above following this answer; before that, my code was like this:
__3store = "http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query"
sparql = SPARQLWrapper(__3store,returnFormat="json")
sparql.setQuery(query_rdf)
result = sparql.query().convert()
print json.dumps(result, separators=(',',':'))
but both throw the same error.
Does anyone know how to fix it?
Thanks
EDIT:
This is the full Python code; I hope it is enough to understand the problem:
import sys
sys.path.append ('cgi/lib')
import rdflib
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore, SPARQLStore
import json
from SPARQLWrapper import SPARQLWrapper, JSON
#MAIN
print "Content-type: application/json"
print
prefix_SKOS = "prefix skos: <http://www.w3.org/2004/02/skos/core#>"
crlf = "\n"
query_rdf = ""
query_rdf += prefix_SKOS + crlf
query_rdf += '''
SELECT DISTINCT ?source ?title ?content
WHERE {
?source a skos:Concept;
skos:prefLabel ?title;
skos:scopeNote ?content.
FILTER regex(str(?title), "ab", "i")
}
'''
__3store = "http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query"
sparql = SPARQLWrapper(__3store,returnFormat="json")
sparql.setQuery(query_rdf)
result = sparql.query().convert()
print result
Running this in the Python shell returns:
Content-type: application/json
Warning (from warnings module):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SPARQLWrapper-1.6.4-py2.7.egg/SPARQLWrapper/Wrapper.py", line 689
RuntimeWarning: Format requested was JSON, but XML (application/sparql-results+xml;charset=UTF-8) has been returned by the endpoint
<xml.dom.minidom.Document instance at 0x105add710>
So I think the result is always XML, even though I specified JSON as the return format.

There are a couple of problems playing together here:
First, you should only use SPARQLUpdateStore from rdflib if you want to access a SPARQL store via rdflib's Graph interface (e.g., you can add triples, you can iterate over them, etc.).
If you want to write a SPARQL query yourself you should use SPARQLWrapper.
Second, if you ask SPARQLWrapper to return JSON, what it does is actually ask the server for a couple of mime types that are most common and standardized for what we just call "json", as shown here and here:
_SPARQL_JSON = ["application/sparql-results+json", "text/javascript", "application/json"]
It seems as if your server does understand application/sparql-results+json, but not a combined "give me any of these MIME types" Accept header, which SPARQLWrapper compiles for maximum interoperability (so your server essentially doesn't fully support HTTP Accept headers):
curl -i -G -H 'Accept: application/sparql-results+json' --data-urlencode 'query=PREFIX skos:
<http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?source ?label ?content
WHERE {
?source a skos:Concept;
skos:prefLabel ?label;
skos:scopeNote ?content.
FILTER regex(str(?label), "ab", "i")
}' http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query
will return:
HTTP/1.1 200 OK
Date: Mon, 18 May 2015 13:13:45 GMT
Server: Apache/2.2.17 (Unix) PHP/5.3.6 mod_jk/1.2.31
...
Content-Type: application/sparql-results+json;charset=UTF-8
{
"head" : {
"vars" : [ ],
"vars" : [ "source", "label", "content" ],
"link" : [ "info" ]
},
"results" : {
"bindings" : [ {
"content" : {
"type" : "literal",
"value" : "Il lasciare ingiustificatamente qualcuno o qualcosa di cui si è responsabili"
},
"source" : {
"type" : "uri",
"value" : "http://purl.org/bncf/tid/12445"
},
"label" : {
"xml:lang" : "it",
"type" : "literal",
"value" : "Abbandono"
}
},
...
So everything is OK, but if we ask for the combined, more interoperable MIME types:
curl -i -G -H 'Accept: application/sparql-results+json,text/javascript,application/json' --data-urlencode 'query=PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?source ?label ?content
WHERE {
?source a skos:Concept;
skos:prefLabel ?label;
skos:scopeNote ?content.
FILTER regex(str(?label), "ab", "i")
}' http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query
we get an xml result:
HTTP/1.1 200 OK
Server: Apache/2.2.17 (Unix) PHP/5.3.6 mod_jk/1.2.31
...
Content-Type: application/sparql-results+xml;charset=UTF-8
<?xml version='1.0' encoding='UTF-8'?>
...
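For comparison, here is roughly what honouring the combined header involves. This is a simplified sketch, not the server's actual code: parse_accept and choose_format are hypothetical helpers that ignore wildcards such as */* and, as a design choice of this sketch, break q-value ties in the order the client listed the types. Given the header SPARQLWrapper sends and a server able to produce both SPARQL XML and SPARQL JSON, it picks JSON.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, most preferred first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if media_type:
            entries.append((media_type, q, position))
    # Higher q wins; equal q values keep the order the client listed them in.
    entries.sort(key=lambda entry: (-entry[1], entry[2]))
    return [(media_type, q) for media_type, q, _ in entries]


def choose_format(accept_header, supported):
    """Return the first supported media type the client accepts with q > 0, or None."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in supported:
            return media_type
    return None


combined = "application/sparql-results+json,text/javascript,application/json"
server_formats = ["application/sparql-results+xml", "application/sparql-results+json"]
print(choose_format(combined, server_formats))  # application/sparql-results+json
```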
So, long story short: it's a bug in the server you're using. The following is a nasty workaround (it seems SPARQLWrapper doesn't let us set the headers manually, because it unconditionally overrides them in _createRequest), but it works:
In [1]: import SPARQLWrapper as sw
In [2]: sparql = sw.SPARQLWrapper("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query")
In [3]: sparql.setReturnFormat(sw.JSON)
In [4]: sparql.setQuery(''' PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?source ?label ?content
WHERE {
?source a skos:Concept;
skos:prefLabel ?label;
skos:scopeNote ?content.
FILTER regex(str(?label), "ab", "i")
}
''')
In [5]: request = sparql._createRequest()
In [6]: request.add_header('Accept', 'application/sparql-results+json')
In [7]: from urllib2 import urlopen
In [8]: response = urlopen(request)
In [9]: res = sw.Wrapper.QueryResult((response, sparql.returnFormat))
In [10]: result = res.convert()
In [11]: result
Out[11]:
{u'head': {u'link': [u'info'], u'vars': [u'source', u'label', u'content']},
u'results': {u'bindings': [{u'content': {u'type': u'literal',
u'value': u'Il lasciare ingiustificatamente qualcuno o qualcosa di cui si \xe8 responsabili'},
u'label': {u'type': u'literal',
u'value': u'Abbandono',
u'xml:lang': u'it'},
u'source': {u'type': u'uri', u'value': u'http://purl.org/bncf/tid/12445'}},
...
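The session above is Python 2 and relies on the private _createRequest. A minimal Python 3 sketch of the same idea, assuming you are fine bypassing SPARQLWrapper and using only the standard library: it sends a GET request that asks for exactly one MIME type and, defensively, decodes the body according to the Content-Type the server actually returned, converting SPARQL XML results into the same dict shape as SPARQL JSON. The function names are hypothetical, and the XML conversion only handles SELECT results.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_json_request(endpoint, query):
    """GET request that asks for exactly one MIME type, so there is nothing to mis-negotiate."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def xml_to_json_shape(body):
    """Convert a SPARQL XML SELECT result into the dict shape of the SPARQL JSON format."""
    root = ET.fromstring(body)
    variables = [v.get("name") for v in root.iter(SPARQL_NS + "variable")]
    bindings = []
    for result in root.iter(SPARQL_NS + "result"):
        row = {}
        for binding in result.findall(SPARQL_NS + "binding"):
            term = binding[0]  # exactly one <uri>, <literal> or <bnode> child
            cell = {"type": term.tag[len(SPARQL_NS):], "value": term.text or ""}
            if term.get(XML_LANG):
                cell["xml:lang"] = term.get(XML_LANG)
            if term.get("datatype"):
                cell["datatype"] = term.get("datatype")
            row[binding.get("name")] = cell
        bindings.append(row)
    return {"head": {"vars": variables}, "results": {"bindings": bindings}}


def decode_results(body, content_type):
    """Decode according to the Content-Type the server sent, not the one we asked for."""
    mime = content_type.split(";")[0].strip().lower()
    if mime in ("application/sparql-results+json", "application/json", "text/javascript"):
        return json.loads(body)
    if mime in ("application/sparql-results+xml", "application/xml", "text/xml"):
        return xml_to_json_shape(body)
    raise ValueError("unexpected content type: " + content_type)


def run_select(endpoint, query, timeout=30):
    """Send the query and return SPARQL JSON-shaped results (performs network I/O)."""
    with urllib.request.urlopen(build_json_request(endpoint, query), timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        return decode_results(response.read().decode("utf-8"), content_type)
```

Hypothetical usage with the endpoint from the question: run_select("http://digitale.bncf.firenze.sbn.it/openrdf-workbench/repositories/NS_03_2014/query", query_rdf).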

Related

Python is returning json response in single quotes not double (Elasticsearch API)

I am using Elasticsearch library in python to query an elasticsearch server like below:
query = {
"query":{
"match_all":{}
}
}
es = Elasticsearch('localhost:9200')
res = es.search(index='myindex', body=query)
with open('response.json', 'w') as out:
out.write(res)
# json.dump(res, out) also gives the same result
the output I save to file has the format
{
'key':'value',
'key2': {
'key3' : 'value2'
}
}
Notice the single quotes here. I know I can do a simple find and replace, or use sed to change single quotes to double, but I want to know why this happens when it doesn't with curl from the terminal.
I would prefer to have the output dumped in a proper json format
A Python dictionary is not JSON: writing it out with str() (or print) uses Python's literal syntax, which puts strings in single quotes. The issue can be solved by using json.dumps() on the response, like below:
import json
from elasticsearch import Elasticsearch
query = {
"query":{
"match_all":{}
}
}
es = Elasticsearch('localhost:9200')
res = es.search(index='myindex', body=query)
with open('response.json', 'w') as out:
out.write(json.dumps(res))
This will turn res into json format like below:
{
"key":"value",
"key2": {
"key3" : "value2"
}
}
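A small standalone sketch of the difference, using a plain dict as a stand-in for the es.search() response (an assumption: any plain dict behaves the same way). str() gives Python literal syntax with single quotes, while json.dumps() and json.dump() both produce JSON text with double quotes; io.StringIO stands in for the file opened with open('response.json', 'w').

```python
import io
import json

# Stand-in for the dict returned by es.search().
res = {"key": "value", "key2": {"key3": "value2"}}

# str() (and print) use Python literal syntax, so strings come out single-quoted.
python_text = str(res)

# json.dumps() produces JSON text: double-quoted strings, null/true/false instead of None/True/False.
json_text = json.dumps(res)

# json.dump() writes the same JSON text straight to a file-like object.
out = io.StringIO()
json.dump(res, out, indent=2)

print(python_text)  # {'key': 'value', 'key2': {'key3': 'value2'}}
print(json_text)    # {"key": "value", "key2": {"key3": "value2"}}
```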

How can I load a string that looks like json? [duplicate]

I wonder if there is a way to decode a JSON-like string.
I got string:
'{ hotel: { id: "123", name: "hotel_name"} }'
It's not a valid JSON string, so I can't decode it directly with the python API.
Python will only accept a stringified JSON string like:
'{ "hotel": { "id": "123", "name": "hotel_name"} }'
where properties are quoted to be a string.
Use the demjson module, which can decode in non-strict mode.
In [1]: import demjson
In [2]: demjson.decode('{ hotel: { id: "123", name: "hotel_name"} }')
Out[2]: {u'hotel': {u'id': u'123', u'name': u'hotel_name'}}
You could try and use a wrapper for a JavaScript engine, like pyv8.
import PyV8
ctx = PyV8.JSContext()
ctx.enter()
# Note that we need to insert an assignment here ('a ='), or syntax error.
js = 'a = ' + '{ hotel: { id: "123", name: "hotel_name"} }'
a = ctx.eval(js)
a.hotel.id
>> '123' # Prints
@vartec has already pointed out demjson, which works well for slightly invalid JSON. For data that's even less JSON-compliant, I've written barely_json:
from barely_json import parse
print(parse('[no, , {complete: yes, where is my value?}]'))
prints
[False, '', {'complete': True, 'where is my value?': ''}]
Not very elegant and not robust (and easy to break), but it may be possible to kludge it with something like:
import re
kludged = re.sub('(?i)([a-z_].*?):', r'"\1":', string)
# { "hotel": { "id": "123", "name": "hotel_name"} }
You may find that using pyparsing and the parsePythonValue.py example could do what you want as well... (or modified fairly easily to do so) or the jsonParser.py could be modified to not require quoted key values.
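A slightly less fragile variant of the regex kludge, as a stdlib-only sketch: it only quotes an identifier that directly follows "{" or "," and precedes ":", then hands the result to json.loads. loads_bare_keys is a hypothetical name, and it is still a heuristic, not a parser: it assumes no string value contains text such as "{ word:" or ", word:".

```python
import json
import re

# A bare key: an identifier right after "{" or "," (plus optional spaces) and before ":".
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)')


def loads_bare_keys(text):
    """Quote unquoted identifier keys, then parse with the standard json module."""
    return json.loads(_BARE_KEY.sub(r'\1"\2"\3', text))


print(loads_bare_keys('{ hotel: { id: "123", name: "hotel_name"} }'))
# {'hotel': {'id': '123', 'name': 'hotel_name'}}
```

Unlike the kludge above, a value such as "http://x" is left alone, because the colon is not preceded by "{" or "," plus an identifier.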

SyntaxError with pymongo and $ne : null

this is my code :
#! /usr/bin/python
import os
from pymongo.connection import Connection
from pymongo.master_slave_connection import MasterSlaveConnection
database = 'toto'
collection = 'logs'
master = Connection(host="X.X.X.X", port=27017)
slave1 = Connection(host="X.X.X.X", port=27017)
con = MasterSlaveConnection(master, slaves=[slave1, master])
db = getattr(con,database)
#host_name.append("getattr(db,collection).distinct( 'host_name' )")
#print host_name[1]
hosts = db.logs.distinct( 'host_name' )
services = db.logs.distinct("service_description" , { "service_description" : { $ne : null } } )
#print hosts
print services
I got this error :
File "./rapport.py", line 23
services = db.logs.distinct("service_description" , { "service_description" : { $ne : null } } )
^
SyntaxError: invalid syntax
Why can't I use "$ne : null" in my code? I don't understand, because when I execute this query "db.logs.distinct("service_description" , { "service_description" : { $ne : null } } )" directly in MongoDB, it works.
I also tried this but it doesn't work :
services = db.logs.distinct("service_description", { "service_description" : { "$ne" : None } } )
Thanks for your help.
You need to quote the $ne and use None instead of null.
Pymongo uses dicts as parameters.
asdf = "something"
{ asdf: "foo"}
is a valid declaration, using "something" as key.
If you compare that with
{$ne: "foo"}
the interpreter expects a variable name as the first entry, and $ne is not a valid name.
Also, null is not predefined in Python, so use None instead.
Combined with the fluent interface in pymongo, your query should be:
db.logs.find({"service_description": {"$ne" : None}}).distinct('service_description')
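The same point in runnable form: the shell's { $ne : null } becomes a dict with the string key "$ne" and the value None, which serializes back to JSON null. non_null_filter and distinct_non_null are hypothetical helpers, and distinct_non_null assumes PyMongo 3 or later, where Collection.distinct(key, filter) accepts a filter document as its second argument.

```python
import json


def non_null_filter(field):
    """The shell's { field: { $ne: null } } as a Python dict: "$ne" is a string key, null is None."""
    return {field: {"$ne": None}}


def distinct_non_null(collection, field):
    """Distinct values of `field`, skipping documents where it is null or missing.

    Assumes a PyMongo 3+ Collection, whose distinct(key, filter) takes a filter document.
    """
    return collection.distinct(field, non_null_filter(field))


# Serialized to JSON, the Python filter is exactly the query typed in the mongo shell.
print(json.dumps(non_null_filter("service_description")))
# {"service_description": {"$ne": null}}
```

Hypothetical usage: services = distinct_non_null(db.logs, "service_description").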

Pymongo find by _id in subdocuments

Assume that this is one item of my database:
{"_id" : ObjectID("526fdde0ef501a7b0a51270e"),
"info": "foo",
"status": true,
"subitems" : [ {"subitem_id" : ObjectID("65sfdde0ef501a7b0a51e270"),
//more},
{....}
],
//more
}
I want to find (or find_one, doesn't matter) the document(s) with "subitems.subitem_id" : xxx.
I have tried the following. All of them return an empty list.
from pymongo import MongoClient,errors
from bson.objectid import ObjectId
id = '65sfdde0ef501a7b0a51e270'
db.col.find({"subitems.subitem_id" : id } ) #obviously wrong
db.col.find({"subitems.subitem_id" : Objectid(id) })
db.col.find({"subitems.subitem_id" : {"$oid":id} })
db.col.find({"subitems.subitem_id.$oid" : id })
db.col.find({"subitems.$.subitem_id" : Objectid(id) })
In mongoshell this one works however:
find({"subitems.subitem_id" : { "$oid" : "65sfdde0ef501a7b0a51e270" } })
The literal 65sfdde0ef501a7b0a51e270 is not hexadecimal, hence, not a valid ObjectId.
Also, id is a Python built-in function. Avoid shadowing it.
Finally, you execute a find but do not evaluate it, so you do not see any results. Remember that pymongo cursors are lazy.
Try this.
from pymongo import MongoClient
from bson.objectid import ObjectId
db = MongoClient().database
oid = '65cfdde0ef501a7b0a51e270'
x = db.col.find({"subitems.subitem_id" : ObjectId(oid)})
print list(x)
Notice I adjusted oid to a valid hexadecimal string.
Same query in the Mongo JavaScript shell.
db.col.find({"subitems.subitem_id" : new ObjectId("65cfdde0ef501a7b0a51e270")})
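To catch an invalid id like the one in the question early, a stdlib-only sketch of the check: the string form of an ObjectId is exactly 24 hexadecimal characters. looks_like_object_id is a hypothetical helper; with PyMongo installed, bson.ObjectId.is_valid(oid) performs this check for you, and ObjectId(oid) raises bson.errors.InvalidId on a bad string instead of quietly matching nothing.

```python
import re

# The string form of a BSON ObjectId is exactly 24 hexadecimal characters (12 bytes).
_OBJECT_ID_HEX = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(text):
    """Stdlib approximation of the check bson.ObjectId(text) applies to a string argument."""
    return _OBJECT_ID_HEX.fullmatch(text) is not None


print(looks_like_object_id("65sfdde0ef501a7b0a51e270"))  # False: 's' is not a hex digit
print(looks_like_object_id("65cfdde0ef501a7b0a51e270"))  # True
```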
Double-checked. The right answer is db.col.find({"subitems.subitem_id" : ObjectId(id)})
Be aware that this query will return full record, not just matching part of sub-array.
Mongo shell:
a = ObjectId("5273e7d989800e7f4959526a")
db.m.insert({"subitems": [{"subitem_id":a},
{"subitem_id":ObjectId()}]})
db.m.insert({"subitems": [{"subitem_id":ObjectId()},
{"subitem_id":ObjectId()}]})
db.m.find({"subitems.subitem_id" : a })
>>> { "_id" : ObjectId("5273e8e189800e7f4959526d"),
"subitems" :
[
{"subitem_id" : ObjectId("5273e7d989800e7f4959526a") },
{"subitem_id" : ObjectId("5273e8e189800e7f4959526c")}
]}

