I'm writing Python code to retrieve all actors that are common to both DBpedia and Wikidata, and also to get some additional information from Wikidata, such as awards received. But it's throwing an error.
I'm not sure how to correct this error. Here is my Python code:
from SPARQLWrapper import SPARQLWrapper, JSON
sparql = SPARQLWrapper("https://query.wikidata.org/")
sparql.setQuery("""
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?Actor ?award_received
WHERE {
SERVICE <http://dbpedia.org/sparql> {
?c rdf:type <http://umbel.org/umbel/rc/Actor> .
?c rdfs:label ?Actor.
FILTER (LANG(?Actor)="en").
?c owl:sameAs ?wikidata_actor .
FILTER (STRSTARTS(STR(?wikidata_actor), "http://www.wikidata.org"))}
?wikidata_actor wdt:P166 ?award_received.
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    if ("Actor" in result):
        print(result["Actor"]["value"])
    else:
        url = 'NONE'
    if ("award_received" in result):
        print(result["award_received"]["value"])
    else:
        url = 'NONE'
Here is the error I'm getting:
/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 "/Users/ashwinis/PycharmProjects/semantic web/club.py"
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SPARQLWrapper/Wrapper.py:762: RuntimeWarning: unknown response content type, returning raw response...
  warnings.warn("unknown response content type, returning raw response...", RuntimeWarning)
Traceback (most recent call last):
  File "/Users/ashwinis/PycharmProjects/semantic web/club.py", line 27, in <module>
    for result in results["results"]["bindings"]:
TypeError: string indices must be integers, not str
Process finished with exit code 1
The Wikidata SPARQL endpoint address is https://query.wikidata.org/sparql, not https://query.wikidata.org/. With the wrong address the service returns something other than SPARQL JSON results, SPARQLWrapper hands back the raw response as a string (hence the RuntimeWarning), and indexing that string with results["results"] raises the TypeError.
Also add the optimization hint hint:Query hint:optimizer "None" to the query so that the SERVICE block is evaluated first; see the Blazegraph documentation on how optimizer hints work. Do not forget to define the prefixes you use, including hint:.
There are also minor indentation problems in your Python code.
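Putting these fixes together, here is a minimal sketch of the corrected script (the hint: prefix URI below is Blazegraph's query-hints namespace; the query body is otherwise unchanged from yours):

from SPARQLWrapper import SPARQLWrapper, JSON

# note the full endpoint path, including /sparql
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX hint: <http://www.bigdata.com/queryHints#>
SELECT DISTINCT ?Actor ?award_received
WHERE {
  hint:Query hint:optimizer "None" .
  SERVICE <http://dbpedia.org/sparql> {
    ?c rdf:type <http://umbel.org/umbel/rc/Actor> .
    ?c rdfs:label ?Actor .
    FILTER (LANG(?Actor) = "en")
    ?c owl:sameAs ?wikidata_actor .
    FILTER (STRSTARTS(STR(?wikidata_actor), "http://www.wikidata.org"))
  }
  ?wikidata_actor wdt:P166 ?award_received .
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# each binding is a dict; use .get() so a missing variable does not raise
for result in results["results"]["bindings"]:
    print(result.get("Actor", {}).get("value", "NONE"))
    print(result.get("award_received", {}).get("value", "NONE"))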
I want to find all pubs in a specific area using the Overpass API and selecting the area with geocodeArea.
Testing the following query on overpass-turbo.eu gives me the desired result:
{{geocodeArea:berlin}}->.searchArea;
(
node["amenity"="pub"](area.searchArea);
way["amenity"="pub"](area.searchArea);
relation["amenity"="pub"](area.searchArea);
);
out body;
>;
out skel qt;
But when I implement that query in python using overpy...
import overpy
api = overpy.Overpass()
result = api.query("""
{{geocodeArea:berlin}}->.searchArea;
(
node["amenity"="pub"](area.searchArea);
way["amenity"="pub"](area.searchArea);
relation["amenity"="pub"](area.searchArea);
);
out body;
>;
out skel qt;
""")
print("Amenities in nodes: %d" % len(result.nodes))
print("Amenities in ways: %d" % len(result.ways))
... I get the following error:
Traceback (most recent call last):
File "testOP.py", line 15, in <module>
""")
File "/usr/local/lib/python2.7/dist-packages/overpy/__init__.py", line 119, in query
msgs=msgs
overpy.exception.OverpassBadRequest: Error: line 2: parse error: Unknown type "{"
Error: line 2: parse error: An empty query is not allowed
Error: line 2: parse error: ';' expected - '{' found.
I guess that the problem has to do with the double curly braces, but so far escaping them and other variations didn't help.
Possible solution with Nominatim
Thanks to @scai, I now know that with {{geocodeArea:xxx}} overpass turbo only performs a geocoding request. I decided to implement that myself in my program using geopy and Nominatim:
from geopy.geocoders import Nominatim
import overpy
city_name = "berlin"
# Geocoding request via Nominatim
geolocator = Nominatim(user_agent="city_compare")
geo_results = geolocator.geocode(city_name, exactly_one=False, limit=3)
# Searching for relation in result set
for r in geo_results:
    print(r.address, r.raw.get("osm_type"))
    if r.raw.get("osm_type") == "relation":
        city = r
        break
# Calculating area id
area_id = int(city.raw.get("osm_id")) + 3600000000
# Executing overpass call
api = overpy.Overpass()
result = api.query("""
area(%s)->.searchArea;
(
node["amenity"="pub"](area.searchArea);
way["amenity"="pub"](area.searchArea);
relation["amenity"="pub"](area.searchArea);
);
out body;
""" % area_id)
# Printing no. of pubs in nodes and ways
print("Amenities in nodes: %d" % len(result.nodes))
print("Amenities in ways: %d" % len(result.ways))
The code ...
Does a geocoding request to Nominatim
Searches for the first element in the results (max. 3) which is a relation
Adds 3600000000 to get the area id from the relation id
It's not a very clean solution, and I wonder if it's possible to directly use the first result (which is usually just the city as a point) for my purposes. Hints are still welcome.
{{geocodeArea: xxx }} is a special feature of overpass turbo and not part of the Overpass API. overpy talks to the Overpass API directly, which means you can't use this keyword.
However {{geocodeArea: xxx }} just tells overpass turbo to perform a geocoding request, i.e. transform an address into a geographic location. You can do the same, e.g. by making a call to Nominatim, Photon or any other geocoder.
You can achieve similar results using the area filter instead of geocodeArea. For your example:
area[name="Berlin"]->.searchArea;
(
node["amenity"="pub"](area.searchArea);
way["amenity"="pub"](area.searchArea);
relation["amenity"="pub"](area.searchArea);
);
out body;
>;
out skel qt;
This could work, but for other areas you may need to be more specific with the tags used in the area filter; see the Overpass Language Guide for more information.
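For completeness, a minimal sketch of how that area-filter query could be dropped into the overpy call from the question (same pub filters, with the name "Berlin" hard-coded):

import overpy

api = overpy.Overpass()
result = api.query("""
area[name="Berlin"]->.searchArea;
(
  node["amenity"="pub"](area.searchArea);
  way["amenity"="pub"](area.searchArea);
  relation["amenity"="pub"](area.searchArea);
);
out body;
""")

# count pubs found as nodes and as ways
print("Amenities in nodes: %d" % len(result.nodes))
print("Amenities in ways: %d" % len(result.ways))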
I have a Python file that imports rdflib and implements a SPARQL query:
from rdflib import Graph
import html5lib
if __name__ == '__main__':
    g = Graph()
    g.parse('http://localhost:8085/weather-2.html', format='rdfa')
    res1 = g.parse('http://localhost:8085/weather-2.html', format='rdfa')
    print(res1.serialize(format='pretty-xml').decode("utf-8"))
    print()
    res2 = g.query("""SELECT ?obj
        WHERE { <http://localhost:8085/weather-2.html> weather:region ?obj . }
        """)
    for row in res2:
        print(row)
res1 prints out without trouble, but for res2 I get an error saying:
Exception: Unknown namespace prefix : weather
Apparently this is due to an error on line 15, according to PyCharm, the editor I am using.
What am I missing that is causing this error?
Is there more to it than just calling weather:region in my SPARQL query?
If so, how do I fix this problem?
As the error message suggests, the namespace weather: is not defined, so in the SPARQL query you either need a PREFIX declaration that defines weather, like:
PREFIX weather: <weatheruri>
or you should put the explicit weather URI in place of the weather: prefix.
The weather namespace URI (or IRI) will be among the XML namespaces declared in the RDF document; it will end with / or #. So if the URI is http://weather.com/, the prefix definition is PREFIX weather: <http://weather.com/>.
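A minimal sketch of the fixed query, assuming a hypothetical namespace URI of http://example.org/weather# (substitute whatever namespace weather-2.html actually declares):

from rdflib import Graph

g = Graph()
g.parse('http://localhost:8085/weather-2.html', format='rdfa')

# the PREFIX must match the namespace URI used in the document;
# http://example.org/weather# is only a placeholder here
res2 = g.query("""
    PREFIX weather: <http://example.org/weather#>
    SELECT ?obj
    WHERE { <http://localhost:8085/weather-2.html> weather:region ?obj . }
""")
for row in res2:
    print(row)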
I am using the Google Places API with Python to build a chatbot application to suggest nearby places.
I am referring to the below link:
https://github.com/slimkrazy/python-google-places
I am doing something like this:
from googleplaces import GooglePlaces, types, lang
API_KEY = ''
google_places = GooglePlaces(API_KEY)
query_result = google_places.nearby_search(
location='Mumbai', keyword='Restaurants',
radius=1000, types=[types.TYPE_RESTAURANT])
if query_result.has_attributions:
    print query_result.html_attributions

for place in query_result.places:
    print place.name
    print place.geo_location
    print place.place_id
It's given in the link itself. However, I keep getting the following error:
Traceback (most recent call last):
File "run.py", line 9, in <module>
radius=1000, types=[types.TYPE_RESTAURANT])
File "/home/neetu/Desktop/python-google-places/googleplaces/__init__.py", line 281, in nearby_search
lat_lng_str = self._generate_lat_lng_string(lat_lng, location)
File "/home/neetu/Desktop/python-google-places/googleplaces/__init__.py", line 593, in _generate_lat_lng_string
'lat_lng must be a dict with the keys, \'lat\' and \'lng\'. Cause: %s' % str(e))
ValueError: lat_lng must be a dict with the keys, 'lat' and 'lng'. Cause: Request to URL https://maps.googleapis.com/maps/api/geocode/json?sensor=false&key=AIzaSyAiFpFd85eMtfbvmVNEYuNds5TEF9FjIPI&address=Mumbai failed with response code: REQUEST_DENIED
Any help is super welcome :)
According to the documentation:
https://github.com/slimkrazy/python-google-places/blob/master/googleplaces/__init__.py:
You should be able to provide a location instead of a lat_lng pair, but you could try giving a lat_lng instead of a location.
Try removing:
location="Mumbai"
and adding
lat_lng={'lat': 19.148320, 'lng': 72.888794}
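A minimal sketch of the call with that change applied (the coordinates are the Mumbai ones suggested above; everything else is unchanged from the question):

from googleplaces import GooglePlaces, types

API_KEY = ''  # your Google API key

google_places = GooglePlaces(API_KEY)

# passing lat_lng directly avoids the geocoding request that was being denied
query_result = google_places.nearby_search(
    lat_lng={'lat': 19.148320, 'lng': 72.888794},
    keyword='Restaurants',
    radius=1000,
    types=[types.TYPE_RESTAURANT])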
After digging through the code, I understand the error.
The error you got was:
ValueError: lat_lng must be a dict with the keys, 'lat' and 'lng'. Cause: Request to URL https://maps.googleapis.com/maps/api/geocode/json?sensor=false&key=AIzaSyAiFpFd85eMtfbvmVNEYuNds5TEF9FjIPI&address=Mumbai failed with response code: REQUEST_DENIED
What it is saying is that the library tried to ask Google for the lat/lng pair for "Mumbai", but that request was denied. Use a valid Google API key and it should work.
I was trying to include a link in a HIT request in Amazon Mechanical Turk, using boto, and kept getting an error that my XML was invalid. I gradually pared my HTML down to the bare minimum and isolated the problem: some valid links fail for seemingly no reason. Can anyone with expertise in boto or AWS help me figure out why?
I followed these two guides:
http://www.toforge.com/2011/04/boto-mturk-tutorial-create-hits/
https://gist.github.com/j2labs/740267
Here is my example:
from boto.mturk.connection import MTurkConnection
from boto.mturk.question import QuestionContent,Question,QuestionForm,Overview,AnswerSpecification,SelectionAnswer,FormattedContent,FreeTextAnswer
from config import *
HOST = 'mechanicalturk.sandbox.amazonaws.com'
mtc = MTurkConnection(aws_access_key_id=ACCESS_ID,
                      aws_secret_access_key=SECRET_KEY,
                      host=HOST)
title = 'HIT title'
description = ("HIT description.")
keywords = 'keywords'
s1 = """<![CDATA[<p>Here comes a link <a href='%s'>LINK</a></p>]]>""" % "http://www.example.com"
s2 = """<![CDATA[<p>Here comes a link <a href='%s'>LINK</a></p>]]>""" % "https://www.google.com/search?q=example&site=imghp&tbm=isch"
def makeahit(s):
    overview = Overview()
    overview.append_field('Title', 'HIT title itself')
    overview.append_field('FormattedContent', s)
    qc = QuestionContent()
    qc.append_field('Title', 'The title')
    fta = FreeTextAnswer()
    q = Question(identifier="URL",
                 content=qc,
                 answer_spec=AnswerSpecification(fta))
    question_form = QuestionForm()
    question_form.append(overview)
    question_form.append(q)
    mtc.create_hit(questions=question_form,
                   max_assignments=1,
                   title=title,
                   description=description,
                   keywords=keywords,
                   duration=30,
                   reward=0.05)
makeahit(s1) # SUCCESS!
makeahit(s2) # FAIL?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 25, in makeahit
File "/usr/local/lib/python2.7/dist-packages/boto/mturk/connection.py", line 263, in create_hit
return self._process_request('CreateHIT', params, [('HIT', HIT)])
File "/usr/local/lib/python2.7/dist-packages/boto/mturk/connection.py", line 821, in _process_request
return self._process_response(response, marker_elems)
File "/usr/local/lib/python2.7/dist-packages/boto/mturk/connection.py", line 836, in _process_response
raise MTurkRequestError(response.status, response.reason, body)
boto.mturk.connection.MTurkRequestError: MTurkRequestError: 200 OK
<?xml version="1.0"?>
<CreateHITResponse><OperationRequest><RequestId>19548ab5-034b-49ec-86b2-9e499a3c9a79</RequestId></OperationRequest><HIT><Request><IsValid>False</IsValid><Errors><Error><Code>AWS.MechanicalTurk.XHTMLParseError</Code><Message>There was an error parsing the XHTML data in your request. Please make sure the data is well-formed and validates against the appropriate schema. Details: The reference to entity "site" must end with the ';' delimiter. Invalid content: <FormattedContent><![CDATA[<p>Here comes a link <a href='https://www.google.com/search?q=example&site=imghp&tbm=isch'>LINK</a></p>]]></FormattedContent> (1369323038698 s)</Message></Error></Errors></Request></HIT></CreateHITResponse>
Any idea why s2 fails, but s1 succeeds when both are valid links? Both link contents work:
http://www.example.com
https://www.google.com/search?q=example&site=imghp&tbm=isch
Things with query strings? Https?
UPDATE
I'm going to do some tests, but right now my candidate hypotheses are:
HTTPS doesn't work (so, I'll see if I can get another https link to work)
URLs with params don't work (so, I'll see if I can get another url with params to work)
Google doesn't allow its searches to get posted this way? (if 1 and 2 fail!)
You need to escape ampersands in URLs, i.e. & => &amp;.
At the end of s2, use
q=example&amp;site=imghp&amp;tbm=isch
instead of
q=example&site=imghp&tbm=isch
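A minimal sketch of one way to do that escaping in Python, assuming xml.sax.saxutils.escape (which replaces & with &amp;) is acceptable here:

from xml.sax.saxutils import escape

url = "https://www.google.com/search?q=example&site=imghp&tbm=isch"

# escape() replaces &, < and > with their XML entities, so the query string
# becomes q=example&amp;site=imghp&amp;tbm=isch before it is embedded
s2 = """<![CDATA[<p>Here comes a link <a href='%s'>LINK</a></p>]]>""" % escape(url)

print s2  # pass this to makeahit(); the XHTMLParseError should go away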
I have the following Python code. It basically returns some elements of RDF from an online resource using SPARQL.
I want to query and return something from one of my local files. I tried to edit it but couldn't return anything.
What should I change in order to query within my local instead of http://dbpedia.org/resource?
from SPARQLWrapper import SPARQLWrapper, JSON
# wrap the dbpedia SPARQL end-point
endpoint = SPARQLWrapper("http://dbpedia.org/sparql")
# set the query string
endpoint.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpr: <http://dbpedia.org/resource/>
SELECT ?label
WHERE { dbpr:Asturias rdfs:label ?label }
""")
# select the retur format (e.g. XML, JSON etc...)
endpoint.setReturnFormat(JSON)
# execute the query and convert into Python objects
# Note: The JSON returned by the SPARQL endpoint is converted to nested Python dictionaries, so additional parsing is not required.
results = endpoint.query().convert()
# interpret the results:
for res in results["results"]["bindings"]:
    print res['label']['value']
Thanks!
SPARQLWrapper is meant to be used only with remote or local SPARQL endpoints. You have two options:
(a) Put your local RDF file in a local triple store and point your code at localhost, or (b) use rdflib with its in-memory storage:
import rdflib.graph as g
graph = g.Graph()
graph.parse('filename.rdf', format='xml')  # 'xml' is the RDF/XML parser; use 'n3', 'turtle', etc. as appropriate
print graph.serialize(format='pretty-xml')
You can query the rdflib.graph.Graph() with:
filename = "path/to/fileneme" #replace with something interesting
uri = "uri_of_interest" #replace with something interesting
import rdflib
import rdfextras
rdfextras.registerplugins() # so we can Graph.query()
g=rdflib.Graph()
g.parse(filename)
results = g.query("""
SELECT ?p ?o
WHERE {
<%s> ?p ?o.
}
ORDER BY (?p)
""" % uri) #get every predicate and object about the uri