How to get Wikipedia page id from Wikidata id?

I want to get the Wikipedia page ID for a given Wikidata ID. How can I do that with the Wikidata Query Service, or by some other method, from Python? I don't see any property in Wikidata that looks like a Wikipedia page ID.

I'm not sure whether DBpedia always contains both the wikiPageID and the Wikidata ID, but you can try the following query on DBpedia:
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?wikipedia_id WHERE {
?dbpedia_id owl:sameAs ?wikidata_id .
?dbpedia_id dbo:wikiPageID ?wikipedia_id .
VALUES (?wikidata_id) {(wd:Q123)}
}
Or you can try the following federated query on Wikidata:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?wikipedia_id where {
VALUES (?wikidata_id) {(wd:Q123)}
SERVICE <http://dbpedia.org/sparql> {
?dbpedia_id owl:sameAs ?wikidata_id .
?dbpedia_id dbo:wikiPageID ?wikipedia_id
}
}
Update
You can call out to the Wikipedia API using MWAPI on Wikidata:
SELECT ?pageid WHERE {
VALUES (?item) {(wd:Q123)}
[ schema:about ?item ; schema:name ?name ;
schema:isPartOf <https://en.wikipedia.org/> ]
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam mwapi:generator "allpages" .
bd:serviceParam mwapi:gapfrom ?name .
bd:serviceParam mwapi:gapto ?name .
?pageid wikibase:apiOutput "#pageid" .
}
}
Unfortunately, it seems you have to use a generator; allpages appears to be the most suitable one.
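Since the question asks for Python, here is a minimal sketch of sending the MWAPI query above to the Wikidata Query Service using only the standard library. The function names and the User-Agent string are made up for illustration, and the sketch assumes the endpoint returns the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The MWAPI query from the answer, with the item ID left as a placeholder.
PAGEID_QUERY = """
SELECT ?pageid WHERE {
  VALUES (?item) {(wd:%s)}
  [ schema:about ?item ; schema:name ?name ;
    schema:isPartOf <https://en.wikipedia.org/> ]
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "en.wikipedia.org" .
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam mwapi:generator "allpages" .
    bd:serviceParam mwapi:gapfrom ?name .
    bd:serviceParam mwapi:gapto ?name .
    ?pageid wikibase:apiOutput "#pageid" .
  }
}
"""


def build_request(qid, endpoint=WDQS_ENDPOINT):
    """Build (but do not send) the HTTP request for the given Wikidata ID."""
    params = urllib.parse.urlencode({"query": PAGEID_QUERY % qid, "format": "json"})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical client name; replace with something identifying your script.
        "User-Agent": "pageid-lookup-example/0.1",
    }
    return urllib.request.Request(endpoint + "?" + params, headers=headers)


def page_ids(results):
    """Extract every ?pageid binding from a SPARQL JSON result as an int."""
    return [int(row["pageid"]["value"]) for row in results["results"]["bindings"]]


def lookup(qid):
    """Send the query over the network and return the page IDs found."""
    with urllib.request.urlopen(build_request(qid), timeout=30) as response:
        return page_ids(json.load(response))
```

Calling lookup("Q123") performs the network request; build_request and page_ids can be exercised without one.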

First, you need to get the Wikipedia page title from the Wikidata ID, which can be done with a request to the wbgetentities module of the Wikidata API, like so: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q123&format=json&props=sitelinks
Then, once you have found the title in the desired Wikipedia edition, you can get the associated page ID from that Wikipedia's API: https://en.wikipedia.org/w/api.php?action=query&titles=September&format=json
So from those example URLs you can get that:
Wikidata id = Q123
=> English Wikipedia (enwiki) title = September
=> pageid = 15580374
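The two steps above can be sketched in Python with the standard library alone. The helper names and the User-Agent string are illustrative, not part of either API, and the sketch assumes the usual response shapes: "entities" → ID → "sitelinks" for wbgetentities, and "query" → "pages" for the Wikipedia query.

```python
import json
import urllib.parse
import urllib.request


def api_url(host, **params):
    """Build a MediaWiki API URL on the given host, asking for JSON output."""
    params["format"] = "json"
    return "https://%s/w/api.php?%s" % (host, urllib.parse.urlencode(params))


def sitelink_title(payload, qid, site="enwiki"):
    """Return the page title for `site` from a wbgetentities response, or None."""
    link = payload["entities"][qid].get("sitelinks", {}).get(site)
    return link["title"] if link else None


def page_id(payload):
    """Return the first page ID in an action=query response, or None if missing."""
    for page in payload["query"]["pages"].values():
        if "pageid" in page:
            return page["pageid"]
    return None


def fetch_json(url):
    """Fetch a URL and decode its JSON body (performs a network request)."""
    # Hypothetical client name; replace with something identifying your script.
    request = urllib.request.Request(url, headers={"User-Agent": "pageid-lookup-example/0.1"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wikidata_to_page_id(qid, site="enwiki", host="en.wikipedia.org"):
    """Chain both requests: Wikidata ID -> page title -> page ID."""
    entity = fetch_json(api_url("www.wikidata.org", action="wbgetentities",
                                ids=qid, props="sitelinks"))
    title = sitelink_title(entity, qid, site)
    if title is None:
        return None
    return page_id(fetch_json(api_url(host, action="query", titles=title)))
```

For another language edition, pass the matching pair, for example site="dewiki" with host="de.wikipedia.org".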

Use the URL below in your curl call, replacing the Wikidata ID Q243 with the one you need.
For example, if you want the wikiPageID of Taj_Mahal, replace Q243 with Q9141 in the link below and make the curl call.
http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=PREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E+%0D%0ASELECT+%3FwikiPageID+WHERE+%7B%0D%0A%3Fdbpedia_id+owl%3AsameAs+%3Fwikidata_id++.%0D%0A%3Fdbpedia_id+dbo%3AwikiPageID+%3FwikiPageID+.%0D%0AVALUES+%28%3Fwikidata_id%29+%7B%28wd%3AQ243%29%7D+%0D%0A%7D&format=application%2Fsparql-results%2Bjson&CXML_redir_for_subjs=121&CXML_redir_for_hrefs=&timeout=30000&debug=on&run=+Run+Query
To get the wikiPageID for any other Wikidata ID, substitute that ID into the link above.
Note:
1) To get the wikiPageID together with the label, use this URL in the curl call
2) Find Q243 and replace it with your Wikidata ID
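Rather than editing the percent-encoded link by hand, the same URL can be generated. This is a rough sketch: the function name is made up, the ID check is my own addition, and only the parameters needed for a JSON result are kept from the link above.

```python
import re
import urllib.parse

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"

# Same query as in the link above; owl: and dbo: are left undeclared
# because the link relies on the endpoint predefining them.
QUERY_TEMPLATE = """PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?wikiPageID WHERE {
  ?dbpedia_id owl:sameAs ?wikidata_id .
  ?dbpedia_id dbo:wikiPageID ?wikiPageID .
  VALUES (?wikidata_id) {(wd:%s)}
}"""


def dbpedia_url(qid):
    """Return the DBpedia SPARQL URL that looks up the wikiPageID for `qid`."""
    # Reject anything that is not shaped like a Wikidata item ID (Q + digits),
    # so arbitrary text never ends up inside the query.
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError("not a Wikidata item ID: %r" % qid)
    params = {
        "default-graph-uri": "http://dbpedia.org",
        "query": QUERY_TEMPLATE % qid,
        "format": "application/sparql-results+json",
    }
    return DBPEDIA_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

The string returned by dbpedia_url("Q9141") can be passed to curl, quoted so the shell does not interpret the ampersands.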

Related

Can I add tags to parent span Datadog

I'm working in an Express + GraphQL environment.
I want to add tags to the Express span, using a value I derive while resolving the GraphQL query.
Currently, the tags get added to the GraphQL span with the following code.
let span = tracer.scope().active();
if (span !== null) {
  span.setTag('queryname', queryName);
}
Let me know if there is a way to add these tags to the parent span instead of the current span.
This is required because I don't want to enable analytics on GraphQL, since I already have it enabled in the Express app.
Basically, I want the tags on the root span (express) instead of the current span (graphql.execute).

This is my way to get the parent span:
import tracer from "dd-trace";
import opentracing from "opentracing";
let span = tracer.scope().active();
if (span) {
  const parentContext = span
    .tracer()
    .extract(opentracing.FORMAT_HTTP_HEADERS, req.headers);
  if (parentContext) {
    span
      .tracer()
      .inject(
        parentContext,
        opentracing.FORMAT_HTTP_HEADERS,
        context.req.headers
      );
  }
}

How do I use the SimpleQueryString function of Elasticsearch?

I am trying to write a Django app that uses Elasticsearch through the elasticsearch-dsl Python library. I don't want to write a pile of switch-case statements and then pass search queries and filters accordingly.
I want a function that does the parsing by itself.
For example, if I pass "some text url:github.com tags:es,es-dsl,django",
the function should output the corresponding query.
I searched the elasticsearch-dsl documentation and found a function that does the parsing:
https://github.com/elastic/elasticsearch-dsl-py/search?utf8=%E2%9C%93&q=simplequerystring&type=
However, I don't know how to use it.
I tried s = Search(using=client).query.SimpleQueryString("1st|ldnkjsdb"), but it gives me a parsing error.
Can anyone help me out?

You can just plug SimpleQueryString into the Search object; instead of a dictionary, pass the elements as keyword arguments of the object.
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
from elasticsearch_dsl.query import SimpleQueryString
client = Elasticsearch()
_search = Search(using=client, index='INDEX_NAME')
_search = _search.filter(SimpleQueryString(
    query="this + (that | thus) -those",
    fields=["field_to_search"],
    default_operator="and"
))
Much of elasticsearch_dsl simply replaces the dictionary representation with classes and functions, which makes the code look Pythonic and avoids hard-to-read Elasticsearch JSON.
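For comparison, this is roughly the dictionary that the class form stands in for. It is a sketch using plain Python only: the helper name is made up, and it builds the clause as a top-level query rather than as the filter used above.

```python
import json


def simple_query_string_body(query, fields, default_operator="or"):
    """Build the raw request body for a simple_query_string search."""
    return {
        "query": {
            "simple_query_string": {
                "query": query,
                "fields": list(fields),
                "default_operator": default_operator,
            }
        }
    }


body = simple_query_string_body(
    "this + (that | thus) -those",
    ["field_to_search"],
    default_operator="and",
)
# Show the JSON that would be sent to Elasticsearch.
print(json.dumps(body, indent=2))
```

The keyword arguments passed to SimpleQueryString map one-to-one onto the keys inside "simple_query_string".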

I'm guessing you are asking how to use a query string the way you would when sending a JSON request body to the Elasticsearch API. If that's the case, this is how you can do it:
assume you have the query in a variable named query, like this:
{
"query": {
"query_string" : {
"default_field" : "content",
"query" : "this AND that OR thus"
}
}
}
and now do this:
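from django.conf import settings  # assumes the settings names below are defined in your Django settings
from elasticsearch import Elasticsearch

es = Elasticsearch(
    host=settings.ELASTICSEARCH_HOST_IP,  # your ES host IP
    port=settings.ELASTICSEARCH_HOST_PORT,  # your ES host port
)
index = settings.MY_INDEX  # your index name
result = es.search(index=index, body=query)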

Why is this SPARQL query missing so many results?

(First off, my apologies as this is a blatant cross-post. I thought opendata.SE would be the place for this, but it's gotten barely any views there and it appears to not be a very active site in general, so I figure I ought to try it here as it's programming-related.)
I'm trying to get a list of major cities in the world: their name, population, and location. I found what looked like a good query on Wikidata, slightly tweaking one of their built-in query examples:
SELECT DISTINCT ?cityLabel ?population ?gps WHERE {
?city (wdt:P31/wdt:P279*) wd:Q515.
?city wdt:P1082 ?population.
?city wdt:P625 ?gps.
FILTER (?population >= 500000) .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
The results, at first glance, appear to be good, but it's missing a ton of important cities. For example, San Francisco (population 800,000+) and Seattle (population 650,000+) are not in the list, when I specifically asked for all cities with a population greater than 500,000.
Is there something wrong with my query? If not, there must be something wrong with the data Wikidata is using. Either way, how can I get a valid data set, with an API I can query from a Python script? (I've got the script all working for this; I'm just not getting back valid data.)
from SPARQLWrapper import SPARQLWrapper, JSON
from geopy.distance import great_circle
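def parseCoords(gps):
    # gps is a WKT literal of the form "Point(<longitude> <latitude>)"
    base = gps[6:-1]  # strip the leading "Point(" and the trailing ")"
    coords = base.split()
    return (float(coords[1]), float(coords[0]))  # (latitude, longitude)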
sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""SELECT DISTINCT ?cityLabel ?population ?gps WHERE {
?city (wdt:P31/wdt:P279*) wd:Q515.
?city wdt:P1082 ?population.
?city wdt:P625 ?gps.
FILTER (?population >= 500000) .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)""")
queryResults = sparql.query().convert()
cities = [
    (
        city["cityLabel"]["value"],
        int(city["population"]["value"]),
        parseCoords(city["gps"]["value"]),
    )
    for city in queryResults["results"]["bindings"]
]
print(cities)

The population of Seattle is simply not in this database.
If you execute:
#Largest cities of the world
#defaultView:BubbleChart
SELECT * WHERE {
wd:Q5083 wdt:P1082 ?population.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
You get zero results. Although the item wd:Q5083 (Seattle) exists, it does not have the predicate wdt:P1082 (population).
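The same check can be written as a SPARQL ASK query and read from Python. This sketch only builds the query text and interprets the response; the function names are made up, and it assumes the endpoint answers an ASK in the standard SPARQL JSON form with a top-level "boolean" field.

```python
import re


def has_statement_query(qid, pid):
    """Return an ASK query testing whether item `qid` has a wdt:`pid` value."""
    if not re.fullmatch(r"Q[0-9]+", qid) or not re.fullmatch(r"P[0-9]+", pid):
        raise ValueError("expected IDs like Q5083 and P1082")
    return "ASK { wd:%s wdt:%s ?value }" % (qid, pid)


def ask_result(payload):
    """Read the true/false answer out of a decoded SPARQL JSON ASK response."""
    return bool(payload.get("boolean", False))


# The query for the Seattle example above.
print(has_statement_query("Q5083", "P1082"))
```

Running such a check for each city you expect to see tells you whether a gap comes from the query or from the data.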

How to query dbpedia resource ontology 'wikiPageExternalLink'

Using SPARQL with SPARQLWrapper in Python, how can I query for the values of a particular DBpedia resource? For example, how can I get the dbpedia-owl:wikiPageExternalLink values of http://dbpedia.org/page/Asturias?
Here's a simple example that queries for the rdfs:label of Asturias. But I don't know how to modify the query or its parameters to get the values of a property other than those in the RDFS schema. Here's the sample:
from SPARQLWrapper import SPARQLWrapper, JSON, XML, N3, RDF
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label
WHERE { <http://dbpedia.org/resource/Asturias> rdfs:label ?label }
""")
print('\n\n*** JSON Example')
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for result in results["results"]["bindings"]:
    print(result["label"]["value"])
Hoping to receive feedback. Thanks in advance!

Not sure where you're stuck—this is really easy:
SELECT ?label
WHERE { <http://dbpedia.org/resource/Asturias>
dbpedia-owl:wikiPageExternalLink ?label }
Usually you need to declare the namespace prefixes like rdfs: or dbpedia-owl: if you want to use them in the query, but on the DBpedia endpoint this works even without. If you want, you can declare them anyways:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?label
WHERE { <http://dbpedia.org/resource/Asturias>
dbpedia-owl:wikiPageExternalLink ?label }
You can find out the full URI corresponding to the prefix by going to http://dbpedia.org/sparql and clicking on “Namespace Prefixes” near the top right corner.
If you want to rename the variable (for example, from ?label to ?link) then do it like this:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?link
WHERE { <http://dbpedia.org/resource/Asturias>
dbpedia-owl:wikiPageExternalLink ?link }
and you also have to change "label" to "link" in the Python code that gets the value out of the JSON result.
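Put together, the renamed variable shows up in two places: the SELECT clause and the key used to read each result row. A minimal sketch follows. The helper names are made up, and SPARQLWrapper is imported inside fetch_links so the parsing part can be tried without installing it or touching the network.

```python
QUERY = """
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?link
WHERE { <http://dbpedia.org/resource/Asturias>
        dbpedia-owl:wikiPageExternalLink ?link }
"""


def values_of(results, variable):
    """Collect the value of `variable` from every row of a SPARQL JSON result."""
    return [
        row[variable]["value"]
        for row in results["results"]["bindings"]
        if variable in row
    ]


def fetch_links():
    """Run QUERY against DBpedia (network access and SPARQLWrapper required)."""
    from SPARQLWrapper import SPARQLWrapper, JSON  # third-party package

    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(QUERY)
    sparql.setReturnFormat(JSON)
    # The key must match the variable name used in SELECT, here "link".
    return values_of(sparql.query().convert(), "link")
```

If the key passed to values_of does not match the SELECT variable, the list simply comes back empty.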

Python Sparql Querying Local File

I have the following Python code. It basically returns some elements of RDF from an online resource using SPARQL.
I want to query and return something from one of my local files. I tried to edit it but couldn't return anything.
What should I change in order to query within my local instead of http://dbpedia.org/resource?
from SPARQLWrapper import SPARQLWrapper, JSON
# wrap the dbpedia SPARQL end-point
endpoint = SPARQLWrapper("http://dbpedia.org/sparql")
# set the query string
endpoint.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpr: <http://dbpedia.org/resource/>
SELECT ?label
WHERE { dbpr:Asturias rdfs:label ?label }
""")
# select the return format (e.g. XML, JSON etc...)
endpoint.setReturnFormat(JSON)
# execute the query and convert into Python objects
# Note: The JSON returned by the SPARQL endpoint is converted to nested Python dictionaries, so additional parsing is not required.
results = endpoint.query().convert()
# interpret the results:
for res in results["results"]["bindings"]:
    print(res['label']['value'])
Thanks!

SPARQLWrapper is meant to be used only with SPARQL endpoints, whether remote or local. You have two options:
(a) Put your local RDF file in a local triple store and point your code to localhost.
(b) Or use rdflib with its in-memory storage:
import rdflib.graph as g
graph = g.Graph()
graph.parse('filename.rdf', format='xml')  # 'xml' is rdflib's parser name for RDF/XML
print(graph.serialize(format='pretty-xml'))

You can query the rdflib.graph.Graph() with:
filename = "path/to/filename"  # replace with something interesting
uri = "uri_of_interest"  # replace with something interesting
import rdflib
# rdflib 4 and later bundle SPARQL support, so Graph.query() works directly;
# on rdflib 3.x, also import rdfextras and call rdfextras.registerplugins()
g = rdflib.Graph()
g.parse(filename)
results = g.query("""
SELECT ?p ?o
WHERE {
<%s> ?p ?o.
}
ORDER BY (?p)
""" % uri)  # get every predicate and object about the uri
