How to get data from Wikidata using the QID URL - Python

I want to know how to get data using a QID URL. I have some names, and I use the Falcon 2.0 entity linker curl command (converted into a Python script) to get each name's QID. Now I want to use that QID to access information about the person, such as gender (male or female), aliases, or other attributes. Can someone suggest how this should be approached? The code to get the QID URL is given below. Falcon 2.0 is available at https://github.com/SDM-TIB/Falcon2.0.
import requests
import json

response_list = []
person_names = []

if __name__ == '__main__':
    limit = 100
    # Read up to `limit` names from the input file
    with open(filename, 'r') as in_file:
        in_reader = in_file.readlines()
        for data in in_reader:
            if limit > 0:
                person_names.append(data.rstrip())
                limit -= 1
            else:
                break

    # URL of the POST request; the JSON header creates a linking
    # request for each line of text.
    url = "https://labs.tib.eu/falcon/falcon2/api?mode=long"
    headers = {'Content-type': 'application/json'}
    for name in person_names:
        data = {"text": name}
        data_json = json.dumps(data)
        response = requests.post(url, data=data_json, headers=headers)
        print(response.content)
The output is a URL such as http://www.wikidata.org/entity/Q42493 for each entity.

You can convert URLs of the form http://www.wikidata.org/entity/Q42493 to https://www.wikidata.org/wiki/Special:EntityData/Q42493.json to get a JSON payload with the information that you seek, but first you should make sure that the entity resolution algorithm is giving you accurate results so that you have the correct QID to start with.
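As a minimal sketch of how that lookup might look (the property ID P21 for "sex or gender" and the shape of the entity JSON follow Wikidata's documented conventions, but verify them against a real response):

```python
def entity_data_url(entity_url):
    """Turn an entity URL into its Special:EntityData JSON URL."""
    qid = entity_url.rstrip('/').rsplit('/', 1)[-1]
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"

def extract_info(payload, qid):
    """Pull the gender claim (P21) and English aliases from the entity payload."""
    entity = payload['entities'][qid]
    gender_claims = entity.get('claims', {}).get('P21', [])
    # The claim value is itself an item QID, e.g. Q6581097 = male, Q6581072 = female
    gender_qid = (gender_claims[0]['mainsnak']['datavalue']['value']['id']
                  if gender_claims else None)
    aliases = [a['value'] for a in entity.get('aliases', {}).get('en', [])]
    return gender_qid, aliases
```

Fetching `entity_data_url(...)` with `urllib.request.urlopen` (or `requests.get`) and passing the parsed JSON plus the QID to `extract_info` then yields the gender item's QID and the alias list.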

Related

Seleniumwire get Response text

I'm using selenium-wire to try to read the request response text of some network traffic. The code I have isn't fully reproducible, as the account is behind a paywall.
The bit of selenium-wire I'm currently using is:
for request in driver.requests:
    if request.method == 'POST' and request.headers['Content-Type'] == 'application/json':
        # The body is in bytes so convert to a string
        body = driver.last_request.body.decode('utf-8')
        # Load the JSON
        data = json.loads(body)
Unfortunately, though, that is reading the payload of the request, and I'm trying to parse the response.
You need to get last_request's response:
body = driver.last_request.response.body.decode('utf-8')
data = json.loads(body)
I usually use these 3 steps:
from seleniumwire.utils import decode

# Define scopes to skip POST requests that are not related;
# we can also use this to select only the required endpoints.
driver.scopes = [
    # .* is a regex that stands for any char, 0 or more times
    '.*stackoverflow.*',
    '.*github.*'
]

# Visit the page
driver.get('LINK')

# Get the last captured request
response = driver.last_request  # or driver.requests[-1]

# Decode the response body and parse the JSON
js = json.loads(
    decode(
        response.response.body,
        # Get the encoding from the response headers
        response.response.headers.get('Content-Encoding', 'identity'),
    )
)

# This clears all captured requests; a good idea after each visit to the page
del driver.requests
For more info, see the selenium-wire docs.

Etherscan api on ropsten for balance checking does not work

I have a problem with the Etherscan API on the Ropsten test network; the output of the code is: Expecting value: line 1 column 1 (char 0)
The code:
import requests, json

ADD = "0xfbb61B8b98a59FbC4bD79C23212AddbEFaEB289f"
KEY = "HERE THE API KEY"
REQ = requests.get(f"https://api-ropsten.etherscan.io/api?module=account&action=balance&address={ADD}&tag=latest&apikey={KEY}")
CONTENT = json.loads(REQ.content)
BALANCE = int(CONTENT['result'])
print(BALANCE)
When I try to do a request it gives back <Response [403]>.
Some websites don't allow Python scripts to access them by default. You can get around this by adding a user agent to your request.
The code would look something like this:
import requests, json

ADD = "0xfbb61B8b98a59FbC4bD79C23212AddbEFaEB289f"
KEY = "HERE THE API KEY"
LINK = f"https://api-ropsten.etherscan.io/api?module=account&action=balance&address={ADD}&tag=latest&apikey={KEY}"
# headers must be a dict mapping header names to values, not a set
headers = {"User-Agent": "HERE YOUR USER-AGENT"}
REQ = requests.get(LINK, headers=headers)
CONTENT = json.loads(REQ.content)
BALANCE = int(CONTENT['result'])
print(BALANCE)
To find your user agent, simply type into Google: my user agent
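The underlying error, "Expecting value: line 1 column 1 (char 0)", means json.loads was handed something that isn't JSON at all (here, the 403 error page). A small sketch of failing fast before parsing (parse_json_response is a hypothetical helper, not part of requests):

```python
import json

def parse_json_response(status_code, content):
    """Raise a clear error instead of a cryptic JSONDecodeError when
    the API returns an error page rather than a JSON body."""
    if status_code != 200:
        raise RuntimeError(f"API returned HTTP {status_code}: {content[:80]!r}")
    return json.loads(content)
```

With a real response this would be parse_json_response(REQ.status_code, REQ.content); requests' own REQ.raise_for_status() achieves much the same thing.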

Get specific attribute data from Json

I implemented code that returns a JSON value like this:
{ "key1": "1", "key2": "2", ... }
But I want to get only the value of key1, so I first defined info to fetch the JSON:
info = requests.get(url, headers=headers)
And used info.text['key1'] to get the value of key1, but I got an error. Could anyone suggest a solution?
JSON is a format, but the data arrives as text: inside info.text you have a string.
If you want to access the JSON data you can do info.json()['key1']. This will work only if the response content type is declared as JSON; to check that,
info.headers['content-type'] should be application/json; charset=utf8
http://docs.python-requests.org/en/master/
Otherwise you will have to manually load the text as JSON, for example with the json library:
import json
import requests

response = requests.get(url, headers=headers)
data = json.loads(response.text)
import json
import requests

info = requests.get(url, headers=headers)
# json.loads() needs the response text, not the Response object
jsonObject = json.loads(info.text)
key1 = jsonObject['key1']
or
jsonObject = info.json()
key1 = jsonObject['key1']
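A quick self-contained illustration of the difference, using the question's sample payload as a plain string (which is exactly what info.text holds):

```python
import json

raw = '{ "key1": "1", "key2": "2" }'  # a str, like info.text

# raw['key1'] would raise TypeError: string indices must be integers
data = json.loads(raw)  # parse the string into a dict
print(data['key1'])     # prints 1
```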

POST form data containing spaces with Python requests

I'm probably overlooking something spectacularly obvious, but I can't find why the following is happening.
I'm trying to POST a search query to http://www.arcade-museum.com using the requests lib and whenever the query contains spaces, the resulting page contains no results. Compare the result of these snippets:
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': '1942'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
    f.write(r.content)
and
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': 'Wonder Boy'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
    f.write(r.content)
If you try the same query on the website, the latter will result in a list of about 10 games. The same happens when posting the form data using the Postman REST client Chrome extension.
Again, it's probably something very obvious I'm overlooking, but I can't find what's causing this issue.
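One way to start debugging (a sketch using the standard library's urlencode, which matches what requests does to a dict payload) is to look at the exact body that gets sent. Form-encoding turns spaces into +, which you can compare against the request your browser sends in its dev tools:

```python
from urllib.parse import urlencode

payload = {'type': 'Videogame', 'q': 'Wonder Boy'}

# application/x-www-form-urlencoded (what requests.post sends for a dict)
# encodes the space as '+':
body = urlencode(payload)
print(body)  # type=Videogame&q=Wonder+Boy
```

If the browser's working request encodes the space differently (e.g. %20) or includes extra fields or headers, that difference is the place to start.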

download file from web service in python 3

I do see a few methods of downloading a file over HTTP/HTTPS in Python, but for all of them you need to know the exact URL. I'm trying to download from a web service where methods and POST arguments are sent in order to download the file, and I can't figure out what URL to send. This is the code snippet:
import urllib.parse
import urllib.request

url = 'https://www.example123.com'
params = {'user': 'username', 'pass': 'password', 'method': 'getproject', 'getPDF': 'true'}
data = urllib.parse.urlencode(params)
data = data.encode('utf-8')
request = urllib.request.Request(url, data)
response = urllib.request.urlopen(request)
xdata = response.read()
print(xdata)
The print statement looks as though it's reading the PDF, but I want to save the file somewhere and can't find a way to do that. Here is the beginning of the printed response:
b'%PDF-1.6\r%\xe2\xe3\xcf\xd3\r\n12 0 obj\r<</Lin
You have to open a file and write to it; right now you are just storing the data in a variable. Since response.read() returns bytes, open the file in binary mode:
with open('yourfile.pdf', 'wb') as f:
    f.write(xdata)
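A quick self-contained check of why binary mode matters here, using a fake PDF header in place of the downloaded bytes:

```python
import os
import tempfile

xdata = b'%PDF-1.6\r%\xe2\xe3\xcf\xd3'  # bytes, like urlopen(...).read()

path = os.path.join(tempfile.mkdtemp(), 'yourfile.pdf')
with open(path, 'wb') as f:  # mode 'w' would raise TypeError for bytes
    f.write(xdata)

with open(path, 'rb') as f:
    print(f.read() == xdata)  # prints True
```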
