I wrote some code to get data from a web API. I was able to parse the JSON data from the API, but the result I get looks quite complex. Here is one example:
>>> my_json
{'name': 'ns1:timeSeriesResponseType', 'declaredType': 'org.cuahsi.waterml.TimeSeriesResponseType', 'scope': 'javax.xml.bind.JAXBElement$GlobalScope', 'value': {'queryInfo': {'creationTime': 1349724919000, 'queryURL': 'http://waterservices.usgs.gov/nwis/iv/', 'criteria': {'locationParam': '[ALL:103232434]', 'variableParam': '[00060, 00065]'}, 'note': [{'value': '[ALL:103232434]', 'title': 'filter:sites'}, {'value': '[mode=LATEST, modifiedSince=null]', 'title': 'filter:timeRange'}, {'value': 'sdas01', 'title': 'server'}]}}, 'nil': False, 'globalScope': True, 'typeSubstituted': False}
Looking through this data, I can see the specific data I want: the 1349724919000 value that is labelled as 'creationTime'.
How can I write code that directly gets this value?
I don't need any searching logic to find this value. I can see what I need when I look at the response; I just need to know how to translate that into specific code to extract the specific value, in a hard-coded way. I read some tutorials, so I understand that I need to use [] to access elements of the nested lists and dictionaries; but I can't figure out exactly how it works for a complex case.
More generally, how can I figure out what the "path" is to the data, and write the code for it?
For reference, let's see what the original JSON would look like, with pretty formatting:
>>> print(json.dumps(my_json, indent=4))
{
    "name": "ns1:timeSeriesResponseType",
    "declaredType": "org.cuahsi.waterml.TimeSeriesResponseType",
    "scope": "javax.xml.bind.JAXBElement$GlobalScope",
    "value": {
        "queryInfo": {
            "creationTime": 1349724919000,
            "queryURL": "http://waterservices.usgs.gov/nwis/iv/",
            "criteria": {
                "locationParam": "[ALL:103232434]",
                "variableParam": "[00060, 00065]"
            },
            "note": [
                {
                    "value": "[ALL:103232434]",
                    "title": "filter:sites"
                },
                {
                    "value": "[mode=LATEST, modifiedSince=null]",
                    "title": "filter:timeRange"
                },
                {
                    "value": "sdas01",
                    "title": "server"
                }
            ]
        }
    },
    "nil": false,
    "globalScope": true,
    "typeSubstituted": false
}
That lets us see the structure of the data more clearly.
In the specific case, first we want to look at the corresponding value under the 'value' key in our parsed data. That is another dict; we can access the value of its 'queryInfo' key in the same way, and similarly the 'creationTime' from there.
To get the desired value, we simply put those accesses one after another:
my_json['value']['queryInfo']['creationTime'] # 1349724919000
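More generally, to answer the "how do I figure out the path" part: a small helper that walks the parsed data and prints the bracketed path to every leaf can show you the exact chain of subscripts to write. This helper is a generic sketch of my own, not part of the original answer:

```python
def print_paths(obj, path=""):
    """Recursively print the access path to every leaf value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            print_paths(value, f"{path}[{key!r}]")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            print_paths(value, f"{path}[{index}]")
    else:
        print(f"{path} -> {obj!r}")

# a trimmed copy of the data above, just to demonstrate
print_paths({'value': {'queryInfo': {'creationTime': 1349724919000}}})
# ['value']['queryInfo']['creationTime'] -> 1349724919000
```

Each printed line is a ready-made expression you can append to the variable holding the parsed data.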
If you access the API again, the new data might not match the code's expectation. You may find it useful to add some error handling. For example, use .get() to access dictionaries in the data, rather than indexing:
name = my_json.get('name') # will return None if 'name' doesn't exist
Another way is to test for a key explicitly:
if 'name' in my_json:
    name = my_json['name']
However, these approaches may fail if further accesses are required. A placeholder result of None isn't a dictionary or a list, so attempts to access it that way will fail again (with TypeError). Since "Simple is better than complex" and "it's easier to ask for forgiveness than permission", the straightforward solution is to use exception handling:
try:
    creation_time = my_json['value']['queryInfo']['creationTime']
except (TypeError, KeyError):
    print("could not read the creation time!")
    # or substitute a placeholder, or raise a new exception, etc.
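As an alternative to the try/except, chaining .get() with empty-dict defaults also survives missing keys at any depth; this idiom is my addition, not part of the original answer:

```python
# trimmed copy of the response data shown above
my_json = {'value': {'queryInfo': {'creationTime': 1349724919000}}}

# empty-dict defaults keep each .get() chainable even when a key is missing
creation_time = my_json.get('value', {}).get('queryInfo', {}).get('creationTime')
print(creation_time)  # 1349724919000
```

The trade-off is that a missing key and an explicit None value become indistinguishable, so the try/except form is usually clearer when you need to report errors.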
Here is an example of loading a single value from simple JSON data, and converting back and forth to JSON:
import json

# a simple dictionary
data = {"test1": "1", "test2": "2", "test3": "3"}
# serialize the dictionary to a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a dictionary
resp = json.loads(json_str)
# print the parsed data
print(resp)
# extract one element from it
print(resp['test1'])
Try this.
Here, I fetch only statecode from the COVID API (a JSON array).
import requests
r = requests.get('https://api.covid19india.org/data.json')
x = r.json()['statewise']
for i in x:
    print(i['statecode'])
Try this:
from functools import reduce
import re

def deep_get_imps(data, key: str):
    split_keys = re.split("[\\[\\]]", key)
    out_data = data
    for split_key in split_keys:
        if split_key == "":
            return out_data
        elif isinstance(out_data, dict):
            out_data = out_data.get(split_key)
        elif isinstance(out_data, list):
            try:
                sub = int(split_key)
            except ValueError:
                return None
            else:
                length = len(out_data)
                out_data = out_data[sub] if -length <= sub < length else None
        else:
            return None
    return out_data

def deep_get(dictionary, keys):
    return reduce(deep_get_imps, keys.split("."), dictionary)
Then you can use it like below:
res = {
    "status": 200,
    "info": {
        "name": "Test",
        "date": "2021-06-12"
    },
    "result": [
        {
            "name": "test1",
            "value": 2.5
        },
        {
            "name": "test2",
            "value": 1.9
        },
        {
            "name": "test1",
            "value": 3.1
        }
    ]
}
>>> deep_get(res, "info")
{'name': 'Test', 'date': '2021-06-12'}
>>> deep_get(res, "info.date")
'2021-06-12'
>>> deep_get(res, "result")
[{'name': 'test1', 'value': 2.5}, {'name': 'test2', 'value': 1.9}, {'name': 'test1', 'value': 3.1}]
>>> deep_get(res, "result[2]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[-1]")
{'name': 'test1', 'value': 3.1}
>>> deep_get(res, "result[2].name")
'test1'
How do I get the value of a dict item within a list, within a dict in Python? Please see the following code for an example of what I mean.
I use the following lines of code in Python to get data from an API.
res = requests.get('https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/', params)
data = res.json()
data then returns the following Python dictionary:
{
    '_links': {
        'next': {
            'href': None
        },
        'previous': {
            'href': None
        },
        'self': {
            'href': 'https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/'
        }
    },
    'count': 1,
    'results': [
        {
            '_display': 'Maple Street 99',
            '_links': {
                'self': {
                    'href': 'https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/XXXXXXXXXXXXXXXX/'
                }
            },
            'dataset': 'bag',
            'landelijk_id': 'XXXXXXXXXXXXXXXX',
            'type_adres': 'Hoofdadres',
            'vbo_status': 'Verblijfsobject in gebruik'
        }
    ]
}
Using Python, how do I get the value for 'landelijk_id', represented here by the sixteen Xs?
This should work:
>>> data['results'][0]['landelijk_id']
'XXXXXXXXXXXXXXXX'
You can just chain those [] for each child you need to access.
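If count is ever greater than 1, a comprehension over results avoids hard-coding the [0] index. A sketch with made-up sample values in the same shape as the API response:

```python
# sample data shaped like the API response above (IDs are invented)
data = {
    'count': 2,
    'results': [
        {'_display': 'Maple Street 99', 'landelijk_id': 'AAAA1111BBBB2222'},
        {'_display': 'Oak Lane 7', 'landelijk_id': 'CCCC3333DDDD4444'},
    ],
}

# collect the landelijk_id of every result
ids = [result['landelijk_id'] for result in data['results']]
print(ids)  # ['AAAA1111BBBB2222', 'CCCC3333DDDD4444']
```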
I'd recommend using the jmespath package to make handling nested Dictionaries easier. https://pypi.org/project/jmespath/
import jmespath
import requests
res = requests.get('https://api.data.amsterdam.nl/bag/v1.1/nummeraanduiding/', params)
data = res.json()
print(jmespath.search('results[].landelijk_id', data))
I'm trying to hit my geocoding server's REST API:
[https://locator.stanford.edu/arcgis/rest/services/geocode/USA_StreetAddress/GeocodeServer] (ArcGIS Server 10.6.1)
...using the POST method (which, BTW, could use an example or two, there only seems to be this VERY brief "note" on WHEN to use POST, not HOW: https://developers.arcgis.com/rest/geocode/api-reference/geocoding-geocode-addresses.htm#ESRI_SECTION1_351DE4FD98FE44958C8194EC5A7BEF7D).
I'm trying to use requests.post(), and I think I've managed to get the token accepted, etc..., but I keep getting a 400 error.
Based upon previous experience, this means something about the formatting of the data is bad, but I've cut-&-pasted directly from the Esri support site, this test pair.
# import the requests library
import requests
# Multiple address records
addresses = {
    "records": [
        {
            "attributes": {
                "OBJECTID": 1,
                "Street": "380 New York St.",
                "City": "Redlands",
                "Region": "CA",
                "ZIP": "92373"
            }
        },
        {
            "attributes": {
                "OBJECTID": 2,
                "Street": "1 World Way",
                "City": "Los Angeles",
                "Region": "CA",
                "ZIP": "90045"
            }
        }
    ]
}
# Parameters
# Geocoder endpoint
URL = 'https://locator.stanford.edu/arcgis/rest/services/geocode/USA_StreetAddress/GeocodeServer/geocodeAddresses?'
# token from locator.stanford.edu/arcgis/tokens
mytoken = <GeneratedToken>
# output spatial reference id
outsrid = 4326
# output format
format = 'pjson'
# params data to be sent to api
params ={'outSR':outsrid,'f':format,'token':mytoken}
# Use POST to batch geocode
r = requests.post(url=URL, data=addresses, params=params)
print(r.json())
print(r.text)
Here's what I consistently get:
{'error': {'code': 400, 'message': 'Unable to complete operation.', 'details': []}}
I had to play around with this for longer than I'd like to admit, but the trick (I guess) is to use the correct request header and convert the raw addresses to a JSON string using json.dumps().
import requests
import json
url = 'http://sampleserver6.arcgisonline.com/arcgis/rest/services/Locators/SanDiego/GeocodeServer/geocodeAddresses'
headers = { 'Content-Type': 'application/x-www-form-urlencoded' }
addresses = json.dumps({ 'records': [{ 'attributes': { 'OBJECTID': 1, 'SingleLine': '2920 Zoo Dr' }}] })
r = requests.post(url, headers = headers, data = { 'addresses': addresses, 'f':'json'})
print(r.text)
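To see why posting the raw nested dict fails, you can mimic the form encoding with the standard library (an illustrative sketch; `requests` encodes `data=` dictionaries in roughly this way):

```python
import json
from urllib.parse import urlencode

addresses = {'records': [{'attributes': {'OBJECTID': 1, 'SingleLine': '2920 Zoo Dr'}}]}

# Form-encoding the nested dict directly turns the inner structure into
# a percent-encoded Python repr, which the server cannot parse as JSON:
print(urlencode(addresses))

# Serialising with json.dumps() first keeps the payload as valid JSON text
# inside the 'addresses' form field, which is what the endpoint expects:
print(urlencode({'addresses': json.dumps(addresses), 'f': 'json'}))
```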
I'm working on a project in Python (3.6) in which I have used GitHub's JSON API via the requests package. I'm getting the list of public repos for a search term provided by the user. The API returns a response object with information about each repo, but I also need the last commit info for every repo. How can I get that from the GitHub API?
Here's how I have implemented this:
class GhNavigator(CreateView):
    def get(self, request, *args, **kwargs):
        term = request.GET.get('search_term')
        username = 'arycloud'
        token = 'API_TOKEN'
        login = requests.get('https://api.github.com/search/repositories?q=' + term, auth=(username, token))
        response = login.json()
        print(response)
        return render(request, 'navigator/template.html', {'response': response, 'term': term})
Here's a sample response:
{'total_count': 4618, 'incomplete_results': False, 'items': [{'id': 6750871, 'name': 'arrow', 'full_name': 'crsmithdev/arrow', 'owner': {'login': 'crsmithdev', 'id': 1596037, 'avatar_url': 'https://avatars1.githubusercontent.com/u/1596037?v=4', 'gravatar_id': '', 'url': 'https://api.github.com/users/crsmithdev', 'html_url': 'https://github.com/crsmithdev', 'followers_url': 'https://api.github.com/users/crsmithdev/followers', 'following_url': 'https://api.github.com/users/crsmithdev/following{/other_user}', 'gists_url': 'https://api.github.com/users/crsmithdev/gists{/gist_id}', 'starred_url': 'https://api.github.com/users/crsmithdev/starred{/owner}{/repo}', 'subscriptions_url': 'https://api.github.com/users/crsmithdev/subscriptions', 'organizations_url': 'https://api.github.com/users/crsmithdev/orgs', 'repos_url': 'https://api.github.com/users/crsmithdev/repos', 'events_url': 'https://api.github.com/users/crsmithdev/events{/privacy}', 'received_events_url': 'https://api.github.com/users/crsmithdev/received_events', 'type': 'User', 'site_admin': False}, 'private': False, 'html_url': 'https://github.com/crsmithdev/arrow', 'description': 'Better dates & times for Python', 'fork': False, 'url': 'https://api.github.com/repos/crsmithdev/arrow', 'forks_url': 'https://api.github.com/repos/crsmithdev/arrow/forks', 'keys_url': 'https://api.github.com/repos/crsmithdev/arrow/keys{/key_id}', 'collaborators_url': 'https://api.github.com/repos/crsmithdev/arrow/collaborators{/collaborator}', 'teams_url': 'https://api.github.com/repos/crsmithdev/arrow/teams', 'hooks_url': 'https://api.github.com/repos/crsmithdev/arrow/hooks', 'issue_events_url': 'https://api.github.com/repos/crsmithdev/arrow/issues/events{/number}', 'events_url': 'https://api.github.com/repos/crsmithdev/arrow/events', 'assignees_url': 'https://api.github.com/repos/crsmithdev/arrow/assignees{/user}', 'branches_url': 'https://api.github.com/repos/crsmithdev/arrow/branches{/branch}', 'tags_url': 
'https://api.github.com/repos/crsmithdev/arrow/tags', 'blobs_url': 'https://api.github.com/repos/crsmithdev/arrow/git/blobs{/sha}', 'git_tags_url': 'https://api.github.com/repos/crsmithdev/arrow/git/tags{/sha}', 'git_refs_url': 'https://api.github.com/repos/crsmithdev/arrow/git/refs{/sha}', 'trees_url': 'https://api.github.com/repos/crsmithdev/arrow/git/trees{/sha}', 'statuses_url': 'https://api.github.com/repos/crsmithdev/arrow/statuses/{sha}', 'languages_url': 'https://api.github.com/repos/crsmithdev/arrow/languages', 'stargazers_url': 'https://api.github.com/repos/crsmithdev/arrow/stargazers', 'contributors_url': 'https://api.github.com/repos/crsmithdev/arrow/contributors', 'subscribers_url': 'https://api.github.com/repos/crsmithdev/arrow/subscribers', 'subscription_url': 'https://api.github.com/repos/crsmithdev/arrow/subscription', 'commits_url': 'https://api.github.com/repos/crsmithdev/arrow/commits{/sha}', 'git_commits_url': 'https://api.github.com/repos/crsmithdev/arrow/git/commits{/sha}', 'comments_url': 'https://api.github.com/repos/crsmithdev/arrow/comments{/number}', 'issue_comment_url': 'https://api.github.com/repos/crsmithdev/arrow/issues/comments{/number}', 'contents_url': 'https://api.github.com/repos/crsmithdev/arrow/contents/{+path}', 'compare_url': 'https://api.github.com/repos/crsmithdev/arrow/compare/{base}...{head}', 'merges_url': 'https://api.github.com/repos/crsmithdev/arrow/merges', 'archive_url': 'https://api.github.com/repos/crsmithdev/arrow/{archive_format}{/ref}', 'downloads_url': 'https://api.github.com/repos/crsmithdev/arrow/downloads', 'issues_url': 'https://api.github.com/repos/crsmithdev/arrow/issues{/number}', 'pulls_url': 'https://api.github.com/repos/crsmithdev/arrow/pulls{/number}', 'milestones_url': 'https://api.github.com/repos/crsmithdev/arrow/milestones{/number}', 'notifications_url': 'https://api.github.com/repos/crsmithdev/arrow/notifications{?since,all,participating}', 'labels_url': 
'https://api.github.com/repos/crsmithdev/arrow/labels{/name}', 'releases_url': 'https://api.github.com/repos/crsmithdev/arrow/releases{/id}', 'deployments_url': 'https://api.github.com/repos/crsmithdev/arrow/deployments', 'created_at': '2012-11-18T20:23:27Z', 'updated_at': '2018-05-24T11:23:14Z', 'pushed_at': '2018-05-21T09:03:24Z', 'git_url': 'git://github.com/crsmithdev/arrow.git', 'ssh_url': 'git#github.com:crsmithdev/arrow.git', 'clone_url': 'https://github.com/crsmithdev/arrow.git', 'svn_url': 'https://github.com/crsmithdev/arrow', 'homepage': 'https://arrow.readthedocs.org', 'size': 1454, 'stargazers_count': 4999, 'watchers_count': 4999, 'language': 'Python', 'has_issues': True, 'has_projects': True, 'has_downloads': True, 'has_wiki': True, 'has_pages': True, 'forks_count': 416, 'mirror_url': None, 'archived': False, 'open_issues_count': 123, 'license': {'key': 'other', 'name': 'Other', 'spdx_id': None, 'url': None}, 'forks': 416, 'open_issues': 123, 'watchers': 4999, 'default_branch': 'master', 'permissions': {'admin': False, 'push': False, 'pull': True}, 'score': 126.08792}]}
Get the last commit from multiple repositories
Use the GraphQL API v4 to perform your search request, and get the most recent commit with history(first: 1) while iterating over the repositories. The GraphQL query:
{
  search(query: "language:python", type: REPOSITORY, first: 100) {
    edges {
      node {
        ... on Repository {
          defaultBranchRef {
            target {
              ... on Commit {
                history(first: 1) {
                  nodes {
                    message
                    committedDate
                    authoredDate
                    oid
                    author {
                      email
                      name
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}
In Python:
import json
import requests
access_token = "YOUR_TOKEN"
query = """
{
  search(query: "language:python", type: REPOSITORY, first: 100) {
    edges {
      node {
        ... on Repository {
          defaultBranchRef {
            target {
              ... on Commit {
                history(first: 1) {
                  nodes {
                    message
                    committedDate
                    authoredDate
                    oid
                    author {
                      email
                      name
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
    pageInfo {
      endCursor
      hasNextPage
    }
  }
}
"""
data = {'query': query.replace('\n', ' ')}
headers = {'Authorization': 'token ' + access_token, 'Content-Type': 'application/json'}
r = requests.post('https://api.github.com/graphql', headers=headers, json=data)
print(json.loads(r.text)['data']['search']['edges'])
You will then need to go through the pagination: if hasNextPage is true, send the next request with after: "END_CURSOR_VALUE", using the endCursor value from the previous response.
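The cursor loop can be sketched like this, with a run_query(cursor) stand-in for the actual GraphQL POST (the helper name and the stubbed pages are my own, for illustration):

```python
def paginate(run_query):
    """Collect edges from every page by following endCursor until exhausted."""
    edges, cursor = [], None
    while True:
        search = run_query(cursor)  # POST the query with after: cursor
        edges.extend(search['edges'])
        page = search['pageInfo']
        if not page['hasNextPage']:
            return edges
        cursor = page['endCursor']

# Stub standing in for the real request, to show the control flow:
pages = [
    {'edges': ['repo1', 'repo2'], 'pageInfo': {'hasNextPage': True, 'endCursor': 'abc'}},
    {'edges': ['repo3'], 'pageInfo': {'hasNextPage': False, 'endCursor': None}},
]
print(paginate(lambda cursor: pages[0] if cursor is None else pages[1]))
# ['repo1', 'repo2', 'repo3']
```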
Get the last commit from a single repository
You can use the list-commits-on-a-repository API and only return the first element with per_page=1, since the first is the most recent one. If you don't specify the sha parameter it will use the default branch:
https://api.github.com/repos/torvalds/linux/commits?per_page=1
Using Rest API v3 :
import requests
repo = 'torvalds/linux'
r = requests.get('https://api.github.com/repos/{0}/commits?per_page=1'.format(repo))
commit = r.json()[0]["commit"]
print(commit)
And if you want to use GraphQL API v4, you can do the following:
import json
import requests
access_token = "YOUR_TOKEN"
query = """
{
  repository(owner: "torvalds", name: "linux") {
    defaultBranchRef {
      target {
        ... on Commit {
          history(first: 1) {
            nodes {
              message
              committedDate
              authoredDate
              oid
              author {
                email
                name
              }
            }
          }
        }
      }
    }
  }
}
"""
data = {'query': query.replace('\n', ' ')}
headers = {'Authorization': 'token ' + access_token, 'Content-Type': 'application/json'}
r = requests.post('https://api.github.com/graphql', headers=headers, json=data)
print(json.loads(r.text)['data']['repository']['defaultBranchRef']['target']['history']['nodes'][0])
You can also consider the last push event: that would represent the last and most recent commit done (on any branch), pushed by a user to this repo.
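A sketch of that idea: list the repository's recent events and keep the first PushEvent. The commented request line shows where the live call would go; the sample list below is made up, just to demonstrate the filtering:

```python
def last_push_event(events):
    """Return the most recent PushEvent from a repo's event list, or None."""
    return next((e for e in events if e['type'] == 'PushEvent'), None)

# events = requests.get('https://api.github.com/repos/torvalds/linux/events').json()
events = [{'type': 'WatchEvent', 'id': '2'}, {'type': 'PushEvent', 'id': '1'}]
print(last_push_event(events))  # {'type': 'PushEvent', 'id': '1'}
```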
I'm trying to add a timestamp to my data, have elasticsearch-py bulk index it, and then display the data with kibana.
My data is showing up in kibana, but my timestamp is not being used. When I go to the "Discovery" tab after configuring my index pattern, I get 0 results (yes, I tried adjusting the search time).
Here is what my bulk index json looks like:
{'index':
    {'_timestamp': u'2015-08-11 14:18:26',
     '_type': 'webapp_fingerprint',
     '_id': u'webapp_id_redacted_2015_08_13_12_39_34',
     '_index': 'webapp_index'
    }
}
****JSON DATA HERE***
This will be accepted by elasticsearch and will get imported into Kibana, but the _timestamp field will not actually be indexed (it does show up in the dropdown when configuring an index pattern under "Time-field name").
I also tried formatting the metaFields like this:
{'index': {
    '_type': 'webapp_fingerprint',
    '_id': u'webapp_id_redacted_2015_08_13_12_50_04',
    '_index': 'webapp_index'
 },
 'source': {
     '_timestamp': {
         'path': u'2015-08-11 14:18:26',
         'enabled': True,
         'format': 'YYYY-MM-DD HH:mm:ss'
     }
 }
}
This also doesn't work.
Finally, I tried including the _timestamp field within the index and applying the format, but I got an error with elasticsearch.
{'index': {
    '_timestamp': {
        'path': u'2015-08-11 14:18:26',
        'enabled': True,
        'format': 'YYYY-MM-DD HH:mm:ss'
    },
    '_type': 'webapp_fingerprint',
    '_id': u'webapp_id_redacted_2015_08_13_12_55_53',
    '_index': 'webapp_index'
}
}
The error is:
elasticsearch.exceptions.TransportError:
TransportError(500,u'IllegalArgumentException[Malformed action/metadata
line [1], expected a simple value for field [_timestamp] but found [START_OBJECT]]')
Any help someone can provide would be greatly appreciated. I apologize if I haven't explained the issue well enough. Let me know if I need to clarify more. Thanks.
Fixed my own problem. Basically, I needed to add mappings for the timestamp when I created the index.
request_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "_default_": {
            "_timestamp": {
                "enabled": "true",
                "store": "true",
                "path": "plugins.time_stamp.string",
                "format": "yyyy-MM-dd HH:mm:ss"
            }
        }
    }
}
print("creating '%s' index..." % (index_name))
res = es.indices.create(index=index_name, body=request_body)
print(" response: '%s'" % (res))
In the latest versions of Elasticsearch, just using the PUT/POST API and ISOFORMAT strings should work.
import datetime
import json

import requests

query = json.dumps(
    {
        "createdAt": datetime.datetime.now().replace(microsecond=0).isoformat(),
    }
)
response = requests.post("https://search-XYZ.com/your-index/log", data=query,
                         headers={'Content-Type': 'application/json'})
print(response)