I am having trouble understanding the concept of “API discovery” as used in Google products/services. Here’s some Python code that uses the said discovery service to access Google Cloud Vision:
from googleapiclient.discovery import build
from oauth2client.client import GoogleCredentials
…
API_DISCOVERY_FILE = 'https://vision.googleapis.com/$discovery/rest?version=v1'
hlh = httplib2.Http()
credentials = GoogleCredentials.get_application_default().create_scoped(
['https://www.googleapis.com/auth/cloud-platform'])
credentials.authorize(hlh)
service = build(serviceName='vision', version='v1', http=hlh, discoveryServiceUrl=API_DISCOVERY_FILE)
service_request = service.images().annotate(body={ <more JSON code here> })
Here’s another bit of Python code that also accesses Google Cloud Vision, but does not use API discovery and works just fine:
import requests
…
ENDPOINT_URL = 'https://vision.googleapis.com/v1/images:annotate'
response = requests.post(ENDPOINT_URL,
data=make_image_data(image_filenames),
params={'key': api_key},
headers={'Content-Type': 'application/json'})
What I can’t wrap my head around is this question: You need to know the details of the API that you are going to be calling so that you can tailor the call; this is obvious. So, how would API discovery help you at the time of the call, after you have already prepared the code for calling that API?
PS: I did look at the following resources prior to posting this question:
https://developers.google.com/discovery/v1/getting_started
https://developers.google.com/discovery/v1/using
I did see this answered question but would appreciate additional insight.
Note: I lack the knowledge of the Google API you are mentioning.
You need to know the details of the API that you are going to be
calling so that you can tailor the call; this is obvious.
In theory to me this is only required for the very first call, which would be the call to a starting service which would for instance list a number of resources. From there on you can have branches to underlying resources and their allowed methods (verbs if you will). So this example illustrated a tree like structure. If you provide a GUI to navigate through this discoverability in a generic way, a person would then be able to decide what to do.
In practice this is kind of hard to do once you have vast amounts of resources and all sorts of interrelationships between them.
Related
I am developing a DAG to be scheduled on Apache Airflow which main porpuse will be to post survey data (on json format) to an API and then getting a response (the answers to the surveys). Since this whole process is going to be automated, every part of it has to be programmed in the DAG, so I can´t use Postman or any similar app (unless there is a way to automate their usage, but I don't know if this is possible).
I was thinking of using the requests library for Python, and the function I've written for posting the json to the API looks like this:
def postFileToAPI(**context):
print('uploadFileToAPI() ------ ')
json_file = context['ti'].xcom_pull(task_ids='toJson') ## this pulls the json file from a previous task
print('--------------- Posting survey request to API')
r = requests.post('https://[request]', data = json_file)
(I haven't finished defining the http link for the request because my source data is incomplete.)
However, since this is my frst time working with APIs and the requests library, I don't know if this is enough. For example, I'm unsure if I need to provide a token from the API to perform the request.
I also don't know if there are other libraries that are better suited for this or that could be a good support.
In short: I don't know if what I'm doing will work as intended, what other information I need t provide my DAG or if there are any libraries to make my work easier.
The Python requests package that you're using is all you need, except if you're making a request that needs extra authorisation - then you should also import for example requests_jwt (then from requests_jwt import JWTAuth) if you're using JSON web tokens, or whatever relevant requests package corresponds for your authorisation style.
You make POST and GET requests and all individual requests separately.
Include the URL and data arguments as you have done and that should work!
You may also need headers and/or auth arguments to get through security,
eg for the GitLab api for a private repository you would include these extra arguments, where GITLAB_TOKEN is a GitLab web token.
```headers={'PRIVATE-TOKEN': GITLAB_TOKEN},
auth=JWTAuth(GITLAB_TOKEN)```
If you just try it it should work, if it doesn't work then test the API with curl requests directly in the Terminal, or let us know :)
I have code below that was given to me to list Google Cloud Service Accounts for a specific Project.
import os
from googleapiclient import discovery
from gcp import get_key
"""gets all Service Accounts from the Service Account page"""
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = get_key()
service = discovery.build('iam', 'v1')
project_id = 'projects/<google cloud project>'
request = service.projects().serviceAccounts().list(name=project_id)
response = request.execute()
accounts = response['accounts']
for account in accounts:
print(account['email'])
This code works perfectly and prints the accounts as I need them. What I'm trying to figure out is:
Where can I go to see how to construct code like this? I found a site that has references to the Python API Client, but I can't seem to figure out how to make the code above from it. I can see the Method to list the Service Accounts, but it's still not giving me enough information.
Is there somewhere else I should be going to educate myself. Any information you have is appreciated so I don't pull out the rest of my hair.
Thanks, Eric
Let me share with you this documentation page, where there is a detailed explanation on how to build a script such as the one you shared, and what does each line of code mean. It is extracted from the documentation of ML Engine, not IAM, but it is using the same Python Google API Client Libary, so just ignore the references to ML and the rest will be useful for you.
In any case, here it is a commented version of your code, so that you understand it better:
# Imports for the Client API Libraries and the key management
import os
from googleapiclient import discovery
from gcp import get_key
# Look for an environment variable containing the credentials for Google Cloud Platform
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = get_key()
# Build a Python representation of the REST API
service = discovery.build('iam', 'v1')
# Define the Project ID of your project
project_id = 'projects/<google cloud project>'
"""Until this point, the code is general to any API
From this point on, it is specific to the IAM API"""
# Create the request using the appropriate 'serviceAccounts' API
# You can substitute serviceAccounts by any other available API
request = service.projects().serviceAccounts().list(name=project_id)
# Execute the request that was built in the previous step
response = request.execute()
# Process the data from the response obtained with the request execution
accounts = response['accounts']
for account in accounts:
print(account['email'])
Once you understand the first part of the code, the last lines are specific to the API you are using, which in this case is the Google IAM API. In this link, you can find detailed information on the methods available and what they do.
Then, you can follow the Python API Client Library documentation that you shared in order to see how to call the methods. For instance, in the code you shared, the method used depends on service, which is the Python representation of the API, and then goes down the tree of methods in the last link as in projects(), then serviceAccounts() and finally the specificlist() method, which ends up in request = service.projects().serviceAccounts().list(name=project_id).
Finally, just in case you are interested in the other available APIs, please refer to this page for more information.
I hope the comments I made on your code were of help, and that the documentation shared makes it easier for you to understand how a code like that one could be scripted.
You can use ipython having googleapiclient installed - with something like:
sudo pip install --upgrade google-api-python-client
You can go to interactive python console and do:
from googleapiclient import discovery
dir(discovery)
help(discovery)
dir - gives all entries that object has - so:
a = ''
dir(a)
Will tell what you can do with string object. Doing help(a) will give help for string object. You can do dipper:
dir(discovery)
# and then for instance
help(discovery.re)
You can call your script in steps, and see what is result print it, do some research, having something - do %history to printout your session, and have solution that can be triggered as a script.
I have a tiny Flask server that is supposed to load data from a file and run a function on it. This function will return a DataFrame and I return the json version of it. Much to my surprise this all works nicely. However, how would I test this? I have included some attempts below but I don't understand Flask (nor REST) well enough yet:
#!/home/thomas/python
from flask import Flask
from flask.ext.restful import Resource, Api
app = Flask(__name__)
api = Api(app)
class UniverseAPI(Resource):
def get(self):
import pandas as pd
frame = pd.read_csv("//datasrv10//data$//AQ//test.csv", index_col=0, header=0)
return frame.to_json()
api.add_resource(UniverseAPI, '/data/universe')
I am happy to include a few of my attempts here... I appreciate any hints. I have read the official documentation.
I should specify what I mean with testing. I can run this on my linux server and can extract all the required information with the requests package. However, I want to create a unittest that comes without the need to start the server on the localhost. I think I have managed with the FLASK test-client. However, the problem now is that the requests response object and the flask response object treat the underlying json strings rather differently. So I guess my problem is more related to json string issues rather than FLASK. Thanks for all your helpful feedback though
Well, the basics of writing a REST API are essentially a set of design principles. My understanding of it is based on this article by Miguel Grinberg, http://blog.miguelgrinberg.com/post/designing-a-restful-api-with-python-and-flask .
In it, he talks about how a REST API is:
"Stateless" - All interactions with the service can happen using the information from one request.
Built upon accessing "resources" from URIs using HTTP requests like GET, PUTS, and POST. A resource could be an order in a store, a task in a web app, or whatever you like.
There's also a bunch of stuff about how the server should standardize all forms of communication between itself and the client, indicate whether it can do cacheing, and other stuff like that. From an initial design standpoint, though, this is "the point" as he put it:
"The task of designing a web service or API that adheres to the REST guidelines then becomes > an exercise in identifying the resources that will be exposed and how they will be affected > by the different request methods."
If you're looking for an interesting example of a REST API that might be suited to your interests (I know it is to mine), reddit's is open source. It's a relatable example to see how they try and structure the interactions behind requests: http://www.reddit.com/dev/api
I am new to Flask.
I have a public api, call it api.example.com.
#app.route('/api')
def api():
name = request.args.get('name')
...
return jsonify({'address':'100 Main'})
I am building an app on top of my public api (call it www.coolapp.com), so in another app I have:
#app.route('/make_request')
def index():
params = {'name':'Fred'}
r = requests.get('http://api.example.com', params=params)
return render_template('really_cool.jinja2',address=r.text)
Both api.example.com and www.coolapp.com are hosted on the same server. It seems inefficient the way I have it (hitting the http server when I could access the api directly). Is there a more efficient way for coolapp to access the api and still be able to pass in the params that api needs?
Ultimately, with an API powered system, it's best to hit the API because:
It's user testing the API (even though you're the user, it's what others still access);
You can then scale easily - put a pool of API boxes behind a load balancer if you get big.
However, if you're developing on the same box you could make a virtual server that listens on localhost on a random port (1982) and then forwards all traffic to your api code.
To make this easier I'd abstract the API_URL into a setting in your settings.py (or whatever you are loading in to Flask) and use:
r = requests.get(app.config['API_URL'], params=params)
This will allow you to make a single change if you find using this localhost method isn't for you or you have to move off one box.
Edit
Looking at your comments you are hoping to hit the Python function directly. I don't recommend doing this (for the reasons above - using the API itself is better). I can also see an issue if you did want to do this.
First of all we have to make sure the api package is in your PYTHONPATH. Easy to do, especially if you're using virtualenvs.
We from api import views and replace our code to have r = views.api() so that it calls our api() function.
Our api() function will fail for a couple of reasons:
It uses the flask.request to extract the GET arg 'name'. Because we haven't made a request with the flask WSGI we will not have a request to use.
Even if we did manage to pass the request from the front end through to the API the second problem we have is using the jsonify({'address':'100 Main'}). This returns a Response object with an application type set for JSON (not just the JSON itself).
You would have to completely rewrite your function to take into account the Response object and handle it correctly. A real pain if you do decide to go back to an API system again...
Depending on how you structure your code, your database access, and your functions, you can simply turn the other app into package, import the relevant modules and call the functions directly.
You can find more information on modules and packages here.
Please note that, as Ewan mentioned, there's some advantages to using the API. I would advise you to use requests until you actually need faster requests (this is probably premature optimization).
Another idea that might be worth considering, depending on your particular code, is creating a library that is used by both applications.
I've worked out how OAuth2 works (via https://developers.google.com/api-client-library/python/guide/aaa_oauth) and now have an OAuth2Credentials object (let's call the object credentials) that I want to use for Google Apps provisioning purposes (the example here is using sites, but could be any of the gdata apis)
If I try:
client = gdata.sites.client.SitesClient(site="test-site",domain='my.domain')
client = credentials.authorize(client)
I get
TypeError: new_request() got an unexpected keyword argument 'http_request'
when I try to do anything
If I try
client = gdata.sites.client.SitesClient(site="test-site",domain='my.domain', auth_token=credentials)
or
client = gdata.sites.client.SitesClient(site="test-site",domain='my.domain', auth_token=credentials.access_token)
I get an AttributeError that the relevant object (credentials or credentials.access_token) has no attribute 'modify_request'
Any ideas what I can try?
I'm not entirely sure about Google's client code, but you could always try (shameless plug) sanction. It's an OAuth 2.0 client I wrote a while ago available on Github and PyPI.
The upsides:
Being a whopping 55 LOC, it's tremendously easy to grok. If something goes wrong, you won't have to ask questions here.. You should be able to just understand what's going on ;)
It has been tested with 8 different providers (including Google)
The downsides:
Obviously would require a refactor of your current code
Doesn't assume (and therefore provide) persistence implementations
Doesn't provide API implementations (you have to have a basic understanding of the OAuth 2.0-exposed API that you're dealing with)
This answer says you have to monkeypatch the OAuth2Credentials object, before passing it to the SitesClient(auth_token=credentials). It has an answer showing you how to do the monkey patching