How to use BigQuery streaming insertAll on App Engine & Python

I would like to develop an App Engine application that streams data directly into a BigQuery table.
According to Google's documentation there is a simple way to stream data into BigQuery:
http://googlecloudplatform.blogspot.co.il/2013/09/google-bigquery-goes-real-time-with-streaming-inserts-time-based-queries-and-more.html
https://developers.google.com/bigquery/streaming-data-into-bigquery#streaminginsertexamples
(note: in the above link you should select the Python tab, not the Java one)
Here is the sample code snippet showing how a streaming insert should be coded:
body = {"rows":[
{"json": {"column_name":7.7,}}
]}
response = bigquery.tabledata().insertAll(
projectId=PROJECT_ID,
datasetId=DATASET_ID,
tableId=TABLE_ID,
body=body).execute()
Although I've downloaded the client API, I didn't find any reference to the bigquery module/object used in Google's example above.
Where should the bigquery object (from the snippet) come from?
Can anyone show a more complete way to use this snippet (with the right imports)?
I've been searching for this a lot and found the documentation confusing and partial.

Minimal working example (as long as you fill in the right IDs for your project):
import httplib2
from apiclient import discovery
from oauth2client import appengine

_SCOPE = 'https://www.googleapis.com/auth/bigquery'

# Change the following 3 values:
PROJECT_ID = 'your_project'
DATASET_ID = 'your_dataset'
TABLE_ID = 'TestTable'

body = {"rows": [
    {"json": {"Col1": 7}}
]}

credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)

response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body).execute()

print response
As Jordan says: "Note that this uses the App Engine robot account to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset. Note that if you also want to use the robot to run queries, not just stream, the robot needs to be a member of the project 'team' so that it is authorized to run jobs."
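One thing worth adding (a small sketch, not part of the original snippet): insertAll reports per-row failures inside the response body rather than raising an exception, so it is worth inspecting insertErrors after the call:
import logging

# Streaming inserts can succeed at the HTTP level while individual rows
# fail; those failures come back under 'insertErrors' in the response.
errors = response.get('insertErrors', [])
for err in errors:
    logging.error('Row %d was rejected: %s', err['index'], err['errors'])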

Here is a working code example from an App Engine app that streams records to a BigQuery table. It is open source at code.google.com:
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/main.py#124
To find out where the bigquery object comes from, see
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/config.py
Note that this uses the App Engine robot account to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset.
Note that if you also want to use the robot to run queries, not just stream, the robot needs to be a member of the project 'team' so that it is authorized to run jobs.

Related

How can I see the service account that the Python BigQuery client uses?

To create a default bigquery client I use:
from google.cloud import bigquery
client = bigquery.Client()
This uses the (default) credentials available in the environment.
But how can I then see which (default) service account is being used?
While you can interrogate the credentials directly (be they JSON keys, the metadata server, etc.), I have occasionally found it valuable to simply query BigQuery using the SESSION_USER() function.
Something quick like this should suffice:
client = bigquery.Client()
query_job = client.query("SELECT SESSION_USER() as whoami")
results = query_job.result()
for row in results:
    print("i am {}".format(row.whoami))
This led me in the right direction:
Google BigQuery Python Client using the wrong credentials
To see the service-account used you can do:
client._credentials.service_account_email
However:
The statement above works when you run it in a Jupyter notebook (in Vertex AI), but when you run it in a Cloud Function with print(client._credentials.service_account_email), it just logs 'default' to Cloud Logging. The default service account for a Cloud Function, however, should be <project_id>@appspot.gserviceaccount.com.
This will also give you the wrong answer:
client.get_service_account_email()
The call to client.get_service_account_email() does not return the credential's service account email address. Instead, it returns the BigQuery service account email address used for KMS encryption/decryption.
Following John Hanley's comment (when running on Compute Engine), you can query the metadata service to get the service account's email:
https://cloud.google.com/compute/docs/metadata/default-metadata-values
So you can either use curl:
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email" -H "Metadata-Flavor: Google"
Or Python:
import requests

headers = {'Metadata-Flavor': 'Google'}
response = requests.get(
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email",
    headers=headers
)
print(response.text)
The default in the URL is an alias for the actual service account in use.
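A related sketch, assuming the google-auth library on a metadata-backed runtime (Compute Engine, Cloud Functions, etc.): refreshing the application default credentials makes the library resolve the 'default' alias into the real email:
import google.auth
import google.auth.transport.requests

# Refreshing asks the metadata server for a token, which also resolves
# the 'default' alias into the actual service account email.
credentials, project_id = google.auth.default()
credentials.refresh(google.auth.transport.requests.Request())
print(getattr(credentials, 'service_account_email', None))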

Python, Google Service Object in Pandas dataframe?

We have an online company. The website contains many features and we would like to analyse which customers visit which sites, and how many times.
PROBLEM:
I am trying to write a program that should use certain Google Analytics data to create an HTML table (using pandas), that can be viewed anytime with the most recent Google Analytics data.
WHAT I HAVE DONE:
I have managed to get authenticated and have all permissions (I believe so, because I haven't received a permission error message yet) and get back a service object, which I don't know how to use or open.
#!/usr/bin/env python3
"""Script that does the following:
1) Initialise a Google Analytics Reporting API service object
"""
import os
import argparse

from apiclient.discovery import build
import httplib2
from oauth2client import client
from oauth2client import file
from oauth2client import tools
import yaml
import pandas as pd

scopes = ['https://www.googleapis.com/auth/analytics.readonly']

# Path to client_secrets.json file.
client_secrets_path = 'credentials/client_secret_xx.apps.googleusercontent.com.json'


def initialise_analyticsreporting():
    """Initializes the analyticsreporting service object.

    Returns:
        an authorized analyticsreporting service object.
    """
    # Parse command-line arguments.
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        parents=[tools.argparser])
    flags = parser.parse_args([])

    # Set up a Flow object to be used if we need to authenticate.
    flow = client.flow_from_clientsecrets(
        client_secrets_path,
        scope=scopes,
        message=tools.message_if_missing(client_secrets_path))

    # Prepare credentials, and authorize the HTTP object with them.
    # If the credentials don't exist or are invalid, run through the native
    # client flow. The Storage object will ensure that, if successful, the
    # good credentials get written back to a file.
    storage = file.Storage('credentials/analyticsreporting.dat')
    credentials = storage.get()
    if credentials is None or credentials.invalid:
        credentials = tools.run_flow(flow, storage, flags)
    http = credentials.authorize(http=httplib2.Http())

    # Build the service object.
    analytics = build('analyticsreporting', 'v4', http=http)
    return analytics
This returns analytics, looking like this <googleapiclient.discovery.Resource object at 0x00000XOXOXOXOX>
At the end of the day, I would just like to have the Google Analytics data in a pandas data frame so that I can manipulate and work with it. I am no expert with Google Analytics. This is crucial for our business; any help would be appreciated. I am really scratching my head.
EXPECTED OUTPUT (to serve as a guideline for what I want to achieve; I am fairly skilled with pandas, the problem is getting the data from GA):
user_id    site                  visits
123        abc.com/something     12
234        abc.com/smthngelse    7
Thanks, I am happy to answer questions
Your analytics object is just a service object - you can use it to access the methods that return the data, but it does not, by itself, contain Google Analytics data. As you are using version 4 of the core reporting API, you can just look at this example from the documentation:
def get_report(analytics):
    # Use the Analytics Service Object to query the Analytics Reporting API V4.
    return analytics.reports().batchGet(
        body={
            'reportRequests': [
                {
                    'viewId': VIEW_ID,
                    'dateRanges': [{'startDate': '7daysAgo', 'endDate': 'today'}],
                    'metrics': [{'expression': 'ga:sessions'}]
                }]
        }
    ).execute()
Change the metrics and add dimensions to your liking (not every combination works or makes sense, though), enter your view ID and you should be good to go.
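To get from that response to the pandas data frame you asked for, here is a hedged sketch of flattening the v4 report structure; report_to_dataframe is a hypothetical helper and the exact columns depend on the dimensions and metrics you request:
import pandas as pd

def report_to_dataframe(response):
    # Flatten the first report in a v4 batchGet response into rows of
    # {dimension_name: value, metric_name: value} dictionaries.
    report = response['reports'][0]
    header = report['columnHeader']
    dimensions = header.get('dimensions', [])
    metrics = [m['name'] for m in header['metricHeader']['metricHeaderEntries']]
    rows = []
    for row in report['data'].get('rows', []):
        record = dict(zip(dimensions, row.get('dimensions', [])))
        record.update(zip(metrics, row['metrics'][0]['values']))
        rows.append(record)
    return pd.DataFrame(rows)

analytics = initialise_analyticsreporting()
df = report_to_dataframe(get_report(analytics))
print(df.head())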

Trying to authenticate Google Cloud Platform (GCP) to use the Speech-to-Text API

I am trying to use the Google Cloud Platform (GCP) Speech-to-Text API in Python, but for some reason I can't seem to get access to GCP to use the API. How do I authenticate my credentials?
I have tried to follow the instructions provided by Google to authenticate my credentials, but I am just so lost, as nothing seems to be working.
I have created a GCP project, set up billing information, enabled the API and created a service account without any problems.
I have tried to set my environment using the command line to set GOOGLE_APPLICATION_CREDENTIALS=[PATH]
and then run the following code, which is taken straight from the Google tutorial page:
def transcribe_streaming(stream_file):
    """Streams transcription of the given audio file."""
    import io

    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    with io.open(stream_file, 'rb') as audio_file:
        content = audio_file.read()

    # In practice, stream should be a generator yielding chunks of audio data.
    stream = [content]
    requests = (types.StreamingRecognizeRequest(audio_content=chunk)
                for chunk in stream)

    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code='en-US')
    streaming_config = types.StreamingRecognitionConfig(config=config)

    # streaming_recognize returns a generator.
    responses = client.streaming_recognize(streaming_config, requests)

    for response in responses:
        # Once the transcription has settled, the first result will contain the
        # is_final result. The other results will be for subsequent portions of
        # the audio.
        for result in response.results:
            print('Finished: {}'.format(result.is_final))
            print('Stability: {}'.format(result.stability))
            alternatives = result.alternatives
            # The alternatives are ordered from most likely to least.
            for alternative in alternatives:
                print('Confidence: {}'.format(alternative.confidence))
                print(u'Transcript: {}'.format(alternative.transcript))
I get the following error message:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
You can also set the credentials directly in your script:
from google.cloud import speech
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/your/credentials.json")
client = speech.SpeechClient(credentials=credentials)
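Alternatively, a sketch of keeping application default credentials but pointing the environment variable at the key file from inside the script (the path is a placeholder); this must run before the client is created:
import os

# Must be set before speech.SpeechClient() is instantiated.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/path/to/your/credentials.json'

from google.cloud import speech
client = speech.SpeechClient()  # now picks up the key file automatically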

BigQuery on Google Cloud: how to share a dataset with other users using Python?

I have a BigQuery dataset defined in Google Cloud with my userA account, and I want my colleague userB, who is a member of the same group, to be able to see the dataset that I have defined. Using the bq command-line interface, userB can see the project but not the dataset. How can I share the dataset created by userA with userB using a Python script?
Another thing you may run into is that you must grant access at the dataset level in BigQuery. Depending on how you have set up user roles in Cloud Platform and BigQuery, you may need to give the service account direct access to the BigQuery dataset.
To do this, go into BigQuery, hover over your dataset, click the down arrow and select 'Share dataset'. A modal will open where you can then specify which email addresses and service accounts to share the dataset with, and control their access rights.
Let me know if my instructions are too confusing and I'll upload some images showing exactly how to do this.
Good luck!!
An example using the Python Client Library. Adapted from here but adding a get_dataset call to get the current ACL policy for already existing datasets:
from google.cloud import bigquery

project_id = "PROJECT_ID"
dataset_id = "DATASET_NAME"
group_name = "google-group-name@google.com"
role = "READER"

client = bigquery.Client(project=project_id)

dataset_info = client.get_dataset(client.dataset(dataset_id))
access_entries = dataset_info.access_entries
access_entries.append(
    bigquery.AccessEntry(role, "groupByEmail", group_name)
)
dataset_info.access_entries = access_entries
dataset_info = client.update_dataset(
    dataset_info, ['access_entries'])
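As a quick sanity check (a small sketch using the same client as above), you can re-fetch the dataset and print its access entries to confirm the new READER entry landed:
# Re-read the dataset metadata and list who has access now.
updated = client.get_dataset(client.dataset(dataset_id))
for entry in updated.access_entries:
    print(entry.role, entry.entity_type, entry.entity_id)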
Another way to do it is using the Google API Python Client and the get and patch methods. First, we retrieve the existing dataset ACL, add the group as READER to the response, and patch the dataset metadata:
from oauth2client.client import GoogleCredentials
from googleapiclient import discovery

project_id = "PROJECT_ID"
dataset_id = "DATASET_NAME"
group_name = "google-group-name@google.com"
role = "READER"

credentials = GoogleCredentials.get_application_default()
bq = discovery.build("bigquery", "v2", credentials=credentials)

response = bq.datasets().get(projectId=project_id, datasetId=dataset_id).execute()
response['access'].append({u'role': u'{}'.format(role), u'groupByEmail': u'{}'.format(group_name)})
bq.datasets().patch(projectId=project_id, datasetId=dataset_id, body=response).execute()
Replace the project_id, dataset_id, group_name and role variables accordingly.
Versions used:
$ pip freeze | grep -E 'bigquery|api-python'
google-api-python-client==1.7.7
google-cloud-bigquery==1.8.1

401 Unauthorized making REST Call to Azure API App using Bearer token

I created two applications in my Azure directory, one for my API server and one for my API client. I am using the Python ADAL library and can successfully obtain a token using the following code:
import adal

tenant_id = "abc123-abc123-abc123"
context = adal.AuthenticationContext('https://login.microsoftonline.com/' + tenant_id)
token = context.acquire_token_with_username_password(
    'https://myapiserver.azurewebsites.net/',
    'myuser',
    'mypassword',
    'my_apiclient_client_id'
)
I then try to send a request to my API app using the following method but keep getting 'unauthorized':
at = token['accessToken']
id_token = "Bearer {0}".format(at)
response = requests.get('https://myapiserver.azurewebsites.net/', headers={"Authorization": id_token})
I am able to successfully log in using myuser/mypass from the login URL. I have also given the client app access to the server app in Azure AD.
Although the question was posted a long time ago, I'll try to provide an answer. I stumbled across the question because we had the exact same problem here. We could successfully obtain a token with the adal library, but then we were not able to access the resource we had obtained the token for.
To make things worse, we set up a simple console app in .Net, used the exact same parameters, and it was working. We could also copy the token obtained through the .Net app and use it in our Python request, and it worked (this one is kind of obvious, but it made us confident that the problem was not related to how we assembled the request).
The source of the problem was, in the end, in the oauth2_client of the adal Python package. When I compared the actual HTTP requests sent by the .Net and the Python app, a subtle difference was that the Python app sent a POST request explicitly asking for api-version=1.0:
POST https://login.microsoftonline.com/common//oauth2/token?api-version=1.0
Once I changed the following line in oauth2_client.py in the adal library, I could access my resource.
Changed
return urlparse('{}?{}'.format(self._token_endpoint, urlencode(parameters)))
in the method _create_token_url, to
return urlparse(self._token_endpoint)
We are working on a pull request to patch the library on GitHub.
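Until such a fix is released, here is a hedged sketch of applying the same change as a runtime monkey-patch, so the installed adal package stays untouched (class and method names are taken from the adal version discussed above):
from urllib.parse import urlparse  # use six.moves.urllib.parse on Python 2

import adal.oauth2_client as oauth2_client

def _create_token_url(self):
    # Same fix as above: drop the '?api-version=1.0' query string.
    return urlparse(self._token_endpoint)

# Replace the method on the class before any AuthenticationContext is used.
oauth2_client.OAuth2Client._create_token_url = _create_token_url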
The current release of the Azure Python SDK supports authentication with a service principal. It does not support authentication using an ADAL library yet. Maybe it will in future releases.
See https://azure-sdk-for-python.readthedocs.io/en/latest/resourcemanagement.html#authentication for details.
See also Azure Active Directory Authentication Libraries for the platforms ADAL is available on.
@Derek,
Did you set your Issuer URL on the Azure Portal? When I set the wrong Issuer URL, I get the same error as you. It seems that your code is right.
Based on my experience, you need to add your application into Azure AD and get a client ID (I am sure you have done this). Then you can get the tenant ID and enter it into the Issuer URL textbox on the Azure portal.
NOTE:
On the old portal (manage.windowsazure.com), in the bottom command bar, click View Endpoints, then copy the Federation Metadata Document URL and download that document or navigate to it in a browser.
Within the root EntityDescriptor element, there should be an entityID attribute of the form https://sts.windows.net/ followed by a GUID specific to your tenant (called a "tenant ID"). Copy this value; it will serve as your Issuer URL. You will configure your application to use this later.
My demo is as follows:
import adal
import requests

TenantURL = 'https://login.microsoftonline.com/*******'
context = adal.AuthenticationContext(TenantURL)

RESOURCE = 'http://wi****.azurewebsites.net'
ClientID = '****'
ClientSect = '7****'

token_response = context.acquire_token_with_client_credentials(
    RESOURCE,
    ClientID,
    ClientSect
)
access_token = token_response.get('accessToken')
print(access_token)

id_token = "Bearer {0}".format(access_token)
response = requests.get(RESOURCE, headers={"Authorization": id_token})
print(response)
Please try to modify it accordingly. If there are any updates, please let me know.
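If you still see a 401 after that, a small sketch for surfacing the failure detail (Azure AD typically explains bearer-token problems in the WWW-Authenticate response header):
# Print the server's explanation instead of just the status object.
if response.status_code == 401:
    print(response.headers.get('WWW-Authenticate'))
    print(response.text)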
