BigQuery unit testing using Python

I am trying to test a BigQuery class with a mock object that represents the table. Instances of my BigQueryRequest class must provide the BigQuery table uri. Would it be possible for me to create a mock BigQuery table directly from Python? How would that be possible?
class BigQueryRequest:
    """BigQueryRequest
    Contains a BigQuery request with its parameters.
    Receives a table uri ($project_id.$dataset.$table) to run a query.
    Args:
        uri (str): BigQuery table uri
    Properties:
        BigQueryRequest.project: return the project running BigQuery
        BigQueryRequest.dataset: return the dataset
        BigQueryRequest.table: return the table to query
        BigQueryRequest.destination_project: same as project but for the destination project
        BigQueryRequest.destination_dataset: same as dataset but for the destination dataset
        BigQueryRequest.destination_table: same as table but for the destination table
    Methods:
        from_uri(): (#classmethod) parse a BigQuery uri into its project, dataset, table
        destination(): return a uri of the BigQuery request destination table
        query(): run the given BigQuery query
    Private methods:
        __set_destination(): generate a destination uri following the nomenclature or reuse the entry uri
    """

    def __init__(self, uri="", step="", params={}):
        self.project, self.dataset, self.table = self.from_uri(uri)
        self.step = step
        self.params = self.set_params(params)
        self.overwrite = False
        (
            self.destination_project,
            self.destination_dataset,
            self.destination_table,
        ) = self.__set_destination()

You'd have to do it yourself; Google Cloud does not provide an official mocking library for GCP products or services.
You could also try https://github.com/Khan/tinyquery as an alternative.
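If you go the do-it-yourself route, one common approach is to patch google.cloud.bigquery.Client with unittest.mock so no real table is needed. A minimal sketch, assuming BigQueryRequest lives in a module named bigquery_request, builds its own Client inside query(), and that query() takes a SQL string and returns the rows from Client.query(...).result() (these are assumptions about your code, not facts from it):

import unittest
from unittest import mock

import bigquery_request  # hypothetical module holding BigQueryRequest


class BigQueryRequestTest(unittest.TestCase):
    def test_query_with_mocked_client(self):
        fake_rows = [{"column_name": 7.7}]
        # Patch the Client where the module under test looks it up.
        with mock.patch("bigquery_request.bigquery.Client") as mock_client:
            mock_client.return_value.query.return_value.result.return_value = fake_rows
            req = bigquery_request.BigQueryRequest(uri="my-project.my_dataset.my_table")
            rows = req.query("SELECT * FROM `my-project.my_dataset.my_table`")
        self.assertEqual(rows, fake_rows)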

If you plan to test your SQL and assert results based on input, I would suggest bq-test-kit. This framework allows you to interact with BigQuery from Python and makes tests reliable.
You have three ways to inject data with it:
1. Create datasets and tables, with the ability to isolate their names and therefore have your own namespace.
2. Rely on temp tables, where data is inserted with data literals.
3. Data literals merged directly into your query.
Hope that this helps.

Related

Create eBay Connector using Syncari SDK

I'm trying to create a connector inside the Syncari SDK to get orders from eBay using this SDK:
https://pypi.org/project/syncari-sdk/
On eBay I'm using the Production environment with Auth'n'Auth, and with this token I'm using:
1. Trading API
2. API CALL: GetOrders
3. API VERSION: 967
HTTP headers:
X-EBAY-API-SITEID: 0
X-EBAY-API-COMPATIBILITY-LEVEL: 967
X-EBAY-API-CALL-NAME: GetOrders
Request body:
<?xml version="1.0" encoding="utf-8"?>
<GetOrdersRequest xmlns="urn:ebay:apis:eBLBaseComponents">
  <RequesterCredentials>
    <eBayAuthToken></eBayAuthToken>
  </RequesterCredentials>
  <ErrorLanguage>en_US</ErrorLanguage>
  <WarningLevel>High</WarningLevel>
  <OrderIDArray>
    <!-- Enter one or more of the seller's Order IDs in separate OrderID fields. Only legacy Order IDs are supported in GetOrders and not the ExtendedOrderIDs that are supported in eBay REST API calls like the Fulfillment API. The legacy Order ID values are actually the concatenation of ItemID and TransactionID, with a hyphen in between these two values. In legacy API calls, the OrderLineItemID and OrderID values are the same for single line item orders. Note that the TransactionID for an auction listing is always '0', which is demonstrated below -->
    <OrderID>XXXXXXXXXXXX-XXXXXXXXXXXXX</OrderID>
    <OrderID>XXXXXXXXXXXX-0</OrderID>
  </OrderIDArray>
  <OrderRole>Seller</OrderRole>
</GetOrdersRequest>
Inside the SDK there is a method called synapse_info() to connect to eBay using the token.
Read the documentation here: https://support.syncari.com/hc/en-us/articles/4580013102868-Custom-Synapse-SDK-Documentation
I want to know how to add the headers, token, and body to get orders from the eBay API, and where all of this should go here:
def synapse_info(self):
    return SynapseInfo(
        name='lysiSynapse', category='other',
        metadata=UIMetadata(displayName='Lysi Synapse'),
        supportedAuthTypes=[AuthMetadata(authType=AuthType.BASIC_TOKEN, label='API Key',
                                         fields=[AuthField(name='token', label='API Key', dataType=DataType.PASSWORD)])],
        configuredFields=[AuthField(name='endpoint', label='Endpoint URL', dataType=DataType.STRING)])
We can see the response using the test() method in the SDK:
def test(self, connection: Connection):
    self.client.get("/users", headers=self.__auth_headers())
    if not connection.metaConfig:
        connection.metaConfig = {}
    return connection
The synapse_info method is meant to declare the Synapse's UI elements and authentication options, not to add headers and tokens directly. The method returns a SynapseInfo object that defines fields in Syncari's UI. Later in the framework it is possible to access those defined fields from the connection object, which persists through the framework.
Our first recommendation is to define the Syncari REST client in the Synapse class:
def __init__(self, request: Request) -> None:
    super().__init__(request)
    self.client = SyncariRestClient(self.connection.authConfig.endpoint, self.connection.authConfig)
This way it is possible to use the Syncari client, which will have access to the endpoint and authentication data that the user enters in the Syncari UI.
Since eBay's API uses XML queries, we'd recommend creating helper methods to assist with constructing the queries. For this particular example:
def xmlQueryConstructorGetOrderByID(eBayAuthToken, Orders):
    # The root element must match the X-EBAY-API-CALL-NAME header (GetOrders).
    return (
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
        "<GetOrdersRequest xmlns=\"urn:ebay:apis:eBLBaseComponents\">"
        f"<RequesterCredentials><eBayAuthToken>{eBayAuthToken}</eBayAuthToken></RequesterCredentials>"
        "<ErrorLanguage>en_US</ErrorLanguage>"
        "<WarningLevel>High</WarningLevel>"
        f"<OrderIDArray>{Orders}</OrderIDArray>"
        "<OrderRole>Seller</OrderRole>"
        "</GetOrdersRequest>"
    )
Additionally, we recommend hardcoding the required headers in a constant:
HEADERS = {
    "Content-Type": "application/xml",
    "x-ebay-api-call-name": "GetOrders",
    "x-ebay-api-compatibility-level": "967",
    "X-EBAY-API-SITEID": "0",
}
We can now use all of this to make a call in the test method, which is used to test the credentials a user entered in the Syncari UI:
def test(self, connection: Connection):
    # eBay Trading API calls are POST requests with the XML query as the body.
    self.client.post(
        "",
        headers=HEADERS,
        data=xmlQueryConstructorGetOrderByID(
            self.connection.authConfig.token,
            "<OrderID>XXXXXXXXXXXX-XXXXXXXXXXXXX</OrderID>",
        ),
    )
    return connection
In an actual use case the placeholder OrderID would be replaced with an actual existing order's ID, and the authentication data is accessed from the connection object.
We also recommend using a Python library like xmltodict to convert the received XML data into JSON, so it is easier to pass to Syncari.
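For example, a minimal sketch of that conversion (the helper name and the way the response body is obtained are assumptions):

import json

import xmltodict


def ebay_xml_to_json(xml_text):
    # xmltodict.parse returns a nested dict that mirrors the XML structure.
    parsed = xmltodict.parse(xml_text)
    return json.dumps(parsed)

# e.g. payload = ebay_xml_to_json(response.text) on the GetOrders response body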
We hope this answers the question but if you need any further assistance please reach out to our support at developers#syncari.com.

How to point to the ARN of a dynamodb table instead of using the name when using boto3

I'm trying to access a DynamoDB table in another account without having to make any code changes if possible. I've set up the IAM users, roles and policies to make this possible and have succeeded with other services such as SQS and S3.
The problem I have now is with DynamoDB, as the code to initialise the boto3.resource connection seems to only allow me to point to the table name (docs):
dynamodb = boto3.resource('dynamodb', region_name='us-east-2')
table = dynamodb.Table(config['dynamo_table_1'])
This causes the problem of the code trying to access a table with that particular name in the account the code is executing in, which errors out as the table exists in a different AWS account.
Is there a way to pass the ARN of the table, or some identifier that would allow me to specify the account ID?
There's sample code at https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/configure-cross-account-access-to-amazon-dynamodb.html which shows how to do cross-account access. Here is a snippet from the attached zip. I expect you could do .resource() as well as .client() with the same arguments.
import boto3

# Assume the cross-account role in Account A to get temporary credentials.
sts_client = boto3.client('sts')
sts_session = sts_client.assume_role(
    RoleArn='arn:aws:iam::<Account-A ID>:role/DynamoDB-FullAccess-For-Account-B',
    RoleSessionName='test-dynamodb-session')

KEY_ID = sts_session['Credentials']['AccessKeyId']
ACCESS_KEY = sts_session['Credentials']['SecretAccessKey']
TOKEN = sts_session['Credentials']['SessionToken']

dynamodb_client = boto3.client('dynamodb',
                               region_name='us-east-2',
                               aws_access_key_id=KEY_ID,
                               aws_secret_access_key=ACCESS_KEY,
                               aws_session_token=TOKEN)
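Building on that, a minimal sketch of the .resource() variant using the same temporary credentials (the table name is a placeholder, and the KEY_ID/ACCESS_KEY/TOKEN variables come from the snippet above):

dynamodb = boto3.resource('dynamodb',
                          region_name='us-east-2',
                          aws_access_key_id=KEY_ID,
                          aws_secret_access_key=ACCESS_KEY,
                          aws_session_token=TOKEN)
table = dynamodb.Table('table-in-account-A')  # placeholder name of the table in Account A
print(table.item_count)  # simple smoke test that the cross-account access works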

Firestore to BigQuery using Cloud function

I have a BigQuery table linked using the Firestore-to-BigQuery extension. But I want the load to BigQuery to run based on an event trigger from Cloud Firestore (e.g. if there are any changes or new documents inserted in a specific collection), so that the new data streams directly to the BigQuery table.
And I want the Cloud Function to be written in Python. But most of the online use cases/examples are in JS (example: https://blog.questionable.services/article/from-firestore-to-bigquery-firebase-functions/).
When I tried to follow https://cloud.google.com/functions/docs/calling/cloud-firestore and deploy the Cloud Function below, it keeps failing.
import json
import os

import firebase_admin
from firebase_admin import credentials, firestore
from google.cloud import bigquery

cred = credentials.Certificate("YYYYY.json")
firebase_admin.initialize_app(cred)
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "YYYYY.json"

db = firestore.client()
bq = bigquery.Client()
collection = db.collection('XXXX')
def hello_firestore(data, context):
    """Triggered by a change to a Firestore document.
    Args:
        data (dict): The event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    trigger_resource = context.resource
    print('Function triggered by change to: %s' % trigger_resource)
    print('\nOld value:')
    print(json.dumps(data["oldValue"]))
    print('\nNew value:')
    print(json.dumps(data["value"]))
Any ideas/suggestions would be appreciated. Thank you in advance.
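For reference, a minimal sketch of how the triggered payload could be forwarded to BigQuery from the same function, assuming a destination table with a matching schema already exists (the table id and the flattening of Firestore's typed fields are assumptions):

from google.cloud import bigquery

bq = bigquery.Client()
TABLE_ID = "your-project.your_dataset.your_table"  # placeholder

def stream_to_bigquery(data):
    # Firestore event payloads wrap each field in a typed value,
    # e.g. {"name": {"stringValue": "x"}}; unwrap the single typed value.
    fields = data["value"]["fields"]
    row = {key: list(value.values())[0] for key, value in fields.items()}
    errors = bq.insert_rows_json(TABLE_ID, [row])  # streaming insert
    if errors:
        raise RuntimeError("BigQuery insert failed: %s" % errors)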

How to load data from one project's bucket into another project's table in python?

I have two projects. I store data in project_A and build tables in project_B. I have created a service_account.json for both of them, but I don't know how to use both files at the same time when I need to load the data from project_A and build the tables in project_B.
The data is stored at the URI:
gs://project_A//*
The table will live in project_B with the table name huge_table:
from google.cloud import storage, bigquery

proj_a_client = storage.Client.from_service_account_json('service_acct_A.json')
proj_b_client = bigquery.Client.from_service_account_json('service_acct_B.json')

dest_table = proj_b_client.dataset('DS_B').table('huge_table')
uri = 'gs://project_A//*'
job_config = bigquery.LoadJobConfig()
load_job = proj_b_client.load_table_from_uri(uri,
                                             dest_table,
                                             job_config=job_config)
But I get the error:
google.api_core.exceptions.Forbidden: 403 Access Denied: File
gs://project_A/: Access Denied
You have to make sure service_acct_B has storage access to project_A:
In project_A, go to IAM & admin and add the member service_acct_B with (at a minimum) the Storage Object Viewer role.
As a matter of fact, you don't use/need service_acct_A here, so
proj_a_client = storage.Client.from_service_account_json('service_acct_A.json')
is redundant.
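With that IAM change in place, a minimal sketch of the load using only service_acct_B (the source format is an assumption about your files):

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('service_acct_B.json')
dest_table = client.dataset('DS_B').table('huge_table')

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON  # assumed file format

load_job = client.load_table_from_uri('gs://project_A//*', dest_table, job_config=job_config)
load_job.result()  # wait for completion; raises if the load fails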

How to use BigQuery streaming insertAll on App Engine & Python

I would like to develop an App Engine application that streams data directly into a BigQuery table.
According to Google's documentation there is a simple way to stream data into BigQuery:
http://googlecloudplatform.blogspot.co.il/2013/09/google-bigquery-goes-real-time-with-streaming-inserts-time-based-queries-and-more.html
https://developers.google.com/bigquery/streaming-data-into-bigquery#streaminginsertexamples
(note: in the above link you should select the Python tab, not the Java one)
Here is the sample code snippet showing how a streaming insert should be coded:
body = {"rows":[
{"json": {"column_name":7.7,}}
]}
response = bigquery.tabledata().insertAll(
projectId=PROJECT_ID,
datasetId=DATASET_ID,
tableId=TABLE_ID,
body=body).execute()
Although I've downloaded the client API, I didn't find any reference to the "bigquery" module/object used in Google's example above.
Where should the bigquery object (from the snippet) come from?
Can anyone show a more complete way to use this snippet (with the right imports)?
I've been searching for this a lot and found the documentation confusing and partial.
Minimal working (as long as you fill in the right ids for your project) example:
import httplib2
from apiclient import discovery
from oauth2client import appengine

_SCOPE = 'https://www.googleapis.com/auth/bigquery'

# Change the following 3 values:
PROJECT_ID = 'your_project'
DATASET_ID = 'your_dataset'
TABLE_ID = 'TestTable'

body = {"rows": [
    {"json": {"Col1": 7}}
]}

credentials = appengine.AppAssertionCredentials(scope=_SCOPE)
http = credentials.authorize(httplib2.Http())
bigquery = discovery.build('bigquery', 'v2', http=http)

response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body).execute()

print response
As Jordan says: "Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset. Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs."
Here is a working code example from an appengine app that streams records to a BigQuery table. It is open source at code.google.com:
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/main.py#124
To find out where the bigquery object comes from, see
http://code.google.com/p/bigquery-e2e/source/browse/sensors/cloud/src/config.py
Note that this uses the appengine robot to authenticate with BigQuery, so you'll need to add the robot account to the ACL of the dataset.
Note that if you also want to use the robot to run queries, not just stream, you need the robot to be a member of the project 'team' so that it is authorized to run jobs.
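For newer code outside the old App Engine client libraries, the google-cloud-bigquery package exposes the same streaming insert via insert_rows_json; a minimal sketch (project, dataset and column names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()
table_id = "your_project.your_dataset.TestTable"
rows = [{"Col1": 7}]

errors = client.insert_rows_json(table_id, rows)  # tabledata.insertAll under the hood
if errors:
    print("Insert errors:", errors)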
