Python Unit Testing Google BigQuery

I am having trouble unit testing the following code block:
from google.cloud import bigquery
from google.oauth2 import service_account

def run_query(query, gcp_ser_acc):
    credentials = service_account.Credentials.from_service_account_info(gcp_ser_acc)
    client = bigquery.Client(gcp_ser_acc['project_id'], credentials)
    query_job = client.query(query)
    results = query_job.result()
    return results
I am new to mocking and I have tried the following test:
def test_run_a_query_with_real_key(self):
    gcp_ser_acc = {
        'project_id': 'my_project_id',
        'private_key': 'my_private_key',
        'token_uri': 'my_token_uri',
        'client_email': 'my_client_email'
    }
    with mock.patch('service_account.Credentials', call_args=gcp_ser_acc, return_value={}):
        with mock.patch('bigquery.Client', call_args=(gcp_ser_acc['project_id'], {}), return_value={}):
            run_query('SELECT 1+1 as col', gcp_ser_acc)
            assert service_account.Credentials.called
            assert bigquery.Client.called
Can anybody mock the google stuff and write a unit test please?

This is how you mock google.cloud.bigquery with pytest and pytest-mock:
from google.cloud import bigquery

schema = [
    bigquery.SchemaField("full_name", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("age", "INTEGER", mode="REQUIRED"),
]

def some_query(table_name='blahblahbloo'):
    client = bigquery.Client()
    table_id = f"project.dataset.{table_name}"
    table = bigquery.Table(table_id, schema=schema)
    table = client.create_table(table)

def test_some_query(mocker):
    mock_table = mocker.patch('google.cloud.bigquery.Table', autospec=True)
    mock_client = mocker.patch('google.cloud.bigquery.Client', autospec=True)

    some_query()  # run with mocked objects

    mock_table.assert_called_with('project.dataset.blahblahbloo', schema=schema)
    mock_client().create_table.assert_called_with(mock_table.return_value)

While it might be possible to improve the mocks here, it isn't going to provide much value to you as a test. In your code, there are two basic things you can be testing:
1. Are you passing in the correct credentials etc. to use BigQuery correctly?
2. Is your application's business logic around the query and result processing correct?
For (1), no unit test is going to provide you actual reassurance that your code works on GCP. All it will do is show that it does the thing your tests check for. Instead of unit testing, consider some kind of integration or system test that actually makes a real call to GCP (but don't run it as often as your unit tests).
Unit tests are a good fit for (2); however, your function as it currently stands doesn't really do anything. If it did - let's say it had some code that instantiates an object for each result row - then we could unit test that.
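For example, here is a minimal, hypothetical sketch (the row-to-object mapping and field names are made up for illustration, not taken from your code) of the kind of business logic that is easy to unit test with a plain fake result set:
# Hypothetical business logic: turn each result row into a small domain object.
class Person:
    def __init__(self, full_name, age):
        self.full_name = full_name
        self.age = age

def rows_to_people(results):
    # `results` only needs to be iterable, so a plain list of dicts works in a test.
    return [Person(row["full_name"], row["age"]) for row in results]

def test_rows_to_people():
    fake_results = [
        {"full_name": "Ada", "age": 36},
        {"full_name": "Alan", "age": 41},
    ]
    people = rows_to_people(fake_results)
    assert len(people) == 2
    assert people[0].full_name == "Ada"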

Update query string in scheduled query using Python Client for BigQuery Data Transfer Service

I'm struggling to find documentation and examples for the Python Client for BigQuery Data Transfer Service. A new query string is generated by my application from time to time and I'd like to update the existing scheduled query accordingly. This is the most helpful thing I have found so far; however, I am still unsure where to pass my query string. Is this the correct method?
from google.cloud import bigquery_datatransfer_v1

def sample_update_transfer_config():
    # Create a client
    client = bigquery_datatransfer_v1.DataTransferServiceClient()

    # Initialize request argument(s)
    transfer_config = bigquery_datatransfer_v1.TransferConfig()
    transfer_config.destination_dataset_id = "destination_dataset_id_value"
    request = bigquery_datatransfer_v1.UpdateTransferConfigRequest(
        transfer_config=transfer_config,
    )

    # Make the request
    response = client.update_transfer_config(request=request)

    # Handle the response
    print(response)
You may refer to the Update Scheduled Queries Python documentation from BigQuery for the official reference on using the Python client library to update scheduled queries.
However, I have updated the code for you so it updates your query string: the new query string goes into params, and update_mask defines which attributes of the TransferConfig() will be updated.
See updated code below:
from google.cloud import bigquery_datatransfer
from google.protobuf import field_mask_pb2

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config_name = "projects/{your-project-id}/locations/us/transferConfigs/{unique-ID-of-transferconfig}"
new_display_name = "Your Desired Updated Name if Necessary"  # remove if no need to update the scheduled query name

query_string_new = """
SELECT
    CURRENT_TIMESTAMP() as current_time
"""

new_params = {
    "query": query_string_new,
    "destination_table_name_template": "your_table_{run_date}",
    "write_disposition": "WRITE_TRUNCATE",
    "partitioning_field": "",
}

transfer_config = bigquery_datatransfer.TransferConfig(name=transfer_config_name)
transfer_config.display_name = new_display_name  # remove if no need to update the scheduled query name
transfer_config.params = new_params

transfer_config = transfer_client.update_transfer_config(
    {
        "transfer_config": transfer_config,
        "update_mask": field_mask_pb2.FieldMask(paths=["display_name", "params"]),  # remove "display_name" from the list if no need to update the scheduled query name
    }
)

print("Updates are executed successfully")
To get the value of your transfer_config_name, you can list all your scheduled queries by following this SO post.
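As a rough sketch (the project ID is a placeholder, and the us location is assumed to match the scheduled query above), listing the configs to find that name could look like this:
from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()
parent = "projects/{your-project-id}/locations/us"  # placeholder project ID

for config in transfer_client.list_transfer_configs(parent=parent):
    # config.name is the value to use as transfer_config_name above
    print(config.name, config.display_name)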

Google Cloud Function - Python script to get data from Webhook

I hope someone can help me out on my problem.
I have a Google Cloud Function which is HTTP triggered, and a webhook set up in customer.io.
I need to capture the data sent by the customer.io app; this should trigger the Cloud Function and run the Python script set up within it. I am new to writing Python scripts and their libraries. The final goal is to write the webhook data into a BigQuery table.
For now, I can see that the trigger is working, since the data sent by the app shows up (via print) in the function logs. I can also check the schema of the data from the textPayload in the logs.
This is the sample data from the textPayload that I want to load into a BigQuery table:
{
    "data": {
        "action_id": 42,
        "campaign_id": 23,
        "customer_id": "user-123",
        "delivery_id": "RAECAAFwnUSneIa0ZXkmq8EdkAM==-",
        "identifiers": {
            "id": "user-123"
        },
        "recipient": "test@example.com",
        "subject": "Thanks for signing up"
    },
    "event_id": "01E2EMRMM6TZ12TF9WGZN0WJaa",
    "metric": "sent",
    "object_type": "email",
    "timestamp": 1669337039
}
and this is the sample Python code I have created on the google-cloud function:
import os
def webhook(request):
request_json = request.get_json()
if request.method == 'POST':
print(request_json)
return 'success'
else:
return 'failed'
So far I have only tried printing the data from the webhook; what I am expecting is Python code that writes this textPayload data (shown above) into a BigQuery table.
So, you have set up a Cloud Function that executes some code whenever the webhook posts some data to it.
What this Cloud Function needs now is the BigQuery Python client library. Here's an example of how it's used (source):
from google.cloud import bigquery
client = bigquery.Client()
dataset_id = ...
table_name = ...
data = ...
dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_name)
table = client.get_table(table_ref)
result = client.insert_rows(table, data)
So you could put something like this into your cloud function in order to send your data to a target BigQuery table.
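For example, a minimal sketch of that idea (the table ID is a placeholder, and it assumes a table with matching columns already exists) that flattens the payload above and streams it into BigQuery could look like this:
from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.webhook_events"  # placeholder table ID

def webhook(request):
    request_json = request.get_json()
    if request.method == 'POST' and request_json:
        data = request_json.get('data', {})
        row = {
            "event_id": request_json.get("event_id"),
            "metric": request_json.get("metric"),
            "object_type": request_json.get("object_type"),
            "timestamp": request_json.get("timestamp"),
            "customer_id": data.get("customer_id"),
            "recipient": data.get("recipient"),
            "subject": data.get("subject"),
        }
        # insert_rows_json streams the row into the table and returns a list of errors
        errors = client.insert_rows_json(table_id, [row])
        if errors:
            print(errors)
            return 'failed'
        return 'success'
    return 'failed'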

Test fixture 'postgres' not found

I'm setting up unittest to test a postgres connection for an Airflow operator. I have a setup function to spin up a postgres container and then a function to test some queries against the container. I'm relatively new to this, so no doubt it's my logic that's not right.
class TestOperator:
    def setUp(self):
        # pytest postgresql container patches
        postgres_image = fetch(repository="postgres:11.1-alpine")
        postgres = container(
            image="{postgres_image.id}",
            environment={"POSTGRES_USER": "testuser", "POSTGRES_PASSWORD": "testpass"},
            ports={"5432/tcp": None},
            volumes={
                os.path.join(os.path.dirname(__file__), "postgres-init.sql"): {
                    "bind": "/docker-entrypoint-initdb.d/postgres-init.sql"
                }
            }
        )

    """
    Using Pytest Mocker to create a Postgresql container to connect test.
    """
    def test_postgres_operator(self, mocker, postgres):
        mocker.patch.object(
            PostgresHook,
            "get_connection",
            return_value=Connection(
                conn_id="postgres",
                conn_type="postgres",
                host="localhost",
                login="testuser",
                password="testpass",
                port=postgres.ports["5432/tcp"][0],
            ),
        )
        # target Postgres Container for
        task = PostGresOperator(
            task_id="PostgresOperator",
            postgres_conn_id="postgres_id",
        )
        pg_hook = PostgresHook()
        row_count = pg_hook.get_first("select * from test")[0]
        assert row_count > 1
I then get the error
fixture 'postgres' not found
I'm sure my logic is wrong.
pytest thinks postgres is a fixture and fails looking for it.
Instead of having postgres passed in as an argument, you could set it as an instance field:
def setUp(self):
    postgres_image = fetch(repository="postgres:11.1-alpine")
    self.postgres = container(...)

def test_postgres_operator(self, mocker):
    # use self.postgres instead of postgres
Or, you could define postgres as a proper pytest fixture to promote better reusability.
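If your fetch and container helpers come from pytest-docker-tools (which your snippet resembles), they are intended to be called at module level, where their return values themselves become fixtures; a rough sketch under that assumption:
# Sketch assuming fetch/container are pytest-docker-tools fixture factories.
from pytest_docker_tools import container, fetch

postgres_image = fetch(repository="postgres:11.1-alpine")

postgres = container(
    image="{postgres_image.id}",
    environment={"POSTGRES_USER": "testuser", "POSTGRES_PASSWORD": "testpass"},
    ports={"5432/tcp": None},
)

def test_postgres_operator(mocker, postgres):
    # `postgres` now resolves as a fixture instead of raising "fixture not found"
    ...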
Alternatively, you could look into pytest-postgresql plugin which may make it easier to mock out and test postgresql related code.

Python built-in fixtures

I'm trying to run a pytest which uses the following function:
def storage_class(request):
    def fin():
        sc.delete()
    request.addfinalizer(fin)

    logger.info("Creating storage")
    data = {'api_version': 'v1', 'kind': 'namespace'}
    # data is usually loaded from a yaml template
    sc = OCS(**data)
    return sc
I cannot find any fixture named "request" in the project, so I assume it's a built-in fixture. However, I have searched for it in the docs and cannot find a "request" built-in fixture: https://docs.pytest.org/en/latest/builtin.html
Can anybody shed some light on this (built-in?) fixture?
Thanks!
The request fixture helps you get information about the requesting test context.
More on the request fixture.
Example for the request fixture.
The most common usages of the request fixture are addfinalizer and config.
And if you only need teardown functionality, you can simply use yield and get rid of the request fixture.
@pytest.fixture()
def storage_class():
    logger.info("Creating storage")
    data = {'api_version': 'v1', 'kind': 'namespace'}
    sc = OCS(**data)
    yield sc
    # Any code after yield will give you the teardown effect
    sc.delete()

Mocking boto3 S3 client method Python

I'm trying to mock a single method on the boto3 S3 client object to throw an exception, but I need all other methods of this class to work as normal.
This is so I can test a single exception case when an error occurs performing an upload_part_copy.
1st Attempt
import boto3
from mock import patch

with patch('botocore.client.S3.upload_part_copy', side_effect=Exception('Error Uploading')) as mock:
    client = boto3.client('s3')

    # Should return actual result
    o = client.get_object(Bucket='my-bucket', Key='my-key')

    # Should return mocked exception
    e = client.upload_part_copy()
However this gives the following error:
ImportError: No module named S3
2nd Attempt
After looking at the botocore client.py source code, I found that it is doing something clever and the method upload_part_copy does not exist. It seems to call BaseClient._make_api_call instead, so I tried to mock that:
import boto3
from mock import patch

with patch('botocore.client.BaseClient._make_api_call', side_effect=Exception('Error Uploading')) as mock:
    client = boto3.client('s3')

    # Should return actual result
    o = client.get_object(Bucket='my-bucket', Key='my-key')

    # Should return mocked exception
    e = client.upload_part_copy()
This throws an exception... but on the get_object which I want to avoid.
Any ideas about how I can only throw the exception on the upload_part_copy method?
Botocore has a client stubber you can use for just this purpose: docs.
Here's an example of putting an error in:
import boto3
from botocore.stub import Stubber
client = boto3.client('s3')
stubber = Stubber(client)
stubber.add_client_error('upload_part_copy')
stubber.activate()
# Will raise a ClientError
client.upload_part_copy()
Here's an example of putting a normal response in. Additionally, the stubber can now be used as a context manager. It's important to note that the stubber will verify, so far as it is able, that your provided response matches what the service will actually return. This isn't perfect, but it will protect you from inserting total nonsense responses.
import boto3
from botocore.stub import Stubber

client = boto3.client('s3')
stubber = Stubber(client)
list_buckets_response = {
    "Owner": {
        "DisplayName": "name",
        "ID": "EXAMPLE123"
    },
    "Buckets": [{
        "CreationDate": "2016-05-25T16:55:48.000Z",
        "Name": "foo"
    }]
}
expected_params = {}
stubber.add_response('list_buckets', list_buckets_response, expected_params)

with stubber:
    response = client.list_buckets()

assert response == list_buckets_response
As soon as I posted on here, I managed to come up with a solution. Here it is, hope it helps :)
import botocore
from botocore.exceptions import ClientError
from mock import patch
import boto3

orig = botocore.client.BaseClient._make_api_call

def mock_make_api_call(self, operation_name, kwarg):
    if operation_name == 'UploadPartCopy':
        parsed_response = {'Error': {'Code': '500', 'Message': 'Error Uploading'}}
        raise ClientError(parsed_response, operation_name)
    return orig(self, operation_name, kwarg)

with patch('botocore.client.BaseClient._make_api_call', new=mock_make_api_call):
    client = boto3.client('s3')

    # Should return actual result
    o = client.get_object(Bucket='my-bucket', Key='my-key')

    # Should return mocked exception
    e = client.upload_part_copy()
Jordan Philips also posted a great solution using the botocore.stub.Stubber class. While it's a cleaner solution, I was unable to mock specific operations.
If you don't want to use either moto or the botocore stubber (the stubber does not prevent HTTP requests being made to AWS API endpoints it seems), you can use the more verbose unittest.mock way:
foo/bar.py
import boto3

def my_bar_function():
    client = boto3.client('s3')
    buckets = client.list_buckets()
    ...
bar_test.py
import unittest
from unittest import mock

from foo.bar import my_bar_function

class MyTest(unittest.TestCase):
    @mock.patch('foo.bar.boto3.client')
    def test_that_bar_works(self, mock_s3_client):
        my_bar_function()
        self.assertTrue(mock_s3_client.return_value.list_buckets.call_count == 1)
Here's an example of a simple Python unittest that can be used to fake the client = boto3.client('ec2') API call...
import unittest
from unittest import mock

import boto3

class MyAWSModule():
    def __init__(self):
        client = boto3.client('ec2')
        tags = client.describe_tags(DryRun=False)

class TestMyAWSModule(unittest.TestCase):
    @mock.patch("boto3.client.describe_tags")
    @mock.patch("boto3.client")
    def test_open_file_with_existing_file(self, mock_boto_client, mock_describe_tags):
        mock_describe_tags.return_value = mock_get_tags_response
        my_aws_module = MyAWSModule()

        mock_boto_client.assert_called_once_with('ec2')
        mock_describe_tags.assert_called_once_with(DryRun=False)

mock_get_tags_response = {
    'Tags': [
        {
            'ResourceId': 'string',
            'ResourceType': 'customer-gateway',
            'Key': 'string',
            'Value': 'string'
        },
    ],
    'NextToken': 'string'
}
hopefully that helps.
What about simply using moto?
It comes with a very handy decorator:
from moto import mock_s3

@mock_s3
def test_my_model_save():
    pass
I had to mock the boto3 client for some integration testing and it was a bit painful! The problem I had is that moto does not support KMS very well, yet I did not want to write my own mock for the S3 buckets. So I created this morph of all of the answers. It also works globally, which is pretty cool!
I have it set up with 2 files.
The first one is aws_mock.py. For the KMS mocking I got some predefined responses that came from a live boto3 client.
from unittest.mock import MagicMock

import boto3
from moto import mock_s3

# `create_key` response
create_resp = { ... }

# `generate_data_key` response
generate_resp = { ... }

# `decrypt` response
decrypt_resp = { ... }

def client(*args, **kwargs):
    if args[0] == 's3':
        s3_mock = mock_s3()
        s3_mock.start()
        mock_client = boto3.client(*args, **kwargs)
    else:
        mock_client = boto3.client(*args, **kwargs)
        if args[0] == 'kms':
            mock_client.create_key = MagicMock(return_value=create_resp)
            mock_client.generate_data_key = MagicMock(return_value=generate_resp)
            mock_client.decrypt = MagicMock(return_value=decrypt_resp)
    return mock_client
The second one is the actual test module; let's call it test_my_module.py. I've omitted the code of my_module, as well as the functions under test; let's call those the foo and bar functions.
from unittest.mock import patch

import aws_mock
import my_module

@patch('my_module.boto3')
def test_my_module(boto3):
    # Some prep work for the mock mode
    boto3.client = aws_mock.client
    conn = boto3.client('s3')
    conn.create_bucket(Bucket='my-bucket')

    # Actual testing
    resp = my_module.foo()
    assert(resp == 'Valid')
    resp = my_module.bar()
    assert(resp != 'Not Valid')
    # Etc, etc, etc...
One more thing: I'm not sure if this has been fixed, but I found out that moto was not happy unless you set some environment variables like credentials and region. They don't have to be actual credentials, but they do need to be set. There is a chance it might be fixed by the time you read this! But here is some code in case you do need it, shell code this time!
export AWS_ACCESS_KEY_ID='foo'
export AWS_SECRET_ACCESS_KEY='bar'
export AWS_DEFAULT_REGION='us-east-1'
I know it is probably not the prettiest piece of code but if you are looking for something universal it should work pretty well!
Here is my solution for patching a boto client used in the bowels of my project, with pytest fixtures. I'm only using 'mturk' in my project.
The trick for me was to create my own client, and then patch boto3.client with a function that returns that pre-created client.
import boto3
import pytest
from unittest.mock import patch
from botocore.stub import Stubber

@pytest.fixture(scope='session')
def patched_boto_client():
    my_client = boto3.client('mturk')

    def my_client_func(*args, **kwargs):
        return my_client

    with patch('bowels.of.project.other_module.boto3.client', my_client_func):
        yield my_client_func

def test_create_hit(patched_boto_client):
    client = patched_boto_client()
    stubber = Stubber(client)
    stubber.add_response('create_hit_type', {'my_response': 'is_great'})
    stubber.add_response('create_hit_with_hit_type', {'my_other_response': 'is_greater'})
    stubber.activate()

    import bowels.of.project  # this module imports `other_module`
    bowels.of.project.create_hit_function_that_calls_a_function_in_other_module_which_invokes_boto3_dot_client_at_some_point()
I also define another fixture that sets up dummy aws creds so that boto doesn't accidentally pick up some other set of credentials on the system. I literally set 'foo' and 'bar' as my creds for testing -- that's not a redaction.
It's important that AWS_PROFILE env be unset because otherwise boto will go looking for that profile.
@pytest.fixture(scope='session')
def setup_env():
    os.environ['AWS_ACCESS_KEY_ID'] = 'foo'
    os.environ['AWS_SECRET_ACCESS_KEY'] = 'bar'
    os.environ.pop('AWS_PROFILE', None)
And then I specify setup_env as a pytest usefixtures entry so that it gets used for every test run.
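For reference, a minimal sketch of that entry (assuming an ini-style pytest config file) looks like this:
# pytest.ini
[pytest]
usefixtures = setup_env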
I had a slightly different use case where the client is set up during a setup() method in a Class, as it does a few things such as listing things from the AWS service it's talking to (Connect, in my case). Lots of the above approaches weren't quite working, so here's my working version for future Googlers.
In order to get everything to work properly, I had to do this:
In the class under test (src/flow_manager.py):
class FlowManager:
    client: botocore.client.BaseClient

    def setup(self):
        self.client = boto3.client('connect')

    def set_instance(self):
        response = self.client.list_instances()
        ... do stuff ....
In the test file (tests/unit/test_flow_manager.py):
@mock.patch('src.flow_manager.boto3.client')
def test_set_instance(self, mock_client):
    expected = 'bar'
    instance_list = {'alias': 'foo', 'id': 'bar'}
    mock_client.list_instances.return_value = instance_list

    actual = flow_manager.FlowManager("", "", "", "", 'foo')
    actual.client = mock_client
    actual.set_instance()

    self.assertEqual(expected, actual.instance_id)
I've truncated the code to the relevant bits for this answer.
