Now that AWS has a Pricing API, how could one use Boto3 to fetch the current hourly price for a given on-demand EC2 instance type (e.g. t2.micro), region (e.g. eu-west-1) and operating system (e.g. Linux)? I only want the price returned. Based on my understanding, those four pieces of information should be enough to filter down to a single result.
However, all the examples I've seen fetch huge lists of data from the API that have to be post-processed to get what I want. I would like to filter the data on the API side, before it is returned.
Here is the solution I ended up with, using Boto3's own Pricing API with filters for the instance type, region and operating system. The API still returns a lot of information, so a bit of post-processing is needed.
import boto3
import json
from pkg_resources import resource_filename
# Search product filter. This will reduce the amount of data returned by the
# get_products function of the Pricing API
FLT = '[{{"Field": "tenancy", "Value": "shared", "Type": "TERM_MATCH"}},'\
'{{"Field": "operatingSystem", "Value": "{o}", "Type": "TERM_MATCH"}},'\
'{{"Field": "preInstalledSw", "Value": "NA", "Type": "TERM_MATCH"}},'\
'{{"Field": "instanceType", "Value": "{t}", "Type": "TERM_MATCH"}},'\
'{{"Field": "location", "Value": "{r}", "Type": "TERM_MATCH"}},'\
'{{"Field": "capacitystatus", "Value": "Used", "Type": "TERM_MATCH"}}]'
# Get current AWS price for an on-demand instance
def get_price(region, instance, os):
    f = FLT.format(r=region, t=instance, o=os)
    data = client.get_products(ServiceCode='AmazonEC2', Filters=json.loads(f))
    od = json.loads(data['PriceList'][0])['terms']['OnDemand']
    id1 = list(od)[0]
    id2 = list(od[id1]['priceDimensions'])[0]
    return od[id1]['priceDimensions'][id2]['pricePerUnit']['USD']
# Translate region code to region name. Even though the API data contains a
# regionCode field, filtering on it does not return accurate results; filtering
# on the location field does, but then we need to translate the region code into
# a region name. You could skip this by using region names directly in your code,
# but most other APIs use region codes.
def get_region_name(region_code):
    default_region = 'US East (N. Virginia)'
    endpoint_file = resource_filename('botocore', 'data/endpoints.json')
    try:
        with open(endpoint_file, 'r') as f:
            data = json.load(f)
        # Botocore is using Europe while Pricing API using EU...sigh...
        return data['partitions'][0]['regions'][region_code]['description'].replace('Europe', 'EU')
    except IOError:
        return default_region
# Use the AWS Pricing API through Boto3.
# The Pricing API is only available from the us-east-1 and ap-south-1 endpoints;
# this has no impact on the region you query prices for.
client = boto3.client('pricing', region_name='us-east-1')
# Get current price for a given instance, region and os
price = get_price(get_region_name('eu-west-1'), 't3.micro', 'Linux')
print(price)
This example outputs 0.0114000000 (the hourly price in USD) fairly quickly. (The number was verified against the published AWS price at the time of writing.)
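Note that the Pricing API returns the price as a string, so if you need a number you have to convert it yourself. A small sketch (the 730 hours/month figure is just a common approximation):
hourly = float(price)          # '0.0114000000' -> 0.0114
print(f'{hourly:.4f} USD/hour, ~{hourly * 730:.2f} USD/month')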
If you don't like the native function, then look at Lyft's awspricing library for Python. Here's an example:
import awspricing
ec2_offer = awspricing.offer('AmazonEC2')
p = ec2_offer.ondemand_hourly(
't2.micro',
operating_system='Linux',
region='eu-west-1'
)
print(p) # 0.0126
I'd recommend enabling caching (see AWSPRICING_USE_CACHE), otherwise it will be slow.
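For example, the cache can be switched on via an environment variable before the offer data is loaded. A minimal sketch; setting the variable to '1' is an assumption about how the library parses the flag:
import os
os.environ['AWSPRICING_USE_CACHE'] = '1'   # assumed truthy value for the caching flag

import awspricing
ec2_offer = awspricing.offer('AmazonEC2')  # offer data can now be cached locally between runs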
I have updated toringe's solution a bit to handle different key errors:
def price_information(self, instance_type, os, region):
    # Search product filter
    FLT = '[{{"Field": "operatingSystem", "Value": "{o}", "Type": "TERM_MATCH"}},' \
          '{{"Field": "instanceType", "Value": "{t}", "Type": "TERM_MATCH"}}]'
    f = FLT.format(t=instance_type, o=os)
    try:
        data = self.pricing_client.get_products(ServiceCode='AmazonEC2', Filters=json.loads(f))
        instance_price = 0
        for price in data['PriceList']:
            try:
                price_item = json.loads(price)
                first_id = list(price_item['terms']['OnDemand'].keys())[0]
                price_data = price_item['terms']['OnDemand'][first_id]
                second_id = list(price_data['priceDimensions'].keys())[0]
                instance_price = price_data['priceDimensions'][second_id]['pricePerUnit']['USD']
                if float(instance_price) > 0:
                    break
            except Exception as e:
                print(e)
        print(instance_price)
        return instance_price
    except Exception as e:
        print(e)
        return 0
Based on other answers, here's some code that returns the On Demand prices for all instance types (or for a given instance type, if you add the search filter), gets some relevant attributes for each instance type, and pretty-prints the data.
It assumes pricing is the AWS Pricing client.
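If you still need to create that client, the setup from the earlier answer applies here as well; for example:
import boto3

# The Pricing API is only served from the us-east-1 and ap-south-1 endpoints
pricing = boto3.client('pricing', region_name='us-east-1')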
import json
def ec2_get_ondemand_prices(Filters):
    data = []
    reply = pricing.get_products(ServiceCode='AmazonEC2', Filters=Filters, MaxResults=100)
    data.extend([json.loads(r) for r in reply['PriceList']])
    while 'NextToken' in reply.keys():
        reply = pricing.get_products(ServiceCode='AmazonEC2', Filters=Filters, MaxResults=100, NextToken=reply['NextToken'])
        data.extend([json.loads(r) for r in reply['PriceList']])
        print(f"\x1b[33mGET \x1b[0m{len(reply['PriceList']):3} \x1b[94m{len(data):4}\x1b[0m")

    instances = {}
    for d in data:
        attr = d['product']['attributes']
        type = attr['instanceType']
        if type in instances: continue

        region = attr.get('location', '')
        clock = attr.get('clockSpeed', '')
        type = attr.get('instanceType', '')
        market = attr.get('marketoption', '')
        ram = attr.get('memory', '')
        os = attr.get('operatingSystem', '')
        arch = attr.get('processorArchitecture', '')
        region = attr.get('regionCode', '')
        storage = attr.get('storage', '')
        tenancy = attr.get('tenancy', '')
        usage = attr.get('usagetype', '')
        vcpu = attr.get('vcpu', '')

        terms = d['terms']
        ondemand = terms['OnDemand']
        ins = ondemand[next(iter(ondemand))]
        pricedim = ins['priceDimensions']
        price = pricedim[next(iter(pricedim))]
        desc = price['description']
        p = float(price['pricePerUnit']['USD'])
        unit = price['unit'].lower()

        if 'GiB' not in ram: print('\x1b[31mWARN\x1b[0m')
        if 'hrs' != unit: print('\x1b[31mWARN\x1b[0m')
        if p == 0.: continue
        instances[type] = {'type': type, 'market': market, 'vcpu': vcpu, 'ram': float(ram.replace('GiB', '')), 'ondm': p, 'unit': unit, 'terms': list(terms.keys()), 'desc': desc}

    instances = {k: v for k, v in sorted(instances.items(), key=lambda e: e[1]['ondm'])}
    for ins in instances.values():
        p = ins['ondm']
        print(f"{ins['type']:32} {ins['market'].lower()}\x1b[91m: \x1b[0m{ins['vcpu']:3} vcores\x1b[91m, \x1b[0m{ins['ram']:7.1f} GB, \x1b[0m{p:7.4f} \x1b[95m$/h\x1b[0m, \x1b[0m\x1b[0m{p*720:8,.1f} \x1b[95m$/m\x1b[0m, \x1b[0m\x1b[0m{p*720*12:7,.0f} \x1b[95m$/y\x1b[0m, \x1b[0m{ins['unit']}\x1b[91m, \x1b[0m{ins['terms']}\x1b[0m")
        # print(desc, , sep='\n')
    print(f'\x1b[92m{len(instances)}\x1b[0m')
flt = [
    # {'Field': 'instanceType', 'Value': 't4g.nano', 'Type': 'TERM_MATCH'},  # enable this filter to select only 1 instance type
    {'Field': 'regionCode', 'Value': 'us-east-2', 'Type': 'TERM_MATCH'},  # alternative notation?: {'Field': 'location', 'Value': 'US East (Ohio)', 'Type': 'TERM_MATCH'},
    {'Field': 'operatingSystem', 'Value': 'Linux', 'Type': 'TERM_MATCH'},
    {'Field': 'tenancy', 'Value': 'shared', 'Type': 'TERM_MATCH'},
    {'Field': 'capacitystatus', 'Value': 'Used', 'Type': 'TERM_MATCH'},
]

ec2_get_ondemand_prices(Filters=flt)
aws ec2 describe-snapshots --owner-ids $AWS_ACCOUNT_ID \
    --query "Snapshots[?(StartTime<='$dtt')].[SnapshotId]" \
    --output text | tr '\t' '\n' | sort
I have this shell script which I want to convert to Python.
I tried looking at the boto3 documentation and came up with this:
client = boto3.client('ec2')
client.describe_snapshots(OwnerIds = [os.environ['AWS_ACCOUNT_ID']], )
But I can't figure out how to express that --query option in Python.
I couldn't find it in the documentation.
What am I missing here?
You should ignore the --query portion and everything after it, and process that within Python instead.
First, store the result of the call in a variable:
ec2_client = boto3.client('ec2')
response = ec2_client.describe_snapshots(OwnerIds = ['self'])
It will return something like:
{
    'NextToken': '',
    'Snapshots': [
        {
            'Description': 'This is my snapshot.',
            'OwnerId': '012345678910',
            'Progress': '100%',
            'SnapshotId': 'snap-1234567890abcdef0',
            'StartTime': datetime(2014, 2, 28, 21, 28, 32, 4, 59, 0),
            'State': 'completed',
            'VolumeId': 'vol-049df61146c4d7901',
            'VolumeSize': 8,
        },
    ],
    'ResponseMetadata': {
        '...': '...',
    },
}
Therefore, you can use response['Snapshots'] to extract your desired results, for example:
from datetime import datetime, timezone

for snapshot in response['Snapshots']:
    # StartTime is timezone-aware, so compare against an aware datetime
    if snapshot['StartTime'] < datetime(2022, 6, 1, tzinfo=timezone.utc):
        print(snapshot['SnapshotId'])
It's really all Python at that point.
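Alternatively, if you'd rather reuse the CLI's JMESPath expression, the jmespath package (installed as a boto3 dependency) can evaluate expressions against the response in Python. A minimal sketch for the projection part; the StartTime filter is easier to keep in plain Python as shown above:
import jmespath

# Pull out just the snapshot IDs, then sort them like the shell pipeline did
snapshot_ids = jmespath.search('Snapshots[].SnapshotId', response)
print('\n'.join(sorted(snapshot_ids)))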
I have the following function that I need to test:
def function_to_test(host: str, prefix: str, file_reg_ex=None, dir_reg_ex=None):
    s3_client = boto3.client('s3')
    s3_paginator = s3_client.get_paginator('list_objects')
    response_iterator = s3_paginator.paginate(
        Bucket=host,
        Prefix=prefix,
        PaginationConfig={
            'PageSize': 1000
        }
    )
    ret_dict = {}
    for page in response_iterator:
        for s3_object in page['Contents']:
            key = s3_object['Key']
            sections = str(key).rsplit('/', 1)
            key_dir = sections[0]
            file_name = sections[1]
            if (file_reg_ex is None or re.search(file_reg_ex, file_name)) and \
                    (dir_reg_ex is None or re.search(dir_reg_ex, key_dir)):
                ret_dict[key] = {
                    'ETag': s3_object['ETag'],
                    'Last-Modified': s3_object['LastModified'].timestamp()
                }
    return ret_dict
It looks like I need to use the botocore Stubber referenced here: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/stubber.html#botocore-stub
In the documentation they build a stubbed response for a 'list_objects' S3 request, but that approach won't work directly for a paginator, since paginate() returns a botocore.paginate.PageIterator object. How can this functionality be mocked?
It was suggested to look into https://pypi.org/project/boto3-mocking/ and https://github.com/spulec/moto, but due to time constraints I went with a simpler workaround.
@staticmethod
def get_s3_resp_iterator(host, prefix, s3_client):
    s3_paginator = s3_client.get_paginator('list_objects')
    return s3_paginator.paginate(
        Bucket=host,
        Prefix=prefix,
        PaginationConfig={
            'PageSize': 1000
        }
    )

def function_to_test(self, host: str, prefix: str, file_reg_ex=None, dir_reg_ex=None):
    s3_client = boto3.client('s3')
    response_iterator = self.get_s3_resp_iterator(host, prefix, s3_client)
    ret_dict = {}
    for page in response_iterator:
        for s3_object in page['Contents']:
            key = s3_object['Key']
            sections = str(key).rsplit('/', 1)
            key_dir = sections[0]
            file_name = sections[1]
            if (file_reg_ex is None or re.search(file_reg_ex, file_name)) and \
                    (dir_reg_ex is None or re.search(dir_reg_ex, key_dir)):
                ret_dict[key] = {
                    'ETag': s3_object['ETag'],
                    'Last-Modified': s3_object['LastModified'].timestamp()
                }
    return ret_dict
This allows me to do the following in a pretty straightforward manner:
def test_s3(self):
    test_resp_iter = [
        {
            'Contents': [
                {
                    'Key': 'key/key1',
                    'ETag': 'etag1',
                    'LastModified': datetime.datetime(2020, 8, 14, 17, 19, 34, tzinfo=tzutc())
                },
                {
                    'Key': 'key/key2',
                    'ETag': 'etag2',
                    'LastModified': datetime.datetime(2020, 8, 14, 17, 19, 34, tzinfo=tzutc())
                }
            ]
        }
    ]

    tc = TestClass()
    tc.get_s3_resp_iterator = MagicMock(return_value=test_resp_iter)
    ret_dict = tc.function_to_test('test_host', '', file_reg_ex=None, dir_reg_ex=None)
    self.assertEqual(len(ret_dict), 2)
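For what it's worth, the botocore Stubber from the question can also be made to work, because the paginator calls the client's list_objects operation under the hood, so stubbed responses are consumed page by page. A rough sketch under that assumption:
import datetime
import boto3
from botocore.stub import Stubber
from dateutil.tz import tzutc

# Dummy credentials so the sketch runs without any real AWS setup
s3_client = boto3.client('s3', region_name='us-east-1',
                         aws_access_key_id='testing', aws_secret_access_key='testing')
stubber = Stubber(s3_client)
stubber.add_response('list_objects', {
    'IsTruncated': False,
    'Contents': [
        {'Key': 'key/key1', 'ETag': 'etag1',
         'LastModified': datetime.datetime(2020, 8, 14, 17, 19, 34, tzinfo=tzutc())}
    ]
})

with stubber:
    paginator = s3_client.get_paginator('list_objects')
    pages = list(paginator.paginate(Bucket='test_host', Prefix=''))
    print(pages[0]['Contents'][0]['Key'])  # key/key1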
I have the code below and want it to return a dataframe properly. The polling logic works, but the dataframe doesn't seem to get created or returned; right now the call just returns None.
import boto3
import pandas as pd
import io
import re
import time

AK = 'mykey'
SAK = 'mysecret'

params = {
    'region': 'us-west-2',
    'database': 'default',
    'bucket': 'my-bucket',
    'path': 'dailyreport',
    'query': 'SELECT * FROM v_daily_report LIMIT 100'
}

session = boto3.Session(aws_access_key_id=AK, aws_secret_access_key=SAK)
def athena_query(client, params):
    response = client.start_query_execution(
        QueryString=params["query"],
        QueryExecutionContext={
            'Database': params['database']
        },
        ResultConfiguration={
            'OutputLocation': 's3://' + params['bucket'] + '/' + params['path']
        }
    )
    return response

def athena_to_s3(session, params, max_execution=5):
    client = session.client('athena', region_name=params["region"])
    execution = athena_query(client, params)
    execution_id = execution['QueryExecutionId']
    df = poll_status(execution_id, client)
    return df

def poll_status(_id, client):
    '''
    poll query status
    '''
    result = client.get_query_execution(
        QueryExecutionId=_id
    )
    state = result['QueryExecution']['Status']['State']

    if state == 'SUCCEEDED':
        print(state)
        print(str(result))
        s3_key = 's3://' + params['bucket'] + '/' + params['path'] + '/' + _id + '.csv'
        print(s3_key)
        df = pd.read_csv(s3_key)
        return df
    elif state == 'QUEUED':
        print(state)
        print(str(result))
        time.sleep(1)
        poll_status(_id, client)
    elif state == 'RUNNING':
        print(state)
        print(str(result))
        time.sleep(1)
        poll_status(_id, client)
    elif state == 'FAILED':
        return result
    else:
        print(state)
        raise Exception

df_data = athena_to_s3(session, params)
print(df_data)
I plan to move the dataframe load out of the polling function, but for now I'm just trying to get it to work as is.
I recommend taking a look at AWS Wrangler instead of the traditional boto3 Athena API. It is a newer, more specific interface to all things data in AWS, including queries to Athena, and it offers more functionality.
import awswrangler as wr

df = wr.pandas.read_sql_athena(
    sql="select * from table",
    database="database"
)
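Note that in newer releases of AWS Wrangler (1.x and later, now published as the AWS SDK for pandas) the call has moved, so the equivalent would be something like:
import awswrangler as wr

df = wr.athena.read_sql_query(
    sql="select * from table",
    database="database"
)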
Thanks to RagePwn's comment, it is also worth checking PyAthena as an alternative to boto3 for querying Athena.
If it is returning None, then it is because state == 'FAILED'. You need to investigate the reason it failed, which may be in 'StateChangeReason'.
{
    'QueryExecution': {
        'QueryExecutionId': 'string',
        'Query': 'string',
        'StatementType': 'DDL'|'DML'|'UTILITY',
        'ResultConfiguration': {
            'OutputLocation': 'string',
            'EncryptionConfiguration': {
                'EncryptionOption': 'SSE_S3'|'SSE_KMS'|'CSE_KMS',
                'KmsKey': 'string'
            }
        },
        'QueryExecutionContext': {
            'Database': 'string'
        },
        'Status': {
            'State': 'QUEUED'|'RUNNING'|'SUCCEEDED'|'FAILED'|'CANCELLED',
            'StateChangeReason': 'string',
            'SubmissionDateTime': datetime(2015, 1, 1),
            'CompletionDateTime': datetime(2015, 1, 1)
        },
        'Statistics': {
            'EngineExecutionTimeInMillis': 123,
            'DataScannedInBytes': 123,
            'DataManifestLocation': 'string',
            'TotalExecutionTimeInMillis': 123,
            'QueryQueueTimeInMillis': 123,
            'QueryPlanningTimeInMillis': 123,
            'ServiceProcessingTimeInMillis': 123
        },
        'WorkGroup': 'string'
    }
}
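For example, a small helper (hypothetical name) that pulls the reason out of the response above could be dropped into the FAILED branch of poll_status:
def explain_failure(result):
    # Read the failure reason from the get_query_execution response shown above
    status = result['QueryExecution']['Status']
    if status['State'] == 'FAILED':
        print('Query failed:', status.get('StateChangeReason', 'no reason returned'))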
Just to elaborate on RagePwn's answer of using PyAthena, that's what I ultimately did as well. For some reason AWS Wrangler choked on me and couldn't handle the JSON that was being returned from S3. Here's the code snippet that worked for me, based on PyAthena's PyPI page:
import os
from pyathena import connect
from pyathena.util import as_pandas

aws_access_key_id = os.getenv('ATHENA_ACCESS_KEY')
aws_secret_access_key = os.getenv('ATHENA_SECRET_KEY')
region_name = os.getenv('ATHENA_REGION_NAME')
staging_bucket_dir = os.getenv('ATHENA_STAGING_BUCKET')

cursor = connect(aws_access_key_id=aws_access_key_id,
                 aws_secret_access_key=aws_secret_access_key,
                 region_name=region_name,
                 s3_staging_dir=staging_bucket_dir,
                 ).cursor()
cursor.execute(sql)
df = as_pandas(cursor)
The above assumes you have defined the following environment variables:
ATHENA_ACCESS_KEY: the AWS access key id for your AWS account
ATHENA_SECRET_KEY: the AWS secret key
ATHENA_REGION_NAME: the AWS region name
ATHENA_STAGING_BUCKET: a bucket in the same account that has the correct access settings (explanation of which is outside the scope of this answer)
I'm migrating an application that formerly ran on IBM's DoCloud to their new API based on Watson. Since our application doesn't have its data formatted as CSV, nor a separation between the model and data layers, it seemed simpler to upload an LP file along with a model file that reads the LP file and solves it. I can upload it, and it claims to solve correctly, but it returns an empty solve status. I've also output various model info (e.g. the number of variables) and everything is zeroed out. I've confirmed the LP isn't blank - it has a trivial MILP.
Here is my model code (most of which is taken directly from the example at https://dataplatform.cloud.ibm.com/exchange/public/entry/view/50fa9246181026cd7ae2a5bc7e4ac7bd):
import os
import sys
from os.path import splitext
import pandas
from docplex.mp.model_reader import ModelReader
from docplex.util.environment import get_environment
from six import iteritems
def loadModelFiles():
    """Load the input CSVs and extract the model and param data from it
    """
    env = get_environment()
    inputModel = params = None
    modelReader = ModelReader()

    for inputName in [f for f in os.listdir('.') if splitext(f)[1] != '.py']:
        inputBaseName, ext = splitext(inputName)
        print(f'Info: loading {inputName}')
        try:
            if inputBaseName == 'model':
                inputModel = modelReader.read_model(inputName, model_name=inputBaseName)
            elif inputBaseName == 'params':
                params = modelReader.read_prm(inputName)
        except Exception as e:
            with env.get_input_stream(inputName) as inStream:
                inData = inStream.read()
                raise Exception(f'Error: {e} found while processing {inputName} with contents {inData}')

    if inputModel is None or params is None:
        print('Warning: error loading model or params, see earlier messages for details')
    return inputModel, params

def writeOutputs(outputs):
    """Write all dataframes in ``outputs`` as .csv.

    Args:
        outputs: The map of outputs 'outputname' -> 'output df'
    """
    for (name, df) in iteritems(outputs):
        csv_file = '%s.csv' % name
        print(csv_file)
        with get_environment().get_output_stream(csv_file) as fp:
            if sys.version_info[0] < 3:
                fp.write(df.to_csv(index=False, encoding='utf8'))
            else:
                fp.write(df.to_csv(index=False).encode(encoding='utf8'))
    if len(outputs) == 0:
        print("Warning: no outputs written")

# load and solve model
model, modelParams = loadModelFiles()
ok = model.solve(cplex_parameters=modelParams)

solution_df = pandas.DataFrame(columns=['name', 'value'])
for index, dvar in enumerate(model.solution.iter_variables()):
    solution_df.loc[index, 'name'] = dvar.to_string()
    solution_df.loc[index, 'value'] = dvar.solution_value

outputs = {}
outputs['solution'] = solution_df

# Generate output files
writeOutputs(outputs)

try:
    with get_environment().get_output_stream('test.txt') as fp:
        fp.write(f'{model.get_statistics()}'.encode('utf-8'))
except Exception as e:
    with get_environment().get_output_stream('excInfo') as fp:
        fp.write(f'Got exception {e}')
and a stub of the code that runs it (again, pulling heavily from the example):
prmFile = NamedTemporaryFile()
prmFile.write(self.ctx.cplex_parameters.export_prm_to_string().encode())
modelFile = NamedTemporaryFile()
modelFile.write(self.solver.export_as_lp_string(hide_user_names=True).encode())

modelMetadata = {
    self.client.repository.ModelMetaNames.NAME: self.name,
    self.client.repository.ModelMetaNames.TYPE: 'do-docplex_12.9',
    self.client.repository.ModelMetaNames.RUNTIME_UID: 'do_12.9'
}

baseDir = os.path.dirname(os.path.realpath(__file__))

def reset(tarinfo):
    tarinfo.uid = tarinfo.gid = 0
    tarinfo.uname = tarinfo.gname = 'root'
    return tarinfo

with NamedTemporaryFile() as tmp:
    tar = tarfile.open(tmp.name, 'w:gz')
    tar.add(f'{baseDir}/ibm_model.py', arcname='main.py', filter=reset)
    tar.add(prmFile.name, arcname='params.prm', filter=reset)
    tar.add(modelFile.name, arcname='model.lp', filter=reset)
    tar.close()

    modelDetails = self.client.repository.store_model(
        model=tmp.name,
        meta_props=modelMetadata
    )
    modelUid = self.client.repository.get_model_uid(modelDetails)

metaProps = {
    self.client.deployments.ConfigurationMetaNames.NAME: self.name,
    self.client.deployments.ConfigurationMetaNames.BATCH: {},
    self.client.deployments.ConfigurationMetaNames.COMPUTE: {'name': 'S', 'nodes': 1}
}

deployDetails = self.client.deployments.create(modelUid, meta_props=metaProps)
deployUid = self.client.deployments.get_uid(deployDetails)

solvePayload = {
    # we upload input data as part of the model since only CSV data is supported in this interface
    self.client.deployments.DecisionOptimizationMetaNames.INPUT_DATA: [],
    self.client.deployments.DecisionOptimizationMetaNames.OUTPUT_DATA: [
        {
            "id": ".*"
        }
    ]
}

jobDetails = self.client.deployments.create_job(deployUid, solvePayload)
jobUid = self.client.deployments.get_job_uid(jobDetails)

while jobDetails['entity']['decision_optimization']['status']['state'] not in ['completed', 'failed', 'canceled']:
    logger.debug(jobDetails['entity']['decision_optimization']['status']['state'] + '...')
    time.sleep(5)
    jobDetails = self.client.deployments.get_job_details(jobUid)

logger.debug(jobDetails['entity']['decision_optimization']['status']['state'])

# cleanup
self.client.repository.delete(modelUid)
prmFile.close()
modelFile.close()
Any ideas about what could be causing this, or what a good testing avenue would be? It seems there's no way to view the output of the model for debugging; am I missing something in Watson Studio?
I tried something very similar to your code, and the solution is included in the payload when the job is completed.
See this shared notebook: https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/cfbe34a0-52a8-436c-99bf-8df6979c11da/view?access_token=220636400ecdf537fb5ea1b47d41cb10f1b252199d1814d8f96a0280ec4a4e1e
In the last cells, after the job is completed, I print the status:
print(jobDetails['entity']['decision_optimization'])
and get
{'output_data_references': [], 'input_data': [], 'solve_state': {'details': {'PROGRESS_GAP': '0.0', 'MODEL_DETAIL_NONZEROS': '3', 'MODEL_DETAIL_TYPE': 'MILP', 'MODEL_DETAIL_CONTINUOUS_VARS': '0', 'MODEL_DETAIL_CONSTRAINTS': '2', 'PROGRESS_CURRENT_OBJECTIVE': '100.0', 'MODEL_DETAIL_INTEGER_VARS': '2', 'MODEL_DETAIL_KPIS': '[]', 'MODEL_DETAIL_BOOLEAN_VARS': '0', 'PROGRESS_BEST_OBJECTIVE': '100.0'}, 'solve_status': 'optimal_solution'}, 'output_data': [{'id': 'test.txt', 'fields': ['___TEXT___'], 'values': [['IC0gbnVtYmVyIG9mIHZhcmlhYmxlczogMgogICAtIGJpbmFyeT0wLCBpbnRlZ2VyPTIsIGNvbnRpbnVvdXM9MAogLSBudW1iZXIgb2YgY29uc3RyYWludHM6IDIKICAgLSBsaW5lYXI9Mg==']]}, {'id': 'solution.json', 'fields': ['___TEXT___'], 'values': [['eyJDUExFWFNvbHV0aW9uIjogeyJ2ZXJzaW9uIjogIjEuMCIsICJoZWFkZXIiOiB7InByb2JsZW1OYW1lIjogIm1vZGVsIiwgIm9iamVjdGl2ZVZhbHVlIjogIjEwMC4wIiwgInNvbHZlZF9ieSI6ICJjcGxleF9sb2NhbCJ9LCAidmFyaWFibGVzIjogW3siaW5kZXgiOiAiMCIsICJuYW1lIjogIngiLCAidmFsdWUiOiAiNS4wIn0sIHsiaW5kZXgiOiAiMSIsICJuYW1lIjogInkiLCAidmFsdWUiOiAiOTUuMCJ9XSwgImxpbmVhckNvbnN0cmFpbnRzIjogW3sibmFtZSI6ICJjMSIsICJpbmRleCI6IDB9LCB7Im5hbWUiOiAiYzIiLCAiaW5kZXgiOiAxfV19fQ==']]}, {'id': 'solution.csv', 'fields': ['name', 'value'], 'values': [['x', 5], ['y', 95]]}], 'status': {'state': 'completed', 'running_at': '2020-03-09T06:45:29.759Z', 'completed_at': '2020-03-09T06:45:30.470Z'}}
which contains, in output_data:
'output_data': [{
'id': 'test.txt',
'fields': ['___TEXT___'],
'values': [['IC0gbnVtYmVyIG9mIHZhcmlhYmxlczogMgogICAtIGJpbmFyeT0wLCBpbnRlZ2VyPTIsIGNvbnRpbnVvdXM9MAogLSBudW1iZXIgb2YgY29uc3RyYWludHM6IDIKICAgLSBsaW5lYXI9Mg==']]
}, {
'id': 'solution.json',
'fields': ['___TEXT___'],
'values': [['eyJDUExFWFNvbHV0aW9uIjogeyJ2ZXJzaW9uIjogIjEuMCIsICJoZWFkZXIiOiB7InByb2JsZW1OYW1lIjogIm1vZGVsIiwgIm9iamVjdGl2ZVZhbHVlIjogIjEwMC4wIiwgInNvbHZlZF9ieSI6ICJjcGxleF9sb2NhbCJ9LCAidmFyaWFibGVzIjogW3siaW5kZXgiOiAiMCIsICJuYW1lIjogIngiLCAidmFsdWUiOiAiNS4wIn0sIHsiaW5kZXgiOiAiMSIsICJuYW1lIjogInkiLCAidmFsdWUiOiAiOTUuMCJ9XSwgImxpbmVhckNvbnN0cmFpbnRzIjogW3sibmFtZSI6ICJjMSIsICJpbmRleCI6IDB9LCB7Im5hbWUiOiAiYzIiLCAiaW5kZXgiOiAxfV19fQ==']]
}, {
'id': 'solution.csv',
'fields': ['name', 'value'],
'values': [['x', 5], ['y', 95]]
}
],
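The ___TEXT___ values are base64-encoded, so they can be decoded straight from the job details; a small sketch over the jobDetails structure above:
import base64

for output in jobDetails['entity']['decision_optimization']['output_data']:
    if output['fields'] == ['___TEXT___']:
        print(output['id'])
        print(base64.b64decode(output['values'][0][0]).decode('utf-8'))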
Hope this helps.
Alain
Thanks to Alain for verifying the overall approach, but the main issue turned out to be a simple bug in my code:
After calling modelFile.write(...) it's necessary to call modelFile.seek(0) to reset the file pointer; otherwise an empty file gets written to the tar archive.
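Applied to the stub above, the fix looks like this:
prmFile = NamedTemporaryFile()
prmFile.write(self.ctx.cplex_parameters.export_prm_to_string().encode())
prmFile.seek(0)  # rewind so the .prm contents actually end up in the tar
modelFile = NamedTemporaryFile()
modelFile.write(self.solver.export_as_lp_string(hide_user_names=True).encode())
modelFile.seek(0)  # same for the LP file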