Using Boto3 to upload a file to Amazon WorkDocs - python

According to the Amazon WorkDocs SDK page, you can use Boto3 to migrate your content to Amazon WorkDocs. I found the entry for the WorkDocs client in the Boto3 documentation, but every call seems to require an "AuthenticationToken" parameter. The only information I can find on AuthenticationToken is that it is supposed to be an "Amazon WorkDocs authentication token".
Does anyone know what this token is? How do I get one? Are there any code examples of using the WorkDocs client in Boto3?
I am trying to create a simple Python script that uploads a single document into WorkDocs, but there seems to be little to no information on how to do this. I was easily able to write a script that uploads/downloads files from S3, but this seems like something else entirely.
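For reference, a minimal sketch of the upload flow the WorkDocs API describes: initiate an upload, PUT the bytes to the pre-signed URL that comes back, then activate the version. AuthenticationToken is optional in the API signature and is generally not needed when calling with AWS administrator/IAM credentials; the folder ID, file name, and region below are placeholders, and requests is assumed to be available for the signed PUT.

import boto3
import requests  # any HTTP client works for the signed PUT

workdocs = boto3.client('workdocs', region_name='us-east-1')  # region is a placeholder

# Step 1: ask WorkDocs for a pre-signed upload URL.
# ParentFolderId must be the ID of an existing WorkDocs folder (placeholder below).
resp = workdocs.initiate_document_version_upload(
    Name='example.pdf',
    ParentFolderId='REPLACE_WITH_FOLDER_ID',
    ContentType='application/pdf',
)

upload = resp['UploadMetadata']
doc_id = resp['Metadata']['Id']
version_id = resp['Metadata']['LatestVersionMetadata']['Id']

# Step 2: PUT the file bytes to the pre-signed URL using the signed headers.
with open('example.pdf', 'rb') as f:
    requests.put(upload['UploadUrl'], data=f, headers=upload['SignedHeaders'])

# Step 3: mark the new version as active so it becomes visible in WorkDocs.
workdocs.update_document_version(
    DocumentId=doc_id,
    VersionId=version_id,
    VersionStatus='ACTIVE',
)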

Related

Python data ingestion (starting with an API "GET" call into an AWS S3 bucket): how to manage the username/password/API key and a token that expires in a short time window

The data source is a SaaS server's API endpoints; the aim is to use Python (the boto3 library) to move data into an AWS S3 bucket.
API access is granted via an authorized username/password combination and a unique API key.
Every run then starts with an API call to fetch a token before any further information can be retrieved.
I have two questions:
1. How should those secrets be managed: saved in a config file (*.ini, *.json, *.yaml) or stored via AWS Secrets Manager?
2. The token is a bit challenging. The basic approach is to fetch a new token per endpoint and then make the API call, but that ends up as far too many pipelines (e.g., if 100 endpoints are needed for downstream business needs, I would have to craft 100 pipelines from one universal template, repeated 100 times).
I am new to the Python programming world, so feel free to comment and share any use cases.
Much appreciated!
I searched and read this show-case:
Saving from API to S3 bucket
and this one:
How to write a file or data to an S3 object using boto3
I found this helpful:
python-decouple summary: store parameters in .ini or .env files.
A few options for managing (hiding) sensitive info:
a. IAM role
b. Store secrets using **Parameter Store**
c. Store secrets using **Secrets Manager** - the method currently recommended by AWS (see the sketch below)
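To make option (c) concrete, here is a rough sketch of pulling the SaaS credentials from Secrets Manager and reusing one short-lived token across many endpoints instead of building one pipeline per endpoint. The secret name, secret JSON layout, token endpoint, endpoint names, and bucket are all hypothetical placeholders.

import json
import time

import boto3
import requests  # assumed HTTP client for the hypothetical SaaS API

# (c) Pull the credentials from AWS Secrets Manager; the secret is assumed to be
# JSON like {"username": "...", "password": "...", "api_key": "..."}.
secrets = boto3.client('secretsmanager')
secret = json.loads(
    secrets.get_secret_value(SecretId='saas/api-credentials')['SecretString'])

_token = None
_token_expiry = 0.0

def get_token():
    # Fetch the short-lived token once and reuse it until shortly before expiry,
    # instead of fetching a fresh token for every endpoint.
    global _token, _token_expiry
    if _token is None or time.time() >= _token_expiry:
        resp = requests.post('https://saas.example.com/auth/token',  # hypothetical endpoint
                             auth=(secret['username'], secret['password']),
                             headers={'x-api-key': secret['api_key']})
        resp.raise_for_status()
        _token = resp.json()['access_token']   # assumed response shape
        _token_expiry = time.time() + 50 * 60  # refresh before a 1-hour expiry
    return _token

# One generic loop rather than 100 copies of the same pipeline.
s3 = boto3.client('s3')
for endpoint in ['customers', 'orders']:  # placeholder endpoint names
    data = requests.get(f'https://saas.example.com/api/{endpoint}',  # hypothetical
                        headers={'Authorization': f'Bearer {get_token()}'}).json()
    s3.put_object(Bucket='my-bucket', Key=f'{endpoint}.json', Body=json.dumps(data))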

automatically extract aws keys using python

I have been given access to AWS and currently get credentials manually, i.e. I copy the access_key_id, secret_access_key, and session_token, which expire every hour. I am using these credentials to extract information from Route 53. I want to automate getting the access_key_id, secret_access_key, and session_token instead of manually copying them into the script. I would like to understand whether there is any way to automate this.
The process of refreshing tokens is documented here:
Auto-refresh AWS Tokens Using IAM Role and boto3
It is poorly documented in the boto documentation, but this could help.
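If the short-lived credentials come from an IAM role you are allowed to assume, one simple way to stop copying them by hand is to let STS issue them for you, roughly like this (the role ARN is a placeholder):

import boto3

# Ask STS for temporary credentials instead of pasting them into the script.
sts = boto3.client('sts')
creds = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/route53-read',  # placeholder role
    RoleSessionName='route53-script',
)['Credentials']

# Use the temporary credentials for the Route 53 client; call assume_role again
# (or wrap this in a refresh helper) when they expire.
route53 = boto3.client(
    'route53',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)
print([zone['Name'] for zone in route53.list_hosted_zones()['HostedZones']])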

Getting the S3 response code (only the HTTP code like 200, 300, 400, 403, 500, etc.) while saving a file using S3A in PySpark

I am trying to get the HTTP code and store it in an RDS table for later analysis of a PySpark job that saves a file in Avro format to S3 using S3A. Once the file is saved, I know there will be a return status code from S3, but I am not sure how to record that in code. Please find the snippet of the code below.
import datetime

def s3_load(df, row):
    # Write the DataFrame as Avro to a date-partitioned S3A path
    df.write \
        .format("com.databricks.spark.avro") \
        .save("s3a://Test-" + row["PARTNER"].lower() + "/" + row["TABLE_NAME"] + "/" +
              datetime.datetime.today().strftime('%Y%m%d'))
In the above code I would like to get the return status code.
Note: I am able to save the file to S3 in Avro format.
Thanks
A similar concept is discussed in this question about getting a status code from a library or function that wraps an S3 API: Amazon S3 POST, event when done?
Ultimately, if Databricks is the library handling the upload, the resulting response code from the df.write.save(...) call would have to be found somewhere in the result of that Databricks call.
Databricks supports s3 and s3a as target destinations for saving files (as shown in their docs here), but it doesn't appear that Databricks surfaces the response code from the underlying operations (maybe it does; I couldn't find it in any of the docs).
A few options for moving forward:
Assuming Databricks will throw some sort of error for a failed upload, a simple try/except will let you catch it properly (although any errors not surfaced at the Databricks level would still pass).
On AWS, S3 bucket uploads are an event source that can be used as a trigger for other operations, such as invoking an AWS Lambda function, which you can use to call an arbitrary cloud-hosted function. There is lots of information on what this architecture would look like in this tutorial.
Depending on your need for parallel uploading, you can rewrite your small upload function using boto3, the official AWS Python library; handling those error/response codes is discussed here, and a sketch follows this list.
Databricks also seems to have audit logging capabilities somewhere in its enterprise offering.
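As a sketch of the third option: boto3 exposes the HTTP status code of each S3 call in the response metadata, and failed calls carry it in the exception. Bucket and key names below are placeholders.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')

def upload_and_get_status(path, bucket, key):
    # Upload one object and return the HTTP status code S3 reports.
    try:
        with open(path, 'rb') as f:
            resp = s3.put_object(Bucket=bucket, Key=key, Body=f)
        return resp['ResponseMetadata']['HTTPStatusCode']  # 200 on success
    except ClientError as err:
        # Error responses (403, 404, 500, ...) carry their status code too.
        return err.response['ResponseMetadata']['HTTPStatusCode']

status = upload_and_get_status('part-00000.avro', 'my-bucket', 'test/part-00000.avro')  # placeholders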

Python using GCS new client library - upload objects/'directories'

I've been through the newest docs for the GCS client library and went through the example. The sample code shows how to create a file/stream on-the-fly on GCS.
How do I do a resumable upload (one that can resume after an error) of existing files and directories from a local directory to a GCS bucket, using the new client library? I.e., this (can't post more than 2 links, so h77ps://cloud.google.com/storage/docs/gspythonlibrary#uploading-objects) is deprecated.
Thanks all
P.S.
I do not need GAE functionality - this is going to sit on-premises and upload to GCS
The Python API client can perform resumable uploads. See the documentation for examples. The important bit is:
media = MediaFileUpload('pig.png', mimetype='image/png', resumable=True)
Unfortunately, the library doesn't expose the upload ID itself, so while the upload call will resume uploads if there is an error, there's no way for your application to explicitly resume an upload. If, for instance, your application was terminated and you needed to resume the upload on restart, the library won't help you. If you need that level of retry, you'll have to use another tool or just directly invoke httplib.
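For context, a minimal sketch of that resumable pattern with the Google API Python client, looping over next_chunk() so the upload proceeds in retried chunks within the same process; the bucket and file names are placeholders.

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

service = build('storage', 'v1')
media = MediaFileUpload('pig.png', mimetype='image/png', resumable=True)
request = service.objects().insert(bucket='my-bucket', name='pig.png', media_body=media)

response = None
while response is None:
    # next_chunk() uploads one chunk at a time; status reports progress
    # until the final response arrives.
    status, response = request.next_chunk()
    if status:
        print('Uploaded %d%%' % int(status.progress() * 100))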
The Boto library accomplishes this a little differently and DOES support keeping a persistable tracking token, in case your app crashes and needs to resume. Here's a quick example, stolen from Chromium's system tests:
# Set up other stuff normally, e.g.
# from boto.gs.resumable_upload_handler import ResumableUploadHandler
res_upload_handler = ResumableUploadHandler(
    tracker_file_name=tracker_file_name, num_retries=3)
dst_key.set_contents_from_file(src_file, res_upload_handler=res_upload_handler)
Since you're interested in the new hotness, the latest, greatest Python library for accessing Google Cloud Storage is probably APITools, which also provides for recoverable, resumable uploads and also has examples.

How to copy a file via the browser to Amazon S3 using Python (and boto)?

Creating a file (key) into Amazon S3 using Python (and boto) is not a problem.
With this code, I can connect to a bucket and create a key with a specific content:
import boto

connection = boto.connect_s3()  # credentials are read from the environment/boto config
bucket_instance = connection.get_bucket('bucketname')
key = bucket_instance.new_key('testfile.txt')
key.set_contents_from_string('Content for File')
I want to upload a file via the browser (file dialogue) into Amazon S3.
How can I realize this with boto?
Thanks in advance
You can't do this with boto, because what you're asking for is purely client-side - there's no direct involvement from the server except to generate the form to post.
What you need to use is Amazon's browser-based upload with POST support. There's a demo of it here.
Do you mean this one? Upload files in Google App Engine
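As a sketch of the form-generation step mentioned in the answer above, using boto3 (the successor to the boto library in the question): generate_presigned_post returns the action URL and the hidden fields a browser upload form needs. Bucket and key are placeholders.

import boto3

s3 = boto3.client('s3')
post = s3.generate_presigned_post(
    Bucket='bucketname',   # placeholder bucket
    Key='testfile.txt',    # placeholder key
    ExpiresIn=3600,        # form stays valid for one hour
)

# post['url'] is the form's action attribute; post['fields'] become hidden
# <input> elements, alongside the user's <input type="file" name="file">.
print(post['url'])
print(post['fields'])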
