I'm trying to get a handle on Docker. I've got a very basic container setup that runs a simple Python script to:
Query a database
Write a CSV file of the query results
Upload the CSV to S3 (using the tinys3 package).
When I run the script from my host, everything works as intended: the query fires, the CSV is created, and the upload succeeds. But when I run it from within my Docker container, tinys3 fails with the following error:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='my-s3-bucket', port=443): Max retries exceeded with url: /bucket.s3.amazonaws.com/test.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f4f17cf7790>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Everything prior to that works (query and CSV creation). This answer suggests that there's an incorrect endpoint. But that doesn't seem correct, since running the script from my host does not result in an error.
So my question is: am I missing something obvious? Is this an issue with the tinys3 module? Do I need to set something up in my container to allow it to "call out"? Or is there a better way to do this?
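For reference, the upload step in a script like this would typically look something like the sketch below; the credentials, bucket, and file names are placeholders rather than values from the question, and it uses tinys3's documented Connection/upload API:

import tinys3

# Placeholder credentials and names, not taken from the actual script
conn = tinys3.Connection('ACCESS_KEY', 'SECRET_KEY', tls=True)

with open('test.csv', 'rb') as csv_file:
    conn.upload('test.csv', csv_file, 'my-s3-bucket')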
Alternatively, you can use the minio-py client library for the same task.
Here is example code using fput_object:
from minio import Minio
from minio.error import ResponseError

client = Minio('s3.amazonaws.com',
               access_key='YOUR-ACCESSKEYID',
               secret_key='YOUR-SECRETACCESSKEY')

# Put an object 'my-objectname-csv' with contents from
# 'my-filepath.csv' as 'application/csv'.
try:
    client.fput_object('my-bucketname', 'my-objectname-csv',
                       'my-filepath.csv', content_type='application/csv')
except ResponseError as err:
    print(err)
Hope it helps.
Disclaimer: I work with Minio
Related
I'm trying to transfer a file from Ubuntu to Windows (an AWS EC2 instance), but I'm getting the following error:
ValueError: Failed to connect to '35.154.105.236': timed out
For reference
import smbclient
import sys
# Optional - register the credentials with a server
print(smbclient.register_session("35.154.105.236", username="Administrator", password="XXXXXXXXXXX"))
May I know what is missing and why it is timing out?
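For context, the copy step that would typically follow register_session (it isn't shown in the question) might look like the sketch below; the remote share and paths are assumptions, and it uses smbclient.open_file together with the standard-library shutil:

import shutil
import smbclient

# Register credentials with the Windows host (IP and credentials are placeholders)
smbclient.register_session("35.154.105.236", username="Administrator", password="XXXXXXXXXXX")

# Assumed share and destination path, purely for illustration
with open("/tmp/report.csv", "rb") as local_file, \
        smbclient.open_file(r"\\35.154.105.236\C$\Temp\report.csv", mode="wb") as remote_file:
    shutil.copyfileobj(local_file, remote_file)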
I tried to upload a file to AWS S3 using the following snippet of code.
filePath = os.path.join(output_dir, fileToUpload)
k = bucket.new_key(os.path.join(bucket_dir,fileToUpload))
k.set_contents_from_filename(filePath,cb=mycb, num_cb=10,policy='public-read')
As seen above, I am using the callback feature (cb=mycb) to print the status of the upload. The upload fails with the following error:
[WinError 10054] An existing connection was forcibly closed by the remote host
But when I remove the callback parameter from the set_contents_from_filename() function call, it works fine.
k.set_contents_from_filename(filePath ,policy='public-read')
Has anyone ever faced such an issue? Any thoughts on why it occurs?
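The original mycb isn't shown in the question; a typical boto progress callback just takes two integers (bytes transmitted so far and the total size), roughly like this stand-in:

def mycb(bytes_transmitted, total_bytes):
    # boto invokes this periodically during the upload
    print('Uploaded %d of %d bytes' % (bytes_transmitted, total_bytes))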
I am trying to upload files from a UNIX server to a Windows server using Python's FTP_TLS. Here is my code:
from ftplib import FTP_TLS
ftps = FTP_TLS('server')
ftps.connect(port=myport)
ftps.login('user', 'password')
ftps.prot_p()
ftps.retrlines('LIST')
remote_path = "MYremotePath"
ftps.cwd(remote_path)
myfile = open('myfile.txt', 'rb')
ftps.storbinary('STOR myfile.txt', myfile)
myfile.close()
ftps.close()
I can connect to the server successfully and receive the list of files, but I am not able to upload the file, and after some time I receive the following error:
ftplib.error_perm: 550 Data channel timed out due to not meeting the minimum bandwidth requirement.
I should mention that I am able to upload files to the same server using the Perl FTPSSL library. The problem happens only in Python.
Does anyone have any idea about how to resolve this issue?
Thanks
I have this simple minimal 'working' example below that opens a connection to Google every two seconds. When I run this script while I have a working internet connection, I get the Success message; when I then disconnect, I get the Fail message; and when I reconnect again, I get Success again. So far, so good.
However, when I start the script when the internet is disconnected, I get the Fail messages, and when I connect later, I never get the Success message. I keep getting the error:
urlopen error [Errno -2] Name or service not known
What is going on?
import urllib2, time

while True:
    try:
        print('Trying')
        response = urllib2.urlopen('http://www.google.com')
        print('Success')
        time.sleep(2)
    except Exception, e:
        print('Fail ' + str(e))
        time.sleep(2)
This happens because the DNS name "www.google.com" cannot be resolved. If there is no internet connection the DNS server is probably not reachable to resolve this entry.
It seems I misread your question the first time. The behaviour you describe is, on Linux, a peculiarity of glibc. It only reads "/etc/resolv.conf" once, when loading. glibc can be forced to re-read "/etc/resolv.conf" via the res_init() function.
One solution would be to wrap the res_init() function and call it before calling getaddrinfo() (which is indirectly used by urllib2.urlopen()).
You might try the following (still assuming you're using Linux):
import ctypes
libc = ctypes.cdll.LoadLibrary('libc.so.6')
res_init = libc.__res_init
# ...
res_init()
response = urllib2.urlopen('http://www.google.com')
This might of course be optimized by waiting until "/etc/resolv.conf" is modified before calling res_init().
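A minimal sketch of that optimization, assuming the res_init wrapper from the snippet above, might look like this (the helper name is made up for illustration):

import os

_resolv_mtime = None

def res_init_if_changed():
    # Hypothetical helper: re-run res_init() only when /etc/resolv.conf has changed
    global _resolv_mtime
    try:
        mtime = os.stat('/etc/resolv.conf').st_mtime
    except OSError:
        return
    if mtime != _resolv_mtime:
        _resolv_mtime = mtime
        res_init()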
Another solution would be to install e.g. nscd (name service cache daemon).
For me, it was a proxy problem.
Running the following before importing urllib.request helped:
import os
os.environ['http_proxy'] = ''

import urllib.request
response = urllib.request.urlopen('http://www.google.com')
We're getting a certificate error when trying to connect to our S3 bucket using Boto. Strangely, this only manifests itself when accessing a bucket with periods in its name WHILE running on Heroku.
from boto.s3.connection import S3Connection
conn = S3Connection({our_s3_key}, {our_s3_secret})
bucket = conn.get_bucket('ourcompany.images')
Raises the following error:
CertificateError: hostname 'ourcompany.images.s3.amazonaws.com'
doesn't match either of '*.s3.amazonaws.com', 's3.amazonaws.com'
But the same code works fine when run locally, and would also work on Heroku if the bucket name were 'ourcompany-images' instead of 'ourcompany.images'
According to the relevant GitHub issue, add this to the boto configuration file (e.g. ~/.boto):
[s3]
calling_format = boto.s3.connection.OrdinaryCallingFormat
Or, specify the calling_format while instantiating an S3Connection:
from boto.s3.connection import S3Connection, OrdinaryCallingFormat
conn = S3Connection(access_key, secret_key, calling_format=OrdinaryCallingFormat())
The code worked for you locally but not on Heroku most likely because of the different Python versions used: I suspect you are using the 2.7.9 runtime on Heroku, which enables certificate verification for the stdlib HTTP clients.