I have sample code for an AWS Lambda function (Python) that returns an HTML page to the browser through API Gateway.
I want this Lambda function to use a CSS file stored in an S3 bucket.
I tried referencing the file from the HTML, but it doesn't load. How can I make this work?
You need to make /test-bucket/test.css publicly available.
Understand that Lambda isn't even accessing that file at runtime; it only references it inside a string, so neither Lambda nor API Gateway is aware that there's a .css file at all.
When your function is executed through API Gateway, your Lambda returns a string containing your HTML code. The browser then tries to render that HTML, which means the browser itself is trying to load a file that is private inside one of your buckets. It's exactly the same as creating a new index.html file on your machine and trying to load that test.css: it just won't work due to the lack of permissions.
Go to test-bucket/test.css and make the object publicly available, so the browser can load it successfully.
If you don't know how to make an object publicly available, I suggest you follow this article on the Knowledge Center by AWS.
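For illustration, here is a minimal sketch of what the Lambda side can look like once the object is public. It assumes a Lambda proxy integration with API Gateway, and the bucket name, URL, and key are placeholders taken from the question:

CSS_URL = "https://test-bucket.s3.amazonaws.com/test.css"  # must be publicly readable

def lambda_handler(event, context):
    # The CSS is only referenced as a URL inside the HTML string; the browser loads it
    html = (
        "<!DOCTYPE html><html>"
        '<head><link rel="stylesheet" href="{}"></head>'
        "<body><h1>Hello from Lambda</h1></body></html>"
    ).format(CSS_URL)
    # With a proxy integration, API Gateway needs the Content-Type header
    # so the browser renders the body as HTML rather than plain text.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/html"},
        "body": html,
    }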
I am new to AWS and have to copy a file from an S3 bucket to an on-prem server.
As I understand it, the Lambda would get triggered by an S3 file upload event notification. But what could be used in the Lambda to send the file securely?
Your best bet may be to create a hybrid network so AWS lambda can talk directly to an on-prem server. Then you could directly copy the file server-to-server over the network. That's a big topic to cover, with lots of other considerations that probably go well beyond this simple question/answer.
You could send it via an HTTPS web request. You could easily write code to send it like this, but that implies you have something on the other end set up to receive it, some sort of HTTPS web server/API. Again, that could be a big topic, but here's a description of how you might do that in Python.
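As a rough sketch of that option, a Lambda handler triggered by the S3 upload event could read the object and POST it to an HTTPS endpoint. The endpoint URL and header name here are placeholders; your on-prem receiver would dictate the real contract, including authentication:

import boto3
import urllib.request

s3 = boto3.client("s3")
ENDPOINT = "https://files.example.internal/upload"  # hypothetical on-prem HTTPS receiver

def lambda_handler(event, context):
    # S3 event notifications carry the bucket and key of the uploaded object
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # Push the bytes to the on-prem endpoint; add auth headers as your receiver requires
    req = urllib.request.Request(
        ENDPOINT, data=body, headers={"X-File-Name": key}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status, "key": key}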
Another option would be to use SNS to notify something on-premise whenever a file is uploaded. Then you could write code to pull the file down from S3 on your end. But the key thing is that this pull is initiated by code on your side. Maybe it gets triggered in response to an SNS email or something like that, but the network flow is on-premise fetching the file from S3 versus lambda pushing it to on-premise.
There are many other options. You probably need to do more research and decide your architectural approach before getting too deep into the implementation details.
I deployed a working Flask application to AWS Lambda via Zappa. One of the things working locally but not on Lambda is the call to mimetypes.guess_extension.
In particular, locally on my Mac, the guessed extension for application/vnd.openxmlformats-officedocument.wordprocessingml.document is properly .docx, but on Lambda it's None.
The way mimetypes works is that it consults the host machine's mime.types file, and on Lambda this file either does not exist or, if something does exist, it does not include many types.
So how can I get this module to work on Lambda? The documentation mentions an init function in the module which accepts files, but that doesn't seem right for a Lambda. I could, I guess, bundle the entire 48K mime.types file from my Mac into my deployed Lambda (as a file?), but that seems like overkill, and I was wondering if perhaps I missed something and Lambdas should have access to this information without uploading files.
I checked PyPI and found the packages mime and common-mimetypes but they have not been touched in years.
Any best practices I am overlooking here?
I think that, based on how the AWS Lambda environment is built, it won't contain what you want, or at least not all of it.
Instead of uploading the file to Lambda, I would suggest uploading it to some cloud storage such as your S3 bucket, and initializing your program from that file without storing it on disk:
from io import StringIO
from mimetypes import MimeTypes
import requests

url = "https://example.com/mime.types"  # hypothetical: a hosted copy of a mime.types file
mime = MimeTypes()
with requests.get(url) as res:
    # readfp expects a text-mode file-like object, so decode the response body
    mime.readfp(StringIO(res.text))
print(mime.guess_extension("application/vnd.openxmlformats-officedocument.wordprocessingml.document"))
I am writing a lambda function on Amazon AWS Lambda. It accesses the URL of an EC2 instance, on which I am running a web REST API. The lambda function is triggered by Alexa and is coded in the Python language (python3.x).
Currently, I have hard coded the URL of the EC2 instance in the lambda function and successfully ran the Alexa skill.
I want the lambda function to automatically obtain the IP from the EC2 instance, which keeps changing whenever I start the instance. This would ensure that I don't have to go into the code and hard-code the URL each time I start the EC2 instance.
I stumbled upon a similar question on SO, but it was unanswered. However, there was a reply which indicated updating IAM roles. I have already created IAM roles for other purposes before, but I am still not used to it.
Is this possible? Will it require managing of security groups of the EC2 instance?
Do I need to set some permissions/configurations/settings? How can the lambda code achieve this?
Additionally, I pip installed the requests library on my system, and I tried uploading a .zip file with the structure:
REST.zip/
    requests/  (the library folder)
    index.py
I am currently using the urllib library.
When I use zip files for my code upload (I currently edit code inline), Lambda can't even access the index.py file to run the code.
You could do it using boto3, but I would advise against that architecture. A better approach would be to use a load balancer (even if you only have one instance), and then use the CNAME record of the load balancer in your application (this will not change for as long as the LB exists).
An even better way, if you have access to your own domain name, would be to create a CNAME record and point it to the address of the load balancer. Then you can happily use the DNS name in your Lambda function without fear that it would ever change.
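If you do go the boto3 route anyway, a minimal sketch looks like the following. The instance ID is a hypothetical placeholder, and the Lambda's execution role would need the ec2:DescribeInstances permission:

import boto3

INSTANCE_ID = "i-0123456789abcdef0"  # hypothetical; replace with your instance ID

def get_instance_url():
    ec2 = boto3.client("ec2")
    resp = ec2.describe_instances(InstanceIds=[INSTANCE_ID])
    instance = resp["Reservations"][0]["Instances"][0]
    # PublicIpAddress is only present while the instance is running
    return "http://{}".format(instance["PublicIpAddress"])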
I've been through the newest docs for the GCS client library and went through the example. The sample code shows how to create a file/stream on-the-fly on GCS.
How do I upload existing files and directories from a local directory to a GCS bucket in a resumable way (so that an upload can be resumed after an error), using the new client library? I.e., this (can't post more than 2 links, so h77ps://cloud.google.com/storage/docs/gspythonlibrary#uploading-objects) is deprecated.
Thanks all
P.S. I do not need GAE functionality; this is going to sit on-premise and upload to GCS.
The Python API client can perform resumable uploads. See the documentation for examples. The important bit is:
media = MediaFileUpload('pig.png', mimetype='image/png', resumable=True)
Unfortunately, the library doesn't expose the upload ID itself, so while the upload call will resume uploads if there is an error, there's no way for your application to explicitly resume an upload. If, for instance, your application was terminated and you needed to resume the upload on restart, the library won't help you. If you need that level of retry, you'll have to use another tool or just directly invoke httplib.
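For context, the documented pattern around that call looks roughly like the sketch below. It assumes service is a Cloud Storage JSON API (v1) client built with google-api-python-client and that default credentials are available; the bucket name is a placeholder:

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

service = build("storage", "v1")  # assumes application default credentials

media = MediaFileUpload("pig.png", mimetype="image/png", resumable=True)
request = service.objects().insert(bucket="my-bucket", name="pig.png", media_body=media)

# next_chunk() transparently resumes after transient errors within this process
response = None
while response is None:
    status, response = request.next_chunk()
    if status:
        print("Uploaded {:.0f}%".format(status.progress() * 100))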
The Boto library accomplishes this a little differently and DOES support keeping a persistable tracking token, in case your app crashes and needs to resume. Here's a quick example, stolen from Chromium's system tests:
from boto.gs.resumable_upload_handler import ResumableUploadHandler

# Set up bucket, keys, and credentials (the "other stuff") normally, then:
res_upload_handler = ResumableUploadHandler(
    tracker_file_name=tracker_file_name, num_retries=3)
dst_key.set_contents_from_file(src_file, res_upload_handler=res_upload_handler)
Since you're interested in the new hotness, the latest, greatest Python library for accessing Google Cloud Storage is probably APITools, which also provides for recoverable, resumable uploads and also has examples.
I am using this file storage engine to store files to Amazon S3 when they are uploaded:
http://code.welldev.org/django-storages/wiki/Home
It takes quite a long time to upload because the file must first be uploaded from the client to the web server, and then from the web server to Amazon S3, before a response is returned to the client.
I would like to make the process of sending the file to S3 asynchronous, so the response can be returned to the user much faster. What is the best way to do this with the file storage engine?
Thanks for your advice!
I've taken another approach to this problem.
My models have two file fields: one uses the standard file storage backend and the other uses the S3 file storage backend. When the user uploads a file, it gets stored locally.
I have a management command in my application that uploads all the locally stored files to S3 and updates the models.
So when a request comes in for the file, I check whether the model object uses the S3 storage field; if so, I send a redirect to the correct URL on S3, and if not, I send a redirect so that nginx can serve the file from disk.
This management command can of course be triggered by any event, a cron job or whatever.
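A minimal sketch of this approach might look like the following. The model, field, and command names are hypothetical, and it assumes django-storages' S3BotoStorage backend is configured:

# models.py: one locally stored field and one S3-backed field
from django.db import models
from storages.backends.s3boto import S3BotoStorage

class Upload(models.Model):
    local_file = models.FileField(upload_to="uploads", blank=True)
    s3_file = models.FileField(upload_to="uploads", storage=S3BotoStorage(), blank=True)

# management/commands/push_to_s3.py: copy pending local files up to S3
import os

from django.core.management.base import BaseCommand

from myapp.models import Upload  # hypothetical app path

class Command(BaseCommand):
    def handle(self, *args, **options):
        for obj in Upload.objects.filter(s3_file=""):
            obj.local_file.open("rb")
            # Saving through the S3-backed field streams the content to the bucket
            obj.s3_file.save(os.path.basename(obj.local_file.name), obj.local_file, save=True)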
It's possible to have your users upload files directly to S3 from their browser using a special form (with a signed policy document in a hidden field). They will be redirected back to your application once the upload completes.
More information here: http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1434
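As a rough sketch of how the signed policy in that form gets generated server-side (this uses the legacy signature version 2 scheme the linked article describes; the bucket, key prefix, and secret key are placeholders, and the secret must never reach the browser):

import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta

AWS_SECRET_KEY = "replace-me"  # placeholder; keep this server-side only
BUCKET = "my-bucket"           # placeholder bucket name

def make_upload_policy():
    policy = {
        "expiration": (datetime.utcnow() + timedelta(hours=1)).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": BUCKET},
            ["starts-with", "$key", "uploads/"],
            {"acl": "private"},
        ],
    }
    policy_b64 = base64.b64encode(json.dumps(policy).encode())
    signature = base64.b64encode(
        hmac.new(AWS_SECRET_KEY.encode(), policy_b64, hashlib.sha1).digest())
    # Render these into the hidden "policy" and "signature" fields of the form that POSTs to S3
    return policy_b64.decode(), signature.decode()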
There is an app for that :-)
https://github.com/jezdez/django-queued-storage
It does exactly what you need - and much more, because you can set any "local" storage and any "remote" storage. This app will store your file in fast "local" storage (for example MogileFS storage) and then using Celery (django-celery), will attempt asynchronous uploading to the "remote" storage.
A few remarks:
The tricky thing is that you can set it up with either a copy-and-upload strategy or an upload-and-delete strategy; the latter deletes the local file once it has been uploaded.
The second tricky thing is that it will serve the file from the "local" storage until it has been uploaded.
It can also be configured to retry a number of times on upload failures.
Installation & usage is also very simple and straightforward:
pip install django-queued-storage
append to INSTALLED_APPS:
INSTALLED_APPS += ('queued_storage',)
in models.py:
from queued_storage.backends import QueuedStorage

queued_s3storage = QueuedStorage(
    'django.core.files.storage.FileSystemStorage',
    'storages.backends.s3boto.S3BotoStorage',
    task='queued_storage.tasks.TransferAndDelete')

class MyModel(models.Model):
    my_file = models.FileField(upload_to='files', storage=queued_s3storage)
You could decouple the process:
The user selects a file to upload and sends it to your server. After this, they see a page: "Thank you for uploading foofile.txt, it is now stored in our storage backend."
When the user has uploaded the file, it is stored in a temporary directory on your server and, if needed, some metadata is stored in your database.
A background process on your server then uploads the file to S3. This is only possible if you have full access to your server, so you can create some kind of daemon to do this (or simply use a cron job).*
The page that is displayed polls asynchronously and displays some kind of progress bar (or a simple "please wait" message) to the user. This would only be needed if the user should be able to "use" the file (put it in a message, or something like that) directly after uploading.
[*: In case you only have shared hosting, you could possibly build a solution which uses a hidden iframe in the user's browser to start a script which then uploads the file to S3.]
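As a rough sketch of the background-process step under those assumptions (a temporary upload directory on the server and boto3 available; the directory and bucket names are placeholders), a cron-driven job could look like this:

import os
import boto3

UPLOAD_DIR = "/var/www/uploads/pending"  # hypothetical temporary directory
BUCKET = "my-bucket"                     # hypothetical bucket name

def push_pending_uploads():
    # Run this from a cron job or a simple daemon loop
    s3 = boto3.client("s3")
    for name in os.listdir(UPLOAD_DIR):
        path = os.path.join(UPLOAD_DIR, name)
        s3.upload_file(path, BUCKET, name)
        os.remove(path)  # remove the local copy only after the upload has succeeded

if __name__ == "__main__":
    push_pending_uploads()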
You can upload media directly to S3 without going through your web application server.
See the following references:
Amazon API Reference : http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingHTTPPOST.html
A django implementation : https://github.com/sbc/django-uploadify-s3
As some of the answers here suggest uploading directly to S3, here's a Django S3 Mixin using plupload:
https://github.com/burgalon/plupload-s3mixin
I encountered the same issue with uploaded images. You cannot pass along files to a Celery worker because Celery needs to be able to pickle the arguments to a task. My solution was to deconstruct the image data into a string and get all other info from the file, passing this data and info to the task, where I reconstructed the image. After that you can save it, which will send it to your storage backend (such as S3). If you want to associate the image with a model, just pass along the id of the instance to the task and retrieve it there, bind the image to the instance and save the instance.
When a file has been uploaded via a form, it is available in your view as an UploadedFile file-like object. You can get it directly out of request.FILES, or, better, first bind it to your form, run is_valid, and retrieve the file-like object from form.cleaned_data. At that point you at least know it is the kind of file you want it to be. After that you can get the data using read(), and get the other info using other methods/attributes. See https://docs.djangoproject.com/en/1.4/topics/http/file-uploads/
I actually ended up writing and distributing a little package to save an image asynchronously. Have a look at https://github.com/gterzian/django_async. Right now it's just for images, but you could fork it and add functionality for your situation. I'm using it with https://github.com/duointeractive/django-athumb and S3.
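A minimal sketch of that idea might look like the following. The task, view, and model names are hypothetical, and it assumes a Celery serializer that can carry the raw image bytes (e.g. pickle):

# tasks.py: hypothetical task that rebuilds the file and saves it through the model field
from celery import shared_task
from django.core.files.base import ContentFile
from myapp.models import Photo  # hypothetical model with an ImageField

@shared_task
def save_photo(photo_id, filename, image_data):
    photo = Photo.objects.get(pk=photo_id)
    # Saving through the field pushes the content to the configured storage backend (e.g. S3)
    photo.image.save(filename, ContentFile(image_data), save=True)

# views.py: pull the bytes out of the UploadedFile and hand them to the task
from django.http import HttpResponse
from myapp.models import Photo
from myapp.tasks import save_photo

def upload_view(request):
    uploaded = request.FILES["image"]
    photo = Photo.objects.create()
    save_photo.delay(photo.pk, uploaded.name, uploaded.read())
    return HttpResponse("Thanks, your file is being processed")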