Status:
I have created new tables in a PostgreSQL database on Amazon RDS.
I have uploaded a CSV file into a bucket on Amazon S3.
Via a Lambda function I have connected to the Amazon S3 bucket and to Amazon RDS.
I can read the CSV file with the following code:
import csv, io, boto3

s3 = boto3.resource('s3')
# Explicit keys are unnecessary inside Lambda; the execution role supplies credentials.
# client = boto3.client('s3', aws_access_key_id=Access_Key, aws_secret_access_key=Secret_Access_Key)

# Download the object into an in-memory buffer.
buf = io.BytesIO()
s3.Object('bucketname', 'filename.csv').download_fileobj(buf)
buf.seek(0)

# Iterate over the CSV rows instead of looping forever on readlines().
for row in csv.reader(io.TextIOWrapper(buf, encoding='utf-8')):
    print(row)
Problem:
I can't import the necessary Python libraries, e.g. psycopg2, openpyxl, etc.
When I tried to import psycopg2
import psycopg2
I got the following error:
Unable to import module 'myfilemane': No module named 'psycopg2._psycopg'
First of all, I did not import the module "psycopg2._psycopg" but "psycopg2", so I don't know where the suffix '_psycopg' comes from.
Secondly, I followed all the steps in the documentation:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html (1. create a directory. 2. Save all of your Python source files (the .py files) at the root level of this directory. 3. Install any libraries using pip at the root level of the directory. 4. Zip the content of the project-dir directory)
And I have also read this documentation:
https://docs.aws.amazon.com/lambda/latest/dg/vpc-rds-deployment-pkg.html
The same applies to other modules and libraries, e.g. openpyxl etc.: I am always told "No Module Named 'OneNameThatIHaveNotImported'".
So does anyone have an idea, or know another way, how one can edit the CSV file on S3 via a Lambda function and import the edited version into the RDS database?
Thanks for the help in advance!
The answer thread this SO answer references will put you on the right path. Basically, you need to create the deployment package on an EC2 instance that matches the Linux image the AWS Lambda functions run on. Better yet, you can deploy Lambda functions through the AWS CLI from the same staging EC2 instance where you created your deployment package.
You can also use [precompiled lambda packages][2] if you want an out-of-the-box fix.
[2]: https://github.com/jkehler/awslambda-psycopg2 or, more generally, https://github.com/Miserlou/lambda-packages
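Once a Linux-compatible psycopg2 build is inside the deployment package, the handler for the original goal (read the CSV from S3 and load it into RDS) is straightforward. This is only a minimal sketch; the bucket, key, table, and connection values are placeholders to adjust:
import csv
import io
import boto3
import psycopg2  # bundled in the deployment package, e.g. from awslambda-psycopg2

s3 = boto3.client('s3')

def handler(event, context):
    # Read the CSV from S3 into memory.
    obj = s3.get_object(Bucket='bucketname', Key='filename.csv')
    rows = list(csv.reader(io.StringIO(obj['Body'].read().decode('utf-8'))))

    # Connect to the RDS PostgreSQL instance (placeholder credentials).
    conn = psycopg2.connect(host='your-instance.rds.amazonaws.com', dbname='yourdb',
                            user='youruser', password='yourpassword')
    with conn, conn.cursor() as cur:
        for row in rows:
            # Placeholder table and columns; adjust to your schema.
            cur.execute("INSERT INTO your_table (col_a, col_b) VALUES (%s, %s)", row[:2])
    conn.close()
    return {'inserted': len(rows)}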
Related
I am trying to create my first pipeline.
My task is to create a pipeline in Python whose purpose is to upload a CSV file from a local system to an AWS S3 bucket.
After that, I need to copy the data from this CSV file into a PostgreSQL table. I checked the AWS documentation (https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Procedural.Importing.html#USER_PostgreSQL.S3Import). I followed the instructions, but I still get an error when I try to run the code from the tutorial in my Python environment.
The questions: is it possible to achieve this with Python?
Could someone share a little bit of knowledge and some Python code showing how they do it?
I am new to AWS as well as Python.
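A minimal sketch of such a pipeline, assuming the aws_s3 extension is enabled on the RDS instance as the linked documentation describes; the file names, bucket, region, table, and connection details are placeholders:
import boto3
import psycopg2

# Step 1: upload the local CSV file to S3.
s3 = boto3.client('s3')
s3.upload_file('local_data.csv', 'your-bucket', 'data/local_data.csv')

# Step 2: ask RDS for PostgreSQL to import the object via the aws_s3 extension.
conn = psycopg2.connect(host='your-instance.rds.amazonaws.com', dbname='yourdb',
                        user='youruser', password='yourpassword')
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT aws_s3.table_import_from_s3(
            'your_table', '', '(format csv)',
            aws_commons.create_s3_uri('your-bucket', 'data/local_data.csv', 'eu-central-1')
        )
    """)
conn.close()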
With the AWS CLI, the below command works perfectly fine:
aws cloudformation package --template-file sam.yaml --output-template-file output-sam.yaml --s3-bucket <<bucket_Name>>
The goal is to create an automated Python script which will run the above command. I tried to google it, but none of the solutions works for me.
Below is the solution that I have tried, but I am not able to upload the artifacts to the S3 bucket.
test.py file:
import subprocess

command = ["aws", "cloudformation", "package",
           "--template-file", "sam.yaml",
           "--output-template-file", "output-sam.yaml",
           "--s3-bucket", "<<bucket_name>>"]
print(subprocess.check_output(command, stderr=subprocess.STDOUT))
It can easily be done using the os library. The simplest way of doing it is shown in the code below.
import os
os.system("aws cloudformation package --template-file sam.yaml --output-template-file output-sam.yaml --s3-bucket <<bucket_name>>")
However, subprocess can be used for slightly more complicated tasks.
You can also check out the boto3 library for such tasks. Boto is the AWS SDK for Python.
You can check how this AWS CLI command is implemented, as it's all in Python already. Basically, aws cloudformation package uploads the local artifacts referenced by the template to S3, so you can do the same with boto3, as mentioned in the comments.
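For instance, the upload part of what the command does can be reproduced with boto3. This is only a rough sketch with placeholder names, not a full reimplementation of the packaging step:
import boto3

s3 = boto3.client('s3')

# Upload a local artifact referenced by the template and print the S3 location
# you would then reference from the output template.
bucket = 'your-bucket'          # placeholder
key = 'artifacts/function.zip'  # placeholder
s3.upload_file('function.zip', bucket, key)
print('s3://{0}/{1}'.format(bucket, key))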
Afternoon,
I recently came across AWS Lambda and Azure Functions. AWS imposes a limit on the size of zipped as well as unzipped files, which for Python scripts must include all of the dependent modules. I have been using lambda-uploader to package my script and its module dependencies, but the pandas package is too big.
I have seen examples of people doing machine learning and using pandas on AWS Lambda (a little outdated, though), but I can't see how they're doing it. Any suggestions?
The package that you upload to Lambda should not contain anything but the code and the support modules required for Lambda to run your code. The Lambda console UI limits the file size to 10MB, but you can upload zip files up to 50MB if you place them in an S3 bucket and then request that Lambda load them from S3.
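A small sketch of that route with boto3; the bucket, key, and function name below are placeholders:
import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

# Stage the deployment zip in S3, then point the function's code at it.
s3.upload_file('deployment-package.zip', 'your-bucket', 'packages/deployment-package.zip')
lambda_client.update_function_code(FunctionName='your-function',
                                   S3Bucket='your-bucket',
                                   S3Key='packages/deployment-package.zip')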
Any other assets that you require for execution such as machine learning models should be uploaded separately to S3 and then downloaded from within your Lambda function at execution time. The Lambda function can write to a /tmp folder but keep in mind it only has access to 512MB of disk space. Also keep in mind that the Lambda function has a maximum runtime of 300 seconds so downloading really large files will take time away from your function doing real work with the data you're downloading.
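Downloading such an asset into /tmp inside the handler might look like the sketch below; the bucket and key are placeholders, and the existence check avoids re-downloading on warm invocations:
import os
import boto3

s3 = boto3.client('s3')
MODEL_PATH = '/tmp/model.pkl'  # /tmp is the only writable location (512MB)

def handler(event, context):
    # Download the model once per container; warm invocations reuse the cached file.
    if not os.path.exists(MODEL_PATH):
        s3.download_file('your-bucket', 'models/model.pkl', MODEL_PATH)
    # ... load the model and do the real work with it here ...
    return {'model_size_bytes': os.path.getsize(MODEL_PATH)}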
If you're using Python libraries, you can get rid of botocore and boto3, as they are already present in AWS's Lambda runtime.
Try using Zappa. Set slim_handler to true in the zappa_settings.json that you create with zappa init.
To get the smallest possible zip file, use the -9 option:
$ zip -9
If you're using the Serverless Framework's slim option and still hit the 250MB limit, you can use the option zip: true. That keeps all packages compressed through deployment, and you just have to unzip them in your handler module, as explained here:
try:
    import unzip_requirements  # unpacks the zipped dependencies at cold start
except ImportError:
    pass
I'd like to upload XML files directly to S3 without the use of modules like boto, boto3, or tinys3.
So far I have written:
import requests

url = "https://my-test-s3.s3.amazonaws.com"
with open(xml_file, 'rb') as data:
    requests.put(url, data=data)
and I've gone ahead and set the AllowedOrigin on my S3 bucket to accept my server's address.
This does not error when running, however, it also does not seem to be uploading anything.
Any help would be appreciated. I'd like to (a) get the thing to upload, and (b) figure out how to apply the AWSAccessKey and AWSSecretAccessKey to the request.
If you want to upload XML files directly to S3 without the use of modules like boto, boto3, or tinys3, I would recommend using the awscli:
pip install awscli
aws configure # enter your AWSAccessKey and AWSSecretAccessKey credentials
The AWSAccessKey and AWSSecretAccessKey will be stored permanently inside the ~/.aws folder after running aws configure.
And then you can upload files using python:
os.system("aws s3 cp {0} s3://your_bucket_name/{1}".format(file_path, file_name))
Docs are here.
You need to install the awscli following this documentation.
Then, in a command-line shell, execute aws configure and follow the instructions.
To upload a file, it's much easier using boto3:
import boto3
s3 = boto3.resource('s3')
s3.meta.client.upload_file(xml_file, 'yourbucket', 'yours3filepath')
Alternatively, you can use the aws s3 cp command combined with Python's subprocess module:
import subprocess
subprocess.call(["aws", "s3", "cp", xml_file, "yours3destination"])
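One more possible route, offered only as a sketch: let boto3 generate a presigned URL once, and keep the actual upload in plain requests. The bucket and key names are placeholders:
import boto3
import requests

s3 = boto3.client('s3')

# Generate a temporary, signed PUT URL; credentials come from ~/.aws or the environment.
url = s3.generate_presigned_url('put_object',
                                Params={'Bucket': 'my-test-s3', 'Key': 'data/file.xml'},
                                ExpiresIn=3600)

# The upload itself then needs nothing but requests.
with open('file.xml', 'rb') as data:
    response = requests.put(url, data=data)
print(response.status_code)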
I'm using Apex to deploy Lambda functions in AWS. I need to write a Lambda function which runs a cleanup script on an Oracle RDS instance in my AWS VPC. Oracle has a very nice Python library called cx_Oracle, but I'm having some problems using it in a Lambda function (running on Python 2.7). My first step was to try to run the Oracle-described test code as follows:
from __future__ import print_function
import json
import boto3
import boto3.ec2
import os

import cx_Oracle

def handle(event, context):
    # Easy Connect string: username/password@host:port/service_name
    con = cx_Oracle.connect('username/password@my.oracle.rds:1521/orcl')
    print(str(con.version))
    con.close()
When I try to run this piece of test code, I get the following response:
Unable to import module 'main': /var/task/cx_Oracle.so: invalid ELF header
Googling has told me that this error is caused because the cx_Oracle library is not a complete Oracle implementation for Python; rather, it requires the SQL*Plus client to be pre-installed, and the cx_Oracle library references components installed as part of SQL*Plus.
Obviously, pre-installing SQL*Plus might be difficult.
Apex has the
hooks {}
functionality, which would allow me to pre-build things, but I'm having trouble finding documentation showing what happens to those artefacts and how that works. In theory I could download the libraries into a Nexus repository or an S3 bucket, and then in my hooks {} declaration I could add them to the zip file. I could then try to install them as part of the Python script. However, I have a few problems with this:
1. How are the 'built' artefacts accessed inside the Lambda function? Can they be? Have I misunderstood this?
2. Does a Python 2.7 Lambda function have enough access rights to the operating system of the host container to be able to install a library?
3. If the answer to question 2 is no, is there another way to write a Lambda function to run some SQL against an Oracle RDS instance?