Python-Bottle Internal Server Error 500 when uploading to S3 bucket and creating a PostgreSQL entry in RDS - python

I am exploring AWS at the moment and have a problem with a little project of my own.
I have a Python Bottle app on an EC2 instance, an S3 bucket for uploads, and an RDS instance (PostgreSQL).
I have a script which lets me upload images (JPG and PNG).
Now I want to have an entry in my database for every upload I do.
The connection is established and all rules allow traffic from the EC2 instance to the S3 bucket, to the RDS instance, and back.
After uploading, I normally get a message that it worked, but now, after including code in the Python script to write to and read from the database, it just shows Internal Server Error 500.
Before, it also put the upload in my S3 bucket, but now it puts it into a directory on my EC2 instance.
It also creates a directory "user_uploads" which should actually point to the S3 bucket, as it did before I included the code to read/write to the DB.
Can someone help me?
```sql
CREATE SCHEMA bottletube;
SET SCHEMA 'bottletube';

CREATE TABLE IF NOT EXISTS image_uploads
(
    id       int GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    url      VARCHAR(20) NOT NULL,
    category VARCHAR(64)
);
```
```python
#!/usr/bin/python3
import time
import os
import uuid
import psycopg2
import requests
from bottle import route, run, template, request
from boto3 import resource

BUCKET_NAME = 'python-bottle-hw'  # Replace with your bucket name
SAVE_PATH = 'user_uploads'


@route('/home')
@route('/')
def home():
    # SQL Query goes here later, now dummy data only
    # Read entries from database
    items = []
    cursor.execute('SELECT * FROM image_uploads ORDER BY id')
    for record in cursor.fetchall():
        items.append({'id': record[0], 'filename': record[1], 'category': record[2]})
    return template('home.tpl', name='BoTube Home', items=items)


@route('/upload', method='GET')
def do_upload_get():
    return template('upload.tpl', name='Upload Image')


@route('/upload', method='POST')
def do_upload_post():
    category = request.forms.get('category')
    upload = request.files.get('file_upload')
    # Check for errors
    error_messages = []
    if not upload:
        error_messages.append('Please upload a file.')
    if not category:
        error_messages.append('Please enter a category.')
    try:
        name, ext = os.path.splitext(upload.filename)
        if ext not in ('.png', '.jpg', '.jpeg'):
            error_messages.append('File Type not allowed.')
    except:
        error_messages.append('Unknown error.')
    if error_messages:
        return template('upload.tpl', name='Upload Image', error_messages=error_messages)
    # Save to SAVE_PATH directory
    if not os.path.exists(SAVE_PATH):
        os.makedirs(SAVE_PATH)
    save_filename = f'{name}_{time.strftime("%Y%m%d-%H%M%S")}{ext}'
    with open(f'{SAVE_PATH}{save_filename}', 'wb') as open_file:
        open_file.write(upload.file.read())
    if ext == '.png':
        content_type = 'image/png'
    else:
        content_type = 'image/jpeg'
    # Upload to S3
    data = open(SAVE_PATH + save_filename, 'rb')
    s3_resource.Bucket(BUCKET_NAME).put_object(Key=f'user_uploads/{save_filename}',
                                               Body=data,
                                               Metadata={'Content-Type': content_type},
                                               ACL='public-read')
    # Write to DB
    cursor.execute(f"INSERT INTO image_uploads (url, category) VALUES ('user_uploads/{save_filename}', '{category}');")
    connection.commit()
    # Return template
    return template('upload_success.tpl', name='Upload Image')


if __name__ == '__main__':
    # Connect to DB
    connection = psycopg2.connect(user="postgres", host="endpoint of my database", password="here would be my password", database="bottletube")
    cursor = connection.cursor()
    cursor.execute("SET SCHEMA 'bottletube';")
    connection.commit()
    # Connect to S3
    s3_resource = resource('s3', region_name='us-east-1')
    # Needs to be customized
    # run(host='your_public_dns_name',
    run(host=requests.get('http://169.254.169.254/latest/meta-data/public-hostname').text,
        port=80)
```
So this is my code. I know you should never include passwords in code, and I know how AWS Secrets Manager works, but I wanted to check that everything functions before switching to that.
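For later, here is a minimal sketch of pulling the database credentials from AWS Secrets Manager with boto3 instead of hard-coding them. The secret name and the keys inside the secret JSON are assumptions for illustration only:

```python
import json
import boto3
import psycopg2


def get_db_secret():
    # Hypothetical secret name and region; replace with your own.
    client = boto3.client('secretsmanager', region_name='us-east-1')
    response = client.get_secret_value(SecretId='bottletube/db-credentials')
    # Assumes the secret is stored as JSON, e.g. {"username": ..., "password": ..., "host": ...}
    return json.loads(response['SecretString'])


# secret = get_db_secret()
# connection = psycopg2.connect(user=secret['username'], host=secret['host'],
#                               password=secret['password'], database='bottletube')
```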

Related

How do I trigger a SQL file using the psql command from the ephemeral storage of a Lambda function in Python

Goal: access SQL files from S3 in a Lambda function and execute them.
S3 -> Lambda -> RDS instance
a. Integration between function, database and S3 -> DONE
a1. Download the .sql file from the S3 bucket and write the file to the /tmp storage of the Lambda function. -> DONE
b1. Import a Python library or create a Lambda layer to include the relevant libraries/dependencies in the Lambda function to perform the psql command to execute the SQL file,
or
b2. download the file, convert it to a string, and pass the string as a parameter to an 'ExecuteSql' API call which allows you to run one or more SQL statements.
c. Once we are able to successfully execute the SQL files, check how to export the generated .csv, .txt, .html, .TAB files to the S3 output path.
So far I have integrated the function, S3 and RDS, and I am able to view the output of a table (to test the connection) and download the SQL file from the S3 path to the ephemeral storage /tmp of the Lambda function.
Now I am looking for how to execute the downloaded SQL file from /tmp of the Lambda function using the psql command, or how to convert the file to a string and pass the string as a parameter to an 'ExecuteSql' API which allows running one or more SQL statements (see the sketch after the code below). Please share any ways to achieve this.
Please refer to the Python code below, which I am using with the Lambda function.
```python
from dataclasses import dataclass
import psycopg2
from psycopg2.extras import RealDictCursor
import json
from datetime import datetime
import csv
import boto3
from botocore.exceptions import ClientError
import os


def get_secret():
    secret_name = "baardsnonprod-qa2db"
    region_name = "us-east-1"
    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )
    secret = client.get_secret_value(
        SecretId=secret_name
    )
    secret_dict = json.loads(secret['SecretString'])
    return secret_dict


def download_sql_from_s3(sql_filename):
    s3 = boto3.resource('s3')
    bucket = 'baa-non-prod-baa-assets'
    key = "RDS-batch/Reports/" + sql_filename
    local_path = '/tmp/' + sql_filename
    response = s3.Bucket(bucket).download_file(key, local_path)
    return "file successfully downloaded"


def lambda_handler(event, context):
    secret_dict = get_secret()
    print(secret_dict)
    hostname = secret_dict['host']
    portnumber = secret_dict['port']
    databasename = secret_dict['database']
    username = secret_dict['username']
    passwd = secret_dict['password']
    print(hostname, portnumber, databasename, username, passwd)
    conn = psycopg2.connect(host=hostname, port=portnumber, database=databasename, user=username, password=passwd)
    cur = conn.cursor(cursor_factory=RealDictCursor)
    cur.execute("SELECT * FROM PROFILE")
    results = cur.fetchall()
    json_result = json.dumps(results, default=str)
    print(json_result)
    status = download_sql_from_s3("ConsumerPopularFIErrors.sql")
    # file_status = os.path.isfile('/tmp/ConsumerPopularFIErrors.sql')
    print(status)
    # print(file_status)
    # with open('/tmp/ConsumerPopularFIErrors.sql') as file:
    #     content = file.readlines()
    #     for line in content:
    #         print(line)

# lambda_handler()
```
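One possible way to run the downloaded file without bundling the psql binary is to read it from /tmp and execute its contents over the psycopg2 connection that is already open in the handler. This is a minimal sketch, assuming the file contains plain SQL statements separated by semicolons (no psql meta-commands); the helper name is made up for illustration:

```python
def execute_sql_file(conn, local_path):
    # Read the whole .sql file that was downloaded to Lambda's /tmp storage.
    with open(local_path, 'r') as sql_file:
        sql_text = sql_file.read()
    with conn.cursor() as cur:
        # psycopg2 sends the string as-is, so multiple semicolon-separated
        # statements are executed in a single call.
        cur.execute(sql_text)
    conn.commit()


# Inside lambda_handler, after download_sql_from_s3(...):
# execute_sql_file(conn, '/tmp/ConsumerPopularFIErrors.sql')
```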

Google Client API v3 - update a file on drive using Python

I'm trying to update the content of a file from a Python script using the Google client API. The problem is that I keep receiving error 403:
An error occurred: <HttpError 403 when requesting https://www.googleapis.com/upload/drive/v3/files/...?alt=json&uploadType=resumable returned "The resource body includes fields which are not directly writable.">
I have tried to remove metadata fields, but it didn't help.
The function to update the file is the following:
```python
# File: utilities.py
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools


def update_file(service, file_id, new_name, new_description, new_mime_type,
                new_filename):
    """Update an existing file's metadata and content.

    Args:
        service: Drive API service instance.
        file_id: ID of the file to update.
        new_name: New name for the file.
        new_description: New description for the file.
        new_mime_type: New MIME type for the file.
        new_filename: Filename of the new content to upload.
        new_revision: Whether or not to create a new revision for this file.
    Returns:
        Updated file metadata if successful, None otherwise.
    """
    try:
        # First retrieve the file from the API.
        file = service.files().get(fileId=file_id).execute()
        # File's new metadata.
        file['name'] = new_name
        file['description'] = new_description
        file['mimeType'] = new_mime_type
        file['trashed'] = True
        # File's new content.
        media_body = MediaFileUpload(
            new_filename, mimetype=new_mime_type, resumable=True)
        # Send the request to the API.
        updated_file = service.files().update(
            fileId=file_id,
            body=file,
            media_body=media_body).execute()
        return updated_file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None
```
And here is the whole script to reproduce the problem.
The goal is to substitute a file, retrieving its id by name.
If the file does not exist yet, the script will create it by calling insert_file (this function works as expected).
The problem is update_file, posted above.
```python
from __future__ import print_function
from utilities import *
from googleapiclient import errors
from googleapiclient.http import MediaFileUpload
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools


def get_authenticated(SCOPES, credential_file='credentials.json',
                      token_file='token.json', service_name='drive',
                      api_version='v3'):
    # The file token.json stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    store = file.Storage(token_file)
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets(credential_file, SCOPES)
        creds = tools.run_flow(flow, store)
    service = build(service_name, api_version, http=creds.authorize(Http()))
    return service


def retrieve_all_files(service):
    """Retrieve a list of File resources.

    Args:
        service: Drive API service instance.
    Returns:
        List of File resources.
    """
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = service.files().list(**param).execute()
            result.extend(files['files'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as error:
            print('An error occurred: %s' % error)
            break
    return result


def insert_file(service, name, description, parent_id, mime_type, filename):
    """Insert new file.

    Args:
        service: Drive API service instance.
        name: Name of the file to insert, including the extension.
        description: Description of the file to insert.
        parent_id: Parent folder's ID.
        mime_type: MIME type of the file to insert.
        filename: Filename of the file to insert.
    Returns:
        Inserted file metadata if successful, None otherwise.
    """
    media_body = MediaFileUpload(filename, mimetype=mime_type, resumable=True)
    body = {
        'name': name,
        'description': description,
        'mimeType': mime_type
    }
    # Set the parent folder.
    if parent_id:
        body['parents'] = [{'id': parent_id}]
    try:
        file = service.files().create(
            body=body,
            media_body=media_body).execute()
        # Uncomment the following line to print the File ID
        # print 'File ID: %s' % file['id']
        return file
    except errors.HttpError as error:
        print('An error occurred: %s' % error)
        return None


# If modifying these scopes, delete the file token.json.
SCOPES = 'https://www.googleapis.com/auth/drive'


def main():
    service = get_authenticated(SCOPES)
    # Call the Drive v3 API
    results = retrieve_all_files(service)
    target_file_descr = 'Description of deploy.py'
    target_file = 'deploy.py'
    target_file_name = target_file
    target_file_id = [file['id'] for file in results if file['name'] == target_file_name]
    if len(target_file_id) == 0:
        print('No file called %s found in root. Create it:' % target_file_name)
        file_uploaded = insert_file(service, target_file_name, target_file_descr, None,
                                    'text/x-script.phyton', target_file_name)
    else:
        print('File called %s found. Update it:' % target_file_name)
        file_uploaded = update_file(service, target_file_id[0], target_file_name, target_file_descr,
                                    'text/x-script.phyton', target_file_name)
    print(str(file_uploaded))


if __name__ == '__main__':
    main()
```
In order to try the example, it is necessary to enable the Google Drive API from https://console.developers.google.com/apis/dashboard,
then save the file credentials.json and pass its path to get_authenticated(). The file token.json will be created after the first
authentication and API authorization.
The problem is that the metadata 'id' cannot be changed when updating a file, so it should not be in the body. Just delete it from the dict:
```python
# File's new metadata.
del file['id']  # 'id' has to be deleted
file['name'] = new_name
file['description'] = new_description
file['mimeType'] = new_mime_type
file['trashed'] = True
```
I tried your code with this modification and it works.
I also struggled a little bit with this function and found that if you don't need to update the metadata, you can simply leave it out of the update call, like: updated_file = service.files().update(fileId=file_id, media_body=media_body).execute()
At least that worked for me.
The problem is "The resource body includes fields which are not directly writable." So try removing all of the metadata properties and then adding them back one by one. The one I would be suspicious about is trashed. Even though the API docs say this is writable, it shouldn't be: trashing a file has side effects beyond setting a boolean. Updating a file and setting it to trashed at the same time is somewhat unusual. Are you sure that's what you intend?
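Putting the two suggestions together, here is a minimal sketch of an update that only sends clearly writable metadata (name, description, mimeType) alongside the new content. The choice of fields follows the answers above and is an assumption, not a verified minimal set:

```python
from googleapiclient.http import MediaFileUpload


def update_file_content(service, file_id, new_name, new_description,
                        new_mime_type, new_filename):
    # Only send writable metadata fields; omit 'id' and 'trashed'.
    body = {
        'name': new_name,
        'description': new_description,
        'mimeType': new_mime_type,
    }
    media_body = MediaFileUpload(new_filename, mimetype=new_mime_type,
                                 resumable=True)
    return service.files().update(
        fileId=file_id,
        body=body,
        media_body=media_body).execute()
```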

boto3 check if Athena database exists

I'm making a script that creates a database in AWS Athena and then creates tables for that database. Today the DB creation was taking ages, so the tables being created referred to a DB that didn't exist yet. Is there a way to check whether a DB has already been created in Athena using boto3?
This is the part that creates the DB:
```python
client = boto3.client('athena')

client.start_query_execution(
    QueryString='create database {}'.format('db_name'),
    ResultConfiguration=config
)
```
```python
# -*- coding: utf-8 -*-
import logging
import os
from time import sleep

import boto3
import pandas as pd
from backports.tempfile import TemporaryDirectory

logger = logging.getLogger(__name__)


class AthenaQueryFailed(Exception):
    pass


class Athena(object):
    S3_TEMP_BUCKET = "please-replace-with-your-bucket"

    def __init__(self, bucket=S3_TEMP_BUCKET):
        self.bucket = bucket
        self.client = boto3.Session().client("athena")

    def execute_query_in_athena(self, query, output_s3_directory, database="csv_dumps"):
        """ Useful when client executes a query in Athena and want result in the given `s3_directory`

        :param query: Query to be executed in Athena
        :param output_s3_directory: s3 path in which client want results to be stored
        :return: s3 path
        """
        response = self.client.start_query_execution(
            QueryString=query,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_s3_directory},
        )
        query_execution_id = response["QueryExecutionId"]
        filename = "{filename}.csv".format(filename=response["QueryExecutionId"])
        s3_result_path = os.path.join(output_s3_directory, filename)
        logger.info(
            "Query query_execution_id <<{query_execution_id}>>, result_s3path <<{s3path}>>".format(
                query_execution_id=query_execution_id, s3path=s3_result_path
            )
        )
        self.wait_for_query_to_complete(query_execution_id)
        return s3_result_path

    def wait_for_query_to_complete(self, query_execution_id):
        is_query_running = True
        backoff_time = 10
        while is_query_running:
            response = self.__get_query_status_response(query_execution_id)
            status = response["QueryExecution"]["Status"][
                "State"
            ]  # possible responses: QUEUED | RUNNING | SUCCEEDED | FAILED | CANCELLED
            if status == "SUCCEEDED":
                is_query_running = False
            elif status in ["CANCELED", "FAILED"]:
                raise AthenaQueryFailed(status)
            elif status in ["QUEUED", "RUNNING"]:
                logger.info("Backing off for {} seconds.".format(backoff_time))
                sleep(backoff_time)
            else:
                raise AthenaQueryFailed(status)

    def __get_query_status_response(self, query_execution_id):
        response = self.client.get_query_execution(QueryExecutionId=query_execution_id)
        return response
```
As pointed out in the other answer, an Athena waiter is still not implemented in boto3.
I use this lightweight Athena client (above) to run the query; it returns the S3 path of the result when the query is completed.
The waiter functions for Athena are not implemented yet: Athena Waiter
See: Support AWS Athena waiter feature for a possible workaround until it is implemented in Boto3. This is how it is implemented in AWS CLI.
```python
while True:
    stats = self.athena.get_query_execution(execution_id)
    status = stats['QueryExecution']['Status']['State']
    if status in ['SUCCEEDED', 'FAILED', 'CANCELLED']:
        break
    time.sleep(0.2)
```
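To answer the original question more directly, one option is to reuse the same polling pattern to run a `SHOW DATABASES LIKE ...` query and inspect the returned rows. A minimal sketch, assuming a results bucket you own (the bucket name and helper function are illustrative, not part of the original code):

```python
import time
import boto3


def athena_database_exists(db_name, output_location):
    client = boto3.client('athena')
    execution = client.start_query_execution(
        QueryString="SHOW DATABASES LIKE '{}'".format(db_name),
        ResultConfiguration={'OutputLocation': output_location},
    )
    query_id = execution['QueryExecutionId']
    # Poll until the query finishes, as in the snippets above.
    while True:
        state = client.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State']
        if state in ('SUCCEEDED', 'FAILED', 'CANCELLED'):
            break
        time.sleep(0.2)
    if state != 'SUCCEEDED':
        return False
    rows = client.get_query_results(QueryExecutionId=query_id)['ResultSet']['Rows']
    # A row is returned only if a matching database exists.
    return len(rows) > 0


# exists = athena_database_exists('db_name', 's3://your-athena-results-bucket/')
```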

Downloading a file from an S3 bucket to the USER'S computer

Goal
Download a file from an S3 bucket to the user's computer.
Context
I am working on a Python/Flask API for a React app. When the user clicks the Download button on the front end, I want to download the appropriate file to their machine.
What I've tried
```python
import boto3

s3 = boto3.resource('s3')
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
```
I am currently using some code that finds the path of the user's Downloads folder and then plugs that path into download_file() as the second parameter, along with the file in the bucket that they are trying to download.
This worked locally and tests ran fine, but I ran into a problem once it was deployed: the code finds the Downloads path of the SERVER and downloads the file there.
Question
What is the best way to approach this? I have researched and cannot find a good solution for downloading a file from an S3 bucket to the user's Downloads folder. Any help/advice is greatly appreciated.
You should not need to save the file to the server. You can just download the file into memory, and then build a Response object containing the file.
```python
from flask import Flask, Response
from boto3 import client

app = Flask(__name__)


def get_client():
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='id',
        aws_secret_access_key='key'
    )


@app.route('/blah', methods=['GET'])
def index():
    s3 = get_client()
    file = s3.get_object(Bucket='blah-test1', Key='blah.txt')
    return Response(
        file['Body'].read(),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )


app.run(debug=True, port=8800)
```
This is OK for small files; there won't be any meaningful wait time for the user. However, with larger files this will affect UX: the file needs to be completely downloaded to the server, then downloaded to the user. To fix this issue, use the Range keyword argument of the get_object method:
```python
from flask import Flask, Response
from boto3 import client

app = Flask(__name__)


def get_client():
    return client(
        's3',
        'us-east-1',
        aws_access_key_id='id',
        aws_secret_access_key='key'
    )


def get_total_bytes(s3):
    result = s3.list_objects(Bucket='blah-test1')
    for item in result['Contents']:
        if item['Key'] == 'blah.txt':
            return item['Size']


def get_object(s3, total_bytes):
    if total_bytes > 1000000:
        return get_object_range(s3, total_bytes)
    return s3.get_object(Bucket='blah-test1', Key='blah.txt')['Body'].read()


def get_object_range(s3, total_bytes):
    offset = 0
    while total_bytes > 0:
        end = offset + 999999 if total_bytes > 1000000 else ""
        total_bytes -= 1000000
        byte_range = 'bytes={offset}-{end}'.format(offset=offset, end=end)
        offset = end + 1 if not isinstance(end, str) else None
        yield s3.get_object(Bucket='blah-test1', Key='blah.txt', Range=byte_range)['Body'].read()


@app.route('/blah', methods=['GET'])
def index():
    s3 = get_client()
    total_bytes = get_total_bytes(s3)
    return Response(
        get_object(s3, total_bytes),
        mimetype='text/plain',
        headers={"Content-Disposition": "attachment;filename=test.txt"}
    )


app.run(debug=True, port=8800)
```
This will download the file in 1MB chunks and send them to the user as they are downloaded. Both of these have been tested with a 40MB .txt file.
A better way to solve this problem is to create a presigned URL. This gives you a temporary URL that is valid for a certain amount of time. It also removes your Flask server as a proxy between the AWS S3 bucket and the user, which reduces download time.
```python
def get_attachment_url():
    bucket = 'BUCKET_NAME'
    key = 'FILE_KEY'
    client: boto3.s3 = boto3.client(
        's3',
        aws_access_key_id=YOUR_AWS_ACCESS_KEY,
        aws_secret_access_key=YOUR_AWS_SECRET_KEY
    )
    return client.generate_presigned_url('get_object',
                                         Params={'Bucket': bucket, 'Key': key},
                                         ExpiresIn=60)
```
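For completeness, a minimal sketch of how a Flask endpoint could hand that presigned URL to the front end, either as JSON or as a redirect (the route path is an assumption for illustration):

```python
from flask import Flask, jsonify, redirect

app = Flask(__name__)


@app.route('/download', methods=['GET'])
def download():
    url = get_attachment_url()
    # Either return the URL for the front end to open...
    return jsonify({'url': url})
    # ...or redirect the browser straight to S3:
    # return redirect(url, code=302)
```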

MySQL UPDATE doesn't work for GAE when called from a browser

I am writing an email-address verification Python file for Google App Engine. (Yes, I know Django has built-in features for this, but I wanted to write my own because that is how I learn.)
Below is the Python code. The code returns "Email Account Verified", which suggests to me that the queries worked. However, when I look at the "active" column in the database, it is still 0.
If I run the query string that logging.info("%s", db_query) prints directly in the database itself, it works and the column is updated to 1.
All my other Python code (with UPDATEs) works fine; the only difference is that those Python files are called from my iOS app and this one is called from a browser.
```python
# Make the libs folder with 3rd party libraries and common methods
import sys
sys.path.insert(0, 'libs')

# Imports
import logging
import webapp2
from django.utils.html import strip_tags
import common
import MySQLdb
import json

VERIFIED_HTML = """\
<html>
<body>
<h1>Email Account Verified</h1>
</body>
</html>
"""

ERROR_HTML = """\
<html>
<body>
<h1>ERROR</h1>
</body>
</html>
"""


class VerifyEmail(webapp2.RequestHandler):
    def get(self):
        user_email = strip_tags(self.request.get('user_email').lower().strip())
        user_activation_hash = strip_tags(self.request.get('user_activation_hash').strip())
        logging.info("User Email = %s", user_email)
        logging.info("User Activation Hash = %s", user_activation_hash)
        # Insert the information into the users table
        # Get the database connection to Google Cloud SQL
        db = common.connect_to_google_cloud_sql()
        db_cursor = db.cursor(MySQLdb.cursors.DictCursor)
        # Check to see if user already exists
        # Query for user
        db_query = """SELECT \
            email, activation_hash \
            FROM users WHERE email='%s' AND activation_hash='%s'""" % (user_email, user_activation_hash)
        db_cursor.execute(db_query)
        # If there is one record containing the username check password
        if(db_cursor.rowcount == 1):
            db_query = """UPDATE users SET active=%s WHERE email='%s';""" % (1, user_email)
            logging.info("%s" % db_query)
            if(db_cursor.execute(db_query)):
                self.response.write(VERIFIED_HTML)
            else:
                self.response.write(ERROR_HTML)
        else:  # either no user, or activation_hash doesn't match
            self.response.write(ERROR_HTML)
```
Connect to Google Cloud SQL:
```python
def connect_to_google_cloud_sql():
    # hostname = DEV_DB_HOSTNAME
    # hostname = PROD_DB_HOSTNAME
    db_username = 'dummy_user'  # not real
    db_password = 'dummypassword'  # not real
    # If PROD or Deployed Testing, use unix_socket
    if(os.getenv('SERVER_SOFTWARE') and os.getenv('SERVER_SOFTWARE').startswith('Google App Engine/')):
        db = MySQLdb.connect(unix_socket='/cloudsql/' + _DATABASE_HOSTNAME, db='dummydbname', user=db_username, passwd=db_password)
    else:  # Local Testing uses host
        db = MySQLdb.connect(host=_DATABASE_HOSTNAME, port=3306, db='dummydbname', user=db_username, passwd=db_password)
    logging.info("Got DB Connection")
    return db
```
Any suggestions? Is it a GAE Cloud SQL privileges issue?
Maybe it is because I was using my browser with the local App Engine running on my local IP?
You need to call .commit() on the MySQLdb connection after executing queries. This is why your update is failing: it updates inside the transaction, but when your code ends without committing the transaction, the changes to the DB are rolled back, despite having told the user the update succeeded.
You can also enable autocommit on the connection so each statement is committed as it runs: db.autocommit(True).
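A minimal sketch of where the commit belongs in the handler above, assuming the same db and db_cursor names:

```python
if db_cursor.execute(db_query):
    db.commit()  # persist the UPDATE before responding
    self.response.write(VERIFIED_HTML)
else:
    db.rollback()
    self.response.write(ERROR_HTML)
```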
