I have to store a small PDF file in a Postgres database (I already have a table ready with a bytea column for the data), then be able to delete the file, and later use the data in the database to restore the PDF as it was.
For context, I'm working with FastAPI in Python 3, so I can get the file as bytes or as a whole file. The main steps are:
Getting the file as bytes or a file via FastAPI
Inserting it into the Postgres DB
Retrieve the data in the DB
Make a new PDF file with the data.
How can I do that in a clean way?
The upload function using FastAPI:
from fastapi import File, UploadFile

def import_data(file: UploadFile = File(...)):
    # Put the whole upload into a variable as bytes
    pdfFile = file.file.read()
    database.insertPdfInDb(pdfFile)
    # Saving the file we just got to check if it's intact (it is)
    file_name = file.filename.replace(" ", "-")
    with open(file_name, 'wb+') as f:
        f.write(pdfFile)
    return {"filename": file.filename}
The function inserting the data into the Postgres DB:
def insertPdfInDb(pdfFile):
    conn = connectToDb()
    curs = conn.cursor()
    curs.execute("INSERT INTO PDFSTORAGE(pdf, description) values (%s, 'Some description...')", (psycopg2.Binary(pdfFile),))
    conn.commit()
    print("PDF insertion in the database attempted.")
    disconnectFromDb(conn)
    return 0
The exporting part is only just started and is entirely trial-and-error code so far.
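For the export side, here is a minimal sketch of what retrieval could look like, reusing the same connectToDb/disconnectFromDb helpers and assuming an id primary key column on PDFSTORAGE (not shown in the snippet above). psycopg2 returns a bytea column as a memoryview, so converting it to bytes and writing it in binary mode restores the original PDF byte for byte:

def exportPdfFromDb(pdf_id, output_path):
    # Fetch the bytea column and write it back out as a PDF file.
    conn = connectToDb()
    curs = conn.cursor()
    # Assumes an "id" primary key column that is not shown in the snippet above.
    curs.execute("SELECT pdf FROM PDFSTORAGE WHERE id = %s", (pdf_id,))
    row = curs.fetchone()
    disconnectFromDb(conn)
    if row is None:
        return None
    # psycopg2 hands bytea back as a memoryview; bytes() gives the raw content.
    with open(output_path, 'wb') as f:
        f.write(bytes(row[0]))
    return output_path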
I have a CSV file containing schema and table names (format shared below). My task is to unload data from Redshift to an S3 bucket as CSV files. For this task I have the Python script below and two IAM roles: the first to unload data from Redshift, the second to write the data to the S3 bucket. The issue I am facing is that with the script below I am able to create a folder in my S3 bucket, but instead of a CSV file the file type shown in S3 is " -". I am not sure what the possible reason is.
Any help is much appreciated. Thanks in advance for your time and effort!
Note: I have millions of rows to unload from Redshift to S3 bucket.
CSV File containing schema and table name
Schema;tables
mmy_schema;my_table
Python Script
import csv
import sys

import redshift_connector

CSV_FILE = "Tables.csv"
CSV_DELIMITER = ';'
S3_DEST_PATH = "s3://..../"

DB_HOST = "MY HOST"
DB_PORT = 1234
DB_DB = "MYDB"
DB_USER = "MY_READ"
DB_PASSWORD = "MY_PSWD"

# Chained roles are passed to UNLOAD as a single comma-separated string
IAM_ROLE = "arn:aws:iam::/redshift-role/unload data,arn:aws::iam::/write in bucket"


def get_tables(path):
    tables = []
    with open(path, 'r') as file:
        csv_reader = csv.reader(file, delimiter=CSV_DELIMITER)
        header = next(csv_reader)
        if header is not None:
            for row in csv_reader:
                tables.append(row)
    return tables


def unload(conn, tables, s3_path):
    cur = conn.cursor()
    for table in tables:
        print(f">{table[0]}.{table[1]}")
        try:
            query = f'''unload ('select * from {table[0]}.{table[1]}')
                to '{s3_path}/{table[1]}/'
                iam_role '{IAM_ROLE}'
                CSV
                PARALLEL FALSE
                CLEANPATH;'''
            print("loading in progress")
            cur.execute(query)
            print("Done.")
        except Exception as e:
            print("Failed to load")
            print(str(e))
            sys.exit(1)
    cur.close()


def main():
    try:
        conn = redshift_connector.connect(
            host=DB_HOST,
            port=DB_PORT,
            database=DB_DB,
            user=DB_USER,
            password=DB_PASSWORD
        )
        tables = get_tables(CSV_FILE)
        unload(conn, tables, S3_DEST_PATH)
        conn.close()
    except Exception as e:
        print(e)
        sys.exit(1)


if __name__ == "__main__":
    main()
Redshift doesn't add file type suffixes on UNLOAD; the object names just end with a part number. And yes, "PARALLEL OFF" unloads can still produce multiple files. If these file names are required to end in ".csv", your script will need to issue the S3 rename calls itself (a copy followed by a delete, since S3 has no native rename). The process should also check how many part files were produced and handle each of them as needed.
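As a rough illustration, a minimal sketch of that rename step with boto3 could look like this; the bucket name, prefix, and function name are placeholders, and you would point the prefix at whatever UNLOAD actually wrote:

import boto3

def rename_unload_parts_to_csv(bucket, prefix):
    # S3 has no rename: copy each unloaded part object to a ".csv" key, then delete the original.
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".csv"):
                continue  # already renamed
            s3.copy_object(Bucket=bucket, Key=key + ".csv",
                           CopySource={"Bucket": bucket, "Key": key})
            s3.delete_object(Bucket=bucket, Key=key)

# Example: rename everything UNLOAD wrote under one table's prefix
# rename_unload_parts_to_csv("my-bucket", "my_table/")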
I am trying to create functions so that I can upload and download (or view) a PDF file in my database with PyQt5. Basically, I want to achieve the following steps:
1. Click a button to upload, which opens a window to select a PDF file
2. From the TableWidget or some other available view, make a function to download or view the PDF file that was saved in my database through step 1
I think I figured out one way to upload the file:
import mysql.connector
from mysql.connector import Error


def write_file(data, filename):
    # Write the binary data back to disk as-is (binary mode takes no encoding arguments)
    with open(filename, 'wb') as file:
        file.write(data)


def readBLOB(emp_id, photo, bioData):
    print("Reading BLOB data from python_Employee table")
    try:
        connection = mysql.connector.connect(host="00.00.00.000",
                                             user="user",
                                             password="pswd",
                                             database="database")
        cursor = connection.cursor()
        sql_fetch_blob_query = """SELECT * from python_Employee where id = %s"""
        cursor.execute(sql_fetch_blob_query, (emp_id,))
        record = cursor.fetchall()
        for row in record:
            print("Id = ", row[0])
            print("Name = ", row[1])
            image = row[2]
            file = row[3]
            print("Storing employee image and bio-data on disk \n")
            write_file(image, photo)
            write_file(file, bioData)
    except mysql.connector.Error as error:
        print("Failed to read BLOB data from MySQL table {}".format(error))
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()
            print("MySQL connection is closed")


path1 = r'C:\Users\Bruce Ko\Desktop\Minsub Lee\Development\Git\Inventory Software\testdata.pdf'
path2 = r'C:\Users\Bruce Ko\Desktop\Minsub Lee\Development\Git\Inventory Software\eric_bioData.txt'
readBLOB(1, path1, path2)
With the code above, I could see from MySQL Workbench that my PDF file was uploaded (even if I cannot read it from there; only binary information is shown, whereas an uploaded image file was viewable as an image).
So if the function above is correct for uploading the PDF file, how can I download or read it? The code above is from https://pynative.com/python-mysql-blob-insert-retrieve-file-image-as-a-blob-in-mysql/ and the "reading BLOB" part from that link does not work for me; it leaves me with an error like the following:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I would much appreciate any help on this!
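For completeness, the upload (insert) side could look roughly like the sketch below, mirroring the pynative example linked above; the column names (id, name, photo, biodata) are assumptions about the python_Employee table and may need adjusting. A UnicodeDecodeError on byte 0xff usually means the PDF bytes were decoded as UTF-8 text somewhere, so keep every file open in plain binary mode ('rb'/'wb') with no encoding arguments in both directions:

import mysql.connector


def read_file(filename):
    # Read the PDF from disk as raw bytes so it can be stored in a BLOB column
    with open(filename, 'rb') as file:
        return file.read()


def insertBLOB(emp_id, name, photo, bioData):
    # Hypothetical column names (id, name, photo, biodata); adjust to your schema
    try:
        connection = mysql.connector.connect(host="00.00.00.000",
                                             user="user",
                                             password="pswd",
                                             database="database")
        cursor = connection.cursor()
        sql_insert_blob_query = """INSERT INTO python_Employee (id, name, photo, biodata)
                                   VALUES (%s, %s, %s, %s)"""
        cursor.execute(sql_insert_blob_query,
                       (emp_id, name, read_file(photo), read_file(bioData)))
        connection.commit()
    except mysql.connector.Error as error:
        print("Failed to insert BLOB data into MySQL table {}".format(error))
    finally:
        if connection.is_connected():
            cursor.close()
            connection.close()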
I am reading a file from the file system to load data into a PostgreSQL DB. I would like to use the code below to copy the data to the database. However, I have to fetch the CSV file from S3 instead of reading it from the file system. I saw that there are utilities that allow data to be loaded directly from S3 to RDS, but that is not supported in my organization at the moment. How can I stream data from a CSV file in S3 to a PostgreSQL DB?
def load_data(conn, table_name, file_path):
    copy_sql = """
        COPY %s FROM stdin WITH CSV HEADER
        DELIMITER as ','
    """
    cur = conn.cursor()
    f = open(file_path, 'r', encoding="utf-8")
    cur.copy_expert(sql=copy_sql % table_name, file=f)
    f.close()
    cur.close()
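One common approach, sketched below under the assumption that boto3 is available and the file fits comfortably in memory, is to fetch the object from S3 and hand a file-like buffer to copy_expert; for very large files you could instead download to a temporary file first. The bucket and key arguments are placeholders:

import io

import boto3


def load_csv_from_s3(conn, table_name, bucket, key):
    # Fetch the CSV object from S3 and buffer it as a text stream
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=bucket, Key=key)
    buf = io.StringIO(obj["Body"].read().decode("utf-8"))
    copy_sql = """
        COPY %s FROM stdin WITH CSV HEADER
        DELIMITER as ','
    """
    cur = conn.cursor()
    cur.copy_expert(sql=copy_sql % table_name, file=buf)
    conn.commit()
    cur.close()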
I am trying to load a pipe-separated CSV file into a Hive table using Python, without success. Please assist.
Full code:
from pyhive import hive
host_name = "192.168.220.135"
port = 10000
user = "cloudera"
password = "cloudera"
database = "default"
conn = hive.Connection(host=host_name, port=port, username=user, database=database)
print('Connected to DB: {}'.format(host_name))
cursor = conn.cursor()
Query = """LOAD DATA LOCAL inpath '/home/cloudera/Desktop/ccna_test/RERATING_EMMCCNA.csv' INTO TABLE python_testing fields terminated by '|' lines terminated by '\n' """
cursor.execute(Query)
From your question, I assume the CSV format is like below and you want a query to load the data into a Hive table.
value1|value2|value3
value4|value5|value6
value7|value8|value9
First, there should be a Hive table; it can be created using the query below.
create table python_testing
(
col1 string,
col2 string,
col3 string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' with SERDEPROPERTIES ( "separatorChar" = "|")
stored as textfile;
Note that the separator character and the input file format are explicitly given at table creation.
Also, the table is stored in TEXTFILE format, because of the format of the input file.
If you want an ORC table, then the input file should be in ORC format (Hive's 'load data' command just copies the files into the Hive data directory and does not do any transformation on the data). A possible workaround is to create a temporary table with STORED AS TEXTFILE, LOAD DATA into it, and then copy the data from that table to the ORC table, as in the sketch further below.
Use the 'load' command to load the data.
load data local inpath '/home/hive/data.csv' into table python_testing;
/home/hive/data.csv should be your file path.
For more details, see the blog post: http://forkedblog.com/load-data-to-hive-database-from-csv-file/
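If you do need an ORC table, a rough sketch of the staging-table workaround driven from pyhive might look like this; the staging and ORC table names and columns are illustrative, not taken from your schema:

from pyhive import hive

conn = hive.Connection(host="192.168.220.135", port=10000,
                       username="cloudera", database="default")
cursor = conn.cursor()

# 1) Staging table matching the pipe-separated text file
cursor.execute("""
    CREATE TABLE IF NOT EXISTS python_testing_stage (
        col1 string,
        col2 string,
        col3 string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES ("separatorChar" = "|")
    STORED AS TEXTFILE
""")

# 2) Load the raw file into the staging table (plain file copy, no transformation)
cursor.execute("""
    LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/ccna_test/RERATING_EMMCCNA.csv'
    INTO TABLE python_testing_stage
""")

# 3) Final ORC table, populated by copying from the staging table
cursor.execute("""
    CREATE TABLE IF NOT EXISTS python_testing_orc (
        col1 string,
        col2 string,
        col3 string
    )
    STORED AS ORC
""")
cursor.execute("INSERT INTO TABLE python_testing_orc SELECT * FROM python_testing_stage")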
I am trying to follow a copy_from example described on Stack Overflow, but I modified it a little since I need to read data from a CSV file. Following that example, I wrote a small program where the file is read from disk and its data is copied into the created table. My code is:
def importFile():
    path = "C:\myfile.csv"
    curs = conn.cursor()
    curs.execute("Drop table if exists test_copy; ")
    data = StringIO.StringIO()
    data.write(path)
    data.seek(0)
    curs.copy_from(data, 'MyTable')
    print("Data copied")
But I get an error:
psycopg2.DataError: invalid input syntax for integer:
Does this mean there is a mismatch between the CSV file and my table? Or is this syntax enough to copy a CSV file, or do I need some more code? I am new to Python, so any help will be appreciated.
Look at your .csv file with a text editor. You want to be sure that
the field-separator is a tab character
there are no quote-chars
there is no header row
If this is true, the following should work:
import psycopg2

def importFromCsv(conn, fname, table):
    with open(fname) as inf:
        conn.cursor().copy_from(inf, table)

def main():
    conn = ??  # set up database connection
    importFromCsv(conn, "c:/myfile.csv", "MyTable")
    print("Data copied")

if __name__=="__main__":
    main()
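For the `conn = ??` placeholder, the connection setup might look like the sketch below; the host, database, user, and password values are hypothetical placeholders for your own settings:

import psycopg2

def get_connection():
    # Hypothetical connection parameters; replace with your own
    return psycopg2.connect(
        host="localhost",
        dbname="mydb",
        user="myuser",
        password="mypassword",
    )

# Usage inside main(): conn = get_connection()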