I have a requirement where a lot of files (such as images and .csv files) are saved in a table hosted in Azure PostgreSQL. The files are stored as a binary data type. Is it possible to extract them directly to the local file system with a SQL query? I am using Python as my programming language; any guide or code sample is appreciated, thanks!
If you just want to extract the binary data from SQL and save it as local files, try the code below:
import psycopg2
import os

connstr = "<conn string>"
rootPath = "d:/"

def saveBinaryToFile(sqlRowData):
    # Create a folder named after the second column and write the binary
    # third column out as a .jpg named after the first column.
    destPath = rootPath + str(sqlRowData[1])
    if os.path.isdir(destPath):
        destPath += '_2'
    os.mkdir(destPath)
    with open(destPath + '/' + str(sqlRowData[0]) + ".jpg", "wb") as newfile:
        newfile.write(sqlRowData[2])

conn = psycopg2.connect(connstr)
cur = conn.cursor()
sql = 'select * from images'
cur.execute(sql)
rows = cur.fetchall()
print(sql)
print('result: ' + str(rows))

# Save every returned row to disk
for row in rows:
    saveBinaryToFile(row)

conn.close()
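If the table holds many large binaries, fetchall() pulls every file into memory at once. A minimal lower-memory sketch, assuming the same connstr and saveBinaryToFile() as above, uses a psycopg2 named (server-side) cursor to stream the rows instead:

conn = psycopg2.connect(connstr)
cur = conn.cursor(name='image_stream')  # named cursor => server-side, rows fetched in batches
cur.itersize = 100                      # rows per round trip to the server
cur.execute('select * from images')
for row in cur:                         # iterate instead of fetchall()
    saveBinaryToFile(row)
conn.close()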
Related
I have a table in my PostgreSQL database in which a column type is set to bytea in order to store zipped files.
The storing procedure works fine. I have problems when I need to retrieve the zipped file I uploaded.
def getAnsibleByLibrary(projectId):
    con = psycopg2.connect(
        database="xyz",
        user="user",
        password="pwd",
        host="localhost",
        port="5432",
    )
    print("Database opened successfully")
    cur = con.cursor()
    query = "SELECT ansiblezip FROM library WHERE library.id = (SELECT libraryid from project WHERE project.id = '"
    query += str(projectId)
    query += "')"
    cur.execute(query)
    rows = cur.fetchall()
    repository = rows[0][0]
    con.commit()
    con.close()
    print(repository, type(repository))
    with open("zippedOne.zip", "wb") as fin:
        fin.write(repository)
This code creates a zippedOne.zip file, but it seems to be an invalid archive.
I also tried saving repository.tobytes(), but it gives the same result.
I don't understand how to handle memoryview objects.
If I try:
print(repository, type(repository))
the result is:
<memory at 0x7f6b62879348> <class 'memoryview'>
If I try to unzip the file:
chain#wraware:~$ unzip zippedOne.zip
The result is:
Archive: zippedOne.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of zippedOne.zip or
zippedOne.zip.zip, and cannot find zippedOne.zip.ZIP, period.
Trying to extract it in windows gives me the error: "The compressed (zipped) folder is invalid"
This code, based on the example in the question, works for me:
import io
import zipfile
import psycopg2
DROP = """DROP TABLE IF EXISTS so69434887"""
CREATE = """\
CREATE TABLE so69434887 (
id serial primary key,
ansiblezip bytea
)
"""
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('so69434887.txt', 'abc')

with psycopg2.connect(database="test") as conn:
    cur = conn.cursor()
    cur.execute(DROP)
    cur.execute(CREATE)
    conn.commit()
    cur.execute("""INSERT INTO so69434887 (ansiblezip) VALUES (%s)""", (buf.getvalue(),))
    conn.commit()
    cur.execute("""SELECT ansiblezip FROM so69434887""")
    memview, = cur.fetchone()

with open('so69434887.zip', 'wb') as f:
    f.write(memview)
and the resulting file unzips fine (on Linux, at least):
$ unzip -p so69434887.zip so69434887.txt
abc
So perhaps the data is not being inserted correctly.
FWIW, I got the "End-of-central-directory signature not found" error until I made sure I closed the zipfile object before writing its bytes to the database.
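If that is what happened here, a minimal sketch of the failure mode (same io/zipfile imports as above): reading the buffer before the ZipFile is closed captures an archive without its end-of-central-directory record, which is exactly the unzip error shown in the question.

buf = io.BytesIO()
zf = zipfile.ZipFile(buf, mode='w')
zf.writestr('so69434887.txt', 'abc')
truncated = buf.getvalue()   # too early: the central directory is only written on close
zf.close()
complete = buf.getvalue()    # complete zip bytes, safe to INSERT into the bytea column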
This question already has an answer here:
Inserting a list holding multiple values in MySQL using pymysql
(1 answer)
Closed 3 years ago.
What I'm doing:
- I'm executing a query against a MySQL table and exporting each day's worth of data as a csv into a folder
- I then insert each csv, row by row, into a separate MySQL table using a for loop
- Once loaded into the table, I move the csv into another folder
The problem is that it takes a very long time to run, and I would like some help finding areas where I can speed up the process, or suggestions for alternative methods in Python.
Code:
import pymysql
import pymysql.cursors
import csv
import os
import shutil
import datetime
from db_credentials import db1_config, db2...
def date_range(start, end):
    # Creates a list of dates from start to end
    ...

def export_csv(filename, data):
    # Exports query result as a csv to the filename's pending folder
    ...

def extract_from_db(database, sql, start_date, end_date, filename):
    # SQL query to extract data and export as csv
    ...

def open_csv(c):
    # Read csv and return as a list of lists
    ...

def get_files(folder):
    # Grab all csv files from a given folder's pending folder
    ...
# HERE IS WHERE IT GETS SLOW
def load_to_db(table, folder):
    print('Uploading...\n')
    files = get_files(folder)
    # Connect to db2 database
    connection = pymysql.connect(**db2_config, charset='utf8mb4', cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Open each csv in the files list and ignore column headers
            for file in files:
                print('Processing ' + file.split("pending/", 1)[1] + '...', end='')
                csv_file = open_csv(file)
                csv_headers = ', '.join(csv_file[0])
                csv_data = csv_file[1:]
                # Insert each row of each csv into db2 table
                for row in csv_data:
                    placeholders = ', '.join(['%s'] * len(row))
                    sql = "INSERT INTO %s (%s) VALUES ( %s )" % (table, csv_headers, placeholders)
                    cursor.execute(sql, row)
                # Move processed file to the processed folder
                destination_folder = os.path.join('/Users', 'python', folder, 'processed')
                shutil.move(file, destination_folder)
                print('DONE')
        # Connection is not autocommit by default.
        # So you must commit to save your changes.
        connection.commit()
    finally:
        connection.close()
    if not files:
        print('No csv data available to process')
    else:
        print('Finished')
How about trying MySQL's LOAD DATA INFILE?
E.g. execute the following statement once for the entire csv rather than running individual inserts:
LOAD DATA INFILE '<your filename>'
INTO TABLE <your table>
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;
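If you want to drive that from the existing Python script, a rough sketch could look like the function below. It assumes the MySQL server permits LOAD DATA LOCAL (local_infile), that the csv column order matches the table, and it reuses the db2_config and get_files() helpers from the question:

def load_to_db_bulk(table, folder):
    # One LOAD DATA LOCAL INFILE per csv instead of per-row INSERTs
    files = get_files(folder)
    connection = pymysql.connect(**db2_config, charset='utf8mb4', local_infile=True)
    try:
        with connection.cursor() as cursor:
            for file in files:
                # table comes from your own code (as in the original script), not user input
                sql = ("LOAD DATA LOCAL INFILE %s INTO TABLE " + table +
                       " FIELDS TERMINATED BY ',' ENCLOSED BY '\"'"
                       " LINES TERMINATED BY '\\n' IGNORE 1 ROWS")
                cursor.execute(sql, (file,))   # the file path is passed as a parameter
        connection.commit()
    finally:
        connection.close()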
I have a problem. I need to parse multiple XML files and insert the data into a database.
import os
from lxml import etree
import sqlite3
conn = sqlite3.connect("xml.db")
cursor = conn.cursor()
path = 'C:/tools/XML'
for filename in os.listdir(path):
    fullname = os.path.join(path, filename)
    tree = etree.parse(fullname)
    test = tree.xpath('//*[@name="Name"]/text()')
    tpl = tuple(test)
    cursor.executemany("INSERT INTO parsee VALUES (?);", (tpl,))
    conn.commit()
sql = "SELECT * FROM parsee"
cursor.execute(sql)
print(cursor.fetchall())
result:
[('testname1',)]
If I run the program again, it inserts the same name again. Result:
[('testname1',),('testname1',)]
There are 100 files in the folder, with content like:
<curent name="Name">testname1</curent>
<curent name="Name">testname2</curent>
<curent name="Name">testname3</curent>
<curent name="Name">testname4</curent>
Since I don't have admin rights to install lxml on my computer, I will use a "batteries included" module that ships with Python to handle the XPath lookups: xml.etree.ElementTree. Either way, my code will show you how to insert multiple records into SQLite using executemany().
It looks like C:/tools/XML will contain many XML files with the same structure.
I put the following two in a folder to simulate this (I noticed that your example has 'curent' as the element name; not sure if that is a typo, so I am using 'current'):
file1.xml
<note>
<current name="Name">testname1</current>
<current name="Name">testname2</current>
<otherdetail></otherdetail>
</note>
file2.xml
<note>
<current name="Name">testname3</current>
<current name="Name">testname4</current>
<otherdetail></otherdetail>
</note>
I created an SQLite database file called xml.db and a table in it with the following statement:
CREATE TABLE PARSEE (NAME VARCHAR(100));
And here is my python script
import os
import xml.etree.ElementTree as ET
import sqlite3
conn = sqlite3.connect("xml.db")
cursor = conn.cursor()
path = 'C:/tools/XML'
for filename in os.listdir(path):
    fullname = os.path.join(path, filename)
    print("Parsing file: %s" % fullname)
    tree = ET.parse(fullname)
    root = tree.getroot()
    elements = root.findall(".//*[@name='Name']")
    names = [(e.text,) for e in elements]
    print("Names found: %s" % names)
    cursor.executemany("INSERT INTO PARSEE VALUES (?)", names)
    conn.commit()
sql = "SELECT * FROM PARSEE"
print("Printing table PARSEE content")
cursor.execute(sql)
print(cursor.fetchall())
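As a side note on the duplicates you get when re-running the script: a small sketch, assuming you are free to recreate the table, is to declare the column UNIQUE and use SQLite's INSERT OR IGNORE so repeated runs stay idempotent:

cursor.execute("CREATE TABLE IF NOT EXISTS PARSEE (NAME VARCHAR(100) UNIQUE)")
cursor.executemany("INSERT OR IGNORE INTO PARSEE VALUES (?)", names)
conn.commit()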
I am using Python 3.6 to iterate through a folder structure and return the file paths of all these CSVs I want to import into two already created Oracle tables.
con = cx_Oracle.connect('BLAH/BLAH#XXX:666/BLAH')
#Targets the exact filepaths of the CSVs we want to import into the Oracle database
if os.access(base_cust_path, os.W_OK):
    for path, dirs, files in os.walk(base_cust_path):
        if "Daily" not in path and "Daily" not in dirs and "Jul" not in path and "2017-07" not in path:
            for f in files:
                if "OUTPUT" in f and "MERGE" not in f and "DD" not in f:
                    print("Import to OUTPUT table: " + path + "/" + f)
                    #Run function to import to SQL Table 1
                if "MERGE" in f and "OUTPUT" not in f and "DD" not in f:
                    print("Import to MERGE table: " + path + "/" + f)
                    #Run function to import to SQL Table 2
A while ago I was able to use PHP to produce a function that used the BULK INSERT SQL command for SQL Server:
function bulkInserttoDB($csvPath){
    $tablename = "[DATABASE].[dbo].[TABLE]";
    $insert = "BULK
               INSERT ".$tablename."
               FROM '".$csvPath."'
               WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')";
    print_r($insert);
    print_r("<br>");
    $result = odbc_prepare($GLOBALS['connection'], $insert);
    odbc_execute($result) or die(odbc_error($connection));
}
I was looking to replicate this in Python, but a few Google searches led me to believe there is no 'BULK INSERT' command for Oracle. The BULK INSERT command had awesome performance.
Since these CSVs I am loading are huge (2GB x 365), performance is crucial. What is the most efficient way of doing this?
The bulk insert can be done with the cx_Oracle library using cursor.prepare() and cursor.executemany():
con = cx_Oracle.connect(CONNECTION_STRING)
cur = con.cursor()

# prepare your statement once
cur.prepare("""INSERT INTO MyTable VALUES (
                   to_date(:1, 'YYYY/MM/DD HH24:MI:SS'),
                   :2,
                   :3,
                   to_date(:4, 'YYYY/MM/DD HH24:MI:SS'),
                   :5,
                   :6,
                   to_date(:7, 'YYYY/MM/DD HH24:MI:SS'),
                   :8,
                   to_date(:9, 'YYYY/MM/DD HH24:MI:SS'))""")

# prepare your data: append one tuple per csv line (sline = a split csv row)
rows = []
rows.append((sline[0], sline[1], sline[2], sline[3], sline[4],
             sline[5], sline[6], sline[7], sline[8]))

# insert the whole batch with the prepared statement
cur.executemany(None, rows)
You prepare the insert statement once, build a list of row tuples from your file, and finally call executemany(), which sends the whole batch in one go instead of doing row-by-row inserts.
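Since each csv here is around 2 GB, building one giant list may not fit in memory. A rough sketch of a chunked variant (hypothetical file name and a simplified 9-column statement, standing in for the prepared statement above) reads the csv with the csv module and flushes executemany() in batches:

import csv
import cx_Oracle

con = cx_Oracle.connect(CONNECTION_STRING)   # assumed to be defined as above
cur = con.cursor()
cur.prepare("INSERT INTO MyTable VALUES (:1, :2, :3, :4, :5, :6, :7, :8, :9)")  # simplified, no to_date

batch, batch_size = [], 50000
with open("big_file.csv", newline="") as f:  # hypothetical file name
    for sline in csv.reader(f):
        batch.append(tuple(sline[:9]))
        if len(batch) >= batch_size:
            cur.executemany(None, batch)     # reuse the prepared statement
            batch = []
if batch:
    cur.executemany(None, batch)
con.commit()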
I used the following Python script to dump a MySQL table to a CSV file, but the file is saved in the same folder as the Python script. I want to save it in another folder. How can I do it? Thank you.
print 'Writing database to csv file'
import MySQLdb
import csv
import time
import datetime
import os
currentDate=datetime.datetime.now().date()
user = ''
passwd = ''
host = ''
db = ''
table = ''
con = MySQLdb.connect(user=user, passwd=passwd, host=host, db=db)
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)
with open('Data on %s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)
print 'Done'
Change this:
with open('/full/path/tofile/Data on %s.csv' % currentDate, 'w') as f:
This solves your problem X. But you have a problem Y, which is: "How do I efficiently dump CSV data from MySQL without having to write a lot of code?"
The answer to problem Y is SELECT ... INTO OUTFILE.
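A rough sketch of that approach, reusing the cursor, table, and currentDate variables from the question. Note that INTO OUTFILE writes the file on the MySQL server host, the path must be writable by the server, and the server's FILE privilege and secure_file_priv setting must allow it:

query = ("SELECT * FROM %s "
         "INTO OUTFILE '/full/path/tofile/Data on %s.csv' "
         "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
         "LINES TERMINATED BY '\\n'" % (table, currentDate))
cursor.execute(query)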