Copying data to Vertica using Python

I use Python and the vertica-python library to COPY data into a Vertica DB:
connection = vertica_python.connect(**conn_info)
vsql_cur = connection.cursor()
with open("/tmp/vertica-test-insert", "rb") as fs:
    vsql_cur.copy("COPY table FROM STDIN DELIMITER ','", fs, buffer_size=65536)
connection.commit()
It inserts data, but only 5 rows, although the file contains more. Could this be related to DB settings, or is it some client issue?

This code works for me:
For JSON:
# for json file
with open("D:/SampleCSVFile_2kb/tweets.json", "rb") as fs:
    my_file = fs.read().decode('utf-8')
    cur.copy("COPY STG.unstruc_data FROM STDIN PARSER fjsonparser()", my_file)
connection.commit()
For CSV:
# for csv file
with open("D:/SampleCSVFile_2kb/SampleCSVFile_2kb.csv", "rb") as fs:
    my_file = fs.read().decode('utf-8', 'ignore')
    cur.copy("COPY STG.unstruc_data FROM STDIN PARSER FDELIMITEDPARSER(delimiter=',', header='false')", my_file)  # buffer_size=65536
connection.commit()
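One difference from the snippet in the question: this passes the decoded file contents as a string rather than the open file object. As far as I know, vertica-python's copy() accepts either a string or any file-like object with a read() method, so both forms should work.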

Very likely you have rows getting rejected. Assuming you are using 7.x, you can add:
[ REJECTED DATA {'path' [ ON nodename ] [, ...] | AS TABLE 'reject_table'} ]
You can also run this after the COPY to see a summary of the results:
SELECT GET_NUM_ACCEPTED_ROWS(), GET_NUM_REJECTED_ROWS();
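For example, a minimal sketch in Python, reusing the names from the question (the reject table copy_rejects is a hypothetical name, not from the original):
with open("/tmp/vertica-test-insert", "rb") as fs:
    # Route rejected rows into a table we can inspect afterwards
    # (copy_rejects is an assumed name).
    vsql_cur.copy(
        "COPY table FROM STDIN DELIMITER ',' REJECTED DATA AS TABLE copy_rejects",
        fs,
    )
# The counters refer to the last COPY executed in this session.
vsql_cur.execute("SELECT GET_NUM_ACCEPTED_ROWS(), GET_NUM_REJECTED_ROWS()")
print(vsql_cur.fetchone())  # e.g. (5, 95) would explain the missing rows
connection.commit()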

Related

Psycopg2 copy_expert returns different results than direct query

I am trying to debug an issue that I have been having for a couple of weeks now. I am copying the result of a query on a PostgreSQL DB into a CSV file using psycopg2 and copy_expert; however, when my script finishes running, I sometimes end up with fewer rows than if I ran the query directly against the DB using pgAdmin. This is the code that runs the query and saves the result to a CSV:
cursor = pqlconn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
fd = open("query.sql", 'r')
sql_query = fd.read()
fd.close()
csv_path = 'test.csv'
query = "copy (" + sql_query + \
        ") TO STDOUT WITH (FORMAT csv, DELIMITER ',', HEADER)"
with open(csv_path, 'w', encoding='utf-8') as f_output:
    cursor.copy_expert(query, f_output)
print("Saved information to csv: ", csv_path)
When it runs I will sometimes end up with fewer rows than if I ran the query directly on the DB; running it again still returns fewer rows than what I see in the DB directly. Would appreciate any guidance on this, thanks!
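A quick way to narrow this down (an untested sketch; it assumes query.sql holds a single SELECT with no trailing semicolon) is to count the rows the same connection sees and compare that with the lines written to the CSV:
cursor.execute("SELECT COUNT(*) FROM (" + sql_query + ") AS q")
db_rows = cursor.fetchone()['count']   # RealDictCursor returns dicts

with open(csv_path, encoding='utf-8') as f:
    # Naive line count minus the header; multi-line fields would skew it.
    csv_rows = sum(1 for _ in f) - 1

print(db_rows, csv_rows)  # if these match, the difference is between sessions/snapshots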

Retrieve zipped file from bytea column in PostgreSQL using Python

I have a table in my PostgreSQL database in which a column type is set to bytea in order to store zipped files.
The storing procedure works fine. I have problems when I need to retrieve the zipped file I uploaded.
def getAnsibleByLibrary(projectId):
    con = psycopg2.connect(
        database="xyz",
        user="user",
        password="pwd",
        host="localhost",
        port="5432",
    )
    print("Database opened successfully")
    cur = con.cursor()
    query = "SELECT ansiblezip FROM library WHERE library.id = (SELECT libraryid from project WHERE project.id = '"
    query += str(projectId)
    query += "')"
    cur.execute(query)
    rows = cur.fetchall()
    repository = rows[0][0]
    con.commit()
    con.close()
    print(repository, type(repository))
    with open("zippedOne.zip", "wb") as fin:
        fin.write(repository)
This code creates a zippedOne.zip file but it seems to be an invalid archive.
I also tried saving repository.tobytes(), but it gives the same result.
I don't understand how I can handle memoryview objects.
If I try:
print(repository, type(repository))
the result is:
<memory at 0x7f6b62879348> <class 'memoryview'>
If I try to unzip the file:
chain#wraware:~$ unzip zippedOne.zip
The result is:
Archive: zippedOne.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of zippedOne.zip or
zippedOne.zip.zip, and cannot find zippedOne.zip.ZIP, period.
Trying to extract it in Windows gives me the error: "The compressed (zipped) folder is invalid".
This code, based on the example in the question, works for me:
import io
import zipfile
import psycopg2

DROP = """DROP TABLE IF EXISTS so69434887"""
CREATE = """\
CREATE TABLE so69434887 (
    id serial primary key,
    ansiblezip bytea
)
"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('so69434887.txt', 'abc')

with psycopg2.connect(database="test") as conn:
    cur = conn.cursor()
    cur.execute(DROP)
    cur.execute(CREATE)
    conn.commit()
    cur.execute("""INSERT INTO so69434887 (ansiblezip) VALUES (%s)""", (buf.getvalue(),))
    conn.commit()
    cur.execute("""SELECT ansiblezip FROM so69434887""")
    memview, = cur.fetchone()

with open('so69434887.zip', 'wb') as f:
    f.write(memview)
and the result can be unzipped (on Linux, at least):
$ unzip -p so69434887.zip so69434887.txt
abc
So perhaps the data is not being inserted correctly.
FWIW, I got the "End-of-central-directory signature not found" error until I made sure I closed the ZipFile object before writing to the database.
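To illustrate that pitfall (a sketch reusing the names above): bytes taken from the buffer while the ZipFile is still open lack the end-of-central-directory record, which produces exactly the unzip error in the question.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode='w') as zf:
    zf.writestr('so69434887.txt', 'abc')
    truncated = buf.getvalue()   # too early: central directory not yet written
complete = buf.getvalue()        # correct: ZipFile closed, archive finalized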

Load CSV file from S3 to RDS PostgreSQL

I am reading a file from the file system to load data into a PostgreSQL DB. I would like to use the code below to copy the data to the database; however, I have to fetch the CSV file from S3 instead of reading it from the file system. I saw that there are utilities that allow data to be loaded directly from S3 to RDS, but they are not supported in my organization at the moment. How can I stream data from a CSV file in S3 to a PostgreSQL DB?
def load_data(conn, table_name, file_path):
    copy_sql = """
    COPY %s FROM stdin WITH CSV HEADER
    DELIMITER as ','
    """
    cur = conn.cursor()
    f = open(file_path, 'r', encoding="utf-8")
    cur.copy_expert(sql=copy_sql % table_name, file=f)
    f.close()
    cur.close()
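One possible approach (an untested sketch; the bucket and key arguments are placeholders, and it assumes boto3 is installed and credentialed): fetch the object with boto3, buffer it, and hand the buffer to copy_expert just like a local file:
import io
import boto3

def load_data_from_s3(conn, table_name, bucket, key):
    # Download the object into an in-memory buffer; for very large
    # files, spool to a temporary file instead.
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    buf = io.StringIO(body.read().decode("utf-8"))

    copy_sql = """
    COPY %s FROM stdin WITH CSV HEADER
    DELIMITER as ','
    """
    cur = conn.cursor()
    cur.copy_expert(sql=copy_sql % table_name, file=buf)
    conn.commit()
    cur.close()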

Reading a SQL query from a JSON file

I want to fetch the SQL query from a text file and run it in a Python program. This is my code:
csvfilelist = os.listdir(inputPath)
mycursor = mydb.cursor()
for csvfilename in csvfilelist:
    with open(inputPath + csvfilename, 'r') as csvFile:
        reader = csv.reader(csvFile)
        for row in reader:
            '''r = "INSERT INTO Terminate.RAW VALUES('%s','%s','%s','%s','%s')" % (row[0], row[1], row[2], row[3], row[4], row[5])'''
            try:
                result = mycursor.execute(r)
                mydb.commit()
            except mysql.connector.Error as err:
                print(err)
    csvFile.close()
Say you have an INI file containing the query:
[main]
query=INSERT INTO Terminate.RAW VALUES('%s','%s','%s','%s','%s')
You may load it:
config = configparser.ConfigParser()
config.read('myfile.ini')
query = config['main']['query']
and later you can fill it with (five values to match the five placeholders):
r = query % (row[0], row[1], row[2], row[3], row[4])
As pointed out in the comments, using "%" in queries is not a good solution; you should bind your variables when executing the query. I don't remember the exact syntax; it's something like:
r = query
mycursor.execute(r, (row[0], row[1], row[2], row[3], row[4]))
Edit: sorry, I just read that your file is JSON, not INI. You wrote that in the title, not in the post. If so, you should use the json module instead of the configparser module.
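For completeness, a minimal sketch of the JSON variant (the file name queries.json and its layout are assumptions, not from the original post):
import json

# Assumed file contents:
# {"query": "INSERT INTO Terminate.RAW VALUES (%s, %s, %s, %s, %s)"}
with open('queries.json', 'r') as f:
    query = json.load(f)['query']

# Bind the values instead of formatting them into the string.
mycursor.execute(query, (row[0], row[1], row[2], row[3], row[4]))
mydb.commit()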

How to render data from PostgreSQL to CSV in a Python Flask app?

I am new to Python and trying to write some code. I am trying to run a SELECT query, but I am not able to render the data to a CSV file.
This is the psql query (truncated):
# \copy (
#     SELECT
#         sr.imei,
#         sensors.label, sr.created_at,
#         sr.received_at,
#         sr.type_id,
#
But how can I write it in Python so it renders to a CSV file?
Thanking you,
Vikas
sql = "COPY (SELECT * FROM sensor_readings WHERE reading=blahblahblah) TO STDOUT WITH CSV DELIMITER ';'"
with open("/tmp/sensor_readings.csv", "w") as file:
    cur.copy_expert(sql, file)
I think you just need to change the SQL for your use case, and it should work.
Install psycopg2 via pip install psycopg2, then you need something like this:
import csv
import psycopg2

query = """
SELECT
    sr.imei,
    sensors.label, sr.created_at,
    sr.received_at,
    sr.type_id,
    sr.data
FROM sensor_readings AS sr
LEFT JOIN sensors ON sr.imei = sensors.imei
WHERE sr.imei NOT LIKE 'test%' AND sr.created_at > '2019-02-01'
ORDER BY sr.received_at DESC
"""

conn = psycopg2.connect(database="routing_template", user="postgres", host="localhost", password="xxxx")
cur = conn.cursor()
cur.execute(query)

with open('result.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for row in cur.fetchall():
        writer.writerow(row)

cur.close()
conn.close()
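An alternative worth noting (a sketch combining this query with the copy_expert approach from the previous answer): let the server produce the CSV directly, so the rows never have to be fetched into Python memory:
# copy_expert does no parameter interpolation, so the literal % in
# the LIKE pattern is safe here.
copy_sql = "COPY ({}) TO STDOUT WITH CSV HEADER".format(query)
with open('result.csv', 'w') as f:
    cur.copy_expert(copy_sql, f)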
