I'd appreciate your help with my issue.
I want to append the results of my query to an existing CSV file. I implemented this following this example, but for some unknown reason the output file stays empty.
This is a minimal snippet of my code:
import psycopg2
import numpy as np
# Connect to an existing database
conn = psycopg2.connect(dbname="...", user="...", password="...", host="...")
# Open a cursor to perform database operations
cur = conn.cursor()
f = open('file_name.csv', 'ab') # "ab" for appending
cur.execute("""select * from table limit 10""") # I have another query here but it isn't relevant.
cur_out = np.asarray(cur.fetchall())
Up to this point it works perfectly. When I print(cur_out), I get the desired output. But in the next step:
np.savetxt(f, cur_out, delimiter=",", fmt='%s')
the file stays empty, and I can't find the reason why.
Can you please help me?
Thanks to all helpers.
I don't know how to tell you this, but your code works perfectly for me.
My suggestions:
If you have existing files with this name, try deleting them and running your code again.
Try changing the batch size.
Try changing the file extension to .txt or .dat.
Best wishes.
Another option is to call np.savetxt with the file name directly, so it opens and closes the file itself (note that this writes a new file rather than appending):
np.savetxt('file_name.csv', cur_out, delimiter=",", fmt='%s')
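If you do need to append to the existing file, here is a minimal sketch of the whole flow (connection details are placeholders and the table name is made up), opening the file in a with block so it is flushed and closed when the block ends:

import numpy as np
import psycopg2

conn = psycopg2.connect(dbname="...", user="...", password="...", host="...")
cur = conn.cursor()
cur.execute("select * from my_table limit 10")  # hypothetical table name
cur_out = np.asarray(cur.fetchall())

# The with block guarantees the buffer is flushed and the file is closed,
# so the appended rows actually end up on disk.
with open('file_name.csv', 'ab') as f:  # "ab" for appending
    np.savetxt(f, cur_out, delimiter=",", fmt='%s')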
I want to connect to an Oracle database from Python, run a SELECT statement, and export whatever result I get as a CSV file to an SFTP location.
I know we can connect to Oracle from Python using the cx_Oracle package.
But my concern is how to get the data from Oracle into a CSV file and export it.
Also, note that my data is quite large.
Can anyone help me find a solution that is fast as well?
Thanks in advance.
Start with something like this, and change to meet your requirements:
import cx_Oracle
import csv
import sys, os

if sys.platform.startswith("darwin"):
    cx_Oracle.init_oracle_client(lib_dir=os.environ.get("HOME")+"/Downloads/instantclient_19_8")

connection = cx_Oracle.connect(user='cj', password='cj', dsn='localhost/orclpdb1')

cursor = connection.cursor()
cursor.arraysize = 5000

with open("testpy.csv", "w", encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile, lineterminator="\n")
    results = cursor.execute('select * from all_objects where rownum < 10000000 order by object_id')
    writer.writerows(results)
In particular, you will want to tune the arraysize value.
See Tuning cx_Oracle.
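If memory is a concern with very large result sets, a variation on the code above (same placeholder credentials; the fetch size is just an example) is to fetch and write in explicit batches with fetchmany():

import csv
import cx_Oracle

connection = cx_Oracle.connect(user='cj', password='cj', dsn='localhost/orclpdb1')
cursor = connection.cursor()
cursor.arraysize = 5000  # rows fetched per round trip to the database

cursor.execute('select * from all_objects order by object_id')

with open("testpy.csv", "w", encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile, lineterminator="\n")
    while True:
        rows = cursor.fetchmany()  # returns at most `arraysize` rows
        if not rows:
            break
        writer.writerows(rows)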
I'm looking for an efficient way to import data from a CSV file into a PostgreSQL table using Python, in batches, as I have quite large files and the server I'm importing the data to is far away. I need an efficient solution, as everything I tried was either slow or simply didn't work. I'm using SQLAlchemy.
I wanted to use raw SQL, but it's hard to parameterize, and I need multiple loops to execute the query for multiple rows.
I was given the task of manipulating & migrating some data from CSV files into a remote Postgres Instance.
I decided to use the Python script below:
import csv
import uuid
import psycopg2
import psycopg2.extras
import time

# Instant Time at the start of the Script
start = time.time()

psycopg2.extras.register_uuid()

# List of CSV Files that I want to manipulate & migrate.
file_list = ["Address.csv"]

conn = psycopg2.connect("host=localhost dbname=address user=postgres password=docker")
cur = conn.cursor()

i = 1
for f in file_list:
    f = open(f)
    csv_f = csv.reader(f)
    next(csv_f)

    for row in csv_f:
        # Some simple manipulations on each row
        # Inserting a uuid4 into the first column
        row.pop(0)
        row.insert(0, uuid.uuid4())
        row.pop(10)
        row.insert(10, False)
        row.pop(13)

        # Tracking the number of rows inserted
        print(i)
        i = i + 1

        # INSERT QUERY
        postgres_insert_query = """ INSERT INTO "public"."address"("address_id","address_line_1","locality_area_street","address_name","app_version","channel_type","city","country","created_at","first_name","is_default","landmark","last_name","mobile","pincode","territory","updated_at","user_id") VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""
        record_to_insert = row
        cur.execute(postgres_insert_query, record_to_insert)

    f.close()

conn.commit()
conn.close()

print(time.time() - start)
The script worked quite well and promptly when I tested it locally, but connecting to a remote database server added a lot more latency.
As a workaround, I migrated the manipulated data into my local postgres instance.
I then generated a .sql file of the migrated data & manually imported the .sql file on the remote server.
Alternatively, you can use Python's multithreading features to launch multiple concurrent connections to the remote server, dedicate an isolated batch of rows to each connection, and flush the data (a rough sketch follows below).
This should make your migration considerably faster.
I have personally not tried the multithreading approach, as it wasn't required in my case, but it seems darn efficient.
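As a rough sketch of that idea (this is not the script above; the DSN, table and columns are placeholders, and I'm using psycopg2.extras.execute_batch so each chunk is sent in fewer round trips):

import concurrent.futures
import psycopg2
import psycopg2.extras

DSN = "host=remote-host dbname=address user=postgres password=secret"  # placeholder connection string

def insert_chunk(rows):
    # Each worker gets its own connection and commits its own batch.
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            psycopg2.extras.execute_batch(
                cur,
                'INSERT INTO "public"."address"("address_id","city") VALUES (%s,%s)',  # simplified columns
                rows,
                page_size=1000,
            )
        conn.commit()
    finally:
        conn.close()

def chunked(rows, size=5000):
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# all_rows = [...]  # the manipulated rows built from the CSV files
# with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
#     list(pool.map(insert_chunk, chunked(all_rows)))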
Hope this helped ! :)
Resources:
CSV Manipulation using Python for Beginners.
Use the copy_from command; it copies all the rows into the table.
path = open('file.csv', 'r')
next(path)  # skip the header row
cur.copy_from(path, 'table_name', sep=',', columns=('id', 'name', 'email'))
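Note that copy_from only splits on the delimiter and does not understand quoted CSV fields. If the file contains quoted values, one alternative (table and column names below are placeholders) is copy_expert with a full COPY ... CSV statement:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=address user=postgres password=docker")
cur = conn.cursor()

with open('file.csv', 'r') as f:
    # FORMAT CSV handles quoting, and HEADER makes the server skip the header row.
    cur.copy_expert(
        "COPY table_name (id, name, email) FROM STDIN WITH (FORMAT CSV, HEADER)",
        f,
    )

conn.commit()
conn.close()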
Question 1 of 2
I'm trying to import data from a CSV file into Vertica using Python, using Uber's vertica-python package. The problem is that whitespace-only data elements are being loaded into Vertica as NULLs; I want only empty data elements to be loaded as NULLs, and non-empty whitespace data elements to be loaded as whitespace instead.
For example, the following two rows of a CSV file are both loaded into the database as ('1','abc',NULL,NULL), whereas I want the second one to be loaded as ('1','abc',' ',NULL).
1,abc,,^M
1,abc, ,^M
Here is the code:
# import vertica-python package by Uber
# source: https://github.com/uber/vertica-python
import csv
import vertica_python

# write CSV file
filename = 'temp.csv'
data = <list of lists, e.g. [[1,'abc',None,'def'],[2,'b','c','d']]>
with open(filename, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f, escapechar='\\', doublequote=False)
    writer.writerows(data)
# define query
q = "copy <table_name> (<column_names>) from stdin "\
    "delimiter ',' "\
    "enclosed by '\"' "\
    "record terminator E'\\r' "

# copy data
conn = vertica_python.connect(host=<host>,
                              port=<port>,
                              user=<user>,
                              password=<password>,
                              database=<database>,
                              charset='utf8')
cur = conn.cursor()
with open(filename, 'rb') as f:
    cur.copy(q, f)
conn.close()
Question 2 of 2
Are there any other issues (e.g. character encoding) I have to watch out for when using this method of loading data into Vertica? Are there any other mistakes in the code? I'm not 100% convinced it will work on all platforms (I'm currently running on Linux; there may be record terminator issues on other platforms, for example). Any recommendations to make this code more robust would be greatly appreciated.
In addition, are there alternative methods of bulk inserting data into Vertica from Python, such as loading objects directly from Python instead of having to write them to CSV files first, without sacrificing speed? The data volume is large and the insert job as is takes a couple of hours to run.
Thank you in advance for any help you can provide!
The copy statement you have should behave the way you want with regard to the spaces. I tested it using a very similar COPY.
Edit: I missed what you were really asking with the copy. I'll leave this part in because it might still be useful for some people:
To fix the whitespace, you can change your copy statement:
copy <table_name> (FIELD1, FIELD2, MYFIELD3 AS FILLER VARCHAR(50), FIELD4, FIELD3 AS NVL(MYFIELD3,'') ) from stdin
By using FILLER, the copy parses that column into something like a variable, which you can then assign to your actual table field with AS later in the COPY.
As for any gotchas... I do what you have on Solaris often. The only thing I noticed is that you are setting the record terminator; I'm not sure this is something you really need to do, depending on the environment. I've never had to do it switching between Linux, Windows and Solaris.
Also, one hint: this will return a result set that tells you how many rows were loaded. Do a fetchone() and print it out and you'll see it.
The only other thing I can recommend might be to use reject tables in case any rows are rejected.
You mentioned that it is a large job. You may need to increase your read timeout by adding 'read_timeout': 7200 (or more) to your connection options. I'm not sure whether None would disable the read timeout or not.
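For example, something like this in the connection options (the other keys are copied from the question's placeholders; the value is just the suggestion above):

conn_info = {'host': <host>,
             'port': <port>,
             'user': <user>,
             'password': <password>,
             'database': <database>,
             'charset': 'utf8',
             'read_timeout': 7200}  # seconds; raise for long-running COPY jobs
conn = vertica_python.connect(**conn_info)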
As for a faster way... if the file is accessible directly on the vertica node itself, you could just reference the file directly in the copy instead of doing a copy from stdin and have the daemon load it directly. It's much faster and has a number of optimizations that you can do. You could then use apportioned load, and if you have multiple files to load you can just reference them all together in a list of files.
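A rough sketch of that direct-load variant, reusing the conn from the code above (the path, table name and options are placeholders, so treat this as an outline rather than the exact statement):

cur = conn.cursor()
cur.execute("copy <table_name> (<column_names>) "
            "from '/path/on/vertica/node/temp.csv' "
            "delimiter ',' "
            "enclosed by '\"' "
            "rejected data as table <table_name>_rejects "  # optional reject table, per the note above
            "direct")
print(cur.fetchone())  # number of rows loaded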
It's kind of a long topic, though. If you have any specific questions let me know.
I am trying to read a BLOB (containing a PNG image) from a row in a table in an SQLite3 .db file using Python 2.7, and then write that data to a new image file. I'm having a tough time making sense of how to accomplish this. This is really just scratch code of essentially what I would like to accomplish...
c = conn.cursor()
data = c.execute("SELECT image_data FROM favicon_bitmaps WHERE index='26'")
x = open('C:\file.png', 'w')
x.write(data)
x.close
The c.execute is just returning some kind of cursor object, and I'm not sure how to get at my data. I looked at the cursor object methods over at one of the documentation pages and my eyes just kind of glazed over... I'm not super familiar with working with SQLite3 DBs in Python; any pointers or thoughts are greatly appreciated! Thanks a bunch!
Don't use the return value of cursor.execute(); normally you ignore it completely (and really it would have been more Pythonic of it to just return None.)
Instead, call the .fetchone() method of the cursor after calling .execute(), which returns a Row object:
c = conn.cursor()
c.execute("SELECT image_data FROM favicon_bitmaps WHERE index='26'")
data = c.fetchone()

with open(r'C:\file.png', 'wb') as x:  # raw string so \f isn't treated as an escape
    x.write(data[0])
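As a side note (not part of the question), you may want to let sqlite3 bind the row id with a ? placeholder instead of embedding it in the SQL string; quoting the column name with double quotes also avoids any clash with the INDEX keyword:

c.execute('SELECT image_data FROM favicon_bitmaps WHERE "index" = ?', ('26',))
data = c.fetchone()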
Newbie to SQL and SQLite.
I'm trying to save a database, then copy the .db file to another folder and open it. So far I have created the database and copied and pasted the .db file to another folder, but when I try to access the database, the output says that it is empty.
So far I have
from pysqlite2 import dbapi2 as sqlite
conn = sqlite.connect('db1Thu_04_Aug_2011_14_20_15.db')
c = conn.cursor()
print c.fetchall()
and the output is
[]
You need something like
c.execute("SELECT * FROM mytable")
for row in c:
#process row
I will echo Mat and point out that this is not valid syntax. More than that, you do not include any SELECT request (or other SQL command) in your example. If you actually do not have a SELECT statement in your code and you run fetchall on a newly created cursor, you can expect to get an empty list, which seems to be what you have.
Finally, do make sure that you are opening the file from the right directory. If you tell sqlite to open a nonexistent file, it will happily create a new, empty one for you.
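To check both points (that a query is actually executed, and that the right file is opened), a small sketch like this might help; the path is a placeholder, and I'm using the standard sqlite3 module in place of pysqlite2:

import os
import sqlite3

db_path = '/full/path/to/db1Thu_04_Aug_2011_14_20_15.db'  # use an absolute path
print(os.path.exists(db_path))  # False means connect() would create a new, empty database

conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("SELECT name FROM sqlite_master WHERE type='table'")
print(c.fetchall())  # an empty list here means the file really contains no tables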