I want to connect to an Oracle database from Python, run a SELECT statement, and export the result as a CSV file to an SFTP location.
I know we can connect to Oracle from Python using the cx_Oracle package.
My concern is how to get the data from Oracle into a CSV file and export it.
Note also that my data set is large.
Can anyone help me find a solution that is also fast?
Thanks in advance.
Start with something like this, and change to meet your requirements:
import cx_Oracle
import csv
import sys, os
if sys.platform.startswith("darwin"):
    cx_Oracle.init_oracle_client(lib_dir=os.environ.get("HOME")+"/Downloads/instantclient_19_8")
connection = cx_Oracle.connect(user='cj', password='cj', dsn='localhost/orclpdb1')
cursor = connection.cursor()
cursor.arraysize = 5000
with open("testpy.csv", "w", encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile, lineterminator="\n")
    results = cursor.execute('select * from all_objects where rownum < 10000000 order by object_id')
    writer.writerows(results)
In particular, you will want to tune the arraysize value.
See Tuning cx_Oracle.
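If you want more control over the batching, here is a minimal sketch of the same idea written against the generic DB-API cursor interface (fetchmany and description), so it should work with a cx_Oracle connection as well. The function name export_query_to_csv and the batch_size parameter are my own, not part of cx_Oracle.

```python
import csv


def export_query_to_csv(connection, query, csv_path, batch_size=5000):
    """Stream a query result to a CSV file in batches; return the row count."""
    cursor = connection.cursor()
    cursor.arraysize = batch_size  # DB-API attribute, the value the answer says to tune
    cursor.execute(query)
    total = 0
    with open(csv_path, "w", encoding="utf-8", newline="") as outputfile:
        writer = csv.writer(outputfile, lineterminator="\n")
        # Header row: the first item of each description entry is the column name
        writer.writerow([col[0] for col in cursor.description])
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    cursor.close()
    return total
```

Only one batch of rows is held in memory at a time, which matters for a large result set. Uploading the finished file to the SFTP location is a separate step that is not shown here.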
Related
Can someone point me in the right direction on how to open a .mdb file in python? I normally like including some code to start off a discussion, but I don't know where to start. I work with mysql a fair bit with python. I was wondering if there is a way to work with .mdb files in a similar way?
Below is some code I wrote for another SO question.
It requires the 3rd-party pyodbc module.
This very simple example will connect to a table and export the results to a file.
Feel free to expand upon your question with any more specific needs you might have.
import csv, pyodbc
# set up some constants
MDB = 'c:/path/to/my.mdb'
DRV = '{Microsoft Access Driver (*.mdb)}'
PWD = 'pw'
# connect to db
con = pyodbc.connect('DRIVER={};DBQ={};PWD={}'.format(DRV,MDB,PWD))
cur = con.cursor()
# run a query and get the results
SQL = 'SELECT * FROM mytable;' # your query goes here
rows = cur.execute(SQL).fetchall()
cur.close()
con.close()
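# you could change the mode from 'w' to 'a' (append) for any subsequent queries
# on Python 3, newline='' stops the csv module from writing blank lines between rows on Windows
with open('mytable.csv', 'w', newline='') as fou:
    csv_writer = csv.writer(fou) # default field-delimiter is ","
    csv_writer.writerows(rows)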
There's the meza library by Reuben Cummings which can read Microsoft Access databases through mdbtools.
Installation
# The mdbtools package for Python deals with MongoDB, not MS Access.
# So install the package through `apt` if you're on Debian/Ubuntu
$ sudo apt install mdbtools
$ pip install meza
Usage
>>> from meza import io
>>> records = io.read('database.mdb') # only file path, no file objects
>>> print(next(records))
Table1
Table2
…
This looks similar to a previous question:
What do I need to read Microsoft Access databases using Python?
http://code.activestate.com/recipes/528868-extraction-and-manipulation-class-for-microsoft-ac/
The answer there should be useful.
For a solution that works on any platform that can run Java, consider using Jython or JayDeBeApi along with the UCanAccess JDBC driver. For details, see the related question
Read an Access database in Python on non-Windows platform (Linux or Mac)
In addition to bernie's response, I would add that it is possible to recover the schema of the database. The code below lists the tables (b[2] contains the name of the table).
con = pyodbc.connect('DRIVER={};DBQ={};PWD={}'.format(DRV,MDB,PWD))
cur = con.cursor()
tables = list(cur.tables())
print('tables')
for b in tables:
    print(b)
The code below lists all the columns from all the tables:
colDesc = list(cur.columns())
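To make that column list easier to read, you could group it by table. This is only a sketch: columns_by_table is a hypothetical helper, and it assumes each row exposes table_name and column_name attributes, as the rows returned by pyodbc's cursor.columns() do.

```python
from collections import defaultdict


def columns_by_table(column_rows):
    """Group column metadata rows into {table_name: [column_name, ...]}."""
    grouped = defaultdict(list)
    for row in column_rows:
        # Assumption: rows have table_name / column_name attributes (pyodbc style)
        grouped[row.table_name].append(row.column_name)
    return dict(grouped)
```

With the cursor from the snippet above this would be called as columns_by_table(cur.columns()).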
This code will convert all the tables to CSV.
Happy Coding
import pandas_access as mdb  # assumption: mdb refers to the pandas_access package
for tbl in mdb.list_tables("file_name.MDB"):
    df = mdb.read_table("file_name.MDB", tbl)
    df.to_csv(tbl+'.csv')
I have a bz2 file (I have never worked with such files). When I manually unzip it, I see it's a sqlite db with several tables in it, but I don't know how to connect to it all from python without having to unzip it manually (I have many dbs so it has to be automated in the script). So far, I have tried the following but get an error.
import bz2
import sqlite3
zipfile = bz2.BZ2File("file.sqlite.bz2")
connection = sqlite3.connect(zipfile.read())
query = "SELECT * FROM sqlite_master WHERE type='table';"
cursor = connection.execute(query)
cursor.fetchall()
[]
But, when I do the same query for the unzipped file I do get all the tables.
If you can use apsw instead of the standard Python library's sqlite3 module, it's possible to open an in-memory representation of a database (like the bytes returned by BZ2File.read()):
#!/usr/bin/env python3
import bz2
import apsw
zipfile = bz2.BZ2File("file.sqlite.bz2")
db = apsw.Connection(":memory:")
db.deserialize("main", zipfile.read())
query = "SELECT * FROM sqlite_master WHERE type='table';"
cursor = db.cursor()
for row in cursor.execute(query):
    print(row)
Otherwise, if the standard sqlite3 module in your Python version doesn't expose SQLite's serialization functions (Connection.deserialize was only added in Python 3.11), you'll have to save the decompressed database to a temporary file and connect to that.
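A minimal sketch of the temporary-file route, using only the standard library. The helper name connect_bz2_sqlite is made up for this example, and cleaning up the temporary file is left to the caller.

```python
import bz2
import shutil
import sqlite3
import tempfile


def connect_bz2_sqlite(bz2_path):
    """Decompress bz2_path into a temporary file and open it with sqlite3.

    Returns (connection, temp_path). The caller should close the connection
    and delete temp_path when done.
    """
    with bz2.open(bz2_path, "rb") as src, \
            tempfile.NamedTemporaryFile(suffix=".sqlite", delete=False) as tmp:
        # Copy in chunks so the whole database is never held in memory
        shutil.copyfileobj(src, tmp)
        temp_path = tmp.name
    return sqlite3.connect(temp_path), temp_path
```

For many databases you would call this in a loop, run your queries, then close the connection and os.remove(temp_path) before moving on to the next file.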
I'm looking for an efficient way to import data from a CSV file into a PostgreSQL table in batches using Python, as I have quite large files and the server I'm importing to is far away. Everything I have tried was either slow or just didn't work. I'm using SQLAlchemy.
I wanted to use raw SQL, but it's hard to parameterize, and I need multiple loops to execute the query for multiple rows.
I was given the task of manipulating & migrating some data from CSV files into a remote Postgres Instance.
I decided to use the Python script below:
import csv
import uuid
import psycopg2
import psycopg2.extras
import time
#Instant Time at the start of the Script
start = time.time()
psycopg2.extras.register_uuid()
#List of CSV Files that I want to manipulate & migrate.
file_list=["Address.csv"]
conn = psycopg2.connect("host=localhost dbname=address user=postgres password=docker")
cur = conn.cursor()
i = 1
for f in file_list:
    f = open(f)
    csv_f = csv.reader(f)
    next(csv_f)
    for row in csv_f:
        # Some simple manipulations on each row
        #Inserting a uuid4 into the first column
        row.pop(0)
        row.insert(0,uuid.uuid4())
        row.pop(10)
        row.insert(10,False)
        row.pop(13)
        #Tracking the number of rows inserted
        print(i)
        i = i + 1
        #INSERT QUERY
        postgres_insert_query = """ INSERT INTO "public"."address"("address_id","address_line_1","locality_area_street","address_name","app_version","channel_type","city","country","created_at","first_name","is_default","landmark","last_name","mobile","pincode","territory","updated_at","user_id") VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s)"""
        record_to_insert = row
        cur.execute(postgres_insert_query,record_to_insert)
    f.close()
conn.commit()
conn.close()
print(time.time()-start)
The script ran quickly when I tested it locally. But against a remote database server it was much slower, because every cur.execute call is a separate network round trip.
As a workaround, I migrated the manipulated data into my local postgres instance.
I then generated a .sql file of the migrated data & manually imported the .sql file on the remote server.
Alternatively, you can use Python's multithreading features to launch multiple concurrent connections to the remote server, dedicate a separate batch of rows to each connection, and flush the data.
This should make your migration considerably faster.
I have personally not tried the multithreading approach as it wasn't required in my case. But it seems darn efficient.
Hope this helped ! :)
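Since the question asks about batches: a rough sketch of cutting the number of round trips by sending many rows per statement with psycopg2.extras.execute_values. The helper names read_batches and insert_in_batches are my own, and insert_sql is assumed to contain a single "VALUES %s" placeholder. I have not timed this against a remote server.

```python
import csv
from itertools import islice


def read_batches(csv_path, batch_size=1000):
    """Yield lists of up to batch_size rows from a CSV file, skipping the header."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                return
            yield batch


def insert_in_batches(conn, csv_path, insert_sql, batch_size=1000):
    """Send each batch as one multi-row INSERT; return the number of rows sent."""
    # Imported here so read_batches stays usable without psycopg2 installed
    from psycopg2.extras import execute_values

    total = 0
    with conn.cursor() as cur:
        for batch in read_batches(csv_path, batch_size):
            # insert_sql is assumed to look like: INSERT INTO t (a, b) VALUES %s
            execute_values(cur, insert_sql, batch, page_size=batch_size)
            total += len(batch)
    conn.commit()
    return total
```

Any per-row manipulation, like the uuid and pop/insert steps in the script above, would go inside the loop before the call to execute_values.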
Resources:
CSV Manipulation using Python for Beginners.
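Use the copy_from command, which copies all the rows into the table. Its default separator is a tab, so pass sep=',' for a comma-separated file.
path=open('file.csv','r')
next(path) # skip the header row
cur.copy_from(path,'table_name',sep=',',columns=('id','name','email'))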
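Note that copy_from does not understand quoted CSV fields. If your file has quoted values or embedded commas, a sketch along these lines with copy_expert and PostgreSQL's CSV format may work better. The function name copy_csv is my own, and the table and column names are interpolated as plain strings, so they are assumed to come from trusted code.

```python
def copy_csv(cur, csv_path, table, columns):
    """Load a CSV file (with a header row) into table via COPY ... FROM STDIN."""
    # table and columns are interpolated directly: trusted input only
    column_list = ", ".join(columns)
    copy_sql = (
        f"COPY {table} ({column_list}) FROM STDIN "
        "WITH (FORMAT csv, HEADER true)"
    )
    with open(csv_path, "r", encoding="utf-8", newline="") as f:
        # The server parses the CSV, including quoting, and skips the header line
        cur.copy_expert(copy_sql, f)
    return copy_sql
```

With a psycopg2 cursor this would be called as copy_csv(cur, 'file.csv', 'table_name', ('id', 'name', 'email')), followed by conn.commit().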
I'd appreciate your help with my issue.
I want to append the result of my query to an existing CSV file. I implemented this following this example, but for some unknown reason the output file stays empty.
This is a minimal snippet of my code:
import psycopg2
import numpy as np
# Connect to an existing database
conn = psycopg2.connect(dbname="...", user="...", password="...", host="...")
# Open a cursor to perform database operations
cur = conn.cursor()
f = open('file_name.csv', 'ab') # "ab" for appending
cur.execute("""select * from table limit 10""") # I have another query here but it isn't relevant.
cur_out = np.asarray(cur.fetchall())
Up to here it works perfectly. When I print(cur_out), I get the desired output. But in the next step:
np.savetxt(f, cur_out, delimiter=",", fmt='%s')
The file stays empty and I can't find the reason for that.
Can you help me, please?
Thanks to anyone who helps.
I don't know what to tell you, but your code works perfectly for me.
My suggestions:
If you have existing files with this name, try deleting them and running your code again.
Try changing the batch size.
Try changing the file extension to txt or dat.
Best wishes.
np.savetxt('file_name.csv', cur_out, delimiter=",", fmt='%s')
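One possible cause, though I can't confirm it from the snippet: the file object f is never closed or flushed, so the rows may still be sitting in the write buffer when you look at the file. A small sketch that appends with a with block instead. It uses the csv module rather than numpy, and append_rows is a made-up helper name.

```python
import csv


def append_rows(csv_path, rows):
    """Append rows to csv_path, creating the file if it does not exist."""
    # Leaving the with block closes the file, which flushes buffered rows to disk
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

In the question's code this would be append_rows('file_name.csv', cur.fetchall()). If you prefer to keep np.savetxt with an open file object, calling f.close() afterwards should have the same effect.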