Python: Load Oracle table directly from Pandas (write to Oracle) - python

Is there any way to load data directly into an Oracle table from pandas?
Currently I'm writing the data set to a csv file and then loading the table. I would like to bypass the "write to csv" step.
I'm using cx_Oracle to connect to the Oracle database. The url is passed as a parameter when calling the Python script, and the result is stored as a pandas dataframe in the variable dataset. The layout of dataset and the table definition are the same.
import cx_Oracle as cx
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize in older pandas versions

response = requests.get(url)
data = response.json()
dataset = json_normalize(data['results'])
Please let me know if you require any further information.

Did you try the to_sql function from the pandas module?
from sqlalchemy import create_engine
engine = create_engine('oracle://[user]:[pass]@[host]:[port]/[schema]', echo=False)
dataset.to_sql(name='target_table', con=engine, if_exists='append', index=False)
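Putting the question and the answer together, a minimal end-to-end sketch might look like the following; the connection string, the target_table name and the 'results' key are placeholders/assumptions carried over from the snippets above:
import requests
from pandas import json_normalize  # pandas.io.json.json_normalize in older pandas versions
from sqlalchemy import create_engine

# Fetch the JSON payload and flatten it into a dataframe
response = requests.get(url)  # url is assumed to be passed in as a script parameter
dataset = json_normalize(response.json()['results'])

# Write the dataframe straight to the Oracle table, skipping the intermediate csv
engine = create_engine('oracle+cx_oracle://user:password@host:1521/?service_name=my_service', echo=False)  # placeholder credentials
dataset.to_sql(name='target_table', con=engine, if_exists='append', index=False)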

Related

Python replacement for SQL bcp.exe

The goal is to load a csv file into an Azure SQL database from Python directly, that is, not by calling bcp.exe. The csv files will have the same number of fields as do the destination tables. It'd be nice to not have to create the format file bcp.exe requires (xml for +-400 fields for each of 16 separate tables).
Following the Pythonic approach, try to insert the data and let SQL Server throw an exception if there is a type mismatch or other error.
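One way to follow that advice is to wrap the insert in a try/except and let the database report the problem. A minimal sketch, assuming a dataframe df and a SQLAlchemy engine like the one created in the answer below (both names are placeholders):
from sqlalchemy.exc import SQLAlchemyError

try:
    # Let SQL Server validate the data; a type mismatch or constraint violation raises here
    df.to_sql('target_table', engine, if_exists='append', index=False)
except SQLAlchemyError as exc:
    # exc.orig (when present) holds the driver-level error with the server's message
    print("Insert failed:", exc)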
If you don't want to use the bcp command to import the csv file, you can use the Python pandas library.
Here's an example that imports a headerless 'test9.csv' file from my computer into an Azure SQL database.
CSV file (no header; columns are id, name and age):
Python code example:
import pandas as pd
import sqlalchemy
import urllib
import pyodbc

# set up connection to database (with username/pw if needed)
params = urllib.parse.quote_plus("Driver={ODBC Driver 17 for SQL Server};Server=tcp:***.database.windows.net,1433;Database=Mydatabase;Uid=***@***;Pwd=***;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;")
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)

# read csv data into a dataframe with pandas
# datatypes will be inferred; pandas is smart, but you can specify datatypes with the `dtype` parameter
df = pd.read_csv(r'C:\Users\leony\Desktop\test9.csv', header=None, names=['id', 'name', 'age'])

# write to the sql table... pandas will use the dataframe's column names and dtypes
df.to_sql('test9', engine, if_exists='append', index=False)
# add the 'dtype' parameter to specify datatypes if needed, e.g. dtype={'column1': VARCHAR(255), 'column2': DateTime}
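If the inferred types are not what the target table expects, the dtype parameter mentioned in the comment above lets you map columns to explicit SQLAlchemy types. A minimal sketch; the column names and types are illustrative only:
from sqlalchemy.types import Integer, VARCHAR

# Explicitly map dataframe columns to SQL types instead of relying on inference
df.to_sql('test9', engine, if_exists='append', index=False,
          dtype={'id': Integer(), 'name': VARCHAR(255), 'age': Integer()})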
Notice:
Get the connection string from the Azure Portal.
The UID format is [username]@[servername].
Run the script and it works.
Please reference these documents:
HOW TO IMPORT DATA IN PYTHON
pandas.DataFrame.to_sql
Hope this helps.

Import csv file data to plsql table using python

I have a csv file which contains 60000 rows. I need to insert this data into a postgres database table. Is there any way to do this without looping, to reduce the time it takes to insert the data from the file into the database? Please help me.
Python Version : 2.6
Database : postgres
table: keys_data
File Structure
1,ED2,'FDFDFDFDF','NULL'
2,ED2,'SDFSDFDF','NULL'
Postgres can read CSV directly into a table with the COPY command. This either requires you to be able to place files directly on the Postgres server, or data can be piped over a connection with COPY FROM STDIN.
The \copy command in Postgres' psql command-line client will read a file locally and insert using COPY FROM STDIN so that's probably the easiest (and still fastest) way to do this.
Note: this doesn't require any use of Python; it's native Postgres functionality, and not all other RDBMSs offer the same.
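If you do want to drive it from Python, psycopg2 can stream a local file over the connection with COPY FROM STDIN. A minimal sketch, assuming the keys_data table already exists and the connection details are placeholders:
import psycopg2

# Connection details are placeholders
conn = psycopg2.connect(host='localhost', dbname='mydb', user='user', password='password')
cur = conn.cursor()

# Stream the local file into the server-side COPY command (what psql's \copy does)
with open('keys_data.csv', 'r') as f:
    cur.copy_expert("COPY keys_data FROM STDIN WITH (FORMAT csv)", f)

conn.commit()
cur.close()
conn.close()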
I've performed a similar task; the only difference is that my solution is Python 3.x based. I am sure you can find the equivalent code for this solution. The code is pretty self-explanatory.
import io
from sqlalchemy import create_engine

def insert_in_postgre(table_name, df):
    # create engine object
    engine = create_engine('postgresql+psycopg2://user:password@hostname/database_name')

    # create (or replace) an empty table with the dataframe's columns
    df.head(0).to_sql(table_name, engine, if_exists='replace', index=False)

    # stream the dataframe through an in-memory buffer into the table with COPY
    conn = engine.raw_connection()
    cur = conn.cursor()
    output = io.StringIO()
    df.to_csv(output, sep='\t', header=False, index=False)
    output.seek(0)
    cur.copy_from(output, table_name, null="")
    conn.commit()
    cur.close()
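A usage sketch: read the csv with pandas and hand the dataframe to the function. The column names here are made up for illustration; only the keys_data table name comes from the question:
import pandas as pd

# Read the 60000-row, headerless csv and load it with a single COPY
df = pd.read_csv('keys_data.csv', header=None, names=['id', 'code', 'value', 'flag'])
insert_in_postgre('keys_data', df)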

How can I insert data from a dataframe (in Python) into a Greenplum table?

Problem Statement:
I have multiple csv files. I am cleaning them using Python and inserting them into SQL Server using bcp. Now I want to insert the data into Greenplum instead of SQL Server. Please suggest a way to bulk insert directly from a Python dataframe into a Greenplum table.
Solution (what I can think of):
The way I can think of is CSV -> Dataframe -> Cleaning -> Dataframe -> CSV -> then use gpload for bulk load, and integrate it in a shell script for automation.
Does anyone have a good solution for it?
Issue with loading data directly from a dataframe to a gp table:
gpload asks for a file path. Can I pass a variable or a dataframe to it? Is there any way to bulk load into Greenplum? I don't want to create a csv or txt file from the dataframe and then load it into Greenplum.
I would use psycopg2 and the io libraries to do this. io is built-in and you can install psycopg2 using pip (or conda).
Basically, you write your dataframe to a string buffer ("memory file") in the csv format. Then you use psycopg2's copy_from function to bulk load/copy it to your table.
This should get you started:
import io
import pandas
import psycopg2
# Write your dataframe to memory as csv
csv_io = io.StringIO()
dataframe.to_csv(csv_io, sep='\t', header=False, index=False)
csv_io.seek(0)
# Connect to the GreenPlum database.
greenplum = psycopg2.connect(host='host', database='database', user='user', password='password')
gp_cursor = greenplum.cursor()
# Copy the data from the buffer to the table.
gp_cursor.copy_from(csv_io, 'db.table')
greenplum.commit()
# Close the GreenPlum cursor and connection.
gp_cursor.close()
greenplum.close()
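If copy_from complains about the schema-qualified 'db.table' name (newer psycopg2 releases quote the table argument), copy_expert lets you write the COPY statement yourself. A minimal sketch that would replace the copy_from call above, reusing the same buffer, cursor and connection:
# Alternative to copy_from: spell out the COPY statement explicitly
# (plain tab-delimited text, matching the to_csv(sep='\t') output above)
csv_io.seek(0)
gp_cursor.copy_expert("COPY db.table FROM STDIN", csv_io)
greenplum.commit()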

How to write python sql output into CSV using a dataframe

IMPORT MODULES
import pyodbc
import pandas as pd
import csv
CREATE CONNECTION TO MICROSOFT SQL SERVER
msconn = pyodbc.connect(driver='{SQL Server}',
                        server='SERVER',
                        database='DATABASE',
                        trusted_connection='yes')
cursor = msconn.cursor()
CREATE VARIABLES THAT HOLD SQL STATEMENTS
SCRIPT = "SELECT * FROM TABLE"
PRINT DATA
cursor.execute(SCRIPT)
# no commit is needed for a SELECT
for row in cursor:
    print(row)
WRITE ALL ROWS WITH COLUMN NAME TO CSV --- NEED HELP HERE
Pandas
Since pandas supports reading directly from an RDBMS with read_sql, you don't need to write this manually.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mssql+pyodbc://user:pass@mydsn')
df = pd.read_sql(sql='SELECT * FROM ...', con=engine)
The right tool: odo
From odo docs
Loading CSV files into databases is a solved problem. It’s a problem
that has been solved well. Instead of rolling our own loader every
time we need to do this and wasting computational resources, we should
use the native loaders in the database of our choosing.
And it works the other way round also.
from odo import odo
odo('mssql+pyodbc://user:pass@mydsn::tablename', 'myfile.csv')
@e4c5's answer is great, as it should be faster than a for loop + cursor. I would extend it by saving the result set to CSV:
...
pd.read_sql(sql='SELECT * FROM TABLE', con=msconn) \
.to_csv('/path/to/file.csv', index=False)
If you want to read all rows (without specifying a WHERE clause), you can use read_sql_table; note that it needs a SQLAlchemy engine (like the one created above) rather than the raw pyodbc connection:
pd.read_sql_table('TABLE', con=engine).to_csv('/path/to/file.csv', index=False)
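For result sets that don't fit comfortably in memory, read_sql also accepts a chunksize argument, so you can stream the rows to CSV in pieces. A minimal sketch; the query, engine and path are placeholders:
# Stream the query result in chunks of 10,000 rows and append each chunk to the CSV
first_chunk = True
for chunk in pd.read_sql('SELECT * FROM TABLE', con=engine, chunksize=10000):
    chunk.to_csv('/path/to/file.csv', mode='w' if first_chunk else 'a',
                 header=first_chunk, index=False)
    first_chunk = False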

Pandas writing dataframe to other postgresql schema

I am trying to write a pandas DataFrame to a PostgreSQL database,
using a schema-qualified table.
I use the following code:
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://some:user@host/db')
c = engine.connect()
conn = c.connection
df = psql.read_sql("SELECT * FROM xxx", con=conn)
df.to_sql('a_schema.test', engine)
conn.close()
What happens is that pandas writes to the "public" schema, into a table named 'a_schema.test', instead of writing to the "test" table in the "a_schema" schema.
How can I instruct pandas to use a schema different than public?
Thanks
Update: starting from pandas 0.15, writing to a different schema is supported. Then you will be able to use the schema keyword argument:
df.to_sql('test', engine, schema='a_schema')
Writing to a different schema is not yet supported with the read_sql and to_sql functions (but an enhancement request has already been filed: https://github.com/pydata/pandas/issues/7441).
However, you can get around this for now by using the object interface with PandasSQLAlchemy and providing a custom MetaData object:
meta = sqlalchemy.MetaData(engine, schema='a_schema')
meta.reflect()
pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'test')
Beware! This interface (PandasSQLAlchemy) is not yet really public and will still undergo changes in the next version of pandas, but this is how you can do it for pandas 0.14.
Update: PandasSQLAlchemy is renamed to SQLDatabase in pandas 0.15.
Solved, thanks to joris' answer.
The code was also improved thanks to joris' comment, by passing around the sqlalchemy engine instead of connection objects.
import pandas as pd
from sqlalchemy import create_engine, MetaData

engine = create_engine(r'postgresql://some:user@host/db')

meta = MetaData(engine, schema='a_schema')
meta.reflect()

pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)

df = pd.read_sql("SELECT * FROM xxx", con=engine)
pdsql.to_sql(df, 'test')
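With pandas 0.15 or later, the workaround above collapses into the schema keyword mentioned in the answer. A minimal sketch of the equivalent code under that assumption:
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://some:user@host/db')

df = pd.read_sql("SELECT * FROM xxx", con=engine)
# schema= targets the "a_schema" schema instead of the default "public"
df.to_sql('test', engine, schema='a_schema', if_exists='append', index=False)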
