How to write Python SQL output into CSV using a DataFrame

# IMPORT MODULES
import pyodbc
import pandas as pd
import csv
# CREATE CONNECTION TO MICROSOFT SQL SERVER
msconn = pyodbc.connect(driver='{SQL Server}',
                        server='SERVER',
                        database='DATABASE',
                        trusted_connection='yes')
cursor = msconn.cursor()
# CREATE VARIABLES THAT HOLD SQL STATEMENTS
SCRIPT = "SELECT * FROM TABLE"
# PRINT DATA (a SELECT does not need a commit)
cursor.execute(SCRIPT)
for row in cursor:
    print(row)
# WRITE ALL ROWS WITH COLUMN NAME TO CSV --- NEED HELP HERE

Pandas
Since pandas supports reading directly from an RDBMS through read_sql, you don't need to write this manually.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine('mssql+pyodbc://user:pass@mydsn')
df = pd.read_sql(sql='SELECT * FROM ...', con=engine)
The right tool: odo
From odo docs
Loading CSV files into databases is a solved problem. It’s a problem
that has been solved well. Instead of rolling our own loader every
time we need to do this and wasting computational resources, we should
use the native loaders in the database of our choosing.
It works the other way round as well:
from odo import odo
odo('mssql+pyodbc://user:pass@mydsn::tablename', 'myfile.csv')

@e4c5's answer is great, as it should be faster than a for loop over the cursor. I would extend it by saving the result set to CSV:
...
pd.read_sql(sql='SELECT * FROM TABLE', con=msconn) \
.to_csv('/path/to/file.csv', index=False)
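If you want to read the whole table (no WHERE clause), you can use read_sql_table instead. Note that it needs a SQLAlchemy connectable, such as the engine from the answer above, rather than the pyodbc connection:
pd.read_sql_table('TABLE', con=engine).to_csv('/path/to/file.csv', index=False)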
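If pandas is not an option, the same file can be produced with the csv module that the question already imports. The following is a minimal sketch, assuming a DB-API cursor: the column names are taken from cursor.description, and sqlite3 with an in-memory table stands in for the pyodbc connection so that the example runs anywhere. The helper name cursor_to_csv, the demo table, and its rows are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def cursor_to_csv(cursor, path):
    """Write the column names and all remaining rows of an executed cursor to a CSV file."""
    # cursor.description has one entry per column; item 0 of each entry is the column name.
    header = [col[0] for col in cursor.description]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)  # iterating the cursor yields the remaining rows


# Stand-in database (assumption): replace with the pyodbc connection from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

cursor = conn.cursor()
cursor.execute("SELECT * FROM demo ORDER BY id")

csv_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
cursor_to_csv(cursor, csv_path)
conn.close()
```

With pyodbc, the same helper should work on the cursor from the question after cursor.execute(SCRIPT), since pyodbc cursors also expose description and are iterable.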

Related

read_sql_query works, read_sql_table doesn't

I'm trying to import a table from a SQLite database into a pandas DataFrame:
import pandas as pd
import sqlite3
cnxn = sqlite3.connect("my_db.db")
c = cnxn.cursor()
This command works: pd.read_sql_query('select * from table1', con=cnxn). This one doesn't: df = pd.read_sql_table('table1', con=cnxn).
The error:
ValueError: Table table1 not found
What could be the issue?
pd.read_sql_table() cannot be used with a sqlite3 connection. According to the pandas documentation, it only accepts a SQLAlchemy connectable, whereas a sqlite3 connection is a plain DB-API connection.
From the pd.read_sql_table() documentation:
Given a table name and a SQLAlchemy connectable, returns a DataFrame.
This function does not support DBAPI connections.
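To use read_sql_table with SQLite anyway, the database can be opened through a SQLAlchemy engine instead of sqlite3. A minimal sketch, assuming SQLAlchemy is installed; the temporary file path and the demo rows are made up so that the example is self-contained, and table1 mirrors the name in the question.

```python
import os
import tempfile

import pandas as pd
from sqlalchemy import create_engine

# Temporary SQLite file (assumption); for the question's database the URL
# would be "sqlite:///my_db.db".
db_path = os.path.join(tempfile.mkdtemp(), "my_db.db")
engine = create_engine("sqlite:///" + db_path)

# Hypothetical demo data so that table1 exists.
pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).to_sql("table1", engine, index=False)

# read_sql_table accepts the engine because it is a SQLAlchemy connectable.
df = pd.read_sql_table("table1", con=engine)
engine.dispose()
```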

Extract from DataBase using Python

I wrote a small script in Python that could help me to extract data from a database. Here is my script :
#!/usr/bin/python3
import pandas as pd
from sqlalchemy import create_engine
# connect to the server
mytab = create_engine('mssql+pyodbc://test:test1@mypass')
# SQL query that retrieves my table
df = pd.read_sql('select * from FO_INV', mytab)
# write the query result to a CSV file
df.to_csv('inventory.csv', index=False, sep=',', encoding='utf-8')
Everything works fine if I select only the top 100 rows, for example. But for the whole table, it takes forever!
Do you have any ideas or recommendations, please?
Thank you in advance :)
I would suggest using pyodbc instead of SQLAlchemy.
Something like this:
import pyodbc
mytab = pyodbc.connect(r'DRIVER={SQL SERVER};SERVER=.\;DATABASE=myDB;UID=user;PWD=pwd')
Pass mytab to pd.read_sql as before and check your timings. This may be faster.
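Whichever connection is used, a large result set can also be streamed rather than loaded in one piece: read_sql accepts a chunksize argument and then returns an iterator of DataFrames, each of which can be appended to the CSV file. A minimal sketch, with sqlite3 standing in for SQL Server; the column names, the generated rows, and the chunk size of 1000 are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# Stand-in database (assumption): replace with the real connection or engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FO_INV (item_id INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO FO_INV VALUES (?, ?)", [(i, i * 2) for i in range(2500)])

out_path = os.path.join(tempfile.mkdtemp(), "inventory.csv")

first_chunk = True
# With chunksize set, read_sql yields DataFrames of at most 1000 rows each.
for chunk in pd.read_sql("SELECT * FROM FO_INV", conn, chunksize=1000):
    chunk.to_csv(
        out_path,
        mode="w" if first_chunk else "a",  # create the file once, then append
        header=first_chunk,                # write the column names only once
        index=False,
        encoding="utf-8",
    )
    first_chunk = False
conn.close()
```

This keeps memory use bounded by the chunk size; whether it also reduces the total run time depends on the driver and the network.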

Python - read parquet file without pandas

Currently I'm using the code below on Python 3.5, Windows to read in a parquet file.
import pandas as pd
parquetfilename = 'File1.parquet'
parquetFile = pd.read_parquet(parquetfilename, columns=['column1', 'column2'])
However, I'd like to do so without using pandas. How to best do this? I'm using both Python 2.7 and 3.6 on Windows.
You can use duckdb for this. It's an embedded RDBMS similar to SQLite but with OLAP in mind. There's a nice Python API and a SQL function to import Parquet files:
import duckdb
conn = duckdb.connect(":memory:") # or a file name to persist the DB
# Keep in mind this doesn't support partitioned datasets,
# so you can only read one partition at a time
conn.execute("CREATE TABLE mydata AS SELECT * FROM parquet_scan('/path/to/mydata.parquet')")
# Export a query as CSV
conn.execute("COPY (SELECT * FROM mydata WHERE col = 'val') TO 'col_val.csv' WITH (HEADER 1, DELIMITER ',')")

How to commit df to SQL database using pyodbc?

I have a connection to a database (using pyodbc) and I need to commit a df to a new table. I've done this with SQL, but don't know how to do it with a df. Any ideas on how to alter the below code to make it work for a df?
code for SQL:
import pyodbc
import pandas as pd
conn= pyodbc.connect(r'DRIVER={Teradata};DBCNAME=foo; UID=name; PWD=password;QUIETMODE=YES;Trusted_Connection=yes')
cursor = conn.cursor()
cursor.execute(
"""
CREATE TABLE SCHEMA.NEW_TABLE AS
(
SELECT ... FROM ....
)
"""
)
conn.commit()
I tried this code. It raises no errors, but the table is not created in the database:
import pyodbc
import pandas as pd
conn= pyodbc.connect(r'DRIVER={Teradata};DBCNAME=foo; UID=name; PWD=password;QUIETMODE=YES;Trusted_Connection=yes')
sheet1.to_sql(con=conn, name='new_table', schema='Schema', if_exists='replace', index=False)
The documentation for to_sql() clearly states:
con : SQLAlchemy engine or DBAPI2 connection (legacy mode)
Using SQLAlchemy makes it possible to use any DB supported by that
library. If a DBAPI2 object, only sqlite3 is supported.
Thus, you need to pass a SQLAlchemy engine to the to_sql() function to write from Pandas directly to your Teradata database.
Another way would be to dump the data to a different data structure (e.g. with to_dict()) and then use pyodbc to run DML statements against the database, preferably with bound parameters to speed up processing.
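The second approach could look roughly like the sketch below. It is an illustration under assumptions, not Teradata-specific code: sqlite3 stands in for the pyodbc connection (both use ? placeholders), and the table, columns, and data are hypothetical.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})  # hypothetical data

conn = sqlite3.connect(":memory:")  # stand-in for pyodbc.connect(...)
cursor = conn.cursor()
cursor.execute("CREATE TABLE new_table (id INTEGER, name TEXT)")

# One plain tuple per row, in column order.
rows = list(df.itertuples(index=False, name=None))

# The values are bound by the driver instead of being formatted into the SQL string.
# With pyodbc, setting cursor.fast_executemany = True before this call may speed it up,
# depending on the ODBC driver.
cursor.executemany("INSERT INTO new_table (id, name) VALUES (?, ?)", rows)
conn.commit()

inserted = cursor.execute("SELECT COUNT(*) FROM new_table").fetchone()[0]
conn.close()
```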

Pandas writing dataframe to other postgresql schema

I am trying to write a pandas DataFrame to a PostgreSQL database,
using a schema-qualified table.
I use the following code:
import pandas.io.sql as psql
from sqlalchemy import create_engine
engine = create_engine(r'postgresql://some:user@host/db')
c = engine.connect()
conn = c.connection
df = psql.read_sql("SELECT * FROM xxx", con=conn)
df.to_sql('a_schema.test', engine)
conn.close()
What happens is that pandas writes in schema "public", in a table named 'a_schema.test',
instead of writing in the "test" table in the "a_schema" schema.
How can I instruct pandas to use a schema different than public?
Thanks
Update: starting from pandas 0.15, writing to different schemas is supported through the schema keyword argument:
df.to_sql('test', engine, schema='a_schema')
Original answer, for pandas 0.14: writing to a different schema is not yet supported by the read_sql and to_sql functions (an enhancement request has been filed: https://github.com/pydata/pandas/issues/7441).
However, you can work around this for now by using the object interface with PandasSQLAlchemy and providing a custom MetaData object:
meta = sqlalchemy.MetaData(engine, schema='a_schema')
meta.reflect()
pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'test')
Beware! This interface (PandasSQLAlchemy) is not yet really public and will still undergo changes in the next version of pandas, but this is how you can do it for pandas 0.14.
Update: PandasSQLAlchemy is renamed to SQLDatabase in pandas 0.15.
Solved, thanks to joris's answer.
The code was also improved thanks to joris's comment, by passing around the SQLAlchemy engine instead of connection objects.
import pandas as pd
from sqlalchemy import create_engine, MetaData
engine = create_engine(r'postgresql://some:user@host/db')
meta = MetaData(engine, schema='a_schema')
meta.reflect(engine, schema='a_schema')
pdsql = pd.io.sql.PandasSQLAlchemy(engine, meta=meta)
df = pd.read_sql("SELECT * FROM xxx", con=engine)
pdsql.to_sql(df, 'test')
