django pandas read_sql query with mapped parameters - python

I am trying to connect to a SQL Server database from within the Django framework, to read a SQL query result into a pandas dataframe:
from django.db import connections
query = """SELECT * FROM [dbo].[table] WHERE project=%(Name)s"""
data = pd.read_sql(query, connections[database], params={'Name': input} )
The error message I get is 'format requires a mapping'.
If I do it like below, it works, but I really want to be able to map each parameter by name:
from django.db import connections
query = """SELECT * FROM [dbo].[table] WHERE project=%s"""
data = pd.read_sql(query, connections[database], params=[input])
I was using ODBC Driver 17 for SQL Server.

You can format the query at the string level and then run pd.read_sql.
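For example, a minimal sketch of that approach, assuming input holds the project name (note that plain string formatting is vulnerable to SQL injection, so only use it with trusted values):
import pandas as pd
from django.db import connections

database = 'default'         # hypothetical alias; use your own connection alias
project_name = 'my_project'  # hypothetical value standing in for input

# Build the finished SQL at the string level, then hand it to pandas
query = "SELECT * FROM [dbo].[table] WHERE project = '{}'".format(project_name)
data = pd.read_sql(query, connections[database])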

Related

Connect to SQL Server and run query as "passthrough" from Python

I currently have code that executes queries on data stored on a SQL Server database, such as the following:
import pyodbc
conn = pyodbc.connect(
    r'DRIVER={SQL Server};'
    r'SERVER=SQL2SRVR;'
    r'DATABASE=DBO732;'
    r'Trusted_Connection=yes;'
)
sqlstr = '''
SELECT Company, Street_Address, City, State
FROM F556
WHERE [assume complicated criteria statement here]
'''
crsr = conn.cursor()
for row in crsr.execute(sqlstr):
    print(row.Company, row.Street_Address, row.City, row.State)
I can't find documentation online on whether pyodbc runs (or by default runs) my queries on the SQL Server itself (as passthrough queries), or whether (if pyodbc can't do that) there is another way (maybe sqlalchemy or similar?) of doing that. Any insight?
Or is there a way to execute passthrough queries directly from Pandas?
If you are working with pandas and SQL Server then you should already have created a SQLAlchemy Engine object (usually named engine). To execute a raw DML statement you can use the construct:
from sqlalchemy import text

with engine.begin() as conn:
    # text() marks the string as executable SQL (required in SQLAlchemy 1.4+/2.0)
    conn.execute(text("UPDATE table_name SET column_name ..."))
    print("table updated")
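If you don't have the engine yet, a minimal sketch of creating one for the connection shown in the question (SQL2SRVR and DBO732 are the question's own placeholders) and running a passthrough SELECT into pandas:
import urllib.parse
import pandas as pd
from sqlalchemy import create_engine

conn_str = (
    r"DRIVER={SQL Server};"
    r"SERVER=SQL2SRVR;"
    r"DATABASE=DBO732;"
    r"Trusted_Connection=yes;"
)
params = urllib.parse.quote_plus(conn_str)
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)

# The SQL text is sent to the server as-is and executed there
df = pd.read_sql("SELECT Company, Street_Address, City, State FROM F556", engine)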

Utility to find join columns

I have been given several tables in SQL Server and am trying to figure out the best way to join them.
What I've done is:
1) open a connection in R to the database
2) pull all the column names from the INFORMATION_SCHEMA.COLUMNS table
3) build loops in R to try every combination of columns and see what the row count is of the inner join of the 2 columns
I'm wondering if there's a better way to do this or if there's a package or utility that helps with this type of problem.
You could do your joins in Python using pandas. Pandas has a powerful IO engine, so you could import from SQL Server into a pandas dataframe, perform your joins in Python, and write back to SQL Server.
Below is a script I use to perform an import from SQL Server and an export to a MySQL table. I use the Python package sqlalchemy for my ORM connections. You could follow this example and read up on joins in pandas (a small join-discovery sketch follows the script).
import pyodbc
import pandas as pd
from sqlalchemy import create_engine

# MySQL info
username = 'user'
password = 'pw'
sqlDB = 'mydb'

# Create MSSQL PSS Connector
server = 'server'
database = 'mydb'
connMSSQL = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};' +
    f'SERVER={server};PORT=1433;DATABASE={database};Trusted_Connection=yes;')

# Read table into pandas dataframe
tsql = '''
    SELECT [Index],
           Tag
    FROM [dbo].[Tags]
    '''
df = pd.read_sql(tsql, connMSSQL, index_col='Index')

# Write df to MySQL db ('@' separates the credentials from the host)
engine = create_engine(
    f'mysql+mysqldb://{username}:{password}@localhost/{sqlDB}', pool_recycle=3600)
with engine.connect() as connMySQL:
    df.to_sql('pss_alarms', connMySQL, if_exists='replace')
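For the join-discovery idea itself, here is a rough sketch of the brute-force search the question describes, done in pandas. df_a and df_b stand for two tables already read with pd.read_sql; all names are illustrative:
results = []
for col_a in df_a.columns:
    for col_b in df_b.columns:
        try:
            # Inner-join on the candidate pair and record the match count
            joined = df_a.merge(df_b, left_on=col_a, right_on=col_b, how='inner')
            results.append((col_a, col_b, len(joined)))
        except (TypeError, ValueError):
            # Incompatible column types can't serve as join keys
            continue

# Pairs with the highest counts are the likeliest join columns
results.sort(key=lambda r: r[2], reverse=True)
print(results[:10])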

Python to SQL Server Insert

I'm trying to follow the method for inserting a pandas data frame into SQL Server that is mentioned here, as it appears to be the fastest way to import lots of rows.
However I am struggling with figuring out the connection parameter.
I am not using a DSN; I have a server name, a database name, and am using a trusted connection (i.e. Windows login).
import sqlalchemy
import urllib
server = 'MYServer'
db = 'MyDB'
cxn_str = "DRIVER={SQL Server Native Client 11.0};SERVER=" + server +",1433;DATABASE="+db+";Trusted_Connection='Yes'"
#cxn_str = "Trusted_Connection='Yes',Driver='{ODBC Driver 13 for SQL Server}',Server="+server+",Database="+db
params = urllib.parse.quote_plus(cxn_str)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
conn = engine.connect().connection
cursor = conn.cursor()
I'm just not sure what the correct way to specify my connection string is. Any suggestions?
I have been working with pandas and SQL Server for a while, and the fastest way I found to insert a lot of data into a table is this:
You can create a temporary CSV using:
df.to_csv('new_file_name.csv', sep=',', encoding='utf-8')
Then use pyodbc and the BULK INSERT Transact-SQL statement:
import pyodbc
conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()
cur.execute("""BULK INSERT table_name
FROM 'C:\\Users\\folders path\\new_file_name.csv'
WITH
(
CODEPAGE = 'ACP',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)""")
conn.commit()
cur.close()
conn.close()
Then you can delete the file:
import os
os.remove('new_file_name.csv')
It took only a second to load a lot of data at once into SQL Server. I hope this gives you an idea.
Note: don't forget to have a field for the index. Forgetting it was my mistake when I started using this lol.
Connection string parameter values should not be enclosed in quotes so you should use Trusted_Connection=Yes instead of Trusted_Connection='Yes'.
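Putting that together, a minimal sketch of the corrected connection string, with fast_executemany=True added as an assumption on my part (it is the pyodbc bulk-insert speed-up that SQLAlchemy 1.3+ exposes, and likely the "fastest way" the question refers to):
import urllib.parse
import sqlalchemy

server = 'MYServer'
db = 'MyDB'

# No quotes around Yes
cxn_str = (
    "DRIVER={SQL Server Native Client 11.0};"
    f"SERVER={server},1433;DATABASE={db};Trusted_Connection=Yes"
)
params = urllib.parse.quote_plus(cxn_str)
engine = sqlalchemy.create_engine(
    "mssql+pyodbc:///?odbc_connect=%s" % params,
    fast_executemany=True,  # batches the INSERTs on the pyodbc side
)
# df is the data frame from the question
df.to_sql('table_name', engine, if_exists='append', index=False)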

Create a schema in SQL Server using pyodbc

I am using pyodbc to read from a SQL Server database and create analogous copies of the same structure in a different database somewhere else.
Essentially:
for db in source_dbs:
    Execute('create database [%s]' % db)  # THIS WORKS.
    for schema in db:
        # The following result in an error starting with:
        # [42000] [Microsoft][ODBC SQL Server Driver][SQL Server]
        Execute('create schema [%s].[%s]' % (db, schema))
        # Incorrect syntax near '.'
        Execute('use [%s]; create schema [%s]' % (db, schema))
        # CREATE SCHEMA must be the first statement in a query batch.
In this example, you can assume that Execute creates a cursor using pyodbc and executes the argument SQL string.
I'm able to create the empty databases, but I can't figure out how to create the schemas within them.
Is there a solution, or is this a limitation of using pyodbc with MS SQL Server?
EDIT: FWIW - I also tried to pass the database name to Execute, so I could try to set the database name in the connection string. This doesn't work either - it seems to ignore the database name completely.
Python database connections usually default to having transactions enabled (autocommit == False) and SQL Server tends to dislike certain DDL commands being executed in a transaction.
I just tried the following and it worked for me:
import pyodbc
connStr = (
    r"Driver={SQL Server Native Client 10.0};"
    r"Server=(local)\SQLEXPRESS;"
    r"Trusted_Connection=yes;"
)
cnxn = pyodbc.connect(connStr, autocommit=True)
crsr = cnxn.cursor()
crsr.execute("CREATE DATABASE pyodbctest")
crsr.execute("USE pyodbctest")
crsr.execute("CREATE SCHEMA myschema")
crsr.close()
cnxn.close()
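As an aside, if you can't enable autocommit, a common T-SQL workaround (an assumption on my part, not part of the answer above) is to wrap CREATE SCHEMA in dynamic SQL so it gets a batch of its own:
# CREATE SCHEMA must be the first statement in its batch, so EXEC()
# runs it as a separate batch inside the current transaction
crsr.execute("USE pyodbctest; EXEC('CREATE SCHEMA myschema')")
cnxn.commit()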

Pyodbc Accessing Multiple Databases on same server

I'm tasked with obtaining data from two MS SQL databases on the same server so I can run a single query that uses info from both databases simultaneously. I am trying to achieve this in Python 2.7 with pyodbc 3.0.7. My query looks like this:
Select forcast.WindGust_Forecast, forcast.Forecast_Date, anoSection.SectionName, refTable.WindGust
FROM [EO1D].[dbo].[Dashboard_Forecast] forcast
JOIN [EO1D].[dbo].[Dashboard_AnoSections] anoSection
ON forcast.Section_ID = anoSection.Record_ID
JOIN [EO1D].[dbo].[Dashboard_AnoCircuits] anoCircuits
ON anoSection.Circuit_Number = anoCircuits.Circuit_Number
JOIN [FTSAutoCaller].[dbo].[ReferenceTable] refTable
ON anoCircuits.StationCode = refTable.StationCode
Where refTable.Circuit IS NOT NULL and refTable.StationCode = 'sil'
The typical connection for pyodbc looks like:
cnxn = pyodbc.connect('DRIVER={SQL Server};SERVER=SQLSRV01;DATABASE=DATABASE;UID=USER;PWD=PASSWORD')
This would only allow access to the database name provided.
How would I go about setting up a connection that allows me access to both databases so this query can be run? The two database names in my case are EO1D and FTSAutoCaller.
You're overthinking it. If you set up the connection as you did above and then simply pass the SQL along to a cursor, it should work.
import pyodbc
conn_string = '<removed>'
conn = pyodbc.connect(conn_string)
cur = conn.cursor()
query = 'select top 10 * from table1 t1 inner join database2..table2 t2 on t1.id = t2.id'
cur.execute(query)
and you are done (tested in my own environment; obviously the connection string and query were different, but it did work).
The query takes care of itself: although I only referenced one of the databases in the connection string, the query didn't have an issue reading from both. Not 100% sure, but I'm assuming it worked because the tables are fully qualified with three-part [database].[schema].[table] names.
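To tie this back to pandas, a minimal sketch, reusing conn from the answer above; query here stands for the asker's full cross-database SELECT:
import pandas as pd

# pd.read_sql accepts a raw pyodbc connection; the three-part
# [database].[dbo].[table] names in the query reach both databases
df = pd.read_sql(query, conn)
print(df.head())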
