I have a question about Python and sqlite3.
import sqlite3
conna = sqlite3.connect('db_a')
a = conna.cursor()
connb = sqlite3.connect('db_b')
b = connb.cursor()
I don't know how to write a relational query across the two databases, can someone instruct me?
I don't want to wrap it in a function (def), just a SELECT statement whose result I can assign to a variable.
query = """SELECT COL1 FROM TABLE1.DB_A WHERE NOT EXISTS (SELECT COL1 FROM TABLE2.DB_B WHERE COL1.TABLE2.DE_B = COL1.TABLE1.DE_A)"""
cursor.execute(query)
records = cursor.fetchall()
for row in records:
    print(row[0])
Can someone help me?
If the tables exist in different databases, you need the ATTACH DATABASE statement so that the connection object you use for the 1st database can also see the 2nd database:
import sqlite3
conn = sqlite3.connect('db_a')
cursor = conn.cursor()
attach = "ATTACH DATABASE 'db_b' AS db_b;"
cursor.execute(attach)
query = """
SELECT t1.COL1
FROM TABLE1 AS t1
WHERE NOT EXISTS (
SELECT t2.COL1
FROM db_b.TABLE2 AS t2
WHERE t2.COL1 = t1.COL1
)
"""
cursor.execute(query)
records = cursor.fetchall()
for row in records:
    print(row[0])
detach = "DETACH DATABASE db_b;"
cursor.execute(detach)
Also, instead of NOT EXISTS you could use EXCEPT; the difference is that EXCEPT returns only distinct rows:
query = """
SELECT COL1 FROM TABLE1
EXCEPT
SELECT COL1 FROM db_b.TABLE2
"""
After a query with Python's psycopg2
SELECT
    id,
    array_agg(enty_pub_uuid) AS ptr_entity_public
FROM table
GROUP BY id
I get an array back:
{a630e0a3-c544-11ea-9b8c-b73c488956ba,c2f03d24-2402-11eb-ab91-3f8e49eb63e7}
How can I parse this to a list in python?
Is there a builtin function in psycopg2?
psycopg2 takes care of type conversions between Python and Postgres:
import psycopg2
conn = psycopg2.connect("...")
cur = conn.cursor()
cur.execute(
    "select user_id, array_agg(data_name) from user_circles where user_id = '81' group by user_id"
)
res = cur.fetchall()
print(res[0])
print(type(res[0][1]))
Out:
('81', ['f085b2e3-b943-429e-850f-4ecf358abcbc', '65546d63-be96-4711-a4c1-a09f48fbb7f0', '81d03c53-9d71-4b18-90c9-d33322b0d3c6', '00000000-0000-0000-0000-000000000000'])
<class 'list'>
You need to register the UUID type so that psycopg2 can infer the types between Python and Postgres.
import psycopg2.extras
psycopg2.extras.register_uuid()
sql = """
SELECT
id,
array_agg(еnty_pub_uuid) AS ptr_entity_public
FROM table
GROUP BY id
"""
cursor = con.cursor()
cursor.execute(sql)
results = cursor.fetchall()
for r in results:
    print(type(r[1]))
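For completeness, a tiny runnable sketch of the effect of register_uuid() (the DSN is a placeholder; the UUID literal is the one from the question):

import uuid
import psycopg2
import psycopg2.extras

con = psycopg2.connect("dbname=test")  # hypothetical connection string
psycopg2.extras.register_uuid()  # installs typecasters for uuid and uuid[]
cur = con.cursor()
cur.execute("SELECT ARRAY['a630e0a3-c544-11ea-9b8c-b73c488956ba'::uuid]")
value = cur.fetchone()[0]
print(type(value))     # <class 'list'>
print(type(value[0]))  # <class 'uuid.UUID'>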
I am trying to use the output of one query in another, but I am not getting the correct result. Can you please help me with how to do this?
Example:
query1 = "select distinct lower(tablename) as tablename from medaff.imedical_metadata where object_type = 'View'"
The output of the above query is:
tablename
vw_mdcl_insght
vw_fbms_interactions
I want to use the above output in another query, something like this:
query2 = "select * from medaff.imedical_business_metadata where objectname in ('vw_mdcl_insght', 'vw_fbms_interactions')"
How to do this part in python?
I am using the code below to run the queries:
conn = redshift_conn()
with conn.cursor() as cur:
    query1 = "select distinct lower(tablename) as tablename from medaff.imedical_metadata where object_type = 'View'"
    cur.execute(query1)
    result = cur.fetchall()
    print(result)
    conn.commit()
    query2 = "select * from medaff.imedical_business_metadata where objectname in ('vw_mdcl_insght', 'vw_fbms_interactions')"
    cur.execute(query2)
    result = cur.fetchall()
    print(result)
    conn.commit()
I think you can just use an IN subquery:
select ibm.*
from medaff.imedical_business_metadata ibm
where ibm.objectname in (select lower(im.tablename) as tablename
from medaff.imedical_metadata im
where im.object_type = 'View'
);
It is better to let the database do the work.
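In Python that would look something like this (a sketch reusing the redshift_conn() helper from the question):

query = """
    select ibm.*
    from medaff.imedical_business_metadata ibm
    where ibm.objectname in (select lower(im.tablename)
                             from medaff.imedical_metadata im
                             where im.object_type = 'View')
"""
conn = redshift_conn()
with conn.cursor() as cur:
    cur.execute(query)
    result = cur.fetchall()
print(result)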
I used the code below:
query = "select distinct lower(tablename) from medaff.imedical_metadata where object_type = 'View'"
cur.execute(query)
res = cur.fetchall()
print(res)
res = tuple([item[0] for item in res])
res = str(res)
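Note that interpolating str(tuple(...)) is fragile: a one-element result renders as ('x',), whose trailing comma is invalid SQL, and the values bypass the driver's quoting. A safer sketch, assuming a driver that uses %s placeholders (e.g. psycopg2 or redshift_connector):

rows = cur.fetchall()
names = [row[0] for row in rows]
if names:  # "in ()" is invalid SQL, so guard against an empty result
    placeholders = ', '.join(['%s'] * len(names))
    query2 = ("select * from medaff.imedical_business_metadata "
              "where objectname in ({})".format(placeholders))
    cur.execute(query2, names)
    result = cur.fetchall()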
SET UP
MWE: I have a table in SQL Server like this:
CREATE TABLE dbo.MyTable(
    order_id INT IDENTITY(1,1),
    column2 DECIMAL,
    column3 INT,
    PRIMARY KEY(order_id)
)
I am using pyodbc to insert some data in the form of a pandas.DataFrame into the table. I am using data such as:
column2 column3
0 1.23 5
1 4.95 9
2 6.79 10
Where I've created this example dataframe using
data = pd.DataFrame({'column2':[1.23, 4.95, 6.79], 'column3':[5,9,10]})
I use the following statement to insert data
stmt = "INSERT INTO dbo.MyTable(column2, column3) OUTPUT Inserted.order_id VALUES (?, ?)"
ISSUE
This is the code that I use to insert everything and return the values:
import pyodbc
import numpy as np

# Set up connection and create cursor
conn_string = "DRIVER={MyDriver};SERVER=MyServer;DATABASE=MyDb;UID=MyUID;PWD=MyPWD"
cnxn = pyodbc.connect(conn_string)
cnxn.autocommit = False
cursor = cnxn.cursor()
cursor.fast_executemany = True
# Upload data
cursor.executemany(stmt, data.values.tolist())
# Process the result
try:
    first_result = cursor.fetchall()
except pyodbc.ProgrammingError:
    first_result = None
result_sets = []
while cursor.nextset():
    result_sets.append(cursor.fetchall())
all_inserted_ids = np.array(result_sets).flatten()
However, I do not get all the ids that I should! For instance, if there is no data in the table, I will not get
all_inserted_ids = np.array([1, 2, 3])
But rather I will only get
all_inserted_ids = np.array([2, 3])
Which means that I'm losing the first id somewhere!
And notice that first_result never works. It always throws the following:
pyodbc.ProgrammingError: No results. Previous SQL was not a query.
I've also tried using cursor.fetchone(), cursor.fetchone()[0] or cursor.fetchval(), but they gave me the same error.
METHODS THAT I TRIED BUT DID NOT WORK
1) Adding "SET NOCOUNT ON"
I tried using the same code as in the question but with
stmt = """
SET NOCOUNT ON;
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?)
"""
The output was [1, 2] so I was missing 3.
2) Adding "SET NOCOUNT ON" and inserting output to table variable
I used the following statement:
stmt = """
SET NOCOUNT ON;
DECLARE @NEWID TABLE(ID INT);
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id INTO @NEWID(ID)
VALUES (?, ?)
SELECT ID FROM @NEWID
"""
Again this didn't work, as I obtained only [2, 3] but no 1.
3) Selecting the @@IDENTITY
I used the following statement:
stmt = """
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?)
SELECT @@IDENTITY
"""
But it didn't work as I obtained array([Decimal('1'), 2, Decimal('2'), 3, Decimal('3')], dtype=object)
4) Selecting @@IDENTITY with SET NOCOUNT ON
I used
stmt = """
SET NOCOUNT ON
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?);
SELECT @@IDENTITY
"""
but I got array([Decimal('1'), 2, Decimal('2'), 3, Decimal('3')], dtype=object) again.
5) Selecting @@IDENTITY without using OUTPUT
I used:
stmt = """
INSERT INTO dbo.MyTable(column2, column3)
VALUES (?, ?);
SELECT @@IDENTITY
"""
But I got [Decimal('2') Decimal('3')]
6) Selecting @@IDENTITY without using OUTPUT but with SET NOCOUNT ON
I used:
stmt = """
SET NOCOUNT ON
INSERT INTO dbo.MyTable(column2, column3)
VALUES (?, ?);
SELECT @@IDENTITY
"""
But again I got: [Decimal('2') Decimal('3')]
A possible way around this, which is really bad but does the job
A possible way is to create a new table where we'll store the ids and truncate it once we're done. It is horrible, but I couldn't find any other solution.
Create a table:
CREATE TABLE NEWID(
    ID INT,
    PRIMARY KEY (ID)
)
Next, this is the complete code:
import pyodbc
import pandas as pd
import numpy as np
# Connect
conn_string = """
DRIVER={MYDRIVER};
SERVER=MYSERVER;
DATABASE=DB;
UID=USER;
PWD=PWD
"""
cnxn = pyodbc.connect(conn_string)
cnxn.autocommit = False
cursor = cnxn.cursor()
cursor.fast_executemany = True
# Data, Statement, Execution
data = pd.DataFrame({'column2': [1.23, 4.95, 6.79], 'column3': [5, 9, 10]})
stmt = """
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id INTO NEWID(ID)
VALUES (?, ?);
"""
cursor.executemany(stmt, data.values.tolist())
cursor.execute("SELECT ID FROM NEWID;")
# Get stuff
try:
    first_result = cursor.fetchall()
except pyodbc.ProgrammingError:
    first_result = None
result_sets = []
while cursor.nextset():
    result_sets.append(cursor.fetchall())
all_inserted_ids = np.array(result_sets).flatten()
print('First result: ', first_result)
print('All IDs: ', all_inserted_ids)
cursor.commit()
# Remember to truncate the table for next use
cursor.execute("TRUNCATE TABLE dbo.NEWID;", [])
cursor.commit()
This will return
First result: [(1, ), (2, ), (3, )]
All IDs: []
So we just keep the first result.
I have implemented a method similar to your method 1) using SQLAlchemy with the pyodbc dialect. It can easily be adapted to the pyodbc library directly. The trick was to add a SELECT NULL; before the INSERT query. This way the first OUTPUT of the insert query will be in the returned sets. With this method, if you inserted n rows you will need to fetch 2n-1 sets using the cursor's nextset().
This is a patch because either MSSQL or pyodbc discards the first set. I wonder if there is an option in MSSQL Server or pyodbc where you could specify to return the first set.
from typing import List

from sqlalchemy.orm import Session
from sqlalchemy.sql.expression import TableClause


def bulk_insert_return_defaults_pyodbc(
    session: Session, statement: TableClause, parameters: List[dict], mapping: dict
):
    """
    Parameters
    ----------
    session:
        SqlAlchemy Session object
    statement:
        SqlAlchemy table clause object (ie. Insert)
    parameters:
        List of parameters
        ex: [{"co1": "value1", "col2": "value2"}, {"co1": "value3", "col2": "value4"}]
    mapping:
        Mapping between SqlAlchemy declarative base attribute and name of column in
        database

    Returns
    -------
    """
    if len(parameters) > 0:
        connexion = session.connection()
        context = session.bind.dialect.execution_ctx_cls._init_statement(
            session.bind.dialect,
            connexion,
            connexion._Connection__connection.connection,
            statement,
            parameters,
        )
        statement = context.statement.compile(
            session.bind, column_keys=list(context.parameters[0].keys())
        )
        session.bind.dialect.do_executemany(
            context.cursor,
            "SELECT NULL; " + str(statement),
            [
                tuple(p[p_i] for p_i in statement.params.keys())
                for p in context.parameters
            ],
            context,
        )
        results = []
        while context.cursor.nextset():
            try:
                result = context.cursor.fetchone()
                if result[0] is not None:
                    results.append(result)
            except Exception:
                continue
        return [
            {mapping[r.cursor_description[i][0]]: c for i, c in enumerate(r)}
            for r in results
        ]
    else:
        return []
from sqlalchemy.orm.attributes import InstrumentedAttribute

multi_params = bulk_insert_return_defaults_pyodbc(
    session,
    table_cls.__table__.insert(returning=[table_cls.id]),
    multi_params,
    {
        getattr(table_cls, c).expression.key: c
        for c in list(vars(table_cls))
        if isinstance(getattr(table_cls, c), InstrumentedAttribute)
    },
)
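It can be adapted to plain pyodbc roughly like this (a sketch, not tested against a live server, using the dbo.MyTable schema and connection string from the question):

import pyodbc

cnxn = pyodbc.connect("DRIVER={MyDriver};SERVER=MyServer;DATABASE=MyDb;UID=MyUID;PWD=MyPWD")
cnxn.autocommit = False
cursor = cnxn.cursor()
cursor.fast_executemany = True

# The dummy SELECT NULL; absorbs the result set that would otherwise be discarded.
stmt = """
SELECT NULL;
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?);
"""
cursor.executemany(stmt, [(1.23, 5), (4.95, 9), (6.79, 10)])

# Walk the remaining result sets; rows whose first column is NULL come from
# the dummy SELECT and are skipped.
inserted_ids = []
while cursor.nextset():
    try:
        row = cursor.fetchone()
        if row is not None and row[0] is not None:
            inserted_ids.append(row[0])
    except pyodbc.ProgrammingError:
        continue
print(inserted_ids)
cnxn.commit()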
To get a cursor in django I do:
from django.db import connection
cursor = connection.cursor()
How would I get a dict cursor in django, the equivalent of -
import MySQLdb
connection = (establish connection)
dict_cursor = connection.cursor(MySQLdb.cursors.DictCursor)
Is there a way to do this in django? When I tried cursor = connection.cursor(MySQLdb.cursors.DictCursor) I got Exception Value: cursor() takes exactly 1 argument (2 given). Or do I need to connect directly with the python-mysql driver?
The Django docs suggest using dictfetchall:
def dictfetchall(cursor):
    "Returns all rows from a cursor as a dict"
    desc = cursor.description
    return [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]
Is there a performance difference between using this and creating a dict_cursor?
No, there is no such support for DictCursor in Django, but you can write a small function to do that for you.
See docs: Executing custom SQL directly:
def dictfetchall(cursor):
    "Returns all rows from a cursor as a dict"
    desc = cursor.description
    return [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]
>>> cursor.execute("SELECT id, parent_id from test LIMIT 2");
>>> dictfetchall(cursor)
[{'parent_id': None, 'id': 54360982L}, {'parent_id': None, 'id': 54360880L}]
Easily done with Postgres at least; I'm sure MySQL has something similar (Django 1.11):
from django.db import connections
from psycopg2.extras import NamedTupleCursor
def scan_tables(app):
    conn = connections['default']
    conn.ensure_connection()
    with conn.connection.cursor(cursor_factory=NamedTupleCursor) as cursor:
        cursor.execute("SELECT table_name, column_name "
                       "FROM information_schema.columns AS c "
                       "WHERE table_name LIKE '{}_%'".format(app))
        columns = cursor.fetchall()
        for column in columns:
            print(column.table_name, column.column_name)

scan_tables('django')
Obviously feel free to use DictCursor, RealDictCursor, LoggingCursor etc
The following code converts the result set into a dictionary.
from django.db import connections
cursor = connections['default'].cursor()
columns = (x.name for x in cursor.description)
result = cursor.fetchone()
result = dict(zip(columns, result))
If the result set has multiple rows, iterate over the cursor instead.
columns = [x.name for x in cursor.description]
for row in cursor:
    row = dict(zip(columns, row))
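Putting it together (a sketch using the test table from the dictfetchall example above; the .name attribute on description entries assumes the psycopg2 backend, while col[0] is the driver-agnostic spelling):

from django.db import connections

cursor = connections['default'].cursor()
cursor.execute("SELECT id, parent_id FROM test")
columns = [col.name for col in cursor.description]
rows = [dict(zip(columns, row)) for row in cursor]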
The main purpose of using RealDictCursor is to get the data as a list of dictionaries. The solution below does that, without going through the Django ORM:
def fun(request):
    from django.db import connections
    import json
    from psycopg2.extras import RealDictCursor

    con = connections['default']
    con.ensure_connection()
    cursor = con.connection.cursor(cursor_factory=RealDictCursor)
    cursor.execute("select * from Customer")
    rows = cursor.fetchall()
    rows = json.dumps(rows)
output:
[{...},{...},{......}]