Working with Cursors in Python

Searched the web and this forum without satisfaction. Using Python 2.7 and pyODBC on Windows XP. I can get the code below to run and generate two cursors from two different databases without problems. Ideally, I'd then like to join these result cursors thusly:
SELECT a.state, sum(b.Sales)
FROM cust_curs a
INNER JOIN fin_curs b
ON a.Cust_id = b.Cust_id
GROUP BY a.state
Is there a way to join cursors using SQL statements in python or pyODBC? Would I need to store these cursors in a common DB (SQLite3?) to accomplish this? Is there a pure python data handling approach that would generate this summary from these two cursors?
Thanks for your consideration.
Working code:
import pyodbc
#
# DB2 Financial Data Cursor
#
cnxn = pyodbc.connect('DSN=DB2_Fin;UID=;PWD=')
fin_curs = cnxn.cursor()
fin_curs.execute("""SELECT Cust_id, sum(Sales) as Sales
FROM Finance.Sales_Tbl
GROUP BY Cust_id""")
#
# Oracle Customer Data Cursor
#
cnxn = pyodbc.connect('DSN=Ora_Cust;UID=;PWD=')
cust_curs = cnxn.cursor()
cust_curs.execute("""SELECT Distinct Cust_id, gender, address, state
FROM Customers.Cust_Data""")

Cursors are simply objects used for executing SQL commands and retrieving the results. The data aren't migrated into a new database, so a join across two cursors isn't possible. If you would like to join the data, you'll need to have the two tables in the same database. Whether that means bringing both tables and their data into a SQLite database or doing it some other way depends on the specifics of your use case, but that approach would work.
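For example, here is a minimal sketch of that idea using an in-memory SQLite database. It assumes fin_curs and cust_curs are the already-executed cursors from your code, and the column types in the CREATE TABLE statements are assumptions:
import sqlite3

# Sketch: stage both result sets in an in-memory SQLite database, then join there.
lite = sqlite3.connect(':memory:')
lite.execute("CREATE TABLE fin_curs (Cust_id TEXT, Sales REAL)")
lite.execute("CREATE TABLE cust_curs (Cust_id TEXT, gender TEXT, address TEXT, state TEXT)")

lite.executemany("INSERT INTO fin_curs VALUES (?, ?)",
                 [tuple(row) for row in fin_curs.fetchall()])
lite.executemany("INSERT INTO cust_curs VALUES (?, ?, ?, ?)",
                 [tuple(row) for row in cust_curs.fetchall()])

summary = lite.execute("""SELECT a.state, SUM(b.Sales)
                          FROM cust_curs a
                          INNER JOIN fin_curs b ON a.Cust_id = b.Cust_id
                          GROUP BY a.state""").fetchall()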

Related

POSTGRESQL Queries using Python

I am trying to access tables from a database using python. There was some code on the website: https://rnacentral.org/help/public-database
import psycopg2.extras

def main():
    conn_string = "host='hh-pgsql-public.ebi.ac.uk' dbname='pfmegrnargs' user='reader' password='NWDMCE5xdipIjRrp'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    # retrieve a list of RNAcentral databases
    query = "SELECT * FROM rnc_database"
    cursor.execute(query)
    for row in cursor:
        print(row)

main()
When I run this code, I get back a list of databases.
I want to access tables from one of these databases, but I don't know what the schema for those tables is or what the values in each returned list represent. I have been looking at 'PostgreSQL to Python' resources, but all of them are about accessing tables when you already know the names of the tables and the columns within them. Is there code for how I can access the table names from the database?
Thank you
Edit: sorry, I thought I linked the website before.
The dataset you want to use has a schema diagram here: https://rnacentral.org/help/public-database
For general exploration I would use a tool like https://dbeaver.io/ ; it will show you all the schemas in the DB, the tables inside each schema, and so forth. The DBeaver connection settings would simply use the host, database name, user, and password from the connection string above.
If you want to keep using a Python script to explore the DB, this SQL query should help you:
SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND
      schemaname != 'information_schema';
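For example, reusing the cursor from your snippet (a minimal sketch; the DictCursor lets you access columns by name):
# Sketch: list the non-system tables using the cursor from the question
cursor.execute("""SELECT schemaname, tablename
                  FROM pg_catalog.pg_tables
                  WHERE schemaname NOT IN ('pg_catalog', 'information_schema')""")
for row in cursor:
    print(row['schemaname'], row['tablename'])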

Copy tables from one database to another in SQL Server, using Python

Does anybody know of good Python code that can copy a large number of tables (around 100) from one database to another in SQL Server?
I ask if there is a way to do it in Python, because due to restrictions at my place of employment, I cannot copy tables across databases inside SQL Server alone.
Here is a simple Python code that copies one table from one database to another. I am wondering if there is a better way to write it if I want to copy 100 tables.
print('Initializing...')
import pandas as pd
import sqlalchemy
import pyodbc
db1 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_one")
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
print('Writing...')
query = '''SELECT * FROM [dbo].[test_table]'''
df = pd.read_sql(query, db1)
df.to_sql('test_table', db2, schema='dbo', index=False, if_exists='replace')
print('(1) [test_table] copied.')
SQLAlchemy is actually a good tool to use to create identical tables in the second db:
metadata = sqlalchemy.MetaData()
table = sqlalchemy.Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2)
This method will also reproduce the correct keys, indexes, and foreign keys. Once the needed tables are created, you can move the data either by select/insert if the tables are relatively small, or by using the bcp utility to dump each table to disk and then load it into the second database (much faster, but more work to get it working correctly).
If using select/insert then it is better to insert in batches of 500 records or so.
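A minimal sketch of the batched select/insert approach with pandas, using a chunk size of 500 as suggested (db1 and db2 are the engines from the question, and the target table is assumed to have been created already by the reflection step above):
# Sketch: copy one table in batches of 500 rows
for chunk in pd.read_sql("SELECT * FROM [dbo].[test_table]", db1, chunksize=500):
    chunk.to_sql('test_table', db2, schema='dbo', index=False, if_exists='append')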
You can do something like this:
tabs = pd.read_sql("SELECT table_name FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['table_name']:
    pd.read_sql("select * from {}".format(tab), db1).to_sql(tab, db2, index=False)
But it might be awfully slow. Use SQL Server tools to do this job.
Consider using the sp_addlinkedserver procedure to link one SQL Server to another. After that you can execute:
SELECT * INTO server_name...table_name FROM table_name
for all tables from the db1 database.
PS this might be done in Python + SQLAlchemy as well...
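A rough sketch of that loop from Python + SQLAlchemy, assuming the linked server (called SERVER2 here, a hypothetical name) was registered with sp_addlinkedserver on the destination server and points back at the source server, so the statements run against db2 and pull each table across the link:
# Sketch: SELECT ... INTO over a linked server for every table in db1.
# 'SERVER2' is a hypothetical linked server name; adjust to your setup.
tabs = pd.read_sql("SELECT table_name FROM INFORMATION_SCHEMA.TABLES", db1)
with db2.begin() as conn:
    for tab in tabs['table_name']:
        conn.execute(sqlalchemy.text(
            "SELECT * INTO [{0}] FROM [SERVER2]...[{0}]".format(tab)))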

How to create a table in one DB from the result of a query from different DB in Python

I'm working with two separate databases (Oracle and PostgreSQL) where a substantial amount of reference data is in Oracle that I'll frequently need to reference while doing analysis on data stored in Postgres (I have no control over this part). I'd like to be able to directly transfer the results of a query from Oracle to a table in Postgres (and vice versa). The closest I've gotten is something like the code below using Pandas as a go between, but it's quite slow.
import pandas as pd
import psycopg2
import cx_Oracle
import sqlalchemy
# CONNECT TO POSTGRES
db__PG = psycopg2.connect(dbname="username", user="password")
eng_PG = sqlalchemy.create_engine('postgresql://username:password@localhost:5432/db_one')
# CONNECT TO ORACLE
db__OR = cx_Oracle.connect('username', 'password', 'SID_Name')
eng_OR = sqlalchemy.create_engine('oracle+cx_oracle://username:password@10.1.2.3:4422/db_two')
#DEFINE QUERIES
query_A = "SELECT * FROM tbl_A"
query_B = "SELECT * FROM tbl_B"
# CREATE PANDAS DATAFRAMES FROM QUERY RESULTS
df_A = pd.read_sql_query(query_A, db__OR)
df_B = pd.read_sql_query(query_B, db__PG)
# CREATE TABLES FROM PANDAS DATAFRAMES
df_A.to_sql(name='tbl_a', con=eng_PG, if_exists='replace', index=False)
df_B.to_sql(name='tbl_b', con=eng_OR, if_exists='replace', index=False)
I think there has to be a more efficient, direct way to do this (like database links for moving data across different DBs in Oracle), but I'm fairly new to Python and have previously worked either directly in SQL or in SAS. I've been searching for ways to create a table directly from a Python cursor result or a SQLAlchemy ResultProxy, but haven't had much luck.
Suggestions?

inconsistent results from LIKE query: pyodbc vs. Access

I have a bunch of queries that should be executed against an Access database as part of my Python script. Unfortunately, queries that return records when used directly in MS Access return nothing from the Python script (no error either). The connection to the database and the general syntax should be fine, as simple queries (like selecting one column from a table with a WHERE clause) work just fine. Here is the code with one of these queries:
import pyodbc
baza = r"C:\base.mdb"
driver = "{Microsoft Access Driver (*.mdb, *.accdb)}"
access_con_string = r"Driver={};Dbq={};".format(driver, baza)
cnn = pyodbc.connect(access_con_string)
db_cursor = cnn.cursor()
expression = """SELECT F_PARCEL.PARCEL_NR, F_PARCEL_LAND_USE.AREA_USE_CD, F_PARCEL_LAND_USE.SOIL_QUALITY_CD, F_ARODES.TEMP_ADRESS_FOREST, F_SUBAREA.AREA_TYPE_CD, F_AROD_LAND_USE.AROD_LAND_USE_AREA, F_PARCEL.COUNTY_CD, F_PARCEL.DISTRICT_CD, F_PARCEL.MUNICIPALITY_CD, F_PARCEL.COMMUNITY_CD, F_SUBAREA.SUB_AREA
FROM F_PARCEL INNER JOIN (F_PARCEL_LAND_USE INNER JOIN ((F_ARODES INNER JOIN F_AROD_LAND_USE ON F_ARODES.ARODES_INT_NUM = F_AROD_LAND_USE.ARODES_INT_NUM) INNER JOIN F_SUBAREA ON F_ARODES.ARODES_INT_NUM = F_SUBAREA.ARODES_INT_NUM) ON (F_PARCEL_LAND_USE.SHAPE_NR = F_AROD_LAND_USE.SHAPE_NR) AND (F_PARCEL_LAND_USE.PARCEL_INT_NUM = F_AROD_LAND_USE.PARCEL_INT_NUM)) ON F_PARCEL.PARCEL_INT_NUM = F_PARCEL_LAND_USE.PARCEL_INT_NUM
WHERE (((F_ARODES.TEMP_ADRESS_FOREST) Like ?) AND ((F_AROD_LAND_USE.AROD_LAND_USE_AREA)<?) AND ((F_ARODES.TEMP_ACT_ADRESS)= ?))
ORDER BY F_PARCEL.PARCEL_NR, F_PARCEL_LAND_USE.SHAPE_NR;"""
rows = db_cursor.execute(expression, ("14-17-2-03*", 0.0049, True)).fetchall()
for row in rows:
    print row
cnn.close()
I know that those queries were generated with the query builder in MS Access, so I was wondering whether that might cause the differences, but on the other hand it is still an Access database.
Anyway, it seems that the problem is in the SQL, so I would like to know what elements could possibly produce different output between queries executed directly in MS Access and through a pyodbc connection?
You are getting tripped up by the difference in LIKE wildcard characters between queries run in Access itself and queries run from an external application.
When running a query from within Access itself you need to use the asterisk as the wildcard character: "14-17-2-03*".
When running a query from an external application (like your Python app) you need to use the percent sign as the wildcard character: "14-17-2-03%".
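So in the script above, the only change needed is the wildcard in the parameter, e.g.:
# Use the ANSI '%' wildcard instead of Access's '*' when going through pyodbc
rows = db_cursor.execute(expression, ("14-17-2-03%", 0.0049, True)).fetchall()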

Teradata MERGE yielding no results when executed through SQLAlchemy

I'm attempting to use Python with SQLAlchemy to download some data, create a temporary staging table on a Teradata server, then MERGE that table into another table which I've created to permanently store this data. I'm using sql = sqlalchemy.text(merge) and td_engine.execute(sql), where merge is a string similar to the one below:
MERGE INTO perm_table as p
USING temp_table as t
ON p.Id = t.Id
WHEN MATCHED THEN
UPDATE
SET col1 = t.col1,
col2 = t.col2,
...
col50 = t.col50
WHEN NOT MATCHED THEN
INSERT (col1,
col2,
...
col50)
VALUES (t.col1,
t.col2,
...
t.col50)
The script runs all the way to the end without error, and the SQL executes properly through Teradata Studio, but for some reason the table won't update when I execute it through SQLAlchemy. However, I've also run different SQL expressions, like the INSERT that populated perm_table, from the same Python script and they worked fine. Maybe there's something specific to the MERGE and SQLAlchemy combination?
Since you're using the engine directly, without a transaction, you're probably (barring unseen configuration on your part) relying on SQLAlchemy's version of autocommit, which works by detecting data-changing operations such as INSERT. Possibly MERGE is not one of the detected operations. Try:
sql = sqlalchemy.text(merge).execution_options(autocommit=True)
td_engine.execute(sql)
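Alternatively, running the statement inside an explicit transaction avoids relying on autocommit detection at all (a sketch using the engine and merge string from the question):
# Sketch: the transaction commits on exiting the block, so the MERGE
# does not depend on SQLAlchemy's autocommit detection.
with td_engine.begin() as conn:
    conn.execute(sqlalchemy.text(merge))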
