How to insert a Pandas Dataframe into a SQL Server synonym table - python

Summary
In SQL Server, synonyms are often used to abstract a remote table into the current database context. Normal DML operations work just fine on such a construct, but SQL Server does track synonyms as their own object type separately from tables.
I'm trying to use the pandas DataFrame#to_sql method to load a synonym. It works well when the table is local to the database, but it cannot locate the table via the synonym. Instead it attempts to create a new table matching the DataFrame's structure, which results in an object name collision and undesirable behavior.
Tracking through the source, it looks like pandas uses the dialect's has_table method, which in this case resolves to SQLAlchemy's MSSQL dialect implementation. That implementation queries the INFORMATION_SCHEMA.COLUMNS view to verify whether the table exists.
Unfortunately, synonym tables don't appear in INFORMATION_SCHEMA views like this. In the answer for "How to find all column names of a synonym", the answerer provides a technique for establishing a synonym's columns, which may be applicable here.
The Question
Is there any way to skip the table existence check during DataFrame#to_sql? If not, is there any way to force pandas or SQLAlchemy to recognize a synonym? I couldn't find any similar questions on SO, and neither project's issue tracker had an issue resembling this.

I've accepted my own answer, but if anyone has a better technique for loading DataFrames to SQL Server synonyms, please post it!
SQLAlchemy on SQL Server doesn't currently support synonym tables, which means that the DataFrame#to_sql method cannot insert into them and another technique must be employed.
As of SQLAlchemy 1.2, the Oracle dialect supports Synonym/DBLINK Reflection, but no similar feature is available for SQL Server, even in the upcoming SQLAlchemy 1.4 release.
For those trying to solve this in different ways, if your situation meets the following criteria:
Your target synonym is already declared in the ORM as a table
The table's column names match the column names in the DataFrame
The table's column data types either match the DataFrame or can be cast without error
You can perform the following bulk_insert_mappings operation, with TargetTable defining your target in the ORM model and df defining your DataFrame:
db.session.bulk_insert_mappings(
    TargetTable, df.to_dict('records')
)
As a bonus, this is substantially faster than the DataFrame#to_sql operation as well!
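The bulk_insert_mappings call above can be tried end to end with a minimal sketch like the following. It assumes SQLAlchemy 1.4 or later and uses an in-memory SQLite table as a stand-in for the SQL Server synonym; the TargetTable model, its columns, and the sample data are hypothetical.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.pool import StaticPool

Base = declarative_base()


class TargetTable(Base):
    """Hypothetical ORM model; in the real case it would name the synonym."""
    __tablename__ = "target_table"
    id = Column(Integer, primary_key=True)
    name = Column(String(50))


# Assumption: in-memory SQLite stands in for SQL Server so the sketch runs anywhere.
engine = create_engine("sqlite://", poolclass=StaticPool)
Base.metadata.create_all(engine)  # stand-in only; a real synonym already exists

# DataFrame column names must match the mapped attribute names.
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

with Session(engine) as session:
    # to_dict("records") yields one dict per row: [{"id": 1, "name": "a"}, ...]
    session.bulk_insert_mappings(TargetTable, df.to_dict("records"))
    session.commit()
    inserted = session.query(TargetTable).count()
    names = [row.name for row in session.query(TargetTable).order_by(TargetTable.id)]
```

Against a real synonym the create_all call would be dropped, since the ORM model only needs to describe the columns and no existence check is involved.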

Related

Python or R -- create a SQL join using a dataframe

I am trying to find a way, either in R or Python, to use a dataframe as a table in an Oracle SQL statement.
It is impractical, for my objective, to:
Create a string out of a column and use that as a criterion (there are more than 1,000 values, which is the limit)
Create a new table in the database and use that (don't have access)
Download the entire contents of the table and merge in pandas (millions of records in the database and would bog down the db and my system)
I have found packages that will allow you to "register" a dataframe and have it act as a "table/view" to allow queries against it, but it will not allow them to be used in a query with a different connection string. Can anyone point me in the right direction? Either to allow two different connections in the same SQL statement (to Oracle and a package like DuckDB) to permit an inner join or direct link to the dataframe and allow that to be used as a table in a join?
SAS does this so effortlessly and I don't want to go back to SAS because the other functionality is not as good as Python / R, but this is a dealbreaker if I can't do database extractions.
Answering my own question here -- after much research.
In short, this cannot be done. Apart from passing a series of criteria as a list or a concatenated string, you cannot create a dataframe in Python or R and pass it through a query into a SQL Server or Oracle database. It's unfortunate, but if you don't have permission to write to temporary tables in the Oracle database, you're out of options.

Can I use ? in SQL to select a table

I have an SQL DB with 3 or 4 tables in it. I would like to write one Python function that could use parameters to search the relevant tables. Is this possible?
The line I was thinking of would be
self.cur.execute("SELECT * FROM ?", (table,))
This obviously works when choosing a column, but I cannot get it to work for a table. Is it possible, or should I change plan?
No. Tables are similar to types in a strongly-typed language, not parameters.
Queries aren't executed like scripts. They are compiled into execution plans, using different operators depending on the table schema, indexes and statistics, i.e. the number of rows and distribution of values. For the same JOIN, the query optimizer may decide to use a HASH JOIN for unordered, unindexed data or nested loops if the join columns are indexed. Or a MERGE join can be used if the data from both tables is ordered.
Even for the same query, a very different execution plan may be generated depending on whether the table contains a few dozen or a few million rows.
Parameters are passed to that execution plan the same way parameters are passed to a method. They are even passed separately from the SQL text in the RPC call from client to server. That's why they aren't vulnerable to SQL injection - they are never part of the query itself.
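One common workaround, which is not part of the answer above, is to check the table name against a fixed allowlist and interpolate it into the SQL text, while still passing values as parameters. The sketch below assumes sqlite3 and hypothetical table names (customers, orders); select_all is an illustrative helper, not a library function.

```python
import sqlite3

# Hypothetical table names; only these may ever reach the SQL text.
ALLOWED_TABLES = {"customers", "orders"}


def select_all(cur, table, min_id):
    """Return rows from an allowlisted table whose id is at least min_id."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # The identifier is interpolated only after the allowlist check;
    # the value is still sent as a bound parameter.
    cur.execute(f'SELECT * FROM "{table}" WHERE id >= ?', (min_id,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

rows = select_all(cur, "customers", 2)

# A placeholder in the table position is rejected when the statement is compiled.
try:
    cur.execute("SELECT * FROM ?", ("customers",))
    placeholder_error = None
except sqlite3.OperationalError as exc:
    placeholder_error = exc
```

Because the table name never comes straight from user input, the injection protection that parameters provide for values is preserved for the identifier as well.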

How to handle customized schema with sqlalchemy

I'm pretty new to database and server related tasks. I currently have two tables stored in an MS SQL database on a server and I'm trying to use the Python package SQLAlchemy to pull some of the data to my local machine. The first table has the default schema dbo, and I was able to use the connection string
'mssql+pyodbc://<username>:<password>@<dsnname>'
to inspect the table, but the other table has a customized schema, and I don't see any information about the table when I use the previous commands. I assume it is because the second table has a different schema and the Python package can't find it anymore.
I was looking at automap, hoping the package offers a way to deal with a customized schema, but I don't quite understand many of the concepts described there. I'm not trying to alter the database, just pull data, so I'm not sure it's the right approach. Any suggestions?
Thanks
In case of automap you should pass the schema argument when preparing reflectively:
AutomapBase.prepare(reflect=True, schema='myschema')
If you wish to reflect both the default schema and your "customized schema" using the same automapper, then first reflect both schemas using the MetaData instance and after that prepare the automapper:
AutomapBase.metadata.reflect()
AutomapBase.metadata.reflect(schema='myschema')
AutomapBase.prepare()
If you call AutomapBase.prepare(reflect=True, ...) consecutively for both schemas, then the automapper will recreate and replace the classes from the 1st prepare because the tables already exist in the metadata. This will then raise warnings.
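The reflect-then-prepare sequence can be sketched as follows. This is only an illustration under stated assumptions: an attached in-memory SQLite database named myschema stands in for the customized SQL Server schema, and the table names a and b are hypothetical. The engine is passed explicitly via bind, because the metadata in this sketch is not bound to an engine.

```python
from sqlalchemy import MetaData, create_engine, event, text
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.pool import StaticPool

# Assumption: SQLite stands in for SQL Server; StaticPool keeps one connection
# so the in-memory databases survive between statements.
engine = create_engine("sqlite://", poolclass=StaticPool)


@event.listens_for(engine, "connect")
def attach_schema(dbapi_conn, connection_record):
    # An attached database behaves like a second schema named "myschema".
    dbapi_conn.execute("ATTACH DATABASE ':memory:' AS myschema")


with engine.begin() as conn:
    conn.execute(text("CREATE TABLE a (id INTEGER PRIMARY KEY)"))
    conn.execute(text("CREATE TABLE myschema.b (id INTEGER PRIMARY KEY)"))

metadata = MetaData()
metadata.reflect(bind=engine)                     # default schema
metadata.reflect(bind=engine, schema="myschema")  # customized schema

# Prepare once, after both schemas are already in the metadata.
Base = automap_base(metadata=metadata)
Base.prepare()

table_keys = sorted(metadata.tables)
```

Tables from the non-default schema are keyed as "schema.table" in the metadata, which is how the two sets stay distinct.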

How do I return json in a psycopg2 query?

I am using psycopg2 (version 2.5.4) to query a PostgreSQL database (version 9.2.7). One of the columns I'm querying is a json type which psycopg2 is documented as being able to handle. However, I'm receiving the following error:
psycopg2.ProgrammingError: could not identify an equality operator for type json
I'm not performing any equality operations on the column in question, just returning it with a select statement. The query is simple and fairly straightforward:
SELECT DISTINCT me.measure_id, me.choices
FROM measures ME
WHERE TRUE AND me.measure_id IN (3)
ORDER BY me.measure_id;
me.choices is the only column of type JSON in the table. I've searched extensively and found nothing and can't think of a way forward. Any advice will be appreciated.
SELECT DISTINCT requires that each full row be distinct. So when you say SELECT DISTINCT me.measure_id, me.choices you're asking PostgreSQL to perform equality operations on choices to see if two rows are the same, and the json type has no equality operator.
Assuming measure_id is the primary key for measures, you can drop the DISTINCT. Otherwise you could use DISTINCT ON to grab just one row per measure_id.

Work with Postgres/PostGIS View in SQLAlchemy

Two questions:
I want to create a view in my PostGIS DB. How do I add this view to my geometry_columns table?
What do I have to do to use a view with SQLAlchemy? Is there a difference between a table and a view for SQLAlchemy, or can I use a view the same way I use a table?
Sorry for my poor English.
If there are questions about my question, please feel free to ask so I can try to explain it another way :)
Nico
Table objects in SQLAlchemy have two roles. They can be used to issue DDL commands to create the table in the database. But their main purpose is to describe the columns and types of tabular data that can be selected from and inserted to.
If you only want to select, then a view looks to SQLAlchemy exactly like a regular table. It's enough to describe the view as a Table with the columns that interest you (you don't even need to describe all of the columns). If you want to use the ORM you'll need to declare for SQLAlchemy that some combination of the columns can be used as the primary key (anything that's unique will do). Declaring some columns as foreign keys will also make it easier to set up any relations. If you don't issue create for that Table object, then it is just metadata for SQLAlchemy to know how to query the database.
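Describing a view as a Table for selecting might look like the sketch below. It assumes an in-memory SQLite database instead of PostgreSQL/PostGIS, and the places table, rivers view, and their columns are hypothetical. No CREATE is issued for the Table object, so it acts purely as metadata.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, text
from sqlalchemy.pool import StaticPool

# Assumption: SQLite stands in for PostgreSQL so the sketch is self-contained.
engine = create_engine("sqlite://", poolclass=StaticPool)

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT, kind TEXT)"))
    conn.execute(text("INSERT INTO places VALUES (1, 'Elbe', 'river'), (2, 'Harz', 'range')"))
    conn.execute(text("CREATE VIEW rivers AS SELECT id, name FROM places WHERE kind = 'river'"))

metadata = MetaData()

# Describe only the columns of interest; marking id as the primary key is
# what the ORM would need, even though the view itself has no such constraint.
rivers = Table(
    "rivers",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)

with engine.connect() as conn:
    # The view is queried exactly like a regular table.
    river_rows = [tuple(row) for row in conn.execute(rivers.select())]
```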
If you also want to insert to the view, then you'll need to create PostgreSQL rules or triggers on the view that redirect the writes to the correct location. I'm not aware of a good usage recipe to redirect writes on the Python side.
