Python SQLAlchemy commit to two different dbs, one MSSQL and the other PostgreSQL

I am trying to commit to two different dbs, one hosted on MSSQL and the other PostgreSQL. I have two different session objects. I know I can do the following,
session1.add(record)  # MSSQL session
session1.commit()
session2.add(record)  # PostgreSQL session
session2.commit()
But I am trying to keep them in sync, so that either both commits succeed or both fail (if one of them fails, don't commit to the other). I would appreciate any help or thoughts.

You need to use a distributed transaction coordinator to create a distributed transaction that spans both databases.
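Short of a full distributed transaction coordinator, SQLAlchemy can also drive a two-phase commit across both backends itself, provided both dialects support prepared transactions (PostgreSQL does once max_prepared_transactions is set above zero on the server; support on the MSSQL side depends on the driver). A minimal sketch, with hypothetical connection URLs and placeholder models:
from sqlalchemy import Column, Integer, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class MssqlRecord(Base):
    # placeholder model stored in MSSQL
    __tablename__ = "mssql_records"
    id = Column(Integer, primary_key=True)
    payload = Column(Text)

class PgRecord(Base):
    # placeholder model stored in PostgreSQL
    __tablename__ = "pg_records"
    id = Column(Integer, primary_key=True)
    payload = Column(Text)

# hypothetical connection URLs; replace with your own
mssql_engine = create_engine("mssql+pyodbc://user:password@mssql_dsn")
pg_engine = create_engine("postgresql+psycopg2://user:password@host/dbname")

# One session bound to both engines; with twophase=True, commit() prepares the
# transaction on every participating database before finalizing any of them,
# so either both commit or both roll back.
Session = sessionmaker(twophase=True,
                       binds={MssqlRecord: mssql_engine, PgRecord: pg_engine})

session = Session()
session.add(MssqlRecord(payload="hello"))
session.add(PgRecord(payload="hello"))
session.commit()
If two-phase commit is not available on one side, a weaker fallback is to flush both sessions first and commit the less reliable one last, accepting that a crash between the two commits can still leave them out of sync.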
There is an old saying: A man who has one watch knows what time it is, a man who has two is never sure.

Related

How to identify number of connections to a Postgres Database (heroku)?

I am trying to identify the number of connections to a Postgres database. This is in the context of the connection limit on Heroku Postgres for the dev and hobby plans, which is limited to 20. I have a Python Django application using the database, and I want to understand what constitutes a connection. Will each user of the application count as one connection, or will the connection from the application to the database be counted as one?
To figure this out I tried the following.
Opened multiple instances of the application from different clients (3 separate machines).
Connected to the database using an online Adminer tool (https://adminer.cs50.net/)
Connected to the database using pgAdmin installed in my local system.
Created and ran dataclips (query reports) on the database from heroku.
Ran the following query from adminer and pgadmin to observe the number of records:
select * from pg_stat_activity where datname ='db_name';
Initially it seemed there was a new record for each instance of the application I opened and one record for the Adminer instance. After some time the query from Adminer was showing 6 records (2 connections for Adminer, 2 for pgAdmin and 2 for the web app).
Unfortunately I am still not sure whether each user of my web application is counted as a separate connection, or whether all connections to the database from the web app are counted as one.
Thanks in advance.
Best Regards!
Using the PostgreSQL parameters that log connections and disconnections (with the right log_line_prefix setting so the log lines include client information) should help:
log_connections (boolean)
Causes each attempted connection to the server to be logged, as well as successful completion of client authentication. Only superusers can change this parameter at session start, and it cannot be changed at all within a session. The default is off.
log_disconnections (boolean)
Causes session terminations to be logged. The log output provides information similar to log_connections, plus the duration of the session. Only superusers can change this parameter at session start, and it cannot be changed at all within a session. The default is off.
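As a quick check without changing any server settings, a sketch along these lines (assuming psycopg2 and the DATABASE_URL config var Heroku provides) groups the rows in pg_stat_activity by client address and application name, which makes it easier to see where each connection comes from:
import os
import psycopg2

# Assumes Heroku's DATABASE_URL environment variable; replace with your own DSN.
conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT client_addr, application_name, count(*) AS connections
        FROM pg_stat_activity
        WHERE datname = current_database()
        GROUP BY client_addr, application_name
        ORDER BY connections DESC
    """)
    for client_addr, application_name, connections in cur.fetchall():
        print(client_addr, application_name, connections)
conn.close()
In general a Django app holds one database connection per worker process or thread (kept open across requests if CONN_MAX_AGE is set), so the count scales with your dynos and workers rather than with the number of end users.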

Sqlalchemy queries are running on snowflake but not consistently returning data

When executing queries through a SQLAlchemy connection to Snowflake, only a small number of them return data; the rest of the time an OperationalError is thrown from the connection.
However, looking at the query_history for my user through the web UI, it shows that the query itself was run and data was produced (rows extracted > 0), yet this data was not returned through the connection.
Is there some temporary table where this data is being stored, or further requirements for a snowflake connection that I'm missing?
Thanks
A possible reason might be that you have not whitelisted access to the Snowflake-maintained internal stage area for your account. This area is used to stage large result sets, which are then pulled into the tool when requested.
You can use the whitelisting function to validate whether that access is present from your network.
The link here has more information on the whitelisting function.
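If the stage endpoints are the issue, one way to see exactly which hostnames and ports must be reachable is to query them over the same SQLAlchemy connection. The sketch below assumes the snowflake-sqlalchemy URL helper and Snowflake's SYSTEM$WHITELIST() function, with placeholder credentials:
import json
from sqlalchemy import create_engine, text
from snowflake.sqlalchemy import URL

# Hypothetical account details; replace with your own.
engine = create_engine(URL(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="your_warehouse",
))

with engine.connect() as conn:
    # SYSTEM$WHITELIST() returns a JSON list of host/port pairs that must be
    # reachable from your network, including the internal stage endpoints.
    result = conn.execute(text("SELECT SYSTEM$WHITELIST()")).scalar()
    for entry in json.loads(result):
        print(entry["type"], entry["host"], entry["port"])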
Hope this helps!

What is the best way to migrate all data from Oracle 11.2 to SQL Server 2012?

The approach I am trying is to write a dynamic script that would generate mirror tables in SQL Server with data types similar to those in Oracle, and then write another dynamic script to insert the records into SQL Server. The challenge I see is incompatible data types. Has anyone come across a similar situation? I am a SQL developer, but I can learn Python if someone can share similar work.
Have you tried the "SQL Server Import and Export Wizard" in SSMS?
i.e. if you create an empty SQL Server database and right-click on it in SSMS, then one of the "Tasks" menu options is "Import Data...", which starts the "SQL Server Import and Export Wizard". This builds a once-off SSIS package, which can be saved if you want to re-use it.
There is a data source option for "Microsoft OLE DB Provider for Oracle".
You might also have a better Oracle OLE DB provider available to try.
This will require Oracle client software to be available.
I haven't actually tried this (Oracle to SQL Server), so I am not sure whether it is reasonable or not.
How many tables, columns?
The Oracle DB may also have views, triggers, constraints, indexes, functions, packages, sequence generators and synonyms.
I used a linked server and got all the metadata of the tables from dba_tab_columns in Oracle, then wrote a script to create the tables based on that metadata. I needed to use an SSIS script task to save the CREATE TABLE script for source control. Then I wrote a SQL script to insert the data from Oracle, handling type differences in the script.
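For anyone who would rather drive the same metadata-based approach from Python, here is a rough sketch assuming the oracledb driver, placeholder connection details, and a deliberately simplistic type mapping (a real mapping needs more cases for NUMBER precision, LOBs, and so on):
import oracledb

# Very rough Oracle -> SQL Server type mapping; extend as needed.
TYPE_MAP = {
    "VARCHAR2": lambda length, prec, scale: f"VARCHAR({length})",
    "CHAR":     lambda length, prec, scale: f"CHAR({length})",
    "NUMBER":   lambda length, prec, scale: f"NUMERIC({prec or 38},{scale or 0})",
    "DATE":     lambda length, prec, scale: "DATETIME2",
    "CLOB":     lambda length, prec, scale: "VARCHAR(MAX)",
}

def create_table_ddl(conn, owner, table):
    """Build a SQL Server CREATE TABLE statement from Oracle's dictionary views."""
    cur = conn.cursor()
    cur.execute("""
        SELECT column_name, data_type, data_length, data_precision, data_scale
        FROM all_tab_columns
        WHERE owner = :owner AND table_name = :table_name
        ORDER BY column_id
    """, owner=owner, table_name=table)
    columns = []
    for name, dtype, length, prec, scale in cur:
        sql_type = TYPE_MAP.get(dtype, lambda *a: "VARCHAR(MAX)")(length, prec, scale)
        columns.append(f"    [{name}] {sql_type}")
    return f"CREATE TABLE [{table}] (\n" + ",\n".join(columns) + "\n);"

# Placeholder credentials and table name, purely for illustration.
conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/orclpdb")
print(create_table_ddl(conn, "SCOTT", "EMP"))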

A command-line/API tool for tracking history of tables in a database, does it exist or should I go and develop one?

I am currently working on a project where I need to do database synchronization. We have a main database on a server and a web app on it to interact with the data. But since this data is geographic (complex polygons and some points), it is more convenient and more efficient for the users to have a local database when working on the polygons (we use QGIS), and then upload the changes to the server. But while a user was working locally, it is possible that some points were modified on the server (it is only possible to interact with the points on the server). This is why I need the ability to synchronize the databases.
Having a history of INSERT, UPDATE and DELETE operations on the points in the local database, and the same on the server database, should be enough to reconstruct a history of the points and then synchronize.
By the way, we use Spatialite for the local databases and PostGIS for the main server database.
I found a bunch of resources on how to do this using triggers on databases:
http://database-programmer.blogspot.com/2008/07/history-tables.html
How to Store Historical Data
...
But I could not find any tool or library for doing this without having to write the triggers manually. For my needs I could absolutely do it by hand, but I feel like it is also something that could be made easier and more convenient with a dedicated command-line/API tool. The tool would, for instance, generate history tables and triggers for the tables where the user wants to track history (a rough sketch of what that generated DDL might look like follows the list below), and we could also imagine different options such as:
Which columns do we want to track?
Do we only want to track the actions, or also the values?
...
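To make the idea concrete, here is a rough sketch of the kind of DDL such a tool could emit for the PostgreSQL/PostGIS side, generated by a small Python function (the table and column names are purely illustrative, and a real tool would also need a Spatialite variant):
def history_ddl(table):
    """Generate illustrative PostgreSQL DDL for a history table plus trigger."""
    return f"""
CREATE TABLE {table}_history (
    history_id BIGSERIAL PRIMARY KEY,
    action     TEXT NOT NULL,                      -- 'INSERT', 'UPDATE' or 'DELETE'
    changed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    row_data   JSONB NOT NULL                      -- whole row; could be limited to chosen columns
);

CREATE OR REPLACE FUNCTION {table}_history_fn() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO {table}_history (action, row_data) VALUES (TG_OP, to_jsonb(OLD));
        RETURN OLD;
    ELSE
        INSERT INTO {table}_history (action, row_data) VALUES (TG_OP, to_jsonb(NEW));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER {table}_history_trg
AFTER INSERT OR UPDATE OR DELETE ON {table}
FOR EACH ROW EXECUTE FUNCTION {table}_history_fn();
"""

print(history_ddl("points"))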
So, to conclude, my questions are:
Is there any existing tool doing this? I searched and found nothing.
Do you think it would be feasible/relevant to implement such a tool? I was thinking of doing it in Python (since my project is Django-powered) and enabling different backends (right now I need SQLite/Spatialite and PostgreSQL/PostGIS)...
Thanks for your answers,
Dim'
Check out GeoGig. GeoGig can track and synchronize geodata from various sources, e.g. PostGIS, Esri shapefiles and Spatialite. It implements the typical Git workflow, but on data. You will have a data repository on a server which can be cloned, pulled and pushed from your local workstation.
GeoGig is a young project, still in beta but already powerful and feature-rich, with the ability to merge different commits, create diffs, switch branches, track history and do all the other typical Git tasks.
An example of a typical GeoGig workflow, using its comfortable command line interface:
# on http://server, initialize and start the remote repository on port 8182 (default)
geogig init
geogig serve
# on local, clone the remote repository to your machine
geogig clone http://server:8182 your_repository
cd your_repository/
# on local, import in geogig the data you are working on (Postgis)
geogig pg import --schema public --database your_database --user your_user --password your_pass --table your_table
# on local, add the local changes
geogig add
# on local, commit your changes
geogig commit -m "First commit"
# on local, push to the remote repository
geogig push
You could ask Bucardo to do the heavy lifting in terms of multi-master synchronization. Have a look at https://bucardo.org/wiki/Bucardo
They promise it can even synchronize between different types of databases, e.g. PostgreSQL <-> SQLite: http://blog.endpoint.com/2015/08/bucardo-postgres-replication-pgbench.html
I'm not sure about special geospatial capabilities though (synchronizing only regions).
GeoGig is definitely worth a try. You can plug a GeoGig repository directly into GeoServer to serve WMS and to edit features via the web/WFS.
As Wander hinted at, this is not as simple as "having a history of INSERT, UPDATE and DELETE" and keeping them synchronized. There is a lot going on under the hood. There are plenty of DBMS tools for replication/mirroring; here is one example for PostgreSQL: pgpool.
Thanks for the answers, Wander Nauta and David G. I totally agree that synchronization in general is not as simple as this. I should have given more details, but in my case I believed it could be enough because:
The local data is always a subset of the server data, and each user is assigned a subset. So there is always only one person working offline on a given subset.
On the server, the users can only modify/delete the data they created.
To give more information on the context: each user is locally digitizing an area in a district from aerial images. Each user is assigned a district to digitize and is able to upload his work to the server. On the server, through a web app, the users can consult everyone's work, post problem points and comment on them, mainly to point out a doubt or an omission in the digitizing. What I want is for the users to be able to download a copy of the district they are working on, with the points added by their colleagues, solve the problems locally, delete the points, possibly add new doubts, and upload again.
There is not really a master/slave relation between a local database and the server one; each has a specific role. Because of this, I am not sure that replication/mirroring will fit my needs, but maybe I'm wrong? Also, I'd like to avoid going with a solution that is too sophisticated for the needs, and avoid adding too many new dependencies, because the needs won't evolve much.

Update and deploy PostgreSQL schema to Heroku

I have a PostgreSQL schema that resides in a schema.sql file that gets run each time a database connection is initiated in Python. It looks something like:
CREATE TABLE IF NOT EXISTS users (
    id SERIAL PRIMARY KEY,
    facebook_id TEXT NOT NULL,
    name TEXT NOT NULL,
    access_token TEXT,
    created TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
The app is deployed on Heroku, using their PostgreSQL and everything works as expected.
Now, what if I want to change the structure of my users table a bit? What is the easiest and best way to do this? I thought of writing an ALTER... line in schema.sql for each change I want to make to the database, but I don't think this is the best approach, since after some time the schema file will be full of ALTERs and it will slow down my app.
What's the indicated way to deploy changes made to a database?
Running a hard-coded script on each connection is not a great way to handle schema management.
You need to either manage the schema manually, or use a full-fledged tool that keeps a schema version identifier in the database, checks it, and applies upgrade scripts until the database reaches the latest schema version. Rails calls this "migrations" and it kind of works. If you're using Django, it has schema management (migrations) too.
If you're not using a framework like that, I suggest just writing your own schema upgrade scripts. Add a "schema_version" table with a single row. SELECT it when the app first starts after a redeploy, and if it's lower than the current version the app knows about, apply the update scripts in order, e.g. schema_1_to_2, schema_2_to_3, etc.
I don't recommend doing this on connect; do it on app start, or better, as a special maintenance command. If you do it on every connection you'll have multiple connections trying to make the same changes, and you'll end up with duplicated columns and all sorts of other mess.
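A minimal sketch of that approach, assuming psycopg2, a DATABASE_URL environment variable, and hypothetical upgrade scripts kept in a migrations/ directory named schema_1_to_2.sql, schema_2_to_3.sql, and so on:
import os
import psycopg2

CURRENT_VERSION = 3  # the schema version this build of the app expects

def migrate():
    conn = psycopg2.connect(os.environ["DATABASE_URL"])
    with conn, conn.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
        cur.execute("SELECT version FROM schema_version")
        row = cur.fetchone()
        if row is None:
            # Fresh database: treat the base schema.sql as version 1.
            cur.execute("INSERT INTO schema_version VALUES (1)")
            version = 1
        else:
            version = row[0]
        # Apply each upgrade script in order until we reach CURRENT_VERSION.
        while version < CURRENT_VERSION:
            script = f"migrations/schema_{version}_to_{version + 1}.sql"
            with open(script) as f:
                cur.execute(f.read())
            version += 1
            cur.execute("UPDATE schema_version SET version = %s", (version,))
    conn.close()

if __name__ == "__main__":
    # Run as a one-off maintenance command, e.g. `heroku run python migrate.py`
    migrate()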
I support several Django apps on Heroku with Postgres. I just connect via pgAdmin and run my scripts when changes are required. I don't see any need to run a script every time a connection is made.
