How do I update a Postgres schema using a Python script?

I'm new to using Postgres, so I'm not sure if this is a basic question. I obtain a Postgres dump from my company that I load as a Postgres schema using pgAdmin. I then use the psycopg2 package in Python to load the various tables of this schema into pandas DataFrames and do whatever processing I need.
Every time I get a new version of the data, I have to go through a 3-step process in pgAdmin to update the schema before I work with it in Python:
"Drop cascade" the existing public schema
Create a new public schema
Restore the public schema by pointing it to the new pgdump file
Rather than doing these three steps in pgAdmin, can this be done programmatically from within a Python script?
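For reference, one way to script these three steps might look like the sketch below, assuming the dump is a custom-format pg_dump file, pg_restore is on the PATH, and psycopg2 is installed; the connection details and file path are placeholders, not taken from the question:

import os
import subprocess
import psycopg2

# placeholder connection details and dump file path
conn_info = dict(host="localhost", dbname="mydb", user="postgres", password="secret")
dump_file = "new_dump.backup"  # assumed to be a custom-format pg_dump file

# steps 1 and 2: drop the existing public schema (cascade) and recreate it
conn = psycopg2.connect(**conn_info)
conn.autocommit = True
cur = conn.cursor()
cur.execute("DROP SCHEMA IF EXISTS public CASCADE")
cur.execute("CREATE SCHEMA public")
cur.close()
conn.close()

# step 3: restore the public schema from the new dump file via pg_restore
subprocess.run(
    [
        "pg_restore",
        "--host", conn_info["host"],
        "--username", conn_info["user"],
        "--dbname", conn_info["dbname"],
        "--schema", "public",
        "--no-owner",
        dump_file,
    ],
    check=True,
    env={**os.environ, "PGPASSWORD": conn_info["password"]},
)

If the dump is a plain SQL file rather than a custom-format archive, the last step would invoke psql -f instead of pg_restore.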

Related

How to create a new database in mongodb using pyspark?

I am working on a tenant-based project, which means a new database is created for every client. For now, I create each database manually in MongoDB Compass by clicking the new/plus sign.
But I want to create a new database in MongoDB using PySpark. I have a Mongo connection string. Please suggest how I can do this.
Thank You.
I don't know PySpark, but the typical MongoDB workflow to create a database is to run any write operation. If you want to create a database before actually using it, you can run a simple insert operation or, for example, create a collection inside that database.
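Not PySpark, but as a minimal illustration of that point with pymongo (the connection URI, database name and collection names are placeholders):

from pymongo import MongoClient

# placeholder connection string
client = MongoClient("mongodb://user:password@host:27017")

# referencing a database does not create it yet
db = client["tenant_db_example"]

# the database only materializes on the first write operation...
db["bootstrap"].insert_one({"created": True})

# ...or when a collection is explicitly created inside it
db.create_collection("orders")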

Upload data to Exasol from python Dataframe

I wonder if there's any way to upload a dataframe and create a new table in Exasol? import_from_pandas assumes the table already exists. Do we need to run a SQL statement separately to create the table? For other databases, to_sql can just create the table if it doesn't exist.
Yes, as you mentioned, import_from_pandas requires a table, so you need to create the table before writing to it. You can run a CREATE TABLE ... script via connection.execute before using import_from_pandas. Also, to_sql needs a table, since according to the documentation it is translated to a SQL INSERT command.
Pandas to_sql allows you to create a new table if it does not exist, but it needs an SQLAlchemy connection, which is not supported for Exasol out of the box. However, there seems to be an SQLAlchemy dialect for Exasol you could use (haven't tried it yet): sqlalchemy-exasol.
Alternatively, I think you have to use a create table statement and then populate the table via pyexasol's import_from_pandas.
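A minimal sketch of that second approach with pyexasol; the DSN, credentials, schema and table/column names are all placeholders:

import pandas as pd
import pyexasol

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# placeholder connection details
conn = pyexasol.connect(dsn="exasol-host:8563", user="sys", password="secret", schema="MY_SCHEMA")

# create the target table first, since import_from_pandas expects it to exist
conn.execute("CREATE TABLE IF NOT EXISTS my_table (id INT, name VARCHAR(100))")

# then bulk-load the dataframe into the existing table
conn.import_from_pandas(df, "my_table")

conn.close()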

How to set up Arelle to export xbrl into other postgres schema than `public`?

Here is documentation about exporting XBRL data into Postgres. There are also XBRL-US and Abstract Model SQL scripts for a Postgres db. The documentation also points out that the export can be done into different databases. But Postgres supports different schemas in a single database. How do I change the Arelle settings to export XBRL into a different schema than public? (Here I mean the Python code; for the SQL scripts a text replace can easily be done.)
I think you are confusing the default PostgreSQL schema public with the XBRL data models. public is PostgreSQL-specific, independent of Arelle or XBRL. To alter the XBRL data model, specify the technology parameter in the arelleCmdLine: pgSemantic will use the Abstract Model, postgres will use the XBRL-US Public data model.
In both cases the XBRL data will appear under the public schema in your PostgreSQL database.

Incrementally get all data from source table (in db1) to destination table (in db2) in PostgreSQL

I have two PostgreSQL databases, db1 = source database and db2 = destination database, on an AWS server endpoint. For db1 I only have read rights, and for db2 I have both read and write rights. db1, being the production database, has a table called 'public.purchases'; my task is to incrementally copy all the data from the 'public.purchases' table in db1 to a table to be newly created in db2 (let me call that table 'public.purchases_copy'). And every time I run the script to perform this action, the destination table 'public.purchases_copy' in db2 needs to be updated without fully reloading it.
My question is: what would be the best way to achieve this efficiently? I did quite a bit of research online and found that it can be done by connecting Python to PostgreSQL using the 'psycopg2' module. Since I'm not very proficient in Python, it would be of great help if somebody could point me to StackOverflow links where a similar question has been answered, guide me on how this can be achieved, or suggest a particular tutorial I can refer to. Thanks in advance.
PostgreSQL version: 9.5
PostgreSQL GUI: pgAdmin 3
Python version: 3.5
While it is possible to do this using Python, I would recommend first looking into Postgres's own module postgres_fdw, if it is possible for you to use it:
The postgres_fdw module provides the foreign-data wrapper postgres_fdw, which can be used to access data stored in external PostgreSQL servers.
Details are available in the Postgres docs, but specifically, after you set it up, you can:
Create a foreign table, using CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA, for each remote table you want to access. The columns of the foreign table must match the referenced remote table. You can, however, use table and/or column names different from the remote table's, if you specify the correct remote names as options of the foreign table object.
Now you need only SELECT from a foreign table to access the data stored in its underlying remote table.
For a simpler setup, it would probably be best to use the read-only db as the foreign one.
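For reference, here is a rough sketch of that setup driven from Python with psycopg2; the host names, credentials, column list and the purchase_id watermark column are all assumptions, not taken from the question:

import psycopg2

# connect to db2 (the destination we can write to) -- placeholder credentials
conn = psycopg2.connect(host="db2-endpoint", dbname="db2", user="writer", password="secret")
conn.autocommit = True
cur = conn.cursor()

# --- one-time setup: expose db1's public.purchases inside db2 via postgres_fdw ---
cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("""
    CREATE SERVER db1_server FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'db1-endpoint', dbname 'db1', port '5432')
""")
cur.execute("""
    CREATE USER MAPPING FOR CURRENT_USER SERVER db1_server
    OPTIONS (user 'reader', password 'secret')
""")
# the columns below are placeholders -- they must match public.purchases in db1
cur.execute("""
    CREATE FOREIGN TABLE public.purchases_remote (
        purchase_id bigint,
        purchased_at timestamp,
        amount numeric
    ) SERVER db1_server OPTIONS (schema_name 'public', table_name 'purchases')
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS public.purchases_copy (
        purchase_id bigint,
        purchased_at timestamp,
        amount numeric
    )
""")

# --- repeatable incremental load: copy only rows newer than what is already in the copy,
#     assuming purchase_id is a monotonically increasing key ---
cur.execute("""
    INSERT INTO public.purchases_copy
    SELECT * FROM public.purchases_remote r
    WHERE r.purchase_id > COALESCE((SELECT max(purchase_id) FROM public.purchases_copy), 0)
""")

cur.close()
conn.close()

The setup block only needs to run once; after that, only the final INSERT ... SELECT has to be rerun for each incremental load.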

How to handle customized schema with sqlalchemy

I'm pretty new to databases and server-related tasks. I currently have two tables stored in an MS SQL database on a server, and I'm trying to use the Python package sqlalchemy to pull some of the data to my local machine. The first table has the default schema dbo, and I was able to use the connect string
'mssql+pyodbc://<username>:<password>@<dsnname>'
to inspect the table, but the other table has a customized schema, and I don't see any information about it when I use the previous commands. I assume this is because the second table has a different schema and the Python package can't find it anymore.
I was looking at automap, hoping the package offers a way to deal with a customized schema, but I don't quite understand many of the concepts described there, and since I'm not trying to alter the database, just pull data, I'm not sure if it's the right approach. Any suggestions?
Thanks
In case of automap you should pass the schema argument when preparing reflectively:
AutomapBase.prepare(reflect=True, schema='myschema')
If you wish to reflect both the default schema and your "customized schema" using the same automapper, then first reflect both schemas using the MetaData instance and after that prepare the automapper:
AutomapBase.metadata.reflect()
AutomapBase.metadata.reflect(schema='myschema')
AutomapBase.prepare()
If you call AutomapBase.prepare(reflect=True, ...) consecutively for both schemas, the automapper will recreate and replace the classes from the first prepare, because the tables already exist in the metadata. This will then raise warnings.
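A self-contained sketch of the second approach (SQLAlchemy 1.x style; the connection string and schema name are placeholders):

from sqlalchemy import create_engine, MetaData
from sqlalchemy.ext.automap import automap_base

# placeholder connection string -- replace with your own DSN
engine = create_engine("mssql+pyodbc://username:password@dsnname")

metadata = MetaData()
# reflect the default schema (dbo) and the customized schema into the same MetaData
metadata.reflect(bind=engine)
metadata.reflect(bind=engine, schema="myschema")

Base = automap_base(metadata=metadata)
# a single prepare() maps classes for everything reflected above
Base.prepare()

# mapped classes are keyed by table name, e.g. Base.classes.some_table
print(Base.classes.keys())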
